Dissertations on the topic "Classification automatique floue"
Browse the top 40 dissertations for research on the topic "Classification automatique floue".
Girard, Régis. "Classification conceptuelle sur des données arborescentes et imprécises." La Réunion, 1997. http://elgebar.univ-reunion.fr/login?url=http://thesesenligne.univ.run/97_08_Girard.pdf.
Turpin-Dhilly, Sandrine. "Adaptation des outils de la morphologie floue à l'analyse de données multidimensionnelles." Lille 1, 2000. http://www.theses.fr/2000LIL10035.
Albert, Benoit. "Méthodes d'optimisation avancées pour la classification automatique." Electronic Thesis or Diss., Université Clermont Auvergne (2021-...), 2024. http://www.theses.fr/2024UCFA0005.
In data partitioning, the goal is to group objects based on their similarity. K-means is one of the most commonly used models, where each cluster is represented by its centroid. Objects are assigned to the nearest cluster based on a distance metric. The choice of this distance is crucial to account for the similarity between the data points. Opting for the Mahalanobis distance instead of the Euclidean distance enables the model to detect classes of ellipsoidal shape rather than just spherical ones. The use of this distance metric presents numerous opportunities but also raises new challenges explored in my thesis. The central objective is the optimization of models, particularly FCM-GK (a fuzzy variant of k-means), which is a non-convex problem. The idea is to achieve a higher-quality partitioning without creating a new model by applying more robust optimization methods. In this regard, we propose two approaches: ADMM (Alternating Direction Method of Multipliers) and Nesterov's accelerated gradient method. Numerical experiments highlight the particular effectiveness of ADMM optimization, especially when the number of attributes in the dataset is significantly higher than the number of clusters. Incorporating the Mahalanobis distance into the model requires the introduction of an evaluation measure dedicated to partitions based on this distance. An extension of the Xie and Beni evaluation measure is proposed. This index serves as a tool to determine the optimal distance to use. Finally, the management of subsets in ECM (evidential variant) is addressed by determining the optimal imprecision zone. A new formulation of centroids and distances for subsets from clusters is introduced. Theoretical analyses and numerical experiments underscore the relevance of this new formulation.
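For readers unfamiliar with the fuzzy variant of k-means mentioned in this abstract, the following is a minimal sketch of the classical fuzzy c-means alternation (membership update, then centroid update). It is an illustration only: it uses the Euclidean distance and plain alternating optimization, not the Mahalanobis-based FCM-GK model, ADMM, or Nesterov schemes studied in the thesis. The function names and the fuzzifier default `m=2.0` are assumptions made for this example.

```python
import math


def fcm_memberships(points, centroids, m=2.0):
    """Membership u[i][k] of point i in cluster k, for fixed centroids."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centroids]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on one or more centroids:
            # split its membership evenly among those clusters.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [1.0 / sum((d_k / d_j) ** exponent for d_j in dists)
               for d_k in dists]
        memberships.append(row)
    return memberships


def fcm_centroids(points, memberships, m=2.0):
    """Centroids as means of the points weighted by u[i][k] ** m."""
    dim = len(points[0])
    centroids = []
    for k in range(len(memberships[0])):
        weights = [u[k] ** m for u in memberships]
        total = sum(weights)
        centroids.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)))
    return centroids


def fuzzy_c_means(points, centroids, m=2.0, iterations=50):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iterations):
        memberships = fcm_memberships(points, centroids, m)
        centroids = fcm_centroids(points, memberships, m)
    # Recompute memberships so they match the returned centroids.
    return centroids, fcm_memberships(points, centroids, m)
```

Replacing `math.dist` with a cluster-specific Mahalanobis distance is the step that would move this sketch toward an ellipsoidal-cluster model; that extension is not shown here.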
Benouhiba, Toufik. "Approche génétique et floue pour les systèmes d'agents adaptatifs : application à la reconnaissance des scenarii." Troyes, 2005. http://www.theses.fr/2005TROY0014.
The objective of this thesis is to use minimal a priori knowledge in order to generate uncertain rules that manipulate imprecise data. The proposed architecture has been tested on a multi-agent system for scenario recognition. The work is organized along three axes. The first concerns uncertain reasoning with imprecise data; evidence theory and intuitionistic fuzzy logic are used to model such reasoning. The second concerns classifier systems and genetic programming: the proposed approach combines the power of genetic programming with classifier systems, and a new learning mechanism based on evidence theory is introduced so that this theory can support the reasoning. The third concerns cooperation in adaptive multi-agent systems: classifier systems are improved through explicit cooperation between a number of classifier agents. We also propose a new data fusion operator based on evidence theory and adapted to the manipulated data. The developed system has been used to recognize car maneuvers, for which we propose a multi-agent recognition architecture. Maneuvers are decomposed into several layers so that they can be recognized at a given level of granularity.
Aldea, Emanuel. "Apprentissage de données structurées pour l'interprétation d'images." Paris, Télécom ParisTech, 2009. http://www.theses.fr/2009ENST0053.
Image interpretation methods primarily use the visual features of low-level or high-level interest elements. However, spatial information concerning the relative positioning of these elements is equally beneficial, as has been shown previously in segmentation and structure recognition. Fuzzy representations make it possible to assess at the same time the degree of imprecision of a relation and the gradual transition between the satisfiability and the non-satisfiability of a relation. The objective of this work is to explore techniques for representing spatial information and their integration in the learning process, within the context of image classifiers that make use of graph kernels. We motivate our choice of labeled graphs for representing images, in the context of learning with SVM classifiers. Graph kernels have been studied intensively in computational chemistry and biology, but an adaptation for image-related graphs is necessary, since image structures and the properties of the information encoded in the labeling are fundamentally different. We illustrate the integration of spatial information within the graphical model by considering fuzzy adjacency measures between interest elements, and we define a family of graph representations determined by different thresholds applied to these spatial measures. Finally, we employ multiple kernel learning in order to build classifiers that can take into account different graphical representations of the same image at once. Results show that spatial information complements the visual features of distinctive elements in images and that adapting the discriminative kernel functions for the fuzzy spatial representations is beneficial in terms of performance.
Mokhtari, Aimed. "Diagnostic des systèmes hybrides : développement d'une méthode associant la détection par classification et la simulation dynamique." Phd thesis, INSA de Toulouse, 2007. http://tel.archives-ouvertes.fr/tel-00200034.
Gokana, Denis. "Contribution à la reconnaissance automatique de caractères manuscrits : application à la lecture optique de caractères sur supports mobiles." Paris 11, 1986. http://www.theses.fr/1986PA112063.
Ragot, Nicolas. "MÉLIDIS : Reconnaissance de formes par modélisation mixte intrinsèque/discriminante à base de systèmes d'inférence floue hiérarchisés." Phd thesis, Rennes 1, 2003. http://www.theses.fr/2003REN10078.
Cutrona, Jérôme. "Analyse de forme des objets biologiques : représentation, classification et suivi temporel." Reims, 2003. http://www.theses.fr/2003REIMS018.
In biology, the relationship between shape, a major element in computer vision, and function has long been emphasized. This thesis proposes a processing pipeline leading to unsupervised shape classification, deformation tracking, and supervised classification of whole populations of objects. We first propose a contribution to unsupervised segmentation based on a fuzzy classification method and two semi-automatic methods founded on fuzzy connectedness and watersheds. Next, we study several shape descriptors, including primitives and anti-primitives, contour, silhouette, and multi-scale curvature. After shape matching, the descriptors are submitted to statistical analysis to highlight the modes of variation within the samples. The resulting statistical model is the basis of the proposed applications.
Isaza, Narvaez Claudia Victoria. "Diagnostic par techniques d'apprentissage floues: concept d'une méthode de validation et d'optimisation des partitions." Phd thesis, INSA de Toulouse, 2007. http://tel.archives-ouvertes.fr/tel-00190884.
Laleye, Frejus Adissa Akintola. "Contributions à l'étude et à la reconnaissance automatique de la parole en Fongbe." Thesis, Littoral, 2016. http://www.theses.fr/2016DUNK0452/document.
One of the difficulties of an under-resourced language is the lack of technology services for speech and text processing. In this thesis, we address the acoustic study of isolated and continuous speech in Fongbe as part of speech recognition. The tonal complexity of the spoken language and the recent agreement on a written form of Fongbe led us to study Fongbe across the whole automatic speech recognition chain. In addition to the linguistic resources collected (vocabularies, large text and speech corpora, pronunciation dictionaries) for building the algorithms, we propose a complete set of algorithms (including algorithms for the classification and recognition of isolated phonemes and for the segmentation of continuous speech into syllables), based on an acoustic study of the different sounds, for the automatic processing of Fongbe. In this manuscript, we also present a methodology for developing acoustic models and language models to facilitate speech recognition in Fongbe. We propose and evaluate grapheme-based acoustic modeling (since Fongbe has no phonetic dictionary) and also assess the impact of tonal pronunciation on the performance of a Fongbe ASR system. Finally, the written and oral resources collected for Fongbe and the experimental results obtained for each aspect of an ASR chain in Fongbe validate the potential of the methods and algorithms that we propose.
Ribes, Jean-Christophe. "Définition d'une stratégie de surveillance de l'installation AIRIX dans un but de maintenance prédictive." Reims, 2001. http://www.theses.fr/2001REIMS016.
Hirsch, Gérard. "Équations de relation floue et mesures d'incertain en reconnaissance de formes." Nancy 1, 1987. http://www.theses.fr/1987NAN10030.
Dang, Van Mô. "Classification de donnees spatiales : modeles probabilistes et criteres de partitionnement." Compiègne, 1998. http://www.theses.fr/1998COMP1173.
Elisabeth, Erol. "Fouille de données spatio-temporelles, résumés de données et apprentissage automatique : application au système de recommandations touristique, données médicales et détection des transactions atypiques dans le domaine financier." Thesis, Antilles, 2021. http://www.theses.fr/2021ANTI0607.
Data mining is one of the components of Customer Relationship Management (CRM), widely deployed in companies. It is the process of extracting interesting, non-trivial, implicit, unknown, and potentially useful knowledge from data. This process relies on algorithms from various scientific disciplines (statistics, artificial intelligence, databases) to build models from data stored in data warehouses. The objective is to determine models, built from clusters, that improve knowledge of the customer in the generic sense, predict customer behavior, and optimize the proposed offer. Since these models are intended to be used by specialists in the field of data, researchers in health economics and management sciences, or professionals in the sector studied, this research work emphasizes the usability of data mining environments. This thesis is concerned with spatio-temporal data mining. It particularly highlights an original approach to data processing with the aim of enriching practical knowledge in the field. This work includes an application component in four chapters, corresponding to four systems developed: a model for setting up a recommendation system based on the collection of GPS positioning data; a data summary tool optimized for the speed of responses to requests for the medicalization of information systems program (PMSI); a machine learning tool for the fight against money laundering in the financial system; and a model for the prediction of activity in VSEs that are weather-dependent (tourism, transport, leisure, commerce, etc.). The problem here is to identify classification algorithms and neural networks for data analysis aimed at adapting the company's strategy to economic changes.
Hamdan, Hani. "Développement de méthodes de classification pour le contrôle par émission acoustique d'appareils à pression." Compiègne, 2005. http://www.theses.fr/2005COMP1583.
This PhD thesis deals with real-time computer-aided decision-making for acoustic emission-based control of pressure equipment. The problem addressed is how to account for the location uncertainty of acoustic emission signals in mixture model-based clustering. Two new algorithms (EM and CEM for uncertain data) are developed. These algorithms are based only on uncertainty zone data, and they are derived by optimizing new likelihood criteria adapted to this kind of data. In order to speed up processing when the data size becomes very large, we have also developed a new method for the discretization of uncertainty zone data. This method is compared with the traditional one applied to imprecise data. An experimental study using simulated and real data shows the efficiency of the various approaches developed.
Caldairou, Benoît. "Contributions à la segmentation des structures cérébrales en IRM foetale." Phd thesis, Université de Strasbourg, 2012. http://tel.archives-ouvertes.fr/tel-00747860.
Hernandez, De Leon Hector Ricardo. "Supervision et diagnostic des procédés de production d'eau potable." Phd thesis, INSA de Toulouse, 2006. http://tel.archives-ouvertes.fr/tel-00136157.
Mascarilla, Laurent. "Apprentissage de connaissances pour l'interprétation des images satellite." Toulouse 3, 1996. http://www.theses.fr/1996TOU30300.
Chang, Chien Kuang Che. "Automated lung screening system of multiple pathological targets in multislice CT." Thesis, Evry, Institut national des télécommunications, 2011. http://www.theses.fr/2011TELE0021/document.
This research aims at developing a computer-aided diagnosis (CAD) system for fully automatic detection and classification of pathological lung parenchyma patterns in idiopathic interstitial pneumonias (IIP) and emphysema using multi-detector computed tomography (MDCT). The proposed CAD system is based on 3-D mathematical morphology, texture, and fuzzy logic analysis, and can be divided into four stages: (1) a multi-resolution decomposition scheme based on a 3-D morphological filter is used to discriminate the lung region patterns at different analysis scales; (2) an additional spatial lung partitioning based on the lung tissue texture is introduced to reinforce the spatial separation between patterns extracted at the same resolution level in the decomposition pyramid; (3) a hierarchical tree structure is used to describe the relationship between patterns at different resolution levels, and for each pattern six fuzzy membership functions are established for assigning a probability of association with normal tissue or a pathological target; (4) a decision step exploiting the fuzzy-logic assignments selects the target class of each lung pattern among the following categories: normal (N), emphysema (EM), fibrosis/honeycombing (FHC), and ground glass (GDG). The experimental validation of the developed CAD system made it possible to define specifications for the recommended number of resolution levels (NRL = 12) and for the CT acquisition protocol, including the "LUNG"/"BONPLUS" reconstruction kernel and thin collimations (1.25 mm or less). It also highlights the difficulty of quantitatively assessing the performance of the proposed approach in the absence of a ground truth, such as a volumetric assessment, large margin selection, and distinguishability between fibrosis and high-density (vascular) regions.
Guarda, Alvaro. "Apprentissage génétique de règles de reconnaissance visuelle : application à la reconnaissance d'éléments du visage." Grenoble INPG, 1998. http://www.theses.fr/1998INPG0110.
Dias, E. Silva Ascendino Flavio. "Contribution a l'analyse structurale par des methodes de classification automatique." Toulouse, INSA, 1986. http://www.theses.fr/1986ISAT0014.
Luqman, Muhammad Muzzamil. "Fuzzy multilevel graph embedding for recognition, indexing and retrieval of graphic document images." Thesis, Tours, 2012. http://www.theses.fr/2012TOUR4005/document.
This thesis addresses the lack of efficient computational tools for graph-based structural pattern recognition approaches and proposes to exploit the computational strength of statistical pattern recognition. Its contribution is twofold. The first contribution is a new method of explicit graph embedding. The proposed graph embedding method exploits multilevel analysis of graphs to extract graph-level information, structural-level information, and elementary-level information. It embeds this information into a numeric feature vector. The method employs fuzzy overlapping trapezoidal intervals to address the noise sensitivity of graph representations and to minimize the information loss when mapping from continuous graph space to discrete vector space. The method has unsupervised learning abilities and is capable of automatically adapting its parameters to the underlying graph dataset. The second contribution is a framework for automatic indexing of graph repositories for graph retrieval and subgraph spotting. This framework exploits explicit graph embedding to represent the cliques of order 2 by numeric feature vectors, together with classification and clustering tools for automatically indexing a graph repository. It does not require a labeled learning set and can be easily deployed to a range of application domains, offering ease of query by example (QBE) and granularity of focused retrieval.
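The idea of fuzzy overlapping trapezoidal intervals can be illustrated with a short sketch. The code below is not the thesis's embedding method; it only shows, under assumed interval bounds, how a continuous attribute value can be mapped to graded memberships in neighbouring intervals instead of a single hard bin. The names `trapezoid` and `fuzzify` and the example bounds are hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Membership of x in a trapezoid with support [a, d] and core [b, c]."""
    if b <= x <= c:
        return 1.0          # inside the core: full membership
    if x <= a or x >= d:
        return 0.0          # outside the support: no membership
    if x < b:
        return (x - a) / (b - a)   # rising edge, a < x < b
    return (d - x) / (d - c)       # falling edge, c < x < d


def fuzzify(x, intervals):
    """Membership of x in each trapezoidal interval, in order."""
    return [trapezoid(x, *bounds) for bounds in intervals]


# Three overlapping intervals covering [0, 10] (illustrative bounds only).
INTERVALS = [(0, 0, 2, 4), (2, 4, 6, 8), (6, 8, 10, 10)]
```

With these bounds, a value on the boundary region between two intervals, such as 3.0, belongs partly to both, so a small perturbation of the value changes the feature vector gradually rather than flipping a bin.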
Blanchard, Frédéric Herbin Michel. "Visualisation et classification de données multidimensionnelles Application aux images multicomposantes /." Reims : S.C.D. de l'Université, 2005. http://scdurca.univ-reims.fr/exl-doc/GED00000287.pdf.
Schmitt, Emmanuel. "Contribution au Système d'Information d'un Produit « Bois ». Appariement automatique de pièces de bois selon des critères de couleur et de texture." Phd thesis, Université Henri Poincaré - Nancy I, 2007. http://tel.archives-ouvertes.fr/tel-00170106.
Arzi, Mohammad. "Traitement automatique des signaux vestibulo-oculaires et optocinétiques." Lyon, INSA, 1986. http://www.theses.fr/1986ISAL0025.
Guillon, Arthur. "Opérateurs de régularisation pour le subspace clustering flou." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS121.
Subspace clustering is a data mining task that consists in simultaneously identifying groups of similar data and making this similarity explicit, for example by selecting features characteristic of the groups. In this thesis, we consider a specific family of fuzzy subspace clustering models, which are based on the minimization of a cost function. We propose three desirable qualities of clustering that are absent from the solutions computed by previous models. We then propose simple penalty terms, which we use to encode these properties in the original cost functions. Some of these terms are non-differentiable, and the standard techniques of fuzzy clustering cannot be applied to minimize the new cost functions. We thus propose a new, generic optimization algorithm, which extends the standard approach by combining alternate optimization and proximal gradient descent. We then instantiate this algorithm with operators minimizing the three previous penalty terms and show that the resulting algorithms possess the corresponding qualities.
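As background for the proximal gradient descent mentioned in this abstract, here is a minimal sketch of the technique on a toy problem. The abstract does not name its penalty terms, so the L1 penalty used below is an assumption chosen because its proximal operator (soft-thresholding) is well known; the quadratic data term, function names, and step size are likewise illustrative and are not the thesis's cost function.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |.| applied to a scalar v."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def proximal_gradient(target, lam, step=0.5, iterations=100):
    """Minimise 0.5 * sum((w_i - target_i)**2) + lam * sum(|w_i|).

    Each iteration takes a gradient step on the smooth quadratic term,
    then applies the proximal operator of the non-differentiable term.
    """
    w = [0.0] * len(target)
    for _ in range(iterations):
        grad = [wi - ti for wi, ti in zip(w, target)]
        w = [soft_threshold(wi - step * gi, step * lam)
             for wi, gi in zip(w, grad)]
    return w
```

For this toy objective the minimiser is known in closed form (each coordinate is `soft_threshold(target_i, lam)`), which makes it easy to check that the iteration converges; small coordinates are driven exactly to zero, which is the behaviour that makes such penalties useful for feature selection.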
Blanchard, Frédéric. "Visualisation et classification de données multidimensionnelles : Application aux images multicomposantes." Reims, 2005. http://theses.univ-reims.fr/exl-doc/GED00000287.pdf.
The analysis of multicomponent images is a crucial problem, in which visualization and clustering are two relevant questions. We decided to work in the more general framework of data analysis to answer these questions. The preliminary step of this work describes the problems induced by dimensionality and studies current dimensionality reduction methods. The visualization problem is then considered and a contribution is presented: we propose a new visualization method based on color images that provides an immediate and synthetic image of the data. Applications are presented. The second contribution lies upstream of the clustering procedure strictly speaking. We establish a new kind of data representation using rank transformation, fuzziness, and aggregation procedures. Its use improves clustering procedures by handling clusters of dissimilar density or varying sizes and by making them more robust. This work presents two important contributions to the field of data analysis applied to multicomponent images. The variety of the tools involved (originally from decision theory, uncertainty management, data mining, or image processing) makes the presented methods usable in many diverse areas beyond multicomponent image analysis.
Silva, Bernardes Juliana. "Evolution et apprentissage automatique pour l'annotation fonctionnelle et la classification des homologies lointains en protéines." Phd thesis, Université Pierre et Marie Curie - Paris VI, 2012. http://tel.archives-ouvertes.fr/tel-00684155.
Guerra, Thierry-Marie. "Analyse de données objectivo-subjectives : Approche par la théorie des sous-ensembles flous." Valenciennes, 1991. https://ged.uphf.fr/nuxeo/site/esupversions/a3f55508-7363-49a4-a531-9d723ff55359.
El-Hajjami, Hassan. "Application de la théorie des sous-ensembles flous pour le développement d'un algorithme de classification séquentielle non supervisée et non paramétrique pour le suivi en temps réel de l'évolution de l'état d'une structure soumise à des sollicitations extérieures." Compiègne, 1991. http://www.theses.fr/1991COMPE093.
Boussarsar, Riadh. "Contribution des mesures floues et d'un modèle markovien à la segmentation d'images couleur." Rouen, 1997. http://www.theses.fr/1997ROUES036.
Bouayad, Mohammed. "Prétopologie et reconnaissances des formes." Lyon, INSA, 1998. http://theses.insa-lyon.fr/publication/1998ISAL0120/these.pdf.
Pretopology was introduced as a modeling tool for pattern recognition and image processing in the early 1980s, and some related work has been carried out since. The first chapter describes the model and shows its advantages for the main topics of pattern recognition and image processing, in particular automatic classification, supervised learning (and adaptation methods), multi-scale perception, and rejection techniques. Pretopology has two aspects: it is a modeling tool (which expresses the notions of proximity, resemblance, and/or neighborhood) and a simple way to build algorithms by using the non-idempotence of the adherence mapping. A double comparative study has been performed: first between pretopology and other modeling theories that appeared at the same time in the same domain (fuzzy theory, mathematical morphology, rough sets), and second between pretopological methods in pattern recognition and the common approaches used in this domain (statistical, structural, or syntactic methods); we show that these methods are not of the same type. The second chapter brings a more technical contribution concerning the extension of the pretopological model. We show that the concept of continuity, which is omnipresent in pattern recognition, lacks a description; we propose some possible translations of this concept that are coherent with the general theory because they are at the same time models and tools for algorithms.
Bouayad, Mohammed Emptoz Hubert. "Prétopologie et reconnaissances des formes." Villeurbanne : Doc'INSA, 2006. http://docinsa.insa-lyon.fr/these/pont.php?id=bouayad.
ARMAND, Stéphane. "Analyse Quantifiée de la Marche : extraction de connaissances à partir de données pour l'aide à l'interprétation clinique de la marche digitigrade." Phd thesis, Université de Valenciennes et du Hainaut-Cambresis, 2005. http://tel.archives-ouvertes.fr/tel-00010618.
Gunes, Veyis. "Reconnaissance des formes évolutives par combinaison, coopération et sélection de classifieurs." Phd thesis, Université de La Rochelle, 2001. http://tel.archives-ouvertes.fr/tel-00631621.
Quéré, Romain. "Quelques propositions pour la comparaison de partitions non strictes." Phd thesis, Université de La Rochelle, 2012. http://tel.archives-ouvertes.fr/tel-00950514.
Elfelly, Nesrine. "Approche neuronale de la représentation et de la commande multimodèles de processus complexes." Thesis, Lille 1, 2010. http://www.theses.fr/2010LIL10156/document.
This contribution deals with a new approach to the modeling and control of complex processes. It is essentially based on neuro-fuzzy classification methods and aims to derive a base of models describing the system over the whole operating domain using only input/output measurements. The implementation of this approach requires three main steps: (1) determination of the multimodel structure, in which the number of models is first worked out using a neural network with rival penalized competitive learning, and the different operating clusters are then selected using an adequate classification algorithm (Kohonen map, K-means, or fuzzy K-means); (2) parametric model identification using the classification results, together with a validation procedure that confirms the efficiency of the proposed multimodel structure through an appropriate decision mechanism allowing the contribution (or validity) of each model to be estimated; (3) determination of the global system control parameters, deduced through a fusion of the models' control parameters. The suggested approach appears interesting since it is easy to apply, requires no a priori knowledge, and adapts the processing by choosing adequate data classification and validity computation methods according to certain aspects of the operating domain of the process considered.
Qureshi, Taimur. "Contributions to decision tree based learning." Thesis, Lyon 2, 2010. http://www.theses.fr/2010LYO20051/document.
Advanced research in data acquisition methods, as well as in storage methods and learning technologies, takes on the challenge of systematically automating data learning techniques in order to extract valid and usable knowledge. The knowledge discovery procedure proceeds through the following steps: data selection, data preparation, data transformation, data mining, and finally the interpretation and validation of the results found. In this thesis, we developed techniques that contribute to data preparation and transformation, as well as data mining methods for extracting knowledge. Through this work, we sought to improve prediction accuracy throughout the learning process. The work in this thesis is based on decision trees. We introduced several preprocessing approaches and transformation techniques, such as discretization, fuzzy partitioning, and dimensionality reduction, in order to improve the performance of decision trees. However, these techniques can also be used in other learning methods; discretization, for example, can be used for Bayesian classification. In the data mining process, the data preparation phase generally takes up 80 percent of the time. Moreover, it is critical to the quality of the modeling. The discretization of continuous attributes thus remains a very important problem that affects the accuracy, complexity, variance, and understandability of induction models. In this thesis, we proposed and developed techniques based on resampling. We also studied other alternatives, such as fuzzy partitioning for fuzzy induction of decision trees. Fuzzy logic is thus incorporated into the induction process to increase model accuracy and reduce variance while maintaining interpretability. Finally, we adopt a topological learning scheme that aims to perform nonlinear dimensionality reduction. We modify a learning technique based on topological manifolds to determine whether the accuracy and interpretability of the classification can be increased.
Laclau, Charlotte. "Hard and fuzzy block clustering algorithms for high dimensional data." Thesis, Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCB014.
With the increasing amount of data available, unsupervised learning has become an important tool for discovering underlying patterns without the need to label instances manually. Among the different approaches proposed to tackle this problem, clustering is arguably the most popular one. Clustering is usually based on the assumption that each group, also called a cluster, is distributed around a center defined in terms of all features, while in some real-world applications dealing with high-dimensional data this assumption may be false. To this end, co-clustering algorithms were proposed to describe clusters by the subsets of features that are most relevant to them. The resulting latent structure of the data is composed of blocks usually called co-clusters. In the first two chapters, we describe two co-clustering methods that differentiate the relevance of features according to their capability of revealing the latent structure of the data, in both a probabilistic and a distance-based framework. The probabilistic approach uses the mixture model framework, where the irrelevant features are assumed to have a different probability distribution that is independent of the co-clustering structure. The distance-based (also called metric-based) approach, on the other hand, relies on an adaptive metric in which each variable is assigned a weight that defines its contribution to the resulting co-clustering. From the theoretical point of view, we show the global convergence of the proposed algorithms using the Zangwill convergence theorem. In the last two chapters, we consider a special case of co-clustering where, contrary to the original setting, each subset of instances is described by a unique subset of features, resulting in a diagonal structure of the initial data matrix. As for the first two contributions, we consider both probabilistic and metric-based approaches. The main idea of the proposed contributions is to impose two different kinds of constraints: (1) we fix the number of row clusters to the number of column clusters; (2) we seek a structure of the original data matrix that has the maximum values on its diagonal (for instance, for binary data, we look for diagonal blocks composed of ones with zeros outside the main diagonal). The proposed approaches enjoy the convergence guarantees derived from the results of the previous chapters. Finally, we present both hard and fuzzy versions of the proposed algorithms. We evaluate our contributions on a wide variety of synthetic and real-world benchmark binary and continuous data sets related to text mining applications and analyze the advantages and drawbacks of each approach. To conclude, we believe that this thesis explicitly covers a vast majority of the possible scenarios arising in hard and fuzzy co-clustering and can be seen as a generalization of some popular biclustering approaches.