Dissertations / Theses on the topic 'Analyse non-supervisée'
Consult the top 50 dissertations / theses for your research on the topic 'Analyse non-supervisée.'
Goubet, Étienne. "Contrôle non destructif par analyse supervisée d'images 3D ultrasonores." Cachan, Ecole normale supérieure, 1999. http://www.theses.fr/1999DENS0011.
Huck, Alexis. "Analyse non-supervisée d’images hyperspectrales : démixage linéaire et détection d’anomalies." Aix-Marseille 3, 2009. http://www.theses.fr/2009AIX30036.
This thesis focuses on two research areas in the unsupervised analysis of hyperspectral images (HSIs). Under the assumptions of the linear spectral mixing model, the formalism of Non-Negative Matrix Factorization (NMF) is investigated for unmixing purposes. We propose judicious spectral and spatial a priori knowledge to regularize the problem. In addition, we propose an estimator for the optimal step size of the projected gradient. Suitably regularized NMF is thus shown to be a relevant approach for unmixing HSIs. Then, the problem of anomaly detection is considered. We propose an algorithm for Anomalous Component Pursuit (ACP), based simultaneously on projection pursuit and on a probabilistic model with hypothesis testing. ACP detects anomalies with a constant false alarm rate and discriminates them into spectrally homogeneous classes.
Leblanc, Brice. "Analyse non supervisée de données issues de Systèmes de Transport Intelligent-Coopératif." Thesis, Reims, 2020. http://www.theses.fr/2020REIMS014.
This thesis takes place in the context of Vehicular Ad-hoc Networks (VANETs), and more specifically Cooperative Intelligent Transport Systems (C-ITS). These systems exchange information to enhance road safety. The purpose of this thesis is to introduce data analysis tools that may provide road operators with information on the usage and state of their infrastructure, which may in turn help to improve road safety. We identify two cases to address: driving profile identification and road obstacle detection. To deal with these issues, we propose unsupervised learning approaches: clustering methods for driving profile identification, and concept drift detection for obstacle detection. This thesis makes three main contributions: a methodology for transforming raw C-ITS data first into trajectories and then into a learning data set; the use of classical clustering methods and Points of Interest for driving profiles, with experiments on mobile device data and network log data; and the treatment of a crowd of vehicles providing network log data as data streams, used as input to concept drift detection algorithms to recognize road obstacles.
Fontaine, Michaël. "Segmentation non supervisée d'images couleur par analyse de la connexité des pixels." Lille 1, 2001. https://pepite-depot.univ-lille.fr/LIBRE/Th_Num/2001/50376-2001-305-306.pdf.
Rafi, Selwa. "Chaînes de Markov cachées et séparation non supervisée de sources." Thesis, Evry, Institut national des télécommunications, 2012. http://www.theses.fr/2012TELE0020/document.
The restoration problem arises in various domains, in particular in signal and image processing. It consists in retrieving original data from a set of observed data. For multidimensional data, the problem can be solved using different approaches depending on the data structure, the transformation system and the noise. In this work, we first tackle the problem in the case of discrete data and a noisy model. In this context, the problem is similar to a segmentation problem. We exploit Pairwise and Triplet Markov chain models, which generalize Hidden Markov chain models. The interest of these models lies in the possibility of generalizing the computation of the posterior probability, which allows Bayesian segmentation to be performed. We consider these methods for two-dimensional signals and apply the algorithms to the restoration of old handwritten documents that have been scanned and are subject to the show-through effect. In the second part of this work, we consider the restoration problem as a blind source separation problem. The well-known Independent Component Analysis (ICA) method requires the assumption that the sources be statistically independent. In practice, this condition is not always satisfied. Consequently, we study an extension of the ICA model to the case where the sources are not necessarily independent. We introduce a latent process which controls the dependence and/or independence of the sources. The model that we propose combines a linear instantaneous mixing model similar to that of ICA with a probabilistic model on the sources with hidden variables. In this context, we show how the usual independence assumption can be weakened to a conditional independence assumption using the technique of Iterative Conditional Estimation.
Rafi, Selwa. "Chaînes de Markov cachées et séparation non supervisée de sources." Phd thesis, Institut National des Télécommunications, 2012. http://tel.archives-ouvertes.fr/tel-00995414.
Cutrona, Jérôme. "Analyse de forme des objets biologiques : représentation, classification et suivi temporel." Reims, 2003. http://www.theses.fr/2003REIMS018.
In biology, the relationship between shape, a major element in computer vision, and function has long been emphasized. This thesis proposes a processing pipeline leading to unsupervised shape classification, deformation tracking and supervised classification of whole populations of objects. We first propose a contribution to unsupervised segmentation based on a fuzzy classification method, and two semi-automatic methods founded on fuzzy connectedness and watersheds. Next, we study several shape descriptors including primitives and anti-primitives, contour, silhouette and multi-scale curvature. After shape matching, the descriptors are submitted to statistical analysis to highlight the modes of variation within the samples. The resulting statistical model is the basis of the proposed applications.
Boubou, Mounzer. "Contribution aux méthodes de classification non supervisée via des approches prétopologiques et d'agrégation d'opinions." Phd thesis, Université Claude Bernard - Lyon I, 2007. http://tel.archives-ouvertes.fr/tel-00195779.
Ta, Minh Thuy. "Techniques d'optimisation non convexe basée sur la programmation DC et DCA et méthodes évolutives pour la classification non supervisée." Electronic Thesis or Diss., Université de Lorraine, 2014. http://www.theses.fr/2014LORR0099.
This thesis focuses on four problems in data mining and machine learning: clustering data streams, clustering massive data sets, weighted hard and fuzzy clustering, and clustering without prior knowledge of the number of clusters. Our methods are based on deterministic optimization approaches, namely DC (Difference of Convex functions) programming and DCA (Difference of Convex Algorithm), for solving some classes of the clustering problems cited above. Our methods are also based on elitist evolutionary approaches. We adapt the clustering algorithm DCA–MSSC to deal with data streams using two window models: sub-windows and sliding windows. For the problem of clustering massive data sets, we propose a two-phase DCA algorithm. In the first phase, the massive data is divided into several subsets, on which the DCA–MSSC algorithm performs clustering. In the second phase, we propose a DCA–Weight algorithm to perform a weighted clustering of the centers obtained in the first phase. For weighted clustering, we also propose two approaches: weighted hard clustering and weighted fuzzy clustering. We test our approach on an image segmentation application. The final issue addressed in this thesis is clustering without prior knowledge of the number of clusters. We propose an elitist evolutionary approach in which several evolutionary algorithms (EAs) are applied at the same time to find both the optimal combination of initial cluster seeds and the optimal number of clusters. The various tests performed on several large data sets are very promising and demonstrate the effectiveness of the proposed approaches.
Gan, Changquan. "Une approche de classification non supervisée basée sur la notion des K plus proches voisins." Compiègne, 1994. http://www.theses.fr/1994COMP765S.
Ta, Minh Thuy. "Techniques d'optimisation non convexe basée sur la programmation DC et DCA et méthodes évolutives pour la classification non supervisée." Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0099/document.
Gomes, Da Silva Alzennyr. "Analyse des données évolutives : application aux données d'usage du Web." Phd thesis, Université Paris Dauphine - Paris IX, 2009. http://tel.archives-ouvertes.fr/tel-00445501.
Maugis, Cathy. "Sélection de variables pour la classification non supervisée par mélanges gaussiens : application à l'étude de données transcriptomes." Phd thesis, Université Paris Sud - Paris XI, 2008. http://tel.archives-ouvertes.fr/tel-00344120.
In the first part, the proposed model, which generalizes that of Raftery and Dean (2006), makes it possible to specify the role of the variables with respect to the clustering process. Thus the irrelevant variables can depend on a subset of the variables retained for clustering. These models are compared using a BIC-type criterion. Their identifiability is established and the consistency of the criterion is proved under regularity conditions. In practice, the status of the variables is obtained by an algorithm that nests two backward variable selection algorithms, one for clustering and one for linear regression. The value of this procedure is illustrated in particular on transcriptome data. An improvement of the modeling of the role of the variables, which consists in dividing the variables declared irrelevant into those dependent on and those independent of the variables relevant for clustering, is then proposed to overcome the over-penalization of certain models. Finally, since DNA microarray technology generates many missing values, an extension of our procedure that takes these missing values into account is suggested, avoiding their prior estimation.
In the second part, Gaussian mixtures of specific forms are considered and a non-asymptotic penalized criterion is proposed to select simultaneously the number of mixture components and the set of variables relevant for clustering. A general model selection theorem for maximum likelihood density estimation, proposed by Massart (2007), is used to determine the form of the penalty. This theorem requires control of the bracketing entropy of the families of multidimensional Gaussian mixtures under study. Since this criterion depends on unknown multiplicative constants, the so-called "slope heuristic" is applied to make the criterion usable in practice.
Kassab, Randa. "Analyse des propriétés stationnaires et des propriétés émergentes dans les flux d'information changeant au cours du temps." Thesis, Nancy 1, 2009. http://www.theses.fr/2009NAN10027/document.
Many applications produce and receive continuous, unbounded, high-speed data streams. This raises obvious problems of storage, processing and analysis of data, which are only just beginning to be addressed in the domain of data streams. On the one hand, data streams must be processed on the fly without storing all the data. On the other hand, one must also analyze, simultaneously and concurrently, the regularities inherent in the data stream as well as the novelties, exceptions, or changes occurring in this stream over time. The main contribution of this thesis is the development of a new machine learning approach, called ILoNDF, which is based on the novelty detection principle. The learning of this model is, contrary to that of its original version, driven not only by the novelty part of the input data but also by the data itself. ILoNDF can thereby continuously extract new knowledge about the relative frequencies of the data and their variables, which makes it more robust against noise. Because it operates in an on-line mode without repeated training, ILoNDF can also address the primary challenges of managing data streams. Firstly, we study ILoNDF's behavior for one-class classification when dealing with high-dimensional noisy data. This study enabled us to highlight the pure learning capacities of ILoNDF with respect to the key classification methods suggested until now. Next, we focus on the adaptation of ILoNDF to the specific context of information filtering. Our goal is to set up user-oriented rather than system-oriented filtering strategies, following two directions. The first direction concerns user modeling based on the ILoNDF model. This provides a new way of looking at the user's need in terms of specificity, exhaustivity and contradictory profile-contributing criteria.
These criteria are then used to estimate the relative importance the user might attach to precision and recall. The filtering threshold can then be adjusted taking into account this knowledge about the user's need. The second direction, complementary to the first, concerns the refinement of ILoNDF's functionality to give it the capacity to track the user's drifting need over time. Finally, we consider the generalization of our previous work to the case where streaming data can be divided into multiple classes.
Kassab, Randa. "Analyse des propriétés stationnaires et des propriétés émergentes dans les flux d'informations changeant au cours du temps." Phd thesis, Université Henri Poincaré - Nancy I, 2009. http://tel.archives-ouvertes.fr/tel-00402644.
The main contribution of this thesis is the development of a learning model, named ILoNDF, based on the principle of novelty detection. Unlike its original version, the learning of this model is driven not only by the novelty that an input datum brings but also by the datum itself. As a result, the ILoNDF model can continually acquire new knowledge about the occurrence frequencies of the data and their variables, which makes it less sensitive to noise. Moreover, since it operates on-line without repeated training, this model meets the most stringent requirements of data stream processing.
First, our work focuses on studying the behavior of the ILoNDF model in the general framework of one-class classification, using highly multidimensional and noisy data. This study allowed us to highlight the pure learning capacities of the ILoNDF model with respect to the methods proposed so far. Second, we focus more specifically on the fine adaptation of the model to the precise setting of information filtering. Our objective is to set up a user-oriented rather than system-oriented filtering strategy, notably by following two directions. The first direction concerns user modeling with the ILoNDF model. This modeling provides a new way of looking at the user profile in terms of specificity, exhaustivity and contradiction criteria. Among other things, this makes it possible to optimize the filtering threshold by taking into account the importance that the user might give to precision and recall. The second direction, complementary to the first, concerns the refinement of the ILoNDF model's functionality by giving it the ability to adapt to the drift of the user's need over time. Finally, we address the generalization of our previous work to the case where the data arriving in streams can be divided into multiple classes.
Allain, Guillaume. "Prévision et analyse du trafic routier par des méthodes statistiques." Toulouse 3, 2008. http://thesesups.ups-tlse.fr/351/.
The industrial partner of this work is Mediamobile/V-trafic, a company which processes and broadcasts live road-traffic information. The goal of our work is to enhance traffic information with forecasting and spatial extension. Our approach is sometimes inspired by physical modeling of traffic dynamics, but it mainly uses statistical methods in order to propose self-organising and modular models suitable for industrial constraints. In the first part of this work, we describe a method to forecast traffic speed within a time frame of a few minutes up to several hours. Our method is based on the assumption that traffic on a road network can be summarized by a few typical profiles. Those profiles are linked to the users' periodic behaviors. We therefore assume that the speed curves observed at each point of the network stem from a probabilistic mixture model. The following parts of our work present how the general method can be refined. Medium-term forecasting uses variables built from the calendar, and the mixture model still holds. Additionally, we use a functional regression model to forecast speed curves. We then introduce a local regression model in order to simulate short-term traffic dynamics. The kernel function is built from real speed observations and integrates some knowledge about traffic dynamics. The last part of our work focuses on the analysis of speed data from in-traffic vehicles. These observations are gathered sporadically in time and along the road segment. The resulting data is completed and smoothed by local polynomial regression.
Zullo, Anthony. "Analyse de données fonctionnelles en télédétection hyperspectrale : application à l'étude des paysages agri-forestiers." Thesis, Toulouse 3, 2016. http://www.theses.fr/2016TOU30135/document.
In hyperspectral imaging, each pixel is associated with a spectrum derived from the reflectance observed at d measurement points (i.e., wavelengths). We often face a situation where the sample size n is relatively low compared to the number d of variables. This phenomenon, called the "curse of dimensionality", is well known in multivariate statistics. The more d increases with respect to n, the more the performance of standard statistical methodologies degrades. Reflectance spectra incorporate in their spectral dimension a continuum that gives them a functional nature. A hyperspectrum can be modeled by a univariate function of wavelength, and its representation produces a curve. The use of functional methods makes it possible to take into account functional aspects such as continuity and the order of spectral bands, and to overcome the strong correlations arising from the fineness of the discretization grid. The main aim of this thesis is to assess the relevance of the functional approach to statistical analysis in the field of hyperspectral remote sensing. We focused on the nonparametric functional regression model, including supervised classification. Firstly, the functional approach was compared with the multivariate methods usually involved in remote sensing. The functional approach outperforms multivariate methods in critical situations where one has a small training sample size combined with relatively homogeneous classes (that is to say, classes that are hard to discriminate). Secondly, an alternative to the functional approach for overcoming the curse of dimensionality was proposed using parsimonious models. This alternative makes it possible, through the selection of a few measurement points, to reduce the dimensionality of the problem while increasing the interpretability of the results. Finally, we were interested in the almost systematic situation where one has contaminated functional data. We proved that for a fixed sample size, the finer the discretization, the better the prediction.
In other words, the larger d is compared to n, the more effective the functional statistical method is.
Frévent, Camille. "Contribution to spatial statistics for high-dimensional and survival data." Electronic Thesis or Diss., Université de Lille (2022-....), 2022. http://www.theses.fr/2022ULILS032.
In this thesis, we are interested in statistical spatial learning for high-dimensional and survival data. The objective is to develop unsupervised cluster detection methods by means of spatial scan statistics, in the contexts of functional data analysis on the one hand and survival data analysis on the other. In the first two chapters, we consider univariate and multivariate functional data measured spatially in a geographical area. We propose both parametric and nonparametric spatial scan statistics in this framework. These univariate and multivariate functional approaches avoid the loss of information incurred, respectively, by a univariate or a multivariate method applied to the average of the observations over the study period. We study the performance of the new methods in simulation studies before applying them to real economic and environmental data. We are also interested in the spatial cluster detection of survival data. Although spatial scan statistics approaches already exist in this framework in the literature, they do not take into account a potential correlation of survival times between individuals of the same spatial unit. Moreover, the spatial nature of the data implies a potential dependence between the spatial units, which should be taken into account. The originality of our proposed method is to introduce a spatial scan statistic based on a Cox model with a spatial frailty, which makes it possible to take into account both the potential correlation between the survival times of individuals of the same spatial unit and the potential dependence between the spatial units. We compare the performance of this new approach with the existing methods and apply them to real data corresponding to the survival times of elderly people with end-stage kidney failure in northern France.
Finally, we propose a number of perspectives for our work, both as a direct extension of this thesis in the framework of spatial scan statistics for high-dimensional and survival data, and in the broader contexts of unsupervised spatial analysis (spatial clustering for high-dimensional data (tensors)) and supervised spatial learning (regression).
Nait-Chabane, Ahmed. "Segmentation invariante en rasance des images sonar latéral par une approche neuronale compétitive." Phd thesis, Université de Bretagne occidentale - Brest, 2013. http://tel.archives-ouvertes.fr/tel-00968199.
Blanchard, Frédéric. "Visualisation et classification de données multidimensionnelles : Application aux images multicomposantes." Reims, 2005. http://theses.univ-reims.fr/exl-doc/GED00000287.pdf.
The analysis of multicomponent images is a crucial problem, and visualization and clustering are two relevant questions within it. We decided to work in the more general framework of data analysis to answer these questions. The preliminary step of this work describes the problems induced by dimensionality and studies the current dimensionality reduction methods. The visualization problem is then considered and a contribution is presented. We propose a new method of visualization through a color image that provides an immediate and synthetic image of the data. Applications are presented. The second contribution lies upstream of the clustering procedure strictly speaking. We establish a new kind of data representation by using rank transformation, fuzziness and aggregation procedures. Its use improves the clustering procedures by dealing with clusters of dissimilar density or varying sizes and by making them more robust. This work presents two important contributions to the field of data analysis applied to multicomponent images. The variety of the tools involved (originally from decision theory, uncertainty management, data mining or image processing) makes the presented methods usable in many diverse areas as well as in multicomponent image analysis.
Happillon, Teddy. "Aide au diagnostic de cancers cutanés et de la leucémie lymphoïde chronique par microspectroscopies vibrationnelles couplées à des analyses numériques multivariées." Thesis, Reims, 2013. http://www.theses.fr/2013REIMP204/document.
Vibrational spectroscopy is a technology able to record a large amount of molecular information from the samples studied. Coupled with chemometrics and classification methods, vibrational spectroscopy is an efficient tool for identifying sample structures and substructures. When applied to the biomedical field, this tool shows a high potential for disease diagnosis. The work presented in this thesis was carried out in this context. In a first study, dealing with algorithmic development, an automatic and unsupervised classification algorithm (based on Fuzzy C-Means), developed by our laboratory to assist skin cancer diagnosis using IR spectroscopy, was improved in order to i) reduce the computational time needed to perform clustering, ii) increase the quality of the results obtained on infrared data, and iii) extend its application fields to simulated and real datasets commonly used in the literature. This tool was tested on 16 infrared spectral images of skin cancers (BCC, SCC, Bowen's disease and melanoma), and on 49 real and simulated datasets. The results showed the ability of this new algorithm to estimate realistic data partitions regardless of the dataset considered. The second study of this work aimed at developing an independent chemometric tool to assist chronic lymphocytic leukemia diagnosis by Raman spectroscopy. In this second study, different numerical preprocessing steps and a supervised classification algorithm, Support Vector Machines, were applied to data recorded on blood cells from 27 healthy persons and 49 patients with chronic lymphocytic leukemia. The classification results showed a sensitivity of 80% and a specificity of 100% in the disease diagnosis.
Kurtz, Camille. "Une approche collaborative segmentation - classification pour l'analyse descendante d'images multirésolutions." Phd thesis, Université de Strasbourg, 2012. http://tel.archives-ouvertes.fr/tel-00735217.
Rigouste, Loïs. "Méthodes probabilistes pour l'analyse exploratoire de données textuelles." Phd thesis, Télécom ParisTech, 2006. http://pastel.archives-ouvertes.fr/pastel-00002424.
Pradet, Quentin. "Annotation en rôles sémantiques du français en domaine spécifique." Sorbonne Paris Cité, 2015. https://hal.inria.fr/tel-01182711/document.
In this Natural Language Processing Ph.D. thesis, we aim to perform semantic role labeling on French domain-specific texts. This task first disambiguates the sense of predicates in a given text and then annotates their child chunks with semantic roles such as Agent, Patient or Destination. The task helps many applications in domains where annotated corpora exist, but is difficult to use otherwise. We first evaluate on the FrameNet corpus an existing method based on VerbNet, which explains why the method is domain-independent. We show that substantial improvements can be obtained. We first use syntactic information by handling the passive voice. Next, we use semantic information by taking advantage of the selectional restrictions present in VerbNet. To apply this method to French, we first translate lexical resources. We first translate the WordNet lexical database. Next, we translate the VerbNet lexicon, which is organized semantically using syntactic information. We obtain its translation, Verb∋Net, by reusing two French verb lexicons (the Lexique-Grammaire and Les Verbes Français) and by manually modifying and reorganizing the resulting lexicon. Finally, once those building blocks are in place, we evaluate the feasibility of semantic role labeling of French and English in three specific domains. We study the pros and cons of using VerbNet and Verb∋Net to annotate those domains before explaining our future work.
Carel, Elodie. "Segmentation de documents administratifs en couches couleur." Thesis, La Rochelle, 2015. http://www.theses.fr/2015LAROS014/document.
Industrial companies receive huge volumes of documents every day. Through automation, traceability, the feeding of information systems, and reduced costs and processing times, dematerialization has a clear economic impact. In order to respect industrial constraints, the traditional digitization process simplifies the images by performing a background/foreground separation. However, this binarization can lead to segmentation and recognition errors. With improvements in technology, the document analysis community has shown a growing interest in integrating color information into the process to enhance its performance. In order to work within the scope defined by our industrial partner in the digitization flow, an unsupervised segmentation approach was chosen. Our goal is to be able to cope with document images, even when they are encountered for the first time, regardless of their content, their structure, and their color properties. To this end, the first issue of this project was to identify a reasonable number of main colors observable in an image. Then, we aim to group pixels having both close color properties and a logical or semantic unity into consistent color layers. Provided as a set of binary images, these layers may be reinjected into the digitization chain as an alternative to conventional binarization. Moreover, they also provide extra information about colors which could be exploited for segmentation, element spotting, or as a descriptor. We have therefore proposed a spatio-colorimetric approach which yields a set of perceptually meaningful local regions, known as superpixels. Their size is adapted to the content of the document images. These regions are then merged into global color layers by means of a multiresolution analysis.
Eke, Samuel. "Stratégie d'évaluation de l'état des transformateurs : esquisse de solutions pour la gestion intégrée des transformateurs vieillissants." Thesis, Lyon, 2018. http://www.theses.fr/2018LYSEC013/document.
This PhD thesis deals with a method for assessing the state of oil-filled power transformers. It brings a new approach by implementing classification methods and data mining dedicated to transformer maintenance. It proposes a strategy based on two new oil health indicators built from an Adaptive Neuro-Fuzzy Inference System (ANFIS). Two classifiers were built on a labeled learning database. The Naive Bayes classifier was retained for the detection of faults from gases dissolved in oil. A simple and efficient flowchart for evaluating the condition of transformers is proposed. It allows a quick analysis of the parameters resulting from physicochemical analyses of oil and dissolved gases. Unsupervised classification techniques, namely k-means and fuzzy C-means, made it possible to reconstruct the operating periods of a transformer, including some particular faults. It has also been demonstrated how these methods can be used as a tool to support the maintenance of a group of transformers from available oil analysis data.
El, Golli Aïcha. "Extraction de données symboliques et cartes topologiques: application aux données ayant une structure complexe." Phd thesis, Université Paris Dauphine - Paris IX, 2004. http://tel.archives-ouvertes.fr/tel-00178900.
Puigt, Mathieu. "Méthodes de séparation aveugle de sources fondées sur des transformées temps-fréquence. Application à des signaux de parole." Phd thesis, Université Paul Sabatier - Toulouse III, 2007. http://tel.archives-ouvertes.fr/tel-00270811.
We first studied and improved methods previously proposed by the team, based on variance or correlation criteria, for linear instantaneous mixtures. They deliver excellent performance on speech signals and can also separate spectra from astrophysical data. However, the nature of the mixtures they can handle limits their scope of application.
We therefore extended these approaches to more realistic mixtures. The first extensions consider mixtures of attenuated and time-delayed sources, which corresponds physically to mixtures in an anechoic chamber. They require much weaker sparsity assumptions than some approaches in the literature, while handling the same type of mixtures. We studied the contribution of unsupervised classification methods to our approaches and obtained good performance on mixtures of speech signals.
Finally, a theoretical extension to general convolutive mixtures is described, but it requires strong sparsity assumptions and the resolution of the indeterminacies inherent to frequency-domain methods.
Wynen, Daan. "Une représentation archétypale de style artistique : résumer et manipuler des styles artistiques d'une façon interprétable." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM066.
In this thesis we study the representations used to describe and manipulate the artistic style of visual arts. In the neural style transfer literature and related strains of research, different representations have been proposed, but in recent years by far the dominant representations of artistic style in the computer vision community have been those learned by deep neural networks trained on natural images. We build on these representations with the dual goal of summarizing the artistic styles present in large collections of digitized artworks and of manipulating the styles of images both natural and artistic. To this end, we propose a concise and intuitive representation based on archetypal analysis, a classic unsupervised learning method with properties that make it especially suitable for the task. We demonstrate how this archetypal representation of style can be used to discover and describe, in an interpretable way, which styles are present in a large collection. This enables the exploration of styles present in a collection from different angles; different ways of visualizing the information allow for different questions to be asked. These can be about a style that was identified across artworks, about the style of a particular artwork, or more broadly about how the styles that were identified relate to one another. We apply our analysis to a collection of artworks obtained from WikiArt, an online collection effort of visual arts driven by volunteers. This dataset also includes metadata such as artist identities, genre, and style of the artworks. We use this metadata for further analysis of the archetypal style representation along biographic lines of artists and with an eye on the relationships within groups of artists.
Grissa, Dhouha. "Etude comportementale des mesures d'intérêt d'extraction de connaissances." Phd thesis, Université Blaise Pascal - Clermont-Ferrand II, 2013. http://tel.archives-ouvertes.fr/tel-01023975.
Sharma, Avinash. "Représentation et enregistrement de formes visuelles 3D à l'aide de Laplacien graphe et noyau de la chaleur." Phd thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00860533.
Touleimat, Nizar. "Méthodologie d'extraction et d'analyse de réseaux de régulation de gènes : analyse de la réponse transcriptionnelle à l'irradiation chez S. cerevisiæ." Phd thesis, Université d'Evry-Val d'Essonne, 2008. http://tel.archives-ouvertes.fr/tel-00877095.
Puigt, Matthieu. "Méthodes de séparation aveugle de sources fondées sur des transformées temps-fréquence : application à des signaux de parole." Toulouse 3, 2007. http://thesesups.ups-tlse.fr/217/.
Several time-frequency (TF) blind source separation (BSS) methods are proposed in this thesis. At the output of the systems used, the contribution of each source is estimated using only the mixed signals. All the methods proposed in this manuscript find tiny TF zones where only one source is active and estimate the mixing parameters in these zones. These approaches are particularly well suited to non-stationary sources (speech, music). We first studied and improved linear instantaneous methods based on variance or correlation criteria that had previously been proposed by our team. They yield excellent performance for speech signals and can also separate spectra from astrophysical data. However, the nature of the mixtures that they can process limits their fields of application. We have extended these approaches to more realistic mixtures. The first extensions consider attenuated and delayed mixtures of sources, which correspond to mixtures in an anechoic chamber. They require less restrictive sparsity assumptions than some approaches previously proposed in the literature, while addressing the same type of mixtures. We have studied the contribution of clustering techniques to our approaches and have achieved good performance for mixtures of speech signals. Lastly, a theoretical extension of these methods to general convolutive mixtures is described. It needs strong sparsity hypotheses, and the classical indeterminacies of frequency-domain BSS methods have to be solved.
Corneli, Marco. "Dynamic stochastic block models, clustering and segmentation in dynamic graphs." Thesis, Paris 1, 2017. http://www.theses.fr/2017PA01E012/document.
This thesis focuses on the statistical analysis of dynamic graphs, defined in either discrete or continuous time. We introduce a new extension of the stochastic block model (SBM) for dynamic graphs. The proposed approach, called dSBM, adopts non-homogeneous Poisson processes to model the interaction times between pairs of nodes in dynamic graphs, either in discrete or continuous time. The intensity functions of the processes only depend on the node clusters, in a block modelling perspective. Moreover, all the intensity functions share some regularity properties on hidden time intervals that need to be estimated. A recent estimation algorithm for SBM, based on the greedy maximization of an exact criterion (exact ICL), is adopted for inference and model selection in dSBM. Moreover, an exact algorithm for change point detection in time series, the "pruned exact linear time" (PELT) method, is extended to deal with dynamic graph data modelled via dSBM. The approach we propose can be used for change point analysis in graph data. Finally, a further extension of dSBM is developed to analyse dynamic networks with textual edges (like social networks, for instance). In this context, the graph edges are associated with documents exchanged between the corresponding vertices. The textual content of the documents can provide additional information about the topological structure of the dynamic graph. The new model we propose is called the "dynamic stochastic topic block model" (dSTBM). Graphs are mathematical structures well suited to modelling interactions between objects or actors of interest. Several real networks such as communication networks, financial transaction networks, mobile telephone networks and social networks (Facebook, Linkedin, etc.) can be modelled via graphs. When observing a network, the time variable comes into play in two different ways: we can study the dates at which the interactions occur and/or the interaction time spans. 
This thesis only focuses on the first time dimension and each interaction is assumed to be instantaneous, for simplicity. Hence, the network evolution is given by the interaction time dates only. In this framework, graphs can be used in two different ways to model networks. Discrete time […] Continuous time […]. In this thesis both these perspectives are adopted, alternatively. We consider new unsupervised methods to cluster the vertices of a graph into groups of homogeneous connection profiles. In this manuscript, the node groups are assumed to be time invariant to avoid possible identifiability issues. Moreover, the approaches that we propose aim to detect structural changes in the way the node clusters interact with each other. The building block of this thesis is the stochastic block model (SBM), a probabilistic approach initially used in social sciences. The standard SBM assumes that the nodes of a graph belong to hidden (disjoint) clusters and that the probability of observing an edge between two nodes only depends on their clusters. Since no further assumption is made on the connection probabilities, SBM is a very flexible model able to detect different network topologies (hubs, stars, communities, etc.)
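As an illustration of the standard SBM described above (not of the dSBM or dSTBM models proposed in the thesis), the following minimal Python sketch samples an undirected graph in which the probability of an edge depends only on the clusters of its two endpoints. The function name `sample_sbm`, the absence of self-loops, and the toy parameters are assumptions made for this example.

```python
import random


def sample_sbm(labels, block_probs, seed=0):
    """Sample the edges of an undirected graph from a standard SBM.

    labels[i] is the (normally hidden) cluster of node i, and
    block_probs[q][r] is the probability of an edge between a node of
    cluster q and a node of cluster r. Self-loops are excluded (assumption).
    """
    rng = random.Random(seed)
    n = len(labels)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # The edge probability depends only on the two clusters.
            p = block_probs[labels[i]][labels[j]]
            if rng.random() < p:
                edges.append((i, j))
    return edges


# Toy "community" topology: certain edges inside a cluster, none across.
labels = [0, 0, 0, 1, 1, 1]
block_probs = [[1.0, 0.0],
               [0.0, 1.0]]
edges = sample_sbm(labels, block_probs)
print(edges)
```

Changing `block_probs` (for instance, a high off-diagonal and low diagonal probability) would give the other topologies mentioned in the abstract, such as bipartite-like or star-like structures.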
Pitou, Cynthia. "Extraction d'informations textuelles au sein de documents numérisés : cas des factures." Thesis, La Réunion, 2017. http://www.theses.fr/2017LARE0015.
Document processing is the transformation of human-understandable data into a format that a computer system can understand. Document analysis and understanding are the two phases of document processing. Considering a document containing lines, words and graphical objects such as logos, the analysis of such a document consists in extracting and isolating the words, lines and objects and then grouping them into blocks. The document understanding subsystem builds relationships (to the right, left, above, below) between the blocks. A document processing system must be able to: locate textual information, identify whether that information is relevant relative to other information contained in the document, and extract that information in a format that a computer system can understand. For the realization of such a system, major difficulties arise from the variability of document characteristics, such as the type (invoice, form, quotation, report, etc.), the layout (font, style, disposition), the language, the typography and the quality of scanning. This work is concerned with scanned documents, also known as document images. We are particularly interested in locating textual information in invoice images. Invoices are widely used and well-regulated documents, but they are not unified. They contain mandatory information (invoice number, unique identifier of the issuing company, VAT amount, net amount, etc.) which, depending on the issuer, can appear at various locations in the document. The present work is in the framework of region-based textual information localization and extraction. First, we present a region-based method guided by quadtree decomposition. The principle of the method is to decompose the document images into four equal regions, each region into four new regions, and so on. Then, with a free optical character recognition (OCR) engine, we try to extract precise textual information in each region. 
A region containing a number of expected pieces of textual information is not decomposed further. Our method accurately determines, in document images, the regions containing the textual information that one wants to locate and retrieve quickly and efficiently. In another approach, we propose a textual information extraction model consisting of a set of prototype regions along with pathways for browsing through these prototype regions. The life cycle of the model comprises five steps:
- Produce synthetic invoice data from real-world invoice images containing the textual information of interest, along with their spatial positions.
- Partition the produced data.
- Derive the prototype regions from the obtained partition clusters.
- Derive pathways for browsing through the prototype regions, from the concept lattice of a suitably defined formal context.
- Update incrementally the set of prototype regions and the set of pathways when additional data has to be added.
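The quadtree-guided decomposition described in this abstract can be sketched roughly as follows. This is only a hedged illustration, not the author's implementation: the names `split4` and `quadtree_regions`, the `min_size` stopping rule, and the toy `is_resolved` predicate (which stands in for the OCR step) are all assumptions.

```python
def split4(region):
    """Split an (x, y, width, height) region into four quadrants."""
    x, y, w, h = region
    hw, hh = w // 2, h // 2
    return [
        (x, y, hw, hh),                    # top-left
        (x + hw, y, w - hw, hh),           # top-right
        (x, y + hh, hw, h - hh),           # bottom-left
        (x + hw, y + hh, w - hw, h - hh),  # bottom-right
    ]


def quadtree_regions(region, is_resolved, min_size=32):
    """Return the regions at which the decomposition stops successfully.

    is_resolved(region) is a hypothetical predicate standing in for
    "the OCR engine found the expected information in this region".
    Regions no larger than min_size are abandoned instead of split.
    """
    if is_resolved(region):
        return [region]  # not decomposed further
    _, _, w, h = region
    if w <= min_size or h <= min_size:
        return []
    found = []
    for sub in split4(region):
        found.extend(quadtree_regions(sub, is_resolved, min_size))
    return found


# Toy stand-in for OCR: two known field positions on a 256 x 256 page;
# a region counts as resolved when it contains exactly one of them.
fields = [(10, 10), (200, 150)]


def is_resolved(region):
    x, y, w, h = region
    inside = [p for p in fields if x <= p[0] < x + w and y <= p[1] < y + h]
    return len(inside) == 1


regions = quadtree_regions((0, 0, 256, 256), is_resolved)
print(regions)
```

In a real pipeline the predicate would call an OCR engine on the cropped region and compare its output with the expected fields; that part is deliberately left out here.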
Doan, Tien Tai. "Réalisation d’une aide au diagnostic en orthodontie par apprentissage profond." Electronic Thesis or Diss., université Paris-Saclay, 2021. http://www.theses.fr/2021UPASG033.
Accurate processing and diagnosis of dental images is an essential factor determining the success of orthodontic treatment. Many image processing methods have been proposed to address this problem. Those studies mainly work on small datasets of radiographs under laboratory conditions and are not highly applicable as complete products or services. In this thesis, we train deep learning models to diagnose dental problems such as gingivitis and crowded teeth using images taken with mobile phones. We study the feature layers of these models to find the strengths and limitations of each method. Besides training deep learning models, we also embed each of them in a pipeline, including preprocessing and post-processing steps, to create a complete product. To address the lack of training data, we studied a variety of methods for data augmentation, especially domain adaptation methods using image-to-image translation models, both supervised and unsupervised, and obtained promising results. Image translation networks are also used to simplify patients' choice of orthodontic appliances by showing them what their teeth could look like during treatment. The generated images are realistic and high-resolution. Researching further into unsupervised image translation neural networks, we propose an unsupervised image-to-image translation model which can manipulate features of objects in the image without requiring additional annotation. Our model outperforms state-of-the-art techniques on multiple image translation applications and is also extended to few-shot learning problems.
Sublemontier, Jacques-Henri. "Classification non supervisée : de la multiplicité des données à la multiplicité des analyses." Phd thesis, Université d'Orléans, 2012. http://tel.archives-ouvertes.fr/tel-00801555.
Jouini, Mohamed Soufiane. "Caractérisation des réservoirs basée sur des textures des images scanners de carottes." Thesis, Bordeaux 1, 2009. http://www.theses.fr/2009BOR13769/document.
Cores extracted during well drilling are essential data for reservoir characterization. A medical scanner is used for their acquisition. This provides high-resolution images, improving the capacity for interpretation. The main goal of the thesis is to establish links between these images and petrophysical data. Parametric texture modelling can be used to achieve this goal and should provide a reliable set of descriptors. A possible solution is to focus on parametric methods allowing synthesis. Even though this method is not proven mathematically, it provides high confidence in the set of descriptors and allows interpretation through synthetic textures. In this thesis, methods and algorithms were developed to achieve the following goals: 1. Segment the main representative texture zones on cores. This is achieved automatically by learning and classifying textures based on a parametric model. 2. Find links between scanner images and petrophysical parameters. This is achieved through calibrating and predicting petrophysical data with images (supervised learning process).
Al-Najdi, Atheer. "Une approche basée sur les motifs fermés pour résoudre le problème de clustering par consensus." Thesis, Université Côte d'Azur (ComUE), 2016. http://www.theses.fr/2016AZUR4111/document.
Clustering is the process of partitioning a dataset into groups, so that the instances in the same group are more similar to each other than to instances in any other group. Many clustering algorithms have been proposed, but none of them has proved to provide a good-quality partition in all situations. Consensus clustering aims to enhance the clustering process by combining different partitions obtained from different algorithms to yield a better-quality consensus solution. In this work, a new consensus clustering method, called MultiCons, is proposed. It uses the frequent closed itemset mining technique in order to discover the similarities between the different base clustering solutions. The identified similarities are presented in the form of clustering patterns, each of which defines the agreement between a set of base clusters in grouping a set of instances. By dividing these patterns into groups based on the number of base clusters that define the pattern, MultiCons generates a consensus solution from each group, resulting in multiple consensus candidates. These different solutions are presented in a tree-like structure, called ConsTree, that facilitates understanding the process of building the multiple consensuses, and also the relationships between the data instances and their structuring in the data space. Five consensus functions are proposed in this work in order to build a consensus solution from the clustering patterns. Approach 1 is to just merge any intersecting clustering patterns. Approach 2 can either merge or split intersecting patterns based on a proposed measure, called intersection ratio
Kalinicheva, Ekaterina. "Unsupervised satellite image time series analysis using deep learning techniques." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS335.
This thesis presents a set of unsupervised algorithms for satellite image time series (SITS) analysis. Our methods exploit machine learning algorithms and, in particular, neural networks to detect different spatio-temporal entities and their possible changes over time. In this thesis, we aim to identify three different types of temporal behavior: no-change areas, seasonal changes (vegetation and other phenomena that have seasonal recurrence) and non-trivial changes (permanent changes such as construction or demolition, crop rotation, etc.). Therefore, we propose two frameworks: one for detection and clustering of non-trivial changes and another for clustering of "stable" areas (seasonal changes and no-change areas). The first framework is composed of two steps, which are bi-temporal change detection and the interpretation of detected changes in a multi-temporal context with graph-based approaches. The bi-temporal change detection is performed for each pair of consecutive images of the SITS and is based on feature translation with autoencoders (AEs). At the next step, the changes from different timestamps that belong to the same geographic area form evolution change graphs. The graphs are then clustered using a recurrent neural network AE model to identify different types of change behavior. For the second framework, we propose an approach for object-based SITS clustering. First, we encode the SITS into a single image with a multi-view 3D convolutional AE. Second, we perform a two-step SITS segmentation using the encoded SITS and the original images. Finally, the obtained segments are clustered using their encoded descriptors.
Mure, Simon. "Classification non supervisée de données spatio-temporelles multidimensionnelles : Applications à l’imagerie." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSEI130/document.
Due to the dramatic increase of longitudinal acquisitions in the past decades, such as video sequences, global positioning system (GPS) tracking or medical follow-up, many applications for time-series data mining have been developed. Thus, unsupervised time-series data mining has become highly relevant, with the aim of automatically detecting and identifying similar temporal patterns between time-series. In this work, we propose a new spatio-temporal filtering scheme based on the mean-shift procedure, a state-of-the-art approach in the field of image processing, which clusters multivariate spatio-temporal data. We also propose a hierarchical time-series clustering algorithm based on the dynamic time warping measure that identifies similar but asynchronous temporal patterns. Our choices have been motivated by the need to analyse magnetic resonance images acquired on people affected by multiple sclerosis. The genetic and environmental factors triggering and governing the evolution of the disease, as well as the occurrence and evolution of individual lesions, are still mostly unknown and under intense investigation. Therefore, there is a strong need to develop new methods allowing automatic extraction and quantification of lesion characteristics. This has motivated our work on time-series clustering methods, which are not yet widely used in image processing and make it possible to process image sequences without prior knowledge of the final results.
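To make the notion of "similar but asynchronous temporal patterns" concrete, here is a minimal Python sketch of the textbook dynamic time warping (DTW) distance between two scalar sequences. It illustrates only the DTW measure, not the hierarchical clustering algorithm of the thesis; the function name `dtw_distance` and the absolute-difference local cost are assumptions.

```python
def dtw_distance(a, b):
    """Textbook DTW distance between two sequences of numbers.

    cost[i][j] holds the minimal cumulative cost of aligning the first
    i samples of a with the first j samples of b.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])      # local cost (assumption)
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats a sample
                cost[i][j - 1],      # b advances, a repeats a sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The same bump, with a delayed onset in the second series.
early = [0, 1, 2, 1, 0]
late = [0, 0, 1, 2, 1, 0]
print(dtw_distance(early, late))
```

Because the warping path may repeat samples, the delayed copy is matched at zero cost, whereas a point-by-point comparison would penalise the shift. A pairwise matrix of such distances could then be handed to any standard hierarchical clustering routine.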
Knefati, Muhammad Anas. "Estimation non-paramétrique du quantile conditionnel et apprentissage semi-paramétrique : applications en assurance et actuariat." Thesis, Poitiers, 2015. http://www.theses.fr/2015POIT2280/document.
The thesis consists of two parts: one part is about the estimation of conditional quantiles and the other is about supervised learning. The "conditional quantile estimation" part is organized into 3 chapters. Chapter 1 is devoted to an introduction to local linear regression and then presents the methods most used in the literature to estimate the smoothing parameter. Chapter 2 addresses nonparametric methods for estimating conditional quantiles and then gives numerical experiments on simulated and real data. Chapter 3 is devoted to a new conditional quantile estimator that we propose. This estimator is based on the use of kernels that are asymmetrical with respect to x. We show, under some hypotheses, that this new estimator is more efficient than the other estimators already in use. The "supervised learning" part also has 3 chapters: Chapter 4 provides an introduction to statistical learning, recalling the basic concepts used in this part. Chapter 5 discusses the conventional methods of supervised classification. Chapter 6 is devoted to proposing a method for transferring a semiparametric model. The performance of this method is shown by numerical experiments on morphometric data and credit-scoring data.
Hajjar, Chantal. "Cartes auto-organisatrices pour la classification de données symboliques mixtes, de données de type intervalle et de données discrétisées." Thesis, Supélec, 2014. http://www.theses.fr/2014SUPL0066/document.
This thesis concerns the clustering of symbolic data with bio-inspired geometric methods, more specifically with Self-Organizing Maps. We set up several learning algorithms for the self-organizing maps in order to cluster mixed-feature symbolic data as well as interval-valued data and binned data. Several simulated and real symbolic data sets, including two sets built as part of this thesis, are used to test the proposed methods. In addition, we propose a self-organizing map for binned data in order to accelerate the learning of standard maps, and we use the proposed method for image segmentation
Ternynck, Camille. "Contributions à la modélisation de données spatiales et fonctionnelles : applications." Thesis, Lille 3, 2014. http://www.theses.fr/2014LIL30062/document.
In this dissertation, we are interested in nonparametric modeling of spatial and/or functional data, more specifically based on the kernel method. Generally, the samples we have considered for establishing asymptotic properties of the proposed estimators consist of dependent variables. The specificity of the studied methods lies in the fact that the estimators take into account the dependence structure of the considered data. In the first part, we study spatially dependent real variables. We propose a new kernel approach to estimating the spatial probability density, mode and regression functions. The distinctive feature of this approach is that it takes into account both the proximity between observations and that between sites. We study the asymptotic behavior of the proposed estimates as well as their applications to simulated and real data. In the second part, we are interested in modeling data valued in a space of infinite dimension, so-called "functional data". As a first step, we adapt the nonparametric regression model introduced in the first part to the framework of spatially dependent functional data. We obtain convergence results as well as numerical results. Then we study a time series regression model in which the explanatory variables are functional and the innovation process is autoregressive. We propose a procedure which allows us to take into account the information contained in the error process. After showing the asymptotic behavior of the proposed kernel estimate, we study its performance on simulated and real data. The third part is devoted to applications. First of all, we present unsupervised classification results for simulated and real (multivariate) spatial data. The classification method considered is based on the estimation of the spatial mode, obtained from the spatial density function introduced in the first part of this thesis. 
Then, we apply this mode-based classification method, as well as other unsupervised classification methods from the literature, to hydrological data of a functional nature. Lastly, this classification of hydrological data has led us to apply change point detection tools to these functional data.
Belghiti, Moulay Tayeb. "Modélisation et techniques d'optimisation en bio-informatique et fouille de données." Thesis, Rouen, INSA, 2008. http://www.theses.fr/2008ISAM0002.
This Ph.D. thesis addresses two types of problems: clustering and multiple sequence alignment. Our objective is to solve these global problems efficiently and to test the DC programming approach and DCA on real datasets. The thesis is divided into three parts. The first part is devoted to new approaches to nonconvex and global optimization. We present an in-depth study of the tools used in this thesis, namely DC programming and the DC algorithm (DCA). In the second part, we model the clustering problem as three nonconvex subproblems. The first two subproblems differ in the choice of the norm used (clustering via the 1-norm and the 2-norm). The third subproblem uses the kernel method (clustering via the kernel method). The third part is devoted to bioinformatics, focusing on the modeling and solution of two subproblems: multiple sequence alignment and RNA sequence alignment. All the chapters except the first end with numerical tests.
Chaari, Anis. "Nouvelle approche d'identification dans les bases de données biométriques basée sur une classification non supervisée." Phd thesis, Université d'Evry-Val d'Essonne, 2009. http://tel.archives-ouvertes.fr/tel-00549395.
Gorin, Arseniy. "Structuration du modèle acoustique pour améliorer les performance de reconnaissance automatique de la parole." Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0161/document.
This thesis focuses on acoustic model structuring for improving HMM-based automatic speech recognition. The structuring relies on unsupervised clustering of the speech utterances of the training data in order to handle speaker and channel variability. The idea is to split the data into acoustically similar classes. In the conventional multi-modeling (or class-based) approach, separate class-dependent models are built via adaptation of a speaker-independent model. When the number of classes increases, less data becomes available for the estimation of the class-based models, and the parameters are less reliable. One way to handle such a problem is to modify the classification criterion applied to the training data, allowing a given utterance to belong to more than one class. This is obtained by relaxing the classification decision through a soft margin. This is investigated in the first part of the thesis. In the main part of the thesis, a novel approach is proposed that uses the clustered data more efficiently in a class-structured GMM. Instead of adapting all HMM-GMM parameters separately for each class of data, the class information is explicitly introduced into the GMM structure by associating a given density component with a given class. To efficiently exploit such a structured HMM-GMM, two different approaches are proposed. The first approach combines the class-structured GMM with class-dependent mixture weights. In this model the Gaussian components are shared across speaker classes, but they are class-structured, and the mixture weights are class-dependent. For decoding an utterance, the set of mixture weights is selected according to the estimated class. In the second approach, the mixture weights are replaced by density component transition probabilities. The approaches proposed in the thesis are analyzed and evaluated on various speech data, which cover different types of variability sources (age, gender, accent and noise).
Berard, Caroline. "Modèles à variables latentes pour des données issues de tiling arrays : Applications aux expériences de ChIP-chip et de transcriptome." Thesis, Paris, AgroParisTech, 2011. http://www.theses.fr/2011AGPT0067.
Tiling arrays make possible a large-scale exploration of the genome at high resolution. The biological questions usually addressed are either gene expression or the detection of transcribed regions, which can be investigated via transcriptomic experiments, and also the regulation of gene expression, thanks to ChIP-chip experiments. In order to analyse ChIP-chip and transcriptomic data, we propose latent variable models, especially Hidden Markov Models, which are part of unsupervised classification methods. The biological features of the tiling array signal, such as the spatial dependence between observations along the genome and the structural annotation, are integrated in the model. Moreover, the models are adapted to the biological question at hand, and a model is proposed for each type of experiment. We propose a mixture of regressions for the comparison of two samples when one sample can be considered as a reference sample (ChIP-chip), and a two-dimensional Gaussian model with constraints on the variance parameter when the two samples play symmetrical roles (transcriptome). Finally, semi-parametric modeling is considered, allowing more flexible emission distributions. With the objective of classification, we propose a false-positive control in the case of a two-cluster classification and for independent observations. Then, we focus on the classification of a set of observations forming a region of interest such as a gene. The different models are illustrated on real ChIP-chip and transcriptomic datasets coming from a NimbleGen tiling array covering the entire genome of Arabidopsis thaliana.
Vinot, Romain. "Classification automatique de textes dans des catégories non thématiques." Phd thesis, Télécom ParisTech, 2004. http://pastel.archives-ouvertes.fr/pastel-00000812.
Gorin, Arseniy. "Structuration du modèle acoustique pour améliorer les performance de reconnaissance automatique de la parole." Electronic Thesis or Diss., Université de Lorraine, 2014. http://www.theses.fr/2014LORR0161.
This thesis focuses on acoustic model structuring for improving HMM-based automatic speech recognition. The structuring relies on unsupervised clustering of the speech utterances in the training data in order to handle speaker and channel variability. The idea is to split the data into acoustically similar classes. In the conventional multi-modeling (or class-based) approach, separate class-dependent models are built via adaptation of a speaker-independent model. When the number of classes increases, less data becomes available for the estimation of the class-based models, and the parameters are less reliable. One way to handle this problem is to modify the classification criterion applied to the training data, allowing a given utterance to belong to more than one class. This is obtained by relaxing the classification decision through a soft margin, and is investigated in the first part of the thesis. In the main part of the thesis, a novel approach is proposed that uses the clustered data more efficiently in a class-structured GMM. Instead of adapting all HMM-GMM parameters separately for each class of data, the class information is explicitly introduced into the GMM structure by associating a given density component with a given class. To efficiently exploit such a structured HMM-GMM, two different approaches are proposed. The first approach combines a class-structured GMM with class-dependent mixture weights. In this model the Gaussian components are shared across speaker classes, but they are class-structured, and the mixture weights are class-dependent. For decoding an utterance, the set of mixture weights is selected according to the estimated class. In the second approach, the mixture weights are replaced by density component transition probabilities. The approaches proposed in the thesis are analyzed and evaluated on various speech data, which cover different sources of variability (age, gender, accent and noise).