Dissertations / Theses on the topic 'Analyse supervisée'
Consult the top 50 dissertations / theses for your research on the topic 'Analyse supervisée.'
Debeir, Olivier. "Segmentation supervisée d'images." Doctoral thesis, Universite Libre de Bruxelles, 2001. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/211474.
Goubet, Étienne. "Contrôle non destructif par analyse supervisée d'images 3D ultrasonores." Cachan, Ecole normale supérieure, 1999. http://www.theses.fr/1999DENS0011.
Huck, Alexis. "Analyse non-supervisée d’images hyperspectrales : démixage linéaire et détection d’anomalies." Aix-Marseille 3, 2009. http://www.theses.fr/2009AIX30036.
This thesis focuses on two research fields in the unsupervised analysis of hyperspectral images (HSIs). Under the assumptions of the linear spectral mixing model, the formalism of Non-Negative Matrix Factorization (NMF) is investigated for unmixing purposes. We propose judicious spectral and spatial a priori knowledge to regularize the problem. In addition, we propose an estimator of the optimal step size for the projected gradient. Suitably regularized NMF is thus shown to be a relevant approach to unmixing HSIs. Then, the problem of anomaly detection is considered. We propose an algorithm for Anomalous Component Pursuit (ACP), based simultaneously on projection pursuit and on a probabilistic model with hypothesis testing. ACP detects anomalies with a constant false alarm rate and separates them into spectrally homogeneous classes.
Chombart, Anne. "Commande supervisée de systèmes hybrides." Grenoble INPG, 1997. http://www.theses.fr/1997INPG0170.
Faucheux, Cyrille. "Segmentation supervisée d'images texturées par régularisation de graphes." Thesis, Tours, 2013. http://www.theses.fr/2013TOUR4050/document.
In this thesis, we improve a recent image segmentation algorithm based on a graph regularization process. The goal of this method is to compute an indicator function that satisfies a regularity criterion and a fidelity criterion. Its particularity is to represent images with similarity graphs. This data structure allows relations to be established between similar pixels, leading to non-local processing of the data. To improve this approach, we combine it with another non-local one: texture features. Two solutions are developed, both based on Haralick features. In the first one, we propose a new fidelity term which is based on the work of Chan and Vese and is able to evaluate the homogeneity of texture features. In the second, we propose to replace the fidelity criterion with the output of a supervised classifier. Trained to recognize several textures, the classifier produces a better model of the problem by identifying the most relevant texture features. This method is also extended to multiclass segmentation problems. Both are applied to 2D and 3D textured images.
Dârlea, Georgiana-Lavinia. "Un système de classification supervisée à base de règles implicatives." Chambéry, 2010. http://www.theses.fr/2010CHAMS001.
This PhD thesis presents a series of research works in the field of supervised data classification, more precisely in the semi-automatic learning of fuzzy rule-based classifiers. The manuscript first presents an overview of the classification problem and of the main classification methods that have already been implemented and validated, in order to place the proposed method in the general context of the domain. Once the context is established, the actual research work is presented: the definition of a formal background for representing an elementary fuzzy rule-based classifier in a two-dimensional space, the description of a learning algorithm for these elementary classifiers on a given data set, and the design of a multi-dimensional classification system that handles multi-class problems by combining the elementary classifiers. Finally, the implementation and testing of all these functionalities are presented, along with the application of the resulting classifier to two real-world digital image problems: the analysis of the quality of industrial products using 3D tomographic images and the identification of regions of interest in radar satellite images.
Leblanc, Brice. "Analyse non supervisée de données issues de Systèmes de Transport Intelligent-Coopératif." Thesis, Reims, 2020. http://www.theses.fr/2020REIMS014.
This thesis takes place in the context of Vehicular Ad-hoc Networks (VANET), and more specifically of Cooperative Intelligent Transport Systems (C-ITS). These systems exchange information to enhance road safety. The purpose of this thesis is to introduce data analysis tools that can provide road operators with information on the usage and state of their infrastructures. This information may in turn help to improve road safety. We identify two cases to address: driving profile identification and road obstacle detection. To deal with these issues, we propose to use unsupervised learning approaches: clustering methods for driving profile identification, and concept drift detection for obstacle detection. This thesis introduces three main contributions: a methodology for transforming raw C-ITS data first into trajectories and then into a learning data set; the use of classical clustering methods and Points of Interest for driving profiles, with experiments on mobile device data and network log data; and the treatment of a crowd of vehicles providing network log data as data streams, used as input to concept drift detection algorithms to recognize road obstacles.
Fontaine, Michaël. "Segmentation non supervisée d'images couleur par analyse de la connexité des pixels." Lille 1, 2001. https://pepite-depot.univ-lille.fr/LIBRE/Th_Num/2001/50376-2001-305-306.pdf.
Conan-Guez, Brieuc. "Modélisation supervisée de données fonctionnelles par perceptron multi-couches." Phd thesis, Université Paris Dauphine - Paris IX, 2002. http://tel.archives-ouvertes.fr/tel-00178892.
Vandewalle, Vincent. "Estimation et sélection en classification semi-supervisée." Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2009. http://tel.archives-ouvertes.fr/tel-00447141.
Cutrona, Jérôme. "Analyse de forme des objets biologiques : représentation, classification et suivi temporel." Reims, 2003. http://www.theses.fr/2003REIMS018.
In biology, the relationship between shape, a major element in computer vision, and function has long been emphasized. This thesis proposes a processing pipeline leading to unsupervised shape classification, deformation tracking and supervised classification of whole populations of objects. We first propose a contribution to unsupervised segmentation based on a fuzzy classification method, and two semi-automatic methods founded on fuzzy connectedness and watersheds. Next, we study several shape descriptors, including primitives and anti-primitives, contour, silhouette and multi-scale curvature. After shape matching, the descriptors are submitted to statistical analysis to highlight the modes of variation within the samples. The resulting statistical model is the basis of the proposed applications.
Lecomte, Sébastien. "Classification partiellement supervisée par SVM : application à la détection d’événements en surveillance audio." Thesis, Troyes, 2013. http://www.theses.fr/2013TROY0031/document.
This thesis addresses partially supervised Support Vector Machines for novelty detection (One-Class SVM). These have been studied in order to design abnormal audio event detection for the supervision of public infrastructures, in particular public transportation systems. In this context, the null hypothesis ("normal" audio signals) is relatively well known (even though the corresponding signals can be notably non-stationary). Conversely, every "abnormal" signal should be detected and, if possible, clustered with similar signals. Thus, a reference system based on a single model of normal signals is presented, and we then propose to use several concurrent One-Class SVMs to cluster new data. Given the amount of data to process, special solvers have been studied. The proposed algorithms must run in real time, which is why we have also investigated algorithms with warm-start capabilities. Through the study of these algorithms, we have proposed a unified framework for One-Class and Binary SVMs, with and without bias. The proposed approach has been validated on a database of real signals. The whole process, applied to the monitoring of a subway station, was presented during the final review of the European Project VANAHEIM.
Rafi, Selwa. "Chaînes de Markov cachées et séparation non supervisée de sources." Thesis, Evry, Institut national des télécommunications, 2012. http://www.theses.fr/2012TELE0020/document.
The restoration problem is encountered in various domains, in particular in signal and image processing. It consists in retrieving original data from a set of observed ones. For multidimensional data, the problem can be solved using different approaches depending on the data structure, the transformation system and the noise. In this work, we first tackle the problem in the case of discrete data and a noisy model. In this context, the problem is similar to a segmentation problem. We exploit Pairwise and Triplet Markov chain models, which generalize Hidden Markov chain models. The interest of these models lies in the possibility of generalizing the computation of the posterior probability, which makes Bayesian segmentation possible. We consider these methods for two-dimensional signals and apply the algorithms to the restoration of old handwritten documents that have been scanned and are subject to the show-through effect. In the second part of this work, we consider the restoration problem as a blind source separation problem. The well-known Independent Component Analysis (ICA) method requires the assumption that the sources are statistically independent. In practice, this condition is not always satisfied. Consequently, we study an extension of the ICA model to the case where the sources are not necessarily independent. We introduce a latent process which controls the dependence and/or independence of the sources. The model we propose combines a linear instantaneous mixing model similar to that of ICA with a probabilistic model on the sources with hidden variables. In this context, we show how the usual independence assumption can be weakened to a conditional independence assumption using the technique of Iterative Conditional Estimation.
Ruiz, Dominguez Cinta. "Analyse automatique des troubles de contraction cardiaque en échocardiographie." Paris 11, 2005. http://www.theses.fr/2005PA112074.
Many methods have been developed for the automatic evaluation of left ventricular regional wall motion (normokinesia, hypokinesia, akinesia and dyskinesia), especially in echocardiography. A new parametric imaging method, based on the temporal intensity of pixels and called "parametric analysis of the main motion" (PAMM), was proposed. This method synthesises the information contained in a sequence of images into two parametric images interpretable by a clinician: a three-color amplitude image and a mean contraction time image. 602 segments of a database were scored from the interpretation of the PAMM images and compared to a consensual visual interpretation of the cine-loop sequences by two experienced readers. Absolute and relative concordances are 64% and 82%. Some segmental indices were estimated from the PAMM images. An automatic classification of the segments into two classes (normal and pathological segments) using these indices was performed. The diagnostic performance of the different indices was evaluated using ROC curve analysis. Then a four-class classification was performed using the optimal index. Absolute and relative concordances obtained by the four-class classification on a test database are 56% and 90%. The results could be improved if the location and the echogenicity of the segments were taken into account when estimating the indices.
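As an illustration of the ROC-based evaluation mentioned in this abstract, the area under the ROC curve of a single index can be computed by pairwise comparison. This is only a minimal sketch: the function name, the toy values, and the convention that a higher index value suggests a pathological segment are assumptions made for the example, not details taken from the thesis.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count for half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical index values for six segments (assumption: higher = more abnormal).
index_values = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
is_pathological = [1, 1, 1, 0, 0, 0]
auc = roc_auc(index_values, is_pathological)
```

An index with an AUC close to 1 separates the two classes well, while 0.5 corresponds to chance; comparing AUC values is one common way to pick the "optimal" index among several candidates.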
Ferrandiz, Sylvain. "Evaluation d'une mesure de similitude en classification supervisée : application à la préparation de données séquentielles." Phd thesis, Université de Caen, 2006. http://tel.archives-ouvertes.fr/tel-00123406.
[...] of the work is devoted to the construction and selection of descriptive variables. The univariate filter approach usually adopted requires a method for evaluating a variable. We consider the question of the supervised evaluation of a sequential variable. To solve this problem, we show that it suffices to solve a more general one: the supervised evaluation of a similarity measure. We propose such an evaluation method. To obtain it, we recast the problem as the search for an informative Voronoi partition. We propose a new criterion for the supervised evaluation of these partitions and a new optimized search heuristic. The criterion automatically guards against the risk of overfitting, and the heuristic quickly finds a good solution. In the end, the method performs a robust nonparametric estimation of the density of a categorical target variable conditionally on a similarity measure defined from a descriptive variable. The method has been tested on many data sets. It makes it possible to answer questions such as: which day of the week, or which time slot over the week, best discriminates the segment to which a household belongs, based on its fixed-line telephone usage? Which series of measurements best quantifies the appetite for a new service?
Gay, Dominique. "Calcul de motifs sous contraintes pour la classification supervisée." Phd thesis, Nouvelle Calédonie, 2009. http://portail-documentaire.univ-nc.nc/files/public/bu/theses_unc/TheseDominiqueGay2009.pdf.
Gay, Dominique. "Calcul de motifs sous contraintes pour la classification supervisée." Phd thesis, Université de Nouvelle Calédonie, 2009. http://tel.archives-ouvertes.fr/tel-00516706.
Chzhen, Evgenii. "Plug-in methods in classification." Thesis, Paris Est, 2019. http://www.theses.fr/2019PESC2027/document.
This manuscript studies several problems of constrained classification. In this framework, our goal is to construct an algorithm which performs as well as the best classifier that obeys some desired property. Plug-in type classifiers are well suited to achieve this goal. Interestingly, it is shown that in several setups these classifiers can leverage unlabeled data, that is, they are constructed in a semi-supervised manner. Chapter 2 describes two particular settings of binary classification: classification with the F-score and classification under equal opportunity. For both problems, semi-supervised procedures are proposed and their theoretical properties are established. In the case of the F-score, the proposed procedure is shown to be optimal in the minimax sense over a standard non-parametric class of distributions. In the case of classification under equal opportunity, the proposed algorithm is shown to be consistent in terms of the misclassification risk and its asymptotic fairness is established. Moreover, for this problem, the proposed procedure outperforms state-of-the-art algorithms in the field. Chapter 3 describes the setup of confidence set multi-class classification. Again, a semi-supervised procedure is proposed and its nearly minimax optimality is established. It is additionally shown that no supervised algorithm can achieve a so-called fast rate of convergence. In contrast, the proposed semi-supervised procedure can achieve fast rates provided that the size of the unlabeled data is sufficiently large. Chapter 4 describes a setup of multi-label classification where one aims at minimizing false negative error subject to almost-sure type constraints. In this part, two specific constraints are considered: sparse predictions and predictions with control over false negative errors. For the former, a supervised algorithm is provided and it is shown that this algorithm can achieve fast rates of convergence. For the latter, it is shown that extra assumptions are necessary in order to obtain theoretical guarantees.
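The idea of a plug-in classifier that uses unlabeled data can be illustrated with a small sketch for the F-score case. It assumes that an estimate of P(Y = 1 | x) is already available from labeled data, and it tunes the decision threshold on unlabeled points using the identity F1 = 2 P(Y = 1, g = 1) / (P(Y = 1) + P(g = 1)). The function names and the toy probabilities are hypothetical, and this is not the procedure analysed in the thesis.

```python
def estimated_f1(probs, threshold):
    """Plug-in estimate of the F1-score of the rule 'predict 1 when p >= threshold'.

    probs holds estimated values of P(Y = 1 | x) on unlabeled points, so no
    labels are needed: each p stands in for the unknown label.
    """
    predicted_positive = [p for p in probs if p >= threshold]
    denominator = sum(probs) + len(predicted_positive)
    if denominator == 0:
        return 0.0
    return 2.0 * sum(predicted_positive) / denominator


def best_threshold(probs):
    """Pick the observed probability value that maximizes the estimated F1."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda t: estimated_f1(probs, t))


# Hypothetical estimated probabilities on five unlabeled points.
unlabeled_probs = [0.9, 0.6, 0.4, 0.35, 0.1]
threshold = best_threshold(unlabeled_probs)
```

On this toy input the selected threshold falls below 0.5, which illustrates why a classifier tuned for the F-score generally differs from the one tuned for accuracy.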
Kalakech, Mariam. "Sélection semi-supervisée d'attributs : application à la classification de textures couleur." Thesis, Lille 1, 2011. http://www.theses.fr/2011LIL10018/document.
In this thesis, we are interested in feature selection methods based on graph theory in unsupervised, semi-supervised and supervised learning contexts. We are particularly interested in feature ranking scores based on must-link and cannot-link constraints. Indeed, these constraints are easy to obtain in real applications. They only require stating, for two data samples, whether they are similar and must therefore be grouped together or not, without detailed information on the classes to be found. Constraint scores have shown good performance for semi-supervised feature selection. However, these scores strongly depend on the must-link and cannot-link subsets built by the user. We therefore propose a new semi-supervised constraint score that uses both pairwise constraints and local properties of the unconstrained data. Experiments on artificial and real databases show that this new score is less sensitive to the given constraints than previous scores while providing similar performance. Semi-supervised feature selection was also successfully applied to color texture classification. Indeed, among the many texture features which can be extracted from color images, it is necessary to select the most relevant ones to improve the quality of classification.
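To make the notion of a constraint-based ranking score concrete, here is a minimal sketch of a ratio-style score: for each feature, the squared differences over must-link pairs are divided by those over cannot-link pairs, and a lower value indicates a more relevant feature. This corresponds to the kind of baseline score the abstract refers to as "previous scores", not to the new score proposed in the thesis; the function names and the toy data are assumptions.

```python
def constraint_score(values, must_link, cannot_link):
    """Constraint score of one feature; lower means more relevant.

    values      -- value of the feature for each sample
    must_link   -- pairs (i, j) of samples declared similar
    cannot_link -- pairs (i, j) of samples declared dissimilar
    """
    ml = sum((values[i] - values[j]) ** 2 for i, j in must_link)
    cl = sum((values[i] - values[j]) ** 2 for i, j in cannot_link)
    return ml / cl if cl > 0 else float("inf")


def rank_features(samples, must_link, cannot_link):
    """Return (feature indices from most to least relevant, raw scores)."""
    n_features = len(samples[0])
    scores = [
        constraint_score([row[f] for row in samples], must_link, cannot_link)
        for f in range(n_features)
    ]
    return sorted(range(n_features), key=lambda f: scores[f]), scores


# Toy data: feature 0 separates the two groups, feature 1 behaves like noise.
samples = [
    [0.1, 5.0],
    [0.2, 1.0],
    [0.9, 4.8],
    [1.0, 1.2],
]
must_link = [(0, 1), (2, 3)]
cannot_link = [(0, 2), (1, 3)]
ranking, scores = rank_features(samples, must_link, cannot_link)
```

Because the score depends only on the listed pairs, changing a few constraints can reorder the features, which is the sensitivity the abstract describes.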
RAFI, Selwa. "Chaînes de Markov cachées et séparation non supervisée de sources." Phd thesis, Institut National des Télécommunications, 2012. http://tel.archives-ouvertes.fr/tel-00995414.
Boubou, Mounzer. "Contribution aux méthodes de classification non supervisée via des approches prétopologiques et d'agrégation d'opinions." Phd thesis, Université Claude Bernard - Lyon I, 2007. http://tel.archives-ouvertes.fr/tel-00195779.
Gomes, Da Silva Alzennyr. "Analyse des données évolutives : application aux données d'usage du Web." Phd thesis, Université Paris Dauphine - Paris IX, 2009. http://tel.archives-ouvertes.fr/tel-00445501.
Kassab, Randa. "Analyse des propriétés stationnaires et des propriétés émergentes dans les flux d'information changeant au cours du temps." Thesis, Nancy 1, 2009. http://www.theses.fr/2009NAN10027/document.
Many applications produce and receive continuous, unbounded, high-speed data streams. This raises obvious problems of storage, processing and analysis of data, which are only beginning to be addressed in the domain of data streams. On the one hand, data streams must be processed on the fly without memorizing all the data. On the other hand, one must analyze, simultaneously and concurrently, the regularities inherent in the data stream as well as the novelties, exceptions, or changes occurring in this stream over time. The main contribution of this thesis is the development of a new machine learning approach, called ILoNDF, which is based on the novelty detection principle. Unlike its predecessor, the learning of this model is driven not only by the novelty part of the input data but also by the data itself. ILoNDF can thereby continuously extract new knowledge about the relative frequencies of the data and their variables, which makes it more robust against noise. Operating in on-line mode without repeated training, ILoNDF can further address the primary challenges of managing data streams. First, we study ILoNDF's behavior for one-class classification when dealing with high-dimensional noisy data. This study enabled us to highlight the pure learning capacities of ILoNDF with respect to the key classification methods suggested until now. Next, we turn to the adaptation of ILoNDF to the specific context of information filtering. Our goal is to set up user-oriented rather than system-oriented filtering strategies, along two directions. The first direction concerns user modeling based on the ILoNDF model. This provides a new way of looking at the user's need in terms of specificity, exhaustivity and contradictory profile-contributing criteria. These criteria are then used to estimate the relative importance the user might attach to precision and recall. The filtering threshold can then be adjusted taking into account this knowledge about the user's need. The second direction, complementary to the first, concerns the refinement of ILoNDF's functionality so that it can track a user's drifting need over time. Finally, we consider the generalization of our previous work to the case where streaming data can be divided into multiple classes.
Ribeyre, Corentin. "Méthodes d’analyse supervisée pour l’interface syntaxe-sémantique : de la réécriture de graphes à l’analyse par transitions." Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCC119.
Nowadays, the amount of textual data has become so large that it is not possible to deal with it manually. It is now necessary to use Natural Language Processing techniques to extract useful information from these data and understand their underlying meaning. In this thesis, we offer resources, models and methods to allow: (i) the automatic annotation of deep syntactic corpora to extract the argument structure that links (verbal) predicates to their arguments; (ii) the use of these resources with the help of efficient methods. First, we develop a graph rewriting system and a set of manually designed rewriting rules to automatically annotate deep syntax in French. Thanks to this approach, two corpora were created: the DeepSequoia, a deep syntactic version of the Séquoia corpus, and the DeepFTB, a deep syntactic version of the dependency version of the French Treebank. Next, we extend two transition-based parsers and adapt them to deal with graph structures. We also develop a set of rich linguistic features extracted from various syntactic trees. We think they are useful for bringing different kinds of topological information to accurately predict predicate-argument structures. Used in an arc-factored second-order parsing model, this set of features gives the first state-of-the-art results on French and outperforms the one established on the DM and PAS corpora for English. Finally, we briefly explore a method to automatically induce the transformation between a tree and a graph. This completes our set of coherent resources and models to automatically analyze the syntax-semantics interface in French and English.
Lebrun, Gilles. "Sélection de modèles pour la classification supervisée avec des SVM (Séparateurs à Vaste Marge) : application en traitement et analyse d'images." Caen, 2006. http://www.theses.fr/2006CAEN2049.
This thesis mainly deals with the importance of model selection to design efficient supervised machine learning schemes based on SVM classifiers. Three issues relating to the definition of such machine learning schemes have been investigated. The first issue concerns the evaluation of the generalization abilities of a classifier by cross validation techniques. We show that it is possible to take into account the inherent correlations between SVM training phases in order to significantly reduce the computation costs. The second issue concerns complexity reduction of SVM classifiers. Two approaches are proposed: 1) The design of a methodology to select a subset of relevant examples for producing low complexity SVM decision functions while increasing their generalization abilities; 2) The definition of a given metaheuristic based on Tabu search to optimise a trade-off between generalization abilities and complexities of SVM decision functions. The third issue concerns the development of efficient combination schemes of SVM classifiers using evolutionary algorithms for multi-model optimisation. We show that the higher the number of classes is, the greater the influence of the choices of decomposition, decoding and optimisation is. Proposed methods are used to define efficient SVM decision processes for two kinds of applications dedicated to image processing
Guillemot, Vincent. "Application de méthodes de classification supervisée et intégration de données hétérogènes pour des données transcriptomiques à haut-débit." Phd thesis, Université Paris Sud - Paris XI, 2010. http://tel.archives-ouvertes.fr/tel-00481822.
Zullo, Anthony. "Analyse de données fonctionnelles en télédétection hyperspectrale : application à l'étude des paysages agri-forestiers." Thesis, Toulouse 3, 2016. http://www.theses.fr/2016TOU30135/document.
In hyperspectral imaging, each pixel is associated with a spectrum derived from the reflectance observed at d measurement points (i.e., wavelengths). We often face a situation where the sample size n is relatively low compared to the number d of variables. This phenomenon, called the "curse of dimensionality", is well known in multivariate statistics. The more d increases with respect to n, the more the performance of standard statistical methodologies degrades. Reflectance spectra incorporate in their spectral dimension a continuum that gives them a functional nature. A hyperspectrum can be modelled as a univariate function of wavelength, whose representation is a curve. The use of functional methods makes it possible to take into account functional aspects such as continuity and the order of spectral bands, and to overcome the strong correlations arising from the fineness of the discretization grid. The main aim of this thesis is to assess the relevance of the functional approach to statistical analysis in the field of hyperspectral remote sensing. We focused on the nonparametric functional regression model, including supervised classification. Firstly, the functional approach has been compared with multivariate methods usually involved in remote sensing. The functional approach outperforms multivariate methods in critical situations where one has a small training sample size combined with relatively homogeneous classes (that is to say, classes that are hard to discriminate). Secondly, an alternative to the functional approach for overcoming the curse of dimensionality has been proposed using parsimonious models. This alternative makes it possible, through the selection of a few measurement points, to reduce the dimensionality of the problem while increasing the interpretability of the results. Finally, we were interested in the almost systematic situation where one has contaminated functional data. We proved that for a fixed sample size, the finer the discretization, the better the prediction. In other words, the larger d is compared to n, the more effective the functional statistical method is.
Frévent, Camille. "Contribution to spatial statistics for high-dimensional and survival data." Electronic Thesis or Diss., Université de Lille (2022-....), 2022. http://www.theses.fr/2022ULILS032.
In this thesis, we are interested in statistical spatial learning for high-dimensional and survival data. The objective is to develop unsupervised cluster detection methods by means of spatial scan statistics in the contexts of functional data analysis on the one hand and survival data analysis on the other. In the first two chapters, we consider univariate and multivariate functional data measured spatially in a geographical area. We propose both parametric and nonparametric spatial scan statistics in this framework. These univariate and multivariate functional approaches avoid the loss of information incurred when a univariate or multivariate method, respectively, is applied to the average of the observations over the study period. We study the performance of the new methods in simulation studies before applying them to real economic and environmental data. We are also interested in spatial cluster detection for survival data. Although spatial scan statistics approaches already exist in this framework in the literature, they do not take into account a potential correlation of survival times between individuals of the same spatial unit. Moreover, the spatial nature of the data implies a potential dependence between the spatial units, which should be taken into account. The originality of our proposed method is to introduce a spatial scan statistic based on a Cox model with a spatial frailty, which accounts for both the potential correlation between the survival times of individuals of the same spatial unit and the potential dependence between the spatial units. We compare the performance of this new approach with existing methods and apply them to real data corresponding to survival times of elderly people with end-stage kidney failure in northern France. Finally, we propose a number of perspectives for our work, both as a direct extension of this thesis in the framework of spatial scan statistics for high-dimensional and survival data, and in the broader contexts of unsupervised spatial analysis (spatial clustering for high-dimensional data (tensors)) and supervised spatial learning (regression).
Mahdhaoui, Ammar. "Analyse de Signaux Sociaux pour la Modélisation de l'interaction face à face." Phd thesis, Université Pierre et Marie Curie - Paris VI, 2010. http://tel.archives-ouvertes.fr/tel-00587051.
Bouzouita-Bayoudh, Inès. "Etude et extraction des règles associatives de classification en classification supervisée." Thesis, Montpellier 2, 2012. http://www.theses.fr/2012MON20217.
In this thesis, our interest is focused on classification accuracy and on the optimality of the search-space traversal. We introduce a new direct associative classification method called IGARC, which directly extracts a classifier formed by generic associative classification rules from a training set, in order to reduce the number of associative classification rules without jeopardizing classification accuracy. The experiments carried out show that IGARC is highly competitive with popular classification methods. We also introduce a new classification approach called AFORTIORI. We address the problem of generating relevant frequent and rare classification rules. Our work is motivated by the long-standing open question of devising an efficient algorithm for finding rules with low support. A particularly relevant field for rare itemsets and rare associative classification rules is medical diagnosis. The proposed approach is based on the classical covering-set algorithm. It makes it possible to obtain frequent and rare rules while exploring the search space in a depth-first manner. To this end, AFORTIORI adopts the covering-set algorithm and uses the cover measure to guide the traversal of the search space and to generate the most interesting rules for the classification framework, even rare ones. We describe our method and provide comparisons with common associative classification methods on standard benchmark data sets.
Gan, Changquan. "Une approche de classification non supervisée basée sur la notion des K plus proches voisins." Compiègne, 1994. http://www.theses.fr/1994COMP765S.
Dugué, Nicolas. "Analyse du capitalisme social sur Twitter." Thesis, Orléans, 2015. http://www.theses.fr/2015ORLE2081/document.
Bourdieu, a sociologist, defines social capital as "the set of current or potential resources linked to the possession of a lasting network of relationships". On Twitter, the friends, followers, and users mentioned and retweeted are considered the relationship network of each user, whose resources are the chance to get relevant information, to be read, to satisfy a narcissistic need, and to spread information or advertisements. We observe that some Twitter users, whom we call social capitalists, aim to maximize their number of followers in order to maximize their social capital. We introduce their methods, which are based on mutual subscriptions and dedicated hashtags. In order to study them, we first describe a large-scale detection method based on their sets of followers and followees. Then, we show with an automated Twitter account that their methods make it possible to gain followers and to be retweeted efficiently. Afterwards, we bring to light that the social capitalists' methods allow these users to occupy specific positions in the network, giving them high visibility. Furthermore, these methods make these users influential according to the major tools. We thus set up a classification method to detect these users accurately and produce a new influence score.
Pujari, Manisha. "Prévision de liens dans des grands graphes de terrain (application aux réseaux bibliographiques)." Thesis, Sorbonne Paris Cité, 2015. http://www.theses.fr/2015USPCD010/document.
In this work, we tackle the problem of link prediction in complex networks. In particular, we explore topological dyadic approaches for link prediction. Different topological proximity measures have been studied in the scientific literature for estimating the probability that new links appear in a complex network. Supervised learning methods have also been used to combine the predictions made, or the information provided, by different topological measures; they build predictive models from various topological measures. Supervised learning for link prediction is a difficult problem, especially because of heavy class imbalance. In this thesis, we investigate alternative approaches to improve the performance of dyadic approaches for link prediction. We propose a new approach to link prediction based on supervised rank aggregation that uses concepts from computational social choice theory. Our approach is founded on supervised techniques for aggregating sorted lists (or preference aggregation). We also explore different ways of improving supervised link prediction approaches. One is to extend the set of attributes describing an example (a pair of nodes) with attributes calculated in a multiplex network that includes the target network. Multiplex networks have a layered structure, each layer having different kinds of links between the same sets of nodes. The second is to use community information when sampling examples, to deal with the problem of class imbalance. Experiments were conducted on real networks extracted from the well-known DBLP bibliographic database.
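As a rough illustration of the topological dyadic measures mentioned in the abstract above, the sketch below scores unlinked node pairs with three classical proximity measures (common neighbours, Jaccard coefficient, Adamic-Adar). It is not the thesis's rank-aggregation method; the toy graph, function names and the choice of measures are assumptions made for illustration only.

```python
import math
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Adjacency = Dict[str, Set[str]]


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Adjacency:
    """Build an undirected adjacency map from an edge list."""
    adj: Adjacency = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj: Adjacency, u: str, v: str) -> int:
    return len(adj[u] & adj[v])


def jaccard(adj: Adjacency, u: str, v: str) -> float:
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj: Adjacency, u: str, v: str) -> float:
    # A shared neighbour of two distinct nodes has degree >= 2, so log(degree) > 0.
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])


def rank_unlinked_pairs(adj: Adjacency) -> List[Tuple[str, str]]:
    """Rank currently unlinked pairs by their number of common neighbours (highest first)."""
    candidates = [
        (u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]
    ]
    return sorted(candidates, key=lambda p: common_neighbors(adj, *p), reverse=True)


# Hypothetical toy co-authorship graph.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
toy_adj = build_adjacency(toy_edges)
ranking = rank_unlinked_pairs(toy_adj)
```

In a supervised setting such as the one described, scores like these would typically serve as features of a node pair, with the appearance of the link at a later time as the class label.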
Durand, Marie. "La découverte et la compréhension des profils d’apprenants : classification semi-supervisée et acquisition d’une langue seconde." Thesis, Paris 8, 2019. http://www.theses.fr/2019PA080029.
This thesis aims to develop an effective methodology for discovering and describing the profiles of learners of an L2 based on acquisition data (perception, comprehension and production). We want to detect patterns in the acquisition behaviours of subgroups of learners, taking into account the multidimensional nature of the L2 learning process. The proposed methodology belongs to the field of artificial intelligence, and more specifically to semi-supervised clustering techniques. Our algorithm has been applied to the database of the VILLA project, which includes the performance of learners from 5 different source languages (French, Italian, Dutch, German and English) with Polish as the target language. 156 adult learners were each tested on a variety of tasks in Polish over 14 hours of teaching sessions, starting from initial exposure. These tests made it possible to evaluate their performance at several levels of linguistic analysis: phonology, morphology, morphosyntax and lexicon. The database also includes their sensitivity to input characteristics, such as the frequency and transparency of the lexical elements used in the linguistic tasks. The similarity measure used in traditional clustering techniques is revisited in this work in order to evaluate the distance between two learners from an acquisitionist point of view. It is based on identifying the learner's response strategy for a specific language test structure. We show that this measure makes it possible to detect the presence or absence, in the learner's responses, of a strategy similar to the inflectional system of the target language, and so enables our algorithm to provide a classification consistent with second language acquisition research. As a result, we claim that our algorithm might be relevant for the empirical establishment of learners' profiles and the discovery of new avenues for reflection or analysis.
Gaillard, Pierre. "Apprentissage statistique de la connexité d'un nuage de points par modèle génératif : application à l'analyse exploratoire et la classification semi-supervisée." Compiègne, 2008. http://www.theses.fr/2008COMP1767.
In this work, we propose a statistical model to learn the connectedness of a set of points. This model combines geometrical and statistical approaches by defining a mixture model based on a graph. From this generative graph, we propose and evaluate methods and algorithms to analyse the set of points and to perform semi-supervised learning.
Maugis, Cathy. "Sélection de variables pour la classification non supervisée par mélanges gaussiens : application à l'étude de données transcriptomes." Phd thesis, Université Paris Sud - Paris XI, 2008. http://tel.archives-ouvertes.fr/tel-00344120.
In the first part, the proposed model, which generalizes that of Raftery and Dean (2006), makes it possible to specify the role of the variables with respect to the clustering process. Thus, the non-significant variables may depend on a subset of the variables retained for clustering. These models are compared using a BIC-type criterion. Their identifiability is established and the consistency of the criterion is proved under regularity conditions. In practice, the status of the variables is obtained through an algorithm that nests two backward variable selection algorithms, one for clustering and one for linear regression. The value of this procedure is illustrated in particular on transcriptome data. An improvement in the modeling of the role of the variables, which consists in dividing the variables declared non-significant into those that depend on and those that are independent of the variables significant for clustering, is then proposed to remedy the over-penalization of certain models. Finally, since DNA microarray technology produces many missing data, an extension of our procedure that accounts for these missing values is suggested, avoiding their prior estimation.
In the second part, Gaussian mixtures of specific forms are considered and a non-asymptotic penalized criterion is proposed to select simultaneously the number of mixture components and the set of variables relevant for clustering. A general model selection theorem for maximum likelihood density estimation, proposed by Massart (2007), is used to determine the form of the penalty. This theorem requires controlling the bracketing entropy of the families of multidimensional Gaussian mixtures under study. Because this criterion depends on unknown multiplicative constants, the so-called "slope heuristic" is implemented so that the criterion can be used in practice.
Happillon, Teddy. "Aide au diagnostic de cancers cutanés et de la leucémie lymphoïde chronique par microspectroscopies vibrationnelles couplées à des analyses numériques multivariées." Thesis, Reims, 2013. http://www.theses.fr/2013REIMP204/document.
Vibrational spectroscopy is a technology able to record a large amount of molecular information from the samples studied. Coupled with chemometrics and classification methods, vibrational spectroscopy is an efficient tool for identifying sample structures and substructures. When applied to the biomedical field, this tool shows high potential for disease diagnosis. The work presented in this thesis was carried out in this context. In a first study, dealing with algorithmic development, an automatic and unsupervised classification algorithm (based on Fuzzy C-Means), developed by our laboratory to assist skin cancer diagnosis using IR spectroscopy, was improved in order to i) reduce the computation time needed for clustering, ii) increase the quality of the results obtained on infrared data, and iii) extend its field of application to simulated and real datasets commonly used in the literature. This tool was tested on 16 infrared spectral images of skin cancers (BCC, SCC, Bowen's disease and melanoma) and on 49 real and simulated datasets. The results showed the ability of this new algorithm to estimate realistic data partitions regardless of the dataset considered. The second study aimed at developing an independent chemometric tool to assist the diagnosis of chronic lymphocytic leukemia by Raman spectroscopy. In this second work, different numerical preprocessing steps and a supervised classification algorithm, Support Vector Machines, were applied to data recorded on blood cells from 27 healthy persons and 49 patients with chronic lymphocytic leukemia. The classification results showed a sensitivity of 80% and a specificity of 100% in the disease diagnosis.
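For readers unfamiliar with the two figures quoted at the end of the abstract above, the sketch below shows how sensitivity and specificity are conventionally computed from true and predicted labels. The labels are a made-up toy example (1 = diseased, 0 = healthy), not the thesis's data, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a binary diagnosis.

    sensitivity = TP / (TP + FN): share of diseased subjects that are detected
    specificity = TN / (TN + FP): share of healthy subjects correctly cleared
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Toy labels only: 5 diseased subjects (one missed) and 3 healthy subjects (none flagged).
toy_true = [1, 1, 1, 1, 1, 0, 0, 0]
toy_pred = [1, 1, 1, 1, 0, 0, 0, 0]
toy_sens, toy_spec = sensitivity_specificity(toy_true, toy_pred)
```

A specificity of 100% means no healthy subject was flagged as diseased, while a sensitivity below 100% means some diseased subjects were missed.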
Kassab, Randa. "Analyse des propriétés stationnaires et des propriétés émergentes dans les flux d'informations changeant au cours du temps." Phd thesis, Université Henri Poincaré - Nancy I, 2009. http://tel.archives-ouvertes.fr/tel-00402644.
The contribution of this thesis lies mainly in the development of a learning model, named ILoNDF, based on the principle of novelty detection. Unlike its original version, the learning of this model is guided not only by the novelty that an input datum brings but also by the datum itself. As a result, the ILoNDF model can continuously acquire new knowledge about the occurrence frequencies of the data and of their variables, which makes it less sensitive to noise. Moreover, because it operates online without repeated training, this model meets the most demanding requirements of data stream processing.
First, our work focuses on studying the behavior of the ILoNDF model in the general framework of one-class classification, starting from the exploitation of highly multidimensional and noisy data. This study allowed us to highlight the pure learning capabilities of the ILoNDF model relative to all the methods proposed so far. Second, we turn more specifically to the fine adaptation of the model to the precise setting of information filtering. Our objective is to set up a filtering strategy that is user-oriented rather than system-oriented, in particular by following two directions. The first direction concerns user modeling with the ILoNDF model. This modeling provides a new way of looking at the user profile in terms of specificity, exhaustivity and contradiction criteria. This makes it possible, among other things, to optimize the filtering threshold by taking into account the importance the user may give to precision and recall. The second direction, complementary to the first, concerns refining the functionality of the ILoNDF model by giving it the ability to adapt to the drift of the user's need over time. Finally, we work on generalizing our earlier work to the case where the data arriving in a stream can be divided into multiple classes.
Yang, Gen. "Modèles prudents en apprentissage statistique supervisé." Thesis, Compiègne, 2016. http://www.theses.fr/2016COMP2263/document.
In some areas of supervised machine learning (e.g. medical diagnostics, computer vision), predictive models are evaluated not only on their accuracy but also on their ability to provide a more reliable representation of the data and the induced knowledge, in order to allow for cautious decision making. This is the problem we studied in this thesis. Specifically, we examined two existing approaches from the literature for making models and predictions more cautious and more reliable: the framework of imprecise probabilities and that of cost-sensitive learning. Yet few existing studies have attempted to bridge these two frameworks, because of both theoretical and practical problems. Our contributions are to clarify and to resolve these problems. Theoretically, few existing studies have addressed how to quantify the different classification errors when set-valued predictions are produced and when the costs of mistakes are not equal (in terms of consequences). Our first contribution has been to establish general properties and guidelines for quantifying the misclassification costs of set-valued predictions. These properties have led us to derive a general formula, which we call the generalized discounted cost (GDC), that allows classifiers to be compared whatever the form of their predictions (singleton or set-valued) in the light of a risk aversion parameter. Practically, most classifiers based on imprecise probabilities fail to integrate generic misclassification costs efficiently, because the computational complexity increases by an order of magnitude (or more) when non-unitary costs are used. This problem has led to our second contribution, the implementation of a classifier that can handle the probability intervals produced by imprecise probabilities and generic error costs with the same order of complexity as in the case where standard probabilities and unitary costs are used. This is achieved by using a binary decomposition technique, nested dichotomies. The properties and prerequisites of this technique have been studied in detail. In particular, we saw that nested dichotomies are applicable to all imprecise probabilistic models and that they reduce the imprecision level of imprecise models without loss of predictive power. Various experiments were conducted throughout the thesis to illustrate and support our contributions. We characterized the behavior of the GDC using ordinal data sets. These experiments highlighted the differences between a model that uses the standard probability framework to produce indeterminate predictions and a model based on imprecise probabilities. The latter is generally more competent because it distinguishes two sources of uncertainty (ambiguity and lack of information), although the combined use of these two types of models is also of particular interest, as it can help the decision-maker improve the data quality or the classifiers. In addition, experiments conducted on a wide variety of data sets showed that the use of nested dichotomies significantly improves the predictive power of an indeterminate model with generic costs.
Doan, Tien Tai. "Réalisation d’une aide au diagnostic en orthodontie par apprentissage profond." Electronic Thesis or Diss., université Paris-Saclay, 2021. http://www.theses.fr/2021UPASG033.
Accurate processing and diagnosis of dental images is an essential factor determining the success of orthodontic treatment. Many image processing methods have been proposed to address this problem. Those studies mainly work on small datasets of radiographs under laboratory conditions and are not highly applicable as complete products or services. In this thesis, we train deep learning models to diagnose dental problems such as gingivitis and crowded teeth using mobile phone images. We study the feature layers of these models to find the strengths and limitations of each method. Besides training deep learning models, we also embed each of them in a pipeline, including preprocessing and post-processing steps, to create a complete product. To address the lack of training data, we studied a variety of data augmentation methods, especially domain adaptation methods using image-to-image translation models, both supervised and unsupervised, and obtained promising results. Image translation networks are also used to simplify patients' choice of orthodontic appliances by showing them what their teeth could look like during treatment. The generated images are realistic and high-resolution. Researching further into unsupervised image translation neural networks, we propose an unsupervised image-to-image translation model that can manipulate features of objects in the image without requiring additional annotation. Our model outperforms state-of-the-art techniques on multiple image translation applications and is also extended to few-shot learning problems.
Ta, Minh Thuy. "Techniques d'optimisation non convexe basée sur la programmation DC et DCA et méthodes évolutives pour la classification non supervisée." Electronic Thesis or Diss., Université de Lorraine, 2014. http://www.theses.fr/2014LORR0099.
This thesis focuses on four problems in data mining and machine learning: clustering data streams, clustering massive data sets, weighted hard and fuzzy clustering, and finally clustering without prior knowledge of the number of clusters. Our methods are based on deterministic optimization approaches, namely DC (Difference of Convex functions) programming and DCA (Difference of Convex Algorithm), for solving some classes of the clustering problems cited above. Our methods are also based on elitist evolutionary approaches. We adapt the clustering algorithm DCA–MSSC to deal with data streams using two window models: sub-windows and sliding windows. For the problem of clustering massive data sets, we propose to use the DCA algorithm in two phases. In the first phase, the massive data is divided into several subsets, on which the DCA–MSSC algorithm performs clustering. In the second phase, we propose a DCA–Weight algorithm to perform a weighted clustering on the centers obtained in the first phase. For weighted clustering, we also propose two approaches: weighted hard clustering and weighted fuzzy clustering. We test our approach on an image segmentation application. The final issue addressed in this thesis is clustering without prior knowledge of the number of clusters. We propose an elitist evolutionary approach, in which we apply several evolutionary algorithms (EAs) at the same time, to find the optimal combination of initial cluster seeds and, at the same time, the optimal number of clusters. The various tests performed on several large data sets are very promising and demonstrate the effectiveness of the proposed approaches.
Kurovszky, Monika. "Etude des systèmes dynamiques hybrides par représentation d'état discrète et automate hybride." Phd thesis, Université Joseph Fourier (Grenoble), 2002. http://tel.archives-ouvertes.fr/tel-00198326.
Allain, Guillaume. "Prévision et analyse du trafic routier par des méthodes statistiques." Toulouse 3, 2008. http://thesesups.ups-tlse.fr/351/.
The industrial partner of this work is Mediamobile/V-trafic, a company which processes and broadcasts live road-traffic information. The goal of our work is to enhance traffic information with forecasting and spatial extension. Our approach is sometimes inspired by physical modelling of traffic dynamics, but it mainly uses statistical methods in order to propose self-organising and modular models suitable for industrial constraints. In the first part of this work, we describe a method to forecast traffic speed within a time frame of a few minutes up to several hours. Our method is based on the assumption that traffic on a road network can be summarized by a few typical profiles. Those profiles are linked to the users' periodic behaviors. We therefore assume that the speed curves observed at each point of the network stem from a probabilistic mixture model. The following parts of our work present how the general method can be refined. Medium-term forecasting uses variables built from the calendar, and the mixture model still stands. Additionally, we use a functional regression model to forecast speed curves. We then introduce a local regression model in order to simulate short-term traffic dynamics. The kernel function is built from real speed observations, and we integrate some knowledge about traffic dynamics. The last part of our work focuses on the analysis of speed data from in-traffic vehicles. These observations are gathered sporadically in time and along the road segment. The resulting data is completed and smoothed by local polynomial regression.
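The smoothing step mentioned at the end of the abstract above can be sketched with a degree-1 local polynomial (local linear) regression. This is a generic textbook version with a Gaussian kernel, assumed here for illustration; the thesis's own kernel is built from real speed observations and is not reproduced. The function name, the bandwidth and the sample values are all hypothetical.

```python
import math
from typing import List, Sequence


def local_linear_smooth(
    t_obs: Sequence[float],
    y_obs: Sequence[float],
    t_query: Sequence[float],
    bandwidth: float,
) -> List[float]:
    """Estimate y at each query time by a kernel-weighted straight-line fit.

    For each query time t0, fit y ~ a + b * (t - t0) by weighted least squares
    with Gaussian weights centred on t0; the estimate at t0 is the intercept a.
    """
    estimates = []
    for t0 in t_query:
        s0 = s1 = s2 = r0 = r1 = 0.0
        for t, y in zip(t_obs, y_obs):
            d = t - t0
            w = math.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian kernel weight
            s0 += w
            s1 += w * d
            s2 += w * d * d
            r0 += w * y
            r1 += w * d * y
        det = s0 * s2 - s1 * s1
        if abs(det) > 1e-12:
            estimates.append((s2 * r0 - s1 * r1) / det)  # intercept of the local line
        elif s0 > 0.0:
            estimates.append(r0 / s0)  # degenerate case: fall back to a weighted mean
        else:
            estimates.append(float("nan"))  # no observation carries any weight
    return estimates


# Irregularly spaced observation times (hypothetical), with values on an exact line.
times = [0.0, 0.7, 1.5, 3.1, 4.0, 6.2]
speeds = [2.0 * t + 1.0 for t in times]
smoothed = local_linear_smooth(times, speeds, [2.5, 5.0], bandwidth=1.0)
```

Because the fit is local, it copes with observation times that are irregularly spaced, which matches the sporadic sampling described in the abstract; a local linear fit also reproduces exactly linear data, which makes for an easy sanity check.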
Eke, Samuel. "Stratégie d'évaluation de l'état des transformateurs : esquisse de solutions pour la gestion intégrée des transformateurs vieillissants." Thesis, Lyon, 2018. http://www.theses.fr/2018LYSEC013/document.
This PhD thesis deals with a method for assessing the state of oil-filled power transformers. It brings a new approach by implementing classification and data mining methods dedicated to transformer maintenance. It proposes a strategy based on two new oil health indicators built from an Adaptive Neuro-Fuzzy Inference System (ANFIS). Two classifiers were built on a labeled learning database. The Naive Bayes classifier was retained for detecting faults from gases dissolved in oil. A simple and efficient flowchart for evaluating the condition of transformers is proposed. It allows a quick analysis of the parameters resulting from physicochemical analyses of oil and dissolved gases. Unsupervised classification techniques, namely k-means and fuzzy C-means, made it possible to reconstruct the operating periods of a transformer, including some particular faults. It is also demonstrated how these methods can be used as a tool to support the maintenance of a group of transformers from available oil analysis data.
Ta, Minh Thuy. "Techniques d'optimisation non convexe basée sur la programmation DC et DCA et méthodes évolutives pour la classification non supervisée." Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0099/document.
Vinot, Romain. "Classification automatique de textes dans des catégories non thématiques." Phd thesis, Télécom ParisTech, 2004. http://pastel.archives-ouvertes.fr/pastel-00000812.
Nait-Chabane, Ahmed. "Segmentation invariante en rasance des images sonar latéral par une approche neuronale compétitive." Phd thesis, Université de Bretagne occidentale - Brest, 2013. http://tel.archives-ouvertes.fr/tel-00968199.
Alaoui, Ismaili Oumaima. "Clustering prédictif Décrire et prédire simultanément." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLA010.
Predictive clustering is a new supervised learning framework derived from traditional clustering. This framework makes it possible to describe and to predict simultaneously. Compared with classical supervised learning, predictive clustering algorithms seek to discover the internal structure of the target class in order to use it for predicting the class of new instances. The purpose of this thesis is to look for an interpretable model of predictive clustering. To achieve this objective, we chose to modify the traditional K-means algorithm. This modified version is called predictive K-means. It contains 7 different steps, each of which can be supervised separately from the others. In this thesis, we deal with only four steps: 1) data preprocessing, 2) initialization of centers, 3) selection of the best partition, and 4) importance of features. Our experimental results show that using just two supervised steps (data preprocessing and initialization of centers) allows the K-means algorithm to achieve performance competitive with some other predictive clustering algorithms. These results also show that our preprocessing methods can help the predictive K-means algorithm provide results that are easily understood by users. We also show that using our new measure of predictive clustering quality helps our predictive K-means algorithm find the optimal partition, the one that establishes the best trade-off between description and prediction. It thus allows users to find the different reasons behind the same prediction: two different instances could have the same predicted label.
Blanchard, Frédéric. "Visualisation et classification de données multidimensionnelles : Application aux images multicomposantes." Reims, 2005. http://theses.univ-reims.fr/exl-doc/GED00000287.pdf.
The analysis of multicomponent images is a crucial problem, and visualization and clustering are two relevant questions within it. We chose to work in the more general framework of data analysis to answer these questions. The preliminary step of this work describes the problems induced by dimensionality and studies current dimensionality reduction methods. The visualization problem is then considered and a contribution is presented: we propose a new method of visualization through a color image that provides an immediate and synthetic image of the data. Applications are presented. The second contribution lies upstream of the clustering procedure strictly speaking. We establish a new kind of data representation using rank transformation, fuzziness and aggregation procedures. Its use improves clustering procedures by handling clusters of dissimilar density or varying sizes and by making them more robust. This work presents two important contributions to the field of data analysis applied to multicomponent images. The variety of the tools involved (originating from decision theory, uncertainty management, data mining and image processing) makes the presented methods usable in many diverse areas as well as in multicomponent image analysis.
Kurtz, Camille. "Une approche collaborative segmentation - classification pour l'analyse descendante d'images multirésolutions." Phd thesis, Université de Strasbourg, 2012. http://tel.archives-ouvertes.fr/tel-00735217.