Dissertations on the topic "Réduction de dimension (Statistique)"
Format your citation in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations for your research on the topic "Réduction de dimension (Statistique)".
Next to every entry in the reference list there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the publication as a .pdf file and read its abstract online, when these are available in the metadata.
Browse dissertations from a wide variety of disciplines and organise your bibliography correctly.
Girard, Robin. "Réduction de dimension en statistique et application en imagerie hyper-spectrale." PhD thesis, Grenoble 1, 2008. http://www.theses.fr/2008GRE10074.
This thesis deals with high-dimensional statistical analysis. We focus on three different problems motivated by medical applications: curve classification, pixel classification, and clustering in hyperspectral images. Our approaches are deeply linked with statistical testing procedures (multiple testing, minimax testing, robust testing, and functional testing) and learning theory. Both are introduced in the first part of this thesis. The second part focuses on classification of high-dimensional Gaussian data. Our approach is based on dimensionality reduction, and we show practical and theoretical results. In the third and last part of this thesis we focus on hyperspectral image segmentation. We first propose a pixel classification algorithm based on multi-scale analysis, penalised maximum likelihood and feature selection, and give theoretical results and simulations for this algorithm. We then propose a pixel clustering algorithm. It involves wavelet decomposition of the observations in each pixel, smoothing with a growing-region algorithm, and frontier extraction based on a voting scheme.
Girard, Robin. "Réduction de dimension en statistique et application en imagerie hyper-spectrale." PhD thesis, Université Joseph Fourier (Grenoble), 2008. http://tel.archives-ouvertes.fr/tel-00379179.
Kuentz, Vanessa. "Contributions à la réduction de dimension." Thesis, Bordeaux 1, 2009. http://www.theses.fr/2009BOR13871/document.
This thesis concentrates on dimension reduction approaches, which seek lower-dimensional subspaces minimizing the loss of statistical information. First we focus on multivariate analysis for categorical data. The rotation problem in Multiple Correspondence Analysis (MCA) is treated: we give the analytic expression of the optimal angle of planar rotation for the chosen criterion, and if more than two principal components are to be retained, this planar solution is used in a practical algorithm applying successive pairwise planar rotations. Different algorithms for the clustering of categorical variables are also proposed to maximize a given partitioning criterion based on correlation ratios. A real-data application highlights the benefits of using rotation in MCA and provides an empirical comparison of the proposed algorithms for categorical variable clustering. Then we study the semiparametric regression method SIR (Sliced Inverse Regression). We propose an extension based on partitioning the predictor space, which can be used when the crucial linearity condition on the predictors is not satisfied. We also introduce bagging versions of SIR to improve the estimation of the basis of the dimension-reduction subspace. Asymptotic properties of the estimators are obtained, and a simulation study shows the good numerical behaviour of the proposed methods. Finally, applied multivariate data analysis in various areas is described.
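The SIR procedure that this thesis extends is compact enough to sketch. The following is a minimal numpy version of plain SIR (not the partitioned or bagged extensions of the thesis); the data and dimensions are illustrative:

```python
import numpy as np

def sir_directions(X, y, n_slices=5, n_dirs=1):
    """Sliced Inverse Regression: eigenvectors of the covariance of
    slice means of the standardized predictors span the e.d.r. space."""
    n, p = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T   # whitening matrix
    Z = (X - mu) @ inv_sqrt
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)                        # slice mean of Z
        M += len(idx) / n * np.outer(m, m)
    _, evecs = np.linalg.eigh(M)
    beta = inv_sqrt @ evecs[:, ::-1][:, :n_dirs]       # back to X scale
    return beta / np.linalg.norm(beta, axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X[:, 0] ** 3 + 0.1 * rng.normal(size=500)  # depends on X[:, 0] only
b = sir_directions(X, y).ravel()
print(b.round(2))  # weight concentrated on the first coordinate
```

With a monotone link the slice means line up along the single informative direction; symmetric links, where E[X|y] vanishes, are a known failure mode of plain SIR.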
Noyel, Guillaume. "Filtrage, réduction de dimension, classification et segmentation morphologique hyperspectrale." PhD thesis, École Nationale Supérieure des Mines de Paris, 2008. http://pastel.archives-ouvertes.fr/pastel-00004473.
Lopez, Olivier. "Réduction de dimension en présence de données censurées." PhD thesis, Rennes 1, 2007. http://tel.archives-ouvertes.fr/tel-00195261.
Повний текст джерелаvariable explicative. Nous développons une nouvelle approche de réduction de la dimension afin de résoudre ce problème.
Pedersen, Morten Akhøj. "Méthodes riemanniennes et sous-riemanniennes pour la réduction de dimension." Electronic Thesis or Diss., Université Côte d'Azur, 2023. http://www.theses.fr/2023COAZ4087.
In this thesis, we propose new methods for dimension reduction based on differential geometry, that is, for finding a representation of a set of observations in a space of lower dimension than the original data space. Methods for dimension reduction form a cornerstone of statistics and thus have a very wide range of applications: a lower-dimensional representation of a data set allows visualization and is often necessary for subsequent statistical analyses. In ordinary Euclidean statistics the data belong to a vector space, and the lower-dimensional space might be a linear subspace or a non-linear submanifold approximating the observations. The study of such smooth manifolds, differential geometry, naturally plays an important role in the latter case, or when the data space is itself a known manifold. Methods for analysing this type of data form the field of geometric statistics; in this setting, the approximating space found by dimension reduction is naturally a submanifold of the given manifold. The starting point of this thesis is geometric statistics for observations belonging to a known Riemannian manifold, but parts of our work form a contribution even in the case of data belonging to Euclidean space R^d. An important example of manifold-valued data is shapes, in our case discrete or continuous curves or surfaces. In evolutionary biology, researchers are interested in studying the reasons for and implications of morphological differences between species, and shape is one way to formalize morphology. This application motivates the first main contribution of the thesis: we generalize a dimension-reduction method used in evolutionary biology, phylogenetic principal component analysis (P-PCA), to work for data on a Riemannian manifold, so that it can be applied to shape data. P-PCA is a version of PCA for observations that are assumed to be leaf nodes of a phylogenetic tree.
From a statistical point of view, the important property of such data is that the observations (leaf-node values) are not necessarily independent. We define and estimate intrinsic weighted means and covariances on a manifold that take the dependency of the observations into account, and we then define phylogenetic PCA on a manifold as the eigendecomposition of the weighted covariance in the tangent space of the weighted mean. We show that the mean estimator currently used in evolutionary biology for studying morphology corresponds to taking only a single step of our Riemannian gradient-descent algorithm for the intrinsic mean when the observations are represented in Kendall's shape space. Our second main contribution is a non-parametric method for dimension reduction that approximates a set of observations by a very flexible class of submanifolds; this method is novel even in the case of Euclidean data. The method works by constructing a subbundle of the tangent bundle on the data manifold via local PCA, which we call the principal subbundle. We then observe that this subbundle induces a sub-Riemannian structure, and we show that the resulting sub-Riemannian geodesics stay close to the set of observations. Moreover, we show that sub-Riemannian geodesics starting from a given point locally generate a submanifold which is radially aligned with the estimated subbundle, even for non-integrable subbundles. Non-integrability is likely to occur when the subbundle is estimated from noisy data, and our method demonstrates that sub-Riemannian geometry is a natural framework for dealing with such problems. Numerical experiments illustrate the power of the framework by showing that reconstructions over impressively large ranges can be achieved even in the presence of quite high levels of noise.
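The local-PCA construction underlying the principal subbundle can be illustrated in the Euclidean special case. This sketch (illustrative data, not the thesis code) estimates a tangent direction at a point of a noisy circle:

```python
import numpy as np

def local_tangent(points, x0, k=20, d=1):
    """Estimate a d-dimensional tangent subspace at x0 by PCA on the k
    nearest neighbours: the building block of a principal subbundle."""
    nbrs = points[np.argsort(np.linalg.norm(points - x0, axis=1))[:k]]
    centered = nbrs - nbrs.mean(axis=0)
    # Leading right-singular vectors span the estimated tangent space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:d]

rng = np.random.default_rng(1)
t = rng.uniform(0.0, 2.0 * np.pi, size=400)
pts = np.c_[np.cos(t), np.sin(t)] + 0.01 * rng.normal(size=(400, 2))
frame = local_tangent(pts, np.array([1.0, 0.0]))
print(frame.round(3))  # close to the vertical direction (0, +-1)
```

Repeating this at every point of a dense sample yields a field of subspaces, i.e. an estimated subbundle of the tangent bundle.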
Damon, Cécilia. "Réduction de dimension et régularisation pour l'apprentissage statistique et la prédiction individuelle en IRMf." Paris 11, 2010. http://www.theses.fr/2010PA112107.
Predictive multivariate methods have so far rarely been explored in fMRI at the inter-subject level. Substantial inter-subject anatomo-functional variability, together with the high dimension of fMRI data relative to the small number of subjects, complicates the identification of the inter-subject functional variability specific to a phenotype of interest and increases the overfitting of classification techniques. Our first objective is to explore the approaches available in supervised statistical learning for controlling overfitting, and more specifically two means: feature selection and regularised classification. Our second goal is to define a rigorous methodology for comparing the proposed strategies at several levels: (i) global: comparison of all strategies on all datasets; (ii) local: comparison restricted to a particular subset of strategies on all datasets; (iii) individual: comparison of a pair of strategies on a single dataset. We tested four pairs of data (fMRI contrast, phenotypic information) extracted from a large database of about 200 healthy subjects who performed the same experimental protocol. We also constructed simulated datasets with a multivariate discriminant signal. The comparative analysis and the visualisation of functional patterns revealed the strategy combining multivariate RFE feature selection with the SRDA classifier as the most efficient. This strategy identified parsimonious predictive patterns and achieved good predictive performance, which proved relevant only when the contrast-to-noise ratio was strong.
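A bare-bones version of the feature-selection side of such a pipeline, recursive feature elimination wrapped around a ridge-regularised linear scorer, can be sketched as follows (synthetic data; the thesis pairs RFE with an SRDA classifier on fMRI contrasts):

```python
import numpy as np

def rfe_ridge(X, y, n_keep=2, lam=1.0):
    """Recursive feature elimination: repeatedly fit a ridge-regularised
    linear model, then drop the feature with the smallest |weight|."""
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        Xa = X[:, active]
        w = np.linalg.solve(Xa.T @ Xa + lam * np.eye(len(active)), Xa.T @ y)
        active.pop(int(np.argmin(np.abs(w))))
    return active

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 1] - 2 * X[:, 4] + 0.1 * rng.normal(size=200)
print(rfe_ridge(X, y))  # the informative features: [1, 4]
```

Refitting after every elimination is what distinguishes RFE from one-shot filter ranking: weights are re-estimated in the shrinking feature set.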
Tournier, Maxime. "Réduction de dimension pour l'animation de personnages." PhD thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00650696.
Zapien Durand-Viel Karina. "Algorithme de chemin de régularisation pour l'apprentissage statistique." PhD thesis, INSA de Rouen, 2009. http://tel.archives-ouvertes.fr/tel-00557888.
Повний текст джерелаJanon, Alexandre. "Analyse de sensibilité et réduction de dimension. Application à l'océanographie." Phd thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00757101.
Karina, Zapien. "Algorithme de Chemin de Régularisation pour l'apprentissage Statistique." PhD thesis, INSA de Rouen, 2009. http://tel.archives-ouvertes.fr/tel-00422854.
Повний текст джерелаL'approche habituelle pour déterminer ces hyperparamètres consiste à utiliser une "grille". On se donne un ensemble de valeurs possibles et on estime, pour chacune de ces valeurs, l'erreur de généralisation du meilleur modèle. On s'intéresse, dans cette thèse, à une approche alternative consistant à calculer l'ensemble des solutions possibles pour toutes les valeurs des hyperparamètres. C'est ce qu'on appelle le chemin de régularisation. Il se trouve que pour les problèmes d'apprentissage qui nous intéressent, des programmes quadratiques paramétriques, on montre que le chemin de régularisation associé à certains hyperparamètres est linéaire par morceaux et que son calcul a une complexité numérique de l'ordre d'un multiple entier de la complexité de calcul d'un modèle avec un seul jeu hyper-paramètres.
La thèse est organisée en trois parties. La première donne le cadre général des problèmes d'apprentissage de type SVM (Séparateurs à Vaste Marge ou Support Vector Machines) ainsi que les outils théoriques et algorithmiques permettant d'appréhender ce problème. La deuxième partie traite du problème d'apprentissage supervisé pour la classification et l'ordonnancement dans le cadre des SVM. On montre que le chemin de régularisation de ces problèmes est linéaire par morceaux. Ce résultat nous permet de développer des algorithmes originaux de discrimination et d'ordonnancement. La troisième partie aborde successivement les problèmes d'apprentissage semi supervisé et non supervisé. Pour l'apprentissage semi supervisé, nous introduisons un critère de parcimonie et proposons l'algorithme de chemin de régularisation associé. En ce qui concerne l'apprentissage non supervisé nous utilisons une approche de type "réduction de dimension". Contrairement aux méthodes à base de graphes de similarité qui utilisent un nombre fixe de voisins, nous introduisons une nouvelle méthode permettant un choix adaptatif et approprié du nombre de voisins.
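The piecewise-linear character of the path is easiest to see in a special case not treated as such in the thesis (which works with parametric quadratic programs in general): for an orthonormal design, the lasso solution is a soft-thresholding of z = X^T y, so each coefficient is piecewise linear in the regularization level, with one knot at |z_j|:

```python
import numpy as np

def lasso_path_orthonormal(X, y, lambdas):
    """For orthonormal X the lasso solution is soft-thresholding of
    z = X^T y: each coordinate is piecewise linear in lambda."""
    z = X.T @ y
    return np.array([np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
                     for lam in lambdas])

rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.normal(size=(50, 3)))     # orthonormal design
y = Q @ np.array([2.0, -1.0, 0.5]) + 0.01 * rng.normal(size=50)
path = lasso_path_orthonormal(Q, y, np.linspace(0.0, 2.5, 6))
print(path.round(2))  # coefficients shrink linearly to zero, one knot each
```

Knowing the knots in advance is what makes exact path-following algorithms cheap compared with re-solving on a grid of lambda values.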
Alawieh, Hiba. "Fitting distances and dimension reduction methods with applications." Thesis, Lille 1, 2017. http://www.theses.fr/2017LIL10018/document.
In various studies the number of variables can take high values, which makes their analysis and visualization quite difficult. Several statistical methods have been developed to reduce the complexity of such data, allowing a better comprehension of the knowledge they contain. In this thesis, our aim is to propose two new methods of multivariate data analysis, called "Multidimensional Fitting" and "Projection under pairwise distance control". The first method is a derivative of the multidimensional scaling method (MDS), whose application requires the availability of two matrices describing the same population, a coordinate matrix and a distance matrix; the objective is to modify the coordinate matrix so that the distances calculated on the modified matrix are as close as possible to the distances observed in the distance matrix. Two extensions of this method are proposed: the first penalizes the modification vectors of the coordinates, and the second takes into account random effects that may occur during the modification. The second method is a new dimensionality-reduction technique based on the non-linear projection of points into a reduced space, taking into account the projection quality of each projected point considered individually. The projection is done by introducing additional variables, called "radii", which indicate to what extent the projection of each point is accurate.
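A minimal gradient-descent sketch of the idea behind Multidimensional Fitting, moving a coordinate matrix so that its pairwise distances approach a given distance matrix (no penalization or random effects; all names and data are illustrative):

```python
import numpy as np

def fit_distances(coords, target_d, steps=1000, lr=0.01):
    """Gradient descent on the stress sum((d_ij - target_ij)^2),
    moving the rows of the coordinate matrix so that its pairwise
    distances approach the given distance matrix."""
    P = coords.copy()
    for _ in range(steps):
        diff = P[:, None, :] - P[None, :, :]
        d = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(d, 1.0)                 # avoid division by zero
        g = (d - target_d) / d
        np.fill_diagonal(g, 0.0)
        P -= lr * 2.0 * (g[:, :, None] * diff).sum(axis=1)
    return P

rng = np.random.default_rng(4)
true = rng.normal(size=(10, 2))
target = np.linalg.norm(true[:, None] - true[None, :], axis=-1)
start = true + 0.3 * rng.normal(size=true.shape)  # perturbed coordinates
fitted = fit_distances(start, target)
d_fit = np.linalg.norm(fitted[:, None] - fitted[None, :], axis=-1)
print(np.abs(d_fit - target).max())  # small residual
```

Unlike classical MDS, the starting coordinates are meaningful here, so the modification vectors P - coords are themselves objects of interest (and the quantity the thesis's first extension penalizes).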
Dalalyan, Arnak. "Contribution à la statistique des diffusions. Estimation semiparamétrique et efficacité au second ordre. Agrégation et réduction de dimension pour le modèle de régression." Habilitation à diriger des recherches, Université Pierre et Marie Curie - Paris VI, 2007. http://tel.archives-ouvertes.fr/tel-00192080.
The first chapter contains a general description of the results obtained, placing them in their historical context and presenting the motivations that led us to study these problems. I also describe informally the key ideas of the proofs.
In the second chapter, I present the main definitions needed to state the most important results rigorously. This chapter also contains a more formal discussion highlighting some theoretical and practical aspects of our results.
Zapién, Arreola Karina. "Algorithme de chemin de régularisation pour l'apprentissage statistique." Thesis, Rouen, INSA, 2009. http://www.theses.fr/2009ISAM0001/document.
The selection of a proper model is an essential task in statistical learning. In general, for a given learning task, a set of parameters has to be chosen, each corresponding to a different degree of "complexity". The model selection procedure thus becomes a search for the optimal "complexity", allowing us to estimate a model that assures good generalization. This model selection problem can be summarized as the calculation of one or more hyperparameters defining the model complexity, in contrast to the parameters that specify a model within the chosen complexity class. The usual approach to determining these hyperparameters is a "grid search": given a set of possible values, the generalization error of the best model is estimated for each of them. This thesis focuses on an alternative approach, consisting in calculating the complete set of possible solutions for all hyperparameter values; this is what is called the regularization path. It can be shown that for the problems we are interested in, parametric quadratic programs (PQP), the corresponding regularization path is piecewise linear; moreover, its calculation is no more complex than calculating a single PQP solution. The thesis is organized in three chapters. The first introduces the general setting of a learning problem within the Support Vector Machines (SVM) framework, together with the theory and algorithms that allow us to find a solution. The second deals with supervised learning problems for classification and ranking using SVMs. It is shown that the regularization path of these problems is piecewise linear, and alternative proofs to that of Rosset [Ross 07b] are given via the subdifferential. These results lead to the corresponding algorithms for the mentioned supervised problems. The third part deals with semi-supervised learning problems, followed by unsupervised learning problems.
For semi-supervised learning, a sparsity constraint is introduced along with the corresponding regularization-path algorithm. Graph-based dimensionality-reduction methods are used for the unsupervised learning problems. Our main contribution is a novel algorithm that chooses the number of nearest neighbors in an adaptive and appropriate way, contrary to classical approaches based on a fixed number of neighbors.
Vezard, Laurent. "Réduction de dimension en apprentissage supervisé. Application à l'étude de l'activité cérébrale." PhD thesis, Université Sciences et Technologies - Bordeaux I, 2013. http://tel.archives-ouvertes.fr/tel-00926845.
Shehzad, Muhammad Ahmed. "Pénalisation et réduction de la dimension des variables auxiliaires en théorie des sondages." PhD thesis, Université de Bourgogne, 2012. http://tel.archives-ouvertes.fr/tel-00812880.
Повний текст джерелаDao, Ngoc Bich. "Réduction de dimension de sac de mots visuels grâce à l’analyse formelle de concepts." Thesis, La Rochelle, 2017. http://www.theses.fr/2017LAROS010/document.
In several scientific fields such as statistics, computer vision and machine learning, reducing redundant and/or irrelevant information in the data description (dimension reduction) is an important step. This process comprises two categories, feature extraction and feature selection, of which feature selection in unsupervised learning is still an open question. In this manuscript, we discuss feature selection on image datasets using Formal Concept Analysis (FCA), with a focus on lattice structure and lattice theory. The images in a dataset are described as sets of visual words by the bag-of-visual-words model. Two algorithms are proposed in this thesis to select relevant features; they can be used in both unsupervised and supervised learning. The first, RedAttsSansPerte, is based on lattice structure and lattice theory, which ensure its ability to remove redundant features using the precedence graph. The formal definition of the precedence graph is given in this thesis, and we demonstrate its properties and its relationship with the AC-poset. Experimental results indicate that the RedAttsSansPerte algorithm reduces the size of the feature set while maintaining performance, as evaluated by classification. Second, the RedAttsFloue algorithm, an extension of RedAttsSansPerte, is proposed; it uses a fuzzy precedence graph, whose formal definition and properties are also demonstrated in this manuscript. RedAttsFloue removes redundant and irrelevant features while retaining relevant information according to the flexibility threshold of the fuzzy precedence graph, the quality of the retained information being evaluated by classification. The RedAttsFloue algorithm is suggested to be more robust than RedAttsSansPerte in terms of reduction.
Blazere, Melanie. "Inférence statistique en grande dimension pour des modèles structurels. Modèles linéaires généralisés parcimonieux, méthode PLS et polynômes orthogonaux et détection de communautés dans des graphes." Thesis, Toulouse, INSA, 2015. http://www.theses.fr/2015ISAT0018/document.
This thesis falls within the context of high-dimensional data analysis. Nowadays we have access to an increasing amount of information; the major challenge lies in our ability to explore huge amounts of data and to infer their dependency structures. The purpose of this thesis is to study, and provide theoretical guarantees for, specific methods that estimate dependency structures for high-dimensional data. The first part of the thesis is devoted to the study of sparse models through Lasso-type methods. In Chapter 1 we present the main results on this topic and then generalize the Gaussian case to any distribution from the exponential family. The major contribution to this field is presented in Chapter 2 and consists of oracle inequalities for a Group Lasso procedure applied to generalized linear models; these results show that the estimator achieves good performance under specific conditions on the model. We illustrate this part by considering the case of the Poisson model. The second part concerns linear regression in high dimension, where the sparsity assumption is replaced by a low-dimensional structure underlying the data. We focus in particular on the PLS method, which attempts to find an optimal decomposition of the predictors given a response; the main idea is recalled in Chapter 3. The major contribution to this part is a new explicit analytical expression of the dependency structure linking the predictors to the response, and the next two chapters illustrate the power of this formula by establishing new theoretical results for PLS. The third and last part is dedicated to graph modelling and especially to community detection. After presenting the main trends on this topic, we turn our attention to Spectral Clustering, which clusters the nodes of a graph with respect to a similarity matrix. We suggest an alternative to this method by considering an l1 penalty, and illustrate it through simulations.
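The unpenalised Spectral Clustering baseline that such work builds on can be sketched in a few lines. Here a two-way split reads off the sign of the Fiedler vector of an unnormalised graph Laplacian (illustrative data; no l1 penalty):

```python
import numpy as np

def spectral_bipartition(X, sigma=1.0):
    """Two-way spectral clustering: build a Gaussian similarity matrix,
    form the unnormalised Laplacian L = D - W, and split on the sign of
    the eigenvector of the second-smallest eigenvalue (Fiedler vector)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)
    return (vecs[:, 1] > 0).astype(int)

rng = np.random.default_rng(5)
blob1 = 0.3 * rng.normal(size=(30, 2))
blob2 = 0.3 * rng.normal(size=(30, 2)) + np.array([4.0, 0.0])
labels = spectral_bipartition(np.vstack([blob1, blob2]))
print(labels)  # first 30 points in one cluster, last 30 in the other
```

For k > 2 clusters one instead runs k-means on the first k eigenvectors; normalised Laplacians are the more common choice in practice.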
Spagnol, Adrien. "Indices de sensibilité via des méthodes à noyaux pour des problèmes d'optimisation en grande dimension." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEM012.
This thesis deals with the constrained optimization of high-dimensional black-box problems. Common in industrial applications, these problems frequently have an expensive evaluation cost, which makes most off-the-shelf techniques impractical. To come back to a tractable setup, the dimension of the problem is often reduced using techniques such as sensitivity analysis. A novel sensitivity index is proposed in this work to distinguish influential from negligible subsets of inputs, so as to obtain a more tractable problem by working solely with the former. Our index, relying on the Hilbert-Schmidt independence criterion, provides insight into the impact of a variable on the performance of the output or on constraint satisfaction, key information in our study setting. Besides assessing which inputs are influential, several strategies are proposed to deal with the negligible parameters. Furthermore, expensive industrial applications are often replaced by cheap surrogate models and optimized sequentially. To circumvent the limitations due to the high number of parameters, also known as the curse of dimensionality, we introduce in this thesis an extension of surrogate-based optimization: thanks to the aforementioned sensitivity indices, influential parameters are detected at each iteration and the optimization is conducted in a reduced space.
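The index builds on the Hilbert-Schmidt independence criterion. A standard biased HSIC estimate (not the thesis's goal-oriented variants) is a one-liner over centred Gram matrices:

```python
import numpy as np

def hsic(x, y, sigma=1.0):
    """Biased HSIC estimate with Gaussian kernels:
    HSIC = trace(K H L H) / (n - 1)^2, with H the centring matrix."""
    n = len(x)
    def gram(v):
        return np.exp(-(v[:, None] - v[None, :]) ** 2 / (2 * sigma ** 2))
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(gram(x) @ H @ gram(y) @ H) / (n - 1) ** 2

rng = np.random.default_rng(6)
x = rng.normal(size=300)
h_dep = hsic(x, np.sin(x))                 # dependent pair
h_ind = hsic(x, rng.normal(size=300))      # independent pair
print(h_dep > h_ind)  # True
```

HSIC is zero (for characteristic kernels, in population) exactly under independence, which is why it can screen inputs whose variation the output never responds to, including non-linear, non-monotone responses.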
Tenenhaus, Arthur. "Apprentissage dans les espaces de grande dimension : Application à la caractérisation de tumeurs noires de la peau à partir d'images." PhD thesis, Université Pierre et Marie Curie - Paris VI, 2006. http://tel.archives-ouvertes.fr/tel-00142439.
The next two chapters propose new methods stemming from this study. They rely on supervised dimension-reduction principles, focusing mainly on PLS regression, which is particularly well suited to handling high-dimensional data. The aim was to design classification algorithms built on the algorithmic principles of PLS regression. We proposed Kernel Logistic PLS, a nonlinear binary classification model based both on the construction of latent variables and on Empirical Kernel Map transformations. We extended KL-PLS to the case where the variable to be predicted is polytomous, giving rise to Kernel Multinomial Logistic PLS regression.
Finally, in the last two chapters, we applied these methods to many domains, notably image analysis. We thus contributed to the development of a full-scale application in the medical field by building a tool to assist in the diagnosis of black skin tumours from images.
Lespinats, Sylvain. "Style du génome exploré par analyse textuelle de l'ADN." PhD thesis, Université Pierre et Marie Curie - Paris VI, 2006. http://tel.archives-ouvertes.fr/tel-00151611.
Starting from these observations, we set up procedures for evaluating distances between signatures, so as to make more explicit the biological information on which our analyses rely. A nonlinear neighbourhood-projection method is associated with them, which sidesteps the problems of high dimension and makes it possible to visualize the space occupied by the data. Analysing the relations between signatures raises the question of the contribution of each variable (the words) to the distance between signatures. An original Z-score based on the variation of word frequencies along genomes allowed us to quantify these contributions. Studying the variations of the full set of frequencies along a genome makes it possible to extract atypical segments, and a signal-analysis method allows these atypical regions to be segmented precisely.
Thanks to this set of methods, we obtain biological results. In particular, we exhibit an organization of the space of genomic signatures that is consistent with the taxonomy of species. Moreover, we observe the presence of a syntax of DNA: there exist "syntactic" words and "semantic" words, the signature relying mostly on the syntactic ones. Finally, analysing signatures along the genome enables precise detection and segmentation of RNAs and of probable horizontal transfers. A convergence of the style of horizontal transfers towards the host's signature was also observed.
Varied results have thus been obtained through signature analysis. The simplicity and speed of sequence analysis by signatures make it a powerful tool for extracting biological information from genomes.
Roget-Vial, Céline. "deux contributions à l'étude semi-paramétrique d'un modèle de régression." PhD thesis, Université Rennes 1, 2003. http://tel.archives-ouvertes.fr/tel-00008730.
Dimeglio, Chloé. "Méthodes d'estimations statistiques et apprentissage pour l'imagerie agricole." Toulouse 3, 2013. http://www.theses.fr/2013TOU30110.
We have to provide reliable information on the acreage estimates of crop areas. We have time series of indices contained in satellite images, and thus sets of curves. We propose to segment the space in order to reduce the variability of our initial classes of curves. We then reduce the data volume and find a set of meaningful representative functions that characterise the common behaviour of each crop class; this method is close to the extraction of a "structural mean". We compare each unknown curve to the curves of the representative base and allocate it to the class of the nearest representative curve. In a last step, we learn the estimation error on known data and correct the first estimate by calibration.
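The allocation step described in this abstract — assigning a curve to the class of the nearest representative — can be illustrated with a minimal sketch (toy data, and a plain pointwise mean standing in for the "structural mean"; none of this is the thesis' actual code):

```python
import numpy as np

def class_representatives(curves, labels):
    """One representative curve per class: a simple pointwise mean,
    standing in for the 'structural mean' of the thesis."""
    return {c: curves[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify(curve, reps):
    """Assign a curve to the class of the nearest representative (L2 distance)."""
    return min(reps, key=lambda c: np.linalg.norm(curve - reps[c]))

# Toy example: two hypothetical 'crop classes' with distinct seasonal index profiles.
t = np.linspace(0, 1, 50)
rng = np.random.default_rng(0)
wheat = np.sin(np.pi * t) + 0.1 * rng.standard_normal((20, 50))
maize = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal((20, 50))
curves = np.vstack([wheat, maize])
labels = np.array(["wheat"] * 20 + ["maize"] * 20)

reps = class_representatives(curves, labels)
print(classify(np.sin(np.pi * t), reps))  # → wheat
```

The calibration of the final acreage estimate against known data is a separate step not sketched here.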
Romary, Thomas. "Inversion des modèles stochastiques de milieux hétérogènes." Paris 6, 2008. https://tel.archives-ouvertes.fr/tel-00395528.
Thirion, Bertrand. "Analyse de données d'IRM fonctionnelle : statistiques, information et dynamique." Phd thesis, Télécom ParisTech, 2003. http://tel.archives-ouvertes.fr/tel-00457460.
Girard, Sylvain. "Diagnostic du colmatage des générateurs de vapeur à l'aide de modèles physiques et statistiques." Phd thesis, Ecole Nationale Supérieure des Mines de Paris, 2012. http://pastel.archives-ouvertes.fr/pastel-00798355.
Chiapino, Maël. "Apprentissage de structures dans les valeurs extrêmes en grande dimension." Thesis, Paris, ENST, 2018. http://www.theses.fr/2018ENST0035/document.
We present and study unsupervised learning methods for multivariate extreme phenomena in high dimension. For a random vector whose marginals are heavy-tailed, the study of its behaviour in extreme regions is no longer possible via the usual methods, which involve finite means and variances. Multivariate extreme value theory provides a framework adapted to this study. In particular, it gives a theoretical basis for dimension reduction through the angular measure. The thesis is divided into two main parts: - Reduce the dimension by finding a simplified dependence structure in extreme regions. This step aims at recovering subgroups of features that are likely to exceed large thresholds simultaneously. - Model the angular measure with a mixture distribution that follows a predefined dependence structure. These steps allow us to develop new clustering methods for extreme points in high dimension.
Raphel, Fabien. "Mathematical modelling and learning of biomedical signals for safety pharmacology." Thesis, Sorbonne université, 2022. http://www.theses.fr/2022SORUS116.
As a branch of pharmacology, cardiac safety pharmacology aims at investigating compound side effects on the cardiac system at therapeutic doses. These investigations, made through in silico, in vitro and in vivo experiments, allow a compound to be selected or rejected at each step of the drug development process. A large subdomain of cardiac safety pharmacology is devoted to the study of the electrical activity of cardiac cells based on in silico and in vitro assays. This electrical activity is the consequence of exchanges of polarised structures (mainly ions) between the extracellular and intracellular media. A modification of the ionic exchanges induces changes in the electrical activity of the cardiac cell which can be pathological (e.g. by generating arrhythmia). A strong understanding of these electrical signals is therefore essential to prevent the risk of lethal events. Patch-clamp techniques are the most common methods to record the electrical activity of a cardiac cell. Although the resulting electrical signals are well known, these techniques are slow and tedious to perform, and therefore expensive. A recent alternative is to consider microelectrode array (MEA) devices. Originally developed for the study of neurons, their extension to cardiac cells allows a high-throughput screening that was not possible with patch-clamp techniques. An MEA consists of a plate with wells in which cardiac cells (forming a tissue) cover some electrodes. The extension of these devices to cardiac cells therefore makes it possible to record the electrical activity of the cells at the tissue level (before and after compound addition into the wells). As this signal is new, many studies remain to be done to understand how ionic exchanges induce the recorded electrical activity and, finally, to carry out the selection or rejection of a compound. Although these signals are still not well understood, recent studies have shown promising results regarding the use of MEAs in cardiac safety pharmacology.
The automation of compound selection/rejection is still challenging and far from industrial application, which is the final goal of this manuscript. Mathematically, the selection/rejection process can be seen as a binary classification problem. As in any supervised classification task (and machine learning more generally), an input has to be defined. In our case, the time series of cardiac electrical activity are possibly long (minutes or hours) with a high sampling rate (∼ kHz), leading to an input living in a high-dimensional space (hundreds, thousands of dimensions or even more). Moreover, the number of available data is still low (at most hundreds). This critical regime, named high dimension/low sample size, makes the context challenging. The aim of this manuscript is to provide a systematic strategy to select/reject compounds in an automated way, under the following constraints: • Deal with the high dimension/low sample size regime. • Make no assumptions on the data distributions. • Exploit in silico models to improve classification performance. • Require no or few parameters to tune. The first part of the manuscript is devoted to the context, followed by a description of the patch-clamp and MEA technologies. This part ends with a description of action potential and field potential models used to perform in silico experiments. In a second part, two methodological aspects are developed, trying to comply as far as possible with the constraints of the industrial application. The first one describes a doubly greedy, goal-oriented strategy to reduce the input space, based on a score function related to the classification success rate. Comparisons with classical dimension reduction methods such as PCA and PLS (with default parameters) are performed, showing that the proposed method leads to better results.
The second method consists in the construction of an augmented training set from a reservoir of simulations, by considering the Hausdorff distance between sets and the maximisation of the same score function as in the first method. The proposed strategy [...]
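The greedy, score-driven reduction of the input space described in this abstract can be illustrated with a minimal forward-selection sketch (the score function here, training accuracy of a nearest-centroid classifier on toy data, is a hypothetical stand-in for the thesis' actual criterion):

```python
import numpy as np

def centroid_accuracy(X, y):
    """Score a feature subset: training accuracy of a nearest-centroid classifier."""
    classes = np.unique(y)
    cents = np.array([X[y == c].mean(axis=0) for c in classes])
    d = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return (classes[d.argmin(axis=1)] == y).mean()

def greedy_forward_selection(X, y, n_features):
    """Greedily add the feature that most improves the classification score."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_features):
        best = max(remaining, key=lambda j: centroid_accuracy(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: only features 0 and 1 carry class information, the rest is noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.standard_normal((100, 10))
X[:, 0] += 3 * y
X[:, 1] -= 3 * y
print(greedy_forward_selection(X, y, 2))  # indices of the selected features
```

A goal-oriented criterion of this kind directly targets the classification success rate, unlike generic projections such as PCA.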
Gkamas, Theodosios. "Modélisation statistique de tenseurs d'ordre supérieur en imagerie par résonance magnétique de diffusion." Thesis, Strasbourg, 2015. http://www.theses.fr/2015STRAD036/document.
DW-MRI is a non-invasive way to study in vivo the structure of nerve fibers in the brain. In this thesis, fourth-order tensors (T4) were used to model DW-MRI data. In addition, the problems of comparing a group, or an individual, against a normal group were discussed and solved using statistical analysis on T4s. The approaches use nonlinear dimension reduction, assisted by non-Euclidean metrics for T4s. The statistics are calculated in the reduced space and allow us to quantify the dissimilarity between the group (or the individual) of interest and the reference group. The proposed approaches are applied to neuromyelitis optica and to patients with locked-in syndrome. The derived conclusions are consistent with current medical knowledge.
Brunet, Camille. "Classification parcimonieuse et discriminante de données complexes. Une application à la cytologie." Phd thesis, Université d'Evry-Val d'Essonne, 2011. http://tel.archives-ouvertes.fr/tel-00671333.
Zwald, Laurent. "PERFORMANCES STATISTIQUES D'ALGORITHMES D'APPRENTISSAGE : ``KERNEL PROJECTION MACHINE'' ET ANALYSE EN COMPOSANTES PRINCIPALES A NOYAU." Phd thesis, Université Paris Sud - Paris XI, 2005. http://tel.archives-ouvertes.fr/tel-00012011.
Повний текст джерелаdes contributions à la communauté du machine learning en utilisant des
techniques de statistiques modernes basées sur des avancées dans l'étude
des processus empiriques. Dans une première partie, les propriétés statistiques de
l'analyse en composantes principales à noyau (KPCA) sont explorées. Le
comportement de l'erreur de reconstruction est étudié avec un point de vue
non-asymptotique et des inégalités de concentration des valeurs propres de la matrice de
Gram sont données. Tous ces résultats impliquent des vitesses de
convergence rapides. Des propriétés
non-asymptotiques concernant les espaces propres de la KPCA eux-mêmes sont également
proposées. Dans une deuxième partie, un nouvel
algorithme de classification a été
conçu : la Kernel Projection Machine (KPM).
Tout en s'inspirant des Support Vector Machines (SVM), il met en lumière que la sélection d'un espace vectoriel par une méthode de
réduction de la dimension telle que la KPCA régularise
convenablement. Le choix de l'espace vectoriel utilisé par la KPM est guidé par des études statistiques de sélection de modéle par minimisation pénalisée de la perte empirique. Ce
principe de régularisation est étroitement relié à la projection fini-dimensionnelle étudiée dans les travaux statistiques de
Birgé et Massart. Les performances de la KPM et de la SVM sont ensuite comparées sur différents jeux de données. Chaque thème abordé dans cette thèse soulève de nouvelles questions d'ordre théorique et pratique.
Malfante, Marielle. "Automatic classification of natural signals for environmental monitoring." Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAU025/document.
This manuscript summarises three years of work addressing the use of machine learning for the automatic analysis of natural signals. The main goal of this PhD is to produce efficient and operative frameworks for the analysis of environmental signals, in order to gather knowledge and better understand the considered environment. In particular, we focus on the automatic detection and classification of natural events. This thesis proposes two tools based on supervised machine learning (Support Vector Machine, Random Forest) for (i) the automatic classification of events and (ii) the automatic detection and classification of events. The success of the proposed approaches lies in the feature space used to represent the signals, which relies on a detailed description of the raw acquisitions in various domains: temporal, spectral and cepstral. A comparison with features extracted using convolutional neural networks (deep learning) is also made, and favours the physical features over deep learning methods for representing transient signals. The proposed tools are tested and validated on real-world acquisitions from different environments: (i) underwater and (ii) volcanic areas. The first application considered in this thesis is devoted to the monitoring of coastal underwater areas using acoustic signals: continuous recordings are analysed to automatically detect and classify fish sounds. A day-to-day pattern in fish behaviour is revealed. The second application targets volcano monitoring: the proposed system classifies seismic events into categories which can be associated with different phases of the internal activity of volcanoes. The study is conducted on six years of volcano-seismic data recorded on Ubinas volcano (Peru). In particular, the outcomes of the proposed automatic classification system helped in the discovery of misclassifications in the manual annotation of the recordings.
In addition, the proposed automatic classification framework for volcano-seismic signals has been deployed and tested in Indonesia for the monitoring of Mount Merapi. The software implementation of the framework developed in this thesis has been gathered in the Automatic Analysis Architecture (AAA) package and is freely available.
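The kind of hand-crafted temporal and spectral descriptors this abstract refers to can be sketched as follows (a generic, hypothetical feature subset, not the actual feature set of the AAA package):

```python
import numpy as np

def describe_signal(x, fs):
    """A few temporal and spectral descriptors of a 1-D signal."""
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    centroid = (freqs * spec).sum() / spec.sum()   # spectral centroid (Hz)
    return {
        "energy": float((x ** 2).sum()),
        "rms": float(np.sqrt((x ** 2).mean())),
        "zero_crossing_rate": float((np.diff(np.sign(x)) != 0).mean()),
        "spectral_centroid": float(centroid),
    }

fs = 1000.0
t = np.arange(0, 1, 1 / fs)
feats = describe_signal(np.sin(2 * np.pi * 50 * t), fs)
print(round(feats["spectral_centroid"]))  # → 50 (Hz, for a pure 50 Hz tone)
```

Vectors of such descriptors, computed per event, are what a Random Forest or SVM classifier would then be trained on.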
Belhadji, Ayoub. "Echantillonnage des sous-espaces à l’aide des processus ponctuels déterminantaux." Thesis, Ecole centrale de Lille, 2020. http://www.theses.fr/2020ECLI0021.
Determinantal point processes are probabilistic models of repulsion. These models have been studied in various fields: random matrices, quantum optics, spatial statistics, image processing, machine learning and, recently, numerical integration. In this thesis, we study subspace sampling using determinantal point processes. This problem lies at the intersection of three subdomains of approximation theory: subset selection, kernel quadrature and kernel interpolation. We study these classical topics through a new interpretation of these probabilistic models: a determinantal point process is a natural way to define a random subspace. Besides giving a unified analysis of numerical integration and interpolation under determinantal point processes, this new perspective allows us to work out the theoretical guarantees of several approximation algorithms and to prove their optimality in some settings.
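The idea of a determinantal point process as a random subspace can be made concrete with a standard exact sampler for projection DPPs (a textbook HKPV-style sketch, not code from the thesis):

```python
import numpy as np

def sample_projection_dpp(V, rng):
    """Sample from the projection DPP with kernel K = V V^T,
    where V has orthonormal columns (N x k). Returns k distinct indices."""
    V = V.copy()
    N, k = V.shape
    items = []
    for _ in range(k):
        # inclusion probabilities are proportional to the squared row norms
        p = (V ** 2).sum(axis=1)
        p /= p.sum()
        i = rng.choice(N, p=p)
        items.append(int(i))
        # eliminate the direction that 'sees' item i, then re-orthonormalize;
        # row i of V becomes zero, so i cannot be drawn again
        j = np.argmax(np.abs(V[i]))
        V = V - np.outer(V[:, j], V[i] / V[i, j])
        V = np.delete(V, j, axis=1)
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)
    return sorted(items)

rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((10, 4)))
print(sample_projection_dpp(V, rng))  # 4 distinct, mutually repulsive indices
```

Each draw selects exactly k items whose feature rows tend to span the column space of V well, which is what makes such samples natural candidates for subset selection and kernel quadrature nodes.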
Raja, Suleiman Raja Fazliza. "Méthodes de detection robustes avec apprentissage de dictionnaires. Applications à des données hyperspectrales." Thesis, Nice, 2014. http://www.theses.fr/2014NICE4121/document.
This Ph.D dissertation deals with a "one among many" detection problem, where one has to discriminate between pure noise under H0 and one among L known alternatives under H1. This work focuses on the study and implementation of robust reduced-dimension detection tests using optimised dictionaries. These detection methods are associated with the Generalized Likelihood Ratio test. The proposed approaches are principally assessed on hyperspectral data. In the first part, several technical topics associated with the framework of this dissertation are presented. The second part highlights the theoretical and algorithmic aspects of the proposed methods. Two issues linked to the large number of alternatives arise in this framework. In this context, we propose dictionary learning techniques based on a robust criterion that seeks to minimise the maximum power loss (minimax type). In the case where the learned dictionary has K = 1 column, we show that the exact solution can be obtained. We then propose, in the case K > 1, three minimax learning algorithms. Finally, the third part of this manuscript presents several applications. The principal application concerns astrophysical hyperspectral data from the Multi Unit Spectroscopic Explorer instrument. Numerical results show that the proposed algorithms are robust and that in the case K > 1 they increase the minimax detection performance over the K = 1 case. Other possible applications, such as worst-case recognition of faces and handwritten digits, are presented.
Vuillemin, Pierre. "Approximation de modèles dynamiques de grande dimension sur intervalles de fréquences limités." Thesis, Toulouse, ISAE, 2014. http://www.theses.fr/2014ESAE0041/document.
Physical systems are represented by mathematical models in order to be simulated, analysed or controlled. Depending on the complexity of the physical system it is meant to represent and on the way it has been built, a model can be more or less complex. This complexity can become an issue in practice due to the limited computational power and memory of computers. One way to alleviate this issue consists in using model approximation, which is aimed at finding a simpler model that still represents the physical system faithfully. In the case of Linear Time Invariant (LTI) dynamical models, complexity translates into a large dimension of the state vector, and one talks about large-scale models. Model approximation, in this case also called model reduction, consists in finding a model with a smaller state vector such that the input-to-output behaviours of both models are close with respect to some measure. The H2-norm has been extensively used in the literature to evaluate the quality of a reduced-order model. Yet, due to the limited bandwidth of actuators and sensors, and the fact that models are generally representative on a bounded frequency interval only, a reduced-order model that faithfully reproduces the behaviour of the large-scale one over a bounded frequency interval only may be more relevant. That is why, in this study, the frequency-limited H2-norm, or H2,Ω-norm, which is the restriction of the H2-norm to a frequency interval, has been considered. In particular, the problem of finding a reduced-order model that minimises the H2,Ω-norm of the approximation error with the large-scale model has been addressed here. For that purpose, two approaches have been developed. The first one is an empirical approach based on the modification of a sub-optimal H2 model approximation method. Its performances are interesting in practice and compete with some well-known frequency-limited approximation methods.
The second one is an optimisation method relying on the poles-residues formulation of the H2,Ω-norm. This formulation naturally extends the one existing for the H2-norm and can also be used to derive two upper bounds on the H∞-norm of LTI dynamical models, which is of particular interest in model reduction. The first-order optimality conditions of the optimal H2,Ω approximation problem are derived and used to build a complex-domain descent algorithm aimed at finding a local minimum of the problem. Together with the H∞ bounds on the approximation error, this approach is used to perform control of large-scale models. From a practical point of view, the methods proposed in this study have been successfully applied in an industrial context as part of the global process aimed at controlling a flexible civilian aircraft.
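For reference, the frequency-limited H2-norm mentioned in this abstract is commonly written (a standard formulation over a band Ω = [−ω, ω] for a transfer function G, not quoted from the thesis itself) as the restriction of the H2 integral to that band:

```latex
\|G\|_{\mathcal{H}_{2,\Omega}}^{2}
  \;=\;
  \frac{1}{2\pi} \int_{-\omega}^{\omega}
    \operatorname{tr}\!\left( G(i\nu)\, G(i\nu)^{H} \right) \mathrm{d}\nu .
```

Taking ω → ∞ recovers the usual H2-norm, which is why results for the H2 case extend naturally to the frequency-limited one.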
Durif, Ghislain. "Multivariate analysis of high-throughput sequencing data." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE1334/document.
The statistical analysis of Next-Generation Sequencing data raises many computational challenges regarding modeling and inference, especially because of the high dimensionality of genomic data. The research work in this manuscript concerns hybrid dimension reduction methods that rely on both compression (representation of the data into a lower dimensional space) and variable selection. Developments are made concerning: the sparse Partial Least Squares (PLS) regression framework for supervised classification, and the sparse matrix factorization framework for unsupervised exploration. In both situations, our main purpose will be to focus on the reconstruction and visualization of the data. First, we will present a new sparse PLS approach, based on an adaptive sparsity-inducing penalty, that is suitable for logistic regression to predict the label of a discrete outcome. For instance, such a method will be used for prediction (fate of patients or specific type of unidentified single cells) based on gene expression profiles. The main issue in such framework is to account for the response to discard irrelevant variables. We will highlight the direct link between the derivation of the algorithms and the reliability of the results. Then, motivated by questions regarding single-cell data analysis, we propose a flexible model-based approach for the factorization of count matrices, that accounts for over-dispersion as well as zero-inflation (both characteristic of single-cell data), for which we derive an estimation procedure based on variational inference. In this scheme, we consider probabilistic variable selection based on a spike-and-slab model suitable for count data. The interest of our procedure for data reconstruction, visualization and clustering will be illustrated by simulation experiments and by preliminary results on single-cell data analysis.
All the proposed methods were implemented in two R packages, "plsgenomics" and "CMF", based on high-performance computing.
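The sparsity-inducing PLS step at the heart of the first contribution can be sketched in a few lines (a generic soft-thresholding variant with a fixed rather than adaptive penalty, unrelated to the plsgenomics implementation):

```python
import numpy as np

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_pls_weights(X, y, lam):
    """First sparse PLS weight vector: threshold the covariances X^T y,
    so that genes weakly related to the response are discarded."""
    w = soft_threshold(X.T @ (y - y.mean()), lam)
    n = np.linalg.norm(w)
    return w / n if n > 0 else w

# Toy data: 60 samples, 100 'genes', only gene 0 is informative.
rng = np.random.default_rng(0)
y = np.repeat([0.0, 1.0], 30)
X = rng.standard_normal((60, 100))
X[:, 0] += 2 * y
w = sparse_pls_weights(X, y, lam=12.0)
print(np.nonzero(w)[0])  # a sparse support that retains gene 0
```

Subsequent PLS components and the logistic-regression adaptation add machinery on top of this thresholded covariance step.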
Palazzo, Martin. "Dimensionality Reduction of Biomedical Tumor Profiles : a Machine Learning Approach." Thesis, Troyes, 2021. http://www.theses.fr/2021TROY0031.
The increasing pace of data generation from tumor profiles during the last decade has enabled the development of statistical learning algorithms to explore and analyze the landscape of tumor types, subtypes and patient survival from a biomolecular point of view. Tumor data are mainly described by transcriptomic features, i.e. the level of expression of a given gene transcript in the tumor cell; these features can therefore be used to learn statistical rules that improve the understanding of the state and type of a cancer cell. Nevertheless, transcriptomic tumor data are high-dimensional, and each tumor can be described by thousands of gene features, making it difficult to perform a machine learning task and to understand the underlying biological mechanisms. This thesis studies how to reduce dimensionality and to gain interpretability about which genes encode signals of the data distribution, by proposing dimension reduction methods based on feature selection and feature extraction pipelines. The proposed methods are based on latent variable models and kernel methods, with the idea of exploring the connection between pairwise similarity functions of tumor samples and low-dimensional latent spaces that capture the inner structure of the training data. The proposed methods have shown improvements in supervised and unsupervised feature selection tasks when compared with benchmark methods to classify and learn subgroups of tumors, respectively.
El Anbari, Mohammed. "Regularisation and variable selection using penalized likelihood." Phd thesis, Université Paris Sud - Paris XI, 2011. http://tel.archives-ouvertes.fr/tel-00661689.
Heredia Guzman, Maria Belen. "Contributions to the calibration and global sensitivity analysis of snow avalanche numerical models." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALU028.
A snow avalanche is a natural hazard defined as a snow mass in fast motion. Since the thirties, scientists have been designing models to describe snow avalanches. However, these models depend on some poorly known input parameters that cannot be measured. To better understand model input parameters and model outputs, the aims of this thesis are (i) to propose a framework to calibrate input parameters and (ii) to develop methods to rank input parameters according to their importance in the model, taking into account the functional nature of the outputs. For these two purposes, we develop statistical methods based on Bayesian inference and global sensitivity analysis. All the developments are illustrated on test cases and real snow avalanche data. First, we propose a Bayesian inference method to retrieve input parameter distributions from avalanche velocity time series collected on experimental test sites. Our results show that it is important to include the error structure (in our case the autocorrelation) in the statistical modeling in order to avoid bias in the estimation of friction parameters. Second, to identify important input parameters, we develop two methods based on variance-based measures. For the first method, we suppose that we have a given data sample and want to estimate sensitivity measures with this sample. For this purpose, we develop a nonparametric estimation procedure based on the Nadaraya-Watson kernel smoother to estimate aggregated Sobol' indices. For the second method, we consider the setting where the sample is obtained from acceptance/rejection rules corresponding to physical constraints. The set of input parameters becomes dependent due to the acceptance-rejection sampling, so we propose to estimate aggregated Shapley effects (an extension of Shapley effects to multivariate or functional outputs). We also propose an algorithm to construct bootstrap confidence intervals.
For the snow avalanche model application, we consider different uncertainty scenarios to model the input parameters. Under our scenarios, the avalanche release position and volume are the most crucial inputs. Our contributions should help avalanche scientists to (i) account for the error structure in model calibration and (ii) rank input parameters according to their importance in the models using statistical methods.
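The first sensitivity method — estimating variance-based indices with a Nadaraya-Watson smoother from a given sample — can be sketched for a scalar output, using the first-order index S_i = Var(E[Y|X_i]) / Var(Y) (a hypothetical toy model, not the avalanche code, and plain rather than aggregated indices):

```python
import numpy as np

def nw_smoother(x, y, h):
    """Nadaraya-Watson fit of E[Y | X = x] at the sample points (Gaussian kernel)."""
    d = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * d ** 2)
    return (w * y[None, :]).sum(axis=1) / w.sum(axis=1)

def first_order_sobol(xi, y, h=0.05):
    """S_i = Var(E[Y | X_i]) / Var(Y), with the conditional mean estimated by NW."""
    m = nw_smoother(xi, y, h)
    return m.var() / y.var()

# Toy model: Y depends strongly on X1 and weakly on X2.
rng = np.random.default_rng(1)
x1 = rng.uniform(size=2000)
x2 = rng.uniform(size=2000)
y = x1 + 0.1 * x2
print(round(first_order_sobol(x1, y), 2))  # close to 1: x1 drives most of the variance
```

The aggregated indices of the thesis extend this scalar ratio to functional outputs by summing variance contributions across output components.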
Lefieux, Vincent. "Modèles semi-paramétriques appliqués à la prévision des séries temporelles : cas de la consommation d’électricité." Phd thesis, Rennes 2, 2007. https://theses.hal.science/tel-00179866/fr/.
Réseau de Transport d'Electricité (RTE), in charge of operating the French electricity transmission grid, needs accurate forecasts of power consumption in order to operate it correctly. The forecasts used every day result from a model combining nonlinear parametric regression and a SARIMA model. In order to obtain an adaptive forecasting model, nonparametric forecasting methods have already been tested without real success. In particular, it is known that a nonparametric predictor behaves badly with a great number of explanatory variables, which is commonly called the curse of dimensionality. Recently, semiparametric methods that improve on the pure nonparametric approach have been proposed to estimate a regression function. Based on the concept of dimension reduction, one of those methods (called MAVE: Moving Average -conditional- Variance Estimate) can be applied to time series. We study empirically its effectiveness in predicting the future values of an autoregressive time series. We then adapt this method, from a practical point of view, to forecast power consumption. We propose a partially linear semiparametric model, based on the MAVE method, which allows the autoregressive aspect of the problem and the exogenous variables to be taken into account simultaneously. The proposed estimation procedure is practically efficient.
Brunet, Camille. "Sparse and discriminative clustering for complex data : application to cytology." Thesis, Evry-Val d'Essonne, 2011. http://www.theses.fr/2011EVRY0018/document.
The main topics of this manuscript are sparsity and discrimination for modeling complex data. In a first part, we focus on the GMM context: we introduce a new family of probabilistic models which both cluster the data and find a discriminative subspace chosen so as to best discriminate the groups. A family of 12 DLM models is introduced, based on three ideas: firstly, the actual data live in a latent subspace with an intrinsic dimension lower than the dimension of the observed space; secondly, a subspace of K-1 dimensions is theoretically sufficient to discriminate K groups; thirdly, the observation and latent spaces are linked by a linear transformation. An estimation procedure, named Fisher-EM, is proposed and improves, most of the time, clustering performance owing to the use of a discriminative subspace. As each axis spanning the discriminative subspace is a linear combination of all the original variables, we propose three different methods based on a penalised criterion in order to ease the interpretation of the results. In particular, this allows sparsity to be introduced directly in the loadings of the projection matrix, which also enables variable selection for clustering. In a second part, we deal with the seriation context. We propose a dissimilarity measure based on a common neighbourhood which makes it possible to deal with noisy data and overlapping groups. A forward stepwise seriation algorithm, called the PB-Clus algorithm, is introduced and provides a block representation of the data. This tool reveals the intrinsic structure of the data even in the case of noise, outliers, overlapping and non-Gaussian groups. Both methods have been validated on a biological application based on cancer cell detection.
Portier, François. "Réduction de la dimension en régression." Phd thesis, Université Rennes 1, 2013. http://tel.archives-ouvertes.fr/tel-00871049.
Giacofci, Madison. "Classification non supervisée et sélection de variables dans les modèles mixtes fonctionnels. Applications à la biologie moléculaire." Phd thesis, Université de Grenoble, 2013. http://tel.archives-ouvertes.fr/tel-00987441.
Grishchenko, Dmitry. "Optimisation proximale avec réduction automatique de dimension." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM055.
In this thesis, we develop a framework to reduce the dimensionality of composite optimization problems with sparsity-inducing regularizers. Based on the identification property of proximal methods, we first develop a "sketch-and-project" method that uses projections based on the structure of the current point. This method allows one to work with random low-dimensional subspaces instead of considering the full space when the final solution is sparse. Second, we place ourselves in the context of delay-tolerant asynchronous proximal methods and use our dimension reduction technique to decrease the total size of communications. However, this technique is proven to converge only for well-conditioned problems, both in theory and in practice. Thus, we investigate wrapping it up into a proximal reconditioning framework. This leads to a theoretically backed algorithm that is guaranteed to cost less in terms of communications compared with a non-sparsified version; we show in practice that it also implies faster runtime convergence when the problem is sufficiently sparse.
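The identification property invoked in this abstract — after enough iterations, a proximal method's iterates share the sparsity pattern of the solution, so most coordinates can be dropped — can be illustrated on a tiny lasso problem (generic ISTA, not the thesis' sketch-and-project algorithm):

```python
import numpy as np

def ista(A, b, lam, step, iters=500):
    """Proximal gradient (ISTA) for min 0.5*||Ax - b||^2 + lam*||x||_1."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ x - b)                                     # gradient of the smooth part
        z = x - step * g
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # prox of lam*||.||_1
    return x

# Toy problem: a sparse signal observed through a random Gaussian matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20))
x_true = np.zeros(20)
x_true[[2, 7]] = [3.0, -2.0]
b = A @ x_true
x = ista(A, b, lam=5.0, step=1.0 / np.linalg.norm(A, 2) ** 2)
print(np.nonzero(x)[0])  # only a few coordinates remain active
```

Once the active set has stabilised like this, updates restricted to (a random subspace of) the identified coordinates are much cheaper than full-dimensional ones, which is what the communication savings build on.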
Prigent, Sylvain. "Apport de l'imagerie multi et hyperspectrale pour l'évaluation de la pigmentation de la peau." Phd thesis, Université de Nice Sophia-Antipolis, 2012. http://tel.archives-ouvertes.fr/tel-00764831.
Tilquin, Florian. "Statistical models on manifolds for anomaly detection in medical images." Thesis, Strasbourg, 2019. https://publication-theses.unistra.fr/public/theses_doctorat/2019/TILQUIN_Florian_2019_ED269.pdf.
We consider the detection of abnormal patterns in neuroimaging data, in the context of comparing a single subject to a normal control group. Standard approaches to anomaly detection are related to the one-class classification problem, in which one tries to detect outliers (corresponding here to "abnormal" subjects) with respect to a learned distribution of normal controls. These approaches make a global statement about the subject (i.e. pathological or not) but do not provide a spatial localization of abnormal patterns within the subject's image data. On the other hand, the approaches developed for localizing subject-specific abnormalities generally resort to univariate voxel-wise or ROI-based statistical tests and rely on a Gaussian distribution assumption. In this thesis we present and compare several standard and novel methods for the detection and localization of subject-specific abnormal patterns within the framework of subject-versus-group comparison. The proposed methods rely on a global (multivariate) non-linear model of normal image data, which enables the representation of complex spatial patterns with non-Gaussian distributions. The manifold of normal image patterns is learned from a control group with the help of non-linear dimension-reduction techniques. Identifying abnormalities then amounts to finding the projection of a subject onto the manifold in which the control group lies. The detection itself involves a statistical test on the residual between the projection and the original image. Different types of synthetic datasets were created for the purpose of comparing the approaches. Experiments on synthetic data underline the benefit of multivariate representations over standard univariate approaches. Conclusions regarding the comparison of linear and non-linear multivariate approaches can differ broadly depending on the kind of dataset being analysed.
All methods are also illustrated on the detection of abnormal spatial patterns in neuroimaging data of patients afflicted with dementia.
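The project-then-test pipeline the abstract describes can be sketched with a linear stand-in: plain PCA below plays the role of the thesis's non-linear manifold learning, and the per-voxel z-map is one simple choice of residual test. All names and the scoring scheme are assumptions of this sketch, not the thesis's actual models.

```python
import numpy as np

def residual_z_map(controls, subject, d=5):
    """Sketch of the subject-versus-group comparison: learn a d-dim
    subspace from the control group (PCA stands in for non-linear
    manifold learning), project the subject onto it, and score the
    residual voxel-wise against the controls' own residual spread."""
    mu = controls.mean(0)
    Vt = np.linalg.svd(controls - mu, full_matrices=False)[2][:d]  # (d, p)
    project = lambda v: mu + (v - mu) @ Vt.T @ Vt   # projection onto "manifold"
    res = controls - np.array([project(c) for c in controls])
    sd = res.std(0) + 1e-12                         # per-voxel residual spread
    return (subject - project(subject)) / sd        # z-score per voxel
```

A large |z| at a voxel flags a pattern the normal-control model cannot reconstruct there, giving the spatial localization that global one-class classifiers lack.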
Guérif, Sébastien. "Réduction de dimension en apprentissage numérique non supervisé." Paris 13, 2006. http://www.theses.fr/2006PA132032.