Theses on the topic "Analyse des données compositionnelles"
Cite a source in APA, MLA, Chicago, Harvard, and many other citation styles
See the top 50 dissertations (master's and doctoral theses) on the research topic "Analyse des données compositionnelles".
Next to every source in the reference list there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic citation of the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the scientific publication as a .pdf and read the abstract of the work online, when it is available in the metadata.
Browse theses from many scientific fields and compile an accurate bibliography.
Illous, Hugo. "Abstractions relationnelles de la mémoire pour une analyse compositionnelle de structures de données". Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLEE015.
Static analyses aim at inferring semantic properties of programs. We distinguish two important classes of static analyses: state analyses and relational analyses. While state analyses aim at computing an over-approximation of the reachable states of programs, relational analyses aim at computing functional properties over the input-output states of programs. Relational analyses offer several advantages, such as their ability to infer semantic properties that are more expressive than those of state analyses. Moreover, they make the analysis compositional, using input-output relations as summaries for procedures, which is an advantage for scalability. In the case of numeric programs, several analyses have been proposed that utilize relational numerical abstract domains to describe relations. On the other hand, designing abstractions for relations over input-output memory states that take shapes into account is challenging. In this thesis, we propose a set of novel logical connectives to describe such relations, which rely on separation logic. This logic can express that certain memory areas are unchanged, freshly allocated, or freed, or that only part of the memory is modified (and how). Using these connectives, we build an abstract domain and design a compositional static analysis by abstract interpretation that over-approximates relations over memory states containing inductive structures. We implement this approach as a plug-in of the FRAMA-C analyzer. We evaluate it on small C programs that manipulate singly linked lists and binary trees, but also on a bigger program that consists of a part of Emacs. The experimental results show that our approach allows us to infer more expressive semantic properties than state analyses, from a logical point of view. It is also much faster on programs with a large number of function calls, without losing precision.
Soret, Perrine. "Régression pénalisée de type Lasso pour l’analyse de données biologiques de grande dimension : application à la charge virale du VIH censurée par une limite de quantification et aux données compositionnelles du microbiote". Thesis, Bordeaux, 2019. http://www.theses.fr/2019BORD0254.
In clinical studies, thanks to technological progress, the amount of information collected on a single patient keeps growing, leading to situations where the number of explanatory variables is greater than the number of individuals. The Lasso method proved to be appropriate to circumvent overfitting problems in high-dimensional settings. This thesis is devoted to the application and development of Lasso-penalized regression for clinical data presenting particular structures. First, in patients with the human immunodeficiency virus, mutations in the virus's genetic structure may be related to the development of drug resistance. Predicting the viral load from (potentially numerous) mutations helps guide treatment choice. Below a certain threshold, the viral load is undetectable, so the data are left-censored. We propose two new Lasso approaches based on the Buckley-James algorithm, which imputes censored values by a conditional expectation. By reversing the response, we obtain a right-censored problem, for which non-parametric estimates of the conditional expectation have been proposed in survival analysis. Finally, we propose a parametric estimation based on a Gaussian hypothesis. Secondly, we are interested in the role of the microbiota in the deterioration of respiratory health. Microbiota data are presented as relative abundances (the proportion of each species per individual, known as compositional data) and have a phylogenetic structure. We established a state of the art of statistical methods for the analysis of microbiota data. Because these methods are recent, few recommendations exist on their applicability and effectiveness. A simulation study allowed us to compare the selection capacity of penalization methods proposed specifically for this type of data. We then apply this research to the analysis of the association between bacteria/fungi and the decline of pulmonary function in patients with cystic fibrosis from the MucoFong project.
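The abstract above centers on Lasso-penalized regression. As an illustration only (this is not the thesis's own code, and it omits the Buckley-James censoring step), here is a minimal coordinate-descent Lasso sketch in NumPy; the function names and the toy data are invented for the example:

```python
import numpy as np

def soft_threshold(rho, lam):
    """Soft-thresholding operator at the heart of Lasso coordinate descent."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Plain cyclic coordinate-descent Lasso for (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual without feature j
            rho = X[:, j] @ r / n
            beta[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
    return beta

# Toy high-signal example: only the first two predictors matter
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
true_beta = np.zeros(10)
true_beta[:2] = [3.0, -2.0]
y = X @ true_beta + 0.1 * rng.standard_normal(100)

beta = lasso_cd(X, y, lam=0.1)
print(np.sum(np.abs(beta) > 0.5))  # count of coefficients that survive the penalty
```

The L1 penalty drives the eight irrelevant coefficients exactly to zero while only slightly shrinking the two true ones, which is the variable-selection behaviour the thesis exploits.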
Bonacina, Francesco. "Advanced Statistical Approaches for the Global Analysis of Influenza Virus Circulation". Electronic Thesis or Diss., Sorbonne université, 2024. http://www.theses.fr/2024SORUS213.
The mitigation of human influenza remains a challenge due to the complexities characterizing its spread. Multiple types and subtypes of influenza viruses co-circulate globally, with a dynamic characterized by annual epidemics and occasional shifts due to major epidemiological events. This thesis develops statistical tools to study some key aspects of influenza spatiotemporal ecological dynamics, proposing unconventional approaches in epidemiology. The analyses are based on data from FluNet, a comprehensive dataset provided by the World Health Organization that includes weekly counts of influenza samples from over 150 countries, categorized by type and subtype. The first two research projects included in the thesis have an applied focus, while the third study is theoretically oriented, although it includes an application to influenza surveillance data. The first study examines the decline of influenza during the COVID-19 pandemic, assessing the magnitude of the decline by country globally and using regression tree-based techniques to identify country-level factors associated with the decline. The second study examines the coupled dynamics of influenza (sub)types, focusing on their relative abundance across countries and years through the lens of Compositional Data Analysis. It provides evidence of the changes in (sub)type mixing during the COVID-19 pandemic and develops probabilistic forecasting algorithms to predict (sub)type composition one year in advance. The third study formulates a conditional copula model to describe the dependencies of multivariate data conditionally upon certain covariates. The asymptotic consistency of the model is then investigated. Finally, the model is used to classify countries and years characterized by similar dependencies in the relative abundances of influenza (sub)types.
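Compositional Data Analysis, the lens used in the second study above, works on log-ratio coordinates rather than raw proportions. A minimal sketch of the closure and centered log-ratio (clr) transform, which are standard CoDa operations rather than this thesis's specific pipeline; the counts are invented:

```python
import numpy as np

def closure(x):
    """Normalize a vector of positive parts so they sum to 1 (the simplex)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def clr(x):
    """Centered log-ratio transform: maps a composition to unconstrained real space."""
    x = closure(x)
    g = np.exp(np.log(x).mean())  # geometric mean of the parts
    return np.log(x / g)

# Yearly counts of three hypothetical influenza (sub)types
counts = np.array([120.0, 60.0, 20.0])
comp = closure(counts)   # relative abundances, sum to 1
z = clr(comp)            # coordinates suitable for ordinary multivariate methods

print(np.isclose(z.sum(), 0.0))  # True: clr coordinates always sum to zero
```

Working in clr (or ilr) coordinates is what allows ordinary statistical and forecasting machinery to be applied to relative-abundance data without violating the unit-sum constraint.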
Béranger, Sébastien. "Les espaces paramétriques dans la musique instrumentale depuis 1950 : analyse croisée de trois approches compositionnelles". Nice, 2003. http://www.theses.fr/2003NICE2026.
Cadoret, Marine. "Analyse factorielle de données de catégorisation : application aux données sensorielles". Rennes, Agrocampus Ouest, 2010. http://www.theses.fr/2010NSARG006.
In sensory analysis, holistic approaches in which objects are considered as a whole are increasingly used to collect data. Their interest comes, on the one hand, from their ability to acquire types of information other than those obtained by traditional profiling methods and, on the other hand, from the fact that they require no special skills, which makes them feasible for all subjects. Categorization (or free sorting), in which subjects are asked to provide a partition of objects, belongs to these approaches. The first part of this work focuses on categorization data. After showing that this method of data collection is relevant, we focus on the statistical analysis of these data through the search for Euclidean representations. The proposed methodology, which consists in using factorial methods such as Multiple Correspondence Analysis (MCA) or Multiple Factor Analysis (MFA), is also enriched with elements of validity. This methodology is then illustrated by the analysis of two data sets, obtained from beers on the one hand and perfumes on the other. The second part is devoted to the study of two data collection methods related to categorization: sorted Napping® and hierarchical sorting. For both, we are also interested in statistical analysis, adopting an approach similar to the one used for categorization data. The last part is devoted to the implementation, in the R software, of functions to analyze the three kinds of data: categorization data, hierarchical sorting data and sorted Napping® data.
Gomes da Silva, Alzennyr. "Analyse des données évolutives : application aux données d'usage du Web". Phd thesis, Université Paris Dauphine - Paris IX, 2009. http://tel.archives-ouvertes.fr/tel-00445501.
Gomes da Silva, Alzennyr. "Analyse des données évolutives : application aux données d'usage du Web". Paris 9, 2009. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2009PA090047.
Nowadays, more and more organizations rely on the Internet. The Web has become one of the most widespread platforms for information exchange and retrieval. The growing number of traces left behind by user transactions (e.g., customer purchases, user sessions) automatically increases the importance of usage data analysis. Indeed, the way in which a web site is visited can change over time. These changes can be related to temporal factors (day of the week, seasonality, periods of special offers, etc.). Consequently, the usage models must be continuously updated in order to reflect the current behaviour of the visitors. Such a task remains difficult when the temporal dimension is ignored or simply introduced into the data description as a numeric attribute. It is precisely on this challenge that the present thesis is focused. In order to deal with the problem of acquiring real usage data, we propose a methodology for the automatic generation of artificial usage data over which one can control the occurrence of changes and thus analyse the efficiency of a change detection system. Guided by tracks born of exploratory analyses, we propose a tilted-window approach for detecting and following up changes on evolving usage data. In order to measure the level of change, this approach applies two external evaluation indices based on the clustering extension. The proposed approach also characterizes the changes undergone by the usage groups (e.g. appearance, disappearance, fusion and split) at each timestamp. Moreover, the approach is totally independent of the clustering method used and is able to manage kinds of data other than usage data. The effectiveness of this approach is evaluated on artificial data sets of different degrees of complexity and also on real data sets from different domains (academic, tourism, e-business and marketing).
Peng, Tao. "Analyse de données IoT en flux". Electronic Thesis or Diss., Aix-Marseille, 2021. http://www.theses.fr/2021AIXM0649.
Since the advent of the IoT (Internet of Things), we have witnessed an unprecedented growth in the amount of data generated by sensors. To exploit this data, we first need to model it, and then to develop analytical algorithms to process it. For the imputation of missing data from a sensor f, we propose ISTM (Incremental Space-Time Model), an incremental multiple linear regression model adapted to non-stationary data streams. ISTM updates its model by selecting: 1) data from sensors located in the neighborhood of f, and 2) the most recent near-past data gathered from f. To evaluate data trustworthiness, we propose DTOM (Data Trustworthiness Online Model), a prediction model that relies on online regression ensemble methods such as AddExp (Additive Expert) and BNNRW (Bagging NNRW) to assign a trust score in real time. DTOM consists of: 1) an initialization phase, 2) an estimation phase, and 3) a heuristic update phase. Finally, we are interested in predicting multiple-output STS in the presence of imbalanced data, i.e. when there are more instances in one value interval than in another. We propose MORSTS, an online regression ensemble method with specific features: 1) the sub-models are multiple-output, 2) a cost-sensitive strategy is adopted, i.e. an incorrectly predicted instance is given a higher weight, and 3) over-fitting is managed by means of k-fold cross-validation. Experiments with real data have been conducted and the results compared with well-known techniques.
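The abstract does not give ISTM's exact update rule; as a hedged illustration of the general idea behind incremental multiple linear regression on a stream, here is a plain recursive least-squares sketch that updates sufficient statistics one sample at a time (class name and data are invented):

```python
import numpy as np

class IncrementalLinReg:
    """Incremental least squares: accumulate X'X and X'y, one sample at a time."""
    def __init__(self, p, ridge=1e-6):
        self.A = ridge * np.eye(p)   # running X'X (tiny ridge keeps it invertible)
        self.b = np.zeros(p)         # running X'y

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        self.A += np.outer(x, x)
        self.b += y * x

    @property
    def coef(self):
        return np.linalg.solve(self.A, self.b)

model = IncrementalLinReg(p=2)
for x1 in range(1, 21):                       # simulated stream of sensor readings
    model.update([x1, 1.0], 2.0 * x1 + 5.0)   # noiseless target y = 2*x + 5

print(np.round(model.coef, 3))  # ≈ [2., 5.]
```

Because only the p×p matrix and p-vector are stored, each sample is processed in O(p²) time and the raw stream never needs to be kept, which is the point of incremental models for sensor data.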
Sibony, Eric. "Analyse multirésolution de données de classements". Thesis, Paris, ENST, 2016. http://www.theses.fr/2016ENST0036/document.
This thesis introduces a multiresolution analysis framework for ranking data. Initiated in the 18th century in the context of elections, the analysis of ranking data has attracted major interest in many fields of the scientific literature: psychometry, statistics, economics, operations research, machine learning and computational social choice among others. It has been revitalized even further by modern applications such as recommender systems, where the goal is to infer users' preferences in order to make the best personalized suggestions. In these settings, users express their preferences only on small and varying subsets of a large catalog of items. The analysis of such incomplete rankings however poses both a great statistical and computational challenge, leading industrial actors to use methods that exploit only a fraction of the available information. This thesis introduces a new representation for the data, which by construction overcomes these two challenges. Though it relies on results from combinatorics and algebraic topology, it shares several analogies with multiresolution analysis, offering a natural and efficient framework for the analysis of incomplete rankings. As it does not involve any assumption on the data, it already leads to overperforming estimators in small-scale settings and can be combined with many regularization procedures for large-scale settings. For all these reasons, we believe that this multiresolution representation paves the way for a wide range of future developments and applications.
Vidal, Jules. "Progressivité en analyse topologique de données". Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS398.
Topological Data Analysis (TDA) forms a collection of tools that enable the generic and efficient extraction of features in data. However, although most TDA algorithms have practicable asymptotic complexities, these methods are rarely interactive on real-life datasets, which limits their usability for interactive data analysis and visualization. In this thesis, we aimed at developing progressive methods for the TDA of scientific scalar data, which can be interrupted to swiftly provide a meaningful approximate output and which are able to refine it otherwise. First, we introduce two progressive algorithms for the computation of the critical points and the extremum-saddle persistence diagram of a scalar field. Next, we revisit this progressive framework to introduce an approximation algorithm for the persistence diagram of a scalar field, with strong guarantees on the related approximation error. Finally, in an effort to perform visual analysis of ensemble data, we present a novel progressive algorithm for the computation of the discrete Wasserstein barycenter of a set of persistence diagrams, a notoriously computationally intensive task. Our progressive approach enables the approximation of the barycenter within interactive times. We extend this method to a progressive, time-constrained, topological ensemble clustering algorithm.
Périnel, Emmanuel. "Segmentation en analyse de données symboliques : le cas de données probabilistes". Paris 9, 1996. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1996PA090079.
Aaron, Catherine. "Connexité et analyse des données non linéaires". Phd thesis, Université Panthéon-Sorbonne - Paris I, 2005. http://tel.archives-ouvertes.fr/tel-00308495.
Darlay, Julien. "Analyse combinatoire de données : structures et optimisation". Phd thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00683651.
Operto, Grégory. "Analyse structurelle surfacique de données fonctionnelles cérébrales". Aix-Marseille 3, 2009. http://www.theses.fr/2009AIX30060.
Functional data acquired by magnetic resonance contain a measure of the activity at every location of the brain. Although many methods exist, the automatic analysis of these data remains an open problem. In particular, the huge majority of these methods consider the data in a volume-based fashion, in the 3D acquisition space. However, most of the activity is generated within the cortex, which can be considered as a surface. Considering the data on the cortical surface has many advantages: on one hand, its geometry can be taken into account in every processing step; on the other hand, considering the whole volume reduces the detection power of the usually employed statistical tests. This thesis hence proposes an extension of the application field of volume-based methods to the surface-based domain by addressing problems such as projecting data onto the surface, performing surface-based multi-subject analyses, and estimating the validity of results.
Le Béchec, Antony. "Gestion, analyse et intégration des données transcriptomiques". Rennes 1, 2007. http://www.theses.fr/2007REN1S051.
Aiming at a better understanding of diseases, transcriptomic approaches allow the analysis of several thousand genes in a single experiment. To date, international standardization initiatives have allowed the whole scientific community to use the large quantities of data generated by transcriptomic approaches, and a large number of algorithms are available to process and analyze the data sets. However, the major remaining challenge is to provide biological interpretations of these large data sets. In particular, their integration with additional biological knowledge would certainly lead to an improved understanding of complex biological mechanisms. In my thesis work, I have developed a novel and evolutive environment for the management and analysis of transcriptomic data. Micro@rray Integrated Application (M@IA) allows for the management, processing and analysis of large-scale expression data sets. In addition, I elaborated a computational method to combine multiple data sources and represent differentially expressed gene networks as interaction graphs. Finally, I used a meta-analysis of gene expression data extracted from the literature to select and combine similar studies associated with the progression of liver cancer. In conclusion, this work provides a novel tool and original analytical methodologies, thus contributing to the emerging field of integrative biology, which is indispensable for a better understanding of complex pathophysiological processes.
Abdali, Abdelkebir. "Systèmes experts et analyse de données industrielles". Lyon, INSA, 1992. http://www.theses.fr/1992ISAL0032.
To analyse industrial process behaviour, many kinds of information are needed. As they are mostly numerical, statistical and data analysis methods are well suited to this activity. Their results must be interpreted together with other knowledge about the analysed process. Our work falls within the framework of the application of Artificial Intelligence techniques to statistics. Its aim is to study the feasibility and development of statistical expert systems in an industrial process setting. The prototype ALADIN is a knowledge-based system designed to be an intelligent assistant helping a non-specialist user analyze data collected on industrial processes. Written in Turbo-Prolog, it is coupled with the statistical package MODULAD. The architecture of this system is flexible, combining knowledge about plants in general, the studied process, and statistical methods. Its validation is performed on continuous manufacturing processes (cement and cast iron processes). At present, it is limited to principal component analysis problems.
David, Claire. "Analyse de XML avec données non-bornées". Paris 7, 2009. http://www.theses.fr/2009PA077107.
The motivation of this work is the specification and static analysis of schemas for XML documents, paying special attention to data values. We consider words and trees whose positions are labeled both by a letter from a finite alphabet and a data value from an infinite domain. Our goal is to find formalisms which offer good trade-offs between expressibility, decidability and complexity (for the satisfiability problem). We first study an extension of first-order logic with a binary predicate representing data equality. We obtain some interesting results when we consider the two-variable fragment. This approach is elegant, but the complexity results are not encouraging. We propose another formalism based on data patterns, which can be desired, forbidden, or any boolean combination thereof. We draw precisely the decidability frontier for various fragments of this model. The complexity results that we get, while still high, seem more amenable. In terms of expressivity these two approaches are orthogonal: the two-variable fragment of the extension of FO can express unary keys and unary foreign keys, while boolean combinations of data patterns can express arbitrary keys but cannot express foreign keys.
Carvalho, Francisco de. "Méthodes descriptives en analyse de données symboliques". Paris 9, 1992. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1992PA090025.
Testo completoRoyer, Jean-Jacques. "Analyse multivariable et filtrage des données régionalisées". Vandoeuvre-les-Nancy, INPL, 1988. http://www.theses.fr/1988NAN10312.
Testo completoFaye, Papa Abdoulaye. "Planification et analyse de données spatio-temporelles". Thesis, Clermont-Ferrand 2, 2015. http://www.theses.fr/2015CLF22638/document.
Spatio-temporal modeling allows the prediction of a regionalized variable at unobserved points of a given field, based on observations of this variable at some points of the field at different times. In this thesis, we proposed an approach combining numerical and statistical models. Indeed, using Bayesian methods, we combined the different sources of information: spatial information provided by the observations, temporal information provided by the black-box, and prior information on the phenomenon of interest. This approach allowed us to obtain a good prediction of the variable of interest and a good quantification of the uncertainty on this prediction. We also proposed a new method to construct experimental designs by establishing an optimality criterion based on the uncertainty and the expected value of the phenomenon.
Jamal, Sara. "Analyse spectrale des données du sondage Euclid". Thesis, Aix-Marseille, 2017. http://www.theses.fr/2017AIXM0263.
Large-scale surveys, such as Euclid, will produce a large set of data that will require the development of fully automated data-processing pipelines to analyze the data, extract crucial information and ensure that all requirements are met. From a survey, the redshift is an essential quantity to measure. Distinct methods to estimate redshifts exist in the literature, but there is no fully automated definition of a reliability criterion for redshift measurements. In this work, we first explored common techniques of spectral analysis, such as filtering and continuum extraction, that could be used as preprocessing to improve the accuracy of spectral feature measurements, then focused on developing a new methodology to automate the reliability assessment of spectroscopic redshift measurements by exploiting Machine Learning (ML) algorithms and features of the posterior redshift probability distribution function (PDF). Our idea consists in quantifying, through ML and zPDF descriptors, the reliability of a redshift measurement into distinct partitions that describe different levels of confidence. For example, a multimodal zPDF refers to multiple (plausible) redshift solutions, possibly with similar probabilities, while a strongly unimodal zPDF with a low dispersion and a unique, prominent peak indicates a more "reliable" redshift estimate. We show that this new methodology could be very promising for next-generation large spectroscopic surveys on the ground and in space, such as Euclid and WFIRST.
Bobin, Jérôme. "Diversité morphologique et analyse de données multivaluées". Paris 11, 2008. http://www.theses.fr/2008PA112121.
Testo completoLambert, Thierry. "Réalisation d'un logiciel d'analyse de données". Paris 11, 1986. http://www.theses.fr/1986PA112274.
Testo completoZaidi, Fatima Sehar. "Development of statistical monitoring procedures for compositional data". Thesis, Nantes, 2020. http://www.theses.fr/2020NANT4006.
Statistical Process Monitoring (SPM) is a widely used methodology, based on the implementation of control charts, for achieving process stability and improving capability through the reduction of process variability. The selection of a suitable control chart depends on the type and distribution of the data. When there are several quality characteristics, multivariate control charts have to be adopted. But there is a specific category of multivariate data which are constrained by definition, known as Compositional Data (CoDa). This thesis systematically proposes new control charts for compositional data that have not yet appeared in the literature. A Hotelling T²-CoDa control chart in the presence of measurement error and a MEWMA-CoDa control chart in the presence of measurement error have been proposed for compositional data. Furthermore, some nonparametric charts to monitor compositional data have also been proposed. The performance of each control chart has been studied and the optimal parameters have been systematically evaluated. Real-life compositional data examples have been used in order to study the performances of the proposed charts.
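A common way to build a Hotelling T²-type chart for CoDa, consistent with the approach named above though not this thesis's exact procedure, is to map compositions to unconstrained coordinates with an isometric log-ratio (ilr) transform and compute T² statistics there. A minimal sketch for 3-part compositions; the data and function names are invented:

```python
import numpy as np

def ilr(x):
    """Isometric log-ratio transform of a 3-part composition (one standard basis)."""
    x = np.asarray(x, dtype=float) / np.sum(x)
    z1 = np.sqrt(1 / 2) * np.log(x[0] / x[1])
    z2 = np.sqrt(2 / 3) * np.log(np.sqrt(x[0] * x[1]) / x[2])
    return np.array([z1, z2])

def hotelling_t2(samples):
    """T² statistic of each observation against the in-control mean/covariance."""
    Z = np.array([ilr(s) for s in samples])
    mu, S = Z.mean(axis=0), np.cov(Z, rowvar=False)
    Sinv = np.linalg.inv(S)
    return np.array([(z - mu) @ Sinv @ (z - mu) for z in Z])

# Simulated in-control 3-part compositions around (0.5, 0.3, 0.2)
rng = np.random.default_rng(1)
base = np.array([0.5, 0.3, 0.2])
samples = [base * np.exp(0.05 * rng.standard_normal(3)) for _ in range(50)]

t2 = hotelling_t2(samples)
print(t2.shape)  # one T² value per observation
```

An observation whose T² exceeds the chart's control limit (derived from an F or chi-square distribution in the classical setting) would be flagged as out of control.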
Fraisse, Bernard. "Automatisation, traitement du signal et recueil de données en diffraction x et analyse thermique : Exploitation, analyse et représentation des données". Montpellier 2, 1995. http://www.theses.fr/1995MON20152.
Testo completoGonzalez, Ignacio. "Analyse canonique régularisée pour des données fortement multidimensionnelles". Toulouse 3, 2007. http://thesesups.ups-tlse.fr/99/.
Motivated by the study of relationships between gene expressions and other biological variables, our work consists in presenting and developing a methodology addressing this problem. Among the statistical methods treating this subject, Canonical Analysis (CA) seemed well adapted, but high dimensionality is at present one of the major obstacles for statistical techniques analysing data coming from microarrays. The main axis of this work was therefore the search for solutions taking this crucial aspect into account in the implementation of CA. Among the approaches considered to handle this problem, we were interested in regularization methods. The method developed here, called Regularized Canonical Analysis (RCA), is based on the principle of ridge regularization initially introduced in multiple linear regression. Since RCA requires the choice of two regularization parameters, we proposed an M-fold cross-validation method to handle this problem. We presented in detail applications of RCA to highly multidimensional data coming from genomic studies as well as to data coming from other domains. Among other things, we were interested in visualizing the data in order to facilitate the interpretation of the results. For that purpose, we proposed several graphical methods: representations of variables (correlation graphs), representations of individuals, as well as alternative representations such as networks and heatmaps.
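Ridge-regularized canonical analysis, as described above, amounts to adding ridge terms to the covariance blocks before solving the canonical-correlation eigenproblem. A minimal NumPy sketch of the first regularized canonical correlation (not the author's implementation; names and data are illustrative):

```python
import numpy as np

def rcca_first_corr(X, Y, lx=0.1, ly=0.1):
    """First canonical correlation with ridge-regularized covariance blocks."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + lx * np.eye(X.shape[1])   # ridge keeps Cxx invertible
    Cyy = Y.T @ Y / n + ly * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # Eigenvalues of Cxx^-1 Cxy Cyy^-1 Cyx are the squared canonical correlations
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    return np.sqrt(np.max(np.linalg.eigvals(M).real))

# Two blocks of variables sharing one latent factor z
rng = np.random.default_rng(4)
z = rng.standard_normal((200, 1))
X = np.hstack([z + 0.1 * rng.standard_normal((200, 1)) for _ in range(3)])
Y = np.hstack([z + 0.1 * rng.standard_normal((200, 1)) for _ in range(2)])

r = rcca_first_corr(X, Y)
print(r)  # high, since both blocks share the latent factor
```

The ridge terms `lx` and `ly` play the role of the two regularization parameters the abstract mentions; the thesis selects them by M-fold cross-validation.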
Bazin, Gurvan. "Analyse différée des données du SuperNova Legacy Survey". Paris 7, 2008. http://www.theses.fr/2008PA077135.
The SuperNova Legacy Survey (SNLS) experiment observed type Ia supernovae (SNe Ia) during 5 years. Its aim is to constrain cosmological parameters. The online reduction pipeline is based on spectroscopic identification of each supernova. Systematically using spectroscopy requires a sufficient signal-to-noise level. Thus, it could lead to selection biases and would not be possible for future surveys. This PhD thesis reports a complementary method for data reduction based on a completely photometric selection. This analysis, more efficient at selecting faint events, approximately doubles the SNe Ia sample of the SNLS. The method shows a clear bias in the spectroscopic selection: brighter SNe Ia are systematically selected beyond a redshift of 0.7. On the other hand, no important impact on cosmology was found, so corrections for the intrinsic variability of SNe Ia luminosity are robust. In addition, this work is a first step in studying the feasibility of such a purely photometric analysis for cosmology. This is a promising method for future projects.
Hapdey, Sébastien. "Analyse de données multi-isotopiques en imagerie monophotonique". Paris 11, 2002. http://www.theses.fr/2002PA11TO35.
Testo completoFeydy, Jean. "Analyse de données géométriques, au delà des convolutions". Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASN017.
Geometric data analysis, beyond convolutions. To model interactions between points, a simple option is to rely on weighted sums known as convolutions. Over the last decade, this operation has become a building block for deep learning architectures with an impact on many applied fields. We should not forget, however, that the convolution product is far from being the be-all and end-all of computational mathematics. To let researchers explore new directions, we present robust, efficient and principled implementations of three underrated operations: 1. Generic manipulations of distance-like matrices, including kernel matrix-vector products and nearest-neighbor searches. 2. Optimal transport, which generalizes sorting to spaces of dimension D > 1. 3. Hamiltonian geodesic shooting, which replaces linear interpolation when no relevant algebraic structure can be defined on a metric space of features. Our PyTorch/NumPy routines fully support automatic differentiation and scale up to millions of samples in seconds. They generally outperform baseline GPU implementations with x10 to x1,000 speed-ups and keep linear instead of quadratic memory footprints. These new tools are packaged in the KeOps (kernel methods) and GeomLoss (optimal transport) libraries, with applications that range from machine learning to medical imaging. Documentation is available at: www.kernel-operations.io/keops and /geomloss.
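The first operation listed above, a kernel matrix-vector product, is easy to state in dense NumPy form; KeOps' contribution is computing it without ever materializing the full kernel matrix. This sketch is only the dense reference implementation, with invented data:

```python
import numpy as np

def gaussian_kernel_matvec(x, y, b, sigma=1.0):
    """Dense reference for K(x, y) @ b with a Gaussian (RBF) kernel."""
    # Squared pairwise distances between rows of x and rows of y
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / (2 * sigma**2))   # the full M x N kernel matrix
    return K @ b

rng = np.random.default_rng(2)
x = rng.standard_normal((5, 3))   # M=5 target points in R^3
y = rng.standard_normal((4, 3))   # N=4 source points
b = rng.standard_normal((4, 1))   # weights carried by the source points

out = gaussian_kernel_matvec(x, y, b)
print(out.shape)  # one weighted sum per target point
```

For M and N in the millions, the M×N matrix `K` no longer fits in memory, which is why symbolic/streaming evaluation of this same reduction is the core trick behind the library described in the abstract.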
Hebert, Pierre-Alexandre. "Analyse de données sensorielles : une approche ordinale floue". Compiègne, 2004. http://www.theses.fr/2004COMP1542.
Sensory profile data aim at describing the sensory perceptions of human subjects. Such data consist of scores attributed by human sensory experts (or judges) in order to describe a set of products according to sensory descriptors. All assessments are repeated, usually three times. The thesis describes a new analysis method based on a fuzzy modelling of the scores. The first step of the method consists in extracting and encoding the relevant information of each replicate into a fuzzy weak dominance relation. An aggregation procedure over the replicates then synthesizes the perception of each judge into a new fuzzy relation. In a similar way, a consensual relation is finally obtained for each descriptor by fusing the relations of the judges. To ensure the interpretability of the fused relations, fuzzy preference theory is used. A set of graphical tools is then proposed for the mono- and multidimensional analysis of the obtained relations
Narozny, Michel. "Analyse en composantes indépendantes et compression de données". Paris 11, 2005. http://www.theses.fr/2005PA112268.
In this thesis we are interested in the performance of independent component analysis (ICA) when it is used for data compression. First we show that ICA transforms yield poor performance compared to the Karhunen-Loeve transform (KLT) for the coding of some continuous-tone images and a musical signal, but can outperform the KLT on some synthetic signals. In medium-to-high (resp. low) bit rate coding, the bit rate measured is the empirical first (resp. second, fourth and ninth) order entropy. The mean square error between the original and the reconstructed signal is used to evaluate the distortion. Then we show that for non-Gaussian signals the problem of finding the optimal linear transform in transform coding is equivalent to solving a modified ICA problem. Two new algorithms, GCGsup and ICAorth, are then proposed to compute the optimal linear transform and the optimal orthogonal transform, respectively. In our simulations, we show that GCGsup and ICAorth can outperform the KLT on some continuous-tone images and some synthetic signals. Finally, we are also interested in a multicomponent image coding scheme which employs a wavelet transform to reduce the spatial redundancy and the transforms returned by GCGsup and ICAorth to reduce the spectral redundancy. In this case, further work has to be done in order to find images whose compression using the new transforms is significantly better than that obtained with the KLT
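As an illustration of the baseline that GCGsup and ICAorth are compared against, a minimal Karhunen-Loeve transform can be sketched as follows (a toy correlated source with made-up parameters; the thesis' own algorithms are not reproduced here):

```python
import numpy as np

def klt(X):
    """Karhunen-Loeve transform: project the centered rows of X onto the
    eigenvectors of their empirical covariance matrix."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    w, V = np.linalg.eigh(cov)
    # Sort eigenvectors by decreasing eigenvalue (energy compaction).
    order = np.argsort(w)[::-1]
    return Xc @ V[:, order], w[order]

# Strongly correlated 2-D Gaussian source.
rng = np.random.default_rng(1)
z = rng.normal(size=(10_000, 1))
X = np.hstack([z, 0.9 * z + 0.1 * rng.normal(size=(10_000, 1))])
Y, energy = klt(X)
# After the KLT the components are empirically decorrelated, and most of the
# energy is packed into the first component.
print(np.corrcoef(Y[:, 0], Y[:, 1])[0, 1], energy)
```

In transform coding, this decorrelation and energy compaction is what makes the KLT the classical reference transform before quantization and entropy coding.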
Aubert, Julie. "Analyse statistique de données biologiques à haut débit". Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS048/document.
The technological progress of the last twenty years has allowed the emergence of a high-throughput biology based on large-scale data obtained in an automated way. Statisticians have an important role to play in the modelling and analysis of these data, which are numerous, noisy, sometimes heterogeneous and collected at various scales. This role can take several forms. The statistician can propose new concepts or new methods inspired by the questions raised by this biology. He can propose a fine modelling of the phenomena observed by means of these technologies. And when methods exist and require only an adaptation, the role of the statistician can be that of an expert who knows the methods, their limits and their advantages. In a first part, I introduce different methods developed with my co-authors for the analysis of high-throughput biological data, based on latent variable models. These models make it possible to explain an observed phenomenon using hidden or latent variables. The simplest latent variable model is the mixture model. The first two presented methods constitute two examples: the first in a context of multiple testing and the second in the framework of the definition of a hybridization threshold for microarray data. I also present a model of coupled hidden Markov chains for the detection of copy number variations in genomics, taking into account the dependence between individuals due, for example, to genetic proximity. For this model we propose an approximate inference based on a variational approximation, exact inference being intractable as the number of individuals increases. We also define a latent block model, modelling an underlying structure per block of rows and columns, adapted to count data from microbial ecology.
Metabarcoding and metagenomic data correspond to the abundance of each microorganism in a microbial community within its environment (plant rhizosphere, human digestive tract, ocean, for example). These data have the particularity of presenting a dispersion stronger than expected under the most conventional models (known as overdispersion). Biclustering is a way to study the interactions between the structure of microbial communities and the biological samples from which they are derived. We proposed to model this phenomenon using a Poisson-Gamma distribution and developed another variational approximation for this particular latent block model, as well as a model selection criterion. The model's flexibility and performance are illustrated on three real datasets. A second part is devoted to work on the analysis of transcriptomic data derived from DNA microarrays and RNA sequencing. The first section deals with the normalization of data (detection and correction of technical biases) and presents two new methods that I proposed with my co-authors, together with a comparison of methods to which I contributed. The second section, devoted to experimental design, presents a method for analyzing so-called dye-switch designs. In the last part, I present two examples of collaboration, derived respectively from an analysis of differentially expressed genes from microarray data and an analysis of the translatome in sea urchins from RNA-sequencing data, showing how statistical skills are mobilized and the added value that statistics brings to genomics projects
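The overdispersion that motivates the Poisson-Gamma model can be checked numerically (a toy simulation with hypothetical parameters, not data from the thesis): a Gamma-distributed rate inflates the variance of the counts well beyond their mean, whereas a plain Poisson variable keeps the two equal.

```python
import numpy as np

rng = np.random.default_rng(42)
n, mean, shape = 100_000, 5.0, 2.0  # illustrative parameters

# Plain Poisson: variance equals the mean.
poisson = rng.poisson(lam=mean, size=n)

# Poisson-Gamma mixture: draw a Gamma rate per observation, then a Poisson
# count. Marginally this is negative binomial, with variance mean + mean^2/shape.
rates = rng.gamma(shape=shape, scale=mean / shape, size=n)
poisson_gamma = rng.poisson(lam=rates)

print(poisson.mean(), poisson.var())            # both close to 5
print(poisson_gamma.mean(), poisson_gamma.var())  # mean close to 5, variance much larger
```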
Kezouit, Omar Abdelaziz. "Bases de données relationnelles et analyse de données : conception et réalisation d'un système intégré". Paris 11, 1987. http://www.theses.fr/1987PA112130.
Jais, Jean-Philippe. "Modèles de régression pour l'analyse de données qualitatives longitudinales". Paris 7, 1993. http://www.theses.fr/1993PA077065.
Kronek, Louis-Philippe. "Analyse combinatoire de données appliquée à la recherche médicale". Grenoble INPG, 2008. http://www.theses.fr/2008INPG0146.
Logical analysis of data is a supervised learning method based on the theory of partially defined Boolean functions and combinatorial optimization. Its implementation involves a wide range of resolution methods from operations research. The purpose of this work is to continue developing this method, keeping in mind the constraints of medical research, and more particularly the elegance and ease of understanding of the result, which should be accessible with basic mathematical knowledge. Three parts of this problem have been treated: efficient model generation, adaptation to survival analysis, and optimization of the implementation of a new decision model
El, Hafyani Hafsa. "Analyse de données spatio-temporelles dans le contexte de la collecte participative de données environnementales". Thesis, université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG035.
Air quality is one of the major risk factors for human health. Mobile Crowd Sensing (MCS), a new paradigm based on emerging connected micro-sensor technology, offers the opportunity to assess personal exposure to air pollution anywhere and anytime. This leads to the continuous generation of geolocated data series, resulting in a big volume of data. Such data are deemed to be a mine of information for various analyses, and a unique opportunity for knowledge discovery about pollution exposure. However, achieving this analysis is far from straightforward. In fact, there is a gap to fill between raw sensor data series and usable information: raw data is highly uneven, noisy, and incomplete. The major challenge addressed by this thesis is to fill this gap by providing a holistic approach for data analytics and mining in the context of MCS. We establish an end-to-end analytics pipeline, which encompasses data preprocessing, enrichment with contextual information, as well as data modeling and storage. We implemented this pipeline while ensuring its automated deployment. The proposed approaches have been applied to real-world datasets collected within the Polluscope project
Peyre, Julie. "Analyse statistique des données issues des biopuces à ADN". PhD thesis, Université Joseph Fourier (Grenoble), 2005. http://tel.archives-ouvertes.fr/tel-00012041.
In a first chapter, we study the problem of data normalization, whose objective is to eliminate spurious variations between population samples in order to retain only the variations explained by the biological phenomena. We present several existing methods, for which we propose improvements. To guide the choice of a normalization method, a method for simulating microarray data is developed.
In a second chapter, we address the problem of detecting differentially expressed genes between two series of experiments. This reduces to a multiple hypothesis testing problem. Several approaches are considered: model selection and penalization, an FDR method based on a wavelet decomposition of the test statistics, and Bayesian thresholding.
In the last chapter, we consider supervised classification problems for microarray data. To address the "curse of dimensionality", we developed a semi-parametric dimension reduction method, based on the maximization of a local likelihood criterion in single-index generalized linear models. The dimension reduction step is then followed by a local polynomial regression step to perform the supervised classification of the individuals under consideration.
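The differential-expression problem of the second chapter is usually benchmarked against the classical Benjamini-Hochberg step-up procedure; a minimal sketch of standard BH (not the wavelet-based FDR variant developed in the thesis) on a toy set of p-values:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(pvals)
    n = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, n + 1) / n
    below = p[order] <= thresholds
    rejected = np.zeros(n, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank meeting its threshold
        rejected[order[: k + 1]] = True   # reject everything up to that rank
    return rejected

# Toy example: 5 clear signals among 95 uniform nulls.
rng = np.random.default_rng(0)
pvals = np.concatenate([rng.uniform(size=95), np.full(5, 1e-6)])
print(benjamini_hochberg(pvals).sum())
```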
Villa, Francesca. "Calibration photométrique de l'imageur MegaCam : analyse des données SNDice". PhD thesis, Université Pierre et Marie Curie - Paris VI, 2012. http://tel.archives-ouvertes.fr/tel-00839491.
Vatsiou, Alexandra. "Analyse de génétique statistique en utilisant des données pangénomiques". Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAS002/document.
The complex phenotypes observed nowadays in human populations are determined by genetic as well as environmental factors. For example, nutrition and lifestyle play important roles in the development of multifactorial diseases such as obesity and diabetes. Adaptation of such complex phenotypic traits may occur via allele frequency shifts at multiple loci, a phenomenon known as polygenic selection. Recent advances in statistical approaches and the emergence of high-throughput next-generation sequencing data have enabled the detection of such signals. Here we aim to understand the extent to which environmental changes lead to shifts in selective pressures, as well as the impact of those shifts on disease susceptibility. To achieve that, we propose a gene set enrichment analysis using SNP selection scores, i.e. scores that quantify the selection pressure on SNPs, which can be derived from genome-scan methods. Initially we carry out a sensitivity analysis to investigate which of the recent genome-scan methods accurately identify the selected regions. A simulation approach was used to assess their performance under a wide range of complex demographic structures, under both hard and soft selective sweeps. Then, we develop SEL-GSEA, a tool to identify pathways enriched for evolutionary pressures, which is based on SNP data. Finally, to examine the effect of potential environmental changes that could represent changes in selection pressures, we apply SEL-GSEA, as well as Gowinda, an available online tool, on a population-based study. We analyzed three populations (Africans, Europeans and Asians) from the HapMap database. To acquire the SNP selection scores that are the basis of SEL-GSEA, we used a combination of the two genome-scan methods (iHS and XPCLR) that performed best in our sensitivity analysis.
The results of our analysis show extensive selection pressures on immune-related pathways, mainly in the African population, as well as on the glycolysis and gluconeogenesis pathway in Europeans, which is related to metabolism and diabetes
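A gene-set enrichment analysis based on SNP selection scores can be sketched with a simple permutation test (an illustrative toy version with hypothetical scores; SEL-GSEA's actual scoring and null model may differ):

```python
import numpy as np

def enrichment_pvalue(scores, set_idx, n_perm=10_000, rng=None):
    """One-sided permutation p-value for 'the SNP set has a higher mean
    selection score than random SNP sets of the same size'."""
    rng = rng or np.random.default_rng(0)
    observed = scores[set_idx].mean()
    size = len(set_idx)
    # Null distribution: mean score of random SNP sets of the same size.
    null = np.array([scores[rng.choice(scores.size, size, replace=False)].mean()
                     for _ in range(n_perm)])
    return (1 + (null >= observed).sum()) / (n_perm + 1)

# Toy genome: 1,000 SNP scores; SNPs 0-19 form a hypothetical pathway
# enriched for high selection scores.
rng = np.random.default_rng(7)
scores = rng.normal(size=1000)
scores[:20] += 2.0
pval = enrichment_pvalue(scores, np.arange(20))
print(pval)
```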
Chavent, Marie. "Analyse de données symboliques : une méthode divisive de classification". Paris 9, 1997. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1997PA090029.
Yahia, Hussein. "Analyse des structures de données arborescentes représentant des images". Paris 11, 1986. http://www.theses.fr/1986PA112292.
Bossut, Philippe. "Analyse des données : application à l'analyse automatique d'images multispectrales". École nationale supérieure des mines de Paris, 1986. http://www.theses.fr/1986ENMP0010.
Laur, Pierre Alain. "Données semi structurées : Découverte, maintenance et analyse de tendances". Montpellier 2, 2004. http://www.theses.fr/2004MON20053.
Lemoine, Frédéric. "Intégration, interrogation et analyse de données de génomique comparative". Paris 11, 2008. http://www.theses.fr/2008PA112180.
Our work takes place within the « Microbiogenomics » project. Microbiogenomics aims at building a prokaryotic genomic data warehouse. This data warehouse gathers numerous, currently dispersed data, in order to improve the functional annotation of bacterial genomes. Within this project, our work has several facets. The first one focuses mainly on the analysis of biological data. We are particularly interested in the conservation of gene order during the evolution of prokaryotic genomes. To do so, we designed a computational pipeline aiming at detecting the areas whose gene order is conserved. We then studied the relative evolution of the proteins coded by genes located in conserved areas, in comparison with the other proteins. These data were made available through the SynteView synteny visualization tool (http://www.synteview.u-psud.fr). Moreover, to broaden the analysis of these data, we need to cross them with other kinds of data, such as pathway data. These data, often dispersed and heterogeneous, are difficult to query. That is why, in a second step, we were interested in querying the Microbiogenomics data warehouse. We designed an architecture and algorithms to query the data warehouse while keeping the different points of view given by the sources. These algorithms were implemented in GenoQuery (http://www.lri.fr/~lemoine/GenoQuery), a prototype querying module adapted to a genomic data warehouse
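In its simplest form, detecting conserved gene order reduces to finding gene adjacencies shared by two genomes; a minimal sketch (hypothetical gene names, ignoring strand, paralogs and gaps, unlike the thesis' pipeline):

```python
def conserved_adjacencies(genome_a, genome_b):
    """Gene pairs that are direct neighbors in both genomes (order-insensitive)."""
    def adjacencies(genome):
        # Unordered pairs of consecutive genes along the chromosome.
        return {frozenset(pair) for pair in zip(genome, genome[1:])}
    return adjacencies(genome_a) & adjacencies(genome_b)

# Two toy gene orders: only (g1, g2) and (g3, g4) stay adjacent in both.
a = ["g1", "g2", "g3", "g4", "g5"]
b = ["g2", "g1", "g5", "g3", "g4"]
print(conserved_adjacencies(a, b))
```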
Llobell, Fabien. "Classification de tableaux de données, applications en analyse sensorielle". Thesis, Nantes, Ecole nationale vétérinaire, 2020. http://www.theses.fr/2020ONIR143F.
Multiblock datasets are increasingly frequent in several areas of application. This is particularly the case in sensory evaluation, where several tests lead to multiblock datasets, each dataset being related to a subject (judge, consumer, ...). The statistical analysis of this type of data has raised increasing interest over the last thirty years. However, the clustering of multiblock datasets has received little attention, even though there is an important need for it. In this context, a method called CLUSTATIS, devoted to the cluster analysis of datasets, is proposed. At the heart of this approach is the STATIS method, a multiblock data analysis strategy. Several extensions of the CLUSTATIS clustering method are presented. In particular, the case of data from the so-called "Check-All-That-Apply" (CATA) task is considered, and an ad hoc clustering method called CLUSCATA is discussed. In order to improve the homogeneity of the clusters from both CLUSTATIS and CLUSCATA, an option to add an additional cluster, called "K+1", is introduced. The purpose of this additional cluster is to collect datasets identified as atypical. The choice of the number of clusters is discussed, and solutions are proposed. Applications in sensory analysis as well as simulation studies highlight the relevance of the clustering approach. Implementations in the XLSTAT software and in the R environment are presented
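Clustering whole datasets requires a similarity measure between data tables sharing the same rows (here, the products). A simplified illustration, not the actual CLUSTATIS algorithm, is to combine RV coefficients with standard hierarchical clustering:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def rv_coefficient(X, Y):
    """RV coefficient between two column-centered data tables sharing rows."""
    Wx, Wy = X @ X.T, Y @ Y.T
    return np.trace(Wx @ Wy) / np.sqrt(np.trace(Wx @ Wx) * np.trace(Wy @ Wy))

def cluster_datasets(tables, n_clusters):
    m = len(tables)
    centered = [T - T.mean(axis=0) for T in tables]
    # Dissimilarity between datasets: 1 - RV, in condensed (upper-triangular) form.
    d = [1 - rv_coefficient(centered[i], centered[j])
         for i in range(m) for j in range(i + 1, m)]
    Z = linkage(d, method="ward")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Toy example: 6 judges scoring 10 products on 4 descriptors,
# with two underlying points of view.
rng = np.random.default_rng(3)
base1, base2 = rng.normal(size=(10, 4)), rng.normal(size=(10, 4))
tables = ([base1 + 0.1 * rng.normal(size=(10, 4)) for _ in range(3)]
          + [base2 + 0.1 * rng.normal(size=(10, 4)) for _ in range(3)])
print(cluster_datasets(tables, 2))
```

CLUSTATIS itself relies on the STATIS compromise within each cluster; the RV-based dissimilarity above only conveys the general idea of comparing configurations rather than raw values.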
Jaunâtre, Kévin. "Analyse et modélisation statistique de données de consommation électrique". Thesis, Lorient, 2019. http://www.theses.fr/2019LORIS520.
In October 2014, the French Environment & Energy Management Agency (ADEME), together with the ENEDIS company, started a research project named SOLENN ("SOLidarité ENergie iNovation") with multiple objectives, such as studying the control of electricity consumption by monitoring households, and securing the electricity supply. The SOLENN project was led by ADEME and took place in Lorient, France. Its main goal is to improve households' knowledge about saving electric energy. In this context, we describe a method to estimate extreme quantiles and probabilities of rare events, which is implemented in an R package. Then, we propose an extension of the well-known Cox proportional hazards model which allows the estimation of the probabilities of rare events. Finally, we apply some of the statistical models developed in this document to electricity consumption datasets that were useful for the SOLENN project. A first application is linked to the electric constraint program directed by ENEDIS in order to secure the electric network: houses undergo a reduction of their maximal power for a short period of time, and the goal is to study how households behave during this period. A second application concerns the use of the multiple regression model to study the effect of individual visits on electricity consumption, the goal being to study the impact on consumption during the week or month following a visit
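Estimating a quantile beyond the range of the observed consumption data typically combines the Hill estimator of the tail index with Weissman's extrapolation formula. A sketch on a simulated heavy-tailed sample (illustrative only; the estimators in the thesis' R package may differ):

```python
import numpy as np

def hill_weissman_quantile(sample, p, k=100):
    """Weissman estimator of the quantile exceeded with probability p,
    using the Hill tail-index estimator on the k largest observations."""
    x = np.sort(sample)[::-1]                       # descending order statistics
    gamma = np.mean(np.log(x[:k]) - np.log(x[k]))   # Hill estimator of the tail index
    n = sample.size
    # Extrapolate beyond the sample range from the (k+1)-th largest observation.
    return x[k] * (k / (n * p)) ** gamma

# Standard Pareto(alpha=2) toy sample: the true quantile at exceedance
# probability p is p**(-1/2), i.e. 100 for p = 1e-4.
rng = np.random.default_rng(5)
sample = rng.pareto(2.0, size=100_000) + 1.0
q = hill_weissman_quantile(sample, 1e-4)
print(q)
```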
Rodriguez-Rojas, Oldemar. "Classification et modèles linéaires en analyse des données symboliques". Paris 9, 2000. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2000PA090064.
Testo completoPorrot, sylvain. "Complexité de Kolmogorov et analyse de flots de données". Lille 1, 1998. https://pepite-depot.univ-lille.fr/LIBRE/Th_Num/1998/50376-1998-209.pdf.
Testo completoBodin, Bruno. "Analyse d'Applications Flot de Données pour la Compilation Multiprocesseur". Phd thesis, Université Pierre et Marie Curie - Paris VI, 2013. http://tel.archives-ouvertes.fr/tel-00922578.