Theses on the topic "Analyse Confidentielle de Données"
Browse the top 50 theses for your research on the topic "Analyse Confidentielle de Données".
Explore theses on a wide variety of disciplines and organize your bibliography correctly.
Saadeh, Angelo. "Applications of secure multi-party computation in Machine Learning". Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAT022.
Privacy preservation in machine learning and data analysis is becoming increasingly important as the amount of sensitive personal information collected and used by organizations continues to grow. This poses the risk of exposing sensitive personal information to malicious third parties, which can lead to identity theft, financial fraud, or other types of cybercrime. Laws against the use of private data are important to protect individuals from having their information used and shared. However, by doing so, data protection laws limit the applications of machine learning models, and some of these applications could be life-saving, as in the medical field. Secure multi-party computation (MPC) allows multiple parties to jointly compute a function over their inputs without having to reveal or exchange the data itself. This tool can be used for training collaborative machine learning models when there are privacy concerns about exchanging sensitive datasets between different entities. In this thesis, we (I) use existing and develop new secure multi-party computation algorithms, (II) introduce cryptography-friendly approximations of common machine learning functions, and (III) complement secure multi-party computation with other privacy tools. This work is done with the goal of implementing privacy-preserving machine learning and data analysis algorithms. Our work and experimental results show that executing the algorithms using secure multi-party computation satisfies both security and correctness. In other words, no party has access to another's information, while the parties are still able to collaboratively train machine learning models with high accuracy and to collaboratively evaluate data analysis algorithms, in comparison with non-encrypted datasets. Overall, this thesis provides a comprehensive view of secure multi-party computation for machine learning, demonstrating its potential to revolutionize the field.
This thesis contributes to the deployment and acceptability of secure multi-party computation in machine learning and data analysis.
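The core primitive behind MPC, additive secret sharing, can be sketched in a few lines. This toy example (not taken from the thesis) shows two parties computing a sum without either revealing its input:

```python
import random

PRIME = 2**31 - 1  # public modulus; all arithmetic is done mod PRIME

def share(secret, n_parties):
    """Split a secret into n additive shares that sum to it mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

# Two parties each secret-share a private value; the sum is computed on
# the shares, so neither party ever sees the other's input in the clear.
a_shares = share(42, 2)
b_shares = share(100, 2)
sum_shares = [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]
assert reconstruct(sum_shares) == 142
```

Each individual share is uniformly random, which is what gives the protocol its security; only the full set of shares reveals the value.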
Alborch Escobar, Ferran. "Private Data Analysis over Encrypted Databases : Mixing Functional Encryption with Computational Differential Privacy". Electronic Thesis or Diss., Institut polytechnique de Paris, 2025. http://www.theses.fr/2025IPPAT003.
In our current digitalized society, data is ruling the world. But as it is most of the time related to individuals, its exploitation should respect their privacy. This issue has given rise to the differential privacy paradigm, which makes it possible to protect individuals when querying databases containing data about them. But with the emergence of cloud computing, it is becoming increasingly necessary to also consider the confidentiality of "on-cloud" storage of such vast databases, using encryption techniques. This thesis studies how to provide both privacy and confidentiality for such outsourced databases by mixing two primitives: computational differential privacy and functional encryption. First, we study the relationship between computational differential privacy and functional encryption for randomized functions in a generic way. We analyze the privacy of the setting where a malicious analyst may access the encrypted data stored in a server, either by corrupting or breaching it, and prove that a secure randomized functional encryption scheme supporting the appropriate family of functions guarantees the computational differential privacy of the system. Second, we construct efficient randomized functional encryption schemes for certain useful families of functions, and we prove them secure in the standard model under well-known assumptions. The families of functions considered are linear functions, used for example in counting queries, histograms and linear regressions, and quadratic functions, used for example in quadratic regressions and hypothesis testing. The schemes built are then used together with the first result to construct encrypted databases for their corresponding families of queries. Finally, we implement both randomized functional encryption schemes to analyze their efficiency.
This shows that our constructions are practical for databases with up to 1,000,000 entries in the case of linear queries and up to 10,000 entries in the case of quadratic queries.
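As background on the counting queries mentioned above, the classical (information-theoretic) Laplace mechanism shows how such a query is made differentially private. This is a generic textbook sketch, not the functional-encryption construction of the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_count(data, predicate, epsilon):
    """Answer a counting query with epsilon-differential privacy.
    A counting query has sensitivity 1 (adding or removing one record
    changes the count by at most 1), so Laplace noise of scale
    1/epsilon suffices."""
    true_count = sum(1 for x in data if predicate(x))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Toy database of ages; the query counts records with age >= 40.
ages = [23, 35, 45, 52, 61, 29, 41]
noisy = laplace_count(ages, lambda a: a >= 40, epsilon=0.5)
```

Smaller `epsilon` means stronger privacy and noisier answers; the thesis's contribution is to obtain this guarantee computationally, while the data itself stays encrypted.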
Cadoret, Marine. "Analyse factorielle de données de catégorisation. : Application aux données sensorielles". Rennes, Agrocampus Ouest, 2010. http://www.theses.fr/2010NSARG006.
In sensory analysis, holistic approaches in which objects are considered as a whole are increasingly used to collect data. Their interest comes, on the one hand, from their ability to acquire types of information other than those obtained by traditional profiling methods and, on the other hand, from the fact that they require no special skills, which makes them feasible for all subjects. Categorization (or free sorting), in which subjects are asked to provide a partition of objects, belongs to these approaches. The first part of this work focuses on categorization data. After showing that this method of data collection is relevant, we focus on the statistical analysis of these data through the search for Euclidean representations. The proposed methodology, which consists in using factorial methods such as Multiple Correspondence Analysis (MCA) or Multiple Factor Analysis (MFA), is also enriched with validity elements. This methodology is then illustrated by the analysis of two data sets, one obtained from beers and the other from perfumes. The second part is devoted to the study of two data collection methods related to categorization: sorted Napping® and hierarchical sorting. For both, we are also interested in the statistical analysis, adopting an approach similar to the one used for categorization data. The last part is devoted to the implementation, in the R software, of functions to analyze these three kinds of data: categorization data, hierarchical sorting data and sorted Napping® data.
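To make the data format concrete: free-sorting data is a set of partitions, one per subject, from which a product-by-product co-occurrence matrix is commonly derived before any factorial analysis such as MCA. A minimal sketch with made-up partitions (not data from the thesis):

```python
import numpy as np

# Each subject partitions the same 4 products into groups (free sorting):
# one group label per product.
partitions = [
    [0, 0, 1, 1],   # subject 1
    [0, 1, 1, 1],   # subject 2
    [0, 0, 0, 1],   # subject 3
]

n_products = 4
cooc = np.zeros((n_products, n_products))
for p in partitions:
    for i in range(n_products):
        for j in range(n_products):
            cooc[i, j] += (p[i] == p[j])
cooc /= len(partitions)  # proportion of subjects grouping i with j together
```

The symmetric matrix `cooc` is a similarity between products; Euclidean representations such as those sought in the thesis are then obtained by factorial or multidimensional scaling methods applied to this kind of summary (or to the underlying indicator coding, for MCA).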
Gomes da Silva, Alzennyr. "Analyse des données évolutives : application aux données d'usage du Web". Phd thesis, Université Paris Dauphine - Paris IX, 2009. http://tel.archives-ouvertes.fr/tel-00445501.
Gomes da Silva, Alzennyr. "Analyse des données évolutives : Application aux données d'usage du Web". Paris 9, 2009. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2009PA090047.
Nowadays, more and more organizations are becoming reliant on the Internet. The Web has become one of the most widespread platforms for information exchange and retrieval. The growing number of traces left behind by user transactions (e.g., customer purchases, user sessions, etc.) automatically increases the importance of usage data analysis. Indeed, the way in which a web site is visited can change over time. These changes can be related to temporal factors (day of the week, seasonality, periods of special offers, etc.). Consequently, the usage models must be continuously updated in order to reflect the current behaviour of the visitors. Such a task remains difficult when the temporal dimension is ignored or simply introduced into the data description as a numeric attribute. It is precisely on this challenge that the present thesis is focused. In order to deal with the problem of acquiring real usage data, we propose a methodology for the automatic generation of artificial usage data over which one can control the occurrence of changes and thus analyse the efficiency of a change detection system. Guided by insights from exploratory analyses, we propose a tilted-window approach for detecting and following up changes in evolving usage data. In order to measure the level of change, this approach applies two external evaluation indices based on the clustering extension. The proposed approach also characterizes the changes undergone by the usage groups (e.g. appearance, disappearance, fusion and split) at each timestamp. Moreover, the approach is totally independent of the clustering method used and is able to manage kinds of data other than usage data. Its effectiveness is evaluated on artificial data sets of different degrees of complexity and also on real data sets from different domains (academic, tourism, e-business and marketing).
Peng, Tao. "Analyse de données IoT en flux". Electronic Thesis or Diss., Aix-Marseille, 2021. http://www.theses.fr/2021AIXM0649.
Since the advent of the IoT (Internet of Things), we have witnessed an unprecedented growth in the amount of data generated by sensors. To exploit this data, we first need to model it, and then we need to develop analytical algorithms to process it. For the imputation of missing data from a sensor f, we propose ISTM (Incremental Space-Time Model), an incremental multiple linear regression model adapted to non-stationary data streams. ISTM updates its model by selecting: 1) data from sensors located in the neighborhood of f, and 2) the most recent near-past data gathered from f. To evaluate data trustworthiness, we propose DTOM (Data Trustworthiness Online Model), a prediction model that relies on online regression ensemble methods such as AddExp (Additive Expert) and BNNRW (Bagging NNRW) to assign a trust score in real time. DTOM consists of: 1) an initialization phase, 2) an estimation phase, and 3) a heuristic update phase. Finally, we are interested in predicting multiple outputs STS in the presence of imbalanced data, i.e. when there are more instances in one value interval than in another. We propose MORSTS, an online regression ensemble method with specific features: 1) the sub-models have multiple outputs, 2) it adopts a cost-sensitive strategy, i.e. an incorrectly predicted instance receives a higher weight, and 3) it manages over-fitting by means of k-fold cross-validation. Experimentation with real data has been conducted and the results were compared with well-known techniques.
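The incremental regression idea underlying a model such as ISTM can be illustrated with textbook recursive least squares, which updates the coefficients one observation at a time without revisiting past data. A generic sketch (class name and details are illustrative, not from the thesis):

```python
import numpy as np

class IncrementalLinearRegression:
    """Recursive least squares: update regression coefficients one
    observation at a time, without storing past observations."""
    def __init__(self, n_features, lam=1e3):
        self.w = np.zeros(n_features)
        self.P = lam * np.eye(n_features)  # running inverse-covariance estimate

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        k = Px / (1.0 + x @ Px)          # gain vector
        self.w += k * (y - x @ self.w)   # correct by the prediction error
        self.P -= np.outer(k, Px)        # rank-1 downdate of P

    def predict(self, x):
        return np.asarray(x) @ self.w

# Stream 500 noiseless observations of y = 3*x0 - 1.5*x1.
rng = np.random.default_rng(0)
model = IncrementalLinearRegression(2)
for _ in range(500):
    x = rng.normal(size=2)
    model.update(x, 3.0 * x[0] - 1.5 * x[1])
# model.w converges to [3.0, -1.5]
```

A non-stationary stream, as in the thesis, would additionally down-weight old observations (e.g. with a forgetting factor), which this minimal version omits.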
Sibony, Eric. "Analyse multirésolution de données de classements". Thesis, Paris, ENST, 2016. http://www.theses.fr/2016ENST0036/document.
This thesis introduces a multiresolution analysis framework for ranking data. Initiated in the 18th century in the context of elections, the analysis of ranking data has attracted major interest in many fields of the scientific literature: psychometry, statistics, economics, operations research, machine learning and computational social choice, among others. It has been further revitalized by modern applications such as recommender systems, where the goal is to infer users' preferences in order to make the best personalized suggestions. In these settings, users express their preferences only on small and varying subsets of a large catalog of items. The analysis of such incomplete rankings, however, poses both a great statistical and computational challenge, leading industrial actors to use methods that only exploit a fraction of the available information. This thesis introduces a new representation for the data, which by construction overcomes the two aforementioned challenges. Though it relies on results from combinatorics and algebraic topology, it shares several analogies with multiresolution analysis, offering a natural and efficient framework for the analysis of incomplete rankings. As it does not involve any assumption on the data, it already leads to outperforming estimators in small-scale settings and can be combined with many regularization procedures for large-scale settings. For all these reasons, we believe that this multiresolution representation paves the way for a wide range of future developments and applications.
Vidal, Jules. "Progressivité en analyse topologique de données". Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS398.
Topological Data Analysis (TDA) forms a collection of tools that enable the generic and efficient extraction of features in data. However, although most TDA algorithms have practicable asymptotic complexities, these methods are rarely interactive on real-life datasets, which limits their usability for interactive data analysis and visualization. In this thesis, we aimed at developing progressive methods for the TDA of scientific scalar data, which can be interrupted to swiftly provide a meaningful approximate output and which are able to refine it otherwise. First, we introduce two progressive algorithms for the computation of the critical points and the extremum-saddle persistence diagram of a scalar field. Next, we revisit this progressive framework to introduce an approximation algorithm for the persistence diagram of a scalar field, with strong guarantees on the related approximation error. Finally, in an effort to perform visual analysis of ensemble data, we present a novel progressive algorithm for the computation of the discrete Wasserstein barycenter of a set of persistence diagrams, a notoriously computationally intensive task. Our progressive approach enables the approximation of the barycenter within interactive times. We extend this method to a progressive, time-constrained, topological ensemble clustering algorithm.
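The persistence diagrams mentioned above pair the birth of a topological feature with its death. For a 1D scalar field, the 0-dimensional sublevel-set diagram can be computed with a short union-find sweep; this is the standard non-progressive textbook algorithm, not the progressive version developed in the thesis:

```python
def persistence_1d(values):
    """0-dimensional sublevel-set persistence of a 1D scalar field:
    each local minimum births a connected component, which dies when it
    merges into an older (lower-birth) component at a saddle value."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    parent = [None] * n  # union-find over already-inserted vertices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in order:                      # sweep vertices by increasing value
        parent[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # the younger component (higher birth value) dies here
                    young, old = (ri, rj) if values[ri] > values[rj] else (rj, ri)
                    pairs.append((values[young], values[i]))
                    parent[young] = old
    return pairs  # the global minimum never dies (essential class)

# Keep only non-trivial pairs (death strictly above birth).
diagram = [(b, d) for b, d in
           persistence_1d([2.0, 1.0, 3.0, 0.0, 2.5, 0.5, 4.0]) if d > b]
# diagram == [(0.5, 2.5), (1.0, 3.0)]
```

The two pairs correspond to the local minima at 0.5 and 1.0 merging into the component of the global minimum at the saddle values 2.5 and 3.0; interrupting and refining this kind of computation is exactly what the progressive algorithms of the thesis address.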
Périnel, Emmanuel. "Segmentation en analyse de données symboliques : le cas de données probabilistes". Paris 9, 1996. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1996PA090079.
Aaron, Catherine. "Connexité et analyse des données non linéaires". Phd thesis, Université Panthéon-Sorbonne - Paris I, 2005. http://tel.archives-ouvertes.fr/tel-00308495.
Darlay, Julien. "Analyse combinatoire de données : structures et optimisation". Phd thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00683651.
Operto, Grégory. "Analyse structurelle surfacique de données fonctionnelles cérébrales". Aix-Marseille 3, 2009. http://www.theses.fr/2009AIX30060.
Functional data acquired by magnetic resonance contain a measure of the activity at every location of the brain. Although many methods exist, the automatic analysis of these data remains an open problem. In particular, the huge majority of these methods consider the data in a volume-based fashion, in the 3D acquisition space. However, most of the activity is generated within the cortex, which can be considered as a surface. Considering the data on the cortical surface has many advantages: on the one hand, its geometry can be taken into account in every processing step; on the other hand, considering the whole volume reduces the detection power of the usually employed statistical tests. This thesis hence proposes an extension of the application field of volume-based methods to the surface-based domain by addressing problems such as projecting data onto the surface, performing surface-based multi-subject analyses, and estimating the validity of results.
Le, Béchec Antony. "Gestion, analyse et intégration des données transcriptomiques". Rennes 1, 2007. http://www.theses.fr/2007REN1S051.
Aiming at a better understanding of diseases, transcriptomic approaches allow the analysis of several thousands of genes in a single experiment. To date, international standardization initiatives have allowed the whole scientific community to use the large quantities of data generated by transcriptomic approaches, and a large number of algorithms are available to process and analyze the data sets. However, the major challenge remaining is to provide biological interpretations for these large sets of data. In particular, their integration with additional biological knowledge would certainly lead to an improved understanding of complex biological mechanisms. In my thesis work, I have developed a novel and evolutive environment for the management and analysis of transcriptomic data. Micro@rray Integrated Application (M@IA) allows the management, processing and analysis of large-scale expression data sets. In addition, I elaborated a computational method to combine multiple data sources and represent differentially expressed gene networks as interaction graphs. Finally, I used a meta-analysis of gene expression data extracted from the literature to select and combine similar studies associated with the progression of liver cancer. In conclusion, this work provides a novel tool and original analytical methodologies, thus contributing to the emerging field of integrative biology, which is indispensable for a better understanding of complex pathophysiological processes.
Abdali, Abdelkebir. "Systèmes experts et analyse de données industrielles". Lyon, INSA, 1992. http://www.theses.fr/1992ISAL0032.
To analyze industrial process behavior, many kinds of information are needed. As they are mostly numerical, statistical and data analysis methods are well suited to this activity. Their results must be interpreted with other knowledge about the analyzed process. Our work falls within the framework of the application of Artificial Intelligence techniques to statistics. Its aim is to study the feasibility and development of statistical expert systems in the field of industrial processes. The prototype ALADIN is a knowledge-based system designed to be an intelligent assistant helping a non-specialist user analyze data collected on industrial processes. Written in Turbo-Prolog, it is coupled with the statistical package MODULAD. The architecture of this system is flexible, combining knowledge about plants in general, the studied process and statistical methods. Its validation was performed on continuous manufacturing processes (cement and cast iron processes). At the present time, it is limited to principal component analysis problems.
David, Claire. "Analyse de XML avec données non-bornées". Paris 7, 2009. http://www.theses.fr/2009PA077107.
The motivation of this work is the specification and static analysis of schemas for XML documents, paying special attention to data values. We consider words and trees whose positions are labeled both by a letter from a finite alphabet and a data value from an infinite domain. Our goal is to find formalisms which offer good trade-offs between expressibility, decidability and complexity (for the satisfiability problem). We first study an extension of first-order logic with a binary predicate representing data equality. We obtain some interesting results when we consider the two-variable fragment. This approach is elegant but the complexity results are not encouraging. We propose another formalism based on data patterns, which can be desired, forbidden, or combined in any boolean combination. We draw precisely the decidability frontier for various fragments of this model. The complexity results that we get, while still high, seem more amenable. In terms of expressivity these two approaches are orthogonal: the two-variable fragment of the extension of FO can express unary keys and unary foreign keys, while boolean combinations of data patterns can express arbitrary keys but cannot express foreign keys.
Carvalho, Francisco de. "Méthodes descriptives en analyse de données symboliques". Paris 9, 1992. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1992PA090025.
Royer, Jean-Jacques. "Analyse multivariable et filtrage des données régionalisées". Vandoeuvre-les-Nancy, INPL, 1988. http://www.theses.fr/1988NAN10312.
Texto completoFaye, Papa Abdoulaye. "Planification et analyse de données spatio-temporelles". Thesis, Clermont-Ferrand 2, 2015. http://www.theses.fr/2015CLF22638/document.
Spatio-temporal modeling makes it possible to predict a regionalized variable at unobserved points of a given field, based on observations of this variable at some points of the field at different times. In this thesis, we propose an approach which combines numerical and statistical models. Indeed, by using Bayesian methods we combined the different sources of information: spatial information provided by the observations, temporal information provided by the black-box model, and prior information on the phenomenon of interest. This approach allowed us to obtain a good prediction of the variable of interest and a good quantification of the uncertainty on this prediction. We also propose a new method for constructing experimental designs by establishing an optimality criterion based on the uncertainty and the expected value of the phenomenon.
Jamal, Sara. "Analyse spectrale des données du sondage Euclid". Thesis, Aix-Marseille, 2017. http://www.theses.fr/2017AIXM0263.
Large-scale surveys, such as Euclid, will produce a large set of data that will require the development of fully automated data-processing pipelines to analyze the data, extract crucial information and ensure that all requirements are met. From a survey, the redshift is an essential quantity to measure. Distinct methods to estimate redshifts exist in the literature, but there is no fully automated definition of a reliability criterion for redshift measurements. In this work, we first explored common techniques of spectral analysis, such as filtering and continuum extraction, that could be used as preprocessing to improve the accuracy of spectral feature measurements, then focused on developing a new methodology to automate the reliability assessment of spectroscopic redshift measurements by exploiting Machine Learning (ML) algorithms and features of the posterior redshift probability distribution function (PDF). Our idea consists in quantifying, through ML and zPDF descriptors, the reliability of a redshift measurement into distinct partitions that describe different levels of confidence. For example, a multimodal zPDF refers to multiple (plausible) redshift solutions, possibly with similar probabilities, while a strongly unimodal zPDF with a low dispersion and a unique, prominent peak depicts a more "reliable" redshift estimate. We assess that this new methodology could be very promising for next-generation large spectroscopic surveys on the ground and in space, such as Euclid and WFIRST.
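Descriptors of a redshift PDF of the kind mentioned above can be as simple as a mode count and a dispersion. A hypothetical sketch (the exact descriptors and thresholds used in the thesis may differ):

```python
import numpy as np

def zpdf_descriptors(z_grid, pdf):
    """Simple descriptors of a redshift posterior PDF that could feed a
    reliability classifier: number of significant modes, dispersion,
    and the height of the strongest peak."""
    dz = z_grid[1] - z_grid[0]
    pdf = pdf / (pdf.sum() * dz)                       # normalize to unit mass
    mean = (z_grid * pdf).sum() * dz
    sigma = np.sqrt(((z_grid - mean) ** 2 * pdf).sum() * dz)
    # interior local maxima above 5% of the highest peak count as modes
    interior = (pdf[1:-1] > pdf[:-2]) & (pdf[1:-1] > pdf[2:])
    modes = int(np.sum(interior & (pdf[1:-1] > 0.05 * pdf.max())))
    return {"n_modes": modes, "sigma": sigma, "peak": float(pdf.max())}

# Two toy zPDFs: a confident unimodal one and an ambiguous bimodal one.
z = np.linspace(0.0, 3.0, 600)
unimodal = np.exp(-0.5 * ((z - 1.2) / 0.05) ** 2)
bimodal = unimodal + np.exp(-0.5 * ((z - 2.4) / 0.05) ** 2)
desc = zpdf_descriptors(z, bimodal)
```

A classifier trained on such descriptors would then assign the bimodal case to a lower-confidence partition than the unimodal one, which is the spirit of the methodology described above.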
Bobin, Jérôme. "Diversité morphologique et analyse de données multivaluées". Paris 11, 2008. http://www.theses.fr/2008PA112121.
Lambert, Thierry. "Réalisation d'un logiciel d'analyse de données". Paris 11, 1986. http://www.theses.fr/1986PA112274.
Texto completoLancrenon, Jean. "Authentification d'objets à distance". Phd thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00685206.
Fraisse, Bernard. "Automatisation, traitement du signal et recueil de données en diffraction x et analyse thermique : Exploitation, analyse et représentation des données". Montpellier 2, 1995. http://www.theses.fr/1995MON20152.
Texto completoGonzalez, Ignacio. "Analyse canonique régularisée pour des données fortement multidimensionnelles". Toulouse 3, 2007. http://thesesups.ups-tlse.fr/99/.
Motivated by the study of relationships between gene expressions and other biological variables, our work consists in presenting and developing a methodology addressing this problem. Among the statistical methods treating this subject, Canonical Analysis (CA) seemed well adapted, but high dimensionality is at present one of the major obstacles for statistical techniques applied to data coming from microarrays. The main axis of this work was therefore the search for solutions taking this crucial aspect into account in the implementation of CA. Among the approaches considered to handle this problem, we were interested in regularization methods. The method developed here, called Regularised Canonical Analysis (RCA), is based on the principle of ridge regularization initially introduced in multiple linear regression. As RCA requires the choice of two regularization parameters for its implementation, we proposed the method of M-fold cross-validation to handle this problem. We presented in detail applications of RCA to highly multidimensional data coming from genomic studies as well as to data coming from other domains. Among other things, we were interested in visualizing the data in order to facilitate the interpretation of the results. For that purpose, we proposed some graphical methods: representations of variables (correlation graphs), representations of individuals, as well as alternative representations such as networks and heatmaps.
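The ridge idea behind RCA can be sketched directly: add a multiple of the identity to each within-set covariance matrix before solving the canonical-correlation eigenproblem, so that the matrices stay invertible when variables outnumber samples. A generic illustration, not the authors' code; the cross-validated choice of the two parameters is omitted:

```python
import numpy as np

def rcca(X, Y, lam1=0.1, lam2=0.1):
    """Ridge-regularized canonical correlations between data sets X and Y.
    The penalties lam1, lam2 make Sxx and Syy invertible even when the
    number of variables exceeds the number of samples."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / n + lam1 * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + lam2 * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n
    # Canonical correlations are the square roots of the eigenvalues of
    # Sxx^{-1} Sxy Syy^{-1} Syx.
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    eigvals = np.sort(np.linalg.eigvals(M).real)[::-1]
    return np.sqrt(np.clip(eigvals, 0.0, 1.0))

# Two high-dimensional blocks driven by one shared latent variable.
rng = np.random.default_rng(1)
latent = rng.normal(size=(50, 1))
X = np.hstack([latent + 0.1 * rng.normal(size=(50, 1)) for _ in range(30)])
Y = np.hstack([latent + 0.1 * rng.normal(size=(50, 1)) for _ in range(40)])
corrs = rcca(X, Y)  # leading canonical correlation close to 1
```

Without the ridge terms, `Sxx` and `Syy` would be singular here whenever variables outnumber samples, which is precisely the microarray setting that motivates RCA.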
Bazin, Gurvan. "Analyse différée des données du SuperNova Legacy Survey". Paris 7, 2008. http://www.theses.fr/2008PA077135.
The SuperNova Legacy Survey (SNLS) experiment observed type Ia supernovae (SNe Ia) during 5 years. Its aim is to constrain cosmological parameters. The online reduction pipeline is based on spectroscopic identification of each supernova. Systematically using spectroscopy requires a sufficient signal-to-noise level. Thus, it could lead to selection biases and would not be possible for future surveys. This PhD thesis reports a complementary method for data reduction based on a completely photometric selection. This analysis, more efficient at selecting faint events, approximately doubles the SNe Ia sample of the SNLS. The method shows a clear bias in the spectroscopic selection: brighter SNe Ia are systematically selected beyond a redshift of 0.7. On the other hand, no important impact on cosmology was found, so the corrections for the intrinsic variability of SNe Ia luminosity are robust. In addition, this work is a first step in studying the feasibility of such a purely photometric analysis for cosmology. This is a promising method for future projects.
Hapdey, Sébastien. "Analyse de données multi-isotopiques en imagerie monophotonique". Paris 11, 2002. http://www.theses.fr/2002PA11TO35.
Feydy, Jean. "Analyse de données géométriques, au delà des convolutions". Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASN017.
Texto completoGeometric data analysis, beyond convolutionsTo model interactions between points, a simple option is to rely on weighted sums known as convolutions. Over the last decade, this operation has become a building block for deep learning architectures with an impact on many applied fields. We should not forget, however, that the convolution product is far from being the be-all and end-all of computational mathematics.To let researchers explore new directions, we present robust, efficient and principled implementations of three underrated operations: 1. Generic manipulations of distance-like matrices, including kernel matrix-vector products and nearest-neighbor searches.2. Optimal transport, which generalizes sorting to spaces of dimension D > 1.3. Hamiltonian geodesic shooting, which replaces linear interpolation when no relevant algebraic structure can be defined on a metric space of features.Our PyTorch/NumPy routines fully support automatic differentiation and scale up to millions of samples in seconds. They generally outperform baseline GPU implementations with x10 to x1,000 speed-ups and keep linear instead of quadratic memory footprints. These new tools are packaged in the KeOps (kernel methods) and GeomLoss (optimal transport) libraries, with applications that range from machine learning to medical imaging. Documentation is available at: www.kernel-operations.io/keops and /geomloss
Hebert, Pierre-Alexandre. "Analyse de données sensorielles : une approche ordinale floue". Compiègne, 2004. http://www.theses.fr/2004COMP1542.
Sensory profile data aims at describing the sensory perceptions of human subjects. Such data is composed of scores attributed by human sensory experts (or judges) in order to describe a set of products according to sensory descriptors. All assessments are repeated, usually three times. The thesis describes a new analysis method based on a fuzzy modelling of the scores. The first step of the method consists in extracting and encoding the relevant information of each replicate into a fuzzy weak dominance relation. Then an aggregation procedure over the replicates allows the perception of each judge to be synthesized into a new fuzzy relation. In a similar way, a consensual relation is finally obtained for each descriptor by fusing the relations of the judges. So as to ensure the interpretability of the fused relations, fuzzy preference theory is used. A set of graphical tools is then proposed for the mono- and multidimensional analysis of the obtained relations.
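To fix ideas on the kind of relation involved, here is a crisp dominance degree computed from repeated scores with made-up numbers; the thesis builds a genuinely fuzzy weak dominance relation, which this sketch only approximates:

```python
import numpy as np

# Scores given by one judge to 3 products over 3 replicates on a single
# sensory descriptor (rows: replicates, columns: products).
scores = np.array([
    [7.0, 4.0, 5.0],
    [6.0, 3.0, 6.0],
    [8.0, 5.0, 4.0],
])

n_products = scores.shape[1]
dominance = np.zeros((n_products, n_products))
for i in range(n_products):
    for j in range(n_products):
        # degree to which product i weakly dominates product j:
        # proportion of replicates where i scored at least as high as j
        dominance[i, j] = np.mean(scores[:, i] >= scores[:, j])
```

Aggregating such per-replicate, then per-judge relations into a consensual one per descriptor is the pipeline described in the abstract; the fuzzy version replaces the hard `>=` comparison with a graded one.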
Narozny, Michel. "Analyse en composantes indépendantes et compression de données". Paris 11, 2005. http://www.theses.fr/2005PA112268.
In this thesis we are interested in the performance of independent component analysis (ICA) when it is used for data compression. First we show that the ICA transformations yield poor performance compared to the Karhunen-Loève transform (KLT) for the coding of some continuous-tone images and a musical signal, but can outperform the KLT on some synthetic signals. In medium-to-high (resp. low) bit-rate coding, the bit rate measured is the empirical first (resp. second, fourth and ninth) order entropy. The mean square error between the original signal and the reconstructed one is used for the evaluation of the distortion. Then we show that for non-Gaussian signals the problem of finding the optimal linear transform in transform coding is equivalent to solving a modified ICA problem. Two new algorithms, GCGsup and ICAorth, are then proposed to compute the optimal linear transform and the optimal orthogonal transform, respectively. In our simulations, we show that GCGsup and ICAorth can outperform the KLT for some continuous-tone images and some synthetic signals. Finally, we are also interested in a multicomponent image coding scheme which employs a wavelet transform to reduce the spatial redundancy and the transformations returned by GCGsup and ICAorth to reduce the spectral redundancy. In this case, further work has to be done in order to find images whose compression using the new transforms is significantly better than that obtained with the KLT.
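For reference, the KLT that serves as the baseline here is just the eigenbasis of the signal covariance; a small NumPy sketch showing the decorrelation it achieves on a synthetic two-channel signal (an illustration of the baseline, not of GCGsup or ICAorth):

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-channel signal: mix independent Gaussians linearly.
s = rng.normal(size=(10000, 2)) @ np.array([[1.0, 0.8],
                                            [0.0, 0.6]])
cov = np.cov(s, rowvar=False)

# The KLT is the orthonormal eigenbasis of the covariance matrix.
eigvals, klt = np.linalg.eigh(cov)
t = s @ klt  # transform coefficients

# After the KLT the channels are decorrelated: the empirical covariance
# of t is diagonal. For Gaussian sources this is optimal for transform
# coding; ICA-based transforms aim to do better on non-Gaussian sources.
cov_t = np.cov(t, rowvar=False)
```

Decorrelation is all the KLT guarantees; the thesis's point is that for non-Gaussian signals a transform adapted to higher-order statistics (via a modified ICA problem) can yield a better rate-distortion trade-off.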
Aubert, Julie. "Analyse statistique de données biologiques à haut débit". Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS048/document.
The technological progress of the last twenty years allowed the emergence of a high-throughput biology based on large-scale data obtained automatically. Statisticians have an important role to play in the modelling and analysis of these data, which are numerous, noisy, sometimes heterogeneous and collected at various scales. This role can be of several natures. The statistician can propose new concepts or new methods inspired by the questions asked by this biology. He can propose a fine modelling of the phenomena observed by means of these technologies. And when methods exist and require only an adaptation, the role of the statistician can be that of an expert, who knows the methods, their limits and their advantages. In a first part, I introduce different methods developed with my co-authors for the analysis of high-throughput biological data, based on latent variable models. These models make it possible to explain an observed phenomenon using hidden or latent variables. The simplest latent variable model is the mixture model. The first two presented methods constitute two examples: the first in a context of multiple testing and the second in the framework of the definition of a hybridization threshold for data derived from microarrays. I also present a model of coupled hidden Markov chains for the detection of copy-number variations in genomics, taking into account the dependence between individuals, due for example to a genetic proximity. For this model we propose an approximate inference based on a variational approximation, exact inference becoming intractable as the number of individuals increases. We also define a latent block model capturing an underlying structure per block of rows and columns, adapted to count data from microbial ecology.
Metabarcoding and metagenomic data correspond to the abundance of each microorganism in a microbial community within its environment (plant rhizosphere, human digestive tract, ocean, for example). These data have the particularity of presenting a dispersion stronger than expected under the most conventional models (so-called over-dispersion). Biclustering is a way to study the interactions between the structure of microbial communities and the biological samples from which they are derived. We proposed to model this phenomenon using a Poisson-Gamma distribution and developed another variational approximation for this particular latent block model, as well as a model selection criterion. The model's flexibility and performance are illustrated on three real datasets. A second part is devoted to work dedicated to the analysis of transcriptomic data derived from DNA microarrays and RNA sequencing. The first section is devoted to the normalization of data (detection and correction of technical biases) and presents two new methods that I proposed with my co-authors, and a comparison of methods to which I contributed. The second section, devoted to experimental design, presents a method for analyzing so-called dye-switch designs. In the last part, I present two examples of collaboration, derived respectively from an analysis of differentially expressed genes from microarray data and an analysis of the translatome in sea urchins from RNA-sequencing data, showing how statistical skills are mobilized and the added value that statistics brings to genomics projects.
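The over-dispersion that motivates the Poisson-Gamma model can be made concrete with a short simulation (a hedged sketch; the function and parameter names are illustrative, not the thesis's): a Poisson whose rate is itself Gamma-distributed is marginally negative binomial, with variance mean + mean^2/shape rather than the Poisson's variance = mean.

```python
import numpy as np

def simulate_poisson_gamma(mean, shape, size, rng):
    """Draw counts from a Poisson whose rate is Gamma(shape, mean/shape):
    marginally negative binomial, variance = mean + mean**2 / shape."""
    rates = rng.gamma(shape, mean / shape, size=size)
    return rng.poisson(rates)

rng = np.random.default_rng(42)
counts = simulate_poisson_gamma(mean=10.0, shape=2.0, size=100_000, rng=rng)
# Empirical variance is close to 10 + 100/2 = 60, far above the Poisson
# prediction of 10: the over-dispersion the latent block model accounts for.
```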
Hivert, Benjamin. "Clustering et analyse différentielle de données d'expression génique". Electronic Thesis or Diss., Bordeaux, 2024. http://www.theses.fr/2024BORD0171.
Analyses of gene expression data obtained from bulk RNA sequencing (bulk RNA-seq) or single-cell RNA sequencing (scRNA-seq) have become commonplace in immunological studies. They allow for a better understanding of the heterogeneity present in immune responses, whether in reaction to vaccination or disease. Typically, the analysis of these data is conducted in two steps: i) first, an unsupervised classification, or clustering, is performed using all the genes to group samples into distinct and homogeneous subgroups; ii) then, differential analysis is conducted using hypothesis tests to identify genes that are differentially expressed between these subgroups. However, these two successive steps lead to a methodological challenge that is often overlooked in the applied literature. Traditional inference methods require hypotheses to be fixed a priori and independently of the data to ensure effective control of the type I error. In the context of these two-step analyses, the hypothesis tests are based on the results of the clustering, which compromises the control of the type I error by traditional methods and can lead to false discoveries. We propose new statistical methods that account for this double use of the data and ensure an effective control of the number of false discoveries.
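The pitfall described here, testing a hypothesis suggested by clustering the very same data, can be demonstrated in a few lines. A minimal sketch with synthetic data: a tiny one-dimensional 2-means and a two-sample t-test stand in for the generic clustering and differential-testing steps.

```python
import numpy as np
from scipy import stats

def two_means_1d(x, n_iter=50):
    """Tiny 1-d 2-means (Lloyd's algorithm); returns boolean labels."""
    c0, c1 = x.min(), x.max()
    for _ in range(n_iter):
        labels = np.abs(x - c0) > np.abs(x - c1)  # True -> cluster 1
        c0, c1 = x[~labels].mean(), x[labels].mean()
    return labels

def naive_post_clustering_pvalue(x):
    """Cluster, then t-test the difference between the two clusters
    that the clustering itself created -- the double use of the data."""
    labels = two_means_1d(x)
    return stats.ttest_ind(x[labels], x[~labels]).pvalue

rng = np.random.default_rng(1)
# Pure noise, no true subgroups: yet the naive p-values are essentially
# zero, because the test inherits the separation the clustering created.
pvals = [naive_post_clustering_pvalue(rng.normal(size=100)) for _ in range(20)]
```

This is exactly why the type I error of the naive two-step pipeline is not controlled, and why the corrected inference procedures proposed in the thesis are needed.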
Kezouit, Omar Abdelaziz. "Bases de données relationnelles et analyse de données : conception et réalisation d'un système intégré". Paris 11, 1987. http://www.theses.fr/1987PA112130.
Jais, Jean-Philippe. "Modèles de régression pour l'analyse de données qualitatives longitudinales". Paris 7, 1993. http://www.theses.fr/1993PA077065.
Kronek, Louis-Philippe. "Analyse combinatoire de données appliquée à la recherche médicale". Grenoble INPG, 2008. http://www.theses.fr/2008INPG0146.
Logical analysis of data is a supervised learning method based on the theory of partially defined Boolean functions and combinatorial optimization. Its implementation involves a wide range of resolution methods from operations research. The purpose of this work is to continue developing this method while keeping in mind constraints specific to medical research, and more particularly the elegance and ease of understanding of the result, which should be accessible with basic mathematical knowledge. Three parts of this problem have been treated: efficient model generation, adaptation to survival analysis, and optimization of the implementation of a new decision model.
El, Hafyani Hafsa. "Analyse de données spatio-temporelles dans le contexte de la collecte participative de données environnementales". Thesis, université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG035.
Air quality is one of the major risk factors for human health. Mobile Crowd Sensing (MCS), a new paradigm based on emerging connected micro-sensor technology, offers the opportunity to assess personal exposure to air pollution anywhere and at any time. This leads to the continuous generation of geolocated data series, which results in a big data volume. Such data are deemed to be a mine of information for various analyses, and a unique opportunity for knowledge discovery about pollution exposure. However, achieving this analysis is far from straightforward. In fact, there is a gap to fill between the raw sensor data series and usable information: raw data are highly uneven, noisy, and incomplete. The major challenge addressed by this thesis is to fill this gap by providing a holistic approach to data analytics and mining in the context of MCS. We establish an end-to-end analytics pipeline, which encompasses data preprocessing, enrichment with contextual information, as well as data modeling and storage. We implemented this pipeline while ensuring its automated deployment. The proposed approaches have been applied to real-world datasets collected within the Polluscope project.
Peyre, Julie. "Analyse statistique des données issues des biopuces à ADN". PhD thesis, Université Joseph Fourier (Grenoble), 2005. http://tel.archives-ouvertes.fr/tel-00012041.
In a first chapter, we study the problem of data normalization, whose objective is to eliminate spurious variations between population samples so as to retain only the variations explained by biological phenomena. We present several existing methods, for which we propose improvements. To guide the choice of a normalization method, a method for simulating microarray data is developed.
In a second chapter, we address the problem of detecting differentially expressed genes between two series of experiments. This reduces to a multiple hypothesis testing problem. Several approaches are considered: model selection and penalization, an FDR method based on a wavelet decomposition of the test statistics, and Bayesian thresholding.
In the last chapter, we consider supervised classification problems for microarray data. To address the "curse of dimensionality", we developed a semi-parametric dimension reduction method, based on the maximization of a local likelihood criterion in single-index generalized linear models. The dimension reduction step is then followed by a local polynomial regression step to carry out the supervised classification of the individuals considered.
Villa, Francesca. "Calibration photométrique de l'imageur MegaCam : analyse des données SNDice". PhD thesis, Université Pierre et Marie Curie - Paris VI, 2012. http://tel.archives-ouvertes.fr/tel-00839491.
Vatsiou, Alexandra. "Analyse de génétique statistique en utilisant des données pangénomiques". Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAS002/document.
The complex phenotypes observed nowadays in human populations are determined by genetic as well as environmental factors. For example, nutrition and lifestyle play important roles in the development of multifactorial diseases such as obesity and diabetes. Adaptation on such complex phenotypic traits may occur via allele frequency shifts at multiple loci, a phenomenon known as polygenic selection. Recent advances in statistical approaches and the emergence of high-throughput Next Generation Sequencing data have enabled the detection of such signals. Here we aim to understand the extent to which environmental changes lead to shifts in selective pressures, as well as the impact of those shifts on disease susceptibility. To achieve that, we propose a gene set enrichment analysis using SNP selection scores, i.e. scores that quantify the selection pressure on SNPs and can be derived from genome-scan methods. Initially we carry out a sensitivity analysis to investigate which of the recent genome-scan methods accurately identify the selected regions. A simulation approach is used to assess their performance under a wide range of complex demographic structures, under both hard and soft selective sweeps. Then we develop SEL-GSEA, a tool to identify pathways enriched for evolutionary pressures, which is based on SNP data. Finally, to examine the effect of potential environmental changes that could represent changes in selection pressures, we apply SEL-GSEA, as well as Gowinda, an available online tool, in a population-based study. We analyzed three populations (Africans, Europeans and Asians) from the HapMap database. To acquire the SNP selection scores that are the basis for SEL-GSEA, we used a combination of the two genome-scan methods (iHS and XPCLR) that performed best in our sensitivity analysis. The results of our analysis show extensive selection pressures on immune-related pathways, mainly in the African population, as well as on the glycolysis and gluconeogenesis pathway in Europeans, which is related to metabolism and diabetes.
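The core of such a gene-set enrichment analysis on per-SNP selection scores can be reduced to a permutation test: compare the mean score of the SNPs in a set against the means of randomly drawn SNP sets of the same size. This is an illustrative reduction, not SEL-GSEA itself; the function name and interface are assumptions.

```python
import numpy as np

def enrichment_pvalue(scores, set_idx, n_perm=2000, rng=None):
    """Permutation p-value for 'SNPs in the set have a higher mean
    selection score than random SNP sets of the same size'."""
    if rng is None:
        rng = np.random.default_rng(0)
    scores = np.asarray(scores, dtype=float)
    observed = scores[set_idx].mean()
    k = len(set_idx)
    null = np.array([rng.choice(scores, size=k, replace=False).mean()
                     for _ in range(n_perm)])
    # add-one smoothing keeps the p-value away from an impossible zero
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

Real tools additionally correct for gene length and SNP linkage when building the null, which a plain permutation over SNPs ignores.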
Chavent, Marie. "Analyse de données symboliques : une méthode divisive de classification". Paris 9, 1997. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1997PA090029.
Yahia, Hussein. "Analyse des structures de données arborescentes représentant des images". Paris 11, 1986. http://www.theses.fr/1986PA112292.
Bossut, Philippe. "Analyse des données : application à l'analyse automatique d'images multispectrales". École nationale supérieure des mines de Paris, 1986. http://www.theses.fr/1986ENMP0010.
Laur, Pierre Alain. "Données semi structurées : Découverte, maintenance et analyse de tendances". Montpellier 2, 2004. http://www.theses.fr/2004MON20053.
Lemoine, Frédéric. "Intégration, interrogation et analyse de données de génomique comparative". Paris 11, 2008. http://www.theses.fr/2008PA112180.
Our work takes place within the "Microbiogenomics" project. Microbiogenomics aims at building a prokaryotic genomic data warehouse. This data warehouse gathers numerous data currently dispersed, in order to improve the functional annotation of bacterial genomes. Within this project, our work has several facets. The first one focuses mainly on the analysis of biological data. We are particularly interested in the conservation of gene order during the evolution of prokaryotic genomes. To do so, we designed a computational pipeline aiming at detecting the areas whose gene order is conserved. We then studied the relative evolution of the proteins coded by genes located in conserved areas, in comparison with the other proteins. These data were made available through the SynteView synteny visualization tool (http://www.synteview.u-psud.fr). Moreover, to broaden the analysis of these data, we need to cross them with other kinds of data, such as pathway data. These data, often dispersed and heterogeneous, are difficult to query. That is why, in a second step, we were interested in querying the Microbiogenomics data warehouse. We designed an architecture and algorithms to query the data warehouse while keeping the different points of view given by the sources. These algorithms were implemented in GenoQuery (http://www.lri.fr/~lemoine/GenoQuery), a prototype querying module adapted to a genomic data warehouse.
Llobell, Fabien. "Classification de tableaux de données, applications en analyse sensorielle". Thesis, Nantes, Ecole nationale vétérinaire, 2020. http://www.theses.fr/2020ONIR143F.
Multiblock datasets are more and more frequent in several areas of application. This is particularly the case in sensory evaluation, where several tests lead to multiblock datasets, each dataset being related to a subject (judge, consumer, ...). The statistical analysis of this type of data has raised increasing interest over the last thirty years. However, the clustering of multiblock datasets has received little attention, even though there is an important need for it. In this context, a method called CLUSTATIS, devoted to the cluster analysis of datasets, is proposed. At the heart of this approach is the STATIS method, which is a multiblock data analysis strategy. Several extensions of the CLUSTATIS clustering method are presented. In particular, the case of data from the so-called "Check-All-That-Apply" (CATA) task is considered, and an ad-hoc clustering method called CLUSCATA is discussed. In order to improve the homogeneity of clusters from both CLUSTATIS and CLUSCATA, an option to add an additional cluster, called "K+1", is introduced. The purpose of this additional cluster is to collect datasets identified as atypical. The choice of the number of clusters is discussed, and solutions are proposed. Applications in sensory analysis as well as simulation studies highlight the relevance of the clustering approach. Implementations in the XLSTAT software and in the R environment are presented.
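The similarity measure underlying STATIS-type analysis of multiblock data is the RV coefficient between tables sharing the same rows (here the products, each table coming from one judge). A minimal sketch of that building block, assuming plain column-centered tables; this is illustrative code, not the CLUSTATIS implementation:

```python
import numpy as np

def rv_coefficient(X, Y):
    """RV coefficient between two data tables with the same rows:
    the matrix correlation between their cross-product matrices WW',
    equal to 1 when the tables carry the same row configuration."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Wx = X @ X.T
    Wy = Y @ Y.T
    return np.trace(Wx @ Wy) / np.sqrt(np.trace(Wx @ Wx) * np.trace(Wy @ Wy))
```

A dataset-clustering method in this spirit then groups tables so that each cluster's tables have high RV similarity to the cluster's compromise table.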
Jaunâtre, Kévin. "Analyse et modélisation statistique de données de consommation électrique". Thesis, Lorient, 2019. http://www.theses.fr/2019LORIS520.
In October 2014, the French Environment & Energy Management Agency (ADEME), together with the ENEDIS company, started a research project named SOLENN ("SOLidarité ENergie iNovation") with multiple objectives, such as studying the control of electric consumption by following households and securing the electric supply. The SOLENN project was led by the ADEME and took place in Lorient, France. The main goal of this project is to improve households' knowledge of electric energy saving. In this context, we describe a method to estimate extreme quantiles and probabilities of rare events, which is implemented in an R package. Then, we propose an extension of the famous Cox proportional hazards model which allows the estimation of the probabilities of rare events. Finally, we apply some of the statistical models developed in this document to electric consumption datasets that were useful for the SOLENN project. A first application is linked to the electric constraint program directed by ENEDIS in order to secure the electric network: houses undergo a reduction of their maximal power for a short period of time, and the goal is to study how households behave during this period. A second application concerns the use of the multiple regression model to study the effect of individual visits on electric consumption. The goal is to study the impact on the electric consumption for the week or the month following a visit.
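Extreme-quantile estimation of the kind mentioned above is typically done with a peaks-over-threshold approach: fit a generalized Pareto distribution (GPD) to the exceedances over a high threshold and invert the tail formula. A hedged Python sketch of the generic method, not the thesis's R package:

```python
import numpy as np
from scipy import stats

def extreme_quantile(data, threshold, p):
    """Peaks-over-threshold estimate of the order-p quantile:
    fit a GPD to the exceedances over `threshold` and invert
    P(X > x) = zeta_u * (1 - F_GPD(x - threshold))."""
    data = np.asarray(data, dtype=float)
    exceed = data[data > threshold] - threshold
    zeta = exceed.size / data.size              # empirical P(X > threshold)
    shape, _, scale = stats.genpareto.fit(exceed, floc=0.0)
    return threshold + stats.genpareto.ppf(1 - (1 - p) / zeta,
                                           shape, loc=0.0, scale=scale)
```

This estimates quantiles beyond the range where empirical quantiles are reliable, provided the threshold is high enough for the GPD tail approximation to hold.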
Rodriguez-Rojas, Oldemar. "Classification et modèles linéaires en analyse des données symboliques". Paris 9, 2000. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2000PA090064.
Porrot, Sylvain. "Complexité de Kolmogorov et analyse de flots de données". Lille 1, 1998. https://pepite-depot.univ-lille.fr/LIBRE/Th_Num/1998/50376-1998-209.pdf.
Bodin, Bruno. "Analyse d'Applications Flot de Données pour la Compilation Multiprocesseur". PhD thesis, Université Pierre et Marie Curie - Paris VI, 2013. http://tel.archives-ouvertes.fr/tel-00922578.
Martin-Rémy, Aurélie. "Analyse de données de biométrologie : aspects méthodologiques et applications". Thesis, Université de Lorraine, 2018. http://www.theses.fr/2018LORR0223/document.
Many biomonitoring studies are conducted at INRS in order to assess occupational exposure to chemicals in France and to propose reference values to protect workers exposed to these substances. These studies consist in simultaneously measuring the biological and airborne exposure of workers exposed to a toxic substance. The relationship between these biological and airborne measurements is then estimated through a linear regression model. When this relationship exists and the route of absorption of the toxic substance is essentially inhalation, it is possible to derive a Biological Limit Value (BLV) from the Occupational Exposure Limit (OEL) of the substance. However, two characteristics of these data have been identified which are not, or only partially, taken into account in current statistical modelling: the left-censoring due to the limits of detection (LoD) or quantification (LoQ) of biological and airborne measurements, and the between-individual variability. Ignoring either of these features in the modelling leads to a loss of statistical power and potentially biased conclusions. The work carried out in this thesis allowed us to adapt the regression model to these two characteristics, in a Bayesian framework. The proposed approach is based on the modelling of airborne measurements using random effects models adapted to values below the LoD/LoQ, and on the simultaneous modelling of biological measurements, assumed to depend linearly, on a logarithmic scale, on the airborne exposure, while taking into account between-subject variability. This work resulted in a scientific publication in a peer-reviewed journal. The methodology has been applied to beryllium and chromium occupational exposure datasets, after adaptation to the toxicokinetic characteristics of these two substances. It has thus been possible to propose a BLV for beryllium (0.06 μg/g creatinine).
The analysis of chromium measurements in two different sectors of activity (occupational exposure to chromate paints, and occupational exposure in the electroplating sector) made it possible to show that urinary chromium depends mainly on airborne exposure to chromium VI, non-VI chromium having less impact. We were not able to show a relationship between the solubility of airborne chromium VI and urinary chromium. A BLV of 0.41 μg/g creatinine, close to the Biological Guidance Value (BGV) proposed by ANSES (0.54 μg/g creatinine), was estimated for occupational exposure to chromate paints, and a BLV of 1.85 μg/g creatinine was obtained for occupational exposure in the electrolytic chromium plating sector, which is consistent with the BLV proposed by ANSES for this sector, i.e. 1.8 μg/g creatinine.
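The left-censoring treatment described above can be illustrated in its simplest form (a sketch under strong simplifying assumptions: a single normal variable on the log scale, no random effects, and a frequentist maximum-likelihood fit rather than the thesis's Bayesian hierarchical model): an observation below the LoD contributes its probability of falling below the LoD to the likelihood, instead of a density term.

```python
import numpy as np
from scipy import optimize, stats

def fit_censored_normal(values, lod):
    """ML estimate of (mu, sigma) when values below `lod` are only
    known to be below it (left-censoring at the limit of detection)."""
    values = np.asarray(values, dtype=float)
    detected = values[values >= lod]
    n_cens = values.size - detected.size

    def negloglik(params):
        mu, log_sigma = params
        sigma = np.exp(log_sigma)              # keeps sigma positive
        ll = stats.norm.logpdf(detected, mu, sigma).sum()
        ll += n_cens * stats.norm.logcdf(lod, mu, sigma)  # censored mass
        return -ll

    res = optimize.minimize(negloglik, x0=[detected.mean(), 0.0])
    return res.x[0], np.exp(res.x[1])
```

Unlike the common shortcut of replacing non-detects by LoD/2, this censored likelihood is unbiased under the model, which is why the same idea is embedded in the Bayesian regression used in the thesis.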