Dissertations / Theses on the topic 'Inférence sélective'
Consult the top 31 dissertations / theses for your research on the topic 'Inférence sélective.'
Yadegari, Iraj. "Prédiction, inférence sélective et quelques problèmes connexes." Thèse, Université de Sherbrooke, 2017. http://hdl.handle.net/11143/10167.
We study the problem of point estimation and predictive density estimation of the mean of a selected population, obtaining novel developments which include bias analysis, decomposition of risk, and problems with restricted parameters (Chapter 2). We propose efficient predictive density estimators in terms of Kullback-Leibler and Hellinger losses (Chapter 3), improving on plug-in procedures via a dual loss and via a variance expansion scheme. Finally (Chapter 4), we present findings on improving on the maximum likelihood estimator (MLE) of a bounded normal mean under a class of loss functions, including reflected normal loss, with implications for predictive density estimation. Namely, we give conditions on the loss and the width of the parameter space for which the Bayes estimator with respect to the boundary uniform prior dominates the MLE.
Hivert, Benjamin. "Clustering et analyse différentielle de données d'expression génique." Electronic Thesis or Diss., Bordeaux, 2024. http://www.theses.fr/2024BORD0171.
Analyses of gene expression data obtained from bulk RNA sequencing (bulk RNA-seq) or single-cell RNA sequencing (scRNA-seq) have become commonplace in immunological studies. They allow for a better understanding of the heterogeneity present in immune responses, whether in reaction to vaccination or disease. Typically, the analysis of these data is conducted in two steps: (i) first, an unsupervised classification, or clustering, is performed using all the genes to group samples into distinct and homogeneous subgroups; (ii) then, a differential analysis is conducted using hypothesis tests to identify genes that are differentially expressed between these subgroups. However, these two successive steps lead to a methodological challenge that is often overlooked in the applied literature. Traditional inference methods require hypotheses to be fixed a priori and independently of the data to ensure effective control of the type I error. In these two-step analyses, the hypothesis tests are based on the results of the clustering, which compromises the control of the type I error by traditional methods and can lead to false discoveries. We propose new statistical methods that account for this double use of the data and ensure effective control of the number of false discoveries.
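The double use of the data described in this abstract can be illustrated with a small simulation. The sketch below is not the method proposed in the thesis: it assumes a single homogeneous Gaussian population (so no true subgroup exists), replaces clustering with a simple median split, and uses a normal-approximation two-sample test. The function names `z_test_pvalue` and `rejection_rates` and all parameter values are illustrative choices.

```python
import random
from statistics import NormalDist, mean, variance


def z_test_pvalue(a, b):
    """Two-sided p-value of a two-sample z-test (normal approximation)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rates(n_sim=500, n=100, alpha=0.05, seed=0):
    """Return (rate with data-driven groups, rate with pre-fixed groups)."""
    rng = random.Random(seed)
    naive = control = 0
    for _ in range(n_sim):
        # One homogeneous population: the null hypothesis is true.
        x = [rng.gauss(0, 1) for _ in range(n)]
        # Data-driven groups: split the sorted values at the median,
        # a crude stand-in for clustering on the same data.
        xs = sorted(x)
        naive += z_test_pvalue(xs[: n // 2], xs[n // 2:]) < alpha
        # Control: groups fixed by index, independently of the values.
        control += z_test_pvalue(x[: n // 2], x[n // 2:]) < alpha
    return naive / n_sim, control / n_sim


if __name__ == "__main__":
    naive_rate, control_rate = rejection_rates()
    print(f"groups from the data: {naive_rate:.2f}")
    print(f"groups fixed a priori: {control_rate:.2f}")
```

Under these assumptions, the test applied to groups built from the data rejects almost every time although there is nothing to find, while the test on groups fixed in advance rejects at a rate close to the nominal level.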
Durand, Jean-Baptiste. "Modèles à structure cachée : inférence, estimation, sélection de modèles et applications." Phd thesis, Université Joseph Fourier (Grenoble), 2003. https://tel.archives-ouvertes.fr/tel-00002754v3.
Caron, François. "Inférence bayésienne pour la détermination et la sélection de modèles stochastiques." Ecole Centrale de Lille, 2006. http://www.theses.fr/2006ECLI0012.
We are interested in the addition of uncertainty in hidden Markov models. The inference is made in a Bayesian framework based on Monte Carlo methods. We consider multiple sensors that may switch between several states of work. An original jump model is developed for different kinds of situations, including synchronous/asynchronous data and the binary valid/invalid case. The model and algorithm are applied to the positioning of a land vehicle equipped with three sensors. One of them is a GPS receiver, whose data are potentially corrupted by multipath phenomena. We consider the estimation of the probability density function of the evolution and observation noises in hidden Markov models. First, the case of linear models is addressed, and MCMC and particle filter algorithms are developed and applied to three different applications. Then the estimation of probability density functions in nonlinear models is addressed. For that purpose, time-varying Dirichlet processes are defined for the online estimation of time-varying probability density functions.
Guilloux, Agathe. "Inférence non paramétrique en statistique des durées de vie sous biais de sélection." Rennes 1, 2004. http://www.theses.fr/2004REN10058.
Delattre, Maud. "Inférence statistique dans les modèles mixtes à dynamique Markovienne." Phd thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00765708.
Karmann, Clémence. "Inférence de réseaux pour modèles inflatés en zéro." Thesis, Université de Lorraine, 2019. http://www.theses.fr/2019LORR0146/document.
Network inference has a growing number of applications, particularly in human health and the environment, for the study of microbiological and genomic data. Networks are indeed an appropriate tool to represent, or even study, relationships between entities. Many mathematical estimation techniques have been developed, particularly in the context of Gaussian graphical models, but also in the case of binary or mixed data. The processing of abundance data (of microorganisms such as bacteria, for example) is particular for two reasons: on the one hand, they do not directly reflect reality, because a sequencing process takes place to duplicate species and this process introduces variability; on the other hand, a species may be absent in some samples. We are then in the context of zero-inflated data. Many graph inference methods exist for Gaussian, binary and mixed data, but zero-inflated models are rarely studied, although they reflect the structure of many data sets in a relevant way. The objective of this thesis is to infer networks for zero-inflated models. We restrict ourselves to conditional dependency graphs. The work presented in this thesis is divided into two main parts. The first concerns graph inference methods based on the estimation of neighbourhoods by a procedure combining ordinal regression models and variable selection methods. The second focuses on graph inference in a model where the variables are Gaussian and zero-inflated by double truncation (right and left).
Gallopin, Mélina. "Classification et inférence de réseaux pour les données RNA-seq." Thesis, Université Paris-Saclay (ComUE), 2015. http://www.theses.fr/2015SACLS174/document.
This thesis gathers methodological contributions to the statistical analysis of next-generation high-throughput transcriptome sequencing data (RNA-seq). RNA-seq data are discrete and the number of samples sequenced is usually small due to the cost of the technology. These two points are the main statistical challenges for modelling RNA-seq data. The first part of the thesis is dedicated to the co-expression analysis of RNA-seq data using model-based clustering. A natural model for discrete RNA-seq data is a Poisson mixture model. However, a Gaussian mixture model in conjunction with a simple transformation applied to the data is a reasonable alternative. We propose to compare the two alternatives using a data-driven criterion to select the model that best fits each dataset. In addition, we present a model selection criterion that takes external gene annotations into account. This model selection criterion is not specific to RNA-seq data. It is useful in any co-expression analysis using model-based clustering designed to enrich functional annotation databases. The second part of the thesis is dedicated to network inference using graphical models. The aim of network inference is to detect relationships among genes based on their expression. We propose a network inference model based on a Poisson distribution that takes into account the discrete nature and high inter-sample variability of RNA-seq data. However, network inference methods require a large number of samples. For Gaussian graphical models, we propose a non-asymptotic approach to detect relevant subsets of genes based on a block-diagonal decomposition of the covariance matrix. This method is not specific to RNA-seq data and reduces the dimension of any network inference problem based on the Gaussian graphical model.
Maurent, Eliott. "Des forêts tropicales et des humains dans les Amériques : trajectoires de réponse aux perturbations anthropiques de la diversité et de la composition des arbres. Of tropical forests and humans in the Americas : response trajectories of tree diversity and composition to anthropogenic disturbances." Electronic Thesis or Diss., Paris, AgroParisTech, 2023. http://www.theses.fr/2023AGPT0014.
Tropical forests face more frequent and intense anthropogenic disturbances, such as selective logging, namely the felling and harvesting of a few commercially valuable trees in old-growth forests, while the remaining stand is left for natural regeneration. Many studies have focused on this regeneration, particularly on the recovery of carbon and timber stocks, most likely due to a strong interest in climate change mitigation and logging profitability. However, despite the crucial role of biodiversity for ecosystem maintenance and functioning - and its intrinsic value - there have been few studies on the impact of selective logging on biodiversity. Therefore, this thesis - organised in three studies - aimed at characterising the response of tree diversity and composition to logging in tropical American forests. First, we drew upon the long-term forest inventories (1986-2021, trees with a diameter at breast height ≥ 10 cm) from the Paracou experimental station to build a Bayesian modelling framework of tree diversity and composition trajectories after selective logging. Paracou is located in French Guiana and was disturbed by silvicultural treatments of different intensities in 1986-1987. We propagated in our Bayesian framework the uncertainty associated with botanical determination and functional trait measurements, and modelled the Paracou trajectories of taxonomic, phylogenetic and functional tree diversity and composition at the species level, relative to their pre-disturbance levels. Additionally, we assessed the effect of pre-disturbance tree community characteristics, biophysical conditions and disturbance properties on our forest attribute trajectories.
Second, we used a simplified version of the aforementioned Bayesian modelling framework on long-term forest inventories from sample plots located in Costa Rica and three Amazonian countries (respectively belonging to the Observatorio de los Ecosistemas Forestales de Costa Rica and the Tropical managed Forest Observatory). We modelled their post-logging trajectories of taxonomic and functional tree diversity and composition at the genus level, from which we extracted indicators solely over the inventory timespan of each site. We then assessed the effect of pre-disturbance tree community structure and disturbance properties on such indicators. Although results were more variable in the second study, which had a broader geographical scope than the first, we observed similar trends in both studies: diversity mostly increased after logging and tree communities mainly shifted from resource-conservative strategies to resource-acquisitive strategies. Such changes appeared to be driven by the abundant and transient recruitment of early-successional species with acquisitive trait values, which provided them with a competitive advantage as disturbance intensity - i.e., light and space availability - increased. Indeed, changes in diversity and composition increased in both studies with disturbance intensity, whereas disturbance selectivity, pre-disturbance tree community characteristics and biophysical conditions had no significant effect. Third, building on the paramount importance of disturbance intensity in the two previous studies, we developed an original Bayesian hierarchical model of recovery trajectories, considering disturbed forests in a common framework, through a disturbance intensity gradient.
We tested our modelling approach on data from two long-term experiments in Costa Rica and French Guiana, set up after selective logging, agriculture, and clearcutting and fire. Overall, these results opened various perspectives on the methods used to evaluate forest response to disturbance, the forest response itself and the ecological processes underlying forest succession, and how disturbed forests could be considered in forest management and conservation plans.
Haury, Anne-Claire. "Sélection de variables à partir de données d'expression : signatures moléculaires pour le pronostic du cancer du sein et inférence de réseaux de régulation génique." Phd thesis, Ecole Nationale Supérieure des Mines de Paris, 2012. http://pastel.archives-ouvertes.fr/pastel-00818345.
Caruana, Emmanuel. "Développement d'une nouvelle mesure d'équilibre pour l'aide à la sélection des variables dans un modèle de score de propension." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCC134/document.
Propensity score (PS) methods are increasingly used to analyze observational data and to account for confounding bias in the final estimate of treatment effects. The goal of the PS is to balance the distribution of potential confounders across treatment groups. The performance of the PS relies strongly on variable selection in PS construction and on balance assessment in PS analysis. Specifically, the choice of the variables to be included in the PS model is of paramount importance. In order to prioritize the inclusion and balance of variables related to the outcome, a new balance measure is proposed in this thesis. First, a new weighted balance measure was studied to help in the construction of the PS model and to obtain the most parsimonious model, by excluding instrumental variables, which are known to increase bias in the final treatment estimate. Several balance measures have been proposed to assess final balance, but none of them helps researchers avoid including instrumental variables. We propose a new weighted balance measure that takes into account, for each covariate, its strength of association with the outcome. This measure was evaluated using a simulation study to assess whether minimization of the measure coincided with minimally biased estimates. Second, we propose to apply this measure to a real data set from an observational cohort study.
Diallo, Alpha Oumar. "Inférence statistique dans des modèles de comptage à inflation de zéro. Applications en économie de la santé." Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0027/document.
Zero-inflated regression models are a powerful tool for analysing count data with excess zeros in areas such as epidemiology, health economics or ecology. However, the theoretical study of these models has attracted little attention. This manuscript addresses the problem of inference in zero-inflated count models. First, we revisit the maximum likelihood estimator in the zero-inflated binomial model. We show the existence of the maximum likelihood estimator of the parameters in this model, then demonstrate its consistency and establish its asymptotic normality. A comprehensive simulation study with finite sample sizes is then conducted to evaluate the consistency of our results. Finally, an application to real health economics data is carried out. Second, we propose a new statistical model for analysing the consumption of medical care. This model makes it possible, among other things, to identify the causes of the non-use of medical care. We rigorously studied the mathematical properties of the model, carried out an exhaustive numerical study using computer simulations, and finally applied the model to a database on the health care of several thousand patients in the USA. A final aspect of this work focuses on the problem of inference in the zero-inflated binomial model in the context of missing covariate data. In this case we propose weighting by the inverse of the selection probabilities to estimate the parameters of the model. We then establish the consistency and asymptotic normality of the proposed estimator. Finally, a simulation study on several samples of finite size is conducted to evaluate the behavior of the estimator.
Dufournet, Marine. "Quantification du biais de sélection en sécurité routière : apport de l’inférence causale." Thesis, Lyon, 2017. http://www.theses.fr/2017LYSE1244/document.
Many factors associated with the risk and severity of road accidents are now widely considered causal: alcohol, speed, use of a mobile phone, etc. Therefore, questions asked by decision-makers now mostly concern the magnitude of their causal effects, as well as the burden of deaths or victims attributable to these various causes of accident. One particularity of road safety epidemiology is that available data generally describe drivers and vehicles involved in road accidents only, or even in severe road accidents only. This extreme selection precludes the estimation of causal effects. To circumvent this absence of a "control" population of non-crash-involved drivers, it is common to use responsibility analysis and to assess the causal effect of a given factor on the risk of being responsible for an accident among involved drivers. The underlying assumption is that non-responsible drivers represent a random sample of the general driving population that was "selected" to crash by circumstances beyond their control and therefore have the same risk factor profile as other drivers on the road at the same time. However, this randomness assumption is questionable. The objective of this thesis is to determine whether available data in road safety allow us to assess causal effects on responsibility without a residual selection bias. We show that a good approximation of the causal effect of a given factor on the risk of being responsible is possible only if inclusion in the dataset does not depend on the severity of the accident, or if the given factor has no effect on speed. This result is shown using the Structural Causal Model (SCM) framework. The SCM framework is based on a causal graph, the DAG (directed acyclic graph), which represents the relationships among variables.
The DAG allows the description of what we observe in the actual world, but also of what we would have observed in counterfactual worlds, if we could have intervened and forced the exposure to be set to a given level. Causal effects are then defined using counterfactual variables, and it is the DAG's structure which determines whether causal effects are identifiable, or recoverable, and estimable from the distribution of observed variables. However, the assumptions embedded in the DAG which describes the occurrence of a severe accident do not ensure that a causal odds ratio is expressible in terms of the observable distribution. Conditioning the estimations on drivers involved in a severe crash corresponds to conditioning on a variable in the DAG called a "collider", and creates a "collider bias". We present numerical results to illustrate our theoretical arguments and the magnitude of the bias between the estimable association measure and some causal effects. Under the simple generative model considered, we show that, when inclusion depends on the severity of the accident, the bias between the estimable association measure and the causal effect grows as the relation between the exposure and speed, or between speed and the occurrence of a severe accident, becomes stronger. Moreover, the presented designs allow us to describe some situations where the exposure could be alcohol or cannabis intoxication. In the case of alcohol, where alcohol and speed are positively correlated, the estimable associational effect underestimates the causal effect. In the case of cannabis, where cannabis and speed are negatively correlated, the estimable associational effect overestimates the causal effect. On the other hand, we provide a formal definition of internal and external validity, and a counterfactual interpretation of the estimable quantity in the presence of selection bias, when causal effects are not recoverable.
This formal interpretation of the estimable quantity in the presence of selection bias is useful beyond the context of responsibility analyses; for instance, it helps explain the obesity paradox.
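The collider bias mentioned in this abstract can be made concrete with a toy simulation. The sketch below is not the generative model of the thesis: it assumes two independent binary causes and a selection step whose probability increases with each of them, and all probabilities and names (`odds_ratio`, `simulate`) are hypothetical. With these values, the odds ratio between the two causes is 1 in the whole population, while among selected units it equals (0.9 × 0.1) / (0.5 × 0.5) = 0.36 in expectation.

```python
import random


def odds_ratio(pairs):
    """Odds ratio between two binary variables from a list of (x, y) pairs."""
    n = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for pair in pairs:
        n[pair] += 1
    return (n[1, 1] * n[0, 0]) / (n[1, 0] * n[0, 1])


def simulate(n=200_000, seed=1):
    """Return (odds ratio in the population, odds ratio among selected units)."""
    rng = random.Random(seed)
    population, selected = [], []
    for _ in range(n):
        x = int(rng.random() < 0.5)  # hypothetical exposure
        y = int(rng.random() < 0.5)  # hypothetical second cause, independent of x
        population.append((x, y))
        # Selection is the collider: it is more likely when either cause is present.
        if rng.random() < 0.1 + 0.4 * x + 0.4 * y:
            selected.append((x, y))
    return odds_ratio(population), odds_ratio(selected)


if __name__ == "__main__":
    or_all, or_selected = simulate()
    print(f"odds ratio, whole population: {or_all:.2f}")
    print(f"odds ratio, selected units only: {or_selected:.2f}")
```

Restricting the analysis to selected units induces a spurious negative association between two causes that are independent by construction, which is the mechanism at work when only crash-involved drivers are observed.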
Vasseur, Yann. "Inférence de réseaux de régulation orientés pour les facteurs de transcription d'Arabidopsis thaliana et création de groupes de co-régulation." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS475/document.
This thesis deals with the characterisation of key genes in gene expression regulation, called transcription factors, in the plant Arabidopsis thaliana. Using expression data, our biological goal is to cluster transcription factors into groups of co-regulator transcription factors and into groups of co-regulated transcription factors. To do so, we propose a two-step procedure. First, we infer the regulatory network between transcription factors. Second, we cluster transcription factors based on their connection patterns to other transcription factors. From a statistical point of view, the transcription factors are the variables and the samples are the observations. The regulatory network between the transcription factors is modelled using a directed graph, where variables are nodes. The estimation of the nodes can be interpreted as a variable selection problem. To infer the network, we perform LASSO-type penalised linear regression. A preliminary approach selects a set of variables along the regularisation path using a penalised likelihood criterion. However, this approach is unstable and selects too many variables. To overcome this difficulty, we propose to put two selection procedures in competition, both designed to deal with high-dimensional data and combining penalised linear regression with subsampling. Parameter estimation in the two procedures is designed to select stable sets of variables. The stability of the results is evaluated on simulated data under a graphical model. Subsequently, we use an unsupervised clustering method on each inferred directed graph to detect groups of co-regulators and groups of co-regulated transcription factors. To evaluate the proximity between the two classifications, we have developed an index for comparing pairs of partitions, whose relevance is tested and promoted.
From a practical point of view, we propose a cascade simulation method, required to respect the model complexity and inspired by the parametric bootstrap, to simulate data under our model. We have validated our model by inspecting the proximity between the two classifications on simulated and real data.
Le Floch, Edith. "Méthodes multivariées pour l'analyse jointe de données de neuroimagerie et de génétique." Phd thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00753829.
Tournebize, Rémi. "Influence des variations spatio-temporelles de l’environnement sur la distribution actuelle de la diversité génétique des populations." Thesis, Montpellier, 2017. http://www.theses.fr/2017MONTT140.
This project aims at understanding how the structure of the intra-specific genetic diversity in emblematic tropical plant species and in the human species was shaped by the spatiotemporal variation of current and past environments. We developed a genetic inference approach based on coalescent theory to assess the potential impact of past climatic change on the evolution of the geographic range and of the neutral and/or adaptive genetic diversity in Amborella trichopoda Baill. in New Caledonia (sister species of all extant angiosperms; NGS and microsatellite datasets), in Coffea canephora Pierre ex A. Froehn in tropical Africa (Robusta coffee; NGS dataset) and in North-Western European and African (Luhya, Kenya) human populations (NGS dataset from the 1000 Genomes Project). We found that the climatic fluctuations of the Late Pleistocene influenced the evolution of genetic diversity in these species distributed in temperate and tropical environments. The environmental conditions during the Last Glacial Maximum (LGM, 21,000 years before present) appear to be an important factor. The demographic contraction associated with the last global glaciation influenced the divergence between Amborella genetic lineages and contributed to the accumulation of genetic differences between C. canephora lineages. Our results suggest that global glaciation events likely drove idiosyncratic genetic differentiation in tropical rain forests, but the intensity of this response varied between species. We also identified multiple selection events in the genomes of the European human population which were likely triggered by the environmental conditions during the LGM. The associated phenotypic adaptations probably allowed the paleo-populations to maintain their demographic expansion despite the new kinds of selective pressure they faced during the last glacial age in Europe.
Patin, Etienne. "Influences du mode de vie sur la diversité génétique des populations humaines." Paris 6, 2008. http://www.theses.fr/2008PA066214.
Kere, Eric Nazindigouba. "Analyse économétrique des décisions de production des propriétaires forestiers privés non industriels en France." Thesis, Université de Lorraine, 2013. http://www.theses.fr/2013LORR0052/document.
Timber production is related to economic, climate and energy issues. In France, according to data from the National Institute of Geoinformation and Forestry, the biological growth rate of the forest is greater than the timber harvest rate. Thus, the French government has set a target of harvesting an additional 21 million cubic meters of timber by 2020 ("Grenelle de l'environnement", 2007). However, the French forest is mostly owned by private forest owners who have preferences for both income from timber trade and non-timber amenities. Policies to increase timber production must take these aspects into account. The objective of this thesis is to understand the determinants of joint production of timber and non-timber amenities in France. To that end, we first analyze private forest owners' timber supply, taking into account individual and regional determinants. Afterwards, we investigate whether the drivers of forest owners' behavior differ within and between these different levels. We show that similar timber supply behavior can be observed when regional characteristics or those of peers are similar. Then, we highlight a mimicry behavior in joint production decisions of timber and amenities made by private forest owners. Finally, we analyze the inter-temporal trade-offs made by owners between non-timber amenities and income from the sale of wood. We explicitly take into account price expectations and growth. Our estimations show that the willingness to pay for non-timber amenities is €23 for our case study.
This value is the difference between the value they could have earned if they tried to maximize timber revenue and the revenue of their actual logging. Mainly because of a lack of involvement of private owners, whether through a lack of knowledge of or interest in their forest or because other aspects are privileged (e.g., non-timber amenities), part of the forest resource is not subject to a commercial offer. Providing ways to mobilize this resource is one of the challenges of this work. We show that mimetic effects and contextual effects can be used to encourage forest owners to produce more timber. An effective policy could combine these two effects. We also show that an increase in the price of timber or the adoption of a tax may be an incentive for timber harvesting.
Le Goff, Line. "Formation spontanée de chemins : des fourmis aux marches aléatoires renforcées." Thesis, Paris 10, 2014. http://www.theses.fr/2014PA100180/document.
This thesis is devoted to modelling the spontaneous formation of preferential paths by walkers that deposit attractive trails along their trajectories. More precisely, through a multidisciplinary approach that combines modelling and experimentation, this thesis aims to bring out a set of minimal individual rules that allow this phenomenon to emerge. To this end, we study in several ways the minimal models known as Reinforced Random Walks (RRW). This work contains two main parts. The first proves some new results in the field of probability and statistics. We generalized the work published by M. Benaïm and O. Raimond in 2010 in order to study the asymptotics of a class of RRW in which U-turns are forbidden. We also developed a statistical procedure that, under appropriate regularity hypotheses, makes it possible to estimate the parameters of parametrized RRW and to evaluate margins of error. In the second part, we describe the results and analyses of an experimental and behavioral study of Linepithema humile ants. One part of our reflection is centered on the role and the value of the parameters of the model defined by J.-L. Deneubourg et al. in 1990. We also investigated the extent to which RRW could reproduce the movement of an ant in a network. For these purposes, we performed experiments that confront ants with a network of one or several forks. We applied the statistical tools developed in this thesis to the experimental data and performed a comparative study between experiments and simulations of several models.
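As an illustration of the kind of reinforced choice rule discussed in this abstract, the sketch below simulates walkers at a single fork. It assumes the binary choice function commonly attributed to Deneubourg et al. (1990), in which branch A is taken with probability (k + a)^n / ((k + a)^n + (k + b)^n), where a and b count previous passages on each branch. The values k = 20 and n = 2 and the names `choice_prob` and `simulate_fork` are illustrative assumptions, not parameters or code taken from the thesis.

```python
import random


def choice_prob(a, b, k=20.0, n=2.0):
    """Probability of taking branch A after a and b previous passages."""
    weight_a = (k + a) ** n
    weight_b = (k + b) ** n
    return weight_a / (weight_a + weight_b)


def simulate_fork(n_ants=1000, seed=0):
    """Send walkers one by one through a fork; each passage reinforces its branch."""
    rng = random.Random(seed)
    a = b = 0
    for _ in range(n_ants):
        if rng.random() < choice_prob(a, b):
            a += 1
        else:
            b += 1
    return a, b


if __name__ == "__main__":
    a, b = simulate_fork()
    print(f"branch A: {a} passages, branch B: {b} passages")
```

With no prior passages both branches are equally likely, and each passage raises the probability that later walkers follow the same branch, which is the reinforcement mechanism behind the spontaneous formation of a preferred path.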
Gabrielli, Maëva. "Histoires évolutives et spéciation chez les Zostérops des Mascareignes (Zosteropidés)." Thesis, Toulouse 3, 2020. http://www.theses.fr/2020TOU30055.
Understanding how new species arise is a longstanding question in evolutionary biology. With the recent and major progress in sequencing technologies, this question can now be addressed using genome-wide data. The identification of genomic regions that are under positive selection and that may act as barriers to gene flow is of particular importance, as these regions might be involved in the build-up of reproductive isolation, ultimately leading to speciation. Mascarene white-eyes provide an outstanding system to unravel the processes leading to the formation of new species. In particular, the Reunion grey white-eye, a small passerine bird endemic to the small volcanic island of Reunion in the Mascarene archipelago, comprises four geographic forms that differ strikingly in their plumage colouration and are parapatrically distributed within the island. This system is ideal for identifying the genomic regions that differentiate at the onset of divergence. Using data from genome-wide Single Nucleotide Polymorphism (SNP) markers in hundreds of individuals, we first investigate the evolutionary history of the different geographic forms of the Reunion grey white-eye using phylogenetic inferences. Our results provide strong support in favour of within-island diversification, and highlight a role of both strong selection and low dispersal in driving divergence. We then use complete genome sequences to analyse genomic landscapes of differentiation between Reunion grey white-eye geographic forms and between the Reunion grey white-eye and closely related species. Our findings show that incorporating recombination rate information improves the detection of islands of differentiation in the Reunion grey white-eye that may reflect ongoing selection. Finally, we investigate the impacts of geological and climatic events on the evolutionary trajectories of three Mascarene white-eyes.
Our findings suggest that local events in Mauritius or Reunion may be the main driver of demographic trajectories in this system. Overall, this thesis furthers our understanding of the origin of diversity in remote oceanic islands and beyond
Le Floch, Edith. "Méthodes multivariées pour l'analyse jointe de données de neuroimagerie et de génétique." Thesis, Paris 11, 2012. http://www.theses.fr/2012PA112214/document.
Brain imaging is increasingly recognised as an interesting intermediate phenotype for understanding the complex path between genetics and behavioural or clinical phenotypes. In this context, a first goal is to propose methods to identify the part of genetic variability that explains some neuroimaging variability. Classical univariate approaches often ignore the potential joint effects that may exist between genes or the potential covariations between brain regions. Our first contribution is to improve the sensitivity of the univariate approach by taking advantage of the multivariate nature of the genetic data in a local way. Indeed, we adapt cluster-inference techniques from neuroimaging to Single Nucleotide Polymorphism (SNP) data, by looking for 1D clusters of adjacent SNPs associated with the same imaging phenotype. We then extend the concept of clusters and combine voxel clusters and SNP clusters, using a simple 4D cluster test that jointly detects brain and genome regions with high associations. We obtain promising preliminary results on both simulated and real datasets. Our second contribution is to investigate exploratory multivariate methods to increase the detection power of imaging genetics studies, by accounting for the potential multivariate nature of the associations, at a longer range, on both the imaging and the genetics sides. Recently, Partial Least Squares (PLS) regression and Canonical Correlation Analysis (CCA) have been proposed to analyse genetic and transcriptomic data. Here, we propose to transpose this idea to the genetics vs. imaging context. Moreover, we investigate different regularisation strategies and dimension reduction techniques combined with PLS or CCA, to address the overfitting issues caused by the very high dimensionality of the data. We propose a comparison study of the different strategies on both a simulated dataset and a real fMRI and SNP dataset. Univariate selection appears to be necessary to reduce the dimensionality. However, the generalisable and significant association uncovered on the real dataset by the two-step approach combining univariate filtering and L1-regularised PLS suggests that discovering meaningful imaging genetics associations calls for a multivariate approach.
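As an illustration of the idea of 1D clusters of adjacent SNPs mentioned in this abstract, here is a minimal sketch. It assumes per-SNP association statistics already ordered by genomic position, and defines a cluster as a run of consecutive SNPs above a cluster-forming threshold. The function name `find_snp_clusters`, the threshold value, and the toy statistics are hypothetical; the thesis's actual test and its significance assessment are not reproduced here.

```python
def find_snp_clusters(stats, threshold):
    """Return (start, end) index pairs of runs of adjacent SNPs whose
    association statistic exceeds the cluster-forming threshold."""
    clusters = []
    start = None
    for i, value in enumerate(stats):
        if value > threshold:
            if start is None:
                start = i  # open a new cluster
        elif start is not None:
            clusters.append((start, i - 1))  # close the current cluster
            start = None
    if start is not None:
        clusters.append((start, len(stats) - 1))  # cluster runs to the end
    return clusters


# Toy statistics for 10 SNPs ordered along the genome (made-up values).
stats = [0.2, 3.1, 3.5, 2.9, 0.4, 0.1, 4.0, 0.3, 2.6, 2.8]
clusters = find_snp_clusters(stats, threshold=2.5)
sizes = [end - start + 1 for start, end in clusters]
print(clusters, sizes)
```

In cluster-level inference, a summary such as the cluster size would then be compared with a null distribution, for example one obtained by permutation; that step is left out of this sketch.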
Duchemin, Quentin. "Growth dynamics of large networks using hidden Markov chains." Thesis, Université Gustave Eiffel, 2022. https://tel.archives-ouvertes.fr/tel-03749513.
The first part of this thesis aims at introducing new models of random graphs that account for the temporal evolution of networks. More precisely, we focus on growth models where at each instant a new node is added to the existing graph. We attribute to this new entrant properties that characterize its connectivity to the rest of the network, and these properties depend only on the previously introduced node. Our random graph models are thus governed by a latent Markovian dynamic characterizing the sequence of nodes in the graph. We are particularly interested in the Stochastic Block Model and in Random Geometric Graphs, for which we propose algorithms to estimate the unknown parameters or functions defining the model. We then show how these estimates allow us to solve link prediction or collaborative filtering problems in networks. The theoretical analysis of the above-mentioned algorithms requires advanced probabilistic tools. In particular, one of our proofs relies on a concentration inequality for U-statistics in a dependent framework. Few papers have addressed this thorny question, and existing works consider sets of assumptions that do not meet our needs. Therefore, the second part of this manuscript is devoted to the proof of a concentration inequality for U-statistics of order two for uniformly ergodic Markov chains. In Chapter 5, we exploit this concentration result for U-statistics to make new contributions to three very active areas of Statistics and Machine Learning. Still motivated by link prediction problems in graphs, we study post-selection inference procedures in the framework of logistic regression with $L^1$ penalty. We prove a central limit theorem under the distribution conditional on the selection event and derive asymptotically valid testing procedures and confidence intervals.
Lohier, Théophile. "Analyse temporelle de la dynamique de communautés végétales à l'aide de modèles individus-centrés." Thesis, Clermont-Ferrand 2, 2016. http://www.theses.fr/2016CLF22683/document.
Plant communities are complex systems in which multiple species differing in their functional attributes interact with their environment and with each other. Because of the number and diversity of these interactions, the mechanisms that drive the dynamics of these communities are still poorly understood. Modelling approaches make it possible to link, in a mechanistic fashion, the processes driving individual plant or population dynamics to the resulting community dynamics. This PhD thesis aims to develop such approaches and to use them to investigate the mechanisms underlying community dynamics. We therefore developed two modelling approaches. The first one is based on a stochastic modelling framework that links population dynamics to community dynamics whilst taking account of intra- and interspecific interactions as well as environmental and demographic variations. This approach is easily applicable to real systems and makes it possible to describe the properties of plant populations through a small number of demographic parameters. However, our work suggests that there is no simple relationship between these parameters and plant functional traits, even though these traits are known to drive the plants' response to extrinsic factors. The second approach has been developed to overcome this limitation and relies on the individual-based model Nemossos, which explicitly describes the link between plant functioning and community dynamics. In order to ensure that Nemossos has a large application potential, a strong emphasis has been placed on the tradeoff between realism and parametrization cost. Nemossos has then been successfully parameterized from trait values found in the literature, its realism has been demonstrated, and it has been used to investigate the importance of temporal environmental variability for the coexistence of functionally differing species. The complementarity of the two approaches allows us to explore various fundamental questions of community ecology, including the impact of competitive interactions on community dynamics, the effect of environmental filtering on their functional composition, and the mechanisms favoring the coexistence of plant species. In this work, the two approaches have been used separately, but their coupling might offer interesting perspectives such as the investigation of the relationships between plant functioning and population dynamics. Moreover, each of the approaches might be used to run various simulation experiments likely to improve our understanding of the mechanisms underlying community dynamics.
Bellot, Benoit. "Améliorer les connaissances sur les processus écologiques régissant les dynamiques de populations d'auxiliaires de culture : modélisation couplant paysages et populations pour l'aide à l'échantillonnage biologique dans l'espace et le temps." Thesis, Rennes 1, 2018. http://www.theses.fr/2018REN1B008/document.
A promising alternative to the chemical control of pests consists in favoring populations of their natural enemies by managing the agricultural landscape structure. Identifying favorable spatio-temporal structures can be performed through the exploration of landscape scenarios using coupled models of landscapes and population dynamics. In this approach, population dynamics are simulated on virtual landscapes with controlled properties, and the observation of population patterns allows for the identification of favorable structures. Population modeling, however, relies on good knowledge of the ecological processes and their variability within the landscape elements. The current state of knowledge about the ecological mechanisms underlying the population dynamics of natural enemies in the carabid family remains a major obstacle to in silico investigation of favorable landscape scenarios. The literature on the relationship between carabid populations and landscape properties allows the formulation of competing hypotheses about these processes. Reducing the number of these hypotheses by analyzing the convergence between their associated population patterns, and investigating the stability of this convergence along a landscape gradient, appears to be a necessary step towards a better knowledge of ecological processes. In a first step, we propose a heuristic method based on the simulation of reaction-diffusion models carrying these competing hypotheses. Comparing the population patterns made it possible to establish, through a classification algorithm, a typology of models according to their response to the landscape variable, thus reducing the initial number of competing hypotheses. The selection of the most likely hypothesis from this irreducible set must rely on the observation of population patterns in the field. This implies that population patterns are described with spatial and temporal resolutions that are fine enough to select a unique hypothesis among the ones in competition. In the second part, we propose a heuristic method for determining a priori the sampling strategies that maximize the robustness of ecological hypothesis selection. The simulation of reaction-diffusion models carrying the ecological hypotheses generates virtual population data in space and time. These data are then sampled using strategies differing in the total effort, number of sampling locations, dates and landscape replicates. Population patterns are described from these samples. The sampling strategies are assessed through a classification algorithm that classifies the models according to the associated patterns. The analysis of classification performance, i.e. the ability of the algorithm to discriminate between the ecological processes, allows the selection of optimal sampling designs. We also show that the way the sampling effort is distributed between its spatial and temporal components strongly affects the inference of ecological processes. Reducing the number of competing ecological hypotheses and selecting sampling strategies for optimal model inference both meet a strong need in improving knowledge of the ecological processes required to explore landscape scenarios favoring ecosystem services. In the last chapter, we discuss the implications and future prospects of our work.
Jaureguiberry, Xabier. "Fusion pour la séparation de sources audio." Thesis, Paris, ENST, 2015. http://www.theses.fr/2015ENST0030/document.
Underdetermined blind source separation is a complex mathematical problem that can be satisfactorily resolved for some practical applications, provided that the right separation method has been selected and carefully tuned. In order to automate this selection process, we propose in this thesis to resort to the principle of fusion, which has been widely used in the related field of classification yet is still marginally exploited in source separation. Fusion consists in combining several methods to solve a given problem instead of selecting a unique one. To do so, we introduce a general fusion framework in which a source estimate is expressed as a linear combination of estimates of this same source given by different separation algorithms, each source estimate being weighted by a fusion coefficient. For a given task, fusion coefficients can then be learned on a representative training dataset by minimizing a cost function related to the separation objective. To go further, we also propose two ways to adapt the fusion coefficients to the mixture to be separated. The first one expresses the fusion of several non-negative matrix factorization (NMF) models in a Bayesian fashion similar to Bayesian model averaging. The second one aims at learning time-varying fusion coefficients using deep neural networks. All proposed methods have been evaluated on two distinct corpora. The first one is dedicated to speech enhancement while the other deals with singing voice extraction. Experimental results show that fusion outperforms simple selection in all considered cases, the best results being obtained by adaptive time-varying fusion with neural networks.
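The fusion framework described in this abstract, a source estimate written as a weighted linear combination of the estimates produced by several separation algorithms, can be illustrated with a minimal sketch. It assumes only two estimates and a squared-error cost minimized in closed form on a single training signal. The cost function, the function names (`learn_fusion_coefficients`, `fuse`) and the toy signals are assumptions made for illustration, not the thesis's actual setup.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def learn_fusion_coefficients(est_a, est_b, target):
    """Least-squares fusion weights for two source estimates.

    Solves the 2x2 normal equations of
        min_w || target - (w_a * est_a + w_b * est_b) ||^2
    (squared error is an assumed stand-in for the separation cost).
    """
    g_aa, g_ab, g_bb = dot(est_a, est_a), dot(est_a, est_b), dot(est_b, est_b)
    c_a, c_b = dot(est_a, target), dot(est_b, target)
    det = g_aa * g_bb - g_ab * g_ab
    if abs(det) < 1e-12:
        raise ValueError("estimates are collinear; weights are not unique")
    w_a = (c_a * g_bb - c_b * g_ab) / det
    w_b = (g_aa * c_b - g_ab * c_a) / det
    return w_a, w_b


def fuse(est_a, est_b, w_a, w_b):
    """Linear combination of the two estimates."""
    return [w_a * a + w_b * b for a, b in zip(est_a, est_b)]


def squared_error(est, target):
    return sum((e - t) ** 2 for e, t in zip(est, target))


# Toy training data: a reference source and two imperfect estimates of it.
target = [1.0, -2.0, 3.0, -1.0, 0.5]
est_a = [1.3, -1.6, 2.5, -1.2, 0.9]
est_b = [0.8, -2.3, 3.4, -0.7, 0.2]

w_a, w_b = learn_fusion_coefficients(est_a, est_b, target)
fused = fuse(est_a, est_b, w_a, w_b)
err_a = squared_error(est_a, target)
err_b = squared_error(est_b, target)
err_fused = squared_error(fused, target)
print(w_a, w_b, err_a, err_b, err_fused)
```

On the training signal, the fused error cannot exceed that of either single estimate, because selecting one method corresponds to the weights (1, 0) or (0, 1). Whether this advantage carries over to unseen mixtures is an empirical question that this sketch does not address.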
Jaureguiberry, Xabier. "Fusion pour la séparation de sources audio." Electronic Thesis or Diss., Paris, ENST, 2015. http://www.theses.fr/2015ENST0030.
Alquier, Pierre. "Inférence Adaptative, Inductive et Transductive, pour l'Estimation de la Regression et de la Densité." Phd thesis, 2006. http://tel.archives-ouvertes.fr/tel-00119593.
[...] statistical properties of learning algorithms for regression and density estimation. The thesis is divided into three parts.
The first part generalizes Olivier Catoni's PAC-Bayesian theorems on classification to regression with a general loss function.
The second part focuses on least-squares regression and proposes a new variable selection algorithm. This method can be applied in particular to an orthonormal basis of functions, where it achieves optimal rates of convergence, and also to kernel-type functions, where it leads to a variant of the methods known as support vector machines (SVM).
The third part extends the results of the second to density estimation with quadratic loss.
Gagnon, Philippe. "Sélection de modèles robuste : régression linéaire et algorithme à sauts réversibles." Thèse, 2017. http://hdl.handle.net/1866/20583.
Parto, Sahar. "Bayesian codon models for detecting convergent molecular adaptation." Thèse, 2017. http://hdl.handle.net/1866/21190.
Elgbeili, Guillaume. "Probabilité et temps de fixation à l’aide de processus ancestraux." Thèse, 2013. http://hdl.handle.net/1866/10438.
The expected time for fixation given its occurrence, and the probability of fixation of a new mutant allele in populations subject to various biological phenomena, are analyzed using the approach of the ancestral process. First, the paper of Tajima (1990) is analyzed, and the missing or incomplete proofs are fully worked out in this Master's thesis in order to familiarize ourselves with calculations of fixation times. Our study of Tajima’s paper helps to show the importance of the fixation time in some biological phenomena. Thereafter, we extend the work of Tajima (1990) by introducing the effect of natural selection in the model. Using a diffusion approximation, the work of Mano (2009) provides an interesting result about the expected time of fixation given its occurrence. We derive an alternative method that uses an ancestral process and approximates Mano’s result well. Simulations are made to verify the accuracy of the new approach. Finally, one model subject to gene conversion is analyzed, since this phenomenon, in the presence of bias, has a similar effect to selection. We deduce an analytical result for the probability of fixation of a new mutant in the population. Simulations are then made to determine the probability of fixation and the time of fixation given its occurrence when rates are too large to be calculated analytically.
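The two quantities discussed in this abstract, the probability of fixation of a new mutant and the expected time to fixation given that it occurs, can be estimated by simulation. The sketch below assumes a simple haploid Wright-Fisher model run forward in time with an optional selection coefficient `s`. It is not the ancestral-process method of the thesis, and the function name `simulate_fixation`, the population size and the replicate count are hypothetical choices.

```python
import random


def simulate_fixation(pop_size, s=0.0, replicates=2000, seed=0):
    """Estimate the fixation probability of a single new mutant and the
    mean number of generations to fixation, conditional on fixation,
    in a haploid Wright-Fisher population (assumed model)."""
    rng = random.Random(seed)
    fixed = 0
    total_time = 0
    for _ in range(replicates):
        count, generation = 1, 0  # start from one mutant copy
        while 0 < count < pop_size:
            # Mutant sampling probability, weighted by relative fitness 1 + s.
            weight = (1.0 + s) * count
            p = weight / (weight + (pop_size - count))
            # Binomial resampling of the next generation.
            count = sum(rng.random() < p for _ in range(pop_size))
            generation += 1
        if count == pop_size:
            fixed += 1
            total_time += generation
    prob = fixed / replicates
    mean_time = total_time / fixed if fixed else float("nan")
    return prob, mean_time


prob, mean_time = simulate_fixation(pop_size=20)
print(prob, mean_time)
```

Under neutrality (`s=0.0`), the estimated probability should be close to 1/20, the initial frequency of the mutant, which gives a quick sanity check before adding selection.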