Theses on the topic "Dépistage génétique – Méthodes statistiques"
See the top 43 theses (graduate or doctoral) for research on the topic "Dépistage génétique – Méthodes statistiques".
Ogloblinsky, Marie-Sophie. "Statistical strategies leveraging population data to help with the diagnosis of rare diseases". Electronic Thesis or Diss., Brest, 2024. http://www.theses.fr/2024BRES0039.
High genetic heterogeneity and complex modes of inheritance in rare diseases make it challenging to identify causal variants from n-of-one sequencing data with standard analysis methods. To tackle this issue, the PSAP method uses gene-specific null distributions of CADD pathogenicity scores to assess the probability of observing a given genotype in a healthy population. The goal of this work was to address the lack of diagnosis in rare diseases through statistical strategies. We propose PSAP-genomic-regions, an extension of the PSAP method to the non-coding genome that uses as testing units predefined regions reflecting functional constraint at the scale of the whole genome. We implemented PSAP-genomic-regions and the initial PSAP-genes in Easy-PSAP, a user-friendly and versatile Snakemake workflow accessible to both researchers and clinicians. When applied to families affected by male infertility, Easy-PSAP allowed the prioritization of relevant candidate variants in known and novel genes. We then focused on digenism, the simplest mode of complex inheritance, in which the simultaneous alteration of two genes is required to develop a disease. We reviewed and benchmarked current methods in the literature for detecting digenism and put forward new strategies to improve the diagnosis of this complex mode of inheritance.
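The idea described above, comparing an observed pathogenicity score with a gene-specific null distribution, can be sketched as an empirical tail probability. This is only an illustration and not the PSAP implementation: the function name `empirical_tail_probability`, the add-one correction, and the null scores are assumptions made for the example.

```python
from bisect import bisect_left

def empirical_tail_probability(null_scores, observed_score):
    """Share of null scores at least as large as the observed score.

    The add-one correction keeps the estimate strictly positive
    (an assumption of this sketch, not taken from the abstract).
    """
    ordered = sorted(null_scores)
    n_at_least = len(ordered) - bisect_left(ordered, observed_score)
    return (n_at_least + 1) / (len(ordered) + 1)

# Hypothetical null distribution for one gene: invented CADD-like scores
# standing in for what would be seen in a healthy population.
null_for_gene = [2.1, 5.4, 7.9, 10.3, 12.8, 15.0, 18.6, 21.2, 24.7]

# A patient genotype with a high score gets a small tail probability.
p_patient = empirical_tail_probability(null_for_gene, 23.0)
```

Ranking a patient's genes by such a probability, smallest first, is one way to prioritise candidates when only a single case is available.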
Boulez, Florence. "Étiologies moléculaires des insuffisances surrénales primaires congénitales : développements statistiques pour la validation du séquençage parallèle massif". Thesis, Lyon, 2018. http://www.theses.fr/2018LYSE1057.
Primary adrenal insufficiency (PAI) is characterized by impaired production of steroid hormones due to an adrenal cortex defect. This condition carries a risk of acute insufficiency, which may be life-threatening. Today, 80% of pediatric forms of PAI have a genetic origin but 5% have no clear genetic support. Recently discovered mutations in genes related to oxidative stress have opened the way to research on genes unrelated to the adrenal gland. Massive Parallel Sequencing (MPS) can now produce millions of sequences and study several genes in several patients simultaneously, which accelerates diagnosis. Above all, MPS is the preferred technique for discovering new genes. However, the challenges of this new technology include managing the huge amount of data MPS generates and the need for a strict validation process before MPS is used for diagnostic purposes. The first objective of the present work was to establish a genetic diagnosis in a cohort of patients with PAI and to search for new genes. Studying the genotypes and phenotypes allows a better understanding of the physiopathological mechanisms of PAI, appropriate care for patients, and counseling for families. The second objective was to develop bioinformatic and statistical inference methods to help shift from classical Sanger sequencing to MPS. This shift involves a graphical analysis of sequencing quality, the fitting of log-linear models to compare the properties of different pipelines, and the fitting of generalized additive models to estimate the contributions of various sources of sequencing errors. The statistical methods treat each DNA base pair as a statistical unit and each patient as a separate study, which gives the simultaneous study of all patients the status of a meta-analysis.
Meyer, Nicolas. "Méthodes statistiques d'analyse des données d'allélotypage en présence d'homozygotes". Université Louis Pasteur (Strasbourg) (1971-2008), 2007. https://publication-theses.unistra.fr/public/theses_doctorat/2007/MEYER_Nicolas_2007.pdf.
Allelotyping data consist of measurements made by polymerase chain reaction on a set of DNA microsatellites in order to determine whether each microsatellite shows an allelic imbalance. From a statistical point of view, these data are characterised by a large number of missing values (when a microsatellite is homozygous), square or flat matrices, binomial data, sample sizes that may be small relative to the number of variables, and possibly some collinearity. Frequentist statistical methods have a number of shortcomings, which led us to choose a Bayesian framework to analyse these data. For univariate analyses, the Bayes factor is explored and several variants, depending on the presence or absence of missing data, are compared. Different types of multiple imputation are then studied. Meta-analysis models are also assessed. For multivariate analyses, a Partial Least Squares model is developed. The model is applied under a generalised linear model (logistic regression) and combined with a Non Iterative Partial Least Squares algorithm, which makes it possible to handle all the limitations of allelotyping data simultaneously. Properties of this model are explored. It is then applied to allelotyping data on 33 microsatellites from 104 patients with colon cancer to predict the Astler-Coller stage of the tumour. A model including all possible pairwise microsatellite interactions is also fitted.
Accrachi, El Hadji Ousseynou. "Nouveau cadre statistique pour la cartographie-fine". Master's thesis, Université Laval, 2021. http://hdl.handle.net/20.500.11794/67891.
Leclerc, Martin. "Tests d'association génétique pour des durées de vie en grappes". Doctoral thesis, Université Laval, 2016. http://hdl.handle.net/20.500.11794/26667.
The statistical tools developed in this manuscript-based thesis aim at detecting new associations between genetic variants and clustered survival data. Methodological development in lifetime data analysis is ongoing today with the proliferation of genetic association testing and, ultimately, of personalized medicine, which focuses on preventing disease and prolonging life. In the first paper, the following problem is considered: testing the equality of survival functions in the presence of selection bias and intracluster correlation when the assumption of proportional hazards does not hold. The proposed test is based on a Cramér-von Mises type statistic. The p-value is approximated using an innovative semiparametric bootstrap procedure that involves generating correlated observations according to a non-random design. For simulation scenarios of departures from the null hypothesis with crossing survival curves, the Cramér-von Mises statistic clearly outperformed the Wald statistic from the weighted Cox proportional hazards model. The new test was used to analyse the association between a candidate single nucleotide polymorphism (SNP) and breast cancer risk in women carrying a mutation in the BRCA2 tumor suppressor gene. A sequence kernel association test (SKAT) to detect the association between a set of genetic variants and clustered survival outcomes from family studies is developed in the second manuscript. The proposed statistic uses the kinship matrix of the sample to model the residual intra-family correlation between survival outcomes via a Gaussian copula. The test procedure relies on multiple imputation to estimate the contribution of the censored survival outcomes to the score statistic, which is a mixture of chi-square distributions. Simulation results show that the new kinship-adjusted kernel score test adequately controls the type I error rate. The new test was applied to a set of SNPs from the TERT locus.
The third manuscript presents the R package gyriq, which implements an enhanced version of the genetic association test developed in the second manuscript. The weighted identical-by-state (IBS) kernel matrix is added, genetic association tests and accompanying software currently available for age-at-onset outcomes are briefly reviewed, and the implementation of the package is described and illustrated through examples.
Guedj, Mickaël. "Méthodes Statistiques pour l’analyse de données génétiques d’association à grande échelle". Evry-Val d'Essonne, 2007. http://www.biblio.univ-evry.fr/theses/2007/2007EVRY0015.pdf.
The increasing availability of dense Single Nucleotide Polymorphism (SNP) maps, due to rapid improvements in molecular biology and genotyping technologies, has recently led geneticists towards genome-wide association studies, with hopes of encouraging results for our understanding of the genetic basis of complex diseases. The analysis of such high-throughput data raises new statistical and computational problems, which constitute the main topic of this thesis. After a brief description of the main questions raised by genome-wide association studies, we deal with single-marker approaches through a power study of the main association tests. We then consider multi-marker approaches, focusing on the method we developed, which relies on the Local Score. Finally, this thesis also deals with the multiple-testing problem: our Local Score-based approach circumvents this problem by reducing the number of tests; in parallel, we present an estimation of the Local False Discovery Rate by a simple Gaussian mixture model.
Di Giacomo, Daniela. "Développement de méthodes moléculaires pour la détection et l'interprétation de mutations : applications aux cancers du colon et aux prédispositions génétiques aux cancers du sein et de l'ovaire". Rouen, 2013. http://www.theses.fr/2013ROUENR02.
The first part of this thesis concerns the sensitive detection of mutations in the KRAS and BRAF genes in primary tumours of patients with metastatic colon cancer. First-line treatment of these patients, followed in the Oncology department of the San Salvatore Hospital in L'Aquila, is based on triple chemotherapy combined with an anti-angiogenic treatment (anti-VEGF; bevacizumab). To genotype tumour DNA we used the SNaPshot method, following the protocol developed in Rouen in the Somatic Tumour Genetics laboratory, because this method can detect mutations even in samples containing a low proportion of tumour cells. In a series of 59 patients, 31 (53%) were wild-type and 28 (47%) carried KRAS mutations (codons 12 and 13). No BRAF mutations were detected in this series. No significant difference in clinical course under the therapeutic protocol was found between the KRAS wild-type and KRAS-mutated groups. However, for these patients treated with triple chemotherapy plus bevacizumab, the KRAS mutation c.35G>A (Gly12Asp), found in 15 patients (25%), was significantly associated with poor overall survival. The second part of this thesis focuses on the interpretation of sequence variants of unknown significance (VUS) found in families with a genetic predisposition to breast and ovarian cancer, with particular interest in the effect of these variants on messenger RNA splicing. This work was carried out largely in INSERM Unit U1079, at the Faculty of Medicine and Pharmacy of the University of Rouen, systematically using a functional splicing assay based on transient transfection of minigenes carrying the sequence change.
In a first phase, the assay, which routinely uses the pCAS-2 minigene developed in INSERM Unit U1079, was used to study large series of VUS found in the network of French molecular diagnostic laboratories or in the molecular diagnostic laboratories of L'Aquila and Rome. The project then focused on one particular exon of the BRCA2 gene, exon 7, selected as a model of exonic splicing regulation. The work described in this thesis covers a total of 32 sequence variants of this exon analysed in the pCAS-2 minigene, and most of them also in the pcDNA-Dup minigene, developed in the INSERM U1079 laboratories, which detects changes in splicing enhancer activity associated with sequence changes. These 32 variants were also classified into two groups according to their effect on exonic splicing regulation: 11 increase, to varying degrees, the exclusion of BRCA2 exon 7; 22 do not increase exclusion. This large series of sequence variants with established effects on splicing regulation allowed us to validate a new method for predicting exonic splicing mutations (Ke et al., 2011). The authors of this method carried out a high-throughput experimental analysis of the effects of all 4096 possible hexamers, inserted into model exons at several positions, and assigned each hexamer an exon inclusion/exclusion score. We used these scores to develop a strategy for predicting the effect of the sequence variants studied experimentally in BRCA2 exon 7. Notably, the predictions of the new method based on the hexamer scores defined by Ke et al. (2011) were in perfect agreement with the results obtained, except for two VUS located at the same nucleotide position, for which the predicted effect on splicing was not observed.
The main contributions of this part of the thesis are the detailed mapping of exonic splicing regulatory elements in BRCA2 exon 7 and the validation of a method for predicting the effect of sequence variants on this regulation. We showed that this new prediction method is more reliable than previous methods, and we propose that it be incorporated, through suitable software, into the routine analysis of the many sequence variants observed in next-generation sequencing. This work contributes to the interpretation of VUS found in cancer predisposition genes because it shows that exonic sequence variations often affect messenger RNA maturation, not only by modifying splice sites but also by altering exonic regulatory elements. The effects of these alterations are very often partial, which makes it difficult to define their possible pathogenicity. We propose strengthening multicentre studies so that data from different sources, including family structure, VUS segregation, clinical data and tumour characteristics, can be combined to define a consensus for interpreting these partial splicing defects.
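The hexamer-score strategy described in this abstract can be illustrated with a small sketch: sum the scores of every hexamer overlapping the changed nucleotide, once for the reference sequence and once for the variant, and compare the two totals. The abstract does not spell out the exact scoring rule, so this summation is an assumption, as are the function names, the toy sequence and the toy score table (the real table from Ke et al., 2011 covers all 4096 hexamers).

```python
def total_hexamer_score(sequence, position, scores):
    """Sum the scores of all hexamers that overlap the 0-based `position`.

    Hexamers missing from `scores` count as 0 (an assumption of this sketch).
    """
    total = 0.0
    for start in range(position - 5, position + 1):
        if start < 0 or start + 6 > len(sequence):
            continue  # hexamer would run off the sequence
        total += scores.get(sequence[start:start + 6], 0.0)
    return total

def score_change(sequence, position, alt_base, scores):
    """Variant total minus reference total; negative suggests lost enhancer signal."""
    variant = sequence[:position] + alt_base + sequence[position + 1:]
    return (total_hexamer_score(variant, position, scores)
            - total_hexamer_score(sequence, position, scores))

# Toy score table with invented values, for illustration only.
toy_scores = {"GAAGAA": 1.0, "AAGAAG": 0.8, "TAGGGT": -1.2}
reference = "CCGAAGAAGCC"

# Substituting G>T at index 5 destroys both enhancer-like hexamers.
delta = score_change(reference, 5, "T", toy_scores)
```

A negative change would point towards increased exon exclusion, a positive one towards increased inclusion, under the assumptions stated above.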
Roldan, Dana Leticia. "Détection de QTL : interaction entre dispositif expérimental et méthodes statistiques". Toulouse 3, 2011. http://thesesups.ups-tlse.fr/1395/.
Genomic regions carrying polymorphisms associated with variation in quantitative traits are termed quantitative trait loci (QTL). Until recently, QTL mapping was mainly based on microsatellite markers. The density of these markers is such that detection of associations between markers and QTL can only be based on linkage analysis (LA), and a family-structured design is needed. Once a chromosomal region has been identified as carrying a putative QTL, more markers should be developed at a higher density within that region. Tightly linked markers are needed to narrow down the putative QTL position sufficiently for finding actual gene mutations to become feasible. The new SNP markers make this objective realistic, allowing fine mapping based on linkage disequilibrium (LD) between these markers and QTL across families. Designing experiments aimed at fine mapping QTL by combining LD and LA (LDLA) is a question raised after an initial localisation obtained from classical family LA, or directly where the initial and fine localisation steps are confounded. Questions related to this design problem were addressed in this thesis: how should one balance family size and number in an LDLA design? What is the best LDLA protocol to fine map QTL that were previously roughly localised in a classical LA analysis? Three steps were followed: (i) Evaluation of a new LDLA method based on regression (Legarra and Fernando, 2009) and numerical comparison of this method with a variance component method based on IBD (identity by descent) (Meuwissen et al., 2002). The regression approach appeared to be generally as precise as the Meuwissen et al. (2002) method and always much faster. (ii) Design optimization, using this LDLA regression technique, in terms of number of progeny per sire and type of families (half-sib to a mixture of full- and half-sib).
We found that QTL are localised more precisely with LDLA than with LA, and that experimental structure as well as haplotype size have a large impact on this localisation. A balance between family number and size must be found depending on the characteristics of the case (length of the explored segment, marker density, total population size, etc.). (iii) Application of the mapping method to wool production traits, as an example of quantitative traits. In the first stage, familial linkage analysis was applied to a real half-sib Merino sheep population measured for wool traits. This population consisted of 617 individuals belonging to 10 sire half-sib families. Forty-eight microsatellites were used, covering 280.70 cM in candidate areas. QTL were found, in particular one affecting the coefficient of variation of fibre diameter at position 67.60 cM on OAR11. In a second stage, we evaluated, considering the specificity of our ovine population, the recommendations established after step (ii) concerning the organisation of an LDLA design. This work allowed us to draw practical conclusions for fine mapping of wool trait QTL in our population.
Privé, Florian. "Genetic risk score based on statistical learning Efficient analysis of large-scale genome-wide data with two R packages: bigstatsr and bigsnpr Efficient implementation of penalized regression for genetic risk prediction Making the most of Clumping and Thresholding for polygenic scores". Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAS024.
Genotyping is becoming cheaper, making genotype data available for millions of individuals. Moreover, imputation makes it possible to obtain genotype information at millions of loci capturing most of the genetic variation in the human genome. Given such large data and the fact that many traits and diseases are heritable (e.g. 80% of the variation of height in the population can be explained by genetics), it is envisioned that predictive models based on genetic information will be part of personalized medicine. In my thesis work, I focused on improving the predictive ability of polygenic models. Because prediction modeling is part of a larger statistical analysis of datasets, I developed tools to allow flexible exploratory analyses of large datasets, which consist of two R/C++ packages described in the first part of my thesis. Then, I developed an efficient implementation of penalized regression to build polygenic models based on hundreds of thousands of genotyped individuals. Finally, I improved the "clumping and thresholding" method, which is the most widely used polygenic method and is based on summary statistics, which are widely available compared with individual-level data. Overall, I applied many concepts of statistical learning to genetic data. I used extreme gradient boosting for imputing genotyped variants, feature engineering to capture recessive and dominant effects in penalized regression, and parameter tuning and stacked regressions to improve polygenic prediction. Statistical learning is not widely used in human genetics and my thesis is an attempt to change that.
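As a rough illustration of the "clumping and thresholding" method named above, the following sketch keeps the most significant variant of each correlated group, drops variants above a p-value threshold, and sums effect sizes weighted by allele dosage. It is a simplified textbook version, not the bigsnpr implementation: the function names, the variant identifiers, the thresholds and all numbers are hypothetical, and the physical distance window normally used for clumping is ignored.

```python
def clump_and_threshold(variants, r2, r2_max=0.2, p_max=0.05):
    """Select index variants.

    variants: dict name -> (effect_size, p_value) from summary statistics
    r2:       dict frozenset({a, b}) -> squared correlation between a and b
    """
    kept = []
    # Visit variants from most to least significant.
    for name in sorted(variants, key=lambda v: variants[v][1]):
        if variants[name][1] > p_max:
            break  # thresholding: every remaining variant is less significant
        # Clumping: skip variants too correlated with one already kept.
        if all(r2.get(frozenset((name, k)), 0.0) <= r2_max for k in kept):
            kept.append(name)
    return kept

def polygenic_score(dosages, variants, kept):
    """Sum of effect size times allele dosage (0, 1 or 2) over kept variants."""
    return sum(variants[v][0] * dosages.get(v, 0) for v in kept)

# Hypothetical summary statistics and pairwise correlations.
variants = {"rs1": (0.30, 1e-8), "rs2": (0.25, 1e-6),
            "rs3": (-0.10, 0.01), "rs4": (0.05, 0.40)}
r2 = {frozenset(("rs1", "rs2")): 0.9}

kept = clump_and_threshold(variants, r2)
score = polygenic_score({"rs1": 2, "rs2": 1, "rs3": 1}, variants, kept)
```

In practice both thresholds are tuning parameters, which is where the parameter tuning mentioned in the abstract comes in.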
Guedj, Mickael. "Méthodes Statistiques pour l'Analyse de Données Génétiques d'Association à Grande Echelle". Phd thesis, Université d'Evry-Val d'Essonne, 2007. http://tel.archives-ouvertes.fr/tel-00169411.
After an introductory description of the main issues raised by large-scale association studies, we address single-marker approaches in more detail, with a power study of the main association tests and of their combinations. We then consider multi-marker approaches, developing an analysis method based on the Local Score statistic. This method identifies statistical associations from whole genomic regions rather than from markers taken individually. It is a simple, fast and flexible method whose performance we evaluate on simulated and real large-scale association data. Finally, this work also addresses the multiple-testing problem, which stems from the number of tests to be performed when analysing high-throughput genetic or genomic data. The method we propose based on the Local Score takes this problem into account. We also discuss estimating the Local False Discovery Rate through a simple Gaussian mixture model.
All the methods described in this manuscript have been implemented in three software programs available on the website of the Statistique et Génome laboratory: fueatest, LHiSA and kerfdr.
Martins, Helena. "Méthodes statistiques pour identifier l'adaptation locale dans les populations continues et mélangées". Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAS022/document.
Finding genetic signatures of local adaptation is of great interest for many population genetic studies. Common approaches to sorting selective loci from their genomic background focus on the extreme values of the fixation index, FST, across loci. However, computing the fixation index becomes challenging when the population is genetically continuous, when predefining subpopulations is difficult, and when admixed individuals are present in the sample. In this thesis, we present a new method to identify loci under selection based on an extension of the FST statistic to samples with admixed individuals. Given our goal of exploring statistical methods to identify local adaptation in admixed populations, we included spatial data to compute ancestry coefficients and allele frequencies. To enrich our work, we investigated the effects of linkage disequilibrium and LD-pruning methods in genome scans for selection.
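For reference, the classical fixation index that this thesis extends can be computed from subpopulation allele frequencies as FST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the pooled population and H_S the average within-subpopulation heterozygosity. The sketch below covers only this classical case with predefined, equally sized subpopulations and one biallelic locus; it does not implement the admixture-based extension, and the function name and frequencies are hypothetical.

```python
def fst_biallelic(subpop_freqs):
    """Classical FST at one biallelic locus from subpopulation allele frequencies.

    Assumes equally sized subpopulations (a simplification of this sketch).
    """
    k = len(subpop_freqs)
    p_bar = sum(subpop_freqs) / k
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # pooled expected heterozygosity
    if h_total == 0.0:
        return 0.0  # locus is monomorphic overall, no differentiation to measure
    h_within = sum(2.0 * p * (1.0 - p) for p in subpop_freqs) / k
    return (h_total - h_within) / h_total

# Strongly differentiated locus: the allele is rare in one group, common in the other.
fst_high = fst_biallelic([0.1, 0.9])
# Undifferentiated locus: identical frequencies in both groups.
fst_none = fst_biallelic([0.5, 0.5])
```

Loci whose FST lies far in the upper tail of the genome-wide distribution are the usual candidates for local adaptation.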
Sedki, Mohammed. "Échantillonnage préférentiel adaptatif et méthodes bayésiennes approchées appliquées à la génétique des populations". Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2012. http://tel.archives-ouvertes.fr/tel-00769095.
Jomphe, Valérie. "Comparaison de la puissance de tests de déséquilibre de liaison dans les études génétiques". Thesis, Université Laval, 2006. http://www.theses.ulaval.ca/2006/23910/23910.pdf.
Listed on the Honour Roll of the Faculté des études supérieures.
Kileh, Wais Mohamed. "Méthodes statistiques pour la détection de QTL : nouveaux développements et applications chez le canard mulard". Thesis, Paris, AgroParisTech, 2012. http://www.theses.fr/2012AGPT0054/document.
QTL detection based on the regression of phenotypes on transmission probabilities is widely used when large families phenotyped for a Gaussian trait are available. From a methodological point of view, the aim of this thesis is to propose a QTL detection method that accounts for the small number of families on the one hand and for the existence of discrete traits on the other. To address the first issue, we propose a QTL detection approach that integrates, in the calculation of the genetic merit of genotyped individuals, the performances recorded over n generations of descendants. The use of a 'de-regressed proof' as the phenotype to be analysed, proposed by Weller et al. (1990) and Tribout et al. (2008), is generalized. Next, we compare a model assuming normality of the data with a threshold model assuming a continuous distribution underlying the observed distribution, for QTL detection of discrete traits. We show that the discrete model is more accurate and more powerful when the studied trait has three categories distributed unevenly in the population. In the second part of the thesis, the data from the GENECAN protocol were analysed. The aim was to identify genomic regions, or quantitative trait loci (QTL), associated with traits of interest measured on overfed mule ducks. The mule duck is a hybrid of a female common duck (Anas platyrhynchos) and a Muscovy drake (Cairina moschata). Three hundred and forty-two backcross (BC) common ducks were generated by crossing a Kaiya duck line with a heavy Pekin duck line. These BC females were mated with Muscovy drakes to produce 1600 mule ducks, which were measured for growth, metabolism during the growth and overfeeding periods, overfeeding, and breast muscle and fatty liver quality.
The phenotypic value of each genotyped BC female was estimated for each trait as the average phenotype of her offspring, weighted by a coefficient of determination (CD) that depends on the number of offspring and the heritability of the trait. The genetic map comprised 91 microsatellite markers grouped into 16 linkage groups (LG) and covering 778 cM. In the single-trait analysis, 22 QTL significant at the 1% chromosome-wide threshold were mapped. These QTL are mostly involved in the variability of breast muscle and fatty liver quality. The chromosomal regions of interest identified in this study should in future be covered with denser markers to allow fine mapping.
Persyn, Elodie. "Analyse d’association de variants génétiques rares dans une population démographiquement stable". Thesis, Nantes, 2017. http://www.theses.fr/2017NANT1016/document.
Genome-wide association studies have identified many common risk alleles for a wide variety of complex diseases. However, these common variants explain only a very small part of the heritability. One hypothesis is the presence of rare genetic variants with stronger effects. Testing the association of these rare variants is challenging because of their low frequency in populations. Many statistical methods have been developed that aggregate the information across a group of rare variants. This thesis compares the main strategies through simulations under various genetic scenarios and through application to real sequencing data. We also developed a statistical test, called DoEstRare, which can detect clustered disease-risk variants in local genetic regions by comparing the distributions of variant positions between cases and controls. Moreover, population stratification has been shown to be a confounding factor in the interpretation of rare-variant analyses. With the recruitment of controls in projects such as French Exome and VACARME, it is necessary to assess the impact of a very fine geographical structure (France) on the different statistical strategies. The second part of this thesis estimates this impact by simulating fine-scale population structures.
Elfassihi, Latifa. "Modèles d'analyse simultanée et conditionnelle pour évaluer les associations entre les haplotypes des gènes de susceptibilité et les traits des maladies complexes : Application aux gènes candidats de l'ostéoporose". Thesis, Université Laval, 2010. http://www.theses.ulaval.ca/2010/27404/27404.pdf.
Savard, Nathalie. "Méthode d'analyse de liaison génétique pour des familles dans lesquelles il y a de l'hétérogénéité non-allélique intra-familiale". Master's thesis, Université Laval, 2006. http://hdl.handle.net/20.500.11794/18233.
This study presents a linkage analysis method for cases of recombination heterogeneity occurring in bilineal pedigrees. We propose a modification of single-locus analysis with Smith's admixture model, which deals with inter-familial heterogeneity, so that it becomes more appropriate for cases of intra-familial heterogeneity. Our approach first decomposes large pedigrees into nuclear pedigrees, so that the intra-familial heterogeneity of the large pedigrees is transformed into inter-familial heterogeneity between the nuclear pedigrees. The nuclear pedigrees are then analysed with both a single-locus analysis and Smith's admixture model. The power of the proposed method is compared with that of other methods, including in the specific case where there is intra-familial heterogeneity in large pedigrees. We also check whether decomposing the pedigrees results in a higher proportion of type I errors.
Larouche, Geneviève. "Le dépistage par mammographie chez les femmes ayant été testées pour les gènes BRCA1/2 : évaluation des méthodes de rapport et comparaison des taux d'utilisation après et avant le test génétique". Doctoral thesis, Université Laval, 2016. http://hdl.handle.net/20.500.11794/27327.
This thesis aims to assess the effect of BRCA1/2 genetic testing on screening practices according to test results. Three studies were carried out. The participants in these studies were tested for genetic susceptibility to breast and ovarian cancer in the INHERIT BRCAs (Interdisciplinary Health Research Team on BReast CAncer susceptibility) research program, conducted between 1998 and 2004. Self-reported data and administrative data from the Quebec Health Insurance Board database ("Régie de l'assurance maladie du Québec" (RAMQ)) for these participants were used. The results of the first two studies were used to support methodological choices in the main study. Since women who were tested for BRCA1/2 tend to overestimate their use of mammography, administrative data are preferable to self-reported information for assessing their use of breast cancer screening. RAMQ data are thus considered a better means of assessing mammography screening following BRCA1/2 genetic testing, since specific procedure codes covering all mammography exams, whether done in a private clinic or in a hospital, can be tracked. Analyses of RAMQ data carried out in the main study suggest that BRCA1/2 mutation carriers and women with an inconclusive test result had more screening mammograms after genetic testing than before it. Conversely, non-carriers did not have more breast screening exams. In conclusion, this thesis has provided a better understanding of the long-term use of mammography after BRCA1/2 genetic testing. It specifically showed that young female non-carriers, contrary to what was expected, do not change their use of mammography after genetic testing. These women could therefore benefit from interventions to tailor their cancer screening to their specific level of risk for breast and ovarian cancer. Cancer screening methods that are better adapted to cancer risk would help optimize the use of health resources.
Indeed, a risk stratification approach to breast cancer and personalized screening measures should lead to changes in the current recommendations for breast cancer screening. The adherence of women and physicians to these new approaches will then need to be further evaluated.
Saad, Mohamad. "Méthodes statistiques et stratégies d'études d'association de phénotypes complexes : études pan-génomiques de la maladie de Parkinson". Toulouse 3, 2012. http://thesesups.ups-tlse.fr/1657/.
My thesis focused on statistical methods and strategies for studying the genetic components of complex human traits, and especially of Parkinson's Disease (PD). My work was developed mainly in two contexts of genome-wide association studies (GWAS): the detection of common variants and the detection of rare variants. GWAS is an approach in which both the type I and type II error rates have to be controlled, since a large number of tests are performed. In addition, potential population stratification problems must be controlled for. Despite the large sample sizes of recent GWASs, single-marker tests may individually have low power to detect common variants with small effects. The use of multi-marker tests may optimize the coverage of genetic variability and thus increase the power of GWAS. I focused on the study of these tests, especially the "SNP-Set" test based on kernel machine regression and the haplotypic test. I studied the theoretical aspects of these tests and evaluated their statistical properties in our empirical data for PD. In addition, in our analyses for PD, I developed imputation and meta-analysis techniques to increase the coverage of genetic variability and the sample size. Association analysis for rare variants faces several challenges. The single-marker test lacks power to detect such variants, and the cost of whole-genome sequence analyses for complex traits is still prohibitive. Our design is a cost-effective alternative based on the joint use of public sequence data and GWAS data. Several new tests have been proposed but, to date, their statistical properties are still unclear. At the genome-wide level, the type I and type II error rates may depend on several factors, such as gene length, allelic heterogeneity in the gene, LD between SNPs, overlap between genes, and the correlation between the common variants and the trait.
I evaluated the statistical properties of several methods in simulated data and in our GWAS PD data. We show that several methods based on the linear mixed model are mathematically equivalent and that some are special cases of others. In conclusion, we developed strategies and analytical methods that combine complementary approaches (Common Disease-Common Variant versus Common Disease-Rare Variant) to optimize the characterization of the genetic components of PD in particular and of complex traits in general.
Merle, Coralie. "Nouvelles méthodes d'inférence de l'histoire démographique à partir de données génétiques". Thesis, Montpellier, 2016. http://www.theses.fr/2016MONTT269/document.
This thesis aims to improve statistical methods suited to stochastic models of population genetics and to develop statistical methods adapted to next-generation sequencing data. Sequential importance sampling algorithms have been defined to estimate likelihoods in models of ancestral population processes. However, these algorithms are based on features of models with constant population size and become inefficient when the population size varies over time, making likelihood-based inference difficult in many demographic situations. In the first contribution of this thesis, we modify a previous sequential importance sampling algorithm to improve the efficiency of likelihood estimation. Our procedure is still based on features of the constant-size model, but uses a resampling technique with a new resampling probability distribution that depends on the pairwise composite likelihood. We tested our algorithm, called sequential importance sampling with resampling (SISR), on data sets simulated under different demographic scenarios. In most cases, we halved the computational cost for the same accuracy of inference, and in some cases reduced it a hundredfold. This work provides the first assessment of the impact of such resampling techniques on parameter inference using sequential importance sampling, and extends the range of situations where likelihood inference can easily be performed. The recent development of high-throughput sequencing technologies has revolutionized the generation of genetic data for many organisms: genome-wide sequence data are now available. Classical inference methods (maximum likelihood methods (MCMC, IS) and methods based on the Site Frequency Spectrum (SFS)), which are suited to polymorphism data sets of a few loci, assume that the genealogies of the loci are independent. To take advantage of genome-wide sequence data with a known genome, we need to consider the dependency between the genealogies of adjacent positions in the genome.
Thus, when we model recombination, the likelihood takes the form of an integral over all possible ancestral recombination graphs for the sampled sequences. This space is of much larger dimension than the space of genealogies, to the extent that likelihood-based inference with recombination is not feasible without further approximations. Several methods infer historical changes in the effective population size but do not consider the complexity of the fitted demographic model. Even if some of them offer a control for potential over-fitting, to the best of our knowledge no model choice procedure between demographic models of different complexity has been proposed based on IBS segment lengths. The aim of the second contribution of this thesis is to fill this gap by proposing such a model choice procedure. We focus on a simple model of constant population size and a slightly more complex model with a single past change in population size. Since these models are nested, we developed a penalized model choice criterion based on the comparison of observed and predicted haplotype homozygosity. Our penalization relies on Sobol's sensitivity indices and is a form of penalty related to the complexity of the model. This penalized model choice criterion allowed us to choose between a population of constant size and a population with a past change in size, on simulated data sets and on a cattle data set.
Pellay, François-Xavier. "Méthodes d'estimation statistique de la qualité et méta-analyse de données transcriptomiques pour la recherche biomédicale". Thesis, Lille 1, 2008. http://www.theses.fr/2008LIL10058/document.
To understand the biological phenomena taking place in a cell under physiological or pathological conditions, it is essential to know which genes it expresses. Gene expression can be measured with DNA chip technology, in which thousands of probes laid out on a chip measure the relative abundance of the genes expressed in the cell. So-called pangenomic microarrays are supposed to cover all existing protein-coding genes, currently around thirty thousand for human beings. The measurement, analysis and interpretation of such data pose a number of problems, and the analytical methods used determine the reliability and accuracy of the information obtained with microarray technology. The aim of this thesis is to define methods to control measurements, improve the analysis and deepen the interpretation of microarrays in order to optimize their use, and then to apply these methods to the transcriptome analysis of juvenile myelomonocytic leukemia patients, to improve diagnosis and understand the biological mechanisms behind this rare disease. We thereby developed, and validated through several independent studies, a quality control program for microarrays (ace.map QC), a software tool that improves the biological interpretation of microarray data based on gene ontologies, and a visualization tool for the global analysis of signaling pathways. Finally, combining the different approaches described, we developed a method to obtain reliable biological signatures for diagnostic purposes.
Laporte, Fabien. "Développement de méthodes statistiques pour l'identification de gènes d'intérêt en présence d'apparentement et de dominance, application à la génétique du maïs". Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS066.
The detection of genes is a first step towards understanding the impact of individuals' genetic information on their phenotypes. During my PhD, I studied statistical methods for performing genome-wide association studies, with maize hybrids as an application case. First, I studied the inference of relatedness coefficients between individuals from biallelic marker data. This estimation is based on a parametric mixture model. I studied the identifiability of this model in the generic case and in the specific case of a mating design where the observed individuals are obtained by crossing lines, a case representative of classical mating designs in plant genetics. I then studied the inference of the parameters of variance component mixed models, and particularly the performance of algorithms for testing the effects of numerous markers. I compared existing programs and optimized a Min-Max algorithm. The relevance of the developed methods was illustrated by the detection of QTLs through a genome-wide association analysis in a panel of maize hybrids.
Foll, Matthieu. "Méthodes bayesiennes pour l'estimation de l'histoire démographique et de la pression de sélection à partir de la structure génétique des populations". Phd thesis, Grenoble 1, 2007. http://www.theses.fr/2007GRE10280.
Recent advances in computational biology and molecular biology techniques have led to the emerging discipline of population genomics, whose main objective is the study of the spatial structure of genetic diversity. This structure is determined by both neutral forces, such as migration and drift, and adaptive forces, such as natural selection, and has important applications in many fields, such as medical genetics and conservation biology. Here, we develop new statistical methods to evaluate the role of natural selection and the environment in this spatial structure. All these methods are based on the Bayesian Dirichlet-multinomial model of genetic differentiation. First, we propose to include environmental variables in the estimation process, in order to identify the biotic and abiotic factors that determine the genetic structure. Then, we study the possibility of extending the Dirichlet-multinomial model to dominant markers, which have become very popular in the last few years but are affected by various ascertainment biases. Finally, we try to separate neutral effects from adaptive effects on the genetic structure, in order to identify regions of the genome influenced by natural selection. Three data sets were analyzed to illustrate the use of these new methods: human data, data on the argan tree in Morocco, and data on the periwinkle. Finally, we developed three software packages implementing these models.
Foll, Matthieu. "Méthodes bayesiennes pour l'estimation de l'histoire démographique et de la pression de sélection à partir de la structure génétique des populations". Phd thesis, Université Joseph Fourier (Grenoble), 2007. http://tel.archives-ouvertes.fr/tel-00216192.
Talbot, Denis. "Estimation de la variance et construction d'intervalles de confiance pour le ratio standardisé de mortalité avec application à l'évaluation d'un programme de dépistage du cancer". Thesis, Université Laval, 2010. http://www.theses.ulaval.ca/2010/27373/27373.pdf.
Hurel, Julie. "Détection d'organismes génétiquement modifiés (OGM) inconnus par analyse statistique de données de séquençage haut débit". Thesis, Rennes 1, 2020. http://www.theses.fr/2020REN1B027.
The European Union has adopted a very restrictive policy on the dissemination and use of genetically modified organisms (GMOs), whose use in food is not well accepted by consumers. Although a maximum threshold exists for a food to be labelled "GM-free", only known GMOs are easily detectable. A GMO consists mainly of a host genome and a sequence inserted by a non-natural process that confers a particular property on the organism, such as resistance to certain diseases. In recent years, GMOs with an unknown inserted sequence have been produced, and these are not detectable by the approaches used until now (PCR-type). Hence the need for a tool to detect unknown GMOs, the subject of this thesis, based on recent advances in high-throughput sequencing. Statistically, each organism has a specific frequency of nucleotide use in its genome. Any introduction of foreign genetic material locally alters the nucleotide use frequencies in that region, so that they differ from those of the host organism. On this basis, a tool for detecting unknown GMOs was developed from bacterial sequencing data, for cases where the GMO results from the insertion of a foreign gene or from the truncation or fusion of a gene that may belong to the host genome. The tool was tested on 4 GMO bacterial genomes, 7 wild bacterial genomes and 42 synthetic bacterial genomes. The results demonstrate the effectiveness of the method, which produced only one false positive gene and identified more than 99% of the genes of the GMO inserts.
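The idea stated in the abstract above, that foreign material locally shifts nucleotide use frequencies away from those of the host genome, can be sketched as a sliding-window comparison. This is an illustrative toy under stated assumptions, not the tool developed in the thesis: it uses single-nucleotide frequencies and a squared-difference distance, and the names `nucleotide_freqs` and `window_deviation` as well as the toy sequence are hypothetical.

```python
from collections import Counter


def nucleotide_freqs(seq):
    """Relative frequency of A, C, G and T in seq (other symbols are ignored)."""
    counts = Counter(seq)
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        return {base: 0.0 for base in "ACGT"}
    return {base: counts[base] / total for base in "ACGT"}


def window_deviation(genome, window, step):
    """Return (start, distance) pairs comparing each window to the whole genome.

    Assumption: the distance is the sum of squared differences between
    the window frequencies and the genome-wide frequencies.
    """
    reference = nucleotide_freqs(genome)
    results = []
    for start in range(0, len(genome) - window + 1, step):
        local = nucleotide_freqs(genome[start:start + window])
        distance = sum((local[b] - reference[b]) ** 2 for b in "ACGT")
        results.append((start, distance))
    return results


if __name__ == "__main__":
    # Toy "genome": a balanced background with a G-rich block inserted at position 200.
    genome = "ACGT" * 50 + "G" * 40 + "ACGT" * 50
    worst = max(window_deviation(genome, window=40, step=40), key=lambda t: t[1])
    print(worst[0])
```

A practical detector would likely need longer k-mers and a calibrated significance threshold. Both are left out here.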
Nshimyumukiza, Léon. "Cell-free DNA-based noninvasive prenatal screening for Down syndrome in the Quebec healthcare system : health economic aspects". Doctoral thesis, Université Laval, 2017. http://hdl.handle.net/20.500.11794/27889.
Introduction: In the Province of Quebec, about 110,000 pregnant women are eligible for voluntary prenatal screening for trisomy 21 (T21). Conventional screening strategies select about 4% of women for invasive fetal chromosome testing. Noninvasive prenatal testing using maternal blood cell-free DNA (NIPT) is a new, highly accurate screening strategy that could reduce these invasive procedures, but evidence about its health economic aspects (cost-effectiveness and affordability) is still lacking. Objectives: The objective of this thesis is to evaluate the expected health economic aspects of introducing NIPT into the Quebec trisomy 21 screening program. The first study systematically reviewed the literature of full economic evaluation studies on NIPT. The second study evaluated the expected cost-effectiveness of screening strategies incorporating NIPT, as well as conventional screening strategies. The third study evaluated the expected budget impact of implementing NIPT in the Quebec trisomy 21 screening program. Methodology: A systematic review of the literature was performed for the first study. For the second and third studies, semi-Markov decision-analytic models were built to simulate the cost-effectiveness and the budget impact of NIPT for a virtual cohort of pregnant women similar to that of Quebec in terms of age and pregnancy rate by age. The main outcome for the cost-effectiveness analysis was the incremental cost per additional trisomy 21 case detected. The main outcome for the budget impact analysis was the difference in overall costs between the two alternatives: the current screening strategy vs. the most cost-effective strategy incorporating NIPT. Results: The first study included 16 studies. Results show that, compared to current screening practice, a universal NIPT screening program is not cost-effective. A program that offers NIPT to high-risk pregnant women was found to be the most cost-effective option in the majority of the studies included.
The second study showed that NIPT as a second-tier test for high-risk women is cost-effective compared to screening algorithms not including NIPT. Out of 13 strategies compared, the integrated serum screening strategy followed by NIPT was the most cost-effective. Other strategies can increase the number of T21 cases identified, but with increasing incremental costs per case (from $61,623 to $1,553,615). Results were sensitive to the cost of NIPT and to the cut-offs used to define high-risk pregnant women. The third study found that NIPT as a second-tier test offered to high-risk women identified by the current screening program is affordable for the Quebec health care system. Compared to the current screening program, this strategy could be implemented at a neutral cost, with a modest yearly saving of $80,432 (95% CI: $79,874-$81,462). Results were sensitive to the cost of NIPT and to the uptake rate of invasive diagnostic tests. Conclusion: NIPT as a second-tier test offered to high-risk women identified by the current screening program is cost-effective and affordable for the Quebec health care system. Decision makers should consider its introduction after taking into account other aspects, such as ethical issues.
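The main cost-effectiveness outcome named in the abstract above, the incremental cost per additional trisomy 21 case detected, is a ratio of differences between two strategies. A minimal sketch follows. The function name `incremental_cost_per_case` and all figures in the example are hypothetical and are not results from the thesis.

```python
def incremental_cost_per_case(cost_new, cases_new, cost_ref, cases_ref):
    """Incremental cost per additional case detected.

    Computes (cost_new - cost_ref) / (cases_new - cases_ref), where the
    reference is the comparator strategy (for example, current screening).
    """
    extra_cases = cases_new - cases_ref
    if extra_cases == 0:
        raise ValueError("Both strategies detect the same number of cases.")
    return (cost_new - cost_ref) / extra_cases


if __name__ == "__main__":
    # Hypothetical totals for a cohort: costs in dollars, cases detected.
    ratio = incremental_cost_per_case(
        cost_new=1_200_000, cases_new=110,
        cost_ref=1_000_000, cases_ref=100,
    )
    print(ratio)
```

A negative ratio can mean either that the new strategy is cheaper and detects more cases or that it is costlier and detects fewer, so the sign of each difference should be inspected before interpreting the result.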
Boitard, Simon. "Cartographie de gènes à caractères quantitatifs par déséquilibre de liaison". Phd thesis, Université Paul Sabatier - Toulouse III, 2006. http://tel.archives-ouvertes.fr/tel-00132675.
Bernard, Anne. "Développement de méthodes statistiques nécessaires à l'analyse de données génomiques : application à l'influence du polymorphisme génétique sur les caractéristiques cutanées individuelles et l'expression du vieillissement cutané". Phd thesis, Conservatoire national des arts et metiers - CNAM, 2013. http://tel.archives-ouvertes.fr/tel-00925074.
Bernard, Anne. "Développement de méthodes statistiques nécessaires à l'analyse de données génomiques : application à l'influence du polymorphisme génétique sur les caractéristiques cutanées individuelles et l'expression du vieillissement cutané". Electronic Thesis or Diss., Paris, CNAM, 2013. http://www.theses.fr/2013CNAM0882.
New technologies developed recently in the field of genetics have generated high-dimensional databases, especially SNP databases. These databases are often characterized by a number of variables much larger than the number of individuals. The goal of this dissertation was to develop appropriate statistical methods to analyse high-dimensional data and to select the most biologically relevant variables. In the first part, I present the state of the art on unsupervised and supervised variable selection methods for two or more blocks of variables. In the second part, I present two new unsupervised "sparse" methods: Group Sparse Principal Component Analysis (GSPCA) and Sparse Multiple Correspondence Analysis (Sparse MCA). Formulated as regression problems with a group LASSO penalty, these methods select blocks of quantitative and qualitative variables, respectively. The third part is devoted to interactions between SNPs and presents a method used to identify these interactions: logic regression. Finally, the last part presents an application of these methods to a real SNP dataset to study the possible influence of genetic polymorphism on facial skin aging in adult women. The methods developed gave relevant results that confirmed the biologists' expectations and opened new research perspectives.
Rebours, Vinciane. "Inflammation et oncogenèse pancréatique : physiologie et physiopathologie". Paris 7, 2012. http://www.theses.fr/2012PA077254.
Chronic pancreatitis is a well-described risk factor for pancreatic adenocarcinoma. The link between chronic inflammation and oncogenesis is only partially understood. However, pancreatic stellate cells (PSC) and hypoxia seem to play a key role in the pathophysiological process. The activation of PSC following pancreatic injury results in extracellular matrix remodeling and changes in cell/cell and epithelial cell/stroma relationships. It also regulates the expression of cytokines and growth factors and promotes fibrosis, cell proliferation and migration, and tumor invasion. In patients with chronic pancreatitis, the detection of precancerous lesions (pancreatic intraepithelial neoplasia (PanIN) and intraductal papillary mucinous neoplasms (IPMN)) is made difficult by the architectural modifications of the pancreas. Despite advances in knowledge of these lesions, from genome to proteome, current tools do not allow effective screening in patients at high risk of pancreatic cancer. We proposed three approaches to assess the relationship between pancreatic inflammation and oncogenesis. First, we assessed the prevalence of precancerous lesions (PanIN) in long-standing pancreatic inflammation (hereditary pancreatitis) and found frequent early and severe PanIN lesions in the course of hereditary pancreatitis. Second, we developed a culture model of thick sections of normal human pancreas and observed an early activation of pancreatic stellate cells under hypoxic conditions. Finally, we identified specific biomarkers of high-grade dysplasia in precancerous lesions (IPMN) by mass spectrometry imaging. The identified markers (ubiquitin and thymosin-β4) were validated on IPMN EUS-FNA samples.
Villemereuil, Pierre de. "Méthodes pour l’étude de l’adaptation locale et application au contexte de l’adaptation aux conditions d’altitude chez la plante alpine Arabis alpina". Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAS003/document.
Local adaptation is a micro-evolutionary phenomenon that arises when populations of the same species are exposed to contrasting environmental conditions. If this environment exerts a natural selection pressure, if adaptive potential exists among the populations, and if gene flow is sufficiently mild, populations are expected to tend toward a local adaptive optimum. In this thesis, I study the methodological means of investigating local adaptation on the one hand, and I investigate this phenomenon along an elevation gradient in the alpine plant Arabis alpina on the other. In the first, methodological part, I show that genome scan methods for detecting selection using genetic markers can suffer from high false positive rates when confronted with complex but realistic datasets. I then introduce a statistical method to detect markers under selection which, contrary to existing methods, makes use of both the concept of genetic differentiation (or Fst) and environmental information. This method was developed with the aim of reducing the global false positive rate. Finally, I present some perspectives on the relationship between the relatively old "common garden" experiment and new developments in molecular biology and statistics. In the second, empirical part, I introduce an analysis of the demographic characteristics of A. alpina in six natural populations.
Besides providing interesting biological information on this species (low life expectancy, strongly contrasting reproduction and survival...), these analyses show that growth increases and survival decreases as average temperature decreases (hence with altitude). Since these analyses do not allow us to rule out hypotheses such as drift and phenotypic plasticity, I present the results of a common garden experiment that enables us to control for phenotypic plasticity and, when combined with molecular data, to rule out the hypothesis of drift. The results show the existence of an adaptive phenotypic syndrome, in which plants are smaller, more compact, grow more slowly and reproduce less in cold environments. Using the molecular data, I draw up a list of 40 loci that might be involved in this adaptive process. In the end, I discuss these empirical findings as a whole to place them in the more general context of alpine ecology. I sum up the main methodological challenges in studying local adaptation and offer some methodological perspectives.
Dias, Alves Thomas. "Modélisation du déséquilibre de liaison en génomique des populations par méthodes d'optimisation". Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAS052/document.
We present a new formalism and new methods to model linkage disequilibrium and to account for the haplotype structure of population genomics data. Modeling relies on a constrained optimization problem that is solved using dynamic programming. The algorithmic cost of the proposed methods is linear, which is a desirable property for processing large datasets. First, we applied our framework to study admixed populations and perform local ancestry inference. Our method is applied to simulated genotypes of admixed human populations and to real genotypes from admixed Populus species. Second, we developed our optimization framework to perform haplotype phasing and imputation based on a population of genotypes. All optimization methods have been implemented in a Python package called Loter.
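As a rough illustration of how dynamic programming can solve such a problem at a cost linear in the number of markers, the sketch below assigns each position of a target haplotype to one of several reference haplotypes. The cost function (one unit per allele mismatch plus a fixed penalty per switch of reference) and the name `copy_path` are assumptions of this sketch; the abstract does not specify the objective, and this is not the Loter implementation.

```python
def copy_path(target, refs, switch_penalty=2.0):
    """Return, for each marker, the index of the reference haplotype copied.

    Minimizes (number of mismatches) + switch_penalty * (number of switches).
    Runs in O(n_markers * n_refs) time.
    """
    k = len(refs)
    # Cost of ending at marker 0 on each reference haplotype.
    cost = [float(refs[r][0] != target[0]) for r in range(k)]
    back = []  # back[j - 1][r] = best predecessor state for state r at marker j
    for j in range(1, len(target)):
        best_prev = min(range(k), key=cost.__getitem__)
        new_cost, pointers = [], []
        for r in range(k):
            stay = cost[r]
            switch = cost[best_prev] + switch_penalty
            if stay <= switch:
                prev, base = r, stay
            else:
                prev, base = best_prev, switch
            new_cost.append(base + (refs[r][j] != target[j]))
            pointers.append(prev)
        cost = new_cost
        back.append(pointers)
    # Backtrack from the cheapest final state.
    state = min(range(k), key=cost.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Because only the single cheapest previous state needs to be considered for a switch, each marker costs O(n_refs) work instead of O(n_refs squared), which keeps the total cost linear in the number of markers.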
Caye, Kévin. "Méthodes de factorisation matricielle pour la génomique des populations et les tests d'association". Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAS046/document.
We present statistical methods based on matrix factorization problems. A first method allows efficient inference of population structure from genetic data while including geographic proximity information. A second method corrects association studies for confounding factors. In this manuscript we present the models as well as the theoretical aspects of the inference algorithms. Moreover, using numerical simulations, we compare the performance of our methods with that of existing methods. Finally, we apply our methods to real biological data. Our methods have been implemented and distributed as R packages: tess3r and lfmm.
Fouchet, Arnaud. "Kernel methods for gene regulatory network inference". Thesis, Evry-Val d'Essonne, 2014. http://www.theses.fr/2014EVRY0058/document.
New technologies in molecular biology, in particular DNA microarrays, have greatly increased the quantity of available data. In this context, methods from mathematics and computer science have been actively developed to extract information from large datasets. In particular, the problem of gene regulatory network inference has been tackled using many different mathematical and statistical models, from the most basic ones (correlation, Boolean or linear models) to the most elaborate (regression trees, Bayesian models with latent variables). Despite their qualities when applied to similar problems, kernel methods have scarcely been used for gene network inference because of their lack of interpretability. In this thesis, two approaches are developed to obtain interpretable kernel methods. First, from a theoretical point of view, some kernel methods are shown to consistently estimate a transition function and its partial derivatives from a learning dataset. These estimates of partial derivatives make it possible to infer the gene regulatory network better than previous methods on realistic gene regulatory networks. Second, an interpretable kernel method based on multiple kernel learning is presented. This method, called lockni, provides state-of-the-art results on real and realistically simulated datasets.
Chepiga, Valentina. "СРАВНИТЕЛЬНО-СТИЛИСТИЧЕСКИЙ АНАЛИЗ ПРОИЗВЕДЕНИЙ РОМЕНА ГАРИ И ЭМИЛЯ АЖАРА [Sravnitelʹʹno-stilistiČeskij analiz proizvedenij romena gari i èmilâ aŽara]". Paris 3, 2008. http://www.theses.fr/2008PA030084.
Attributing a style to an author can be problematic. Certain cases are surprising, such as that of Gary and Ajar. Beyond the socio-literary issue, a stylistic issue is at stake. Two lines of investigation are open: first, the genesis of the writing of each "author", through the observation of manuscripts and the corresponding writing processes; second, the composition and linguistic configuration of the verbal material that constitutes the "style", for which quantitative analysis is used. Based on the corpus studied (three novels by Romain Gary and three by Emile Ajar written during the same period, to which two novels by Paul Pavlowitch were added), the question of the attribution of Emile Ajar's novels was raised. To reach this objective, it proved necessary to apply several methods of philological analysis: biographical and contextual research, genetic analysis of manuscripts, and linguistic and stylistic analysis coupled with quantitative methods. To attribute the novels, the thesis combines different approaches. The one based on the "theory of pattern recognition" developed in the Laboratory of Applied Linguistic Studies of Saint Petersburg State University, used here for the first time on the French language, proved decisive in obtaining our results. From the systematic analysis of syntactic elements, this method allows us to conclude that the novels of the author Ajar and of the author Gary were written by the same writer.
Vauchelet, Nicolas. "Modélisation mathématique du transport diffusif de charges partiellement quantiques". Phd thesis, Université Paul Sabatier - Toulouse III, 2006. http://tel.archives-ouvertes.fr/tel-00135114.
[...] mathematical modeling of the transport of electrons confined in a nanostructure, with the aim of implementing numerical simulations. In such nanometric devices, the orders of magnitude do not play the same role in each direction. Electrons can be extremely confined in one or several directions. A quantum model is needed to describe the confinement. In the unconfined direction, transport is assumed to be classical. We therefore propose a coupled quantum/classical system. The collisions that occur during transport induce a diffusive regime for the charge carriers. The diffusive model is obtained through a diffusion limit of a kinetic model. The mathematical analysis of this diffusion limit and of the coupled diffusive model is presented. A numerical simulation of transport in a nanotransistor is obtained with this model.
Faouzi, Johann. "Machine learning to predict impulse control disorders in Parkinson's disease". Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS048.
Impulse control disorders are a class of psychiatric disorders characterized by impulsivity. These disorders are common during the course of Parkinson's disease, decrease the quality of life of subjects, and increase caregiver burden. Being able to predict which individuals are at higher risk of developing these disorders, and when, is of high importance. The objective of this thesis is to study impulse control disorders in Parkinson's disease from the statistical and machine learning points of view; it is divided into two parts. The first part investigates the predictive performance of all the factors associated with these disorders in the literature, taken together. The second part studies the association and usefulness of other factors, in particular genetic data, for improving predictive performance.
Magnanensi, Jérémy. "Amélioration et développement de méthodes de sélection du nombre de composantes et de prédicteurs significatifs pour une régression PLS et certaines de ses extensions à l'aide du bootstrap". Thesis, Strasbourg, 2015. http://www.theses.fr/2015STRAJ082/document.
Partial Least Squares (PLS) regression has, through its properties, become a versatile statistical methodology for the analysis of genomic datasets. The reliability of PLS regression and some of its extensions relies on a robust determination of a tuning parameter, the number of components. Such a determination remains a major goal, since no existing criterion can be considered a global benchmark in the state-of-the-art literature. We developed a new bootstrap-based stopping criterion for the construction of PLS components that guarantees a high level of stability. We then adapted and used it to develop and improve variable selection processes, allowing a more reliable and robust determination of the significant probe sets related to the studied feature of a pathology.
Luu, Keurcien. "Application de l'Analyse en Composantes Principales pour étudier l'adaptation biologique en génomique des populations". Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAS053/document.
Identifying genes involved in local adaptation is of major interest in population genetics. Current statistical methods for genome scans are no longer suited to the analysis of Next Generation Sequencing (NGS) data. We propose new statistical methods to perform genome scans on massive datasets. Our methods rely exclusively on Principal Component Analysis, whose use in population genetics is discussed extensively. We also explain why our approaches can be seen as extensions of existing methods and demonstrate how our PCA-based statistics compare with state-of-the-art methods. Our work has led to the development of pcadapt, an R package designed for outlier detection in various types of genetic data.
Gazal, Steven. "La consanguinité à l'ère du génome haut-débit : estimations et applications". Thesis, Paris 11, 2014. http://www.theses.fr/2014PA11T026/document.
An individual is said to be inbred if his parents are related and if his genealogy contains at least one inbreeding loop leading to a common ancestor. The inbreeding coefficient of an individual is defined as the probability that the individual has received two alleles identical by descent, coming from a single allele present in a common ancestor, at a random marker on the genome. The inbreeding coefficient is a central parameter in genetics; it is used in population genetics to characterize population structure, and in genetic epidemiology to search for genetic factors involved in recessive diseases. The inbreeding coefficient was traditionally estimated from genealogies, but methods have been developed to avoid genealogies and to estimate this coefficient from the information provided by genetic markers distributed along the genome. With the advances in high-throughput genotyping techniques, it is now possible to genotype hundreds of thousands of markers for one individual, and to use these methods to reconstruct the regions of identity by descent on his genome and estimate a genomic inbreeding coefficient. There is currently no consensus on the best strategy to adopt with these dense marker maps, in particular to take into account dependencies between alleles at different markers (linkage disequilibrium). In this thesis, we evaluated the different available methods through simulations using real data with realistic patterns of linkage disequilibrium. We highlighted an interesting approach that consists in generating several submaps to minimize linkage disequilibrium, estimating an inbreeding coefficient for each of the submaps with a hidden Markov model implemented in the FEstim software, and taking as estimator the median of these different estimates. The advantage of this approach is that it can be used on any sample size, even on a single individual, since it requires no linkage disequilibrium estimate. 
FEstim is a maximum likelihood estimator, which allows testing whether the inbreeding coefficient is significantly different from zero and determining the most probable mating type of the parents. Finally, through the identification of homozygous regions shared by several consanguineous patients, our strategy permits the identification of recessive mutations involved in monogenic and multifactorial diseases. To facilitate the use of our method, we developed the FSuite pipeline to interpret the results of population genetics and genetic epidemiology studies, as shown on the HapMap III reference panel and on case-control Alzheimer's disease data.
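The "median over submaps" strategy described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: a simple method-of-moments estimator (excess homozygosity relative to Hardy-Weinberg expectations) stands in for the hidden Markov maximum likelihood estimator of FEstim, and each submap is drawn by picking one marker at random from every block of `window` consecutive markers. All function and parameter names are hypothetical.

```python
import random
from statistics import median


def moment_inbreeding(is_hom, freqs):
    """Method-of-moments F: (observed - expected) / (n - expected) homozygotes."""
    expected = sum(1.0 - 2.0 * p * (1.0 - p) for p in freqs)
    observed = sum(is_hom)
    return (observed - expected) / (len(freqs) - expected)


def draw_submap(n_markers, window, rng):
    """Pick one marker index at random in each block of `window` markers."""
    return [rng.randrange(start, min(start + window, n_markers))
            for start in range(0, n_markers, window)]


def median_submap_inbreeding(is_hom, freqs, window=50, n_submaps=100, seed=0):
    """Median of per-submap inbreeding estimates for a single individual.

    is_hom[i] is 1 if the individual is homozygous at marker i, else 0;
    freqs[i] is the reference allele frequency at marker i (strictly
    between 0 and 1 for at least one marker of every submap).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_submaps):
        idx = draw_submap(len(freqs), window, rng)
        estimates.append(moment_inbreeding([is_hom[i] for i in idx],
                                           [freqs[i] for i in idx]))
    return median(estimates)
```

Thinning the markers reduces the dependence between neighbouring alleles within each submap, and taking the median across submaps limits the influence of any single unlucky draw. Only one individual's genotypes and the allele frequencies are needed.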
Loucoubar, Cheikh. "Statistical genetic analysis of infectious disease (malaria) phenotypes from a longitudinal study in a population with significant familial relationships". Phd thesis, Université René Descartes - Paris V, 2012. http://tel.archives-ouvertes.fr/tel-00685104.
Faubet, Pierre. "METHODES STATISTIQUES POUR L'ETUDE DE LA STRUCTURATION SPATIALE DE LA DIVERSITE GENETIQUE". Phd thesis, 2009. http://tel.archives-ouvertes.fr/tel-00606630.