Dissertations on the topic "Genetics – Statistical methods"

See the top 50 dissertations on the topic "Genetics – Statistical methods".

1

ZHANG, GE. "STATISTICAL METHODS IN GENETIC ASSOCIATION". University of Cincinnati / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1196099744.

2

Lange, Christoph. "Generalized estimating equation methods in statistical genetics". Thesis, University of Reading, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.269921.

3

Jung, Min Kyung. "Statistical methods for biological applications". [Bloomington, Ind.] : Indiana University, 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3278454.

Abstract:
Thesis (Ph.D.)--Indiana University, Dept. of Mathematics, 2007.
Source: Dissertation Abstracts International, Volume: 68-10, Section: B, page: 6740. Adviser: Elizabeth A. Housworth. Title from dissertation home page (viewed May 20, 2008).
4

Yung, Godwin Yuen Han. "Statistical methods for analyzing genetic sequencing association studies". Thesis, Harvard University, 2016. http://nrs.harvard.edu/urn-3:HUL.InstRepos:33493313.

Abstract:
Case-control genetic sequencing studies are increasingly being conducted to identify rare variants associated with complex diseases. Oftentimes, these studies collect a variety of secondary traits--quantitative and qualitative traits besides the case-control disease status. Reusing the data and studying the association between rare variants and secondary phenotypes provide an attractive and cost effective approach that can lead to discovery of new genetic associations. In Chapter 1, we carry out an extensive investigation of the validity of ad hoc methods, which are simple, computationally efficient methods frequently applied in practice to study the association between secondary phenotypes and single common genetic variants. Though other researchers have investigated the same problem, we make two key contributions to existing literature. First, we show that in taking an ad hoc approach, it may be desirable to adjust for covariates that affect the primary disease in the secondary phenotype model, even though these covariates are not necessarily associated with the secondary phenotype in the population. Second, we show that when the disease is rare, ad hoc methods can lead to severely biased estimation and inference if the true disease model follows a non-logistic model such as the probit model. Spurious associations can be avoided by including interaction terms in the fitted regression model. Our results are justified theoretically and via simulations, and illustrated by a genome-wide association study of smoking using a lung cancer case-control study. In Chapter 2, we consider the problem of testing associations between secondary phenotypes and sets of rare genetic variants. We show that popular region-based methods such as the burden test and the sequence kernel association test (SKAT) can only be applied under the same conditions as those applicable to ad hoc methods (Chapter 1). 
For a more robust alternative, we propose an inverse-probability-weighted version of the optimal SKAT (SKAT-O) to account for unequal sampling of cases and controls. As an extension of SKAT-O, our approach is data adaptive and includes the weighted burden test and weighted SKAT as special cases. In addition to weighting individuals to account for the biased sampling, we can also consider weighting the variants in SKAT-O. Decreasing the weight of non-causal variants and increasing the weight of causal variants can improve power. However, since researchers do not know which variants are actually causal, it is common practice to weight genetic variants as a function of their minor allele frequencies. This is motivated by the belief that rarer variants are more likely to have larger effects. In Chapter 3, we propose a new unsupervised statistical framework for predicting the functional status of genetic variants. Compared to existing methods, the proposed algorithm integrates a diverse set of annotations---which are partitioned beforehand into multiple groups by the user---and predicts the functional status for each group, taking into account within- and between-group correlations. We demonstrate the advantages of the algorithm through application to real annotation data and conclude with future directions.
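As a rough illustration of the frequency-based variant weighting mentioned in the abstract above, the sketch below weights each variant by a Beta density evaluated at its minor allele frequency and collapses genotypes into a weighted burden score. The function names and the Beta(1, 25) shape parameters are assumptions chosen for illustration (they mirror a common default in SKAT-style software); this is not the thesis's inverse-probability-weighted SKAT-O.

```python
import math


def beta_density_weight(maf, a=1.0, b=25.0):
    """Beta(a, b) density at a minor allele frequency (assumed weighting scheme)."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * maf ** (a - 1) * (1 - maf) ** (b - 1)


def weighted_burden_scores(genotypes, mafs):
    """Collapse each individual's minor-allele counts into one weighted score.

    genotypes: one row per individual, one minor-allele count (0, 1, 2) per variant
    mafs: minor allele frequency of each variant, in the same column order
    """
    weights = [beta_density_weight(p) for p in mafs]
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]
```

With these shape parameters the weight decreases as the frequency grows, so rarer variants contribute more to the score.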
Biostatistics
5

Shringarpure, Suyash. "Statistical Methods for studying Genetic Variation in Populations". Research Showcase @ CMU, 2012. http://repository.cmu.edu/dissertations/117.

Abstract:
The study of genetic variation in populations is of great interest for the study of the evolutionary history of humans and other species. Improvement in sequencing technology has resulted in the availability of many large datasets of genetic data. Computational methods have therefore become quite important in analyzing these data. Two important problems that have been studied using genetic data are population stratification (modeling individual ancestry with respect to ancestral populations) and genetic association (finding genetic polymorphisms that affect a trait). In this thesis, we develop methods to improve our understanding of these two problems. For the population stratification problem, we develop hierarchical Bayesian models that incorporate the evolutionary processes that are known to affect genetic variation. By developing mStruct, we show that modeling more evolutionary processes improves the accuracy of the recovered population structure. We demonstrate how nonparametric Bayesian processes can be used to address the question of choosing the optimal number of ancestral populations that describe the genetic diversity of a given sample of individuals. We also examine how sampling bias in genotyping study design can affect results of population structure analysis and propose a probabilistic framework for modeling and correcting sample selection bias. Genome-wide association studies (GWAS) have vastly improved our understanding of many diseases. However, such studies have failed to uncover much of the variation responsible for a number of common multi-factorial diseases and complex traits. We show how artificial selection experiments on model organisms can be used to better understand the nature of genetic associations. We demonstrate using simulations that using data from artificial selection experiments improves the performance of conventional methods of performing association. 
We also validate our approach using semi-simulated data from an artificial selection experiment on Drosophila melanogaster.
6

Yu, Xiaoqing. "Statistical Methods and Analyses for Next-generation Sequencing Data". Case Western Reserve University School of Graduate Studies / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=case1403708200.

7

Cordell, Heather Jane. "Statistical methods in the genetic analysis of type 1 diabetes". Thesis, University of Oxford, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.296834.

8

Lee, Yiu-fai, and 李耀暉. "Analysis for segmental sharing and linkage disequilibrium: a genomewide association study on myopia". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2009. http://hub.hku.hk/bib/B43912217.

9

Allchin, Lorraine Doreen May. "Statistical methods for mapping complex traits". Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:65f392ba-1b64-4b00-8871-7cee98809ce1.

Abstract:
The first section of this thesis addresses the problem of simultaneously identifying multiple loci that are associated with a trait, using a Bayesian Markov Chain Monte Carlo method. It is applicable to both case/control and quantitative data. I present simulations comparing the methods to standard frequentist methods in human case/control and mouse QTL datasets, and show that in the case/control simulations the standard frequentist method outperforms my model for all but the highest effect simulations and that for the mouse QTL simulations my method performs as well as the frequentist method in some cases and worse in others. I also present analysis of real data and simulations applying my method to a simulated epistasis data set. The next section was inspired by the challenges involved in applying a Markov Chain Monte Carlo method to genetic data. It is an investigation into the performance and benefits of the Matlab parallel computing toolbox, specifically its implementation of the CUDA programming language within Matlab's higher-level language. CUDA is a language which allows calculations to be carried out on the computer's graphics processing unit (GPU) rather than its central processing unit (CPU). The appeal of this toolbox is its ease of use, as few code adaptations are needed. The final project of this thesis was to develop an HMM for reconstructing the founders of sparsely sequenced inbred populations. The motivation here is that, whilst sequencing costs are rapidly decreasing, it is still prohibitively expensive to fully sequence a large number of individuals. It was proposed that, for populations descended from a known number of founders, it would be possible to sequence these individuals with a very low coverage, use a hidden Markov model (HMM) to represent the chromosomes as mosaics of the founders, then use these states to impute the missing data. 
For this I developed a Viterbi algorithm with a transition probability matrix based on recombination rate which changes for each observed state.
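The Viterbi decoding step described in the abstract above can be sketched generically as follows. This is a minimal log-space version that assumes a single fixed transition table, whereas the thesis describes a transition matrix that varies with recombination rate; the function name, argument names, and table layout are all hypothetical.

```python
def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most probable hidden-state path for an observation sequence (log-space).

    log_start[s], log_trans[r][s] and log_emit[s][o] are log-probabilities.
    Assumes one fixed transition table for every position.
    """
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + log_trans[r][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][o]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state to recover the whole path
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

In a founder-mosaic setting, the states would stand for founders and the observations for alleles read along a chromosome; the decoded path then gives the founder assigned to each position.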
10

Vaez, Torshizi Rasoul. "Quantitative genetic analyses of production and reproduction traits in Australian merino sheep". Thesis, The University of Sydney, 1996. https://hdl.handle.net/2123/27593.

Abstract:
Restricted Maximum Likelihood (REML) procedures based on a derivative-free algorithm using the Simplex method and fitting an animal model were used to estimate variance and covariance components for several performances of productive traits, namely, body weight measured at birth, weaning, and 10, 16 and 22 months of age, greasy fleece average daily gain to 4, 10, 16 and 22 months of age, clean fleece average daily gain to 10, 16 and 22 months of age, and mean fibre diameter measured at 10, 16 and 22 months of age. For these traits, the importance of maternal effects, either additive genetic or environmental, was investigated. The interrelationships among the performances of each trait were studied, and then were used to determine the efficiencies of indirect selection at early ages compared with later ages for improvement of an animal's lifetime production.
11

Zang, Yong, and 臧勇. "Robust tests under genetic model uncertainty in case-control association studies". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2011. http://hub.hku.hk/bib/B46419123.

12

Choy, Yan-tsun, and 蔡恩浚. "Statistical evaluation of mixed DNA stains". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2009. http://hub.hku.hk/bib/B42664287.

13

Guturu, Harendra. "Deciphering human gene regulation using computational and statistical methods". Thesis, Stanford University, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3581147.

Abstract:

It is estimated that at least 10-20% of the mammalian genome is dedicated towards regulating the 1-2% of the genome that codes for proteins. This non-coding, regulatory layer is a necessity for the development of complex organisms, but is poorly understood compared to the genetic code used to translate coding DNA into proteins. In this dissertation, I will discuss methods developed to better understand the gene regulatory layer. I begin, in Chapter 1, with a broad overview of gene regulation, the motivation for studying it, the state of the art in its historical context, and where to look forward.

In Chapter 2, I discuss a computational method developed to detect transcription factor (TF) complexes. The method compares co-occurring motif spacings in conserved versus unconserved regions of the human genome to detect evolutionarily constrained binding sites of rigid transcription factor (TF) complexes. Structural data were integrated to explore overlapping motif arrangements while ensuring physical plausibility of the TF complex. Using this approach, I predicted 422 physically realistic TF complex motifs at 18% false discovery rate (FDR). I found that the set of complexes is enriched in known TF complexes. Additionally, novel complexes were supported by chromatin immunoprecipitation sequencing (ChIP-seq) datasets. Analysis of the structural modeling revealed three cooperativity mechanisms and a tendency of TF pairs to synergize through overlapping binding to the same DNA base pairs in opposite grooves or strands. The TF complexes and associated binding site predictions are made available as a web resource at http://complex.stanford.edu.

Next, in Chapter 3, I discuss how gene enrichment analysis can be applied to genome-wide conserved binding sites to successfully infer regulatory functions for a given TF complex. A genomic screen predicted 732,568 combinatorial binding sites for 422 TF complex motifs. From these predictions, I inferred 2,440 functional roles, which are consistent with known functional roles of TF complexes. In these functional associations, I found interesting themes such as promiscuous partnering of TFs (such as ETS) in the same functional context (T cells). Additionally, functional enrichment identified two novel TF complex motifs associated with spinal cord patterning genes and mammary gland development genes, respectively. Based on these predictions, I discovered novel spinal cord patterning enhancers (5/9, 56% validation rate) and enhancers active in MCF7 cells (11/19, 53% validation rate). This set replete with thousands of additional predictions will serve as a powerful guide for future studies of regulatory patterns and their functional roles.

Then, in Chapter 4, I outline a method developed to predict disease susceptibility due to gene mis-regulation. The method interrogates ensembles of conserved binding sites of regulatory factors disrupted by an individual's variants and then looks for their most significant congregation next to a group of functionally related genes. Strikingly, when the method is applied to five different full human genomes, the top enriched function for each is reflective of their very different medical histories. These results suggest that erosion of gene regulation results in function specific mutation loads that manifest as disease predispositions in a familial lineage. Additionally, this aggregate analysis method addresses the problem that although many human diseases have a genetic component involving many loci, the majority of studies are statistically underpowered to isolate the many contributing loci.

Finally, I conclude in Chapter 5 with a summary of my findings throughout my research and future directions of research based on my findings.

14

Hu, Xianghong. "Statistical methods for Mendelian randomization using GWAS summary data". HKBU Institutional Repository, 2019. https://repository.hkbu.edu.hk/etd_oa/639.

Abstract:
Mendelian Randomization (MR) is a powerful tool for assessing the causality of an exposure on an outcome using genetic variants as the instrumental variables. Many of the recent developments are propelled by the increasing availability of GWAS summary data. However, the accuracy of the MR causal effect estimates can be compromised when the MR assumptions are violated. The sources of bias include the weak effects arising from polygenicity, the presence of horizontal pleiotropy and other biases, e.g., selection bias. In this thesis, we proposed two pieces of work intended to deal with these issues. In the first part, we proposed a method named 'Bayesian Weighted Mendelian Randomization (BWMR)' for causal inference using summary statistics from GWAS. In BWMR, we not only take into account the uncertainty of weak effects owing to the polygenicity of the human genome but also model the weak horizontal pleiotropic effects. Moreover, BWMR adopts a Bayesian reweighting strategy for detection of large pleiotropic outliers. An efficient algorithm based on variational inference was developed to make BWMR computationally efficient and stable. Considering the underestimated variance provided by variational inference, we further derived a closed-form variance estimator inspired by a linear response method. We conducted several simulations to evaluate the performance of BWMR, demonstrating the advantage of BWMR over other methods. Then, we applied BWMR to assess causality between 126 metabolites and 90 complex traits, revealing novel causal relationships. In the second part, we further developed BWMR-C: Statistical correction of selection bias for Mendelian Randomization based on a Bayesian weighted method. Based on the framework of BWMR, the probability model in BWMR-C is built conditional on the IV selection criteria. 
In this way, BWMR-C is dedicated to reducing the influence of the selection process on the causal effect estimates while preserving the good properties of BWMR. To make the causal inference computationally stable and efficient, we developed a variational EM algorithm. We conducted several comprehensive simulations to evaluate the performance of BWMR-C for correction of selection bias. Then, we applied BWMR-C to seven body fat distribution related traits and 140 UK Biobank traits. Our results show that BWMR-C achieves satisfactory performance for correcting selection bias. Keywords: Mendelian Randomization, polygenicity, horizontal pleiotropy, selection bias, variational inference.
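For orientation, the kind of summary-data estimate that methods like the one in the abstract above build on can be sketched with the standard fixed-effect inverse-variance weighted (IVW) estimator. This is a baseline, not the Bayesian weighted method of the thesis; the function and argument names are hypothetical, and the sketch assumes independent variants and no pleiotropy.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted causal effect estimate.

    beta_exposure: per-variant effect estimates on the exposure
    beta_outcome:  per-variant effect estimates on the outcome
    se_outcome:    standard errors of the outcome effect estimates
    Returns (estimate, standard_error).
    """
    rows = list(zip(beta_exposure, beta_outcome, se_outcome))
    numerator = sum(bx * by / se ** 2 for bx, by, se in rows)
    denominator = sum(bx ** 2 / se ** 2 for bx, by, se in rows)
    return numerator / denominator, (1.0 / denominator) ** 0.5
```

Each variant contributes the ratio of its outcome effect to its exposure effect, weighted by how precisely that ratio is measured.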
15

Ciampa, Julia Grant. "Multilocus approaches to the detection of disease susceptibility regions : methods and applications". Thesis, University of Oxford, 2012. http://ora.ox.ac.uk/objects/uuid:8f82a624-7d80-438c-af3e-68ce983ff45f.

Abstract:
This thesis focuses on multilocus methods designed to detect single nucleotide polymorphisms (SNPs) that are associated with disease using case-control data. I study multilocus methods that allow for interaction in the regression model because epistasis is thought to be pervasive in the etiology of common human diseases. In contrast, the single-SNP models widely used in genome wide association studies (GWAS) are thought to oversimplify the underlying biology. I consider both pairwise interactions between individual SNPs and modular interactions between sets of biologically similar SNPs. Modular epistasis may be more representative of disease processes and its incorporation into regression analyses yields more parsimonious models. My methodological work focuses on strategies to increase power to detect susceptibility SNPs in the presence of genetic interaction. I emphasize the effect of gene-gene independence constraints and explore methods to relax them. I review several existing methods for interaction analyses and present their first empirical evaluation in a GWAS setting. I introduce the innovative retrospective Tukey score test (RTS) that investigates modular epistasis. Simulation studies suggest it offers a more powerful alternative to existing methods. I present diverse applications of these methods, using data from a multi-stage GWAS on prostate cancer (PRCA). My applied work is designed to generate hypotheses about the functionality of established susceptibility regions for PRCA by identifying SNPs that affect disease risk through interactions with them. Comparison of results across methods illustrates the impact of incorporating different forms of epistasis on inference about disease association. The top findings from these analyses are well supported by molecular studies. The results unite several susceptibility regions through overlapping biological pathways known to be disrupted in PRCA, motivating replication study.
16

Csilléry, Katalin. "Statistical inference in population genetics using microsatellites". Thesis, University of Edinburgh, 2009. http://hdl.handle.net/1842/3865.

Abstract:
Statistical inference from molecular population genetic data is currently a very active area of research for two main reasons. First, in the past two decades an enormous amount of molecular genetic data have been produced and the amount of data is expected to grow even more in the future. Second, drawing inferences about complex population genetics problems, for example understanding the demographic and genetic factors that shaped modern populations, poses a serious statistical challenge. Amongst the many different kinds of genetic data that have appeared in the past two decades, the highly polymorphic microsatellites have played an important role. Microsatellites revolutionized the population genetics of natural populations, and were the initial tool for linkage mapping in humans and other model organisms. Despite their important role, and extensive use, the evolutionary dynamics of microsatellites are still not fully understood, and their statistical methods are often underdeveloped and do not adequately model microsatellite evolution. In this thesis, I address some aspects of this problem by assessing the performance of existing statistical tools, and developing some new ones. My work encompasses a range of statistical methods from simple hypothesis testing to more recent, complex computational statistical tools. This thesis consists of four main topics. First, I review the statistical methods that have been developed for microsatellites in population genetics applications. I review the different models of the microsatellite mutation process, and ask which models are the most supported by data, and how models were incorporated into statistical methods. I also present estimates of mutation parameters for several species based on published data. Second, I evaluate the performance of estimators of genetic relatedness using real data from five vertebrate populations. 
I demonstrate that the overall performance of marker-based pairwise relatedness estimators mainly depends on the population relatedness composition and may only be improved by the marker data quality within the limits of the population relatedness composition. Third, I investigate the different null hypotheses that may be used to test for independence between loci. Using simulations I show that testing for statistical independence (i.e. zero linkage disequilibrium, LD) is difficult to interpret in most cases, and instead a null hypothesis should be tested, which accounts for the “background LD” due to finite population size. I investigate the utility of a novel approximate testing procedure to circumvent this problem, and illustrate its use on a real data set from red deer. Fourth, I explore the utility of Approximate Bayesian Computation, inference based on summary statistics, to estimate demographic parameters from admixed populations. Assuming a simple demographic model, I show that the choice of summary statistics greatly influences the quality of the estimation, and that different parameters are better estimated with different summary statistics. Most importantly, I show how the estimation of most admixture parameters can be considerably improved via the use of linkage disequilibrium statistics from microsatellite data.
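The linkage disequilibrium statistics referred to in the abstract above can be illustrated with the textbook two-locus quantities D and r². The sketch assumes two biallelic loci for simplicity, whereas microsatellites are typically multi-allelic; the function and argument names are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared
```

D is zero when the haplotype frequency equals the product of the allele frequencies, which is the statistical independence case that the abstract contrasts with background LD in finite populations.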
17

Su, Zhan. "Statistical methods for the analysis of genetic association studies". Thesis, University of Oxford, 2008. http://ora.ox.ac.uk/objects/uuid:98614f8b-63fe-4fa1-9a24-422216ad14cf.

Abstract:
One of the main biological goals of recent years is to determine the genes in the human genome that cause disease. Recent technological advances have realised genome-wide association studies, which have uncovered numerous genetic regions implicated in human diseases. The current approach to analysing data from these studies is based on testing association at single SNPs but this is widely accepted as underpowered to detect rare and poorly tagged variants. In this thesis we propose several novel approaches to analysing large-scale association data, which aim to improve upon the power offered by traditional approaches. We combine an established imputation framework with a sophisticated disease model that allows for multiple disease causing mutations at a single locus. To evaluate our methods, we have developed a fast and realistic method to simulate association data conditional on population genetic data. The simulation results show that our methods remain powerful even if the causal variant is not well tagged, there are haplotypic effects or there is allelic heterogeneity. Our methods are further validated by the analysis of the recent WTCCC genome-wide association data, where we have detected confirmed disease loci, known regions of allelic heterogeneity and new signals of association. One of our methods also has the facility to identify the high risk haplotype backgrounds that harbour the disease alleles, and therefore can be used for fine-mapping. We believe that the incorporation of our methods into future association studies will help progress the understanding of genetic diseases.
18

Ahiska, Bartu. "Reference-free identification of genetic variation in metagenomic sequence data using a probabilistic model". Thesis, University of Oxford, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.561121.

Abstract:
Microorganisms are an indispensable part of our ecosystem, yet the natural metabolic and ecological diversity of these organisms is poorly understood due to a historical reliance of microbiology on laboratory grown cultures. The awareness that this diversity cannot be studied by laboratory isolation, together with recent advances in low cost scalable sequencing technology, have enabled the foundation of culture-independent microbiology, or metagenomics. The study of environmental microbial samples with metagenomics has led to many advances, but a number of technological and methodological challenges still remain. A potentially diverse set of taxa may be represented in any one environmental sample. Existing tools for representing the genetic composition of such samples sequenced with short-read data, and tools for identifying variation amongst them, are still in their infancy. This thesis makes the case that a new framework based on a joint-genome graph can constitute a powerful tool for representing and manipulating the joint genomes of population samples. I present the development of a collection of methods, called SCRAPS, to construct these efficient graphs in small communities without the availability or bias of a reference genome. A key novelty is that genetic variation is identified from the data structure using a probabilistic algorithm that can provide a measure of the confidence in each call. SCRAPS is first tested on simulated short read data for accuracy and efficiency. At least 95% of non-repetitive small-scale genetic variation with a minor allele read depth greater than 10x is correctly identified; the number of false positives per conserved nucleotide is consistently better than 1 part in 333 x 10^3. SCRAPS is then applied to artificially pooled experimental datasets. As part of this study, SCRAPS is used to identify genetic variation in an epidemiological 11-sample Neisseria meningitidis dataset collected from the African meningitis belt. 
In total 14,000 sites of genetic variation are identified from 48 million Illumina/Solexa reads. The results clearly show the genetic differences between two waves of infection that have plagued northern Ghana and Burkina Faso.
19

Silver, Matthew. "Statistical methods in neuroimaging genetics : pathways sparse regression and cluster size inference". Thesis, Imperial College London, 2013. http://hdl.handle.net/10044/1/11124.

Abstract:
In the field of neuroimaging genetics, brain images are used as phenotypes in the search for genetic variants associated with brain structure or function. This search presents a formidable statistical challenge, not least because of the very high dimensionality of genotype and phenotype data produced by modern SNP (single nucleotide polymorphism) arrays and high resolution MRI. This thesis focuses on the use of multivariate sparse regression models such as the group lasso and sparse group lasso for the identification of gene pathways associated with both univariate and multivariate quantitative traits. The methods described here take particular account of various factors specific to pathways genome-wide association studies including widespread correlation (linkage disequilibrium) between genetic predictors, and the fact that many variants overlap multiple pathways. A resampling strategy that exploits finite sample variability is employed to provide robust rankings for pathways, SNPs and genes. Comprehensive simulation studies are presented comparing one proposed method, pathways group lasso with adaptive weights, to a popular alternative. This method is extended to the case of a multivariate phenotype, and the resulting pathways sparse reduced-rank regression model and algorithm is applied to a study identifying gene pathways associated with structural change in the brain characteristic of Alzheimer’s disease. The original model is also adapted for the task of ’pathways-driven’ SNP and gene selection, and this latter model, pathways sparse group lasso with adaptive weights, is applied in a search for SNPs and genes associated with elevated lipid levels in two separate cohorts of Asian adults. Finally, in a separate section an existing method for the identification of spatially extended clusters of image voxels with heightened activation is evaluated in an imaging genetic context. This method, known as cluster size inference, rests on a number of assumptions. 
Using real imaging and SNP data, false positive rates are found to be poorly controlled outside of a narrow range of parameters related to image smoothness and activation thresholds for cluster formation.
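The group lasso mentioned in the abstract above selects whole groups of predictors at once. One way to see why is its proximal step, block soft-thresholding, sketched below. The sketch assumes non-overlapping groups and a single threshold for all groups, so it covers neither the overlapping pathways nor the adaptive weights described in the abstract; all names are hypothetical.

```python
import math


def group_soft_threshold(coefs, groups, lam):
    """Block soft-thresholding: shrink each group's coefficient vector toward zero.

    coefs:  list of coefficients
    groups: list of index lists, assumed not to overlap
    lam:    threshold; a group whose Euclidean norm is at most lam is zeroed out
    """
    result = list(coefs)
    for idx in groups:
        norm = math.sqrt(sum(coefs[i] ** 2 for i in idx))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            result[i] = scale * coefs[i]
    return result
```

A group with a small joint norm is removed entirely, while a group with a large norm is kept and shrunk, which is what produces selection at the pathway level rather than at the level of individual SNPs.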
20

Kecskemetry, Peter D. "Computationally intensive methods for hidden Markov models with applications to statistical genetics". Thesis, University of Oxford, 2014. https://ora.ox.ac.uk/objects/uuid:8dd5d68d-27e9-4412-868c-0477e438a2c5.

Abstract:
In most fields of technology and science, the exponential increase of available data is an apparent trend. In genetics, the main contributor to this trend is the improving efficiency of sequencing technologies. While the Human Genome Project focused on assembling a single reference sequence not long ago, upcoming projects now aim to sequence a million genomes. The consequent computational challenge is being able to utilise this wealth of data, which requires the development of sufficiently powerful methods for analysis. However, the speed of transistor-based computing processors has recently hit a power ceiling and developers can no longer rely on hardware improvements automatically providing performance improvements in software directly. The result is that analysis methods are failing to keep up with the speed of data generation, and in this age of exponential data explosion it is becoming critical to find any solution for improving the performance of statistical methods. One traditional approach is to apply approximations - often trading the quality of results for response time. Another approach is to achieve algorithmic optimisations for existing methods without sacrificing results. Unfortunately, the possibilities for purely algorithmic optimisations often tend to be limited. A third approach is to attempt to harness the computational power of the presently re-emerging field of parallel computing. While the theoretical performance of parallel platforms roughly follows Moore's law, exploiting the power of parallelism requires significant effort during development and may not even be possible in certain applications. This work attempts to explore avenues for achieving high performance for Hidden Markov Models (HMMs) and HMM applications in population genetics. The second chapter of this thesis introduces a single-locus variant of the IMPUTE2 method for calling and phasing genotype variants based on genotype likelihood data.
This method uses both approximations and algorithmic optimisations and achieves performance improvements without a considerable drop in accuracy. It is also designed to be highly parallelisable. The third chapter presents GPGPU-focused parallelisation methods over the statespace for HMM algorithms specifically under the Li and Stephens model, which is a widely and successfully used approximation of the coalescent. Practical experiments show 200× to 6000× acceleration with a CUDA implementation of the popular Chromopainter method, which is based on the Li and Stephens model. The last chapter explores the theoretical possibility of parallelising HMM algorithms across blocks of observations (inspired by but not limited to methods used in genetics). A novel view and derivation is presented for block parallelism, along with accompanying analyses of applicability and relevance. Performance analysis results indicate that the application of block-parallelism is expected to be highly relevant for most large-scale HMM applications on present-day computing platforms, while block-parallelism may become a necessity for utilising the improving power of parallel hardware in the near future.
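For orientation, the computation that such parallelisation targets is the HMM forward recursion. The sketch below is the plain serial version with per-step rescaling, written for a generic discrete HMM. It is not the thesis's GPU or block-parallel algorithm, and the function and parameter names are illustrative assumptions.

```python
import math


def forward_log_likelihood(init, trans, emit, obs):
    """Log-likelihood of an observation sequence under a discrete HMM.

    init  -- init[s] is the probability of starting in state s
    trans -- trans[r][s] is the probability of moving from state r to s
    emit  -- emit[s][o] is the probability of emitting symbol o in state s
    obs   -- non-empty list of observed symbol indices
    """
    n_states = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Sum over all predecessor states: this inner sum over the state
            # space is the part that state-space parallelism would distribute.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        # Rescale to avoid numerical underflow and accumulate the log of the scale.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik
```

Each time step costs on the order of the squared number of states, so long sequences with large state spaces are where acceleration matters most.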
21

Shen, Xia. "Novel Statistical Methods in Quantitative Genetics : Modeling Genetic Variance for Quantitative Trait Loci Mapping and Genomic Evaluation". Doctoral thesis, Uppsala universitet, Beräknings- och systembiologi, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-170091.

Abstract:
This thesis develops and evaluates statistical methods for different types of genetic analyses, including quantitative trait loci (QTL) analysis, genome-wide association study (GWAS), and genomic evaluation. The main contribution of the thesis is to provide novel insights in modeling genetic variance, especially via random effects models. In variance component QTL analysis, a full likelihood model accounting for uncertainty in the identity-by-descent (IBD) matrix was developed. It was found to be able to correctly adjust the bias in genetic variance component estimation and gain power in QTL mapping in terms of precision.  Double hierarchical generalized linear models, and a non-iterative simplified version, were implemented and applied to fit data of an entire genome. These whole genome models were shown to have good performance in both QTL mapping and genomic prediction. A re-analysis of a publicly available GWAS data set identified significant loci in Arabidopsis that control phenotypic variance instead of mean, which validated the idea of variance-controlling genes.  The works in the thesis are accompanied by R packages available online, including a general statistical tool for fitting random effects models (hglm), an efficient generalized ridge regression for high-dimensional data (bigRR), a double-layer mixed model for genomic data analysis (iQTL), a stochastic IBD matrix calculator (MCIBD), a computational interface for QTL mapping (qtl.outbred), and a GWAS analysis tool for mapping variance-controlling loci (vGWAS).
22

Hu, Yueqing, and 胡躍清. "Some topics in the statistical analysis of forensic DNA and genetic family data". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2007. http://hub.hku.hk/bib/B38831491.

23

Heinig, Matthias Alexander [Verfasser]. "Statistical methods for the analysis of the genetics of gene expression / Matthias Alexander Heinig". Berlin : Freie Universität Berlin, 2011. http://d-nb.info/1025305442/34.

24

Ai, Ni, and 艾妮. "A novel framework for expression quantitative trait loci mapping". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2011. http://hub.hku.hk/bib/B4715214X.

25

O'Connell, Jared Michael. "Statistical methods for genotype microarray data on large cohorts of individuals". Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:4e3328cf-0d8e-4587-b24d-9b59fa220f32.

Abstract:
Genotype microarrays assay hundreds of thousands of genetic variants on an individual's genome. The availability of this high throughput genotyping capability has transformed the field of genetics over the past decade by enabling thousands of individuals to be rapidly assayed. This has led to the discovery of hundreds of genetic variants that are associated with disease and other phenotypes in genome wide association studies (GWAS). These data have also brought with them a number of new statistical and computational challenges. This thesis deals with two primary analysis problems involving microarray data: genotype calling and haplotype inference. Genotype calling involves converting the noisy bivariate fluorescent signals generated by microarray data into genotype values for each genetic variant and individual. Poor quality genotype calling can lead to false positives and loss of power in GWAS so this is an important task. We introduce a new genotype calling method that is highly accurate and has the novel capability of fusing microarray data with next-generation sequencing data for greater accuracy and fewer missing values. Our new method compares favourably to other available genotype calling software. Haplotype inference (or phasing) involves deconvolving these genotypes into the two inherited parental chromosomes for an individual. The development of phasing methods has been a fertile field for statistical genetics research for well over ten years. Depending on the demography of a cohort, different phasing methods may be more appropriate than others. We review the popular offerings and introduce a new approach to try to unify two distinct problems: the phasing of extended pedigrees and the phasing of unrelated individuals. We conduct an extensive comparison of phasing methods on real and simulated data. Finally we demonstrate some preliminary results on extending methodology to sample sizes in the tens of thousands.
26

Cresswell, Kellen Garrison. "Spectral methods for the detection and characterization of Topologically Associated Domains". VCU Scholars Compass, 2019. https://scholarscompass.vcu.edu/etd/6100.

Abstract:
The three-dimensional (3D) structure of the genome plays a crucial role in gene expression regulation. Chromatin conformation capture technologies (Hi-C) have revealed that the genome is organized in a hierarchy of topologically associated domains (TADs), sub-TADs, and chromatin loops which is relatively stable across cell-lines and even across species. These TADs dynamically reorganize during development of disease, and exhibit cell- and condition-specific differences. Identifying such hierarchical structures and how they change between conditions is a critical step in understanding genome regulation and disease development. Despite their importance, there are relatively few tools for identification of TADs and even fewer for identification of hierarchies. Additionally, there are no publicly available tools for comparison of TADs across datasets. These tools are necessary to conduct large-scale genome-wide analysis and comparison of 3D structure. To address the challenge of TAD identification, we developed a novel sliding window-based spectral clustering framework that uses gaps between consecutive eigenvectors for TAD boundary identification. Our method, implemented in an R package, SpectralTAD, has automatic parameter selection, is robust to sequencing depth, resolution and sparsity of Hi-C data, and detects hierarchical, biologically relevant TADs. SpectralTAD outperforms four state-of-the-art TAD callers in simulated and experimental settings. We demonstrate that TAD boundaries shared among multiple levels of the TAD hierarchy were more enriched in classical boundary marks and more conserved across cell lines and tissues. SpectralTAD is available at http://bioconductor.org/packages/SpectralTAD/. To address the problem of TAD comparison, we developed TADCompare. TADCompare is based on a spectral clustering-derived measure called the eigenvector gap, which enables a locus-by-locus comparison of TAD boundary differences between datasets.
Using this measure, we introduce methods for identifying differential and consensus TAD boundaries and tracking TAD boundary changes over time. We further propose a novel framework for the systematic classification of TAD boundary changes. Colocalization- and gene enrichment analysis of different types of TAD boundary changes revealed distinct biological functionality associated with them. TADCompare is available on https://github.com/dozmorovlab/TADCompare.
27

Minnier, Jessica. "Inference and Prediction for High Dimensional Data via Penalized Regression and Kernel Machine Methods". Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10327.

Abstract:
Analysis of high dimensional data often seeks to identify a subset of important features and assess their effects on the outcome. Furthermore, the ultimate goal is often to build a prediction model with these features that accurately assesses risk for future subjects. Such statistical challenges arise in the study of genetic associations with health outcomes. However, accurate inference and prediction with genetic information remains challenging, in part due to the complexity in the genetic architecture of human health and disease. A valuable approach for improving prediction models with a large number of potential predictors is to build a parsimonious model that includes only important variables. Regularized regression methods are useful, though often pose challenges for inference due to nonstandard limiting distributions or finite sample distributions that are difficult to approximate. In Chapter 1 we propose and theoretically justify a perturbation-resampling method to derive confidence regions and covariance estimates for marker effects estimated from regularized procedures with a general class of objective functions and concave penalties. Our methods outperform their asymptotic-based counterparts, even when effects are estimated as zero. In Chapters 2 and 3 we focus on genetic risk prediction. The difficulty in accurate risk assessment with genetic studies can in part be attributed to several potential obstacles: sparsity in marker effects, a large number of weak signals, and non-linear effects. Single marker analyses often lack power to select informative markers and typically do not account for non-linearity. One approach to gain predictive power and efficiency is to group markers based on biological knowledge such as genetic pathways or gene structure.
In Chapter 2 we propose and theoretically justify a multi-stage method for risk assessment that imposes a naive Bayes kernel machine (KM) model to estimate gene-set specific risk models, and then aggregates information across all gene-sets by adaptively estimating gene-set weights via a regularization procedure. In Chapter 3 we extend these methods to meta-analyses by introducing sampling-based weights in the KM model. This permits building risk prediction models with multiple studies that have heterogeneous sampling schemes.
28

Speed, Douglas Christopher. "Exploring nonlinear regression methods, with application to association studies". Thesis, University of Cambridge, 2011. https://www.repository.cam.ac.uk/handle/1810/241092.

Abstract:
The field of nonlinear regression is a long way from reaching a consensus. Once a method decides to explore nonlinear combinations of predictors, a number of questions are raised, such as what nonlinear combinations to permit and how best to search the resulting model space. Genetic Association Studies comprise an area that stands to gain greatly from the development of more sophisticated regression methods. While these studies' ability to interrogate the genome has advanced rapidly over recent years, it is thought that a lack of suitable regression tools prevents them from achieving their full potential. I have tried to investigate the area of regression in a methodical manner. In Chapter 1, I explain the regression problem and outline existing methods. I observe that both linear and nonlinear methods can be categorised according to the restrictions enforced by their underlying model assumptions and speculate that a method with as few restrictions as possible might prove more powerful. In order to design such a method, I begin by assuming each predictor is tertiary (takes no more than three distinct values). In Chapters 2 and 3, I propose the method Sparse Partitioning. Its name derives from the way it searches for high scoring partitions of the predictor set, where each partition defines groups of predictors that jointly contribute towards the response. A sparsity assumption supposes most predictors belong in the 'null group' indicating they have no effect on the outcome. In Chapter 4, I compare the performance of Sparse Partitioning to existing methods using simulated and real data. The results highlight how greatly a method's power depends on the validity of its model assumptions. For this reason, Sparse Partitioning appears to offer a robust alternative to current methods, as its lack of restrictions allows it to maintain power in scenarios where other methods will fail. 
Sparse Partitioning relies on Markov chain Monte Carlo estimation, which limits the size of problem on which it can be used. Therefore, in Chapter 5, I propose a deterministic version of the method which, although less powerful, is not affected by convergence issues. In Chapter 6, I describe Bayesian Projection Pursuit, which adds spline fitting into the method to cope with non-tertiary predictors.
29

Mayor, Lianne Rosalind. "Statistical methods in molecular and population genetics : clustering of similar genes and investigating relatedness of individuals". Thesis, Imperial College London, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.445322.

30

McCaskie, Pamela Ann. "Multiple-imputation approaches to haplotypic analysis of population-based data with applications to cardiovascular disease". University of Western Australia. School of Population Health, 2008. http://theses.library.uwa.edu.au/adt-WU2008.0160.

Abstract:
[Truncated abstract] This thesis investigates novel methods for the genetic association analysis of haplotype data in samples of unrelated individuals, and applies these methods to the analysis of coronary heart disease and related phenotypes. Determining the inheritance pattern of genetic variants in studies of unrelated individuals can be problematic because family members of the studied individuals are often not available. For the analysis of individual genetic loci, no problem arises because the unit of interest is the observed genotype. When the unit of interest is the linear combination of alleles along one chromosome, inherited together in a haplotype, it is not always possible to determine with certainty the inheritance pattern, and therefore statistical methods to infer these patterns must be adopted. Due to genotypic heterozygosity, multiple possible haplotype configurations can often resolve an individual's genotype measures at multiple loci. When haplotypes are not known, but are inferred statistically, an element of uncertainty is thus inherent which, if not dealt with appropriately, can result in unreliable estimates of effect sizes in an association setting. The core aim of the research described in this thesis was to develop and implement a general method for haplotype-based association analysis using multiple imputation to appropriately deal with uncertainty in haplotype assignment. Regression-based approaches to association analysis provide flexible methods to investigate the influence of a covariate on a response variable, adjusting for the effects of other variables including interaction terms. ... These methods are then applied to models accommodating binary, quantitative, longitudinal and survival data. The performance of the multiple imputation method implemented was assessed using simulated data under a range of haplotypic effect sizes and genetic inheritance patterns.
The multiple imputation approach performed better, on average, than ignoring haplotypic uncertainty, and provided estimates that in most cases were similar to those observed when haplotypes were known. The haplotype association methods developed in this thesis were used to investigate the genetic epidemiology of cardiovascular disease, utilising data for the cholesteryl ester transfer protein gene (CETP), the hepatic lipase (LIPC) gene and the 15-lipoxygenase (ALOX15) gene on a total of 6,487 individuals from three Western Australian studies. Results of these analyses suggested single nucleotide polymorphisms (SNPs) and haplotypes in the CETP gene were associated with increased plasma high-density lipoprotein cholesterol (HDL-C). SNPs in the LIPC gene were also associated with increased HDL-C and haplotypes in the ALOX15 gene were associated with risk of carotid plaque among individuals with premature CHD. The research presented in this thesis is both novel and important as it provides methods for the analysis of haplotypic associations with a range of response types, while incorporating information about haplotype uncertainty inherent in population-based studies. These methods are shown to perform well for a range of simulated and real data situations, and have been written into a statistical analysis package that has been freely released to the research community.
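Multiple imputation propagates haplotype uncertainty by fitting the same model to several imputed data sets and then pooling the results. The abstract does not state which pooling rule the thesis uses, so the sketch below assumes the standard Rubin's rules. The function name and inputs are illustrative.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation effect estimates with Rubin's rules.

    estimates -- effect estimate from each imputed data set (at least two)
    variances -- squared standard error of each of those estimates
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # variance across imputations (divides by m - 1)
    # The (1 + 1/m) factor accounts for using a finite number of imputations.
    total = within + (1 + 1 / m) * between
    return pooled, total
```

The between-imputation term is what inflates the standard error when the inferred haplotypes disagree across imputations. Analysing a single best-guess haplotype assignment would leave that term out.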
31

Haddon, Andrew L. "Evaluation of Some Statistical Methods for the Identification of Differentially Expressed Genes". FIU Digital Commons, 2015. http://digitalcommons.fiu.edu/etd/1913.

Abstract:
Microarray platforms have been around for many years and while there is a rise of new technologies in laboratories, microarrays are still prevalent. When it comes to the analysis of microarray data to identify differentially expressed (DE) genes, many methods have been proposed and modified for improvement. However, the most popular methods such as Significance Analysis of Microarrays (SAM), samroc, fold change, and rank product are far from perfect. Which method is most powerful depends on the characteristics of the sample and the distribution of the gene expressions. SAM or samroc is usually the method of choice, but when the data tend to be skewed, the power of these methods decreases. Because the median is a better measure of central tendency than the mean when the data are skewed, the test statistics of the SAM and fold change methods are modified in this thesis. This study shows that the median modified fold change method improves the power for many cases when identifying DE genes if the data follows a lognormal distribution.
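The idea of swapping the mean for the median can be illustrated with a simple log2 fold change. This is only a sketch of the general idea. The abstract does not give the exact modified statistics, and the function name and `center` parameter are assumptions.

```python
import math
from statistics import mean, median


def log2_fold_change(treated, control, center=mean):
    """Log2 ratio of a central-tendency measure between two groups.

    treated, control -- positive expression values for one gene
    center           -- summary function, e.g. statistics.mean or statistics.median
    """
    return math.log2(center(treated) / center(control))


# One extreme value dominates the mean-based statistic but not the median-based one.
treated = [2.0, 2.0, 2.0, 100.0]
control = [1.0, 1.0, 1.0, 1.0]
mean_fc = log2_fold_change(treated, control, center=mean)
median_fc = log2_fold_change(treated, control, center=median)
```

In this toy example the median-based value reflects the typical two-fold difference, while the mean-based value is driven by the single outlier.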
32

Vukcevic, Damjan. "Bayesian and frequentist methods and analyses of genome-wide association studies". Thesis, University of Oxford, 2009. http://ora.ox.ac.uk/objects/uuid:8f89593e-a4ab-4df0-b297-74194be7891c.

Abstract:
Recent technological advances and remarkable successes have led to genome-wide association studies (GWAS) becoming a tool of choice for investigating the genetic basis of common complex human diseases. These studies typically involve samples from thousands of individuals, scanning their DNA at up to a million loci along the genome to discover genetic variants that affect disease risk. Hundreds of such variants are now known for common diseases, nearly all discovered by GWAS over the last three years. As a result, many new studies are planned for the future or are already underway. In this thesis, I present analysis results from actual studies and some developments in theory and methodology. The Wellcome Trust Case Control Consortium (WTCCC) published one of the first large-scale GWAS in 2007. I describe my contribution to this study and present the results from some of my follow-up analyses. I also present results from a GWAS of a bipolar disorder sub-phenotype, and a recent and on-going fine mapping experiment. Building on methods developed as part of the WTCCC, I describe a Bayesian approach to GWAS analysis and compare it to widely used frequentist approaches. I do so both theoretically, by interpreting each approach from the perspective of the other, and empirically, by comparing their performance in the context of replicated GWAS findings. I discuss the implications of these comparisons on the interpretation and analysis of GWAS generally, highlighting the advantages of the Bayesian approach. Finally, I examine the effect of linkage disequilibrium on the detection and estimation of various types of genetic effects, particularly non-additive effects. I derive a theoretical result showing how the power to detect a departure from an additive model at a marker locus decays faster than the power to detect an association.
33

Yip, Wai-Ki. "Statistical Methods for Analyzing DNA Methylation Data and Subpopulation Analysis of Continuous, Binary and Count Data for Clinical Trials". Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:14226106.

Abstract:
DNA methylation may represent an important contributor to the missing heritability described in complex trait genetics. However, technology to measure DNA methylation has outpaced statistical methods for analysis. Novel methodologies are required to accommodate this growing volume of DNA methylation data. In this dissertation, I propose two novel methods to analyze DNA methylation data: (1) a new statistic based on spatial location information of DNA methylation sites to detect differentially methylated regions in the genome in case and control studies; and (2) a principal component approach for the detection of unknown substructure in DNA methylation data. For each method, I review existing ones and demonstrate the efficacy of my proposed method using simulation and data application. Medical research is increasingly focused on personalizing the care of patients. A better understanding of the interaction between treatment and patient specific prognostic factors will enable practitioners to expand the availability of tailored therapies improving patient outcomes. The Subpopulation Treatment Effect Pattern Plot (STEPP) approach was developed to allow researchers to investigate the heterogeneity of treatment effects on survival outcomes across increasing values of a continuously measured covariate, such as biomarker measurement. I extend the STEPP approach to continuous, binary and count outcomes which can be easily modeled with generalized linear models (GLM). The statistical significance of any observed heterogeneity of treatment effect is assessed using permutation tests. The method is implemented in the R software package (stepp) and is available in R version 3.1.1. The efficacy of my STEPP extension is demonstrated by using simulation and data application.
34

Díaz, Oscar. "Genetic diversity in Elymus species (Triticeae) with emphasis on the Nordic region /". Svalöv : Swedish Univ. of Agricultural Sciences (Sveriges lantbruksuniv.), 1999. http://epsilon.slu.se/avh/1999/91-576-5493-X.pdf.

35

Andersson, Alfred. "Neural networks for imputation of missing genotype data : An alternative to the classical statistical methods in bioinformatics". Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-413635.

Abstract:
In this project, two different machine learning models were tested in an attempt at imputing missing genotype data from patients on two different panels. As the integrity of the patients had to be protected, initial training was done on data simulated from the 1000 Genomes Project. The first model consisted of two convolutional variational autoencoders and the latent representations of the networks were shuffled to force the networks to find the same patterns in the two datasets. This model was unfortunately unsuccessful at imputing the missing data. The second model was based on a UNet structure and was more successful at the task of imputation. This model had one encoder for each dataset, making each encoder specialized at finding patterns in its own data. Further improvements are required in order for the model to be fully capable of imputing the missing data.
36

Coop, Graham M. "The likelihood of gene trees under selective models". Thesis, University of Oxford, 2004. http://ora.ox.ac.uk/objects/uuid:ba97d36c-61c1-40c8-a1f4-e7ddc8918d5b.

Abstract:
The extent to which natural selection shapes diversity within populations is a key question for population genetics. Thus, there is considerable interest in quantifying the strength of selection. In this thesis a full likelihood approach for inference about selection at a single site within an otherwise neutral fully-linked sequence of sites is developed. Integral to many of the ideas introduced in this thesis is the reversibility of the diffusion process, and some past approaches to this concept are reviewed. A coalescent model of evolution is used to model the ancestry of a sample of DNA sequences which have the selected site segregating. A novel method for simulating the coalescent with selection, acting at a single biallelic site, is described. Selection is incorporated through modelling the frequency of the selected and neutral allelic classes stochastically back in time. The ancestry is then simulated using a subdivided population model considering the population frequencies through time as variable population sizes. The approach is general and can be used for any selection scheme at a biallelic locus. The mutation model, for the selected and neutral sites, is the infinitely-many-sites model where there is no back or parallel mutation at sites. This allows a unique perfect phylogeny, a gene tree, to be constructed from the configuration of mutations on the sample sequences. An importance sampling algorithm is described to explore over coalescent tree space consistent with this gene tree. The method is used to assess the evidence for selection in a number of data sets. These are as follows: a partial selective sweep in the G6PD gene (Verrelli et al., 2002); a recent full sweep in the Factor IX gene (Harris and Hey, 2001); and balancing selection in the DCP1 gene (Rieder et al., 1999). Little evidence of the action of selection is found in the data set of Verrelli et al. (2002) and the data set of Rieder et al. (1999) seems inconsistent with the model of balancing selection. The patterns of diversity in the data set of Harris and Hey (2001) offer support of the hypothesis of a full sweep.
37

Cuthbertson, Charles. "Limits to the rate of adaptation". Thesis, University of Oxford, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.670176.

38

Liley, Albert James. "Statistical co-analysis of high-dimensional association studies". Thesis, University of Cambridge, 2017. https://www.repository.cam.ac.uk/handle/1810/270628.

Abstract:
Modern medical practice and science involve complex phenotypic definitions. Understanding patterns of association across this range of phenotypes requires co-analysis of high-dimensional association studies in order to characterise shared and distinct elements. In this thesis I address several problems in this area, with a general linking aim of making more efficient use of available data. The main application of these methods is in the analysis of genome-wide association studies (GWAS) and similar studies. Firstly, I developed methodology for a Bayesian conditional false discovery rate (cFDR) for levering GWAS results using summary statistics from a related disease. I extended an existing method to enable a shared control design, increasing power and applicability, and developed an approximate bound on false-discovery rate (FDR) for the procedure. Using the new method I identified several new variant-disease associations. I then developed a second application of shared control design in the context of study replication, enabling improvement in power at the cost of changing the spectrum of sensitivity to systematic errors in study cohorts. This has application in studies on rare diseases or in between-case analyses. I then developed a method for partially characterising heterogeneity within a disease by modelling the bivariate distribution of case-control and within-case effect sizes. Using an adaptation of a likelihood-ratio test, this allows an assessment to be made of whether disease heterogeneity corresponds to differences in disease pathology. I applied this method to a range of simulated and real datasets, enabling insight into the cause of heterogeneity in autoantibody positivity in type 1 diabetes (T1D). Finally, I investigated the relation of subtypes of juvenile idiopathic arthritis (JIA) to adult diseases, using modified genetic risk scores and linear discriminants in a penalised regression framework. 
The contribution of this thesis is in a range of methodological developments in the analysis of high-dimensional association study comparison. Methods such as these will have wide application in the analysis of GWAS and similar areas, particularly in the development of stratified medicine.
39

Baker, Peter John. "Applied Bayesian modelling in genetics". Thesis, Queensland University of Technology, 2001.

40

Kam-Thong, Tony [Verfasser], and Klaus-Robert [Akademischer Betreuer] Müller. "Massive parallelization of combinatorial statistical genetics analyses porting machine learning methods on general purpose graphics processing units (GPU) / Tony Kam-Thong. Betreuer: Klaus Robert Müller". Berlin : Universitätsbibliothek der Technischen Universität Berlin, 2012. http://d-nb.info/102553879X/34.

41

Magosi, Lerato Elaine. "Dissecting heterogeneity in GWAS meta-analysis". Thesis, University of Oxford, 2017. https://ora.ox.ac.uk/objects/uuid:c853f7e7-93de-440c-b57c-fcfc03d3bb86.

Abstract:
Statistical heterogeneity refers to differences among results of studies combined in a meta-analysis beyond that expected by chance. On the one hand, excessive heterogeneity can diminish power to discover genetic signals; on the other, moderate heterogeneity can reveal important biological differences among studies. Given its double-edged nature, this thesis dissects heterogeneity in genetic association meta-analyses from three vantage points. First, a novel multi-variant statistic, M, is proposed to detect genome-wide (systematic) heterogeneity patterns in genetic association meta-analyses. This was motivated by the limited availability of appropriate methodology to measure the impact of heterogeneity across genetic signals, since traditional metrics (Q, I² and τ²) measure heterogeneity at individual variants. Second, given that meta-analyses comprising small numbers of studies typically report imprecise summary effect estimates, GWAS-derived empirical heterogeneity priors are used to improve precision in estimation of average genetic effects and heterogeneity in smaller meta-analyses (e.g. ≤ 10 studies). Third, a critical evaluation of the Han-Eskin random-effects model shows how it can identify small effect heterogeneous loci overlooked by traditional fixed and random-effects methods. This work draws attention to the existence of genome-wide heterogeneity patterns, to reveal systematic differences among the ascertainment criteria of participating studies in a meta-analysis of coronary artery disease (CAD) risk. Furthermore, simulation studies with the Han-Eskin random-effects model revealed inflated genetic signals at small effect loci when heterogeneity levels were high. However, it did reveal an additional CAD risk variant overlooked by traditional meta-analysis methods.
We therefore recommend a holistic approach to exploring heterogeneity in meta-analyses which assesses heterogeneity of genetic effects both at individual variants with traditional statistics and across multiple genetic signals with the M statistic. Furthermore, it is critically important to review forest plots for small effect loci identified using the Han-Eskin random-effects model amidst moderate-to-high heterogeneity (I² ≥ 40%).
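The traditional single-variant heterogeneity metrics named in this abstract can be sketched in a few lines. The example below computes Cochran's Q, the I-squared percentage, and the DerSimonian-Laird between-study variance estimate from per-study effect sizes and standard errors, using the standard inverse-variance formulas. It does not implement the thesis's M statistic; the function name `heterogeneity` and the input values are assumptions for illustration only.

```python
from typing import Sequence, Tuple


def heterogeneity(betas: Sequence[float],
                  ses: Sequence[float]) -> Tuple[float, float, float]:
    """Return (Q, I-squared in percent, DerSimonian-Laird tau-squared)."""
    weights = [1.0 / se ** 2 for se in ses]          # inverse-variance weights
    sum_w = sum(weights)
    # Fixed-effect (inverse-variance weighted) pooled estimate.
    beta_fixed = sum(w * b for w, b in zip(weights, betas)) / sum_w
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta_fixed) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I-squared: share of variation beyond chance, truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # DerSimonian-Laird moment estimator of between-study variance.
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return q, i2, tau2


# Two hypothetical studies with equal standard errors.
q_stat, i2_pct, tau2_est = heterogeneity([0.1, 0.3], [0.1, 0.1])
```

With these toy inputs the I-squared value comes out at about 50%, which would fall above the 40% threshold the abstract flags for closer review.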
42

Pook, Torsten [Verfasser], Henner [Akademischer Betreuer] Simianer, Henner [Gutachter] Simianer, Timothy Mathes [Gutachter] Beissinger and Hans-Peter [Gutachter] Piepho. "Methods and software to enhance statistical analysis in large scale problems in breeding and quantitative genetics / Torsten Pook ; Gutachter: Henner Simianer, Timothy Mathes Beissinger, Hans-Peter Piepho ; Betreuer: Henner Simianer". Göttingen : Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2019. http://d-nb.info/1199608254/34.

43

Shar, Nisar Ahmed. "Statistical methods for predicting genetic regulation". Thesis, University of Leeds, 2016. http://etheses.whiterose.ac.uk/16729/.

Abstract:
Transcriptional regulation of gene expression is essential for cellular differentiation and function, and defects in the process are associated with cancer. Transcription is regulated by cis-acting regulatory regions and trans-acting regulatory elements. Transcription factors (TFs) bind to enhancers and repressors and form complexes by interacting with each other to control the expression of genes. Understanding the regulation of genes would help us to understand the biological system and can be helpful in identifying therapeutic targets for diseases such as cancer. The ENCODE project has mapped binding sites of many TFs in some important cell types, and has also mapped DNase I hypersensitivity sites across cell types. Predicting mutual interactions between transcription factors would help us to find potential transcription regulatory networks. Here, we have developed two methods for predicting mutual interactions between transcription factors from ENCODE ChIP-seq data; both methods generated similar results, which supports the accuracy of the methods. It is known that functional regions of the genome are conserved, and here we found that shared/overlapping transcription factor binding sites in multiple cell types and in transcription factor pairs are more conserved than their respective non-shared/non-overlapping binding sites. It has also been reported that co-binding sites influence the expression level of genes. Most of the genes mapped to transcription factor co-binding sites have a significantly higher level of expression than genes mapped to sites bound by a single transcription factor. The ENCODE data suggest a very large number of potential regulatory sites across the complete genome in many cell types, and methods are needed to identify those that are most relevant and to connect them to the genes that they control.
A penalized regression method, LASSO, was used to build correlative models, choose two regulatory regions that are predictive of gene expression, and link them to their respective gene. Here, we show that our identified regulatory regions accumulate a significant number of somatic mutations that occur in cancer cells, suggesting that their effects may drive cancer initiation and development. The presence of somatic mutations in these identified regulatory regions is an indication of positive selection, which has also been observed in cancer-related genes.
44

Rivas, Cruz Manuel A. "Medical relevance and functional consequences of protein truncating variants". Thesis, University of Oxford, 2015. http://ora.ox.ac.uk/objects/uuid:a042ca18-7b35-4a62-aef0-e3ba2e8795f7.

Abstract:
Genome-wide association studies have greatly improved our understanding of the contribution of common variants to the genetic architecture of complex traits. However, two major limitations have been highlighted. First, common variant associations typically do not identify the causal variant and/or the gene that it is exerting its effect on to influence a trait. Second, common variant associations usually consist of variants with small effects. As a consequence, it is more challenging to harness their translational impact. Association studies of rare variants and complex traits may be able to help address these limitations. Empirical population genetic data shows that deleterious variants are rare. More specifically, there is a very strong depletion of common protein truncating variants (PTVs, commonly referred to as loss-of-function variants) in the genome, a group of variants that have been shown to have large effect on gene function, are enriched for severe disease-causing mutations, but in other instances may actually be protective against disease. This thesis is divided into three parts dedicated to the study of protein truncating variants, their medical relevance, and their functional consequences. First, I present statistical, bioinformatic, and computational methods developed for the study of protein truncating variants and their association to complex traits, and their functional consequences. Second, I present application of the methods to a number of case-control and quantitative trait studies discovering new variants and genes associated to breast and ovarian cancer, type 1 diabetes, lipids, and metabolic traits measured with NMR spectroscopy. Third, I present work on improving annotation of protein truncating variants by studying their functional consequences. Taken together, these results highlight the utility of interrogating protein truncating variants in medical and functional genomic studies.
45

Cheng, Lulu. "Statistical Methods for Genetic Pathway-Based Data Analysis". Diss., Virginia Tech, 2013. http://hdl.handle.net/10919/52039.

Abstract:
The wide application of genomic microarray technology has created a tremendous need for the development of high-dimensional genetic data analysis. Many statistical methods for microarray data analysis consider one gene at a time, but they may miss subtle changes at the single gene level. This limitation may be overcome by considering a set of genes simultaneously, where the gene sets are derived from prior biological knowledge and are called "pathways". We have made contributions on two specific research topics related to high-dimensional genetic pathway data. One is to propose a semiparametric model for identifying pathways related to zero-inflated clinical outcomes; the other is to propose a multilevel Gaussian graphical model for exploring both pathway and gene level network structures. For the first problem, we develop a semiparametric model via a Bayesian hierarchical framework. We model the pathway effect nonparametrically in a zero-inflated Poisson hierarchical regression model with unknown link function. The nonparametric pathway effect is estimated via the kernel machine and the unknown link function is estimated by transforming a mixture of beta cumulative density functions. Our approach provides flexible semiparametric settings to describe the complicated association between gene microarray expressions and the clinical outcomes. The Metropolis-within-Gibbs sampling algorithm and Bayes factor are used to make the statistical inferences. Our simulation results support that the semiparametric approach is more accurate and flexible than the zero-inflated Poisson regression with the canonical link function; this is especially true when the number of genes is large. The usefulness of our approach is demonstrated through its application to a canine gene expression data set (Enerson et al., 2006). Our approach can also be applied to other settings where a large number of highly correlated predictors are present.
Unlike the first problem, the second one takes into account that pathways are not independent of each other because of shared genes and interactions among pathways. Multi-pathway analysis has been a challenging problem because of the complex dependence structure among pathways. By considering the dependency among pathways as well as genes within each pathway, we propose a multi-level Gaussian graphical model (MGGM): one level is for the pathway network and the second one is for the gene network. We develop a multilevel L1 penalized likelihood approach to achieve sparseness on both levels. We also provide an iterative weighted graphical LASSO algorithm (Guo et al., 2011) for MGGM. Some asymptotic properties of the estimator are also illustrated. Our simulation results support the advantages of our approach; our method estimates the network more accurately on the pathway level, and more sparsely on the gene level. We also demonstrate the usefulness of our approach using the canine genes-pathways data set.
Ph. D.
46

Czarn, Andrew Simon Timothy. "Statistical exploratory analysis of genetic algorithms". University of Western Australia. School of Computer Science and Software Engineering, 2008. http://theses.library.uwa.edu.au/adt-WU2008.0030.

Abstract:
[Truncated abstract] Genetic algorithms (GAs) have been extensively used and studied in computer science, yet there is no generally accepted methodology for exploring which parameters significantly affect performance, whether there is any interaction between parameters and how performance varies with respect to changes in parameters. This thesis presents a rigorous yet practical statistical methodology for the exploratory study of GAs. This methodology addresses the issues of experimental design, blocking, power and response curve analysis. It details how statistical analysis may assist the investigator along the exploratory pathway. The statistical methodology is demonstrated in this thesis using a number of case studies with a classical genetic algorithm with one-point crossover and bit-replacement mutation. In doing so we answer a number of questions about the relationship between the performance of the GA and the operators and encoding used. The methodology is suitable, however, to be applied to other adaptive optimization algorithms not treated in this thesis. In the first instance, as an initial demonstration of our methodology, we describe case studies using four standard test functions. It is found that the effect upon performance of crossover is predominantly linear while the effect of mutation is predominantly quadratic. Higher order effects are noted but contribute less to overall behaviour. In the case of crossover both positive and negative gradients are found which suggests using rates as high as possible for some problems while possibly excluding it for others. .... This is illustrated by showing how the use of Gray codes impedes the performance on a lower modality test function compared with a higher modality test function. Computer animation is then used to illustrate the actual mechanism by which this occurs. Fourthly, the traditional concept of a GA is that of selection, crossover and mutation. 
However, a limited amount of data from the literature has suggested that the niche for the beneficial effect of crossover upon GA performance may be smaller than has traditionally been held. Based upon previous results on not-linear-separable problems an exploration is made by comparing two test problem suites, one comprising non-rotated functions and the other comprising the same functions rotated by 45 degrees in the solution space rendering them not-linear-separable. It is shown that for the difficult rotated functions the crossover operator is detrimental to the performance of the GA. It is conjectured that what makes a problem difficult for the GA is complex and involves factors such as the degree of optimization at local minima due to crossover, the bias associated with the mutation operator and the Hamming Distances present in the individual problems due to the encoding. Furthermore, the GA was tested on a real world landscape minimization problem to see if the results obtained would match those from the difficult rotated functions. It is demonstrated that they match and that the features which make certain of the test functions difficult are also present in the real world problem. Overall, the proposed methodology is found to be an effective tool for revealing relationships between a randomized optimization algorithm and its encoding and parameters that are difficult to establish from more ad-hoc experimental studies alone.
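The two operators of the classical genetic algorithm studied in this abstract can be sketched as follows. One-point crossover swaps the tails of two bit strings at a random cut point. For "bit-replacement mutation" the sketch assumes that each selected bit is redrawn uniformly from {0, 1} (so it may keep its old value), which is one common reading of the term and may differ from the thesis's exact definition. All function names here are hypothetical.

```python
import random
from typing import List, Tuple


def one_point_crossover(a: List[int], b: List[int],
                        rng: random.Random) -> Tuple[List[int], List[int]]:
    """Swap the tails of two equal-length bit strings (length >= 2)."""
    point = rng.randint(1, len(a) - 1)   # cut strictly inside the string
    return a[:point] + b[point:], b[:point] + a[point:]


def bit_replacement_mutation(bits: List[int], rate: float,
                             rng: random.Random) -> List[int]:
    """With probability `rate`, redraw each bit uniformly from {0, 1}.

    Assumption: "replacement" means redrawing, not flipping, the bit.
    """
    return [rng.randint(0, 1) if rng.random() < rate else bit
            for bit in bits]


rng = random.Random(0)                   # fixed seed for repeatability
parent_a, parent_b = [0] * 8, [1] * 8
child_a, child_b = one_point_crossover(parent_a, parent_b, rng)
mutant = bit_replacement_mutation(child_a, 0.1, rng)
```

Crossover rate and mutation rate are the parameters whose linear and quadratic effects on performance the abstract reports, so in an experiment of that kind they would be the factors varied across runs.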
47

Shen, Rujun, and 沈汝君. "Mining optimal technical trading rules with genetic algorithms". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2011. http://hub.hku.hk/bib/B47870011.

Abstract:
In recent years, technical trading rules have become widely known; not only academics but also many investors have learned to apply them in financial markets. One approach to constructing technical trading rules is to use technical indicators, such as moving averages (MA) and filter rules. These trading rules are widely used, possibly because the technical indicators are simple to compute and can be programmed easily. An alternative approach to constructing technical trading rules is to rely on chart patterns. However, the patterns and signals detected by these rules are often identified by visual inspection. As far as I know, there are no universally accepted methods of constructing chart patterns. In 2000, Prof. Andrew Lo and his colleagues were the first to define five pairs of chart patterns mathematically: Head-and-Shoulders (HS) & Inverted Head-and-Shoulders (IHS), Broadening tops (BTOP) & bottoms (BBOT), Triangle tops (TTOP) & bottoms (TBOT), Rectangle tops (RTOP) & bottoms (RBOT), and Double tops (DTOP) & bottoms (DBOT). The basic formulation of a chart pattern consists of two steps: detection of (i) extreme points of a price series; and (ii) shape of the pattern. In Lo et al. (2000), the method of kernel smoothing was used to identify the extreme points. Lo et al. (2000) acknowledged that the optimal bandwidth used in the kernel method is not the best choice and that expert judgement is needed in selecting the bandwidth. In addition, their work considered chart pattern detection only, not buy/sell signal detection. It should be noted that it is possible to have a chart pattern formed without a signal detected, but in this case no transaction will be made. In this thesis, I propose a new class of technical trading rules which aims to resolve the above problems. More specifically, each chart pattern is parameterized by a set of parameters which governs the shape of the pattern and the entry and exit signals of trades.
Then the optimal set of parameters can be determined by using genetic algorithms (GAs). The advantage of GAs is that they can deal with high-dimensional optimization problems regardless of whether the parameters to be optimized are continuous or discrete. In addition, GAs are convenient to use when the fitness function is not differentiable or has a multi-modal surface.
Statistics and Actuarial Science
Master
Master of Philosophy
48

Clark, Taane Gregory. "Statistical methods for finding associations in dense genetic regions". Thesis, University of Oxford, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.413976.

49

Ferreira, Teresa. "Statistical methods for modelling epistasis in genetic association studies". Thesis, University of Oxford, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.543476.

50

Lin, Xinyi (Cindy). "Statistical Methods for High-Dimensional Data in Genetic Epidemiology". Thesis, Harvard University, 2014. http://dissertations.umi.com/gsas.harvard:11326.

Abstract:
Recent technological advancements have enabled us to collect an unprecedented amount of genetic epidemiological data. The overarching goal of these genetic epidemiology studies is to uncover the underlying biological mechanisms so that improved strategies for disease prevention and management can be developed. To efficiently analyze and interpret high-dimensional biological data, it is imperative to develop novel statistical methods as conventional statistical methods are generally not applicable or are inefficient. In this dissertation, we introduce three novel, powerful and computationally efficient kernel machine set-based association tests for analyzing high-throughput genetic epidemiological data. In the first chapter, we construct a test for identifying common genetic variants that are predictive of a time-to-event outcome. In the second chapter, we develop a test for identifying gene-environment interactions for common genetic variants. In the third chapter, we propose a test for identifying gene-environment interactions for rare genetic variants.