Doctoral dissertations on the topic "Statistical genetics"

Browse the top 50 doctoral dissertations on the topic "Statistical genetics".

1

Qiao, Dandi. "Statistical Approaches for Next-Generation Sequencing Data". Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10689.

Abstract:
During the last two decades, genotyping technology has advanced rapidly, which enabled the tremendous success of genome-wide association studies (GWAS) in the search for disease susceptibility loci (DSLs). However, only a small fraction of the overall predicted heritability can be explained by the DSLs discovered. One possible explanation for this “missing heritability” phenomenon is that many causal variants are rare. The recent development of high-throughput next-generation sequencing (NGS) technology provides the instrument to look closely at these rare variants with precision and efficiency. However, new approaches for both the storage and analysis of sequencing data are urgently needed. In this thesis, we introduce three methods that could be utilized in the management and analysis of sequencing data. In Chapter 1, we propose a novel and simple algorithm for compressing sequencing data that leverages the scarcity of rare variant data, which enables the efficient storage and analysis of sequencing data in current hardware environments. We also provide a C++ implementation that supports direct and parallel loading of the compressed format without requiring extra time for decompression. Chapters 2 and 3 focus on the association analysis of sequencing data in population-based designs. In Chapter 2, we present a statistical methodology that allows the identification of genetic outliers to obtain a genetically homogeneous subpopulation, which reduces the false positives due to population substructure. Our approach is computationally efficient, can be applied to all the genetic loci in the data, and does not require pruning of variants in linkage disequilibrium (LD). In Chapter 3, we propose a general analysis framework in which thousands of genetic loci can be tested simultaneously for association with complex phenotypes.
The approach is built on spatial-clustering methodology, assuming that genetic loci that are associated with the target phenotype cluster in certain genomic regions. In contrast to standard methodology for multi-loci analysis, which has focused on the dimension reduction of data, the proposed approach profits from the availability of large numbers of genetic loci. Thus, it will be especially relevant for whole-genome sequencing studies, which commonly record several thousand loci per gene.
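The compression idea mentioned for Chapter 1 of this abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm or its C++ file format; it only assumes that genotypes are coded as minor-allele counts (0, 1, 2) and that most entries at a rare variant are 0, so that storing the non-zero entries alone is enough. All function names below are hypothetical.

```python
from typing import Dict, List


def compress_genotypes(genotypes: List[int]) -> Dict[int, int]:
    """Keep only non-reference genotypes, keyed by sample index."""
    return {i: g for i, g in enumerate(genotypes) if g != 0}


def minor_allele_count(sparse: Dict[int, int]) -> int:
    """Count minor alleles directly on the sparse form, without decompressing."""
    return sum(sparse.values())


def decompress_genotypes(sparse: Dict[int, int], n_samples: int) -> List[int]:
    """Rebuild the dense genotype vector from the sparse form."""
    dense = [0] * n_samples
    for i, g in sparse.items():
        dense[i] = g
    return dense


# Toy rare variant: 1000 samples, one heterozygote and one minor-allele homozygote.
genotypes = [0] * 1000
genotypes[17] = 1
genotypes[402] = 2

sparse = compress_genotypes(genotypes)
print(len(sparse), "stored entries instead of", len(genotypes))
print("minor allele count:", minor_allele_count(sparse))
```

In this toy form, simple statistics such as allele counts can be computed on the sparse representation itself, which is the spirit of loading a compressed format without a separate decompression step.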
2

Oldmeadow, Christopher. "Latent variable models in statistical genetics". Thesis, Queensland University of Technology, 2009. https://eprints.qut.edu.au/31995/1/Christopher_Oldmeadow_Thesis.pdf.

Abstract:
Understanding the complexities that are involved in the genetics of multifactorial diseases is still a monumental task. In addition to environmental factors that can influence the risk of disease, there are also a number of other complicating factors. Genetic variants associated with age of disease onset may be different from those variants associated with overall risk of disease, and variants may be located in positions that are not consistent with the traditional protein coding genetic paradigm. Latent Variable Models are well suited for the analysis of genetic data. A latent variable is one that we do not directly observe, but which is believed to exist or is included for computational or analytic convenience in a model. This thesis presents a mixture of methodological developments utilising latent variables, and results from case studies in genetic epidemiology and comparative genomics. Epidemiological studies have identified a number of environmental risk factors for appendicitis, but the disease aetiology of this oft thought useless vestige remains largely a mystery. The effects of smoking on other gastrointestinal disorders are well documented, and in light of this, the thesis investigates the association between smoking and appendicitis through the use of latent variables. By utilising data from a large Australian twin study questionnaire as both cohort and case-control, evidence is found for the association between tobacco smoking and appendicitis. Twin and family studies have also found evidence for the role of heredity in the risk of appendicitis. Results from previous studies are extended here to estimate the heritability of age-at-onset and account for the effect of smoking. This thesis presents a novel approach for performing a genome-wide variance components linkage analysis on transformed residuals from a Cox regression.
This method finds evidence for a different subset of genes responsible for variation in age at onset than those associated with overall risk of appendicitis. Motivated by increasing evidence of functional activity in regions of the genome once thought of as evolutionary graveyards, this thesis develops a generalisation to the Bayesian multiple changepoint model on aligned DNA sequences for more than two species. This sensitive technique is applied to evaluating the distributions of evolutionary rates, with the finding that they are much more complex than previously apparent. We show strong evidence for at least 9 well-resolved evolutionary rate classes in an alignment of four Drosophila species and at least 7 classes in an alignment of four mammals, including human. A pattern of enrichment and depletion of genic regions in the profiled segments suggests they are functionally significant, and most likely consist of various functional classes. Furthermore, a method of incorporating alignment characteristics representative of function such as GC content and type of mutation into the segmentation model is developed within this thesis. Evidence of fine-structured segmental variation is presented.
3

Bruen, Trevor Cormac Vincent. "Discrete and statistical approaches to genetics". Thesis, McGill University, 2006. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=102964.

Abstract:
This thesis presents a number of major innovations in related but different areas of research. The contributions range along a continuum from mathematical phylogenetics, to development of statistical methodology for detecting recombination and finally to the application of statistical techniques to understand Feline Immunodeficiency Virus (FIV), an important pathogen. An underlying theme is the application of combinatorial and statistical ideas to problems in evolutionary biology and genetics.
Chapter 2 and Chapter 3 give a number of results relevant to mathematical phylogenetics, in particular maximum parsimony. Chapter 2 presents a new formulation of maximum parsimony in terms of character subdivision, providing a direct link with the character compatibility problem, also known as the perfect phylogeny problem. Specialization of this result to two characters gives a simple formula based on the intersection graph for calculating the parsimony score for a pair of characters. Chapter 3 further explores maximum parsimony. In particular, it is shown that a maximum parsimony tree for a sequence of characters minimizes a subtree-prune and regraft (SPR) distance to the sets of trees on which each character is convex. Similar connections are also drawn between the Robinson-Foulds distance and a new variant of Dollo parsimony.
Chapter 4 presents an application of the work in Chapters 2 and 3 to develop a statistical test for detecting recombination. An extensive coalescent-based simulation study shows that this new test is both robust and powerful in a variety of different circumstances compared to a number of current methods. In fact, a simple model of mutation rate correlation is shown to mislead a number of competing tests, causing recombination to be falsely inferred. Analysis of empirical data sets confirms that the new test is one of the best approaches to distinguish recurrent mutation from recombination.
Finally, Chapter 5 uses the test developed in Chapter 4 to localize recombinant breakpoints in 14 genomic strains of FIV taken from a wild population of cougars. Based on the technique, three recombinant strains of FIV are identified. Previous studies have focused on the epidemiology and population structure of the virus and this study shows that recombination has also played an important role in the evolution of FIV.
4

Baillie, John Kenneth. "Statistical genetics in infectious disease susceptibility". Thesis, University of Edinburgh, 2013. http://hdl.handle.net/1842/17620.

Abstract:
Death from infectious disease is common, heritable, and in many cases a consequence of the host response, rather than direct effects of the pathogen. Since the host response in sepsis is orchestrated by the transmission of a variety of signals, both intra-cellular and inter-cellular, with which we have at least some capacity to intervene, it follows that it should be possible to prevent death through pharmaceutical modulation of inflammatory cascades. So far, it is not. The best candidate therapy for sepsis, activated protein C, failed to live up to initial promise and was ultimately withdrawn from the market in dismal failure. The premise of the work presented here is that a different approach – to develop an understanding of the host response at a genomic level – may yield more tractable insights, specifically into the problem of host susceptibility to influenza, a heritable cause of death in otherwise healthy people and a significant global threat. Since the sequencing of the human genome, it has become possible to identify genomic loci underlying host susceptibility to disease using genome-wide association studies (GWAS), best exemplified by the Wellcome Trust Case Control Consortium. This new technology creates substantial new challenges. The genetic markers associated with a phenotype are rarely causative, frequently in poorly-understood intergenic regions, and tend to have small effect sizes, such that tens or even hundreds of thousands of subjects must be recruited to have sufficient power to detect them. It is therefore not straightforward to translate these genotype-phenotype associations into useful understanding of the role of genes and gene products in disease pathogenesis. Attempts to overcome these challenges in order to discover genomic loci underlying individual susceptibility to infection form the core of this thesis.
Ultimately these efforts converge with the development of a new computational method to detect phenotype-associated loci from genome-wide association studies (GWAS) using co-expression at regulatory regions of the genome.
5

Mitchell, Brittany L. "Statistical genetic analyses of neuropsychological traits". Thesis, Queensland University of Technology, 2022. https://eprints.qut.edu.au/227852/14/Brittany%20Mitchell%20Thesis.pdf.

Abstract:
Neuropsychological traits affect both the brain and behaviour and are responsible for a large proportion of worldwide disability. This PhD thesis employs computational, statistical and genetic approaches to identify and understand the genetic and environmental influences on a wide range of psychiatric, neurological and cognitive disorders. The work presented in this thesis details novel findings on several fronts including new genetic marker discovery, using genetics to predict an individual’s disease risk, and disentangling pertinent risk factors that affect cognitive and mental health. This insight is an important step towards developing more effective treatments and intervention strategies.
6

Hudson, Julie. "Maternal Gene-Environment Effects: An Evaluation of Statistical Approaches to Detect Effects and an Investigation of the Effect of Violations of Model Assumptions". Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39637.

Abstract:
Discovering the associations between genetic variables and disease status can help reduce the burden of disease on society. This thesis focuses on the methods required to detect maternal genetic effects (an effect where the genes of the mother affect the disease risk of the child) and interaction effects between these maternal genes and environmental variables in trio data consisting of parents and an affected child. A simulation study was conducted to determine the extent to which testing for these effects is affected by violations of the mating symmetry assumption required for two current methods when control parents are not available. This study showed that methods for maternal effect estimation are not robust to these violations; however, the interaction test is robust to the violation. Finally, a candidate gene study on orofacial clefts was conducted to evaluate maternal gene-environment interactions in international consortium data. Significant effects were found, but the large magnitude of the effect estimates raises concerns about the validity of the results. This thesis also discusses the lack of methods and software available to estimate maternal gene-environment interactions.
7

Casale, Francesco Paolo. "Multivariate linear mixed models for statistical genetics". Thesis, University of Cambridge, 2016. https://www.repository.cam.ac.uk/handle/1810/267465.

Abstract:
In the last decade, genome-wide association studies have helped to advance our understanding of the genetic architecture of many important traits, including diseases. However, the statistical analysis of genotype-phenotype associations remains challenging due to multiple factors. First, many traits have polygenic architectures, which means that they are controlled by a large number of variants with small individual effects. Second, as increasingly deep phenotype data are being generated there is a need for multivariate analysis approaches to leverage multiple related phenotypes while retaining computational efficiency. Additionally, genetic analyses are confronted by strong confounding factors that can create spurious associations when not properly accounted for in the statistical model. We here derive more flexible methods that allow integrating genetic effects across variants and multiple quantitative traits. To do so, we build on the classical linear mixed model (LMM), a widely adopted framework for genetic studies. The first contribution of this thesis is mtSet, an efficient mixed-model approach that enables genome-wide association testing between sets of genetic variants and multiple traits while accounting for confounding factors. In both simulations and real-data applications we demonstrate that mtSet effectively combines the advantages of variant-set and multi-trait analyses. Next, we present a new model for gene-context interactions that builds on mtSet. The proposed interaction set test (iSet) yields increased statistical power for detecting polygenic interactions. Additionally, iSet enables the identification of genetic loci that are associated with different configurations of causal variants across contexts. After benchmarking the proposed method using simulated data, we consider two applications to real datasets, where we investigate genetic effects on gene expression across different cellular contexts and sex-specific genetic effects on lipid levels. 
Finally, we describe LIMIX, a software framework for the flexible implementation of different LMMs. Most of the models considered in this thesis, including mtSet and iSet, are implemented and available in LIMIX. A unique aspect of the software is an inference framework that allows a large class of genetic models to be defined and, in many cases, to be efficiently fitted by exploiting specific algebraic properties. We demonstrate the utility of this software suite in two applied collaboration projects. Taken together, this thesis demonstrates the value of flexible and integrative modelling in genetics and contributes new statistical methods for genetic analysis. These approaches generalise previous models, yet retain the computational efficiency that is needed to tackle large genetic datasets.
8

Csilléry, Katalin. "Statistical inference in population genetics using microsatellites". Thesis, University of Edinburgh, 2009. http://hdl.handle.net/1842/3865.

Abstract:
Statistical inference from molecular population genetic data is currently a very active area of research for two main reasons. First, in the past two decades an enormous amount of molecular genetic data have been produced and the amount of data is expected to grow even more in the future. Second, drawing inferences about complex population genetics problems, for example understanding the demographic and genetic factors that shaped modern populations, poses a serious statistical challenge. Amongst the many different kinds of genetic data that have appeared in the past two decades, the highly polymorphic microsatellites have played an important role. Microsatellites revolutionized the population genetics of natural populations, and were the initial tool for linkage mapping in humans and other model organisms. Despite their important role, and extensive use, the evolutionary dynamics of microsatellites are still not fully understood, and their statistical methods are often underdeveloped and do not adequately model microsatellite evolution. In this thesis, I address some aspects of this problem by assessing the performance of existing statistical tools, and developing some new ones. My work encompasses a range of statistical methods from simple hypothesis testing to more recent, complex computational statistical tools. This thesis consists of four main topics. First, I review the statistical methods that have been developed for microsatellites in population genetics applications. I review the different models of the microsatellite mutation process, and ask which models are the most supported by data, and how models were incorporated into statistical methods. I also present estimates of mutation parameters for several species based on published data. Second, I evaluate the performance of estimators of genetic relatedness using real data from five vertebrate populations. 
I demonstrate that the overall performance of marker-based pairwise relatedness estimators mainly depends on the population relatedness composition and may only be improved by the marker data quality within the limits of the population relatedness composition. Third, I investigate the different null hypotheses that may be used to test for independence between loci. Using simulations I show that testing for statistical independence (i.e. zero linkage disequilibrium, LD) is difficult to interpret in most cases, and instead a null hypothesis should be tested, which accounts for the “background LD” due to finite population size. I investigate the utility of a novel approximate testing procedure to circumvent this problem, and illustrate its use on a real data set from red deer. Fourth, I explore the utility of Approximate Bayesian Computation, inference based on summary statistics, to estimate demographic parameters from admixed populations. Assuming a simple demographic model, I show that the choice of summary statistics greatly influences the quality of the estimation, and that different parameters are better estimated with different summary statistics. Most importantly, I show how the estimation of most admixture parameters can be considerably improved via the use of linkage disequilibrium statistics from microsatellite data.
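The fourth topic in this abstract, Approximate Bayesian Computation, can be illustrated with a toy rejection sampler. This is only a sketch under strong assumptions: it estimates a single allele frequency from a binomial sample, using the allele count as the summary statistic and a uniform prior, rather than the admixture models or microsatellite statistics studied in the thesis. The function names and parameter values are hypothetical.

```python
import random
from typing import List


def simulate_count(p: float, n: int, rng: random.Random) -> int:
    """Simulate the summary statistic: copies of the allele among n chromosomes."""
    return sum(1 for _ in range(n) if rng.random() < p)


def abc_rejection(observed: int, n: int, n_draws: int,
                  tolerance: int, seed: int = 0) -> List[float]:
    """Keep prior draws whose simulated summary lies within `tolerance` of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # draw from a uniform prior on [0, 1)
        if abs(simulate_count(p, n, rng) - observed) <= tolerance:
            accepted.append(p)
    return accepted


# Toy data: the allele was seen 30 times among 100 sampled chromosomes.
posterior = abc_rejection(observed=30, n=100, n_draws=5000, tolerance=2)
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, posterior mean about {posterior_mean:.3f}")
```

The accepted draws approximate the posterior; how well they do so depends on the summary statistic and tolerance chosen, which is the sensitivity the abstract highlights.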
9

Sperrin, Matthew. "Statistical methodology motivated by problems in genetics". Thesis, Lancaster University, 2010. http://eprints.lancs.ac.uk/49088/.

Abstract:
Sequencing the human genome has made vast amounts of potentially useful genetic data accessible. An important challenge in statistics is to develop methodology to extract information from this data. In this thesis, developments are made in two methodological areas that have wide applications in genetics. First, probabilistic methods to deal with the label switching problem in Bayesian mixture models are introduced. Mixture models are used in situations where populations may consist of a number of sub-populations, or as a semi-parametric modelling tool. The label switching problem can prevent meaningful interpretation of the output of Markov Chain Monte Carlo samplers. Specifically, inference on attributes specific to sub-populations can be difficult. Such attributes play an important role in understanding genetic effects. We introduce probabilistic relabelling strategies as a natural way of overcoming the label switching problem, and compare with existing strategies. The comparisons demonstrate that the advantages offered by probabilistic strategies come without loss in parameter estimation ability. Second, we introduce direct effect testing (DET), which is a novel method that distinguishes direct from indirect effects between binary predictors and a binary response. DET consists of two stages: the first stage finds effects, the second stage infers the uncertainty in determining which predictors cause which effects. The method is useful when it is of interest to recover direct effects between a large number of predictors and the response. This is a common goal in genetics, where we are interested in the effects of variations in the genome on the prevalence of a phenotype. This work includes detailed simulations, comparing the ability of a number of methods at recovering direct effects. DET outperforms existing methods at recovering direct effects in situations where there is high correlation between predictors, and matches their performance when the correlation is moderate or small.
10

Lange, Christoph. "Generalized estimating equation methods in statistical genetics". Thesis, University of Reading, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.269921.

11

ZHANG, GE. "STATISTICAL METHODS IN GENETIC ASSOCIATION". University of Cincinnati / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1196099744.

12

Wright, David Jonathan. "Investigating statistical homogeneity of a human chromosome". Thesis, Queen Mary, University of London, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.338927.

13

Ngong, Chiano Mathias. "Statistical problems in human genetic linkage analysis". Thesis, University of Cambridge, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.339750.

14

Liesch, Rahel. "Statistical Genetics for the Budset in Norway Spruce". Thesis, Uppsala University, Department of Mathematics, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-121386.

15

Jung, Min Kyung. "Statistical methods for biological applications". [Bloomington, Ind.] : Indiana University, 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3278454.

Abstract:
Thesis (Ph.D.)--Indiana University, Dept. of Mathematics, 2007.
Source: Dissertation Abstracts International, Volume: 68-10, Section: B, page: 6740. Adviser: Elizabeth A. Housworth. Title from dissertation home page (viewed May 20, 2008).
16

Choy, Yan-tsun. "Statistical evaluation of mixed DNA stains". Thesis, The University of Hong Kong, 2009. http://sunzi.lib.hku.hk/hkuto/record/B42664287.

17

Yu, Xiaoqing. "Statistical Methods and Analyses for Next-generation Sequencing Data". Case Western Reserve University School of Graduate Studies / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=case1403708200.

18

Yung, Godwin Yuen Han. "Statistical methods for analyzing genetic sequencing association studies". Thesis, Harvard University, 2016. http://nrs.harvard.edu/urn-3:HUL.InstRepos:33493313.

Abstract:
Case-control genetic sequencing studies are increasingly being conducted to identify rare variants associated with complex diseases. Oftentimes, these studies collect a variety of secondary traits--quantitative and qualitative traits besides the case-control disease status. Reusing the data and studying the association between rare variants and secondary phenotypes provide an attractive and cost-effective approach that can lead to discovery of new genetic associations. In Chapter 1, we carry out an extensive investigation of the validity of ad hoc methods, which are simple, computationally efficient methods frequently applied in practice to study the association between secondary phenotypes and single common genetic variants. Though other researchers have investigated the same problem, we make two key contributions to the existing literature. First, we show that in taking an ad hoc approach, it may be desirable to adjust for covariates that affect the primary disease in the secondary phenotype model, even though these covariates are not necessarily associated with the secondary phenotype in the population. Second, we show that when the disease is rare, ad hoc methods can lead to severely biased estimation and inference if the true disease model follows a non-logistic model such as the probit model. Spurious associations can be avoided by including interaction terms in the fitted regression model. Our results are justified theoretically and via simulations, and illustrated by a genome-wide association study of smoking using a lung cancer case-control study. In Chapter 2, we consider the problem of testing associations between secondary phenotypes and sets of rare genetic variants. We show that popular region-based methods such as the burden test and the sequence kernel association test (SKAT) can only be applied under the same conditions as those applicable to ad hoc methods (Chapter 1).
For a more robust alternative, we propose an inverse-probability-weighted version of the optimal SKAT (SKAT-O) to account for unequal sampling of cases and controls. As an extension of SKAT-O, our approach is data adaptive and includes the weighted burden test and weighted SKAT as special cases. In addition to weighting individuals to account for the biased sampling, we can also consider weighting the variants in SKAT-O. Decreasing the weight of non-causal variants and increasing the weight of causal variants can improve power. However, since researchers do not know which variants are actually causal, it is common practice to weight genetic variants as a function of their minor allele frequencies. This is motivated by the belief that rarer variants are more likely to have larger effects. In Chapter 3, we propose a new unsupervised statistical framework for predicting the functional status of genetic variants. Compared to existing methods, the proposed algorithm integrates a diverse set of annotations---which are partitioned beforehand into multiple groups by the user---and predicts the functional status for each group, taking into account within- and between-group correlations. We demonstrate the advantages of the algorithm through application to real annotation data and conclude with future directions.
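The practice mentioned in this abstract of weighting variants as a function of their minor allele frequency (MAF) can be sketched as follows. This is an illustrative burden score only, not the inverse-probability-weighted SKAT-O proposed in the thesis. It assumes genotypes are coded as minor-allele counts (0, 1, 2) and uses the Beta(1, 25) density as the weight function, which is one common choice in the rare-variant literature and an assumption here. The function names are hypothetical.

```python
from typing import List


def minor_allele_frequency(column: List[int]) -> float:
    """MAF of one variant, assuming genotypes count minor alleles (0, 1, or 2)."""
    return sum(column) / (2 * len(column))


def beta_1_25_weight(maf: float) -> float:
    """Beta(1, 25) density at maf: 25 * (1 - maf)^24, so rarer variants weigh more."""
    return 25.0 * (1.0 - maf) ** 24


def weighted_burden_scores(genotypes: List[List[int]]) -> List[float]:
    """One score per individual: the MAF-weighted sum of minor-allele counts."""
    n_variants = len(genotypes[0])
    columns = [[row[j] for row in genotypes] for j in range(n_variants)]
    weights = [beta_1_25_weight(minor_allele_frequency(c)) for c in columns]
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


# Toy data: 4 individuals (rows) by 2 variants (columns).
# Variant 0 has MAF 1/8; variant 1 has MAF 2/8, so variant 0 gets the larger weight.
genotypes = [
    [0, 1],
    [0, 1],
    [1, 0],
    [0, 0],
]
scores = weighted_burden_scores(genotypes)
print(scores)
```

In a burden test, scores like these would then be tested for association with the phenotype in a regression model; that step is omitted here.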
Biostatistics
19

Zang, Yong, and 臧勇. "Robust tests under genetic model uncertainty in case-control association studies". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2011. http://hub.hku.hk/bib/B46419123.

20

Shringarpure, Suyash. "Statistical Methods for studying Genetic Variation in Populations". Research Showcase @ CMU, 2012. http://repository.cmu.edu/dissertations/117.

Abstract:
The study of genetic variation in populations is of great interest for the study of the evolutionary history of humans and other species. Improvement in sequencing technology has resulted in the availability of many large datasets of genetic data. Computational methods have therefore become quite important in analyzing these data. Two important problems that have been studied using genetic data are population stratification (modeling individual ancestry with respect to ancestral populations) and genetic association (finding genetic polymorphisms that affect a trait). In this thesis, we develop methods to improve our understanding of these two problems. For the population stratification problem, we develop hierarchical Bayesian models that incorporate the evolutionary processes that are known to affect genetic variation. By developing mStruct, we show that modeling more evolutionary processes improves the accuracy of the recovered population structure. We demonstrate how nonparametric Bayesian processes can be used to address the question of choosing the optimal number of ancestral populations that describe the genetic diversity of a given sample of individuals. We also examine how sampling bias in genotyping study design can affect results of population structure analysis and propose a probabilistic framework for modeling and correcting sample selection bias. Genome-wide association studies (GWAS) have vastly improved our understanding of many diseases. However, such studies have failed to uncover much of the variation responsible for a number of common multi-factorial diseases and complex traits. We show how artificial selection experiments on model organisms can be used to better understand the nature of genetic associations. We demonstrate using simulations that using data from artificial selection experiments improves the performance of conventional methods of performing association. 
We also validate our approach using semi-simulated data from an artificial selection experiment on Drosophila melanogaster.
21

Choy, Yan-tsun, and 蔡恩浚. "Statistical evaluation of mixed DNA stains". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2009. http://hub.hku.hk/bib/B42664287.

22

Cordell, Heather Jane. "Statistical methods in the genetic analysis of type 1 diabetes". Thesis, University of Oxford, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.296834.

23

Mathieson, Iain. "Genes in space : selection, association and variation in spatially structured populations". Thesis, University of Oxford, 2013. http://ora.ox.ac.uk/objects/uuid:85f051b6-2121-49cf-9468-3ca7ba77cc4a.

Abstract:
Spatial structure in a population creates distinctive patterns in genetic data. There are two reasons to model this process. First, since the genetic structure of a population is induced by its historical spatial structure, it can be used to make inference about history and demography. Second, these models provide corrections to other analyses that are confounded by spatial structure. Since it is now common to collect genome-wide data on many thousands of samples, a major challenge is to develop fast, scalable, approximate algorithms that can analyse these datasets. A practical approach is to focus on subsets of the data that are most informative, for example rare variants. First we look at the problem of estimating selection coefficients in spatially structured populations. We demonstrate this approach using classical datasets of moth colour morph frequencies, and then use it in a model incorporating both ancient and modern DNA to estimate the selective advantage of one of the best known examples of local adaptation in humans, lactase persistence in Europeans. Next, we turn to the problem of association studies in spatially structured populations. We demonstrate that rare variants are more confounded by non-genetic risk than common variants. Excess confounding is a consequence of the fact that rare variants are highly informative about recent ancestry and therefore, in a spatially explicit model, about location. Finally, we use this insight into rare variants to develop methods for inference about population history using rare variant and haplotype sharing as simple summary statistics. These approaches are extremely fast and can be applied to genome-wide data on thousands of samples, yet they provide an accurate description of the history of a population, both identifying recent ancestry and estimating migration rates between subpopulations.
24

Ahiska, Bartu. "Reference-free identification of genetic variation in metagenomic sequence data using a probabilistic model". Thesis, University of Oxford, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.561121.

Abstract:
Microorganisms are an indispensable part of our ecosystem, yet the natural metabolic and ecological diversity of these organisms is poorly understood due to a historical reliance of microbiology on laboratory grown cultures. The awareness that this diversity cannot be studied by laboratory isolation, together with recent advances in low cost scalable sequencing technology, have enabled the foundation of culture-independent microbiology, or metagenomics. The study of environmental microbial samples with metagenomics has led to many advances, but a number of technological and methodological challenges still remain. A potentially diverse set of taxa may be represented in any one environmental sample. Existing tools for representing the genetic composition of such samples sequenced with short-read data, and tools for identifying variation amongst them, are still in their infancy. This thesis makes the case that a new framework based on a joint-genome graph can constitute a powerful tool for representing and manipulating the joint genomes of population samples. I present the development of a collection of methods, called SCRAPS, to construct these efficient graphs in small communities without the availability or bias of a reference genome. A key novelty is that genetic variation is identified from the data structure using a probabilistic algorithm that can provide a measure of the confidence in each call. SCRAPS is first tested on simulated short read data for accuracy and efficiency. At least 95% of non-repetitive small-scale genetic variation with a minor allele read depth greater than 10x is correctly identified; the number of false positives per conserved nucleotide is consistently better than 1 part in 333 × 10^3. SCRAPS is then applied to artificially pooled experimental datasets. As part of this study, SCRAPS is used to identify genetic variation in an epidemiological 11 sample Neisseria meningitidis dataset collected from the African meningitis belt. In total 14,000 sites of genetic variation are identified from 48 million Illumina/Solexa reads. The results clearly show the genetic differences between two waves of infection that have plagued northern Ghana and Burkina Faso.
25

Bos, David H. "Statistical genetics and molecular evolution of major histocompatibility complex genes". Thesis, University of Canterbury. Biological Sciences, 2005. http://hdl.handle.net/10092/6773.

Abstract:
MHC region genes have been the subject of molecular evolutionary studies both from single species and from a variety of taxa. The African clawed frog, Xenopus laevis, provides a good model for the study of immune genes such as the MHC class Ia because of the genomic architecture of the MHC region. Herein, I investigate 1) the molecular evolution of the MHC class Ia gene at the population level in X. laevis, and 2) the evolution of proteasome subunits psmb5 and lmp7 following duplication from their common ancestral locus using a phylogenetic sampling of mainly vertebrate taxa. Model-based maximum likelihood statistical procedures are used in an effort to overcome typical problems associated with complex patterns of molecular evolution at these loci. In this thesis I present several new findings, and Chapters I and II focus on phylogenetic investigation of proteasome subunits. Results indicate that several evolutionary mechanisms operate on lmp7 that make phylogenetic reconstruction of this locus difficult. I show that analysis of this gene is sensitive to the particular assumptions of various models of nucleotide evolution commonly used for phylogenetics. I also investigate whether or not natural selection operated differentially on duplicates of the proto-lmp7 gene locus. I provide evidence that positive Darwinian evolution contributed to the functional divergence of gene family members derived from this locus, making this one of the few examples of positive natural selection operating on a protein with housekeeping functions. Several new and major findings are also presented for the X. laevis class Ia MHC gene in Chapters III, IV and V. For the first time I provide robust estimates of substitution rates that show the operation of natural selection on peptide binding region (PBR) amino acids of the class Ia gene. I also show for the first time that intralocus recombinations are a major source of variation in the class Ia gene in X. laevis. Patterns of polymorphism at the class Ia locus are investigated in greater detail, and provide evidence for a molecular basis driving the coevolution of functionally linked genes. Combining data from other species, my results also demonstrate that the mode of MHC class Ia evolution is different from the classical paradigm detailed in mammals. Finally, my research is the first to demonstrate that non-linkage of the class I and class II genes in a single genomic region is not always necessary for this mode of class Ia evolution, as previously expected.
26

Lundell, Jill F. "Tuning Hyperparameters in Supervised Learning Models and Applications of Statistical Learning in Genome-Wide Association Studies with Emphasis on Heritability". DigitalCommons@USU, 2019. https://digitalcommons.usu.edu/etd/7594.

Abstract:
Machine learning is a buzzword that has inundated popular culture in the last few years. This is a term for a computer method that can automatically learn and improve from data instead of being explicitly programmed at every step. Investigations regarding the best way to create and use these methods are prevalent in research. Machine learning models can be difficult to create because models need to be tuned. This dissertation explores the characteristics of tuning three popular machine learning models and finds a way to automatically select a set of tuning parameters. This information was used to create an R software package called EZtune that can be used to automatically tune three widely used machine learning algorithms: support vector machines, gradient boosting machines, and adaboost. The second portion of this dissertation investigates the implementation of machine learning methods in finding locations along a genome that are associated with a trait. The performance of methods that have been commonly used for these types of studies, and of some that have not been commonly used, is assessed using simulated data. The effect of the strength of the relationship between the genetic code and the trait is of particular interest. It was found that the strength of this relationship was the most important characteristic in the efficacy of each method.
27

Vaez, Torshizi Rasoul. "Quantitative genetic analyses of production and reproduction traits in Australian merino sheep". Thesis, The University of Sydney, 1996. https://hdl.handle.net/2123/27593.

Abstract:
Restricted Maximum Likelihood (REML) procedures based on a derivative-free algorithm using the Simplex method and fitting an animal model were used to estimate variance and covariance components for several performances of productive traits, namely, body weight measured at birth, weaning, and 10, 16 and 22 months of age, greasy fleece average daily gain to 4, 10, 16 and 22 months of age, clean fleece average daily gain to 10, 16 and 22 months of age, and mean fibre diameter measured at 10, 16 and 22 months of age. For these traits, the importance of maternal effects, either additive genetic or environmental, was investigated. The interrelationships among the performances of each trait were studied and then used to determine the efficiencies of indirect selection at early ages compared with later ages for improvement of an animal's lifetime production.
28

Lee, Yiu-fai, and 李耀暉. "Analysis for segmental sharing and linkage disequilibrium: a genomewide association study on myopia". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2009. http://hub.hku.hk/bib/B43912217.

29

Ciampa, Julia Grant. "Multilocus approaches to the detection of disease susceptibility regions : methods and applications". Thesis, University of Oxford, 2012. http://ora.ox.ac.uk/objects/uuid:8f82a624-7d80-438c-af3e-68ce983ff45f.

Abstract:
This thesis focuses on multilocus methods designed to detect single nucleotide polymorphisms (SNPs) that are associated with disease using case-control data. I study multilocus methods that allow for interaction in the regression model because epistasis is thought to be pervasive in the etiology of common human diseases. In contrast, the single-SNP models widely used in genome-wide association studies (GWAS) are thought to oversimplify the underlying biology. I consider both pairwise interactions between individual SNPs and modular interactions between sets of biologically similar SNPs. Modular epistasis may be more representative of disease processes and its incorporation into regression analyses yields more parsimonious models. My methodological work focuses on strategies to increase power to detect susceptibility SNPs in the presence of genetic interaction. I emphasize the effect of gene-gene independence constraints and explore methods to relax them. I review several existing methods for interaction analyses and present their first empirical evaluation in a GWAS setting. I introduce the innovative retrospective Tukey score test (RTS) that investigates modular epistasis. Simulation studies suggest it offers a more powerful alternative to existing methods. I present diverse applications of these methods, using data from a multi-stage GWAS on prostate cancer (PRCA). My applied work is designed to generate hypotheses about the functionality of established susceptibility regions for PRCA by identifying SNPs that affect disease risk through interactions with them. Comparison of results across methods illustrates the impact of incorporating different forms of epistasis on inference about disease association. The top findings from these analyses are well supported by molecular studies. The results unite several susceptibility regions through overlapping biological pathways known to be disrupted in PRCA, motivating a replication study.
30

Lu, Li. "Some actuarial and statistical investigations into topics on genetics and insurance". Thesis, Heriot-Watt University, 2006. http://hdl.handle.net/10399/154.

31

Zorrilla, Luc. "Beyond high mutation high recombination limit in statistical genetics". Thesis, KTH, Fysik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-296875.

Abstract:
One considers the bi-allelic model in population genetics which describes a population of genomes evolving under the processes of selection, mutation, recombination and drift. A focus is made on the Quasi-Linkage Equilibrium (QLE) phase with recent derivations from Neher and Shraiman, which exists for fast recombinations compared to selection strength and whose dynamics is greatly simplified compared to the general case. Using results in the QLE regime along with Direct Coupling Analysis (DCA) one can infer fitness landscape in a population, in particular epistasis coefficients. Following these ideas, we investigate here in detail a relation between population size, recombination rate and epistasis variance describing where the QLE regime breaks down, from a DCA-inference point of view. In particular, we find that there is no clear variation of the critical recombination rate with population size, but that as expected there is a linear dependence between the standard deviation of total epistasis and that critical recombination rate.
32

Golding, Pauline Lindsay. "Development of a statistical method for the identification of gene-environment interactions". Thesis, University of Edinburgh, 2012. http://hdl.handle.net/1842/6520.

Abstract:
In order to understand common, complex disease it is necessary to consider not just genetic risks and environmental risks, but also the interplay between them. This thesis aims to develop methodology specifically for the detection of gene-environment interactions, both by looking at the strengths and weaknesses of traditional approaches and through the development and testing of a novel statistical method. Developments in genotyping technology enable researchers to collect large volumes of polymorphisms in human genes, yet very few statistical methods are able to handle the volume, variation and complexity of this data, especially in combination with environmental risk factors. Interactions between genes and the environment are often subject to the curse of dimensionality, with each new variable increasing the potential number of interactions exponentially, leading to low power and a high false positive rate. The Mixed Tree Method (MTM) exploits the differences between environmental and genetic variables, by selecting the most appropriate features from conventional methods (including recursive partitioning, random forests and logistic regression) and combining them with new comparison algorithms which rank the genetic variables by the likelihood that they interact with the environmental variable under study. Results show the MTM to be as effective as the most successful current method for identification of interactions, while maintaining a much lower false positive rate and computational burden. As the number of SNPs in the dataset increases, the success of MTM compared to other methods becomes greater while the comparator approaches exhibit computational problems and rapidly increasing processing times. The MTM is also applied to a colorectal cancer dataset to show its use in a practical setting. The results together suggest that MTM could be a useful strategy for identifying gene-environment interactions in future studies into complex disease.
33

Shen, Xia. "Novel Statistical Methods in Quantitative Genetics : Modeling Genetic Variance for Quantitative Trait Loci Mapping and Genomic Evaluation". Doctoral thesis, Uppsala universitet, Beräknings- och systembiologi, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-170091.

Abstract:
This thesis develops and evaluates statistical methods for different types of genetic analyses, including quantitative trait loci (QTL) analysis, genome-wide association study (GWAS), and genomic evaluation. The main contribution of the thesis is to provide novel insights in modeling genetic variance, especially via random effects models. In variance component QTL analysis, a full likelihood model accounting for uncertainty in the identity-by-descent (IBD) matrix was developed. It was found to be able to correctly adjust the bias in genetic variance component estimation and gain power in QTL mapping in terms of precision.  Double hierarchical generalized linear models, and a non-iterative simplified version, were implemented and applied to fit data of an entire genome. These whole genome models were shown to have good performance in both QTL mapping and genomic prediction. A re-analysis of a publicly available GWAS data set identified significant loci in Arabidopsis that control phenotypic variance instead of mean, which validated the idea of variance-controlling genes.  The works in the thesis are accompanied by R packages available online, including a general statistical tool for fitting random effects models (hglm), an efficient generalized ridge regression for high-dimensional data (bigRR), a double-layer mixed model for genomic data analysis (iQTL), a stochastic IBD matrix calculator (MCIBD), a computational interface for QTL mapping (qtl.outbred), and a GWAS analysis tool for mapping variance-controlling loci (vGWAS).
34

Guturu, Harendra. "Deciphering human gene regulation using computational and statistical methods". Thesis, Stanford University, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3581147.

Abstract:

It is estimated that at least 10-20% of the mammalian genome is dedicated to regulating the 1-2% of the genome that codes for proteins. This non-coding, regulatory layer is a necessity for the development of complex organisms, but is poorly understood compared to the genetic code used to translate coding DNA into proteins. In this dissertation, I will discuss methods developed to better understand the gene regulatory layer. I begin, in Chapter 1, with a broad overview of gene regulation, the motivation for studying it, the state of the art in its historical context, and where the field is headed.

In Chapter 2, I discuss a computational method developed to detect transcription factor (TF) complexes. The method compares co-occurring motif spacings in conserved versus unconserved regions of the human genome to detect evolutionarily constrained binding sites of rigid transcription factor (TF) complexes. Structural data were integrated to explore overlapping motif arrangements while ensuring physical plausibility of the TF complex. Using this approach, I predicted 422 physically realistic TF complex motifs at 18% false discovery rate (FDR). I found that the set of complexes is enriched in known TF complexes. Additionally, novel complexes were supported by chromatin immunoprecipitation sequencing (ChIP-seq) datasets. Analysis of the structural modeling revealed three cooperativity mechanisms and a tendency of TF pairs to synergize through overlapping binding to the same DNA base pairs in opposite grooves or strands. The TF complexes and associated binding site predictions are made available as a web resource at http://complex.stanford.edu.

Next, in Chapter 3, I discuss how gene enrichment analysis can be applied to genome-wide conserved binding sites to successfully infer regulatory functions for a given TF complex. A genomic screen predicted 732,568 combinatorial binding sites for 422 TF complex motifs. From these predictions, I inferred 2,440 functional roles, which are consistent with known functional roles of TF complexes. In these functional associations, I found interesting themes such as promiscuous partnering of TFs (such as ETS) in the same functional context (T cells). Additionally, functional enrichment identified two novel TF complex motifs associated with spinal cord patterning genes and mammary gland development genes, respectively. Based on these predictions, I discovered novel spinal cord patterning enhancers (5/9, 56% validation rate) and enhancers active in MCF7 cells (11/19, 53% validation rate). This set replete with thousands of additional predictions will serve as a powerful guide for future studies of regulatory patterns and their functional roles.

Then, in Chapter 4, I outline a method developed to predict disease susceptibility due to gene mis-regulation. The method interrogates ensembles of conserved binding sites of regulatory factors disrupted by an individual's variants and then looks for their most significant congregation next to a group of functionally related genes. Strikingly, when the method is applied to five different full human genomes, the top enriched function for each is reflective of their very different medical histories. These results suggest that erosion of gene regulation results in function specific mutation loads that manifest as disease predispositions in a familial lineage. Additionally, this aggregate analysis method addresses the problem that although many human diseases have a genetic component involving many loci, the majority of studies are statistically underpowered to isolate the many contributing loci.

Finally, I conclude in Chapter 5 with a summary of my findings throughout my research and future directions of research based on my findings.

35

Hu, Xianghong. "Statistical methods for Mendelian randomization using GWAS summary data". HKBU Institutional Repository, 2019. https://repository.hkbu.edu.hk/etd_oa/639.

Abstract:
Mendelian Randomization (MR) is a powerful tool for assessing the causal effect of an exposure on an outcome using genetic variants as instrumental variables. Much of the recent development has been propelled by the increasing availability of GWAS summary data. However, the accuracy of MR causal effect estimates can be compromised when the MR assumptions are violated. Sources of bias include weak effects arising from polygenicity, the presence of horizontal pleiotropy, and other biases such as selection bias. In this thesis, we propose two methods to deal with these issues. In the first part, we propose a method named 'Bayesian Weighted Mendelian Randomization (BWMR)' for causal inference using summary statistics from GWAS. In BWMR, we not only take into account the uncertainty of weak effects owing to polygenicity but also model the weak horizontal pleiotropic effects. Moreover, BWMR adopts a Bayesian reweighting strategy for detection of large pleiotropic outliers. An efficient algorithm based on variational inference was developed to make BWMR computationally efficient and stable. Because variational inference underestimates the variance, we further derived a closed form variance estimator inspired by a linear response method. We conducted several simulations to evaluate the performance of BWMR, demonstrating the advantage of BWMR over other methods. Then, we applied BWMR to assess causality between 126 metabolites and 90 complex traits, revealing novel causal relationships. In the second part, we further developed BWMR-C: statistical correction of selection bias for Mendelian Randomization based on a Bayesian weighted method. Based on the framework of BWMR, the probability model in BWMR-C is built conditional on the IV selection criteria. In this way, BWMR-C is designed to reduce the influence of the selection process on the causal effect estimates while preserving the good properties of BWMR. To make the causal inference computationally stable and efficient, we developed a variational EM algorithm. We conducted several comprehensive simulations to evaluate the performance of BWMR-C for correction of selection bias. Then, we applied BWMR-C to seven body fat distribution related traits and 140 UK Biobank traits. Our results show that BWMR-C achieves satisfactory performance for correcting selection bias. Keywords: Mendelian Randomization, polygenicity, horizontal pleiotropy, selection bias, variational inference.
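For orientation, the simplest summary-data MR estimator that weighted methods of this kind are usually compared against is the fixed-effect inverse-variance weighted (IVW) estimate. The sketch below is not BWMR and is not taken from the thesis; it is a minimal baseline that assumes independent, valid instruments, and the function name, argument names, and toy numbers are illustrative.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate (baseline, not BWMR).

    Assumes independent, valid instruments and ignores uncertainty in the
    SNP-exposure effects. All argument names are illustrative.
    """
    # Weight each SNP by the precision of its SNP-outcome estimate.
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    # Causal effect estimate and its standard error.
    return numerator / denominator, math.sqrt(1.0 / denominator)

# Toy summary statistics, made up for illustration only.
beta, se = ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01, 0.01, 0.01])
print(round(beta, 3), round(se, 4))
```

Because this baseline treats every instrument as valid, weak or pleiotropic instruments can bias it, which is the problem the abstract says BWMR is built to address.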
36

Li, Yong-Jun. "The application of statistical physics in bioinformatics /". View Abstract or Full-Text, 2003. http://library.ust.hk/cgi/db/thesis.pl?PHYS%202003%20LI.

Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2003.
Includes bibliographical references (leaves 55-58). Also available in electronic version. Access restricted to campus users.
37

McCaskie, Pamela Ann. "Multiple-imputation approaches to haplotypic analysis of population-based data with applications to cardiovascular disease". University of Western Australia. School of Population Health, 2008. http://theses.library.uwa.edu.au/adt-WU2008.0160.

Abstract:
[Truncated abstract] This thesis investigates novel methods for the genetic association analysis of haplotype data in samples of unrelated individuals, and applies these methods to the analysis of coronary heart disease and related phenotypes. Determining the inheritance pattern of genetic variants in studies of unrelated individuals can be problematic because family members of the studied individuals are often not available. For the analysis of individual genetic loci, no problem arises because the unit of interest is the observed genotype. When the unit of interest is the linear combination of alleles along one chromosome, inherited together in a haplotype, it is not always possible to determine with certainty the inheritance pattern, and therefore statistical methods to infer these patterns must be adopted. Due to genotypic heterozygosity, multiple possible haplotype configurations can often resolve an individual's genotype measures at multiple loci. When haplotypes are not known, but are inferred statistically, an element of uncertainty is thus inherent which, if not dealt with appropriately, can result in unreliable estimates of effect sizes in an association setting. The core aim of the research described in this thesis was to develop and implement a general method for haplotype-based association analysis using multiple imputation to appropriately deal with uncertainty in haplotype assignment. Regression-based approaches to association analysis provide flexible methods to investigate the influence of a covariate on a response variable, adjusting for the effects of other variables including interaction terms. ... These methods are then applied to models accommodating binary, quantitative, longitudinal and survival data. The performance of the multiple imputation method implemented was assessed using simulated data under a range of haplotypic effect sizes and genetic inheritance patterns. The multiple imputation approach performed better, on average, than ignoring haplotypic uncertainty, and provided estimates that in most cases were similar to those observed when haplotypes were known. The haplotype association methods developed in this thesis were used to investigate the genetic epidemiology of cardiovascular disease, utilising data for the cholesteryl ester transfer protein gene (CETP), the hepatic lipase (LIPC) gene and the 15-lipoxygenase (ALOX15) gene on a total of 6,487 individuals from three Western Australian studies. Results of these analyses suggested single nucleotide polymorphisms (SNPs) and haplotypes in the CETP gene were associated with increased plasma high-density lipoprotein cholesterol (HDL-C). SNPs in the LIPC gene were also associated with increased HDL-C and haplotypes in the ALOX15 gene were associated with risk of carotid plaque among individuals with premature CHD. The research presented in this thesis is both novel and important as it provides methods for the analysis of haplotypic associations with a range of response types, while incorporating information about haplotype uncertainty inherent in population-based studies. These methods are shown to perform well for a range of simulated and real data situations, and have been written into a statistical analysis package that has been freely released to the research community.
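As a rough illustration of how multiple imputation carries haplotype uncertainty into an effect estimate, the sketch below pools per-imputation regression estimates with Rubin's rules, the standard combining rules for multiple imputation. The abstract does not say which pooling rule the thesis uses, so this is an assumption; the function name and toy numbers are illustrative.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    estimates: the coefficient estimated in each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns (pooled estimate, total variance). Requires m >= 2.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # spread caused by imputation uncertainty
    total = within + (1 + 1 / m) * between
    return pooled, total

# Toy values: a haplotype effect estimated in three imputed datasets.
pooled, total = pool_rubin([0.10, 0.20, 0.30], [0.04, 0.04, 0.04])
print(pooled, total)
```

The between-imputation term is what an analysis of a single "best guess" haplotype assignment leaves out, which is why ignoring haplotype uncertainty tends to understate standard errors.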
38

Allchin, Lorraine Doreen May. "Statistical methods for mapping complex traits". Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:65f392ba-1b64-4b00-8871-7cee98809ce1.

Abstract:
The first section of this thesis addresses the problem of simultaneously identifying multiple loci that are associated with a trait, using a Bayesian Markov Chain Monte Carlo method. It is applicable to both case/control and quantitative data. I present simulations comparing the method to standard frequentist methods in human case/control and mouse QTL datasets, and show that in the case/control simulations the standard frequentist method outperforms my model for all but the highest-effect simulations, and that for the mouse QTL simulations my method performs as well as the frequentist method in some cases and worse in others. I also present analysis of real data and simulations applying my method to a simulated epistasis data set. The next section was inspired by the challenges involved in applying a Markov Chain Monte Carlo method to genetic data. It is an investigation into the performance and benefits of the Matlab Parallel Computing Toolbox, specifically its interface between the CUDA programming language and Matlab's higher-level language. CUDA is a language which allows computations to be carried out on the computer's graphics processing unit (GPU) rather than its central processing unit (CPU). The appeal of this toolbox is its ease of use, as few code adaptations are needed. The final project of this thesis was to develop an HMM for reconstructing the founders of sparsely sequenced inbred populations. The motivation here is that, whilst sequencing costs are rapidly decreasing, it is still prohibitively expensive to fully sequence a large number of individuals. It was proposed that, for populations descended from a known number of founders, it would be possible to sequence these individuals with a very low coverage, use a hidden Markov model (HMM) to represent the chromosomes as mosaics of the founders, then use these states to impute the missing data. For this I developed a Viterbi algorithm with a transition probability matrix based on recombination rate which changes for each observed state.
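The founder-mosaic idea in this abstract can be illustrated with a minimal sketch. It is not the thesis's model: the function names, the uniform-switch transition rule, the per-site error rate `err`, and the use of `None` for unsequenced sites are all assumptions made for illustration. The only feature taken from the abstract is that the transition probabilities change from site to site with the recombination rate.

```python
import math

def viterbi_founders(founders, observed, recomb, err=0.01):
    """Most likely founder label at each site of one chromosome.

    founders: list of K founder haplotypes, each a list of 0/1 alleles
    observed: list of 0/1 alleles, None where the site was not sequenced
    recomb:   recomb[t] is an assumed switch probability (0 < r < 1)
              between site t and site t + 1, so transitions vary by site
    err:      assumed per-site sequencing error rate
    """
    k, n = len(founders), len(observed)

    def log_emit(state, t):
        if observed[t] is None:      # a missing site carries no information
            return 0.0
        match = founders[state][t] == observed[t]
        return math.log(1.0 - err if match else err)

    # Uniform prior over founders at the first site.
    score = [math.log(1.0 / k) + log_emit(s, 0) for s in range(k)]
    back = []
    for t in range(1, n):
        r = recomb[t - 1]
        log_stay = math.log(1.0 - r + r / k)   # no switch, or switch to self
        log_switch = math.log(r / k)           # switch to one other founder
        new_score, pointers = [], []
        for s in range(k):
            cands = [score[p] + (log_stay if p == s else log_switch)
                     for p in range(k)]
            best = max(range(k), key=cands.__getitem__)
            pointers.append(best)
            new_score.append(cands[best] + log_emit(s, t))
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(k), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def impute(founders, observed, path):
    """Fill unsequenced sites with the allele of the inferred founder."""
    return [founders[s][t] if o is None else o
            for t, (o, s) in enumerate(zip(observed, path))]
```

Calling `viterbi_founders` with two founders, a few `None` entries in `observed`, and a list of per-interval switch probabilities returns one founder index per site; `impute` then fills the gaps from that path.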
39

Silver, Matthew. "Statistical methods in neuroimaging genetics : pathways sparse regression and cluster size inference". Thesis, Imperial College London, 2013. http://hdl.handle.net/10044/1/11124.

Abstract:
In the field of neuroimaging genetics, brain images are used as phenotypes in the search for genetic variants associated with brain structure or function. This search presents a formidable statistical challenge, not least because of the very high dimensionality of genotype and phenotype data produced by modern SNP (single nucleotide polymorphism) arrays and high resolution MRI. This thesis focuses on the use of multivariate sparse regression models such as the group lasso and sparse group lasso for the identification of gene pathways associated with both univariate and multivariate quantitative traits. The methods described here take particular account of various factors specific to pathways genome-wide association studies including widespread correlation (linkage disequilibrium) between genetic predictors, and the fact that many variants overlap multiple pathways. A resampling strategy that exploits finite sample variability is employed to provide robust rankings for pathways, SNPs and genes. Comprehensive simulation studies are presented comparing one proposed method, pathways group lasso with adaptive weights, to a popular alternative. This method is extended to the case of a multivariate phenotype, and the resulting pathways sparse reduced-rank regression model and algorithm is applied to a study identifying gene pathways associated with structural change in the brain characteristic of Alzheimer’s disease. The original model is also adapted for the task of ’pathways-driven’ SNP and gene selection, and this latter model, pathways sparse group lasso with adaptive weights, is applied in a search for SNPs and genes associated with elevated lipid levels in two separate cohorts of Asian adults. Finally, in a separate section an existing method for the identification of spatially extended clusters of image voxels with heightened activation is evaluated in an imaging genetic context. This method, known as cluster size inference, rests on a number of assumptions. 
Using real imaging and SNP data, false positive rates are found to be poorly controlled outside of a narrow range of parameters related to image smoothness and activation thresholds for cluster formation.
40

Kecskemetry, Peter D. "Computationally intensive methods for hidden Markov models with applications to statistical genetics". Thesis, University of Oxford, 2014. https://ora.ox.ac.uk/objects/uuid:8dd5d68d-27e9-4412-868c-0477e438a2c5.

Abstract:
In most fields of technology and science, the exponential increase of available data is an apparent trend. In genetics, the main contributor to this trend is the improving efficiency of sequencing technologies. Not long ago the Human Genome Project focused on assembling a single reference sequence; upcoming projects now aim to sequence a million genomes. The consequent computational challenge is being able to utilise this wealth of data, which requires the development of sufficiently powerful methods for analysis. However, the speed of transistor-based computing processors has recently hit a power ceiling, and developers can no longer rely on hardware improvements automatically translating into software performance improvements. The result is that analysis methods are failing to keep up with the speed of data generation, and in this age of exponential data explosion it is becoming critical to find any solution for improving the performance of statistical methods. One traditional approach is to apply approximations, often trading the quality of results for response time. Another approach is to achieve algorithmic optimisations for existing methods without sacrificing results. Unfortunately, the possibilities for purely algorithmic optimisations often tend to be limited. A third approach is to attempt to harness the computational power of the presently re-emerging field of parallel computing. While the theoretical performance of parallel platforms roughly follows Moore's law, exploiting the power of parallelism requires significant effort during development and may not even be possible in certain applications. This work attempts to explore avenues for achieving high performance for Hidden Markov Models (HMMs) and HMM applications in population genetics. The second chapter of this thesis introduces a single-locus variant of the IMPUTE2 method for calling and phasing genotype variants based on genotype likelihood data. This method uses both approximations and algorithmic optimisations and achieves performance improvements without a considerable drop in accuracy. It is also designed to be highly parallelisable. The third chapter presents GPGPU-focused parallelisation methods over the state space for HMM algorithms specifically under the Li and Stephens model, which is a widely and successfully used approximation of the coalescent. Practical experiments show 200- to 6000-fold acceleration with a CUDA implementation of the popular Chromopainter method, which is based on the Li and Stephens model. The last chapter explores the theoretical possibility of parallelising HMM algorithms across blocks of observations (inspired by but not limited to methods used in genetics). A novel view and derivation is presented for block parallelism, along with accompanying analyses of applicability and relevance. Performance analysis results indicate that the application of block parallelism is expected to be highly relevant for most large-scale HMM applications on present-day computing platforms, while block parallelism may become a necessity for utilising the improving power of parallel hardware in the near future.
41

Dilthey, Alexander Tilo. "Statistical HLA type imputation from large and heterogeneous datasets". Thesis, University of Oxford, 2012. http://ora.ox.ac.uk/objects/uuid:1bca18bf-b9d5-4777-b58e-a0dca4c9dbea.

Abstract:
An individual's Human Leukocyte Antigen (HLA) type is an essential immunogenetic parameter, influencing susceptibility to a variety of autoimmune and infectious diseases, to certain types of cancer and the likelihood of adverse drug reactions. I present and evaluate two models for the accurate statistical determination of HLA types for single-population and multi-population studies, based on SNP genotypes. Importantly, SNP genotypes are already available for many studies, so that the application of the statistical methods presented here does not incur any extra cost besides computing time. HLA*IMP:01 is based on a parallelized and modified version of LDMhc (Leslie et al., 2008), enabling the processing of large reference panels and improving call rates. In a homogeneous single-population imputation scenario on a mainly British dataset, it achieves accuracies (posterior predictive values) and call rates >=88% at all classical HLA loci (HLA-A, HLA-B, HLA-C, HLA-DQA1, HLA-DQB1, HLA-DRB1) at 4-digit HLA type resolution. HLA*IMP:02 is specifically designed to deal with multi-population heterogeneous reference panels and based on a new algorithm to construct haplotype graph models that takes into account haplotype estimate uncertainty, allows for missing data and enables the inclusion of prior knowledge on linkage disequilibrium. It works as well as HLA*IMP:01 on homogeneous panels and substantially outperforms it in more heterogeneous scenarios. In a cross-European validation experiment, even without setting a call threshold, HLA*IMP:02 achieves an average accuracy of 96% at 4-digit resolution (>=91% for all loci, which is achieved at HLA-DRB1). HLA*IMP:02 can accurately predict structural variation (DRB paralogs), can (to an extent) detect errors in the reference panel and is highly tolerant of missing data. 
I demonstrate that a good match between imputation and reference panels in terms of principal components and reference panel size are essential determinants of high imputation accuracy under HLA*IMP:02.
42

Sharif, Maarya. "Statistical issues in modelling the ancestry from Y-chromosome and surname data". Thesis, University of Glasgow, 2012. http://theses.gla.ac.uk/3407/.

Abstract:
A considerable industry has grown up around genealogical inference from genetic testing, supplementing more traditional genealogical techniques but with very limited quantification of uncertainty. In many societies Y-chromosomes are co-inherited with surnames and as such passed down from father to son. This thesis seeks to explore what this correlation can say about ancestry. In particular it is concerned with estimation of the time to the most recent common paternal ancestor (TMRCA) for pairs of males who are not known to be directly related but share the same surname, based on the repeat number at short tandem repeat (STR) markers on their Y-chromosomes. We develop a model of TMRCA estimation based on the difference in repeat numbers in pairs of male haplotypes using a Bayesian framework and Markov Chain Monte Carlo techniques, such as the adaptive Metropolis-Hastings algorithm. The model incorporates the process of STR discovery and the calibration of mutation rates, which can differ across STRs. In simulation studies, we find that the estimates of TMRCA are rather robust to the ascertainment process and the way in which it is modelled. However, they are affected by the site-specific mutation rates at the typed STRs. Indeed, sequencing the fastest-mutating STRs yields a lower error in the estimated TMRCA than random STRs. In the British context, we extend our model to include additional information such as the haplogroup status (as determined from single nucleotide polymorphisms, SNPs) of the pair of males, as well as the frequency and origin of the surname. In general, the effect of this is to reduce estimates of the TMRCA for pairs of males with an older TMRCA, typically outwith the period of surname establishment (about 500-700 years ago).
In the genealogical context, incorporating surname frequency (within the prior distribution) results in lower estimates of TMRCA for pairs of males who appear to have diverged from a common male ancestor since the period of surname establishment. In addition, we include uncertainty in the years per generation conversion factor in our model.
43

Fernandez, Daniel. "Cell States and Cell Fate: Statistical and Computational Models in (Epi)Genomics". Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:14226043.

Abstract:
This dissertation develops and applies several statistical and computational methods to the analysis of Next Generation Sequencing (NGS) data in order to gain a better understanding of our biology. In the rest of the chapter we introduce key concepts in molecular biology, and recent technological developments that help us better understand this complex science, which, in turn, provide the foundation and motivation for the subsequent chapters. In the second chapter we present the problem of estimating gene/isoform expression at the allelic level, and different models to solve this problem. First, we describe the observed data and the computational workflow to process the data. Next, we propose frequentist and Bayesian models motivated by the central dogma of molecular biology and the data generating process (DGP) for RNA-Seq. We develop EM and Gibbs sampling approaches to estimate gene and transcript-specific expression from our proposed models. Finally, we present the performance of our models in simulations and we end with the analysis of experimental RNA-Seq data at the allelic level. In the third chapter we present our paired factorial experimental design to study parentally biased gene/isoform expression in the mouse cerebellum, and dynamic changes of this pattern between young and adult stages of cerebellar development. We present a Bayesian variable selection model to estimate the difference in expression between the paternal and maternal genes, while incorporating relevant factors and their interactions into the model. Next, we apply our model to our experimental data, and further on we validate our predictions using pyrosequencing follow-up experiments. We subsequently applied our model to the pyrosequencing data across multiple brain regions. Our method, combined with the validation experiments, allowed us to find novel imprinted genes, and investigate, for the first time, imprinting dynamics across brain regions and across development. In the fourth chapter we move from the controlled experiments in mouse isogenic lines to the highly variant world of human genetics in observational studies. In this chapter we introduce a Bayesian Regression Allelic Imbalance Model, BRAIM, that estimates the imbalance coming from two major sources: cis-regulation and imprinting. We model the cis-effect as an additive effect for the heterozygous group and we model the parent-of-origin effect with a latent variable that indicates to which parent a given allele belongs. Next, we show the performance of the model under simulation scenarios, and finally we apply the model to several experiments across multiple tissues and multiple individuals. In the fifth chapter we characterize the transcriptional regulation and gene expression of in-vitro Embryonic Stem Cells (ESCs) and two related in-vivo tissues: the Inner Cell Mass (ICM) tissue and the embryonic tissue at day 6.5. Our objective is twofold. First, we would like to understand the differences in gene expression between the ESCs and the in-vivo counterpart from which these cells were derived (ICM). Second, we want to characterize the active transcriptional regulatory regions using several histone modifications and to connect such regulatory activity with gene expression. In this chapter we used several statistical and computational methods to analyze and visualize the data, and it provides a good showcase of how, by combining several methods of analysis, we can delve into interesting developmental biology.
44

Crisci, Jessica L. "On Identifying Signatures of Positive Selection in Human Populations: A Dissertation". eScholarship@UMMS, 2013. https://escholarship.umassmed.edu/gsbs_diss/664.

Abstract:
As sequencing technology continues to produce better quality genomes at decreasing costs, there has been a recent surge in the variety of data that we are now able to analyze. This is particularly true with regard to our understanding of the human genome, where the last decade has seen data advances in primate epigenomics, ancient hominid genomics, and a proliferation of human polymorphism data from multiple populations. In order to utilize such data, however, it has become critical to develop increasingly sophisticated tools spanning both bioinformatics and statistical inference. In population genetics particularly, new statistical approaches for analyzing population data are constantly being developed, unfortunately often without proper model testing and evaluation of type-I and type-II error. Because the common Wright-Fisher assumptions underlying such models are generally violated in natural populations, this statistical testing is critical. Thus, my dissertation has two distinct but related themes: 1) evaluating methods of statistical inference in population genetics, and 2) utilizing these methods to analyze the evolutionary history of humans and our closest relatives. The resulting collection of work has not only provided important biological insights (including some of the first strong evidence of selection on human-specific epigenetic modifications (Shulha, Crisci, Reshetov, Tushir et al. 2012, PLoS Bio), and a characterization of human-specific genetic changes distinguishing modern humans from Neanderthals (Crisci et al. 2011, GBE)), but also important insights into the performance of population genetic methodologies which will motivate the future development of improved approaches for statistical inference (Crisci et al., in review).
45

Crisci, Jessica L. "On Identifying Signatures of Positive Selection in Human Populations: A Dissertation". eScholarship@UMMS, 2006. http://escholarship.umassmed.edu/gsbs_diss/664.

Abstract:
As sequencing technology continues to produce better quality genomes at decreasing costs, there has been a recent surge in the variety of data that we are now able to analyze. This is particularly true with regard to our understanding of the human genome, where the last decade has seen data advances in primate epigenomics, ancient hominid genomics, and a proliferation of human polymorphism data from multiple populations. In order to utilize such data, however, it has become critical to develop increasingly sophisticated tools spanning both bioinformatics and statistical inference. In population genetics particularly, new statistical approaches for analyzing population data are constantly being developed, unfortunately often without proper model testing and evaluation of type-I and type-II error. Because the common Wright-Fisher assumptions underlying such models are generally violated in natural populations, this statistical testing is critical. Thus, my dissertation has two distinct but related themes: 1) evaluating methods of statistical inference in population genetics, and 2) utilizing these methods to analyze the evolutionary history of humans and our closest relatives. The resulting collection of work has not only provided important biological insights (including some of the first strong evidence of selection on human-specific epigenetic modifications (Shulha, Crisci, Reshetov, Tushir et al. 2012, PLoS Bio), and a characterization of human-specific genetic changes distinguishing modern humans from Neanderthals (Crisci et al. 2011, GBE)), but also important insights into the performance of population genetic methodologies which will motivate the future development of improved approaches for statistical inference (Crisci et al., in review).
46

Katsumata, Yuriko. "STATISTICAL ANALYSES TO DETECT AND REFINE GENETIC ASSOCIATIONS WITH NEURODEGENERATIVE DISEASES". UKnowledge, 2017. https://uknowledge.uky.edu/epb_etds/17.

Abstract:
Dementia is a clinical state caused by neurodegeneration and characterized by a loss of function in cognitive domains and behavior. Alzheimer’s disease (AD) is the most common form of dementia. Although the amyloid β (Aβ) protein and hyperphosphorylated tau aggregates in the brain are considered to be the key pathological hallmarks of AD, the exact cause of AD is yet to be identified. In addition, clinical diagnoses of AD can be error prone. Many previous studies have compared the clinical diagnosis of AD against the gold standard of autopsy confirmation and shown substantial AD misdiagnosis. Hippocampal sclerosis of aging (HS-Aging) is one type of dementia that is often clinically misdiagnosed as AD. AD and HS-Aging are controlled by different genetic architectures. Familial AD, which often occurs early in life, is linked mainly to mutations in three genes: APP, PSEN1, and PSEN2. Late-onset AD (LOAD) is strongly associated with the ε4 allele of the apolipoprotein E (APOE) gene. In addition to the APOE gene, genome-wide association studies (GWAS) have identified several single nucleotide polymorphisms (SNPs) in or close to some genes associated with LOAD. On the other hand, GRN, TMEM106B, ABCC9, and KCNMB2 have been reported to harbor risk alleles associated with HS-Aging pathology. Although GWAS have succeeded in revealing numerous susceptibility variants for dementias, it is an ongoing challenge to identify functional loci and to understand how they contribute to dementia pathogenesis. Until recently, rare variants were not investigated comprehensively. GWAS rely on genotype imputation, which is not reliable for rare variants. Therefore, imputed rare variants are typically removed from GWAS analysis. Recent advances in sequencing technologies enable accurate genotyping of rare variants, thus potentially improving our understanding of the role of rare variants in disease. There are significant computational and statistical challenges for these sequencing studies. Traditional single variant-based association tests are underpowered to detect rare variant associations. Instead, more powerful and computationally efficient approaches for aggregating the effects of rare variants have become a standard approach for association testing. The sequence-kernel association test (SKAT) is one of the most powerful rare variant analysis methods. A recently proposed scan-statistic-based test is another approach to detect the location of rare variant clusters influencing disease. In the first study, we examined the gene-based associations of the four putative risk genes, GRN, TMEM106B, ABCC9, and KCNMB2, with HS-Aging pathology. We analyzed haplotype associations of a targeted ABCC9 region with HS-Aging pathology and with ABCC9 gene expression. In the second study, we elucidated the role of the non-coding SNPs identified in the International Genomics of Alzheimer’s Project (IGAP) consortium GWAS within a systems genetics framework to understand the flow of biological information underlying AD. In the last study, we identified genetic regions which contain rare variants associated with AD using a scan-statistic-based approach.
47

Silva, Heyder Diniz. "Aspectos biométricos da detecção de QTL'S ("Quantitative Trait Loci") em espécies cultivadas". Universidade de São Paulo, 2001. http://www.teses.usp.br/teses/disponiveis/11/11134/tde-18102002-162652/.

Abstract:
O mapeamento de QTL's difere dos demais tipos de pesquisas conduzidas em genética. Por se tratar basicamente de um procedimento de testes múltiplos, surge, neste contexto, um problema que se refere ao nível de significância conjunto da análise, e consequentemente, seu poder. Deste modo, avaliou-se, via simulação computacional de dados, o poder de detecção de QTL's da análise de marcas simples, realizada por meio de regressão linear múltipla, utilizando o procedimento "stepwise" para seleção das marcas e procedimentos baseados em testes individuais, utilizando os critérios FDR e de Bonferroni para determinação do nível de significância conjunto. Os resultados mostraram que o procedimento baseado em regressão múltipla, utilizando o procedimento "stepwise", foi mais poderoso em identificar as marcas associadas a QTL's e, mesmo nos casos em que este procedimento apresentou poder ligeiramente inferior aos demais, verificou-se que o mesmo tem como grande vantagem selecionar apenas as marcas mais fortemente ligadas aos QTL's. Dentre os critérios FDR e de Bonferroni, o primeiro mostrou-se, em geral, mais poderoso, devendo ser adotado nos procedimentos de mapeamento por intervalo. Outro problema encontrado na análise de QTL's refere-se à abordagem da interação QTL's x ambientes. Neste contexto, apresentou-se uma partição da variância da interação genótipos x ambientes em efeitos explicados pelos marcadores e desvios, a partir da qual obtiveram-se os estimadores da proporção da variância genética (pm), e da variância da interação genótipos x ambientes (pms), explicadas pelos marcadores moleculares. Estes estimadores independem de desvios das frequências alélicas dos marcadores em relação às esperadas (1:2:1 em uma geração F2, 1:1 em um retrocruzamento, etc.), porém, apresentam uma alta probabilidade de obtenção de estimativas fora do intervalo paramétrico, principalmente para valores elevados destas proporções. Contudo, estas probabilidades podem ser reduzidas com o aumento do número de repetições e/ou de ambientes nos quais as progênies são avaliadas. A partir de um conjunto de dados de produtividade de grãos, referentes à avaliação de 68 progênies de milho, genotipadas para 77 marcadores moleculares codominantes e avaliadas em quatro ambientes, verificou-se que as metodologias apresentadas permitiram estimar as proporções pm e pms, bem como classificar as marcas associadas a QTL's, conforme seu nível de interação. O procedimento permitiu ainda a identificação de regiões cromossômicas envolvidas no controle genético do caractere sob estudo conforme sua maior ou menor estabilidade ao longo dos ambientes.
In general terms, QTL mapping differs from other research activities in genetics. Being basically a multiple test procedure, problems arise which are related to the joint level of significance of the analysis, and consequently, to its power. Using computational simulation of data, the power of simple marker analysis, carried out through multiple linear regression, using stepwise procedures to select the markers, was obtained. Procedures based on single tests, using both the FDR and the Bonferroni criteria to determine the joint level of significance, were also used. Results showed that the procedure based on multiple regression, using the stepwise technique, was the most powerful in identifying markers associated with QTL's. However, in cases where its power was smaller, its advantage was the ability to detect only markers strongly associated with QTL's. In comparison with the Bonferroni method, the FDR criterion was in general more powerful, and should be adopted in the interval mapping procedures. Additional problems found in the QTL analysis refer to the QTL x environment interaction. We consider this aspect by partitioning the genotype x environment interaction variance into components explained by the molecular markers and deviations. This allowed estimating the proportion of the genetic variance (pm), and genotype x environment variance (pms), explained by the markers. These estimators are not affected by deviations of allelic frequencies of the markers in relation to the expected values (1:2:1 in an F2 generation, 1:1 in a backcross, etc.). However, there is a high probability of obtaining estimates outside the parametric range, especially for high values of these proportions. Nevertheless, these probabilities can be reduced by increasing the number of replications and/or environments where the progenies are evaluated. Based on a set of grain yield data, obtained from the evaluation of 68 maize progenies genotyped for 77 codominant molecular markers, and evaluated as top crosses in four environments, the presented methodologies allowed estimating proportions pm and pms as well as the classification of markers associated with QTL's, with respect to their level of genotype x environment interaction. The procedure also allowed the identification of chromosomal regions involved in the genetic control of the considered trait, according to their stability in relation to the observed environmental variation.
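The contrast between the Bonferroni and FDR criteria mentioned in this abstract can be sketched as follows. This is only an illustration of the two standard rules, with the FDR criterion assumed to be the Benjamini-Hochberg step-up procedure (the abstract does not say which FDR procedure was used); the function names and the example p-values are hypothetical and not taken from the thesis.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject each test whose p-value is at most alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up rule (assumed here as the FDR criterion).

    Sort the p-values, find the largest rank i with p_(i) <= alpha * i / m,
    and reject the i smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected

# Hypothetical single-marker p-values, for illustration only.
marker_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042,
                  0.060, 0.074, 0.205, 0.212, 0.216]
n_bonferroni = sum(bonferroni(marker_pvalues))
n_fdr = sum(benjamini_hochberg(marker_pvalues))
```

Because every Bonferroni rejection also satisfies the first Benjamini-Hochberg threshold, the FDR rule never rejects fewer markers at the same alpha, which is consistent with the greater power reported above.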
48

Ramasamy, Adaikalavan. "Increasing statistical power and generalizability in genomics microarray research". Thesis, University of Oxford, 2009. http://ora.ox.ac.uk/objects/uuid:81ccede7-a268-4c7a-9bf8-a2b68634846d.

Abstract:
The high-throughput technologies developed in the last decade have revolutionized the speed of data accumulation in the life sciences. As a result we have very rich and complex data that hold great promise for solving many complex biological questions. One such technology that is very well established and widespread is DNA microarrays, which allow one to simultaneously measure the expression levels of tens of thousands of genes in a biological tissue. This thesis aims to contribute to the development of statistics that allow the end users to obtain robust and meaningful results from DNA microarrays for further investigation. The methodology, implementation and pragmatic issues of two important and related topics, sample size estimation for designing new studies and meta-analysis of existing studies, are presented here to achieve this aim. Real-life case studies and guided steps are also given. Sample size estimation is important at the design stage to ensure a study has sufficient statistical power to address the stated objective given the financial constraints. The commonly used formula for estimating the number of biological samples, its shortcomings and potential amelioration are discussed. The optimal number of biological samples and number of measurements per sample that minimizes the cost is also presented. Meta-analysis, or the synthesis of information from existing studies, is very attractive because it can increase the statistical power by making comprehensive and inexpensive use of available information. Furthermore, one can also easily test the generalizability of findings (i.e. the extent to which results from a particular valid study can be applied to other circumstances). The key issues in conducting a meta-analysis for microarray studies, a checklist and R code are presented here. Finally, the poor availability of raw data in microarray studies is discussed here with recommendations for authors, journal editors and funding bodies. Good availability of data is important for meta-analysis in order to avoid biased results and for sample size estimation.
49

AMALAPURAPU, SUCHITRA S. "A STATISTICAL ANALYSIS OF AMINO ACID CHANGES IN THE HUMAN GENOME". University of Cincinnati / OhioLINK, 2003. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1051720394.

50

He, Karen Yingyi. "DETECTING LOW FREQUENCY AND RARE VARIANTS ASSOCIATED WITH BLOOD PRESSURE". Case Western Reserve University School of Graduate Studies / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=case157435735160471.
