Dissertations / Theses on the topic 'Genetic algorithms – Statistical methods'


Consult the top 50 dissertations / theses for your research on the topic 'Genetic algorithms – Statistical methods.'


1

Czarn, Andrew Simon Timothy. "Statistical exploratory analysis of genetic algorithms." University of Western Australia. School of Computer Science and Software Engineering, 2008. http://theses.library.uwa.edu.au/adt-WU2008.0030.

Abstract:
[Truncated abstract] Genetic algorithms (GAs) have been extensively used and studied in computer science, yet there is no generally accepted methodology for exploring which parameters significantly affect performance, whether there is any interaction between parameters and how performance varies with respect to changes in parameters. This thesis presents a rigorous yet practical statistical methodology for the exploratory study of GAs. This methodology addresses the issues of experimental design, blocking, power and response curve analysis. It details how statistical analysis may assist the investigator along the exploratory pathway. The statistical methodology is demonstrated in this thesis using a number of case studies with a classical genetic algorithm with one-point crossover and bit-replacement mutation. In doing so we answer a number of questions about the relationship between the performance of the GA and the operators and encoding used. The methodology is suitable, however, to be applied to other adaptive optimization algorithms not treated in this thesis. In the first instance, as an initial demonstration of our methodology, we describe case studies using four standard test functions. It is found that the effect upon performance of crossover is predominantly linear while the effect of mutation is predominantly quadratic. Higher order effects are noted but contribute less to overall behaviour. In the case of crossover both positive and negative gradients are found which suggests using rates as high as possible for some problems while possibly excluding it for others. .... This is illustrated by showing how the use of Gray codes impedes the performance on a lower modality test function compared with a higher modality test function. Computer animation is then used to illustrate the actual mechanism by which this occurs. Fourthly, the traditional concept of a GA is that of selection, crossover and mutation. 
However, a limited amount of data from the literature has suggested that the niche for the beneficial effect of crossover upon GA performance may be smaller than has traditionally been held. Based upon previous results on not-linear-separable problems an exploration is made by comparing two test problem suites, one comprising non-rotated functions and the other comprising the same functions rotated by 45 degrees in the solution space rendering them not-linear-separable. It is shown that for the difficult rotated functions the crossover operator is detrimental to the performance of the GA. It is conjectured that what makes a problem difficult for the GA is complex and involves factors such as the degree of optimization at local minima due to crossover, the bias associated with the mutation operator and the Hamming Distances present in the individual problems due to the encoding. Furthermore, the GA was tested on a real world landscape minimization problem to see if the results obtained would match those from the difficult rotated functions. It is demonstrated that they match and that the features which make certain of the test functions difficult are also present in the real world problem. Overall, the proposed methodology is found to be an effective tool for revealing relationships between a randomized optimization algorithm and its encoding and parameters that are difficult to establish from more ad-hoc experimental studies alone.
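The classical GA named in the abstract above (one-point crossover plus bit-replacement mutation) can be sketched roughly as follows. This is a minimal illustration, not the thesis's experimental setup: binary tournament selection, the OneMax fitness function, and all parameter values are assumptions chosen for brevity, and bit-replacement mutation is assumed to mean replacing a bit with a freshly drawn random bit.

```python
import random

def one_point_crossover(a, b, rng):
    """Swap the tails of two equal-length bit lists at a random cut point."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def bit_replacement_mutation(bits, rate, rng):
    """With probability `rate`, replace each bit by a freshly drawn random bit."""
    return [rng.randint(0, 1) if rng.random() < rate else bit for bit in bits]

def tournament(pop, fitness, rng):
    """Binary tournament selection (an assumption; not stated in the abstract)."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def run_ga(fitness, n_bits=32, pop_size=40, generations=60,
           crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Generational GA maximising `fitness`; all defaults are illustrative."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = tournament(pop, fitness, rng), tournament(pop, fitness, rng)
            if rng.random() < crossover_rate:
                c1, c2 = one_point_crossover(p1, p2, rng)
            else:
                c1, c2 = p1[:], p2[:]
            nxt.append(bit_replacement_mutation(c1, mutation_rate, rng))
            nxt.append(bit_replacement_mutation(c2, mutation_rate, rng))
        pop = nxt[:pop_size]
    return max(pop, key=fitness)

# OneMax (count of 1-bits) is a toy stand-in for the thesis's test functions.
best = run_ga(fitness=sum)
print(sum(best), "of", len(best), "bits set")
```

Varying `crossover_rate` and `mutation_rate` over a designed grid of runs, with several seeds per setting, is the kind of experiment whose response curves the abstract describes analysing.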
2

Shen, Rujun (沈汝君). "Mining optimal technical trading rules with genetic algorithms." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2011. http://hub.hku.hk/bib/B47870011.

Abstract:
In recent years, technical trading rules have become widely known: not only academics but also many investors have learned to apply them in financial markets. One approach to constructing technical trading rules is to use technical indicators, such as moving averages (MA) and filter rules. These trading rules are widely used, possibly because the technical indicators are simple to compute and can be programmed easily. An alternative approach is to rely on chart patterns. However, the patterns and signals detected by these rules are often identified by visual inspection. As far as I know, there are no universally accepted methods of constructing chart patterns. In 2000, Prof. Andrew Lo and his colleagues were the first to define five pairs of chart patterns mathematically: Head-and-Shoulders (HS) and Inverted Head-and-Shoulders (IHS), Broadening tops (BTOP) and bottoms (BBOT), Triangle tops (TTOP) and bottoms (TBOT), Rectangle tops (RTOP) and bottoms (RBOT), and Double tops (DTOP) and bottoms (DBOT). The basic formulation of a chart pattern consists of two steps: detection of (i) the extreme points of a price series; and (ii) the shape of the pattern. In Lo et al. (2000), kernel smoothing was used to identify the extreme points. Lo et al. (2000) acknowledged that the optimal bandwidth used in the kernel method is not the best choice and that expert judgement is needed in choosing the bandwidth. In addition, their work considered chart pattern detection only, not buy/sell signal detection. It should be noted that a chart pattern can form without a signal being detected, in which case no transaction is made. In this thesis, I propose a new class of technical trading rules which aims to resolve the above problems. More specifically, each chart pattern is parameterized by a set of parameters which governs the shape of the pattern and the entry and exit signals of trades.
Then the optimal set of parameters can be determined by using genetic algorithms (GAs). The advantage of GAs is that they can deal with high-dimensional optimization problems regardless of whether the parameters to be optimized are continuous or discrete. In addition, GAs are convenient to use when the fitness function is not differentiable or has a multi-modal surface.
Statistics and Actuarial Science
Master
Master of Philosophy
3

Barreau, Thibaud. "Strategic optimization of a global bank capital management using statistical methods on open data." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-273413.

Abstract:
This project is about the optimization of the capital management of a French global bank. Capital management corresponds here to allocating the available capital to the different business units. In this project, I focus on the optimization of the allocation of the risk weighted assets (RWA) between some of the business units of the bank, as a representation of the allocated capital. Emphasis is put on the market and retail part of the bank and the first step was to be able to model the evolution of a business unit given an economic environment. The second one was about optimizing the distribution of RWA among the selected parts of the bank.
The subject of this project is the optimization of capital allocation within a French global bank. Capital management here refers to how capital is distributed among the bank's different departments. In this project, I focus on optimizing the allocation of risk-weighted assets (RWA) among some of the bank's units, as a representation of the allocated capital. The thesis focuses mainly on the retail part of the bank. The first step was to model the evolution of a business unit given an economic environment. The second step was to try to optimize the distribution of RWA among the selected business units.
4

Larsen, Ross Allen Andrew. "Food Shelf Life: Estimation and Experimental Design." Diss., 2006. http://contentdm.lib.byu.edu/ETD/image/etd1315.pdf.

5

Herrington, Hira B. "A Heuristic Evolutionary Method for the Complementary Cell Suppression Problem." NSUWorks, 2015. http://nsuworks.nova.edu/gscis_etd/28.

Abstract:
Cell suppression is a common method for disclosure avoidance used to protect sensitive information in two-dimensional tables where row and column totals are published along with non-sensitive data. In tables with only positive cell values, cell suppression has been demonstrated to be NP-hard. Therefore, finding more efficient methods for producing low-cost solutions is an area of active research. Genetic algorithms (GAs) have been shown to be effective in finding good solutions to the cell suppression problem. However, these methods have the shortcoming that they tend to produce a large proportion of infeasible solutions. The primary goal of this research was to develop a GA that produced low-cost solutions with fewer infeasible solutions created at each generation than previous methods, without introducing excessive CPU runtime costs. This research involved developing a GA that produces low-cost solutions with fewer infeasible solutions at each generation, and implementing selection and replacement operations that maintain genetic diversity during the evolution process. The GA's performance was tested using tables containing 10,000 and 100,000 cells. The primary criteria for evaluating the effectiveness of the GA were the total cost of the complementary suppressions and the CPU runtime. Experimental results indicate that the GA-based method developed in this dissertation produced better quality solutions than those produced by extant heuristics. Because existing heuristics are very effective, this GA-based method was able to surpass them only modestly. Existing evolutionary methods have also been used to improve upon the quality of solutions produced by heuristics. Experimental results show that the GA-based method developed in this dissertation is computationally more efficient than GA-based methods proposed in the literature.
This is attributed to the fact that the specialized genetic operators designed in this study produce fewer infeasible solutions. The results of these experiments suggest the need for continued research into non-probabilistic methods to seed the initial populations; selection and replacement strategies that factor in genetic diversity on the level of the circuits protecting sensitive cells; solution-preserving crossover and mutation operators; and the use of cost-benefit ratios to determine program termination.
6

Zhang, Ge. "Statistical Methods in Genetic Association." University of Cincinnati / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1196099744.

7

Valenzuela-Del Rio, Jose Eugenio. "Bayesian adaptive sampling for discrete design alternatives in conceptual design." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/50263.

Abstract:
The number of technology alternatives has lately grown to satisfy the increasingly demanding goals in modern engineering. These technology alternatives are handled in the design process as either concepts or categorical design inputs. Additionally, designers desire to bring into early design more and more accurate, but also computationally burdensome, simulation tools to obtain better performing initial designs that are more valuable in subsequent design stages. This constrains the computational budget available to optimize the design space. These two factors reveal the need for a conceptual design methodology that uses sophisticated tools more efficiently for engineering problems with several concept solutions and categorical design choices. Enhanced initial designs and discrete alternative selection are pursued. Advances in computational speed and the development of Bayesian adaptive sampling techniques have enabled the industry to move from the use of look-up tables and simplified models to complex physics-based tools in conceptual design. These techniques focus computational resources on promising design areas. Nevertheless, the vast majority of the work has been done on problems with continuous spaces, whereas concepts and categories are treated independently. However, observations show that engineering objectives experience similar topographical trends across many engineering alternatives. In order to address these challenges, two meta-models are developed. The first one borrows the Hamming distance and function space norms from machine learning and functional analysis, respectively. These distances allow defining categorical metrics that are used to build a unique probabilistic surrogate whose domain includes not only continuous and integer variables but also categorical ones. The second meta-model is based on a multi-fidelity approach that enhances a concept prediction with previous concept observations.
These methodologies leverage similar trends seen from observations and make better use of sample points, increasing the quality of the output in the discrete alternative selection and initial designs for a given analysis budget. An extension of stochastic mixed-integer optimization techniques to include the categorical dimension is developed by adding appropriate generation, mutation, and crossover operators. The resulting stochastic algorithm is employed to adaptively sample mixed-integer-categorical design spaces. The proposed surrogates are compared against traditional independent methods for a set of canonical problems and a physics-based rotor-craft model on a screened design space. Next, adaptive sampling algorithms on the developed surrogates are applied to the same problems. These tests provide evidence of the merit of the proposed methodologies. Finally, a multi-objective rotor-craft design application is performed in a large domain space. This thesis provides several novel academic contributions. The first contribution is the development of new efficient surrogates for systems with categorical design choices. Secondly, an adaptive sampling algorithm is proposed for systems with mixed-integer-categorical design spaces. Finally, previously sampled concepts can be used to construct efficient surrogates of novel concepts. With engineering judgment, the design community could apply these contributions to discrete alternative selection and initial design assessment when similar topographical trends are observed across different categories and/or concepts. These contributions could also be crucial in overcoming the current cost of carrying a set of concepts and wider design spaces in the categorical dimension forward into preliminary design.
8

Rogers, Alex. "Modelling genetic algorithms and evolving populations." Thesis, University of Southampton, 2000. https://eprints.soton.ac.uk/261289/.

Abstract:
A formalism for modelling the dynamics of genetic algorithms using methods from statistical physics, originally due to Prügel-Bennett and Shapiro, is extended to ranking selection, a form of selection commonly used in the genetic algorithm community. The extension allows a reduction in the number of macroscopic variables required to model the mean behaviour of the genetic algorithm. This reduction allows a more qualitative understanding of the dynamics to be developed without sacrificing quantitative accuracy. The work is extended beyond modelling the dynamics of the genetic algorithm. A caricature of an optimisation problem with many local minima is considered: the basin with a barrier problem. The first passage time (the time required to escape the local minima to the global minimum) is calculated and insights are gained as to how the genetic algorithm is searching the landscape. The interaction of the various genetic algorithm operators, and how these interactions give rise to optimal parameter values, is studied.
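For readers unfamiliar with ranking selection, the sketch below shows one common variant, linear ranking, in which an individual's selection probability depends only on its fitness rank and not on the fitness value itself. The abstract does not say which ranking scheme the thesis analyses, so the linear form, the `pressure` parameter, and the function names are assumptions for illustration.

```python
import random

def linear_ranking_probs(n, pressure=1.5):
    """Selection probabilities for ranks 0 (worst) to n-1 (best).

    `pressure` lies in [1, 2]: 1 gives uniform selection, 2 gives the
    worst individual zero probability.
    """
    if n == 1:
        return [1.0]
    return [(2 - pressure + 2 * (pressure - 1) * r / (n - 1)) / n
            for r in range(n)]

def rank_select(population, fitness, k, pressure=1.5, rng=random):
    """Draw k individuals with probability determined only by fitness rank."""
    ranked = sorted(population, key=fitness)  # worst first
    probs = linear_ranking_probs(len(ranked), pressure)
    return rng.choices(ranked, weights=probs, k=k)

# Toy population of real numbers, with fitness equal to the value itself.
pop = [3.0, -1.0, 10.0, 4.5]
print(linear_ranking_probs(len(pop)))
print(rank_select(pop, fitness=lambda x: x, k=4, rng=random.Random(0)))
```

Because only ranks matter, rescaling or shifting the fitness function leaves the selection probabilities unchanged.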
9

Shar, Nisar Ahmed. "Statistical methods for predicting genetic regulation." Thesis, University of Leeds, 2016. http://etheses.whiterose.ac.uk/16729/.

Abstract:
Transcriptional regulation of gene expression is essential for cellular differentiation and function, and defects in the process are associated with cancer. Transcription is regulated by cis-acting regulatory regions and trans-acting regulatory elements. Transcription factors (TFs) bind to enhancers and repressors and form complexes by interacting with each other to control the expression of genes. Understanding the regulation of genes would help us to understand the biological system and can be helpful in identifying therapeutic targets for diseases such as cancer. The ENCODE project has mapped the binding sites of many TFs in some important cell types and has also mapped DNase I hypersensitivity sites across the cell types. Predicting the mutual interactions of transcription factors would help us find potential transcription regulatory networks. Here, we have developed two methods for predicting the mutual interactions of transcription factors from ENCODE ChIP-seq data; both methods generated similar results, which supports the accuracy of the methods. It is known that functional regions of the genome are conserved, and here we found that shared/overlapping transcription factor binding sites in multiple cell types and in transcription factor pairs are more conserved than their respective non-shared/non-overlapping binding sites. It has also been observed that co-binding sites influence the expression level of genes. Most of the genes mapped to transcription factor co-binding sites have a significantly higher level of expression than genes mapped to sites bound by a single transcription factor. The ENCODE data suggest a very large number of potential regulatory sites across the complete genome in many cell types, and methods are needed to identify those that are most relevant and to connect them to the genes that they control.
A penalized regression method, LASSO, was used to build correlative models, choose two regulatory regions that are predictive of gene expression, and link them to their respective gene. Here, we show that our identified regulatory regions accumulate a significant number of somatic mutations that occur in cancer cells, suggesting that their effects may drive cancer initiation and development. The harboring of somatic mutations in these identified regulatory regions is an indication of positive selection, which has also been observed in cancer-related genes.
10

Pittman, Jennifer L. "Adaptive splines and genetic algorithms for optimal statistical modeling." 2000. http://www.etda.libraries.psu.edu/theses/approved/WorldWideIndex/ETD-23/index.html.

11

Rattray, Magnus. "Modelling the dynamics of genetic algorithms using statistical mechanics." Thesis, University of Manchester, 1996. http://publications.aston.ac.uk/598/.

Abstract:
A formalism for modelling the dynamics of Genetic Algorithms (GAs) using methods from statistical mechanics, originally due to Prügel-Bennett and Shapiro, is reviewed, generalized and improved upon. This formalism can be used to predict the averaged trajectory of macroscopic statistics describing the GA's population. These macroscopics are chosen to average well between runs, so that fluctuations from mean behaviour can often be neglected. Where necessary, non-trivial terms are determined by assuming maximum entropy with constraints on known macroscopics. Problems of realistic size are described in compact form and finite population effects are included, often proving to be of fundamental importance. The macroscopics used here are cumulants of an appropriate quantity within the population and the mean correlation (Hamming distance) within the population. Including the correlation as an explicit macroscopic provides a significant improvement over the original formulation. The formalism is applied to a number of simple optimization problems in order to determine its predictive power and to gain insight into GA dynamics. Problems which are most amenable to analysis come from the class where alleles within the genotype contribute additively to the phenotype. This class can be treated with some generality, including problems with inhomogeneous contributions from each site, non-linear or noisy fitness measures, simple diploid representations and temporally varying fitness. The results can also be applied to a simple learning problem, generalization in a binary perceptron, and a limit is identified for which the optimal training batch size can be determined for this problem. The theory is compared to averaged results from a real GA in each case, showing excellent agreement if the maximum entropy principle holds. Some situations where this approximation breaks down are identified.
In order to fully test the formalism, an attempt is made on the strongly NP-hard problem of storing random patterns in a binary perceptron. Here, the relationship between the genotype and phenotype (training error) is strongly non-linear. Mutation is modelled under the assumption that perceptron configurations are typical of perceptrons with a given training error. Unfortunately, this assumption does not provide a good approximation in general. It is conjectured that perceptron configurations would have to be constrained by other statistics in order to accurately model mutation for this problem. Issues arising from this study are discussed in conclusion and some possible areas of further research are outlined.
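The macroscopics named in the abstract above, cumulants of a population quantity and the mean Hamming distance within the population, can be measured on a concrete population as in the following sketch. It uses plain (biased) sample moments and ignores the finite-population corrections the thesis treats; the toy bit-string population and the choice of "number of 1-bits" as the quantity are assumptions for illustration.

```python
from itertools import combinations

def cumulants(values):
    """First four cumulants of `values`, computed from plain sample moments."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2, m3, m4 = (sum((v - mean) ** p for v in values) / n for p in (2, 3, 4))
    # kappa_1 = mean, kappa_2 = m2, kappa_3 = m3, kappa_4 = m4 - 3 * m2**2.
    return mean, m2, m3, m4 - 3 * m2 ** 2

def mean_hamming_distance(population):
    """Average number of differing sites over all pairs of genotypes."""
    pairs = list(combinations(population, 2))
    return sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

# Toy population of bit strings; the tracked quantity is the count of 1-bits.
population = ["0000", "0011", "1111"]
quantity = [genotype.count("1") for genotype in population]
print(cumulants(quantity))
print(mean_hamming_distance(population))
```

Tracking these few numbers per generation, rather than the full population, is what makes a macroscopic description compact.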
12

Shringarpure, Suyash. "Statistical Methods for studying Genetic Variation in Populations." Research Showcase @ CMU, 2012. http://repository.cmu.edu/dissertations/117.

Abstract:
The study of genetic variation in populations is of great interest for the study of the evolutionary history of humans and other species. Improvement in sequencing technology has resulted in the availability of many large datasets of genetic data. Computational methods have therefore become quite important in analyzing these data. Two important problems that have been studied using genetic data are population stratification (modeling individual ancestry with respect to ancestral populations) and genetic association (finding genetic polymorphisms that affect a trait). In this thesis, we develop methods to improve our understanding of these two problems. For the population stratification problem, we develop hierarchical Bayesian models that incorporate the evolutionary processes that are known to affect genetic variation. By developing mStruct, we show that modeling more evolutionary processes improves the accuracy of the recovered population structure. We demonstrate how nonparametric Bayesian processes can be used to address the question of choosing the optimal number of ancestral populations that describe the genetic diversity of a given sample of individuals. We also examine how sampling bias in genotyping study design can affect results of population structure analysis and propose a probabilistic framework for modeling and correcting sample selection bias. Genome-wide association studies (GWAS) have vastly improved our understanding of many diseases. However, such studies have failed to uncover much of the variation responsible for a number of common multi-factorial diseases and complex traits. We show how artificial selection experiments on model organisms can be used to better understand the nature of genetic associations. We demonstrate using simulations that using data from artificial selection experiments improves the performance of conventional methods of performing association. 
We also validate our approach using semi-simulated data from an artificial selection experiment on Drosophila melanogaster.
13

Cheng, Lulu. "Statistical Methods for Genetic Pathway-Based Data Analysis." Diss., Virginia Tech, 2013. http://hdl.handle.net/10919/52039.

Abstract:
The wide application of genomic microarray technology has created a tremendous need for the development of high-dimensional genetic data analysis. Many statistical methods for microarray data analysis consider one gene at a time, but they may miss subtle changes at the single-gene level. This limitation may be overcome by considering a set of genes simultaneously, where the gene sets are derived from prior biological knowledge and are called "pathways". We have made contributions on two specific research topics related to high-dimensional genetic pathway data. One is to propose a semiparametric model for identifying pathways related to zero-inflated clinical outcomes; the other is to propose a multilevel Gaussian graphical model for exploring both pathway and gene level network structures. For the first problem, we develop a semiparametric model via a Bayesian hierarchical framework. We model the pathway effect nonparametrically in a zero-inflated Poisson hierarchical regression model with unknown link function. The nonparametric pathway effect is estimated via the kernel machine and the unknown link function is estimated by transforming a mixture of beta cumulative density functions. Our approach provides flexible semiparametric settings to describe the complicated association between gene microarray expressions and the clinical outcomes. The Metropolis-within-Gibbs sampling algorithm and Bayes factor are used to make the statistical inferences. Our simulation results support that the semiparametric approach is more accurate and flexible than zero-inflated Poisson regression with the canonical link function; this is especially true when the number of genes is large. The usefulness of our approach is demonstrated through its application to a canine gene expression data set (Enerson et al., 2006). Our approach can also be applied to other settings where a large number of highly correlated predictors are present.
Unlike the first problem, the second takes into account that pathways are not independent of each other because of shared genes and interactions among pathways. Multi-pathway analysis has been a challenging problem because of the complex dependence structure among pathways. By considering the dependency among pathways as well as genes within each pathway, we propose a multi-level Gaussian graphical model (MGGM): one level is for the pathway network and the second is for the gene network. We develop a multilevel L1 penalized likelihood approach to achieve sparseness on both levels. We also provide an iterative weighted graphical LASSO algorithm (Guo et al., 2011) for MGGM. Some asymptotic properties of the estimator are also illustrated. Our simulation results support the advantages of our approach: our method estimates the network more accurately on the pathway level, and more sparsely on the gene level. We also demonstrate the usefulness of our approach using the canine genes-pathways data set.
Ph. D.
14

Yung, Godwin Yuen Han. "Statistical methods for analyzing genetic sequencing association studies." Thesis, Harvard University, 2016. http://nrs.harvard.edu/urn-3:HUL.InstRepos:33493313.

Abstract:
Case-control genetic sequencing studies are increasingly being conducted to identify rare variants associated with complex diseases. Oftentimes, these studies collect a variety of secondary traits--quantitative and qualitative traits besides the case-control disease status. Reusing the data and studying the association between rare variants and secondary phenotypes provide an attractive and cost effective approach that can lead to discovery of new genetic associations. In Chapter 1, we carry out an extensive investigation of the validity of ad hoc methods, which are simple, computationally efficient methods frequently applied in practice to study the association between secondary phenotypes and single common genetic variants. Though other researchers have investigated the same problem, we make two key contributions to existing literature. First, we show that in taking an ad hoc approach, it may be desirable to adjust for covariates that affect the primary disease in the secondary phenotype model, even though these covariates are not necessarily associated with the secondary phenotype in the population. Second, we show that when the disease is rare, ad hoc methods can lead to severely biased estimation and inference if the true disease model follows a non-logistic model such as the probit model. Spurious associations can be avoided by including interaction terms in the fitted regression model. Our results are justified theoretically and via simulations, and illustrated by a genome-wide association study of smoking using a lung cancer case-control study. In Chapter 2, we consider the problem of testing associations between secondary phenotypes and sets of rare genetic variants. We show that popular region-based methods such as the burden test and the sequence kernel association test (SKAT) can only be applied under the same conditions as those applicable to ad hoc methods (Chapter 1). 
For a more robust alternative, we propose an inverse-probability-weighted version of the optimal SKAT (SKAT-O) to account for unequal sampling of cases and controls. As an extension of SKAT-O, our approach is data adaptive and includes the weighted burden test and weighted SKAT as special cases. In addition to weighting individuals to account for the biased sampling, we can also consider weighting the variants in SKAT-O. Decreasing the weight of non-causal variants and increasing the weight of causal variants can improve power. However, since researchers do not know which variants are actually causal, it is common practice to weight genetic variants as a function of their minor allele frequencies. This is motivated by the belief that rarer variants are more likely to have larger effects. In Chapter 3, we propose a new unsupervised statistical framework for predicting the functional status of genetic variants. Compared to existing methods, the proposed algorithm integrates a diverse set of annotations---which are partitioned beforehand into multiple groups by the user---and predicts the functional status for each group, taking into account within- and between-group correlations. We demonstrate the advantages of the algorithm through application to real annotation data and conclude with future directions.
Biostatistics
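The abstract above notes that variants are commonly weighted as a function of their minor allele frequency (MAF). A minimal sketch of one widely used choice, the Beta(MAF; 1, 25) density, is given below. The abstract does not state which weight function the thesis adopts, so the shape parameters and the helper names `beta_pdf`, `maf_weights` and `weighted_burden` are assumptions made for illustration only.

```python
import math

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def maf_weights(mafs, a=1.0, b=25.0):
    """Hypothetical helper: one weight per variant, larger for rarer variants."""
    return [beta_pdf(maf, a, b) for maf in mafs]

def weighted_burden(genotypes, weights):
    """Weighted sum of minor-allele counts (0, 1 or 2) for one individual."""
    return sum(w * g for w, g in zip(weights, genotypes))

# Three variants with decreasing rarity receive decreasing weights.
weights = maf_weights([0.001, 0.01, 0.05])
score = weighted_burden([1, 0, 2], weights)
```

With a = 1 and b = 25 the density reduces to 25(1 - MAF)^24, so the weight falls steadily as the variant becomes more common.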
15

Lange, Christoph. "Generalized estimating equation methods in statistical genetics." Thesis, University of Reading, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.269921.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Brunning, James Jonathan Jesse. "Alignment models and algorithms for statistical machine translation." Thesis, University of Cambridge, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.608922.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Clark, Taane Gregory. "Statistical methods for finding associations in dense genetic regions." Thesis, University of Oxford, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.413976.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Ferreira, Teresa. "Statistical methods for modelling epistasis in genetic association studies." Thesis, University of Oxford, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.543476.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Su, Zhan. "Statistical methods for the analysis of genetic association studies." Thesis, University of Oxford, 2008. http://ora.ox.ac.uk/objects/uuid:98614f8b-63fe-4fa1-9a24-422216ad14cf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
One of the main biological goals of recent years is to determine the genes in the human genome that cause disease. Recent technological advances have realised genome-wide association studies, which have uncovered numerous genetic regions implicated in human diseases. The current approach to analysing data from these studies is based on testing association at single SNPs, but this is widely accepted as underpowered to detect rare and poorly tagged variants. In this thesis we propose several novel approaches to analysing large-scale association data, which aim to improve upon the power offered by traditional approaches. We combine an established imputation framework with a sophisticated disease model that allows for multiple disease-causing mutations at a single locus. To evaluate our methods, we have developed a fast and realistic method to simulate association data conditional on population genetic data. The simulation results show that our methods remain powerful even if the causal variant is not well tagged, there are haplotypic effects or there is allelic heterogeneity. Our methods are further validated by the analysis of the recent WTCCC genome-wide association data, where we have detected confirmed disease loci, known regions of allelic heterogeneity and new signals of association. One of our methods also has the facility to identify the high-risk haplotype backgrounds that harbour the disease alleles, and therefore can be used for fine-mapping. We believe that the incorporation of our methods into future association studies will help progress the understanding of genetic diseases.
20

Lin, Xinyi (Cindy). "Statistical Methods for High-Dimensional Data in Genetic Epidemiology." Thesis, Harvard University, 2014. http://dissertations.umi.com/gsas.harvard:11326.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Recent technological advancements have enabled us to collect an unprecedented amount of genetic epidemiological data. The overarching goal of these genetic epidemiology studies is to uncover the underlying biological mechanisms so that improved strategies for disease prevention and management can be developed. To efficiently analyze and interpret high-dimensional biological data, it is imperative to develop novel statistical methods as conventional statistical methods are generally not applicable or are inefficient. In this dissertation, we introduce three novel, powerful and computationally efficient kernel machine set-based association tests for analyzing high-throughput genetic epidemiological data. In the first chapter, we construct a test for identifying common genetic variants that are predictive of a time-to-event outcome. In the second chapter, we develop a test for identifying gene-environment interactions for common genetic variants. In the third chapter, we propose a test for identifying gene-environment interactions for rare genetic variants.
21

Yi, Wan Kitty Yuen. "Statistical methods for the analysis of genetic association studies." Thesis, University of Kent, 2011. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.544040.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Sucheston, Lara E. "STATISTICAL METHODS FOR THE GENETIC ANALYSIS OF DEVELOPMENTAL DISORDERS." Case Western Reserve University School of Graduate Studies / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=case1175883318.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Marschall, Tobias [Verfasser]. "Algorithms and statistical methods for exact motif discovery / Tobias Marschall." Dortmund : Universitätsbibliothek Technische Universität Dortmund, 2011. http://d-nb.info/1012572064/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Georgieva, Antoniya. "Stochastic methods and genetic algorithms for neural network learning." Thesis, University of Portsmouth, 2008. https://researchportal.port.ac.uk/portal/en/theses/stochastic-methods-and-genetic-algorithms-for-neural-network-learning(67dae83c-ec3d-4db2-875c-6e7407a4144f).html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This thesis presents results from the development, investigation, testing and evaluation of novel meta-heuristic techniques aiming to further improve the state-of-the-art of algorithms for local minima free Neural Network supervised learning. Several approaches for solving Global Optimisation problems that make use of novel meta-heuristic techniques, so-called Low-discrepancy Sequences, and hybrid Evolutionary Algorithms are proposed here, investigated and critically discussed. Furthermore, the novel methods are tested on a number of multimodal mathematical function optimisation problems, as well as on a variety of Neural Network learning tasks, including real-world benchmark datasets. Comparison of the results from the investigated methods with those from standard Backpropagation, Evolutionary Algorithms, and other stochastic approaches (Simulated Annealing, Tabu Search, etc.) is conducted in order to demonstrate their competitiveness in terms of number of function evaluations, learning speed and Neural Network generalisation abilities. Finally, the investigated techniques are applied and tested on real-world problems for the intelligent recognition and classification of cork tiles. An Intelligent Computer Vision system is built. The system includes the following stages: image acquisition; image processing (feature extraction and statistical data processing); Neural Network architecture design; supervised learning utilising the proposed Global Optimisation techniques; and finally, extensive system evaluation. The presented examples and case studies demonstrate that the proposed techniques can be effectively applied for the optimisation of mathematical multimodal functions. The investigated methods are successful in local minima free Neural Network learning, and they can be used for solving real-world industrial problems.
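To illustrate what a low-discrepancy sequence is, the sketch below generates Halton points, a standard textbook example that covers the unit square more evenly than independent random draws. The abstract does not name the sequence used in the thesis, so the choice of Halton points and the function names `van_der_corput` and `halton` are assumptions made for illustration.

```python
def van_der_corput(index, base=2):
    """Radical inverse of a positive integer index in the given base."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value

def halton(n_points, bases=(2, 3)):
    """First n_points of the Halton sequence, one coordinate per base."""
    return [tuple(van_der_corput(i, b) for b in bases)
            for i in range(1, n_points + 1)]

# Candidate starting points for a global search over the unit square.
points = halton(8)
```

Points of this kind could seed the initial weights of a search or the starting population of an evolutionary algorithm, after rescaling each coordinate to the range of the parameter it represents.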
25

Cordell, Heather Jane. "Statistical methods in the genetic analysis of type 1 diabetes." Thesis, University of Oxford, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.296834.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Davies, Joanna L. "Statistical methods for modelling sources of heterogeneity in genetic epidemiology." Thesis, University of Oxford, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.534165.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Salem, Rany Mansour. "Statistical methods for genetic association analysis involving complex longitudinal data." Diss., [La Jolla] : [San Diego] : University of California, San Diego ; San Diego State University, 2009. http://wwwlib.umi.com/cr/ucsd/fullcit?p3366492.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Thesis (Ph. D.)--University of California, San Diego and San Diego State University, 2009.
Title from first page of PDF file (viewed Aug. 14, 2009). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references.
28

Mahjani, Behrang. "Methods from Statistical Computing for Genetic Analysis of Complex Traits." Doctoral thesis, Uppsala universitet, Avdelningen för beräkningsvetenskap, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-284378.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The goal of this thesis is to explore, improve and implement some advanced modern computational methods in statistics, focusing on applications in genetics. The thesis has three major directions. First, we study likelihoods for genetics analysis of experimental populations. Here, the maximum likelihood can be viewed as a computational global optimization problem. We introduce a faster optimization algorithm called PruneDIRECT, and explain how it can be parallelized for permutation testing using the Map-Reduce framework. We have implemented PruneDIRECT as an open source R package, and also Software as a Service for cloud infrastructures (QTLaaS). The second part of the thesis focusses on using sparse matrix methods for solving linear mixed models with large correlation matrices. For populations with known pedigrees, we show that the inverse of covariance matrix is sparse. We describe how to use this sparsity to develop a new method to maximize the likelihood and calculate the variance components. In the final part of the thesis we study computational challenges of psychiatric genetics, using only pedigree information. The aim is to investigate existence of maternal effects in obsessive compulsive behavior. We add the maternal effects to the linear mixed model, used in the second part of this thesis, and we describe the computational challenges of working with binary traits.
eSSENCE
29

Xia, Fan, and 夏凡. "Some topics on statistical analysis of genetic imprinting data and microbiome compositional data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2014. http://hdl.handle.net/10722/206673.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Genetic association study is a useful tool to identify the genetic component that is responsible for a disease. The phenomenon that a certain gene expresses in a parent-of-origin manner is referred to as genomic imprinting. When a gene is imprinted, the performance of the disease-association study will be affected. This thesis presents statistical testing methods developed specially for nuclear family data centering around the genetic association studies incorporating imprinting effects. For qualitative diseases with binary outcomes, a class of TDTI* type tests was proposed in a general two-stage framework, where the imprinting effects were examined prior to association testing. On quantitative trait loci, a class of Q-TDTI(c) type tests and another class of Q-MAX(c) type tests were proposed. The proposed testing methods flexibly accommodate families with missing parental genotype and with multiple siblings. The performance of all the methods was verified by simulation studies. It was found that the proposed methods improve the testing power for detecting association in the presence of imprinting. The class of TDTI* tests was applied to a rheumatoid arthritis study data. Also, the class of Q-TDTI(c) tests was applied to analyze the Framingham Heart Study data. The human microbiome is the collection of the microbiota, together with their genomes and their habitats throughout the human body. The human microbiome comprises an inalienable part of our genetic landscape and contributes to our metabolic features. Also, current studies have suggested the variety of human microbiome in human diseases. With the high-throughput DNA sequencing, the human microbiome composition can be characterized based on bacterial taxa relative abundance and the phylogenetic constraint. Such taxa data are often high-dimensional overdispersed and contain excessive number of zeros. 
Taking these characteristics of taxa data into account, this thesis presents statistical methods to identify associations between covariate/outcome and the human microbiome composition. To assess environmental/biological covariate effects on microbiome composition, an additive logistic normal multinomial regression model was proposed and a group l1 penalized likelihood estimation method was further developed to facilitate selection of covariates and estimation of parameters. To identify microbiome components associated with biological/clinical outcomes, a Bayesian hierarchical regression model with a spike-and-slab prior for variable selection was proposed and a Markov chain Monte Carlo algorithm that combines a stochastic variable selection procedure and random-walk Metropolis-Hastings steps was developed for model estimation. Both methods were illustrated using simulations as well as a real human gut microbiome dataset from The Penn Gut Microbiome Project.
Statistics and Actuarial Science
Doctoral
Doctor of Philosophy
30

Sotero, Charity Faith Gallemit. "Statistical Support Algorithms for Clinical Decisions and Prevention of Genetic-related Heart Disease." Thesis, California State University, Long Beach, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10751893.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:

Drug-induced long QT syndrome (diLQTS) can lead to seemingly healthy patients experiencing cardiac arrest, specifically Torsades de Pointes (TdP), which may lead to death. Clinical decision support systems (CDSS) assist better prescribing of drugs, in part by issuing alerts that warn of the drug’s potential harm. LQTS may be either genetic or acquired. Thirteen distinct genetic mutations have already been identified for hereditary LQTS. Since hereditary and acquired LQTS both share similar clinical symptoms, it is reasonable to assume that they both have some sort of genetic component. The goal of this study is to identify genetic risk markers for diLQTS and TdP. These markers will be used to develop a statistical DSS for clinical applications and prevention of genetic-related heart disease. We will use data from a genome-wide association study conducted by the Pharmacogenomics of Arrhythmia Therapy subgroup of the Pharmacogenetics Research Network, focused on subjects with a history of diLQTS or TdP after taking medication. The data was made available for general research use by the National Center for Biotechnology Information (NCBI). The data consists of 831 total patients, with 172 diLQTS and TdP case patients. Out of 620,901 initial markers, variable screening is done by a preliminary t-test (α=0.01), and the resulting feasible set of 5,754 markers associated with diLQTS to prevent TdP were used to create an appropriate predictive model. Methods used to create a predictive model were ensemble logistic regression, elastic net, random forests, artificial neural networks, and linear discriminant analysis. Of these methods using all 5,754 markers, accuracy ranged from 76.84% to 90.29%, with artificial neural networks as the most accurate model. 
Finally, variable importance algorithms were applied to extract a feasible set of markers from the ensemble logistic regression, elastic net, and random forests methods, and used to produce a subset of genetic markers suitable to build a proposed DSS. Of the methods using a subset of 61 markers, accuracy ranged from 76.59% to 87.00%, with ensemble logistic regression as the most accurate model. Of the methods using a subset of 22 markers, accuracy ranged from 74.24% to 82.87%, with the single hidden layer neural network (using the subset of markers extracted from the ensemble bagged logistic model) as the most accurate model.

31

Wang, Dennis Yi Qing. "Statistical modelling of gene regulation : applications to haematopoiesis." Thesis, University of Cambridge, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.607969.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Dick, Grant. "Spatially-structured niching methods for evolutionary algorithms." University of Otago. Department of Information Science, 2008. http://adt.otago.ac.nz./public/adt-NZDU20080902.161336.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Traditionally, an evolutionary algorithm (EA) operates on a single population with no restrictions on possible mating pairs. Interesting changes to the behaviour of EAs emerge when the structure of the population is altered so that mating between individuals is restricted. Variants of EAs that use such populations are grouped into the field of spatially-structured EAs (SSEAs). Previous research into the behaviour of SSEAs has primarily focused on the impact space has on the selection pressure in the system. Selection pressure is usually characterised by takeover times and the ratio between the neighbourhood size and the overall dimension of space. While this research has given indications into where and when the use of an SSEA might be suitable, it does not provide a complete coverage of system behaviour in SSEAs. This thesis presents new research into areas of SSEA behaviour that have been left either unexplored or briefly touched upon in current EA literature. The behaviour of genetic drift in finite panmictic populations is well understood. This thesis attempts to characterise the behaviour of genetic drift in spatially-structured populations. First, an empirical investigation into genetic drift in two commonly encountered topologies, rings and tori, is performed. An observation is made that genetic drift in these two configurations of space is independent of the genetic structure of individuals and additive of the equivalent-sized panmictic population. In addition, localised areas of homogeneity present themselves within the structure purely as a result of drifting. A model based on the theory of random walks to absorbing boundaries is presented which accurately characterises the time to fixation through random genetic drift in ring topologies. A large volume of research has gone into developing niching methods for solving multimodal problems. Previously, these techniques have used panmictic populations. 
This thesis introduces the concept of localised niching, where the typically global niching methods are applied to the overlapping demes of a spatially structured population. Two implementations, local sharing and local clearing, are presented and are shown to be frequently faster and more robust to parameter settings, and applicable to more problems than their panmictic counterparts. Current SSEAs typically use a single fitness function across the entire population. In the context of multimodal problems, this means each location in space attempts to discover all the optima. A preferable situation would be to use the inherent spatial properties of an SSEA to localise optimisation of peaks. This thesis adapts concepts from multiobjective optimisation with environmental gradients and applies them to multimodal problems. In addition to adapting to the fitness landscape, individuals evolve towards their preferred environmental conditions. This has the effect of separating individuals into regions that concentrate on different optima with the global fitness function. The thesis also gives insights into the expected number of individuals occupying each optimum in the problem. The SSEAs and related models developed in this thesis are of interest to both researchers and end-users of evolutionary computation. From the end-user's perspective, the developed SSEAs require less a priori knowledge of a given problem domain in order to operate effectively, so they can be more readily applied to difficult, poorly-defined problems. Also, the theoretical findings of this thesis provide a more complete understanding of evolution within spatially-structured populations, which is of interest not only to evolutionary computation practitioners, but also to researchers in the fields of population genetics and ecology.
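The abstract above refers to random walks with absorbing boundaries as the basis for a model of fixation time. As a generic illustration of that idea, and not a reproduction of the thesis's model, the sketch below simulates a symmetric random walk on the integers 0..N that stops at either end. For a start position k, the expected number of steps to absorption is the classical result k(N - k). The function names and parameter values are assumptions chosen for the example.

```python
import random

def absorption_time(start, upper, rng):
    """Steps taken by a symmetric walk from `start` until it hits 0 or `upper`."""
    position, steps = start, 0
    while 0 < position < upper:
        position += 1 if rng.random() < 0.5 else -1
        steps += 1
    return steps

def mean_absorption_time(start, upper, trials=4000, seed=0):
    """Monte Carlo estimate of the expected absorption time."""
    rng = random.Random(seed)
    total = sum(absorption_time(start, upper, rng) for _ in range(trials))
    return total / trials

# Theory gives 5 * (10 - 5) = 25 steps on average for this setting.
estimate = mean_absorption_time(start=5, upper=10)
```

In a drift setting, the walker's position could stand for the length of a contiguous block of one allele, with absorption at either end corresponding to loss or fixation; that mapping is an interpretation offered here, not one stated in the abstract.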
33

Sofer, Tamar. "Statistical Methods for High Dimensional Data in Environmental Genomics." Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10403.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
In this dissertation, we propose methodology to analyze high dimensional genomics data, in which the observations have a large number of outcome variables, in addition to exposure variables. In Chapter 1, we investigate methods for genetic pathway analysis, where we have a small number of exposure variables. We propose two Canonical Correlation Analysis-based methods that select outcomes either sequentially or by screening, and show that the performance of the proposed methods depends on the correlation between the genes in the pathway. We also propose and investigate a criterion for fixing the number of outcomes, and a powerful test for the exposure effect on the pathway. The methodology is applied to show that air pollution exposure affects gene methylation of a few genes from the asthma pathway. In Chapter 2, we study penalized multivariate regression as an efficient and flexible method to study the relationship between a large number of covariates and multiple outcomes. We use penalized likelihood to shrink model parameters to zero and to select only the important effects. We use the Bayesian Information Criterion (BIC) to select tuning parameters for the employed penalty and show that it chooses the right tuning parameter with high probability. These are combined in the “two-stage procedure”, and asymptotic results show that it yields a consistent, sparse and asymptotically normal estimator of the regression parameters. The method is illustrated on gene expression data in normal and diabetic patients. In Chapter 3 we propose a method for estimation of covariates-dependent principal components analysis (PCA) and covariance matrices. Covariates, such as smoking habits, can affect the variation in a set of gene methylation values. We develop a penalized regression method that incorporates covariates in the estimation of principal components. 
We show that the parameter estimates are consistent and sparse, and show that using the BIC to select the tuning parameter for the penalty functions yields good models. We also propose the scree plot residual variance criterion for selecting the number of principal components. The proposed procedure is implemented to show that the first three principal components of gene methylation in the asthma pathway differ between people who did not smoke and people who did.
34

Hu, Yueqing, and 胡躍清. "Some topics in the statistical analysis of forensic DNA and genetic family data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2007. http://hub.hku.hk/bib/B38831491.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Hutt, Benjamin David. "Evolving artificial neural network controllers for robots using species-based methods." Thesis, University of Reading, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.270831.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Nicholson, George. "Statistical methods for inferring human population history from multi-locus genetic data." Thesis, University of Oxford, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.275404.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Lee, Michael James. "Methods in Percolation." Thesis, University of Canterbury. Physics and Astronomy, 2008. http://hdl.handle.net/10092/2365.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Algorithms are presented for the computationally efficient manipulation of graphs. These are subsequently used as the basis of a Monte Carlo method for sampling from the microcanonical ensemble of lattice configurations of a percolation model within a neighbourhood of the critical point. This new method arbitrarily increments and decrements the number of occupied lattice sites, and is shown to be a generalisation of several earlier, purely incremental, methods. As demonstrations of capability, the method was used to construct a phase diagram for exciton transport on a disordered surface, and to study finite size effects upon the incipient spanning cluster. Application of the method to the classical site percolation model on the two-dimensional square lattice resulted in an exceptionally precise estimate of the critical threshold. Although this estimate is not in agreement with earlier results, its accuracy was established through an application specific test of randomness, which is also introduced here. The same test suggests that many earlier results have been systematically biased due to the use of deficient pseudorandom number generators. The estimate made here has since been independently confirmed.
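The abstract above contrasts its method with earlier, purely incremental ones. A minimal sketch of such an incremental scheme for site percolation on a square lattice follows: sites are occupied one at a time in random order, clusters are merged with a union-find structure, and the run stops when a cluster first connects the top row to the bottom row. This is not the thesis's algorithm (which also removes sites); the class and function names, lattice size and trial count are assumptions for illustration.

```python
import random

class DisjointSet:
    """Union-find with path halving and union by rank."""
    def __init__(self, size):
        self.parent = list(range(size))
        self.rank = [0] * size

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

def spanning_fraction(side, rng):
    """Occupied fraction at which a top-to-bottom spanning cluster first appears."""
    n = side * side
    top, bottom = n, n + 1  # virtual nodes joined to the first and last rows
    clusters = DisjointSet(n + 2)
    occupied = [False] * n
    order = list(range(n))
    rng.shuffle(order)
    for count, site in enumerate(order, start=1):
        occupied[site] = True
        row, col = divmod(site, side)
        if row == 0:
            clusters.union(site, top)
        if row == side - 1:
            clusters.union(site, bottom)
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = row + d_row, col + d_col
            if 0 <= r < side and 0 <= c < side and occupied[r * side + c]:
                clusters.union(site, r * side + c)
        if clusters.find(top) == clusters.find(bottom):
            return count / n
    return 1.0

def estimate_threshold(side=24, trials=40, seed=1):
    """Average spanning fraction over several random site orderings."""
    rng = random.Random(seed)
    return sum(spanning_fraction(side, rng) for _ in range(trials)) / trials
```

On a lattice this small the average is only a rough, finite-size estimate of the critical threshold, and it relies on Python's default pseudorandom generator; the abstract's point about generator quality is a reminder that precise estimates need more care than this sketch takes.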
38

Arif, Omar. "Robust target localization and segmentation using statistical methods." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/33882.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This thesis aims to contribute to the area of visual tracking, which is the process of identifying an object of interest through a sequence of successive images. The thesis explores kernel-based statistical methods, which map the data to a higher dimensional space. A pre-image framework is provided to find the mapping from the embedding space to the input space for several manifold learning and dimensional learning algorithms. Two algorithms are developed for visual tracking that are robust to noise and occlusions. In the first algorithm, a kernel PCA-based eigenspace representation is used. The de-noising and clustering capabilities of the kernel PCA procedure lead to a robust algorithm. This framework is extended to incorporate the background information in an energy based formulation, which is minimized using graph cut and to track multiple objects using a single learned model. In the second method, a robust density comparison framework is developed that is applied to visual tracking, where an object is tracked by minimizing the distance between a model distribution and given candidate distributions. The superior performance of kernel-based algorithms comes at a price of increased storage and computational requirements. A novel method is developed that takes advantage of the universal approximation capabilities of generalized radial basis function neural networks to reduce the computational and storage requirements for kernel-based methods.
39

Guan, Ting. "Novel Statistical Methods for Multiple-variant Genetic Association Studies with Related Individuals." Diss., Virginia Tech, 2018. http://hdl.handle.net/10919/96243.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Genetic association studies usually include related individuals. Meanwhile, high-throughput sequencing technologies produce data of multiple genetic variants. Due to linkage disequilibrium (LD) and familial relatedness, the genotype data from such studies often carries complex correlations. Moreover, missing values in genotype usually lead to loss of power in genetic association tests. Also, repeated measurements of phenotype and dynamic covariates from longitudinal studies bring in more opportunities but also challenges in the discovery of disease-related genetic factors. This dissertation focuses on developing novel statistical methods to address some challenging questions remaining in genetic association studies due to the aforementioned reasons. So far, many methods have been proposed to detect disease-related genetic regions (e.g., genes, pathways). However, with multiple-variant data from a sample with relatedness, it is critical to account for the complex genotypic correlations when assessing genetic contribution. Recognizing the limitations of existing methods, in the first work of this dissertation, the Adaptive-weight Burden Test (ABT) --- a score test between a quantitative trait and the genotype data with complex correlations --- is proposed. ABT achieves higher power by adopting data-driven weights, which make good use of the LD and relatedness. Because the null distribution has been successfully derived, the computational simplicity of ABT makes it a good fit for genome-wide association studies. Genotype missingness commonly arises due to limitations in genotyping technologies. Imputation of the missing values in genotype usually improves quality of the data used in the subsequent association test and thus increases power. Complex correlations, though troublesome, provide the opportunity for proper handling of genotypic missingness. 
In the second part of this dissertation, a genotype imputation method is developed, which can impute the missingness in multiple genetic variants via the LD and the relatedness. The popularity of longitudinal studies in genetics and genomics calls for methods deliberately designed for repeated measurements. Therefore, a multiple-variant genetic association test for a longitudinal trait on samples with relatedness is developed, which treats the longitudinal measurements as observations of functions and thus takes into account the time factor properly.
PHD
40

Yu, Xuesong. "Statistical methods for analyzing genomic data with consideration of spatial structures /." Thesis, Connect to this title online; UW restricted, 2007. http://hdl.handle.net/1773/9553.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Swan, Roger William. "Optimisation of water treatment works using Monte-Carlo methods and genetic algorithms." Thesis, University of Birmingham, 2015. http://etheses.bham.ac.uk//id/eprint/5868/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Optimisation of potable water treatment could result in substantial cost savings for water companies and their customers. To address this issue, computational modelling of water treatment works using static and dynamic models was examined alongside the application of optimisation techniques including genetic algorithms and operational zone identification. These methods were explored with the assistance of case study data from an operational works. It was found that dynamic models were more accurate than static models at predicting the water quality of an operational site but that the root mean square error of the models was within 5% of each other for key performance criteria. Using these models, a range of abstraction rates, for which a water treatment works was predicted to operate sufficiently, were identified, dependent on raw water temperature and total organic carbon concentration. Genetic algorithms were also applied to the water treatment works models to identify near optimal design and operating regimes. Static models were identified as being more suitable for whole works optimisation than dynamic models based on their relative accuracy, simplicity and computational demands.
42

Winkleblack, Scott Kenneth. "ReGen: Optimizing Genetic Selection Algorithms for Heterogeneous Computing." DigitalCommons@CalPoly, 2014. https://digitalcommons.calpoly.edu/theses/1236.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
GenSel is a genetic selection analysis tool used to determine which genetic markers are informational for a given trait. Performing genetic selection related analyses is a time-consuming and computationally expensive task. Due to an expected increase in the number of genotyped individuals, analysis times will increase dramatically. Therefore, optimization efforts must be made to keep analysis times reasonable. This thesis focuses on optimizing one of GenSel’s underlying algorithms for heterogeneous computing. The resulting algorithm exposes task-level parallelism and data-level parallelism present but inaccessible in the original algorithm. The heterogeneous computing solution, ReGen, outperforms the optimized CPU implementation, achieving a 1.84 times speedup.
43

Dambreville, Samuel. "Statistical and geometric methods for shape-driven segmentation and tracking." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/22707.

Abstract:
Thesis (Ph. D.)--Electrical and Computer Engineering, Georgia Institute of Technology, 2008.
Committee Chair: Allen Tannenbaum; Committee Member: Anthony Yezzi; Committee Member: Marc Niethammer; Committee Member: Patricio Vela; Committee Member: Yucel Altunbasak.
44

Howie, Bryan. "Statistical methods for phasing haplotypes and imputing genotypes in large population genetic datasets." Thesis, University of Oxford, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.531825.

45

Gale, Joanne. "Statistical Methods for the Analysis of Quantitative Trait Data in Genetic Association Studies." Thesis, University of Oxford, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.504345.

46

Lipson, Mark (Mark Israel). "New statistical genetic methods for elucidating the history and evolution of human populations." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/89873.

Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Mathematics, 2014.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 165-173).
In the last few decades, the study of human history has been fundamentally changed by our ability to detect the signatures left within our genomes by adaptations, migrations, population size changes, and other processes. Rapid advances in DNA sequencing technology have now made it possible to interrogate these signals at unprecedented levels of detail, but extracting more complex information about the past from patterns of genetic variation requires new and more sophisticated models. This thesis presents a suite of sensitive and efficient statistical tools for learning about human history and evolution from large-scale genetic data. We focus first on the problem of admixture inference and describe two new methods for determining the dates, sources, and proportions of ancestral mixtures between diverged populations. These methods have already been applied to a number of important historical questions, in particular that of tracing the course of the Austronesian expansion in Southeast Asia. We also report a new approach for estimating the human mutation rate, a fundamental parameter in evolutionary genetics, and provide evidence that it is higher than has been proposed in recent pedigree-based studies.
by Mark Lipson.
Ph. D.
47

Tachmazidou, Ioanna. "Bayesian statistical methods for genetic association studies with case-control and cohort design." Thesis, Imperial College London, 2008. http://hdl.handle.net/10044/1/4398.

Abstract:
Large-scale genetic association studies are carried out with the hope of discovering single nucleotide polymorphisms involved in the etiology of complex diseases. We propose a coalescent-based model for association mapping which potentially increases the power to detect disease-susceptibility variants in genetic association studies with case-control and cohort design. The approach uses Bayesian partition modelling to cluster haplotypes with similar disease risks by exploiting evolutionary information. We focus on candidate gene regions and we split the chromosomal region of interest into sub-regions or windows of high linkage disequilibrium (LD) therein assuming a perfect phylogeny. The haplotype space is then partitioned into disjoint clusters within which the phenotype-haplotype association is assumed to be the same. The novelty of our approach consists in the fact that the distance used for clustering haplotypes has an evolutionary interpretation, as haplotypes are clustered according to the time to their most recent common mutation. Our approach is fully Bayesian and we develop Markov Chain Monte Carlo algorithms to sample efficiently over the space of possible partitions. We have also developed a Bayesian survival regression model for high-dimension and small sample size settings. We provide a Bayesian variable selection procedure and shrinkage tool by imposing shrinkage priors on the regression coefficients. We have developed a computationally efficient optimization algorithm to explore the posterior surface and find the maximum a posteriori estimates of the regression coefficients. We compare the performance of the proposed methods in simulation studies and using real datasets to both single-marker analyses and recently proposed multi-marker methods and show that our methods perform similarly in localizing the causal allele while yielding lower false positive rates. Moreover, our methods offer computational advantages over other multi-marker approaches.
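Illustrative note (not part of the thesis): the abstract above mentions shrinkage priors on regression coefficients and maximum a posteriori (MAP) estimation. A much simpler setting than the survival model it describes shows how a shrinkage prior can act as variable selection. Assuming a single coefficient with a unit-variance Gaussian likelihood centred on a least-squares estimate `z` and a Laplace prior with rate `lam`, the MAP estimate minimises 0.5 * (z - b)**2 + lam * abs(b), which has the closed-form soft-thresholding solution sketched below. The function name and the choice of prior are assumptions for illustration only.

```python
import math


def soft_threshold(z, lam):
    """MAP estimate of b minimising 0.5 * (z - b)**2 + lam * abs(b).

    Estimates with abs(z) <= lam are set to zero; larger ones are
    pulled towards zero by lam.
    """
    return math.copysign(max(abs(z) - lam, 0.0), z)


# Hypothetical least-squares estimates for four markers
estimates = [3.0, -0.5, 0.2, -3.0]
shrunk = [soft_threshold(z, 1.0) for z in estimates]
```

Small estimates are set exactly to zero, so the same step both shrinks and selects coefficients.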
48

Iotchkova, Valentina Valentinova. "Bayesian methods for multivariate phenotype analysis in genome-wide association studies." Thesis, University of Oxford, 2013. http://ora.ox.ac.uk/objects/uuid:66fd61e1-a6e3-4e91-959b-31a3ec88967c.

Abstract:
Most genome-wide association studies search for genetic variants associated with a single trait of interest, despite the main interest usually being the understanding of a complex genotype-phenotype network. Furthermore, many studies collect data on multiple phenotypes, each measuring a different aspect of the biological system under consideration, so it often makes sense to analyze the phenotypes jointly. However, this is rarely done, and there is a lack of well-developed methods for multiple phenotype analysis. Here we propose novel approaches for genome-wide association analysis, which scan the genome one SNP at a time for association with multivariate traits. The first half of this thesis focuses on an analytic model averaging approach which bi-partitions traits into associated and unassociated, fits all such models and measures evidence of association using a Bayes factor. The discrete nature of the model allows very fine control of prior beliefs about which sets of traits are more likely to be jointly associated. Using simulated data we show that this method can have much greater power than simpler approaches that do not explicitly model residual correlation between traits. On real data of six hematological parameters in three population cohorts (KORA, UKNBS and TwinsUK) from the HaemGen consortium, this model allows us to uncover an association at the RCL locus that was not identified in the original analysis but has been validated in a much larger study. In the second half of the thesis we propose and explore the properties of models that use priors encouraging sparse solutions, in the sense that genetic effects of phenotypes are shrunk towards zero when there is little evidence of association. To do this we explore and use spike and slab (SAS) priors. All methods combine both hypothesis testing, via calculation of a Bayes factor, and model selection, which occurs implicitly via the sparsity priors.
We have successfully implemented a Variational Bayesian approach to fit this model, which provides a tractable approximation to the posterior distribution, and allows us to approximate the very high-dimensional integral required for the Bayes factor calculation. This approach has a number of desirable properties. It can handle missing phenotype data, which is a real feature of most studies. It allows for both correlation due to relatedness between subjects or population structure and residual phenotype correlation. It can be viewed as a sparse Bayesian multivariate generalization of the mixed model approaches that have become popular recently in the GWAS literature. In addition, the method is computationally fast and can be applied to millions of SNPs for a large number of phenotypes. Furthermore we apply our method to 15 glycans from 3 isolated population cohorts (ORCADES, KORCULA and VIS), where we uncover association at a known locus, not identified in the original study but discovered later in a larger one. We conclude by discussing future directions.
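Illustrative note (not part of the thesis): the model averaging idea in the abstract above, where traits are bi-partitioned into associated and unassociated sets and evidence is summarised with a Bayes factor, can be sketched as follows. When the alternative hypothesis is a prior-weighted mixture of models, its Bayes factor against the null is the prior-weighted average of the per-model Bayes factors. The per-model values below are made-up placeholders, and the helper names are hypothetical.

```python
from itertools import combinations


def associated_subsets(traits):
    """Yield every non-empty set of traits that could be associated with a SNP."""
    for r in range(1, len(traits) + 1):
        for subset in combinations(traits, r):
            yield frozenset(subset)


def averaged_bayes_factor(bf_by_model, prior_by_model):
    """Prior-weighted average of per-model Bayes factors (each against the null)."""
    total = sum(prior_by_model[m] for m in bf_by_model)
    return sum(prior_by_model[m] * bf_by_model[m] for m in bf_by_model) / total


traits = ["t1", "t2", "t3"]
models = list(associated_subsets(traits))  # 2**3 - 1 = 7 bi-partitions
# Placeholder Bayes factors: here simply the number of associated traits
bf_by_model = {m: float(len(m)) for m in models}
# Uniform prior over models; other weightings encode beliefs about joint association
prior_by_model = {m: 1.0 for m in models}
bf = averaged_bayes_factor(bf_by_model, prior_by_model)
```

The number of models grows as 2**k - 1 in the number of traits k, so full enumeration of this kind is only practical for a modest number of traits.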
49

Li, Li. "Evolutionary optimization methods for mass customizing platform products." Thesis, The University of Hong Kong, 2007. http://sunzi.lib.hku.hk/HKUTO/record/B3955790X.

50

Lee, Yiu-fai, and 李耀暉. "Analysis for segmental sharing and linkage disequilibrium: a genomewide association study on myopia." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2009. http://hub.hku.hk/bib/B43912217.

