Dissertations / Theses on the topic 'Genetic data'

Consult the top 50 dissertations / theses for your research on the topic 'Genetic data.'

1

Qiao, Dandi. "Statistical Approaches for Next-Generation Sequencing Data." Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10689.

Abstract:
During the last two decades, genotyping technology has advanced rapidly, which enabled the tremendous success of genome-wide association studies (GWAS) in the search for disease susceptibility loci (DSLs). However, only a small fraction of the overall predicted heritability can be explained by the DSLs discovered. One possible explanation for this "missing heritability" phenomenon is that many causal variants are rare. The recent development of high-throughput next-generation sequencing (NGS) technology provides the instrument to look closely at these rare variants with precision and efficiency. However, new approaches for both the storage and analysis of sequencing data are urgently needed. In this thesis, we introduce three methods that could be utilized in the management and analysis of sequencing data. In Chapter 1, we propose a novel and simple algorithm for compressing sequencing data that leverages the scarcity of rare variant data, which enables the efficient storage and analysis of sequencing data in current hardware environments. We also provide a C++ implementation that supports direct and parallel loading of the compressed format without requiring extra time for decompression. Chapters 2 and 3 focus on the association analysis of sequencing data in population-based designs. In Chapter 2, we present a statistical methodology that allows the identification of genetic outliers to obtain a genetically homogeneous subpopulation, which reduces the false positives due to population substructure. Our approach is computationally efficient, can be applied to all the genetic loci in the data, and does not require pruning of variants in linkage disequilibrium (LD). In Chapter 3, we propose a general analysis framework in which thousands of genetic loci can be tested simultaneously for association with complex phenotypes. 
The approach is built on spatial-clustering methodology, assuming that genetic loci that are associated with the target phenotype cluster in certain genomic regions. In contrast to standard methodology for multi-loci analysis, which has focused on the dimension reduction of data, the proposed approach profits from the availability of large numbers of genetic loci. Thus it will be especially relevant for whole-genome sequencing studies which commonly record several thousand loci per gene.
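The abstract does not describe the compression format itself. As a rough illustration of why rare variants compress well, the sketch below assumes a hypothetical scheme that stores only the samples carrying a non-reference genotype; the function names and the 0/1/2 genotype coding are assumptions for illustration, not the thesis's C++ format.

```python
from typing import Dict, List


def encode_sparse(genotypes: List[int]) -> Dict[int, int]:
    """Keep only non-reference genotypes (1 or 2 copies of the minor allele).

    Hypothetical scheme: sample index -> genotype, reference calls (0) omitted.
    """
    return {i: g for i, g in enumerate(genotypes) if g != 0}


def decode_sparse(sparse: Dict[int, int], n_samples: int) -> List[int]:
    """Rebuild the dense genotype vector; missing indices are reference (0)."""
    return [sparse.get(i, 0) for i in range(n_samples)]


# A rare variant observed in 2 of 1000 samples needs only 2 stored entries.
genotypes = [0] * 1000
genotypes[17] = 1
genotypes[402] = 2
sparse = encode_sparse(genotypes)
```

For a variant carried by few samples, the stored dictionary is far smaller than the dense vector, and lookups by sample index need no decompression step.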
2

Haroun, Paul. "Genetic algorithm and data visualization." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape16/PQDD_0017/MQ37125.pdf.

3

Lankhorst, Marc Martijn. "Genetic algorithms in data analysis." [S.l. : [Groningen] : s.n.] ; [University Library Groningen] [Host], 1996. http://irs.ub.rug.nl/ppn/142964662.

4

Hiden, Hugo George. "Data-based modelling using genetic programming." Thesis, University of Newcastle Upon Tyne, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.246137.

5

Auton, Adam. "The estimation of recombination rates from population genetic data." Thesis, University of Oxford, 2007. http://ora.ox.ac.uk/objects/uuid:dc38045b-725d-4afc-8c76-94769db3534d.

Abstract:
Genetic recombination is an important process that generates new combinations of genes on which natural selection can operate. As such, an understanding of recombination in the human genome will provide insight into the evolutionary processes that have shaped our genetic history. The aim of this thesis is to use samples of population genetic data to explore the patterns of variation in the rate of recombination in the human genome. To do this I introduce a novel means of estimating recombination rates from population genetic data. The new, computationally efficient method incorporates a model of recombination hotspots that was absent in existing methods. I use samples from the International HapMap Project to obtain recombination rate estimates for the autosomal portion of the genome. Using these estimates, I demonstrate that recombination has a number of interesting relationships with other genome features such as genes, DNA repeats, and sequence motifs. Furthermore, I show that genes of differing function have significantly different rates of recombination. I explore the relationship between recombination and specific sequence motifs and argue that while sequence motifs are an important factor in determining the location of recombination hotspots, the factor that controls motif activity is unknown. The observation of many relationships between recombination and other genome features motivates an attempt to quantify the contributions to the recombination rate from specific features. I employ a wavelet analysis to investigate scale-specific patterns of recombination. In doing so, I reveal a number of highly significant correlations between recombination and other features of the genome at both the fine and broad scales, but find that relatively little of the variation in recombination rates can be explained. I conclude with a discussion of the results contained in the body of the thesis, and suggest a number of areas for future research.
6

Agarwala, Vineeta. "Integrating empirical data and population genetic simulations to study the genetic architecture of type 2 diabetes." Thesis, Harvard University, 2013. http://dissertations.umi.com/gsas.harvard:11120.

Abstract:
Most common diseases have substantial heritable components but are characterized by complex inheritance patterns implicating numerous genetic and environmental factors. A longstanding goal of human genetics research is to delineate the genetic architecture of these traits - the number, frequencies, and effect sizes of disease-causing alleles - to inform mapping studies, elucidate mechanisms of disease, and guide development of targeted clinical therapies and diagnostics. Although vast empirical genetic data has now been collected for common diseases, different and contradictory hypotheses have been advocated about features of genetic architecture (e.g., the contribution of rare vs. common variants). Here, we present a framework which combines multiple empirical datasets and simulation studies to enable systematic testing of hypotheses about both global and locus-specific complex trait architecture. We apply this to type 2 diabetes (T2D).
7

Romano, Eduardo O. "Selection indices for combining marker genetic data and animal model information /." This resource online, 1993. http://scholar.lib.vt.edu/theses/available/etd-09192009-040546/.

8

Li, Xin. "Haplotype Inference from Pedigree Data and Population Data." Cleveland, Ohio : Case Western Reserve University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=case1259867573.

Abstract:
Thesis(Ph.D.)--Case Western Reserve University, 2010
Title from PDF (viewed on 2009-12-30). Department of Electrical Engineering and Computer Science. Includes abstract. Includes bibliographical references and appendices. Available online via the OhioLINK ETD Center.
9

Shenoy, U. Nagaraj. "Automatic Data Partitioning By Hierarchical Genetic Search." Thesis, Indian Institute of Science, 1996. http://hdl.handle.net/2005/172.

Abstract:
CDAC
The introduction of languages like High Performance Fortran (HPF) which allow the programmer to indicate how the arrays used in the program have to be distributed across the local memories of a multi-computer has not completely unburdened the parallel programmer from the intricacies of these architectures. In order to tap the full potential of these architectures, the compiler has to perform this crucial task of data partitioning automatically. This would not only unburden the programmer but would make the programs more efficient since the compiler can be made more intelligent to take care of the architectural nuances. The topic of this thesis namely the automatic data partitioning deals with finding the best data partition for the various arrays used in the entire program in such a way that the cost of execution of the entire program is minimized. The compiler could resort to runtime redistribution of the arrays at various points in the program if found profitable. Several aspects of this problem have been proven to be NP-complete. Other researchers have suggested heuristic solutions to solve this problem. In this thesis we propose a genetic algorithm namely the Hierarchical Genetic Search algorithm to solve this problem.
10

Al-Madi, Naila Shikri. "Improved Genetic Programming Techniques For Data Classification." Diss., North Dakota State University, 2014. https://hdl.handle.net/10365/27097.

Abstract:
Evolutionary algorithms are one category of optimization techniques that are inspired by processes of biological evolution. Evolutionary computation is applied to many domains and one of the most important is data mining. Data mining is a relatively broad field that deals with automatic knowledge discovery from databases, and it is one of the most developed fields in the area of artificial intelligence. Classification is a data mining method that assigns items in a collection to target classes with the goal of accurately predicting the target class for each item in the data. Genetic programming (GP) is one of the effective evolutionary computation techniques for solving classification problems. GP treats classification problems as optimization tasks, searching for the solution with the highest accuracy. However, GP suffers from some weaknesses, such as long execution time and the need to tune many parameters for each problem. Furthermore, GP cannot obtain accuracy as high for multiclass classification problems as for binary problems. In this dissertation, we address these drawbacks and propose some approaches to overcome them. Adaptive GP variants are proposed in order to automatically adapt the parameter settings and shorten the execution time. Moreover, two approaches are proposed to improve the accuracy of GP when applied to multiclass classification problems. In addition, a segment-based approach is proposed to accelerate the GP execution time for the data classification problem. Furthermore, a parallelization of the GP process using the MapReduce methodology is proposed, which aims to shorten the GP execution time and to provide the ability to use large population sizes, leading to faster convergence. The proposed approaches are evaluated using different measures, such as accuracy, execution time, sensitivity, specificity, and statistical tests. 
Comparisons between the proposed approaches with the standard GP, and with other classification techniques were performed, and the results showed that these approaches overcome the drawbacks of standard GP by successfully improving the accuracy and execution time.
11

McCaskie, Pamela Ann. "Multiple-imputation approaches to haplotypic analysis of population-based data with applications to cardiovascular disease." University of Western Australia. School of Population Health, 2008. http://theses.library.uwa.edu.au/adt-WU2008.0160.

Abstract:
[Truncated abstract] This thesis investigates novel methods for the genetic association analysis of haplotype data in samples of unrelated individuals, and applies these methods to the analysis of coronary heart disease and related phenotypes. Determining the inheritance pattern of genetic variants in studies of unrelated individuals can be problematic because family members of the studied individuals are often not available. For the analysis of individual genetic loci, no problem arises because the unit of interest is the observed genotype. When the unit of interest is the linear combination of alleles along one chromosome, inherited together in a haplotype, it is not always possible to determine with certainty the inheritance pattern, and therefore statistical methods to infer these patterns must be adopted. Due to genotypic heterozygosity, multiple possible haplotype configurations can often resolve an individual's genotype measures at multiple loci. When haplotypes are not known, but are inferred statistically, an element of uncertainty is thus inherent which, if not dealt with appropriately, can result in unreliable estimates of effect sizes in an association setting. The core aim of the research described in this thesis was to develop and implement a general method for haplotype-based association analysis using multiple imputation to appropriately deal with uncertainty in haplotype assignment. Regression-based approaches to association analysis provide flexible methods to investigate the influence of a covariate on a response variable, adjusting for the effects of other variables including interaction terms. ... These methods are then applied to models accommodating binary, quantitative, longitudinal and survival data. The performance of the multiple imputation method implemented was assessed using simulated data under a range of haplotypic effect sizes and genetic inheritance patterns. 
The multiple imputation approach performed better, on average, than ignoring haplotypic uncertainty, and provided estimates that in most cases were similar to those observed when haplotypes were known. The haplotype association methods developed in this thesis were used to investigate the genetic epidemiology of cardiovascular disease, utilising data for the cholesteryl ester transfer protein gene (CETP), the hepatic lipase (LIPC) gene and the 15-lipoxygenase (ALOX15) gene on a total of 6,487 individuals from three Western Australian studies. Results of these analyses suggested single nucleotide polymorphisms (SNPs) and haplotypes in the CETP gene were associated with increased plasma high-density lipoprotein cholesterol (HDL-C). SNPs in the LIPC gene were also associated with increased HDL-C and haplotypes in the ALOX15 gene were associated with risk of carotid plaque among individuals with premature CHD. The research presented in this thesis is both novel and important as it provides methods for the analysis of haplotypic associations with a range of response types, while incorporating information about haplotype uncertainty inherent in population-based studies. These methods are shown to perform well for a range of simulated and real data situations, and have been written into a statistical analysis package that has been freely released to the research community.
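The truncated abstract does not say how results from the imputed data sets are combined. The usual choice in multiple imputation is Rubin's rules, sketched below on the assumption that each imputed haplotype data set yields one effect estimate and its squared standard error; the function name and the example numbers are illustrative only.

```python
from statistics import mean, variance
from typing import List, Tuple


def pool_estimates(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Combine m per-imputation results with Rubin's rules.

    estimates: effect estimate from each imputed data set
    variances: squared standard error of each estimate
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)        # average of the m estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Three hypothetical imputations of haplotype assignments.
pooled, total = pool_estimates([0.10, 0.14, 0.12], [0.004, 0.005, 0.006])
```

The between-imputation term is what carries haplotype uncertainty into the final standard error; analysing a single best-guess haplotype assignment would drop it.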
12

Cole, Rowena Marie. "Clustering with genetic algorithms." University of Western Australia. Dept. of Computer Science, 1998. http://theses.library.uwa.edu.au/adt-WU2003.0008.

Abstract:
Clustering is the search for those partitions that reflect the structure of an object set. Traditional clustering algorithms search only a small sub-set of all possible clusterings (the solution space) and consequently, there is no guarantee that the solution found will be optimal. We report here on the application of Genetic Algorithms (GAs) -- stochastic search algorithms touted as effective search methods for large and complex spaces -- to the problem of clustering. GAs which have been made applicable to the problem of clustering (by adapting the representation, fitness function, and developing suitable evolutionary operators) are known as Genetic Clustering Algorithms (GCAs). There are two parts to our investigation of GCAs: first we look at clustering into a given number of clusters. The performance of GCAs on three generated data sets, analysed using 4320 differing combinations of adaptions, establishes their efficacy. Choice of adaptions and parameter settings is data set dependent, but comparison between results using generated and real data sets indicate that performance is consistent for similar data sets with the same number of objects, clusters, attributes, and a similar distribution of objects. Generally, group-number representations are better suited to the clustering problem, as are dynamic scaling, elite selection and high mutation rates. Independent generalised models fitted to the correctness and timing results for each of the generated data sets produced accurate predictions of the performance of GCAs on similar real data sets. While GCAs can be successfully adapted to clustering, and the method produces results as accurate and correct as traditional methods, our findings indicate that, given a criterion based on simple distance metrics, GCAs provide no advantages over traditional methods. Second, we investigate the potential of genetic algorithms for the more general clustering problem, where the number of clusters is unknown. 
We show that only simple modifications to the adapted GCAs are needed. We have developed a merging operator, which with elite selection, is employed to evolve an initial population with a large number of clusters toward better clusterings. With regards to accuracy and correctness, these GCAs are more successful than optimisation methods such as simulated annealing. However, such GCAs can become trapped in local minima in the same manner as traditional hierarchical methods. Such trapping is characterised by the situation where good (k-1)-clusterings do not result from our merge operator acting on good k-clusterings. A marked improvement in the algorithm is observed with the addition of a local heuristic.
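To make the terms "group-number representation", "elite selection" and "mutation rate" concrete, here is a minimal sketch of a genetic clustering loop for a fixed number of clusters. It is an assumption-laden toy, not the thesis's algorithm: it uses a sum-of-squared-distances criterion, omits crossover and dynamic scaling, and all names and parameter values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cost(labels: Sequence[int], points: Sequence[Point], k: int) -> float:
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        total += sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in members)
    return total


def mutate(labels: Sequence[int], k: int, rate: float, rng: random.Random) -> List[int]:
    """Reassign each object to a random cluster with probability `rate`."""
    return [rng.randrange(k) if rng.random() < rate else lab for lab in labels]


def evolve(points: Sequence[Point], k: int, pop_size: int = 30,
           generations: int = 200, rate: float = 0.1, seed: int = 0) -> List[int]:
    """Group-number representation: gene i holds the cluster number of object i."""
    rng = random.Random(seed)
    population = [[rng.randrange(k) for _ in points] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ind: cost(ind, points, k))
        elite = population[: pop_size // 2]  # elite selection: best half survives
        children = [mutate(rng.choice(elite), k, rate, rng)
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    return min(population, key=lambda ind: cost(ind, points, k))


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
best = evolve(points, k=2)
```

Because the elite is carried over unchanged, the best cost in the population never gets worse from one generation to the next, which is the property elite selection is meant to provide.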
13

Ahsan, Nasir. "Learning causal networks from gene expression data." University of New South Wales. School of Computer Science and Engineering, 2006. http://handle.unsw.edu.au/1959.4/26151.

Abstract:
In this thesis we present a new model for identifying dependencies within a gene regulatory cycle. The model incorporates both probabilistic and temporal aspects, but is kept deliberately simple to make it amenable for learning from the gene expression data of microarray experiments. A key simplifying feature in our model is the use of a compression function for collapsing multiple causes of gene expression into a single cause. This allows us to introduce a learning algorithm which avoids the over-fitting tendencies of models with many parameters. We have validated the learning algorithm on simulated data, and carried out experiments on real microarray data. In doing so, we have discovered novel, yet plausible, biological relationships.
14

Li, Xiang. "Utilising Restricted For-Loops in Genetic Programming." RMIT University. Computer Science and Information Technology, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080110.122751.

Abstract:
Genetic programming is an approach that utilises the power of evolution to allow computers to evolve programs. While loops are natural components of most programming languages and appear in every reasonably-sized application, they are rarely used in genetic programming. This work investigates a number of restricted looping constructs to determine whether any significant benefits can be obtained in genetic programming. Possible benefits include solving problems which cannot be solved without loops, evolving smaller solutions which can be more easily understood by human programmers, and solving existing problems more quickly by using fewer evaluations. In this thesis, a number of explicit restricted loop formats were formulated and tested on the Santa Fe ant problem, a modified ant problem, a sorting problem, a visit-every-square problem and a difficult object classification problem. The experimental results showed that these explicit loops can be successfully used in genetic programming. The evolutionary process can decide when, where and how to use them. Runs with these loops tended to generate smaller solutions in fewer evaluations. Solutions with loops were found to some problems that could not be solved without loops. The results and analysis of this thesis have established that there are significant benefits in using loops in genetic programming. Restricted loops can avoid the difficulties of evolving consistent programs and the infinite iterations problem. Researchers and other users of genetic programming should not be afraid of loops.
15

Lei, Celestino. "Using genetic algorithms and boosting for data preprocessing." Thesis, University of Macau, 2002. http://umaclib3.umac.mo/record=b1447848.

16

Leslie, Stephen. "Inference of Population Stratification Using Population Genetic Data." Thesis, University of Oxford, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.504423.

17

Cheng, Lulu. "Statistical Methods for Genetic Pathway-Based Data Analysis." Diss., Virginia Tech, 2013. http://hdl.handle.net/10919/52039.

Abstract:
The wide application of genomic microarray technology has triggered a tremendous need for the development of high-dimensional genetic data analysis. Many statistical methods for microarray data analysis consider one gene at a time, but they may miss subtle changes at the single-gene level. This limitation may be overcome by considering a set of genes simultaneously, where the gene sets are derived from prior biological knowledge and are called "pathways". We have made contributions on two specific research topics related to high-dimensional genetic pathway data. One is to propose a semiparametric model for identifying pathways related to zero-inflated clinical outcomes; the other is to propose a multilevel Gaussian graphical model for exploring both pathway-level and gene-level network structures. For the first problem, we develop a semiparametric model via a Bayesian hierarchical framework. We model the pathway effect nonparametrically in a zero-inflated Poisson hierarchical regression model with unknown link function. The nonparametric pathway effect is estimated via the kernel machine, and the unknown link function is estimated by transforming a mixture of beta cumulative density functions. Our approach provides flexible semiparametric settings to describe the complicated association between gene microarray expressions and the clinical outcomes. The Metropolis-within-Gibbs sampling algorithm and Bayes factor are used to make the statistical inferences. Our simulation results support that the semiparametric approach is more accurate and flexible than zero-inflated Poisson regression with the canonical link function; this is especially true when the number of genes is large. The usefulness of our approaches is demonstrated through applications to a canine gene expression data set (Enerson et al., 2006). Our approaches can also be applied to other settings where a large number of highly correlated predictors are present. 
Unlike the first problem, the second takes into account that pathways are not independent of each other because of shared genes and interactions among pathways. Multi-pathway analysis has been a challenging problem because of the complex dependence structure among pathways. By considering the dependency among pathways as well as genes within each pathway, we propose a multilevel Gaussian graphical model (MGGM): one level is for the pathway network and the second is for the gene network. We develop a multilevel L1 penalized likelihood approach to achieve sparseness on both levels. We also provide an iterative weighted graphical LASSO algorithm (Guo et al., 2011) for MGGM. Some asymptotic properties of the estimator are also illustrated. Our simulation results support the advantages of our approach; our method estimates the network more accurately on the pathway level, and more sparsely on the gene level. We also demonstrate the usefulness of our approach using the canine genes-pathways data set.
Ph. D.
18

Liu, Dongqing. "GENETIC ALGORITHMS FOR SAMPLE CLASSIFICATION OF MICROARRAY DATA." University of Akron / OhioLINK, 2005. http://rave.ohiolink.edu/etdc/view?acc_num=akron1125253420.

19

Delman, Bethany. "Genetic algorithms in cryptography /." Link to online version, 2003. https://ritdml.rit.edu/dspace/handle/1850/263.

20

Xia, Fan, and 夏凡. "Some topics on statistical analysis of genetic imprinting data and microbiome compositional data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2014. http://hdl.handle.net/10722/206673.

Abstract:
Genetic association study is a useful tool to identify the genetic component that is responsible for a disease. The phenomenon that a certain gene expresses in a parent-of-origin manner is referred to as genomic imprinting. When a gene is imprinted, the performance of the disease-association study will be affected. This thesis presents statistical testing methods developed specially for nuclear family data centering around the genetic association studies incorporating imprinting effects. For qualitative diseases with binary outcomes, a class of TDTI* type tests was proposed in a general two-stage framework, where the imprinting effects were examined prior to association testing. On quantitative trait loci, a class of Q-TDTI(c) type tests and another class of Q-MAX(c) type tests were proposed. The proposed testing methods flexibly accommodate families with missing parental genotype and with multiple siblings. The performance of all the methods was verified by simulation studies. It was found that the proposed methods improve the testing power for detecting association in the presence of imprinting. The class of TDTI* tests was applied to a rheumatoid arthritis study data. Also, the class of Q-TDTI(c) tests was applied to analyze the Framingham Heart Study data. The human microbiome is the collection of the microbiota, together with their genomes and their habitats throughout the human body. The human microbiome comprises an inalienable part of our genetic landscape and contributes to our metabolic features. Also, current studies have suggested the variety of human microbiome in human diseases. With the high-throughput DNA sequencing, the human microbiome composition can be characterized based on bacterial taxa relative abundance and the phylogenetic constraint. Such taxa data are often high-dimensional overdispersed and contain excessive number of zeros. 
Taking these characteristics of taxa data into account, this thesis presents statistical methods to identify associations between covariates or outcomes and the human microbiome composition. To assess the effect of environmental and biological covariates on microbiome composition, an additive logistic normal multinomial regression model was proposed, and a group l1 penalized likelihood estimation method was further developed to facilitate selection of covariates and estimation of parameters. To identify microbiome components associated with biological or clinical outcomes, a Bayesian hierarchical regression model with a spike-and-slab prior for variable selection was proposed, and a Markov chain Monte Carlo algorithm that combines a stochastic variable selection procedure with random-walk Metropolis-Hastings steps was developed for model estimation. Both methods were illustrated using simulations as well as a real human gut microbiome dataset from The Penn Gut Microbiome Project.
Statistics and Actuarial Science
Doctoral
Doctor of Philosophy
21

Uduman, Mohamed. "Identifying the largest complete data set from ALFRED /." Link to online version, 2006. https://ritdml.rit.edu/dspace/handle/1850/1876.

22

Romano, Eduardo. "Selection indices for combining marker genetic data and animal model information." Thesis, Virginia Tech, 1993. http://hdl.handle.net/10919/44879.

Abstract:
It was suggested that marker and phenotypic information be combined in order to obtain more accurate or earlier genetic evaluations. An improvement in accuracy or time of evaluation due to utilization of marker assisted selection (MAS) increases genetic progress. Fernando and Grossman (1989) suggested including marker information directly into the Animal Model, Best Linear Unbiased Prediction system, but several problems need to be solved before their approach becomes feasible. Other selection indices were suggested but either do not use all the available information or are suitable only for evaluation of the offspring of the sire from which the marker information was established. A selection index combining marker and Animal Model information was developed to allow comparisons involving offspring, grandoffspring and great-grandoffspring of a sire. Marker information was assumed to be a least-squares estimate of the difference between the average effects of the two quantitative trait loci (QTL) alleles present in a sire (Dp) and the standard error of this estimate (SE(Dp)). Estimates may have been obtained from a daughter or granddaughter design. Comparisons among grandoffspring and great-grandoffspring also require an estimate of the recombination rate (r) between the marker and the QTL. The Animal Model information consists of predicted transmitting ability (PTA) and reliability of PTA. PTA was assumed not to include any marker information. The expected percentage of the gain in accuracy (PGA) due to the inclusion of marker information in the selection indices is affected by the degree of polymorphism at the marker locus. The polymorphism information content (PIC) of a marker locus was computed for the second and third generations and for mates genotyped or not. PGA increased with larger Dp, lower SE(Dp), lower r, a smaller number of own and progeny records, and larger PIC. PGA and PIC decrease over generations. 
Marker information in dairy cattle is likely to be used in generations beyond offspring. Then, only the use of highly polymorphic markers with a large and accurately estimated effect may be economically justified.
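The abstract does not give the PIC expressions it derives for later generations. For orientation only, the sketch below computes the standard single-generation PIC of Botstein et al. (1980) from allele frequencies, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2; the function name is hypothetical, and the thesis's generation-specific values may be defined differently.

```python
from itertools import combinations
from typing import Sequence


def polymorphism_information_content(freqs: Sequence[float]) -> float:
    """Standard PIC of a marker locus from its allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    # Correction for matings whose offspring genotypes are uninformative.
    uninformative = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - uninformative


two_allele = polymorphism_information_content([0.5, 0.5])
four_allele = polymorphism_information_content([0.25, 0.25, 0.25, 0.25])
```

With equally frequent alleles, adding alleles raises PIC, which matches the abstract's point that only highly polymorphic markers are likely to be worth using beyond the offspring generation.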
Master of Science
23

Stewart, William C. L. "Alternative models for estimating genetic maps from pedigree data /." Thesis, Connect to this title online; UW restricted, 2005. http://hdl.handle.net/1773/8975.

24

Jochumsson, Thorvaldur. "Inferring Genetic Networks from Expression Data with Mutual Information." Thesis, University of Skövde, Department of Computer Science, 2002. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-736.

Abstract:

Recent methods to infer genetic networks are based on identifying gene interactions by similarities in expression profiles. These methods are founded on the assumption that interacting genes share higher similarities in their expression profiles than non-interacting genes. In this dissertation this assumption is tested using mutual information as a similarity measure. Three algorithms that calculate mutual information between expression data are developed: 1) a basic approach implemented with the histogram technique; 2) an extension of the basic approach that takes into consideration the time delay between expression profiles; 3) an extension of the basic approach that takes into consideration that genes are regulated in a complex manner by multiple genes. In our experiments we compare the mutual information distributions for profiles of interacting and non-interacting genes. The results show that interacting genes do not share higher mutual information in their expression profiles than non-interacting genes, thus contradicting the basic assumption that similarity measures need to fulfil. This indicates that mutual information is not appropriate as a similarity measure, which contradicts earlier proposals.
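
As an illustration of the histogram technique mentioned in this abstract, the following is a minimal sketch and not the thesis's implementation. The function names, the equal-width binning, the default of four bins and the `lag` parameter (a simple stand-in for the time-delay extension) are all assumptions made for the example.

```python
import math
from collections import Counter


def bin_index(value, lo, hi, bins):
    """Map a value to one of `bins` equal-width bins over [lo, hi]."""
    if hi == lo:
        return 0  # a constant profile falls into a single bin
    idx = int((value - lo) / (hi - lo) * bins)
    return min(idx, bins - 1)  # the maximum value goes into the last bin


def mutual_information(x, y, bins=4, lag=0):
    """Histogram estimate of I(X;Y) in bits for two expression profiles.

    `lag` (hypothetical parameter) pairs x[t] with y[t + lag].
    """
    if len(x) != len(y) or not 0 <= lag < len(x):
        raise ValueError("profiles must have equal length and lag must fit")
    if lag:
        x, y = x[:-lag], y[lag:]
    bx = [bin_index(v, min(x), max(x), bins) for v in x]
    by = [bin_index(v, min(y), max(y), bins) for v in y]
    n = len(bx)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    # Sum p(i,j) * log2(p(i,j) / (p(i) * p(j))) over the occupied cells.
    return sum(
        (c / n) * math.log2(c * n / (px[i] * py[j]))
        for (i, j), c in pxy.items()
    )
```

Histogram estimates of this kind are sensitive to the number of bins and to short profiles, so values computed from a handful of time points should be read with caution.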

25

Ayaz, Eyup Serdar. "Reconstructing Signaling Pathways From RNAi Data Using Genetic Algorithms." Master's thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12613813/index.pdf.

Abstract:
Cell signaling is a set of chemical reactions used for intercellular and intracellular communication. Signaling pathways describe these chemical reactions in a systematic manner. Today, many signaling pathways have been constructed by several experimental methods. However, many communication mechanisms of cells remain to be discovered. The RNAi system allows us to observe the phenotypes that result when some genes are removed from living cells. By observing these phenotypes, we can build signaling pathways. However, this is costly in terms of time and space complexity. Furthermore, there are some interactions that RNAi data cannot distinguish, which results in many different signaling pathways, all of which are consistent with the RNAi data. In this thesis, we combine genetic algorithms with some greedy approaches to find the topologies that fit the Boolean single knock-down RNAi experiments. Our algorithm finds nearly all of the results for small inputs in a few minutes. It can also find a significant number of results for larger inputs. We then eliminate isomorphic topologies from the output set of this algorithm, which considerably reduces the number of topologies. Afterwards, we offer a simple scheme for suggesting new wet-lab RNAi experiments, which is necessary for a complete approach to finding the actual network. We also describe a new activation and deactivation model for pathways in which the activation of the phenotype after RNAi knock-downs is given as weighted variables. We adapt the first genetic algorithm approach to this model to find the most probable network directly.
26

Liu, Kejun. "Software and Methods for Analyzing Molecular Genetic Marker Data." NCSU, 2003. http://www.lib.ncsu.edu/theses/available/etd-07182003-122001/.

Abstract:
Genetic analysis of molecular markers has allowed biologists to ask a wide variety of questions. This dissertation explores some of the statistical and computational issues that arise in genetic marker data analysis. Chapter 1 gives an introduction to genetic marker data, as well as a brief description of each chapter. Chapter 2 presents the different genetic analyses performed on a large data set and discusses the use of microsatellites to describe the maize germplasm and to improve maize germplasm maintenance. Considerable attention is focused on how the maize germplasm is organized and how genetic variation is distributed. A novel maximum likelihood method is developed to estimate the historical contributions for maize inbred lines. Chapter 3 covers a new method for optimal selection of a core set of lines from a large germplasm collection. The simulated annealing algorithm for choosing an optimal k-subset is described and evaluated using the maize germplasm as an example; general constraints are incorporated in the algorithm, and the efficiency of the algorithm is compared to existing methods. Chapter 4 covers a two-stage strategy to partition a chromosomal region into blocks with extensive within-block linkage disequilibrium, and to select the optimal subset of SNPs that essentially captures the haplotype variation within a block. Population simulations suggest that the recursive bisection algorithm for block partitioning is generally reliable for identifying recombination hotspots. Maximal entropy theory is applied to choose the optimal subset of SNPs. The procedures are evaluated analytically as well as by simulation. The final chapter covers a new software package for genetic marker data analysis. The methods implemented in the package are listed. A brief tutorial is included to illustrate the features of the package. Chapter 5 also describes a new method for estimating population-specific F-statistics and an extended algorithm for estimating haplotype frequencies.
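
The idea of choosing a k-subset by simulated annealing, as described for Chapter 3, can be sketched as follows. This is a hedged illustration only: the objective (`allele_richness`, the number of distinct alleles captured), the swap move, the cooling schedule and every name in the code are assumptions, since the abstract does not state which criterion or schedule the dissertation uses.

```python
import math
import random


def allele_richness(subset, genotypes):
    """Count distinct (locus, allele) pairs carried by the chosen lines."""
    seen = set()
    for line in subset:
        seen.update(enumerate(genotypes[line]))
    return len(seen)


def anneal_core_set(genotypes, k, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for a k-subset of lines with high allele richness."""
    rng = random.Random(seed)
    lines = list(range(len(genotypes)))
    if not 0 < k < len(lines):
        raise ValueError("k must be between 1 and the number of lines - 1")
    current = set(rng.sample(lines, k))
    score = allele_richness(current, genotypes)
    best, best_score = set(current), score
    temp = t0
    for _ in range(steps):
        # Propose a swap: drop one chosen line, add one unchosen line.
        out = rng.choice(sorted(current))
        inn = rng.choice([i for i in lines if i not in current])
        candidate = (current - {out}) | {inn}
        cand_score = allele_richness(candidate, genotypes)
        delta = cand_score - score
        # Always accept improvements; accept losses with probability
        # exp(delta / temp), which shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, score = candidate, cand_score
            if score > best_score:
                best, best_score = set(current), score
        temp *= cooling
    return sorted(best), best_score
```

Here `genotypes` is assumed to be a list with one tuple of alleles per line, one allele per locus; constraints such as forcing particular lines into the core set are left out of the sketch.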
27

Romero, Carol Eduardo. "A genetic algorithm for reservoir characterisation using production data." Thesis, Imperial College London, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.394511.

28

Tong, Dong Ling. "Genetic algorithm-neural network : feature extraction for bioinformatics data." Thesis, Bournemouth University, 2010. http://eprints.bournemouth.ac.uk/15788/.

Abstract:
With the advance of gene expression data in the bioinformatics field, the questions which frequently arise, for both computer and medical scientists, are which genes are significantly involved in discriminating cancer classes and which genes are significant with respect to a specific cancer pathology. Numerous computational analysis models have been developed to identify informative genes from microarray data; however, the integrity of the reported genes is still uncertain. This is mainly due to misconceptions about the objectives of microarray studies. Furthermore, the application of various preprocessing techniques to microarray data has jeopardised the quality of the data. As a result, the integrity of the findings has been compromised by the improper use of techniques and the ill-conceived objectives of the study. This research proposes an innovative hybridised model based on genetic algorithms (GAs) and artificial neural networks (ANNs) to extract the highly differentially expressed genes for a specific cancer pathology. The proposed method can efficiently extract the informative genes from the original data set, and this has reduced the gene variability errors incurred by the preprocessing techniques. The novelty of the research comes from two perspectives. Firstly, the research emphasises extracting informative features from a high-dimensional and highly complex data set, rather than improving classification results. Secondly, an ANN is used to compute the fitness function of the GA, which is rare in the context of feature extraction. Two benchmark microarray data sets have been used to investigate the prominent genes expressed in tumour development, and the results show that the genes respond to different stages of tumourigenesis (i.e. different fitness precision levels), which may be useful for early malignancy detection. The extraction ability of the proposed model is validated against the expected results in synthetic data sets. In addition, two bioassay data sets have been used to examine the efficiency of the proposed model in extracting significant features from large, imbalanced bioassay data with multiple data representations.
29

Carter, Jason W. "Testing effectiveness of genetic algorithms for exploratory data analysis." Thesis, Monterey, California. Naval Postgraduate School, 1997. http://hdl.handle.net/10945/9065.

Abstract:
Approved for public release; distribution is unlimited
Heuristic methods of solving exploratory data analysis problems suffer from one major weakness - uncertainty regarding the optimality of the results. The developers of DaMI (Data Mining Initiative), a genetic algorithm designed to mine the CCEP (Comprehensive Clinical Evaluation Program) database in the search for a Persian Gulf War syndrome, proposed a method to overcome this weakness: reproducibility -- the conjecture that consistent convergence on the same solutions is both necessary and sufficient to ensure a genetic algorithm has effectively searched an unknown solution space. We demonstrate the weakness of this conjecture in light of accepted genetic algorithm theory. We then test the conjecture by modifying the CCEP database with the insertion of an interesting solution of known quality and performing a discovery session using DaMI on this modified database. The necessity of reproducibility as a terminating condition is falsified by the algorithm finding the optimal solution without yielding strong reproducibility. The sufficiency of reproducibility as a terminating condition is analyzed by manual examination of the CCEP database in which strong reproducibility was experienced. Ex post facto knowledge of the solution space is used to prove that DaMI had not found the optimal solutions though it gave strong reproducibility, causing us to reject the conjecture that strong reproducibility is a sufficient terminating condition.
30

Lin, Xinyi (Cindy). "Statistical Methods for High-Dimensional Data in Genetic Epidemiology." Thesis, Harvard University, 2014. http://dissertations.umi.com/gsas.harvard:11326.

Abstract:
Recent technological advancements have enabled us to collect an unprecedented amount of genetic epidemiological data. The overarching goal of these genetic epidemiology studies is to uncover the underlying biological mechanisms so that improved strategies for disease prevention and management can be developed. To efficiently analyze and interpret high-dimensional biological data, it is imperative to develop novel statistical methods as conventional statistical methods are generally not applicable or are inefficient. In this dissertation, we introduce three novel, powerful and computationally efficient kernel machine set-based association tests for analyzing high-throughput genetic epidemiological data. In the first chapter, we construct a test for identifying common genetic variants that are predictive of a time-to-event outcome. In the second chapter, we develop a test for identifying gene-environment interactions for common genetic variants. In the third chapter, we propose a test for identifying gene-environment interactions for rare genetic variants.
31

Li, Qiao. "Data mining and statistical techniques applied to genetic epidemiology." Thesis, University of East Anglia, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.533716.

Abstract:
Genetic epidemiology is the study of the joint action of genes and environmental factors in determining the phenotypes of diseases. The twin study is a classic and important epidemiological tool, which can help to separate the underlying effects of genes and environment on phenotypes. Twin data have been widely examined using traditional methods of genetic epidemiological research. However, they provide a rich source of information related to many complex phenotypes that has the potential to be further explored and exploited. This thesis focuses on two major genetic epidemiological approaches, familial aggregation analysis and linkage analysis, using twin data from the TwinsUK Registry. Structural equation modelling (SEM) is a conventional method used in familial aggregation analysis, and is applied in this research to discover the underlying genetic and environmental influences on two complex phenotypes: coping strategies and osteoarthritis. However, SEM is a confirmatory method and relies on prior biomedical hypotheses. A new exploratory method, named MDS-C, which combines multidimensional scaling and clustering, is developed in this thesis. It does not rely on prior hypothetical models and is applied to uncover underlying genetic determinants of bone mineral density (BMD). The results suggest that the genetic influence on BMD is site-specific. Haseman-Elston (H-E) regression is a conventional linkage analysis approach that uses the identity by descent (IBD) information between twins to detect quantitative trait loci (QTLs) which regulate the quantitative phenotype. However, it only considers the genetic effect from individual loci. Two new approaches, a pair-wise H-E regression (PWH-E) and a feature screening approach (FSA), are proposed in this research to detect QTLs while allowing for gene-gene interaction. Simulation studies demonstrate that PWH-E and FSA have greater power to detect QTLs with interactions. Application to real-world BMD data identifies a set of potential QTLs, including 7 chromosomal loci consistent with previous genome-wide studies.
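
For readers unfamiliar with the conventional Haseman-Elston regression that this abstract builds on, the sketch below shows the classic single-locus form: the squared trait difference of each pair is regressed on the proportion of alleles the pair shares identical by descent, and a clearly negative slope is taken as evidence of linkage. It is a minimal illustration with hypothetical names, not the thesis's PWH-E or FSA methods, and it omits the significance test.

```python
def haseman_elston(trait_pairs, ibd_share):
    """Ordinary least squares of squared pair differences on IBD sharing.

    trait_pairs: list of (trait_of_twin_1, trait_of_twin_2)
    ibd_share:   proportion of alleles shared IBD at the locus, per pair
    Returns (slope, intercept); a negative slope suggests linkage.
    """
    if len(trait_pairs) != len(ibd_share) or len(trait_pairs) < 2:
        raise ValueError("need at least two pairs with matching IBD values")
    sq_diff = [(a - b) ** 2 for a, b in trait_pairs]
    n = len(sq_diff)
    mean_x = sum(ibd_share) / n
    mean_y = sum(sq_diff) / n
    sxx = sum((x - mean_x) ** 2 for x in ibd_share)
    if sxx == 0:
        raise ValueError("IBD sharing must vary across pairs")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ibd_share, sq_diff))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

In practice the slope would be tested one-sided against zero at each marker position; that step, and any handling of estimated rather than known IBD values, is outside this sketch.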
32

ALMEIDA, MANOEL ROBERTO AGUIRRE DE. "HIBRID NEURO-FUZZY-GENETIC SYSTEM FOR AUTOMATIC DATA MINING." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2004. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=5303@1.

Abstract:
CONSELHO NACIONAL DE DESENVOLVIMENTO CIENTÍFICO E TECNOLÓGICO
Esta dissertação apresenta a proposta e o desenvolvimento de um sistema de mineração de dados inteiramente automático. O objetivo principal é criar um sistema que seja capaz de realizar a extração de informações obscuras a partir de bases de dados complexas, sem exigir a presença de um especialista técnico para configurá-lo. O sistema híbrido neuro-fuzzy hierárquico com particionamento binário (NFHB) vem apresentando excelentes resultados em tarefas de classificação de padrões e previsão, além de possuir importantes características não encontradas em outros sistemas similares, entre elas: aprendizado automático de sua estrutura; capacidade de receber um número maior de entradas abrangendo um maior número de aplicações; e geração de regras lingüísticas como produto de seu treinamento. Entretanto, este modelo ainda necessita de uma complexa parametrização inicial antes de seu treinamento, impedindo que o processo seja automático em sua totalidade. O novo modelo proposto busca otimizar a parametrização do sistema NFHB utilizando a técnica de coevolução genética, criando assim um novo sistema de mineração de dados completamente automático. O trabalho foi realizado em quatro partes principais: avaliação de sistemas existentes utilizados na mineração de dados; estudo do sistema NFHB e a determinação de seus principais parâmetros; desenvolvimento do sistema híbrido neuro-fuzzy-genético automático para mineração de dados; e o estudo de casos. No estudo dos sistemas existentes para mineração de dados buscou-se encontrar algum modelo que apresentasse bons resultados e ainda fosse passível de automatização. Várias técnicas foram estudadas, entre elas: Métodos Estatísticos, Árvores de Decisão, Associação de Regras, Algoritmos Genéticos, Redes Neurais Artificiais, Sistemas Fuzzy e Sistemas Neuro-Fuzzy. O sistema NFHB foi escolhido como sistema de inferência e extração de regras para a realização da mineração de dados. 
Deste modo, este modelo foi estudado e seus parâmetros mais importantes foram determinados. Além disso, técnicas de seleção de variáveis de entradas foram investigadas para servirem como opções para o modelo. Ao final, foi obtido um conjunto de parâmetros que deve ser automaticamente determinado para a completa configuração deste sistema. Um modelo coevolutivo genético hierárquico foi criado para realizar com excelência a tarefa de otimização do sistema NFHB. Desta forma, foi modelada uma arquitetura hierárquica de Algoritmos Genéticos (AG s), onde os mesmos realizam tarefas de otimização complementares. Nesta etapa, também foram determinados os melhores operadores genéticos, a parametrização dos AG s, a melhor representação dos cromossomas e as funções de avaliação. O melhor conjunto de parâmetros encontrado é utilizado na configuração do NFHB, tornando o processo inteiramente automático. No estudo de casos, vários testes foram realizados em bases de dados reais e do tipo benchmark. Para problemas de previsão, foram utilizadas séries de carga de energia elétrica de seis empresas: Cerj, Copel, Eletropaulo, Cemig, Furnas e Light. Na área de classificação de padrões, foram utilizadas bases conhecidas de vários artigos da área como Glass Data, Wine Data, Bupa Liver Disorders e Pima Indian Diabetes. Após a realização dos testes, foi feita uma comparação com os resultados obtidos por vários algoritmos e pelo NFHB original, porém com parâmetros determinados por um especialista. Os testes mostraram que o modelo criado obteve resultados bastante satisfatórios, pois foi possível, com um processo completamente automático, obter taxas de erro semelhantes às obtidas por um especialista, e em alguns casos taxas menores. Desta forma, um usuário do sistema, sem qualquer conhecimento técnico sobre os modelos utilizados, pode utilizá-lo para realizar min
This dissertation presents the proposal and development of a fully automatic data mining system. The main objective is to create a system capable of extracting obscure information from complex databases without requiring a technical specialist to configure it. The Hierarchical Neuro-Fuzzy Binary Space Partitioning model (NFHB) has produced excellent results in pattern classification and time series forecasting tasks. Additionally, it provides important features that are not present in other similar systems, such as: automatic learning of its structure; the ability to deal with a larger number of input variables, thus increasing the range of possible applications; and generation of linguistic rules as a result of its training process. However, this model depends on a complex configuration process before training is performed, which prevents the process from being fully automatic. The model proposed in this dissertation seeks to optimize the NFHB system parameters by using the genetic coevolution technique, thus creating a new automatic data mining system. This work consisted of four main parts: evaluation of existing systems used in data mining; study of the NFHB system and definition of its main parameters; development of the automatic hybrid neuro-fuzzy-genetic system for data mining; and case studies. In the study of existing data mining systems, the aim was to find a suitable model that could yield good results and still be automated. Several techniques were studied, among them: Statistical Methods, Decision Trees, Association Rules, Genetic Algorithms, Artificial Neural Networks, Fuzzy Systems and Neuro-Fuzzy Systems. The NFHB system was chosen for inference and rule extraction in the data mining process. This model was therefore carefully studied and its most important parameters were determined. Moreover, input variable selection techniques were investigated, to be used with the proposed model. Finally, a set of parameters was defined, which must be determined automatically for the complete system configuration. A hierarchical coevolutionary genetic model was created to perform the system optimization task efficiently. To this end, a hierarchical architecture of genetic algorithms (GAs) was designed, in which the GAs perform complementary optimization tasks. In this stage, the best genetic operators, the GA configuration, the chromosome representation, and the evaluation functions were also determined. The best set of parameters found was used in the NFHB configuration, making the process entirely automatic. In the case studies, various tests were performed with benchmark databases. For forecasting problems, six electric load series were used: Cerj, Copel, Eletropaulo, Cemig, Furnas and Light. In the pattern classification area, some well-known databases were used, namely Glass Data, Wine Data, Bupa Liver Disorders and Pima Indian Diabetes. After the tests were carried out, a comparison was made with known models and with the original NFHB system configured by a specialist. The tests demonstrated that the proposed model generates satisfactory results: with an automatic process, it produces errors similar to those obtained with a specialist configuration and, in some cases, even better results. Therefore, a user without any technical knowledge of the system can use it to perform data mining, extracting information and knowledge that can support decision-making, which is the final objective of a Knowledge Discovery in Databases (KDD) process.
33

MEDEIROS, SHELLY CRISTIANE DAVILA. "INVERSION OF PARAMETERS IN SEISMIC DATA BY GENETIC ALGORITHMS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2005. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=8622@1.

Abstract:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
Esta dissertação investiga o uso de Algoritmos Genéticos aplicados em dados sísmicos com o objetivo de obter parâmetros físicos e atributos sísmicos que auxiliem na caracterização das rochas de um subsolo terrestre. Os dados sísmicos têm sido extensamente empregados no setor de exploração de petróleo. As aplicações envolvendo sísmica não se restringem na busca por novas reservas de petróleo, mas também são usadas para projetar novos poços e melhorar a produção dos reservatórios de petróleo. O levantamento de dados sísmicos permite analisar extensas áreas da subsuperfície com custo praticável em relação a outras técnicas. Entretanto, a interpretação desses dados com o objetivo de obter informações relevantes e acuradas não é uma tarefa simples. Para isto, várias técnicas de inversão sísmica vêm sendo desenvolvidas. Este trabalho consistiu em avaliar uma alternativa que emprega Algoritmos Genéticos para inverter parâmetros a partir de dados sísmicos. Existem 3 etapas principais neste trabalho. Primeiramente, foram estudados o tema da exploração sísmica e a técnica de Algoritmos Genéticos. Na segunda etapa foi definido um modelo, usando Algoritmos Genéticos, que busca, neste caso, minimizar uma medida de erro, para obtenção dos parâmetros objetivos. Finalmente, foi implementado um sistema a partir do modelo proposto e realizados os estudos de casos com dados sísmicos sintéticos para avaliar o seu desempenho. O modelo baseado em Algoritmos Genéticos foi avaliado submetendo-se seus resultados a um especialista e comparando-os com os da busca aleatória. Os resultados obtidos se mostraram consistentemente satisfatórios e sempre superiores aos da busca exaustiva.
This dissertation investigates the use of Genetic Algorithms applied to seismic data with the objective of obtaining physical parameters and seismic attributes that help characterize subsurface rocks. Seismic data have been used extensively in petroleum exploration. Seismic applications are not restricted to the search for new petroleum reserves; they are also used to plan new wells and to improve the production of existing petroleum reservoirs. Seismic surveys allow extensive areas of the subsurface to be analyzed at an affordable cost relative to other techniques. However, interpreting these data to obtain relevant and accurate information is not an easy task. For this purpose, several seismic inversion techniques have been developed. This work evaluates an alternative that uses Genetic Algorithms to invert parameters from seismic data. The work has three main stages. First, seismic exploration and the Genetic Algorithm technique were studied. In the second stage, a model using Genetic Algorithms was defined, which in this case seeks to minimize an error measure in order to obtain the target parameters. Finally, a system based on the proposed model was implemented, and case studies with synthetic seismic data were carried out to evaluate its performance. The optimization process was compared with random search, and the results obtained by the model were always superior.
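
The general pattern described here, a genetic algorithm that searches for model parameters minimizing an error measure against observed data, can be sketched as below. Everything specific in the code is an assumption for illustration: the toy linear forward model stands in for a real seismic forward model, and the operators (elitism, blend crossover, Gaussian mutation) are common textbook choices rather than the ones used in the dissertation.

```python
import random


def forward(params, times):
    """Toy forward model (an assumption): a linear trend a + b * t."""
    a, b = params
    return [a + b * t for t in times]


def misfit(params, times, observed):
    """Sum of squared differences between modelled and observed data."""
    return sum((p - o) ** 2 for p, o in zip(forward(params, times), observed))


def invert(times, observed, bounds, pop_size=30, generations=60, seed=0):
    """Return (best_params, history) where history holds the best misfit
    at the start of each generation plus the final best misfit."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda p: misfit(p, times, observed))
        history.append(misfit(pop[0], times, observed))
        elite = pop[: pop_size // 5]
        children = [list(p) for p in elite]  # elitism: keep the best as-is
        while len(children) < pop_size:
            mum, dad = rng.sample(elite, 2)
            w = rng.random()  # blend crossover weight
            child = []
            for m, d, (lo, hi) in zip(mum, dad, bounds):
                gene = w * m + (1 - w) * d + rng.gauss(0, 0.05 * (hi - lo))
                child.append(max(lo, min(hi, gene)))  # stay inside bounds
            children.append(child)
        pop = children
    best = min(pop, key=lambda p: misfit(p, times, observed))
    history.append(misfit(best, times, observed))
    return best, history
```

Because the elite individuals are copied unchanged into the next generation, the best misfit in `history` can never increase; how close it gets to the true parameters depends on the population size, mutation scale and number of generations chosen.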
34

Minichiello, Mark Joseph. "Analysis of genetic variation data using ancestral recombination graphs." Thesis, University of Cambridge, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.613255.

35

Jaffrezic, Florence. "Statistical models for the genetic analysis of longitudinal data." Thesis, University of Edinburgh, 2001. http://hdl.handle.net/1842/12274.

Abstract:
The first objective of this work was to compare and contrast different methodologies for genetic analysis. As the range of all possible models can be very large in practice, it is advisable to have a preliminary idea of the covariance structure of the data, and a non-parametric approach based on the variogram was proposed. It is especially suited to exploratory analysis when a large number of observations per subject is available over time, and it was applied to the analysis of daily records of milk production in dairy cattle. Model comparisons in the univariate case showed that character process (CP) models were generally better able to fit the covariance structure than random regression with fewer parameters. However, CP models do not allow a straightforward extension to the multivariate case. Further research showed that structured antedependence (SAD) models offer similar advantages to character processes compared to random regression while allowing an extension to multi-trait analyses. SAD models were even able to capture the highly non-stationary correlation pattern in the application to lactation curve analysis. For genetic evaluation of dairy cattle, longitudinal models can easily provide estimates of individual cumulative milk production as well as genetic values at 305 days. However, these predictions do not take into account the drying-off process and can be highly overestimated for short lactations. A methodology to correct them was suggested. All these analyses were performed for normally distributed longitudinal data. An extension to the genetic analysis of non-normally distributed repeated measures was considered. The estimation procedure becomes much more complicated and requires the use of Markov chain Monte Carlo methods. In this study, antedependence models appeared to be the most appropriate for genetic analysis of longitudinal data.
36

Alsulaiman, Thamer. "Detecting complex genetic mutations in large human genome data." Diss., University of Iowa, 2019. https://ir.uiowa.edu/etd/6908.

Abstract:
All cellular forms of life contain deoxyribonucleic acid (DNA). DNA is a molecule that carries all the information necessary to perform both basic and complex cellular functions. DNA is replicated to form new tissues and organs, and to pass genetic information to future generations. DNA replication ideally yields an exact copy of the original DNA. While replication generally occurs without error, it may leave DNA vulnerable to accidental changes via mistakes made during the replication process. Those changes are called mutations. Mutations range in magnitude. Yet mutations of any magnitude range in consequences, from no effect on the organism, to disease initiation (e.g. cancer), or even death. In this thesis, we limit our focus to mutations in human DNA, and in particular to MMBIR mutations. Recent literature in human genomics has found microhomology-mediated break-induced replication (MMBIR) to be a common mechanism producing complex mutations in DNA. MMBIRFinder is a tool to detect MMBIR regions in yeast DNA. Although MMBIRFinder is successful on yeast DNA, it is not capable of detecting MMBIR mutations in human DNA. Among several reasons, one major reason for this limitation is the amount of computation required to process large human data sets. Our contribution in this regard is twofold: 1) we utilize parallel computation to significantly reduce the processing time consumed by the original MMBIRFinder, and address several performance-degrading issues inherent in the original design; 2) we introduce a new heuristic to detect MMBIR mutations that were not detected by the original MMBIRFinder, even in the case of small genomes such as yeast DNA.
37

Evenstone, Lauren. "Employing Limited Next Generation Sequence Data for the Development of Genetic Loci of Phylogenetic and Population Genetic Utility." FIU Digital Commons, 2015. http://digitalcommons.fiu.edu/etd/2191.

Abstract:
Massively parallel high-throughput sequencers are transforming scientific research by reducing the cost and time necessary to sequence entire genomes. The goal of this project is to produce preliminary genome assemblies of calliphorid flies using Life Technologies’ Ion Torrent sequencing and Illumina’s MiSeq sequencing. I located, assembled, and annotated a novel mitochondrial genome for one such fly, the little-studied Chrysomya pacifica, which is central to one hypothesis about blow fly evolution. With sequencing data from Chrysomya megacephala, its forensically relevant sister species, much insight can be gained from alignments, sequence and protein analysis, and many other tools within the CLC Genomics Workbench software program. Here I present these analyses of these recently diverged species.
38

Chen, Li. "Searching for significant feature interaction from biological data." Diss., Online access via UMI:, 2007.

39

Rodríguez, Botigué Laura 1984. "Demographic insights of human north African populations using genetic data." Doctoral thesis, Universitat Pompeu Fabra, 2012. http://hdl.handle.net/10803/108336.

Abstract:
The history of North Africa is extremely complex, and it has been difficult to assess from genetic and archeological data whether early populations were replaced by later migrations or whether there has been continuous settlement of the region. To resolve the history of human origin and migrations in North Africa, I have used two main forms of genetic data, the maternally inherited mtDNA and 730,000 genome-wide SNPs from a genotype array, in a sample set representative of the region. I have discovered that North Africa is a mosaic of an autochthonous component dating back to the Paleolithic and at least four other ancestries, two recent ancestries from sub-Saharan Africa and the others from Europe and the Near East. We have also discovered extensive North African gene flow to the Iberian Peninsula, and minor proportions in the rest of Europe.
La història del Nord d’Àfrica és extremadament complexa, i fins ara ha estat molt difícil determinar a partir de la genètica o l’arqueologia si els primers pobladors van ser reempleçats per migracions posteriors, o si el poblament de la regió ha estat continuat al llarg del temps. Per tal d’investigar els orígens i les migracions de l’home al Nord d’Àfrica he fet servir dos marcadors genètics en un grup de poblacions representatives de la regio, el marcador heretat per via materna, el DNA mitocondrial (mtDNA), i 730,000 SNPs de tot el genoma genotipats amb un xip. He descobert que el Nord d’Àfrica és un mosaic format per un component autòcton amb origens en el Paleolític i un mínim de quatre components més, dos d’ells recents d’origen sub-Saharià i els altres Europeu i d’Orient Proper. També hem descobert un flux genic recent d’origen Nord Africà molt elevat a la Península Ibèrica, i en menor quantitat a Europa.
40

Lindlöf, Angelica. "Deriving Genetic Networks from Gene Expression Data and Prior Knowledge." Thesis, University of Skövde, Department of Computer Science, 2001. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-589.

Abstract:

In this work, three different approaches for deriving genetic association networks were tested. The three approaches were Pearson correlation, an algorithm based on the Boolean network approach, and prior knowledge. Pearson correlation and the algorithm based on the Boolean network approach derived associations from gene expression data. In the third approach, prior knowledge from a known genetic network of a related organism was used to derive associations for the target organism, by using homolog matching and mapping the known genetic network to the related organism. The results indicate that the Pearson correlation approach gave the best results, but the prior knowledge approach seems to be the one most worth pursuing.
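
To illustrate the correlation-based approach named in this abstract, here is a minimal sketch of deriving an association network from expression profiles. It is not the thesis's implementation: the gene names, the expression values, the `association_network` helper, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (a constant profile has zero
    variance and would cause a division by zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def association_network(expression, threshold=0.9):
    """Return (gene1, gene2, r) edges whose |r| meets the threshold.

    The threshold is an illustrative assumption, not a value taken
    from the thesis.
    """
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical expression profiles measured under four conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 2.5],
}
edges = association_network(expression)
```

Taking the absolute value of the correlation keeps strongly anti-correlated pairs as candidate associations; whether that is appropriate depends on the biological question.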

41

Gay, Jo. "Estimating the rate of gene conversion from population genetic data." Thesis, University of Oxford, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.509942.

42

Hobbs, Mike. "Genetic algorithms for spatial data analysis in geographical information systems." Thesis, University of Kent, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.262636.

43

Sthamer, Harmen-Hinrich. "The automatic generation of software test data using genetic algorithms." Thesis, University of South Wales, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.320726.

44

Xu, Yun. "Chemometrics pattern recognition with applications to genetic and metabolomics data." Thesis, University of Bristol, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.435733.

45

Kumuthini, Judit. "Extraction of genetic network from microarray data using Bayesian framework." Thesis, Cranfield University, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.442547.

46

Eisner, Eric David. "Incorporating diverse data to improve genetic network alignment with IsoRank." Thesis, Massachusetts Institute of Technology, 2011. http://hdl.handle.net/1721.1/77068.

Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 26).
To more accurately predict which genes from different species have the same function (orthologs), I extend the network-alignment algorithm IsoRank to simultaneously align multiple unrelated networks over the same set of nodes. In addition to the original protein-interaction networks, I align genetic-interaction networks, gene-expression correlations, and chromosome localization data to improve the functional similarity of aligned genes. Alignments are evaluated with consistency measurements of protein function within ortholog clusters, and with an information-retrieval statistic from a small set of known orthologs. Integrating these additional types of data is shown to improve IsoRank's predictions of classes of genes that have sparse coverage in the original protein-interaction networks.
by Eric David Eisner.
M.Eng.
47

Gerbault, P. "Modeling demographic and evolutionary history : integrating genetic and archaeological data." Thesis, University College London (University of London), 2013. http://discovery.ucl.ac.uk/1389022/.

Abstract:
In recent years, population genetic data has been used increasingly to make inferences on population history, particularly concerning the human past. However, any observed genetic data is only one outcome amongst a very large number of possibilities that can arise under the same population history; conversely, multiple different histories can give rise to the same data. For this reason, direct interpretation of patterns in genetic data to recover evolutionary history is highly problematic, and inferring evolutionary histories (including demographic and natural selection-related parameters) in a secure statistical framework requires the exploration of a range of explicit models. Such models are better informed when conditioned on multiple data sources rather than purely genetic data, such as archaeological, environmental, and behavioural or cultural data. This PhD aims at integrating such data into a single framework in order to examine how well various population history hypotheses can explain observed patterns in various data. The approach I have used is simulation modeling coupled with approximate Bayesian computation (ABC) techniques to investigate three population histories. I have used a forward simulation model coupled with ABC to investigate demographic and evolutionary parameters of (i) the evolution of the ectodysplasin-A receptor (EDAR) derived allele in Southeast Asia, and (ii) the gene-culture co-evolution of lactase persistence (LP) and dairying in Europe. In both cases, the model simulates the allele frequency and the underlying population demography, conditioned on archaeological data for when and where farming starts. Natural selection is inferred to have driven both alleles to high frequencies in their respective regions. However, the reasons why these alleles would have been favored are still unclear. I therefore further apply the simulation / ABC framework to explore various selective hypotheses conditioned on key environmental factors.
(iii) I have used a coalescent approach to assess how domestication has affected goat mitochondrial DNA (mtDNA) diversity. I have used the coalescent to simulate genealogies under demographic models informed by archaeological data. I then applied an ABC approach to determine which of those models best explains the observed patterns of mtDNA diversity in goats and to estimate demographic parameters from those models.
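
The general idea of approximate Bayesian computation mentioned in this abstract can be sketched with a toy rejection sampler. This is not the thesis's model: the binomial sampling simulator, the uniform prior on allele frequency, the observed count of 30 derived alleles among 100 chromosomes, and the tolerance of 2 are all assumptions chosen only to show the simulate, compare, accept loop.

```python
import random


def simulate_count(freq, n, rng):
    """Simulate the number of derived alleles among n sampled chromosomes."""
    return sum(1 for _ in range(n) if rng.random() < freq)


def abc_rejection(observed, n, draws=5000, tolerance=2, seed=0):
    """Rejection ABC for an allele frequency (toy example).

    Draw a frequency from a uniform prior, simulate data under it, and
    keep the draw only if the simulated count falls within `tolerance`
    of the observed count. The kept draws approximate the posterior.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(draws):
        freq = rng.random()  # uniform prior on [0, 1)
        if abs(simulate_count(freq, n, rng) - observed) <= tolerance:
            accepted.append(freq)
    return accepted


# Hypothetical observation: 30 derived alleles among 100 chromosomes.
accepted = abc_rejection(observed=30, n=100)
posterior_mean = sum(accepted) / len(accepted)
```

In a realistic analysis the simulator would be a demographic or selection model, the comparison would use several summary statistics, and the tolerance would be chosen to balance acceptance rate against approximation error.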
48

Trochet, Holly. "Simple Bayesian approaches to modelling pleiotropy in genetic association data." Thesis, University of Oxford, 2017. https://ora.ox.ac.uk/objects/uuid:a1e3d606-ef39-4ef7-981e-088d6703c04f.

Abstract:
Genome-wide association studies (GWAS) have become one of the most common types of genetic studies, due to their success at finding markers associated with diseases and other traits of interest. With thousands of these studies performed in the last decade, a variety of methods have been developed to meta-analyze their results to search for variants that affect multiple traits. This thesis introduces one such method, based on approximate Bayes factors (ABFs), which relies only on effect size estimates and standard errors from GWAS. Through application to simulated data as well as three different datasets, we demonstrate the statistical properties, strengths, and limitations of our approach. We show that not only can this method be applied to the meta-analysis of a single trait across multiple studies, but because it does not make strong assumptions about the similarity of effect sizes across traits, it can also be used to detect effects across multiple traits at a single marker. Additionally, it can account for confounding due to things like shared samples and can be applied exhaustively across all possible combinations of associations to determine the subset of traits or studies that are most likely to be associated with a given variant. This affords the opportunity to make statements about which traits explicitly are and are not associated with a marker, and these patterns can be explored over the whole genome to learn about the genetic relationships between different traits. We also discuss some of the individual markers highlighted by our analyses - some known, and some potentially novel - and the traits associated with them.
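
As a rough illustration of how a Bayes factor can be computed from only an effect size estimate and its standard error, the sketch below uses the standard single-trait approximate Bayes factor of Wakefield, written in favour of association. It is not the multi-trait method developed in the thesis; the function name and the prior standard deviation of 0.2 are assumptions made for this example.

```python
from math import exp, sqrt


def approximate_bayes_factor(beta_hat, se, prior_sd=0.2):
    """Single-trait approximate Bayes factor (association vs. null).

    Assumes beta_hat ~ N(beta, se^2) and a N(0, prior_sd^2) prior on the
    true effect beta under the alternative. The prior_sd default is an
    illustrative assumption, not a value taken from the thesis.
    """
    v = se ** 2          # sampling variance of the estimate
    w = prior_sd ** 2    # prior variance of the true effect
    z = beta_hat / se    # Wald z-statistic
    shrink = w / (v + w)
    # ABF = sqrt(V / (V + W)) * exp(z^2 * W / (2 * (V + W)))
    return sqrt(1.0 - shrink) * exp(0.5 * z * z * shrink)


# Hypothetical summary statistics for two markers.
abf_null_like = approximate_bayes_factor(beta_hat=0.0, se=0.2)
abf_strong = approximate_bayes_factor(beta_hat=1.0, se=0.2)
```

Values above 1 favour association and values below 1 favour the null; an estimate of exactly zero always yields a value below 1 because the alternative spreads its prior mass over non-zero effects.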
49

Ramachandran, Sohini. "The signature of historical migrations on human population genetic data." May be available electronically:, 2007. http://proquest.umi.com/login?COPT=REJTPTU1MTUmSU5UPTAmVkVSPTI=&clientId=12498.

50

Salem, Rany Mansour. "Statistical methods for genetic association analysis involving complex longitudinal data." Diss., [La Jolla] : [San Diego] : University of California, San Diego ; San Diego State University, 2009. http://wwwlib.umi.com/cr/ucsd/fullcit?p3366492.

Abstract:
Thesis (Ph. D.)--University of California, San Diego and San Diego State University, 2009.
Title from first page of PDF file (viewed Aug. 14, 2009). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references.