Dissertations / Theses on the topic 'Bioinformatics and biostatistics'
Shi, Jing. "Biostatistics and bioinformatics methods for analysis of pathways and gene expression." 2007. http://proquest.umi.com/login?COPT=REJTPTU1MTUmSU5UPTAmVkVSPTI=&clientId=12498.
Shankar, Vijay. "Extension of Multivariate Analyses to the Field of Microbial Ecology." Wright State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=wright1464358122.
Crabtree, Nathaniel Mark. "Multi-Class Computational Evolution: Development, Benchmark Comparison, and Application to RNA-Seq Biomarker Discovery." Thesis, University of Arkansas at Little Rock, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10620232.
A computational evolution system (CES) is a knowledge-discovery engine that constructs and evolves classifiers with a small number of features to identify subtle, synergistic relationships among features and to discriminate groups in high-dimensional data analysis. CESs were previously designed to analyze only binary datasets. In this work, the CES method has been expanded to accommodate multi-class data.
The multi-class CES was compared to three common classification and feature selection methods: random forest, random k-nearest neighbor, and support vector machines. The four classifiers were evaluated on three real RNA sequencing datasets. Performance was evaluated via cross-validation to assess classification accuracy, number of features selected, stability of the selected feature sets, and run-time.
The three common classification and feature selection methods were originally designed for microarray data, which is fundamentally different from RNA-Seq data. In order to preprocess RNA-Seq count data for classification, the data was normalized and transformed via a variance stabilizing transformation to remove the variance-mean relationship that is commonly observed in RNA-Seq count data.
Compared to the three competing methods, the multi-class CES selected far fewer features. The identified features are potential biomarkers that may be more relevant than the longer lists of features identified by the competing methods. The CES performed best on the dataset with the smallest sample size, indicating that it has a unique advantage in these situations since most classification algorithms suffer in terms of accuracy when the sample size is small.
The CES identified numerous potentially important biomarkers in each of the three real datasets that are validated by previous research and worthy of additional investigation. CES was especially helpful at identifying important features in the rat blood RNA-Seq dataset. Subsequent ontological analysis of these selected features revealed protein folding as an important process in that dataset. The other contribution of this research was to extend the applicability of CES to biomarker discovery in multi-class settings. New software algorithms based on CES have already been developed, and the multi-class modifications presented here are directly applicable and would also benefit the newer software.
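The preprocessing step described in this abstract (normalization followed by a variance-stabilizing transformation of RNA-Seq counts) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's pipeline: it uses DESeq-style median-of-ratios size factors and the closed-form variance-stabilizing transform for a negative binomial with a single, fixed dispersion `alpha` (variance mu + alpha*mu^2), g(x) = (2/sqrt(alpha)) * asinh(sqrt(alpha*x)). The function names and the default `alpha` are hypothetical; tools such as DESeq2 estimate dispersions from the data instead of fixing them.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix (list of rows).

    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) <= 0:
            continue
        # Log of the gene's geometric mean across samples.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Take the median in log space, then exponentiate.
    return [math.exp(statistics.median(r)) for r in log_ratios]


def nb_vst(x, alpha=0.1):
    """Variance-stabilizing transform for negative-binomial data with variance mu + alpha*mu^2.

    For small alpha this approaches 2*sqrt(x), the Poisson (Anscombe-like) case.
    """
    return (2.0 / math.sqrt(alpha)) * math.asinh(math.sqrt(alpha * x))


def normalize_and_transform(counts, alpha=0.1):
    """Divide each sample by its size factor, then apply the NB variance-stabilizing transform."""
    sf = size_factors(counts)
    return [[nb_vst(c / s, alpha) for c, s in zip(row, sf)] for row in counts]
```

For example, with `counts = [[10, 20], [100, 200], [5, 10]]` (the second sample sequenced twice as deeply), the second size factor is twice the first, and after normalization both samples map to the same transformed values for every gene.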
Mirina, Alexandra. "Computational approaches for intelligent processing of biomedical data." Thesis, Yeshiva University, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3664552.
The rapid development of novel experimental techniques has led to the generation of an abundance of biological data, which holds great potential for elucidating many scientific problems. The analysis of such complex, heterogeneous information requires appropriate state-of-the-art analytical methods. Here we demonstrate how an unconventional approach and intelligent data processing can lead to meaningful results.
This work includes three major parts. In the first part, we describe a correction methodology for genome-wide association studies (GWAS). We demonstrate the existing bias toward selecting larger genes for downstream analyses in GWA studies and propose a method to adjust for this bias, thereby showing the need for data preprocessing to obtain a biologically relevant result. In the second part, building on the results of the first part, we attempt to elucidate the underlying mechanisms of aging and longevity by conducting a longevity GWAS. Here we took an unconventional approach to the GWAS analysis by applying the idea of genetic buffering, which allowed us to identify pairs of genetic markers that play a role in longevity. We were also able to confirm some of them by means of a downstream network analysis. In the third and final part, we discuss the characteristics of chronic lymphocytic leukemia (CLL) B-cells and perform clustering analysis based on immunoglobulin (Ig) mutation patterns. By comparing the Ig sequences of CLL patients and healthy donors, we show that different Ig heavy chain variable (IGHV) regions in CLL exhibit similarities with different B-cell subtypes of healthy donors, which raises questions about a single origin of CLL cases.
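The gene-size bias described above arises because a gene covered by many SNPs has more chances to contain a small minimum p-value by chance alone. One simple, commonly used adjustment, assumed here and not necessarily the correction proposed in this thesis, treats the gene's minimum SNP p-value as the smallest of n independent tests and applies a Sidak-style correction, p_gene = 1 - (1 - p_min)^n. The sketch assumes independent SNPs, which linkage disequilibrium violates in practice; the gene names and numbers in the usage note are hypothetical.

```python
import math


def gene_pvalue(p_min, n_snps):
    """Sidak-style gene-level p-value from the best SNP p-value among n_snps SNPs.

    Computes 1 - (1 - p_min) ** n_snps in a numerically stable way.
    Assumes the SNPs are independent tests (ignores linkage disequilibrium).
    """
    if not 0.0 <= p_min <= 1.0 or n_snps < 1:
        raise ValueError("p_min must be in [0, 1] and n_snps >= 1")
    if p_min == 1.0:
        return 1.0  # log1p(-1) would be undefined
    return -math.expm1(n_snps * math.log1p(-p_min))


def rank_genes(genes):
    """genes: dict name -> (p_min, n_snps). Returns names sorted by corrected p-value."""
    return sorted(genes, key=lambda g: gene_pvalue(*genes[g]))
```

With hypothetical inputs `{"small_gene": (0.001, 5), "large_gene": (0.0005, 200)}`, the large gene has the smaller raw p-value, but after correction the small gene ranks first (about 0.005 versus about 0.095), which is the direction of adjustment the abstract argues for.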
Zhang, Ju. "Trans-Ancestral Genetic Correlation Estimates from Summary Statistics for Admixed Populations." Case Western Reserve University School of Graduate Studies / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=case1619455882746982.
Lott, Paul Christian. "StochHMM: A Flexible Hidden Markov Model Framework." Thesis, University of California, Davis, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3602142.
In the era of genomics, data analysis models and algorithms that reduce large, complex datasets to meaningful information are integral to furthering our understanding of complex biological systems. Hidden Markov models (HMMs) are one such data analysis technique and have become the basis of many bioinformatics tools. Their relative success is primarily due to their conceptual simplicity and robust statistical foundation. Although HMMs are among the most popular modeling techniques for classifying linear sequences of data, researchers have few software options for rapidly implementing the necessary modeling framework and algorithms. Most tools are still hand-coded, because current implementations do not provide the ease or flexibility researchers need to implement models in non-traditional ways. I have developed a free hidden Markov model C++ library and application, called StochHMM, that gives researchers the flexibility to apply hidden Markov models to unique sequence analysis problems. It allows researchers to rapidly implement a model using a simple text file while retaining the flexibility to adapt the model in non-traditional ways. In addition, it provides many features that are not available in any current HMM implementation tool, such as stochastic sampling algorithms, the ability to link user-defined functions into the HMM framework, and multiple ways to integrate additional data sources to make better predictions. Using StochHMM, we have been able to rapidly implement models for R-loop prediction and for classification of methylation domains. The R-loop predictions uncovered the epigenetic regulatory role of R-loops at CpG promoters and in 3' transcription termination at protein-coding genes. Classification of methylation domains in multiple pluripotent tissues identified epigenetic gene tracks that will help inform our understanding of epigenetic diseases.
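To illustrate the kind of decoding such an HMM framework performs, here is a minimal log-space Viterbi decoder. It is not StochHMM's API (StochHMM is a C++ library configured through model text files); the two states ("AT"-rich and "GC"-rich segments) and all probabilities are hypothetical values chosen for the example.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for a non-empty sequence obs, in log space.

    start_p[s], trans_p[s][t] and emit_p[s][symbol] are probabilities (all > 0).
    Returns (path, log_probability).
    """
    # score[s]: best log-probability of any path ending in state s at the current position
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backptr = []  # backptr[t - 1][s]: best predecessor of state s at position t
    for symbol in obs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + math.log(trans_p[p][s]))
            new_score[s] = score[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][symbol])
            pointers[s] = prev
        score = new_score
        backptr.append(pointers)
    # Trace back from the best final state.
    best = max(states, key=lambda s: score[s])
    path = [best]
    for pointers in reversed(backptr):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best]


# Hypothetical two-state model: AT-rich vs GC-rich sequence segments.
STATES = ("AT", "GC")
START = {"AT": 0.5, "GC": 0.5}
TRANS = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
EMIT = {
    "AT": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}
```

Decoding `"AATTGGCCGC"` with this model labels the first four bases "AT" and the last six "GC": the 4x emission advantage over a four-base run outweighs the roughly 9x penalty of one state switch.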
Himmelstein, Daniel S. "The hetnet awakens: understanding complex diseases through data integration and open science." Thesis, University of California, San Francisco, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10133408.
Human disease is complex. However, the explosion of biomedical data is providing new opportunities to improve our understanding. My dissertation focused on how to harness the biodata revolution. Broadly, I addressed three questions: how to integrate data, how to extract insights from data, and how to make science more open.
To integrate data, we pioneered the hetnet—a network with multiple node and relationship types. After several preludes, we released Hetionet v1.0, which contains 2,250,197 relationships of 24 types. Hetionet encodes the collective knowledge produced by millions of studies over the last half century.
To extract insights from data, we developed a machine learning approach for hetnets. To predict the probability that an unknown relationship exists, our algorithm identifies influential network patterns. We used the approach to prioritize disease–gene associations and drug repurposing opportunities. By evaluating our predictions on withheld knowledge, we demonstrated the systematic success of our method.
After encountering friction that interfered with data integration and rapid communication, I began looking at how to make science more open. This quest led me to explore real-time open notebook science and to expose publishing delays at journals, as well as the problematic licensing of publicly funded research data.
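The core idea of a hetnet (nodes and edges that carry types) and of features built from typed paths, or metapaths, can be sketched in a few lines. This sketch only counts walks that follow a given metapath; it is not the thesis's algorithm, which the abstract describes only at a high level, and Hetionet's published features use degree-weighted path counts, which this omits. The node identifiers, node kinds, and relation names are illustrative assumptions.

```python
from collections import defaultdict


class Hetnet:
    """A minimal heterogeneous network: every node has a kind, every edge a relation type."""

    def __init__(self):
        self.kind = {}               # node id -> node kind (e.g. "Gene")
        self.adj = defaultdict(set)  # (node id, relation) -> neighbouring node ids

    def add_node(self, node, kind):
        self.kind[node] = kind

    def add_edge(self, a, relation, b):
        # Relationships are treated as undirected and stored in both directions.
        self.adj[(a, relation)].add(b)
        self.adj[(b, relation)].add(a)

    def metapath_count(self, source, target, metapath):
        """Count walks from source to target that follow metapath.

        metapath is a sequence of (relation, node_kind) steps, e.g.
        [("binds", "Gene"), ("associates", "Disease")].
        """
        frontier = {source: 1}  # node -> number of walks reaching it so far
        for relation, node_kind in metapath:
            step = defaultdict(int)
            for node, n_walks in frontier.items():
                for nb in self.adj.get((node, relation), ()):
                    if self.kind[nb] == node_kind:
                        step[nb] += n_walks
            frontier = step
        return frontier.get(target, 0)
```

A compound that binds two genes, each associated with the same disease, yields a count of 2 for the Compound, binds, Gene, associates, Disease metapath; counts like these (usually degree-weighted) can then serve as features for a classifier that predicts unknown relationships.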
Petereit, Julia. "Petal - A New Approach to Construct and Analyze Gene Co-Expression Networks in R." Thesis, University of Nevada, Reno, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10248467.
petal is a network analysis method that incorporates and takes advantage of precise mathematics, statistics, and graph theory, but remains practical for the life scientist. petal is built upon the assumption that large complex systems follow a scale-free and small-world network topology. One main intention in creating this program is to eliminate unnecessary noise and imprecision introduced by the user. Consequently, no user input parameters are required, and the program is designed to let the two structural properties, scale-free and small-world, govern the construction of network models.
The program is implemented in the statistical language R and is freely available as a package for download. Its package includes several simple R functions that the researcher can use to construct co-expression networks and extract gene groupings from a biologically meaningful network model. More advanced R users may use other functions for further downstream analyses, if desired.
The petal algorithm is discussed and its application demonstrated on several datasets. petal results show that the technique is capable of detecting biologically meaningful network modules from co-expression networks. That is, scientists can use this technique to identify groups of genes with possible similar function based on their expression information.
While this approach is motivated by whole-system gene expression data, the fundamental components of the method are transparent and can be applied to large datasets of many types, sizes, and stemming from various fields.
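A common way to check the scale-free assumption on which petal is built is to regress log p(k) on log k over the degree distribution of a co-expression network and inspect the fit (R^2). The sketch below illustrates that check with a hard correlation threshold. It is not petal's algorithm or API (petal is an R package that constructs its network without user parameters); the threshold and the function names are assumptions for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coexpression_degrees(expr, threshold):
    """Degree of each gene in an unweighted network linking genes with |r| >= threshold.

    expr maps gene name -> list of expression values (same samples for every gene).
    """
    degree = {gene: 0 for gene in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            degree[g1] += 1
            degree[g2] += 1
    return degree


def scale_free_fit(degrees):
    """Least-squares fit of log10 p(k) against log10 k; returns (slope, R^2).

    Isolated nodes (k = 0) are ignored. Needs at least two distinct degrees
    whose frequencies are not all equal, otherwise the fit is undefined.
    """
    counts = Counter(k for k in degrees if k > 0)
    n = sum(counts.values())
    ks = sorted(counts)
    xs = [math.log10(k) for k in ks]
    ys = [math.log10(counts[k] / n) for k in ks]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy ** 2 / (sxx * syy)
```

In use, one would compute `coexpression_degrees(expr, threshold)` and pass its values to `scale_free_fit`; an R^2 close to 1 with a negative slope is consistent with a scale-free topology. A data-driven method would compare such fits (and small-world statistics) across candidate networks rather than fixing the threshold by hand.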
Dimont, Emmanuel. "Methods for the Analysis of Differential Composition of Gene Expression." Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:14226062.
Bueno, Raymund. "Investigating Mechanisms of Robustness in BRCA-Mutated Breast and Ovarian Cancers." Thesis, Yeshiva University, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=11014738.
The BRCA1 and BRCA2 (BRCA) genes are two tumor suppressors that, when mutated, predispose patients to breast and ovarian cancer. The BRCA genes encode proteins that mediate the repair of DNA double-strand breaks. Functional loss of the BRCA genes is detrimental to the integrity of the genome because, without functional BRCA protein, inefficient and error-prone repair pathways are used instead. These pathways, such as non-homologous end joining, do not repair the DNA accurately and can introduce mutations and genomic rearrangements. Ultimately, the genome is not repaired faithfully and the predisposition to cancer greatly increases. In addition to their contribution to DNA repair, the BRCA genes have been shown to have transcriptional activity, and this functional role can also be a driving factor behind their tumor suppressor activity.
Robustness is the ability of a complex system to sustain viability despite perturbations to it. In the context of a complex disease such as cancer, robustness gives cancers the ability to sustain uncontrollable growth and invasiveness despite treatments, such as chemotherapy, that attempt to eliminate the tumor. However, a complex system that is robust can still be fragile to perturbations against which it has not been optimized. In cancers, these fragilities are potential cancer-specific targets whose disruption could eradicate the disease specifically.
Patients with mutations in BRCA tend to have breast and ovarian cancers that are difficult to treat; chemotherapy is the only option and no targeted therapies are available. Targeting the synthetic lethal interaction (SLI), a mechanism of robustness, between BRCA and PARP1 genes was clinically effective in treating BRCA-mutated breast and ovarian cancers. This suggests that understanding robustness in cancers can reveal potential cancer specific therapies.
In this thesis, a computational approach was developed to identify candidate mechanisms of robustness in BRCA-mutated breast and ovarian cancers using publicly accessible patient gene expression and mutation data from The Cancer Genome Atlas (TCGA). Results showed that, in ovarian cancer patients with a BRCA2 mutation, genes that function in the DNA damage response were kept at a more stable expression state than in patients without a mutation. The stable expression of genes in the DNA damage response may highlight a precisely controlled SLI gene network. This result is significant because disrupting this precision could lead to cancer-specific death. In breast cancers, genes that were differentially expressed in patients with BRCA mutations were identified. Bayesian network analysis was performed to infer candidate interactions between BRCA1 and BRCA2 and the differentially expressed FLT3, HOXA11, HPGD, MLF1, NGFR, PLAT, and ZBTB16 genes. These genes function in processes important to cancer progression, such as apoptosis and cell migration. The connection between these genes and BRCA may highlight how the BRCA genes influence cancer progression.
Taken together, the findings of this thesis enhance our understanding of the BRCA genes and their role in DNA damage response and transcriptional regulation in human breast and ovarian cancers. These results have been attained from systems-level models to identify candidate mechanisms underlying robustness of cancers. The work presented predicts interesting candidate genes that may have potential as drug targets or biomarkers in BRCA-mutated breast and ovarian cancers.
Rajabi, Zeyad. "BIAS: bioinformatics integrated application software and discovering relationships between transcription factors." Thesis, McGill University, 2004. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=81427.
Quiroz, Alejandro. "Deciphering the Biological Mechanisms Driving the Phenotype of Interest." Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10708.
Yu, Jingting. "Methods to Evaluate the Effects of Chromatin Organization in eQTL Mapping and the Effects of Design Factors in Cancer Single-cell Studies." Case Western Reserve University School of Graduate Studies / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=case1554463507829716.
Ong, Vy Quoc. "Subgroup Analysis of Patients with Hepatocellular Carcinoma: A Quest for Statistical Algorithms for Tissue Classification Problem." Thesis, California State University, Long Beach, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10840510.
Hepatocellular carcinoma (HCC) is the most common type of liver cancer. It ranks as the third leading cause of cancer death worldwide and the ninth leading cause of cancer mortality in the United States. People with hepatitis B or C are considered to be at high risk for this cancer. HCC patients with a remarkably poor prognosis and low survival rates commonly have intra-hepatic metastases, either tumor thrombi in the portal vein or intra-hepatic spread; it is uncommon for them to die of extra-hepatic metastases. Identifying metastatic HCC has therefore become vital, and clinically challenging, for timely therapeutic intervention to improve the survival rate of patients with this disease.
To date, studies have sought an accurate molecular profiling model to identify these patients in advance for better treatment or intervention. One approach has been to identify individual candidate genes characterizing metastatic HCC. Another has been to find a global, genome-scale solution by using microarray technology to obtain a gene expression profile for this carcinoma. Among research following the latter direction was the work of Qing-Hai Ye et al. (Nature Medicine, Volume 9, Number 4, April 2003). They applied cDNA microarray-based gene expression profiling with compound covariate predictors to binary classification tasks involving primary HCC, metastatic HCC, and metastasis-free HCC, on a dataset of 87 observations and 9,984 features taken from 40 hepatitis B-positive Chinese patients. Notably, a robust 153-gene model was generated that successfully distinguished samples with tumor thrombi in the portal vein from metastasis-free samples. However, the authors acknowledged that distinguishing primary tumor samples from their matched metastatic lesions was still a challenge. In this molecular signature, a gene named osteopontin, a secreted phosphoprotein, served as the lead gene in diagnosing HCC metastasis.
The analysis is based on the metastatic status of HCC, which is clinically predetermined. However, the class definition needs to be validated to determine whether the data are sufficient to support the three predefined classes. We will use statistical clustering algorithms to validate the class definitions. We will then conduct variable selection to find markers, that is, genes differentially expressed among the clinical groups validated in this research. Next, using the compound markers found, we will develop a statistical model to predict a new patient's HCC type for intervention. The generalized performance of the prediction model will be evaluated via cross-validation. This study aims to build a highly accurate model that better classifies the aforementioned clinical groups of HCC and thus improves the prediction of metastatic patients.
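Because the planned model distinguishes three clinically defined classes, a stratified split keeps class proportions similar across folds when estimating generalization performance by cross-validation. The sketch below builds stratified k-fold (train, test) index sets in plain Python; the function name, the round-robin assignment, and the `seed` argument are assumptions for illustration, not the thesis's procedure.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k, seed=None):
    """Split sample indices into k (train, test) pairs with similar class proportions.

    Indices of each class are shuffled (when a seed is given) and dealt
    round-robin into the k folds; the position counter carries over between
    classes so that fold sizes stay balanced.
    """
    if k < 2 or k > len(labels):
        raise ValueError("k must be between 2 and the number of samples")
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for y in sorted(by_class, key=str):
        members = by_class[y]
        if seed is not None:
            rng.shuffle(members)
        for i in members:
            folds[position % k].append(i)
            position += 1
    all_indices = set(range(len(labels)))
    # Each fold serves once as the test set; the remaining indices form the training set.
    return [(sorted(all_indices - set(f)), sorted(f)) for f in folds]
```

A model would be fit on each training index set and scored on the matching test set, and the k scores averaged. Any variable selection step must be repeated inside each training fold; selecting markers on the full dataset first would leak information and inflate the cross-validated accuracy.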
Alouani, David James. "THE AGING PROCESS OF C. ELEGANS VIEWED THROUGH TIME DEPENDENT PROTEIN EXPRESSION ANALYSIS." Case Western Reserve University School of Graduate Studies / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=case1436393267.
Yip, Wai-Ki. "Statistical Methods for Analyzing DNA Methylation Data and Subpopulation Analysis of Continuous, Binary and Count Data for Clinical Trials." Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:14226106.
Arnold, Brian. "Evolutionary Dynamics of a Multiple-Ploidy System in Arabidopsis Arenosa." Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:17467222.
Biology, Organismic and Evolutionary
Zack, Travis Ian. "Exploring cancer's fractured genomic landscape: Searching for cancer drivers and vulnerabilities in somatic copy number alterations." Thesis, Harvard University, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3645095.
Somatic copy number alterations (SCNAs) are a class of alterations that lead to deviations from diploidy in developing and established tumors. A feature that distinguishes SCNAs from other alterations is their genomic footprint. The large genomic footprint of SCNAs in a typical cancer's genome presents both a challenge and an opportunity to find targetable vulnerabilities in cancer. Because a single event affects many genes, it is often challenging to identify the tumorigenic targets of SCNAs. Conversely, events that affect multiple genes may provide specific vulnerabilities through "bystander" genes, in addition to vulnerabilities directly associated with the targets.
We approached the goal of understanding how the structure of SCNAs may lead to dependency in two ways. To improve our understanding of how SCNAs promote tumor progression we analyzed the SCNAs in 4934 primary tumors in 11 common cancers collected by the Cancer Genome Atlas (TCGA). The scale of this dataset provided insights into the structure and patterns of SCNA, including purity and ploidy rates across disease, mechanistic forces shaping patterns of SCNA, regions undergoing significantly recurrent SCNAs, and correlations between SCNAs in regions implicated in cancer formation.
In a complementary approach, we integrated SCNA data and pooled RNAi screening data involving 11,000 genes across 86 cell lines to find non-driver genes whose partial loss led to increased sensitivity to RNAi suppression. We identified a new set of cancer-specific vulnerabilities predicted by loss of non-driver genes, the most significant being PSMC2, an obligate member of the 26S proteasome. Biochemically, we found that PSMC2 is in excess of cellular requirements in diploid cells but becomes the stoichiometrically limiting factor in proteasome formation after partial loss of this gene.
In summary, my work improved our understanding of the structure and patterns of SCNA, both informing how cancers develop and predicting novel cancer vulnerabilities. Our characterization of the SCNAs present across 5000 tumors uncovered novel structure in SCNAs and significant regions likely to contain driver genes. Through integrating SCNA data with the results of a functional genetic screen, we also uncovered a new set of vulnerabilities caused by unintended loss of non-driver genes.
Ablorh, Akweley. "Meta-Analysis of a Multi-Ethnic, Breast Cancer Case-Control Targeted Sequencing Study." Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:16121143.
Epidemiology
Patel, Vishal N. "Colon Cancer and its Molecular Subsystems: Network Approaches to Dissecting Driver Gene Biology." Case Western Reserve University School of Graduate Studies / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=case1310087563.
Athippozhy, Antony Thomas. "ANALYSIS OF DIFFERENTIAL GENE EXPRESSION AND ALTERNATIVE SPLICING IN THE LIVER AND GASTROINTESTINAL TRACT IN THE LACTATING RAT." UKnowledge, 2011. http://uknowledge.uky.edu/gradschool_diss/218.
Chan, Ying Leong. "Leveraging genetic association data to investigate the polygenic architecture of human traits and diseases." Thesis, Harvard University, 2014. http://dissertations.umi.com/gsas.harvard:11372.
Loveless, Ian. "Binary Classification With First Phase Feature Selection for Gene Expression Survival Data." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1555444873531262.
Reese, Sarah. "Detecting and Correcting Batch Effects in High-Throughput Genomic Experiments." VCU Scholars Compass, 2013. http://scholarscompass.vcu.edu/etd/3180.
Stansfield, John C. "Methods for Joint Normalization and Comparison of Hi-C data." VCU Scholars Compass, 2019. https://scholarscompass.vcu.edu/etd/5951.
Larson, Jessica. "Hidden Markov Models Predict Epigenetic Chromatin Domains." Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10105.
Li, Ran. "Chemometrics Development using Multivariate Statistics and Vibrational Spectroscopy and its Application to Cancer Diagnosis." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1449067634.
Ge, Jianye. "Computational Algorithms and Evidence Interpretation in DNA Forensics based on Genomic Data." University of Cincinnati / OhioLINK, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1234916402.
McSweeny, Andrew J. "Identification of Candidate Genes within Blood Pressure QTL Containing Regions Using Gene Expression Data." University of Toledo Health Science Campus / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=mco1212501779.
Stombaugh, Jesse. "Predicting the Structure of RNA 3D Motifs." Bowling Green State University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1225391806.
Full textVigliotti, Chloé. "Etude de l'impact d'un changement de régime alimentaire sur le microbiome intestinal de Podarcis sicula." Thesis, Paris, Muséum national d'histoire naturelle, 2017. http://www.theses.fr/2017MNHN0011/document.
Full textWe collected and compared intestinal microbiota and microbiomes from several Podarcis sicula lizards, which live in Croatian continental and insular populations. One of these populations has recently changed its diet over an 46 years timespan, switching from an insectivorous diet to an omnivorous one (up to 80% herbivorous). Diversity analyses of these microbial communities, based on the V4 region of their 16S rRNA, showed that the microbiota taxonomic diversity (or alpha diversity) is higher in omnivorous lizards (enrichment in methanogenic archaea) than in insectivorous ones. Besides, microbial communities seem weakly structured: 5 enterotypes are detected at the phylum level, and 3 major phyla (Bacteroidetes, Firmicutes and Proteobacteria) are present. However, neither diet, spatial or temporal origin, nor lizard gender correlate with significant differences in microbiota. Linear discriminant analyses with size effect, based on OTUs and functionally annotated reads from the microbiomes, suggest that Podarcis sicula diet change is associated to targeted changes of the abundance of some enzymes in the microbiomes. Such a result leads us to propose a hypothesis of targeted changes in the microbial communities of this non-model holobiont, instead of more radical transformations. On a more theoretical level, this thesis also proposes network models (Reads similarity networks and bipartite graphs) that can help improving microbiome analyses
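Alpha diversity, as used above, is often summarized with the Shannon index H = -sum(p_i ln p_i) over the relative abundances p_i of the OTUs in one sample. The sketch below computes it, together with Simpson's diversity 1 - sum(p_i^2), from a list of OTU counts. The abstract does not say which index the thesis used, so both are shown only as common examples, and the counts in the usage note are made up.

```python
import math


def relative_abundances(counts):
    """Relative abundances of the OTUs with nonzero counts in one sample."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("a sample needs at least one read")
    return [c / total for c in counts if c > 0]


def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over OTUs with nonzero counts."""
    return -sum(p * math.log(p) for p in relative_abundances(counts))


def simpson(counts):
    """Simpson's diversity 1 - sum(p^2): chance that two reads drawn with replacement differ in OTU."""
    return 1.0 - sum(p * p for p in relative_abundances(counts))
```

For a hypothetical sample with four equally abundant OTUs, `[10, 10, 10, 10]`, the Shannon index is ln 4 (about 1.386) and Simpson's diversity is 0.75; a sample dominated by a single OTU scores 0 on both. Comparing such per-sample values between diet groups is one way to test a difference in alpha diversity.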
Bechtel, Jason M. "Characterization of Genomic MidRange Inhomogeneity." University of Toledo Health Science Campus / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=mco1217365784.
Bhajun, Ricky. "Une approche réseau pour l’inférence du rôle des microARN dans la corégulation des processus biologiques [A network approach to inferring the role of microRNAs in the coregulation of biological processes]." Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAS045/document.
RNA interference is a process in which a small non-coding RNA binds to a specific messenger RNA and regulates its expression. This evolutionarily conserved mechanism is found in all higher eukaryotes, from plants to mammals. RNA interference is now known to be a major regulatory process involved in developmental biology and tumor progression. MicroRNAs (miRNAs) are endogenous (encoded in and produced by the cell) non-coding RNAs that can regulate a whole set of genes, typically hundreds. This doctoral thesis consisted of analyzing miRNA-mediated coregulation through a network approach based on target sharing. Coregulation is the process by which many different miRNAs regulate the same set of genes, and thus the same biological process. In particular, the work consisted of inferring a miRNA network, analyzing its topology, and interpreting it biologically. The final aim of the work was to generate new biological hypotheses. Two different groups of miRNAs were retrieved first. One of them was predicted to be involved in small GTPase signaling and was further validated in vitro. A miRNA community involved in maintaining stem cell pluripotency was also discovered. Finally, a systemic analysis of the target-based miRNA network was conducted to better understand how miRNAs are integrated with biological networks and their role in cell fate.
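A network based on target sharing, as described above, needs a criterion for deciding when two miRNAs share more targets than expected by chance. One common choice, assumed here rather than taken from the thesis, is a hypergeometric (one-sided Fisher) test on the overlap of their target sets within a gene universe of size N; pairs below a significance cutoff become edges. The miRNA names, target sets, and cutoff are hypothetical, and no multiple-testing correction is applied.

```python
from itertools import combinations
from math import comb


def overlap_pvalue(n_universe, n_a, n_b, n_shared):
    """P(X >= n_shared) for X ~ Hypergeometric(population n_universe, n_a successes, n_b draws)."""
    total = comb(n_universe, n_b)
    upper = min(n_a, n_b)
    # math.comb returns 0 when asked for more items than exist, which zeroes impossible terms.
    tail = sum(
        comb(n_a, i) * comb(n_universe - n_a, n_b - i)
        for i in range(n_shared, upper + 1)
    )
    return tail / total


def target_sharing_network(targets, n_universe, alpha=0.05):
    """Edges (mir1, mir2, p) for miRNA pairs whose target overlap has p < alpha.

    targets maps miRNA name -> set of target gene ids drawn from a universe of n_universe genes.
    """
    edges = []
    for a, b in combinations(sorted(targets), 2):
        shared = len(targets[a] & targets[b])
        p = overlap_pvalue(n_universe, len(targets[a]), len(targets[b]), shared)
        if p < alpha:
            edges.append((a, b, p))
    return edges
```

With a hypothetical 10-gene universe in which "miR-a" and "miR-b" share all three of their targets while "miR-c" shares none, only the miR-a/miR-b pair is connected (p = 1/120). Topological analysis, such as community detection, would then run on the resulting edge list.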
Song, Yeunjoo E. "New Score Tests for Genetic Linkage Analysis in a Likelihood Framework." Case Western Reserve University School of Graduate Studies / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=case1354561219.
Manser, Paul. "Methods for Integrative Analysis of Genomic Data." VCU Scholars Compass, 2014. http://scholarscompass.vcu.edu/etd/3638.
Thiel, Bonnie Arlene. "Bioinformatics approaches to studying immune processes associated with immunity to Mycobacterium tuberculosis infection in the lung and blood." Case Western Reserve University School of Graduate Studies / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=case1627247387242562.
Haynes, Eric E. "Identifying Common Genes from Rheumatoid Arthritis, Systemic Lupus, Multiple Sclerosis and Sjogrens Syndrome by Pooling Existing Microarray Data." University of Toledo Health Science Campus / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=mco1374011043.
Wang, Heming. "LOCAL ANCESTRY INFERENCE AND ITS IMPLICATION IN SEARCHING FOR SELECTION EVIDENCE IN RECENT ADMIXED POPULATION." Case Western Reserve University School of Graduate Studies / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=case1473439566976121.
Haddon, Andrew L. "Evaluation of Some Statistical Methods for the Identification of Differentially Expressed Genes." FIU Digital Commons, 2015. http://digitalcommons.fiu.edu/etd/1913.
Brown, Andrew S. "Identification of a phospho-hnRNP E1 Nucleic Acid Consensus Sequence Mediating Epithelial to Mesenchymal Transition." Kent State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=kent1437943957.
MARCEDDU, GIUSEPPE. "Bioinformatics e Biostatistics applied to research in pediatric genetic disease. Clinical evidence in IFNλ4 polymorphisms associated with HCV infection in patients with beta thalassemia and WGCNA analysis weighted for IFNλ4 genotype rs12979860 to detect RPL9P18 as hub in HCV infected cell." Doctoral thesis, Università degli Studi di Cagliari, 2015. http://hdl.handle.net/11584/266612.
Abuelqumsan, Mustafa. "Assessment of supervised classification methods for the analysis of RNA-seq data." Thesis, Aix-Marseille, 2018. http://www.theses.fr/2018AIXM0582/document.
For a decade, “Next Generation Sequencing” (NGS) technologies have made it possible to characterize genomic sequences at an unprecedented pace. Many studies have focused on human genetic diversity and on the transcriptome (the part of the genome transcribed into ribonucleic acid). Indeed, different tissues of our body express different genes at different times, enabling cell differentiation and functional responses to environmental changes. Since many diseases affect gene expression, transcriptome profiles can be used for medical purposes (diagnosis and prognosis). A wide variety of advanced statistical and machine learning methods have been proposed to address the general problem of classifying individuals according to multiple variables (e.g. the transcription levels of thousands of genes in hundreds of samples). During my thesis, I led a comparative assessment of machine learning methods and their parameters to optimize the accuracy of sample classification based on RNA-seq transcriptome profiles.
Cui, Lingfei. "A Likelihood Method to Estimate/Detect Gene Flow and A Distance Method to Estimate Species Trees in the Presence of Gene Flow." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1406158261.
Folch Fortuny, Abel. "Chemometric Approaches for Systems Biology." Doctoral thesis, Universitat Politècnica de València, 2017. http://hdl.handle.net/10251/77148.
This doctoral thesis focuses on the study, development and application of chemometric techniques in the emerging field of systems biology. Commonly used procedures and new methods are applied to answer research questions posed by various multidisciplinary teams from both academia and industry. The methodologies developed in this document enrich the plethora of techniques used in the omics sciences to understand how biological organisms function, and they improve processes in the biotechnology industry by integrating biological knowledge at different levels and by exploiting the software packages derived from this thesis. The dissertation is structured in four parts. The first part describes the framework within which the contributions presented here are articulated. It outlines the objectives of the two research projects related to this thesis and introduces the specific topics developed in this document through conference presentations and research articles. This part includes a thorough description of the omics sciences and their interrelationships within the systems biology paradigm, together with a review of the multivariate methods most widely applied in chemometrics, which are the pillars on which the new procedures proposed here rest. The second part focuses on solving problems in metabolomics, fluxomics, proteomics and genomics through data analysis. To this end, several alternatives are proposed for obtaining a broad understanding of steady-state metabolic flux data. Some of them apply previously proposed multivariate methods, while others are new techniques based on bilinear decompositions using elementary metabolic pathways. Freely available software for the scientific community has been developed from these techniques.
In addition, this thesis proposes a framework for analyzing non-steady-state metabolic data. To do so, the traditional approach for steady-state systems is adapted, modeling the dynamics of the experiments with two-way and three-way data analysis. This part of the thesis also establishes relationships between the different omics levels by integrating different sources of information in data fusion models. Finally, the interaction between organisms, such as oranges and fungi, is studied through multivariate image analysis, with future applications in the food industry. The third part of this thesis is an in-depth study of various problems related to missing data in chemometrics, systems biology and the bioprocess industry. In the more theoretical chapters of this part, new algorithms are proposed for fitting multivariate models, both exploratory and regression models, in the presence of missing data. These algorithms also serve as data preprocessing strategies before any other method is applied. As for applications, this part explores network reconstruction in the omics sciences when missing or outlying values appear in the databases. A second application is the transfer of calibration models between near-infrared instruments, thereby avoiding costly recalibrations in bioindustries and research laboratories. Finally, a freely available software package with a user-friendly interface is proposed for missing-data imputation. The last part discusses the most relevant aspects of this thesis for research and biotechnology, including future lines of work.
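The abstract mentions algorithms for fitting multivariate models in the presence of missing data that also serve as an imputation step. A widely used baseline of that kind, iterative PCA (EM-style) imputation, can be sketched in a few lines. This is a minimal sketch assuming NumPy; it is not the thesis's own algorithms, which the abstract does not detail, and the function name `iterative_pca_impute` and its parameters are hypothetical.

```python
import numpy as np


def iterative_pca_impute(x, n_components=2, max_iter=500, tol=1e-8):
    """Fill NaNs by alternating a rank-`n_components` PCA fit and re-imputation.

    Hypothetical helper: a generic iterative-PCA imputation sketch. Assumes every
    column has at least one observed value; observed entries are never modified.
    """
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    # Initial guess: column means of the observed values.
    filled = np.where(missing, np.nanmean(x, axis=0), x)
    for _ in range(max_iter):
        mu = filled.mean(axis=0)
        u, s, vt = np.linalg.svd(filled - mu, full_matrices=False)
        # Low-rank reconstruction from the leading principal components.
        recon = mu + (u[:, :n_components] * s[:n_components]) @ vt[:n_components]
        # Keep observed values; overwrite only the missing cells.
        updated = np.where(missing, recon, x)
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break
    return filled


# Toy example: an exactly rank-1 matrix with one entry hidden.
true = np.outer([1, 2, 3, 4, 5], [1, 2, 3]).astype(float)
x = true.copy()
x[4, 2] = np.nan  # the hidden value is true[4, 2]
imputed = iterative_pca_impute(x, n_components=1)
```

The number of components is the key choice: too few and the reconstruction cannot represent the data structure, too many and the model starts fitting noise into the imputed cells.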
Folch Fortuny, A. (2016). Chemometric Approaches for Systems Biology [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/77148
Thesis
Award-winning
Ni, Jiaqian. "Plasma Biomarkers for Age-Related Macular Degeneration." Cleveland State University / OhioLINK, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=csu1236700270.
Wang, Xiangxue. "A Prognostic and Predictive Computational Pathology Based Companion Diagnostic Approach: Precision Medicine for Lung Cancer." Case Western Reserve University School of Graduate Studies / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=case1574125440501667.
Tavera, Gloria. "Helicobacter pylori Genetic Variation and Gastric Disease." Case Western Reserve University School of Graduate Studies / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=case1565176211647636.
Seidel, Richard Alan. "Conservation Biology of the Gammarus pecos Species Complex: Ecological Patterns across Aquatic Habitats in an Arid Ecosystem." Miami University / OhioLINK, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=miami1251472290.
Le Priol, Christophe. "Variance de l'expression des microARN et des ARN messagers dans le cancer." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAS023.
The majority of gene expression studies look for genes whose mean expression differs between two or more populations of samples. In this context, the variance is treated as a parameter to be controlled. However, like a difference in means, a difference in the variance of gene expression between sample populations may also be biologically and physiologically relevant.

MicroRNAs (miRNAs) are key regulators of gene expression. The large number of their targets and the fine tuning of their regulation confer a buffering role on miRNAs. The objective of my thesis is to study the variance in expression of miRNAs and messenger RNAs (mRNAs), especially those targeted by miRNAs, in particular during carcinogenesis. We hope that this approach can identify genes that cannot be detected by traditional differential expression analysis and yet could serve as potential biomarkers or therapeutic targets. Furthermore, by combining miRNA and mRNA expression and analyzing their variance at the system level, we aim to better characterize the buffering role of miRNAs.

Several methods, including statistical tests of equality of variance and models based on the negative binomial distribution, were evaluated. The performance of these methods was thoroughly tested on simulated datasets. They were then applied to The Cancer Genome Atlas datasets to identify genes with differential expression variance between normal and tumor samples. Many miRNAs and mRNAs with increased expression variance in tumors were detected. Interestingly, among these mRNAs, some key biological functions, such as catabolism and autophagy, are over-represented in most cancers. Thus, analyzing genes with differential expression variance is relevant for understanding tumor progression and opens new possibilities for discovering potential biomarkers and therapeutic avenues.
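The statistical tests of equality of variance mentioned above can be illustrated with the Brown-Forsythe variant of Levene's test, which compares groups on the absolute deviations of each value from its group median. The sketch below is a minimal plain-Python illustration under stated assumptions: it is not necessarily one of the tests the thesis evaluated, it does not cover the negative binomial models, the p-value comes from a label-permutation scheme (an assumption chosen to avoid needing an F distribution), and the function names and simulated values are hypothetical.

```python
import random
import statistics

# Hypothetical helpers for illustration; not the thesis's implementation.


def brown_forsythe(*groups):
    """Brown-Forsythe statistic (Levene's test centred on group medians).

    Large values indicate unequal variances across groups. Assumes at least two
    groups and some spread of deviations within groups (else division by zero).
    """
    # Absolute deviations of each value from its own group's median.
    z = [[abs(v - statistics.median(g)) for v in g] for g in groups]
    k = len(z)
    n_total = sum(len(g) for g in z)
    group_means = [statistics.fmean(g) for g in z]
    grand_mean = statistics.fmean([v for g in z for v in g])
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(z, group_means))
    within = sum((v - m) ** 2 for g, m in zip(z, group_means) for v in g)
    return (n_total - k) / (k - 1) * between / within


def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Permutation p-value for the Brown-Forsythe statistic of two groups."""
    rng = random.Random(seed)
    observed = brown_forsythe(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the normal/tumor labels
        if brown_forsythe(pooled[: len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy log-expression values of one gene: same mean, larger spread in tumors.
rng = random.Random(1)
normal = [rng.gauss(5.0, 0.3) for _ in range(30)]
tumor = [rng.gauss(5.0, 1.5) for _ in range(30)]
p_value = permutation_pvalue(normal, tumor)
```

Because both groups share the same mean here, a classical test of differential mean expression would likely miss this gene, which is the situation the abstract argues differential-variance analysis can capture.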
Biton, Anne. "Analyse en composantes indépendantes du transcriptome de cancers." Thesis, Paris 11, 2011. http://www.theses.fr/2011PA11T028.
In practice, gene expression data analysis shows that it is advantageous to analyze biological processes in terms of modules rather than considering genes one by one. In this project, we conducted an unsupervised analysis of the gene expression data of several cohorts of urothelial tumors by applying Independent Component Analysis (ICA). Several studies have shown that ICA outperforms PCA and clustering-based methods in obtaining a more realistic decomposition of expression data into consistent patterns of coexpressed genes associated with the studied phenotypes [1, 2, 3, 4]. Urothelial tumors arise and evolve through two distinct pathways that differ radically in their probability of progression to muscle invasion. Apart from FGFR3 mutation in the less aggressive group, the underlying molecular processes have not been completely identified. The first and main objective of the PhD thesis was the biological interpretation of the different independent components, to help confirm and extend the list of biological processes known to be involved in bladder cancer. Each independent component (IC) is characterized by a list of gene projections on the one hand and by weighted contributions of tumor samples on the other. By combining biological expertise with comparison of the associated gene lists to known pathways, and by jointly studying the association of the components with molecular and clinical annotations, we were able to distinguish components caused by technical factors, such as surgical sampling, from those having a consistent interpretation in terms of tumor biology.
Moreover, among the biologically meaningful signals, this analysis allowed us to distinguish signals from the tumor stroma, such as the immune response mediated by B and T lymphocytes, from signals produced by the tumors themselves, such as those related to proliferation or differentiation. Clustering the tumor samples according to their contributions to some ICs could be closely associated with anatomo-clinical annotations and highlighted new potential tumor subtypes, which suggests the existence of additional pathways of bladder cancer progression. Similarly, studying the contributions of pre-established groups of tumors defined by clinical or molecular criteria showed different levels of stromal contamination between FGFR3-non-mutated and FGFR3-mutated tumors. The reproducibility of the components was investigated using correlation graphs. Most of the interpreted ICs were validated on three independent bladder cancer datasets, and several of them were also identified in a urothelial cancer cell line dataset. A second study, on retinoblastoma, gave us the opportunity to show that ICA is useful in various contexts. Retinoblastoma is initiated by the loss of both alleles of the RB1 tumor suppressor gene. Although this loss is necessary for initiation, other genomic events, which remain to be identified, are needed for the progression of the disease [5]. As previously described [6], we observed an association between patient age and the level of genomic aberrations: younger patients had fewer alterations than older patients, who generally presented a gain of 1q and a loss of 16q. We showed that this tendency of the tumors to cluster into two age groups can also be observed in the expression data by applying ICA, one of whose components was highly correlated with patient age. These results suggest the existence of two pathways of progression in retinoblastoma. The analysis of high-throughput data produces many lists of genes.
To interpret them, one possibility is to study the latest related publications grouped by predefined topics (function, cellular location, etc.). To that end, in a third study, we introduced a web-based Java application named GeneValorization, which gives a clear and handy overview of the bibliography corresponding to a particular gene list [7].
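The ICA decomposition described in this abstract, in which each component has a vector of gene projections and a vector of sample contributions, can be sketched with a generic symmetric FastICA using a tanh (log-cosh) contrast. This is a minimal sketch assuming NumPy and an expression matrix with genes as rows and samples as columns, with genes treated as observations; the function names `fast_ica` and `_sym_decorrelate`, their parameters, and the toy data are hypothetical, and the thesis's actual ICA implementation and settings are not given here.

```python
import numpy as np

# Hypothetical helpers: a generic FastICA sketch, not the thesis's implementation.


def _sym_decorrelate(w):
    """Return (W W^T)^(-1/2) W, i.e. W with orthonormal rows."""
    vals, vecs = np.linalg.eigh(w @ w.T)
    return (vecs / np.sqrt(vals)) @ vecs.T @ w


def fast_ica(x, n_components, max_iter=500, tol=1e-6, seed=0):
    """Symmetric FastICA on x of shape (n_genes, n_samples).

    Returns (sources, mixing): sources has one column of gene projections per
    component, mixing has one row of sample contributions per component, and
    sources @ mixing approximates the column-centred x.
    """
    x = np.asarray(x, dtype=float)
    xc = x - x.mean(axis=0)  # centre each sample (column)
    n = xc.shape[0]
    # Whitening via SVD: keep the leading directions, scaled to unit variance.
    u, _, _ = np.linalg.svd(xc, full_matrices=False)
    z = u[:, :n_components] * np.sqrt(n)
    rng = np.random.default_rng(seed)
    w = _sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        g = np.tanh(z @ w.T)  # nonlinearity applied to current source estimates
        g_prime_mean = (1.0 - g**2).mean(axis=0)
        # Fixed-point update E[z g(w^T z)] - E[g'(w^T z)] w, one row per component.
        w_new = _sym_decorrelate(g.T @ z / n - g_prime_mean[:, None] * w)
        # Converged when every row keeps its direction (up to sign).
        converged = np.max(np.abs(np.abs(np.sum(w_new * w, axis=1)) - 1.0)) < tol
        w = w_new
        if converged:
            break
    sources = z @ w.T
    # Least-squares sample contributions (mixing weights) for each component.
    mixing = np.linalg.lstsq(sources, xc, rcond=None)[0]
    return sources, mixing


# Toy data: two non-Gaussian "gene programs" mixed into four samples.
rng = np.random.default_rng(42)
n_genes = 2000
true_sources = np.column_stack(
    [rng.uniform(-1.0, 1.0, n_genes), rng.laplace(0.0, 1.0, n_genes)]
)
true_mixing = np.array([[1.0, 0.5, 0.2, 0.8], [0.3, 1.0, 0.9, -0.4]])
x = true_sources @ true_mixing
sources, mixing = fast_ica(x, n_components=2)
# Absolute correlation between each true source and each recovered component.
corr = np.abs(np.corrcoef(true_sources.T, sources.T)[:2, 2:])
```

ICA recovers components only up to order, sign and scale, which is why the check compares absolute correlations; the same matching idea underlies comparing components across independent datasets.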