Dissertations: “Bioinformatics and biostatistics”

1

Shi, Jing. “Biostatistics and bioinformatics methods for analysis of pathways and gene expression”. 2007. http://proquest.umi.com/login?COPT=REJTPTU1MTUmSU5UPTAmVkVSPTI=&clientId=12498.

2

Shankar, Vijay. “Extension of Multivariate Analyses to the Field of Microbial Ecology”. Wright State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=wright1464358122.

3

Crabtree, Nathaniel Mark. “Multi-Class Computational Evolution: Development, Benchmark Comparison, and Application to RNA-Seq Biomarker Discovery”. Thesis, University of Arkansas at Little Rock, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10620232.

Annotation:

A computational evolution system (CES) is a knowledge-discovery engine that constructs and evolves classifiers with a small number of features to identify subtle, synergistic relationships among features and to discriminate groups in high-dimensional data analysis. CESs have previously been designed to only analyze binary datasets. In this work, the CES method has been expanded to accommodate multi-class data.

The multi-class CES was compared to three common classification and feature selection methods: random forest, random k-nearest neighbor, and support vector machines. The four classifiers were evaluated on three real RNA sequencing datasets. Performance was evaluated via cross validation to assess classification accuracy, number of features selected, stability of the selected feature sets, and run-time.

The three common classification and feature selection methods were originally designed for microarray data, which is fundamentally different from RNA-Seq data. In order to preprocess RNA-Seq count data for classification, the data was normalized and transformed via a variance stabilizing transformation to remove the variance-mean relationship that is commonly observed in RNA-Seq count data.
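As a rough illustration of what a variance stabilizing transformation does, the sketch below applies the Anscombe transform, 2·sqrt(x + 3/8), to idealized Poisson counts. This is an assumption made for illustration only: the abstract does not say which transformation the dissertation used, real RNA-Seq counts are usually overdispersed rather than Poisson, and the function names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability P(X = k), computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def variance_under(transform, lam):
    """Exact variance of transform(X) for X ~ Poisson(lam), by direct summation."""
    upper = int(lam + 20 * math.sqrt(lam) + 50)  # the tail beyond this is negligible
    mean = 0.0
    second_moment = 0.0
    for k in range(upper + 1):
        p = poisson_pmf(k, lam)
        y = transform(k)
        mean += p * y
        second_moment += p * y * y
    return second_moment - mean * mean

def anscombe(x):
    """Anscombe transform (illustrative choice, not taken from the dissertation)."""
    return 2.0 * math.sqrt(x + 0.375)

for lam in (10, 100, 1000):
    raw = variance_under(float, lam)            # grows with the mean
    stabilized = variance_under(anscombe, lam)  # stays close to 1
    print(f"mean={lam:5d}  raw variance={raw:9.2f}  transformed variance={stabilized:.3f}")
```

On the raw scale the variance equals the mean, while on the transformed scale it stays near 1 across the three means. That removal of the variance-mean relationship is the property the abstract refers to.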

Compared to the three competing methods, the multi-class CES selected far fewer features. The identified features are potential biomarkers that may be more relevant than the longer lists of features identified by the competing methods. The CES performed best on the dataset with the smallest sample size, indicating that it has a unique advantage in these situations since most classification algorithms suffer in terms of accuracy when the sample size is small.

The CES identified numerous potentially-important biomarkers in each of the three real datasets that are validated by previous research and worthy of additional investigation. CES was especially helpful at identifying important features in the rat blood RNA-Seq data set. Subsequent ontological analysis of these selected features revealed protein folding as an important process in that dataset. The other contribution of this research to science was to extend the applicability of CES to biomarker discovery in multi-class settings. New software algorithms based on CES have already been developed, and the multi-class modifications presented here are directly applicable and would also benefit the newer software.

4

Mirina, Alexandra. “Computational approaches for intelligent processing of biomedical data”. Thesis, Yeshiva University, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3664552.

Annotation:

The rapid development of novel experimental techniques has led to the generation of an abundance of biological data, which holds great potential for elucidating many scientific problems. The analysis of such complex heterogeneous information, which we often have to deal with, requires appropriate state-of-the-art analytical methods. Here we demonstrate how an unconventional approach and intelligent data processing can lead to meaningful results.

This work includes three major parts. In the first part we describe a correction methodology for genome-wide association studies (GWAS). We demonstrate the existing bias for the selection of larger genes for downstream analyses in GWA studies and propose a method to adjust for this bias. Thus, we effectively show the need for data preprocessing in order to obtain a biologically relevant result. In the second part, building on the results obtained in the first part, we attempt to elucidate the underlying mechanisms of aging and longevity by conducting a longevity GWAS. Here we took an unconventional approach to the GWAS analysis by applying the idea of genetic buffering. Doing this allowed us to identify pairs of genetic markers that play a role in longevity. Furthermore, we were able to confirm some of them by means of a downstream network analysis. In the third and final part, we discuss the characteristics of chronic lymphocytic leukemia (CLL) B-cells and perform clustering analysis based on immunoglobulin (Ig) mutation patterns. By comparing the sequences of Ig of CLL patients and healthy donors, we show that different Ig heavy chain (IGHV) regions in CLL exhibit similarities with different B-cell subtypes of healthy donors, which raised a question about the single origin of CLL cases.

5

Zhang, Ju. “Trans-Ancestral Genetic Correlation Estimates from Summary Statistics for Admixed Populations”. Case Western Reserve University School of Graduate Studies / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=case1619455882746982.

6

Lott, Paul Christian. “StochHMM: A Flexible Hidden Markov Model Framework”. Thesis, University of California, Davis, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3602142.

Annotation:

In the era of genomics, data analysis models and algorithms that provide the means to reduce large complex sets into meaningful information are integral to furthering our understanding of complex biological systems. Hidden Markov models comprise one such data analysis technique that has become the basis of many bioinformatics tools. Their relative success is primarily due to their conceptual simplicity and robust statistical foundation. Despite being one of the most popular data analysis modeling techniques for classification of linear sequences of data, researchers have few available software options to rapidly implement the necessary modeling framework and algorithms. Most tools are still hand-coded because current implementation solutions do not provide the required ease or flexibility that allows researchers to implement models in non-traditional ways. I have developed a free hidden Markov model C++ library and application, called StochHMM, that provides researchers with the flexibility to apply hidden Markov models to unique sequence analysis problems. It provides researchers the ability to rapidly implement a model using a simple text file and at the same time provides the flexibility to adapt the model in non-traditional ways. In addition, it provides many features that are not available in any current HMM implementation tools, such as stochastic sampling algorithms, the ability to link user-defined functions into the HMM framework, and multiple ways to integrate additional data sources together to make better predictions. Using StochHMM, we have been able to rapidly implement models for R-loop prediction and classification of methylation domains. The R-loop predictions uncovered the epigenetic regulatory role of R-loops at CpG promoters and at the 3' transcription termination of protein-coding genes. Classification of methylation domains in multiple pluripotent tissues identified epigenetics gene tracks that will help inform our understanding of epigenetic diseases.
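To make the idea of classifying a linear sequence with a hidden Markov model concrete, here is a minimal Viterbi decoder in Python. It is a generic textbook sketch, not StochHMM's API or model file format. The two states, the alphabet, and every probability below are hypothetical values chosen for illustration.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for a sequence of observations."""
    # best[s] is the log-probability of the best path that ends in state s.
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
            for s in states}
    backpointers = []
    for obs in observations[1:]:
        pointers = {}
        new_best = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev = max(states, key=lambda r: best[r] + math.log(trans_p[r][s]))
            pointers[s] = prev
            new_best[s] = (best[prev] + math.log(trans_p[prev][s])
                           + math.log(emit_p[s][obs]))
        backpointers.append(pointers)
        best = new_best
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical two-state model: state A mostly emits "x", state B mostly emits "y".
states = ("A", "B")
start_p = {"A": 0.5, "B": 0.5}
trans_p = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}
emit_p = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.1, "y": 0.9}}

decoded = viterbi("xxxyyy", states, start_p, trans_p, emit_p)
print(decoded)  # → ['A', 'A', 'A', 'B', 'B', 'B']
```

The decoder works in log space to avoid underflow on long sequences. The stochastic sampling algorithms mentioned in the abstract would sample paths from the posterior instead of returning only the single best path, and are not shown here.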

7

Himmelstein, Daniel S. “The hetnet awakens: understanding complex diseases through data integration and open science”. Thesis, University of California, San Francisco, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10133408.

Annotation:

Human disease is complex. However, the explosion of biomedical data is providing new opportunities to improve our understanding. My dissertation focused on how to harness the biodata revolution. Broadly, I addressed three questions: how to integrate data, how to extract insights from data, and how to make science more open.

To integrate data, we pioneered the hetnet—a network with multiple node and relationship types. After several preludes, we released Hetionet v1.0, which contains 2,250,197 relationships of 24 types. Hetionet encodes the collective knowledge produced by millions of studies over the last half century.
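A network with multiple node and relationship types can be sketched as a small Python data structure. The class, the node identifiers, and the edge type names (`binds`, `associates`) are hypothetical and do not reproduce Hetionet's schema. The walk counter only hints at how typed network patterns might be tallied and is not the dissertation's algorithm.

```python
from collections import defaultdict

class Hetnet:
    """A tiny heterogeneous network: nodes and edges both carry a type."""

    def __init__(self):
        # adjacency[(node, edge_type)] -> set of neighboring nodes
        self.adjacency = defaultdict(set)
        self.node_type = {}

    def add_node(self, node, kind):
        self.node_type[node] = kind

    def add_edge(self, source, edge_type, target):
        # Store both directions so walks can traverse an edge either way.
        self.adjacency[(source, edge_type)].add(target)
        self.adjacency[(target, edge_type)].add(source)

    def count_walks(self, start, edge_types):
        """Count walks from start that follow the given sequence of edge types."""
        frontier = {start: 1}
        for edge_type in edge_types:
            next_frontier = defaultdict(int)
            for node, ways in frontier.items():
                for neighbor in self.adjacency.get((node, edge_type), ()):
                    next_frontier[neighbor] += ways
            frontier = next_frontier
        return dict(frontier)

# Hypothetical toy network: one compound, two genes, one disease.
net = Hetnet()
net.add_node("compound_1", "Compound")
net.add_node("gene_1", "Gene")
net.add_node("gene_2", "Gene")
net.add_node("disease_1", "Disease")
net.add_edge("compound_1", "binds", "gene_1")
net.add_edge("compound_1", "binds", "gene_2")
net.add_edge("gene_1", "associates", "disease_1")
net.add_edge("gene_2", "associates", "disease_1")

print(net.count_walks("compound_1", ["binds", "associates"]))  # → {'disease_1': 2}
```

Keying the adjacency map by (node, edge type) keeps each relationship type separate, so a pattern such as Compound-binds-Gene-associates-Disease can be followed step by step.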

To extract insights from data, we developed a machine learning approach for hetnets. In order to predict the probability that an unknown relationship exists, our algorithm identifies influential network patterns. We used the approach to prioritize disease-gene associations and drug repurposing opportunities. By evaluating our predictions on withheld knowledge, we demonstrated the systematic success of our method.

After encountering friction that interfered with data integration and rapid communication, I began looking at how to make science more open. The quest led me to explore real-time open notebook science and to expose publishing delays at journals as well as the problematic licensing of publicly funded research data.

8

Petereit, Julia. “Petal - A New Approach to Construct and Analyze Gene Co-Expression Networks in R”. Thesis, University of Nevada, Reno, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10248467.

Annotation:

petal is a network analysis method that includes and takes advantage of precise Mathematics, Statistics, and Graph Theory, but remains practical to the life scientist. petal is built upon the assumption that large complex systems follow a scale-free and small-world network topology. One main intention of creating this program is to eliminate unnecessary noise and imprecision introduced by the user. Consequently, no user input parameters are required, and the program is designed to allow the two structural properties, scale-free and small-world, to govern the construction of network models.

The program is implemented in the statistical language R and is freely available as a package for download. Its package includes several simple R functions that the researcher can use to construct co-expression networks and extract gene groupings from a biologically meaningful network model. More advanced R users may use other functions for further downstream analyses, if desired.

The petal algorithm is discussed and its application demonstrated on several datasets. petal results show that the technique is capable of detecting biologically meaningful network modules from co-expression networks. That is, scientists can use this technique to identify groups of genes with possible similar function based on their expression information.

While this approach is motivated by whole-system gene expression data, the fundamental components of the method are transparent and can be applied to large datasets of many types and sizes from various fields.

9

Dimont, Emmanuel. “Methods for the Analysis of Differential Composition of Gene Expression”. Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:14226062.

Annotation:

Modern next-generation sequencing and microarray-based assays have empowered the computational biologist to measure various aspects of biological activity. This has led to the growth of genomics, transcriptomics and proteomics as fields of study of the complete set of DNA, RNA and proteins in living cells respectively. One major challenge in the analysis of this data, however, has been the widespread lack of sufficiently large sample sizes due to the high cost of new emerging technologies, making statistical inference difficult. In addition, due to the hierarchical nature of the various types of data, it is important to correctly integrate them to make meaningful biological discoveries and better informed decisions for the successful treatment of disease. In this dissertation I propose: (1) a novel method for more powerful statistical testing of differential digital gene expression between two conditions, (2) a framework for the integration of multi-level biologic data, demonstrated with the compositional analysis of gene expression and its link to promoter structure, and (3) an extension to a more complex generalized linear modeling framework, demonstrated with the compositional analysis of gene expression and its link to pathway structure adjusted for confounding covariates.

10

Bueno, Raymund. “Investigating Mechanisms of Robustness in BRCA-Mutated Breast and Ovarian Cancers”. Thesis, Yeshiva University, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=11014738.

Annotation:

The BRCA1 and BRCA2 (BRCA) genes are two tumor suppressors that, when mutated, predispose patients to breast and ovarian cancer. The BRCA genes encode proteins that mediate the repair of DNA double strand breaks. Functional loss of the BRCA genes is detrimental to the integrity of the genome because, without access to functional BRCA protein, inefficient and error-prone repair pathways are used instead. These pathways, such as non-homologous end joining, do not accurately repair the DNA, which can introduce mutations and genomic rearrangements. Ultimately the genome is not repaired faithfully and the predisposition to cancer greatly increases. In addition to their contribution to DNA repair, the BRCA genes have been shown to have transcriptional activity, and this functional role can also be a driving factor behind the tumor suppressor activity.

Robustness is the ability of a complex system to sustain viability despite perturbations to it. In the context of a complex disease such as cancer, robustness gives cancers the ability to sustain uncontrollable growth and invasiveness despite treatments such as chemotherapy that attempt to eliminate the tumor. A complex system can be robust yet fragile to perturbations that it is not optimized against. In cancers, these fragilities have the potential to be cancer-specific targets that can eradicate the disease specifically.

Patients with mutations in BRCA tend to have breast and ovarian cancers that are difficult to treat; chemotherapy is the only option and no targeted therapies are available. Targeting the synthetic lethal interaction (SLI), a mechanism of robustness, between BRCA and PARP1 genes was clinically effective in treating BRCA-mutated breast and ovarian cancers. This suggests that understanding robustness in cancers can reveal potential cancer specific therapies.

In this thesis, a computational approach was developed to identify candidate mechanisms of robustness in BRCA-mutated breast and ovarian cancers using the publicly accessible patient gene expression and mutation data from the Cancer Genome Atlas (TCGA). Results showed that in ovarian cancer patients with a BRCA2 mutation, the expression of genes that function in the DNA damage response was kept at a stable state compared to patients without a mutation. The stable expression of genes in the DNA damage response may highlight an SLI gene network that is precisely controlled. This result is significant because disrupting this precision can potentially lead to cancer-specific death. In breast cancers, genes that were differentially expressed in patients with BRCA mutations were identified. A Bayesian network analysis was performed to infer candidate interactions between BRCA1 and BRCA2 and the differentially expressed FLT3, HOXA11, HPGD, MLF1, NGFR, PLAT, and ZBTB16 genes. These genes function in processes important to cancer progression such as apoptosis and cell migration. The connection between these genes and BRCA may highlight how the BRCA genes influence cancer progression.

Taken together, the findings of this thesis enhance our understanding of the BRCA genes and their role in DNA damage response and transcriptional regulation in human breast and ovarian cancers. These results have been attained from systems-level models to identify candidate mechanisms underlying robustness of cancers. The work presented predicts interesting candidate genes that may have potential as drug targets or biomarkers in BRCA-mutated breast and ovarian cancers.

11

Rajabi, Zeyad. “BIAS: bioinformatics integrated application software and discovering relationships between transcription factors”. Thesis, McGill University, 2004. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=81427.

Annotation:

In the first part of this thesis, we present a new development platform especially tailored to Bioinformatics research and software development called Bias (Bioinformatics Integrated Application Software) designed to provide the tools necessary for carrying out integrative Bioinformatics research. Bias follows an object-relational strategy for providing persistent objects, allows third-party tools to be easily incorporated within the system, and it supports standards and data-exchange protocols common to Bioinformatics. The second part of this thesis is on the design and implementation of modules and libraries within Bias related to transcription factors. We present a module in Bias that focuses on discovering competitive relationships between mouse and yeast transcription factors. By competitive relationships we mean the competitive binding of two transcription factors for a given binding site. We also present a method that divides a transcription factor's set of binding sites into two or more different sets when constructing PSSMs.

12

Quiroz, Alejandro. “Deciphering the Biological Mechanisms Driving the Phenotype of Interest”. Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10708.

Annotation:

The two key concepts of Neo-Darwinian evolution theory are genotype and phenotype. Genotype is defined as the genetic constitution of an organism, and phenotype refers to the observable characteristics of that organism. Schematically, the relationship between genotype and phenotype can be stated as Genotype + Environment + Random Variation \(\underrightarrow{\text{yields}}\) Phenotype. This schematic representation has led to a fundamental problem: given the interactions of genes and environment, to what extent is it possible to establish a relationship between gene structure and function and the phenotype (Weatherall, D. J., et al. (2001))? From R. A. Fisher establishing the basis of quantitative trait loci up to the gene set enrichment analysis of Subramanian et al. (1995), several statistical methods have been devoted to answering this question, some with more success and scientific repercussion than others. In this work we attempt to answer this question by delineating the biological mechanisms driven by the genes that characterize the differences and actions of the phenotypes of interest. Our contribution rests on two pillars: we present an alternative way to conceive gene expression measurements, and we use functional gene set annotation systems as guided prior knowledge of the biological mechanisms that drive the phenotype of interest. Based on these two pillars we propose a Functional Network Inference method and an alternative method to perform expression Quantitative Trait Loci (eQTL) analysis. With the Functional Network Inference method we are able to identify which mechanisms describe the behavior of most of the genes, therefore establishing their importance. The alternative eQTL method that we present is a more direct way to associate variations at the sequence level with the biological mechanisms they affect. With this proposal we attempt to address two important issues of traditional eQTL analysis: statistical power and biological implications.

13

Yu, Jingting. “Methods to Evaluate the Effects of Chromatin Organization in eQTL Mapping and the Effects of Design Factors in Cancer Single-cell Studies”. Case Western Reserve University School of Graduate Studies / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=case1554463507829716.

14

Ong, Vy Quoc. “Subgroup Analysis of Patients with Hepatocellular Carcinoma: A Quest for Statistical Algorithms for Tissue Classification Problem”. Thesis, California State University, Long Beach, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10840510.

Annotation:

Hepatocellular carcinoma (HCC) is the most common type of liver cancer. It is the third leading cause of cancer death worldwide and the ninth leading cause of cancer mortality in the United States. People with hepatitis B or C are considered to be at high risk for this kind of cancer. HCC patients with remarkably poor prognosis and low survival rates commonly have intra-hepatic metastases, either tumor thrombi in the portal vein or intra-hepatic spread. It is uncommon for them to die of extra-hepatic metastases. Therefore, identifying metastatic HCC has become vital, though clinically challenging, for timely therapeutic intervention to improve the survival rate of patients who suffer from this disease.

To date, studies have sought an accurate molecular profiling model that identifies these patients in advance for better treatment or intervention. One approach has been to focus on identifying individual candidate genes characterizing metastatic HCC. Another direction has been to find a global genome-scale solution by using microarray technology to obtain a gene expression profile for this carcinoma. Among the research following the latter direction is the study by Qing-Hai Ye et al., Nature Medicine, Volume 9, Number 4, April 2003. They applied cDNA microarray-based gene expression profiling with compound covariate predictors to binary classification tasks among primary HCC, metastatic HCC, and metastasis-free HCC on a dataset of 87 observations and 9984 features taken from 40 hepatitis B-positive Chinese patients. Notably, a robust 153-gene model was generated that successfully distinguished tumor-thrombi-in-the-portal-vein samples from metastasis-free samples. However, they acknowledged that distinguishing primary tumor samples from their matched metastatic lesions was still a challenge. In this molecular signature, a gene named osteopontin, a secreted phosphoprotein, served as the lead gene in diagnosing HCC metastasis.

The analysis is based on the metastatic status of HCC, which is clinically predetermined. However, the class definition needs to be validated to investigate whether the data are sufficient to support the three predefined classes. We will use statistical clustering algorithms to validate the class definitions. After that, we will conduct variable selection to find markers, that is, genes differentially expressed among the clinical groups validated in this research. Next, using the compound markers found, we will develop a statistical model to predict a new patient’s HCC type for intervention. The generalization performance of the prediction model will be evaluated via cross-validation. This study aims to build a highly accurate model that better classifies the aforementioned clinical groups of HCC and thus improves the rate of predicting metastatic patients.

15

Alouani, David James. “THE AGING PROCESS OF C. ELEGANS VIEWED THROUGH TIME DEPENDENT PROTEIN EXPRESSION ANALYSIS”. Case Western Reserve University School of Graduate Studies / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=case1436393267.

16

Yip, Wai-Ki. “Statistical Methods for Analyzing DNA Methylation Data and Subpopulation Analysis of Continuous, Binary and Count Data for Clinical Trials”. Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:14226106.

Annotation:

DNA methylation may represent an important contributor to the missing heritability described in complex trait genetics. However, technology to measure DNA methylation has outpaced statistical methods for analysis. Novel methodologies are required to accommodate this growing volume of DNA methylation data. In this dissertation, I propose two novel methods to analyze DNA methylation data: (1) a new statistic based on spatial location information of DNA methylation sites to detect differentially methylated regions in the genome in case and control studies; and (2) a principal component approach for the detection of unknown substructure in DNA methylation data. For each method, I review existing ones and demonstrate the efficacy of my proposed method using simulation and data application. Medical research is increasingly focused on personalizing the care of patients. A better understanding of the interaction between treatment and patient specific prognostic factors will enable practitioners to expand the availability of tailored therapies improving patient outcomes. The Subpopulation Treatment Effect Pattern Plot (STEPP) approach was developed to allow researchers to investigate the heterogeneity of treatment effects on survival outcomes across increasing values of a continuously measured covariate, such as biomarker measurement. I extend the STEPP approach to continuous, binary and count outcomes which can be easily modeled with generalized linear models (GLM). The statistical significance of any observed heterogeneity of treatment effect is assessed using permutation tests. The method is implemented in the R software package (stepp) and is available in R version 3.1.1. The efficacy of my STEPP extension is demonstrated by using simulation and data application.
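The permutation-testing idea mentioned in the abstract can be illustrated with a generic exact test for a difference in group means. This is a sketch under simplifying assumptions: it does not use the STEPP test statistic or the `stepp` package, and the function name and toy data are hypothetical.

```python
from itertools import combinations

def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(group_a))
    extreme = 0
    count = 0
    # Enumerate every way of relabeling n_a of the pooled values as "group A".
    for indices in combinations(range(len(pooled)), n_a):
        count += 1
        stat = abs_mean_diff(sum(pooled[i] for i in indices))
        if stat >= observed - 1e-12:  # small tolerance for float comparisons
            extreme += 1
    return extreme / count

# Toy data: only 2 of the 20 possible relabelings are as extreme as the observed split.
print(exact_permutation_pvalue([1, 2, 3], [4, 5, 6]))  # → 0.1
```

Full enumeration is feasible only for small samples. For realistic trial sizes one would draw random relabelings instead, which approximates the same null distribution.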

17

Arnold, Brian. “Evolutionary Dynamics of a Multiple-Ploidy System in Arabidopsis Arenosa”. Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:17467222.

Annotation:

Whole-genome duplication (WGD), which leads to polyploidy, has been implicated in speciation and biological novelty. In plants, many species have experienced historical bouts of WGD or exhibit extant ploidy variation, which is likely representative of an early stage in the evolution of new polyploid lineages. To elucidate the evolutionary dynamics of autopolyploids and species with multiple ploidy levels, I develop population genetic theory in Chapter 2 that I use in Chapter 4 to extract information about the evolutionary history of Arabidopsis arenosa, a European wildflower that has diploid and autotetraploid populations. Chapter 3 involves a separate project exploring the ascertainment bias in restriction site associated DNA sequencing (RADseq). In Chapter 2, I develop coalescent models for autotetraploid species with tetrasomic inheritance and show that the ancestral genetic process in a large population without recombination may be approximated using Kingman’s standard coalescent, with a coalescent effective population size 4N. Using this result, I was able to use existing coalescent simulation programs to show in Chapter 4 that, in A. arenosa, a widespread autotetraploid race arose from a single ancestral population. This autopolyploidization event was not accompanied by immediate reproductive isolation between diploids and tetraploids in this species, as I find evidence of extensive interploidy admixture between diploid and tetraploid populations that are geographically close. To draw these conclusions about population history in Chapter 4, I used a reduced representation genome-sequencing approach based on restriction digestion. However, I was bothered by the possibility that sampling chromosomes based on restriction digestion may introduce a bias in allele frequency estimation due to polymorphisms in restriction sites. 
To explore the effects of this nonrandom sampling and its sensitivity to different evolutionary parameters, we developed a coalescent-simulation framework in Chapter 3 to mimic the biased recovery of chromosomes in RADseq experiments. We show that loci with missing haplotypes have estimated diversity statistic values that can deviate dramatically from true values and are also enriched for particular genealogical histories. These results urge caution when applying this technique to make population genetic inferences and helped me tailor analyses in Chapter 4 to accommodate this particular method of DNA sequencing.
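For readers unfamiliar with Kingman's standard coalescent, the short Python sketch below computes the expected time to the most recent common ancestor of a sample. The rescaling to generations assumes that one coalescent time unit corresponds to 4N generations, in line with the effective size of 4N stated in the abstract. The dissertation's exact scaling convention is not given there, so that step and the function names are assumptions.

```python
from math import comb

def expected_tmrca(n_samples):
    """Expected time to the most recent common ancestor, in coalescent units.

    While k lineages remain, the waiting time to the next coalescence is
    exponential with rate C(k, 2), so its mean is 1 / C(k, 2).
    """
    return sum(1.0 / comb(k, 2) for k in range(2, n_samples + 1))

def expected_tmrca_generations(n_samples, n_individuals, copies_per_individual=4):
    """Rescale to generations, assuming one unit equals copies * N generations."""
    return expected_tmrca(n_samples) * copies_per_individual * n_individuals

# Hypothetical example: 10 sampled chromosomes, 1000 autotetraploid individuals.
print(expected_tmrca(10))
print(expected_tmrca_generations(10, 1000))
```

The sum has the closed form 2(1 - 1/n), so the expected tree height approaches 2 coalescent units as the sample grows.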
Biology, Organismic and Evolutionary

18

Zack, Travis Ian. “Exploring cancer's fractured genomic landscape: Searching for cancer drivers and vulnerabilities in somatic copy number alterations”. Thesis, Harvard University, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3645095.

Annotation:

Somatic copy number alterations (SCNAs) are a class of alterations that lead to deviations from diploidy in developing and established tumors. A feature that distinguishes SCNAs from other alterations is their genomic footprint. The large genomic footprint of SCNAs in a typical cancer's genome presents both a challenge and an opportunity to find targetable vulnerabilities in cancer. Because a single event affects many genes, it is often challenging to identify the tumorigenic targets of SCNAs. Conversely, events that affect multiple genes may provide specific vulnerabilities through "bystander" genes, in addition to vulnerabilities directly associated with the targets.

We approached the goal of understanding how the structure of SCNAs may lead to dependency in two ways. To improve our understanding of how SCNAs promote tumor progression we analyzed the SCNAs in 4934 primary tumors in 11 common cancers collected by the Cancer Genome Atlas (TCGA). The scale of this dataset provided insights into the structure and patterns of SCNA, including purity and ploidy rates across disease, mechanistic forces shaping patterns of SCNA, regions undergoing significantly recurrent SCNAs, and correlations between SCNAs in regions implicated in cancer formation.

In a complementary approach, we integrated SCNA data and pooled RNAi screening data involving 11,000 genes across 86 cell lines to find non-driver genes whose partial loss led to increased sensitivity to RNAi suppression. We identified a new set of cancer-specific vulnerabilities predicted by loss of non-driver genes, with the most significant gene being PSMC2, an obligate member of the 26S proteasome. Biochemically, we found that PSMC2 is in excess of cellular requirement in diploid cells, but becomes the stoichiometric limiting factor in proteasome formation after partial loss of this gene.

In summary, my work improved our understanding of the structure and patterns of SCNA, both informing how cancers develop and predicting novel cancer vulnerabilities. Our characterization of the SCNAs present across 5000 tumors uncovered novel structure in SCNAs and significant regions likely to contain driver genes. Through integrating SCNA data with the results of a functional genetic screen, we also uncovered a new set of vulnerabilities caused by unintended loss of non-driver genes.

19

Ablorh, Akweley. „Meta-Analysis of a Multi-Ethnic, Breast Cancer Case-Control Targeted Sequencing Study“. Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:16121143.

Annotation:

Breast cancer, the most commonly diagnosed cancer in American women, is a heritable disease with nearly one hundred known genetic risk factors. Using next generation sequencing, we explored the contribution of genetics at 12 GWAS-identified loci to breast cancer susceptibility in a multi-ethnic breast cancer case-control study. Methods: The study population consists of 4,611 breast cancer cases and controls (2,316 cases and 2,295 controls) from four mutually exclusive ethnicities: African, Latina, Japanese, or European American. We conducted rare variant association testing between sequenced genotypes and simulated phenotypes to compare the performance of several approaches for assessing rare variant associations across multiple ethnicities and the statistical performance of different ethnic sampling fractions, including single-ethnicity studies and studies that sample up to four ethnicities. Findings from simulation of causal rare variant penetrance models were then applied to a non-synonymous protein-coding rare variant association study of breast cancer. Next, we applied variance partitioning methods to determine what proportion of breast cancer heritability is explained by rare and common, coding and non-coding, and the complete set of sequenced genetic variants. Results: Variance component-based tests were better powered in several scenarios. Multi-ethnic studies were well powered, with inclusion of African Americans providing the largest gains in statistical power. Rare variation in several genes was nominally associated (alpha=0.05) with breast cancer risk. Common variants explained a significant amount of breast cancer heritability (5%; SE=2%). Total breast cancer heritability from all sequenced SNVs from all 12 loci was approximately 11% (SE=4%), a substantial portion of breast cancer heritability, which ranges from 27% to 32% in European familial studies.
Conclusion: Our findings suggest that association studies between rare variants and complex disease should consider including subjects from multiple ethnicities, with preference given to genetically diverse groups. We demonstrate practices with the potential to uncover and localize gene-based associations using gene-based rare variant association testing at 12 GWAS-identified breast cancer susceptibility loci. We also present strong evidence that just this subset of previously identified loci explains a substantial portion of heritability, which suggests that all GWAS-identified loci may explain more breast cancer heritability than the 17% previously reported.
Epidemiology

20

Patel, Vishal N. „Colon Cancer and its Molecular Subsystems: Network Approaches to Dissecting Driver Gene Biology“. Case Western Reserve University School of Graduate Studies / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=case1310087563.

21

Athippozhy, Antony Thomas. „ANALYSIS OF DIFFERENTIAL GENE EXPRESSION AND ALTERNATIVE SPLICING IN THE LIVER AND GASTROINTESTINAL TRACT IN THE LACTATING RAT“. UKnowledge, 2011. http://uknowledge.uky.edu/gradschool_diss/218.

Annotation:

Rat exon microarrays were utilized to detect changes in mRNA expression and alternative splicing in the liver, duodenum, jejunum, and ileum of the lactating rat when compared to age-matched virgin controls. Analysis of data at the level of gene expression revealed differential expression of genes involved in cholesterol biosynthesis in each tissue examined, suggesting increased Sterol Response Element Binding Protein activity. We also detected decreased mRNA from components of the T-cell signaling pathway in the jejunum and ileum. We characterized expression of solute carrier and adenosine triphosphate binding cassette proteins. In addition to characterizing genes by pathway, we have also grouped genes based on their pattern of expression to identify important genes. Among the genes upregulated in all tissues was Slc39a4, which is a critical transporter in the absorption of zinc in enterocytes. Alternative splicing analysis detected a substantial amount of alternative splicing in the ileum compared to other tissues. In addition, Abcg8, a protein that functions as a heterodimer to export cholesterol in the bile, shows differential splicing in the liver, but not in other tissues. We also detected differential expression of Ugt1a6 in the liver based on usage of an alternative first exon, which is consistent with altered protein levels observed previously. Differential splicing also appears to occur in Ace2 in the ileum, which could have consequences for the renin-angiotensin pathway.

22

Chan, Ying Leong. „Leveraging genetic association data to investigate the polygenic architecture of human traits and diseases“. Thesis, Harvard University, 2014. http://dissertations.umi.com/gsas.harvard:11372.

Annotation:

Many human traits and diseases have a polygenic architecture, where phenotype is partially determined by variation in many genes. These complex traits or diseases can be highly heritable and genome-wide association studies (GWAS) have been relatively successful in the identification of associated variants. However, these variants typically do not account for most of the heritability and thus, the genetic architecture remains uncertain.

23

Loveless, Ian. „Binary Classification With First Phase Feature Selection for Gene Expression Survival Data“. The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1555444873531262.

24

Reese, Sarah. „Detecting and Correcting Batch Effects in High-Throughput Genomic Experiments“. VCU Scholars Compass, 2013. http://scholarscompass.vcu.edu/etd/3180.

Annotation:

Batch effects are due to probe-specific systematic variation between groups of samples (batches) resulting from experimental features that are not of biological interest. Principal components analysis (PCA) is commonly used as a visual tool to determine whether batch effects exist after applying a global normalization method. However, PCA yields linear combinations of the variables that contribute maximum variance and thus will not necessarily detect batch effects if they are not the largest source of variability in the data. We present an extension of principal components analysis to quantify the existence of batch effects, called guided PCA (gPCA). We describe a test statistic that uses gPCA to test if a batch effect exists. We apply our proposed test statistic derived using gPCA to simulated data and to two copy number variation case studies: the first study consisted of 614 samples from a breast cancer family study using Illumina Human 660 bead-chip arrays whereas the second case study consisted of 703 samples from a family blood pressure study that used Affymetrix SNP Array 6.0. We demonstrate that our statistic has good statistical properties and is able to identify significant batch effects in two copy number variation case studies. We further compare existing batch effect correction methods and apply gPCA to test their effectiveness. We conclude that our novel statistic that utilizes guided principal components analysis to identify whether batch effects exist in high-throughput genomic data is effective. Although our examples pertain to copy number data, gPCA is general and can be used on other data types as well.

25

Stansfield, John C. „Methods for Joint Normalization and Comparison of Hi-C data“. VCU Scholars Compass, 2019. https://scholarscompass.vcu.edu/etd/5951.

Annotation:

The development of chromatin conformation capture technology has opened new avenues of study into the 3D structure and function of the genome. Chromatin structure is known to influence gene regulation, and differences in structure are now emerging as a mechanism of regulation in contexts such as cell differentiation and disease vs. normal states. Hi-C sequencing technology now provides a way to study the 3D interactions of the chromatin over the whole genome. However, like all sequencing technologies, Hi-C suffers from several forms of bias stemming from both the technology and the DNA sequence itself. Several normalization methods have been developed for normalizing individual Hi-C datasets, but little work has been done on developing joint normalization methods for comparing two or more Hi-C datasets. To make full use of Hi-C data, joint normalization and statistical comparison techniques are needed to carry out experiments to identify regions where chromatin structure differs between conditions. We develop methods for the joint normalization and comparison of two Hi-C datasets, which we then extend to more complex experimental designs. Our normalization method is novel in that it makes use of the distance-dependent nature of chromatin interactions. Our modification of the Minus vs. Average (MA) plot to the Minus vs. Distance (MD) plot allows for a nonparametric data-driven normalization technique using loess smoothing. Additionally, we present a simple statistical method using Z-scores for detecting differentially interacting regions between two datasets. Our initial method was published as the Bioconductor R package HiCcompare (http://bioconductor.org/packages/HiCcompare/). We then further extended our normalization and comparison method for use in complex Hi-C experiments with more than two datasets and optional covariates.
We extended the normalization method to jointly normalize any number of Hi-C datasets by using a cyclic loess procedure on the MD plot. The cyclic loess normalization technique can remove between-dataset biases efficiently and effectively even when several datasets are analyzed at one time. Our comparison method implements a generalized linear model-based approach for comparing complex Hi-C experiments, which may have more than two groups and additional covariates. The extended methods are also available as the Bioconductor R package multiHiCcompare (http://bioconductor.org/packages/multiHiCcompare/). Finally, we demonstrate the use of HiCcompare and multiHiCcompare in several test cases on real data and compare them to other similar methods (https://doi.org/10.1002/cpbi.76).
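As a rough illustration of the MD plot idea in this abstract, the sketch below computes M (log2 ratio of interaction frequencies) and D (bin distance) for two sparse contact maps and then removes the distance-dependent trend in M. It is not the HiCcompare implementation: the function names and toy counts are hypothetical, and a per-distance mean stands in for the loess fit described in the abstract.

```python
import math
from collections import defaultdict

def md_values(if1, if2):
    """Return {(i, j): (M, D)} for bin pairs with positive counts in both maps."""
    md = {}
    for (i, j), a in if1.items():
        b = if2.get((i, j))
        if b is not None and a > 0 and b > 0:
            # M: log2 ratio of interaction frequencies; D: distance in bins.
            md[(i, j)] = (math.log2(b / a), abs(i - j))
    return md

def center_by_distance(if1, if2):
    """Jointly adjust both maps so the mean of M is zero at every distance.

    Assumption: a per-distance mean replaces the loess curve fitted on the MD plot.
    """
    md = md_values(if1, if2)
    by_distance = defaultdict(list)
    for m, d in md.values():
        by_distance[d].append(m)
    trend = {d: sum(ms) / len(ms) for d, ms in by_distance.items()}
    norm1, norm2 = {}, {}
    for key, (m, d) in md.items():
        half = trend[d] / 2
        # Split the correction evenly between the two datasets on the log2 scale.
        norm1[key] = 2 ** (math.log2(if1[key]) + half)
        norm2[key] = 2 ** (math.log2(if2[key]) - half)
    return norm1, norm2

# Hypothetical contact counts keyed by (bin_i, bin_j).
if1 = {(0, 0): 100.0, (1, 1): 80.0, (0, 1): 40.0, (1, 2): 50.0, (0, 2): 10.0}
if2 = {(0, 0): 200.0, (1, 1): 160.0, (0, 1): 60.0, (1, 2): 75.0, (0, 2): 10.0}
norm1, norm2 = center_by_distance(if1, if2)
residual_m = [m for m, _ in md_values(norm1, norm2).values()]
```

In this toy input the bias is constant within each distance, so the residual M values are all zero after centering; real data would leave scatter around zero, which is what a downstream Z-score comparison would examine.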

26

Larson, Jessica. „Hidden Markov Models Predict Epigenetic Chromatin Domains“. Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10105.

Annotation:

Epigenetics is an important layer of transcriptional control necessary for cell-type specific gene regulation. We developed computational methods to analyze the combinatorial effect and large-scale organizations of genome-wide distributions of epigenetic marks. Throughout this dissertation, we show that regions containing multiple genes with similar epigenetic patterns are found throughout the genome, suggesting the presence of several chromatin domains. In Chapter 1, we develop a hidden Markov model (HMM) for detecting the types and locations of epigenetic domains from multiple histone modifications. We use this method to analyze a published ChIP-seq dataset of five histone modification marks in mouse embryonic stem cells. We successfully detect domains of consistent epigenetic patterns from ChIP-seq data, providing new insights into the role of epigenetics in long-range gene regulation. In Chapter 2, we expand our model to investigate the genome-wide patterns of histone modifications in multiple human cell lines. We find that chromatin states can be used to accurately classify cell differentiation stage, and that three cancer cell lines can be classified as differentiated cells. We also find that genes whose chromatin states change dynamically in accordance with differentiation stage are not randomly distributed across the genome, but tend to be embedded in multi-gene chromatin domains. Moreover, many specialized gene clusters are associated with stably occupied domains. In the last chapter, we develop a more sophisticated, tiered HMM to include a domain structure in our chromatin annotation. We find that a model with three domains and five sub-states per domain best fits our data. Each state has a unique epigenetic pattern, while still staying true to its domain’s specific functional aspects and expression profiles. The majority of the genome (including most introns and intergenic regions) has low epigenetic signals and is assigned to the same domain.
Our model outperforms current chromatin state models due to its increased domain coherency and interpretation.
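To make the idea of segmenting the genome into domains with an HMM more concrete, here is a minimal sketch of Viterbi decoding on a toy two-state model. It is a simplification and not the dissertation's model: the state names, the single binary mark per genomic bin, and all probabilities are hypothetical, whereas the abstract describes models built on multiple histone modifications.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for an observation sequence (log space)."""
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + math.log(trans_p[p][s]))
            score[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][o])
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical two-domain model; an observation is 1 if a mark is present in a bin.
states = ("active", "quiet")
start_p = {"active": 0.5, "quiet": 0.5}
trans_p = {
    "active": {"active": 0.9, "quiet": 0.1},  # domains tend to persist
    "quiet": {"active": 0.1, "quiet": 0.9},
}
emit_p = {
    "active": {1: 0.8, 0: 0.2},
    "quiet": {1: 0.1, 0: 0.9},
}
bins = [1, 1, 1, 0, 0, 0]
domains = viterbi(bins, states, start_p, trans_p, emit_p)
```

The high self-transition probabilities are what make neighbouring bins fall into contiguous domains rather than switching state at every bin.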

27

Li, Ran. „Chemometrics Development using Multivariate Statistics and Vibrational Spectroscopy and its Application to Cancer Diagnosis“. The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1449067634.

28

Ge, Jianye. „Computational Algorithms and Evidence Interpretation in DNA Forensics based on Genomic Data“. University of Cincinnati / OhioLINK, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1234916402.

29

McSweeny, Andrew J. „Identification of Candidate Genes within Blood Pressure QTL Containing Regions Using Gene Expression Data“. University of Toledo Health Science Campus / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=mco1212501779.

30

Stombaugh, Jesse. „Predicting the Structure of RNA 3D Motifs“. Bowling Green State University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1225391806.

31

Vigliotti, Chloé. „Etude de l'impact d'un changement de régime alimentaire sur le microbiome intestinal de Podarcis sicula“. Thesis, Paris, Muséum national d'histoire naturelle, 2017. http://www.theses.fr/2017MNHN0011/document.

Annotation:

We collected and compared intestinal microbiota and microbiomes from several dozen Podarcis sicula lizards living in Croatian continental and insular populations. One of these populations recently changed its diet over a 46-year timespan, switching from an insectivorous diet to an omnivorous one (up to 80% herbivorous). Diversity analyses of these microbial communities, based on the V4 region of their 16S rRNA, showed that the microbiota taxonomic diversity (or alpha diversity) is higher in omnivorous lizards (enriched in methanogenic archaea) than in insectivorous ones. In addition, the microbial communities appear weakly structured: 5 enterotypes are detected at the phylum level, and 3 major phyla (Bacteroidetes, Firmicutes and Proteobacteria) are present. However, neither diet, spatial or temporal origin, nor lizard sex correlates with major, significant differences in the microbiota. Linear discriminant analyses with effect size, based on OTUs and functionally annotated reads from the microbiomes, instead suggest that the diet change in Podarcis sicula is associated with targeted changes in the abundance of some components of the microbiota and microbiome. This result leads us to propose a hypothesis of targeted changes in the microbial communities of this non-model holobiont, as opposed to more radical transformations. On a more theoretical level, this thesis also proposes network models (read similarity networks and bipartite graphs) that can help improve microbiome analyses.

32

Bechtel, Jason M. „Characterization of Genomic MidRange Inhomogeneity“. University of Toledo Health Science Campus / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=mco1217365784.

33

Bhajun, Ricky. „Une approche réseau pour l’inférence du rôle des microARN dans la corégulation des processus biologiques“. Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAS045/document.

Annotation:

RNA interference is a process in which a small non-coding RNA binds to a specific messenger RNA and regulates its expression. This evolutionarily conserved mechanism is found in animals as well as in plants. We now know that RNA interference is a major regulatory process involved in embryonic development and tumor progression. MicroRNAs (miRNAs) are endogenous (coded in and produced by the cell) non-coding RNAs that are able to regulate a whole set of genes; a single miRNA is predicted to regulate several hundred different genes. This doctoral thesis consisted of the analysis of miRNA-mediated coregulation through a network approach based on target sharing. Coregulation is the process in which several different miRNAs regulate the same families of genes and thus the same biological processes. In particular, the work consisted of the inference of a miRNA network, its topological analysis and its biological interpretation. The final aim of the work was to generate new biological hypotheses to test. Two different groups of miRNAs were first identified. One of them was predicted to be involved in small GTPase signaling, a hypothesis that was subsequently validated by several in vitro experiments. Moreover, a miRNA community involved in the maintenance of stem cell pluripotency was also discovered. Finally, a systemic analysis of the topology of the target-based miRNA networks was conducted to better understand their integration with biological networks and their role in cell fate.
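The notion of a miRNA network "based on target sharing" can be sketched as follows: two miRNAs are linked when their predicted target sets overlap strongly. The abstract does not state which similarity measure or cutoff the thesis used, so the Jaccard index, the threshold, and the miRNA and gene names below are all illustrative assumptions.

```python
from itertools import combinations

def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def target_sharing_network(targets, threshold):
    """Return {(mirna1, mirna2): score} for pairs whose target overlap meets the threshold."""
    edges = {}
    for m1, m2 in combinations(sorted(targets), 2):
        score = jaccard(targets[m1], targets[m2])
        if score >= threshold:
            edges[(m1, m2)] = score
    return edges

# Hypothetical miRNAs and predicted target genes.
targets = {
    "miR-A": {"GENE1", "GENE2", "GENE3", "GENE4"},
    "miR-B": {"GENE2", "GENE3", "GENE4", "GENE5"},
    "miR-C": {"GENE6", "GENE7"},
}
edges = target_sharing_network(targets, threshold=0.5)
```

Here miR-A and miR-B share three of five distinct targets and are linked, while miR-C stays isolated. Groups of densely linked miRNAs in such a graph are the kind of candidate coregulating communities that a topological analysis would then look for.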

34

Song, Yeunjoo E. „New Score Tests for Genetic Linkage Analysis in a Likelihood Framework“. Case Western Reserve University School of Graduate Studies / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=case1354561219.

35

Manser, Paul. „Methods for Integrative Analysis of Genomic Data“. VCU Scholars Compass, 2014. http://scholarscompass.vcu.edu/etd/3638.

Annotation:

In recent years, the development of new genomic technologies has allowed for the investigation of many regulatory epigenetic marks besides expression levels, on a genome-wide scale. As the price for these technologies continues to decrease, study sizes will not only increase, but several different assays are beginning to be used for the same samples. It is therefore desirable to develop statistical methods to integrate multiple data types that can handle the increased computational burden of incorporating large data sets. Furthermore, it is important to develop sound quality control and normalization methods as technical errors can compound when integrating multiple genomic assays. DNA methylation is a commonly studied epigenetic mark, and the Infinium HumanMethylation450 BeadChip has become a popular microarray that provides genome-wide coverage and is affordable enough to scale to larger study sizes. It employs a complex array design that has complicated efforts to develop normalization methods. We propose a novel normalization method that uses a set of stable methylation sites from housekeeping genes as empirical controls to fit a local regression hypersurface to signal intensities. We demonstrate that our method performs favorably compared to other popular methods for the array. We also discuss an approach to estimating cell-type admixtures, which is a frequent biological confound in these studies. For data integration we propose a gene-centric procedure that uses canonical correlation and subsequent permutation testing to examine correlation or other measures of association and co-localization of epigenetic marks on the genome. Specifically, a likelihood ratio test for general association between data modalities is performed after an initial dimension reduction step. Canonical scores are then regressed against covariates of interest using linear mixed effects models. 
Lastly, permutation testing is performed on weighted correlation matrices to test for co-localization of relationships to physical locations in the genome. We demonstrate these methods on a set of developmental brain samples from the BrainSpan consortium and find substantial relationships between DNA methylation, gene expression, and alternative promoter usage primarily in genes related to axon guidance. We perform a second integrative analysis on another set of brain samples from the Stanley Medical Research Institute.

36

Thiel, Bonnie Arlene. „Bioinformatics approaches to studying immune processes associated with immunity to Mycobacterium tuberculosis infection in the lung and blood“. Case Western Reserve University School of Graduate Studies / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=case1627247387242562.

37

Haynes, Eric E. „Identifying Common Genes from Rheumatoid Arthritis, Systemic Lupus, Multiple Sclerosis and Sjogrens Syndrome by Pooling Existing Microarray Data“. University of Toledo Health Science Campus / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=mco1374011043.

38

Wang, Heming. „LOCAL ANCESTRY INFERENCE AND ITS IMPLICATION IN SEARCHING FOR SELECTION EVIDENCE IN RECENT ADMIXED POPULATION“. Case Western Reserve University School of Graduate Studies / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=case1473439566976121.

39

Haddon, Andrew L. „Evaluation of Some Statistical Methods for the Identification of Differentially Expressed Genes“. FIU Digital Commons, 2015. http://digitalcommons.fiu.edu/etd/1913.

Annotation:

Microarray platforms have been around for many years, and while new technologies are on the rise in laboratories, microarrays are still prevalent. For the analysis of microarray data to identify differentially expressed (DE) genes, many methods have been proposed and modified for improvement. However, the most popular methods, such as Significance Analysis of Microarrays (SAM), samroc, fold change, and rank product, are far from perfect. Which method is most powerful depends on the characteristics of the sample and the distribution of the gene expression values. The most commonly used method is usually SAM or samroc, but when the data tend to be skewed, the power of these methods decreases. Because the median is a better measure of central tendency than the mean when the data are skewed, the test statistics of the SAM and fold change methods are modified in this thesis. This study shows that the median-modified fold change method improves the power in many cases when identifying DE genes if the data follow a lognormal distribution.
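The motivation for replacing the mean with the median can be seen in a small sketch. The abstract does not give the exact modified statistics, so the function below is only an assumed, simplified form of a fold change (log2 ratio of a central value between two groups), and the expression values are invented.

```python
import math
import statistics

def log2_fold_change(treated, control, center=statistics.mean):
    """Log2 ratio of the chosen central value in treated vs. control samples."""
    return math.log2(center(treated) / center(control))

# Hypothetical expression values for one gene; the last treated value is an outlier.
control = [100.0, 110.0, 95.0, 105.0, 90.0]
treated = [102.0, 98.0, 108.0, 96.0, 1000.0]

mean_fc = log2_fold_change(treated, control, center=statistics.mean)
median_fc = log2_fold_change(treated, control, center=statistics.median)
```

With these values the mean-based fold change is pulled well above a two-fold change by the single outlier, while the median-based version stays close to zero, which is the behaviour that makes the median attractive for skewed data.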

40

Brown, Andrew S. „Identification of a phospho-hnRNP E1 Nucleic Acid Consensus Sequence Mediating Epithelial to Mesenchymal Transition“. Kent State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=kent1437943957.

41

MARCEDDU, GIUSEPPE. „Bioinformatics e Biostatistics applied to research in pediatric genetic disease. Clinical evidence in IFNλ4 polymorphisms associated with HCV infection in patients with beta thalassemia and WGCNA analysis weighted for IFNλ4 genotype rs12979860 to detect RPL9P18 as hub in HCV infected cell“. Doctoral thesis, Università degli Studi di Cagliari, 2015. http://hdl.handle.net/11584/266612.

Annotation:

Genome-wide association studies have identified host genetic variation to be critical for spontaneous clearance and treatment response in patients infected with hepatitis C virus (HCV). We demonstrated the same in patients with thalassemia major infected by HCV genotype 1b. In the first part of the present study, we retrospectively analyzed 368 anti-HCV-positive patients with beta-thalassemia in two major Italian thalassemia centers (Cagliari and Turin). The IFNλ4 SNP most strongly associated with HCV was rs12979860, where the C/C genotype was related to response to interferon treatment and, above all, to spontaneous clearance of the virus. However, the positive predictive power was stronger for viral persistence than for spontaneous clearance; indeed, the TT genotype was more predictive than CC. Another polymorphism, rs4803221, was analyzed because its effects were independent of rs12979860. The haplotype tagged by SNPs rs12979860 and rs4803221 could significantly improve the prediction of viral clearance in infected patients. Neither necro-inflammation nor bilirubin values in the chronic phase of hepatitis C were related to IFNλ4 polymorphisms. No relation was found between IFNλ4 polymorphisms and fibrosis stage as shown directly by liver biopsy. The second part of our study aimed to identify hub genes in pathways closely related to IFNλ4 variants in the HCV response. We used the gene expression profile data of GSE54648, downloaded from the Gene Expression Omnibus (GEO). We focused on genes differentially expressed between the unfavorable rs12979860 TT genotype and the favorable CC genotype, using weighted gene co-expression network analysis (WGCNA, an R package). Significant modules were selected using clustering analysis. In the end, the most significant module was the “black” module.
Its pathways were involved in translation mechanisms such as translation termination, translation elongation, nuclear-transcribed mRNA catabolic process, and cellular protein complex disassembly, that is, biological mechanisms that occur inside the ribosome. We discovered the RPL9P18 pseudogene as a hub potentially involved in the inhibition of spontaneous clearance and, furthermore, likely involved in the inhibition of drug treatment. Our result suggests an active role for a ribosomal pseudogene in the innate antiviral response, probably during translation of ISGs (IFN-stimulated genes). Moreover, through co-expression analysis we demonstrate a possible new role of the IFNλ4 genotype in HCV infection, associated with the expression of ribosomal pathways.

42

Abuelqumsan, Mustafa. „Assessment of supervised classification methods for the analysis of RNA-seq data“. Thesis, Aix-Marseille, 2018. http://www.theses.fr/2018AIXM0582/document.

Annotation:

Over the past decade, “Next Generation Sequencing” (NGS) technologies have made it possible to characterize genomic sequences at an unprecedented pace. Many studies have focused on human genetic diversity and on the transcriptome (the part of the genome transcribed into ribonucleic acid). Different tissues of our body express different genes at different moments, enabling cell differentiation and functional response to environmental changes. Since many diseases affect gene expression, transcriptome profiles can be used for medical purposes (diagnosis and prognosis). A wide variety of advanced statistical and machine learning methods have been proposed to address the general problem of classifying individuals according to multiple variables (e.g. transcription level of thousands of genes in hundreds of samples). During my thesis, I led a comparative assessment of machine learning methods and their parameters, to optimize the accuracy of sample classification based on RNA-seq transcriptome profiles.

43

Cui, Lingfei. „A Likelihood Method to Estimate/Detect Gene Flow and A Distance Method to Estimate Species Trees in the Presence of Gene Flow“. The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1406158261.

44

Folch, Fortuny Abel. „Chemometric Approaches for Systems Biology“. Doctoral thesis, Universitat Politècnica de València, 2017. http://hdl.handle.net/10251/77148.

Annotation:

This Ph.D. thesis is devoted to studying, developing and applying approaches commonly used in chemometrics to the emerging field of systems biology. Existing procedures and new methods are applied to solve research and industrial questions in different multidisciplinary teams. The methodologies developed in this document will enrich the plethora of procedures employed within omic sciences to understand biological organisms and will improve processes in biotechnological industries by integrating biological knowledge at different levels and exploiting the software packages derived from the thesis.

This dissertation is structured in four parts. The first block describes the framework on which the contributions presented here are based. The objectives of the two research projects related to this thesis are highlighted, and the specific topics addressed in this document via conference presentations and research articles are introduced. This part gives a comprehensive description of omic sciences and their relationships within the systems biology paradigm, together with a review of the most widely applied multivariate methods in chemometrics, on which the novel approaches proposed here are founded.

The second part addresses several problems of data understanding within metabolomics, fluxomics, proteomics and genomics. Different alternatives are proposed to understand flux data in steady-state conditions. Some are based on multivariate methods previously applied in other areas of chemometrics. Others are novel approaches based on a bilinear decomposition using elemental metabolic pathways, from which a GNU-licensed toolbox is made freely available to the scientific community. In addition, a framework for understanding metabolic data is proposed for non-steady-state data, using the same bilinear decomposition proposed for steady-state data but modelling the dynamics of the experiments with novel two- and three-way data analysis procedures. The relationships between different omic levels are also assessed in this part by integrating different sources of information on plant viruses in data fusion models. Finally, an example of interaction between organisms, oranges and fungi, is studied via multivariate image analysis techniques, with future application in the food industry.

The third block is a thorough study of different missing data problems related to chemometrics, systems biology and industrial bioprocesses. In the theoretical chapters of this part, new algorithms are proposed to obtain multivariate exploratory and regression models in the presence of missing data; these also serve as preprocessing steps for any other methodology used by practitioners. Regarding applications, this block explores the reconstruction of networks in omic sciences when missing and faulty measurements appear in databases, and how calibration models can be transferred between near-infrared instruments, avoiding costly and time-consuming full recalibrations in bioindustries and research laboratories. Finally, another software package, including a graphical user interface, is made freely available for missing data imputation.

The last part discusses the relevance of this dissertation for research and biotechnology, including proposals deserving future research.
Folch Fortuny, A. (2016). Chemometric Approaches for Systems Biology [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/77148
Thesis
Award-winning

45

NI, JIAQIAN. „Plasma Biomarkers for Age-Related Macular Degeneration“. Cleveland State University / OhioLINK, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=csu1236700270.

46

Wang, Xiangxue. „A PROGNOSTIC AND PREDICTIVE COMPUTATIONAL PATHOLOGY BASED COMPANION DIAGNOSTIC APPROACH: PRECISION MEDICINE FOR LUNG CANCER“. Case Western Reserve University School of Graduate Studies / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=case1574125440501667.

47

Tavera, Gloria. „Helicobacter pylori Genetic Variation and Gastric Disease“. Case Western Reserve University School of Graduate Studies / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=case1565176211647636.

48

Seidel, Richard Alan. „Conservation Biology of the Gammarus pecos Species Complex: Ecological Patterns across Aquatic Habitats in an Arid Ecosystem“. Miami University / OhioLINK, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=miami1251472290.

49

Le, Priol Christophe. „Variance de l'expression des microARN et des ARN messagers dans le cancer“. Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAS023.

Annotation:

Most gene expression studies look for genes whose mean expression differs between two or more populations of samples. In this context, the variance is treated as a parameter to be controlled. However, like a difference in mean, a difference in gene expression variance between sample populations may also be biologically and physiologically relevant.

MicroRNAs (miRNAs) are key regulators of gene expression. The large number of their targets and the fine-tuning of their regulation give miRNAs a buffering role. The objective of my thesis is to study the expression variance of miRNAs and messenger RNAs (mRNAs), especially those targeted by miRNAs, during carcinogenesis. We hope that this approach can identify genes that cannot be detected by traditional differential expression analysis and that could serve as potential biomarkers or therapeutic targets. Furthermore, by combining miRNA and mRNA expression and analyzing their variance at a systems level, we aim to better characterize the buffering role of miRNAs.

Several methods, including statistical tests of equality of variance and models based on the negative binomial distribution, were evaluated. The performance of these methods was thoroughly tested on simulated datasets. They were then applied to The Cancer Genome Atlas datasets in order to identify genes with differential expression variance between normal and tumor samples. Many miRNAs and mRNAs with increased expression variance in tumors were detected. Interestingly, among these mRNAs, some key biological functions such as catabolism or autophagy are over-represented in most cancers. Thus, analyzing genes with differential expression variance is relevant for understanding tumor progression and opens a new space for the discovery of potential biomarkers and therapeutic avenues.
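
As an illustration of what a "test of equality of variance" between normal and tumor samples can look like, here is a minimal sketch of the Brown-Forsythe statistic (a median-centered variant of Levene's test). The abstract does not say which tests were evaluated, so this choice is an assumption, and the function name `brown_forsythe_statistic` and the expression values are hypothetical. Only the test statistic is computed; a p-value would require the F distribution with (k - 1, N - k) degrees of freedom.

```python
from statistics import mean, median


def brown_forsythe_statistic(*groups):
    """Return the Brown-Forsythe W statistic for equality of variances.

    Each group is a sequence of expression values for one gene in one
    sample population. Larger W means stronger evidence that the groups
    differ in spread.
    """
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two groups")
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of every observation from its own group median.
    deviations = [[abs(x - median(g)) for x in g] for g in groups]
    group_means = [mean(z) for z in deviations]
    grand_mean = sum(sum(z) for z in deviations) / n_total

    # One-way ANOVA decomposition applied to the absolute deviations.
    between = sum(
        len(z) * (m - grand_mean) ** 2 for z, m in zip(deviations, group_means)
    )
    within = sum(
        (v - m) ** 2 for z, m in zip(deviations, group_means) for v in z
    )
    if within == 0:
        raise ValueError("no within-group variation in the deviations")
    return (n_total - k) / (k - 1) * between / within


# Hypothetical log-expression values for one gene (not data from the thesis).
normal = [5.0, 6.0, 7.0, 8.0, 9.0]
tumor = [0.0, 4.0, 7.0, 10.0, 14.0]
w = brown_forsythe_statistic(normal, tumor)
print(f"W = {w:.3f}")
```

Both groups here have the same median, so a comparison of means or medians would see no difference, while W is clearly positive because the tumor values are more dispersed.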

50

Biton, Anne. „Analyse en composantes indépendantes du transcriptome de cancers“. Thesis, Paris 11, 2011. http://www.theses.fr/2011PA11T028.

Annotation:

The practice of gene expression data analysis shows that it is advantageous to analyze biological processes in terms of modules rather than simply considering genes one by one. In this project, we conducted an unsupervised analysis of the gene expression data of several cohorts of urothelial tumors, applying the Independent Component Analysis (ICA) method. Several studies have demonstrated that ICA outperforms PCA and clustering-based methods in obtaining a more realistic decomposition of the expression data into consistent patterns of coexpressed genes associated with the studied phenotypes [1, 2, 3, 4].

Urothelial tumors arise and evolve through two distinct pathways which differ radically in their probability of progression to muscle invasion. Except for the mutation of FGFR3 in the less aggressive group, the underlying molecular processes have not been completely identified. The first and main objective of the PhD thesis was the biological interpretation of the different independent components, to help confirm and extend the list of biological processes known to be involved in bladder cancer.

Each independent component (IC) is characterized by a list of gene projections on the one hand and weighted contributions of tumor samples on the other. By combining biological expertise with comparison of the associated gene lists to known pathways, and by jointly studying the association of the components with molecular and clinical annotations, we were able to differentiate components caused by technical factors, such as surgical sampling, from those having a consistent interpretation in terms of tumor biology. Moreover, among the biologically meaningful signals, this analysis allowed us to differentiate signals from the tumor stroma, such as the immune response mediated by B and T lymphocytes, from signals produced by the tumors themselves, such as those related to proliferation or differentiation. The clustering of the tumor samples according to their contributions to some ICs could be closely associated with anatomo-clinical annotations and highlighted potential new tumor subtypes, which suggest the existence of additional pathways of bladder cancer progression. Similarly, the study of the contributions of pre-established groups of tumors based on clinical or molecular criteria showed different levels of stromal contamination between FGFR3 non-mutated and mutated tumors. The reproducibility of the components was investigated using correlation graphs. Most of the interpreted ICs were validated on three independent bladder cancer datasets, and several of them were also identified in a urothelial cancer cell line dataset.

A second study, on retinoblastoma, gave us the occasion to show that ICA can be useful in various contexts. Retinoblastoma is initiated by the loss of both alleles of the RB1 tumor suppressor gene. Although this loss is necessary for initiation, other genomic events, which remain to be identified, are needed for the progression of the disease [5]. We observed, as previously described [6], an association between the age of the patients and the level of genomic aberrations: younger patients have fewer alterations than older patients, who generally present gain of 1q and loss of 16q. We showed that this tendency of the tumors to cluster into two age groups can also be observed in the expression data by applying ICA, one of whose components was highly correlated with the age of the patients. These results suggest the existence of two pathways of progression in retinoblastoma.

The analysis of high-throughput data provides many lists of genes. To interpret them, one possibility is to study the latest related publications grouped by predefined topics (function, cellular location, ...). To that aim, in a third study, we introduced a web-based Java application named GeneValorization, which gives a clear and handy overview of the bibliography corresponding to a particular gene list [7].
