Dissertations / Theses on the topic 'Omics data analysis'
Consult the top 50 dissertations / theses for your research on the topic 'Omics data analysis.'
MASPERO, DAVIDE. "Computational strategies to dissect the heterogeneity of multicellular systems via multiscale modelling and omics data analysis." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2022. http://hdl.handle.net/10281/368331.
Heterogeneity pervades biological systems and manifests itself in the structural and functional differences observed both among different individuals of the same group (e.g., organisms or disease systems) and among the constituent elements of a single individual (e.g., cells). The study of the heterogeneity of biological systems and, in particular, of multicellular systems is fundamental for the mechanistic understanding of complex physiological and pathological phenomena (e.g., cancer), as well as for the definition of effective prognostic, diagnostic, and therapeutic strategies. This work focuses on developing and applying computational methods and mathematical models for characterising the heterogeneity of multicellular systems and, especially, cancer cell subpopulations underlying the evolution of neoplastic pathology. Similar methodologies have been developed to characterise viral evolution and heterogeneity effectively. The research is divided into two complementary portions, the first aimed at defining methods for the analysis and integration of omics data generated by sequencing experiments, the second at modelling and multiscale simulation of multicellular systems. Regarding the first strand, next-generation sequencing technologies allow us to generate vast amounts of omics data, for example, related to the genome or transcriptome of a given individual, through bulk or single-cell sequencing experiments. One of the main challenges in computer science is to define computational methods to extract useful information from such data, taking into account the high levels of data-specific errors, mainly due to technological limitations. In particular, in the context of this work, we focused on developing methods for the analysis of gene expression and genomic mutation data. In detail, an exhaustive comparison of machine-learning methods for denoising and imputation of single-cell RNA-sequencing data has been performed.
Moreover, methods for mapping expression profiles onto metabolic networks have been developed through an innovative framework that has allowed one to stratify cancer patients according to their metabolism. A subsequent extension of the method allowed us to analyse the distribution of metabolic fluxes within a population of cells via a flux balance analysis approach. Regarding the analysis of mutational profiles, the first method for reconstructing phylogenomic models from longitudinal data at single-cell resolution has been designed and implemented, exploiting a framework that combines a Markov Chain Monte Carlo with a novel weighted likelihood function. Similarly, a framework that exploits low-frequency mutation profiles to reconstruct robust phylogenies and likely chains of infection has been developed by analysing sequencing data from viral samples. The same mutational profiles also allow us to deconvolve the signal in the signatures associated with specific molecular mechanisms that generate such mutations through an approach based on non-negative matrix factorisation. The research conducted with regard to the computational simulation has led to the development of a multiscale model, in which the simulation of cell population dynamics, represented through a Cellular Potts Model, is coupled to the optimisation of a metabolic model associated with each synthetic cell. Using this model, it is possible to represent assumptions in mathematical terms and observe properties emerging from these assumptions. Finally, we present a first attempt to combine the two methodological approaches which led to the integration of single-cell RNA-seq data within the multiscale model, allowing data-driven hypotheses to be formulated on the emerging properties of the system.
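The signature deconvolution step mentioned in this abstract rests on non-negative matrix factorisation (NMF). As a rough illustration only, the sketch below factorises a small count matrix V (mutation categories by samples) into a signature matrix W and an exposure matrix H. The toy matrix, the function names, and the choice of Lee-Seung multiplicative updates are assumptions made for this example; the abstract does not say which NMF variant the thesis uses.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorise V (n x m) into non-negative W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical counts: 4 mutation categories (rows) in 3 samples (columns).
V = [[1, 2, 0],
     [2, 4, 0],
     [0, 1, 4],
     [0, 3, 12]]
W, H = nmf(V, rank=2)  # columns of W: signatures; rows of H: exposures
```

Because the updates only multiply non-negative quantities, W and H stay non-negative throughout, which is what makes the columns of W readable as additive signatures.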
Wang, Zhi. "Module-Based Analysis for "Omics" Data." Thesis, North Carolina State University, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3690212.
This thesis focuses on methodologies and applications of module-based analysis (MBA) in omics studies to investigate the relationships of phenotypes and biomarkers, e.g., SNPs, genes, and metabolites. As an alternative to traditional single-biomarker approaches, MBA may increase the detectability and reproducibility of results because biomarkers tend to have moderate individual effects but a significant aggregate effect; it may improve the interpretability of findings and facilitate the construction of follow-up biological hypotheses because MBA assesses biomarker effects in a functional context, e.g., pathways and biological processes. Finally, for exploratory “omics” studies, which usually begin with a full scan of a long list of candidate biomarkers, MBA provides a natural way to reduce the total number of tests, and hence relax the multiple-testing burden and improve power.
The first MBA project focuses on genetic association analysis that assesses the main and interaction effects for sets of genetic (G) and environmental (E) factors rather than for individual factors. We develop a kernel machine regression approach to evaluate the complete effect profile (i.e., the G, E, and G-by-E interaction effects separately or in combination) and construct a kernel function for the Gene-Environmental (GE) interaction directly from the genetic kernel and the environmental kernel. We use simulation studies and real data applications to show improved performance of the Kernel Machine (KM) regression method over the commonly adopted PC regression methods across a wide range of scenarios. The largest gain in power occurs when the underlying effect structure involves complex GE interactions, suggesting that the proposed method could be a useful and powerful tool for performing exploratory or confirmatory analyses in GxE-GWAS.
In the second MBA project, we extend the kernel machine framework developed in the first project to model biomarkers with network structure. A network summarizes the functional interplay among biological units; incorporating network information can more precisely model the biological effects, enhance the ability to detect true signals, and facilitate our understanding of the underlying biological mechanisms. In this work, we develop two kernel functions to capture different network structure information. Through simulations and a metabolomics study, we show that the proposed network-based methods can have markedly improved power over approaches that ignore network information.
Metabolites are the end products of cellular processes and reflect the ultimate responses of a biological system to genetic variations or environmental exposures. Because of the unique properties of metabolites, pharmacometabolomics aims to understand the underlying signatures that contribute to individual variations in drug responses and to identify biomarkers that can help predict response. To facilitate mining pharmacometabolomic data, we establish an MBA pipeline that has great practical value in the detection and interpretation of signatures, which may potentially indicate a functional basis for the drug response. We illustrate the utility of the pipeline by investigating two scientific questions in an aspirin study: (1) which metabolite changes can be attributed to aspirin intake, and (2) which metabolic signatures can help predict aspirin resistance. Results show that the MBA pipeline enables us to identify metabolic signatures that are not found in preliminary single-metabolite analysis.
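The first project in this abstract builds a gene-environment interaction kernel directly from a genetic kernel and an environmental kernel. One common way to do this, shown here purely as an assumed illustration and not as the thesis's stated construction, is the element-wise (Hadamard) product of the two kernel matrices. With linear kernels, that product corresponds to a feature space made of all pairwise products of genetic and environmental variables, that is, the interaction terms. The data and function names below are hypothetical.

```python
def linear_kernel(rows):
    """Gram matrix of inner products between all pairs of samples."""
    return [[sum(a * b for a, b in zip(x, y)) for y in rows] for x in rows]


def hadamard(k1, k2):
    """Element-wise product of two equally sized kernel matrices."""
    return [[a * b for a, b in zip(r1, r2)] for r1, r2 in zip(k1, k2)]


# Hypothetical data for 3 individuals: 2 SNP genotypes and 1 exposure each.
genotypes = [[0, 1], [1, 2], [2, 0]]
exposures = [[1.0], [0.5], [2.0]]

K_G = linear_kernel(genotypes)   # genetic similarity
K_E = linear_kernel(exposures)   # environmental similarity
K_GE = hadamard(K_G, K_E)        # assumed G-by-E interaction kernel
```

By the Schur product theorem, the element-wise product of two positive semi-definite matrices is itself positive semi-definite, so K_GE is a valid kernel whenever K_G and K_E are.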
Zheng, Ning. "Mediation modeling and analysis for high-throughput omics data." Thesis, Uppsala universitet, Statistiska institutionen, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-256318.
Campanella, Gianluca. "Statistical analysis of '-omics' data : developments and applications." Thesis, Imperial College London, 2015. http://hdl.handle.net/10044/1/32109.
Budimir, Iva <1992>. "Stochastic Modeling and Correlation Analysis of Omics Data." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amsdottorato.unibo.it/9792/1/Budimir_Iva_tesi.pdf.
Kim, Jieun. "Computational tools for the integrative analysis of muti-omics data to decipher trans-omics networks." Thesis, The University of Sydney, 2022. https://hdl.handle.net/2123/28524.
Ding, Hao. "Visualization and Integrative analysis of cancer multi-omics data." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1467843712.
Castleberry, Alissa. "Integrated Analysis of Multi-Omics Data Using Sparse Canonical Correlation Analysis." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu15544898045976.
Tellaroli, Paola. "Three topics in omics research." Doctoral thesis, Università degli studi di Padova, 2015. http://hdl.handle.net/11577/3423912.
The rather generic title of this thesis reflects the fact that several aspects of biological phenomena were investigated. Most of the work was devoted to exploring the limits of one of the essential tools for analysing gene expression data: cluster analysis. With several hundred clustering methods in existence, there is clearly no shortage of cluster analysis algorithms, yet some fundamental questions have still not received satisfactory answers. In particular, we present a new cluster analysis algorithm for static data and a new strategy for clustering short time-series data. Finally, we analysed data from a relatively new technology called Cap Analysis Gene Expression, which is useful for genome-wide promoter analysis and is still largely unexplored.
Ayati, Marzieh. "Algorithms to Integrate Omics Data for Personalized Medicine." Case Western Reserve University School of Graduate Studies / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=case1527679638507616.
Konrad, Attila. "Investigation of Pathway Analysis Tools for mapping omics data to pathways." Thesis, Malmö högskola, Fakulteten för teknik och samhälle (TS), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20843.
This thesis examines pathway analysis tools (PATs) from a multidisciplinary view. Many PATs exist today, each analysing a specific type of omics data; we therefore investigate them and what they can do. By defining specific requirements, such as how many omics data types a tool can handle, the accuracy of each PAT can be assessed in order to find the most suitable PAT for mapping omics data to pathways. Results show that no PAT found today fulfills the specific set of requirements or the main goal through software testing. The Ingenuity PAT comes closest to fulfilling the requirements. At the request of the end user, two PATs were tested in combination to see whether together they could fulfill the end user's requirements. Uniprot batch converter was tested with FEvER, and the results were not successful, since the combination of the two PATs is no better than the Ingenuity PAT. Focus then turned to an alternative combination, a website called NCBI that has search engines connected to several freely available PATs, thus fulfilling the requirements. Through the search engine, “omics” data can be combined and more than one input can be taken at a time. Since technology is moving forward rapidly, the need for new tools for data interpretation also grows. This means that in the near future we may be able to find a PAT that fulfills the requirements of the end users.
Lu, Yingzhou. "Multi-omics Data Integration for Identifying Disease Specific Biological Pathways." Thesis, Virginia Tech, 2018. http://hdl.handle.net/10919/83467.
Master of Science
Lin, Yingxin. "Statistical modelling and machine learning for single cell data harmonisation and analysis." Thesis, The University of Sydney, 2022. https://hdl.handle.net/2123/28034.
Eichner, Johannes [Verfasser]. "Machine learning and statistical methods for preclinical omics data analysis / Johannes Eichner." München : Verlag Dr. Hut, 2015. http://d-nb.info/1079768874/34.
Barcelona, Cabeza Rosa. "Genomics tools in the cloud: the new frontier in omics data analysis." Doctoral thesis, Universitat Politècnica de Catalunya, 2021. http://hdl.handle.net/10803/672757.
Technological advances in next-generation sequencing (NGS) have revolutionised the field of genomics. The increase in speed and throughput of NGS technologies in recent years, together with the reduction in their cost, has made it possible to interrogate the human genome base by base in an efficient and affordable way. All these advances have increased the use of NGS technologies in clinical practice for identifying genomic variations and their relationship with certain diseases. However, the accessibility, processing, and interpretation of the data still need to be improved because of the enormous amount of data generated and the large number of tools available to process them. In addition to the large number of algorithms available for variant discovery, each type of variation and of data requires a specific algorithm. Solid training in bioinformatics is therefore required both to select the most suitable algorithm and to run it correctly. On that basis, the aim of this project is to make the processing of sequencing data for variant identification and interpretation easier for non-bioinformaticians. This is done by creating high-performance workflows with a solid scientific basis that remain accessible and easy to use, together with a simple and highly intuitive platform for data interpretation. An exhaustive literature review was carried out, in which the best algorithms were selected for building automated workflows for the discovery of germline short variants (SNPs and indels) and germline structural variants (SVs), including both CNVs and chromosomal rearrangements, in modern human DNA.
In addition to creating workflows for variant discovery, a workflow was implemented for the in silico optimisation of CNV detection from WES and TS data (isoCNV). This optimisation was shown to increase detection sensitivity using only NGS data, which is especially important for clinical diagnosis. Furthermore, a workflow was developed for variant discovery through the integration of WES and RNA-seq data (varRED), which was shown to increase the number of variants detected over those identified when only WES data are used. It is important to note that variant identification matters not only for modern populations: the study of variation in ancient genomes is essential for understanding human evolution. For this reason, a workflow was implemented for identifying short variants from ancient WGS samples. This workflow was applied to a human mandible dated to 16980-16510 BC. The ancestral variants discovered there were reported without further interpretation because of the low coverage of the sample. Finally, GINO was implemented to facilitate the interpretation of the variants identified by the workflows developed in this thesis. GINO is an easy-to-use platform for the visualisation and interpretation of germline variants that requires a user licence. This thesis has delivered the tools needed for the high-throughput identification of all types of germline variants, as well as a powerful platform for visualising these variants simply and quickly. Using this platform allows non-bioinformaticians to focus on interpreting the results without having to worry about data processing, with the assurance that the results are scientifically robust.
In addition, it has laid the foundations for implementing, in the near future, a platform for the complete analysis and visualisation of genomic data.
Bioinformatics
Wolf, Beat [Verfasser], and Thomas [Gutachter] Dandekar. "Reducing the complexity of OMICS data analysis / Beat Wolf ; Gutachter: Thomas Dandekar." Würzburg : Universität Würzburg, 2017. http://d-nb.info/1142114295/34.
Schurmann, Claudia [Verfasser]. "Analysis and Integration of Complex Omics Data of the SHIP Study / Claudia Schurmann." Greifswald : Universitätsbibliothek Greifswald, 2013. http://d-nb.info/1042077789/34.
Hernández, de Diego Rafael. "Development of bioinformatics resources for the integrative analysis of Next Generation omics data." Doctoral thesis, Universitat Politècnica de València, 2017. http://hdl.handle.net/10251/91227.
Advances in sequencing techniques and falling technology costs have encouraged the development and popularisation of a new range of genomic research disciplines, known as "omics". These technologies can simultaneously measure thousands of molecules essential for life, such as DNA, RNA, proteins, and metabolites. Historically, classical genomic research has followed a reductionist approach, studying the structure, regulation, and function of these molecules independently. However, the reductionist method cannot explain many of the biological phenomena that take place in a living system, which suggests that the essence of the system cannot be explained simply by enumerating its component elements but lies in the dynamics of the biological processes that occur among them. Systems Biology (SB) has become established in recent years as the multidisciplinary research area that seeks to model the dynamic behaviour of biological systems through the holistic study of the interactions among their parts, combining simultaneous measurements of different types of molecules and integrating multiple sources of information to identify the components that change in a coordinated manner under the conditions studied. SB is an interdisciplinary area that requires biologists, mathematicians, biochemists, and other researchers to work in close collaboration, and in which computing plays a fundamental role given the volume and complexity of the data. This thesis addresses the problem of managing, integrating, and analysing data in multi-omics studies. More specifically, the research focuses on two of the most characteristic computational challenges of SB: the development of integrative databases and the problem of integrative visualisation.
Thus, the first part of this work was devoted to the design and creation of a bioinformatics resource for managing multi-omics experiments. The resulting platform (STATegra EMS) offers a set of tools that facilitate the storage and organisation of the large data sets generated during the experiments, as well as the annotation of the subsequent processing and analysis steps. The heterogeneity, volume, and variability of the data are some of the obstacles addressed during the development of STATegra EMS, with the aim of keeping a detailed record of the meta-information that makes it possible to distinguish each data set and so achieve a successful integration of the information. To this end, the platform offers a collaborative, easy-to-use web interface that combines modern web technologies with well-known community standards for representing the different components of the experiment. The second part of this thesis discusses the current state and the difficulties of integrative visualisation of multi-omics data and presents the tool developed, PaintOmics 3. Integrative visualisation combined with data analysis techniques is probably one of the most powerful tools for interpreting and validating results in SB. PaintOmics 3 provides a complete framework for performing biological function enrichment analysis in experiments with multiple conditions and data types, combining powerful tools for integrative visualisation on KEGG molecular interaction diagrams and reaction networks, interaction networks of biological processes, and statistical analyses of the data. Moreover, unlike other tools, PaintOmics 3 stands out for its ease of use and interactivity, as well as for its flexibility and the variety of data it accepts.
Hernández De Diego, R. (2017). Development of bioinformatics resources for the integrative analysis of Next Generation omics data [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/91227
Cao, Yingying [Verfasser], and Daniel [Akademischer Betreuer] Hoffmann. "Computational analysis and interpretation of multi-omics data / Yingying Cao ; Betreuer: Daniel Hoffmann." Duisburg, 2021. http://d-nb.info/1234911124/34.
Faria, Do Valle Italo <1990>. "New Approaches for the Molecular Profiling of Human Cancers through Omics Data Analysis." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amsdottorato.unibo.it/7997/1/FariaDoValle_Italo_tesi.pdf.
Fatai, Azeez Ayomide. "Computational analysis of multilevel omics data for the elucidation of molecular mechanisms of cancer." University of the Western Cape, 2015. http://hdl.handle.net/11394/4782.
Cancer is a group of diseases that arises from irreversible genomic and epigenomic alterations that result in unrestrained proliferation of abnormal cells. Detailed understanding of the molecular mechanisms underlying a cancer would aid the identification of most, if not all, genes responsible for its progression and the development of molecularly targeted chemotherapy. The challenge of recurrence after treatment shows that our understanding of cancer mechanisms is still poor. As a contribution to overcoming this challenge, we provide an integrative multi-omic analysis of glioblastoma multiforme (GBM), for which large data sets on different classes of genomic and epigenomic alterations have been made available in the Cancer Genome Atlas data portal. The first part of this study involves protein network analysis for the elucidation of GBM tumourigenic molecular mechanisms, identification of driver genes, prioritization of genes in chromosomal regions with copy number alteration, and co-expression and transcriptional analysis. Functional modules were obtained by edge-betweenness clustering of a protein network constructed from genes with predicted functional impact mutations and differentially expressed genes. Pathway enrichment analysis was performed on each module to identify statistical overrepresentation of signaling pathways. Known and novel candidate cancer driver genes were identified in the modules, and functionally relevant genes in chromosomal regions altered by homologous deletion or high-level amplification were prioritized with the protein network. Co-expressed modules enriched in cancer biological processes and transcription factor targets were identified using network genes that demonstrated high expression variance. Our findings show that GBM's molecular mechanisms are much more complex than those reported in previous studies.
We next identified differentially expressed miRNAs for which target genes associated with the protein network were also differentially expressed. MiRNAs and target genes were prioritized based on the number of targeted genes and targeting miRNAs, respectively. MiRNAs that correlated with time to progression were selected by an elastic net-penalized Cox regression model for survival analysis. These miRNAs were combined into a signature that independently predicted adjuvant therapy-linked progression-free survival in GBM and its subtypes and overall survival in GBM. The results show that miRNAs play significant roles in GBM progression and patients' survival. Finally, a prognostic mRNA signature that independently predicted progression-free and overall survival was identified. Pathway enrichment analysis was carried out on genes with high expression variance across a cohort to identify those in chemoradioresistance-associated pathways. A support vector machine-based method was then used to identify a set of genes that discriminated between rapidly and slowly progressing GBM patients, with a minimal cross-validation error rate of 5%. The prognostic value of the gene set was demonstrated by its ability to predict adjuvant therapy-linked progression-free and overall survival in GBM and its subtypes and was validated in an independent data set. We have identified a set of genes involved in tumourigenic mechanisms that could potentially be exploited as targets in drug development for the treatment of primary and recurrent GBM. Furthermore, given their demonstrated accuracy in this study, the identified miRNA and mRNA signatures have strong potential to be combined and developed into a robust clinical test for predicting prognosis and treatment response.
Zuo, Yiming. "Differential Network Analysis based on Omic Data for Cancer Biomarker Discovery." Diss., Virginia Tech, 2017. http://hdl.handle.net/10919/78217.
Ph. D.
Monteiro, Martins Sara [Verfasser]. "Bioinformatics analysis of multi-omics data elucidates U2 snRNP function in transcription / Sara Monteiro Martins." Göttingen : Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2021. http://d-nb.info/1239894643/34.
Bockmayr, Michael [Verfasser]. "Integrative analysis of "omics" data and histopathological features in breast and ovarian cancer / Michael Bockmayr." Berlin : Medizinische Fakultät Charité - Universitätsmedizin Berlin, 2017. http://d-nb.info/1126504262/34.
Elhezzani, Najla Saad R. "New statistical methodologies for improved analysis of genomic and omic data." Thesis, King's College London (University of London), 2018. https://kclpure.kcl.ac.uk/portal/en/theses/new-statistical-methodologies-for-improved-analysis-of-genomic-and-omic-data(eb8d95f4-e926-4c54-984f-94d86306525a).html.
Hafez, Khafaga Ahmed Ibrahem 1987. "Bioinformatics approaches for integration and analysis of fungal omics data oriented to knowledge discovery and diagnosis." Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/671160.
The aim of this thesis has been to develop a series of bioinformatic resources for the analysis of NGS data, proteomics, or other omics technologies in the study and diagnosis of yeast infections. In particular, we have explored and designed distinct computational techniques to identify novel biomarker candidates of resistance traits, to predict features of DNA/RNA sequences, and to optimize sequencing strategies for host-pathogen transcriptome sequencing studies (Dual RNA-seq). We have designed and developed an efficient bioinformatic solution composed of a server-side component, consisting of distinct pipelines for VariantSeq, Denovoseq, and RNAseq analyses, and a second component consisting of GUI-based software that lets the user access, manage, and run the pipelines through user-friendly interfaces. We have also designed and developed SeqEditor, a software tool for sequence analysis and primer design for species identification and detection in PCR diagnosis. We have also developed CandidaMine, an integrated data warehouse of fungal omics for data analysis and knowledge discovery.
Tsai, Tsung-Heng. "Bayesian Alignment Model for Analysis of LC-MS-based Omic Data." Diss., Virginia Tech, 2014. http://hdl.handle.net/10919/64151.
Ph. D.
Ruffalo, Matthew M. "Algorithms for Constructing Features for Integrated Analysis of Disparate Omic Data." Case Western Reserve University School of Graduate Studies / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=case1449238712.
Li, Yichao. "Algorithmic Methods for Multi-Omics Biomarker Discovery." Ohio University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1541609328071533.
Berti, Elisa. "Applicazione del metodo QDanet_PRO alla classificazione di dati omici." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2015. http://amslaurea.unibo.it/9411/.
Bylesjö, Max. "Latent variable based computational methods for applications in life sciences : Analysis and integration of omics data sets." Doctoral thesis, Umeå universitet, Kemi, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-1616.
Functional genomics is a research field whose ultimate goal is to characterise all genes in the genome of an organism. This includes studies of how DNA is transcribed into mRNA, how the mRNA is then translated into proteins, and how these proteins interact and affect the organism's biochemical processes. The traditional approach has been to study the function, regulation and translation of one gene at a time. New technology in the field has, however, made it possible to study how thousands of transcripts, proteins and small molecules behave jointly in an organism at a given time point or over time. In practice, this also means that large amounts of data are generated even from small, isolated experiments. Finding global trends and extracting useful information from such data sets is a non-trivial computational problem that requires advanced and interpretable mathematical models. This thesis describes the development and application of different computational methods for classifying and integrating large amounts of empirical (measured) data. Common to all the methods is that they are based on latent variables: variables that are not measured directly but are calculated from other, observed variables. This concept is well suited to studies of complex systems that can be described by a few independent factors characterising the main properties of the system, which is typical of many chemical and biological systems. The methods described in the thesis are general but were mainly developed for and applied to data from biological experiments. The thesis demonstrates how these methods can be used to find complex relationships between measured data and other factors of interest, without losing the properties of the method that are critical for interpreting the results.
The methods are applied to find shared and unique properties of transcript regulation and how these are affected by, and affect, small molecules in the poplar tree. In addition, a larger experiment in poplar is described in which the relationship between levels of transcripts, proteins and small molecules is investigated with the developed methods.
Bylesjö, Max. "Latent variable based computational methods for applications in life sciences : Analysis and integration of omics data sets /." Umeå : Chemistry Kemi, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-1616.
Ronen, Jonathan. "Integrative analysis of data from multiple experiments." Doctoral thesis, Humboldt-Universität zu Berlin, 2020. http://dx.doi.org/10.18452/21612.
The development of high throughput sequencing (HTS) was followed by a swarm of protocols utilizing HTS to measure different molecular aspects such as gene expression (transcriptome), DNA methylation (methylome) and more. This opened opportunities for developments of data analysis algorithms and procedures that consider data produced by different experiments. Considering data from seemingly unrelated experiments is particularly beneficial for single-cell RNA sequencing (scRNA-seq). scRNA-seq produces particularly noisy data, due to loss of nucleic acids when handling the small amounts in single cells, and various technical biases. To address these challenges, I developed a method called netSmooth, which de-noises and imputes scRNA-seq data by applying network diffusion over a gene network which encodes expectations of co-expression patterns. The gene network is constructed from other experimental data. Using a gene network constructed from protein-protein interactions, I show that netSmooth outperforms other state-of-the-art scRNA-seq imputation methods at the identification of blood cell types in hematopoiesis, as well as elucidation of time series data in an embryonic development dataset, and identification of tumor of origin for scRNA-seq of glioblastomas. netSmooth has a free parameter, the diffusion distance, which I show can be selected using data-driven metrics. Thus, netSmooth may be used even in cases when the diffusion distance cannot be optimized explicitly using ground-truth labels. Another task which requires in-tandem analysis of data from different experiments arises when different omics protocols are applied to the same biological samples. Analyzing such multiomics data in an integrated fashion, rather than each data type (RNA-seq, DNA-seq, etc.) on its own, is beneficial, as each omics experiment only elucidates part of an integrated cellular system. The simultaneous analysis may reveal a comprehensive view.
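The network-diffusion idea in the abstract above can be illustrated with a small sketch. It uses one common formulation, an iterated random walk with restart in which each gene repeatedly takes a share of its neighbours' mean value, and is not necessarily the exact formulation used by netSmooth. The function name `smooth_expression`, the parameter `alpha` (standing in for the diffusion distance) and the toy data are all hypothetical.

```python
def smooth_expression(adjacency, expression, alpha=0.5, n_iter=50):
    """Diffuse one cell's expression values over a gene network.

    adjacency:  dict mapping gene -> set of neighbouring genes (undirected).
    expression: dict mapping gene -> measured expression in one cell.
    alpha:      share of each smoothed value taken from the neighbours
                (assumed stand-in for the "diffusion distance").
    """
    current = dict(expression)
    for _ in range(n_iter):
        updated = {}
        for gene in expression:
            neighbours = [n for n in adjacency.get(gene, ()) if n in expression]
            if neighbours:
                neighbour_mean = sum(current[n] for n in neighbours) / len(neighbours)
            else:
                # Isolated genes have nothing to borrow from.
                neighbour_mean = current[gene]
            # Restart term keeps the result anchored to the measured value.
            updated[gene] = alpha * neighbour_mean + (1 - alpha) * expression[gene]
        current = updated
    return current


# Hypothetical toy data: gene A is a dropout (measured as 0) but is
# connected to two expressed genes; gene D has no neighbours.
network = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": set()}
cell = {"A": 0.0, "B": 4.0, "C": 6.0, "D": 2.0}
smoothed = smooth_expression(network, cell)
```

In this toy case the dropout gene A receives a positive imputed value from its neighbours, while the isolated gene D is left unchanged; setting `alpha=0.0` returns the measured values as they are.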
Gomari, Daniel Parviz [Verfasser], Jan [Akademischer Betreuer] Krumsiek, Karsten [Gutachter] Suhre, and Jan [Gutachter] Krumsiek. "Novel network-based methods for multi-omics data analysis and interpretation / Daniel Parviz Gomari ; Gutachter: Karsten Suhre, Jan Krumsiek ; Betreuer: Jan Krumsiek." München : Universitätsbibliothek der TU München, 2021. http://d-nb.info/1235664775/34.
Meng, Chen [Verfasser], Bernhard [Akademischer Betreuer] Küster, and Dmitrij [Akademischer Betreuer] Frischmann. "Application of multivariate methods to the integrative analysis of high-throughput omics data / Chen Meng. Betreuer: Bernhard Küster. Gutachter: Bernhard Küster ; Dmitrij Frischmann." München : Universitätsbibliothek der TU München, 2016. http://d-nb.info/1082347299/34.
Denecker, Thomas. "Bioinformatique et analyse de données multiomiques : principes et applications chez les levures pathogènes Candida glabrata et Candida albicans Functional networks of co-expressed genes to explore iron homeostasis processes in the pathogenic yeast Candida glabrata Efficient, quick and easy-to-use DNA replication timing analysis with START-R suite FAIR_Bioinfo: a turnkey training course and protocol for reproducible computational biology Label-free quantitative proteomics in Candida yeast species: technical and biological replicates to assess data reproducibility Rendre ses projets R plus accessibles grâce à Shiny Pixel: a content management platform for quantitative omics data Empowering the detection of ChIP-seq "basic peaks" (bPeaks) in small eukaryotic genomes with a web user-interactive interface A hypothesis-driven approach identifies CDK4 and CDK6 inhibitors as candidate drugs for treatments of adrenocortical carcinomas Characterization of the replication timing program of 6 human model cell lines." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASL010.
Biological research is changing. First, studies are often based on quantitative experimental approaches. The analysis and interpretation of the results obtained thus require computer science and statistics. Also, alongside studies focused on isolated biological objects, high-throughput experimental technologies make it possible to capture the functioning of biological systems (identification of components as well as the interactions between them). Very large amounts of data are also available in public databases, freely reusable to address new open questions. Finally, the data in biological research are heterogeneous (digital data, texts, images, biological sequences, etc.) and stored on multiple media (paper or digital). Thus, "data analysis" has gradually emerged as a key research issue, and in only ten years the field of "Bioinformatics" has changed significantly. Having a large amount of data to answer a biological question is often not the main challenge. The real challenge is the ability of researchers to convert the data into information and then into knowledge. In this context, several biological research projects were addressed in this thesis. The first concerns the study of iron homeostasis in the pathogenic yeast Candida glabrata. The second concerns the systematic investigation of post-translational modifications of proteins in the pathogenic yeast Candida albicans. In these two projects, omics data were used: transcriptomics and proteomics. Appropriate bioinformatics and analysis tools were developed, leading to the emergence of new research hypotheses. Particular and constant attention has also been paid to the question of data reproducibility and the sharing of results with the scientific community.
Fonseca, Renata Santana. "Modelos de sobrevivência com fração de cura e omissão nas covariáveis." Universidade Federal do Rio Grande do Norte, 2009. http://repositorio.ufrn.br:8080/jspui/handle/123456789/17004.
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
In this work we study the survival cure rate model proposed by Yakovlev (1993), which is formulated in a competing risks setting. Covariates are introduced for modeling the cure rate, and we allow some covariates to have missing values. We consider only the cases in which the missing covariates are categorical and implement the EM algorithm via the method of weights for maximum likelihood estimation. We present a Monte Carlo simulation experiment to compare the properties of the estimators based on this method with those of the estimators under the complete-case scenario. We also evaluate, in this experiment, the impact on the parameter estimates of increasing the proportion of immune individuals and of censored individuals among the non-immune ones. We demonstrate the proposed methodology with a real data set involving the time until graduation for the undergraduate course in Statistics at the Universidade Federal do Rio Grande do Norte.
In this work we study the survival model with cure fraction proposed by Yakovlev et al. (1993), which has a competing risks structure. Covariates are introduced to model the mean number of risks, and we allow some of these covariates to have missing values. We consider only the cases in which the missing covariates are categorical, and the parameter estimates are obtained through the weighted EM algorithm. We present a series of simulations to compare the estimates obtained by this method with those obtained when the observations with missing values are excluded from the data set, known as complete-case analysis. We also evaluate, through simulations, the impact on the parameter estimates when the percentage of cured individuals and of censoring among non-cured individuals is increased. A real data set on the time to completion of the Statistics course at the Universidade Federal do Rio Grande do Norte is used to illustrate the method.
Voillet, Valentin. "Approche intégrative du développement musculaire afin de décrire le processus de maturation en lien avec la survie néonatale." Thesis, Toulouse, INPT, 2016. http://www.theses.fr/2016INPT0067/document.
Over the last decades, omics data integration studies have been developed to contribute to the detailed description of complex traits of socio-economic interest. In this context, the aim of the thesis is to combine different heterogeneous omics data to better describe and understand the last third of gestation in pigs, a period that influences piglet mortality at birth. In the thesis, we better defined the molecular and cellular basis underlying the end of gestation, with a focus on skeletal muscle. This tissue is especially involved in the efficiency of several physiological functions, such as thermoregulation and motor functions. According to the experimental design, tissues were collected at two gestational ages (90 or 110 days of gestation) from four fetal genotypes. These genotypes consisted of two extreme breeds for mortality at birth (Meishan and Large White) and two reciprocal crosses. Through statistical and computational analyses (descriptive analyses, network inference, clustering and biological data integration), we highlighted some biological mechanisms regulating the maturation process in pigs, but also in other livestock species (cattle and sheep). Some genes and proteins were identified as being highly involved in muscle energy metabolism. Piglets with an immature muscle metabolism would be at higher risk of mortality at birth. A second aspect of the thesis was the imputation of missing individual row values in the framework of multidimensional statistical methods such as multiple factor analysis (MFA). In our context, MFA was particularly useful for integrating data obtained from the same individuals on different tissues (two or more). To handle missing individual row values, we developed a method called MI-MFA (multiple imputation - MFA), which allows the MFA components to be estimated for these missing individuals.
Czerwińska, Urszula. "Unsupervised deconvolution of bulk omics profiles : methodology and application to characterize the immune landscape in tumors Determining the optimal number of independent components for reproducible transcriptomic data analysis Application of independent component analysis to tumor transcriptomes reveals specific and reproducible immune-related signals A multiscale signalling network map of innate immune response in cancer reveals signatures of cell heterogeneity and functional polarization." Thesis, Sorbonne Paris Cité, 2018. http://www.theses.fr/2018USPCB075.
Tumors are engulfed in a complex microenvironment (TME) including tumor cells, fibroblasts, and a diversity of immune cells. Currently, a new generation of cancer therapies based on modulation of the immune system response is in active clinical development with first promising results. Therefore, understanding the composition of TME in each tumor case is critically important to make a prognosis on the tumor progression and its response to treatment. However, we lack reliable and validated quantitative approaches to characterize the TME in order to facilitate the choice of the best existing therapy. One part of this challenge is to be able to quantify the cellular composition of a tumor sample (called deconvolution problem in this context), using its bulk omics profile (global quantitative profiling of certain types of molecules, such as mRNA or epigenetic markers). In recent years, there was a remarkable explosion in the number of methods approaching this problem in several different ways. Most of them use pre-defined molecular signatures of specific cell types and extrapolate this information to previously unseen contexts. This can bias the TME quantification in those situations where the context under study is significantly different from the reference. In theory, under certain assumptions, it is possible to separate complex signal mixtures, using classical and advanced methods of source separation and dimension reduction, without pre-existing source definitions. If such an approach (unsupervised deconvolution) is feasible to apply for bulk omic profiles of tumor samples, then this would make it possible to avoid the above mentioned contextual biases and provide insights into the context-specific signatures of cell types. In this work, I developed a new method called DeconICA (Deconvolution of bulk omics datasets through Immune Component Analysis), based on the blind source separation methodology. 
DeconICA aims to decipher and quantify the biological signals shaping omics profiles of tumor samples or normal tissues. A particular focus of my study was on the immune system-related signals and on discovering new signatures of immune cell types. In order to make my work more accessible, I implemented the DeconICA method as an R package named "DeconICA". By applying this software to the standard benchmark datasets, I demonstrated that DeconICA is able to quantify immune cells with accuracy comparable to published state-of-the-art methods but without defining cell type-specific signature genes a priori. The implementation can work with existing deconvolution methods based on matrix factorization techniques such as Independent Component Analysis (ICA) or Non-Negative Matrix Factorization (NMF). Finally, I applied DeconICA to a large corpus of data containing more than 100 transcriptomic datasets composed of, in total, over 28000 samples of 40 tumor types generated by different technologies and processed independently. This analysis demonstrated that ICA-based immune signals are reproducible between datasets and that three major immune cell types (T-cells, B-cells and myeloid cells) can be reliably identified and quantified. Additionally, I used the ICA-derived metagenes as context-specific signatures in order to study the characteristics of immune cells in different tumor types. The analysis revealed a large diversity and plasticity of immune cells, both dependent on and independent of tumor type. Some conclusions of the study can be helpful in the identification of new drug targets or biomarkers for cancer immunotherapy.
Abily-Donval, Lénaïg. "Exploration des mécanismes physiopathologiques des mucopolysacharidoses et de la maladie de Fabry par approches "omiques" et modulation de l'autophagie. Urinary metabolic phenotyping of mucopolysaccharidosis type I combining untargeted and targeted strategies with data modeling Unveiling metabolic remodeling in mucopolysaccharidosis type III through integrative metabolomics and pathway analysis." Thesis, Normandie, 2019. http://www.theses.fr/2019NORMR108.
Lysosomal diseases, caused by quantitative or qualitative defects in hydrolases or transporters, induce multiorgan features. Some specific symptomatic treatments are available, but they do not cure patients. The pathophysiological bases of lysosomal diseases are poorly understood and cannot be attributed to storage alone. A better knowledge of these pathologies could improve their management. The first aim of this study was to apply "omics" strategies in mucopolysaccharidosis and in Fabry disease. This thesis allowed the implementation of an untargeted metabolomic methodology based on a multidimensional analytical strategy including high-resolution mass spectrometry coupled with ultra-high-performance liquid chromatography and ion mobility. Analysis of metabolic pathways showed a major remodeling of amino acid metabolism as well as oxidative stress via glutathione metabolism. In Fabry disease, changes were observed in the expression of interleukin 7 and FGF2. The second study focused on the modulation of autophagy in Fabry disease. In this work, we have shown a disruption of the autophagic process and a delay in enzyme targeting to the lysosome in Fabry disease. Autophagic inhibition reduced accumulation of the substrate (Gb3) and improved the efficiency of enzyme replacement therapy. This work provided a better knowledge of the pathophysiological mechanisms implicated in lysosomal diseases and showed the complexity of the lysosome. These data could improve the management of these diseases and offer hope for patients.
Hulot, Audrey. "Analyses de données omiques : clustering et inférence de réseaux Female ponderal index at birth and idiopathic infertility." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASL034.
The development of biological high-throughput technologies (next-generation sequencing and mass spectrometry) has provided researchers with a large amount of data, also known as -omics, that help to better understand biological processes. However, each source of data separately explains only a very small part of a given process. Linking the different -omics sources to one another should help us understand more of these processes. In this manuscript, we focus on two approaches, clustering and network inference, applied to omics data. The first part of the manuscript presents three methodological developments on this topic. The first two methods are applicable in a situation where the data are heterogeneous. The first method is an algorithm for aggregating trees, in order to create a consensus out of a set of trees. The complexity of the process is sub-quadratic, which allows it to be used on data leading to a great number of leaves in the trees. This algorithm is available in an R package named mergeTrees on CRAN. The second method deals with the integration of data from trees and networks, by transforming these objects into distance matrices using cophenetic and shortest path distances, respectively. This method relies on Multidimensional Scaling and Multiple Factor Analysis and can also be used to build consensus trees or networks. Finally, we use the Gaussian Graphical Models setting and seek to estimate a graph, as well as communities in the graph, from several tables. This method is based on a combination of Stochastic Block Model, Latent Block Model and Graphical Lasso. The second part of the manuscript presents analyses conducted on transcriptomics and metagenomics data to identify targets and gain insight into the predisposition to Ankylosing Spondylitis.
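The conversion step mentioned in the abstract above, turning trees and networks into comparable distance matrices, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the tree is given as a hypothetical list of merges `(left_leaves, right_leaves, height)`, the network is unweighted, and the later Multidimensional Scaling and Multiple Factor Analysis steps are not shown.

```python
from collections import deque


def cophenetic_distances(merges):
    """Cophenetic distance of two leaves = height of the merge that first joins them.

    merges: list of (left_leaves, right_leaves, height) tuples, one per
    internal node of the dendrogram (assumed input format).
    """
    dist = {}
    for left, right, height in merges:
        for a in left:
            for b in right:
                dist[frozenset((a, b))] = height
    return dist


def shortest_path_distances(adjacency):
    """Number of edges on the shortest path between every connected pair of nodes."""
    dist = {}
    for source in adjacency:
        seen = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source node
            node = queue.popleft()
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        for target, d in seen.items():
            if target != source:
                dist[frozenset((source, target))] = d
    return dist


# Hypothetical toy objects over the same three items a, b, c.
tree = [({"a"}, {"b"}, 1.0), ({"a", "b"}, {"c"}, 3.0)]
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
tree_dist = cophenetic_distances(tree)
graph_dist = shortest_path_distances(graph)
```

Both results are keyed by unordered pairs of items, so a tree and a network defined on the same items yield tables of the same shape that a downstream multi-table method can take as input.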
Elmansy, Dalia F. "Computational Methods to Characterize the Etiology of Complex Diseases at Multiple Levels." Case Western Reserve University School of Graduate Studies / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=case1583416431321447.
Teng, Sin Yong. "Intelligent Energy-Savings and Process Improvement Strategies in Energy-Intensive Industries." Doctoral thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2020. http://www.nusl.cz/ntk/nusl-433427.
Wolf, Beat. "Reducing the complexity of OMICS data analysis." Doctoral thesis, 2017. https://nbn-resolving.org/urn:nbn:de:bvb:20-opus-153687.
The field of genetics faces many challenges, in both research and diagnostics, owing to next-generation sequencing (NGS), a technology that sequences DNA ever faster and more cheaply. NGS is used to analyse not only DNA but also RNA, a molecule very similar to DNA, and in both cases large amounts of data are produced. The large volume of data creates infrastructure and usability problems, because powerful computing infrastructures are required and the data analysis involves many manual steps that are complicated to carry out. These two problems limit the use of NGS in the clinic and in research, as there is a bottleneck in both computing power and personnel, since geneticists lack the computer skills required for many analyses. In this work we investigated how computer science can help to improve this situation by reducing the complexity of this type of analysis. We looked at how the analysis can be made more accessible in order to increase the number of people who can perform OMICS data analyses (OMICS groups together various genetic data sources). In close collaboration with the Institute of Human Genetics at the University of Würzburg, a graphical NGS data analysis pipeline was created to address this question. The graphical pipeline was developed for diagnostics, but without losing sight of research. The pipeline was therefore used in various research areas, including publications in genomics, transcriptomics and epigenomics with direct author participation. The pipeline was also validated by a user survey, which confirms that our graphical pipeline reduces the complexity of OMICS data analysis. We also investigated how the performance of the data analysis can be improved so that the necessary infrastructure becomes more accessible.
This was addressed both by optimising the available methods (where, for example, variant analysis became up to 18 times faster) and by distributed computing, in order to make better use of existing infrastructure. The improvements were integrated into the graphical pipeline described above, with low resource consumption being a general focus. To support the future development of parallel and distributed applications, whether in genetics or elsewhere, we looked at how to make such applications easier to develop. This led to an important computer science result: based on the model of "parallel object programming" (POP), we developed an extension of the Java language called POP-Java, which allows simple and transparent distribution of objects. Through this development we brought the POP model to the cloud and to Hadoop clusters, and we present a new model for distributed collaborative computing, called FriendComputing. The various published parts of this dissertation are specifically listed and discussed.
Benjamin, Ashlee Marie. "Computational Processing of Omics Data: Implications for Analysis." Diss., 2013. http://hdl.handle.net/10161/8217.
In this work, I present four studies across the range of 'omics data types: a genome-wide association study for gene-by-sex interaction of obesity traits, computational models for transcription start site classification, an assessment of reference-based mapping methods for RNA-Seq data from non-model organisms, and a statistical model for open-platform proteomics data alignment.
Obesity is an increasingly prevalent and severe health concern with a substantial heritable component, and marked sex differences. We sought to determine if the effect of genetic variants also differed by sex by performing a genome-wide association study modeling the effect of genotype-by-sex interaction on obesity phenotypes. Genotype data from individuals in the Framingham Heart Study Offspring cohort were analyzed across five exams. Although no variants showed genome-wide significant gene-by-sex interaction in any individual exam, four polymorphisms displayed a consistent BMI association (P-values .00186 to .00010) across all five exams. These variants were clustered downstream of LYPLAL1, which encodes a lipase/esterase expressed in adipose tissue, a locus previously identified as having sex-specific effects on central obesity. Primary effects in males were in the opposite direction as females and were replicated in Framingham Generation 3. Our data support a sex-influenced association between genetic variation at the LYPLAL1 locus and obesity-related traits.
The application of deep sequencing to map 5' capped transcripts has confirmed the existence of at least two distinct promoter classes in metazoans: focused promoters with transcription start sites (TSSs) that occur in a narrowly defined genomic span and dispersed promoters with TSSs that are spread over a larger window. Previous studies have explored the presence of genomic features, such as CpG islands and sequence motifs, in these promoter classes, and our collaborators recently investigated the relationship with chromatin features. It was found that promoter classes are significantly differentiated by nucleosome organization and chromatin structure. Here, we present computational models supporting the stronger contribution of chromatin features to the definition of dispersed promoters compared to focused start sites. Specifically, dispersed promoters display enrichment for well-positioned nucleosomes downstream of the TSS and a more clearly defined nucleosome free region upstream, while focused promoters have a less organized nucleosome structure, yet higher presence of RNA polymerase II. These differences extend to histone variants (H2A.Z) and marks (H3K4 methylation), as well as insulator binding (such as CTCF), independent of the expression levels of affected genes.
The application of next-generation sequencing technology to gene expression quantification analysis, namely, RNA-Sequencing, has transformed the way in which gene expression studies are conducted and analyzed. These advances are of particular interest to researchers studying non-model organisms, as the need for knowledge of sequence information is overcome. De novo assembly methods have gained widespread acceptance in the RNA-Seq community for non-model organisms with no true reference genome or transcriptome. While such methods have tremendous utility, computational complexity is still a significant challenge for organisms with large and complex genomes. Here we present a comparison of four reference-based mapping methods for non-human primate data. We explore mapping efficacy, correlation between computed expression values, and utility for differential expression analyses. We show that reference-based mapping methods indeed have utility in RNA-Seq analysis of mammalian data with no true reference, and that the details of mapping methods should be carefully considered when doing so. We find that shorter seed sequences, allowance of mismatches, and allowance of gapped alignments, in addition to splice junction gaps, result in more sensitive alignments of non-human primate RNA-Seq data.
Open-platform proteomics experiments seek to quantify and identify the proteins present in biological samples. Much like differential gene expression analyses, it is often of interest to determine how protein abundance differs in various physiological conditions. Label-free LC-MS/MS enables the rapid measurement of thousands of proteins, providing a wealth of peptide intensity information for differential analysis. However, the processing of raw proteomics data poses significant challenges that must be overcome prior to analysis. We specifically address the matching of peptide measurements across samples, an essential pre-processing step in every proteomics experiment. Presented here is a novel method for open-platform proteomics data alignment with the ability to incorporate previously unused aspects of the data, particularly ion mobility drift times and product ion data. Our results suggest that the inclusion of additional data results in higher numbers of more confident matches, without increasing the number of mismatches. We also show that the incorporation of product ion data can improve results dramatically. Based on these results, we argue that the incorporation of ion mobility drift times and product ion information are worthy pursuits. In addition, alignment methods should be flexible enough to utilize all available data, particularly with recent advancements in experimental separation methods. The addition of drift times and/or high energy to alignment methods and accurate mass and time (AMT) tag databases can greatly improve experimenters' ability to identify measured peptides, reducing analysis costs and potentially the need to run additional experiments.
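The benefit of adding ion mobility drift time to cross-sample peptide matching, as argued in the abstract above, can be illustrated with a deliberately simple sketch. It uses plain tolerance-window matching, not the statistical alignment model of the dissertation, and the function `match_features`, the field names `mz`, `rt`, `drift` and all numeric values are hypothetical.

```python
def match_features(sample_a, sample_b, tolerances):
    """Match each feature in sample_a to its closest feature in sample_b.

    A candidate must lie within the tolerance on every dimension listed in
    `tolerances`; among candidates, the smallest summed scaled difference wins.
    Features without any candidate are mapped to None.
    """
    matches = {}
    for name_a, feat_a in sample_a.items():
        best, best_score = None, None
        for name_b, feat_b in sample_b.items():
            scaled = [abs(feat_a[dim] - feat_b[dim]) / tol
                      for dim, tol in tolerances.items()]
            if max(scaled) > 1.0:
                continue  # outside the window on at least one dimension
            score = sum(scaled)
            if best_score is None or score < best_score:
                best, best_score = name_b, score
        matches[name_a] = best
    return matches


# Hypothetical features: q1 is closest in mass and retention time,
# but only q2 agrees with p1 in drift time.
run_1 = {"p1": {"mz": 500.25, "rt": 30.0, "drift": 40.0}}
run_2 = {
    "q1": {"mz": 500.25, "rt": 30.1, "drift": 55.0},
    "q2": {"mz": 500.26, "rt": 30.2, "drift": 40.5},
}
without_drift = match_features(run_1, run_2, {"mz": 0.02, "rt": 0.5})
with_drift = match_features(run_1, run_2, {"mz": 0.02, "rt": 0.5, "drift": 2.0})
```

With mass and retention time alone, p1 is paired with q1; once drift time is included, q1 falls outside the window and p1 is paired with q2, which is the kind of disambiguation the additional dimension is meant to provide.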
Dissertation
"Sparse Models For Multimodal Imaging And Omics Data Integration." 2015.
Nersisyan, Lilit. "Telomere analysis based on high-throughput multi-omics data." Doctoral thesis, 2017. https://ul.qucosa.de/id/qucosa%3A16297.
Costa, João Carlos Sequeira. "Development of an automated pipeline for meta-omics data analysis." Master's thesis, 2017. http://hdl.handle.net/1822/56113.
Knowing what lies around us has been a goal for many decades now, and recent advances in sequencing technologies and in meta-omics approaches have made it possible to start answering some of the main questions of microbiology: what is there, and what is it doing? The exponential growth of omics studies has been met by the development of bioinformatic tools capable of handling Metagenomics (MG) analysis, with only a few integrating such analysis with Metatranscriptomics (MT) or Metaproteomics (MP) studies. Furthermore, the existing tools for meta-omics analysis are usually not user friendly and are often limited to command-line usage. Because of the variety in meta-omics approaches, a standard workflow is not possible, but some routines exist which may be implemented in a single tool, thereby facilitating the work of laboratory professionals. In the framework of this master thesis, a pipeline for integrative MG and MT data analysis was developed. This pipeline aims to retrieve comprehensive comparative gene/transcript expression results obtained from different biological samples. At the end of each step the user can access the data and summaries containing several evaluation parameters for that step, as well as final graphical representations such as Krona plots and Differential Expression (DE) heatmaps. Several quality reports are also generated. The pipeline was constructed with tools tested and validated for meta-omics data analysis. Selected tools include FastQC, Trimmomatic and SortMeRNA for preprocessing, MetaSPAdes and Megahit for assembly, MetaQUAST and Bowtie2 for reporting on the quality of the assembly, FragGeneScan and DIAMOND for annotation and DESeq2 for DE analysis. The tools were first tested separately and then integrated in several Python wrappers to construct the software Meta-Omics Software for Community Analysis (MOSCA). 
MOSCA performs preprocessing of MG and MT reads, assembly of the reads, annotation of the assembled contigs, and a final data analysis. Real datasets were used to test the capabilities of the tool. Since different types of files can be obtained along the workflow, it is possible to perform further analyses to obtain additional information and/or additional data representations, such as metabolic pathway mapping.
The aim of microbiology, and in particular of those who study microbial communities, is to discover what makes up those communities and the function of each microorganism within them. Thanks to advances in sequencing techniques, in particular the development of Next Generation Sequencing technologies, meta-omics approaches have emerged that have been helping to answer these questions. Several tools have been developed to address them, notably for handling Metagenomics (MG) data, with a few integrating this type of analysis with Metatranscriptomics (MT) and Metaproteomics (MP) studies. Besides the scarcity of bioinformatic tools, those that exist are usually not easy to operate for users with little computing experience and are frequently limited to command-line use. A general format for a meta-omics analysis tool is not possible because of the wide variety of applications. Nevertheless, certain applications share routines that can be implemented in a single tool, thereby facilitating the work of laboratory professionals. In this thesis, an integrated pipeline for MG and MT data analysis was developed, intended to determine gene/transcript expression across different biological samples. The user has access to the results of each step, summaries with several parameters for evaluating the procedure, and graphical representations such as Krona plots and differential expression heatmaps. Several reports on the quality of the results obtained are also generated. The tool was built on tools and procedures tested and validated for meta-omics data analysis. 
These tools are FastQC, Trimmomatic and SortMeRNA for preprocessing, Megahit and MetaSPAdes for assembly, MetaQUAST and Bowtie2 for quality control of the contigs obtained from assembly, FragGeneScan and DIAMOND for annotation, and DESeq2 for differential expression analysis. The tools were tested one by one and then integrated into different Python wrappers to compose the Meta-Omics Software for Community Analysis (MOSCA). MOSCA performs preprocessing of MG and MT reads, assembly of the reads, annotation of the assembled contigs, and a final data analysis. Real data were used to test the capabilities of MOSCA. Since different types of files can be obtained during the execution of MOSCA, further analyses can be carried out to obtain additional information and/or additional data representations, such as metabolic pathway mapping.
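The stage order described in this abstract can be sketched as a small data structure. The following is only an illustration under stated assumptions: the stage names and tool assignments come from the abstract, while the `Stage` class, the `run_pipeline` function, and the sample name are hypothetical and are not taken from MOSCA's code. No external tool is invoked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One pipeline stage and the tools the abstract lists for it."""
    name: str
    tools: tuple


# Stage order and tool assignment follow the abstract.
# The structure itself is a hypothetical illustration.
STAGES = (
    Stage("preprocessing", ("FastQC", "Trimmomatic", "SortMeRNA")),
    Stage("assembly", ("MetaSPAdes", "Megahit")),
    Stage("assembly quality", ("MetaQUAST", "Bowtie2")),
    Stage("annotation", ("FragGeneScan", "DIAMOND")),
    Stage("differential expression", ("DESeq2",)),
)


def run_pipeline(sample, stages=STAGES):
    """Walk the stages in order and record a per-stage summary.

    Nothing is executed here: each entry only records which tools
    would run, mirroring the per-step summaries the abstract mentions.
    """
    reports = []
    for step, stage in enumerate(stages, start=1):
        reports.append({
            "step": step,
            "sample": sample,
            "stage": stage.name,
            "tools": list(stage.tools),
        })
    return reports


reports = run_pipeline("sample_A")  # hypothetical sample name
for report in reports:
    print(f"{report['step']}. {report['stage']}: {', '.join(report['tools'])}")
```

Keeping the stages as plain data makes it straightforward to inspect intermediate results after each step, which is the behaviour the abstract describes for the user.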
Choong, Wai-kok, and 鍾偉國. "A PPI-based GO functional enrichment analysis for “omics” data." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/86506525034712415580.
National Yang-Ming University
Institute of Biomedical Informatics
99
With the popularization of high-throughput technology, enrichment tools have been rapidly developed for analysing large-scale "omics" data. However, most methods emphasize statistical significance rather than biological considerations and have difficulty assigning correct statistical significance to terms with few entities. It is therefore difficult for researchers to arrive at an accurate biological interpretation and to assess the quality of Gene Ontology (GO) enrichment results. In this study, we introduce a new functional enrichment analysis strategy. It integrates: 1) comparative gene/protein quantification from experiments; 2) the evidence codes of GO annotations for quality control; and 3) the interaction relationships provided by STRING, in order to identify the GO terms with accurate biological interpretation. The output is expected to describe the experiments precisely. In addition, we provide several output styles with graphic visualization. The PPI within terms, the DAG structure, and gene similarity between terms are considered when clustering enriched GO terms. Applying our strategy to the p53 +/- status expression dataset, the enriched term with the highest score is GO:0010640 (platelet-derived growth factor receptor signaling pathway, F3 gene, F7 gene), which is supported by the literature. Since most of the top-ranked GO terms in the results are supported by previous studies, we believe that the genes or proteins in the enriched terms have the potential to be candidates for biomarker discovery or targets for experimental design.
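For context on the purely statistical approach this abstract contrasts itself with, a conventional over-representation test can be sketched as follows. This is the standard hypergeometric tail probability, not the PPI-based strategy of the thesis; the function name and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: number of genes in the background
    annotated:  background genes annotated with the GO term
    sample:     genes in the study list
    hits:       study-list genes annotated with the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: a term with only 3 annotated genes versus
# a term with 300, both tested against the same 100-gene study list.
small_term = hypergeom_pvalue(population=10000, annotated=3, sample=100, hits=2)
large_term = hypergeom_pvalue(population=10000, annotated=300, sample=100, hits=10)
print(f"small term: {small_term:.3g}")
print(f"large term: {large_term:.3g}")
```

For the small term, the p-value rests on just two genes, so a single annotation change can move it substantially. That sensitivity is one way to read the abstract's concern about terms with few entities.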
Huang, Chia-Yu, and 黃家郁. "A Hybrid Analysis Method for Construction of Heterogeneous Network from Multi-Omics Data." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/23261558416219789439.
National Taiwan University
Graduate Institute of Biomedical Electronics and Bioinformatics
105
It has been shown that tumorigenesis is caused by an accumulation of perturbations in different layers of biomolecules. Therefore, there has been growing interest in the integrated analysis of multilayer omics data. The dimensionality, heterogeneity, and dependency of omics data necessitate an effective hybrid-analysis method for systematically exploring the associations and interactions between layers. No such method has been previously developed. In the present study, we aimed to develop a hybrid-analysis method that incorporates multi-omics data for systematically identifying the omics features related to a specific outcome, such as drug responsiveness or patients' prognosis. These identified features were then presented as a network in which a node represents a feature and an edge represents a correlation between features. In addition, the method can cluster a group of highly correlated omics features into a module, suggesting putative interactions between biomolecules. The proposed method can be briefly divided into the following four steps. First, omics data were collected and normalized to bring each dataset onto an appropriate scale. Next, we preselected the features of interest to reduce the dimension. Third, the Least Absolute Shrinkage and Selection Operator (Lasso) estimator was used to identify representative nodes in each module. Finally, we built integral modules by correlation analyses. To test the feasibility of our method, a simulation study and two applications to public datasets, the Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas (TCGA), were conducted. The results of the simulation study demonstrated the feasibility of applying the lasso estimator in the hybrid-analysis method and suggested that improved performance can be achieved by integrating all layers of data simultaneously. Two feature networks were constructed, related to paclitaxel response and survival, respectively. 
The former network involved a total of 98 modules constituted by 2,033 features from 5 data types. Among them, the expression of ABCB1, which encodes multidrug transporters, was the most relevant factor for drug resistance and was expressed differentially among several cancer types. In addition, we identified the gene set "MICROTUBULE POLYMERIZATION OR DEPOLYMERIZATION", which influences assembly or disassembly of microtubules, suggesting that paclitaxel affects the functions of microtubules as well as cell movement. In the second network, we identified a total of 266 features that jointly constructed 61 modules correlated with the risk of colon cancer. The mutation status of sulfatase-modifying factor 1 (SUMF1) and the potassium channel member 5 (KCNK5) were the top two most influential factors. Moreover, the loss of chromosome 1p and hypermethylation of multiple CpG loci on chromosome 7, including the sites in HOXA13, were identified as associated with poor prognosis. It is expected that the results obtained here could promote the understanding of drug resistance mechanisms and tumor development and progression. To sum up, we developed an effective and robust hybrid-analysis method to investigate multi-omics networks with implications for drug response and prognosis of cancers. Its performance was corroborated using a simulation study and two real datasets. Our model is widely applicable to other omics data and is anticipated to facilitate the exploration of highly heterogeneous cancers.
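The network representation described in this abstract (nodes as features, edges as correlations, modules as groups of highly correlated features) can be illustrated with a minimal sketch. Several points are assumptions and do not come from the thesis: modules are taken to be connected components of the thresholded correlation graph, the 0.8 threshold is arbitrary, the feature names and values are hypothetical, and the lasso selection step is omitted entirely.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant feature: treat as uncorrelated
    return cov / (sd_x * sd_y)


def build_edges(features, threshold=0.8):
    """Connect two features when |r| reaches the (assumed) threshold."""
    edges = []
    for a, b in combinations(sorted(features), 2):
        r = pearson(features[a], features[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


def find_modules(nodes, edges):
    """Group nodes into connected components of the edge list."""
    neighbours = {node: set() for node in nodes}
    for a, b, _ in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, result = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


# Hypothetical features from three omics layers, five samples each.
features = {
    "expr_G1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "meth_C1": [5.0, 4.0, 3.0, 2.0, 1.0],
    "mut_M1": [3.0, 1.0, 2.0, 5.0, 4.0],
}
edges = build_edges(features)
modules = find_modules(features, edges)
print(modules)
```

Using the absolute correlation lets an inversely related pair, such as expression and methylation in this toy data, fall into the same module.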