Dissertations / Theses: 'Analysis of biological data'

1

Droop, Alastair Philip. "Correlation Analysis of Multivariate Biological Data." Thesis, University of York, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.507622.

2

McCormick, Paul Stephen. "Statistical analysis of biological expression data." Thesis, University of Cambridge, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.613819.

3

Hasegawa, Takanori. "Reconstructing Biological Systems Incorporating Multi-Source Biological Data via Data Assimilation Techniques." 京都大学 (Kyoto University), 2015. http://hdl.handle.net/2433/195985.

4

Waterworth, Alan Richard. "Data analysis techniques of measured biological impedance." Thesis, University of Sheffield, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.340146.

5

Becker, Katinka [Verfasser]. "Logical Analysis of Biological Data / Katinka Becker." Berlin : Freie Universität Berlin, 2021. http://d-nb.info/1241541779/34.

6

REHMAN, HAFEEZ UR. "Integration and Analysis of Heterogeneous Biological Data." Doctoral thesis, Politecnico di Torino, 2014. http://hdl.handle.net/11583/2537092.

Abstract:

We live in the era of networks. The power of networks is the most fundamental driving force behind the machinery of life. Living bodies stay alive through complex inter-regulations of biochemical networks, and information flows through these networks with such intensity and complexity that it exceeds anything that human ingenuity has been able to produce so far. Due to this overwhelming complexity, we have begun to see a rapid rise in studies aimed at explaining the fundamental concepts and hidden properties of such complex systems. This thesis provides a strong foundation for using networks to understand complex biological phenomena such as protein functions, as well as a more accurate method of modeling gene regulatory networks (GRNs). In the first part, we presented a methodology that uses existing biological data with gene ontology functional dependencies to infer the functions of uncharacterized proteins. We combined different sources of structural and functional information with gene ontology based term-specific relationships to predict precise functions of unannotated proteins. Such term-specific relationships are defined to clearly identify the functional context of each activity among the interacting proteins, which enables a dramatic improvement in annotation accuracy with respect to previous approaches. The presented methodology may be easily extended to integrate more sources of biological information to further improve the function prediction confidence. In the second part of this thesis, we discussed an extended BN model that accounts for post-transcriptional regulation in GRN simulation. Using this extended model, we discussed the set of attractors of two biologically confirmed networks, focusing on the regulatory role of miR-7. The attractors were compared with those of networks from which the miRNA was removed.
The central role of the miRNA in increasing network stability was highlighted in both networks, confirming the cooperative stabilizing role of miR-7. The enhanced BN model presented in this thesis is only a first step towards a more realistic analysis of the high-level functional and topological characteristics of GRNs. Using the facilities of the tool, the dynamics of real networks can be analyzed. Because the extended model includes post-transcriptional regulation, the network simulation can be more reliable, and it can also offer new insights into the role of miRNAs from a functional perspective. This improves on the current state of the art, which mostly focuses on high-level gene/gene or gene/protein interactions and neglects post-transcriptional regulation. Due to its discrete nature, the BN model may still neglect some regulatory fine adjustments. However, most of the computed attractors, now including miRNAs, still represent meaningful states of the network. The simple glimpse into the complexity of the network dynamics that the toolkit is able to provide could be used not only to validate in vitro experiments, but also as a real Systems Biology tool able to raise new questions and drive new experiments.
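
Illustrative note: the attractor comparison described in this abstract can be sketched on a toy example. The sketch assumes that "BN" denotes a synchronous Boolean network; the three-node network, its update rules, and the function names are hypothetical and are not taken from the thesis. It only shows how attractor sets with and without a miRNA node might be enumerated and compared.

```python
from itertools import product


def find_attractors(update, n_nodes):
    """Enumerate the attractors of a synchronous Boolean network.

    Every possible state is simulated until a state repeats; the repeating
    part of the trajectory (a fixed point or a cycle) is one attractor.
    """
    attractors = set()
    for state in product((0, 1), repeat=n_nodes):
        seen = []
        while state not in seen:
            seen.append(state)
            state = update(state)
        cycle = seen[seen.index(state):]
        attractors.add(frozenset(cycle))
    return attractors


def with_mirna(state):
    """Hypothetical rules: A <- B, B <- A AND NOT M, M <- A (M is the miRNA)."""
    a, b, m = state
    return (b, int(a and not m), a)


def without_mirna(state):
    """Same toy network with the miRNA node removed: A <- B, B <- A."""
    a, b = state
    return (b, a)


if __name__ == "__main__":
    for name, rule, n in (("with miRNA", with_mirna, 3),
                          ("without miRNA", without_mirna, 2)):
        found = find_attractors(rule, n)
        print(name, "->", len(found), "attractors")
        for attractor in found:
            print("   ", sorted(attractor))
```

Exhaustive enumeration is only feasible for small networks, since the state space doubles with every added node; the toy result says nothing about the biological networks studied in the thesis.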

7

Li, Yehua. "Topics in functional data analysis with biological applications." College Station, Tex. : Texas A&M University, 2006. http://hdl.handle.net/1969.1/ETD-TAMU-1867.

8

Chen, Li. "Integrative Modeling and Analysis of High-throughput Biological Data." Diss., Virginia Tech, 2010. http://hdl.handle.net/10919/30192.

Abstract:

Computational biology is an interdisciplinary field that focuses on developing mathematical models and algorithms to interpret biological data so as to understand biological problems. With current high-throughput technology development, different types of biological data can be measured on a large scale, which calls for more sophisticated computational methods to analyze and interpret the data. In this dissertation, we propose novel methods to integrate, model and analyze multiple types of biological data, including microarray gene expression data, protein-DNA interaction data and protein-protein interaction data. These methods will help improve our understanding of biological systems. First, we propose a knowledge-guided multi-scale independent component analysis (ICA) method for biomarker identification on time course microarray data. Guided by a knowledge gene pool related to a specific disease under study, the method can determine disease-relevant biological components from ICA modes and then identify biologically meaningful markers related to the specific disease. We have applied the proposed method to yeast cell cycle microarray data and Rsf-1-induced ovarian cancer microarray data. The results show that our knowledge-guided ICA approach can extract biologically meaningful regulatory modes and outperform several baseline methods for biomarker identification. Second, we propose a novel method for transcriptional regulatory network identification by integrating gene expression data and protein-DNA binding data. The approach is built upon a multi-level analysis strategy designed for suppressing false positive predictions. With this strategy, a regulatory module becomes increasingly significant as more relevant gene sets are formed at finer levels.
At each level, a two-stage support vector regression (SVR) method is utilized to reduce false positive predictions by integrating binding motif information and gene expression data; a significance analysis procedure is followed to assess the significance of each regulatory module. The resulting performance on simulation data and yeast cell cycle data shows that the multi-level SVR approach outperforms other existing methods in the identification of both regulators and their target genes. We have further applied the proposed method to breast cancer cell line data to identify condition-specific regulatory modules associated with estrogen treatment. Experimental results show that our method can identify biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer. Third, we propose a bootstrapping Markov random field (MRF)-based method for subnetwork identification on microarray data by incorporating protein-protein interaction data. Methodologically, an MRF-based network score is first derived by considering the dependency among genes to increase the chance of selecting hub genes. A modified simulated annealing search algorithm is then utilized to find the optimal/suboptimal subnetworks with maximal network score. A bootstrapping scheme is finally implemented to generate confident subnetworks. Experimentally, we have compared the proposed method with other existing methods, and the resulting performance on simulation data shows that the bootstrapping MRF-based method outperforms other methods in identifying ground truth subnetworks and hub genes. We have then applied our method to breast cancer data to identify significant subnetworks associated with drug resistance. The identified subnetworks not only show good reproducibility across different data sets, but also indicate several pathways and biological functions potentially associated with the development of breast cancer and drug resistance.
In addition, we propose to develop network-constrained support vector machines (SVM) for cancer classification and prediction, by taking into account the network structure to construct classification hyperplanes. The simulation study demonstrates the effectiveness of our proposed method. The study on the real microarray data sets shows that our network-constrained SVM, together with the bootstrapping MRF-based subnetwork identification approach, can achieve better classification performance compared with conventional biomarker selection approaches and SVMs. We believe that the research presented in this dissertation not only provides novel and effective methods to model and analyze different types of biological data but also, through extensive experiments on several real microarray data sets, shows the potential to improve the understanding of biological mechanisms related to cancers by generating novel hypotheses for further study.
Ph. D.

9

Causey, Jason L. "Studying Low Complexity Structures in Bioinformatics Data Analysis of Biological and Biomedical Data." Thesis, University of Arkansas at Little Rock, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10750808.

Abstract:

Biological, biomedical, and radiological data tend to be large, complex, and noisy. Gene expression studies contain expression levels for thousands of genes and hundreds or thousands of patients. Chest Computed Tomography images used for diagnosing lung cancer consist of hundreds of 2-D image “slices”, each containing hundreds of thousands of pixels. Beneath the size and apparent complexity of many of these data are simple and sparse structures. These low complexity structures can be leveraged into new approaches to biological, biomedical, and radiological data analyses. Two examples are presented here. First, we present SparRec (Sparse Recovery), a new framework for imputation of GWAS data, based on a matrix completion (MC) model that takes advantage of the low rank and low number of co-clusters of GWAS matrices. SparRec is flexible enough to impute meta-analyses with multiple cohorts genotyped on different sets of SNPs, even without a reference panel. Compared with Mendel-Impute, another MC method, our low-rank based method achieves similar accuracy and efficiency even with up to 90% missing data; our co-clustering based method has advantages in running time. MC methods are shown to have advantages over statistics-based methods, including Beagle and fastPhase. Second, we demonstrate NoduleX, a method for predicting lung nodule malignancy from chest Computed Tomography (CT) data, based on deep convolutional neural networks. For training and validation, we analyze >1000 lung nodules in images from the LIDC/IDRI cohort and compare our results with classifications provided by four experienced thoracic radiologists who participated in the LIDC project. NoduleX achieves high accuracy for nodule malignancy classification, with an AUC of up to 0.99, commensurate with the radiologists’ analysis.
Whether they are leveraged directly or extracted using mathematical optimization and machine learning techniques, low complexity structures provide researchers with powerful tools for taming complex data.

10

Zandegiacomo, Cella Alice. "Multiplex network analysis with application to biological high-throughput data." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/10495/.

Abstract:

This thesis studies some characteristics of multiplex networks; in particular, the analysis focuses on quantifying the differences between the layers of a multiplex. Dissimilarities are evaluated both by observing the connections of individual nodes in different layers and by estimating the different partitions of the layers. Several important measures for characterizing multiplexes are then introduced and subsequently used to build community detection methods. The difference between the partitions of two layers is quantified using a mutual information measure. The use of the hypergeometric test for determining over-represented nodes in a layer is also examined in depth, showing the effectiveness of the test as a function of the similarity of the layers. These methods for characterizing the properties of multiplex networks are applied to real biological data. The data were collected by the DILGOM study with the aim of determining the genetic, transcriptomic and metabolic implications of obesity and metabolic syndrome. These data are used by the Mimomics project to determine relationships between different omics. In the thesis, the metabolic data are analyzed using a multiplex network approach to check for differences between the relationships among blood compounds in obese and normal-weight people.
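
Illustrative note: the two statistics named in this abstract, a mutual information measure between the partitions of two layers and a hypergeometric test for over-representation, can be sketched as follows. The function names, the use of plain (unnormalised) mutual information, and the example numbers are assumptions for illustration; the abstract does not specify which variant the thesis uses.

```python
from collections import Counter
from math import comb, log


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two partitions of the same nodes.

    labels_a[i] and labels_b[i] are the community labels of node i in
    layer A and layer B respectively.
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when `draws` items are taken without replacement from a
    population containing `successes` marked items."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Two layers with identical partitions share log(2) nats of information.
    print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
    # Chance of seeing all 3 marked nodes among 4 drawn from 10.
    print(hypergeom_upper_tail(3, population=10, successes=3, draws=4))
```

A small upper-tail probability indicates that the observed overlap is unlikely under random sampling, which is the usual reading of over-representation.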

11

Narayan, Chaya. "Study of Optically Active Biological Fluids Using Polarimetric Data Analysis." University of Akron / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=akron1314038487.

12

Barnicki, Steven Louis. "An integrated data acquisition and analysis system for biological signals /." The Ohio State University, 1989. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487599963593257.

13

Wirth, Henry. "Analysis of large-scale molecular biological data using self-organizing maps." Doctoral thesis, Universitätsbibliothek Leipzig, 2012. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-101298.

Abstract:

Modern high-throughput technologies such as microarrays, next generation sequencing and mass spectrometry provide huge amounts of data per measurement and challenge traditional analyses. New strategies of data processing, visualization and functional analysis are therefore required. This thesis presents an approach which applies a machine learning technique known as self-organizing maps (SOMs). SOMs enable a parallel sample- and feature-centered view of molecular phenotypes combined with strong visualization and second-level analysis capabilities. We developed a comprehensive analysis and visualization pipeline based on SOMs. The unsupervised SOM mapping projects the initially high number of features, such as gene expression profiles, to meta-feature clusters of similar and hence potentially co-regulated single features. This reduction of dimension is attained by the re-weighting of primary information and, in contrast to simple filtering approaches, does not entail a loss of primary information. The meta-data provided by the SOM algorithm are visualized in terms of intuitive mosaic portraits. Sample-specific and common properties shared between samples emerge as a handful of localized spots in the portraits collecting groups of co-regulated and co-expressed meta-features. These characteristic color patterns reflect the data landscape of each sample and promote immediate identification of (meta-)features of interest. It will be demonstrated that SOM portraits transform large and heterogeneous sets of molecular biological data into an atlas of sample-specific texture maps which can be directly compared in terms of similarities and dissimilarities. Spot-clusters of correlated meta-features can be extracted from the SOM portraits in a subsequent step of aggregation. This spot-clustering effectively enables reduction of the dimensionality of the data in two subsequent steps towards a handful of signature modules in an unsupervised fashion.
Furthermore, we demonstrate that analysis techniques provide enhanced resolution if applied to the meta-features. The improved discrimination power of meta-features in downstream analyses such as hierarchical clustering, independent component analysis or pairwise correlation analysis is ascribed to essentially two facts: first, the set of meta-features better represents the diversity of patterns and modes inherent in the data and, second, it has better signal-to-noise characteristics than a comparable collection of single features. In addition to the pattern-driven feature selection in the SOM portraits, we apply statistical measures to detect significantly differential features between sample classes. Implementation of scoring measurements supplements the basal SOM algorithm. Further, two variants of functional enrichment analyses are introduced which link sample-specific patterns of the meta-feature landscape with biological knowledge and support functional interpretation of the data based on the ‘guilt by association’ principle. Finally, case studies selected from different ‘OMIC’ realms are presented in this thesis. In particular, molecular phenotype data derived from expression microarrays (mRNA, miRNA), sequencing (DNA methylation, histone modification patterns) or mass spectrometry (proteome), and also genotype data (SNP-microarrays) are analyzed. It is shown that the SOM analysis pipeline offers strong application capabilities and covers a broad range of potential purposes ranging from time series and treatment-vs.-control experiments to discrimination of samples according to genotypic, phenotypic or taxonomic classifications.

14

Günther, Clara-Cecilie. "Statistical analysis of biological data – diagnostic tests, geneontology and gene expression." Doctoral thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for matematiske fag, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-5748.

15

Uhlmann, Johannes [Verfasser], and Rolf [Akademischer Betreuer] Niedermeier. "Multivariate Algorithmics in Biological Data Analysis / Johannes Uhlmann. Betreuer: Rolf Niedermeier." Berlin : Universitätsbibliothek der Technischen Universität Berlin, 2011. http://d-nb.info/1014971853/34.

16

Scelfo, Tony (Tony W.). "Data visualization of biological microscopy image analyses." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/37073.

Abstract:

Thesis (M. Eng. and S.B.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.
Includes bibliographical references.
The Open Microscopy Environment (OME) provides biologists with a framework to store, analyze and manipulate large sets of image data. Current microscopes are capable of generating large numbers of images and, when these are coupled with automated analysis routines, researchers are able to generate intractable sets of data. I have developed an extension to the OME toolkit, named the LoViewer, which allows researchers to quickly identify clusters of images based on relationships between analytically measured parameters. By identifying unique subsets of data, researchers are able to make use of the rest of the OME client software to view interesting images in high resolution, classify them into category groups and apply further analysis routines. The design of the LoViewer itself and its integration with the rest of the OME toolkit will be discussed in detail in the body of this thesis.
by Tony Scelfo.
M.Eng. and S.B.

17

Ge, Tian. "Some novel models and methods for neuroimaging data analysis." Thesis, University of Warwick, 2013. http://wrap.warwick.ac.uk/58416/.

Abstract:

In this thesis, we develop some novel models and methods for the analysis of both structural and functional brain images, and their joint analysis with genetic data. In the first project, we present a suite of methods to increase the power of whole-brain genome-wide association studies. We introduce a kernel-based multilocus model to capture the interactions between single nucleotide polymorphisms (SNPs) and model their joint effect on the imaging traits. We provide a fast implementation of voxel- and cluster-wise inferences based on random field theory to make full use of the 3D spatial information in images and account for multiple comparison problems. We also propose a fast permutation procedure to increase the efficiency of standard permutation methods and provide accurate small p-value estimates based on parametric tail approximation. We explore the relationship between 448,294 SNPs and 18,043 genes in 31,662 voxels of the entire brain across 740 elderly subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI), and show boosted power of our approach by making head-to-head comparisons with previous voxel-wise genome-wide association studies. The various advantages of our methods over existing approaches indicate a great potential offered by this novel framework to detect genetic influences on human brains. In the second project, we present a Bayesian spatial model of multiple sclerosis binary lesion maps based on a spatial generalized linear mixed model with spatially varying coefficients. Our model fully respects the binary nature of the data and the spatial structure of the lesion maps, as opposed to existing massive univariate methods, and produces regularized (smoothed) estimates of lesion incidence without an arbitrary smoothing parameter.
Our model also allows for explicit modeling of the spatially varying effects of subject specific covariates such as age, gender, disease duration and disabilities scores, producing spatial maps of these effects and their significance, as well as the (scalar) effect of spatially varying covariates such as the fraction of white matter in each voxel. We apply our model to binary lesion maps derived from T2-weighted MRI images from 250 multiple sclerosis patients classified into five clinical subtypes, and determine the spatial dependence between lesion location and subject specific covariates. We also demonstrate dramatically improved predictive capabilities of our model over existing methods.

18

de, Vito Roberta. "Multi-study factor models for high-dimensional biological data." Doctoral thesis, Università degli studi di Padova, 2016. http://hdl.handle.net/11577/3424398.

Abstract:

High-throughput assays are transforming the study of biology, and are generating a rich, complex and diverse collection of high-dimensional data sets. Building systematic knowledge from these data is a cumulative process, which requires analyses that integrate multiple sources, studies, and technologies. The increased availability of ensembles of studies on related clinical populations, assaying technologies, and genomic features poses two categories of very important multi-study statistical components: 1) common factors shared across multiple studies; 2) study-specific factors. To capture these two different quantities, in this thesis we propose a novel class of factor analysis models, under both a frequentist and a Bayesian approach. In the frequentist approach, an ECM algorithm is provided to obtain the maximum likelihood estimates. Moreover, we propose a Bayesian approach to apply the method to settings with more variables than subjects. In modeling dependencies among many variables, a sparse structure underlying the associations among genes is assumed. Both methods allow joint analysis of multiple high-throughput studies. The results are helpful for combining multiple studies, identifying reproducible biology across studies and interesting study-specific components, and removing idiosyncratic variation that lacks cross-study reproducibility.
Scientific analyses on a large number of samples (high-throughput assays) are transforming biological studies. In particular, high-throughput assays generate a rich, complex and varied collection of high-dimensional data. Systematically extracting meaningful information from this type of data requires a progressive process based on the simultaneous analysis of different resources, studies and technologies. The growing availability of numerous clinical studies on relevant groups and populations, and of diverse genetic studies, gives rise to two categories: the first relates to factors shared by all studies, and the second relates to factors specific to each study. To capture these two categories, in this thesis we propose a new class of factor analysis models, developed under both a frequentist and a Bayesian approach. In the frequentist approach, an ECM algorithm is proposed for maximum likelihood estimation of the parameters. In addition, a Bayesian approach is proposed to adapt the model to settings with more variables than subjects, p > n. In modeling the dependence among variables, a sparse structure is assumed to highlight the associations among genes. Both methods make it possible to model the different studies. Moreover, the results make it possible to identify a reproducible biological signal common to all studies and to remove the part of the variance that obscures this signal.
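
Illustrative note: the decomposition into common and study-specific factors described in this abstract can be written out as a formula. The symbols below are assumed notation, not quoted from the thesis, and the Gaussian assumptions are the standard ones for factor analysis; this is a sketch of the general model class rather than the exact specification used.

```latex
% Observation i in study s (s = 1, ..., S; i = 1, ..., n_s), a p-vector:
\[
  x_{is} \;=\; \Phi\, f_{is} \;+\; \Lambda_s\, l_{is} \;+\; e_{is},
\]
% \Phi      : p x k loading matrix of the common factors (shared by all studies)
% \Lambda_s : p x j_s loading matrix of the factors specific to study s
\[
  f_{is} \sim N_k(0, I_k), \qquad
  l_{is} \sim N_{j_s}(0, I_{j_s}), \qquad
  e_{is} \sim N_p(0, \Psi_s), \quad \Psi_s \text{ diagonal},
\]
% so the marginal covariance of study s splits into a shared part,
% a study-specific part, and noise:
\[
  \Sigma_s \;=\; \Phi \Phi^{\top} \;+\; \Lambda_s \Lambda_s^{\top} \;+\; \Psi_s .
\]
```

Under this reading, the term with the shared loading matrix carries the signal that is reproducible across studies, while the study-specific term absorbs variation that does not replicate.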

19

Handl, Julia [Verfasser]. "Multiobjective approaches to the data-driven analysis of biological systems / Julia Handl." Aachen : Shaker, 2006. http://d-nb.info/1166508315/34.

20

Pettit, Jean-Baptiste Olivier Georges. "Spatial analysis of complex biological tissues from single cell gene expression data." Thesis, University of Cambridge, 2015. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.708750.

21

Zhang, Yuji. "Module-based Analysis of Biological Data for Network Inference and Biomarker Discovery." Diss., Virginia Tech, 2010. http://hdl.handle.net/10919/28482.

Abstract:

Systems biology comprises the global, integrated analysis of large-scale data encoding different levels of biological information with the aim of obtaining global insight into the cellular networks. Several studies have unveiled the modular and hierarchical organization inherent in these networks. In this dissertation, we propose and develop innovative systems approaches to integrate multi-source biological data in a modular manner for network inference and biomarker discovery in complex diseases such as breast cancer. The first part of the dissertation is focused on gene module identification in gene expression data. As the most popular way to identify gene modules, many clustering algorithms have been applied to gene expression data analysis. For the purpose of evaluating clustering algorithms from a biological point of view, we propose a figure of merit based on Kullback-Leibler divergence between cluster membership and known gene ontology attributes. Several benchmark expression-based gene clustering algorithms are compared using the proposed method with different parameter settings. Applications to diverse public time course gene expression data demonstrated that fuzzy c-means clustering is superior to other clustering methods with regard to the enrichment of clusters for biological functions. These results contribute to the evaluation of clustering outcomes and the estimation of optimal clustering partitions. The second part of the dissertation presents a hybrid computational intelligence method to infer gene regulatory modules. We explore the combined advantages of the nonlinear and dynamic properties of neural networks, and the global search capabilities of the hybrid genetic algorithm and particle swarm optimization method to infer network interactions at the modular level. The proposed computational framework is tested in two biological processes: the yeast cell cycle and the human HeLa cancer cell cycle.
The identified gene regulatory modules were evaluated using several validation strategies: (1) gene set enrichment analysis to evaluate the gene modules derived from clustering results; (2) binding site enrichment analysis to determine enrichment of the gene modules for the cognate binding sites of their predicted transcription factors; (3) comparison with previously reported results in the literature to confirm the inferred regulations. The proposed framework could be beneficial to biologists for predicting the components of gene regulatory modules in which any candidate gene is involved. Such predictions can then be used to design a more streamlined experimental approach for biological validation. Understanding the dynamics of these gene regulatory modules will shed light on the related regulatory processes. Driven by the fact that complex diseases such as cancer are “diseases of pathways”, we extended the module concept to biomarker discovery in cancer research. In the third part of the dissertation, we explore the combined advantages of molecular interaction networks and gene expression profiles to identify biomarkers in cancer research. The reliability of conventional gene biomarkers has been challenged because of the biological heterogeneity and noise within and across patients. In this dissertation, we present a module-based biomarker discovery approach that integrates interaction network topology and high-throughput gene expression data to identify markers not as individual genes but as modules. To select reliable biomarker sets across different studies, a hybrid method combining group feature selection with ensemble feature selection is proposed. First, a group feature selection method is used to extract the modules (subnetworks) with discriminative power between disease groups. Then, an ensemble feature selection method is used to select the optimal biomarker sets, in which a double-validation strategy is applied.
The ensemble method allows combining features selected from multiple classifications with various data subsampling to increase the reliability and classification accuracy of the final selected biomarker set. The results from four breast cancer studies demonstrated the superiority of the module biomarkers identified by the proposed approach: they can achieve higher accuracies, and are more reliable in datasets with the same clinical design. Based on the experimental results above, we believe that the proposed systems approaches provide meaningful solutions to discover the cellular regulatory processes and improve the understanding of disease mechanisms. These computational approaches are primarily developed for analysis of high-throughput genomic data. Nevertheless, the proposed methods can also be extended to analyze high-throughput data in proteomics and metabolomics.
Ph. D.
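
Illustrative note: the abstract above mentions a figure of merit based on Kullback-Leibler divergence between cluster membership and gene ontology attributes without giving its form. One possible reading, sketched here purely as an assumption, scores each cluster by the divergence of its GO-term distribution from the background distribution over all genes. The function names, the one-term-per-gene simplification, and the toy data are hypothetical.

```python
from collections import Counter
from math import log


def distribution(labels):
    """Relative frequency of each label in a list."""
    counts = Counter(labels)
    n = len(labels)
    return {label: c / n for label, c in counts.items()}


def kl_divergence(p, q):
    """KL(p || q) in nats; q must be positive wherever p is positive."""
    return sum(pi * log(pi / q[k]) for k, pi in p.items() if pi > 0)


def cluster_kl_scores(cluster_of, go_term_of):
    """Score each cluster by how far its GO-term mix is from the background.

    cluster_of: gene -> cluster id; go_term_of: gene -> a single GO term
    (a simplification: real genes usually carry several annotations).
    """
    background = distribution(list(go_term_of.values()))
    scores = {}
    for cluster in set(cluster_of.values()):
        terms = [go_term_of[g] for g, c in cluster_of.items() if c == cluster]
        scores[cluster] = kl_divergence(distribution(terms), background)
    return scores


if __name__ == "__main__":
    go = {"g1": "cell cycle", "g2": "cell cycle",
          "g3": "metabolism", "g4": "metabolism"}
    coherent = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}
    mixed = {"g1": 0, "g3": 0, "g2": 1, "g4": 1}
    print(cluster_kl_scores(coherent, go))  # functionally pure clusters
    print(cluster_kl_scores(mixed, go))     # clusters mirroring the background
```

Under this reading, a clustering whose clusters are enriched for particular functions scores higher than one whose clusters resemble the background.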

22

Kelley, Ryan Matthew. "The analysis and integration of high-throughput biological data for pathway discovery." Diss., University of California, San Diego, 2009. http://wwwlib.umi.com/cr/ucsd/fullcit?p3341729.

Full text

Abstract:

Thesis (Ph. D.)--University of California, San Diego, 2009.
Title from first page of PDF file (viewed February 6, 2009). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references (p. 153-170).

APA, Harvard, Vancouver, ISO, and other styles

23

Lu, Yingzhou. "Multi-omics Data Integration for Identifying Disease Specific Biological Pathways." Thesis, Virginia Tech, 2018. http://hdl.handle.net/10919/83467.

Full text

Abstract:

Pathway analysis is an important task for gaining novel insights into the molecular architecture of many complex diseases. With the advancement of new sequencing technologies, a large amount of quantitative gene expression data has been continuously acquired. The emergence of omics data sets such as proteomics has facilitated the investigation of disease-relevant pathways. Although much work has previously been done to explore single-omics data, little work has been reported using multi-omics data integration, mainly due to methodological and technological limitations. While a single-omics data set can provide useful information about the underlying biological processes, multi-omics data integration would be much more comprehensive about the cause-effect processes responsible for diseases and their subtypes. This project investigates the combination of miRNAseq, proteomics, and RNAseq data on seven types of muscular dystrophies and a control group. These unique multi-omics data sets provide us with the opportunity to identify disease-specific and most relevant biological pathways. We first perform the t-test and the OVEPUG test separately to define the differentially expressed genes in the protein and mRNA data sets. In multi-omics data sets, miRNA also plays a significant role in muscle development by regulating its target genes in the mRNA data set. To exploit the relationship between miRNA and gene expression, we consult the commonly used gene library TargetScan to collect all paired miRNA-mRNA and miRNA-protein co-expression pairs. Next, by conducting statistical analysis such as Pearson's correlation coefficient or t-test, we measure the biologically expected correlation of each gene with its upstream miRNAs and identify those showing negative correlation between the aforementioned miRNA-mRNA and miRNA-protein pairs. 
Furthermore, we identify and assess the most relevant disease-specific pathways by inputting the differentially expressed genes and negatively correlated genes into the gene-set libraries respectively, and further characterize these prioritized marker subsets using IPA (Ingenuity Pathway Analysis) or KEGG. We will then use Fisher's method to combine all the p-values derived from separate gene sets into a joint significance test assessing common pathway relevance. In conclusion, we will find all negatively correlated miRNA-mRNA and miRNA-protein pairs and identify several pathophysiological pathways related to muscular dystrophies by gene set enrichment analysis. This novel multi-omics data integration study and subsequent pathway identification will shed new light on pathophysiological processes in muscular dystrophies and improve our understanding of the molecular pathophysiology of muscle disorders, supporting disease prevention and treatment and improving health in the long term.
Master of Science
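
The p-value combination step mentioned in the abstract above can be illustrated with a minimal sketch of Fisher's method. This is not the thesis code: the function name `fisher_combined_pvalue` is hypothetical, the input p-values are assumed to be independent, and the closed-form chi-square tail is used only because the degrees of freedom (2k) are always even here.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method (illustrative sketch).

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the joint null hypothesis.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    # For even degrees of freedom 2k, the chi-square survival function is
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical p-values for one pathway tested in three separate gene sets.
stat, combined = fisher_combined_pvalue([0.04, 0.20, 0.01])
print(f"chi-square statistic = {stat:.3f}, combined p = {combined:.5f}")
```

With a single input the combined p-value equals that input, which is a quick sanity check on the tail formula.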

APA, Harvard, Vancouver, ISO, and other styles

24

Riley, Michael. "Significant pattern discovery in gene location and phylogeny." Thesis, Aberystwyth University, 2009. http://hdl.handle.net/2160/fbabd607-ae86-44ed-a3f1-9eb2c461da32.

Full text

Abstract:

This thesis documents the investigation into the acquisition of knowledge from biological data using computational methods for the discovery of significantly frequent patterns in gene location and phylogeny. Beginning with an initial statistical analysis of the distribution of gene locations in the flowering plant Arabidopsis thaliana, we discover unexplained elements of order. The second area of this research looks into frequent patterns in the single-dimensional linear structure of the physical locations of genes on the genome of Saccharomyces cerevisiae. This is an area of epigenetics which has, hitherto, attracted little attention. The frequent patterns are patterns of structure represented in Datalog, suitable for analyses using the logic programming language Prolog. This is used to find patterns in gene location with respect to various gene attributes such as molecular function and the distance between genes. Here we find significant frequent patterns in neighbouring pairs of genes. We also discover very significant patterns in the molecular function of genes separated by distances of between 5,000 and 20,000 base pairs. However, in complete contrast to the latter result, we find that the distribution of genes of molecular function within a local region of ±20,000 base pairs is locationally independent. In the second part of this research we look for significantly frequent patterns of phylogenetic subtrees in a broad database of phylogenetic trees. Here we investigate the use of two types of frequent phylogenetic structures. Firstly, phylogenetic pairs are used to determine relationships between organisms. Secondly, phylogenetic triple structures are used to represent subtrees. Frequent subtree mining is then used to establish phylogenetic relationships with a high confidence between a small set of organisms. This exercise was invaluable to enable these procedures to be extended in future to encompass much larger sets of organisms. 
This research has revealed effective methods for the analysis of, and has discovered patterns of order in, the locations of genes within genomes. Research into phylogenetic tree generation based on protein structure has discovered the requirements for an effective method to extract elements of phylogenetic information from a phylogenetic database and reconstruct a single consensus tree from that information. In this way it should be possible to produce a species tree of life with a high degree of confidence and resolution.

APA, Harvard, Vancouver, ISO, and other styles

25

Mosquera, Mayo José Luís. "Methods and Models for the Analysis of Biological Significance Based on High-Throughput Data." Doctoral thesis, Universitat de Barcelona, 2014. http://hdl.handle.net/10803/286465.

Full text

Abstract:

The advent of high-throughput technologies has generated a huge quantity of omics data. The results of these experiments are usually long lists of genes that can be used as biomarkers. A major challenge for researchers is to attribute a biological interpretation or significance to these lists of potential biomarkers, by using biological information stored in bioinformatics resources such as the Gene Ontology (GO) or the Kyoto Encyclopedia of Genes and Genomes (KEGG), or by combining them with other types of omics data. This dissertation had two main objectives: first, to study the mathematical properties of two types of semantic similarity measures for exploring GO categories, and second, to classify and study the evolution of GO tools for enrichment analysis. The first measure considered was a semantic similarity measure proposed by Lord et al. It is a node-based approach grounded in graph theory. The second was a group of pseudo-distances proposed by Joslyn et al. These are edge-based approaches grounded in the algebraic point of view of partially ordered set (POSET) theory. To reach these objectives, a review and description of the main methods of graph theory and POSET theory was first carried out. This allowed us to realize that there are two ways of mapping objects (e.g. genes) to the terms of an ontology (e.g. GO). The first formulation is called the Object-Ontology Complex (OOC). It was proposed by Carey in order to perform statistical computations. The second formulation is called the POSET Ontology (POSO) and was introduced by Joslyn et al. To classify the GO tools for enrichment analysis, the first 26 GO tools available at the website of the GO Consortium were surveyed. This yielded a list of 205 features that were used to build a Standard Functionalities Set. Based on these functionalities, the 26 GO tools were classified according to their capabilities. 
The study of the evolution of GO tools was based on monitoring these 26 GO tools. The statistical analysis consisted of descriptive statistics, an inferential analysis and a multivariate analysis. With regard to the first objective, we have seen that Lord's measure is the same as Resnik's measure, previously published. It has been observed that there exists a certain level of analogy between the formalizations of the OOC and the POSO for mapping genes to the terms of an ontology. A property and a corollary for calculating semantic similarity measures from node-based approaches from a matrix point of view have been proposed. It has been proved that Lord's measure and Joslyn's measure can be redefined in terms of a metric distance. An R package called sims has been developed for computing semantic similarity measures between terms of an arbitrary ontology and for comparing semantic similarity profiles based on the GO terms associated with two lists of genes. Based on the classification of the GO programs, a web-based tool called SerbGO, devoted to selecting and comparing GO tools, was developed. The statistical analysis of the evolution of GO tools suggested that the promoters have introduced improvements over time, but no clear models of GO tools have been detected. Based on the results of the statistical analysis, an ontology called DeGOT was built to provide a structured vocabulary for developers when introducing improvements to existing GO tools for enrichment analysis or designing a new one. DeGOT can be used to support queries and the comparison of SerbGO results.
The emergence of high-throughput technologies has generated a vast quantity of omics data. The results of these experiments are long lists of genes that can be used as biomarkers. One of the great challenges for experimental researchers is to attribute a biological interpretation or significance to these potential biomarkers, either by extracting the biological information stored in resources such as the Gene Ontology (GO) or the Kyoto Encyclopedia of Genes and Genomes (KEGG), or by combining them with other omics data. The objectives of the thesis were: first, to study the mathematical properties of two types of semantic similarity measures for exploring GO categories, and second, to classify and study the evolution of GO tools for enrichment analysis. The first semantic similarity measure considered, proposed by Lord et al., was based on graph theory, and the second was a group of pseudo-distances, proposed by Joslyn et al., based on the theory of Partially Ordered Sets (POSETs). The study of the GO tools was based on the first 26 tools available on the website of the GO Consortium. It was found that the measure of Lord et al. is the same as that of Resnik, previously published. An analogy was observed in the way genes are mapped to the GO via graphs and/or via POSETs. A property and a corollary were proposed that allow the first semantic similarity measure to be calculated in matrix form. It was shown that both measures are associated with a metric distance. An R package called sims was developed, which computes semantic similarities for an arbitrary ontology and compares semantic similarity profiles of the GO. A Standard Functionalities Set was proposed for classifying GO tools, and a web application called SerbGO was developed for selecting and comparing GO tools. 
The statistical study revealed that the promoters of the GO tools have introduced improvements over time, but no well-defined models were detected. An ontology called DeGOT was developed, which provides a vocabulary for developers to introduce improvements to the tools or to design a new one.
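
The node-based Resnik measure named in the abstract above can be illustrated with a minimal sketch on a toy ontology. Everything in it is an assumption made for illustration: the term names, the `PARENTS` and `ANNOTATIONS` tables and the helper functions are hypothetical and are not taken from the sims package. The similarity of two terms is the information content, IC(t) = -log p(t), of their most informative common ancestor.

```python
import math

# Hypothetical toy ontology: term -> set of direct parent terms.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
}

# Hypothetical direct annotations: gene -> set of terms.
ANNOTATIONS = {
    "g1": {"dna_binding"},
    "g2": {"rna_binding"},
    "g3": {"catalysis"},
    "g4": {"dna_binding"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def information_content(term):
    """IC(t) = -log p(t), with p(t) the share of genes annotated to t or a descendant."""
    hits = sum(
        1
        for terms in ANNOTATIONS.values()
        if any(term in ancestors(t) for t in terms)
    )
    return -math.log(hits / len(ANNOTATIONS))


def resnik(term_a, term_b):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(information_content(t) for t in common)


print(resnik("dna_binding", "rna_binding"))  # IC of "binding"
print(resnik("dna_binding", "catalysis"))    # only "root" is shared
```

The sketch assumes every term it evaluates has at least one annotated gene; a term with none would need separate handling because its probability is zero.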

APA, Harvard, Vancouver, ISO, and other styles

26

Georgi, Benjamin [Verfasser]. "Context-specific independence mixture models for cluster analysis of biological data / Benjamin Georgi." Berlin : Freie Universität Berlin, 2009. http://d-nb.info/102366402X/34.

Full text

APA, Harvard, Vancouver, ISO, and other styles

27

Yang, Karren Dai. "Learning causal graphs under interventions and applications to single-cell biological data analysis." Thesis, Massachusetts Institute of Technology, 2021. https://hdl.handle.net/1721.1/130806.

Full text

Abstract:

Thesis: S.M., Massachusetts Institute of Technology, Department of Biological Engineering, February, 2021
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, February, 2021
Cataloged from the official PDF version of thesis.
Includes bibliographical references (pages 49-51).
This thesis studies the problem of learning causal directed acyclic graphs (DAGs) in the setting where both observational and interventional data are available. This setting is common in biology, where gene regulatory networks can be intervened on using chemical reagents or gene deletions. The identifiability of causal DAGs under perfect interventions, which eliminate dependencies between targeted variables and their direct causes, has previously been studied. This thesis first extends these identifiability results to general interventions, which may modify the dependencies between targeted variables and their causes without eliminating them, by defining and characterizing the interventional Markov equivalence class that can be identified from general interventions. Subsequently, this thesis proposes the first provably consistent algorithm for learning DAGs in this setting. Finally, this algorithm as well as related work is applied to analyze biological datasets.
by Karren Dai Yang.
S.M.
S.M. Massachusetts Institute of Technology, Department of Biological Engineering
S.M. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
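
The distinction the abstract above draws between perfect and general interventions can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the thesis algorithm: a two-variable linear model X -> Y with Gaussian noise is assumed, and the coefficients (2.0 and 0.5), the function names and the sample size are all hypothetical. A perfect intervention on Y removes its dependence on X, whereas a general intervention only changes the strength of that dependence.

```python
import random


def simulate(mode, n=20000, seed=0):
    """Draw n samples from a two-variable linear model X -> Y (illustrative)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        if mode == "observational":
            y = 2.0 * x + noise   # Y depends on its cause X
        elif mode == "perfect":
            y = noise             # dependence on X eliminated
        elif mode == "general":
            y = 0.5 * x + noise   # dependence modified, not eliminated
        else:
            raise ValueError(f"unknown mode: {mode}")
        xs.append(x)
        ys.append(y)
    return xs, ys


def slope(xs, ys):
    """Least-squares slope of Y on X, i.e. cov(X, Y) / var(X)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


for mode in ("observational", "perfect", "general"):
    print(mode, round(slope(*simulate(mode)), 3))
```

The estimated slope stays close to the coefficient used in each regime, so only the perfect intervention drives it towards zero.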

APA, Harvard, Vancouver, ISO, and other styles

28

Chittenden, Thomas William. "Quantitative integration of biological knowledge for the analysis of high-throughput genomic data." Thesis, University of Oxford, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.559860.

Full text

Abstract:

The development of high-throughput technologies has changed the way in which we approach questions in biology by allowing us to assess the relative state of tens of thousands of genes or gene products in a single assay. A great deal of research has focused on developing statistical methods to identify biologically relevant sets of genes whose collective state correlates with a given phenotype under study. However, placing these gene sets into an intellectual framework that allows for hypothesis generation and mechanistic interpretation remains a significant challenge. To address these issues, we first apply and then extend a well-established Gene Ontology singular enrichment analysis method to quantitatively assess overrepresented biological themes within lists of somatically mutated and abnormally expressed genes from publicly available human breast, colorectal, lung, prostate, and renal cancer datasets. We further validate the utility of this novel approach with actual experimental laboratory investigations. Finally, we describe a general strategy for constructing prediction models by integrating prior biological knowledge with gene expression data from three large human breast cancer datasets. We show how this biological network-based model improves performance and interoperability by identifying genes more closely related to breast cancer etiology and patient survival. The work presented throughout this manuscript indicates the utility and proposes the future development of such methodologies to address many of the contemporary concerns associated with the analysis of a wide array of high-dimensional genomic data types.
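
Singular enrichment analysis, as mentioned in the abstract above, tests one annotation term at a time for overrepresentation in a gene list. One common choice for that test is the one-sided hypergeometric tail, sketched below. The abstract does not say which test the thesis uses, so this is an assumption, and the function name, argument names and counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(population, annotated, sample, overlap):
    """One-sided hypergeometric tail P(X >= overlap).

    population: number of genes in the background
    annotated:  background genes carrying the term
    sample:     size of the gene list under study
    overlap:    genes in the list that carry the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 200 of 20,000 background genes carry the term,
# and 5 of the 100 genes in the list of interest carry it.
print(enrichment_pvalue(20000, 200, 100, 5))
```

In practice the test is repeated for every term, so the resulting p-values would normally be corrected for multiple testing.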

APA, Harvard, Vancouver, ISO, and other styles

29

Ghazanfar, Shila. "Statistical approaches to harness high throughput sequencing data in diverse biological systems." Thesis, The University of Sydney, 2017. http://hdl.handle.net/2123/17268.

Full text

Abstract:

The development of novel statistical approaches to questions specific to biological systems of interest is becoming more valuable as we tackle increasingly complex problems. This thesis explores three distinct biological systems in which high-throughput sequencing data is utilised, varying in research area, organism, number of sequencing platforms and datasets integrated, and structure such as matched samples, showcasing the variety of study designs and thus the need for tailored statistical approaches. First, we characterise allelic imbalance from RNA-Seq data, including stringent filtering criteria and a count-based likelihood ratio test. This work identified genes of particular importance in livestock genomics, such as those related to energy use. Second, we outline a novel methodology to identify highly expressed genes and cells for single cell RNA-Seq data. We derive a gamma-normal mixture model to identify lowly and highly expressed components, and use this to identify novel markers for olfactory sensory neuron (OSN) maturity across publicly available mouse neuron datasets. In addition we estimate single cell networks and find that mature OSN single cell networks are more centralised than immature OSN single cell networks. Third, we develop two novel frameworks for relating information from Whole Exome DNA-Seq and RNA-Seq data when i) samples are matched and when ii) samples are not necessarily matched between platforms. In the latter case, we relate functional somatic mutation driver gene scores to transcriptional network correlation disturbance using a permutation testing framework, identifying potential candidate genes for targeted therapies. In the former case, we estimate directed mutation-expression networks for each cancer using linear models, providing a useful exploratory tool for identifying novel relationships among genes. This thesis demonstrates the importance of tailored statistical approaches to further understanding across many biological systems.

APA, Harvard, Vancouver, ISO, and other styles

30

Fellenberg, Matthias. "A bioinformatic approach to the metabolic and functional analysis of biological high throughput data." [S.l. : s.n.], 2002. http://deposit.ddb.de/cgi-bin/dokserv?idn=966129822.

Full text

APA, Harvard, Vancouver, ISO, and other styles

31

Leader, Debbie. "Methods for incorporating biological information into the statistical analysis of gene expression microarray data." Thesis, University of Auckland, 2009. http://hdl.handle.net/2292/5609.

Full text

Abstract:

Microarray technology has made it possible for researchers to simultaneously measure the expression levels of tens of thousands of genes. It is believed that most human diseases and biological phenomena occur through the interaction of groups of genes that are functionally related. To investigate the feasibility of incorporating functional information and/or constraints (based on biological and technical needs) into the classification process, two approaches were examined in this thesis. The first of these approaches investigated the effect of incorporating a pre-filter into the gene selection step of the classifier construction process. Both simulated and real microarray datasets were used to assess the utility of this approach. The pre-filter was based on an early method for determining if a gene had undergone a biologically relevant level of differential expression between two classes. The genes retained by the pre-filter were ranked using one of five standard statistical ranking methods and the most highly ranked were used to construct a predictive classifier. To generate the simulated data, a selection of different parametric and non-parametric techniques was employed. The results from these analyses showed that when the constraints that the pre-filter contains were placed on the classification analysis, the predictive performance of the classifiers was similar to when the pre-filter was not used. The second approach explored the feasibility of incorporating sets of functionally related genes into the classification process. Three publicly available datasets obtained from studies into breast cancer were used to assess the utility of this approach. A summary of each gene-set was derived by reducing the dimensionality of each gene-set via the use of Principal Co-ordinates Analysis. 
The reduced gene-sets were then ranked based on their ability to distinguish between the two classes (via Hotelling's T²) and those most highly ranked were used to construct a classifier via logistic regression. The results from the analyses undertaken for this approach showed that it was possible to incorporate functional information into the classification process whilst maintaining an equivalent (if not higher) level of predictive performance, as well as improving the biological interpretability of the classifier.

APA, Harvard, Vancouver, ISO, and other styles

32

Jeanmougin, Marine. "Statistical methods for robust analysis of transcriptome data by integration of biological prior knowledge." Thesis, Evry-Val d'Essonne, 2012. http://www.theses.fr/2012EVRY0029/document.

Full text

Abstract:

Over the last decade, advances in molecular biology have accelerated the development of high-throughput investigation techniques. In particular, the study of the transcriptome has enabled major advances in medical research. In this thesis, we focus on the development of statistical methods dedicated to the processing and analysis of large-scale transcriptomic data. We address the problem of selecting gene signatures using differential expression analysis methods and propose a comparison study of different approaches, based on several simulation strategies and on real data. To overcome the limitations of these classical methods, which prove poorly reproducible, we present a new tool, DiAMS (DIsease Associated Modules Selection), dedicated to the selection of significant gene modules. DiAMS relies on an extension of the local score and allows the integration of expression data and protein interaction data. We then turn to the problem of inferring gene regulatory networks. We propose a reconstruction method based on Gaussian graphical models, which introduces biological prior knowledge on the structure of the networks. This approach allows us to study interactions between genes and to identify alterations in regulatory mechanisms that may lead to the onset or progression of a disease. Finally, all of these methodological developments are integrated into an analysis pipeline that we apply to the study of metastatic relapse in breast cancer.
Recent advances in Molecular Biology have led biologists toward high-throughput genomic studies. In particular, the investigation of the human transcriptome offers unprecedented opportunities for understanding cellular and disease mechanisms. In this PhD, we put our focus on providing robust statistical methods dedicated to the treatment and the analysis of high-throughput transcriptome data. We discuss the differential analysis approaches available in the literature for identifying genes associated with a phenotype of interest and propose a comparison study. We provide practical recommendations on the appropriate method to be used based on various simulation models and real datasets. With the eventual goal of overcoming the inherent instability of differential analysis strategies, we have developed an innovative approach called DiAMS, for DIsease Associated Modules Selection. This method was applied to select significant modules of genes rather than individual genes and involves the integration of both transcriptome and protein interactions data in a local-score strategy. We then focus on the development of a framework to infer gene regulatory networks by integration of a biological informative prior over network structures using Gaussian graphical models. This approach offers the possibility of exploring the molecular relationships between genes, leading to the identification of altered regulations potentially involved in disease processes. Finally, we apply our statistical developments to study the metastatic relapse of breast cancer

APA, Harvard, Vancouver, ISO, and other styles

33

Curry, Edward William James. "Mining large collections of gene expression data to elucidate transcriptional regulation of biological processes." Thesis, University of Edinburgh, 2011. http://hdl.handle.net/1842/9437.

Full text

Abstract:

A vast amount of gene expression data is available to biological researchers. As of October 2010, the GEO database has 45,777 chips of publicly available gene expression profiling data from the Affymetrix (HGU133v2) GeneChip platform, representing 2.5 billion numerical measurements. Given this wealth of data, 'meta-analysis' methods allowing inferences to be made from combinations of samples from different experiments are critically important. This thesis explores the application of localized pattern-mining approaches, as exemplified by biclustering, for large-scale gene expression analysis. Biclustering methods are particularly attractive for the analysis of large compendia of gene expression data as they allow the extraction of relationships that occur only across subsets of genes and samples. Standard correlation methods, however, assume a single correlation relationship between two genes occurs across all samples in the data. There are a number of existing biclustering methods, but as these did not prove suitable for large scale analysis, a novel method named 'IslandCluster' was developed. This method provided a framework for investigating the results of different approaches to biclustering meta-analysis. The biclustering methods used in this work involve preprocessing of gene expression data into a unified scale in order to assess the significance of expression patterns. A novel discretisation approach is shown to identify distinct classes of genes' expression values more appropriately than approaches reported in the literature. A Gene Expression State Transformation ('GESTr') is introduced as the first reported modelling of the biological state of expression on a unified scale and is shown to facilitate effective meta-analysis. Localised co-dependency analysis is introduced, a paradigm for identifying transcriptional relationships from gene expression data. 
Tools implementing this analysis were developed and used to analyse specificity of transcriptional relationships, to distinguish related subsets within a set of transcription factor (TF) targets and to tease apart combinatorial regulation of a set of targets by multiple TFs. The state of pluripotency, from which a mammalian cell has the potential to differentiate into any cell from any of the three adult germ layers, is maintained by forced expression of Nanog and may be induced from a non-pluripotent state by the expression of Oct4, Sox2, Klf4 and cMyc. Analysis of cMyc regulatory targets shed light on a recent proposition that cMyc induces an 'embryonic stem cell like' transcriptional signature outside embryonic stem (ES) cells, revealing a cMyc-responsive subset of the signature and identifying ES cell expressed targets with evidence of broad cMyc-induction. Regulatory targets through which cMyc, Oct4, Sox2 and Nanog may maintain or induce pluripotency were identified, offering insight into transcriptional mechanisms involved in the control of pluripotency and demonstrating the utility of the novel analysis approaches presented in this work.

APA, Harvard, Vancouver, ISO, and other styles

34

He, Xin. "A semi-automated framework for the analytical use of gene-centric data with biological ontologies." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/25505.

Full text

Abstract:

Motivation Translational bioinformatics(TBI) has been defined as ‘the development and application of informatics methods that connect molecular entities to clinical entities’ [1], which has emerged as a systems theory approach to bridge the huge wealth of biomedical data into clinical actions using a combination of innovations and resources across the entire spectrum of biomedical informatics approaches [2]. The challenge for TBI is the availability of both comprehensive knowledge based on genes and the corresponding tools that allow their analysis and exploitation. Traditionally, biological researchers usually study one or only a few genes at a time, but in recent years high throughput technologies such as gene expression microarrays, protein mass-spectrometry and next-generation DNA and RNA sequencing have emerged that allow the simultaneous measurement of changes on a genome-wide scale. These technologies usually result in large lists of interesting genes, but meaningful biological interpretation remains a major challenge. Over the last decade, enrichment analysis has become standard practice in the analysis of such gene lists, enabling systematic assessment of the likelihood of differential representation of defined groups of genes compared to suitably annotated background knowledge. The success of such analyses are highly dependent on the availability and quality of the gene annotation data. For many years, genes were annotated by different experts using inconsistent, non-standard terminologies. Large amounts of variation and duplication in these unstructured annotation sets, made them unsuitable for principled quantitative analysis. More recently, a lot of effort has been put into the development and use of structured, domain specific vocabularies to annotate genes. The Gene Ontology is one of the most successful examples of this where genes are annotated with terms from three main clades; biological process, molecular function and cellular component. 
However, there are many other established and emerging ontologies to aid biological data interpretation, but are rarely used. For the same reason, many bioinformatic tools only support analysis analysis using the Gene Ontology. The lack of annotation coverage and the support for them in existing analytical tools to aid biological interpretation of data has become a major limitation to their utility and uptake. Thus, automatic approaches are needed to facilitate the transformation of unstructured data to unlock the potential of all ontologies, with corresponding bioinformatics tools to support their interpretation. Approaches In this thesis, firstly, similar to the approach in [3,4], I propose a series of computational approaches implemented in a new tool OntoSuite-Miner to address the ontology based gene association data integration challenge. This approach uses NLP based text mining methods for ontology based biomedical text mining. What differentiates my approach from other approaches is that I integrate two of the most wildly used NLP modules into the framework, not only increasing the confidence of the text mining results, but also providing an annotation score for each mapping, based on the number of pieces of evidence in the literature and the number of NLP modules that agreed with the mapping. Since heterogeneous data is important in understanding human disease, the approach was designed to be generic, thus the ontology based annotation generation can be applied to different sources and can be repeated with different ontologies. Secondly, in respect of the second challenge proposed by TBI, to increase the statistical power of the annotation enrichment analysis, I propose OntoSuite-Analytics, which integrates a collection of enrichment analysis methods into a unified open-source software package named topOnto, in the statistical programming language R. 
The package supports enrichment analysis across multiple ontologies with a set of implemented statistical/topological algorithms, allowing the comparison of enrichment results across multiple ontologies and between different algorithms. Results: The methodologies described above were implemented, and a Human Disease Ontology (HDO) based gene annotation database was generated by mining three publicly available databases: OMIM, GeneRIF and Ensembl variation. With the availability of the HDO annotation and the corresponding ontology enrichment analysis tools in topOnto, I profiled 277 gene classes with human diseases and generated ‘disease environments’ for 1310 human diseases. The exploration of the disease profiles and disease environments provides an overview of known disease knowledge and new insights into disease mechanisms. The integration of multiple ontologies into a disease context demonstrates how ‘orthogonal’ ontologies can lead to biological insight that would have been missed by more traditional single-ontology analysis.
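
The enrichment analysis mentioned in this abstract is commonly illustrated with a one-sided hypergeometric test. The sketch below shows only that textbook test; it is not taken from topOnto, and the function name `hypergeom_enrichment_p` and the toy counts are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """Return P(X >= hits) for a hypergeometric draw.

    population: number of genes in the background set
    annotated:  background genes annotated with the ontology term
    sample:     size of the gene list of interest
    hits:       genes in the list that carry the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass of every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers (hypothetical): 10 background genes, 5 annotated,
    # a list of 5 genes that are all annotated with the term.
    print(hypergeom_enrichment_p(10, 5, 5, 5))
```

A small p-value suggests that the term is over-represented in the gene list relative to the background; in practice the p-values for many terms would also need multiple-testing correction.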

35

Petrizzelli, Marianyela. "Mathematical modelling and integration of complex biological data : analysis of the heterosis phenomenon in yeast." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS204/document.

Abstract:

The general framework of this thesis is the genotype-phenotype relationship, addressed through the analysis of the heterosis phenomenon in yeast with an approach combining biology, mathematics and statistics. Prior to this work, a very large set of heterogeneous data, corresponding to different levels of organization (proteomics, fermentation traits and life-history traits), had been collected on a half-diallel design involving 11 strains belonging to two species. This type of data is ideally suited to multi-scale modelling and to testing models that predict the variation of integrated phenotypes from protein and metabolic (flux) traits, while taking into account dependence structures between variables and between observations. I first decomposed, for each trait, the total genetic variance into variances of additive, inbreeding and heterosis effects, and showed that the distribution of these components made it possible to define clear-cut groups of proteins into which most fermentation and life-history traits fell. Within these groups, the correlations between the variances of heterosis and inbreeding effects could be positive, negative or null, which was the first experimental demonstration of a possible decoupling between the two phenomena. The second part of the thesis consisted of interfacing quantitative proteomic data with the yeast genome-scale metabolic model using a constraint-based modelling approach. Using a recent algorithm, I searched the space of possible solutions for the one that minimized the distance between the flux vector and the vector of observed protein abundances. I was thus able to predict unobserved fluxes and to compare correlation patterns between traits at different levels of integration. The data revealed two major families of fermentation or life-history traits whose biochemical interpretation is consistent in terms of trade-off, and which had not been highlighted from quantitative proteomic data alone. Altogether, my thesis work allows a better understanding of the evolution of the genotype-phenotype map.

36

Yee, Thomas William. "The Analysis of binary data in quantitative plant ecology." Thesis, University of Auckland, 1993. http://hdl.handle.net/2292/1973.

Abstract:

The analysis of presence/absence data of plant species by regression analysis is the subject of this thesis. A nonparametric approach is emphasized, and methods which take into account correlations between species are also considered. In particular, generalized additive models (GAMs) are used, and these are applied to species’ responses to greenhouse scenarios and to the examination of multispecies interactions. Parametric models are used to estimate optimal conditions for the presence of species and to test several niche theory hypotheses. An extension of GAMs called vector GAMs is proposed, and they provide a means for proposing nonparametric versions of the following models: multivariate regression, the proportional and nonproportional odds models, the multiple logistic regression model, and bivariate binary regression models such as the bivariate probit model and the bivariate logistic model. Some theoretical properties of vector GAMs are deduced from those pertaining to ordinary GAMs, and their relationship with the generalized estimating equations (GEE) approach is elucidated.

37

Wu, Chiung Ting. "Machine Learning Approaches for Modeling and Correction of Confounding Effects in Complex Biological Data." Diss., Virginia Tech, 2021. http://hdl.handle.net/10919/103739.

Abstract:

With the huge volume of biological data generated by new technologies and the boom in new machine-learning-based analytical tools, we expect to advance life science and human health at an unprecedented pace. Unfortunately, there is a significant gap between the complex raw biological data from real life and the data required by mathematical and statistical tools. This gap arises from two fundamental and universal problems in biological data, both related to confounding effects. The first is the intrinsic complexity of the data. An observed sample could be a mixture of multiple underlying sources, and we may be interested in only one or some of the sources. The second type of complexity comes from the data acquisition process. Different samples may be gathered at different times and/or from different locations. Therefore, each sample is associated with a specific distortion that must be carefully addressed. These confounding effects obscure the signals of interest in the acquired data. Specifically, this dissertation addresses the two major challenges in removing confounding effects: alignment and deconvolution. Liquid chromatography–mass spectrometry (LC-MS) is a standard method for proteomics and metabolomics analysis of biological samples. Unfortunately, the retention time (RT) of the same compound varies between samples, and these shifts must subsequently be corrected (aligned) during data processing. Classic alignment methods, such as the one in the popular XCMS package, often assume a single time-warping function for each sample. Thus, the potentially varying RT drift for compounds with different masses in a sample is neglected by these methods. Moreover, the systematic change in RT drift across run order is often not considered by alignment algorithms. Therefore, these methods cannot effectively correct all misalignments.
To utilize this information, we develop an integrated reference-free profile alignment method, neighbor-wise compound-specific Graphical Time Warping (ncGTW), that can detect misaligned features and align profiles by leveraging expected RT drift structures and compound-specific warping functions. Specifically, ncGTW uses individualized warping functions for different compounds and assigns constraint edges on the warping functions of neighboring samples. We applied ncGTW to two large-scale metabolomics LC-MS datasets, where it identified many misaligned features and successfully realigned them. These features would otherwise be discarded or left uncorrected by existing methods. When the desired signal is buried in a mixture, deconvolution is needed to recover the pure sources. Many biological questions can be better addressed when the data is in the form of individual sources instead of mixtures. Though there are some promising supervised deconvolution methods, unsupervised deconvolution is still needed when there is no a priori information. Among current unsupervised methods, Convex Analysis of Mixtures (CAM) is the most theoretically solid and the strongest performing. However, this method has some major limitations. Most importantly, the overall time complexity can be very high, especially when analyzing a large dataset or a dataset with many sources. Also, because some steps are stochastic and heuristic, the deconvolution result is not accurate enough. To address these problems, we redesigned the modules of CAM. In the feature clustering step, we propose a clustering method, radius-fixed clustering, which not only controls the spatial size of each cluster but also identifies outliers at the same time. The disadvantages of K-means clustering, such as instability and the need to specify the number of clusters, are thereby avoided. Moreover, when identifying the convex hull, we replace Quickhull with linear programming, which decreases the computation time significantly.
To avoid the heuristic and approximate step in optimal simplex identification, we propose a greedy search strategy instead. The experimental results demonstrate a vast improvement in computation time. The accuracy of the deconvolution is also shown to be higher than that of the original CAM.
Doctor of Philosophy
Due to the complexity of biological data, there are two major pre-processing steps: alignment and deconvolution. The alignment step corrects the time- and location-related data acquisition distortion by aligning the detected signals to a reference signal. Though many alignment methods have been proposed for biological data, most of them fail to consider the relationships among samples carefully. This structural information can help alignment when the data is noisy and/or irregular. To utilize this information, we develop a new method, Neighbor-wise Compound-specific Graphical Time Warping (ncGTW), inspired by graph theory. This new alignment method not only utilizes the structural information but also provides a reference-free solution. We show that our new method performs better than other methods on both simulated and real datasets. When the signal is from a mixture, deconvolution is needed to recover the pure sources. Many biological questions can be better addressed when the data is in the form of single sources instead of mixtures. There is a classic unsupervised deconvolution method: Convex Analysis of Mixtures (CAM). However, this method has some limitations. For example, the time complexity of some steps is very high. Thus, when facing a large dataset or a dataset with many sources, the computation time would be extremely long. Also, because some steps are stochastic and heuristic, the deconvolution result may not be accurate enough. We improved CAM, and the experimental results show that the speed and accuracy of the deconvolution are significantly improved.
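
The retention-time alignment problem described in this abstract builds on time warping. As a rough illustration only, the following sketch implements classic pairwise dynamic time warping (DTW) between two intensity profiles. It is not ncGTW, which according to the abstract adds compound-specific warping functions and constraints between neighboring samples; the function name `dtw_align` and the toy profiles are assumptions made for this example.

```python
def dtw_align(a, b):
    """Classic dynamic time warping between two 1-D profiles.

    Returns the minimal cumulative absolute difference and the warping
    path as a list of (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    # Backtrack from the end to recover one optimal warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    # Toy profiles (hypothetical): the same peak, delayed by one scan.
    reference = [0, 1, 2, 1, 0]
    delayed = [0, 0, 1, 2, 1, 0]
    total_cost, path = dtw_align(reference, delayed)
    print(total_cost, path)
```

This sketch warps one whole profile against another with a single warping path, which corresponds to the "single time-warping function for each sample" assumption that the abstract identifies as a limitation of classic methods.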

38

Winter, Eitan E. "Evolutionary analyses of protein-coding genes using large biological data sets." Thesis, University of Oxford, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.427615.

39

Hanisch, Daniel [Verfasser]. "New Analysis Methods for Gene Expression Data via Construction and Incorporation of Biological Networks / Daniel Hanisch." Aachen : Shaker, 2005. http://d-nb.info/1181615232/34.

40

Sun, Guoli. "Significant distinct branches of hierarchical trees| A framework for statistical analysis and applications to biological data." Thesis, State University of New York at Stony Brook, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3685086.

Abstract:

One of the most common goals of hierarchical clustering is finding those branches of a tree that form quantifiably distinct data subtypes. Achieving this goal in a statistically meaningful way requires (a) a measure of distinctness of a branch and (b) a test to determine the significance of the observed measure, applicable to all branches and across multiple scales of dissimilarity.

We formulate a method termed Tree Branches Evaluated Statistically for Tightness (TBEST) for identifying significantly distinct tree branches in hierarchical clusters. For each branch of the tree a measure of distinctness, or tightness, is defined as a rational function of heights, both of the branch and of its parent. A statistical procedure is then developed to determine the significance of the observed values of tightness. We test TBEST as a tool for tree-based data partitioning by applying it to five benchmark datasets, one of them synthetic and the other four each from a different area of biology. With each of the five datasets, there is a well-defined partition of the data into classes. In all test cases TBEST performs on par with or better than the existing techniques.

One dataset uses Cores Of Recurrent Events (CORE) to select features. CORE was developed with my participation in the course of this work. An R language implementation of the method is available from the Comprehensive R Archive Network: cran.r-project.org/web/packages/CORE/index.html.

Based on our benchmark analysis, TBEST is a tool of choice for detection of significantly distinct branches in hierarchical trees grown from biological data. An R language implementation of the method is available from the Comprehensive R Archive Network: cran.r-project.org/web/packages/TBEST/index.html.

41

Watzl, June Qiongye. "Improved magnetic resonance spectroscopy data acquisition and processing for the study of biological specimens." Thesis, The University of Sydney, 2001. https://hdl.handle.net/2123/27715.

Abstract:

Two-dimensional COSY spectroscopy has been developed as an adjunct for the detection and staging of malignant disease. However, the clinical application of 2D spectroscopy has been hampered by difficulties in quantification resulting from variability in the cross peak volumes measured in 2D COSY spectra. The aim of this thesis is to optimise data acquisition parameters and post-acquisition data processing parameters for two-dimensional (2D) MR spectroscopy for use with biological specimens. In this thesis, the factors affecting cross peak volumes in two-dimensional COSY spectra of human thyroid biopsy tissues are systematically measured and catalogued. Model systems of cultured human colorectal cells and standard samples of amino acids are used to demonstrate that similar dependencies in cross peak volume profiles can be obtained across a wide range of biological specimens. Computer simulation of 2D COSY spectra of different spin systems is used to model the dependence of cross peak volumes on the T2, J coupling and of the sample components. The effects of spin system and on the short and long range couplings are described. This information is used to design a new method of 2D data acquisition in which the time of data acquisition can be reduced by up to 37.5%. A step-shaped or weighted acquisition scheme is used to mould the data set to the desired final data shape in t1, by modulating the number of scans collected in each free induction decay. Compensating window functions (OPERA house - Optimised Processing to Enhance time-Reduced Acquisition) are designed to minimise artefacts arising from the stepped acquisition. This scheme is successfully used to reduce acquisition time in magnitude-mode and phase-sensitive double quantum filtered 1H-1H COSY spectra of amino acid standards and cultured human tumour cells.
Finally, a small study of thyroid biopsies is performed in which the potential clinical usefulness of these methods is demonstrated by showing that additional information can be collected with increased resolution in a reduced experimental time.

42

SARTOR, MAUREEN A. "TESTING FOR DIFFERENTIALLY EXPRESSED GENES AND KEY BIOLOGICAL CATEGORIES IN DNA MICROARRAY ANALYSIS." University of Cincinnati / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1195656673.

43

Verzotto, Davide. "Advanced Computational Methods for Massive Biological Sequence Analysis." Doctoral thesis, Università degli studi di Padova, 2011. http://hdl.handle.net/11577/3426282.

Abstract:

With the advent of modern sequencing technologies, massive amounts of biological data, from protein sequences to entire genomes, are becoming increasingly available. This creates a need for the automatic analysis and classification of such huge collections of data, in order to enhance knowledge in the Life Sciences. Although many research efforts have been made to model this information mathematically, for example by finding patterns and similarities among protein or genome sequences, these approaches often lack structures that address specific biological issues. In this thesis, we present novel computational methods for three fundamental problems in molecular biology: the detection of remote evolutionary relationships among protein sequences, the identification of subtle biological signals in related genome or protein functional sites, and phylogeny reconstruction by means of whole-genome comparisons. The main contribution is a systematic analysis of the patterns that may affect these tasks, leading to the design of practical and efficient new pattern discovery tools. We thus introduce two advanced paradigms of pattern discovery and filtering, based on the insight that functional and conserved biological motifs, or patterns, should lie in different sites of sequences. This makes it possible to carry out space-conscious approaches that avoid counting the same patterns multiple times. The first paradigm, irredundant common motifs, concerns the discovery of common patterns, for two sequences, that have occurrences not covered by other patterns, where coverage is defined by means of specificity and extension. The second paradigm, underlying motifs, concerns the filtering of patterns, from a given set, that have occurrences not overlapping other patterns with higher priority, where priority is defined by lexicographic properties of patterns on the boundary between pattern matching and statistical analysis.
We develop three practical methods directly based on these advanced paradigms. Experimental results indicate that we are able to identify subtle similarities among biological sequences while using the same type of information only once. In particular, we employ the irredundant common motifs and the statistics based on these patterns to solve the remote protein homology detection problem. Results show that our approach, called Irredundant Class, outperforms state-of-the-art methods on a challenging benchmark for protein analysis. Afterwards, we establish how to compare and filter a large number of complex motifs (e.g., degenerate motifs) obtained from modern motif discovery tools, in order to identify subtle signals in different biological contexts. In this case we employ the notion of underlying motifs. Tests on large protein families indicate that we drastically reduce the number of motifs that scientists would otherwise have to inspect manually, while highlighting the actual functional motifs reported in the literature. Finally, we combine the two proposed paradigms to allow the comparison of whole genomes, and thus the construction of a novel and practical distance function. With our method, called Unic Subword Approach, we relate the regions of two genome sequences to each other by selecting motifs conserved during evolution. Experimental results show that our approach achieves better performance than other state-of-the-art methods in the whole-genome phylogeny reconstruction of viruses, prokaryotes, and unicellular eukaryotes, and also identifies the major clades of these organisms.

44

Lawrence, Michael. "Interactive graphics, graphical user interfaces and software interfaces for the analysis of biological experimental data and networks." [Ames, Iowa : Iowa State University], 2008.

45

Singh, Nitesh Kumar [Verfasser]. "Integrating diverse biological sources and computational methods for the analysis of high-throughput expression data / Nitesh Kumar Singh." Greifswald : Universitätsbibliothek Greifswald, 2014. http://d-nb.info/1060136937/34.

46

Blankenship, James R. "Assessing the ability of hyperspectral data to detect Lyngbya SPP a potential biological indicator for presence of metal objects in the littoral environment." Thesis, Monterey, Calif. : Naval Postgraduate School, 2006. http://bosun.nps.edu/uhtbin/hyperion.exe/06Dec%5FBlankenship.pdf.

Abstract:

Thesis (M.S. in Space Systems Operations)--Naval Postgraduate School, December 2006.
Thesis Advisor(s): Daria Siciliano, R. C. Olsen. "December 2006." Includes bibliographical references (p. 233-239). Also available in print.

47

Castellano, Escuder Pol. "Statistical methods for intake prediction and biological significance analysis in nutrimetabolomic studies." Doctoral thesis, Universitat de Barcelona, 2021. http://hdl.handle.net/10803/673827.

Abstract:

This thesis is the product of three and a half years working on the complex world of metabolomics and nutrition. All the work presented here is focussed on the problems arising from associating and integrating metabolomics data with nutritional or dietary data. This issue has been approached using both observational and interventional studies and from a mainly bioinformatic point of view, proposing different methods and tools to reduce the complexity of nutrimetabolomics data analysis. Thus, this work consists of four chapters divided into three parts, in addition to a summary of the content of the entire thesis in Catalan, the references, and the appendices. The first part consists of a global introduction, where the fundamental concepts needed for the correct understanding of the thesis are reviewed, including basic concepts about metabolomics and nutrition, the state of the art of the nutrimetabolomics field, and the fundamentals of biological significance analyses, among others. This first part ends with a brief definition of the objectives of this work. In the second part, the results of this thesis are carefully presented and discussed. The results are presented in a compact format, with each section being a summary of a scientific publication. These results include the development of an ontology that defines the relationships between dietary metabolites and foods, the development of an open source tool for metabolomics data analysis, the development of an open source tool for nutrimetabolomics enrichment analysis, other open source tools developed in the context of this work, and a section with different publications where the methods and tools developed have been applied. Then, all these individual results are discussed together, providing a global and unified context in which all the developments of this thesis are related.
Lastly, the third part of this thesis presents the conclusions, contextualizing all the results obtained within the main objective of the thesis: to contribute to the improvement of the integration and interpretation of nutrimetabolomics data. Additionally, the appendices present the published results and some extra information used in carrying out this research. Finally, although this thesis is made up of contents from the fields of metabolomics, nutrition, bioinformatics and biostatistics, it has been written for a wide scientific audience, trying to be as comprehensible as possible for researchers of any profile, avoiding unnecessary complexities and always following the transversal objective of the thesis. I hope you find it useful but, above all, that you enjoy reading it.

48

Eberle, Jonas [Verfasser]. "Reconciling Molecular with Biological and Morphological Data Towards an Integrative Analysis of the Evolutionary Biology of Chafers / Jonas Eberle." Bonn : Universitäts- und Landesbibliothek Bonn, 2018. http://d-nb.info/1163662356/34.

49

Savelli, Raphaël. "Study of microphytobenthos dynamics in temperate intertidal mudflats by using physical-biological coupled modelling and remote sensing data analysis." Thesis, La Rochelle, 2019. http://www.theses.fr/2019LAROS030.

Abstract:

The high primary production (PP) of intertidal mudflats at temperate latitudes is mostly supported by microphytobenthos (MPB), which supports both benthic and pelagic food webs. In the present thesis, we use a physical-biological coupled model to investigate the spatial and temporal variability of MPB dynamics on a large temperate intertidal mudflat of the French Atlantic coast. The model explicitly simulates the MPB biomass and the biomass and density of the grazer Peringia ulvae. The outputs provide key findings on MPB dynamics. In winter and spring, optimal light and mud surface temperature (MST) conditions for MPB growth lead to an MPB spring bloom. Light is the most limiting driver over the year. However, a high MST limits MPB growth 40% of the time during summer. The photoinhibition of MPB photosynthesis can potentially be superimposed on thermoinhibition in spring and summer. Grazing and resuspension of MPB biomass also shape MPB dynamics. Bioturbation by P. ulvae contributes to a chronic export of MPB biomass from the sediment to the water column in spring and summer. Waves contribute to MPB resuspension through massive resuspension events in winter, spring and fall. 50% of the annual MPB PP is exported to the water column through chronic and massive resuspension events. We also developed a new method that combines remote sensing data with outputs of the physical-biological coupled model in a single algorithm that can predict PP from satellite data. In addition to bringing new insights into MPB dynamics, this work proposes new numerical tools to monitor and predict MPB PP and its fate in coastal waters in a context of climate change.

50

Andorf, Sandra. "A systems biological approach towards the molecular basis of heterosis in Arabidopsis thaliana." PhD thesis, Universität Potsdam, 2011. http://opus.kobv.de/ubp/volltexte/2011/5117/.

Abstract:

Heterosis is defined as the superiority in performance of heterozygous genotypes compared to their corresponding genetically different homozygous parents. This phenomenon has been known since the beginning of the last century and has been widely used in plant breeding, but the underlying genetic and molecular mechanisms are not well understood. In this work, a systems biological approach based on molecular network structures is proposed to contribute to the understanding of heterosis. Hybrids are likely to contain additional regulatory possibilities compared to their homozygous parents and may therefore be able to respond correctly to a higher number of environmental challenges, which leads to a higher adaptability and, thus, to the heterosis phenomenon. In the network hypothesis for heterosis presented in this work, more regulatory interactions are expected in the molecular networks of the hybrids than in those of the homozygous parents. Partial correlations were used to assess this difference in the global interaction structure of regulatory networks between the hybrids and the homozygous genotypes. This network hypothesis for heterosis was tested on metabolite profiles as well as gene expression data of the two parental Arabidopsis thaliana accessions C24 and Col-0 and their reciprocal crosses. These plants are known to show a heterosis effect in their biomass phenotype. The hypothesis was confirmed for mid-parent and best-parent heterosis for either hybrid in our experimental metabolite as well as gene expression data. It was shown that this result is influenced by the cutoffs used during the analyses. Overly strict filtering resulted in sets of metabolites and genes for which the network hypothesis for heterosis does not hold true for either hybrid regarding mid-parent as well as best-parent heterosis.
In an over-representation analysis, the genes that show the largest heterosis effects according to our network hypothesis were compared to genes of heterotic quantitative trait loci (QTL) regions. Separately for either hybrid regarding mid-parent as well as best-parent heterosis, a significantly larger overlap between the resulting gene lists of the two different approaches towards biomass heterosis was detected than expected by chance. This suggests that each heterotic QTL region contains many genes influencing biomass heterosis in the early development of Arabidopsis thaliana. Furthermore, this integrative analysis narrowed the group of candidate genes for biomass heterosis in Arabidopsis thaliana identified by both approaches and increased the confidence in it.
The heterosis effect denotes the superiority of heterozygous offspring over their genetically different homozygous parents in one or more performance traits (e.g., leaf size in plants). This phenomenon has been known since the beginning of the last century and is widely used in plant breeding. Nevertheless, the genetic and molecular basis of heterosis remains largely unknown. It is assumed that heterozygous individuals have more regulatory possibilities than their homozygous parents and can therefore respond correctly to a larger number of changing environmental conditions. This increased adaptability leads to the heterosis effect. In this work, a systems biological approach based on molecular network structures is pursued to contribute to a better understanding of heterosis. To this end, a network hypothesis for heterosis is presented, which predicts that heterozygous individuals showing heterosis have more regulatory interactions in their molecular networks than their homozygous parents. Partial correlations were used to examine this difference in the global interaction structures between the heterozygotes and their homozygous parents. The network hypothesis was tested on metabolite and gene expression data of the two homozygous Arabidopsis thaliana lines C24 and Col-0 and their reciprocal crosses. Arabidopsis thaliana plants are known to show a heterosis effect with respect to their biomass: at the same age, the heterozygous plants have a higher biomass than the homozygous plants.
The network hypothesis for heterosis was confirmed for both crosses, in the metabolite and the gene expression data, with respect to both mid-parent heterosis (the difference in performance of the heterozygote compared to the mean of the parents) and best-parent heterosis (the difference in performance of the heterozygote compared to the better of the parents). In an over-representation analysis, the genes showing the largest change in the number of regulatory interactions in which they are presumably involved were compared with the genes from a quantitative trait locus (QTL) analysis of biomass heterosis in Arabidopsis thaliana. The genes identified in the two studies show a larger overlap than expected by chance. This indicates that each identified QTL region contains many genes that influence the biomass heterosis effect in Arabidopsis thaliana. The genes that overlap in the result lists of both analysis methods can be regarded as candidate genes for biomass heterosis in Arabidopsis thaliana with greater confidence than the results of only one study.
