Dissertations / Theses on the topic 'Biological data'
Consult the top 50 dissertations / theses for your research on the topic 'Biological data.'
Rundqvist, David. "Grouping Biological Data." Thesis, Linköping University, Department of Computer and Information Science, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-6327.
Today, scientists in various biomedical fields rely on biological data sources in their research. Large amounts of information concerning, for instance, genes, proteins and diseases are publicly available on the internet, and are used daily for acquiring knowledge. Typically, biological data is spread across multiple sources, which has led to heterogeneity and redundancy.
The current thesis suggests grouping as one way of computationally managing biological data. A conceptual model for this purpose is presented, which takes properties specific to biological data into account. The model defines sub-tasks and key issues where multiple solutions are possible, and describes which approaches to these have been used in earlier work. Further, an implementation of this model is described, as well as test cases which show that the model is indeed useful.
Since the use of ontologies is relatively new in the management of biological data, the main focus of the thesis is on how semantic similarity of ontological annotations can be used for grouping. The results of the test cases show for example that the implementation of the model, using Gene Ontology, is capable of producing groups of data entries with similar molecular functions.
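As a rough illustration of grouping by annotation similarity, the sketch below groups data entries whose sets of ontology terms overlap strongly. It is not the thesis's model: the abstract does not name a similarity measure, so Jaccard overlap and single-linkage grouping are assumed stand-ins, and the entry names, term identifiers and threshold are placeholders.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of annotation terms shared by two entries (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_entries(annotations: dict[str, set[str]], threshold: float) -> list[list[str]]:
    """Single-linkage grouping: entries linked by similarity >= threshold share a group."""
    parent = {name: name for name in annotations}

    def find(x: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(annotations, 2):
        if jaccard(annotations[a], annotations[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in annotations:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical entries annotated with placeholder ontology term identifiers.
entries = {
    "geneA": {"GO:0003677", "GO:0003700"},
    "geneB": {"GO:0003677", "GO:0003700", "GO:0005524"},
    "geneC": {"GO:0016301", "GO:0005524"},
    "geneD": {"GO:0016301"},
}

print(group_entries(entries, threshold=0.5))
# [['geneA', 'geneB'], ['geneC', 'geneD']]
```

A measure that takes the ontology hierarchy into account could replace `jaccard` without changing the grouping step.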
Hasegawa, Takanori. "Reconstructing Biological Systems Incorporating Multi-Source Biological Data via Data Assimilation Techniques." 京都大学 (Kyoto University), 2015. http://hdl.handle.net/2433/195985.
Jakonienė, Vaida. "Integration of biological data /." Linköping : Linköpings universitet, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-7484.
Jakonienė, Vaida. "Integration of Biological Data." Doctoral thesis, Linköpings universitet, IISLAB - Laboratoriet för intelligenta informationssystem, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-7484.
Dost, Banu. "Optimization algorithms for biological data." Diss., [La Jolla] : University of California, San Diego, 2010. http://wwwlib.umi.com/cr/ucsd/fullcit?p3397170.
Title from first page of PDF file (viewed March 23, 2010). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references (p. 149-159).
Schmidberger, Markus. "Parallel Computing for Biological Data." Diss., lmu, 2009. http://nbn-resolving.de/urn:nbn:de:bvb:19-104921.
BERNARDINI, GIULIA. "COMBINATORIAL METHODS FOR BIOLOGICAL DATA." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2021. http://hdl.handle.net/10281/305220.
The main goal of this thesis is to develop new algorithmic frameworks to deal with (i) a convenient representation of a set of similar genomes and (ii) phylogenetic data, with particular attention to the increasingly accurate tumor phylogenies. A “pan-genome” is, in general, any collection of genomic sequences to be analyzed jointly or to be used as a reference for a population. A phylogeny, in turn, is meant to describe the evolutionary relationships among a group of items, be they species of living beings, genes, natural languages, ancient manuscripts or cancer cells. With the exception of one of the results included in this thesis, related to the analysis of tumor phylogenies, the focus of the whole work is mainly theoretical, the intent being to lay firm algorithmic foundations for the problems by investigating their combinatorial aspects, rather than to provide practical tools for attacking them. Deep theoretical insights on the problems allow a rigorous analysis of existing methods, identifying their strong and weak points, providing details on how they perform and helping to decide which problems need to be further addressed. In addition, it is often the case that new theoretical results (algorithms, data structures and reductions to other well-studied problems) can either be directly applied or adapted to fit the model of a practical problem, or at least serve as inspiration for developing new practical tools. The first part of this thesis is devoted to methods for handling an elastic-degenerate text, a computational object that compactly encodes a collection of similar texts, like a pan-genome. Specifically, we attack the problem of matching a sequence in an elastic-degenerate text, both exactly and allowing a certain amount of errors, and the problem of comparing two degenerate texts.
In the second part we consider both tumor phylogenies, describing the evolution of a tumor, and “classical” phylogenies, representing, for instance, the evolutionary history of the living beings. In particular, we present new techniques to compare two or more tumor phylogenies, needed to evaluate the results of different inference methods, and we give a new, efficient solution to a longstanding problem on “classical” phylogenies: to decide whether, in the presence of missing data, it is possible to arrange a set of species in a phylogenetic tree that enjoys specific properties.
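To make the elastic-degenerate text of the abstract above concrete, here is a minimal sketch of exact pattern matching over one. It is a naive illustration, not the algorithms developed in the thesis. The representation (a list of segments, each a list of alternative strings, with the empty string allowed) and the function name `ed_match` are assumptions made for this example.

```python
def ed_match(segments: list[list[str]], pattern: str) -> bool:
    """Return True if `pattern` occurs in at least one string spelled by
    choosing one alternative from every segment, in order."""
    m = len(pattern)
    if m == 0:
        return True
    # Lengths k (0 < k < m) such that pattern[:k] ends exactly at the
    # boundary of the segments processed so far.
    active: set[int] = set()
    for segment in segments:
        next_active: set[int] = set()
        for s in segment:
            # Case 1: the whole pattern lies inside this alternative.
            if pattern in s:
                return True
            # Case 2: an occurrence starts inside this alternative, so a
            # suffix of s equals a proper prefix of the pattern.
            for k in range(1, min(len(s), m - 1) + 1):
                if s.endswith(pattern[:k]):
                    next_active.add(k)
            # Case 3: extend partial occurrences from earlier segments.
            for k in active:
                rest = pattern[k:]
                if len(s) >= len(rest):
                    if s.startswith(rest):
                        return True
                elif rest.startswith(s):
                    # Also covers s == "", which carries k over unchanged.
                    next_active.add(k + len(s))
        active = next_active
    return False


# Hypothetical text spelling ACGCA, ACGTT, ACTCA, ACTTT, ACCA and ACTT.
text = [["AC"], ["G", "T", ""], ["CA", "TT"]]
print(ed_match(text, "CTT"))   # True
print(ed_match(text, "ACCA"))  # True
print(ed_match(text, "GG"))    # False
```

The sketch rescans every alternative for every partial match, so it only shows what a match means in such a text and makes no claim to efficiency.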
Chakraborty, Ushashi. "Finding the Most Predictive Data Source in Biological Data." Thesis, North Dakota State University, 2013. https://hdl.handle.net/10365/26567.
Gel, Moreno Bernat. "Dissemination and visualisation of biological data." Doctoral thesis, Universitat Politècnica de Catalunya, 2014. http://hdl.handle.net/10803/283143.
Recent technological improvements have led to an explosion in the amount of biological data being generated and to new challenges in biological data management. To maximise the knowledge we can extract from these huge amounts of data, we need to solve the problems associated with their analysis, and in particular with their dissemination and visualisation. Sharing these data freely and at no cost can greatly benefit the scientific community and society in general, but doing so requires new tools and techniques. Today, many groups are able to generate large datasets, and publishing them can greatly increase their scientific value. Moreover, the availability of large datasets is necessary for the development of new analysis algorithms. It is therefore important that the biological data being generated are accessible in a simple, standardised and free manner. Dissemination: The Distributed Annotation System (DAS) is a protocol designed for publishing and integrating annotations on biological entities in a distributed way. DAS follows a client-server scheme, in which the client obtains data from one or more servers in order to combine, process or visualise them. Today, however, setting up a DAS server requires knowledge and infrastructure beyond the resources of many research groups. For this reason we created easyDAS, a platform for the automatic creation of DAS servers. With easyDAS a user can create a DAS server through a simple web interface and with only a few clicks. Visualisation: Genome browsers are one of the most widely used paradigms for visualising genomic data, and they allow datasets positioned along a sequence to be viewed. The data can be explored by moving along this sequence.
When this project started, in 2007, all major genome browsers offered limited interactivity based on the use of buttons. From an architectural point of view, all web-based browsers were very similar: a simple client in charge of showing images and a complex server in charge of obtaining the data, processing them and generating the images. Thus, every change in the visualisation parameters required a new request to the server, which had a strong negative impact on the perceived response speed. We created a prototype genome browser called GenExp. It is an interactive web-based browser that uses canvas to draw on the client and that offers direct manipulation of the genome representation. GenExp also has some unique features, such as the possibility of creating multiple visualisation windows or of saving and sharing browsing sessions. In addition, since it is a DAS client, it can integrate data from any DAS server, such as those of Ensembl, UCSC or even those created with easyDAS. Furthermore, we developed jsDAS, the first complete DAS client library written in JavaScript. jsDAS can be integrated into any DAS application to give it the ability to access data from DAS servers. All the software developed in the context of this thesis is freely available under an open-source licence.
Droop, Alastair Philip. "Correlation Analysis of Multivariate Biological Data." Thesis, University of York, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.507622.
McCormick, Paul Stephen. "Statistical analysis of biological expression data." Thesis, University of Cambridge, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.613819.
Bokhari, Yahya. "DISCOVERING DRIVER MUTATIONS IN BIOLOGICAL DATA." VCU Scholars Compass, 2018. https://scholarscompass.vcu.edu/etd/5637.
Luo, Jun. "Mining algorithms for generic and biological data." [Gainesville, Fla.]: University of Florida, 2002. http://purl.fcla.edu/fcla/etd/UFE0000567.
Zou, Cunlu. "Applications of Granger causality to biological data." Thesis, University of Warwick, 2010. http://wrap.warwick.ac.uk/35694/.
Waterworth, Alan Richard. "Data analysis techniques of measured biological impedance." Thesis, University of Sheffield, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.340146.
Styczynski, Mark Philip-Walter. "Applications of motif discovery in biological data." Thesis, Massachusetts Institute of Technology, 2007. http://hdl.handle.net/1721.1/38976.
Includes bibliographical references (p. 437-458).
Sequential motif discovery, the ability to identify conserved patterns in ordered datasets without a priori knowledge of exactly what those patterns will be, is a frequently encountered and difficult problem in computational biology and biochemical engineering. The most prevalent example of such a problem is finding conserved DNA sequences in the upstream regions of genes that are believed to be coregulated. Other examples are as diverse as identifying conserved secondary structure in proteins and interpreting time-series data. This thesis creates a unified, generic approach to addressing these (and other) problems in sequential motif discovery and demonstrates the utility of that approach on a number of applications. A generic motif discovery algorithm was created for the purpose of finding conserved patterns in arbitrary data types. This approach and implementation, named Gemoda, decouples three key steps in the motif discovery process: comparison, clustering, and convolution. Since it decouples these steps, Gemoda is a modular algorithm; that is, any comparison metric can be used with any clustering algorithm and any convolution scheme. The comparison metric is a data-specific function that transforms the motif discovery problem into a solvable graph-theoretic problem that still adequately represents the important similarities in the data.
This thesis presents the development of Gemoda as well as applications of this approach in a number of different contexts. One application is an exhaustive solution of an abstraction of the transcription factor binding site discovery problem in DNA. A similar application is to the analysis of upstream regions of regulons in microbial DNA. Another application is the identification of protein sequence homologies in a set of related proteins in the presence of significant noise. A quite different application is the discovery of extended local secondary structure homology between a protein and a protein complex known to be in the same structural family. The final application is to the analysis of metabolomic datasets. The diversity of these sample applications, which range from the analysis of strings (like DNA and amino acid sequences) to real-valued data (like protein structures and metabolomic datasets) demonstrates that our generic approach is successful and useful for solving established and novel problems alike. The last application, of analyzing metabolomic datasets, is of particular interest. Using Gemoda, an appropriate comparison function, and appropriate data handling, a novel and useful approach to the interpretation of metabolite profiling datasets obtained from gas chromatography coupled to mass spectrometry is developed.
The use of a motif discovery approach allows for the expansion of the scope of metabolites that can be tracked and analyzed in an untargeted metabolite profiling (or metabolomic) experiment. This new approach, named SpectConnect, is presented herein along with examples that verify its efficacy and utility in some validation experiments. The beginning of a broader application of SpectConnect's potential is presented as well. The success of SpectConnect, a novel application of Gemoda, validates the utility of a truly generic approach to motif discovery. By not getting bogged down in the specifics of a type of data and a problem unique to that type of data, a broader class of problems can be addressed that otherwise would have been extremely difficult to handle.
by Mark Philip-Walter Styczynski.
Ph.D.
Scelfo, Tony (Tony W.). "Data visualization of biological microscopy image analyses." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/37073.
Includes bibliographical references.
The Open Microscopy Environment (OME) provides biologists with a framework to store, analyze and manipulate large sets of image data. Current microscopes are capable of generating large numbers of images and when coupled with automated analysis routines, researchers are able to generate intractable sets of data. I have developed an extension to the OME toolkit, named the LoViewer, which allows researchers to quickly identify clusters of images based on relationships between analytically measured parameters. By identifying unique subsets of data, researchers are able to make use of the rest of the OME client software to view interesting images in high resolution, classify them into category groups and apply further analysis routines. The design of the LoViewer itself and its integration with the rest of the OME toolkit will be discussed in detail in the body of this thesis.
by Tony Scelfo.
M.Eng. and S.B.
Becker, Katinka [Verfasser]. "Logical Analysis of Biological Data / Katinka Becker." Berlin : Freie Universität Berlin, 2021. http://d-nb.info/1241541779/34.
Slotta, Douglas J. "Evaluating Biological Data Using Rank Correlation Methods." Diss., Virginia Tech, 2005. http://hdl.handle.net/10919/27613.
Ph. D.
Anderson, Sarah G. "Statistical Methods for Biological and Relational Data." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1365441350.
Eren, Kemal. "Application of biclustering algorithms to biological data." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1332533492.
Mahammad, Beigi Majid. "Kernel methods for high-dimensional biological data." [S.l. : s.n.], 2008.
REHMAN, HAFEEZ UR. "Integration and Analysis of Heterogeneous Biological Data." Doctoral thesis, Politecnico di Torino, 2014. http://hdl.handle.net/11583/2537092.
Li, Honghao. "Interpretable biological network reconstruction from observational data." Electronic Thesis or Diss., Université Paris Cité, 2021. http://www.theses.fr/2021UNIP5207.
This thesis is focused on constraint-based methods, one of the basic types of causal structure learning algorithms. We use the PC algorithm as a representative, for which we propose a simple and general modification that is applicable to any PC-derived method. The modification ensures that all separating sets used during the skeleton reconstruction step to remove edges between conditionally independent variables remain consistent with respect to the final graph. It consists in iterating the structure learning algorithm while restricting the search of separating sets to those that are consistent with respect to the graph obtained at the end of the previous iteration. The restriction can be achieved with limited computational complexity with the help of block-cut tree decomposition of the graph skeleton. The enforcement of separating set consistency is found to increase the recall of constraint-based methods at the cost of precision, while keeping similar or better overall performance. It also improves the interpretability and explainability of the obtained graphical model. We then introduce the recently developed constraint-based method MIIC, which adopts ideas from the maximum likelihood framework to improve the robustness and overall performance of the obtained graph. We discuss the characteristics and the limitations of MIIC, and propose several modifications that emphasize the interpretability of the obtained graph and the scalability of the algorithm. In particular, we implement the iterative approach to enforce separating set consistency, opt for a conservative rule of orientation, and exploit the orientation probability feature of MIIC to extend the edge notation in the final graph to illustrate different causal implications. The MIIC algorithm is applied to a dataset of about 400 000 breast cancer records from the SEER database, as a large-scale real-life benchmark.
Pustułka-Hunt, Elżbieta Katarzyna. "Biological sequence indexing using persistent Java." Thesis, University of Glasgow, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.270957.
Li, Yehua. "Topics in functional data analysis with biological applications." [College Station, Tex. : Texas A&M University, 2006. http://hdl.handle.net/1969.1/ETD-TAMU-1867.
Scholz, Matthias. "Approaches to analyse and interpret biological profile data." Phd thesis, [S.l.] : [s.n.], 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=980988799.
Mathur, Sachin Dinakarpandian Deendayal. "Assessing biological significance of clusters of microarray data." Diss., UMK access, 2004.
"A thesis in computer science." Typescript. Advisor: Deendayal Dinakarpandian. Vita. Title from "catalog record" of the print edition. Description based on contents viewed Feb. 27, 2006. Includes bibliographical references (leaves 35-36). Online version of the print edition.
Iacucci, Ernesto. "Ontological characterization of high through-put biological data." Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=84102.
Anastasiadis, Aristoklis. "Neural networks training and applications using biological data." Thesis, Birkbeck (University of London), 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.428055.
Yang, L. "Optimisation approaches for data mining in biological systems." Thesis, University College London (University of London), 2016. http://discovery.ucl.ac.uk/1473809/.
Shrestha, Anuj. "Association Rule Mining of Biological Field Data Sets." Thesis, North Dakota State University, 2017. https://hdl.handle.net/10365/28394.
Bioinformatics Seed Grant Program NIH/UND
National Science Foundation (NSF) Grant IIA-1355466
Chen, Li. "Searching for significant feature interaction from biological data." Diss., Online access via UMI:, 2007.
Dang, Vinh Q. "Evolutionary approaches for feature selection in biological data." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2014. https://ro.ecu.edu.au/theses/1276.
Causey, Jason L. "Studying Low Complexity Structures in Bioinformatics Data Analysis of Biological and Biomedical Data." Thesis, University of Arkansas at Little Rock, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10750808.
Biological, biomedical, and radiological data tend to be large, complex, and noisy. Gene expression studies contain expression levels for thousands of genes and hundreds or thousands of patients. Chest Computed Tomography images used for diagnosing lung cancer consist of hundreds of 2-D image “slices”, each containing hundreds of thousands of pixels. Beneath the size and apparent complexity of many of these data are simple and sparse structures. These low complexity structures can be leveraged into new approaches to biological, biomedical, and radiological data analyses. Two examples are presented here. First, we present SparRec (Sparse Recovery), a new framework for imputation of GWAS data based on a matrix completion (MC) model that takes advantage of the low rank and low number of co-clusters of GWAS matrices. SparRec is flexible enough to impute meta-analyses with multiple cohorts genotyped on different sets of SNPs, even without a reference panel. Compared with Mendel-Impute, another MC method, our low-rank based method achieves similar accuracy and efficiency even with up to 90% missing data; our co-clustering based method has advantages in running time. MC methods are shown to have advantages over statistics-based methods, including Beagle and fastPhase. Second, we demonstrate NoduleX, a method for predicting lung nodule malignancy from chest Computed Tomography (CT) data, based on deep convolutional neural networks. For training and validation, we analyze >1000 lung nodules in images from the LIDC/IDRI cohort and compare our results with classifications provided by four experienced thoracic radiologists who participated in the LIDC project. NoduleX achieves high accuracy for nodule malignancy classification, with an AUC of up to 0.99, commensurate with the radiologists’ analysis.
Whether they are leveraged directly or extracted using mathematical optimization and machine learning techniques, low complexity structures provide researchers with powerful tools for taming complex data.
Flöter, André. "Analyzing biological expression data based on decision tree induction." [S.l.] : [s.n.], 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=978444728.
Flöter, André. "Analyzing biological expression data based on decision tree induction." Phd thesis, Universität Potsdam, 2005. http://opus.kobv.de/ubp/volltexte/2006/641/.
Modern biological analysis techniques supply scientists with various forms of data. One category of such data is the so-called "expression data". These data indicate the quantities of biochemical compounds present in tissue samples.
Expression data can now be generated at high speed. This in turn leads to amounts of data no longer analysable by classical statistical techniques. Systems biology is the new field that focuses on the modelling of this information.
At present, various methods are used for this purpose. One superordinate class of these methods is machine learning. Methods of this kind had, until recently, predominantly been used for classification and prediction tasks. This neglected a powerful secondary benefit: the ability to induce interpretable models.
Obtaining such models from data has become a key issue within systems biology. Numerous approaches have been proposed and intensively discussed. This thesis focuses on the examination and exploitation of one basic technique: decision trees.
The concept of comparing sets of decision trees is developed. This method offers the possibility of identifying significant thresholds in continuous or discrete valued attributes through their corresponding set of decision trees. Finding significant thresholds in attributes is a means of identifying states in living organisms. Knowing about states is an invaluable clue to the understanding of dynamic processes in organisms. Applied to metabolite concentration data, the proposed method was able to identify states which were not found with conventional techniques for threshold extraction.
A second approach exploits the structure of sets of decision trees for the discovery of combinatorial dependencies between attributes. Previous work on this issue has focused either on expensive computational methods or on the interpretation of single decision trees, a very limited exploitation of the data. This has led to incomplete or unstable results. That is why a new method is developed that uses sets of decision trees to overcome these limitations.
Both of the introduced methods are available as software tools. They can be applied consecutively or separately. That way they make up a package of analytical tools that usefully supplement existing methods.
By means of these tools, the newly introduced methods were able to confirm existing knowledge and to suggest interesting and new relationships between metabolites.
Kogelnik, Andreas Matthias. "Biological information management with application to human genome data." Diss., Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/15923.
Yu, Yun William. "Compressive algorithms for search and storage in biological data." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/112879.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 187-197).
Disparate biological datasets often exhibit similar well-defined structure; efficient algorithms can be designed to exploit this structure. In this doctoral thesis, we present a framework for similarity search based on entropy and fractal dimension; here, we prove that a clustered search algorithm scales in time with metric entropy (the number of covering hyperspheres), provided the fractal dimension is low. Using these ideas, entropy-scaling versions of standard bioinformatics search tools can be designed, including for small-molecule, metagenomics, and protein structure search. This 'compressive acceleration' approach, which takes advantage of redundancy and sparsity in biological data, can also be leveraged for next-generation sequencing (NGS) read mapping. By pairing together a clustered grouping over similar reads and a homology table for similarities in the human genome, our CORA framework can accelerate all-mapping by several orders of magnitude. Additionally, we present work on filtering empirical base-calling quality scores from NGS data. By using the sparsity of k-mers of sufficient length in the human genome and imposing a human prior through the use of frequent k-mers in a large corpus of human DNA reads, we are able to quickly discard over 90% of the information found in those quality scores while retaining or even improving downstream variant-calling accuracy. This filtering step allows for fast lossy compression of quality scores.
by Yun William Yu.
Ph. D.
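The quality-score filtering described in the abstract above can be illustrated with a small sketch. It is a simplification under stated assumptions, not the thesis's implementation. Only exact k-mer matches are considered, and the function name `smooth_qualities`, the fill character, the read and the k-mer set are hypothetical.

```python
def smooth_qualities(read: str, quals: str, frequent_kmers: set[str],
                     k: int, fill: str = "I") -> str:
    """Replace the quality character at every base covered by a frequent
    k-mer with one constant, so long runs compress well."""
    if len(read) != len(quals):
        raise ValueError("read and quality string must have equal length")
    covered = [False] * len(read)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in frequent_kmers:
            # Bases inside a frequent k-mer are treated as trusted, so
            # their individual quality scores are discarded.
            for j in range(i, i + k):
                covered[j] = True
    return "".join(fill if c else q for c, q in zip(covered, quals))


# Hypothetical read, quality string and frequent 4-mer dictionary.
read = "ACGTACGA"
quals = "#5A?B7C!"
frequent = {"ACGT", "CGTA"}
print(smooth_qualities(read, quals, frequent, k=4))  # IIIII7C!
```

Bases outside every frequent k-mer keep their original scores, which is where disagreement with the corpus, and hence a possible variant, is most likely.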
Chen, Li. "Integrative Modeling and Analysis of High-throughput Biological Data." Diss., Virginia Tech, 2010. http://hdl.handle.net/10919/30192.
Ph. D.
Ha, Sook Shin. "Dimensionality Reduction, Feature Selection and Visualization of Biological Data." Diss., Virginia Tech, 2012. http://hdl.handle.net/10919/77169.
Ph. D.
Su, Wei. "Motif Mining On Structured And Semi-structured Biological Data." Case Western Reserve University School of Graduate Studies / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=case1365089538.
Guo, Junhai. "Statistical significance and biological relevance of microarray data clustering." Cincinnati, Ohio : University of Cincinnati, 2008. http://rave.ohiolink.edu/etdc/view.cgi?acc_num=ucin1204736862.
Ali, S. (Syed). "Employing VLC technology for transmitting data in biological tissue." Master's thesis, University of Oulu, 2019. http://jultika.oulu.fi/Record/nbnfioulu-201905141758.
De Vito, Roberta. "Multi-study factor models for high-dimensional biological data." Doctoral thesis, Università degli studi di Padova, 2016. http://hdl.handle.net/11577/3424398.
High-throughput assays are transforming biological studies. In particular, they generate a rich, complex and varied collection of high-dimensional data. Extracting meaningful information from such data in a systematic way requires a progressive process based on the simultaneous analysis of different resources, studies and technologies. The growing availability of numerous clinical studies on relevant groups and populations, and of diverse genetic studies, gives rise to two categories of factors: those shared by all studies and those specific to each study. To capture these two categories, this thesis proposes a new class of factor analysis models, developed in both a frequentist and a Bayesian framework. In the frequentist approach, an ECM algorithm is proposed for maximum likelihood estimation of the parameters. The thesis also proposes a Bayesian approach to adapt the model to settings with more variables than subjects, p > n. In modelling the dependence between variables, a sparse structure is assumed in order to highlight associations between genes. Both methods made it possible to model the different studies. Moreover, the results made it possible to identify a reproducible biological signal common to all studies, and to remove the part of the variance that obscures this signal.
You, Chang Hun. "Learning patterns in dynamic graphs with application to biological networks." Pullman, Wash. : Washington State University, 2009. http://www.dissertations.wsu.edu/Dissertations/Summer2009/c_you_072309.pdf.
Title from PDF title page (viewed on Aug. 19, 2009). "School of Electrical Engineering and Computer Science." Includes bibliographical references (p. 114-117).
Rodriguez, Palacios Miguel Andres. "Reversed Voodoo Dolls: An exploration of physical visualizations of biological data." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-175796.
Physical visualizations are artefacts that materialize abstract data. They draw on natural human abilities to interact with information in the physical world. These visualizations create opportunities for applications in new domains. To investigate whether physical visualizations can support remote monitoring of biological data, a probe in the form of a reversed voodoo doll was introduced. With a human-like figure, this probe represents a real person. It thereby exploits natural associations with human characteristics and playfully reverses the concept of voodoo dolls. The physical visualizations of biological data are tested from a safety perspective. Two values, heart rate and movement, are measured from a human body to make it possible to monitor a person's condition remotely. In the study, six users are observed as they interact with the probe. The study shows how the users interpret the probe's data and how usage varies with the probe's different modalities. The results suggest that mapping the data to the probe's body parts effectively increased understanding. In addition, the results confirm that using multiple modalities in physical visualizations makes it possible to present information adapted to different real-world situations. The extent to which the voodoo doll conveys a sense of embodiment, and the consequences of the chosen metaphors, are discussed. The conclusion argues that the users' responses and interpretations indicate that the reversed voodoo doll worked as a means of monitoring biological data.
BONOMO, Mariella. "Knowledge Extraction from Biological and Social Graphs." Doctoral thesis, Università degli Studi di Palermo, 2022. https://hdl.handle.net/10447/576508.
Garratt, Jane Annabel. "Morphological data from coccolith images using Fourier power spectra." Thesis, Kingston University, 1992. http://eprints.kingston.ac.uk/20749/.
Zandegiacomo, Cella Alice. "Multiplex network analysis with application to biological high-throughput data." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/10495/.