Дисертації з теми "High Throughput Phenotypic Data"
Оформте джерело за APA, MLA, Chicago, Harvard та іншими стилями
Ознайомтеся з топ-50 дисертацій для дослідження на тему "High Throughput Phenotypic Data".
Біля кожної праці в переліку літератури доступна кнопка «Додати до бібліографії». Скористайтеся нею – і ми автоматично оформимо бібліографічне посилання на обрану працю в потрібному вам стилі цитування: APA, MLA, «Гарвард», «Чикаго», «Ванкувер» тощо.
Також ви можете завантажити повний текст наукової публікації у форматі «.pdf» та прочитати онлайн анотацію до роботи, якщо відповідні параметри наявні в метаданих.
Переглядайте дисертації для різних дисциплін та оформлюйте правильно вашу бібліографію.
Yu, Haipeng. "Designing and modeling high-throughput phenotyping data in quantitative genetics." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/97579.
Повний текст джерелаDoctor of Philosophy
Quantitative genetics aims to bridge the genome to phenome gap. With the advent of genotyping technologies, the genomic information of individuals can be included in a quantitative genetic model. A new challenge is to obtain sufficient and accurate phenotypes in an automated fashion with less human labor and reduced costs. The high-throughput phenotyping (HTP) technologies have emerged recently, opening a new opportunity to address this challenge. However, there is a paucity of research in phenotyping design and modeling high-dimensional HTP data. The main themes of this dissertation are 1) genomic connectedness that could potentially be used as a means to design a phenotyping experiment and 2) a novel statistical approach that aims to handle high-dimensional HTP data. In the first three studies, I first compared genomic connectedness with pedigree-based connectedness. This was followed by investigating the relationship between genomic connectedness and prediction accuracy derived from cross-validation. Additionally, I developed a connectedness R package that implements a variety of connectedness measures. The fourth study investigated a novel statistical approach by leveraging the combination of dimension reduction and graphical models to understand the interrelationships among high-dimensional HTP data.
Manrique, Tito. "Functional linear regression models : application to high-throughput plant phenotyping functional data." Thesis, Montpellier, 2016. http://www.theses.fr/2016MONTT264/document.
Повний текст джерелаFunctional data analysis (FDA) is a statistical branch that is increasingly being used in many applied scientific fields such as biological experimentation, finance, physics, etc. A reason for this is the use of new data collection technologies that increase the number of observations during a time interval.Functional datasets are realization samples of some random functions which are measurable functions defined on some probability space with values in an infinite dimensional functional space.There are many questions that FDA studies, among which functional linear regression is one of the most studied, both in applications and in methodological development.The objective of this thesis is the study of functional linear regression models when both the covariate X and the response Y are random functions and both of them are time-dependent. In particular we want to address the question of how the history of a random function X influences the current value of another random function Y at any given time t.In order to do this we are mainly interested in three models: the functional concurrent model (FCCM), the functional convolution model (FCVM) and the historical functional linear model. In particular for the FCVM and FCCM we have proposed estimators which are consistent, robust and which are faster to compute compared to others already proposed in the literature.Our estimation method in the FCCM extends the Ridge Regression method developed in the classical linear case to the functional data framework. We prove the probability convergence of this estimator, obtain a rate of convergence and develop an optimal selection procedure of theregularization parameter.The FCVM allows to study the influence of the history of X on Y in a simple way through the convolution. In this case we use the continuous Fourier transform operator to define an estimator of the functional coefficient. This operator transforms the convolution model into a FCCM associated in the frequency domain. The consistency and rate of convergence of the estimator are derived from the FCCM.The FCVM can be generalized to the historical functional linear model, which is itself a particular case of the fully functional linear model. Thanks to this we have used the Karhunen–Loève estimator of the historical kernel. The related question about the estimation of the covariance operator of the noise in the fully functional linear model is also treated.Finally we use all the aforementioned models to study the interaction between Vapour Pressure Deficit (VPD) and Leaf Elongation Rate (LER) curves. This kind of data is obtained with high-throughput plant phenotyping platform and is well suited to be studied with FDA methods
Paszkowski-Rogacz, Maciej. "Integration and analysis of phenotypic data from functional screens." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2011. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-63063.
Повний текст джерелаXue, Zeyun. "Integration of high-throughput phenotyping and genomics data to explore Arabidopsis natural variation." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASB001.
Повний текст джерелаNitrogen and water are crucial for plant survival as well as for crop yield, however the molecular mechanisms that plants mobilise to respond to Nitrogen (N) or Water (W) deficiency and their combination still remain partly unknown. The interconnections between water status and N availability have drawn much attention. Given their critical importance, it is of great importance to dissect the role of each stress in the combined stress. We here address the question of how mild drought and nitrogen stress responses are integrated and how they impaired rosette growth and plant metabolism. In this thesis, a systematic investigation was performed to understand how the N deficiency and drought conjugate to shape dynamic rosette growth in Arabidopsis. We integrated transcriptome and metabolomic data to draw a holistic view of drought x N-deficiency interactions. Moreover, as a case study, 5 highly divergent accessions were used to investigate how genetic components regulate stress responses, in other words, GxWxN interactions. Evaluation of drought, N deficiency and combined stress transcriptomes and metabolomes revealed shared and stress-specific response signatures that were conserved primarily across genotypes, although many more genotype-specific responses also were uncovered. The accession-specific transcriptome adjustments and metabolic profile reflected distinct physiological basal status, such as those of Col-0 and Tsu-0. We also found a subset of stress-responsive genes that are responsible for fine-tuning combined stress response, such as ROXYs, TAR4, NRT2.5, GLN1;4. In addition, we integrated transcriptomic and metabolomic data to construct a multi-omics regulatory network. Two drought stress-responsive metabolites, Raffinose and Myoinositol were highlighted by integrative analysis showing shared N-deficiency patterns in 5 accessions. This study provides molecular resolution of genetic variation in combined stress responses involving interactions between N-deficiency and drought stress and illustrates respective transcriptome and metabolome plasticity. Moreover, large-scale GWA analysis using worldwide populations was conducted to decipher the genetic architecture at the metabolic level and provide links between the metabolomic plasticity and phenotypic diversity behind local adaptation. In addition, this extends our vision of the diversity at the species scale. The comparison of GWA analysis based on regional-scale population and species-wide population also sheds light on how population structure can limit the detection power of GWA analysis
Mack, Jennifer [Verfasser]. "Constraint-based automated reconstruction of grape bunches from 3D range data for high-throughput phenotyping / Jennifer Mack." Bonn : Universitäts- und Landesbibliothek Bonn, 2019. http://d-nb.info/1200020081/34.
Повний текст джерелаMervin, Lewis. "Improved in silico methods for target deconvolution in phenotypic screens." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/283004.
Повний текст джерелаRoguski, Łukasz 1987. "High-throughput sequencing data compression." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/565775.
Повний текст джерелаGràcies als avenços en el camp de les tecnologies de seqüenciació, en els darrers anys la recerca biomèdica ha viscut una revolució, que ha tingut com un dels resultats l'explosió del volum de dades genòmiques generades arreu del món. La mida típica de les dades de seqüenciació generades en experiments d'escala mitjana acostuma a situar-se en un rang entre deu i cent gigabytes, que s'emmagatzemen en diversos arxius en diferents formats produïts en cada experiment. Els formats estàndards actuals de facto de representació de dades genòmiques són en format textual. Per raons pràctiques, les dades necessiten ser emmagatzemades en format comprimit. En la majoria dels casos, aquests mètodes de compressió es basen en compressors de text de caràcter general, com ara gzip. Amb tot, no permeten explotar els models d'informació especifícs de dades de seqüenciació. És per això que proporcionen funcionalitats limitades i estalvi insuficient d'espai d'emmagatzematge. Això explica per què operacions relativament bàsiques, com ara el processament, l'emmagatzematge i la transferència de dades genòmiques, s'han convertit en un dels principals obstacles de processos actuals d'anàlisi. Per tot això, aquesta tesi se centra en mètodes d'emmagatzematge i compressió eficients de dades generades en experiments de sequenciació. En primer lloc, proposem un compressor innovador d'arxius FASTQ de propòsit general. A diferència de gzip, aquest compressor permet reduir de manera significativa la mida de l'arxiu resultant del procés de compressió. A més a més, aquesta eina permet processar les dades a una velocitat alta. A continuació, presentem mètodes de compressió que fan ús de l'alta redundància de seqüències present en les dades de seqüenciació. Aquests mètodes obtenen la millor ratio de compressió d'entre els compressors FASTQ del marc teòric actual, sense fer ús de cap referència externa. També mostrem aproximacions de compressió amb pèrdua per emmagatzemar dades de seqüenciació auxiliars, que permeten reduir encara més la mida de les dades. En últim lloc, aportem un sistema flexible de compressió i un format de dades. Aquest sistema fa possible generar de manera semi-automàtica solucions de compressió que no estan lligades a cap mena de format específic d'arxius de dades genòmiques. Per tal de facilitar la gestió complexa de dades, diversos conjunts de dades amb formats heterogenis poden ser emmagatzemats en contenidors configurables amb l'opció de dur a terme consultes personalitzades sobre les dades emmagatzemades. A més a més, exposem que les solucions simples basades en el nostre sistema poden obtenir resultats comparables als compressors de format específic de l'estat de l'art. En resum, les solucions desenvolupades i descrites en aquesta tesi poden ser incorporades amb facilitat en processos d'anàlisi de dades genòmiques. Si prenem aquestes solucions conjuntament, aporten una base sòlida per al desenvolupament d'aproximacions completes encaminades a l'emmagatzematge i gestió eficient de dades genòmiques.
Prinz, zu Salm-Horstmar Maximilian Philipp Albrecht. "The Chromosome 8p23.1 Inversion : High-Throughput Detection & Investigation of Phenotypic Impact." Thesis, Imperial College London, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.516479.
Повний текст джерелаJin, Shuangshuang. "Integrated data modeling in high-throughput proteomices." Online access for everyone, 2007. http://www.dissertations.wsu.edu/Dissertations/Fall2007/S_Jin_111907.pdf.
Повний текст джерелаCapparuccini, Maria. "Inferential Methods for High-Throughput Methylation Data." VCU Scholars Compass, 2010. http://scholarscompass.vcu.edu/etd/156.
Повний текст джерелаDurif, Ghislain. "Multivariate analysis of high-throughput sequencing data." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE1334/document.
Повний текст джерелаThe statistical analysis of Next-Generation Sequencing data raises many computational challenges regarding modeling and inference, especially because of the high dimensionality of genomic data. The research work in this manuscript concerns hybrid dimension reduction methods that rely on both compression (representation of the data into a lower dimensional space) and variable selection. Developments are made concerning: the sparse Partial Least Squares (PLS) regression framework for supervised classification, and the sparse matrix factorization framework for unsupervised exploration. In both situations, our main purpose will be to focus on the reconstruction and visualization of the data. First, we will present a new sparse PLS approach, based on an adaptive sparsity-inducing penalty, that is suitable for logistic regression to predict the label of a discrete outcome. For instance, such a method will be used for prediction (fate of patients or specific type of unidentified single cells) based on gene expression profiles. The main issue in such framework is to account for the response to discard irrelevant variables. We will highlight the direct link between the derivation of the algorithms and the reliability of the results. Then, motivated by questions regarding single-cell data analysis, we propose a flexible model-based approach for the factorization of count matrices, that accounts for over-dispersion as well as zero-inflation (both characteristic of single-cell data), for which we derive an estimation procedure based on variational inference. In this scheme, we consider probabilistic variable selection based on a spike-and-slab model suitable for count data. The interest of our procedure for data reconstruction, visualization and clustering will be illustrated by simulation experiments and by preliminary results on single-cell data analysis. All proposed methods were implemented into two R-packages "plsgenomics" and "CMF" based on high performance computing
Zhang, Xuekui. "Mixture models for analysing high throughput sequencing data." Thesis, University of British Columbia, 2011. http://hdl.handle.net/2429/35982.
Повний текст джерелаHoffmann, Steve. "Genome Informatics for High-Throughput Sequencing Data Analysis." Doctoral thesis, Universitätsbibliothek Leipzig, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-152643.
Повний текст джерелаDiese Arbeit stellt drei verschiedene algorithmische und statistische Strategien für die Analyse von Hochdurchsatz-Sequenzierungsdaten vor. Zuerst führen wir eine auf enhanced Suffixarrays basierende heuristische Methode ein, die kurze Sequenzen mit grossen Genomen aligniert. Die Methode basiert auf der Idee einer fehlertoleranten Traversierung eines Suffixarrays für Referenzgenome in Verbindung mit dem Konzept der Matching-Statistik von Chang und einem auf Bitvektoren basierenden Alignmentalgorithmus von Myers. Die vorgestellte Methode unterstützt Paired-End und Mate-Pair Alignments, bietet Methoden zur Erkennung von Primersequenzen und zum trimmen von Poly-A-Signalen an. Auch in unabhängigen Benchmarks zeichnet sich das Verfahren durch hohe Sensitivität und Spezifität in simulierten und realen Datensätzen aus. Für eine große Anzahl von Sequenzierungsprotokollen erzielt es bessere Ergebnisse als andere bekannte Short-Read Alignmentprogramme. Zweitens stellen wir einen auf dynamischer Programmierung basierenden Algorithmus für das spliced alignment problem vor. Der Vorteil dieses Algorithmus ist seine Fähigkeit, nicht nur kollineare Spleiß- Ereignisse, d.h. Spleiß-Ereignisse auf dem gleichen genomischen Strang, sondern auch zirkuläre und andere nicht-kollineare Spleiß-Ereignisse zu identifizieren. Das Verfahren zeichnet sich durch eine hohe Genauigkeit aus: während es bei der Erkennung kollinearer Spleiß-Varianten vergleichbare Ergebnisse mit anderen Methoden erzielt, schlägt es die Wettbewerber mit Blick auf Sensitivität und Spezifität bei der Vorhersage nicht-kollinearer Spleißvarianten. Die Anwendung dieses Algorithmus führte zur Identifikation neuer Isoformen. In unserer Publikation berichten wir über eine neue Isoform des Tumorsuppressorgens p53. Da dieses Gen eines der am besten untersuchten Gene des menschlichen Genoms ist, könnte die Anwendung unseres Algorithmus helfen, eine Vielzahl weiterer Isoformen bei weniger prominenten Genen zu identifizieren. Drittens stellen wir ein datenadaptives Modell zur Identifikation von Single Nucleotide Variations (SNVs) vor. In unserer Arbeit zeigen wir, dass sich unser auf empirischen log-likelihoods basierendes Modell automatisch an die Qualität der Sequenzierungsexperimente anpasst und eine \"Entscheidung\" darüber trifft, welche potentiellen Variationen als SNVs zu klassifizieren sind. In unseren Simulationen ist diese Methode auf Augenhöhe mit aktuell eingesetzten Verfahren. Schließlich stellen wir eine Auswahl biologischer Ergebnisse vor, die mit den Besonderheiten der präsentierten Alignmentverfahren in Zusammenhang stehen
Stromberg, Michael Peter. "Enabling high-throughput sequencing data analysis with MOSAIK." Thesis, Boston College, 2010. http://hdl.handle.net/2345/1332.
Повний текст джерелаDuring the last few years, numerous new sequencing technologies have emerged that require tools that can process large amounts of read data quickly and accurately. Regardless of the downstream methods used, reference-guided aligners are at the heart of all next-generation analysis studies. I have developed a general reference-guided aligner, MOSAIK, to support all current sequencing technologies (Roche 454, Illumina, Applied Biosystems SOLiD, Helicos, and Sanger capillary). The calibrated alignment qualities calculated by MOSAIK allow the user to fine-tune the alignment accuracy for a given study. MOSAIK is a highly configurable and easy-to-use suite of alignment tools that is used in hundreds of labs worldwide. MOSAIK is an integral part of our genetic variant discovery pipeline. From SNP and short-INDEL discovery to structural variation discovery, alignment accuracy is an essential requirement and enables our downstream analyses to provide accurate calls. In this thesis, I present three major studies that were formative during the development of MOSAIK and our analysis pipeline. In addition, I present a novel algorithm that identifies mobile element insertions (non-LTR retrotransposons) in the human genome using split-read alignments in MOSAIK. This algorithm has a low false discovery rate (4.4 %) and enabled our group to be the first to determine the number of mobile elements that differentially occur between any two individuals
Thesis (PhD) — Boston College, 2010
Submitted to: Boston College. Graduate School of Arts and Sciences
Discipline: Biology
Xing, Zhengrong. "Poisson multiscale methods for high-throughput sequencing data." Thesis, The University of Chicago, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10195268.
Повний текст джерелаIn this dissertation, we focus on the problem of analyzing data from high-throughput sequencing experiments. With the emergence of more capable hardware and more efficient software, these sequencing data provide information at an unprecedented resolution. However, statistical methods developed for such data rarely tackle the data at such high resolutions, and often make approximations that only hold under certain conditions.
We propose a model-based approach to dealing with such data, starting from a single sample. By taking into account the inherent structure present in such data, our model can accurately capture important genomic regions. We also present the model in such a way that makes it easily extensible to more complicated and biologically interesting scenarios.
Building upon the single-sample model, we then turn to the statistical question of detecting differences between multiple samples. Such questions often arise in the context of expression data, where much emphasis has been put on the problem of detecting differential expression between two groups. By extending the framework for a single sample to incorporate additional group covariates, our model provides a systematic approach to estimating and testing for such differences. We then apply our method to several empirical datasets, and discuss the potential for further applications to other biological tasks.
We also seek to address a different statistical question, where the goal here is to perform exploratory analysis to uncover hidden structure within the data. We incorporate the single-sample framework into a commonly used clustering scheme, and show that our enhanced clustering approach is superior to the original clustering approach in many ways. We then apply our clustering method to a few empirical datasets and discuss our findings.
Finally, we apply the shrinkage procedure used within the single-sample model to tackle a completely different statistical issue: nonparametric regression with heteroskedastic Gaussian noise. We propose an algorithm that accurately recovers both the mean and variance functions given a single set of observations, and demonstrate its advantages over state-of-the art methods through extensive simulation studies.
Wang, Yuanyuan (Marcia). "Statistical Methods for High Throughput Screening Drug Discovery Data." Thesis, University of Waterloo, 2005. http://hdl.handle.net/10012/1204.
Повний текст джерелаClassification methods are commonly proposed as solutions to this problem. However, regarding drug discovery, researchers are more interested in ranking compounds by predicted activity than in the classification itself. This feature makes my approach distinct from common classification techniques.
In this thesis, two AIDS data sets from the National Cancer Institute (NCI) are mainly used. Local methods, namely K-nearest neighbours (KNN) and classification and regression trees (CART), perform very well on these data in comparison with linear/logistic regression, neural networks, and Multivariate Adaptive Regression Splines (MARS) models, which assume more smoothness. One reason for the superiority of local methods is the local behaviour of the data. Indeed, I argue that conventional classification criteria such as misclassification rate or deviance tend to select too small a tree or too large a value of k (the number of nearest neighbours). A more local model (bigger tree or smaller k) gives a better performance in terms of drug discovery.
Because off-the-shelf KNN works relatively well, this thesis takes this promising method and makes several novel modifications, which further improve its performance. The choice of k is optimized for each test point to be predicted. The empirically observed superiority of allowing k to vary is investigated. The nature of the problem, ranking of objects rather than estimating the probability of activity, enables the k-varying algorithm to stand out. Similarly, KNN combined with a kernel weight function (weighted KNN) is proposed and demonstrated to be superior to the regular KNN method.
High dimensionality of the explanatory variables is known to cause problems for KNN and many other classifiers. I propose a novel method (subset KNN) of averaging across multiple classifiers based on building classifiers on subspaces (subsets of variables). It improves the performance of KNN for HTS data. When applied to CART, it also performs as well as or even better than the popular methods of bagging and boosting. Part of this improvement is due to the discovery that classifiers based on irrelevant subspaces (unimportant explanatory variables) do little damage when averaged with good classifiers based on relevant subspaces (important variables). This result is particular to the ranking of objects rather than estimating the probability of activity. A theoretical justification is proposed. The thesis also suggests diagnostics for identifying important subsets of variables and hence further reducing the impact of the curse of dimensionality.
In order to have a broader evaluation of these methods, subset KNN and weighted KNN are applied to three other data sets: the NCI AIDS data with Constitutional descriptors, Mutagenicity data with BCUT descriptors and Mutagenicity data with Constitutional descriptors. The k-varying algorithm as a method for unbalanced data is also applied to NCI AIDS data with Constitutional descriptors. As a baseline, the performance of KNN on such data sets is reported. Although different methods are best for the different data sets, some of the proposed methods are always amongst the best.
Finally, methods are described for estimating activity rates and error rates in HTS data. By combining auxiliary information about repeat tests of the same compound, likelihood methods can extract interesting information about the magnitudes of the measurement errors made in the assay process. These estimates can be used to assess model performance, which sheds new light on how various models handle the large random or systematic assay errors often present in HTS data.
Yang, Yang. "Data mining support for high-throughput discovery of nanomaterials." Thesis, University of Leeds, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.577527.
Повний текст джерелаBirchall, Kristian. "Reduced graph approaches to analysing high-throughput screening data." Thesis, University of Sheffield, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.443869.
Повний текст джерелаFritz, Markus Hsi-Yang. "Exploiting high throughput DNA sequencing data for genomic analysis." Thesis, University of Cambridge, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.610819.
Повний текст джерелаWoolford, Julie Ruth. "Statistical analysis of small RNA high-throughput sequencing data." Thesis, University of Cambridge, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.610375.
Повний текст джерелаChen, Li. "Integrative Modeling and Analysis of High-throughput Biological Data." Diss., Virginia Tech, 2010. http://hdl.handle.net/10919/30192.
Повний текст джерелаPh. D.
Kannan, Anusha Aiyalu. "Detecting relevant changes in high throughput gene expression data /." Online version of thesis, 2008. http://hdl.handle.net/1850/10832.
Повний текст джерелаShan, Jing. "High-Throughput Identification of Molecular Factors that Promote Phenotypic Stabilization of Primary Human Hepatocytes in vitro." Thesis, Harvard University, 2016. http://nrs.harvard.edu/urn-3:HUL.InstRepos:27007729.
Повний текст джерелаŠupraha, Luka. "Phenotypic evolution and adaptive strategies in marine phytoplankton (Coccolithophores)." Doctoral thesis, Uppsala universitet, Paleobiologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-302903.
Повний текст джерелаLu, Feng. "Big data scalability for high throughput processing and analysis of vehicle engineering data." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-207084.
Повний текст джерелаZandegiacomo, Cella Alice. "Multiplex network analysis with application to biological high-throughput data." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/10495/.
Повний текст джерелаBleuler, Stefan. "Search heuristics for module identification from biological high-throughput data /." [S.l.] : [s.n.], 2008. http://e-collection.ethbib.ethz.ch/show?type=diss&nr=17386.
Повний текст джерелаKircher, Martin. "Understanding and improving high-throughput sequencing data production and analysis." Doctoral thesis, Universitätsbibliothek Leipzig, 2011. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-71102.
Повний текст джерелаMohamadi, Hamid. "Parallel algorithms and software tools for high-throughput sequencing data." Thesis, University of British Columbia, 2017. http://hdl.handle.net/2429/62072.
Повний текст джерелаScience, Faculty of
Graduate
Gustafsson, Mika. "Gene networks from high-throughput data : Reverse engineering and analysis." Doctoral thesis, Linköpings universitet, Kommunikations- och transportsystem, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-54089.
Повний текст джерелаCunningham, Gordon John. "Application of cluster analysis to high-throughput multiple data types." Thesis, University of Glasgow, 2011. http://theses.gla.ac.uk/2715/.
Повний текст джерелаAinsworth, David. "Computational approaches for metagenomic analysis of high-throughput sequencing data." Thesis, Imperial College London, 2016. http://hdl.handle.net/10044/1/44070.
Повний текст джерелаLasher, Christopher Donald. "Discovering contextual connections between biological processes using high-throughput data." Diss., Virginia Tech, 2011. http://hdl.handle.net/10919/77217.
Повний текст джерелаPh. D.
Doineau, Raphaël. "Development of droplet-based microfluidic technology for high-throughput single-cell phenotypic screening of B cell repertoires." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCC263/document.
Повний текст джерелаThe adaptive immune system plays a leading role in defense against infection. The humoral response, involving the production of antibodies, is an important component of the adaptive immune response. During an infection, specific B cells of the immune system proliferate and release large amounts of antibodies which bind selectively to the target protein (antigen) found on the invading pathogen, inducing destruction of the pathogen. However, the immune system does not always respond efficiently enough to destroy pathogens, and tolerance mechanisms prevent the generation of antibodies against human protein - such as cell surface markers on cancer cells or cytokines involved in inflammatory and autoimmune disease - that could be important therapeutic targets. Hence, there is great interest in research and development of specific antibodies that can be used for immunotherapy of patients. Due to their high affinity and selective binding to antigens, monoclonal antibodies (mAbs) have emerged as powerful therapeutic agents. Monoclonal antibodies derived from single B cells have a unique sequence and display binding affinity for a specific antigen. However, until now, the discovery of mAbs has been limited by the lack of high-throughput methods for the direct and large-scale screening of non-immortalized primary B cells to uncover rare B cells which produce the specific antibodies of clinical interest. This is now becoming possible with the emergence and improvement of in vitro compartmentalization methods for single-cell encapsulation and screening in picoliter droplets. In my PhD project, I describe the development of binding immunoassays and microfluidic devices for the direct phenotypic screening of single-cells from enriched B cell populations. This development has enabled detailed analysis of the humoral immune response, with single-cell resolution and is an essential component of an antibody-discovery pipeline coupling single-cell phenotypic screening to single-cell antibody sequencing. It is now possible, for the first time, to screen millions of single B cells based on the binding activity of the secreted antibodies and to recover the antibody sequences
Hänzelmann, Sonja 1981. "Pathway-centric approaches to the analysis of high-throughput genomics data." Doctoral thesis, Universitat Pompeu Fabra, 2012. http://hdl.handle.net/10803/108337.
Повний текст джерелаEn l'última dècada, la biologia molecular ha evolucionat des d'una perspectiva reduccionista cap a una perspectiva a nivell de sistemes que intenta desxifrar les complexes interaccions entre els components cel•lulars. Amb l'aparició de les tecnologies d'alt rendiment actualment és possible interrogar genomes sencers amb una resolució sense precedents. La dimensió i la naturalesa desestructurada d'aquestes dades ha posat de manifest la necessitat de desenvolupar noves eines i metodologies per a convertir aquestes dades en coneixement biològic. Per contribuir a aquest repte hem explotat l'abundància de dades genòmiques procedents d'instruments d'alt rendiment i disponibles públicament, i hem desenvolupat mètodes bioinformàtics focalitzats en l'extracció d'informació a nivell de via molecular en comptes de fer-ho al nivell individual de cada gen. En primer lloc, hem desenvolupat GSVA (Gene Set Variation Analysis), un mètode que facilita l'organització i la condensació de perfils d'expressió dels gens en conjunts. GSVA possibilita anàlisis posteriors en termes de vies moleculars amb dades d'expressió gènica provinents de microarrays i RNA-seq. Aquest mètode estima la variació de les vies moleculars a través d'una població de mostres i permet la integració de fonts heterogènies de dades biològiques amb mesures d'expressió a nivell de via molecular. Per il•lustrar les característiques de GSVA, l'hem aplicat a diversos casos usant diferents tipus de dades i adreçant qüestions biològiques. GSVA està disponible com a paquet de programari lliure per R dins el projecte Bioconductor. En segon lloc, hem desenvolupat una estratègia centrada en vies moleculars basada en el genoma per reposicionar fàrmacs per la diabetis tipus 2 (T2D). Aquesta estratègia consisteix en dues fases: primer es construeix una xarxa reguladora que s'utilitza per identificar mòduls de regulació gènica que condueixen a la malaltia; després, a partir d'aquests mòduls es busquen compostos que els podrien afectar. La nostra estratègia ve motivada per l'observació que els gens que provoquen una malaltia tendeixen a agrupar-se, formant mòduls patogènics, i pel fet que podria caldre una actuació simultània sobre múltiples gens per assolir un efecte en el fenotipus de la malaltia. Per trobar compostos potencials, hem usat dades genòmiques exposades a compostos dipositades en bases de dades públiques. Hem recollit unes 20.000 mostres que han estat exposades a uns 1.800 compostos. L'expressió gènica es pot interpretar com un fenotip intermedi que reflecteix les vies moleculars desregulades subjacents a una malaltia. Per tant, considerem que els gens d'un mòdul patològic que responen, a nivell transcripcional, d'una manera similar a l'exposició del medicament tenen potencialment un efecte terapèutic. Hem aplicat aquesta estratègia a dades d'expressió gènica en illots pancreàtics humans corresponents a individus sans i diabètics, i hem identificat quatre compostos potencials (methimazole, pantoprazole, extracte de taronja amarga i torcetrapib) que podrien tenir un efecte positiu sobre la secreció de la insulina. Aquest és el primer cop que una xarxa reguladora d'illots pancreàtics humans s'ha utilitzat per reposicionar compostos per a T2D. En conclusió, aquesta tesi aporta dos enfocaments diferents en termes de vies moleculars a problemes bioinformàtics importants, com ho son el contrast de la funció biològica i el reposicionament de fàrmacs "in silico". Aquestes contribucions demostren el paper central de les anàlisis basades en vies moleculars a l'hora d'interpretar dades genòmiques procedents d'instruments d'alt rendiment.
Swamy, Sajani. "The automation of glycopeptide discovery in high throughput MS/MS data." Thesis, Waterloo, Ont. : University of Waterloo, 2004. http://etd.uwaterloo.ca/etd/sswamy2004.pdf.
Повний текст джерела"A thesis presented to the University of Waterloo in fulfilment of the thesis requirement for the degree of Master of Mathematics in Computer Science." Includes bibliographical references.
Mammana, Alessandro [Verfasser]. "Patterns and algorithms in high-throughput sequencing count data / Alessandro Mammana." Berlin : Freie Universität Berlin, 2016. http://d-nb.info/1108270956/34.
Повний текст джерелаLove, Michael I. [Verfasser]. "Statistical analysis of high-throughput sequencing count data / Michael I. Love." Berlin : Freie Universität Berlin, 2013. http://d-nb.info/1043197842/34.
Повний текст джерелаKostadinov, Ivaylo [Verfasser]. "Marine Metagenomics: From high-throughput data to ecogenomic interpretation / Ivaylo Kostadinov." Bremen : IRC-Library, Information Resource Center der Jacobs University Bremen, 2012. http://d-nb.info/1035211564/34.
Повний текст джерелаBao, Suying, and 鲍素莹. "Deciphering the mechanisms of genetic disorders by high throughput genomic data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2013. http://hdl.handle.net/10722/196471.
Повний текст джерелаpublished_or_final_version
Biochemistry
Doctoral
Doctor of Philosophy
Wang, Dao Sen. "Conditional Differential Expression for Biomarker Discovery In High-throughput Cancer Data." Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/38819.
Повний текст джерелаCao, Hongfei. "High-throughput Visual Knowledge Analysis and Retrieval in Big Data Ecosystems." Thesis, University of Missouri - Columbia, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=13877134.
Повний текст джерелаVisual knowledge plays an important role in many highly skilled applications, such as medical diagnosis, geospatial image analysis and pathology diagnosis. Medical practitioners are able to interpret and reason about diagnostic images based on not only primitive-level image features such as color, texture, and spatial distribution but also their experience and tacit knowledge which are seldom articulated explicitly. This reasoning process is dynamic and closely related to real-time human cognition. Due to a lack of visual knowledge management and sharing tools, it is difficult to capture and transfer such tacit and hard-won expertise to novices. Moreover, many mission-critical applications require the ability to process such tacit visual knowledge in real time. Precisely how to index this visual knowledge computationally and systematically still poses a challenge to the computing community.
My dissertation research results in novel computational approaches for highthroughput visual knowledge analysis and retrieval from large-scale databases using latest technologies in big data ecosystems. To provide a better understanding of visual reasoning, human gaze patterns are qualitatively measured spatially and temporally to model observers’ cognitive process. These gaze patterns are then indexed in a NoSQL distributed database as a visual knowledge repository, which is accessed using various unique retrieval methods developed through this dissertation work. To provide meaningful retrievals in real time, deep-learning methods for automatic annotation of visual activities and streaming similarity comparisons are developed under a gaze-streaming framework using Apache Spark.
This research has several potential applications that offer a broader impact among the scientific community and in the practical world. First, the proposed framework can be adapted for different domains, such as fine arts, life sciences, etc. with minimal effort to capture human reasoning processes. Second, with its real-time visual knowledge search function, this framework can be used for training novices in the interpretation of domain images, by helping them learn experts’ reasoning processes. Third, by helping researchers to understand human visual reasoning, it may shed light on human semantics modeling. Finally, integrating reasoning process with multimedia data, future retrieval of media could embed human perceptual reasoning for database search beyond traditional content-based media retrievals.
Ballinger, Tracy J. "Analysis of genomic rearrangements in cancer from high throughput sequencing data." Thesis, University of California, Santa Cruz, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3729995.
Повний текст джерелаIn the last century cancer has become increasingly prevalent and is the second largest killer in the United States, estimated to afflict 1 in 4 people during their life. Despite our long history with cancer and our herculean efforts to thwart the disease, in many cases we still do not understand the underlying causes or have successful treatments. In my graduate work, I’ve developed two approaches to the study of cancer genomics and applied them to the whole genome sequencing data of cancer patients from The Cancer Genome Atlas (TCGA). In collaboration with Dr. Ewing, I built a pipeline to detect retrotransposon insertions from paired-end high-throughput sequencing data and found somatic retrotransposon insertions in a fifth of cancer patients.
My second novel contribution to the study of cancer genomics is the development of the CN-AVG pipeline, a method for reconstructing the evolutionary history of a single tumor by predicting the order of structural mutations such as deletions, duplications, and inversions. The CN-AVG theory was developed by Drs. Haussler, Zerbino, and Paten and samples potential evolutionary histories for a tumor using Markov Chain Monte Carlo sampling. I contributed to the development of this method by testing its accuracy and limitations on simulated evolutionary histories. I found that the ability to reconstruct a history decays exponentially with increased breakpoint reuse, but that we can estimate how accurately we reconstruct a mutation event using the likelihood scores of the events. I further designed novel techniques for the application of CN-AVG to whole genome sequencing data from actual patients and applied these techniques to search for evolutionary patterns in glioblastoma multiforme using sequencing data from TCGA. My results show patterns of two-hit deletions, as we would expect, and amplifications occurring over several mutational events. I also find that the CN-AVG method frequently makes use of whole chromosome copy number changes following by localized deletions, a bias that could be mitigated through modifying the cost function for an evolutionary history.
Yu, Guoqiang. "Machine Learning to Interrogate High-throughput Genomic Data: Theory and Applications." Diss., Virginia Tech, 2011. http://hdl.handle.net/10919/28980.
Повний текст джерелаPh. D.
Zucker, Mark Raymond. "Inferring Clonal Heterogeneity in Chronic Lymphocytic Leukemia From High-Throughput Data." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1554049121307262.
Повний текст джерелаGuennel, Tobias. "Statistical Methods for Normalization and Analysis of High-Throughput Genomic Data." VCU Scholars Compass, 2012. http://scholarscompass.vcu.edu/etd/2647.
Повний текст джерелаFerber, Kyle L. "Methods for Predicting an Ordinal Response with High-Throughput Genomic Data." VCU Scholars Compass, 2016. http://scholarscompass.vcu.edu/etd/4585.
Повний текст джерелаGlaus, Peter. "Bayesian methods for gene expression analysis from high-throughput sequencing data." Thesis, University of Manchester, 2014. https://www.research.manchester.ac.uk/portal/en/theses/bayesian-methods-for-gene-expression-analysis-from-highthroughput-sequencing-data(cf9680e0-a3f2-4090-8535-a39f3ef50cc4).html.
Повний текст джерелаPaicu, Claudia. "miRNA detection and analysis from high-throughput small RNA sequencing data." Thesis, University of East Anglia, 2016. https://ueaeprints.uea.ac.uk/63738/.
Повний текст джерелаMalo, Nathalie. "Statistical contributions to data analysis for high-throughput screening of chemical compounds." Thesis, McGill University, 2006. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=102680.
Повний текст джерелаThe broad objectives of this research are to develop and evaluate robust and reliable statistical methods for both data preprocessing and statistical inference. This thesis is divided into three papers. The first manuscript is a critical review of the current practices in HTS data analysis. It includes several recommendations for improving sensitivity and specificity of screens. The second manuscript compares the performance of different robust preprocessing methods applied to replicated two-way data with respect to detection of outlying cells. The third manuscript evaluates some of the statistical methods described in the first manuscript with respect to their performance when applied to several empirical data sets.