Dissertations / Theses on the topic 'Biological Sequence Analysis'
Consult the top 50 dissertations / theses for your research on the topic 'Biological Sequence Analysis.'
Yeats, Corin Anthony. "Biological investigations through sequence analysis." Thesis, University of Cambridge, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.614848.
Thompson, James. "Genetic algorithms applied to biological sequence analysis /." Link to online version, 2006. https://ritdml.rit.edu/dspace/handle/1850/2269.
Parbhane, R. V. "Analysis of DNA sequences: modeling sequence dependent features and their biological roles." Thesis(Ph.D.), CSIR-National Chemical Laboratory, Pune, 2000. http://dspace.ncl.res.in:8080/xmlui/handle/20.500.12252/2285.
Verzotto, Davide. "Advanced Computational Methods for Massive Biological Sequence Analysis." Doctoral thesis, Università degli studi di Padova, 2011. http://hdl.handle.net/11577/3426282.
With the advent of modern sequencing technologies, massive amounts of biological data, from protein sequences to entire genomes, are available for research. This progress calls for the automatic analysis and classification of such data collections in order to improve knowledge in the Life Sciences. Although many approaches have been proposed so far to model biological sequences mathematically, for example by searching for patterns and similarities between genomic or protein sequences, these methods often lack structures able to address specific biological questions. In this thesis, we present new computational methods for three fundamental problems in molecular biology: the discovery of remote evolutionary relationships between protein sequences, the detection of complex biological signals in mutually correlated functional sites, and the reconstruction of the phylogeny of a set of organisms through the comparison of whole genomes. The main contribution is the systematic analysis of the patterns that may be relevant to these problems, leading to the design of new effective and efficient computational tools. Two advanced paradigms for pattern discovery and filtering are thus introduced, based on the observation that functional biological motifs, or patterns, are located in different regions of the sequences under examination. This observation makes it possible to devise parsimonious approaches that avoid counting the same patterns multiple times. The first paradigm, irredundant common motifs, concerns the discovery of patterns common to pairs of sequences that have occurrences not covered by other patterns, where coverage is defined by greater specificity and/or possible extension of the patterns.
The second paradigm, underlying motifs, concerns the filtering of patterns whose occurrences do not overlap those of other patterns with higher priority, where priority is defined by lexicographic properties of the patterns on the boundary between pattern matching and statistical analysis. Three computational methods based on these advanced paradigms were developed. The experimental results indicate that our methods are able to identify the main similarities between biological sequences, using the available information in a non-redundant way. In particular, by employing irredundant common motifs and statistics based on these patterns, we solve the problem of remote protein homology detection. The results show that our approach, called Irredundant Class, performs very well on a challenging benchmark and improves on state-of-the-art methods. Furthermore, to detect complex biological signals we use the notion of underlying motifs, thereby defining ways to compare and filter degenerate motifs obtained with modern pattern discovery tools. Experiments on large protein families show that our method drastically reduces the number of motifs that scientists would otherwise have to inspect manually, while also highlighting the functional motifs identified in the literature. Finally, by combining the two proposed paradigms, we present a new and practical distance function between whole genomes. With our method, called Unic Subword Approach, we relate the different regions of two genomic sequences to each other, selecting the motifs conserved during evolution.
The experimental results show that our approach performs better than other state-of-the-art methods in reconstructing the phylogeny of organisms such as viruses, prokaryotes and unicellular eukaryotes, and also identifies the main subclasses of these species.
Margolin, Yelena 1977. "Analysis of sequence-selective guanine oxidation by biological agents." Thesis, Massachusetts Institute of Technology, 2007. http://hdl.handle.net/1721.1/42381.
Vita.
Includes bibliographical references.
Oxidatively damaged DNA has been strongly associated with cancer, chronic degenerative diseases and aging. Guanine is the most frequently oxidized base in DNA, and generation of a guanine radical cation (G•+) as an intermediate in the oxidation reaction leads to migration of the resulting cationic hole through the DNA π-stack until it is trapped at the lowest-energy sites. These sites reside at runs of guanines, such as 5'-GG-3' sequences, and are characterized by the lowest sequence-specific ionization potentials (IPs). The charge transfer mechanism suggests that hotspots of oxidative DNA damage induced by electron transfer reagents can be predicted based on the primary DNA sequence. However, preliminary data indicated that nitrosoperoxycarbonate (ONOOCO2−), a mediator of chronic inflammation and a one-electron oxidant, displayed unusual guanine oxidation properties that were the focus of the present work. As a first step in our study, we determined relative levels of guanine oxidation induced by ONOOCO2− in all possible three-base sequence contexts (XGY) within double-stranded oligonucleotides. These levels were compared to the relative oxidation induced within the same guanines by photoactivated riboflavin, a one-electron reagent. We found that, in agreement with previous studies, photoactivated riboflavin was selective for guanines of lowest IPs located within 5'-GG-3' sequences. In contrast, ONOOCO2− preferentially reacted with guanines located within 5'-GC-3' sequences characterized by the highest IPs. This demonstrated that sequence-specific IP was not a determinant of guanine reactivity with ONOOCO2−. Sequence selectivities for both reagents were double-strand specific. Selectivity of ONOOCO2− for 5'-GC-3' sites was also observed in human genomic DNA after ligation-mediated PCR analysis.
Relative yields of different guanine lesions produced by both ONOOCO2− and riboflavin varied 4- to 5-fold across all sequence contexts. To assess the role of solvent exposure in mediating guanine oxidation by ONOOCO2−, relative reactivities of mismatched guanines with ONOOCO2− were measured. The majority of the mismatches displayed an increased reactivity with ONOOCO2− as compared to the fully matched G-C base pairs. The extent of reactivity enhancement was sequence context-dependent, and the greatest levels of enhancement were observed for the conformationally flexible guanine-guanine (G-G) mismatches and for guanines located across from a synthetic abasic site. To test the hypothesis that the negative charge of an oxidant influences its reactivity with guanines in DNA, sequence-selective guanine oxidation by a negatively charged reagent, Fe2+-EDTA, was assessed and compared to guanine oxidation produced by a neutral oxidant, γ-radiation. Because both of these agents cause high levels of deoxyribose oxidation, a general method to quantify sequence-specific nucleobase oxidation in the presence of direct strand breaks was developed. This method exploited the activity of exonuclease III (Exo III), a 3' to 5' exonuclease, and utilized phosphorothioate-modified synthetic oligonucleotides that were resistant to Exo III activity. This method was employed to determine sequence-selective guanine oxidation by the Fe2+-EDTA complex and γ-radiation and to show that both agents produced identical guanine oxidation patterns and were equally reactive with all guanines, irrespective of their sequence-specific IPs or sequence context.
This showed that negative charge was not a determinant of Fe2+-EDTA-mediated guanine oxidation. Finally, the role of oxidant binding in nucleobase damage was assessed by studying sequence-selective oxidation produced by DNA-bound Fe2+ ions in the presence of H2O2. We found that the major oxidation targets were thymines located within 5'-TGG-3' motifs, demonstrating that while guanines were a required element for coordination of Fe2+ to DNA, they were not oxidized. Our results suggest that factors other than sequence-specific IPs can act as major determinants of sequence-selective guanine oxidation, and that current models of guanine oxidation and charge transfer in DNA cannot be used to adequately predict the location and identity of mutagenic lesions in the genome.
by Yelena Margolin.
Ph.D.
Kim, Eagu. "Inverse Parametric Alignment for Accurate Biological Sequence Comparison." Diss., The University of Arizona, 2008. http://hdl.handle.net/10150/193664.
Behr, Jonathan Robert. "Novel tools for sequence and epitope analysis of glycosaminoglycans." Thesis, Massachusetts Institute of Technology, 2007. http://hdl.handle.net/1721.1/42383.
Includes bibliographical references.
Our understanding of glycosaminoglycan (GAG) biology has been limited by a lack of sensitive and efficient analytical tools designed to deal with these complex molecules. GAGs are heterogeneous and often sulfated linear polysaccharides found throughout the extracellular environment, and available to researchers only in limited mixtures. A series of sensitive label-free analytical tools was developed to provide sequence information and to quantify whole epitopes from GAG mixtures. Three complementary sets of tools were developed to provide GAG sequence information. Two novel exolytic sulfatases from Flavobacterium heparinum that degrade heparin/heparan sulfate glycosaminoglycans (HSGAGs) were cloned and characterized. These exolytic enzymes enabled the exo-sequencing of an HSGAG oligosaccharide. Phenylboronic acids (PBAs) were specifically reacted with unsulfated chondroitin sulfate (CS) disaccharides from within a larger mixture. The resulting cyclic esters were easily detected in mass spectrometry (MS) using the distinct isotopic abundance of boron. Electrospray ionization tandem mass spectrometry (ESI-MSn) was employed to determine the fragmentation patterns of HSGAG disaccharides. These patterns were used to quantify relative amounts of isomeric disaccharides in a mixture. Fragmentation information is valuable for building methods for oligosaccharide sequencing, and the general method can be applied to quantify any isomers using MSn. Three other tools were developed to quantify GAG epitopes. Two microfluidic devices were characterized as HSGAG sensors. Sensors were functionalized either with protamine to quantify total HSGAGs or with antithrombin-III (AT-III) to quantify a specific anticoagulant epitope.
A charge-sensitive silicon field effect sensor accurately quantified clinically relevant anticoagulants, including low molecular weight heparins (LMWH), even out of serum. A mass-sensitive suspended microchannel resonator (SMR) measured the same clinically relevant HSGAGs. When these two sensors were compared, the SMR proved more robust and versatile. The SMR signal is more stable, the device can be reused ad infinitum, and surface modifications can be automated and monitored. The field effect sensor provided an advantage in selectivity by preferentially detecting highly charged HSGAGs instead of any massive, non-specifically bound proteins. Lastly, anti-HSGAG single chain variable fragments (scFv) were evolved using yeast surface display towards generating antibodies for HSGAG epitope sensing and clinical GAG neutralization.
by Jonathan Robert Behr.
Ph.D.
Tångrot, Jeanette. "Structural Information and Hidden Markov Models for Biological Sequence Analysis." Doctoral thesis, Umeå universitet, Institutionen för datavetenskap, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-1629.
Bioinformatics is a field in which methods from computer science and statistics are used to analyse and structure biological data. An important area within bioinformatics tries to predict the three-dimensional structure and function of a protein from its amino acid sequence and/or its similarities to other, already characterised, proteins. It is known that two proteins with similar amino acid sequences also have similar three-dimensional structures. However, two proteins having similar structures does not necessarily mean that their sequences are similar, which can make it difficult to find structural similarities from a protein's amino acid sequence. This thesis describes two methods for finding similarities between proteins: one focuses on determining which family of protein domains with known 3D structure a given sequence belongs to, while the other tries to predict a protein's fold, i.e. to give a rough picture of the protein's structure. Both methods use hidden Markov models (HMMs), a statistical method that can, among other things, be used to describe protein families. With an HMM one can predict whether a given protein sequence belongs to the family the model represents. Both methods also use structural information to increase the models' ability to recognise related sequences, but in different ways. Most of the work in the thesis concerns structure-anchored HMMs (saHMMs). The saHMMs are built from structure-based sequence alignments, which are generated from how the protein domains can be superimposed in space rather than from which amino acids make up their sequences. In each protein family only a particular, representative selection of domains is used. These are chosen so that when the sequences are compared pairwise, no pair within the family has a sequence identity higher than about 20%. This selection is made to obtain as large a spread as possible among the sequences within the family.
A suite of programs has been developed to select representatives for each family and then build saHMMs based on them. The saHMMs turn out to find the right family for a high proportion of the tested sequences, with almost no errors. They are also better than the widely used method Pfam at finding the right family for entirely new protein sequences. The saHMMs are available through the FISH server, which anyone can use via the Internet to find which family a protein of interest may belong to. The second method presented in the thesis is secondary structure HMMs (ssHMMs), which are built from ordinary multiple sequence alignments, but also from information about the secondary structures of the protein sequences in the family. When a protein sequence is compared with an ssHMM, a prediction of its secondary structure is used, and the computed probability that the sequence belongs to the family is based on both the amino acid sequence and the secondary structure. A comparison shows that HMMs based on several sequences are better than those based on a single sequence at finding the correct fold for a protein sequence. The HMMs become even better if the secondary structure is also taken into account, both when the true secondary structure is used and when a theoretically predicted one is used.
Jeanette Hargbo.
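The representative-selection rule described in the abstract above (no pair of family members sharing more than about 20% sequence identity) can be illustrated with a minimal sketch. This is not the thesis's software: the greedy strategy, the function names `percent_identity` and `pick_representatives`, the identity definition (matches over ungapped aligned columns) and the toy sequences are all assumptions made for illustration, and the sequences are assumed to be already aligned to equal length.

```python
from __future__ import annotations


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def pick_representatives(aligned: dict[str, str], max_identity: float = 20.0) -> list[str]:
    """Greedy selection (an assumed strategy): keep a domain only if it is at most
    max_identity percent identical to every domain kept so far."""
    chosen: list[str] = []
    for name, seq in aligned.items():
        if all(percent_identity(seq, aligned[c]) <= max_identity for c in chosen):
            chosen.append(name)
    return chosen


# Hypothetical toy family: dom_b is 90% identical to dom_a, dom_c shares no residues.
family = {
    "dom_a": "ACDEFGHIKL",
    "dom_b": "ACDEFGHIKV",
    "dom_c": "WYWYWYWYWY",
}
print(pick_representatives(family))
```

A greedy pass like this depends on the input order, so a real pipeline would probably need a rule for which member of a similar pair to keep.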
Won, Kyoung-Jae. "Exploring the structure of Hidden Markov Models for biological sequence analysis." Thesis, University of Southampton, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.427702.
Törnkvist, Maria. "Synovial sarcoma : molecular, biological and clinical implications /." Stockholm, 2004. http://diss.kib.ki.se/2004/91-7140-024-9/.
Schliep, Alexander. "A Bayesian approach to learning Hidden Markov model topology with applications to biological sequence analysis." [S.l. : s.n.], 2002. http://deposit.ddb.de/cgi-bin/dokserv?idn=964626330.
Roth, Christian [Verfasser]. "Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts / Christian Roth." Göttingen : Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2021. http://nbn-resolving.de/urn:nbn:de:gbv:7-21.11130/00-1735-0000-0008-5912-0-2.
Fält, Susann. "Analysis of global gene expression in complex biological systems using microarray technology /." Stockholm, 2006. http://diss.kib.ki.se/2006/91-7140-612-3/.
Bajak, Edyta Zofia. "Genotoxic stress: novel biomarkers and detection methods : uncovering RNAs role in epigenetics of carcinogenesis /." Stockholm, 2005. http://diss.kib.ki.se/2005/91-7140-415-5/.
Grigolon, Silvia. "Modelling and inference for biological systems : from auxin dynamics in plants to protein sequences." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112178/document.
All biological systems are made of atoms and molecules interacting in a non-trivial manner. Such non-trivial interactions induce complex behaviours allowing organisms to fulfill all their vital functions. These features can be found in all biological systems at different levels, from molecules and genes up to cells and tissues. In the past few decades, physicists have been paying much attention to these intriguing aspects by framing them in network approaches for which a number of theoretical methods offer many powerful ways to tackle systemic problems. At least two different ways of approaching these challenges may be considered: direct modeling methods and approaches based on inverse methods. In the context of this thesis, we made use of both methods to study three different problems occurring on three different biological scales. In the first part of the thesis, we mainly deal with the very early stages of tissue development in plants. We propose a model aimed at understanding which features drive the spontaneous collective behaviour in space and time of PINs, the transporters which pump the phytohormone auxin out of cells. In the second part of the thesis, we focus instead on the structural properties of proteins. In particular we ask how conservation of protein function across different organisms constrains the evolution of protein sequences and their diversity. Hereby we propose a new method to extract the sequence positions most relevant for protein function. Finally, in the third part, we study intracellular molecular networks that implement auxin signaling in plants. In this context, and using extensions of a previously published model, we examine how network structure affects network function. The comparison of different network topologies provides insights into the role of different modules and of a negative feedback loop in particular.
Our introduction of the dynamical response function allows us to characterize the systemic properties of the auxin signaling when external stimuli are applied.
Cao, Haibo. "Protein Structure Recognition From Eigenvector Analysis to Structural Threading Method." Washington, D.C. : Oak Ridge, Tenn. : United States. Dept. of Energy. Office of Science ; distributed by the Office of Scientific and Technical Information, U.S. Dept. of Energy, 2003. http://www.osti.gov/servlets/purl/822060-2L2Xvm/native/.
Published through the Information Bridge: DOE Scientific and Technical Information. "IS-T 2028" Haibo Cao. 12/12/2003. Report is also available in paper and microfiche from NTIS.
Wang, Bo. "Novel statistical methods for evaluation of metabolic biomarkers applied to human cancer cell lines." Miami University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=miami1399046331.
Khodji, Hiba. "Apprentissage profond et transfert de connaissances pour la détection d'erreurs dans les séquences biologiques." Electronic Thesis or Diss., Strasbourg, 2023. http://www.theses.fr/2023STRAD058.
The widespread use of high-throughput technologies in the biomedical field, notably the new generation of genome sequencing technologies, is producing massive amounts of data. Multiple Sequence Alignment (MSA) serves as a fundamental tool for the analysis of this data, with applications including genome annotation, protein structure and function prediction, and understanding evolutionary relationships. However, the accuracy of MSA is often compromised due to factors such as unreliable alignment algorithms, inaccurate gene prediction, or incomplete genome sequencing. This thesis addresses the issue of data quality assessment by leveraging deep learning techniques. We propose novel models based on convolutional neural networks for the identification of errors in visual representations of MSAs. Our primary objective is to assist domain experts in their research studies, where the accuracy of MSAs is crucial. Therefore, we focused on providing reliable explanations for our model predictions by harnessing the potential of explainable artificial intelligence (XAI). In particular, we leveraged visual explanations as a foundation for a transfer learning framework that aims essentially to improve a model's ability to focus on underlying features in an input. Finally, we proposed novel evaluation metrics designed to assess this ability. Initial findings suggest that our approach achieves a good balance between model complexity, performance, and explainability, and could be leveraged in domains where data availability is limited and the need for comprehensive result explanation is paramount.
Grandin, Lori Cristina. "Aplicações de modelos logisticos regressivos em biologia molecular." [s.n.], 2006. http://repositorio.unicamp.br/jspui/handle/REPOSIP/307187.
Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Matematica, Estatistica e Computação Cientifica
Abstract: The advance of gene sequencing has stimulated the development of new statistical techniques to analyze genetic data. In this work the logistic regressive models introduced by Bonney (1986) are presented first in the context of the analysis of family data and are then used to analyze codon frequencies in mitochondrial DNA sequences. Assuming independence among the nucleotides in a codon can be a very strong assumption, that is, biologically unrealistic. In view of this, several dependence structures are presented to analyze the codon frequencies. For example, a first-order Markovian structure can be appropriate to explain the dependence of the bases in the codon. The log-likelihood function is evaluated and several comparisons are made to determine which model is the most parsimonious. Applications of these models are made using real data of NADH4 gene sequences of the human mitochondrial genome.
Master's degree
Biostatistics
Master in Statistics
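The contrast drawn in the abstract above, between independent codon positions and a first-order Markov dependence among the bases of a codon, can be sketched by comparing maximized log-likelihoods. This is a simplified stand-in using plain frequency estimates, not Bonney's logistic regressive models; the function names and the toy sequence are assumptions made for illustration.

```python
import math
from collections import Counter


def split_codons(seq):
    """Cut a sequence into consecutive codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def loglik_independent(codons):
    """Log-likelihood when each codon position has its own base frequencies
    and the three positions are independent."""
    n = len(codons)
    pos = [Counter(c[i] for c in codons) for i in range(3)]
    return sum(math.log(pos[i][c[i]] / n) for c in codons for i in range(3))


def loglik_markov(codons):
    """Log-likelihood under a first-order Markov structure within the codon:
    P(b1) * P(b2 | b1) * P(b3 | b2), with probabilities estimated from counts."""
    n = len(codons)
    first = Counter(c[0] for c in codons)
    second = Counter(c[1] for c in codons)
    pair12 = Counter(c[:2] for c in codons)
    pair23 = Counter(c[1:] for c in codons)
    total = 0.0
    for c in codons:
        total += math.log(first[c[0]] / n)
        total += math.log(pair12[c[:2]] / first[c[0]])
        total += math.log(pair23[c[1:]] / second[c[1]])
    return total


# Hypothetical toy sequence, not NADH4 data.
toy_codons = split_codons("ATGATGATGCCC")
print(loglik_independent(toy_codons), loglik_markov(toy_codons))
```

Because the independence model is nested in the Markov one, the Markov log-likelihood is never lower; choosing the more parsimonious model would also require penalizing the extra parameters, for example with a likelihood-ratio test or an information criterion.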
Zhi, Degui. "Discovery and analysis of mosaic arrangements in biological sequences and structures." Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 2006. http://wwwlib.umi.com/cr/ucsd/fullcit?p3194816.
Title from first page of PDF file (viewed February 28, 2006). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references (p. 140-156).
Barton, Carl Samuel. "Algorithmic problems in strings with applications to the analysis of biological sequences." Thesis, King's College London (University of London), 2015. http://kclpure.kcl.ac.uk/portal/en/theses/algorithmic-problems-in-strings-with-applications-to-the-analysis-of-biological-sequences(461c8961-c256-4ff8-97f7-c0718709367d).html.
Fortino, Vittorio. "Sequence analysis in bioinformatics: methodological and practical aspects." Doctoral thesis, Universita degli studi di Salerno, 2013. http://hdl.handle.net/10556/985.
My PhD research activities have focused on the development of new computational methods for biological sequence analysis. To overcome an intrinsic problem of protein sequence analysis, namely inferring homologies in large protein databases with short queries, I developed a BLAST-based statistical framework to detect distant homologies conserved in transmembrane domains of different bacterial membrane proteins. Using this framework, transmembrane protein domains of all Salmonella spp. have been screened and more than five thousand significant homologies have been identified. My results show that the proposed framework detects distant homologies that, because of their conservation in distinct bacterial membrane proteins, could represent ancient signatures of the existence of primeval genetic elements (or mini-genes) coding for short polypeptides that formed, through a primitive assembly process, more complex genes. Further, my statistical framework lays the foundation for new bioinformatics tools to detect domain-oriented homologies, in other words, to find statistically significant homologies in specific target domains. The second problem that I faced deals with the analysis of transcripts obtained with RNA-Seq data. I developed a novel computational method that combines transcript borders, obtained from mapped RNA-Seq reads, with operon predictions based on sequence features to accurately infer operons in prokaryotic genomes. Since the transcriptome of an organism is dynamic and condition dependent, the RNA-Seq mapped reads are used to determine a set of confirmed or predicted operons, and from it specific transcriptomic features are extracted and combined with standard genomic features to train and validate three operon classification models: Random Forests (RFs), Neural Networks (NNs), and Support Vector Machines (SVMs).
These classifiers have been exploited to refine the operon map annotated by DOOR, one of the most used databases of prokaryotic operons. This method showed that the integration of genomic and transcriptomic features improves the accuracy of operon predictions, and that it is possible to predict the existence of potential new operons. An inherent limitation of using RNA-Seq to improve operon structure predictions is that it cannot be applied to genes not expressed under the condition studied. I evaluated my approach on different RNA-Seq based transcriptome profiles of Histophilus somni and Porphyromonas gingivalis. These transcriptome profiles were obtained using the standard RNA-Seq or the strand-specific RNA-Seq method. My experimental results demonstrate that the three classifiers achieved accurate operon maps, including reliable predictions of new operons. [edited by author]
XI n.s.
Rosen, Gail L. "Signal processing for biologically-inspired gradient source localization and DNA sequence analysis." Diss., Available online, Georgia Institute of Technology, 2006, 2006. http://etd.gatech.edu/theses/available/etd-07102006-123527/.
Oliver Brand, Committee Member ; James H. McClellan, Committee Member ; Paul Hasler, Committee Chair ; Mark T. Smith, Committee Member ; David Anderson, Committee Member.
Bao, Yu. "Identification and Analysis of Critical Sites in RNA/Protein Sequences and Biological Networks." Kyoto University, 2018. http://hdl.handle.net/2433/235113.
Lindskog, Mats. "Computational analyses of biological sequences -applications to antibody-based proteomics and gene family characterization." Doctoral thesis, KTH, School of Biotechnology (BIO), 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-527.
Following the completion of the human genome sequence, post-genomic efforts have shifted the focus towards the analysis of the encoded proteome. Several different systematic proteomics approaches have emerged, for instance antibody-based proteomics initiatives, where antibodies are used to functionally explore the human proteome. One such effort is HPR (the Swedish Human Proteome Resource), where affinity-purified polyclonal antibodies are generated and subsequently used for protein expression and localization studies in normal and diseased tissues. The antibodies are directed towards protein fragments, PrESTs (Protein Epitope Signature Tags), which are selected based on criteria favourable in subsequent laboratory procedures.
This thesis describes the development of novel software (Bishop) to facilitate the selection of proper protein fragments, as well as to ensure high-throughput processing of selected target proteins. The majority of proteins were successfully processed by this approach; however, the design strategy resulted in a number of fall-outs. These proteins comprised alternative splice variants, as well as proteins exhibiting high sequence similarities to other human proteins. Alternative strategies were developed for processing of these proteins. The strategy for handling alternative splice variants included the development of additional software and was validated by comparing the immunohistochemical staining patterns obtained with antibodies generated towards the same target protein. Processing of high sequence similarity proteins was enabled by assembling human proteins into clusters according to their pairwise sequence identities. Each cluster was represented by a single PrEST located in the region of the highest sequence similarity among all cluster members, thereby representing the entire cluster. This strategy was validated by identification of all proteins within a cluster using antibodies directed to such cluster-specific PrESTs in Western blot analysis. In addition, the PrEST design success rates for more than 4,000 genes were evaluated.
Several genomes other than human have been finished; currently more than 300 genomes are fully sequenced. Following the release of the genome of the tree model organism black cottonwood (Populus trichocarpa), a bioinformatic analysis identified unknown cellulose synthases (CesAs) and revealed a total of 18 CesA family members. These genes are thought to have arisen from several rounds of genome duplication. This number is significantly higher than that reported in previous studies of other plant genomes, which found only ten CesA family members. Moreover, identification of corresponding orthologous ESTs belonging to the closely related hybrid aspen (P. tremula x tremuloides) for two pairs of CesAs suggests that they are actively transcribed. This indicates that a number of paralogs have preserved their functionalities following extensive genome duplication events in the tree’s evolutionary history.
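The step of assembling proteins into clusters according to their pairwise sequence identities can be sketched as follows. This is not the Bishop software or the thesis's procedure: single-linkage grouping with a union-find structure, the function name `cluster_by_identity`, the 90% threshold and the protein labels are all assumptions chosen for illustration, and the pairwise identities are assumed to have been computed beforehand.

```python
def cluster_by_identity(names, identities, threshold):
    """Single-linkage clustering (an assumed linkage rule): two proteins end up
    in the same cluster if a chain of pairs with identity >= threshold joins them."""
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), ident in identities.items():
        if ident >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical proteins and percent identities.
proteins = ["P1", "P2", "P3", "P4"]
pairwise = {
    ("P1", "P2"): 95.0,
    ("P2", "P3"): 91.0,
    ("P1", "P3"): 60.0,
    ("P3", "P4"): 30.0,
}
print(cluster_by_identity(proteins, pairwise, threshold=90.0))
```

With single linkage, P1 and P3 share a cluster through P2 even though their direct identity is low; a stricter rule such as complete linkage would split them, so the choice of linkage affects which region can represent a whole cluster.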
Patavino, Claudio <1981>. "Core Genome Multilocus Sequence Typing and Single Nucleotide Polymorphism Analysis in the Epidemiology of Brucella melitensis Infections." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amsdottorato.unibo.it/9076/1/Thesis_PhD_Claudio_Patavino.pdf.
Lerario, Antonio Marcondes. "Perfis de expressão de genes relacionados a metástases em uma coorte de pacientes adultos e pediátricos portadores de neoplasias do córtex da supra-renal." Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/5/5135/tde-24112008-171634/.
Adrenocortical carcinoma (ACC) is a rare neoplasm with a poor prognosis. Although molecular studies have uncovered many aspects of ACC tumorigenesis, little is known about the molecular pathways involved in metastatic spread. The objective of our study was to analyze the expression profile of metastasis-related genes in a cohort of metastatic and non-metastatic adrenocortical tumors in order to identify genes involved in metastatic spread and to find new prognostic markers. The expression profiles of 27 adrenocortical tumors from 15 adults (8 ACC and 7 adenomas) and 12 children (5 metastatic and 7 non-metastatic) were evaluated with an array of 113 genes known to be involved in human metastasis. Cluster analysis showed that adult adrenocortical adenomas form a group distinct from other adrenocortical tumors (adult carcinomas and pediatric tumors). Comparison of adult adenomas and ACC revealed that MMP11 and DENR were differentially expressed between these two groups, while no gene was differentially expressed among pediatric adrenocortical tumors. As with cluster analysis, principal component analysis failed to separate pediatric tumors categorized by their clinical course. Expression data for the MMP2, TIMP3 and FN1 genes obtained by RT-PCR agreed with those generated by the arrays. LOH of the 22q12.3 region was detected in some, but not all, cases in which TIMP3 downregulation was verified. In conclusion, we have identified important aspects of the molecular pathways and biological characteristics involved in the metastatic spread of adrenocortical tumors. Distinctive patterns of gene expression between metastatic and non-metastatic tumors may help in predicting prognosis.
Piccinini, S. "ZEIN CODING SEQUENCE ANALYSES FOR MAIZE GENOTYPING AND ZEIN PROTEIN MANIPULATION TOWARDS THE IMPROVEMENT OF THE MAIZE SEED PROTEIN QUALITY." Doctoral thesis, Università degli Studi di Milano, 2014. http://hdl.handle.net/2434/241132.
Coppe, Alessandro. "A bioinformatic and computational approach to regulation of genome function: integrated analysis of genome organization, promoter sequences and gene expression." Doctoral thesis, Università degli studi di Padova, 2008. http://hdl.handle.net/11577/3426395.
Leggio, Loredana. "Functional analysis of two alternative transcripts from porin1 gene of Drosophila melanogaster and involvement of corresponding 5'UTR sequences in the translation control." Doctoral thesis, Università di Catania, 2017. http://hdl.handle.net/10761/3935.
Siedhoff, Dominic [Verfasser], Heinrich [Akademischer Betreuer] Müller, and Dorit [Gutachter] Merhof. "A parameter-optimizing model-based approach to the analysis of low-SNR image sequences for biological virus detection / Dominic Siedhoff ; Gutachter: Dorit Merhof ; Betreuer: Heinrich Müller." Dortmund : Universitätsbibliothek Dortmund, 2016. http://d-nb.info/1115464019/34.
Bellora Pereyra, Nicolás. "In silico analysis of regulatory motifs in gene promoters." Doctoral thesis, Universitat Pompeu Fabra, 2010. http://hdl.handle.net/10803/7202.
The regulation of gene transcription is a complex process involving many different proteins, some of which bind to specific DNA motifs located in the promoter region of genes. The need to maintain specific interactions between transcription factors and the proteins of the RNA polymerase II complex is expected to impose constraints on the relative position and spacing of the DNA-binding motifs. The work presented in this thesis includes the development of a new method for identifying motifs that show a preferential location in DNA sequences and the implementation of a public web application called PEAKS. We investigated whether the positioning and nature of the most common motifs depend on the range of expression of the gene. We found differences that illustrate the fact that many key gene regulatory signals may be present in the proximal promoter region of mammalian genes. We also applied other methods to identify specific transcription factors (TFs) involved in the co-regulation of a group of genes. Experimentally verified TF binding site (TFBS) data support the biological relevance of our results.
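The idea of a motif with a preferential location in promoter sequences, as described in the abstract above, can be illustrated with a small sketch. This is not the PEAKS method itself, whose details the abstract does not give; the function names, the toy promoters, and the simple "fullest bin versus uniform expectation" score are all assumptions made for illustration.

```python
def motif_positions(promoters, motif):
    """Return the start offset of every (possibly overlapping) motif occurrence."""
    positions = []
    for seq in promoters:
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
    return positions


def positional_histogram(positions, seq_len, n_bins):
    """Count occurrences in equal-width bins along promoters of length seq_len."""
    counts = [0] * n_bins
    for p in positions:
        counts[min(p * n_bins // seq_len, n_bins - 1)] += 1
    return counts


def peak_enrichment(counts):
    """Ratio of the fullest bin to the count expected if sites were spread uniformly."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return max(counts) / (total / len(counts))


# Hypothetical toy promoters, all 20 nt long, with TATA near the 3' end.
promoters = [
    "GCGCGCGCGCGCGCTATAGC",
    "CCGGCCGGCCGGCTATAAGG",
    "GGCCGGCCGGCCGGTATACC",
]
positions = motif_positions(promoters, "TATA")
counts = positional_histogram(positions, seq_len=20, n_bins=4)
print(positions, counts, peak_enrichment(counts))
```

A score well above 1 hints at positional preference, but a real analysis would need a proper statistical test against a background model.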
Reis, Thais Aparecida Vieira. "Estudo epidemiológico da doença diarréica aguda associada aos adenovírus, em Juiz de Fora, Minas Gerais, no período 2007-2010." Universidade Federal de Juiz de Fora, 2012. https://repositorio.ufjf.br/jspui/handle/ufjf/1927.
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
FAPEMIG - Fundação de Amparo à Pesquisa do Estado de Minas Gerais
Acute diarrheal disease (ADD) is still one of the major causes of child morbidity and mortality in developing countries. In non-bacterial acute diarrhea, enteric adenoviruses are among the most important etiologic agents. Human adenoviruses (HAdV) belong to the family Adenoviridae and the genus Mastadenovirus, whose members are classified into seven species (A-G) and 54 serotypes. Among them, serotypes 40 and 41, both of species F, are the most commonly associated with ADD. Considering the importance of ADD in developing countries, the large number of cases in which no etiologic agent is identified, and the limited knowledge about HAdV infection and its role in the pathogenesis of ADD, we performed this study in Juiz de Fora, Minas Gerais. Between January 2007 and December 2010, 395 diarrheal fecal specimens from individuals of various ages, treated as outpatients or hospitalized, were analyzed. The presence of HAdV was detected by PCR using specific primers, and molecular characterization of positive samples was performed by sequencing and phylogenetic analysis of partial sequences of the hexon gene. Statistical analyses were performed with SPSS version 13.0, with p < 0.05 considered significant. The prevalence of HAdV infection in 2007-2010 was 10.9% (43/395). The results showed no significant correlation between the origin of the sample (hospital vs. outpatient) and the occurrence of infection (p = 0.152), and the same was observed for the gender of the infected person (p = 0.393). In contrast, the majority of positive cases were detected in children up to 24 months of age, showing a statistically significant correlation between the age of the infected individuals and the occurrence of infection (p = 0.007). In most cases of HAdV infection (36/43), HAdV was the only viral agent detected; however, cases of co-infection with HAdV/Rotavirus (5/43) and HAdV/Norovirus (2/43) were identified.
Phylogenetic analysis of partial sequences of the hexon gene from 35 positive samples revealed that all samples clustered with HAdV species F, serotype 41, confirming the association of enteric HAdV with the cases in this study. This epidemiological survey revealed the presence and circulation of these viruses in the population of Juiz de Fora in the period studied, as well as their important role in the genesis of ADD. Our data thus clarified the etiology of a good number of cases of the disease that would normally remain unidentified.
Grievink, Liat Shavit. "Lineage specific evolution and phylogenetic analysis : a thesis presented in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biomathematics at Massey University, Palmerston North, New Zealand." Massey University, 2009. http://hdl.handle.net/10179/1048.
Xu, Minzhen. "Regulation of Transcription of Mouse Immunoglobulin Germ-Line γ1 RNA: Structural Characterization of Germ-Line γ1 RNA and Molecular Analysis of the Promoter: A Dissertation." eScholarship@UMMS, 1991. https://escholarship.umassmed.edu/gsbs_diss/99.
Barletta, Vívian Honorato. "Detecção e caracterização molecular de norovírus associados a casos de doença diarréica aguda infantil." Universidade Federal de Juiz de Fora (UFJF), 2011. https://repositorio.ufjf.br/jspui/handle/ufjf/5287.
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Noroviruses (NoV) are important etiological agents responsible for outbreaks and sporadic cases of acute diarrhea in individuals of all ages. The viral particles are non-enveloped and the genome is composed of positive-sense single-stranded RNA. Noroviruses belong to the family Caliciviridae, genus Norovirus, and are classified into five genogroups (GI-V). GI, GII and GIV are found in humans and, among them, NoV of genogroup II, genotype 4 (GII.4) are the most commonly found worldwide. In Brazil, norovirus surveys are conducted mainly by research institutes in the largest urban centers and their surroundings. Thus, considering the limited knowledge about these viruses and the lack of epidemiological data on this viral infection in the city of Juiz de Fora, MG, this study was performed to detect and characterize NoV samples associated with sporadic cases of acute infantile diarrhea, and to assess the influence of climatic and demographic factors on the occurrence of these infections. From January 2008 to December 2009, 218 fecal specimens were analyzed for the presence of NoV by conventional RT-PCR, all obtained from children 0-12 years of age, either outpatients (89.45%) or inpatients (10.55%). We detected 20 (9.17%) positive samples, and there was a seasonal tendency for infections during the dry season in 2008, which was not repeated in 2009. Most positive samples were detected in children aged 0 to 24 months, and there was no statistically significant correlation between the occurrence of infections and sex. Of the 20 samples detected, 19 were characterized as NoV GII and 1 as NoV GI. Partial genome sequencing and phylogenetic analysis of selected samples revealed the presence of NoV genotypes GII.4 and GII.6, which co-circulated in the two years of the study.
The NoV GII.4 samples detected in Juiz de Fora showed the greatest nucleotide and amino acid similarity to those that circulated in the state of Rio de Janeiro during 2006, 2007 and 2008. Phylogenetic analysis of the NoV GII.6 samples detected in Juiz de Fora, together with the high similarity of their nucleotide and amino acid sequences, showed that they were most closely related to the NoV GII.6 sample (GU132461/2007) detected in the state of Rio de Janeiro in 2007. This finding, together with the geographical proximity of the two cities, suggests a possible common lineage between them. This epidemiological survey revealed the presence and circulation of NoV in the child population of Juiz de Fora, MG, demonstrating its important role as an etiologic agent of acute diarrhea in this community as well.
BERTOLAZZI, Giorgio. "MicroRNA Interaction Networks." Doctoral thesis, Università degli Studi di Palermo, 2021. http://hdl.handle.net/10447/498927.
Bertolazzi’s thesis focuses on developing and applying computational methods to predict microRNA binding sites located on messenger RNA molecules. MicroRNAs (miRNAs) regulate gene expression by binding target messenger RNA molecules (mRNAs). Therefore, the prediction of miRNA binding is important to investigate cellular processes. Moreover, alterations in miRNA activity have been associated with many human diseases, such as cancer. The thesis explores miRNA binding behavior and highlights fundamental information for miRNA target prediction. In particular, a machine learning approach is used to upgrade an existing target prediction algorithm named ComiR; the original version of ComiR considers miRNA binding sites located on the mRNA 3’UTR region. The novel algorithm significantly improves the ComiR prediction capacity by including miRNA binding sites located on mRNA coding regions.
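To make the distinction between 3'UTR and coding-region binding sites concrete, the following is a minimal sketch of canonical seed matching, the simplest building block of miRNA target prediction. It is not ComiR and not the thesis's algorithm; the function names, the toy mRNA, and the choice of a 7-nt seed (miRNA nucleotides 2-8) are assumptions for illustration only.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna, start=1, end=8):
    """Reverse complement of the seed (miRNA nucleotides 2-8, 5'->3').

    This is the sequence a perfectly paired target site shows on the mRNA.
    """
    seed = mirna[start:end]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_sites(mrna, site):
    """Return the start offsets of all (possibly overlapping) site matches."""
    hits = []
    i = mrna.find(site)
    while i != -1:
        hits.append(i)
        i = mrna.find(site, i + 1)
    return hits


def sites_by_region(mrna, cds_end, site):
    """Split hits into coding region and 3'UTR.

    cds_end is the 0-based offset where the 3'UTR begins (an assumed input).
    Sites that straddle the boundary are left out of both lists.
    """
    hits = find_sites(mrna, site)
    return {
        "cds": [h for h in hits if h + len(site) <= cds_end],
        "utr3": [h for h in hits if h >= cds_end],
    }


# let-7a-5p sequence; the mRNA below is a made-up toy transcript.
let7a = "UGAGGUAGUAGGUUGUAUAGUU"
site = seed_site(let7a)
toy_cds = "AUGGCCCUACCUCGGUAA"   # 18 nt, ends with a stop codon
toy_utr = "GGCUACCUCAAAA"
regions = sites_by_region(toy_cds + toy_utr, cds_end=len(toy_cds), site=site)
print(site, regions)
```

A 3'UTR-only predictor would report just the `utr3` hits; including the `cds` list is the kind of extension the abstract describes, although real tools weigh sites with thermodynamic and expression features rather than counting exact matches.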
"Graphical representation of biological sequences and its applications." Thesis, 2010. http://library.cuhk.edu.hk/record=b6074915.
In this thesis, we have two main contributions: (1) We construct a protein map with the help of our proposed new graphical representation for protein sequences. Each protein sequence can be represented as a point in this map, and cluster analysis of proteins can be performed for comparison between the points. This protein map can be used to mathematically specify the similarity of two proteins and predict properties of an unknown protein based on its amino acid sequence. (2) We construct a novel genome space with biological geometry, which is a subspace of $\mathbb{R}^N$. In this space each point corresponds to a genome. The natural distance between two points in the genome space reflects the biological distance between these two genomes. Our genome space will provide a new powerful tool for analyzing the classification of genomes and their phylogenetic relationships.
Yu, Chenglong.
Adviser: Luk Hing Sun.
Source: Dissertation Abstracts International, Volume: 72-04, Section: B, page: .
Thesis (Ph.D.)--Chinese University of Hong Kong, 2010.
Includes bibliographical references (leaves 59-64).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.
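The genome-space idea in the abstract of this entry (each genome becomes a point in a real vector space, and the distance between points stands in for biological distance) can be sketched as follows. The abstract does not say how the thesis builds its coordinates, so the k-mer frequency embedding used here is a swapped-in, commonly used alternative, and all names are hypothetical.

```python
from itertools import product
from math import sqrt


def kmer_vector(seq, k=2):
    """Embed a DNA sequence as a point in R^(4^k) of normalized k-mer frequencies.

    Assumption: this embedding is illustrative and is not the thesis's construction.
    Windows containing characters other than A, C, G, T are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


def euclidean(u, v):
    """Euclidean distance between two points of the same dimension."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy sequences: b differs from a by one base, c is unrelated.
a = kmer_vector("ACGTACGT")
b = kmer_vector("ACGTACGA")
c = kmer_vector("GGGGGGGG")
print(euclidean(a, b), euclidean(a, c))
```

With such an embedding, standard clustering or tree-building methods can be run directly on the distance matrix, which is the use case the abstract mentions for genome classification and phylogeny.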
Min, Renqiang. "Machine Learning Approaches to Biological Sequence and Phenotype Data Analysis." Thesis, 2010. http://hdl.handle.net/1807/26209.
Loving, Joshua. "Bit-parallel and SIMD alignment algorithms for biological sequence analysis." Thesis, 2017. https://hdl.handle.net/2144/27172.
Xu, Weijia. "On integrating biological sequence analysis with metric distance based database management systems." Thesis, 2006. http://hdl.handle.net/2152/2955.
Sahraeian, Sayed 1983. "Probabilistic Approaches in Comparative Analysis of Biological Networks and Sequences." Thesis, 2013. http://hdl.handle.net/1969.1/149225.
Herms, Inke [Verfasser]. "Probabilistic arithmetic automata : applications of a stochastic computational framework in biological sequence analysis / Inke Herms." 2009. http://d-nb.info/999735500/34.
Wang, Jian. "Feature search in biological sequence data analysis of gene-finding tools and implementation of Interactive Pattern Search /." 2004. http://purl.galileo.usg.edu/uga%5Fetd/wang%5Fjian%5F200408%5Fms.
Directed by Eileen T. Kraemer. Includes articles published in, and an article submitted to Bioinformatics. Includes bibliographical references (leaves 134-139).
Zwickl, Derrick Joel. "Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion." Thesis, 2006. http://hdl.handle.net/2152/2666.
Schliep, Alexander [Verfasser]. "A Bayesian approach to learning Hidden Markov model topology with applications to biological sequence analysis / vorgelegt von Alexander Schliep." 2002. http://d-nb.info/964626330/34.
Shao, Qiang. "Theoretical Studies on Proteins to Reveal the Mechanism of Their Folding and Biological Functions." 2009. http://hdl.handle.net/1969.1/ETD-TAMU-2009-12-7395.
Ma, Fangrui. "Biological sequence analyses theory, algorithms, and applications /." 2009. http://proquest.umi.com/pqdweb?did=1821098721&sid=1&Fmt=2&clientId=14215&RQT=309&VName=PQD.
Title from title screen (site viewed October 13, 2009). PDF text: xv, 233 p. : ill. ; 4 Mb. UMI publication number: AAT 3360173. Includes bibliographical references. Also available in microfilm and microfiche formats.
Presta, Luana. "Modeling biological systems: from genome sequences to functional insights." Doctoral thesis, 2018. http://hdl.handle.net/2158/1129632.
Kuo, Chung-Yi, and 郭仲翊. "Systematic Biological Analysis of Tandem Repeats Sequences in Different Species based on Machine Learning." Thesis, 2019. http://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/login?o=dnclcdr&s=id=%22107NCHU5396025%22.&searchmode=basic.
National Chung Hsing University
Department of Management Information Systems
Academic year 107 (ROC calendar)
Tandem repeat sequences are widely used in genetics, most notably as molecular genetic markers: they typically show high sequence variability between populations and individuals and are codominant. They are therefore also widely used in genetic diversity analysis. Genetics and evolution are closely related. All species evolved from common ancestors, which means that a species' genetic information, such as its tandem repeats, may carry information about those ancestors. At the same time, the criteria used to classify species reflect the characteristics that organisms of the same class share through common descent, and this property should also be present in tandem repeat sequences. This study therefore analyzes tandem repeats together with species classifications, in the hope of finding an association between tandem repeats and evolution. The data set consists of genomic data from the NCBI Genome database whose sequencing has been completed at the Complete or Chromosome level. Following the taxonomic classification system, genomic data for 80 different species in 12 different phyla were selected. After all tandem repeat patterns were identified with a repeat-finding tool, two sets of feature selection methods were used to select representative tandem repeat patterns. Finally, the machine learning algorithms C4.5 and CART were used to build classification models to explore the feasibility of using tandem repeats for species classification.
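The pipeline in the abstract above (find tandem repeats, turn them into features, build a decision tree) can be sketched in miniature. The abstract does not name the repeat-finding tool or the feature encoding, so the scanner below, the presence/absence feature, and the toy labels are all assumptions. The split score shown is Gini impurity, the criterion CART uses; C4.5 instead ranks splits by information gain ratio.

```python
def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter unit."""
    return unit not in (unit + unit)[1:-1]


def find_tandem_repeats(seq, max_unit=3, min_copies=3):
    """Return (start, unit, copies) for exact tandem repeats.

    A simple quadratic scan for illustration, not an established tool.
    """
    hits = []
    for m in range(1, max_unit + 1):
        i = 0
        while i + m <= len(seq):
            unit = seq[i:i + m]
            copies = 1
            while seq[i + copies * m:i + (copies + 1) * m] == unit:
                copies += 1
            if copies >= min_copies and is_primitive(unit):
                hits.append((i, unit, copies))
                i += copies * m  # jump past the repeat just reported
            else:
                i += 1
    return hits


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def split_gini(feature_present, labels):
    """Weighted Gini impurity after splitting on a boolean feature."""
    left = [l for f, l in zip(feature_present, labels) if f]
    right = [l for f, l in zip(feature_present, labels) if not f]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


repeats = find_tandem_repeats("ACACACACGGTTTTTT")
print(repeats)

# Hypothetical feature: does a genome contain an (AC)n repeat? Labels are made up.
has_ac_repeat = [True, True, False, False]
phylum = ["a", "a", "b", "b"]
print(gini(phylum), split_gini(has_ac_repeat, phylum))
```

A tree learner repeatedly picks the feature with the lowest post-split impurity; in practice a library implementation would be used instead of this hand-rolled score.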