Dissertations / Theses on the topic 'Biological Sequence Analysis'

Consult the top 50 dissertations / theses for your research on the topic 'Biological Sequence Analysis.'

1

Yeats, Corin Anthony. "Biological investigations through sequence analysis." Thesis, University of Cambridge, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.614848.

2

Thompson, James. "Genetic algorithms applied to biological sequence analysis." 2006. https://ritdml.rit.edu/dspace/handle/1850/2269.

3

Parbhane, R. V. "Analysis of DNA sequences: modeling sequence dependent features and their biological roles." Thesis (Ph.D.), CSIR-National Chemical Laboratory, Pune, 2000. http://dspace.ncl.res.in:8080/xmlui/handle/20.500.12252/2285.

4

Verzotto, Davide. "Advanced Computational Methods for Massive Biological Sequence Analysis." Doctoral thesis, Università degli studi di Padova, 2011. http://hdl.handle.net/11577/3426282.

Abstract:
With the advent of modern sequencing technologies, massive amounts of biological data, from protein sequences to entire genomes, are becoming increasingly available. This creates the need for automatic analysis and classification of such huge collections of data in order to advance knowledge in the Life Sciences. Although many research efforts have been made to model this information mathematically, for example by finding patterns and similarities among protein or genome sequences, these approaches often lack structures that address specific biological issues. In this thesis, we present novel computational methods for three fundamental problems in molecular biology: the detection of remote evolutionary relationships among protein sequences, the identification of subtle biological signals in related genome or protein functional sites, and phylogeny reconstruction by means of whole-genome comparisons. The main contribution is a systematic analysis of the patterns that may affect these tasks, leading to the design of practical and efficient new pattern discovery tools. We thus introduce two advanced paradigms of pattern discovery and filtering based on the insight that functional and conserved biological motifs, or patterns, should lie in different sites of sequences. This enables space-conscious approaches that avoid counting the same patterns multiple times. The first paradigm, irredundant common motifs, concerns the discovery of patterns common to two sequences whose occurrences are not covered by other patterns, where coverage is defined in terms of specificity and extension. The second paradigm, underlying motifs, concerns the filtering, from a given set, of patterns whose occurrences do not overlap those of other patterns with higher priority, where priority is defined by lexicographic properties of patterns on the boundary between pattern matching and statistical analysis.
We develop three practical methods directly based on these advanced paradigms. Experimental results indicate that we are able to identify subtle similarities among biological sequences while using the same type of information only once. In particular, we employ the irredundant common motifs and statistics based on these patterns to solve the remote protein homology detection problem. Results show that our approach, called Irredundant Class, outperforms state-of-the-art methods on a challenging benchmark for protein analysis. Next, we establish how to compare and filter a large number of complex motifs (e.g., degenerate motifs) obtained from modern motif discovery tools in order to identify subtle signals in different biological contexts; here we employ the notion of underlying motifs. Tests on large protein families indicate that we drastically reduce the number of motifs that scientists would otherwise have to inspect manually, while highlighting the actual functional motifs reported in the literature. Finally, we combine the two proposed paradigms to allow the comparison of whole genomes, and thus construct a novel and practical distance function. With our method, called Unic Subword Approach, we relate the regions of two genome sequences to each other by selecting motifs conserved during evolution. Experimental results show that our approach outperforms other state-of-the-art methods in whole-genome phylogeny reconstruction of viruses, prokaryotes, and unicellular eukaryotes, and also identifies the major clades of these organisms.
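To make the filtering idea behind underlying motifs concrete, here is a minimal Python sketch; it is not the thesis's algorithm. It works on exact string patterns in a single sequence and uses a hypothetical priority (longer patterns first, then more occurrences, then lexicographic order) as a stand-in for the lexicographic and statistical priority rule described above. The names `find_occurrences`, `underlying_filter`, and the `min_occ` threshold are illustrative assumptions.

```python
from typing import Dict, List


def find_occurrences(seq: str, pattern: str) -> List[int]:
    """Start positions of all exact (possibly self-overlapping) matches of pattern in seq."""
    return [i for i in range(len(seq) - len(pattern) + 1) if seq.startswith(pattern, i)]


def underlying_filter(seq: str, patterns: List[str], min_occ: int = 2) -> Dict[str, List[int]]:
    """Keep each pattern that still has >= min_occ occurrences not overlapping
    positions already claimed by higher-priority patterns.

    Hypothetical priority: longer first, then more occurrences, then lexicographic.
    """
    occ = {p: find_occurrences(seq, p) for p in patterns}
    order = sorted(patterns, key=lambda p: (-len(p), -len(occ[p]), p))
    claimed = [False] * len(seq)  # positions already credited to a kept pattern
    kept: Dict[str, List[int]] = {}
    for p in order:
        # Occurrences of p that do not touch any claimed position.
        free = [i for i in occ[p] if not any(claimed[i:i + len(p)])]
        if len(free) >= min_occ:
            kept[p] = free
            for i in free:
                for j in range(i, i + len(p)):
                    claimed[j] = True
    return kept
```

Because patterns are processed in priority order, each sequence position is credited to at most one retained pattern, which mirrors the "use each piece of information once" idea; occurrences of a single pattern may still overlap one another. For example, on `"ABCXABCYABZ"` with candidates `["ABC", "AB", "XA"]`, only `ABC` survives: the `AB` occurrences inside `ABC` are already claimed, and the one remaining occurrence falls below `min_occ=2`.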
5

Margolin, Yelena 1977. "Analysis of sequence-selective guanine oxidation by biological agents." Thesis, Massachusetts Institute of Technology, 2007. http://hdl.handle.net/1721.1/42381.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Biological Engineering Division, February 2008.
Vita.
Includes bibliographical references.
Oxidatively damaged DNA has been strongly associated with cancer, chronic degenerative diseases and aging. Guanine is the most frequently oxidized base in DNA, and generation of a guanine radical cation (G•+) as an intermediate in the oxidation reaction leads to migration of the resulting cationic hole through the DNA π-stack until it is trapped at the lowest-energy sites. These sites reside at runs of guanines, such as 5'-GG-3' sequences, and are characterized by the lowest sequence-specific ionization potentials (IPs). The charge transfer mechanism suggests that hotspots of oxidative DNA damage induced by electron transfer reagents can be predicted from the primary DNA sequence. However, preliminary data indicated that nitrosoperoxycarbonate (ONOOCO₂⁻), a mediator of chronic inflammation and a one-electron oxidant, displayed unusual guanine oxidation properties, which were the focus of the present work. As a first step, we determined relative levels of guanine oxidation induced by ONOOCO₂⁻ in all possible three-base sequence contexts (XGY) within double-stranded oligonucleotides. These levels were compared to the relative oxidation induced at the same guanines by photoactivated riboflavin, a one-electron reagent. We found that, in agreement with previous studies, photoactivated riboflavin was selective for guanines of lowest IPs located within 5'-GG-3' sequences. In contrast, ONOOCO₂⁻ preferentially reacted with guanines located within 5'-GC-3' sequences, which are characterized by the highest IPs. This demonstrated that sequence-specific IP was not a determinant of guanine reactivity with ONOOCO₂⁻. Sequence selectivities for both reagents were double-strand specific. Selectivity of ONOOCO₂⁻ for 5'-GC-3' sites was also observed in human genomic DNA by ligation-mediated PCR analysis.
Relative yields of different guanine lesions produced by both ONOOCO₂⁻ and riboflavin varied 4- to 5-fold across all sequence contexts. To assess the role of solvent exposure in mediating guanine oxidation by ONOOCO₂⁻, relative reactivities of mismatched guanines with ONOOCO₂⁻ were measured. The majority of the mismatches displayed increased reactivity with ONOOCO₂⁻ compared to fully matched G-C base pairs. The extent of reactivity enhancement was sequence context-dependent, and the greatest enhancement was observed for the conformationally flexible guanine-guanine (G-G) mismatches and for guanines located across from a synthetic abasic site. To test the hypothesis that the negative charge of an oxidant influences its reactivity with guanines in DNA, sequence-selective guanine oxidation by a negatively charged reagent, the Fe²⁺-EDTA complex, was assessed and compared to guanine oxidation produced by a neutral oxidant, γ-radiation. Because both of these agents cause high levels of deoxyribose oxidation, a general method to quantify sequence-specific nucleobase oxidation in the presence of direct strand breaks was developed. This method exploited the activity of exonuclease III (Exo III), a 3' to 5' exonuclease, and used phosphorothioate-modified synthetic oligonucleotides that were resistant to Exo III activity. The method was employed to determine sequence-selective guanine oxidation by the Fe²⁺-EDTA complex and γ-radiation, and showed that both agents produced identical guanine oxidation patterns and were equally reactive with all guanines, irrespective of their sequence-specific IPs or sequence context.
This showed that negative charge was not a determinant of Fe²⁺-EDTA-mediated guanine oxidation. Finally, the role of oxidant binding in nucleobase damage was assessed by studying sequence-selective oxidation produced by DNA-bound Fe²⁺ ions in the presence of H₂O₂. We found that the major oxidation targets were thymines located within 5'-TGG-3' motifs, demonstrating that while guanines were a required element for coordination of Fe²⁺ to DNA, they were not oxidized. Our results suggest that factors other than sequence-specific IPs can act as major determinants of sequence-selective guanine oxidation, and that current models of guanine oxidation and charge transfer in DNA cannot adequately predict the location and identity of mutagenic lesions in the genome.
by Yelena Margolin.
Ph.D.
6

Kim, Eagu. "Inverse Parametric Alignment for Accurate Biological Sequence Comparison." Diss., The University of Arizona, 2008. http://hdl.handle.net/10150/193664.

Abstract:
For as long as biologists have been computing alignments of sequences, the question of what values to use for scoring substitutions and gaps has persisted. In practice, substitution scores are usually chosen by convention, and gap penalties are often found by trial and error. In contrast, a rigorous way to determine parameter values that are appropriate for aligning biological sequences is to solve the problem of Inverse Parametric Sequence Alignment. Given examples of biologically correct reference alignments, this is the problem of finding parameter values that make the examples score as close as possible to optimal alignments of their sequences. The reference alignments that are currently available contain regions where the alignment is not specified, which leads to a version of the problem with partial examples. In this dissertation, we develop a new polynomial-time algorithm for Inverse Parametric Sequence Alignment that is simple to implement, fast in practice, and can learn hundreds of parameters simultaneously from hundreds of examples. Computational results with partial examples show that the best possible values for all 212 parameters of the standard alignment scoring model for protein sequences can be computed from 200 examples in 4 hours on a standard desktop machine. We also consider a new scoring model with a small number of additional parameters that incorporates predicted secondary structure for the protein sequences. By learning parameter values for this new secondary-structure-based model, we can improve on the alignment accuracy of the standard model by as much as 15% for sequences with less than 25% identity.
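Inverse parametric alignment relies on the fact that an alignment's score is a linear function of the scoring parameters, so each alignment can be summarized by how often it uses each parameter. The Python sketch below illustrates this under an assumed model: one symmetric score per unordered amino-acid pair (20 × 21 / 2 = 210 values) plus one gap-open and one gap-extension penalty, which is one way to arrive at 212 parameters. The dissertation's exact gap conventions (for example, treatment of terminal gaps) may differ, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed model: one symmetric substitution score per unordered residue pair...
SUB_PAIRS = list(combinations_with_replacement(AMINO_ACIDS, 2))  # 210 pairs
# ...plus a gap-open and a gap-extension penalty.
N_PARAMS = len(SUB_PAIRS) + 2


def alignment_features(row1: str, row2: str) -> Counter:
    """Count how many times a pairwise alignment uses each parameter.

    Gaps are '-'. A run of L gap characters in one row is assumed to cost
    one 'gap_open' plus L 'gap_extend' terms.
    """
    assert len(row1) == len(row2), "aligned rows must have equal length"
    feats: Counter = Counter()
    in_gap = [False, False]  # whether row1 / row2 is currently inside a gap run
    for a, b in zip(row1, row2):
        if a == "-" and b == "-":
            continue  # ignore all-gap columns
        for k, ch in enumerate((a, b)):
            if ch == "-":
                if not in_gap[k]:
                    feats["gap_open"] += 1
                feats["gap_extend"] += 1
                in_gap[k] = True
            else:
                in_gap[k] = False
        if a != "-" and b != "-":
            feats[tuple(sorted((a, b)))] += 1  # symmetric substitution pair
    return feats


def score(feats: Counter, weights: Dict) -> float:
    """Linear score: substitution weights are added, gap penalties subtracted."""
    total = 0.0
    for key, count in feats.items():
        w = weights.get(key, 0.0)
        total += -w * count if key in ("gap_open", "gap_extend") else w * count
    return total
```

Since `score` is a dot product of the weights with `alignment_features`, requiring a reference alignment to score at least as well as some competing alignment is a linear inequality in the weights, which is what makes the inverse problem amenable to linear-programming formulations.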
7

Behr, Jonathan Robert. "Novel tools for sequence and epitope analysis of glycosaminoglycans." Thesis, Massachusetts Institute of Technology, 2007. http://hdl.handle.net/1721.1/42383.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Biological Engineering Division, 2007.
Includes bibliographical references.
Our understanding of glycosaminoglycan (GAG) biology has been limited by a lack of sensitive and efficient analytical tools designed to deal with these complex molecules. GAGs are heterogeneous, often sulfated, linear polysaccharides found throughout the extracellular environment, and are available to researchers only as limited mixtures. A series of sensitive, label-free analytical tools was developed to provide sequence information and to quantify whole epitopes from GAG mixtures. Three complementary sets of tools were developed to provide GAG sequence information. Two novel exolytic sulfatases from Flavobacterium heparinum that degrade heparin/heparan sulfate glycosaminoglycans (HSGAGs) were cloned and characterized. These exolytic enzymes enabled the exo-sequencing of an HSGAG oligosaccharide. Phenylboronic acids (PBAs) were reacted specifically with unsulfated chondroitin sulfate (CS) disaccharides within a larger mixture. The resulting cyclic esters were easily detected by mass spectrometry (MS) using the distinct isotopic abundance of boron. Electrospray ionization tandem mass spectrometry (ESI-MSⁿ) was employed to determine the fragmentation patterns of HSGAG disaccharides. These patterns were used to quantify relative amounts of isomeric disaccharides in a mixture. Fragmentation information is valuable for building methods for oligosaccharide sequencing, and the general method can be applied to quantify any isomers using MSⁿ. Three other tools were developed to quantify GAG epitopes. Two microfluidic devices were characterized as HSGAG sensors. Sensors were functionalized either with protamine to quantify total HSGAGs or with antithrombin-III (AT-III) to quantify a specific anticoagulant epitope.
A charge-sensitive silicon field effect sensor accurately quantified clinically relevant anticoagulants, including low molecular weight heparins (LMWH), even directly from serum. A mass-sensitive suspended microchannel resonator (SMR) measured the same clinically relevant HSGAGs. When the two sensors were compared, the SMR proved more robust and versatile: its signal is more stable, it can be reused ad infinitum, and its surface modifications can be automated and monitored. The field effect sensor offered an advantage in selectivity by preferentially detecting highly charged HSGAGs rather than massive, non-specifically bound proteins. Lastly, anti-HSGAG single-chain variable fragments (scFv) were evolved using yeast surface display, toward generating antibodies for HSGAG epitope sensing and clinical GAG neutralization.
by Jonathan Robert Behr.
Ph.D.
8

Tångrot, Jeanette. "Structural Information and Hidden Markov Models for Biological Sequence Analysis." Doctoral thesis, Umeå universitet, Institutionen för datavetenskap, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-1629.

Abstract:
Bioinformatics is a fast-developing field that uses computational methods to analyse and structure biological data. An important branch of bioinformatics is structure and function prediction of proteins, which is often based on finding relationships to already characterized proteins. It is known that two proteins with very similar sequences also share the same 3D structure. However, many proteins with similar structures have no clear sequence similarity, which makes these relationships difficult to find. In this thesis, two methods for annotating protein domains are presented, one aiming at assigning the correct domain family or families to a protein sequence, and the other aiming at fold recognition. Both methods use hidden Markov models (HMMs) to find related proteins, and both exploit the fact that structure is more conserved than sequence, but in two different ways. Most of the research presented in the thesis focuses on structure-anchored HMMs, saHMMs. For each domain family, an saHMM is constructed from a multiple structure alignment of carefully selected representative domains, the saHMM-members. These saHMM-members are collected in the so-called "midnight ASTRAL set" and are chosen so that all saHMM-members within the same family have mutual sequence identities below a threshold of about 20%. To construct the midnight ASTRAL set and the saHMMs, a pipeline of software tools was developed. The saHMMs are shown to detect the correct family relationships with very high accuracy, and they perform better than the standard tool Pfam in assigning the correct domain families to new domain sequences. We also introduce the FI-score, which is used to measure the performance of the saHMMs in order to select the optimal model for each domain family. The saHMMs are made available for searching through the FISH server and can be used for assigning family relationships to protein sequences.
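The selection rule for saHMM-members (mutual sequence identity below about 20% within a family) can be sketched as a greedy filter. The Python below is a simplified illustration, not the pipeline from the thesis: it assumes the domains are already aligned (equal-length rows with '-' for gaps), measures identity only over columns where both rows have a residue, and keeps candidates in the order given. `percent_identity` and `select_representatives` are hypothetical names.

```python
from typing import List


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where both rows have a residue ('-' = gap)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def select_representatives(rows: List[str], max_identity: float = 20.0) -> List[int]:
    """Greedily keep a row only if its identity to every already-kept row is below max_identity."""
    kept: List[int] = []
    for i, row in enumerate(rows):
        if all(percent_identity(row, rows[j]) < max_identity for j in kept):
            kept.append(i)
    return kept
```

A greedy pass like this is order-dependent: ranking candidates first (for example by structure quality) decides which member of a near-duplicate pair survives.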
The other approach presented in the thesis is secondary structure HMMs (ssHMMs). These HMMs are designed to use both the sequence and the predicted secondary structure of a query protein when scoring it against the model. A rigorous benchmark is used, which shows that HMMs made from multiple sequences result in better fold recognition than those based on single sequences. Adding secondary structure information to the HMMs improves the ability of fold recognition further, both when using true and predicted secondary structures for the query sequence.
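Scoring a sequence against an HMM usually means computing its likelihood with the forward algorithm. The Python sketch below implements the forward recursion in log space for a small, generic discrete HMM. It is not a profile HMM with match, insert, and delete states like the saHMMs and ssHMMs described above, and any parameters passed to it in examples are invented for illustration.

```python
import math
from typing import Dict, List


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def forward_log_likelihood(seq: str, states: List[str],
                           start: Dict[str, float],
                           trans: Dict[str, Dict[str, float]],
                           emit: Dict[str, Dict[str, float]]) -> float:
    """Return log P(seq | HMM) computed with the forward algorithm."""
    if not seq:
        return 0.0  # by convention, the empty sequence has probability 1
    # alpha[s] = log P(seq[0..t], hidden state at position t = s)
    alpha = {s: safe_log(start[s]) + safe_log(emit[s].get(seq[0], 0.0)) for s in states}
    for sym in seq[1:]:
        alpha = {
            s: logsumexp([alpha[r] + safe_log(trans[r][s]) for r in states])
            + safe_log(emit[s].get(sym, 0.0))
            for s in states
        }
    return logsumexp(list(alpha.values()))
```

Working in log space avoids numerical underflow on long sequences. In homology search, such a log-likelihood is typically compared with that of a background (null) model to obtain a log-odds score.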
Jeanette Hargbo.
9

Won, Kyoung-Jae. "Exploring the structure of Hidden Markov Models for biological sequence analysis." Thesis, University of Southampton, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.427702.

10

Törnkvist, Maria. "Synovial sarcoma: molecular, biological and clinical implications." Stockholm, 2004. http://diss.kib.ki.se/2004/91-7140-024-9/.

11

Schliep, Alexander. "A Bayesian approach to learning Hidden Markov model topology with applications to biological sequence analysis." [S.l. : s.n.], 2002. http://deposit.ddb.de/cgi-bin/dokserv?idn=964626330.

12

Roth, Christian. "Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts." Göttingen: Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2021. http://nbn-resolving.de/urn:nbn:de:gbv:7-21.11130/00-1735-0000-0008-5912-0-2.

13

Fält, Susann. "Analysis of global gene expression in complex biological systems using microarray technology." Stockholm, 2006. http://diss.kib.ki.se/2006/91-7140-612-3/.

14

Bajak, Edyta Zofia. "Genotoxic stress: novel biomarkers and detection methods: uncovering RNAs role in epigenetics of carcinogenesis." Stockholm, 2005. http://diss.kib.ki.se/2005/91-7140-415-5/.

15

Grigolon, Silvia. "Modelling and inference for biological systems : from auxin dynamics in plants to protein sequences." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112178/document.

Abstract:
All biological systems are made of atoms and molecules interacting in a non-trivial manner. Such non-trivial interactions induce complex behaviours allowing organisms to fulfill all their vital functions. These features can be found in all biological systems at different levels, from molecules and genes up to cells and tissues. In the past few decades, physicists have paid much attention to these intriguing aspects by framing them in network approaches, for which a number of theoretical methods offer powerful ways to tackle systemic problems. At least two different ways of approaching these challenges may be considered: direct modeling methods and approaches based on inverse methods. In this thesis, we use both to study three problems occurring at three different biological scales. In the first part of the thesis, we mainly deal with the very early stages of tissue development in plants. We propose a model aimed at understanding which features drive the spontaneous collective behaviour in space and time of PINs, the transporters that pump the phytohormone auxin out of cells. In the second part, we focus instead on the structural properties of proteins. In particular, we ask how conservation of protein function across different organisms constrains the evolution of protein sequences and their diversity. Here we propose a new method to extract the sequence positions most relevant for protein function. Finally, in the third part, we study intracellular molecular networks that implement auxin signaling in plants. In this context, using extensions of a previously published model, we examine how network structure affects network function. The comparison of different network topologies provides insights into the role of different modules, and of a negative feedback loop in particular.
Our introduction of the dynamical response function allows us to characterize the systemic properties of auxin signaling when external stimuli are applied.
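The thesis proposes its own inference method for extracting functionally relevant sequence positions; it is not reproduced here. As a point of reference only, a classical baseline scores each column of a multiple sequence alignment by the Shannon entropy of its residue distribution, with low entropy indicating strong conservation. The minimal Python sketch below implements that baseline; the function names and the choice to ignore gaps are assumptions.

```python
import math
from collections import Counter
from typing import List


def column_entropies(msa: List[str], gap: str = "-") -> List[float]:
    """Shannon entropy (bits) of the residue distribution in each alignment column.

    Gaps are ignored; columns containing no residues get +inf so they never
    rank as conserved.
    """
    n_cols = len(msa[0])
    assert all(len(row) == n_cols for row in msa), "rows must be aligned to equal length"
    entropies: List[float] = []
    for j in range(n_cols):
        counts = Counter(row[j] for row in msa if row[j] != gap)
        total = sum(counts.values())
        if total == 0:
            entropies.append(math.inf)
            continue
        entropies.append(-sum((c / total) * math.log2(c / total) for c in counts.values()))
    return entropies


def most_conserved(msa: List[str], k: int) -> List[int]:
    """Indices of the k lowest-entropy (most conserved) columns, ties broken by position."""
    ent = column_entropies(msa)
    return sorted(range(len(ent)), key=lambda j: (ent[j], j))[:k]
```

Entropy treats every column independently and ignores phylogenetic redundancy among sequences, which is one reason more elaborate, model-based inference methods exist.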
16

Cao, Haibo. "Protein Structure Recognition From Eigenvector Analysis to Structural Threading Method." Washington, D.C. : Oak Ridge, Tenn. : United States. Dept. of Energy. Office of Science ; distributed by the Office of Scientific and Technical Information, U.S. Dept. of Energy, 2003. http://www.osti.gov/servlets/purl/822060-2L2Xvm/native/.

Abstract:
Thesis (Ph.D.); Submitted to Iowa State Univ., Ames, IA (US); 12 Dec 2003.
Published through the Information Bridge: DOE Scientific and Technical Information. "IS-T 2028" Haibo Cao. 12/12/2003. Report is also available in paper and microfiche from NTIS.
17

Wang, Bo. "Novel statistical methods for evaluation of metabolic biomarkers applied to human cancer cell lines." Miami University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=miami1399046331.

18

Khodji, Hiba. "Apprentissage profond et transfert de connaissances pour la détection d'erreurs dans les séquences biologiques" (Deep learning and knowledge transfer for error detection in biological sequences). Electronic Thesis or Diss., Strasbourg, 2023. http://www.theses.fr/2023STRAD058.

Abstract:
The widespread use of high-throughput technologies in the biomedical field is producing massive amounts of data, notably through next-generation genome sequencing technologies. Multiple Sequence Alignment (MSA) serves as a fundamental tool for analyzing these data, with applications including genome annotation, protein structure and function prediction, and the study of evolutionary relationships. However, the accuracy of MSAs is often compromised by factors such as unreliable alignment algorithms, inaccurate gene prediction, or incomplete genome sequencing. This thesis addresses data quality assessment using deep learning techniques. We propose novel models based on convolutional neural networks for identifying errors in visual representations of MSAs. Our primary objective is to assist domain experts in research where the accuracy of MSAs is crucial. We therefore focused on providing reliable explanations for our model predictions by harnessing explainable artificial intelligence (XAI). In particular, we used visual explanations as the foundation of a transfer learning framework that aims primarily to improve a model's ability to focus on the most relevant features of its input. Finally, we proposed novel evaluation metrics designed to assess this ability. Initial findings suggest that our approach achieves a good balance between model complexity, performance, and explainability, and could be used in domains where data availability is limited and comprehensive explanation of results is paramount.
19

Grandin, Lori Cristina. "Aplicações de modelos logisticos regressivos em biologia molecular" [Applications of regressive logistic models in molecular biology]. [s.n.], 2006. http://repositorio.unicamp.br/jspui/handle/REPOSIP/307187.

Abstract:
Advisor: Hildete Prisco Pinheiro
Master's dissertation, Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica
Abstract: Advances in gene sequencing have stimulated the development of new statistical techniques for analyzing genetic data. In this work, the regressive logistic models introduced by Bonney (1986) are first presented in the context of family data analysis and are then used to analyze codon frequencies in mitochondrial DNA sequences. Assuming independence among the nucleotides within a codon may be too strong an assumption, that is, biologically unrealistic. Several dependence structures are therefore presented for analyzing codon frequencies; for example, a first-order Markov structure may adequately explain the dependence among the bases within a codon. The log-likelihood function is evaluated, and several comparisons are made to determine the most parsimonious model. These models are applied to real NADH4 gene sequence data from the human mitochondrial genome.
Master's degree
Biostatistics
Master of Statistics
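The abstract above compares an independence assumption with dependence structures such as a first-order Markov chain within the codon, selecting the most parsimonious model from log-likelihoods. Bonney's regressive logistic models themselves are not reproduced here. The sketch below is a simpler stand-in, assumed for illustration: it fits both structures by maximum likelihood from codon counts and compares them with AIC, a criterion chosen here because the abstract does not name the one used in the thesis.

```python
import math
from collections import Counter


def loglik_independent(codons: list[str]) -> float:
    """Max log-likelihood when the three codon positions are independent,
    each with its own base frequencies (at most 3 positions x 3 free parameters = 9)."""
    n = len(codons)
    total = 0.0
    for pos in range(3):
        counts = Counter(c[pos] for c in codons)
        total += sum(k * math.log(k / n) for k in counts.values())
    return total


def loglik_markov1(codons: list[str]) -> float:
    """Max log-likelihood under a first-order Markov chain within the codon:
    P(b1) * P(b2 | b1) * P(b3 | b2)  (at most 3 + 2 x 4 x 3 = 27 free parameters)."""
    n = len(codons)
    first = Counter(c[0] for c in codons)
    total = sum(k * math.log(k / n) for k in first.values())
    for pos in (1, 2):
        pairs = Counter((c[pos - 1], c[pos]) for c in codons)
        prev = Counter(c[pos - 1] for c in codons)
        total += sum(k * math.log(k / prev[a]) for (a, _), k in pairs.items())
    return total


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion; lower values indicate a better fit/complexity trade-off."""
    return 2 * n_params - 2 * loglik


codons = ["ATG", "ACC", "ATA", "GCC", "ACA", "TTT"]  # hypothetical codon sample
print(aic(loglik_independent(codons), 9), aic(loglik_markov1(codons), 27))
```

Because the independence model is nested in the Markov model, the Markov log-likelihood is never lower; the penalty term is what lets the simpler model win when the extra dependence is not supported by the data.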
20

Zhi, Degui. "Discovery and analysis of mosaic arrangements in biological sequences and structures." 2006. http://wwwlib.umi.com/cr/ucsd/fullcit?p3194816.

Abstract:
Thesis (Ph. D.)--University of California, San Diego, 2006.
Title from first page of PDF file (viewed February 28, 2006). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references (p. 140-156).
21

Barton, Carl Samuel. "Algorithmic problems in strings with applications to the analysis of biological sequences." Thesis, King's College London (University of London), 2015. http://kclpure.kcl.ac.uk/portal/en/theses/algorithmic-problems-in-strings-with-applications-to-the-analysis-of-biological-sequences(461c8961-c256-4ff8-97f7-c0718709367d).html.

Abstract:
Recent advances in molecular biology have dramatically changed the way biological data analysis is performed [119, 92]. Next-Generation Sequencing (NGS) technologies produce high-throughput data of highly controlled quality, hundreds of times faster and cheaper than a decade ago. Mapping short reads to a reference sequence is a fundamental problem in NGS. After an occurrence of a high-quality fragment of the read has been found, the rest of the read must be approximately aligned, but a good alignment would not be expected to contain a large number of gaps (runs of consecutive insertions or deletions). We present an alternative alignment algorithm that computes the optimal alignment with a bounded number of gaps. Another problem arising from NGS technologies is merging overlapping reads into a single string. We present a data structure that allows the efficient computation of the overlaps between two strings and is also applicable to other problems. Weighted strings are a data representation that allows ambiguity in strings to be expressed in a nuanced way. In this document we present algorithms for further problems on weighted strings: computing exact and approximate inverted repeats, computing repetitions, and computing covers. We also investigate the average-case complexity of wildcard matching. Wildcards can be used to model single nucleotide polymorphisms, so efficient algorithms to search for strings with wildcards are needed; we investigate how efficient such algorithms can be on average. Many organisms, such as viruses, bacteria, eukaryotic cells, and archaea, have a circular DNA structure. If a biologist wishes to find occurrences of a particular virus in a carrier's DNA sequence, which may not be circular, it must be possible to locate occurrences of circular strings efficiently. In this document we present a number of algorithms for circular string matching.
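The last problem in this abstract, circular string matching, asks for every position where any rotation of a pattern occurs in a text. The thesis presents dedicated, more efficient algorithms that are not reproduced here; the naive baseline below is a sketch for illustration only. It relies on the fact that every rotation of a length-m pattern P is a length-m substring of PP.

```python
def circular_occurrences(text: str, pattern: str) -> list[int]:
    """Start positions in `text` where some rotation of `pattern` occurs exactly."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    doubled = pattern + pattern  # every rotation of pattern is a length-m window of this
    rotations = {doubled[i:i + m] for i in range(m)}
    # Naive scan: O(n * m) time in the worst case, since each window is sliced and hashed.
    return [i for i in range(len(text) - m + 1) if text[i:i + m] in rotations]


print(circular_occurrences("GGTACAAC", "ACGT"))  # [1]  ("GTAC" is a rotation of "ACGT")
```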
22

Fortino, Vittorio. "Sequence analysis in bioinformatics: methodological and practical aspects." Doctoral thesis, Universita degli studi di Salerno, 2013. http://hdl.handle.net/10556/985.

Abstract:
2011 - 2012
My PhD research activities have focused on the development of new computational methods for biological sequence analysis. To address a problem intrinsic to protein sequence analysis, namely inferring homologies in large protein databases from short queries, I developed a BLAST-based statistical framework to detect distant homologies conserved in the transmembrane domains of different bacterial membrane proteins. Using this framework, the transmembrane protein domains of all Salmonella spp. were screened and more than five thousand significant homologies were identified. My results show that the proposed framework detects distant homologies that, because of their conservation in distinct bacterial membrane proteins, could represent ancient signatures of primeval genetic elements (or mini-genes) coding for short polypeptides that formed more complex genes through a primitive assembly process. Further, my statistical framework lays the foundation for new bioinformatics tools that detect domain-oriented homologies, that is, statistically significant homologies within specific target domains. The second problem I addressed is the analysis of transcripts obtained from RNA-Seq data. I developed a novel computational method that combines transcript borders, obtained from mapped RNA-Seq reads, with sequence-feature-based operon predictions to accurately infer operons in prokaryotic genomes. Since the transcriptome of an organism is dynamic and condition-dependent, the mapped RNA-Seq reads are used to determine a set of confirmed or predicted operons; from this set, specific transcriptomic features are extracted and combined with standard genomic features to train and validate three operon classification models (Random Forests, RFs; Neural Networks, NNs; and Support Vector Machines, SVMs).
These classifiers were used to refine the operon map annotated by DOOR, one of the most widely used databases of prokaryotic operons. This method showed that integrating genomic and transcriptomic features improves the accuracy of operon predictions and that it is possible to predict the existence of potential new operons. An inherent limitation of using RNA-Seq to improve operon structure predictions is that it cannot be applied to genes that are not expressed under the condition studied. I evaluated my approach on different RNA-Seq-based transcriptome profiles of Histophilus somni and Porphyromonas gingivalis, obtained using either the standard or the strand-specific RNA-Seq method. My experimental results demonstrate that the three classifiers achieved accurate operon maps, including reliable predictions of new operons. [edited by author]
XI n.s.
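The homology framework above is described as BLAST-based, so its significance calls presumably build on BLAST's Karlin-Altschul statistics; the abstract does not give the framework's own scoring. As background only, the sketch below computes the E-value and bit score from a raw alignment score. The lambda and K values are placeholders chosen for illustration (real values depend on the substitution matrix and gap penalties), and BLAST's effective-length corrections are omitted.

```python
import math


def karlin_altschul_evalue(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance local alignments with raw score >= score:
    E = K * m * n * exp(-lambda * S), for query length m and database length n."""
    return k * m * n * math.exp(-lam * score)


def bit_score(score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2, so that E = m * n * 2**(-S')."""
    return (lam * score - math.log(k)) / math.log(2)


# Illustrative parameters only; not taken from the thesis.
LAM, K = 0.267, 0.041
print(karlin_altschul_evalue(40, 30, 10_000_000, LAM, K), bit_score(40, LAM, K))
```

A short query cannot accumulate a high raw score, so even a perfect match to a short segment may not reach a small E-value; this is the kind of limitation the abstract's framework is described as addressing.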
23

Rosen, Gail L. "Signal processing for biologically-inspired gradient source localization and DNA sequence analysis." Diss., Georgia Institute of Technology, 2006. http://etd.gatech.edu/theses/available/etd-07102006-123527/.

Abstract:
Thesis (Ph. D.)--Electrical and Computer Engineering, Georgia Institute of Technology, 2007.
Oliver Brand, Committee Member ; James H. McClellan, Committee Member ; Paul Hasler, Committee Chair ; Mark T. Smith, Committee Member ; David Anderson, Committee Member.
24

Bao, Yu. "Identification and Analysis of Critical Sites in RNA/Protein Sequences and Biological Networks." Kyoto University, 2018. http://hdl.handle.net/2433/235113.

25

Lindskog, Mats. "Computational analyses of biological sequences -applications to antibody-based proteomics and gene family characterization." Doctoral thesis, KTH, School of Biotechnology (BIO), 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-527.

Abstract:

Following the completion of the human genome sequence, post-genomic efforts have shifted the focus towards the analysis of the encoded proteome. Several different systematic proteomics approaches have emerged, for instance antibody-based proteomics initiatives, in which antibodies are used to functionally explore the human proteome. One such effort is HPR (the Swedish Human Proteome Resource), where affinity-purified polyclonal antibodies are generated and subsequently used for protein expression and localization studies in normal and diseased tissues. The antibodies are directed towards protein fragments, PrESTs (Protein Epitope Signature Tags), which are selected according to criteria favourable for subsequent laboratory procedures.

This thesis describes the development of novel software (Bishop) to facilitate the selection of suitable protein fragments and to ensure high-throughput processing of the selected target proteins. The majority of proteins were successfully processed by this approach; however, the design strategy resulted in a number of fall-outs. These comprised alternative splice variants, as well as proteins with high sequence similarity to other human proteins, and alternative strategies were developed for processing them. The strategy for handling alternative splice variants included the development of additional software and was validated by comparing the immunohistochemical staining patterns obtained with antibodies generated against the same target protein. Processing of proteins with high sequence similarity was enabled by assembling human proteins into clusters according to their pairwise sequence identities. Each cluster was represented by a single PrEST located in the region of highest sequence similarity among all cluster members, thereby representing the entire cluster. This strategy was validated by Western blot analysis, in which antibodies directed against such cluster-specific PrESTs identified all proteins within a cluster. In addition, PrEST design success rates were evaluated for more than 4,000 genes.
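The abstract describes assembling human proteins into clusters by pairwise sequence identity so that one PrEST can represent each cluster. Neither the clustering rule nor the identity threshold is given. The sketch below assumes single-linkage grouping (any pair above the threshold joins two clusters) over sequences that are already aligned to equal length; in practice identities would come from pairwise alignments, which this sketch does not compute. Names, sequences, and the 80% threshold are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identical, non-gap positions; assumes a and b are already aligned to equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be pre-aligned, equal-length and non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def cluster_by_identity(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Single-linkage clusters: two proteins are joined when identity >= threshold (%)."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the union-find trees shallow
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if percent_identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Hypothetical, pre-aligned fragments; 80% is an illustrative threshold only.
proteins = {"p1": "MKTAYIAK", "p2": "MKTAYIAR", "p3": "GGGGGGGG"}
print(cluster_by_identity(proteins, threshold=80.0))
```

Single linkage can chain dissimilar proteins through intermediates, so a real design pipeline might prefer a stricter rule before placing one shared PrEST per cluster.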

Several genomes other than the human genome have been completed; currently, more than 300 genomes are fully sequenced. Following the release of the genome of the model tree black cottonwood (Populus trichocarpa), a bioinformatic analysis identified previously unknown cellulose synthases (CesAs), revealing a total of 18 CesA family members. These genes are thought to have arisen from several rounds of genome duplication. This number is considerably higher than in other plant genomes studied previously, which contain only ten CesA family members. Moreover, the identification of corresponding orthologous ESTs from the closely related hybrid aspen (P. tremula x tremuloides) for two pairs of CesAs suggests that they are actively transcribed. This indicates that a number of paralogs have preserved their functionality following extensive genome duplication events in the tree's evolutionary history.

26

Patavino, Claudio <1981>. "Core Genome Multilocus Sequence Typing and Single Nucleotide Polymorphism Analysis in the Epidemiology of Brucella melitensis Infections." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amsdottorato.unibo.it/9076/1/Thesis_PhD_Claudio_Patavino.pdf.

Abstract:
Whole-genome sequencing (WGS) with next-generation sequencing (NGS) technologies has become a widely accepted method in microbiology laboratories for molecular typing, outbreak tracing, and genomic epidemiology. Several studies have demonstrated the usefulness of WGS data analysis through reference-based single-nucleotide polymorphism (SNP) calling for Brucella melitensis, whereas gene-by-gene comparison through core-genome multilocus sequence typing (cgMLST) had not been explored so far. The current study developed an allele-based cgMLST method and compared its performance with that of the genome-wide SNP approach and the traditional multilocus variable-number tandem repeat analysis (MLVA) on a defined sample collection. The data set comprised 37 epidemiologically linked animal cases of brucellosis as well as 71 isolates of unknown epidemiological status, including human and animal samples collected in Italy. The cgMLST scheme generated in this study contained 2,704 targets of the B. melitensis 16M reference genome. We established as potential criteria for including an isolate in a brucellosis outbreak cluster a threshold of 6 loci in cgMLST and 7 in WGS SNP analysis. Higher phylogenetic distance resolution was achieved with cgMLST and SNP analysis than with MLVA, particularly for strains belonging to the same lineage, allowing diverse and unrelated genotypes to be identified with greater confidence. Applying the cgMLST scheme to the characterization of B. melitensis strains provided insights into the epidemiology of this pathogen, and the scheme is a candidate benchmark tool for outbreak investigations in human and animal brucellosis.
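The abstract reports a cgMLST threshold of 6 loci for including an isolate in an outbreak cluster, but not how the threshold is applied. The sketch below assumes single-linkage grouping on pairwise allelic distances, with loci missing in either isolate skipped pairwise. Allele profiles are assumed to be lists of integer allele IDs, one per cgMLST target, with `None` for a missing call; the isolate names in the example are hypothetical.

```python
from itertools import combinations


def allele_distance(p: list, q: list) -> int:
    """Loci with different allele calls; loci missing (None) in either isolate are skipped."""
    return sum(1 for a, b in zip(p, q) if a is not None and b is not None and a != b)


def outbreak_clusters(profiles: dict[str, list], max_diff: int = 6) -> list[set[str]]:
    """Connected components of the graph linking isolates that differ at <= max_diff loci."""
    neighbours = {name: set() for name in profiles}
    for a, b in combinations(profiles, 2):
        if allele_distance(profiles[a], profiles[b]) <= max_diff:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, clusters = set(), []
    for start in profiles:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


isolates = {"i1": [1, 2, 3, 4], "i2": [1, 2, 3, 5], "i3": [9, 9, 9, 9]}  # hypothetical
print(outbreak_clusters(isolates, max_diff=1))
```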
27

Lerario, Antonio Marcondes. "Perfis de expressão de genes relacionados a metástases em uma coorte de pacientes adultos e pediátricos portadores de neoplasias do córtex da supra-renal" [Expression profiles of metastasis-related genes in a cohort of adult and pediatric patients with adrenocortical neoplasms]. Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/5/5135/tde-24112008-171634/.

Abstract:
Adrenocortical carcinoma (ACC) is a rare neoplasm with a poor prognosis. Although molecular studies have uncovered many aspects of ACC tumorigenesis, little is known about the molecular pathways involved in metastatic spread. The objective of our study was to analyze the expression profile of metastasis-related genes in a cohort of metastatic and nonmetastatic adrenocortical tumors in order to identify genes involved in metastatic spread and to find new prognostic markers. The expression profiles of 27 adrenocortical tumors from 15 adults (8 ACC and 7 adenomas) and 12 children (5 metastatic and 7 nonmetastatic) were evaluated with an array of 113 genes known to be involved in human metastasis. Cluster analysis showed that adult adrenocortical adenomas form a group distinct from the other adrenocortical tumors (adult carcinomas and pediatric tumors). The comparison of adult adenomas and ACC revealed that MMP11 and DENR were differentially expressed between these two groups, while no gene was differentially expressed among pediatric adrenocortical tumors. As with cluster analysis, principal component analysis failed to partition the pediatric tumors by clinical course. The RT-PCR expression data for the MMP2, TIMP3, and FN1 genes agreed with those generated by the arrays. LOH of the 22q12.3 region was detected in some, but not all, cases in which TIMP3 downregulation was observed. In conclusion, we have identified important aspects of the molecular pathways and biological characteristics involved in the metastatic spread of adrenocortical tumors. Distinctive patterns of gene expression between metastatic and nonmetastatic tumors may help in prognosis prediction.
28

Piccinini, S. "ZEIN CODING SEQUENCE ANALYSES FOR MAIZE GENOTYPING AND ZEIN PROTEIN MANIPULATION TOWARDS THE IMPROVEMENT OF THE MAIZE SEED PROTEIN QUALITY." Doctoral thesis, Università degli Studi di Milano, 2014. http://hdl.handle.net/2434/241132.

Abstract:
Maize (Zea mays) is an important source of protein for human and animal nutrition. However, because they lack lysine and are low in methionine and tryptophan, maize proteins are of low nutritional quality. These deficiencies result mainly from the low levels of these essential amino acids in the zein storage proteins, which account for 50% of the total protein in the mature seed. In this context, the first aim of this PhD thesis was to develop artificial zein genes encoding polypeptides with a higher lysine and methionine content that can be sorted and correctly accumulated in the endosperm, as natural zein polypeptides are. Two strategies were employed for maize biofortification. First, we exploited the natural heterogeneity among α-zein genes to create a synthetic gene, ZRK, in which six arginine residues were substituted with lysine. Then, by combining the methionine-rich N-terminal G3 sequence with the lysine-rich C-terminal region of maize Histone3 and Histone4, we created the artificial genes G3H3 and G3H4, respectively. In vitro and in vivo expression analyses of these genes showed that all the synthetic proteins are synthesized and accumulate in the ER membranes of either the rabbit reticulocyte/canine membrane system or transformed tobacco protoplasts. The second aim of this thesis was to use the wide heterogeneity of the zein gene family to obtain an intra-species recognition tool, or individual barcode, for discriminating inbred lines and Lombard varieties. Lombard varieties and maize inbreds were analysed by 2D gel protein fractionation and DNA gel blot analysis. For each genotype, the 2D and Southern blot patterns were converted into a binary code and then into a barcode. With both approaches, each genotype was uniquely identified, making zeins a valuable tool for identifying maize germplasm.
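The abstract states that each genotype's 2D gel and Southern blot patterns were converted into a binary code and then into a barcode. A minimal sketch of the first step and of the uniqueness check, assuming every genotype is scored for the same ordered list of bands; the graphical barcode rendering is not shown, and the function and genotype names are hypothetical.

```python
def band_barcode(bands_present: list[bool]) -> str:
    """Binary code for one genotype: '1' where a scored band is present, '0' where absent."""
    return "".join("1" if present else "0" for present in bands_present)


def find_shared_barcodes(patterns: dict[str, list[bool]]) -> dict[str, list[str]]:
    """Group genotypes by barcode and return only codes shared by more than one genotype.
    An empty result means every genotype is uniquely identified."""
    groups: dict[str, list[str]] = {}
    for genotype, bands in patterns.items():
        groups.setdefault(band_barcode(bands), []).append(genotype)
    return {code: names for code, names in groups.items() if len(names) > 1}


scored = {  # hypothetical presence/absence calls for four bands
    "inbred_A": [True, False, True, True],
    "variety_B": [True, True, False, True],
}
print({g: band_barcode(b) for g, b in scored.items()}, find_shared_barcodes(scored))
```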
29

Coppe, Alessandro. "A bioinformatic and computational approach to regulation of genome function: integrated analysis of genome organization, promoter sequences and gene expression." Doctoral thesis, Università degli studi di Padova, 2008. http://hdl.handle.net/11577/3426395.

Abstract:
Although much is known about the regulation of gene expression in both prokaryotes and eukaryotes, this complex and fascinating mechanism remains to be fully elucidated. The relatively recent advent of high-throughput techniques for studying transcription has made available vast amounts of data that can be used for genome-wide analysis with bioinformatics approaches. These computational methods have now become an integral part of biological research. The topics of this thesis concern the development and application of computational methodologies to better understand the basis of gene expression regulation at different levels. A first level of investigation concerned the relationships among chromosomal structure, expression profile, and functional characteristics, focusing on genomic organization and structure. For this task, the REEF (REgionally Enriched Features) software was developed to identify genomic regions enriched in specific features, such as a class or group of genes homogeneous for expression and/or functional characteristics. REEF can be used to detect density variations of specific features along the genome sequence, for example genomic regions with significant enrichment of genes that are co-expressed, differentially expressed, or related to particular molecular functions. Local feature enrichment is computed genome-wide over sliding windows with a test statistic based on the hypergeometric distribution, and the false discovery rate is used to control for multiple testing. REEF has been applied to the study of the genomic distribution of tissue-specific genes and to the analysis of genes differentially expressed between different myeloid cell lines. These analyses identified clusters of tissue-specific genes in the human genome and positional enrichment of genes related to hemopoietic functional modules. The second level of investigation concerned gene expression regulation at the promoter level.
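REEF's enrichment test is described here only at a high level: a hypergeometric statistic over sliding windows, with false discovery rate control. The sketch below shows one way such a scan can be set up and is not REEF's implementation. It assumes windows measured in consecutive genes rather than base pairs, an upper-tail hypergeometric p-value per window, and the Benjamini-Hochberg procedure for FDR control; the gene flags are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K feature genes, n genes drawn)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total


def window_pvalues(is_feature: list[bool], window: int, step: int) -> list[tuple[int, float]]:
    """Slide a window (in genes) along genes ordered by position; test enrichment in each."""
    N, K = len(is_feature), sum(is_feature)
    results = []
    for start in range(0, N - window + 1, step):
        k = sum(is_feature[start:start + window])
        results.append((start, hypergeom_sf(k, N, K, window)))
    return results


def benjamini_hochberg(pvals: list[float], q: float = 0.05) -> list[bool]:
    """Flag p-values rejected at false discovery rate q (Benjamini-Hochberg step-up)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            cutoff = rank  # largest rank whose p-value clears its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= cutoff
    return rejected


flags = [False] * 20 + [True] * 5 + [False] * 25  # hypothetical tissue-specific genes
pvals = window_pvalues(flags, window=5, step=1)
significant = benjamini_hochberg([p for _, p in pvals], q=0.05)
```

Overlapping windows produce correlated tests; Benjamini-Hochberg is known to remain valid under positive dependence, which is one reason it is a common choice for this kind of scan.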
Unknown transcription factor binding sites might be detected by searching for shared sequence elements in the upstream regulatory regions of genes with a common biological function and/or similar expression profiles. In fact, genes with similar expression are frequently co-regulated, and genes with related functions are often similarly expressed. New methodologies for identifying regulatory motifs in human promoters were developed and tested. Since a drawback of this approach is the exceedingly high number of results, biological knowledge was used both before and after automated pattern discovery to define a “sheltered environment” that enhances the specificity of the computational analysis. The COOP (Clustering of Overlapping Patterns) software for extracting sequence motifs was developed and used to analyze the 1 kb genomic sequences upstream of 91 retina-specific genes, identifying a set of putative regulatory motifs that occur frequently in retina promoter sequences. Most of them are located in the proximal portion of promoters and tend to be less variable in their central region than in their flanking regions, and some are similar to known regulatory sequences. The performance of COOP was further evaluated by simulation and by applying it to a standard positive control dataset proposed by Tompa and colleagues for the systematic evaluation and comparison of pattern discovery software. MOST (MOtif Searching web Tool), a web tool for predicting functional elements in promoter sequences, was applied to different datasets under various test conditions in order to study the influence of specific search parameters on the results.
Two groups of promoter sequences containing known regulatory signals were used as positive control datasets: the public yeast benchmark dataset of Tompa and colleagues and a custom-produced dataset of 37 human promoter sequences, subgroups of which contained instances of one of nine different signals. Testing the method on these benchmark datasets gave quite positive results. Taking the concepts behind COOP to a new level, a more rigorous methodology was developed for identifying surprising and putatively regulatory motifs by comparing their frequency in the promoter sequences of co-expressed genes with that in a background set of sequences representative of all human gene promoters. Promoter sequences are divided into overlapping regions, considered independently, to identify positional bias in the arrangement of transcription factor binding sites along promoters. Because of the genome-wide nature of this approach, a new web tool for the automatic identification and retrieval of large numbers of promoters in the human genome was also developed. This motif discovery methodology was adopted to investigate the structure of promoters of genes crucial during myeloid differentiation.
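The last methodology in this abstract compares motif frequencies in promoters of co-expressed genes against a background set of human promoters. As a simplified illustration, and not COOP or MOST, the sketch below counts exact k-mers in both sets and ranks them by the ratio of their pseudocount-smoothed relative frequencies. The sequences are hypothetical; a real analysis would add a significance test, consider both strands, and allow degenerate motifs.

```python
from collections import Counter


def kmer_counts(seqs: list[str], k: int) -> Counter:
    """Count every overlapping k-mer across a set of sequences, skipping k-mers containing N."""
    counts: Counter = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def enrichment(fore: list[str], back: list[str], k: int, pseudo: float = 1.0) -> list[tuple[str, float]]:
    """Rank k-mers by smoothed foreground frequency divided by smoothed background frequency."""
    fc, bc = kmer_counts(fore, k), kmer_counts(back, k)
    ftot, btot = sum(fc.values()), sum(bc.values())
    kmers = set(fc) | set(bc)
    n_kmers = len(kmers)
    scores = {}
    for kmer in kmers:
        f = (fc[kmer] + pseudo) / (ftot + pseudo * n_kmers)
        b = (bc[kmer] + pseudo) / (btot + pseudo * n_kmers)
        scores[kmer] = f / b
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


coexpressed = ["TATAAT", "GGTATAAT"]  # hypothetical foreground promoters
background = ["GCGCGC", "CGCGCG"]     # hypothetical background promoters
print(enrichment(coexpressed, background, k=4)[:3])
```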
30

Leggio, Loredana. "Functional analysis of two alternative transcripts from porin1 gene of Drosophila melanogaster and involvement of corresponding 5'UTR sequences in the translation control." Doctoral thesis, Università di Catania, 2017. http://hdl.handle.net/10761/3935.

Abstract:
VDAC (voltage-dependent anion-selective channel) is a pore-forming protein located in the outer mitochondrial membrane (OMM) of all eukaryotic organisms. This protein allows the passage of nucleotides, ions, and small metabolites between the cytosol and mitochondria. VDAC has a beta-barrel structure, with its N-terminus forming an alpha-helix that is inserted into the pore and involved in the gating process. VDAC is maximally open at voltages around 0 mV, while at voltages above 20 mV in absolute value, whether positive or negative, it switches to a closed state. The crucial role of this channel derives from its strategic position, which allows it to interact with many enzymes, proteins, or metabolites directly or indirectly involved in several cellular pathways, thus explaining its involvement in many diseases. In Drosophila melanogaster, the gene encoding the principal isoform of VDAC is porin1, which consists of five exons, of which exons 1A and 1B, both 5′ UTR sequences, are mutually exclusive alternatives. Through alternative splicing, two transcripts are produced that contain either exon 1A or exon 1B at the 5′ end, followed by the same coding sequence. The alternative transcripts 1A-VDAC and 1B-VDAC are produced at all developmental stages of the fly and in all tissues. Previous work from our team showed that the 1B-VDAC transcript is unproductive because it is not translated. This result led us to speculate that the 1B-VDAC transcript has a cellular function different from that of the canonical 1A-VDAC mRNA. Considering all known data, the main objectives of my thesis work were: 1) to understand the molecular mechanisms responsible for the failure of 1B-VDAC mRNA translation; 2) to investigate the significance of the alternative, unproductive 1B-VDAC mRNA in D. melanogaster.
Using several model systems, such as Saccharomyces cerevisiae Δpor1, HeLa cells and Drosophila melanogaster embryonic SL2 cells, we obtained important results that lead us to formulate the following hypotheses. In yeast, the inhibitory effect of the 5' UTR 1B sequence on translation is probably associated with the action of specific RNA-binding proteins (RBPs) able to bind its internal sequence 16-31. In yeast this mechanism alone is sufficient to repress translation of a reporter-gene coding sequence as well as of the full-length porin1 mRNA, demonstrating that the 5' UTR 1B contains all the information necessary to inhibit protein synthesis in yeast. In Drosophila, the 3' UTR of the 1B-VDAC transcript is indispensable for the translational repression of that transcript: in fly cells the 1B-Luc construct is never expressed, whereas the same 5' UTR 1B fused to the porin1 coding sequence does not affect porin translation. The 5' UTR 1A is, in general, a sequence able to enhance translation of any coding sequence fused to it: fusing the 5' UTR 1A to reporter-gene coding sequences always produced a marked increase in expression of the corresponding protein. This effect is not detectable in fly cells, where transfection with the heterologous 1A-porin transcript does not increase the endogenous amount of VDAC protein. In Drosophila, the 3' UTR of the 1A-VDAC transcript probably plays a role in controlling endogenous VDAC levels: when fly cells are transfected with a 1A-VDAC transcript lacking the 3' UTR, VDAC protein is only weakly translated.
31

Siedhoff, Dominic [author], Heinrich Müller [academic supervisor], and Dorit Merhof [reviewer]. "A parameter-optimizing model-based approach to the analysis of low-SNR image sequences for biological virus detection." Dortmund: Universitätsbibliothek Dortmund, 2016. http://d-nb.info/1115464019/34.

32

Bellora, Pereyra Nicolás. "In silico analysis of regulatory motifs in gene promoters." Doctoral thesis, Universitat Pompeu Fabra, 2010. http://hdl.handle.net/10803/7202.

Abstract:
Regulation of gene transcription is a complex process involving many different proteins, some of which bind in a sequence-specific manner to DNA motifs in the gene promoter. The need to maintain specific interactions between transcription factors and proteins of the RNA polymerase II complex is expected to impose constraints on the relative position and spacing of the interacting DNA motifs. The present work includes the development of a novel approach to identify motifs that show a preferential location in DNA sequences, and the implementation of a public web application called PEAKS. We investigated whether the arrangement and nature of the most common motifs depend on the breadth of expression of the gene. We found differences illustrating that many key specific regulatory signals may be present in the proximal promoter region of mammalian genes. We also applied other methods to identify specific transcription factors (TFs) involved in the co-regulation of a set of genes. Data from experimentally verified transcription factor binding sites (TFBSs) support the biological relevance of our findings.
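The abstract does not describe how PEAKS detects a preferential location, so the following is only a generic illustration of the underlying question: does a motif occur more often in one region of aligned promoters than a uniform spread would predict? It assumes equal-length promoters aligned at a common reference point (e.g. the TSS) and exact motif matches; the function name and `window` parameter are hypothetical, and this is not the PEAKS algorithm.

```python
from typing import List, Optional, Tuple


def positional_enrichment(
    promoters: List[str], motif: str, window: int = 50
) -> Optional[Tuple[int, float]]:
    """Return (best_window_index, observed/expected ratio), or None if no hits.

    Assumes all promoters have the same length and are aligned at the same
    reference point, so index i means the same relative position in each.
    """
    motif = motif.upper()
    k = len(motif)
    length = len(promoters[0])
    n_starts = length - k + 1  # valid motif start positions per sequence
    n_windows = (n_starts + window - 1) // window
    counts = [0] * n_windows
    for seq in promoters:
        if len(seq) != length:
            raise ValueError("promoters must be aligned and of equal length")
        s = seq.upper()
        for i in range(n_starts):
            if s[i:i + k] == motif:
                counts[i // window] += 1
    total = sum(counts)
    if total == 0:
        return None
    best = max(range(n_windows), key=lambda w: counts[w])
    # Under a uniform null, hits land in a window in proportion to the number
    # of start positions it contains (the last window may be shorter).
    width = min(window, n_starts - best * window)
    expected = total * width / n_starts
    return best, counts[best] / expected
```

A real analysis would also need a significance test and a background model that accounts for local base composition; the ratio here only flags where a motif clusters.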
33

Reis, Thais Aparecida Vieira. "Estudo epidemiológico da doença diarréica aguda associada aos adenovírus, em Juiz de Fora, Minas Gerais, no período 2007-2010" [Epidemiological study of acute diarrheal disease associated with adenoviruses in Juiz de Fora, Minas Gerais, 2007-2010]. Universidade Federal de Juiz de Fora, 2012. https://repositorio.ufjf.br/jspui/handle/ufjf/1927.

Abstract:
Previous issue date: 2012-06-29
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
FAPEMIG - Fundação de Amparo à Pesquisa do Estado de Minas Gerais
Acute diarrheal disease (ADD) is still one of the leading causes of child morbidity and mortality in developing countries. In non-bacterial acute diarrhea, enteric adenoviruses are among the most important etiologic agents. Human adenoviruses (HAdV) belong to the family Adenoviridae and the genus Mastadenovirus, whose members are classified into seven species (A-G) and 54 serotypes. Among them, serotypes 40 and 41, both of species F, are the most commonly associated with ADD. Given the importance of ADD in developing countries, the large number of cases in which no etiologic agent is identified, and the limited knowledge of HAdV infection and its role in the pathogenesis of ADD, we performed this study in Juiz de Fora, Minas Gerais. Between January 2007 and December 2010, 395 diarrheal fecal specimens from individuals of various ages, both outpatients and hospitalized patients, were analyzed. HAdV was detected by PCR with specific primers, and positive samples were molecularly characterized by sequencing and phylogenetic analysis of partial sequences of the hexon gene. Statistical analyses were performed with SPSS version 13.0, with p < 0.05 considered significant. The prevalence of HAdV infection in 2007-2010 was 10.9% (43/395). There was no significant correlation between the origin of the sample (outpatient vs. hospital) and the occurrence of infection (p = 0.152), nor between the gender of the infected individual and infection (p = 0.393). In contrast, most positive cases were detected in children up to 24 months of age, showing a statistically significant correlation between the age of the infected individuals and the occurrence of infection (p = 0.007). In most HAdV-positive cases (36/43), HAdV was the only viral agent detected; however, HAdV/rotavirus (5/43) and HAdV/norovirus (2/43) co-infections were also identified.
Phylogenetic analysis of partial hexon gene sequences from 35 positive samples revealed that all of them clustered with HAdV species F, serotype 41, confirming the association of enteric HAdV with the cases studied. This epidemiological survey revealed the presence and circulation of these viruses in the population of Juiz de Fora during the study period, as well as their important role in the genesis of ADD, making it possible to establish the etiology of a substantial proportion of cases that would normally remain without an etiological diagnosis.
34

Grievink, Liat Shavit. "Lineage specific evolution and phylogenetic analysis : a thesis presented in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biomathematics at Massey University, Palmerston North, New Zealand." Massey University, 2009. http://hdl.handle.net/10179/1048.

Abstract:
Phylogenetic models generally assume a homogeneous, time-reversible, stationary process. These assumptions are often violated by the real, far more complex, evolutionary process. This thesis is centered on non-homogeneous, lineage-specific properties of molecular sequences. It consists of several related but independent studies. LineageSpecificSeqgen, an extension to the Seq-Gen program that allows generation of sequences with changes in the proportion of variable sites, is introduced. This program is then used in a simulation study showing that changes in the proportion of variable sites can hinder tree estimation accuracy, and that tree reconstruction under the best-fit model chosen using a relative test can result in a wrong tree. In this case, the less commonly used absolute model fit was a better predictor of tree estimation accuracy. This study found that increased taxon sampling of lineages that have undergone a change in the proportion of variable sites was critical for accurate tree reconstruction and that, in contrast to some earlier findings, the accuracy of maximum parsimony is adversely affected by such changes. This thesis also addresses the well-known long-branch attraction artifact. A nonparametric bootstrap test to identify changes in the substitution process is introduced, validated, and applied to the case of Microsporidia, a highly reduced intracellular parasite. Microsporidia was first thought to be an early-branching eukaryote, but is now believed to be sister to, or included within, fungi. Its apparent basal eukaryote position is considered a result of long-branch attraction due to an elevated evolutionary rate in the microsporidian lineage. This study shows that long-branch estimates and basal positioning of Microsporidia both correlate with increased proportions of radical substitutions in the microsporidian lineage. In simulated data, such increased proportions of radical substitutions lead to erroneous long-branch estimates.
These results suggest that the long microsporidian branch is likely to be a result of an increased proportion of radical substitutions on that branch, rather than an increased evolutionary rate per se. The focus of the last study is the intriguing case of Mesostigma, a freshwater green alga for which contradictory phylogenetic relationships were inferred. While some studies placed Mesostigma within the Streptophyta lineage (which includes land plants), others placed it as the deepest divergence among green algae. This basal positioning is regarded as a result of long-branch attraction due to poor taxon sampling. Reinvestigation of a 13-taxon mitochondrial amino acid dataset and an 8-taxon sub-dataset reveals that site sampling, and in particular the treatment of missing data, is just as important a factor for accurate tree reconstruction as taxon sampling. This study identifies a difficulty in recreating, in simulated data, the long-branch attraction observed for the 8-taxon dataset. The cause is likely to be the smaller number of amino acid characters per site in simulated data compared with real data, highlighting the fact that some properties of the evolutionary process are yet to be accurately modeled.
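Two generic quantities recur in this abstract: the proportion of variable sites in an alignment, and nonparametric bootstrapping, which resamples alignment columns with replacement. The sketch below illustrates only those two ingredients; it is not the thesis's bootstrap test for substitution-process changes, nor LineageSpecificSeqgen. Function names are hypothetical, and gaps or ambiguity codes are simply treated as ordinary characters.

```python
import random
from typing import List, Tuple


def variable_site_flags(alignment: List[str]) -> List[bool]:
    """True for each column that contains more than one distinct character.

    Gaps and ambiguity codes are treated as ordinary characters
    (a simplifying assumption).
    """
    n_cols = len(alignment[0])
    return [len({row[j] for row in alignment}) > 1 for j in range(n_cols)]


def proportion_variable_sites(alignment: List[str]) -> float:
    flags = variable_site_flags(alignment)
    return sum(flags) / len(flags)


def bootstrap_pvs_interval(
    alignment: List[str], replicates: int = 1000, seed: int = 0
) -> Tuple[float, float]:
    """Approximate 95% percentile interval for the proportion of variable
    sites, obtained by resampling alignment columns with replacement."""
    rng = random.Random(seed)
    # Whether a column is variable does not depend on other columns, so
    # resampling the per-column flags is equivalent to resampling columns.
    flags = variable_site_flags(alignment)
    n = len(flags)
    stats = sorted(
        sum(rng.choice(flags) for _ in range(n)) / n for _ in range(replicates)
    )
    return stats[int(0.025 * replicates)], stats[int(0.975 * replicates) - 1]
```

Comparing such intervals between sub-alignments for different lineages is one simple way to ask whether the proportion of variable sites differs, though a dedicated test like the one the thesis introduces would be needed for a rigorous answer.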
35

Xu, Minzhen. "Regulation of Transcription of Mouse Immunoglobulin Germ-Line γ1 RNA: Structural Characterization of Germ-Line γ1 RNA and Molecular Analysis of the Promoter: A Dissertation." eScholarship@UMMS, 1991. https://escholarship.umassmed.edu/gsbs_diss/99.

Abstract:
The antibody class switch is achieved by DNA recombination between sequences called switch (S) regions, located 5' to immunoglobulin (Ig) heavy chain constant (CH) region genes. This process can be induced in cultured B cells by polyclonal stimulation, and switching can be directed to specific antibody classes by certain lymphokines. These stimuli may regulate the accessibility of CH genes and their S regions to a recombinase, as indicated by hypomethylation and transcriptional activity. For example, RNAs transcribed from specific unrearranged (germ-line) CH genes are induced prior to switching under conditions that promote subsequent switching to these same CH genes. The function of transcription of these germ-line CH genes is unknown, and how stimuli regulate the accessibility of CH genes is also unclear. In this dissertation I report the structure of the RNA transcribed from the unrearranged Cγ1 gene in mouse spleen cells treated with LPS plus a HeLa cell supernatant containing recombinant interleukin 4 (rIL-4). I also show that a 150-bp region upstream of the first initiation site of germ-line γ1 RNA contains promoter and enhancer elements responsible for basal-level expression, inducibility by phorbol 12-myristate 13-acetate (PMA), and synergy with IL-4 in an IgM+ B cell line, L10A6.2, and an IgG2a+ B cell line, A20.3. The germ-line γ1 RNA is initiated at multiple start sites 5' to the tandem repeats of the γ1 switch (Sγ1) region. As is true for analogous RNAs transcribed from other unrearranged genes, the germ-line γ1 RNA has an I exon transcribed from the region 5' to the Sγ1 region. The Iγ1 exon is spliced at a unique site to the Cγ1 gene. The germ-line γ1 RNA has an open reading frame (ORF) that potentially encodes a small protein 48 amino acids in length.
Elements located within the 150-bp region 5' to the first initiation site of germ-line γ1 RNA are necessary and sufficient to confer inducibility by PMA and synergy with IL-4 on a minimal thymidine kinase (TK) promoter in L10A6.2 cells, but are not sufficient to confer this inducibility in A20.3 cells. Linker-scanning mutations demonstrated that these multiple elements function in a mutually dependent manner: mutation of any single element decreases constitutive expression and inducibility by PMA and by PMA plus IL-4. This 150-bp region contains several consensus sequences that bind known or putative transcription factors, including a C/EBP binding site/IL-4 response element (in the promoter of the Ia Aαk gene), four CACCC boxes, a PU box, a TGFβ inhibitory element (TIE), an interferon-αβ response element (αβIRE), and an AP-3 site. My results begin to describe the mechanism that regulates the accessibility of the unrearranged germ-line Sγ1-Cγ1 gene. By activating the germ-line γ1 promoter, IL-4 induces transcription of germ-line γ1 RNA, thereby inducing accessibility of the Sγ1-Cγ1 gene. By inhibiting expression from the germ-line γ1 promoter, IFNγ and TGFβ down-regulate transcription of germ-line γ1 RNA, thus reducing the accessibility of the Sγ1-Cγ1 gene. My results also suggest that signaling via the antigen receptor on B cells may be involved in induction of the switch to IgG1. Furthermore, this is the first reported case in which multiple functionally interdependent elements are needed to respond to PMA.
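The claim that the germ-line γ1 RNA carries an ORF potentially encoding a 48-residue protein comes from inspecting its sequence for an in-frame start and stop codon. As a hedged illustration of that kind of scan (not the procedure used in the dissertation), the sketch below finds the longest ATG-initiated ORF in the three forward reading frames. Function names are hypothetical; the reverse strand, alternative start codons and nested ORFs are deliberately ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_forward_orf(seq: str) -> str:
    """Return the longest ATG...stop ORF (stop codon excluded) found in the
    three forward reading frames, or an empty string if there is none."""
    seq = seq.upper().replace("U", "T")  # accept RNA as well as DNA input
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if i - start > len(best):
                    best = seq[start:i]
                start = None  # resume searching for the next ATG in this frame
    return best


def orf_protein_length(seq: str) -> int:
    """Number of amino acids encoded by the longest forward ORF."""
    return len(longest_forward_orf(seq)) // 3
```

An ORF with no in-frame stop before the end of the sequence is not reported here, since its true length cannot be determined from the given sequence.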
36

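Barletta, Vívian Honorato. "Detecção e caracterização molecular de norovírus associados a casos de doença diarréica aguda infantil" [Detection and molecular characterization of noroviruses associated with cases of acute diarrheal disease in children]. Universidade Federal de Juiz de Fora (UFJF), 2011. https://repositorio.ufjf.br/jspui/handle/ufjf/5287.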

Abstract:
Previous issue date: 2011-02-24
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Noroviruses (NoV) are important etiological agents responsible for outbreaks and sporadic cases of acute diarrhea in individuals of all ages. The viral particles are nonenveloped, and the genome consists of a positive-sense, single-stranded RNA molecule. Noroviruses belong to the family Caliciviridae, genus Norovirus, and are classified into five genogroups (GI-GV); GI, GII and GIV are found in humans, and among these, NoV GII genotype 4 (GII.4) is the most commonly found worldwide. Although the association of these viruses with acute diarrhea is well documented worldwide, norovirus studies in Brazil are scarce and restricted to the largest urban centers and their surroundings. Given the limited knowledge about these viruses and the lack of epidemiological data on this infection in the city of Juiz de Fora, MG, this study aimed to detect and characterize NoV samples associated with sporadic cases of acute childhood diarrhea, and to assess the influence of climatic and demographic factors on the occurrence of these infections. Between January 2008 and December 2009, 218 fecal specimens, all obtained from children 0-12 years of age (outpatients, 89.45%; inpatients, 10.55%), were analyzed for the presence of NoV by conventional RT-PCR. Twenty samples (9.17%) were positive, and in 2008 infections tended to be seasonal, concentrated in the dry season, a pattern that was not repeated in 2009. Most positive samples were detected in children aged 0 to 24 months, and there was no statistically significant correlation between the occurrence of infection and sex. Of the 20 positive samples, 19 were characterized as NoV GII and 1 as NoV GI. Partial genome sequencing and phylogenetic analysis of selected samples revealed NoV genotypes GII.4 and GII.6, which co-circulated in both years of the study.
The NoV GII.4 samples detected in Juiz de Fora showed the greatest nucleotide and amino acid similarity to those that circulated in the state of Rio de Janeiro in 2006, 2007 and 2008. Phylogenetic analysis of the NoV GII.6 samples detected in Juiz de Fora, together with their high nucleotide and amino acid sequence similarity, showed that they were most closely related to the NoV GII.6 sample GU132461/2007, detected in the state of Rio de Janeiro in 2007. This finding, together with the geographical proximity of the two cities, suggests a possible common lineage. This epidemiological survey revealed the presence and circulation of NoV in the child population of Juiz de Fora, MG, demonstrating their important role as etiologic agents of acute diarrhea in this community as well.
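Statements about "nucleotide and amino acid similarity" between strains are commonly based on pairwise identity over aligned sequences. A minimal sketch, assuming the two sequences are already aligned to the same length with `-` as the gap symbol; the function name is hypothetical, and skipping gapped columns is only one of several conventions (others divide by the full alignment length), so this need not match the software or settings used in the study.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns in which either sequence has a gap are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        identical += x == y  # True counts as 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared
```

The same function works for nucleotide or amino acid alignments, since it only compares characters column by column.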
37

BERTOLAZZI, Giorgio. "MicroRNA Interaction Networks." Doctoral thesis, Università degli Studi di Palermo, 2021. http://hdl.handle.net/10447/498927.

Abstract:
Bertolazzi’s thesis focuses on developing and applying computational methods to predict microRNA binding sites located on messenger RNA molecules. MicroRNAs (miRNAs) regulate gene expression by binding target messenger RNA molecules (mRNAs); predicting miRNA binding is therefore important for investigating cellular processes. Moreover, alterations in miRNA activity have been associated with many human diseases, such as cancer. The thesis explores miRNA binding behavior and highlights information that is fundamental for miRNA target prediction. In particular, a machine learning approach is used to upgrade an existing target prediction algorithm, ComiR, whose original version considered only miRNA binding sites located in the mRNA 3’ UTR. The new algorithm significantly improves ComiR’s prediction capacity by also including miRNA binding sites located in mRNA coding regions.
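ComiR's machine-learning model is not described in enough detail here to reproduce. As background, many miRNA target predictors start from the classic seed rule: a site in the mRNA that is perfectly complementary to miRNA nucleotides 2-8 (a "7mer-m8" seed site). The hedged sketch below finds such sites; function names are hypothetical, and because the scan is region-agnostic it can be applied to a 3’ UTR or a coding region alike.

```python
from typing import List

_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site_7mer_m8(mirna: str) -> str:
    """mRNA site (written 5'->3') complementary to miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]  # 1-based nucleotides 2..8
    # Reverse complement, because miRNA and target pair antiparallel.
    return "".join(_COMPLEMENT[n] for n in reversed(seed))


def find_seed_sites(mirna: str, mrna: str) -> List[int]:
    """0-based start positions of exact 7mer-m8 seed matches in an mRNA region."""
    site = seed_site_7mer_m8(mirna)
    target = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    return [
        i for i in range(len(target) - len(site) + 1) if target.startswith(site, i)
    ]
```

Seed matching alone produces many false positives; approaches such as the one described in the abstract add further features and learned models on top of candidate sites.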
38

"Graphical representation of biological sequences and its applications." Thesis, 2010. http://library.cuhk.edu.hk/record=b6074915.

Abstract:
Among existing alignment-free methods for comparing biological sequences, graphical representation provides a simple way to view, sort, and compare gene structures. The aim of graphical representation is to display DNA or protein sequences graphically so that their similarities and differences can be seen visually. Visual comparison alone, however, is not sufficient for further research; a more accurate, quantitative comparison is needed. This motivated us to develop applications of graphical representation for biological sequences.
In this thesis, we make two main contributions: (1) We construct a protein map using a new graphical representation for protein sequences that we propose. Each protein sequence is represented as a point in this map, and proteins can be compared by cluster analysis of these points. The protein map can be used to quantify the similarity of two proteins mathematically and to predict properties of an unknown protein from its amino acid sequence. (2) We construct a novel genome space with biological geometry, which is a subspace of R^N. In this space each point corresponds to a genome, and the natural distance between two points reflects the biological distance between the two genomes. Our genome space provides a powerful new tool for analyzing the classification of genomes and their phylogenetic relationships.
Yu, Chenglong.
Adviser: Luk Hing Sun.
Source: Dissertation Abstracts International, Volume: 72-04, Section: B.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2010.
Includes bibliographical references (leaves 59-64).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.
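The abstract does not spell out the thesis's own representation, so the following is only a generic illustration of the idea that a sequence can be drawn as a path and then reduced to a point for numerical comparison. It uses a simple 2D "DNA walk" with an arbitrary, hypothetical assignment of A, T, G and C to unit steps, summarizes each walk by its centroid, and compares sequences by Euclidean distance. It is much cruder than the protein map and genome space described in the abstract; all names and the step mapping are assumptions.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical nucleotide-to-step mapping; published schemes differ.
STEPS: Dict[str, Tuple[int, int]] = {"A": (-1, 0), "T": (1, 0), "G": (0, 1), "C": (0, -1)}


def dna_walk(seq: str) -> List[Tuple[int, int]]:
    """2D walk starting at the origin, moving one unit step per nucleotide."""
    x = y = 0
    path = [(0, 0)]
    for base in seq.upper():
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def walk_centroid(seq: str) -> Tuple[float, float]:
    """Mean coordinate of the walk, used here as a crude 2D descriptor."""
    path = dna_walk(seq)
    n = len(path)
    return (sum(p[0] for p in path) / n, sum(p[1] for p in path) / n)


def walk_distance(a: str, b: str) -> float:
    """Euclidean distance between the centroid descriptors of two sequences."""
    (xa, ya), (xb, yb) = walk_centroid(a), walk_centroid(b)
    return math.hypot(xa - xb, ya - yb)
```

A single centroid discards most of the walk's shape; richer descriptors (for example moments or graph-derived invariants) are what make such representations useful for clustering.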
39

Min, Renqiang. "Machine Learning Approaches to Biological Sequence and Phenotype Data Analysis." Thesis, 2010. http://hdl.handle.net/1807/26209.

Abstract:
To understand biology at a systems level, in this thesis I present novel machine learning algorithms that reveal the underlying mechanisms by which genes and their products function at different biological levels. Specifically, at the sequence level, based on kernel Support Vector Machines (SVMs), I proposed a learned random-walk kernel and a learned empirical-map kernel to identify protein remote homology solely from sequence data, and a discriminative motif discovery algorithm to identify sequence motifs that characterize the remote homology membership of protein sequences. The proposed approaches significantly outperform previous methods, especially on some challenging protein families. At the expression and protein levels, using hierarchical Bayesian graphical models, I developed the first high-throughput computational predictive model to filter sequence-based predictions of microRNA targets by incorporating proteomic data of putative microRNA target genes, and I proposed another probabilistic model to explore the underlying mechanisms of microRNA regulation by combining expression profile data of messenger RNAs and microRNAs. At the cellular level, I further investigated how yeast genes manifest their functions in cell morphology by predicting gene function from morphology data of yeast temperature-sensitive alleles. The resulting prediction models enable biologists to select promising essential yeast genes and study their predicted novel functions.
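Kernel SVMs for remote homology compare sequences through a kernel function rather than an alignment. The learned random-walk and empirical-map kernels of the thesis are not reproduced here; instead, the hedged sketch below shows the standard k-mer spectrum kernel, a common baseline string kernel, to illustrate what such a function computes. Function names and the default `k` are illustrative choices.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int = 3) -> int:
    """Inner product of the two k-mer count vectors (k-spectrum kernel)."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Counter returns 0 for k-mers absent from y, so only shared k-mers count.
    return sum(n * cy[kmer] for kmer, n in cx.items())


def normalized_spectrum_kernel(x: str, y: str, k: int = 3) -> float:
    """Cosine-normalized kernel, so that K(x, x) == 1."""
    return spectrum_kernel(x, y, k) / math.sqrt(
        spectrum_kernel(x, x, k) * spectrum_kernel(y, y, k)
    )
```

A Gram matrix of such values could be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.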
40

Loving, Joshua. "Bit-parallel and SIMD alignment algorithms for biological sequence analysis." Thesis, 2017. https://hdl.handle.net/2144/27172.

Abstract:
High-throughput next-generation sequencing techniques have hugely decreased the cost and increased the speed of sequencing, resulting in an explosion of sequencing data. This motivates the development of high-efficiency sequence alignment algorithms. In this thesis, I present multiple bit-parallel and Single Instruction Multiple Data (SIMD) algorithms that greatly accelerate the processing of biological sequences. The first chapter describes the BitPAl bit-parallel algorithms for global alignment with general integer scoring, which assigns integer weights for match, mismatch, and insertion/deletion. The bit-parallel approach represents individual cells in an alignment scoring matrix as bits in computer words and emulates the calculation of scores by a series of logic operations. Bit-parallelism has previously been applied to other pattern matching problems, producing fast algorithms. In timed tests, we show that BitPAl runs 7 - 25 times faster than a standard iterative algorithm. The second part involves two approaches to alignment with substitution scoring, which assigns a potentially different substitution weight to every pair of alphabet characters, better representing the relative rates of different mutations. The first approach extends the existing BitPAl method. The second approach is a new SIMD algorithm that uses partial sums of adjacent score differences. I present a simple partial sum method as well as one that uses parallel scan for additional acceleration. Results demonstrate that these algorithms are significantly faster than existing SIMD dynamic programming algorithms. Finally, I describe two extensions to the partial sums algorithm. The first adds support for affine gap penalty scoring. Affine gap scoring represents the biological likelihood that it is more likely for gaps to be continuous than to be distributed throughout a region by introducing a gap opening penalty and a gap extension penalty. 
The second extension is an algorithm that uses the partial sums method to calculate the tandem alignment of a pattern against a text sequence using a single pattern copy. Next-generation sequencing data provide a wealth of information to researchers. Extracting that information in a timely manner increases the utility and practicality of sequence analysis algorithms. This thesis presents a family of algorithms that provide alignment scores in less time than previous algorithms.
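The bit-parallel idea in this abstract (DP cells packed as bits in machine words, with scores updated by logic operations) can be illustrated with Myers' bit-vector algorithm for unit-cost edit distance, in the global-distance form described by Hyyrö. This is a minimal sketch assuming unit costs for mismatches and indels; it is not BitPAl, which handles general integer scores, and the function name is illustrative. Python integers stand in for m-bit machine words, so every result is masked to m bits.

```python
def bitvector_edit_distance(pattern: str, text: str) -> int:
    """Unit-cost (Levenshtein) distance via Myers' bit-vector algorithm
    (global-distance variant). Bit i of each vector describes row i+1
    of the current DP column."""
    m = len(pattern)
    if m == 0:
        return len(text)
    # peq[c] has bit i set wherever pattern[i] == c.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)
    mask = (1 << m) - 1     # emulate an m-bit word
    high = 1 << (m - 1)     # bit for the last row, which holds the answer
    pv, mv = mask, 0        # column 0: every vertical delta is +1
    score = m               # D[m][0] = m
    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal deltas equal to +1
        mh = pv & xh                    # horizontal deltas equal to -1
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Row 0 grows by 1 per column in global distance, so shift in a +1.
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
    return score
```

For example, `bitvector_edit_distance("kitten", "sitting")` gives 3. Each text character costs a constant number of word operations, which is the source of the speedup when m fits in one machine word.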
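The affine gap model mentioned in the abstract (a gap opening penalty plus a gap extension penalty) is commonly computed with Gotoh's three-state recurrence. The following is a plain, non-SIMD sketch of the global alignment score, not the partial-sums algorithm from the thesis. The scoring values, the function name, and the convention that a gap of length k scores gap_open + k * gap_extend are assumptions chosen for illustration.

```python
NEG_INF = float("-inf")

def affine_global_score(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Gotoh global alignment score; a gap of length k scores gap_open + k * gap_extend."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend   # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend   # leading gap in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap pays gap_open + gap_extend; extending pays gap_extend only.
            X[i][j] = max(M[i - 1][j] + gap_open + gap_extend,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open + gap_extend,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])
```

With these parameters, aligning "ACGT" with "AT" prefers A--T (one gap of length 2, score -1) over two separate single-residue gaps, which is exactly the contiguity preference the affine model encodes.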
41

Xu, Weijia. "On integrating biological sequence analysis with metric distance based database management systems." Thesis, 2006. http://hdl.handle.net/2152/2955.

42

Sahraeian, Sayed 1983. "Probabilistic Approaches in Comparative Analysis of Biological Networks and Sequences." Thesis, 2013. http://hdl.handle.net/1969.1/149225.

Abstract:
Comparative analysis of genomic data investigates the relationship of genome structure and function across different biological species to shed light on their similarities and differences. In this dissertation, we study two important problems in comparative genomics, namely comparative sequence analysis and comparative network analysis. In the comparative sequence analysis, we study the multiple sequence alignment of protein and DNA sequences as well as the structural alignment of multiple RNA sequences. For closely related sequences, multiple sequence alignment can be performed efficiently through progressive techniques. However, for divergent sequences it is very challenging to predict an accurate alignment. Here, we introduce PicXAA, an efficient non-progressive technique for multiple protein and DNA sequence alignment. We also extend PicXAA to PicXAA-R for structural alignment of RNA sequences. PicXAA and PicXAA-R greedily build up the alignment from sequence regions with high local similarity, thereby yielding an accurate global alignment that effectively captures local similarities among sequences. As another important research area in comparative genomics, we also investigate the comparative network analysis problem. Translating the increasing number of large-scale biological networks into meaningful biological insights requires efficient computational techniques. One such example is network querying, which aims to identify subnetwork regions in a large target network that are similar to a given query network. Here, we introduce an efficient algorithm for querying large-scale biological networks, called RESQUE. RESQUE adopts a semi-Markov random walk model to probabilistically estimate the correspondence scores between nodes that belong to different networks. The target network is iteratively reduced based on the estimated correspondence scores until the best matching subnetwork emerges.
The proposed network querying scheme is computationally efficient, can handle any network query with an arbitrary topology, and yields accurate querying results. We also extend the idea used in RESQUE to develop an efficient algorithm for alignment of multiple large-scale biological networks, called SMETANA. SMETANA outperforms state-of-the-art network alignment techniques in terms of both computational efficiency and alignment accuracy. These studies have enabled us to provide a coherent framework for a probabilistic approach to comparative analysis of biological sequences and networks. Such a probabilistic framework helps us employ rigorous mathematical schemes to find accurate and efficient solutions to these problems.
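The non-progressive strategy described for PicXAA (building the alignment greedily from the most similar regions first) can be illustrated, in a much simplified pairwise form, by accepting residue pairs in decreasing score order and rejecting any pair that would cross, or share a residue with, a pair already accepted. This is a hypothetical sketch: the input scores (for example, posterior match probabilities) are assumed to be precomputed, the function name and threshold are illustrative, and it is not PicXAA's actual multiple-sequence procedure.

```python
def greedy_consistent_pairs(scores, threshold=0.0):
    """scores maps (i, j) -> similarity of residue i in sequence A and residue j in B.
    Returns the accepted residue pairs sorted by position."""
    accepted = []
    # Visit candidate pairs from most to least similar.
    for (i, j), s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if s <= threshold:
            break  # every remaining pair scores no higher
        # Keep the pair only if it preserves residue order relative to every
        # accepted pair (no crossing) and reuses neither residue.
        if all((i < ai and j < aj) or (i > ai and j > aj) for ai, aj in accepted):
            accepted.append((i, j))
    return sorted(accepted)
```

Because the highest-scoring pairs are fixed first, strongly similar local regions anchor the alignment and weaker, conflicting pairs are discarded, rather than committing early to a guide-tree order as a progressive method would.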
43

Herms, Inke. "Probabilistic arithmetic automata: applications of a stochastic computational framework in biological sequence analysis." 2009. http://d-nb.info/999735500/34.

44

Wang, Jian. "Feature search in biological sequence data: analysis of gene-finding tools and implementation of Interactive Pattern Search." 2004. http://purl.galileo.usg.edu/uga%5Fetd/wang%5Fjian%5F200408%5Fms.

Abstract:
Thesis (M.S.)--University of Georgia, 2004.
Directed by Eileen T. Kraemer. Includes articles published in, and an article submitted to, Bioinformatics. Includes bibliographical references (leaves 134-139).
45

Zwickl, Derrick Joel. "Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion." Thesis, 2006. http://hdl.handle.net/2152/2666.

46

Schliep, Alexander. "A Bayesian approach to learning Hidden Markov model topology with applications to biological sequence analysis." 2002. http://d-nb.info/964626330/34.

47

Shao, Qiang. "Theoretical Studies on Proteins to Reveal the Mechanism of Their Folding and Biological Functions." 2009. http://hdl.handle.net/1969.1/ETD-TAMU-2009-12-7395.

Abstract:
The folding mechanism of several β-structures (e.g., β-hairpins and β-sheets) was studied using newly developed enhanced sampling methods together with MD simulations, all in implicit solvent environments. The influence of different implicit solvent models on the folding simulation of β-structures was also tested. By analyzing the free energy landscape as a function of several suitable reaction coordinates, we observed that the folding of β-hairpins is actually a two-state transition. In addition, the folding free energy landscapes of these related hairpins show a clear sequence dependence, indicating that similar β-structures with different sequences fold by different mechanisms. We also found that the stability of backbone hydrogen bonds is determined by the turn sequence and by the composition of the hydrophobic core cluster in β-structures. Neither of these findings had been reported before. The processive movement of kinesin was also studied at the mesoscopic level. We developed a simple physical model to understand the asymmetric hand-over-hand mechanism by which kinesin walks on the microtubule. The hand-over-hand motion of conventional kinesin is reproduced in our model, and good agreement is achieved between the calculated and experimental results. The experimentally observed limping of truncated kinesin is also perfectly described by our model. The global conformational changes of the kinesin heads (e.g., the power stroke of the neck-linkers, which act as lever arms during kinesin walking, and the transition between the open and closed states of the switch region of the nucleotide-binding domain in each head, induced by nucleotide binding and release) were studied for both dimeric and monomeric kinesins using a coarse-grained anisotropic network model (ANM). In addition, Langevin mode analysis was used to study the influence of solvent on the motions of the kinesin head modeled by ANM.
Additionally, the correlation between the neck-linker and the nucleotide-binding site was studied for dimeric and monomeric kinesins. The former shows a clear correlation between the two subdomains, whereas the latter does not, which may explain the experimental observation that only dimeric kinesin is capable of walking processively on the microtubule.
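The anisotropic network model (ANM) named in this abstract treats residues as nodes joined by identical springs wherever two nodes lie within a distance cutoff. The sketch below builds the standard ANM Hessian in pure Python, assuming C-alpha coordinates in ångströms, a 15 Å cutoff, and a unit spring constant gamma; these are illustrative defaults, not the settings used in the thesis. Diagonalizing the resulting 3N x 3N matrix (for example with numpy.linalg.eigh) yields the normal modes; for a well-connected, non-planar structure the six lowest eigenvalues are zero and correspond to rigid-body motion.

```python
def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build the 3N x 3N ANM Hessian for N nodes given as (x, y, z) tuples.
    Assumes distinct coordinates (a zero distance would divide by zero)."""
    n = len(coords)
    size = 3 * n
    hess = [[0.0] * size for _ in range(size)]
    cutoff2 = cutoff * cutoff
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[j][k] - coords[i][k] for k in range(3)]
            dist2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2
            if dist2 > cutoff2:
                continue  # no spring beyond the cutoff
            for a in range(3):
                for b in range(3):
                    # Off-diagonal super-element: -gamma * (d d^T) / |d|^2
                    v = -gamma * (d[a] * d[b]) / dist2
                    hess[3 * i + a][3 * j + b] = v
                    hess[3 * j + a][3 * i + b] = v
                    # Diagonal super-elements: minus the sum of off-diagonal ones
                    hess[3 * i + a][3 * i + b] -= v
                    hess[3 * j + a][3 * j + b] -= v
    return hess
```

Because each diagonal block is minus the sum of its row's off-diagonal blocks, a uniform translation of all nodes produces zero force, which is why translational modes appear among the zero-eigenvalue modes.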
48

Ma, Fangrui. "Biological sequence analyses: theory, algorithms, and applications." 2009. http://proquest.umi.com/pqdweb?did=1821098721&sid=1&Fmt=2&clientId=14215&RQT=309&VName=PQD.

Abstract:
Thesis (Ph.D.)--University of Nebraska-Lincoln, 2009.
Title from title screen (site viewed October 13, 2009). PDF text: xv, 233 p. : ill. ; 4 Mb. UMI publication number: AAT 3360173. Includes bibliographical references. Also available in microfilm and microfiche formats.
49

Presta, Luana. "Modeling biological systems: from genome sequences to functional insights." Doctoral thesis, 2018. http://hdl.handle.net/2158/1129632.

Abstract:
More than twenty years ago, the first genome sequencing of an organism was seen as a revolution in the biological sciences. A few years later, Carl Woese suggested that, in the long run, the real justification of genomics would be the genomics of prokaryotic microorganisms, because of its important implications for the study of biological evolution and its many biotechnological applications (spanning medical, agricultural, environmental, and industrial uses). The challenge then became to computationally infer the biological properties of an organism on the basis of its genome sequence alone. This ongoing challenge depends on the ability: i) to reconstruct the genome sequences of organisms from the (relatively short) sequence reads obtained on the various sequencing platforms; ii) to identify genes within DNA sequences and assign their functions; iii) to predict organisms’ phenotypes. These challenges can be thought of as 1-D, 2-D, and 3-D “-omics” reconstructions. The aim of this thesis was to explore, through specific case studies, such 1-D, 2-D, and 3-D computational biology inferences on prokaryotic genome sequences. Each chapter of the results section provides data on bacterial genomes of strains relevant to various biotechnological applications. The overall results are presented according to the depth of functional (phenotypic) inference, from genome assembly and simple functional annotation to powerful genome-scale metabolic models. The main focus is on the high predictive value of genome-scale metabolic modeling for complex phenotypes and for in silico prediction of gene essentiality. The main conclusion concerns the integrated use of computational tools to support systems biology-based computational inferences and, more broadly, the predictive biological (genomic) sciences.
50

Kuo, Chung-Yi (郭仲翊). "Systematic Biological Analysis of Tandem Repeats Sequences in Different Species based on Machine Learning." Thesis, 2019. http://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/login?o=dnclcdr&s=id=%22107NCHU5396025%22.&searchmode=basic.

Abstract:
Master's thesis, National Chung Hsing University, Department of Information Management, academic year 107 (2018-2019).
Tandem repeat sequences are widely used in genetics, most notably as molecular genetic markers, because they typically show high sequence variability between populations and individuals and are codominant. They are therefore also widely used in genetic diversity analysis. Genetics and evolution are closely related: all species have evolved from common ancestors, so a species' genetic information, such as its tandem repeats, may retain information about its ancestors. At the same time, the criteria used for species classification reflect the characteristics shared through common descent by organisms of the same class, and this property should also be present in tandem repeat sequences. This study therefore analyzes tandem repeats together with species classifications, in the hope of finding an association between tandem repeats and evolution. The data set consists of genomes at the "Complete" and "Chromosome" assembly levels in the NCBI Genome database. Following the taxonomic classification system, genomic data from 80 species in 12 phyla were selected. After all tandem repeat patterns were identified with a repeat-finding tool, two series of feature selection methods were used to select representative tandem repeat patterns. Finally, the C4.5 and CART machine learning algorithms were used to build classification models to explore the feasibility of using tandem repeats for species classification.
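The abstract does not name the repeat-finding tool used. As a minimal, hypothetical illustration of what such a tool reports, the sketch below scans a sequence for exact tandem repeats (a primitive unit repeated back to back at least a minimum number of times). Dedicated tools such as Tandem Repeats Finder also tolerate mismatches and indels, which this naive scan does not; the function name, period range, and copy threshold are assumptions.

```python
def find_exact_tandem_repeats(seq, max_period=6, min_copies=3):
    """Report maximal runs of an exact, primitive repeat unit.
    Returns a list of (start, period, copies, unit) tuples."""
    hits = []
    n = len(seq)
    for period in range(1, max_period + 1):
        i = 0
        while i + period <= n:
            unit = seq[i:i + period]
            # Skip non-primitive units such as "ATAT" (= "AT" twice); their runs
            # are reported under the shorter period instead.
            if unit in (unit + unit)[1:-1]:
                i += 1
                continue
            copies = 1
            while seq[i + copies * period:i + (copies + 1) * period] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, period, copies, unit))
                i += copies * period  # jump past the run so it is reported once
            else:
                i += 1
    return hits
```

For example, `find_exact_tandem_repeats("GGCACACACATT")` reports the run of four "CA" copies starting at position 2. Per-genome counts of such units could then serve as features for a decision-tree classifier, although the abstract does not specify how its features were encoded.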