Dissertations / Theses on the topic 'Protein sequence alignment'
Abhiman, Saraswathi. "Prediction of function shift in protein families /." Stockholm, 2006. http://diss.kib.ki.se/2006/91-7140-869-X/.
Carroll, Hyrum D. "Biologically Relevant Multiple Sequence Alignment." Diss., CLICK HERE for online access, 2008. http://contentdm.lib.byu.edu/ETD/image/etd2623.pdf.
Talbot, Danielle. "Identifying misalignments in sequence alignment for protein modelling." Thesis, University of Reading, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.445754.
Garriga, Nogales Edgar 1990. "New algorithmic contributions for large scale multiple sequence alignments of protein sequences." Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2022. http://hdl.handle.net/10803/673526.
In these days of profound change and rapid technological evolution, the amount of data that science has to handle has grown incredibly fast, and file sizes have grown almost prohibitively. Multiple sequence alignments (MSAs) are used in many areas of biology, and the increase in data has degraded the results. For this reason, a new strategy for building alignments is proposed. This new paradigm makes it possible to align millions of sequences and to modularize the process. 'Regressive' allows the process to be parallelized and different clustering (guide-tree) algorithms to be combined with whichever alignment method is desired. Within clustering, the strategy for building guide trees needs to be rethought. A study of the current state of the methods, with their strengths and weaknesses, was carried out to shed some light on this area. Guide trees must not be the bottleneck, and should serve to start the alignment process in the best possible way.
Bonneau, Richard A. "Gene annotation using Ab initio protein structure prediction : method development and application to major protein families /." Thesis, Connect to this title online; UW restricted, 2001. http://hdl.handle.net/1773/9241.
Lassmann, Timo. "Algorithms for building and evaluating multiple sequence alignments /." Stockholm, 2006. http://diss.kib.ki.se/2006/91-7140-887-8/.
Hollich, Volker. "Orthology and protein domain architecture evolution /." Stockholm, 2006. http://diss.kib.ki.se/2006/91-7140-783-9/.
Li, Yuheng. "Searching for remotely homologous sequences in protein databases with hybrid PSI-blast." The Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=osu1164741421.
DeBlasio, Dan, and John Kececioglu. "Core column prediction for protein multiple sequence alignments." BIOMED CENTRAL LTD, 2017. http://hdl.handle.net/10150/623957.
Aniba, Mohamed Radhouane. "Knowledge based expert system development in bioinformatics : applied to multiple sequence alignment of protein sequences." Strasbourg, 2010. https://publication-theses.unistra.fr/public/theses_doctorat/2010/ANIBA_Mohamed_Radhouane_2010.pdf.
The objective of this PhD project was the development of an integrated expert system to test, evaluate and optimize all the stages of the construction and the analysis of a multiple sequence alignment. The new system was validated using standard benchmark cases and brings a new vision to software development in bioinformatics: knowledge-guided systems. The architecture used to build the expert system is highly modular and flexible, allowing AlexSys to evolve as new algorithms are made available. In the future, AlexSys will be used to further optimize each stage of the alignment process, for example by optimizing the input parameters of the different algorithms. The inference engine could also be extended to identify combinations of algorithms that could potentially provide complementary information about the input sequences. For example, well-aligned regions from different aligners could be identified and combined into a single consensus alignment. Additional structural and functional information could also be exploited to improve the final alignment accuracy. Finally, a crucial aspect of any bioinformatics tool is its accessibility and usability. Therefore, we are currently developing a web server and a web-services-based distributed system. We will also design a novel visualization module that will provide an intuitive, user-friendly interface to all the information retrieved and constructed by AlexSys.
Madangopal, Sangeetha. "Comparison of Methods Used for Aligning Protein Sequences." Digital Archive @ GSU, 2006. http://digitalarchive.gsu.edu/cs_theses/30.
Zhao, Zhiyu. "Robust and Efficient Algorithms for Protein 3-D Structure Alignment and Genome Sequence Comparison." ScholarWorks@UNO, 2008. http://scholarworks.uno.edu/td/851.
Ohlson, Tomas. "The use of evolutionary information in protein alignments and homology identification." Doctoral thesis, Stockholm : Stockholm Bioinformatics Center, Stockholm University, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-812.
Tångrot, Jeanette. "Structural Information and Hidden Markov Models for Biological Sequence Analysis." Doctoral thesis, Umeå universitet, Institutionen för datavetenskap, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-1629.
Bioinformatics is a field in which methods from computer science and statistics are used to analyse and structure biological data. An important area of bioinformatics tries to predict the three-dimensional structure and function of a protein from its amino acid sequence and/or its similarities to other, already characterised proteins. It is known that two proteins with similar amino acid sequences also have similar three-dimensional structures. However, two proteins having similar structures does not necessarily mean that their sequences are similar, which can make it difficult to find structural similarities from a protein's amino acid sequence. This thesis describes two methods for finding similarities between proteins: one focuses on determining which family of protein domains with known 3D structure a given sequence belongs to, while the other tries to predict a protein's fold, i.e. to give a rough picture of the protein's structure. Both methods use hidden Markov models (HMMs), a statistical method that can be used, among other things, to describe protein families. With an HMM one can predict whether a given protein sequence belongs to the family the model represents. Both methods also use structural information to increase the models' ability to recognise related sequences, but in different ways. Most of the work in the thesis concerns structure-anchored HMMs (saHMMs). The saHMMs are built from structure-based sequence alignments, which are generated from how the protein domains can be superimposed in space rather than from which amino acids make up their sequences. In each protein family only a particular, representative selection of domains is used. These are chosen so that, when the sequences are compared pairwise, no pair within the family has a sequence identity higher than about 20%. This selection is made to obtain as wide a spread as possible among the sequences in the family.
A software suite has been developed to select representatives for each family and then build saHMMs based on them. The saHMMs turn out to find the right family for a high proportion of the tested sequences, with almost no errors. They are also better than the widely used method Pfam at finding the right family for entirely new protein sequences. The saHMMs are available through the FISH server, which anyone can use over the Internet to find which family a protein of interest may belong to. The second method presented in the thesis is secondary-structure HMMs (ssHMMs), which are built from ordinary multiple sequence alignments but also from information about the secondary structures of the protein sequences in the family. When a protein sequence is compared with the ssHMM, a secondary structure prediction is used, and the computed probability that the sequence belongs to the family is based on both the amino acid sequence and the secondary structure. A comparison shows that HMMs based on several sequences are better than those based on a single sequence at finding the right fold for a protein sequence. The HMMs become even better if the secondary structure is also taken into account, both when the true secondary structure is used and when a theoretically predicted one is used.
Jeanette Hargbo.
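The representative-selection step described in the abstract above (keeping only family members whose pairwise sequence identity stays at or below roughly 20%) can be illustrated with a small sketch. This is not the thesis's software: the greedy strategy, the function names `percent_identity` and `pick_representatives`, and the assumption that the input sequences are already aligned to equal length with `-` as the gap character are all hypothetical choices made for illustration.

```python
def percent_identity(a, b):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def pick_representatives(aligned, max_identity=20.0):
    """Greedy filter (an assumption, not the thesis's procedure):
    keep a sequence only if it is at most `max_identity` percent
    identical to every sequence kept so far."""
    chosen = []
    for name, seq in aligned.items():
        if all(percent_identity(seq, aligned[c]) <= max_identity for c in chosen):
            chosen.append(name)
    return chosen


# Toy family: "b" is 90% identical to "a" and is dropped; "c" shares 10%.
family = {
    "a": "ACDEFGHIKL",
    "b": "ACDEFGHIKV",
    "c": "WYWYWYWYWL",
}
representatives = pick_representatives(family)
```

A greedy pass depends on input order, so a real selection procedure would likely need a more careful criterion; the sketch only shows what the 20% constraint means in practice.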
Ng, Pauline Crystal. "PSSMs : not just roadkill on the information superhighway /." Thesis, Connect to this title online; UW restricted, 2002. http://hdl.handle.net/1773/8116.
Kemena, Carsten 1983. "Improving the accuracy and the efficiency of multiple sequence alignment methods." Doctoral thesis, Universitat Pompeu Fabra, 2012. http://hdl.handle.net/10803/128678.
Alignment is one of the basic methods for comparing biological sequences, and often the first step in subsequent analyses. Because of its privileged position at the start of many studies, alignment quality is of great importance; indeed, every result based on an alignment depends heavily on its quality. This has been confirmed in several recent articles that investigated the effects of the choice of alignment method on phylogenetic reconstruction and on the estimation of positive selection. In this thesis I present several projects focused on improving both multiple sequence alignment methods and their evaluation. Specifically, I have addressed problems such as the evaluation of structural protein alignments, the construction of accurate structural RNA alignments, and the alignment of large sets of sequences.
Hu, Junbin. "Structural and functional studies on heat shock protein Hsp40-Hdj1 and Golgi ER trafficking protein Get3." Thesis, Birmingham, Ala. : University of Alabama at Birmingham, 2009. https://www.mhsl.uab.edu/dt/2009p/huj.pdf.
Johansson, Joakim. "Modifying a Protein-Protein Interaction Identifier with a Topology and Sequence-Order Independent Structural Comparison Method." Thesis, Linköpings universitet, Bioinformatik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-147777.
Ozer, Hatice Gulcin. "Residue Associations In Protein Family Alignments." The Ohio State University, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=osu1211570026.
Menlove, Kit J. "Model Detection Based upon Amino Acid Properties." BYU ScholarsArchive, 2010. https://scholarsarchive.byu.edu/etd/2253.
Nobili, Alberto [Verfasser]. "Improving biocatalysts via semi-rational protein design : use of a multiple sequence alignment platform to reduce screening efforts and facilitate hit identification / Alberto Nobili." Greifswald : Universitätsbibliothek Greifswald, 2016. http://d-nb.info/1113294191/34.
Cao, Haibo. "Protein Structure Recognition From Eigenvector Analysis to Structural Threading Method." Washington, D.C. : Oak Ridge, Tenn. : United States. Dept. of Energy. Office of Science ; distributed by the Office of Scientific and Technical Information, U.S. Dept. of Energy, 2003. http://www.osti.gov/servlets/purl/822060-2L2Xvm/native/.
Published through the Information Bridge: DOE Scientific and Technical Information. "IS-T 2028" Haibo Cao. 12/12/2003. Report is also available in paper and microfiche from NTIS.
Gomes, Mireille. "Role of mutual information for predicting contact residues in proteins." Thesis, University of Oxford, 2012. http://ora.ox.ac.uk/objects/uuid:5ec3c90c-73fb-494f-ad2e-efc718406aa4.
Pinheiro, Ana Rita Almeida. "Extracellular enzymes of Botryosphaeriaceae family." Master's thesis, Universidade de Aveiro, 2015. http://hdl.handle.net/10773/17307.
Species of the Botryosphaeriaceae family are morphologically diverse and are described as endophytes, pathogens and saprophytes. They are commonly found in a wide range of hosts. The plant pathogenic fungi Macrophomina phaseolina, Neofusicoccum parvum and Diplodia corticola secrete a variety of extracellular enzymes, such as proteases and glycoside hydrolases, some of which are involved in host-pathogen interaction. To elucidate the correlation between microorganism secretome and host, the number of sequences encoding extracellular enzymes such as proteases and glycoside hydrolases (xylanases and endoglucanases) was compared between these organisms. Multiple sequence alignment of the protein domains was performed with bioinformatics tools, namely Clustal X2 and T-Coffee. Furthermore, to study the evolutionary relationship between the protein sequences, phylogenetic trees were constructed using the MEGA tool. Among M. phaseolina, N. parvum and D. corticola, the D. corticola genome contains genes that encode a larger diversity of glycoside hydrolase families, suggesting a better capacity for adaptation during its interaction with host species. The sequence similarity observed in the multiple sequence alignment between M. phaseolina, N. parvum and D. corticola is explained by their evolutionary relationship and not by their host type. The phylogenetic analysis shows that, at the evolutionary level, M. phaseolina and D. corticola are closer to each other than to N. parvum.
Ho, Ngai-lam, and 何毅林. "Algorithms on constrained sequence alignment." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B30201949.
Cunial, Fabio. "Analysis of the subsequence composition of biosequences." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/44716.
Santos-Ciminera, Patricia Dantas. "Molecular epidemiology of epidemic severe malaria caused by Plasmodium vivax in the state of Amazonas, Brazil /." Download the dissertation in PDF, 2005. http://www.lrc.usuhs.mil/dissertations/pdf/Santos2005.pdf.
Tress, Michael. "Towards improving the accuracy of GenTHREADER alignments." Thesis, University of Warwick, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.247983.
Midic, Uros. "Genome-Wide Prediction of Intrinsic Disorder; Sequence Alignment of Intrinsically Disordered Proteins." Diss., Temple University Libraries, 2012. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/159800.
Ph.D.
Intrinsic disorder (ID) is defined as a lack of stable tertiary and/or secondary structure under physiological conditions in vitro. Intrinsically disordered proteins (IDPs) are highly abundant in nature. IDPs possess a number of crucial biological functions, being involved in regulation, recognition, signaling and control, i.e. their functional repertoire complements the functions of ordered proteins. Intrinsically disordered regions (IDRs) of IDPs have a different amino-acid composition than structured regions and proteins. This fact has been exploited for development of predictors of ID; the best predictors currently achieve around 80% per-residue accuracy. Earlier studies revealed that some IDPs are associated with various human diseases, including cancer, cardiovascular disease, amyloidoses, neurodegenerative diseases, diabetes and others. We developed a methodology for prediction and analysis of abundance of intrinsic disorder on the genome scale, which combines data from various gene and protein databases, and utilizes several ID prediction tools. We used this methodology to perform a large-scale computational analysis of the abundance of (predicted) ID in transcripts of various classes of disease-related genes. We further analyzed the relationships between ID and the occurrence of alternative splicing and Molecular Recognition Features (MoRFs) in human disease classes. An important, never before addressed issue with such genome-wide applications of ID predictors is that, for less-studied organisms, in addition to the experimentally confirmed protein sequences, there is a large number of putative sequences, which have been predicted with automated annotation procedures and lack experimental confirmation. In the human genome, these predicted sequences have significantly higher predicted disorder content.
I investigated a hypothesis that this discrepancy is not correct, and that it is due to incorrectly annotated parts of the putative protein sequences that exhibit some similarities to confirmed IDRs, which leads to high predicted ID content. I developed a procedure to create synthetic nonsense peptide sequences by translation of non-coding regions of genomic sequences and translation of coding regions with incorrect codon alignment. I further trained several classifiers to discriminate between confirmed sequences and synthetic nonsense sequences, and used these predictors to estimate the abundance of incorrectly annotated regions in putative sequences, as well as to explore the link between such regions and intrinsic disorder. Sequence alignment is an essential tool in modern bioinformatics. Substitution matrices, such as the BLOSUM family, contain 20x20 parameters which are related to the evolutionary rates of amino acid substitutions. I explored various strategies for extension of sequence alignment to utilize the (predicted) disorder/structure information about the sequences being aligned. These strategies employ an extended 40-symbol alphabet which contains 20 symbols for amino acids in ordered regions and 20 symbols for amino acids in IDRs, as well as expanded 40x40 and 40x20 matrices. The new matrices exhibit significant and substantial differences in the substitution scores for IDRs and structured regions. Tests on a reference dataset show that 40x40 matrices perform worse than the standard 20x20 matrices, while 40x20 matrices, used in a scenario where ID is predicted for a query sequence but not for the target sequences, have at least comparable performance. However, I also demonstrate that the variations in performance between 20x20 and 40x20 matrices are insignificant compared to the variation in obtained matrices that occurs when the underlying algorithm for calculation of substitution matrices is changed.
Temple University--Theses
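The extended 40-symbol alphabet described in the abstract above can be pictured with a short sketch. The abstract does not say how the 40 symbols are written, so the convention used here (uppercase letters for residues in ordered regions, lowercase for residues in IDRs) and the names `AMINO`, `ALPHABET40` and `encode40` are assumptions made purely for illustration.

```python
AMINO = "ACDEFGHIKLMNPQRSTVWY"          # the 20 standard amino acids
ALPHABET40 = AMINO + AMINO.lower()      # assumed convention: lowercase = disordered


def encode40(seq, disordered):
    """Re-encode a protein sequence in a 40-symbol alphabet.

    `disordered` is a per-residue list of booleans, for example the
    thresholded output of a disorder predictor (hypothetical input).
    """
    if len(seq) != len(disordered):
        raise ValueError("sequence and disorder mask must have equal length")
    out = []
    for aa, is_disordered in zip(seq.upper(), disordered):
        if aa not in AMINO:
            raise ValueError(f"unsupported residue: {aa!r}")
        out.append(aa.lower() if is_disordered else aa)
    return "".join(out)


# A query with a disordered tail, as in the 40x20 scenario where only the
# query carries disorder annotation and targets stay in the 20-letter alphabet.
query = encode40("MKVSP", [False, False, False, True, True])
```

With this encoding, a 40x20 matrix would be indexed by a 40-symbol query residue on one axis and a plain 20-letter target residue on the other.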
Mokin, Sergey. "Measuring deviation from a deeply conserved consensus in protein multiple sequence alignments." Thesis, McGill University, 2008. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=21956.
Protein composition can vary from one species to another. The trends followed by the columns of a multiple sequence alignment reflect the different evolutionary pressures imposed on the sequences. Protein conservation analyses are useful for several purposes, such as assessing disease mutations, analysing pseudogenes, and predicting functional residues. This study describes a new column conservation measure for the analysis of multiple sequence alignments. In addition, we describe the use of this new measure to compute the statistical deviation from an alignment consensus. We applied the measure in two sequence case studies: (a) putative pseudogenes of Mycobacterium, and (b) young lineage-specific retrotransposed sequences in the human and mouse genomes. In doing so, we classified highly conserved residue positions and evaluated cases in which there are substantial deviations from the consensus of the multiple sequence alignments. This new conservation scale indicates that the degree of physicochemical conservation varies for a fixed column entropy. In turn, this lets us detect physicochemical deviations from a column consensus that would otherwise go undetected by entropy measures.
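For orientation, the column entropy that the abstract above uses as its point of comparison can be computed as follows. This sketch shows only the plain Shannon entropy baseline, not the new conservation measure of the thesis, which is not specified here; the function names and the choice to ignore gap characters are assumptions.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def alignment_entropies(rows):
    """Entropy of every column of an alignment given as equal-length rows."""
    return [column_entropy(col) for col in zip(*rows)]


toy_alignment = [
    "AID",
    "ALE",
    "AID",
    "ALE",
]
entropies = alignment_entropies(toy_alignment)
```

In the toy alignment the second column (I/L) and the third column (D/E) receive the same entropy although they involve different residue pairs; entropy alone sees only the frequencies, which is the kind of limitation that a physicochemically aware scale is meant to address.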
Almeida, André Atanasio Maranhão 1981. "Novas abordagens para o problema do alinhamento múltiplo de sequências." [s.n.], 2013. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275646.
Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Computação
Abstract: Sequence alignment is recognised as one of the most important tasks in bioinformatics. Its importance stems from its being a basic operation used by many other procedures in the field, such as database search, visualising the effect of evolution across a protein family, building phylogenetic trees, and identifying conserved motifs. Sequences can be aligned in pairs, a problem for which an exact algorithm with time complexity O(l²) is known for sequences of length l. Three or more sequences can also be aligned simultaneously, which is called multiple sequence alignment (MSA). MSA, which is used in tasks such as detecting patterns that characterise protein families and predicting secondary and tertiary protein structure, is an NP-hard problem. In this work, heuristic methods were developed for multiple alignment of protein sequences. The main existing approaches and methods were studied, and a series of implementations and evaluations was carried out.
First, 342 multiple aligners were built using the progressive approach. This widely used approach to MSA construction has three stages: first a distance matrix is computed; then a guide tree is generated from the matrix; finally the MSA is built through pairwise alignments whose order is defined by the tree. The aligners developed combine different methods for each stage. Two methods were developed for computing distance matrices, both of which can also produce pairwise sequence alignments: one builds the alignment from local alignments and the other uses a logarithmic function for gap penalties. Other methods available in the PHYLIP tool were also used. Guide trees were generated with the classical UPGMA and Neighbor Joining methods, using implementations available in the R tool. For building the multiple alignment, the single-block selection and closest-pair selection methods were implemented; these are commonly used to select the pair of alignments to merge in the current cycle. For merging a pair of alignments, 12 methods were implemented, inspired by the commonly used consensus alignment and profile alignment methods. All possible combinations of these methods were made, giving 342 aligners. They were evaluated on the quality of the alignments they produce, and the performance of the methods used at each stage was also assessed.
Next, evaluations were carried out in the context of consistency-based alignment. In this approach, the optimal MSA is the one that agrees with the majority of the optimal alignments for the n(n − 1)/2 pairwise alignments contained in the MSA. Changes were made to MUMMALS, a known multiple aligner that uses this approach. The k-mer counting method was modified by varying the value of k and the compressed alphabet; in a separate evaluation, the initial part of the algorithm was replaced, with the methods for computing the distance matrix and generating the guide tree swapped for others that had performed well in the tests of the progressive approach. In total, 89 variations of the original MUMMALS algorithm were implemented and evaluated and, although MUMMALS already produces high-quality alignments, significant improvements were achieved.
The work concluded with the implementation and evaluation of iterative algorithms. These are characterised by their dependence on other aligners to produce initial alignments; the iterative aligner then refines those alignments over a series of cycles until alignment quality stabilises. Two non-stochastic iterative aligners were implemented and evaluated, as well as a genetic algorithm (GA) for generating MSAs. With this genetic algorithm, implemented as ALGAe, a parameterisable environment for running genetic algorithms for MSA, a series of experiments progressively raised the quality of the alignments generated. Other approaches to building multiple alignments were incorporated into ALGAe: block-based, consensus-based, and template-based. The first was applied to generate individuals for the initial population; block-based aligners were implemented using two distinct approaches, with five variations of one of them. The second was used to define a crossover operator, which uses the M-COFFEE tool to build consensus-based alignments from individuals of the GA's current population. The third was used to define a fitness function, which uses the PSIPRED tool to predict the secondary structures of the sequences. ALGAe allows a wide variety of new evaluations to be carried out.
Doctorate
Computer Science
Doctor of Computer Science
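The first two stages of the progressive approach summarised in this entry (a distance matrix, then a guide tree that fixes the order of pairwise merges) can be sketched in a few lines. This is a toy illustration and not one of the aligners from the thesis: the Jaccard distance on k-mer sets, the plain-Python UPGMA, and the names `kmer_distance` and `guide_tree_merges` are assumptions chosen for brevity, and the third stage (the actual profile or consensus alignment at each merge) is omitted.

```python
from itertools import combinations


def kmer_set(seq, k=2):
    """All distinct substrings of length k in the sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=2):
    """Alignment-free distance: 1 minus the Jaccard similarity of k-mer sets."""
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def guide_tree_merges(seqs, k=2):
    """UPGMA-style clustering; returns the merge order as pairs of index tuples."""
    # Stage 1: pairwise distance matrix, stored for index pairs (i, j) with i < j.
    dist = {(i, j): kmer_distance(seqs[i], seqs[j], k)
            for i, j in combinations(range(len(seqs)), 2)}

    def average_linkage(c1, c2):
        # UPGMA cluster distance: mean distance over all cross-cluster pairs.
        pairs = [(min(a, b), max(a, b)) for a in c1 for b in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    # Stage 2: repeatedly merge the two closest clusters.
    clusters = {i: (i,) for i in range(len(seqs))}
    merges = []
    while len(clusters) > 1:
        x, y = min(combinations(sorted(clusters), 2),
                   key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]]))
        merges.append((clusters[x], clusters[y]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return merges


sequences = ["ACDEFG", "ACDEFH", "WWYYWW"]
merge_order = guide_tree_merges(sequences)
```

In a full progressive aligner, each entry of `merge_order` would trigger one pairwise alignment of the two groups it names, so the guide tree determines which sequences are aligned first.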
Koike, Ryoaro. "Comparison of Protein Sequences and Structures based on the Partition Function Formulation : Probabilistic Alignment." 京都大学 (Kyoto University), 2003. http://hdl.handle.net/2433/148598.
Liang, Chengzhi. "COPIA: A New Software for Finding Consensus Patterns in Unaligned Protein Sequences." Thesis, University of Waterloo, 2001. http://hdl.handle.net/10012/1050.
Capella, Gutiérrez Salvador Jesús 1985. "Analysis of multiple protein sequence alignments and phylogenetic trees in the context of phylogenomics studies." Doctoral thesis, Universitat Pompeu Fabra, 2012. http://hdl.handle.net/10803/97289.
Phylogenomics is a biological discipline that can be understood as the intersection of genomics and evolution. Its area of study is the evolutionary analysis of genomes and how different species are related to one another. Phylogenomics also aims to functionally annotate newly sequenced genomes with high accuracy. Indeed, the discipline has grown rapidly in recent years in response to the flood of data from various genome projects. To achieve its goals, phylogenomics depends largely on the methods used to generate phylogenetic trees. Phylogenetic trees are the basic tools of phylogenomics and represent how sequences and species are related to one another by descent. During my thesis, I focused on improving an automated pipeline (a set of programs run in a controlled way) that generates phylogenetic trees with high accuracy, and on how to offer these data to the scientific community through a database. Among the efforts to improve the pipeline, I concentrated especially on the post-processing of multiple sequence alignments before any analysis, since alignment quality determines the quality of subsequent studies. In a more biological context, I used this pipeline together with other phylogenomic tools to study the phylogenetic position of Microsporidia. Given their unusual genomic features, the evolution of Microsporidia is one of the classic, hard-to-resolve problems in phylogenomics. Finally, I also used the pipeline as part of a new method for selecting optimal combinations of genes with potential as phylogenetic markers. Indeed, I used this method to identify sets of phylogenetic markers that reconstruct evolutionary relationships in Cyanobacteria and in Fungi with a high degree of accuracy. The most interesting aspect of this method is that it evaluates the reliability of the markers in species not used for their selection.
Scheeff, Eric David. "Multiple alignments of protein structures and their application to sequence annotation with hidden Markov models /." Diss., 2003. http://wwwlib.umi.com/cr/ucsd/fullcit?p3112860.
Nosek, Ondřej. "Hardwarová akcelerace algoritmu pro hledání podobnosti dvou DNA řetězců." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2007. http://www.nusl.cz/ntk/nusl-236882.
Yáñez, Marissa Elena. "Structural and functional studies of minor pseudopilins from the type 2 secretion system of Vibrio cholerae /." Thesis, 2007. http://hdl.handle.net/1773/8086.
Pelikán, Ondřej. "Predikce škodlivosti aminokyselinových mutací s využitím metody MAPP." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236151.
Lehrach, Wolfgang. "Bayesian machine learning methods for predicting protein-peptide interactions and detecting mosaic structures in DNA sequences alignments." Thesis, University of Edinburgh, 2010. http://hdl.handle.net/1842/29846.
Janda, Jan-Oliver [Verfasser], and Rainer [Akademischer Betreuer] Merkl. "Data mining for important amino acid residues in multiple sequence alignments and protein structures / Jan-Oliver Janda. Betreuer: Rainer Merkl." Regensburg : Universitätsbibliothek Regensburg, 2014. http://d-nb.info/1051132843/34.
Janda, Jan-Oliver [Verfasser], and Rainer [Akademischer Betreuer] Merkl. "Data mining for important amino acid residues in multiple sequence alignments and protein structures / Jan-Oliver Janda. Betreuer: Rainer Merkl." Regensburg : Universitätsbibliothek Regensburg, 2014. http://nbn-resolving.de/urn:nbn:de:bvb:355-epub-299076.
Durek, Pawel, Christian Schudoma, Wolfram Weckwerth, Joachim Selbig, and Dirk Walther. "Detection and characterization of 3D-signature phosphorylation site motifs and their contribution towards improved phosphorylation site prediction in proteins." Universität Potsdam, 2009. http://opus.kobv.de/ubp/volltexte/2010/4512/.
Simms, Amy Nicole. "Examination of Neisseria gonorrhoeae opacity protein expression during experimental murine genital tract infection /." 2005. http://www.lrc.usuhs.mil/dissertations/pdf/Simms2005.pdf.
Grigolon, Silvia. "Modelling and inference for biological systems : from auxin dynamics in plants to protein sequences." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112178/document.
All biological systems are made of atoms and molecules interacting in a non-trivial manner. Such non-trivial interactions induce complex behaviours allowing organisms to fulfill all their vital functions. These features can be found in all biological systems at different levels, from molecules and genes up to cells and tissues. In the past few decades, physicists have been paying much attention to these intriguing aspects by framing them in network approaches for which a number of theoretical methods offer many powerful ways to tackle systemic problems. At least two different ways of approaching these challenges may be considered: direct modeling methods and approaches based on inverse methods. In the context of this thesis, we made use of both methods to study three different problems occurring on three different biological scales. In the first part of the thesis, we mainly deal with the very early stages of tissue development in plants. We propose a model aimed at understanding which features drive the spontaneous collective behaviour in space and time of PINs, the transporters which pump the phytohormone auxin out of cells. In the second part of the thesis, we focus instead on the structural properties of proteins. In particular we ask how conservation of protein function across different organisms constrains the evolution of protein sequences and their diversity. Hereby we propose a new method to extract the sequence positions most relevant for protein function. Finally, in the third part, we study intracellular molecular networks that implement auxin signaling in plants. In this context, and using extensions of a previously published model, we examine how network structure affects network function. The comparison of different network topologies provides insights into the role of different modules and of a negative feedback loop in particular. Our introduction of the dynamical response function allows us to characterize the systemic properties of the auxin signaling when external stimuli are applied.
Chrysostomou, Charalambos. "Characterisation and classification of protein sequences by using enhanced amino acid indices and signal processing-based methods." Thesis, De Montfort University, 2013. http://hdl.handle.net/2086/9895.
Hatherley, Rowan. "Structural bioinformatics studies and tool development related to drug discovery." Thesis, Rhodes University, 2016. http://hdl.handle.net/10962/d1020021.
Khan, Abdul Kareem. "Electrostatic analysis the Ras active site." Doctoral thesis, Universitat Pompeu Fabra, 2009. http://hdl.handle.net/10803/7161.
The electrostatic preorganization of the active site has been put forward as the general framework of action of enzymes. Thus, enzymes would position "strategic" residues so as to be prepared to catalyze reactions by interacting more strongly with the transition state, thereby decreasing the activation energy Δg‡cat of the catalytic process. It has been proposed that such electrostatic preorientation should be shown by analyzing the electrostatic stability of individual residues in the active site.
The Ras protein is an essential signaling molecule that functions as a switch in the cell. The structural features of Ras in its active (ON) state differ from those in its inactive (OFF) state. In this thesis, an exhaustive analysis of the stability of the Ras active-site residues in the active and inactive states is performed.
Chi, Yang, and 楊奇. "Use Sequence-Structure Alignment Approach to Predict Protein Function." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/14206718616440163154.
中華大學
資訊工程學系碩士班
93
Protein interactions play an important role in most living organisms. "Guilt by association" is a commonly used method for inferring protein function: if the function of one protein in an interacting pair is known, the other can be inferred to have a closely related function. Protein interactions fall into three functional classes: metabolic or signaling pathways, pattern-formation pathways, and the assembly of biological macromolecular structures. Knowledge of the last class alone is very important. Protein function is undoubtedly closely related to molecular structure, yet the functions of about 40 percent of the proteins in human protein data remain unknown. This led us to ask whether sequence and structure can be used together to predict protein function. Many methods exist for predicting protein secondary and tertiary structure, but few take molecular structure into account when predicting function, and it is unclear whether considering both structure and sequence makes function prediction more effective. This motivates us to find a more reliable method for predicting protein function. In this thesis, we use protein sequence and structural characteristics, deriving the secondary structure from the primary sequence, to predict the function of an unknown protein. The sample data are taken from known proteins in the Protein Data Bank (PDB) and are processed by gathering, sorting, pruning, training, and prediction. We use a hidden Markov model (HMM) to learn and predict protein function from the primary sequence and the secondary structure. We expect to attain 50% prediction accuracy on the known protein data and hope to contribute to the development of bioinformatics.
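As a rough illustration of the guilt-by-association idea mentioned in this abstract, the sketch below assigns each unannotated protein the most common function among its annotated interaction partners. It is not the method of the thesis (which uses an HMM over sequence and structure); the function name `infer_functions`, the protein identifiers, and the majority-vote rule are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def infer_functions(interactions, known):
    """Guess a function for each unannotated protein by majority vote
    over its annotated interaction partners (hypothetical helper)."""
    # Build an undirected neighbour map from the interaction pairs.
    neighbours = defaultdict(set)
    for a, b in interactions:
        neighbours[a].add(b)
        neighbours[b].add(a)

    inferred = {}
    for protein, partners in neighbours.items():
        if protein in known:
            continue  # already annotated, nothing to infer
        # Sort partners so that ties are broken deterministically.
        votes = Counter(known[p] for p in sorted(partners) if p in known)
        if votes:
            inferred[protein] = votes.most_common(1)[0][0]
    return inferred


# Illustrative data only: identifiers and functions are made up.
interactions = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
known = {"P1": "kinase", "P3": "kinase"}
print(infer_functions(interactions, known))
```

A protein with no annotated partner is simply left out of the result, which mirrors the limitation the abstract points to: the approach says nothing about proteins whose neighbours are also uncharacterised.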
Ma, Fangrui. "Biological sequence analyses theory, algorithms, and applications /." 2009. http://proquest.umi.com/pqdweb?did=1821098721&sid=1&Fmt=2&clientId=14215&RQT=309&VName=PQD.
Title from title screen (site viewed October 13, 2009). PDF text: xv, 233 p. : ill. ; 4 Mb. UMI publication number: AAT 3360173. Includes bibliographical references. Also available in microfilm and microfiche formats.
Ho, Cheng Chen, and 何誠禎. "Using Evolutionary Computation to Solve the Multiple Protein Sequence Alignment." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/63510373306799359361.
樹德科技大學
資訊管理研究所
91
Multiple sequence alignment (MSA) has been an important problem in molecular biology in recent years. The purpose of molecular sequence alignment is to reveal structural diversity in DNA and proteins. MSA is the most common and important technique for aligning the molecular sequences of organisms. In this thesis, we combine a genetic algorithm with dynamic programming to solve the MSA problem, using two crossover operators and three mutation operators to improve the alignment. Experimental results on real sequences from BAliBASE illustrate the effectiveness of the proposed approach.
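The dynamic-programming component mentioned in this abstract can be illustrated with a minimal pairwise global alignment score in the style of Needleman-Wunsch. This is only a sketch of the general technique, not the algorithm of the thesis: the abstract does not describe its scoring scheme or how dynamic programming is coupled to the genetic algorithm, so the function name and the `match`, `mismatch`, and `gap` values below are assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b
    under a simple linear gap penalty (assumed scoring values)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty sequence costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell takes the best of: align two residues, or open a gap in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # residue aligned to residue
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))
```

In a genetic-algorithm setting such as the one the abstract outlines, a score of this kind could plausibly serve as part of a fitness function, but that coupling is an assumption here rather than something the source states.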