Dissertations / Theses: 'Algorithms- Protein'

1

Derevyanko, Georgy. "Structure-based algorithms for protein-protein interactions." Thesis, Grenoble, 2014. http://www.theses.fr/2014GRENY070/document.


Abstract:

The phenotype of every known living organism is determined mainly by the complicated interactions between the proteins produced in that organism. Understanding how organisms respond to external or internal stimuli rests on understanding the interactions of individual proteins and the structures of their complexes. Predicting the complex formed by two or more proteins is the problem addressed by the field of protein-protein docking. Docking algorithms usually have two major steps: an exhaustive 6D rigid-body search followed by scoring. In this work we contributed to both of these steps. We developed a novel algorithm for exhaustive 6D search, HermiteFit, which is based on the decomposition of 3D functions into the Hermite basis. We implemented this algorithm in a program for fitting low-resolution electron density maps. We showed that it outperforms existing algorithms in terms of time per point while maintaining the same output model accuracy. We also developed a novel approach to computing a scoring function, which is based on simple logical arguments and avoids an ambiguous computation of the reference state. We compared it to existing scoring functions on widely used protein-protein docking benchmarks. Finally, we developed an approach to include water-protein interactions in the scoring function and validated our method during round 47 of CAPRI (Critical Assessment of Protein Interactions).


2

Lassmann, Timo. "Algorithms for building and evaluating multiple sequence alignments /." Stockholm, 2006. http://diss.kib.ki.se/2006/91-7140-887-8/.


3

Hosur, Raghavendra. "Structure-based algorithms for protein-protein interaction prediction." Thesis, Massachusetts Institute of Technology, 2012. http://hdl.handle.net/1721.1/75843.


Abstract:

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Materials Science and Engineering, 2012.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student submitted PDF version of thesis.
Includes bibliographical references (p. 109-124).
Protein-protein interactions (PPIs) play a central role in all biological processes. Akin to the complete sequencing of genomes, a complete description of interactomes is a fundamental step towards a deeper understanding of biological processes, and has vast potential to impact systems biology, genomics, molecular biology and therapeutics. PPIs are critical in the maintenance of cellular integrity, metabolism, transcription/translation, and cell-cell communication. This thesis develops new methods that significantly advance our efforts at structure-based approaches to predict PPIs and boost confidence in emerging high-throughput (HTP) data. The aims of this thesis are 1) to utilize physicochemical properties of protein interfaces to better predict the putative interacting regions and increase coverage of PPI prediction, 2) to increase confidence in HTP datasets by identifying likely experimental errors, and 3) to provide residue-level information that gives us insights into structure-function relationships in PPIs. Taken together, these methods will vastly expand our understanding of macromolecular networks. In this thesis, I introduce two computational approaches for structure-based protein-protein interaction prediction: iWRAP and Coev2Net. iWRAP is an interface threading approach that utilizes biophysical properties specific to protein interfaces to improve PPI prediction. Unlike previous structure-based approaches that use single structures to make predictions, iWRAP first builds profiles that characterize the hydrophobic, electrostatic and structural properties specific to protein interfaces from multiple interface alignments. Compatibility with these profiles is used to predict the putative interface region between the two proteins. In addition to improved interface prediction, iWRAP provides better accuracy and close to a 50% increase in coverage on genome-scale PPI prediction tasks.
As an application, we effectively combine iWRAP with genomic data to identify novel cancer-related genes involved in chromatin remodeling, nucleosome organization and ribonuclear complex assembly, processes known to be critical in cancer. Coev2Net addresses some of the limitations of iWRAP, and provides techniques to increase coverage and accuracy even further. Unlike earlier sequence and structure profiles, Coev2Net explicitly models long-distance correlations at protein interfaces. By formulating interface co-evolution as a high-dimensional sampling problem, we enrich sequence/structure profiles with artificial interacting homologous sequences for families which do not have known multiple interacting homologs. We build a spanning-tree based graphical model induced by the simulated sequences as our interface profile. Cross-validation results indicate that this approach is as good as previous methods at PPI prediction. We show that Coev2Net's predictions correlate with experimental observations and experimentally validate some of the high-confidence predictions. Furthermore, we demonstrate how analysis of the predicted interfaces together with human genomic variation data can help us understand the role of these mutations in disease and normal cells.
by Raghavendra Hosur.
Ph.D.


4

Bazzoli, A. "Protein structure prediction and protein design with evolutionary algorithms." Doctoral thesis, Università degli Studi di Milano, 2009. http://hdl.handle.net/2434/64478.


5

Lappe, Michael. "Novel algorithms for protein interaction networks." Thesis, University of Cambridge, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.615625.


6

Sajjadi, Sajdeh [Verfasser]. "Step by step in fast protein-protein docking algorithms / Sajdeh Sajjadi." Lübeck : Zentrale Hochschulbibliothek Lübeck, 2014. http://d-nb.info/1060276887/34.


7

C, Dukka Bahadur K. "Clique-based algorithms for protein structure prediction." 京都大学 (Kyoto University), 2006. http://hdl.handle.net/2433/143887.


8

Thomas, Dallas, and University of Lethbridge Faculty of Arts and Science. "Algorithms & experiments for the protein chain lattice fitting problem." Thesis, Lethbridge, Alta. : University of Lethbridge, Faculty of Arts and Science, 2006. http://hdl.handle.net/10133/535.


Abstract:

This study seeks to design algorithms that may be used to determine whether a given lattice is a good approximation to a given rigid protein structure. Ideal lattice models discovered using our techniques may then be used in algorithms for protein folding and inverse protein folding. In this study we develop methods based on dynamic programming and branch and bound in an effort to identify “ideal” lattice models. To further our understanding of the concepts behind the methods, we have used a simple cubic lattice for our analysis; the algorithms may be adapted to work on any lattice. We describe two algorithms. The first aligns the protein backbone to the lattice as a walk and runs in polynomial time. The second aligns the protein backbone to the lattice as a path. Both algorithms seek to minimize the CRMS deviation of the alignment. The second problem was recently shown to be NP-complete, hence it is highly unlikely that an efficient algorithm exists. The first algorithm gives a lower bound on the optimal solution to the second problem, and can be used in a branch and bound procedure. Further, we perform an empirical evaluation of our algorithm on proteins from the Protein Data Bank (PDB).
ix, 47 leaves ; 29 cm.
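
The abstract does not give the algorithms themselves, so the following is only a minimal sketch of how a walk-type fit could be set up with dynamic programming. It assumes a fixed lattice placement (no rotation or translation search), a simple cubic lattice, and a hypothetical function name `fit_walk`; none of these details come from the thesis. Consecutive residues must land on adjacent lattice points, and points may be revisited, which is what distinguishes a walk from a path.

```python
from itertools import product
from math import sqrt

# The six unit steps of the simple cubic lattice.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def fit_walk(coords, spacing=1.0, margin=1):
    """Best lattice walk for a backbone under a FIXED lattice placement.

    coords: list of (x, y, z) residue positions.
    Returns (rms_deviation, walk), where walk is a list of integer lattice
    points and consecutive points are lattice neighbours.
    Illustrative assumption: the lattice is axis-aligned at the origin.
    """
    # Candidate lattice points: an integer box around the chain.
    lo = [int(min(c[k] for c in coords) // spacing) - margin for k in range(3)]
    hi = [int(max(c[k] for c in coords) // spacing) + 1 + margin for k in range(3)]
    points = list(product(*(range(lo[k], hi[k] + 1) for k in range(3))))

    def sq_dist(i, p):
        return sum((coords[i][k] - spacing * p[k]) ** 2 for k in range(3))

    # cost[p] = minimal summed squared deviation with residue i placed at p.
    cost = {p: sq_dist(0, p) for p in points}
    back = []  # back[i - 1][p] = predecessor of p chosen for residue i
    for i in range(1, len(coords)):
        new_cost, choice = {}, {}
        for p in points:
            best_q, best = None, float("inf")
            for s in STEPS:
                q = (p[0] + s[0], p[1] + s[1], p[2] + s[2])
                c = cost.get(q)
                if c is not None and c < best:
                    best_q, best = q, c
            new_cost[p] = best + sq_dist(i, p)
            choice[p] = best_q
        cost = new_cost
        back.append(choice)

    # Trace the optimal walk back from the cheapest end point.
    end = min(cost, key=cost.get)
    walk = [end]
    for choice in reversed(back):
        walk.append(choice[walk[-1]])
    walk.reverse()
    return sqrt(cost[end] / len(coords)), walk
```

The running time is linear in the number of residues times the number of candidate lattice points, which is consistent with the polynomial bound mentioned for the walk problem. Enforcing self-avoidance (the path problem) cannot be handled by this per-residue table, because the table does not record which points were already used.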


9

Gamalielsson, Jonas. "Models for Protein Structure Prediction by Evolutionary Algorithms." Thesis, University of Skövde, Department of Computer Science, 2001. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-623.


Abstract:

Evolutionary algorithms (EAs) have been shown to be competent at solving complex, multimodal optimisation problems in applications where the search space is large and poorly understood. EAs are therefore among the most promising classes of algorithms for solving the Protein Structure Prediction Problem (PSPP). The PSPP is the problem of deriving the 3D structure of a protein given only its sequence of amino acids. This dissertation defines, evaluates and shows limitations of simplified models for solving the PSPP. These simplified models are off-lattice extensions to the lattice HP model, which has been proposed and is claimed to possess some of the properties of real protein folding, such as the formation of a hydrophobic core. Lattice models usually model a protein at the amino acid level of detail, use simple energy calculations and are used mainly for search algorithm development. Off-lattice models usually model the protein at the atomic level of detail, use more complex energy calculations and may be used for comparison with real proteins. The idea is to combine the fast energy calculations of lattice models with the increased spatial possibilities of an off-lattice environment, allowing for comparison with real protein structures. A hypothesis is presented which claims that a simplified off-lattice model which considers other amino acid properties apart from hydrophobicity will yield simulated structures with lower Root Mean Square Deviation (RMSD) to the native fold than a model only considering hydrophobicity. The hypothesis holds for four of five tested short proteins with a maximum of 46 residues. The best average RMSD for any model tested is above 6 Å, i.e. too high for useful structure prediction, and excludes significant resemblance between native and simulated structure. Hence, the tested models do not contain the necessary biological information to capture the complex interactions of real protein folding.
It is also shown that the EA itself is competent and can produce near-native structures if given a suitable evaluation function. Hence, EAs are useful for eventually solving the PSPP.
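
For readers unfamiliar with the lattice HP model that these off-lattice models extend, the sketch below computes its usual energy: each pair of hydrophobic (H) residues that sit on adjacent lattice sites without being chain neighbours contributes -1. The 2D square lattice, the function name `hp_energy`, and the input format are assumptions made for illustration; the dissertation's own off-lattice energy functions are not described in the abstract.

```python
def hp_energy(sequence, positions):
    """Energy of a 2D lattice HP conformation.

    sequence:  string over {"H", "P"}.
    positions: list of (x, y) integer lattice sites, one per residue.
    Each non-consecutive H-H pair on adjacent sites contributes -1.
    """
    if len(sequence) != len(positions):
        raise ValueError("sequence and positions must have the same length")
    if len(set(positions)) != len(positions):
        raise ValueError("conformation must be self-avoiding")

    index = {p: i for i, p in enumerate(positions)}
    energy = 0
    for i, (x, y) in enumerate(positions):
        if sequence[i] != "H":
            continue
        # Look only in the +x and +y directions so each contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy
```

For example, folding `"HPPH"` into a unit square brings the two H residues into contact and gives an energy of -1, while the same sequence laid out in a straight line has energy 0. An evolutionary algorithm would use a function of this kind as (the negative of) its fitness.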


10

Parry-Smith, David John. "Algorithms and data structures for protein sequence analysis." Thesis, University of Leeds, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.277404.


11

Singh, Rohit Ph D. Massachusetts Institute of Technology. "Algorithms for the analysis of protein interaction networks." Thesis, Massachusetts Institute of Technology, 2012. http://hdl.handle.net/1721.1/71489.


Abstract:

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 107-117).
In the decade since the human genome project, a major research trend in biology has been towards understanding the cell as a system. This interest has stemmed partly from a deeper appreciation of how important it is to understand the emergent properties of cellular systems (e.g., they seem to be the key to understanding diseases like cancer). It has also been enabled by new high-throughput techniques that have allowed us to collect new types of data at the whole-genome scale. We focus on one sub-domain of systems biology: the understanding of protein interactions. Such understanding is valuable: interactions between proteins are fundamental to many cellular processes. Over the last decade, high-throughput experimental techniques have allowed us to collect a large amount of protein-protein interaction (PPI) data for many species. A popular abstraction for representing this data is the protein interaction network: each node of the network represents a protein and an edge between two nodes represents a physical interaction between the two corresponding proteins. This abstraction has proven to be a powerful tool for understanding the systems aspects of protein interaction. We present some algorithms for the augmentation, cleanup and analysis of such protein interaction networks: 1. In many species, the coverage of known PPI data remains partial. Given two protein sequences, we describe an algorithm to predict if two proteins physically interact, using logistic regression and insights from structural biology. We also describe how our predictions may be further improved by combining with functional-genomic data. 2. We study systematic false positives in a popular experimental protocol, the Yeast 2-Hybrid method. Here, some "promiscuous" proteins may lead to many false positives. We describe a Bayesian approach to modeling and adjusting for this error. 3. Comparative analysis of PPI networks across species can provide valuable insights. 
We describe IsoRank, an algorithm for global network alignment of multiple PPI networks. The algorithm first constructs an eigenvalue problem that encapsulates the network and sequence similarity constraints. The solution of the problem describes a k-partite graph that is further processed to find the alignment. 4. For a given signaling network, we describe an algorithm that combines RNA-interference data with PPI data to produce hypotheses about the structure of the signaling network. Our algorithm constructs a multi-commodity flow problem that expresses the constraints described by the data and finds a sparse solution to it.
by Rohit Singh.
Ph.D.
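
The eigenvalue formulation mentioned for IsoRank can be illustrated for the two-network case with a power iteration. This is a rough sketch rather than the thesis's implementation: the update rule follows the commonly described idea that a node pair scores highly when its neighbour pairs score highly, blended with sequence similarity through a weight `alpha`. The function name, the dictionary-based graph format, and the default parameter values are all assumptions, and the later step that extracts an alignment from the scores is not shown.

```python
def isorank_scores(adj1, adj2, seq_sim, alpha=0.6, iters=100):
    """Pairwise similarity scores for two undirected networks.

    adj1, adj2: dict mapping node -> set of neighbouring nodes.
    seq_sim:    dict mapping (node1, node2) -> non-negative similarity.
    alpha:      weight of network topology versus sequence similarity (< 1).
    Returns a dict (node1, node2) -> score, normalised to sum to 1.
    """
    pairs = [(i, j) for i in adj1 for j in adj2]
    total = sum(seq_sim.get(p, 0.0) for p in pairs)
    # Normalised sequence prior; uniform if no similarity data is supplied.
    prior = {
        p: (seq_sim.get(p, 0.0) / total if total else 1.0 / len(pairs))
        for p in pairs
    }
    scores = {p: 1.0 / len(pairs) for p in pairs}

    for _ in range(iters):
        new = {}
        for i, j in pairs:
            # Each neighbour pair (u, v) spreads its score evenly over
            # all of its own neighbour pairs.
            support = sum(
                scores[(u, v)] / (len(adj1[u]) * len(adj2[v]))
                for u in adj1[i]
                for v in adj2[j]
            )
            new[(i, j)] = alpha * support + (1 - alpha) * prior[(i, j)]
        norm = sum(new.values())
        scores = {p: v / norm for p, v in new.items()}
    return scores
```

On two three-node path graphs with no sequence information, the highest score goes to the pair of middle nodes, which matches the intuition that nodes with similar neighbourhoods should be aligned.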


12

Djurdjević, Dušan. "Ab initio protein fold prediction using evolutionary algorithms." Thesis, University of Edinburgh, 2006. http://hdl.handle.net/1842/13660.


Abstract:

A comprehensive study was undertaken for ab initio protein fold prediction using a fully atomistic protein model and a physicochemical potential. Twenty-four EA designs were initially assessed on polyalanine, a prototypical α-helical polypeptide. Design aspects varied include the encoding alphabet, crossover operator, replacement strategy and selection strategy. By undertaking a comprehensive parameter study, the best performing designs and associated control parameter values were identified for polyalanine. The scaling between the performance and polyalanine size was also identified for these best designs. This initial study was followed by a similar parametric study for met-enkephalin, a five-residue polypeptide that has long been used as a de facto standard test case for protein structure prediction algorithms. It was found that the control parameter scalings identified from the polyalanine study were transferable to this real protein, and that the EA is superior to all existing ab initio approaches for met-enkephalin. The best design was finally applied to a series of real proteins ranging in size up to 45 residues to more generally assess the EA’s performance. The thesis concludes with a consideration of the future work required to extend the EA to larger proteins and to ab initio structure prediction for non-native environments such as interfaces, which are of relevance to, for example, biosensors.


13

Contreras-Moreira, Bruno. "Algorithms for protein comparative modelling and some evolutionary implications." Thesis, University College London (University of London), 2004. http://discovery.ucl.ac.uk/1446587/.


Abstract:

Protein comparative modelling (CM) is a predictive technique to build an atomic model for a polypeptide chain, based on the experimentally determined structures of related proteins (templates). It is widely used in Structural Biology, with applications ranging from mutation analysis, protein and drug design to function prediction and analysis, particularly when there are no experimental structures of the protein of interest. Therefore, CM is an important tool to process the amount of data generated by genomic projects. Several problems affect the performance of CM and therefore solutions for them are needed to increase its applicability. In this work different algorithms and approaches were tested with this aim, particularly to help in template selection and alignment, and some useful insights were obtained. First, this work describes the development of DomainFishing, a tool to split protein sequences into functionally and structurally defined domains and to align each of them to the available templates. The performance of our approach is benchmarked and some problems and possible developments are identified. When comparing different alignment procedures none of them is found to be consistently superior, suggesting that a combination of several could be an advantage. Driven by these ideas and the fact that selecting templates can be a difficult problem, a new modelling approach is designed and tested. This algorithm uses crossover, mutation and selection within populations of protein models generated from different templates and alignments to obtain recombinant structures optimised in terms of fitness. Despite our simple definition of fitness, the procedure is shown to be robust to some alignment errors while simplifying the task of selecting templates, making it a good candidate for automatic building of reliable protein models. In-house benchmarks of the method show its strengths and limitations. 
The method was also benchmarked during the fifth Critical Assessment of techniques for protein Structure Prediction (CASP5), in which its performance was encouraging both for comparative modelling and fold recognition targets, placing among the top 20 predictors. Finally, we present some data to support a possible evolutionary feedback mechanism between protein structure and gene structure, using human and murine genomic data, structural data from the Protein Data Bank and the protein recombination methodology. This may have some implications for understanding protein evolution and protein design, which are discussed.


14

Wang, Xueyi Snoeyink Jack. "Exploring RNA and protein 3D structures by geometric algorithms." Chapel Hill, N.C. : University of North Carolina at Chapel Hill, 2008. http://dc.lib.unc.edu/u?/etd,1905.


Abstract:

Thesis (Ph. D.)--University of North Carolina at Chapel Hill, 2008.
Title from electronic title page (viewed Dec. 11, 2008). "... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science." Discipline: Computer Science; Department/School: Computer Science.


15

Jiménez, García Brian. "Development and optimization of high-performance computational tools for protein-protein docking." Doctoral thesis, Universitat de Barcelona, 2016. http://hdl.handle.net/10803/398790.


Abstract:

Computing has driven a paradigm shift in many disciplines, including structural biology and chemistry. This change has been mainly driven by the increase in computer performance, the capacity to deal with huge amounts of experimental and analysis data, and the development of new algorithms. Thanks to these advances, our understanding of the chemistry that supports life has increased, and that chemistry has proved more sophisticated than we had ever imagined. Proteins play a major role in nature and are often described as the factories of the cell, as they are involved in virtually all important functions in living organisms. Unfortunately, our understanding of the function of many proteins is still very poor because of current limitations in experimental techniques, which at the moment cannot provide crystal structures for many protein complexes. The development of computational tools such as protein-protein docking methods could help to fill this gap. In this thesis, we present a new protein-protein docking method, LightDock, which supports the use of different custom scoring functions and includes anisotropic normal mode analysis to model backbone flexibility upon binding. Second, several web-based tools of interest to the scientific community have been developed, including a web server for protein-protein docking, a web tool for the characterization of protein-protein interfaces, and a web server for including SAXS experimental data for better prediction of protein complexes. Moreover, the optimizations made in the pyDock protocol and the resulting increase in performance helped our group to reach 5th position among more than 60 participants in the past two CAPRI editions. Finally, we have designed and compiled the Protein-Protein (version 5.0) and Protein-RNA (version 1.0) docking benchmarks, which are important resources for the community to test and develop new methods against a reference set of curated cases.


16

Bondugula, Rajkumar. "A novel framework for protein structure prediction." Diss., Columbia, Mo. : University of Missouri-Columbia, 2007. http://hdl.handle.net/10355/4855.


Abstract:

Thesis (Ph.D.)--University of Missouri-Columbia, 2007.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on March 23, 2009) Vita. Includes bibliographical references.


17

Pettitt, Christopher Steven. "Refinement of protein structure models with multi-objective genetic algorithms." Thesis, University College London (University of London), 2007. http://discovery.ucl.ac.uk/1446043/.


Abstract:

Here I investigate the protein structure refinement problem for homology-based protein structure models. The refinement problem has been identified as a major bottleneck in the structure prediction process and inhibits the goal of producing high-resolution, experimental-quality structures for target protein sequences. This thesis is composed of three investigations into aspects of template-based modelling and refinement. In the primary investigation, empirical evidence is provided to support the hypothesis that using multiple template-based structures to model a target sequence can improve the quality of the prediction over that obtained solely by using the single best prediction. A multi-objective genetic algorithm is used to optimize protein structure models by using the structural information from a set of predictions, guided by various objective functions. The effect of multi-objective optimization on model quality is examined. A benchmark of energy functions and model quality assessment methods is performed in the context of automated homology modelling to assess the ability of these methods to discriminate nearer-native structures from a set of predictions. These model quality assessment methods were unable to significantly improve the ranking of threading-based prediction methods, though some model quality assessment methods improved model selection for methods which use sequence information alone. The results suggest that structural information can provide valuable information for distinguishing better models where only sequence information has been used for modelling. The suitability of these energy functions for high-resolution refinement is discussed. Finally, a stochastic optimization algorithm is developed for refining homology-based protein structure models using evolutionary algorithms. This approach uses multiple structural model inputs, conformational sampling operators, and objective functions for guiding a search through conformational space.
Single- and multi-objective genetic variants are applied to homology model predictions for 35 target proteins. The refinement results are discussed and the performance of both algorithmic variants compared and contrasted.


18

Bliven, Spencer Edward. "Structure-Preserving Rearrangements: Algorithms for Structural Comparison and Protein Analysis." Thesis, University of California, San Diego, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3716489.


Abstract:

Protein structure is fundamental to a deep understanding of how proteins function. Since structure is highly conserved, structural comparison can provide deep information about the evolution and function of protein families. The Protein Data Bank (PDB) continues to grow rapidly, providing copious opportunities for advancing our understanding of proteins through large-scale searches and structural comparisons. In this work I present several novel structural comparison methods for specific applications, as well as apply structure comparison tools systematically to better understand global properties of protein fold space.

Circular permutation describes a relationship between two proteins where the N-terminal portion of one protein is related to the C-terminal portion of the other. Proteins that are related by a circular permutation generally share the same structure despite the rearrangement of their primary sequence. This non-sequential relationship makes them difficult for many structure alignment tools to detect. Combinatorial Extension for Circular Permutations (CE-CP) was developed to align proteins that may be related by a circular permutation. It is widely available due to its incorporation into the RCSB PDB website.

Symmetry and structural repeats are common in protein structures at many levels. The CE-Symm tool was developed in order to detect internal pseudosymmetry within individual polypeptide chains. Such internal symmetry can arise from duplication events, so aligning the individual symmetry units provides insights about conservation and evolution. In many cases, internal symmetry can be shown to be important for a number of functions, including ligand binding, allostery, folding, stability, and evolution.

Structural comparison tools were applied comprehensively across all PDB structures for systematic analysis. Pairwise structural comparisons of all proteins in the PDB have been computed using the Open Science Grid computing infrastructure, and are kept continually up-to-date with the release of new structures. These provide a network-based view of protein fold space. CE-Symm was also applied to systematically survey the PDB for internally symmetric proteins. It is able to detect symmetry in ~20% of all protein families. Such PDB-wide analyses give insights into the complex evolution of protein folds.
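
The circular permutation relationship described above can be illustrated at the sequence level with a simple doubling trick: if one chain is a rotation of the other, it appears as a contiguous substring of the other chain written twice. This is a toy sketch with exact string matching and a hypothetical function name; it is not the CE-CP method, which works on 3D structures and tolerates divergence between the two proteins.

```python
def circular_permutation_offset(a, b):
    """Return k such that b == a[k:] + a[:k], or None if no such k exists.

    Toy illustration only: exact matching on sequences, whereas real
    circularly permuted proteins usually differ in sequence as well.
    """
    if len(a) != len(b):
        return None
    # Every rotation of a occurs as a substring of a + a.
    k = (a + a).find(b)
    return k if k != -1 else None
```

For instance, `circular_permutation_offset("ABCDEF", "DEFABC")` returns 3, meaning the N-terminal half of the first sequence corresponds to the C-terminal half of the second. A conventional sequential aligner would match only one of the two halves, which is why such pairs are hard to detect.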


19

Singh, Mona. "Learning algorithms with applications to robot navigation and protein folding." Thesis, Massachusetts Institute of Technology, 1995. http://hdl.handle.net/1721.1/40579.


20

Crook, James. "New algorithms and methods for protein and DNA sequence comparison." Thesis, University of Edinburgh, 1991. http://hdl.handle.net/1842/13497.


Abstract:

International biological sequence databases hold information about protein and DNA molecules. The molecules are represented by sequences of characters. In the analysis of these data, algorithms for comparing the character sequences play a central role. Comparisons can be made using dynamic programming techniques to determine the score of optimal sequence alignments. Such methods are particularly popular with molecular biologists because they accommodate the kinds of differences which actually occur in the sequences of related molecules. Sequence alignments are normally scored using score tables based on an evolutionary model. The derivation of these score tables is re-examined, and a formula is found that gives an analytic counterpart to an empirical method for assessing a score table's discriminating power. Use of the formula to derive alternative protein similarity scoring tables is discussed. A new approach to tackling the heavy computational demands of the dynamic programming algorithm is described: intensive optimisation of a microcomputer implementation. This provides an alternative to implementations which use parallel computers for searching protein databases. This thesis also describes how other implementational problems were tackled in order to make more effective use of the serial comparison software. The new software permitted comparison by optimal alignment of 32,000,000 pairs of sequences from a protein database using widely available and inexpensive hardware. The results from this search were then reorganised to facilitate the finding of previously unseen similarities. Software tools were written to assist with the analysis, including software to align sequence families. From the results of this work, nine similarities are presented which do not appear to have been previously noted. The examples illustrate factors that are important in assessing similarities with scores close to the boundaries of significance.
The similarities presented are of particular interest because of the biological functions they relate. One software tool developed for the sequence analysis work was a new multiple sequence alignment editor and sequence aligner, 'medal'. Lessons from its use on real sequence data led to a modification of the original comparison method to accommodate local variations in sequence similarity. Consideration is given to parallelisation of this modification and of the methods used to obtain speed in the serial software. Alternatives are suggested. The suggested parallel method to cope with variations in sequence similarity requires two interdependent sequence comparisons. A serial program using three interdependent comparisons is demonstrated and shows the feasibility of multiple interdependent comparisons. Examples show how this new program, 'Fradho', can compare DNA sequences to protein sequences while accommodating frameshifts.
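
The dynamic programming comparison described in this abstract can be illustrated with a minimal sketch of global alignment scoring. The function name `alignment_score` and the match, mismatch and gap values are illustrative assumptions; the thesis itself scores alignments with score tables based on an evolutionary model, which are not reproduced here.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the optimal global alignment of a and b.

    Uses a linear gap penalty and a flat match/mismatch score
    (assumed values, not the score tables studied in the thesis).
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]
```

Only two rows of the table are kept in memory, which is one common way to reduce the cost of the computation when only the score, not the alignment itself, is needed.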

21

Tan, Guanhong. "Study of Protein Identification Algorithms and Ammonia Metabolism in Mosquitoes." Thesis, The University of Arizona, 2006. http://hdl.handle.net/10150/193319.

Abstract:

Two database search algorithms, SEQUEST and X!Tandem, were studied in detail. Research results showed that SEQUEST is relatively prone to identifying singly charged peptides, while X!Tandem is prone to identifying highly charged peptides. Peptide fragmentation patterns associated with corresponding structure motifs are incorporated into SEQUEST Replica and X!Tandem Replica. Research results showed that selective cleavage rules for peptide fragmentation help improve peptide identification, especially for selectively cleaved peptides. A tool that makes use of the peak intensity information in the experimental spectra is applied after a SEQUEST search to extract correct peptides. Results showed that more peptides could be correctly identified and a low false positive rate (<5%) was introduced by applying this tool after a SEQUEST search. A new possible ammonia metabolic pathway in mosquitoes was proposed. Results showed that the major steps along this pathway were confirmed and the detailed transfer pathway of nitrogen was elucidated.

22

Choudhury, Salimur Rashid, and University of Lethbridge Faculty of Arts and Science. "Approximation algorithms for a graph-cut problem with applications to a clustering problem in bioinformatics." Thesis, Lethbridge, Alta. : University of Lethbridge, Department of Mathematics and Computer Science, 2008. http://hdl.handle.net/10133/774.

Abstract:

Clusters in protein interaction networks can potentially help identify functional relationships among proteins. We study the clustering problem by modeling it as graph cut problems. Given an edge-weighted graph, the goal is to partition the graph into a prescribed number of subsets obeying some capacity constraints, so as to maximize the total weight of the edges that are within a subset. Identification of a dense subset might shed some light on the biological function of all the proteins in the subset. We study integer programming formulations and exhibit large integrality gaps for various formulations. This is indicative of the difficulty in obtaining constant factor approximation algorithms using the primal-dual schema. We propose three approximation algorithms for the problem. We evaluate the algorithms on the database of interacting proteins and on randomly generated graphs. Our experiments show that the algorithms are fast and have good performance ratios in practice.
xiii, 71 leaves : ill. ; 29 cm.
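
The objective stated in this abstract, the total weight of edges that fall inside a subset of the partition, can be sketched as a small evaluation function. The name `within_cluster_weight` is hypothetical, and the capacity constraint is assumed here to be an upper bound on subset size; the abstract does not say which form of constraint the thesis uses.

```python
def within_cluster_weight(edges, assignment, capacity):
    """Total weight of edges whose endpoints share a subset.

    edges      -- iterable of (u, v, weight) tuples
    assignment -- dict mapping each node to a subset label
    capacity   -- assumed maximum number of nodes per subset
    """
    sizes = {}
    for cluster in assignment.values():
        sizes[cluster] = sizes.get(cluster, 0) + 1
    if any(size > capacity for size in sizes.values()):
        raise ValueError("partition violates the capacity constraint")
    # Only edges inside a subset contribute to the objective.
    return sum(w for u, v, w in edges if assignment[u] == assignment[v])
```

A function like this only evaluates a candidate partition; it is not one of the three approximation algorithms proposed in the thesis.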

23

Palmer, Jane. "The application of genetic algorithms to problems in protein structure solution." Thesis, University of Sheffield, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.286746.

24

Otero, Fernando E. B. "New ant colony optimisation algorithms for hierarchical classification of protein functions." Thesis, University of Kent, 2010. http://www.cs.kent.ac.uk/pubs/2010/3057.

Abstract:

Ant colony optimisation (ACO) is a metaheuristic to solve optimisation problems inspired by the foraging behaviour of ant colonies. It has been successfully applied to several types of optimisation problems, such as scheduling and routing, and more recently for the discovery of classification rules. The classification task in data mining aims at predicting the value of a given goal attribute for an example, based on the values of a set of predictor attributes for that example. Since real-world classification problems are generally described by nominal (categorical or discrete) and continuous (real-valued) attributes, classification algorithms are required to be able to cope with both nominal and continuous attributes. Current ACO classification algorithms have been designed with the limitation of discovering rules using nominal attributes describing the data. Furthermore, they also have the limitation of not coping with more complex types of classification problems e.g., hierarchical multi-label classification problems. This thesis investigates the extension of ACO classification algorithms to cope with the aforementioned limitations. Firstly, a method is proposed to extend the rule construction process of ACO classification algorithms to cope with continuous attributes directly. Four new ACO classification algorithms are presented, as well as a comparison between them and well-known classification algorithms from the literature. Secondly, an ACO classification algorithm for the hierarchical problem of protein function prediction which is a major type of bioinformatics problem addressed in this thesis is presented. Finally, three different approaches to extend ACO classification algorithms to the more complex case of hierarchical multi-label classification are described, elaborating on the ideas of the proposed hierarchical classification ACO algorithm. 
These algorithms are compared against state-of-the-art decision tree induction algorithms for hierarchical multi-label classification in the context of protein function prediction. The computational results of experiments with a wide range of data sets including challenging protein function prediction data sets with very large number.

25

Chippington-Derrick, T. C. "Models, methods and algorithms for constraint dynamics simulations of long chain molecules." Thesis, University of Reading, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.234776.

26

Ishivatari, Luís Henrique Uchida. "Função de avaliação dinâmica em algoritmos genéticos aplicados na predição de estruturas tridimensionais de proteínas." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/95/95131/tde-27112012-185423/.

Abstract:

Protein structure prediction can be viewed computationally as an optimization problem: given an amino acid sequence, the three-dimensional structure of the protein must be found among the many possible ones by locating minima of energy functions. Many researchers have proposed Evolutionary Computation strategies for determining the three-dimensional structures of proteins; however, the results are not always encouraging because, among other factors, the search space contains a great number of local optima. The fitness functions used by optimization algorithms are usually based on force fields with different energy terms, whose parameters are adjusted a priori and kept static throughout the optimization process. Some researchers suggest that dynamic fitness functions, i.e., functions that change during the evolutionary optimization process, can improve the ability of populations to escape from local optima in highly multimodal problems. In this work we propose that the parameters of the force field terms be modified during the optimization performed by Genetic Algorithms (GAs) in the protein structure prediction problem, being increased or decreased, for instance, according to their influence on the formation of secondary structures and on their fine tuning. Since the evaluation function is modified during the optimization process, protein three-dimensional structure prediction becomes a dynamic optimization problem, and Genetic Algorithms designed for such problems, such as the hypermutation GA and random immigrants GAs, are investigated. We also propose a new metric related to the alignment of the protein's secondary structure to help analyse the data obtained. The experimental results indicate that the algorithms with a dynamic evaluation function obtained better results than the static algorithms, which is explained by the fact that changes in the fitness function allow occasional escapes from local optima as well as an increase in population diversity.
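
The idea of a fitness function whose force field term weights change during the run can be sketched as follows. Everything specific here is an assumption made for illustration: the function names, the term names `hbond` and `vdw`, and the linear interpolation schedule. The abstract only states that the parameters are increased or decreased during optimization.

```python
def term_weights(generation, total_generations, start, end):
    """Interpolate each energy-term weight linearly over the run.

    start and end are dicts of term name -> weight at the first and
    last generation (a hypothetical schedule, not the thesis's).
    """
    frac = generation / max(total_generations - 1, 1)
    return {name: start[name] + frac * (end[name] - start[name])
            for name in start}


def dynamic_fitness(energy_terms, weights):
    """Weighted sum of energy terms under the current weights."""
    return sum(weights[name] * value for name, value in energy_terms.items())
```

In a GA loop, `term_weights` would be called once per generation and the whole population re-evaluated with `dynamic_fitness`, since scores computed under earlier weights are no longer comparable.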

27

Herndon, Nic. "Domain adaptation algorithms for biological sequence classification." Diss., Kansas State University, 2016. http://hdl.handle.net/2097/35242.

Abstract:

Doctor of Philosophy
Department of Computing and Information Sciences
Doina Caragea
The large volume of data generated in recent years has created opportunities for discoveries in various fields. In biology, next generation sequencing technologies determine the exact order of nucleotides present within a DNA or RNA fragment faster and more cheaply. This large volume of data requires the use of automated tools to extract information and generate knowledge. Machine learning classification algorithms provide an automated means to annotate data but require some of these data to be manually labeled by human experts, a process that is costly and time consuming. An alternative to labeling data is to use existing labeled data from a related domain, the source domain, if any such data is available, to train a classifier for the domain of interest, the target domain. However, the classification accuracy usually decreases for the domain of interest as the distance between the source and target domains increases. Another alternative is to label some data and complement it with abundant unlabeled data from the same domain, and train a semi-supervised classifier, although the unlabeled data can mislead such a classifier. In this work another alternative is considered, domain adaptation, in which the goal is to train an accurate classifier for a domain with limited labeled data and abundant unlabeled data, the target domain, by leveraging labeled data from a related domain, the source domain. Several domain adaptation classifiers are proposed, derived from a supervised discriminative classifier (logistic regression) or a supervised generative classifier (naïve Bayes), and some of the factors that influence their accuracy are studied: features, data used from the source domain, how to incorporate the unlabeled data, and how to combine all available data. The proposed approaches were evaluated on two biological problems: protein localization and ab initio splice site prediction. 
The former is motivated by the fact that predicting where a protein is localized provides an indication for its function, whereas the latter is an essential step in gene prediction.

28

Chi, Pin-Hao. "Efficient protein tertiary structure retrievals and classifications using content based comparison algorithms." Diss., Columbia, Mo. : University of Missouri-Columbia, 2007. http://hdl.handle.net/10355/4817.

Abstract:

Thesis (Ph. D.)--University of Missouri-Columbia, 2007.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on September 19, 2007) Vita. Includes bibliographical references.

29

Park, Daniel K. (Daniel Kyu). "Web servers, databases, and algorithms for the analysis of protein interaction networks." Thesis, Massachusetts Institute of Technology, 2013. http://hdl.handle.net/1721.1/79146.

Abstract:

Thesis (S.M.)--Massachusetts Institute of Technology, Computational and Systems Biology Program, 2013.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (p. 41-44).
Understanding the cell as a system has become one of the foremost challenges in the post-genomic era. As a result of advances in high-throughput (HTP) methodologies, we have seen a rapid growth in new types of data at the whole-genome scale. Over the last decade, HTP experimental techniques such as yeast two-hybrid assays and co-affinity purification coupled with mass spectrometry have generated large amounts of data on protein-protein interactions (PPI) for many organisms. We focus on the sub-domain of systems biology related to understanding the interactions between proteins that ultimately drive all cellular processes. Representing PPIs as a protein interaction network has proved to be a powerful tool for understanding PPIs at the systems level. In this representation, each node represents a protein and each edge between two nodes represents a physical interaction between the corresponding two proteins. With this abstraction, we present algorithms for the prediction and analysis of such PPI networks as well as web servers and databases for their public availability: 1. In many organisms, the coverage of experimentally determined PPI data remains relatively noisy and limited. Given two protein sequences, we describe an algorithm, called Struct2Net, to predict if two proteins physically interact, using insights from structural biology and logistic regression. Furthermore, we create a community-wide web resource that predicts interactions between any protein sequence pair and provides proteome-wide pre-computed PPI predictions for Homo sapiens, Drosophila melanogaster, and Saccharomyces cerevisiae. 2. Comparative analysis of PPI networks across organisms can provide valuable insights into evolutionary conservation. We describe an algorithm, called IsoRank, for global alignment of multiple PPI networks. The algorithm first constructs an eigenvalue problem that models the network and sequence similarity constraints. 
The solution of the problem describes a k-partite graph that is further processed to find the alignments. Furthermore, we create a community-wide web database, called IsoBase, that provides network alignments and orthology mappings for the most commonly studied eukaryotic model organisms: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae.
by Daniel K. Park.
S.M.
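
The eigenvalue problem that IsoRank is said to construct can be illustrated with a small power-iteration sketch for two networks. This is a simplified illustration under stated assumptions, not the thesis's implementation: the function name `isorank_scores`, the blending parameter `alpha`, the fixed iteration count, and the restriction to a pair of networks are all choices made here. Each protein pair receives support from its neighbouring pairs and is blended with a normalised sequence similarity prior.

```python
def isorank_scores(adj1, adj2, seq_sim, alpha=0.6, iterations=50):
    """Pairwise similarity scores by power iteration.

    adj1, adj2 -- dicts: node -> list of neighbours (undirected, symmetric)
    seq_sim    -- dict: (node1, node2) -> non-negative sequence similarity
    alpha      -- assumed weight of network evidence vs. sequence evidence
    """
    pairs = [(i, j) for i in adj1 for j in adj2]
    total = sum(seq_sim.get(p, 0.0) for p in pairs)
    # Fall back to a uniform prior when no sequence similarity is given.
    prior = {p: (seq_sim.get(p, 0.0) / total if total else 1.0 / len(pairs))
             for p in pairs}
    scores = {p: 1.0 / len(pairs) for p in pairs}
    for _ in range(iterations):
        new = {}
        for i, j in pairs:
            # Each neighbouring pair (u, v) spreads its score evenly
            # over its own neighbouring pairs.
            support = sum(scores[(u, v)] / (len(adj1[u]) * len(adj2[v]))
                          for u in adj1[i] for v in adj2[j])
            new[(i, j)] = alpha * support + (1 - alpha) * prior[(i, j)]
        norm = sum(new.values())
        scores = {p: s / norm for p, s in new.items()}
    return scores
```

Turning the resulting scores into an actual alignment, the further processing step mentioned in the abstract, is not shown.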

30

Li, Wenzhou. "Protein Identification Algorithms Developed from Statistical Analysis of MS/MS Fragmentation Patterns." Diss., The University of Arizona, 2012. http://hdl.handle.net/10150/242432.

Abstract:

Tandem mass spectrometry is widely used in proteomic studies because of its ability to identify large numbers of peptides from complex mixtures. In a typical LC-MS/MS experiment, thousands of tandem mass spectra will be collected and peptide identification algorithms are of great importance to translate them into peptide sequences. Though these spectra contain both m/z and intensity values, most popular protein identification algorithms primarily use predicted fragment m/z values to assign peptide sequences to fragmentation spectra. The intensity information is often undervalued, because it is not as easy to predict and incorporate into algorithms. Nevertheless, the use of intensity to assist peptide identification is an attractive prospect and can potentially improve the confidence of matches and generate more identifications. In this dissertation, an unsupervised statistical method, K-means clustering, was used to study peptide fragmentation patterns for both CID and ETD data, and many unique fragmentation features were discovered. For instance, strong c(n-1) ions were observed in ETD, indicating that the fragmentation site in ETD is highly related to the amino acid residue location. Based on the fragmentation patterns observed through data mining, a peptide identification algorithm that makes use of these patterns was developed. The program is named SQID and it is the first algorithm in our bioinformatics project. Our testing results using multiple public datasets indicated an improvement in the number of identified peptides compared with popular proteomics algorithms such as Sequest or X!Tandem. SQID was further extended to improve cross-linked peptide identification (SQID-XLink) as well as blind modification identification (SQID-Mod), and both of them showed significant improvement compared with existing methods. In this dissertation the SQID algorithm was also successfully applied to a mosquito proteomics project. 
We are incorporating new features and new algorithms into our software, such as more fragmentation methods, more accurate spectra prediction and a more user-friendly interface. We hope the SQID project can continue to benefit researchers and help improve data analysis in the proteomics community.

31

Karimpour-Fard, Anis. "Prediction of protein-protein interactions and function in bacteria /." Connect to full text via ProQuest. Limited to UCD Anschutz Medical Campus, 2008.

Abstract:

Thesis (Ph.D. in Bioinformatics) -- University of Colorado Denver, 2008.
Typescript. Includes bibliographical references (leaves 141-150). Free to UCD Anschutz Medical Campus. Online version available via ProQuest Digital Dissertations.

32

Akkaladevi, Somasheker. "Decision Fusion for Protein Secondary Structure Prediction." Digital Archive @ GSU, 2006. http://digitalarchive.gsu.edu/cs_diss/9.

Abstract:

Prediction of protein secondary structure from primary sequence of amino acids is a very challenging task, and the problem has been approached from several angles. Proteins have many different biological functions; they may act as enzymes or as building blocks (muscle fibers) or may have transport function (e.g., transport of oxygen). The three-dimensional protein structure determines the functional properties of the protein. A lot of interesting work has been done on this problem, and over the last 10 to 20 years the methods have gradually improved in accuracy. In this dissertation we investigate several techniques for predicting the protein secondary structure. The prediction is carried out mainly using pattern classification techniques such as neural networks, genetic algorithms, simulated annealing. Each individual algorithm may work well in certain situations but fails in others. Capitalizing on the positive decisions can be achieved by forcing the various methods to collaborate to reach a unified consensus based on their previous performances. The process of combining classifiers is called decision fusion. The various decision fusion techniques such as the committee method, correlation method and the Bayesian inference methods to fuse the solutions from various approaches and to get better prediction accuracy are thoroughly explored in this dissertation. The RS126 data set was used for training and testing purposes. The results of applying pattern classification algorithms along with decision fusion techniques showed improvement in the prediction accuracy compared to that of prediction by neural networks or pattern classification algorithms individually or combined with neural networks. This research has shown that decision fusion techniques can be used to obtain better protein secondary structure prediction accuracy.
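
As an illustration of decision fusion, a committee-style vote over per-residue predictions might look like the sketch below. The function name `committee_vote`, the three-state labels H, E and C, the optional weights standing in for "previous performances", and the alphabetical tie-break are assumptions made for this example; the dissertation's own committee, correlation and Bayesian methods are not reproduced.

```python
from collections import Counter


def committee_vote(predictions, weights=None):
    """Fuse equal-length prediction strings by (weighted) majority vote.

    predictions -- list of strings, one label per residue per classifier
    weights     -- optional per-classifier weights (assumed to reflect
                   past accuracy); defaults to one vote each
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    fused = []
    for column in zip(*predictions):
        tally = Counter()
        for label, weight in zip(column, weights):
            tally[label] += weight
        # Ties are broken alphabetically because max() keeps the first
        # maximal element of the sorted label list.
        fused.append(max(sorted(tally), key=tally.get))
    return "".join(fused)
```

With equal weights this reduces to a plain majority vote; giving a historically stronger classifier a larger weight lets it override two weaker ones.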

33

Planas, Iglesias Joan 1980. "On the study of 3D structure of proteins for developing new algorithms to complete the interactome and cell signalling networks." Doctoral thesis, Universitat Pompeu Fabra, 2013. http://hdl.handle.net/10803/104152.

Abstract:

Proteins are indispensable players in virtually all biological events. The functions of proteins are determined by their three-dimensional (3D) structure and coordinated through intricate networks of protein-protein interactions (PPIs). Hence, a deep comprehension of such networks is crucial for understanding cellular biology. Computational approaches have become critical tools for analysing PPI networks. In silico methods take advantage of existing PPI knowledge both to predict new interactions and to predict the function of proteins. Several methods have already been developed for predicting PPIs. However, recent findings demonstrate that such methods could benefit from knowledge of non-interacting protein pairs (NIPs). For predicting the function of proteins, the Guilt-by-Association (GBA) principle can be exploited to extend the functional annotation of proteins over PPI networks. In this thesis, a new algorithm for PPI prediction and a protocol to complete cell signalling networks are presented. iLoops is a method that uses NIP data and structural information of proteins to predict the binding fate of protein pairs. A novel protocol for completing signalling networks, a task related to predicting the function of a protein, has also been developed. The protocol is based on the application of the GBA principle in PPI networks.

34

Zhao, Zhiyu. "Robust and Efficient Algorithms for Protein 3-D Structure Alignment and Genome Sequence Comparison." ScholarWorks@UNO, 2008. http://scholarworks.uno.edu/td/851.

Abstract:

Sequence analysis and structure analysis are two of the fundamental areas of bioinformatics research. This dissertation discusses, specifically, protein structure related problems including protein structure alignment and query, and genome sequence related problems including haplotype reconstruction and genome rearrangement. It first presents an algorithm for pairwise protein structure alignment that is tested with structures from the Protein Data Bank (PDB). In many cases it outperforms two other well-known algorithms, DaliLite and CE. The preliminary algorithm is a graph-theory based approach, which uses the concept of "stars" to reduce the complexity of clique-finding algorithms. The algorithm is then improved by introducing "double-center stars" in the graph and applying a self-learning strategy. The updated algorithm is tested with a much larger set of protein structures and shown to be an improvement in accuracy, especially in cases of weak similarity. A protein structure query algorithm is designed to search for similar structures in the PDB, using the improved alignment algorithm. It is compared with SSM and shows better performance with lower maximum and average Q-score for missing proteins. An interesting problem dealing with the calculation of the diameter of a 3-D sequence of points arose and its connection to the sublinear time computation is discussed. The diameter calculation of a 3-D sequence is approximated by a series of sublinear time deterministic, zero-error and bounded-error randomized algorithms and we have obtained a series of separations about the power of sublinear time computations. This dissertation also discusses two genome sequence related problems. A probabilistic model is proposed for reconstructing haplotypes from SNP matrices with incomplete and inconsistent errors. The experiments with simulated data show both high accuracy and speed, conforming to the theoretically provable efficiency and accuracy of the algorithm. 
Finally, a genome rearrangement problem is studied. The concept of non-breaking similarity is introduced. Approximating the exemplar non-breaking similarity to factor $n^{1-\epsilon}$ is proven to be NP-hard. Interestingly, for several practical cases, several polynomial time algorithms are presented.

35

Kim, Wooyoung. "Innovative Algorithms and Evaluation Methods for Biological Motif Finding." Digital Archive @ GSU, 2012. http://digitalarchive.gsu.edu/cs_diss/63.

Abstract:

Biological motifs are defined as overly recurring sub-patterns in biological systems. Sequence motifs and network motifs are examples of biological motifs. Due to the wide range of applications, many algorithms and computational tools have been developed for efficient search for biological motifs. Therefore, there are more computationally derived motifs than experimentally validated motifs, and how to validate the biological significance of the 'candidate motifs' becomes an important question. Some sequence motifs are verified by their structural similarities or their functional roles in DNA or protein sequences, and stored in databases. However, the biological role of network motifs has not yet been validated, and currently no databases exist for this purpose. In this thesis, we focus not only on the computational efficiency but also on the biological meanings of the motifs. We provide an efficient way to incorporate biological information with clustering analysis methods: for example, a sparse nonnegative matrix factorization (SNMF) method is used with Chou-Fasman parameters for protein motif finding. Biological network motifs are searched by various clustering algorithms with Gene Ontology (GO) information. Experimental results show that the algorithms perform better than existing algorithms by producing a larger number of high-quality biological motifs. In addition, we apply biological network motifs to the discovery of essential proteins. Essential proteins are defined as a minimum set of proteins which are vital for development to a fertile adult and for cellular life in an organism. We design a new centrality algorithm with biological network motifs, named MCGO, and score proteins in a protein-protein interaction (PPI) network to find essential proteins. MCGO is also combined with other centrality measures to predict essential proteins using machine learning techniques. 
This thesis makes three contributions to the study of biological motifs: 1) clustering analysis is used efficiently and biological information is easily integrated with the analysis; 2) we focus more on the biological meanings of motifs by adding biological knowledge to the algorithms and by suggesting biologically related evaluation methods; 3) biological network motifs are successfully applied to the practical problem of predicting essential proteins.

36

Reyaz-Ahmed, Anjum B. "Protein Secondary Structure Prediction Using Support Vector Machines, Neural Networks and Genetic Algorithms." Digital Archive @ GSU, 2007. http://digitalarchive.gsu.edu/cs_theses/43.

Abstract:

Bioinformatics techniques for protein secondary structure prediction mostly depend on the information available in the amino acid sequence. Support vector machines (SVM) have shown strong generalization ability in a number of application areas, including protein structure prediction. In this study, a new sliding window scheme with multiple windows is introduced to form the protein data for training and testing the SVM. An orthogonal encoding scheme coupled with the BLOSUM62 matrix is used to make the prediction. First, the predictions of binary classifiers using multiple windows are compared with those of the single window scheme; the results show that a single window is not good in all cases. Two new classifiers are introduced for effective tertiary classification. These new classifiers use neural networks and genetic algorithms to optimize the accuracy of the tertiary classifier. The accuracy levels of the new architectures are determined and compared with other studies. The tertiary architecture is better than most available techniques.
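
A sliding window with orthogonal (one-hot) encoding can be sketched as below. This shows only the conventional single-window form; the multiple-window scheme and the BLOSUM62 coupling introduced in the thesis are not reproduced. The name `encode_windows`, the default window length, and the extra slot used for padding and unknown residues are assumptions for this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def encode_windows(sequence, window=5):
    """One feature vector per residue, built from its neighbourhood.

    Each window position becomes a 21-element one-hot block: 20 slots
    for the standard residues plus one shared slot for padding and any
    non-standard letter. The window length is assumed to be odd.
    """
    half = window // 2
    padded = "-" * half + sequence + "-" * half
    vectors = []
    for center in range(len(sequence)):
        vector = []
        for residue in padded[center:center + window]:
            one_hot = [0] * (len(AMINO_ACIDS) + 1)
            index = AMINO_ACIDS.find(residue)
            one_hot[index if index >= 0 else len(AMINO_ACIDS)] = 1
            vector.extend(one_hot)
        vectors.append(vector)
    return vectors
```

Each resulting vector would be paired with the secondary structure label of the central residue to form one training example for a classifier such as an SVM.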

37

Yerardi, Jason T. "The Implementation and Evaluation of Bioinformatics Algorithms for the Classification of Arabinogalactan-Proteins in Arabidopsis thaliana." Ohio University / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1301069861.

38

North, Benjamin H. "A Comparison of Clustering Algorithms for the Study of Antibody Loop Structures." Master's thesis, Temple University Libraries, 2017. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/464867.

Abstract:

Computer and Information Science
M.S.
Antibodies are the fundamental agents of the immune system. The CDRs, or Complementarity Determining Regions, act as the functional surfaces in binding antibodies to their targets. These CDR structures, which are peptide loops, are diverse in both amino acid sequence and structure. In 2011, we surveyed a database of CDR loop structures using the affinity propagation clustering algorithm of Frey and Dueck. With the growth in the number of structures deposited in the Protein Data Bank, the number of antibody CDRs has approximately tripled. In addition, although the affinity clustering in 2011 was successful in many ways, the methods used left too much noise in the data, and the affinity clustering algorithm tended to clump diverse structures together. This work revisits the antibody CDR clustering problem and uses five different clustering algorithms to categorize the data. Three of the clustering algorithms use DBSCAN but differ in the data comparison functions used. One uses the sum of the dihedral distances, another uses the supremum of the dihedral distances, and the third uses the Jarvis-Patrick shared nearest neighbor similarity, where the nearest neighbor lists are compiled using the sum of the dihedral distances. The other two clustering methods use the k-medoids algorithm, one of which has been modified to include the use of pairwise constraints. Overall, DBSCAN using the sum of the dihedral distances and the supremum of the dihedral distances produced the best clustering results as measured by the average silhouette coefficient, while the constrained k-medoids clustering algorithm had the worst clustering results overall.
Temple University--Theses
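
The two dihedral comparison functions named in the abstract above (sum and supremum of dihedral distances) can be sketched as follows. The per-angle distance 2(1 - cos(a - b)) is an assumption chosen because it handles the wrap-around at ±180°; the abstract does not state the exact formula, and all function names here are hypothetical.

```python
import math


def angle_distance(a_deg, b_deg):
    """Assumed per-angle distance: 2 * (1 - cos(a - b)).

    It is 0 for identical angles, 4 for opposite ones, and is unaffected
    by the wrap-around between -180 and +180 degrees.
    """
    return 2.0 * (1.0 - math.cos(math.radians(a_deg - b_deg)))


def _check(loop_a, loop_b):
    if len(loop_a) != len(loop_b):
        raise ValueError("loops must have the same number of dihedral angles")


def sum_distance(loop_a, loop_b):
    """Sum of the per-angle distances over all dihedrals of two loops."""
    _check(loop_a, loop_b)
    return sum(angle_distance(a, b) for a, b in zip(loop_a, loop_b))


def sup_distance(loop_a, loop_b):
    """Supremum (largest) per-angle distance between two loops."""
    _check(loop_a, loop_b)
    return max(angle_distance(a, b) for a, b in zip(loop_a, loop_b))


def distance_matrix(loops, metric):
    """Pairwise distance matrix for a list of loops under the given metric."""
    return [[metric(p, q) for q in loops] for p in loops]


# Each loop is a flat list of backbone dihedral angles in degrees (toy values).
loops = [[-60.0, -45.0, 179.0], [-62.0, -41.0, -179.0], [60.0, 45.0, 0.0]]
matrix = distance_matrix(loops, sum_distance)
```

A matrix like this could be handed to a density-based clusterer that accepts precomputed distances, for instance scikit-learn's `DBSCAN(metric="precomputed")`; the choice of library is an assumption, not something stated in the abstract.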

39

Parkinson, Scott. "Rational Design Inspired Application of Natural Language Processing Algorithms to Red Shift mNeptune684." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/41928.

Abstract:

Recent innovations and progress in machine learning algorithms from the Natural Language Processing (NLP) community have motivated efforts to apply these models and concepts to proteins. The representations generated by trained NLP models have been shown to capture important semantic and structural understanding of proteins encompassing biochemical and biophysical properties, among other key concepts. In turn, these representations have demonstrated application to protein engineering tasks including mutation analysis and design of novel proteins. Here we use this NLP paradigm in a protein engineering effort to further red shift the emission wavelength of the red fluorescent protein mNeptune684 using only a small number of functional training variants ('Low-N' scenario). The collaborative nature of this thesis with the Department of Chemistry and Biomolecular Sciences explores using these tools and methods in the rational design process.

40

Yaveroglu, Omer Nebil. "Identification Of Functionally Orthologous Protein Groups In Different Species Based On Protein Network Alignment." Master's thesis, METU, 2010. http://etd.lib.metu.edu.tr/upload/12612395/index.pdf.

Abstract:

In this study, an algorithm named ClustOrth is proposed for determining and matching functionally orthologous protein clusters in different species. The algorithm requires the protein interaction networks of the organisms to be compared, and the GO terms of the proteins in these interaction networks, as prior information. After determining the functionally related protein groups using the Repeated Random Walks algorithm, the method maps the identified protein groups according to a defined similarity metric. To evaluate the similarities of protein groups, graph-theoretical information is used together with context information about the proteins. The clusters are aligned using GO-term-based protein similarity measures defined in previous studies. These alignments are used to evaluate cluster similarities by defining a cluster similarity metric from the protein similarities. The top-scoring cluster alignments are considered orthologous. Several data sources providing orthology information have shown that the defined cluster similarity metric can be used to make inferences about the orthological relevance of protein groups. Comparison with a protein orthology prediction algorithm named ISORANK also showed that the ClustOrth algorithm is successful in determining orthologies between proteins. However, the cluster similarity metric is too strict, and many cluster matches are unable to produce high scores under this metric. For this reason, the number of predictions made is low. This problem can be overcome by introducing different sources of information about the proteins in the clusters when evaluating the clusters. The ClustOrth algorithm also outperformed the NetworkBLAST algorithm, which aims to find orthologous protein clusters by using protein sequence information directly to determine orthologies. 
It can be concluded that this study is one of the leading studies addressing the protein cluster matching problem for identifying orthologous functional modules of protein interaction networks computationally.

41

Shah, Anuj R. "Improving protein remote homology detection using supervised and semi-supervised support vector machines." Online access for everyone, 2008. http://www.dissertations.wsu.edu/Dissertations/Spring2008/A_Shah_042408.pdf.

42

Klaib, Ahmad. "Exact string matching algorithms for searching DNA and protein sequences and searching chemical databases." Thesis, University of Huddersfield, 2014. http://eprints.hud.ac.uk/id/eprint/24266/.

Abstract:

The enormous quantities of biological and chemical files and databases are likely to grow year on year, giving rise to the need for string matching algorithms capable of minimizing the search response time. With this need in mind, this thesis aims to develop string matching algorithms to search biological sequences and chemical structures by studying exact string matching algorithms in detail. As a result, this research developed a new classification of string matching algorithms, containing eight categories according to the pre-processing function of the algorithms, and proposed five new string matching algorithms: BRBMH, BRQS, the Odd and Even algorithm (OE), the Random String Matching algorithm (RSMA) and the Skip Shift New algorithm (SSN). The main purpose of the proposed algorithms is to reduce the search response time and the total number of comparisons. They are tested by comparing them with four well-known standard algorithms: Boyer Moore Horspool (BMH), Quick Search (QS), TVSBS and BRFS. This research applied all of the algorithms to sample data files using three types of tests. The number-of-comparisons tests showed a substantial difference between the number of comparisons our algorithms use and the number used by non-hybrid algorithms such as QS and BMH. In addition, the tests showed a considerable difference between our algorithms and other hybrid algorithms such as TVSBS and BRFS. For instance, the average elapsed search time tests showed that our algorithms achieved a better average elapsed search time than the BRFS, TVSBS, QS and BMH algorithms, while the average number of attempts tests showed a better number of attempts than the BMH, QS, TVSBS and BRFS algorithms. A further contribution of this research is the use of the fastest proposed algorithm, the SSN algorithm, to develop a chemical structure searching toolkit that searches chemical structures in our local database. 
The new algorithms were parallelized using the OpenMP and MPI parallel models and tested at the University of Science Malaysia (USM) on a Stealth Cluster with different numbers of threads and processors to improve the speed of searching for a pattern in the given text, which, we believe, is another contribution.
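
For orientation, the Boyer Moore Horspool (BMH) baseline named in the abstract above can be sketched in a few lines. This is a generic textbook version of BMH, not one of the thesis's proposed algorithms (BRBMH, BRQS, OE, RSMA, SSN), whose details are not given here; the function name and the DNA example string are illustrative assumptions.

```python
def horspool_search(text, pattern):
    """Return the start index of every occurrence of pattern in text.

    Textbook Boyer-Moore-Horspool: after each attempt, the window is
    shifted according to the text character aligned with the last
    pattern position.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Pre-processing: shift distance for each character of pattern[:-1];
    # later occurrences overwrite earlier ones, giving the smallest shift.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Characters absent from pattern[:-1] allow a full shift of m.
        pos += shift.get(text[pos + m - 1], m)
    return matches


print(horspool_search("GATTACAGATTACA", "TTAC"))  # → [2, 9]
```

The slice comparison hides the character-by-character work; a version meant to reproduce the comparison counts discussed in the abstract would need an explicit inner loop with a counter.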

43

Olandersson, Sandra. "Evaluation of Machine Learning Algorithms for Classification of Short-Chain Dehydrogenase/Reductase Protein Sequences." Thesis, Blekinge Tekniska Högskola, Institutionen för programvaruteknik och datavetenskap, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-3828.

Abstract:

The classification of protein sequences is a subfield of bioinformatics that attracts substantial interest today. Machine learning algorithms are believed to be able to improve the performance of the classification phase. This thesis considers the application of different machine learning algorithms to the classification of a data set of short-chain dehydrogenase/reductase (SDR) proteins. The classification concerns both the division of the proteins into the two main families, Classic and Extended, and into their different subfamilies. The results of the different algorithms are compared to select the most appropriate algorithm for this particular classification problem.

44

Denarie, Laurent. "Robotics-inspired methods to enhance protein design." Phd thesis, Toulouse, INPT, 2017. http://oatao.univ-toulouse.fr/18677/1/Denarie.pdf.

Abstract:

The ability to design proteins with specific properties would yield great progress in pharmacology and biotechnology. Methods to design proteins have been developed over the past few decades, and some notable achievements have been made, including de novo protein design. Yet current approaches suffer from some serious limitations. By not taking the protein's backbone motions into account, they fail to capture some of the properties of the candidate design and cannot guarantee that the solution will in fact be stable in the goal conformation. In addition, although multi-state design methods have been proposed, they do not guarantee that a feasible trajectory between those states exists, which means that design problems involving state transitions are out of reach of current methods. This thesis investigates how robotics-inspired algorithms can be used to efficiently explore the conformational landscape of a protein, with the aim of enhancing protein design methods by introducing additional backbone flexibility. This work also provides first milestones towards protein motion design.

45

Wistrand, Markus. "Hidden Markov models for remote protein homology detection /." Stockholm, 2005. http://diss.kib.ki.se/2006/91-7140-598-4/.

46

Hom, Geoffrey Deshaies Raymond Joseph. "Advances in computational protein design : development of more efficient search algorithms and their application to the full-sequence design of larger proteins /." Diss., Pasadena, Calif. : California Institute of Technology, 2005. http://resolver.caltech.edu/CaltechETD:etd-05302005-223153.

47

Bakare, Olalekan Olanrewaju. "Identification and Molecular validation of Biomarkers for the accurate and sensitive diagnosis of bacterial and viral Pneumonia." University of Western Cape, 2019. http://hdl.handle.net/11394/7421.

Abstract:

Philosophiae Doctor - PhD
Pneumonia remains the major cause of death in children and the elderly and several efforts have been intensified to reduce the rate of pneumonia infection. The major breakthrough has been the discovery of certain biomarkers for the diagnosis of pneumonia through immunogenic techniques.

48

Kolar, Michal. "Statistical Physics and Message Passing Algorithms. Two Case Studies: MAX-K-SAT Problem and Protein Flexibility." Doctoral thesis, SISSA, 2006. http://hdl.handle.net/20.500.11767/4659.

Abstract:

In recent decades the theory of spin glasses has been developed within the framework of statistical physics. The results obtained proved to be novel not only from the physical point of view; they have also brought new mathematical techniques and algorithmic approaches. Indeed, the problem of finding the ground state of a spin glass is (in general) NP-complete. The methods that were found brought new ideas to the field of Combinatorial Optimization, and conversely, similar methods from Combinatorial Optimization were applied to physical systems. As happened with Monte Carlo sampling and Simulated Annealing, the novel Cavity Method also led to algorithms that are open to wide use in various fields of research. In its most symmetric version, the Cavity Method turns out to be equivalent to the Bethe Approximation, and the derived algorithm is equivalent to Belief Propagation, an inference method widely used, for example, in the field of Pattern Recognition. In a less symmetric situation, when one has to account correctly for the clustering of the configuration space, the Cavity Method led to a novel message-passing algorithm, Survey Propagation. The class of message-passing algorithms, to which both Belief Propagation and Survey Propagation belong, has found application as inference algorithms in many engineering fields. Among others, let us mention Low-Density Parity-Check Codes, which are widely used as Error-Correcting Codes for communication over noisy channels. In the first part of this work we compare the efficiency of the Survey Propagation algorithm and of standard heuristic algorithms on the random MAX-K-SAT problem. 
The results showed that the algorithms perform similarly in the regions where clustering of the configuration space does not appear, but that Survey Propagation finds much better solutions to the optimization problem in the critical region, where one has to consider the existence of many ergodic components explicitly. The second part of the thesis targets the problem of protein structure and flexibility. In many proteins, the mobility of certain regions and the rigidity of other regions of their structure are crucial for their function or their interaction with other cellular elements. Our simple model tries to identify the flexible regions from knowledge of the native 3D structure of the protein. The problem is mapped to a spin glass model, which is successfully solved by the Belief Propagation algorithm.

49

Lan, Liang. "Data Mining Algorithms for Classification of Complex Biomedical Data." Diss., Temple University Libraries, 2012. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/214773.

Abstract:

Computer and Information Science
Ph.D.
In my dissertation, I will present research that contributes to solving the following three open problems in biomedical informatics: (1) multi-task approaches for microarray classification; (2) multi-label classification of gene and protein prediction from multi-source biological data; (3) spatial scan for movement data. In microarray classification, samples belong to several predefined categories (e.g., cancer vs. control tissues) and the goal is to build a predictor that classifies a new tissue sample based on its microarray measurements. When faced with small-sample, high-dimensional microarray data, most machine learning algorithms produce an overly complicated model that performs well on training data but poorly on new data. To reduce the risk of over-fitting, feature selection becomes an essential technique in microarray classification. However, standard feature selection algorithms are bound to underperform when the microarray data set is particularly small. The best remedy is to borrow strength from external microarray datasets. In this dissertation, I will present two new multi-task feature filter methods that can improve classification performance by utilizing external microarray data. The first method aggregates the feature selection results from multiple microarray classification tasks. The resulting multi-task feature selection can be shown to improve the quality of the selected features and lead to higher classification accuracy. The second method jointly selects a small gene set with maximal discriminative power and minimal redundancy across multiple classification tasks by solving an objective function with integer constraints. In the protein function prediction problem, gene functions are predicted from a predefined set of possible functions (e.g., the functions defined in the Gene Ontology). 
Gene function prediction is a complex classification problem characterized by the following aspects: (1) a single gene may have multiple functions; (2) the functions are organized in a hierarchy; (3) the training data for each function are unbalanced (far fewer positive than negative examples); (4) class labels are missing; (5) multiple biological data sources are available, such as microarray data, genome sequences and protein-protein interactions. As participants in the 2011 Critical Assessment of Function Annotation (CAFA) challenge, our team achieved the highest AUC accuracy among 45 groups. In the competition, we gained by focusing on the fifth aspect of the problem. Thus, in this dissertation, I will discuss several schemes for integrating the prediction scores from multiple data sources and show their results. Interestingly, the experimental results show that a simple averaging integration method is competitive with other state-of-the-art data integration methods. The original spatial scan algorithm is used for the detection of spatial overdensities: the discovery of spatial subregions with significantly higher scores according to some density measure. This algorithm is widely used to identify clusters of disease cases (e.g., identifying environmental risk factors for child leukemia). However, the original spatial scan algorithm only works on static spatial data. In this dissertation, I will propose one possible solution for spatial scan on movement data.
Temple University--Theses
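
The "simple averaging integration" of prediction scores mentioned in the abstract above can be illustrated with a short sketch. The data layout (one dictionary per data source, keyed by a gene and function pair), the handling of missing scores by averaging only over the sources that provide one, and every name and value below are assumptions made for illustration; the abstract does not specify them.

```python
def average_scores(source_scores):
    """Average prediction scores across data sources.

    source_scores: list of dicts mapping (gene, function) -> score in [0, 1].
    A pair missing from a source is averaged over the sources that do
    provide a score for it (an assumed convention).
    """
    totals, counts = {}, {}
    for scores in source_scores:
        for key, value in scores.items():
            totals[key] = totals.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical scores from two sources, e.g. expression and interaction data.
expression = {("geneA", "F1"): 0.8, ("geneB", "F1"): 0.2}
interactions = {("geneA", "F1"): 0.4}
combined = average_scores([expression, interactions])
```

Ranking the combined scores per function would then give one integrated prediction list; weighting the sources differently would be a natural variation on the same idea.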

50

Mathuriya, Amrita. "Prediction of secondary structures for large RNA molecules." Thesis, Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/28195.

Abstract:

Thesis (M. S.)--Computing, Georgia Institute of Technology, 2009.
Committee Chair: Bader, David; Committee Co-Chair: Heitsch, Christine; Committee Member: Harvey, Stephen; Committee Member: Vuduc, Richard.
