Dissertations / Theses: 'Sequence alignment'

1

Starrett, Dean. "Optimal Alignment of Multiple Sequence Alignments." Diss., The University of Arizona, 2008. http://hdl.handle.net/10150/194840.

Abstract:

An essential tool in biology is the alignment of multiple sequences. Biologists use multiple sequence alignments for tasks such as predicting protein structure and function, reconstructing phylogenetic trees, and finding motifs. Constructing high-quality multiple alignments is computationally hard, both in theory and in practice, and is typically done using heuristic methods. The majority of state-of-the-art multiple alignment programs employ a form and polish strategy, where in the construction phase, an initial multiple alignment is formed by progressively merging smaller alignments, starting with single sequences. Then in a local-search phase, the resulting alignment is polished by repeatedly splitting it into smaller alignments and re-merging. This merging of alignments, the basic computational problem in the construction and local-search phases of the best multiple alignment heuristics, is called the Aligning Alignments Problem. Under the sum-of-pairs objective for scoring multiple alignments, this problem may seem to be a simple extension of two-sequence alignment. It is proven here, however, that with affine gap costs (which are recognized as necessary to get biologically-informative alignments) the problem is NP-complete when gaps are counted exactly. Interestingly, this form of multiple alignment is polynomial-time solvable when we relax the exact count, showing that exact gap counts themselves are inherently hard in multiple sequence alignment. Unlike general multiple alignment however, we show that Aligning Alignments with affine gap costs and exact counts is tractable in practice, by demonstrating an effective algorithm and a fast implementation. Our software AlignAlign is both time- and space-efficient on biological data. Computational experiments on biological data show instances derived from standard benchmark suites can be optimally aligned with surprising efficiency, and experiments on simulated data show the time and space both scale well.
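
The sum-of-pairs objective with exactly counted affine gaps, as described in this abstract, can be illustrated with a small sketch. This is not the AlignAlign algorithm; it only scores a given multiple alignment. The function names and the cost values (`mismatch`, `gap_open`, `gap_extend`) are assumptions chosen for illustration. Each pair of rows is projected to its induced pairwise alignment: columns where both rows have a gap are dropped, and every remaining maximal run of gaps pays one opening cost plus one extension cost per gap character.

```python
from itertools import combinations


def pairwise_affine_cost(a, b, mismatch=1, gap_open=3, gap_extend=1):
    """Cost of the pairwise alignment induced by two rows of a multiple alignment."""
    cost = 0
    prev_gap_side = None  # row that carried a gap in the previous kept column
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # the column vanishes in the induced pairwise alignment
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            if side != prev_gap_side:
                cost += gap_open  # a new gap starts here
            cost += gap_extend
            prev_gap_side = side
        else:
            cost += 0 if x == y else mismatch
            prev_gap_side = None
    return cost


def sum_of_pairs_cost(rows, **costs):
    """Sum the induced pairwise costs over all pairs of rows."""
    return sum(pairwise_affine_cost(a, b, **costs) for a, b in combinations(rows, 2))


alignment = ["AC-GT", "A--GT", "ACCGT"]
print(sum_of_pairs_cost(alignment))
```

Note that the gap in the second row has length one relative to the first row but length two relative to the third; this dependence on the pair is what makes the exact count more delicate than in two-sequence alignment.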

2

Chia, Nicholas Lee-Ping. "Sequence alignment." Columbus, Ohio : Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1154616122.

3

Al, Ghamdi Manal. "Video sequence alignment." Thesis, University of Sheffield, 2015. http://etheses.whiterose.ac.uk/9056/.

Abstract:

The task of aligning multiple audio-visual sequences with similar contents requires careful synchronisation in both the spatial and temporal domains. It is a challenging task due to a broad range of content variations, background clutter, occlusions, and other factors. This thesis is concerned with aligning video contents by characterising the spatial and temporal information embedded in the high-dimensional space. To that end a three-stage framework is developed, involving space-time representation of video clips with local linear coding, followed by their alignment in the manifold embedded space. The first two stages present video representation techniques based on local feature extraction and linear coding methods. Firstly, the scale invariant feature transform (SIFT) is extended to extract interest points not only from the spatial plane but also from the planes along the space-time axis. Locality constrained coding is then incorporated to project each descriptor into a local coordinate system produced by a pooling technique. Human action classification benchmarks are adopted to evaluate these two stages, comparing their performance against existing techniques. The results show that the space-time extension of SIFT with a linear coding scheme outperforms most of the state-of-the-art approaches on the action classification task owing to its ability to represent complex events in video sequences. The final stage presents a manifold learning algorithm with spatio-temporal constraints to embed a video clip in a lower dimensional space while preserving the intrinsic geometry of the data. The similarities observed between frame sequences are captured by defining two types of correlation graphs: an intra-correlation graph within a single video sequence and an inter-correlation graph between two sequences. Video retrieval and ranking tasks are designed to evaluate the manifold learning stage.
The experimental outcome shows that the approach outperforms conventional techniques in defining similar video contents and capturing the spatio-temporal correlations between them.

4

Sammeth, Michael. "Integrated multiple sequence alignment." [S.l.] : [s.n.], 2005. http://deposit.ddb.de/cgi-bin/dokserv?idn=98148767X.

5

Powell, David Richard 1973. "Algorithms for sequence alignment." Monash University, School of Computer Science and Software Engineering, 2001. http://arrow.monash.edu.au/hdl/1959.1/8051.

6

Birney, Ewan. "Sequence alignment in bioinformatics." Thesis, University of Cambridge, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.621653.

7

Fleissner, Roland. "Sequence alignment and phylogenetic inference." Berlin : Logos Verlag, 2004. http://diss.ub.uni-duesseldorf.de/ebib/diss/file?dissid=769.

8

Fleissner, Roland. "Sequence alignment and phylogenetic inference." [S.l. : s.n.], 2003. http://deposit.ddb.de/cgi-bin/dokserv?idn=971844704.

9

Auer, Jens. "Metaheuristic Multiple Sequence Alignment Optimisation." Thesis, University of Skövde, School of Humanities and Informatics, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-899.

Abstract:

The ability to tackle NP-hard problems has been greatly extended by the introduction of Metaheuristics (see Blum & Roli (2003) for a summary of most Metaheuristics), general problem-independent optimisation algorithms that extend the hill-climbing local search approach to escape local minima. One of these algorithms is Iterated Local Search (ILS) (Lourenco et al., 2002; Stützle, 1999a, p. 25ff), a recent, easy-to-implement but powerful algorithm with results comparable or superior to other state-of-the-art methods for many combinatorial optimisation problems, among them the Traveling Salesman Problem (TSP) and the Quadratic Assignment Problem (QAP). ILS iteratively samples local minima by modifying the current local minimum and restarting a local search procedure on this modified solution. This thesis will show how ILS can be implemented for multiple sequence alignment (MSA). After that, ILS will be evaluated and compared to other MSA algorithms using BAliBASE (Thomson et al., 1999), a set of manually refined alignments used in most recent publications of algorithms and in at least two MSA algorithm surveys. The runtime behaviour will be evaluated using runtime distributions.

The quality of alignments produced by ILS is at least as good as that of the best algorithms available and significantly superior to previously published Metaheuristics for MSA, Tabu Search and the Genetic Algorithm (SAGA). On average, ILS performed best in five out of eight test cases, second in one test set and third in the remaining two. A drawback of all iterative methods for MSA is the long runtime needed to produce good alignments. ILS needs considerably less runtime than Tabu Search and SAGA, but cannot compete with progressive or consistency-based methods, e.g. ClustalW or T-COFFEE.
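
The ILS loop described in this abstract (perturb the current local minimum, restart local search, keep the result if it is better) can be sketched generically. The sketch below is not the thesis implementation for MSA: the toy integer objective, the neighbourhood, the perturbation and the "accept only if better" criterion are all assumptions made to keep the example small.

```python
import random


def local_search(x, cost, neighbours):
    """Hill-climb: move to the best neighbour until no neighbour improves."""
    while True:
        best = min(neighbours(x), key=cost, default=x)
        if cost(best) >= cost(x):
            return x
        x = best


def iterated_local_search(start, cost, neighbours, perturb, iterations=100, seed=0):
    """Sample local minima by perturbing the current one and re-running local search."""
    rng = random.Random(seed)
    current = local_search(start, cost, neighbours)
    for _ in range(iterations):
        candidate = local_search(perturb(current, rng), cost, neighbours)
        if cost(candidate) < cost(current):  # assumed acceptance criterion: "better"
            current = candidate
    return current


# Toy problem (hypothetical): every multiple of 5 is a local minimum, 20 is the global one.
def toy_cost(x):
    return abs(x - 20) + (0 if x % 5 == 0 else 8)


def toy_neighbours(x):
    return (x - 1, x + 1)


def toy_perturb(x, rng):
    return x + rng.randint(-6, 6)


result = iterated_local_search(0, toy_cost, toy_neighbours, toy_perturb)
print(result, toy_cost(result))
```

Plain hill climbing started at 0 stays at 0, because both neighbours are worse; the perturbation step is what lets the search reach other local minima.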

10

Arvestad, Lars. "Algorithms for biological sequence alignment." Doctoral thesis, KTH, Numerisk analys och datalogi, NADA, 1999. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-2905.

11

Ho, Ngai-lam, and 何毅林. "Algorithms on constrained sequence alignment." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B30201949.

12

Carroll, Hyrum D. "Biologically Relevant Multiple Sequence Alignment." Diss., 2008. http://contentdm.lib.byu.edu/ETD/image/etd2623.pdf.

13

Grossmann, Steffen. "Statistics of optimal sequence alignments." [S.l. : s.n.], 2003. http://deposit.ddb.de/cgi-bin/dokserv?idn=968907466.

14

Alimehr, Leila. "The Performance of Sequence Alignment Algorithms." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-200289.

Abstract:

This thesis deals with sequence alignment algorithms. A sequence alignment is a mutual arrangement of two or more sequences that makes it possible to study their similarity and dissimilarity. Four decades after the seminal work by Needleman and Wunsch in 1970, these methods still need further exploration. We start with a review of sequence alignment and its generalization to multiple alignments, although the focus of this thesis is on the evaluation of the new alignment algorithms. The research presented herein examines different algorithms that are formulated in terms of dynamic programming. In the study of sequence alignment algorithms, two powerful techniques have been invented. According to the simulations, the new algorithms are shown to be extremely efficient for comparing DNA sequences. All the sequence alignment algorithms are compared in terms of distance. We use the programming language R for the implementation and simulation of the algorithms discussed in this thesis.
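
As background for the dynamic programming formulation mentioned in this abstract, here is a minimal sketch of the classic Needleman-Wunsch global alignment score. It is written in Python rather than the R used in the thesis, it does not reproduce the thesis's new algorithms, and the scoring values (`match`, `mismatch`, `gap`) are assumptions with a simple linear gap penalty.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[n][m]


print(needleman_wunsch("ACGT", "AGT"))
```

The table has (n + 1) × (m + 1) cells and each cell is filled in constant time, so both time and space are O(nm) in this straightforward form.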

15

DeBlasio, Daniel Frank. "Parameter Advising for Multiple Sequence Alignment." Diss., The University of Arizona, 2016. http://hdl.handle.net/10150/612932.

Abstract:

The problem of aligning multiple protein sequences is essential to many biological analyses, but most standard formulations of the problem are NP-complete. Due to both the difficulty of the problem and its practical importance, there are many heuristic multiple sequence aligners that a researcher has at their disposal. A basic issue that frequently arises is that each of these alignment tools has a multitude of parameters that must be set, and which greatly affect the quality of the alignment produced. Most users rely on the default parameter setting that comes with the aligner, which is optimal on average, but can produce a low-quality alignment for the given inputs. This dissertation develops an approach called parameter advising to find a parameter setting that produces a high-quality alignment for each given input. A parameter advisor aligns the input sequences for each choice in a collection of parameter settings, and then selects the best alignment from the resulting alignments produced. A parameter advisor has two major components: (i) an advisor set of parameter choices that are given to the aligner, and (ii) an accuracy estimator that is used to rank alignments produced by the aligner. Alignment accuracy is measured with respect to a known reference alignment; in practice a reference alignment is not available, and we can only estimate accuracy. We develop a new accuracy estimator that we call Facet (short for "feature-based accuracy estimator") that computes an accuracy estimate as a linear combination of efficiently-computable feature functions, whose coefficients are learned by solving a large-scale linear programming problem. We also develop an efficient approximation algorithm for finding an advisor set of a given cardinality for a fixed estimator; the cardinality should ideally be small, as the aligner is invoked for each parameter choice in the set.
Using Facet for parameter advising boosts advising accuracy by almost 20% beyond using a single default parameter choice for the hardest-to-align benchmarks. This dissertation further applies parameter advising in two ways: (i) to ensemble alignment, which uses the advising process on a collection of aligners to choose both the aligner and its parameter settings, and (ii) to adaptive local realignment, which can align different regions of the input sequences with distinct parameter choices to conform to mutation rates as they vary across the lengths of the sequences.
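
The advising process described in this abstract (run the aligner once per parameter choice, then keep the alignment the estimator ranks highest) can be sketched as follows. Everything concrete here is a hypothetical stand-in: `toy_aligner`, the two feature functions and the weights are invented for illustration and are not Facet's features or learned coefficients.

```python
def linear_estimator(features, weights):
    """Build an estimator that is a linear combination of feature functions."""
    def estimate(alignment):
        return sum(w * f(alignment) for f, w in zip(features, weights))
    return estimate


def advise(sequences, aligner, advisor_set, estimate):
    """Align once per parameter choice and return the (params, alignment) ranked best."""
    candidates = [(params, aligner(sequences, params)) for params in advisor_set]
    return max(candidates, key=lambda pair: estimate(pair[1]))


# Hypothetical aligner: pads sequences with gaps on the left or on the right.
def toy_aligner(sequences, params):
    width = max(len(s) for s in sequences)
    pad = str.ljust if params["justify"] == "left" else str.rjust
    return [pad(s, width, "-") for s in sequences]


# Hypothetical features: fraction of fully conserved columns, fraction of gap characters.
def column_identity(alignment):
    columns = list(zip(*alignment))
    same = sum(1 for col in columns if len(set(col)) == 1 and "-" not in col)
    return same / len(columns)


def gap_fraction(alignment):
    return sum(row.count("-") for row in alignment) / sum(len(row) for row in alignment)


estimator = linear_estimator([column_identity, gap_fraction], [1.0, -0.5])
advisor_set = [{"justify": "left"}, {"justify": "right"}]
best_params, best_alignment = advise(["ACGT", "CGT"], toy_aligner, advisor_set, estimator)
print(best_params, best_alignment)
```

Because the aligner is called once per entry of `advisor_set`, the running time grows linearly with the set's cardinality, which is why the abstract prefers small advisor sets.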

16

Guasco, Luciano M. "Multiple sequence alignment correction using constraints." Master's thesis, Faculdade de Ciências e Tecnologia, 2010. http://hdl.handle.net/10362/5143.

Abstract:

Work presented within the scope of the European Master in Computational Logics, as a partial requirement for obtaining the degree of Master in Computational Logics.
One of the most important fields in bioinformatics has been the study of protein sequence alignments. The study of homologous proteins, related by evolution, shows the conservation of many amino acids because of their functional and structural importance. One particular relationship between amino acid sites in the same sequence, or between different sequences, is protein coevolution, interest in which has increased as a consequence of mathematical and computational methods used to understand the spatial, functional and evolutionary dependencies between amino acid sites. The principle of coevolution means that some amino acids are related through evolution because mutations in one site can create evolutionary pressures to select compensatory mutations in other sites that are functionally or structurally related. Using current methods to detect coevolution, specifically mutual information techniques from the field of information theory, we show in this work that much of the information between coevolved sites is lost because of mistakes in the multiple sequence alignment of variable regions. Moreover, we show that using these statistical methods to detect coevolved sites in multiple sequence alignments results in a high rate of false positives. Due to the number of errors in the detection of coevolved sites from multiple sequence alignments, we propose in this work a method to improve the detection efficacy of coevolved sites, and we implement an algorithm to fix such sites by correcting the misalignment produced in those specific locations. The detection part of our work is based on the mutual information between sites that are guessed to have coevolved because of their high statistical correlation score. With this information we search for possible misalignments in those regions caused by the incorrect matching of amino acids during the alignment.
The re-alignment part is based on constraint programming techniques, to avoid the combinatorial complexity that arises when one amino acid can be aligned with many others and to avoid inconsistencies in the alignments. In this work, we present a framework to impose constraints over the sequences, and we show how it is possible to compute alignments based on different criteria just by setting constraints between the amino acids. This framework can be applied not only to improve the alignment and detection of coevolved regions, but also to any desired constraints that may be used to express functional or structural relations among the amino acids in multiple sequences. We also show that after we fix these misalignments using constraint-based techniques, the correlation between coevolved sites increases and, in general, the new alignment is closer to the correct alignment than the original MSA. Finally, we outline possible future research lines with the objective of overcoming some drawbacks detected during this work.
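
The mutual information score between two alignment columns, which this abstract uses to flag candidate coevolved sites, can be computed as in the sketch below. It uses plain empirical frequencies in bits; treating gap characters as ordinary symbols and applying no background correction are simplifying assumptions, not the procedure of the thesis.

```python
from collections import Counter
from math import log2


def column(alignment, index):
    """Extract one column of a multiple alignment given as equal-length strings."""
    return [row[index] for row in alignment]


def mutual_information(col_x, col_y):
    """Empirical mutual information (in bits) between two alignment columns."""
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    return sum(
        (c / n) * log2((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
        for (x, y), c in count_xy.items()
    )


alignment = ["ALK", "ALR", "VIK", "VIR"]
print(mutual_information(column(alignment, 0), column(alignment, 1)))  # coupled columns
print(mutual_information(column(alignment, 0), column(alignment, 2)))  # independent columns
```

In the toy alignment the first two columns always change together, so they share one full bit of information, while the first and third columns vary independently and share none. A misaligned residue in either column would break the pairing and lower the score, which is the effect the abstract attributes to alignment errors.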

17

Zhao, Kaiyong. "GPU accelerated sequence alignment." HKBU Institutional Repository, 2016. https://repository.hkbu.edu.hk/etd_oa/378.

Abstract:

DNA sequence alignment is a fundamental task in gene information processing: it searches for the location of a string (usually based on newly collected DNA data) in existing, very large DNA sequence databases. Due to the huge amount of newly generated DNA data and the complexity of approximate string matching, sequence alignment becomes a time-consuming process. Hence how to reduce the alignment time becomes a significant research problem. Several string alignment algorithms based on hash comparison, suffix arrays and the Burrows-Wheeler transform (BWT) have been proposed for DNA sequence alignment. Although these algorithms have reached O(N) time complexity, they still cannot meet the increasing demand when running on traditional CPUs. Recently, GPUs have been widely accepted as an efficient accelerator for many scientific and commercial applications. A typical GPU has thousands of processing cores which can speed up repetitive computations significantly compared to multi-core CPUs. However, sequence alignment is a kind of computation with intensive data access, i.e., it is memory-bound. Access to GPU memory and I/O has a more significant influence on performance than the computing capabilities of the GPU cores. By analyzing GPU memory and I/O characteristics, this thesis produces novel parallel algorithms for DNA sequence alignment applications. This thesis consists of six parts. The first two parts explain some basic knowledge of DNA sequence alignment and GPU computing. The third part investigates the performance of data access on different types of GPU memory. The fourth part describes a parallel method to accelerate short-read sequence alignment based on the BWT algorithm. The fifth part proposes a parallel algorithm for accelerating BLASTN, one of the most popular sequence alignment tools. It shows how multi-threaded control and multiple GPU cards can accelerate the BLASTN algorithm significantly. The sixth part concludes the whole thesis.
To summarize, by analyzing the layout of GPU memory and comparing data access under multithreaded modes, this thesis arrives at an optimized method for performing sequence alignment on GPUs. The outcomes can help practitioners in bioinformatics to improve their working efficiency by significantly reducing the sequence alignment time.
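
To make the BWT-based matching mentioned in this abstract concrete, here is a minimal CPU sketch of the Burrows-Wheeler transform and of exact pattern counting by backward search. It is not the thesis's GPU method: the naive rotation sort and the on-the-fly rank computation are assumptions made for brevity (real short-read aligners use suffix arrays and precomputed occurrence tables), and the text is assumed not to contain the sentinel `$`.

```python
from collections import Counter


def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (quadratic; for illustration only)."""
    text += "$"  # sentinel, assumed absent from the text and smaller than every symbol
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(transformed, pattern):
    """Count exact occurrences of pattern using backward search over the BWT."""
    counts = Counter(transformed)
    first_row = {}  # first_row[c] = number of symbols in the text smaller than c
    total = 0
    for symbol in sorted(counts):
        first_row[symbol] = total
        total += counts[symbol]
    lo, hi = 0, len(transformed)  # current half-open range of sorted rotations
    for symbol in reversed(pattern):
        if symbol not in first_row:
            return 0
        lo = first_row[symbol] + transformed[:lo].count(symbol)
        hi = first_row[symbol] + transformed[:hi].count(symbol)
        if lo >= hi:
            return 0
    return hi - lo


index = bwt("banana")
print(index, count_occurrences(index, "ana"))
```

With precomputed rank tables each backward-search step takes constant time, so a query costs time proportional to the pattern length; the table lookups are scattered memory accesses, which fits the abstract's point that the workload is memory-bound.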

18

Löytynoja, Ari. "Molecular sequence alignment and character homology." Doctoral thesis, Universite Libre de Bruxelles, 2003. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/211261.

19

Zola, Jaroslaw. "Parallel server for multiple sequence alignment." Grenoble INPG, 2005. http://www.theses.fr/2005INPG0187.

Abstract:

In this work we investigate the application of parallel processing and web caching as a method to improve the efficiency of multiple sequence alignment. We develop a generic framework for distributed and local cache implementation, and we design a decentralised caching system that stores intermediate results of sequence alignment as well as the alignments themselves. Finally, we create a parallel server for multiple sequence alignment which utilises the above techniques to speed up the processing of large sequence sets (thousands of sequences, each composed of thousands of base pairs). The server is based on the PhylTree method, a generic scheme for multiple sequence alignment with simultaneous phylogeny, developed in the Laboratory ID-IMAG. The caching system has been implemented; the software is available and has been used outside the laboratory for several other applications. We also propose some extensions of PhylTree, for example the application of simulated annealing to improve the efficiency of phylogenetic analysis.

20

Gîrdea, Marta. "New methods for biological sequence alignment." Thesis, Lille 1, 2010. http://www.theses.fr/2010LIL10089/document.

Abstract:

Biological sequence alignment is a fundamental technique in bioinformatics. It consists of identifying series of similar (conserved) characters that appear in the same order in both sequences, and inferring a set of modifications (substitutions, insertions and deletions) involved in the transformation of one sequence into the other. This technique allows one to infer, based on sequence similarity, whether two or more biological sequences are potentially homologous, i.e. whether they share a common ancestor, thus enabling a better understanding of sequence evolution. This thesis addresses sequence comparison problems in two different contexts: homology detection and high-throughput DNA sequencing. The goal of this work is to develop sensitive alignment methods that provide solutions to the following two problems: i) the detection of hidden protein homologies by protein sequence comparison, when the source of the divergence is frameshift mutations, and ii) mapping short SOLiD reads (sequences of overlapping di-nucleotides encoded as colors) to a reference genome. In both cases, the same general idea is applied: to implicitly compare DNA sequences in order to detect changes occurring at this level, while manipulating, in practice, other representations (protein sequences, sequences of di-nucleotide codes) that provide additional information and thus help to improve the similarity search. The aim is to design and implement exact and heuristic alignment methods, along with scoring schemes, adapted to these scenarios.

21

Jiang, Tianwei. "Sequence alignment: algorithm development and applications." 2009. http://library.ust.hk/cgi/db/thesis.pl?ECED%202009%20JIANG.

22

Isa, Mohammad Nazrin. "High performance reconfigurable architectures for biological sequence alignment." Thesis, University of Edinburgh, 2013. http://hdl.handle.net/1842/7721.

Abstract:

Bioinformatics and computational biology (BCB) is a rapidly developing multidisciplinary field which encompasses a wide range of domains, including genomic sequence alignment. Sequence alignment is a fundamental tool in molecular biology for searching for homology between sequences. Sequence alignments are currently gaining close attention due to their great impact on quality-of-life aspects such as facilitating early disease diagnosis, identifying the characteristics of a newly discovered sequence, and drug engineering. With the vast growth of genomic data, searching for sequence homology over huge databases (often measured in gigabytes) cannot produce results within a realistic time, hence the need for acceleration. Since the exponential increase of biological databases as a result of the Human Genome Project (HGP), supercomputers and other parallel architectures such as special-purpose Very Large Scale Integration (VLSI) chips, Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs) have become popular acceleration platforms. Nevertheless, there are always trade-offs between area, speed, power, cost, development time and reusability when selecting an acceleration platform. FPGAs generally offer more flexibility, higher performance and lower overheads. However, they suffer from a relatively low-level programming model compared with off-the-shelf processors such as standard microprocessors and GPUs. Due to the aforementioned limitations, the need has arisen for optimized FPGA core implementations, which are crucial for this technology to become viable in high performance computing (HPC). This research proposes the use of state-of-the-art reprogrammable system-on-chip technology on FPGAs to accelerate three widely used sequence alignment algorithms: the Smith-Waterman algorithm with affine gap penalty, the profile hidden Markov model (HMM) algorithm and the Basic Local Alignment Search Tool (BLAST) algorithm.
This research has three novel aspects. Firstly, the algorithms are designed and implemented in hardware, with each core achieving the highest performance compared to the state of the art. Secondly, an efficient scheduling strategy based on the double buffering technique is adopted in the hardware architectures. Here, when the alignment matrix computation task is overlapped with the processing element (PE) configuration in a folded systolic array, the overall throughput of the core is significantly increased. This is due to the bounded PE configuration time and the parallel PE configuration approach, irrespective of the number of PEs in a systolic array. In addition, the use of only two configuration elements in the PE optimizes hardware resources and enables the scalability of PE systolic arrays without relying on restricted onboard memory resources. Finally, a new performance metric is devised, which facilitates the effective comparison of design performance between different FPGA devices and families. The normalized performance indicator (speed-up per area per process technology) factors out the advantages of the area and lithography technology of any FPGA, resulting in fairer comparisons. The cores have been designed using Verilog HDL and prototyped on the Alpha Data ADM-XRC-5LX card with the Virtex-5 XC5VLX110-3FF1153 FPGA. The implementation results show that the proposed architectures achieved giga cell updates per second (GCUPS) performances of 26.8, 29.5 and 24.2 respectively for the acceleration of the Smith-Waterman algorithm with affine gap penalty, the profile HMM algorithm and the BLAST algorithm. In terms of speed-up improvements, the performance of the designed cores was compared against their corresponding software and against reported FPGA implementations. In the case of comparison with equivalent software execution, acceleration of the optimal alignment algorithm in hardware yielded an average speed-up of 269x compared to the SSEARCH 35 software.
For the profile HMM-based sequence alignment, the designed core achieved speed-ups of 103x and 8.3x against HMMER 2.0 and the latest version of HMMER (version 3.0) respectively. On the other hand, the implementation of the gapped BLAST with the two-hit method in hardware achieved a greater than tenfold speed-up compared to the latest NCBI BLAST software. In terms of comparison against other reported FPGA implementations, the proposed normalized performance indicator was used to evaluate the designed architectures fairly. The results showed that the first architecture achieved more than 50 percent improvement, while acceleration of the profile HMM sequence alignment in hardware gained a normalized speed-up of 1.34. In the case of the gapped BLAST with the two-hit method, the designed core achieved an 11x speed-up after factoring out the advantages of the Virtex-5 FPGA. In addition, further analysis was conducted in terms of cost and power performance; it was noted that the core achieved 0.46 MCUPS per dollar spent and 958.1 MCUPS per watt. This shows that FPGAs can be an attractive platform for high performance computation, with the advantages of a smaller area footprint, and can represent an economical ‘green’ solution compared to the other acceleration platforms. Higher throughput can be achieved by redeploying the cores on newer, bigger and faster FPGAs with minimal design effort.
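
For readers unfamiliar with the first of the three accelerated algorithms, the following is a minimal software sketch of Smith-Waterman local alignment with affine gap penalties (the Gotoh recurrences), together with a helper for the cell-updates-per-second metric quoted in the abstract. It is a plain sequential reference, not the systolic-array design of the thesis; the scoring values and function names are assumptions, and a gap of length k is taken to cost `gap_open + k * gap_extend`.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score of a and b with affine gap penalties."""
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]        # best score ending at cell (i, j)
    E = [[neg_inf] * cols for _ in range(rows)]  # best score ending with a gap in a
    F = [[neg_inf] * cols for _ in range(rows)]  # best score ending with a gap in b
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            E[i][j] = max(E[i][j - 1], H[i][j - 1] + gap_open) + gap_extend
            F[i][j] = max(F[i - 1][j], H[i - 1][j] + gap_open) + gap_extend
            diagonal = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diagonal, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def gcups(cell_updates, seconds):
    """Giga cell updates per second for a run that filled `cell_updates` matrix cells."""
    return cell_updates / seconds / 1e9


print(smith_waterman_affine("ACGTACGT", "ACGTTTACGT"))
print(gcups(1000 * 1_000_000, 1.0))
```

Cells on the same anti-diagonal of the matrix do not depend on each other, which is the property that systolic arrays of processing elements exploit to update many cells per clock cycle.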

23

Nguyen, Ken D. "Multiple Biological Sequence Alignment: Scoring Functions, Algorithms, and Evaluations." Digital Archive @ GSU, 2011. http://digitalarchive.gsu.edu/cs_diss/62.

Abstract:

Aligning multiple biological sequences such as protein sequences or DNA/RNA sequences is a fundamental task in bioinformatics and sequence analysis. These alignments may contain invaluable information that scientists need to predict the sequences' structures, determine the evolutionary relationships between them, or discover drug-like compounds that can bind to the sequences. Unfortunately, multiple sequence alignment (MSA) is NP-complete. In addition, the lack of a reliable scoring method makes it very hard to align the sequences reliably and to evaluate the alignment outcomes. In this dissertation, we have designed a new scoring method for use in multiple sequence alignment. Our scoring method encapsulates stereo-chemical properties of sequence residues and their substitution probabilities into a tree-structured scoring scheme. This new technique provides a reliable scoring scheme with low computational complexity. In addition to the new scoring scheme, we have designed an overlapping sequence clustering algorithm for use in our three new multiple sequence alignment algorithms. One of our alignment algorithms uses a dynamic weighted guidance tree to perform multiple sequence alignment in a progressive fashion. The use of a dynamic weighted tree allows errors made in the early alignment stages to be corrected in subsequent stages. The other two algorithms utilize sequence knowledge bases and sequence consistency to produce biologically meaningful sequence alignments. To improve the speed of multiple sequence alignment, we have developed a parallel algorithm that can be deployed on reconfigurable computer models. Analytically, our parallel algorithm is the fastest progressive multiple sequence alignment algorithm.

24

Garriga, Nogales Edgar 1990. "New algorithmic contributions for large scale multiple sequence alignments of protein sequences." Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2022. http://hdl.handle.net/10803/673526.

Abstract:

In these days of significant change and rapid technological evolution, the amount of data that science has to deal with is growing incredibly fast, and the size of the data can be prohibitive. Multiple Sequence Alignments (MSA) are used in various areas of biology, and the increase in data has degraded the performance of the methods. For this reason, a new solution for performing MSA is proposed. This novel paradigm allows the alignment of millions of sequences and makes it possible to modularize the process. Regressive enables the parallelization of the process and the combination of clustering methods (guide-tree) with whatever aligner is desired. On the clustering side, the guide-tree has to be rethought. A study of the current state of the methods and their strengths and weaknesses has been performed to shed some light on the topic. The guide-tree cannot be the bottleneck, and it should provide a good starting point for the aligners.

25

Lu, Yue. "Improving the quality of multiple sequence alignment." College Station, Tex.: Texas A&M University, 2008. http://hdl.handle.net/1969.1/ETD-TAMU-3111.

26

Orobitg, Cortada Miquel. "High performance computing on biological sequence alignment." Doctoral thesis, Universitat de Lleida, 2013. http://hdl.handle.net/10803/110930.

Abstract:

Multiple Sequence Alignment (MSA) is a powerful tool for important biological applications. MSAs are computationally difficult to calculate, and most formulations of the problem lead to NP-Hard optimization problems. To perform large-scale alignments, with thousands of sequences, new challenges need to be resolved to adapt the MSA algorithms to the High-Performance Computing era. In this thesis we propose three different approaches to solve some limitations of the main MSA methods. The first proposal consists of a new guide tree construction algorithm to improve the degree of parallelism in order to resolve the bottleneck of the progressive alignment stage. The second proposal consists of optimizing the consistency library, improving the execution time and the scalability of MSA to enable the method to treat more sequences. Finally, we propose Multiple Trees Alignment (MTA), an MSA method that aligns multiple guide trees in parallel, evaluates the alignments obtained and selects the best one as the result. The experimental results demonstrated that MTA considerably improves the quality of the alignments.

27

Yang, Qian 1973. "RNA sequence alignment and secondary structure prediction." Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=82453.

Abstract:

Functional RNA sequences typically have structural elements that are highly conserved during evolution. Here we present an algorithmic method for multiple alignment of RNAs, taking into consideration both structural similarity and sequence identity. Furthermore, we performed a comparative analysis on pairing probability matrices of a set of aligned orthologous sequences and predicted the conserved secondary structure. Our alignment method outperforms the most widely used multiple alignment tool, Clustal W, and the structure prediction approach we propose can generate a more accurate secondary structure for 5S rRNA than existing approaches such as Alifold. In addition, our algorithms are efficient in terms of CPU time and memory usage compared to most existing methods for secondary structure prediction.

28

Holmes, I. "Studies in probabilistic sequence alignment and evolution." Thesis, University of Cambridge, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.604192.

Abstract:

The complete sequencing of whole genomes presents opportunities for detailed study of molecular evolution. This thesis combines theoretical developments of Bayesian approaches in bioinformatics with analysis of duplications in the recently completed C. elegans genome. Developments in the Bayesian probabilistic framework for sequence analysis using hidden Markov models (HMMs) are described. The principal HMM algorithms are reviewed including alignment, training and model comparison. Theory is developed for prediction of alignment accuracy and tested using simulations. Software to provide accuracy measures for multiple alignments, based on the popular HMMER suite of profile-based alignment algorithms, is presented and evaluated with reference to the Pfam database of multiple alignments. Several of these statistical techniques are applied to an analysis of genomic duplications in the C. elegans genome. The completion of this - the first animal genome - offers an opportunity to study the random duplications that are believed to be the first step in the evolution of a new gene. The construction of a database of non-coding duplications is described and measurements of molecular evolutionary parameters in C. elegans are calculated from the data and reported. A method of dating gene duplications using alignments between conserved introns is presented and compared to existing methods using Bayesian techniques developed earlier in the dissertation. Amongst the principal agents involved in creating genomic duplications are transposons; one of the simplest families of transposon is the Tc1-mariner family, of which two distinct active subfamilies are well-known in C. elegans. Using HMM profiles, six new subfamilies of mariner-like transposon have been identified in the C. elegans genome. Several of the new subfamilies display interesting homologies to one another, suggestive of common mechanisms of transpositional catalysis. Finally, the software tools developed during this project are described and made available for public retrieval from the Sanger Centre web site.

29

Bulancea, Lindvall Oscar. "Quantum Methods for Sequence Alignment and Metagenomics." Thesis, KTH, Tillämpad fysik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-256349.

30

Büschking, Christian. "Incorporation of structural information in RNA sequence alignment." [S.l. : s.n.], 2001. http://deposit.ddb.de/cgi-bin/dokserv?idn=969343140.

31

Rausch, Tobias. "Dissecting multiple sequence alignment methods: the analysis, design and development of generic multiple sequence alignment components in SeqAn." Berlin: Freie Universität Berlin, 2010. http://d-nb.info/1024541460/34.

32

Marco-Sola, Santiago. "Efficient approximate string matching techniques for sequence alignment." Doctoral thesis, Universitat Politècnica de Catalunya, 2017. http://hdl.handle.net/10803/460835.

Abstract:

One of the outstanding milestones achieved in recent years in the field of biotechnology research has been the development of high-throughput sequencing (HTS). Because it is currently technically impossible to decode the genome as a whole, HTS technologies read billions of relatively short chunks of a genome at random locations. Such reads then need to be located within a reference for the species being studied (that is, aligned or mapped to the genome): for each read one identifies in the reference regions that share a large sequence similarity with it, therefore indicating what the read's point or points of origin may be. HTS technologies are able to re-sequence a human individual (i.e. to establish the differences between his/her individual genome and the reference genome for the human species) in a very short period of time. They have also paved the way for the development of a number of new protocols and methods, leading to novel insights in genomics and biology in general. However, HTS technologies also pose a challenge to traditional data analysis methods; this is due to the sheer amount of data to be processed and the need for improved alignment algorithms that can generate accurate results quickly. This thesis tackles the problem of sequence alignment as a step within the analysis of HTS data. Its contributions focus on both the methodological aspects and the algorithmic challenges towards efficient, scalable, and accurate HTS mapping. From a methodological standpoint, this thesis strives to establish a comprehensive framework able to assess the quality of HTS mapping results. In order to do so one has to understand the source and nature of mapping conflicts, and explore the accuracy limits inherent in how sequence alignment is performed for current HTS technologies. From an algorithmic standpoint, this work introduces state-of-the-art index structures and approximate string matching algorithms. They contribute novel insights that can be used in practical applications towards efficient and accurate read mapping. In more detail, first we present methods able to reduce the storage space taken by indexes for genome-scale references, while still providing fast query access in order to support effective search algorithms. Second, we describe novel filtering techniques that vastly reduce the computational requirements of sequence mapping, but are nonetheless capable of giving strict algorithmic guarantees on the completeness of the results. Finally, this thesis presents new incremental algorithmic techniques able to combine several approximate string matching algorithms; this leads to efficient and flexible search algorithms allowing the user to reach arbitrary search depths. All algorithms and methodological contributions of this thesis have been implemented as components of a production aligner, the GEM-mapper, which is publicly available, widely used worldwide and cited by a sizeable body of literature. It offers flexible and accurate sequence mapping while outperforming other HTS mappers in both running time and the quality of the results it produces.

33

Lightner, Carin Ann. "A Tabu Search Approach to Multiple Sequence Alignment." NCSU, 2008. http://www.lib.ncsu.edu/theses/available/etd-05312008-191232/.

Abstract:

Sequence alignment methods are used to detect and quantify similarities between different DNA and protein sequences that may have evolved from a common ancestor. Effective sequence alignment methodologies also provide insight into the structure and function of a sequence and are the first step in constructing evolutionary trees. In this dissertation, we use a tabu search approach to multiple sequence alignment. A tabu search is a heuristic approach that uses adaptive memory features to align multiple sequences. The adaptive memory feature, a tabu list, helps the search process avoid locally optimal solutions and explore the solution space in an efficient manner. We develop two main tabu searches that progressively align sequences. A randomly generated bifurcating tree guides the alignment. The objective is to optimize the alignment score computed using either the sum of pairs or parsimony scoring function. The use of a parsimony scoring function provides insight into the homology between sequences in the alignment. We also explore iterative refinement techniques such as a hidden Markov model and an intensification heuristic to further improve the alignment. This approach to multiple sequence alignment provides improved alignments as compared to several other methods.
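
The sum-of-pairs objective mentioned in this abstract can be illustrated with a small sketch. The match, mismatch and gap values below are placeholder assumptions chosen for illustration; the abstract does not state the scoring parameters the dissertation uses, and the function name `sum_of_pairs` is hypothetical.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Score a multiple alignment column by column over all row pairs.

    `alignment` is a list of equal-length strings with '-' marking gaps.
    The default scores are illustrative assumptions, not values from
    the dissertation.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # gap/gap pairs contribute nothing in this sketch
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["AC-T", "ACGT", "A-GT"]))
```

A search heuristic such as the tabu search described above would call a function like this repeatedly to compare candidate alignments; a real implementation would normally use a substitution matrix rather than a flat match/mismatch score.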

34

Talbot, Danielle. "Identifying misalignments in sequence alignment for protein modelling." Thesis, University of Reading, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.445754.

35

Zhang, Xiaodong. "A Local Improvement Algorithm for Multiple Sequence Alignment." Ohio University / OhioLINK, 2003. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1049485762.

36

Kim, Eagu. "Inverse Parametric Alignment for Accurate Biological Sequence Comparison." Diss., The University of Arizona, 2008. http://hdl.handle.net/10150/193664.

Abstract:

For as long as biologists have been computing alignments of sequences, the question of what values to use for scoring substitutions and gaps has persisted. In practice, substitution scores are usually chosen by convention, and gap penalties are often found by trial and error. In contrast, a rigorous way to determine parameter values that are appropriate for aligning biological sequences is by solving the problem of Inverse Parametric Sequence Alignment. Given examples of biologically correct reference alignments, this is the problem of finding parameter values that make the examples score as close as possible to optimal alignments of their sequences. The reference alignments that are currently available contain regions where the alignment is not specified, which leads to a version of the problem with partial examples. In this dissertation, we develop a new polynomial-time algorithm for Inverse Parametric Sequence Alignment that is simple to implement, fast in practice, and can learn hundreds of parameters simultaneously from hundreds of examples. Computational results with partial examples show that best possible values for all 212 parameters of the standard alignment scoring model for protein sequences can be computed from 200 examples in 4 hours of computation on a standard desktop machine. We also consider a new scoring model with a small number of additional parameters that incorporates predicted secondary structure for the protein sequences. By learning parameter values for this new secondary-structure-based model, we can improve on the alignment accuracy of the standard model by as much as 15% for sequences with less than 25% identity.

37

Arner, Erik. "Solving repeat problems in shotgun sequencing." Stockholm, 2006. http://diss.kib.ki.se/2006/91-7140-996-3/.

38

Blassel, Luc. "From sequences to knowledge, improving and learning from sequence alignments." Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS385.

Abstract:

In this thesis we study two important problems in computational biology, one pertaining to primary analysis of sequencing data, and the second pertaining to secondary analysis of sequences to obtain biological insights using machine learning. Sequence alignment is one of the most powerful and important tools in the field of computational biology. Read alignment is often the first step in many analyses like structural variant detection, genome assembly or variant calling. Long-read sequencing technologies have improved the quality of results across all these analyses. They remain, however, plagued by sequencing errors and pose algorithmic challenges to alignment. A prevalent technique to reduce the detrimental effects of these errors is homopolymer compression, which targets the most prevalent type of long-read sequencing error. We present a more general framework than homopolymer compression, which we call mapping-friendly sequence reductions (MSR). We then show that some of these MSRs improve the accuracy of read alignments across whole human, Drosophila and E. coli genomes. Improvements in sequence alignment methods are crucial for downstream analyses. For instance, multiple sequence alignments are indispensable when studying resistance in viruses. With the ever-growing quantity of annotated, high-quality multiple sequence alignments it has become possible and useful to study resistance in viruses with machine learning methods. We used a very large multiple sequence alignment of British HIV sequences and trained multiple classifiers to discriminate between treatment-naive and treatment-experienced sequences. By studying important classifier features we identified drug resistance mutations. We then removed known drug-resistance-associated signal from the data before training, kept classifying power, and identified 6 novel resistance-associated mutations. Further study indicated that these were most likely accessory in nature and linked to known resistance mutations.
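
Homopolymer compression, the baseline technique this abstract generalizes, is simple enough to show in a few lines. The sketch below covers only that baseline: every run of identical characters is collapsed to one character. The more general mapping-friendly sequence reductions are not described in enough detail in the abstract to reproduce, and the function name `homopolymer_compress` is hypothetical.

```python
from itertools import groupby


def homopolymer_compress(seq: str) -> str:
    """Collapse each run of identical characters into a single character."""
    # groupby yields one (key, run) pair per maximal run of equal characters.
    return "".join(base for base, _run in groupby(seq))


if __name__ == "__main__":
    print(homopolymer_compress("AAACCGTTTTA"))  # ACGTA
```

In a mapping pipeline the same reduction would be applied to both the reads and the reference before alignment, so that errors in homopolymer run length no longer produce mismatches.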

39

Li, Yuheng. "Searching for remotely homologous sequences in protein databases with hybrid PSI-blast." The Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=osu1164741421.

40

Ye, Yongtao, and 叶永滔. "Aligning multiple sequences adaptively." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2014. http://hdl.handle.net/10722/206465.

Abstract:

With the rapid development of genome sequencing, an ever-increasing number of molecular biology analyses, such as motif detection, phylogeny inference and structure prediction, rely on the construction of an accurate multiple sequence alignment (MSA). Although many methods have been developed during the last two decades, most of them may perform poorly on some types of inputs, in particular when families of sequences fall below thirty percent similarity. Therefore, this thesis introduces two different effective approaches to improve the overall quality of multiple sequence alignment. First, by considering the similarity of the input sequences, we propose an adaptive approach that computes better substitution matrices for each pair of sequences and then applies the progressive alignment method to align them. For example, for inputs with high similarity, we consider the whole sequences and align them with a global pair-Hidden Markov model, while for those with moderately low similarity, we may ignore the flank regions and use some local pair-Hidden Markov models to align them. To test the effectiveness of this approach, we have implemented a multiple sequence alignment tool called GLProbs and compared its performance with one dozen leading tools on three benchmark alignment databases; GLProbs' alignments have the best scores in almost all tests. We have also evaluated the practicability of the alignments of GLProbs by applying the tool to three biological applications, namely phylogenetic tree reconstruction, protein secondary structure prediction and the detection of high-risk members for cervical cancer in the HPV-E6 family, and the results are very encouraging. Second, based on our previous study, we propose another new tool, PnpProbs, which constructs better multiple sequence alignments through better handling of guide trees. It classifies input sequences into two types: normally related sequences and distantly related sequences. For normally related sequences, it uses an adaptive approach to construct the guide tree and, based on this guide tree, aligns the sequences progressively. To be more precise, it first estimates the input's discrepancy by computing the standard deviation of their percent identities, and based on this estimate, it chooses the best method to construct the guide tree. For distantly related sequences, PnpProbs abandons the guide tree; instead it uses the non-progressive sequence annealing method to construct the multiple sequence alignment. By combining the strengths of the progressive and non-progressive methods, and with a better way to construct the guide tree, PnpProbs significantly improves the quality of multiple sequence alignments not only for general input sequences, but also for those that are very distantly related. With these encouraging empirical results, our software tools have gradually gained appreciation from the community. For example, GLProbs has been incorporated, by invitation, into the JAva Bioinformatics Analysis Web Services system (JABAWS).
Computer Science
Master
Master of Philosophy
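
The discrepancy estimate described in the abstract above (the standard deviation of pairwise percent identities) might be sketched as follows. This is an illustration under stated assumptions, not PnpProbs' code: the input is assumed to be a list of already aligned sequence pairs, percent identity is taken over columns where neither sequence has a gap (definitions vary), and the function names are hypothetical.

```python
from statistics import pstdev


def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    same = sum(1 for x, y in compared if x == y)
    return 100.0 * same / len(compared)


def identity_spread(aligned_pairs) -> float:
    """Population standard deviation of percent identities over all pairs."""
    identities = [percent_identity(a, b) for a, b in aligned_pairs]
    return pstdev(identities)


if __name__ == "__main__":
    pairs = [("ACGT", "ACGT"), ("ACGT", "ACGA")]
    print(identity_spread(pairs))
```

The abstract does not give the threshold on this value or the guide-tree construction methods it selects between, so that decision step is left out of the sketch.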

41

Nakato, Ryuichiro. "Development of Fast and Accurate Genomic Sequence Alignment Methods." 京都大学 (Kyoto University), 2010. http://hdl.handle.net/2433/123352.

42

Zhang, Ching. "Genetic algorithm approaches for efficient multiple molecular sequence alignment." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape17/PQDD_0013/NQ30660.pdf.

43

Zhou, Rong. "Memory-efficient graph search applied to multiple sequence alignment." Diss., Mississippi State : Mississippi State University, 2005. http://library.msstate.edu/etd/show.asp?etd=etd-06282005-015428.

44

Herman, Joseph L. "Multiple sequence analysis in the presence of alignment uncertainty." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:88a56d9f-a96e-48e3-b8dc-a73f3efc8472.

Abstract:

Sequence alignment is one of the most intensely studied problems in bioinformatics, and is an important step in a wide range of analyses. An issue that has gained much attention in recent years is the fact that downstream analyses are often highly sensitive to the specific choice of alignment. One way to address this is to jointly sample alignments along with other parameters of interest. In order to extend the range of applicability of this approach, the first chapter of this thesis introduces a probabilistic evolutionary model for protein structures on a phylogenetic tree; since protein structures typically diverge much more slowly than sequences, this allows for more reliable detection of remote homologies, improving the accuracy of the resulting alignments and trees, and reducing sensitivity of the results to the choice of dataset. In order to carry out inference under such a model, a number of new Markov chain Monte Carlo approaches are developed, allowing for more efficient convergence and mixing on the high-dimensional parameter space. The second part of the thesis presents a directed acyclic graph (DAG)-based approach for representing a collection of sampled alignments. This DAG representation allows the initial collection of samples to be used to generate a larger set of alignments under the same approximate distribution, enabling posterior alignment probabilities to be estimated reliably from a reasonable number of samples. If desired, summary alignments can then be generated as maximum-weight paths through the DAG, under various types of loss or scoring functions. The acyclic nature of the graph also permits various other types of algorithms to be easily adapted to operate on the entire set of alignments in the DAG. In the final part of this work, methodology is introduced for alignment-DAG-based sequence annotation using hidden Markov models, and RNA secondary structure prediction using stochastic context-free grammars. 
Results on test datasets indicate that the additional information contained within the DAG allows for improved predictions, resulting in substantial gains over simply analysing a set of alignments one by one.
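The idea of generating a summary alignment as a maximum-weight path through a DAG can be illustrated with a minimal sketch. It assumes a hypothetical encoding in which nodes stand for alignment columns and each edge carries a numeric weight (for example, a score derived from posterior probabilities). The node names, weights, and the `max_weight_path` function are illustrative assumptions, not the thesis's implementation.

```python
from collections import defaultdict


def max_weight_path(edges, source, sink):
    """Return (total_weight, path) for the heaviest source-to-sink path in a DAG.

    edges: iterable of (u, v, weight) tuples; the graph must be acyclic.
    """
    graph = defaultdict(list)
    indegree = defaultdict(int)
    nodes = {source, sink}
    for u, v, w in edges:
        graph[u].append((v, w))
        indegree[v] += 1
        nodes.update((u, v))

    # Topological order (Kahn's algorithm); valid because the graph is acyclic.
    order = []
    ready = [n for n in nodes if indegree[n] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v, _ in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    # Dynamic programming over the topological order.
    best = {source: 0}
    back = {}
    for u in order:
        if u not in best:
            continue  # not reachable from the source
        for v, w in graph[u]:
            if v not in best or best[u] + w > best[v]:
                best[v] = best[u] + w
                back[v] = u

    if sink not in best:
        raise ValueError("sink is not reachable from source")

    # Walk the back-pointers to recover the path.
    path = [sink]
    while path[-1] != source:
        path.append(back[path[-1]])
    return best[sink], path[::-1]


# Hypothetical toy DAG of alignment columns with integer weights.
toy_edges = [
    ("start", "a", 9),
    ("start", "b", 6),
    ("a", "b", 2),
    ("a", "end", 5),
    ("b", "end", 10),
]
total, path = max_weight_path(toy_edges, "start", "end")
```

Because every node is processed after all of its predecessors, one pass suffices, and the running time is linear in the number of nodes and edges.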

45

McMahon, Peter Leonard. "Accelerating genomic sequence alignment using high performance reconfigurable computers." Master's thesis, University of Cape Town, 2008. http://hdl.handle.net/11427/17377.

Abstract:

Includes bibliographical references (pages 65-70).
Reconfigurable computing technology has progressed to a stage where it is now possible to achieve orders-of-magnitude gains in performance and power efficiency over conventional computer architectures for a subset of high performance computing applications. In this thesis, we investigate the potential of reconfigurable computers to accelerate genomic sequence alignment, specifically for genome sequencing applications. We present a highly optimized implementation of a parallel sequence alignment algorithm for the Berkeley Emulation Engine (BEE2) reconfigurable computer, allowing a single BEE2 to align hundreds of sequences simultaneously. For each reconfigurable processor (FPGA), we demonstrate a 61X speedup versus a state-of-the-art implementation on a modern conventional CPU core, and a 56X improvement in performance-per-Watt. We also show that our implementation is highly scalable and we provide performance results from a cluster implementation using 32 FPGAs. We conclude that reconfigurable computers provide an excellent platform on which to run sequence alignment, and that clusters of reconfigurable computers will be able to cope far more easily with the vast quantities of data produced by new ultra-high-throughput sequencers.

46

Jiang, Yanan. "Manual alignment of IVS sequences and its implication in multiple sequence alignment." Thesis, 2011. http://hdl.handle.net/2152/ETD-UT-2011-12-4706.

Abstract:

It is recognized that iterative comparative analysis of large-scale homologous RNAs significantly promotes the understanding of an RNA family. The Gutell lab is renowned for maintaining high-quality RNA sequence alignments and accurately predicted RNA secondary structures using this approach. While the currently available alignment and structure data are mainly obtained by trained domain experts with extensive manual effort, it is highly desirable that this process be automated and replicable, given the exponentially growing amount of RNA sequence data and the time required for expert training. In this thesis, we learn the processes involved in comparative analysis by manually aligning a non-coding RNA family, IVS sequences, under the supervision of Dr. Gutell. Each process is then simulated by mathematical objective functions and algorithms. We also evaluate the currently available RNA analysis packages that target each of these processes. Finally, we propose a new RNA sequence alignment algorithm that incorporates structure information and can be extended to different alignment tasks.

47

Ying, Chung Li, and 鍾立穎. "Multiple Sequence Alignment using Pairwise Suboptimal Alignment." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/78872248303598965142.

Abstract:

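Master's
National Taiwan University of Science and Technology
Department of Computer Science and Information Engineering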
94
Multiple sequence alignment is an important tool for analyzing biological sequences, with uses ranging from searching a database for similar sequences to studying protein structure. The optimal solution found by dynamic programming is not always the true biological solution as the number of sequences increases. An alternative is the progressive algorithm, which first combines the most similar sequences and then adds the next most similar sequence. However, different orders of combining the sequences produce different alignments. Because the optimal alignment is not always the best alignment biologically, combining pairwise suboptimal alignments offers the possibility of finding a better solution. The method can also reduce the time complexity. In addition, a better alignment may be found by spending a little more time trying all combinations.
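As a point of reference for the dynamic-programming step mentioned in this abstract, here is a minimal sketch of standard global pairwise alignment (the Needleman-Wunsch recurrence) with a linear gap penalty. The scoring values and the `global_align` function name are assumptions chosen for illustration; this is not the suboptimal-alignment method of the thesis itself.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    top, bottom = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


result = global_align("ACGT", "AGT")
```

The traceback here follows a single optimal path. A suboptimal variant would also accept moves whose score falls within some tolerance of the optimum, which is the kind of alternative alignment the abstract proposes to combine.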

48

Wang, Shu 1973. "On multiple sequence alignment." Thesis, 2007. http://hdl.handle.net/2152/3715.

Abstract:

The tremendous increase in biological sequence data presents us with an opportunity to understand the molecular and cellular basis for cellular life. Comparative studies of these sequences have the potential, when applied with sufficient rigor, to decipher the structure, function, and evolution of cellular components. The accuracy and detail of these studies are directly proportional to the quality of the sequence alignments. Given the large number of sequences per family of interest, and the increasing number of families to study, improving the speed, accuracy, and scalability of MSA is becoming an increasingly important task. In the past, much of the interest has been in global MSA. In recent years, the focus has shifted from global MSA to local MSA, which is needed to align variable sequences from different families or species. In this dissertation, we developed two new algorithms for fast and scalable local MSA: a three-way-consistency-based MSA and a biclustering-based MSA. The first algorithm is a three-way Consistency-Based MSA (CBMSA). CBMSA applies alignment consistency heuristics to MSA in the form of a new three-way alignment. While the three-way consistency approach maintains the same time complexity as the traditional pairwise consistency approach, it provides more reliable consistency information and better alignment quality. We quantify the benefit of using three-way consistency compared to pairwise consistency. We have also compared CBMSA to a suite of leading MSA programs, and CBMSA consistently performs favorably. We also developed another new MSA algorithm, a biclustering-based MSA. Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in MSA is that the alignment of sequences is often intended to reveal groups of conserved functional subsequences. Simultaneously, the grouping of the sequences can impact the alignment; this is precisely the kind of dual situation biclustering algorithms are intended to address. We define a representation of the MSA problem that enables the application of biclustering algorithms. We develop a computer program for local MSA, BlockMSA, that combines biclustering with divide-and-conquer. BlockMSA simultaneously finds groups of similar sequences and locally aligns subsequences within them. Further alignment is accomplished by dividing both the set of sequences and their contents. The net result is both a multiple sequence alignment and a hierarchical clustering of the sequences. BlockMSA was compared with a suite of leading MSA programs. With respect to quantitative measures of MSA, BlockMSA scores comparably to or better than the other leading MSA programs. With respect to biological validation of MSA, the other leading MSA programs lag behind BlockMSA in their ability to identify the most highly conserved regions.

49

Tsai, Ping Han, and 蔡秉翰. "Sequence Alignment with Block Constraint." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/82895843637763555950.

Abstract:

Master's
National Tsing Hua University
Institute of Information Systems and Applications
104
In order to determine whether two sequences are similar, we usually perform a pairwise alignment. In bioinformatics, sequence alignment is an important strategy for determining the identity between two DNA, RNA, or protein sequences. Sequence alignment can identify similar regions that may share a similar structure, function, or evolutionary relationship. Compared with the 20-letter protein alphabet, the 4-letter RNA alphabet is smaller and less informative. As a consequence, when the identity between two RNA sequences is under 60%, it is hard to determine whether the two sequences have a similar structure. Thus, to align two RNA molecules, several studies have considered not merely sequence information but also secondary or tertiary structure information. Our lab developed a tool called iPARTS2 in 2016 that aligns two RNA 3D structures based on both primary and tertiary structure information. The basic steps of iPARTS2 are as follows. First, a Ramachandran-like diagram of RNAs was derived by plotting the nucleotides of RNA structures in the PDB database on a 2D axis using their two pseudo-torsion angles η and θ. Then, the affinity propagation clustering algorithm was applied to the η-θ plot to obtain 23 nucleotide conformations, which were combined with the RNA 1D sequence information (A, U, C, and G) to obtain a structural alphabet (SA) of 92 elements. Next, the SA was used to transform RNA 3D structures into 1D sequences of SA letters. Finally, classical sequence alignment methods were applied to the two SA-encoded sequences to determine their structural similarities. However, given two RNA molecules
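The size of the structural alphabet follows from combining 23 conformations with 4 bases (23 × 4 = 92). The sketch below shows one way such a combined letter could be indexed. The index layout, the `sa_letter` and `encode` names, and the integer conformation labels are hypothetical and are not taken from iPARTS2.

```python
BASES = "AUCG"          # the four RNA bases named in the abstract
N_CONFORMATIONS = 23    # number of conformation clusters reported in the abstract


def sa_letter(base, conformation):
    """Map a (base, conformation cluster) pair to a letter index in 0..91.

    The layout (base-major ordering) is an assumption made for illustration.
    """
    if base not in BASES:
        raise ValueError(f"unknown base: {base!r}")
    if not 0 <= conformation < N_CONFORMATIONS:
        raise ValueError(f"conformation out of range: {conformation}")
    return BASES.index(base) * N_CONFORMATIONS + conformation


def encode(residues):
    """Turn a list of (base, conformation) pairs into a 1D sequence of SA letters."""
    return [sa_letter(base, conf) for base, conf in residues]


# Hypothetical residues: each nucleotide paired with an assumed cluster label.
encoded = encode([("A", 0), ("U", 5), ("G", 22)])
alphabet_size = len({sa_letter(b, c) for b in BASES for c in range(N_CONFORMATIONS)})
```

Once a 3D structure is reduced to such a 1D letter sequence, any classical sequence alignment routine can be applied to it, provided a substitution matrix over the 92 letters is available.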

50

Xiao, Bo Weng. "Block Alignment: An Approach for Multiple Sequence Alignment Containing Clusters." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/19616460760964410139.

Abstract:

Master's
National Chi Nan University
Department of Computer Science and Information Engineering
92
Multiple sequence alignment is a fundamental problem in computational molecular biology. It is known to be NP-hard, so finding an optimal solution takes a long time. To obtain an acceptable solution within a reasonable wait, progressive methods are used. These methods perform pairwise alignments and then combine them into a multiple sequence alignment. In this thesis, we focus on multiple sequence alignment of sequences containing clusters. We take a different viewpoint on sequence alignment: every sequence is represented as a matrix. After two sequences (matrices) are aligned, the result of the alignment is again represented by a matrix, and the original two sequences (matrices) are discarded. That is, the result of aligning a set of sequences is always considered a block and represented by a matrix. This differs from earlier approaches, in which only two sequences are aligned, rather than one group of aligned sequences with another group of aligned sequences. We present experimental results to test the proposed method. Block alignment outperforms those progressive methods for sequences containing clusters.
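The abstract does not say what kind of matrix represents a block. The sketch below assumes a per-column symbol-count matrix (a profile), which is one common way to make the result of an alignment have the same type as its inputs. The names `to_block`, `depth`, and `merge_blocks`, the DNA alphabet, and the explicit column pairing are all hypothetical.

```python
ALPHABET = "ACGT-"  # assumed DNA alphabet plus a gap symbol


def to_block(seq):
    """Represent one sequence as a list of columns of symbol counts."""
    return [[1 if symbol == ch else 0 for symbol in ALPHABET] for ch in seq]


def depth(block):
    """Number of sequences summarized by a block (the sum of any one column)."""
    return sum(block[0])


def merge_blocks(block_a, block_b, pairing):
    """Merge two blocks into one, given which columns are aligned.

    pairing: list of (i, j) column indices; None on one side means that
    every sequence in that block receives a gap at this position.
    """
    gap = ALPHABET.index("-")
    depth_a, depth_b = depth(block_a), depth(block_b)
    merged = []
    for i, j in pairing:
        column = [0] * len(ALPHABET)
        for block, index, block_depth in ((block_a, i, depth_a), (block_b, j, depth_b)):
            if index is None:
                column[gap] += block_depth
            else:
                for k, count in enumerate(block[index]):
                    column[k] += count
        merged.append(column)
    return merged


# Align "ACG" with "AG": the middle column of the first block faces a gap.
merged = merge_blocks(to_block("ACG"), to_block("AG"), [(0, 0), (1, None), (2, 1)])
```

Because the merged result is again a block, it can be passed straight back into `merge_blocks` with a third sequence or with another group of aligned sequences, matching the abstract's description of discarding the originals and keeping only the matrix.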
