Log in

Relevant bibliographies by topics / Sequence alignment algorithms / Dissertations / Theses

Dissertations / Theses on the topic 'Sequence alignment algorithms'

To see the other types of publications on this topic, follow the link: Sequence alignment algorithms.

Author: Grafiati

Published: 14 March 2023

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 dissertations / theses for your research on the topic 'Sequence alignment algorithms.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Starrett, Dean. "Optimal Alignment of Multiple Sequence Alignments." Diss., The University of Arizona, 2008. http://hdl.handle.net/10150/194840.

Full text

Abstract:

An essential tool in biology is the alignment of multiple sequences. Biologists use multiple sequence alignments for tasks such as predicting protein structure and function, reconstructing phylogenetic trees, and finding motifs. Constructing high-quality multiple alignments is computationally hard, both in theory and in practice, and is typically done using heuristic methods. The majority of state-of-the-art multiple alignment programs employ a form and polish strategy, where in the construction phase, an initial multiple alignment is formed by progressively merging smaller alignments, starting with single sequences. Then in a local-search phase, the resulting alignment is polished by repeatedly splitting it into smaller alignments and re-merging. This merging of alignments, the basic computational problem in the construction and local-search phases of the best multiple alignment heuristics, is called the Aligning Alignments Problem. Under the sum-of-pairs objective for scoring multiple alignments, this problem may seem to be a simple extension of two-sequence alignment. It is proven here, however, that with affine gap costs (which are recognized as necessary to get biologically-informative alignments) the problem is NP-complete when gaps are counted exactly. Interestingly, this form of multiple alignment is polynomial-time solvable when we relax the exact count, showing that exact gap counts themselves are inherently hard in multiple sequence alignment. Unlike general multiple alignment however, we show that Aligning Alignments with affine gap costs and exact counts is tractable in practice, by demonstrating an effective algorithm and a fast implementation. Our software AlignAlign is both time- and space-efficient on biological data. Computational experiments on biological data show instances derived from standard benchmark suites can be optimally aligned with surprising efficiency, and experiments on simulated data show the time and space both scale well.

APA, Harvard, Vancouver, ISO, and other styles

2

Powell, David Richard 1973. "Algorithms for sequence alignment." Monash University, School of Computer Science and Software Engineering, 2001. http://arrow.monash.edu.au/hdl/1959.1/8051.

Full text

APA, Harvard, Vancouver, ISO, and other styles

3

Arvestad, Lars. "Algorithms for biological sequence alignment." Doctoral thesis, KTH, Numerisk analys och datalogi, NADA, 1999. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-2905.

Full text

APA, Harvard, Vancouver, ISO, and other styles

4

Ho, Ngai-lam, and 何毅林. "Algorithms on constrained sequence alignment." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B30201949.

Full text

APA, Harvard, Vancouver, ISO, and other styles

5

Alimehr, Leila. "The Performance of Sequence Alignment Algorithms." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-200289.

Full text

Abstract:

This thesis deals with sequence alignment algorithms. The sequence alignment is a mutual arrange of two or more sequences in order to study their similarity and dissimilarity. Four decades after the seminal work by Needleman and Wunsch in 1970, these methods still need more explorations. We start out with a review of a sequence alignment, and its generalization to multiple alignments, although the focus of this thesis is on the evaluation of the new alignment algorithms. The research presented here in has stepped into the different algorithms that are in terms of the dynamic programming. In the study of sequence alignment algorithms, two powerful techniques have been invented. According to the simulations, the new algorithms are shown to be extremely efficient for the comparing DNA sequences. All the sequence alignment algorithmsare compared in terms of the distance. We use the programming language R for the implementation and simulation of the algorithms discussed in this thesis.

APA, Harvard, Vancouver, ISO, and other styles

6

Nguyen, Ken D. "Multiple Biolgical Sequence Alignment: Scoring Functions, Algorithms, and Evaluations." Digital Archive @ GSU, 2011. http://digitalarchive.gsu.edu/cs_diss/62.

Full text

Abstract:

Aligning multiple biological sequences such as protein sequences or DNA/RNA sequences is a fundamental task in bioinformatics and sequence analysis. These alignments may contain invaluable information that scientists need to predict the sequences' structures, determine the evolutionary relationships between them, or discover drug-like compounds that can bind to the sequences. Unfortunately, multiple sequence alignment (MSA) is NP-Complete. In addition, the lack of a reliable scoring method makes it very hard to align the sequences reliably and to evaluate the alignment outcomes. In this dissertation, we have designed a new scoring method for use in multiple sequence alignment. Our scoring method encapsulates stereo-chemical properties of sequence residues and their substitution probabilities into a tree-structure scoring scheme. This new technique provides a reliable scoring scheme with low computational complexity. In addition to the new scoring scheme, we have designed an overlapping sequence clustering algorithm to use in our new three multiple sequence alignment algorithms. One of our alignment algorithms uses a dynamic weighted guidance tree to perform multiple sequence alignment in progressive fashion. The use of dynamic weighted tree allows errors in the early alignment stages to be corrected in the subsequence stages. Other two algorithms utilize sequence knowledge-bases and sequence consistency to produce biological meaningful sequence alignments. To improve the speed of the multiple sequence alignment, we have developed a parallel algorithm that can be deployed on reconfigurable computer models. Analytically, our parallel algorithm is the fastest progressive multiple sequence alignment algorithm.

APA, Harvard, Vancouver, ISO, and other styles

7

Jiang, Tianwei. "Sequence alignment : algorithm development and applications /." View abstract or full-text, 2009. http://library.ust.hk/cgi/db/thesis.pl?ECED%202009%20JIANG.

Full text

APA, Harvard, Vancouver, ISO, and other styles

8

Lu, Yue. "Improving the quality of multiple sequence alignment." [College Station, Tex. : Texas A&M University, 2008. http://hdl.handle.net/1969.1/ETD-TAMU-3111.

Full text

APA, Harvard, Vancouver, ISO, and other styles

9

Isa, Mohammad Nazrin. "High performance reconfigurable architectures for biological sequence alignment." Thesis, University of Edinburgh, 2013. http://hdl.handle.net/1842/7721.

Full text

Abstract:

Bioinformatics and computational biology (BCB) is a rapidly developing multidisciplinary field which encompasses a wide range of domains, including genomic sequence alignments. It is a fundamental tool in molecular biology in searching for homology between sequences. Sequence alignments are currently gaining close attention due to their great impact on the quality aspects of life such as facilitating early disease diagnosis, identifying the characteristics of a newly discovered sequence, and drug engineering. With the vast growth of genomic data, searching for a sequence homology over huge databases (often measured in gigabytes) is unable to produce results within a realistic time, hence the need for acceleration. Since the exponential increase of biological databases as a result of the human genome project (HGP), supercomputers and other parallel architectures such as the special purpose Very Large Scale Integration (VLSI) chip, Graphic Processing Unit (GPUs) and Field Programmable Gate Arrays (FPGAs) have become popular acceleration platforms. Nevertheless, there are always trade-off between area, speed, power, cost, development time and reusability when selecting an acceleration platform. FPGAs generally offer more flexibility, higher performance and lower overheads. However, they suffer from a relatively low level programming model as compared with off-the-shelf microprocessors such as standard microprocessors and GPUs. Due to the aforementioned limitations, the need has arisen for optimized FPGA core implementations which are crucial for this technology to become viable in high performance computing (HPC). This research proposes the use of state-of-the-art reprogrammable system-on-chip technology on FPGAs to accelerate three widely-used sequence alignment algorithms; the Smith-Waterman with affine gap penalty algorithm, the profile hidden Markov model (HMM) algorithm and the Basic Local Alignment Search Tool (BLAST) algorithm. The three novel aspects of this research are firstly that the algorithms are designed and implemented in hardware, with each core achieving the highest performance compared to the state-of-the-art. Secondly, an efficient scheduling strategy based on the double buffering technique is adopted into the hardware architectures. Here, when the alignment matrix computation task is overlapped with the PE configuration in a folded systolic array, the overall throughput of the core is significantly increased. This is due to the bound PE configuration time and the parallel PE configuration approach irrespective of the number of PEs in a systolic array. In addition, the use of only two configuration elements in the PE optimizes hardware resources and enables the scalability of PE systolic arrays without relying on restricted onboard memory resources. Finally, a new performance metric is devised, which facilitates the effective comparison of design performance between different FPGA devices and families. The normalized performance indicator (speed-up per area per process technology) takes out advantages of the area and lithography technology of any FPGA resulting in fairer comparisons. The cores have been designed using Verilog HDL and prototyped on the Alpha Data ADM-XRC-5LX card with the Virtex-5 XC5VLX110-3FF1153 FPGA. The implementation results show that the proposed architectures achieved giga cell updates per second (GCUPS) performances of 26.8, 29.5 and 24.2 respectively for the acceleration of the Smith-Waterman with affine gap penalty algorithm, the profile HMM algorithm and the BLAST algorithm. In terms of speed-up improvements, comparisons were made on performance of the designed cores against their corresponding software and the reported FPGA implementations. In the case of comparison with equivalent software execution, acceleration of the optimal alignment algorithm in hardware yielded an average speed-up of 269x as compared to the SSEARCH 35 software. For the profile HMM-based sequence alignment, the designed core achieved speed-up of 103x and 8.3x against the HMMER 2.0 and the latest version of HMMER (version 3.0) respectively. On the other hand, the implementation of the gapped BLAST with the two-hit method in hardware achieved a greater than tenfold speed-up compared to the latest NCBI BLAST software. In terms of comparison against other reported FPGA implementations, the proposed normalized performance indicator was used to evaluate the designed architectures fairly. The results showed that the first architecture achieved more than 50 percent improvement, while acceleration of the profile HMM sequence alignment in hardware gained a normalized speed-up of 1.34. In the case of the gapped BLAST with the two-hit method, the designed core achieved 11x speed-up after taking out advantages of the Virtex-5 FPGA. In addition, further analysis was conducted in terms of cost and power performances; it was noted that, the core achieved 0.46 MCUPS per dollar spent and 958.1 MCUPS per watt. This shows that FPGAs can be an attractive platform for high performance computation with advantages of smaller area footprint as well as represent economic ‘green’ solution compared to the other acceleration platforms. Higher throughput can be achieved by redeploying the cores on newer, bigger and faster FPGAs with minimal design effort.

APA, Harvard, Vancouver, ISO, and other styles

10

Zhou, Rong. "Memory-efficient graph search applied to multiple sequence alignment." Diss., Mississippi State : Mississippi State University, 2005. http://library.msstate.edu/etd/show.asp?etd=etd-06282005-015428.

Full text

APA, Harvard, Vancouver, ISO, and other styles

11

Lassmann, Timo. "Algorithms for building and evaluating multiple sequence alignments /." Stockholm, 2006. http://diss.kib.ki.se/2006/91-7140-887-8/.

Full text

APA, Harvard, Vancouver, ISO, and other styles

12

Rumbold, A. "Multiple sequence alignment algorithms for the phylogenic analysis of chloroplast DNA." Thesis, Honours thesis, University of Tasmania, 2004. https://eprints.utas.edu.au/100/1/HonoursThesis_final.pdf.

Full text

Abstract:

The application of approximate string matching and alignment algorithms to either DNA or amino acid (protein) sequences is important for determining conserved regions, functional sites and to allow for multiple sequence alignments from which an evolutionary (phylogenic) tree may be inferred. Global sequence alignment algorithms attempt to maximise the alignment score by placing gaps (which are seen as insertion/deletion evolutionary events) in either sequence so as to maximise the number of matching characters and minimise mismatches and gaps. The extension of traditional dynamic programming algorithms for aligning two sequences to aligning N sequences leads to a polynomial increase in the space and time complexity. Consequently many heuristic multiple sequence alignment algorithms, and improvements in the representation of a multiple sequence alignment, have been developed. This honours project has focussed on multiple sequence alignment algorithms, their processing and space requirements and the suitability of the alignments of samples of chloroplast DNA to further phylogenic analysis. Since the samples being used in the analysis are hypervariable, this research has also looked at algorithms capable of handling inversions in the DNA sequences (where a section of DNA has undergone the mutation of reversing).

APA, Harvard, Vancouver, ISO, and other styles

13

Gharazi, Elham. "IMAP : design and implementation of an interactive and integrative programming environment for progressive multiple sequence alignment and phylogeny reconstruction." Phd thesis, Faculty of Engineering and Information Technologies, 2011. http://hdl.handle.net/2123/9328.

Full text

APA, Harvard, Vancouver, ISO, and other styles

14

Ngxande, Mkhuseli. "Development of high performance computing cluster for evaluation of sequence alignment algorithms." Thesis, University of Fort Hare, 2015. http://hdl.handle.net/10353/d1020163.

Full text

Abstract:

As the biological databases are increasing rapidly, there is a challenge for both Biologists and Computer Scientists to develop algorithms and databases to manage the increasing data. There are many algorithms developed to align the sequences stored in biological databases - some take time to process the data while others are inefficient to produce reasonable results. As more data is generated, and time consuming algorithms are developed to handle them, there is a need for specialized computers to handle the computations. Researchers are typically limited by the computational power of their computers. High Performance Computing (HPC) field addresses this challenge and can be used in a cost-effective manner where there is no need for expensive equipment, instead old computers can be used together to form a powerful system. This is the premise of this research, wherein the setup of a low-cost Beowulf cluster is explored, with the subsequent evaluation of its performance for processing sequent alignment algorithms. A mixed method methodology is used in this dissertation, which consists of literature study, theoretical and practise based system. This mixed method methodology also have a proof and concept where the Beowulf cluster is designed and implemented to perform the sequence alignment algorithms and also the performance test. This dissertation firstly gives an overview of sequence alignment algorithms that are already developed and also highlights their timeline. A presentation of the design and implementation of the Beowulf Cluster is highlighted and this is followed by the experiments on the baseline performance of the cluster. A detailed timeline of the sequence alignment algorithms is given and also the comparison between ClustalW-MPI and T-Coffee (Tree-based Consistency Objective Function For alignment Evaluation) algorithm is presented as part of the findings in the research study. The efficiency of the cluster was observed to be 19.8%, this percentage is unexpected because the predicted efficiency is 83.3%, which is found in the theoretical cluster calculator. The theoretical performance of the cluster showed a high performance as compared with the experimental performance, this is attributable to the slow network, which was 100Mbps, low processor speed of 2.50 GHz, and low memory of 2 Gigabytes.

APA, Harvard, Vancouver, ISO, and other styles

15

Yim, Cheuk-hon Terence. "Approximate string alignment and its application to ESTs, mRNAs and genome mapping." Click to view the E-thesis via HKUTO, 2004. http://sunzi.lib.hku.hk/hkuto/record/B31455736.

Full text

APA, Harvard, Vancouver, ISO, and other styles

16

Zhao, Zhiyu. "Robust and Efficient Algorithms for Protein 3-D Structure Alignment and Genome Sequence Comparison." ScholarWorks@UNO, 2008. http://scholarworks.uno.edu/td/851.

Full text

Abstract:

Sequence analysis and structure analysis are two of the fundamental areas of bioinformatics research. This dissertation discusses, specifically, protein structure related problems including protein structure alignment and query, and genome sequence related problems including haplotype reconstruction and genome rearrangement. It first presents an algorithm for pairwise protein structure alignment that is tested with structures from the Protein Data Bank (PDB). In many cases it outperforms two other well-known algorithms, DaliLite and CE. The preliminary algorithm is a graph-theory based approach, which uses the concept of \stars" to reduce the complexity of clique-finding algorithms. The algorithm is then improved by introducing \double-center stars" in the graph and applying a self-learning strategy. The updated algorithm is tested with a much larger set of protein structures and shown to be an improvement in accuracy, especially in cases of weak similarity. A protein structure query algorithm is designed to search for similar structures in the PDB, using the improved alignment algorithm. It is compared with SSM and shows better performance with lower maximum and average Q-score for missing proteins. An interesting problem dealing with the calculation of the diameter of a 3-D sequence of points arose and its connection to the sublinear time computation is discussed. The diameter calculation of a 3-D sequence is approximated by a series of sublinear time deterministic, zero-error and bounded-error randomized algorithms and we have obtained a series of separations about the power of sublinear time computations. This dissertation also discusses two genome sequence related problems. A probabilistic model is proposed for reconstructing haplotypes from SNP matrices with incomplete and inconsistent errors. The experiments with simulated data show both high accuracy and speed, conforming to the theoretically provable e ciency and accuracy of the algorithm. Finally, a genome rearrangement problem is studied. The concept of non-breaking similarity is introduced. Approximating the exemplar non-breaking similarity to factor n1..f is proven to be NP-hard. Interestingly, for several practical cases, several polynomial time algorithms are presented.

APA, Harvard, Vancouver, ISO, and other styles

17

Helal, Manal Computer Science &amp Engineering Faculty of Engineering UNSW. "Indexing and partitioning schemes for distributed tensor computing with application to multiple sequence alignment." Awarded by:University of New South Wales. Computer Science & Engineering, 2009. http://handle.unsw.edu.au/1959.4/44781.

Full text

Abstract:

This thesis investigates indexing and partitioning schemes for high dimensional scientific computational problems. Building on the foundation offered by Mathematics of Arrays (MoA) for tensor-based computation, the ultimate contribution of the thesis is a unified partitioning scheme that works invariant of the dataset dimension and shape. Consequently, portability is ensured between different high performance machines, cluster architectures, and potentially computational grids. The Multiple Sequence Alignment (MSA) problem in computational biology has an optimal dynamic programming based solution, but it becomes computationally infeasible as its dimensionality (the number of sequences) increases. Even sub-optimal approximations may be unmanageable for more than eight sequences. Furthermore, no existing MSA algorithms have been formulated in a manner invariant over the number of sequences. This thesis presents an optimal distributed MSA method based on MoA. The latter offers a set of constructs that help represent multidimensional arrays in memory in a linear, concise and efficient way. Using MoA allows the partitioning of the dynamic programming algorithm to be expressed independently of dimension. MSA is the highest dimensional scientific problem considered for MoA-based partitioning to date. Two partitioning schemes are presented: the first is a master/slave approach which is based on both master/slave scheduling and slave/slave coupling. The second approach is a peer-to-peer design, in which the scheduling and dependency communication are calculated independently by each process, with no need for a master scheduler. A search space reduction technique is introduced to cater for the exponential expansion as the problem dimensionality increases. This technique relies on defining a hyper-diagonal through the tensor space, and choosing a band of neighbouring partitions around the diagonal to score. In contrast, other sub-optimal methods in the literature only consider projections on the surface of the hyper-cube. The resulting massively parallel design produces a scalable solution that has been implemented on high performance machines and cluster architectures. Experimental results for these implementations are presented for both simulated and real datasets. Comparisons between the reduced search space technique of this thesis with other sub-optimal methods for the MSA problem are presented.

APA, Harvard, Vancouver, ISO, and other styles

18

Buckingham, Lawrence. "K-mer based algorithms for biological sequence comparison and search." Thesis, Queensland University of Technology, 2022. https://eprints.qut.edu.au/236377/1/Buckingham%2BThesis%281%29.pdf.

Full text

Abstract:

The present thesis develops novel algorithms for biological sequence comparison and accelerated sequence database search, motivated by the need to work effectively with the rapidly expanding volume of sequence data which is available as the result of continuing advances in sequencing technology. Empirical tests using datasets of realistic size and content demonstrate that these algorithms are approximately an order of magnitude faster than standard sequence database search tools, while attaining higher precision. While the algorithms have been developed and tested in a biological context, they are applicable to any problem involving comparison of sequential data series.

APA, Harvard, Vancouver, ISO, and other styles

19

Yim, Cheuk-hon Terence, and 嚴卓漢. "Approximate string alignment and its application to ESTs, mRNAs and genome mapping." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B31455736.

Full text

APA, Harvard, Vancouver, ISO, and other styles

20

Darbha, Sriram. "RNA Homology Searches Using Pair Seeding." Thesis, University of Waterloo, 2005. http://hdl.handle.net/10012/1172.

Full text

Abstract:

Due to increasing numbers of non-coding RNA (ncRNA) being discovered recently, there is interest in identifying homologs of a given structured RNA sequence. Exhaustive homology searching for structured RNA molecules using covariance models is infeasible on genome-length sequences. Hence, heuristic methods are employed, but they largely ignore structural information in the query. We present a novel method, which uses secondary structure information, to perform homology searches for a structured RNA molecule. We define the concept of a pair seed and theoretically model alignments of random and related paired regions to compute expected sensitivity and specificity. We show that our method gives theoretical gains in sensitivity and specificity compared to a BLAST-based heuristic approach. We provide experimental verification of this gain.

We also show that pair seeds can be effectively combined with the spaced seeds approach to nucleotide homology search. The hybrid search method has theoretical specificity superior to that of the BLAST seed. We provide experimental evaluation of our hypotheses. Finally, we note that our method is easily modified to process pseudo-knotted regions in the query, something outside the scope of covariance model based methods.

APA, Harvard, Vancouver, ISO, and other styles

21

Ticona, Waldo Gonzalo Cancino. "Aplicação de algoritmos genéricos multi-objetivo para alinhamento de seqüências biológicas." Universidade de São Paulo, 2003. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-09052003-215914/.

Full text

Abstract:

O alinhamento de seqüências biológicas é uma operação básica em Bioinformática, já que serve como base para outros processos como, por exemplo, a determinação da estrutura tridimensional das proteínas. Dada a grande quantidade de dados presentes nas seqüencias, são usadas técnicas matemáticas e de computação para realizar esta tarefa. Tradicionalmente, o Problema de Alinhamento de Seqüências Biológicas é formulado como um problema de otimização de objetivo simples, onde alinhamento de maior semelhança, conforme um esquema de pontuação, é procurado. A Otimização Multi-Objetivo aborda os problemas de otimização que possuem vários critérios a serem atingidos. Para este tipo de problema, existe um conjunto de soluções que representam um "compromiso" entre os objetivos. Uma técnica que se aplica com sucesso neste contexto são os Algoritmos Evolutivos, inspirados na Teoria da Evolução de Darwin, que trabalham com uma população de soluções que vão evoluindo até atingirem um critério de convergência ou de parada. Este trabalho formula o Problema de Alinhamento de Seqüências Biológicas como um Problema de Otimização Multi-Objetivo, para encontrar um conjunto de soluções que representem um compromisso entre a extensão e a qualidade das soluções. Aplicou-se vários modelos de Algoritmos Evolutivos para Otimização Multi-Objetivo. O desempenho de cada modelo foi avaliado por métricas de performance encontradas na literatura.
The Biological Sequence Alignment is a basic operation in Bioinformatics since it serves as a basis for other processes, i.e. determination of the protein's three-dimensional structure. Due to the large amount of data involved, mathematical and computational methods have been used to solve this problem. Traditionally, the Biological Alignment Sequence Problem is formulated as a single optimization problem. Each solution has a score that reflects the similarity between sequences. Then, the optimization process looks for the best score solution. The Multi-Objective Optimization solves problems with multiple objectives that must be reached. Frequently, there is a solution set that represents a trade-off between the objectives. Evolutionary Algorithms, which are inspired by Darwin's Evolution Theory, have been applied with success in solving this kind of problems. This work formulates the Biological Sequence Alignment as a Multi-Objective Optimization Problem in order to find a set of solutions that represent a trade-off between the extension and the quality of the solutions. Several models of Evolutionary Algorithms for Multi-Objetive Optimization have been applied and were evaluated using several performance metrics found in the literature.

APA, Harvard, Vancouver, ISO, and other styles

22

Yin, Zhaoming. "Enhance the understanding of whole-genome evolution by designing, accelerating and parallelizing phylogenetic algorithms." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/51875.

Full text

Abstract:

The advent of new technology enhance the speed and reduce the cost for sequencing biological data. Making biological sense of this genomic data is a big challenge to the algorithm design as well as the high performance computing society. There are many problems in Bioinformatics, such as how new functional genes arise, why genes are organized into chromosomes, how species are connected through the evolutionary tree of life, or why arrangements are subject to change. Phylogenetic analyses have become essential to research on the evolutionary tree of life. It can help us to track the history of species and the relationship between different genes or genomes through millions of years. One of the fundamentals for phylogenetic construction is the computation of distances between genomes. Since there are much more complicated combinatoric patterns in rearrangement events, the distance computation is still a hot topic as much belongs to mathematics as to biology. For the distance computation with input of two genomes containing unequal gene contents (with insertions/deletions and duplications) the problem is especially hard. In this thesis, we will discuss about our contributions to the distance estimation for unequal gene order data. The problem of finding the median of three genomes is the key process in building the most parsimonious phylogenetic trees from genome rearrangement data. For genomes with unequal contents, to the best of our knowledge, there is no algorithm that can help to find the median. In this thesis, we make our contributions to the median computation in two aspects. 1) Algorithm engineering aspect, we harness the power of streaming graph analytics methods to implement an exact DCJ median algorithm which run as fast as the heuristic algorithm and can help construct a better phylogenetic tree. 2) Algorithmic aspect, we theoretically formulate the problem of finding median with input of genomes having unequal gene content, which leads to the design and implementation of an efficient Lin-Kernighan heuristic based median algorithm. Inferring phylogenies (evolutionary history) of a set of given species is the ultimate goal when the distance and median model are chosen. For more than a decade, biologists and computer scientists have studied how to infer phylogenies by the measurement of genome rearrangement events using gene order data. While evolution is not an inherently parsimonious process, maximum parsimony (MP) phylogenetic analysis has been supported by widely applied to the phylogeny inference to study the evolutionary patterns of genome rearrangements. There are generally two problems with the MP phylogenetic arose by genome rearrangement: One is, given a set of modern genomes, how to compute the topologies of the according phylogenetic tree; Another is, given the topology of a model tree, how to infer the gene orders of the ancestor species. To assemble a MP phylogenetic tree constructor, there are multiple NP hard problems involved, unfortunately, they organized as one problem on top of other problems. Which means, to solve a NP hard problem, we need to solve multiple NP hard sub-problems. For phylogenetic tree construction with the input of unequal content genomes, there are three layers of NP hard problems. In this thesis, we will mainly discuss about our contributions to the design and implementation of the software package DCJUC (Phylogeny Inference using DCJ model to cope with Unequal Content Genomes), that can help to achieve both of these two goals. Aside from the biological problems, another issue we need to concern is about the use of the power of parallel computing to assist accelerating algorithms to handle huge data sets, such as the high resolution gene order data. For one thing, all of the method to tackle with phylogenetic problems are based on branch and bound algorithms, which are quite irregular and unfriendly to parallel computing. To parallelize these algorithms, we need to properly enhance the efficiency for localized memory access and load balance methods to make sure that each thread can put their potentials into full play. For the other, there is a revolution taking place in computing with the availability of commodity graphical processors such as Nvidia GPU and with many-core CPUs such as Cray-XMT, or Intel Xeon Phi Coprocessor with 60 cores. These architectures provide a new way for us to achieve high performance at much lower cost. However, code running on these machines are not so easily programmed, and scientific computing is hard to tune well on them. We try to explore the potentials of these architectures to help us accelerate branch and bound based phylogenetic algorithms.

APA, Harvard, Vancouver, ISO, and other styles

23

Furcy, David Andre. "Speeding Up the Convergence of Online Heuristic Search and Scaling Up Offline Heuristic Search." Diss., Georgia Institute of Technology, 2004. http://hdl.handle.net/1853/4855.

Full text

Abstract:

The most popular methods for solving the shortest-path problem in Artificial Intelligence are heuristic search algorithms. The main contributions of this research are new heuristic search algorithms that are either faster or scale up to larger problems than existing algorithms. Our contributions apply to both online and offline tasks. For online tasks, existing real-time heuristic search algorithms learn better informed heuristic values and in some cases eventually converge to a shortest path by repeatedly executing the action leading to a successor state with a minimum cost-to-goal estimate. In contrast, we claim that real-time heuristic search converges faster to a shortest path when it always selects an action leading to a state with a minimum f-value, where the f-value of a state is an estimate of the cost of a shortest path from start to goal via the state, just like in the offline A* search algorithm. We support this claim by implementing this new non-trivial action-selection rule in FALCONS and by showing empirically that FALCONS significantly reduces the number of actions to convergence of a state-of-the-art real-time search algorithm. For offline tasks, we improve on two existing ways of scaling up best-first search to larger problems. First, it is known that the WA* algorithm (a greedy variant of A*) solves larger problems when it is either diversified (i.e., when it performs expansions in parallel) or committed (i.e., when it chooses the state to expand next among a fixed-size subset of the set of generated but unexpanded states). We claim that WA* solves even larger problems when it is enhanced with both diversity and commitment. We support this claim with our MSC-KWA* algorithm. Second, it is known that breadth-first search solves larger problems when it prunes unpromising states, resulting in the beam search algorithm. We claim that beam search quickly solves even larger problems when it is enhanced with backtracking based on limited discrepancy search. We support this claim with our BULB algorithm. We show that both MSC-KWA* and BULB scale up to larger problems than several state-of-the-art offline search algorithms in three standard benchmark domains. Finally, we present an anytime variant of BULB and apply it to the multiple sequence alignment problem in biology.

APA, Harvard, Vancouver, ISO, and other styles

24

Liakhovitch, Evgueni. "Genetic algorithm using restricted sequence alignments." Ohio : Ohio University, 2000. http://www.ohiolink.edu/etd/view.cgi?ohiou1172598174.

Full text

APA, Harvard, Vancouver, ISO, and other styles

25

Zafalon, Geraldo Francisco Donegá. "Aplicação de estratégias híbridas em algoritmos de alinhamento múltiplo de sequências para ambientes de computação paralela e distribuída." Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/3/3141/tde-28082015-120515/.

Full text

Abstract:

A Bioinformática tem se desenvolvido de forma intensa nos últimos anos. A necessidade de se processar os grandes conjuntos de sequências, sejam de nucleotídeos ou de aminoácidos, tem estimulado o desenvolvimento de diversas técnicas algorítmicas, de modo a tratar este problema de maneira factível. Os algoritmos de alinhamento de alinhamento múltiplo de sequências assumiram um papel primordial, tornando a execução de alinhamentos de conjuntos com mais de duas sequencias uma tarefa viável computacionalmente. No entanto, com o aumento vertiginoso tanto da quantidade de sequencias em um determinado conjunto, quanto do comprimento dessas sequencias, a utilização desses algoritmos de alinhamento múltiplo, sem o acoplamento de novas estratégias, tornou-se algo impraticável. Consequentemente, a computação de alto desempenho despontou como um dos recursos a serem utilizados, através da paralelização de diversas estratégias para sua execução em grandes sistemas computacionais. Além disso, com a contínua expansão dos conjuntos de sequências, outras estratégias de otimização passaram a ser agregadas aos algoritmos de alinhamento múltiplo paralelos. Com isso, o desenvolvimento de ferramentas para alinhamento múltiplo de sequencias baseadas em abordagens híbridas destaca-se, atualmente, como a solução com melhor aceitação. Assim, no presente trabalho, pode-se verificar o desenvolvimento de uma estratégia híbrida para os algoritmos de alinhamento múltiplo progressivos, cuja utilização e amplamente difundida, em Bioinformática. Nesta abordagem, conjugou-se a paralelização e o particionamento dos conjuntos de sequências, na fase de construção da matriz de pontuação, e a otimização das fases de construção da árvore filogenética e de alinhamento múltiplo, através dos algoritmos de colônia de formigas e simulated annealling paralelo, respectivamente.
Bioinformatics has been developed in a fast way in the last years. The need for processing large sequences sets, either nucleotides or aminoacids, has stimulated the development of many algorithmic techniques, to solve this problem in a feasible way. Multiple sequence alignment algorithms have played an important role, because with the reduced computational complexity provided by them, it is possible to perform alignments with more than two sequences. However, with the fast growing of the amount and length of sequences in a set, the use of multiple alignment algorithms without new optimization strategies became almost impossible. Therefore, high performance computing has emerged as one of the features being used, through the parallelization of many strategies for execution in large computational systems. Moreover, with the continued expansion of sequences sets, other optimization strategies have been coupled with parallel multiple sequence alignments. Thus, the development of multiple sequences alignment tools based on hybrid strategies has been considered the solution with the best results. In this work, we present the development of a hybrid strategy to progressive multiple sequence alignment, where its using is widespread in Bioinformatics. In this approach, we have aggregated the parallelization and the partitioning of sequences sets in the score matrix calculation stage, and the optimization of the stages of the phylogenetic tree reconstruction and multiple alignment through ant colony and parallel simulated annealing algorithms, respectively.

APA, Harvard, Vancouver, ISO, and other styles

26

Garriga, Nogales Edgar 1990. "New algorithmic contributions for large scale multiple sequence alignments of protein sequences." Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2022. http://hdl.handle.net/10803/673526.

Full text

Abstract:

In these days of significant changes and the rapid evolution of technology, the amount of datascience has to deal with the growth incredibly fast, and the size of data could be prohibitive.Multiple Sequence Alignments (MSA) are used in various areas of biology, and the increase ofdata has produced a degradation of the methods. That is why is proposed a new solution toperform the MSA. This novel paradigm allows the alignment of millions of sequences and theability to modularize the process. Regressive enables the parallelization of the process and thecombination of clustering methods (guide-tree) with whatever aligner is desired. On theclustering side, the guide-tree has to be rethought. A study of the current state of the methodsand their strength and weaknesses have been performed to shed some light on the topic. Theguide-tree cannot be the bottleneck, and it should provide a good starting point for the aligners.
En aquests dies de profunds canvis i una ràpida evolució de la tecnologia, la quantitat de dataque la ciència ha de treballar ha crescut increïblement ràpid i la grandària dels arxius ha crescutde manera quasi prohibitiva.Els alineaments múltiples de seqüència (MSA) es fan servir endiverses àrees de la biologia, i l'increment de les dades ha produït una degradació delsresultats. És per això, que es proposa una nova estratègia per realitzar els alineaments. Aquestnou paradigma permet alinear milions de seqüències i l'opcio de modularitzar el procés.'Regressive' permet la paral·lelització del procés i la combinació de diferents algoritmesd'agrupacio (guide-tree) amb el mètode de alineament que és desitgi. Dins del camp del'agrupació, s'ha de repensar l'estratègia per crear els guide-tree. Un estudi sobre l'estat actualdels mètodes i les seves virtuts i punts febles ha sigut realitzar per llençar una mica de llum enaquesta àrea. Els 'guide-tree' no poden ser el coll de botella, i haurien de servir per començarde la millor manera possible el procés d'alineament.

APA, Harvard, Vancouver, ISO, and other styles

27

Cunial, Fabio. "Analysis of the subsequence composition of biosequences." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/44716.

Full text

Abstract:

Measuring the amount of information and of shared information in biological strings, as well as relating information to structure, function and evolution, are fundamental computational problems in the post-genomic era. Classical analyses of the information content of biosequences are grounded in Shannon's statistical telecommunication theory, while the recent focus is on suitable specializations of the notions introduced by Kolmogorov, Chaitin and Solomonoff, based on data compression and compositional redundancy. Symmetrically, classical estimates of mutual information based on string editing are currently being supplanted by compositional methods hinged on the distribution of controlled substructures. Current compositional analyses and comparisons of biological strings are almost exclusively limited to short sequences of contiguous solid characters. Comparatively little is known about longer and sparser components, both from the point of view of their effectiveness in measuring information and in separating biological strings from random strings, and from the point of view of their ability to classify and to reconstruct phylogenies. Yet, sparse structures are suspected to grasp long-range correlations and, at short range, they are known to encode signatures and motifs that characterize molecular families. In this thesis, we introduce and study compositional measures based on the repertoire of distinct subsequences of any length, but constrained to occur with a predefined maximum gap between consecutive symbols. Such measures highlight previously unknown laws that relate subsequence abundance to string length and to the allowed gap, across a range of structurally and functionally diverse polypeptides. Measures on subsequences are capable of separating only few amino acid strings from their random permutations, but they reveal that random permutations themselves amass along previously undetected, linear loci. This is perhaps the first time in which the vocabulary of all distinct subsequences of a set of structurally and functionally diverse polypeptides is systematically counted and analyzed. Another objective of this thesis is measuring the quality of phylogenies based on the composition of sparse structures. Specifically, we use a set of repetitive gapped patterns, called motifs, whose length and sparsity have never been considered before. We find that extremely sparse motifs in mitochondrial proteomes support phylogenies of comparable quality to state-of-the-art string-based algorithms. Moving from maximal motifs -- motifs that cannot be made more specific without losing support -- to a set of generators with decreasing size and redundancy, generally degrades classification, suggesting that redundancy itself is a key factor for the efficient reconstruction of phylogenies. This is perhaps the first time in which the composition of all motifs of a proteome is systematically used in phylogeny reconstruction on a large scale. Extracting all maximal motifs, or even their compact generators, is infeasible for entire genomes. In the last part of this thesis, we study the robustness of measures of similarity built around the dictionary of LZW -- the variant of the LZ78 compression algorithm proposed by Welch -- and of some of its recently introduced gapped variants. These algorithms use a very small vocabulary, they perform linearly in the input strings, and they can be made even faster than LZ77 in practice. We find that dissimilarity measures based on maximal strings in the dictionary of LZW support phylogenies that are comparable to state-of-the-art methods on test proteomes. Introducing a controlled proportion of gaps does not degrade classification, and allows to discard up to 20% of each input proteome during comparison.

APA, Harvard, Vancouver, ISO, and other styles

28

Rinaudo, Philippe. "Algorithmique de l'alignement structure-séquence d'ARN : une approche générale et paramétrée." Phd thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00847745.

Full text

Abstract:

L'alignement de macromolécules biologiques comme les protéines, l'ADN ou encore l'ARN est une problématique biologique et bio-informatique qui a pour but de révéler une partie des mystères du fonctionnement des cellules, constituants des êtres vivants. Les ARN non-codant sont des macromolécules intervenant dans le métabolisme de tout être vivant et les deux problématiques majeurs les concernant sont: la prédiction de leur structure pour mieux comprendre leur fonctionnement et leur détection dans des bases de données ou des génomes. L'une des approches: l'alignement structure-séquence d'ARN, répond à ces deux problématiques. Le problème d'alignement structure-séquence consiste à aligner une structure connue d'un premier ARN avec la séquence d'un deuxième ARN.La structure est représentée sous la forme d'un graphe ou de façon équivalente sous la forme d'une séquence arc-annotées et la séquence représente la suite des nucléotides de l'ARN.Pour résoudre ce problème, nous cherchons à optimiser l'alignement selon une fonction de coût. C'est donc un problème d'optimisation, qui malheureusement se révèle NP-Difficile.En conséquence différents travaux définissent des classes d'instances réduites pour lesquelles ils proposent des algorithmes spécifiques mais à complexités polynomiales.Les travaux de ma thèse unifient et la généralisent les approches précédentes par la construction d'un algorithme à complexité paramétrée non spécifique à une classe d'instances. En utilisant cet algorithme, il est possible de résoudre le problème d'alignement structure-séquence pour toutes les instances possibles, et aussi efficacement que les précédentes approches sur leur domaine de résolution respectif. Cet algorithme utilise une technique empruntée à la théorie des graphes: la décomposition arborescente, c'est-à-dire qu'il transforme la structure donnée en une décomposition arborescente et c'est ensuite cette décomposition qui est alignée avec la séquence donnée. L'alignement entre une décomposition arborescente et une séquence se fait par programmation dynamique.Sa mise en place a nécessité une reformulation du problème ainsi qu'une modification importante de l'utilisation classique de la programmation dynamique pour les décompositions arborescentes. Au final, cela conduit à un algorithme paramétré dont le paramètre est entièrement lié à la décomposition arborescente. La construction des décompositions arborescentes pour lesquelles l'alignement s'effectuera plus le efficacement possible est malheureusement un problème lui aussi NP-Difficile. Néanmoins, nous avons créé une heuristique de construction de décompositions adaptée aux structures d'ARN.Nous avons alors défini des nouvelles classes de structures pour lesquelles notre algorithme (décomposition et alignement) possède une faible complexité. Ces classes incluent notamment toutes les autres classes précédemment définies et la complexité de notre algorithme est au moins aussi faible que celles des algorithmes spécifiques sur leurs classes de structures respectives. Ces classes de structures représentent la majorité des structures connues et contiennent de nombreux éléments importants jusqu'alors non pris en compte (tel que les motifs tertiaires d'ARN). Le problème de l'alignement structure-séquence tente de répondre aux problématiques de prédictions de structures et de recherche d'ARN. Néanmoins, la qualité des résultats obtenus par sa résolution dépendent de la fonction de coût utilisée. Durant ma thèse j'ai commencé la mise place de la construction par apprentissage d'une nouvelle fonction de coût, adaptée aux nouvelles classes de structures que nous avons défini. Enfin de par la nature de l'algorithme, le travail réalisé permet des améliorations non négligeables, en terme de qualité des résultats et de rapidité de calcul comme la recherche de solution sous-optimales ou l'utilisation de l'algorithme au sein d'heuristiques dérivées d'heuristiques classiques.

APA, Harvard, Vancouver, ISO, and other styles

29

Zhang, Xiaodong. "A Local Improvement Algorithm for Multiple Sequence Alignment." Ohio University / OhioLINK, 2003. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1049485762.

Full text

APA, Harvard, Vancouver, ISO, and other styles

30

Zhang, Ching. "Genetic algorithm approaches for efficient multiple molecular sequence alignment." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape17/PQDD_0013/NQ30660.pdf.

Full text

APA, Harvard, Vancouver, ISO, and other styles

31

Ahmed, Nova. "Parallel Algorithm for Memory Efficient Pairwise and Multiple Genome Alignment in Distributed Environment." Digital Archive @ GSU, 2004. http://digitalarchive.gsu.edu/cs_theses/2.

Full text

Abstract:

The genome sequence alignment problems are very important ones from the computational biology perspective. These problems deal with large amount of data which is memory intensive as well as computation intensive. In the literature, two separate algorithms have been studied and improved – one is a Pairwise sequence alignment algorithm which aligns pairs of genome sequences with memory reduction and parallelism for the computation and the other one is the multiple sequence alignment algorithm that aligns multiple genome sequences and this algorithm is also parallelized efficiently so that the workload of the alignment program is well distributed. The parallel applications can be launched on different environments where shared memory is very well suited for these kinds of applications. But shared memory environment has the limitation of memory usage as well as scalability also these machines are very costly. A better approach is to use the cluster of computers and the cluster environment can be further enhanced to a grid environment so that the scalability can be improved introducing multiple clusters. Here the grid environment is studied as well as the shared memory and cluster environment for the two applications. It can be stated that for carefully designed algorithms the grid environment is comparable for its performance to other distributed environments and it sometimes outperforms the others in terms of the limitations of resources the other distributed environments have.

APA, Harvard, Vancouver, ISO, and other styles

32

Kim, Eagu. "Inverse Parametric Alignment for Accurate Biological Sequence Comparison." Diss., The University of Arizona, 2008. http://hdl.handle.net/10150/193664.

Full text

Abstract:

For as long as biologists have been computing alignments of sequences, the question of what values to use for scoring substitutions and gaps has persisted. In practice, substitution scores are usually chosen by convention, and gap penalties are often found by trial and error. In contrast, a rigorous way to determine parameter values that are appropriate for aligning biological sequences is by solving the problem of Inverse Parametric Sequence Alignment. Given examples of biologically correct reference alignments, this is the problem of finding parameter values that make the examples score as close as possible to optimal alignments of their sequences. The reference alignments that are currently available contain regions where the alignment is not specified, which leads to a version of the problem with partial examples.In this dissertation, we develop a new polynomial-time algorithm for Inverse Parametric Sequence Alignment that is simple to implement, fast in practice, and can learn hundreds of parameters simultaneously from hundreds of examples. Computational results with partial examples show that best possible values for all 212 parameters of the standard alignment scoring model for protein sequences can be computed from 200 examples in 4 hours of computation on a standard desktop machine. We also consider a new scoring model with a small number of additional parameters that incorporates predicted secondary structure for the protein sequences. By learning parameter values for this new secondary-structure-based model, we can improve on the alignment accuracy of the standard model by as much as 15% for sequences with less than 25% identity.

APA, Harvard, Vancouver, ISO, and other styles

33

URGESE, GIANVITO. "Computational Methods for Bioinformatics Analysis and Neuromorphic Computing." Doctoral thesis, Politecnico di Torino, 2016. http://hdl.handle.net/11583/2646486.

Full text

Abstract:

The latest biological discoveries and the exponential growth of more and more sophisticated biotechnologies led in the current century to a revolution that totally reshaped the concept of genetic study. This revolution, which began in the last decades, is still continuing thanks to the introduction of new technologies capable of producing a huge amount of biological data in a relatively short time and at a very low price with respect to some decades ago. These new technologies are known as Next Generation Sequencing (NGS). These platforms perform massively parallel sequencing of both RNA and DNA molecules, thus allowing to retrieve the nucleic acid sequence of millions of fragments of DNA or RNA in a single machine run. The introduction of such technologies rapidly changed the landscape of genetic research, providing the ability to answer questions with heretofore unimaginable accuracy and speed. Moreover, the advent of NGS with the consequent need for ad-hoc strategies for data storage, sharing, and analysis is transforming genetics in a big data research field. Indeed, the large amount of data coming from sequencing technologies and the complexity of biological processes call for novel computational tools (Bioinformatics tools) and informatics resources to exploit this kind of information and gain novel insights into human beings, living organisms, and pathologies mechanisms. At the same time, a new scientific discipline called Neuromorphic Computing has been established to develop SW/HW systems having brain-specific features, such as high degree of parallelism and low power consumption. These platforms are usually employed to support the simulation of the nervous system, thus allowing the study of the mechanisms at the basis of the brain functioning. In this scenario, my research program focused on the development of optimized HW/SW algorithms and tools to process the biological information from Bioinformatics and Neuromorphic studies. The main objective of the methodologies proposed in this thesis consisted in achieving a high level of sensitivity and specificity in data analysis while minimizing the computational time. To reach these milestones, then, some bottlenecks identified in the state-of-the-art tools have been solved through a careful design of three new optimised algorithms. The work that led to this thesis is part of three collaborative projects. Two concerning the design of Bioinformatics sequence alignment algorithms and one aimed at optimizing the resources usage of a Neuromorphic platform. In the next paragraphs, the projects are briefly introduced. Dynamic Gap Selector Project This project concerned the design and implementation of a new gap model implemented in the dynamic programming sequence alignment algorithms. Smith-Waterman (S-W) and Needleman-Wunsch (N-W) are widespread methods to perform Local and Global alignments of biological sequences such as proteins, DNA and RNA molecules that are represented such as sequences of letters. Both the algorithms make use of scoring procedures to evaluate matches and errors that can be encountered during the sequence alignment process. These scoring strategies are designed to consider insertions and deletions through the identification of gaps in the aligned sequences. The Affine gap model is considered the most accurate model for the alignment of biomolecules. However, its application to S-W and N-W algorithms is quite expensive both in terms of computational time as well as in terms of memory requirements when compared to other less demanding models as the Linear gap one. In order to overcome these drawbacks, an optimised version of the Affine gap model called Dynamic Gap Selector (DGS) has been developed. The alignment scores computed using DGS are very similar to those computed using the gold standard Affine gap model. However, the implementation of this novel gap model during the S-W and N-W alignment procedures leads to the reduction of the memory requirements by a factor of 3. Moreover, the DGS model application accounts for a reduction by a factor of 2 in the number of operations required with respect to the standard Affine gap model. isomiR-SEA Project One of the most attractive research fields that is currently investigated by several interdisciplinary research teams is the study of small and medium RNA sequences with regulatory functions on the production of proteins. These RNA molecules are respectively called microRNAs (miRNAs) and long non-coding RNAs (lncRNAs). In the second project, an alignment algorithm specific for miRNAs detection and characterization have been designed and implemented. miRNAs are a class of short RNAs (18-25 bases) that play essential roles in a variety of cellular processes such as development, metabolism, regulation of immunological response and tumor genesis. Several tools have been developed in the last years to align and analyse the huge amount of data coming from the sequencing of short RNA molecules. However, these tools still lack accuracy and completeness because they use general alignment procedures that do not take into account the structural characteristics of miRNA molecules. Moreover, they are not able to detect specific miRNA variants, called isomiRs, that have recently been found to be relevant for miRNA targets regulation. To overcome these limitations, a miRNA-based alignment algorithm has been designed and developed. The isomiR-SEA algorithm is specifically tailored to detect different miRNAs variants (isomiRs) in the RNA-Seq data and to provide users with a detailed picture of the isomiRs spectrum characterizing the sample under investigation. The accuracy proper of the implemented alignment policy is reflected in the precise miRNAs and isomiRs quantification, and in the detailed profiling of miRNAtarget mRNA interaction sites. This information, hidden in raw miRNA sequencing data, can be very useful to properly characterize miRNAs and to adopt them as reliable biomarkers able to describe multifactorial pathologies such as cancer. SNN Partitioning and Placement Project In the Neuromorphic Computing field, SpiNNaker is one of the state-of-the-art massively parallel neuromorphic platform. It is designed to simulate Spiking Neural Networks (SNN) but it is characterized by several bottlenecks in the neuron partitioning and placement phases executed during the simulation configuration. In this activity, related to the European Flagship project Human Brain Project, a top-down methodology has been developed to improve the scalability and reliability of SNN simulations on massively many-core and densely interconnected platforms. In this context, SNNs mimic the brain activity by emulating spikes sent among neurons populations. Many-core platforms are emerging computing resources to achieve real-time SNNs simulations. Neurons are mapped to parallel cores and spikes are sent in the form of packets over the on-chip and off-chip network. However, due to the heterogeneity and complexity of neuron populations activity, achieving an efficient exploitation of platforms resources is a challenge, often impacting simulation reliability and limiting the biological network size. To address this challenge, the proposed methodology makes use of customized SNN configurations capable of extracting detailed profiling information about network usage of on-chip and off-chip resources. Thus, allowing to recognize the bottlenecks in the spike propagation system. These bottlenecks have been then considered during the SNN Partitioning and Placement of a graph describing the SNN interconnection on chips and cores available on the SpiNNaker board. The advantages of the proposed SNN Partitioning and Placement applied to the SpiNNaker has been evaluated in terms of traffic reduction and consequent simulation reliability. The results demonstrate that it is possible to consistently reduce packet traffic and improve simulation reliability by means of an effective neuron placement.

APA, Harvard, Vancouver, ISO, and other styles

34

Hameed, A. "Parallelization of the AAE algorithm." Thesis, Honours thesis, University of Tasmania, 2007. https://eprints.utas.edu.au/3270/1/ahThesis.pdf.

Full text

Abstract:

The exact algorithm formulated by Kececioglu and Starrett (2004) provides a solution to the NP-complete problem of aligning alignments in most biological cases. This work investigates potential speedups that may be gained through the parallelization of the dynamic programming phase of this particular algorithm. Results indicate it is possible to improve the run time performance of the algorithm over non-trivial alignments that compose to a matrix of greater than 10^5 cells.

APA, Harvard, Vancouver, ISO, and other styles

35

Bell, Lachlan Hamilton. "Sequence analysis of enzymes of the shikimate pathway : development of a novel multiple sequence alignment algorithm." Thesis, University of Glasgow, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.307193.

Full text

APA, Harvard, Vancouver, ISO, and other styles

36

Ioste, Aline Rodrigheri. "Sequências de DNA: uma nova abordagem para o alinhamento ótimo." Pontifícia Universidade Católica de São Paulo, 2016. https://tede2.pucsp.br/handle/handle/18207.

Full text

Abstract:

Made available in DSpace on 2016-04-29T14:23:42Z (GMT). No. of bitstreams: 1 Aline Rodrigheri Ioste.pdf: 3892568 bytes, checksum: d4b25a166ea46de0a3b7edfbfeab6923 (MD5) Previous issue date: 2016-03-04
The objective of this study is to deeply understand the techniques currently used in optimal alignment of DNA sequences, focused on the strengths and limitations of these methods. Analyzing the feasibility of creating a new logical approach able to ensure optimal results , taking into account existing problems in optimal alignment as: (i ) the numerous alignment possibilities of two sequences , ( ii ) the great need for space and memory the machines, ( ii ) processing time to compute the optimal data and (iv ) exponential growth. This study allowed the beginning of the creation of a new logical approach to the global optimum alignment, showing promising results in higher scores with less need for calculations where the mastery of these new techniques can lead to use search of excellent results in the global alignment optimal in large data bases
O objetivo deste estudo é entender profundamente as técnicas utilizadas atualmente no alinhamento ótimo de sequências de DNA e analisar a viabilidade da criação de uma nova abordagem lógica capaz de garantir o resultado ótimo, levando em consideração os problemas existentes no alinhamento ótimo como: (i) as inúmeras possibilidades de alinhamento de duas sequências, (ii) a grande necessidade de espaço e memória das máquinas, (ii) o tempo de processamento para computar os dados ótimos e (iv) seu crescimento exponencial. O presente estudo permitiu o início da criação de uma nova abordagem lógica para o alinhamento ótimo global, demonstrando resultados promissores de maiores pontuações com menos necessidades de cálculos, onde o domínio destas novas técnicas pode conduzir à utilização da busca de resultados ótimos no alinhamento global de sequências biológicas em grandes bases de dados

APA, Harvard, Vancouver, ISO, and other styles

37

Pappas, Nicholas Peter. "Searching Biological Sequence Databases Using Distributed Adaptive Computing." Thesis, Virginia Tech, 2003. http://hdl.handle.net/10919/31074.

Full text

Abstract:

Genetic research projects currently can require enormous computing power to processes the vast quantities of data available. Further, DNA sequencing projects are generating data at an exponential rate greater than that of the development microprocessor technology; thus, new, faster methods and techniques of processing this data are needed. One common type of processing involves searching a sequence database for the most similar sequences. Here we present a distributed database search system that utilizes adaptive computing technologies. The search is performed using the Smith-Waterman algorithm, a common sequence comparison algorithm. To reduce the total search time, an initial search is performed using a version of the algorithm, implemented in adaptive computing hardware, which is designed to efficiently perform the initial search. A final search is performed using a complete version of the algorithm. This two-stage search, employing adaptive and distributed hardware, achieves a performance increase of several orders of magnitude over similar processor based systems.
Master of Science

APA, Harvard, Vancouver, ISO, and other styles

38

Hu, Hanqing. "USING PROGRAM SLICING AND SEQUENCE ALIGNMENT TO ANALYZE ORGANISMS OF AVIDA, A DIGITAL EVOLUTION PLATFORM." Miami University / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=miami1331015337.

Full text

APA, Harvard, Vancouver, ISO, and other styles

39

Cuevas, Tristan Lee. "A PAIRWISE COMPARISON OF DNA SEQUENCE ALIGNMENT USING AN OPENMP IMPLEMENTATION OF THE SWAMP PARALLEL SMITH-WATERMAN ALGORITHM." Kent State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=kent1429528937.

Full text

APA, Harvard, Vancouver, ISO, and other styles

40

Trněný, Ondřej. "Metody vícenásobného zarovnávání nukleotidových sekvencí." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2013. http://www.nusl.cz/ntk/nusl-220010.

Full text

Abstract:

To be able to understand characteristics and purpose of biological sequences correctly, it is crucial to have a possibility to sort and compare them. Because of this need and to extend existing knowledge pool, numerous methods were proposed. Especially in field of multiple sequences alignment. Methods for multiple sequences alignment may provide various valuable information about sequences which failed to show enough similarity in pairwise alignment. According to this, several algorithms were implemented in various computer applications which provide a way to analyse huge sets of data. One of those, the progressive alignment algorithm, is implemented as a part of this thesis

APA, Harvard, Vancouver, ISO, and other styles

41

Ding, Guoxiang. "DERIVING ACTIVITY PATTERNS FROM INDIVIDUAL TRAVEL DIARY DATA: A SPATIOTEMPORAL DATA MINING APPROACH." The Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=osu1236777859.

Full text

APA, Harvard, Vancouver, ISO, and other styles

42

Nosek, Ondřej. "Hardwarová akcelerace algoritmu pro hledání podobnosti dvou DNA řetězců." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2007. http://www.nusl.cz/ntk/nusl-236882.

Full text

Abstract:

Methods for aproximate string matching of various sequences used in bioinformatics are crucial part of development in this branch. Tasks are of very large time complexity and therefore we want create a hardware platform for acceleration of these computations. Goal of this work is to design a generalized architecture based on FPGA technology, which can work with various types of sequences. Designed acceleration card will use especially dynamic algorithms like Needleman-Wunsch and Smith-Waterman.

APA, Harvard, Vancouver, ISO, and other styles

43

Zhao, Mengyao. "Genomic variation detection using dynamic programming methods." Thesis, Boston College, 2014. http://hdl.handle.net/2345/bc-ir:104357.

Full text

Abstract:

Thesis advisor: Gabor T. Marth
Background: Due to the rapid development and application of next generation sequencing (NGS) techniques, large amounts of NGS data have become available for genome-related biological research, such as population genetics, evolutionary research, and genome wide association studies. A crucial step of these genome-related studies is the detection of genomic variation between different species and individuals. Current approaches for the detection of genomic variation can be classified into alignment-based variation detection and assembly-based variation detection. Due to the limitation of current NGS read length, alignment-based variation detection remains the mainstream approach. The Smith-Waterman algorithm, which produces the optimal pairwise alignment between two sequences, is frequently used as a key component of fast heuristic read mapping and variation detection tools for next-generation sequencing data. Though various fast Smith-Waterman implementations are developed, they are either designed as monolithic protein database searching tools, which do not return detailed alignment, or they are embedded into other tools. These issues make reusing these efficient Smith-Waterman implementations impractical. After the alignment step in the traditional variation detection pipeline, the afterward variation detection using pileup data and the Bayesian model is also facing great challenges especially from low-complexity genomic regions. Sequencing errors and misalignment problems still influence variation detection (especially INDEL detection) a lot. The accuracy of genomic variation detection still needs to be improved, especially when we work on low- complexity genomic regions and low-quality sequencing data. Results: To facilitate easy integration of the fast Single-Instruction-Multiple-Data Smith-Waterman algorithm into third-party software, we wrote a C/C++ library, which extends Farrar's Striped Smith-Waterman (SSW) to return alignment information in addition to the optimal Smith-Waterman score. In this library we developed a new method to generate the full optimal alignment results and a suboptimal score in linear space at little cost of efficiency. This improvement makes the fast Single-Instruction-Multiple-Data Smith-Waterman become really useful in genomic applications. SSW is available both as a C/C++ software library, as well as a stand-alone alignment tool at: https://github.com/mengyao/Complete- Striped-Smith-Waterman-Library. The SSW library has been used in the primary read mapping tool MOSAIK, the split-read mapping program SCISSORS, the MEI detector TAN- GRAM, and the read-overlap graph generation program RZMBLR. The speeds of the mentioned software are improved significantly by replacing their ordinary Smith-Waterman or banded Smith-Waterman module with the SSW Library. To improve the accuracy of genomic variation detection, especially in low-complexity genomic regions and on low-quality sequencing data, we developed PHV, a genomic variation detection tool based on the profile hidden Markov model. PHV also demonstrates a novel PHMM application in the genomic research field. The banded PHMM algorithms used in PHV make it a very fast whole-genome variation detection tool based on the HMM method. The comparison of PHV to GATK, Samtools and Freebayes for detecting variation from both simulated data and real data shows PHV has good potential for dealing with sequencing errors and misalignments. PHV also successfully detects a 49 bp long deletion that is totally misaligned by the mapping tool, and neglected by GATK and Samtools. Conclusion: The efforts made in this thesis are very meaningful for methodology development in studies of genomic variation detection. The two novel algorithms stated here will also inspire future work in NGS data analysis
Thesis (PhD) — Boston College, 2014
Submitted to: Boston College. Graduate School of Arts and Sciences
Discipline: Biology

APA, Harvard, Vancouver, ISO, and other styles

44

Belghiti, Moulay Tayeb. "Modélisation et techniques d'optimisation en bio-informatique et fouille de données." Thesis, Rouen, INSA, 2008. http://www.theses.fr/2008ISAM0002.

Full text

Abstract:

Cette thèse est particulièrement destinée à traiter deux types de problèmes : clustering et l'alignement multiple de séquence. Notre objectif est de résoudre de manière satisfaisante ces problèmes globaux et de tester l'approche de la Programmation DC et DCA sur des jeux de données réelles. La thèse comporte trois parties : la première partie est consacrée aux nouvelles approches de l'optimisation non convexe. Nous y présentons une étude en profondeur de l'algorithme qui est utilisé dans cette thèse, à savoir la programmation DC et l'algorithme DC (DCA). Dans la deuxième partie, nous allons modéliser le problème clustering en trois sous-problèmes non convexes. Les deux premiers sous-problèmes se distinguent par rapport au choix de la norme utilisée, (clustering via les normes 1 et 2). Le troisième sous-problème utilise la méthode du noyau, (clustering via la méthode du noyau). La troisième partie sera consacrée à la bio-informatique. On va se focaliser sur la modélisation et la résolution de deux sous-problèmes : l'alignement multiple de séquence et l'alignement de séquence d'ARN par structure. Tous les chapitres excepté le premier se terminent par des tests numériques
This Ph.D. thesis is particularly intended to treat two types of problems : clustering and the multiple alignment of sequence. Our objective is to solve efficiently these global problems and to test DC Programming approach and DCA on real datasets. The thesis is divided into three parts : the first part is devoted to the new approaches of nonconvex optimization-global optimization. We present it a study in depth of the algorithm which is used in this thesis, namely the programming DC and the algorithm DC ( DCA). In the second part, we will model the problem clustering in three nonconvex subproblems. The first two subproblems are distinguished compared to the choice from the norm used, (clustering via norm 1 and 2). The third subproblem uses the method of the kernel, (clustering via the method of the kernel). The third part will be devoted to bioinformatics, one goes this focused on the modeling and the resolution of two subproblems : the multiple alignment of sequence and the alignment of sequence of RNA. All the chapters except the first end in numerical tests

APA, Harvard, Vancouver, ISO, and other styles

45

Thibord, Florian. "Variation génétique et plasmatique des microARNs : impact sur les paramètres biologiques de l’hémostase OPTIMIR, a novel algorithm for integrating available genome-wide genotype data into miRNA sequence alignment analysis A Genome Wide Association Study on plasma FV levels identified PLXDC2 as a new modifier of the coagulation process." Thesis, Sorbonne université, 2019. http://www.theses.fr/2019SORUS379.

Full text

Abstract:

Les microARNs (miARNs) sont les membres d’une classe de petits ARNs non codants d’environ 22 nucléotides, dont le mécanisme principal est de réguler l’expression des gènes dans le cytoplasme. Leurs importance est telle qu’il est estimé que la majorité des gènes humains sont régulés par ces petits ARNs, et ils sont ainsi potentiellement impliqués dans le développement de nombreuse pathologies. La séquence des miARNs peut-être soumise à des variations post- transcriptionnelles et des variations génétiques générant alors des séquences isoformes appelées isomiRs. Afin de détécter et quantifier précisemment l’expression des miARNs à partir de données de séquençage, cette hétérogénéité intra-miARN, due aux isomiRs, doit être prise en compte, tout comme l’homogénéité inter-miARN due aux miARNs paralogues. Le pipeline optimiR, développé dans le cadre de cette thèse, permet de surmonter ces challenges grâce notamment à l’intégration de l’information génétique des échantillons analysés, ainsi qu’à une stratégie d’alignement originale, qui permettent de détecter les isomiRs tout en distinguant les miARNs paralogues. Les données analysées lors de cette thèse proviennent de la cohorte MARTHA, composée de patients ayant développé une thrombose veineuse (VTE), parfois avec récidive. L’expression normalisée de 162 miARNs obtenue pour 344 patients a ensuite été utilisée afin d’analyser: 1) les déterminants génétiques de l’expression de ces miARNs; 2) l’association des miARNs avec le risque de récidive pour la VTE; 3) les corrélations avec certains paramètres biologiques de l’hémostase. Collectivement, ces analyses m’ont permis d’identifier des microARNs d’intérêt pour la recherche sur la thrombose veineuse et sur l’hémostase
MicroRNAs (miRNA) are small non coding RNAs with an average size of 22 nucleotides, mainly known to regulate gene expression in the cytoplasm. These small RNAs are estimated to regulate the majority of human genes, and are potentially involved in several diseases. MiRNA sequences might contain genetic variants and can undergo post-transcriptional variations, which generate miRNA isoforms called isomiRs. In order to accurately detect and quantify miRNA expression, isomiRs as well as paralogous miRNAs must be accounted for. The optimiR pipeline developed during this project overcome these challenges by integrating genetic information and by implementing an original strategy based on local alignement. Sequencing data were obtained from the MARTHA cohort, which is composed of french unrelated patients who experienced venous thrombosis (VTE). Normalized expression of 162 miRNAs from 334 patients were used to analyze: 1) the genetic determinants of miRNA expression; 2) the association of miRNA expression levels with VTE recurence; 3) the correlations between miRNA expression levels and hemostatic traits. As a whole, these analyses allowed me to identify miRNAs of interest for the study of VTE and hemostasis

APA, Harvard, Vancouver, ISO, and other styles

46

Yu, Daniel C. T., and 游景達. "Efficient Algorithms for Constrained Sequence Alignment Problems." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/v2a9sc.

Full text

Abstract:

碩士
靜宜大學
資訊管理學系研究所
91
Sequence alignment is one of the most important primitive tools in computational biology. By using this methodology, we can identify a homologous relationship effectively with a biological sequence about which more is known. Though the sequence alignment tool is useful, the alignment results generated by this tool may not always meet the user-specified constraints, which are the datasets biologists believed that should be aligned together. Therefore, the development of a new sequence alignment tool that meets the user-specified constraints becomes a necessity. In this thesis, two efficient constrained sequence alignment algorithms for different application is being proposed. 1.Constrained sequence alignment using consecutive subsequence constraints - an extended version of C2SA(Constrained Pairwise Sequence Alignment, C2SA), which is proposed by Chuan-Yi Tang et al., with reduced time complexity of O(rn^2). 2.Constrained structural RNA sequence alignment - a constrained sequence alignment method with basepairing which takes O(rn^4), where n is the maximum of the lengths of sequences being aligned and r is the number of constraints.

APA, Harvard, Vancouver, ISO, and other styles

47

Abbasi, Maryam. "Multiobjective Sequence Alignment Formulation, Algorithms and Application." Doctoral thesis, 2019. http://hdl.handle.net/10316/87604.

Full text

Abstract:

Tese de Doutoramento em Ciências e Tecnologias da Informação, apresentada ao Departamento de Engenharia Informática da Faculdade de Ciências e Tecnologia da Universidade de Coimbra
Sequence alignment is a standard technique in bioinformatics to measure the relationship between evolutionary or structurally related DNA/proteins sequences. Most modern programs for sequence alignment optimize a given objective function that is a convex combination of how many gaps need to be inserted into the sequences and how many characters become aligned. Clearly, depending on the weights given to each of the two components, different optimal alignments can be obtained. Therefore, choosing only one weight setting may provide an undesirable bias in further steps of the analysis, such as phylogenetic tree construction, and provide too simplistic interpretations. In this thesis, we take a different point of view on the mathematical formulation of the sequence alignment problem. Rather than considering the optimization of a scalar score function, resulting from a weighted sum of components, we consider a vector score function with the goal of optimizing, simultaneously, the different score components. This brings us to the topic of multiobjective optimization, which deals with the mathematical formulation of optimization problems with several conflicting objectives as well as with algorithms to solve them. Under this new formulation, these algorithms return a set of non-dominated alignments, each of which representing a trade-off between the several components. This set gives further information about the similarity of the sequences, from which a practitioner could analyze and choose the most plausible alignment. We consider the biobjective pairwise sequence alignment problem and propose extensions of efficient dynamic programming algorithms for several variants of this problem. We propose a novel pruning technique that substantially reduces the computation time and memory usage. Moreover, we consider a biobjective variant of this problem with more than two sequences, which is computationally intractable. We introduce local search techniques for this problem and conduct an in-depth experimental analysis on a wide range of benchmark instances. Based on the hypervolume indicator and empirical attainment function methodology, we establish functional relationships between algorithm performance and instance features. Finally, we present a method that uses multiobjective concepts for the construction of phylogenetic trees. We test this method on two real-life cases and show that the number of distinct phylogenetic tree topologies obtained is very small. This work shows that multiobjective concepts can successfully be applied to the sequence alignment problem and identifies which approaches can be used for the several variants of this problem. We believe that the methods proposed in this thesis, by providing more information about the relationship between biological sequences than the current known procedures, can be of great value to a broad range of research communities as well as to practitioners in the field.
O alinhamento de sequências é um procedimento utilizado na Bioinformática que tem por objetivo medir a semelhança entre sequências de DNA ou proteínas relacionadas entre si de uma forma evolutiva ou estrutural. As aplicações atuais para alinhamento de sequências otimizam uma determinada função objetivo que resulta da combinação convexa da quantidade de espaços a inserir nas sequências e da quantidade de caracteres que ficam alinhados. Dependendo das ponderações atribuídas a cada um destes dois componentes, diferentes alinhamentos podem ser obtidos. Desta forma, a escolha de uma só ponderação pode enviesar, indesejadamente, os passos seguintes da análise, por exemplo, na construção de árvores filogenéticas, e fornecer interpretações demasiado simples. Esta tese aborda o problema de alinhamento de sequências de uma forma diferente na perspetiva de formulação matemática. Em vez da otimização de uma função escalar que resulta de uma soma ponderada das componentes, considera-se uma função vetorial em que se pretende otimizar, simultaneamente, as suas componentes. O estudo destes problemas é abordado em otimização multi-objetivo, que lida com as formulações matemáticas de problemas de otimização com vários objetivos conflituosos entre si e com algoritmos para a sua resolução. Com esta nova formulação, os algoritmos retornam um conjunto de alinhamentos não-dominados, cada um representando um compromisso entre as várias componentes da função objetivo. Este conjunto, ao fornecer mais informação acerca da semelhança entre as sequências em análise, permite, ao profissional, escolher o alinhamento mais plausível. Neste estudo, considera-se o problema bi-objetivo de alinhamento emparelhado de sequências e variantes deste problema, para os quais propõem-se extensões de algoritmos eficientes baseados em programação dinâmica. Propõe-se iguamente uma variante bi-objetivo deste problema para mais do que duas sequências, que é considerado um problema computacionalmente intratável. Por esta razão, apresentam-se técnicas de procura local para este problema. Estes algoritmos são analisados experimentalmente num conjunto de instâncias de referência. Com base no indicador de hipervolume e na metodologia das funções de aproveitamento, estabelecem-se relações funcionais entre o desempenho dos algoritmos e características destas instâncias. Finalmente, apresenta-se um método que utiliza conceitos de otimização multi-objetivo para a construção de árvores filogenéticas. Este método é testado em dois casos reais. Os resultados obtidos indicam que o número de topologias distintas de árvores filogenéticas é bastante pequeno. Este estudo mostra que os conceitos multi-objetivo podem ser utilizados com sucesso no problema de alinhamento de sequências e permite identificar quais as abordagens que podem ser utilizadas para cada uma das variantes apresentadas. Ao fornecer mais informação acerca da relação entre as sequências biológicas do que os métodos atuais, espera-se que as contribuições desta tese possam de grande valor tanto para a comunidade académica como para os profissionais de Bioinformática.

APA, Harvard, Vancouver, ISO, and other styles

48

Wang, Shu 1973. "On multiple sequence alignment." Thesis, 2007. http://hdl.handle.net/2152/3715.

Full text

Abstract:

The tremendous increase in biological sequence data presents us with an opportunity to understand the molecular and cellular basis for cellular life. Comparative studies of these sequences have the potential, when applied with sufficient rigor, to decipher the structure, function, and evolution of cellular components. The accuracy and detail of these studies are directly proportional to the quality of these sequences alignments. Given the large number of sequences per family of interest, and the increasing number of families to study, improving the speed, accuracy and scalability of MSA is becoming an increasingly important task. In the past, much of interest has been on Global MSA. In recent years, the focus for MSA has shifted from global MSA to local MSA. Local MSA is being needed to align variable sequences from different families/species. In this dissertation, we developed two new algorithms for fast and scalable local MSA, a three-way-consistency-based MSA and a biclustering -based MSA. The first MSA algorithm is a three-way-Consistency-Based MSA (CBMSA). CBMSA applies alignment consistency heuristics in the form of a new three-way alignment to MSA. While three-way consistency approach is able to maintain the same time complexity as the traditional pairwise consistency approach, it provides more reliable consistency information and better alignment quality. We quantify the benefit of using three-way consistency as compared to pairwise consistency. We have also compared CBMSA to a suite of leading MSA programs and CBMSA consistently performs favorably. We also developed another new MSA algorithm, a biclustering-based MSA. Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in MSA is that the alignment of sequences is often intended to reveal groups of conserved functional subsequences. Simultaneously, the grouping of the sequences can impact the alignment; precisely the kind of dual situation biclustering algorithms are intended to address. We define a representation of the MSA problem enabling the application of biclustering algorithms. We develop a computer program for local MSA, BlockMSA, that combines biclustering with divide-and-conquer. BlockMSA simultaneously finds groups of similar sequences and locally aligns subsequences within them. Further alignment is accomplished by dividing both the set of sequences and their contents. The net result is both a multiple sequence alignment and a hierarchical clustering of the sequences. BlockMSA was compared with a suite of leading MSA programs. With respect to quantitative measures of MSA, BlockMSA scores comparable to or better than the other leading MSA programs. With respect to biological validation of MSA, the other leading MSA programs lag BlockMSA in their ability to identify the most highly conserved regions.

APA, Harvard, Vancouver, ISO, and other styles

49

Wang, Xuting. "Multiple sequence alignment using traveling salesman problem algorithms." 2002. http://purl.galileo.usg.edu/uga%5Fetd/wang%5Fxuting%5F200205%5Fms.

Full text

APA, Harvard, Vancouver, ISO, and other styles

50

Ma, Fangrui. "Biological sequence analyses theory, algorithms, and applications /." 2009. http://proquest.umi.com/pqdweb?did=1821098721&sid=1&Fmt=2&clientId=14215&RQT=309&VName=PQD.

Full text

Abstract:

Thesis (Ph.D.)--University of Nebraska-Lincoln, 2009.
Title from title screen (site viewed October 13, 2009). PDF text: xv, 233 p. : ill. ; 4 Mb. UMI publication number: AAT 3360173. Includes bibliographical references. Also available in microfilm and microfiche formats.

APA, Harvard, Vancouver, ISO, and other styles

We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!