Journal articles on the topic 'Sequence alignment algorithms'

Author: Grafiati

Published: 14 March 2023

Consult the top 50 journal articles for your research on the topic 'Sequence alignment algorithms.'

1

Cavanaugh, David, and Krishnan Chittur. "A hydrophobic proclivity index for protein alignments." F1000Research 4 (October 21, 2015): 1097. http://dx.doi.org/10.12688/f1000research.6348.1.

Abstract:

Sequence alignment algorithms are fundamental to modern bioinformatics. Sequence alignments are widely used in diverse applications such as phylogenetic analysis, database searches for related sequences to aid identification of unknown protein domain structures and classification of proteins and protein domains. Additionally, alignment algorithms are integral to the location of related proteins to secure understanding of unknown protein functions, to suggest the folded structure of proteins of unknown structure from location of homologous proteins and/or by locating homologous domains of known 3D structure. For proteins, alignment algorithms depend on information about amino acid substitutions that allows for matching sequences that are similar, but not exact. When primary sequence percent identity falls below about 25%, algorithms often fail to identify proteins that may have similar 3D structure. We have created a hydrophobicity scale and a matching dynamic programming algorithm called TMATCH (unpublished report) that is able to match proteins with remote homologs with similar secondary/tertiary structure, even with very low primary sequence matches. In this paper, we describe how we arrived at the hydrophobic scale, how it provides much more information than percent identity matches and some of the implications for better alignments and understanding protein structure.
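
The abstract contrasts its hydrophobicity scale with percent identity. As a point of reference, here is a minimal sketch of how percent identity is commonly computed from two already-aligned sequences. The function name `percent_identity` and the choice to skip gapped columns are assumptions made for illustration (several conventions exist); this is not the TMATCH method.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are excluded from the denominator
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0
```

A value below roughly 25 on this scale is the regime in which, according to the abstract, identity-based methods often miss structurally similar proteins.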

2

Cavanaugh, David, and Krishnan Chittur. "A hydrophobic proclivity index for protein alignments." F1000Research 4 (October 15, 2020): 1097. http://dx.doi.org/10.12688/f1000research.6348.2.

Abstract:

Sequence alignment algorithms are fundamental to modern bioinformatics. Sequence alignments are widely used in diverse applications such as phylogenetic analysis, database searches for related sequences to aid identification of unknown protein domain structures and classification of proteins and protein domains. Additionally, alignment algorithms are integral to the location of related proteins to secure understanding of unknown protein functions, to suggest the folded structure of proteins of unknown structure from location of homologous proteins and/or by locating homologous domains of known 3D structure. For proteins, alignment algorithms depend on information about amino acid substitutions that allows for matching sequences that are similar, but not exact. When primary sequence percent identity falls below about 25%, algorithms often fail to identify proteins that may have similar 3D structure. We have created a hydrophobicity scale and a matching dynamic programming algorithm called TMATCH (preprint report) that is able to match proteins with remote homologs with similar secondary/tertiary structure, even with very low primary sequence matches. In this paper, we describe how we arrived at the hydrophobic scale, how it provides much more information than percent identity matches and some of the implications for better alignments and understanding protein structure.

3

Arenas-Díaz, Edgar D., Helga Ochoterena, and Katya Rodríguez-Vázquez. "Multiple Sequence Alignment Using a Genetic Algorithm and GLOCSA." Journal of Artificial Evolution and Applications 2009 (August 27, 2009): 1–10. http://dx.doi.org/10.1155/2009/963150.

Abstract:

Algorithms that minimize putative synapomorphy in an alignment cannot be directly implemented since trivial cases with concatenated sequences would be selected because they would imply a minimum number of events to be explained (e.g., a single insertion/deletion would be required to explain divergence among two sequences). Therefore, indirect measures to approach parsimony need to be implemented. In this paper, we thoroughly present a Global Criterion for Sequence Alignment (GLOCSA) that uses a scoring function to globally rate multiple alignments aiming to produce matrices that minimize the number of putative synapomorphies. We also present a Genetic Algorithm that uses GLOCSA as the objective function to produce sequence alignments refining alignments previously generated by additional existing alignment tools (we recommend MUSCLE). We show that in the example cases our GLOCSA-guided Genetic Algorithm (GGGA) does improve the GLOCSA values, resulting in alignments that imply fewer putative synapomorphies.

4

WANG, YI, and KUO-BIN LI. "MULTIPLE SEQUENCE ALIGNMENT USING AN EXHAUSTIVE AND GREEDY ALGORITHM." Journal of Bioinformatics and Computational Biology 03, no. 02 (April 2005): 243–55. http://dx.doi.org/10.1142/s021972000500103x.

Abstract:

We describe an exhaustive and greedy algorithm for improving the accuracy of multiple sequence alignment. A simple progressive alignment approach is employed to provide initial alignments. The initial alignment is then iteratively optimized against an objective function. For any working alignment, the optimization involves three operations: insertions, deletions and shuffles of gaps. The optimization is exhaustive since the algorithm applies the above operations to all eligible positions of an alignment. It is also greedy since only the operation that gives the best improving objective score will be accepted. The algorithms have been implemented in the EGMA (Exhaustive and Greedy Multiple Alignment) package using the Java programming language, and have been evaluated using the BAliBASE benchmark alignment database. Although EGMA is not guaranteed to produce a globally optimized alignment, the tests indicate that EGMA is able to build alignments with high quality consistently, compared with other commonly used iterative and non-iterative alignment programs. It is also useful for refining multiple alignments obtained by other methods.
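
The abstract does not say which objective function EGMA optimizes. Purely as an illustration of what an alignment objective can look like, the sketch below computes a sum-of-pairs score, a widely used choice. The function name and the `match`, `mismatch` and `gap` values are assumptions, not taken from the paper.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Score an alignment (a list of equal-length rows) column by column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have equal length")
    total = 0
    for column in zip(*alignment):
        # Every unordered pair of rows contributes one term per column.
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue  # assumption: gap-gap pairs contribute nothing
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total
```

An iterative refiner of the kind described would re-evaluate such a score after each candidate gap insertion, deletion or shuffle and keep the single best-improving move.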

5

BACKOFEN, ROLF, and SEBASTIAN WILL. "LOCAL SEQUENCE-STRUCTURE MOTIFS IN RNA." Journal of Bioinformatics and Computational Biology 02, no. 04 (December 2004): 681–98. http://dx.doi.org/10.1142/s0219720004000818.

Abstract:

Ribonucleic acid (RNA) enjoys increasing interest in molecular biology; despite this interest, fundamental algorithms are lacking, e.g. for identifying local motifs. Like proteins, RNA molecules have a distinctive structure. Therefore, in addition to sequence information, structure plays an important part in assessing the similarity of RNAs. Furthermore, common sequence-structure features in two or several RNA molecules are often only spatially local, where possibly large parts of the molecules are dissimilar. Consequently, we address the problem of comparing RNA molecules by computing an optimal local alignment with respect to sequence and structure information. While local alignment is superior to global alignment for identifying local similarities, no general local sequence-structure alignment algorithms are currently known. We suggest a new general definition of locality for sequence-structure alignments that is biologically motivated and efficiently tractable. To show the former, we discuss locality of RNA and prove that the defined locality means connectivity by atomic and non-atomic bonds. To show the latter, we present an efficient algorithm for the newly defined pairwise local sequence-structure alignment (lssa) problem for RNA. For molecules of lengths n and m, the algorithm has worst-case time complexity of O(n²·m²·max(n,m)) and a space complexity of only O(n·m). An implementation of our algorithm is available at . Its runtime is competitive with global sequence-structure alignment.

6

Rautiainen, Mikko, Veli Mäkinen, and Tobias Marschall. "Bit-parallel sequence-to-graph alignment." Bioinformatics 35, no. 19 (March 9, 2019): 3599–607. http://dx.doi.org/10.1093/bioinformatics/btz162.

Abstract:

Motivation: Graphs are commonly used to represent sets of sequences. Either edges or nodes can be labeled by sequences, so that each path in the graph spells a concatenated sequence. Examples include graphs to represent genome assemblies, such as string graphs and de Bruijn graphs, and graphs to represent a pan-genome and hence the genetic variation present in a population. Being able to align sequencing reads to such graphs is a key step for many analyses and its applications include genome assembly, read error correction and variant calling with respect to a variation graph. Results: We generalize two linear sequence-to-sequence algorithms to graphs: the Shift-And algorithm for exact matching and Myers’ bitvector algorithm for semi-global alignment. These linear algorithms are both based on processing w sequence characters with a constant number of operations, where w is the word size of the machine (commonly 64), and achieve a speedup of up to w over naive algorithms. For a graph with |V| nodes and |E| edges and a sequence of length m, our bitvector-based graph alignment algorithm reaches a worst case runtime of O(|V|+⌈m/w⌉|E| log w) for acyclic graphs and O(|V|+m|E| log w) for arbitrary cyclic graphs. We apply it to five different types of graphs and observe a speedup between 3-fold and 20-fold compared with a previous (asymptotically optimal) alignment algorithm. Availability and implementation: https://github.com/maickrau/GraphAligner. Supplementary information: Supplementary data are available at Bioinformatics online.
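
For readers unfamiliar with the linear-sequence starting point, the following is a minimal sketch of the classic Shift-And exact matcher on plain strings. It is not the graph generalization described in the abstract, and the function name `shift_and` is an assumption. Python integers are arbitrary-precision, so the sketch shows the bit-level logic rather than the w-fold speedup of a machine word.

```python
def shift_and(pattern: str, text: str) -> list[int]:
    """Return the start indices of exact occurrences of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    # masks[c] has bit i set exactly when pattern[i] == c.
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    accept = 1 << (m - 1)  # bit that signals a full match
    state = 0              # bit i set: pattern[:i+1] matches a suffix of the text read so far
    hits = []
    for j, c in enumerate(text):
        # Extend every active prefix by one character and start a new one at bit 0.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits
```

Each text character is handled with one shift, one OR and one AND, which is the constant-operation-per-word property the abstract refers to.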

7

CHIN, FRANCIS Y. L., N. L. HO, T. W. LAM, and PRUDENCE W. H. WONG. "EFFICIENT CONSTRAINED MULTIPLE SEQUENCE ALIGNMENT WITH PERFORMANCE GUARANTEE." Journal of Bioinformatics and Computational Biology 03, no. 01 (February 2005): 1–18. http://dx.doi.org/10.1142/s0219720005000977.

Abstract:

The constrained multiple sequence alignment problem is to align a set of sequences of maximum length n subject to a given constrained sequence, which arises from some knowledge of the structure of the sequences. This paper presents new algorithms for this problem, which are more efficient in terms of time and space (memory) than the previous algorithms [15], and with a worst-case guarantee on the quality of the alignment. Saving the space requirement by a quadratic factor is particularly significant as the previous O(n⁴)-space algorithm has limited application due to its huge memory requirement. Experiments on real data sets confirm that our new algorithms show improvements in both alignment quality and resource requirements.

8

Zhu, J., J. S. Liu, and C. E. Lawrence. "Bayesian adaptive sequence alignment algorithms." Bioinformatics 14, no. 1 (February 1, 1998): 25–39. http://dx.doi.org/10.1093/bioinformatics/14.1.25.

9

NARIMANI, ZAHRA, HAMID BEIGY, and HASSAN ABOLHASSANI. "A NEW GENETIC ALGORITHM FOR MULTIPLE SEQUENCE ALIGNMENT." International Journal of Computational Intelligence and Applications 11, no. 04 (December 2012): 1250023. http://dx.doi.org/10.1142/s146902681250023x.

Abstract:

Multiple sequence alignment (MSA) is one of the basic and important problems in molecular biology. MSA can be used for different purposes, including finding the conserved motifs and structurally important regions in protein sequences and determining the evolutionary distance between sequences. Aligning several sequences cannot be done in polynomial time and therefore heuristic methods such as genetic algorithms can be used to find approximate solutions of MSA problems. Several algorithms based on genetic algorithms have been developed for this problem in recent years. Most of these algorithms use very complicated, problem-specific and time-consuming mutation operators. In this paper, we propose a new algorithm that uses a new way of population initialization and simple mutation and recombination operators. The strength of the proposed GA is using simple mutation operators and also a special recombination operator that does not have problems of similar recombination operators in other GAs. The experimental results show that the proposed algorithm is capable of finding good MSAs in contrast to existing methods, while it uses simple operators with low computational complexity.

10

Long, Hai Xia, Li Hua Wu, and Yu Zhang. "Multiple Sequence Alignment Based on Profile Hidden Markov Model and Quantum-Behaved Particle Swarm Optimization with Selection Method." Advanced Materials Research 282-283 (July 2011): 7–12. http://dx.doi.org/10.4028/www.scientific.net/amr.282-283.7.

Abstract:

Multiple sequence alignment (MSA) is an NP-complete and important problem in bioinformatics. Currently, the profile hidden Markov model (HMM) is widely used for multiple sequence alignment. In this paper, Quantum-behaved Particle Swarm Optimization with selection operation (SQPSO) is presented, which is used to train the profile HMM. Furthermore, an integration algorithm based on the profile HMM and SQPSO for MSA is constructed. The approach is examined using multiple nucleotide and protein sequences and compared with other algorithms. The results of the comparisons show that HMMs trained with SQPSO and QPSO yield better alignments than those trained with other commonly used HMM training methods such as Baum–Welch and PSO.

11

Keith, Jonathan M., Peter Adams, Darryn Bryant, Keith R. Mitchelson, Duncan A. E. Cochran, and Gita H. Lala. "Inferring an Original Sequence from Erroneous Copies: Two Approaches." Asia-Pacific Biotech News 07, no. 03 (February 3, 2003): 107–14. http://dx.doi.org/10.1142/s0219030303000284.

Abstract:

This paper considers the problem of inferring an original sequence from a number of erroneous copies. The problem arises in DNA sequencing, particularly in the context of emerging technologies that provide high throughput or other advantages at the cost of an increased number of errors. We describe and compare two approaches that have recently been developed by the authors. The first approach searches for a sequence known as a Steiner string; the second searches for the most probable original sequence with respect to a simple Bayesian model of sequencing errors. We present the results of extensive tests in which erroneous copies of real DNA sequences were simulated and the algorithms were used to infer the original sequences. The results are used to compare the two approaches to each other and to a third, more conventional, approach based on multiple sequence alignment. We find that the Bayesian approach is superior to the Steiner approach, which in turn is superior to the alignment approach. The two new algorithms can also be used to construct multiple sequence alignments. We show that the two methods produce alignments of approximately equal quality, and conclude that the Steiner approach is better for this purpose because it is faster. Both methods produce better alignments than a well-known multiple sequence alignment package, for the cases tested.

12

Gambin, Anna, and Rafał Otto. "Contextual Multiple Sequence Alignment." Journal of Biomedicine and Biotechnology 2005, no. 2 (2005): 124–31. http://dx.doi.org/10.1155/jbb.2005.124.

Abstract:

In a recently proposed contextual alignment model, efficient algorithms exist for global and local pairwise alignment of protein sequences. Preliminary results obtained for biological data are very promising. Our main motivation was to adapt the idea of context dependency to the multiple-alignment setting. To this end, a relaxation of the model was developed (we call this new model averaged contextual alignment) and a new family of amino acid substitution matrices was constructed. In this paper we present a contextual multiple-alignment algorithm and report the outcomes of experiments performed for the BAliBASE test set. The contextual approach turned out to give much better results for the set of sequences containing orphan genes.

13

Md Isa, Mohd Nazrin, Sohiful Anuar Zainol Murad, Mohamad Imran Ahmad, Muhammad M. Ramli, and Rizalafande Che Ismail. "An Efficient Scheduling Technique for Biological Sequence Alignment." Applied Mechanics and Materials 754-755 (April 2015): 1087–92. http://dx.doi.org/10.4028/www.scientific.net/amm.754-755.1087.

Abstract:

Computing alignment matrix scores to search for regions of homology between biological sequences is a time-consuming task. This is due to the recursive nature of dynamic programming-based algorithms such as the Smith-Waterman and Needleman-Wunsch algorithms. A typical FPGA-based protein sequencer comprises two main logic blocks: one for computing alignment scores, i.e. the processing element (PE), and another for configuring the PE with coefficients. During alignment matrix computation, the logic block for configuring the PE is left unused until the time-consuming alignment matrix computation has finished. Therefore, a new technique, known as overlap computation and configuration (OCC), is proposed to minimize the time overhead of performing biological sequence alignment. The OCC technique simultaneously updates the substitution matrix in a processing element (PE) systolic array while computing alignment matrix scores. Results showed that the sequencer achieves a speed-up of more than two orders of magnitude compared to the state of the art, at negligible area overhead, if any.
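
To make the recursive dependency concrete, here is a minimal software sketch of the Smith-Waterman score recurrence with a linear gap penalty. The scoring values and the function name are assumptions chosen for illustration; this is a plain CPU version, not the FPGA design or the OCC technique from the abstract.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the alignment matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Each cell depends on its upper-left, upper and left neighbours.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because a cell needs only those three neighbours, all cells on one anti-diagonal are independent of each other, which is the parallelism that systolic arrays of processing elements exploit.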

14

Manavalan, Mani. "Fast Model-based Protein Homology Discovery without Alignment." Asia Pacific Journal of Energy and Environment 1, no. 2 (December 31, 2014): 169–84. http://dx.doi.org/10.18034/apjee.v1i2.580.

Abstract:

The need for quick gene categorization tools is growing as more genomes are sequenced. To evaluate a newly sequenced genome, the genes must first be identified and translated into amino acid sequences, which are then categorized into structural or functional classes. Protein homology detection using sequence alignment algorithms is the most effective way for protein categorization. Discriminative approaches such as support vector machines (SVMs) and position-specific scoring matrices (PSSMs) derived from PSI-BLAST have recently been used to improve alignment algorithms. However, alignment algorithms are time-consuming when a new sequence must be compared to a large number of previously published sequences; the same is true for SVMs. Building a PSSM for the new sequence is even more time-consuming. With the best-performing approaches, it would take roughly 25 hours on today's computers to classify the sequences of a novel genome (20,000 genes) as belonging to one single class, and there are hundreds of classes to choose from. Another flaw of alignment algorithms is that they do not construct a model of the positive class but instead measure the mutual distance between sequences or profiles. Multiple alignments and hidden Markov models are the only common classification approaches that create a positive class model, but they have poor classification performance. A model's advantage is that it may be evaluated for chemical features that are shared by all members of the class to get fresh insights into protein function and structure. We used LSTM to solve a well-known remote protein homology detection benchmark, in which a protein must be categorized as a member of a SCOP superfamily. LSTM achieves state-of-the-art classification performance while being significantly faster than other algorithms with similar classification performance. LSTM is five orders of magnitude quicker than the quickest SVM-based approaches and two orders of magnitude faster than methods that perform somewhat better in classification (which, however, have lower classification performance than LSTM). We applied LSTM to PROSITE classes and analyzed the derived patterns to test the modeling capabilities of the algorithm. Because it does not require established similarity metrics like BLOSUM or PAM matrices, LSTM is complementary to alignment-based techniques. The PROSITE motif was retrieved by LSTM in 8 out of 15 classes. In the remaining seven examples, alternative motifs are developed that, on average, outperform the PROSITE motifs in categorization.

15

Barton, Geoffrey J. "Protein Sequence Alignment Techniques." Acta Crystallographica Section D Biological Crystallography 54, no. 6 (November 1, 1998): 1139–46. http://dx.doi.org/10.1107/s0907444998008324.

Abstract:

The basic algorithms for alignment of two or more protein sequences are explained. Alternative methods for scoring substitutions and gaps (insertions and deletions) are described, as are global and local alignment methods. Multiple alignment techniques are explained, including methods for profile comparison. A summary is given of programs for the alignment and analysis of protein sequences, either from sequence alone, or from three-dimensional structure.
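
As a concrete instance of the global alignment methods the abstract refers to, the sketch below implements Needleman-Wunsch with a linear gap penalty and a traceback. The scoring values (`match`, `mismatch`, `gap`) and the function name are illustrative assumptions; practical protein aligners would use a substitution matrix and typically affine gap costs.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # prefix of a aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # prefix of b aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Changing the recurrence to clamp cell values at zero and starting the traceback from the highest-scoring cell turns this global method into a local one.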

16

Fang, Meng, Jiawei Xu, Nan Sun, and Stephen S. T. Yau. "Generating Minimal Models of H1N1 NS1 Gene Sequences Using Alignment-Based and Alignment-Free Algorithms." Genes 14, no. 1 (January 10, 2023): 186. http://dx.doi.org/10.3390/genes14010186.

Abstract:

For virus classification and tracing, one idea is to generate minimal models from the gene sequences of each virus group for comparative analysis within and between classes, as well as classification and tracing of new sequences. The starting point of defining a minimal model for a group of gene sequences is to find their longest common sequence (LCS), but this is a non-deterministic polynomial-time hard (NP-hard) problem. Therefore, we applied some heuristic approaches of finding LCS, as well as some of the newer methods of treating gene sequences, including multiple sequence alignment (MSA) and k-mer natural vector (NV) encoding. To evaluate our algorithms, a five-fold cross validation classification scheme on a dataset of H1N1 virus non-structural protein 1 (NS1) gene was analyzed. The results indicate that the MSA-based algorithm has the best performance measured by classification accuracy, while the NV-based algorithm exhibits advantages in the time complexity of generating minimal models.

17

PENG, YUNG-HSING, CHANG-BIAU YANG, KUO-TSUNG TSENG, and KUO-SI HUANG. "AN ALGORITHM AND APPLICATIONS TO SEQUENCE ALIGNMENT WITH WEIGHTED CONSTRAINTS." International Journal of Foundations of Computer Science 21, no. 01 (February 2010): 51–59. http://dx.doi.org/10.1142/s012905411000712x.

Abstract:

Given two sequences S1, S2, and a constrained sequence C, a longest common subsequence of S1, S2 with restriction to C is called a constrained longest common subsequence of S1 and S2 with C. At the same time, an optimal alignment of S1, S2 with restriction to C is called a constrained pairwise sequence alignment of S1 and S2 with C. Previous algorithms have shown that the constrained longest common subsequence problem is a special case of the constrained pairwise sequence alignment problem, and that both of them can be solved in O(rnm) time, where r, n, and m represent the lengths of C, S1, and S2, respectively. In this paper, we extend the definition of constrained pairwise sequence alignment to a more flexible version, called weighted constrained pairwise sequence alignment, in which some constraints might be ignored. We first give an O(rnm)-time algorithm for solving the weighted constrained pairwise sequence alignment problem, then show that our extension can be adopted to solve some constraint-related problems that cannot be solved by previous algorithms for the constrained longest common subsequence problem or the constrained pairwise sequence alignment problem. Therefore, in contrast to previous results, our extension is a new and suitable model for sequence analysis.
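
The O(rnm) bound quoted for the unweighted problem follows from a three-dimensional dynamic program. The sketch below is one way to write that standard recurrence for the constrained longest common subsequence length. It does not implement the weighted extension proposed in the paper, and the function name and the use of `None` for "no feasible subsequence" are assumptions.

```python
def constrained_lcs_length(s1: str, s2: str, c: str):
    """Length of the longest common subsequence of s1 and s2 that contains c
    as a subsequence, or None if no such common subsequence exists."""
    neg = float("-inf")  # marks infeasible states
    n, m, r = len(s1), len(s2), len(c)
    # table[k][i][j]: best length for s1[:i], s2[:j] with c[:k] already covered.
    table = [[[neg] * (m + 1) for _ in range(n + 1)] for _ in range(r + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            table[0][i][j] = 0  # with no constraint, the empty subsequence is feasible
    for k in range(r + 1):
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                best = max(table[k][i - 1][j], table[k][i][j - 1])
                if s1[i - 1] == s2[j - 1]:
                    # Use the matching character without consuming a constraint symbol.
                    best = max(best, table[k][i - 1][j - 1] + 1)
                    if k > 0 and s1[i - 1] == c[k - 1]:
                        # Use it to satisfy the k-th constraint symbol.
                        best = max(best, table[k - 1][i - 1][j - 1] + 1)
                table[k][i][j] = best
    result = table[r][n][m]
    return None if result == neg else int(result)
```

The table has (r+1)(n+1)(m+1) cells and each is filled in constant time, which gives the O(rnm) running time.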

18

GOTOH, O. "Multiple sequence alignment: Algorithms and applications." Advances in Biophysics 36 (1999): 159–206. http://dx.doi.org/10.1016/s0065-227x(99)80007-0.

19

Bafna, Vineet, Eugene L. Lawler, and Pavel A. Pevzner. "Approximation algorithms for multiple sequence alignment." Theoretical Computer Science 182, no. 1-2 (August 1997): 233–44. http://dx.doi.org/10.1016/s0304-3975(97)00023-6.

20

Pinhas, Tamar, Nimrod Milo, Gregory Kucherov, and Michal Ziv-Ukelson. "Algorithms for path-constrained sequence alignment." Journal of Discrete Algorithms 24 (January 2014): 48–58. http://dx.doi.org/10.1016/j.jda.2013.09.003.

21

Waterman, Michael S. "Parametric and ensemble sequence alignment algorithms." Bulletin of Mathematical Biology 56, no. 4 (July 1994): 743–67. http://dx.doi.org/10.1007/bf02460719.

22

WATERMAN, M. "Parametric and ensemble sequence alignment algorithms." Bulletin of Mathematical Biology 56, no. 4 (July 1994): 743–67. http://dx.doi.org/10.1016/s0092-8240(05)80311-8.

23

Jiang, Yihang, Yuankai Qi, Will Ke Wang, Brinnae Bent, Robert Avram, Jeffrey Olgin, and Jessilyn Dunn. "EventDTW: An Improved Dynamic Time Warping Algorithm for Aligning Biomedical Signals of Nonuniform Sampling Frequencies." Sensors 20, no. 9 (May 9, 2020): 2700. http://dx.doi.org/10.3390/s20092700.

Abstract:

The dynamic time warping (DTW) algorithm is widely used in pattern matching and sequence alignment tasks, including speech recognition and time series clustering. However, DTW algorithms perform poorly when aligning sequences of uneven sampling frequencies. This makes it difficult to apply DTW to practical problems, such as aligning signals that are recorded simultaneously by sensors with different, uneven, and dynamic sampling frequencies. As multi-modal sensing technologies become increasingly popular, it is necessary to develop methods for high quality alignment of such signals. Here we propose a DTW algorithm called EventDTW which uses information propagated from defined events as basis for path matching and hence sequence alignment. We have developed two metrics, the error rate (ER) and the singularity score (SS), to define and evaluate alignment quality and to enable comparison of performance across DTW algorithms. We demonstrate the utility of these metrics on 84 publicly-available signals in addition to our own multi-modal biomedical signals. EventDTW outperformed existing DTW algorithms for optimal alignment of signals with different sampling frequencies in 37% of artificial signal alignment tasks and 76% of real-world signal alignment tasks.
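
For context, the baseline that EventDTW builds on can be sketched in a few lines. The code below is classic DTW with an absolute-difference local cost; it is not EventDTW, and the function name is an assumption.

```python
def dtw_distance(x, y) -> float:
    """Classic dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(x), len(y)
    # cost[i][j]: cheapest warping path aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # A step may advance x only, y only, or both together.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because one sample may be matched to many, a sequence and a resampled copy of it can have zero cost, as in `dtw_distance([1, 2, 3], [1, 2, 2, 3])`.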

24

Wojciechowski, Pawel, Wojciech Frohmberg, Michal Kierzynka, Piotr Zurkowski, and Jacek Blazewicz. "G-MAPSEQ – a new method for mapping reads to a reference genome." Foundations of Computing and Decision Sciences 41, no. 2 (June 1, 2016): 123–42. http://dx.doi.org/10.1515/fcds-2016-0007.

Abstract:

The problem of mapping reads to a reference genome is one of the most essential problems in modern computational biology. The most popular algorithms used to solve this problem are based on the Burrows-Wheeler transform and the FM-index. However, this causes some issues with highly mutated sequences due to a limited number of mutations allowed. G-MAPSEQ is a novel, hybrid algorithm combining two interesting methods: alignment-free sequence comparison and an ultra fast sequence alignment. The former is a fast heuristic algorithm which uses k-mer characteristics of nucleotide sequences to find potential mapping places. The latter is a very fast GPU implementation of sequence alignment used to verify the correctness of these mapping positions. The source code of G-MAPSEQ along with other bioinformatic software is available at: http://gpualign.cs.put.poznan.pl.
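
The general idea of using k-mers to propose mapping positions, which are then verified by a full alignment, can be sketched as follows. This is a generic seed-and-vote illustration, not G-MAPSEQ's actual heuristic; the function names and the `min_votes` threshold are hypothetical.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference: str, k: int):
    """Map every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read: str, index, k: int, min_votes: int = 2):
    """Return reference start positions supported by at least min_votes shared k-mers."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            votes[pos - offset] += 1  # read start implied by this k-mer hit
    return sorted(start for start, count in votes.items() if count >= min_votes)
```

Only the surviving candidates would then be passed to the expensive alignment step for verification.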

25

Alser, Mohammed, Hasan Hassan, Akash Kumar, Onur Mutlu, and Can Alkan. "Shouji: a fast and efficient pre-alignment filter for sequence alignment." Bioinformatics 35, no. 21 (March 28, 2019): 4255–63. http://dx.doi.org/10.1093/bioinformatics/btz234.

Abstract:

Motivation: The ability to generate massive amounts of sequencing data continues to overwhelm the processing capability of existing algorithms and compute infrastructures. In this work, we explore the use of hardware/software co-design and hardware acceleration to significantly reduce the execution time of short sequence alignment, a crucial step in analyzing sequenced genomes. We introduce Shouji, a highly parallel and accurate pre-alignment filter that remarkably reduces the need for computationally-costly dynamic programming algorithms. The first key idea of our proposed pre-alignment filter is to provide high filtering accuracy by correctly detecting all common subsequences shared between two given sequences. The second key idea is to design a hardware accelerator that adopts modern field-programmable gate array (FPGA) architectures to further boost the performance of our algorithm. Results: Shouji significantly improves the accuracy of pre-alignment filtering by up to two orders of magnitude compared to the state-of-the-art pre-alignment filters, GateKeeper and SHD. Our FPGA-based accelerator is up to three orders of magnitude faster than the equivalent CPU implementation of Shouji. Using a single FPGA chip, we benchmark the benefits of integrating Shouji with five state-of-the-art sequence aligners, designed for different computing platforms. The addition of Shouji as a pre-alignment step reduces the execution time of the five state-of-the-art sequence aligners by up to 18.8×. Shouji can be adapted for any bioinformatics pipeline that performs sequence alignment for verification. Unlike most existing methods that aim to accelerate sequence alignment, Shouji does not sacrifice any of the aligner capabilities, as it does not modify or replace the alignment step. Availability and implementation: https://github.com/CMU-SAFARI/Shouji. Supplementary information: Supplementary data are available at Bioinformatics online.

26

Daugelaite, Jurate, Aisling O'Driscoll, and Roy D. Sleator. "An Overview of Multiple Sequence Alignments and Cloud Computing in Bioinformatics." ISRN Biomathematics 2013 (August 14, 2013): 1–14. http://dx.doi.org/10.1155/2013/615630.

Abstract:

Multiple sequence alignment (MSA) of DNA, RNA, and protein sequences is one of the most essential techniques in the fields of molecular biology, computational biology, and bioinformatics. Next-generation sequencing technologies are changing the biology landscape, flooding the databases with massive amounts of raw sequence data. MSA of ever-increasing sequence data sets is becoming a significant bottleneck. In order to realise the promise of MSA for large-scale sequence data sets, it is necessary for existing MSA algorithms to be run in a parallelised fashion with the sequence data distributed over a computing cluster or server farm. Combining MSA algorithms with cloud computing technologies is therefore likely to improve the speed, quality, and capability for MSA to handle large numbers of sequences. In this review, multiple sequence alignments are discussed, with a specific focus on the ClustalW and Clustal Omega algorithms. Cloud computing technologies and concepts are outlined, and the next generation of cloud-based MSA algorithms is introduced.

27

WANG, ZHUOZHI, and KAIZHONG ZHANG. "MULTIPLE RNA STRUCTURE ALIGNMENT." Journal of Bioinformatics and Computational Biology 03, no. 03 (June 2005): 609–26. http://dx.doi.org/10.1142/s0219720005001296.

Abstract:

Ribonucleic Acid (RNA) structures can be viewed as a special kind of string in which characters can bond with each other. The question of aligning two RNA structures has been studied for a while, and there are several successful algorithms based upon different models. In this paper, by adopting the model introduced in Wang and Zhang [19], we propose two algorithms to attack the question of aligning multiple RNA structures. Our methods reduce the multiple RNA structure alignment problem to the problem of aligning two RNA structure alignments. We also show that the framework of the sequence center star alignment algorithm can be applied to the problem of multiple RNA structure alignment, and that if the triangle inequality is met in the scoring matrix, the approximation ratio of the algorithm remains [Formula: see text], where n is the total number of structures.
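
As a rough illustration of the center star framework mentioned in this abstract, the sketch below picks the "center" of a set of plain strings: the one with the smallest total distance to all others. It is not the authors' RNA structure method. Using edit distance as the pairwise distance, and the function names `edit_distance` and `choose_center`, are assumptions made for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit costs, an assumption)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete from a
                            curr[j - 1] + 1,             # insert into a
                            prev[j - 1] + (ca != cb)))   # match or substitute
        prev = curr
    return prev[-1]


def choose_center(seqs):
    """Return the index of the sequence minimizing the summed distance to all others."""
    totals = [sum(edit_distance(s, t) for t in seqs) for s in seqs]
    return min(range(len(seqs)), key=totals.__getitem__)


if __name__ == "__main__":
    seqs = ["ACGT", "ACGA", "TCGT", "ACG"]
    print(seqs[choose_center(seqs)])
```

In the center star scheme, every other sequence is then aligned pairwise to the chosen center and the pairwise alignments are merged; that merging step is omitted here.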

28

Spirollari, Junilda, Jason T. L. Wang, Kaizhong Zhang, Vivian Bellofatto, Yongkyu Park, and Bruce A. Shapiro. "Predicting Consensus Structures for RNA Alignments via Pseudo-Energy Minimization." Bioinformatics and Biology Insights 3 (January 2009): BBI.S2578. http://dx.doi.org/10.4137/bbi.s2578.

Abstract:

Thermodynamic processes with free energy parameters are often used in algorithms that solve the free energy minimization problem to predict secondary structures of single RNA sequences. While results from these algorithms are promising, an observation is that single sequence-based methods have moderate accuracy and more information is needed to improve on RNA secondary structure prediction, such as covariance scores obtained from multiple sequence alignments. We present in this paper a new approach to predicting the consensus secondary structure of a set of aligned RNA sequences via pseudo-energy minimization. Our tool, called RSpredict, takes into account sequence covariation and employs effective heuristics for accuracy improvement. RSpredict accepts, as input data, a multiple sequence alignment in FASTA or ClustalW format and outputs the consensus secondary structure of the input sequences in both the Vienna style Dot Bracket format and the Connectivity Table format. Our method was compared with some widely used tools including KNetFold, Pfold and RNAalifold. A comprehensive test on different datasets including Rfam sequence alignments and a multiple sequence alignment obtained from our study on the Drosophila X chromosome reveals that RSpredict is competitive with the existing tools on the tested datasets. RSpredict is freely available online as a web server and also as a jar file for download at http://datalab.njit.edu/biology/RSpredict.

29

Chao, Jiannan, Furong Tang, and Lei Xu. "Developments in Algorithms for Sequence Alignment: A Review." Biomolecules 12, no. 4 (April 6, 2022): 546. http://dx.doi.org/10.3390/biom12040546.

Abstract:

The continuous development of sequencing technologies has enabled researchers to obtain large amounts of biological sequence data, and this has resulted in increasing demands for software that can perform sequence alignment fast and accurately. A number of algorithms and tools for sequence alignment have been designed to meet the various needs of biologists. Here, the ideas that prevail in the research of sequence alignment and some quality estimation methods for multiple sequence alignment tools are summarized.

30

Wilburn, Grey W., and Sean R. Eddy. "Remote homology search with hidden Potts models." PLOS Computational Biology 16, no. 11 (November 30, 2020): e1008085. http://dx.doi.org/10.1371/journal.pcbi.1008085.

Abstract:

Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof of principle experiments.

31

Sauder, J. Michael, Jonathan W. Arthur, and Roland L. Dunbrack Jr. "Large-scale comparison of protein sequence alignment algorithms with structure alignments." Proteins: Structure, Function, and Genetics 40, no. 1 (July 1, 2000): 6–22. http://dx.doi.org/10.1002/(sici)1097-0134(20000701)40:1<6::aid-prot30>3.0.co;2-7.

32

Paten, B., D. Earl, N. Nguyen, M. Diekhans, D. Zerbino, and D. Haussler. "Cactus: Algorithms for genome multiple sequence alignment." Genome Research 21, no. 9 (June 10, 2011): 1512–28. http://dx.doi.org/10.1101/gr.123356.111.

33

Shahab, Muhammad Luthfi, and Mohammad Isa Irawan. "Sequence Alignment Using Nature-Inspired Metaheuristic Algorithms." International Journal of Computing Science and Applied Mathematics 3, no. 1 (March 1, 2017): 27. http://dx.doi.org/10.12962/j24775401.v3i1.2118.

34

Notredame, Cédric. "Recent Evolutions of Multiple Sequence Alignment Algorithms." PLoS Computational Biology 3, no. 8 (August 31, 2007): e123. http://dx.doi.org/10.1371/journal.pcbi.0030123.

35

Rangwala, H., and G. Karypis. "Incremental window-based protein sequence alignment algorithms." Bioinformatics 23, no. 2 (January 15, 2007): e17-e23. http://dx.doi.org/10.1093/bioinformatics/btl297.

36

Catanach, Therese A., Andrew D. Sweet, Nam-phuong D. Nguyen, Rhiannon M. Peery, Andrew H. Debevec, Andrea K. Thomer, Amanda C. Owings, et al. "Fully automated sequence alignment methods are comparable to, and much faster than, traditional methods in large data sets: an example with hepatitis B virus." PeerJ 7 (January 3, 2019): e6142. http://dx.doi.org/10.7717/peerj.6142.

Abstract:

Aligning sequences for phylogenetic analysis (multiple sequence alignment; MSA) is an important, but increasingly computationally expensive step with the recent surge in DNA sequence data. Much of this sequence data is publicly available, but can be extremely fragmentary (i.e., a combination of full genomes and genomic fragments), which can compound the computational issues related to MSA. Traditionally, alignments are produced with automated algorithms and then checked and/or corrected “by eye” prior to phylogenetic inference. However, this manual curation is inefficient at the data scales required of modern phylogenetics and results in alignments that are not reproducible. Recently, methods have been developed for fully automating alignments of large data sets, but it is unclear if these methods produce alignments that result in compatible phylogenies when compared to more traditional alignment approaches that combined automated and manual methods. Here we use approximately 33,000 publicly available sequences from the hepatitis B virus (HBV), a globally distributed and rapidly evolving virus, to compare different alignment approaches. Using one data set comprised exclusively of whole genomes and a second that also included sequence fragments, we compared three MSA methods: (1) a purely automated approach using traditional software, (2) an automated approach including by eye manual editing, and (3) more recent fully automated approaches. To understand how these methods affect phylogenetic results, we compared resulting tree topologies based on these different alignment methods using multiple metrics. We further determined if the monophyly of existing HBV genotypes was supported in phylogenies estimated from each alignment type and under different statistical support thresholds. Traditional and fully automated alignments produced similar HBV phylogenies. 
Although there was variability between branch support thresholds, allowing lower support thresholds tended to result in more differences among trees. Therefore, differences between the trees could be best explained by phylogenetic uncertainty unrelated to the MSA method used. Nevertheless, automated alignment approaches did not require human intervention and were therefore considerably less time-intensive than traditional approaches. Because of this, we conclude that fully automated algorithms for MSA are fully compatible with older methods even in extremely difficult to align data sets. Additionally, we found that most HBV diagnostic genotypes did not correspond to evolutionarily-sound groups, regardless of alignment type and support threshold. This suggests there may be errors in genotype classification in the database or that HBV genotypes may need a revision.

37

Schroedl, S. "An Improved Search Algorithm for Optimal Multiple-Sequence Alignment." Journal of Artificial Intelligence Research 23 (May 1, 2005): 587–623. http://dx.doi.org/10.1613/jair.1534.

Abstract:

Multiple sequence alignment (MSA) is a ubiquitous problem in computational biology. Although it is NP-hard to find an optimal solution for an arbitrary number of sequences, due to the importance of this problem researchers are trying to push the limits of exact algorithms further. Since MSA can be cast as a classical path finding problem, it is attracting a growing number of AI researchers interested in heuristic search algorithms as a challenge with actual practical relevance. In this paper, we first review two previous, complementary lines of research. Based on Hirschberg's algorithm, Dynamic Programming needs O(kN^(k-1)) space to store both the search frontier and the nodes needed to reconstruct the solution path, for k sequences of length N. Best first search, on the other hand, has the advantage of bounding the search space that has to be explored using a heuristic. However, it is necessary to maintain all explored nodes up to the final solution in order to prevent the search from re-expanding them at higher cost. Earlier approaches to reduce the Closed list are either incompatible with pruning methods for the Open list, or must retain at least the boundary of the Closed list. In this article, we present an algorithm that attempts at combining the respective advantages; like A* it uses a heuristic for pruning the search space, but reduces both the maximum Open and Closed size to O(kN^(k-1)), as in Dynamic Programming. The underlying idea is to conduct a series of searches with successively increasing upper bounds, but using the DP ordering as the key for the Open priority queue. With a suitable choice of thresholds, in practice, a running time below four times that of A* can be expected. In our experiments we show that our algorithm outperforms one of the currently most successful algorithms for optimal multiple sequence alignments, Partial Expansion A*, both in time and memory. 
Moreover, we apply a refined heuristic based on optimal alignments not only of pairs of sequences, but of larger subsets. This idea is not new; however, to make it practically relevant we show that it is equally important to bound the heuristic computation appropriately, or the overhead can obliterate any possible gain. Furthermore, we discuss a number of improvements in time and space efficiency with regard to practical implementations. Our algorithm, used in conjunction with higher-dimensional heuristics, is able to calculate for the first time the optimal alignment for almost all of the problems in Reference 1 of the benchmark database BAliBASE.

38

Kostenko, Dimitrii O., and Eugene V. Korotkov. "Application of the MAHDS Method for Multiple Alignment of Highly Diverged Amino Acid Sequences." International Journal of Molecular Sciences 23, no. 7 (March 29, 2022): 3764. http://dx.doi.org/10.3390/ijms23073764.

Abstract:

The aim of this work was to compare the multiple alignment methods MAHDS, T-Coffee, MUSCLE, Clustal Omega, Kalign, MAFFT, and PRANK in their ability to align highly divergent amino acid sequences. To accomplish this, we created test amino acid sequences with an average number of substitutions per amino acid (x) from 0.6 to 5.6, a total of 81 sets. Comparison of the performance of sequence alignments constructed by MAHDS and previously developed algorithms using the CS and Z score criteria and the benchmark alignment database (BAliBASE) indicated that, although the quality of the alignments built with MAHDS was somewhat lower than that of the other algorithms, it was compensated by greater statistical significance. MAHDS could construct statistically significant alignments of artificial sequences with x ≤ 4.8, whereas the other algorithms (T-Coffee, MUSCLE, Clustal Omega, Kalign, MAFFT, and PRANK) could not perform that at x > 2.4. The application of MAHDS to align 21 families of highly diverged proteins (identity < 20%) from Pfam and HOMSTRAD databases showed that it could calculate statistically significant alignments in cases when the other methods failed. Thus, MAHDS could be used to construct statistically significant multiple alignments of highly divergent protein sequences, which accumulated multiple mutations during evolution.

39

Kalinin, M., and V. Krundyshev. "Sequence Alignment Algorithms for Intrusion Detection in the Internet of Things." Nonlinear Phenomena in Complex Systems 23, no. 4 (December 4, 2020): 397–404. http://dx.doi.org/10.33581/1561-4085-2020-23-4-397-404.

Abstract:

The paper reviews an intrusion detection approach based on bioinformatics algorithms for aligning and comparing nucleotide sequences. Sequence alignment is a nature-inspired computational procedure for matching coded strings by searching for regions of individual characteristics that are located in the same order. A calculated similarity rank is used instead of equality checking to estimate the distance between a sequence of monitored operational acts and a generalized intrusion pattern. The multiple alignment scheme is more effective and accurate than Smith–Waterman local alignment because it can find a few blocks of similarity. In comparison with a traditional signature-based IDS, the nature-inspired approach is found to provide better working characteristics. The experimental study has shown that the new approach achieves a high level of accuracy, 99 percent.
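
For orientation, here is a minimal sketch of Smith–Waterman local alignment scoring, the baseline this abstract compares against. It is not the authors' detector. The scoring values (match +2, mismatch −1, gap −1), the linear gap penalty, and the name `smith_waterman_score` are assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the DP table,
    since only the score (not the traceback) is needed here.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Hypothetical event strings: one letter per monitored operation.
    observed = "XXACGTYY"
    pattern = "ZZACGTWW"
    print(smith_waterman_score(observed, pattern))
```

A similarity rank in the spirit of the abstract could be obtained by dividing the score by the maximum achievable score for the pattern, though the normalization actually used in the paper is not stated here.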

40

Zhan, Qing, Yilei Fu, Qinghua Jiang, Bo Liu, Jiajie Peng, and Yadong Wang. "SpliVert: A Protein Multiple Sequence Alignment Refinement Method Based on Splitting-Splicing Vertically." Protein & Peptide Letters 27, no. 4 (March 17, 2020): 295–302. http://dx.doi.org/10.2174/0929866526666190806143959.

Abstract:

Background: Multiple Sequence Alignment (MSA) is a fundamental task in bioinformatics and is required for many biological analysis tasks. The more accurate the alignments are, the more credible the downstream analyses. Most protein MSA algorithms refine an alignment by dividing it into two groups horizontally and then realigning the two groups. However, this strategy does not consider that different regions of the sequences have different conservation; this property may lead to incorrect residue-residue or residue-gap pairs, which cannot be corrected by this strategy. Objective: In this article, our motivation is to develop a novel refinement method based on splitting-splicing vertically. Method: Here, we present a novel refinement method based on splitting-splicing vertically, called SpliVert. For an alignment, we split it vertically into 3 parts, remove the gap characters in the middle, realign the middle part alone, and splice the realigned middle part with the other two initial pieces to obtain a refined alignment. In the realignment procedure of our method, the aligner focuses only on a certain part, ignoring the disturbance of the other parts, which could help fix the incorrect pairs. Results: We tested our refinement strategy for 2 leading MSA tools on 3 standard benchmarks, according to the commonly used average SP (and TC) score. The results show that given appropriate proportions to split the initial alignment, the average scores are increased comparably or slightly after using our method. We also compared the alignments refined by our method with alignments directly refined by the original alignment tools. The results suggest that using our SpliVert method to refine alignments can also outperform direct use of the original alignment tools. Conclusion: The results reveal that splitting vertically and realigning part of the alignment is a good strategy for the refinement of protein multiple sequence alignments.
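
The split, realign, and splice steps described in the Method part can be sketched as follows. This is only an illustration of the data flow, not the SpliVert implementation: the function names are hypothetical, the column boundaries are supplied by the caller, and `pad_right` is a placeholder standing in for a real MSA tool.

```python
def split_realign_splice(alignment, start, end, realign):
    """Split rows at columns [start, end), realign the ungapped middle, splice back.

    alignment: list of equal-length gapped strings ('-' marks a gap).
    realign:   callable taking ungapped strings and returning equal-length gapped rows.
    """
    left = [row[:start] for row in alignment]
    middle = [row[start:end].replace("-", "") for row in alignment]
    right = [row[end:] for row in alignment]
    new_middle = realign(middle)
    if len({len(r) for r in new_middle}) != 1:
        raise ValueError("realigner must return rows of equal length")
    return [l + m + r for l, m, r in zip(left, new_middle, right)]


def pad_right(seqs):
    """Placeholder 'aligner' (an assumption): left-justify and pad with gaps."""
    width = max(len(s) for s in seqs)
    return [s.ljust(width, "-") for s in seqs]


if __name__ == "__main__":
    msa = ["AC-G-TA", "ACG--TA", "A-CGGTA"]
    for row in split_realign_splice(msa, 2, 5, pad_right):
        print(row)
```

In practice the `realign` argument would wrap a call to an actual aligner; the left and right pieces are left untouched, which is what keeps the refinement local to the chosen region.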

41

Silva, Fernando José Mateus da, Juan Manuel Sánchez Pérez, Juan Antonio Gómez Pulido, and Miguel A. Vega Rodríguez. "Parallel Niche Pareto AlineaGA – an Evolutionary Multiobjective approach on Multiple Sequence Alignment." Journal of Integrative Bioinformatics 8, no. 3 (December 1, 2011): 57–72. http://dx.doi.org/10.1515/jib-2011-174.

Abstract:

Summary: Multiple sequence alignment is one of the most recurrent tasks in bioinformatics. This method allows organizing a set of molecular sequences in order to expose their similarities and their differences. Although exact methods exist for solving this problem, their use is limited by the computing demands which are necessary for exploring such a large and complex search space. Genetic Algorithms are adaptive search methods which perform well in large and complex spaces. Parallel Genetic Algorithms not only speed up the search but also improve its efficiency, presenting results that are better than those provided by the sum of several sequential Genetic Algorithms. Although these methods are often used to optimize a single objective, they can also be used in multidimensional domains, finding all possible tradeoffs among multiple conflicting objectives. Parallel AlineaGA is an Evolutionary Algorithm which uses a Parallel Genetic Algorithm for performing multiple sequence alignment. We now present the Parallel Niche Pareto AlineaGA, a multiobjective version of Parallel AlineaGA. We compare the performance of both versions using eight BAliBASE datasets. We also compare the quality of the obtained solutions with those achieved by T-Coffee and ClustalW2, observing that our algorithm reaches better solutions in the majority of the datasets.

42

Gupta, Ruchi, and Pankaj Agarwal. "SOGA: space oriented genetic algorithm for multiple sequence alignment." International Journal of Engineering & Technology 7, no. 4.5 (September 22, 2018): 481. http://dx.doi.org/10.14419/ijet.v7i4.5.21138.

Abstract:

Multiple sequence alignment is one of the recurrent tasks in bioinformatics. This method allows organizing a set of molecular sequences in order to expose their similarities and their differences. Although several applicable techniques were observed in this research, from traditional methods such as dynamic programming to widely used stochastic optimization methods such as simulated annealing and motif finding, their use is limited by the computing demands necessary for exploring such a large and complex search space. This paper presents a new genetic algorithm, SOGA (Space Oriented Genetic Algorithm for Multiple Sequence Alignment), which has two new mechanisms: the first generates the population by randomly inserting spaces between the selected sequences, and the second applies new crossover and mutation operators, within an iterative process, to generate new and better solutions. This method is simple and fast. Its performance will be further tested on standard benchmark databases and compared with well-known algorithms. However, its solutions make clear that there is scope for further improvement.
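
One plausible reading of the first mechanism, generating an initial individual by inserting gap characters at random positions, is sketched below. This is not the SOGA code: the function name, the `extra` padding parameter, and the choice to pad every row to a common width are assumptions made for the example.

```python
import random


def random_gapped_individual(seqs, extra=2, rng=None):
    """Build one candidate alignment by inserting '-' at random positions.

    Every row is padded to the same width: the longest input length plus
    `extra` columns (a hypothetical parameter, not taken from the paper).
    """
    rng = rng or random.Random()
    width = max(len(s) for s in seqs) + extra
    rows = []
    for s in seqs:
        chars = list(s)
        for _ in range(width - len(s)):
            # randint is inclusive, so a gap may also be appended at the end.
            chars.insert(rng.randint(0, len(chars)), "-")
        rows.append("".join(chars))
    return rows


if __name__ == "__main__":
    population = [random_gapped_individual(["ACGT", "AG", "ACT"]) for _ in range(3)]
    for individual in population:
        print(individual)
```

Each individual preserves the residue order of every input sequence, so crossover and mutation operators only need to move gaps around.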

43

Song, Yinglei, Chunmei Liu, Xiuzhen Huang, Russell Malmberg, Ying Xu, and Liming Cai. "Efficient Parameterized Algorithms for Biopolymer Structure-Sequence Alignment." IEEE/ACM Transactions on Computational Biology and Bioinformatics 3, no. 4 (October 2006): 423–32. http://dx.doi.org/10.1109/tcbb.2006.52.

44

Liu Weiguo, B. Schmidt, G. Voss, and W. Muller-Wittig. "Streaming Algorithms for Biological Sequence Alignment on GPUs." IEEE Transactions on Parallel and Distributed Systems 18, no. 9 (September 2007): 1270–81. http://dx.doi.org/10.1109/tpds.2007.1069.

45

Chung, Yun-Sheng, Chin Lung Lu, and Chuan Yi Tang. "Efficient algorithms for regular expression constrained sequence alignment." Information Processing Letters 103, no. 6 (September 2007): 240–46. http://dx.doi.org/10.1016/j.ipl.2007.04.007.

46

Le Thi, Hoai An, Tao Pham Dinh, and Moulay Belghiti. "DCA based algorithms for multiple sequence alignment (MSA)." Central European Journal of Operations Research 22, no. 3 (September 1, 2013): 501–24. http://dx.doi.org/10.1007/s10100-013-0324-5.

47

Ben Othman, Mohamed Tahar. "Survey of the use of genetic algorithm for multiple sequence alignment." Journal of Advanced Computer Science & Technology 5, no. 2 (June 8, 2016): 28. http://dx.doi.org/10.14419/jacst.v5i2.6079.

Abstract:

Multiple Sequence Alignment (MSA) is used in genomic analysis, such as the identification of conserved sequence motifs, the estimation of evolutionary divergence between sequences, and the inference of genes’ historical relationships. Several studies have been conducted to determine the level of similarity of a set of sequences. Because the problem is NP-complete, a number of studies use genetic algorithms (GA) to find a solution to the multiple sequence alignment. However, the nature of genetic algorithms makes the complexity extremely high due to the redundancy provided by the different operators. The aim of this paper is to study some proposed GA solutions provided for MSA and to compare them using some criteria which we believe any solution should comply with in matters of representativeness, closeness and original sequence invariance.

48

Ho, Jiacang, and Dae-Ki Kang. "Sequence Alignment with Dynamic Divisor Generation for Keystroke Dynamics Based User Authentication." Journal of Sensors 2015 (2015): 1–14. http://dx.doi.org/10.1155/2015/935986.

Abstract:

Keystroke dynamics based authentication is one of the prevention mechanisms used to protect one’s account from criminals’ illegal access. In this authentication mechanism, keystroke dynamics are used to capture patterns in a user’s typing behavior. Sequence alignment has been shown to be one of the effective algorithms for keystroke dynamics based authentication, comparing sequences of keystroke data to detect an imposter’s anomalous sequences. In previous research, a static divisor has been used for sequence generation from the keystroke data; it is a number used to divide a time difference of keystroke data into equal-length subintervals. After the division, the subintervals are mapped to alphabet letters to form sequences. One major drawback of this static divisor is that the amount of data for this subinterval generation is often insufficient, which leads to premature termination of subinterval generation and consequently causes inaccurate sequence alignment. To alleviate this problem, we introduce sequence alignment of dynamic divisor (SADD) in this paper. In SADD, we use mean of Horner’s rule technique to generate dynamic divisors and apply them to produce subintervals of different lengths. The comparative experimental results with SADD and other existing algorithms indicate that SADD is usually comparable to and often outperforms other existing algorithms.
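
The static-divisor encoding that this abstract describes as prior work can be sketched roughly as below. The details are assumptions, not taken from the paper: time differences are given in milliseconds, each one is bucketed by integer division, buckets map to the letters A to Z, and values beyond the last bucket are clamped to Z. The function name is hypothetical.

```python
import string


def encode_intervals(time_diffs_ms, divisor_ms):
    """Map keystroke time differences to letters using one fixed divisor.

    Bucket k covers [k * divisor_ms, (k + 1) * divisor_ms); buckets past 'Z'
    are clamped to 'Z', and negative inputs are clamped to 'A'.
    """
    if divisor_ms <= 0:
        raise ValueError("divisor_ms must be positive")
    letters = string.ascii_uppercase
    last = len(letters) - 1
    return "".join(
        letters[max(0, min(int(d // divisor_ms), last))] for d in time_diffs_ms
    )


if __name__ == "__main__":
    # Hypothetical gaps between consecutive key presses, in milliseconds.
    print(encode_intervals([30, 120, 250, 95], 50))
```

The resulting letter strings are what a sequence alignment routine would then compare; SADD's dynamic divisors would replace the single `divisor_ms` value, and that part is not reproduced here.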

49

Lipták, Panna, Attila Kiss, and János Márk Szalai-Gindl. "Heuristic Pairwise Alignment in Database Environments." Genes 13, no. 11 (November 2, 2022): 2005. http://dx.doi.org/10.3390/genes13112005.

Abstract:

Biological data have gained wider recognition during the last few years, although managing and processing these data in an efficient way remains a challenge in many areas. Increasingly, more DNA sequence databases can be accessed; however, most algorithms on these sequences are performed outside of the database with different bioinformatics software. In this article, we propose a novel approach for the comparative analysis of sequences, thereby defining heuristic pairwise alignment inside the database environment. This method takes advantage of the benefits provided by the database management system and presents a way to exploit similarities in data sets to quicken the alignment algorithm. We work with the column-oriented MonetDB, and we further discuss the key benefits of this database system in relation to our proposed heuristic approach.

50

Korotkov, Eugene V., Yulia M. Suvorova, Dmitrii O. Kostenko, and Maria A. Korotkova. "Multiple Alignment of Promoter Sequences from the Arabidopsis thaliana L. Genome." Genes 12, no. 2 (January 21, 2021): 135. http://dx.doi.org/10.3390/genes12020135.

Abstract:

In this study, we developed a new mathematical method for performing multiple alignment of highly divergent sequences (MAHDS), i.e., sequences that have on average more than 2.5 substitutions per position (x). We generated sets of artificial DNA sequences with x ranging from 0 to 4.4 and applied MAHDS as well as currently used multiple sequence alignment algorithms, including ClustalW, MAFFT, T-Coffee, Kalign, and Muscle to these sets. The results indicated that most of the existing methods could produce statistically significant alignments only for the sets with x < 2.5, whereas MAHDS could operate on sequences with x = 4.4. We also used MAHDS to analyze a set of promoter sequences from the Arabidopsis thaliana genome and discovered many conserved regions upstream of the transcription initiation site (from −499 to +1 bp); a part of the downstream region (from +1 to +70 bp) also significantly contributed to the obtained alignments. The possibilities of applying the newly developed method for the identification of promoter sequences in any genome are discussed. A server for multiple alignment of nucleotide sequences has been created.
