Journal articles on the topic 'Sequence alignment'

1

Staritzbichler, René, Edoardo Sarti, Emily Yaklich, Antoniya Aleksandrova, Marcus Stamm, Kamil Khafizov, and Lucy R. Forrest. "Refining pairwise sequence alignments of membrane proteins by the incorporation of anchors." PLOS ONE 16, no. 4 (April 30, 2021): e0239881. http://dx.doi.org/10.1371/journal.pone.0239881.

Abstract:
The alignment of primary sequences is a fundamental step in the analysis of protein structure, function, and evolution, and in the generation of homology-based models. Integral membrane proteins pose a significant challenge for such sequence alignment approaches, because their evolutionary relationships can be very remote, and because a high content of hydrophobic amino acids reduces their complexity. Frequently, biochemical or biophysical data is available that informs the optimum alignment, for example, indicating specific positions that share common functional or structural roles. Currently, if those positions are not correctly matched by a standard pairwise sequence alignment procedure, the incorporation of such information into the alignment is typically addressed in an ad hoc manner, with manual adjustments. However, such modifications are problematic because they reduce the robustness and reproducibility of the aligned regions either side of the newly matched positions. Previous studies have introduced restraints as a means to impose the matching of positions during sequence alignments, originally in the context of genome assembly. Here we introduce position restraints, or “anchors” as a feature in our alignment tool AlignMe, providing an aid to pairwise global sequence alignment of alpha-helical membrane proteins. Applying this approach to realistic scenarios involving distantly-related and low complexity sequences, we illustrate how the addition of anchors can be used to modify alignments, while still maintaining the reproducibility and rigor of the rest of the alignment. Anchored alignments can be generated using the online version of AlignMe available at www.bioinfo.mpg.de/AlignMe/.
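The idea of position restraints can be illustrated with a minimal sketch. It assumes hard anchors (each anchored pair must be matched) and a plain Needleman-Wunsch aligner with a linear gap penalty; the function names, scores, and the segment-wise strategy are illustrative assumptions and are not taken from AlignMe.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2) -> Tuple[str, str]:
    """Global alignment of a and b with a linear gap penalty (illustrative scores)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def anchored_alignment(a: str, b: str,
                       anchors: List[Tuple[int, int]]) -> Tuple[str, str]:
    """Align a and b so that a[i] is matched with b[j] for every anchor (i, j).

    Assumption: anchors are 0-based and strictly increasing in both sequences.
    The segments between anchors are aligned independently.
    """
    out_a, out_b = [], []
    prev_i, prev_j = 0, 0
    for i, j in sorted(anchors):
        if i < prev_i or j < prev_j:
            raise ValueError("anchors must be strictly increasing in both sequences")
        seg_a, seg_b = needleman_wunsch(a[prev_i:i], b[prev_j:j])
        out_a += [seg_a, a[i]]  # the anchored column is emitted as-is
        out_b += [seg_b, b[j]]
        prev_i, prev_j = i + 1, j + 1
    tail_a, tail_b = needleman_wunsch(a[prev_i:], b[prev_j:])
    return "".join(out_a) + tail_a, "".join(out_b) + tail_b


aligned_a, aligned_b = anchored_alignment("ACGT", "AGT", anchors=[(1, 1)])
```

Because the regions between anchors are aligned by the same procedure every time, the result stays reproducible, unlike a manual adjustment. A soft restraint (a score bonus at the anchored cell rather than a forced match) would be an alternative design.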
2

Pervez, Muhammad Tariq, Hayat Ali Shah, Masroor Ellahi Babar, Nasir Naveed, and Muhammad Shoaib. "SAliBASE: A Database of Simulated Protein Alignments." Evolutionary Bioinformatics 15 (January 2019): 117693431882108. http://dx.doi.org/10.1177/1176934318821080.

Abstract:
Simulated alignments are alternatives to manually constructed multiple sequence alignments for evaluating the performance of multiple sequence alignment tools. The importance of simulated sequences is recognized because their true evolutionary history is known, which is very helpful for reconstructing accurate phylogenetic trees and alignments. However, generating simulated alignments requires expertise with bioinformatics tools and takes several hours for even a few hundred simulated sequences. It becomes a tedious job for an end user who needs a few datasets covering a variety of simulated sequences. Currently, there is no databank from which researchers can download simulated sequences or alignments for their studies. The major focus of our study was to develop a database of simulated protein sequences (SAliBASE) based on varying parameters such as insertion rate, deletion rate, sequence length, number of sequences, and indel size. Each dataset has a corresponding alignment as well. This repository is very useful for evaluating multiple alignment methods.
3

Martin, Andrew C. R. "Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV)." F1000Research 3 (October 23, 2014): 249. http://dx.doi.org/10.12688/f1000research.5486.1.

Abstract:
The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and 'dotifying' repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/.
4

Arenas-Díaz, Edgar D., Helga Ochoterena, and Katya Rodríguez-Vázquez. "Multiple Sequence Alignment Using a Genetic Algorithm and GLOCSA." Journal of Artificial Evolution and Applications 2009 (August 27, 2009): 1–10. http://dx.doi.org/10.1155/2009/963150.

Abstract:
Algorithms that minimize putative synapomorphy in an alignment cannot be directly implemented since trivial cases with concatenated sequences would be selected because they would imply a minimum number of events to be explained (e.g., a single insertion/deletion would be required to explain divergence among two sequences). Therefore, indirect measures to approach parsimony need to be implemented. In this paper, we thoroughly present a Global Criterion for Sequence Alignment (GLOCSA) that uses a scoring function to globally rate multiple alignments aiming to produce matrices that minimize the number of putative synapomorphies. We also present a Genetic Algorithm that uses GLOCSA as the objective function to produce sequence alignments by refining alignments previously generated by other existing alignment tools (we recommend MUSCLE). We show that in the example cases our GLOCSA-guided Genetic Algorithm (GGGA) does improve the GLOCSA values, resulting in alignments that imply fewer putative synapomorphies.
5

Aadland, Kelsey, and Bryan Kolaczkowski. "Alignment-Integrated Reconstruction of Ancestral Sequences Improves Accuracy." Genome Biology and Evolution 12, no. 9 (August 12, 2020): 1549–65. http://dx.doi.org/10.1093/gbe/evaa164.

Abstract:
Ancestral sequence reconstruction (ASR) uses an alignment of extant protein sequences, a phylogeny describing the history of the protein family and a model of the molecular-evolutionary process to infer the sequences of ancient proteins, allowing researchers to directly investigate the impact of sequence evolution on protein structure and function. Like all statistical inferences, ASR can be sensitive to violations of its underlying assumptions. Previous studies have shown that, whereas phylogenetic uncertainty has only a very weak impact on ASR accuracy, uncertainty in the protein sequence alignment can more strongly affect inferred ancestral sequences. Here, we show that errors in sequence alignment can produce errors in ASR across a range of realistic and simplified evolutionary scenarios. Importantly, sequence reconstruction errors can lead to errors in estimates of structural and functional properties of ancestral proteins, potentially undermining the reliability of analyses relying on ASR. We introduce an alignment-integrated ASR approach that combines information from many different sequence alignments. We show that integrating alignment uncertainty improves ASR accuracy and the accuracy of downstream structural and functional inferences, often performing as well as highly accurate structure-guided alignment. Given the growing evidence that sequence alignment errors can impact the reliability of ASR studies, we recommend that future studies incorporate approaches to mitigate the impact of alignment uncertainty. Probabilistic modeling of insertion and deletion events has the potential to radically improve ASR accuracy when the model reflects the true underlying evolutionary history, but further studies are required to thoroughly evaluate the reliability of these approaches under realistic conditions.
6

Wang, Yi, and Kuo-Bin Li. "Multiple Sequence Alignment Using an Exhaustive and Greedy Algorithm." Journal of Bioinformatics and Computational Biology 03, no. 02 (April 2005): 243–55. http://dx.doi.org/10.1142/s021972000500103x.

Abstract:
We describe an exhaustive and greedy algorithm for improving the accuracy of multiple sequence alignment. A simple progressive alignment approach is employed to provide initial alignments. The initial alignment is then iteratively optimized against an objective function. For any working alignment, the optimization involves three operations: insertions, deletions and shuffles of gaps. The optimization is exhaustive since the algorithm applies the above operations to all eligible positions of an alignment. It is also greedy since only the operation that gives the best improving objective score will be accepted. The algorithms have been implemented in the EGMA (Exhaustive and Greedy Multiple Alignment) package using Java programming language, and have been evaluated using the BAliBASE benchmark alignment database. Although EGMA is not guaranteed to produce globally optimized alignment, the tests indicate that EGMA is able to build alignments with high quality consistently, compared with other commonly used iterative and non-iterative alignment programs. It is also useful for refining multiple alignments obtained by other methods.
7

Prerna, Prerna, Pankaj Bhambri, and O. P. Gupta. "Multiple Sequence Alignment of Different Species." Indian Journal of Applied Research 1, no. 7 (October 1, 2011): 78–82. http://dx.doi.org/10.15373/2249555x/apr2012/24.

8

Fürer, Martin, and Webb Miller. "Alignment-to-Alignment Editing with “Move Gap” Operations." International Journal of Foundations of Computer Science 07, no. 01 (March 1996): 23–41. http://dx.doi.org/10.1142/s012905419600004x.

Abstract:
An alignment of k given sequences is a k-rowed matrix frequently used by molecular biologists to display correspondences between entries from each sequence. Under one approach, an alignment is represented by a matrix of ‘x’ and ‘-’ characters, where each x in row r indicates the position of an entry of sequence r. It is sometimes efficient to store only the run-length encoding of each row of this bit-matrix. A natural class of commands for editing one such row into another consists of operations of the form: “Move the d dashes that begin at position i of row r to position j of that row,” for relevant values of r, d, i and j. We show that the problem of determining a shortest sequence of such operations that converts one given alignment to another is NP-hard and give a polynomial-time algorithm that always comes within a factor 5/4 of optimality. An application of these ideas to alignments of long DNA sequences is discussed.
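The row representation and the "move gap" command described in this abstract can be sketched as follows. This is only an illustration of the data structure and a single edit operation, not the paper's approximation algorithm; the function names are hypothetical, positions are 0-based, and the target position j is assumed to be counted in the row after the dashes have been removed.

```python
from itertools import groupby
from typing import List, Tuple


def run_length_encode(row: str) -> List[Tuple[str, int]]:
    """Compress a row of 'x' and '-' characters into (symbol, run length) pairs."""
    return [(symbol, len(list(group))) for symbol, group in groupby(row)]


def run_length_decode(runs: List[Tuple[str, int]]) -> str:
    """Expand (symbol, run length) pairs back into the original row."""
    return "".join(symbol * length for symbol, length in runs)


def move_gap(row: str, i: int, d: int, j: int) -> str:
    """Move the d dashes that begin at position i so that they begin at position j.

    Assumption: j indexes the row with those d dashes already removed.
    """
    if d <= 0 or row[i:i + d] != "-" * d:
        raise ValueError("row must contain d dashes starting at position i")
    rest = row[:i] + row[i + d:]
    if not 0 <= j <= len(rest):
        raise ValueError("target position is outside the row")
    return rest[:j] + "-" * d + rest[j:]


row = "xx--xxx"
encoded = run_length_encode(row)
moved = move_gap(row, i=2, d=2, j=4)
```

The operation preserves both the row length and the number of 'x' entries, so every edited row still describes the same sequence.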
9

Ji, Guoli, Yong Zeng, Zijiang Yang, Congting Ye, and Jingci Yao. "A multiple sequence alignment method with sequence vectorization." Engineering Computations 31, no. 2 (February 25, 2014): 283–96. http://dx.doi.org/10.1108/ec-01-2013-0026.

Abstract:
Purpose – The time complexity of most multiple sequence alignment algorithms is O(N^2) or O(N^3), where N is the number of sequences. In addition, with the development of biotechnology, the amount of biological sequence data is growing significantly, and traditional methods have difficulty handling large-scale sequence sets. The proposed LemK_MSA method aims to reduce the time complexity, especially for large-scale sequences, while keeping an accuracy level similar to that of the traditional methods. Design/methodology/approach – LemK_MSA converts multiple sequence alignment into a corresponding 10D vector alignment using ten types of copy modes based on Lempel-Ziv. It then uses the k-means algorithm and the NJ algorithm to divide the sequences into several groups and calculate a guide tree for each group. A complete guide tree for multiple sequence alignment can be constructed by merging the guide trees of all groups. Moreover, for large-scale multiple sequence sets, LemK_MSA proposes a GPU-based parallel approach to distance matrix calculation. Findings – Under this approach, the time efficiency of multiple sequence alignment can be improved. High-throughput mouse antibody sequences are used to validate the proposed method. Compared to ClustalW, MAFFT and Mbed, LemK_MSA is more than ten times more efficient while maintaining alignment accuracy. Originality/value – This paper proposes a novel method with sequence vectorization for multiple sequence alignment based on Lempel-Ziv. A GPU-based parallel method has been designed for large-scale distance matrix calculation. It provides a new approach to multiple sequence alignment research.
10

Tu, Shin-Lin, Jeannette Staheli, Colum McClay, Kathleen McLeod, Timothy Rose, and Chris Upton. "Base-By-Base Version 3: New Comparative Tools for Large Virus Genomes." Viruses 10, no. 11 (November 15, 2018): 637. http://dx.doi.org/10.3390/v10110637.

Abstract:
Base-By-Base is a comprehensive tool for the creation and editing of multiple sequence alignments that is coded in Java and runs on multiple platforms. It can be used with gene and protein sequences as well as with large viral genomes, which themselves can contain gene annotations. This report describes new features added to Base-By-Base over the last 7 years. The two most significant additions are: (1) The recoding and inclusion of “consensus-degenerate hybrid oligonucleotide primers” (CODEHOP), a popular tool for the design of degenerate primers from a multiple sequence alignment of proteins; and (2) the ability to perform fuzzy searches within the columns of sequence data in multiple sequence alignments to determine the distribution of sequence variants among the sequences. The intuitive interface focuses on the presentation of results in easily understood visualizations and providing the ability to annotate the sequences in a multiple alignment with analytic and user data.
11

Piña, Johan S., Simon Orozco-Arias, Nicolas Tobón-Orozco, Leonardo Camargo-Forero, Reinel Tabares-Soto, and Romain Guyot. "G-SAIP: Graphical Sequence Alignment Through Parallel Programming in the Post-Genomic Era." Evolutionary Bioinformatics 19 (January 2023): 117693432211505. http://dx.doi.org/10.1177/11769343221150585.

Abstract:
A common task in bioinformatics is to compare DNA sequences to identify similarities between organisms at the sequence level. One approach to such comparison is the dot-plot, a 2-dimensional graphical representation used to analyze DNA or protein alignments. Dot-plot alignment software existed before the sequencing revolution, and it now faces an ongoing limitation when dealing with large sequences, resulting in very long execution times. High-Performance Computing (HPC) techniques have been successfully used in many applications to reduce computing times, but so far, very few applications for graphical sequence alignment using HPC have been reported. Here, we present G-SAIP (Graphical Sequence Alignment in Parallel), software capable of spawning multiple distributed processes on CPUs over a supercomputing infrastructure to speed up dot-plot generation by up to 1.68× compared with the fastest current tools. This improves the efficiency of comparative structural genomic analysis and phylogenetics, given the benefits of pairwise alignments for comparison between genomes, repetitive structure identification, and assembly quality checking.
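For readers unfamiliar with dot-plots, a minimal single-process sketch is given here. It marks every pair of positions where a window of one sequence exactly matches a window of the other. The function names and the exact-match window criterion are assumptions for illustration and do not reflect how G-SAIP is implemented or parallelised.

```python
from typing import Dict, List, Tuple


def dot_plot(seq_a: str, seq_b: str, window: int = 1) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where seq_a[i:i+window] equals seq_b[j:j+window]."""
    # Index every window of seq_b once so each window of seq_a is a dictionary lookup.
    index: Dict[str, List[int]] = {}
    for j in range(len(seq_b) - window + 1):
        index.setdefault(seq_b[j:j + window], []).append(j)
    return [(i, j)
            for i in range(len(seq_a) - window + 1)
            for j in index.get(seq_a[i:i + window], [])]


def render(dots: List[Tuple[int, int]], n_rows: int, n_cols: int) -> str:
    """Draw the plot as text: rows follow seq_a, columns follow seq_b."""
    cells = set(dots)
    return "\n".join(
        "".join("*" if (i, j) in cells else "." for j in range(n_cols))
        for i in range(n_rows)
    )


dots = dot_plot("ACGT", "CGTA", window=2)
picture = render(dots, n_rows=4, n_cols=4)
```

Shared regions appear as diagonal runs of marks. The rows of the plot are independent of one another, which is what makes this kind of computation a natural candidate for distribution across processes.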
12

Shu, Jian-Jun, Kian Yan Yong, and Weng Kong Chan. "An Improved Scoring Matrix for Multiple Sequence Alignment." Mathematical Problems in Engineering 2012 (2012): 1–9. http://dx.doi.org/10.1155/2012/490649.

Abstract:
Multiple sequence alignment is commonly based on the criterion of maximum information content computed from a weight matrix, but two or more alignments can share the same highest score, leading to ambiguity in selecting the best alignment. This paper addresses this issue by introducing the concept of a joint weight matrix to eliminate the randomness in selecting the best multiple sequence alignment. Alignments with equal scores are iteratively rescored with joint weight matrices of increasing level (nucleotide pairs, triplets, and so on) until a single best alignment is found. This method for resolving ambiguity in multiple sequence alignment can be easily implemented using the improved scoring matrix.
13

Keith, Jonathan M., Peter Adams, Darryn Bryant, Keith R. Mitchelson, Duncan A. E. Cochran, and Gita H. Lala. "Inferring an Original Sequence from Erroneous Copies: Two Approaches." Asia-Pacific Biotech News 07, no. 03 (February 3, 2003): 107–14. http://dx.doi.org/10.1142/s0219030303000284.

Abstract:
This paper considers the problem of inferring an original sequence from a number of erroneous copies. The problem arises in DNA sequencing, particularly in the context of emerging technologies that provide high throughput or other advantages at the cost of an increased number of errors. We describe and compare two approaches that have recently been developed by the authors. The first approach searches for a sequence known as a Steiner string; the second searches for the most probable original sequence with respect to a simple Bayesian model of sequencing errors. We present the results of extensive tests in which erroneous copies of real DNA sequences were simulated and the algorithms were used to infer the original sequences. The results are used to compare the two approaches to each other and to a third, more conventional, approach based on multiple sequence alignment. We find that the Bayesian approach is superior to the Steiner approach, which in turn is superior to the alignment approach. The two new algorithms can also be used to construct multiple sequence alignments. We show that the two methods produce alignments of approximately equal quality, and conclude that the Steiner approach is better for this purpose because it is faster. Both methods produce better alignments than a well-known multiple sequence alignment package, for the cases tested.
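A Steiner string is a sequence that minimises the total edit distance to all observed copies. As a rough illustration of that objective, the sketch below restricts the search to the copies themselves (often called a center string), which is a much simpler problem than the one the authors solve. The function names are hypothetical and unit-cost Levenshtein distance is an assumed error model.

```python
from typing import List


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance with unit costs, computed one row at a time."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                 # delete cs
                            curr[j - 1] + 1,             # insert ct
                            prev[j - 1] + (cs != ct)))   # match or substitute
        prev = curr
    return prev[-1]


def total_distance(candidate: str, copies: List[str]) -> int:
    """Objective value: summed edit distance from a candidate to every copy."""
    return sum(edit_distance(candidate, copy) for copy in copies)


def center_string(copies: List[str]) -> str:
    """Pick the observed copy with the smallest summed distance to all copies."""
    return min(copies, key=lambda c: total_distance(c, copies))


copies = ["ACGT", "ACGA", "ACGT", "AGGT"]
best = center_string(copies)
```

A true Steiner string may be a sequence that does not occur among the copies, so this restricted search gives only an upper bound on the optimal total distance.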
14

Lu, Yue, and Sing-Hoi Sze. "Multiple Sequence Alignment Based on Profile Alignment of Intermediate Sequences." Journal of Computational Biology 15, no. 7 (September 2008): 767–77. http://dx.doi.org/10.1089/cmb.2007.0132.

15

Ji, Guo Li, Jing Ci Yao, Zi Jiang Yang, and Cong Ting Ye. "LemK_MSA: A Multiple Sequence Alignment Method with Sequence Vectorization Based on Lempel-Ziv." Applied Mechanics and Materials 284-287 (January 2013): 3203–7. http://dx.doi.org/10.4028/www.scientific.net/amm.284-287.3203.

Abstract:
In this paper, we propose a method for multiple sequence alignment, LemK_MSA, which integrates Lempel-Ziv based sequence vectorization and k-means clustering analysis. LemK_MSA converts multiple sequence alignment into a corresponding 10-dimensional vector alignment using 10 types of copy modes. It then uses the k-means algorithm and the NJ algorithm to divide the sequences into several groups and calculate a guide tree for each group from the vectors of its sequences. A complete guide tree for multiple sequence alignment can be constructed by merging the guide trees of all groups. Thus, the time efficiency of multiple sequence alignment, especially for large-scale sequences, can be improved. High-throughput mouse antibody sequences are used to validate the proposed method. Compared to ClustalW, MAFFT and Mbed, LemK_MSA is more than ten times more efficient while maintaining alignment accuracy. LemK_MSA also provides an effective method to analyze the evolutionary relationships and structural features among high-throughput sequences.
16

Kanagarajadurai, Karuppiah, Singaravelu Kalaimathy, Paramasivam Nagarajan, and Ramanathan Sowdhamini. "PASS2." International Journal of Knowledge Discovery in Bioinformatics 2, no. 4 (October 2011): 53–66. http://dx.doi.org/10.4018/jkdb.2011100104.

Abstract:
A detailed comparison of protein domains that belong to families and superfamilies shows that structure is better conserved than sequence during evolutionary divergence. Sequence alignments, guided by structural features, permit a better sampling of the protein sequence space and effective construction of libraries for fold recognition. Sequence alignments are useful evolutionary models in defining structure-function relationships for protein superfamilies. The PASS2 database, maintained by the authors, presents alignments of proteins related at the superfamily level and characterised by low sequence similarity. The number of new superfamilies increased to 47% compared with the previous PASS2 version, which shows the crucial importance of updating the PASS2 database. In the current release of the PASS2 database, they align protein superfamilies using a structural alignment protocol. The authors also introduce two alignment assessment methods that depend on the average structural deviations of domains and the extent of conserved secondary structures. They also integrate new and important structural and sequence features at the superfamily level into the database. These features are conserved-unconserved blocks in proteins, spatial distribution of sequences using principal component analysis and a statistical view for each superfamily. The authors suggest that highly structurally deviant superfamily members could be removed as outliers, so that such extreme distant relationships will not obscure the alignment. They report a nearly-automated, updated version of the superfamily alignment database, consisting of 1776 superfamilies and 9536 protein domains, that is in direct correspondence with the SCOP (1.73) database.
17

Ji, Mingeun, Yejin Kan, Dongyeon Kim, Jaehee Jung, and Gangman Yi. "cPlot: Contig-Plotting Visualization for the Analysis of Short-Read Nucleotide Sequence Alignments." International Journal of Molecular Sciences 23, no. 19 (September 29, 2022): 11484. http://dx.doi.org/10.3390/ijms231911484.

Abstract:
Advances in the next-generation sequencing technology have led to a dramatic decrease in read-generation cost and an increase in read output. Reconstruction of short DNA sequence reads generated by next-generation sequencing requires a read alignment method that reconstructs a reference genome. In addition, it is essential to analyze the results of read alignments for a biologically meaningful inference. However, read alignment from vast amounts of genomic data from various organisms is challenging in that it involves repeated automatic and manual analysis steps. We, here, devised cPlot software for read alignment of nucleotide sequences, with automated read alignment and position analysis, which allows visual assessment of the analysis results by the user. cPlot compares sequence similarity of reads by performing multiple read alignments, with FASTA format files as the input. This application provides a web-based interface for the user for facile implementation, without the need for a dedicated computing environment. cPlot identifies the location and order of the sequencing reads by comparing the sequence to a genetically close reference sequence in a way that is effective for visualizing the assembly of short reads generated by NGS and rapid gene map construction.
18

Cavanaugh, David, and Krishnan Chittur. "A hydrophobic proclivity index for protein alignments." F1000Research 4 (October 21, 2015): 1097. http://dx.doi.org/10.12688/f1000research.6348.1.

Abstract:
Sequence alignment algorithms are fundamental to modern bioinformatics. Sequence alignments are widely used in diverse applications such as phylogenetic analysis, database searches for related sequences to aid identification of unknown protein domain structures and classification of proteins and protein domains. Additionally, alignment algorithms are integral to the location of related proteins to secure understanding of unknown protein functions, to suggest the folded structure of proteins of unknown structure from location of homologous proteins and/or by locating homologous domains of known 3D structure. For proteins, alignment algorithms depend on information about amino acid substitutions that allows for matching sequences that are similar, but not exact. When primary sequence percent identity falls below about 25%, algorithms often fail to identify proteins that may have similar 3D structure. We have created a hydrophobicity scale and a matching dynamic programming algorithm called TMATCH (unpublished report) that is able to match proteins with remote homologs with similar secondary/tertiary structure, even with very low primary sequence matches. In this paper, we describe how we arrived at the hydrophobic scale, how it provides much more information than percent identity matches and some of the implications for better alignments and understanding protein structure.
19

Cavanaugh, David, and Krishnan Chittur. "A hydrophobic proclivity index for protein alignments." F1000Research 4 (October 15, 2020): 1097. http://dx.doi.org/10.12688/f1000research.6348.2.

Abstract:
Sequence alignment algorithms are fundamental to modern bioinformatics. Sequence alignments are widely used in diverse applications such as phylogenetic analysis, database searches for related sequences to aid identification of unknown protein domain structures and classification of proteins and protein domains. Additionally, alignment algorithms are integral to the location of related proteins to secure understanding of unknown protein functions, to suggest the folded structure of proteins of unknown structure from location of homologous proteins and/or by locating homologous domains of known 3D structure. For proteins, alignment algorithms depend on information about amino acid substitutions that allows for matching sequences that are similar, but not exact. When primary sequence percent identity falls below about 25%, algorithms often fail to identify proteins that may have similar 3D structure. We have created a hydrophobicity scale and a matching dynamic programming algorithm called TMATCH (preprint report) that is able to match proteins with remote homologs with similar secondary/tertiary structure, even with very low primary sequence matches. In this paper, we describe how we arrived at the hydrophobic scale, how it provides much more information than percent identity matches and some of the implications for better alignments and understanding protein structure.
20

Padua, F. L. C., R. L. Carceroni, G. A. M. R. Santos, and K. N. Kutulakos. "Linear Sequence-to-Sequence Alignment." IEEE Transactions on Pattern Analysis and Machine Intelligence 32, no. 2 (February 2010): 304–20. http://dx.doi.org/10.1109/tpami.2008.301.

21

Munshi, Hassan, Mark Liberman, and Jianjing Kuang. "Textsetting as sequence alignment." Journal of the Acoustical Society of America 152, no. 4 (October 2022): A179. http://dx.doi.org/10.1121/10.0015960.

Abstract:
We argue that textsetting is better accounted for under the framework of sequence comparison, whereby textsetting is taken to be an instance of sequence alignment. Given a text, a tune, and an alignment procedure, we argue that all one needs is to align strong elements of one sequence (the text) to elements of the other (the tune). We show that, in many cases, the only principle needed to account for textsetting in English folk music is Strength Match, i.e., matching stressed syllables to strong beats. Many constraints proposed by previous treatments are to be discarded if we allow our textsetting model to compare between different possible textsettings and then choose the optimal one. The optimization problem is automated under the framework of dynamic programming, which allows us to create a distance metric to score all possible alignments. Moreover, we show that preferred textsettings always have the highest score, when compared to other textsettings that are either ill-formed or less preferred.
22

Zhan, Qing, Yilei Fu, Qinghua Jiang, Bo Liu, Jiajie Peng, and Yadong Wang. "SpliVert: A Protein Multiple Sequence Alignment Refinement Method Based on Splitting-Splicing Vertically." Protein & Peptide Letters 27, no. 4 (March 17, 2020): 295–302. http://dx.doi.org/10.2174/0929866526666190806143959.

Abstract:
Background: Multiple Sequence Alignment (MSA) is a fundamental task in bioinformatics and is required for many biological analysis tasks. The more accurate the alignments are, the more credible the downstream analyses. Most protein MSA algorithms refine an alignment by dividing it horizontally into two groups and then realigning the two groups. However, this strategy does not consider that different regions of the sequences have different conservation; this property may lead to incorrect residue-residue or residue-gap pairs, which cannot be corrected by this strategy. Objective: In this article, our motivation is to develop a novel refinement method based on splitting-splicing vertically. Method: Here, we present a novel refinement method based on splitting-splicing vertically, called SpliVert. For an alignment, we split it vertically into 3 parts, remove the gap characters in the middle, realign the middle part alone, and splice the realigned middle part with the other two initial pieces to obtain a refined alignment. In the realignment step of our method, the aligner focuses only on a certain part, ignoring the disturbance of the other parts, which could help fix the incorrect pairs. Results: We tested our refinement strategy for 2 leading MSA tools on 3 standard benchmarks, according to the commonly used average SP (and TC) score. The results show that given appropriate proportions to split the initial alignment, the average scores are increased comparably or slightly after using our method. We also compared the alignments refined by our method with alignments directly refined by the original alignment tools. The results suggest that using our SpliVert method to refine alignments can also outperform direct use of the original alignment tools. Conclusion: The results reveal that splitting vertically and realigning part of the alignment is a good strategy for the refinement of protein multiple sequence alignments.
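The split, degap, realign, and splice steps described in the Method section can be sketched as below. The realigner is passed in as a callable because the abstract does not name a specific one here; `pad_right` is a deliberately trivial, hypothetical stand-in used only to make the example runnable, and all function names are assumptions rather than the SpliVert code.

```python
from typing import Callable, List, Tuple

Alignment = List[str]  # one equal-length gapped string per sequence


def split_vertically(alignment: Alignment, start: int,
                     end: int) -> Tuple[Alignment, Alignment, Alignment]:
    """Cut every row at the same two columns, giving left, middle and right blocks."""
    left = [row[:start] for row in alignment]
    middle = [row[start:end] for row in alignment]
    right = [row[end:] for row in alignment]
    return left, middle, right


def refine_middle(alignment: Alignment, start: int, end: int,
                  realign: Callable[[List[str]], Alignment]) -> Alignment:
    """Realign only the columns start..end-1 and splice the result back in."""
    left, middle, right = split_vertically(alignment, start, end)
    ungapped = [segment.replace("-", "") for segment in middle]
    realigned = realign(ungapped)
    if len({len(row) for row in realigned}) > 1:
        raise ValueError("realigner must return rows of equal length")
    return [l + m + r for l, m, r in zip(left, realigned, right)]


def pad_right(sequences: List[str]) -> Alignment:
    """Hypothetical placeholder aligner: left-justify and pad with gaps."""
    width = max(len(s) for s in sequences)
    return [s.ljust(width, "-") for s in sequences]


original = ["AC-G-TT", "ACG--TT", "AC--GTT"]
refined = refine_middle(original, start=2, end=5, realign=pad_right)
```

In practice `realign` would wrap a real MSA tool. The left and right blocks are never modified, so only residue pairings inside the chosen column range can change.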
23

Ahola, Virpi, Tero Aittokallio, Esa Uusipaikka, and Mauno Vihinen. "Statistical Methods for Identifying Conserved Residues in Multiple Sequence Alignment." Statistical Applications in Genetics and Molecular Biology 3, no. 1 (January 30, 2004): 1–28. http://dx.doi.org/10.2202/1544-6115.1074.

Abstract:
The assessment of residue conservation in a multiple sequence alignment is a central issue in bioinformatics. Conserved residues and regions are used to determine structural and functional motifs or evolutionary relationships between the sequences of a multiple sequence alignment. For this reason, residue conservation is a valuable measure for database and motif search or for estimating the quality of alignments. In this paper, we present statistical methods for identifying conserved residues in multiple sequence alignments. While most earlier studies examine the positional conservation of the alignment, we focus on the detection of individual conserved residues at a position. The major advantages of multiple comparison methods originate from their ability to select conserved residues simultaneously and to consider the variability of the residue estimates. Large-scale simulations were used for the comparative analysis of the methods. Practical performance was studied by comparing the structurally and functionally important residues of Src homology 2 (SH2) domains to the assignments of the conservation indices. The applicability of the indices was also compared in three additional protein families comprising different degrees of entropy and variability in alignment positions. The results indicate that statistical multiple comparison methods are sensitive and reliable in identifying conserved residues.
24

Linheiro, Raquel, Stephen Sabatino, Diana Lobo, and John Archer. "CView: A network based tool for enhanced alignment visualization." PLOS ONE 17, no. 6 (June 13, 2022): e0259726. http://dx.doi.org/10.1371/journal.pone.0259726.

Abstract:
To date, basic visualization of sequence alignments has largely focused on displaying per-site columns of nucleotide, or amino acid, residues along with associated frequency summarizations. The persistence of this tendency to the recent tools designed for viewing mapped read data indicates that such a perspective not only provides a reliable visualization of per-site alterations, but also offers implicit reassurance to the end-user in relation to data accessibility. However, the initial insight gained is limited, something that is especially true when viewing alignments consisting of many sequences representing differing factors such as location, date and subtype. A basic alignment viewer can have potential to increase initial insight through visual enhancement, whilst not delving into the realms of complex sequence analysis. We present CView, a visualizer that expands on the per-site representation of residues through the incorporation of a dynamic network that is based on the summarization of diversity present across different regions of the alignment. Within the network, nodes are based on the clustering of sequence fragments that span windows placed consecutively along the alignment. Edges are placed between nodes of neighbouring windows where they share sequence identification(s), i.e. different regions of the same sequence(s). Thus, if a node is selected on the network, then the relationship that sequences passing through that node have to other regions of diversity within the alignment can be observed through path tracing. In addition to augmenting visual insight, CView provides export features including variant summarization, per-site residue and kmer frequencies, consensus sequence, alignment dissection as well as clustering; each useful across a range of research areas. The software has been designed to be user friendly, intuitive and interactive. 
It is open source and an executable jar, source code, quick start, usage tutorial and test data are available (under the GNU General Public License) from https://sourceforge.net/projects/cview/.
25

Tang, Chuan Yi, Chin Lung Lu, Margaret Dah-Tsyr Chang, Yin-Te Tsai, Yuh-Ju Sun, Kun-Mao Chao, Jia-Ming Chang, et al. "Constrained Multiple Sequence Alignment Tool Development and Its Application to RNase Family Alignment." Journal of Bioinformatics and Computational Biology 01, no. 02 (July 2003): 267–87. http://dx.doi.org/10.1142/s0219720003000095.

Abstract:
In this paper, we design a heuristic algorithm for computing a constrained multiple sequence alignment (CMSA for short) that guarantees that the generated alignment satisfies the user-specified constraints that some particular residues should be aligned together. If the number of residues needed to be aligned together is a constant α, then the time-complexity of our CMSA algorithm for aligning K sequences is O(αKn^4), where n is the maximum of the lengths of sequences. In addition, we have built such a CMSA software system and carried out several experiments on the RNase sequences, which mainly function in catalyzing the degradation of RNA molecules. The resulting alignments illustrate the practicability of our method.
26

Sierk, Michael L., Michael E. Smoot, Ellen J. Bass, and William R. Pearson. "Improving pairwise sequence alignment accuracy using near-optimal protein sequence alignments." BMC Bioinformatics 11, no. 1 (2010): 146. http://dx.doi.org/10.1186/1471-2105-11-146.

27

Barton, Geoffrey J. "Protein Sequence Alignment Techniques." Acta Crystallographica Section D Biological Crystallography 54, no. 6 (November 1, 1998): 1139–46. http://dx.doi.org/10.1107/s0907444998008324.

Abstract:
The basic algorithms for alignment of two or more protein sequences are explained. Alternative methods for scoring substitutions and gaps (insertions and deletions) are described, as are global and local alignment methods. Multiple alignment techniques are explained, including methods for profile comparison. A summary is given of programs for the alignment and analysis of protein sequences, either from sequence alone, or from three-dimensional structure.
28

Edgar, Robert C., and Serafim Batzoglou. "Multiple sequence alignment." Current Opinion in Structural Biology 16, no. 3 (June 2006): 368–73. http://dx.doi.org/10.1016/j.sbi.2006.04.004.

29

Chao, Kun-Mao, Ross C. Hardison, and Webb Miller. "Constrained sequence alignment." Bulletin of Mathematical Biology 55, no. 3 (May 1993): 503–24. http://dx.doi.org/10.1007/bf02460648.

30

CHAO, K., R. HARDISON, and W. MILLER. "Constrained sequence alignment." Bulletin of Mathematical Biology 55, no. 3 (May 1993): 503–24. http://dx.doi.org/10.1016/s0092-8240(05)80237-x.

31

Bacon, David J., and Wayne F. Anderson. "Multiple sequence alignment." Journal of Molecular Biology 191, no. 2 (September 1986): 153–61. http://dx.doi.org/10.1016/0022-2836(86)90252-4.

32

Ji, Yukai, Tao Huang, Chunlai Ma, Chao Hu, Zhanfeng Wang, and Anmin Fu. "IMCSA: Providing Better Sequence Alignment Space for Industrial Control Protocol Reverse Engineering." Security and Communication Networks 2022 (November 24, 2022): 1–9. http://dx.doi.org/10.1155/2022/8026280.

Abstract:
Nowadays, with the wide application of industrial control facilities, industrial control protocol reverse engineering has significant security implications. The reverse method of industrial protocol based on sequence alignment is the current mainstream method because of its high accuracy. However, this method will incur a huge time overhead due to unnecessary alignments during the sequence alignment process. In this paper, we optimize the traditional sequence alignment method by combining the characteristics of industrial control protocols. We improve the frequent sequence mining algorithm, Apriori, to propose a more efficient Bag-of-Words generation algorithm for finding keywords. Then, we precluster the messages based on the generated Bag-of-Words to improve the similarity of the message within a cluster. Finally, we propose an industrial control protocol message preclustering model for sequence alignment, namely, IMCSA. We evaluate it over five industrial control protocols, and the results show that IMCSA can generate clusters with higher message similarity, which will greatly reduce the invalid alignments existing in the sequence alignment stage and ultimately improve the overall efficiency.
33

Catanach, Therese A., Andrew D. Sweet, Nam-phuong D. Nguyen, Rhiannon M. Peery, Andrew H. Debevec, Andrea K. Thomer, Amanda C. Owings, et al. "Fully automated sequence alignment methods are comparable to, and much faster than, traditional methods in large data sets: an example with hepatitis B virus." PeerJ 7 (January 3, 2019): e6142. http://dx.doi.org/10.7717/peerj.6142.

Abstract:
Aligning sequences for phylogenetic analysis (multiple sequence alignment; MSA) is an important, but increasingly computationally expensive step with the recent surge in DNA sequence data. Much of this sequence data is publicly available, but can be extremely fragmentary (i.e., a combination of full genomes and genomic fragments), which can compound the computational issues related to MSA. Traditionally, alignments are produced with automated algorithms and then checked and/or corrected “by eye” prior to phylogenetic inference. However, this manual curation is inefficient at the data scales required of modern phylogenetics and results in alignments that are not reproducible. Recently, methods have been developed for fully automating alignments of large data sets, but it is unclear if these methods produce alignments that result in compatible phylogenies when compared to more traditional alignment approaches that combined automated and manual methods. Here we use approximately 33,000 publicly available sequences from the hepatitis B virus (HBV), a globally distributed and rapidly evolving virus, to compare different alignment approaches. Using one data set comprised exclusively of whole genomes and a second that also included sequence fragments, we compared three MSA methods: (1) a purely automated approach using traditional software, (2) an automated approach including by eye manual editing, and (3) more recent fully automated approaches. To understand how these methods affect phylogenetic results, we compared resulting tree topologies based on these different alignment methods using multiple metrics. We further determined if the monophyly of existing HBV genotypes was supported in phylogenies estimated from each alignment type and under different statistical support thresholds. Traditional and fully automated alignments produced similar HBV phylogenies. 
Although there was variability between branch support thresholds, allowing lower support thresholds tended to result in more differences among trees. Therefore, differences between the trees could be best explained by phylogenetic uncertainty unrelated to the MSA method used. Nevertheless, automated alignment approaches did not require human intervention and were therefore considerably less time-intensive than traditional approaches. Because of this, we conclude that fully automated algorithms for MSA are fully compatible with older methods even in extremely difficult to align data sets. Additionally, we found that most HBV diagnostic genotypes did not correspond to evolutionarily-sound groups, regardless of alignment type and support threshold. This suggests there may be errors in genotype classification in the database or that HBV genotypes may need a revision.
34

Ji, Guo Li, Long Teng Chen, and Liang Liang Chen. "Two-Level Parallel Alignment Based on Sequence Parallel Vectorization." Applied Mechanics and Materials 490-491 (January 2014): 757–62. http://dx.doi.org/10.4028/www.scientific.net/amm.490-491.757.

Abstract:
This paper proposes a two-level parallel alignment method based on sequence parallel vectorization with GPU acceleration on the Fermi architecture, which integrates sequence parallel vectorization, parallel k-means clustering approximate alignment and the parallel Smith-Waterman algorithm. The method first converts sequence alignment into vector alignment. Then it uses k-means alignment to divide sequences into several groups and reduce the size of the sequence data. The expected accurate alignment result is achieved using the parallel Smith-Waterman algorithm. High-throughput mouse T-cell receptor (TCR) sequences were used to validate the proposed method. Under the same hardware conditions, compared with the serial Smith-Waterman algorithm and the CUDASW++2.0 algorithm, our method is the most efficient alignment algorithm with high alignment accuracy.
35

Lebsir, Rabah, Abdesslem Layeb, and Tahi Fariza. "A Greedy Clustering Algorithm for Multiple Sequence Alignment." International Journal of Cognitive Informatics and Natural Intelligence 15, no. 4 (October 2021): 1–17. http://dx.doi.org/10.4018/ijcini.20211001.oa41.

Abstract:
This paper presents a strategy to tackle the Multiple Sequence Alignment (MSA) problem, which is one of the most important tasks in biological sequence analysis. Its role is to align the sequences in their entirety to derive relationships and common characteristics between a set of protein or nucleotide sequences. The MSA problem has been proved to be NP-hard. The proposed strategy incorporates a new idea based on the well-known divide and conquer paradigm. This paper presents a novel method of clustering sequences as a preliminary step to improve the final alignment; this decomposition can be used as an optimization procedure with any MSA aligner to explore promising alignments of the search space. In their solution, the authors propose to align the clusters in a parallel and distributed way in order to benefit from parallel architectures. The strategy was tested using classical benchmarks like BAliBASE, Sabre, Prefab4 and Oxm, and the experimental results show that it gives good results compared with other aligners.
36

Alser, Mohammed, Hasan Hassan, Akash Kumar, Onur Mutlu, and Can Alkan. "Shouji: a fast and efficient pre-alignment filter for sequence alignment." Bioinformatics 35, no. 21 (March 28, 2019): 4255–63. http://dx.doi.org/10.1093/bioinformatics/btz234.

Abstract:
Motivation: The ability to generate massive amounts of sequencing data continues to overwhelm the processing capability of existing algorithms and compute infrastructures. In this work, we explore the use of hardware/software co-design and hardware acceleration to significantly reduce the execution time of short sequence alignment, a crucial step in analyzing sequenced genomes. We introduce Shouji, a highly parallel and accurate pre-alignment filter that remarkably reduces the need for computationally-costly dynamic programming algorithms. The first key idea of our proposed pre-alignment filter is to provide high filtering accuracy by correctly detecting all common subsequences shared between two given sequences. The second key idea is to design a hardware accelerator that adopts modern field-programmable gate array (FPGA) architectures to further boost the performance of our algorithm. Results: Shouji significantly improves the accuracy of pre-alignment filtering by up to two orders of magnitude compared to the state-of-the-art pre-alignment filters, GateKeeper and SHD. Our FPGA-based accelerator is up to three orders of magnitude faster than the equivalent CPU implementation of Shouji. Using a single FPGA chip, we benchmark the benefits of integrating Shouji with five state-of-the-art sequence aligners, designed for different computing platforms. The addition of Shouji as a pre-alignment step reduces the execution time of the five state-of-the-art sequence aligners by up to 18.8×. Shouji can be adapted for any bioinformatics pipeline that performs sequence alignment for verification. Unlike most existing methods that aim to accelerate sequence alignment, Shouji does not sacrifice any of the aligner capabilities, as it does not modify or replace the alignment step. Availability and implementation: https://github.com/CMU-SAFARI/Shouji. Supplementary information: Supplementary data are available at Bioinformatics online.
37

Wilson, W. C. "Activity Pattern Analysis by Means of Sequence-Alignment Methods." Environment and Planning A: Economy and Space 30, no. 6 (June 1998): 1017–38. http://dx.doi.org/10.1068/a301017.

Abstract:
The author describes a method of comparing sequences of characters, called sequence alignment or string matching, and illustrates its use in the analysis of daily activity patterns derived from time-use diaries. It allows definition of measures of similarity or distance between complete sequences, called global alignment, or the evaluation of the best fit of short sequences within long sequences, called local alignment. Alignments may be done pairwise to develop similarity or distance matrices that describe the relatedness of individuals in the set of sequences being examined. Pairwise alignment methods may be extended to many individuals by using multiple alignment analysis. A number of elementary hand-worked examples are provided. The basic concepts are discussed in terms of the problems of time-use research and the method is illustrated by examining diary data from a survey conducted in Reading, England. The CLUSTAL software used for the alignments was written for molecular biological research. The method offers a powerful technique for analyzing the full richness of diary data without discarding the details of episode ordering, duration, or transition. It is also possible to extend the analysis to include the context of activities, such as the presence of other persons or the location, but such extensions would require software designed for social science rather than biochemical problems. The method also offers a challenge to researchers to begin to develop theories about the determinants of daily behavior as a whole, rather than about participation in single activities or about time-budget totals.
38

Schulz, Tizian, Roland Wittler, Sven Rahmann, Faraz Hach, and Jens Stoye. "Detecting high-scoring local alignments in pangenome graphs." Bioinformatics 37, no. 16 (February 3, 2021): 2266–74. http://dx.doi.org/10.1093/bioinformatics/btab077.

Abstract:
Motivation: Increasing amounts of individual genomes sequenced per species motivate the usage of pangenomic approaches. Pangenomes may be represented as graphical structures, e.g. compacted colored de Bruijn graphs, which offer a low memory usage and facilitate reference-free sequence comparisons. While sequence-to-graph mapping to graphical pangenomes has been studied for some time, no local alignment search tool in the vein of BLAST has been proposed yet. Results: We present a new heuristic method to find maximum scoring local alignments of a DNA query sequence to a pangenome represented as a compacted colored de Bruijn graph. Our approach additionally allows a comparison of similarity among sequences within the pangenome. We show that local alignment scores follow an exponential-tail distribution similar to BLAST scores, and we discuss how to estimate its parameters to separate local alignments representing sequence homology from spurious findings. An implementation of our method is presented, and its performance and usability are shown. Our approach scales sublinearly in running time and memory usage with respect to the number of genomes under consideration. This is an advantage over classical methods that do not make use of sequence similarity within the pangenome. Availability and implementation: Source code and test data are available from https://gitlab.ub.uni-bielefeld.de/gi/plast. Supplementary information: Supplementary data are available at Bioinformatics online.
39

Tyson, Hugh. "Relationships between amino acid sequences determined through optimum alignments, clustering, and specific distance patterns: application to a group of scorpion toxins." Genome 35, no. 2 (April 1, 1992): 360–71. http://dx.doi.org/10.1139/g92-055.

Abstract:
Optimum alignment in all pairwise combinations among a group of amino acid sequences generated a distance matrix. These distances were clustered to evaluate relationships among the sequences. The degree of relationship among sequences was also evaluated by calculating specific distances from the distance matrix and examining correlations between patterns of specific distances for pairs of sequences. The sequences examined were a group of 20 amino acid sequences of scorpion toxins originally published and analyzed by M.J. Dufton and H. Rochat in 1984. Alignment gap penalties were constant for all 190 pairwise sequence alignments and were chosen after assessing the impact of changing penalties on resultant distances. The total distances generated by the 190 pairwise sequence alignments were clustered using complete (farthest neighbour) linkage. The square, symmetrical input distance matrix is analogous to diallel cross data where reciprocal and parental values are absent. Diallel analysis methods provided analogues for the distance matrix to genetical specific combining abilities, namely specific distances between all sequence pairs that are independent of the average distances shown by individual sequences. Correlation of specific distance patterns, with transformation to modified z values and a stringent probability level, was used to delineate subgroups of related sequences. These were compared with complete linkage clustering results. Excellent agreement between the two approaches was found. Three originally outlying sequences were placed within the four new subgroups. Key words: sequence alignment, specific distances, sequence relationships.
40

Md Isa, Mohd Nazrin, Sohiful Anuar Zainol Murad, Mohamad Imran Ahmad, Muhammad M. Ramli, and Rizalafande Che Ismail. "An Efficient Scheduling Technique for Biological Sequence Alignment." Applied Mechanics and Materials 754-755 (April 2015): 1087–92. http://dx.doi.org/10.4028/www.scientific.net/amm.754-755.1087.

Abstract:
Computing alignment matrix scores to search for regions of homology between biological sequences is a time-consuming task. This is due to the recursive nature of dynamic programming-based algorithms such as the Smith-Waterman and Needleman-Wunsch algorithms. A typical FPGA-based protein sequencer comprises two main logic blocks: one for computing alignment scores, i.e. the processing element (PE), and another for configuring the PE with coefficients. During alignment matrix computation, the logic block for configuring the PE is left unused until the time-consuming alignment matrix computation has finished. Therefore, a new technique, known as overlap computation and configuration (OCC), is proposed to minimize the time overhead of performing biological sequence alignment. The OCC technique updates the substitution matrix in a processing element (PE) systolic array while simultaneously computing alignment matrix scores. Results showed that the sequencer achieves a speed-up of more than two orders of magnitude compared to the state of the art, at negligible area overhead, if any.
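The recursive dependency mentioned in this abstract can be made concrete with a small Smith-Waterman scoring sketch. It is not the OCC technique or any FPGA design from the paper; it is a plain software illustration with assumed scoring values (match 2, mismatch -1, linear gap -2). The matrix is filled one anti-diagonal at a time because every cell depends only on cells from the two previous anti-diagonals, which is the property that lets parallel hardware compute a whole anti-diagonal at once.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty).

    The scoring parameters are illustrative assumptions, not values
    taken from the paper.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0

    # Sweep anti-diagonals d = i + j. Cells on the same anti-diagonal are
    # independent of each other, so this inner loop could run in parallel.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("XXACGTXX", "ACGT"))
```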
41

Tumescheit, Charlotte, Andrew E. Firth, and Katherine Brown. "CIAlign: A highly customisable command line tool to clean, interpret and visualise multiple sequence alignments." PeerJ 10 (March 15, 2022): e12983. http://dx.doi.org/10.7717/peerj.12983.

Abstract:
Background: Throughout biology, multiple sequence alignments (MSAs) form the basis of much investigation into biological features and relationships. These alignments are at the heart of many bioinformatics analyses. However, sequences in MSAs are often incomplete or very divergent, which can lead to poor alignment and large gaps. This slows down computation and can impact conclusions without being biologically relevant. Cleaning the alignment by removing common issues such as gaps, divergent sequences, large insertions and deletions and poorly aligned sequence ends can substantially improve analyses. Manual editing of MSAs is very widespread but is time-consuming and difficult to reproduce. Results: We present a comprehensive, user-friendly MSA trimming tool with multiple visualisation options. Our highly customisable command line tool aims to give intervention power to the user by offering various options, and outputs graphical representations of the alignment before and after processing to give the user a clear overview of what has been removed. The main functionalities of the tool include removing regions of low coverage due to insertions, removing gaps, cropping poorly aligned sequence ends and removing sequences that are too divergent or too short. The thresholds for each function can be specified by the user and parameters can be adjusted to each individual MSA. CIAlign is designed with an emphasis on solving specific and common alignment problems and on providing transparency to the user. Conclusion: CIAlign effectively removes problematic regions and sequences from MSAs and provides novel visualisation options. This tool can be used to fine-tune alignments for further analysis and processing. The tool is aimed at anyone who wishes to automatically clean up parts of an MSA and those requiring a new, accessible way of visualising large MSAs.
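One of the cleaning operations named in this abstract, removing gap-rich columns, might look like the sketch below. This is not CIAlign's code or command line interface: the function name and the `max_gap_fraction` threshold are hypothetical and only show the general idea of threshold-based column trimming.

```python
from typing import List


def drop_gappy_columns(alignment: List[str],
                       max_gap_fraction: float = 0.5) -> List[str]:
    """Remove columns whose share of '-' characters exceeds the threshold.

    Hypothetical helper for illustration; the 0.5 default is an assumption.
    """
    if not alignment:
        return []
    n_rows = len(alignment)
    keep = [
        col
        for col in range(len(alignment[0]))
        if sum(row[col] == "-" for row in alignment) / n_rows <= max_gap_fraction
    ]
    return ["".join(row[col] for col in keep) for row in alignment]


print(drop_gappy_columns(["A-CG", "A-C-", "ATCG"]))
```

Here the second column is a gap in two of three rows and is dropped, while the last column (one gap in three rows) stays.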
42

BACKOFEN, ROLF, and SEBASTIAN WILL. "LOCAL SEQUENCE-STRUCTURE MOTIFS IN RNA." Journal of Bioinformatics and Computational Biology 02, no. 04 (December 2004): 681–98. http://dx.doi.org/10.1142/s0219720004000818.

Abstract:
Ribonucleic acid (RNA) enjoys increasing interest in molecular biology; despite this interest fundamental algorithms are lacking, e.g. for identifying local motifs. Like proteins, RNA molecules have a distinctive structure. Therefore, in addition to sequence information, structure plays an important part in assessing the similarity of RNAs. Furthermore, common sequence-structure features in two or several RNA molecules are often only spatially local, where possibly large parts of the molecules are dissimilar. Consequently, we address the problem of comparing RNA molecules by computing an optimal local alignment with respect to sequence and structure information. While local alignment is superior to global alignment for identifying local similarities, no general local sequence-structure alignment algorithms are currently known. We suggest a new general definition of locality for sequence-structure alignments that is biologically motivated and efficiently tractable. To show the former, we discuss locality of RNA and prove that the defined locality means connectivity by atomic and non-atomic bonds. To show the latter, we present an efficient algorithm for the newly defined pairwise local sequence-structure alignment (lssa) problem for RNA. For molecules of lengths n and m, the algorithm has worst-case time complexity of O(n^2·m^2·max(n,m)) and a space complexity of only O(n·m). An implementation of our algorithm is available at . Its runtime is competitive with global sequence-structure alignment.
43

Lee, Sung Jong, Keehyoung Joo, Sangjin Sim, Juyong Lee, In-Ho Lee, and Jooyoung Lee. "CRFalign: A Sequence-Structure Alignment of Proteins Based on a Combination of HMM-HMM Comparison and Conditional Random Fields." Molecules 27, no. 12 (June 9, 2022): 3711. http://dx.doi.org/10.3390/molecules27123711.

Abstract:
Sequence–structure alignment for protein sequences is an important task for the template-based modeling of 3D structures of proteins. Building a reliable sequence–structure alignment is a challenging problem, especially for remote homologue target proteins. We built a method of sequence–structure alignment called CRFalign, which improves upon a base alignment model based on HMM-HMM comparison by employing pairwise conditional random fields in combination with nonlinear scoring functions of structural and sequence features. The nonlinear scoring part is implemented by a set of gradient boosted regression trees. In addition to sequence profile features, various position-dependent structural features are employed including secondary structures and solvent accessibilities. Training is performed on reference alignments at superfamily levels or twilight zone chosen from the SABmark benchmark set. We found that the CRFalign method produces relative improvement in terms of average alignment accuracies for validation sets of the SABmark benchmark. We also tested CRFalign on 51 sequence–structure pairs involving 15 FM target domains of CASP14, where we could see that CRFalign leads to an improvement in average modeling accuracies in these hard targets (TM-CRFalign ≃42.94%) compared with that of HHalign (TM-HHalign ≃39.05%) and also that of MRFalign (TM-MRFalign ≃36.93%). CRFalign was incorporated into our template search framework called CRFpred and was tested for a random target set of 300 target proteins consisting of Easy, Medium and Hard sets, which showed a reasonable template search performance.
44

Kholiq, Hibban, Mamika Ujianita Romdhini, and Marliadi Susanto. "Algoritma Needleman-Wunsch dalam Menentukan Tingkat Kemiripan Urutan DNA Rusa Timor (Cervus timorensis) dan Rusa Merah (Cervus elaphus)." EIGEN MATHEMATICS JOURNAL 3, no. 2 (December 30, 2020): 125. http://dx.doi.org/10.29303/emj.v3i2.65.

Abstract:
Sequence alignment is a basic method in sequence analysis. This method is used to determine the similarity level of DNA sequences. The Needleman-Wunsch algorithm is an algorithm that can be used to solve the problem of sequence alignment. This research shows that the relation T(i, j) used in the Needleman-Wunsch algorithm is a function T: ℕ0 × ℕ0 → ℤ. The function T(i, j) is a recursive function. Moreover, the DNA sequence data used are DNA sequences from the Timor Deer, which are the identities of the provinces of West Nusa Tenggara, and the Red Deer, which are typical deer from the European continent, as a comparison. The DNA sequence data were obtained from BLAST (Basic Local Alignment Search Tool). Based on the alignment, the most optimal alignment is obtained by forming sequences of 666 base pairs with 322 matches, 230 mismatches and 114 gaps, meaning that the two DNA sequences have a 48% similarity (322/666).
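A minimal sketch of the recursive table T(i, j) and of the similarity figure quoted above follows. The scoring values (match +1, mismatch -1, gap -1) are assumptions for illustration; the abstract does not state which values the authors used, and the helper names are hypothetical.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> Tuple[int, str, str]:
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # T[i][j] = best score for aligning a[:i] with b[:j].
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        T[i][0] = i * gap
    for j in range(1, m + 1):
        T[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            T[i][j] = max(T[i - 1][j - 1] + s, T[i - 1][j] + gap, T[i][j - 1] + gap)

    # Trace back from T[n][m] to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if T[i][j] == T[i - 1][j - 1] + s:
                top.append(a[i - 1]); bottom.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and T[i][j] == T[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return T[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def summarize(top: str, bottom: str) -> Tuple[int, int, int]:
    """Count (matches, mismatches, gap columns) in an aligned pair."""
    gaps = sum(x == "-" or y == "-" for x, y in zip(top, bottom))
    matches = sum(x == y and x != "-" for x, y in zip(top, bottom))
    return matches, len(top) - matches - gaps, gaps


score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(score, summarize(top, bottom))

# Similarity as computed in the abstract: matches / alignment length.
print(f"{322 / (322 + 230 + 114):.1%}")
```

The last line reproduces the reported ratio: 322 matches over 322 + 230 + 114 = 666 aligned positions, which rounds to 48%.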
45

Spirollari, Junilda, Jason T. L. Wang, Kaizhong Zhang, Vivian Bellofatto, Yongkyu Park, and Bruce A. Shapiro. "Predicting Consensus Structures for RNA Alignments via Pseudo-Energy Minimization." Bioinformatics and Biology Insights 3 (January 2009): BBI.S2578. http://dx.doi.org/10.4137/bbi.s2578.

Abstract:
Thermodynamic processes with free energy parameters are often used in algorithms that solve the free energy minimization problem to predict secondary structures of single RNA sequences. While results from these algorithms are promising, an observation is that single sequence-based methods have moderate accuracy and more information is needed to improve on RNA secondary structure prediction, such as covariance scores obtained from multiple sequence alignments. We present in this paper a new approach to predicting the consensus secondary structure of a set of aligned RNA sequences via pseudo-energy minimization. Our tool, called RSpredict, takes into account sequence covariation and employs effective heuristics for accuracy improvement. RSpredict accepts, as input data, a multiple sequence alignment in FASTA or ClustalW format and outputs the consensus secondary structure of the input sequences in both the Vienna style Dot Bracket format and the Connectivity Table format. Our method was compared with some widely used tools including KNetFold, Pfold and RNAalifold. A comprehensive test on different datasets including Rfam sequence alignments and a multiple sequence alignment obtained from our study on the Drosophila X chromosome reveals that RSpredict is competitive with the existing tools on the tested datasets. RSpredict is freely available online as a web server and also as a jar file for download at http://datalab.njit.edu/biology/RSpredict .
46

Solano-Roman, A., C. Cruz-Castillo, D. Offenhuber, and A. Colubri. "NX4: a web-based visualization of large multiple sequence alignments." Bioinformatics 35, no. 22 (June 4, 2019): 4800–4802. http://dx.doi.org/10.1093/bioinformatics/btz457.

Abstract:
Summary: Multiple Sequence Alignments (MSAs) are a fundamental operation in genome analysis. However, MSA visualizations such as sequence logos and matrix representations have changed little since the nineties and are not well suited for displaying large-scale alignments. We propose a novel, web-based MSA visualization tool called NX4, which can handle genome alignments comprising thousands of sequences. NX4 calculates the frequency of each nucleotide along the alignment and visually summarizes the results using a color-blind friendly palette that helps identify regions of high genetic diversity. NX4 also provides the user with additional assistance in finding these regions with a ‘focus + context’ mechanism that uses a line chart of the Shannon entropy across the alignment. The tool offers geneticists an easy-to-use and scalable analysis for large MSA studies. Availability and implementation: NX4 is freely available at https://www.nx4.io, and its source code at https://github.com/NX4/nx4. Supplementary information: Supplementary data are available at Bioinformatics online.
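The per-column Shannon entropy mentioned in this abstract can be sketched as follows. This is an illustration of the general formula, not NX4's code; the function names are hypothetical, and treating a gap character as an ordinary symbol is an assumption.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    # H = sum over symbols of p * log2(1/p), with p = count / total.
    return sum((c / total) * log2(total / c) for c in counts.values())


def alignment_entropy(seqs):
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*seqs)]
```

A fully conserved column has entropy 0, and a DNA column split evenly among the four nucleotides has entropy 2 bits, so peaks in the resulting list mark the more variable regions.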
47

Andrade, Helena, Juan J. Nieto, and Angela Torres. "The number of alignments between two DNA sequences." International Journal of Biomathematics 09, no. 04 (April 22, 2016): 1650053. http://dx.doi.org/10.1142/s1793524516500534.

Abstract:
We consider and compare two DNA sequences. One of the crucial issues in bioinformatics is to measure the similarity of two DNA sequences. For this purpose one has to consider different alignments between both sequences. The number of alignments grows very rapidly with the length of the sequences. In this paper we give exact, explicit and computable formulas for the number of different possible alignments and for some classes of reduced alignments. We provide a new insight into the theory of DNA sequence alignment.
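How quickly the number of alignments grows can be seen with a small sketch. It uses one common counting convention, in which every column is a match/mismatch, a gap in the first sequence, or a gap in the second, and differently ordered gap columns count as distinct alignments. That convention gives the recurrence f(n, m) = f(n-1, m) + f(n, m-1) + f(n-1, m-1) with f(n, 0) = f(0, m) = 1. This is an assumption chosen for illustration; the paper's own formulas and its classes of reduced alignments may be defined differently.

```python
def count_alignments(n, m):
    """Number of global alignments of sequences of lengths n and m.

    Counts under the assumed convention described above (every ordering
    of gap columns is distinct), not necessarily the paper's definition.
    """
    row = [1] * (m + 1)              # f(0, j) = 1 for all j
    for _ in range(n):
        new = [1]                    # f(i, 0) = 1
        for j in range(1, m + 1):
            # gap in second seq + gap in first seq + aligned pair
            new.append(row[j] + new[j - 1] + row[j - 1])
        row = new
    return row[m]
```

Under this convention two sequences of length 3 already admit 63 alignments, and the count rises steeply from there, which is why exhaustive enumeration is impractical for realistic sequence lengths.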
48

Shen, Chengze, Paul Zaharias, and Tandy Warnow. "MAGUS+eHMMs: improved multiple sequence alignment accuracy for fragmentary sequences." Bioinformatics 38, no. 4 (November 17, 2021): 918–24. http://dx.doi.org/10.1093/bioinformatics/btab788.

Abstract:
Summary: Multiple sequence alignment is an initial step in many bioinformatics pipelines, including phylogeny estimation, protein structure prediction and taxonomic identification of reads produced in amplicon or metagenomic datasets. Yet, alignment estimation is challenging on datasets that exhibit substantial sequence length heterogeneity, and especially when the datasets have fragmentary sequences as a result of including reads or contigs generated by next-generation sequencing technologies. Here, we examine techniques that have been developed to improve alignment estimation when datasets contain substantial numbers of fragmentary sequences. We find that MAGUS, a recently developed MSA method, is fairly robust to fragmentary sequences under many conditions, and that using a two-stage approach where MAGUS is used to align selected ‘backbone sequences’ and the remaining sequences are added into the alignment using ensembles of Hidden Markov Models further improves alignment accuracy. The combination of MAGUS with the ensemble of eHMMs (i.e. MAGUS+eHMMs) clearly improves on UPP, the previous leading method for aligning datasets with high levels of fragmentation. Availability and implementation: UPP is available at https://github.com/smirarab/sepp, and MAGUS is available at https://github.com/vlasmirnov/MAGUS. MAGUS+eHMMs can be performed by running MAGUS to obtain the backbone alignment, and then using the backbone alignment as an input to UPP. Supplementary information: Supplementary data are available at Bioinformatics online.
49

Steenwyk, Jacob L., Thomas J. Buida, Yuanning Li, Xing-Xing Shen, and Antonis Rokas. "ClipKIT: A multiple sequence alignment trimming software for accurate phylogenomic inference." PLOS Biology 18, no. 12 (December 2, 2020): e3001007. http://dx.doi.org/10.1371/journal.pbio.3001007.

Abstract:
Highly divergent sites in multiple sequence alignments (MSAs), which can stem from erroneous inference of homology and saturation of substitutions, are thought to negatively impact phylogenetic inference. Thus, several different trimming strategies have been developed for identifying and removing these sites prior to phylogenetic inference. However, a recent study reported that doing so can worsen inference, underscoring the need for alternative alignment trimming strategies. Here, we introduce ClipKIT, an alignment trimming software that, rather than identifying and removing putatively phylogenetically uninformative sites, instead aims to identify and retain parsimony-informative sites, which are known to be phylogenetically informative. To test the efficacy of ClipKIT, we examined the accuracy and support of phylogenies inferred from 14 different alignment trimming strategies, including those implemented in ClipKIT, across nearly 140,000 alignments from a broad sampling of evolutionary histories. Phylogenies inferred from ClipKIT-trimmed alignments are accurate, robust, and time saving. Furthermore, ClipKIT consistently outperformed other trimming methods across diverse datasets, suggesting that strategies based on identifying and retaining parsimony-informative sites provide a robust framework for alignment trimming.
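The idea of retaining parsimony-informative sites can be sketched with the standard textbook definition: a column is parsimony-informative if at least two different character states each occur in at least two sequences. The code below is an illustration of that definition only, not ClipKIT's implementation; the function names and the choice to ignore `-` and `?` characters are assumptions.

```python
from collections import Counter


def is_parsimony_informative(column, ignore="-?"):
    """True if at least two states each appear at least twice in the column.

    Characters listed in `ignore` (gaps, missing data) are skipped; this
    handling is an assumption made for the sketch.
    """
    counts = Counter(c.upper() for c in column if c not in ignore)
    return sum(1 for v in counts.values() if v >= 2) >= 2


def keep_informative(seqs):
    """Return the alignment restricted to its parsimony-informative columns."""
    cols = [col for col in zip(*seqs) if is_parsimony_informative(col)]
    if not cols:
        return ["" for _ in seqs]
    return ["".join(chars) for chars in zip(*cols)]
```

For the four sequences `AAGT`, `AAGT`, `ACCT`, `ACCA`, only the second and third columns qualify: the first column is constant, and in the fourth the state `A` occurs only once.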
50

Roca, Alberto I., Aaron C. Abajian, and David J. Vigerust. "ProfileGrids solve the large alignment visualization problem: influenza hemagglutinin example." F1000Research 2 (January 4, 2013): 2. http://dx.doi.org/10.12688/f1000research.2-2.v1.

Abstract:
Large multiple sequence alignments are a challenge for current visualization programs. ProfileGrids are a solution that reduces alignments to a matrix, color-shaded according to the residue frequency at each column position. ProfileGrids are not limited by the number of sequences and so solve this visualization problem. We demonstrate the new metadata searching and grep filtering features of the JProfileGrid version 2.0 software on an alignment of 11,900 hemagglutinin protein sequences. JProfileGrid is free and available from http://www.ProfileGrid.org.
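The reduction of an alignment to a matrix of residue frequencies per column can be sketched as below. This is a minimal illustration of the concept rather than JProfileGrid's code; the function name `profile_grid` and the caller-supplied `alphabet` argument are assumptions.

```python
from collections import Counter


def profile_grid(seqs, alphabet):
    """Map each residue in `alphabet` to its frequency at every column.

    `seqs` is an alignment given as equal-length strings. The result has
    len(alphabet) rows and one entry per alignment column.
    """
    n = len(seqs)
    grid = {residue: [] for residue in alphabet}
    for col in zip(*seqs):
        counts = Counter(col)
        for residue in alphabet:
            grid[residue].append(counts[residue] / n)
    return grid
```

The grid has one row per residue and one column per alignment position, so its size stays the same whether the alignment holds ten sequences or ten thousand; only the frequencies, which a viewer would map to color shades, change.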