Journal articles on the topic 'Biological Sequence Analysis'

Consult the top 50 journal articles for your research on the topic 'Biological Sequence Analysis.'

1

Allison, L., L. Stern, T. Edgoose, and T. I. Dix. "Sequence complexity for biological sequence analysis." Computers & Chemistry 24, no. 1 (January 2000): 43–55. http://dx.doi.org/10.1016/s0097-8485(00)80006-6.

2

Li, Hongliang, and Bin Liu. "BioSeq-Diabolo: Biological sequence similarity analysis using Diabolo." PLOS Computational Biology 19, no. 6 (June 20, 2023): e1011214. http://dx.doi.org/10.1371/journal.pcbi.1011214.

Abstract:
As the key to biological sequence structure and function prediction, disease diagnosis and treatment, biological sequence similarity analysis has attracted increasing attention. However, the existing computational methods fail to accurately analyse biological sequence similarities because of the various data types (DNA, RNA, protein, disease, etc.) and their low sequence similarities (remote homology). Therefore, new concepts and techniques are desired to solve this challenging problem. Biological sequences (DNA, RNA and protein sequences) can be considered as the sentences of “the book of life”, and their similarities can be considered as the biological language semantics (BLS). In this study, we seek semantics analysis techniques derived from natural language processing (NLP) to comprehensively and accurately analyse biological sequence similarities. Twenty-seven semantics analysis methods derived from NLP were introduced to analyse biological sequence similarities, bringing new concepts and techniques to biological sequence similarity analysis. Experimental results show that these semantics analysis methods are able to facilitate the development of protein remote homology detection, circRNA-disease association identification and protein function annotation, achieving better performance than the other state-of-the-art predictors in the related fields. Based on these semantics analysis methods, a platform called BioSeq-Diabolo has been constructed, which is named after a popular traditional sport in China. Users only need to input the embeddings of the biological sequence data. BioSeq-Diabolo will intelligently identify the task, and then accurately analyse the biological sequence similarities based on biological language semantics. 
BioSeq-Diabolo will integrate different biological sequence similarities in a supervised manner by using Learning to Rank (LTR), and the performance of the constructed methods will be evaluated and analysed so as to recommend the best methods for the users. The web server and stand-alone package of BioSeq-Diabolo can be accessed at http://bliulab.net/BioSeq-Diabolo/server/.
3

Petti, Samantha, and Sean R. Eddy. "Constructing benchmark test sets for biological sequence analysis using independent set algorithms." PLOS Computational Biology 18, no. 3 (March 7, 2022): e1009492. http://dx.doi.org/10.1371/journal.pcbi.1009492.

Abstract:
Biological sequence families contain many sequences that are very similar to each other because they are related by evolution, so the strategy for splitting data into separate training and test sets is a nontrivial choice in benchmarking sequence analysis methods. A random split is insufficient because it will yield test sequences that are closely related or even identical to training sequences. Adapting ideas from independent set graph algorithms, we describe two new methods for splitting sequence data into dissimilar training and test sets. These algorithms input a sequence family and produce a split in which each test sequence is less than p% identical to any individual training sequence. These algorithms successfully split more families than a previous approach, enabling construction of more diverse benchmark datasets.
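The splitting criterion described in this abstract (every test sequence less than p% identical to every training sequence) can be illustrated with a minimal sketch. This is not either of the paper's algorithms: it simply draws a random training set and keeps as test sequences only those that satisfy the threshold, discarding the rest. The function names, the `train_fraction` and `seed` parameters, and the assumption that sequences are already aligned to equal length are all hypothetical choices made for illustration.

```python
import random


def percent_identity(a, b):
    """Percent identity between two pre-aligned sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def split_dissimilar(seqs, p, train_fraction=0.5, seed=0):
    """Return (train, test, discarded) index lists.

    Hypothetical simplification: the training set is a random sample, and a
    remaining sequence joins the test set only if it is < p% identical to
    every training sequence. Anything else is discarded.
    """
    rng = random.Random(seed)
    indices = list(range(len(seqs)))
    n_train = min(len(indices), max(1, round(train_fraction * len(indices))))
    train = sorted(rng.sample(indices, n_train))
    train_set = set(train)

    test, discarded = [], []
    for i in indices:
        if i in train_set:
            continue
        if all(percent_identity(seqs[i], seqs[t]) < p for t in train):
            test.append(i)
        else:
            discarded.append(i)
    return train, test, discarded
```

Discarding sequences is the price of the guarantee in this sketch; the abstract reports that the paper's independent-set methods split more families successfully than a previous approach.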
4

Horton, Robert M. "Biological Sequence Analysis Using Regular Expressions." BioTechniques 27, no. 1 (July 1999): 76–78. http://dx.doi.org/10.2144/99271ir01.

5

Yap, T. K., O. Frieder, and R. L. Martino. "Parallel computation in biological sequence analysis." IEEE Transactions on Parallel and Distributed Systems 9, no. 3 (March 1998): 283–94. http://dx.doi.org/10.1109/71.674320.

6

Pachter, L., and B. Sturmfels. "Parametric inference for biological sequence analysis." Proceedings of the National Academy of Sciences 101, no. 46 (November 8, 2004): 16138–43. http://dx.doi.org/10.1073/pnas.0406011101.

7

Mitrophanov, Alexander Yu, and Mark Borodovsky. "Statistical significance in biological sequence analysis." Briefings in Bioinformatics 7, no. 1 (March 1, 2006): 2–24. http://dx.doi.org/10.1093/bib/bbk001.

8

Dwivedi, Vivek Dhar, Indra Prasad Tripathi, Aman Chandra Kaushik, Shiv Bharadwaj, and Sarad Kumar Mishra. "Biological Data Analysis Program (BDAP): a multitasking biological sequence analysis program." Neural Computing and Applications 30, no. 5 (December 17, 2016): 1493–501. http://dx.doi.org/10.1007/s00521-016-2772-z.

9

Murad, Taslim, Sarwan Ali, and Murray Patterson. "Exploring the Potential of GANs in Biological Sequence Analysis." Biology 12, no. 6 (June 14, 2023): 854. http://dx.doi.org/10.3390/biology12060854.

Abstract:
Biological sequence analysis is an essential step toward building a deeper understanding of the underlying functions, structures, and behaviors of the sequences. It can help in identifying the characteristics of the associated organisms, such as viruses, etc., and building prevention mechanisms to eradicate their spread and impact, as viruses are known to cause epidemics that can become global pandemics. New tools for biological sequence analysis are provided by machine learning (ML) technologies to effectively analyze the functions and structures of the sequences. However, these ML-based methods undergo challenges with data imbalance, generally associated with biological sequence datasets, which hinders their performance. Although various strategies are present to address this issue, such as the SMOTE algorithm, which creates synthetic data, however, they focus on local information rather than the overall class distribution. In this work, we explore a novel approach to handle the data imbalance issue based on generative adversarial networks (GANs), which use the overall data distribution. GANs are utilized to generate synthetic data that closely resembles real data, thus, these generated data can be employed to enhance the ML models’ performance by eradicating the class imbalance problem for biological sequence analysis. We perform four distinct classification tasks by using four different sequence datasets (Influenza A Virus, PALMdb, VDjDB, Host) and our results illustrate that GANs can improve the overall classification performance.
10

Hanif, Waqar, Hijab Fatima, Muhammad Qasim, Rana Muhammad Atif, and Muhammad Rizwan Javed. "SeqDown: An Efficient Sequence Retrieval Software and Comparative Sequence Retrieval Analysis." Current Trends in OMICS 1, no. 1 (August 2, 2021): 18–29. http://dx.doi.org/10.32350/cto.11.03.

Abstract:
For any sequence analysis procedure, one or more sequences must be retrieved, stored, and organized. One of the most common public databases used for biological sequence retrieval is GenBank, a comprehensive public database of nucleotide sequences. However, as the length of the sequence to be retrieved increases (a chromosome, an entire genome, a scaffold, etc.), the time needed to download the file grows because of limited bandwidth.[8] In most cases, during sequence analysis, the researcher requires messenger RNA (mRNA), RNA, DNA, and protein sequences of the same sequence of interest to work with, and finding and retrieving these sequence files consumes a substantial amount of the researcher's time. Access to GenBank through Java HTTPS protocols is established to request and receive the sequence files associated with the input accessions. SeqDown was shown to be much more efficient in terms of sequence retrieval time than internet browsers and was found to be 15.27% faster than Mozilla Firefox. SeqDown also provides a feature to retrieve the coding DNA sequences and protein sequences present in a single chromosome. Files retrieved from most biological databases are not properly named, so the user has to deal with redundantly named sequence files, which leads to incorrect and time-consuming analysis; this can be solved with SeqDown. SeqDown is available as free-to-download software at https://bit.ly/3cUwchz
11

Liu, Wen-li, and Qing-biao Wu. "Analysis method and algorithm design of biological sequence problem based on generalized k-mer vector." Applied Mathematics-A Journal of Chinese Universities 36, no. 1 (March 2021): 114–27. http://dx.doi.org/10.1007/s11766-021-4033-x.

Abstract:
K-mers can be used to describe biological sequences, and the k-mer distribution is a tool for solving sequence analysis problems in bioinformatics. We can use a k-mer vector to represent the k-mer distribution of a biological sequence. Problems such as similarity calculation or sequence assembly can be described in the k-mer vector space. This helps us to identify new features of old sequence-based problems in bioinformatics and to develop new algorithms using the concepts and methods of linear space theory. In this study, we defined the k-mer vector space for generalized biological sequences. The meaning of the corresponding vector operations is explained in the biological context. We presented the vector/matrix form of several widely seen sequence-based problems, including read quantification, sequence assembly, and pattern detection. Its advantages and disadvantages are discussed. We also implemented a tool for the sequence assembly problem based on the concepts of k-mer vector methods. It shows the practicability and convenience of this algorithm design strategy.
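As a rough illustration of the idea of a k-mer vector, the sketch below counts overlapping k-mers and lays the counts out in a fixed lexicographic order, then compares two sequences by cosine similarity. It covers only plain k-mer counting over an assumed DNA alphabet, not the generalized k-mer vector space defined in the paper; the function names and the choice of cosine similarity are illustrative assumptions.

```python
import math
from collections import Counter
from itertools import product


def kmer_vector(seq, k, alphabet="ACGT"):
    """Count overlapping k-mers; one vector entry per possible k-mer, in lexicographic order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(kmer)] for kmer in product(alphabet, repeat=k)]


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

For example, `kmer_vector("ACGT", 2)` has 16 entries, with a count of one at the positions for `AC`, `CG`, and `GT` and zeros elsewhere.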
12

Wang, Zhan Bin, Hong Yun Xu, De Hai Li, and Jing Jie Wang. "The Biological Characteristics and its Sequence Analysis of Pholiota adiposa." Advanced Materials Research 518-523 (May 2012): 5371–75. http://dx.doi.org/10.4028/www.scientific.net/amr.518-523.5371.

Abstract:
In this paper, the biological characteristics of Pholiota adiposa were systematically studied. The results showed that the ideal temperature range for growth is from 20 °C to 25 °C, with the optimal temperature at 25 °C; the optimal light condition is full darkness; the ideal pH range for growth is from 5 to 9, with the optimal pH at 6; the preferred carbon source is sucrose, followed by glucose; the preferred nitrogen source is potassium nitrate, glutamic acid. The internal transcribed spacer (ITS) region was sequenced to determine whether the DNA sequence data supported the experimental result. A phylogenetic tree of the 19 homologous sequences was analyzed, with the highest homology reaching 99%.
13

Iuchi, Hitoshi, Taro Matsutani, Keisuke Yamada, Natsuki Iwano, Shunsuke Sumi, Shion Hosoda, Shitao Zhao, Tsukasa Fukunaga, and Michiaki Hamada. "Representation learning applications in biological sequence analysis." Computational and Structural Biotechnology Journal 19 (2021): 3198–208. http://dx.doi.org/10.1016/j.csbj.2021.05.039.

14

Birney, E. "Hidden Markov models in biological sequence analysis." IBM Journal of Research and Development 45, no. 3.4 (May 2001): 449–54. http://dx.doi.org/10.1147/rd.453.0449.

15

Vinga, S. "Information theory applications for biological sequence analysis." Briefings in Bioinformatics 15, no. 3 (September 20, 2013): 376–89. http://dx.doi.org/10.1093/bib/bbt068.

16

GANAPATHIRAJU, MADHAVI K., ASIA D. MITCHELL, MOHAMED THAHIR, KAMIYA MOTWANI, and SESHAN ANANTHASUBRAMANIAN. "SUITE OF TOOLS FOR STATISTICAL N-GRAM LANGUAGE MODELING FOR PATTERN MINING IN WHOLE GENOME SEQUENCES." Journal of Bioinformatics and Computational Biology 10, no. 06 (October 18, 2012): 1250016. http://dx.doi.org/10.1142/s0219720012500163.

Abstract:
Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most of the genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application for analysis on whole human genome sequence.
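The windowed n-gram counting mentioned in this abstract can be sketched in a few lines. This is a naive illustration that slices each window directly; the toolkit described by the authors is suffix-array based, and the perplexity computation is not covered here. The function name and the `window` and `step` parameters are assumptions made for the example.

```python
from collections import Counter


def window_ngram_counts(seq, n, window, step):
    """Yield (start, Counter of n-grams) for each full window along the sequence."""
    if n < 1 or window < n or step < 1:
        raise ValueError("require n >= 1, window >= n, step >= 1")
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        # Overlapping n-grams inside this window only.
        yield start, Counter(chunk[i:i + n] for i in range(window - n + 1))
```

Windows whose n-gram counts are dominated by one or two entries are a simple signal of repetitive sequence, which is the kind of pattern the abstract refers to.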
17

VIVÈS, Romain R., David A. PYE, Markku SALMIVIRTA, John J. HOPWOOD, Ulf LINDAHL, and John T. GALLAGHER. "Sequence analysis of heparan sulphate and heparin oligosaccharides." Biochemical Journal 339, no. 3 (April 26, 1999): 767–73. http://dx.doi.org/10.1042/bj3390767.

Abstract:
The biological activity of heparan sulphate (HS) and heparin largely depends on internal oligosaccharide sequences that provide specific binding sites for an extensive range of proteins. Identification of such structures is crucial for the complete understanding of glycosaminoglycan (GAG)-protein interactions. We describe here a simple method of sequence analysis relying on the specific tagging of the sugar reducing end by 3H radiolabelling, the combination of chemical scission and specific enzymic digestion to generate intermediate fragments, and the analysis of the generated products by strong-anion-exchange HPLC. We present full sequence data on microgram quantities of four unknown oligosaccharides (three HS-derived hexasaccharides and one heparin-derived octasaccharide) which illustrate the utility and relative simplicity of the technique. The results clearly show that it is also possible to read sequences of inhomogeneous preparations. Application of this technique to biologically active oligosaccharides should accelerate progress in the understanding of HS and heparin structure-function relationships and provide new insights into the primary structure of these polysaccharides.
18

Beckstette, Michael, Jens T. Mailänder, Richard J. Marhöfer, Alexander Sczyrba, Enno Ohlebusch, Robert Giegerich, and Paul M. Selzer. "Genlight: Interactive high-throughput sequence analysis and comparative genomics." Journal of Integrative Bioinformatics 1, no. 1 (December 1, 2004): 90–107. http://dx.doi.org/10.1515/jib-2004-8.

Abstract:
With rising numbers of fully sequenced genomes, the importance of comparative genomics is constantly increasing. Although several software systems for genome comparison analyses exist, their functionality and flexibility are still limited compared to the manifold possible applications. Therefore, we developed Genlight (http://piranha.techfak.uni-bielefeld.de), a client/server-based program suite for large-scale sequence analysis and comparative genomics. Genlight uses the object-relational database system PostgreSQL together with a state-of-the-art data representation and a distributed execution approach for large-scale analysis tasks. The system includes a wide variety of comparison and sequence manipulation methods and supports the management of nucleotide sequences as well as protein sequences. The comparison methods are complemented by a large variety of visualization methods for the assessment of the generated results. In order to demonstrate the suitability of the system for the treatment of biological questions, Genlight was used to identify potential drug and vaccine targets of the pathogen Helicobacter pylori.
19

Zhang, Yinxi. "Analysis of the application of bioinformatics in the medicine." Theoretical and Natural Science 29, no. 1 (January 8, 2024): 82–86. http://dx.doi.org/10.54254/2753-8818/29/20240751.

Abstract:
As humans develop, science is also rapidly evolving, and biological science and medical support are vital for human needs. The development of biology has been particularly important. It stretches from the initial understanding of plants and animals to human biology and molecular biology, from a macroscopic understanding of life to the molecular scale. Although a great deal of knowledge about biology has accumulated, much about this vast and precise system remains unknown. In recent years, however, a new interdisciplinary field, bioinformatics, has greatly expanded people's understanding of biological information. The most fundamental research targets of bioinformatics are sequences, the most important being the amino acid sequences of proteins and the base sequences of DNA. To study sequences and their constituent components, bioinformatics has another major research target: the biological database. This involves analysing the structure, meaning, and function of biological information through basic protein and nucleic acid sequences. Bioinformatics has therefore found extensive application in the medical field. This research mainly analyzes and discusses the advantages and disadvantages of the application of bioinformatics in the medical field.
20

PUDIMAT, RAINER, ROLF BACKOFEN, and ERNST G. SCHUKAT-TALAMAZZINI. "FAST FEATURE SUBSET SELECTION IN BIOLOGICAL SEQUENCE ANALYSIS." International Journal of Pattern Recognition and Artificial Intelligence 23, no. 02 (March 2009): 191–207. http://dx.doi.org/10.1142/s0218001409007107.

Abstract:
Biological research produces a wealth of measured data. It is not easy for biologists to postulate hypotheses about the behavior or structure of the observed entity, because the relevant properties are lost in the ocean of measurements. Nor is it easy to design machine learning algorithms to classify or cluster the data items, for the same reason. Algorithms for automatically selecting a highly predictive subset of the measured features can help to overcome these difficulties. We present an efficient feature selection strategy which can be applied to arbitrary feature selection problems. The core technique is a new method for estimating the quality of subsets from previously calculated qualities for smaller subsets by minimizing the mean standard error of estimated values with an approach common to support vector machines. This method can be integrated into many feature subset search algorithms. We have applied it with sequential search algorithms and have been able to reduce the number of quality calculations for finding accurate feature subsets by about 70%. We show these improvements by applying our approach to the problem of finding highly predictive feature subsets for transcription factor binding sites.
21

Smith, Lloyd M. "Automated Synthesis and Sequence Analysis of Biological Macromolecules." Analytical Chemistry 60, no. 6 (March 15, 1988): 381A–390A. http://dx.doi.org/10.1021/ac00157a717.

22

Stockinger, H., T. Attwood, S. N. Chohan, R. Cote, P. Cudre-Mauroux, L. Falquet, P. Fernandes, et al. "Experience using web services for biological sequence analysis." Briefings in Bioinformatics 9, no. 6 (July 11, 2008): 493–505. http://dx.doi.org/10.1093/bib/bbn029.

23

Standage, Daniel, Ali yari, Lisa J. Cohen, Michael R. Crusoe, Tim Head, Luiz Irber, Shannon EK Joslin, et al. "khmer release v2.1: software for biological sequence analysis." Journal of Open Source Software 2, no. 15 (July 3, 2017): 272. http://dx.doi.org/10.21105/joss.00272.

24

Karuppusamy, T., et al. "BIOLOGICAL GENE SEQUENCE STUCTURE ANALYSIS USING HIDDEN MARKOV MODEL." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 4 (April 10, 2021): 1652–66. http://dx.doi.org/10.17762/turcomat.v12i4.1420.

Abstract:
Identifying or predicting coding sequences within genomic DNA has been a major part of gene finding. In this work, hidden Markov models (HMMs) are used to represent the consensus and provide a useful tool for determining splice junction sites. Markov models, which recur throughout computational biology, lead to statistical models; in every sequence analysis they play the role of putting the right label on each residue. In gene identification the labels are exons, introns, or intergenic sequences, and in sequence alignment they associate residues in a sequence with homologous residues in the target database. In gene identification, codon bias and exon and intron length preferences are combined with the splice site consensus. Parameters are fixed at the outset, and the weights of the different sources of information are pooled to give a probabilistic interpretation of the result, which identifies the best score and indicates how confident one can be that the best-scoring answer is correct. This leads to the concept of extensibility: extending an ad hoc gene finder to model the consensus, alternative splicing, and polyadenylation signals piles more reality onto a delicate ad hoc program, which could break down under its own weight.
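The idea of an HMM assigning a label to each residue can be illustrated with a small Viterbi decoder. This is a generic textbook sketch, not the method of the paper above: the two states (`exon`, `intron`) and all probabilities in the usage example are invented for illustration, and a real gene finder would use many more states and trained parameters.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state path for an observation string (log-space Viterbi)."""
    if not obs:
        return []
    # Best log-probability of any path ending in each state after the first symbol.
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for symbol in obs[1:]:
        prev = scores[-1]
        current, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            current[s] = (prev[best_prev] + math.log(trans_p[best_prev][s])
                          + math.log(emit_p[s][symbol]))
            pointers[s] = best_prev
        scores.append(current)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy parameters (assumed, not estimated from data): GC-rich "exon", AT-rich "intron".
STATES = ["exon", "intron"]
START = {"exon": 0.5, "intron": 0.5}
TRANS = {"exon": {"exon": 0.9, "intron": 0.1},
         "intron": {"exon": 0.1, "intron": 0.9}}
EMIT = {"exon": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "intron": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}
```

With these toy parameters, `viterbi("GGGGAAAA", STATES, START, TRANS, EMIT)` labels the four G residues as `exon` and the four A residues as `intron`, because a single state switch costs less than emitting four unlikely symbols.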
25

Hart, Reece K., and Andreas Prlić. "SeqRepo: A system for managing local collections of biological sequences." PLOS ONE 15, no. 12 (December 3, 2020): e0239883. http://dx.doi.org/10.1371/journal.pone.0239883.

Abstract:
Motivation: Access to biological sequence data, such as genome, transcript, or protein sequence, is at the core of many bioinformatics analysis workflows. The National Center for Biotechnology Information (NCBI), Ensembl, and other sequence database maintainers provide methods to access sequences through network connections. For many users, the convenience and currency of remotely managed data are compelling, and the network latency is non-consequential. However, for high-throughput and clinical applications, local sequence collections are essential for performance, stability, privacy, and reproducibility. Results: Here we describe SeqRepo, a novel system for building a local, high-performance, non-redundant collection of biological sequences. SeqRepo enables clients to use primary database identifiers and several digests to identify sequences and sequence aliases. SeqRepo provides a native Python interface and a REST interface, which can run locally and enables access from other programming languages. SeqRepo also provides an alternative REST interface based on the GA4GH refget protocol. SeqRepo provides fast random access to sequence slices. We provide results that demonstrate that a local SeqRepo sequence collection yields significant performance benefits of up to 1300-fold over remote sequence collections. In our use case for a variant validation and normalization pipeline, SeqRepo improved throughput 50-fold relative to use with remote sequences. SeqRepo may be used with any species or sequence type. Regular snapshots of human sequence collections are available. It is often convenient or necessary to use a computed digest as a sequence identifier. For example, a digest-based identifier may be used to refer to proprietary reference genomes or segments of a graph genome, for which conventional identifiers will not be available. 
Here we also introduce a convention for the application of the SHA-512 hashing algorithm with Base64 encoding to generate URL-safe identifiers. This convention, sha512t24u, combines a fast digest mechanism with a space-efficient representation that can be used for any object. Our report includes an analysis of timing and collision probabilities for sha512t24u. SeqRepo enables clients to use sha512t24u as identifiers, thereby seamlessly integrating public and private sequence sets. Availability: SeqRepo is released under the Apache License 2.0 and is available on GitHub and PyPI. Docker images and database snapshots are also available. See https://github.com/biocommons/biocommons.seqrepo.
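The sha512t24u convention can be sketched with the Python standard library alone. The abstract specifies SHA-512 and URL-safe Base64; the truncation of the digest to its first 24 bytes is assumed here from the "t24" in the name, and the function below is an illustrative stand-in rather than SeqRepo's own code.

```python
import base64
import hashlib


def sha512t24u(data: bytes) -> str:
    """SHA-512 digest, truncated to 24 bytes (assumption from the name), URL-safe Base64."""
    digest = hashlib.sha512(data).digest()
    # 24 bytes encode to exactly 32 Base64 characters, so no '=' padding appears.
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")
```

Because 24 is a multiple of 3, the encoded identifier is always 32 characters long and contains only letters, digits, `-`, and `_`, which is what makes it safe to embed in a URL.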
26

Lu, Yue, Long Zhao, Zhao Li, and Xiangjun Dong. "Genetic Similarity Analysis Based on Positive and Negative Sequence Patterns of DNA." Symmetry 12, no. 12 (December 16, 2020): 2090. http://dx.doi.org/10.3390/sym12122090.

Abstract:
Similarity analysis of DNA sequences can clarify the homology between sequences and predict the structure of, and relationship between, them. At the same time, the frequent patterns of biological sequences explain not only the genetic characteristics of the organism, but they also serve as relevant markers for certain events of biological sequences. However, most of the aforementioned biological sequence similarity analysis methods are targeted at the entire sequential pattern, which ignores the missing gene fragment that may induce potential disease. The similarity analysis of sequences containing a missing gene item remains unexplored. Consequently, some sequences with missing bases are ignored or not effectively analyzed. Thus, this paper presents a new method for DNA sequence similarity analysis. Using this method, we first mined not only positive sequential patterns, but also sequential patterns that were missing some of the base terms (collectively referred to as negative sequential patterns). Subsequently, we used these frequent patterns for similarity analysis on a two-dimensional plane. Several experiments were conducted in order to verify the effectiveness of this algorithm. The experimental results demonstrated that the algorithm can obtain various results through the selection of frequent sequential patterns and that accuracy and time efficiency were improved.
27

KOH, CHUAN HOCK, SHARENE LIN, GREGORY JEDD, and LIMSOON WONG. "SIRIUS PSB: A GENERIC SYSTEM FOR ANALYSIS OF BIOLOGICAL SEQUENCES." Journal of Bioinformatics and Computational Biology 07, no. 06 (December 2009): 973–90. http://dx.doi.org/10.1142/s0219720009004436.

Abstract:
Computational tools are essential components of modern biological research. For example, BLAST searches can be used to identify related proteins based on sequence homology, or when a new genome is sequenced, prediction models can be used to annotate functional sites such as transcription start sites, translation initiation sites and polyadenylation sites and to predict protein localization. Here we present Sirius Prediction Systems Builder (PSB), a new computational tool for sequence analysis, classification and searching. Sirius PSB has four main operations: (1) Building a classifier, (2) Deploying a classifier, (3) Search for proteins similar to query proteins, (4) Preliminary and post-prediction analysis. Sirius PSB supports all these operations via a simple and interactive graphical user interface. Besides being a convenient tool, Sirius PSB has also introduced two novelties in sequence analysis. Firstly, genetic algorithm is used to identify interesting features in the feature space. Secondly, instead of the conventional method of searching for similar proteins via sequence similarity, we introduced searching via features' similarity. To demonstrate the capabilities of Sirius PSB, we have built two prediction models — one for the recognition of Arabidopsis polyadenylation sites and another for the subcellular localization of proteins. Both systems are competitive against current state-of-the-art models based on evaluation of public datasets. More notably, the time and effort required to build each model is greatly reduced with the assistance of Sirius PSB. Furthermore, we show that under certain conditions when BLAST is unable to find related proteins, Sirius PSB can identify functionally related proteins based on their biophysical similarities. Sirius PSB and its related supplements are available at:
28

Liu, Bin. "BioSeq-Analysis: a platform for DNA, RNA and protein sequence analysis based on machine learning approaches." Briefings in Bioinformatics 20, no. 4 (December 19, 2017): 1280–94. http://dx.doi.org/10.1093/bib/bbx165.

Abstract:
With the avalanche of biological sequences generated in the post-genomic age, one of the most challenging problems is how to computationally analyze their structures and functions. Machine learning techniques are playing key roles in this field. Typically, predictors based on machine learning techniques contain three main steps: feature extraction, predictor construction and performance evaluation. Although several Web servers and stand-alone tools have been developed to facilitate biological sequence analysis, they only focus on individual steps. In this regard, in this study a powerful Web server called BioSeq-Analysis (http://bioinformatics.hitsz.edu.cn/BioSeq-Analysis/) has been proposed to automatically complete the three main steps for constructing a predictor. The user only needs to upload the benchmark data set. BioSeq-Analysis can generate the optimized predictor based on the benchmark data set, and the performance measures can be reported as well. Furthermore, to maximize users' convenience, its stand-alone program was also released, which can be downloaded from http://bioinformatics.hitsz.edu.cn/BioSeq-Analysis/download/, and can be directly run on Windows, Linux and UNIX. Applied to three sequence analysis tasks, experimental results showed that the predictors generated by BioSeq-Analysis even outperformed some state-of-the-art methods. It is anticipated that BioSeq-Analysis will become a useful tool for biological sequence analysis.
29

Kumari, Uma, Aastha Tanwar, Jositta George, and Daityari Nayak. "NGS Analysis to Detect Mutation in Brain Tumor Diagnostic." International Journal for Research in Applied Science and Engineering Technology 11, no. 7 (July 31, 2023): 1394–402. http://dx.doi.org/10.22214/ijraset.2023.54895.

Abstract:
This study presents an integrated computational approach for analyzing protein sequences and their 3D structures. By leveraging the MMDB Macromolecular database, homologs of a protein sequence of interest are identified, and interactive visualization of their structural properties is provided. The computational alignment method, using BLASTP, allows for efficient determination of sequence similarities and identification of conserved regions among multiple protein sequences. COBALT is employed to refine sequence alignments and facilitate graphical analysis of sequence relationships. RASMOL, a computational analysis program, generates 2-D representations of protein-ligand complexes, enabling visual exploration of their interactions. ORF finder is used to identify coding regions in mRNA sequences, aiding in the prediction of protein-coding regions. The approach is applied to brain tumor diagnostics using human biological samples, exploring the structural properties of brain tumor-related proteins with the help of the 2RHU protein structure and PYMOL visualization software. Overall, this integrated computational framework offers a comprehensive toolkit for protein sequence analysis, structure visualization, and homology modeling, with potential applications in drug discovery, molecular biology, and medical diagnostics.
APA, Harvard, Vancouver, ISO, and other styles
30

Flach, Quezia N., Arthur F. Lorenzon, Marcelo C. Luizelli, and Fabio D. Rossi. "Analysis of Biological Sequence Search Performance in NoSQL Database." International Journal of Computer Applications 176, no. 42 (July 15, 2020): 1–6. http://dx.doi.org/10.5120/ijca2020920416.

31

Gascuel, O. "Inductive learning and biological sequence analysis. The PLAGE program." Biochimie 75, no. 5 (January 1993): 363–70. http://dx.doi.org/10.1016/0300-9084(93)90170-w.

32

Rascoe, J., M. Berg, U. Melcher, F. L. Mitchell, B. D. Bruton, S. D. Pair, and J. Fletcher. "Identification, Phylogenetic Analysis, and Biological Characterization of Serratia marcescens Strains Causing Cucurbit Yellow Vine Disease." Phytopathology® 93, no. 10 (October 2003): 1233–39. http://dx.doi.org/10.1094/phyto.2003.93.10.1233.

Abstract:
A serious vine decline of cucurbits known as cucurbit yellow vine disease (CYVD) is caused by rod-shaped bacteria that colonize the phloem elements. Sequence analysis of a CYVD-specific polymerase chain reaction (PCR)-amplified 16S rDNA product showed the microbe to be a γ-proteobacterium related to the genus Serratia. To identify and characterize the bacteria, one strain each from watermelon and zucchini and several noncucurbit-derived reference strains were subjected to sequence analysis and biological function assays. Taxonomic and phylogenetic placement was investigated by analysis of the groE and 16S rDNA regions, which were amplified by PCR and directly sequenced. For comparison, eight other bacterial strains identified by others as Serratia spp. also were sequenced. These sequences clearly identified the CYVD strains as Serratia marcescens. However, evaluation of metabolic and biochemical features revealed that cucurbit-derived strains of S. marcescens differ substantially from strains of the same species isolated from other environmental niches. Cucurbit strains formed a distinct cluster, separate from other strains, when their fatty acid methyl ester profiles were analyzed. In substrate utilization assays (BIOLOG, Vitek, and API 20E), the CYVD strains lacked a number of metabolic functions characteristic for S. marcescens, failing to catabolize 25 to 30 compounds that were utilized by S. marcescens reference strains. These biological differences may reflect gene loss or repression that occurred as the bacterium adapted to life as an intracellular parasite and plant pathogen.
33

Wei, Dan, Qingshan Jiang, and Sheng Li. "A New Approach for DNA Sequence Similarity Analysis based on Triplets of Nucleic Acid Bases." International Journal of Nanotechnology and Molecular Computation 2, no. 4 (October 2010): 1–11. http://dx.doi.org/10.4018/978-1-60960-064-8.ch006.

Abstract:
Similarity analysis of DNA sequences is a fundamental research area in Bioinformatics. The characteristic distribution of L-tuple, which is the tuple of length L, reflects the valuable information contained in a biological sequence and thus may be used in DNA sequence similarity analysis. However, similarity analysis based on characteristic distribution of L-tuple is not effective for the comparison of highly conservative sequences. In this paper, a new similarity measurement approach based on Triplets of Nucleic Acid Bases (TNAB) is introduced for DNA sequence similarity analysis. The new approach characterizes both the content feature and position feature of a DNA sequence using the frequency and position of occurrence of TNAB in the sequence. The experimental results show that the approach based on TNAB is effective for analysing DNA sequence similarity.
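The idea of describing a sequence by both the frequency and the position of its triplets can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the authors' formulas, so the per-triplet features (relative frequency and mean normalised position), the Euclidean distance, and the function names `triplet_profile` and `profile_distance` are all hypothetical choices made here.

```python
from collections import defaultdict
from math import sqrt


def triplet_profile(seq):
    """Map each overlapping triplet to (relative frequency, mean normalised position).

    Assumption: these two numbers stand in for the "content" and "position"
    features; the published TNAB measure may be defined differently.
    """
    seq = seq.upper()
    n = len(seq) - 2  # number of overlapping triplets
    if n < 1:
        raise ValueError("sequence must contain at least three bases")
    positions = defaultdict(list)
    for i in range(n):
        positions[seq[i:i + 3]].append(i)
    profile = {}
    for triplet, pos in positions.items():
        freq = len(pos) / n
        mean_pos = sum(pos) / len(pos) / n  # scaled into [0, 1)
        profile[triplet] = (freq, mean_pos)
    return profile


def profile_distance(a, b):
    """Euclidean distance between two profiles over the union of their triplets."""
    pa, pb = triplet_profile(a), triplet_profile(b)
    total = 0.0
    for triplet in set(pa) | set(pb):
        fa, ma = pa.get(triplet, (0.0, 0.0))
        fb, mb = pb.get(triplet, (0.0, 0.0))
        total += (fa - fb) ** 2 + (ma - mb) ** 2
    return sqrt(total)
```

Two sequences with the same triplet composition but a different ordering get a non-zero distance here because the position term differs, which is the property the abstract attributes to combining content and position features.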
34

Yu, Zu-Guo, Vo Anh, and Ka-Sing Lau. "Iterated Function System and Multifractal Analysis of Biological Sequences." International Journal of Modern Physics B 17, no. 22n24 (September 30, 2003): 4367–75. http://dx.doi.org/10.1142/s0217979203022477.

Abstract:
The fractal method has been successfully used to study many problems in physics, mathematics, engineering, finance, and even biology. In the past decade or so there has been a groundswell of interest in unravelling the mysteries of DNA. How to extract more biological information from these DNA sequences is a challenging problem. The classification and evolutionary relationships of organisms are central problems in bioinformatics. It is also very hard to predict the secondary and spatial structure of a protein from its amino acid sequence. In this paper, some recent results related to these problems, obtained through multifractal analysis and the iterated function system (IFS) model, are introduced.
35

Kauffman, Erle G., and Bradley B. Sageman. "Biological patterns in sequence stratigraphy; Cretaceous of the Western Interior Basin, North America." Paleontological Society Special Publications 6 (1992): 158. http://dx.doi.org/10.1017/s2475262200007188.

Abstract:
High-resolution stratigraphic analysis of Cretaceous strata in the Western Interior Basin (WIB) of North America has allowed definition of numerous disconformity-bounded, eustatically and/or tectonically driven sequences and their systems tracts at 2nd- through 4th-order scale, as well as 5th- to 7th-order climate-induced cycles. Integrated event chronostratigraphy and biostratigraphy allow detailed regional tracing and facies analysis of these sequences, leading to three-dimensional modeling of facies evolution. Whether driven by relative sealevel changes or smaller scale climate cycles, Cretaceous sequences and their bounding disconformities reflect dynamic changes in many factors which moderate biological systems (e.g. sealevel and paleobathymetric changes, changes in current velocity and in erosion/sedimentation rates and patterns, watermass temperature and chemistry, etc). Predictable biological responses (patterns) to varying environmental conditions and different systems tracts are expected in sequence stratigraphy. Once defined within well-studied systems, these patterns can then be used as an independent tool for sequence stratigraphic analysis. To date, our research has focused on the development of paleobiological criteria which aid in the recognition of sequence stratigraphic frameworks, especially in basinal facies where sequence boundaries and systems tracts may be subtly defined in the physical stratigraphy. Such criteria may include the identification of sequence boundaries and other omission surfaces by punctuated character displacement in evolutionary series, by condensation or omission of biostratigraphic zones, by mixed or time-averaged community elements and biozones, and by selective colonization by firm substrate-dependent benthic communities. 
Gradients within and between systems are characterized by different community composition, biofacies, taxonomic and community diversity patterns, adaptive bauplans among resident taxa, taphonomic signatures, and bioevents that allow predictive biological characterization in sequence stratigraphy. Once established and correlated, sequence stratigraphic systems among different basins provide a chronostratigraphic and environmental framework within which the regional dynamics of ancient populations and communities can be evaluated, leading to the analysis and modeling of relationships between sealevel changes and biogeographic migration patterns, and the rates and patterns of evolution and extinction.
36

Gancheva, Veska, and Hristo Stoev. "Optimization and Performance Analysis of CAT Method for DNA Sequence Similarity Searching and Alignment." Genes 15, no. 3 (March 7, 2024): 341. http://dx.doi.org/10.3390/genes15030341.

Abstract:
Bioinformatics is a rapidly developing field enabling scientific experiments via computer models and simulations. In recent years, there has been an extraordinary growth in biological databases. Therefore, it is extremely important to propose effective methods and algorithms for the fast and accurate processing of biological data. Sequence comparisons are the best way to investigate and understand the biological functions and evolutionary relationships between genes on the basis of the alignment of two or more DNA sequences in order to maximize the identity level and degree of similarity. This paper presents a new version of the pairwise DNA sequences alignment algorithm, based on a new method called CAT, where a dependency with a previous match and the closest neighbor are taken into consideration to increase the uniqueness of the CAT profile and to reduce possible collisions, i.e., two or more sequence with the same CAT profiles. This makes the proposed algorithm suitable for finding the exact match of a concrete DNA sequence in a large set of DNA data faster. In order to enable the usage of the profiles as sequence metadata, CAT profiles are generated once prior to data uploading to the database. The proposed algorithm consists of two main stages: CAT profile calculation depending on the chosen benchmark sequences and sequence comparison by using the calculated CAT profiles. Improvements in the generation of the CAT profiles are detailed and described in this paper. Block schemes, pseudo code tables, and figures were updated according to the proposed new version and experimental results. Experiments were carried out using the new version of the CAT method for DNA sequence alignment and different datasets. New experimental results regarding collisions, speed, and efficiency of the suggested new implementation are presented. 
Experiments related to the performance comparison with Needleman–Wunsch were re-executed with the new version of the algorithm to confirm that we have the same performance. A performance analysis of the proposed algorithm based on the CAT method against the Knuth–Morris–Pratt algorithm, which has a complexity of O(n) and is widely used for biological data searching, was performed. The impact of prior matching dependencies on uniqueness for generated CAT profiles is investigated. The experimental results from sequence alignment demonstrate that the proposed CAT method-based algorithm exhibits minimal deviation, which can be deemed negligible if such deviation is considered permissible in favor of enhanced performance. It should be noted that the performance of the CAT algorithm in terms of execution time remains stable, unaffected by the length of the analyzed sequences. Hence, the primary benefit of the suggested approach lies in its rapid processing capabilities in large-scale sequence alignment, a task that traditional exact algorithms would require significantly more time to perform.
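For context, the Knuth–Morris–Pratt baseline named in the abstract can be sketched as below. This shows only the standard exact-matching algorithm that the CAT method is compared against, not the CAT method itself, whose details are not given here; the function names `build_failure` and `kmp_search` are illustrative.

```python
def build_failure(pattern):
    """Failure table: fail[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the start index of every (possibly overlapping) exact match."""
    if not pattern:
        return []
    fail = build_failure(pattern)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # fall back without re-reading the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return hits
```

Because the text index never moves backwards, the search is linear in the combined length of text and pattern, which is the O(n) behaviour the abstract refers to.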
37

Jeon, Yoon-Seong, Kihyun Lee, Sang-Cheol Park, Bong-Soo Kim, Yong-Joon Cho, Sung-Min Ha, and Jongsik Chun. "EzEditor: a versatile sequence alignment editor for both rRNA- and protein-coding genes." International Journal of Systematic and Evolutionary Microbiology 64, Pt_2 (February 1, 2014): 689–91. http://dx.doi.org/10.1099/ijs.0.059360-0.

Abstract:
EzEditor is a Java-based molecular sequence editor allowing manipulation of both DNA and protein sequence alignments for phylogenetic analysis. It has multiple features optimized to connect initial computer-generated multiple alignment and subsequent phylogenetic analysis by providing manual editing with reference to biological information specific to the genes under consideration. It provides various functionalities for editing rRNA alignments using secondary structure information. In addition, it supports simultaneous editing of both DNA sequences and their translated protein sequences for protein-coding genes. EzEditor is, to our knowledge, the first sequence editing software designed for both rRNA- and protein-coding genes with the visualization of biologically relevant information and should be useful in molecular phylogenetic studies. EzEditor is based on Java, can be run on all major computer operating systems and is freely available from http://sw.ezbiocloud.net/ezeditor/.
38

Liu, Bin, Xin Gao, and Hanyu Zhang. "BioSeq-Analysis2.0: an updated platform for analyzing DNA, RNA and protein sequences at sequence level and residue level based on machine learning approaches." Nucleic Acids Research 47, no. 20 (September 4, 2019): e127-e127. http://dx.doi.org/10.1093/nar/gkz740.

Abstract:
As the first web server to analyze various biological sequences at sequence level based on machine learning approaches, many powerful predictors in the field of computational biology have been developed with the assistance of the BioSeq-Analysis. However, the BioSeq-Analysis can be only applied to the sequence-level analysis tasks, preventing its applications to the residue-level analysis tasks, and an intelligent tool that is able to automatically generate various predictors for biological sequence analysis at both residue level and sequence level is highly desired. In this regard, we decided to publish an important updated server covering a total of 26 features at the residue level and 90 features at the sequence level called BioSeq-Analysis2.0 (http://bliulab.net/BioSeq-Analysis2.0/), by which the users only need to upload the benchmark dataset, and the BioSeq-Analysis2.0 can generate the predictors for both residue-level analysis and sequence-level analysis tasks. Furthermore, the corresponding stand-alone tool was also provided, which can be downloaded from http://bliulab.net/BioSeq-Analysis2.0/download/. To the best of our knowledge, the BioSeq-Analysis2.0 is the first tool for generating predictors for biological sequence analysis tasks at residue level. Specifically, the experimental results indicated that the predictors developed by BioSeq-Analysis2.0 can achieve comparable or even better performance than the existing state-of-the-art predictors.
39

Pujari, Jeevana Jyothi, and Karteeka Pavan Kanadam. "Semi Global Pairwise Sequence Alignment Using New Chromosome Structure Genetic Algorithm." Ingénierie des systèmes d information 27, no. 1 (February 28, 2022): 67–74. http://dx.doi.org/10.18280/isi.270108.

Abstract:
Biological sequence alignment is a prominent and eminent task in the analysis of biological data. This paper proposes a pair wise semi global sequence alignment technique using New Chromosome Structure based Genetic algorithm (NCSGA) for aligning sequences by automatically detecting optimal number of gaps and their positions to explore the optimal score for DNA or protein sequences. The experimental results are conducted using simulated real datasets from NCBI. The proposed method can be tested on real data sets of nucleotide sequence pairs. The computational results show that NCSGA produces the near optimal solutions for semi global alignment compared to other existing approaches.
40

Meyer, Axel. "MacVector: Sequence Analysis Software. Version 4.1. AssemblyLIGN: Sequence Assembly Software." Quarterly Review of Biology 70, no. 1 (March 1995): 128–29. http://dx.doi.org/10.1086/418976.

41

Zhang, Nian-Zhang, Ying Xu, Si-Yang Huang, Dong-Hui Zhou, Rui-Ai Wang, and Xing-Quan Zhu. "Sequence Variation in Toxoplasma gondii rop17 Gene among Strains from Different Hosts and Geographical Locations." Scientific World Journal 2014 (2014): 1–4. http://dx.doi.org/10.1155/2014/349325.

Abstract:
Genetic diversity of T. gondii is a concern of many studies, due to the biological and epidemiological diversity of this parasite. The present study examined sequence variation in rhoptry protein 17 (ROP17) gene among T. gondii isolates from different hosts and geographical regions. The rop17 gene was amplified and sequenced from 10 T. gondii strains, and phylogenetic relationship among these T. gondii strains was reconstructed using maximum parsimony (MP), neighbor-joining (NJ), and maximum likelihood (ML) analyses. The partial rop17 gene sequences were 1375 bp in length and A+T contents varied from 49.45% to 50.11% among all examined T. gondii strains. Sequence analysis identified 33 variable nucleotide positions (2.1%), 16 of which were identified as transitions. Phylogeny reconstruction based on rop17 gene data revealed two major clusters which could readily distinguish Type I and Type II strains. Analyses of sequence variations in nucleotides and amino acids among these strains revealed high ratio of nonsynonymous to synonymous polymorphisms (>1), indicating that rop17 shows signs of positive selection. This study demonstrated the existence of slightly high sequence variability in the rop17 gene sequences among T. gondii strains from different hosts and geographical regions, suggesting that rop17 gene may represent a new genetic marker for population genetic studies of T. gondii isolates.
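The two summary statistics quoted in this abstract, A+T content and the number of variable sites that are transitions, can be computed from an alignment roughly as follows. This is a minimal sketch, not the authors' pipeline: it assumes the sequences are already aligned to equal length, skips gap (`-`) and `N` characters, and counts any site with three or more distinct bases as a transversion site, which is a simplification. The function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def at_content(seq):
    """Percentage of A and T bases in a sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "AT" for base in seq) / len(seq)


def classify_variable_sites(aligned):
    """Count (variable, transition, transversion) columns in an alignment.

    aligned: list of equal-length, pre-aligned sequences (an assumption).
    A column is a transition site when all observed bases are purines or
    all are pyrimidines; every other variable column is counted as a
    transversion site.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    variable = transitions = transversions = 0
    for column in zip(*(s.upper() for s in aligned)):
        bases = set(column) - {"-", "N"}
        if len(bases) < 2:
            continue  # conserved or uninformative column
        variable += 1
        if bases <= PURINES or bases <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return variable, transitions, transversions
```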
42

Nugent, Cameron M., Tyler A. Elliott, Sujeevan Ratnasingham, and Sarah J. Adamowicz. "coil: an R package for cytochrome c oxidase I (COI) DNA barcode data cleaning, translation, and error evaluation." Genome 63, no. 6 (June 2020): 291–305. http://dx.doi.org/10.1139/gen-2019-0206.

Abstract:
Biological conclusions based on DNA barcoding and metabarcoding analyses can be strongly influenced by the methods utilized for data generation and curation, leading to varying levels of success in the separation of biological variation from experimental error. The 5′ region of cytochrome c oxidase subunit I (COI-5P) is the most common barcode gene for animals, with conserved structure and function that allows for biologically informed error identification. Here, we present coil ( https://CRAN.R-project.org/package=coil ), an R package for the pre-processing and frameshift error assessment of COI-5P animal barcode and metabarcode sequence data. The package contains functions for placement of barcodes into a common reading frame, accurate translation of sequences to amino acids, and highlighting insertion and deletion errors. The analysis of 10 000 barcode sequences of varying quality demonstrated how coil can place barcode sequences in reading frame and distinguish sequences containing indel errors from error-free sequences with greater than 97.5% accuracy. Package limitations were tested through the analysis of COI-5P sequences from the plant and fungal kingdoms as well as the analysis of potential contaminants: nuclear mitochondrial pseudogenes and Wolbachia COI-5P sequences. Results demonstrated that coil is a strong technical error identification method but is not reliable for detecting all biological contaminants.
43

Chen, Xinwen, W. J. Zhang, J. Wong, G. Chun, A. Lu, B. F. McCutchen, J. K. Presnail, et al. "Comparative analysis of the complete genome sequences of Helicoverpa zea and Helicoverpa armigera single-nucleocapsid nucleopolyhedroviruses." Journal of General Virology 83, no. 3 (March 1, 2002): 673–84. http://dx.doi.org/10.1099/0022-1317-83-3-673.

Abstract:
The complete nucleotide sequence of Helicoverpa zea single-nucleocapsid nucleopolyhedrovirus (HzSNPV) has been determined (130869 bp) and compared to the nucleotide sequence of Helicoverpa armigera (Ha) SNPV. These two genomes are very similar in their nucleotide (97% identity) and amino acid (99% identity) sequences. The coding regions are much more conserved than the non-coding regions. In HzSNPV/HaSNPV, the 63 open reading frames (ORFs) present in all baculoviruses sequenced so far are much more conserved than other ORFs. HzSNPV has four additional small ORFs compared with HaSNPV, one of these (Hz42) being in a correct transcriptional context. The major differences between HzSNPV and HaSNPV are found in the sequence and organization of the homologous regions (hrs) and the baculovirus repeat ORFs (bro genes). The sequence identity between the HzSNPV and HaSNPV hrs ranges from 90% (hr1) to almost 100% (hr5) and the hrs differ in the presence/absence of one or more type A and/or B repeats. The three HzSNPV bro genes differ significantly from those in HaSNPV and may have been acquired independently in the ancestral past. The sequence data suggest strongly that HzSNPV and HaSNPV are variants of the same virus species, a conclusion that is supported by the physical and biological data.
44

Steinke, Dirk, Miguel Vences, Walter Salzburger, and Axel Meyer. "TaxI: a software tool for DNA barcoding using distance methods." Philosophical Transactions of the Royal Society B: Biological Sciences 360, no. 1462 (September 8, 2005): 1975–80. http://dx.doi.org/10.1098/rstb.2005.1729.

Abstract:
DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses and, hence, those algorithms do not offer appropriate solutions for the rapid, but precise analyses needed for DNA barcoding, and are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments this software is also able to work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets of fish larvae and juveniles from Lake Constance and juvenile land snails under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as good as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding.
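The query-against-references comparison described here can be illustrated with an uncorrected p-distance. This is a hedged sketch rather than TaxI's implementation: it assumes each reference has already been pairwise-aligned to the query (so both strings have equal length), ignores columns with a gap in either sequence, and does not apply any of the models of sequence evolution mentioned in the abstract. The names `p_distance` and `rank_references` are hypothetical.

```python
def p_distance(a, b):
    """Proportion of differing sites between two pairwise-aligned sequences,
    ignoring columns where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def rank_references(query, references):
    """Return (name, distance) pairs sorted from closest to most divergent.

    references: dict mapping a reference name to its sequence, aligned to
    the query (an assumption of this sketch).
    """
    scored = ((name, p_distance(query, seq)) for name, seq in references.items())
    return sorted(scored, key=lambda item: item[1])
```

Scoring each reference independently keeps the cost linear in the number of references, which is the practical advantage the abstract claims for separate pairwise alignments over a single multiple alignment.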
45

Onasanya, A., M. M. Ekperigin, R. O. Onasanya, T. O. Obafemi, A. T. Ogundipe, A. A. Ojo, and I. Ingelbrecht. "DNA Sequencing Analysis of African Xanthomonas oryzae pv. oryzae Virulence Gene (AXaVrg) DNA Marker." Scientia Agriculturae Bohemica 49, no. 2 (June 1, 2018): 78–86. http://dx.doi.org/10.2478/sab-2018-0012.

Abstract:
Global rice production is constrained by bacterial leaf blight (BLB) disease caused by Xanthomonas oryzae pv. oryzae (Xoo). BLB disease incidence in West Africa was between 70–85% and yield loss in farmers’ fields was in the range of 50–90% from 2005 to 2010. In the present study, African Xoo virulence gene OPP-172000 DNA marker was identified and purified using randomly amplified polymorphic DNA polymerase chain reaction (RAPD-PCR) products from 50 Xoo isolates. Genomic DNA of 50 Xoo isolates were analyzed using OPP-17 primer in RAPD-PCR during which African Xoo virulence gene OPP-172000 DNA marker was identified, purified, cloned, and sequenced. Cloning and DNA sequencing of African Xoo virulence gene OPP-172000 DNA generated a 1953 bp nucleotide sequence consequently tagged as AXaVrg-1953. BLAST homologous analysis of the AXaVrg-1953 sequence provides comprehensive identification of the type II secretion genes and secreted proteins, type III secretion genes and secreted proteins in African Xoo virulence gene. Phylogenetic unweighted pairgroup method arithmetic (UPGMA) analysis revealed the African AXaVrg-1953 sequence was distinct from the other Xoo virulence gene sequences from China, Japan, Korea, Germany, and the United States. This information is potentially useful for effective management of BLB disease in West Africa.
46

Yang, Lina, Pu Wei, Cheng Zhong, Zuqiang Meng, Patrick Wang, and Yuan Yan Tang. "A Fractal Dimension and Empirical Mode Decomposition-Based Method for Protein Sequence Analysis." International Journal of Pattern Recognition and Artificial Intelligence 33, no. 11 (October 2019): 1940020. http://dx.doi.org/10.1142/s0218001419400202.

Abstract:
In bioinformatics, the biological functions of proteins and their interactions can often be analyzed by the similarity of their sequences. In this paper, the authors combine the fractal dimension, empirical mode decomposition (EMD), and sliding window for protein sequence comparison. First, the protein sequence is characterized and digitized into a signal, and then the signal characteristics are obtained by using EMD and fractal dimension. Each protein sequence can be decomposed into Intrinsic Mode Functions (IMFs). The fixed window’s fractal dimension is applied to each IMF and the original signal to extract the protein sequence characteristics. Experiments have shown that the feature extracted by this hybrid method is superior to the EMD method alone.
47

Yoon, Byung-Jun. "Hidden Markov Models and their Applications in Biological Sequence Analysis." Current Genomics 10, no. 6 (September 1, 2009): 402–15. http://dx.doi.org/10.2174/138920209789177575.

48

Manzoor, Umar, Sarosh Shahid, and Bassam Zafar. "A comparative analysis of multiple sequence alignments for biological data." Bio-Medical Materials and Engineering 26, s1 (August 17, 2015): S1781–S1789. http://dx.doi.org/10.3233/bme-151479.

49

Hawkins, J., and M. Boden. "The Applicability of Recurrent Neural Networks for Biological Sequence Analysis." IEEE/ACM Transactions on Computational Biology and Bioinformatics 2, no. 3 (July 2005): 243–53. http://dx.doi.org/10.1109/tcbb.2005.44.

50

Meng, Xiandong, and Vipin Chaudhary. "A High-Performance Heterogeneous Computing Platform for Biological Sequence Analysis." IEEE Transactions on Parallel and Distributed Systems 21, no. 9 (September 2010): 1267–80. http://dx.doi.org/10.1109/tpds.2009.165.
