Journal articles on the topic 'Sequenze biologiche' (biological sequences)


1

Willment, J. A., D. P. Martin, E. Van der Walt, and E. P. Rybicki. "Biological and Genomic Sequence Characterization of Maize streak virus Isolates from Wheat." Phytopathology® 92, no. 1 (January 2002): 81–86. http://dx.doi.org/10.1094/phyto.2002.92.1.81.

Abstract:
Maize streak virus (MSV) is best known as the causal agent of maize streak disease. However, only a genetically uniform subset of the viruses within this diverse species is actually capable of producing severe symptoms in maize. Whereas these “maize-type” viruses all share greater than 95% sequence identity, MSV strains isolated from grasses may share as little as 79% sequence identity with the maize-type viruses. Here, we present the complete genome sequences and biological characterization of two MSV isolates from wheat that share ≈89% sequence identity with the maize-type viruses. Clonal populations of these two isolates, named MSV-Tas and MSV-VW, were leafhopper-transmitted to Digitaria sanguinalis and a range of maize, wheat, and barley genotypes. Whereas the two viruses showed some differences in their pathogenicity in maize, they were both equally pathogenic in D. sanguinalis and the various wheat and barley genotypes tested. Phylogenetic analyses involving the genome sequences of MSV-Tas and MSV-VW, a new maize-type virus also fully sequenced in this study (MSV-VM), and all other available African streak virus sequences, indicated that MSV-Tas and MSV-VW are close relatives that together represent a distinct MSV strain. Sequence analyses revealed that MSV-VM has a recombinant genome containing MSV-Tas/VW-like sequences within its movement protein gene.
2

Mechanda, Subbaiah M., Bernard R. Baum, Douglas A. Johnson, and John T. Arnason. "Sequence assessment of comigrating AFLP™ bands in Echinacea — implications for comparative biological studies." Genome 47, no. 1 (January 1, 2004): 15–25. http://dx.doi.org/10.1139/g03-094.

Abstract:
The extent of sequence identity among clones derived from monomorphic and polymorphic AFLP™ polymorphism bands was quantified. A total of 79 fragments from a monomorphic band of 273 bp and 48 fragments from a polymorphic band of 159 bp, isolated from individuals belonging to different populations, varieties, and species of Echinacea, were cloned and sequenced. The monomorphic fragments exhibited above 90% sequence identity among clones within samples. Sequence identity within variety ranged from 82.78% to 94.87% and within species from 75.82% to 98.9% and was 57.97% in the genus. The polymorphic fragments exhibited much less sequence identity. In some instances, even two clones from the same fragment were different in their size and sequence. Within sample, clone sequence identity ranged from 100% to 51.57%, within variety from 33.33% to 100% in one variety, and from 23.66% to 45% within species and was as low as 1.25% within the genus. In addition, sequences of the same size were aligned to verify the nature of their sequence dissimilarity/similarity. Within each size group, identical sequences were found across species and varieties. In general, comigrating bands cannot be considered homologous. Thus, the use of AFLP™ band data for comparative studies is appropriate only if the results emanating from such analyses are considered as approximations and are interpreted as phenotypic but not genotypic. Key words: AFLP markers, false homologies.
3

Venkataraman, Ganesh, Zachary Shriver, Rahul Raman, and Ram Sasisekharan. "Sequencing Complex Polysaccharides." Science 286, no. 5439 (October 15, 1999): 537–42. http://dx.doi.org/10.1126/science.286.5439.537.

Abstract:
Although rapid sequencing of polynucleotides and polypeptides has become commonplace, it has not been possible to rapidly sequence femto- to picomole amounts of tissue-derived complex polysaccharides. Heparin-like glycosaminoglycans (HLGAGs) were readily sequenced by a combination of matrix-assisted laser desorption ionization mass spectrometry and a notation system for representation of polysaccharide sequences. This will enable identification of sequences that are critical to HLGAG biological activities in anticoagulation, cell growth, and differentiation.
4

Song, Bosheng, Zimeng Li, Xuan Lin, Jianmin Wang, Tian Wang, and Xiangzheng Fu. "Pretraining model for biological sequence data." Briefings in Functional Genomics 20, no. 3 (May 2021): 181–95. http://dx.doi.org/10.1093/bfgp/elab025.

Abstract:
With the development of high-throughput sequencing technology, biological sequence data reflecting life information becomes increasingly accessible. Particularly on the background of the COVID-19 pandemic, biological sequence data play an important role in detecting diseases, analyzing the mechanism and discovering specific drugs. In recent years, pretraining models that have emerged in natural language processing have attracted widespread attention in many research fields not only to decrease training cost but also to improve performance on downstream tasks. Pretraining models are used for embedding biological sequences and extracting features from large biological sequence corpora to comprehensively understand the biological sequence data. In this survey, we provide a broad review on pretraining models for biological sequence data. Moreover, we first introduce biological sequences and corresponding datasets, including brief description and accessible link. Subsequently, we systematically summarize popular pretraining models for biological sequences based on four categories: CNN, word2vec, LSTM and Transformer. Then, we present some applications with proposed pretraining models on downstream tasks to explain the role of pretraining models. Next, we provide a novel pretraining scheme for protein sequences and a multitask benchmark for protein pretraining models. Finally, we discuss the challenges and future directions in pretraining models for biological sequences.
5

Azha Javed and Muhammad Javed Iqbal. "Classification of Biological Data using Deep Learning Technique." NUML International Journal of Engineering and Computing 1, no. 1 (April 27, 2022): 13–26. http://dx.doi.org/10.52015/nijec.v1i1.10.

Abstract:
A huge amount of newly sequenced proteins is being discovered on a daily basis. The main concern is how to extract the useful characteristics of sequences as the input features for the network. These sequences are increasing exponentially over the decades. However, it is very expensive to characterize functions for biological experiments and also, it is really necessary to find the association between the information of datasets to create and improve medical tools. Recently machine learning algorithms got huge attention and are widely used. These algorithms are based on deep learning architecture and data-driven models. Previous work failed to properly address issues related to the classification of biological sequences, i.e. protein, including efficient encoding of variable length biological sequence data and implementation of deep learning based neural network models to enhance the performance of classification/recognition systems. To overcome these issues, we have proposed a deep learning based neural network architecture so that classification performance of the system can be increased. In our work, we have proposed 1D-convolution neural network which classifies the protein sequences to 10 top common classes. The model extracted features from the protein sequences labels and learned through the dataset. We have trained and evaluated our model on protein sequences downloaded from protein data bank (PDB). The model maximizes the accuracy rate up to 96%.
6

Md Isa, Mohd Nazrin, Sohiful Anuar Zainol Murad, Mohamad Imran Ahmad, Muhammad M. Ramli, and Rizalafande Che Ismail. "An Efficient Scheduling Technique for Biological Sequence Alignment." Applied Mechanics and Materials 754-755 (April 2015): 1087–92. http://dx.doi.org/10.4028/www.scientific.net/amm.754-755.1087.

Abstract:
Computing alignment matrix score to search for regions of homology between biological sequences is time consuming task. This is due to the recursive nature of the dynamic programming-based algorithms such as the Smith-Waterman and the Needleman-Wunsch algorithms. Typical FPGA-based protein sequencer comprises of two main logic blocks. One for computing alignment scores i.e. the processing element (PE), while another logic block for configuring the PE with coefficients. During alignment matrix computation, the logic block for configuring the PE are left unused until the time consuming alignment matrix computation finished. Therefore, a new technique, known as overlap computation and configuration (OCC) is proposed to minimize the time overhead for performing biological sequence alignment. The OCC technique simultaneously updating substitution matrix in a processing element (PE) systolic array, while computing alignment matrix scores. Results showed that, the sequencer achieves more than two order of magnitude speed-up higher compared to the state of the art, at negligible area overhead, if any.
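
For readers unfamiliar with the dynamic programming recurrence this abstract refers to, the Smith-Waterman local alignment score can be sketched in Python as follows. This is a plain software illustration, not the FPGA design or the OCC technique of the paper; the function name and the scoring values (match +2, mismatch -1, linear gap -1) are assumptions chosen only for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Illustrative scoring scheme (assumed, not from the cited paper):
    fixed match/mismatch scores and a linear gap penalty.
    """
    prev = [0] * (len(b) + 1)  # previous row of the alignment matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Each cell depends on its diagonal, upper, and left neighbours;
            # the floor at 0 is what makes the alignment local.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because every cell depends on three neighbouring cells, the matrix is filled in a fixed order, which is the recursive cost the abstract describes.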
7

Idris, A. M., and J. K. Brown. "Sinaloa Tomato Leaf Curl Geminivirus: Biological and Molecular Evidence for a New Subgroup III Virus." Phytopathology® 88, no. 7 (July 1998): 648–57. http://dx.doi.org/10.1094/phyto.1998.88.7.648.

Abstract:
The biological and molecular properties of Sinaloa tomato leaf curl virus (STLCV) were investigated in line with the hypothesis that STLCV is a previously uncharacterized, whitefly-transmitted geminivirus from North America. STLCV causes yellow leaf curl symptoms in tomato and yellow-green foliar mottle in pepper. Five species belonging to two plant families were STLCV experimental hosts. STLCV had a persistent relationship with its whitefly vector, Bemisia tabaci. Polymerase chain reaction fragments of STLCV common region (CR) sequences of the A or B genomic components and the viral coat protein gene (AV1) were molecularly cloned and sequenced. The STLCV A- and B-component CR sequences (174 nucleotides each) shared 97.9% identity and contained identical cis elements putatively involved in transcriptional regulation and an origin of replication (the AC cleavage site within the loop of the hairpin structure and two direct repeat sequences thought to constitute the Rep binding motif), which collectively are diagnostic for subgroup III geminiviruses. The STLCV CR sequence shared 23.1 to 77.6% identity with CR sequences of representative geminiviridae, indicating the STLCV CR sequence is unique. Molecular phylogenetic analysis of CR or AV1 sequences of STLCV and the respective sequences of 31 familial members supported the placement of STLCV as a unique bipartite, subgroup III virus most closely related to other viruses from the Western Hemisphere. STLCV is provisionally described as a new species within the genus Begomovirus, family Geminiviridae.
8

Lotrakul, Pongtharin, Rodrigo A. Valverde, and Angela D. Landry. "Biological and Molecular Properties of a Begomovirus from Dicliptera sexangularis." Phytopathology® 90, no. 7 (July 2000): 723–29. http://dx.doi.org/10.1094/phyto.2000.90.7.723.

Abstract:
Sixangle foldwing, Dicliptera sexangularis (Acanthaceae), showing severe yellow mottle and leaf distortion symptoms was collected from the shoreline of Calusa Island (Lee County, FL). The putative virus was transmitted from infected D. sexangularis to healthy seedlings by mechanical, whitefly (Bemisia tabaci biotype B), and graft-inoculations. Different forms of geminivirus-like DNAs were detected in total DNA extracted from infected plants by Southern blot hybridization analyses using DNA-A and -B of Bean golden mosaic virus (BGMV) from Guatemala as probes. Preliminary polymerase chain reaction experiments and sequence comparisons indicated that the virus was a distinct bipartite begomovirus. The virus was designated Dicliptera yellow mottle virus (DiYMV). Replicative dsDNAs of DiYMV were extracted, digested with selected restriction enzymes, and cloned into a plasmid vector. Both DNA-A and -B were sequenced and compared with those of other begomoviruses. Phylogenetic analyses using AV1, AC1, and BV1 nucleotide sequences indicated that DiYMV has a close relationship with the New World begomoviruses, especially those distributed in the nearby geographic areas of the Florida coast and the Caribbean Basin. However, different percent nucleotide sequence identities and phylogenetic relationships were detected when different open reading frames (ORFs) of DiYMV were compared with their counterparts from begomoviruses from the Caribbean Basin. Based on phylogenetic analyses of the AC1 and BV1 ORFs, DiYMV was closely related to BGMV type II isolates, whereas sequence comparisons of the common region and the AC4-derived amino acid sequences indicated its close relationship with Potato yellow mosaic virus from Venezuela.
9

Petti, Samantha, and Sean R. Eddy. "Constructing benchmark test sets for biological sequence analysis using independent set algorithms." PLOS Computational Biology 18, no. 3 (March 7, 2022): e1009492. http://dx.doi.org/10.1371/journal.pcbi.1009492.

Abstract:
Biological sequence families contain many sequences that are very similar to each other because they are related by evolution, so the strategy for splitting data into separate training and test sets is a nontrivial choice in benchmarking sequence analysis methods. A random split is insufficient because it will yield test sequences that are closely related or even identical to training sequences. Adapting ideas from independent set graph algorithms, we describe two new methods for splitting sequence data into dissimilar training and test sets. These algorithms input a sequence family and produce a split in which each test sequence is less than p% identical to any individual training sequence. These algorithms successfully split more families than a previous approach, enabling construction of more diverse benchmark datasets.
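
The constraint described in this abstract (every test sequence less than p% identical to every training sequence) can be illustrated with a simple greedy split in Python. This sketch is not either of the paper's independent-set algorithms; the function names, the position-wise identity measure on pre-aligned sequences, and the `train_fraction` parameter are all assumptions made for illustration.

```python
import random

def percent_identity(a, b):
    """Percent of identical positions between two equal-length (pre-aligned) sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

def dissimilar_split(seqs, p, train_fraction=0.7, seed=0):
    """Greedy split so that every test sequence is < p% identical to every training sequence.

    Sequences that are too similar to members of both sets are discarded.
    """
    rng = random.Random(seed)
    order = list(seqs)
    rng.shuffle(order)
    train, test = [], []
    for s in order:
        close_to_train = any(percent_identity(s, t) >= p for t in train)
        close_to_test = any(percent_identity(s, t) >= p for t in test)
        if close_to_train and close_to_test:
            continue  # placing s in either set would break the threshold
        if close_to_train:
            train.append(s)
        elif close_to_test:
            test.append(s)
        else:
            (train if rng.random() < train_fraction else test).append(s)
    return train, test
```

A purely random split offers no such guarantee, which is the problem the abstract points out. The greedy version keeps the guarantee at the cost of possibly discarding sequences.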
10

Liu, Wen-li, and Qing-biao Wu. "Analysis method and algorithm design of biological sequence problem based on generalized k-mer vector." Applied Mathematics-A Journal of Chinese Universities 36, no. 1 (March 2021): 114–27. http://dx.doi.org/10.1007/s11766-021-4033-x.

Abstract:
K-mer can be used for the description of biological sequences and k-mer distribution is a tool for solving sequences analysis problems in bioinformatics. We can use k-mer vector as a representation method of the k-mer distribution of the biological sequence. Problems, such as similarity calculations or sequence assembly, can be described in the k-mer vector space. It helps us to identify new features of an old sequence-based problem in bioinformatics and develop new algorithms using the concepts and methods from linear space theory. In this study, we defined the k-mer vector space for the generalized biological sequences. The meaning of corresponding vector operations is explained in the biological context. We presented the vector/matrix form of several widely seen sequence-based problems, including read quantification, sequence assembly, and pattern detection problem. Its advantages and disadvantages are discussed. Also, we implement a tool for the sequence assembly problem based on the concepts of k-mer vector methods. It shows the practicability and convenience of this algorithm design strategy.
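
As a rough illustration of the idea of representing a sequence by its k-mer distribution, the sketch below builds a plain k-mer count vector and compares two such vectors in Python. It does not reproduce the paper's generalized k-mer vector definition; the function names, the DNA alphabet, and the use of cosine similarity are assumptions for the example.

```python
from itertools import product

def kmer_vector(seq, k, alphabet="ACGT"):
    """Count vector over all len(alphabet)**k k-mers, in lexicographic order."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    vec = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing symbols outside the alphabet
            vec[j] += 1
    return vec

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sum(x * x for x in u) ** 0.5
    norm_v = sum(y * y for y in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Once sequences are vectors, a similarity calculation becomes an ordinary linear-algebra operation, which is the kind of reformulation the abstract describes.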
11

Xue, Linyan, Xiaoke Zhang, Fei Xie, Shuang Liu, and Peng Lin. "Frequent Patterns Algorithm of Biological Sequences based on Pattern Prefix-tree." International Journal of Computers Communications & Control 14, no. 4 (August 5, 2019): 574–89. http://dx.doi.org/10.15837/ijccc.2019.4.3607.

Abstract:
In the application of bioinformatics, the existing algorithms cannot directly and efficiently implement sequence pattern mining. Two fast and efficient biological sequence pattern mining algorithms for biological single sequence and multiple sequences are proposed in this paper. The concept of the basic pattern is proposed, and on the basis of mining frequent basic patterns, the frequent pattern is excavated by constructing prefix trees for frequent basic patterns. The proposed algorithms implement rapid mining of frequent patterns of biological sequences based on pattern prefix trees. In the experiment, the family sequence data in the Pfam protein database is used to verify the performance of the proposed algorithm. The prediction results confirm that the proposed algorithms can not only obtain the mining results with effective biological significance, but also improve the running time efficiency of the biological sequence pattern mining.
12

Rahman, Tasnim, Hasnain Heickal, Shamira Tabrejee, Md Miraj Kobad Chowdhury, Sheikh Muhammad Sarwar, and Mohammad Shoyaib. "SeqDev: An Algorithm for Constructing Genetic Elements Using Comparative Assembly." Plant Tissue Culture and Biotechnology 26, no. 1 (September 27, 2016): 105–21. http://dx.doi.org/10.3329/ptcb.v26i1.29772.

Abstract:
With the availability of recent next generation sequencing technologies and their low cost, genomes of different organisms are being sequenced frequently. Therefore, quick assembly of genome, transcriptome, and target contigs from the raw data generated through the sequencing technologies has become necessary for better understanding of different biological systems. This article proposes an algorithm, namely SeqDev (Sequence Developer) for constructing contigs from raw reads using reference sequences. For this, we considered a weighted frequency-based consensus mechanism named BlastAssemb for primary construction of a sequence with gaps. Then, we adopted suffix array and proposed a gap filling search (GFS) algorithm for searching the missing sequences in the primary construct. For evaluating our algorithm, we have chosen Pokkali (rice) raw genome and Japonica (rice) as our reference data. Experimental results demonstrated that our proposed algorithm accurately constructs promoter sequences of Pokkali from its raw genome data. These constructed promoter sequences were 93–100% identical with the reference and also aligned with 96–100% of corresponding reference sequences with eValue ranging from 0.0 to 2e-14. All these results indicated that our proposed method could be a potential algorithm to construct target contigs from raw sequences with the help of reference sequences. Further wet lab validation with specific Pokkali promoter sequence will boost this method as a robust algorithm for target contig assembly. Plant Tissue Cult. & Biotech. 26(1): 105-121, 2016 (June)
13

Gomez, John, and Rafael Jimenez. "Sequence, a BioJS component for visualising sequences." F1000Research 3 (February 13, 2014): 52. http://dx.doi.org/10.12688/f1000research.3-52.v1.

Abstract:
Summary: Sequences are probably the most common piece of information in sites providing biological data resources, particularly those related to genes and proteins. Multiple visual representations of the same sequence can be found across those sites. This can lead to an inconsistency compromising both the user experience and usability while working with graphical representations of a sequence. Furthermore, the code of the visualisation module is commonly embedded and merged with the rest of the application, making it difficult to reuse it in other applications. In this paper, we present a BioJS component for visualising sequences with a set of options supporting a flexible configuration of the visual representation, such as formats, colours, annotations, and columns, among others. This component aims to facilitate a common representation across different sites, making it easier for end users to move from one site to another. Availability: http://www.ebi.ac.uk/Tools/biojs; http://dx.doi.org/10.5281/zenodo.8299
14

Hart, Reece K., and Andreas Prlić. "SeqRepo: A system for managing local collections of biological sequences." PLOS ONE 15, no. 12 (December 3, 2020): e0239883. http://dx.doi.org/10.1371/journal.pone.0239883.

Abstract:
Motivation: Access to biological sequence data, such as genome, transcript, or protein sequence, is at the core of many bioinformatics analysis workflows. The National Center for Biotechnology Information (NCBI), Ensembl, and other sequence database maintainers provide methods to access sequences through network connections. For many users, the convenience and currency of remotely managed data are compelling, and the network latency is non-consequential. However, for high-throughput and clinical applications, local sequence collections are essential for performance, stability, privacy, and reproducibility. Results: Here we describe SeqRepo, a novel system for building a local, high-performance, non-redundant collection of biological sequences. SeqRepo enables clients to use primary database identifiers and several digests to identify sequences and sequence aliases. SeqRepo provides a native Python interface and a REST interface, which can run locally and enables access from other programming languages. SeqRepo also provides an alternative REST interface based on the GA4GH refget protocol. SeqRepo provides fast random access to sequence slices. We provide results that demonstrate that a local SeqRepo sequence collection yields significant performance benefits of up to 1300-fold over remote sequence collections. In our use case for a variant validation and normalization pipeline, SeqRepo improved throughput 50-fold relative to use with remote sequences. SeqRepo may be used with any species or sequence type. Regular snapshots of Human sequence collections are available. It is often convenient or necessary to use a computed digest as a sequence identifier. For example, a digest-based identifier may be used to refer to proprietary reference genomes or segments of a graph genome, for which conventional identifiers will not be available. Here we also introduce a convention for the application of the SHA-512 hashing algorithm with Base64 encoding to generate URL-safe identifiers. This convention, sha512t24u, combines a fast digest mechanism with a space-efficient representation that can be used for any object. Our report includes an analysis of timing and collision probabilities for sha512t24u. SeqRepo enables clients to use sha512t24u as identifiers, thereby seamlessly integrating public and private sequence sets. Availability: SeqRepo is released under the Apache License 2.0 and is available on GitHub and PyPI. Docker images and database snapshots are also available. See https://github.com/biocommons/biocommons.seqrepo.
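
The digest convention mentioned in this abstract can be sketched with the Python standard library. The abstract itself states only that SHA-512 is combined with URL-safe Base64; reading the name sha512t24u as "truncate to 24 bytes, URL-safe encoding" is an assumption made here, and the function below is an illustration rather than SeqRepo's own code.

```python
import base64
import hashlib

def sha512t24u(data: bytes) -> str:
    """URL-safe Base64 of the first 24 bytes of the SHA-512 digest of data.

    The 24-byte truncation is assumed from the convention's name.
    """
    digest = hashlib.sha512(data).digest()
    # 24 bytes encode to exactly 32 Base64 characters, so no '=' padding appears.
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")
```

Because the URL-safe alphabet replaces `+` and `/` with `-` and `_`, the resulting 32-character string can be placed in a URL path without escaping.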
15

Wang, Qian, Darryl N. Davis, and Jiadong Ren. "Mining frequent biological sequences based on bitmap without candidate sequence generation." Computers in Biology and Medicine 69 (February 2016): 152–57. http://dx.doi.org/10.1016/j.compbiomed.2015.12.016.

16

Wang, Zhan Bin, Hong Yun Xu, De Hai Li, and Jing Jie Wang. "The Biological Characteristics and its Sequence Analysis of Pholiota adiposa." Advanced Materials Research 518-523 (May 2012): 5371–75. http://dx.doi.org/10.4028/www.scientific.net/amr.518-523.5371.

Abstract:
In this paper, the biological characteristics of Pholiota adiposa were systematically studied. The results showed that the ideal temperature range for growth is from 20 °C to 25°C, with optimal temperature at 25°C; the optimal light condition is full darkness; the ideal pH range for growth is from 5 to 9, with optimal pH at 6; the preferred carbon source is sucrose, followed by glucose; the preferred nitrogen source is potassium nitrate, glutamic acid. The internal transcribed spacer region (ITS) was sequenced to determine whether the DNA sequence data supported the experimental result. The phylogenetic tree for the 19 pieces of homologous sequences were analyzed, with the highest homology reaching 99%.
17

Liu, Peng Fei, Shou Bin Dong, Yi Cheng Cao, and Zheng Ping Du. "Distributed Biological Datasets Index Model." Advanced Materials Research 143-144 (October 2010): 599–603. http://dx.doi.org/10.4028/www.scientific.net/amr.143-144.599.

Abstract:
DBioSearch is a distributed model optimized for indexing next-generation exploding sequence data including the human genome and other biological sequence data. It is modeled in view of the distributed programming and text search engine Lucene, and reports good indexing efficiency for each kind of biological datasets. Indexing on large volume of sequences could be time consuming, but DBioSearch uses the distributed programming skill to parallelize indexing process that can take advantage of multiple computing nodes.
18

Bitard-Feildel, Tristan. "Navigating the amino acid sequence space between functional proteins using a deep learning framework." PeerJ Computer Science 7 (September 17, 2021): e684. http://dx.doi.org/10.7717/peerj-cs.684.

Abstract:
Motivation: Shedding light on the relationships between protein sequences and functions is a challenging task with many implications in protein evolution, diseases understanding, and protein design. The protein sequence space mapping to specific functions is however hard to comprehend due to its complexity. Generative models help to decipher complex systems thanks to their abilities to learn and recreate data specificity. Applied to proteins, they can capture the sequence patterns associated with functions and point out important relationships between sequence positions. By learning these dependencies between sequences and functions, they can ultimately be used to generate new sequences and navigate through uncharted area of molecular evolution. Results: This study presents an Adversarial Auto-Encoder (AAE) approach, an unsupervised generative model, to generate new protein sequences. AAEs are tested on three protein families known for their multiple functions: the sulfatase, the HUP and the TPP families. Clustering results on the encoded sequences from the latent space computed by AAEs display high level of homogeneity regarding the protein sequence functions. The study also reports and analyzes for the first time two sampling strategies based on latent space interpolation and latent space arithmetic to generate intermediate protein sequences sharing sequential properties of original sequences linked to known functional properties issued from different families and functions. Generated sequences by interpolation between latent space data points demonstrate the ability of the AAE to generalize and produce meaningful biological sequences from an evolutionary uncharted area of the biological sequence space. Finally, 3D structure models computed by comparative modelling using generated sequences and templates of different sub-families point to the ability of the latent space arithmetic to successfully transfer protein sequence properties linked to function between different sub-families. All in all this study confirms the ability of deep learning frameworks to model biological complexity and bring new tools to explore amino acid sequence and functional spaces.
19

Zhang, Nian-Zhang, Ying Xu, Si-Yang Huang, Dong-Hui Zhou, Rui-Ai Wang, and Xing-Quan Zhu. "Sequence Variation in Toxoplasma gondii rop17 Gene among Strains from Different Hosts and Geographical Locations." Scientific World Journal 2014 (2014): 1–4. http://dx.doi.org/10.1155/2014/349325.

Abstract:
Genetic diversity of T. gondii is a concern of many studies, due to the biological and epidemiological diversity of this parasite. The present study examined sequence variation in rhoptry protein 17 (ROP17) gene among T. gondii isolates from different hosts and geographical regions. The rop17 gene was amplified and sequenced from 10 T. gondii strains, and phylogenetic relationship among these T. gondii strains was reconstructed using maximum parsimony (MP), neighbor-joining (NJ), and maximum likelihood (ML) analyses. The partial rop17 gene sequences were 1375 bp in length and A+T contents varied from 49.45% to 50.11% among all examined T. gondii strains. Sequence analysis identified 33 variable nucleotide positions (2.1%), 16 of which were identified as transitions. Phylogeny reconstruction based on rop17 gene data revealed two major clusters which could readily distinguish Type I and Type II strains. Analyses of sequence variations in nucleotides and amino acids among these strains revealed high ratio of nonsynonymous to synonymous polymorphisms (>1), indicating that rop17 shows signs of positive selection. This study demonstrated the existence of slightly high sequence variability in the rop17 gene sequences among T. gondii strains from different hosts and geographical regions, suggesting that rop17 gene may represent a new genetic marker for population genetic studies of T. gondii isolates.
20

Komínek, P., and M. Komínková. "Genetic and biological characterisation of a Grapevine virus A isolate from the Czech Republic." Plant Protection Science 44, no. 4 (January 10, 2009): 121–26. http://dx.doi.org/10.17221/24/2008-pps.

Abstract:
An isolate of Grapevine virus A (GVA) from the Czech Republic was obtained from the grapevine cultivar Müller Thurgau. Symptoms of GVA – Kober stem grooving disease were not observed in the infected grapevines (which had been grafted onto Kober 5BB rootstock). A partial genomic sequence of the GVA isolate, 1523 nucleotides long, was obtained. The sequence completely covers the genes for both a movement and a coat protein. Compared to the GVA sequences available in databases, the nucleotide identity reached 84%. The amino acid identity in the movement protein reached 88%, and 98% in the coat protein.
21

Beckstette, Michael, Jens T. Mailänder, Richard J. Marhöfer, Alexander Sczyrba, Enno Ohlebusch, Robert Giegerich, and Paul M. Selzer. "Genlight: Interactive high-throughput sequence analysis and comparative genomics." Journal of Integrative Bioinformatics 1, no. 1 (December 1, 2004): 90–107. http://dx.doi.org/10.1515/jib-2004-8.

Abstract:
With rising numbers of fully sequenced genomes the importance of comparative genomics is constantly increasing. Although several software systems for genome comparison analyses do exist, their functionality and flexibility is still limited, compared to the manifold possible applications. Therefore, we developed Genlight (http://piranha.techfak.uni-bielefeld.de), a Client/Server based program suite for large scale sequence analysis and comparative genomics. Genlight uses the object relational database system PostgreSQL together with a state of the art data representation and a distributed execution approach for large scale analysis tasks. The system includes a wide variety of comparison and sequence manipulation methods and supports the management of nucleotide sequences as well as protein sequences. The comparison methods are complemented by a large variety of visualization methods for the assessment of the generated results. In order to demonstrate the suitability of the system for the treatment of biological questions, Genlight was used to identify potential drug and vaccine targets of the pathogen Helicobacter pylori.
22

Koonin, Eugene V. "The meaning of biological information." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374, no. 2063 (March 13, 2016): 20150065. http://dx.doi.org/10.1098/rsta.2015.0065.

Abstract:
Biological information encoded in genomes is fundamentally different from and effectively orthogonal to Shannon entropy. The biologically relevant concept of information has to do with ‘meaning’, i.e. encoding various biological functions with various degree of evolutionary conservation. Apart from direct experimentation, the meaning, or biological information content, can be extracted and quantified from alignments of homologous nucleotide or amino acid sequences but generally not from a single sequence, using appropriately modified information theoretical formulae. For short, information encoded in genomes is defined vertically but not horizontally. Informally but substantially, biological information density seems to be equivalent to ‘meaning’ of genomic sequences that spans the entire range from sharply defined, universal meaning to effective meaninglessness. Large fractions of genomes, up to 90% in some plants, belong within the domain of fuzzy meaning. The sequences with fuzzy meaning can be recruited for various functions, with the meaning subsequently fixed, and also could perform generic functional roles that do not require sequence conservation. Biological meaning is continuously transferred between the genomes of selfish elements and hosts in the process of their coevolution. Thus, in order to adequately describe genome function and evolution, the concepts of information theory have to be adapted to incorporate the notion of meaning that is central to biology.
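The idea that information is read "vertically" from an alignment rather than from a single sequence can be illustrated with a minimal sketch. It uses the standard per-column information content (maximum entropy minus Shannon entropy, as in sequence logos), which is an assumption chosen for illustration and not the modified formulae of the article; the function name `column_information` is hypothetical.

```python
import math
from collections import Counter


def column_information(alignment, alphabet_size=4):
    """Per-column information in bits: log2(alphabet_size) minus Shannon entropy.

    `alignment` is a list of equal-length aligned sequences; '-' marks a gap.
    Columns that contain only gaps are reported as 0.0 bits.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    max_bits = math.log2(alphabet_size)
    info = []
    for i in range(length):
        # Collect the residues in this column, ignoring gaps.
        column = [seq[i] for seq in alignment if seq[i] != "-"]
        if not column:
            info.append(0.0)
            continue
        total = len(column)
        counts = Counter(column)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(max_bits - entropy)
    return info


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "ACCC", "ACTG"]
    for position, bits in enumerate(column_information(toy_alignment)):
        print(position, round(bits, 3))
```

A fully conserved column scores the maximum (2 bits for nucleotides), while a column in which all four bases are equally frequent scores 0, which is one simple way to see a range from sharply defined to effectively absent meaning.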
23

Hanif, Waqar, Hijab Fatima, Muhammad Qasim, Rana Muhammad Atif, and Muhammad Rizwan Javed. "SeqDown: An Efficient Sequence Retrieval Software and Comparative Sequence Retrieval Analysis." Current Trends in OMICS 1, no. 1 (August 2, 2021): 18–29. http://dx.doi.org/10.32350/cto.11.03.

Abstract:
For any sequence analysis procedure, one or more sequences must be retrieved, stored, and organized. One of the most common public databases used for biological sequence retrieval is GenBank, a comprehensive public database of nucleotide sequences. However, as the length of the sequence to be retrieved increases (for example a chromosome, an entire genome, or a scaffold), the time needed to download the file grows because of limited bandwidth [8]. In most cases, during sequence analysis, the researcher requires the messenger RNA (mRNA), RNA, DNA, and protein sequences of the same sequence of interest, and finding and retrieving these sequence files consumes a substantial amount of the researcher's time. Access to GenBank through Java HTTPS protocols is established to request and receive the sequence files associated with the input accessions. SeqDown was shown to be much more efficient in terms of sequence retrieval time than internet browsers and was found to be 15.27% faster than Mozilla Firefox. SeqDown also provides the feature to retrieve coding DNA sequences and protein sequences present in a single chromosome. Most biological databases do not name their sequence files properly, so the user has to deal with redundantly named sequence files, which leads to incorrect and time-consuming analysis; this problem can be solved with SeqDown. SeqDown is available as free-to-download software at https://bit.ly/3cUwchz
24

Maina, Solomon, Brenda A. Coutts, Owain R. Edwards, Luis de Almeida, Abel Ximenes, and Roger A. C. Jones. "Papaya ringspot virus Populations From East Timorese and Northern Australian Cucurbit Crops: Biological and Molecular Properties, and Absence of Genetic Connectivity." Plant Disease 101, no. 6 (June 2017): 985–93. http://dx.doi.org/10.1094/pdis-10-16-1499-re.

Abstract:
To examine possible genetic connectivity between crop viruses found in Southeast Asia and Australia, Papaya ringspot virus biotype W (PRSV-W) isolates from cucurbits growing in East Timor and northern Australia were studied. East Timorese samples from cucumber (Cucumis sativus) or pumpkin (Cucurbita moschata and C. maxima) were sent to Australia on FTA cards. These samples and others of pumpkin, rockmelon, honeydew melon (Cucumis melo), or watermelon (Citrullus lanatus) growing in one location each in northwest, north, or northeast Australia were subjected to high throughput sequencing (HTS). When the 17 complete PRSV genomic sequences obtained by HTS were compared with 32 others from GenBank, the five from East Timor were in a different major phylogroup from the 12 Australian sequences. Moreover, the East Timorese and Australian sequences each formed their own minor phylogroups named VI and I, respectively. A Taiwanese sequence was closest to the East Timorese (89.6% nt identity), and Mexican and Brazilian sequences were the closest to the Australian (92.3% nt identity). When coat protein gene (CP) sequences from the 17 new genomic sequences were compared with 126 others from GenBank, three Australian isolates sequenced more than 20 years ago grouped with the new Australian sequences, while the closest sequence to the East Timorese was from Thailand (93.1% nt identity). Recombination analysis revealed 13 recombination events among the 49 complete genomes. Two isolates from East Timor (TM50, TM32) and eight from GenBank were recombinants, but all 12 Australian isolates were non-recombinants. No evidence of genome connectivity between Australian and Southeast Asian PRSV populations was obtained. The strand-specific RNA library approach used optimized data collection for virus genome assembly.
When an Australian PRSV isolate was inoculated to plants of zucchini (Cucurbita pepo), watermelon, rockmelon, and honeydew melon, they all developed systemic foliage symptoms characteristic of PRSV-W, but symptom severity varied among melon cultivars.
25

Pujari, Jeevana Jyothi, and Karteeka Pavan Kanadam. "Semi Global Pairwise Sequence Alignment Using New Chromosome Structure Genetic Algorithm." Ingénierie des systèmes d’information 27, no. 1 (February 28, 2022): 67–74. http://dx.doi.org/10.18280/isi.270108.

Abstract:
Biological sequence alignment is a prominent task in the analysis of biological data. This paper proposes a pairwise semi-global sequence alignment technique using a New Chromosome Structure based Genetic Algorithm (NCSGA), which aligns sequences by automatically detecting the optimal number of gaps and their positions in order to find the optimal score for DNA or protein sequences. The experiments are conducted using simulated real datasets from NCBI. The proposed method can be tested on real data sets of nucleotide sequence pairs. The computational results show that NCSGA produces near-optimal solutions for semi-global alignment compared to other existing approaches.
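For orientation, the semi-global alignment score that such a method tries to approach can be computed exactly with classical dynamic programming. The sketch below is that standard dynamic-programming baseline, not the NCSGA genetic algorithm of the paper; the function name and the scoring values (match 1, mismatch -1, gap -2) are assumptions chosen for illustration, and "semi-global" is taken here to mean that leading and trailing gaps in either sequence are free.

```python
def semiglobal_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best semi-global alignment score of sequences a and b.

    Leading and trailing gaps in either sequence cost nothing;
    internal gaps cost `gap` per position.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # The first row and first column stay 0, so leading gaps are free.
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(
                dp[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                dp[i - 1][j] + gap,               # gap in b
                dp[i][j - 1] + gap,               # gap in a
            )
    # Trailing gaps are free: take the best cell in the last row or last column.
    best_in_last_row = max(dp[-1])
    best_in_last_column = max(row[-1] for row in dp)
    return max(best_in_last_row, best_in_last_column)


if __name__ == "__main__":
    print(semiglobal_score("CGT", "AACGTTT"))
```

Because end gaps are free, a short sequence that occurs inside a longer one scores as a perfect match, which is the behaviour that distinguishes semi-global from global alignment.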
26

Hanage, William P., Christophe Fraser, and Brian G. Spratt. "Sequences, sequence clusters and bacterial species." Philosophical Transactions of the Royal Society B: Biological Sciences 361, no. 1475 (October 6, 2006): 1917–27. http://dx.doi.org/10.1098/rstb.2006.1917.

Abstract:
Whatever else they should share, strains of bacteria assigned to the same species should have house-keeping genes that are similar in sequence. Single gene sequences (or rRNA gene sequences) have very few informative sites to resolve the strains of closely related species, and relationships among similar species may be confounded by interspecies recombination. A more promising approach (multilocus sequence analysis, MLSA) is to concatenate the sequences of multiple house-keeping loci and to observe the patterns of clustering among large populations of strains of closely related named bacterial species. Recent studies have shown that large populations can be resolved into non-overlapping sequence clusters that agree well with species assigned by the standard microbiological methods. The use of clustering patterns to inform the division of closely related populations into species has many advantages for poorly studied bacteria (or to re-evaluate well-studied species), as it provides a way of recognizing natural discontinuities in the distribution of similar genotypes. Clustering patterns can be used by expert groups as the basis of a pragmatic approach to assigning species, taking into account whatever additional data are available (e.g. similarities in ecology, phenotype and gene content). The development of large MLSA Internet databases provides the ability to assign new strains to previously defined species clusters and an electronic taxonomy. The advantages and problems in using sequence clusters as the basis of species assignments are discussed.
27

Adkins, Scott, Tom D’Elia, Kornelia Fillmer, Patchara Pongam, and Carlye A. Baker. "Biological and Genomic Characterization of a Novel Tobamovirus Infecting Hoya spp." Plant Disease 102, no. 12 (December 2018): 2571–77. http://dx.doi.org/10.1094/pdis-04-18-0667-re.

Abstract:
Foliar symptoms suggestive of virus infection were observed on the ornamental plant hoya (Hoya spp.; commonly known as waxflower) in Florida. An agent that reacted with commercially available tobamovirus detection reagents was mechanically transmitted to Chenopodium quinoa and Nicotiana benthamiana. Rod-shaped particles ∼300 nm in length and typical of tobamoviruses were observed in partially purified virion preparations by electron microscopy. An experimental host range was determined by mechanical inoculation with virions, and systemic infections were observed in plants in the Asclepiadaceae, Apocynaceae, and Solanaceae families. Some species in the Solanaceae and Chenopodiaceae families allowed virus replication only in inoculated leaves, and were thus only local hosts for the virus. Tested plants in the Amaranthaceae, Apiaceae, Brassicaceae, Cucurbitaceae, Fabaceae, and Malvaceae did not support either local or systemic virus infection. The complete genome for the virus was sequenced and shown to have a typical tobamovirus organization. Comparisons of genome nucleotide sequence and individual gene deduced amino acid sequences indicate that it is a novel tobamovirus sharing the highest level of sequence identity with Streptocarpus flower break virus and members of the Brassicaceae-infecting subgroup of tobamoviruses. The virus, for which the name Hoya chlorotic spot virus (HoCSV) is proposed, was detected in multiple hoya plants from different locations in Florida.
28

Harrison, Robert L., and Bryony C. Bonning. "The nucleopolyhedroviruses of Rachiplusia ou and Anagrapha falcifera are isolates of the same virus." Journal of General Virology 80, no. 10 (October 1, 1999): 2793–98. http://dx.doi.org/10.1099/0022-1317-80-10-2793.

Abstract:
The 7·8 kb EcoRI-G fragment of Rachiplusia ou multicapsid nucleopolyhedrovirus (RoMNPV), containing the polyhedrin gene, was cloned and sequenced. The sequence of the fragment was 92·3% identical to the sequence of the corresponding region in the Autographa californica (Ac)MNPV genome. A comparison of the EcoRI-G sequence with other MNPV sequences revealed that RoMNPV was most closely related to AcMNPV. However, the predicted amino acid sequence of RoMNPV polyhedrin shared more sequence identity with the polyhedrin of Orgyia pseudotsugata MNPV. In addition, the RoMNPV sequence was almost completely identical (99·9%) to a previously published 6·3 kb sequence of Anagrapha falcifera MNPV (AfMNPV). The EcoRI and HindIII restriction fragment profiles of RoMNPV and AfMNPV also were nearly identical, with an additional EcoRI band detected in RoMNPV DNA. Bioassays of these viruses with three different hosts (the European corn borer, Ostrinia nubilalis Hübner, the corn earworm, Helicoverpa zea Boddie, and the tobacco budworm, Heliothis virescens Fabricius) failed to detect any differences in the biological activities of RoMNPV and AfMNPV. These results indicate that RoMNPV and AfMNPV are different isolates of the same virus. The taxonomic relationship of Ro/AfMNPV and AcMNPV is discussed.
29

Mullan, L. J. "Biological sequence databases." Briefings in Bioinformatics 4, no. 1 (January 1, 2003): 75–77. http://dx.doi.org/10.1093/bib/4.1.75.

30

Borges, Júlio C., Maria C. Peroto, and Carlos H. I. Ramos. "Molecular chaperone genes in the sugarcane expressed sequence database (SUCEST)." Genetics and Molecular Biology 24, no. 1-4 (December 2001): 85–92. http://dx.doi.org/10.1590/s1415-47572001000100013.

Abstract:
Some newly synthesized proteins require the assistance of molecular chaperones for their correct folding. Chaperones are also involved in the dissolution of protein aggregates, making their study significant for both biotechnology and medicine, and the identification of chaperones and stress-related protein sequences in different organisms is an important task. We used bioinformatic tools to investigate the information generated by the Sugarcane Expressed Sequence Tag (SUCEST) genome project in order to identify and annotate molecular chaperones. We considered that the SUCEST sequences belonged to this category of proteins when their E-values were lower than 1.0e-05. Our annotation shows that 4,164 of the 5’ expressed sequence tag (EST) sequences were homologous to molecular chaperones, nearly 1.8% of all the 5’ ESTs sequenced during the SUCEST project. About 43% of the chaperones which we found were Hsp70 chaperones and their co-chaperones, 10% were Hsp90 chaperones and 13% were peptidyl-prolyl cis, trans isomerase. Based on the annotation results we predicted 156 different chaperone gene subclasses in the sugarcane genome. Taken together, our results indicate that genes which encode chaperones were diverse and abundantly expressed in sugarcane cells, which emphasizes their biological importance.
31

Allison, L., L. Stern, T. Edgoose, and T. I. Dix. "Sequence complexity for biological sequence analysis." Computers & Chemistry 24, no. 1 (January 2000): 43–55. http://dx.doi.org/10.1016/s0097-8485(00)80006-6.

32

Rascoe, J., M. Berg, U. Melcher, F. L. Mitchell, B. D. Bruton, S. D. Pair, and J. Fletcher. "Identification, Phylogenetic Analysis, and Biological Characterization of Serratia marcescens Strains Causing Cucurbit Yellow Vine Disease." Phytopathology® 93, no. 10 (October 2003): 1233–39. http://dx.doi.org/10.1094/phyto.2003.93.10.1233.

Abstract:
A serious vine decline of cucurbits known as cucurbit yellow vine disease (CYVD) is caused by rod-shaped bacteria that colonize the phloem elements. Sequence analysis of a CYVD-specific polymerase chain reaction (PCR)-amplified 16S rDNA product showed the microbe to be a γ-proteobacterium related to the genus Serratia. To identify and characterize the bacteria, one strain each from watermelon and zucchini and several noncucurbit-derived reference strains were subjected to sequence analysis and biological function assays. Taxonomic and phylogenetic placement was investigated by analysis of the groE and 16S rDNA regions, which were amplified by PCR and directly sequenced. For comparison, eight other bacterial strains identified by others as Serratia spp. also were sequenced. These sequences clearly identified the CYVD strains as Serratia marcescens. However, evaluation of metabolic and biochemical features revealed that cucurbit-derived strains of S. marcescens differ substantially from strains of the same species isolated from other environmental niches. Cucurbit strains formed a distinct cluster, separate from other strains, when their fatty acid methyl ester profiles were analyzed. In substrate utilization assays (BIOLOG, Vitek, and API 20E), the CYVD strains lacked a number of metabolic functions characteristic for S. marcescens, failing to catabolize 25 to 30 compounds that were utilized by S. marcescens reference strains. These biological differences may reflect gene loss or repression that occurred as the bacterium adapted to life as an intracellular parasite and plant pathogen.
33

Wu, Mi, Zhifei Zhang, Xin Su, Haipeng Lu, Xuesong Li, Chunxiu Yuan, Qinfang Liu, Qiaoyang Teng, Letu Geri, and Zejun Li. "Biological Characteristics of Infectious Laryngotracheitis Viruses Isolated in China." Viruses 14, no. 6 (May 31, 2022): 1200. http://dx.doi.org/10.3390/v14061200.

Abstract:
Infectious laryngotracheitis virus (ILTV) causes severe respiratory disease in chickens and results in huge economic losses in the poultry industry worldwide. To correlate the genomic differences with the replication and pathogenicity phenotypes, three ILTVs isolated from chickens in China from 2016 to 2018 were sequenced by high-throughput sequencing. Based on the entire genome, the isolates GD2018 and SH2017 shared 99.9% nucleotide homology, while the isolate SH2016 shared 99.7% nucleotide homology with both GD2018 and SH2017. Each virus genome contained 82 ORFs encoding 77 kinds of protein, 31 of which share the same amino acid sequence in the three viruses. GD2018 and SH2017 shared 57 proteins with the same amino acid sequence, while SH2016 shared 42 and 41 proteins with the amino acid sequences of GD2018 and SH2017, respectively. SH2016 propagated efficiently in allantoic fluid and on chorioallantoic membranes (CAMs) of SPF chicken embryo eggs, while GD2018 and SH2017 proliferated well only on CAMs. GD2018 propagated most efficiently on CAMs and LMH cells among the three isolates. SH2016 caused serious clinical symptoms, while GD2018 and SH2017 caused mild and moderate clinical symptoms in chickens, although the sera of the chickens infected with those three isolates were all positive for anti-ILTV antibody at 14 and 21 days after challenge. Three ILTVs with high genetic homology showed significant differences in replication in different culture systems and in pathogenicity in chickens, providing basic materials for studying the key determinants of pathogenicity of ILTV.
34

Lee, Michael S. Y. "The molecularisation of taxonomy." Invertebrate Systematics 18, no. 1 (2004): 1. http://dx.doi.org/10.1071/is03021.

Abstract:
The recent proposal for a new system of biological taxonomy based primarily on DNA sequences from one or a few chosen ('standard') genes sequenced across all taxa appears inadvisable for both practical and theoretical reasons. While nucleotide sequences are more objective than traditional (e.g. morphological) data in some respects (character choice, character delineation, character state identity), in other respects both are inherently subjective (homology/alignment, divergence metrics). Sequence divergence in standard gene(s) is an extremely crude method for determining species limits; more appropriate markers (potentially directly linked to species criteria such as reproductive isolation) should be and often are used. It is thus worth persisting with the plurality of genetic, anatomical and ethological criteria currently used to hypothesise ('identify') and test species boundaries. However, once species boundaries have been thus discerned, use of sequences from standard genes to diagnose those boundaries (and place individuals with respect to those boundaries) is highly feasible, though subject to error like any single type of marker. In many cases this approach might have advantages over morphological diagnoses. However, unless an appropriate taxonomic framework constructed using all appropriate biological information is already in place, such molecular diagnoses will be premature.
35

Liu, Huiqing, and Limsoon Wong. "Data Mining Tools for Biological Sequences." Journal of Bioinformatics and Computational Biology 01, no. 01 (April 2003): 139–67. http://dx.doi.org/10.1142/s0219720003000216.

Abstract:
We describe a methodology, as well as some related data mining tools, for analyzing sequence data. The methodology comprises three steps: (a) generating candidate features from the sequences, (b) selecting relevant features from the candidates, and (c) integrating the selected features to build a system to recognize specific properties in sequence data. We also give relevant techniques for each of these three steps. For generating candidate features, we present various types of features based on the idea of k-grams. For selecting relevant features, we discuss signal-to-noise, t-statistics, and entropy measures, as well as a correlation-based feature selection method. For integrating selected features, we use machine learning methods, including C4.5, SVM, and Naive Bayes. We illustrate this methodology on the problem of recognizing translation initiation sites. We discuss how to generate and select features that are useful for understanding the distinction between ATG sites that are translation initiation sites and those that are not. We also discuss how to use such features to build reliable systems for recognizing translation initiation sites in DNA sequences.
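Step (a) of this methodology, generating candidate features from k-grams around ATG sites, might look roughly like the following. This is a minimal sketch under stated assumptions, not the authors' tool: the function names, the flank length, and the choice of simple k-gram counts upstream and downstream of each ATG are all hypothetical.

```python
from collections import Counter


def kgram_counts(seq, k):
    """Count every overlapping k-gram (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def atg_site_features(seq, k=2, flank=6):
    """Build candidate features for every ATG occurrence in seq.

    For each ATG, count the k-grams in up to `flank` bases upstream and
    downstream of the codon. Windows are truncated at the sequence ends.
    """
    features = []
    start = seq.find("ATG")
    while start != -1:
        upstream = seq[max(0, start - flank):start]
        downstream = seq[start + 3:start + 3 + flank]
        features.append({
            "position": start,
            "upstream": kgram_counts(upstream, k),
            "downstream": kgram_counts(downstream, k),
        })
        # Continue the search one base further so overlapping hits are not missed.
        start = seq.find("ATG", start + 1)
    return features


if __name__ == "__main__":
    for site in atg_site_features("CCATGGCATGA", k=2, flank=3):
        print(site["position"], dict(site["upstream"]), dict(site["downstream"]))
```

The resulting count vectors would then be passed to the feature selection and classification steps (b) and (c) that the abstract describes.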
36

Davydov, Vladimir V., Sergey V. Zhavoronok, Tatyana V. Znovets, Vladimir M. Tsyrkunov, Andrei S. Babenka, Svetlana I. Marchuk, Elena L. Gasich, et al. "Molecular epidemiological study of clinical cases of acute hepatitis E in Belarus." Journal of microbiology, epidemiology and immunobiology 99, no. 6 (January 10, 2023): 625–36. http://dx.doi.org/10.36233/0372-9311-328.

Abstract:
Relevance. The frequency of occurrence of anamnestic antibodies to the hepatitis E virus (HEV) in the general population of the Republic of Belarus is 7.3%, which is clearly not consistent with the low incidence of hepatitis E (HE). Most primary HEV infections remain undiagnosed. The intensive epidemic process of HEV in the Belarusian population is hidden. Conducting epidemiological studies, including genotyping of HEV sequences isolated on the territory of the republic, makes it possible to more accurately characterize the sources of HEV infection and the mechanisms of its transmission. Aim. Molecular epidemiological study of two cases of acute hepatitis E detected in patients from Belarus. Materials and methods. During 2021–2022, samples of biological material were obtained from two patients undergoing treatment with an established diagnosis of acute hepatitis E. Serum samples were tested to detect antibodies to HEV using enzyme immunoassay; HEV RNA was detected in fecal samples using nested RT-PCR. The nucleotide sequence was determined by an automatic sequencer using the Sanger method. Analysis of nucleotide sequences, their genotyping, and calculation of evolutionary distances were performed using MEGA X software. Results. The HEV sequence isolated from a pregnant woman who had an epidemiological episode of alimentary contact with raw pork meat clusters in a common phylogenetic clade with an HEV sequence obtained from a patient from Belarus with a history of kidney transplantation and with HEV sequences isolated from domestic pigs. The HEV sequence isolated from a patient with a history of travel to Pakistan belongs to HEV genotype 1 and joins a clade of HEV sequences isolated in Pakistan, India, Nepal and Mongolia.
37

Wang, Yanbin, Zhu-Hong You, Shan Yang, Xiao Li, Tong-Hai Jiang, and Xi Zhou. "A High Efficient Biological Language Model for Predicting Protein–Protein Interactions." Cells 8, no. 2 (February 3, 2019): 122. http://dx.doi.org/10.3390/cells8020122.

Abstract:
Many life activities and key functions in organisms are maintained by different types of protein–protein interactions (PPIs). In order to accelerate the discovery of PPIs for different species, many computational methods have been developed. Unfortunately, even though computational methods are constantly evolving, efficient methods for predicting PPIs from protein sequence information have not been found for many years due to limiting factors including both methodology and technology. Inspired by the similarity of biological sequences and languages, developing a biological language processing technology may provide a brand new theoretical perspective and feasible method for the study of biological sequences. In this paper, a pure biological language processing model is proposed for predicting protein–protein interactions only using a protein sequence. The model was constructed based on a feature representation method for biological sequences called bio-to-vector (Bio2Vec) and a convolution neural network (CNN). The Bio2Vec obtains protein sequence features by using a “bio-word” segmentation system and a word representation model used for learning the distributed representation for each “bio-word”. The Bio2Vec supplies a frame that allows researchers to consider the context information and implicit semantic information of a bio sequence. A remarkable improvement in PPIs prediction performance has been observed by using the proposed model compared with state-of-the-art methods. The presentation of this approach marks the start of “bio language processing technology,” which could cause a technological revolution and could be applied to improve the quality of predictions in other problems.
38

Memišević, Vesna, Tijana Milenković, and Nataša Pržulj. "Complementarity of network and sequence information in homologous proteins." Journal of Integrative Bioinformatics 7, no. 3 (December 1, 2010): 275–89. http://dx.doi.org/10.1515/jib-2010-135.

Abstract:
Traditional approaches for homology detection rely on finding sufficient similarities between protein sequences. Motivated by studies demonstrating that from non-sequence based sources of biological information, such as the secondary or tertiary molecular structure, we can extract certain types of biological knowledge when sequence-based approaches fail, we hypothesize that protein-protein interaction (PPI) network topology and protein sequence might give insights into different slices of biological information. Since proteins aggregate to perform a function instead of acting in isolation, analyzing complex wirings around a protein in a PPI network could give deeper insights into the protein’s role in the inner working of the cell than analyzing sequences of individual genes. Hence, we believe that one could lose much information by focusing on sequence information alone. We examine whether, and to what extent, the information about homologous proteins captured by PPI network topology differs from the information captured by their sequences. We measure how similar the topology around homologous proteins in a PPI network is and show that such proteins have statistically significantly higher network similarity than nonhomologous proteins. We compare these network similarity trends of homologous proteins with the trends in their sequence identity and find that network similarities uncover almost as much homology as sequence identities. Although neither of the two methods, network topology and sequence identity, seems to capture homology information in its entirety, we demonstrate that the two might give insights into somewhat different types of biological information, as the overlap of the homology information that they uncover is relatively low.
Therefore, we conclude that similarities of proteins’ topological neighborhoods in a PPI network could be used as a complementary method to sequence-based approaches for identifying homologs, as well as for analyzing evolutionary distance and functional divergence of homologous proteins.
39

Ejigu, Girum Fitihamlak, Gangman Yi, Jong Im Kim, and Jaehee Jung. "ReGSP: a visualized application for homology-based gene searching and plotting using multiple reference sequences." PeerJ 9 (December 23, 2021): e12707. http://dx.doi.org/10.7717/peerj.12707.

Abstract:
The massively parallel nature of next-generation sequencing technologies has contributed to the generation of massive sequence data in the last two decades. Deciphering the meaning of each generated sequence requires multiple analysis tools, at all stages of analysis, from the reads stage all the way up to the whole-genome level. Homology-based approaches based on related reference sequences are usually the preferred option for gene and transcript prediction in newly sequenced genomes, resulting in the popularity of a variety of BLAST and BLAST-based tools. For organelle genomes, a single-reference–based gene finding tool that uses grouping parameters for BLAST results has been implemented in the Genome Search Plotter (GSP). However, this tool does not accept multiple and user-customized reference sequences required for a broad homology search. Here, we present multiple Reference–based Gene Search and Plot (ReGSP), a simple and convenient web tool that accepts multiple reference sequences for homology-based gene search. The tool incorporates cPlot, a novel dot plot tool, for illustrating nucleotide sequence similarity between the query and the reference sequences. ReGSP has an easy-to-use web interface and is freely accessible at https://ds.mju.ac.kr/regsp.
40

Tapinos, Avraam, Bede Constantinides, My V. T. Phan, Samaneh Kouchaki, Matthew Cotten, and David L. Robertson. "The Utility of Data Transformation for Alignment, De Novo Assembly and Classification of Short Read Virus Sequences." Viruses 11, no. 5 (April 26, 2019): 394. http://dx.doi.org/10.3390/v11050394.

Abstract:
Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequential data, such as signal processing, in which dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. Leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach yielded comparable accuracy to existing approaches, further demonstrating its suitability for sequences originating from diverse virus populations. We assessed the application of our methodology using both synthetic and real viral pathogen sequences. Our results show that the use of highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data.
41

Ganapathiraju, Madhavi K., Asia D. Mitchell, Mohamed Thahir, Kamiya Motwani, and Seshan Ananthasubramanian. "Suite of Tools for Statistical N-gram Language Modeling for Pattern Mining in Whole Genome Sequences." Journal of Bioinformatics and Computational Biology 10, no. 06 (October 18, 2012): 1250016. http://dx.doi.org/10.1142/s0219720012500163.

Abstract:
Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most of the genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application for analysis on whole human genome sequence.
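As a rough illustration of the windowed n-gram statistics this abstract mentions, the sketch below counts n-grams and scores fixed-size windows by perplexity. It is not the toolkit's implementation: the toolkit is described as suffix-array based, whereas this sketch uses a plain `Counter`, and the function names, the four-letter alphabet, and the add-one smoothing are all assumptions made for the example.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count every overlapping n-gram in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def window_perplexity(seq, n, window, step, alphabet="ACGT"):
    """Yield (start, perplexity) for each window of seq.

    Each window is scored against an n-gram model estimated from the
    whole sequence, with add-one smoothing (an assumption of this sketch).
    """
    if window < n:
        raise ValueError("window must be at least n characters long")
    grams = ngram_counts(seq, n)
    # Context counts: how often each (n-1)-gram prefix starts an n-gram.
    contexts = Counter()
    for gram, count in grams.items():
        contexts[gram[:-1]] += count
    vocab = len(alphabet)
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        log_sum = 0.0
        terms = 0
        for i in range(len(chunk) - n + 1):
            gram = chunk[i:i + n]
            # Smoothed conditional probability of the last symbol.
            prob = (grams[gram] + 1) / (contexts[gram[:-1]] + vocab)
            log_sum += math.log2(prob)
            terms += 1
        yield start, 2 ** (-log_sum / terms)
```

Windows whose n-grams are common in the rest of the sequence get low perplexity; windows with unusual composition get high perplexity.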
42

Antignus, Y., Y. Wang, M. Pearlsman, O. Lachman, N. Lavi, and A. Gal-On. "Biological and Molecular Characterization of a New Cucurbit-Infecting Tobamovirus." Phytopathology® 91, no. 6 (June 2001): 565–71. http://dx.doi.org/10.1094/phyto.2001.91.6.565.

Abstract:
An uncharacterized virus was isolated from greenhouse-grown cucumber plants. Biological and serological data described in the present study indicated that the virus belonged in the genus Tobamovirus. The host range of the virus included several plant species within the family Cucurbitaceae. The virus designated Cucumber fruit mottle mosaic virus (CFMMV) causes severe mottling or mosaic on cucumber fruits, and its fast spread within greenhouses could lead to significant economic losses in cucumber crops. The genome of CFMMV has been completely sequenced and its genome organization was typical of a Tobamovirus. However, its sequence was distinct from other described viruses within the group of cucurbit-infecting Tobamoviruses. Comparisons of sequences and phylogenetic analysis suggested that the cucurbit-infecting Tobamoviruses be separated into two subgroups: subgroup I comprising the strains and isolates referred to in the literature as Cucumber green mottle mosaic virus (CGMMV) (CV3, CV4, CGMMV-W, CGMMV-SH, and CGMMV-Is) and subgroup II comprising CFMMV, Kyuri green mottle mosaic virus (KGMMV), and the Yodo strain of CGMMV, which is closely related to KGMMV and may be considered a strain of it.
43

Srikantha, A., A. S. Bopardikar, K. K. Kaipa, P. Venkataraman, K. Lee, T. Ahn, and R. Narayanan. "A fast algorithm for exact sequence search in biological sequences using polyphase decomposition." Bioinformatics 26, no. 18 (September 7, 2010): i414–i419. http://dx.doi.org/10.1093/bioinformatics/btq364.

44

Rajman, Luis A., and Susan T. Lovett. "A Thermostable Single-Strand DNase from Methanococcus jannaschii Related to the RecJ Recombination and Repair Exonuclease from Escherichia coli." Journal of Bacteriology 182, no. 3 (February 1, 2000): 607–12. http://dx.doi.org/10.1128/jb.182.3.607-612.2000.

Abstract:
The RecJ protein of Escherichia coli plays an important role in a number of DNA repair and recombination pathways. RecJ catalyzes processive degradation of single-stranded DNA in a 5′-to-3′ direction. Sequences highly related to those encoding RecJ can be found in most of the eubacterial genomes sequenced to date. From alignment of these sequences, seven conserved motifs are apparent. At least five of these motifs are shared among a large family of proteins in eubacteria, eukaryotes, and archaea, including the PPX1 polyphosphatase of yeast and Drosophila Prune. Archaeal genomes are particularly rich in such sequences, but it has not been clear whether any of the encoded proteins play a functional role similar to that of RecJ exonuclease. We have investigated three such proteins from Methanococcus jannaschii with the strongest overall sequence similarity to E. coli RecJ. Two of the genes, MJ0977 and MJ0831, partially complement a recJ mutant phenotype in E. coli. The expression of MJ0977 in E. coli resulted in high levels of a thermostable single-stranded DNase activity with properties similar to those of RecJ exonuclease. Despite overall weak sequence similarity between the MJ0977 product and RecJ, these nucleases are likely to have similar biological functions.
45

Humphrey, Sam, Alastair Kerr, Magnus Rattray, Caroline Dive, and Crispin J. Miller. "A model of k-mer surprisal to quantify local sequence information content surrounding splice regions." PeerJ 8 (November 4, 2020): e10063. http://dx.doi.org/10.7717/peerj.10063.

Abstract:
Molecular sequences carry information. Analysis of sequence conservation between homologous loci is a proven approach with which to explore the information content of molecular sequences. This is often done using multiple sequence alignments to support comparisons between homologous loci. These methods therefore rely on sufficient underlying sequence similarity with which to construct a representative alignment. Here we describe a method using a formal metric of information, surprisal, to analyse biological sub-sequences without alignment constraints. We applied our model to the genomes of five different species to reveal similar patterns across a panel of eukaryotes. As the surprisal of a sub-sequence is inversely proportional to its occurrence within the genome, the optimal size of the sub-sequences was selected for each species under consideration. With the model optimized, we found a strong correlation between surprisal and CG dinucleotide usage. The utility of our model was tested by examining the sequences of genes known to undergo splicing. We demonstrate that our model can identify biological features of interest such as known donor and acceptor sites. Analysis across all annotated coding exon junctions in Homo sapiens reveals the information content of coding exons to be greater than the surrounding intron regions, a consequence of increased suppression of the CG dinucleotide in intronic space. Sequences within coding regions proximal to exon junctions exhibited novel patterns within DNA and coding mRNA that are not a function of the encoded amino acid sequence. Our findings are consistent with the presence of secondary information encoding features such as DNA and RNA binding sites, multiplexed through the coding sequence and independent of the information required to define the corresponding amino-acid sequence. 
We conclude that surprisal provides a complementary methodology with which to locate regions of interest in the genome, particularly in situations that lack an appropriate multiple sequence alignment.
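To make the idea of k-mer surprisal concrete, here is a minimal sketch using the standard Shannon definition, in which the surprisal of a k-mer is the negative base-2 logarithm of its relative frequency in the genome. The abstract does not spell out the authors' exact model, so the function names and the frequency estimate used here are assumptions for illustration only.

```python
import math
from collections import Counter


def kmer_surprisal(genome, k):
    """Map each observed k-mer to -log2 of its relative frequency."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    total = sum(counts.values())
    return {kmer: -math.log2(count / total) for kmer, count in counts.items()}


def surprisal_profile(genome, k):
    """Return the surprisal of the k-mer starting at each position."""
    table = kmer_surprisal(genome, k)
    return [table[genome[i:i + k]] for i in range(len(genome) - k + 1)]
```

Rare k-mers receive high surprisal and frequent ones low surprisal, so a profile along a gene can be scanned for local peaks without building an alignment.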
46

Kaur, Karamjeet, Sudeshna Chakraborty, and Manoj Kumar Gupta. "Accelerating Smith-Waterman Algorithm for Faster Sequence Alignment using Graphical Processing Unit." Journal of Physics: Conference Series 2161, no. 1 (January 1, 2022): 012028. http://dx.doi.org/10.1088/1742-6596/2161/1/012028.

Abstract:
In bioinformatics, sequence alignment is an important task for comparing biological sequences and finding similarities between them. The Smith-Waterman algorithm is the most widely used algorithm for the alignment process, but it has quadratic time complexity. Because the algorithm is sequential, alignment takes too much time as the number of biological sequences increases. This paper proposes a parallel version of the Smith-Waterman algorithm, implemented for the architecture of the graphics processing unit (GPU) using CUDA. It combines the GPU with the CPU in such a way that the alignment process is three times faster than the sequential implementation of the Smith-Waterman algorithm. The paper describes this parallel GPU implementation of sequence alignment, whose intra-task parallelization strategy reduces the execution time. The results show significant runtime savings on the GPU.
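For reference, the sequential baseline that such GPU work accelerates can be sketched as follows. This is a plain CPU version of Smith-Waterman local alignment with a linear gap penalty, not the paper's CUDA implementation; the scoring values (match 2, mismatch -1, gap -1) and the function name are assumptions chosen for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # Scoring matrix; the first row and column stay at zero.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

The two nested loops are the source of the quadratic cost. Each cell depends only on its upper, left, and upper-left neighbours, so cells on the same anti-diagonal can be computed independently, which is the property parallel implementations typically exploit.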
47

Kumar, Chetan, and K. Sekar. "SSMBS: a web server to locate sequentially separated motifs in biological sequences." Journal of Applied Crystallography 43, no. 1 (December 9, 2009): 203–5. http://dx.doi.org/10.1107/s0021889809047050.

Abstract:
The identification of sequence (amino acids or nucleotides) motifs in a particular order in biological sequences has proved to be of interest. This paper describes a computing server, SSMBS, which can locate and display the occurrences of user-defined biologically important sequence motifs (a maximum of five) present in a specific order in protein and nucleotide sequences. While the server can efficiently locate motifs specified using regular expressions, it can also find occurrences of long and complex motifs. The computation is carried out by an algorithm developed using the concepts of quantifiers in regular expressions. The web server is available to users around the clock at http://dicsoft1.physics.iisc.ernet.in/ssmbs/.
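The task of locating motifs in a given order can be illustrated with Python's standard `re` module. The sketch below is not the SSMBS algorithm, whose details the abstract does not give; it simply chains leftmost matches so that each motif is searched for after the end of the previous one. The function name and the example sequence and patterns are hypothetical.

```python
import re


def find_ordered_motifs(sequence, motifs, max_motifs=5):
    """Find the leftmost chain of motifs occurring in the given order.

    Returns a list of (pattern, start, end) tuples, or None if some motif
    has no match after the previous one. The limit of five motifs mirrors
    the maximum stated in the abstract.
    """
    if not 1 <= len(motifs) <= max_motifs:
        raise ValueError("between 1 and %d motifs are required" % max_motifs)
    hits = []
    position = 0
    for pattern in motifs:
        # Search only in the part of the sequence after the previous hit.
        match = re.compile(pattern).search(sequence, position)
        if match is None:
            return None
        hits.append((pattern, match.start(), match.end()))
        position = match.end()
    return hits


# Hypothetical protein fragment and motifs, for illustration only.
example = find_ordered_motifs("MKTAYIAKQRQISFVKSHFSRQ", ["K.A", "Q[RK]Q", "S.F"])
```

A greedy leftmost chain reports only one arrangement; listing every ordered combination would need a search over all matches of each motif.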
48

Dominguez, Geraldina, Timothy R. Dambaugh, Felicia R. Stamey, Stephen Dewhurst, Naoki Inoue, and Philip E. Pellett. "Human Herpesvirus 6B Genome Sequence: Coding Content and Comparison with Human Herpesvirus 6A." Journal of Virology 73, no. 10 (October 1, 1999): 8040–52. http://dx.doi.org/10.1128/jvi.73.10.8040-8052.1999.

Abstract:
Human herpesvirus 6 variants A and B (HHV-6A and HHV-6B) are closely related viruses that can be readily distinguished by comparison of restriction endonuclease profiles and nucleotide sequences. The viruses are similar with respect to genomic and genetic organization, and their genomes cross-hybridize extensively, but they differ in biological and epidemiologic features. Differences include infectivity of T-cell lines, patterns of reactivity with monoclonal antibodies, and disease associations. Here we report the complete genome sequence of HHV-6B strain Z29 [HHV-6B(Z29)], describe its genetic content, and present an analysis of the relationships between HHV-6A and HHV-6B. As sequenced, the HHV-6B(Z29) genome is 162,114 bp long and is composed of a 144,528-bp unique segment (U) bracketed by 8,793-bp direct repeats (DR). The genomic sequence allows prediction of a total of 119 unique open reading frames (ORFs), 9 of which are present only in HHV-6B. Splicing is predicted in 11 genes, resulting in the 119 ORFs composing 97 unique genes. The overall nucleotide sequence identity between HHV-6A and HHV-6B is 90%. The most divergent regions are DR and the right end of U, spanning ORFs U86 to U100. These regions have 85 and 72% nucleotide sequence identity, respectively. The amino acid sequences of 13 of the 17 ORFs at the right end of U differ by more than 10%, with the notable exception of U94, the adeno-associated virus type 2 rep homolog, which differs by only 2.4%. This region also includes putative cis-acting sequences that are likely to be involved in transcriptional regulation of the major immediate-early locus. The catalog of variant-specific genetic differences resulting from our comparison of the genome sequences adds support to previous data indicating that HHV-6A and HHV-6B are distinct herpesvirus species.
49

Wintermantel, William M., and Laura L. Hladky. "Complete Genome Sequence and Biological Characterization of Moroccan pepper virus (MPV) and Reclassification of Lettuce necrotic stunt virus as MPV." Phytopathology® 103, no. 5 (May 2013): 501–8. http://dx.doi.org/10.1094/phyto-07-12-0166-r.

Abstract:
Moroccan pepper virus (MPV) and Lettuce necrotic stunt virus (LNSV) have been steadily increasing in prevalence in central Asia and western North America, respectively, over the past decade. Recent sequence analysis of LNSV demonstrated a close relationship between the coat proteins of LNSV and MPV. To determine the full extent of the relationship between LNSV and MPV, the genomes of three MPV isolates were sequenced and compared with that of LNSV. Sequence analysis demonstrated that genomic nucleotide sequences as well as virus-encoded proteins of the three MPV isolates and LNSV shared 97% or greater identity. A full-length clone of a California LNSV isolate was developed and virus derived from infectious transcripts was used to evaluate host plant reactions under controlled conditions. Symptoms of LNSV matched those described previously for MPV on most of a select series of host plants, although some differences were observed. Collectively, these molecular and biological results demonstrate that LNSV should be classified as MPV within the family Tombusviridae, genus Tombusvirus, and confirm the presence of MPV in North America.
50

Brunel, Dominique, Nicole Froger, and Georges Pelletier. "Development of amplified consensus genetic markers (ACGM) in Brassica napus from Arabidopsis thaliana sequences of known biological function." Genome 42, no. 3 (June 1, 1999): 387–402. http://dx.doi.org/10.1139/g98-141.

Abstract:
A method for the development of consensus genetic markers between species of the same taxonomic family is described in this paper. It is based on the conservation of the peptide sequences and on the potential polymorphism within non-coding sequences. Six loci sequenced from Arabidopsis thaliana, AG, LFY3, AP3, FAD7, FAD3, and ADH, were analysed for one ecotype of A. thaliana, four lines of Brassica napus, and one line for each parental species, Brassica oleracea and Brassica rapa. Positive amplifications with the degenerate primers showed one band for A. thaliana, two to four bands in rapeseed, and one to two bands in the parental species. Direct sequencing of the PCR products confirms their peptide similarity with the "mother" sequence. By comparison of intron sequences, the correspondence between each rapeseed gene and its homologue in one of the parental species can be determined without ambiguity. Another important result is the presence of a polymorphism inside these fragments between the rapeseed lines. This variability could generally be detected by differences of electrophoretic migration on long non-denaturing polyacrylamide gels. This method enables a quick and easy shuttle between A. thaliana and Brassica species without cloning. Key words: consensus genetics markers, PCR specific, Brassica, Arabidopsis, targeted markers, DSCP.