Academic literature on the topic 'Low-memory bioinformatics'

Author: Grafiati

Published: 10 December 2022

Last updated: 29 January 2023

Contents

Journal articles
Conference papers

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Low-memory bioinformatics.'

Journal articles on the topic "Low-memory bioinformatics"

1

Rizk, G., D. Lavenier, and R. Chikhi. "DSK: k-mer counting with very low memory usage." Bioinformatics 29, no. 5 (January 16, 2013): 652–53. http://dx.doi.org/10.1093/bioinformatics/btt020.

2

Chikhi, Rayan, Antoine Limasset, and Paul Medvedev. "Compacting de Bruijn graphs from sequencing data quickly and in low memory." Bioinformatics 32, no. 12 (June 15, 2016): i201–i208. http://dx.doi.org/10.1093/bioinformatics/btw279.

3

Khan, Jamshed, and Rob Patro. "Cuttlefish: fast, parallel and low-memory compaction of de Bruijn graphs from large-scale genome collections." Bioinformatics 37, Supplement_1 (July 1, 2021): i177–i186. http://dx.doi.org/10.1093/bioinformatics/btab309.

Abstract:

Motivation: The construction of the compacted de Bruijn graph from collections of reference genomes is a task of increasing interest in genomic analyses. These graphs are increasingly used as sequence indices for short- and long-read alignment. Also, as we sequence and assemble a greater diversity of genomes, the colored compacted de Bruijn graph is being used more and more as the basis for efficient methods to perform comparative genomic analyses on these genomes. Therefore, time- and memory-efficient construction of the graph from reference sequences is an important problem.

Results: We introduce a new algorithm, implemented in the tool Cuttlefish, to construct the (colored) compacted de Bruijn graph from a collection of one or more genome references. Cuttlefish introduces a novel approach of modeling de Bruijn graph vertices as finite-state automata, and constrains these automata’s state-space to enable tracking their transitioning states with very low memory usage. Cuttlefish is also fast and highly parallelizable. Experimental results demonstrate that it scales much better than existing approaches, especially as the number and the scale of the input references grow. On a typical shared-memory machine, Cuttlefish constructed the graph for 100 human genomes in under 9 h, using ∼29 GB of memory. On 11 diverse conifer plant genomes, the compacted graph was constructed by Cuttlefish in under 9 h, using ∼84 GB of memory. The only other tool completing these tasks on the hardware took over 23 h using ∼126 GB of memory, and over 16 h using ∼289 GB of memory, respectively.

Availability and implementation: Cuttlefish is implemented in C++14, and is available under an open source license at https://github.com/COMBINE-lab/cuttlefish.

Supplementary information: Supplementary data are available at Bioinformatics online.

4

Li, Mulin Jun, Pak Chung Sham, and Junwen Wang. "FastPval: a fast and memory efficient program to calculate very low P-values from empirical distribution." Bioinformatics 26, no. 22 (September 21, 2010): 2897–99. http://dx.doi.org/10.1093/bioinformatics/btq540.

5

Stovner, Endre Bakken, and Pål Sætrom. "epic2 efficiently finds diffuse domains in ChIP-seq data." Bioinformatics 35, no. 21 (March 28, 2019): 4392–93. http://dx.doi.org/10.1093/bioinformatics/btz232.

Abstract:

Summary: Data from chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) generally contain either narrow peaks or broad and diffusely enriched domains. The SICER ChIP-seq caller has proven adept at finding diffuse domains in ChIP-seq data, but it is slow, requires much memory, needs manual installation steps and is hard to use. epic2 is a complete rewrite of SICER that is focused on speed, low memory overhead and ease-of-use.

Availability and implementation: The MIT-licensed code is available at https://github.com/biocore-ntnu/epic2.

Supplementary information: Supplementary data are available at Bioinformatics online.

6

Shi, Christina Huan, and Kevin Y. Yip. "A general near-exact k-mer counting method with low memory consumption enables de novo assembly of 106× human sequence data in 2.7 hours." Bioinformatics 36, Supplement_2 (December 2020): i625–i633. http://dx.doi.org/10.1093/bioinformatics/btaa890.

Abstract:

Motivation: In de novo sequence assembly, a standard pre-processing step is k-mer counting, which computes the number of occurrences of every length-k sub-sequence in the sequencing reads. Sequencing errors can produce many k-mers that do not appear in the genome, leading to the need for an excessive amount of memory during counting. This issue is particularly serious when the genome to be assembled is large, the sequencing depth is high, or when the memory available is limited.

Results: Here, we propose a fast near-exact k-mer counting method, CQF-deNoise, which has a module for dynamically removing noisy false k-mers. It automatically determines the suitable time and number of rounds of noise removal according to a user-specified wrong removal rate. We tested CQF-deNoise comprehensively using data generated from a diverse set of genomes with various data properties, and found that the memory consumed was almost constant regardless of the sequencing errors while the noise removal procedure had minimal effects on counting accuracy. Compared with four state-of-the-art k-mer counting methods, CQF-deNoise consistently performed the best in terms of memory usage, consuming 49–76% less memory than the second best method. When counting the k-mers from a human dataset with around 60× coverage, the peak memory usage of CQF-deNoise was only 10.9 GB (gigabytes) for k = 28 and 21.5 GB for k = 55. De novo assembly of 106× human sequencing data using CQF-deNoise for k-mer counting required only 2.7 h and 90 GB peak memory.

Availability and implementation: The source codes of CQF-deNoise and SH-assembly are available at https://github.com/Christina-hshi/CQF-deNoise.git and https://github.com/Christina-hshi/SH-assembly.git, respectively, both under the BSD 3-Clause license.
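For readers unfamiliar with the k-mer counting step this abstract describes, a minimal sketch of naive exact counting follows. It is not CQF-deNoise or any tool listed here: the function name `count_kmers` and the toy reads are hypothetical, and the in-memory dictionary it uses is exactly the memory-hungry baseline that low-memory counters aim to avoid.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count occurrences of every length-k substring across all reads.

    Naive exact counting: one dictionary entry per distinct k-mer, so
    memory grows with the number of distinct k-mers (including those
    created by sequencing errors).
    """
    counts = Counter()
    for read in reads:
        # A read shorter than k contributes no k-mers (empty range).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Hypothetical toy input, for illustration only.
reads = ["ACGTAC", "CGTACG"]
kmer_counts = count_kmers(reads, 3)
```

Each erroneous base can introduce up to k new distinct k-mers into such a table, which is why the abstract identifies sequencing errors as the main driver of memory use.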

7

Schulz, Tizian, Roland Wittler, Sven Rahmann, Faraz Hach, and Jens Stoye. "Detecting high-scoring local alignments in pangenome graphs." Bioinformatics 37, no. 16 (February 3, 2021): 2266–74. http://dx.doi.org/10.1093/bioinformatics/btab077.

Abstract:

Motivation: Increasing amounts of individual genomes sequenced per species motivate the usage of pangenomic approaches. Pangenomes may be represented as graphical structures, e.g. compacted colored de Bruijn graphs, which offer a low memory usage and facilitate reference-free sequence comparisons. While sequence-to-graph mapping to graphical pangenomes has been studied for some time, no local alignment search tool in the vein of BLAST has been proposed yet.

Results: We present a new heuristic method to find maximum scoring local alignments of a DNA query sequence to a pangenome represented as a compacted colored de Bruijn graph. Our approach additionally allows a comparison of similarity among sequences within the pangenome. We show that local alignment scores follow an exponential-tail distribution similar to BLAST scores, and we discuss how to estimate its parameters to separate local alignments representing sequence homology from spurious findings. An implementation of our method is presented, and its performance and usability are shown. Our approach scales sublinearly in running time and memory usage with respect to the number of genomes under consideration. This is an advantage over classical methods that do not make use of sequence similarity within the pangenome.

Availability and implementation: Source code and test data are available from https://gitlab.ub.uni-bielefeld.de/gi/plast.

Supplementary information: Supplementary data are available at Bioinformatics online.

8

Shimmura, Keisuke, Yuki Kato, and Yukio Kawahara. "Bivartect: accurate and memory-saving breakpoint detection by direct read comparison." Bioinformatics 36, no. 9 (January 27, 2020): 2725–30. http://dx.doi.org/10.1093/bioinformatics/btaa059.

Abstract:

Motivation: Genetic variant calling with high-throughput sequencing data has been recognized as a useful tool for better understanding of disease mechanism and detection of potential off-target sites in genome editing. Since most of the variant calling algorithms rely on initial mapping onto a reference genome and tend to predict many variant candidates, variant calling remains challenging in terms of predicting variants with low false positives.

Results: Here we present Bivartect, a simple yet versatile variant caller based on direct comparison of short sequence reads between normal and mutated samples. Bivartect can detect not only single nucleotide variants but also insertions/deletions, inversions and their complexes. Bivartect achieves high predictive performance with an elaborate memory-saving mechanism, which allows Bivartect to run on a computer with a single node for analyzing small omics data. Tests with simulated benchmark and real genome-editing data indicate that Bivartect was comparable to state-of-the-art variant callers in positive predictive value for detection of single nucleotide variants, even though it yielded a substantially small number of candidates. These results suggest that Bivartect, a reference-free approach, will contribute to the identification of germline mutations as well as off-target sites introduced during genome editing with high accuracy.

Availability and implementation: Bivartect is implemented in C++ and available along with in silico simulated data at https://github.com/ykat0/bivartect.

Supplementary information: Supplementary data are available at Bioinformatics online.

9

Saha, Subrata, Jethro Johnson, Soumitra Pal, George M. Weinstock, and Sanguthevar Rajasekaran. "MSC: a metagenomic sequence classification algorithm." Bioinformatics 35, no. 17 (January 14, 2019): 2932–40. http://dx.doi.org/10.1093/bioinformatics/bty1071.

Abstract:

Motivation: Metagenomics is the study of genetic materials directly sampled from natural habitats. It has the potential to reveal previously hidden diversity of microscopic life largely due to the existence of highly parallel and low-cost next-generation sequencing technology. Conventional approaches align metagenomic reads onto known reference genomes to identify microbes in the sample. Since such a collection of reference genomes is very large, the approach often needs high-end computing machines with large memory which is not often available to researchers. Alternative approaches follow an alignment-free methodology where the presence of a microbe is predicted using the information about the unique k-mers present in the microbial genomes. However, such approaches suffer from high false positives due to trading off the value of k with the computational resources. In this article, we propose a highly efficient metagenomic sequence classification (MSC) algorithm that is a hybrid of both approaches. Instead of aligning reads to the full genomes, MSC aligns reads onto a set of carefully chosen, shorter and highly discriminating model sequences built from the unique k-mers of each of the reference sequences.

Results: Microbiome researchers are generally interested in two objectives of a taxonomic classifier: (i) to detect prevalence, i.e. the taxa present in a sample, and (ii) to estimate their relative abundances. MSC is primarily designed to detect prevalence and experimental results show that MSC is indeed a more effective and efficient algorithm compared to the other state-of-the-art algorithms in terms of accuracy, memory and runtime. Moreover, MSC outputs an approximate estimate of the abundances.

Availability and implementation: The implementations are freely available for non-commercial purposes. They can be downloaded from https://drive.google.com/open?id=1XirkAamkQ3ltWvI1W1igYQFusp9DHtVl.

10

Sater, Vincent, Pierre-Julien Viailly, Thierry Lecroq, Élise Prieur-Gaston, Élodie Bohers, Mathieu Viennot, Philippe Ruminy, Hélène Dauchel, Pierre Vera, and Fabrice Jardin. "UMI-VarCal: a new UMI-based variant caller that efficiently improves low-frequency variant detection in paired-end sequencing NGS libraries." Bioinformatics 36, no. 9 (January 27, 2020): 2718–24. http://dx.doi.org/10.1093/bioinformatics/btaa053.

Abstract:

Motivation: Next-generation sequencing has become the go-to standard method for the detection of single-nucleotide variants in tumor cells. The use of such technologies requires a PCR amplification step and a sequencing step, steps in which artifacts are introduced at very low frequencies. These artifacts are often confused with true low-frequency variants that can be found in tumor cells and cell-free DNA. The recent use of unique molecular identifiers (UMI) in targeted sequencing protocols has offered a trustworthy approach to filter out artefactual variants and accurately call low-frequency variants. However, the integration of UMI analysis in the variant calling process led to developing tools that are significantly slower and more memory consuming than raw-reads-based variant callers.

Results: We present UMI-VarCal, a UMI-based variant caller for targeted sequencing data with better sensitivity compared to other variant callers. Being developed with performance in mind, UMI-VarCal stands out from the crowd by being one of the few variant callers that do not rely on SAMtools to do their pileup. Instead, at its core runs an innovative homemade pileup algorithm specifically designed to treat the UMI tags in the reads. After the pileup, a Poisson statistical test is applied at every position to determine if the frequency of the variant is significantly higher than the background error noise. Finally, an analysis of UMI tags is performed, a strand bias and a homopolymer length filter are applied to achieve better accuracy. We illustrate the results obtained using UMI-VarCal through the sequencing of tumor samples and we show how UMI-VarCal is both faster and more sensitive than other publicly available solutions.

Availability and implementation: The entire pipeline is available at https://gitlab.com/vincent-sater/umi-varcal-master under MIT license.

Supplementary information: Supplementary data are available at Bioinformatics online.
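The per-position Poisson test mentioned in this abstract can be illustrated with a small sketch. This is not UMI-VarCal's code: the function names, the background error rate, and the significance threshold `alpha` are all assumptions chosen for illustration, and the abstract does not state which values or exact test formulation the tool uses.

```python
import math


def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected).

    Computed as 1 minus the sum of the first `observed` probability
    terms; adequate for a sketch, though it loses precision for
    extremely small tail probabilities.
    """
    cdf = 0.0
    term = math.exp(-expected)  # P(X = 0)
    for i in range(observed):
        cdf += term
        term *= expected / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def is_above_background(alt_count, depth, error_rate, alpha=0.05):
    """Hypothetical helper: is the variant read count at one position
    higher than expected from background error alone?

    `error_rate` and `alpha` are assumed parameters, not values taken
    from the paper.
    """
    expected_errors = depth * error_rate
    return poisson_upper_tail(alt_count, expected_errors) < alpha


# Illustrative position: 1000 reads deep, assumed 0.1% background error.
likely_variant = is_above_background(alt_count=10, depth=1000, error_rate=0.001)
likely_noise = is_above_background(alt_count=1, depth=1000, error_rate=0.001)
```

With these assumed numbers the expected error count is 1 read, so 10 supporting reads fall far in the tail while a single supporting read does not.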

Conference papers on the topic "Low-memory bioinformatics"

1

"De Novo Short Read Assembly Algorithm with Low Memory Usage." In International Conference on Bioinformatics Models, Methods and Algorithms. SCITEPRESS - Science and Technology Publications, 2014. http://dx.doi.org/10.5220/0004881002150220.
