Journal articles on the topic 'Low-memory bioinformatics'

1

Rizk, G., D. Lavenier, and R. Chikhi. "DSK: k-mer counting with very low memory usage." Bioinformatics 29, no. 5 (January 16, 2013): 652–53. http://dx.doi.org/10.1093/bioinformatics/btt020.

2

Chikhi, Rayan, Antoine Limasset, and Paul Medvedev. "Compacting de Bruijn graphs from sequencing data quickly and in low memory." Bioinformatics 32, no. 12 (June 15, 2016): i201–i208. http://dx.doi.org/10.1093/bioinformatics/btw279.

3

Khan, Jamshed, and Rob Patro. "Cuttlefish: fast, parallel and low-memory compaction of de Bruijn graphs from large-scale genome collections." Bioinformatics 37, Supplement_1 (July 1, 2021): i177–i186. http://dx.doi.org/10.1093/bioinformatics/btab309.

Abstract:
Motivation: The construction of the compacted de Bruijn graph from collections of reference genomes is a task of increasing interest in genomic analyses. These graphs are increasingly used as sequence indices for short- and long-read alignment. Also, as we sequence and assemble a greater diversity of genomes, the colored compacted de Bruijn graph is being used more and more as the basis for efficient methods to perform comparative genomic analyses on these genomes. Therefore, time- and memory-efficient construction of the graph from reference sequences is an important problem. Results: We introduce a new algorithm, implemented in the tool Cuttlefish, to construct the (colored) compacted de Bruijn graph from a collection of one or more genome references. Cuttlefish introduces a novel approach of modeling de Bruijn graph vertices as finite-state automata, and constrains these automata's state-space to enable tracking their transitioning states with very low memory usage. Cuttlefish is also fast and highly parallelizable. Experimental results demonstrate that it scales much better than existing approaches, especially as the number and the scale of the input references grow. On a typical shared-memory machine, Cuttlefish constructed the graph for 100 human genomes in under 9 h, using ∼29 GB of memory. On 11 diverse conifer plant genomes, the compacted graph was constructed by Cuttlefish in under 9 h, using ∼84 GB of memory. The only other tool completing these tasks on the hardware took over 23 h using ∼126 GB of memory, and over 16 h using ∼289 GB of memory, respectively. Availability and implementation: Cuttlefish is implemented in C++14, and is available under an open source license at https://github.com/COMBINE-lab/cuttlefish. Supplementary information: Supplementary data are available at Bioinformatics online.
4

Li, Mulin Jun, Pak Chung Sham, and Junwen Wang. "FastPval: a fast and memory efficient program to calculate very low P-values from empirical distribution." Bioinformatics 26, no. 22 (September 21, 2010): 2897–99. http://dx.doi.org/10.1093/bioinformatics/btq540.

5

Stovner, Endre Bakken, and Pål Sætrom. "epic2 efficiently finds diffuse domains in ChIP-seq data." Bioinformatics 35, no. 21 (March 28, 2019): 4392–93. http://dx.doi.org/10.1093/bioinformatics/btz232.

Abstract:
Summary: Data from chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) generally contain either narrow peaks or broad and diffusely enriched domains. The SICER ChIP-seq caller has proven adept at finding diffuse domains in ChIP-seq data, but it is slow, requires much memory, needs manual installation steps and is hard to use. epic2 is a complete rewrite of SICER that is focused on speed, low memory overhead and ease-of-use. Availability and implementation: The MIT-licensed code is available at https://github.com/biocore-ntnu/epic2. Supplementary information: Supplementary data are available at Bioinformatics online.
6

Shi, Christina Huan, and Kevin Y. Yip. "A general near-exact k-mer counting method with low memory consumption enables de novo assembly of 106× human sequence data in 2.7 hours." Bioinformatics 36, Supplement_2 (December 2020): i625–i633. http://dx.doi.org/10.1093/bioinformatics/btaa890.

Abstract:
Motivation: In de novo sequence assembly, a standard pre-processing step is k-mer counting, which computes the number of occurrences of every length-k sub-sequence in the sequencing reads. Sequencing errors can produce many k-mers that do not appear in the genome, leading to the need for an excessive amount of memory during counting. This issue is particularly serious when the genome to be assembled is large, the sequencing depth is high, or when the memory available is limited. Results: Here, we propose a fast near-exact k-mer counting method, CQF-deNoise, which has a module for dynamically removing noisy false k-mers. It automatically determines the suitable time and number of rounds of noise removal according to a user-specified wrong removal rate. We tested CQF-deNoise comprehensively using data generated from a diverse set of genomes with various data properties, and found that the memory consumed was almost constant regardless of the sequencing errors while the noise removal procedure had minimal effects on counting accuracy. Compared with four state-of-the-art k-mer counting methods, CQF-deNoise consistently performed the best in terms of memory usage, consuming 49–76% less memory than the second best method. When counting the k-mers from a human dataset with around 60× coverage, the peak memory usage of CQF-deNoise was only 10.9 GB (gigabytes) for k = 28 and 21.5 GB for k = 55. De novo assembly of 106× human sequencing data using CQF-deNoise for k-mer counting required only 2.7 h and 90 GB peak memory. Availability and implementation: The source codes of CQF-deNoise and SH-assembly are available at https://github.com/Christina-hshi/CQF-deNoise.git and https://github.com/Christina-hshi/SH-assembly.git, respectively, both under the BSD 3-Clause license.
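As a rough illustration of the k-mer counting step this abstract describes, the sketch below counts every length-k substring exactly with an in-memory dictionary and then drops rarely seen k-mers. It is not CQF-deNoise: the function names, the example reads, and the fixed `min_count` threshold are assumptions made for illustration, whereas the paper uses a counting quotient filter with dynamically scheduled noise removal.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across a collection of reads."""
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def drop_rare_kmers(counts, min_count):
    """Keep only k-mers seen at least min_count times.

    Hypothetical, simplified stand-in for noise removal: k-mers created by
    sequencing errors tend to be rare, so a count threshold discards most
    of them.
    """
    return Counter({kmer: c for kmer, c in counts.items() if c >= min_count})


# Toy input; real reads would come from a FASTQ file.
reads = ["ACGTACGT", "CGTACGTA"]
counts = count_kmers(reads, k=4)
solid = drop_rare_kmers(counts, min_count=3)
```

Exact counting like this stores one entry per distinct k-mer, which is the memory cost that error-induced k-mers inflate and that approximate or filtered counters aim to reduce.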
7

Schulz, Tizian, Roland Wittler, Sven Rahmann, Faraz Hach, and Jens Stoye. "Detecting high-scoring local alignments in pangenome graphs." Bioinformatics 37, no. 16 (February 3, 2021): 2266–74. http://dx.doi.org/10.1093/bioinformatics/btab077.

Abstract:
Motivation: Increasing amounts of individual genomes sequenced per species motivate the usage of pangenomic approaches. Pangenomes may be represented as graphical structures, e.g. compacted colored de Bruijn graphs, which offer a low memory usage and facilitate reference-free sequence comparisons. While sequence-to-graph mapping to graphical pangenomes has been studied for some time, no local alignment search tool in the vein of BLAST has been proposed yet. Results: We present a new heuristic method to find maximum scoring local alignments of a DNA query sequence to a pangenome represented as a compacted colored de Bruijn graph. Our approach additionally allows a comparison of similarity among sequences within the pangenome. We show that local alignment scores follow an exponential-tail distribution similar to BLAST scores, and we discuss how to estimate its parameters to separate local alignments representing sequence homology from spurious findings. An implementation of our method is presented, and its performance and usability are shown. Our approach scales sublinearly in running time and memory usage with respect to the number of genomes under consideration. This is an advantage over classical methods that do not make use of sequence similarity within the pangenome. Availability and implementation: Source code and test data are available from https://gitlab.ub.uni-bielefeld.de/gi/plast. Supplementary information: Supplementary data are available at Bioinformatics online.
8

Shimmura, Keisuke, Yuki Kato, and Yukio Kawahara. "Bivartect: accurate and memory-saving breakpoint detection by direct read comparison." Bioinformatics 36, no. 9 (January 27, 2020): 2725–30. http://dx.doi.org/10.1093/bioinformatics/btaa059.

Abstract:
Motivation: Genetic variant calling with high-throughput sequencing data has been recognized as a useful tool for better understanding of disease mechanism and detection of potential off-target sites in genome editing. Since most of the variant calling algorithms rely on initial mapping onto a reference genome and tend to predict many variant candidates, variant calling remains challenging in terms of predicting variants with low false positives. Results: Here we present Bivartect, a simple yet versatile variant caller based on direct comparison of short sequence reads between normal and mutated samples. Bivartect can detect not only single nucleotide variants but also insertions/deletions, inversions and their complexes. Bivartect achieves high predictive performance with an elaborate memory-saving mechanism, which allows Bivartect to run on a computer with a single node for analyzing small omics data. Tests with simulated benchmark and real genome-editing data indicate that Bivartect was comparable to state-of-the-art variant callers in positive predictive value for detection of single nucleotide variants, even though it yielded a substantially small number of candidates. These results suggest that Bivartect, a reference-free approach, will contribute to the identification of germline mutations as well as off-target sites introduced during genome editing with high accuracy. Availability and implementation: Bivartect is implemented in C++ and available along with in silico simulated data at https://github.com/ykat0/bivartect. Supplementary information: Supplementary data are available at Bioinformatics online.
9

Saha, Subrata, Jethro Johnson, Soumitra Pal, George M. Weinstock, and Sanguthevar Rajasekaran. "MSC: a metagenomic sequence classification algorithm." Bioinformatics 35, no. 17 (January 14, 2019): 2932–40. http://dx.doi.org/10.1093/bioinformatics/bty1071.

Abstract:
Motivation: Metagenomics is the study of genetic materials directly sampled from natural habitats. It has the potential to reveal previously hidden diversity of microscopic life largely due to the existence of highly parallel and low-cost next-generation sequencing technology. Conventional approaches align metagenomic reads onto known reference genomes to identify microbes in the sample. Since such a collection of reference genomes is very large, the approach often needs high-end computing machines with large memory which is not often available to researchers. Alternative approaches follow an alignment-free methodology where the presence of a microbe is predicted using the information about the unique k-mers present in the microbial genomes. However, such approaches suffer from high false positives due to trading off the value of k with the computational resources. In this article, we propose a highly efficient metagenomic sequence classification (MSC) algorithm that is a hybrid of both approaches. Instead of aligning reads to the full genomes, MSC aligns reads onto a set of carefully chosen, shorter and highly discriminating model sequences built from the unique k-mers of each of the reference sequences. Results: Microbiome researchers are generally interested in two objectives of a taxonomic classifier: (i) to detect prevalence, i.e. the taxa present in a sample, and (ii) to estimate their relative abundances. MSC is primarily designed to detect prevalence and experimental results show that MSC is indeed a more effective and efficient algorithm compared to the other state-of-the-art algorithms in terms of accuracy, memory and runtime. Moreover, MSC outputs an approximate estimate of the abundances. Availability and implementation: The implementations are freely available for non-commercial purposes. They can be downloaded from https://drive.google.com/open?id=1XirkAamkQ3ltWvI1W1igYQFusp9DHtVl.
10

Sater, Vincent, Pierre-Julien Viailly, Thierry Lecroq, Élise Prieur-Gaston, Élodie Bohers, Mathieu Viennot, Philippe Ruminy, Hélène Dauchel, Pierre Vera, and Fabrice Jardin. "UMI-VarCal: a new UMI-based variant caller that efficiently improves low-frequency variant detection in paired-end sequencing NGS libraries." Bioinformatics 36, no. 9 (January 27, 2020): 2718–24. http://dx.doi.org/10.1093/bioinformatics/btaa053.

Abstract:
Motivation: Next-generation sequencing has become the go-to standard method for the detection of single-nucleotide variants in tumor cells. The use of such technologies requires a PCR amplification step and a sequencing step, steps in which artifacts are introduced at very low frequencies. These artifacts are often confused with true low-frequency variants that can be found in tumor cells and cell-free DNA. The recent use of unique molecular identifiers (UMI) in targeted sequencing protocols has offered a trustworthy approach to filter out artefactual variants and accurately call low-frequency variants. However, the integration of UMI analysis in the variant calling process led to developing tools that are significantly slower and more memory consuming than raw-reads-based variant callers. Results: We present UMI-VarCal, a UMI-based variant caller for targeted sequencing data with better sensitivity compared to other variant callers. Being developed with performance in mind, UMI-VarCal stands out from the crowd by being one of the few variant callers that do not rely on SAMtools to do their pileup. Instead, at its core runs an innovative homemade pileup algorithm specifically designed to treat the UMI tags in the reads. After the pileup, a Poisson statistical test is applied at every position to determine if the frequency of the variant is significantly higher than the background error noise. Finally, an analysis of UMI tags is performed, a strand bias and a homopolymer length filter are applied to achieve better accuracy. We illustrate the results obtained using UMI-VarCal through the sequencing of tumor samples and we show how UMI-VarCal is both faster and more sensitive than other publicly available solutions. Availability and implementation: The entire pipeline is available at https://gitlab.com/vincent-sater/umi-varcal-master under MIT license. Supplementary information: Supplementary data are available at Bioinformatics online.
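The per-position Poisson test mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not UMI-VarCal's implementation: the function names, the background `error_rate`, and the significance level `alpha` are hypothetical, and the expected number of erroneous reads is taken to be simply depth times error rate.

```python
import math


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    below = 0.0
    for i in range(k):
        below += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return 1.0 - below


def is_variant(alt_count, depth, error_rate, alpha=0.01):
    """Flag a position when its alternate-allele count exceeds background noise.

    Assumption: background errors at a position follow a Poisson distribution
    with mean depth * error_rate.
    """
    lam = depth * error_rate
    return poisson_upper_tail(alt_count, lam) < alpha


# Hypothetical position: 1000 reads deep, background error rate of 0.1 %.
call_ten = is_variant(alt_count=10, depth=1000, error_rate=0.001)
call_one = is_variant(alt_count=1, depth=1000, error_rate=0.001)
```

With these toy numbers one erroneous read is expected, so a single alternate read is consistent with noise while ten alternate reads are not.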
11

Chen, Jun-Yu, Ya-Ru Sun, Tao Xiong, Guan-Nan Wang, and Qing Chang. "Identification of HIBCH as a Fatty Acid Metabolism-Related Biomarker in Aortic Valve Calcification Using Bioinformatics." Oxidative Medicine and Cellular Longevity 2022 (October 7, 2022): 1–24. http://dx.doi.org/10.1155/2022/9558713.

Abstract:
Objective. To identify fatty acid metabolism-related biomarkers of aortic valve calcification (AVC) using bioinformatics and to investigate the role of immune cell infiltration in AVC. Methods. The AVC dataset was retrieved from the Gene Expression Omnibus database. R was used for differentially expressed gene analysis and weighted gene coexpression analysis. The differentially coexpressed genes were identified with a Venn diagram, followed by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses, which identified functions closely related to AVC. Genes related to fatty acid metabolism were retrieved from the Molecular Signatures Database (MSigDB). After removing duplicate genes, least absolute shrinkage and selection operator (LASSO) regression analysis, support vector machine recursive feature elimination (SVM-RFE), and random forest were applied to recognize biomarkers related to fatty acid metabolism in AVC. The CIBERSORT tool was used to analyze infiltration of immune cells in normal and AVC samples. Correlations between biomarkers and immune cells were calculated. Finally, HIBCH-related pathways were predicted by single-gene gene set enrichment analysis (GSEA). Results. 2416 differentially expressed genes and one coexpression module were identified. A total of 1473 differentially coexpressed genes were acquired. GO and KEGG enrichment analyses demonstrated that differentially coexpressed genes were closely related to fatty acid metabolism. LASSO regression analysis, SVM-RFE, and random forest revealed that 3-hydroxyisobutyryl-CoA hydrolase (HIBCH) was a biomarker among fatty acid metabolism-related genes in AVC. Levels of memory B cells were significantly higher in AVC than in normal samples, while levels of activated natural killer (NK) cells were significantly lower in AVC than in normal samples. A significant positive correlation was observed between HIBCH and activated NK cells, regulatory T cells, monocytes, naïve B cells, activated dendritic cells, resting memory CD4 T cells, resting NK cells, and CD8 T cells. A significant negative correlation was observed between HIBCH and activated memory CD4 T cells, memory B cells, neutrophils, gamma delta T cells, M0 macrophages, and plasma cells. The single-gene GSEA results suggest that HIBCH may work through the inhibition of multiple immune-related pathways. Conclusion. HIBCH is closely related to immune cell infiltration in AVC and could be applied as a diagnostic marker for AVC.
12

Jing, Gongchao, Yufeng Zhang, Ming Yang, Lu Liu, Jian Xu, and Xiaoquan Su. "Dynamic Meta-Storms enables comprehensive taxonomic and phylogenetic comparison of shotgun metagenomes at the species level." Bioinformatics 36, no. 7 (December 3, 2019): 2308–10. http://dx.doi.org/10.1093/bioinformatics/btz910.

Abstract:
Motivation: An accurate and reliable distance (or dissimilarity) among shotgun metagenomes is fundamental to deducing the beta-diversity of microbiomes. To compute the distance at the species level, current methods either ignore the evolutionary relationship among species or fail to account for unclassified organisms that cannot be mapped to definite tip nodes in the phylogenetic tree, and thus can produce erroneous beta-diversity patterns. Results: To solve these problems, we propose the Dynamic Meta-Storms (DMS) algorithm to enable the comprehensive comparison of metagenomes on the species level with both taxonomy and phylogeny profiles. It compares the identified species of metagenomes with phylogeny, and then dynamically places the unclassified species to the virtual nodes of the phylogeny tree via their higher-level taxonomy information. Its high speed and low memory consumption enable pairwise comparison of 100 000 metagenomes (synthesized from 3688 bacteria) within 6.4 h on a single computing node. Availability and implementation: An optimized implementation of DMS is available on GitHub (https://github.com/qibebt-bioinfo/dynamic-meta-storms) under a GNU GPL license. It takes the species-level profiles of metagenomes as input, and generates their pairwise distance matrix. The bacterial species-level phylogeny tree and taxonomy information of MetaPhlAn2 have been integrated into this implementation, while customized tree and taxonomy are also supported. Supplementary information: Supplementary data are available at Bioinformatics online.
13

Lai, Qing, and Haifei Feng. "An Immune-Related Prognostic Risk Model in Colon Cancer by Bioinformatics Analysis." Evidence-Based Complementary and Alternative Medicine 2022 (August 27, 2022): 1–13. http://dx.doi.org/10.1155/2022/3640589.

Abstract:
Colon cancer is one of the leading malignancies with poor prognosis worldwide. Immune cell infiltration has a potential prognostic value for colon cancer. This study aimed to establish an immune-related prognostic risk model for colon cancer by bioinformatics analysis. A total of 1670 differentially expressed genes (DEGs), including 177 immune-related genes, were identified from The Cancer Genome Atlas (TCGA) dataset. A prognostic risk model was constructed based on six critical immune-related genes (C-X-C motif chemokine ligand 1 (CXCL1), epiregulin (EREG), C-C motif chemokine ligand 24 (CCL24), fatty acid binding protein 4 (FABP4), tropomyosin 2 (TPM2), and semaphorin 3G (SEMA3G)). This model was validated using the microarray dataset GSE35982. In addition, Cox regression analysis showed that age and clinical stage were correlated with prognostic risk scores. Kaplan–Meier survival analysis showed that high risk scores correlated with low survival probabilities in patients with colon cancer. Downregulated TPM2, FABP4, and SEMA3G levels were positively associated with the activated mast cells, monocytes, and macrophages M2. Upregulated CXCL1 and EREG were positively correlated with macrophages M1 and activated T cells CD4 memory, respectively. Based on these results, we can conclude that the proposed prognostic risk model presents promising novel signatures for the diagnosis and prognosis prediction of colon cancer. This model may provide therapeutic benefits for the development of immunotherapy for colon cancer.
14

Hajibaba, Majid, Mohsen Sharifi, and Saeid Gorgin. "The Influence of Memory-Aware Computation on Distributed BLAST." Current Bioinformatics 14, no. 2 (January 7, 2019): 157–63. http://dx.doi.org/10.2174/1574893613666180601080811.

Abstract:
Background: One of the pivotal challenges in genomic research today is the fast processing of voluminous data such as that generated by high-throughput Next-Generation Sequencing technologies. However, BLAST (Basic Local Alignment Search Tool), a long-established and renowned tool in bioinformatics, has been shown to be very slow in this regard. Objective: To improve the performance of BLAST on voluminous data, we have applied a novel memory-aware technique to BLAST for faster parallel processing. Method: We have used a master-worker model alongside a memory-aware technique in which the master partitions the whole data into equal chunks, one chunk for each worker, and each worker then further splits and formats its allocated chunk according to the size of its memory. Each worker searches its splits one by one against a list of queries. Results: We have chosen a list of queries with different lengths to run insensitive searches in a huge database called UniProtKB/TrEMBL. Our experiments show a 20 percent improvement in performance when workers used our proposed memory-aware technique compared to when they were not memory aware. Experiments show an even higher performance improvement, approximately 50 percent, when we applied our memory-aware technique to mpiBLAST. Conclusion: We have shown that memory awareness in formatting a bulky database, when running BLAST, can improve performance significantly while preventing unexpected crashes in low-memory environments. Even though distributed computing attempts to reduce search time by partitioning and distributing database portions, our memory-aware technique additionally alleviates the negative effects of page faults on performance.
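The two-level partitioning described in the Method section can be sketched as follows. This is a hedged illustration only: the function names, the use of sequence length as a stand-in for memory footprint, and the greedy splitting rule are assumptions, and the actual BLAST database formatting and search steps are omitted.

```python
import math


def partition_equal(records, n_workers):
    """Master step: cut the database into contiguous, near-equal chunks.

    May return fewer than n_workers chunks when there are few records.
    """
    size = max(1, math.ceil(len(records) / n_workers))
    return [records[i:i + size] for i in range(0, len(records), size)]


def split_by_memory(chunk, budget):
    """Worker step: greedily group records so each split fits the budget.

    Assumption: a record's memory cost is approximated by its length. A
    single record larger than the budget still gets a split of its own.
    """
    splits, current, used = [], [], 0
    for rec in chunk:
        if current and used + len(rec) > budget:
            splits.append(current)
            current, used = [], 0
        current.append(rec)
        used += len(rec)
    if current:
        splits.append(current)
    return splits


# Toy database of six sequences shared between two workers.
records = ["AAAA", "CCCC", "GGGG", "TTTT", "AA", "CC"]
chunks = partition_equal(records, n_workers=2)
worker0_splits = split_by_memory(chunks[0], budget=8)
worker1_splits = split_by_memory(chunks[1], budget=8)
```

Each worker would then search its splits one at a time, so that only one split needs to reside in memory at once.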
15

Li, Yu, Sheng Wang, Chongwei Bi, Zhaowen Qiu, Mo Li, and Xin Gao. "DeepSimulator1.5: a more powerful, quicker and lighter simulator for Nanopore sequencing." Bioinformatics 36, no. 8 (January 8, 2020): 2578–80. http://dx.doi.org/10.1093/bioinformatics/btz963.

Abstract:
Motivation: Nanopore sequencing is one of the leading third-generation sequencing technologies. A number of computational tools have been developed to facilitate the processing and analysis of the Nanopore data. Previously, we have developed DeepSimulator1.0 (DS1.0), which is the first simulator for Nanopore sequencing to produce both the raw electrical signals and the reads. However, although DS1.0 can produce high-quality reads, for some sequences, the divergence between the simulated raw signals and the real signals can be large. Furthermore, the Nanopore sequencing technology has evolved greatly since DS1.0 was released. It is thus necessary to update DS1.0 to accommodate those changes. Results: We propose DeepSimulator1.5 (DS1.5), all three modules of which have been updated substantially from DS1.0. As for the sequence generator, we updated the sample read length distribution to reflect the newest real reads' features. In terms of the signal generator, which is the core of DeepSimulator, we added one more pore model, the context-independent pore model, which is much faster than the previous context-dependent one. Furthermore, to make the generated signals more similar to the real ones, we added a low-pass filter to post-process the pore model signals. Regarding the basecaller, we added the support for the newest official basecaller, Guppy, which can support both GPU and CPU. In addition, multiple optimizations, related to multiprocessing control, memory and storage management, have been implemented to make DS1.5 a much more amenable and lighter simulator than DS1.0. Availability and implementation: The main program and the data are available at https://github.com/lykaust15/DeepSimulator. Supplementary information: Supplementary data are available at Bioinformatics online.
16

Li, Xiangtao, Shixiong Zhang, and Ka-Chun Wong. "Single-cell RNA-seq interpretations using evolutionary multiobjective ensemble pruning." Bioinformatics 35, no. 16 (December 28, 2018): 2809–17. http://dx.doi.org/10.1093/bioinformatics/bty1056.

Abstract:
Motivation: In recent years, single-cell RNA sequencing has enabled us to discover cell types or even subtypes. Its increasing availability provides opportunities to identify cell populations from single-cell RNA-seq data. Computational methods have been employed to reveal the gene expression variations among multiple cell populations. Unfortunately, the existing ones can suffer from realistic restrictions such as experimental noise, numerical instability, high dimensionality and computational scalability. Results: We propose an evolutionary multiobjective ensemble pruning algorithm (EMEP) that addresses those realistic restrictions. Our EMEP algorithm first applies unsupervised dimensionality reduction to project data from the original high dimensions to low-dimensional subspaces; basic clustering algorithms are applied in those new subspaces to generate different clustering results to form cluster ensembles. However, most of those cluster ensembles are unnecessarily bulky, at the expense of extra time costs and memory consumption. To overcome that problem, EMEP is designed to dynamically select the suitable clustering results from the ensembles. Moreover, to guide the multiobjective ensemble evolution, three cluster validity indices, namely the overall cluster deviation, the within-cluster compactness and the number of basic partition clusters, are formulated as the objective functions to unleash its cell type discovery performance using evolutionary multiobjective optimization. We applied EMEP to 55 simulated datasets and seven real single-cell RNA-seq datasets, including six single-cell RNA-seq datasets and one large-scale dataset with 3005 cells and 4412 genes. Two case studies are also conducted to reveal mechanistic insights into the biological relevance of EMEP. We found that EMEP can achieve superior performance over the other clustering algorithms, demonstrating that EMEP can identify cell populations clearly. Availability and implementation: EMEP is written in Matlab and available at https://github.com/lixt314/EMEP. Supplementary information: Supplementary data are available at Bioinformatics online.
17

Luo, Zhiwen, and Xinyu Bi. "Recurrence after surgery for concurrent metastatic colorectal cancer: The perspective of bioinformatics and machine learning." Journal of Clinical Oncology 38, no. 15_suppl (May 20, 2020): 4043. http://dx.doi.org/10.1200/jco.2020.38.15_suppl.4043.

Abstract:
Background: Recurrence of concurrent metastatic colorectal cancers (mCRCs) after surgery is still a challenge. However, mCRC outcomes are heterogeneous, and no clinicopathological methods can predict recurrence and guide postoperative treatment from the perspective of intrinsic cell activities and the extrinsic immune microenvironment. We aimed to identify such gene models. Methods: Gene expression analysis was performed on CRCs. Based on metastasis-related genes, a metastatic evaluation model (MEM) was developed, dividing mCRCs into high and low recurrence risk clusters. Machine learning tested MEM's importance in predicting recurrence. Further investigation of MEM's two clusters produced an immune prognostic model (IPM) built from immune genes differentially expressed between MEM clusters. The predictive performance of MEM and IPM on prognosis was comprehensively analyzed and validated. The mechanism of IPM on the immune microenvironment and response to immuno/chemotherapy was analyzed extensively. Results: RNA data of 998 CRCs were analyzed. High postoperative recurrence risk in mCRCs was owing to down-regulation of the immune response, which was influenced by 3 MEM genes (BAMBI, F13A1, LCN2) and their related 3 IPM genes (SLIT2, CDKN2A, CLU). MEM and IPM were developed and validated on 239 mCRCs to differentiate low and high recurrence risk (AUCs > 0.7). Functional enrichment analysis showed that immune response and immune system diseases pathways represented the major function and pathway related to IPM genes. The IPM high-risk group (IPM-high) had higher fractions of Tregs (P = 0.04) and lower fractions of resting memory CD4+ T cells (P = 0.02) than IPM-low. Stroma and immune cells in IPM-high samples were scant (P = 0.0002 and 0.001, respectively). In IPM-high, MHC class II molecules were all down-expressed, and DNA methylation was disordered. TIDE algorithm and GDSC analysis found that IPM-low was more likely to respond to both anti-CTLA4 therapy (P = 0.005) and common FDA targeted drugs (P < 0.05), while IPM-high responded to neither. However, an anti-CDKN2A agent with activation of the MHC class II response might reverse the dilemma of this refractory mCRC subgroup. Conclusions: Postoperative recurrence of mCRC is strongly related to the immune microenvironment. Our two related gene models could identify subgroups of mCRC with different recurrence risk, stratify mCRCs sensitive to immuno/chemotherapy, and highlight for the first time the overlooked importance of MHC class II molecules for immunotherapy in mCRCs.
18

Tang, Deyou, Daqiang Tan, Weihao Xiao, Jiabin Lin, and Juan Fu. "KMC3 and CHTKC: Best Scenarios, Deficiencies, and Challenges in High-Throughput Sequencing Data Analysis." Algorithms 15, no. 4 (March 24, 2022): 107. http://dx.doi.org/10.3390/a15040107.

Abstract:
Background: K-mer frequency counting is an upstream process of many bioinformatics data analysis workflows. KMC3 and CHTKC are the representative partition-based k-mer counting and non-partition-based k-mer counting algorithms, respectively. This paper evaluates the two algorithms and presents their best applicable scenarios and potential improvements using multiple hardware contexts and datasets. Results: KMC3 uses less memory and runs faster than CHTKC on a regular configuration server. CHTKC is efficient on high-performance computing platforms with high available memory, multi-thread, and low IO bandwidth. When tested with various datasets, KMC3 is less sensitive to the number of distinct k-mers and is more efficient for tasks with relatively low sequencing quality and long k-mer. CHTKC performs better than KMC3 in counting assignments with large-scale datasets, high sequencing quality, and short k-mer. Both algorithms are affected by IO bandwidth, and decreasing the influence of the IO bottleneck is critical as our tests show improvement by filtering and compressing consecutive first-occurring k-mers in KMC3. Conclusions: KMC3 is more competitive for running counter on ordinary hardware resources, and CHTKC is more competitive for counting k-mers in super-scale datasets on higher-performance computing platforms. Reducing the influence of the IO bottleneck is essential for optimizing the k-mer counting algorithm, and filtering and compressing low-frequency k-mers is critical in relieving IO impact.
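The partition-based strategy that this abstract attributes to KMC3 can be illustrated with a minimal sketch. It is not KMC3's implementation: the function names, the CRC32-based bin assignment, and the in-memory lists standing in for on-disk bins are all assumptions made for illustration.

```python
import zlib
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers_partitioned(reads, k, num_partitions=4):
    """Count canonical k-mers in two passes (hypothetical helper).

    Pass 1 distributes k-mers into bins; a real tool would write the bins to disk.
    Pass 2 counts one bin at a time, so only one bin must fit in memory at once.
    """
    bins = [[] for _ in range(num_partitions)]
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
                c = canonical(kmer)
                bins[zlib.crc32(c.encode()) % num_partitions].append(c)

    counts = {}
    for partition in bins:
        # Identical k-mers always land in the same bin, so bins never share keys.
        counts.update(Counter(partition))
    return counts
```

Because each bin is counted independently, peak memory in such a scheme is bounded by the largest bin rather than by the whole k-mer set, at the cost of the extra I/O that the abstract identifies as the bottleneck.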
19

Jin, Hai, Hao Qi, Jin Zhao, Xinyu Jiang, Yu Huang, Chuangyi Gui, Qinggang Wang, et al. "Software Systems Implementation and Domain-Specific Architectures towards Graph Analytics." Intelligent Computing 2022 (October 29, 2022): 1–32. http://dx.doi.org/10.34133/2022/9806758.

Abstract:
Graph analytics, which mainly includes graph processing, graph mining, and graph learning, has become increasingly important in several domains, including social network analysis, bioinformatics, and machine learning. However, graph analytics applications suffer from poor locality, limited bandwidth, and low parallelism owing to the irregular sparse structure, explosive growth, and dependencies of graph data. To address those challenges, several programming models, execution modes, and messaging strategies are proposed to improve the utilization of traditional hardware and performance. In recent years, novel computing and memory devices have emerged, e.g., HMCs, HBM, and ReRAM, providing massive bandwidth and parallelism resources, making it possible to address bottlenecks in graph applications. To facilitate understanding of the graph analytics domain, our study summarizes and categorizes current software systems implementation and domain-specific architectures. Finally, we discuss the future challenges of graph analytics.
20

Yook, Jang Soo, Randeep Rakwal, Junko Shibato, Kanako Takahashi, Hikaru Koizumi, Takeru Shima, Mitsushi J. Ikemoto, Leandro K. Oharomari, Bruce S. McEwen, and Hideaki Soya. "Leptin in hippocampus mediates benefits of mild exercise by an antioxidant on neurogenesis and memory." Proceedings of the National Academy of Sciences 116, no. 22 (May 13, 2019): 10988–93. http://dx.doi.org/10.1073/pnas.1815197116.

Abstract:
Regular exercise and dietary supplements with antioxidants each have the potential to improve cognitive function and attenuate cognitive decline, and, in some cases, they enhance each other. Our current results reveal that low-intensity exercise (mild exercise, ME) and the natural antioxidant carotenoid astaxanthin (AX) each have equivalent beneficial effects on hippocampal neurogenesis and memory function. We found that the enhancement by ME combined with AX in potentiating hippocampus-based plasticity and cognition is mediated by leptin (LEP) made and acting in the hippocampus. In assessing the combined effects upon wild-type (WT) mice undergoing ME with or without an AX diet for four weeks, we found that, when administrated alone, ME and AX separately enhanced neurogenesis and spatial memory, and when combined they were at least additive in their effects. DNA microarray and bioinformatics analyses revealed not only the up-regulation of an antioxidant gene, ABHD3, but also that the up-regulation of LEP gene expression in the hippocampus of WT mice with ME alone is further enhanced by AX. Together, they also increased hippocampal LEP (h-LEP) protein levels and enhanced spatial memory mediated through AKT/STAT3 signaling. AX treatment also has direct action on human neuroblastoma cell lines to increase cell viability associated with increased LEP expression. In LEP-deficient mice (ob/ob), chronic infusion of LEP into the lateral ventricles restored the synergy. Collectively, our findings suggest that not only h-LEP but also exogenous LEP mediates effects of ME on neural functions underlying memory, which is further enhanced by the antioxidant AX.
21

Chu, Justin, Hamid Mohamadi, Emre Erhan, Jeffery Tse, Readman Chiu, Sarah Yeo, and Inanc Birol. "Mismatch-tolerant, alignment-free sequence classification using multiple spaced seeds and multiindex Bloom filters." Proceedings of the National Academy of Sciences 117, no. 29 (July 8, 2020): 16961–68. http://dx.doi.org/10.1073/pnas.1903436117.

Abstract:
Alignment-free classification tools have enabled high-throughput processing of sequencing data in many bioinformatics analysis pipelines primarily due to their computational efficiency. Originally k-mer based, such tools often lack sensitivity when faced with sequencing errors and polymorphisms. In response, some tools have been augmented with spaced seeds, which are capable of tolerating mismatches. However, spaced seeds have seen little practical use in classification because they bring increased computational and memory costs compared to methods that use k-mers. These limitations have also caused the design and length of practical spaced seeds to be constrained, since storing spaced seeds can be costly. To address these challenges, we have designed a probabilistic data structure called a multiindex Bloom Filter (miBF), which can store multiple spaced seed sequences with a low memory cost that remains static regardless of seed length or seed design. We formalize how to minimize the false-positive rate of miBFs when classifying sequences from multiple targets or references. Available within BioBloom Tools, we illustrate the utility of miBF in two use cases: read-binning for targeted assembly, and taxonomic read assignment. In our benchmarks, an analysis pipeline based on miBF shows higher sensitivity and specificity for read-binning than sequence alignment-based methods, also executing in less time. Similarly, for taxonomic classification, miBF enables higher sensitivity than a conventional spaced seed-based approach, while using half the memory and an order of magnitude less computational time.
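The idea of storing spaced seeds in a Bloom filter can be sketched in a few lines. This is a plain Bloom filter, not the miBF described above (which additionally associates entries with target IDs); the class, the helper names, the seed pattern, and the BLAKE2b-based hashing are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array; memory does not depend on the length of stored items."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive independent hash values by salting BLAKE2b with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def apply_seed(window, seed):
    """Keep bases at '1' positions of the seed and mask '0' positions with '*'."""
    return "".join(base if s == "1" else "*" for base, s in zip(window, seed))


def index_sequence(bloom, sequence, seed):
    """Insert every seed-masked window of a reference sequence."""
    w = len(seed)
    for i in range(len(sequence) - w + 1):
        bloom.add(apply_seed(sequence[i:i + w], seed))


def count_hits(bloom, read, seed):
    """Count read windows whose masked form is (probably) in the filter."""
    w = len(seed)
    return sum(
        apply_seed(read[i:i + w], seed) in bloom for i in range(len(read) - w + 1)
    )
```

With the seed `1101`, the read window `ACTT` masks to `AC*T`, the same key as the reference window `ACGT`, so a mismatch at the masked position is tolerated. A miss is always correct, whereas a hit can occasionally be a false positive.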
22

Hansson, Lotta, Hodjattallah Rabbani, Jan Fagerberg, Anders Österborg, and Håkan Mellstedt. "T-cell epitopes within the complementarity-determining and framework regions of the tumor-derived immunoglobulin heavy chain in multiple myeloma." Blood 101, no. 12 (June 15, 2003): 4930–36. http://dx.doi.org/10.1182/blood-2002-04-1250.

Abstract:
The idiotypic structure of the monoclonal immunoglobulin (Ig) in multiple myeloma (MM) might be regarded as a tumor-specific antigen. The present study was designed to identify T-cell epitopes of the variable region of the Ig heavy chain (VH) in MM (n = 5) using bioinformatics and analyze the presence of naturally occurring T cells against idiotype-derived peptides. A large number of human-leukocyte-antigen (HLA)–binding (class I and II) peptides were identified. The frequency of predicted epitopes depended on the database used: 245 in bioinformatics and molecular analysis section (BIMAS) and 601 in SYFPEITHI. Most of the peptides displayed a binding half-life or score in the low or intermediate affinity range. The majority of the predicted peptides were complementarity-determining region (CDR)– rather than framework region (FR)–derived (52%-60% vs 40%-48%, respectively). Most of the predicted peptides were confined to the CDR2-FR3-CDR3 “geographic” region of the Ig-VH region (70%), and significantly fewer peptides were found within the flanking (FR1-CDR1-FR2 and FR4) regions (P < .01). There were 8– to 10–amino acid (aa) long peptides corresponding to the CDRs and fitting to the actual HLA-A/B haplotypes that spontaneously recognized, albeit with a low magnitude, type I T cells (interferon γ), indicating an ongoing major histocompatibility complex (MHC) class I–restricted T-cell response. Most of those peptides had a low binding half-life (BIMAS) and a low/intermediate score (SYFPEITHI). Furthermore, 15- to 20-aa long CDR1-3–derived peptides also spontaneously recognized type I T cells, indicating the presence of MHC class II–restricted T cells as well. This study demonstrates that a large number of HLA-binding idiotypic peptides can be identified in patients with MM. Such peptides may spontaneously induce a type I MHC class I– as well as class II–restricted memory T-cell response.
23

Sun, Heng, Bowen Sui, Yu Li, Jun Yan, Mingming Cao, Lijia Zhang, and Songjiang Liu. "Analysis of the Significance of Immune Cell Infiltration and Prognosis of Non-Small-Cell Lung Cancer by Bioinformatics." Journal of Healthcare Engineering 2021 (September 22, 2021): 1–8. http://dx.doi.org/10.1155/2021/3284186.

Abstract:
Objective. To perform gene set enrichment analysis (GSEA) and analysis of immune cell infiltration on non-small-cell lung cancer (NSCLC) expression profiling microarray data based on bioinformatics, construct TICS scoring model to distinguish prognosis time, screen key genes and cancer-related pathways for NSCLC treatment, explore differential genes in NSCLC patients, predict potential therapeutic targets for NSCLC, and provide new directions for the treatment of NSCLC. Methods. Transcriptome data of 81 NSCLC patients and the GEO database were used to download matching clinical data (access number: GSE120622). Form the expression of non-small cell lung cancer (NSCLC). TICS values were calculated and grouped according to TICS values, and we used mRNA expression profile data to perform GSEA in non-small-cell lung cancer patients. Biological process (GO) analysis and DAVID and KOBAS were used to undertake pathway enrichment (KEGG) analysis of differential genes. Use protein interaction (PPI) to analyze the database STRING, and construct a PPI network model of target interaction. Results. We obtained 6 significantly related immune cells including activated B cells through the above analysis (Figure 1(b), p < 0.001). Based on the TICS values of significantly correlated immune cells, 41 high-risk and 40 low-risk samples were obtained. TICS values and immune score values were subjected to Pearson correlation coefficient calculation, and TICS and IMS values were found to be significantly correlated (Cor = 0.7952). Based on non-small-cell lung cancer mRNA expression profile data, a substantial change in mRNA was found between both the high TICS group as well as the low TICS group (FDR 0.01, FC > 2). The researchers discovered 730 mRNAs that were considerably upregulated in the high TICS group and 121 mRNAs that were considerably downregulated in the low TICS group.
High confidence edges (combined score >0.7) were selected using STRING data; then, 191 mRNAs were matched to the reciprocal edges; finally, an undirected network including 164 points and 777 edges was constructed. Important members of cellular chemokine-mediated signaling pathways, such as CCL19, affect patient survival time. Conclusion. (1) The longevity of patients with non-small-cell lung cancer was substantially connected with the presence of immature B cells, activated B cells, MDSC, effector memory CD4 T cells, eosinophils, and regulatory T cells. (2) Immune-related genes such as CX3CR1, CXCR4, CXCR5, and CCR7, which are associated with the survival of NSCLC, affect the prognosis of NSCLC patients by regulating the immune process.
24

Fan, Wei, Jun Ding, Shushu Liu, and Wei Zhong. "Development and validation of novel prognostic models based on RNA-binding proteins in breast cancer." Journal of International Medical Research 50, no. 6 (June 2022): 030006052211062. http://dx.doi.org/10.1177/03000605221106285.

Abstract:
Objectives We aimed to construct novel prognostic models based on RNA-binding proteins (RBPs) in breast cancer (BRCA) and explore their roles in this disease and their effects on tumor-infiltrating immune cells (TIICs). Methods Datasets were downloaded from the Gene Expression Omnibus (GEO) database. Functions and prognostic values of RBPs were systematically investigated using a series of bioinformatics analysis methods. TIICs were assessed using CIBERSORT. Results Overall, 138 differentially expressed RBPs were identified, of which 86 were upregulated and 52 were downregulated. Of these, 13 RBPs were identified as prognosis-related and adopted to construct an overall survival (OS) model, while 12 RBPs were used for the relapse-free survival (RFS) model. High-risk patients had poorer OS and RFS rates than low-risk patients. The results indicate that the OS and RFS models are good prognostic models with reliable predictive abilities. In addition, the proportions of CD8, CD4 naïve, and CD4 memory resting T cells, as well as resting dendritic cells, were significantly different between the low-risk and high-risk groups in the OS model. Conclusions OS and RFS signatures can be used as reliable BRCA prognostic biomarkers. This work will help understand the prognostic roles and functions of RBPs in BRCA.
25

Dezordi, Filipe Zimmer, Antonio Marinho da Silva Neto, Túlio de Lima Campos, Pedro Miguel Carneiro Jeronimo, Cleber Furtado Aksenen, Suzana Porto Almeida, and Gabriel Luz Wallau. "ViralFlow: A Versatile Automated Workflow for SARS-CoV-2 Genome Assembly, Lineage Assignment, Mutations and Intrahost Variant Detection." Viruses 14, no. 2 (January 23, 2022): 217. http://dx.doi.org/10.3390/v14020217.

Abstract:
The COVID-19 pandemic is driven by Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2) that emerged in 2019 and quickly spread worldwide. Genomic surveillance has become the gold standard methodology used to monitor and study this fast-spreading virus and its constantly emerging lineages. The current deluge of SARS-CoV-2 genomic data generated worldwide has put additional pressure on the urgent need for streamlined bioinformatics workflows. Here, we describe a workflow developed by our group to process and analyze large-scale SARS-CoV-2 Illumina amplicon sequencing data. This workflow automates all steps of SARS-CoV-2 reference-based genomic analysis: data processing, genome assembly, PANGO lineage assignment, mutation analysis and the screening of intrahost variants. The pipeline is capable of processing a batch of around 100 samples in less than half an hour on a personal laptop or in less than five minutes on a server with 50 threads. The workflow presented here is available through Docker or Singularity images, allowing for implementation on laptops for small-scale analyses or on high processing capacity servers or clusters. Moreover, the low requirements for memory and CPU cores and the standardized results provided by ViralFlow highlight it as a versatile tool for SARS-CoV-2 genomic analysis.
26

Strycharz, Justyna, Ewa Świderska, Adam Wróblewski, Marta Podolska, Piotr Czarny, Janusz Szemraj, Aneta Balcerczyk, Józef Drzewoski, Jacek Kasznicki, and Agnieszka Śliwińska. "Hyperglycemia Affects miRNAs Expression Pattern during Adipogenesis of Human Visceral Adipocytes—Is Memorization Involved?" Nutrients 10, no. 11 (November 15, 2018): 1774. http://dx.doi.org/10.3390/nu10111774.

Abstract:
microRNAs are increasingly analyzed in adipogenesis, whose deregulation, especially visceral, contributes to the development of diabetes. Hyperglycemia is known to affect cells while occurring acutely and chronically. Therefore, we aimed to evaluate the effect of hyperglycemia on human visceral pre/adipocytes from the perspective of microRNAs. The relative expression of 78 microRNAs was determined by TaqMan Low Density Arrays at three stages of HPA-v adipogenesis conducted under normoglycemia, chronic, and intermittent hyperglycemia (30 mM). Hierarchical clustering/Pearson correlation revealed the relationship between various microRNAs’ expression profiles, while functional analysis identified the genes and signaling pathways regulated by differentially expressed microRNAs. Hyperglycemia affected microRNAs’ expression patterns during adipogenesis, and at the stage of pre-adipocytes, differentiated and matured adipocytes compared to normoglycemia. Interestingly, the changes that were evoked upon hyperglycemic exposure during one adipogenesis stage resembled those observed upon chronic hyperglycemia. At least 15 microRNAs were modulated during normoglycemic and/or hyperglycemic adipogenesis and/or upon intermittent/chronic hyperglycemia. Bioinformatics analysis revealed the involvement of these microRNAs in cell cycles, lipid metabolism, ECM–receptor interaction, oxidative stress, signaling of insulin, MAPK, TGF-β, p53, and more. The obtained data suggests that visceral pre/adipocytes exposed to chronic/intermittent hyperglycemia develop a microRNAs’ expression pattern, which may contribute to further visceral dysfunction, the progression of diabetic phenotype, and diabetic complications possibly involving “epi”-memory.
27

Xiao, Gong, Qiongjing Yuan, and Wei Wang. "The Prognostic Value of the m6A Score in Multiple Myeloma Based on Machine Learning." BioMedInformatics 1, no. 3 (September 27, 2021): 77–87. http://dx.doi.org/10.3390/biomedinformatics1030006.

Abstract:
Background: Multiple myeloma (MM) is one of the most common cancers of the blood system. N6-methyladenosine (m6A) plays an important role in cancer progression. We aimed to investigate the prognostic relevance of the m6A score in multiple myeloma through a series of bioinformatics analyses. Methods: The microarray dataset GSE4581 and GSE57317 used in this study were downloaded from the Gene Expression Omnibus (GEO) database. The m6A score was calculated using the GSVA package. The Random forests, univariate Cox regression analysis and Lasso analyses were performed for the differentially expressed genes (DEGs). Kaplan–Meier analysis and an ROC curve were used to diagnose the effectiveness of the model. Results: The GSVA R software package was used to predict the function. A total of 21 m6A genes were obtained, and 286 DEGs were identified between high and low m6A score groups. The risk model was constructed and composed of PRX, LBR, RB1, FBXL19-AS1, ARSK, MFAP3L, SLC44A3, UNC119 and SHCBP1. Functional analysis of risk score showed that with the increase in the risk score, Activated CD4 T cells, Memory B cells and Type 2 T helper cells were highly infiltrated. Conclusions: Immune checkpoints such as HMGB1, TGFB1, CXCL9 and HAVCR2 were significantly positively correlated with the risk score. We believe that the m6A score has a certain prognostic value in multiple myeloma.
28

Li, Chunzhen, Weizheng Zhou, Ji Zhu, Qi Shen, Guangjie Wang, Ling Chen, and Tiejun Zhao. "Identification of an Immune-Related Gene Signature Associated with Prognosis and Tumor Microenvironment in Esophageal Cancer." BioMed Research International 2022 (December 23, 2022): 1–22. http://dx.doi.org/10.1155/2022/7413535.

Abstract:
Background. Esophageal cancer (EC) is a common malignant tumor of the digestive system with high mortality and morbidity. Current evidence suggests that immune cells and molecules regulate the initiation and progression of EC. Accordingly, it is necessary to identify immune-related genes (IRGs) affecting the biological behaviors and microenvironmental characteristics of EC. Methods. Bioinformatics methods, including differential expression analysis, Cox regression, and immune infiltration prediction, were conducted using R software to analyze the Gene Expression Omnibus (GEO) dataset. The Cancer Genome Atlas (TCGA) cohort was used to validate the prognostic signature. Patients were stratified into high- and low-risk groups for further analyses, including functional enrichment, immune infiltration, checkpoint relevance, clinicopathological characteristics, and therapeutic sensitivity analyses. Results. A prognostic signature was established based on 21 IRGs (S100A7, S100A7A, LCN1, CR2, STAT4, GAST, ANGPTL5, TRAV39, F2RL2, PGLYRP3, KLRD1, TRIM36, PDGFA, SLPI, PCSK2, APLN, TICAM1, ITPR3, MAPK9, GATA4, and PLAU). Compared with high-risk patients, better overall survival rates and clinicopathological characteristics were found in low-risk patients. The areas under the curve of the two cohorts were 0.885 and 0.718, respectively. Higher proportions of resting CD4+ memory T lymphocytes, M2 macrophages, and resting dendritic cells and lower proportions of follicular helper T lymphocytes, plasma cells, and neutrophils were found in the high-risk tumors. Moreover, the high-risk group showed higher expression of CD44 and TNFSF4, lower expression of PDCD1 and CD40, and higher TIDE scores, suggesting they may respond poorly to immunotherapy. High-risk patients responded better to chemotherapeutic agents such as docetaxel, doxorubicin, and gemcitabine. Furthermore, IRGs associated with tumor progression, including PDGFA, ITPR3, SLPI, TICAM1, and GATA4, were identified. 
Conclusion. Our immune-related signature yielded reliable value in evaluating the prognosis, microenvironmental characteristics, and therapeutic sensitivity of EC and may help with the precise treatment of this patient population.
29

Zhaoran, Su, and Jia Weidong. "INSC Is a Prognosis-Associated Biomarker Involved in Tumor Immune Infiltration in Colon Adenocarcinoma." BioMed Research International 2022 (September 12, 2022): 1–9. http://dx.doi.org/10.1155/2022/5794150.

Abstract:
Aims. The purpose of this study was to investigate the correlation of INSC gene with the level of immune infiltration and clinical prognosis in colon adenocarcinoma (COAD) patients. Materials and Methods. INSC expression profile data and clinicopathological information of COAD patients were downloaded from TCGA. Xiantao bioinformatics tool was used to analyze the expression of INSC between the COAD group and the normal control group, and GEPIA2 was used to analyze the top 100 coexpressed genes. Logistic regression analysis was performed to assess the relationship between clinicopathological features and INSC. The Kaplan-Meier method and Cox regression model were used to perform the survival analysis. CIBERSORT algorithm was used to analyze the relationship between INSC expression and immune infiltration cells. Results. The expression level of INSC in COAD was significantly downregulated. The result of logistic regression analysis confirmed that tumor stage was the final influencing factor of INSC expression. The overall survival rate of INSC in the high expression group was higher than that of the low expression group, and it was an independent risk factor of prognosis. Enrichment results indicated that INSC was enriched in the regulation of T-helper 2 cell differentiation pathway. Immune infiltration analysis showed that INSC expression was positively correlated with the B cell plasma, T cell CD4+ memory resting, activated myeloid dendritic cells, and eosinophils. Conclusions. Our study found that the expression of INSC was significantly downregulated in COAD, which regulated immune-infiltrating cells during cancer development and was associated with malignant progression in COAD patients.
30

Dong, Gaifang, Xueliang Fu, Honghui Li, and Xu Pan. "An Accurate Sequence Assembly Algorithm for Livestock, Plants and Microorganism Based on Spark." International Journal of Pattern Recognition and Artificial Intelligence 31, no. 08 (May 9, 2017): 1750024. http://dx.doi.org/10.1142/s0218001417500240.

Abstract:
Sequence Assembly is one of the important topics in bioinformatics research. Sequence assembly algorithm has always met the problems of poor assembling precision and low efficiency. In view of these two problems, this paper designs and implements a precise assembling algorithm under the strategy of finding the source of reads based on the MapReduce (SA-BR-MR) and Eulerian path algorithm. Computational results show that SA-BR-MR is more accurate than other algorithms. At the same time, SA-BR-MR calculates 54 sequences which are randomly selected from animals, plants and microorganisms with base lengths from hundreds to tens of thousands from NCBI. All matching rates of the 54 sequences are 100%. For each species, the algorithm summarizes the range of [Formula: see text] which makes the matching rates to be 100%. In order to verify the range of [Formula: see text] value of hepatitis C virus (HCV) and related variants, the randomly selected eight HCV variants are calculated. The results verify the correctness of [Formula: see text] range of hepatitis C and related variants from NCBI. The experiment results provide the basis for sequencing of other variants of the HCV. In addition, Spark platform is a new computing platform based on memory computation, which is featured by high efficiency and suitable for iterative calculation. Therefore, this paper designs and implements sequence assembling algorithm based on the Spark platform under the strategy of finding the source of reads (SA-BR-Spark). In comparison with SA-BR-MR, SA-BR-Spark shows a superior computational speed.
31

Fan, Yunjian, Jie Zhang, Jiayu Shi, Lin Chen, Jiazhen Long, Shuqi Zhang, and Shuguang Liu. "Genetic Cross-Talk between Oral Squamous Cell Carcinoma and Type 2 Diabetes: The Potential Role of Immunity." Disease Markers 2022 (May 19, 2022): 1–24. http://dx.doi.org/10.1155/2022/6389906.

Abstract:
Background. This bioinformatics study was aimed at evaluating type 2 diabetes (T2D) and oral squamous cell carcinoma (OSCC) with regard to related immune cells and prognosis. Methods. We downloaded the data on OSCC from TCGA and for T2D from GEO database. Differentially expressed genes were analyzed, i.e., for OSCC genes with p value < 0.01, log2 FC > 0; and for T2D, genes with p value < 0.05, log2 FC > 0. The intersected genes between OSCC and T2D were cross-talk genes. The expression values of immune-related genes in case samples in OSCC and T2D were assessed and underwent multivariate and univariate analysis (Cox-PH model). The intersection between the immune genes and cross-talk genes was taken and further analyzed by recursive feature elimination (RFE), survival analysis, and ROC analysis. Results. 1008 cross-talk genes were acquired, including 28 common upregulated, 440 common downregulated, and 540 differently regulated DEGs. We extracted the gene expression value of 782 immune-related genes, of which seven increased immune cells were obtained. From the results, plasmacytoid dendritic cells and effector memory CD8 T cells were highly negatively correlated in both OSCC and T2D. After estimating a low- and high-risk model for survival, we found that activated dendritic cell was significantly different between high and low groups (p = 0.0095), followed by plasmacytoid dendritic cell. We integrated DE_Immune genes set 1 and DE_Immune genes set 2 and eight key immune-related cross-talk genes (C1QC, ABCD1, NOS2, PDIA4, IL1RN, ALOX15, CSE1L, and PSMC4) were evaluated. After ROC analysis, we obtained that ABCD1, C1QC, CSE1L, and PSMC4 had higher classification and prediction effects on OSCC and T2D. Conclusion. This study revealed a close relationship between T2D and OSCC.
Thereby, plasmacytoid dendritic cell and activated dendritic cell-related genes were associated with the survival of T2D-related OSCC, while ABCD1, C1QC, CSE1L, and PSMC4 were the most important immune-related cross-talk genes.
32

Müller, Robert, and Markus Nebel. "On the use of sequence-quality information in OTU clustering." PeerJ 9 (August 16, 2021): e11717. http://dx.doi.org/10.7717/peerj.11717.

Abstract:
Background High-throughput sequencing has become an essential technology in life science research. Despite continuous improvements in technology, the produced sequences are still not entirely accurate. Consequently, the sequences are usually equipped with error probabilities. The quality information is already employed to find better solutions to a number of bioinformatics problems (e.g. read mapping). Data processing pipelines benefit in particular (especially when incorporating the quality information early), since enhanced outcomes of one step can improve all subsequent ones. Preprocessing steps, thus, quite regularly consider the sequence quality to fix errors or discard low-quality data. Other steps, however, like clustering sequences into operational taxonomic units (OTUs), a common task in the analysis of microbial communities, are typically performed without making use of the available quality information. Results In this paper, we present quality-aware clustering methods inspired by quality-weighted alignments and model-based denoising, and explore their applicability to OTU clustering. We implemented the quality-aware methods in a revised version of our de novo clustering tool GeFaST and evaluated their clustering quality and performance on mock-community data sets. Quality-weighted alignments were able to improve the clustering quality of GeFaST by up to 10%. The examination of the model-supported methods provided a more diverse picture, hinting at a narrower applicability, but they were able to attain similar improvements. Considering the quality information enlarged both runtime and memory consumption, even though the increase of the former depended heavily on the applied method and clustering threshold. Conclusions The quality-aware methods expand the iterative, de novo clustering approach by new clustering and cluster refinement methods. 
Our results indicate that OTU clustering constitutes yet another analysis step benefiting from the integration of quality information. Beyond the shown potential, the quality-aware methods offer a range of opportunities for fine-tuning and further extensions.
33

Kovjazin, Riva, David Shitrit, Rachel Preiss, Ilanit Haim, Lev Triezer, Leonardo Fuks, Abdel Rahman Nader, et al. "Characterization of Novel Multiantigenic Vaccine Candidates with Pan-HLA Coverage against Mycobacterium tuberculosis." Clinical and Vaccine Immunology 20, no. 3 (January 2, 2013): 328–40. http://dx.doi.org/10.1128/cvi.00586-12.

Abstract:
The low protection by the bacillus Calmette-Guérin (BCG) vaccine and existence of drug-resistant strains require better anti-Mycobacterium tuberculosis vaccines with a broad, long-lasting, antigen-specific response. Using bioinformatics tools, we identified five 19- to 40-mer signal peptide (SP) domain vaccine candidates (VCs) derived from M. tuberculosis antigens. All VCs were predicted to have promiscuous binding to major histocompatibility complex (MHC) class I and II alleles in large geographic territories worldwide. Peripheral mononuclear cells (PBMC) from healthy naïve donors and tuberculosis patients exhibited strong proliferation that correlated positively with Th1 cytokine secretion only in healthy naïve donors. Proliferation to SP VCs was superior to that to antigen-matched control peptides with similar length and various MHC class I and II binding properties. T-cell lines induced to SP VCs from healthy naïve donors had increased CD44high/CD62L+ activation/effector memory markers and gamma interferon (IFN-γ), but not interleukin-4 (IL-4), production in both CD4+ and CD8+ T-cell subpopulations. T-cell lines from healthy naïve donors and tuberculosis patients also manifested strong, dose-dependent, antigen-specific cytotoxicity against autologous VC-loaded or M. tuberculosis-infected macrophages. Lysis of M. tuberculosis-infected targets was accompanied by high IFN-γ secretion. Various combinations of these five VCs manifested synergic proliferation of PBMC from selected healthy naïve donors. Immunogenicity of the best three combinations, termed Mix1, Mix2, and Mix3 and consisting of 2 to 5 of the VCs, was then evaluated in mice. Each mixture manifested strong cytotoxicity against M. tuberculosis-infected macrophages, while Mix3 also manifested a VC-specific humoral immune response. Based on these results, we plan to evaluate the protection properties of these combinations as an improved tuberculosis subunit vaccine.
34

Chen, Peixin, Lishu Zhao, Hao Wang, Liping Zhang, Wei Zhang, Jun Zhu, Jia Yu, et al. "Human leukocyte antigen class II-based immune risk model for recurrence evaluation in stage I–III small cell lung cancer." Journal for ImmunoTherapy of Cancer 9, no. 8 (August 2021): e002554. http://dx.doi.org/10.1136/jitc-2021-002554.

Abstract:
Background: Immunotherapy has revolutionized therapeutic patterns of small cell lung cancer (SCLC). Human leukocyte antigen class II (HLA class II) is related to antitumor immunity. However, the implications of HLA class II in SCLC remain incompletely understood. Materials and methods: We investigated the expression patterns of HLA class II on tumor cells and tumor-infiltrating lymphocytes (TILs) by immunohistochemistry staining and its association with clinical parameters, immune markers, and recurrence-free survival (RFS) in 102 patients with stage I–III SCLC who underwent radical surgery. Additionally, an HLA class II-based immune risk model was established by least absolute shrinkage and selection operator regression. With bioinformatics methods, we investigated HLA class II-related enrichment pathways and the immune infiltration landscape in SCLC. Results: HLA class II on tumor cells and TILs was positively expressed in 9 (8.8%) and 45 (44.1%) patients with SCLC, respectively. HLA class II on TILs was negatively associated with lymph node metastasis and positively correlated with programmed death-ligand 1 (PD-L1) on TILs (p<0.001) and multiple immune markers (CD3, CD4, CD8, FOXP3; p<0.001). Lymph node metastasis (OR 0.314, 95% CI 0.118 to 0.838, p=0.021) and PD-L1 on TILs (OR 3.233, 95% CI 1.051 to 9.95, p=0.041) were independent predictive factors of HLA class II on TILs. HLA class II positivity on TILs was associated with a longer RFS (40.2 months, 95% CI 31.7 to 48.7 vs 28.8 months, 95% CI 21.4 to 36.3, p=0.014). HLA class II on TILs, PD-L1 on TILs, CD4, and FOXP3 were included in the immune risk model, which categorized patients into high-risk and low-risk groups and had better power for predicting recurrence than tumor stage. Pathway enrichment analyses showed that patients with high HLA class II expression demonstrated signatures of transmembrane transportation, channel activity, and neuroactive ligand–receptor interaction.
High-risk SCLC patients had a higher proportion of T follicular helper cells (p=0.034) and a lower proportion of activated memory CD4-positive T cells (p=0.040) and resting dendritic cells (p=0.045) than low-risk patients. Conclusions: HLA class II plays a crucial role in the tumor immune microenvironment and in recurrence prediction. This work demonstrates the prognostic and clinical value of HLA class II in patients with SCLC.
35

Дягель, А.Р., Зарубин, А.А., Назаренко, М.С., and Слепцов, А.А. "Single-cell RNA Sequencing Data Analysis Reveals Structural Diversity of T-lymphocyte and Macrophage Infiltration in Atherosclerosis." Nauchno-prakticheskii zhurnal «Medicinskaia genetika», no. 7 (July 29, 2022): 43–45. http://dx.doi.org/10.25557/2073-7998.2022.07.43-45.

Abstract:
Atherosclerosis is a multifactorial disease to which the immune response and the physiological condition of the arteries contribute substantially. The immune system, through both innate and adaptive responses, drives the pathogenesis of atherosclerosis and plays a pro-atherogenic rather than an atheroprotective role. T lymphocytes, a key part of the immune response, can exhibit both pro- and anti-atherogenic properties, so this cell population is of particular interest. In the present study, we analyzed T-cell subpopulations as a function of the T-cell to macrophage ratio in the atherosclerotic plaque. We applied bioinformatics methods to three single-cell RNA-sequencing datasets of human atherosclerotic plaques from coronary and carotid arteries, downloaded from the SRA database (SRP199578, SRP274629, SRP287809).
The study demonstrated a direct proportionality between CD8+ T memory stem cells (TSCM) and the ratio of the CD8+ T-cell pool to the macrophage pool. In addition, we found differences in TSCM receptors (IL2RA, SELP, LEPR, SDC2, CCR6, TGFBR3, IL1R2, CD247, CXCR6, LIFR, ERBB3) involved in ligand-receptor interactions with other cell populations of the atherosclerotic plaque between two groups of patients with high and low CD8+ T-cell to macrophage ratios.
36

Nuthikattu, Saivageethi, Dragan Milenkovic, John C. Rutledge, and Amparo C. Villablanca. "Sex-Dependent Molecular Mechanisms of Lipotoxic Injury in Brain Microvasculature: Implications for Dementia." International Journal of Molecular Sciences 21, no. 21 (October 31, 2020): 8146. http://dx.doi.org/10.3390/ijms21218146.

Abstract:
Cardiovascular risk factors and biologic sex play a role in vascular dementia, which is characterized by progressive reduction in cognitive function and memory. Yet, we lack understanding about the role sex plays in the molecular mechanisms whereby lipid stress contributes to cognitive decline. Five-week-old low-density lipoprotein deficient (LDL-R −/−) male and female mice and C57BL/6J wild types (WT) were fed a control or Western Diet for 8 weeks. Differential expression of protein coding and non-protein coding genes (DEG) was determined in laser captured hippocampal microvessels using genome-wide microarray, followed by bioinformatic analysis of gene networks, pathways, transcription factors and sex/gender-based analysis (SGBA). Cognitive function was assessed by Y-maze. Bioinformatic analysis revealed more DEGs in females (2412) compared to males (1972). Hierarchical clusters revealed distinctly different sex-specific gene expression profiles irrespective of diet and genotype. There were also fewer and different biologic responses in males compared to females, as well as different cellular pathways and gene networks (favoring greater neuroprotection in females), together with sex-specific transcription factors and non-protein coding RNAs. Hyperlipidemic stress also resulted in less severe cognitive dysfunction in females. This sex-specific pattern of differential hippocampal microvascular RNA expression might provide therapeutic targets for dementia in males and females.
37

Dupont, Marine, Paco Derouault, Virginie Pascal, Charlotte Rivière, Benjamin Ganne, Mélanie Boulin, Sophie Perron, et al. "Comparison with Private and Public Clonotypes Reveals That, While CLL Type Stereotyped BCR Are Produced in 90% of Healthy Subjects from 18 to 78 Years, They Are Not Accumulated and Are Mostly Found in Immature and Naïve B-Cells." Blood 138, Supplement 1 (November 5, 2021): 2621. http://dx.doi.org/10.1182/blood-2021-152714.

Abstract:
Introduction: One third of Chronic Lymphocytic Leukemia (CLL) patients harbor a stereotyped B-cell receptor (BCR). This is in contrast with the 10^12 potentially different BCRs that can be produced by the immune system. Some stereotyped BCRs are also highly associated with prognosis. For example, the stereotyped subset #2 identifies aggressive CLLs with frequent SF3B1 and ATM alterations. This underlines an important role of BCR selection in the emergence of CLL clones. Aim and methods: We raised the question of the existence and quantification of B-cells with stereotyped BCRs of the CLL type in a series of 69 healthy subjects (HS) from 18 to 78 years old, without any hematological history or any monotypic B-cells detectable by flow cytometry. HS were divided into three age groups: adults (18-55 years, n = 31), senior-1 (55-65 years, n = 14) and senior-2 (> 65 years, n = 24). The variable region of the immunoglobulin heavy chain gene (IGHV) was amplified from genomic DNA of purified B-cells, sequenced on an Illumina MiSeq and annotated with the online IMGT HighV-Quest tool. Clonotypes were defined as IGHV genes with the same rearranged V and J segments and the same CDR3 amino acid sequence, with a tolerance of one mismatch, provided that a minimal read-number threshold was reached (ranging from 5 to 15 according to the DNA input and sequencing depth). Public clonotypes were those shared between at least 4 (5%) HS. Detection of the 19 major stereotyped CLL BCRs was performed with an in-house R script and confirmed with the ARResT/AssignSubsets online tool (V Bystry et al, Bioinformatics, 2015). Exploration of the IGHV repertoire of the different B-cell subpopulations was done from public data (Mitsunaga, Mol Cell Proteomics, 2020). Results: To be independent of the number of reads and to compare HS to each other, clonotypic frequencies were ranked in 150 increasing intervals.
In each case, the mean rank of clonotypic frequency ranged from 3 to 7, with an SD between 2 and 6, and was independent of the age category. A Z-score was calculated for each clonotype. Less than 1% of private clonotypes had a Z-score > 1.96, while 10% to 30% of public clonotypes had a significant Z-score > 1.96 (figure 1). This indicates that public clonotypes are often accumulated when compared to their private counterparts. Considering all stereotypes together, a stereotyped CLL BCR was found in 90% of HS. Subsets #2 and #5 were found in more than 50% of cases (figure 2). Frequencies of subsets #3, #59, #7C, #202, #14, and #64B decreased from 23% to 6% of cases. Subsets #1, #4, #6, #8, #12, #16, #99 and #201 were never found. Frequencies of stereotyped BCRs ranged from 5.2×10^-5 to 1.2×10^-4 and were similar in the different HS age categories. Ranks of stereotyped BCR frequencies were almost always below the mean rank of private clonotypes, which indicates a very low abundance (figure 3). Junction analyses suggest that B-cells with stereotyped BCRs were polyclonal, since they were very likely to derive from different B-cell precursors. Exploring the longitudinal repertoires of Mitsunaga and Snyder (Mol Cell Proteomics 2020) showed that stereotyped BCRs were almost restricted to immature and naïve B-cells and were transient, while public clonotypes were recurrently and frequently found in memory B-cells, and many of them were stable over time. Conclusion: B-cells with stereotyped CLL BCRs were easily detected in healthy people whatever their age category. Remarkably, subsets #2 and #5, which are associated with aggressive CLL, were found in more than 50% of healthy donors at any age. Comparison with the private and public clonotypic repertoires reveals that the abundance of stereotyped BCRs was almost always very low and was restricted to immature and naïve B-cells. Therefore, B-cells with stereotyped BCRs are produced throughout life in almost everybody.
But these B-cells do not accumulate and do not seem to undergo B-cell maturation. This suggests that stereotyped BCRs are not intrinsically prone to transformation and that clonal selection of CLL cells would involve additional events such as antigenic encounter or oncogenic alterations. Disclosures: No relevant conflicts of interest to declare.
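The clonotype and public-clonotype definitions given in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' in-house R script: the record layout, the function names, the greedy clustering order, and the default thresholds are assumptions, "one mismatch" is read here as a Hamming distance of at most 1 between CDR3 sequences of equal length, and clonotypes are matched across subjects by exact key, which is a simplification.

```python
from collections import defaultdict


def hamming_le1(a: str, b: str) -> bool:
    """True if two equal-length strings differ at no more than one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) <= 1


def call_clonotypes(reads, min_reads=5):
    """Group (v_gene, j_gene, cdr3_aa) reads into clonotypes.

    Reads sharing V and J are clustered greedily: each CDR3 joins the first
    existing cluster whose representative is within one mismatch.
    Returns {(v, j, representative_cdr3): read_count} for clusters reaching
    min_reads (hypothetical default; the abstract reports 5 to 15).
    """
    by_vj = defaultdict(lambda: defaultdict(int))
    for v, j, cdr3 in reads:
        by_vj[(v, j)][cdr3] += 1

    clonotypes = {}
    for (v, j), counts in by_vj.items():
        clusters = []  # each entry: [representative_cdr3, total_reads]
        # Most abundant sequences first, so they become representatives.
        for cdr3, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
            for cluster in clusters:
                if hamming_le1(cluster[0], cdr3):
                    cluster[1] += n
                    break
            else:
                clusters.append([cdr3, n])
        for rep, total in clusters:
            if total >= min_reads:
                clonotypes[(v, j, rep)] = total
    return clonotypes


def public_clonotypes(per_subject, min_subjects=4):
    """Clonotype keys found in at least min_subjects subjects."""
    seen = defaultdict(int)
    for clonotypes in per_subject:
        for key in clonotypes:
            seen[key] += 1
    return {key for key, n in seen.items() if n >= min_subjects}
```

Calling `call_clonotypes` once per subject and passing the resulting dictionaries to `public_clonotypes` mirrors the two-step logic of the abstract; a real pipeline would also need a rule for matching near-identical clonotypes across subjects.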
38

Attaf, Noudjoud, Inaki Cervera-Marzal, Laurine Gil, Chuang Dong, Jean-Marc Navarro, Lionel Spinelli, Bertrand Nadel, and Pierre Milpied. "Single-Cell RNA Sequencing Identifies a Pseudo-Immune Differentiation Axis As the Main Source of Functional Heterogeneity in Follicular Lymphoma B-Cells." Blood 134, Supplement_1 (November 13, 2019): 548. http://dx.doi.org/10.1182/blood-2019-125718.

Abstract:
Introduction: Follicular Lymphoma (FL), the second most frequent lymphoma in adults, often presents as a disseminated disease at diagnosis. Despite a generally slow progression and a median overall survival of more than 15 years with current chemo-immunotherapies, FL patients often suffer from multiple relapses. Yet, the biological mechanisms promoting FL dissemination, progression and relapse are still poorly understood. FL, like most B-cell lymphomas, originates from germinal centers (GC) where B-cells physiologically undergo clonal expansion, antibody affinity maturation, and differentiation into antibody-producing plasma cells (PC) or recirculating memory (Mem) B-cells. Recently, we provided evidence that FL B-cells are not blocked in a GC B-cell state but might adopt new dynamic modes of functional diversity (Milpied et al., Nature Immunology 2018), yet the main sources of intratumoral heterogeneity within FL remained to be identified. Methods: Frozen live cell suspensions were obtained from the CeVi collection of the Institute Carnot/Calym (ANR, France). We initially applied a plate-based 5'-end single-cell RNAseq (scRNAseq) method for deep integrative single-cell analyses of transcriptome, B-cell receptor (BCR) sequence, and surface phenotype on FACS-sorted FL B-cells (4 patients, lymph node biopsies) and their non-malignant counterparts (6 adult healthy donors, spleen and tonsil samples). We confirmed our findings on additional FL samples with high-throughput droplet-based 3'-end scRNAseq (9 patients, lymph node biopsies), and 5'-end scRNAseq paired with BCRseq (5 patients, lymph node biopsies). Custom and existing bioinformatics analysis pipelines were combined for quality control and cell filtering, dimensionality reduction (PCA, t-SNE, UMAP), clustering, pseudo-time analysis, BCR sequence analysis and integrative data analysis.
We further validated our transcriptomic data with FACS-based surface and intracellular protein analysis (8 patients, lymph node biopsies). Results: Consistent with our previous findings, FL B-cells were transcriptionally diverse, with most cells exhibiting a patient-specific gene expression profile distinct from PC, GC and Mem cells. Challenging the mainstream view of a differentiation blockade in FL, we identified rare FL B-cells carrying a PC-like profile (including low expression of MS4A1/CD20, high expression of XBP1, MZB1, PRDM1). PC-like FL B-cells expressed high levels of the tumor clonal BCR heavy and light chain mRNA, and BCR sequence phylogenetic analysis revealed that those cells did not branch out from a specific tumor subclone. Most importantly, we found that the molecular profiles of the vast majority of FL B-cells spanned a continuum of transitional states between proliferating GC-like and quiescent Mem-like gene expression states. Principal component analysis and pseudo-time reconstruction revealed that a pseudo-immune differentiation axis was consistently the main source of intra-sample transcriptional heterogeneity. On top of cell cycle related genes, GC-like FL B-cells notably expressed AICDA, BCL6, RGS13, NANS, CD81, and CD38 genes. By contrast, Mem-like FL B-cells expressed CD44, GPR183, CD69, CXCR4, CCR7, SELL, KLF2, suggesting that those cells may not be confined to the FL follicles. Flow cytometry analysis of dissociated FL tumors confirmed that only the CD38hiCD81hi subset of FL B-cells (GC-like cells) expressed Ki67 and high levels of Bcl6, whereas only CD38negCD81neg FL B-cells (Mem-like cells) consistently contained CD44+ and GPR183+ cells. Conclusions: Our study suggests that FL B-cells hijack the physiological GC differentiation process to dynamically alternate between GC-like and Mem-like states that might be responsible for FL progression and dissemination, respectively.
We anticipate that such FL-specific clonal dynamics may be orchestrated by extrinsic signals delivered by tumor-infiltrating T cells. Disclosures: Milpied: Innate Pharma: Research Funding; Institut Roche: Research Funding.
39

Almeida, Diogo, Ida Skov, Jesper Lund, Afsaneh Mohammadnejad, Artur Silva, Fabio Vandin, Qihua Tan, Jan Baumbach, and Richard Röttger. "Jllumina - A comprehensive Java-based API for statistical Illumina Infinium HumanMethylation450 and Infinium MethylationEPIC BeadChip data processing." Journal of Integrative Bioinformatics 13, no. 4 (October 1, 2016): 24–32. http://dx.doi.org/10.1515/jib-2016-294.

Abstract:
Summary: Measuring differential DNA methylation is currently the most common approach to linking epigenetic modifications to diseases (in so-called epigenome-wide association studies, EWAS). Owing to their low cost, efficiency and easy handling, the Illumina HumanMethylation450 BeadChip and its successor, the Infinium MethylationEPIC BeadChip, are by far the most popular technologies for conducting EWAS in large patient cohorts. Despite the popularity of this chip technology, raw data processing and statistical analysis of the array data remain far from trivial and still lack dedicated software libraries enabling high-quality and statistically sound downstream analyses. As yet, only R-based solutions are freely available for low-level processing of the Illumina chip data. However, the lack of alternative libraries poses a hurdle for the development of new bioinformatic tools, in particular when it comes to web services or applications where run time and memory consumption matter, or where EWAS data analysis is an integral part of a bigger framework or data analysis pipeline. We have therefore developed and implemented Jllumina, an open-source Java library for raw data manipulation of Illumina Infinium HumanMethylation450 and Infinium MethylationEPIC BeadChip data, supporting the developer with Java functions that cover reading and preprocessing the raw data, statistical assessment, permutation tests, and identification of differentially methylated loci. Jllumina is fully parallelizable and publicly available at http://dimmer.compbio.sdu.dk/download.html
40

Wang, Zhihuai, Adeel Ur Rehman, Xihu Qin, Chunfu Zhu, and Siyuan Wu. "PI3K/AKT/mTOR Pathway-Associated Genes Reveal a Putative Prognostic Signature Correlated with Immune Infiltration in Hepatocellular Carcinoma." Disease Markers 2022 (May 9, 2022): 1–18. http://dx.doi.org/10.1155/2022/7545666.

Abstract:
Background. The dysregulated PI3K/AKT/mTOR pathway acts as the main regulator of tumorigenesis in hepatocellular carcinoma (HCC). Aim. Here, we identify the prognostic significance of PI3K/AKT/mTOR pathway-associated genes (PAGs) as well as a putative signature based on PAGs in an HCC patient cohort. Methods. The transcriptomic data and clinical feature sets were queried to extract the putative prognostic signature. Results. We identified nine differentially expressed PAGs. GO and KEGG analyses indicated that these differentially expressed genes were associated with various carcinogenic pathways. Based on the median of the signature-computed risk score, we categorized the patients into low-risk and high-risk groups. In Kaplan-Meier (KM) curves, the survival time of the low-risk group was longer than that of the high-risk group. In receiver operating characteristic (ROC) curve analysis, the prognostic value of the risk score (ROC = 0.736) was better than that of other clinicopathological features. These outcomes were verified in both the GEO database and the ICGC database. Predictions of the 1-year, 3-year, and 5-year overall survival rates in HCC patients can be obtained separately from the nomogram. The risk score was associated with the immune infiltration of CD8 T cells, activated CD4 memory T cells, and follicular helper T cells, and the expression of immune checkpoints (PD-1, TIGIT, TIM-3, BTLA, LAG-3, and CTLA4) was positively correlated with the risk score. The signature can also reveal the sensitivity to several chemotherapeutic drugs. CDK1, PITX2, PRKAA2, and SFN were all upregulated in the tumor tissue of clinical samples. Conclusion. Our study established a putative prognostic signature, validated on different datasets, on the basis of integrated bioinformatic analysis, providing immunotherapeutic targets as well as new insight into personalized treatment in HCC.
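The median split of patients into low-risk and high-risk groups described in the abstract above can be illustrated with a short sketch. The abstract gives neither the signature coefficients nor the exact scoring formula, so the linear score (sum of coefficient times expression), the placeholder gene names and weights, and the tie rule (a score equal to the median is labelled low risk) are all assumptions for illustration only.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear score: sum of coefficient * expression over signature genes."""
    return sum(coefficients[g] * expression[g] for g in coefficients)


def stratify_by_median(scores):
    """Label each patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical weights and expression values, not taken from the paper.
coefficients = {"GENE_A": 0.5, "GENE_B": 0.25}
cohort = {
    "P1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "P2": {"GENE_A": 6.0, "GENE_B": 0.0},
    "P3": {"GENE_A": 1.0, "GENE_B": 2.0},
    "P4": {"GENE_A": 8.0, "GENE_B": 4.0},
}
scores = {pid: risk_score(expr, coefficients) for pid, expr in cohort.items()}
groups = stratify_by_median(scores)
```

With these made-up inputs the cohort median falls between the second and third ranked scores, so two patients land in each group; survival comparison between the groups (for example by Kaplan-Meier curves) would be a separate step.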
41

Coscia, Marta, Candida Vitale, Silvia Peola, Chiara Riganti, Daniela Angelini, Myriam Foglietta, Francesca Pantaleoni, et al. "The Defective Proliferation of Vgamma9Vdelta2 T Cells to Zoledronic Acid In Chronic Lymphocytic Leukemia (CLL) Is a Powerful Time to First Treatment (TFT) Predictor Associated with the IGHV Mutational Status." Blood 116, no. 21 (November 19, 2010): 3602. http://dx.doi.org/10.1182/blood.v116.21.3602.3602.

Abstract:
Abstract 3602. The clinical course of chronic lymphocytic leukemia (CLL) depends on intrinsic tumor cell features but also on the interactions between tumor cells, local microenvironment and host immunity. The interplay between CLL cells and conventional αβ T cells has already been investigated in detail, whereas very little is known about the role of Vgamma9Vdelta2 T cells. These cells have intrinsic antitumor properties which can be further enhanced by stimulation with aminobisphosphonates such as zoledronic acid (ZA) via monocytes or other antigen-presenting cells. ZA targets the mevalonate (Mev) pathway and induces the intracellular accumulation and release of intermediate metabolites, like isopentenyl pyrophosphate (IPP), which are very similar to the natural ligands of Vgamma9Vdelta2 T cells. In this study we performed a phenotypic and functional analysis of Vgamma9Vdelta2 T cells in 93 untreated CLL patients and correlated the results with intrinsic CLL cell features and clinical outcome. Stimulation of peripheral blood mononuclear cells with ZA induced Vgamma9Vdelta2 T cell proliferation in 47/93 patients (responders, R), but not in 46/93 patients (non-responders, NR). The Vgamma9Vdelta2 T-cell subset distribution of R CLL was similar to that of healthy donors, whereas effector memory (EM) and terminally differentiated effector memory (TEMRA) cells predominated in NR CLL. A significant association was found between the R/NR status of Vgamma9Vdelta2 T cells and the mutational status of the tumor IGHV genes: 77% of R patients were M, whereas 70% of UM patients were NR.
To test the hypothesis that this difference reflected a different activity of the Mev pathway in M and UM CLL cells, we performed a bioinformatic analysis of data obtained from gene expression profiling of CLL cells and a biochemical quantification of the Mev pathway in CLL cells, including the intermediate metabolites farnesyl pyrophosphate (FPP) and IPP, and the final product cholesterol. The Mev pathway was significantly more active in UM than in M CLL cells, suggesting that IPP overproduction by UM CLL cells is responsible for a chronic stimulation of Vgamma9Vdelta2 T cells leading to their differentiation into EM and TEMRA cells. This biochemically driven immunoediting process has clinical implications. After a median follow-up of 46 months from diagnosis, univariate analysis identified NR status as a predictor of reduced TFT (NR: 59 months vs R: not reached; p=0.01). The R/NR status also distinguishes two subsets of UM CLL patients (R-UM and NR-UM) with significantly different TFT (NR-UM: 32 months vs R-UM: not reached; p=0.02). In multivariate analysis, Binet stage (p=0.027), IGHV mutational status (p=0.016), R/NR status (p=0.007), high- and low-risk cytogenetic abnormalities (p<0.001) and lymphocytosis at the time of diagnosis (p<0.001) were independent TFT predictors. In conclusion, we have identified a novel TFT predictor based on a putative interaction between the Mev pathway of CLL cells and Vgamma9Vdelta2 T cells. These results further strengthen the importance of tumor-host immune interactions in CLL evolution. Disclosures: Boccadoro: Celgene: Consultancy, Membership on an entity's Board of Directors or advisory committees, Research Funding; Janssen-Cilag: Consultancy, Membership on an entity's Board of Directors or advisory committees, Research Funding. Massaia: Novartis: Honoraria, Research Funding.
42

Awad, Mark, David Spigel, Edward Garon, Saiama Waqar, Aaron Lisberg, Melissa Moles, Jennifer Tepper, et al. "364 A personal neoantigen vaccine NEO-PV-01 in combination with chemotherapy and pembrolizumab induces broad de novo immune responses in first line, non-squamous NSCLC." Journal for ImmunoTherapy of Cancer 8, Suppl 3 (November 2020): A389. http://dx.doi.org/10.1136/jitc-2020-sitc2020.0364.

Abstract:
Background: Neoantigens arising from mutations in cancer cell DNA are important targets for T cell mediated anti-tumor immunity. NEO-PV-01 is a personal neoantigen vaccine of up to 20 peptides (14–35 amino acids) based on a patient's HLA profile and bioinformatic analysis of tumor neoantigens. We report here clinical and immune data for NT-002, a Phase 1b study of NEO-PV-01 with pemetrexed, carboplatin, and pembrolizumab as first-line therapy for advanced non-squamous NSCLC. Methods: Patients received 12 weeks of pembrolizumab (Q3W) plus carboplatin and pemetrexed. NEO-PV-01 was then given subcutaneously in a prime-boost format spanning 12 weeks, followed by pembrolizumab for up to 2 years. The primary objective was safety; secondary objectives included overall response rate (ORR), clinical benefit rate (CBR), progression-free survival (PFS), and overall survival (OS). Comprehensive immune assessments were performed with peripheral blood mononuclear cells and biopsies collected at weeks 0, 12, and 24. Results: A total of 38 patients initiated study treatment (ITT population); 21 patients received at least 1 dose of NEO-PV-01 (vaccinated group, VAX). The demographics included 61% women and 82% with a smoking history. The regimen was well tolerated, consistent with the pembrolizumab plus pemetrexed/carboplatin safety profile, with transient low-grade injection site reactions present in VAX (29%). Treatment-related study discontinuations were rare (2/38). The ORR/CBR for the ITT and VAX were 37%/69% and 57%/95%, respectively. Median PFS was 7.2 months (95% CI: 5.6, 16.8) for both the ITT and VAX, and median OS was 16.8 months (95% CI: 11.6, NR) for both groups. Interim immune analysis on 8 patients revealed neoantigen-specific CD4+ and CD8+ T cell responses against 48% of vaccine peptides. T cell responses were durable at 52 weeks and exhibited a memory phenotype with cytolytic potential. Epitope spread was observed in 3 of 5 patients analyzed thus far.
Further, assessments of immune and molecular correlates of clinical response identified both tumor mutation burden and baseline levels of T cell infiltration in tumor as highly predictive of durable PFS (p = 0.005 and p = 7.2e-07 (for CD8), respectively). Additional correlates of clinical outcomes with molecular and immunologic responses will be presented. Conclusions: NEO-PV-01 in combination with pembrolizumab and carboplatin/pemetrexed is feasible, has a good safety profile, and induces de novo immune responses in first line non-squamous NSCLC. The association of baseline disease characteristics with prolonged PFS suggests future patient enrichment strategies for evaluation of this novel regimen in a phase 2 trial. Trial Registration: NCT03380871. Ethics Approval: The clinical study is conducted in accordance with ethical principles founded in the Declaration of Helsinki and approved by the local institutional review board and health authorities.
43

Vergani, Stefano, Davide Bagnara, Andrea Nicola Mazzarello, Gerardo Ferrer, Sophia Yancopoulos, Kostas Stamatopoulos, and Nicholas Chiorazzi. "CLL Stereotyped IGHV-D-J Rearrangements Can Be Detected Throughout Normal B-Cell Developmental Stages in Aged People When Using Ultra-Deep, Next Generation Sequencing Techniques." Blood 128, no. 22 (December 2, 2016): 2028. http://dx.doi.org/10.1182/blood.v128.22.2028.2028.

Abstract:
Abstract B cells originate in the bone marrow where precursors pass through a series of highly regulated processes to generate a functional B-cell receptor (BCR). IGHV-D-J and IGLV-J rearrangements are achieved via a combinatorial process with associated amino acid additions/deletions at recombination sites leading to an immense variety of structurally diverse BCRs. Due to the large number of possibilities, potentially harmful BCRs recognizing self-antigens can be generated. In the early stages of B cell development a number of distinct selection mechanisms eliminate higher affinity auto-reactive B cells. Chronic lymphocytic leukemia (CLL) is a CD5+ B cell malignancy. Besides having a unique phenotype not always assignable to a defined B-cell subset, many CLL B cells express auto-reactive BCRs; those that do not can often be shown to have derived from self-reactive clones. In addition, certain CLL patients express virtually identical IGHV-D-J rearrangements that can be organized into categories called "stereotyped BCRs". To date, stereotyped BCRs have not been conclusively shown to exist in the normal B-cell repertoire. This might be due to strict autoreactive censoring that eliminates stereotyped BCRs during B-cell ontogeny and/or because the number of IGHV-D-J rearrangements sequenced from individuals has not been sufficient to detect infrequent Igs. Therefore, using ultradeep, next generation IGHV-D-J sequencing we searched for CLL-specific stereotyped BCR rearrangements in the early phases of normal B cell maturation, before all mechanisms of selection are complete. We sorted several B-cell populations from bone marrow samples obtained from healthy patients undergoing hip replacement without an autoimmunity or chronic infection history. The MiSeq platform was used to determine the IGHV-D-J repertoire of pro-, pre-, immature, transitional, naive, and memory B cells, and of plasmablasts. Sequences were analyzed using the pRESTO bioinformatic tool and R. 
Using VDJ rearrangement and CDR3 length criteria we focused on sequences potentially similar to those in CLL. Sequences were submitted to an online tool (ARReST/Assign subset) to be attributed to a particular CLL stereotyped subset based on a degree of certitude. We identified 156 sequences belonging to defined CLL stereotyped subsets with varying levels of similarity (extreme to low), most residing in early/antigen-independent compartments such as pre-B cells and immature, transitional, and naive B cells. When stereotyped rearrangements of all degrees of certainty were considered, the greatest number of CLL-like stereotypes were found among the most immature B-cell populations, with frequency decreasing with advanced maturation (e.g., 0.04% in pro-B cells vs. 0.028 in naive and 0.01% in memory B cells). This is consistent with the notion that auto-reactive stereotyped rearrangements are eliminated from the normal repertoire. However, when using only those IGHV-D-J rearrangements with extreme or very high certitude, CLL-like stereotyped sequences were found at the highest frequency in the naive B-cell subset (0.02% vs 0.0058% in immature and 0.007 in pre-B cells). In addition, we identified at least one sequence for a majority of the most frequent 18 stereotyped subsets, although the proportional representation of each subset in our data is different from that seen in CLL. Specifically, the highest scoring recurrent subsets in our analysis were subsets 3 (61% of total CLL sequences), 5 (9.5%) and 64B (7%), which are not the most common in the known CLL cohort. Rather, we found only a few sequences belonging to the most common subsets such as subsets 1 and 2 (4.5% and 2% of the total CLL sequences), and none belonging to subset 4, although this subset is always isotype class-switched and IGHV-mutated and hence would not be expected to be found in the developing or naive repertoire. 
Finding a significant presence of CLL-like stereotyped sequences when using ultra-deep DNA sequencing techniques in early phases of B-cell development and in the naive B-cell fraction suggests that CLL-like B cells are present among normal B cell populations, albeit at very low frequencies. Moreover, the presence of only certain of the most prevalent stereotypes, such as subsets 3 and 5 (respectively the 5th and 6th biggest subsets), suggests these clones might not exhibit high auto-reactive affinity, allowing them to more readily escape mechanisms of selection. Disclosures Stamatopoulos: Abbvie: Honoraria, Other: Travel expenses; Gilead: Consultancy, Honoraria, Research Funding; Novartis: Honoraria, Research Funding; Janssen: Honoraria, Other: Travel expenses, Research Funding.
44

Katsuta, Eriko, Tao Dai, Abhisha Sawant Dessai, and Subhamoy Dasgupta. "Abstract P5-06-12: Extracellular adenosine synthesis genes regulated by estrogen signaling are associated with cancer aggressiveness and poor prognosis in estrogen receptor (ER)-positive breast cancer." Cancer Research 82, no. 4_Supplement (February 15, 2022): P5–06–12—P5–06–12. http://dx.doi.org/10.1158/1538-7445.sabcs21-p5-06-12.

Abstract:
Abstract Background: Accumulation of extracellular adenosine regulates tumor progression. Extracellular adenosine binds to adenosine receptors, which mediate signaling to induce angiogenesis and cell proliferation, and it also functions as an immunosuppressive agent in the tumor microenvironment (TME). CD73 and CD39 are two cell surface enzymes that catalyze the synthesis of extracellular adenosine from AMP, ADP and ATP in the TME. However, the underlying mechanisms that regulate adenosine synthesis in the TME of ER-positive breast cancer remain unknown. Methods: In order to investigate the transcriptional regulation of CD73 and CD39 in ER-positive breast cancer, we treated the ER-positive breast cancer cell line MCF7 with estrogen and tamoxifen. We also investigated the clinical significance of CD73/39 expression in ER-positive patients by a bioinformatic approach using TCGA and GEO breast cancer cohorts. Results: In the TCGA cohort, higher CD73 expression was associated with worse overall survival in ER-positive tumors (p=0.003), but not in ER-negative tumors. Gene set enrichment analysis revealed that estrogen response gene sets (Early; p=0.043, Late; p=0.021) were significantly enriched in CD73 low expressing ER-positive tumors, suggesting estrogen signaling may inhibit CD73 expression. To test this hypothesis, we analyzed the expression of CD73 and CD39 in MCF7 cells treated with estrogen, tamoxifen, or vehicle control. Our data revealed that estrogen treatment suppressed CD73 and CD39 expression, whereas tamoxifen treatment significantly enhanced expression of the genes. These findings suggest that tamoxifen treatment can induce the expression of CD73 and CD39 in ER-positive breast tumors by removing the repressive effect of hormone signaling. 
Additionally, gene set enrichment analysis revealed that CD73-high ER-positive tumors were significantly associated with cancer aggressiveness characteristics, such as epithelial-mesenchymal transition (EMT) (p<0.001) and angiogenesis (p<0.001). On the other hand, CD73-high ER-positive tumors had significantly fewer infiltrating CD8-positive T cells, memory B cells and plasma cells in in silico analysis, implying that CD73-high tumors have an immunosuppressive environment. Further, we found that CD73 expression was significantly elevated post-chemotherapy as compared to the tumors prior to the treatment (p=0.007), and CD73 high expressing patients demonstrated worse relapse-free survival in the neoadjuvant chemotherapy cohort (p=0.003). Conclusion: Our molecular findings indicate that expression of CD73 and CD39 is transcriptionally repressed by estrogen signaling, whereas tamoxifen treatment reverses the effect. Increased expression of CD73 significantly correlates with worse outcomes in ER-positive breast cancer patients, which may be due to an immunosuppressive tumor microenvironment created by extracellular adenosine driving a pro-metastatic phenotype. Our data indicate an intriguing mechanism which could be therapeutically exploited by targeting CD73/39 to reverse the ‘immune-cold’ microenvironment for the treatment of recurrent and metastatic ER-positive breast cancers. Citation Format: Eriko Katsuta, Tao Dai, Abhisha Sawant Dessai, Subhamoy Dasgupta. Extracellular adenosine synthesis genes regulated by estrogen signaling are associated with cancer aggressiveness and poor prognosis in estrogen receptor (ER)-positive breast cancer [abstract]. In: Proceedings of the 2021 San Antonio Breast Cancer Symposium; 2021 Dec 7-10; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2022;82(4 Suppl):Abstract nr P5-06-12.
45

Xing, Fei, Yi Ping Yao, Zhi Wen Jiang, and Bing Wang. "Fine-Grained Parallel and Distributed Spatial Stochastic Simulation of Biological Reactions." Advanced Materials Research 345 (September 2011): 104–12. http://dx.doi.org/10.4028/www.scientific.net/amr.345.104.

Abstract:
To date, discrete event stochastic simulations of large-scale biological reaction systems are extremely compute-intensive and time-consuming. Besides, it has been widely accepted that spatial factors play a critical role in the dynamics of most biological reaction systems. The NSM (the Next Sub-Volume Method), a spatial variation of Gillespie’s stochastic simulation algorithm (SSA), has been proposed for spatially stochastic simulation of those systems. Although the NSM exposes a high degree of parallelism in such systems, it is inherently sequential and therefore still suffers from low simulation speed. Fine-grained parallel execution is an elegant way to speed up sequential simulations. Thus, based on the discrete event simulation framework JAMES II, we design and implement a PDES (Parallel Discrete Event Simulation) TW (time warp) simulator to enable the fine-grained parallel execution of spatial stochastic simulations of biological reaction systems using the ANSM (the Abstract NSM), a parallel variation of the NSM. The simulation results of the classical Lotka-Volterra biological reaction system show that our time warp simulator obtains remarkable parallel speed-up against sequential execution of the NSM.
I. Introduction
The goal of systems biology is to obtain system-level investigations of the structure and behavior of biological reaction systems by integrating biology with system theory, mathematics and computer science [1][3], since the isolated knowledge of parts cannot explain the dynamics of a whole system. As the complement of “wet-lab” experiments, stochastic simulation, called the “dry-computational” experiment, plays an increasingly important role in computational systems biology [2]. 
Among the many methods explored in systems biology, discrete event stochastic simulation is of great importance [4][5][6], since a great number of studies have shown that stochasticity or “noise” has a crucial effect on the dynamics of small-population biological reaction systems [4][7]. Furthermore, recent research shows that stochasticity is important not only in biological reaction systems with small populations but also in some moderate/large population systems [7].
To date, Gillespie’s SSA [8] is widely considered to be the most accurate way to capture the dynamics of biological reaction systems, compared with traditional mathematical methods [5][9]. However, SSA-based stochastic simulation is confronted with two main challenges. Firstly, this type of simulation is extremely time-consuming, since when the types of species and the number of reactions in the biological system are large, SSA requires a huge number of steps to sample these reactions. Secondly, the assumption that the systems are spatially homogeneous or well-stirred is hardly met in most real biological systems, and spatial factors play a key role in the behaviors of most real biological systems [19][20][21][22][23][24]. The next sub-volume method (NSM) [18] presents an elegant way to address the spatial problem via domain partition. To our disappointment, sequential stochastic simulation with the NSM is still very time-consuming, and the additionally introduced diffusion among neighboring sub-volumes makes things worse. However, the NSM exposes a very high degree of parallelism among sub-volumes, and parallelization has been widely accepted as the most meaningful way to tackle the performance bottleneck of sequential simulations [26][27]. Thus, adapting parallel discrete event simulation (PDES) techniques to discrete event stochastic simulation would be particularly promising. 
Although a few attempts have been made [29][30][31], research in this field is still in its infancy and many issues are in need of further discussion. The next section of the paper presents the background and related work in this domain. In Section III, we give the details of the design and implementation of the model interface of the LP paradigm and of the time warp simulator based on the discrete event simulation framework JAMES II; the benchmark model and experiment results are shown in Section IV; in the last section, we conclude the paper with some future work.
II. Background and Related Work
A. Parallel Discrete Event Simulation (PDES)
The notion of a Logical Process (LP) is introduced to PDES as the abstraction of a physical process [26], where a system consisting of many physical processes is usually modeled by a set of LPs. The LP is regarded as the smallest unit that can be executed in PDES, and each LP holds a sub-partition of the whole system’s state variables as its private ones. When an LP processes an event, it can only modify its own state variables. If one LP needs to modify one of its neighbors’ state variables, it has to schedule an event to the target neighbor. That is to say, event message exchange is the only way that LPs interact with each other. Because of the data dependences or interactions among LPs, synchronization protocols have to be introduced to PDES to guarantee the so-called local causality constraint (LCC) [26]. By now, a large number of synchronization algorithms have been proposed, e.g. the null-message [26], the time warp (TW) [32], the breathing time warp (BTW) [33], etc. According to whether the events of LPs can be processed optimistically, they are generally divided into two types: conservative algorithms and optimistic algorithms. However, Dematté and Mazza have theoretically pointed out the disadvantages of pure conservative parallel simulation for biochemical reaction systems [31].
B. NSM and ANSM
The NSM is a spatial variation of Gillespie’s SSA, which integrates the direct method (DM) [8] with the next reaction method (NRM) [25]. The NSM presents a good way to handle space in biological systems by partitioning a spatially inhomogeneous system into many much smaller “homogeneous” ones, which can be simulated by SSA separately. However, the NSM is inherently combined with sequential semantics, and all sub-volumes share one common data structure for events or messages. Thus, direct parallelization of the NSM may be confronted with the so-called boundary problem and high costs of synchronously accessing the common data structure [29]. In order to obtain higher efficiency of parallel simulation, parallelization of the NSM has to firstly free the NSM from the sequential semantics and secondly partition the shared data structure into many “parallel” ones. One such approach is the abstract next sub-volume method (ANSM) [30]. In the ANSM, each sub-volume is modeled by a logical process (LP) based on the LP paradigm of PDES, where each LP holds its own event queue and state variables (see Fig. 1). In addition, the so-called retraction mechanism was introduced in the ANSM too (see algorithm 1). Besides, based on the ANSM, Wang et al. [30] have experimentally tested the performance of several PDES algorithms on the platform called YH-SUPE [27]. However, their platform is designed for general simulation applications, and thus sacrifices some performance because it cannot take into account the characteristics of biological reaction systems. Using ideas similar to the ANSM, Dematté and Mazza have designed and realized an optimistic simulator. However, they processed events in a time-stepped manner, which loses some precision compared with the discrete event manner, and it is very hard to convert a time-stepped simulation to a discrete event one. 
In addition, Jeschke et al. [29] have designed and implemented a dynamic time-window simulator to execute the NSM in parallel in a grid computing environment; however, they paid most attention to the analysis of communication costs and to determining a better size of the time window.
Fig. 1: the variations from SSA to NSM and from NSM to ANSM
C. JAMES II
JAMES II is an open source discrete event simulation experiment framework developed by the University of Rostock in Germany. It focuses on high flexibility and scalability [11][13]. Based on the plug-in scheme [12], each function of JAMES II is defined as a specific plug-in type, and all plug-in types and plug-ins are declared in XML files [13]. Combined with the factory method pattern, JAMES II splits up the model and the simulator, which makes it very flexible to add and reuse both models and simulators. In addition, JAMES II supports various types of modelling formalisms, e.g. cellular automata, discrete event system specification (DEVS), SpacePi, StochasticPi, etc. [14]. Besides, a well-defined simulator selection mechanism is designed and developed in JAMES II, which can not only automatically choose the proper simulators according to the modeling formalism but also pick out a specific simulator from a series of simulators supporting the same modeling formalism according to the user settings [15].
III. The Model Interface and Simulator
As we have mentioned in Section II (part C), model and simulator are split up into two separate parts. Thus, in this section, we introduce the design and implementation of the model interface of the LP paradigm and, more importantly, the time warp simulator.
A. The Model Interface of LP Paradigm
JAMES II provides abstract model interfaces for different modeling formalisms, based on which Wang et al. have designed and implemented a model interface of the LP paradigm [16]. However, this interface does not scale well for parallel and distributed simulation of larger scale systems. 
In our implementation, we adapt the interface to parallel and distributed situations. Firstly, the neighbor LP’s reference is replaced by its name in the LP’s neighbor queue, because it is improper, even dangerous, for a local LP to hold references to other LPs in remote memory space. In addition, (pseudo-)random numbers play a crucial role in obtaining valid and meaningful results in stochastic simulations. However, it is still very challenging to find a good random number generator (RNG) [34]. Thus, in order to focus on our problems, we introduce one of the uniform RNGs of JAMES II to this model interface, where each LP holds a private RNG so that the random number streams of different LPs can be stochastically independent.
B. The Time Warp Simulator
Based on the simulator interface provided by JAMES II, we design and implement the time warp simulator, which contains the (master-)simulator and the (LP-)simulator. The simulator works strictly in the master/worker(s) paradigm for fine-grained parallel and distributed stochastic simulations. Communication costs are crucial to the performance of a fine-grained parallel and distributed simulation. Based on the Java remote method invocation (RMI) mechanism, P2P (peer-to-peer) communication is implemented among all (master- and LP-)simulators, where a simulator holds all the proxies of targeted ones that work on remote workers. One of the advantages of this communication approach is that PDES codes can be transferred to various hardware environments, such as clusters, grids and distributed computing environments, with only a little modification; the other is that the RMI mechanism is easy to realize and independent of any non-Java libraries. Because of the straggler event problem, states have to be saved in order to roll back events that were processed optimistically. Each time it is modified, the state is cloned to a queue by the Java clone mechanism. 
The problem with this copy state saving approach is that it consumes a large amount of memory. However, the problem can be mitigated by a suitable GVT (global virtual time) calculation mechanism. The GVT reduction scheme also has a significant impact on the performance of parallel simulators, since it marks the highest time boundary of events that can be committed, so that the memory of fossils (processed events and states) older than GVT can be reclaimed. GVT calculation is very tricky because of the notorious simultaneous reporting problem and transient message problem. For our problem, another GVT algorithm, called Twice Notification (TN-GVT) (see algorithm 2), is contributed to this already rich repository instead of implementing one of the GVT algorithms in references [26] and [28]. This algorithm looks like the synchronous algorithm described in reference [26] (p. 114); however, they are essentially different from each other. This algorithm never stops the simulators from processing events during GVT reduction, while the algorithm in reference [26] blocks all simulators for GVT calculation. As for the transient message problem, it can be neglected in our implementation, because the RMI-based remote communication approach is synchronous, which means a simulator will not go on processing until the remote message gets to its destination. Because of this, the high-cost message acknowledgement, prevalent in many classical asynchronous GVT algorithms, is not needed anymore either, which should benefit the overall performance of the time warp simulator.
IV. Benchmark Model and Experiment Results
A. The Lotka-Volterra Predator-prey System
In our experiment, the spatial version of the Lotka-Volterra predator-prey system is introduced as the benchmark model (see Fig. 2). 
We choose the system for two considerations: 1) this system is a classical experimental model that has been used in many related studies [8][30][31], so it is credible and the simulation results are comparable; 2) it is simple but sufficient to test the issues we are interested in. The space of the predator-prey system is partitioned into a 2D N×N grid, where N denotes the edge size of the grid. Initially the populations of Grass, Prey and Predators are set to 1000 in each single sub-volume (LP). In Fig. 2, r1, r2 and r3 stand for the reaction constants of reactions 1, 2 and 3 respectively. We use dGrass, dPrey and dPredator to stand for the diffusion rates of Grass, Prey and Predator respectively. Similar to reference [8], we also assume that the population of the grass remains stable, and thus dGrass is set to zero.
R1: Grass + Prey -> 2 Prey (1)
R2: Predator + Prey -> 2 Predator (2)
R3: Predator -> NULL (3)
r1 = 0.01; r2 = 0.01; r3 = 10 (4)
dGrass = 0.0; dPrey = 2.5; dPredator = 5.0 (5)
Fig. 2: predator-prey system
B. Experiment Results
The simulation runs have been executed on a Linux cluster with 40 computing nodes. Each computing node is equipped with two 64-bit 2.53 GHz Intel Xeon quad-core processors and 24 GB RAM, and nodes are interconnected with Gigabit Ethernet. The operating system is Kylin Server 3.5, with kernel 2.6.18. Experiments have been conducted on the benchmark model at different model sizes to investigate the execution time and speedup of the time warp simulator. Fig. 3 compares the execution time of simulations on a single computing node with 8 cores. The result shows that it takes more wall clock time to simulate larger scale systems for the same simulation time. This confirms that larger scale systems lead to more events in the same time interval. More importantly, the blue line shows that the sequential simulation performance declines very fast when the model scale becomes large. 
The bottleneck of the sequential simulator is due to the cost of accessing a long event queue to choose the next events. Besides, from the comparison between group 1 and group 2 in this experiment, we can also conclude that a high diffusion rate increases the simulation time greatly in both sequential and parallel simulations. This is because the LP paradigm has to split diffusion into two processes (diffusion (in) and diffusion (out) events) for the two interacting LPs involved in diffusion, and a high diffusion rate leads to a high proportion of diffusion events relative to reaction events. In the second step, shown in Fig. 4, the relationship between the speedups of time warp for two different model sizes and the number of work cores involved is demonstrated. The speedup is calculated against the sequential execution of the spatial reaction-diffusion system model with the same model size and parameters using the NSM.
Fig. 4 shows the comparison of the speedup of time warp on a 64×64 grid and a 100×100 grid. In the case of the 64×64 grid, under the condition that only one node is used, the lowest speedup (slightly above 1) is achieved when two cores are involved, and the highest speedup (about 6) is achieved when 8 cores are involved. The influence of the number of cores used in parallel simulation is investigated. In most cases, a larger number of cores brings considerable improvements in the performance of parallel simulation. Also, comparing the two results in Fig. 4, the simulation of the larger model achieves better speedup. Combined with the time tests (Fig. 3), we find that the sequential simulator’s performance declines sharply when the model scale becomes very large, which makes the time warp simulator get better speed-up correspondingly.
Fig. 3: Execution time (wall clock time) of Seq. and time warp with respect to different model sizes (N=32, 64, 100, and 128) and model parameters based on a single computing node with 8 cores. Results of the test are grouped by the diffusion rates (Group 1: Sequential 1 and Time Warp 1, dPrey=2.5, dPredator=5.0; Group 2: Sequential 2 and Time Warp 2, dPrey=0.25, dPredator=0.5).
Fig. 4: Speedup of time warp with respect to the number of work cores and the model size (N=64 and 100). Work cores are chosen from one computing node. Diffusion rates are dPrey=2.5, dPredator=5.0 and dGrass=0.0.
V. Conclusion and Future Work
In this paper, a time warp simulator based on the discrete event simulation framework JAMES II is designed and implemented for fine-grained parallel and distributed discrete event spatial stochastic simulation of biological reaction systems. Several challenges have been overcome, such as state saving, rollback and especially GVT reduction in parallel execution of simulations. The Lotka-Volterra predator-prey system is chosen as the benchmark model to test the performance of our time warp simulator, and the best experiment results show that it can obtain a speed-up of about 6 over the sequential simulation. The domain this paper is concerned with is in its infancy, and many interesting issues are worthy of further investigation, e.g. there are many other excellent PDES optimistic synchronization algorithms (e.g. the BTW). As a next step, we would like to integrate some of them into JAMES II. In addition, Gillespie approximation methods (tau-leap [10] etc.) sacrifice some degree of precision for higher simulation speed, but still cannot address the spatial aspect of biological reaction systems. The combination of the spatial element and approximation methods would be very interesting and promising; however, the parallel execution of tau-leap methods will have to overcome many obstacles.
Acknowledgment
This work is supported by the National Natural Science Foundation of China (NSF) Grant (No.60773019) and the Ph.D. Programs Foundation of Ministry of Education of China (No. 200899980004). 
The authors would like to express their gratitude to Dr. Jan Himmelspach and Dr. Roland Ewald at the University of Rostock, Germany for their invaluable advice and kind help with JAMES II.
References
[1] H. Kitano, "Computational systems biology," Nature, vol. 420, no. 6912, pp. 206-210, November 2002.
[2] H. Kitano, "Systems biology: a brief overview," Science (New York, N.Y.), vol. 295, no. 5560, pp. 1662-1664, March 2002.
[3] A. Aderem, "Systems biology: Its practice and challenges," Cell, vol. 121, no. 4, pp. 511-513, May 2005. [Online]. Available: http://dx.doi.org/10.1016/j.cell.2005.04.020
[4] H. de Jong, "Modeling and simulation of genetic regulatory systems: A literature review," Journal of Computational Biology, vol. 9, no. 1, pp. 67-103, January 2002.
[5] C. W. Gardiner, Handbook of Stochastic Methods: for Physics, Chemistry and the Natural Sciences (Springer Series in Synergetics), 3rd ed. Springer, April 2004.
[6] D. T. Gillespie, "Simulation methods in systems biology," in Formal Methods for Computational Systems Biology, ser. Lecture Notes in Computer Science, M. Bernardo, P. Degano, and G. Zavattaro, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, vol. 5016, ch. 5, pp. 125-167.
[7] Y. Tao, Y. Jia, and G. T. Dewey, "Stochastic fluctuations in gene expression far from equilibrium: Omega expansion and linear noise approximation," The Journal of Chemical Physics, vol. 122, no. 12, 2005.
[8] D. T. Gillespie, "Exact stochastic simulation of coupled chemical reactions," Journal of Physical Chemistry, vol. 81, no. 25, pp. 2340-2361, December 1977.
[9] D. T. Gillespie, "Stochastic simulation of chemical kinetics," Annual Review of Physical Chemistry, vol. 58, no. 1, pp. 35-55, 2007.
[10] D. T. Gillespie, "Approximate accelerated stochastic simulation of chemically reacting systems," The Journal of Chemical Physics, vol. 115, no. 4, pp. 1716-1733, 2001.
[11] J. Himmelspach, R. Ewald, and A. M. Uhrmacher, "A flexible and scalable experimentation layer," in WSC '08: Proceedings of the 40th Conference on Winter Simulation. Winter Simulation Conference, 2008, pp. 827-835.
[12] J. Himmelspach and A. M. Uhrmacher, "Plug'n simulate," in 40th Annual Simulation Symposium (ANSS'07). Washington, DC, USA: IEEE, March 2007, pp. 137-143.
[13] R. Ewald, J. Himmelspach, M. Jeschke, S. Leye, and A. M. Uhrmacher, "Flexible experimentation in the modeling and simulation framework JAMES II - implications for computational systems biology," Brief Bioinform, vol. 11, no. 3, pp. bbp067-300, January 2010.
[14] A. Uhrmacher, J. Himmelspach, M. Jeschke, M. John, S. Leye, C. Maus, M. Röhl, and R. Ewald, "One modelling formalism & simulator is not enough! A perspective for computational biology based on JAMES II," in Formal Methods in Systems Biology, ser. Lecture Notes in Computer Science, J. Fisher, Ed. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, vol. 5054, ch. 9, pp. 123-138. [Online]. Available: http://dx.doi.org/10.1007/978-3-540-68413-8_9
[15] R. Ewald, J. Himmelspach, and A. M. Uhrmacher, "An algorithm selection approach for simulation systems," pads, vol. 0, pp. 91-98, 2008.
[16] B. Wang, J. Himmelspach, R. Ewald, Y. Yao, and A. M. Uhrmacher, "Experimental analysis of logical process simulation algorithms in JAMES II," in Proceedings of the Winter Simulation Conference, M. D. Rossetti, R. R. Hill, B. Johansson, A. Dunkin, and R. G. Ingalls, Eds. IEEE Computer Science, 2009, pp. 1167-1179.
[17] R. Ewald, J. Rössel, J. Himmelspach, and A. M. Uhrmacher, "A plug-in-based architecture for random number generation in simulation systems," in WSC '08: Proceedings of the 40th Conference on Winter Simulation. Winter Simulation Conference, 2008, pp. 836-844.
[18] J. Elf and M. Ehrenberg, "Spontaneous separation of bi-stable biochemical systems into spatial domains of opposite phases," Systems biology, vol. 1, no. 2, pp. 230-236, December 2004.
[19] K. Takahashi, S. Arjunan, and M. Tomita, "Space in systems biology of signaling pathways? Towards intracellular molecular crowding in silico," FEBS Letters, vol. 579, no. 8, pp. 1783-1788, March 2005.
[20] J. V. Rodriguez, J. A. Kaandorp, M. Dobrzynski, and J. G. Blom, "Spatial stochastic modelling of the phosphoenolpyruvate-dependent phosphotransferase (pts) pathway in escherichia coli," Bioinformatics, vol. 22, no. 15, pp. 1895-1901, August 2006.
[21] D. Ridgway, G. Broderick, and M. Ellison, "Accommodating space, time and randomness in network simulation," Current Opinion in Biotechnology, vol. 17, no. 5, pp. 493-498, October 2006.
[22] J. V. Rodriguez, J. A. Kaandorp, M. Dobrzynski, and J. G. Blom, "Spatial stochastic modelling of the phosphoenolpyruvate-dependent phosphotransferase (pts) pathway in escherichia coli," Bioinformatics, vol. 22, no. 15, pp. 1895-1901, August 2006.
[23] W. G. Wilson, A. M. Deroos, and E. Mccauley, "Spatial instabilities within the diffusive lotka-volterra system: Individual-based simulation results," Theoretical Population Biology, vol. 43, no. 1, pp. 91-127, February 1993.
[24] K. Kruse and J. Elf, "Kinetics in spatially extended systems," in System Modeling in Cellular Biology. From Concepts to Nuts and Bolts, Z. Szallasi, J. Stelling, and V. Periwal, Eds. Cambridge, MA: MIT Press, 2006, pp. 177-198.
[25] M. A. Gibson and J. Bruck, "Efficient exact stochastic simulation of chemical systems with many species and many channels," The Journal of Physical Chemistry A, vol. 104, no. 9, pp. 1876-1889, March 2000.
[26] R. M. Fujimoto, Parallel and Distributed Simulation Systems (Wiley Series on Parallel and Distributed Computing). Wiley-Interscience, January 2000.
[27] Y. Yao and Y. Zhang, "Solution for analytic simulation based on parallel processing," Journal of System Simulation, vol. 20, no. 24, pp. 6617-6621, 2008.
[28] G. Chen and B. K. Szymanski, "Dsim: scaling time warp to 1,033 processors," in WSC '05: Proceedings of the 37th conference on Winter simulation. Winter Simulation Conference, 2005, pp. 346-355.
[29] M. Jeschke, A. Park, R. Ewald, R. Fujimoto, and A. M. Uhrmacher, "Parallel and distributed spatial simulation of chemical reactions," in 2008 22nd Workshop on Principles of Advanced and Distributed Simulation. Washington, DC, USA: IEEE, June 2008, pp. 51-59.
[30] B. Wang, Y. Yao, Y. Zhao, B. Hou, and S. Peng, "Experimental analysis of optimistic synchronization algorithms for parallel simulation of reaction-diffusion systems," High Performance Computational Systems Biology, International Workshop on, vol. 0, pp. 91-100, October 2009.
[31] L. Dematté and T. Mazza, "On parallel stochastic simulation of diffusive systems," in Computational Methods in Systems Biology, M. Heiner and A. M. Uhrmacher, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, vol. 5307, ch. 16, pp. 191-210.
[32] D. R. Jefferson, "Virtual time," ACM Trans. Program. Lang. Syst., vol. 7, no. 3, pp. 404-425, July 1985.
[33] J. S. Steinman, "Breathing time warp," SIGSIM Simul. Dig., vol. 23, no. 1, pp. 109-118, July 1993. [Online]. Available: http://dx.doi.org/10.1145/174134.158473
[34] S. K. Park and K. W. Miller, "Random number generators: good ones are hard to find," Commun. ACM, vol. 31, no. 10, pp. 1192-1201, October 1988.
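As an illustration of the benchmark in Section IV of the entry above, the reactions R1 to R3 with r1 = 0.01, r2 = 0.01, r3 = 10 can be simulated for a single well-stirred sub-volume with Gillespie's direct method. This is only a minimal sketch: it has no diffusion, no sub-volume grid, and no time warp, so it is neither the NSM nor the authors' JAMES II simulator. The function name `simulate_lotka_volterra` and its parameters are hypothetical, and Grass is held constant following the stated assumption that the grass population remains stable.

```python
import random


def simulate_lotka_volterra(t_end, grass=1000, prey=1000, predator=1000,
                            r1=0.01, r2=0.01, r3=10.0, seed=0):
    """Gillespie direct method for one well-stirred sub-volume (hypothetical helper).

    R1: Grass + Prey -> 2 Prey         (Grass is kept constant, as assumed above)
    R2: Predator + Prey -> 2 Predator
    R3: Predator -> NULL
    Returns (time of last event, prey, predator, number of events).
    """
    rng = random.Random(seed)
    t = 0.0
    events = 0
    while True:
        # Propensities of the three reactions in the current state.
        a1 = r1 * grass * prey
        a2 = r2 * predator * prey
        a3 = r3 * predator
        a0 = a1 + a2 + a3
        if a0 == 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate a0.
        dt = rng.expovariate(a0)
        if t + dt > t_end:
            break
        t += dt
        # Choose which reaction fires, proportionally to its propensity.
        u = rng.random() * a0
        if u < a1:
            prey += 1
        elif u < a1 + a2:
            prey -= 1
            predator += 1
        else:
            predator -= 1
        events += 1
    return t, prey, predator, events


if __name__ == "__main__":
    print(simulate_lotka_volterra(t_end=0.01, seed=1))
```

With the initial populations of 1000, each of the three propensities starts at 10000, so even a very short simulated interval generates hundreds of events. This is consistent with the paper's remark that SSA-style simulation needs a huge number of steps.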
46

Marchet, Camille, Mael Kerbiriou, and Antoine Limasset. "BLight: efficient exact associative structure for k-mers." Bioinformatics, April 3, 2021. http://dx.doi.org/10.1093/bioinformatics/btab217.

Abstract:
Abstract Motivation A plethora of methods and applications share the fundamental need to associate information to words for high-throughput sequence analysis. Doing so for billions of k-mers is commonly a scalability problem, as exact associative indexes can be memory expensive. Recent works take advantage of overlaps between k-mers to address this challenge. Yet, existing data structures are either unable to associate information to k-mers or are not lightweight enough. Results We present BLight, a static and exact data structure able to associate unique identifiers to k-mers and determine their membership in a set without false positives, which scales to huge k-mer sets with a low memory cost. This index combines an extremely compact representation along with very fast queries. Besides, its construction is efficient and needs no additional memory. Our implementation indexes the k-mers from the human genome using 8 GB of RAM (23 bits per k-mer) within 10 min and the k-mers from the large axolotl genome using 63 GB of memory (27 bits per k-mer) within 76 min. Furthermore, while being memory efficient, the index provides a very high throughput: 1.4 million queries per second on a single CPU or 16.1 million using 12 cores. Finally, we also present how BLight can practically represent metagenomic and transcriptomic sequencing data to highlight its wide applicative range. Availability and implementation We wrote the BLight index as an open source C++ library under the AGPL3 license available at github.com/Malfoy/BLight. It is designed as a user-friendly library and comes along with code usage samples.
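To make the interface described in this abstract concrete, the sketch below assigns a unique identifier to every distinct k-mer and answers exact membership queries. It uses a plain Python dictionary, so it shows only what such an associative index offers and none of BLight's compact representation. The function names are hypothetical, and treating a k-mer and its reverse complement as the same key is an assumption made here, not something the abstract states.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement (assumed convention)."""
    return min(kmer, reverse_complement(kmer))


def build_index(sequences, k):
    """Map each distinct canonical k-mer to a unique identifier 0, 1, 2, ..."""
    index = {}
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            key = canonical(seq[i:i + k])
            if key not in index:
                index[key] = len(index)
    return index


def query(index, kmer):
    """Return the identifier of a k-mer, or None if it is absent (no false positives)."""
    return index.get(canonical(kmer))


if __name__ == "__main__":
    idx = build_index(["ACGTAC"], k=3)
    print(query(idx, "CGT"), query(idx, "AAA"))
```

A dictionary like this stores every k-mer key in full, which is exactly the memory cost that the abstract says exact associative indexes tend to incur at the scale of billions of k-mers.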
47

Benoit, Gaëtan, Mahendra Mariadassou, Stéphane Robin, Sophie Schbath, Pierre Peterlongo, and Claire Lemaitre. "SimkaMin: fast and resource frugal de novo comparative metagenomics." Bioinformatics, September 3, 2019. http://dx.doi.org/10.1093/bioinformatics/btz685.

Abstract:
Motivation: De novo comparative metagenomics is one of the most straightforward ways to analyze large sets of metagenomic data. The latest methods use the fraction of shared k-mers to estimate genomic similarity between read sets. However, those methods, while extremely efficient, are still limited by computational needs for practical usage outside of large computing facilities. Results: We present SimkaMin, a quick comparative metagenomics tool with low disk and memory footprints, thanks to an efficient data subsampling scheme used to estimate Bray-Curtis and Jaccard dissimilarities. One billion metagenomic reads can be analyzed in <3 min, with tiny memory (1.09 GB) and disk (≈0.3 GB) requirements and without altering the quality of the downstream comparative analyses, making SimkaMin a tool perfectly tailored for very large-scale metagenomic projects. Availability and implementation: https://github.com/GATB/simka. Contact: Claire.Lemaitre@inria.fr. Supplementary information: Supplementary data are available at Bioinformatics online.
48

Peng, Xiyu, and Karin S. Dorman. "AmpliCI: a high-resolution model-based approach for denoising Illumina amplicon data." Bioinformatics, July 22, 2020. http://dx.doi.org/10.1093/bioinformatics/btaa648.

Abstract:
Motivation: Next-generation amplicon sequencing is a powerful tool for investigating microbial communities. A main challenge is to distinguish true biological variants from errors caused by amplification and sequencing. In traditional analyses, such errors are eliminated by clustering reads within a sequence similarity threshold, usually 97%, and constructing operational taxonomic units, but the arbitrary threshold leads to low resolution and high false-positive rates. Recently developed ‘denoising’ methods have proven able to resolve single-nucleotide amplicon variants, but they still miss low-frequency sequences, especially those near more frequent sequences, because they ignore the sequencing quality information. Results: We introduce AmpliCI, a reference-free, model-based method for rapidly resolving the number, abundance and identity of error-free sequences in massive Illumina amplicon datasets. AmpliCI considers the quality information and allows the data, not an arbitrary threshold or an external database, to drive conclusions. AmpliCI estimates a finite mixture model, using a greedy strategy to gradually select error-free sequences and approximately maximize the likelihood. AmpliCI has better performance than three popular denoising methods, with acceptable computation time and memory usage. Availability and implementation: Source code is available at https://github.com/DormanLab/AmpliCI. Supplementary information: Supplementary material is available at Bioinformatics online.
49

Kim, Jeremie S., Can Firtina, Meryem Banu Cavlak, Damla Senol Cali, Can Alkan, and Onur Mutlu. "FastRemap: A Tool for Quickly Remapping Reads between Genome Assemblies." Bioinformatics, August 17, 2022. http://dx.doi.org/10.1093/bioinformatics/btac554.

Abstract:
Motivation: A genome read data set can be quickly and efficiently remapped from one reference to another similar reference (e.g., between two reference versions or two similar species) using a variety of tools, e.g., the commonly used CrossMap tool. With the explosion of available genomic data sets and references, high-performance remapping tools will be even more important for keeping up with the computational demands of genome assembly and analysis. Results: We provide FastRemap, a fast and efficient tool for remapping reads between genome assemblies. FastRemap provides up to a 7.19× speedup (5.97×, on average) and uses as little as 61.7% (80.7%, on average) of the peak memory consumption compared to the state-of-the-art remapping tool, CrossMap. Availability: FastRemap is written in C++. Source code and user manual are freely available at github.com/CMU-SAFARI/FastRemap. A Docker image is available at https://hub.docker.com/r/alkanlab/fast. FastRemap is also available in Bioconda.
50

Lu, Chengqian, Min Zeng, Fang-Xiang Wu, Min Li, and Jianxin Wang. "Improving circRNA-disease association prediction by sequence and ontology representations with convolutional and recurrent neural networks." Bioinformatics, December 26, 2020. http://dx.doi.org/10.1093/bioinformatics/btaa1077.

Abstract:
Motivation: Emerging studies indicate that circular RNAs (circRNAs) are widely involved in the progression of human diseases. Owing to their stable structure, circRNAs are promising diagnostic and prognostic biomarkers for diseases. However, the experimental verification of circRNA-disease associations is expensive and limited to small scale. Effective computational methods for predicting potential circRNA-disease associations are regarded as a matter of urgency. Although several models have been proposed, over-reliance on known associations and the absence of characteristics of biological functions make precise predictions still challenging. Results: In this study, we propose a method for predicting CircRNA-Disease Associations based on Sequence and Ontology Representations, named CDASOR, with convolutional and recurrent neural networks. For sequences of circRNAs, we encode them with continuous k-mers, get low-dimensional vectors of k-mers, extract their local feature vectors with a 1D CNN and learn their long-term dependencies with bi-directional long short-term memory. For diseases, we serialize the disease ontology into sentences containing the hierarchy of the ontology, obtain low-dimensional vectors for disease ontology terms and get the terms’ dependencies. Furthermore, we get association patterns of circRNAs and diseases from known circRNA-disease associations with neural networks. After the above steps, we get high-level representations of circRNAs and diseases, which are informative for improving the prediction. The experimental results show that CDASOR provides an accurate prediction. By importing the characteristics of biological functions, CDASOR achieves impressive predictions in the de novo test. In addition, 6 of the top-10 predicted results are verified by the published literature in the case studies. Availability: The code of CDASOR is freely available at https://github.com/BioinformaticsCSU/CDASOR. Supplementary information: Supplementary data are available at Bioinformatics online.