Journal articles on the topic 'GENOMIC LANGUAGE PROCESSING'

Consult the top 50 journal articles for your research on the topic 'GENOMIC LANGUAGE PROCESSING.'

1

Routhier, Etienne, and Julien Mozziconacci. "Genomics enters the deep learning era." PeerJ 10 (June 24, 2022): e13613. http://dx.doi.org/10.7717/peerj.13613.

Abstract:
The tremendous amount of biological sequence data available, combined with the recent methodological breakthrough in deep learning in domains such as computer vision or natural language processing, is leading today to the transformation of bioinformatics through the emergence of deep genomics, the application of deep learning to genomic sequences. We review here the new applications that the use of deep learning enables in the field, focusing on three aspects: the functional annotation of genomes, the sequence determinants of the genome functions and the possibility to write synthetic genomic sequences.
2

Kehl, Kenneth L., Wenxin Xu, Eva Lepisto, Haitham Elmarakeby, Michael J. Hassett, Eliezer M. Van Allen, Bruce E. Johnson, and Deborah Schrag. "Natural Language Processing to Ascertain Cancer Outcomes From Medical Oncologist Notes." JCO Clinical Cancer Informatics, no. 4 (September 2020): 680–90. http://dx.doi.org/10.1200/cci.20.00020.

Abstract:
PURPOSE: Cancer research using electronic health records and genomic data sets requires clinical outcomes data, which may be recorded only in unstructured text by treating oncologists. Natural language processing (NLP) could substantially accelerate extraction of this information. METHODS: Patients with lung cancer who had tumor sequencing as part of a single-institution precision oncology study from 2013 to 2018 were identified. Medical oncologists’ progress notes for these patients were reviewed. For each note, curators recorded whether the assessment/plan indicated any cancer, progression/worsening of disease, and/or response to therapy or improving disease. Next, a recurrent neural network was trained using unlabeled notes to extract the assessment/plan from each note. Finally, convolutional neural networks were trained on labeled assessments/plans to predict the probability that each curated outcome was present. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC) among a held-out test set of 10% of patients. Associations between curated response or progression end points and overall survival were measured using Cox models among patients receiving palliative-intent systemic therapy. RESULTS: Medical oncologist notes (n = 7,597) were manually curated for 919 patients. In the 10% test set, NLP models replicated human curation with AUROCs of 0.94 for the any-cancer outcome, 0.86 for the progression outcome, and 0.90 for the response outcome. Progression/worsening events identified using NLP models were associated with shortened survival (hazard ratio [HR] for mortality, 2.49; 95% CI, 2.00 to 3.09); response/improvement events were associated with improved survival (HR, 0.45; 95% CI, 0.30 to 0.67). CONCLUSION: NLP models based on neural networks can extract meaningful outcomes from oncologist notes at scale. Such models may facilitate identification of clinical and genomic features associated with response to cancer treatment.
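Note on the metric: the abstract above reports AUROC on a held-out test set. As a minimal sketch of what that number measures, the snippet below computes AUROC as the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half. The function name `auroc` and the toy labels and scores are illustrative assumptions and are not taken from the cited study.

```python
def auroc(labels, scores):
    """Rank-based AUROC: the probability that a randomly chosen positive
    example is scored above a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = outcome present in the note, 0 = absent.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1]
print(round(auroc(labels, scores), 3))
```

In this toy data, eight of the nine positive/negative pairs are ordered correctly. The pairwise loop is quadratic in the number of examples, so a sort-based implementation from an established library would be preferable for real test sets.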
3

Schubert, Michael. "clustermq enables efficient parallelization of genomic analyses." Bioinformatics 35, no. 21 (May 27, 2019): 4493–95. http://dx.doi.org/10.1093/bioinformatics/btz284.

Abstract:
Motivation: High performance computing (HPC) clusters play a pivotal role in large-scale bioinformatics analysis and modeling. For the statistical computing language R, packages exist to enable a user to submit their analyses as jobs on HPC schedulers. However, these packages do not scale well to high numbers of tasks, and their processing overhead quickly becomes a prohibitive bottleneck. Results: Here we present clustermq, an R package that can process analyses up to three orders of magnitude faster than previously published alternatives. We show this for investigating genomic associations of drug sensitivity in cancer cell lines, but it can be applied to any kind of parallelizable workflow. Availability and implementation: The package is available on CRAN and at https://github.com/mschubert/clustermq. Code for performance testing is available at https://github.com/mschubert/clustermq-performance. Supplementary information: Supplementary data are available at Bioinformatics online.
4

Le Guen, Yann, François Leroy, Cathy Philippe, Jean-François Mangin, Ghislaine Dehaene-Lambertz, and Vincent Frouin. "Enhancer Locus in ch14q23.1 Modulates Brain Asymmetric Temporal Regions Involved in Language Processing." Cerebral Cortex 30, no. 10 (May 20, 2020): 5322–32. http://dx.doi.org/10.1093/cercor/bhaa112.

Abstract:
Identifying the genes that contribute to the variability in brain regions involved in language processing may shed light on the evolution of brain structures essential to the emergence of language in Homo sapiens. The superior temporal asymmetrical pit (STAP), which is not observed in chimpanzees, represents an ideal phenotype to investigate the genetic variations that support human communication. The left STAP depth was significantly associated with a predicted enhancer annotation located in the 14q23.1 locus, between DACT1 and KIAA0586, in the UK Biobank British discovery sample (N = 16 515). This association was replicated in the IMAGEN cohort (N = 1726) and the UK Biobank non-British validation sample (N = 2161). This genomic region was also associated to a lesser extent with the right STAP depth and the formation of sulcal interruptions, “plis de passage,” in the bilateral STAP but not with other structural brain MRI phenotypes, highlighting its notable association with the superior temporal regions. Diffusion MRI emphasized an association with the fractional anisotropy of the left auditory fibers of the corpus callosum and with networks involved in linguistic processing in resting-state functional MRI. Overall, this evidence demonstrates a specific relationship between this locus and the establishment of the superior temporal regions that support human communication.
5

Konstantinidis, George, Adriane Chapman, Mark J. Weal, Ahmed Alzubaidi, Lisa M. Ballard, and Anneke M. Lucassen. "The Need for Machine-Processable Agreements in Health Data Management." Algorithms 13, no. 4 (April 7, 2020): 87. http://dx.doi.org/10.3390/a13040087.

Abstract:
Data processing agreements in health data management are laid out by organisations in monolithic “Terms and Conditions” documents written in natural legal language. These top-down policies usually protect the interest of the service providers, rather than the data owners. They are coarse-grained and do not allow for more than a few opt-in or opt-out options for individuals to express their consent on personal data processing, and these options often do not transfer to software as they were intended to. In this paper, we study the problem of health data sharing and we advocate the need for individuals to describe their personal contract of data usage in a formal, machine-processable language. We develop an application for sharing patient genomic information and test results, and use interactions with patients and clinicians in order to identify the particular peculiarities a privacy/policy/consent language should offer in this complicated domain. We present how Semantic Web technologies can have a central role in this approach by providing the formal tools and features required in such a language. We present our ongoing approach to construct an ontology-based framework and a policy language that allows patients and clinicians to express fine-grained consent, preferences or suggestions on sharing medical information. Our language offers unique features such as multi-party ownership of data or data sharing dependencies. We evaluate the landscape of policy languages from different areas, and show how they are lacking major requirements needed in health data management. In addition to enabling patients, our approach helps organisations increase technological capabilities, abide by legal requirements, and save resources.
6

Guan, Meijian, Samuel Cho, Robin Petro, Wei Zhang, Boris Pasche, and Umit Topaloglu. "Natural language processing and recurrent network models for identifying genomic mutation-associated cancer treatment change from patient progress notes." JAMIA Open 2, no. 1 (January 3, 2019): 139–49. http://dx.doi.org/10.1093/jamiaopen/ooy061.

Abstract:
Objectives: Natural language processing (NLP) and machine learning approaches were used to build classifiers to identify genomic-related treatment changes in the free-text visit progress notes of cancer patients. Methods: We obtained 5889 deidentified progress reports (2439 words on average) for 755 cancer patients who had undergone clinical next-generation sequencing (NGS) testing at Wake Forest Baptist Comprehensive Cancer Center for our data analyses. An NLP system was implemented to process the free-text data and extract NGS-related information. Three types of recurrent neural network (RNN), namely gated recurrent unit, long short-term memory (LSTM), and bidirectional LSTM (LSTM_Bi), were applied to classify documents into the treatment-change and no-treatment-change groups. Further, we compared the performance of the RNNs with that of 5 machine learning algorithms: Naive Bayes, K-nearest Neighbor, Support Vector Machine for classification, Random Forest, and Logistic Regression. Results: Our results suggested that, overall, RNNs outperformed traditional machine learning algorithms, and LSTM_Bi showed the best performance among the RNNs in terms of accuracy, precision, recall, and F1 score. In addition, pretrained word embedding can improve the accuracy of LSTM by 3.4% and reduce the training time by more than 60%. Discussion and Conclusion: NLP and RNN-based text mining solutions have demonstrated advantages in information retrieval and document classification tasks for unstructured clinical progress notes.
7

Miyano, Satoru. "IL-3 Changing Cancer Genomics and Cancer Genomic Medicine by Artificial Intelligence and Large-Scale Data Analysis." Neuro-Oncology Advances 3, Supplement_6 (December 1, 2021): vi1. http://dx.doi.org/10.1093/noajnl/vdab159.002.

Abstract:
In MEXT Program for Scientific Research on Innovative Areas “Systems Cancer” and “Systems Cancer in Neo-Dimension” (2010-2019), we developed a large-scale genome data analysis pipeline called Genomon in collaboration with Professor Seiji Ogawa (Kyoto University). Our efforts successfully produced innovative results on cancer genomics. This system is implemented on the supercomputers SHIROKANE and FUGAKU. One of the contributions unraveled the overall picture of genetic abnormalities in malignant brain tumors (Mutational landscape and clonal architecture in grade II and III gliomas. Nat Genet 2015) that exploited Genomon on SHIROKANE. However, with the spread of new measurement technology and new computing environments, no one thinks that the future can be figured on this simple extension. On the other hand, for cancer genomic medicine, Institute of Medical Science University of Tokyo made a research team analyzing whole genome sequences. The challenge we faced was to transform thousands to millions of genomic aberrations per case into precision medicine. It is what we now call “digital transformation.” IBM’s Watson for Genomics was introduced for our research purpose. In the process, we identified the effectiveness of AI, the indispensability of specialist intervention, and bottlenecks. We recognized that natural language processing technology such as BERT and Google Knowledge Graph AI technology will open up the future. Automatic document creation is also a realistic issue. Cancer research is getting more difficult and larger in scale. For example, analysis of genomic data from 60,954 cases revealed a new underlying mechanism in which multiple mutations within the same oncogene synergistically work (Nature 2021). AI with an accuracy of X% does not seem to be the goal. What is needed is not a black box, but explainable AI that explains “why” in a human-understandable way. We are currently conducting research with Fujitsu Laboratories for this direction.
8

Garzon, Max H., Kiran C. Bobba, Andrew Neel, and Vinhthuy Phan. "DNA-Based Indexing." International Journal of Nanotechnology and Molecular Computation 2, no. 3 (July 2010): 25–45. http://dx.doi.org/10.4018/jnmc.2010070102.

Abstract:
DNA has been acknowledged as a suitable medium for massively parallel computing and as a “smart” glue for self-assembly. In this paper, a third capability of DNA is described in detail as memory capable of encoding and processing large amounts of data so that information can be retrieved associatively based on content. The technique is based on a novel representation of data on DNA that can shed light on the way DNA, RNA, and other biomolecules encode information, which may be potentially important in applications to fields like bioinformatics and genetics, and natural language processing. Analyses are also provided of the sensitivity, robustness, and bounds on the theoretical capacity of the memories. Finally, the potential use of the memories is illustrated with two applications, one in genomic analysis for identification and classification, another in information retrieval from text data in abiotic form.
9

Klein, Harry, Tali Mazor, Matthew Galvin, Jason Hansel, Emily Mallaber, Pavel Trukhanov, Joyce Yu, et al. "Abstract 1067: MatchMiner: An open-source AI precision medicine trial matching platform." Cancer Research 83, no. 7_Supplement (April 4, 2023): 1067. http://dx.doi.org/10.1158/1538-7445.am2023-1067.

Abstract:
As the number of precision medicine (PM) trials and patient genomic data has grown, it has become challenging for clinicians and trial staff to identify PM trial options for patients. Several trial matching software platforms have been developed to match genomic data from patients with PM trials, but these existing platforms are proprietary and are not easily accessible for adoption by institutions. At Dana-Farber Cancer Institute (DFCI), we have addressed this challenge by developing our own open-source institutional trial matching software, MatchMiner. MatchMiner algorithmically matches patient genomic and clinical data with PM trial eligibility data. Trial eligibility data is manually curated into a human-readable markup language, called clinical trial markup language (CTML), for matching with patient genomic data. MatchMiner has 2 main modes of clinical use: (1) patient-centric, where clinicians search for trial matches for individual patients and (2) trial-centric, where trial staff identify patients that match their trial’s genomic eligibility. We recently described MatchMiner’s usage at DFCI and since our report, we have added 90 additional trial consents facilitated by MatchMiner (>250 trial consents, called MatchMiner consents [MMC]). Here, we describe new characteristics of our MMC including which user mode (patient-centric or trial-centric) was used to match the consent, genomic alterations and cancer types that matched to eligibility criteria, and whether the patient went onto trial. MMCs were mostly identified by patient-centric mode (70%), genomic alterations and cancer types among MMC were diverse (n=55 genes and n=20 cancer types), and 87% of MMC went on trial. Among MMCs, the most common altered genes leading to trial eligibility were ERBB2 and KRAS in breast cancer and lung cancer, which is consistent with the number of therapies targeting ERBB2 and KRAS. MMCs also included patients with rare cancer types, like extraskeletal myxoid chondrosarcoma, as well as rare genomic alterations, such as NTRK fusions. Thus, MatchMiner has been successful at facilitating PM trial matching for a broad range of genomic alterations and cancer types at DFCI. MatchMiner matches patients to trials as soon as their genomic report is available; however, many patients are not yet ready to enroll onto a trial because their cancer is responding to the standard of care or they are in a remission period. To address this problem, we are evaluating the use of artificial intelligence (AI) to identify patients that may be ready for a new treatment option. After trial matches have been generated by MatchMiner, radiology scan text from patients’ tumor scans is run through a natural language processing (NLP) model to identify patients who are more likely to be ready to enroll onto a trial. By using NLP to filter trial matches, we hope to improve MatchMiner’s efficiency of finding trial matches and provide more timely trial options for patients. Citation Format: Harry Klein, Tali Mazor, Matthew Galvin, Jason Hansel, Emily Mallaber, Pavel Trukhanov, Joyce Yu, James Lindsay, Kenneth Kehl, Michael Hassett, Ethan Cerami. MatchMiner: An open-source AI precision medicine trial matching platform [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 1067.
10

Graves, Jordan, Jacob Byerly, Eduardo Priego, Naren Makkapati, S. Vince Parish, Brenda Medellin, and Monica Berrondo. "A Review of Deep Learning Methods for Antibodies." Antibodies 9, no. 2 (April 28, 2020): 12. http://dx.doi.org/10.3390/antib9020012.

Abstract:
Driven by its successes across domains such as computer vision and natural language processing, deep learning has recently entered the field of biology by aiding in cellular image classification, finding genomic connections, and advancing drug discovery. In drug discovery and protein engineering, a major goal is to design a molecule that will perform a useful function as a therapeutic drug. Typically, the focus has been on small molecules, but new approaches have been developed to apply these same principles of deep learning to biologics, such as antibodies. Here we give a brief background of deep learning as it applies to antibody drug development, and an in-depth explanation of several deep learning algorithms that have been proposed to solve aspects of both protein design in general, and antibody design in particular.
11

Jee, Justin, Anisha Luthra, Christopher Fong, Karl Pichotta, Thinh Tran, Mirella Altoe, Alex Miller, et al. "Large-scale clinicogenomic models of solid tumor CNS metastasis." Journal of Clinical Oncology 41, no. 16_suppl (June 1, 2023): 2037. http://dx.doi.org/10.1200/jco.2023.41.16_suppl.2037.

Abstract:
Background: Central nervous system (CNS) metastasis is a major cause of cancer death and morbidity, but the clinicogenomic covariates of CNS metastasis have been studied in small cohorts. We sought to i) determine whether models predicting patient time to CNS metastasis (ttCNS) trained on a large, automatically annotated clinicogenomic dataset could stratify ttCNS risk in an external, manually curated cohort and ii) use these data to study the genomic risk factors for metastasis at scale. Methods: We leveraged the AACR Project GENIE Biopharma Collaborative (BPC), a structured curation of electronic health records at four cancer centers using the PRISSMM method, to train natural language processing (NLP) algorithms to annotate metastatic sites from radiology reports. We applied these algorithms to all reports for MSK patients with tumor sequencing with our FDA-authorized targeted sequencing platform. We used the resulting clinicogenomic data to train random survival forests (RSF) to predict radiographically confirmed ttCNS from time of sample acquisition for patients with non-small cell lung (NSCLC, N = 7,263), breast (BRC, N = 5,195; HR+ N = 4,050, HER2+ N = 879, triple-negative (TNBC) N = 866), and colorectal cancer (CRC, N = 4,320) using stage, gene-level pathogenic alterations, pre-existing metastatic sites, histopathology, prior and current treatment, and patient demographics as variables, excluding those reaching the endpoint prior to sample acquisition. We also predicted time to bone, liver, and adrenal metastases. RSFs were validated in the manually curated, non-MSK BPC cohort. Results: RSFs had predictive power for ttCNS in validation datasets (NSCLC c-index: 0.66, BRC: 0.71 (HR+ only: 0.71, HER2+ only: 0.69, TNBC: 0.62), CRC: 0.67, all p < 0.001). Pre-existing metastatic involvement, and genomic, histopathologic and clinical features had non-overlapping information for predicting ttCNS. We explored genomic covariates of ttCNS and other sites using Cox proportional hazards models adjusted for disease stage. Within individual cancer types, the hazard ratios of gene-level changes leading to the four considered sites of metastasis were correlated (Pearson R = 0.71-0.98); in all cancer types the highest correlations were between ttCNS and ttAdrenal metastases. Across cancer types, genomic alterations leading to metastatic sites were less correlated (R = -0.22-0.48). For example, CDKN2A/B and MYC alterations shortened ttCNS in NSCLC and HR+ BRC but not in HR- BRC. PTEN was associated with shortened ttCNS in TNBC and NSCLC but not CRC and other breast subtypes. Conclusions: Automatically annotated cohorts provide a means of studying drivers of metastasis at scale. Pre-existing non-CNS sites are associated with shorter ttCNS. Genomic alterations predisposing to CNS metastases frequently predispose to other organ metastases, although in general the genomics of organotropism are highly cancer-specific.
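Note on the metric: the c-index values quoted above summarize how well predicted risk orders patients by time to event. The sketch below shows one common form, Harrell's concordance index, under simplifying assumptions: a pair of patients is comparable only when the one with the strictly shorter time actually had the event, and tied risks count as half. The function name `concordance_index` and the toy data are hypothetical, and nothing here reproduces the cited study's random survival forest pipeline.

```python
def concordance_index(times, events, risks):
    """Harrell's c-index for right-censored data.

    times  : observed follow-up time per patient
    events : 1 if the event was observed, 0 if the patient was censored
    risks  : predicted risk score (higher = event expected sooner)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: the third patient is censored at time 12.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. In the toy cohort, four of the five comparable pairs are ordered correctly.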
12

Kelley, Michael J., and Julie Ann Lynch. "Tools to accurately identify veterans who undergo molecular diagnostic testing." Journal of Clinical Oncology 31, no. 31_suppl (November 1, 2013): 201. http://dx.doi.org/10.1200/jco.2013.31.31_suppl.201.

Abstract:
Background: Yearly, 52,000 veterans are diagnosed with cancer. Minorities represent 18.5% (9,651) of these patients. The rapid and explosive growth of molecular diagnostics (MDx) has created challenges for large healthcare systems such as the Veterans Health Administration (VA) to integrate genomic data into the EMR. Yet, this is an important component of high quality cancer care. In 2011, the VA cancer registry (VACCR) began reporting molecular data. This presentation will describe one project to improve integration of genomics into the EMR and the VACCR. Methods: Using ICD diagnosis codes, we identified veterans diagnosed in 2011 with brain, breast, colon, gastrointestinal stromal, lung, and melanoma cancers. Administrative data was obtained to identify MDx testing. These data were then compared to random chart audits. Significant discrepancies between these sources of data prompted collaboration with national and proprietary reference labs (ARUP, LabCorp, Quest, Genomic Health) to obtain the volume of testing by each VAMC. These data were used to conduct targeted chart reviews to identify the location, processes of care, free and structured text information. These data informed the development of natural language processing (NLP) tools to automatically identify patients that underwent testing. Results: Laboratories had the most accurate source of data. Data from ARUP, Quest, and LabCorp identified a significantly higher volume of testing than reported by administrative data. Applying NLP tools to patients diagnosed with breast cancer identified 44 of the 116 tests ordered for the 21-gene risk score tests. Conclusions: Decision support systems are needed to link tumor SNOMED code to diagnostic testing. Until systems are developed, collaborations with reference labs may be an effective method for identifying molecular data. NLP tools may also serve as an adjunct method for capturing MDx tests ordered from smaller labs.
13

Magge, Arjun, Davy Weissenbacher, Karen O’Connor, Tasnia Tahsin, Graciela Gonzalez-Hernandez, and Matthew Scotch. "GeoBoost2: a natural language processing pipeline for GenBank metadata enrichment for virus phylogeography." Bioinformatics 36, no. 20 (July 19, 2020): 5120–21. http://dx.doi.org/10.1093/bioinformatics/btaa647.

Abstract:
Summary: We present GeoBoost2, a natural language processing pipeline for extracting the location of infected hosts for enriching metadata in nucleotide sequence repositories like National Center of Biotechnology Information’s GenBank for downstream analysis including phylogeography and genomic epidemiology. The increasing number of pathogen sequences requires complementary information extraction methods for focused research, including surveillance within countries and between borders. In this article, we describe the enhancements from our earlier release including improvement in end-to-end extraction performance and speed, availability of a fully functional web-interface and state-of-the-art methods for location extraction using deep learning. Availability and implementation: Application is freely available on the web at https://zodo.asu.edu/geoboost2. Source code, usage examples and annotated data for GeoBoost2 are freely available at https://github.com/ZooPhy/geoboost2. Supplementary information: Supplementary data are available at Bioinformatics online.
14

Tran, Thinh N., Karl B. Pichotta, Si-Yang Liu, Christopher Fong, Anisha Luthra, Brooke Mastrogiacomo, Steven Maron, et al. "Abstract 4259: Identification of anti-neoplastic therapy given before initial visit at a referral center using natural language processing applied to medical oncology initial consultation notes." Cancer Research 83, no. 7_Supplement (April 4, 2023): 4259. http://dx.doi.org/10.1158/1538-7445.am2023-4259.

Abstract:
Anticancer therapy changes tumor physiology and genomics, making it a key variable in cancer studies. Although antineoplastics given at a single institution may be available in research-ready format, treatment at external institutions prior to receiving care at academic medical centers, common among patients at these centers, is often only described in free-text clinical notes, necessitating manual curation for downstream analysis. To overcome this bottleneck, we trained and validated natural language processing (NLP) models using initial consult notes to identify whether patients had received treatment at external institutions and studied the impact of these putative treatments on tumor genomics. Training data were derived from the AACR Project GENIE Biopharma Collaborative (BPC) for 2,663 patients at Memorial Sloan Kettering (MSK) across four cancer types. For each patient, we selected initial visits with medical and radiation oncologists based on an a priori note prioritization scheme and determined “ground-truth” prior external medications based on manually curated BPC administration records, whitelisting MSK-given medications. We trained logistic regression and clinical longformer models to identify external treatment receipt and evaluated model performance with 5-fold cross-validation. The clinical longformer model performed best across evaluation metrics, with an average area under the receiver operating characteristic curve of 0.972, macro-averaged precision/recall of 0.854/0.902 and macro-averaged F1 score of 0.876. Re-review of discrepant cases suggested that 75% of “false positives” may be due to curation error. We used our model to infer treatment status in a pan-cancer cohort with tumor genomic profiling using our institutional sequencing platform. Out of 48,447 patients, 11,900 were predicted to have received external treatment. Patients with putative external treatment had higher alteration frequencies in resistance-related genes than untreated patients and comparable to known pre-treated patients, including ESR1 in patients with breast cancer, AR in patients with prostate cancer, and EGFR T790M in patients with EGFR-mutated non-small cell lung cancer. Patients with putative external treatments, similar to known pre-treated patients, had shorter survival compared to treatment-naïve patients of the same cancer type. NLP can abstract external treatment status from clinical notes. When applied at scale, our model could help mitigate confounding variables and identify relationships between clinicogenomic variables and anticancer therapy. Citation Format: Thinh N. Tran, Karl B. Pichotta, Si-Yang Liu, Christopher Fong, Anisha Luthra, Brooke Mastrogiacomo, Steven Maron, Deborah Schrag, Sohrab P. Shah, Pedram Razavi, Bob T. Li, Gregory J. Riely, Nikolaus Schultz, Justin Jee. Identification of anti-neoplastic therapy given before initial visit at a referral center using natural language processing applied to medical oncology initial consultation notes. [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 4259.
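Note on the metrics: the macro-averaged precision, recall, and F1 scores reported above are unweighted means over classes, so the rarer class counts as much as the common one. The sketch below illustrates that averaging. It assumes macro F1 is the mean of the per-class F1 scores, which is one common convention (the abstract does not state which one was used). The function name `macro_prf` and the toy labels are hypothetical.

```python
def macro_prf(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) over all classes.

    Each class is scored one-vs-rest, then the per-class values are averaged
    without weighting by class frequency.
    """
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical toy labels: 1 = prior external treatment, 0 = none.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
precision, recall, f1 = macro_prf(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Returning 0.0 for a class with no predictions or no true examples is a design choice made here to avoid division by zero; libraries differ on how they handle that case.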
15

Choi, Sanghyuk Roy, and Minhyeok Lee. "Transformer Architecture and Attention Mechanisms in Genome Data Analysis: A Comprehensive Review." Biology 12, no. 7 (July 22, 2023): 1033. http://dx.doi.org/10.3390/biology12071033.

Abstract:
The emergence and rapid development of deep learning, specifically transformer-based architectures and attention mechanisms, have had transformative implications across several domains, including bioinformatics and genome data analysis. The analogous nature of genome sequences to language texts has enabled the application of techniques that have exhibited success in fields ranging from natural language processing to genomic data. This review provides a comprehensive analysis of the most recent advancements in the application of transformer architectures and attention mechanisms to genome and transcriptome data. The focus of this review is on the critical evaluation of these techniques, discussing their advantages and limitations in the context of genome data analysis. With the swift pace of development in deep learning methodologies, it becomes vital to continually assess and reflect on the current standing and future direction of the research. Therefore, this review aims to serve as a timely resource for both seasoned researchers and newcomers, offering a panoramic view of the recent advancements and elucidating the state-of-the-art applications in the field. Furthermore, this review paper serves to highlight potential areas of future investigation by critically evaluating studies from 2019 to 2023, thereby acting as a stepping-stone for further research endeavors.
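Note on the technique: the attention mechanism that this review surveys can be illustrated on a toy DNA sequence. The sketch below one-hot encodes the nucleotides and applies scaled dot-product self-attention, softmax(QK^T / sqrt(d)) V, using the encoded sequence directly as queries, keys, and values. That is a deliberate simplification: real transformer models use learned projections, multiple heads, and positional information, none of which appear here. All names (`one_hot`, `self_attention`) and the example sequence are illustrative assumptions, not drawn from the review.

```python
import math

NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Encode each base as a 4-dimensional indicator vector."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (no learned weights)."""
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, x)) for c in range(d)])
    return outputs, weights


outputs, weights = self_attention(one_hot("ACGA"))
for row in weights:
    print([round(w, 3) for w in row])
```

With this untrained setup, each position attends most strongly to positions carrying the same base, so the two A positions receive equal and highest weight from each other. Learned projections are what allow a trained model to capture less trivial dependencies.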
16

Yandell, Mark D., and William H. Majoros. "Genomics and natural language processing." Nature Reviews Genetics 3, no. 8 (August 2002): 601–10. http://dx.doi.org/10.1038/nrg861.

17

Courdy, Samir, Mark Hulse, Sorena Nadaf, Allen Mao, Alex Pozhitkov, Stacy Berger, Jack Chang, et al. "The City of Hope POSEIDON enterprise-wide platform for real-world data and evidence in cancer." Journal of Clinical Oncology 39, no. 15_suppl (May 20, 2021): e18813-e18813. http://dx.doi.org/10.1200/jco.2021.39.15_suppl.e18813.

Abstract:
e18813 Background: The City of Hope Center for Precision Medicine developed an enterprise-wide platform and precision medicine program to unlock the research potential and clinical value of complex and unique datasets by combining patient data with comprehensive genomic profiling and proprietary analytics. POSEIDON (Precision Oncology Software Environment Interoperable Data Ontologies Network) is a secure, cloud-based Oncology Insights Engine enabling exploration, analysis, visualization, and collaboration on our patient clinico-genomic data along with public data sources. This platform enables investigators to access and visualize data from clinical and multi-omics data and provides an engine that can be utilized for cohort discovery and exploration, preliminary feasibility testing to deriving patient specific insights based on real world data (RWD) and real-world evidence (RWE). Patients are consented through an IRB-approved protocol with active, opt-in participation. Methods: The POSEIDON Common Data Model (PCDM) is a standard, extensible data schema that incorporates patient data to support Precision Medicine. Data are incorporated from disparate data sources and stored in a combined harmonized manner promoting consistency of data and meaning across downstream applications. A multi-step process was created to capture and structure multiple data types into the PCDM. Natural language processing (NLP) tools are deployed to automate and structure valuable data elements from unstructured documents including pathology reports and clinical notes. NLP augmented software tools were developed to assist manual data abstractors to capture more complex terms and disease specific data elements which can include disease progression, progression free survival, and other outcomes. 
Results: Comprehensive data from 175,000 City of Hope patients are included within this environment for cohort exploration, longitudinal follow-up, outcomes, hypothesis development, and queries for synthetic controls. Data from disease specific-research registries constitute a rich dataset within POSEIDON by disease and tumor type, including lung cancer, colorectal cancer, breast cancer, leukemia, lymphoma and multiple myeloma, among other disease types. Automated genomic workflows were created to gain access to genomic profiling and whole exome sequencing. Genomic data is associated with the clinical data in the PCDM. Automated data flows from the Enterprise Data Warehouse EDW include data that is captured in discrete formats in the EDW and provided for in the PCDM and further enrich the data that flows from the disease registries. Statistically rigorous methods for de-identification are applied for collaborative studies. Conclusions: The City of Hope Center for Precision Medicine and the POSEIDON platform offer an exceptional resource for collaborative RWD & RWE studies.
18

Piening, Brian, Bela Bapat, Roshanthi K. Weerasinghe, Ryan Meng, Alexa K. Dowdell, Shu-Ching Chang, Ann Vita, et al. "Improved outcomes from reflex comprehensive genomic profiling-guided precision therapeutic selection across a major US healthcare system." Journal of Clinical Oncology 41, no. 16_suppl (June 1, 2023): 6622. http://dx.doi.org/10.1200/jco.2023.41.16_suppl.6622.

Abstract:
6622 Background: An ever-increasing number of biomarker-guided therapies, some with pan-cancer indications have expanded the oncologist’s toolbox for fighting advanced cancer. Despite this, not all advanced cancer patients receive tumor genomic testing, or are tested for a limited number of targets. Still others are tested too late in their care journey to benefit from precision therapy. To assess the impact of removing testing barriers, we developed a reflex testing protocol where comprehensive genomic profiling (CGP) was routinely ordered by pathologists at time of diagnosis for advanced cancer patients. Methods: Reflex CGP testing was primarily initiated by the pathologist at the time of advanced cancer diagnosis. Testing was performed between 2019 and 2021 via CGP using the ProvSeq 523 lab-developed test, and testing was performed at no cost to the patient. Post-CGP, stage 4 patients were followed for ≥ 12 months. We assessed time to therapy, therapy selection, and overall survival (OS). Therapies were stratified by presence of biomarkers associated with approved targeted therapies (TT), presence of biomarkers for immunotherapies (IO), and/or therapies that were guideline based (GB) and not associated with a specific biomarker. As much key patient information is only consistently available in free-text medical charts, we implemented a novel natural-language processing (NLP) approach based on deep learning and large language models to accelerate abstraction. Results: A cohort of 1,423 advanced cancer patients met the study criteria. The median age was 66 years, 53% were female, and 82% white. The 3 most tested tumor types were 22% non-small cell lung cancer, 16% colorectal cancer, and 12% breast. Overall, 49% (N=704) of patients had a biomarker result considered actionable for an approved TT or IO. Median (IQR) time-to-treatment initiation post-CGP was 19 (2-70) days. 
In patients with no actionable TT or IO biomarkers (N=719), 63% were treated with chemotherapy-based regimens, 11% with GB, 17% with unmatched TT, and 8% with unmatched IO. Of patients who had only an actionable TT biomarker (N=287), 18% received matched TT and 13% received GB. 36% of patients with only an actionable IO biomarker (N=317) received matched IO monotherapy. 48% of patients with both actionable TT and IO biomarkers (N=100) received matched TT or IO monotherapy, and 5% received GB. Across all tumor types, patients receiving a TT had better OS compared to patients receiving chemotherapy with 12 months OS (%) of 70.1 (95% CI=64.4 - 76.3) for TT-treated patients, compared to 62.9 (95% CI=58.8 - 67.3) for chemotherapy only. Conclusions: CGP-guided precision therapy use is associated with significantly higher survival in a reflex testing population. A reflex protocol can overcome key barriers to the use and timing of genomic testing to improve access to these life-extending treatment modalities.
19

Agrawal, J. P., B. J. Erickson, and C. E. Kahn. "Imaging Informatics: 25 Years of Progress." Yearbook of Medical Informatics 25, S 01 (August 2016): S23–S31. http://dx.doi.org/10.15265/iys-2016-s004.

Abstract:
Summary: The science and applications of informatics in medical imaging have advanced dramatically in the past 25 years. This article provides a selective overview of key developments in medical imaging informatics. Advances in standards and technologies for compression and transmission of digital images have enabled Picture Archiving and Communications Systems (PACS) and teleradiology. Research in speech recognition, structured reporting, ontologies, and natural language processing has improved the ability to generate and analyze the reports of imaging procedures. Informatics has provided tools to address workflow and ergonomic issues engendered by the growing volume of medical image information. Research in computer-aided detection and diagnosis of abnormalities in medical images has opened new avenues to improve patient care. The growing number of medical-imaging examinations and their large volumes of information create a natural platform for “big data” analytics, particularly when joined with high-dimensional genomic data. Radiogenomics investigates relationships between a disease’s genetic and gene-expression characteristics and its imaging phenotype; this emerging field promises to help us better understand disease biology, prognosis, and treatment options. The next 25 years offer remarkable opportunities for informatics and medical imaging together to lead to further advances in both disciplines and to improve health.
20

Chatterjee, Subhojyoti, and Jagriti Chatterjee. "Exploiting PERL for Micro-RNA Target Identification: A Potential Drug Site for COVID-19." International Journal for Modern Trends in Science and Technology, no. 8 (August 5, 2020): 51–56. http://dx.doi.org/10.46501/ijmtst060810.

Abstract:
The functioning of gene expression or ribonucleic acid (RNA) silencing is governed by microRNA, also known as miRNA. It is a small non-coding RNA molecule found in animals, viruses, and plants. Most miRNAs are transcribed from DNA sequences to form primary miRNAs, which are then processed into precursor miRNAs and mature miRNAs. The miRNAs interact with their target genes in a manner that depends on factors such as the sub-cellular location of the miRNAs, the availability of miRNAs and target mRNAs, and their interaction affinities, as seen in the case of Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Besides its involvement in the normal functioning of eukaryotic cells, microRNA deregulation is associated with cancer, e.g., chronic lymphocytic leukemia. Therefore, microRNA target identification becomes important to unwind the relationship between microRNA deregulation and human diseases, thereby paving a path for structure-based drug discovery. Towards that direction, we have attempted to use the platform of PERL programming (a user-friendly and dynamic language to easily process and manipulate long sequences) to detect the microRNA target sites in genomic sequences, thereby trying to suppress the expression level for prognosis.
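The abstract above does not state how target sites are detected, so the following is only a minimal sketch of one common approach, seed matching, written in Python rather than PERL. It assumes that a candidate site is any stretch of the target DNA that is the reverse complement of miRNA nucleotides 2 to 8 (the "seed"). The function names and the example sequences are hypothetical and are not taken from the cited article.

```python
def reverse_complement_as_dna(rna: str) -> str:
    """Return the reverse complement of an RNA string, written as DNA."""
    pairs = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna.upper()))


def find_seed_sites(mirna: str, target_dna: str) -> list[int]:
    """Return 0-based start positions of seed-complementary sites.

    Assumption: the seed is miRNA nucleotides 2-8 (1-based), and a site
    is an exact match to the seed's reverse complement.
    """
    seed = mirna[1:8]                       # nucleotides 2-8
    site = reverse_complement_as_dna(seed)  # what the target must contain
    target = target_dna.upper()
    width = len(site)
    return [
        i
        for i in range(len(target) - width + 1)
        if target[i:i + width] == site
    ]


# Illustrative sequences only.
example_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
example_target = "AAACTACCTCAAA"
seed_sites = find_seed_sites(example_mirna, example_target)
print(seed_sites)
```

Exact seed matching is the simplest criterion; practical tools usually add further filters (for example site conservation or binding energy), which this sketch deliberately leaves out.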
21

Bar, Yael, Kfir Bar, Judith Ben Dror, Didi Feldman, Meishar Shahoha, Ahuva Weiss-Meilik, Nachum Dershowitz, Wolf Ido, and Amir Sonnenblick. "Abstract P3-05-27: The impact of co-existing ductal carcinoma in situ in invasive early hormone receptor positive breast cancer on the genomic and clinical risk of recurrence." Cancer Research 83, no. 5_Supplement (March 1, 2023): P3-05-27. http://dx.doi.org/10.1158/1538-7445.sabcs22-p3-05-27.

Abstract:
Abstract Background: Invasive early breast cancer (IBC) often presents with a co-existing ductal carcinoma in situ (DCIS) component, while about 5% of the cases present with an extensive (>25%) intraductal component (EIC). The presence of a DCIS component was previously shown to be associated with favorable clinico-pathological characteristics and survival outcomes. However, the association between co-existing DCIS and genomic risk of recurrence is unclear. Methods: Patients with early hormone receptor positive (HR+) HER2neu-negative (HER2-) breast cancer and known OncotypeDX breast recurrence score (RS), who underwent breast surgery in our institute, were included. A natural language processing (NLP) algorithm was used to identify co-existence of extensive DCIS (DCIS-H) and non-extensive DCIS (DCIS-L) in surgical pathological reports. Genomic risk was determined using OncotypeDX RS, while clinical risk was calculated according to the MINDACT criteria, based on tumor size and grade. The genomic and clinical risks of DCIS-H, DCIS-L and pure IBC (No-DCIS) were compared. Results: A total of 45 (5%) DCIS-H cases, 468 (56%) DCIS-L cases and 328 (39%) No-DCIS cases were identified. DCIS-H cases presented with less aggressive clinico-pathological characteristics, such as lower proportions of histologic grade III (10% vs 26% vs 21%) and lower proportions of node-positive disease (13% vs 18% vs 21%), compared to DCIS-L and No-DCIS cases, respectively. The distribution of OncotypeDX RS significantly varied between the groups. DCIS-H tumors were less likely to have a high RS and more likely to have a low or intermediate RS compared to DCIS-L and No-DCIS tumors (High RS: 4% vs 20% vs 20%, Low + intermediate RS: 96% vs 80% vs 80%, respectively; p=0.04). Additionally, the proportions of high clinical risk cases were lower in the DCIS-H group compared to the DCIS-L and No-DCIS groups (42% vs 53% vs 50%, respectively; p=0.002).
Based on genomic and clinical risk and current guidelines, we found that women presented with an extensive DCIS component (DCIS-H) had a lower probability of receiving an adjuvant chemotherapy recommendation compared to women presented with a non-extensive DCIS component (DCIS-L) or pure IBC (No-DCIS) (11% vs 29% vs 25%, respectively; p=.035). No differences in disease recurrence were detected between the groups. Conclusions: Co-existing extensive DCIS in invasive early HR+Her2- breast cancer is significantly correlated with lower genomic and clinical risk of recurrence and a smaller chance for chemotherapy recommendation. The rarity of this condition (5% of cases) limited our ability to detect differences in outcomes. These findings warrant future studies of the underlying genomic landscape of co-existing extensive DCIS. Citation Format: Yael Bar, Kfir Bar, Judith Ben-Dror, Didi Feldman, Meishar Shahoha, Ahuva Weiss-Meilik, Nachum Dershowitz, Wolf Ido, Amir Sonnenblick. The impact of co-existing ductal carcinoma in situ in invasive early hormone receptor positive breast cancer on the genomic and clinical risk of recurrence [abstract]. In: Proceedings of the 2022 San Antonio Breast Cancer Symposium; 2022 Dec 6-10; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2023;83(5 Suppl):Abstract nr P3-05-27.
22

Kage, Hidenori, Takashi Aoki, Aya Shinozaki-Ushiku, Kousuke Watanabe, Nana Akiyama, Hideaki Isago, Kazunaga Ishigaki, et al. "Performance of an artificial intelligence-based annotation algorithm for reporting cancer genomic profiling tests." Journal of Clinical Oncology 40, no. 16_suppl (June 1, 2022): 1551. http://dx.doi.org/10.1200/jco.2022.40.16_suppl.1551.

Abstract:
1551 Background: Cancer genomic profiling (CGP) tests have been approved in Japan since June 2019, with the requisite that all test results be discussed by molecular tumor boards (MTBs). More than 20,000 patients in over 200 designated hospitals had taken CGP tests by December 2021. As CGP tests have entered clinical practice, streamlining decision making by MTBs and standardizing interpretation of test results and treatment recommendations have become urgent issues. Here, we evaluated the utility of Chrovis, an annotation algorithm for reporting CGP tests to support MTBs in making their recommendations. Methods: We retrospectively reviewed the reporting process of all approved CGP tests done at The University of Tokyo Hospital between December 2019 and November 2021. Chrovis provided annotation for each genetic variant by incorporating biologic, clinical, and therapeutic information by referencing several public knowledge databases and using natural language processing, and generated reports using the automated program. The MTB reviewed and made any necessary changes before finalizing the report. Changes in disclosure of germline findings were made according to the recommendations of a national guideline with consideration of past and family history. Results: Of the 243 tests, 91 changes in 81 Chrovis reports (33% of all reports) were made by the MTB. The most common type of change was germline disclosure with 26 changes (29%), followed by clinical trial information in Japan (18 changes, 20%) and recommendation of the patient-proposed national basket trial with multiple targeted agents (17 changes, 19%). Changes in germline disclosure increased from June 2021, when an update to a national guideline was released, while the proportion of changes in the latter two types remained unchanged. The gene alteration that led to the highest number of changes was TP53, with 13 changes.
Changes in therapeutic recommendations were frequently observed in the RAS/MAPK pathway (BRAF, KRAS, NF1, NRAS) with 12 changes. More changes were required with a tumor-only tissue CGP panel (57 of 149) compared with a matched tumor/normal tissue CGP panel (24 of 94, p = 0.04), mostly due to germline disclosure (24 vs. 2 changes). Conclusions: We observed that automated algorithm-based reporting was sufficient in 67% of reports. Recommendation for germline disclosure still requires manual supervision, particularly with tumor-only tissue CGP panels if algorithms do not incorporate medical history. The process of recommending clinical trials needs improvement, e.g., standardizing database formats for inclusion and exclusion criteria.
23

Luthra, Anisha, Karl Pichotta, Brooke Mastrogiacomo, Samantha McCarthy, Steven Maron, Jianjiong Gao, Justin Jee, Christopher J. Fong, and Nikolaus Schultz. "Abstract 1158: A.I.-assisted clinical data curation to determine genomic biomarkers of cancer metastasis." Cancer Research 82, no. 12_Supplement (June 15, 2022): 1158. http://dx.doi.org/10.1158/1538-7445.am2022-1158.

Abstract:
Abstract While progression to metastatic disease is the main cause of cancer death, little is known about the genomic mechanisms that drive metastasis. Rapidly growing clinical genomic data sets have the potential to identify genomic biomarkers of cancer metastasis; however, manual curation of clinical data is quickly emerging as a bottleneck. To overcome this challenge, we have developed a natural language processing (NLP) pipeline to identify organs affected by metastasis from radiology reports of patients with cancer. To develop our NLP models, we leveraged the AACR GENIE Biopharma Collaborative lung and colorectal cancer datasets generated in part at Memorial Sloan Kettering Cancer Center (MSK), containing curated labels of ten metastatic disease sites derived from 31,445 corresponding free-text radiology reports (2,310 patients). Using these data, we trained three machine learning models for identifying metastatic events from clinical text, using logistic regression, convolutional neural networks (CNN), and Bidirectional Encoder Representations from Transformers (BERT). We split patients into a training set (80% of patients) and validation set (20%). The BERT model yielded superior performance across evaluation metrics, with an average per metastatic disease site area under the receiver operating characteristic curve (AUC) of 0.981, average accuracy of 97.3%, macro-average precision/recall of 85.1/85.6, and micro-average precision/recall of 87.5/89.6. We applied our method to radiology reports from 52,000 patients with tumors prospectively profiled using the MSK-IMPACT clinical sequencing cohort. A comparison with the MSK-MET cohort, which contains metastatic events derived from billing codes in a subset of 25,000 patients, showed strong concordance (79.7% of metastatic events matched), with the NLP-based method identifying an average of 1.4 additional metastatic sites per patient, an expected result given the incomplete nature of the billing code data.
Analyzing genomic and clinical data in this cohort, we confirmed that chromosomal instability, as inferred by the fraction of genome altered (FGA), is strongly correlated with metastatic burden (defined as the number of distinct organs affected by metastases) in several tumor types, including prostate adenocarcinoma, lung adenocarcinoma and HR-positive breast ductal carcinoma, and we identified this trend in 10 additional cancer types not previously identified, including lobular HR-positive breast carcinoma and esophageal adenocarcinoma. We demonstrate that mining of electronic health records can be used to extract rich, structured clinical information. Our models, applied at scale, offer a unique resource for the investigation of the biological basis for metastatic spread. We hope our automated clinical data extractions can enable further large-scale studies of associations between genomic biomarkers and metastatic behavior. Citation Format: Anisha Luthra, Karl Pichotta, Brooke Mastrogiacomo, Samantha McCarthy, Steven Maron, Jianjiong Gao, Justin Jee, Christopher J. Fong, Nikolaus Schultz. A.I.-assisted clinical data curation to determine genomic biomarkers of cancer metastasis [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 1158.
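The abstract above reports both macro-average and micro-average precision/recall across metastatic disease sites. The following Python sketch shows how the two averages differ: macro averaging takes the mean of per-site scores, while micro averaging pools the counts over all sites first. The site names and counts below are invented for illustration and do not come from the cited study.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def macro_and_micro(counts: dict[str, tuple[int, int, int]]):
    """counts maps each site to (tp, fp, fn). Returns ((macro_p, macro_r), (micro_p, micro_r))."""
    per_site = [precision_recall(*c) for c in counts.values()]
    # Macro: every site counts equally, however rare it is.
    macro_p = sum(p for p, _ in per_site) / len(per_site)
    macro_r = sum(r for _, r in per_site) / len(per_site)
    # Micro: pool the raw counts, so frequent sites dominate.
    total_tp = sum(c[0] for c in counts.values())
    total_fp = sum(c[1] for c in counts.values())
    total_fn = sum(c[2] for c in counts.values())
    micro = precision_recall(total_tp, total_fp, total_fn)
    return (macro_p, macro_r), micro


# Hypothetical counts: one common site classified well, one rare site classified poorly.
example_counts = {"lung": (90, 10, 10), "bone": (5, 5, 5)}
macro, micro = macro_and_micro(example_counts)
print(macro, micro)
```

When rare sites are harder to classify than common ones, as in this toy example, the micro average comes out higher than the macro average, which is the same direction as the gap reported in the abstract.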
24

Fong, Christopher J., Michele Waters, Karl Pichotta, Justin Jee, Devika R. Jutagir, David Ma, Tomin Perea-Chamblee, et al. "Abstract 4260: Understanding genomic and social determinants of cancer immunotherapy outcome across ancestry." Cancer Research 83, no. 7_Supplement (April 4, 2023): 4260. http://dx.doi.org/10.1158/1538-7445.am2023-4260.

Abstract:
Abstract Compared with previous standards of care, the use of immune checkpoint inhibitors (ICI) has brought significant improvements in survival and quality of life for lung cancer patients. However, only a small proportion of these patients respond durably. People with different ancestries differ probabilistically in genetic factors, environmental exposures, and socio-economic conditions. Whether patients of different ancestry benefit equally from ICIs remains unclear. We studied the impact of genomic ancestry, tumor genomics, and social determinants of health (SDH) factors and factors that are impacted from SDH including recorded race/ethnicity, inferred low-income status from patient zip codes, exposure to smoking, and BMI on ICI response, defined by cancer progression-free survival (PFS, minimum 6 months FU), for non-small cell lung cancer (NSCLC) patients with MSK-IMPACT targeted panel sequencing. This FDA approved assay includes matched tumor-white blood cell sequencing to distinguish germline from somatic variants and has been applied to 1,802 NSCLC patients who received ICI treatment, including 81 and 117 patients with at least 80% of African (AFR) and East Asian (EAS) ancestry, respectively. Moreover, 173 samples were derived from admixed patients with more than one major ancestry. We first used a natural language processing (NLP) model to obtain PFS from free-text clinical notes. A multivariable Cox proportional hazards model was then used to associate PFS with ancestry, race, smoking status, ICI drug regimen, PD-L1 status, disease stage, tumor mutational burden (TMB), inferred income, and BMI. Neither genetic ancestry nor self-reported race/ethnicity was associated with the PFS. Moreover, ICI drug regimen types, low-income status, and BMI were not associated with PFS in our cohort. TMB-high was associated with longer PFS across all ancestries, although TMB was lower in patients with EAS ancestry (Median 7.9 vs. 5.3 mut/Mb, p<0.001).
These results suggest that the benefits of ICI extend across ancestry, race, and income lines in a single institution, arguing for more equitable patient access to these medications. We also show that TMB is a generalizable biomarker for ICI outcome across ancestries. However, more diverse patient populations are needed to understand whether there is ancestry-specificity in other ICI outcome biomarkers. Citation Format: Christopher J. Fong, Michele Waters, Karl Pichotta, Justin Jee, Devika R. Jutagir, David Ma, Tomin Perea-Chamblee, Susie Kim, Kanika Arora, Brooke Mastrogiacomo, Thinh Tran, Steven Maron, Mirella Altoe, Anisha Luthra, Joseph Kholodenko, Arfath Patha, Doori Rose, Michael F. Berger, Gregory J. Riely, Nikolaus Schultz, Sanna Goyert, Adam Schoenfeld, Francesca Gany, Jian Carrot-Zhang. Understanding genomic and social determinants of cancer immunotherapy outcome across ancestry. [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 4260.
25

Teixeira, Pedro L., Wei-Qi Wei, Robert M. Cronin, Huan Mo, Jacob P. VanHouten, Robert J. Carroll, Eric LaRose, et al. "Evaluating electronic health record data sources and algorithmic approaches to identify hypertensive individuals." Journal of the American Medical Informatics Association 24, no. 1 (August 7, 2016): 162–71. http://dx.doi.org/10.1093/jamia/ocw071.

Abstract:
Objective: Phenotyping algorithms applied to electronic health record (EHR) data enable investigators to identify large cohorts for clinical and genomic research. Algorithm development is often iterative, depends on fallible investigator intuition, and is time- and labor-intensive. We developed and evaluated 4 types of phenotyping algorithms and categories of EHR information to identify hypertensive individuals and controls and provide a portable module for implementation at other sites. Materials and Methods: We reviewed the EHRs of 631 individuals followed at Vanderbilt for hypertension status. We developed features and phenotyping algorithms of increasing complexity. Input categories included International Classification of Diseases, Ninth Revision (ICD9) codes, medications, vital signs, narrative-text search results, and Unified Medical Language System (UMLS) concepts extracted using natural language processing (NLP). We developed a module and tested portability by replicating 10 of the best-performing algorithms at the Marshfield Clinic. Results: Random forests using billing codes, medications, vitals, and concepts had the best performance with a median area under the receiver operator characteristic curve (AUC) of 0.976. Normalized sums of all 4 categories also performed well (0.959 AUC). The best non-NLP algorithm combined normalized ICD9 codes, medications, and blood pressure readings with a median AUC of 0.948. Blood pressure cutoffs or ICD9 code counts alone had AUCs of 0.854 and 0.908, respectively. Marshfield Clinic results were similar. Conclusion: This work shows that billing codes or blood pressure readings alone yield good hypertension classification performance. However, even simple combinations of input categories improve performance. The most complex algorithms classified hypertension with excellent recall and precision.
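The abstract above compares algorithms by area under the receiver operating characteristic curve (AUC). As a minimal sketch of what that number measures, the Python function below computes AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The scores and labels are invented for illustration; the cited study's own scoring code is not shown in the source.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """AUC via pairwise comparison of positive (1) and negative (0) examples."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # case ranked above control
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for two hypertensive cases (1) and two controls (0).
example_scores = [0.9, 0.8, 0.4, 0.3]
example_labels = [1, 0, 1, 0]
print(auc(example_scores, example_labels))
```

The pairwise form is quadratic in the number of examples, which is fine for a cohort of a few hundred charts; a rank-based formulation gives the same value more efficiently on larger data.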
26

Yung, Rachel, Kari A. Stephens, Meliha Yetisgen, Andrea Burnett-Hartman, Ashwani Tanwar, Guilherme Freire, Atri Sharma, et al. "Abstract 4090: Creating research quality cancer genomic data from electronic health records." Cancer Research 82, no. 12_Supplement (June 15, 2022): 4090. http://dx.doi.org/10.1158/1538-7445.am2022-4090.

Abstract:
Abstract Background: Understanding the impact of precision medicine on medical practice, patient care, and clinical outcomes is a priority for advancing cancer care. With the recent dramatic increase in the use of tumor genomic testing (TGT), records within EHRs are a rich data source for evaluating the impact of TGT results in real-world clinical practice of care and on patient outcomes. However, extracting TGT results from electronic health records (EHR) is challenging due to a lack of standards to communicate genomic information and an inability to store such information in commonly available EHR systems. Moreover, TGT results are delivered to clinicians in unstructured formats and image-based files (PDFs). We initiated a pilot study to assess the ability of natural language processing (NLP) algorithms to convert EHR unstructured clinical text and PDF-formatted TGT results into research-quality data. Methods: One author (RY) drew a sample of approximately 800 clinical text records from 21 breast cancer patients treated at University of Washington. Sources used for data extraction included medical record notes and PDF reports for two breast cancer gene expression tests: 21-gene Recurrence Score (RS, OncotypeDx) and/or the 70-gene signature (MMP, Mammaprint). A team redacted all PHI and provided records to a commercial collaborator (Pangaeadata.AI, UK), along with definitions of variables to be extracted, but without annotated target answers. Existing NLP algorithms that leverage pre-training, fine-tuning and rules were adapted to extract 26 variables specified by the research team (e.g., age at diagnosis, histology, and RS or MMP dates and scores). The output placed variables into relevant, standardized formats and produced a research quality data set. The extraction strategy depended on the feature and variable characteristics. 
For example, cancer stage, an ordinal numerical variable, was determined with a rule-based extraction method from outpatient clinic notes and pathology reports, whereas the RS score, a continuous variable, came from OncotypeDx PDF and OCR semi-structured retrieval produced the output. Results/Conclusions: The Pangaea tool obtained an average accuracy of 97.3% with a standard deviation of 3.5% across all 26 variables. The approach is developed based on rules designed and validated by clinical experts, using a model that does not require training, making overfitting likely minimal. Qualitative analysis showed that: 1] algorithms used to electronically extract TGT results provided the same data as manual abstraction by physicians, and 2] context matters, namely, the capability of preliminary semantic understanding in the Pangaea model using contextual words and phrases contributed to high accuracy and can be generalized further with larger datasets. Expansion to other health care data systems is needed to assess scalability of these technologies to create research-quality data fit for use. Citation Format: Rachel Yung, Kari A. Stephens, Meliha Yetisgen, Andrea Burnett-Hartman, Ashwani Tanwar, Guilherme Freire, Atri Sharma, Jingqing Zhang, Vibhor Gupta, Yike Guo, VK Gadi, Larry Kessler. Creating research quality cancer genomic data from electronic health records [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 4090.
27

Halder, Rajib Kumar, Mohammed Nasir Uddin, Md Ashraf Uddin, Sunil Aryal, Md Aminul Islam, Fahima Hossain, Nusrat Jahan, Ansam Khraisat, and Ammar Alazab. "A Grid Search-Based Multilayer Dynamic Ensemble System to Identify DNA N4—Methylcytosine Using Deep Learning Approach." Genes 14, no. 3 (February 25, 2023): 582. http://dx.doi.org/10.3390/genes14030582.

Abstract:
DNA (Deoxyribonucleic Acid) N4-methylcytosine (4mC), a kind of epigenetic modification of DNA, is important for modifying gene functions, such as protein interactions, conformation, and stability in DNA, as well as for the control of gene expression throughout cell development and genomic imprinting. This simply plays a crucial role in the restriction–modification system. To further understand the function and regulation mechanism of 4mC, it is essential to precisely locate the 4mC site and detect its chromosomal distribution. This research aims to design an efficient and high-throughput discriminative intelligent computational system using the natural language processing method “word2vec” and a multi-configured 1D convolution neural network (1D CNN) to predict 4mC sites. In this article, we propose a grid search-based multi-layer dynamic ensemble system (GS-MLDS) that can enhance existing knowledge of each level. Each layer uses a grid search-based weight searching approach to find the optimal accuracy while minimizing computation time and additional layers. We have used eight publicly available benchmark datasets collected from different sources to test the proposed model’s efficiency. Accuracy results in test operations were obtained as follows: 0.978, 0.954, 0.944, 0.961, 0.950, 0.973, 0.948, 0.952, 0.961, and 0.980. The proposed model has also been compared to 16 distinct models, indicating that it can accurately predict 4mC.
28

Jee, Justin, Chris Fong, Karl Pichotta, Thinh Tran, Anisha Luthra, Mirella Altoe, Steven Maron, et al. "Abstract 5721: Automated annotation for large-scale clinicogenomic models of lung cancer treatment response and overall survival." Cancer Research 83, no. 7_Supplement (April 4, 2023): 5721. http://dx.doi.org/10.1158/1538-7445.am2023-5721.

Abstract:
The digitization of health records and prompt availability of tumor DNA sequencing results offer a chance to study the determinants of cancer outcomes with unprecedented richness; however, abstraction of key attributes from free text presents a major limitation to large-scale analyses. Using natural language processing (NLP), we derived sites of metastasis, prior treatment at outside institutions, programmed death ligand 1 (PD-L1) levels, and smoking status from records of patients with tumor sequencing to create a richly annotated clinicogenomic cohort. We sought to define whether combining features would improve models of overall survival (OS) and treatment response as validated in a multi-institution, manually curated cohort. We leveraged the manually curated AACR GENIE Biopharma Collaborative (BPC) dataset to train NLP algorithms to abstract the aforementioned features from overlapping records available at Memorial Sloan Kettering (MSK). All models achieved precision and recall > 0.85. We deployed these algorithms to records of all MSK patients with non-small cell lung cancer (NSCLC) and tumor profiling with our FDA-authorized institutional targeted sequencing platform (N=7,015). These labels were combined with genomic, demographic, histopathologic, internal treatment and staging data to train random survival forests (RSF) to predict OS and time-to-next-treatment (TTNT) for molecularly targeted and immunotherapies. RSFs trained on the MSK NSCLC cohort were validated with the curated, non-MSK BPC NSCLC cohort (N=977). The addition of NLP-derived variables to genomic features enhanced RSF predictive power for OS (c-index, 10x bootstrap 95% CI: 0.58, 0.57-0.59 vs 0.75, 0.74-0.76 combined) and targeted and immunotherapy TTNT. The size of the MSK NSCLC cohort enabled discovery of associations between metastatic sites, PD-L1 status, genomics, and TTNTs not apparent in the smaller BPC cohort. We measured the added predictive value of variables not available in BPC with MSK-only cross-validation analyses. White blood cell differential counts and additional tissue genomic features including tumor mutational burden and fraction genome altered added minimally, while circulating tumor DNA sequencing added prognostic power for OS over other factors including disease burden. Using NLP we present a large NSCLC cohort with rich clinicoradiographic annotation, leading to superior models of patient outcomes. Our data uncovers associations not observed in smaller, manually curated cohorts and provides a foundation for further research in therapy choice and prognostication. Citation Format: Justin Jee, Chris Fong, Karl Pichotta, Thinh Tran, Anisha Luthra, Mirella Altoe, Steven Maron, Ronglai Shen, Si-Yang Liu, Michele Waters, Joseph Kholodenko, Brooke Mastrogiacomo, Susie Kim, A Rose Brannon, Michael F. Berger, Axel Martin, Jason Chang, Anton Safonov, Jorge S. Reis-Filho, Deborah Schrag, Sohrab P. Shah, Pedram Razavi, Bob T. Li, Gregory J. Riely, Nikolaus Schultz. Automated annotation for large-scale clinicogenomic models of lung cancer treatment response and overall survival. [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 5721.
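The c-index quoted in this abstract measures how often a model ranks patients correctly by risk. As a rough illustration only, the sketch below computes a basic Harrell-style concordance index for right-censored data. It is not the authors' random survival forest pipeline; the function name `concordance_index` and its arguments are hypothetical, and tied survival times are simply skipped for brevity.

```python
def concordance_index(times, events, risks):
    """Basic Harrell-style c-index (illustrative sketch, not the study's code).

    times  : observed follow-up times
    events : 1 if the event (e.g. death) was observed, 0 if censored
    risks  : predicted risk scores (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data: higher risk scores coincide with earlier events.
    print(concordance_index([1, 2, 3], [1, 1, 1], [3.0, 2.0, 1.0]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported change from 0.58 to 0.75.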
29

Korenberg, Julie R., Xiao-Ning Chen, Hamao Hirota, Zona Lai, Ursula Bellugi, Dennis Burian, Bruce Roe, and Rumiko Matsuoka. "VI. Genome Structure and Cognitive Map of Williams Syndrome." Journal of Cognitive Neuroscience 12, supplement 1 (March 2000): 89–107. http://dx.doi.org/10.1162/089892900562002.

Abstract:
Williams syndrome (WMS) is a most compelling model of human cognition, of human genome organization, and of evolution. Due to a deletion in chromosome band 7q11.23, subjects have cardiovascular, connective tissue, and neurodevelopmental deficits. Given the striking peaks and valleys in neurocognition including deficits in visual-spatial and global processing, preserved language and face processing, hypersociability, and heightened affect, the goal of this work has been to identify the genes that are responsible, the cause of the deletion, and its origin in primate evolution. To do this, we have generated an integrated physical, genetic, and transcriptional map of the WMS and flanking regions using multicolor metaphase and interphase fluorescence in situ hybridization (FISH) of bacterial artificial chromosomes (BACs) and P1 artificial chromosomes (PACs), BAC end sequencing, PCR gene marker and microsatellite, large-scale sequencing, cDNA library, and database analyses. The results indicate the genomic organization of the WMS region as two nested duplicated regions flanking a largely single-copy region. There are at least two common deletion breakpoints, one in the centromeric and at least two in the telomeric repeated regions. Clones anchoring the unique to the repeated regions are defined along with three new pseudogene families. Primate studies indicate an evolutionary hot spot for chromosomal inversion in the WMS region. A cognitive phenotypic map of WMS is presented, which combines previous data with five further WMS subjects and three atypical WMS subjects with deletions; two larger (deleted for D7S489L) and one smaller, deleted for genes telomeric to FZD9, through LIMK1, but not WSCR1 or telomeric. The results establish regions and consequent gene candidates for WMS features including mental retardation, hypersociability, and facial features. The approach provides the basis for defining pathways linking genetic underpinnings with the neuroanatomical, functional, and behavioral consequences that result in human cognition.
30

Leone, Michele, Eugenia Galeota, Marco Masseroli, and Mattia Pelizzola. "Identification, semantic annotation and comparison of combinations of functional elements in multiple biological conditions." Bioinformatics 38, no. 5 (December 2, 2021): 1183–90. http://dx.doi.org/10.1093/bioinformatics/btab815.

Abstract:
Motivation: Approaches such as chromatin immunoprecipitation followed by sequencing (ChIP-seq) represent the standard for the identification of binding sites of DNA-associated proteins, including transcription factors and histone marks. Public repositories of omics data contain a huge number of experimental ChIP-seq data, but their reuse and integrative analysis across multiple conditions remain a daunting task. Results: We present the Combinatorial and Semantic Analysis of Functional Elements (CombSAFE), an efficient computational method able to integrate and take advantage of the valuable and numerous, but heterogeneous, ChIP-seq data publicly available in big data repositories. Leveraging natural language processing techniques, it integrates omics data samples with semantic annotations from selected biomedical ontologies; then, using hidden Markov models, it identifies combinations of static and dynamic functional elements throughout the genome for the corresponding samples. CombSAFE allows analyzing the whole genome, by clustering patterns of regions with similar functional elements and through enrichment analyses to discover ontological terms significantly associated with them. Moreover, it allows comparing functional states of a specific genomic region to analyze their different behavior throughout the various semantic annotations. Such findings can provide novel insights by identifying unexpected combinations of functional elements in different biological conditions. Availability and implementation: The Python implementation of the CombSAFE pipeline is freely available for non-commercial use at: https://github.com/DEIB-GECO/CombSAFE. Supplementary information: Supplementary data are available at Bioinformatics online.
31

Brady, Cassandra, Bahram Namjou, Stephanie Kennebeck, Jonathan Bickel, Nandan Patibandla, Yizhao Ni, Sara Van Driest, et al. "Developing an Algorithm to Detect Early Childhood Obesity in Two Tertiary Pediatric Medical Centers." Applied Clinical Informatics 07, no. 03 (July 2016): 693–706. http://dx.doi.org/10.4338/aci-2016-01-ra-0015.

Abstract:
Summary: The objective of this study is to develop an algorithm to accurately identify children with severe early onset childhood obesity (ages 1–5.99 years) using structured and unstructured data from the electronic health record (EHR). Childhood obesity increases risk factors for cardiovascular morbidity and vascular disease. Accurate definition of a high precision phenotype through a standardized tool is critical to the success of large-scale genomic studies and validating rare monogenic variants causing severe early onset obesity. Rule based and machine learning based algorithms were developed using structured and unstructured data from two EHR databases from Boston Children’s Hospital (BCH) and Cincinnati Children’s Hospital and Medical Center (CCHMC). Exclusion criteria including medications or comorbid diagnoses were defined. Machine learning algorithms were developed using cross-site training and testing in addition to experimenting with natural language processing features. Precision was emphasized for a high fidelity cohort. The rule-based algorithm performed the best overall, 0.895 (CCHMC) and 0.770 (BCH). The best feature set for machine learning employed Unified Medical Language System (UMLS) concept unique identifiers (CUIs), ICD-9 codes, and RxNorm codes. Detecting severe early childhood obesity is essential for the intervention potential in children at the highest long-term risk of developing comorbidities related to obesity and excluding patients with underlying pathological and non-syndromic causes of obesity assists in developing a high-precision cohort for genetic study. Further such phenotyping efforts inform future practical application in health care environments utilizing clinical decision support. Citation: Lingren T, Thaker V, Brady C, Namjou B, Kennebeck S, Bickel J, Patibandla N, Ni Y, Van Driest SL, Chen L, Roach A, Cobb B, Kirby J, Denny J, Bailey-Davis L, Williams MS, Marsolo K, Solti I, Holm IA, Harley J, Kohane IS, Savova G, Crimmins N. Developing an algorithm to detect early childhood obesity in two tertiary pediatric medical centers.
32

Mechanic, Leah E., Danielle M. Carrick, Tony Dickherber, and Juli Klemm. "Abstract A04: NCI programs supporting technology development for population studies in digital age." Cancer Epidemiology, Biomarkers & Prevention 29, no. 9_Supplement (September 1, 2020): A04. http://dx.doi.org/10.1158/1538-7755.modpop19-a04.

Abstract:
Cancer is the result of a complex interplay of genetic, environmental, host, and societal factors operating over a prolonged time. The development of novel molecular technologies and informatics tools can facilitate a more comprehensive study of the risk factors contributing to the development of and outcomes from cancer in population studies. The National Cancer Institute (NCI) leads two programs in this area: the Innovative Molecular Analysis Technologies (IMAT) Program and the Informatics Technology for Cancer Research (ITCR) Program. IMAT supports the development, technical maturation, and dissemination of novel and potentially transformative next-generation technologies. The ITCR program supports research-driven informatics technology development spanning all aspects of cancer research and stages of tool development, from algorithm development to prototyping, enhancement, and sustainment of these tools. The funding opportunities through these programs can support the development and application of new technologies for epidemiology research. For example, molecular technologies may address needs in areas such as exposure assessment, epigenetics, genomics, transcriptomics, imaging, and collection of biospecimens. Informatics technology needs may include genomic tools for data analysis, interpretation and visualization, annotation of genetic variants, supporting sharing of data, natural language processing of electronic health records (EHRs), managing cohort data collection, data harmonization, and extracting unstructured phenotype data from medical records. Importantly, the tools and resources that have been developed through these programs can be leveraged for use by epidemiology researchers. Learn more about the funding opportunities and technologies developed through these programs at https://imat.cancer.gov and https://itcr.cancer.gov. Citation Format: Leah E. Mechanic, Danielle M. Carrick, Tony Dickherber, Juli Klemm, NCI Innovative Molecular Analysis Technologies (IMAT) Program, NCI Informatics Technology for Cancer Research (ITCR) Program. NCI programs supporting technology development for population studies in digital age [abstract]. In: Proceedings of the AACR Special Conference on Modernizing Population Sciences in the Digital Age; 2019 Feb 19-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2020;29(9 Suppl):Abstract nr A04.
33

Guevara, Elaine E., William D. Hopkins, Patrick R. Hof, John J. Ely, Brenda J. Bradley, and Chet C. Sherwood. "Comparative analysis reveals distinctive epigenetic features of the human cerebellum." PLOS Genetics 17, no. 5 (May 6, 2021): e1009506. http://dx.doi.org/10.1371/journal.pgen.1009506.

Abstract:
Identifying the molecular underpinnings of the neural specializations that underlie human cognitive and behavioral traits has long been of considerable interest. Much research on human-specific changes in gene expression and epigenetic marks has focused on the prefrontal cortex, a brain structure distinguished by its role in executive functions. The cerebellum shows expansion in great apes and is gaining increasing attention for its role in motor skills and cognitive processing, including language. However, relatively few molecular studies of the cerebellum in a comparative evolutionary context have been conducted. Here, we identify human-specific methylation in the lateral cerebellum relative to the dorsolateral prefrontal cortex, in a comparative study with chimpanzees (Pan troglodytes) and rhesus macaques (Macaca mulatta). Specifically, we profiled genome-wide methylation levels in the three species for each of the two brain structures and identified human-specific differentially methylated genomic regions unique to each structure. We further identified which differentially methylated regions (DMRs) overlap likely regulatory elements and determined whether associated genes show corresponding species differences in gene expression. We found greater human-specific methylation in the cerebellum than the dorsolateral prefrontal cortex, with differentially methylated regions overlapping genes involved in several conditions or processes relevant to human neurobiology, including synaptic plasticity, lipid metabolism, neuroinflammation and neurodegeneration, and neurodevelopment, including developmental disorders. Moreover, our results show some overlap with those of previous studies focused on the neocortex, indicating that such results may be common to multiple brain structures. These findings further our understanding of the cerebellum in human brain evolution.
34

Kirby, Jacqueline C., Peter Speltz, Luke V. Rasmussen, Melissa Basford, Omri Gottesman, Peggy L. Peissig, Jennifer A. Pacheco, et al. "PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability." Journal of the American Medical Informatics Association 23, no. 6 (March 28, 2016): 1046–52. http://dx.doi.org/10.1093/jamia/ocv202.

Abstract:
Objective: Health care generated data have become an important source for clinical and genomic research. Often, investigators create and iteratively refine phenotype algorithms to achieve high positive predictive values (PPVs) or sensitivity, thereby identifying valid cases and controls. These algorithms achieve the greatest utility when validated and shared by multiple health care systems. Materials and Methods: We report the current status and impact of the Phenotype KnowledgeBase (PheKB, http://phekb.org), an online environment supporting the workflow of building, sharing, and validating electronic phenotype algorithms. We analyze the most frequent components used in algorithms and their performance at authoring institutions and secondary implementation sites. Results: As of June 2015, PheKB contained 30 finalized phenotype algorithms and 62 algorithms in development spanning a range of traits and diseases. Phenotypes have had over 3500 unique views in a 6-month period and have been reused by other institutions. International Classification of Disease codes were the most frequently used component, followed by medications and natural language processing. Among algorithms with published performance data, the median PPV was nearly identical when evaluated at the authoring institutions (n = 44; case 96.0%, control 100%) compared to implementation sites (n = 40; case 97.5%, control 100%). Discussion: These results demonstrate that a broad range of algorithms to mine electronic health record data from different health systems can be developed with high PPV, and algorithms developed at one site are generally transportable to others. Conclusion: By providing a central repository, PheKB enables improved development, transportability, and validity of algorithms for research-grade phenotypes using health care generated data.
35

Fathiamini, Safa, Amber M. Johnson, Jia Zeng, Alejandro Araya, Vijaykumar Holla, Ann M. Bailey, Beate C. Litzenburger, et al. "Automated identification of molecular effects of drugs (AIMED)." Journal of the American Medical Informatics Association 23, no. 4 (April 23, 2016): 758–65. http://dx.doi.org/10.1093/jamia/ocw030.

Abstract:
Introduction: Genomic profiling information is frequently available to oncologists, enabling targeted cancer therapy. Because clinically relevant information is rapidly emerging in the literature and elsewhere, there is a need for informatics technologies to support targeted therapies. To this end, we have developed a system for Automated Identification of Molecular Effects of Drugs, to help biomedical scientists curate this literature to facilitate decision support. Objectives: To create an automated system to identify assertions in the literature concerning drugs targeting genes with therapeutic implications and characterize the challenges inherent in automating this process in rapidly evolving domains. Methods: We used subject-predicate-object triples (semantic predications) and co-occurrence relations generated by applying the SemRep Natural Language Processing system to MEDLINE abstracts and ClinicalTrials.gov descriptions. We applied customized semantic queries to find drugs targeting genes of interest. The results were manually reviewed by a team of experts. Results: Compared to a manually curated set of relationships, recall, precision, and F2 were 0.39, 0.21, and 0.33, respectively, which represents a 3- to 4-fold improvement over a publicly available set of predications (SemMedDB) alone. Upon review of ostensibly false positive results, 26% were considered relevant additions to the reference set, and an additional 61% were considered to be relevant for review. Adding co-occurrence data improved results for drugs in early development, but not their better-established counterparts. Conclusions: Precision medicine poses unique challenges for biomedical informatics systems that help domain experts find answers to their research questions. Further research is required to improve the performance of such systems, particularly for drugs in development.
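The F2 value reported in this abstract follows from the standard F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), which with beta = 2 weights recall more heavily than precision. The short sketch below reproduces the arithmetic from the reported precision (0.21) and recall (0.39); the function name `f_beta` is illustrative and not taken from the article.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score; beta > 1 weights recall more than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Precision and recall as reported in the abstract.
    f2 = f_beta(precision=0.21, recall=0.39, beta=2.0)
    print(round(f2, 2))  # 0.33, matching the reported F2
```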
36

Crespi, Bernard, and Christopher Badcock. "Psychosis and autism as diametrical disorders of the social brain." Behavioral and Brain Sciences 31, no. 3 (June 2008): 241–61. http://dx.doi.org/10.1017/s0140525x08004214.

Abstract:
Autistic-spectrum conditions and psychotic-spectrum conditions (mainly schizophrenia, bipolar disorder, and major depression) represent two major suites of disorders of human cognition, affect, and behavior that involve altered development and function of the social brain. We describe evidence that a large set of phenotypic traits exhibit diametrically opposite phenotypes in autistic-spectrum versus psychotic-spectrum conditions, with a focus on schizophrenia. This suite of traits is inter-correlated, in that autism involves a general pattern of constrained overgrowth, whereas schizophrenia involves undergrowth. These disorders also exhibit diametric patterns for traits related to social brain development, including aspects of gaze, agency, social cognition, local versus global processing, language, and behavior. Social cognition is thus underdeveloped in autistic-spectrum conditions and hyper-developed on the psychotic spectrum. We propose and evaluate a novel hypothesis that may help to explain these diametric phenotypes: that the development of these two sets of conditions is mediated in part by alterations of genomic imprinting. Evidence regarding the genetic, physiological, neurological, and psychological underpinnings of psychotic-spectrum conditions supports the hypothesis that the etiologies of these conditions involve biases towards increased relative effects from imprinted genes with maternal expression, which engender a general pattern of undergrowth. By contrast, autistic-spectrum conditions appear to involve increased relative bias towards effects of paternally expressed genes, which mediate overgrowth. This hypothesis provides a simple yet comprehensive theory, grounded in evolutionary biology and genetics, for understanding the causes and phenotypes of autistic-spectrum and psychotic-spectrum conditions.
37

Clark, Michelle M., Amber Hildreth, Sergey Batalov, Yan Ding, Shimul Chowdhury, Kelly Watkins, Katarzyna Ellsworth, et al. "Diagnosis of genetic diseases in seriously ill children by rapid whole-genome sequencing and automated phenotyping and interpretation." Science Translational Medicine 11, no. 489 (April 24, 2019): eaat6177. http://dx.doi.org/10.1126/scitranslmed.aat6177.

Abstract:
By informing timely targeted treatments, rapid whole-genome sequencing can improve the outcomes of seriously ill children with genetic diseases, particularly infants in neonatal and pediatric intensive care units (ICUs). The need for highly qualified professionals to decipher results, however, precludes widespread implementation. We describe a platform for population-scale, provisional diagnosis of genetic diseases with automated phenotyping and interpretation. Genome sequencing was expedited by bead-based genome library preparation directly from blood samples and sequencing of paired 100-nt reads in 15.5 hours. Clinical natural language processing (CNLP) automatically extracted children’s deep phenomes from electronic health records with 80% precision and 93% recall. In 101 children with 105 genetic diseases, a mean of 4.3 CNLP-extracted phenotypic features matched the expected phenotypic features of those diseases, compared with a match of 0.9 phenotypic features used in manual interpretation. We automated provisional diagnosis by combining the ranking of the similarity of a patient’s CNLP phenome with respect to the expected phenotypic features of all genetic diseases, together with the ranking of the pathogenicity of all of the patient’s genomic variants. Automated, retrospective diagnoses concurred well with expert manual interpretation (97% recall and 99% precision in 95 children with 97 genetic diseases). Prospectively, our platform correctly diagnosed three of seven seriously ill ICU infants (100% precision and recall) with a mean time saving of 22:19 hours. In each case, the diagnosis affected treatment. Genome sequencing with automated phenotyping and interpretation in a median of 20:10 hours may increase adoption in ICUs and, thereby, timely implementation of precise treatments.
38

Allaway, Robert J., Sasha Scott, Gabriel Altay, Hariprasad Donthi, Karthika R, Muhammad Alaa Alwattar, Lucas Pastur Romay, et al. "Abstract LB056: Crowdsourcing rare cancer research in the Hack4NF GENIE-NF tumor identification and classification challenge." Cancer Research 83, no. 8_Supplement (April 14, 2023): LB056. http://dx.doi.org/10.1158/1538-7445.am2023-lb056.

Abstract:
A key challenge in rare tumor research is the paucity of genomic data that can be used to understand and devise better therapeutic strategies for rare cancers. Furthermore, the “curse of dimensionality,” in which data has many features, such as genetic variants, but few specimens, makes it difficult or impossible to use conventional machine learning techniques to explore these data. To address these challenges in the context of tumors associated with the rare disease neurofibromatosis, we ran a hackathon to stimulate the development of methods to better understand the biology of tumors related to this disease. The hackathon had three challenges centered around variant effect prediction, drug discovery, and genomics. The genomics track of the hackathon leveraged the AACR Project GENIE database (1) and challenged participants to develop new frameworks that accurately use GENIE data to classify neurofibromatosis-related tumors. They were asked to first identify the neurofibromatosis-related tumors in the dataset. They were then asked to use one or more novel classification methods to classify the tumor samples into different groups based on genetic features. To help them do this, we provided access to version 13 of the GENIE database to the hackathon participants, though they were allowed to integrate other relevant datasets. The expected output was a classification method that differentiates different types of NF1, NF2, and schwannomatosis-related tumors using clinical sequencing data, as well as a list of the most important features in the algorithm for differentiating tumor types. Domain expert judges qualitatively scored each team’s rationale for defining and including “NF-related tumors” in their project, and scored the feature list based on the presence of known important biomarkers and features in NF tumors as well as potentially novel features that the algorithm identified. 
A technical judge also scored the code repository based on documentation and clarity of code. Two teams from the GENIE subchallenge were awarded prizes - Team Next GeNLP as the best overall GENIE challenge submission, and team “Artificial Intelligence for neurofibromatosis” for best project documentation. Both winning teams used methods based on natural language processing (NLP) techniques to reduce the dimensionality and complexity of the variant data, and to identify new representations of NF-relevant tumors, and then applied downstream analysis methods such as distance calculations and feature prioritization to better understand the genomic profiles of different tumors. While these methods and tools focused on NF-specific tumor types, we anticipate that they could be re-used by others to better explore the biology and interrelatedness of other rare tumors within the GENIE database. (1) The AACR Project GENIE Consortium. AACR Project GENIE: Powering Precision Medicine Through An International Consortium, Cancer Discov. 2017. The authors would like to acknowledge the American Association for Cancer Research and its financial and material support in the development of the AACR Project GENIE registry, as well as members of the consortium for their commitment to data sharing. Interpretations are the responsibility of study authors. Citation Format: Robert J. Allaway, Sasha Scott, Gabriel Altay, Hariprasad Donthi, Karthika R, Muhammad Alaa Alwattar, Lucas Pastur Romay, Ayesha Parsha, Jineta Banerjee, Chelsea Nayan, Julie Bletz, Salvatore La Rosa. Crowdsourcing rare cancer research in the Hack4NF GENIE-NF tumor identification and classification challenge [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 2 (Clinical Trials and Late-Breaking Research); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(8_Suppl):Abstract nr LB056.
39

Ruby Dhar, Arun Kumar, and Subhradip Karmakar. "Artificial intelligence in healthcare: Setting new algo RHYTHM in medicine." Asian Journal of Medical Sciences 13, no. 11 (November 1, 2022): 1–2. http://dx.doi.org/10.3126/ajms.v13i11.48575.

Abstract:
Artificial intelligence (AI) is a rapidly expanding avenue in science and is often used to describe the use of computers, big data mining, modeling, and networks that may simulate intelligent behavior and critical thinking similar to a human being but at an exceedingly fast pace. What lies at the heart of AI are intelligent algorithms that drive the whole process integratively, imparting the decisive ability to machines. The artificial intelligence market is expected to grow at a rate of 39.4% over 2022-2028. British mathematician Alan Turing’s 1950 paper “Computing Machinery and Intelligence” established AI's fundamental goal and vision. Fundamentally, AI is a branch of computer science that aims to answer Turing’s question of replicating human intelligence in machines. Since AI research aims to make smarter algorithms evolve and emulate human-like functioning, the degree to which an AI system can replicate human capabilities is used to determine the types of AI. The volume, magnitude, and complexity of data in healthcare mean that artificial intelligence (AI) will be increasingly applied and exploited, with several types of AI already being employed by researchers, clinicians, and data analysts. AI is a collective term, not just one technology, with an immediate relevance to healthcare. A few AI technologies of high importance to healthcare are: machine learning, neural networks, and deep learning; natural language processing (NLP); physical robots; and rule-based expert systems. The AI-associated healthcare market is growing at an exponential pace and is expected to reach USD 6.6 billion by 2021. Precision medicine initiatives can be divided into three types of clinical areas: complex algorithms, digital health applications, and “omics”-based tests. Large genomic datasets and informatics related to demographic data or diagnostic reports are exploited by machine learning algorithms for prediction and prognosis. This is coupled with patient healthcare data, such as treatment protocols, treatment response, response to therapy, and health monitoring, with data input from clinicians, researchers, and the public domain. High throughput big data from population-based studies are used with machine learning algorithms to establish correlations, build predictive models, and provide customized treatment protocols. In the omics-based era, data from genomics, proteomics, and metabolomics are increasingly used to address and identify health-related problems in large study cohorts. This will provide opportunities for translation and transition from bench to bedside through precision medicine.
40

Mukherjee, Semanti, Andrew Schroeder, Subrata Chatterjee, John Cadley, Christina J. Falcon, Miika Mehine, Chaitanya Bandlamudi, et al. "Association of smoking history extracted from electronic health records (EHR) using machine-learning methods and tumor characteristics in patients with lung cancer." Journal of Clinical Oncology 41, no. 16_suppl (June 1, 2023): 1559. http://dx.doi.org/10.1200/jco.2023.41.16_suppl.1559.

Abstract:
1559 Background: Though smoking is a major risk factor for lung cancer, it has been a challenge to collect patients’ smoking history information accurately from the EHR due to data inconsistency and incompleteness. To address these challenges, we utilized a weak supervision methodology to automatically annotate smoking status of patients with lung cancer and correlated it with tumor characteristics. Methods: We analyzed 6,355 patients with lung cancer who underwent tumor profiling with MSK-IMPACT. In total, 14,555 unstructured clinical notes were extracted from EHR at the Memorial Sloan Kettering Cancer Center. The weak supervision methodology used a generative model for intermediate labels that were subsequently tuned by a machine-learning classifier to generate the final labels. Clinical notes from a randomly sampled set of 564 patients were manually curated and used for performance assessment. The rest of the patients were split into training and validation datasets used for model training and hyperparameter tuning. Pack-years were also extracted from clinical notes using Natural Language Processing. We next conducted multivariate analyses for primary and metastatic tumor samples separately to correlate smoking metrics with tumor characteristics including tumor mutation burden (TMB) and chromosomal instability, as inferred by the fraction of genome altered (FGA), after controlling for age at sequencing, gender, histological subtypes, ancestry, coverage and tumor purity. Results: The weak supervision classifier had almost perfect performance for the 2-label classification model (ever smokers and never smokers) with macro F1-score: 97.7%, balanced accuracy: 97.1%, 97.1%, precision: 98.4%, 98.4% and recall: 99.5%, 94.6% respectively. For the 3-label classification model (never smoker, former smoker, and current smoker), the macro F1-score was 79.8%; balanced accuracy: 97.1%, 86.7%, 71.2%, precision: 93.9%, 90.1%, 61.7%, recall: 96.1%, 93.3%, 46.0% respectively. 
Analyzing genomic data, we observed that smoking status (smoker vs. never smoker) and pack-years were associated with TMB in both primary and metastatic tumor samples (p<2e-16). FGA was marginally associated with smokers compared to never smokers in primary tumor samples (p=0.06). Among smokers diagnosed with lung adenocarcinoma, significantly high FGA in primary tumor samples was observed in males compared to females after adjusting for pack-years and other variables (p= 3.3e-3). Conclusions: We demonstrated high performance of our approach for automated curation of smoking history from EHR. The genomic results confirmed distinct mutational patterns associated with smoking behavior in patients with lung cancer. We are currently exploring multimodal approaches by including chest CT images and “time of quitting” to improve performance of the 3-class model.
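The macro F1-score and balanced accuracy reported in this abstract can be illustrated with a short Python sketch. It is not the authors' evaluation code: the function names and the toy labels are assumptions made for illustration, and the abstract reports balanced accuracy per class, whereas the sketch uses the common single-number definition (mean per-class recall).

```python
def per_class_counts(y_true, y_pred, label):
    """Count true positives, false positives and false negatives for one label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls (classes taken from y_true)."""
    labels = sorted(set(y_true))
    recalls = []
    for label in labels:
        tp, _, fn = per_class_counts(y_true, y_pred, label)
        recalls.append(tp / (tp + fn))
    return sum(recalls) / len(recalls)


# Toy 3-label example (made-up labels, not data from the study).
truth = ["never", "never", "former", "current"]
predicted = ["never", "former", "former", "current"]
print(round(macro_f1(truth, predicted), 4))
print(round(balanced_accuracy(truth, predicted), 4))
```

Because both metrics average over classes without weighting by class size, a weak minority class (such as "current smoker" in the 3-label model) pulls the macro score down noticeably.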
41

Torrente, Maria, Ernestina Menasalvas, Pedro Da Costa Sousa, and Mariano Provencio. "The CLARIFY digital decision support platform: An artificial intelligence tool for exploring multidimensional cancer data." Journal of Clinical Oncology 41, no. 16_suppl (June 1, 2023): e13638-e13638. http://dx.doi.org/10.1200/jco.2023.41.16_suppl.e13638.

Abstract:
e13638 Background: Artificial intelligence has emerged as a means of improving cancer care with the use of computer science. A digital framework that includes heterogeneous pipelines of real-world data can take advantage of AI, genomics and natural language processing to uncover insights that support decision-making in daily clinical practice. The goal of this study is to present an AI-based solution tool for cancer patients, to identify clinical factors associated with relapse and survival, and to develop a prognostic model that identifies features associated with poor prognosis and stratifies patients by risk. Methods: This is a hospital-based retrospective registry included in CLARIFY (Cancer Long Survivor Artificial Intelligence Follow-up), a European project supported by the EU Horizon 2020 (grant agreement nº 875160), including 2275 patients diagnosed since 2008 at the Medical Oncology Department at Hospital Universitario Puerta de Hierro-Majadahonda with non-small cell lung cancer (NSCLC) and 3000 breast cancer patients. The study was approved by the Ethics Committee at HUPHM (No. PI 148/15) and was carried out in accordance with the Helsinki Declaration. The CLARIFY Decision Support Platform (DSP) is an AI-based solution tool which centralizes and analyzes real-time anonymized clinical data from heterogeneous sources of data. It shows the clinical user information about individual patients or about the whole population and produces real-time descriptive statistics along with survival analysis (Kaplan-Meier estimates and Cox regression model). Integrated data include clinical data from electronic health records from more than 5000 NSCLC patients and breast cancer patients, including more than 1,000,000 clinical notes, more than 900,000 clinical reports, data from wearable devices that produced around 1,000,000 variable values per patient, genomic data and data from quality of life questionnaires. 
Results: Using the DSP we obtained patients’ profiles, survival probabilities, we stratified over 2000 patients in low and high-risk profile and developed a machine-aided tumour-recurrence prediction model. This computational infrastructure was able to extract knowledge from different data sources allowing clinicians to analyse multiple factors that help stratifying patients by risk, in order to implement a personalized follow up care programme, aiming to make a significant impact in the patients’ quality of life. Conclusions: The reconstruction of the population’s risk profile was achieved and proved useful in clinical practice using AI. This DSP has potential application in clinical settings to improve risk stratification, early detection, and personalized surveillance management of cancer patients and may assist clinicians in their daily clinical practice.
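The Kaplan-Meier estimate mentioned in the Methods can be sketched in a few lines of Python. This is a generic textbook version, not the CLARIFY platform's code: the function name, the input layout (one follow-up time and one event flag per patient) and the toy numbers are assumptions for illustration.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each time with an observed event.

    durations: follow-up time per patient.
    events: 1 if the event (e.g. relapse or death) was observed, 0 if censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        time = data[i][0]
        observed = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == time:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((time, survival))
        # Both events and censored patients leave the risk set.
        at_risk -= leaving
    return curve


# Toy cohort of five patients (made-up values, in months).
for time, prob in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(time, round(prob, 3))
```

Censored patients reduce the number at risk without lowering the curve, which is what distinguishes this estimate from a plain proportion of survivors.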
42

Heit, John A., Mariza de Andrade, Sebastian M. Armasu, Iftikhar J. Kullo, Jyotishman Pathak, Christopher G. Chute, Omri Gottesman, et al. "Genome-Wide Association Study (GWAS) Of Venous Thromboembolism (VTE) In African-Americans From The Electronic Medical Records & Genomics (eMERGE) Network." Blood 122, no. 21 (November 15, 2013): 458. http://dx.doi.org/10.1182/blood.v122.21.458.458.

Abstract:
Background The incidence of VTE in African-Americans (AAs) is similar to or higher than in Americans of European ancestry. However, the carrier frequencies of inherited thrombophilias common in whites (i.e., Factor V Leiden, Prothrombin G20210A) are very low in AAs, suggesting that other inherited thrombophilias may be associated with VTE in AAs. Objective To identify potentially novel non-hypothesis-driven single nucleotide polymorphisms (SNPs) associated with VTE in AAs. Methods We used the resources of the eMERGE Network to perform a GWAS of VTE in AAs. The eMERGE Network (funded by NHGRI) is comprised of nine sites each with DNA biobanks linked to electronic health records (EHRs). Approximately 39,206 unique DNA samples have been genotyped using either Affymetrix or Illumina genome-wide SNP arrays. Led by the Coordinating Center and eMERGE genomics workgroup, an imputation pipeline was developed for merging genomic data across the different SNP arrays used by the eMERGE sites, to maximize sample size and the power to detect associations. The imputation was performed using the 1000 Genomes Cosmopolitan reference panel which includes 1092 individuals and over 36 million SNPs. From our previously-identified cohort of Olmsted County, MN residents with incident or recurrent VTE, 1996-2005, we derived and validated an EHR-driven phenotype extraction algorithm that leveraged structured data based on ICD-9-CM codes and unstructured data from clinical notes via natural language processing to identify VTE cases and controls with 100% and 94% positive and negative predictive values, respectively. We tested for an association between each SNP and VTE among AAs using unconditional logistic regression, adjusting for age, sex and eMERGE site. Results Among 294 AA VTE cases and 3,661 AA controls (total n=3,955; females, n=2,512), the Factor V Leiden (F5 rs6025) was not analyzed due to a very low minor allele frequency (MAF=0.0036). 
The prothrombin G20210A (F2 rs1799963) and ABO blood type O (ABO rs8176719) SNPs were not genotyped in any of the arrays and could not be imputed. Among SNPs with an imputation score >0.8, the most significant SNPs associated with VTE were ITPR3 (inositol 1,4,5-triphosphate receptor type 3) rs2229637 (OR=1.65; p=3.61E-07; MAF=0.19) and CLEC7A (C-type lectin domain family 7, member A) rs59819090 (OR=2.16; p=1.06E-06; MAF=0.056). ITPR3 SNPs have been associated with coronary artery aneurysm in Kawasaki disease, type 1 diabetes mellitus and other autoimmune disorders. CLEC7A rs59819090 encodes for a serine82 to leucine nonsynonymous amino acid change in dectin-1. Dectin-1 is a transmembrane innate immune pattern recognition receptor on myeloid cells that, upon binding its agonist, β-glucans, stimulates cytokine production and the respiratory burst, a prerequisite for formation of neutrophil extracellular traps (NETs). NETs have been associated with VTE (van Montfoort, et al. Arterioscler Thromb Vasc Biol 2013;33:147-51). Dectin-1 serine82 is not evolutionarily conserved and the serine82 to leucine amino acid change is predicted to be tolerated, with a moderate effect on protein function. Conclusions ITPR3 rs2229637 and CLEC7A rs59819090 may be associated with VTE in African-Americans. These observations require replication and functional studies toward understanding how they may lie on the causal pathway to VTE. Disclosures: No relevant conflicts of interest to declare.
43

Yamoah, Ebenezer N., Gabriela Pavlinkova, and Bernd Fritzsch. "The Development of Speaking and Singing in Infants May Play a Role in Genomics and Dementia in Humans." Brain Sciences 13, no. 8 (August 11, 2023): 1190. http://dx.doi.org/10.3390/brainsci13081190.

Abstract:
The development of the central auditory system, including the auditory cortex and other areas involved in processing sound, is shaped by genetic and environmental factors, enabling infants to learn how to speak. Before explaining hearing in humans, a short overview of auditory dysfunction is provided. Environmental factors such as exposure to sound and language can impact the development and function of the auditory system sound processing, including discerning in speech perception, singing, and language processing. Infants can hear before birth, and sound exposure sculpts their developing auditory system structure and functions. Exposing infants to singing and speaking can support their auditory and language development. In aging humans, the hippocampus and auditory nuclear centers are affected by neurodegenerative diseases such as Alzheimer’s, resulting in memory and auditory processing difficulties. As the disease progresses, overt auditory nuclear center damage occurs, leading to problems in processing auditory information. In conclusion, combined memory and auditory processing difficulties significantly impact people’s ability to communicate and engage with their societal essence.
44

Mani Sekhar, S. R., G. M. Siddesh, Sunilkumar S. Manvi, and K. G. Srinivasa. "Optimized Focused Web Crawler with Natural Language Processing Based Relevance Measure in Bioinformatics Web Sources." Cybernetics and Information Technologies 19, no. 2 (June 1, 2019): 146–58. http://dx.doi.org/10.2478/cait-2019-0021.

Abstract:
In the fast-growing field of digital technologies, crawlers and search engines face unpredictable challenges. Focused web-crawlers are essential for mining the boundless data available on the internet. Web-crawlers face an indeterminate latency problem due to differences in their response time. The proposed work attempts to optimize the design and implementation of focused web-crawlers using a Master-Slave architecture for Bioinformatics web sources. Focused crawlers ideally should crawl only relevant pages, but the relevance of a page can only be estimated after crawling the genomics pages. A solution for predicting the page relevance, which is based on Natural Language Processing, is proposed in the paper. The frequency of the keywords in the top-ranked sentences of the page determines the relevance of the pages within genomics sources. The proposed solution uses a TextRank algorithm to rank the sentences, as well as ensuring the correct classification of Bioinformatics web pages. Finally, the model is validated by being compared with a breadth-first search web-crawler. The comparison shows a significant reduction in run time for the same harvest rate.
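The relevance measure described in this abstract (rank sentences with TextRank, then count keywords in the top-ranked ones) can be sketched as follows. This is a simplified illustration, not the paper's implementation: the tokenizer, the set-based word-overlap similarity, the `top_k` cutoff and the sample pages are all assumptions.

```python
import math
import re


def tokenize(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(tokens_a, tokens_b):
    """Word overlap normalised by log sentence lengths (TextRank-style)."""
    words_a, words_b = set(tokens_a), set(tokens_b)
    if not words_a or not words_b:
        return 0.0
    denominator = math.log(len(words_a)) + math.log(len(words_b))
    if denominator <= 0:
        return 0.0
    return len(words_a & words_b) / denominator


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by power iteration over the similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [(1 - damping) + damping * sum(
            weights[j][i] / out_sum[j] * scores[j]
            for j in range(n) if out_sum[j] > 0) for i in range(n)]
    return scores


def page_relevance(text, keywords, top_k=2):
    """Fraction of tokens in the top_k ranked sentences that are keywords."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i],
                  reverse=True)[:top_k]
    top_tokens = [t for i in best for t in tokenize(sentences[i])]
    if not top_tokens:
        return 0.0
    wanted = {k.lower() for k in keywords}
    return sum(t in wanted for t in top_tokens) / len(top_tokens)


# Hypothetical page text and keyword list.
page = ("Genome sequencing reveals gene variants. "
        "Gene expression depends on genome structure. "
        "The weather was nice.")
print(page_relevance(page, {"genome", "gene", "sequencing"}))
```

A crawler built on this idea would compare the score against a threshold before following a page's outgoing links; the threshold itself would have to be tuned on labelled pages.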
45

Sochacki, Andrew, Cosmin Adrian Bejan, Shilin Zhao, Travis Spaulding, Thomas Stricker, Yaomin Xu, and Michael R. Savona. "Next Generation Myelofibrosis Risk Analysis in the Electronic Health Record." Blood 132, Supplement 1 (November 29, 2018): 3038. http://dx.doi.org/10.1182/blood-2018-99-113692.

Abstract:
Background: Myelofibrosis (MF) is a devastating myeloproliferative neoplasm that is hallmarked by marrow fibrosis, symptomatic extramedullary hematopoiesis, and risk of leukemic transformation, most commonly driven by Janus kinase 2 (JAK2) pathway mutations. MF risk classification systems guide prognosis, decisions regarding allogeneic stem cell transplantation, and disease modifying agents. Key systems include the Dynamic International Prognostic Scoring System (DIPSS) 2009, DIPSS Plus 2010, Genetics-Based Prognostic Scoring System (GPSS) 2014, and Mutation-Enhanced International Prognostic Scoring System (MIPSS) 2014. System contributions include dynamic scoring (DIPSS), cytogenetics (DIPSS Plus), and high risk molecular mutations (GPSS and MIPSS). To power the next generation of MF risk prognostication, and ascertain new prognostic factors, large scale electronic health record (EHR) and genomic data will need integration. As a proof of concept, we leveraged our de-identified research EHR (2.9 million records) and linked genomic biobank (288,000 patients) to develop an all-inclusive phenotype-genotype-prognostic system for MF and recapitulate DIPSS, DIPSS Plus, GPSS and MIPSS. Methods: Our previously described methods (Bejan et al. AACR 2018) utilized natural language processing to algorithmically identify 306 MF patients. A subset (N=125) had available DNA for genotyping. We automatically extracted: age greater than 65, leukocyte count (WBC) greater than 25 × 10^9/L, hemoglobin (Hgb) less than 10 g/dL, platelets (PLT) less than 100 × 10^9/L, circulating myeloid blasts ≥ 1%, and 10% weight loss compared to baseline as a proxy for constitutional symptoms. Transfusion data was not included. Karyotype data was manually reviewed. Next generation sequencing (NGS) was performed on biobanked peripheral blood DNA with the TruSight Myeloid Panel (Illumina®). Genotyped samples were restricted to dates after MF diagnosis. 
Multivariate Cox proportional hazard analysis was performed on all clinical and genomic variables. DIPSS Plus was calculated without adjustment but lacked transfusion data. DIPSS, GPSS and MIPSS scores were calculated by published methods. Results: Multivariate Cox proportional hazard regression identified Hgb (HR=6.4; P=0.006), myeloid blasts (HR=3.8; P=0.03), and ASXL1 (HR=5.2; P=0.02) as significant in our cohort with regard to overall survival (OS). We noted a strong trend for high risk karyotype (HR=5.6; P=0.07). Our DIPSS model median survival (N=120) for each subgroup was: low risk (median survival not met), intermediate-1 (108 months), intermediate-2 (47 months) and high risk (6 months), P=0.0002 (Figure 1a). DIPSS Plus (N=122) integrated karyotype data and PLT count with similar survival with the exception of high risk (4 months), P=0.00003 (Figure 1b). The percentage of patients with driver mutations was: JAK2V617F (57%), CALR (3%) and MPLW515 (7.2%); JAK2WT, CALRWT and MPLWT triple negative (34%); high molecular risk ASXL1 (15%), EZH2 (6%), IDH1/2 (7%), SRSF2 (17%); other variants of interest TET2 (9.6%), TP53 (29%) and DNMT3A (16.8%). MIPSS (N=125; 48 months follow up) noted low risk, intermediate-1, and intermediate-2 (median survival not met) and high risk (32 months), P=0.0001 (Figure 1c). GPSS (N=125; 48 months follow up) did not demonstrate statistical separation among groups (Figure 1d). Discussion: This proof of concept transformed raw EHR records into clinical risk scores for MF. The addition of retrospective DNA analysis via NGS opens the possibility of multi-institutional EHR-biobank studies to most accurately create a system to define MF risk. Our sample size limited the significance of age, PLTs, poor risk mutations and other variables previously shown to impact OS. Likewise, we lacked the capacity to track transfusion dependence, previously shown to have prognostic relevance. 
Still, prognostication via the EHR mimics common scoring systems in MF and supports correct MF case selection, accurate laboratory extraction and reproducible genotyping of biobanked samples. Similar to the original GPSS report, our low risk cohort was small (N=2) and will benefit from expansion of genotyping underway. Finally, this phenotype-genotype-prognostic paradigm represents a technical advance and a unique opportunity to deploy patient specific comorbidities from lifetime EHR records to further refine risk across all myeloid disease. Disclosures Savona: Boehringer Ingelheim: Consultancy; Celgene: Consultancy, Membership on an entity's Board of Directors or advisory committees; Incyte: Membership on an entity's Board of Directors or advisory committees, Research Funding.
46

Nickols, Nicholas George, Kara Noelle Maxwell, Kyung Min Lee, Ryan Hausler, Tori Anglin-Foote, Isla Garraway, and Julie Ann Lynch. "Frequencies of actionable alterations found by somatic tumor sequencing in veterans with metastatic prostate cancer." Journal of Clinical Oncology 40, no. 6_suppl (February 20, 2022): 178. http://dx.doi.org/10.1200/jco.2022.40.6_suppl.178.

Abstract:
178 Background: Prostate cancer comprises one third of male Veteran cancers and is their second leading cause of cancer death. Metastatic prostate cancer is lethal. Next Generation Sequencing (NGS) of somatic tumors is recommended for metastatic prostate cancer to identify actionable alterations that can be targeted with approved therapies. Veterans with prostate cancers harboring alterations in genes involved in the DNA damage response (e.g. BRCA1/2) or high microsatellite instability (MSI-High) may be eligible for PARP inhibitors or checkpoint blockade immunotherapy, respectively. Potential candidates may be identified for ongoing clinical trials of novel precision oncology approaches. Methods: This is a retrospective analysis of clinical, genomic, and demographic data from Veterans with metastatic prostate cancer who underwent somatic NGS using the Foundation Medicine NGS platform from 2019 to February 2021. To be included, prostate cancer had to be the submitted diagnosis for the NGS testing, and metastatic disease had to be determined by the VINCI natural language processing tool. Variables included demographic, clinical, and pathological characteristics (self-identified race/ethnicity, age, rurality of residence, Gleason score, specimen site, other cancer diagnosis, mutation frequency). The primary outcome was mutation rates in homologous recombination (HR) genes under current FDA approval for olaparib (ATM, BARD1, BRCA1, BRCA2, BRIP1, CDK12, CHEK1, CHEK2, FANCL, PALB2, RAD51B, RAD51C, RAD51D, RAD54L) or MSI-High. Raw variant data, submitted diagnosis, and clinical data were extracted from the NGS reports and harmonized for further variant annotation. Variant data included chromosome, position, reference and alternate allele, total depth, variant allele depth, and quality scores. Variants were annotated using ANNOVAR. Likely oncogenic and oncogenic mutations were identified using OncoKB. 
Results: 1,597 Veterans with metastatic prostate cancer underwent FMI NGS testing (63% White, 33% African American, 4% other). Median age was 66 years, 78.6% of cases from >60 years. Of the 1,597 who underwent blood or tumor testing, at least one likely oncogenic mutation in an HR gene under FDA approval for olaparib was found in 369 (23.1%) of Veterans (19% of tissue-based tests, 32.9% of blood-based tests). Of 651 liquid biopsy tests with at least one HR gene mutation, 125 of 214 (52%) had mutations at a variant allele frequency (VAF) <0.5% or were found in an MSI-High sample that could indicate a spurious mutation due to clonal hematopoiesis. 33 patients (2.1%) were MSI-High, (21 tissue-based and 12 blood-based). Frequencies of alterations in ATM (3.6%), CDK12 (5.6%), and BRCA2 (4%) in tissue-based tests were not significantly different from those reported in other series. Conclusions: NGS of somatic tumors from Veterans with metastatic prostate cancer identifies alterations that impact management and clinical outcomes.
47

Kelly, Cassidy, and Hui Yang. "A System for Extracting Study Design Parameters from Nutritional Genomics Abstracts." Journal of Integrative Bioinformatics 10, no. 2 (June 1, 2013): 82–93. http://dx.doi.org/10.1515/jib-2013-222.

Abstract:
Summary The extraction of study design parameters from biomedical journal articles is an important problem in natural language processing (NLP). Such parameters define the characteristics of a study, such as the duration, the number of subjects, and their profile. Here we present a system for extracting study design parameters from sentences in article abstracts. This system will be used as a component of a larger system for creating nutrigenomics networks from articles in the nutritional genomics domain. The algorithms presented consist of manually designed rules expressed either as regular expressions or in terms of sentence parse structure. A number of filters and NLP tools are also utilized within a pipelined algorithmic framework. Using this novel approach, our system performs extraction at a finer level of granularity than comparable systems, while generating results that surpass the current state of the art.
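The rule-based extraction this abstract describes can be illustrated with a small regular-expression sketch in Python. The two patterns, the parameter names and the example sentences are hypothetical; they are not taken from the authors' system, which also relies on sentence parse structure and additional filters.

```python
import re

# Hypothetical rule: a count followed by a word naming the study population.
SUBJECTS = re.compile(
    r"\b(\d[\d,]*)\s+(?:healthy\s+)?"
    r"(subjects|participants|patients|volunteers|men|women)\b",
    re.IGNORECASE,
)

# Hypothetical rule: a number followed by a time unit ("12 weeks", "6-month").
DURATION = re.compile(r"\b(\d+)[\s-]*(day|week|month|year)s?\b", re.IGNORECASE)


def extract_study_parameters(sentence):
    """Return the study design parameters found in one abstract sentence."""
    params = {}
    match = SUBJECTS.search(sentence)
    if match:
        params["n_subjects"] = int(match.group(1).replace(",", ""))
        params["population"] = match.group(2).lower()
    match = DURATION.search(sentence)
    if match:
        params["duration"] = (int(match.group(1)), match.group(2).lower())
    return params


print(extract_study_parameters(
    "A total of 1,250 healthy volunteers received the supplement for 12 weeks."
))
```

Rules of this kind are precise but brittle: a spelled-out number ("forty women") would be missed, which is one reason such systems usually combine patterns with parse-based rules.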
48

Golovko, G., and D. Isai. "USAGE OF IT TECHNOLOGIES IN MEDICINE AND GENOMICS." Системи управління, навігації та зв’язку. Збірник наукових праць 4, no. 70 (November 29, 2022): 66–70. http://dx.doi.org/10.26906/sunz.2022.4.066.

Abstract:
In this article, we consider which IT technologies are most used in medicine, and in genomics methods in particular, and we look at the use of big data in this area. Additionally, we explain what a connectome is and analyze the 4M and 3V frameworks in genomics. Statistics in medicine is one of the tools for analyzing experimental data and clinical observations, as well as the language in which the obtained mathematical results are reported. However, this is not the only task of statistics in medicine. The mathematical apparatus is widely used for diagnostic purposes, for solving classification problems, for finding new patterns, and for setting new scientific hypotheses. The use of statistical programs presupposes knowledge of the basic methods and stages of statistical analysis: their sequence, necessity and sufficiency. In the proposed presentation, the main emphasis is not on a detailed presentation of the formulas that make up the statistical methods, but on their essence and application rules. Finally, we discuss genome-wide association studies, methods of statistical processing of medical data and their relevance. In this article, we analyzed the basic concepts of statistics, statistical methods in medicine and data science, and considered several areas in which large amounts of data are used that require modern IT technologies, including genomics, genome-wide association studies, visualization and connectome data collection.
49

Yu, Lijia, Deepak Kumar Tanwar, Emanuel Diego S. Penha, Yuri I. Wolf, Eugene V. Koonin, and Malay Kumar Basu. "Grammar of protein domain architectures." Proceedings of the National Academy of Sciences 116, no. 9 (February 7, 2019): 3636–45. http://dx.doi.org/10.1073/pnas.1814684116.

Abstract:
From an abstract, informational perspective, protein domains appear analogous to words in natural languages in which the rules of word association are dictated by linguistic rules, or grammar. Such rules exist for protein domains as well, because only a small fraction of all possible domain combinations is viable in evolution. We employ a popular linguistic technique, n-gram analysis, to probe the “proteome grammar”—that is, the rules of association of domains that generate various domain architectures of proteins. Comparison of the complexity measures of “protein languages” in major branches of life shows that the relative entropy difference (information gain) between the observed domain architectures and random domain combinations is highly conserved in evolution and is close to being a universal constant, at ∼1.2 bits. Substantial deviations from this constant are observed in only two major groups of organisms: a subset of Archaea that appears to be cells simplified to the limit, and animals that display extreme complexity. We also identify the n-grams that represent signatures of the major branches of cellular life. The results of this analysis bolster the analogy between genomes and natural language and show that a “quasi-universal grammar” underlies the evolution of domain architectures in all divisions of cellular life. The nearly universal value of information gain by the domain architectures could reflect the minimum complexity of signal processing that is required to maintain a functioning cell.
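The n-gram analysis and the relative entropy (information gain) described in this abstract can be sketched in Python for the bigram case. This is an illustration under stated assumptions, not the authors' method: "random domain combinations" are modelled here as independent draws from the overall domain frequencies, and the domain architectures in the example are made up.

```python
import math
from collections import Counter


def bigrams(architecture):
    """Adjacent domain pairs of one protein's domain architecture."""
    return list(zip(architecture, architecture[1:]))


def information_gain_bits(architectures):
    """Relative entropy (bits) of observed bigrams versus a random model.

    Assumption: the random model pairs domains independently according to
    their overall frequencies in the data set.
    """
    unigram = Counter(d for arch in architectures for d in arch)
    bigram = Counter(bg for arch in architectures for bg in bigrams(arch))
    n_unigrams = sum(unigram.values())
    n_bigrams = sum(bigram.values())
    gain = 0.0
    for (first, second), count in bigram.items():
        p_observed = count / n_bigrams
        p_random = (unigram[first] / n_unigrams) * (unigram[second] / n_unigrams)
        gain += p_observed * math.log2(p_observed / p_random)
    return gain


# Toy proteome with two rigid two-domain architectures (hypothetical labels).
toy_proteome = [["A", "B"], ["A", "B"], ["C", "D"], ["C", "D"]]
print(information_gain_bits(toy_proteome))
```

In the toy proteome only 2 of the 16 possible ordered pairs ever occur, so the observed bigrams are far from random and the gain is large; a proteome whose domains combined freely would score close to zero.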
50

Zolyan, Suren T. "Does a Ribosome Really Read? On the Cognitive Roots and Heuristic Value of Linguistic Metaphors in Molecular Genetics Part 2." Russian Journal of Philosophical Sciences 63, no. 2 (May 16, 2020): 46–62. http://dx.doi.org/10.30727/0235-1188-2019-63-2-46-62.

Abstract:
We discuss the role of linguistic metaphors as a cognitive frame for the understanding of genetic information processing. The essential similarity between language and genetic information processing has been recognized since the very beginning, and many prominent scholars have noted the possibility of considering genes and genomes as texts or languages. Most of the core terms in molecular biology are based on linguistic metaphors. The processing of genetic information is understood as some operations on text – writing, reading and editing and their specification (encoding/decoding, proofreading, transcription, translation, reading frame). The concept of gene reading can be traced from the archaic idea of the equation of Life and Nature with the Book. Thus, the genetics itself can be metaphorically represented as some operations on text (deciphering, understanding, codebreaking, transcribing, editing, etc.), which are performed by scientists. At the same time linguistic metaphors portrayed gene entities also as having the ability of reading. In the case of such “bio-reading” some essential features similar to the processes of human reading can be revealed: this is an ability to identify the biochemical sequences based on their function in an abstract system and distinguish between type and its contextual tokens of the same type. Metaphors seem to be an effective instrument for representation, as they make possible a two-dimensional description: biochemical by its experimental empirical results and textual based on the cognitive models of comprehension. In addition to their heuristic value, linguistic metaphors are based on the essential characteristics of genetic information derived from its dual nature: biochemical by its substance, textual (or quasi-textual) by its formal organization. It can be concluded that linguistic metaphors denoting biochemical objects and processes seem to be a method of description and explanation of these heterogeneous properties.