Academic literature on the topic 'GENOMIC LANGUAGE PROCESSING'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'GENOMIC LANGUAGE PROCESSING.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "GENOMIC LANGUAGE PROCESSING"

1

Routhier, Etienne, and Julien Mozziconacci. "Genomics enters the deep learning era." PeerJ 10 (June 24, 2022): e13613. http://dx.doi.org/10.7717/peerj.13613.

Full text
Abstract:
The tremendous amount of biological sequence data available, combined with recent methodological breakthroughs in deep learning in domains such as computer vision and natural language processing, is transforming bioinformatics today through the emergence of deep genomics, the application of deep learning to genomic sequences. We review here the new applications that deep learning enables in the field, focusing on three aspects: the functional annotation of genomes, the sequence determinants of genome function, and the possibility of writing synthetic genomic sequences.
APA, Harvard, Vancouver, ISO, and other styles
2

Kehl, Kenneth L., Wenxin Xu, Eva Lepisto, Haitham Elmarakeby, Michael J. Hassett, Eliezer M. Van Allen, Bruce E. Johnson, and Deborah Schrag. "Natural Language Processing to Ascertain Cancer Outcomes From Medical Oncologist Notes." JCO Clinical Cancer Informatics, no. 4 (September 2020): 680–90. http://dx.doi.org/10.1200/cci.20.00020.

Full text
Abstract:
PURPOSE Cancer research using electronic health records and genomic data sets requires clinical outcomes data, which may be recorded only in unstructured text by treating oncologists. Natural language processing (NLP) could substantially accelerate extraction of this information. METHODS Patients with lung cancer who had tumor sequencing as part of a single-institution precision oncology study from 2013 to 2018 were identified. Medical oncologists’ progress notes for these patients were reviewed. For each note, curators recorded whether the assessment/plan indicated any cancer, progression/worsening of disease, and/or response to therapy or improving disease. Next, a recurrent neural network was trained using unlabeled notes to extract the assessment/plan from each note. Finally, convolutional neural networks were trained on labeled assessments/plans to predict the probability that each curated outcome was present. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC) among a held-out test set of 10% of patients. Associations between curated response or progression end points and overall survival were measured using Cox models among patients receiving palliative-intent systemic therapy. RESULTS Medical oncologist notes (n = 7,597) were manually curated for 919 patients. In the 10% test set, NLP models replicated human curation with AUROCs of 0.94 for the any-cancer outcome, 0.86 for the progression outcome, and 0.90 for the response outcome. Progression/worsening events identified using NLP models were associated with shortened survival (hazard ratio [HR] for mortality, 2.49; 95% CI, 2.00 to 3.09); response/improvement events were associated with improved survival (HR, 0.45; 95% CI, 0.30 to 0.67). CONCLUSION NLP models based on neural networks can extract meaningful outcomes from oncologist notes at scale. 
Such models may facilitate identification of clinical and genomic features associated with response to cancer treatment.
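The AUROC values reported above can be computed directly from predicted probabilities and binary labels; a minimal pure-Python sketch of the metric (an illustration only, not the authors' pipeline):

```python
def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic: the
    probability that a random positive example outranks a random negative,
    counting ties as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfect ranker scores 1.0 and a random one about 0.5, which is how figures such as 0.94 for the any-cancer outcome are read.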
APA, Harvard, Vancouver, ISO, and other styles
3

Schubert, Michael. "clustermq enables efficient parallelization of genomic analyses." Bioinformatics 35, no. 21 (May 27, 2019): 4493–95. http://dx.doi.org/10.1093/bioinformatics/btz284.

Full text
Abstract:
Abstract Motivation High performance computing (HPC) clusters play a pivotal role in large-scale bioinformatics analysis and modeling. For the statistical computing language R, packages exist to enable a user to submit their analyses as jobs on HPC schedulers. However, these packages do not scale well to high numbers of tasks, and their processing overhead quickly becomes a prohibitive bottleneck. Results Here we present clustermq, an R package that can process analyses up to three orders of magnitude faster than previously published alternatives. We show this for investigating genomic associations of drug sensitivity in cancer cell lines, but it can be applied to any kind of parallelizable workflow. Availability and implementation The package is available on CRAN and https://github.com/mschubert/clustermq. Code for performance testing is available at https://github.com/mschubert/clustermq-performance. Supplementary information Supplementary data are available at Bioinformatics online.
APA, Harvard, Vancouver, ISO, and other styles
4

Le Guen, Yann, François Leroy, Cathy Philippe, Jean-François Mangin, Ghislaine Dehaene-Lambertz, and Vincent Frouin. "Enhancer Locus in ch14q23.1 Modulates Brain Asymmetric Temporal Regions Involved in Language Processing." Cerebral Cortex 30, no. 10 (May 20, 2020): 5322–32. http://dx.doi.org/10.1093/cercor/bhaa112.

Full text
Abstract:
Abstract Identifying the genes that contribute to the variability in brain regions involved in language processing may shed light on the evolution of brain structures essential to the emergence of language in Homo sapiens. The superior temporal asymmetrical pit (STAP), which is not observed in chimpanzees, represents an ideal phenotype to investigate the genetic variations that support human communication. The left STAP depth was significantly associated with a predicted enhancer annotation located in the 14q23.1 locus, between DACT1 and KIAA0586, in the UK Biobank British discovery sample (N = 16 515). This association was replicated in the IMAGEN cohort (N = 1726) and the UK Biobank non-British validation sample (N = 2161). This genomic region was also associated to a lesser extent with the right STAP depth and the formation of sulcal interruptions, “plis de passage,” in the bilateral STAP but not with other structural brain MRI phenotypes, highlighting its notable association with the superior temporal regions. Diffusion MRI emphasized an association with the fractional anisotropy of the left auditory fibers of the corpus callosum and with networks involved in linguistic processing in resting-state functional MRI. Overall, this evidence demonstrates a specific relationship between this locus and the establishment of the superior temporal regions that support human communication.
APA, Harvard, Vancouver, ISO, and other styles
5

Konstantinidis, George, Adriane Chapman, Mark J. Weal, Ahmed Alzubaidi, Lisa M. Ballard, and Anneke M. Lucassen. "The Need for Machine-Processable Agreements in Health Data Management." Algorithms 13, no. 4 (April 7, 2020): 87. http://dx.doi.org/10.3390/a13040087.

Full text
Abstract:
Data processing agreements in health data management are laid out by organisations in monolithic “Terms and Conditions” documents written in natural legal language. These top-down policies usually protect the interest of the service providers, rather than the data owners. They are coarse-grained and do not allow for more than a few opt-in or opt-out options for individuals to express their consent on personal data processing, and these options often do not transfer to software as they were intended to. In this paper, we study the problem of health data sharing and we advocate the need for individuals to describe their personal contract of data usage in a formal, machine-processable language. We develop an application for sharing patient genomic information and test results, and use interactions with patients and clinicians in order to identify the particular peculiarities a privacy/policy/consent language should offer in this complicated domain. We present how Semantic Web technologies can have a central role in this approach by providing the formal tools and features required in such a language. We present our ongoing approach to construct an ontology-based framework and a policy language that allows patients and clinicians to express fine-grained consent, preferences or suggestions on sharing medical information. Our language offers unique features such as multi-party ownership of data or data sharing dependencies. We evaluate the landscape of policy languages from different areas, and show how they are lacking major requirements needed in health data management. In addition to enabling patients, our approach helps organisations increase technological capabilities, abide by legal requirements, and save resources.
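The idea of a formal, machine-processable consent policy can be sketched with a toy rule structure (purely illustrative; the paper's actual language is ontology-based and far richer, and the field names below are invented for the example):

```python
# Hypothetical consent policy: deny rules take precedence over allow rules.
consent_policy = {
    "owners": ["patient_123", "relative_456"],  # multi-party ownership of shared genomic data
    "allow": [{"recipient": "clinician", "purpose": "care"}],
    "deny": [{"recipient": "insurer"}],
}

def rule_matches(rule, request):
    """A rule matches when every field it constrains equals the request's value."""
    return all(request.get(field) == value for field, value in rule.items())

def permitted(request, policy):
    """Evaluate a data-access request against the owner's consent policy."""
    if any(rule_matches(rule, request) for rule in policy["deny"]):
        return False
    return any(rule_matches(rule, request) for rule in policy["allow"])
```

Defaulting to refusal when no allow rule matches reflects the fine-grained, opt-in consent the paper advocates.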
APA, Harvard, Vancouver, ISO, and other styles
6

Guan, Meijian, Samuel Cho, Robin Petro, Wei Zhang, Boris Pasche, and Umit Topaloglu. "Natural language processing and recurrent network models for identifying genomic mutation-associated cancer treatment change from patient progress notes." JAMIA Open 2, no. 1 (January 3, 2019): 139–49. http://dx.doi.org/10.1093/jamiaopen/ooy061.

Full text
Abstract:
Abstract Objectives Natural language processing (NLP) and machine learning approaches were used to build classifiers to identify genomic-related treatment changes in the free-text visit progress notes of cancer patients. Methods We obtained 5889 deidentified progress reports (2439 words on average) for 755 cancer patients who had undergone clinical next-generation sequencing (NGS) testing at Wake Forest Baptist Comprehensive Cancer Center for our data analyses. An NLP system was implemented to process the free-text data and extract NGS-related information. Three types of recurrent neural networks (RNNs), namely gated recurrent unit (GRU), long short-term memory (LSTM), and bidirectional LSTM (LSTM_Bi), were applied to classify documents into the treatment-change and no-treatment-change groups. Further, we compared the performance of the RNNs to 5 machine learning algorithms: Naive Bayes, K-Nearest Neighbor, Support Vector Machine, Random Forest, and Logistic Regression. Results Our results suggested that, overall, RNNs outperformed traditional machine learning algorithms, and LSTM_Bi showed the best performance among the RNNs in terms of accuracy, precision, recall, and F1 score. In addition, pretrained word embeddings can improve the accuracy of LSTM by 3.4% and reduce the training time by more than 60%. Discussion and Conclusion NLP and RNN-based text mining solutions have demonstrated advantages in information retrieval and document classification tasks for unstructured clinical progress notes.
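The accuracy, precision, recall, and F1 scores used to compare the classifiers above are standard quantities; a small self-contained sketch of their computation (illustrative only, not the study's code):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = treatment change)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```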
APA, Harvard, Vancouver, ISO, and other styles
7

Miyano, Satoru. "IL-3 Changing Cancer Genomics and Cancer Genomic Medicine by Artificial Intelligence and Large-Scale Data Analysis." Neuro-Oncology Advances 3, Supplement_6 (December 1, 2021): vi1. http://dx.doi.org/10.1093/noajnl/vdab159.002.

Full text
Abstract:
Abstract In the MEXT Program for Scientific Research on Innovative Areas “Systems Cancer” and “Systems Cancer in Neo-Dimension” (2010-2019), we developed a large-scale genome data analysis pipeline called Genomon in collaboration with Professor Seiji Ogawa (Kyoto University). Our efforts successfully produced innovative results in cancer genomics. This system is implemented on the supercomputers SHIROKANE and FUGAKU. One of its contributions unraveled the overall picture of genetic abnormalities in malignant brain tumors (Mutational landscape and clonal architecture in grade II and III gliomas. Nat Genet 2015), which exploited Genomon on SHIROKANE. However, with the spread of new measurement technologies and new computing environments, no one expects the future to be a simple extension of this system. For cancer genomic medicine, the Institute of Medical Science, University of Tokyo, formed a research team to analyze whole-genome sequences. The challenge we faced was to transform thousands to millions of genomic aberrations per case into precision medicine. This is what we now call “digital transformation.” IBM’s Watson for Genomics was introduced for our research purposes. In the process, we identified the effectiveness of AI, the indispensability of specialist intervention, and the bottlenecks. We recognized that natural language processing technology such as BERT and Google Knowledge Graph AI technology will open up the future. Automatic document creation is also a realistic issue. Cancer research is becoming more difficult and larger in scale. For example, analysis of genomic data from 60,954 cases revealed a new underlying mechanism in which multiple mutations within the same oncogene work synergistically (Nature 2021). AI with an accuracy of X% does not seem to be the goal. What is needed is not a black box, but explainable AI that explains “why” in a human-understandable way. We are currently conducting research with Fujitsu Laboratories in this direction.
APA, Harvard, Vancouver, ISO, and other styles
8

Garzon, Max H., Kiran C. Bobba, Andrew Neel, and Vinhthuy Phan. "DNA-Based Indexing." International Journal of Nanotechnology and Molecular Computation 2, no. 3 (July 2010): 25–45. http://dx.doi.org/10.4018/jnmc.2010070102.

Full text
Abstract:
DNA has been acknowledged as a suitable medium for massively parallel computing and as a “smart” glue for self-assembly. In this paper, a third capability of DNA is described in detail: memory capable of encoding and processing large amounts of data so that information can be retrieved associatively based on content. The technique is based on a novel representation of data on DNA that can shed light on the way DNA, RNA, and other biomolecules encode information, which may be potentially important in applications to fields like bioinformatics, genetics, and natural language processing. Analyses are also provided of the sensitivity, robustness, and bounds on the theoretical capacity of the memories. Finally, the potential use of the memories is illustrated with two applications: one in genomic analysis for identification and classification, the other in information retrieval from text data in abiotic form.
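Associative, content-based retrieval of the kind these DNA memories implement can be approximated in silico by tolerance matching on sequence distance (a conceptual sketch only; actual retrieval relies on hybridization chemistry, not string comparison):

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strands."""
    return sum(x != y for x, y in zip(a, b))

def associative_recall(query, memory, max_mismatches=1):
    """Content-addressable lookup: return stored strands within a mismatch
    tolerance of the query, mimicking hybridization-based retrieval."""
    return [strand for strand in memory
            if len(strand) == len(query) and hamming(strand, query) <= max_mismatches]
```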
APA, Harvard, Vancouver, ISO, and other styles
9

Klein, Harry, Tali Mazor, Matthew Galvin, Jason Hansel, Emily Mallaber, Pavel Trukhanov, Joyce Yu, et al. "Abstract 1067: MatchMiner: An open-source AI precision medicine trial matching platform." Cancer Research 83, no. 7_Supplement (April 4, 2023): 1067. http://dx.doi.org/10.1158/1538-7445.am2023-1067.

Full text
Abstract:
Abstract As the number of precision medicine (PM) trials and patient genomic data has grown, it has become challenging for clinicians and trial staff to identify PM trial options for patients. Several trial matching software platforms have been developed to match genomic data from patients with PM trials, but these existing platforms are proprietary and are not easily accessible for adoption by institutions. At Dana-Farber Cancer Institute (DFCI), we have addressed this challenge by developing our own open-source institutional trial matching software, MatchMiner. MatchMiner algorithmically matches patient genomic and clinical data with PM trial eligibility data. Trial eligibility data is manually curated into a human-readable markup language, called clinical trial markup language (CTML), for matching with patient genomic data. MatchMiner has 2 main modes of clinical use: (1) patient-centric, where clinicians search for trial matches for individual patients and (2) trial-centric, where trial staff identify patients that match their trial’s genomic eligibility. We recently described MatchMiner’s usage at DFCI and since our report, we have added 90 additional trial consents facilitated by MatchMiner (>250 trial consents, called MatchMiner consents [MMC]). Here, we describe new characteristics of our MMC including which user mode (patient-centric or trial-centric) was used to match the consent, genomic alterations and cancer types that matched to eligibility criteria, and whether the patient went onto trial. MMCs were mostly identified by patient-centric mode (70%), genomic alterations and cancer types among MMC were diverse (n=55 genes and n=20 cancer types), and 87% of MMC went on trial. Among MMCs, the most common altered genes leading to trial eligibility were ERBB2 and KRAS in breast cancer and lung cancer, which is consistent with the number of therapies targeting ERBB2 and KRAS. 
MMCs also included patients with rare cancer types, like extraskeletal myxoid chondrosarcoma, as well as rare genomic alterations, such as NTRK fusions. Thus, MatchMiner has been successful at facilitating PM trial matching for a broad range of genomic alterations and cancer types at DFCI. MatchMiner matches patients to trials as soon as their genomic report is available, however, many patients are not yet ready to enroll onto a trial because their cancer is responding to the standard of care or they are in a remission period. To address this problem, we are evaluating the use of artificial intelligence (AI) to identify patients that may be ready for a new treatment option. After trial matches have been generated by MatchMiner, radiology scan text from patients’ tumor scans is run through a natural language processing (NLP) model to identify patients who are more likely to be ready to enroll onto a trial. By using NLP to filter trial matches, we hope to improve MatchMiner’s efficiency of finding trial matches and provide more timely trial options for patients. Citation Format: Harry Klein, Tali Mazor, Matthew Galvin, Jason Hansel, Emily Mallaber, Pavel Trukhanov, Joyce Yu, James Lindsay, Kenneth Kehl, Michael Hassett, Ethan Cerami. MatchMiner: An open-source AI precision medicine trial matching platform [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 1067.
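The core patient-trial genomic matching step can be illustrated with a toy predicate (a sketch under invented field names; MatchMiner's real CTML-based matching is considerably more expressive):

```python
def genomically_matches(patient, trial):
    """Toy match: the cancer type must be eligible and at least one of the
    patient's altered genes must appear in the trial's genomic criteria."""
    shared_genes = set(patient["altered_genes"]) & set(trial["eligible_genes"])
    return patient["cancer_type"] in trial["eligible_cancer_types"] and bool(shared_genes)

# Hypothetical example records, mirroring the ERBB2/breast-cancer case above.
patient = {"cancer_type": "breast", "altered_genes": ["ERBB2", "TP53"]}
trial = {"eligible_cancer_types": ["breast", "lung"], "eligible_genes": ["ERBB2", "KRAS"]}
```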
APA, Harvard, Vancouver, ISO, and other styles
10

Graves, Jordan, Jacob Byerly, Eduardo Priego, Naren Makkapati, S. Vince Parish, Brenda Medellin, and Monica Berrondo. "A Review of Deep Learning Methods for Antibodies." Antibodies 9, no. 2 (April 28, 2020): 12. http://dx.doi.org/10.3390/antib9020012.

Full text
Abstract:
Driven by its successes across domains such as computer vision and natural language processing, deep learning has recently entered the field of biology by aiding in cellular image classification, finding genomic connections, and advancing drug discovery. In drug discovery and protein engineering, a major goal is to design a molecule that will perform a useful function as a therapeutic drug. Typically, the focus has been on small molecules, but new approaches have been developed to apply these same principles of deep learning to biologics, such as antibodies. Here we give a brief background of deep learning as it applies to antibody drug development, and an in-depth explanation of several deep learning algorithms that have been proposed to solve aspects of both protein design in general, and antibody design in particular.
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "GENOMIC LANGUAGE PROCESSING"

1

Dyremark, Johanna, and Caroline Mayer. "Bedömning av elevuppsatser genom maskininlärning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-262041.

Full text
Abstract:
Essay scoring takes up a large share of teachers' working time, and there is considerable inconsistency between grades awarded by different teachers. This report examines what accuracy an automated essay scoring system for Swedish can achieve. Three machine learning models for classification, Linear Discriminant Analysis, K-Nearest Neighbor, and Random Forest, are trained and tested with 5-fold cross-validation on essays from Swedish national tests. Essays are classified based on 31 language- and structure-related attributes, such as token-based length measures, similarity to texts of different formality levels, and grammar-related measures. The results show a maximum quadratic weighted kappa value of 0.4829 and agreement identical to the expert-given grade in 57.53% of all cases. These results were achieved by a model based on Linear Discriminant Analysis, which showed higher inter-rater reliability with expert grading than an ordinary teacher. Despite the ongoing digitalization of the Swedish educational system, a number of obstacles remain before fully machine-learning-based assessment can be realized, such as users' attitudes toward the technology, ethical dilemmas, and the technology's difficulty with understanding semantics. Nevertheless, partially integrated automatic grading has the potential to identify essays in need of double grading, which can increase the consistency of large-scale tests at low cost.
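The quadratic weighted kappa reported above penalizes disagreements by the squared distance between grades; a compact pure-Python sketch of the statistic (not the thesis code):

```python
from collections import Counter

def quadratic_weighted_kappa(rater_a, rater_b, n_classes):
    """Quadratic weighted kappa: 1 - sum(w*O) / sum(w*E), where O is the
    observed confusion matrix, E the chance-agreement matrix from the raters'
    marginals, and w[i][j] = ((i - j) / (n_classes - 1)) ** 2."""
    n = len(rater_a)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    hist_a, hist_b = Counter(rater_a), Counter(rater_b)
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = ((i - j) / (n_classes - 1)) ** 2
            expected = hist_a[i] * hist_b[j] / n
            num += w * observed[i][j]
            den += w * expected
    return 1.0 - num / den
```

Perfect agreement gives 1.0, chance-level agreement 0.0, and systematic disagreement goes negative, so 0.4829 indicates moderate agreement with the expert graders.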
APA, Harvard, Vancouver, ISO, and other styles
2

Akhurst, Timothy John. "The role of parallel computing in bioinformatics." Thesis, Rhodes University, 2005. http://eprints.ru.ac.za/162/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

CHAKRABORTY, RAJKUMAR. "GENOMIC LANGUAGE PROCESSING USING MACHINE LEARNING." Thesis, 2023. http://dspace.dtu.ac.in:8080/jspui/handle/repository/20063.

Full text
Abstract:
The purpose of developing biological language models (BLMs) is to enhance our capacity to comprehend and analyse biological sequences, such as DNA, RNA, and protein sequences. These sequences contain crucial information about the structure and function of living organisms and are involved in virtually every biological process. Nonetheless, analysing biological sequences can be difficult due to their complexity and enormous variety. Specifically, the functions and properties of a large number of coding and non-coding DNA and RNA sequences remain poorly understood. This thesis presents three objectives related to the application of natural language processing techniques in the field of bio-molecule sciences. The first objective involves using a combination of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network, stacked in a sequence-to-sequence (Seq2Seq) architecture, to predict microRNA sequences from mRNA sequences. MicroRNAs are small non-coding RNAs, generally about 22 nucleotides long, that play a role in various physiological and disease processes. Identifying the mRNAs targeted by microRNAs is a challenge, and researchers often rely on computational programs to initially identify target candidates for subsequent validation. In this work, a neural network was trained to predict microRNA from the bound target segment in mRNA using a dataset of experimentally validated and cleaned microRNA-mRNA sequence pairs from TarBase v8. Convolutional neural networks (CNNs) were used to recognize patterns in mRNA segments and extract features, while long short-term memory (LSTM) networks in a seq2seq architecture were used to predict microRNA sequences. The model achieved an accuracy of 80% and was validated using experimentally verified microRNA-mRNA pairs involved in skin diseases from an in-house database called miDerma, correctly predicting an average of 72% of the microRNAs from mRNA in each case.
The package, called "model: A MicroRNA sequence prediction tool from RNA sequence based on CNNs, LSTMs, and seq2seq architecture," allows users to input a gene symbol and retrieves the protein-coding transcript's sequence from the Ensembl REST API to predict a list of microRNAs that may bind to potential target segments in the mRNA. The second objective involves using natural language processing techniques, including an embedding layer, a CNN layer, and a bidirectional LSTM layer, to predict disordered regions in proteins. Intrinsically disordered regions (IDRs) are important for various physiological processes and diseases and play a complementary role to the functions of structured proteins. They can be identified through multiple experimental techniques, but these methods can be costly and time-consuming. As a result, researchers rely on computational strategies to predict probable IDRs/IDPs before conducting further validation through experimental studies. While there have been significant advancements in predicting long and short IDRs in recent years, there is still scope for algorithmic improvement. This study aims to improve the prediction of IDRs by using neural networks, specifically convolutional neural networks (CNNs), recurrent neural networks (RNNs), and long short-term memory (LSTM) networks, as well as natural language processing (NLP) techniques. The study also explores the use of different input sequence lengths and various embedding sizes for the CNN and LSTM models. The results show that the CNN and LSTM models outperform state-of-the-art techniques for predicting IDRs, with the LSTM model achieving the highest accuracy of 85.7%. The study also demonstrates the effectiveness of using NLP techniques for analyzing protein sequences and the importance of carefully selecting model architectures and hyperparameters to achieve good performance.
The third objective involves using an autoencoder, a type of deep learning architecture, to generate drug analogues by reconstructing chemical SMILES (Simplified Molecular-Input Line-Entry System) representations of molecules and varying the batch size and latent space dimensionality of the autoencoder. The design of drug analogues involves the creation of modified versions of existing drugs to improve their efficacy, stability, and safety. Deep learning techniques, such as autoencoders, can be used to generate new drug analogues through a process of chemical structure reproduction. In this study, an autoencoder was trained on chemical SMILES data from the ChEMBL database and used to generate 157 variants of the drug Vandetanib by adding noise to its latent representation and reconstructing the resulting compounds using a decoder. Molecular docking and dynamics simulations were then performed to determine which of these analogues had a higher binding affinity than Vandetanib. At least two of the analogues had a higher binding affinity than the control compound. While this model has the potential to generate a wide range of molecules, it may have difficulty generating molecules with SMILES strings longer than 80 characters due to a lack of training data with SMILES strings above that length. The synthesis and laboratory testing of the generated molecules to determine their potential as drugs also present a challenge. However, this study has the potential to make significant contributions to the field of automatic drug analogue prediction and could be a valuable addition to the current scientific literature. The study presents several potential applications for its microRNA, protein disorder region finding, and drug analogue generation models. The microRNA prediction model could aid in the development of therapies for diseases by identifying microRNA sequences that regulate gene expression.
The protein disorder prediction model could be used in drug design and protein engineering by identifying disordered regions in proteins that play a role in various protein functions. The drug analogue generation model has the potential to generate new drug analogues with desired properties and could be used in drug discovery and the optimization of existing drugs. Overall, this research has the potential to make significant contributions to biomedical research and could lead to the development of new therapies and drugs for diseases, as well as new bio-molecular language models for other tasks.
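Sequence models like the CNN-LSTM architectures described in the first two objectives operate on numerically encoded sequences; one-hot encoding is the usual minimal representation (an illustrative sketch, not the thesis implementation):

```python
def one_hot(seq, alphabet="ACGU"):
    """Encode an RNA string as a list of one-hot rows, one per base,
    suitable as input to a CNN or LSTM layer (use "ACGT" for DNA)."""
    index = {base: i for i, base in enumerate(alphabet)}
    return [[1.0 if j == index[base] else 0.0 for j in range(len(alphabet))]
            for base in seq.upper()]
```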
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "GENOMIC LANGUAGE PROCESSING"

1

Dwyer, Rex A. Genomic Perl: From bioinformatics basics to working code. Cambridge: Cambridge University Press, 2003.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Ian, Korf, ed. UNIX and Perl to the rescue!: A field guide for the life sciences (and other data-rich pursuits). New York: Cambridge University Press, 2012.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Genomic Perl. Cambridge University Press, 2003.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Dwyer, Rex A. Genomic Perl: From Bioinformatics Basics to Working Code. Cambridge University Press, 2002.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Dwyer, Rex A. Genomic Perl: From Bioinformatics Basics to Working Code. Cambridge University Press, 2012.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Akalin, Altuna. Computational Genomics with R. Taylor & Francis Group, 2020.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "GENOMIC LANGUAGE PROCESSING"

1

Botsis, Taxiarchis, Joseph Murray, Alessandro Leal, Doreen Palsgrove, Wei Wang, James R. White, Victor E. Velculescu, and Valsamo Anagnostou. "Natural Language Processing Approaches for Retrieval of Clinically Relevant Genomic Information in Cancer." In Studies in Health Technology and Informatics. IOS Press, 2022. http://dx.doi.org/10.3233/shti220735.

Full text
Abstract:
The accelerating impact of genomic data in clinical decision-making has generated a paradigm shift from treatment based on the anatomic origin of the tumor to the incorporation of key genomic features to guide therapy. Assessing the clinical validity and utility of the genomic background of a patient’s cancer represents one of the emerging challenges in oncology practice, demanding the development of automated platforms for extracting clinically relevant genomic information from medical texts. We developed PubMiner, a natural language processing tool to extract and interpret cancer type, therapy, and genomic information from biomedical abstracts. Our initial focus has been the retrieval of gene names, variants, and negations, where PubMiner performed highly in terms of total recall (91.7%) with a precision of 79.7%. Our next steps include developing a web-based interface to promote personalized treatment based on each tumor’s unique genomic fingerprints.
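Retrieval of gene names, variants, and negations of the kind PubMiner performs can be caricatured with a regular-expression baseline (a deliberately crude sketch, not PubMiner's method; the pattern and stop-list below are invented for illustration):

```python
import re

GENE_TOKEN = re.compile(r"\b[A-Z][A-Z0-9]{2,7}\b")  # crude symbol shape, e.g. EGFR, KRAS, G12C
NEGATION = re.compile(r"\b(no|not|negative for|absence of)\b", re.IGNORECASE)
STOP = {"DNA", "RNA", "PCR"}  # common all-caps tokens that are not gene symbols

def extract(sentence):
    """Pull gene/variant-like tokens from a sentence and flag negation."""
    genes = [tok for tok in GENE_TOKEN.findall(sentence) if tok not in STOP]
    return {"genes": genes, "negated": bool(NEGATION.search(sentence))}
```

Real systems reach the recall and precision figures above with far richer lexicons and context models; this baseline only makes the task concrete.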
APA, Harvard, Vancouver, ISO, and other styles
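The retrieval task the PubMiner abstract describes, finding gene names, variants, and negations in text, can be sketched with simple surface patterns. The regexes and negation cues below are illustrative assumptions for this sketch, not PubMiner's actual rules:

```python
import re

# Illustrative patterns (assumed, not PubMiner's): HGNC-style gene symbols
# and HGVS-like protein variants such as "L858R" or "V600E".
GENE_RE = re.compile(r"\b[A-Z][A-Z0-9]{1,5}\b")
VARIANT_RE = re.compile(r"\b[A-Z]\d{1,4}[A-Z]\b")
NEGATION_CUES = ("no ", "not ", "negative for ", "absence of ")

def extract_mentions(sentence):
    """Return (genes, variants, negated) found in one abstract sentence."""
    variants = set(VARIANT_RE.findall(sentence))
    # Variant tokens also match the gene pattern, so subtract them out.
    genes = set(GENE_RE.findall(sentence)) - variants
    negated = any(cue in sentence.lower() for cue in NEGATION_CUES)
    return genes, variants, negated

genes, variants, negated = extract_mentions(
    "Tumors negative for EGFR L858R were enrolled.")
```

A production system would add lexicon lookup and disambiguation on top of such patterns; the point here is only the shape of the extraction step.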
2

Yee, David P., and Tim Hunkapiller. "Overview: A System for Tracking and Managing the Results from Sequence Comparison Programs." In Pattern Discovery in Biomolecular Data. Oxford University Press, 1999. http://dx.doi.org/10.1093/oso/9780195119404.003.0017.

Full text
Abstract:
The Human Genome Project was launched in the early 1990s to map, sequence, and study the function of genomes derived from humans and a number of model organisms such as mouse, rat, fruit fly, worm, yeast, and Escherichia coli. This ambitious project was made possible by advances in high-speed DNA sequencing technology (Hunkapiller et al., 1991). To date, the Human Genome Project and other large-scale sequencing projects have been enormously successful. The complete genomes of several microbes (such as Hemophilus influenzae Rd, Mycoplasma genitalium, and Methanococcus jannaschii) have been completely sequenced. The genome of bacteriophage T4 is complete, and the 4.6-megabase sequence of E. coli and the 13-megabase genome of Saccharomyces cerevisiae have just recently also been completed. There are 71 megabases of the nematode Caenorhabditis elegans available. Six megabases of mouse and 60 megabases of human genomic sequence have been finished, which represent 0.2% and 2% of their respective genomes. Finally, more than 1 million expressed sequence tags derived from human and mouse complementary DNA expression libraries are publicly available. These public data, in addition to private and proprietary DNA sequence databases, represent an enormous information-processing challenge and data-mining opportunity. The need for common interfaces and query languages to access heterogeneous sequence databases is well documented, and several good systems are well underway to provide those interfaces (Woodsmall and Benson, 1993; Marr, 1996). Our own work on database and program interoperability in this domain and in computational chemistry (Cushing, 1995) has shown, however, that providing the interface is but the first step toward making these databases fully useful to the researcher. (Here, the term “database” means a collection of data in electronic form, which may not necessarily be physically deposited in a database management system [DBMS].
A scientist’s database could thus be a collection of flat files; we use “database” to mean “data stored in a DBMS” only where that is clear from the context.) Deciphering the genomes of sequenced organisms falls into the realm of analysis; there is now plenty of sequence data. The most common form of sequence analysis involves the identification of homologous relationships among similar sequences.
APA, Harvard, Vancouver, ISO, and other styles
3

Lussier, Yves A. "Ontologies for natural language processing." In Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics. Chichester: John Wiley & Sons, Ltd, 2005. http://dx.doi.org/10.1002/047001153x.g408212.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Dar, Gowhar Mohiuddin, Ashok Sharma, and Parveen Singh. "Deep Learning Models for Detection and Diagnosis of Alzheimer's Disease." In Advances in Medical Technologies and Clinical Practice, 140–49. IGI Global, 2021. http://dx.doi.org/10.4018/978-1-7998-7188-0.ch011.

Full text
Abstract:
The chapter explores the implications of deep learning in medical sciences, focusing on deep learning concerning natural language processing, computer vision, reinforcement learning, big data, and blockchain influence on some areas of medicine and construction of end-to-end systems with the help of these computational techniques. The deliberation of computer vision in the study is mainly concerned with medical imaging and further usage of natural language processing to spheres such as electronic wellbeing record data. Application of deep learning in genetic mapping and DNA sequencing termed as genomics and implications of reinforcement learning about surgeries assisted by robots are also overviewed.
APA, Harvard, Vancouver, ISO, and other styles
5

Nagarajan, Srikantan S., Kamalini G. Ranasinghe, and Keith A. Vossel. "Brain Imaging With Magnetoencephalography During Rest and During Speech and Language Processing." In Genomics, Circuits, and Pathways in Clinical Neuropsychiatry, 233–45. Elsevier, 2016. http://dx.doi.org/10.1016/b978-0-12-800105-9.00015-9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Raychaudhuri, Soumya. "Textual Profiles of Genes." In Computational Text Analysis. Oxford University Press, 2006. http://dx.doi.org/10.1093/oso/9780198567400.003.0010.

Full text
Abstract:
Using algorithms to analyze natural language text is a challenging task. Recent advances in algorithms, and increased availability of computational power and online text has resulted in incremental progress in text analysis (Rosenfeld 2000). For certain specific applications natural language processing algorithms can rival human performance. Even the simplest algorithms and approaches can glean information from the text and do it at a rate much faster than humans. In the case of functional genomics, where an individual assay might include thousands of genes, and tens of thousands of documents pertinent to those genes, the speed of text mining approaches offers a great advantage to investigators trying to understand the data. In this chapter, we will focus on techniques to convert text into simple numerical vectors to facilitate computation. Then we will go on to discuss how these vectors can be combined into textual profiles for genes; these profiles offer additional biologically meaningful information that can complement available genomics data sets. The previous chapter introduced methods to analyze gene expression data and sequence data. The focus of many analytical methods was comparing and grouping genes by similarity. Some sequence analysis methods like dynamic programming and BLAST offer opportunities to compare two sequences, while multiple sequence alignment and weight matrices provide a means to compare families of sequences. Similarly, gene expression array analysis approaches are mostly contingent on distance metrics that compare gene expression profiles to each other; clustering and classification algorithms provide a means to group similar genes. The primary goal of applying these methods was to transfer knowledge between similar genes. We can think of the scientific literature as yet another data type and define document similarity metrics. 
Algorithms that tap the knowledge locked in the scientific literature require sophisticated natural language processing approaches. On the other hand, assessing document similarity is a comparatively easier task. A measure of document similarity that corresponds to semantic similarity between documents can also be powerful. For example, we might conclude that two genes are related if documents that refer to them are semantically similar.
APA, Harvard, Vancouver, ISO, and other styles
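The chapter's central move, converting documents into numerical vectors so that genes can be compared through their literature, can be sketched with plain TF-IDF weighting and cosine similarity. The toy token lists below are invented for illustration:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each document (a list of tokens) to a term -> TF-IDF weight dict."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [["kinase", "phosphorylation", "egfr"],
        ["egfr", "kinase", "inhibitor"],
        ["ribosome", "translation", "initiation"]]
v = tfidf_vectors(docs)
```

Documents sharing weighted terms (the first two) score higher than unrelated ones, which is exactly the document-similarity signal the chapter builds gene profiles from.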
7

Raychaudhuri, Soumya. "Finding Gene Names." In Computational Text Analysis. Oxford University Press, 2006. http://dx.doi.org/10.1093/oso/9780198567400.003.0016.

Full text
Abstract:
Successful use of text mining algorithms to facilitate genomics research hinges on the ability to recognize the names of genes in scientific text. In this chapter we address the critical issue of gene name recognition. Once gene names can be recognized in the scientific text, we can begin to understand what the text says about those genes. This is a much more challenging issue than one might appreciate at first glance. Gene names can be inconsistent and confusing; automated gene name recognition efforts have therefore turned out to be quite challenging to implement with high accuracy. Gene name recognition algorithms have a wide range of useful applications. Until this chapter we have been avoiding this issue and have been using only gene-article indices. In practice these indices are manually assembled. Gene name recognition algorithms offer the possibility of automating and expediting the laborious task of building reference indices. Article indices can be built that associate articles to genes based on whether or not the article mentions the gene by name. In addition, gene name recognition is the first step in doing more detailed sentence-by-sentence text analysis. For example, in Chapter 10 we will talk about identifying relationships between genes from text. Frequently, this requires identifying sentences referring to two gene names, and understanding what sort of relationship the sentence is describing between these genes. Sophisticated natural language processing techniques to parse sentences and understand gene function cannot be done in a meaningful way without recognizing where the gene names are in the first place. The major concepts of this chapter are presented in the frame box. We begin by describing the commonly used strategies that can be used alone or in concert to identify gene names. At the end of the chapter we introduce one successful name finding algorithm that combines many of the different strategies.
There are several commonly used approaches that can be exploited to recognize gene names in text (Chang, Schütze, et al. 2004). Oftentimes these approaches can be combined into even more effective multifaceted algorithms.
APA, Harvard, Vancouver, ISO, and other styles
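Two of the commonly used strategies this chapter surveys, lexicon lookup and contextual cues, can be combined in a minimal sketch. The mini-lexicon and cue pattern below are hypothetical stand-ins for real resources such as an HGNC symbol list:

```python
import re

# Hypothetical mini-lexicon mapping surface forms to official symbols;
# a real recognizer would load thousands of entries from HGNC or Entrez Gene.
LEXICON = {"TP53": "TP53", "p53": "TP53", "BRCA1": "BRCA1"}

# Contextual cue: a short alphanumeric token directly followed by
# an appositive word like "gene" or "protein".
CONTEXT_RE = re.compile(r"\b([A-Za-z][A-Za-z0-9-]{1,9})\s+(?:gene|protein)\b")

def find_gene_names(sentence):
    """Combine lexicon lookup with context cues; return surface -> symbol."""
    hits = {tok: LEXICON[tok]
            for tok in re.findall(r"[\w-]+", sentence) if tok in LEXICON}
    for m in CONTEXT_RE.finditer(sentence):
        # Context-only hits keep their surface form as the symbol guess.
        hits.setdefault(m.group(1), m.group(1))
    return hits

hits = find_gene_names("The p53 protein is encoded by the TP53 gene.")
```

Each strategy alone is brittle (lexicons miss new synonyms, cues over-trigger); combining them, as the chapter's closing algorithm does, is what lifts accuracy.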

Conference papers on the topic "GENOMIC LANGUAGE PROCESSING"

1

Cao, Jiarun, Niels Peek, Andrew Renehan, and Sophia Ananiadou. "Gaussian Distributed Prototypical Network for Few-shot Genomic Variant Detection." In The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks. Stroudsburg, PA, USA: Association for Computational Linguistics, 2023. http://dx.doi.org/10.18653/v1/2023.bionlp-1.2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Dordiuk, Vladislav, Ekaterina Demicheva, Fernando Polanco Espino, and Konstantin Ushenin. "Natural language processing for clusterization of genes according to their functions." In 2022 Ural-Siberian Conference on Computational Technologies in Cognitive Science, Genomics and Biomedicine (CSGB). IEEE, 2022. http://dx.doi.org/10.1109/csgb56354.2022.9865330.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Cahyawijaya, Samuel, Tiezheng Yu, Zihan Liu, Xiaopu Zhou, Tze Wing Tiffany Mak, Yuk Yu Nancy Ip, and Pascale Fung. "SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study." In Proceedings of the 21st Workshop on Biomedical Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.bionlp-1.14.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Mansouri Ghiasi, Nika, Jisung Park, Harun Mustafa, Jeremie Kim, Ataberk Olgun, Arvid Gollwitzer, Damla Senol Cali, et al. "GenStore: a high-performance in-storage processing system for genome sequence analysis." In ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. New York, NY, USA: ACM, 2022. http://dx.doi.org/10.1145/3503222.3507702.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Grechkin, Maxim, Hoifung Poon, and Bill Howe. "EZLearn: Exploiting Organic Supervision in Automated Data Annotation." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/568.

Full text
Abstract:
Many real-world applications require automated data annotation, such as identifying tissue origins based on gene expressions and classifying images into semantic categories. Annotation classes are often numerous and subject to changes over time, and annotating examples has become the major bottleneck for supervised learning methods. In science and other high-value domains, large repositories of data samples are often available, together with two sources of organic supervision: a lexicon for the annotation classes, and text descriptions that accompany some data samples. Distant supervision has emerged as a promising paradigm for exploiting such indirect supervision by automatically annotating examples where the text description contains a class mention in the lexicon. However, due to linguistic variations and ambiguities, such training data is inherently noisy, which limits the accuracy in this approach. In this paper, we introduce an auxiliary natural language processing system for the text modality, and incorporate co-training to reduce noise and augment signal in distant supervision. Without using any manually labeled data, our EZLearn system learned to accurately annotate data samples in functional genomics and scientific figure comprehension, substantially outperforming state-of-the-art supervised methods trained on tens of thousands of annotated examples.
APA, Harvard, Vancouver, ISO, and other styles
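EZLearn's distant-supervision step, labeling a data sample with any annotation class whose lexicon entry appears in the sample's free-text description, can be sketched as below. The toy tissue lexicon is assumed for illustration, and the co-training loop that cleans up the resulting noisy labels is omitted:

```python
# Hypothetical lexicon mapping annotation classes to synonym sets;
# EZLearn's actual ontology and co-training refinement are not reproduced.
LEXICON = {
    "liver": {"liver", "hepatic"},
    "brain": {"brain", "cortex", "cerebral"},
}

def distant_labels(description):
    """Assign every class with a lexicon synonym mentioned in the text."""
    tokens = set(description.lower().split())
    return {cls for cls, names in LEXICON.items() if tokens & names}

labels = [(d, distant_labels(d)) for d in
          ["RNA-seq of hepatic tissue", "cerebral cortex sample"]]
```

Linguistic variation and ambiguity make such labels noisy, which is precisely why the paper adds an auxiliary text classifier and co-training on top of this matching step.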