Journal articles on the topic 'Events in natural language processing'

1

KARTTUNEN, LAURI, KIMMO KOSKENNIEMI, and GERTJAN VAN NOORD. "Finite state methods in natural language processing." Natural Language Engineering 9, no. 1 (March 2003): 1–3. http://dx.doi.org/10.1017/s1351324903003139.

Abstract:
Finite state methods have been in common use in various areas of natural language processing (NLP) for many years. A series of specialized workshops in this area illustrates this. In 1996, András Kornai organized a very successful workshop entitled Extended Finite State Models of Language. One of the results of that workshop was a special issue of Natural Language Engineering (Volume 2, Number 4). In 1998, Kemal Oflazer organized a workshop called Finite State Methods in Natural Language Processing. A selection of submissions for this workshop were later included in a special issue of Computational Linguistics (Volume 26, Number 1). Inspired by these events, Lauri Karttunen, Kimmo Koskenniemi and Gertjan van Noord took the initiative for a workshop on finite state methods in NLP in Helsinki, as part of the European Summer School in Language, Logic and Information. As a related special event, the 20th anniversary of two-level morphology was celebrated. The appreciation of these events led us to believe that once again it should be possible, with some additional submissions, to compose an interesting special issue of this journal.
2

Li, Yong, Xiaojun Yang, Min Zuo, Qingyu Jin, Haisheng Li, and Qian Cao. "Deep Structured Learning for Natural Language Processing." ACM Transactions on Asian and Low-Resource Language Information Processing 20, no. 3 (July 9, 2021): 1–14. http://dx.doi.org/10.1145/3433538.

Abstract:
The real-time and dissemination characteristics of network information make net-mediated public opinion become more and more important food safety early warning resources, but the data of petabyte (PB) scale growth also bring great difficulties to the research and judgment of network public opinion, especially how to extract the event role of network public opinion from these data and analyze the sentiment tendency of public opinion comment. First, this article takes the public opinion of food safety network as the research point, and a BLSTM-CRF model for automatically marking the role of event is proposed by combining BLSTM and conditional random field organically. Second, the Attention mechanism based on vocabulary in the field of food safety is introduced, the distance-related sequence semantic features are extracted by BLSTM, and the emotional classification of sequence semantic features is realized by using CNN. A kind of Att-BLSTM-CNN model for the analysis of public opinion and emotional tendency in the field of food safety is proposed. Finally, based on the time series, this article combines the role extraction of food safety events and the analysis of emotional tendency and constructs a net-mediated public opinion early warning model in the field of food safety according to the heat of the event and the emotional intensity of the public to food safety public opinion events.
3

Ozonoff, Al, Carly E. Milliren, Kerri Fournier, Jennifer Welcher, Assaf Landschaft, Mihail Samnaliev, Mehmet Saluvan, Mark Waltzman, and Amir A. Kimia. "Electronic surveillance of patient safety events using natural language processing." Health Informatics Journal 28, no. 4 (October 2022): 146045822211324. http://dx.doi.org/10.1177/14604582221132429.

Abstract:
Objective We describe our approach to surveillance of reportable safety events captured in hospital data including free-text clinical notes. We hypothesize that a) some patient safety events are documented only in the clinical notes and not in any other accessible source; and b) large-scale abstraction of event data from clinical notes is feasible. Materials and Methods We use regular expressions to generate a training data set for a machine learning model and apply this model to the full set of clinical notes and conduct further review to identify safety events of interest. We demonstrate this approach on peripheral intravenous (PIV) infiltrations and extravasations (PIVIEs). Results During Phase 1, we collected 21,362 clinical notes, of which 2342 were reviewed. We identified 125 PIV events, of which 44 cases (35%) were not captured by other patient safety systems. During Phase 2, we collected 60,735 clinical notes and identified 440 infiltrate events. Our classifier demonstrated accuracy above 90%. Conclusion Our method to identify safety events from the free text of clinical documentation offers a feasible and scalable approach to enhance existing patient safety systems. Expert reviewers, using a machine learning model, can conduct routine surveillance of patient safety events.
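The regex-based step in this abstract, using regular expressions to generate a training set for a classifier, can be sketched roughly as follows. This is a minimal illustration only: the pattern, the function names, and the example notes are assumptions made here and are not taken from the paper, which does not list its expressions in the abstract.

```python
import re

# Hypothetical pattern for illustration; not the authors' actual expressions.
PIVIE_PATTERN = re.compile(r"\b(infiltrat\w*|extravasat\w*)\b", re.IGNORECASE)


def weak_label(note: str) -> int:
    """Return 1 if the note mentions a candidate PIV event, else 0."""
    return 1 if PIVIE_PATTERN.search(note) else 0


def build_training_set(notes):
    """Pair each note with its regex-derived (noisy) label."""
    return [(note, weak_label(note)) for note in notes]


# Invented example notes, used only to exercise the sketch.
notes = [
    "PIV site swollen, infiltration noted at left forearm.",
    "Patient resting comfortably, IV patent.",
    "Possible extravasation of contrast; site elevated.",
]
training_set = build_training_set(notes)
labels = [label for _, label in training_set]
```

Labels produced this way are noisy, which is presumably why the abstract describes further manual review after the model is applied.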
4

Guda, Vanitha, and SureshKumar Sanampudi. "Event Time Relationship in Natural Language Text." International Journal of Recent Contributions from Engineering, Science & IT (iJES) 7, no. 3 (September 25, 2019): 4. http://dx.doi.org/10.3991/ijes.v7i3.10985.

Abstract:
Due to the numerous information needs, retrieval of events from a given natural language text is inevitable. From a natural language processing (NLP) perspective, "events" are situations, occurrences, real-world entities or facts. Extracting events and arranging them on a timeline is helpful in various NLP applications such as building summaries of news articles, processing health records, and Question Answering (QA) systems. This paper presents a framework for identifying the events and times in a given document and representing them using a graph data structure. As a result, a graph is derived to show event-time relationships in the given text. Events form the nodes in a graph, and edges represent the temporal relations among the nodes. The time of an event occurrence exists in two forms, namely qualitative (like before, after, during, etc.) and quantitative (exact time points/periods). To build the event-time-event structure, quantitative time is normalized to qualitative form. The temporal information thus obtained is used to label the edges among the events. The data set released in the shared task EvTExtract of the Forum for Information Retrieval Evaluation (FIRE) 2018 conference is used to evaluate the framework. Precision and recall are used as evaluation metrics to assess the performance of the proposed framework against other state-of-the-art methods; the framework achieves 85% accuracy and 90% precision.
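The event-time graph described in this abstract (events as nodes, temporal relations as edge labels, quantitative times normalized to qualitative relations) can be illustrated with a small sketch. The function names, the relation label "simultaneous", and the example events and dates are assumptions introduced here, not the authors' implementation.

```python
from datetime import date


def temporal_relation(t1: date, t2: date) -> str:
    """Normalize two exact dates to a qualitative relation."""
    if t1 < t2:
        return "before"
    if t1 > t2:
        return "after"
    return "simultaneous"


def build_event_graph(events):
    """Build labeled edges between every pair of events.

    `events` maps an event name (node) to its date; the result maps
    each (event_a, event_b) pair to the relation of a relative to b.
    """
    names = sorted(events)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            edges[(a, b)] = temporal_relation(events[a], events[b])
    return edges


# Hypothetical events with invented dates, for illustration only.
events = {
    "earthquake": date(2018, 9, 28),
    "evacuation": date(2018, 9, 29),
    "tsunami": date(2018, 9, 28),
}
graph = build_event_graph(events)
```

A dictionary keyed by node pairs keeps the sketch dependency-free; a graph library could hold the same nodes and labeled edges.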
5

Balgi, Sanjana Madhav. "Fake News Detection using Natural Language Processing." International Journal for Research in Applied Science and Engineering Technology 10, no. 6 (June 30, 2022): 4790–95. http://dx.doi.org/10.22214/ijraset.2022.45095.

Abstract:
Fake news is information that is false or misleading but is reported as news. The tendency for people to spread false information is influenced by human behaviour; research indicates that people are drawn to unexpected fresh events and information, which increases brain activity. Additionally, it was found that motivated reasoning helps spread incorrect information. This ultimately encourages individuals to repost or disseminate deceptive content, which is frequently identified by click-bait and attention-grabbing names. The proposed study uses machine learning and natural language processing approaches to identify false news, specifically false news items that come from unreliable sources. The dataset used here is the ISOT dataset, which contains real and fake news collected from various sources. Web scraping is used to extract text from news websites to collect current news, which is added to the dataset. Data pre-processing and feature extraction are applied to the data. These are followed by dimensionality reduction and classification using models such as Rocchio classification, Bagging classifier, Gradient Boosting classifier and Passive Aggressive classifier. To choose the best-performing model with an accurate prediction for fake news, we compared a number of algorithms.
6

Hkiri, Emna, Souheyl Mallat, and Mounir Zrigui. "Events Automatic Extraction from Arabic Texts." International Journal of Information Retrieval Research 6, no. 1 (January 2016): 36–51. http://dx.doi.org/10.4018/ijirr.2016010103.

Abstract:
The event extraction task consists in determining and classifying events within an open-domain text. It is very new for the Arabic language, whereas it has reached maturity for some languages such as English and French. Event extraction has also been shown to help Natural Language Processing tasks such as Information Retrieval, Question Answering, text mining, and machine translation achieve higher performance. In this article, we present an ongoing effort to build a system for event extraction from Arabic texts using the GATE platform and other tools.
7

Melton, Genevieve B., and George Hripcsak. "Automated Detection of Adverse Events Using Natural Language Processing of Discharge Summaries." Journal of the American Medical Informatics Association 12, no. 4 (July 2005): 448–57. http://dx.doi.org/10.1197/jamia.m1794.

8

YLI-JYRÄ, ANSSI, ANDRÁS KORNAI, and JACQUES SAKAROVITCH. "Finite-state methods and models in natural language processing." Natural Language Engineering 17, no. 2 (March 21, 2011): 141–44. http://dx.doi.org/10.1017/s1351324911000015.

Abstract:
For the past two decades, specialised events on finite-state methods have been successful in presenting interesting studies on natural language processing to the public through journals and collections. The FSMNLP workshops have become well-known among researchers and are now the main forum of the Association for Computational Linguistics' (ACL) Special Interest Group on Finite-State Methods (SIGFSM). The current issue on finite-state methods and models in natural language processing was planned in 2008 in this context as a response to a call for special issue proposals. In 2010, the issue received a total of sixteen submissions, some of which were extended and updated versions of workshop papers, and others which were completely new. The final selection, consisting of only seven papers that could fit into one issue, is not fully representative, but complements the prior special issues in a nice way. The selected papers showcase a few areas where finite-state methods have less than obvious and sometimes even groundbreaking relevance to natural language processing (NLP) applications.
9

Abbood, Auss, Alexander Ullrich, Rüdiger Busche, and Stéphane Ghozzi. "EventEpi—A natural language processing framework for event-based surveillance." PLOS Computational Biology 16, no. 11 (November 20, 2020): e1008277. http://dx.doi.org/10.1371/journal.pcbi.1008277.

Abstract:
According to the World Health Organization (WHO), around 60% of all outbreaks are detected using informal sources. In many public health institutes, including the WHO and the Robert Koch Institute (RKI), dedicated groups of public health agents sift through numerous articles and newsletters to detect relevant events. This media screening is one important part of event-based surveillance (EBS). Reading the articles, discussing their relevance, and putting key information into a database is a time-consuming process. To support EBS, but also to gain insights into what makes an article and the event it describes relevant, we developed a natural language processing framework for automated information extraction and relevance scoring. First, we scraped relevant sources for EBS as done at the RKI (WHO Disease Outbreak News and ProMED) and automatically extracted the articles’ key data: disease, country, date, and confirmed-case count. For this, we performed named entity recognition in two steps: EpiTator, an open-source epidemiological annotation tool, suggested many different possibilities for each. We extracted the key country and disease using a heuristic with good results. We trained a naive Bayes classifier to find the key date and confirmed-case count, using the RKI’s EBS database as labels which performed modestly. Then, for relevance scoring, we defined two classes to which any article might belong: The article is relevant if it is in the EBS database and irrelevant otherwise. We compared the performance of different classifiers, using bag-of-words, document and word embeddings. The best classifier, a logistic regression, achieved a sensitivity of 0.82 and an index balanced accuracy of 0.61. Finally, we integrated these functionalities into a web application called EventEpi where relevant sources are automatically analyzed and put into a database. The user can also provide any URL or text, that will be analyzed in the same way and added to the database. 
Each of these steps could be improved, in particular with larger labeled datasets and fine-tuning of the learning algorithms. The overall framework, however, works already well and can be used in production, promising improvements in EBS. The source code and data are publicly available under open licenses.
10

Kosiv, Yurii A., and Vitaliy S. Yakovyna. "Three language political leaning text classification using natural language processing methods." Applied Aspects of Information Technology 5, no. 4 (December 28, 2022): 359–70. http://dx.doi.org/10.15276/aait.05.2022.24.

Abstract:
In this article, the problem of political leaning classification of the text resource is solved. First, a detailed analysis of ten studies on the work’s topic was performed in the form of comparative characteristics of the used methodologies. Literary sources were compared according to the problem-solving methods, the learning that was carried out, the evaluation metrics, and according to the vectorizations. Thus, it was determined that machine learning algorithms and neural networks, as well as vectorization methods TF-IDF and Word2Vec, were most often used to solve the problem. Next, various classification models of whether textual information is pro-Ukrainian or pro-Russian were built based on a dataset containing messages from social media users about the events of the large-scale Russian invasion of Ukraine from February 24, 2022. The problem was solved with the help of Support Vector Machines, Decision Tree, Random Forest, Naïve Bayes classifier, eXtreme Gradient Boosting and Logistic Regression machine learning algorithms, Convolutional Neural Networks, Long short-term memory and BERT neural networks, techniques for working with unbalanced data Random Oversampling, Random Undersampling, SMOTE and SMOTETomek, as well as stacking ensembles of models. Among the machine learning algorithms, LR performed best, showing a macro F1-score value of 0.7966 when features were transformed by TF-IDF vectorization and 0.7933 when BoW. Among neural networks, the best macro F1-score value of 0.76 was obtained using CNN and LSTM. Applying data balancing techniques failed to improve the results of machine learning algorithms. Next, ensembles of models from machine learning algorithms were determined. Two of the constructed ensembles achieved the same macro F1-score value of 0.7966 as with LR.
Ensembles that were able to do so consisted of the TF-IDF vectorization, the B-NBC meta-model, and the SVC, NuSVC LR, and SVC, LR base models, respectively. Thus, three classifiers, the LR machine learning algorithm and two ensembles of models, which were defined as a combination of existing methods of solving the problem, demonstrated the largest macro F1-score value of 0.7966. The obtained models can be used for a detailed review of various news publications according to the political leaning characteristic, information about which can help people identify being isolated by a filter bubble.
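For readers unfamiliar with the macro F1-score reported in this abstract: it is the unweighted mean of the per-class F1 scores, so each class counts equally regardless of its size. The sketch below computes it from scratch; the label names and the toy predictions are invented for illustration and have no connection to the paper's data.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all observed classes."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy labels, invented for illustration.
y_true = ["pro_ua", "pro_ua", "pro_ua", "pro_ru"]
y_pred = ["pro_ua", "pro_ua", "pro_ru", "pro_ru"]
score = macro_f1(y_true, y_pred)
```

In this toy case the per-class F1 values are 0.8 and 2/3, so the macro average is about 0.733.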
11

Lakkad, Aditya Kamleshbhai, Rushit Dharmendrabhai Bhadaniya, Vraj Nareshkumar Shah, and Lavanya K. "Complex Events Processing on Live News Events Using Apache Kafka and Clustering Techniques." International Journal of Intelligent Information Technologies 17, no. 1 (January 2021): 39–52. http://dx.doi.org/10.4018/ijiit.2021010103.

Abstract:
The explosive growth of news and news content generated worldwide, coupled with the expansion through online media and rapid access to data, has made the screening of news tedious. There is an expanding need for a model that can reprocess, break down, and order main content to extract interpretable information, explicitly recognizing subjects and content-driven groupings of articles. This paper proposes the automated analysis of heterogeneous news through complex event processing (CEP) and machine learning (ML) algorithms. Initially, news content is streamed using Apache Kafka, stored in Apache Druid, and further processed by a blend of natural language processing (NLP) and unsupervised machine learning (ML) techniques.
12

Kehl, Kenneth L., Wenxin Xu, Eva Lepisto, Haitham Elmarakeby, Michael J. Hassett, Eliezer M. Van Allen, Bruce E. Johnson, and Deborah Schrag. "Natural Language Processing to Ascertain Cancer Outcomes From Medical Oncologist Notes." JCO Clinical Cancer Informatics, no. 4 (September 2020): 680–90. http://dx.doi.org/10.1200/cci.20.00020.

Abstract:
PURPOSE Cancer research using electronic health records and genomic data sets requires clinical outcomes data, which may be recorded only in unstructured text by treating oncologists. Natural language processing (NLP) could substantially accelerate extraction of this information. METHODS Patients with lung cancer who had tumor sequencing as part of a single-institution precision oncology study from 2013 to 2018 were identified. Medical oncologists’ progress notes for these patients were reviewed. For each note, curators recorded whether the assessment/plan indicated any cancer, progression/worsening of disease, and/or response to therapy or improving disease. Next, a recurrent neural network was trained using unlabeled notes to extract the assessment/plan from each note. Finally, convolutional neural networks were trained on labeled assessments/plans to predict the probability that each curated outcome was present. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC) among a held-out test set of 10% of patients. Associations between curated response or progression end points and overall survival were measured using Cox models among patients receiving palliative-intent systemic therapy. RESULTS Medical oncologist notes (n = 7,597) were manually curated for 919 patients. In the 10% test set, NLP models replicated human curation with AUROCs of 0.94 for the any-cancer outcome, 0.86 for the progression outcome, and 0.90 for the response outcome. Progression/worsening events identified using NLP models were associated with shortened survival (hazard ratio [HR] for mortality, 2.49; 95% CI, 2.00 to 3.09); response/improvement events were associated with improved survival (HR, 0.45; 95% CI, 0.30 to 0.67). CONCLUSION NLP models based on neural networks can extract meaningful outcomes from oncologist notes at scale. 
Such models may facilitate identification of clinical and genomic features associated with response to cancer treatment.
13

BOURBAKIS, NIKOLAOS, and MICHAEL MILLS. "CONVERTING NATURAL LANGUAGE TEXT SENTENCES INTO SPN REPRESENTATIONS FOR ASSOCIATING EVENTS." International Journal of Semantic Computing 06, no. 03 (September 2012): 353–70. http://dx.doi.org/10.1142/s1793351x12500067.

Abstract:
A better understanding of events many times requires the association and the efficient representation of multi-modal information. A good approach to this important issue is the development of a common platform for converting different modalities (such as images, text, etc.) into the same medium and associating them for efficient processing and understanding. In a previous paper we have presented a Local-Global graph model for the conversion of images into graphs with attributes and then into natural language (NL) text sentences [25]. Here, in this paper we propose the conversion of NL text sentences into graphs and then into Stochastic Petri-nets (SPN) descriptions in order to efficiently offer a model of associating "activities or changes" in multimodal information for events representation and understanding. The selection of the SPN graph model is due to its capability for efficiently representing structural and functional knowledge. Simple illustrative examples are provided for proving the concept proposed here.
14

Verma, Sudha, Sarah Vieweg, William Corvey, Leysia Palen, James Martin, Martha Palmer, Aaron Schram, and Kenneth Anderson. "Natural Language Processing to the Rescue? Extracting "Situational Awareness" Tweets During Mass Emergency." Proceedings of the International AAAI Conference on Web and Social Media 5, no. 1 (August 3, 2021): 385–92. http://dx.doi.org/10.1609/icwsm.v5i1.14119.

Abstract:
In times of mass emergency, vast amounts of data are generated via computer-mediated communication (CMC) that are difficult to manually cull and organize into a coherent picture. Yet valuable information is broadcast, and can provide useful insight into time- and safety-critical situations if captured and analyzed properly and rapidly. We describe an approach for automatically identifying messages communicated via Twitter that contribute to situational awareness, and explain why it is beneficial for those seeking information during mass emergencies. We collected Twitter messages from four different crisis events of varying nature and magnitude and built a classifier to automatically detect messages that may contribute to situational awareness, utilizing a combination of hand-annotated and automatically-extracted linguistic features. Our system was able to achieve over 80% accuracy on categorizing tweets that contribute to situational awareness. Additionally, we show that a classifier developed for a specific emergency event performs well on similar events. The results are promising, and have the potential to aid the general public in culling and analyzing information communicated during times of mass emergency.
15

Fong, Allan, Nicole Harriott, Donna M. Walters, Hanan Foley, Richard Morrissey, and Raj R. Ratwani. "Integrating natural language processing expertise with patient safety event review committees to improve the analysis of medication events." International Journal of Medical Informatics 104 (August 2017): 120–25. http://dx.doi.org/10.1016/j.ijmedinf.2017.05.005.

16

Ujiie, Shogo, Shuntaro Yada, Shoko Wakamiya, and Eiji Aramaki. "Identification of Adverse Drug Event–Related Japanese Articles: Natural Language Processing Analysis." JMIR Medical Informatics 8, no. 11 (November 27, 2020): e22661. http://dx.doi.org/10.2196/22661.

Abstract:
Background Medical articles covering adverse drug events (ADEs) are systematically reported by pharmaceutical companies for drug safety information purposes. Although policies governing reporting to regulatory bodies vary among countries and regions, all medical article reporting may be categorized as precision or recall based. Recall-based reporting, which is implemented in Japan, requires the reporting of any possible ADE. Therefore, recall-based reporting can introduce numerous false negatives or substantial amounts of noise, a problem that is difficult to address using limited manual labor. Objective Our aim was to develop an automated system that could identify ADE-related medical articles, support recall-based reporting, and alleviate manual labor in Japanese pharmaceutical companies. Methods Using medical articles as input, our system based on natural language processing applies document-level classification to extract articles containing ADEs (replacing manual labor in the first screening) and sentence-level classification to extract sentences within those articles that imply ADEs (thus supporting experts in the second screening). We used 509 Japanese medical articles annotated by a medical engineer to evaluate the performance of the proposed system. Results Document-level classification yielded an F1 of 0.903. Sentence-level classification yielded an F1 of 0.413. These were averages of fivefold cross-validations. Conclusions A simple automated system may alleviate the manual labor involved in screening drug safety–related medical articles in pharmaceutical companies. After improving the accuracy of the sentence-level classification by considering a wider context, we intend to apply this system toward real-world postmarketing surveillance.
17

Ryciak, Piotr, Katarzyna Wasielewska, and Artur Janicki. "Anomaly Detection in Log Files Using Selected Natural Language Processing Methods." Applied Sciences 12, no. 10 (May 18, 2022): 5089. http://dx.doi.org/10.3390/app12105089.

Abstract:
In this article, we address the problem of detecting anomalies in system log files. Computer systems generate huge numbers of events, which are noted in event log files. While most of them report normal actions, an unusual entry may inform about a failure or malware infection. A human operator may easily miss such an entry; therefore, anomaly detection methods are used for this purpose. In our work, we used an approach known from the natural language processing (NLP) domain, which operates on so-called embeddings, that is vector representations of words or phrases. We describe an improved version of the LogEvent2Vec algorithm, proposed in 2020. In contrast to the original version, we propose a significant shortening of the analysis window, which both increased the accuracy of anomaly detection and made further analysis of suspicious sequences much easier. We experimented with various binary classifiers, such as decision trees or multilayer perceptrons (MLPs), and the Blue Gene/L dataset. We showed that selecting an optimal classifier (in this case, MLP) and a short log sequence gave very good results. The improved version of the algorithm yielded the best F1-score of 0.997, compared to 0.886 in the original version of the algorithm.
18

Mashima, Yukinori, Takashi Tamura, Jun Kunikata, Shinobu Tada, Akiko Yamada, Masatoshi Tanigawa, Akiko Hayakawa, Hirokazu Tanabe, and Hideto Yokoi. "Using Natural Language Processing Techniques to Detect Adverse Events From Progress Notes Due to Chemotherapy." Cancer Informatics 21 (January 2022): 117693512210850. http://dx.doi.org/10.1177/11769351221085064.

Abstract:
Objective: In recent years, natural language processing (NLP) techniques have progressed, and their application in the medical field has been tested. However, the use of NLP to detect symptoms from medical progress notes written in Japanese, remains limited. We aimed to detect 2 gastrointestinal symptoms that interfere with the continuation of chemotherapy—nausea/vomiting and diarrhea—from progress notes using NLP, and then to analyze factors affecting NLP. Materials and methods: In this study, 200 patients were randomly selected from 5277 patients who received intravenous injections of cytotoxic anticancer drugs at Kagawa University Hospital, Japan, between January 2011 and December 2018. We aimed to detect the first occurrence of nausea/vomiting (Group A) and diarrhea (Group B) using NLP. The NLP performance was evaluated by the concordance with a review of the physicians’ progress notes used as the gold standard. Results: Both groups showed high concordance: 83.5% (95% confidence interval [CI] 74.1-90.1) in Group A and 97.7% (95% CI 91.3-99.9) in Group B. However, the concordance was significantly better in Group B ( P = .0027). There were significantly more misdetection cases in Group A than in Group B (15.3% in Group A; 1.2% in Group B, P = .0012) due to negative findings or past history. Conclusion: We detected occurrences of nausea/vomiting and diarrhea accurately using NLP. However, there were more misdetection cases in Group A due to negative findings or past history, which may have been influenced by the physicians’ more frequent documentation of nausea/vomiting.
19

Llorens, Hector, Estela Saquete, and Borja Navarro-Colorado. "Applying semantic knowledge to the automatic processing of temporal expressions and events in natural language." Information Processing & Management 49, no. 1 (January 2013): 179–97. http://dx.doi.org/10.1016/j.ipm.2012.05.005.

20

Claus, Stefan, and Massimo Stella. "Natural Language Processing and Cognitive Networks Identify UK Insurers’ Trends in Investor Day Transcripts." Future Internet 14, no. 10 (October 12, 2022): 291. http://dx.doi.org/10.3390/fi14100291.

Abstract:
The ability to spot key ideas, trends, and relationships between them in documents is key to financial services, such as banks and insurers. Identifying patterns across vast amounts of domain-specific reports is crucial for devising efficient and targeted supervisory plans, subsequently allocating limited resources where most needed. Today, insurance supervisory planning primarily relies on quantitative metrics based on numerical data (e.g., solvency financial returns). The purpose of this work is to assess whether Natural Language Processing (NLP) and cognitive networks can highlight events and relationships of relevance for regulators that supervise the insurance market, replacing human coding of information with automatic text analysis. To this aim, this work introduces a dataset of N_IDT = 829 investor transcripts from Bloomberg and explores/tunes 3 NLP techniques: (1) keyword extraction enhanced by cognitive network analysis; (2) valence/sentiment analysis; and (3) topic modelling. Results highlight that keyword analysis, enriched by term frequency-inverse document frequency scores and semantic framing through cognitive networks, could detect events of relevance for the insurance system like cyber-attacks or the COVID-19 pandemic. Cognitive networks were found to highlight events that related to specific financial transitions: The semantic frame of “climate” grew in size by +538% between 2018 and 2020 and outlined an increased awareness that agents and insurers expressed towards climate change. A lexicon-based sentiment analysis achieved a Pearson’s correlation of ρ = 0.16 (p < 0.001, N = 829) between sentiment levels and daily share prices. Although relatively weak, this finding indicates that insurance jargon is insightful to support risk supervision. Topic modelling is considered less amenable to support supervision, because of a lack of results’ stability and an intrinsic difficulty to interpret risk patterns. We discuss how these automatic methods could complement existing supervisory tools in supporting effective oversight of the insurance market.
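
The keyword-scoring idea named in this abstract can be illustrated with a minimal sketch of term frequency-inverse document frequency (TF-IDF) ranking. This is not the authors' pipeline: the function name `tfidf_keywords`, the whitespace tokenizer, the unsmoothed logarithmic IDF, and the three toy transcripts are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_keywords(documents, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # TF is the relative frequency in the document; IDF is log(N / df).
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:top_k])
    return results


# Hypothetical toy transcripts, invented for illustration only.
transcripts = [
    "climate risk and climate exposure shape the outlook",
    "cyber attacks and cyber claims shape the outlook",
    "pandemic claims and pandemic reserves shape the outlook",
]
keywords = tfidf_keywords(transcripts, top_k=1)
```

Terms that appear in every transcript (such as "outlook") receive an IDF of zero, so only the terms that distinguish a transcript rise to the top of its ranking.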
21

Li, Liuqing, Jack Geissinger, William A. Ingram, and Edward A. Fox. "Teaching Natural Language Processing through Big Data Text Summarization with Problem-Based Learning." Data and Information Management 4, no. 1 (March 24, 2020): 18–43. http://dx.doi.org/10.2478/dim-2020-0003.

Abstract:
Natural language processing (NLP) covers a large number of topics and tasks related to data and information management, leading to a complex and challenging teaching process. Meanwhile, problem-based learning is a teaching technique specifically designed to motivate students to learn efficiently, work collaboratively, and communicate effectively. With this aim, we developed a problem-based learning course for both undergraduate and graduate students to teach NLP. We provided student teams with big data sets, basic guidelines, cloud computing resources, and other aids to help different teams in summarizing two types of big collections: Web pages related to events, and electronic theses and dissertations (ETDs). Student teams then deployed different libraries, tools, methods, and algorithms to solve the task of big data text summarization. Summarization is an ideal problem to address learning NLP since it involves all levels of linguistics, as well as many of the tools and techniques used by NLP practitioners. The evaluation results showed that all teams generated coherent and readable summaries. Many summaries were of high quality and accurately described their corresponding events or ETD chapters, and the teams produced them along with NLP pipelines in a single semester. Further, both undergraduate and graduate students gave statistically significant positive feedback, relative to other courses in the Department of Computer Science. Accordingly, we encourage educators in the data and information management field to use our approach or similar methods in their teaching and hope that other researchers will also use our data sets and synergistic solutions to approach the new and challenging tasks we addressed.
22

Sprugnoli, Rachele, and Sara Tonelli. "Novel Event Detection and Classification for Historical Texts." Computational Linguistics 45, no. 2 (June 2019): 229–65. http://dx.doi.org/10.1162/coli_a_00347.

Abstract:
Event processing is an active area of research in the Natural Language Processing community, but resources and automatic systems developed so far have mainly addressed contemporary texts. However, the recognition and elaboration of events is a crucial step when dealing with historical texts, particularly in the current era of massive digitization of historical sources: research in this domain can lead to the development of methodologies and tools that can assist historians in enhancing their work, while also having an impact on the field of Natural Language Processing. Our work aims at shedding light on the complex concept of events when dealing with historical texts. More specifically, we introduce new annotation guidelines for event mentions and types, categorized into 22 classes. Then, we annotate a historical corpus accordingly, and compare two approaches for automatic event detection and classification following this novel scheme. We believe that this work can foster research in a field of inquiry as yet underestimated in the area of Temporal Information Processing. To this end, we release new annotation guidelines, a corpus, and new models for automatic annotation.
23

GLAVAŠ, GORAN, and JAN ŠNAJDER. "Construction and evaluation of event graphs." Natural Language Engineering 21, no. 4 (May 1, 2014): 607–52. http://dx.doi.org/10.1017/s1351324914000060.

Abstract:
Events play an important role in natural language processing and information retrieval due to numerous event-oriented texts and information needs. Many natural language processing and information retrieval applications could benefit from a structured event-oriented document representation. In this paper, we propose event graphs as a novel way of structuring event-based information from text. Nodes in event graphs represent the individual mentions of events, whereas edges represent the temporal and coreference relations between mentions. Contrary to previous natural language processing research, which has mainly focused on individual event extraction tasks, we describe a complete end-to-end system for event graph extraction from text. Our system is a three-stage pipeline that performs anchor extraction, argument extraction, and relation extraction (temporal relation extraction and event coreference resolution), each at a performance level comparable with the state of the art. We present EvExtra, a large newspaper corpus annotated with event mentions and event graphs, on which we train and evaluate our models. To measure the overall quality of the constructed event graphs, we propose two metrics based on the tensor product between automatically and manually constructed graphs. Finally, we evaluate the overall quality of event graphs with the proposed evaluation metrics and perform a headroom analysis of the system.
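
The event graph described in this abstract (event mentions as nodes, temporal and coreference relations as edges) can be pictured with a small data-structure sketch. The class name `EventGraph`, its methods, the relation labels `BEFORE` and `COREF`, and the sample mentions are hypothetical and are not taken from the paper or from the EvExtra corpus.

```python
from dataclasses import dataclass, field


@dataclass
class EventGraph:
    """Nodes are event mentions; edges are labelled relations between them."""

    mentions: dict = field(default_factory=dict)  # mention id -> anchor text
    edges: set = field(default_factory=set)  # (source id, relation, target id)

    def add_mention(self, mention_id, anchor):
        self.mentions[mention_id] = anchor

    def add_relation(self, source, relation, target):
        # Refuse edges that point at mentions the graph does not know about.
        if source not in self.mentions or target not in self.mentions:
            raise KeyError("both mentions must be added before relating them")
        self.edges.add((source, relation, target))

    def related(self, mention_id, relation):
        """Return the sorted targets linked from mention_id by relation."""
        return sorted(
            target
            for source, rel, target in self.edges
            if source == mention_id and rel == relation
        )


# Hypothetical mentions from an invented news sentence.
graph = EventGraph()
graph.add_mention("e1", "earthquake")
graph.add_mention("e2", "evacuated")
graph.add_mention("e3", "quake")
graph.add_relation("e1", "BEFORE", "e2")  # temporal relation
graph.add_relation("e1", "COREF", "e3")  # coreference relation
```

Storing edges as labelled triples keeps temporal and coreference relations in one structure while still letting a caller query each relation type separately.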
24

Bridgelall, Raj. "An Application of Natural Language Processing to Classify What Terrorists Say They Want." Social Sciences 11, no. 1 (January 13, 2022): 23. http://dx.doi.org/10.3390/socsci11010023.

Abstract:
Knowing what perpetrators want can inform strategies to achieve safe, secure, and sustainable societies. To help advance the body of knowledge in counterterrorism, this research applied natural language processing and machine learning techniques to a comprehensive database of terrorism events. A specially designed empirical topic modeling technique provided a machine-aided human decision process to glean six categories of perpetrator aims from the motive text narrative. Subsequently, six different machine learning models validated the aim categories based on the accuracy of their association with a different narrative field, the event summary. The ROC-AUC scores of the classification ranged from 86% to 93%. The Extreme Gradient Boosting model provided the best predictive performance. The intelligence community can use the identified aim categories to help understand the incentive structure of terrorist groups and customize strategies for dealing with them.
25

Psarologou, Adamantia, and Nikolaos Bourbakis. "Glossa — A Formal Language as a Mapping Mechanism of NL Sentences into SPN State Machine for Actions/Events Association." International Journal on Artificial Intelligence Tools 26, no. 02 (April 2017): 1750012. http://dx.doi.org/10.1142/s0218213017500129.

Abstract:
Natural Language Understanding (NLU) is an old and very challenging field with a variety of published research. In this paper we present a formal language methodology based on a state machine for efficiently representing natural language events/actions and their associations in well-written documents. The methodology consists of the following steps. We first apply Anaphora Resolution (AR) to pre-process the natural language text. Then we extract the kernel(s) of each sentence. These kernels are formally represented using a formal language (Glossa) that maps the language expressions (kernels) into Stochastic Petri Net (SPN) graphs. Finally, we apply a set of rules to combine the SPN graphs in order to obtain the associations of actions/events in time. The special emphasis of this paper is the mapping of kernels of NL sentences into SPN graphs. Note that this work does not cover all aspects of NLU. Examples of SPN graphs of different NL texts produced by our proposed methodology are given.
26

Wang, Chun Ping. "Design and Implementation of Network Events Monitoring System." Applied Mechanics and Materials 568-570 (June 2014): 1430–33. http://dx.doi.org/10.4028/www.scientific.net/amm.568-570.1430.

Abstract:
The network monitoring system model proposed in this paper uses user modeling techniques and event detection techniques. The user modeling combines dynamic modeling methods to infer a more detailed user interest model and optimize the results. The event detection method introduces natural language processing, so that the system automatically identifies hot topics and advertising events in the text that is sent. The system design considers the intersection of the two techniques to obtain a better user experience.
27

Bai, Sun, Zang, Zhang, Shen, Liu, and Wei. "Identification Technology of Grid Monitoring Alarm Event Based on Natural Language Processing and Deep Learning in China." Energies 12, no. 17 (August 23, 2019): 3258. http://dx.doi.org/10.3390/en12173258.

Abstract:
Power dispatching systems currently receive massive, complicated, and irregular monitoring alarms during their operation, which prevents the controllers from making accurate judgments on the alarm events that occur within a short period of time. In view of the currently low efficiency of processing monitoring alarm information, this paper proposes a method based on natural language processing (NLP) and a hybrid model that combines long short-term memory (LSTM) and convolutional neural network (CNN) for the identification of grid monitoring alarm events. Firstly, the characteristics of the alarm information text were analyzed and summarized, and the text was then preprocessed. Then, the monitoring alarm information was vectorized based on the Word2vec model. Finally, a monitoring alarm event identification model based on a combination of LSTM and CNN was established for the characteristics of the alarm information. The feasibility and effectiveness of the method in this paper were verified by comparison with multiple identification models.
28

Son, Ji-eun, and Yu-nam Cheong. "A study on the Korean event named entity for natural language processing." Journal of Yeongju Language & Literature 53 (October 31, 2022): 29–63. http://dx.doi.org/10.30774/yjll.2023.02.53.29.

29

Rose, Rodrigo L., Tejas G. Puranik, and Dimitri N. Mavris. "Natural Language Processing Based Method for Clustering and Analysis of Aviation Safety Narratives." Aerospace 7, no. 10 (September 28, 2020): 143. http://dx.doi.org/10.3390/aerospace7100143.

Abstract:
The complexity of commercial aviation operations has grown substantially in recent years, together with a diversification of techniques for collecting and analyzing flight data. As a result, data-driven frameworks for enhancing flight safety have grown in popularity. Data-driven techniques offer efficient and repeatable exploration of patterns and anomalies in large datasets. Text-based flight safety data presents a unique challenge in its subjectivity, and relies on natural language processing tools to extract underlying trends from narratives. In this paper, a methodology is presented for the analysis of aviation safety narratives based on text-based accounts of in-flight events and categorical metadata parameters which accompany them. An extensive pre-processing routine is presented, including a comparison between numeric models of textual representation for the purposes of document classification. A framework for categorizing and visualizing narratives is presented through a combination of k-means clustering and 2-D mapping with t-Distributed Stochastic Neighbor Embedding (t-SNE). A cluster post-processing routine is developed for identifying driving factors in each cluster and building a hierarchical structure of cluster and sub-cluster labels. The Aviation Safety Reporting System (ASRS), which includes over a million de-identified voluntarily submitted reports describing aviation safety incidents for commercial flights, is analyzed as a case study for the methodology. The method results in the identification of 10 major clusters and a total of 31 sub-clusters. The identified groupings are post-processed through metadata-based statistical analysis of the learned clusters. The developed method shows promise in uncovering trends from clusters that are not evident in existing anomaly labels in the data and offers a new tool for obtaining insights from text-based safety data that complement existing approaches.
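
The clustering step named in this abstract can be sketched with a plain k-means loop over term-count vectors. This is an illustration only and not the authors' method: the function `kmeans`, the choice of the first k points as initial centroids, the three-term vocabulary, and the four toy narratives are assumptions, and the t-SNE mapping and cluster post-processing are omitted.

```python
import math


def kmeans(points, k, iterations=20):
    """Assign each point to one of k clusters using Lloyd's algorithm."""
    # Deterministic initialisation (an assumption): the first k points.
    centroids = [list(point) for point in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments


# Hypothetical term counts per narrative for the invented vocabulary
# ("engine", "runway", "turbulence").
narratives = [(3, 0, 0), (0, 3, 0), (2, 1, 0), (0, 2, 1)]
labels = kmeans(narratives, k=2)
```

With these toy vectors the engine-heavy narratives end up in one cluster and the runway-heavy narratives in the other; real narratives would first need a text representation such as the numeric models the abstract compares.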
30

Murphy, Rachel M., Joanna E. Klopotowska, Nicolette F. de Keizer, Kitty J. Jager, Jan Hendrik Leopold, Dave A. Dongelmans, Ameen Abu-Hanna, and Martijn C. Schut. "Adverse drug event detection using natural language processing: A scoping review of supervised learning methods." PLOS ONE 18, no. 1 (January 3, 2023): e0279842. http://dx.doi.org/10.1371/journal.pone.0279842.

Abstract:
To reduce adverse drug events (ADEs), hospitals need a system to support them in monitoring ADE occurrence routinely, rapidly, and at scale. Natural language processing (NLP), a computerized approach to analyze text data, has shown promising results for the purpose of ADE detection in the context of pharmacovigilance. However, a detailed qualitative assessment and critical appraisal of NLP methods for ADE detection in the context of ADE monitoring in hospitals is lacking. Therefore, we have conducted a scoping review to close this knowledge gap, and to provide directions for future research and practice. We included articles where NLP was applied to detect ADEs in clinical narratives within electronic health records of inpatients. Quantitative and qualitative data items relating to NLP methods were extracted and critically appraised. Out of 1,065 articles screened for eligibility, 29 articles met the inclusion criteria. Most frequent tasks included named entity recognition (n = 17; 58.6%) and relation extraction/classification (n = 15; 51.7%). Clinical involvement was reported in nine studies (31%). Multiple NLP modelling approaches seem suitable, with Long Short Term Memory and Conditional Random Field methods most commonly used. Although reported overall performance of the systems was high, it provides an inflated impression given a steep drop in performance when predicting the ADE entity or ADE relation class. When annotating corpora, treating an ADE as a relation between a drug and non-drug entity seems the best practice. Future research should focus on semi-automated methods to reduce the manual annotation effort, and examine implementation of the NLP methods in practice.
31

Zhao, Yiqing, Sunyang Fu, Suzette J. Bielinski, Paul A. Decker, Alanna M. Chamberlain, Veronique L. Roger, Hongfang Liu, and Nicholas B. Larson. "Natural Language Processing and Machine Learning for Identifying Incident Stroke From Electronic Health Records: Algorithm Development and Validation." Journal of Medical Internet Research 23, no. 3 (March 8, 2021): e22951. http://dx.doi.org/10.2196/22951.

Abstract:
Background: Stroke is an important clinical outcome in cardiovascular research. However, the ascertainment of incident stroke is typically accomplished via time-consuming manual chart abstraction. Current phenotyping efforts using electronic health records for stroke focus on case ascertainment rather than incident disease, which requires knowledge of the temporal sequence of events. Objective: The aim of this study was to develop a machine learning–based phenotyping algorithm for incident stroke ascertainment based on diagnosis codes, procedure codes, and clinical concepts extracted from clinical notes using natural language processing. Methods: The algorithm was trained and validated using an existing epidemiology cohort consisting of 4914 patients with atrial fibrillation (AF) with manually curated incident stroke events. Various combinations of feature sets and machine learning classifiers were compared. Using a heuristic rule based on the composition of concepts and codes, we further detected the stroke subtype (ischemic stroke/transient ischemic attack or hemorrhagic stroke) of each identified stroke. The algorithm was further validated using a cohort (n=150) stratified sampled from a population in Olmsted County, Minnesota (N=74,314). Results: Among the 4914 patients with AF, 740 had validated incident stroke events. The best-performing stroke phenotyping algorithm used clinical concepts, diagnosis codes, and procedure codes as features in a random forest classifier. Among patients with stroke codes in the general population sample, the best-performing model achieved a positive predictive value of 86% (43/50; 95% CI 0.74-0.93) and a negative predictive value of 96% (96/100). For subtype identification, we achieved an accuracy of 83% in the AF cohort and 80% in the general population sample. Conclusions: We developed and validated a machine learning–based algorithm that performed well for identifying incident stroke and for determining type of stroke. The algorithm also performed well on a sample from a general population, further demonstrating its generalizability and potential for adoption by other institutions.
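
The predictive values quoted in this abstract are simple proportions (43/50 and 96/100), and the short sketch below recomputes them together with a confidence interval. The abstract does not say how its interval was calculated; the Wilson score interval used here is an assumption, and the function name `proportion_with_wilson_ci` is invented for this example.

```python
import math


def proportion_with_wilson_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) using the Wilson score interval."""
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    )
    return p, center - half_width, center + half_width


# Counts as reported in the abstract.
ppv, ppv_low, ppv_high = proportion_with_wilson_ci(43, 50)
npv, npv_low, npv_high = proportion_with_wilson_ci(96, 100)
```

Under this assumption the positive predictive value comes out at 0.86 with an interval of roughly 0.74 to 0.93, which agrees with the figures quoted in the abstract.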
32

Mao, Jialin, Art Sedrakyan, Tianyi Sun, Maryam Guiahi, Scott Chudnoff, Madris Kinard, and Stephen B. Johnson. "Assessing adverse event reports of hysteroscopic sterilization device removal using natural language processing." Pharmacoepidemiology and Drug Safety 31, no. 4 (December 21, 2021): 442–51. http://dx.doi.org/10.1002/pds.5402.

33

Nguyen, M., E. J. Woo, S. Winiecki, J. Scott, D. Martin, T. Botsis, R. Ball, and B. Baer. "Can Natural Language Processing Improve the Efficiency of Vaccine Adverse Event Report Review?" Methods of Information in Medicine 55, no. 02 (2016): 144–50. http://dx.doi.org/10.3414/me14-01-0066.

Abstract:
Background: Individual case review of spontaneous adverse event (AE) reports remains a cornerstone of medical product safety surveillance for industry and regulators. Previously we developed the Vaccine Adverse Event Text Miner (VaeTM) to offer automated information extraction and potentially accelerate the evaluation of large volumes of unstructured data and facilitate signal detection. Objective: To assess how the information extraction performed by VaeTM impacts the accuracy of a medical expert’s review of the vaccine adverse event report. Methods: The “outcome of interest” (diagnosis, cause of death, second level diagnosis), “onset time,” and “alternative explanations” (drug, medical and family history) for the adverse event were extracted from 1000 reports from the Vaccine Adverse Event Reporting System (VAERS) using the VaeTM system. We compared the human interpretation, by medical experts, of the VaeTM extracted data with their interpretation of the traditional full text reports for these three variables. Two experienced clinicians alternately reviewed text miner output and full text. A third clinician scored the match rate using a predefined algorithm; the proportion of matches and 95% confidence intervals (CI) were calculated. Review time per report was analyzed. Results: Proportion of matches between the interpretation of the VaeTM extracted data, compared to the interpretation of the full text: 93% for outcome of interest (95% CI: 91–94%) and 78% for alternative explanation (95% CI: 75–81%). Extracted data on the time to onset was used in 14% of cases and was a match in 54% (95% CI: 46–63%) of those cases. When supported by structured time data from reports, the match for time to onset was 79% (95% CI: 76–81%). The extracted text averaged 136 (74%) fewer words, resulting in a mean reduction in review time of 50 (58%) seconds per report. Conclusion: Despite a 74% reduction in words, the clinical conclusion from VaeTM extracted data agreed with the full text in 93% and 78% of reports for the outcome of interest and alternative explanation, respectively. The limited amount of extracted time interval data indicates the need for further development of this feature. VaeTM may improve review efficiency, but further study is needed to determine if this level of agreement is sufficient for routine use.
34

Anand, Sarabjot Singh, Arshad Jhumka, and Kimberley Wade. "Towards the Ordering of Events from Multiple Textual Evidence Sources." International Journal of Digital Crime and Forensics 3, no. 2 (April 2011): 16–34. http://dx.doi.org/10.4018/jdcf.2011040102.

Abstract:
In any criminal investigation, two important problems have to be addressed: (1) integration of multiple data sources to build a concise picture of the events leading up to and/or during the execution of a crime, and (2) determining the order in which these events occurred. This paper focuses on the integration of multiple textual data sources, each providing a recollection of events observed by eyewitnesses. From these textual documents, using text mining and natural language processing techniques the authors identify events (across the document corpus) associated with the crime and infer temporal relationships between these events to create a (partial) ordering of the events. The authors evaluate their method on data collected through a mock eyewitness task.
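
The idea of turning pairwise temporal relations into a (partial) ordering of events can be sketched with a topological sort. This is not the authors' method, only one standard way to linearise "happened before" pairs; the function `order_events` and the sample event labels are hypothetical, and the sketch relies on `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def order_events(before_pairs):
    """Return one linear order consistent with (earlier, later) pairs."""
    sorter = TopologicalSorter()
    for earlier, later in before_pairs:
        # TopologicalSorter.add(node, *predecessors): `later` follows `earlier`.
        sorter.add(later, earlier)
    # static_order() raises graphlib.CycleError if the pairs contradict
    # each other, for example when two witnesses give opposite orders.
    return list(sorter.static_order())


# Hypothetical relations inferred from invented eyewitness statements.
pairs = [
    ("argument", "alarm"),
    ("alarm", "escape"),
    ("argument", "phone call"),
]
timeline = order_events(pairs)
```

Because the ordering is only partial, events with no stated relation (here "phone call" relative to "alarm" and "escape") may appear in any relative position; the sort guarantees only that every stated pair is respected.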
35

Chen, Long, Yu Gu, Xin Ji, Zhiyong Sun, Haodan Li, Yuan Gao, and Yang Huang. "Extracting medications and associated adverse drug events using a natural language processing system combining knowledge base and deep learning." Journal of the American Medical Informatics Association 27, no. 1 (October 7, 2019): 56–64. http://dx.doi.org/10.1093/jamia/ocz141.

Abstract:
Objective: Detecting adverse drug events (ADEs) and medication-related information in clinical notes is important for both hospital medical care and medical research. We describe our clinical natural language processing (NLP) system to automatically extract medical concepts and relations related to ADEs and medications from clinical narratives. This work was part of the 2018 National NLP Clinical Challenges Shared Task and Workshop on Adverse Drug Events and Medication Extraction. Materials and Methods: The authors developed a hybrid clinical NLP system that employs a knowledge-based general clinical NLP system for medical concept extraction, and a task-specific deep learning system for relation identification using attention-based bidirectional long short-term memory networks. Results: The systems were evaluated as part of the 2018 National NLP Clinical Challenges challenge, and our system based on attention-based bidirectional long short-term memory networks obtained an F-measure of 0.9442 for the relation identification task, ranking fifth at the challenge, and had <2% difference from the best system. Error analysis was also conducted to identify the root causes and possible approaches for improvement. Conclusions: We demonstrate the generic approaches and the practice of connecting a general-purpose clinical NLP system to task-specific requirements with deep learning methods. Our results indicate that a well-designed hybrid NLP system is capable of ADE and medication-related information extraction, which can be used in real-world applications to support ADE-related research and medical decisions.
36

Moon, Junhyung, Gyuyoung Park, and Jongpil Jeong. "POP-ON: Prediction of Process Using One-Way Language Model Based on NLP Approach." Applied Sciences 11, no. 2 (January 18, 2021): 864. http://dx.doi.org/10.3390/app11020864.

Abstract:
In business process management, the monitoring service is an important element that can prevent various problems before they occur in companies and industries. An execution log is created in an information system that is aware of the enterprise process, which helps predict the process. The ultimate goal of the proposed method is to predict the process following the running process instance and predict events based on previously completed event log data. Companies can then respond flexibly to unwanted deviations in their workflow. When solving the next event prediction problem, we use a fully attention-based transformer, which has performed well in recent natural language processing approaches. After recognizing the name attribute of the event in the natural language and predicting the next event, several necessary elements were applied. The model is trained using the proposed deep learning model according to specific pre-processing steps. Experiments using various business process log datasets demonstrate the superior performance of the proposed method. The name of the process prediction model we propose is “POP-ON”.
37

Schäfer, Martin. "Tagungsbericht / Conference report : “Wen wurmt der Ohrwurm? An Interdisciplinary, Cross-Lingual Perspective on the Role of Constituents in Multi-Word Expressions” (Workshop at the 39th DGfS Annual Conference ,,Information und sprachliche Kodierung“, Saarbrücken, 08.10.03.2017)." Zeitschrift für Wortbildung / Journal of Word Formation 1, no. 2 (January 1, 2017): 91–97. http://dx.doi.org/10.3726/zwjw.2017.02.04.

Abstract:
Organized by Sabine Schulte im Walde (University of Stuttgart) and Eva Smolka (University of Konstanz) as part of the 39th Annual Conference of the German Linguistic Society (DGfS) held at the Saarland University in Saarbrücken, Germany, the workshop aimed “to shed light on the interaction of constituent properties and compound transparency across languages and disciplines integrating linguistic, psycholinguistic, corpus-based and computational studies”. The workshop brought together researchers from linguistics, psycholinguistics, and natural language processing and comprised 11 contributed talks, framed by two invited talks by Gary Libben and Marco Marelli. Most of the slides are available from the workshop’s homepage at “http://www.ims.uni-stuttgart.de/events/dgfs-mwe-17/program.html”.
38

Bashmal, Laila, Yakoub Bazi, Mohamad Mahmoud Al Rahhal, Mansour Zuair, and Farid Melgani. "CapERA: Captioning Events in Aerial Videos." Remote Sensing 15, no. 8 (April 18, 2023): 2139. http://dx.doi.org/10.3390/rs15082139.

Abstract:
In this paper, we introduce the CapERA dataset, which upgrades the Event Recognition in Aerial Videos (ERA) dataset to aerial video captioning. The newly proposed dataset aims to advance visual–language-understanding tasks for UAV videos by providing each video with diverse textual descriptions. To build the dataset, 2864 aerial videos are manually annotated with a caption that includes information such as the main event, object, place, action, numbers, and time. More captions are automatically generated from the manual annotation to take into account as much as possible the variation in describing the same video. Furthermore, we propose a captioning model for the CapERA dataset to provide benchmark results for UAV video captioning. The proposed model is based on the encoder–decoder paradigm with two configurations to encode the video. The first configuration encodes the video frames independently by an image encoder. Then, a temporal attention module is added on the top to consider the temporal dynamics between features derived from the video frames. In the second configuration, we directly encode the input video using a video encoder that employs factorized space–time attention to capture the dependencies within and between the frames. For generating captions, a language decoder is utilized to autoregressively produce the captions from the visual tokens. The experimental results under different evaluation criteria show the challenges of generating captions from aerial videos. We expect that the introduction of CapERA will open interesting new research avenues for integrating natural language processing (NLP) with UAV video understandings.
39

Woller, Bela, Austin Daw, Valerie Aston, Jim Lloyd, Greg Snow, Scott M. Stevens, Scott C. Woller, Peter Jones, and Joseph Bledsoe. "Natural Language Processing Performance for the Identification of Venous Thromboembolism in an Integrated Healthcare System." Clinical and Applied Thrombosis/Hemostasis 27 (January 1, 2021): 107602962110131. http://dx.doi.org/10.1177/10760296211013108.

Abstract:
Real-time identification of venous thromboembolism (VTE), defined as deep vein thrombosis (DVT) and pulmonary embolism (PE), can inform a healthcare organization’s understanding of these events and be used to improve care. In a former publication, we reported the performance of an electronic medical record (EMR) interrogation tool that employs natural language processing (NLP) of imaging studies for the diagnosis of venous thromboembolism. Because we transitioned from the legacy electronic medical record to the Cerner product, iCentra, we now report the operating characteristics of the NLP EMR interrogation tool in the new EMR environment. Two hundred randomly selected patient encounters in which the imaging report assessed by NLP revealed that VTE was present were reviewed. These included one hundred imaging studies for which PE was identified: computed tomography pulmonary angiography (CTPA), ventilation perfusion (V/Q) scan, and CT angiography of the chest/abdomen/pelvis. One hundred randomly selected comprehensive ultrasound (CUS) studies that identified DVT were also obtained. For comparison, one hundred patient encounters in which PE was suspected and imaging was negative for PE (CTPA or V/Q) and 100 cases of suspected DVT with negative CUS as reported by NLP were also selected. Manual chart review of the 400 charts was performed and we report the sensitivity, specificity, positive and negative predictive values of NLP compared with manual chart review. NLP and manual review agreed on the presence of PE in 99 of 100 cases, the presence of DVT in 96 of 100 cases, the absence of PE in 99 of 100 cases and the absence of DVT in all 100 cases. When compared with manual chart review, NLP interrogation of CUS, CTPA, CT angiography of the chest, and V/Q scan yielded a sensitivity = 93.3%, specificity = 99.6%, positive predictive value = 97.1%, and negative predictive value = 99%.
40

Buselli, Irene, Luca Oneto, Carlo Dambra, Christian Verdonk Gallego, Miguel García Martínez, Anthony Smoker, Nnenna Ike, Tamara Pejovic, and Patricia Ruiz Martino. "Natural language processing for aviation safety: extracting knowledge from publicly-available loss of separation reports." Open Research Europe 1 (September 23, 2021): 110. http://dx.doi.org/10.12688/openreseurope.14040.1.

Abstract:
Background: The air traffic management (ATM) system has historically coped with a global increase in traffic demand ultimately leading to increased operational complexity. When dealing with the impact of this increasing complexity on system safety it is crucial to automatically analyse the loss of separation (LoS) using tools able to extract meaningful and actionable information from safety reports. Current research in this field mainly exploits natural language processing (NLP) to categorise the reports, with the limitations that the considered categories need to be manually annotated by experts and that general taxonomies are seldom exploited. Methods: To address the current gaps, authors propose to perform exploratory data analysis on safety reports combining state-of-the-art techniques like topic modelling and clustering and then to develop an algorithm able to extract the Toolkit for ATM Occurrence Investigation (TOKAI) taxonomy factors from the free-text safety reports based on syntactic analysis. TOKAI is a general taxonomy developed by EUROCONTROL and intended to become a standard and harmonised approach to future investigations. Results: Leveraging on the LoS events reported in the public databases of the Comisión de Estudio y Análisis de Notificaciones de Incidentes de Tránsito Aéreo and the United Kingdom Airprox Board, authors show how their proposal is able to automatically extract meaningful and actionable information from safety reports and to classify them according to the TOKAI taxonomy. The quality of the approach is also indirectly validated by checking the connection between the identified factors and the main contributor of the incidents. Conclusions: Authors' results are a promising first step toward the full automation of a general analysis of LoS reports supported by results on real world data coming from two different sources. In the future, authors' proposal could be extended to other taxonomies or tailored to identify factors to be included in the safety taxonomies.
41

Buselli, Irene, Luca Oneto, Carlo Dambra, Christian Verdonk Gallego, Miguel García Martínez, Anthony Smoker, Nnenna Ike, Tamara Pejovic, and Patricia Ruiz Martino. "Natural language processing for aviation safety: extracting knowledge from publicly-available loss of separation reports." Open Research Europe 1 (February 18, 2022): 110. http://dx.doi.org/10.12688/openreseurope.14040.2.

Abstract:
Background: The air traffic management (ATM) system has historically coped with a global increase in traffic demand ultimately leading to increased operational complexity. When dealing with the impact of this increasing complexity on system safety it is crucial to automatically analyse the losses of separation (LoSs) using tools able to extract meaningful and actionable information from safety reports. Current research in this field mainly exploits natural language processing (NLP) to categorise the reports, with the limitations that the considered categories need to be manually annotated by experts and that general taxonomies are seldom exploited. Methods: To address the current gaps, authors propose to perform exploratory data analysis on safety reports combining state-of-the-art techniques like topic modelling and clustering and then to develop an algorithm able to extract the Toolkit for ATM Occurrence Investigation (TOKAI) taxonomy factors from the free-text safety reports based on syntactic analysis. TOKAI is a tool for investigation developed by EUROCONTROL and its taxonomy is intended to become a standard and harmonised approach to future investigations. Results: Leveraging on the LoS events reported in the public databases of the Comisión de Estudio y Análisis de Notificaciones de Incidentes de Tránsito Aéreo and the United Kingdom Airprox Board, authors show how their proposal is able to automatically extract meaningful and actionable information from safety reports, and to classify their content according to the TOKAI taxonomy. The quality of the approach is also indirectly validated by checking the connection between the identified factors and the main contributor of the incidents. Conclusions: Authors' results are a promising first step toward the full automation of a general analysis of LoS reports supported by results on real-world data coming from two different sources.
In the future, authors' proposal could be extended to other taxonomies or tailored to identify factors to be included in the safety taxonomies.
42

Shahbazi, Zeinab, and Yung-Cheol Byun. "Blockchain-Based Event Detection and Trust Verification Using Natural Language Processing and Machine Learning." IEEE Access 10 (2022): 5790–800. http://dx.doi.org/10.1109/access.2021.3139586.

43

Ma, Meng, Kyeryoung Lee, Yun Mai, Christopher Gilman, Zongzhi Liu, Mingwei Zhang, Minghao Li, et al. "Extracting longitudinal anticancer treatments at scale using deep natural language processing and temporal reasoning." Journal of Clinical Oncology 39, no. 15_suppl (May 20, 2021): e18747-e18747. http://dx.doi.org/10.1200/jco.2021.39.15_suppl.e18747.

Abstract:
Background: Accurate longitudinal cancer treatments are vital for establishing primary endpoints such as outcome as well as for the investigation of adverse events. However, many longitudinal therapeutic regimens are not well captured in structured electronic health records (EHRs). Thus, their recognition in unstructured data such as clinical notes is critical to gain an accurate description of the real-world patient treatment journey. Here, we demonstrate a scalable approach to extract high-quality longitudinal cancer treatments from lung cancer patients' clinical notes using a Bidirectional Long Short Term Memory (BiLSTM) and Conditional Random Fields (CRF) based natural language processing (NLP) pipeline. Methods: The lung cancer (LC) cohort of 4,698 patients was curated from the Mount Sinai Healthcare system (2003-2020). Two domain experts developed a structured framework of entities and semantics that captured treatment and its temporality. The framework included therapy type (chemotherapy, targeted therapy, immunotherapy, etc.), status (on, off, hold, planned, etc.) and temporal reasoning entities and relations (admin_date, duration, etc.). We pre-annotated 149 FDA-approved cancer drugs and longitudinal timelines of treatment on the training corpus. An NLP pipeline was implemented with BiLSTM-CRF-based deep learning models to train and then apply the resulting models to the clinical notes of the LC cohort. A postprocessor was developed to subsequently post-coordinate and refine the output. We performed both cross-evaluation and independent evaluation to assess the pipeline performance. Results: We applied the NLP pipeline to the 853,755 clinical notes, and identified 1,155 distinct entities for 194 cancer generic drugs, including 74 chemotherapy drugs, 21 immunotherapy drugs, and 99 targeted therapy drugs. We identified chemotherapy, immunotherapy, or targeted therapy data for 3,509 patients in the LC cohort from the clinical notes.
Compared to only 2,395 patients with cancer treatments in structured EHR, this pipeline identified cancer treatments from notes for an additional 2,303 patients who did not have any available cancer treatment data in the structured EHR. Our evaluation schema indicates that the longitudinal cancer drug recognition pipeline delivers strong performance (named entity recognition for drugs and temporal: F1 = 95%; drug-temporal relation recognition: F1 = 90%). Conclusions: We developed a high-performance BiLSTM-CRF based NLP pipeline to recognize longitudinal cancer treatments. The pipeline recovers and encodes twice as many patients with cancer treatments compared with structured EHR. Our study indicates deep NLP with temporal reasoning could substantially accelerate the extraction of treatment profiles at scale. The pipeline is adjustable and can be applied across different cancers.
44

Quaresma, Paulo, Vítor Beires Nogueira, Kashyap Raiyani, and Roy Bayot. "Event Extraction and Representation: A Case Study for the Portuguese Language." Information 10, no. 6 (June 8, 2019): 205. http://dx.doi.org/10.3390/info10060205.

Abstract:
Text information extraction is an important natural language processing (NLP) task, which aims to automatically identify, extract, and represent information from text. In this context, event extraction plays a relevant role, allowing actions, agents, objects, places, and time periods to be identified and represented. The extracted information can be represented by specialized ontologies, supporting knowledge-based reasoning and inference processes. In this work, we will describe, in detail, our proposal for event extraction from Portuguese documents. The proposed approach is based on a pipeline of specialized natural language processing tools; namely, a part-of-speech tagger, a named entities recognizer, a dependency parser, semantic role labeling, and a knowledge extraction module. The architecture is language-independent, but its modules are language-dependent and can be built using adequate AI (i.e., rule-based or machine learning) methodologies. The developed system was evaluated with a corpus of Portuguese texts and the obtained results are presented and analysed. The current limitations and future work are discussed in detail.
45

Vanegas, Jorge A., Sérgio Matos, Fabio González, and José L. Oliveira. "An Overview of Biomolecular Event Extraction from Scientific Documents." Computational and Mathematical Methods in Medicine 2015 (2015): 1–19. http://dx.doi.org/10.1155/2015/571381.

Abstract:
This paper presents a review of state-of-the-art approaches to automatic extraction of biomolecular events from scientific texts. Events involving biomolecules such as genes, transcription factors, or enzymes, for example, have a central role in biological processes and functions and provide valuable information for describing physiological and pathogenesis mechanisms. Event extraction from biomedical literature has a broad range of applications, including support for information retrieval, knowledge summarization, and information extraction and discovery. However, automatic event extraction is a challenging task due to the ambiguity and diversity of natural language and higher-level linguistic phenomena, such as speculations and negations, which occur in biological texts and can lead to misunderstanding or incorrect interpretation. Many strategies have been proposed in the last decade, originating from different research areas such as natural language processing, machine learning, and statistics. This review summarizes the most representative approaches in biomolecular event extraction and presents an analysis of the current state of the art and of commonly used methods, features, and tools. Finally, current research trends and future perspectives are also discussed.
46

Christanno, Ivan, Priscilla Priscilla, Jody Johansyah Maulana, Derwin Suhartono, and Rini Wongso. "Eve: An Automated Question Answering System for Events Information." ComTech: Computer, Mathematics and Engineering Applications 8, no. 1 (March 31, 2017): 15. http://dx.doi.org/10.21512/comtech.v8i1.3781.

Abstract:
The objective of this research was to create a closed-domain automated question answering system specifically for events, called Eve. An Automated Question Answering System (QAS) is a system that accepts question input in the form of natural language. The question is processed through modules to finally return the most appropriate answer to the corresponding question instead of returning a full document as an output. The scope of the events was those organized by the Students Association of Computer Science (HIMTI) at Bina Nusantara University. The system consisted of 3 main modules, namely query processing, information retrieval, and information extraction. The approaches used in this system included question classification, document indexing, named entity recognition, and others. Out of 94 questions, the system correctly answered 63 using the word matching technique and 32 using the word similarity technique.
47

Senders, Joeky T., David J. Cote, Alireza Mehrtash, Robert Wiemann, William B. Gormley, Timothy R. Smith, Marike L. D. Broekman, and Omar Arnaout. "Deep learning for natural language processing of free-text pathology reports: a comparison of learning curves." BMJ Innovations 6, no. 4 (June 23, 2020): 192–98. http://dx.doi.org/10.1136/bmjinnov-2019-000410.

Abstract:
Introduction: Although clinically derived information could improve patient care, its full potential remains unrealised because most of it is stored in a format unsuitable for traditional methods of analysis, free-text clinical reports. Various studies have already demonstrated the utility of natural language processing algorithms for medical text analysis. Yet, evidence on their learning efficiency is still lacking. This study aimed to compare the learning curves of various algorithms and develop an open-source framework for text mining in healthcare. Methods: Deep learning and regressions-based models were developed to determine the histopathological diagnosis of patients with brain tumour based on free-text pathology reports. For each model, we characterised the learning curve and the minimal required training examples to reach the area under the curve (AUC) performance thresholds of 0.95 and 0.98. Results: In total, we retrieved 7000 reports on 5242 patients with brain tumour (2316 with glioma, 1412 with meningioma and 1514 with cerebral metastasis). Conventional regression and deep learning-based models required 200–400 and 800–1500 training examples to reach the AUC performance thresholds of 0.95 and 0.98, respectively. The deep learning architecture utilised in the current study required 100 and 200 examples, respectively, corresponding to a learning capacity that is two to eight times more efficient. Conclusions: This open-source framework enables the development of high-performing and fast learning natural language processing models. The steep learning curve can be valuable for contexts with limited training examples (eg, rare diseases and events or institutions with lower patient volumes). The resultant models could accelerate retrospective chart review, assemble clinical registries and facilitate a rapid learning healthcare system.
48

Agyeman, Nana Ama. "Documenting Simpa: Advances in language documentation." Legon Journal of the Humanities 30, no. 2 (December 31, 2019): 167–90. http://dx.doi.org/10.4314/ljh.v30i2.8.

Abstract:
Documentary linguistics, also known as language documentation, a relatively new branch of Linguistics, advocates for the fundamental need to collect records of language use and practices in various forms from diverse genres for multiple purposes. Such purposes include language description, language development, language maintenance, and language revitalisation. Such a record of a language serves to feed not only linguistic research but also research in other disciplines, such as anthropology, history, and ethnography. Language documentation is recognized as an ultimate response to language endangerment. This paper explores language documentation with specific reference to Simpa, an under-described, minority language of Ghana. The paper reviews theories, approaches, methods, and tools of language documentation to highlight how they were employed and attuned to take care of the Simpa context. Thus, the discussion dilates on specific field methods and tools adapted for obtaining a balanced set of data from three complementary event types, viz., natural communicative events, staged communicative events, and elicitations, to build a language documentation corpus. Data processing, data annotation, and data management practices applied in building the corpus, as well as dissemination of the research outcomes are also addressed. Furthermore, fieldwork ethics used in the study are discussed. Finally, for consideration in future research, the paper reflects on some challenges that were encountered in documenting Simpa.
49

Uspenskij, Mikhail B. "Log mining and knowledge-based models in data storage systems diagnostics." E3S Web of Conferences 140 (2019): 03006. http://dx.doi.org/10.1051/e3sconf/201914003006.

Abstract:
Modern data storage systems have a sophisticated hardware and software architecture, including multiple storage processors, storage fabrics, network equipment, and storage media, and contain information that can be damaged or lost because of a hardware or software fault. The approach to storage software diagnostics presented in this paper combines log mining algorithms for fault detection, based on natural language processing text classification methods, with the use of a diagnostic model for fault source detection. Existing approaches to computational systems diagnostics either ignore system or event log data and use only numeric monitoring parameters, or target only certain log types, or use logs to create chains of structured events. The main advantage of using a natural language processing method for log text classification is that no information about log message structure, source, or purpose is required if there is enough data to train the classifier model. The developed diagnostic procedure has an accuracy score comparable with existing methods and can target all faults present in the training set without prior research into log structure.
50

Abud, Abdi, and Damon E. Houghton. "Derivation and Validation of Natural Language Processing Algorithms to Identify and Classify Venous Thrombotic Events from Lower Extremity Duplex Ultrasound Reports." Blood 138, Supplement 1 (November 5, 2021): 831. http://dx.doi.org/10.1182/blood-2021-144961.

Abstract:
Background: The precise anatomic location and extent of most venous thromboembolism (VTE) is not captured through diagnostic codes and is a major limitation to research. Correctly identifying the specific site of thrombosis, whether proximal or distal deep vein thrombosis (DVT) or superficial vein thrombosis (SVT) is critical. An accurate natural language processing (NLP) tool would make analysis of large datasets from electronic medical records possible and could be a significant improvement compared to analyses using the International Classification of Diseases codes. Using an open-source NLP tool, we evaluated existing algorithms and then created and validated new NLP algorithms to classify lower extremity thrombosis as not only a DVT or SVT, but also to identify more precise anatomic locations. Methods: A random sample of deidentified ultrasound reports were extracted from electronic medical records, manually reviewed, and classified as either positive or negative for a DVT or SVT (any chronicity). Reports with DVT were further classified into proximal (popliteal, femoral, deep femoral, common femoral, iliac veins, or vena cava) and distal (posterior tibial, anterior tibial, peroneal, soleal, or gastrocnemius veins) DVT. SVT was further classified into greater saphenous vein, small saphenous vein, or other site. Thrombosis near the junction of deep and superficial veins was only designated as involving specific vein segments if there was intraluminal thrombosis at that site. Initial sets of 100 ultrasound reports were used for derivation and reiterative testing of the algorithm, consisting of 50 that were positive for DVT/SVT and 50 that were negative for DVT/SVT. Text from radiology reports underwent cleaning to remove extraneous punctuation, text outside of the "findings" and "impression" section, and all text matches for "superficial femoral vein" were replaced with "femoral vein".
Using target phrases, the simple NLP tool either classified the reports as present or absent, signifying either positive or negative for thrombosis at the site of interest. After maximizing accuracy in the derivation cohorts, each NLP algorithm was tested in the complete, manually reviewed dataset. Sensitivity (Sn) and specificity (Sp) were calculated and confidence intervals were determined by the binomial exact method. Results: A total of 1206 ultrasounds were reviewed, among which there were 687 positives for DVT (503 had proximal DVT and 378 had distal DVT involvement). A total of 176 had SVT, 114 with involvement of the great saphenous vein (GSV) and 65 with involvement of the small saphenous vein (SSV). The previously published NLP (designed to identify DVT) had a poor Sn (45.0%), but reasonable Sp (91.0%) for DVT. Among the incorrect positive determinations (n=47), 33 were incorrectly positive due to presence of SVT (in the absence of DVT). Our newly developed NLP algorithm correctly identified DVT at any site (Sn: 96.2% 95% CI 94.5-97.5; 661/687 and Sp: 93.8% 95% CI 91.4-95.7; 487/519). Among ultrasounds positive for DVT, the NLP algorithm to determine proximal DVT site had a 96.0% Sn (95% CI 93.9-97.6; 483/503) and a 97.8% Sp (95% CI 94.5-99.4; 179/183). The algorithm for distal DVT had a 96.8% Sn (95% CI 94.5-98.4; 366/378) and a 96.1% Sp (95% CI 93.3-98.0; 296/308). The algorithm to identify SVT was also highly accurate (Sn 97.7%, 95% CI 94.3-99.4; 172/176 and Sp 95.5%, 95% CI 94.1-96.7; 984/1030). Among ultrasounds positive for SVT, the Sn to determine GSV site was 94.7% (95% CI 88.9-98.0; 108/114) and Sp was 95.1% (95% CI 86.3-99.0; 58/61). The algorithm to identify SSV site had a Sn of 98.5% (95% CI 91.7-99.9; 64/65) and a Sp of 100% (95% CI 96.7-100.0; 110/110).
Conclusion: Using a previously published, open source, simple NLP program, we have improved the sensitivity and specificity for identification of DVT and created an algorithm to accurately identify SVT. Using a multifaceted analysis approach, we were able to accurately further subclassify the anatomic location of thrombosis on ultrasound reports. This tool and the developed algorithms will allow analysis of large data sets with minimal effort and great accuracy, capitalizing on the power of large electronic datasets to offer new insights on pathophysiology and clinical prognosis. Disclosures: No relevant conflicts of interest to declare.
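The abstract above reports sensitivity and specificity with confidence intervals from "the binomial exact method". As a rough illustration only, the sketch below computes a proportion and a Clopper-Pearson interval, assuming that this is the interval the authors meant. The function names (`exact_ci`, `_tail`, `_bisect`) are hypothetical and are not taken from the authors' tool. It uses only the Python standard library.

```python
import math

def _log_pmf(i, n, p):
    # Log of the binomial probability P(X = i) for n trials with success rate p.
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log(1.0 - p))

def _tail(first, last, n, p):
    # P(first <= X <= last) for X ~ Binomial(n, p), with 0 < p < 1.
    return sum(math.exp(_log_pmf(i, n, p)) for i in range(first, last + 1))

def _bisect(pred, iters=50):
    # Find the p in (0, 1) where pred(p) switches from False to True.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if pred(mid):
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n (illustrative helper)."""
    # Lower bound: the p at which P(X >= k) reaches alpha / 2.
    lower = 0.0 if k == 0 else _bisect(lambda p: _tail(k, n, n, p) >= alpha / 2)
    # Upper bound: the p at which P(X <= k) drops to alpha / 2.
    upper = 1.0 if k == n else _bisect(lambda p: _tail(0, k, n, p) <= alpha / 2)
    return lower, upper

# Counts quoted in the abstract for DVT at any site: sensitivity 661/687.
sensitivity = 661 / 687
low, high = exact_ci(661, 687)
print(f"Sn = {sensitivity:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

The same helper applies to any of the other count pairs quoted in the abstract, for example `exact_ci(487, 519)` for the specificity of the any-site DVT algorithm.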