Dissertations / Theses on the topic 'Natural language processing (Computer science)'

1

Naphtal, Rachael (Rachael M.). "Natural language processing based nutritional application." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/100640.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 67-68).
The ability to accurately and efficiently track nutritional intake is a powerful tool in combating obesity and other food-related diseases. Currently, many methods used for this task are time consuming or easily abandoned; however, a natural language based application that converts spoken text to nutritional information could be a convenient and effective solution. This thesis describes the creation of an application that translates spoken food diaries into nutritional database entries. It explores different methods for solving the problem of converting brands, descriptions and food item names into entries in nutritional databases. Specifically, we constructed a cache of over 4,000 food items, and also created a variety of methods to allow refinement of database mappings. We also explored methods of dealing with ambiguous quantity descriptions and the mapping of spoken quantity values to numerical units. When assessed by 500 users entering their daily meals on Amazon Mechanical Turk, the system was able to map 83.8% of the correctly interpreted spoken food items to relevant nutritional database entries. It was also able to find a logical quantity for 92.2% of the correct food entries. Overall, this system shows a significant step towards the intelligent conversion of spoken food diaries to actual nutritional feedback.
by Rachael Naphtal.
M. Eng.
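The abstract above mentions mapping spoken quantity values to numerical units. As a rough illustration only, the following sketch shows one simple way such a mapping could look. The word lists, the `parse_quantity` function, and its return format are all hypothetical and are not taken from the thesis.

```python
from typing import Optional, Tuple

# Hypothetical lookup tables; a real system would need far larger vocabularies.
NUMBER_WORDS = {"a": 1.0, "an": 1.0, "one": 1.0, "two": 2.0, "three": 3.0,
                "half": 0.5, "quarter": 0.25}
UNIT_SINGULAR = {"cup": "cup", "cups": "cup", "glass": "glass", "glasses": "glass",
                 "slice": "slice", "slices": "slice", "bowl": "bowl", "bowls": "bowl"}


def parse_quantity(phrase: str) -> Tuple[Optional[float], Optional[str], str]:
    """Split a phrase into (amount, unit, food); amount and unit may be None."""
    tokens = phrase.lower().split()
    amount: Optional[float] = None
    unit: Optional[str] = None
    if tokens:
        if tokens[0] in NUMBER_WORDS:
            # Leading number word such as "two" or "half".
            amount = NUMBER_WORDS[tokens.pop(0)]
        else:
            try:
                # Leading digits such as "1.5".
                amount = float(tokens[0])
                tokens.pop(0)
            except ValueError:
                pass  # No recognizable quantity; leave amount as None.
    if tokens and tokens[0] in UNIT_SINGULAR:
        unit = UNIT_SINGULAR[tokens.pop(0)]
        if tokens and tokens[0] == "of":
            tokens.pop(0)  # Drop the connective in "cups of milk".
    return amount, unit, " ".join(tokens)
```

Calling `parse_quantity("two cups of milk")` separates the amount, the normalized unit, and the remaining food description. Ambiguous phrases such as "some rice" come back with no amount, which is where the refinement methods the abstract describes would be needed.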
2

Cosh, Kenneth John. "Supporting organisational semiotics with natural language processing techniques." Thesis, Lancaster University, 2003. http://eprints.lancs.ac.uk/12351/.

3

張少能 and Siu-nang Bruce Cheung. "A concise framework of natural language processing." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1989. http://hub.hku.hk/bib/B31208563.

4

Lei, Tao. "Interpretable neural models for natural language processing." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/108990.

Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 109-119).
The success of neural network models often comes at a cost of interpretability. This thesis addresses the problem by providing justifications behind the model's structure and predictions. In the first part of this thesis, we present a class of sequence operations for text processing. The proposed component generalizes from convolution operations and gated aggregations. As justifications, we relate this component to string kernels, i.e. functions measuring the similarity between sequences, and demonstrate how it encodes the efficient kernel computing algorithm into its structure. The proposed model achieves state-of-the-art or competitive results compared to alternative architectures (such as LSTMs and CNNs) across several NLP applications. In the second part, we learn rationales behind the model's prediction by extracting input pieces as supporting evidence. Rationales are tailored to be short and coherent, yet sufficient for making the same prediction. Our approach combines two modular components, generator and encoder, which are trained to operate well together. The generator specifies a distribution over text fragments as candidate rationales and these are passed through the encoder for prediction. Rationales are never given during training. Instead, the model is regularized by the desiderata for rationales. We demonstrate the effectiveness of this learning framework in applications such as multi-aspect sentiment analysis. Our method achieves a performance of over 90% when evaluated against manually annotated rationales.
by Tao Lei.
Ph. D.
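The abstract above describes string kernels as functions measuring the similarity between sequences. As a minimal sketch of that idea, the code below implements the p-spectrum kernel, which counts the contiguous n-grams two sequences share. This is a deliberately simple member of the string-kernel family, chosen for illustration; it is not the kernel or the neural component developed in the thesis, and the function names are assumptions.

```python
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def spectrum_kernel(a: Sequence[str], b: Sequence[str], n: int = 2) -> int:
    """p-spectrum kernel: inner product of the two n-gram count vectors."""
    counts_a, counts_b = ngram_counts(a, n), ngram_counts(b, n)
    # Counter returns 0 for n-grams missing from b, so only shared n-grams contribute.
    return sum(count * counts_b[gram] for gram, count in counts_a.items())
```

Two sentences that share more contiguous word pairs receive a larger kernel value, and a sequence shorter than `n` contributes no n-grams at all.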
5

Grinman, Alex J. "Natural language processing on encrypted patient data." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/113438.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 85-86).
While many industries can benefit from machine learning techniques for data analysis, they often do not have the technical expertise or computational power to do so. Therefore, many organizations would benefit from outsourcing their data analysis. Yet, stringent data privacy policies prevent outsourcing sensitive data and may stop the delegation of data analysis in its tracks. In this thesis, we put forth a two-party system where one party capable of powerful computation can run certain machine learning algorithms from the natural language processing domain on the second party's data, where the first party is limited to learning only specific functions of the second party's data and nothing else. Our system provides simple cryptographic schemes for locating keywords, matching approximate regular expressions, and computing frequency analysis on encrypted data. We present a full implementation of this system in the form of an extensible software library and a command line interface. Finally, we discuss a medical case study where we used our system to run a suite of unmodified machine learning algorithms on encrypted free text patient notes.
by Alex J. Grinman.
M. Eng.
6

Cheung, Siu-nang Bruce. "A concise framework of natural language processing." [Hong Kong : University of Hong Kong], 1989. http://sunzi.lib.hku.hk/hkuto/record.jsp?B12432544.

7

Shepherd, David. "Natural language program analysis: combining natural language processing with program analysis to improve software maintenance tools." Access to citation, abstract and download form provided by ProQuest Information and Learning Company; downloadable PDF file, 176 p., 2007. http://proquest.umi.com/pqdweb?did=1397920371&sid=6&Fmt=2&clientId=8331&RQT=309&VName=PQD.

8

Bajwa, Imran Sarwar. "A natural language processing approach to generate SBVR and OCL." Thesis, University of Birmingham, 2014. http://etheses.bham.ac.uk//id/eprint/4890/.

Abstract:
The Object Constraint Language (OCL) is a declarative language used to make Unified Modeling Language (UML) models well-defined by specifying a set of constraints. However, the syntactic complexity of OCL makes writing OCL code difficult. A natural language based interface can make writing OCL expressions simpler. Translating natural language (NL) text to OCL code is nevertheless challenging, because natural languages are informal and their syntactic and semantic ambiguities complicate translation to formal languages. In our approach, the use of SBVR gives natural language text a formal abstract syntax representation that is also close to OCL syntax. In this thesis, a framework is presented that lets users of UML tools write invariants and pre/post conditions in English. The results of the case studies show that a natural language based approach to generating OCL constraints not only significantly improves the usability of OCL but also outperforms the most closely related techniques in terms of effectiveness and the effort required to generate OCL.
9

Strandberg, Aron, and Patrik Karlström. "Processing Natural Language for the Spotify API : Are sophisticated natural language processing algorithms necessary when processing language in a limited scope?" Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186867.

Abstract:
Knowing whether you can implement something complex in a simple way in your application is always of interest. A natural language interface is something that could theoretically be implemented in a lot of applications, but the complexity of most natural language processing algorithms is a limiting factor. The problem explored in this paper is whether a simpler algorithm that doesn’t make use of convoluted statistical models and machine learning can be good enough. We implemented two algorithms, one utilizing Spotify’s own search and one with a more accurate, offline search. With the best precision we could muster being 81% at an average of 2.28 seconds per query, this is not a viable solution for a complete and satisfactory user experience. Further work could push the performance into an acceptable range.
10

Bigert, Johnny. "Automatic and unsupervised methods in natural language processing." Doctoral thesis, Stockholm, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-156.

11

Walker, Alden. "Natural language interaction with robots." Diss., Connect to the thesis, 2007. http://hdl.handle.net/10066/1275.

12

Xiao, Min. "Generalized Domain Adaptation for Sequence Labeling in Natural Language Processing." Diss., Temple University Libraries, 2016. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/391382.

Abstract:
Computer and Information Science
Ph.D.
Sequence labeling tasks, such as part-of-speech tagging, syntactic chunking, and dependency parsing, have been widely studied in natural language processing. Most such systems are developed on a large amount of labeled training data via supervised learning. However, manually collecting labeled training data is time-consuming and expensive. To alleviate label scarcity, domain adaptation has been proposed as an alternative: a statistical machine learning model is trained for a target domain that lacks sufficient labeled training data by exploiting existing, freely available labeled training data in a different but related source domain. The natural language processing community has witnessed the success of domain adaptation in a variety of sequence labeling tasks. Although labeled training data in the source domain are available and free, they are not identical to, and can be very different from, the test data in the target domain. Thus, simply applying naive supervised machine learning algorithms without considering domain differences may not serve the purpose. In this dissertation, we develop several novel representation learning approaches to address domain adaptation for sequence labeling in natural language processing. These representation learning techniques aim to induce latent generalizable features that bridge domain divergence and enable cross-domain prediction. We first tackle a semi-supervised domain adaptation scenario where the target domain has a small amount of labeled training data and propose a distributed representation learning approach based on a probabilistic neural language model. We then relax the assumption that labeled training data are available in the target domain and study an unsupervised domain adaptation scenario where the target domain has only unlabeled training data, giving a task-informative representation learning approach based on dynamic dependency networks.
Both works are developed in the setting where different domains contain sentences in different genres. We then extend and generalize domain adaptation to a more challenging scenario where different domains contain sentences in different languages and propose two cross-lingual representation learning approaches: one based on deep neural networks with auxiliary bilingual word pairs and the other based on annotation projection with auxiliary parallel sentences. All four learning scenarios are extensively evaluated on different sequence labeling tasks. The empirical results demonstrate the effectiveness of these generalized domain adaptation techniques for sequence labeling in natural language processing.
Temple University--Theses
13

Cline, Ben E. "Knowledge intensive natural language generation with revision." Diss., This resource online, 1994. http://scholar.lib.vt.edu/theses/available/etd-09092008-063657/.

14

Chen, Michelle W. "Comparison of natural language processing algorithms for medical texts." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/100298.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Title as it appears in MIT Commencement Exercises program, June 5, 2015: Comparison of NLP systems for medical text. Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 57-58).
With the large corpora of clinical texts, natural language processing (NLP) is growing to be a field that people are exploring to extract useful patient information. NLP applications in clinical medicine are especially important in domains where the clinical observations are crucial to define and diagnose the disease. There are a variety of different systems that attempt to match words and word phrases to medical terminologies. Because of the differences in annotation datasets and lack of common conventions, many of the systems yield conflicting results. The purpose of this thesis project is (1) to create a visual representation of how different concepts compare to each other when using various annotators and (2) to improve upon the NLP methods to yield terms with better fidelity to what the clinicians are trying to express.
by Michelle W. Chen.
M. Eng.
15

Chien, Isabel. "Natural language processing for precision clinical diagnostics and treatment." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/119754.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 61-65).
In this thesis, I focus on the application of natural language processing to clinical diagnostics and treatment within the palliative care and serious illness field. I explore a variety of natural language processing methods, including deep learning, rule-based, and classic machine learning approaches, and apply them to the identification of documentation reflecting advance care planning measures, serious illnesses, and serious illness symptoms. I introduce two tools that can be used to analyze clinical notes from electronic health records: ClinicalRegex, a regular expression interface, and PyCCI, a clinical text annotation tool. Additionally, I discuss a palliative care-focused research project in which I apply machine learning natural language processing methods to identifying clinical documentation in the palliative care and serious illness field. Advance care planning, which includes clarifying and documenting goals of care and preferences for future care, is essential for achieving end-of-life care that is consistent with the preferences of dying patients and their families. Physicians document their communication about these preferences as unstructured free text in clinical notes; as a result, routine assessment of this quality indicator is time consuming and costly. Integrating goals of care conversations and advance care planning into decision-making about palliative surgery has been shown to result in less invasive care near the time of death and to improve clinical outcomes for both the patient and surviving family members. Natural language processing methods offer an efficient and scalable way to improve the visibility of documented serious illness conversations within electronic health record data, helping to improve quality of care.
by Isabel Chien.
M. Eng.
16

Indovina, Donna Blodgett. "A natural language interface to MS-DOS." Online version of thesis, 1989. http://hdl.handle.net/1850/10548.

17

Shah, Aalok Bipin 1977. "Iteractive design and natural language processing in the WISE Project." Thesis, Massachusetts Institute of Technology, 1999. http://hdl.handle.net/1721.1/80118.

Abstract:
Thesis (S.B. and M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999.
Includes bibliographical references (p. 55-57).
by Aalok Bipin Shah.
S.B. and M.Eng.
18

Pham, Son Bao. "Incremental knowledge acquisition for natural language processing." Awarded by: University of New South Wales. School of Computer Science and Engineering, 2006. http://handle.unsw.edu.au/1959.4/26299.

Abstract:
Linguistic patterns have been used widely in shallow methods to develop numerous NLP applications. Approaches for acquiring linguistic patterns can be broadly categorised into three groups: supervised learning, unsupervised learning and manual methods. In supervised learning approaches, a large annotated training corpus is required for the learning algorithms to achieve decent results. However, annotated corpora are expensive to obtain and usually available only for established tasks. Unsupervised learning approaches usually start with a few seed examples and gather some statistics based on a large unannotated corpus to detect new examples that are similar to the seed ones. Most of these approaches either populate lexicons for predefined patterns or learn new patterns for extracting general factual information; hence they are applicable to only a limited number of tasks. Manually creating linguistic patterns has the advantage of utilising an expert's knowledge to overcome the scarcity of annotated data. In tasks with no annotated data available, the manual way seems to be the only choice. One typical problem that occurs with manual approaches is that the combination of multiple patterns, possibly being used at different stages of processing, often causes unintended side effects. Existing approaches, however, do not focus on the practical problem of acquiring those patterns but rather on how to use linguistic patterns for processing text. A systematic way to support the process of manually acquiring linguistic patterns in an efficient manner is long overdue. This thesis presents KAFTIE, an incremental knowledge acquisition framework that strongly supports experts in creating linguistic patterns manually for various NLP tasks. 
KAFTIE addresses difficulties in manually constructing knowledge bases of linguistic patterns, or rules in general, often faced in existing approaches by: (1) offering a systematic way to create new patterns while ensuring they are consistent; (2) alleviating the difficulty in choosing the right level of generality when creating a new pattern; (3) suggesting how existing patterns can be modified to improve the knowledge base's performance; (4) making the effort in creating a new pattern, or modifying an existing pattern, independent of the knowledge base's size. KAFTIE, therefore, makes it possible for experts to efficiently build large knowledge bases for complex tasks. This thesis also presents the KAFDIS framework for discourse processing using new representation formalisms: the level-of-detail tree and the discourse structure graph.
19

Li, Wenhui. "Sentiment analysis: Quantitative evaluation of subjective opinions using natural language processing." Thesis, University of Ottawa (Canada), 2008. http://hdl.handle.net/10393/28000.

Abstract:
Sentiment analysis consists of recognizing sentiment orientation towards specific subjects within natural language texts. Most research in this area focuses on classifying documents as positive or negative. The purpose of this thesis is to quantitatively evaluate subjective opinions in customer reviews using a five-star rating system, which is widely used on online review websites, and to make the predicted score as accurate as possible. First, this thesis presents two methods for rating reviews: classifying reviews with supervised learning methods, as in multi-class classification, or rating reviews using association scores of sentiment terms with a set of seed words extracted from the corpus, i.e. the unsupervised learning method. We extend the feature selection approach used in Turney's PMI-IR estimation by introducing semantic relatedness measures based on the content of WordNet. This thesis reports on experiments using the two methods mentioned above for rating reviews using the combined feature set enriched with WordNet-selected sentiment terms. The results of these experiments suggest ways in which incorporating WordNet relatedness measures into feature selection may yield improvement over classification and unsupervised learning methods which do not use it. Furthermore, via ordinal meta-classifiers, we utilize the ordering information contained in the scores of bank reviews to improve performance; we explore the effectiveness of re-sampling for reducing the problem of skewed data; and we check whether discretization benefits the ordinal meta-learning process. Finally, we combine the unsupervised and supervised meta-learning methods to optimize performance on our sentiment prediction task.
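The association scores mentioned in the abstract above build on pointwise mutual information (PMI). As a hedged illustration, the sketch below computes a PMI-based semantic orientation for a term from raw co-occurrence counts, in the spirit of Turney's PMI-IR: the term's association with a positive seed word minus its association with a negative seed word. The function names, argument layout, and the counts in the usage note are assumptions for illustration, and the sketch omits the WordNet relatedness extension the thesis proposes.

```python
import math


def pmi(joint: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information log2(p(x, y) / (p(x) * p(y))) from raw counts.

    All counts must be positive; smoothing for zero counts is left out of this sketch.
    """
    return math.log2((joint / total) / ((count_x / total) * (count_y / total)))


def semantic_orientation(near_pos: int, near_neg: int, count_term: int,
                         count_pos: int, count_neg: int, total: int) -> float:
    """Association with a positive seed minus association with a negative seed."""
    return (pmi(near_pos, count_term, count_pos, total)
            - pmi(near_neg, count_term, count_neg, total))
```

With made-up counts, a term that co-occurs with the positive seed four times as often as with the negative seed (and equally frequent seeds) gets an orientation of 2 bits. A score above zero suggests a positive term and a score below zero a negative one.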
20

Jarmasz, Mario. "'Roget's Thesaurus' as a lexical resource for natural language processing." Thesis, University of Ottawa (Canada), 2003. http://hdl.handle.net/10393/26493.

Abstract:
This dissertation presents an implementation of an electronic lexical knowledge base that uses the 1987 Penguin edition of Roget's Thesaurus as the source for its lexical material, making it the first implementation of a computerized Roget's to use an entire current edition. It explains the steps necessary for taking a machine-readable file and transforming it into a tractable system. Roget's organization is studied in detail and contrasted with WordNet's. We show two applications of the computerized Thesaurus: computing semantic similarity between words and phrases, and building lexical chains in a text. The experiments are performed using well-known benchmarks and the results are compared to those of other systems that use Roget's, WordNet and statistical techniques. Roget's has turned out to be an excellent resource for measuring semantic similarity; lexical chains are easily built but more difficult to evaluate. We also explain ways in which Roget's Thesaurus and WordNet can be combined.
21

Hu, Jin. "Explainable Deep Learning for Natural Language Processing." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254886.

Abstract:
Deep learning methods achieve impressive performance in many Natural Language Processing (NLP) tasks, but it is still difficult to know what happens inside a deep neural network. This thesis gives a general overview of Explainable AI and of how explainable deep learning methods are applied to NLP tasks. It then introduces the bi-directional LSTM and CRF (BiLSTM-CRF) model for the Named Entity Recognition (NER) task, as well as an approach to making this model explainable. An approach is proposed that uses Layer-wise Relevance Propagation (LRP) to visualize the importance of neurons in the BiLSTM layer of the NER model; it can measure how neurons contribute to each prediction of a word in a sequence. Ideas about how to measure the influence of the CRF layer of the BiLSTM-CRF model are also described.
22

O'Sullivan, John J. D. "Teach2Learn : gamifying education to gather training data for natural language processing." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/117320.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 65-66).
Teach2Learn is a website which crowd-sources the problem of labeling natural text samples using gamified education as an incentive. Students assign labels to text samples from an unlabeled data set, thereby teaching supervised machine learning algorithms how to interpret new samples. In return, students can learn how that algorithm works by unlocking lessons written by researchers. This aligns the incentives of researchers and learners to help both achieve their goals. The application used current best practices in gamification to create a motivating structure around that labeling task. Testing showed that 27.7% of the user base (5/18 users) engaged with the content and labeled enough samples to unlock all of the lessons, suggesting that learning modules are sufficient motivation for the right users. Attempts to grow the platform through paid social media advertising were unsuccessful, likely because users aren't looking for a class when they browse those sites. Unpaid posts on subreddits discussing related topics, where users were more likely to be searching for learning opportunities, were more successful. Future research should seek users through comparable sites and explore how Teach2Learn can be used as an additional learning resource in classrooms.
by John J.D. O'Sullivan
M. Eng.
23

Forsyth, Alexander William. "Improving clinical decision making with natural language processing and machine learning." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/112847.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 49-53).
This thesis focused on two tasks of applying natural language processing (NLP) and machine learning to electronic health records (EHRs) to improve clinical decision making. The first task was to predict cardiac resynchronization therapy (CRT) outcomes with better precision than the current physician guidelines for recommending the procedure. We combined NLP features from free-text physician notes with structured data to train a supervised classifier to predict CRT outcomes. While our results gave a slight improvement over the current baseline, we were not able to predict CRT outcome with both high precision and high recall. These results limit the clinical applicability of our model, and reinforce previous work, which also could not find accurate predictors of CRT response. The second task in this thesis was to extract breast cancer patient symptoms during chemotherapy from free-text physician notes. We manually annotated about 10,000 sentences, and trained a conditional random field (CRF) model to predict whether a word indicated a symptom (positive label), specifically indicated the absence of a symptom (negative label), or was neutral. Our final model achieved 0.66, 1.00, and 0.77 F1 scores for predicting positive, neutral, and negative labels respectively. While the F1 scores for positive and negative labels are not extremely high, with the current performance, our model could be applied, for example, to gather better statistics about what symptoms breast cancer patients experience during chemotherapy and at what time points during treatment they experience these symptoms.
by Alexander William Forsyth.
M. Eng.
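The abstract above reports separate F1 scores for the positive, neutral, and negative labels. For readers unfamiliar with the metric, the sketch below computes per-label F1 (the harmonic mean of precision and recall) from token-level gold and predicted labels. The function name and the toy labels in the usage note are illustrative assumptions; this is not the evaluation code used in the thesis.

```python
from typing import Dict, Sequence


def per_label_f1(gold: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """Return the F1 score of each label, treating each label as one-vs-rest."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    scores: Dict[str, float] = {}
    for label in sorted(set(gold) | set(predicted)):
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)  # true positives
        fp = sum(g != label and p == label for g, p in pairs)  # false positives
        fn = sum(g == label and p != label for g, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores
```

For example, with gold labels `["positive", "neutral", "neutral", "negative", "neutral"]` and one neutral token mispredicted as positive, the positive label has precision 0.5 and recall 1.0, giving an F1 of about 0.67. This shows how a label can score low on F1 even when every true instance is found.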
24

Manek, Meenakshi. "Natural language interface to a VHDL modeling tool." Thesis, This resource online, 1993. http://scholar.lib.vt.edu/theses/available/etd-06232009-063212/.

25

Watanabe, Kiyoshi. "Visible language : repetition and its artistic presentation with the computers." Thesis, Georgia Institute of Technology, 1997. http://hdl.handle.net/1853/17664.

26

Cohn, Trevor A. "Scaling conditional random fields for natural language processing." Connect to thesis, 2007. http://eprints.unimelb.edu.au/archive/00002874.

27

Keller, Thomas Anderson. "Comparison and Fine-Grained Analysis of Sequence Encoders for Natural Language Processing." Thesis, University of California, San Diego, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10599339.

Abstract:

Most machine learning algorithms require a fixed length input to be able to perform commonly desired tasks such as classification, clustering, and regression. For natural language processing, the inherently unbounded and recursive nature of the input poses a unique challenge when deriving such fixed length representations. Although today there is a general consensus on how to generate fixed length representations of individual words which preserve their meaning, the same cannot be said for sequences of words in sentences, paragraphs, or documents. In this work, we study the encoders commonly used to generate fixed length representations of natural language sequences, and analyze their effectiveness across a variety of high and low level tasks including sentence classification and question answering. Additionally, we propose novel improvements to the existing Skip-Thought and End-to-End Memory Network architectures and study their performance on both the original and auxiliary tasks. Ultimately, we show that the setting in which the encoders are trained, and the corpus used for training, have a greater influence on the final learned representation than the underlying sequence encoders themselves.
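
The fixed-length requirement described in this abstract can be illustrated with the simplest possible sequence encoder: averaging word vectors. This is a baseline sketch only, not the Skip-Thought or End-to-End Memory Network encoders studied in the thesis; the function `mean_pool` and the two-dimensional toy embeddings are assumptions made for illustration.

```python
def mean_pool(tokens, embeddings, dim):
    """Average the vectors of known tokens into one vector of length `dim`."""
    total = [0.0] * dim
    count = 0
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is None:
            continue  # skip out-of-vocabulary tokens
        for i in range(dim):
            total[i] += vec[i]
        count += 1
    if count == 0:
        return total  # all-zero vector when no token is known
    return [x / count for x in total]


# Hypothetical toy embeddings (real ones would have hundreds of dimensions).
TOY = {"the": [0.0, 1.0], "cat": [1.0, 0.0], "sat": [1.0, 1.0]}

short = mean_pool(["the", "cat"], TOY, 2)                # 2 tokens -> 2 numbers
longer = mean_pool(["the", "cat", "sat", "the"], TOY, 2)  # 4 tokens -> 2 numbers
```

Both inputs map to a vector of the same length regardless of how many words they contain, at the cost of discarding word order, which is the limitation that learned sequence encoders try to address.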

28

Thompson, Cynthia Ann. "Semantic lexicon acquisition for learning natural language interfaces /." Digital version accessible at:, 1998. http://wwwlib.umi.com/cr/utexas/main.

29

Schäfer, Ulrich. "Integrating deep and shallow natural language processing components : representations and hybrid architectures /." Saarbrücken : German Research Center for Artificial Intelligence : Saarland University, Dept. of Computational Linguistics and Phonetics, 2007. http://www.loc.gov/catdir/toc/fy1001/2008384333.html.

30

Berman, Lucy. "Lewisian Properties and Natural Language Processing: Computational Linguistics from a Philosophical Perspective." Scholarship @ Claremont, 2019. https://scholarship.claremont.edu/cmc_theses/2200.

Abstract:
Nothing seems more obvious than that our words have meaning. When people speak to each other, they exchange information through the use of a particular set of words. The words they say to each other, moreover, are about something. Yet this relation of “aboutness,” known as “reference,” is not quite as simple as it appears. In this thesis I will present two opposing arguments about the nature of our words and how they relate to the things around us. First, I will present Hilary Putnam’s argument, in which he examines the indeterminacy of reference, forcing us to conclude that we must abandon metaphysical realism. While Putnam considers his argument to be a refutation of non-epistemicism, David Lewis takes it to be a reductio, claiming Putnam’s conclusion is incredible. I will present Lewis’s response to Putnam, in which he accepts the challenge of demonstrating how Putnam’s argument fails and rescuing us from the abandonment of realism. In order to explain the determinacy of reference, Lewis introduces the concept of “natural properties.” In the final chapter of this thesis, I will propose another use for Lewisian properties. Namely, that of helping to minimize the gap between natural language processing and human communication.
31

Huber, Bernard J. Jr. "A knowledge-based approach to understanding natural language. /." Online version of thesis, 1991. http://hdl.handle.net/1850/11053.

32

Välme, Emma, and Lea Renmarker. "Accelerating Sustainability Report Assessment with Natural Language Processing." Thesis, Uppsala universitet, Avdelningen för visuell information och interaktion, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-445912.

Abstract:
Corporations are expected to be transparent about their sustainability impact and to keep their stakeholders informed about how large their impact on the environment is, as well as about their work on reducing that impact. This transparency is accounted for in a, usually voluntary, sustainability report that supplements the already required financial report. With new regulations for mandatory sustainability reporting in Sweden, comprehensive and complete guidelines for corporations to follow are insufficient and the reports tend to be extensive. The reports are therefore hard to assess in terms of how well the reporting is actually done. The Sustainability Reporting Maturity Grid (SRMG) is an assessment tool introduced by Cöster et al. (2020) for assessing the quality of sustainability reporting. Today, the assessment is performed manually, which has proven to be time-consuming and to produce varying assessments, affected by individual interpretation of the content. This thesis explores how assessment time and grading with the SRMG can be improved by applying Natural Language Processing (NLP) to sustainability documents, resulting in a compressed assessment method: the Prototype. The Prototype is intended to facilitate and speed up the assessment process. The first step towards developing the Prototype was to decide which of three machine learning models, Naïve Bayes (NB), Support Vector Machines (SVM), or Bidirectional Encoder Representations from Transformers (BERT), is most suitable. This decision was supported by analyzing the accuracy for each model and for the respective criteria in the SRMG, where BERT showed strong classification ability with an average accuracy of 96.8%. Results from the user evaluation of the Prototype indicated that the assessment time can be halved using the Prototype, from an initial average of 40 minutes down to 20 minutes.
However, the results further showed a decreased average grading and an increased variation in assessment. The results indicate that applying NLP could be successful, but to get a more competitive Prototype, a more nuanced dataset must be developed, giving more space for the model to detect patterns in the data.
33

Linckels, Serge, and Christoph Meinel. "An e-librarian service : natural language interface for an efficient semantic search within multimedia resources." Universität Potsdam, 2005. http://opus.kobv.de/ubp/volltexte/2009/3308/.

Abstract:
1 Introduction
  1.1 Project formulation
  1.2 Our contribution
2 Pedagogical Aspect
  2.1 Modern teaching
  2.2 Our Contribution
    2.2.1 Autonomous and exploratory learning
    2.2.2 Human machine interaction
    2.2.3 Short multimedia clips
3 Ontology Aspect
  3.1 Ontology driven expert systems
  3.2 Our contribution
    3.2.1 Ontology language
    3.2.2 Concept Taxonomy
    3.2.3 Knowledge base annotation
    3.2.4 Description Logics
4 Natural language approach
  4.1 Natural language processing in computer science
  4.2 Our contribution
    4.2.1 Explored strategies
    4.2.2 Word equivalence
    4.2.3 Semantic interpretation
    4.2.4 Various problems
5 Information Retrieval Aspect
  5.1 Modern information retrieval
  5.2 Our contribution
    5.2.1 Semantic query generation
    5.2.2 Semantic relatedness
6 Implementation
  6.1 Prototypes
  6.2 Semantic layer architecture
  6.3 Development
7 Experiments
  7.1 Description of the experiments
  7.2 General characteristics of the three sessions, instructions and procedure
  7.3 First Session
  7.4 Second Session
  7.5 Third Session
  7.6 Discussion and conclusion
8 Conclusion and future work
  8.1 Conclusion
  8.2 Open questions
A Description Logics
B Probabilistic context-free grammars
34

Lazic, Marko. "Using Natural Language Processing to extract information from receipt text." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279302.

Abstract:
The ability to automatically read, recognize, and extract different information from unstructured text is of key importance to many areas. Most research in this area has been focused on scanned invoices. This thesis investigates the feasibility of using natural language processing to extract information from receipt text. Three different machine learning models, BiLSTM, GCN, and BERT, were trained to extract a total of 7 different data points from a dataset consisting of 790 receipts. In addition, a simple rule-based model was built to serve as a baseline. These four models were then compared on how well they perform on different data points. The best performing machine learning model was BERT with an overall F1 score of 0.455. The second best machine learning model was BiLSTM with an F1 score of 0.278, and GCN had an F1 score of 0.167. These F1 scores are highly affected by the low performance on the product list, which was observed with all three models. BERT showed promising results on vendor name, date, tax rate, price, and currency. However, a simple rule-based method was able to outperform the BERT model on all data points except vendor name and tax rate. Receipt images from the dataset were often blurred, rotated, and crumpled, which introduced a high OCR error. This error then propagated through all of the steps and was most likely the main reason why the machine learning models, especially BERT, were not able to perform. It is concluded that there is potential in using natural language processing for the problem of information extraction. However, further research is needed if it is going to outperform the rule-based models.
Abstract (translated from Swedish):
The ability to automatically read, recognize, and extract information from unstructured text is of crucial importance to many areas. Most research in the area has focused on scanned invoices. This thesis investigates whether language technology can be used to extract information from receipt text. Three machine learning models, BiLSTM, GCN, and BERT, were trained to extract a total of 7 different data points from a dataset of 790 receipts. In addition, a simple rule-based model was built as a reference. These four models were then compared on how well they performed on the different data points. The best-performing machine learning model was BERT, with an F1 score of 0.455. The second-best model was BiLSTM with an F1 score of 0.278, while GCN had an F1 score of 0.167. These results are strongly affected by the low performance on the product list, which was observed with all three models. BERT showed promising results on vendor name, date, tax, price, and currency. However, the rule-based model had better results on all data points except vendor name and tax. Receipt images in the dataset are often blurry or rotated and show crumpled receipts, which results in a high error rate in the machine-reading tool. This error then propagated through all steps and was probably the main reason why the machine learning models, especially BERT, could not perform. In summary, it can be concluded that using language technology to extract information from receipt text has potential. Further research is needed, however, if it is to be used instead of rule-based models.
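To make the idea of a rule-based baseline concrete, the sketch below extracts a date, a total, and a currency from receipt text with regular expressions. The abstract does not describe the rules the thesis actually used, so the patterns, the function `extract_fields`, and the sample receipt are all hypothetical.

```python
import re

# Assumed formats: ISO dates, and amounts with two decimals plus an optional
# three-letter currency code. Real receipts vary far more than this.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
AMOUNT_RE = re.compile(r"(\d+[.,]\d{2})\s*([A-Z]{3})?")


def extract_fields(receipt_text):
    """Return a dict with 'date', 'total' and 'currency' (None when not found)."""
    fields = {"date": None, "total": None, "currency": None}
    date_match = DATE_RE.search(receipt_text)
    if date_match:
        fields["date"] = date_match.group(0)
    for line in receipt_text.splitlines():
        if "total" in line.lower():
            amount = AMOUNT_RE.search(line)
            if amount:
                # Accept both decimal comma and decimal point.
                fields["total"] = float(amount.group(1).replace(",", "."))
                fields["currency"] = amount.group(2)
                break
    return fields


# Hypothetical receipt text, as it might come out of an OCR step.
sample = "Corner Shop\n2020-03-14 12:01\nMilk 12,90\nTOTAL 45,50 SEK\n"
clean = extract_fields(sample)

# A single OCR misread (letter O read as digit 0) is enough to break the rule.
noisy = extract_fields(sample.replace("TOTAL", "T0TAL"))
```

The second call shows how OCR errors of the kind mentioned in the abstract propagate: the keyword rule no longer fires, so no total is found even though the date is still recovered.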
35

Chandra, Yohan. "Natural Language Interfaces to Databases." Thesis, University of North Texas, 2006. https://digital.library.unt.edu/ark:/67531/metadc5474/.

Abstract:
Natural language interfaces to databases (NLIDB) are systems that aim to bridge the gap between the languages used by humans and computers, and automatically translate natural language sentences to database queries. This thesis proposes a novel approach to NLIDB, using graph-based models. The system starts by collecting as much information as possible from existing databases and sentences, and transforms this information into a knowledge base for the system. Given a new question, the system will use this knowledge to analyze and translate the sentence into its corresponding database query statement. The graph-based NLIDB system uses English as the natural language, a relational database model, and SQL as the formal query language. In experiments performed with natural language questions run against a large database containing information about U.S. geography, the system showed good performance compared to the state-of-the-art in the field.
36

Custy, E. John. "An architecture for the semantic processing of natural language input to a policy workbench." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2003. http://library.nps.navy.mil/uhtbin/hyperion-image/03Mar%5FCusty.pdf.

Abstract:
Thesis (M.S. in Software Engineering)--Naval Postgraduate School, March 2003.
Thesis advisor(s): James Bret Michael, Neil C. Rowe. Includes bibliographical references (p. 91-92). Also available online.
37

Dua, Smrite. "Introducing Semantic Role Labels and Enhancing Dependency Parsing to Compute Politeness in Natural Language." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1430876809.

38

Dulle, John David. "A caption-based natural-language interface handling descriptive captions for a multimedia database system." Thesis, Monterey, California : Naval Postgraduate School, 1990. http://handle.dtic.mil/100.2/ADA236533.

Abstract:
Thesis (M.S. in Computer Science)--Naval Postgraduate School, June 1990.
Thesis Advisor(s): Lum, Vincent Y. ; Rowe, Neil C. "June 1990." Description based on signature page. DTIC Identifiers: Interfaces, natural language, databases, theses. Author(s) subject terms: Natural language processing, multimedia database system, natural language interface, descriptive captions. Includes bibliographical references (p. 27).
39

Califf, Mary Elaine. "Relational learning techniques for natural language information extraction /." Digital version accessible at:, 1998. http://wwwlib.umi.com/cr/utexas/main.

40

Ramachandran, Venkateshwaran. "A temporal analysis of natural language narrative text." Thesis, This resource online, 1990. http://scholar.lib.vt.edu/theses/available/etd-03122009-040648/.

41

Han, Yo-Sub. "Regular languages and codes /." View abstract or full-text, 2005. http://library.ust.hk/cgi/db/thesis.pl?COMP%202005%20HAN.

42

Byström, Adam. "From Intent to Code : Using Natural Language Processing." Thesis, Uppsala universitet, Avdelningen för datalogi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-325238.

Abstract:
Programming and the possibility to express one’s intent to a machine is becoming a very important skill in our digitalizing society. Today, instructing a machine, such as a computer, to perform actions is done through programming. What if this could be done with human language? This thesis examines how new technologies and methods in the form of Natural Language Processing can be used to make programming more accessible by translating intent expressed in natural language into code that a computer can execute. Related research has studied using natural language as a programming language and using natural language to instruct robots. These studies have shown promising results but are hindered by strict syntaxes, limited domains and an inability to handle ambiguity. Studies have also been made using Natural Language Processing to analyse source code, turning code into natural language. This thesis takes the reverse approach. By utilizing Natural Language Processing techniques, an intent can be translated into code containing concepts such as sequential execution, loops and conditional statements. In this study, a system for converting intent, expressed in English sentences, into code is developed. To analyse this approach to programming, an evaluation framework is developed, evaluating the system during the development process as well as usage of the final system. The results show that this way of programming might have potential but conclude that the Natural Language Processing models still have too low accuracy. Further research is required to increase this accuracy to further assess the potential of this way of programming.
43

González, Alejandro. "A Swedish Natural Language Processing Pipeline For Building Knowledge Graphs." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254363.

Abstract:
The concept of knowledge is proper only to the human being thanks to the faculty of understanding. The immaterial concepts, independent of the material causes of the experience, constitute an evident proof of the existence of the rational soul that makes the human being a spiritual being, in a way independent of the material. Nowadays research efforts in the field of Artificial Intelligence are trying to mimic this human capacity using computers by means of "teaching" them how to read and understand human language using Machine Learning techniques related to the processing of human language. However, there are still a significant number of challenges, such as how to represent this knowledge so that it can be used by a machine to infer conclusions or provide answers. This thesis presents a Natural Language Processing pipeline that is capable of building a knowledge representation of the information contained in Swedish human-generated text. The result is a system that, given Swedish text in its raw format, builds a representation in the form of a Knowledge Graph of the knowledge or information contained in that text.
Abstract (translated from Swedish):
Awareness of knowledge is part of what defines the modern human being (who knows that she knows). Immaterial concepts, independent of material attributes, are part of the evidence that the human being is a spiritual being that is to some extent independent of the material. Currently, research efforts in artificial intelligence are trying to mimic human behaviour using computers by "teaching" them how to read and understand human language using machine learning techniques related to the processing of human language. However, there are still a significant number of challenges, for example how to represent this knowledge so that it can be used by a machine to draw conclusions or give answers based on it. This thesis presents a study of the use of Natural Language Processing in a pipeline that can generate a knowledge representation of information with the Swedish language as its basis. The result is a system that, given Swedish text in raw format, builds a representation, in the form of a knowledge graph, of the knowledge or information in that text.
44

Das, Dipanjan. "Semi-Supervised and Latent-Variable Models of Natural Language Semantics." Research Showcase @ CMU, 2012. http://repository.cmu.edu/dissertations/342.

Abstract:
This thesis focuses on robust analysis of natural language semantics. A primary bottleneck for semantic processing of text lies in the scarcity of high-quality and large amounts of annotated data that provide complete information about the semantic structure of natural language expressions. In this dissertation, we study statistical models tailored to solve problems in computational semantics, with a focus on modeling structure that is not visible in annotated text data. We first investigate supervised methods for modeling two kinds of semantic phenomena in language. First, we focus on the problem of paraphrase identification, which attempts to recognize whether two sentences convey the same meaning. Second, we concentrate on shallow semantic parsing, adopting the theory of frame semantics (Fillmore, 1982). Frame semantics offers deep linguistic analysis that exploits the use of lexical semantic properties and relationships among semantic frames and roles. Unfortunately, the datasets used to train our paraphrase and frame-semantic parsing models are too small to lead to robust performance. Therefore, a common trait in our methods is the hypothesis of hidden structure in the data. To this end, we employ conditional log-linear models over structures, that are firstly capable of incorporating a wide variety of features gathered from the data as well as various lexica, and secondly use latent variables to model missing information in annotated data. Our approaches towards solving these two problems achieve state-of-the-art accuracy on standard corpora. For the frame-semantic parsing problem, we present fast inference techniques for jointly modeling the semantic roles of a given predicate. We experiment with linear program formulations, and use a commercial solver as well as an exact dual decomposition technique that breaks the role labeling problem into several overlapping components. 
Continuing with the theme of hypothesizing hidden structure in data for modeling natural language semantics, we present methods to leverage large volumes of unlabeled data to improve upon the shallow semantic parsing task. We work within the framework of graph-based semi-supervised learning, a powerful method that associates similar natural language types, and helps propagate supervised annotations to unlabeled data. We use this framework to improve frame-semantic parsing performance on unknown predicates that are absent in annotated data. We also present a family of novel objective functions for graph-based learning that result in sparse probability measures over graph vertices, a desirable property for natural language types. Not only are these objectives easier to numerically optimize, but they also result in smoothed distributions over predicates that are smaller in size. The experiments presented in this dissertation empirically demonstrate that missing information in text corpora contains considerable semantic information that can be incorporated into structured models for semantics, to significant benefit over the current state of the art. The methods in this thesis were originally presented by Das and Smith (2009, 2011, 2012), and Das et al. (2010, 2012). The thesis gives a more thorough exposition, relating and comparing the methods, and also presents several extensions of the aforementioned papers.
45

Ramos, Brás Juan Ariel. "Natural language processing and translation using augmented transition networks and semantic networks." Diss., Connect to the thesis, 2003. http://hdl.handle.net/10066/1480.

46

Kakavandy, Hanna, and John Landeholt. "How natural language processing can be used to improve digital language learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-281693.

Abstract:
The world is facing globalization and with that, companies are growing and need to hire according to their needs. A great obstacle for this is the language barrier between job applicants and employers who want to hire competent candidates. One spark of light in this challenge is Lingio, which provides a product that teaches digital profession-specific Swedish. Lingio intends to make its existing product more interactive, and this research paper aims to research aspects involved in that. This study evaluates system utterances that are planned to be used in Lingio’s product for language learners to use in their practice and studies the feasibility of using the natural language model cosine similarity in classifying the correctness of answers to these utterances. This report also looks at whether it is best to use crowd-sourced material or a golden standard as the benchmark for a correct answer. The results indicate that there are a number of improvements and developments that need to be made to the model in order for it to accurately classify answers, due to its formulation and the complexity of human language. It is also concluded that the utterances by Lingio might need to be further developed in order to be efficient in their use for learning language and that crowd-sourced material works better than a golden standard. The study makes several interesting observations from the collected data and analysis, aiming to contribute to further research in natural language engineering when it comes to text classification and digital language learning.
Abstract (translated from Swedish):
Globalization brings a number of consequences for growing companies. One of the challenges companies face is hiring enough competent staff. For many companies, the language barrier stands between them and hiring competence; job seekers often lack the language skills needed to manage the job. Lingio is a company that works with exactly this: its product is a digital application that teaches profession-specific Swedish, an effective solution for those who want to focus their language learning ahead of a job. The aim is to help Lingio in the development of its product, more specifically in the work of making it more interactive. This is done by examining the effectiveness of the application's utterances used for learning purposes and by using a language-technology model to classify a user's answer to an utterance. Furthermore, it is analyzed whether it is best to use a golden standard or material collected from surveys as the reference point for a correct utterance. The results show that the model has several weaknesses and needs to be developed in order to classify correctly, and that there is room for improvement in the utterances. It is also shown that material collected from surveys works better than a golden standard.
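The use of cosine similarity to judge a learner's answer against reference answers can be sketched as follows. The abstract does not say how the texts were vectorised, so the bag-of-words counts, the threshold of 0.5, and the names `cosine_similarity` and `is_acceptable` are assumptions for illustration only.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between bag-of-words count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


def is_acceptable(answer, references, threshold=0.5):
    """Accept the answer if it is close enough to at least one reference."""
    best = max((cosine_similarity(answer, r) for r in references), default=0.0)
    return best >= threshold


# `references` could hold a single gold-standard answer or many crowd-sourced ones.
references = ["good morning everyone", "hello"]
accepted = is_acceptable("good morning", references)
rejected = is_acceptable("thanks", references)
```

With word counts as features, two answers that mean the same thing but share no words score zero, which hints at why a larger, more varied set of crowd-sourced references might work better than a single gold answer.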
47

Mahamood, Saad Ali. "Generating affective natural language for parents of neonatal infants." Thesis, University of Aberdeen, 2010. http://digitool.abdn.ac.uk:80/webclient/DeliveryManager?pid=158569.

Abstract:
The thesis presented here describes original research in the field of Natural Language Generation (NLG). NLG is the subfield of artificial intelligence that is concerned with the automatic production of documents from underlying data. This thesis focuses in particular on developing novel methods for generating text that take the recipient’s level of stress into account as a factor in adapting the resulting textual output. Accounting for the recipient’s level of stress was particularly salient because of the domain in which this research was conducted: providing information for parents of pre-term infants in the neonatal intensive care unit (NICU), a highly technical and stressful environment for parents, where emotional sensitivity must be shown in the nature of the information presented. We investigated the emotional and informational needs of these parents through an extensive literature review and two separate research studies with former and current NICU parents. The NLG system built for this research is called BabyTalk Family (BT-Family). It produces, for parents, a textual summary of the medical events that have occurred for a baby in the NICU in the last twenty-four hours. The novelty of this system is that it is capable of estimating the recipient’s level of stress and, by using several affective NLG strategies, is able to tailor its output for a stressed audience, unlike traditional NLG systems, whose output remains unchanged regardless of the recipient’s emotional state. The key innovation in this system was the integration of several affective strategies in the Document Planner for tailoring textual output for stressed recipients. BT-Family’s output was evaluated with thirteen parents who had previously had a baby in neonatal care. 
We developed an evaluation methodology that involved a direct comparison between stressed and unstressed text for the same medical scenario on variables such as preference, understandability, helpfulness, and emotional appropriateness. The results obtained showed that the parents overwhelmingly preferred the stressed text for all of the variables measured.
48

Augustsson, Christopher. "Multipurpose Case-Based Reasoning System, Using Natural Language Processing." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-104890.

Abstract:
Working as a field technician of any sort can often be a challenging task. Often you find yourself alone, with a machine you have limited knowledge about, and the only support you have is the user manuals. As a result, it is not uncommon for companies to aid the technicians with a knowledge base that often revolves around some share point. But, unfortunately, the share points quickly get cluttered with too much information, which leaves the user overwhelmed. Case-based reasoning (CBR), a form of problem-solving technology, uses previous cases to help users solve new problems they encounter, which could benefit the field technician. But for a CBR system to work with a wide variety of machines, the system must have a dynamic nature and handle multiple data types. By developing a prototype focusing on case retrieval, based on .NET Core and MySQL, this report sets the foundation for a highly dynamic CBR system that uses natural language processing to map case attributes during case retrieval. In addition, using datasets from UCI and Kaggle, the system's accuracy is validated, and by using a dataset created explicitly for this report, the system is shown to be robust.
49

Buys, Jan Moolman. "Incremental generative models for syntactic and semantic natural language processing." Thesis, University of Oxford, 2017. https://ora.ox.ac.uk/objects/uuid:a9a7b5cf-3bb1-4e08-b109-de06bf387d1d.

Abstract:
This thesis investigates the role of linguistically-motivated generative models of syntax and semantic structure in natural language processing (NLP). Syntactic well-formedness is crucial in language generation, but most statistical models do not account for the hierarchical structure of sentences. Many applications exhibiting natural language understanding rely on structured semantic representations to enable querying, inference and reasoning. Yet most semantic parsers produce domain-specific or inadequately expressive representations. We propose a series of generative transition-based models for dependency syntax which can be applied as both parsers and language models while being amenable to supervised or unsupervised learning. Two models are based on Markov assumptions commonly made in NLP: The first is a Bayesian model with hierarchical smoothing, the second is parameterised by feed-forward neural networks. The Bayesian model enables careful analysis of the structure of the conditioning contexts required for generative parsers, but the neural network is more accurate. As a language model the syntactic neural model outperforms both the Bayesian model and n-gram neural networks, pointing to the complementary nature of distributed and structured representations for syntactic prediction. We propose approximate inference methods based on particle filtering. The third model is parameterised by recurrent neural networks (RNNs), dropping the Markov assumptions. Exact inference with dynamic programming is made tractable here by simplifying the structure of the conditioning contexts. We then shift the focus to semantics and propose models for parsing sentences to labelled semantic graphs. We introduce a transition-based parser which incrementally predicts graph nodes (predicates) and edges (arguments). This approach is contrasted against predicting top-down graph traversals. 
RNNs and pointer networks are key components in approaching graph parsing as an incremental prediction problem. The RNN architecture is augmented to condition the model explicitly on the transition system configuration. We develop a robust parser for Minimal Recursion Semantics, a linguistically-expressive framework for compositional semantics which has previously been parsed only with grammar-based approaches. Our parser is much faster than the grammar-based model, while the same approach improves the accuracy of neural Abstract Meaning Representation parsing.
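As a rough illustration of what "transition-based" means in this abstract, the sketch below runs a hand-written sequence of transitions to build a dependency tree. It assumes the textbook arc-standard system, which may differ from the transition systems defined in the thesis, and it omits the generative model that would choose each transition; the function name and example sentence are hypothetical.

```python
def run_arc_standard(words, transitions):
    """Apply arc-standard transitions and return (head, dependent) index arcs.

    Word indices start at 1; index 0 is an artificial ROOT node.
    """
    stack = [0]
    buffer = list(range(1, len(words) + 1))
    arcs = []
    for action in transitions:
        if action == "SHIFT":
            # Move the next input word onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            # Second-from-top becomes a dependent of the top item.
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        elif action == "RIGHT-ARC":
            # Top item becomes a dependent of the item below it.
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
        else:
            raise ValueError(f"unknown transition: {action}")
    return arcs


words = ["She", "reads", "books"]
# In a trained parser each of these choices would be predicted, not listed by hand.
transitions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]
arcs = run_arc_standard(words, transitions)
```

Here the resulting arcs make "reads" the head of both "She" and "books" and attach "reads" to ROOT. Because the tree is built one transition at a time, a model that scores each transition can serve both as a parser and, once it also generates the shifted words, as a language model.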
50

Botha, Gerrti Reinier. "Text-based language identification for the South African languages." Pretoria : [s.n.], 2007. http://upetd.up.ac.za/thesis/available/etd-090942008-133715/.
