Doctoral dissertations on the topic "Natural Language Processing"

Follow this link to see other types of publications on this topic: Natural Language Processing.

Create a correct reference in APA, MLA, Chicago, Harvard, and many other citation styles.


Browse the top 50 doctoral dissertations on the topic "Natural Language Processing".

Next to every entry in the bibliography there is an "Add to bibliography" button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication as a ".pdf" file and read its abstract online, whenever the relevant details are available in the metadata.

Browse doctoral dissertations from a wide variety of disciplines and compile an appropriate bibliography.

1

Matsubara, Shigeki. "Corpus-based Natural Language Processing". INTELLIGENT MEDIA INTEGRATION NAGOYA UNIVERSITY / COE, 2004. http://hdl.handle.net/2237/10355.

2

Smith, Sydney. "Approaches to Natural Language Processing". Scholarship @ Claremont, 2018. http://scholarship.claremont.edu/cmc_theses/1817.

Abstract:
This paper explores topic modeling through the example text of Alice in Wonderland. It explores both singular value decomposition and non-negative matrix factorization as methods for feature extraction. The paper goes on to explore methods for partially supervised implementation of topic modeling through introducing themes. A large portion of the paper also focuses on the implementation of these techniques in Python, as well as visualizations of the results, which use a combination of Python, HTML, and JavaScript along with the D3 framework. The paper concludes by presenting a mixture of SVD, NMF, and partially supervised NMF as a possible way to improve topic modeling.
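As a rough illustration of the factorization approach described above (not the author's own code), the following sketch extracts topics with scikit-learn's NMF; the toy corpus and all parameter choices are invented for the example.

```python
# Hypothetical sketch of NMF-based topic modeling, in the spirit of the
# abstract above; documents and parameters are illustrative only.
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "alice followed the white rabbit down the hole",
    "the queen of hearts shouted at the gardeners",
    "the mad hatter poured tea at the endless party",
]

# Term-document weights; NMF then factorizes them into topics x terms.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

nmf = NMF(n_components=2, random_state=0)
doc_topics = nmf.fit_transform(X)           # document-topic weights
terms = vectorizer.get_feature_names_out()  # vocabulary

for k, component in enumerate(nmf.components_):
    top = component.argsort()[-3:][::-1]    # three strongest terms per topic
    print(f"topic {k}:", [terms[i] for i in top])
```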
3

Strandberg, Aron, and Patrik Karlström. "Processing Natural Language for the Spotify API : Are sophisticated natural language processing algorithms necessary when processing language in a limited scope?" Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186867.

Abstract:
Knowing whether you can implement something complex in a simple way in your application is always of interest. A natural language interface is something that could theoretically be implemented in many applications, but the complexity of most natural language processing algorithms is a limiting factor. The problem explored in this paper is whether a simpler algorithm that doesn't make use of convoluted statistical models and machine learning can be good enough. We implemented two algorithms, one utilizing Spotify's own search and one with a more accurate, offline search. With the best precision we could muster being 81% at an average of 2.28 seconds per query, this is not a viable solution for a complete and satisfactory user experience. Further work could push the performance into an acceptable range.
4

Chen, Joseph C. H. "Quantum computation and natural language processing". [S.l.] : [s.n.], 2002. http://deposit.ddb.de/cgi-bin/dokserv?idn=965581020.

5

Knight, Sylvia Frances. "Natural language processing for aerospace documentation". Thesis, University of Cambridge, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.621395.

6

Naphtal, Rachael (Rachael M.). "Natural language processing based nutritional application". Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/100640.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 67-68).
The ability to accurately and efficiently track nutritional intake is a powerful tool in combating obesity and other food related diseases. Currently, many methods used for this task are time consuming or easily abandoned; however, a natural language based application that converts spoken text to nutritional information could be a convenient and effective solution. This thesis describes the creation of an application that translates spoken food diaries into nutritional database entries. It explores different methods for solving the problem of converting brands, descriptions and food item names into entries in nutritional databases. Specifically, we constructed a cache of over 4,000 food items, and also created a variety of methods to allow refinement of database mappings. We also explored methods of dealing with ambiguous quantity descriptions and the mapping of spoken quantity values to numerical units. When assessed by 500 users entering their daily meals on Amazon Mechanical Turk, the system was able to map 83.8% of the correctly interpreted spoken food items to relevant nutritional database entries. It was also able to find a logical quantity for 92.2% of the correct food entries. Overall, this system shows a significant step towards the intelligent conversion of spoken food diaries to actual nutritional feedback.
by Rachael Naphtal.
M. Eng.
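The matching step this abstract describes, mapping a spoken food phrase onto database entries, can be approximated with simple fuzzy string matching. This is only a sketch of the general idea, not Naphtal's system; the tiny database, the lookup helper, and the cutoff are all hypothetical.

```python
# Hypothetical sketch: fuzzy-matching a spoken food description to
# database entries. The toy "database" and cutoff are illustrative only.
from difflib import get_close_matches

NUTRITION_DB = {
    "cheddar cheese": {"calories": 113, "unit": "slice"},
    "whole milk": {"calories": 149, "unit": "cup"},
    "banana": {"calories": 105, "unit": "medium"},
}

def lookup(spoken: str, cutoff: float = 0.6):
    """Return the best-matching database entry for a spoken food item."""
    matches = get_close_matches(spoken.lower(), list(NUTRITION_DB),
                                n=1, cutoff=cutoff)
    return (matches[0], NUTRITION_DB[matches[0]]) if matches else None

print(lookup("chedar cheese"))  # tolerates the misspelling
```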
7

Eriksson, Simon. "COMPARING NATURAL LANGUAGE PROCESSING TO STRUCTURED QUERY LANGUAGE ALGORITHMS". Thesis, Umeå universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-163310.

Abstract:
Using natural language processing to create Structured Query Language (SQL) queries has many benefits in theory. Even though SQL is an expressive and powerful language, it requires certain technical knowledge to use. An interface effectively utilizing natural language processing would instead allow the user to communicate with the SQL database as if they were communicating with another human being. In this paper I compare how two of the currently most advanced open source algorithms (TypeSQL and SyntaxSQL) in this field can understand advanced SQL. I show that SyntaxSQL is significantly more accurate but makes some sacrifices in execution time compared to TypeSQL.
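Neither TypeSQL's nor SyntaxSQL's internals are shown here, but the task they solve can be illustrated with a deliberately naive pattern-based translator; the grammar and the employees table are invented for the example.

```python
# Toy illustration of the NL-to-SQL task itself (not TypeSQL/SyntaxSQL):
# a single hard-coded pattern over an invented "employees" table.
import re

PATTERN = re.compile(r"show me (\w+) of employees in (\w+)", re.IGNORECASE)

def to_sql(question: str) -> str:
    m = PATTERN.match(question)
    if not m:
        raise ValueError("question not covered by this toy grammar")
    column, department = m.groups()
    # Real systems infer schema links and guard against injection;
    # this sketch does neither.
    return f"SELECT {column} FROM employees WHERE department = '{department}';"

print(to_sql("show me salaries of employees in marketing"))
# SELECT salaries FROM employees WHERE department = 'marketing';
```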
8

Kesarwani, Vaibhav. "Automatic Poetry Classification Using Natural Language Processing". Thesis, Université d'Ottawa / University of Ottawa, 2018. http://hdl.handle.net/10393/37309.

Abstract:
Poetry, as a special form of literature, is crucial for computational linguistics. It has a high density of emotions, figures of speech, vividness, creativity, and ambiguity. Poetry poses a much greater challenge for the application of Natural Language Processing algorithms than any other literary genre. Our system establishes a computational model that classifies poems based on similarity features like rhyme, diction, and metaphor. For rhyme analysis, we investigate the methods used to classify poems based on rhyme patterns. First, an overview of different types of rhymes is given, along with a detailed description of detecting rhyme types and sub-types by applying a pronunciation dictionary to our poetry dataset. We achieve an accuracy of 96.51% in identifying rhymes in poetry by applying a phonetic similarity model. We then derive a rhyme quantification metric, RhymeScore, based on the matching phonetic transcription of each poem. We also develop an application for the visualization of this quantified RhymeScore as a scatter plot in 2 or 3 dimensions. For diction analysis, we investigate the methods used to classify poems based on diction. First, the linguistic, quantitative, and semantic features that constitute diction are enumerated. Then we investigate the methodology used to compute these features from our poetry dataset. We also build a word embeddings model on our poetry dataset, with 1.5 million words in 100 dimensions, and perform a comparative analysis with GloVe embeddings. Metaphor is a part of diction, but as it is a very complex topic in its own right, we address it as a stand-alone issue and develop several methods for it. Previous work on metaphor detection relies on either rule-based or statistical models, none of them applied to poetry. Our methods focus on metaphor detection in a poetry corpus, but we test on non-poetry data as well. We combine rule-based and statistical models (word embeddings) to develop a new classification system. Our first metaphor detection method achieves a precision of 0.759 and a recall of 0.804 in identifying one type of metaphor in poetry, using a Support Vector Machine classifier with various types of features. Furthermore, our deep learning model based on a Convolutional Neural Network achieves a precision of 0.831 and a recall of 0.836 for the same task. We also develop an application for generic metaphor detection in any type of natural text.
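The dictionary-based rhyme detection described above can be sketched as follows. This is not the thesis's implementation; it uses NLTK's CMU Pronouncing Dictionary and a deliberately simple notion of perfect rhyme (identical phonemes from the last stressed vowel onward).

```python
# Minimal sketch of dictionary-based rhyme detection (assumed approach,
# not the thesis code). Requires: nltk.download("cmudict").
from nltk.corpus import cmudict

pron = cmudict.dict()

def rhyme_part(word):
    """Phonemes from the last stressed vowel to the end of the word."""
    phones = pron.get(word.lower())
    if not phones:
        return None
    first = phones[0]
    # Stressed vowels carry a 1 or 2 digit in CMUdict (e.g. 'EY1').
    for i in range(len(first) - 1, -1, -1):
        if first[i][-1] in "12":
            return tuple(first[i:])
    return tuple(first)

def is_perfect_rhyme(a, b):
    ra, rb = rhyme_part(a), rhyme_part(b)
    return ra is not None and ra == rb

print(is_perfect_rhyme("bright", "night"))  # True
print(is_perfect_rhyme("bright", "brick"))  # False
```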
9

Pham, Son Bao, Computer Science & Engineering, Faculty of Engineering, UNSW. "Incremental knowledge acquisition for natural language processing". Awarded by: University of New South Wales. School of Computer Science and Engineering, 2006. http://handle.unsw.edu.au/1959.4/26299.

Abstract:
Linguistic patterns have been used widely in shallow methods to develop numerous NLP applications. Approaches for acquiring linguistic patterns can be broadly categorised into three groups: supervised learning, unsupervised learning and manual methods. In supervised learning approaches, a large annotated training corpus is required for the learning algorithms to achieve decent results. However, annotated corpora are expensive to obtain and usually available only for established tasks. Unsupervised learning approaches usually start with a few seed examples and gather some statistics based on a large unannotated corpus to detect new examples that are similar to the seed ones. Most of these approaches either populate lexicons for predefined patterns or learn new patterns for extracting general factual information; hence they are applicable to only a limited number of tasks. Manually creating linguistic patterns has the advantage of utilising an expert's knowledge to overcome the scarcity of annotated data. In tasks with no annotated data available, the manual way seems to be the only choice. One typical problem that occurs with manual approaches is that the combination of multiple patterns, possibly being used at different stages of processing, often causes unintended side effects. Existing approaches, however, do not focus on the practical problem of acquiring those patterns but rather on how to use linguistic patterns for processing text. A systematic way to support the process of manually acquiring linguistic patterns in an efficient manner is long overdue. This thesis presents KAFTIE, an incremental knowledge acquisition framework that strongly supports experts in creating linguistic patterns manually for various NLP tasks. KAFTIE addresses difficulties in manually constructing knowledge bases of linguistic patterns, or rules in general, often faced in existing approaches by: (1) offering a systematic way to create new patterns while ensuring they are consistent; (2) alleviating the difficulty in choosing the right level of generality when creating a new pattern; (3) suggesting how existing patterns can be modified to improve the knowledge base's performance; (4) making the effort in creating a new pattern, or modifying an existing pattern, independent of the knowledge base's size. KAFTIE, therefore, makes it possible for experts to efficiently build large knowledge bases for complex tasks. This thesis also presents the KAFDIS framework for discourse processing using new representation formalisms: the level-of-detail tree and the discourse structure graph.
10

張少能 and Siu-nang Bruce Cheung. "A concise framework of natural language processing". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1989. http://hub.hku.hk/bib/B31208563.

11

Cahill, Lynne Julie. "Syllable-based morphology for natural language processing". Thesis, University of Sussex, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.386529.

Abstract:
This thesis addresses the problem of accounting for morphological alternation within Natural Language Processing. It proposes an approach to morphology which is based on phonological concepts, in particular the syllable, in contrast to morpheme-based approaches which have standardly been used by both NLP and linguistics. It is argued that morpheme-based approaches, within both linguistics and NLP, grew out of the apparently purely affixational morphology of European languages, and especially English, but are less appropriate for non-affixational languages such as Arabic. Indeed, it is claimed that even accounts of those European languages miss important linguistic generalizations by ignoring more phonologically based alternations, such as umlaut in German and ablaut in English. To justify this approach, we present a wide range of data from languages as diverse as German and Rotuman. A formal language, MOLUSC, is described, which allows for the definition of declarative mappings between syllable-sequences, and accounts of non-trivial fragments of the inflectional morphology of English, Arabic and Sanskrit are presented, to demonstrate the capabilities of the language. A semantics for the language is defined, and the implementation of an interpreter is described. The thesis discusses theoretical (linguistic) issues, as well as implementational issues involved in the incorporation of MOLUSC into a larger lexicon system. The approach is contrasted with previous work in computational morphology, in particular finite-state morphology, and its relation to other work in the fields of morphology and phonology is also discussed.
12

Lei, Tao Ph D. Massachusetts Institute of Technology. "Interpretable neural models for natural language processing". Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/108990.

Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 109-119).
The success of neural network models often comes at a cost of interpretability. This thesis addresses the problem by providing justifications behind the model's structure and predictions. In the first part of this thesis, we present a class of sequence operations for text processing. The proposed component generalizes from convolution operations and gated aggregations. As justifications, we relate this component to string kernels, i.e. functions measuring the similarity between sequences, and demonstrate how it encodes the efficient kernel computing algorithm into its structure. The proposed model achieves state-of-the-art or competitive results compared to alternative architectures (such as LSTMs and CNNs) across several NLP applications. In the second part, we learn rationales behind the model's prediction by extracting input pieces as supporting evidence. Rationales are tailored to be short and coherent, yet sufficient for making the same prediction. Our approach combines two modular components, generator and encoder, which are trained to operate well together. The generator specifies a distribution over text fragments as candidate rationales and these are passed through the encoder for prediction. Rationales are never given during training. Instead, the model is regularized by the desiderata for rationales. We demonstrate the effectiveness of this learning framework in applications such as multi-aspect sentiment analysis. Our method achieves a performance of over 90% evaluated against manually annotated rationales.
by Tao Lei.
Ph. D.
13

Grinman, Alex J. "Natural language processing on encrypted patient data". Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/113438.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 85-86).
While many industries can benefit from machine learning techniques for data analysis, they often do not have the technical expertise nor the computational power to do so. Therefore, many organizations would benefit from outsourcing their data analysis. Yet, stringent data privacy policies prevent outsourcing sensitive data and may stop the delegation of data analysis in its tracks. In this thesis, we put forth a two-party system where one party capable of powerful computation can run certain machine learning algorithms from the natural language processing domain on the second party's data, where the first party is limited to learning only specific functions of the second party's data and nothing else. Our system provides simple cryptographic schemes for locating keywords, matching approximate regular expressions, and computing frequency analysis on encrypted data. We present a full implementation of this system in the form of an extendible software library and a command line interface. Finally, we discuss a medical case study where we used our system to run a suite of unmodified machine learning algorithms on encrypted free text patient notes.
by Alex J. Grinman.
M. Eng.
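One standard way to locate keywords over encrypted records, plausibly in the spirit of the "simple cryptographic schemes" mentioned above though not necessarily the thesis's actual construction, is to index deterministic HMAC tokens of keywords instead of plaintext. A minimal sketch, with the key and all names hypothetical:

```python
# Hedged sketch of searchable keyword tokens via HMAC; this illustrates
# the general technique, not the thesis's actual scheme.
import hmac
import hashlib

SECRET_KEY = b"demo-key-known-only-to-data-owner"  # placeholder

def keyword_token(word: str) -> bytes:
    """Deterministic token: the same keyword always maps to the same tag."""
    return hmac.new(SECRET_KEY, word.lower().encode(), hashlib.sha256).digest()

# The data owner indexes tokens, never plaintext words.
note = "patient reports chest pain and shortness of breath"
index = {keyword_token(w) for w in note.split()}

# The server can test membership of a queried keyword's token
# without ever seeing the underlying words.
print(keyword_token("pain") in index)    # True
print(keyword_token("fever") in index)   # False
```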
14

Alharthi, Haifa. "Natural Language Processing for Book Recommender Systems". Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39134.

Abstract:
The act of reading has benefits for individuals and societies, yet studies show that reading declines, especially among the young. Recommender systems (RSs) can help stop such decline. There is a lot of research regarding literary books using natural language processing (NLP) methods, but the analysis of textual book content to improve recommendations is relatively rare. We propose content-based recommender systems that extract elements learned from book texts to predict readers’ future interests. One factor that influences reading preferences is writing style; we propose a system that recommends books after learning their authors’ writing style. To our knowledge, this is the first work that transfers the information learned by an author-identification model to a book RS. Another approach that we propose uses over a hundred lexical, syntactic, stylometric, and fiction-based features that might play a role in generating high-quality book recommendations. Previous book RSs include very few stylometric features; hence, our study is the first to include and analyze a wide variety of textual elements for book recommendations. We evaluated both approaches according to a top-k recommendation scenario. They give better accuracy when compared with state-of-the-art content and collaborative filtering methods. We highlight the significant factors that contributed to the accuracy of the recommendations using a forest of randomized regression trees. We also conducted a qualitative analysis by checking if similar books/authors were annotated similarly by experts. Our content-based systems suffer from the new user problem, well-known in the field of RSs, that hinders their ability to make accurate recommendations. Therefore, we propose a Topic Model-Based book recommendation component (TMB) that addresses the issue by using the topics learned from a user’s shared text on social media, to recognize their interests and map them to related books. To our knowledge, there is no literature regarding book RSs that exploits public social networks other than book-cataloging websites. Using topic modeling techniques, extracting user interests can be automatic and dynamic, without the need to search for predefined concepts. Though TMB is designed to complement other systems, we evaluated it against a traditional book CB. We assessed the top k recommendations made by TMB and CB and found that both retrieved a comparable number of books, even though CB relied on users’ rating history, while TMB only required their social profiles.
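A conventional content-based baseline of the kind this thesis compares against can be sketched with TF-IDF vectors and cosine similarity; the book texts below are toy stand-ins, and the thesis's own stylometric feature set is far richer.

```python
# Illustrative content-based top-k recommendation (a generic baseline,
# not Alharthi's system); texts are toy stand-ins for book content.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

books = {
    "Sea Tale": "storm sail harbor whale voyage",
    "Desert Song": "dune caravan oasis sand mirage",
    "Ocean Drift": "wave tide sail current voyage",
}
titles = list(books)
X = TfidfVectorizer().fit_transform(books.values())

sims = cosine_similarity(X[0], X).ravel()   # similarity to "Sea Tale"
ranked = sims.argsort()[::-1][1:]           # skip the book itself
print("read next:", [titles[i] for i in ranked])
```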
15

Medlock, Benjamin William. "Investigating classification for natural language processing tasks". Thesis, University of Cambridge, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.611949.

16

Huang, Yin Jou. "Event Centric Approaches in Natural Language Processing". Doctoral thesis, Kyoto University, 2021. http://hdl.handle.net/2433/265210.

17

Woldemariam, Yonas Demeke. "Natural language processing in cross-media analysis". Licentiate thesis, Umeå universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-147640.

Abstract:
A cross-media analysis framework is an integrated multi-modal platform where a media resource containing different types of data such as text, images, audio and video is analyzed with metadata extractors, working jointly to contextualize the media resource. It generally provides cross-media analysis and automatic annotation, metadata publication and storage, search and recommendation services. For on-line content providers, such services allow them to semantically enhance a media resource with extracted metadata representing the hidden meanings and make it more efficiently searchable. Within the architecture of such frameworks, Natural Language Processing (NLP) infrastructures cover a substantial part. The NLP infrastructures include text analysis components such as a parser, named entity extraction and linking, sentiment analysis and automatic speech recognition. Since NLP tools and techniques are originally designed to operate in isolation, integrating them in cross-media frameworks and analyzing textual data extracted from multimedia sources is very challenging. In particular, the text extracted from audio-visual content lacks linguistic features that potentially provide important clues for text analysis components. Thus, there is a need to develop various techniques to meet the requirements and design principles of the frameworks. In this thesis, we explore developing various methods and models satisfying the text and speech analysis requirements posed by cross-media analysis frameworks. The developed methods allow the frameworks to extract linguistic knowledge of various types and predict information such as sentiment and competence. We also attempt to enhance the multilingualism of the frameworks by designing an analysis pipeline that includes speech recognition, transliteration and named entity recognition for Amharic, which also makes Amharic content on the web more efficiently accessible. The method can potentially be extended to support other under-resourced languages.
18

Cheung, Siu-nang Bruce. "A concise framework of natural language processing /". [Hong Kong : University of Hong Kong], 1989. http://sunzi.lib.hku.hk/hkuto/record.jsp?B12432544.

19

Dawborn, Timothy James. "DOCREP: Document Representation for Natural Language Processing". Thesis, The University of Sydney, 2015. http://hdl.handle.net/2123/14767.

Abstract:
The field of natural language processing (NLP) revolves around the computational interpretation and generation of natural language. The language typically processed in NLP occurs in paragraphs or documents rather than in single isolated sentences. Despite this, most NLP tools operate over one sentence at a time, utilising neither the context outside of the sentence nor any of the metadata associated with the underlying document. One pragmatic reason for this disparity is that representing documents and their annotations through an NLP pipeline is difficult with existing infrastructure. Representing linguistic annotations for a text document using a plain text markup-based format is not sufficient to capture arbitrarily nested and overlapping annotations. Despite this, most linguistic text corpora and NLP tools still operate in this fashion. A document representation framework (DRF) supports the creation of linguistic annotations stored separately from the original document, overcoming this nesting and overlapping annotations problem. Despite the prevalence of pipelines in NLP, there is little published work on, or implementations of, DRFs. The main DRFs, GATE and UIMA, exhibit usability issues which have limited their uptake by the NLP community. This thesis aims to solve this problem through a novel, modern DRF, DOCREP; a portmanteau of document representation. DOCREP is designed to be efficient, programming language and environment agnostic, and most importantly, easy to use. We want DOCREP to be powerful and simple enough to use that NLP researchers and language technology application developers would even use it in their own small projects instead of developing their own ad hoc solutions. This thesis begins by presenting the design criteria for our new DRF, extending upon existing requirements from the literature with additional usability and efficiency requirements that should lead to greater use of DRFs. We outline how our new DRF, DOCREP, differs from existing DRFs in terms of the data model, serialisation strategy, developer interactions, support for rapid prototyping, and the expected runtime and environment requirements. We then describe our provided implementations of DOCREP in Python, C++, and Java, the most common languages in NLP; outlining their efficiency, idiomaticity, and the ways in which these implementations satisfy our design requirements. We then present two different evaluations of DOCREP. First, we evaluate its ability to model complex linguistic corpora through the conversion of the OntoNotes 5 corpus to DOCREP and UIMA, outlining the differences in modelling approaches required and efficiency when using these two DRFs. Second, we evaluate DOCREP against our usability requirements from the perspective of a computational linguist who is new to DOCREP. We walk through a number of common use cases for working with text corpora and contrast traditional approaches against their DOCREP counterparts. These two evaluations conclude that DOCREP satisfies our outlined design requirements and outperforms existing DRFs in terms of efficiency, and most importantly, usability. With DOCREP designed and evaluated, we then show how NLP applications can harness document structure. We present a novel document structure-aware tokenization framework for the first stage of full-stack NLP systems. We then present a new structure-aware NER system which achieves state-of-the-art results on multiple standard NER evaluations. The tokenization framework produces its tokenization, sentence boundary, and document structure annotations as native DOCREP annotations. The NER system consumes DOCREP annotations and utilises many components of the DOCREP runtime. We believe that the adoption of DOCREP throughout the NLP community will assist in the reproducibility of results, substitutability of components, and overall quality assurance of NLP systems and corpora, all of which are problematic areas within NLP research and applications. This adoption will make developing and combining NLP components into applications faster, more efficient, and more reliable.
20

Miao, Yishu. "Deep generative models for natural language processing". Thesis, University of Oxford, 2017. http://ora.ox.ac.uk/objects/uuid:e4e1f1f9-e507-4754-a0ab-0246f1e1e258.

Abstract:
Deep generative models are essential to Natural Language Processing (NLP) due to their outstanding ability to use unlabelled data, to incorporate abundant linguistic features, and to learn interpretable dependencies among data. As the structure becomes deeper and more complex, having an effective and efficient inference method becomes increasingly important. In this thesis, neural variational inference is applied to carry out inference for deep generative models. While traditional variational methods derive an analytic approximation for the intractable distributions over latent variables, here we construct an inference network conditioned on the discrete text input to provide the variational distribution. The powerful neural networks are able to approximate complicated non-linear distributions and grant the possibilities for more interesting and complicated generative models. Therefore, we develop the potential of neural variational inference and apply it to a variety of models for NLP with continuous or discrete latent variables. This thesis is divided into three parts. Part I introduces a generic variational inference framework for generative and conditional models of text. For continuous or discrete latent variables, we apply a continuous reparameterisation trick or the REINFORCE algorithm to build low-variance gradient estimators. To further explore Bayesian non-parametrics in deep neural networks, we propose a family of neural networks that parameterise categorical distributions with continuous latent variables. Using the stick-breaking construction, an unbounded categorical distribution is incorporated into our deep generative models, which can be optimised by stochastic gradient back-propagation with a continuous reparameterisation. Part II explores continuous latent variable models for NLP. Chapter 3 discusses the Neural Variational Document Model (NVDM): an unsupervised generative model of text which aims to extract a continuous semantic latent variable for each document. In Chapter 4, the neural topic models modify the neural document models by parameterising categorical distributions with continuous latent variables, where the topics are explicitly modelled by discrete latent variables. The models are further extended to neural unbounded topic models with the help of the stick-breaking construction, and a truncation-free variational inference method is proposed based on a Recurrent Stick-breaking construction (RSB). Chapter 5 describes the Neural Answer Selection Model (NASM) for learning a latent stochastic attention mechanism to model the semantics of question-answer pairs and predict their relatedness. Part III discusses discrete latent variable models. Chapter 6 introduces latent sentence compression models. The Auto-encoding Sentence Compression Model (ASC), as a discrete variational auto-encoder, generates a sentence by a sequence of discrete latent variables representing explicit words. The Forced Attention Sentence Compression Model (FSC) incorporates a combined pointer network biased towards the usage of words from the source sentence, which significantly improves the performance when jointly trained with the ASC model in a semi-supervised learning fashion. Chapter 7 describes the Latent Intention Dialogue Models (LIDM) that employ a discrete latent variable to learn underlying dialogue intentions. Additionally, the latent intentions can be interpreted as actions guiding the generation of machine responses, which could be further refined autonomously by reinforcement learning.
Finally, Chapter 8 summarizes our findings and directions for future work.
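All of these models are trained by maximizing a variational lower bound. For reference, the standard form of that objective (generic notation, not quoted from the thesis) for a generative model p_theta(x|z) with inference network q_phi(z|x) is:

```latex
% Standard evidence lower bound (ELBO) maximized in neural variational
% inference; notation is generic rather than the thesis's own.
\log p_\theta(x) \ge
  \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
  - \mathrm{KL}\bigl(q_\phi(z \mid x)\,\|\,p(z)\bigr)
```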
21

Hu, Jin. "Explainable Deep Learning for Natural Language Processing". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254886.

Abstract:
Deep learning methods achieve impressive performance in many Natural Language Processing (NLP) tasks, but it is still difficult to know what happens inside a deep neural network. This thesis gives a general overview of Explainable AI and of how explainable deep learning methods are applied to NLP tasks. It then introduces the Bi-directional LSTM and CRF (BiLSTM-CRF) model for the Named Entity Recognition (NER) task, as well as the approach taken to make this model explainable. An approach is proposed to visualize the importance of neurons in the Bi-LSTM layer of the model for NER by Layer-wise Relevance Propagation (LRP), which can measure how neurons contribute to each prediction of a word in a sequence. Ideas about how to measure the influence of the CRF layer of the BiLSTM-CRF model are also described.
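As a hint of how Layer-wise Relevance Propagation redistributes a prediction score onto inputs, here is the epsilon rule for a single linear layer in plain NumPy. This is a generic LRP building block, not the thesis's BiLSTM-CRF implementation.

```python
# Epsilon-LRP for one linear layer: relevance R_j at the output is
# redistributed to inputs in proportion to their contributions z_ij.
# Generic illustration, not the BiLSTM-CRF code from the thesis.
import numpy as np

def lrp_linear(a, W, b, R_out, eps=1e-6):
    """a: inputs (d_in,), W: weights (d_in, d_out), R_out: (d_out,)."""
    z = a @ W + b                     # pre-activations z_j
    denom = z + eps * np.sign(z)      # stabilized denominator
    contrib = a[:, None] * W          # z_ij = a_i * w_ij
    return (contrib / denom) @ R_out  # R_i = sum_j (z_ij / z_j) * R_j

a = np.array([1.0, -0.5, 0.25])
W = np.random.default_rng(0).normal(size=(3, 2))
R_in = lrp_linear(a, W, b=np.zeros(2), R_out=np.array([1.0, 0.0]))
print(R_in, R_in.sum())  # relevance is (approximately) conserved
```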
22

Gainon de Forsan de Gabriac, Clara. "Deep Natural Language Processing for User Representation". Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS274.

Abstract:
The last decade has witnessed the impressive expansion of Deep Learning (DL) methods, both in academic research and the private sector. This success can be explained by the ability of DL to model ever more complex entities. In particular, Representation Learning methods focus on building latent representations from heterogeneous data that are versatile and re-usable, namely in Natural Language Processing (NLP). In parallel, the ever-growing number of systems relying on user data brings its own lot of challenges. This work proposes methods to leverage the representation power of NLP in order to learn rich and versatile user representations. Firstly, we detail the works and domains associated with this thesis. We study Recommendation. We then go over recent NLP advances and how they can be applied to leverage user-generated texts, before detailing Generative models. Secondly, we present a Recommender System (RS) that is based on the combination of a traditional Matrix Factorization (MF) representation method and a sentiment analysis model. The association of those modules forms a dual model that is trained on user reviews for rating prediction. Experiments show that, on top of improving performance, the model allows us to better understand what the user is really interested in in a given item, as well as to provide explanations for the suggestions made. Finally, we introduce a new task centered on user representation: Professional Profile Learning. We thus propose an NLP-based framework to learn and evaluate professional profiles on different tasks, including next-job generation.
23

Guy, Alison. "Logical expressions in natural language conditionals". Thesis, University of Sunderland, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.278644.

24

Walker, Alden. "Natural language interaction with robots". Diss., Connect to the thesis, 2007. http://hdl.handle.net/10066/1275.

25

Fuchs, Gil Emanuel. "Practical natural language processing question answering using graphs /". Diss., Digital Dissertations Database. Restricted to UC campuses, 2004. http://uclibs.org/PID/11984.

26

Kolak, Okan. "Rapid resource transfer for multilingual natural language processing". College Park, Md. : University of Maryland, 2005. http://hdl.handle.net/1903/3182.

Abstract:
Thesis (Ph. D.) -- University of Maryland, College Park, 2005.
Thesis research directed by: Dept. of Linguistics. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
27

Takeda, Koichi. "Building Natural Language Processing Applications Using Descriptive Models". 京都大学 (Kyoto University), 2010. http://hdl.handle.net/2433/120372.

28

Åkerud, Daniel, and Henrik Rendlo. "Natural Language Processing from a Software Engineering Perspective". Thesis, Blekinge Tekniska Högskola, Avdelningen för programvarusystem, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-2056.

Abstract:
This thesis deals with questions related to the processing of naturally occurring texts, also known as natural language processing (NLP). The subject is approached from a software engineering perspective, and the problem description is formulated accordingly. The thesis is roughly divided into two major parts. The first part contains a literature study covering fundamental concepts and algorithms. We discuss both serial and parallel architectures, and conclude that different scenarios call for different architectures. The second part is an empirical evaluation of an NLP framework or toolkit, chosen amongst a few, conducted in order to elucidate the theoretical part of the thesis. We argue that component-based development in a portable language could increase reusability in the NLP community, where reuse is currently low. The recent emergence of these initiatives and the great potential of many applications in this area suggest a bright future for NLP.
29

Byström, Adam. "From Intent to Code : Using Natural Language Processing". Thesis, Uppsala universitet, Avdelningen för datalogi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-325238.

Abstract:
Programming, and the ability to express one's intent to a machine, is becoming a very important skill in our digitalizing society. Today, instructing a machine such as a computer to perform actions is done through programming. What if this could be done with human language? This thesis examines how new technologies and methods in the form of Natural Language Processing can be used to make programming more accessible by translating intent expressed in natural language into code that a computer can execute. Related research has studied using natural language as a programming language and using natural language to instruct robots. These studies have shown promising results but are hindered by strict syntaxes, limited domains and an inability to handle ambiguity. Studies have also been made using Natural Language Processing to analyse source code, turning code into natural language. This thesis has the reversed approach. By utilizing Natural Language Processing techniques, an intent can be translated into code containing concepts such as sequential execution, loops and conditional statements. In this study, a system for converting intent, expressed in English sentences, into code is developed. To analyse this approach to programming, an evaluation framework is developed, evaluating the system during the development process as well as the usage of the final system. The results show that this way of programming might have potential, but conclude that the Natural Language Processing models still have too low accuracy. Further research is required to increase this accuracy and to further assess the potential of this way of programming.
30

Bigert, Johnny. "Automatic and unsupervised methods in natural language processing". Doctoral thesis, Stockholm, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-156.

31

Cohn, Trevor A. "Scaling conditional random fields for natural language processing /". Connect to thesis, 2007. http://eprints.unimelb.edu.au/archive/00002874.

32

Zhang, Lidan, and 张丽丹. "Exploiting linguistic knowledge for statistical natural language processing". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2011. http://hub.hku.hk/bib/B46506299.

33

Cosh, Kenneth John. "Supporting organisational semiotics with natural language processing techniques". Thesis, Lancaster University, 2003. http://eprints.lancs.ac.uk/12351/.

34

Allott, Nicholas Mark. "A natural language processing framework for automated assessment". Thesis, Nottingham Trent University, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.314333.

35

Fass, D. "Collative semantics : A semantics for natural language processing". Thesis, University of Essex, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.383507.

36

Grubbs, Elmer Andrew. "An information theoretic approach to natural language processing". Diss., The University of Arizona, 1994. http://hdl.handle.net/10150/186886.

Abstract:
A new method of natural language processing, based on the theory of information, is described. Parsing of a sentence is accomplished not in a sequential manner, but in a fashion that begins by searching for the main verb of the sentence, then for the object, the subject, and perhaps a prepositional phrase. As each new part of speech is located, the uncertainty of the sentence's meaning is reduced. When the uncertainty reaches zero, the parsing is complete, and the machine performs the task assigned by the input sentence. The process is modeled by a Markov Chain, which can often be used for the internal representation of the sentence. All of this work is done for communication with an intelligent task-oriented machine, but the theoretical basis for extending this to other, more complicated domains is also described. A methodology for extending the theory, so that it can be used to implement a machine that learns, is also described. By using belief networks, the machine constructs additions to its basic Markov Chain in order to handle new verbs and objects which were not included in the original programming. Once implemented, the system will then treat the new word as if it had originally been programmed into the machine. Finally, several prototypes are described which have been written to validate the theory presented. The information-theoretic system contained herein is compared to other techniques of natural language processing and shown to have significant advantages.
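The central intuition, that each recognized constituent reduces the remaining uncertainty about the sentence's meaning, can be mirrored with Shannon entropy over the set of still-possible interpretations. A purely hypothetical sketch with invented candidate commands:

```python
# Uncertainty about a command, measured as entropy over the remaining
# candidate interpretations; finding each part of speech prunes the set.
# Purely illustrative numbers, not from the dissertation.
import math

def entropy(candidates):
    # Uniform distribution over n candidates: H = -sum p*log2(p) = log2(n).
    return math.log2(len(candidates))

interpretations = ["move box", "move arm", "lift box", "lift arm"]
print(entropy(interpretations))                    # 2.0 bits

after_verb = [s for s in interpretations if s.startswith("lift")]
print(entropy(after_verb))                         # 1.0 bit: verb found

after_object = [s for s in after_verb if s.endswith("box")]
print(entropy(after_object))                       # 0.0 bits: parse done
```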
37

Djoweini, Camran, and Henrietta Hellberg. "Approaches to natural language processing in app development". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-230167.

Abstract:
Natural language processing is an ongoing field that is not yet fully established. A high demand for natural language processing in applications creates a need for good development tools and different implementation approaches suited to the engineers behind the applications. This project approaches the field from an engineering point of view to research the approaches, tools, and techniques that are readily available today for the development of natural language processing support. The sub-area of information retrieval was examined through a case study, where prototypes were developed to gain a deeper understanding of the tools and techniques used for such tasks from an engineering point of view. We found that there are two major approaches to developing natural language processing support for applications: high-level and low-level approaches. A categorization of tools and frameworks belonging to the two approaches, as well as the source code, documentation, and evaluations of two prototypes developed as part of the research, are presented. The choice of approach, tools and techniques should be based on the specifications and requirements of the final product, and both levels have their own pros and cons. The results of the report are, to a large extent, generalizable, as many different natural language processing tasks can be solved using similar solutions even if their goals vary.
38

Bjöörn, Anton, and Lukas Uggla. "Answering Game Rulebook Enquiries Through Natural Language Processing". Thesis, KTH, Skolan för teknikvetenskap (SCI), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-230209.

Abstract:
The aim of this research project was to create a conversational interface for retrieving information from rulebooks. This conversational interface takes the shape of an assistant named OLGA (short for Open Legend Game Assistant) to whom you can give enquiries about the rules of any game loaded into the program. We tuned and designed the assistant around a specific type of board game called TRPGs (tabletop role-playing games); hence the conversational interface is focused on game rulebooks. By giving the assistant the rules for a game in the form of a raw text document, the program can extract key concepts and words from the rules, which we call entities. The process of extracting entities and all other functions of the assistant were calibrated on the TRPG called Open Legend, hence the name Open Legend Game Assistant. When the user sends a query to the assistant, it is first sent to the web service Dialogflow for interpretation. In Dialogflow we enter our extracted entities to assist the service in recognizing key words and concepts in the queries. Dialogflow then returns an object with information telling the assistant what the intent of the user's query was, along with any additional information provided. The assistant then responds to the query. The standard response to a request for information about an entity is what we call a streak search. The assistant locates parts of the rules that contain the entity and sorts them by a relevance score, and the results are presented in order of relevance. When testing on people with no prior knowledge of the game, it was concluded that the assistant indeed could be helpful in finding answers to rule questions in the limited amount of time provided. Generalization being one of our goals, the program was also applied to another rule system in the TRPG genre, Pathfinder; applied to this system, the assistant worked as intended without altering any algorithm.
We investigated how to build a general conversational assistant that can help answer rule questions in board games (more specifically tabletop role-playing games, also known as pen-and-paper role-playing games). Many tabletop role-playing games contain a great number of rules and terms that most users cannot memorize immediately, if at all. Instead of having to leaf through long books or search a website for the right part of the rule text, we designed an assistant that can be asked questions about the rules. To do this we used the programming language Python, the NLTK toolkit [11], and the Dialogflow web service. A question posed to the program is sent to Dialogflow, which interprets the language and returns what the user wants (for example, "Tell me about black magic." returns that the user wants information about "black magic"). Our program then searches a text file into which all the rules have been copied (this file is also used beforehand so that Dialogflow knows what can be asked about); our code ranks the text passages by relevance, and the highest-ranked passage is shown to the user. The user can then, for example, ask for more information on the topic, see what comes directly after in the text, or pose a new question. We used the tabletop role-playing game Open Legend [2] as a test system and let testers unfamiliar with it try to answer some difficult questions about the game in ten minutes; this was compared with an equal number of testers answering the same questions in the same time using the website the rules were taken from. The study showed that even in situations where the tester knew nothing about the game and was newly introduced to the program, the assistant could be as effective as the website at retrieving information. To examine how generally applicable the assistant was, we also applied the program to a completely unrelated and much larger tabletop role-playing game (Pathfinder); the assistant worked there too and could answer the rule questions, albeit much more slowly because of the rulebook's length.
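The streak search described above, finding rule passages that contain the entity and presenting them by relevance, can be approximated in a few lines. The scoring below is an invented stand-in for OLGA's actual ranking:

```python
# Rough sketch of entity-based passage retrieval over a rulebook;
# the relevance score (term frequency plus a bonus for passages that
# open with the entity) is a guess at the flavor of such a ranking,
# not OLGA's real formula.
def streak_search(rule_text: str, entity: str, top_k: int = 3):
    passages = [p.strip() for p in rule_text.split("\n\n") if p.strip()]
    scored = []
    for p in passages:
        hits = p.lower().count(entity.lower())
        if hits:
            bonus = 2 if p.lower().startswith(entity.lower()) else 0
            scored.append((hits + bonus, p))
    return [p for _, p in sorted(scored, key=lambda t: -t[0])][:top_k]

rules = (
    "Black magic: forbidden spells, taught by no guild.\n\n"
    "Healing uses light magic and costs mana.\n\n"
    "Black magic costs 2 HP per cast and ignores armor."
)
for passage in streak_search(rules, "black magic"):
    print(passage)
```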
39

Rodríguez Ruiz, Luis. "Interactive Pattern Recognition applied to Natural Language Processing". Doctoral thesis, Universitat Politècnica de València, 2010. http://hdl.handle.net/10251/8479.

Abstract:
This thesis is about Pattern Recognition. In the last decades, huge efforts have been made to develop automatic systems able to rival human capabilities in this field. Although these systems achieve high productivity rates, they are not precise enough in most situations. Humans, on the contrary, are very accurate but comparatively much slower. This poses an interesting question: the possibility of benefiting from both worlds by constructing cooperative systems. This thesis presents diverse contributions to this kind of collaborative approach. The point is to improve Pattern Recognition systems by properly introducing a human operator into the system. We call this Interactive Pattern Recognition (IPR). Firstly, a general proposal for IPR will be stated. The aim is to develop a framework from which new applications in this area can easily be derived. Some interesting IPR issues are also introduced; multi-modality and adaptive learning are examples of extensions that can naturally fit into IPR. In the second place, we will focus on a specific application: a novel method to obtain high quality speech transcriptions (CAST, Computer Assisted Speech Transcription). We will start by proposing a CAST formalization and, next, we will cope with different implementation alternatives. Practical issues, such as the system response time, will also be taken into account in order to allow for a practical implementation of CAST. Word graphs and probabilistic error-correcting parsing are tools that will be used to reach an alternative formulation that allows for the use of CAST in a real scenario. Afterwards, a special application within the general IPR framework will be discussed. This is intended to test the IPR capabilities in an extreme environment, where no input pattern is available and the system only has access to the user actions to produce a hypothesis. Specifically, we will focus here on providing assistance in the problem of text generation.
40

Jin, Di Ph D. Massachusetts Institute of Technology. "Transfer learning and robustness for natural language processing". Thesis, Massachusetts Institute of Technology, 2020. https://hdl.handle.net/1721.1/129004.

Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Mechanical Engineering, 2020
Cataloged from student-submitted PDF of thesis.
Includes bibliographical references (pages 189-217).
Teaching machines to understand human language is one of the most elusive and long-standing challenges in Natural Language Processing (NLP). Driven by the fast development of deep learning, state-of-the-art NLP models have already achieved human-level performance on various large benchmark datasets, such as SQuAD, SNLI, and RACE. However, when these strong models are deployed to real-world applications, they often show poor generalization capability in two situations: 1. There is only a limited amount of data available for model training; 2. Deployed models may degrade significantly in performance on noisy test data or natural/artificial adversaries. In short, performance degradation on low-resource tasks/datasets and on unseen data with distribution shifts imposes great challenges on the reliability of NLP models and prevents them from being widely applied in the wild. This dissertation aims to address these two issues.
Towards the first one, we resort to transfer learning to leverage knowledge acquired from related data in order to improve performance on a target low-resource task/dataset. Specifically, we propose different transfer learning methods for three natural language understanding tasks: multi-choice question answering, dialogue state tracking, and sequence labeling, and one natural language generation task: machine translation. These methods are based on four basic transfer learning modalities: multi-task learning, sequential transfer learning, domain adaptation, and cross-lingual transfer. We show experimental results to validate that transferring knowledge from related domains, tasks, and languages can significantly improve performance on the target task/dataset. For the second issue, we propose methods to evaluate the robustness of NLP models on text classification and entailment tasks.
On one hand, we reveal that although these models can achieve accuracies above 90%, they are still easily broken by paraphrases of the original samples obtained by changing only around 10% of the words to synonyms. On the other hand, by creating a new challenge set using four adversarial strategies, we find that even the best models for the aspect-based sentiment analysis task cannot reliably identify the target aspect and recognize its sentiment accordingly; instead, they are easily confused by distractor aspects. Overall, these findings raise serious concerns about the robustness of NLP models, which should be enhanced to ensure stable long-term service.
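The synonym-substitution probe described above can be sketched in a few lines. The following greedy version is illustrative only, not the dissertation's exact attack; it assumes a black-box classifier `predict(text) -> label` (hypothetical) and uses NLTK's WordNet for candidate synonyms.

```python
# Sketch of a greedy synonym-substitution robustness probe.
# Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def synonyms(word):
    """Candidate substitutions: WordNet lemmas sharing a synset with the word."""
    return {lemma.name().replace("_", " ")
            for synset in wn.synsets(word)
            for lemma in synset.lemmas()} - {word}

def synonym_attack(text, predict):
    """Greedily swap one word at a time until the predicted label flips."""
    words = text.split()
    original = predict(text)
    for i, word in enumerate(words):
        for candidate in synonyms(word):
            perturbed = " ".join(words[:i] + [candidate] + words[i + 1:])
            if predict(perturbed) != original:
                return perturbed  # a label-flipping paraphrase was found
    return None  # no single-word swap fooled the model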
Style APA, Harvard, Vancouver, ISO itp.
41

Välme, Emma, i Lea Renmarker. "Accelerating Sustainability Report Assessment with Natural Language Processing". Thesis, Uppsala universitet, Avdelningen för visuell information och interaktion, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-445912.

Pełny tekst źródła
Streszczenie:
Corporations are expected to be transparent about their sustainability impact and to keep their stakeholders informed about how large the impact on the environment is, as well as about their work on reducing it. This transparency is accounted for in a, usually voluntary, sustainability report in addition to the already required financial report. With new regulations making sustainability reporting mandatory in Sweden, comprehensive and complete guidelines for corporations to follow are still lacking, and the reports tend to be extensive. The reports are therefore hard to assess in terms of how well the reporting is actually done. The Sustainability Reporting Maturity Grid (SRMG) is an assessment tool introduced by Cöster et al. (2020) for assessing the quality of sustainability reporting. Today, the assessment is performed manually, which has proven to be time-consuming and to result in varying assessments, affected by individual interpretation of the content. This thesis explores how assessment time and grading with the SRMG can be improved by applying Natural Language Processing (NLP) to sustainability documents, resulting in a condensed assessment method, the Prototype, which is intended to facilitate and speed up the assessment process. The first step towards developing the Prototype was to decide which of three machine learning models is most suitable: Naïve Bayes (NB), Support Vector Machines (SVM), or Bidirectional Encoder Representations from Transformers (BERT). This decision was supported by analyzing the accuracy of each model for each criterion in the SRMG, where BERT showed strong classification ability with an average accuracy of 96.8%. Results from the user evaluation of the Prototype indicated that the assessment time can be halved using the Prototype, with the initial average of 40 minutes reduced to 20 minutes. However, the results also showed a decreased average grading and an increased variation in assessments. The results indicate that applying NLP could be successful, but to make the Prototype more competitive, a more nuanced dataset must be developed, giving the model more room to detect patterns in the data.
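As a concrete illustration of the model-comparison step, the sketch below pits Naïve Bayes against a linear SVM with scikit-learn; the sentences and labels are hypothetical stand-ins for criterion-annotated report text, and BERT is omitted for brevity.

```python
# Sketch of comparing two of the candidate classifiers with cross-validation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [
    "we disclose scope 1 and scope 2 emissions",
    "emissions data were verified by a third party",
    "the annual meeting was held in june",
    "our new office opened in stockholm",
]
labels = [1, 1, 0, 0]  # 1 = meets the criterion, 0 = does not (made-up labels)

for name, model in [("Naive Bayes", MultinomialNB()), ("Linear SVM", LinearSVC())]:
    pipeline = make_pipeline(TfidfVectorizer(), model)
    scores = cross_val_score(pipeline, texts, labels, cv=2)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```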
Style APA, Harvard, Vancouver, ISO itp.
42

Liliemark, Adam, i Viktor Enghed. "Categorization of Customer Reviews Using Natural Language Processing". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-299882.

Pełny tekst źródła
Streszczenie:
Databases of user generated data can quickly become unmanageable. Klarna faced this issue, with a database of around 700,000 customer reviews. Ideally, the database would be cleaned of uninteresting reviews and the remaining reviews categorized. Without knowing what categories might emerge, the idea was to use an unsupervised clustering algorithm to find them. This thesis describes the work carried out to solve this problem and proposes a solution for Klarna that involves artificial neural networks rather than unsupervised clustering. Our implementation is able to categorize reviews as either interesting or uninteresting. We propose a workflow that would make it possible to categorize reviews not only into these two categories but into several. The method revolved around experimentation with clustering algorithms and neural networks. Previous research shows that texts can be clustered; however, the datasets used seem to be vastly different from the Klarna dataset, which consists of short reviews and contains a large proportion of uninteresting ones. Unsupervised clustering yielded unsatisfactory results, as no discernible categories could be found. In some cases, the technique created clusters of uninteresting reviews. These clusters were used as training data for an artificial neural network, together with manually labeled interesting reviews. The results from this artificial neural network were satisfactory: it can determine with an accuracy of around 86% whether a review is interesting or not. This was achieved using the aforementioned clusters and five feedback loops, in which the reviews from an evaluation dataset that the model predicted wrongly were fed back to it as training data. We argue that the main reason unsupervised clustering failed is that the reviews are too short. In comparison, other researchers have successfully clustered text data with average lengths in the hundreds of words; such items pack many more features than the short reviews in the Klarna dataset. We show that an artificial neural network is able to detect these features despite the short length, through its intrinsic design. Further research on feature extraction from short text strings could provide the means to cluster this kind of data: if features can be extracted, the clustering can be done on the features rather than the actual words. Our artificial neural network shows that the arbitrary features interesting and uninteresting can be extracted, so we are hopeful that future researchers will find ways of extracting more features from short text strings. In theory, this should mean that text of all lengths can be clustered unsupervised.
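The feedback-loop training scheme is easy to sketch. Below is a minimal illustration with scikit-learn, using made-up reviews and labels (all names are hypothetical): misclassified evaluation reviews are appended to the training set with their true labels and the network is retrained.

```python
# Sketch of the feedback loop: feed misclassified evaluation reviews back.
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier

train_texts = ["app crashes on login", "good", "payment failed twice", "ok"]
train_labels = [1, 0, 1, 0]  # 1 = interesting, 0 = uninteresting
eval_texts = ["refund never arrived", "nice", "card was charged twice", "fine"]
eval_labels = np.array([1, 0, 1, 0])

vectorizer = TfidfVectorizer()
X_all = vectorizer.fit_transform(train_texts + eval_texts)  # shared vocabulary
X_train, X_eval = X_all[:len(train_texts)], X_all[len(train_texts):]

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
for _ in range(5):  # five feedback loops, as described above
    clf.fit(X_train, train_labels)
    wrong = clf.predict(X_eval) != eval_labels
    if not wrong.any():
        break
    X_train = vstack([X_train, X_eval[wrong]])           # feed errors back...
    train_labels = train_labels + list(eval_labels[wrong])  # ...with true labels

print("evaluation accuracy:", (clf.predict(X_eval) == eval_labels).mean())
```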
Style APA, Harvard, Vancouver, ISO itp.
43

Kavousi, Mohammadamir. "APPLICATIONS OF NEURAL NETWORKS IN NATURAL LANGUAGE PROCESSING". OpenSIUC, 2019. https://opensiuc.lib.siu.edu/theses/2570.

Pełny tekst źródła
Streszczenie:
User-generated texts such as reviews and social media posts are valuable sources of information. Online reviews are important assets for users deciding whether to buy a product, see a movie, or choose a restaurant. Therefore, the rating of a review is one of the reliable factors for all the
Style APA, Harvard, Vancouver, ISO itp.
44

Wang, Qianqian. "NATURAL LANGUAGE PROCESSING BASED GENERATOR OF TESTING INSTRUMENTS". CSUSB ScholarWorks, 2017. https://scholarworks.lib.csusb.edu/etd/576.

Pełny tekst źródła
Streszczenie:
Natural Language Processing (NLP) is the field of study that focuses on the interactions between human language and computers. By “natural language” we mean a language that is used for everyday communication by humans. Unlike programming languages, natural languages are hard to define with precise rules. NLP is developing rapidly and has been widely adopted across industries. Technologies based on NLP are becoming increasingly widespread; for example, Siri and Alexa are intelligent personal assistants that use NLP algorithms to communicate with people. “Natural Language Processing Based Generator of Testing Instruments” is a stand-alone program that generates “plausible” multiple-choice selections by performing word sense disambiguation and calculating semantic similarity between two natural language entities. The core is Word Sense Disambiguation (WSD): identifying which sense of a word is used in a sentence when the word has multiple meanings. WSD is considered an AI-hard problem. The project presents several algorithms for resolving the WSD problem and computing semantic similarity, along with experimental results demonstrating their effectiveness.
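A classic baseline for the two ingredients named above, WSD and semantic similarity, can be sketched with NLTK's WordNet. The simplified-Lesk function below is a generic illustration, not necessarily one of the thesis's algorithms.

```python
# Simplified Lesk: choose the WordNet sense whose gloss shares the most
# words with the sentence context. Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def simplified_lesk(word, sentence):
    context = set(sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(word):
        gloss = set(sense.definition().lower().split())
        overlap = len(gloss & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

sense = simplified_lesk("bank", "I deposited money at the bank yesterday")
print(sense, "->", sense.definition())

# Semantic similarity between two senses can also be scored with WordNet,
# e.g. path similarity between two synsets:
print(wn.synset("car.n.01").path_similarity(wn.synset("bus.n.01")))
```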
Style APA, Harvard, Vancouver, ISO itp.
45

Ruggeri, Federico <1993&gt. "Towards Unstructured Knowledge Integration in Natural Language Processing". Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amsdottorato.unibo.it/10229/1/FedericoRuggeri_PhD_Thesis.pdf.

Pełny tekst źródła
Streszczenie:
In the last decades, Artificial Intelligence has witnessed multiple breakthroughs in deep learning. In particular, purely data-driven approaches have enabled a wide variety of successful applications due to the large availability of data. Nonetheless, the integration of prior knowledge is still required to compensate for specific issues like lack of generalization from limited data, fairness, robustness, and biases. In this thesis, we analyze the methodology of integrating knowledge into deep learning models in the field of Natural Language Processing (NLP). We start by remarking on the importance of knowledge integration. We highlight the possible shortcomings of these approaches and investigate the implications of integrating unstructured textual knowledge. We introduce Unstructured Knowledge Integration (UKI) as the process of integrating unstructured knowledge into machine learning models. We discuss UKI in the field of NLP, where knowledge is represented in a natural language format. We identify UKI as a complex process comprising multiple sub-processes, different knowledge types, and knowledge integration properties to guarantee. We remark on the challenges of integrating unstructured textual knowledge and draw connections with well-known research areas in NLP. We provide a unified vision of structured knowledge extraction (KE) and UKI by identifying KE as a sub-process of UKI. We investigate some challenging scenarios where structured knowledge is not a feasible prior assumption and formulate each task from the point of view of UKI. We adopt simple yet effective neural architectures and discuss the challenges of such an approach. Finally, we identify KE as a form of symbolic representation. From this perspective, we remark on the need to define sophisticated UKI processes to verify the validity of knowledge integration. To this end, we foresee frameworks capable of combining symbolic and sub-symbolic representations for learning as a solution.
Style APA, Harvard, Vancouver, ISO itp.
46

Tripodi, Rocco <1982&gt. "Evolutionary game theoretic models for natural language processing". Doctoral thesis, Università Ca' Foscari Venezia, 2015. http://hdl.handle.net/10579/8351.

Pełny tekst źródła
Streszczenie:
This thesis is aimed at discovering new learning algorithms inspired by principles of biological evolution, which are able to exploit relational and contextual information, viewing clustering and classification problems from a dynamical-systems perspective. In particular, we have investigated how game-theoretic models can be used to solve different Natural Language Processing tasks. Traditional studies of language have used a game-theoretic perspective to study how language evolves over time and how it emerges in a community, but to the best of our knowledge, this is the first attempt to use game theory to solve specific problems in this area. These models are based on the concept of equilibrium: a state of a system that emerges after a series of interactions among the elements that are part of it. Starting from a situation in which there is uncertainty about a particular phenomenon, they describe how a disequilibrium state resolves into equilibrium. The games are situations in which a group of objects has to be classified or clustered and each of them has to choose its place in a predefined set of classes. Each object's choice is influenced by the choices of the others, and the satisfaction a player derives from the outcome of a game is determined by a payoff function, which the players try to maximize. After a series of interactions the players learn to play their best strategies, leading to an equilibrium state and to the resolution of the problem. From a machine-learning perspective this approach is appealing, because it can be employed as an unsupervised, semi-supervised or supervised learning model. We have used it to resolve the word sense disambiguation problem, casting the task as a constraint satisfaction problem in which each word to be disambiguated is constrained to choose the most coherent sense among those available, according to the senses the surrounding words are choosing. This formulation ensures the maintenance of textual coherence and has been tested against state-of-the-art algorithms, with higher and more stable results. We have also used a game-theoretic formulation to improve the clustering results of dominant-set clustering and non-negative matrix factorization techniques. We evaluated our system on different document datasets through different approaches, achieving results that outperform state-of-the-art algorithms. This work opened new perspectives on game-theoretic models, demonstrating that these approaches are promising and can also be employed for the resolution of new problems.
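The equilibrium-seeking process at the heart of this approach can be illustrated with discrete replicator dynamics. The sketch below is a generic two-choice example with a made-up payoff matrix, not the thesis's actual payoff design.

```python
# Discrete replicator dynamics: each entry of x is the current preference
# for one of two candidate labels/senses; fixed points are equilibria.
import numpy as np

A = np.array([[1.0, 0.2],   # payoff (coherence) between label choices
              [0.2, 1.0]])
x = np.array([0.6, 0.4])    # initial mixed strategy over the two choices

for _ in range(50):
    payoff = A @ x
    x = x * payoff / (x @ payoff)  # replicator update

print(x)  # probability mass concentrates on the dominant, most coherent choice
```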
Style APA, Harvard, Vancouver, ISO itp.
47

Boulanger, Hugo. "Data augmentation and generation for natural language processing". Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG019.

Pełny tekst źródła
Streszczenie:
More and more fields are looking to automate part of their processes. Natural language processing offers methods for extracting information from texts, and these methods can use machine learning. Machine learning requires annotated data to perform information extraction, so applying these methods to new domains requires obtaining annotated data related to the task. In this thesis, our goal is to study generation methods that improve the performance of models learned from small amounts of data. We explore different generation methods, with and without machine learning, to produce the data needed to train sequence-labeling models. The first method explored is pattern filling. This data generation method produces annotated data by combining sentences with slots (patterns) with mentions. We have shown that this method improves the performance of labeling models with tiny amounts of data. We also study the amount of data needed to use this method. The second approach tested is the use of language models for text generation alongside a semi-supervised learning method for tagging. The semi-supervised learning method used is tri-training, which serves to add labels to the generated data. Tri-training is tested on several generation methods using different pre-trained language models. We proposed a version of tri-training called generative tri-training, where the generation is not done in advance but during the tri-training process, taking advantage of it. We test both the performance of the models trained during the semi-supervision process and that of the models trained on the data it produces. In most cases, the data produced match the performance of the models trained with semi-supervision. This method improves performance at all tested data levels with respect to models without augmentation. The third avenue of study combines some aspects of the previous approaches, and several variants are tested for this purpose. The use of language models to replace parts of sentences in the manner of the pattern-filling generation method is unsuccessful. Combining data coming from the different generation methods is tested, but does not outperform the best single method. Finally, applying the pattern-filling method to the data generated with tri-training is tested and does not improve the results obtained with tri-training. While much remains to be studied, we have highlighted both simple methods, such as pattern filling, and more complex ones, such as the use of supervised learning with sentences generated by a language model, that improve the performance of labeling models through the generation of annotated data.
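Pattern filling is simple enough to show in full. The sketch below uses hypothetical patterns and mention lists and emits BIO-tagged tokens of the kind a sequence-labeling model would consume.

```python
# Minimal pattern-filling sketch: fill slots with mentions, emit BIO labels.
import itertools

patterns = ["book a flight to {CITY}", "what is the weather in {CITY}"]
mentions = {"CITY": ["paris", "new york"]}

def fill(pattern, slot, value):
    tokens, labels = [], []
    for token in pattern.split():
        if token == "{" + slot + "}":
            parts = value.split()
            tokens.extend(parts)
            labels.extend(["B-" + slot] + ["I-" + slot] * (len(parts) - 1))
        else:
            tokens.append(token)
            labels.append("O")
    return tokens, labels

for pattern, city in itertools.product(patterns, mentions["CITY"]):
    tokens, labels = fill(pattern, "CITY", city)
    print(list(zip(tokens, labels)))
```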
Style APA, Harvard, Vancouver, ISO itp.
48

Hellmann, Sebastian. "Integrating Natural Language Processing (NLP) and Language Resources Using Linked Data". Doctoral thesis, Universitätsbibliothek Leipzig, 2015. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-157932.

Pełny tekst źródła
Streszczenie:
This thesis is a compendium of scientific works and engineering specifications that have been contributed to a large community of stakeholders to be copied, adapted, mixed, built upon and exploited in any way possible to achieve a common goal: Integrating Natural Language Processing (NLP) and Language Resources Using Linked Data. The explosion of information technology in the last two decades has led to a substantial growth in the quantity, diversity and complexity of web-accessible linguistic data. These resources become even more useful when linked with each other, and the last few years have seen the emergence of numerous approaches in various disciplines concerned with linguistic resources and NLP tools. It is the challenge of our time to store, interlink and exploit this wealth of data, accumulated in more than half a century of computational linguistics, of empirical, corpus-based study of language, and of computational lexicography in all its heterogeneity. The vision of the Giant Global Graph (GGG) was conceived by Tim Berners-Lee, aiming at connecting all data on the Web and allowing the discovery of new relations between this openly accessible data. This vision has been pursued by the Linked Open Data (LOD) community, where the cloud of published datasets comprises 295 data repositories and more than 30 billion RDF triples (as of September 2011). RDF is based on globally unique and accessible URIs, and it was specifically designed to establish links between such URIs (or resources). This is captured in the Linked Data paradigm, which postulates four rules: (1) referred entities should be designated by URIs, (2) these URIs should be resolvable over HTTP, (3) data should be represented by means of standards such as RDF, and (4) a resource should include links to other resources. Although it is difficult to precisely identify the reasons for the success of the LOD effort, advocates generally argue that open licenses as well as open access are key enablers for the growth of such a network, as they provide a strong incentive for collaboration and contribution by third parties. In his keynote at BNCOD 2011, Chris Bizer argued that with RDF the overall data integration effort can be “split between data publishers, third parties, and the data consumer”, a claim that can be substantiated by observing the evolution of many large datasets constituting the LOD cloud. As written in the acknowledgement section, parts of this thesis have received extensive feedback from other scientists, practitioners and industry in many different ways. The main contributions of this thesis are summarized here. Part I – Introduction and Background. During his keynote at the Language Resource and Evaluation Conference in 2012, Sören Auer stressed the decentralized, collaborative, interlinked and interoperable nature of the Web of Data. The keynote provides strong evidence that Semantic Web technologies such as Linked Data are on their way to becoming mainstream for the representation of language resources. The jointly written companion publication for the keynote was later extended as a book chapter in The People’s Web Meets NLP and serves as the basis for “Introduction” and “Background”, outlining some stages of the Linked Data publication and refinement chain. Both chapters stress the importance of open licenses and open access as enablers for collaboration and the ability to interlink data on the Web as a key feature of RDF, and provide a discussion of scalability issues and decentralization.
Furthermore, we elaborate on how conceptual interoperability can be achieved by (1) re-using vocabularies, (2) agile ontology development, (3) meetings to refine and adapt ontologies and (4) tool support to enrich ontologies and match schemata. Part II - Language Resources as Linked Data. “Linked Data in Linguistics” and “NLP & DBpedia, an Upward Knowledge Acquisition Spiral” summarize the results of the Linked Data in Linguistics (LDL) Workshop in 2012 and the NLP & DBpedia Workshop in 2013 and give a preview of the MLOD special issue. In total, five proceedings – three published at CEUR (OKCon 2011, WoLE 2012, NLP & DBpedia 2013), one Springer book (Linked Data in Linguistics, LDL 2012) and one journal special issue (Multilingual Linked Open Data, MLOD, to appear) – have been (co-)edited to create incentives for scientists to convert and publish Linked Data and thus to contribute open and/or linguistic data to the LOD cloud. Based on the disseminated call for papers, 152 authors contributed one or more accepted submissions to our venues and 120 reviewers were involved in peer-reviewing. “DBpedia as a Multilingual Language Resource” and “Leveraging the Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic Linked Data Cloud” contain this thesis’ contribution to the DBpedia Project, further increasing the size and inter-linkage of the LOD cloud with lexical-semantic resources. Our contribution comprises extracted data from Wiktionary (an online, collaborative dictionary similar to Wikipedia) in more than four languages (now six) as well as language-specific versions of DBpedia, including a quality assessment of inter-language links between Wikipedia editions and internationalized content negotiation rules for Linked Data. In particular, this work created the foundation for a DBpedia Internationalisation Committee with members from over 15 different languages, with the common goal of pushing DBpedia as a free and open multilingual language resource. Part III - The NLP Interchange Format (NIF). “NIF 2.0 Core Specification”, “NIF 2.0 Resources and Architecture” and “Evaluation and Related Work” constitute one of the main contributions of this thesis. The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. The core specification describes which URI schemes and RDF vocabularies must be used for (parts of) natural language texts and annotations in order to create an RDF/OWL-based interoperability layer, with NIF built upon Unicode code points in Normal Form C. The classes and properties of the NIF Core Ontology formally define the relations between text, substrings and their URI schemes. For the evaluation of NIF, we posed questions in a questionnaire to 13 developers using NIF. UIMA, GATE and Stanbol are extensible NLP frameworks, and NIF was not yet able to provide off-the-shelf NLP domain ontologies for all possible domains, but only for the plugins used in this study. After inspecting the software, the developers nevertheless agreed that NIF is adequate to provide a generic RDF output based on NIF using literal objects for annotations. All developers were able to map the internal data structure to NIF URIs to serialize RDF output (adequacy).
The development effort in hours (ranging between 3 and 40) as well as the number of code lines (ranging between 110 and 445) suggest that the implementation of NIF wrappers is easy and fast for an average developer. Furthermore, the evaluation contains a comparison to other formats and an evaluation of the available URI schemes for web annotation. In order to collect input from the wide group of stakeholders, a total of 16 presentations were given with extensive discussions and feedback, which has led to a constant improvement of NIF from 2010 until 2013. After the release of NIF (version 1.0) in November 2011, a total of 32 vocabulary employments and implementations for different NLP tools and converters were reported (8 by the (co-)authors, including the Wiki-link corpus, 13 by people participating in our survey and 11 more of which we have heard). Several roll-out meetings and tutorials were held (e.g. in Leipzig and Prague in 2013) and are planned (e.g. at LREC 2014). Part IV - The NLP Interchange Format in Use. “Use Cases and Applications for NIF” and “Publication of Corpora using NIF” describe 8 concrete instances where NIF has been successfully used. One major contribution is the usage of NIF as the recommended RDF mapping in the Internationalization Tag Set (ITS) 2.0 W3C standard and the conversion algorithms from ITS to NIF and back. One outcome of the discussions in the standardization meetings and telephone conferences for ITS 2.0 was the conclusion that there was no alternative RDF format or vocabulary other than NIF with the required features to fulfill the working group charter. Five further uses of NIF are described: for the Ontology of Linguistic Annotations (OLiA), the RDFaCE tool, the Tiger Corpus Navigator, the OntosFeeder and visualisations of NIF using the RelFinder tool. These 8 instances provide an implemented proof of concept of the features of NIF. We further describe the conversion and hosting of the huge Google Wikilinks corpus with 40 million annotations for 3 million web sites; the resulting RDF dump contains 477 million triples in a 5.6 GB compressed dump file in Turtle syntax. We also describe how NIF can be used to publish extracted facts from news feeds in the RDFLiveNews tool as Linked Data. Part V - Conclusions. This part provides lessons learned for NIF, conclusions and an outlook on future work. Most of the contributions are already summarized above. One particular aspect worth mentioning is the increasing number of NIF-formatted corpora for Named Entity Recognition (NER) that have come into existence after the publication of the main NIF paper, Integrating NLP using Linked Data, at ISWC 2013. These include the corpora converted by Steinmetz, Knuth and Sack for the NLP & DBpedia workshop and an OpenNLP-based CoNLL converter by Brümmer. Furthermore, we are aware of three LREC 2014 submissions that leverage NIF: NIF4OGGD - NLP Interchange Format for Open German Governmental Data, N^3 - A Collection of Datasets for Named Entity Recognition and Disambiguation in the NLP Interchange Format, and Global Intelligent Content: Active Curation of Language Resources using Linked Data, as well as an early implementation of a GATE-based NER/NEL evaluation framework by Dojchinovski and Kliegr. Further funding for the maintenance, interlinking and publication of Linguistic Linked Data as well as support and improvements of NIF is available via the expiring LOD2 EU project, as well as the CSA EU project called LIDER, which started in November 2013.
Based on the evidence of successful adoption presented in this thesis, we can expect a good chance of Linked Data technology, as well as the NIF standard, reaching critical mass in the field of Natural Language Processing and language resources.
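To make the NIF conventions summarized above concrete, here is a small sketch with rdflib that builds one context and one substring annotation using the offset-based (RFC 5147 style) fragment identifiers; the document URI is made up, and the property set is kept to a minimal subset of the NIF 2.0 core ontology.

```python
# Sketch of a NIF annotation built with rdflib (minimal property set).
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")
doc = "http://example.org/doc1"   # hypothetical document URI
text = "Leipzig is a city."

g = Graph()
g.bind("nif", NIF)

context = URIRef(f"{doc}#char=0,{len(text)}")   # RFC 5147 style fragment
g.add((context, RDF.type, NIF.Context))
g.add((context, NIF.isString, Literal(text)))

word = URIRef(f"{doc}#char=0,7")                # the substring "Leipzig"
g.add((word, RDF.type, NIF.String))
g.add((word, NIF.referenceContext, context))
g.add((word, NIF.anchorOf, Literal("Leipzig")))
g.add((word, NIF.beginIndex, Literal(0, datatype=XSD.nonNegativeInteger)))
g.add((word, NIF.endIndex, Literal(7, datatype=XSD.nonNegativeInteger)))

print(g.serialize(format="turtle"))
```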
Style APA, Harvard, Vancouver, ISO itp.
49

Devlin, Siobhan Lucy. "Simplifying natural language for aphasic readers". Thesis, University of Sunderland, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.300732.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
50

Shepherd, David. "Natural language program analysis combining natural language processing with program analysis to improve software maintenance tools /". Access to citation, abstract and download form provided by ProQuest Information and Learning Company; downloadable PDF file, 176 p, 2007. http://proquest.umi.com/pqdweb?did=1397920371&sid=6&Fmt=2&clientId=8331&RQT=309&VName=PQD.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
