Theses on the topic "Computational linguistics"


Consult the top 50 theses for your research on the topic "Computational linguistics".


1

van Cort, Tracy. "Computational Evolutionary Linguistics". Scholarship @ Claremont, 2001. https://scholarship.claremont.edu/hmc_theses/137.

Abstract
Languages and species both evolve by a process of repeated divergences, which can be described with the branching of a phylogenetic tree, or phylogeny. Taking advantage of this fact, it is possible to study language change using computational tree-building techniques developed for evolutionary biology. Mathematical approaches to the construction of phylogenies fall into two major categories: character-based and distance-based methods. Character-based methods were used in prior work applying phylogenetic methods to the Indo-European language family by researchers at the University of Pennsylvania. Discussion of the limitations of character-based models leads to a similar presentation of distance-based models. We present an adaptation of these methods to linguistic data, and the phylogenies generated by applying them to several modern Germanic languages and Spanish. We conclude that distance-based methods are useful for historical linguistic reconstruction, and that it would be useful to extend existing tree-drawing methods to better model the evolutionary effects of language contact.
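The distance-based approach the abstract describes can be sketched with a toy example: pairwise lexical distances between languages computed as mean normalized edit distance over aligned word lists. The word lists and the choice of plain Levenshtein distance are illustrative assumptions, not the thesis's actual data or measure.

```python
def edit_distance(a, b):
    # classic dynamic-programming Levenshtein distance, single-row variant
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (a[i - 1] != b[j - 1]))      # (mis)match
            prev = cur
    return dp[-1]

def lexical_distance(words_a, words_b):
    # mean normalized edit distance over word pairs aligned by meaning
    return sum(edit_distance(wa, wb) / max(len(wa), len(wb))
               for wa, wb in zip(words_a, words_b)) / len(words_a)

# invented mini word lists, aligned by position (same meaning per slot)
lists = {
    "English": ["hand", "water", "night"],
    "German":  ["hand", "wasser", "nacht"],
    "Spanish": ["mano", "agua", "noche"],
}
matrix = {(x, y): lexical_distance(lists[x], lists[y])
          for x in lists for y in lists}
```

A distance matrix like `matrix` is the input a distance-based tree-building method (e.g. neighbor joining) would consume.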
2

Wang, Pengyu. "Collapsed variational inference for computational linguistics". Thesis, University of Oxford, 2016. https://ora.ox.ac.uk/objects/uuid:13c08f60-1441-4ea5-b52f-7ffd0d7a744f.

Abstract
Bayesian modelling is a natural fit for tasks in computational linguistics, since it can provide interpretable structures, useful prior controls, and coherent management of uncertainty. However, exact Bayesian inference is intractable for many models of practical interest. Developing both accurate and efficient approximate Bayesian inference algorithms remains a fundamental challenge, especially for the field of computational linguistics, where datasets are large and growing and models consist of complex latent structures. Collapsed variational inference (CVI) is an important milestone that combines the efficiency of variational inference (VI) and the accuracy of Markov chain Monte Carlo (MCMC) (Teh et al., 2006). However, its previous applications were limited to bag-of-words models whose hidden variables are conditionally independent given the parameters, whereas in computational linguistics, the hidden variable dependencies are crucial for modelling the underlying syntactic and semantic relations. To enlarge the application domain of CVI as well as to address the above Bayesian inference challenge, we investigate the applications of collapsed variational inference to computational linguistics. In this thesis, our contributions are three-fold. First, we solve a number of inference challenges arising from the hidden variable dependencies and derive a set of new CVI algorithms for the two ubiquitous and foundational models in computational linguistics, namely hidden Markov models (HMMs) and probabilistic context-free grammars. We also propose CVI for hierarchical Dirichlet process (HDP) HMMs, which are Bayesian nonparametric extensions of HMMs. Second, along the way we propose a set of novel algorithmic techniques, which are generally applicable to a wide variety of probabilistic graphical models in the conjugate exponential family and computational linguistic models using non-conjugate HDP constructions.
Therefore, our work represents one step in bridging the gap between increasingly richer Bayesian models in computational linguistics and recent advances in approximate Bayesian inference. Third, we empirically evaluate our proposed CVI algorithms and their stochastic versions in a range of computational linguistic tasks, such as part-of-speech induction, grammar induction and many others. Experimental results consistently demonstrate that, using our techniques for handling the hidden variable dependencies, the empirical advantages of both VI and MCMC can be combined in a much larger domain of CVI applications.
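For context, hidden Markov models, one of the foundational models named above, compute sentence likelihoods with the forward recursion sketched below. This is the standard exact algorithm with made-up parameters, not the collapsed variational inference the thesis derives.

```python
# forward algorithm: exact likelihood of an observation sequence under an HMM
def forward(pi, A, B, obs):
    n = len(pi)
    # alpha[i] = P(observations so far, current state = i)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][o]
                 for i in range(n)]
    return sum(alpha)

# made-up 2-state model over a 2-symbol vocabulary
pi = [0.6, 0.4]                    # initial state probabilities
A  = [[0.7, 0.3], [0.4, 0.6]]      # transitions A[from][to]
B  = [[0.9, 0.1], [0.2, 0.8]]      # emissions B[state][symbol]
likelihood = forward(pi, A, B, [0, 1, 0])
```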
3

Penton, Dave. "Linguistic data models: presentation and representation". Connect to thesis, 2006. http://eprints.unimelb.edu.au/archive/00002875.

4

Fujinami, Tsutomu. "A process algebraic approach to computational linguistics". Thesis, University of Edinburgh, 1996. http://hdl.handle.net/1842/521.

Abstract
The thesis presents a way to apply process algebra to computational linguistics. We are interested in how contexts can affect or contribute to language understanding, and model the phenomena as a system of communicating processes to study the interaction between them in detail. For this purpose, we turn to the pi-calculus and investigate how communicating processes may be defined. While investigating the computational grounds of communication and concurrency, we devise a graphical representation for processes to capture the structure of interaction between them. Then, we develop a logic, combinatory intuitionistic linear logic with equality relation, to specify communicating processes logically. The development enables us to study Situation Semantics with process algebra. We construct semantic objects employed in Situation Semantics in the pi-calculus and then represent them in the logic. Through the construction, we also relate Situation Semantics to research on information flow, Channel Theory, by conceiving of linear logic as a theory of information flow. To show how sentences can be parsed as the result of interactions between processes, we present a concurrent chart parser encoded in the pi-calculus. We also explain how a semantic representation can be generated as a process by the parser. We conclude the thesis by comparing the framework with other approaches.
5

Moilanen, Karo. "Compositional entity-level sentiment analysis". Thesis, University of Oxford, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.559817.

Abstract
This thesis presents a computational text analysis tool called AFFECTiS (Affect Interpretation/Inference System) which focuses on the task of interpreting natural language text based on its subjective, non-factual, affective properties that go beyond the 'traditional' factual, objective dimensions of meaning that have so far been the main focus of Natural Language Processing and Computational Linguistics. The thesis presents a fully compositional, uniform, wide-coverage computational model of sentiment in text that builds on a number of fundamental compositional sentiment phenomena and processes discovered by detailed linguistic analysis of the behaviour of sentiment across key syntactic constructions in English. Driven by the Principle of Semantic Compositionality, the proposed model breaks sentiment interpretation down into strictly binary combinatory steps, each of which explains the polarity of a given sentiment expression as a function of the properties of the sentiment carriers contained in it and the grammatical and semantic context(s) involved. An initial implementation of the proposed compositional sentiment model is described which attempts direct logical sentiment reasoning rather than basing computational sentiment judgements on indirect data-driven evidence. Together with deep grammatical analysis and large hand-written sentiment lexica, the model is applied recursively to assign sentiment to all (sub)sentential structural constituents and to concurrently equip all individual entity mentions with gradient sentiment scores. The system was evaluated on an extensive multi-level and multi-task evaluation framework encompassing over 119,000 test cases from which detailed empirical experimental evidence is drawn. The results across entity-, phrase-, sentence-, word-, and document-level data sets demonstrate that AFFECTiS is capable of human-like sentiment reasoning and can interpret sentiment in a way that is not only coherent syntactically but also defensible logically, even in the presence of the many ambiguous extralinguistic, paralogical, and mixed sentiment anomalies that so tellingly characterise the challenges involved in non-factual classification.
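The strictly binary combinatory steps described above can be illustrated with a toy recursive polarity function. The mini-lexicon and combination rules here are invented for illustration; AFFECTiS itself relies on deep grammatical analysis and large hand-written lexica.

```python
# polarity values: +1 positive, -1 negative, 0 neutral; "NEG" marks reversers
LEXICON = {"brilliant": 1, "failure": -1, "not": "NEG", "avoided": "NEG"}

def combine(left, right):
    # one strictly binary composition step
    if left == "NEG":
        return -right if isinstance(right, int) else 0  # reverser flips polarity
    if right == "NEG":
        return -left if isinstance(left, int) else 0
    if left == 0:
        return right              # a carrier dominates neutral material
    if right == 0:
        return left
    return left                   # crude tie-break between two carriers

def polarity(tree):
    # tree is either a token string or a binary (left, right) pair
    if isinstance(tree, str):
        return LEXICON.get(tree, 0)
    return combine(polarity(tree[0]), polarity(tree[1]))

p_pos = polarity(("avoided", "failure"))          # "avoided a failure"
p_neg = polarity(("not", ("brilliant", "film")))  # negated positive carrier
```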
6

Holmqvist, Maria. "Word Alignment by Re-using Parallel Phrases". Licentiate thesis, Linköping University, NLPLAB - Natural Language Processing Laboratory, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-15463.

Abstract

In this thesis we present the idea of using parallel phrases for word alignment. Each parallel phrase is extracted from a set of manual word alignments and contains a number of source and target words and their corresponding alignments. If a parallel phrase matches a new sentence pair, its word alignments can be applied to the new sentence. There are several advantages to using phrases for word alignment. First, longer text segments include more context and are more likely to produce correct word alignments than shorter segments or single words. More importantly, the use of longer phrases makes it possible to generalize words in the phrase by replacing them with parts-of-speech or other grammatical information. In this way, the number of words covered by the extracted phrases can go beyond the words and phrases that were present in the original set of manually aligned sentences. We present experiments with phrase-based word alignment on three types of English–Swedish parallel corpora: a software manual, a novel and proceedings of the European Parliament. In order to find a balance between improved coverage and high alignment accuracy, we investigated different properties of generalised phrases to identify which types of phrases are likely to produce accurate alignments on new data. Finally, we have compared phrase-based word alignments to state-of-the-art statistical alignment with encouraging results. We show that phrase-based word alignments can be used to enhance statistical word alignment. To evaluate word alignments, an English–Swedish reference set for the Europarl corpus was constructed. The guidelines for producing this reference alignment are presented in the thesis.
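A minimal sketch of the phrase-matching idea, assuming a hypothetical stored parallel phrase whose second source word has been generalized to a part-of-speech slot; the phrase, tags, and sentences are invented for illustration:

```python
# a stored parallel phrase: source/target token patterns (words or POS slots)
# plus word alignments as (source_index, target_index) pairs
phrase = {
    "src": ["click", "NN"],           # "NN" is a generalized POS slot
    "tgt": ["klicka", "på", "NN"],
    "align": [(0, 0), (1, 2)],
}

def matches(pattern, tokens, pos_tags):
    # a pattern item matches either the surface word or its POS tag
    return len(pattern) == len(tokens) and all(
        p == w or p == t for p, w, t in zip(pattern, tokens, pos_tags))

def apply_phrase(phrase, src, src_pos, tgt, tgt_pos):
    # if both sides match, re-use the stored word alignments on the new pair
    if matches(phrase["src"], src, src_pos) and matches(phrase["tgt"], tgt, tgt_pos):
        return [(src[i], tgt[j]) for i, j in phrase["align"]]
    return []

links = apply_phrase(phrase,
                     ["click", "OK"], ["VB", "NN"],
                     ["klicka", "på", "OK"], ["VB", "PP", "NN"])
```

Because the `NN` slot matches any noun, the phrase covers word pairs never seen in the manually aligned data.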

7

Wahle, Johannes [Verfasser]. "Algorithmic advancements in Computational Historical Linguistics / Johannes Wahle". Tübingen : Universitätsbibliothek Tübingen, 2021. http://d-nb.info/1241537038/34.

8

Chew, Peter. "A computational phonology of Russian". Thesis, University of Oxford, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.324285.

9

Senaldi, Marco Silvio Giuseppe. "Working both sides of the street: computational and psycholinguistic investigations on idiomatic variability". Doctoral thesis, Scuola Normale Superiore, 2019. http://hdl.handle.net/11384/86016.

Abstract
Over the years, the original conception of idioms as semantically empty and formally frozen units (Bobrow and Bell, 1973; Swinney and Cutler, 1979) has been replaced by a more complex view, whereby some idioms display an analyzable semantic structure (Nunberg, 1978) that allows for greater formal plasticity (Nunberg et al., 1994; Gibbs and Nayak, 1989). Corpus data have anyway shown that all types of idioms allow for a certain degree of manipulation if an appropriate context is provided (Duffley, 2013; Vietri, 2014). On the other hand, psycholinguistic data have revealed that the processing of idiom variants is not necessarily harder than the processing of idiom canonical forms, and that it can be similar to the processing of literal language (McGlone et al., 1994; Geeraert et al., 2017a). Despite this possible variability, in two computational studies we show that focusing on lexical fixedness is still an effective method for automatically telling apart non-compositional idiomatic expressions and compositional non-idiomatic expressions by means of distributional-semantic indices of compositionality that compute the cosine similarity between the vector of a given phrase to be classified and the vectors of lexical variants of the same phrase that are generated distributionally or from the Italian section of MultiWordNet (Pianta et al., 2002). All in all, idioms turn out to be less similar to the vectors of their lexical variants than compositional expressions are, confirming that they tend to be employed in a more formally conservative way in language use. In two eye-tracking studies we then compare the reading times of idioms and literals in the active form, in a passive form with preverbal subject and in a passive form with postverbal subject, which preserves the verb-noun order of the canonical active form. The first experiment reveals that passives take longer to read than actives, with no significant effect of idiomaticity in passive forms. A second experiment with more ecological dialogic stimuli reveals that preserving the surface verb-noun order of the active form facilitates the processing of passive idioms, suggesting that one of the core issues with idiom passivization could be the violation of canonical verb-noun order rather than verb voice per se.
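The lexical-fixedness index described above reduces to a cosine comparison between phrase vectors. A minimal sketch with invented count vectors; the actual study uses distributional vectors and MultiWordNet-derived variants:

```python
import math

def cosine(u, v):
    # cosine similarity between two dense vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# invented context-count vectors for a phrase and one lexical variant
idiom_vec   = [9.0, 1.0, 0.5]   # an idiom, e.g. "kick the bucket"
idiom_var   = [0.5, 6.0, 8.0]   # its variant appears in different contexts
literal_vec = [7.0, 2.0, 1.0]   # a compositional phrase
literal_var = [6.0, 2.5, 1.5]   # its variant appears in similar contexts

idiom_fixedness   = cosine(idiom_vec, idiom_var)
literal_fixedness = cosine(literal_vec, literal_var)
```

A low similarity between a phrase and its variants (high fixedness) is the signal used to flag the phrase as idiomatic.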
10

Kof, Leonid. "Text analysis for requirements engineering: application of computational linguistics". Saarbrücken : VDM Verl. Dr. Müller, 2007. http://deposit.d-nb.de/cgi-bin/dokserv?id=3021639&prov=M&dok_var=1&dok_ext=htm.

11

Gao, Lili. "Applications of Machine Learning and Computational Linguistics in Financial Economics". Research Showcase @ CMU, 2016. http://repository.cmu.edu/dissertations/815.

Abstract
In the world of financial economics, we have abundant text data. Articles in the Wall Street Journal and on Bloomberg Terminals, corporate SEC filings, earnings-call transcripts, social media messages, etc. all contain ample information about financial markets and investor behaviors. Extracting meaningful signals from unstructured and high-dimensional text data is not an easy task. However, with the development of machine learning and computational linguistic techniques, the tasks of processing and statistically analyzing textual documents can be accomplished, and many applications of statistical text analysis in social sciences have proven to be successful.
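One of the simplest such signals is a lexicon-based tone score over a document. The word lists below are invented placeholders for the finance-specific lexicons used in this literature:

```python
import re

# invented polarity word lists standing in for a finance sentiment lexicon
POSITIVE = {"growth", "beat", "record", "strong"}
NEGATIVE = {"loss", "miss", "decline", "weak", "impairment"}

def tone(text):
    # net tone: (positive hits - negative hits) / total tokens
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / max(len(tokens), 1)

score = tone("Record revenue and strong growth, despite a small decline in margins.")
```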
12

Jin, Lifeng. "Computational Modeling of Syntax Acquisition with Cognitive Constraints". The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1594934826359118.

13

Matthews, Clive Andrew. "French gender attribution as a computational system". Thesis, University of East Anglia, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.301870.

14

Heiberg, Andrea Jeanine. "Features in optimality theory: A computational model". Diss., The University of Arizona, 1999. http://hdl.handle.net/10150/288983.

Abstract
This dissertation presents a computational model of Optimality Theory (OT) (Prince and Smolensky 1993). The model provides an efficient solution to the problem of candidate generation and evaluation, and is demonstrated for the realm of phonological features. Explicit object-oriented implementations are proposed for autosegmental representations (Goldsmith 1976 and many others) and violable OT constraints and Gen operations on autosegmental representations. Previous computational models of OT (Ellison 1995, Tesar 1995, Eisner 1997, Hammond 1997, Karttunen 1998) have not dealt in depth with autosegmental representations. The proposed model provides a full treatment of autosegmental representations and constraints on autosegmental representations (Akinlabi 1996, Archangeli and Pulleyblank 1994, Ito, Mester, and Padgett 1995, Kirchner 1993, Padgett 1995, Pulleyblank 1993, 1996, 1998). Implementing Gen, the candidate generation component of OT, is a seemingly intractable problem. Gen in principle performs unlimited insertion; therefore, it may produce an infinite candidate set. For autosegmental representations, however, it is not necessary to think of Gen as infinite. The Obligatory Contour Principle (Leben 1973, McCarthy 1979, 1986) restricts the number of tokens of any one feature type in a single representation; hence, Gen for autosegmental features is finite. However, a finite Gen may produce a candidate set of exponential size. Consider an input representation with four anchors for each of five features: there are (2⁴ + 1)⁵, more than one million, candidates for such an input. The proposed model implements a method for significantly reducing the exponential size of the candidate set. Instead of first creating all candidates (Gen) and then evaluating them against the constraint hierarchy (Eval), candidate creation and evaluation are interleaved (cf. Eisner 1997, Hammond 1997) in a Gen-Eval loop. 
At each pass through the Gen-Eval loop, Gen operations apply to create the minimal number of candidates needed for constraint evaluation; this candidate set is evaluated and culled, and the set of Gen operations is reduced. The loop continues until the hierarchy is exhausted; the remaining candidate(s) are optimal. In providing explicit implementations of autosegmental representations, constraints, and Gen operations, the model provides a coherent view of autosegmental theory, Optimality Theory, and the interaction between the two.
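The candidate-set arithmetic and the interleaved Gen-Eval idea can be sketched directly. The toy constraints below are invented; they only illustrate the cull-per-constraint loop, not the dissertation's autosegmental Gen operations:

```python
# candidate-set size for the example in the abstract: 4 anchors x 5 features,
# each feature having 2**4 anchor subsets plus a "no token" option
n_candidates = (2 ** 4 + 1) ** 5    # = 17 ** 5

# interleaved Gen-Eval: instead of enumerating all candidates first,
# evaluate constraint by constraint and cull losers at each rank
def eval_loop(candidates, constraints):
    survivors = list(candidates)
    for constraint in constraints:          # ranked highest to lowest
        best = min(constraint(c) for c in survivors)
        survivors = [c for c in survivors if constraint(c) == best]
        if len(survivors) == 1:
            break                           # hierarchy decided early
    return survivors

# toy tableau: prefer consonant-initial candidates, then shorter ones
optimal = eval_loop(["ta", "tat", "a"],
                    [lambda c: c[0] in "aeiou", lambda c: len(c)])
```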
15

Evans, Owain Rhys. "Bayesian computational models for inferring preferences". Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/101522.

Abstract
Thesis: Ph. D. in Linguistics, Massachusetts Institute of Technology, Department of Linguistics and Philosophy, 2015.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 130-131).
This thesis is about learning the preferences of humans from observations of their choices. It builds on work in economics and decision theory (e.g. utility theory, revealed preference, utilities over bundles), Machine Learning (inverse reinforcement learning), and cognitive science (theory of mind and inverse planning). Chapter 1 lays the conceptual groundwork for the thesis and introduces key challenges for learning preferences that motivate chapters 2 and 3. I adopt a technical definition of 'preference' that is appropriate for inferring preferences from choices. I consider what class of objects preferences should be defined over. I discuss the distinction between actual preferences and informed preferences and the distinction between basic/intrinsic and derived/instrumental preferences. Chapter 2 focuses on the challenge of human 'suboptimality'. A person's choices are a function of their beliefs and plans, as well as their preferences. If they have inaccurate beliefs or make inefficient plans, then it will generally be more difficult to infer their preferences from choices. It is also more difficult if some of their beliefs might be inaccurate and some of their plans might be inefficient. I develop models for learning the preferences of agents subject to false beliefs and to time inconsistency. I use probabilistic programming to provide a concise, extendable implementation of preference inference for suboptimal agents. Agents performing suboptimal sequential planning are represented as functional programs. Chapter 3 considers how preferences vary under different combinations (or 'compositions') of outcomes. I use simple mathematical functional forms to model composition. These forms are standard in microeconomics, where the outcomes in question are quantities of goods or services. These goods may provide the same purpose (and be substitutes for one another). Alternatively, they may combine together to perform some useful function (as with complements).
I implement Bayesian inference for learning the preferences of agents making choices between different combinations of goods. I compare this procedure to empirical data for two different applications.
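The preference-inference setup can be illustrated with a minimal grid-posterior sketch, assuming a single preference weight in a softmax (logistic) choice model and invented observations; the thesis itself uses far richer probabilistic programs:

```python
import math

# softmax (logistic) choice model: P(choose a over b | w) depends on the
# utility difference scaled by a single preference weight w
def choice_prob(w, u_chosen, u_rejected):
    return 1.0 / (1.0 + math.exp(-w * (u_chosen - u_rejected)))

# invented observations: (utility of chosen option, utility of rejected option)
observations = [(3.0, 1.0), (2.5, 1.0), (2.0, 3.0)]

grid = [i / 10 for i in range(31)]          # candidate weights 0.0 .. 3.0
posterior = []
for w in grid:                              # uniform prior over the grid
    lik = 1.0
    for u_a, u_b in observations:
        lik *= choice_prob(w, u_a, u_b)
    posterior.append(lik)
z = sum(posterior)
posterior = [p / z for p in posterior]      # normalize to a distribution
map_w = grid[max(range(len(grid)), key=lambda i: posterior[i])]
```

The mostly utility-consistent choices pull the posterior toward a positive weight, while the one "suboptimal" choice flattens it, which is the basic tension Chapter 2 addresses.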
16

Chan, Ching Lap. "Semantic annotation in knowledge engineering, e-learning and computational linguistics". Thesis, City University London, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.576943.

Abstract
In this work, a comprehensive study of semantic annotation was carried out at an early stage. The study focuses on the annotation requirements of human knowledge acquisition in knowledge engineering, e-learning and computational linguistics. The study found that annotation of natural languages for linguistic analysis creates complicated data structures. Due to this complexity, almost all existing annotation schemes are designed to support only one application domain at a time. Discovery of new knowledge by means of cross-domain text analysis is limited by the capability of these annotation schemes. To address this problem, a new general-purpose annotation archival scheme has been developed to (1) enable true cross-domain data analysis in knowledge engineering, e-learning and computational linguistics, and (2) organize the complex structure of human knowledge annotation in an accessible manner, so that it can be analyzed in multiple layers through retrieval, search, visualization, etc. To further verify the contributions of the new semantic annotation scheme in real applications, experiments have been carried out in several areas, namely (1) collaborative retrieval of complex linguistic information, (2) computer-assisted production of learning material and (3) relevancy comparison between texts. In (1), the annotation scheme is applied in a cloud-based platform for hosting parallel multilingual corpora, leading to new applications such as computer-assisted pattern visualization, speech analysis, speech-to-text transcription and statistical analysis. In (2), the annotation scheme supports applications that produce reader-friendly learning material suites for teachers, and as a result improve learning quality. In (3), the annotation scheme supports a text comparison platform that carries out writing assessment semantically.
17

Godby, Carol Jean. "A Computational Study of Lexicalized Noun Phrases in English". The Ohio State University, 2002. http://rave.ohiolink.edu/etdc/view?acc_num=osu1017343683.

18

Cheng, Chi Wa. "Probabilistic topic modeling and classification probabilistic PCA for text corpora". HKBU Institutional Repository, 2011. http://repository.hkbu.edu.hk/etd_ra/1263.

19

Phillips, Aaron B. "Modeling Relevance in Statistical Machine Translation: Scoring Alignment, Context, and Annotations of Translation Instances". Research Showcase @ CMU, 2012. http://repository.cmu.edu/dissertations/134.

Abstract
Machine translation has advanced considerably in recent years, primarily due to the availability of larger datasets. However, one cannot rely on the availability of copious, high-quality bilingual training data. In this work, we improve upon the state-of-the-art in machine translation with an instance-based model that scores each instance of translation in the corpus. A translation instance reflects a source and target correspondence at one specific location in the corpus. The significance of this approach is that our model is able to capture that some instances of translation are more relevant than others. We have implemented this approach in Cunei, a new platform for machine translation that permits the scoring of instance-specific features. Leveraging per-instance alignment features, we demonstrate that Cunei can outperform Moses, a widely-used machine translation system. We then expand on this baseline system in three principal directions, each of which shows further gains. First, we score the source context of a translation instance in order to favor those that are most similar to the input sentence. Second, we apply similar techniques to score the target context of a translation instance and favor those that are most similar to the target hypothesis. Third, we provide a mechanism to mark up the corpus with annotations (e.g. statistical word clustering, part-of-speech labels, and parse trees) and then exploit this information to create additional per-instance similarity features. Each of these techniques explicitly takes advantage of the fact that our approach scores each instance of translation on demand after the input sentence is provided and while the target hypothesis is being generated; similar extensions would be impossible or quite difficult in existing machine translation systems. Ultimately, this approach provides a more flexible framework for integration of novel features that adapts better to new data.
In our experiments with German-English and Czech-English translation, the addition of instance-specific features consistently shows improvement.
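Scoring translation instances by source-context similarity (the first extension above) can be sketched with a simple set-overlap measure; the instances, contexts, and Jaccard similarity here are illustrative stand-ins for Cunei's per-instance features:

```python
def context_similarity(instance_context, input_context):
    # Jaccard overlap between the words around a stored translation instance
    # and the words around the phrase in the input sentence
    a, b = set(instance_context), set(input_context)
    return len(a & b) / len(a | b) if a | b else 0.0

# invented stored instances of English "bank" with their source-side contexts
instances = [
    {"target": "Ufer", "context": ["river", "along", "the"]},
    {"target": "Bank", "context": ["money", "deposit", "account"]},
]

input_context = ["walked", "along", "the", "river"]
best = max(instances,
           key=lambda inst: context_similarity(inst["context"], input_context))
```

Because scoring happens per instance at decoding time, the riverbank reading wins for this input even though both translations exist in the corpus.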
20

Tang, Haijiang. "Building phrase based language model from large corpus". View Abstract or Full-Text, 2002. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202002%20TANG.

Abstract
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2002.
Includes bibliographical references (leaves 74-79). Also available in electronic version. Access restricted to campus users.
21

Wang, Long Qi. "Translation accuracy comparison between machine translation and context-free machine natural language grammar–based translation". Thesis, University of Macau, 2018. http://umaclib3.umac.mo/record=b3950657.

22

Jensen, Sean. "A computational approach to the phonology of connected speech". Thesis, SOAS, University of London, 2000. http://eprints.soas.ac.uk/12710/.

Abstract
This thesis attempts to answer the question "How do we store and retrieve linguistic information?", and to show how this is intimately related to the question of connected speech phonology. The main discussion begins in Chapter One with a non-linguistic introduction to the problem of looking things up, and considers in particular the hashtable and its properties. The theme is developed directly in the latter part of the chapter, and further in Chapter Two, where it is proposed not only that the hashtable is the mechanism actually used by the language faculty, but also that phonology is that mechanism. Chapter Two develops in detail a radically new theory of phonology based on this hypothesis, and examines at length its ramifications. As a foundation for understanding how the phonological and the conceptual-semantic forms of utterances are related, we undertake a detailed study of the relationship between "form" and "meaning" in Chapter Three. We propose a general algorithm, which we claim is a real mechanism driving the acquisition of morphological knowledge, that can abstract and generalise these sorts of morphological relationships. We examine its computational properties, which are surprisingly favourable, and provide a detailed quasi-experimental case-study. By Chapter Four, all the theoretical necessities for describing and explaining what are traditionally believed to be phonological processes operating at the level of the sentence have been introduced. The chapter is used to show how the pieces of Chapters One, Two and Three fit together to tell this story. The chapter also offers some well-motivated speculation on new lines of research suggested by some of the computational results obtained throughout this work, and provides a meta-level framework for the future development of a full-scale theory of syntactic function and its acquisition.
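Since the argument leans on the hashtable and its properties, a minimal chained hashtable for looking up word forms may help; the toy hash function and lexicon entries are invented for illustration:

```python
def bucket_hash(key, n_buckets):
    # toy polynomial string hash; real tables use stronger mixing
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % n_buckets
    return h

class HashTable:
    # separate chaining: each bucket holds a list of [key, value] pairs
    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def put(self, key, value):
        bucket = self.buckets[bucket_hash(key, len(self.buckets))]
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value     # overwrite an existing entry
                return
        bucket.append([key, value])

    def get(self, key):
        for k, v in self.buckets[bucket_hash(key, len(self.buckets))]:
            if k == key:
                return v
        return None                  # lookup miss

lexicon = HashTable()
lexicon.put("cat", "/kaet/")
lexicon.put("dog", "/dog/")
```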
23

Williams, Sheila Margaret. "LexPhon : a computational implementation of aspects of lexical phonology". Thesis, University of Reading, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.357021.

24

Van, Genabith Josef Albert. "Declarative reformulations of DRT and their computational interpretation". Thesis, University of Essex, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.332403.

25

Jesus, Bianca Freitas de. "Say in Portuguese: a dialogue between translation, description and computational linguistics". Pontifícia Universidade Católica do Rio de Janeiro, 2016. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=27758@1.

Abstract
Professionals who translate from English to Portuguese often face a certain demand from clients when it comes to translating dialogues: diversifying the verbs that introduce reported speech, in an attempt to avoid repeating the verb dizer (say) in Portuguese. In order to help solve this problem, this study aims at developing a glossary of verbs that introduce reported speech. To reach that aim, we conducted a corpus-based descriptive study in order to compile a quotation-verb lexicon for the Portuguese language. The study compiled a wide collection of quotation verbs, established the patterns of usage in which these verbs are commonly found, and put forward a classification of these verbs into groups of meaning. This study proposes two main objectives, which have led to concrete contributions: (i) the elaboration of a reported-speech verbs glossary for translators of Portuguese, called DISSE (said, in Portuguese), and (ii) the description of this verb class in Portuguese, with a semantic approach and based on large corpora. As secondary contributions, already close to final implementation, we highlight (iii) the creation and public availability of annotated corpora, including a semantic annotation for reported-speech verbs, and (iv) collaboration in the preparation of systems capable of automatically identifying quotations in Portuguese-written texts.
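Identifying quotation verbs of the kind collected in DISSE can be sketched with a small pattern matcher; the verb list and the direct-speech pattern below are simplified assumptions, not the study's actual annotation pipeline:

```python
import re

# a tiny sample of Portuguese reported-speech verbs; DISSE is far larger
QUOTATION_VERBS = {"disse", "afirmou", "explicou", "gritou", "respondeu"}

def find_speech_verbs(text):
    # flag quotation verbs that appear right after a closing quote,
    # the pattern typical of direct speech: "...", disse Maria.
    hits = []
    for m in re.finditer(r'"[^"]*",?\s+(\w+)', text):
        verb = m.group(1).lower()
        if verb in QUOTATION_VERBS:
            hits.append(verb)
    return hits

hits = find_speech_verbs('"Vou sair agora", disse Maria. "Volto logo", explicou ela.')
```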
Citation styles: APA, Harvard, Vancouver, ISO, etc.
26

Belz, Anja. "Computational learning of finite-state models for natural language processing". Thesis, University of Sussex, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.311331.

Full text
27

Zhang, Lidan and 张丽丹. "Exploiting linguistic knowledge for statistical natural language processing". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2011. http://hub.hku.hk/bib/B46506299.

Full text
28

Hardie, Andrew. "The computational analysis of morphosyntactic categories in Urdu". Thesis, Lancaster University, 2004. http://eprints.lancs.ac.uk/106/.

Full text
Abstract
Urdu is a language of the Indo-Aryan family, widely spoken in India and Pakistan, and an important minority language in Europe, North America, and elsewhere. This thesis describes the development of a computer-based system for part-of-speech tagging of Urdu texts, consisting of a tagset, a set of tagging guidelines for manual tagging or post-editing, and the tagger itself. The tagset is defined in accordance with a set of design principles derived from a survey of good practice in the field of tagset design, including compliance with the EAGLES guidelines on morphosyntactic annotation. These guidelines are shown to be extensible to languages, such as Urdu, that are closely related to those for which they were originally devised. The description of Urdu grammar given by Schmidt (1999) is used as a model of the language for the purpose of tagset design. Manual tagging is undertaken using this tagset, a process which yields both a set of tagging guidelines and a set of manually tagged texts to serve as training data. A rule-based methodology is used to perform tagging in Urdu, and the justification for this choice is discussed. A suite of programs functioning together within the Unitag architecture is described. The system includes a tokeniser, an analyser (Urdutag) based on lexical look-up and word-form analysis, and a disambiguator (Unirule) which removes contextually inappropriate tags using a set of 274 rules. While the system's final performance is not particularly impressive, this is largely due to a paucity of training data leading to a small lexicon, rather than to any substantial flaw in the system.
29

Kotsifas, Dimitrios. "Intonation and sentence type interpretation in Greek : A production and perception approach". Thesis, University of Skövde, School of Humanities and Informatics, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-2960.

Full text
Abstract

This thesis examines the intonation patterns of Modern Greek with regard to different interpretations of the sentence types (declarative, interrogative, imperative).

Fourteen utterances were produced by Greek native speakers (two men and two women) to express various speech acts: STATEMENT, QUESTION, COMMAND and REQUEST.

The F0 curve for each utterance was acquired by means of the Wavesurfer tool, enabling an analysis of the pitch movements and their alignments.

After the F0 curves were analysed and illustrated using Excel, we were able to compare and group them, arriving at 5 different intonation patterns. A second-level comparison, based on the observation that some of the F0 curves were similar and differed only in their final pitch movement, reduced these to 3 fundamental categories of intonation patterns. Category I's main feature is a rising pitch movement aligned to the onset of the stressed syllables; it includes only sentences that denote Statement, so we can call it the STATEMENT category. Category II's main characteristic is a dipping pitch movement aligned to the head of the utterance, that is, the stressed syllable of the verb or of a particle that signifies negation (/min/, /den/); sentences meaning Command or Request belong to this category. Lastly, Category III's intonation pattern consists of peaking pitch movements aligned to the initial and final stressed syllables; interrogative sentences belong to this category regardless of their interpretation.

A secondary goal of the thesis is to examine to what extent intonation can be a reliable criterion for the “correct” interpretation of a sentence. Since the ratio between the number of utterances (14) and the number of distinct intonation patterns (5) is not 1:1, misunderstandings among speakers are always possible; this presumption is essentially confirmed by the results of our perception test with Greek native speakers. They were able to identify most of the speech acts expressed by the most common (default) sentence type (i.e. imperative sentences for COMMAND and interrogative for QUESTION), but there were combinations they had difficulty identifying, such as interrogative sentences denoting something other than QUESTION, e.g. REQUEST or STATEMENT. Finally, a perception test with Flemish speakers (native speakers of a language other than Greek) showed that they were more successful with sentences that meant STATEMENT and QUESTION, but could hardly identify an interrogative sentence meaning anything other than QUESTION, and also confused COMMAND with REQUEST. This implies that the intonation used to convey different interpretations is essentially language-dependent.

In conclusion, this study offers a description of the intonation patterns (based on pitch movements) of the 3 sentence types with 4 different interpretations. Our findings show that intonation in some cases (i.e. for sentences expressing COMMAND or STATEMENT) appears to be structure-independent, and in others structure-dependent (cf. the interrogative sentences). Additionally, the fact that negation can play an important role in the choice of intonation pattern (as shown for COMMAND and STATEMENT) could be considered a structure-dependent feature of intonation. This approach contrasts with the one long used in traditional grammar, according to which the structure alone (the sentence type) defines the meaning to be conveyed.

30

Filali, Karim. "Multi-dynamic Bayesian networks for machine translation and NLP /". Thesis, Connect to this title online; UW restricted, 2007. http://hdl.handle.net/1773/6857.

Full text
31

Tonkes, Bradley. "On the origins of linguistic structure : computational models of the evolution of language /". St. Lucia, Qld, 2001. http://www.library.uq.edu.au/pdfserve.php?image=thesisabs/absthe16529.pdf.

Full text
32

Bird, Steven. "Constraint-based phonology". Thesis, University of Edinburgh, 1991. http://hdl.handle.net/1842/23727.

Full text
33

Macias, Benjamin. "An incremental parser for government-binding theory". Thesis, University of Cambridge, 1991. https://www.repository.cam.ac.uk/handle/1810/251511.

Full text
34

Cahill, Lynne Julie. "Syllable-based morphology for natural language processing". Thesis, University of Sussex, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.386529.

Full text
Abstract
This thesis addresses the problem of accounting for morphological alternation within Natural Language Processing. It proposes an approach to morphology which is based on phonological concepts, in particular the syllable, in contrast to morpheme-based approaches which have standardly been used by both NLP and linguistics. It is argued that morpheme-based approaches, within both linguistics and NLP, grew out of the apparently purely affixational morphology of European languages, and especially English, but are less appropriate for non-affixational languages such as Arabic. Indeed, it is claimed that even accounts of those European languages miss important linguistic generalizations by ignoring more phonologically based alternations, such as umlaut in German and ablaut in English. To justify this approach, we present a wide range of data from languages as diverse as German and Rotuman. A formal language, MOLUSC, is described, which allows for the definition of declarative mappings between syllable sequences, and accounts of non-trivial fragments of the inflectional morphology of English, Arabic and Sanskrit are presented to demonstrate the capabilities of the language. A semantics for the language is defined, and the implementation of an interpreter is described. The thesis discusses theoretical (linguistic) issues, as well as implementational issues involved in the incorporation of MOLUSC into a larger lexicon system. The approach is contrasted with previous work in computational morphology, in particular finite-state morphology, and its relation to other work in the fields of morphology and phonology is also discussed.
35

Keenan, Francis Gerard. "Large vocabulary syntactic analysis for text recognition". Thesis, Nottingham Trent University, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.334311.

Full text
36

Ananiadou, Sofia. "Towards a methodology for automatic term recognition". Thesis, University of Manchester, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.277200.

Full text
37

Copestake, Ann Alicia. "The representation of lexical semantic information". Thesis, University of Sussex, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.359745.

Full text
38

Smith, Mark H. "Natural language generation in the LOLITA system: an engineering approach". Thesis, Durham University, 1995. http://etheses.dur.ac.uk/5457/.

Full text
Abstract
Natural Language Generation (NLG) is the automatic generation of Natural Language (NL) by computer in order to meet communicative goals. One aim of NL processing (NLP) is to allow more natural communication with a computer and, since communication is a two-way process, a NL system should be able to produce as well as interpret NL text. This research concerns the design and implementation of a NLG module for the LOLITA system. LOLITA (Large scale, Object-based, Linguistic Interactor, Translator and Analyser) is a general-purpose base NLP system which performs core NLP tasks and upon which prototype NL applications have been built. As part of this encompassing project, this research shares some of its properties and methodological assumptions: the LOLITA generator has been built following Natural Language Engineering principles, uses LOLITA's SemNet representation as input, and is implemented in the functional programming language Haskell. As in other generation systems, the adopted solution utilises a two-component architecture. However, in order to avoid problems which occur at the interface between traditional planning and realisation modules (known as the generation gap), the distribution of tasks between the planner and plan-realiser is different: the plan-realiser, in the absence of detailed planning instructions, must perform some tasks (such as the selection and ordering of content) which are more traditionally performed by a planner. This work largely concerns the development of the plan-realiser and its interface with the planner. Another aspect of the solution is the use of Abstract Transformations, which act on the SemNet input before realisation, leading to an increased ability to create paraphrases. The research has led to a practical working solution which has greatly increased the power of the LOLITA system.
The research also investigates how NLG systems can be evaluated and the advantages and disadvantages of using a functional language for the generation task.
39

Gwei, G. M. "New models of natural language for consultative computing". Thesis, University of Nottingham, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.378986.

Full text
40

Fischer, Markus. "Automatic generation of spatial configurations in user interfaces". Thesis, University of Brighton, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.263977.

Full text
41

Clough, Paul D. "Measuring text reuse". Thesis, University of Sheffield, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.275023.

Full text
42

Osborne, Miles. "Learning unification-based natural language grammars". Thesis, University of York, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.241010.

Full text
43

Lyon, Caroline. "The representation of natural language to enable neural networks to detect syntactic features". Thesis, University of Hertfordshire, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.387160.

Full text
44

Downey, Daniel J. G. "Knowledge representation in natural language : the wordicle - a subconscious connection". Thesis, Cranfield University, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.333137.

Full text
45

Bridge, Derek G. "Computing presuppositions in an incremental natural language processing system". Thesis, University of Cambridge, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.315811.

Full text
46

Clark, Stephen. "Class-based statistical models for lexical knowledge acquisition". Thesis, University of Sussex, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.341541.

Full text
Abstract
This thesis is about the automatic acquisition of a particular kind of lexical knowledge, namely the knowledge of which noun senses can fill the argument slots of predicates. The knowledge is represented using probabilities, which agrees with the intuition that there are no absolute constraints on the arguments of predicates, but that the constraints are satisfied to a certain degree; thus the problem of knowledge acquisition becomes the problem of probability estimation from corpus data. The problem with defining a probability model in terms of senses is that this involves a huge number of parameters, which results in a sparse data problem. The proposal here is to define a probability model over senses in a semantic hierarchy, and exploit the fact that senses can be grouped into classes consisting of semantically similar senses. A novel class-based estimation technique is developed, together with a procedure that determines a suitable class for a sense (given a predicate and argument position). The problem of determining a suitable class can be thought of as finding a suitable level of generalisation in the hierarchy. The generalisation procedure uses a statistical test to locate areas consisting of semantically similar senses, and, as well as being used for probability estimation, is also employed as part of a re-estimation algorithm for estimating sense frequencies from incomplete data. The rest of the thesis considers how the lexical knowledge can be used to resolve structural ambiguities, and provides empirical evaluations. The estimation techniques are first integrated into a parse selection system, using a probabilistic dependency model to rank the alternative parses for a sentence. Then, a PP-attachment task is used to provide an evaluation which is more focussed on the class-based estimation technique, and, finally, a pseudo disambiguation task is used to compare the estimation technique with alternative approaches.
47

Luk, Robert Wing Pong. "Stochastic transduction for English grapheme-to-phoneme conversion". Thesis, University of Southampton, 1992. https://eprints.soton.ac.uk/250076/.

Full text
48

Dale, Robert. "Generating referring expressions in a domain of objects and processes". Thesis, University of Edinburgh, 1989. http://hdl.handle.net/1842/32242.

Full text
49

Berman, Lucy. "Lewisian Properties and Natural Language Processing: Computational Linguistics from a Philosophical Perspective". Scholarship @ Claremont, 2019. https://scholarship.claremont.edu/cmc_theses/2200.

Full text
Abstract
Nothing seems more obvious than that our words have meaning. When people speak to each other, they exchange information through the use of a particular set of words. The words they say to each other, moreover, are about something. Yet this relation of “aboutness,” known as “reference,” is not quite as simple as it appears. In this thesis I will present two opposing arguments about the nature of our words and how they relate to the things around us. First, I will present Hilary Putnam’s argument, in which he examines the indeterminacy of reference, forcing us to conclude that we must abandon metaphysical realism. While Putnam considers his argument to be a refutation of non-epistemicism, David Lewis takes it to be a reductio, claiming Putnam’s conclusion is incredible. I will present Lewis’s response to Putnam, in which he accepts the challenge of demonstrating how Putnam’s argument fails and rescuing us from the abandonment of realism. In order to explain the determinacy of reference, Lewis introduces the concept of “natural properties.” In the final chapter of this thesis, I will propose another use for Lewisian properties. Namely, that of helping to minimize the gap between natural language processing and human communication.
50

Sun, Shupeng. "The clarity of disclosure in patents: An economic analysis using computational linguistics". Thesis, Queensland University of Technology, 2018. https://eprints.qut.edu.au/122181/1/Shupeng_Sun_Thesis.pdf.

Full text
Abstract
This thesis aims to explore and demonstrate the use of computational linguistic analysis to measure the "readability" of patent documents. By using readability as a proxy for the extent of disclosure in patent documents, this thesis studies whether patent applicants may strategically choose the disclosure level for their patents, and how the disclosure level would affect the patent acquisitions and patent examination. This thesis introduces a new method to the quantitative economic analysis of patents, and generates research results with important implications for patent policy and practice.