
Dissertations / Theses on the topic 'Probabilistic grammar'


Consult the top 28 dissertations / theses for your research on the topic 'Probabilistic grammar.'


1

Kwiatkowski, Thomas Mieczyslaw. "Probabilistic grammar induction from sentences and structured meanings." Thesis, University of Edinburgh, 2012. http://hdl.handle.net/1842/6190.

Full text
Abstract:
The meanings of natural language sentences may be represented as compositional logical-forms. Each word or lexicalised multiword element has an associated logical-form representing its meaning. Full sentential logical-forms are then composed from these word logical-forms via a syntactic parse of the sentence. This thesis develops two computational systems that learn both the word-meanings and the parsing model required to map sentences onto logical-forms from an example corpus of (sentence, logical-form) pairs.

One of these systems is designed to provide a general-purpose method of inducing semantic parsers for multiple languages and logical meaning representations. Semantic parsers map sentences onto logical representations of their meanings and may form an important part of any computational task that needs to interpret the meanings of sentences. The other system is designed to model the way in which a child learns the semantics and syntax of their first language. Here, logical-forms are used to represent the potentially ambiguous context in which child-directed utterances are spoken, and a psycholinguistically plausible training algorithm learns a probabilistic grammar that describes the target language. This computational modelling task is important as it can provide evidence for or against competing theories of how children learn their first language.

Both of the systems presented here are based upon two working hypotheses. The first is that the correct parse of any sentence in any language is contained in a set of possible parses defined in terms of the sentence itself, the sentence's logical-form and a small set of combinatory rule schemata. The second is that, given a corpus of (sentence, logical-form) pairs that each support a large number of possible parses according to the schemata mentioned above, it is possible to learn a probabilistic parsing model that accurately describes the target language.

The algorithm for semantic parser induction learns Combinatory Categorial Grammar (CCG) lexicons and discriminative probabilistic parsing models from corpora of (sentence, logical-form) pairs. This system is shown to achieve at or near state-of-the-art performance across multiple languages, logical meaning representations and domains. As the approach is not tied to any single natural or logical language, this system represents an important step towards widely applicable black-box methods for semantic parser induction. This thesis also develops an efficient representation of the CCG lexicon that separately stores language-specific syntactic regularities and domain-specific semantic knowledge. This factorised lexical representation improves the performance of CCG-based semantic parsers in sparse domains and also provides a potential basis for lexical expansion and domain adaptation for semantic parsers.

The algorithm for modelling child language acquisition learns a generative probabilistic model of CCG parses from sentences paired with a context set of potential logical-forms containing one correct entry and a number of distractors. The online learning algorithm used is intended to be psycholinguistically plausible and to assume as little information specific to the task of language learning as possible. It is shown that this algorithm learns an accurate parsing model despite making very few initial assumptions. It is also shown that the manner in which both word-meanings and syntactic rules are learnt accords with observations of both of these learning tasks in children, supporting a theory of language acquisition that builds upon the two working hypotheses stated above.
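To make the training setup concrete, here is a minimal illustrative sketch (our own toy example, not data or entries from the thesis) of a (sentence, logical-form) pair and the kind of factored CCG-style lexical entries such a system might induce; every name and category below is invented for illustration:

```python
# Toy (sentence, logical-form) training pair of the kind described above.
training_pair = ("flights to boston",
                 "lambda x. flight(x) and to(x, boston)")

# Hypothetical induced CCG-style lexicon: each word maps to a syntactic
# category and a logical form. A factored representation would store the
# category templates separately from the domain constants they pair with.
lexicon = {
    "flights": ("N",         "lambda x. flight(x)"),
    "to":      ("(N\\N)/NP", "lambda y. lambda f. lambda x. f(x) and to(x, y)"),
    "boston":  ("NP",        "boston"),
}
```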
2

Stüber, Torsten. "Consistency of Probabilistic Context-Free Grammars." Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2012. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-86943.

Full text
Abstract:
We present an algorithm for deciding whether an arbitrary proper probabilistic context-free grammar is consistent, i.e., whether the probability that a derivation terminates is one. Our procedure has time complexity $\mathcal{O}(n^3)$ in the unit-cost model of computation. Moreover, we develop a novel characterization of consistent probabilistic context-free grammars. A simple corollary of our result is that training methods for probabilistic context-free grammars that are based on maximum-likelihood estimation always yield consistent grammars.
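As a concrete illustration of the consistency property (using the standard branching-process criterion, not the thesis's own algorithm or characterization): a proper PCFG is consistent whenever the spectral radius of its first-moment matrix is strictly below one, as in this sketch:

```python
import numpy as np

# Toy PCFG: S -> S S (prob p) | 'a' (prob 1-p). One rewrite of S produces
# 2p copies of S in expectation, so the first-moment matrix is [[2p]].
def moment_matrix(rules, nonterminals):
    """rules: (lhs, rhs_symbols, prob). M[i, j] = expected count of
    nonterminal j produced when nonterminal i is rewritten once."""
    idx = {nt: i for i, nt in enumerate(nonterminals)}
    M = np.zeros((len(nonterminals), len(nonterminals)))
    for lhs, rhs, p in rules:
        for sym in rhs:
            if sym in idx:
                M[idx[lhs], idx[sym]] += p
    return M

p = 0.4
M = moment_matrix([("S", ["S", "S"], p), ("S", ["a"], 1 - p)], ["S"])
# Spectral radius < 1 is a sufficient condition for consistency; the
# boundary case (= 1) requires a more careful analysis.
print(max(abs(np.linalg.eigvals(M))) < 1)  # True here, since 2p = 0.8
```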
3

Afrin, Taniza. "Extraction of Basic Noun Phrases from Natural Language Using Statistical Context-Free Grammar." Thesis, Virginia Tech, 2001. http://hdl.handle.net/10919/33353.

Full text
Abstract:
The objective of this research was to extract simple noun phrases from natural language texts using two different grammars: a stochastic context-free grammar (SCFG) and a non-statistical context-free grammar (CFG). Precision and recall were calculated to determine how many precise and correct noun phrases were extracted using these two grammars. Several text files containing sentences from English natural language specifications were analyzed manually to obtain a test set of simple noun phrases. To obtain precision and recall, this test set of manually extracted noun phrases was compared with the sets of noun phrases extracted using the SCFG and the CFG. A probabilistic chart parser was developed by modifying a deterministic parallel chart parser. Extraction of simple noun phrases with the SCFG was accomplished using this probabilistic chart parser, a dictionary containing word probabilities along with meanings, context-free grammar rules associated with rule probabilities, and an algorithm to extract the most likely parses of a sentence. The probabilistic parsing algorithm and the algorithm to determine figures of merit were implemented in the C++ programming language.
Master of Science
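The evaluation described above reduces to comparing an extracted set of noun phrases against the manually built test set; a minimal sketch of that precision/recall computation (the phrases are invented for illustration):

```python
# Precision/recall of extracted noun phrases against a gold test set,
# as in the evaluation described above.
def precision_recall(extracted, gold):
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

gold = {"the parser", "a dictionary", "word probabilities"}
extracted = {"the parser", "word probabilities", "the meaning"}
print(precision_recall(extracted, gold))  # (2/3, 2/3)
```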
4

Hsu, Hsin-jen. "A neurophysiological study on probabilistic grammatical learning and sentence processing." Diss., University of Iowa, 2009. https://ir.uiowa.edu/etd/243.

Full text
Abstract:
Syntactic anomalies reliably elicit P600 effects in natural language processing. A survey of previous work converged on the conclusion that the mean amplitude of the P600 is associated with the goodness of fit of a target word with the expectation generated from the material that has already unfolded. Based on this characteristic of P600 effects, the current study looked for evidence of the influence of input statistics in shaping grammatical knowledge/representations, which in turn drives probabilistically based competition and expectation-generation processes in online sentence processing. An artificial grammar learning (AGL) task with four conditions varying in probabilities was used to test this hypothesis. Results from this task indicated graded mean amplitudes of the P600 effects across conditions, and the pattern of gradience is consistent with the variation in the input statistics. The use of the artificial language to simulate natural language learning was further justified by statistically indistinguishable P600 effects elicited in a natural language sentence processing (NLSP) task. Together, the results indicate that the same neural mechanisms are recruited both for syntactic processing of natural language stimuli and for sentence strings in an artificial language.
5

Brookes, James William Rowe. "Probabilistic and multivariate modelling in Latin grammar : the participle-auxiliary alternation as a case study." Thesis, University of Manchester, 2014. https://www.research.manchester.ac.uk/portal/en/theses/probabilistic-and-multivariate-modelling-in-latin-grammar-the-participleauxiliary-alternation-as-a-case-study(4ff5b912-c410-41f2-94f2-859eb1ce5b21).html.

Full text
Abstract:
Recent research has shown that language is sensitive to probabilities and to a whole host of multivariate conditioning factors. However, most of the research in this arena centres on the grammar of English, and, as yet, there is no statistical modelling of the grammar of Latin, studies of which have to date been largely philological. The rise of advanced statistical methodologies allows us to capture the underlying structure of the rich datasets which this corpus-only language can offer. This thesis remedies this deficit by applying probabilistic and multivariate models to a specific case study, namely the alternation of word order in Latin participle-auxiliary clusters (PACs), which alternate between participle-auxiliary order, as in mortuus est 'dead is', and auxiliary-participle order, as in est mortuus 'is dead'. The broad research questions explored in this thesis are the following: (i) To what extent are probabilistic models useful for and reflective of Latin syntax variation phenomena? (ii) What are the most useful statistical models to use? (iii) What types of linguistic variables influence variation? (iv) What theoretical implications and explanations do the statistical models suggest?

Against this backdrop, a dataset of 2409 PAC observations is extracted from Late Republican texts of the first century BC. The dataset is annotated for an "information space" of thirty-three predictor variables from various levels of linguistics: text- and lemma-based variability, prosody and phonology, grammar, semantics and pragmatics, and usage-based features such as frequency. The study exploits statistical tools such as generalized linear models and multilevel generalized linear models for the regression modelling of the binary categorical outcome. However, because of potential collinearity and the many predictor terms, amongst other issues, using these models to assess the joint effect of all predictors is particularly problematic. As such, the newer statistical toolkit of random forests is utilized for evaluating the relative contribution of each predictor.

Overall, it is found that Latin is indeed probabilistic in its grammar, and the conditioning factors that govern it are spread widely throughout the language space. It is also noted that probabilistic models, such as the ones used in this study, have practical applications in traditional areas of philology, including textual criticism and literary stylistics.
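A hedged sketch of the kind of modelling pipeline described above, using scikit-learn; the predictor names and values here are invented stand-ins for the thesis's thirty-three annotated variables:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Binary outcome: participle-auxiliary order (mortuus est) vs
# auxiliary-participle order (est mortuus). Predictors are hypothetical.
df = pd.DataFrame({
    "participle_syllables": [3, 2, 4, 1, 5, 2],              # prosody
    "clause_is_main":       [1, 0, 1, 1, 0, 0],              # grammar
    "lemma_log_frequency":  [0.8, 0.1, 0.5, 0.9, 0.2, 0.4],  # usage-based
    "order_is_part_aux":    [1, 0, 1, 1, 0, 0],              # outcome
})
X = df.drop(columns="order_is_part_aux")
y = df["order_is_part_aux"]

# Random forests tolerate collinear predictors and rank each predictor's
# relative contribution, as the thesis does for its full predictor set.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(dict(zip(X.columns, rf.feature_importances_)))
```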
6

Buys, Jan Moolman. "Probabilistic tree transducers for grammatical error correction." Thesis, Stellenbosch : Stellenbosch University, 2013. http://hdl.handle.net/10019.1/85592.

Full text
Abstract:
Thesis (MSc)--Stellenbosch University, 2013.
We investigate the application of weighted tree transducers to correcting grammatical errors in natural language. Weighted finite-state transducers (FSTs) have been used successfully in a wide range of natural language processing (NLP) tasks, even though the expressiveness of the linguistic transformations they perform is limited. Recently, there has been an increase in the use of weighted tree transducers and related formalisms that can express syntax-based natural language transformations in a probabilistic setting. The NLP task that we investigate is the automatic correction of grammar errors made by English language learners. In contrast to spelling correction, which can be performed with very high accuracy, the performance of grammar correction systems is still low for most error types. Commercial grammar correction systems mostly use rule-based methods. The most common approach in recent grammatical error correction research is to use statistical classifiers that make local decisions about the occurrence of specific error types. The approach that we investigate is related to a number of other approaches inspired by statistical machine translation (SMT) or based on language modelling. Corpora of language-learner writing annotated with error corrections are used as training data. Our baseline model is a noisy-channel FST model consisting of an n-gram language model and an FST error model, which performs word insertion, deletion and replacement operations. The tree transducer model we use to perform error correction is a weighted top-down tree-to-string transducer, formulated to perform transformations between parse trees of correct sentences and incorrect sentences. Using an algorithm developed for syntax-based SMT, transducer rules are extracted from training data in which the correct versions of sentences have been parsed. Rule weights are also estimated from the training data. Hypothesis sentences generated by the tree transducer are reranked using an n-gram language model. We perform experiments to evaluate the performance of different configurations of the proposed models. In our implementation an existing tree transducer toolkit is used. To make decoding time feasible, sentences are split into clauses and heuristic pruning is performed during decoding. We consider different modelling choices in the construction of transducer rules. The evaluation of our models is based on precision and recall. Experiments are performed to correct various error types on two learner corpora. The results show that our system is competitive with existing approaches on several error types.
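A minimal sketch of the noisy-channel idea behind the baseline described above (toy probabilities, not the thesis's trained models): a candidate correction is scored by combining a language-model probability with a channel (error-model) probability.

```python
import math

# Toy language-model and channel (error-model) log probabilities.
lm_logprob = {"he goes home": math.log(1e-4),
              "he go home":   math.log(1e-7)}
channel_logprob = {("he go home", "he goes home"): math.log(0.01),
                   ("he go home", "he go home"):   math.log(0.9)}

observed = "he go home"
# Noisy channel: argmax over candidates of P(candidate) * P(observed | candidate).
best = max(lm_logprob, key=lambda cand:
           lm_logprob[cand] + channel_logprob[(observed, cand)])
print(best)  # "he goes home"
```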
7

Shan, Yin. "Program distribution estimation with grammar models." Awarded by: University of New South Wales - Australian Defence Force Academy, School of Information Technology and Electrical Engineering, 2005. http://handle.unsw.edu.au/1959.4/38737.

Full text
Abstract:
This thesis studies grammar-based approaches in the application of Estimation of Distribution Algorithms (EDAs) to the tree representation widely used in Genetic Programming (GP). Although EDAs are becoming one of the most active fields in Evolutionary Computation (EC), the solution representation in most EDAs is a Genetic Algorithm (GA)-style linear representation. The more complex tree representations, resembling GP, have received only limited exploration. This is unfortunate, because tree representations provide a natural and expressive way of representing solutions for many problems. This thesis aims to help fill this gap, exploring grammar-based approaches to extending EDAs to GP-style tree representations. The thesis firstly provides a comprehensive survey of current research on EDAs, with emphasis on EDAs with GP-style tree representations. It attempts to clarify the relationship between EDAs with conventional linear representations and those with a GP-style tree representation, and to reveal the unique difficulties which face this research. Secondly, the thesis identifies desirable properties of probabilistic models for EDAs with GP-style tree representation, and derives the PRODIGY framework as a consequence. Thirdly, following the PRODIGY framework, three methods are proposed. The first method is Program Evolution with Explicit Learning (PEEL). Its incremental general-to-specific grammar learning method balances the effectiveness and efficiency of grammar learning. The second method is Grammar Model-based Program Evolution (GMPE). GMPE realises the PRODIGY framework by introducing elegant inference methods from the formal grammar field. GMPE provides good performance on some problems, and also provides a means to better understand some aspects of conventional GP, especially the building block hypothesis. The third method is Swift GMPE (sGMPE), an extension of GMPE aimed at reducing the computational cost. Fourthly, a more accurate Minimum Message Length metric for grammar learning in PRODIGY is derived in this thesis. This metric leads to improved performance in the GMPE system, and may also be useful in grammar learning in general, as well as in the learning of other probabilistic graphical models.
8

Pinnow, Eleni. "The role of probabilistic phonotactics in the recognition of reduced pseudowords." Diss., Online access via UMI, 2009.

Find full text
9

Mora, Randall P., and Jerry L. Hill. "Service-Based Approach for Intelligent Agent Frameworks." International Foundation for Telemetering, 2011. http://hdl.handle.net/10150/595661.

Full text
Abstract:
ITC/USA 2011 Conference Proceedings / The Forty-Seventh Annual International Telemetering Conference and Technical Exhibition / October 24-27, 2011 / Bally's Las Vegas, Las Vegas, Nevada
This paper describes a service-based Intelligent Agent (IA) approach for machine learning and data mining of distributed heterogeneous data streams. We focus on an open architecture framework that enables the programmer/analyst to build an IA suite for mining, examining and evaluating heterogeneous data for semantic representations, while iteratively building the probabilistic model in real-time to improve predictability. The Framework facilitates model development and evaluation while delivering the capability to tune machine learning algorithms and models to deliver increasingly favorable scores prior to production deployment. The IA Framework focuses on open standard interoperability, simplifying integration into existing environments.
10

Torres, Parra Jimena Cecilia. "A Perception Based Question-Answering Architecture Derived from Computing with Words." Available to subscribers only, 2009. http://proquest.umi.com/pqdweb?did=1967797581&sid=1&Fmt=2&clientId=1509&RQT=309&VName=PQD.

Full text
11

Plum, Guenter Arnold. "Text and Contextual Conditioning in Spoken English: A genre approach." Thesis, The University of Sydney, 1988. http://hdl.handle.net/2123/608.

Full text
Abstract:
This study brings together two approaches to linguistic variation, Hallidayan systemic-functional grammar and Labovian variation theory, and in doing so brings together a functional interpretation of language and its empirical investigation in its social context. The study reports on an empirical investigation of the concept of text. The investigation proceeds on the basis of a corpus of texts gathered in sociolinguistic interviews with fifty adult speakers of Australian English in Sydney. The total corpus, accounted for in terms of text type or genre, numbers 420 texts of varying length; 125 of these, produced in response to four narrative questions, are investigated in greater detail in respect both of the types of text they constitute and of some of their linguistic realisations. These largely narrative-type texts, which represent between two and three hours of spoken English and total approximately 53,000 words, are presented in a second volume analysed in terms of their textual or generic structure as well as their realisation at the level of the clause complex.

The study explores in some detail models of register and genre developed within systemic-functional linguistics, adopting a genre model developed by J.R. Martin and others working within his model which foregrounds the notion that all aspects of the system(s) involved are related to one another probabilistically. In order to investigate the concept of text in actual discourse under conditions which permit us to become sufficiently confident of our understanding of it to proceed to generalisations about text and its contextual conditioning in spoken discourse, we turn to Labovian methods of sociolinguistic inquiry, i.e. to quantitative methods of quantifying linguistic choice. The study takes the sociolinguistic interview as pioneered by Labov in his study of phonological variation in New York City and develops it for the purpose of investigating textual variation. The question of methodology constitutes a substantial part of the study, contributing in the process to a much greater understanding of the very phenomenon of text in discourse, for example by addressing the question of the feasibility of operationalising a concept of text in the context of spoken discourse.

The narrative-type texts investigated in further detail were found to range on a continuum from the most experientially oriented texts, such as procedure and recount, at one end, through the classic narrative of personal experience and anecdote, to the increasingly interpersonally oriented exemplum and observation, both of which become interpretative of the real world in contrast to the straightforwardly representational slant taken on the same experience by the more experientially oriented texts. The explanation for the generic variation along this continuum must be sought in a system of generic choice which is essentially cultural.

A quantitative analysis of clausal theme and clause complex-type relations was carried out, the latter by means of log-linear analysis, in order to investigate their correlation with generic structure. While it was possible to relate the choice of theme to the particular stages of generic structures, clause complex-type relations are chosen too infrequently to be related to stages and were thus related to genres as a whole. We find that while by and large the choice of theme correlates well with different generic stages, it only discriminates between different genres, i.e. generic structures in toto, for those genres which are maximally different. Similarly, investigating the two choices in the principal systems involved in the organisation of the clause complex, i.e. the choice of taxis (parataxis vs. hypotaxis) and the (grammatically independent) choice of logico-semantic relations (expansion vs. projection), we find that both choices discriminate better between types more distant on the narrative continuum.

The log-linear analysis of clause complex-type relations also permitted the investigation of the social characteristics of speakers. We found that the choice of logico-semantic relations correlates with genre and question, while the choice of taxis correlates with a speaker's sex and membership of a social group (in addition to genre). Parataxis is favoured by men and by members of the group lowest in the social hierarchy. Age, on the other hand, is not significant in the choice of taxis at all. In other words, since social factors are clearly shown to be significant in the making of abstract grammatical choices where they cannot be explained in terms of the functional organisation of text, we conclude that social factors must be made part of a model of text in order to fully account for its contextual conditioning. The study demonstrates that an understanding of the linguistic properties of discourse requires empirical study and, conversely, that it is possible to study discourse empirically without relaxing the standards of scientific inquiry.
13

MATSUBARA, Shigeki, and Yoshihide KATO. "Incremental Parsing with Adjoining Operation." Institute of Electronics, Information and Communication Engineers, 2009. http://hdl.handle.net/2237/15001.

Full text
14

Kalantari, John I. "A general purpose artificial intelligence framework for the analysis of complex biological systems." Diss., University of Iowa, 2017. https://ir.uiowa.edu/etd/5953.

Full text
Abstract:
This thesis encompasses research on Artificial Intelligence in support of automating scientific discovery in the fields of biology and medicine. At the core of this research is the ongoing development of a general-purpose artificial intelligence framework emulating various facets of human-level intelligence necessary for building cross-domain knowledge that may lead to new insights and discoveries. To learn and build models in a data-driven manner, we develop a general-purpose learning framework called Syntactic Nonparametric Analysis of Complex Systems (SYNACX), which uses tools from Bayesian nonparametric inference to learn the statistical and syntactic properties of biological phenomena from sequence data. We show that the models learned by SYNACX offer performance comparable to that of standard neural network architectures. For complex biological systems or processes consisting of several heterogeneous components with spatio-temporal interdependencies across multiple scales, learning frameworks like SYNACX can become unwieldy due to the resultant combinatorial complexity. Thus we also investigate ways to robustly reduce data dimensionality by introducing a new data abstraction. In particular, we extend traditional string and graph grammars in a new modeling formalism which we call Simplicial Grammar. This formalism integrates the topological properties of the simplicial complex with the expressive power of stochastic grammars in a computational abstraction with which we can decompose complex system behavior into a finite set of modular grammar rules which parsimoniously describe the spatial/temporal structure and dynamics of patterns inferred from sequence data.
15

Aycinena, Margaret Aida. "Probabilistic geometric grammars for object recognition." Thesis, Massachusetts Institute of Technology, 2005. http://hdl.handle.net/1721.1/34640.

Full text
Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.
Includes bibliographical references (p. 121-123).
This thesis presents a generative three-dimensional (3D) representation and recognition framework for classes of objects. The framework uses probabilistic grammars to represent object classes recursively in terms of their parts, thereby exploiting the hierarchical and substitutive structure inherent to many types of objects. The framework models the 3D geometric characteristics of object parts using multivariate conditional Gaussians over dimensions, position, and rotation. I present algorithms for learning geometric models and rule probabilities given parsed 3D examples and a fixed grammar. I also present a parsing algorithm for classifying unlabeled, unparsed 3D examples given a geometric grammar. Finally, I describe the results of a set of experiments designed to investigate the chosen model representation of the framework.
by Margaret Aida Aycinena.
S.M.
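A hedged sketch of the scoring idea in the abstract above (all numbers invented, not from the thesis): a part's geometry is scored under a multivariate Gaussian, and a parse combines rule probabilities with such geometric likelihoods.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical geometric model for a "chair leg": a Gaussian over
# [length, x_offset, y_offset], with an invented mean and covariance.
leg_model = multivariate_normal(mean=[0.45, 0.10, 0.10],
                                cov=np.diag([0.01, 0.02, 0.02]))
rule_logprob = np.log(0.7)  # e.g. P(Chair -> Seat Back Legs), invented

observed_leg = [0.43, 0.12, 0.09]
log_score = rule_logprob + leg_model.logpdf(observed_leg)
print(log_score)  # higher means a better fit under this part model
```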
16

Carroll, Glenn R. "Learning probabilistic grammars for language modeling." [S.l.]: Universität Stuttgart, Fakultätsübergreifend / Sonstige Einrichtung, 1995. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB7084251.

Full text
17

Álvaro, Muñoz Francisco. "Mathematical Expression Recognition based on Probabilistic Grammars." Doctoral thesis, Universitat Politècnica de València, 2015. http://hdl.handle.net/10251/51665.

Full text
Abstract:
Mathematical notation is well known and used all over the world. Humankind has evolved from simple methods for representing counts to today's well-defined mathematical notation, able to account for complex problems. Furthermore, mathematical expressions constitute a universal language in scientific fields, and many information resources containing mathematics have been created during the last decades. However, in order to efficiently access all that information, scientific documents have to be digitized or produced directly in electronic formats. Although most people are able to understand and produce mathematical information, introducing math expressions into electronic devices requires learning specific notations or using editors. Automatic recognition of mathematical expressions aims at filling this gap between the knowledge of a person and the input accepted by computers. This way, printed documents containing math expressions could be automatically digitized, and handwriting could be used for direct input of math notation into electronic devices.

This thesis is devoted to developing an approach for mathematical expression recognition. In this document we propose an approach for recognizing any type of mathematical expression (printed or handwritten) based on probabilistic grammars. To this end, we develop a formal statistical framework from which several probability distributions are derived. Throughout the document, we deal with the definition and estimation of all these probabilistic sources of information. Finally, we define the parsing algorithm that globally computes the most probable mathematical expression for a given input according to the statistical framework.

An important point in this study is to provide objective performance evaluation and to report results using public data and standard metrics. We inspected the problems of automatic evaluation in this field and looked for the best solutions. We also report several experiments using public databases, and we participated in several international competitions. Furthermore, we have released most of the software developed in this thesis as open source.

We also explore some of the applications of mathematical expression recognition. In addition to the direct applications of transcription and digitization, we report two important proposals. First, we developed mucaptcha, a method to tell humans and computers apart by means of math handwriting input, which represents a novel application of math expression recognition. Second, we tackled the problem of layout analysis of structured documents using the statistical framework developed in this thesis, because both are two-dimensional problems that can be modeled with probabilistic grammars.

The approach developed in this thesis for mathematical expression recognition has obtained good results at different levels. It has produced several scientific publications in international conferences and journals, and has been awarded in international competitions.
Álvaro Muñoz, F. (2015). Mathematical Expression Recognition based on Probabilistic Grammars [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/51665
18

Lee, Wing Kuen. "Interpreting tables in text using probabilistic two-dimensional context-free grammars /." View abstract or full-text, 2005. http://library.ust.hk/cgi/db/thesis.pl?COMP%202005%20LEEW.

Full text
19

Scicluna, James. "Grammatical inference of probabilistic context-free grammars." Nantes, 2014. http://www.theses.fr/2014NANT2071.

Full text
Abstract:
Grammatical inference consists in learning, from data drawn from a language, a grammar capable of explaining or generating the language in question. This work concerns probabilistic context-free grammars, which are more powerful than the regular grammars that are the object of most work in grammatical inference. The learning is unsupervised: no structural information is known. The work includes a state of the art covering grammatical inference, probabilistic grammars, and the classes of grammars that admit distributional learning. We then study several decision problems concerning distances between distributions and show that, in general, these problems are undecidable. Next, we give a mathematical description of the class of grammars of interest. The core of the thesis concerns the development of the COMINO algorithm, the analysis of its properties, and the empirical study of its capabilities. The algorithm proceeds in three phases: in the first, an equivalence relation over substrings is computed; in the second, a solver is used to select a minimal number of classes; finally, the classes become the nonterminals of a grammar whose rule weights are estimated from the sample. Experimental results demonstrate the robustness of the approach but also show its limits on real natural-language data.
Probabilistic Context-Free Grammars (PCFGs) are formal statistical models which describe probability distributions over strings and over tree structures of those strings. Grammatical inference is a subfield of machine learning where the task is to learn automata or grammars (such as PCFGs) from information about their languages. In this thesis, we are interested in grammatical inference of PCFGs from text. There are various applications for this problem, chief amongst which are unsupervised parsing and language modelling in Natural Language Processing and RNA secondary-structure prediction in Bioinformatics. PCFG inference is, however, a difficult problem for a variety of reasons, and in spite of its importance for various applications, only a few positive results have so far been obtained for it. Our main contribution in this thesis is a practical PCFG learning algorithm with some proven properties, based on a principled approach. We define a new subclass of PCFGs (very similar to the one defined in (Clark, 2010)) and use distributional learning and MDL-based techniques in order to learn this class of grammars. We obtain competitive results in experiments that evaluate unsupervised parsing and language modelling. A minor contribution of this thesis is a compendium of undecidability results for distances between PCFGs, along with two positive results on PCFGs. Having such results can help in the process of finding learning algorithms for PCFGs.
20

Beneš, Vojtěch. "Syntaktický analyzátor pro český jazyk." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236022.

Full text
Abstract:
This Master's thesis describes the theoretical basics, solution design, and implementation of a constituency (phrasal) parser for the Czech language, based on the association of parts of speech into phrases. The created program works with a manually built and annotated Czech sample corpus to generate a probabilistic context-free grammar through machine learning at runtime. The parser implementation, based on an extended CKY algorithm, then decides for an input Czech sentence whether the sentence can be generated by the created grammar and, in the positive case, constructs the most probable derivation tree. This result is then compared with the expected parse to evaluate the constituency parser's success rate.
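A compact sketch of probabilistic CKY over a toy grammar in Chomsky normal form (not the thesis's learned Czech grammar); it returns the log probability of the most probable derivation, or None when the input cannot be generated:

```python
import math

LEXICAL = {("NP", "dogs"): 1.0, ("VP", "bark"): 0.7, ("NP", "bark"): 0.3}
BINARY = {("S", "NP", "VP"): 1.0}

def cky(words):
    n = len(words)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                      # lexical cells
        for (nt, word), p in LEXICAL.items():
            if word == w:
                chart[i][i + 1][nt] = math.log(p)
    for span in range(2, n + 1):                       # longer spans
        for i in range(n - span + 1):
            k = i + span
            for j in range(i + 1, k):                  # split points
                for (lhs, b, c), p in BINARY.items():
                    if b in chart[i][j] and c in chart[j][k]:
                        s = math.log(p) + chart[i][j][b] + chart[j][k][c]
                        if s > chart[i][k].get(lhs, -math.inf):
                            chart[i][k][lhs] = s       # keep the best
    return chart[0][n].get("S")  # None if not generated by the grammar

print(cky("dogs bark".split()))  # log(1.0 * 1.0 * 0.7)
```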
21

Bensalem, Raja. "Construction de ressources linguistiques arabes à l’aide du formalisme de grammaires de propriétés en intégrant des mécanismes de contrôle." Thesis, Aix-Marseille, 2017. http://www.theses.fr/2017AIXM0503/document.

Full text
Abstract:
The building of Arabic linguistic resources rich in syntactic information is a major issue for the development of new machine processing tools. We propose in this thesis to create an Arabic treebank that integrates a new type of information, based on the Property Grammar formalism. A syntactic property is a relation between two units of a given syntactic structure. This grammar is automatically induced from the Arabic treebank ATB. We enriched this resource with the property representations of this grammar while retaining its qualities. We also applied this enrichment to the parsing results of a state-of-the-art analyzer, the Stanford Parser, offering the possibility of an evaluation based on a set of measures computed from this resource. The tags of the units in this grammar are structured according to a type hierarchy, which permits varying the granularity level of these units and, consequently, the precision level of the information. Using this grammar, we were thus able to construct other Arabic linguistic resources. Indeed, on the basis of this new resource, we developed a probabilistic syntactic parser based on syntactic properties, the first of its kind applied to Arabic. A probabilistic lexicalized property grammar is part of its learning model, so as to positively affect the parsing result and to characterize the resulting syntactic structures with the properties of this model. Finally, we evaluated the parsing results of this approach by comparing them to those of the Stanford Parser.
22

Toussenel, François. "Étiquetage probabiliste avec un grand jeu d'étiquettes en vue de l'analyse syntaxique complète." Paris 7, 2005. http://www.theses.fr/2005PA070087.

Full text
Abstract:
We explore the limits of the approach of supertagging using a hidden Markov model as a pre-processing step before full parsing, using a large Lexicalized Tree Adjoining Grammar automatically extracted from a treebank. We identify two major sources of difficulty in this approach (statistical issues due to heavy data sparseness, and a clash between the global nature of information provided by the supertags and the local vision of the hidden Markov model), and then we explore three possible ways to improve the tagging step. The first two (generalization of learning data and underspecification) make use of a feature structure to represent the supertags. The third way addresses the second source of difficulty and relies on the structure of the supertags to prune the sequences of supertags which can never result in a full parse
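For concreteness, a minimal Viterbi sketch of the HMM (bigram) supertagging step discussed above, with a toy two-tag set and invented probabilities; real supertag sets extracted from a treebank number in the thousands, which is exactly the sparseness problem the thesis examines:

```python
import math

TAGS = ["t_NP", "t_TV"]                      # toy supertag inventory
start = {"t_NP": 0.7, "t_TV": 0.3}
trans = {("t_NP", "t_NP"): 0.4, ("t_NP", "t_TV"): 0.6,
         ("t_TV", "t_NP"): 0.9, ("t_TV", "t_TV"): 0.1}
emit = {("t_NP", "time"): 0.20, ("t_TV", "time"): 0.05,
        ("t_NP", "flies"): 0.05, ("t_TV", "flies"): 0.30}

def viterbi(words):
    # v[i][t] = (best log prob of a tag sequence ending in t, that sequence)
    v = [{t: (math.log(start[t] * emit[(t, words[0])]), [t]) for t in TAGS}]
    for w in words[1:]:
        row = {}
        for t in TAGS:
            prev = max(TAGS, key=lambda s: v[-1][s][0] + math.log(trans[(s, t)]))
            score = (v[-1][prev][0] + math.log(trans[(prev, t)])
                     + math.log(emit[(t, w)]))
            row[t] = (score, v[-1][prev][1] + [t])
        v.append(row)
    return max(v[-1].values())[1]

print(viterbi(["time", "flies"]))  # ['t_NP', 't_TV']
```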
23

Mamián, López Esther Sofía 1985. "Métodos de pontos interiores como alternativa para estimar os parâmetros de uma gramática probabilística livre do contexto." [s.n.], 2013. http://repositorio.unicamp.br/jspui/handle/REPOSIP/306757.

Full text
Abstract:
Advisors: Aurelio Ribeiro Leite de Oliveira, Fredy Angel Amaya Robayo
Master's dissertation - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica
Abstract: In a probabilistic language model (PLM), a probability function is defined to calculate the probability of a particular string occurring within a language. These probabilities are the PLM parameters and are learned from a corpus of string samples drawn from the language. Once the probabilities have been estimated, yielding a language model, the extent to which the model represents the language under study can be evaluated; the standard measure is perplexity per word. The PLM proposed in this work is based on probabilistic context-free grammars. The classic estimation method, Inside-Outside, can be quite time-consuming, making it unviable for complex applications; this dissertation instead estimates the PLM parameters using interior point methods, obtaining good results in processing time, number of iterations until convergence, and perplexity per word.
Master's degree in Applied Mathematics
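For context, PCFG parameter estimation can be written as the constrained maximum-likelihood problem below; this is the standard textbook formulation, not necessarily the exact program solved in the dissertation.

```latex
\begin{align*}
\max_{\theta}\quad & \sum_{x \in \mathcal{D}} \log \sum_{t \in T(x)}
  \prod_{(A \to \alpha) \in R} \theta_{A \to \alpha}^{\,f(A \to \alpha,\, t)} \\
\text{s.t.}\quad & \sum_{\alpha \,:\, (A \to \alpha) \in R} \theta_{A \to \alpha} = 1
  \quad \text{for every nonterminal } A, \qquad \theta_{A \to \alpha} \ge 0,
\end{align*}
```

where \(\mathcal{D}\) is the training corpus, \(T(x)\) the set of parse trees of \(x\), and \(f(A \to \alpha, t)\) the number of times rule \(A \to \alpha\) is used in tree \(t\). Inside-Outside solves this by EM-style updates; an interior point method instead treats it as a constrained nonlinear program, typically handling the nonnegativity constraints through a logarithmic barrier term.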
24

Yi-Ting, Fu, and 傅怡婷. "Learning Semantic Parsing Using Probabilistic Context-Free Grammar in Chinese Poetry Domains." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/bhdau2.

Abstract:
Master's thesis
National Tsing Hua University
Institute of Information Systems and Applications
93 (ROC calendar year, i.e., 2004)
Statistical models have been used quite successfully in Natural Language Processing to recover hidden structure such as part-of-speech tags or syntactic structure. This thesis considers semantic parsing and tagging of classical Chinese poetry lines. It has five aims: (1) construct semantic grammars; (2) modify the grammars and learn their probabilities from the training corpus; (3) parse sentences into tree structures; (4) evaluate the accuracy of the parsing results; and (5) compare with a Hidden Markov Model bi-gram tagger. For the first three tasks, we assumed that the categories of the Chinese Thesaurus are representative enough to support semantic analysis of the sentences, and built the semantic grammars upon these semantic categories and semantic rules. We modified the grammars and learned the probabilities from training data with the Inside-Outside algorithm, and used the Viterbi algorithm to find the most likely parse. For the last two tasks, we found that the PCFG semantic parser predicts semantic tags better under data sparseness and has greater disambiguation ability. We believe the parsing results may find broad use in machine translation, poetry generation, and related applications in the future.
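To illustrate the decoding step, the sketch below implements Viterbi-CKY over a PCFG in Chomsky normal form; the rule-table format is an assumption for illustration, and the thesis's semantic grammars over thesaurus categories would be supplied in place of toy rules.

```python
# Minimal Viterbi-CKY parser for a PCFG in Chomsky normal form (toy sketch).
from math import log
from collections import defaultdict

def viterbi_cky(words, lexical, binary, start="S"):
    """Return (log-probability, bracketed tree) of the best parse, or None.

    lexical[(A, w)] = P(A -> w); binary[(A, B, C)] = P(A -> B C).
    """
    n = len(words)
    best = defaultdict(dict)           # best[(i, j)][A] = (logprob, tree)
    for i, w in enumerate(words):      # fill the diagonal with lexical rules
        for (A, word), p in lexical.items():
            if word == w and (A not in best[(i, i + 1)]
                              or log(p) > best[(i, i + 1)][A][0]):
                best[(i, i + 1)][A] = (log(p), f"({A} {w})")
    for span in range(2, n + 1):       # combine adjacent spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    if B in best[(i, k)] and C in best[(k, j)]:
                        lp = log(p) + best[(i, k)][B][0] + best[(k, j)][C][0]
                        if A not in best[(i, j)] or lp > best[(i, j)][A][0]:
                            best[(i, j)][A] = (lp, f"({A} {best[(i, k)][B][1]} "
                                                   f"{best[(k, j)][C][1]})")
    return best[(0, n)].get(start)
```

Inside-Outside would be used to estimate the `lexical` and `binary` probability tables before decoding.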
25

Jia-Kuan Lin and 林家寬. "Affective Structure Modeling of Speech for Emotion Recognition Using Probabilistic Context Free Grammar." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/38279767034205399000.

Abstract:
Master's thesis
National Cheng Kung University
Institute of Medical Informatics
102 (ROC calendar year, i.e., 2013)
Speech is the most natural means of communication and carries rich emotional information, so recognizing emotions in speech plays an important role in affective computing. Related research on utterance-level and segment-level processing lacks an understanding of the underlying structure of emotional speech. In this thesis, a hierarchical approach to modeling affective structure based on a probabilistic context-free grammar is proposed for recognition. The Canny edge detection algorithm is employed to detect hypothesized segment boundaries in the speech signal according to spectral similarity. Emotion profiles generated from an SVM-based classification model are used to find the maximum-change boundary between segments, and a binary tree is constructed to derive a hierarchical structure of multi-layer speech segments. Vector quantization is then used to generate an emotion-profile codebook and a hierarchical representation of the speech segments, and a probabilistic context-free grammar is adopted to model the hierarchical relations between codewords. The method was evaluated on the Berlin emotional speech database (EMO-DB), with 1495 utterances covering 7 emotions, under a leave-one-speaker-out cross-validation scheme; to investigate the effect of utterance length, concatenations of two or more utterances from the database were also tested. The experimental results show that the proposed method achieved an emotion recognition accuracy of 87.22% on long utterances and outperformed the conventional SVM-based method. Further work on collecting more realistic corpora is needed for the analysis and recognition of emotions in spontaneous speech.
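The vector-quantization step can be sketched as follows: a toy k-means quantizer that maps per-segment emotion profiles to codeword indices, whose sequences the PCFG would then model. The array shapes, codebook size, and iteration count are illustrative assumptions.

```python
# Toy k-means vector quantization of per-segment emotion profiles.
import numpy as np

def build_codebook(profiles, k=8, iters=50, seed=0):
    """profiles: (n_segments, n_emotions) array of classifier posteriors.

    Returns (codebook, codeword index per segment); assumes n_segments >= k.
    """
    rng = np.random.default_rng(seed)
    codebook = profiles[rng.choice(len(profiles), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each profile to its nearest codeword (Euclidean distance).
        dists = np.linalg.norm(profiles[:, None, :] - codebook[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each codeword to the centroid of its assigned profiles.
        for c in range(k):
            if np.any(labels == c):
                codebook[c] = profiles[labels == c].mean(axis=0)
    return codebook, labels
```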
26

Cunha, Jessica Megane Taveira da. "Probabilistic Grammatical Evolution." Master's thesis, 2021. http://hdl.handle.net/10316/96066.

Abstract:
Master's dissertation in Informatics Engineering presented to the Faculdade de Ciências e Tecnologia
Grammatical Evolution (GE) [1] is one of the most popular variants of Genetic Programming (GP) [2] and has been successfully used in a wide range of problem domains. Since the original proposal, many improvements have been introduced in GE to improve its performance by addressing some of its main issues, namely low locality and high redundancy [3, 4]. In grammar-based GP methods the choice of the grammar has a significant impact on the quality of the generated solutions, since it is the grammar that defines the search space [5]. In this work, we present four variants of GE, which during the evolutionary process explore the search space by updating the weights of each rule of the grammar. These variants introduce two alternative representation types, two grammar adjustment methods, and a new mapping method using a Probabilistic Context-Free Grammar (PCFG). The first method is Probabilistic Grammatical Evolution (PGE), in which individuals are represented by a list of real values (genotype), each value denoting the probability of selecting a derivation rule. The genotype is mapped into a solution (phenotype) to the problem at hand, using a PCFG. At each generation, the probabilities of each rule in the grammar are updated, based on the expansion rules used by the best individual. Co-evolutionary Probabilistic Grammatical Evolution (Co-PGE) employs the same representation of individuals and introduces a new technique to update the grammar's probabilities: each individual is assigned a PCFG whose derivation-option probabilities are changed at each generation using a mutation-like operator. In both methods, the individuals are remapped after updating the grammar. Probabilistic Structured Grammatical Evolution (PSGE) and Co-evolutionary Probabilistic Structured Grammatical Evolution (Co-PSGE) were created by adapting the mapping and probability-update mechanisms of PGE and Co-PGE to Structured Grammatical Evolution (SGE), a method that was proposed to overcome the issues of GE while improving its performance [6]. These variants use as genotype a set of dynamic lists, one for each non-terminal of the grammar, with each element of the list being a probability used to map the individual with the PCFG. We analyse and compare the performance of all the methods on six benchmarks. Compared to GE, the results show that PGE and Co-PGE are statistically similar or better on all problems, while PSGE and Co-PSGE are statistically better on all problems. We also highlight Co-PSGE, since it is statistically superior to SGE on some problems, making it competitive with the state of the art. We also performed an analysis of the representations, and the results show that PSGE and Co-PSGE have less redundancy, and all approaches exhibit better locality than GE, which allows for a better exploration of the search space. The analyses conducted show that the evolved grammars help guide the evolutionary process and provide information about the production rules most relevant to generating better solutions. In addition, they can also be used to generate a sampling of solutions with better average fitness.
FCT
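A minimal sketch of the PCFG-based genotype-to-phenotype mapping described above: each codon in [0, 1] selects the derivation option of the current non-terminal whose cumulative-probability interval contains it. The toy grammar, wrapping limit, and function name are illustrative assumptions, not the thesis's implementation.

```python
# PGE-style mapping sketch: codons select PCFG derivation options.
def pge_map(genotype, pcfg, start="<expr>", max_wraps=2):
    symbols, phenotype = [start], []
    i = 0
    while symbols:
        sym = symbols.pop(0)             # leftmost derivation
        if sym not in pcfg:              # terminal symbol
            phenotype.append(sym)
            continue
        if i >= max_wraps * len(genotype):
            return None                  # mapping failed; invalid individual
        codon, i = genotype[i % len(genotype)], i + 1
        cumulative = 0.0
        for expansion, prob in pcfg[sym]:
            cumulative += prob
            if codon <= cumulative:      # codon falls in this rule's interval
                symbols = list(expansion) + symbols
                break
    return "".join(phenotype)

# Toy PCFG: rule options and probabilities are invented for illustration.
pcfg = {"<expr>": [(("<expr>", "+", "<expr>"), 0.3), (("x",), 0.4), (("1",), 0.3)]}
print(pge_map([0.1, 0.35, 0.9], pcfg))   # -> x+1
```

Updating the probabilities in `pcfg` between generations (toward the best individual's choices, or by a mutation-like perturbation) is what distinguishes the PGE and Co-PGE variants.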
27

Nguyen, Ngoc Tran. "Étude de transformations grammaticales pour l'entraînement de grammaires probabilistes hors-contexte." Thèse, 2002. http://hdl.handle.net/1866/14498.

28

Gotti, Fabrizio. "L'atténuation statistique des surdétections d'un correcteur grammatical symbolique." Thèse, 2012. http://hdl.handle.net/1866/9809.

Abstract:
Grammar checking software sometimes erroneously flags a correct word sequence as an error, a problem we call overdetection in the present study. We describe the development of a system for identifying and filtering out the overdetections produced by the French grammar checker designed by the firm Druide Informatique. Various families of classifiers have been trained in a supervised way for 14 types of detections flagged by the grammar checker, using features that capture diverse linguistic phenomena (syntactic dependency links, POS tags, word context exploration, etc.), extracted from sentences with and without overdetections. Eight of the 14 classifiers we trained are now part of the latest version of a very popular commercial grammar checker. Moreover, our experiments have shown that statistical language models, SVMs and word sense disambiguation can all contribute to the improvement of these classifiers. This project is a striking illustration of a machine learning component successfully integrated within a robust, commercial natural language processing application.
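A hedged sketch of such an overdetection filter: a supervised SVM over hand-crafted linguistic features of each flagged detection. The feature names, toy data, and labels are invented for illustration and are not the study's actual feature set.

```python
# Toy overdetection filter: SVM over dict-encoded linguistic features.
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

train_features = [
    {"detection_type": "agreement", "pos_left": "DET", "dep_link_len": 2},
    {"detection_type": "agreement", "pos_left": "NOUN", "dep_link_len": 6},
]
labels = [1, 0]   # 1 = genuine error, 0 = overdetection (to be muted)

clf = make_pipeline(DictVectorizer(sparse=False), SVC(kernel="rbf"))
clf.fit(train_features, labels)

new_detection = {"detection_type": "agreement", "pos_left": "DET", "dep_link_len": 3}
print(clf.predict([new_detection]))   # decide to keep or mute the flag
```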
