Dissertations / Theses on the topic 'Multiwords'
Consult the top 36 dissertations / theses for your research on the topic 'Multiwords.'
Monti, Johanna. "Multi-word unit processing in machine translation. Developing and using language resources for multi-word unit processing in machine translation." Doctoral thesis, Università degli Studi di Salerno, 2015. http://hdl.handle.net/10556/2042.
Waszczuk, Jakub. "Leveraging MWEs in practical TAG parsing : towards the best of the two worlds." Thesis, Tours, 2017. http://www.theses.fr/2017TOUR4024/document.
In this thesis, we focus on multiword expressions (MWEs) and their relationship with syntactic parsing. The latter task consists of retrieving the syntactic relations holding between the words in a given sentence. The challenge of MWEs in this respect is that, in contrast to regular linguistic expressions, they exhibit various irregular properties which make them harder to deal with in natural language processing. In our work, we show that the challenge of MWE-related irregularities can be turned into an advantage in practical symbolic parsing. Namely, with tree adjoining grammars (TAGs), which provide first-class support for MWEs, and A* search strategies, considerable speed-up gains can be achieved by promoting MWE-based analyses with virtually no loss in syntactic parsing accuracy. This is in contrast to purely statistical state-of-the-art parsers, which, despite their efficiency, provide no satisfactory support for MWEs. We contribute an MWE-aware TAG parsing architecture based on A* search, with facilities (grammar compression and feature structures) that enable real-world applications and make it easily extensible to a probabilistic framework.
Kim, Su Nam. "Statistical modeling of multiword expressions." Thesis, University of Melbourne, 2008. http://repository.unimelb.edu.au/10187/3147.
Our goals in this research are: to use computational techniques to shed light on the underlying linguistic processes giving rise to MWEs across constructions and languages; to generalize existing techniques by abstracting away from individual MWE types; and finally to exemplify the utility of MWE interpretation within general NLP tasks.
In this thesis, we target English MWEs due to resource availability. In particular, we focus on noun compounds (NCs) and verb-particle constructions (VPCs) due to their high productivity and frequency.
Challenges in processing noun compounds are: (1) interpreting the semantic relation (SR) that represents the underlying connection between the head noun and modifier(s); (2) resolving syntactic ambiguity in NCs comprising three or more terms; and (3) analyzing the impact of word sense on noun compound interpretation. Our basic approach to interpreting NCs relies on the semantic similarity of the NC components using firstly a nearest-neighbor method (Chapter 5), then verb semantics based on the observation that it is often an underlying verb that relates the nouns in NCs (Chapter 6), and finally semantic variation within NC sense collocations, in combination with bootstrapping (Chapter 7).
Challenges in dealing with verb-particle constructions are: (1) identifying VPCs in raw text data (Chapter 8); and (2) modeling the semantic compositionality of VPCs (Chapter 5). We place particular focus on identifying VPCs in context, and measuring the compositionality of unseen VPCs in order to predict their meaning. Our primary approach to the identification task is to adapt localized context information derived from linguistic features of VPCs to distinguish between VPCs and simple verb-PP combinations. To measure the compositionality of VPCs, we use semantic similarity among VPCs by testing the semantic contribution of each component.
Finally, we conclude the thesis with a chapter-by-chapter summary and outline of the findings of our work, suggestions of potential NLP applications, and a presentation of further research directions (Chapter 9).
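The nearest-neighbour approach to noun-compound interpretation mentioned in the abstract above can be illustrated with a small sketch. It assumes each NC is a (modifier, head) pair, scores two NCs by averaging the similarity of their modifiers and of their heads, and copies the relation label of the closest training NC. The vectors, the relation labels (MATERIAL, LOCATION) and the averaging scheme are invented for illustration; the abstract does not say which similarity measure the thesis actually uses.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nc_similarity(nc_a, nc_b, vectors):
    """Average of modifier-modifier and head-head similarity for two (modifier, head) pairs."""
    (mod_a, head_a), (mod_b, head_b) = nc_a, nc_b
    return (cosine(vectors[mod_a], vectors[mod_b]) + cosine(vectors[head_a], vectors[head_b])) / 2

def interpret(nc, training, vectors):
    """1-nearest-neighbour: return the relation label of the most similar training NC."""
    _, label = max(training, key=lambda item: nc_similarity(nc, item[0], vectors))
    return label

# Hypothetical toy vectors and relation labels, for illustration only.
vectors = {
    "apple": [1.0, 0.0, 0.0],
    "cherry": [0.9, 0.1, 0.0],
    "kitchen": [0.0, 1.0, 0.0],
    "pie": [0.0, 0.0, 1.0],
    "knife": [0.0, 0.2, 0.9],
}
training = [(("apple", "pie"), "MATERIAL"), (("kitchen", "knife"), "LOCATION")]

print(interpret(("cherry", "pie"), training, vectors))  # MATERIAL
```

Because cherry is close to apple and the heads are identical, cherry pie inherits the label of apple pie; a real system would use many labelled NCs and a better-grounded similarity measure.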
Korkontzelos, Ioannis. "Unsupervised learning of multiword expressions." Thesis, University of York, 2010. http://etheses.whiterose.ac.uk/2091/.
Taslimipoor, Shiva. "Automatic identification and translation of multiword expressions." Thesis, University of Wolverhampton, 2018. http://hdl.handle.net/2436/622068.
Cordeiro, Silvio Ricardo. "Distributional models of multiword expression compositionality prediction." Thesis, Aix-Marseille, 2017. http://www.theses.fr/2017AIXM0501/document.
Natural language processing systems often rely on the idea that language is compositional, that is, the meaning of a linguistic entity can be inferred from the meaning of its parts. This expectation fails in the case of multiword expressions (MWEs). For example, a person who is a "sitting duck" is neither a duck nor necessarily sitting. Modern computational techniques for inferring word meaning based on the distribution of words in the text have been quite successful at multiple tasks, especially since the rise of word embedding approaches. However, the representation of MWEs still remains an open problem in the field. In particular, it is unclear how one could predict from corpora whether a given MWE should be treated as an indivisible unit (e.g. "nut case") or as some combination of the meaning of its parts (e.g. "engine room"). This thesis proposes a framework of MWE compositionality prediction based on representations of distributional semantics, which we instantiate under a variety of parameters. We present a thorough evaluation of the impact of these parameters on three new datasets of MWE compositionality, encompassing English, French and Portuguese MWEs. Finally, we present an extrinsic evaluation of the predicted levels of MWE compositionality on the task of MWE identification. Our results suggest that the proper choice of distributional model and corpus parameters can produce compositionality predictions that are comparable to the state of the art.
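A minimal sketch of one common way to score compositionality from distributional vectors, assuming the MWE has its own corpus vector (for example, one trained on the joined token engine_room) and that the parts are combined by simple addition. The toy vectors are invented, and the framework, models and parameters evaluated in the thesis are not reproduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def compositionality_score(mwe_vec, part_vecs):
    """Cosine between the MWE's own vector and the element-wise sum of its parts' vectors.

    Higher scores suggest the expression behaves compositionally.
    """
    summed = [sum(dims) for dims in zip(*part_vecs)]
    return cosine(mwe_vec, summed)

# Toy 3-dimensional vectors, invented for illustration only.
vectors = {
    "engine": [0.9, 0.1, 0.0],
    "room": [0.1, 0.9, 0.0],
    "engine_room": [0.6, 0.7, 0.1],
    "nut": [0.8, 0.0, 0.2],
    "case": [0.0, 0.8, 0.2],
    "nut_case": [0.0, 0.1, 0.9],
}

for mwe, parts in [("engine_room", ["engine", "room"]), ("nut_case", ["nut", "case"])]:
    score = compositionality_score(vectors[mwe], [vectors[p] for p in parts])
    print(f"{mwe}: {score:.2f}")
```

With these toy vectors, engine_room scores about 0.99 and nut_case about 0.40, in line with the intuition that engine room is compositional and nut case is not. Addition is only one of several possible composition functions.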
Cordeiro, Silvio Ricardo. "Distributional models of multiword expression compositionality prediction." Biblioteca Digital de Teses e Dissertações da UFRGS, 2018. http://hdl.handle.net/10183/174519.
Alghamdi, Ayman Ahmad O. "A computational lexicon and representational model for Arabic multiword expressions." Thesis, University of Leeds, 2018. http://etheses.whiterose.ac.uk/22821/.
Obermeier, Andrew Stanton. "Multiword Units at the Interface: Deliberate Learning and Implicit Knowledge Gains." Diss., Temple University Libraries, 2015. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/360635.
Ed.D.
Multiword units (MWUs) is a term used in the current study to broadly cover what second language acquisition (SLA) researchers refer to as collocations, conventional expressions, chunks, idioms, formulaic sequences, or other such terms, depending on their research perspective. They are ubiquitous in language and essential in both first language (L1) and second language (L2) acquisition. Although MWUs are typically learned implicitly while using language naturally in both of these types of acquisition, the current study is an investigation of whether they are acquired in implicit knowledge when they are learned explicitly in a process called deliberate paired association learning. In SLA research, it is widely accepted that explicit knowledge is developed consciously and implicit knowledge is developed subconsciously. It is also believed that there is little crossover from explicit learning to implicit knowledge. However, recent research has cast doubt on this assumption. In a series of priming experiments, Elgort (2007, 2011) demonstrated that the formal and semantic lexical representations of deliberately learned pseudowords were accessed fluently and integrated into the mental lexicon, providing convincing evidence that deliberately learned words are immediately acquired in implicit knowledge. The current study aimed to extend these findings to MWUs in a psycholinguistic experiment that tested for implicit knowledge gains resulting from deliberate learning. Participants’ response times (RTs) were measured in three ways, on two testing instruments. First, subconscious formal recognition processing was measured in a masked repetition priming lexical decision task. In the second instrument, a self-paced reading task, both formulaic sequencing and semantic association gains were measured. The experiment was a counterbalanced, within-subjects design, so all comparisons were between conditions on items. 
Results were analyzed in a repeated measures linear mixed-effects model with participants and items as crossed random effects. The dependent variable was RTs on target words. The primary independent variable was learning condition: half of the critical MWUs were learned and half of them were not. The secondary independent variable was MWU composition at two levels: literal and figurative. The masked priming lexical decision task results showed that priming effects increased especially for learned figurative MWUs, evidence that implicit knowledge gains were made on their formal and semantic lexical representations as a result of deliberate learning. Results of the self-paced reading task were analyzed from two perspectives, but were less conclusive with regard to the effects of deliberate learning. Regarding formulaic sequencing gains, literal MWUs showed the most evidence of acquisition, but this happened as a result of both incidental and deliberate learning. With regard to semantic associations, it was shown that deliberate learning had similar effects on both literal and figurative MWUs. However, a serendipitous finding from this aspect of the self-paced reading results showed clearly that literal MWUs reliably primed semantic associations and sentence processing more strongly than figurative MWUs did, both before and after deliberate learning. In sum, results revealed that the difficulties learners have with developing fluent processing of figurative MWUs can be lessened by deliberate learning. On the other hand, for literal MWUs incidental learning is adequate for incrementally developing representation strength.
Temple University--Theses
Garrao, Milena de Uzeda. "The Corpus Never Lies: On the Identification and Use of Multiword Expressions." Pontifícia Universidade Católica do Rio de Janeiro, 2006. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=8873@1.
Much recent research on the identification and use of multiword expressions (MWEs) rests on a representational view of word meaning. In this study we claim that it is much more interesting to view MWEs from a non-representational perspective. By choosing this path, we avoid relying on time-consuming and controversial human intuitions to identify and define MWEs. Our methodology was tested on Brazilian Portuguese verbal phrases of the V+NP pattern, a very frequent pattern in Brazilian Portuguese. It is a statistically based corpus analysis that can be summed up in three successive steps: 1) a large corpus of Brazilian Portuguese as the basis of analysis; 2) application of a statistical test to the corpus, namely the log-likelihood test (Banerjee and Pedersen, 2003), in order to detect the most frequent V+NP MWEs (such as tomar café, 'to have coffee') and discard random syntactic co-occurrences of the same lexical items; 3) application of similarity measures (Baeza-Yates and Ribeiro-Neto, 1999) between all the paragraphs containing a given MWE (such as fazer campanha, 'to campaign') and all the paragraphs containing its noun outside the MWE (campanha). This last step is used to assess the MWE's degree of compositionality. We conclude that the higher the similarity between the paragraphs containing the MWE and those containing its separate noun, the more compositional the MWE. Therefore, we believe that this work has both a practical and a theoretical impact on semantics.
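Step 2 above relies on a log-likelihood test over co-occurrence counts. The sketch below computes the standard G² log-likelihood statistic for a 2x2 contingency table of a verb and a noun. The counts are hypothetical, and this is an illustration of the statistic, not the toolkit configuration used in the thesis.

```python
import math

def log_likelihood_g2(c12, c1, c2, n):
    """G^2 log-likelihood statistic for a word pair (w1, w2).

    c12: number of pairs in which w1 and w2 co-occur (e.g. verb + object noun)
    c1:  number of pairs whose first element is w1
    c2:  number of pairs whose second element is w2
    n:   total number of pairs in the corpus sample
    """
    # Observed 2x2 contingency table: rows = w1 / not w1, columns = w2 / not w2.
    observed = [
        [c12, c1 - c12],
        [c2 - c12, n - c1 - c2 + c12],
    ]
    row_totals = [c1, n - c1]
    col_totals = [c2, n - c2]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            if o > 0:  # the term 0 * log(0) is taken as 0
                g2 += o * math.log(o / e)
    return 2.0 * g2

# Hypothetical counts for the verb "tomar" and the noun "café".
print(round(log_likelihood_g2(c12=30, c1=200, c2=60, n=100_000), 1))
```

A pair that co-occurs exactly as often as chance predicts gets G² = 0, whereas the hypothetical tomar café counts (30 co-occurrences against an expected 0.12) give a large score. Ranking candidate V+NP pairs by G² and keeping the highest-scoring ones is one way to implement the filtering step.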
Alshaikhi, Adel Zain. "The Effects of Using Textual Enhancement on Processing and Learning Multiword Expressions." Scholar Commons, 2018. https://scholarcommons.usf.edu/etd/7464.
Ramisch, Carlos Eduardo. "A generic and open framework for multiword expressions treatment : from acquisition to applications." Biblioteca Digital de Teses e Dissertações da UFRGS, 2012. http://hdl.handle.net/10183/65777.
Acevedo Giménez, César Esteban. "Planificador consciente del almacenamiento para Multiworkflows en Cluster Galaxy." Doctoral thesis, Universitat Autònoma de Barcelona, 2017. http://hdl.handle.net/10803/456672.
In bioinformatics, experimentation is performed through sequences of application executions, in which each application uses as input the file generated by the previous one. This analysis process, consisting of a list of applications that describes a dependency chain, is called a workflow. Two relevant characteristics of bioinformatics workflows are the handling of large volumes of data and the complexity of data dependencies. Many current resource managers ignore file location, which incurs a high cost when processing elements are not close to the files and the files have to be moved. The directed acyclic graph (DAG) model, used to represent the execution order of workflow jobs, does not help to establish the best location of input or temporary data files for efficient execution. A solution to this challenge may be data-aware scheduling, in which an intelligent file placement strategy, combined with resource scheduling that takes this knowledge into account, helps avoid idle periods caused by processing elements waiting for data files. With the computing power of current clusters, multiple workflows can be executed in parallel. In addition, clusters allow multiworkflows to share input and temporary data files in the storage hierarchy. We propose a storage hierarchy composed of the distributed file system, a local RAM disk, a local disk and a local solid-state drive (SSD). To solve the assignment of multiworkflow applications to cluster resources, we extended the list-based heuristic HEFT (Heterogeneous Earliest Finish Time) to multiworkflows. HEFT comprises two phases: a task prioritization phase, followed by a processor selection phase that assigns each application to the node that minimizes its finish time. The data-aware scheduler places the files in the storage hierarchy before execution starts. 
Pre-placing data files on the compute nodes allows the applications that use them to be assigned to the same node as the files, reducing disk access time. To determine the initial location of the input and temporary data files, the scheduler merges all workflows into a single meta-workflow; the algorithm then selects the appropriate storage level for each file within the hierarchy according to application precedence, file size and degree of sharing. The goal of the research is to implement a multiworkflow data-aware scheduling policy that improves the makespan of data-intensive applications. To evaluate the scalability of the proposal and compare it with other policies in the literature, we use simulation, a common method for validating scheduling heuristics that saves computation time when searching for the best option. To this end, we extended WorkflowSim with a scheduler aware of the storage hierarchy. The work was validated with synthetic workflows, built from the characterization of real bioinformatics applications, and with widely used benchmark workflows such as Montage and Epigenomics, chosen because they generate a large number of temporary files. Experiments were performed in two scenarios: a real 128-core cluster and a simulated cluster of up to 1024 cores in WorkflowSim. In the real scenario, makespan improved by up to 70%. In the simulated scenario, makespan improved by 69%, with errors between 0.9% and 3%.
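The two HEFT phases named above (task prioritization by upward rank, then processor selection by earliest finish time) can be sketched as follows. This is plain HEFT in simplified form, assuming known per-processor runtimes (all strictly positive), a fixed transfer cost between tasks placed on different processors, and no insertion into idle gaps. The storage-aware and multiworkflow extensions described in the thesis (file pre-placement, the RAM disk/SSD hierarchy, meta-workflow merging) are not modelled, and the toy DAG is hypothetical.

```python
from functools import lru_cache

def heft(comp, comm, succ):
    """Simplified HEFT list scheduling (no insertion into idle gaps).

    comp[t][p]:   runtime of task t on processor p (assumed > 0)
    comm[(t, s)]: data transfer time from t to s when they run on different processors
    succ[t]:      list of successor tasks of t in the DAG
    Returns {task: (processor, finish_time)}.
    """
    tasks = list(comp)
    n_procs = len(next(iter(comp.values())))
    pred = {t: [] for t in tasks}
    for t in tasks:
        for s in succ.get(t, []):
            pred[s].append(t)

    @lru_cache(maxsize=None)
    def rank_u(t):
        # Phase 1: upward rank = mean runtime + longest (transfer + rank) path to an exit task.
        mean_cost = sum(comp[t]) / n_procs
        return mean_cost + max(
            (comm.get((t, s), 0) + rank_u(s) for s in succ.get(t, [])), default=0
        )

    proc_free = [0.0] * n_procs  # time at which each processor becomes idle
    placed = {}
    # Phase 2: visit tasks by decreasing upward rank; pick the processor with earliest finish.
    for t in sorted(tasks, key=rank_u, reverse=True):
        best = None
        for p in range(n_procs):
            # Input from a predecessor is ready at its finish time, plus transfer cost if remote.
            ready = max(
                (placed[q][1] + (0 if placed[q][0] == p else comm.get((q, t), 0)) for q in pred[t]),
                default=0.0,
            )
            finish = max(ready, proc_free[p]) + comp[t][p]
            if best is None or finish < best[1]:
                best = (p, finish)
        placed[t] = best
        proc_free[best[0]] = best[1]
    return placed

# Hypothetical diamond-shaped DAG on two processors.
comp = {"A": [2, 3], "B": [3, 2], "C": [4, 4], "D": [2, 2]}
comm = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
schedule = heft(comp, comm, succ)
print(schedule)
print("makespan:", max(finish for _, finish in schedule.values()))
```

For this toy DAG the upward ranks order the tasks A, C, B, D, B is placed on the second processor so that it runs in parallel with C, and the makespan is 8 time units. A data-aware variant would additionally fold file location into the ready-time computation.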
Ochieng, Dunlop. "Indirect Influence of English on Kiswahili: The Case of Multiword Duplicates between Kiswahili and English." Doctoral thesis, Universitätsbibliothek Chemnitz, 2015. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-qucosa-179613.
Acosta, Otavio Costa. "Identificação e tratamento de expressões multipalavras aplicado à recuperação de informação." Biblioteca Digital de Teses e Dissertações da UFRGS, 2011. http://hdl.handle.net/10183/134318.
The use of multiword expressions (MWEs) in natural language texts requires detailed study in order to support the robust manipulation and processing of these expressions. An MWE typically conveys concepts and ideas that usually cannot be expressed by a single word, and it is estimated that the number of MWEs in a native speaker's lexicon is similar to the number of single words. Most real applications simply ignore them or create a list of compounds, treating and identifying their components as isolated lexical items rather than as a single unit. For a natural language processing (NLP) application involving semantic processing to succeed, these expressions require adequate treatment. In this work we investigate the hypothesis that appropriate identification of MWEs yields better results in an application such as information retrieval (IR). The objectives of this work are to compare MWE extraction techniques for creating MWE dictionaries to be used for indexing in IR. Experimental results show qualitative improvements in the retrieval of relevant documents when MWEs are identified and treated as single indexing units.
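One simple way to treat MWEs as single indexing units is to merge dictionary matches into one token before indexing. The sketch assumes greedy left-to-right longest matching against a small hand-made MWE dictionary; the function name and the dictionary are hypothetical, and this is not necessarily the extraction or indexing procedure compared in the thesis.

```python
def index_terms(tokens, mwe_dict, max_len=4):
    """Greedy left-to-right longest match: MWEs found in mwe_dict become single index terms.

    tokens:   list of word tokens
    mwe_dict: set of MWEs, each a tuple of tokens
    max_len:  longest MWE length (in tokens) to try
    """
    terms = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first, down to two-token expressions.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in mwe_dict:
                match = candidate
                break
        if match:
            terms.append("_".join(match))  # one indexing unit for the whole MWE
            i += len(match)
        else:
            terms.append(tokens[i])
            i += 1
    return terms

# Hypothetical MWE dictionary.
mwes = {("information", "retrieval"), ("hot", "dog"), ("hot", "dog", "stand")}
print(index_terms("information retrieval for a hot dog stand".split(), mwes))
# ['information_retrieval', 'for', 'a', 'hot_dog_stand']
```

Longest match prefers hot dog stand over hot dog, so the index stores one term for the whole expression rather than separate entries for hot, dog and stand.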
Ochieng, Dunlop. "Indirect Influence of English on Kiswahili: The Case of Multiword Duplicates between Kiswahili and English." Supervisor: Josef Schmied; reviewer: Bertus Van Roy. Chemnitz: Universitätsbibliothek Chemnitz, 2015. http://d-nb.info/1213813700/34.
Schreiner, Paulo. "Alinhamento léxico utilizando técnicas híbridas discriminativas e de pós-processamento." Biblioteca Digital de Teses e Dissertações da UFRGS, 2010. http://hdl.handle.net/10183/27658.
Lexical alignment is an essential task for modern empirical machine translation techniques. The unsupervised generative approach is being replaced by a supervised, discriminative one that considerably facilitates the inclusion of linguistic knowledge from several sources. In this context, the present work describes a series of discriminative lexical aligners that incorporate post-processing heuristics with the goal of improving the alignment quality of multiword expressions, one of the major challenges in natural language processing today. The evaluation is conducted using a gold standard obtained from a parallel corpus of movie subtitles. The proposed aligners show alignment quality superior both to our baseline and to a state-of-the-art generative aligner (Giza++), for the general case as well as for the expressions that are the focus of this work.
Bellanger, Cindy. "Mémorisation et reconnaissance de séquences multimots chez l'enfant et l'adulte : effets de la fréquence et de la variabilité interne." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAS047/document.
The mental lexicon is usually assumed to be the main foundation of written and spoken-language perception. Numerous, hierarchically organized cues drive speech segmentation in adults and infants, but lexical cues appear to override the others. Throughout this work, we question whether multiword sequences are stored idiosyncratically and memorized as single units in the mental lexicon. This work is split into two parts, each composed of a set of experiments. The first assesses the cues involved in facilitating the recognition of nouns in noun phrases. For that purpose, we disentangled grammatical-gender effects and co-occurrence frequency effects on the processing of determiner-noun sequences. Then, we tested the effect of cohesiveness on the recognition of three-word sequences. The second set of experiments examines the influence of the internal variability of determiner-noun sequences on the acquisition of noun-phrase structure in children aged 2 to 2.5 years. In a three-month longitudinal study, we contrast two main conceptions of first-language acquisition: Universal Grammar and usage-based approaches.
Al Saied, Hazem. "Analyse automatique par transitions pour l'identification des expressions polylexicales." Thesis, Université de Lorraine, 2019. http://www.theses.fr/2019LORR0206.
This thesis focuses on the identification of multiword expressions, addressed through a transition-based system. A multiword expression (MWE) is a linguistic construct composed of several elements whose combination shows irregularity at one or more linguistic levels. Identifying MWEs in context amounts to annotating the occurrences of MWEs in texts, i.e. detecting sets of tokens forming such occurrences. For example, in the sentence This has nothing to do with the book, the tokens has, to, do and with would be marked as forming an occurrence of the MWE have to do with. Transition-based analysis is a well-known NLP technique for building a structured output from a sequence of elements: it applies a sequence of actions (called "transitions"), chosen from a predefined set, to build the output structure incrementally. In this thesis, we propose a transition system dedicated to MWE identification within sentences represented as token sequences, and we study various architectures for the classifier that selects the transitions to apply to build the sentence analysis. The first variant of our system uses a linear support vector machine (SVM) classifier. The following variants use neural models: a simple multilayer perceptron (MLP), followed by variants integrating one or more recurrent layers. The preferred scenario is identifying MWEs without using syntactic information, even though we know the two tasks are related. We further study a multitask approach, which jointly performs morphosyntactic tagging, transition-based MWE identification and dependency parsing, letting each task benefit from the others. The thesis comprises a substantial experimental part. First, we studied which resampling techniques allow good learning stability despite random initializations. Second, we proposed a method for tuning the hyperparameters of our models by trend analysis within a random search for a hyperparameter combination. 
We produce systems under the constraint of using the same hyperparameter combination for different languages. We use data from the two PARSEME international shared tasks on verbal MWEs. Our variants produce very good results, including state-of-the-art scores for many languages in the PARSEME 1.0 and 1.1 datasets. One of the variants ranked first for most languages in the PARSEME 1.0 shared task. However, our models perform poorly on MWEs that were not seen during training.
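The transition-based idea can be illustrated by applying a toy transition system to the example sentence from the abstract. The four transitions (SHIFT, REDUCE, MERGE, COMPLETE) and their semantics are a hypothetical simplification, not the exact transition set of the thesis, and the transition sequence is supplied by hand, whereas the thesis trains a classifier (SVM, MLP or recurrent network) to choose each transition from the current configuration.

```python
def apply_transitions(tokens, transitions):
    """Run a toy MWE-identification transition system (hypothetical simplification).

    Configuration: a stack of token groups, a buffer of remaining tokens, and found MWEs.
    SHIFT:    move the next buffer token onto the stack as a one-token group
    REDUCE:   pop the top group (it does not belong to an MWE in progress)
    MERGE:    merge the two topmost groups into one group
    COMPLETE: pop the top group and record it as an MWE
    """
    stack, buffer, mwes = [], list(tokens), []
    for t in transitions:
        if t == "SHIFT":
            stack.append([buffer.pop(0)])
        elif t == "REDUCE":
            stack.pop()
        elif t == "MERGE":
            top = stack.pop()
            stack[-1] = stack[-1] + top
        elif t == "COMPLETE":
            mwes.append(tuple(stack.pop()))
        else:
            raise ValueError(f"unknown transition: {t}")
    return mwes

tokens = "This has nothing to do with the book".split()
gold = [
    "SHIFT", "REDUCE",            # This
    "SHIFT", "SHIFT", "REDUCE",   # has stays on the stack; nothing is discarded
    "SHIFT", "MERGE",             # has + to
    "SHIFT", "MERGE",             # + do
    "SHIFT", "MERGE", "COMPLETE", # + with, then record the MWE
    "SHIFT", "REDUCE", "SHIFT", "REDUCE",  # the, book
]
print(apply_transitions(tokens, gold))  # [('has', 'to', 'do', 'with')]
```

Reducing nothing while has stays on the stack is what lets the system skip the intervening token and still collect the discontinuous occurrence of have to do with.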
Candarli, Duygu. "A longitudinal study of multi-word units in L1 and L2 novice academic writing." Thesis, University of Manchester, 2017. https://www.research.manchester.ac.uk/portal/en/theses/a-longitudinal-study-of-multiword-units-in-l1-and-l2-novice-academic-writing(c57f2773-6965-4a96-9cfa-79e2b11e9408).html.
Ramisch, Carlos Eduardo. "Un environnement générique et ouvert pour le traitement des expressions polylexicales : de l'acquisition aux applications." PhD thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00741147.
Pasquer, Caroline. "Garder la trace, mettre de l'ordre et relier les points : modéliser la variation et l'ambiguïté des expressions polylexicales." Thesis, Tours, 2019. http://www.theses.fr/2019TOUR4017.
Automatic identification of multiword expressions (MWEs) is a prerequisite for many natural language processing applications. This task is challenging because MWEs, especially verbal ones (VMWEs) like to kick the bucket (which means to die), exhibit surface variability (no buckets were kicked). However, compared with regular constructions, this variability is usually more restricted (e.g. some nouns cannot be modified by an adjective), which gives rise to different variability profiles. We address here a subproblem of VMWE identification, namely the identification of occurrences of VMWEs previously seen in corpora, whatever their surface form, which requires taking ambiguity into account to avoid literal (he kicked the old bucket) or coincidental occurrences (he kicked the ball and the bucket fell down). To this end, we considered two main approaches: the first is based on a language-independent measure of VMWE variability; the second models the problem as a classification task on the basis of features relevant to VMWE morphosyntactic variability, which led to a system (VarIDE) that participated in the 2018 PARSEME shared task on automatic identification of VMWEs.
Kyriakopoulou, Anthoula. "Elaboration de ressources électroniques pour les noms composés de type N (E+DET=G) N=G du grec moderne [Development of electronic resources for Modern Greek compound nouns of the type N (E+DET=G) N=G]." PhD thesis, Université Paris-Est, 2011. http://pastel.archives-ouvertes.fr/pastel-00666189.
Gonçalves, Carlos Jorge de Sousa. "Parallel and Distributed Statistical-based Extraction of Relevant Multiwords from Large Corpora." Doctoral thesis, 2017. http://hdl.handle.net/10362/28488.
Moszczyński, Radosław. "Formal approaches to multiword lexemes." Thesis, 2006. https://bc.klf.uw.edu.pl/246/1/3301-MGR-FL-A-25320.pdf.
Fazly, Afsaneh. "Automatic acquisition of lexical knowledge about multiword predicates." 2007. http://link.library.utoronto.ca/eir/EIRdetail.cfm?Resources__ID=478903&T=F.
Bai, Ming-Hong (白明弘). "Extraction of Bilingual Multiword Expressions with Application to Bilingual Concordancer." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/63001866540723370915.
National Tsing Hua University (國立清華大學)
Department of Computer Science (資訊工程學系)
ROC academic year 101
A bilingual concordancer is a computer-assisted translation tool that uses a parallel corpus as its knowledge base. Given a word or phrase, the bilingual concordancer retrieves from the parallel corpus the aligned sentence pairs whose source sentences contain that word or phrase. It then identifies the translation equivalents in the target sentences and reorders the sentence pairs according to the correlation between the query string and the translation equivalents. It helps not only in finding translation equivalents of the query but also in presenting the various contexts in which the query occurs. As a result, it is extremely useful for bilingual lexicographers, human translators and second language learners. Extraction of bilingual multi-word expressions is the most important part of a bilingual concordancer. For example, highlighting translation equivalents in the target sentences and generating a list of translation equivalents both depend heavily on a high-quality extraction model. However, existing models for extracting translation equivalents still have many problems and leave room for improvement. In this thesis, we discuss some problems of the existing models for extracting bilingual multi-word expressions, including the over-alignment problem and the under-alignment problem. We then propose a novel model that addresses these problems to improve the quality of the extracted translation equivalents. Further, we implement a bilingual concordancer that employs the proposed translation extraction model. To measure the performance of the bilingual concordancer, we use three types of multi-word expressions as test targets. The results are compared with those of existing statistical machine translation models.
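The concordancer workflow in this abstract has three steps: retrieve matching sentence pairs, identify a translation equivalent, and reorder the pairs. That workflow can be illustrated on a toy scale. This is a minimal sketch and is not the thesis's extraction model, which specifically targets over- and under-alignment. In its place, the sketch substitutes a plain Dice-coefficient association score between the query and target-side n-grams, which is a common baseline. The function names, the `max_n` limit and the toy English-French corpus are hypothetical.

```python
from collections import Counter


def contains(tokens, phrase):
    """True if `phrase` occurs in `tokens` as a contiguous subsequence."""
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == phrase for i in range(len(tokens) - n + 1))


def ngrams(tokens, max_n):
    """Set of distinct contiguous n-grams of length 1..max_n."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def concordance(corpus, query, max_n=3):
    """Retrieve pairs whose source contains `query`, pick a translation
    equivalent by Dice coefficient, and move pairs containing it to the front.

    corpus: list of (source_tokens, target_tokens) aligned sentence pairs.
    query:  tuple of source-side tokens.
    Returns (best_equivalent or None, reordered_hits).
    """
    hits = [(s, t) for s, t in corpus if contains(s, query)]
    if not hits:
        return None, []
    joint = Counter()        # target n-gram frequency in pairs matching the query
    target_freq = Counter()  # target n-gram frequency over the whole corpus
    for s, t in corpus:
        grams = ngrams(t, max_n)
        target_freq.update(grams)
        if contains(s, query):
            joint.update(grams)
    q_freq = len(hits)
    dice = {g: 2 * joint[g] / (q_freq + target_freq[g]) for g in joint}
    # Highest Dice wins; ties prefer longer n-grams, then alphabetical order,
    # so the result does not depend on set iteration order.
    best = max(dice, key=lambda g: (dice[g], len(g), g))
    # Stable sort: pairs whose target contains `best` come first.
    reordered = sorted(hits, key=lambda pair: not contains(pair[1], best))
    return best, reordered


if __name__ == "__main__":
    corpus = [  # hypothetical toy parallel corpus
        ("he kicked the bucket last year".split(), "il est mort l'an dernier".split()),
        ("he kicked the bucket".split(), "il a cassé sa pipe".split()),
        ("she kicked the bucket yesterday".split(), "elle a cassé sa pipe hier".split()),
        ("he kicked the ball".split(), "il a frappé le ballon".split()),
        ("the bucket is red".split(), "le seau est rouge".split()),
    ]
    best, pairs = concordance(corpus, ("kicked", "the", "bucket"))
    print(" ".join(best))
    for src, tgt in pairs:
        print(" ".join(src), "|||", " ".join(tgt))
```

On this toy corpus the sketch selects "cassé sa pipe" and moves the "il est mort" pair to the end. The tie-break toward longer n-grams is a crude guard against under-alignment and can itself cause over-alignment. These are the two problems the thesis sets out to address.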
Wu, Tzu-Wei (吳紫葦). "Extraction of Multiword Expressions related to Grammatical Collocation Based on Syntactic and Statistical Information." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/40531784349915519893.
National Tsing Hua University (國立清華大學)
Institute of Information Systems and Applications (資訊系統與應用研究所)
ROC academic year 94
This paper concentrates on multiword expressions related to grammatical collocations. We propose a method to automatically extract grammatical collocations from a corpus. Our method selects collocation candidates that match certain structures, based on part-of-speech information and base-phrase analysis. It then extracts meaningful grammatical collocations through statistical analysis of their association strength. In addition to statistics and linguistic knowledge, we also rely on syntactic patterns of multiword expressions. Take the collocation pattern ("at", "cost") as an example: this seed MWE pattern enables us to obtain multiword expressions such as "at cost" or "at all costs". We use mutual information (MI) to score each collocation candidate and filter out candidates whose MI falls below a threshold trained on real data. Collocations with MI above this lower bound are then used to assist in extracting multiword expressions. The grammatical collocations and related multiword expressions can be used in many natural language processing applications, including computer-assisted language learning, parsing, and machine translation.
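The MI-based filtering step in this abstract can be made concrete with a short sketch. It assumes the MI score is pointwise mutual information, as is common in collocation extraction, and that the text is already lemmatized. A candidate pair (x, y) is counted whenever y occurs within a small window to the right of x. This lets the seed pattern ("at", "cost") cover both "at cost" and "at all cost". The function names, the window size and the threshold value are hypothetical. In the thesis, the threshold is trained on real data.

```python
import math
from collections import Counter


def pmi_scores(tokens, pairs, window=2):
    """Pointwise mutual information for each candidate (x, y) pair.

    c(x, y) counts occurrences of y within `window` tokens to the right of x.
    PMI(x, y) = log2(c(x, y) * N / (c(x) * c(y))), with N the corpus size.
    """
    n = len(tokens)
    unigram = Counter(tokens)
    scores = {}
    for x, y in pairs:
        joint = 0
        for i, tok in enumerate(tokens):
            if tok == x:
                joint += tokens[i + 1:i + 1 + window].count(y)
        if joint == 0:
            continue  # PMI is undefined for pairs that never co-occur
        scores[(x, y)] = math.log2(joint * n / (unigram[x] * unigram[y]))
    return scores


def filter_collocations(scores, threshold):
    """Keep candidates whose PMI reaches the (assumed, pre-tuned) threshold."""
    return sorted(pair for pair, score in scores.items() if score >= threshold)


if __name__ == "__main__":
    # Toy lemmatized text; the threshold 2.0 is an arbitrary illustration.
    tokens = "sell it at cost and sell it at all cost".split()
    scores = pmi_scores(tokens, [("at", "cost"), ("it", "cost"), ("cost", "sell")])
    print(filter_collocations(scores, threshold=2.0))
```

On realistic corpora, PMI is known to overrate rare pairs, so a minimum co-occurrence count is often applied before the threshold. The sketch omits that step for brevity.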
Reynolds, Barry Lee (雷貝利). "Rethinking Frequency in Incidental Vocabulary Acquisition: The Effects of Word Form Variation and Multiword Patterns." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/9br86s.
National Central University (國立中央大學)
Graduate Institute of Learning and Instruction (學習與教學研究所)
ROC academic year 101
Within language acquisition research there exists a substantial body of literature supporting extensive reading as a means of vocabulary growth for both L1 and L2 learners. Moreover, vocabulary acquisition through extensive reading has been considered incidental because learners focus on the task of reading rather than on learning vocabulary. In recent years, a large number of extensive reading studies have investigated the effects of numerous variables on the incidental acquisition of vocabulary through reading. These experiments, however, leave two crucial language acquisition issues unaddressed: none of these studies investigated how word-internal form variation and/or word-external multiword pattern variation affects the incidental acquisition of target vocabulary through reading. The purpose of this dissertation, accordingly, is to investigate whether varying degrees of word form variation of target words (i.e., no variation, inflectional variation, and derivational variation) and the appearance of target word tokens in multiword patterns might affect the incidental acquisition of target vocabulary through reading. A group of L1 English-speaking and L2 English-speaking participants were given a copy of an unmodified English novel containing nonce words, The BFG, to read within two weeks. After reading, they were first given two surprise forms of assessment (meaning recall translation and meaning recognition multiple-choice) measuring acquisition of 49 target nonce words, followed by an open-response reflective questionnaire. Five weeks later, post hoc interviews were conducted to gain a deeper understanding of both the L1 and L2 English-speaking participants' perceptions of the nonce words used as target words in the dissertation research. The results revealed various significant similarities and differences between L1 and L2 speakers.
As shown through the post hoc interviews, the L1 speakers did not perceive nonce words as worth learning, whereas the L2 speakers were less clear on whether they gave nonce words a different status than real English words. Moreover, data collected from an L1 control group suggests any interpretation of the L1 experimental group data must be done cautiously since the acquisition results cannot be totally attributed to incidental acquisition through reading. Analysis of target word acquisition data in terms of word form variation illuminated differences between L1 and L2 speakers, while analysis of target word acquisition data in terms of multiword patterns highlighted similarities between L1 and L2 speakers. Analysis of L2 speakers' target word acquisition results as shown on both assessments found an interaction effect between word form variation and frequency. For the meaning recall data, L2 speakers showed a statistically significant difference in acquisition between lower and higher frequency target words that exhibited derivational variation in form. However, for the meaning recognition data, L2 speakers showed a statistically significant difference in acquisition between lower and higher frequency target words that exhibited inflectional and derivational variation in form. Analysis of L1 speakers' target word acquisition results as shown on both assessments failed to find an interaction effect for word form variation and frequency. However, a statistically significant effect for word form variation was shown for both assessments. For the meaning recall, L1 speakers acquired more target words that did not vary in form than target words that exhibited inflectional or derivational variation in form. On the meaning recognition, L1 speakers acquired more target words that did not vary in form or exhibited derivational variation in form than target words that exhibited inflectional variation in form. 
Both groups of experimental participants acquired more target words that appeared in multiword patterns than target words that did not, regardless of assessment. Furthermore, an interaction effect between patterns and frequency was shown for both L1 and L2 participants' meaning recognition assessment results. While there was no significant difference in acquisition between lower-frequency words that appeared in multiword patterns and lower-frequency words that did not, a significant difference in acquisition was shown between higher-frequency words that appeared in multiword patterns and higher-frequency words that did not. Taking all the results together, the present dissertation research suggests: (1) frequency matters more for L2 speakers when encountering target words whose tokens exhibit inflectional and derivational variation in form, and (2) the appearance of target words in multiword patterns, especially higher-frequency target words, matters to L1 and L2 speakers. Implications of the present dissertation research for the incidental vocabulary acquisition research community, corpus-derived analyses, teaching practices, materials development, and L2 vocabulary acquisition through extensive reading are discussed.
Ochieng, Dunlop. "Indirect Influence of English on Kiswahili: The Case of Multiword Duplicates between Kiswahili and English." Doctoral thesis, 2014. https://monarch.qucosa.de/id/qucosa%3A20316.
Scheepers, Ruth Angela. "Lexical levels and formulaic language : an exploration of undergraduate students' vocabulary and written production of delexical multiword units." Thesis, 2014. http://hdl.handle.net/10500/18245.
Linguistics and Modern Languages
D. Litt. et Phil. (Linguistics)
Bejček, Eduard. "Automatické propojování lexikografických zdrojů a korpusových dat [Automatic linking of lexicographic resources and corpus data]." Doctoral thesis, 2015. http://www.nusl.cz/ntk/nusl-351016.
Jungwirthová, Klára. "Víceslovná pojmenování v italštině [Multiword naming units in Italian]." Master's thesis, 2015. http://www.nusl.cz/ntk/nusl-340200.
Rybáková, Jana. "Kvantitativní a kvalitativní rozbor spojek ve vybrané dětské literatuře [Quantitative and qualitative analysis of conjunctions in selected children's literature]." Master's thesis, 2018. http://www.nusl.cz/ntk/nusl-382983.
Hubková, Helena. "Názvy současných profesí ve zdravotnictví [Names of contemporary professions in healthcare]." Master's thesis, 2016. http://www.nusl.cz/ntk/nusl-352469.
Lief, Eric. "Použití hlubokých kontextualizovaných slovních reprezentací založených na znacích pro neuronové sekvenční značkování [Using deep contextualized character-based word representations for neural sequence labeling]." Master's thesis, 2019. http://www.nusl.cz/ntk/nusl-393167.