Academic literature on the topic 'Corpus method'

Author: Grafiati

Published: 30 May 2022

Last updated: 31 May 2022

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Corpus method.'

Journal articles on the topic "Corpus method"

Waskita, Dana. "Corpus Linguistics: Method, Theory, and Practice." Jurnal Sosioteknologi 16, no. 1 (April 29, 2017): 145–47. http://dx.doi.org/10.5614/sostek.itbj.2017.16.1.12.

de Bruijn, L. M., A. Hasman, and J. W. Arends. "Automatic SNOMED classification—a corpus-based method." Computer Methods and Programs in Biomedicine 54, no. 1-2 (September 1997): 115–22. http://dx.doi.org/10.1016/s0169-2607(97)00040-0.

Xu, Gui-Xian, Chang-Zhi Wang, Li-Hui Wang, Yu-Hong Zhou, Wei-Kang Li, Hao Xu, and Qing Huang. "Semantic classification method for network Tibetan corpus." Cluster Computing 20, no. 1 (January 19, 2017): 155–65. http://dx.doi.org/10.1007/s10586-017-0742-6.

Reinert, Max. "Classification Descendante Hierarchique et Analyse Lexicale par Contexte - Application au Corpus des Poesies D'A. Rimbaud." Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique 13, no. 1 (January 1987): 53–90. http://dx.doi.org/10.1177/075910638701300107.

Abstract:

Hierarchically descending classification and lexical analysis by context: application to the corpus of A. Rimbaud's poetry. Using a lexical analysis by context and a hierarchically descending classification method, the author examines the corpus of Rimbaud's poetry. Two different codings of the corpus, analysed with the same method and with other methods as well, furnish a means of judging the stability of the results obtained. Keywords: lexical analysis, Arthur Rimbaud, methodology, hierarchically descending classification analysis, coding.

Chitez, Madalina, and Loredana Bercuci. "Calibrating digital method integration into ESP courses according to disciplinary settings." New Trends and Issues Proceedings on Humanities and Social Sciences 7, no. 1 (July 2, 2020): 20–29. http://dx.doi.org/10.18844/prosoc.v7i1.4862.

Abstract:

This study aims to analyse the effects of a digitally enhanced teaching strategy in an ESP course. The intervention method consists of guided corpus linguistics exercises which are progressively introduced to improve the students’ academic writing. We collected data from various task-based corpus processing, consultation and analysis stages, each one having a different complexity level: compilation of a discipline-specific expert corpus, consultation of a native speaker English corpus and analyses of both types. The pre- and post-intervention results are quantitatively and qualitatively assessed, controlling for discipline specificity. The results of a corpus consultation satisfaction survey are also included in the analysis. We conclude that corpus consultations not only improve ESP students’ writing but also increase student motivation. The recommendation is to first test the digital methods in ESP courses and calibrate them according to disciplinary settings.

Shima, Kazuaki, Takeshi Homma, Masataka Motohashi, Rintaro Ikeshita, Hiroaki Kokubo, Yasunari Obuchi, and Jinhua She. "Efficient Corpus Creation Method for NLU Using Interview with Probing Questions." Journal of Advanced Computational Intelligence and Intelligent Informatics 23, no. 5 (September 20, 2019): 947–55. http://dx.doi.org/10.20965/jaciii.2019.p0947.

Abstract:

This paper presents an efficient method to build a corpus to train natural language understanding (NLU) modules. Conventional corpus creation methods involve a common cycle: a subject is given a specific situation where the subject operates a device by voice, and then the subject speaks one utterance to execute the task. In these methods, many subjects are required in order to build a large-scale corpus, which causes a problem of increasing lead time and financial cost. To solve this problem, we propose to incorporate a “probing question” into the cycle. Specifically, after a subject speaks one utterance, the subject is asked to think of alternative utterances to execute the same task. In this way, we obtain many utterances from a small number of subjects. An evaluation of the proposed method applied to interview-based corpus creation shows that the proposed method reduces the number of subjects by 41% while maintaining morphological diversity in a corpus and morphological coverage for user utterances spoken to commercial devices. It also shows that the proposed method reduces the total time for interviewing subjects by 36% compared with the conventional method. We conclude that the proposed method can be used to build a useful corpus while reducing lead time and financial cost.

Zhang, Jin Xi, Hong Zhi Yu, Ning Ma, and Zhao Yao Li. "The Phoneme Automatic Segmentation Algorithms Study of Tibetan Lhasa Words Continuous Speech Stream." Advanced Materials Research 765-767 (September 2013): 2051–54. http://dx.doi.org/10.4028/www.scientific.net/amr.765-767.2051.

Abstract:

In this paper, we adopt two methods of phoneme segmentation when building a Tibetan speech corpus: the traditional manual segmentation method and an automatic segmentation method based on the Mono prime HMM model. Experiments are performed to analyze the accuracy of both segmentation methods. The results show that the automatic segmentation method based on the tone prime HMM model helps to shorten the cycle of building a Tibetan corpus, saves considerable time and manpower when segmenting and labeling a large corpus, and greatly improves the accuracy and consistency of the speech corpus annotation.

Ho, Chukfong, Masrah Azrifah Azmi Murad, Rabiah Abdul Kadir, and Shyamala Doraisamy. "Comparing Two Corpus-Based Methods for Extracting Paraphrases to Dictionary-Based Method." International Journal of Semantic Computing 05, no. 02 (June 2011): 133–78. http://dx.doi.org/10.1142/s1793351x11001225.

Abstract:

Paraphrase extraction plays an increasingly important role in language-related research and applications in areas such as information retrieval, question answering and automatic machine evaluation. Most of the existing methods extract paraphrases from different types of corpora by using syntactic-based approaches. Since a syntactic-based approach relies on the similarity of context to identify and capture paraphrases, it also tends to extract terms that merely appear in similar contexts, such as loosely related terms and functionally similar yet unrelated terms. Besides, different types of corpora suffer from different kinds of problems, such as limited availability and domain bias. This paper presents a solely semantic-based paraphrase extraction model. This model collects paraphrases from multiple lexical resources and validates those paraphrases semantically in three ways: by computing domain similarity, definition similarity and word similarity. This model is benchmarked against two outstanding syntactic-based approaches. The experimental results from a manual evaluation show that the proposed model outperforms the benchmarks. The results indicate that a semantic-based approach should be applied in paraphrase extraction instead of a syntactic-based approach. The results further suggest that a hybrid of these two approaches should be applied if one targets strictly precise paraphrases.

Dao, Jizhaxi, Zhijie Cai, Rangzhuoma Cai, Maocuo San, and Mabao Ban. "A method of constructing syllable level Tibetan text classification corpus." MATEC Web of Conferences 336 (2021): 06013. http://dx.doi.org/10.1051/matecconf/202133606013.

Abstract:

A corpus serves as an indispensable ingredient for statistical NLP research and real-world applications; the corpus construction method therefore has a direct impact on various downstream tasks. This paper proposes a method to construct a Tibetan text classification corpus based on a syllable-level processing technique, which we refer to as TC_TCCNL. Empirical evidence indicates that the algorithm is able to produce promising performance, which may lay a starting point for future research on Tibetan text classification.

Wang, Shi, Zhujun Wang, Yi Jiang, and Huayu Wang. "Hierarchical Annotation Event Extraction Method in Multiple Scenarios." Wireless Communications and Mobile Computing 2021 (March 18, 2021): 1–9. http://dx.doi.org/10.1155/2021/8899852.

Abstract:

In the event extraction task, a corpus may contain multiple scenarios and an argument may play different roles under different triggers, yet the traditional tagging scheme can tag each word only once and therefore cannot handle argument overlap. We propose a hierarchical tagging pipeline model for Chinese corpora, based on the pretrained model BERT, which obtains the relevant arguments of each event in a hierarchical way. The model adopts a pipeline structure that divides the event extraction task into event trigger classification and argument recognition. First, BERT is used to generate feature vectors, which are passed to a bidirectional gated recurrent unit + conditional random field (BiGRU+CRF) model for trigger classification; then the marked event type features are spliced into the corpus as known features and passed into BiGRU+CRF for argument recognition. We evaluated our method on DUEE, combined with data enhancement and mask operation. Experimental results show that our method improves on other baselines, which demonstrates the effectiveness of the model on Chinese corpora.

Dissertations / Theses on the topic "Corpus method"

Tariq, Mariam. "A corpus-based method for ontology acquisition." Thesis, University of Surrey, 1994. http://epubs.surrey.ac.uk/843178/.

Abstract:

In this thesis we explore the acquisition of a domain ontology based on the characteristics of languages, in particular specialist languages. Our work is supported by the presumption that language can communicate information, specifically classification information, and especially when employed within specialist domains of knowledge. Knowledge involves being familiar with the existence of important objects and interrelationships between objects that make up a specific world, and language is often used as a medium to make this knowledge explicit. We examine the possibility of a local grammar for statements that convey ontological information. Assuming a correlation between the conceptual structure of a domain and a substantial collection of domain specific documents, we propose a method to analyse such a collection in an attempt to elicit this conceptual structure, which may help in understanding the ontological commitment of the domain experts. We have developed a prototype to implement the proposed method.

Rayson, Paul Edward. "Matrix: a statistical method and software tool for linguistic analysis through corpus comparison." Thesis, Lancaster University, 2003. http://eprints.lancs.ac.uk/12287/.

Norkevičius, Giedrius. "Method for creating phone duration models using very large, multi-speaker, automatically annotated speech corpus." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2011. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2011~D_20110201_144440-12017.

Abstract:

Two previously unexamined problems are addressed in this dissertation. 1. Building a model capable of predicting phone durations in Lithuanian. All existing investigations of Lithuanian phone durations were performed by linguists. These investigations are usually exploratory statistics limited to the analysis of a single factor affecting phone duration. In this work, the dependence of phone duration on contextual factors is learned from data by a machine learning method and written in explicit form as a decision tree. 2. Constructing a language-independent method for creating phone duration models using a very large, multi-speaker, automatically annotated speech corpus. Most researchers worldwide use speech corpora that are relatively small, single-speaker, and manually annotated or at least validated by experts. The reasons usually given are that multi-speaker corpora are inappropriate because different speakers have different pronunciation manners and speech rates, and that automatically annotated corpora lack accuracy. The method created for phone duration modeling enables the use of such corpora. Its main components are the reduction of noisy data in the speech corpus and the normalization of speaker-specific phone durations by phone type clustering. Listening tests of synthesized speech showed that the perceived naturalness is affected by the underlying phone durations, and that the use of contextual information in phone duration prediction is an important factor influencing the naturalness of synthesized speech; relative to the natural speech signal, the best rated is...

Lynn, Ethan Michael. "Getting All the Ducks in a Row: Towards a Method for the Consolidation of English Idioms." BYU ScholarsArchive, 2016. https://scholarsarchive.byu.edu/etd/6014.

Abstract:

Idioms play an important role in language acquisition but learners do not have sufficient time to learn all of them. Therefore, learners need to focus on the most frequently occurring idioms, which can be determined by corpus searches. Building off previous corpus studies, this study generated a comprehensive list of English idioms by combining lists from several sources and developed a methodology for organizing and sorting idioms within the list. In total, over 27,000 idiom forms were amalgamated and a portion of the list was compiled, which featured 2,697 core idioms and 5,559 variant idiom forms. It was found that over 35% of idioms varied structurally and thirteen types of idiom variation were highlighted. Additionally, issues concerning idiom boundaries were investigated. These results are congruent with previous findings which show that variation is a commonly occurring element of idioms. Furthermore, specific problematic elements for future corpus searches and English language learners are identified.

Theunissen, M. W. (Marthinus Wilhelmus). "Phoneme-based topic spotting on the switchboard corpus." Thesis, Stellenbosch : Stellenbosch University, 2002. http://hdl.handle.net/10019.1/52998.

Abstract:

Thesis (MScEng)--Stellenbosch University, 2002.
The field of topic spotting in conversational speech deals with the problem of identifying "interesting" conversations or speech extracts contained within large volumes of speech data. Typical applications where the technology can be found include the surveillance and screening of messages before referring to human operators. Closely related methods can also be used for data-mining of multimedia databases, literature searches, language identification, call routing and message prioritisation. The first topic spotting systems used words as the most basic units. However, because of the poor performance of speech recognisers, a large amount of topic-specific hand-transcribed training data is needed. It is for this reason that researchers started concentrating on methods using phonemes instead, because the errors then occur on smaller, and therefore less important, units. Phoneme-based methods consequently make it feasible to use computer generated transcriptions as training data. Building on word-based methods, a number of phoneme-based systems have emerged. The two most promising ones are the Euclidean Nearest Wrong Neighbours (ENWN) algorithm and the newly developed Stochastic Method for the Automatic Recognition of Topics (SMART). Previous experiments on the Oregon Graduate Institute of Science and Technology's Multi-Language Telephone Speech Corpus suggested that SMART yields a large improvement over ENWN, which outperformed competing phoneme-based systems in evaluations. However, the small amount of data available for these experiments meant that more rigorous testing was required. In this research, the algorithms were therefore re-implemented to run on the much larger Switchboard Corpus. Subsequently, a substantial improvement of SMART over ENWN was observed, confirming the result that was previously obtained. In addition to this, an investigation was conducted into the improvement of SMART. This resulted in a new counting strategy with a corresponding improvement in performance.

Huiza, Pereyra Eric Raphael. "Talking with signs: a simple method to detect nouns and numbers in a non annotated signs language corpus." Master's thesis, Pontificia Universidad Católica del Perú, 2020. http://hdl.handle.net/20.500.12404/16906.

Abstract:

People with deafness or hearing disabilities who aim to use computer-based systems rely on state-of-the-art video classification and human action recognition techniques that combine traditional movement pattern recognition and deep learning techniques. In this work we present a pipeline for semi-automatic video annotation applied to a non-annotated Peruvian Signs Language (PSL) corpus, along with a novel method for the progressive detection of PSL elements (nSDm). We produced a set of video annotations indicating sign appearances for a small set of nouns and numbers, along with a labeled PSL dataset (PSL dataset). A model was obtained by ensembling a 2D CNN trained with movement patterns extracted from the PSL dataset using Lucas-Kanade optical flow and an RNN with LSTM cells trained with raw RGB frames extracted from the PSL dataset; it reports state-of-the-art results on the PSL dataset for sign classification tasks in terms of AUC, precision and recall.
Research work

Lorenzi, Mikaela, and Sofia Bergström. ""I can tell a story that my dads friend tell me" : A corpus- and interview-based study on grammar education, with focus on verb forms." Thesis, Uppsala universitet, Institutionen för pedagogik, didaktik och utbildningsstudier, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-275268.

Abstract:

This study uses two methods, textual analysis and interviews, based respectively on texts from The Uppsala Learner English Corpus (ULEC) and on interviews with teachers. The textual analysis investigates errors made by students in year seven and year nine in the construction of different verb forms in written English essays. A potential difference between errors made in year seven and year nine is also examined. Moreover, the interview-based analysis investigates professional junior high school teachers’ teaching methods and attitudes towards grammar. The errors investigated in the textual analysis are compared with the teachers’ perceptions of common verb form errors made by their students. The textual analysis showed that the most common errors concern spelling within the verb phrase, auxiliary verbs, subject-verb agreement, and irregular verbs, and that year seven had a higher frequency of errors than year nine in most categories, even if the results differed only slightly. The analysis of the interviews found that teachers, in general, enjoy grammar and aim to have a student-centered approach; however, the teachers' accounts show characteristics of traditional teacher-centered grammar teaching. It is reasoned that traditional teacher-centered grammar teaching is fundamentally established, and that teachers today appear not to acquire the tools to move away from the teacher-centered approach towards student-centered grammar teaching. We reason that the education of L2 teachers needs to be reformed to provide tools that help teachers achieve a student-centered approach, and thereby enable students to become more successful in grammar.

Olsson, Fredrik. "Bootstrapping Named Entity Annotation by Means of Active Machine Learning: A Method for Creating Corpora." Doctoral thesis, SICS, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:ri:diva-22935.

Abstract:

This thesis describes the development and in-depth empirical investigation of a method, called BootMark, for bootstrapping the marking up of named entities in textual documents. The reason for working with documents, as opposed to for instance sentences or phrases, is that the BootMark method is concerned with the creation of corpora. The claim made in the thesis is that BootMark requires a human annotator to manually annotate fewer documents in order to produce a named entity recognizer with a given performance, than would be needed if the documents forming the basis for the recognizer were randomly drawn from the same corpus. The intention is then to use the created named entity recognizer as a pre-tagger and thus eventually turn the manual annotation process into one in which the annotator reviews system-suggested annotations rather than creating new ones from scratch. The BootMark method consists of three phases: (1) Manual annotation of a set of documents; (2) Bootstrapping – active machine learning for the purpose of selecting which document to annotate next; (3) The remaining unannotated documents of the original corpus are marked up using pre-tagging with revision. Five emerging issues are identified, described and empirically investigated in the thesis. Their common denominator is that they all depend on the realization of the named entity recognition task, and as such, require the context of a practical setting in order to be properly addressed. The emerging issues are related to: (1) the characteristics of the named entity recognition task and the base learners used in conjunction with it; (2) the constitution of the set of documents annotated by the human annotator in phase one in order to start the bootstrapping process; (3) the active selection of the documents to annotate in phase two; (4) the monitoring and termination of the active learning carried out in phase two, including a new intrinsic stopping criterion for committee-based active learning; and (5) the applicability of the named entity recognizer created during phase two as a pre-tagger in phase three. The outcomes of the empirical investigations concerning the emerging issues support the claim made in the thesis. The results also suggest that while the recognizer produced in phases one and two is as useful for pre-tagging as a recognizer created from randomly selected documents, the applicability of the recognizer as a pre-tagger is best investigated by conducting a user study involving real annotators working on a real named entity recognition task.
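
The active selection step in phase two of the abstract above can be illustrated with a small sketch. The abstract does not say which disagreement measure BootMark uses, so the code below assumes vote entropy, a common choice in committee-based active learning; the function names, the document ids and the toy label sequences are all hypothetical and serve only to show the idea of picking the document the committee disagrees on most.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy (in bits) of the labels a committee voted for one token."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def document_disagreement(committee_labels):
    """Mean per-token vote entropy for one document.

    committee_labels holds one label sequence per committee member;
    all sequences are assumed to have the same length.
    """
    n_tokens = len(committee_labels[0])
    if n_tokens == 0:
        return 0.0
    per_token = [
        vote_entropy([member[i] for member in committee_labels])
        for i in range(n_tokens)
    ]
    return sum(per_token) / n_tokens


def select_next_document(pool):
    """Return the id of the unannotated document with the highest disagreement."""
    return max(pool, key=lambda doc_id: document_disagreement(pool[doc_id]))


# Hypothetical pool: three committee members tag two short documents.
pool = {
    "doc_a": [["O", "PER", "O"], ["O", "PER", "O"], ["O", "PER", "O"]],
    "doc_b": [["O", "ORG", "O"], ["O", "PER", "O"], ["O", "O", "O"]],
}
print(select_next_document(pool))
```

In this toy pool the committee is unanimous on doc_a and split on the middle token of doc_b, so doc_b would be handed to the annotator next. A real system would recompute the scores after each newly annotated document, since the committee is retrained in between.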

Do, Thi Ngoc Diep. "Extraction de corpus parallèle pour la traduction automatique depuis et vers une langue peu dotée." Phd thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00680046.

Abstract:

Machine translation systems now achieve good results for certain language pairs such as English-French, English-Chinese and English-Spanish. Empirical translation approaches, particularly probabilistic machine translation, make it possible to build a translation system quickly if adequate data corpora are available. Probabilistic machine translation is based on learning models from large bilingual parallel corpora for the source and target languages. However, research on machine translation for so-called "under-resourced" language pairs must face the challenge of data scarcity. We therefore addressed the problem of acquiring a large corpus of parallel bilingual texts in order to build a probabilistic machine translation system. The originality of our work lies in its focus on under-resourced languages, for which parallel bilingual text corpora do not exist in most cases. This manuscript presents our methodology for extracting a parallel training corpus from a comparable corpus, a richer and more diverse data resource available on the Internet. We propose three extraction methods. The first follows the classical approach, which uses general characteristics of the documents as well as lexical information from the document to extract both comparable documents and parallel sentences. However, this method requires additional data on the language pair. The second is a fully unsupervised method that requires no additional input data and can be applied to any language pair, even under-resourced ones. The last method extends the second by using a third language to improve the extraction processes for two language pairs. The proposed methods are validated by experiments on the under-resourced Vietnamese language together with French and English.

Chan, Nok Chin Lydia. "Grammar "bores the crap out of me!": A mixed-method study on the XTYOFZ construction and its usage by ESL and ENL speakers." Thesis, Stockholms universitet, Engelska institutionen, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-194086.

Abstract:

Unlike Generative Grammar, which sees grammar as a formal system of how words are put together to form sentences, Construction Grammar suggests that grammar is more than just rules and surface forms; instead, grammar includes many form-and-meaning pairings which are called constructions. For years, Construction Grammarians have been investigating constructions with various approaches, including corpus-linguistic, pedagogical and second language acquisition approaches, yet there is still room for exploration. The present paper aims to further investigate the [V the N_taboo-word out of]-construction (Hoeksema & Napoli, 2008; Haïk, 2012; Perek, 2016; Hoffmann, 2020) (e.g., I kick the hell out of him.) and propose a new umbrella construction, “X the Y out of Z” (XTYOFZ) construction, for it. Another aim is to examine the usage and comprehension of the XTYOFZ construction by English as a Second Language (ESL) and English as Native Language (ENL) speakers. The usage context and the syntactic and semantic characteristics of the XTYOFZ construction were examined through corpus linguistic methodology. Furthermore, processing and understanding of the construction by ESL and ENL speakers were tested via an online timed Lexical Decision Task as well as an online follow-up survey consisting of questions on English acquisition and usage, and a short comprehension task on the XTYOFZ construction. Corpus data shows that in general, the combination of non-motion action verbs (e.g., scare, beat) as X and taboo terms (e.g., shit, hell) as Y was the most common. Also, it was found that the construction occurs mostly in non-academic contexts such as websites and TV/movies. On the other hand, results from the Lexical Decision Task show that ESL speakers access constructional meaning slightly more slowly than ENL speakers. The follow-up survey also reflects that ESL speakers seem to have a harder time producing and comprehending the construction than ENL speakers. By investigating the features of a relatively less-discussed construction and its usage by ESL speakers, this study hopes to increase the knowledge base of Construction Grammar and ESL construction comprehension and usage, particularly on the constructions that are mainly used in more casual settings.

Books on the topic "Corpus method"

McEnery, Tony. Corpus linguistics: Method, theory and practice. Cambridge: Cambridge University Press, 2011.

Kennedy, George Alexander, and Hugo Rabe, eds. Invention and method: Two rhetorical treatises from the Hermogenic corpus. Atlanta: Society of Biblical Literature, 2005.

Hermogenes. Invention and method: Two rhetorical treatises from the Hermogenic corpus. Atlanta, GA: Society of Biblical Literature, 2006.

Hermogenes. Invention and method in Greek rhetorical theory: Two treatises from the Hermogenic corpus. Leiden: Brill, 2005.

Bubenhofer, Noah. Patterns of Language Usage. Corpus Linguistics as a Method of Analyzing Discourse and Culture. Berlin, New York: Walter de Gruyter, 2009. http://dx.doi.org/10.1515/9783110215854.

Glynn, Dylan, and Justyna A. Robinson, eds. Corpus Methods for Semantics. Amsterdam: John Benjamins Publishing Company, 2014. http://dx.doi.org/10.1075/hcp.43.

Statistics for corpus linguistics. Edinburgh: Edinburgh University Press, 1998.

Lu, Xiaofei. Computational Methods for Corpus Annotation and Analysis. Dordrecht: Springer Netherlands, 2014. http://dx.doi.org/10.1007/978-94-017-8645-4.

Oakes, Michael P., and Meng Ji, eds. Quantitative Methods in Corpus-Based Translation Studies. Amsterdam: John Benjamins Publishing Company, 2012. http://dx.doi.org/10.1075/scl.51.

Young, Steve. Corpus-Based Methods in Language and Speech Processing. Dordrecht: Springer Netherlands, 1997.

Book chapters on the topic "Corpus method"

Brezina, Vaclav, and Dana Gablasova. "The Corpus Method." In English Language, 595–609. London: Macmillan Education UK, 2018. http://dx.doi.org/10.1057/978-1-137-57185-4_40.

Marchi, Anna. "Method, Corpus, Process." In Self-Reflexive Journalism, 37–76. New York: Routledge, 2019. http://dx.doi.org/10.4324/9781315178691-3.

Plecháč, Petr. "A Collocation-Driven Method of Discovering Rhymes (in Czech, English, and French Poetry)." In Taming the Corpus, 79–95. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-98017-1_5.

Villaseñor-Pineda, Luis, Manuel Montes-y-Gómez, Manuel Alberto Pérez-Coutiño, and Dominique Vaufreydaz. "A Corpus Balancing Method for Language Model Construction." In Computational Linguistics and Intelligent Text Processing, 393–401. Berlin, Heidelberg: Springer Berlin Heidelberg, 2003. http://dx.doi.org/10.1007/3-540-36456-0_40.

Karaoglan, Bahar, Tarık Kışla, and Senem Kumova Metin. "Description of Turkish Paraphrase Corpus Structure and Generation Method." In Computational Linguistics and Intelligent Text Processing, 208–17. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-75477-2_13.

Aussenac-Gilles, Nathalie, Brigitte Biébow, and Sylvie Szulman. "Revisiting Ontology Design: A Method Based on Corpus Analysis." In Knowledge Engineering and Knowledge Management Methods, Models, and Tools, 172–88. Berlin, Heidelberg: Springer Berlin Heidelberg, 2000. http://dx.doi.org/10.1007/3-540-39967-4_13.

Fu, Jianhui, Shi Wang, Ya Wang, and Cungen Cao. "A Practical Method of Identifying Chinese Metaphor Phrases from Corpus." In Knowledge Science, Engineering and Management, 43–54. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-47650-6_4.

Herrera, William G., Giovana S. Cover, and Leticia Rittner. "Pixel-Based Classification Method for Corpus Callosum Segmentation on Diffusion-MRI." In VipIMAGE 2017, 217–24. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-68195-5_24.

Modi, Deepa, and Neeta Nain. "Part-of-Speech Tagging of Hindi Corpus Using Rule-Based Method." In Proceedings of the International Conference on Recent Cognizance in Wireless Communication & Image Processing, 241–47. New Delhi: Springer India, 2016. http://dx.doi.org/10.1007/978-81-322-2638-3_28.

Kotani, Katsunori, and Takehiko Yoshimi. "Design for a Listening Learner Corpus for a Listenability Measurement Method." In Advances in Intelligent Systems and Computing, 15–26. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-70016-8_2.

Conference papers on the topic "Corpus method"

Pak, Alexander Alexandrovich, Sergazy Sakenovich Narynov, Arman Serikuly Zharmagambetov, Sholpan Nazarovna Sagyndykova, Zhanat Elubaevna Kenzhebayeva, and Irbulat Turemuratovich. "The method of synonyms extraction from unannotated corpus." In 2015 Third International Conference on Digital Information, Networking, and Wireless Communications (DINWC). IEEE, 2015. http://dx.doi.org/10.1109/dinwc.2015.7054207.

ul Haque, Rakib, Parisa Mehera, M. F. Mridha, and Md Abdul Hamid. "Bengali Stop Phrase Detection Mechanism using Corpus Based Method." In 2019 Joint 8th International Conference on Informatics, Electronics & Vision (ICIEV) and 2019 3rd International Conference on Imaging, Vision & Pattern Recognition (icIVPR). IEEE, 2019. http://dx.doi.org/10.1109/iciev.2019.8858583.

Wu, Yaguang, Haichun Sun, and Chungang Yan. "An event timeline extraction method based on news corpus." In 2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA). IEEE, 2017. http://dx.doi.org/10.1109/icbda.2017.8078725.

Bodenreider, Olivier, Thomas C. Rindflesch, and Anita Burgun. "Unsupervised, corpus-based method for extending a biomedical terminology." In the ACL-02 workshop. Morristown, NJ, USA: Association for Computational Linguistics, 2002. http://dx.doi.org/10.3115/1118149.1118157.

Tu, Shiying, Haojin Hu, Ronglyu Sun, Yanmei Jing, and Wenxue He. "Research on the Construction Method of Chinese - Vietnamese Parallel Corpus." In 2019 IEEE 4th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC). IEEE, 2019. http://dx.doi.org/10.1109/iaeac47372.2019.8998000.

Terashima, Ryo, Hiroshi Echizen-ya, and Kenji Araki. "Learning Method for Extraction of Partial Correspondence from Parallel Corpus." In 2009 International Conference on Asian Language Processing (IALP). IEEE, 2009. http://dx.doi.org/10.1109/ialp.2009.69.

Xia, Jing-Jing, and Meng-Qiang Liu. "A Corpus Building Method on Forensic Identification of Dialects (FIDS)." In 2011 International Conference on Information Technology, Computer Engineering and Management Sciences (ICM). IEEE, 2011. http://dx.doi.org/10.1109/icm.2011.357.

Qingzhi, Sun, Du Qingfeng, Zhang Chenxi, and Li Jun. "Chinese News Event Corpus Construction Method Based on Syntax Tree." In ICBDT 2020: 2020 3rd International Conference on Big Data Technologies. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3422713.3422741.

Li, Dan, Qiqi Zhang, Jiadong Cao, and Yihua Mao. "Study of Corpus Classification Management Method in Construction Correspondence Files." In Proceedings of the 1st International Symposium on Economic Development and Management Innovation (EDMI 2019). Paris, France: Atlantis Press, 2019. http://dx.doi.org/10.2991/edmi-19.2019.50.

Bourigault, Didier. "An endogeneous corpus-based method for structural noun phrase disambiguation." In the sixth conference. Morristown, NJ, USA: Association for Computational Linguistics, 1993. http://dx.doi.org/10.3115/976744.976755.

Reports on the topic "Corpus method"

Sánchez Sabaté, R., C. del Valle, and M. Mensa. Method for the construction of large thematic corpora of online news articles. Towards a corpus of food-related news. Revista Latina de Comunicación Social, February 2019. http://dx.doi.org/10.4185/rlcs-2019-1347en.

Mulliner, Sara. A Mixed Methods Analysis of Corpus Data from Reddit Discussions of "Gay Voice". Portland State University Library, December 2019. http://dx.doi.org/10.15760/etd.7316.

Philpot, J. N. Lecture Utilized as the Primary Method of Instruction in the Marine Corps. Fort Belvoir, VA: Defense Technical Information Center, February 2009. http://dx.doi.org/10.21236/ada509952.

Louie, David, Yifeng Wang, Rekha R. Rao, Alec Kucala, Kyle Ross, Jessica Nicole Kruichak, and William Robert Chavez. A New Method to Contain Molten Corium in Catastrophic Nuclear Reactor Accidents. Office of Scientific and Technical Information (OSTI), October 2019. http://dx.doi.org/10.2172/1573134.

Whitlow, James L. A Method for Collectively Measuring the Operating Tempo of Individuals in Marine Corps Units -- Why and How. Fort Belvoir, VA: Defense Technical Information Center, April 1990. http://dx.doi.org/10.21236/ada241099.

McInerney, Michael, Matthew Brenner, Sean Morefield, Robert Weber, and John Carlyle. Acoustic nondestructive testing and measurement of tension for steel reinforcing members. Engineer Research and Development Center (U.S.), October 2021. http://dx.doi.org/10.21079/11681/42181.

Abstract:

Many concrete structures contain internal post-tensioned steel structural members that are subject to fracturing and corrosion. The major problem with conventional tension measurement techniques is that they use indirect and non-quantitative methods to determine whether there has been a loss of tension. This work developed an acoustics-based technology and method for making quantitative tension measurements of an embedded, tensioned steel member. The theory and model were verified in the laboratory using a variety of steel rods as test specimens. Field tests of the method were conducted at three Corps of Engineers dams. Measurements of the longitudinal and shear velocity were done on rods up to 50 ft long. Not all rods of this length were able to be measured and the quality and consistency of the signal varied. There were fewer problems measuring the longitudinal velocity than shear velocity. While the tension predictions worked in the laboratory tests, the tension could not be accurately calculated for any of the field sites because researchers could not obtain the longitudinal or shear velocities in an unstressed state, or precise measurements of the longitudinal and shear velocities due to the unknown precise length of the rods in the tensioned state.

Gomrick, Kathleen M. Gung Ho, Raider! The Philosophy and Methods of Brigadier General Evans F. Carlson, Marine Corps Raider. Fort Belvoir, VA: Defense Technical Information Center, April 1999. http://dx.doi.org/10.21236/ada388945.

Malej, Matt, and Fengyan Shi. Suppressing the pressure-source instability in modeling deep-draft vessels with low under-keel clearance in FUNWAVE-TVD. Engineer Research and Development Center (U.S.), May 2021. http://dx.doi.org/10.21079/11681/40639.

Abstract:

This Coastal and Hydraulics Engineering Technical Note (CHETN) documents the development through verification and validation of three instability-suppressing mechanisms in FUNWAVE-TVD, a Boussinesq-type numerical wave model, when modeling deep-draft vessels with a low under-keel clearance (UKC). Many large commercial ports and channels (e.g., Houston Ship Channel, Galveston, US Army Corps of Engineers [USACE]) are traveled and affected by tens of thousands of commercial vessel passages per year. In a series of recent projects undertaken for the Galveston District (USACE), it was discovered that when deep-draft vessels are modeled using pressure-source mechanisms, they can suffer from model instabilities when low UKC is employed (e.g., vessel draft of 12 m in a channel of 15 m or less of depth), rendering a simulation unstable and obsolete. As an increasingly large number of deep-draft vessels are put into service, this problem is becoming more severe. This presents an operational challenge when modeling large container-type vessels in busy shipping channels, as these often will come as close as 1 m to the bottom of the channel, or even touch the bottom. This behavior would subsequently exhibit a numerical discontinuity in a given model and could severely limit the sample size of modeled vessels. This CHETN outlines a robust approach to suppressing such instability without compromising the integrity of the far-field vessel wave/wake solution. The three methods developed in this study aim to suppress high-frequency spikes generated in the near field of a vessel. They are a shock-capturing method, a friction method, and a viscosity method, respectively. The tests show that the combined shock-capturing and friction method is the most effective at suppressing the local high-frequency noise without affecting the far-field solution.
A strong test, in which the target draft is larger than the channel depth, shows that no high-frequency noise is generated in the case of ship squat as long as the shock-capturing method is used.

David, Gabrielle, D. Somerville, Julia McCarthy, Spencer MacNeil, Faith Fitzpatrick, Ryan Evans, and David Wilson. Technical guide for the development, evaluation, and modification of stream assessment methods for the Corps Regulatory Program. Engineer Research and Development Center (U.S.), October 2021. http://dx.doi.org/10.21079/11681/42182.

Abstract:

The U.S. Army Corps Regulatory Program considers the loss (impacts) and gain (compensatory mitigation) of aquatic resource functions as part of Clean Water Act Section 404 permitting and compensatory mitigation decisions. To better inform this regulatory decision-making, the Regulatory Program needs transparent and objective approaches to assess the function and condition of aquatic resources, including streams. Therefore, the Regulatory Program needs function-based stream assessments (1) to characterize a stream’s condition or function, (2) to improve understanding of the impact of a proposed action on an aquatic resource, and/or (3) to inform the development of stream compensatory mitigation tools rooted in stream condition and/or function. A function-based stream assessment can provide regulatory decision makers with the resources to objectively consider alternatives, minimize impacts, assess unavoidable impacts, determine mitigation requirements, and monitor the success of mitigation projects. A multiagency National Committee on Stream Assessment (NCSA) convened to create these guidelines to inform the development of new methods and evaluation of both national-level and regional methods currently in use. The resulting guidelines present nine phases, including rationale and recommendations to facilitate work efforts. The NCSA hopes that this technical guide promotes transparency, technical defensibility, and consistent application of stream assessments in the Regulatory Program.

Haring, Christopher. Data collection tools for river geomorphology studies : LiDAR and traditional methods. Engineer Research and Development Center (U.S.), December 2021. http://dx.doi.org/10.21079/11681/42502.

Abstract:

The purpose of this review is to highlight LiDAR data usage for geomorphic studies and compare to other remote sensing technologies. This review further identifies survey efficiencies and issues that can be problematic in using LiDAR digital elevation models (DEMs) in completing surveys and geomorphic analysis. US Army Corps of Engineers (USACE) geospatial data collection guidance (EM 1110-1-1000) (USACE 2015) aligns with the American Society for Photogrammetry and Remote Sensing Positional Accuracy Standards for Digital Geospatial Data (ASPRS 2014). Geomorphic assessment technologies are rapidly evolving, and LiDAR data collection methods are at the forefront. The FluvialGeomorph (FG) toolbox, developed to support USACE watershed planning, is a recent example of the use of LiDAR high-resolution terrain data to provide a new, efficient approach for rapid watershed assessments (Haring et al. 2020; Haring and Biedenharn 2021). However, there are advantages and disadvantages in using LiDAR data compared to other remote sensing technologies and traditional topographic field survey methods.
