Academic literature on the topic 'Corpus compilation'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Corpus compilation.'

Journal articles on the topic "Corpus compilation"

1

Alfraidi, Tareq, Mohammad A. R. Abdeen, Ahmed Yatimi, Reyadh Alluhaibi, and Abdulmohsen Al-Thubaity. "The Saudi Novel Corpus: Design and Compilation." Applied Sciences 12, no. 13 (June 30, 2022): 6648. http://dx.doi.org/10.3390/app12136648.

Abstract:
Arabic has recently received significant attention from corpus compilers. This situation has led to the creation of many Arabic corpora that cover various genres, most notably the newswire genre. Yet, Arabic novels, and specifically those authored by Saudi writers, lack the sufficient digital datasets that would enhance corpus linguistic and stylistic studies of these works. Thus, Arabic lags behind English and other European languages in this context. In this paper, we present the Saudi Novels Corpus, built to be a valuable resource for linguistic and stylistic research communities. We specifically present the procedures we followed and the decisions we made in creating the corpus. We describe and clarify the design criteria, data collection methods, process of annotation, and encoding. In addition, we present preliminary results that emerged from the analysis of the corpus content. We consider the work described in this paper as initial steps to bridge the existing gap between corpus linguistics and Arabic literary texts. Further work is planned to improve the quality of the corpus by adding advanced features.
2

Castillo Rodríguez, Cristina, José María Díaz Lage, and Beatriz Rubio Martínez. "Compiling and analyzing a tagged learner corpus: a corpus-based study of adjective uses." Círculo de Lingüística Aplicada a la Comunicación 81 (February 21, 2020): 115–36. http://dx.doi.org/10.5209/clac.67932.

Abstract:
A learner corpus (LC) is widely known as a rich source of information regarding the use of expressions and the errors made by students in their productions. In fact, we, as teachers, can profit from the compilation of their tasks so as to analyze in detail their way of writing. However, the mere compilation of texts does not guarantee a successful exploitation, as more steps than saving texts must be involved in the whole process. Therefore, it seems essential to follow a protocolized methodology of compilation. In this paper we propose five phases for compiling a LC containing texts from the spontaneous written productions from undergraduate and postgraduate students. The outcomes thrown with the LC exploitation will reveal the errors in students’ productions regarding the use of plural, comparative and superlative in adjectives and also other fails detected in the tagging phase, most of which are due to students’ misuses.
3

Kwon, Heokseung. "English learner corpora and research in Korea." Corpora 17, Supplement (October 2022): 5–22. http://dx.doi.org/10.3366/cor.2022.0244.

Abstract:
The interest in the exploitation of corpora in the study of Korean L2 learners’ use of English has risen dramatically over the past two decades, leading to the compilation of learner corpora and to numerous empirical investigations into Korean learners’ use of English. This paper will give an overview of the compilation and characteristics of English learner corpora in Korea and will also provide an analysis of the recent trends in learner corpus research. It was not until the mid-2000s that Korean academics started to compile English learner corpora, such as the SNU Korean-speaking English Learner Corpus (SKELC), the Yonsei English Learner Corpus (YELC), the Gachon Learner Corpus (GLC), the Neungyule Interlanguage Corpus of Korean Learners of English (NICKLE), the EFL Teacher Corpus (ETC), the Korean English Learners’ Spoken Corpus (KELSC) and the ETS Corpus of Non-native Written English (TOEFL11). There have also been a growing number of learner corpus-based studies that used the existing learner corpora as well as self-compiled corpus data. All the learner corpus-based research articles published in two Korean academic journals (English Teaching and Korean Journal of Applied Linguistics) will be reviewed and analysed in terms of research topics and areas, data types, analysis methods and corpus compilation practices. Finally, this paper will suggest some future directions for learner corpus compilation and research in Korea.
4

Llaurado, Anna, Maria Antònia Martí, and Liliana Tolchinsky. "Corpus CesCa." International Journal of Corpus Linguistics 17, no. 3 (December 31, 2012): 428–41. http://dx.doi.org/10.1075/ijcl.17.3.06lla.

Abstract:
This paper outlines the compilation of a corpus of Catalan written production. The CesCa corpus presents a picture of the Catalan written language throughout compulsory schooling. It contains two kinds of data: Vocabularies of five semantic fields comprising 242,404 lexical forms and Textual data of four different discourse genres consisting of 207,028 tokens. Both vocabularies and the textual data have been morphologically analyzed and lemmatized. The corpus is freely available. This paper will outline the main features of the corpus and make some suggestions as to the uses to which the corpus can be put.
5

Monaco, Leida Maria, and Luis Puente-Castelo. "‘A matter both of curioſity and uſefulneſs’: Compiling the Corpus of English Texts on Language." Research in Corpus Linguistics 7 (2019): 47–68. http://dx.doi.org/10.32714/ricl.07.03.

Abstract:
This paper describes the compilation of CETeL, the subcorpus on ‘Language and Linguistics’ in the Coruña Corpus of English Scientific Writing, and discusses the various challenges encountered during the process of selection and digitisation of material. CETeL includes forty-four samples of texts on Language, Languages, and Linguistics from the period 1700–1900, and on completion will contain around 400,000 words. The paper will examine the historical context of academic writing in that period and the way in which this context affects the process of compilation. Likewise, the criteria followed in the compilation of the Coruña Corpus will be discussed in order to show the extent to which these criteria have affected the compilation of CETeL, and how they contribute towards making the corpus representative of the disciplinary practices of the period. Finally, the corpus will also be described according to a series of parameters used to assure representativeness and balance, namely the date of publication of samples, their genre, and the sex and linguistic background of their authors.
6

Ó Meachair, Mícheál J., Brian Ó Raghallaigh, Úna Bhreathnach, Gearóid Ó Cleircín, and Kevin Scannell. "Tiomsú Corpais don Taighde Foclóireachta: Corpas Foclóireachta na Gaeilge (CFG2020)." TEANGA, the Journal of the Irish Association for Applied Linguistics 28 (December 9, 2021): 278–305. http://dx.doi.org/10.35903/teanga.v28i.726.

Abstract:
This paper sets out the steps followed in the compilation of Corpas Foclóireachta na Gaeilge 2020 (CFG2020), a monolingual Irish-language corpus of 77.3 million words. The context and circumstances of the project are explained, along with the motivation for various decisions made. The compilation and processing stages are described in detail. The contents of the corpus are outlined and the resource created to query CFG2020 is presented, along with reference to the kinds of analysis and research which it enables. CFG2020 was created as a first step towards a proposed larger corpus project, and suggestions for improvement and expansion are therefore proposed.
7

Travis, Catherine E., and Rena Torres Cacoullos. "Making Voices Count: Corpus Compilation in Bilingual Communities." Australian Journal of Linguistics 33, no. 2 (May 2013): 170–94. http://dx.doi.org/10.1080/07268602.2013.814529.

8

Loureiro-Porto, Lucía. "ICE vs GloWbE: Big data and corpus compilation." World Englishes 36, no. 3 (September 2017): 448–70. http://dx.doi.org/10.1111/weng.12281.

9

Ling Lee, Joanna Chiew, Phoey Lee Teh, Sian Lun Lau, and Irina Pak. "Compilation of malay criminological terms from online news." Indonesian Journal of Electrical Engineering and Computer Science 15, no. 1 (July 1, 2019): 355. http://dx.doi.org/10.11591/ijeecs.v15.i1.pp355-364.

Abstract:
A Malay language corpus has been established by the Institute of Language and Literature (Dewan Bahasa dan Pustaka, DBP in Malaysia). Most of the past research on the Malay language corpus has focused on the description, lexicography and translation of the Malay language. However, in the existing literature, there is no list of Malay words that categorizes crime terminologies. This study aims to fill that linguistic gap. First, we aggregated the most frequently used crime terminology words from Malaysian online news sources. Five hundred crime-related words were compiled. No automatic machines were in the initial process, but they were subsequently used to verify the data. Four human coders were used to validate the data and ensure the originality of the semantic understanding of the Malay text. Finally, major crime terminologies were outlined from a set of keywords to serve as taggers in our solution. The ultimate goal of this study is to provide a corpus for forensic linguistics, police investigations, and general crime research. This study has established the first corpus of a criminological text in the Malay language.
10

Faya-Cerqueiro, Fátima, and Gema Alcaraz-Mármol. "The Toledo Teacher Trainees corpus (TTT): Bridging the gap between students’ narratives and corpus linguistics." Research in Corpus Linguistics 8 (2020): 147–63. http://dx.doi.org/10.32714/ricl.08.01.10.

Abstract:
In recent decades a few research methods have resorted to L2 learners in order to analyse several aspects aiming at methodological improvements. One of them is corpus linguistics, which has largely contributed to the study of language production from a quantitative perspective. A very different one has been the compilation of perceptions of the L2 learning process using ‘narrative inquiry’ and qualitative methods of analysis. However, scholars have not addressed the combination of both methods. In this proposal we examine their main individual features and offer an interwoven line of research, applying the quantitative approach of corpus linguistics to the genre of language learning narratives. Thus, we present a new corpus of L2 learners’ perceptions and provide detailed information on its structure, compilation and categorisation. The interdisciplinary status of this proposal will enable the exploration of new research possibilities that can ultimately benefit the teaching-learning process.

Dissertations / Theses on the topic "Corpus compilation"

1

Sebolela, Fannie. "The compilation of corpus-based Setswana dictionaries." Thesis, University of Pretoria, 2009. http://hdl.handle.net/2263/30829.

Abstract:
The aim of this thesis is to describe how corpus-based Setswana dictionaries should be compiled. The challenge to the modern Setswana lexicographer is to compile very practical descriptive and user-friendly dictionaries. A detailed evaluation of existing Setswana dictionaries will be performed in terms of the macrostructural and microstructural aspects:
  • Coverage of frequently used words.
  • Effective use of dictionary space.
  • Use of standard dictionary conventions.
  • Choice, ordering and composition of translation equivalent paradigms.
The focus will be on material collection and corpus building. Informants will be used to compile an oral corpus of 100,000 tokens. All ethical requirements such as informed consent requirements (See Appendix 1) will be honoured. Since the text corpus is an organic corpus, thus not a designed corpus aimed at balance and representativeness, the oral corpus will be constructed in the same way i.e. only basic selection criteria:
  • Mother tongue speakers of Setswana.
  • Adults (to be on a par with authors of the written sources in the text corpus). Age: ranging from 20-60 years.
  • Male and female.
Critical analysis of all currently available Setswana dictionaries will be done with special reference to the dictionaries of Brown (1987) (SESD), Snyman, et al. (1990), Matumo (1993) (MSED), Kgasa (1976) (THAND) and Kgasa and Tsonope (1995) (THAN). In all these cases the strategy would be in terms of the theoretical criteria and best practices in terms of a broad theoretical survey of core aspects of dictionary compilation. Finally, the study will be concluded with an analysis of corpus integrity and stability of Setswana corpora based on the model introduced by Prinsloo and De Schryver (2001a).
Thesis (DLitt)--University of Pretoria, 2009.
African Languages
2

Love, Robbie. "The Spoken British National Corpus 2014 : design, compilation and analysis." Thesis, Lancaster University, 2018. http://eprints.lancs.ac.uk/90068/.

Abstract:
The ESRC-funded Centre for Corpus Approaches to Social Science at Lancaster University (CASS) and the English Language Teaching group at Cambridge University Press (CUP) have compiled a new, publicly-accessible corpus of spoken British English from the 2010s, known as the Spoken British National Corpus 2014 (Spoken BNC2014). The 11.5 million-word corpus, gathered solely in informal contexts, is the first freely-accessible corpus of its kind since the spoken component of the original British National Corpus (the Spoken BNC1994), which, despite its age, is still used as a proxy for present-day English in research today. This thesis presents a detailed account of each stage of the Spoken BNC2014’s construction, including its conception, design, transcription, processing and dissemination. It also demonstrates the research potential of the corpus, by presenting a diachronic analysis of ‘bad language’ in spoken British English, comparing the 1990s to the 2010s. The thesis shows how the research team struck a delicate balance between backwards compatibility with the Spoken BNC1994 and optimal practice in the context of compiling a new corpus. Although comparable with its predecessor, the Spoken BNC2014 is shown to represent innovation in approaches to the compilation of spoken corpora. This thesis makes several useful contributions to the linguistic research community. The Spoken BNC2014 itself should be of use to many researchers, educators and students in the corpus linguistics and English language communities and beyond. In addition, the thesis represents an example of good practice with regards to academic collaboration with a commercial stakeholder. Thirdly, although not a ‘user guide’, the methodological discussions and analysis presented in this thesis are intended to help the Spoken BNC2014 to be as useful to as many people, and for as many purposes, as possible.
3

Medjati, Mehdi. "Les animaux dans la compilation de Justinien." Aix-Marseille 3, 2008. http://www.theses.fr/2008AIX32049.

Abstract:
The Compilation of Justinian, a monumental work of reform and systematization of Roman law, contains many fragments relating to animals. Collecting and classifying these passages reveals a genuine animal dimension within a legal work. The animal world thus reconstructed comprises two types of animals: on the one hand, domestic animals, useful because they are closely associated with human activities; on the other hand, wild beasts, whose status appears far more ambivalent. Analysis of the texts prompts two sets of observations. First, the place accorded to animals reflects the utilitarian view the Romans held of them: the animal is an animate object which, to be worthy of interest, must offer a certain usefulness, even a certain productivity. Second, between the early period and the Later Empire, the status of the animal evolved at the same time and to the same degree as Roman law itself. In fact, the softening of customs benefited animals, whose condition improved appreciably.
4

Wilcox, J. "The compilation of Old English homilies in MSS Cambridge, Corpus Christi College, 419 and 421." Thesis, University of Cambridge, 1987. https://www.repository.cam.ac.uk/handle/1810/272589.

Abstract:
The subject of this study is the compilation of an Old English homiliary contained in the companion volumes, Cambridge, Corpus Christi College, 419 and the original portions of 421 (together designated N in this thesis), written at a hitherto unidentified centre in the first half of the eleventh century. The collection comprises twenty-three Old English homilies: seven by Ælfric, six by Wulfstan, and ten of unknown authorship. It is of particular significance as a witness to the use of anonymous homilies in the eleventh century. I provide a commentary on the anonymous homilies, discuss the textual affiliations of the collection as a whole, and investigate its place of origin. A detailed examination of the two manuscripts provides information about the exemplars from which they were copied and the uses to which they were put. I demonstrate that N was a popular collection - it contains corrections and revisions by at least twenty-one different hands - and that it travelled to Exeter at a time when Old English manuscripts were still in use. Eight of the anonymous homilies in N have been edited by A. S. Napier, Wulfstan: Sammlung der ihm zugeschriebenen Homilien (Berlin, 1883), but have never been fully discussed. The ninth has not been adequately edited (it was edited from a single manuscript by A. O. Belfour, Twelfth-Century Homilies in MS Bodley 343, EETS o.s. 133 (London, 1909) as homily VI). I provide an edition from all the surviving manuscripts as an appendix. The unpublished variants of one manuscript of the tenth anonymous homily (edited by Bruno Assmann, Angelsächsische Homilien und Heiligenleben, Bibliothek der angelsächsischen Prosa 3 (Kassel, 1889) as homily XI) are listed in a second appendix. I describe the sources of each anonymous homily and show how the homilist has used those sources. I also establish the textual relationship of all surviving manuscripts of the homilies and show how each homily has developed in the course of transmission.
The textual relations and development of the homilies by Ælfric and Wulfstan are described more briefly. The language of all the homilies is discussed in a separate chapter. As a result of these investigations I demonstrate that N was compiled from eleven different exemplars, some of which had already enjoyed a considerable history by the eleventh century. The collection was compiled to provide basic Christian instruction, which is given added urgency by an insistence on the imminence of judgement. I conclude that it was assembled at a small monastery dominated by Canterbury influences - probably the unknown monastery which the manuscript Cambridge, Trinity College, B.15.34 (containing a collection of Ælfric's homilies) travelled to in the Anglo-Saxon period.
5

Ferracci, Elsa. "Edition critique, traduction et commentaire du traité hippocratique des Prénotions de Cos." Thesis, Paris 4, 2009. http://www.theses.fr/2009PA040266.

Abstract:
The Coan Prognoses is one of the treatises of the Hippocratic Corpus. The work, which is anonymous and can be dated to around the last third of the fourth century BC, takes the form of a collection of 640 propositions, mostly aphoristic in style, more than half of which are drawn from Hippocratic treatises such as Aphorisms, Prognostic, Prorrhetic I, Epidemics, the group Diseases I-II-III, and On Head Wounds. The content of the compilation is devoted exclusively to medical prognosis. The treatise had only a limited afterlife in Antiquity and was transmitted in neither Syriac nor Arabic; no Latin translation is known before that of Calvus in 1525, which marks the return of scholars to the manuscripts transmitting the text. The introduction presents the work (title, dating, presumed readership, relationship to the other works of the Hippocratic Corpus, structure and methods of composition, medical theories, style, history of the treatise, direct and indirect traditions, editions, dialect). The study seeks to bring out the didactic function of the treatise, which accounts for both its general organization and certain formal features of the text. The edition returns systematically to the oldest and most conservative manuscript, Parisinus graecus 2253 (A), and also draws on the indirect tradition (the Hippocratic treatises that are the sources of the work and, principally, Galen's commentaries). The critical text, presented with the Testimonia, is accompanied by a translation, a philological, historical and medical commentary, and appendices.
6

Pourquery, De Boisserin Juliette. "L'énergie chevaleresque : étude de la matière textuelle et iconographique du manuscrit BnF fr.340 (compilation de Rusticien de Pise et Guiron le courtois)." Phd thesis, Université Rennes 2, 2009. http://tel.archives-ouvertes.fr/tel-00458206.

Abstract:
Guiron le courtois, a vast prose romance of chivalry made up of the Roman de Meliadus and the Roman de Guiron, was copied, reworked and compiled in numerous manuscripts from its creation, between 1235 and 1240, until the end of the Middle Ages. Its subject matter, open to every kind of continuation, makes it a work with blurred outlines. The manuscript BnF fr. 340 (late 14th to early 15th century), which combines the Roman de Meliadus with the Compilation of Rusticien de Pise, forms a whole in which the knight-errantry of the fathers of the great Arthurian heroes takes shape, driven by the violence of combat and the abundance of bloodshed, but also by the joy of encounters and of words exchanged. It is this living community that the thesis sets out to analyse, through a joint exploration of the text and images of manuscript BnF fr. 340, whose iconographic analysis is extended to other manuscript programmes of the same period.

Books on the topic "Corpus compilation"

1

Exploring newspaper language: Corpus compilation and research based on the Norwegian newspaper corpus. Amsterdam: John Benjamins Pub. Co., 2012.

2

Stenström, Anna-Brita. Trends in teenage talk: Corpus compilation, analysis, and findings. Amsterdam: J. Benjamins, 2002.

3

Stenström, Anna-Brita. Trends in teenage talk: Corpus compilation, analysis, and findings. Amsterdam: Benjamins, 2003.

4

Justinian's Digest: Character and compilation. Oxford [England]: Oxford University Press, 2010.

5

Baron-Szabo, Rosemarie C. Scleractinian corals of the Cretaceous: A compilation of Cretaceous forms with descriptions, illustrations, and remarks on their taxonomic position ; with 142 plates and 86 text-figures. Knoxville, Tenn: R.C. Baron-Szabo, 2002.

6

Price, Kathleen Marie. The Indus script signary and corpus: A compilation. 2002.

7

Andersen, Gisle, Anna-Brita Stenstrom, and Ingrid Kristine Hasund. Trends in Teenage Talk: Corpus Compilation, Analysis and Findings (Studies in Corpus Linguistics). John Benjamins Publishing Co, 2002.

8

Hary, Benjamin H., and Bet ha-sefer le-madaʻe ha-Yahadut ʻal shem Ḥayim Rozenberg, eds. Corpus linguistics and modern Hebrew: Towards the compilation of the corpus of spoken Israeli Hebrew (CoSIH). [Tel Aviv]: The Chaim Rosenberg School of Jewish Studies, Tel Aviv University, 2003.

9

McClay, John B., and Wendy L. Matthews. Corpus Juris Humorous: A Compilation of Humorous, Extraordinary, Outrageous, Unusual, Colorful, Infamous, Clever and Witty Reported Judicial Opinion. Mac-Mat, 1991.

10

McClay, John B., and Wendy L. Matthews, eds. Corpus Juris Humorous: In Brief: A Compilation of Outrageous, Unusual, Infamous & Witty Judicial Opinions from 1256 A.D. to the Present. Mac-Mat, 1994.


Book chapters on the topic "Corpus compilation"

1

Ädel, Annelie. "Corpus Compilation." In A Practical Handbook of Corpus Linguistics, 3–24. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-46216-1_1.

2

Viana, Vander, and Aisling O’Boyle. "Specialized corpus compilation for EAP." In Corpus Linguistics for English for Academic Purposes, 62–100. London: Routledge, 2021. http://dx.doi.org/10.4324/9781003245988-5.

3

Ribeiro De Mello, Heliana. "Methodological issues for spontaneous speech corpora compilation." In Studies in Corpus Linguistics, 27–68. Amsterdam: John Benjamins Publishing Company, 2014. http://dx.doi.org/10.1075/scl.61.01mel.

4

Gelbukh, Alexander, Grigori Sidorov, and Liliana Chanona-Hernández. "Compilation of a Spanish Representative Corpus." In Computational Linguistics and Intelligent Text Processing, 285–88. Berlin, Heidelberg: Springer Berlin Heidelberg, 2002. http://dx.doi.org/10.1007/3-540-45715-1_27.

5

Hu, Kaibao. "Compilation of Corpora for Translation Studies." In Introducing Corpus-based Translation Studies, 35–83. Berlin, Heidelberg: Springer Berlin Heidelberg, 2016. http://dx.doi.org/10.1007/978-3-662-48218-6_2.

6

Fletcher, William H. "Facilitating the compilation and dissemination of ad-hoc web corpora." In Studies in Corpus Linguistics, 273–300. Amsterdam: John Benjamins Publishing Company, 2004. http://dx.doi.org/10.1075/scl.17.21fle.

7

Hong, Huaqing. "Compilation and exploration of ICCI corpus for learner language research." In Developmental and Crosslinguistic Perspectives in Learner Corpus Research, 47–62. Amsterdam: John Benjamins Publishing Company, 2012. http://dx.doi.org/10.1075/tufs.4.08hon.

8

Ogawa, Yasuhiro, Masayuki Yamada, Ryuta Kato, and Katsuhiko Toyama. "Design and Compilation of Syntactically Tagged Corpus of Japanese Statutory Sentences." In New Frontiers in Artificial Intelligence, 141–52. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-25655-4_13.

9

Solovyev, Valery, and Vladimir Ivanov. "Automated Compilation of a Corpus-Based Dictionary and Computing Concreteness Ratings of Russian." In Speech and Computer, 554–61. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-60276-5_53.

10

"II. Corpus compilation and corpus types." In Corpus Linguistics, edited by Anke Lüdeling and Merja Kytö. Berlin, New York: Mouton de Gruyter, 2008. http://dx.doi.org/10.1515/9783110211429.1.154.


Conference papers on the topic "Corpus compilation"

1

Alansary, Sameh, and Magdy Nagi. "The International Corpus of Arabic: Compilation, Analysis and Evaluation." In Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP). Stroudsburg, PA, USA: Association for Computational Linguistics, 2014. http://dx.doi.org/10.3115/v1/w14-3602.

2

Kegalj, Jana, and Mirjana Borucinsky. "Genre-based approach to corpus compilation for translation research." In 7th International e-Conference on Studies in Humanities and Social Sciences. Center for Open Access in Science, Belgrade, 2021. http://dx.doi.org/10.32591/coas.e-conf.07.22215k.

Abstract:
Translation research focuses mainly on parallel and comparable corpora, whereby it is constantly faced with issues of representativeness, balance and comparability as its main constraints. This research aims to introduce the concept of genre as a way of observing linguistic features under controlled conditions. The study analyses the application of external and internal criteria with particular focus on the genre criterion in selecting texts for the compilation of a highly-specialized bilingual maritime legal corpus, consisting of source texts in English and their translations into Croatian. The main advantages and constraints of genre as a criterion are discussed. The main benefits of such an approach are found in its application in translator training and practice. In addition, genre-based approaches to corpus analysis may raise awareness of generic features specific to a target language, ultimately improving the quality of translation.
3

Weerasooriya, Tharindu, Nandula Perera, and S. R. Liyanage. "A framework for automated corpus compilation for KeyXtract: Twitter model." In 2017 Seventeenth International Conference on Advances in ICT for Emerging Regions (ICTer). IEEE, 2017. http://dx.doi.org/10.1109/icter.2017.8257783.

4

Waclawičová, Martina, Michal Křen, and Lucie Válková. "Balanced corpus of informal spoken Czech: compilation, design and findings." In Interspeech 2009. ISCA, 2009. http://dx.doi.org/10.21437/interspeech.2009-530.

5

de Groc, Clement. "Babouk: Focused Web Crawling for Corpus Compilation and Automatic Terminology Extraction." In 2011 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT). IEEE, 2011. http://dx.doi.org/10.1109/wi-iat.2011.253.

6

Hollenstein, Nora, and Noëmi Aepli. "Compilation of a Swiss German Dialect Corpus and its Application to PoS Tagging." In Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects. Stroudsburg, PA, USA: Association for Computational Linguistics and Dublin City University, 2014. http://dx.doi.org/10.3115/v1/w14-5310.

7

Trofimova, Nadezhda A. "The project of an ideographic dictionary of the Russian special vocabulary of construction." In Lexicography of the digital age. TSU Press, 2021. http://dx.doi.org/10.17223/978-5-907442-19-1-2021-31.

Abstract:
The project of an ideographic dictionary of the Russian construction vocabulary is presented. The main principles of dictionary compilation, the thematic groups of lexical units, and the dictionary entry are presented. Lexical units previously unrecorded in the terminological dictionaries and the National Corpus of the Russian Language are analyzed.
8

Savilova, Svetlana L., and Olga G. Shchitova. "The explanatory etymological dictionary of the latest foreign vocabulary in the modern student sociolect." In Lexicography of the digital age. TSU Press, 2021. http://dx.doi.org/10.17223/978-5-907442-19-1-2021-135.

Abstract:
The project of the explanatory etymological dictionary of the latest foreign vocabulary in the modern student sociolect is presented. Main principles of the dictionary compilation, thematic groups of lexical units, and the structure of the dictionary article are presented. Lexical units previously unrecorded in the terminological dictionaries and the National Corpus of the Russian Language are analyzed.
9

Maekawa, Kikuo. "Compilation of the Balanced Corpus of Contemporary Written Japanese in the KOTONOHA Initiative (Invited Paper)." In 2008 Second International Symposium on Universal Communication (ISUC). IEEE, 2008. http://dx.doi.org/10.1109/isuc.2008.82.

10

Finotti, Vitor, and Bruno Albertini. "An Open-Source Soft-Microcontroller Implementation Using an ARM Cortex-M0 on FPGA." In Workshop em Desempenho de Sistemas Computacionais e de Comunicação. Sociedade Brasileira de Computação - SBC, 2021. http://dx.doi.org/10.5753/wperformance.2021.15726.

Abstract:
There is a myriad of projects that could be deployed on FPGA for architectural exploration. However, open-source platforms are scarce, and one with embedded software and operating system support to the application-specific hardware could not be found in the literature. We present an open-source soft-microcontroller architecture based on an ARM Cortex-M0, adaptable to different amounts of cores or new components, supporting an end-to-end deployment from code compilation using arm-gcc to loading the binary into the HDL memory cores. The proposed design is validated through simulation and implementation on a KC705 development kit, demonstrating busy-wait polling, DMA transfer, and deterministic real-time processing through FreeRTOS.

Reports on the topic "Corpus compilation"

1

King, E. L., A. Normandeau, T. Carson, P. Fraser, C. Staniforth, A. Limoges, B. MacDonald, F. J. Murrillo-Perez, and N. Van Nieuwenhove. Pockmarks, a paleo fluid efflux event, glacial meltwater channels, sponge colonies, and trawling impacts in Emerald Basin, Scotian Shelf: autonomous underwater vehicle surveys, William Kennedy 2022011 cruise report. Natural Resources Canada/CMSS/Information Management, 2022. http://dx.doi.org/10.4095/331174.

Abstract:
A short but productive cruise aboard RV William Kennedy tested various new field equipment near Halifax (port of departure and return) but also in areas that could benefit scientific understanding. The GSC-A Gavia Autonomous Underwater Vehicle equipped with bathymetric, sidescan and sub-bottom profiler was successfully deployed for the first time on Scotian Shelf science targets. It surveyed three small areas: two across known benthic sponge, Vazella (Russian Hat) within a DFO-directed trawling closure area on the SE flank of Sambro Bank, bordering Emerald Basin, and one across known pockmarks, eroded cone-shaped depressions in soft mud due to fluid efflux. The sponge study sites (~150 to 170 m water depth) were known to lie in an area of till (subglacial diamict) exposure at the seabed. The AUV data identified gravel and cobble-rich seabed, registering individual clasts at 35 cm gridded resolution. A subtle variation in seabed texture is recognized in sidescan images, from cobble-rich on ridge crests and flanks, to limited mud-rich sediment in intervening troughs. Correlation between seabed topography and texture with the (previously collected) Vazella distribution along two transects is not straightforward. However, there may be a preference for the sponge in the depressions, some of which have a thin but possibly ephemeral sediment cover. Both sponge study sites depict a hitherto unknown morphology, carved in glacial deposits, consisting of a series of discontinuous ridges interpreted to be generated by erosion in multiple, continuous, meandering and cross-cutting channels. The morphology is identical to glacial Nye, or "N-" channels, cut by sub-glacial meltwater.
However, their scale (10 to 100 times "typical" N-channels) and the unique eroded medium, (till rather than bedrock), presents a rare or unknown size and medium and suggests a continuum in sub-glacial meltwater channels between much larger tunnel valleys, common to the eastward, and the bedrock forms. A comparison is made with coastal Nova Scotia forms in bedrock. The Emerald Basin AUV site, targeting pockmarks was in ~260 to 270 m water depth and imaged eight large and one small pockmark. The main aim was to investigate possible recent or continuous fluid flux activity in light of ocean acidification or greenhouse gas contribution; most accounts to date suggested inactivity. While a lack of common attributes marking activity is confirmed, creep or rotational flank failure is recognized, as is a depletion of buried diffuse methane immediately below the seabed features. Discovery of a second, buried, pockmark horizon, with smaller but more numerous erosive cones and no spatial correlation to the buried diffuse gas or the seabed pockmarks, indicates a paleo-event of fluid or gas efflux; general timing and possible mechanisms are suggested. The basinal survey also registered numerous otter board trawl marks cutting the surficial mud from past fishing activity. The AUV data present a unique dataset for follow-up quantification of the disturbance. Recent realization that this may play a significant role in ocean acidification on a global scale can benefit from such disturbance quantification. The new pole-mounted sub-bottom profiler collected high quality data, enabling correlation of recently recognized till ridges exposed at the seabed as they become buried across the flank and base of the basin. These, along with the Nye channels, will help reconstruct glacial behavior and flow patterns which to date are only vaguely documented.
Several cores provide the potential for stratigraphic dating of key horizons and will augment Holocene environmental history investigations by a Dalhousie University student. In summary, several unique features have been identified, providing sufficient field data for further compilation, analysis and follow-up publications.