Academic literature on the topic 'Corpus-Based Syntax'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Corpus-Based Syntax.'

Journal articles on the topic "Corpus-Based Syntax"

1

Mikulová, Marie, Eduard Bejček, Veronika Kolářová, and Jarmila Panevová. "Subcategorization of Adverbial Meanings Based on Corpus Data." Journal of Linguistics/Jazykovedný časopis 68, no. 2 (December 1, 2017): 268–77. http://dx.doi.org/10.1515/jazcas-2017-0036.

Abstract:
We introduce a corpus-based description of selected adverbial meanings in Czech sentences. Its basic repertory has a long-standing tradition in both scientific and school grammars. Before the corpus era, however, researchers had to rely on their own excerption; nowadays, syntax research has a vast material basis in the form of available electronic corpora. Using spatial adverbials as a case study, we describe the methodology we used to acquire a detailed, comprehensive, and well-arranged description of adverbial meanings, including a list of formal realizations with examples. The theoretical knowledge stemming from this work will lead to an improved annotation of these meanings in the Prague Dependency Treebanks, which serve as the corpus sources for our research. The Prague Dependency Treebanks include data manually annotated on the deep-syntactic layer and thus provide a large number of valuable examples on the basis of which the meanings of adverbials can be defined more accurately and subcategorized more precisely. Both the theoretical and the practical results will subsequently be used in NLP applications such as machine translation.
2

Kong, Leilei, Zhongyuan Han, Yong Han, and Haoliang Qi. "A Deep Paraphrase Identification Model Interacting Semantics with Syntax." Complexity 2020 (October 30, 2020): 1–14. http://dx.doi.org/10.1155/2020/9757032.

Abstract:
Paraphrase identification is central to many natural language applications. Based on the insight that a successful paraphrase identification model needs to adequately capture the semantics of the language objects as well as their interactions, we present a deep paraphrase identification model interacting semantics with syntax (DPIM-ISS). DPIM-ISS introduces linguistic features in the form of syntactic features to produce more explicit structures, and it encodes the semantic representation of a sentence over different syntactic structures by letting semantics interact with syntax. DPIM-ISS then learns the paraphrase pattern from this representation using a convolutional neural network with a convolution-pooling structure. Experiments are conducted on the Microsoft Research Paraphrase (MSRP) corpus and on the PAN 2010 and PAN 2012 corpora for paraphrase plagiarism detection. The experimental results demonstrate that DPIM-ISS outperforms the classical word-matching approaches, the syntax-similarity approaches, the convolutional neural network-based models, and some deep paraphrase identification models.
3

Tetreault, Joel R. "A Corpus-Based Evaluation of Centering and Pronoun Resolution." Computational Linguistics 27, no. 4 (December 2001): 507–20. http://dx.doi.org/10.1162/089120101753342644.

Abstract:
In this paper we compare pronoun resolution algorithms and introduce a centering algorithm (Left-Right Centering) that adheres to the constraints and rules of centering theory and is an alternative to Brennan, Friedman, and Pollard's (1987) algorithm. We then use the Left-Right Centering algorithm to test whether two psycholinguistic claims about Cf-list ranking actually improve pronoun resolution accuracy. The results of this investigation lead to a new syntax-based ranking of the Cf-list and to corpus-based evidence that contradicts the psycholinguistic claims.
4

Duo, Jiecairang, Quecairang Hua, Keyou Huan, and Rangdangzhi Cai. "Transition based neural network dependency parsing of Tibetan." MATEC Web of Conferences 336 (2021): 06018. http://dx.doi.org/10.1051/matecconf/202133606018.

Abstract:
To improve the performance of Tibetan natural language processing applications such as machine translation and sentiment analysis, this article proposes a neural network-based method for Tibetan dependency parsing. Part of a corpus annotated with Qinghai Normal University’s part-of-speech tag set is converted, via a tag-mapping relationship, into a corpus annotated with the national standard part-of-speech tag set. A Tibetan dependency treebank in CoNLL format is also constructed, and a shift-reduce (transition-based) method combined with a neural network is used to study and analyze Tibetan dependency syntax systematically. This improves the quality of Tibetan dependency parsing, reaching an unlabeled attachment score (UAS) of 94.59%.
5

Tenuta, Adriana Maria, Ana Larissa A. M. Oliveira, and Bárbara Malveira Orfanó. "How Brazilian learners express modality through verbs and adverbs in their writing: a corpus-based study on n-grams." DELTA: Documentação de Estudos em Lingüística Teórica e Aplicada 31, no. 2 (December 2015): 333–57. http://dx.doi.org/10.1590/0102-445071548936077492.

Abstract:
Drawing on the view of modality in the theoretical framework of descriptive syntax, this study compared a learner corpus with a corpus of native speakers of English, aiming to identify different patterns in the expression of modal meanings, particularly through adverbs and modal verbs. The analysis therefore focused on n-grams containing modal verbs and adverbs that express modality. It revealed the prevalence of epistemic values in both corpora and the existence of distinct patterns in the expression of this type of modality. In the non-native corpus, the expression of modality is more restricted than in the native speakers' corpus. In the corpus of native speakers, adverbs with modalizing meanings prevailed. In addition, learners tend to use some modal verbs differently. This study may contribute to the emerging field of corpus linguistics as well as to the area of syntax, with possible implications for the teaching of academic writing in English.
6

Mastromattei, Michele, Leonardo Ranaldi, Francesca Fallucchi, and Fabio Massimo Zanzotto. "Syntax and prejudice: ethically-charged biases of a syntax-based hate speech recognizer unveiled." PeerJ Computer Science 8 (February 3, 2022): e859. http://dx.doi.org/10.7717/peerj-cs.859.

Abstract:
Hate speech recognizers (HSRs) can be the panacea for containing hate in social media, or they can result in the biggest form of prejudice-based censorship, preventing people from expressing their true selves. In this paper, we hypothesize that extensive use of syntax can reduce the prejudice effect in HSRs. To explore this hypothesis, we propose Unintended-bias Visualizer based on Kermit modeling (KERM-HATE): a syntax-based HSR endowed with syntax heat parse trees, which are used as a post-hoc explanation of classifications. KERM-HATE significantly outperforms BERT-based, RoBERTa-based, and XLNet-based HSRs on standard datasets. Surprisingly, this result is not sufficient. In fact, the post-hoc analysis of novel datasets on recent divisive topics shows that even KERM-HATE carries the prejudice distilled from the initial corpus. Therefore, although tests on standard datasets may show higher performance, syntax alone cannot drive the “attention” of HSRs to ethically unbiased features.
7

Andrushenko, Olena. "Corpus-based studies of Middle English adverb largely: syntax and information-structure." XLinguae 14, no. 2 (April 2021): 60–75. http://dx.doi.org/10.18355/xl.2021.14.02.05.

Abstract:
The study aims to explore the adverb largely in late Middle English, based on the Corpus of Middle English Prose and Verse, in terms of its function as a sentence Focus marker. The article considers the syntactic change in English from a language with V2 tendencies to one with verb-medial order. Such changes disrupted sentence information structure, and new elements arose in the language as ‘therapy.’ The paper's assumption is as follows: the word largely, which emerged in English ca. 1200, started functioning as a focusing adverb in 1400 as a result of the shift in the main word order patterns. Moreover, by investigating late Middle English syntactic structure and taking into account different types of foci based on information-structure tagging throughout the Corpus, the study found that positional variants of the adverb largely are used as a mechanism for marking a peculiar type of Focus and are governed by its position relative to the word it modifies.
8

Schneider, Ulrike. "The syntax of metaphor." Yearbook of the German Cognitive Linguistics Association 9, no. 1 (November 1, 2021): 47–70. http://dx.doi.org/10.1515/gcla-2021-0003.

Abstract:
This paper analyses diachronic changes that result from metaphorical extension. Its aim is to assess whether such semantic shifts may lead to further semantic and syntactic differentiation between verb senses and whether they can be described as shifts away from or towards prototypical transitivity (cf. Hopper & Thompson 1980). It focusses on the changes the verb derail underwent in the 19th and 20th centuries. In a corpus-based analysis, it uses CART trees and a random forest to determine which syntactic and semantic properties differentiate literal and metaphorical uses of derail. The results reveal a syntactic shift from transitive to intransitive in the older literal construction, which hardly affects the younger metaphorical one. This indicates that differentiation can be an epiphenomenon of semantic shifts.
9

Hermawan, Nuri. "Representasi Anies dan Ganjar pada Bursa Calon Presiden Indonesia 2024 dalam Berita Online Okezone.com [The Representation of Anies and Ganjar in the 2024 Indonesian Presidential Candidate Race in Okezone.com Online News]." Syntax Literate: Jurnal Ilmiah Indonesia 6, no. 1 (November 18, 2021): 24. http://dx.doi.org/10.36418/syntax-literate.v6i1.4613.

Abstract:
This study aims to reveal how Anies Baswedan and Ganjar Pranowo are represented as figures who are frequently mentioned and who lead several surveys as candidates for the 2024 Indonesian presidency. Using language data from online news on okezone.com, the study applies critical discourse analysis aided by corpus linguistics, that is, Corpus-Assisted Critical Discourse Analysis. It combines corpus-driven and corpus-based approaches to help select data sources, collect data, and identify the news topics that depict the two strongest candidates in the 2024 presidential race (Pilpres 2024). The corpus linguistic techniques used in this study serve to analyse the compiled corpus in terms of frequency, keywords, clusters, collocations, and concordances. The critical analysis examines the representations of the two figures that appear in, and are deliberately constructed by, okezone.com. These representations of the 2024 presidential candidates are realised through types of discourse shaped by the direction in which okezone.com steers opinion. From the representations of the two frequently mentioned candidates, the study concludes that both are fit to run in the 2024 presidential contest. However, Anies is portrayed as a figure advancing along a calm path, whereas Ganjar is portrayed as an ambitious figure who has already mapped out his steps toward the 2024 election.
10

Taylor, Ann. "Treebanks in Historical Syntax." Annual Review of Linguistics 6, no. 1 (January 14, 2020): 195–212. http://dx.doi.org/10.1146/annurev-linguistics-011619-030515.

Abstract:
Over the last 20 years, the development of a wide range of treebanks that track the evolution of languages’ syntactic patterns through time has revolutionized the field of historical syntax. The range of treebanks now available facilitates research into the long histories of many of the major Indo-European languages. Although the field's essentially corpus-based methodology has not changed, the quantity of data now available and the ease and precision with which those data can be extracted have created new opportunities. For example, with a treebank it is possible to extract all examples of surface strings associated only with abstract structures (e.g., relative clauses, extraposition), to investigate predictions made by syntactic analyses, to search for rare constructions, and to extract enough data to support sophisticated statistical analyses. Crucially, treebanks make verification and replicability of results possible.

Dissertations / Theses on the topic "Corpus-Based Syntax"

1

Blöhdorn, Lars M. "Postmodifying attributive adjectives in English: an integrated corpus-based approach." Frankfurt am Main; Berlin; Bern; Bruxelles; New York, NY; Oxford; Wien: Lang, 2008. http://d-nb.info/99131008X/04.

2

Gyawali, Bikash. "Surface Realisation from Knowledge Bases." Thesis, Université de Lorraine, 2016. http://www.theses.fr/2016LORR0004/document.

Abstract:
Natural Language Generation is the task of automatically producing natural language text to describe information present in non-linguistic data. It involves three main subtasks: (i) selecting the relevant portion of input data; (ii) determining the words that will be used to verbalise the selected data; and (iii) mapping these words into natural language text. The latter task is known as Surface Realisation (SR). In my thesis, I study the SR task in the context of input data coming from Knowledge Bases (KB). I present two novel approaches to surface realisation from knowledge bases: a supervised approach and a weakly supervised approach. In the first, supervised approach, I present a corpus-based method for inducing a Feature-Based Lexicalized Tree Adjoining Grammar from a parallel corpus of text and data. I show that the induced grammar is compact and generalises well over the test data, yielding results that are close to those produced by a handcrafted symbolic approach and that outperform an alternative statistical approach. In the weakly supervised approach, I explore a method for surface realisation from KB data that does not require a parallel corpus. Instead, I build a corpus from heterogeneous sources of domain-related text and use it to identify possible lexicalisations of KB symbols and their verbalisation patterns. I evaluate the output sentences and analyse the issues relevant to learning from non-parallel corpora. In both approaches, the proposed methods are generic and can easily be adapted for input from other ontologies for which a parallel or non-parallel corpus exists.
3

Faghiri, Pegah. "La variation de l'ordre des constituants dans le domaine préverbal en persan : approche empirique [Constituent Order Variation in the Persian Preverbal Domain: An Empirical Approach]." Thesis, Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCA161.

Abstract:
This thesis proposes a quantitative study of word order variation in Persian, focusing on the relative order between the direct object (DO) and the indirect object (IO). This relative order plays a crucial role in theoretical analyses of the VP, which, in the absence of quantitative studies, lack solid empirical underpinning. My first goal is to contribute to the study of Persian syntax by providing reliable data with which to evaluate the prevailing hypothesis that there is a dual canonical relative order between the two objects, triggered by Differential Object Marking (DOM). My second goal is to contribute to the ongoing debates on word order preferences in general linguistics and typology by bringing in data from an SOV language with mixed head direction. To this end, I study the effect of factors such as grammatical weight (or relative length), which are claimed to influence linear order across languages. First, the results of our corpus and experimental studies show that the DOM account of the relative order between the DO and the IO is flawed. Based on this conclusion, I also reject the two-object-positions hypothesis and argue for a flat structure of the VP. Second, our data reveal a “long-before-short” preference, which is shown to depend on the effect of salience-enhancing factors such as definiteness, animacy, and grammatical role. I argue that while this preference is, either totally or partially, incompatible with the predictions of processing-oriented, dependency-based models, it can be accounted for by production models that assume that the greater conceptual accessibility of longer constituents favors their early placement in SOV languages.

Books on the topic "Corpus-Based Syntax"

1

Takagaki, Toshihiro (b. 1949), ed. Corpus-based approaches to sentence structures. Amsterdam; Philadelphia: J. Benjamins Pub., 2005.

2

Exploring newspaper language: Corpus compilation and research based on the Norwegian newspaper corpus. Amsterdam: John Benjamins Pub. Co., 2012.

3

Corpus-based studies of lesser-described languages: The CorpAfroAs corpus of spoken AfroAsiatic languages. Amsterdam: John Benjamins Publishing Company, 2015.

4

Noun complementation in English: A corpus-based study of structural types and patterns. Göteborg, Sweden: Göteborg University, Dept. of English, 2005.

5

Pérez-Guerra, Javier. Historical English syntax: A statistical corpus-based study on the organisation of early modern English sentences. München: LINCOM EUROPA, 1999.

6

Basciano, Bianca, Franco Gatti, and Anna Morbiato. Corpus-Based Research on Chinese Language and Linguistics. Venice: Fondazione Università Ca’ Foscari, 2020. http://dx.doi.org/10.30687/978-88-6969-406-6.

Abstract:
This volume collects papers presenting corpus-based research on Chinese language and linguistics, from both a synchronic and a diachronic perspective. The contributions cover different fields of linguistics, including syntax and pragmatics, semantics, morphology and the lexicon, sociolinguistics, and corpus building. There is now considerable emphasis on the reliability of linguistic data: the studies presented here are all grounded in the tenet that corpora, intended as collections of naturally occurring texts produced by a variety of speakers/writers, provide a more robust, statistically significant foundation for linguistic analysis. The volume explores not only the potential of using corpora as tools allowing access to authentic language material, but also the challenges involved in corpus interrogation, analysis, and building.
7

Szmrecsanyi, Benedikt. Grammatical variation in British English dialects: A study in corpus-based dialectometry. Cambridge: Cambridge University Press, 2012.

8

Hofland, Knut, ed. Frequency analysis of English vocabulary and grammar: Based on the LOB corpus. Oxford: Clarendon Press, 1989.

9

Calabrese, Rita. Insights into the lexicon-syntax interface in Italian learners' English: A generative framework for a corpus-based analysis. Roma: Aracne, 2008.

10

Complementation in British and American English: Corpus-based studies on prepositions and complement clauses in British and American English. Lanham, MD: University Press of America, 2005.


Book chapters on the topic "Corpus-Based Syntax"

1

Bemposta-Rivas, Sofia. "A corpus-based study on the development of dare in Middle English and Early Modern English." In Developments in English Historical Morpho-Syntax, 129–48. Amsterdam: John Benjamins Publishing Company, 2019. http://dx.doi.org/10.1075/cilt.346.07riv.

2

Özsoy, A. Sumru. "Argument structure, animacy, syntax and semantics of passivization in Turkish: A corpus-based approach." In Corpus Analysis and Variation in Linguistics, 259–79. Amsterdam: John Benjamins Publishing Company, 2009. http://dx.doi.org/10.1075/tufs.1.16ozs.

3

Horton, Julian. "First-Theme Syntax in Brahms’s Sonata Forms." In Rethinking Brahms, 195ff. New York: Oxford University Press, 2022. http://dx.doi.org/10.1093/oso/9780197541739.003.0011.

Abstract:
The recent burgeoning of interest in the analysis of nineteenth-century sonata forms under the aegis of the so-called New Formenlehre has opened up new avenues of research for scholars of Brahms’s music. Taking its bearings especially from the work of James Webster and William Caplin, this chapter explores one such avenue by applying a systematic, corpus-based methodology to the study of first-theme syntax in Brahms’s first-movement sonata forms. The chapter develops a categorical framework for theorizing Brahms’s thematic syntax, before addressing in detail the compositional choices that characterize Brahms’s paths from thematic initiation to closure.
4

"5 Experiential Constructions in Late Latin and Old Italian: A Corpus-based Investigation into Diachronic Syntax." In Experiential Constructions in Latin, 190–250. BRILL, 2014. http://dx.doi.org/10.1163/9789004257832_006.

5

Wolfe, Sam. "Rethinking Medieval Romance Verb Second." In Rethinking Verb Second, 348–67. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780198844303.003.0015.

Abstract:
This chapter offers a reappraisal of the place of the Medieval Romance languages within the V2 typology, based on novel corpus data. A review of the available primary and secondary sources provides compelling evidence that the Medieval Romance languages considered (French, Occitan, Sicilian, Venetian, and Spanish) were V2 languages, with V-to-C movement and XP-merger in the left periphery. The second half of the chapter focuses in detail on Old Sicilian and Old French, arguing that although the two share certain commonalities, the height of the V2 bottleneck differs, with thirteenth-century French showing a stricter V2 syntax than Old Sicilian. This is linked to French's status as a high V2 language with its locus for V2 on Force, as opposed to Fin, where the constraint operates in Sicilian.
6

Sweany, Erin E. "Dangerous Voices, Erased Bodies." In Feminist Approaches to Early Medieval English Studies. Amsterdam: Amsterdam University Press, 2022. http://dx.doi.org/10.5117/9789463721462_ch09.

Abstract:
Entry 57 of Leechbook III is a brief, ambiguous remedy that has a long history of being interpreted as a reference to witchcraft. This essay assesses how potential female voices and bodies in the Old English medical corpus have been interpreted by modern scholars as agents of harm on the basis of thin philological evidence. Drawing on a combination of lexis, syntax, and context, it interprets a thinly attested Old English illness label and proposes a previously overlooked female patient in Leechbook III. The case of wif gemædla serves as a reminder that scholarship on early medieval English medicine continues to rely heavily on nineteenth-century translations and editions, which have left us with a legacy of outdated editorial and cultural assumptions that now require updating.
7

"Towards a Comprehensive Survey of Register-based Variation in Spanish Syntax." In Corpus Linguistics Beyond the Word, 73–85. Brill | Rodopi, 2007. http://dx.doi.org/10.1163/9789401203845_006.

8

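Auziņa, Ilze, Kristīne Levāne-Petrova, Roberts Darģis, Kristīne Pokratniece, and Inga Kaija. "Latviešu valodas apguvēju korpusa (LaVA) izmantošana pētniecībā un mācību uzdevumu izstrādē [Using the Latvian Language Learner Corpus (LaVA) in Research and in the Development of Learning Exercises]." In Latviešu valodas apguve. XIII Starptautiskais baltistu kongress: rakstu krājums [Latvian Language Acquisition. 13th International Congress of Baltists: Collection of Papers], 142–61. Liepājas Universitāte, 2021. http://dx.doi.org/10.37384/lva.2021.142.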

Abstract:
The Latvian Language Learner Corpus (LaVA), developed at the Institute of Mathematics and Computer Science, University of Latvia, includes more than 1,000 texts written by foreign learners of Latvian in their first or second semester at Latvian higher education institutions, who have reached the A1 (possibly A2) proficiency level. The corpus contains more than 180,000 words. The morphological annotation of the texts has been checked manually, and the learners' errors have been annotated manually. In addition, each text is accompanied by metadata about its author: gender, age, native language, and knowledge of other languages. When analysing the data, this information can be used to determine how the learner's mother tongue and general language skills affect the acquisition of Latvian. Users of the corpus can analyse the data both on the LaVA website (see http://lava.korpuss.lv/search) and in the Sketch Engine tool, where quantitative and qualitative analyses can be performed. The quantitative approach makes it possible to identify tendencies in the use of a word, word form, or construction and to determine how frequently learners make particular mistakes. The objectivity of the research is further ensured by examining the learner data from different perspectives and by repeating the analysis. For example, a statistical analysis of the nouns used in learners' texts shows that declension 4 nouns are used most often, followed in frequency by declension 1, 5, and 2 nouns, while declension 3 and 6 nouns and indeclinable nouns are used very rarely. Qualitative analysis of the empirical data reveals certain features of morphology and word formation, including aspects of syntax. The erroneous use of nouns, verbs, or other parts of speech can be analysed qualitatively in an attempt to understand which rules govern it.
Examples include using non-reflexive verbs instead of reflexive ones, using infinitives instead of finite (person) forms, and using a suffix that does not fit the noun's paradigm. Exercises and tests are generated on the basis of LaVA data analysis, including learner error analysis. The exercises are intended to help learners strengthen their linguistic competence in Latvian, for example in the use of indicative verb forms in both indefinite and perfect tenses. Exercise creation consists of three stages: (1) analysis of LaVA errors and identification of typical errors; (2) collection of sample sentences from various corpora of Latvian, for example LVK2018 and Saeima, containing the word forms and constructions with which learners most often make mistakes in LaVA texts; and (3) generation of different exercises using the selected sample sentences.

Conference papers on the topic "Corpus-Based Syntax"

1

Gibbon, Dafydd. "Corpus-Based Syntax-Prosody Tree Matching." In 8th European Conference on Speech Communication and Technology (Eurospeech 2003). ISCA, 2003. http://dx.doi.org/10.21437/eurospeech.2003-61.

2

Qingzhi, Sun, Du Qingfeng, Zhang Chenxi, and Li Jun. "Chinese News Event Corpus Construction Method Based on Syntax Tree." In ICBDT 2020: 2020 3rd International Conference on Big Data Technologies. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3422713.3422741.

3

Piperski, A. Ch. "Russian Language and Corpus Diversity." In International Conference on Computational Linguistics and Intellectual Technologies "Dialogue". Russian State University for the Humanities, 2020. http://dx.doi.org/10.28995/2075-7182-2020-19-615-627.

Abstract:
This paper discusses the use of the most widely known Russian corpora, namely the Russian National Corpus, ruTenTen, the General Internet Corpus of Russian, and Araneum Russicum Maximum, for the theoretical study of the Russian language. Based on a sample of papers from 2019, I demonstrate that scholars, especially theoretical linguists, tend to ignore the opportunities provided by a wide range of Web corpora, even though these resources are well known to the NLP community. I present a selection of case studies to show that data from "non-classical" corpora can be used to study various linguistic phenomena, such as: 1) variation in morphology and syntax; 2) word formation and lexical change; 3) construction grammar. I also argue that the underuse of non-classical corpora is partly due to their being (perceived as) not very user-friendly.
4

Zimmerling, Anton. "Historical Text Corpora and the Conclusiveness of Linguistic Analysis." In Dialogue. RSUH, 2022. http://dx.doi.org/10.28995/2075-7182-2022-21-586-593.

Abstract:
I discuss the methodology and conclusiveness of corpus-based historical linguistics and analyze two formal models that predict language-internal variation in Early Old Russian syntax. Linguistic models that claim a rigid distribution of grammatical features, such as ± overt realization of agreement markers, activate hidden corpus characteristics such as genre profiles, chronology, the vector of change, ± impact of L2, and ± presence of supra-dialect features. In this case, the models can be evaluated and tested on text samples in which genre features are stable while location and time vary.
5

Kuvshinova, T. "Sentence Compression for Russian: Dataset and Baselines." In International Conference on Computational Linguistics and Intellectual Technologies "Dialogue". Russian State University for the Humanities, 2020. http://dx.doi.org/10.28995/2075-7182-2020-19-517-528.

Abstract:
Sentence compression is the task of removing redundant information from a sentence while preserving its original meaning. In this paper, we address deletion-based sentence compression for Russian. We use data from the plagiarism detection corpus ParaPlag to create a Russian sentence-compression corpus of almost 3,000 sentence pairs. We align source sentences with their compressions using the Needleman-Wunsch algorithm and evaluate the corpus manually for readability and informativeness. We then apply a bidirectional LSTM, a typical baseline for this problem, to sentence compression in Russian. We also experiment with RuBERT and multilingual BERT. For the latter, we use transfer learning, first pretraining the model on English data, which improves performance. We conduct a human evaluation of readability and informativeness and perform an error analysis of the models. We achieve an F-measure of 74.8%, a readability score of 3.88 and an informativeness score of 3.47 (out of 5) on the test data. We also implement a post-hoc syntax-based evaluator that detects some incorrect compressions, increasing the overall quality of the system. We provide the data and baseline results for future studies.
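The alignment step mentioned in this abstract can be illustrated with a minimal word-level Needleman-Wunsch sketch. It assumes whitespace tokenization and a simple scoring scheme (match +1, mismatch -1, gap -1); the abstract does not state the scoring used for ParaPlag, and the helper `deletion_labels` is hypothetical. Aligning a sentence with its deletion-based compression yields a keep/delete label for each source token.

```python
from typing import List, Optional, Sequence, Tuple


def needleman_wunsch(
    source: Sequence[str],
    target: Sequence[str],
    match: int = 1,
    mismatch: int = -1,
    gap: int = -1,
) -> List[Tuple[Optional[str], Optional[str]]]:
    """Globally align two token sequences; None marks a gap.

    Scoring values are illustrative assumptions, not the paper's settings.
    """
    n, m = len(source), len(target)

    def sub(a: str, b: str) -> int:
        return match if a == b else mismatch

    # score[i][j] = best score for aligning source[:i] with target[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(source[i - 1], target[j - 1]),  # align tokens
                score[i - 1][j] + gap,  # source token deleted
                score[i][j - 1] + gap,  # target token inserted
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Tuple[Optional[str], Optional[str]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(source[i - 1], target[j - 1]):
            pairs.append((source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((source[i - 1], None))
            i -= 1
        else:
            pairs.append((None, target[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def deletion_labels(source: Sequence[str], compression: Sequence[str]) -> List[int]:
    """Hypothetical helper: label each source token 1 (kept) or 0 (deleted)."""
    labels = []
    for src_tok, tgt_tok in needleman_wunsch(source, compression):
        if src_tok is None:
            continue  # token present only in the compression; no source label
        # A substitution (mismatch) is also counted as deleting the source token.
        labels.append(1 if src_tok == tgt_tok else 0)
    return labels


print(deletion_labels("the old man quickly walked home".split(), "the man walked home".split()))
# [1, 0, 1, 0, 1, 1]
```

Because a pure deletion-based compression keeps the source word order, every compressed token should align as a match, so the 0/1 labels can serve directly as targets for a token-tagging model such as a BiLSTM.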
6

Liu, Cao, Shizhu He, Kang Liu, and Jun Zhao. "Curriculum Learning for Natural Answer Generation." In Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18). California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/587.

Abstract:
Because they provide responses in natural language, natural answers are preferred in real-world question answering (QA) systems. Generative models learn to generate natural answers automatically from large-scale question-answer pairs (QA-pairs). However, they suffer from the uncontrollable and uneven quality of QA-pairs crawled from the Internet. To address this problem, we propose a curriculum-learning-based framework for natural answer generation (CL-NAG), which can take full advantage of the valuable learning data in a noisy corpus of uneven quality. Specifically, we employ two practical measures to assess the quality (complexity) of QA-pairs automatically. Based on these measurements, CL-NAG first uses simple, low-quality QA-pairs to learn a basic model, and then gradually learns to produce better answers, with richer content and more complete syntax, from more complex, higher-quality QA-pairs. In this way, all valuable information in the noisy corpus of uneven quality can be exploited. Experiments show that CL-NAG outperforms state-of-the-art methods, improving accuracy by 6.8% for simple questions and 8.7% for complex questions.
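The easy-to-hard ordering described in this abstract can be sketched as a generic staged curriculum. This is not CL-NAG's actual schedule: the difficulty function, the number of stages, and the cumulative "baby steps" split are assumptions, and the training step itself is omitted. Each stage would be used to train (or continue training) the model before moving on to the next.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_stages(
    data: Sequence[T], difficulty: Callable[[T], float], num_stages: int
) -> List[List[T]]:
    """Split data into cumulative stages ordered from easiest to hardest.

    Stage k (1-based) holds the easiest ceil(len(data) * k / num_stages) items,
    so the final stage always covers the whole dataset.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(data, key=difficulty)  # easiest first; sort is stable for ties
    n = len(ordered)
    stages = []
    for k in range(1, num_stages + 1):
        cutoff = (n * k + num_stages - 1) // num_stages  # integer ceil(n * k / num_stages)
        stages.append(ordered[:cutoff])
    return stages


# Hypothetical QA-pairs; answer length stands in for a real complexity measure.
qa_pairs = [("q1", "yes"), ("q2", "it was founded in 1900 by two brothers"), ("q3", "in Paris")]
for stage in curriculum_stages(qa_pairs, lambda pair: len(pair[1].split()), 3):
    print([q for q, _ in stage])
# ['q1']
# ['q1', 'q3']
# ['q1', 'q3', 'q2']
```

Keeping earlier (easier) items in later stages is one common design choice that reduces forgetting; a scheduler that moves to disjoint, harder buckets would be an equally valid variant.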