Journal articles on the topic 'Corpus-Based Syntax'


Consult the top 50 journal articles for your research on the topic 'Corpus-Based Syntax.'


1

Mikulová, Marie, Eduard Bejček, Veronika Kolářová, and Jarmila Panevová. "Subcategorization of Adverbial Meanings Based on Corpus Data." Journal of Linguistics/Jazykovedný časopis 68, no. 2 (December 1, 2017): 268–77. http://dx.doi.org/10.1515/jazcas-2017-0036.

Abstract:
We introduce a corpus-based description of selected adverbial meanings in Czech sentences. Its basic repertory has a long-standing tradition in both scientific and school grammars. Before the corpus era, researchers had to rely on their own excerption; nowadays, syntax has a vast material basis available in the form of electronic corpora. Using spatial adverbials as a case, we describe the methodology we used to acquire a detailed, comprehensive, well-arranged description of the meanings of adverbials, including a list of formal realizations with examples. Theoretical knowledge stemming from this work will lead to an improvement of the annotation of these meanings in the Prague Dependency Treebanks, which serve as the corpus sources for our research. The Prague Dependency Treebanks include data manually annotated on the layer of deep syntax and thus provide a large number of valuable examples on the basis of which the meanings of adverbials can be defined more accurately and subcategorized more precisely. Both theoretical and practical results will subsequently be used in NLP applications such as machine translation.
2

Kong, Leilei, Zhongyuan Han, Yong Han, and Haoliang Qi. "A Deep Paraphrase Identification Model Interacting Semantics with Syntax." Complexity 2020 (October 30, 2020): 1–14. http://dx.doi.org/10.1155/2020/9757032.

Abstract:
Paraphrase identification is central to many natural language applications. Based on the insight that a successful paraphrase identification model needs to adequately capture the semantics of the language objects as well as their interactions, we present a deep paraphrase identification model interacting semantics with syntax (DPIM-ISS). DPIM-ISS introduces the linguistic features manifested in syntactic features to produce more explicit structures and encodes the semantic representation of a sentence on different syntactic structures by means of interacting semantics with syntax. DPIM-ISS then learns the paraphrase pattern from this representation by exploiting a convolutional neural network with a convolution-pooling structure. Experiments are conducted on the Microsoft Research Paraphrase (MSRP) corpus, the PAN 2010 corpus, and the PAN 2012 corpus for paraphrase plagiarism detection. The experimental results demonstrate that DPIM-ISS outperforms the classical word-matching approaches, the syntax-similarity approaches, the convolutional neural network-based models, and some deep paraphrase identification models.
3

Tetreault, Joel R. "A Corpus-Based Evaluation of Centering and Pronoun Resolution." Computational Linguistics 27, no. 4 (December 2001): 507–20. http://dx.doi.org/10.1162/089120101753342644.

Abstract:
In this paper we compare pronoun resolution algorithms and introduce a centering algorithm (Left-Right Centering) that adheres to the constraints and rules of centering theory and is an alternative to Brennan, Friedman, and Pollard's (1987) algorithm. We then use the Left-Right Centering algorithm to see if two psycholinguistic claims on Cf-list ranking will actually improve pronoun resolution accuracy. Our results from this investigation lead to the development of a new syntax-based ranking of the Cf-list and corpus-based evidence that contradicts the psycholinguistic claims.
4

Duo, Jiecairang, Quecairang Hua, Keyou Huan, and Rangdangzhi Cai. "Transition based neural network dependency parsing of Tibetan." MATEC Web of Conferences 336 (2021): 06018. http://dx.doi.org/10.1051/matecconf/202133606018.

Abstract:
To improve the performance of Tibetan natural language processing applications such as machine translation and sentiment analysis, this article proposes a neural network-based method for dependency parsing of Tibetan. Part of a corpus annotated with Qinghai Normal University’s part-of-speech tag set is converted, through a corresponding mapping, into a corpus annotated with the national standard part-of-speech tag set. At the same time, a Tibetan dependency treebank in CoNLL format is constructed, and a shift-reduce method combined with a neural network is adopted to systematically study and analyze Tibetan dependency syntax. This improves the quality of Tibetan dependency parsing, which reaches an unlabeled attachment score (UAS) of 94.59%.
5

TENUTA, Adriana Maria, Ana Larissa A. M. OLIVEIRA, and Bárbara Malveira ORFANÓ. "How Brazilian learners express modality through verbs and adverbs in their writing: a corpus-based study on n-grams." DELTA: Documentação de Estudos em Lingüística Teórica e Aplicada 31, no. 2 (December 2015): 333–57. http://dx.doi.org/10.1590/0102-445071548936077492.

Abstract:
Based on the view of modality in the theoretical framework of descriptive syntax, this study compared a corpus of learners with a corpus of native speakers of English, aiming to identify different patterns in the expression of modal meanings, particularly through adverbs and modal verbs. The study therefore focused its analysis on n-grams containing modal verbs and adverbs that express modality. This analysis revealed the prevalence of epistemic values in both corpora, and the existence of distinct patterns in the expression of this type of modality. In the non-native corpus, the expression of modality is restricted when compared to the native speakers'. In the corpus of native speakers, there was a prevalence of adverbs with modalizing meanings. In addition, learners tend to use some modal verbs differently. This study may contribute to the emerging field of corpus-linguistic studies as well as to the area of syntax, with possible implications for the teaching of academic writing in English.
6

Mastromattei, Michele, Leonardo Ranaldi, Francesca Fallucchi, and Fabio Massimo Zanzotto. "Syntax and prejudice: ethically-charged biases of a syntax-based hate speech recognizer unveiled." PeerJ Computer Science 8 (February 3, 2022): e859. http://dx.doi.org/10.7717/peerj-cs.859.

Abstract:
Hate speech recognizers (HSRs) can be the panacea for containing hate in social media or can result in the biggest form of prejudice-based censorship, hindering people from expressing their true selves. In this paper, we hypothesize that massive use of syntax can reduce the prejudice effect in HSRs. To explore this hypothesis, we propose Unintended-bias Visualizer based on Kermit modeling (KERM-HATE): a syntax-based HSR, which is endowed with syntax heat parse trees used as a post-hoc explanation of classifications. KERM-HATE significantly outperforms BERT-based, RoBERTa-based and XLNet-based HSRs on standard datasets. Surprisingly, this result is not sufficient. In fact, the post-hoc analysis on novel datasets on recent divisive topics shows that even KERM-HATE carries the prejudice distilled from the initial corpus. Therefore, although tests on standard datasets may show higher performance, syntax alone cannot drive the “attention” of HSRs to ethically-unbiased features.
7

Andrushenko, Olena. "Corpus-based studies of Middle English adverb largely: syntax and information-structure." XLinguae 14, no. 2 (April 2021): 60–75. http://dx.doi.org/10.18355/xl.2021.14.02.05.

Abstract:
The study explores the adverb largely in late Middle English, based on the Corpus of Middle English Prose and Verse, in terms of its functioning as a sentence Focus marker. The article considers the syntactic change of English from a language with V2 tendencies to one with verb-medial order. Such changes disrupt sentence information structure, and new elements arise in the language as ‘therapy.’ The assumption made in this paper is as follows: the word largely, which emerged in English ca. 1200, starts functioning as a focusing adverb around 1400 as a result of the shift in the main word order patterns. Moreover, investigating late Middle English syntactic structure and taking into account different types of foci based on information structure tagging throughout the Corpus, the study found that positional variations of the adverb largely are used as a mechanism for marking a particular type of Focus and are governed by its position in relation to the word it modifies.
8

Schneider, Ulrike. "The syntax of metaphor." Yearbook of the German Cognitive Linguistics Association 9, no. 1 (November 1, 2021): 47–70. http://dx.doi.org/10.1515/gcla-2021-0003.

Abstract:
This paper analyses diachronic changes which result from metaphorical extension. Its aim is to assess whether such semantic shifts may lead to further semantic and syntactic differentiation between the verb senses and whether they can be described as shifts away from or towards prototypical transitivity (cf. Hopper & Thompson 1980). It focusses on changes the verb derail underwent in the 19th and 20th centuries. In a corpus-based analysis, it utilises CART trees and a random forest to determine which syntactic and semantic properties differentiate literal and metaphorical uses of derail. Results reveal a syntactic shift from transitive to intransitive in the older literal construction which hardly affects the younger metaphorical one. This indicates that differentiation can be an epiphenomenon of semantic shifts.
9

Hermawan, Nuri. "Representasi Anies dan Ganjar pada Bursa Calon Presiden Indonesia 2024 dalam Berita Online Okezone.com." Syntax Literate ; Jurnal Ilmiah Indonesia 6, no. 1 (November 18, 2021): 24. http://dx.doi.org/10.36418/syntax-literate.v6i1.4613.

Abstract:
This study aims to reveal how Anies Baswedan and Ganjar Pranowo are represented as figures who are frequently mentioned and who lead several polls as candidates for the 2024 Indonesian presidency. Drawing on language data from news published online by okezone.com, the study applies critical discourse analysis supported by corpus linguistics, that is, Corpus-Assisted Critical Discourse Analysis. It combines corpus-driven and corpus-based approaches to support the selection of data sources, data collection, and the identification of news topics that portray the two strongest candidates in the running for the 2024 presidential election. The corpus linguistic techniques are then used to analyze the compiled corpus in terms of frequency, keywords, clusters, collocations, and concordances. The critical analysis examines the representations of the two figures that appear in, and are deliberately constructed by, okezone.com. These representations of the 2024 presidential candidates are realized through discourse types shaped by the direction in which okezone.com steers public opinion. From the representations of the two frequently mentioned candidates, the study concludes that both are portrayed as figures fit to run in the 2024 presidential election. However, Anies is depicted as a figure who advances calmly, whereas Ganjar is depicted as an ambitious figure who has already mapped out his steps toward the 2024 presidential election.
10

Taylor, Ann. "Treebanks in Historical Syntax." Annual Review of Linguistics 6, no. 1 (January 14, 2020): 195–212. http://dx.doi.org/10.1146/annurev-linguistics-011619-030515.

Abstract:
Over the last 20 years, the development of a wide range of treebanks that track the evolution of languages’ syntactic patterns through time has revolutionized the field of historical syntax. The range of treebanks now available facilitates research into the long histories of many of the major Indo-European languages. Although the field's essentially corpus-based methodology has not changed, the quantity of data now available and the ease and precision with which those data can be extracted have created new opportunities. For example, with a treebank it is possible to extract all examples of surface strings associated only with abstract structures (e.g., relative clauses, extraposition), to investigate predictions made by syntactic analyses, to search for rare constructions, and to extract enough data to support sophisticated statistical analyses. Crucially, treebanks make verification and replicability of results possible.
11

Lowder, Matthew W., and Peter C. Gordon. "Eye-tracking and corpus-based analyses of syntax-semantics interactions in complement coercion." Language, Cognition and Neuroscience 31, no. 7 (May 19, 2016): 921–39. http://dx.doi.org/10.1080/23273798.2016.1183798.

12

TAMAREDO, IVÁN, MELANIE RÖTHLISBERGER, JASON GRAFMILLER, and BENEDIKT HELLER. "Probabilistic indigenization effects at the lexis–syntax interface." English Language and Linguistics 24, no. 2 (June 25, 2019): 413–40. http://dx.doi.org/10.1017/s1360674319000133.

Abstract:
Szmrecsanyi et al. (2016) define probabilistic indigenization as the process whereby probabilistic constraints shape variation patterns in different ways, which eventually leads to more heterogeneity in the constraints governing syntactic variation across different varieties of English. The present study extends our knowledge of the heterogeneity of probabilistic grammars by sketching a corpus-based variationist method for calculating the similarity between varieties thereby drawing inspiration from the comparative sociolinguistics literature. Based on linguistic material from the International Corpus of English, we ascertain the degree of regional variability of five probabilistic constraints on the genitive, dative, particle placement and subject pronoun omission alternations across three varieties of English, namely British, Indian and Singapore English. Our results indicate that, of the four alternations under study, the genitive alternation is the most homogeneous one from a regional perspective, followed – in increasing order of heterogeneity – by subject pronoun omission, dative and particle placement alternations. On the basis of these findings, we evaluate claims in the literature according to which the extent of probabilistic indigenization is proportional to the lexical specificity of the syntactic phenomenon under study, a hypothesis that is borne out by our data.
13

Schäfer, Roland, and Ulrike Sayatz. "Punctuation and syntactic structure in obwohl and weil clauses in nonstandard written German." Written Language and Literacy 19, no. 2 (December 31, 2016): 212–45. http://dx.doi.org/10.1075/wll.19.2.04sch.

Abstract:
In this paper, we analyze written sentences containing the German particles obwohl (“although”) and weil (“because”). In standard written German, these particles embed clauses in verb-last constituent order, which is characteristic of subordinated clauses. In spoken and – as we show – nonstandard written German, they embed clauses in verb-second constituent order, which is characteristic of independent sentences. Our usage-based approach to the syntax–graphemics interface includes a large-scale corpus analysis of the patterns of punctuation in the nonstandard variants that provides clues to the syntactic structure and degree of sentential independence of the nonstandard variants. Our corpus study confirms and refines hypotheses from existing theoretical approaches by clearly showing that writers mark obwohl clauses with verb-second order systematically as independent sentences, whereas weil clauses with verb-second order are much less strongly marked as independent. This work suggests that similar corpus studies could provide deeper insight into the interplay between syntax and graphemics.
14

Farese, Gian Marco. "“Know Your Coffee!” The Cultural Semantics of a Lexico-Syntactic Molecule of English." International Journal of English Linguistics 12, no. 4 (June 5, 2022): 11. http://dx.doi.org/10.5539/ijel.v12n4p11.

Abstract:
This paper presents a cultural semantic analysis of the English syntactic construction ‘know your + noun’, made by combining the analytical principles and methods of ethnosyntax (Wierzbicka, 1988, 2003, 2006a) with those of corpus-based discourse analysis (Baker, 2006; Partington et al., 2004). Three main points are made in the paper: (i) ‘know your n.’ constitutes an indissoluble lexico-syntactic molecule of English expressing its own specific meaning; (ii) this construction is both genre-specific and subject to intralinguistic variation; (iii) this construction is quintessentially Anglo, because it reflects Anglo cultural assumptions about personal autonomy informing certain speech practices in English discourse (Goddard & Wierzbicka, 2004; Wierzbicka, 2006b) and defies easy translation into other languages. The analysis is based on the findings of a corpus search in GLOWBE across varieties of English, complemented by additional data from the web. The results provide a clear picture of the meaning of ‘know your n.’ and of where it is situated within the broad range of know-constructions. Ultimately, the paper emphasises the contribution that corpus-based, empirical discourse analysis can make to the semantics and ethnography of syntax as well as to the study of the interface between syntax, semantics and culture.
15

Levin, Beth, and Grace Song. "Making Sense of Corpus Data." International Journal of Corpus Linguistics 2, no. 1 (January 1, 1997): 23–64. http://dx.doi.org/10.1075/ijcl.2.1.04lev.

Abstract:
This paper demonstrates the essential role of corpus data in the development of a theory that explains and predicts word behavior. We make this point through a case study of verbs of sound, drawing our evidence primarily from the British National Corpus. We begin by considering pretheoretic notions of the verbs of sound as presented in corpus-based dictionaries and then contrast them with the predictions made by a theory of syntax, as represented by Chomsky's Government-Binding framework. We identify and classify the transitive uses of sixteen representative verbs of sound found in the corpus data. Finally, we consider what a linguistic account with both syntactic and lexical semantic components has to offer as an explanation of observed differences in the behavior of the sample verbs.
16

Hajič, Jan, Eva Hajičová, Jiří Mírovský, and Jarmila Panevová. "Linguistically Annotated Corpus as an Invaluable Resource for Advancements in Linguistic Research: A Case Study." Prague Bulletin of Mathematical Linguistics 106, no. 1 (October 1, 2016): 69–124. http://dx.doi.org/10.1515/pralin-2016-0012.

Abstract:
This paper is a case study based on experience in linguistic investigations using annotated monolingual and multilingual text corpora; the “cases” include a description of language phenomena belonging to different layers of the language system: morphology, surface and underlying syntax, and discourse. The analysis is based on a complex annotation of syntax, semantic functions, information structure and discourse relations of the Prague Dependency Treebank, a collection of annotated Czech texts. We want to demonstrate that annotation of a corpus is not a self-contained goal: in order to be consistent, it should be based on some linguistic theory, and, at the same time, it should serve as a test bed for the given linguistic theory in particular and for linguistic research in general.
17

Kononenko, Irina Vital'evna. "Cross-cultural communication - lost in translation: A corpus study (based on the material from the Russian-Polish corpus)." Russian Journal of Linguistics 24, no. 4 (December 15, 2020): 926–44. http://dx.doi.org/10.22363/2687-0088-2020-24-4-926-944.

Abstract:
The article is devoted to cross-cultural communication and its implementation in Polish translations of Russian fiction. Nowadays, both the study of national specifics relating to the worldviews of speakers of different languages, and the analysis of the way those worldviews are reflected in translation, are becoming more relevant. This article aims to study the properties of cross-cultural dialogue, which is mirrored in parallel fictional texts. The research material came from the Russian-Polish corpus. The analysis indicates that nationally specific features can manifest themselves on different levels of the language system - in vocabulary, phraseology, word formation, morphology, and syntax. The translation of sentences which include units representative of the Russian linguistic worldview demonstrates both cross-cultural successes and failures (omission of elements symbolic of Russian culture, their inaccurate interpretation or replacement with items typical of the Polish worldview). The existing printed and electronic dictionaries, as well as online translators, do not fully meet current requirements, including those related to conveying Russian cultural and linguistic senses by means of the Polish language. The practice of translating literary works from Russian into Polish demonstrates the need for further investigation of the worldviews of both nations.
18

Lliteras Poncel, Margarita. "Sobre la formación del corpus de autoridades en la Gramática Española." Historiographia Linguistica 24, no. 1-2 (January 1, 1997): 57–72. http://dx.doi.org/10.1075/hl.24.1-2.06lli.

Abstract:
In the Spanish tradition, descriptive grammars based both factually and methodologically on a corpus gleaned from identified contemporary sources, mostly taken from literature, do not appear until the several editions (¹1831–⁸1847) of the grammar of Vicente Salvá (1786–1849), and later in that of Andrés Bello (¹1847–1860). A small part of Salvá’s corpus does come from medieval and renaissance authors, but these are used only to illustrate diachronic change in Spanish. Salvá’s empirical and descriptive approach, and that of other 19th-century Spanish and Spanish American grammarians that follow him, leads to specialization within the wider field of grammar and, as is shown here, syntax is the area that profits the most, both in depth and in size or extension. There is no precedent for this grammaticographical tradition in the Renaissance, when a literary corpus is used only for those parts of the texts that traditionally dealt with metrics and versification. Renaissance grammarians derived the authority of their texts from the transfer of the rules of Latin grammar into Spanish, not from the language of the literary canon. During the 18th-century Enlightenment, grammars based on a literary corpus begin to appear, but the authors from whose works the corpus is taken are those of a previous (non-contemporary) period. As shown in this article, it is in the 18th century that descriptivism results in an increase in the importance of syntax, although that increment in size is minor by comparison with that which takes place during the 19th century beginning with the works of Salvá.
19

PAOLILLO, JOHN C. "Formalizing formality: an analysis of register variation in Sinhala." Journal of Linguistics 36, no. 2 (July 2000): 215–59. http://dx.doi.org/10.1017/s0022226700008148.

Abstract:
Variation in language on the basis of formality (register variation) is often neglected both in grammatical descriptions and in sociolinguistic analyses. I demonstrate here that in Sinhala, and perhaps in other diglossic languages, register variation in syntax cannot be ignored. In a Head-Driven Phrase Structure Grammar (HPSG) analysis based on a corpus of naturally occurring Sinhala texts, I propose an analysis of register variation in which the syntax of all observed registers is accounted for within a single grammar. I further explain how the approach to register variation developed here can be extended to other types of sociolinguistic variation.
20

Royani, Ida, and Heni Arwida. "Critical Reading for Self-Critical Writing." Syntax Literate ; Jurnal Ilmiah Indonesia 6, no. 2 (December 20, 2021): 1252. http://dx.doi.org/10.36418/syntax-literate.v6i2.5111.

Abstract:
This study explores students’ critical reading strategies and explains how their critical reading relates to their critical writing. It was motivated by students’ lack of confidence in their ability to challenge the arguments and evidence put forward by respected academic authors. The qualitative design, following Gay and Airasian (2012), involved delivering open- and closed-ended questions through Google Forms and analyzing a corpus based on students’ proposal texts. The data were then analyzed in cyclical steps: reading, describing, clarifying and interpreting. First, the data revealed that the critical reading strategies students used most were making connections, contextualizing, making applications, identifying problems, and creating annotations. Students rarely challenged authors’ assumptions, translated ideas into visuals, or evaluated arguments. Secondly, their reading activity also reflected their critical reading; that is, students stated their purpose of writing, defined key terms, and managed references in their work. On this basis, it can be concluded that students’ critical writing relied on superficial argument development and format-based writing, which resulted in shallow writing.
21

Le Normand, Marie-Thérèse. "Productive use of syntactic categories in typical young French children." First Language 39, no. 1 (June 1, 2018): 45–60. http://dx.doi.org/10.1177/0142723718778920.

Abstract:
This corpus study asks whether young children speaking European French build their early syntax around grammatical or lexical words. Specifically, the study examines the relationship of grammatical and lexical words in three types of syntactic structures (determiner–noun, pronoun–verb and subject pronoun–verb). The corpus included 315 samples from children aged 24–48 months, a period of rapid growth in grammatical morphology and syntax. The results of a series of stepwise multiple regression analyses indicate that prepositions and auxiliaries explain the unique variance in determiner–noun combinations, and that determiners and prepositions explain the unique variance in pronoun–verb and subject pronoun–verb combinations, better than lexical categories do. All these strong predictors support the view that grammatical words guide and facilitate syntactic knowledge. Early grammar is based not on a lexicon but on basic grammatical relationships that young children build gradually, making use of the formal distributional properties of their native language.
22

Crible, Ludivine. "The syntax and semantics of coherence relations." International Journal of Corpus Linguistics 27, no. 1 (January 21, 2022): 59–92. http://dx.doi.org/10.1075/ijcl.19109.cri.

Abstract:
This corpus-based study investigates the inter-relation between discourse markers (DMs) and other contextual signals that contribute to the interpretation of coherence relations. The objectives are three-fold: (i) to provide a comprehensive and systematic portrait of the syntax and semantics of a set of coherence relations in English; (ii) to draw a distinction between mere tendencies of co-occurrence and strong predictive signals; (iii) to identify factors that account for the variation of these signals, focusing on relation complexity, DM strength and genre preferences. The methodology combines systematic coding (description) and multivariate statistical modelling (prediction). While the effect of genre and relation complexity was found to be null or moderate, the presence of discourse signals systematically varies with the ambiguity of the DM in the relation: signals co-occur more with ambiguous DMs than with more informative ones.
23

Lee, Ming Che, Jia Wei Chang, and Tung Cheng Hsieh. "A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences." Scientific World Journal 2014 (2014): 1–17. http://dx.doi.org/10.1155/2014/437162.

Abstract:
This paper presents a similarity algorithm for natural language sentences that is based on grammar and a semantic corpus. Natural language, as opposed to “artificial language” such as computer programming languages, is the language used by the general public for daily communication. Traditional information retrieval approaches, such as vector models, LSA, HAL, or even the ontology-based approaches that extend to include concept similarity comparison instead of co-occurrence of terms/words, may not always find the best match when there is no obvious relation or concept overlap between two natural language sentences. This paper proposes a sentence similarity algorithm that takes advantage of a corpus-based ontology and grammatical rules to overcome these problems. Experiments on two well-known benchmarks demonstrate that the proposed algorithm achieves a significant performance improvement on sentences/short texts with arbitrary syntax and structure.
24

Wang, Ai Ling. "On the Statistical Machine Translation Studies." Applied Mechanics and Materials 347-350 (August 2013): 3262–66. http://dx.doi.org/10.4028/www.scientific.net/amm.347-350.3262.

Abstract:
Machine translation (MT) is one of the core applications of natural language processing and an important branch of artificial intelligence research; statistical methods have already become the mainstream of machine translation. This paper presents a comparative analysis of the translation models of statistical natural language processing based on large-scale corpora; discusses word-based, phrase-based and syntax-based machine translation methods in turn; summarizes the evaluation factors of machine translation; and analyzes evaluation methods of machine translation.
25

Szkudlarek-Śmiechowicz, Ewa. "Internet communication and language dynamics and norm." Acta Universitatis Lodziensis. Folia Linguistica 56 (December 21, 2022): 39–52. http://dx.doi.org/10.18778/0208-6077.56.02.

Full text
Abstract:
This article presents research on the language dynamics that arise in, and under the influence of, internet communication. It uses selected syntactic structures that represent two mechanisms contributing to language dynamization: substitution and multiplication. Based on quantitative, diachronic corpus research, the article demonstrates that internet communication, with its observable distortion of the language standard, affects general standard Polish. One example of that impact is the variability of the selected syntactic structures in scientific texts, that is, in texts that represent the highest standards of correctness and undergo multiple rounds of linguistic correction.
APA, Harvard, Vancouver, ISO, and other styles
26

Nkollo, Mikołaj, and Beata Malczewska. "Towards phrasal attachment of European Portuguese proclitics. A corpus-based inquiry into diachronic phonology-syntax interface." Linguistica Copernicana 16 (May 2, 2020): 259. http://dx.doi.org/10.12775/lincop.2019.011.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Lindström, Liina, and Kristel Uiboaed. "Syntactic variation in ‘need’-constructions in Estonian dialects." Nordic Journal of Linguistics 40, no. 3 (November 29, 2017): 313–49. http://dx.doi.org/10.1017/s0332586517000191.

Full text
Abstract:
The article contributes new data and findings to the growing field of corpus-based dialect syntax research. The focus of the paper is on variation in ‘need’-constructions (tarvis/vaja olema + nominal complement/infinitive ‘need to’) based on the corpus of Estonian dialects. Our purpose was to demonstrate the complex nature of syntactic variation, constrained geographically, individually or by language-internal factors. The study takes a corpus-based quantitative approach to observing the geographical spread of linguistic units. We apply conditional inference tree and random forests models to capture the (co)varying parts of the construction studied. Our results show that variation in different parts of constructions is influenced by different factors, both geographical and language-internal. Lexical variation (adverb tarvis ‘need’ or vaja ‘need’) and omission of the copula are clearly geographically distributed, while omission of the experiencer is determined mainly by language-internal factors. However, the study has also found extensive inter-individual differences.
APA, Harvard, Vancouver, ISO, and other styles
28

ALLEMBE, Rodrigue Lézin. "A corpus-based analysis of learners’ spelling and morpho-syntactic writing errors and mistakes. Case study of senior secondary schools in Brazzaville (Congo)." International Journal of Education and Humanities 2, no. 4 (October 29, 2022): 135–46. http://dx.doi.org/10.58557/ijeh.v2i4.103.

Full text
Abstract:
This research aimed to identify and analyze the errors and mistakes that Congolese EFL learners make in their written work with regard to spelling and morpho-syntax. We carried out the investigation in two senior secondary schools, Réconciliation and Kintélé, located in Brazzaville. To obtain reliable results, we collected data from the written papers of grade 3 EFL learners at these two schools. We used the descriptive analytical method and Corder’s error analysis (EA) theory to identify and analyze the types of errors and mistakes made in writing. The results show that EFL learners mostly make interlingual errors (interference between French and English) at the level of selection (morphological and grammatical errors) and of syntax (misordering of words). To enhance EFL learners’ writing skills, we suggest some strategies and techniques that teachers should use during writing instruction.
APA, Harvard, Vancouver, ISO, and other styles
29

Colina, Sonia. "Syntax, Discourse Analysis, and Translation Studies." Babel. Revue internationale de la traduction / International Journal of Translation 43, no. 2 (January 1, 1997): 126–37. http://dx.doi.org/10.1075/babel.43.2.04col.

Full text
Abstract:
The linguistics of the 60s and 70s did not prove to be of much help to translation and translation theory, due to the emphasis placed on languages as formal systems. However, newer directions of linguistics research which focus on the communicative function of language, such as text linguistics, discourse analysis, and pragmatics, have much to offer to translation studies. This paper shows how discourse analysis can be applied to translation and highlights some of the benefits of knowledge of linguistics and discourse analysis for the translation teacher, the student and the professional translator. In addition, it joins recent literature on translation studies and linguistics (House and Blum-Kulka 1986; Hatim and Mason 1990; Neubert and Shreve 1992; Baker 1992) in calling for a more influential role of linguistics in translation studies and translation theory. Working within discourse analysis and, in particular, syntax in discourse, i.e. discourse functions of syntactic constructions, the present study examines the discourse functions of the passive in Spanish and in English. The paper first presents a contrastive description of the textual functions of the passive in English and in Spanish based on a corpus of original texts in both languages. Then a discourse-based explanation for the differences is provided. Finally, the author examines the solutions found in translation as well as the analysis' efficiency in predicting and/or explaining such solutions.
APA, Harvard, Vancouver, ISO, and other styles
30

Qing-dao-er-ji, Ren, Kun Cheng, and Rui Pang. "Research on Traditional Mongolian-Chinese Neural Machine Translation Based on Dependency Syntactic Information and Transformer Model." Applied Sciences 12, no. 19 (October 7, 2022): 10074. http://dx.doi.org/10.3390/app121910074.

Full text
Abstract:
Neural machine translation (NMT) is a data-driven machine translation approach that has proven its superiority on large corpora, but it still has much room for improvement when corpus resources are scarce. This work aims to improve the translation quality of Traditional Mongolian-Chinese (MN-CH). First, the baseline model is constructed based on the Transformer model, and then two different syntax-assisted learning units are added to the encoder and decoder. The encoder’s ability to learn Traditional Mongolian syntax is implicitly strengthened, and knowledge of Chinese dependency syntax is taken as prior knowledge to explicitly guide the decoder to learn Chinese syntax. The average BLEU values measured under two experimental conditions showed that the proposed improved model gained 6.706 (45.141 vs. 38.435) and 5.409 (41.930 vs. 36.521) over the baseline model. The analysis of the experimental results also revealed that the proposed improved model was still deficient in learning Chinese syntax, so the Primer-EZ method was introduced to ameliorate this problem, leading to faster convergence and better translation quality. The final improved model had an average BLEU value increase of 9.113 (45.634 vs. 36.521) over the baseline model under the experimental conditions of N = 5 and epochs = 35. The experiments showed that both the proposed model architecture and the prior knowledge could effectively increase the BLEU value, and that the addition of syntax-assisted learning units not only corrected the initial association but also alleviated the long-term dependence between words.
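The BLEU gains quoted in this abstract are simple differences between the improved and baseline scores, and a short check confirms that the reported figures are internally consistent. The row labels below are placeholders, since the abstract does not name the two experimental conditions; only the numbers come from the abstract.

```python
from decimal import Decimal

# (label, improved BLEU, baseline BLEU, gain as reported in the abstract)
# Labels are hypothetical; the abstract does not name the conditions.
reported = [
    ("condition A", "45.141", "38.435", "6.706"),
    ("condition B", "41.930", "36.521", "5.409"),
    ("final model, N = 5, epochs = 35", "45.634", "36.521", "9.113"),
]

# Decimal avoids binary floating-point rounding in the subtraction.
gains = [Decimal(improved) - Decimal(baseline) for _, improved, baseline, _ in reported]
consistent = [gain == Decimal(stated) for gain, (_, _, _, stated) in zip(gains, reported)]

for (label, _, _, stated), gain, ok in zip(reported, gains, consistent):
    print(f"{label}: computed {gain}, reported {stated}, match={ok}")
```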
APA, Harvard, Vancouver, ISO, and other styles
31

Góra, Katarzyna. "Predicate-Argument Structure in a Valence Dictionary (on the Example of the Verb Reward)." Acta Neophilologica 1, no. XXIII (June 1, 2021): 101–22. http://dx.doi.org/10.31648/an.6414.

Full text
Abstract:
Valence dictionaries are very often specialized works for advanced readers which present how particular linguistic units combine with their subordinates. The article is a critical analysis of the dictionary entry for the lexical unit reward contained in A Valency Dictionary of English, a Corpus-Based Analysis of the Complementation Patterns of English Verbs, Nouns and Adjectives [2004]. A complementary proposal regarding the predicate-argument structure and its annotation system is provided, based on the theoretical model proposed by S. Karolak [1984; 2002] called Semantic Syntax (SS) and, more specifically, on its extended model called explicative syntax [Kiklewicz et al. 2010; 2019]. The research findings demonstrate the need for coordinated international projects that integrate both the syntactic and the semantic levels in order to gradually meet the objective of an integrated language description encompassing both the grammar and the lexicon.
APA, Harvard, Vancouver, ISO, and other styles
32

Wang, Meng, and Fanghui Hu. "The Application of NLTK Library for Python Natural Language Processing in Corpus Research." Theory and Practice in Language Studies 11, no. 9 (September 1, 2021): 1041–49. http://dx.doi.org/10.17507/tpls.1109.09.

Full text
Abstract:
Corpora play an important role in linguistics research and foreign language teaching. At present, corpus research in China mainly uses WordSmith, AntConc and other retrieval tools. The NLTK library, which is based on the Python language, provides more flexible and richer research methods, and it uses unified data standards that avoid the trouble of converting between data types. At the same time, with the help of Python’s numerous third-party libraries, it can make up for the shortcomings of other tools in syntax analysis, graphic rendering, regular expression retrieval and other aspects. Covering the main steps in corpus research, such as text cleaning, lemmatization, part-of-speech tagging and text retrieval statistics, this paper takes the US presidential inaugural addresses in the corpus as an example to show how to use this tool to process language data, and introduces the application of the Python NLTK library in corpus research.
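Two of the steps named in this abstract, text cleaning and text retrieval statistics, can be sketched in a few lines. The sketch below deliberately uses only the Python standard library rather than NLTK itself, so it runs without downloading corpus data; the sample text and the helper names `tokenize` and `kwic` are invented for illustration and are not taken from the paper.

```python
import re
from collections import Counter

# Invented sample text standing in for a corpus file.
SAMPLE = ("We the people ask for peace. The people want peace, "
          "and the people work for it.")


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only (basic cleaning)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def kwic(tokens, keyword, width=2):
    """Keyword-in-context lines: `width` tokens on each side of every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


tokens = tokenize(SAMPLE)
freq = Counter(tokens)          # word frequency list
hits = kwic(tokens, "peace")    # concordance for one keyword

print(freq.most_common(3))
print(hits)
```

Lemmatization and part-of-speech tagging, the other steps the abstract lists, need trained models or lexical resources and are therefore left out of this stdlib-only sketch.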
APA, Harvard, Vancouver, ISO, and other styles
33

Winer, David, and R. Young. "Automated Screenplay Annotation for Extracting Storytelling Knowledge." Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment 13, no. 2 (June 25, 2021): 273–80. http://dx.doi.org/10.1609/aiide.v13i2.12994.

Full text
Abstract:
Narrative screenplays follow a standardized format for their parts (e.g., stage direction, dialogue, etc.) including short descriptions for what, where, when, and how to film the events in the story (shot headings). We created a grammar based on the syntax of shot headings to extract this and other discourse elements for automatic screenplay annotation. We test our annotator on over a thousand raw screenplays from the IMSDb screenplay corpus and make the output available for narrative intelligence research.
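As a rough illustration of parsing shot headings, the sketch below uses a single regular expression for the common `INT./EXT. LOCATION - TIME` layout. It is an assumption-laden stand-in, not the authors' grammar: the pattern, the field names, and `parse_shot_heading` are hypothetical, and real screenplays contain many variants this pattern does not cover.

```python
import re
from typing import Optional

# Assumed layout: interior/exterior marker, location, optional " - TIME" suffix.
SHOT_HEADING = re.compile(
    r"^(?P<setting>INT\./EXT\.|INT\.|EXT\.)\s+"
    r"(?P<location>.+?)"
    r"(?:\s+-\s+(?P<time>[A-Z][A-Z ]*))?$"
)


def parse_shot_heading(line: str) -> Optional[dict]:
    """Return the heading fields as a dict, or None if the line is not a heading."""
    match = SHOT_HEADING.match(line.strip())
    return match.groupdict() if match else None


night = parse_shot_heading("INT. KITCHEN - NIGHT")
street = parse_shot_heading("EXT. CITY STREET")
dialogue = parse_shot_heading("JOHN walks in.")

print(night)
print(street)
print(dialogue)
```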
APA, Harvard, Vancouver, ISO, and other styles
34

Yan, Jianwei. "Morphology and word order in Slavic languages: Insights from annotated corpora." Voprosy Jazykoznanija, no. 4 (2021): 131. http://dx.doi.org/10.31857/0373-658x.2021.4.131-159.

Full text
Abstract:
Slavic languages are generally assumed to possess rich morphological features with free syntactic word order. Exploring this complexity trade-off can help us better understand the relationship between morphology and syntax within natural languages. However, few quantitative investigations have been carried out into this relationship within Slavic languages. Based on 34 annotated corpora from Universal Dependencies, this paper examines the correlations between morphology and syntax within Slavic languages by applying two metrics of morphological richness and two of word order freedom. Our findings are as follows. First, the quantitative metrics adopted capture the distributions of morphological richness and word order freedom of languages well. Second, the metrics corroborate the correlation between morphological richness and word order freedom. Within Slavic languages, this correlation is moderate and statistically significant: the richer the morphology, the less strict the word order. Third, Slavic languages can be clustered into three subgroups based on classification models. Most importantly, ancient Slavic languages are characterized by richer morphology and more flexible word order than modern ones. Fourth, of two possible confounding factors, corpus size does not greatly affect the results of the metrics, whereas corpus genre plays an important part in the measurements of word order freedom. Specifically, the word order of formal written genres tends to be more rigid than that of informal written and spoken ones. Overall, based on annotated corpora, the results verify the negative correlation between morphological richness and word order rigidity within Slavic languages, which might shed light on the dynamic relations between morphology and syntax of natural languages and provide quantitative instantiations of how languages encode lexical and syntactic information for the purpose of efficient communication.
APA, Harvard, Vancouver, ISO, and other styles
35

Higgins, Derrick, and Jerrold M. Sadock. "A Machine Learning Approach to Modeling Scope Preferences." Computational Linguistics 29, no. 1 (March 2003): 73–96. http://dx.doi.org/10.1162/089120103321337449.

Full text
Abstract:
This article describes a corpus-based investigation of quantifier scope preferences. Following recent work on multimodular grammar frameworks in theoretical linguistics and a long history of combining multiple information sources in natural language processing, scope is treated as a distinct module of grammar from syntax. This module incorporates multiple sources of evidence regarding the most likely scope reading for a sentence and is entirely data-driven. The experiments discussed in this article evaluate the performance of our models in predicting the most likely scope reading for a particular sentence, using Penn Treebank data both with and without syntactic annotation. We wish to focus attention on the issue of determining scope preferences, which has largely been ignored in theoretical linguistics, and to explore different models of the interaction between syntax and quantifier scope.
APA, Harvard, Vancouver, ISO, and other styles
36

Lafontaine, Fanny. "Proposal to use the study corpus for contemporary French in Didactics of French as a Foreign Language." Journal of Linguistics/Jazykovedný casopis 73, no. 1 (June 1, 2022): 951–66. http://dx.doi.org/10.2478/jazcas-2022-0019.

Full text
Abstract:
The ORFÉO platform (Tools and Research on Written and Oral French) has been making available to users since 2018 a Study Corpus for sampled Contemporary French as well as operating tools. Although this resource is intended for an audience of researchers and students in the fields of linguistics and automatic language processing, we endeavor in this article to report on the didactic potential that it offers within the framework of a Licensing Syntax course treating “subordination” and intended for Czech and Slovak students at levels B1 to C1 in French. We propose a didactic sequence composed of four activities and pursuing three objectives: consolidation of the mastery of the basic functions of dont («which») from a corpus of friendly conversations, the use of simple query interface tools, and the introduction of certain principles of corpus sociolinguistics. The corpus-based approach, by confronting learners with authentic contextualized data, helps to redefine the teaching-learning priorities of a language by giving primacy not to respect for grammatical norms but to genre norms.
APA, Harvard, Vancouver, ISO, and other styles
37

Lin, Jingxia, and Jeeyoung Peck. "The syntax–semantics interface of multi-morpheme motion constructions in Chinese." Studies in Language 35, no. 2 (September 30, 2011): 337–79. http://dx.doi.org/10.1075/sl.35.2.04lin.

Full text
Abstract:
This study analyzes semantic constraints affecting the order of motion morphemes in Mandarin Chinese multi-morpheme motion constructions (MMMCs, e.g. zǒu-jìn fángjiān ‘walk into the room’ (lit.) ‘walk-enter room’ vs. *jìn-zǒu (lit.) ‘enter-walk’). We classify Chinese motion morphemes into four types based on recent study on “scale structure”. Then, we propose an implicational scalar hierarchy formed by the four types of morphemes that can be used to predict the order of motion morphemes in Chinese MMMCs. Our corpus studies demonstrate that the hierarchy can explain the morpheme order of MMMCs for a comprehensive range of existing natural Chinese data. We anticipate that our scalar hierarchy may be extensible to serial-verb motion constructions in other languages as well.
APA, Harvard, Vancouver, ISO, and other styles
38

Rodríguez-Juárez, Carolina, and Francisco J. Cortés-Rodríguez. "A Role and Reference Grammar Account of Adjuncts in the Airbus Corpus: A Quantitative-Based Study." Atlantis. Journal of the Spanish Association for Anglo-American Studies 44, no. 2 (December 23, 2022): 20–44. http://dx.doi.org/10.28914/atlantis-2022-44.2.02.

Full text
Abstract:
This paper presents the results of a quantitative study of adjuncts in the Airbus corpus carried out within the theoretical framework of Role and Reference Grammar (RRG). We describe the positional behaviour of these peripheral constituents in the Layered Structure of the Clause and postulate scales of positional and peripheral preferences, based on frequency distribution, in the Airbus controlled natural language (CNL). The results obtained were compared with a previous study on adjunct preferences and positions in Natural English to check for changes in these scales due to the nature of the texts written in this CNL. We also aim to contribute to the development of the RRG analysis of adverbials by offering a detailed semantic typology and a description of the syntax of these peripheral constituents grounded in empirical and quantitatively based data that will serve as a basis for the parsing of adverbials in the computational processing of CNLs.
APA, Harvard, Vancouver, ISO, and other styles
39

Wang, Juan, and Feng Gu. "An Automatic Error Correction Method for English Composition Grammar Based on Multilayer Perceptron." Mathematical Problems in Engineering 2022 (June 16, 2022): 1–7. http://dx.doi.org/10.1155/2022/6070445.

Full text
Abstract:
To improve the timeliness and recall rate of English grammar error correction, this paper proposes an automatic error correction method for English composition grammar based on a multilayer perceptron. After preprocessing the English composition corpus data, we extract the grammatical features in the corpus and construct a grammatical feature set. We take the feature set as the input of the multilayer perceptron and perform feature classification through network learning and training. The grammatical error items in the English composition are detected according to similarity, and the error correction is completed by setting the penalty parameter and reducing the deviation parameter. The experimental results show that the syntax error detection time of this method is less than 6 minutes, the recall rate is higher than 90%, and the detection error rate is lower than 6%. The method improves both the timeliness and the efficiency of grammatical error correction.
APA, Harvard, Vancouver, ISO, and other styles
40

Léveillé Gauvin, Hubert. "“The Times They Were A-Changin’”: A Database-Driven Approach to the Evolution of Musical Syntax in Popular Music from the 1960s." Empirical Musicology Review 10, no. 3 (April 9, 2015): 215. http://dx.doi.org/10.18061/emr.v10i3.4467.

Full text
Abstract:
The goal of this research is to investigate the pitch structures of popular music in the 1960s through a large corpus study in order to identify any consistent changes in harmonic and tonal syntax. More specifically, two studies based on the Billboard DataSet (Burgoyne, Wild & Fujinaga, 2011; Burgoyne, 2011), a new corpus presenting transcriptions for more than 700 songs, are presented. The first study looks at the incidence of multi-tonic songs throughout the decade, while the second study focuses on the incidence of flat-side harmonies (e.g. bIII, bVI, and bVII) over the same period of time. While no difference was observed in the frequency of multi-tonic songs, the study showed a significant increase in the incidence of flat-side harmonies during the second half of the decade.
APA, Harvard, Vancouver, ISO, and other styles
41

Hoffmann, Roland. "The “Kühner-Stegmann” of 1914 and the “Oxford Latin Syntax” of 2015 and 2021: Comparing Two Examples of a Latin syntax of Different Times and Different Approaches." Philologia Classica 16, no. 1 (2021): 138–57. http://dx.doi.org/10.21638/spbu20.2021.112.

Full text
Abstract:
Comparing the grammars of so-called dead languages is particularly fascinating. The article compares two syntactic accounts from completely different time periods — a traditional and still well-known one from the 19th century and a modern one that has currently been published: the syntactic part of the “Ausführliche Grammatik der Lateinischen Sprache” by Raphael Kühner from 1878 and 1879, and the “Oxford Latin Syntax” by Harm Pinkster from 2015 and 2021. First, the general concepts are introduced: Karl Ferdinand Becker’s theory of the three syntactic relations (2.1), S. H. A. Herling’s theory of the equivalence of sentence parts and subordinate clauses (2.2), as well as the modern functional approaches of the theory of the simple clause (3.1) and the complex sentence (3.2). Six differences between the two Latin syntactic concepts are discussed. Three common aspects are taken into consideration, namely the corpus-based approach, the restriction to a single language, and the purely descriptive method as opposed to a normative or more formal approach. Among the results, it is concluded that both grammatical systems of the Latin syntax are legitimate and both should be used to address questions concerning Latin syntax.
APA, Harvard, Vancouver, ISO, and other styles
42

Wu, Jing, Hongxu Hou, Feilong Bao, and Yupeng Jiang. "Template-Based Model for Mongolian-Chinese Machine Translation." Journal of Advanced Computational Intelligence and Intelligent Informatics 20, no. 6 (November 20, 2016): 893–901. http://dx.doi.org/10.20965/jaciii.2016.p0893.

Full text
Abstract:
Mongolian-Chinese statistical machine translation (SMT) systems are limited by the complex Mongolian morphology, the scarcity of parallel corpus resources, and the significant syntactic differences between the two languages. To address these problems, we propose a template-based machine translation (TBMT) system and combine it with the SMT system to achieve better translation performance. The TBMT model we propose includes a template extraction model and a template translation model. In the template extraction model, we present a novel method of aligning and abstracting static words from a bilingual parallel corpus to extract templates automatically. In the template translation model, our specially designed method of filtering out low-quality matches enhances the translation performance. Moreover, we apply lemmatization and Latinization to address data sparsity and to perform fuzzy matching. Experimentally, the coverage of the TBMT system is over 50%. The combined SMT system translates all the other uncovered source sentences. The TBMT system outperforms the baselines of phrase-based and hierarchical phrase-based SMT systems by +3.08 and +1.40 BLEU points. The combined system of TBMT and SMT also outperforms the baselines, by +2.49 and +0.81 BLEU points.
APA, Harvard, Vancouver, ISO, and other styles
43

Pérez-Guerra, Javier. "A corpus-based account of the placement of dependents in phrasal categories: On syntax and performance in English." Linguistic Research 32, no. 3 (December 2015): 503–32. http://dx.doi.org/10.17250/khisli.32.3.201512.001.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Broekhuis, Hans. "Why I will not become a corpus linguist: The use of introspection data and corpus data in synchronic syntactic research." Nederlandse Taalkunde 25, no. 2 (October 1, 2020): 181–92. http://dx.doi.org/10.5117/nedtaa2020.2-3.003.broe.

Full text
Abstract:
This article discusses the intuitionist approach to the study of syntax: the study of the internal structure of phrases/sentences with the help of data obtained by introspection. Some critics claim that this approach is inadequate (if not obsolete) on the assumption that introspection data are not empirical data and are therefore inherently inferior to corpus data based on ‘real language’. Fortunately, not all linguists who promote the use of corpora are of this opinion: Odijk (2020), for instance, stresses that data from corpora and data collected in artificial experimental settings (including introspection) should all be considered empirical data: all relevant evidence should be taken into account, and no form or source of evidence has a privileged status. Although I agree with this statement in principle, this article will argue that there are reasons for assuming that introspection research is a better method for collecting synchronic syntactic data than corpus research.
APA, Harvard, Vancouver, ISO, and other styles
45

Martins, Ana Maria. "Autorreferência generalizadora com a pessoa, uma pessoa e outros nomes humanos referencialmente vagos." Linguística: Revista de Estudos Linguísticos da Universidade do Porto 1 (2022): 109–42. http://dx.doi.org/10.21747/16466195/ling2022v1a5.

Full text
Abstract:
This article analyzes data extracted from the Syntax-oriented Corpus of Portuguese Dialects (CORDIAL-SIN) and Corpus Africa with the aim of demonstrating that the arbitrary/indefinite expressions uma pessoa and a pessoa convey first-person-based genericity, that is, what Moltmann (2010) calls ‘generalizing detached self-reference’. Differences in contextual adequacy between uma pessoa and a pessoa are then discussed using intuitive data. It is suggested that such distributional differences are a consequence of the greater interpretative flexibility of a pessoa (which allows an inclusive or exclusive reading) relative to uma pessoa (which only allows an inclusive reading). Finally, the geolinguistic distribution of uma pessoa and a pessoa in the Portuguese territory is compared to the geolinguistic patterns found for other expressions with human general nouns that also seem to convey generalizing detached self-reference (i.e. homem, fulano, fulana, gajo, tipo, indivíduo, mulher).
APA, Harvard, Vancouver, ISO, and other styles
46

Leone, Ljubica. "Phrasal verbs and analogical generalization in Late Modern Spoken English." ICAME Journal 40, no. 1 (March 1, 2016): 39–62. http://dx.doi.org/10.1515/icame-2016-0004.

Full text
Abstract:
The present study describes the development of phrasal verbs (PVs) in Late Modern Spoken English and, specifically, analyses texts taken from the Proceedings of the Old Bailey, a valuable source of spoken language from past time periods. From a diachronic perspective, the emergence of new PVs can be considered strictly linked to the processes of direct formation and analogical generalization, resulting in PVs as they are known in Present Day English (PDE). This study is a corpus-based investigation conducted on the Late Modern English-Old Bailey Corpus (LModE-OBC), a corpus that has been compiled using texts from the Proceedings of the Old Bailey and annotated with the Visual Interactive Syntax Learning interface (VISL). The analysis reveals that, in the time span 1750-1850, this verbal group underwent a gradual process of change, due in part to direct formation and analogical generalization, a process that started in the Early Modern English (EME) period and continued into the Late Modern English (LModE) era.
APA, Harvard, Vancouver, ISO, and other styles
47

Belousova, Anastasia, Juan Sebastián Páramo Rueda, and Paula Ruiz Charris. "Rhythm, Syntax, Punctuation: A Distant Analysis of the European Sonnet." Studia Metrica et Poetica 9, no. 1 (September 1, 2022): 39–65. http://dx.doi.org/10.12697/smp.2022.9.1.03.

Full text
Abstract:
Elaborating on an analysis of a corpus of more than 1200 sonnets by Italian, French, Spanish, English and Russian authors, this article describes the general rhythmic-syntactic arrangement of thirteenth- and fourteenth-century Italian sonnets, European Petrarchist sonnets, and several experiments with this form in the nineteenth and twentieth centuries. It presents results obtained with the help of a computer program developed for the automated analysis of strophic syntax. The program was created using Boris Tomashevsky’s method based on analyzing the punctuation at the end of poetic lines (the strength of the syntactic pause is evaluated depending on the absence or presence of a punctuation sign: i.e., a comma, a dash, a semicolon, or a full stop / question mark / exclamation mark). We supplemented this with two more indices also based on punctuation. The first characterizes the length of sentences (the percentage of sentences in one line, two lines, three lines, etc.), and the second characterizes the number of sentences that end with a full stop, which comes in the middle of a line followed by the beginning of the next sentence in the same line (or, which is the same, the number of such lines). This study demonstrates that both the number of lines with a strong pause in the middle and the number of short sentences have increased over time.
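The punctuation-based indices described in this abstract can be sketched as follows. The abstract only lists the punctuation marks in order (comma, dash, semicolon, sentence-final mark), so the numeric scale 0 to 4, the treatment of unlisted marks as 0, the function names, and the sample lines are all assumptions made for illustration, not the authors' program.

```python
import re

# Assumed ordinal scale: 0 = no punctuation, 4 = full stop / ? / !
PAUSE_STRENGTH = {",": 1, "-": 2, "–": 2, "\u2014": 2, ";": 3, ".": 4, "?": 4, "!": 4}


def line_end_strength(line: str) -> int:
    """Strength of the syntactic pause at the end of a poetic line."""
    stripped = line.rstrip().rstrip("\"'”’)")  # ignore closing quotes and brackets
    if not stripped:
        return 0
    # Marks not listed in the abstract (e.g. a colon) fall back to 0 here.
    return PAUSE_STRENGTH.get(stripped[-1], 0)


def has_midline_sentence_break(line: str) -> bool:
    """True if a sentence ends inside the line and another begins in the same line."""
    return re.search(r"[.?!]\s+\S", line.strip()) is not None


# Invented quatrain used only to exercise the functions.
poem = [
    "The day is done. The night begins,",
    "and every window fills with light;",
    "we walk together, slow and far",
    "until the morning. Then we part.",
]

profile = [line_end_strength(line) for line in poem]
midline_breaks = sum(has_midline_sentence_break(line) for line in poem)

print(profile, midline_breaks)
```

Averaging such profiles position by position over many sonnets would give the kind of rhythmic-syntactic arrangement the abstract compares across periods and languages.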
APA, Harvard, Vancouver, ISO, and other styles
48

Xiang, Dajun, and Chengyu Liu. "The Semantics of MOOD and the Syntax of the Let’s-construction in English: A Corpus-based Cardiff Grammar Approach." Australian Journal of Linguistics 38, no. 4 (October 2, 2018): 549–85. http://dx.doi.org/10.1080/07268602.2018.1510726.

49

Breul, Carsten. "The perfect participle paradox: some implications for the architecture of grammar." English Language and Linguistics 18, no. 3 (October 28, 2014): 449–70. http://dx.doi.org/10.1017/s1360674314000124.

Abstract:
The topic of this article can be exemplified by the final clause of the following attested sentence: I don't know how he found out that she belonged to that lass, but find out he has. Clauses like this one show a preposed verb phrase that is headed by a plain verb whereas the non-preposed verb phrase of their canonical counterparts is obligatorily headed by a perfect participle (i.e. he has {found / *find} out). This peculiarity of verb phrase preposing, which will be referred to as the perfect participle paradox, has seldom been discussed. The article starts by showing that clauses that manifest the paradox are more frequent in the Corpus of Contemporary American English and in the British National Corpus than their non-paradoxical analogues with preposed canonical perfect participles. The article then looks at the paradox from the point of view of generative syntax, discusses and rejects previous analyses, and argues that a solution entails the rejection of two assumptions that have been associated with a lexicalist position, especially by proponents of distributed morphology. These are the assumptions that (a) a syntactic terminal is an item supplied by the lexicon and comprising a phonological representation and (b) that syntax may not manipulate the internal structure of syntactic terminals. The article proposes an analysis that is not based on these assumptions, but argues that the analysis does not entail the superiority of a distributed morphology framework.
50

Kosem, Iztok, and Victoria Nyst. "The corpus-driven revolution in Polish Sign Language: the interview with Dr. Paweł Rutkowski." Slovenščina 2.0: empirical, applied and interdisciplinary research 5, no. 1 (March 7, 2018): 70–90. http://dx.doi.org/10.4312/slo2.0.2017.1.70-90.

Abstract:
Dr. Paweł Rutkowski is head of the Section for Sign Linguistics at the University of Warsaw. He is a general linguist and a specialist in the field of syntax of natural languages, carrying out research on Polish Sign Language (polski język migowy, PJM). He has been awarded a number of prizes, grants and scholarships by such institutions as the Foundation for Polish Science, Polish Ministry of Science and Higher Education, National Science Centre, Poland, Polish–U.S. Fulbright Commission, Kosciuszko Foundation and DAAD. Dr. Rutkowski leads the team developing the Corpus of Polish Sign Language and the Corpus-based Dictionary of Polish Sign Language, the first dictionary of this language prepared in compliance with modern lexicographical standards. The dictionary is an open-access publication, available freely at the following address: www.slownikpjm.uw.edu.pl/en/. This interview took place at eLex 2017, a biennial conference on electronic lexicography, where Dr. Rutkowski was awarded the Adam Kilgarriff Prize and gave a keynote address entitled "Sign language as a challenge to electronic lexicography: The Corpus-based Dictionary of Polish Sign Language and beyond." The interview was conducted by Dr. Victoria Nyst from Leiden University, Faculty of Humanities, and Dr. Iztok Kosem from the University of Ljubljana, Faculty of Arts.