Journal articles on the topic 'Corpus method'

Consult the top 50 journal articles for your research on the topic 'Corpus method.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Waskita, Dana. "CORPUS LINGUISTICS: METHOD, THEORY, AND PRACTICE." Jurnal Sosioteknologi 16, no. 1 (April 29, 2017): 145–47. http://dx.doi.org/10.5614/sostek.itbj.2017.16.1.12.

2

de Bruijn, L. M., A. Hasman, and J. W. Arends. "Automatic SNOMED classification—a corpus-based method." Computer Methods and Programs in Biomedicine 54, no. 1-2 (September 1997): 115–22. http://dx.doi.org/10.1016/s0169-2607(97)00040-0.

3

Xu, Gui-Xian, Chang-Zhi Wang, Li-Hui Wang, Yu-Hong Zhou, Wei-Kang Li, Hao Xu, and Qing Huang. "Semantic classification method for network Tibetan corpus." Cluster Computing 20, no. 1 (January 19, 2017): 155–65. http://dx.doi.org/10.1007/s10586-017-0742-6.

4

Reinert, Max. "Classification Descendante Hiérarchique et Analyse Lexicale par Contexte - Application au Corpus des Poésies d'A. Rimbaud." Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique 13, no. 1 (January 1987): 53–90. http://dx.doi.org/10.1177/075910638701300107.

Abstract:
Hierarchically descending classification and lexical analysis by context: application to the corpus of A. Rimbaud's poetry. Using a lexical analysis by context and a hierarchically descending classification method, the author examines the corpus of Rimbaud's poetry. Two different codings of the corpus with the use of the same method of analysis, plus other methods also, furnish a means of judging the stability of the results obtained. Keywords: lexical analysis, Arthur Rimbaud, methodology, hierarchically descending classification analysis, coding.
5

Chitez, Madalina, and Loredana Bercuci. "Calibrating digital method integration into ESP courses according to disciplinary settings." New Trends and Issues Proceedings on Humanities and Social Sciences 7, no. 1 (July 2, 2020): 20–29. http://dx.doi.org/10.18844/prosoc.v7i1.4862.

Abstract:
This study aims to analyse the effects of a digitally enhanced teaching strategy in an ESP course. The intervention method consists of guided corpus linguistics exercises which are progressively introduced to improve the students’ academic writing. We collected data from various task-based corpus processing, consultation and analysis stages, each one having a different complexity level: compilation of a discipline-specific expert corpus, consultation of a native-speaker English corpus and analyses of both types. The pre- and post-intervention results are quantitatively and qualitatively assessed, controlling for discipline specificity. The results of a corpus consultation satisfaction survey are also included in the analysis. We conclude that corpus consultations not only lead to the improvement of ESP students’ writing but also increase student motivation. The recommendation is to first test the digital methods in ESP courses and calibrate them according to disciplinary settings.
6

Shima, Kazuaki, Takeshi Homma, Masataka Motohashi, Rintaro Ikeshita, Hiroaki Kokubo, Yasunari Obuchi, and Jinhua She. "Efficient Corpus Creation Method for NLU Using Interview with Probing Questions." Journal of Advanced Computational Intelligence and Intelligent Informatics 23, no. 5 (September 20, 2019): 947–55. http://dx.doi.org/10.20965/jaciii.2019.p0947.

Abstract:
This paper presents an efficient method to build a corpus to train natural language understanding (NLU) modules. Conventional corpus creation methods involve a common cycle: a subject is given a specific situation where the subject operates a device by voice, and then the subject speaks one utterance to execute the task. In these methods, many subjects are required in order to build a large-scale corpus, which causes a problem of increasing lead time and financial cost. To solve this problem, we propose to incorporate a “probing question” into the cycle. Specifically, after a subject speaks one utterance, the subject is asked to think of alternative utterances to execute the same task. In this way, we obtain many utterances from a small number of subjects. An evaluation of the proposed method applied to interview-based corpus creation shows that the proposed method reduces the number of subjects by 41% while maintaining morphological diversity in a corpus and morphological coverage for user utterances spoken to commercial devices. It also shows that the proposed method reduces the total time for interviewing subjects by 36% compared with the conventional method. We conclude that the proposed method can be used to build a useful corpus while reducing lead time and financial cost.
7

Zhang, Jin Xi, Hong Zhi Yu, Ning Ma, and Zhao Yao Li. "The Phoneme Automatic Segmentation Algorithms Study of Tibetan Lhasa Words Continuous Speech Stream." Advanced Materials Research 765-767 (September 2013): 2051–54. http://dx.doi.org/10.4028/www.scientific.net/amr.765-767.2051.

Abstract:
In this paper, we adopt two methods for phoneme segmentation of the speech stream when building a Tibetan corpus: the traditional manual segmentation method, and an automatic segmentation method based on a monophone HMM model. Experiments are performed to analyze the accuracy of both segmentation methods. The results show that the automatic segmentation method based on the monophone HMM model helps to shorten the cycle of building a Tibetan corpus, saving considerable time and labor cost, especially when segmenting and labeling a large corpus, and greatly improves the accuracy and consistency of the speech corpus annotation information.
8

HO, CHUKFONG, MASRAH AZRIFAH AZMI MURAD, RABIAH ABDUL KADIR, and SHYAMALA DORAISAMY. "COMPARING TWO CORPUS-BASED METHODS FOR EXTRACTING PARAPHRASES TO DICTIONARY-BASED METHOD." International Journal of Semantic Computing 05, no. 02 (June 2011): 133–78. http://dx.doi.org/10.1142/s1793351x11001225.

Abstract:
Paraphrase extraction plays an increasingly important role in language-related research and applications in areas such as information retrieval, question answering and automatic machine evaluation. Most of the existing methods extract paraphrases from different types of corpora by using syntactic-based approaches. Since a syntactic-based approach relies on the similarity of context to identify and capture paraphrases, other terms which tend to appear in similar contexts, such as loosely related terms and functionally similar yet unrelated terms, tend to be extracted as well. Besides, different types of corpora suffer from different kinds of problems, such as limited availability and domain bias. This paper presents a solely semantic-based paraphrase extraction model. This model collects paraphrases from multiple lexical resources and validates those paraphrases semantically in three ways: by computing domain similarity, definition similarity and word similarity. This model is benchmarked against two outstanding syntactic-based approaches. The experimental results from a manual evaluation show that the proposed model outperforms the benchmarks. The results indicate that a semantic-based approach should be applied in paraphrase extraction instead of a syntactic-based approach. The results further suggest that a hybrid of these two approaches should be applied if one targets strictly precise paraphrases.
9

Dao, Jizhaxi, Zhijie Cai, Rangzhuoma Cai, Maocuo San, and Mabao Ban. "A method of constructing syllable level Tibetan text classification corpus." MATEC Web of Conferences 336 (2021): 06013. http://dx.doi.org/10.1051/matecconf/202133606013.

Abstract:
Corpus serves as an indispensable ingredient for statistical NLP research and real-world applications; therefore, the corpus construction method has a direct impact on various downstream tasks. This paper proposes a method to construct a Tibetan text classification corpus based on a syllable-level processing technique which we refer to as TC_TCCNL. Empirical evidence indicates that the algorithm is able to produce a promising performance, which may lay a starting point for research on Tibetan text classification in the future.
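The syllable-level processing this abstract relies on can be illustrated with a small sketch. Tibetan syllables are delimited by the tsheg mark (U+0F0B); the code below is a minimal illustration of that segmentation step only, not the paper's actual TC_TCCNL pipeline, and the function name is ours.

```python
# Minimal sketch of syllable-level Tibetan segmentation: split on the tsheg
# delimiter (U+0F0B), treating the shad clause marker (U+0F0D) as a boundary.
def tibetan_syllables(text: str) -> list[str]:
    TSHEG = "\u0f0b"  # Tibetan intersyllabic mark
    SHAD = "\u0f0d"   # Tibetan clause terminator
    cleaned = text.replace(SHAD, TSHEG)
    return [s for s in cleaned.split(TSHEG) if s]
```

For example, `tibetan_syllables("བོད་སྐད་")` returns `["བོད", "སྐད"]`.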
10

Wang, Shi, Zhujun Wang, Yi Jiang, and Huayu Wang. "Hierarchical Annotation Event Extraction Method in Multiple Scenarios." Wireless Communications and Mobile Computing 2021 (March 18, 2021): 1–9. http://dx.doi.org/10.1155/2021/8899852.

Abstract:
In the event extraction task, a corpus may contain multiple scenarios, and an argument may play different roles under different triggers; the traditional tagging scheme can only tag each word once and so cannot solve the problem of argument overlap. A hierarchical tagging pipeline model for Chinese corpora based on the pretrained model BERT is proposed, which can obtain the relevant arguments of each event in a hierarchical way. A pipeline structure is selected for the model, and the event extraction task is divided into event trigger classification and argument recognition. First, the pretrained model BERT is used to generate feature vectors, which are passed to a bidirectional gated recurrent unit + conditional random field (BiGRU+CRF) model for trigger classification; then, the marked event-type features are spliced into the corpus as known features and passed into BiGRU+CRF for argument recognition. We evaluated our method on DUEE, combined with data enhancement and mask operations. Experimental results show that our method improves over other baselines, which proves the effectiveness of the model on Chinese corpora.
11

Wang, Shuang. "Research and Application of Network Aided Translation Method." Applied Mechanics and Materials 687-691 (November 2014): 1687–90. http://dx.doi.org/10.4028/www.scientific.net/amm.687-691.1687.

Abstract:
The paper puts forward a new aided-translation pattern, the Bilingual Assisted Translation Search Engine. Unlike traditional machine translation, it does not rely on the computer's automatic translation; instead, the system presents a list of relevant translations, and a person selects the correct one. Compared with automatic machine translation it offers better quality; compared with manual translation it is more efficient. The relevance and accuracy of the translations provided to users depend on a large corpus, so the core of the system is the construction of the bilingual corpus. The paper adopts web data mining and search engine technology to complete the construction of a large-scale corpus automatically.
12

Wolk, Christoph, and Benedikt Szmrecsanyi. "Probabilistic corpus-based dialectometry." Journal of Linguistic Geography 6, no. 1 (April 2018): 56–75. http://dx.doi.org/10.1017/jlg.2018.6.

Abstract:
Researchers in dialectometry have begun to explore measurements based on fundamentally quantitative metrics, often sourced from dialect corpora, as an alternative to the traditional signals derived from dialect atlases. This change of data type amplifies an existing issue in the classical paradigm, namely that locations may vary in coverage and that this affects the distance measurements: pairs involving a location with lower coverage suffer from greater noise and therefore imprecision. We propose a method for increasing robustness using generalized additive modeling, a statistical technique that allows leveraging the spatial arrangement of the data. The technique is applied to data from the British English dialect corpus FRED; the results are evaluated regarding their interpretability and according to several quantitative metrics. We conclude that data availability is an influential covariate in corpus-based dialectometry and beyond, and recommend that researchers be aware of this issue and of methods to alleviate it.
13

Xu, Wenbin, and Chengbo Yin. "Adaptive Language Processing Based on Deep Learning in Cloud Computing Platform." Complexity 2020 (June 19, 2020): 1–11. http://dx.doi.org/10.1155/2020/5828130.

Abstract:
With the continuous advancement of technology, the amount of information and knowledge disseminated on the Internet every day has been growing several times over. At the same time, a large amount of bilingual data is produced in the real world. These data are undoubtedly a great asset for statistical machine translation research. For quality screening of sentence pairs in the corpus, two screening strategies are first proposed: one based on the sentence-pair length ratio and one based on word-alignment information. The innovation of these two methods is that no additional linguistic resources, such as bilingual dictionaries or syntactic analyzers, are needed as auxiliaries. No manual intervention is required; poor-quality sentence pairs can be selected out automatically, and the approach can be applied to any language pair. Secondly, a domain-adaptive method based on a massive corpus is proposed. It uses the massive-corpus mechanism to carry out automatic model migration across multiple domains: each domain learns its intra-domain model independently, and different domains share the same general model. Through the massive-corpus method, these models can be combined and adjusted to make model learning more accurate. Finally, the adaptive method of massive-corpus filtering and statistical machine translation based on a cloud platform is verified. Experiments show that both methods have good effects and can effectively improve the translation quality of statistical machine translation.
14

Dash, Niladri Sekhar, Kesavan Vadakalur Elumalai, Mufleh Salem M. Alqahtani, and May Abdulaziz Abumelha. "Proposing a Customised Method for Extratextual Documentative Annotation on Written Text Corpus." International Journal of English Linguistics 9, no. 2 (February 24, 2019): 99. http://dx.doi.org/10.5539/ijel.v9n2p99.

Abstract:
In this paper, we have made an attempt to portray a perceivable sketch of extratextual documentative annotation which, in the present frame of text annotation, is considered as one of the indispensable processes through which we can add representational information to the texts included in a written corpus. This becomes more important when a corpus is made with a large number of texts obtained from different genres and text types. To develop a workable frame for extratextual annotation, at each stage, we have broadly classified the existing processes of corpus annotation into two broad types. Moreover, we have tried to explain different layers that are embedded with extratextual annotation of texts as well as marked out the applications which can substantially enhance the accessibility of language data from a corpus for the works of text file management, information retrieval, lexical items extraction, and language processing. The techniques that we have proposed and described in this paper are unique in the sense that these are highly useful for expanding the utility of data of a written text corpus beyond the immediate horizons of language processing to the realms of theoretical, descriptive, and applied linguistics. In this paper, we have also argued that we should try to annotate all kinds of written text corpora so far developed in different natural languages at the extratextual level in a uniform manner so that the text samples stored in corpora can be uniformly used for various works of descriptive linguistics, theoretical linguistics, language technology, and applied linguistics including grammar writing, dictionary compilation, and language teaching. The annotation scheme proposed here is applied on a sample Bangla text corpus and we have noted that the accessibility of data and information from this kind of corpus is far easier than that of an un-annotated raw corpus.
15

Bloothooft, Gerrit. "Corpus-based Name Standardization." History and Computing 6, no. 3 (October 1994): 153–67. http://dx.doi.org/10.3366/hac.1994.6.3.153.

Abstract:
A method is described to standardize nominal data on the basis of a combination of rules and a probabilistic similarity measure. Onomastic corpora are used to estimate the probability of spelling variations automatically. These corpora are also the basis for finding the most likely standard for a name not encountered before.
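The combination of a similarity measure with corpus-derived probabilities can be sketched roughly as follows; the scoring formula, weighting constant and function name are our own illustrative assumptions, not Bloothooft's actual measure.

```python
# Rough sketch: standardize a name variant by combining string similarity with
# a frequency prior estimated from an onomastic corpus (an assumed formula).
from difflib import SequenceMatcher

def standardize(variant: str, standards: dict[str, int]) -> str:
    """standards maps each canonical name to its corpus frequency."""
    def score(std: str) -> float:
        sim = SequenceMatcher(None, variant.lower(), std.lower()).ratio()
        freq = standards[std]
        return sim * (1 + freq / (freq + 100))  # ad hoc frequency weighting
    return max(standards, key=score)
```

For instance, `standardize("Jansz", {"Jansen": 500, "Janssen": 300, "Johnson": 50})` picks "Jansen", the standard that is both similar and frequent.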
16

Johnson, Mark. "The DOP Estimation Method Is Biased and Inconsistent." Computational Linguistics 28, no. 1 (March 2002): 71–76. http://dx.doi.org/10.1162/089120102317341783.

Abstract:
A data-oriented parsing or DOP model for statistical parsing associates fragments of linguistic representations with numerical weights, where these weights are estimated by normalizing the empirical frequency of each fragment in a training corpus (see Bod [1998] and references cited therein). This note observes that this estimation method is biased and inconsistent; that is, the estimated distribution does not in general converge on the true distribution as the size of the training corpus increases.
17

Shao, Yikang, Guangde Xu, Mingxue Xu, and Lanfang Dong. "An Automatic Question Answering Method for Small-Scale Corpus." Journal of Physics: Conference Series 1621 (August 2020): 012113. http://dx.doi.org/10.1088/1742-6596/1621/1/012113.

18

Tepper, Ronnie, Zvi Leibovitz, Catherine Garel, and Rivka Sukenik‐Halevy. "A new method for evaluating short fetal corpus callosum." Prenatal Diagnosis 39, no. 13 (November 11, 2019): 1283–90. http://dx.doi.org/10.1002/pd.5598.

19

VILA, M., H. RODRÍGUEZ, and M. A. MARTÍ. "Relational paraphrase acquisition from Wikipedia: The WRPA method and corpus." Natural Language Engineering 21, no. 3 (September 16, 2013): 355–89. http://dx.doi.org/10.1017/s1351324913000235.

Abstract:
Paraphrase corpora are an essential but scarce resource in Natural Language Processing. In this paper, we present the Wikipedia-based Relational Paraphrase Acquisition (WRPA) method, which extracts relational paraphrases from Wikipedia, and the derived WRPA paraphrase corpus. The WRPA corpus currently covers person-related and authorship relations in English and Spanish, respectively, suggesting that, given adequate Wikipedia coverage, our method is independent of the language and the relation addressed. WRPA extracts entity pairs from structured information in Wikipedia applying distant learning and, based on the distributional hypothesis, uses them as anchor points for candidate paraphrase extraction from the free text in the body of Wikipedia articles. Focussing on relational paraphrasing and taking advantage of Wikipedia-structured information allows for an automatic and consistent evaluation of the results. The WRPA corpus characteristics distinguish it from other types of corpora that rely on string similarity or transformation operations. WRPA relies on distributional similarity and is the result of the free use of language outside any reformulation framework. Validation results show a high precision for the corpus.
20

Соловьёв, Роман Сергеевич. "Methods for Establishing a Chronology of Dialogues in Corpus Platonicum." Theological Herald, no. 4(39) (December 15, 2020): 279–96. http://dx.doi.org/10.31802/gb.2020.39.4.016.

Abstract:
In this article, the author critically examines methods of establishing a chronology of the dialogues in the Corpus Platonicum. After describing the ancient methods of organizing the corpus (the canons of Aristophanes of Byzantium and Thrasyllus), the author shows that the principle of organization of the dialogues is thematic rather than chronological. The principal methods of determining chronology are then discussed, namely analysis of literary form, analysis of philosophical content, external and internal evidence, and stylometric analysis. Pointing out the subjectivity of assessments of the literary properties of the dialogues, the flaws of the approach based on an a priori idea of Plato's gradual philosophical development, and the insufficiency of the external and internal evidence, the author dwells in particular on the stylometric method, which rests on the assumption that Plato's language and style changed, consciously and unconsciously, throughout his life, and that this change can be traced in the linguistic material of the dialogues. Having traced the development of the method and described its results, the author concludes that stylometry does not provide any definite data for the chronology of the so-called early dialogues. Finally, the author analyses the grounds put forward by researchers for counting the dialogue Euthyphro among the early dialogues and concludes that this assessment is unfounded. The results of a detailed study of Euthyphro's place in the Corpus Platonicum were published in previous issues of the Theological Herald.
21

Bauer, Gerhard, Elżbieta Płonka-Półtorak, Richard Bauer, Iris Unterberger, and Giorgi Kuchukhidze. "Corpus callosum and epilepsies." Journal of Epileptology 21, no. 2 (December 1, 2013): 89–104. http://dx.doi.org/10.1515/joepi-2015-0008.

Abstract:
Introduction. Corpus callosum (CC) is the largest forebrain commissure. Structural anomalies and the accompanying clinical symptoms are not in the focus of neurologists, epileptologists or neurosurgeons.
Aim and method. Anatomy, embryological development, normal functions, structural abnormalities, additional malformations, clinical symptoms and seizure disorders with CC anomalies are reviewed from the literature.
Review. The detection of callosal anomalies has increased rapidly with the widespread use of brain imaging methods. Agenesis or dysgenesis of the corpus callosum (AgCC) might be considered an incidental finding. Epileptic seizures occur in up to 89% of patients with AgCC. The causal relationship is rightly questioned; however, additional causative malformations of midline and/or telencephalic structures can be demonstrated in most seizure patients. The interruption of the bilateral spread of seizure activity is the rationale for callosotomy as epilepsy surgery. Indications are drug-resistant generalized, diffuse, or multifocal epilepsies; a resectable seizure onset zone should be excluded. Most treated patients are diagnosed with Lennox-Gastaut or Lennox-like syndrome.
Conclusions. In cases with callosal abnormalities and clinical symptoms, additional malformations are frequently observed, especially with seizure disorders. Callosotomy is the most effective option against drop attacks, and the method is probably underused. After callosotomy a circumscribed seizure focus might be unveiled, and a second step of resective epilepsy surgery can be successful.
22

Wang, Shuang. "Research on Bilingual Corpus Based Machine Translation." Applied Mechanics and Materials 687-691 (November 2014): 1683–86. http://dx.doi.org/10.4028/www.scientific.net/amm.687-691.1683.

Abstract:
This thesis proposes several methods for acquiring bilingual corpora from different websites, such as automatic acquisition of bilingual corpora based on the "iciba" website, CNKI, and the patent network. It introduces the methods and procedures for acquiring a variety of corpora. We propose different acquisition methods suited to the characteristics of the different sites, and achieve fast and accurate automatic acquisition of a large-scale bilingual corpus. For the "iciba" website, the main method is the Nutch crawler, which performs relatively well, with accurate retrieval and good relevance. In addition, we give up the idea of obtaining a bilingual corpus from the entire Internet and instead adopt an entirely new approach: accessing the basic information of scholarly theses in CNKI to obtain a large-scale, high-quality English-Chinese bilingual corpus. In the end we obtain a GB-scale aligned bilingual corpus, which manual evaluation shows to be very accurate. This corpus lays the groundwork for further cross-language information retrieval research.
23

Luo, Ruifeng. "A Study on Chinese TALK Metaphor from Corpus-based Approach." Journal of Language Teaching and Research 9, no. 2 (March 1, 2018): 346. http://dx.doi.org/10.17507/jltr.0902.16.

Abstract:
Metaphor is a vitally important concept in Cognitive Linguistics and refers to the mapping from a source domain to a target domain. It is the mapping from a concrete entity to an abstract one, through which we can understand how human cognition handles abstract things through specific ones, and it has been researched by many linguistic scholars by means of traditional methods such as introspection. The corpus method is a newly utilized, empirical method for conducting linguistic research: it draws on materials of real, actual language use, and a corpus is a computer-based carrier of basic language-knowledge resources. A raw corpus must be processed (analysed and annotated) in order to become a useful resource. This paper takes advantage of the CCL Corpus (Center for Chinese Linguistics Corpus), the biggest Chinese corpus in China, constructed by Beijing University, to investigate TALK metaphor and conduct empirical research so as to make metaphor research more objective and convincing.
24

Nazar, Rogelio, Irene Renau, Nicolas Acosta, Hernan Robledo, Maha Soliman, and Sofıa Zamora. "Corpus-Based Methods for Recognizing the Gender of Anthroponyms." Names 69, no. 3 (August 16, 2021): 16–27. http://dx.doi.org/10.5195/names.2021.2238.

Abstract:
This paper presents a series of methods for automatically determining the gender of proper names, based on their co-occurrence with words and grammatical features in a large corpus. Although the results obtained were for Spanish given names, the method presented here can be easily replicated and used for names in other languages. Most methods reported in the literature use pre-existing lists of first names that require costly manual processing and tend to become quickly outdated. Instead, we propose using corpora. Doing so offers the possibility of obtaining real and up-to-date name-gender links. To test the effectiveness of our method, we explored various machine-learning methods as well as another method based on simple frequency of co-occurrence. The latter produced the best results: 93% precision and 88% recall on a database of ca. 10,000 mixed names. Our method can be applied to a variety of natural language processing tasks such as information extraction, machine translation, anaphora resolution or large-scale delivery or email correspondence, among others.
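The simple frequency-of-co-occurrence idea that produced the paper's best results can be sketched as follows; the Spanish context-word lists, window size, and function name are illustrative assumptions, not the authors' exact feature set.

```python
# Sketch: vote on a name's gender by counting gender-marked Spanish context
# words (illustrative lists) that co-occur with it within a small window.
from collections import Counter

MASC = {"señor", "él", "hijo", "actor"}
FEM = {"señora", "ella", "hija", "actriz"}

def guess_gender(name: str, tokens: list[str], window: int = 2) -> str:
    votes = Counter()
    toks = [t.lower() for t in tokens]
    for i, tok in enumerate(toks):
        if tok == name.lower():
            ctx = toks[max(0, i - window): i + window + 1]
            votes["m"] += sum(w in MASC for w in ctx)
            votes["f"] += sum(w in FEM for w in ctx)
    if votes["m"] == votes["f"]:
        return "unknown"
    return "m" if votes["m"] > votes["f"] else "f"
```

A real system would use much larger context-word inventories and grammatical features (e.g. gendered determiners and adjective agreement), as the abstract describes.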
25

Milkevich, Elena S. "Teaching Cognitive Linguistics Methods at Master Degree Programmes." Proceedings of Southern Federal University. Philology 2021, no. 2 (June 30, 2021): 182–91. http://dx.doi.org/10.18522/1995-0640-2021-2-182-191.

Abstract:
Cognitive linguistics combines knowledge from different sciences, such as philosophy, linguistics, psychology, neuroscience, anthropology, mathematical statistics and others. Therefore, cognitive linguistics uses specific methods and types of analysis. Among them is the method of corpus analysis, which is widely used in cognitive research. The Master Degree Programme “Digital technologies in philology. Computer linguistics” at Southern Federal University, Russia, Rostov-on-Don, aims at enabling students to master modern methods and other tools applicable in cognitive research. The teaching process covers several stages: critical analysis of published corpus-analysis research, working out an algorithm for conducting corpus research, practical application of the corpus method, wide reading of papers on cognitive linguistics when tricky points arise, and arguing for the basic propositions of cognitive linguistics used in the research.
26

de Monnink, Inge. "Combining Corpus and Experimental Data." International Journal of Corpus Linguistics 4, no. 1 (August 13, 1999): 77–111. http://dx.doi.org/10.1075/ijcl.4.1.05mon.

Abstract:
In this article I argue that, from a methodological point of view, descriptive studies improve considerably if they use a multi-method approach to the data, more specifically, if they use a combination of corpus data and experimental data. In the modern conception of corpus linguistics, intuitive data play an important role. The linguist formulates research hypotheses based on his or her intuitive knowledge. These hypotheses are then tested on the corpus data. I argue that a sound descriptive study should not end with simply stating the results from the corpus study. Instead, the corpus data have to be supplemented. An appropriate way to supplement corpus data is through the use of elicitation techniques. I illustrate the multi-method approach on a case study of floating postmodification in the English noun phrase.
APA, Harvard, Vancouver, ISO, and other styles
27

Kotsiuk, Lesia, and Yurii Kotsiuk. "CLASSIFICATIONAL PARADIGM OF A TEXT CORPUS BY ITS DESIGN, STRUCTURE AND USE, AS WELL AS BY THE FIXATION AND INDEXATION METHODS OF ITS TEXT DATA." Naukovì zapiski Nacìonalʹnogo unìversitetu «Ostrozʹka akademìâ». Serìâ «Fìlologìâ» 1, no. 9(77) (January 30, 2020): 106–10. http://dx.doi.org/10.25264/2519-2558-2020-9(77)-106-110.

Full text
Abstract:
The article attempts to analyze the typological characteristics of text corpora. The author proposes to classify corpora with consideration of different aspects of this modern linguistic notion, namely the design and structural features of the corpus (balanced / representative corpus, opportunistic corpus, complete corpus, full-text corpus, fragmentary corpus, parallel corpus and comparable corpus, static / sample corpus, dynamic / monitor corpus), the method of fixing and indexing text data in the corpus (printed corpus, electronic text corpus, transcribed speech corpus, audio/video corpus, multimodal corpus, plain corpus, annotated corpus), as well as the way the corpus is used. According to the aim of corpus use, one can distinguish between a linguistic and an illustrative corpus. By access possibilities, corpora can be open-access, closed-access or commercial. Examples of these types of text corpora are also presented. The article presents terminological equivalents of corpus names by the type of text data in Ukrainian and English.
APA, Harvard, Vancouver, ISO, and other styles
28

Kilgarriff, Adam. "Comparing Corpora." International Journal of Corpus Linguistics 6, no. 1 (December 17, 2001): 97–133. http://dx.doi.org/10.1075/ijcl.6.1.05kil.

Full text
Abstract:
Corpus linguistics lacks strategies for describing and comparing corpora. Currently most descriptions of corpora are textual, and questions such as ‘what sort of a corpus is this?’, or ‘how does this corpus compare to that?’ can only be answered impressionistically. This paper considers various ways in which different corpora can be compared more objectively. First we address the issue, ‘which words are particularly characteristic of a corpus?’, reviewing and critiquing the statistical methods which have been applied to the question and proposing the use of the Mann-Whitney ranks test. Results of two corpus comparisons using the ranks test are presented. Then, we consider measures for corpus similarity. After discussing limitations of the idea of corpus similarity, we present a method for evaluating corpus similarity measures. We consider several measures and establish that a χ²-based one performs best. All methods considered in this paper are based on word and n-gram frequencies; the strategy is defended.
APA, Harvard, Vancouver, ISO, and other styles
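The frequency-based comparison described in the abstract above can be sketched as a chi-squared homogeneity score computed over the most frequent words of two corpora. This is an illustrative reading of the approach, not Kilgarriff's exact procedure; the function name and toy corpora are invented:

```python
from collections import Counter

def chi2_corpus_distance(corpus_a, corpus_b, top_n=500):
    """Rough corpus-comparison score: sum of chi-squared terms over the
    most frequent words of the combined corpora (illustrative only)."""
    fa, fb = Counter(corpus_a), Counter(corpus_b)
    na, nb = sum(fa.values()), sum(fb.values())
    common = [w for w, _ in (fa + fb).most_common(top_n)]
    score = 0.0
    for w in common:
        oa, ob = fa[w], fb[w]                # observed counts
        total = oa + ob
        ea = total * na / (na + nb)          # expected under homogeneity
        eb = total * nb / (na + nb)
        score += (oa - ea) ** 2 / ea + (ob - eb) ** 2 / eb
    return score
```

Identical corpora score zero; the score grows as the word-frequency profiles diverge.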
29

Ouyang, Xin, and Chun Yan Shuai. "A Method of Inquiring Ontology with Semantic Templates." Applied Mechanics and Materials 347-350 (August 2013): 2452–57. http://dx.doi.org/10.4028/www.scientific.net/amm.347-350.2452.

Full text
Abstract:
Analysing the semantics of natural language sentences (NLS) using statistical methods over a corpus database has always been a challenge. This paper proposes a method for recognizing and translating ontology queries in natural language, called OntoQuery-NLP. With the help of pre-created semantic templates, the OntoQuery-NLP maps NLSs matching the format of the semantic templates into formal semantic expressions. By parsing these semantic expressions, the OntoQuery-NLP recognizes the queries and gets the correct answers from the ontology. Compared with other methods, the OntoQuery-NLP, without the support of any corpus, has faster retrieving speed and higher retrieving accuracy.
APA, Harvard, Vancouver, ISO, and other styles
30

Petrauskaitė, Rūta, and Virginijus Dadurkevičius. "A method to update traditional explanatory dictionaries." Taikomoji kalbotyra 16 (October 26, 2021): 42–48. http://dx.doi.org/10.15388/taikalbot.2021.16.3.

Full text
Abstract:
The paper presents a method for updating traditional digitised dictionaries based on a comparison of the dictionary lemmas with a big corpus. The Hunspell platform is used to generate all word forms from the dictionary lemmas. The 6th edition of The Dictionary of Modern Lithuanian was chosen for comparison with the lexical data from The Joint Corpus of Lithuanian. The outcome of the comparison was two lists of non-overlapping lexis: the list of dictionary lemmas unused in present-day Lithuanian and the list of dictionary gaps, i.e., frequently used words and word forms ignored by the dictionary. The latter is discussed in greater detail to give lexicographers a clue for updates.
APA, Harvard, Vancouver, ISO, and other styles
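Once the dictionary's word forms have been generated, the comparison described above reduces, for illustration, to two set differences. The sketch below skips the Hunspell generation step and assumes flat word lists; the function name and threshold are invented:

```python
from collections import Counter

def compare_dictionary_with_corpus(dictionary_forms, corpus_tokens, min_freq=5):
    """Return (unused_forms, dictionary_gaps): dictionary word forms absent
    from the corpus, and frequent corpus words missing from the dictionary.
    A hypothetical simplification of the lemma-expansion workflow."""
    freq = Counter(t.lower() for t in corpus_tokens)
    dict_set = {f.lower() for f in dictionary_forms}
    unused = sorted(dict_set - set(freq))
    gaps = sorted(w for w, c in freq.items() if c >= min_freq and w not in dict_set)
    return unused, gaps
```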
31

LIU, Yong. "A Corpus Based Method for Evaluating the Teaching Effect of Electronic Engineering English." Tobacco Regulatory Science 7, no. 5 (September 30, 2021): 1222–29. http://dx.doi.org/10.18001/trs.7.5.39.

Full text
Abstract:
Existing evaluation methods suffer from an imperfect teaching-effect evaluation model, which leads to low reliability of the evaluation index. This paper designs a corpus-based quantitative analysis method for evaluating the teaching effect of Electronic Engineering English. Based on the mathematical principle of radial basis functions, it uses a corpus to quantitatively analyze the distribution characteristics of Electronic Engineering English courses, and uses an association-rules algorithm to build an English teaching-effect evaluation model that comprehensively judges the importance of each factor. Experimental results: the average reliability of the two existing evaluation methods is 1.0751 and 0.5455 respectively, while the average reliability of the proposed method is 0.7983, which is closer to the standard value of 0.8. This indicates that the proposed corpus-based quantitative evaluation method performs better in practical application.
APA, Harvard, Vancouver, ISO, and other styles
32

Schönefeld, Doris. "Corpus Linguistics and Cognitivism." International Journal of Corpus Linguistics 4, no. 1 (August 13, 1999): 137–71. http://dx.doi.org/10.1075/ijcl.4.1.07sch.

Full text
Abstract:
The following article is meant to discuss the status of corpus linguistics, how it is seen and how it sees itself as a field: Is it merely a method of doing linguistics, or can it be considered a distinct approach to language description? In our argument, we claim that corpus linguistics is on the way to becoming more than a methodology, since its research results are increasingly interpreted with regard to their impact on commonly held views about language. Dealing with these interpretations, we have noticed a number of similarities with assumptions made by cognitive linguistics, and we aim at showing that the two trends, corpus linguistics and cognitivism, are compatible in that they complement each other.
APA, Harvard, Vancouver, ISO, and other styles
33

Zhong, Zhi Nong, Fang Chi Liu, Lin Lei, and Ning Jing. "A New Method for Location Entity Recognition." Advanced Materials Research 791-793 (September 2013): 2031–37. http://dx.doi.org/10.4028/www.scientific.net/amr.791-793.2031.

Full text
Abstract:
Location Entity Recognition (LER) is an important part of Named Entity Recognition (NER), and using abundant unlabeled corpora to improve recognition performance is a significant research topic in this domain. A new method combining Active Learning with Self-Training is proposed, which selects samples based on confidence and 2-gram frequency, and expands the training set by annotating the unlabeled corpus manually and automatically. The experiments reveal that the F-measure of this method is 8% higher than that of randomized Active Learning, while the annotation effort is only 1/3 of the latter. Using this method, only 5% of the characters in the extended training set need to be labeled to achieve a performance similar to complete manual annotation.
APA, Harvard, Vancouver, ISO, and other styles
34

Wang, Xiangdong, Yang Yang, Hong Liu, and Yueliang Qian. "Chinese-Braille Translation Based on Braille Corpus." International Journal of Advanced Pervasive and Ubiquitous Computing 8, no. 2 (April 2016): 56–63. http://dx.doi.org/10.4018/ijapuc.2016040104.

Full text
Abstract:
For people with visual disabilities, reading Braille text is an important way to acquire information. Chinese-Braille translation poses great challenges due to the characteristics of word segmentation and tone marking in Chinese Braille. In this paper, a novel scheme of Chinese-Braille translation is proposed. Unlike current methods, which use heuristic rules defined by experts for Braille word segmentation, the proposed method performs Chinese-Braille translation based on a Braille corpus without requiring Braille experts. Under the scheme, a Braille word segmentation model based on statistical machine learning is trained on a Braille corpus, and Braille word segmentation is carried out using the statistical model directly, without a Chinese word segmentation stage. Tone marking and some special treatment are also performed based on word and rule mining on the corpus. This method avoids the manual establishment of rules concerning syntactic and semantic information, instead using a statistical model to learn the rules implicitly and automatically. Experimental results show the effectiveness of the proposed approach.
APA, Harvard, Vancouver, ISO, and other styles
35

DEN, YASUHARU. "A Corpus-based Preference Decision Method for Spoken Language Analysis." Journal of Natural Language Processing 4, no. 1 (1997): 41–56. http://dx.doi.org/10.5715/jnlp.4.41.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Pivovarova, Ekaterina Vladimirovna. "CORPUS ANALYSIS METHOD IN STUDYING THE GERMAN PHRASEOLOGY (THEORETICAL SURVEY)." Philological Sciences. Issues of Theory and Practice, no. 12 (December 2019): 263–68. http://dx.doi.org/10.30853/filnauki.2019.12.52.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Peña, Gilberto Anguiano, and Catalina Naumis Peña. "Method for Selecting Specialized Terms from a General Language Corpus." KNOWLEDGE ORGANIZATION 42, no. 3 (2015): 164–75. http://dx.doi.org/10.5771/0943-7444-2015-3-164.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Liu, Hong, Jonathan Blumenthal, Neal Jeffries, Catherine Vaituzis, A. Zijdenbos, J. Rapoport, and J. Giedd. "A fully automatic method for human corpus callosum MRI analysis." NeuroImage 13, no. 6 (June 2001): 188. http://dx.doi.org/10.1016/s1053-8119(01)91531-9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Wang, Anmin. "Li, D. J. (2015). Corpus Lexicography: Theory, Method and Application." International Journal of Corpus Linguistics 23, no. 3 (October 29, 2018): 370–74. http://dx.doi.org/10.1075/ijcl.00006.wan.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Cover, Giovana, Mariana Pereira, Mariana Bento, Simone Appenzeller, and Leticia Rittner. "Data-Driven Corpus Callosum Parcellation Method Through Diffusion Tensor Imaging." IEEE Access 5 (2017): 22421–32. http://dx.doi.org/10.1109/access.2017.2761701.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Gale, William A., Kenneth W. Church, and David Yarowsky. "A method for disambiguating word senses in a large corpus." Computers and the Humanities 26, no. 5-6 (December 1992): 415–39. http://dx.doi.org/10.1007/bf00136984.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Azzini, Antonia, da Costa Pereira, Mauro Dragoni, and Andrea G. B. Tettamanzi. "A Neuro-Evolutionary Corpus-Based Method for Word Sense Disambiguation." IEEE Intelligent Systems 27, no. 6 (November 2012): 26–35. http://dx.doi.org/10.1109/mis.2011.108.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Crowne, Douglas P., and Claudette M. Richardson. "A method to section the corpus callosum in the rat." Physiology & Behavior 34, no. 5 (May 1985): 847–50. http://dx.doi.org/10.1016/0031-9384(85)90389-0.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Szmrecsanyi, Benedikt, and Christoph Wolk. "Holistic corpus-based dialectology." Revista Brasileira de Linguística Aplicada 11, no. 2 (2011): 561–92. http://dx.doi.org/10.1590/s1984-63982011000200011.

Full text
Abstract:
This paper is concerned with sketching future directions for corpus-based dialectology. We advocate a holistic approach to the study of geographically conditioned linguistic variability, and we present a suitable methodology, 'corpus-based dialectometry', in exactly this spirit. Specifically, we argue that in order to live up to the potential of the corpus-based method, practitioners need to (i) abandon their exclusive focus on individual linguistic features in favor of the study of feature aggregates, (ii) draw on computationally advanced multivariate analysis techniques (such as multidimensional scaling, cluster analysis, and principal component analysis), and (iii) aid interpretation of empirical results by marshalling state-of-the-art data visualization techniques. To exemplify this line of analysis, we present a case study which explores joint frequency variability of 57 morphosyntax features in 34 dialects all over Great Britain.
APA, Harvard, Vancouver, ISO, and other styles
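The aggregate step the abstract above argues for, comparing dialects on whole feature profiles rather than one feature at a time, can be sketched as a distance between normalised feature-frequency vectors. The feature names and frequencies below are invented toy data, not the paper's 57-feature set:

```python
import math

def aggregate_distance(profile_a, profile_b):
    """Euclidean distance between two dialects' feature-frequency vectors,
    treating missing features as zero (an illustrative aggregate measure)."""
    feats = set(profile_a) | set(profile_b)
    return math.sqrt(sum((profile_a.get(f, 0.0) - profile_b.get(f, 0.0)) ** 2
                         for f in feats))

# Toy per-million frequencies for two hypothetical dialect features
north = {"multiple_negation": 12.0, "invariant_tags": 3.0}
south = {"multiple_negation": 2.0, "invariant_tags": 9.0}
```

A full dialectometric analysis would then feed the pairwise distance matrix into clustering or multidimensional scaling.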
45

Pryima, L. Yu, and K. S. Chupryna. "APPLYING CORPUS APPROACH AT THE ENGLISH CLASSES IN HIGHER EDUCATIONAL ESTABLISHMENTS." Актуальні проблеми сучасної медицини: Вісник Української медичної стоматологічної академії 19, no. 2 (July 19, 2019): 206–10. http://dx.doi.org/10.31718/2077-1096.19.2.206.

Full text
Abstract:
The article describes possible ways of applying the achievements of corpus linguistics in the ESL classroom at HEIs. The notion of a language corpus is defined, the history of the first corpus creation is reviewed, and the most popular modern English-language corpora (British National Corpus, The Oxford English Corpus, COCA, etc.) are considered. The efficiency of applying the corpus method in teaching a foreign language in higher education has been proven. Thus, the use of corpora in students' work, along with the inductive method, contributes to their understanding of basic linguistic patterns and the development of linguistic intuition, with exclusively authentic texts as the material of study. The possibility of using direct and indirect corpus approaches to assist the formation of students' lexical, grammatical, stylistic and phonetic skills has also been studied. Direct application of this method may include teaching corpus linguistics to university students as a purely academic subject, performing tasks or exercises using concordance programs, and students carrying out individual research projects. The indirect corpus approach can include publishing links, developing materials and language testing. The article provides information on special software that enables corpus-based training and analyzes the main types of tasks that can be created using these programs. Finally, the article discusses the reasons for the unpopularity of the corpus method in Ukrainian HEIs at the present stage of education and justifies the appropriateness of its wider use in the future academic environment.
APA, Harvard, Vancouver, ISO, and other styles
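The concordance programs mentioned in the abstract above center on the Key Word In Context (KWIC) display. A minimal stand-in, assuming a plain token list rather than any particular concordancer, might look like this:

```python
def kwic(tokens, node, width=3):
    """Return Key Word In Context lines for a node word: the node bracketed,
    with up to `width` tokens of context on each side (a toy concordancer)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30}  [{tok}]  {right}")
    return lines
```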
46

Fernández-Silva, Sabela, Judit Freixa, and M. Teresa Cabré Castellví. "A proposed method for analysing the dynamics of cognition through term variation." Terminology 17, no. 1 (June 20, 2011): 49–74. http://dx.doi.org/10.1075/term.17.1.04fer.

Full text
Abstract:
Today, term variation is commonly accepted to be a widespread phenomenon in specialised communication. Although some degree of arbitrariness is inevitable, the expert's choice of a term variant is generally motivated to some extent. This article presents a methodology for describing the conceptually motivated patterns of term variation in a real corpus of special language. This method, which analyses the conceptual information displayed in the term's form, represents an attempt to provide a framework accounting for the flexibility of concepts and conceptual structures in a systematic way. Using data from a bilingual (French and Galician) corpus of texts related to coastal fishing and aquaculture, the applicability of the proposed method is illustrated with a description of the variation patterns pertaining to terms designating human-entity concepts within the corpus.
APA, Harvard, Vancouver, ISO, and other styles
47

Tumbe, Chinmay. "Corpus linguistics, newspaper archives and historical research methods." Journal of Management History 25, no. 4 (November 11, 2019): 533–49. http://dx.doi.org/10.1108/jmh-01-2018-0009.

Full text
Abstract:
Purpose The purpose of this paper is to demonstrate the utility of corpus linguistics and digitised newspaper archives in management and organisational history. Design/methodology/approach The paper draws its inferences from Google NGram Viewer and five digitised historical newspaper databases – The Times of India, The Financial Times, The Economist, The New York Times and The Wall Street Journal – that contain prints from the nineteenth century. Findings The paper argues that corpus linguistics or the quantitative and qualitative analysis of large-scale real-world machine-readable text can be an important method of historical research in management studies, especially for discourse analysis. It shows how this method can be fruitfully used for research in management and organisational history, using term count and cluster analysis. In particular, historical databases of digitised newspapers serve as important corpora to understand the evolution of specific words and concepts. Corpus linguistics using newspaper archives can potentially serve as a method for periodisation and triangulation in corporate, analytically structured and serial histories and also foster cross-country comparisons in the evolution of management concepts. Research limitations/implications The paper also shows the limitation of the research method and potential robustness checks while using the method. Practical implications Findings of this paper can stimulate new ways of conducting research in management history. Originality/value The paper for the first time introduces corpus linguistics as a research method in management history.
APA, Harvard, Vancouver, ISO, and other styles
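The term-count series that the abstract above uses for periodisation can be illustrated as a relative-frequency trend over dated texts. The data layout below is an assumption for the sketch; real newspaper databases typically expose search hit counts rather than raw text:

```python
from collections import defaultdict

def term_trend(dated_articles, term):
    """Relative frequency of `term` per year over (year, text) pairs,
    the kind of series used to periodise discourse (toy illustration)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for year, text in dated_articles:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += sum(1 for t in tokens if t == term.lower())
    return {y: hits[y] / totals[y] for y in sorted(totals)}
```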
48

Th. Gries, Stefan. "The most under-used statistical method in corpus linguistics: multi-level (and mixed-effects) models." Corpora 10, no. 1 (April 2015): 95–125. http://dx.doi.org/10.3366/cor.2015.0068.

Full text
Abstract:
Much statistical analysis of psycholinguistic data is now being done with so-called mixed-effects regression models. This development was spearheaded by a few highly influential introductory articles that (i) showed how these regression models are superior to what was the previous gold standard and, perhaps even more importantly, (ii) showed how these models are used practically. Corpus linguistics can benefit from mixed-effects/multi-level models for the same reason that psycholinguistics can – because, for example, speaker-specific and lexically specific idiosyncrasies can be accounted for elegantly; but, in fact, corpus linguistics needs them even more because (i) corpus-linguistic data are observational and, thus, usually unbalanced and messy/noisy, and (ii) most widely used corpora come with a hierarchical structure that corpus linguists routinely fail to consider. Unlike nearly all overviews of mixed-effects/multi-level modelling, this paper is specifically written for corpus linguists to get more of them to start using these techniques more. After a short methodological history, I provide a non-technical introduction to mixed-effects models and then discuss in detail one example – particle placement in English – to show how mixed-effects/multi-level modelling results can be obtained and how they are far superior to those of traditional regression modelling.
APA, Harvard, Vancouver, ISO, and other styles
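The intuition behind the speaker-specific random intercepts Gries recommends can be shown with a toy partial-pooling estimator that shrinks per-speaker means toward the grand mean. This is a didactic stand-in with an assumed shrinkage constant, not a fitted mixed-effects model:

```python
from collections import defaultdict

def partial_pooling(observations, k=5.0):
    """Shrink per-speaker means toward the grand mean. `observations` is a
    list of (speaker, value) pairs; `k` plays the role of the variance
    ratio and is an assumed constant here, not estimated from the data."""
    by_speaker = defaultdict(list)
    for speaker, value in observations:
        by_speaker[speaker].append(value)
    grand = sum(v for _, v in observations) / len(observations)
    return {s: grand + (len(vs) / (len(vs) + k)) * (sum(vs) / len(vs) - grand)
            for s, vs in by_speaker.items()}
```

Speakers with few observations are pulled strongly toward the grand mean; well-attested speakers keep estimates close to their own mean, which is the behaviour a random intercept buys over per-speaker fixed effects.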
49

Jensen, Kim Ebensgaard. "Linguistics in the digital humanities: (computational) corpus linguistics." MedieKultur: Journal of media and communication research 30, no. 57 (December 19, 2014): 20. http://dx.doi.org/10.7146/mediekultur.v30i57.15968.

Full text
Abstract:
Corpus linguistics has been closely intertwined with digital technology since the introduction of university computer mainframes in the 1960s. Making use of both digitized data in the form of the language corpus and computational methods of analysis involving concordancers and statistics software, corpus linguistics arguably has a place in the digital humanities. Still, it remains obscure and figures only sporadically in the literature on the digital humanities. This article provides an overview of the main principles of corpus linguistics and the role of computer technology in relation to data and method and also offers a bird's-eye view of the history of corpus linguistics with a focus on its intimate relationship with digital technology and how digital technology has impacted the very core of corpus linguistics and shaped the identity of the corpus linguist. Ultimately, the article is oriented towards an acknowledgment of corpus linguistics' alignment with the digital humanities.
APA, Harvard, Vancouver, ISO, and other styles
50

Bertels, Ann, and Dirk Speelman. "‘Keywords Method’ versus ‘Calcul des Spécificités’." International Journal of Corpus Linguistics 18, no. 4 (December 5, 2013): 536–60. http://dx.doi.org/10.1075/ijcl.18.4.04ber.

Full text
Abstract:
This paper explores two tools and methods for keyword extraction. As several tools are available, it makes a comparison of two widely used tools, namely Lexico3 (Lamalle et al. 2003) and WordSmith Tools (Scott 2013). It shows the importance of keywords and discusses recent studies involving keyword extraction. Since no previous study has attempted to compare two different tools, used by different language communities and which use different methodologies to extract keywords, this paper aims at filling the gap by comparing not only the tools and their practical use, but also the underlying methodologies and statistics. By means of a comparative study on a small test corpus, this paper shows major similarities and differences between the tools. The similarities mainly concern the most typical keywords, whereas the differences concern the total number of significant keywords extracted, the granularity of both probability value and typicality coefficient and the type of the reference corpus.
APA, Harvard, Vancouver, ISO, and other styles
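A hedged sketch of the kind of keyness statistic such tools compute: a log-likelihood (G²) score for each word of a target corpus against a reference corpus. This is an illustration of the general technique, not the exact formula implemented in Lexico3 or WordSmith Tools:

```python
import math
from collections import Counter

def keywords(target, reference, top=10):
    """Rank target-corpus words by a log-likelihood (G2) keyness score
    against a reference corpus (illustrative, two-cell form)."""
    ft, fr = Counter(target), Counter(reference)
    nt, nr = sum(ft.values()), sum(fr.values())
    scored = []
    for w, a in ft.items():
        b = fr[w]
        ea = nt * (a + b) / (nt + nr)        # expected target count
        eb = nr * (a + b) / (nt + nr)        # expected reference count
        g2 = 2 * (a * math.log(a / ea) + (b * math.log(b / eb) if b else 0.0))
        scored.append((g2, w))
    return [w for _, w in sorted(scored, reverse=True)[:top]]
```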