Academic literature on the topic 'Parallel corpora identification'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Parallel corpora identification.'

Journal articles on the topic "Parallel corpora identification"

1

Postolea, Sorina, and Teodora Ghivirigă. "Using Small Parallel Corpora to Develop Collocation-Centred Activities in Specialized Translation Classes." Linguaculture 2016, no. 2 (December 1, 2016): 53–72. http://dx.doi.org/10.1515/lincu-2016-0012.

Abstract:
The research devoted to special languages as well as the activities carried out in specialized translation classes tend to focus primarily on one-word or multi-word terminological units. However, a very important part in the making of specialist registers and texts is played by specialised collocations, i.e. relatively stable word combinations that do not designate concepts but are nevertheless of frequent use in a given field of activity. This is why helping students acquire competences relative to the identification and processing of collocations should become an important objective in specialised translation classes. An easily accessible and dependable resource that may be successfully used for this purpose is represented by corpora and corpus analysis tools, whose usefulness in translator training has been highlighted by numerous studies. This article proposes a series of practical, task-based activities, developed with the help of a small parallel corpus of specialised texts, that aim to raise translation trainees' awareness of the collocations present in specialised texts and to provide suggestions about their processing in translation.
2

Streiter, Oliver, and Leonid L. Iomdin. "Learning Lessons from Bilingual Corpora: Benefits for Machine Translation." International Journal of Corpus Linguistics 5, no. 2 (December 31, 2000): 199–230. http://dx.doi.org/10.1075/ijcl.5.2.06str.

Abstract:
The research described in this paper is rooted in the endeavors to combine the advantages of corpus-based and rule-based MT approaches in order to improve the performance of MT systems, most importantly the quality of translation. The authors review the ongoing activities in the field and present a case study, which shows how translation knowledge can be drawn from parallel corpora and compiled into the lexicon of a rule-based MT system. These data are obtained with the help of three procedures: (1) identification of hitherto unknown one-word translations, (2) statistical rating of the known one-word translations, and (3) extraction of new translations of multiword expressions (MWEs), followed by compilation steps which create new rules for the MT engine. As a result, the lexicon is enriched with translation equivalents attested for different subject domains, which facilitates the tuning of the MT system to a specific subject domain and improves the quality and adequacy of translation.
3

Marcińczuk, Michał, and Aleksander Wawer. "Named entity recognition for Polish." Poznan Studies in Contemporary Linguistics 55, no. 2 (June 26, 2019): 239–69. http://dx.doi.org/10.1515/psicl-2019-0010.

Abstract:
In this article we discuss the current state of the art in named entity recognition for Polish. We present publicly available resources and open-source tools for named entity recognition. The overview includes various kinds of resources, i.e. guidelines, annotated corpora (NKJP, KPWr, CEN, PST) and lexicons (NELexiconS, PNET, Gazetteer). We present the major NER tools for Polish (Sprout, NERF, Liner2, Parallel LSTM-CRFs and PolDeepNer) and discuss their performance on the reference datasets. In the article we cover identification of named entity mentions in running text, local and global entity categorization, fine- and coarse-grained categorization, and lemmatization of proper names.
4

Koseska-Toszewa, Violetta, and Roman Roszko. "Języki słowiańskie i litewski w korpusach równoległych Clarin-PL." Studia z Filologii Polskiej i Słowiańskiej 51 (December 31, 2016): 191–217. http://dx.doi.org/10.11649/sfps.2016.011.

Abstract:
Slavic languages and the Lithuanian language in the Clarin-PL parallel corpora. The strategic scientific purpose of Clarin ERIC and Clarin-PL is to support humanistic research in a multicultural and multilingual Europe. Polish researchers put the emphasis on building a bridge between the Polish language and Polish linguistic technologies and other European languages and their linguistic technologies. So far, the Polish scientific community has mainly focused on Polish-English connections. Clarin-PL has been developing the first and only multilingual corpora of the Polish language in conjunction with other Slavic languages and the Lithuanian language: the Polish-Bulgarian-Russian Parallel Corpus and the Polish-Lithuanian Parallel Corpus. The parallel corpora created by the ISS PAS Corpus Linguistics and Semantics Team break through the existing “canons” and allow scientists access to interlinked multilingual language resources, in the first phase limited to the languages of the three Slavic groups and the Lithuanian language. In the article, the authors present very detailed information on their original system of the semantic annotation of scope quantification in multilingual parallel corpora, hitherto unused in the subject literature. Due to the system’s originality, the semantic annotation is carried out manually. The identification of particular values of scope quantification in a sentence and the attempts at recording it presented here are supported by long-term research conducted by an international team of linguists and computer scientists / mathematicians working on the quantification of names, time and aspect in natural languages.
5

MORGENSTERN, MATTHEW. "Linguistic notes on magic bowls in the Moussaieff Collection." Bulletin of the School of Oriental and African Studies 68, no. 3 (October 2005): 349–67. http://dx.doi.org/10.1017/s0041977x05000200.

Abstract:
The study of Babylonian Aramaic magic bowls has undergone something of a renaissance in recent years owing to a wave of new publications which have considerably enriched the corpus of texts now at the reader's disposal. In parallel, significant progress has been made in the identification, publication and study of accurate manuscripts of Babylonian Rabbinic literature. Based upon a close reading of recently published magic bowls from the Moussaieff Collection, this article seeks to indicate how the two corpora are mutually enlightening and dependent. An accurate reading and interpretation of the bowls depends upon an intimate knowledge of Babylonian Aramaic grammar, for which much is to be learnt from the best rabbinic manuscripts. At the same time, the magic bowls confirm the antiquity of many linguistic traits found in these rabbinic sources, and accordingly confirm their status as reliable witnesses to the Talmudic language.
6

Calzada Pérez, María. "The group in the self." Pragmatics. Quarterly Publication of the International Pragmatics Association (IPrA) 29, no. 3 (February 22, 2019): 357–83. http://dx.doi.org/10.1075/prag.18026.cal.

Abstract:
Drawing on theoretical approaches to personal/group behaviour, and informed by Michael Hoey’s priming theory, this paper presents a corpus-assisted discourse study of European Parliament interventions from 2004 to 2011. The study aims to identify the group in the self and the various selves in the individual. For the analysis, three corpora from the European Comparable and Parallel Corpus Archive are explored: EP_EN (with EP interventions: 26,959,446 tokens), HC (with House of Commons interventions: 70,567,728), and SandD_david_martin (with member of European Parliament – MEP – David Martin’s interventions: 116,781). The main tool of analysis is the keyword, as generated by WordSmith 7.0. The analysis proceeds in three stages: stage 1, where the EP_EN and HC wordlists are compared, resulting in EP key priming; stage 2, where the SandD_david_martin and HC wordlists are compared, exposing David Martin’s idiosyncratic productions; and stage 3, where the EP_EN and SandD_david_martin keyword lists are manually compared, leading to the identification of EP priming in David Martin’s interventions.
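The wordlist comparison described in stages 1 and 2 can be illustrated with a small keyness calculation. The abstract does not say which statistic was used, so the sketch below assumes log-likelihood, one common choice for keyword extraction. The function name and the word frequencies are hypothetical; only the two corpus sizes come from the abstract.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of one word in a study corpus vs. a reference corpus.

    Assumption: this statistic stands in for whatever keyness measure
    the study actually used; the abstract does not name one.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the word were equally likely in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    score = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Corpus sizes from the abstract; the word counts are invented for illustration.
EP_EN_TOKENS = 26_959_446
HC_TOKENS = 70_567_728
print(log_likelihood(40_000, EP_EN_TOKENS, 20_000, HC_TOKENS))
```

A word whose relative frequency is the same in both corpora scores zero; the larger the score, the stronger the evidence that the word is distributed unevenly between the two wordlists.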
7

Váradi, Tamás. "Fishing for Translation Equivalents Using Grammatical Anchors." International Journal of Corpus Linguistics 5, no. 1 (July 28, 2000): 1–16. http://dx.doi.org/10.1075/ijcl.5.1.02var.

Abstract:
Bilingual parallel corpora offer a treasure house of human translators’ knowledge of the correspondences between the two languages. Extracting by automatic means the translation equivalents deemed accurate and contextually appropriate by a human translator is of great practical importance for various fields such as example-based machine translation, computational lexicography, information retrieval, etc. The task of word or phrase level identification is greatly reduced if suitable anchor points can be found in the stream of texts. It is suggested that grammatical morphemes provide very useful clues to finding translation equivalents. They typically form a closed set, occur frequently enough in sentences, have more or less fixed meanings, and, most important, will stand in a one-to-one or at most one-to-few relationship with corresponding elements in the other language. This paper will explore the viability of the idea with reference to the Hungarian and English versions of Plato’s Republic, which are available in sentence-aligned form. Hungarian has a rich set of suffixes which are typically deployed in a concatenated manner. Corresponding to them in English are prepositions, auxiliary words, and suffixes. The paper will show how, by starting from a well-defined set of correspondences between Hungarian grammatical morphemes and their equivalents and using a combination of pattern matching and heuristics, one can arrive at a mapping of phrases between the two texts.
8

Bertrand, Marianne, Matilde Bombardini, Raymond Fisman, and Francesco Trebbi. "Tax-Exempt Lobbying: Corporate Philanthropy as a Tool for Political Influence." American Economic Review 110, no. 7 (July 1, 2020): 2065–102. http://dx.doi.org/10.1257/aer.20180615.

Abstract:
We explore the role of charitable giving as a means of political influence. For philanthropic foundations associated with large US corporations, we present three different identification strategies that consistently point to the use of corporate social responsibility in ways that parallel the strategic use of political action committee (PAC) spending. Our estimates imply that 6.3 percent of corporate charitable giving may be politically motivated, an amount 2.5 times larger than annual PAC contributions and 35 percent of federal lobbying. Absent of disclosure requirements, charitable giving may be a form of corporate political influence undetected by voters and subsidized by taxpayers. (JEL D22, D64, D72, L31)
9

Walkowski, Michał, Maciej Krakowiak, Jacek Oko, and Sławomir Sujecki. "Efficient Algorithm for Providing Live Vulnerability Assessment in Corporate Network Environment." Applied Sciences 10, no. 21 (November 9, 2020): 7926. http://dx.doi.org/10.3390/app10217926.

Abstract:
The time gap between the public announcement of a vulnerability and its detection and reporting to stakeholders is an important factor for the cybersecurity of corporate networks. A large delay preceding the elimination of a critical vulnerability presents a significant risk to network security and increases the probability of sustained damage. Thus, accelerating the process of vulnerability identification and prioritization helps to reduce the probability of a successful cyberattack. This work introduces a flexible system that collects information about all known vulnerabilities present in the system, gathers data from the organizational inventory database, and finally integrates and processes all collected information. Thanks to the application of parallel processing and non-relational databases, the results of this process are available subject to a negligible delay. The subsequent vulnerability prioritization is performed automatically on the basis of the calculated CVSS 2.0 and 3.1 scores for all scanned assets. The environmental CVSS vector component is evaluated accurately thanks to the fact that the environmental data is imported directly from the organizational inventory database.
10

Bolboli, Seyed Amir, and Markus Reiche. "Introducing a concept for efficient design of EFQM excellence model." TQM Journal 27, no. 4 (June 8, 2015): 382–96. http://dx.doi.org/10.1108/tqm-01-2015-0012.

Abstract:
Purpose – The purpose of this paper is to propose a roadmap for operationalizing the EFQM excellence model based on the RADAR logic and in parallel develop a new concept for selecting the firm-specific EFQM measures based on the level of maturity and the prevailing corporate culture. Design/methodology/approach – A comprehensive review of literature leads to a clarification of the relation between EFQM measures and RADAR logic and also to the identification of the requirements for assessment of culture and determination of maturity level in the context of the EFQM excellence model. Based on these requirements, existing culture assessment approaches and maturity assessment methods have been evaluated. Findings – The main outcome of this research is a new concept for efficient design of the EFQM excellence model. This concept consists of three main parts: assessment of culture types in the context of EFQM; assessment of maturity level; and design of EFQM measures based on RADAR logic. The findings are expected to reduce the effort for implementation of EFQM by designing tailored measures that fit the existing culture and maturity level. Practical implications – The findings of this study are relevant to large multinational firms that deal with EFQM or similar excellence models. Originality/value – This paper presents a new concept for designing EFQM in the light of prevailing corporate culture and maturity level, which on the one hand needs fewer resources and on the other hand is more effective in implementation.

Dissertations / Theses on the topic "Parallel corpora identification"

1

Asian, Jelita. "Effective Techniques for Indonesian Text Retrieval." RMIT University. Computer Science and Information Technology, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080110.084651.

Abstract:
The Web is a vast repository of data, and information on almost any subject can be found with the aid of search engines. Although the Web is international, the majority of research on finding of information has a focus on languages such as English and Chinese. In this thesis, we investigate information retrieval techniques for Indonesian. Although Indonesia is the fourth most populous country in the world, little attention has been given to search of Indonesian documents. Stemming is the process of reducing morphological variants of a word to a common stem form. Previous research has shown that stemming is language-dependent. Although several stemming algorithms have been proposed for Indonesian, there is no consensus on which gives better performance. We empirically explore these algorithms, showing that even the best algorithm still has scope for improvement. We propose novel extensions to this algorithm and develop a new Indonesian stemmer, and show that these can improve stemming correctness by up to three percentage points; our approach makes fewer than one error in thirty-eight words. We propose a range of techniques to enhance the performance of Indonesian information retrieval. These techniques include stopping, sub-word tokenisation, identification of proper nouns, and modifications to existing similarity functions. Our experiments show that many of these techniques can increase retrieval performance, with the highest increase achieved when we use grams of size five to tokenise words. We also present an effective method for identifying the language of a document; this allows various information retrieval techniques to be applied selectively depending on the language of target documents. We also address the problem of automatic creation of parallel corpora (collections of documents that are the direct translations of each other), which are essential for cross-lingual information retrieval tasks.
Well-curated parallel corpora are rare, and for many languages, such as Indonesian, do not exist at all. We describe algorithms that we have developed to automatically identify parallel documents for Indonesian and English. Unlike most current approaches, which consider only the context and structure of the documents, our approach is based on the document content itself. Our algorithms do not make any prior assumptions about the documents, and are based on the Needleman-Wunsch algorithm for global alignment of protein sequences. Our approach works well in identifying Indonesian-English parallel documents, especially when no translation is performed. It can increase the separation value, a measure to discriminate good matches of parallel documents from bad matches, by approximately ten percentage points. We also investigate the applicability of our identification algorithms for other languages that use the Latin alphabet. Our experiments show that, with minor modifications, our alignment methods are effective for English-French, English-German, and French-German corpora, especially when the documents are not translated. Our technique can increase the separation value for the European corpus by up to twenty-eight percentage points. Together, these results provide a substantial advance in understanding techniques that can be applied for effective Indonesian text retrieval.
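The Needleman-Wunsch algorithm named in the abstract can be sketched as below. This is a minimal illustration of global alignment scoring, not the thesis's implementation: applying it to lists of word tokens, the function name, and the scoring values (match, mismatch, gap) are all assumptions made for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return the global alignment score of two sequences.

    `a` and `b` may be strings or lists of tokens. The scoring
    parameters are illustrative defaults, not values from the thesis.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per element.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


# Hypothetical untranslated documents that share names and numbers.
doc_id = "hotel jakarta 2007 wifi".split()
doc_en = "hotel jakarta 2007 internet".split()
print(needleman_wunsch(doc_id, doc_en))
```

Tokens that survive translation unchanged, such as proper nouns and numbers, raise the score, which is one plausible reading of why content-based alignment can work without translating the documents first.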

Books on the topic "Parallel corpora identification"

1

Morgera, Elisa. Corporate Environmental Accountability in International Law. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780198738046.001.0001.

Abstract:
This book explores the evolving role of international law in directing and controlling the conduct of business enterprises, in particular multinational corporations, with respect to the protection of the environment, the sustainable use of natural resources, and the respect of inter-related human rights. It assesses the progress and continuing limitations in the identification of international standards of corporate environmental accountability and responsibility, and their implementation by international organizations. This assessment indicates the extent to which the international community has conceptually and operationally clarified its expectations about acceptable corporate conduct. This second edition relates the intensified convergence of international standard-setting efforts on corporate environmental accountability, with parallel international developments on business and human rights and on the inter-relationship between human rights and the environment. It also explores the more recent emergence of substantive international standards of corporate environmental responsibility, which have arisen from a growing number of sectoral guidelines. In addition, this edition points to remaining divergences in the content of international standards of corporate environmental accountability and responsibility, which reflect differing views between States of their international obligations to ensure the protection of the environment and the respect of human rights.

Conference papers on the topic "Parallel corpora identification"

1

Reynaert, Martin. "Parallel identification of the spelling variants in corpora." In The Third Workshop. New York, New York, USA: ACM Press, 2009. http://dx.doi.org/10.1145/1568296.1568310.

2

Davoodi, Elnaz, and Leila Kosseim. "Automatic Identification of AltLexes using Monolingual Parallel Corpora." In RANLP 2017 - Recent Advances in Natural Language Processing Meet Deep Learning. Incoma Ltd. Shoumen, Bulgaria, 2017. http://dx.doi.org/10.26615/978-954-452-049-6_027.

3

Cardon, Rémi, and Natalia Grabar. "Identification of Parallel Sentences in Comparable Monolingual Corpora from Different Registers." In Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis. Stroudsburg, PA, USA: Association for Computational Linguistics, 2018. http://dx.doi.org/10.18653/v1/w18-5610.
