Journal articles on the topic 'Language identification'

Author: Grafiati

Published: 4 June 2021

Last updated: 1 June 2024

Consult the top 50 journal articles for your research on the topic 'Language identification.'

1

Suganthi, Mrs Dr V., C. Thavapriya, and T. Mirudhu Bashini. "Sign Language Identification." International Journal of Research Publication and Reviews 5, no. 3 (March 21, 2024): 5997–6001. http://dx.doi.org/10.55248/gengpi.5.0324.0855.

2

Kumar, P. Vijay, and A. Raviteja. "Automatic Indian Language Identification." International Journal of Scientific Research 2, no. 4 (June 1, 2012): 79–82. http://dx.doi.org/10.15373/22778179/apr2013/31.

3

MALMASI, SHERVIN, and MARK DRAS. "Multilingual native language identification." Natural Language Engineering 23, no. 2 (December 2, 2015): 163–215. http://dx.doi.org/10.1017/s1351324915000406.

Abstract:

We present the first comprehensive study of Native Language Identification (NLI) applied to text written in languages other than English, using data from six languages. NLI is the task of predicting an author’s first language using only their writings in a second language, with applications in Second Language Acquisition and forensic linguistics. Most research to date has focused on English but there is a need to apply NLI to other languages, not only to gauge its applicability but also to aid in teaching research for other emerging languages. With this goal, we identify six typologically very different sources of non-English second language data and conduct six experiments using a set of commonly used features. Our first two experiments evaluate our features and corpora, showing that the features perform well and at similar rates across languages. The third experiment compares non-native and native control data, showing that they can be discerned with 95 per cent accuracy. Our fourth experiment provides a cross-linguistic assessment of how the degree of syntactic data encoded in part-of-speech tags affects their efficiency as classification features, finding that most differences between first language groups lie in the ordering of the most basic word categories. We also tackle two questions that have not previously been addressed for NLI. Other work in NLI has shown that ensembles of classifiers over feature types work well and in our final experiment we use such an oracle classifier to derive an upper limit for classification accuracy with our feature set. We also present an analysis examining feature diversity, aiming to estimate the degree of overlap and complementarity between our chosen features employing an association measure for binary data. Finally, we conclude with a general discussion and outline directions for future work.

4

Qafmolla, Nejla. "Automatic Language Identification." European Journal of Language and Literature 7, no. 1 (January 21, 2017): 140. http://dx.doi.org/10.26417/ejls.v7i1.p140-150.

Abstract:

Automatic Language Identification (LID) is the process of automatically identifying the language of a spoken utterance or written material. LID has received much attention due to its application to major areas of research and long-aspired dreams in computational sciences, namely Machine Translation (MT), Speech Recognition (SR) and Data Mining (DM). A considerable increase in the amount of and access to data provided not only by experts but also by users all over the Internet has resulted both in the development of different approaches in the area of LID, so as to generate more efficient systems, and in major challenges that are still in the eye of the storm of this field. Although the current approaches have accomplished considerable success, some issues remain open for future research. The aim of this paper is not to describe the historical background of this field of study, but rather to provide an overview of the current state of LID systems, as well as to classify the approaches developed to accomplish them. LID systems have advanced and are continuously evolving. Some of the issues that need special attention and improvement are semantics, the identification of various dialects and varieties of a language, identification of spelling errors, data retrieval, multilingual documents, MT and speech-to-speech translation. Methods applied to date have been good from a technical point of view, but not from a semantic one.

5

Zissman, Marc A., and Kay M. Berkling. "Automatic language identification." Speech Communication 35, no. 1-2 (August 2001): 115–24. http://dx.doi.org/10.1016/s0167-6393(00)00099-6.

6

Van Segbroeck, Maarten, Ruchir Travadi, and Shrikanth S. Narayanan. "Rapid Language Identification." IEEE/ACM Transactions on Audio, Speech, and Language Processing 23, no. 7 (July 2015): 1118–29. http://dx.doi.org/10.1109/taslp.2015.2419978.

7

Ranasinghe, Tharindu, and Marcos Zampieri. "Multilingual Offensive Language Identification for Low-resource Languages." ACM Transactions on Asian and Low-Resource Language Information Processing 21, no. 1 (January 31, 2022): 1–13. http://dx.doi.org/10.1145/3457610.

Abstract:

Offensive content is pervasive in social media and a reason for concern to companies and government organizations. Several studies have been recently published investigating methods to detect the various forms of such content (e.g., hate speech, cyberbullying, and cyberaggression). The clear majority of these studies deal with English, partially because most annotated datasets available contain English data. In this article, we take advantage of available English datasets by applying cross-lingual contextual word embeddings and transfer learning to make predictions in low-resource languages. We project predictions on comparable data in Arabic, Bengali, Danish, Greek, Hindi, Spanish, and Turkish. We report results of 0.8415 F1 macro for Bengali in TRAC-2 shared task [23], 0.8532 F1 macro for Danish and 0.8701 F1 macro for Greek in OffensEval 2020 [58], 0.8568 F1 macro for Hindi in HASOC 2019 shared task [27], and 0.7513 F1 macro for Spanish in SemEval-2019 Task 5 (HatEval) [7], showing that our approach compares favorably to the best systems submitted to recent shared tasks on these three languages. Additionally, we report competitive performance on Arabic and Turkish using the training and development sets of OffensEval 2020 shared task. The results for all languages confirm the robustness of cross-lingual contextual embeddings and transfer learning for this task.

8

Botha, G., V. Zimu, and E. Barnard. "Text-based language identification for south african languages." SAIEE Africa Research Journal 98, no. 4 (December 2007): 141–46. http://dx.doi.org/10.23919/saiee.2007.9485636.

9

Jothilakshmi, S., V. Ramalingam, and S. Palanivel. "A hierarchical language identification system for Indian languages." Digital Signal Processing 22, no. 3 (May 2012): 544–53. http://dx.doi.org/10.1016/j.dsp.2011.11.008.

10

Baimyrza, A. "LANGUAGE IDENTIFICATION PROCESSES OF THE YOUTH." Tiltanym, no. 3 (September 30, 2021): 28–36. http://dx.doi.org/10.55491/2411-6076-2021-3-28-36.

Abstract:

The article deals with the role of the Russian language in the processes of language identification of students. It presents the results of a sociolinguistic survey whose main objectives were determined by the need to obtain information on the following aspects of the language situation: the degree of knowledge among young people of the state language, Russian, and other languages; the level and nature of social preferences regarding the use of languages in various spheres of life; and the nature of the social and language preferences of the young population. A review of theoretical works by Kazakhstani and foreign scholars on this subject is given. The study concludes that the Russian language plays a significant role in the formation of linguistic identity, owing not only to historical realities, in particular the language policy pursued over a long period, but also to the conscious choice of language. The studies indicate that language proficiency and language use are a factor in the socialization of young people and shape how individuals interact with their social environment.

11

Gamallo, Pablo, José Ramom Pichel, and Iñaki Alegria. "From language identification to language distance." Physica A: Statistical Mechanics and its Applications 484 (October 2017): 152–62. http://dx.doi.org/10.1016/j.physa.2017.05.011.

12

Deshpande, Mr Onkar. "Postal Address Identification and Sorting." International Journal for Research in Applied Science and Engineering Technology 9, no. VI (June 30, 2021): 4946–53. http://dx.doi.org/10.22214/ijraset.2021.36023.

Abstract:

In this fast-moving world, a person can take considerable time to find one postcard in a stack because of issues such as unclear handwriting and uncommon or ambiguous names. In post offices and related industries, this reduces the efficiency of the postal system. I am building a system for Indian postal automation based on recognizing the pin code on the postcard. Multiple languages are spoken in India. Indian postcards are mainly written in three languages: the state's official language, English, and the Devanagari language. In India, more than 50% of people write pin-code digits in either English or Devanagari, so the system sorts both English and Devanagari postcards. Moreover, the system is mature enough to recognize handwritten as well as printed digits. The system achieves an accuracy of 92.59% on English postcards and 90% on Devanagari postcards, and the digit recognition model achieves 99.23% accuracy on Devanagari numerals and 99.43% accuracy on English numerals.

13

Shimi, G., C. Jerin Mahibha, and Durairaj Thenmozhi. "An Empirical Analysis of Language Detection in Dravidian Languages." Indian Journal Of Science And Technology 17, no. 15 (April 16, 2024): 1515–26. http://dx.doi.org/10.17485/ijst/v17i15.765.

Abstract:

Objectives: Language detection is the process of identifying the language associated with a text. The proposed system aims to detect the Dravidian language associated with a given text using different machine learning and deep learning algorithms. The paper presents an empirical analysis of the results obtained using the different models. It also aims to evaluate the performance of a language-agnostic model for the purpose of language detection. Method: An empirical analysis of Dravidian language identification in social media text using machine learning and deep learning approaches with k-fold cross-validation has been implemented. The identification of Dravidian languages, including Tamil, Malayalam, Tamil Code Mix, and Malayalam Code Mix, is performed using both machine learning (ML) and deep learning (DL) algorithms. The machine learning algorithms used for language detection are Naive Bayes (NB), Multinomial Logistic Regression (MLR), Support Vector Machine (SVM), and Random Forest (RF). The supervised DL models used include BERT, mBERT, and language-agnostic models. Findings: The language-agnostic model outperforms all other models on the task of language detection in Dravidian languages. The results of both the ML and DL models are analyzed empirically with performance measures such as accuracy, precision, recall, and F1-score. The accuracy of the different machine learning algorithms varies from 85% to 89%. The experimental results show that the deep learning model performed best, with an accuracy of 98%. Novelty: The proposed system emphasizes the use of the language-agnostic model for detecting the Dravidian language associated with a given text, which yields a promising accuracy of 98%, higher than existing methodologies. Keywords: Language, Machine learning, Deep learning, Transformer model, Encoder, Decoder

14

Del Bonifro, Francesca, Maurizio Gabbrielli, Antonio Lategano, and Stefano Zacchiroli. "Image-based many-language programming language identification." PeerJ Computer Science 7 (July 23, 2021): e631. http://dx.doi.org/10.7717/peerj-cs.631.

Abstract:

Programming language identification (PLI) is a common need in automatic program comprehension as well as a prerequisite for deeper forms of code understanding. Image-based approaches to PLI have recently emerged and are appealing due to their applicability to code screenshots and programming video tutorials. However, they remain limited to the recognition of a small amount of programming languages (up to 10 languages in the literature). We show that it is possible to perform image-based PLI on a large number of programming languages (up to 149 in our experiments) with high (92%) precision and recall, using convolutional neural networks (CNNs) and transfer learning, starting from readily-available pretrained CNNs. Results were obtained on a large real-world dataset of 300,000 code snippets extracted from popular GitHub repositories. By scrambling specific character classes and comparing identification performances we also show that the characters that contribute the most to the visual recognizability of programming languages are symbols (e.g., punctuation, mathematical operators and parentheses), followed by alphabetic characters, with digits and indentation having a negligible impact.

15

Barnard, Etienne, and Yonghong Yan. "Toward new language adaptation for language identification." Speech Communication 21, no. 4 (May 1997): 245–54. http://dx.doi.org/10.1016/s0167-6393(97)00009-5.

16

Orena, Adriel John, Linda Polka, and Rachel M. Theodore. "Language familiarity mediates identification of bilingual talkers across languages." Journal of the Acoustical Society of America 140, no. 4 (October 2016): 3227. http://dx.doi.org/10.1121/1.4970197.

17

Sujaini, Herry, and Arif Bijaksana Putra. "Analysis of language identification algorithms for regional Indonesian languages." IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 2 (June 1, 2024): 1741. http://dx.doi.org/10.11591/ijai.v13.i2.pp1741-1752.

Abstract:

Detecting local languages in Indonesia is essential for recognizing linguistic diversity, promoting intercultural understanding, preserving endangered languages, and improving access to education and services. By identifying and documenting these languages, we can support language preservation efforts, provide tailored resources for communities, and celebrate the unique cultural heritage of different ethnic groups. Ultimately, this encourages a more accepting and open-minded society, prioritizing various languages and cultural customs. This research aims to identify the most suitable algorithm for language detection in Indonesian regional languages and gain insights into their unique characteristics through n-gram analysis. By understanding language diversity, the study contributes to preserving Indonesia's cultural and linguistic heritage and improving language detection techniques. This study compares the performance of five algorithms (Naïve Bayes, K-nearest neighbors (KNN), least-squares, Kullback Leibler divergence, and Kolmogorov Smirnov test) to determine the most accurate and efficient method for language identification. Incorporating trigram features alongside unigrams and bigrams significantly improved the model's performance, with F1 scores increasing from 0.923 to 0.959. The study found that using more features leads to better accuracy, with Naïve Bayes and KNN emerging as the top-performing algorithms for language identification.

18

Singh, Gundeep, Sahil Sharma, Vijay Kumar, Manjit Kaur, Mohammed Baz, and Mehedi Masud. "Spoken Language Identification Using Deep Learning." Computational Intelligence and Neuroscience 2021 (September 20, 2021): 1–12. http://dx.doi.org/10.1155/2021/5123671.

Abstract:

Spoken language identification (SLID) is the process of detecting the language of an audio clip from an unknown speaker, regardless of the speaker's gender, manner of speaking, and age. The main challenge is to recognize the features that distinguish between languages clearly and efficiently. The model takes audio files and converts them into spectrogram images. It applies a convolutional neural network (CNN) to extract the main attributes or features so that the output can be detected easily. The main objective is to detect languages out of English, French, Spanish, German, Estonian, Tamil, Mandarin, Turkish, Chinese, Arabic, Hindi, Indonesian, Portuguese, Japanese, Latin, Dutch, Pushto, Romanian, Korean, Russian, Swedish, Thai, and Urdu. An experiment was conducted on different audio files using the Kaggle dataset named spoken language identification. These audio files comprise utterances, each spanning a fixed duration of 10 seconds. The whole dataset is split into training and test sets. Preliminary results give an overall accuracy of 98%. Extensive and accurate testing shows an overall accuracy of 88%.

19

Muthusamy, Y. K., E. Barnard, and R. A. Cole. "Reviewing automatic language identification." IEEE Signal Processing Magazine 11, no. 4 (October 1994): 33–41. http://dx.doi.org/10.1109/79.317925.

20

Ambikairajah, Eliathamby, Haizhou Li, Liang Wang, Bo Yin, and Vidhyasaharan Sethu. "Language Identification: A Tutorial." IEEE Circuits and Systems Magazine 11, no. 2 (2011): 82–108. http://dx.doi.org/10.1109/mcas.2011.941081.

21

van Bezooijen, Renée, and Charlotte Gooskens. "Identification of Language Varieties." Journal of Language and Social Psychology 18, no. 1 (March 1999): 31–48. http://dx.doi.org/10.1177/0261927x99018001003.

22

N.S, Shvaikina, and Laryushkina E.E. "Identification of Ways to Prevent and Overcome the Language Barrier in Foreign Language Classes." Addiction Research and Adolescent Behaviour 5, no. 3 (April 29, 2022): 01–02. http://dx.doi.org/10.31579/2688-7517/046.

Abstract:

Some students lose their motivation to learn a foreign language at school, as they may have had a negative experience. In order for a teacher to increase motivation to learn a second language, it is necessary to create a situation of success.

23

Thomas, Merin, Dr Latha C A, and Antony Puthussery. "Identification of language in a cross linguistic environment." Indonesian Journal of Electrical Engineering and Computer Science 18, no. 1 (April 1, 2020): 544. http://dx.doi.org/10.11591/ijeecs.v18.i1.pp544-548.

Abstract:

The world has become very small due to software internationalization. Applications of machine translation are increasing day by day. Using multiple languages in social media text is a developing trend. The availability of fonts in native languages has increased the use of native text in internet communications. The use of transliteration has become quite common. In the Indian scenario, the current generation is comfortable speaking in the native language but not reading and writing it, and has therefore started using English representations of the native language in text messages. This paper describes the identification of transliterated text in a cross-lingual environment. In this paper, a neural network model identifies the prominent language in the text, which can then be used to identify the meaning of the text in the language concerned. The model is based on Recurrent Neural Networks, which have been found to be the most efficient in machine translation. Language identification can serve as a base for many applications in a multilingual environment. Currently, the South Indian languages Malayalam and Tamil are identified from the given text. An algorithmic approach based on a stop-word model is also described in this paper. The model can be extended to address all the Indian languages in use.

24

Nugraha, Azhar Baihaqi, and Ade Romadhony. "Identification of 10 Regional Indonesian Languages Using Machine Learning." sinkron 8, no. 4 (October 1, 2023): 2203–14. http://dx.doi.org/10.33395/sinkron.v8i4.12989.

Abstract:

Language Identification plays a pivotal role in deciphering the rich tapestry of Indonesia's diverse regional languages, encompassing a wide spectrum of scripts, and spoken forms. Language Identification, an integral component of Natural Language Processing, is frequently addressed through Text Classification. In this study, we embark on the task of identifying 10 Indonesian languages, leveraging the NusaX dataset, with the overarching objective of contextual language determination. To achieve this, we harness a diverse array of machine learning techniques, including Support Vector Machine, Naïve Bayes Classifier, Decision Tree, Rocchio Classification, Logistic Regression, and Random Forest. We complement these methods with two distinct feature extraction approaches: N-gram and TF-IDF. This comprehensive approach enables us to construct robust models for language identification. Our findings unveil the strong efficacy of these models in discerning Indonesian languages, with the Naïve Bayes Classifier emerging as the frontrunner, achieving an impressive accuracy rate of 99.2% with TF-IDF and an even more remarkable 99.4% with N-Gram. To gain deeper insights, we delve into error analysis, revealing that misclassifications often stem from shared words across different languages. This research is underpinned by the necessity for a robust language identification model, underscoring its critical role within the complex linguistic landscape of Indonesian regional languages. These results hold great promise for applications in automated language processing and understanding within this diverse and multifaceted linguistic context.

25

Irtza, Saad, Vidhyasaharan Sethu, Eliathamby Ambikairajah, and Haizhou Li. "Using language cluster models in hierarchical language identification." Speech Communication 100 (June 2018): 30–40. http://dx.doi.org/10.1016/j.specom.2018.04.004.

26

Selamat, Ali, and Nicholas Akosu. "Word-length algorithm for language identification of under-resourced languages." Journal of King Saud University - Computer and Information Sciences 28, no. 4 (October 2016): 457–69. http://dx.doi.org/10.1016/j.jksuci.2014.12.004.

27

Chakravarthi, Bharathi Raja, Manoj Balaji Jagadeeshan, Vasanth Palanikumar, and Ruba Priyadharshini. "Offensive language identification in dravidian languages using MPNet and CNN." International Journal of Information Management Data Insights 3, no. 1 (April 2023): 100151. http://dx.doi.org/10.1016/j.jjimei.2022.100151.

28

Asubiaro, Toluwase, Tunde Adegbola, Robert Mercer, and Isola Ajiferuke. "A word‐level language identification strategy for resource‐scarce languages." Proceedings of the Association for Information Science and Technology 55, no. 1 (January 2018): 19–28. http://dx.doi.org/10.1002/pra2.2018.14505501004.

29

Hidayatullah, Ahmad Fathan, Rosyzie Anna Apong, Daphne T. C. Lai, and Atika Qazi. "Corpus creation and language identification for code-mixed Indonesian-Javanese-English Tweets." PeerJ Computer Science 9 (June 22, 2023): e1312. http://dx.doi.org/10.7717/peerj-cs.1312.

Abstract:

With the massive use of social media today, mixing between languages in social media text is prevalent. In linguistics, the phenomenon of mixing languages is known as code-mixing. The prevalence of code-mixing exposes various concerns and challenges in natural language processing (NLP), including language identification (LID) tasks. This study presents a word-level language identification model for code-mixed Indonesian, Javanese, and English tweets. First, we introduce a code-mixed corpus for Indonesian-Javanese-English language identification (IJELID). To ensure reliable dataset annotation, we provide full details of the data collection and annotation standards construction procedures. Some challenges encountered during corpus creation are also discussed in this paper. Then, we investigate several strategies for developing code-mixed language identification models, such as fine-tuning BERT, BLSTM-based, and CRF. Our results show that fine-tuned IndoBERTweet models can identify languages better than the other techniques. This is the result of BERT’s ability to understand each word’s context from the given text sequence. Finally, we show that sub-word language representation in BERT models can provide a reliable model for identifying languages in code-mixed texts.

30

Bose, Smarajit, Amita Pal, Anish Mukherjee, and Debasmita Das. "Improved Language-Independent Speaker Identification in a Non-contemporaneous Setup." International Journal of Machine Learning and Computing 10, no. 5 (October 5, 2020): 630–36. http://dx.doi.org/10.18178/ijmlc.2020.10.5.984.

31

Lui, Marco, Jey Han Lau, and Timothy Baldwin. "Automatic Detection and Language Identification of Multilingual Documents." Transactions of the Association for Computational Linguistics 2 (December 2014): 27–40. http://dx.doi.org/10.1162/tacl_a_00163.

Abstract:

Language identification is the task of automatically detecting the language(s) present in a document based on the content of the document. In this work, we address the problem of detecting documents that contain text from more than one language (multilingual documents). We introduce a method that is able to detect that a document is multilingual, identify the languages present, and estimate their relative proportions. We demonstrate the effectiveness of our method over synthetic data, as well as real-world multilingual documents collected from the web.

32

Barlas, P., D. Hebert, C. Chatelain, S. Adam, and T. Paquet. "Language Identification in Document Images." Electronic Imaging 2016, no. 17 (February 17, 2016): 1–16. http://dx.doi.org/10.2352/issn.2470-1173.2016.17.drr-058.

33

Sadhukhan, Tanusree, Shweta Bansal, and Atul Kumar. "Automatic Identification of Spoken Language." IOSR Journal of Computer Engineering 19, no. 02 (May 2017): 84–89. http://dx.doi.org/10.9790/0661-1902058489.

34

Barlas, P., D. Hebert, C. Chatelain, S. Adam, and T. Paquet. "Language Identification in Document Images." Journal of Imaging Science and Technology 60, no. 1 (January 1, 2016): 104071–1040716. http://dx.doi.org/10.2352/j.imagingsci.technol.2016.60.1.010407.

35

Saini, Shubham. "LANGUAGE IDENTIFICATION USING G-LDA." International Journal of Research in Engineering and Technology 02, no. 11 (November 25, 2013): 42–45. http://dx.doi.org/10.15623/ijret.2013.0211008.

36

Mahender, C. Namrata, Ramesh Ram Naik, and Maheshkumar Bhujangrao Landge. "Author Identification for Marathi Language." Advances in Science, Technology and Engineering Systems Journal 5, no. 2 (2020): 432–40. http://dx.doi.org/10.25046/aj050256.

37

Hazen, Timothy J., and Victor W. Zue. "Segment-based automatic language identification." Journal of the Acoustical Society of America 101, no. 4 (April 1997): 2323–31. http://dx.doi.org/10.1121/1.418211.

38

Li, Kung-Pu. "Automatic language identification/verification system." Journal of the Acoustical Society of America 104, no. 1 (July 1998): 31. http://dx.doi.org/10.1121/1.424049.

39

Dutta, Arup Kumar, and K. Sreenivasa Rao. "Language identification using phase information." International Journal of Speech Technology 21, no. 3 (December 12, 2017): 509–19. http://dx.doi.org/10.1007/s10772-017-9482-5.

40

Newman, Jacob L., and Stephen J. Cox. "Language Identification Using Visual Features." IEEE Transactions on Audio, Speech, and Language Processing 20, no. 7 (September 2012): 1936–47. http://dx.doi.org/10.1109/tasl.2012.2191956.

41

Jain, S., and A. Sharma. "Prudence in vacillatory language identification." Mathematical Systems Theory 28, no. 3 (May 1995): 267–79. http://dx.doi.org/10.1007/bf01303059.

42

Souter, Clive, et al. "Natural Language Identification using Corpus-Based Models." HERMES - Journal of Language and Communication in Business 7, no. 13 (January 4, 2017): 183. http://dx.doi.org/10.7146/hjlcb.v7i13.25083.

Abstract:

This paper describes three approaches to the task of automatically identifying the language a text is written in. We conducted experiments to compare the success of each approach in identifying languages from a set of texts in Dutch/Friesian, English, French, Gaelic (Irish), German, Italian, Portuguese, Serbo-Croat and Spanish.....

43

Jauhiainen, T., K. Lindén, and H. Jauhiainen. "Language model adaptation for language and dialect identification of text." Natural Language Engineering 25, no. 5 (July 31, 2019): 561–83. http://dx.doi.org/10.1017/s135132491900038x.

Abstract:

This article describes an unsupervised language model (LM) adaptation approach that can be used to enhance the performance of language identification methods. The approach is applied to a current version of the HeLI language identification method, which is now called HeLI 2.0. We describe the HeLI 2.0 method in detail. The resulting system is evaluated using the datasets from the German dialect identification and Indo-Aryan language identification shared tasks of the VarDial workshops 2017 and 2018. The new approach with LM adaptation provides considerably higher F1-scores than the basic HeLI or HeLI 2.0 methods or the other systems which participated in the shared tasks. The results indicate that unsupervised LM adaptation should be considered as an option in all language identification tasks, especially in those where encountering out-of-domain data is likely.

44

Avram, Andrei-Marius, Verginica Barbu Mititelu, Vasile Păiș, Dumitru-Clementin Cercel, and Ștefan Trăușan-Matu. "Multilingual Multiword Expression Identification Using Lateral Inhibition and Domain Adaptation." Mathematics 11, no. 11 (June 1, 2023): 2548. http://dx.doi.org/10.3390/math11112548.

Abstract:

Correctly identifying multiword expressions (MWEs) is an important task for most natural language processing systems since their misidentification can result in ambiguity and misunderstanding of the underlying text. In this work, we evaluate the performance of the mBERT model for MWE identification in a multilingual context by training it on all 14 languages available in version 1.2 of the PARSEME corpus. We also incorporate lateral inhibition and language adversarial training into our methodology to create language-independent embeddings and improve its capabilities in identifying multiword expressions. The evaluation of our models shows that the approach employed in this work achieves better results compared to the best system of the PARSEME 1.2 competition, MTLB-STRUCT, on 11 out of 14 languages for global MWE identification and on 12 out of 14 languages for unseen MWE identification. Additionally, averaged across all languages, our best approach outperforms the MTLB-STRUCT system by 1.23% on global MWE identification and by 4.73% on unseen global MWE identification.

45

Wijonarko, Panji, and Amalia Zahra. "Spoken language identification on 4 Indonesian local languages using deep learning." Bulletin of Electrical Engineering and Informatics 11, no. 6 (December 1, 2022): 3288–93. http://dx.doi.org/10.11591/eei.v11i6.4166.

Abstract:

Language identification is at the forefront of assistance in many applications, including multilingual speech systems, spoken language translation, multilingual speech recognition, and human-machine interaction via voice. The identification of Indonesian local languages using spoken language identification technology has enormous potential to advance tourism and digital content in Indonesia. The goal of this study is to identify four Indonesian local languages: Javanese, Sundanese, Minangkabau, and Buginese, utilizing deep learning classification techniques such as artificial neural network (ANN), convolutional neural network (CNN), and long short-term memory (LSTM). Mel-frequency cepstral coefficients (MFCC) are the features extracted from the audio data. The results showed that the LSTM model had the highest accuracy for each speech duration (3 s, 10 s, and 30 s), followed by the CNN and ANN models.

46

Menon, Riya. "Detectsy: A System for Detecting Language from the Text, Images, and Audio Files." International Journal for Research in Applied Science and Engineering Technology 10, no. 6 (June 30, 2022): 1975–80. http://dx.doi.org/10.22214/ijraset.2022.44281.

Abstract:

Language detection is a natural language processing task in which the language of a text or document must be identified. As humans, we can easily detect the languages we know. However, no individual can identify a large number of languages. This is where an automatic language identification system can be used. The proposed solution is a complete system that detects language from text, images, and audio files. Language identification from text is carried out by training a Multinomial Naive Bayes classifier model. In the case of image and audio inputs, Python libraries are used to achieve the goal of language detection.

47

Ranasinghe, Tharindu, and Marcos Zampieri. "An Evaluation of Multilingual Offensive Language Identification Methods for the Languages of India." Information 12, no. 8 (July 29, 2021): 306. http://dx.doi.org/10.3390/info12080306.

Abstract:

The pervasiveness of offensive content in social media has become an important reason for concern for online platforms. With the aim of improving online safety, a large number of studies applying computational models to identify such content have been published in the last few years, with promising results. The majority of these studies, however, deal with high-resource languages such as English due to the availability of datasets in these languages. Recent work has addressed offensive language identification from a low-resource perspective, exploring data augmentation strategies and trying to take advantage of existing multilingual pretrained models to cope with data scarcity in low-resource scenarios. In this work, we revisit the problem of low-resource offensive language identification by evaluating the performance of multilingual transformers in offensive language identification for languages spoken in India. We investigate languages from different families such as Indo-Aryan (e.g., Bengali, Hindi, and Urdu) and Dravidian (e.g., Tamil, Malayalam, and Kannada), creating important new technology for these languages. The results show that multilingual offensive language identification models perform better than monolingual models and that cross-lingual transformers show strong zero-shot and few-shot performance across languages.

48

Ellis, Erica M., and Donna J. Thal. "Early Language Delay and Risk for Language Impairment." Perspectives on Language Learning and Education 15, no. 3 (October 2008): 93–100. http://dx.doi.org/10.1044/lle15.3.93.

Abstract:

Clinicians are often faced with the difficult task of deciding whether a late talker shows normal variability or has a clinically significant language disorder. This article provides an overview of research investigating identification, characteristics, outcomes, and predictors of late talkers. Clinical implications for speech-language pathologists in the identification and treatment of children who are late talkers are discussed.

49

Bhuvanagirir, Kiran, and Sunil Kumar Kopparapu. "Mixed Language Speech Recognition without Explicit Identification of Language." American Journal of Signal Processing 2, no. 5 (December 1, 2012): 92–97. http://dx.doi.org/10.5923/j.ajsp.20120205.02.

50

Marchegiani, Letizia, and Xenofon Fafoutis. "On cross-language consonant identification in second language noise." Journal of the Acoustical Society of America 138, no. 4 (October 2015): 2206–9. http://dx.doi.org/10.1121/1.4930955.
