Journal articles on the topic "Language identification"

Consult the top 50 scholarly journal articles on the topic "Language identification".

1

Suganthi, Mrs Dr V., C. Thavapriya, and T. Mirudhu Bashini. "Sign Language Identification". International Journal of Research Publication and Reviews 5, no. 3 (21.03.2024): 5997–6001. http://dx.doi.org/10.55248/gengpi.5.0324.0855.

2

Kumar, P. Vijay, and A. Raviteja. "Automatic Indian Language Identification". International Journal of Scientific Research 2, no. 4 (1.06.2012): 79–82. http://dx.doi.org/10.15373/22778179/apr2013/31.

3

Malmasi, Shervin, and Mark Dras. "Multilingual native language identification". Natural Language Engineering 23, no. 2 (2.12.2015): 163–215. http://dx.doi.org/10.1017/s1351324915000406.

Abstract:
We present the first comprehensive study of Native Language Identification (NLI) applied to text written in languages other than English, using data from six languages. NLI is the task of predicting an author’s first language using only their writings in a second language, with applications in Second Language Acquisition and forensic linguistics. Most research to date has focused on English but there is a need to apply NLI to other languages, not only to gauge its applicability but also to aid in teaching research for other emerging languages. With this goal, we identify six typologically very different sources of non-English second language data and conduct six experiments using a set of commonly used features. Our first two experiments evaluate our features and corpora, showing that the features perform well and at similar rates across languages. The third experiment compares non-native and native control data, showing that they can be discerned with 95 per cent accuracy. Our fourth experiment provides a cross-linguistic assessment of how the degree of syntactic data encoded in part-of-speech tags affects their efficiency as classification features, finding that most differences between first language groups lie in the ordering of the most basic word categories. We also tackle two questions that have not previously been addressed for NLI. Other work in NLI has shown that ensembles of classifiers over feature types work well and in our final experiment we use such an oracle classifier to derive an upper limit for classification accuracy with our feature set. We also present an analysis examining feature diversity, aiming to estimate the degree of overlap and complementarity between our chosen features employing an association measure for binary data. Finally, we conclude with a general discussion and outline directions for future work.
4

Qafmolla, Nejla. "Automatic Language Identification". European Journal of Language and Literature 7, no. 1 (21.01.2017): 140. http://dx.doi.org/10.26417/ejls.v7i1.p140-150.

Abstract:
Automatic Language Identification (LID) is the process of automatically identifying the language of a spoken utterance or written material. LID has received much attention due to its application to major areas of research and long-aspired dreams in computational sciences, namely Machine Translation (MT), Speech Recognition (SR) and Data Mining (DM). A considerable increase in the amount of and access to data provided not only by experts but also by users all over the Internet has resulted both in the development of different approaches in the area of LID (so as to generate more efficient systems) and in major challenges that are still in the eye of the storm of this field. Despite the fact that the current approaches have accomplished considerable success, future research concerning some issues remains on the table. The aim of this paper is not to describe the historic background of this field of studies, but rather to provide an overview of the current state of LID systems, as well as to classify the approaches developed to accomplish them. LID systems have advanced and are continuously evolving. Some of the issues that need special attention and improvement are semantics, the identification of various dialects and varieties of a language, identification of spelling errors, data retrieval, multilingual documents, MT and speech-to-speech translation. Methods applied to date have been good from a technical point of view, but not from a semantic one.
5

Zissman, Marc A., and Kay M. Berkling. "Automatic language identification". Speech Communication 35, no. 1-2 (August 2001): 115–24. http://dx.doi.org/10.1016/s0167-6393(00)00099-6.

6

Van Segbroeck, Maarten, Ruchir Travadi, and Shrikanth S. Narayanan. "Rapid Language Identification". IEEE/ACM Transactions on Audio, Speech, and Language Processing 23, no. 7 (July 2015): 1118–29. http://dx.doi.org/10.1109/taslp.2015.2419978.

7

Ranasinghe, Tharindu, and Marcos Zampieri. "Multilingual Offensive Language Identification for Low-resource Languages". ACM Transactions on Asian and Low-Resource Language Information Processing 21, no. 1 (31.01.2022): 1–13. http://dx.doi.org/10.1145/3457610.

Abstract:
Offensive content is pervasive in social media and a reason for concern to companies and government organizations. Several studies have been recently published investigating methods to detect the various forms of such content (e.g., hate speech, cyberbullying, and cyberaggression). The clear majority of these studies deal with English, partially because most annotated datasets available contain English data. In this article, we take advantage of available English datasets by applying cross-lingual contextual word embeddings and transfer learning to make predictions in low-resource languages. We project predictions on comparable data in Arabic, Bengali, Danish, Greek, Hindi, Spanish, and Turkish. We report results of 0.8415 F1 macro for Bengali in TRAC-2 shared task [23], 0.8532 F1 macro for Danish and 0.8701 F1 macro for Greek in OffensEval 2020 [58], 0.8568 F1 macro for Hindi in HASOC 2019 shared task [27], and 0.7513 F1 macro for Spanish in SemEval-2019 Task 5 (HatEval) [7], showing that our approach compares favorably to the best systems submitted to recent shared tasks on these three languages. Additionally, we report competitive performance on Arabic and Turkish using the training and development sets of OffensEval 2020 shared task. The results for all languages confirm the robustness of cross-lingual contextual embeddings and transfer learning for this task.
8

Botha, G., V. Zimu, and E. Barnard. "Text-based language identification for south african languages". SAIEE Africa Research Journal 98, no. 4 (December 2007): 141–46. http://dx.doi.org/10.23919/saiee.2007.9485636.

9

Jothilakshmi, S., V. Ramalingam, and S. Palanivel. "A hierarchical language identification system for Indian languages". Digital Signal Processing 22, no. 3 (May 2012): 544–53. http://dx.doi.org/10.1016/j.dsp.2011.11.008.

10

Baimyrza, A. "LANGUAGE IDENTIFICATION PROCESSES OF THE YOUTH". Tiltanym, no. 3 (30.09.2021): 28–36. http://dx.doi.org/10.55491/2411-6076-2021-3-28-36.

Abstract:
The article deals with the role of the Russian language in the processes of language identification of students. It presents the results of a sociolinguistic survey whose main objectives were determined by the need to obtain information on the following aspects of the language situation: the degree of knowledge among the youth of the state language, Russian, and other languages; the level and nature of social preferences in relation to the use of languages in various spheres of life; and the nature of the social and language preferences of the young population. A review of theoretical works by Kazakhstani and foreign scholars on this subject is given. The conclusions of the study note the significant role of the Russian language in the formation of linguistic identity, which is due not only to historical realities, in particular the language policy conducted for a long time, but also to the conscious choice of language. The conducted studies prove that language proficiency and its use are a factor in the socialization of young people and determine the style of human interaction with their social environment.
11

Gamallo, Pablo, José Ramom Pichel, and Iñaki Alegria. "From language identification to language distance". Physica A: Statistical Mechanics and its Applications 484 (October 2017): 152–62. http://dx.doi.org/10.1016/j.physa.2017.05.011.

12

Deshpande, Mr Onkar. "Postal Address Identification and Sorting". International Journal for Research in Applied Science and Engineering Technology 9, no. VI (30.06.2021): 4946–53. http://dx.doi.org/10.22214/ijraset.2021.36023.

Abstract:
In this fast-moving world, a person can take considerable time to find a postal card in a bunch of postcards because of issues such as unclear handwriting and trouble recognizing uncommon or ambiguous names. In post offices and industry, this also negatively impacts the efficiency of the postal system. I am building a system for Indian postal automation based on recognizing the pin code on the postcard. Multiple languages are spoken in India. Indian postcards are mainly written in three languages: the state's official language, English, and the Devanagari language. In India, more than 50% of people write pin-code digits in either English or Devanagari, so the system sorts both English and Devanagari postcards. Moreover, the system is mature enough to recognize handwritten as well as printed digits. As a result, the system achieves an accuracy of 92.59% on English-language postcards and 90% on Devanagari-language postcards, and the digit recognition model achieves an accuracy of 99.23% on Devanagari numerals and 99.43% on English numerals.
13

Shimi, G., C. Jerin Mahibha, and Durairaj Thenmozhi. "An Empirical Analysis of Language Detection in Dravidian Languages". Indian Journal Of Science And Technology 17, no. 15 (16.04.2024): 1515–26. http://dx.doi.org/10.17485/ijst/v17i15.765.

Abstract:
Objectives: Language detection is the process of identifying a language associated with a text. The proposed system aims to detect the Dravidian language that is associated with the given text using different machine learning and deep learning algorithms. The paper presents an empirical analysis of the results obtained using the different models. It also aims to evaluate the performance of a language agnostic model for the purpose of language detection. Method: An empirical analysis of Dravidian language identification in social media text using machine learning and deep learning approaches with k-fold cross validation has been implemented. The identification of Dravidian languages, including Tamil, Malayalam, Tamil Code Mix, and Malayalam Code Mix, is performed using both machine learning (ML) and deep learning algorithms. The machine learning algorithms used for language detection are Naive Bayes (NB), Multinomial Logistic Regression (MLR), Support Vector Machine (SVM), and Random Forest (RF). The supervised Deep Learning (DL) models used include BERT, mBERT and language agnostic models. Findings: The language agnostic model outperforms all other models on the task of language detection in Dravidian languages. The results of both the ML and DL models are analyzed empirically with performance measures like accuracy, precision, recall, and F1-score. The accuracy associated with different machine learning algorithms varies from 85% to 89%. It is evident from the experimental results that the deep learning model performed best, with an accuracy of 98%. Novelty: The proposed system emphasizes the use of the language agnostic model to detect the Dravidian language associated with the given text, which provides a promising result of 98% accuracy, higher than the existing methodologies. Keywords: Language, Machine learning, Deep learning, Transformer model, Encoder, Decoder
14

Del Bonifro, Francesca, Maurizio Gabbrielli, Antonio Lategano, and Stefano Zacchiroli. "Image-based many-language programming language identification". PeerJ Computer Science 7 (23.07.2021): e631. http://dx.doi.org/10.7717/peerj-cs.631.

Abstract:
Programming language identification (PLI) is a common need in automatic program comprehension as well as a prerequisite for deeper forms of code understanding. Image-based approaches to PLI have recently emerged and are appealing due to their applicability to code screenshots and programming video tutorials. However, they remain limited to the recognition of a small number of programming languages (up to 10 languages in the literature). We show that it is possible to perform image-based PLI on a large number of programming languages (up to 149 in our experiments) with high (92%) precision and recall, using convolutional neural networks (CNNs) and transfer learning, starting from readily-available pretrained CNNs. Results were obtained on a large real-world dataset of 300,000 code snippets extracted from popular GitHub repositories. By scrambling specific character classes and comparing identification performances we also show that the characters that contribute the most to the visual recognizability of programming languages are symbols (e.g., punctuation, mathematical operators and parentheses), followed by alphabetic characters, with digits and indentation having a negligible impact.
15

Barnard, Etienne, and Yonghong Yan. "Toward new language adaptation for language identification". Speech Communication 21, no. 4 (May 1997): 245–54. http://dx.doi.org/10.1016/s0167-6393(97)00009-5.

16

Orena, Adriel John, Linda Polka, and Rachel M. Theodore. "Language familiarity mediates identification of bilingual talkers across languages". Journal of the Acoustical Society of America 140, no. 4 (October 2016): 3227. http://dx.doi.org/10.1121/1.4970197.

17

Sujaini, Herry, and Arif Bijaksana Putra. "Analysis of language identification algorithms for regional Indonesian languages". IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 2 (1.06.2024): 1741. http://dx.doi.org/10.11591/ijai.v13.i2.pp1741-1752.

Abstract:
Detecting local languages in Indonesia is essential for recognizing linguistic diversity, promoting intercultural understanding, preserving endangered languages, and improving access to education and services. By identifying and documenting these languages, we can support language preservation efforts, provide tailored resources for communities, and celebrate the unique cultural heritage of different ethnic groups. Ultimately, this encourages a more accepting and open-minded society, prioritizing various languages and cultural customs. This research aims to identify the most suitable algorithm for language detection in Indonesian regional languages and gain insights into their unique characteristics through n-gram analysis. By understanding language diversity, the study contributes to preserving Indonesia's cultural and linguistic heritage and improving language detection techniques. This study compares the performance of five algorithms (Naïve Bayes, K-nearest neighbors (KNN), least-squares, Kullback Leibler divergence, and Kolmogorov Smirnov test) to determine the most accurate and efficient method for language identification. Incorporating trigram features alongside unigrams and bigrams significantly improved the model's performance, with F1 scores increasing from 0.923 to 0.959. The study found that using more features leads to better accuracy, with Naïve Bayes and KNN emerging as the top-performing algorithms for language identification.
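The combination of unigram, bigram, and trigram features with a Naïve Bayes classifier mentioned in the abstract above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the use of character-level n-grams, the add-one smoothing, the names `char_ngrams` and `NgramNaiveBayes`, and the two toy training sentences are all assumptions introduced here.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_max=3):
    """Yield all character n-grams of length 1..n_max (assumed feature set)."""
    padded = f" {text.lower()} "  # pad so word boundaries become features
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            yield padded[i:i + n]


class NgramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over character n-gram counts."""

    def __init__(self, n_max=3, alpha=1.0):
        self.n_max = n_max
        self.alpha = alpha                  # additive smoothing constant
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter()               # label -> number of training texts
        self.vocab = set()

    def fit(self, samples):
        for text, label in samples:
            grams = list(char_ngrams(text, self.n_max))
            self.docs[label] += 1
            self.counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            total = sum(counter.values())
            # log prior plus sum of smoothed log likelihoods
            score = math.log(self.docs[label] / total_docs)
            for gram in char_ngrams(text, self.n_max):
                score += math.log(
                    (counter[gram] + self.alpha)
                    / (total + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data for illustration only; a real system needs a proper corpus.
TOY_SAMPLES = [
    ("saya tidak tahu apa yang terjadi", "ind"),
    ("aku ora ngerti apa sing kedadeyan", "jav"),
]

model = NgramNaiveBayes(n_max=3).fit(TOY_SAMPLES)
print(model.predict("saya tidak tahu"))  # → ind
print(model.predict("aku ora ngerti"))   # → jav
```

Setting `n_max=2` in this sketch drops the trigram features, which gives a simple way to compare feature sets of the kind the abstract describes.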
18

Singh, Gundeep, Sahil Sharma, Vijay Kumar, Manjit Kaur, Mohammed Baz, and Mehedi Masud. "Spoken Language Identification Using Deep Learning". Computational Intelligence and Neuroscience 2021 (20.09.2021): 1–12. http://dx.doi.org/10.1155/2021/5123671.

Abstract:
The process of detecting the language of an audio clip from an unknown speaker, regardless of gender, manner of speaking, and age, is defined as spoken language identification (SLID). The main task is to recognize the features that can distinguish between languages clearly and efficiently. The model takes audio files and converts them into spectrogram images. It applies a convolutional neural network (CNN) to extract the main attributes or features so that the output can be detected easily. The main objective is to detect languages out of English, French, Spanish, German, Estonian, Tamil, Mandarin, Turkish, Chinese, Arabic, Hindi, Indonesian, Portuguese, Japanese, Latin, Dutch, Pushto, Romanian, Korean, Russian, Swedish, Thai, and Urdu. An experiment was conducted on different audio files using the Kaggle dataset named spoken language identification. These audio files are comprised of utterances, each spanning a fixed duration of 10 seconds. The whole dataset is split into training and test sets. Preliminary results give an overall accuracy of 98%. Extensive and accurate testing shows an overall accuracy of 88%.
19

Muthusamy, Y. K., E. Barnard, and R. A. Cole. "Reviewing automatic language identification". IEEE Signal Processing Magazine 11, no. 4 (October 1994): 33–41. http://dx.doi.org/10.1109/79.317925.

20

Ambikairajah, Eliathamby, Haizhou Li, Liang Wang, Bo Yin, and Vidhyasaharan Sethu. "Language Identification: A Tutorial". IEEE Circuits and Systems Magazine 11, no. 2 (2011): 82–108. http://dx.doi.org/10.1109/mcas.2011.941081.

21

van Bezooijen, Renée, and Charlotte Gooskens. "Identification of Language Varieties". Journal of Language and Social Psychology 18, no. 1 (March 1999): 31–48. http://dx.doi.org/10.1177/0261927x99018001003.

22

Shvaikina, N. S., and E. E. Laryushkina. "Identification of Ways to Prevent and Overcome the Language Barrier in Foreign Language Classes". Addiction Research and Adolescent Behaviour 5, no. 3 (29.04.2022): 01–02. http://dx.doi.org/10.31579/2688-7517/046.

Abstract:
Some students lose their motivation to learn a foreign language at school, as they may have had a negative experience. In order for a teacher to increase motivation to learn a second language, it is necessary to create a situation of success.
23

Thomas, Merin, Dr Latha C A, and Antony Puthussery. "Identification of language in a cross linguistic environment". Indonesian Journal of Electrical Engineering and Computer Science 18, no. 1 (1.04.2020): 544. http://dx.doi.org/10.11591/ijeecs.v18.i1.pp544-548.

Abstract:
The world has become very small due to software internationalization. Applications of machine translation are increasing day by day. Using multiple languages in social media text is a developing trend. The availability of fonts in native languages has enhanced the usage of native text in internet communications. Usage of transliterations of languages has become quite common. In the Indian scenario, the current generation is used to talking in the native language but not to reading and writing it, and hence has started using English representations of the native language in textual messages. This paper describes the identification of transliterated text in a cross-lingual environment. In this paper, a neural network model identifies the prominent language in the text, and hence the same can be used to identify the meaning of the text in the concerned language. The model is based upon recurrent neural networks, which were found to be the most efficient in machine translation. Language identification can serve as a base for many applications in a multilingual environment. Currently, the South Indian languages Malayalam and Tamil are identified from a given text. An algorithmic approach using a stop-words-based model is depicted in this paper. The model can also be enhanced to address all the Indian languages that are in use.
24

Nugraha, Azhar Baihaqi, and Ade Romadhony. "Identification of 10 Regional Indonesian Languages Using Machine Learning". sinkron 8, no. 4 (1.10.2023): 2203–14. http://dx.doi.org/10.33395/sinkron.v8i4.12989.

Abstract:
Language Identification plays a pivotal role in deciphering the rich tapestry of Indonesia's diverse regional languages, encompassing a wide spectrum of scripts, and spoken forms. Language Identification, an integral component of Natural Language Processing, is frequently addressed through Text Classification. In this study, we embark on the task of identifying 10 Indonesian languages, leveraging the NusaX dataset, with the overarching objective of contextual language determination. To achieve this, we harness a diverse array of machine learning techniques, including Support Vector Machine, Naïve Bayes Classifier, Decision Tree, Rocchio Classification, Logistic Regression, and Random Forest. We complement these methods with two distinct feature extraction approaches: N-gram and TF-IDF. This comprehensive approach enables us to construct robust models for language identification. Our findings unveil the strong efficacy of these models in discerning Indonesian languages, with the Naïve Bayes Classifier emerging as the frontrunner, achieving an impressive accuracy rate of 99.2% with TF-IDF and an even more remarkable 99.4% with N-Gram. To gain deeper insights, we delve into error analysis, revealing that misclassifications often stem from shared words across different languages. This research is underpinned by the necessity for a robust language identification model, underscoring its critical role within the complex linguistic landscape of Indonesian regional languages. These results hold great promise for applications in automated language processing and understanding within this diverse and multifaceted linguistic context.
25

Irtza, Saad, Vidhyasaharan Sethu, Eliathamby Ambikairajah, and Haizhou Li. "Using language cluster models in hierarchical language identification". Speech Communication 100 (June 2018): 30–40. http://dx.doi.org/10.1016/j.specom.2018.04.004.

26

Selamat, Ali, and Nicholas Akosu. "Word-length algorithm for language identification of under-resourced languages". Journal of King Saud University - Computer and Information Sciences 28, no. 4 (October 2016): 457–69. http://dx.doi.org/10.1016/j.jksuci.2014.12.004.

27

Chakravarthi, Bharathi Raja, Manoj Balaji Jagadeeshan, Vasanth Palanikumar, and Ruba Priyadharshini. "Offensive language identification in dravidian languages using MPNet and CNN". International Journal of Information Management Data Insights 3, no. 1 (April 2023): 100151. http://dx.doi.org/10.1016/j.jjimei.2022.100151.

28

Asubiaro, Toluwase, Tunde Adegbola, Robert Mercer, and Isola Ajiferuke. "A word‐level language identification strategy for resource‐scarce languages". Proceedings of the Association for Information Science and Technology 55, no. 1 (January 2018): 19–28. http://dx.doi.org/10.1002/pra2.2018.14505501004.

29

Hidayatullah, Ahmad Fathan, Rosyzie Anna Apong, Daphne T. C. Lai, and Atika Qazi. "Corpus creation and language identification for code-mixed Indonesian-Javanese-English Tweets". PeerJ Computer Science 9 (22.06.2023): e1312. http://dx.doi.org/10.7717/peerj-cs.1312.

Abstract:
With the massive use of social media today, mixing between languages in social media text is prevalent. In linguistics, the phenomenon of mixing languages is known as code-mixing. The prevalence of code-mixing exposes various concerns and challenges in natural language processing (NLP), including language identification (LID) tasks. This study presents a word-level language identification model for code-mixed Indonesian, Javanese, and English tweets. First, we introduce a code-mixed corpus for Indonesian-Javanese-English language identification (IJELID). To ensure reliable dataset annotation, we provide full details of the data collection and annotation standards construction procedures. Some challenges encountered during corpus creation are also discussed in this paper. Then, we investigate several strategies for developing code-mixed language identification models, such as fine-tuning BERT, BLSTM-based, and CRF. Our results show that fine-tuned IndoBERTweet models can identify languages better than the other techniques. This is the result of BERT’s ability to understand each word’s context from the given text sequence. Finally, we show that sub-word language representation in BERT models can provide a reliable model for identifying languages in code-mixed texts.
30

Bose, Smarajit, Amita Pal, Anish Mukherjee, and Debasmita Das. "Improved Language-Independent Speaker Identification in a Non-contemporaneous Setup". International Journal of Machine Learning and Computing 10, no. 5 (5.10.2020): 630–36. http://dx.doi.org/10.18178/ijmlc.2020.10.5.984.

31

Lui, Marco, Jey Han Lau, and Timothy Baldwin. "Automatic Detection and Language Identification of Multilingual Documents". Transactions of the Association for Computational Linguistics 2 (December 2014): 27–40. http://dx.doi.org/10.1162/tacl_a_00163.

Abstract:
Language identification is the task of automatically detecting the language(s) present in a document based on the content of the document. In this work, we address the problem of detecting documents that contain text from more than one language (multilingual documents). We introduce a method that is able to detect that a document is multilingual, identify the languages present, and estimate their relative proportions. We demonstrate the effectiveness of our method over synthetic data, as well as real-world multilingual documents collected from the web.
32

Barlas, P., D. Hebert, C. Chatelain, S. Adam, and T. Paquet. "Language Identification in Document Images". Electronic Imaging 2016, no. 17 (17.02.2016): 1–16. http://dx.doi.org/10.2352/issn.2470-1173.2016.17.drr-058.

33

Sadhukhan, Tanusree, Shweta Bansal, and Atul Kumar. "Automatic Identification of Spoken Language". IOSR Journal of Computer Engineering 19, no. 02 (May 2017): 84–89. http://dx.doi.org/10.9790/0661-1902058489.

34

Barlas, P., D. Hebert, C. Chatelain, S. Adam, and T. Paquet. "Language Identification in Document Images". Journal of Imaging Science and Technology 60, no. 1 (1.01.2016): 104071–1040716. http://dx.doi.org/10.2352/j.imagingsci.technol.2016.60.1.010407.

35

Saini, Shubham. "LANGUAGE IDENTIFICATION USING G-LDA". International Journal of Research in Engineering and Technology 02, no. 11 (25.11.2013): 42–45. http://dx.doi.org/10.15623/ijret.2013.0211008.

36

Mahender, C. Namrata, Ramesh Ram Naik, and Maheshkumar Bhujangrao Landge. "Author Identification for Marathi Language". Advances in Science, Technology and Engineering Systems Journal 5, no. 2 (2020): 432–40. http://dx.doi.org/10.25046/aj050256.

37

Hazen, Timothy J., and Victor W. Zue. "Segment-based automatic language identification". Journal of the Acoustical Society of America 101, no. 4 (April 1997): 2323–31. http://dx.doi.org/10.1121/1.418211.

38

Li, Kung-Pu. "Automatic language identification/verification system". Journal of the Acoustical Society of America 104, no. 1 (July 1998): 31. http://dx.doi.org/10.1121/1.424049.

39

Dutta, Arup Kumar, and K. Sreenivasa Rao. "Language identification using phase information". International Journal of Speech Technology 21, no. 3 (12.12.2017): 509–19. http://dx.doi.org/10.1007/s10772-017-9482-5.

40

Newman, Jacob L., and Stephen J. Cox. "Language Identification Using Visual Features". IEEE Transactions on Audio, Speech, and Language Processing 20, no. 7 (September 2012): 1936–47. http://dx.doi.org/10.1109/tasl.2012.2191956.

41

Jain, S., and A. Sharma. "Prudence in vacillatory language identification". Mathematical Systems Theory 28, no. 3 (May 1995): 267–79. http://dx.doi.org/10.1007/bf01303059.

42

Souter, Clive, et al. "Natural Language Identification using Corpus-Based Models". HERMES - Journal of Language and Communication in Business 7, no. 13 (4.01.2017): 183. http://dx.doi.org/10.7146/hjlcb.v7i13.25083.

Abstract:
This paper describes three approaches to the task of automatically identifying the language a text is written in. We conducted experiments to compare the success of each approach in identifying languages from a set of texts in Dutch/Friesian, English, French, Gaelic (Irish), German, Italian, Portuguese, Serbo-Croat and Spanish.....
43

Jauhiainen, T., K. Lindén, and H. Jauhiainen. "Language model adaptation for language and dialect identification of text". Natural Language Engineering 25, no. 5 (31.07.2019): 561–83. http://dx.doi.org/10.1017/s135132491900038x.

Abstract:
AbstractThis article describes an unsupervised language model (LM) adaptation approach that can be used to enhance the performance of language identification methods. The approach is applied to a current version of the HeLI language identification method, which is now called HeLI 2.0. We describe the HeLI 2.0 method in detail. The resulting system is evaluated using the datasets from the German dialect identification and Indo-Aryan language identification shared tasks of the VarDial workshops 2017 and 2018. The new approach with LM adaptation provides considerably higher F1-scores than the basic HeLI or HeLI 2.0 methods or the other systems which participated in the shared tasks. The results indicate that unsupervised LM adaptation should be considered as an option in all language identification tasks, especially in those where encountering out-of-domain data is likely.
44

Avram, Andrei-Marius, Verginica Barbu Mititelu, Vasile Păiș, Dumitru-Clementin Cercel and Ștefan Trăușan-Matu. "Multilingual Multiword Expression Identification Using Lateral Inhibition and Domain Adaptation". Mathematics 11, no. 11 (1.06.2023): 2548. http://dx.doi.org/10.3390/math11112548.

Abstract:
Correctly identifying multiword expressions (MWEs) is an important task for most natural language processing systems since their misidentification can result in ambiguity and misunderstanding of the underlying text. In this work, we evaluate the performance of the mBERT model for MWE identification in a multilingual context by training it on all 14 languages available in version 1.2 of the PARSEME corpus. We also incorporate lateral inhibition and language adversarial training into our methodology to create language-independent embeddings and improve its capabilities in identifying multiword expressions. The evaluation of our models shows that the approach employed in this work achieves better results compared to the best system of the PARSEME 1.2 competition, MTLB-STRUCT, on 11 out of 14 languages for global MWE identification and on 12 out of 14 languages for unseen MWE identification. Additionally, averaged across all languages, our best approach outperforms the MTLB-STRUCT system by 1.23% on global MWE identification and by 4.73% on unseen global MWE identification.
45

Wijonarko, Panji, and Amalia Zahra. "Spoken language identification on 4 Indonesian local languages using deep learning". Bulletin of Electrical Engineering and Informatics 11, no. 6 (1.12.2022): 3288–93. http://dx.doi.org/10.11591/eei.v11i6.4166.

Abstract:
Language identification is at the forefront of assistance in many applications, including multilingual speech systems, spoken language translation, multilingual speech recognition, and human-machine interaction via voice. The identification of Indonesian local languages using spoken language identification technology has enormous potential to advance tourism and digital content in Indonesia. The goal of this study is to identify four Indonesian local languages: Javanese, Sundanese, Minangkabau, and Buginese, utilizing deep learning classification techniques such as artificial neural network (ANN), convolutional neural network (CNN), and long short-term memory (LSTM). Mel-frequency cepstral coefficients (MFCC) are used as the features extracted from the audio data. The results showed that the LSTM model had the highest accuracy for each speech duration (3 s, 10 s, and 30 s), followed by the CNN and ANN models.
46

Menon, Riya. "Detectsy: A System for Detecting Language from the Text, Images, and Audio Files". International Journal for Research in Applied Science and Engineering Technology 10, no. 6 (30.06.2022): 1975–80. http://dx.doi.org/10.22214/ijraset.2022.44281.

Abstract:
Language detection is a natural language processing task in which the language of a text or document must be identified. Humans can easily detect the languages they know, but no individual can identify many languages. This is where automatic language identification is useful. The proposed solution is a complete system that detects language from text, images, and audio files. Language identification from text is carried out by training a Multinomial Naive Bayes classifier. For image and audio inputs, Python libraries are used to detect the language.
47

Ranasinghe, Tharindu, and Marcos Zampieri. "An Evaluation of Multilingual Offensive Language Identification Methods for the Languages of India". Information 12, no. 8 (29.07.2021): 306. http://dx.doi.org/10.3390/info12080306.

Abstract:
The pervasiveness of offensive content in social media has become an important reason for concern for online platforms. With the aim of improving online safety, a large number of studies applying computational models to identify such content have been published in the last few years, with promising results. The majority of these studies, however, deal with high-resource languages such as English due to the availability of datasets in these languages. Recent work has addressed offensive language identification from a low-resource perspective, exploring data augmentation strategies and trying to take advantage of existing multilingual pretrained models to cope with data scarcity in low-resource scenarios. In this work, we revisit the problem of low-resource offensive language identification by evaluating the performance of multilingual transformers in offensive language identification for languages spoken in India. We investigate languages from different families such as Indo-Aryan (e.g., Bengali, Hindi, and Urdu) and Dravidian (e.g., Tamil, Malayalam, and Kannada), creating important new technology for these languages. The results show that multilingual offensive language identification models perform better than monolingual models and that cross-lingual transformers show strong zero-shot and few-shot performance across languages.
48

Ellis, Erica M., and Donna J. Thal. "Early Language Delay and Risk for Language Impairment". Perspectives on Language Learning and Education 15, no. 3 (October 2008): 93–100. http://dx.doi.org/10.1044/lle15.3.93.

Abstract:
Clinicians are often faced with the difficult task of deciding whether a late talker shows normal variability or has a clinically significant language disorder. This article provides an overview of research investigating identification, characteristics, outcomes, and predictors of late talkers. Clinical implications for speech-language pathologists in the identification and treatment of children who are late talkers are discussed.
49

Bhuvanagirir, Kiran, and Sunil Kumar Kopparapu. "Mixed Language Speech Recognition without Explicit Identification of Language". American Journal of Signal Processing 2, no. 5 (1.12.2012): 92–97. http://dx.doi.org/10.5923/j.ajsp.20120205.02.

50

Marchegiani, Letizia, and Xenofon Fafoutis. "On cross-language consonant identification in second language noise". Journal of the Acoustical Society of America 138, no. 4 (October 2015): 2206–9. http://dx.doi.org/10.1121/1.4930955.
