Journal articles on the topic 'Arabic language – Data processing'

Consult the top 50 journal articles for your research on the topic 'Arabic language – Data processing.'

1

Bouziane, Abdelghani, Djelloul Bouchiha, Redha Rebhi, Giulio Lorenzini, Noureddine Doumi, Younes Menni, and Hijaz Ahmad. "ARALD: Arabic Annotation Using Linked Data." Ingénierie des systèmes d'information 26, no. 2 (April 30, 2021): 143–49. http://dx.doi.org/10.18280/isi.260201.

Abstract:
The evolution of the traditional Web into the semantic Web makes the machine a first-class citizen on the Web and increases the discovery and accessibility of unstructured Web-based data. This development makes it possible to use Linked Data technology as the background knowledge base for unstructured data, especially texts, now available in massive quantities on the Web. Given any text, the main challenge is determining DBpedia's most relevant information with minimal effort and time. However, DBpedia annotation tools, such as DBpedia Spotlight, have mainly targeted the English and Latin DBpedia versions. The current situation of the Arabic language is less bright; the Web content of the Arabic language does not reflect the importance of this language. Thus, we have developed an approach to annotate Arabic texts with Linked Open Data, particularly DBpedia. This approach uses natural language processing and machine learning techniques for interlinking Arabic text with Linked Open Data. Despite the high complexity of the independent domain knowledge base and the reduced resources in Arabic natural language processing, the evaluation results of our approach were encouraging.
2

Tachicart, Ridouane, and Karim Bouzoubaa. "Moroccan Data-Driven Spelling Normalization Using Character Neural Embedding." Vietnam Journal of Computer Science 08, no. 01 (October 5, 2020): 113–31. http://dx.doi.org/10.1142/s2196888821500044.

Abstract:
With the increase of Web use in Morocco today, the Internet has become an important source of information. Specifically, across social media, Moroccan people use several languages in their communication, leaving behind unstructured user-generated text (UGT) that presents several opportunities for Natural Language Processing. Among the languages found in this data, Moroccan Arabic (MA) stands out with substantial content and several distinctive features. In this paper, we investigate online written text generated by Moroccan users in social media, with an emphasis on Moroccan Arabic. For this purpose, we follow several steps, using tools such as a language identification system, in order to conduct a deep study of this data. The most interesting findings that have emerged are the use of code-switching, the use of multiple scripts, and the low number of words in the Moroccan UGT. Moreover, we used the investigated data to build a new Moroccan language resource: a lexicon of orthographic variants of Moroccan words, built following an unsupervised approach and using character neural embedding. This lexicon can be useful for several NLP tasks such as spelling normalization.
3

Essam, Nader, Abdullah M. Moussa, Khaled M. Elsayed, Sherif Abdou, Mohsen Rashwan, Shaheen Khatoon, Md Maruf Hasan, Amna Asif, and Majed A. Alshamari. "Location Analysis for Arabic COVID-19 Twitter Data Using Enhanced Dialect Identification Models." Applied Sciences 11, no. 23 (November 30, 2021): 11328. http://dx.doi.org/10.3390/app112311328.

Abstract:
The recent surge of social media networks has provided a channel to gather and publish vital medical and health information. The focal role of these networks has become more prominent in periods of crisis, such as the recent pandemic of COVID-19. These social networks have been the leading platform for broadcasting health news updates, precaution instructions, and governmental procedures. They also provide an effective means for gathering public opinion and tracking breaking events and stories. To achieve location-based analysis for social media input, the location information of the users must be captured. Most of the time, this information is either missing or hidden. For some languages, such as Arabic, the users’ location can be predicted from their dialects. The Arabic language has many local dialects for most Arab countries. Natural Language Processing (NLP) techniques have provided several approaches for dialect identification. The recent advanced language models using contextual-based word representations in the continuous domain, such as BERT models, have provided significant improvement for many NLP applications. In this work, we present our efforts to use BERT-based models to improve the dialect identification of Arabic text. We show the results of the developed models to recognize the source of the Arabic country, or the Arabic region, from Twitter data. Our results show 3.4% absolute enhancement in dialect identification accuracy on the regional level over the state-of-the-art result. When we excluded the Modern Standard Arabic (MSA) set, which is formal Arabic language, we achieved 3% absolute gain in accuracy between the three major Arabic dialects over the state-of-the-art level. Finally, we applied the developed models on a recently collected resource for COVID-19 Arabic tweets to recognize the source country from the users’ tweets. 
We achieved a weighted average accuracy of 97.36%, which suggests a tool that policymakers could use to support country-level disaster-related activities.
4

Mahmoudi, Omayma, Mouncef Filali Bouami, and Mustapha Badri. "Arabic Language Modeling Based on Supervised Machine Learning." Revue d'Intelligence Artificielle 36, no. 3 (June 30, 2022): 467–73. http://dx.doi.org/10.18280/ria.360315.

Abstract:
Misinformation and misleading actions appeared as soon as COVID-19 vaccination campaigns were launched, regardless of a country's literacy level or growth index. In such a situation, supervised machine learning techniques for classification appear to be a suitable solution for modeling the value and veracity of data, especially in Arabic, a language used by millions of people around the world. To achieve this task, we had to collect data manually from social media platforms such as Facebook and Twitter, and from Arabic news websites. This paper aims to classify Arabic-language news into fake news and real news by creating a Machine Learning (ML) model that will detect Arabic fake news (DAFN) about COVID-19 vaccination. To achieve our goal, we will use Natural Language Processing (NLP) techniques, which is especially challenging since NLP library support for Arabic is not common. We will use the NLTK package for Python to preprocess the data, and then we will use an ML model for the classification.
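To make the preprocessing step in this abstract concrete, here is a minimal sketch of the kind of normalization often applied to Arabic text before classification. It is not the authors' pipeline: the abstract names NLTK, whereas this sketch uses only the Python standard library, and the function name `normalize_arabic` and the specific normalization choices (dropping diacritics and tatweel, unifying alef variants, discarding non-Arabic characters) are assumptions made for illustration.

```python
import re

# Assumption: these rules are a common baseline, not the paper's actual steps.
# Tashkeel (short-vowel marks, U+064B..U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Alef with madda, hamza above, and hamza below, mapped to bare alef (U+0627).
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")
# Anything that is neither an Arabic letter (U+0621..U+064A) nor whitespace.
_NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")


def normalize_arabic(text):
    """Return a list of normalized Arabic tokens (hypothetical helper)."""
    text = _DIACRITICS.sub("", text)           # strip vowel marks and tatweel
    text = _ALEF_VARIANTS.sub("\u0627", text)  # unify alef spellings
    text = _NON_ARABIC.sub(" ", text)          # drop digits, Latin, punctuation
    return text.split()                        # whitespace tokenization
```

The resulting token lists could then be passed to whatever feature extractor and classifier is chosen; the abstract does not say which model was used.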
5

Aflisia, Noza, Mohamad Erihadiana, and Nur Balqis. "Teacher’s Perception toward the Readiness to Face Multiculturalism in Arabic Teaching and Learning." Izdihar : Journal of Arabic Language Teaching, Linguistics, and Literature 3, no. 3 (December 31, 2020): 197–210. http://dx.doi.org/10.22219/jiz.v3i3.14117.

Abstract:
The presence of multiculturalism requires an appropriate response from Arabic teachers, so that Arabic is easily accepted and loved by various groups. This research aimed to analyze the efforts of Arabic teachers in dealing with multiculturalism and to analyze the obstacles encountered in applying multicultural education in Arabic language learning. This qualitative descriptive research was conducted through interviews and documentation. The data analysis and processing techniques used in this study were processing and preparing the data for analysis, reading the entire data, coding all the data, using the coding to describe the settings, people, categories, and themes analyzed, and describing the themes to be presented in the narrative/qualitative report. The results revealed that Arabic teachers responded to multiculturalism by reaffirming Arabic as a unifying language for Muslims, confirming Arabic as one of the international languages, learning the essence of multiculturalism, improving didactic and methodical competencies, attending training, and modeling. The constraints on applying multicultural education in Arabic language learning were a lack of understanding of the essence of multiculturalism, a lack of knowledge of learning methods and strategies, a lack of literature, a lack of syllabi and teaching materials containing multicultural education, a lack of support from institutions, and a lack of training and guidance.
6

Hizbullah, Nur, Zakiyah Arifa, Yoke Suryadarma, Ferry Hidayat, Luthfi Muhyiddin, and Eka Kurnia Firmansyah. "SOURCE-BASED ARABIC LANGUAGE LEARNING: A CORPUS LINGUISTIC APPROACH." Humanities & Social Sciences Reviews 8, no. 3 (June 17, 2020): 940–54. http://dx.doi.org/10.18510/hssr.2020.8398.

Abstract:
Purpose: The study explores the process of using Arabic websites for Arabic language learning, utilising the Arabic Corpus Linguistic approach. This approach enables mining data from websites, systematically compiling the mined data, and processing the data for the express purpose of Arabic language teaching, including its clusters, such as Arabic pragmatics, Arabic linguistics, and Arabic translation teaching. Methodology: The research is written descriptively and uses qualitative methods to analyse the process and the step-by-step procedures to be executed to make good use of the data. Main Findings: This study is based on the theory of source-based teaching, while the process of utilising the websites is systematically elaborated through the corpus linguistic mechanism. The research concludes that almost all Arabic websites can be employed as authentic, reliable teaching sources. The sources can be put to good use for teaching the four language competencies, as objects of linguistic study, and for translation, through the particular use of websites whose contents are bilingual or multilingual. Implications/Applications: The use of corpora for teaching and learning still needs to be disseminated and promoted among both practitioners and researchers of the Arabic language in Indonesia. Novelty/Originality of this study: This study highlights that almost all Arabic-language websites are among the richest sources of learning. These learning resources can be used for language learning and various other dimensions of scientific Arabic. Corpus linguistics has many benefits for learners and teachers in Arabic language learning. This study offers a new approach to Arabic teaching and learning using website resources, and to the dynamics of Arabic learning using technology.
7

LANGLOIS, D., M. SAAD, and K. SMAILI. "Alignment of comparable documents: Comparison of similarity measures on French–English–Arabic data." Natural Language Engineering 24, no. 5 (June 19, 2018): 677–94. http://dx.doi.org/10.1017/s1351324918000232.

Abstract:
The objective of this article is to address the issue of the comparability of documents that are extracted from different sources and written in different languages. These documents are not necessarily translations of each other. This material is referred to as multilingual comparable corpora. These language resources are useful for multilingual natural language processing applications, especially for low-resourced language pairs. In this paper, we collect different data in Arabic, English, and French. Two corpora are built by using available hyperlinks for Wikipedia and Euronews. Euronews is an aligned multilingual (Arabic, English, and French) corpus of 34k documents collected from the Euronews website. A more challenging issue is to build a comparable corpus from two different and independent media having two distinct editorial lines, such as the British Broadcasting Corporation (BBC) and Al Jazeera (JSC). To build such a corpus, we propose to use the Cross-Lingual Latent Semantic approach. For this purpose, documents have been harvested from the BBC and JSC websites for each month of the years 2012 and 2013. The comparability is calculated for each Arabic–English pair of documents of each month. This automatic task is then validated by hand. This led to a multilingual (Arabic–English) aligned corpus of 305 pairs of documents (233k English words and 137k Arabic words). In addition, a study is presented in this paper to analyze the performance of three methods from the literature for measuring the comparability of documents on the multilingual reference corpora. A recall at rank 1 of 50.16 per cent is achieved with the Cross-lingual LSI approach for the BBC–JSC test corpus, while the dictionary-based method reaches a recall of only 35.41 per cent.
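The "recall at rank 1" figure reported in this abstract can be illustrated with a short sketch. It assumes each document has already been mapped to a vector in a shared cross-lingual space (for example by the LSI projection, which is not reproduced here) and that source document i is the true counterpart of target document i. The function names `cosine` and `recall_at_1` are hypothetical, and cosine similarity is an assumed choice of comparability measure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recall_at_1(source_vecs, target_vecs):
    """Fraction of source documents whose most similar target is the aligned one.

    Assumption: source_vecs[i] and target_vecs[i] form the reference pair.
    """
    hits = 0
    for i, src in enumerate(source_vecs):
        # Index of the target document ranked first for this source document.
        best = max(range(len(target_vecs)),
                   key=lambda j: cosine(src, target_vecs[j]))
        if best == i:
            hits += 1
    return hits / len(source_vecs)
```

Under this reading, a recall at rank 1 of 50.16 per cent would mean that for about half of the source documents the top-ranked candidate is the reference counterpart.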
8

Chaimae, Azroumahli, Yacine El Younoussi, Otman Moussaoui, and Youssra Zahidi. "An Arabic Dialects Dictionary Using Word Embeddings." International Journal of Rough Sets and Data Analysis 6, no. 3 (July 2019): 18–31. http://dx.doi.org/10.4018/ijrsda.2019070102.

Abstract:
Dialectal Arabic and Modern Standard Arabic lack sufficient standardized language resources to enable the tasks of Arabic language processing, despite it being an active research area. This work addresses this issue by first highlighting the steps and the issues related to building a multi-dialect Arabic corpus using web data from blogs and social media platforms (e.g., Facebook and Twitter). The aim is to create a vectorized dictionary for the crawled data using word embeddings. In other words, the goal of this article is to build an updated multi-dialect data set and then to extract an annotated corpus from it.
9

Alothman, Manal Othman, Muhammad Badruddin Khan, and Mozaherul Hoque Abul Hasanat. "Review of Researches on Arabic Social Media Text Mining." Journal of Intelligent Systems and Computing 2, no. 1 (March 31, 2021): 20–33. http://dx.doi.org/10.51682/jiscom.00201005.2021.

Abstract:
Social media sites and applications have allowed people to share their comments, opinions, and points of view in different languages on a mass scale. Arabic is one of the languages that has seen a huge surge in the production of digital textual content. The Arabic content and its metadata are a goldmine of useful information for a wide variety of applications. A large number of researchers are working on Arabic data in various domains of research such as natural language processing, sentiment analysis, event detection, and named entity recognition. This article presents a review of a number of such studies conducted between 2014 and 2019 that drew their data from social media websites. We found that Twitter was the source most used by Arabic text mining researchers for dataset construction. Our study also found that the Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers were the most used classifiers in previous research. Moreover, the results of the previous studies indicate that the SVM classifier provided the best performance compared to other classifiers.
10

Bessou, Sadik, and Racha Sari. "Efficient Discrimination between Arabic Dialects." Recent Advances in Computer Science and Communications 13, no. 4 (October 19, 2020): 725–30. http://dx.doi.org/10.2174/2213275912666190716115604.

Abstract:
Background: With the explosion of communication technologies and the accompanying pervasive use of social media, we notice an outstanding proliferation of posts, reviews, comments, and other forms of expressions in different languages. This content attracted researchers from different fields; economics, political sciences, social sciences, psychology and particularly language processing. One of the prominent subjects is the discrimination between similar languages and dialects using natural language processing and machine learning techniques. The problem is usually addressed by formulating the identification as a classification task. Methods: The approach is based on machine learning classification methods to discriminate between Modern Standard Arabic (MSA) and four regional Arabic dialects: Egyptian, Levantine, Gulf and North-African. Several models were trained to discriminate between the studied dialects in large corpora mined from online Arabic newspapers and manually annotated. Results: Experimental results showed that n-gram features could substantially improve performance. Logistic regression based on character and word n-gram model using Count Vectors identified the handled dialects with an overall accuracy of 95%. Best results were achieved with Linear Support vector classifier using TF-IDF Vectors trained by character-based uni-gram, bi-gram, trigram, and word-based uni-gram, bi-gram with an overall accuracy of 95.1%. Conclusion: The results showed that n-gram features could substantially improve performance. Additionally, we noticed that the kind of data representation could provide a significant performance boost compared to simple representation.
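The character n-gram and TF-IDF features mentioned in this abstract can be sketched in a few lines of standard-library Python. This is an illustration only, not the authors' code: the function names are hypothetical, the n-gram range of 1 to 3 mirrors the uni-gram, bi-gram, and trigram setting in the abstract, and the smoothed IDF formula is one common variant chosen here as an assumption.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """All character n-grams of text for n in [n_min, n_max]."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n_min=1, n_max=3):
    """Sparse TF-IDF vectors (dicts) over character n-grams.

    Assumption: idf = ln((1 + N) / (1 + df)) + 1, a smoothed variant.
    """
    counts = [Counter(char_ngrams(d, n_min, n_max)) for d in docs]
    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter(gram for c in counts for gram in c)
    n_docs = len(docs)
    idf = {gram: math.log((1 + n_docs) / (1 + df)) + 1.0
           for gram, df in doc_freq.items()}
    return [{gram: tf * idf[gram] for gram, tf in c.items()} for c in counts]
```

Vectors of this kind would then be fed to a linear classifier, such as the logistic regression or linear support vector classifier named in the abstract; the training step is not shown here.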
11

Al-Moslmi, Tareq, Mohammed Albared, Adel Al-Shabi, Nazlia Omar, and Salwani Abdullah. "Arabic senti-lexicon: Constructing publicly available language resources for Arabic sentiment analysis." Journal of Information Science 44, no. 3 (February 1, 2017): 345–62. http://dx.doi.org/10.1177/0165551516683908.

Abstract:
Sentiment analysis is held to be one of the highly dynamic recent research fields in Natural Language Processing, facilitated by the quickly growing volume of Web opinion data. Most of the approaches in this field are focused on English due to the lack of sentiment resources in other languages such as the Arabic language and its large variety of dialects. In most sentiment analysis applications, good sentiment resources play a critical role. Based on that, in this article, several publicly available sentiment analysis resources for Arabic are introduced. This article introduces the Arabic senti-lexicon, a list of 3880 positive and negative synsets annotated with their part of speech, polarity scores, dialects synsets and inflected forms. This article also presents a Multi-domain Arabic Sentiment Corpus (MASC) with a size of 8860 positive and negative reviews from different domains. In this article, an in-depth study has been conducted on five types of feature sets for exploiting effective features and investigating their effect on performance of Arabic sentiment analysis. The aim is to assess the quality of the developed language resources and to integrate different feature sets and classification algorithms to synthesise a more accurate sentiment analysis method. The Arabic senti-lexicon is used for generating feature vectors. Five well-known machine learning algorithms: naïve Bayes, k-nearest neighbours, support vector machines (SVMs), logistic linear regression and neural network are employed as base-classifiers for each of the feature sets. A wide range of comparative experiments on standard Arabic data sets were conducted, discussion is presented and conclusions are drawn. The experimental results show that the Arabic senti-lexicon is a very useful resource for Arabic sentiment analysis. 
Moreover, results show that classifiers which are trained on feature vectors derived from the corpus using the Arabic sentiment lexicon are more accurate than classifiers trained using the raw corpus.
12

Ismail, Isma'il. "Pelaksanaan Pembelajaran Komunikasi Non-verbal Bahasa Arab dengan Bahasa Tubuh sebagai Pemahaman Kinesik Lintas Budaya." Progressa: Journal of Islamic Religious Instruction 2, no. 2 (May 9, 2019): 59. http://dx.doi.org/10.32616/pgr.v2.2.134.59-68.

Abstract:
Arabic is a universal language, and learning Arabic cannot be separated from the social environment that surrounds it. Learning Arabic requires adaptation to the environment, which is an integral part of educational institutions. Students are part of the learning process; after their formal education is complete, they again become one of the important elements in society. Non-verbal language consists of expressions of personality or personality traits that are manifested in body movements. Arabic is also an interesting subject of study, as it is one of the dominant Semitic languages and still persists today. Based on the focus of the study, this study aims to describe the implementation of learning Arabic non-verbal communication with body language as a cross-cultural kinesic understanding. This is library research, and the research data used is secondary data. The data collection technique used by the author in this study is documentation. Data processing is carried out through study activities, verification and reduction, grouping and systematization, and interpretation, so that a phenomenon has social, academic, and scientific value. Data analysis in this study was carried out during and after data collection using descriptive-critical-comparative methods and content analysis methods. From the results of the analysis it was concluded that: 1) In its development, language became a feature of a culture. At a minimum, it differentiates one community from another in terms of language use. 2) From the first year of their lives, children take part in conversations using body language and non-verbal cues. Then, little by little, they learn the linguistic codes of language and how codes represent objects, events, and the types of relationships between objects and events. They learn how to send and receive orders with spoken language. 3) Arabic has played an important part in national culture since the development of Islam in the archipelago in the thirteenth century, and its role is still felt lexically and semantically.
13

Talim, Ahmad. "STUDI EVALUATIF SISTEM PEMBELAJARAN BAHASA ARAB DI MA PUTRA PONDOK MODERN DARUSSALAM LIABUKU KOTA BAU BAU." Inspiratif Pendidikan 9, no. 1 (March 30, 2020): 94. http://dx.doi.org/10.24252/ip.v9i1.14413.

Abstract:
The focus of this research is the Arabic learning system in MA Putra Bau Bau Darussalam Islamic Boarding School in Bau Bau City. This study aims to evaluate the Arabic learning system in Darussalam Islamic Boarding School, focusing on the main issue, namely: 1) How compatible is the antecedent (input) stage of the Arabic learning system at the Darussalam Liabuku Bau Bau Islamic boarding school with the process standard? This research is a program evaluation with a descriptive qualitative approach; in this study the author assesses the quality level and conditions of the Arabic language learning system in Darussalam Islamic Boarding School Liabuku Bau Bau. To obtain the data, three data collection techniques were used, namely observation, interviews, and documentation. After data collection, the data are processed in three stages, namely: 1) data reduction, 2) data display, and 3) drawing conclusions/verification. The results of this study indicate that learning Arabic in MA Putra Darussalam Liabuku Islamic boarding school meets three process standards. First, the existing Arabic learning objectives are achieved. Second, teachers in MA Kulliyatul Muallimin al-Islamiyah Darussalam Liabuku Islamic boarding school Bau Bau prepare I'dad al-Tadrīs to support the existing Arabic teaching and learning activities, because all of the learning uses Arabic as the medium. Third, in evaluating students' theoretical or practical language skills, the teacher evaluates learning in the form of oral examinations (imtihān syafahi) and written examinations (imtihān tahrīrī), so that the assessment of students' Arabic abilities cannot be separated from the four language skills. From this research, a strategic recommendation can be adopted to improve the learning system of KMI Pondok Modern Darussalam Liabuku, namely that teachers work with Language Builders in language programs to improve the Arabic ability of the santri (students), and that a language laboratory be provided to improve students' listening skills.
14

Al-Aqarbeh, Rania, and Mohammed Al-Malahmeh. "Grammatical resumption and dependency processing in Southern Jordanian Arabic." Brill’s Journal of Afroasiatic Languages and Linguistics 14, no. 2 (December 19, 2022): 153–93. http://dx.doi.org/10.1163/18776930-01402003.

Abstract:
This study investigates the sensitivity of grammatical resumption to islands in wh-interrogative and relative clause dependencies in Southern Jordanian Arabic (JA). An offline acceptability judgment task and an eye-tracking reading experiment were conducted. The results reveal that resumption in southern JA exhibits sensitivity to strong islands, such as adjunct islands, in both dependencies. The findings also suggest that the southern JA parser posits a resumptive pronoun (RP) inside islands that allow resumption. However, the parser does not predict an RP inside islands that disallow resumption. Furthermore, quantitative data show that wh-interrogative and relative clause dependencies pattern similarly in their sensitivity to islands.
15

Alshammari, Nasser O., and Fawaz D. Alharbi. "Combining a Novel Scoring Approach with Arabic Stemming Techniques for Arabic Chatbots Conversation Engine." ACM Transactions on Asian and Low-Resource Language Information Processing 21, no. 4 (July 31, 2022): 1–21. http://dx.doi.org/10.1145/3511215.

Abstract:
Arabic is recognized as one of the main languages around the world. Many attempts and efforts have been made to provide computing solutions to support the language. Developing Arabic chatbots is still an evolving research field and requires extra effort due to the nature of the language. One of the common tasks of any natural language processing application is the stemming step. It is important for developing chatbots, since it helps with pre-processing the input data and can be involved in different phases of the chatbot development process. The aim of this article is to combine a scoring approach with Arabic stemming techniques for developing an Arabic chatbot conversation engine. Two experiments are conducted to evaluate the proposed solution. The first experiment selects which stemmer is most accurate when applying our solution, since our algorithm can support various stemmers. The second experiment evaluates our proposed approach against various machine learning models. The results show that the ISRIS stemming algorithm is the best fit for our solution, with an accuracy of 78.06%. The results also indicate that our novel solution achieved an F1 score of 65.5%, while the other machine learning models achieved slightly lower scores. Our study presents a novel technique that combines scoring mechanisms with stemming processes to produce the best answer for every query sent by chatbot users, compared to other approaches. This can be helpful for developing Arabic chatbots and can support many domains such as education, business, and health. This technique is among the first developed purposefully to serve the development of Arabic chatbot conversation engines.
16

Hejazi, Hani D., and Ahmed A. Khamees. "Opinion mining for Arabic dialect in social media data fusion platforms: A systematic review." Fusion: Practice and Applications 9, no. 1 (2022): 08–28. http://dx.doi.org/10.54216/fpa.090101.

Abstract:
The huge volume of text generated on social media in Arabic, especially in Arabic dialects, has become increasingly attractive for Natural Language Processing (NLP) as a means of extracting useful and structured information that benefits many domains. The more challenging point is that this content is mostly written in Arabic dialects, which poses a big data fusion challenge. The problem with these dialects is that they have no written rules, unlike Modern Standard Arabic (MSA) or traditional Arabic, and that they change slowly but unexpectedly. One of the ways to benefit from this huge data fusion is opinion mining, so we introduce this systematic review of opinion mining from dialectal Arabic text for the years 2016 to 2019. We have found that Saudi, Egyptian, Algerian, and Jordanian are the most studied dialects, even though work on them is still under development and needs more effort; dialects like Mauritanian, Yemeni, Libyan, and Somali have not been studied in this period. Many data fusion models that showed good results in the last four years are discussed.
17

Al-Bayati, Abdulhakeem Qusay, Ahmed S. Al-Araji, and Saman Hameed Ameen. "Arabic Sentiment Analysis (ASA) Using Deep Learning Approach." Journal of Engineering 26, no. 6 (June 1, 2020): 85–93. http://dx.doi.org/10.31026/j.eng.2020.06.07.

Abstract:
Sentiment analysis is one of the major fields in natural language processing; its main task is to extract sentiments, opinions, attitudes, and emotions from subjective text. Because of its importance in decision making and in people's trust in reviews on web sites, much academic research addresses sentiment analysis problems. Deep Learning (DL) is a powerful Machine Learning (ML) technique that has emerged thanks to its ability to represent features and differentiate data, leading to state-of-the-art prediction results. In recent years, DL has been widely used in sentiment analysis; however, its application to the Arabic language remains scarce. Most previous research addresses other languages such as English. The proposed model tackles Arabic Sentiment Analysis (ASA) by using a DL approach. ASA is a challenging field because the Arabic language has a richer morphological structure than many other languages. In this work, a Long Short-Term Memory (LSTM) deep neural network is used for training the model, combined with word embedding as a first hidden layer for feature extraction. The results show that an accuracy of about 82% is achievable using the DL method.
18

Al Sulaiman, Mansour, Abdullah M. Moussa, Sherif Abdou, Hebah Elgibreen, Mohammed Faisal, and Mohsen Rashwan. "Semantic textual similarity for modern standard and dialectal Arabic using transfer learning." PLOS ONE 17, no. 8 (August 11, 2022): e0272991. http://dx.doi.org/10.1371/journal.pone.0272991.

Abstract:
Semantic Textual Similarity (STS) is the task of identifying the semantic correlation between two sentences of the same or different languages. STS is an important task in natural language processing because it has many applications in different domains such as information retrieval, machine translation, plagiarism detection, document categorization, semantic search, and conversational systems. The availability of STS training and evaluation data resources for some languages such as English has led to good performance systems that achieve above 80% correlation with human judgment. Unfortunately, such required STS data resources are not available for many languages like Arabic. To overcome this challenge, this paper proposes three different approaches to generate effective STS Arabic models. The first one is based on evaluating the use of automatic machine translation for English STS data to Arabic to be used in fine-tuning. The second approach is based on the interleaving of Arabic models with English data resources. The third approach is based on fine-tuning the knowledge distillation-based models to boost their performance in Arabic using a proposed translated dataset. With very limited resources consisting of just a few hundred Arabic STS sentence pairs, we managed to achieve a score of 81% correlation, evaluated using the standard STS 2017 Arabic evaluation set. Also, we managed to extend the Arabic models to process two local dialects, Egyptian (EG) and Saudi Arabian (SA), with a correlation score of 77.5% for EG dialect and 76% for the SA dialect evaluated using dialectal conversion from the same standard STS 2017 Arabic set.
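The correlation scores quoted in this abstract can be made concrete with a small sketch. It assumes the reported figures are Pearson correlations between model similarity scores and human ratings, which is the usual metric for the STS 2017 benchmark; the abstract itself only says "correlation". The function name `pearson` is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumption: xs are model similarity scores, ys are human ratings.
    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, each left unnormalized because
    # the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)
```

Under that assumption, a score of 81% corresponds to a coefficient of 0.81 computed over all sentence pairs in the evaluation set.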
19

Husein, Alhassan Abdur-Rahim. "Acquisition of Agreement Structures by Ghanaian Arabic Learners." Al-Adab Journal 1, no. 137 (June 16, 2021): 23–46. http://dx.doi.org/10.31973/aj.v1i137.915.

Abstract:
The study investigated the acquisition of agreement structures by Arabic as a Foreign Language (AFL) learners in Ghana from the perspective of Processability Theory (PT). Five Arabic agreement structures at the phrasal, sentence, and subordinate-clause levels of PT’s processing procedures were tested in a cross-sectional study that aimed to establish predictions about the implicational nature of the processing procedures. Data were elicited from 15 Arabic learners at the University of Ghana who were at different proficiency levels, using a Grammaticality Judgment Task and an Elicited Production Task. The results suggest that the acquisition of agreement structures by Ghanaian AFL learners generally develops according to PT’s predictions. While the study largely conforms to PT predictions, the behaviour of the Noun Predicative Adjective structure in the participants’ interlanguage system suggests that factors other than processing constraints may be involved in the processing architecture of L2 learners in Ghana.
20

Butt, Hanan, Muhammad Raheel Raza, Muhammad Javed Ramzan, Muhammad Junaid Ali, and Muhammad Haris. "Attention-Based CNN-RNN Arabic Text Recognition from Natural Scene Images." Forecasting 3, no. 3 (July 20, 2021): 520–40. http://dx.doi.org/10.3390/forecast3030033.

Abstract:
According to statistics, there are 422 million speakers of the Arabic language. Islam is the second-largest religion in the world, and its followers constitute approximately 25% of the world’s population. Since the Holy Quran is in Arabic, nearly all Muslims understand the Arabic language, according to some analyses. Many countries also have Arabic as their native and official language. In recent years, the number of internet users who speak Arabic has increased, but very little work has been done on the language because of several complications. It is challenging to build a robust recognition system (RS) for cursive languages such as Arabic. These challenges become more complex if there are variations in text size, fonts, colors, orientation, lighting conditions, noise within a dataset, etc. To deal with them, deep learning models show noticeable results on data modeling and can handle large datasets. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) can select good features and follow the sequential data learning technique. These two neural networks offer impressive results in many research areas such as text recognition, voice recognition, several tasks of Natural Language Processing (NLP), and others. This paper presents a CNN-RNN model with an attention mechanism for Arabic image text recognition. The model takes an input image and generates feature sequences through a CNN. These sequences are passed to a bidirectional RNN to obtain feature sequences in order. The bidirectional RNN can miss some preprocessing of text segmentation. Therefore, a bidirectional RNN with an attention mechanism is used to generate output, enabling the model to select relevant information from the feature sequences. The attention mechanism allows end-to-end training through a standard backpropagation algorithm.
21

Alani, Ali A., and Georgina Cosma. "ArSL-CNN a convolutional neural network for Arabic sign language gesture recognition." Indonesian Journal of Electrical Engineering and Computer Science 22, no. 2 (May 1, 2021): 1096. http://dx.doi.org/10.11591/ijeecs.v22.i2.pp1096-1107.

Abstract:
Sign language (SL) is a visual means of communication for people who are Deaf or have hearing impairments. In Arabic-speaking countries, there are many Arabic sign languages (ArSL), and these use the same alphabet. This study proposes ArSL-CNN, a deep learning model based on a convolutional neural network (CNN) for translating Arabic SL (ArSL). Experiments were performed using a large ArSL dataset (ArSL2018) that contains 54049 images of 32 sign language gestures, collected from forty participants. The first experiments with the ArSL-CNN model returned train and test accuracies of 98.80% and 96.59%, respectively. The results also revealed the impact of imbalanced data on model accuracy. For the second set of experiments, various re-sampling methods were applied to the dataset. Results revealed that applying the synthetic minority oversampling technique (SMOTE) improved the overall test accuracy from 96.59% to 97.29%, yielding a statistically significant improvement in test accuracy (p = 0.016, α < 0.05). The proposed ArSL-CNN model can be trained on a variety of Arabic sign languages and reduce the communication barriers encountered by Deaf communities in Arabic-speaking countries.
22

Jannah, Nur Aini Sholihatun, Nurhidayati Nurhidayati, and Mohammad Ahsanuddin. "Utilization of Materials "Academic Arapça" for Listening Skills in Arabic Language Education." Arabiyat : Jurnal Pendidikan Bahasa Arab dan Kebahasaaraban 9, no. 2 (December 31, 2022): 191–204. http://dx.doi.org/10.15408/a.v9i2.28971.

Abstract:
The Istimâ' Ibtidâ'i course in the Arabic Language Education Department at the State University of Malang relies on the Al-‘Arabiyyah baina Yadaik material, which is available only as learning audio. The aims of this study were to: 1) use the material “Academic Arapça” to improve the listening skills of undergraduate students in the Arabic Language Education Department, and 2) improve the listening skills of undergraduate Arabic-language students after they use the material “Academic Arapça”. The research method is classroom action research with quantitative data processing. The results show that students' learning outcomes in listening averaged 71.57 on the pretest and 94.47 on the post-test. Furthermore, the N-Gain result is in the high category at 81.18%, which shows an improvement from the use of “Academic Arapça” in the learning process of the Arabic language department.
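For readers unfamiliar with the N-Gain figure quoted in this abstract, the following is a minimal sketch of the usual normalized-gain calculation. It assumes a maximum score of 100 (the abstract does not state the scale) and applies the formula to the reported class averages rather than to per-student scores, so the result need not match the reported percentage exactly. The function name `normalized_gain` is illustrative.

```python
def normalized_gain(pre: float, post: float, max_score: float = 100.0) -> float:
    """Normalized gain: the achieved improvement divided by the possible improvement."""
    if pre >= max_score:
        raise ValueError("pre-test score must be below the maximum score")
    return (post - pre) / (max_score - pre)

# Class averages reported in the abstract; max_score=100 is an assumption.
gain = normalized_gain(71.57, 94.47)
print(f"N-Gain from class averages: {gain:.2%}")
```

Under the commonly used convention, a gain of 0.7 or more is labelled "high", which is consistent with the category the abstract reports.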
23

Alkadri, Abdullah M., Abeer Elkorany, and Cherry Ahmed. "Enhancing Detection of Arabic Social Spam Using Data Augmentation and Machine Learning." Applied Sciences 12, no. 22 (November 10, 2022): 11388. http://dx.doi.org/10.3390/app122211388.

Abstract:
In recent years, people have tended to use online social platforms, such as Twitter and Facebook, to communicate with families and friends, read the latest news, and discuss social issues. As a result, spam content can easily spread across them. Spam detection is considered one of the important tasks in text analysis. Previous spam detection research focused on English content, with less attention to other languages, such as Arabic, where labeled data are often hard to obtain. In this paper, an integrated framework for Twitter spam detection is proposed to overcome this problem. The framework integrates data augmentation, natural language processing, and supervised machine learning algorithms to detect Arabic spam on the Twitter platform. Pre-trained word embedding vectors are used to augment the data. Different machine learning techniques, such as SVM, Naive Bayes, and Logistic Regression, were applied for spam detection. To demonstrate the effectiveness of this model, a real-life data set of Arabic tweets was collected and labeled. The results show that data augmentation increased the macro F1 score from 58% to 89%, with an overall accuracy of 92%, which outperforms the current state of the art.
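The abstract reports macro F1 alongside accuracy, which matters when spam is the minority class. Below is a minimal sketch of how macro F1 is computed, using made-up labels rather than the paper's data; the function name `macro_f1` is an illustrative assumption.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy, made-up labels: the classifier misses half of the spam.
y_true = ["ham"] * 8 + ["spam"] * 2
y_pred = ["ham"] * 8 + ["ham", "spam"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.2f}, macro F1 = {macro_f1(y_true, y_pred):.2f}")
```

In this toy case accuracy stays high while macro F1 is pulled down by the weak spam class, which is why both numbers are usually reported for imbalanced data.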
24

Zanona, Marwan Abo, Anmar Abuhamdah, and Bassam Mohammed El-Zaghmouri. "Arabic Hand Written Character Recognition Based on Contour Matching and Neural Network." Computer and Information Science 12, no. 2 (April 30, 2019): 126. http://dx.doi.org/10.5539/cis.v12n2p126.

Abstract:
The complexity of Arabic writing makes handwriting recognition very complex in terms of computer algorithms. Arabic handwriting recognition is highly important in modern applications. Contour analysis of a word image can extract special contour features that discriminate one character from another by means of feature vectors. This paper applies a set of pre-processing functions to handwritten Arabic characters, together with contour analysis, and feeds the resulting contour vector to a neural network for recognition. This set of pre-processing algorithms was selected after hundreds of tests and validations. The feed-forward neural network was trained using many patterns regardless of the Arabic font style, building a rigid recognition model. Because of the shortage of Arabic handwriting databases and datasets, testing was done on a non-standard data set. The presented algorithm achieved a recognition rate of about 97%.
25

Salem, Awsan, Nurul Akmar Emran, Azah Kamilah Muda, Zahriah Sahri, and Abdulrazzak Ali. "Missing values imputation in Arabic datasets using enhanced robust association rules." Indonesian Journal of Electrical Engineering and Computer Science 28, no. 2 (November 1, 2022): 1067. http://dx.doi.org/10.11591/ijeecs.v28.i2.pp1067-1075.

Abstract:
Missing value (MV) is one form of data completeness problem in massive datasets. To deal with missing values, data imputation methods were proposed with the aim to improve the completeness of the datasets concerned. Data imputation's accuracy is a common indicator of a data imputation technique's efficiency. However, the efficiency of data imputation can be affected by the nature of the language in which the dataset is written. To overcome this problem, it is necessary to normalize the data, especially in non-Latin languages such as the Arabic language. This paper proposes a method that will address the challenge inherent in Arabic datasets by extending the enhanced robust association rules (ERAR) method with Arabic detection and correction functions. Iterative and Decision Tree methods were used to evaluate the proposed method in an experiment. Experiment results show that the proposed method offers a higher data imputation accuracy than the Iterative and Decision Tree methods.
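To illustrate why normalization matters for Arabic datasets, here is a minimal sketch of a common normalization heuristic. It is not the paper's Arabic detection and correction functions, whose details the abstract does not give; the function name `normalize_arabic` and the choice of rules (strip diacritics and tatweel, unify alef variants) are assumptions.

```python
import re

# Tanwin, short vowels, shadda, sukun (U+064B-U+0652) and superscript alef (U+0670).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
_TATWEEL = "\u0640"  # elongation character used for justification
# Map alef with hamza above/below and alef with madda to bare alef (U+0627).
_ALEF_VARIANTS = str.maketrans({"\u0623": "\u0627", "\u0625": "\u0627", "\u0622": "\u0627"})

def normalize_arabic(text: str) -> str:
    """Return a canonical form so that spelling variants compare equal."""
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    text = text.translate(_ALEF_VARIANTS)
    return " ".join(text.split())  # collapse runs of whitespace

# Three spellings of the same name: fully vocalized, hamza only, and bare alef.
variants = [
    "\u0623\u064E\u062D\u0652\u0645\u064E\u062F",
    "\u0623\u062D\u0645\u062F",
    "\u0627\u062D\u0645\u062F",
]
print({normalize_arabic(v) for v in variants})
```

After normalization the three variants collapse to one value, so frequency counts or association rules computed over a column are not split across spellings.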
26

Mi, Chenggang, Shaolin Zhu, and Rui Nie. "Improving Loanword Identification in Low-Resource Language with Data Augmentation and Multiple Feature Fusion." Computational Intelligence and Neuroscience 2021 (April 8, 2021): 1–9. http://dx.doi.org/10.1155/2021/9975078.

Abstract:
Loanword identification is studied in recent years to alleviate data sparseness in several natural language processing (NLP) tasks, such as machine translation, cross-lingual information retrieval, and so on. However, recent studies on this topic usually put efforts on high-resource languages (such as Chinese, English, and Russian); for low-resource languages, such as Uyghur and Mongolian, due to the limitation of resources and lack of annotated data, loanword identification on these languages tends to have lower performance. To overcome this problem, we first propose a lexical constraint-based data augmentation method to generate training data for low-resource language loanword identification; then, a loanword identification model based on a log-linear RNN is introduced to improve the performance of low-resource loanword identification by incorporating features such as word-level embeddings, character-level embeddings, pronunciation similarity, and part-of-speech (POS) into one model. Experimental results on loanword identification in Uyghur (in this study, we mainly focus on Arabic, Chinese, Russian, and Turkish loanwords in Uyghur) showed that our proposed method achieves best performance compared with several strong baseline systems.
27

Cuetos, Fernando, and Graciela Miera. "Number Processing Dissociations: Evidence from a Case of Dyscalculia." Spanish Journal of Psychology 1 (May 1998): 18–31. http://dx.doi.org/10.1017/s1138741600005370.

Abstract:
In this case study of an aphasic patient with difficulties in numerical processing, the patient responded to a series of linguistic and numerical tasks designed to assess efficiency levels in processing various linguistic components. In addition, the patient completed a series of transcoding tasks that were directed at isolating whether the problems were associated primarily with arabic numerals or with other modalities (spoken or written). Data were analyzed using chi-square goodness-of-fit tests. Statistically significant differences were obtained between spoken verbal and written verbal outputs and between arabic and spoken verbal outputs. Based upon an analysis of errors, it was tentatively concluded that the disorders were associated with two types of dissociation operating together, one between spoken verbal and written verbal outputs at the syntactical level and the other between lexical and syntactical components in the spoken verbal output. A revised model is proposed to provide a tentative explanation for these observations.
28

Alzahrani, Abdullah Ibrahim Abdullah, and Syed Zohaib Javaid Zaidi. "Recent developments in information extraction approaches from Arabic tweets on social networking sites." International Journal of ADVANCED AND APPLIED SCIENCES 9, no. 9 (September 2022): 145–52. http://dx.doi.org/10.21833/ijaas.2022.09.018.

Abstract:
Information extraction from Arabic tweets has attracted the attention of researchers due to the huge data accessibility for the swift expansion of social media platforms. With the increasing use of social web applications, information extraction from the various platforms has gained importance for understanding the trending post and events predictions based on those sentiments written by the users on certain news feeds. The Arabic Language is mostly used in Middle Eastern and African countries and most users tweet on social media using the Arabic language, therefore Arabic text classification and sentiment analysis aimed to predict information extraction from social media platforms. This research provides a more detailed critical review of the information extraction presented in the literature focused on using different tools, methods, and techniques like k-NN, support vector machines, Naïve Bayes, and other machine learning tools for the data extraction and processing.
29

Duwairi, Rehab, and Ftoon Abushaqra. "Syntactic- and morphology-based text augmentation framework for Arabic sentiment analysis." PeerJ Computer Science 7 (April 5, 2021): e469. http://dx.doi.org/10.7717/peerj-cs.469.

Abstract:
Arabic is a challenging language for automatic processing, for several intrinsic reasons such as its multiple dialects, ambiguous syntax, syntactical flexibility, and diacritics. Machine learning and deep learning frameworks require big datasets for training to ensure accurate predictions. This leads to another challenge faced by researchers using Arabic text, as high-quality Arabic textual datasets are still scarce. In this paper, an intelligent framework for expanding or augmenting Arabic sentences is presented. The sentences were initially labelled by human annotators for sentiment analysis. The novel approach presented in this work relies on the rich morphology of Arabic, synonymy lists, syntactical or grammatical rules, and negation rules to generate new sentences from the seed sentences with their proper labels. Most augmentation techniques target image or video data. This study is the first work to target text augmentation for the Arabic language. Using this framework, we were able to increase the size of the initial seed datasets tenfold. Experiments that assess the impact of this augmentation on sentiment analysis showed a 42% average increase in accuracy, due to the reliability and the high quality of the rules used to build this framework.
30

Almuqren, Latifah, and Alexandra Cristea. "AraCust: a Saudi Telecom Tweets corpus for sentiment analysis." PeerJ Computer Science 7 (May 20, 2021): e510. http://dx.doi.org/10.7717/peerj-cs.510.

Abstract:
Compared with other languages, Arabic lacks large corpora for Natural Language Processing (Assiri, Emam & Al-Dossari, 2018; Gamal et al., 2019). A number of scholars have depended on translation from one language to another to construct their corpus (Rushdi-Saleh et al., 2011). This paper presents how we have constructed, cleaned, pre-processed, and annotated our 20,000-tweet Gold Standard Corpus (GSC), AraCust, the first Telecom GSC for Arabic Sentiment Analysis (ASA) for Dialectal Arabic (DA). AraCust contains Saudi dialect tweets, processed from a self-collected Arabic tweets dataset, and has been annotated for sentiment analysis, i.e., manually labelled (k=0.60). In addition, we have illustrated AraCust’s power by performing an exploratory data analysis to analyse the features that were sourced from the nature of our corpus, to assist with choosing the right ASA methods for it. To evaluate our Gold Standard Corpus AraCust, we have first applied a simple experiment, using a supervised classifier, to offer benchmark outcomes for forthcoming works. In addition, we have applied the same supervised classifier to a publicly available Arabic dataset created from Twitter, ASTD (Nabil, Aly & Atiya, 2015). The result shows that our dataset AraCust outperforms the ASTD result with 91% accuracy and 89% F1avg score. The AraCust corpus will be released, together with code useful for its exploration, via GitHub as a part of this submission.
31

Abbas, Samah, Hassanin Al-Barhamtoshy, and Fahad Alotaibi. "Towards an Arabic Sign Language (ArSL) corpus for deaf drivers." PeerJ Computer Science 7 (November 19, 2021): e741. http://dx.doi.org/10.7717/peerj-cs.741.

Abstract:
Sign language is a common language that deaf people around the world use to communicate with others. However, hearing people are generally not familiar with sign language (SL), and they do not need to learn it to communicate with deaf people in everyday life. Several technologies offer possibilities for overcoming these barriers, assisting deaf people, and facilitating their active lives, including natural language processing (NLP), text understanding, machine translation, and sign language simulation. In this paper, we mainly focus on the problem faced by the deaf community in Saudi Arabia as an important member of the society that needs assistance in communicating with others, especially in the field of work as a driver. Therefore, this community needs a system that facilitates communication with users, using NLP to translate Arabic Sign Language (ArSL) into voice and vice versa. Thus, this paper aims to publish our dataset dictionary and the ArSL corpus videos created in our previous work. Furthermore, we illustrate our corpus, data determination (deaf driver terminologies), and dataset creation and processing in order to implement the proposed future system. The dataset is evaluated using two methods. The first is an evaluation by four expert signers, where the result was 10.23% WER. The second uses Cohen’s Kappa to evaluate the corpus of ArSL videos made by three signers from different regions of Saudi Arabia. We found that the agreement between signer 2 and signer 3 is 61%, which is a good agreement. In our future direction, we will use the ArSL video corpus of signer 2 and signer 3 to implement ML techniques for our deaf driver system.
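The agreement figure in this abstract is based on Cohen's Kappa. The following is a minimal sketch of that statistic for two raters assigning categorical labels; it assumes each item receives one label per rater (the abstract does not say how the signers' videos were coded), and the function name and toy labels are illustrative.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability that both raters pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Made-up judgments for ten items (not from the paper).
rater_a = ["ok", "ok", "ok", "bad", "bad", "ok", "bad", "ok", "ok", "bad"]
rater_b = ["ok", "ok", "ok", "bad", "bad", "ok", "ok", "bad", "ok", "bad"]
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 8 of 10 items, but kappa is lower than 0.8 because part of that agreement would be expected by chance.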
32

El Adlouni, Yassine, Noureddine En Nahnahi, Said Ouatik El Alaoui, Mohammed Meknassi, Horacio Rodríguez, and Nabil Alami. "Arabic Biomedical Community Question Answering Based on Contextualized Embeddings." International Journal of Intelligent Information Technologies 17, no. 3 (July 2021): 13–29. http://dx.doi.org/10.4018/ijiit.2021070102.

Abstract:
Community question answering platforms have become increasingly important, as they are practical for seeking and sharing information. Applying deep learning models often leads to good performance, but it requires an extensive amount of annotated data, a problem exacerbated for languages suffering a scarcity of resources. Contextualized language representation models have gained success due to promising results obtained on a wide array of downstream natural language processing tasks such as text classification, textual entailment, and paraphrase identification. This paper presents a novel approach that fine-tunes contextualized embeddings for a medical-domain community question answering task. The authors propose an architecture combining two neural models powered by pre-trained contextual embeddings to learn a sentence representation, which is thereafter fine-tuned on the task to compute a score used for both ranking and classification. The experimental results on SemEval Task 3 CQA show that the model significantly outperforms the state-of-the-art models by almost 2% for the '16 edition and 1% for the '17 edition.
33

Oktaviani, Selpi, and Maman Abdurrahman. "Analisis Pembelajaran Komunikasi Arab Dalam Pembelajaran Bahasa Arab Di SMA Aisyiyah Boarding School Bandung." Tsaqofiya : Jurnal Pendidikan Bahasa dan Sastra Arab 3, no. 2 (July 31, 2021): 148–57. http://dx.doi.org/10.21154/tsaqofiya.v3i2.73.

Abstract:
This research is motivated by the difficulties experienced by students when learning Arabic communication takes place. Therefore, this study aims to describe Arabic communication learning, starting from planning, processing, evaluation and the obstacles that students feel when learning Arabic communication activities. This research uses a qualitative approach with descriptive methods. The sample in this study were students of class X Social and teachers who taught Arabic communication. Data collection techniques used were observation, interviews, documentation and student questionnaires. To analyze the data, researchers used qualitative data analysis with data verification steps, data presentation and data reduction. The results showed that 1) Arabic communication learning planning by the teacher was well prepared, the teacher used the 2013 curriculum and Arabic communication modules that were compiled by themselves 2) The Arabic communication learning process went well and according to the procedure but there were some students who were slow to understand the material provided by the teacher 3) Evaluation of Arabic Communication learning including daily exams, midterm and final semester exams including oral tests, written tests and group assignments. 4) The obstacles experienced by students in learning Arabic communication, namely; the lack of student interest in learning Arabic, the lack of self-confidence when communicating using Arabic, the difference in the student's school background that makes students' Arabic language skills diverse
34

Al-Janabi, Adel, Ehsan Ali Kareem, and Radhwan Hussein Abdulzhraa Al Sagheer. "Encapsulation of semantic description with syntactic components for the Arabic language." Indonesian Journal of Electrical Engineering and Computer Science 22, no. 2 (May 1, 2021): 961. http://dx.doi.org/10.11591/ijeecs.v22.i2.pp961-967.

Abstract:
The work presents new theoretical equipment for the representation of natural languages (NL) in computers. Linguistics (morphology, semantics, and syntax) is also presented as components of subtle computer science that form a structure and an integrated data system. The presented theory of language is a new method for learning the language by separating the fields of semantics and syntax.
35

Baarid, Nurul Aini, and Kamal Yusuf. "ISU DAN TREN PENELITIAN PENGEMBANGAN BAHAN AJAR BAHASA ARAB TAHUN 2017-2020." Al Mi'yar: Jurnal Ilmiah Pembelajaran Bahasa Arab dan Kebahasaaraban 4, no. 1 (April 21, 2021): 139. http://dx.doi.org/10.35931/am.v4i1.524.

Abstract:
The aim of this study is to determine the research trend in the development of Arabic teaching materials in 2017-2020 among postgraduate students at UIN Sunan Ampel in Surabaya, and whether this trend is consistent with the development of Industrial Revolution 4.0, which is linked by a system that unites the real and virtual worlds, or vice versa. The research method used is descriptive analysis, carried out by collecting library data and recording and processing literacy data related to these issues. The results showed that research on the development of Arabic teaching materials to improve the four language skills ranked first as the most frequent research topic, with a percentage of 0.6%; second was research on the design of Arabic teaching materials, with a percentage of 0.08%; third was research on the Arabic learning curriculum for comparative studies and its development, with a percentage of 0.08%; then research on the pedagogical and professional competences of teachers and their effects on student achievement, with a percentage of 0.07%; then technology-based research for learning Arabic, with a value of 0.07%; research on the influence of the language environment, at 0.04%; then research on the development of an assessment instrument, at 0.02%; and research on strategies, media, and methods of learning Arabic, as well as book analysis research, which each ranked last with a percentage of 0.01% as the least frequent research topic. Keywords: Development research, Teaching materials, Arabic language
36

Baazeem, Ibtehal, Hend Al-Khalifa, and Abdulmalik Al-Salman. "Cognitively Driven Arabic Text Readability Assessment Using Eye-Tracking." Applied Sciences 11, no. 18 (September 16, 2021): 8607. http://dx.doi.org/10.3390/app11188607.

Abstract:
Using physiological data helps to identify the cognitive processing in the human brain. One method of obtaining these behavioral signals is by using eye-tracking technology. Previous cognitive psychology literature shows that readable and difficult-to-read texts are associated with certain eye movement patterns, which has recently encouraged researchers to use these patterns for readability assessment tasks. However, although it seems promising, this research direction has not been explored adequately, particularly for Arabic. The Arabic language is defined by its own rules and has its own characteristics and challenges. There is still a clear gap in determining the potential of using eye-tracking measures to improve Arabic text. Motivated by this, we present a pilot study to explore the extent to which eye-tracking measures enhance Arabic text readability. We collected the eye movements of 41 participants while reading Arabic texts to provide real-time processing of the text; these data were further analyzed and used to build several readability prediction models using different regression algorithms. The findings show an improvement in the readability prediction task, which requires further investigation. To the best of our knowledge, this work is the first study to explore the relationship between Arabic readability and eye movement patterns.
37

Nahar, Khalid, Ra’ed Al-Khatib, Moy'awiah Al-Shannaq, Mohammad Daradkeh, and Rami Malkawi. "Direct Text Classifier for Thematic Arabic Discourse Documents." International Arab Journal of Information Technology 17, no. 3 (May 1, 2019): 394–403. http://dx.doi.org/10.34028/iajit/17/3/13.

Abstract:
Maintaining topical coherence while writing a discourse is a major challenge confronting novice and non-novice writers alike. This challenge is even more intense in Arabic discourse because of the complex morphology and the prevalence of synonyms in the Arabic language. In this research, we present a direct classification of Arabic discourse documents while they are being written. The proposed prescriptive framework consists of the following stages: data collection, pre-processing, construction of a Language Model (LM), topic identification, topic classification, and topic notification. To demonstrate our proposed framework, we designed a system and applied it to a corpus of 2800 Arabic discourse documents synthesized into four predefined topics: Culture, Economy, Sport, and Religion. System performance was analysed in terms of accuracy, recall, precision, and F-measure. The results demonstrated that the proposed topic modeling-based decision framework is able to classify topics while a discourse is being written with an accuracy of 91.0%.
38

Khashawi, F. "Verbal and visual-spatial working memory performance in Arabic monolingual and English/Arabic bilingual Kuwaiti children." European Psychiatry 33, S1 (March 2016): S369. http://dx.doi.org/10.1016/j.eurpsy.2016.01.1323.

Abstract:
Introduction: Research in psycholinguistics focusing on cognitive processing in bilinguals and the role played by working memory in cognitive processing indicated that Working Memory (WM) was instrumental in cognitive processing in bilinguals, but that its role was different and generally more complex than it was in monolinguals. However, the specific manner in which the use of WM differed between monolinguals and bilinguals was not always clear. Objectives: This research explored verbal and visual-spatial WM performance in an Arabic monolingual group and a bilingual English/Arabic group. Methods: The participants were 396 Kuwaiti children (198 monolingual aged 7.99 ± 1.97 years and 198 bilingual aged 8.03 ± 1.92) with no significant age differences (t = 0.23, P > 0.05). The two groups were compared on their performance in the Automated Working Memory Assessment (AWMA), which measures verbal and visual-spatial WM tasks. The tasks were Listening Recall, Counting Recall, Mr. X, Backward Digit Recall, Odd-one-out and Spatial Span. All tasks were internally consistent (Alpha = 0.91, 0.93, 0.87, 0.88, 0.87, and 0.91 respectively). The data were analyzed using the Independent Sample t Test. Results: The findings showed a significant group difference, as the monolingual Arabic group (L1) performed better than the bilingual English/Arabic group (L2) on both verbal WM (t = 3.25, P < 0.002) and visuospatial WM (t = 3.04, P < 0.002). Conclusion: The monolingual children obtained higher scores on both verbal and visuospatial WM. These findings were explained in terms of the complexity of the Arabic language and the cultural context in which the second language is being practiced. This warrants further investigation. Disclosure of interest: The authors have not supplied their declaration of competing interest.
39

Kasmantoni, Kasmantoni, Noza Aflisia, and Isma Muhammad ‘Atiyah. "Arabic Practice in the Language Environment l Mumarasah al-Lughah al-‘Arabiyah fi Bi’ah Lughawiyyah." Jurnal Al Bayan: Jurnal Jurusan Pendidikan Bahasa Arab 14, no. 2 (December 31, 2022): 470–85. http://dx.doi.org/10.24042/albayan.v14i2.12514.

Abstract:
The concept of learning speaking skills was urgently discussed to become a cornerstone in practice. Many students do not correctly understand the concept of speaking Arabic, which is the characteristic of someone who is proficient in Arabic. The purpose of this study was to analyze the idea of Mohammed Mohammed Imam Daoud on learning speaking skills in the Arabic language. A qualitative descriptive method was used as the research design. Data analysis techniques included formulating research problems, conducting literature studies, determining units of observation and units of analysis, creating categorization and coding guidelines, coding data, and processing data. The results of the research that the ability to be tired of the meanings of the soul in different approvals and the ability to use vocabulary in the social context. The method of speaking skill followed Muhammad Daoud's approach to listening skill and speaking skill. The adab of learning speaking skills comprises fidelity, seriousness in training, and continuity, so that learners become skillful. Information produces knowledge, but training and practice create skill, along with cooperation with the teacher and colleagues and humility in their lives. Success in speaking can also be achieved through the application of Arabic in the language environment. Muhammad Imam Dawood's idea of teaching speaking skills was important for developing the quality of learning speaking skills at the primary, intermediate and advanced levels. The application of Arabic in practice would increase fluency in speaking performance.
40

Alluhaibi, Reyadh, Tareq Alfraidi, Mohammad A. R. Abdeen, and Ahmed Yatimi. "A Comparative Study of Arabic Part of Speech Taggers Using Literary Text Samples from Saudi Novels." Information 12, no. 12 (December 15, 2021): 523. http://dx.doi.org/10.3390/info12120523.

Abstract:
Part of Speech (POS) tagging is one of the most common techniques used in natural language processing (NLP) applications and corpus linguistics. Various POS tagging tools have been developed for Arabic. These taggers differ in several aspects, such as their modeling techniques, tag sets, and training and testing data. In this paper we conduct a comparative study of five Arabic POS taggers, namely Stanford Arabic, CAMeL Tools, Farasa, MADAMIRA and Arabic Linguistic Pipeline (ALP), examining their performance using text samples from Saudi novels. The testing data has been extracted from different novels that represent different types of narration. The main result we have obtained indicates that the ALP tagger performs better than the others in this particular case, and that Adjective is the most frequently mistagged POS type compared with Noun and Verb.
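A comparison of this kind rests on two simple measurements per tagger: overall accuracy against a gold annotation, and a count of errors per gold tag. The sketch below is illustrative only. The tag sequences are made up, the helper names are hypothetical, and no real tagger from the study is called.

```python
from collections import Counter

def accuracy(gold_tags, predicted_tags):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)

def mistag_counts(gold_tags, predicted_tags):
    """Count, per gold POS tag, how often the prediction differs from it."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must align token by token")
    errors = Counter()
    for gold, pred in zip(gold_tags, predicted_tags):
        if gold != pred:
            errors[gold] += 1
    return errors

# Toy, invented tag sequences standing in for one tagger's output.
gold = ["NOUN", "ADJ", "VERB", "NOUN", "ADJ", "NOUN"]
pred = ["NOUN", "NOUN", "VERB", "NOUN", "NOUN", "VERB"]
errors = mistag_counts(gold, pred)
print(accuracy(gold, pred), errors.most_common(1))
```

Running the same two functions over each tagger's output on a shared gold sample gives directly comparable figures, provided the tag sets have first been mapped to a common inventory.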
41

Tharwat, Gamal, Abdelmoty M. Ahmed, and Belgacem Bouallegue. "Arabic Sign Language Recognition System for Alphabets Using Machine Learning Techniques." Journal of Electrical and Computer Engineering 2021 (October 25, 2021): 1–17. http://dx.doi.org/10.1155/2021/2995851.

Abstract:
In recent years, the role of pattern recognition in systems based on human-computer interaction (HCI) has spread in terms of computer vision applications and machine learning, and one of the most important of these applications is to recognize the hand gestures used in dealing with deaf people, in particular to recognize the dashed letters in surahs of the Quran. In this paper, we suggest an Arabic Alphabet Sign Language Recognition System (AArSLRS) using the vision-based approach. The proposed system consists of four stages: data processing, preprocessing of data, feature extraction, and classification. The system deals with three types of datasets: data with bare hands and a dark background, data with bare hands but a light background, and data with hands wearing dark-colored gloves. AArSLRS begins by obtaining an image of the alphabet gestures, then detects the hand in the image and isolates it from the background using one of the proposed methods, after which the hand features are extracted according to the selected extraction method. For the classification process in this system, we have used supervised learning techniques to classify the 28-letter Arabic alphabet using 9240 images. We focused on the classification of the 14 alphabetic letters that represent the first Quran surahs in the Quranic sign language (QSL). AArSLRS achieved an accuracy of 99.5% for the K-Nearest Neighbor (KNN) classifier.
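The final classification stage can be sketched with a plain K-Nearest Neighbor classifier. The example assumes that each image has already been reduced to a numeric feature vector. The two-dimensional features, the letter labels, and the function name `knn_predict` are all invented for illustration and do not come from the paper.

```python
import math
from collections import Counter

def knn_predict(train_features, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    distances = sorted(
        (math.dist(features, query), label)  # Euclidean distance (Python 3.8+)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy feature vectors: two well-separated clusters, one per letter.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
train_y = ["alif", "alif", "alif", "lam", "lam", "lam"]

print(knn_predict(train_x, train_y, (0.2, 0.2)))
print(knn_predict(train_x, train_y, (0.8, 0.8)))
```

KNN has no training step beyond storing the labeled vectors, so accuracy depends almost entirely on how well the extracted features separate the letters.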
42

Almarshedi, Raniyah Mohammad. "Identifying the domain and level of bilingualism amongst Saudi EFL learners." Linguistics and Culture Review 5, S1 (January 11, 2021): 1696–706. http://dx.doi.org/10.21744/lingcure.v5ns1.2213.

Abstract:
Human language processing in the context of bilingualism poses many questions. The centre of inquiry, however, remains the nature of the interaction between the bilinguals’ two language systems. Saudi higher education learners and society are bilingual, with English taking the place of L2 in practically all walks of life. However, dominance of the mother tongue and prevalent pedagogy and coursebooks can be detractors in their acquisition of native-like proficiency. This study evaluates the role of the two language systems (Arabic and English) in the lives of 93 sophomore language students at the College of Arts and Literature, University of Hail, Saudi Arabia. Using a questionnaire, the study gathered data that loaded onto specific domains, and the level of bilingualism of the respondents. Results indicate that professional use, academic advancement, using English at college meetings, and communicating with their teachers at college are the domains where students frequently use English. Findings also showed that students perceived their level of bilingualism as high. In other words, they scored high on the items which indicated that they could talk without thinking and were comfortable during their talk.
43

Hizbullah, Nur, Iin Suryaningsih, and Zaqiatul Mardiah. "MANUSKRIP ARAB DI NUSANTARA DALAM TINJAUAN LINGUISTIK KORPUS." Arabi : Journal of Arabic Studies 4, no. 1 (July 1, 2019): 65. http://dx.doi.org/10.24865/ajas.v4i1.145.

Abstract:
The history and development of Islam in Indonesia are enriched by the existence of manuscripts written in the Arabic language, or written in Arabic script such as Pegon or Jawi even though they do not use Arabic. In the context of corpus linguistics, these manuscripts are proof of the existence and dynamics of real Arabic usage by Indonesian speakers. This paper describes several classifications of manuscripts written in Arabic and their urgency as material for Arabic corpus data in Indonesia in the context of the development of multidisciplinary Arabic research. Furthermore, the manuscripts are mapped onto seven types of Arabic corpus in Indonesia. Based on the mapping, it is projected that the majority of Arabic manuscripts in the archipelago are categorized as a corpus of scientific works, a corpus of Islamic studies, and a corpus of literary works. For this purpose, it is necessary to process those manuscripts into digital text material to be analyzed with corpus processing applications through three stages: image scanning, image conversion into text, and manual text verification.
44

Syamsudar, Syamsudar. "THE EFFORTS TO OVERCOME THE DIFFICULTIES OF LEARNING ARABIC IN MTs. YASRIB LAPAJUNG BOARDING SCHOOL." JICSA (Journal of Islamic Civilization in Southeast Asia) 8, no. 1 (June 29, 2019): 157. http://dx.doi.org/10.24252/jicsa.v8i1.9845.

Abstract:
This study aimed: 1) to describe the learning process in MTs. Yasrib Lapajung Boarding School, 2) to reveal the kinds of difficulties in learning Arabic in MTs. Yasrib Lapajung Boarding School, and 3) to describe the efforts made by Arabic teachers to overcome the difficulties of learning Arabic in MTs. Yasrib Lapajung Boarding School. This research was descriptive and qualitative, analyzing and describing the data in an objective and detailed manner to obtain accurate results. The data sources of this study were the headmaster, Arabic teachers, and students of MTs. Yasrib Lapajung Boarding School as informants. Data were collected through interviews, observation, and documentation. Data processing and qualitative analysis used three stages, namely: 1) data reduction, 2) data display, and 3) conclusion. The results showed that, first, the process of learning Arabic in MTs. Yasrib Lapajung Boarding School was divided into three parts: learning plan, implementation, and evaluation. Second, the students' difficulties in learning Arabic emerged in istima' (listening), kalam (speaking), qira'ah (reading), and kitabah (writing). Third, to overcome the learning difficulties related to these aspects of language skills, the teacher asked the students to memorize mufradat (vocabulary) and translate a sentence, to write Arabic by dictation (imla'), and to practice conversation so that the students became accustomed to speaking Arabic; the teacher also fostered the students' motivation and interest, and held extracurricular activities.
45

.., Hani D., Ahmed A. Khamees, and Said A. Salloum. "Opinion mining for Arabic dialect in social media: A systematic review." International Journal of Advances in Applied Computational Intelligence 1, no. 2 (2022): 08–28. http://dx.doi.org/10.54216/ijaaci.010201.

Abstract:
The huge volume of text generated on social media in Arabic, especially in Arabic dialects, has become more attractive for Natural Language Processing (NLP) as a source of useful and structured information that benefits many domains. The more challenging point is that this content is mostly written in an Arabic dialect, and the problem with these dialects is that they have no written rules like Modern Standard Arabic (MSA) or traditional Arabic, and they change slowly but unexpectedly. One of the ways to benefit from this huge amount of data is opinion mining, so we introduce this systematic review of opinion mining from Arabic dialect text for the years 2016 to 2019. We have found that Saudi, Egyptian, Algerian, and Jordanian are the most studied dialects, even though work on them is still under development and needs more effort; nevertheless, dialects like Mauritanian, Yemeni, Libyan, and Somali were not studied in this period. We have also identified the main methods that showed good results in the last four years.
46

Elfaik, Hanane, and El Habib Nfaoui. "Deep Bidirectional LSTM Network Learning-Based Sentiment Analysis for Arabic Text." Journal of Intelligent Systems 30, no. 1 (December 31, 2020): 395–412. http://dx.doi.org/10.1515/jisys-2020-0021.

Abstract:
Sentiment analysis aims to predict the sentiment polarity (positive, negative or neutral) of a given piece of text. It lies at the intersection of many fields such as Natural Language Processing (NLP), Computational Linguistics, and Data Mining. Sentiments can be expressed explicitly or implicitly. Arabic Sentiment Analysis is a challenging undertaking due to its complexity, ambiguity, various dialects, the scarcity of resources, the morphological richness of the language, the absence of contextual information, and the absence of explicit sentiment words in an implicit piece of text. Recently, deep learning has shown great success in the field of sentiment analysis and is considered the state-of-the-art model in Arabic Sentiment Analysis. However, the state-of-the-art accuracy for Arabic sentiment analysis still needs improvement regarding contextual information and implicit sentiment expressed in different real cases. In this paper, an efficient Bidirectional LSTM Network (BiLSTM) is investigated to enhance Arabic Sentiment Analysis by applying forward-backward processing to encapsulate contextual information from Arabic feature sequences. The experimental results on six benchmark sentiment analysis datasets demonstrate that our model achieves significant improvements over the state-of-the-art deep learning models and the baseline traditional machine learning methods.
47

Hussein, Jabbar S., Abdulkadhim A. Salman, and Thmer R. Saeed. "Arabic speaker recognition using HMM." Indonesian Journal of Electrical Engineering and Computer Science 23, no. 2 (August 1, 2021): 1212. http://dx.doi.org/10.11591/ijeecs.v23.i2.pp1212-1218.

Abstract:
In this paper, a new system for speaker recognition using the hidden Markov model (HMM) algorithm is suggested. Much research has been written on this subject, especially using HMM. Arabic is one of the difficult languages and little work has been done with it; moreover, the work has been done for a text-dependent system, where HMM is very effective and the algorithm is trained at the word level. One of the problems in such systems is noise, so we take it into consideration by adding additive white Gaussian noise (AWGN) to the speech signals to see its effect. Here, we used HMM with a new algorithm with one state, where two of its components (π and A) are removed. This greatly accelerates the training and testing stages of recognition, with the lowest memory usage, as seen in the work. The results show an excellent outcome: a 100% recognition rate for the tested data, and about 91.6% recognition rate with AWGN noise.
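With a single state, the initial distribution π and the transition matrix A carry no information, so scoring a sequence reduces to summing the emission log-densities of its frames. The sketch below illustrates that reduction under a loud assumption: emissions are modeled as a diagonal Gaussian, which the abstract does not state. The speaker names, feature values, and function names are hypothetical.

```python
import math

def fit_gaussian(frames):
    """Estimate a per-dimension mean and variance from a list of feature frames."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-6)  # floor avoids zero variance
        for d in range(dims)
    ]
    return means, variances

def log_likelihood(frames, model):
    """Sum of per-frame log densities; with one state there are no transition terms."""
    means, variances = model
    total = 0.0
    for frame in frames:
        for x, mu, var in zip(frame, means, variances):
            total += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return total

def identify(frames, models):
    """Return the speaker whose model assigns the frames the highest likelihood."""
    return max(models, key=lambda speaker: log_likelihood(frames, models[speaker]))

# Toy two-dimensional features standing in for real acoustic frames.
models = {
    "speaker_a": fit_gaussian([[0.0, 1.0], [0.2, 1.1], [-0.1, 0.9]]),
    "speaker_b": fit_gaussian([[2.0, -1.0], [2.1, -0.8], [1.9, -1.2]]),
}
print(identify([[0.1, 1.0], [0.0, 0.95]], models))
```

Training is a single pass to compute means and variances and scoring is a single pass over the frames, which is consistent with the speed and memory savings the abstract describes.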
48

KHEMAKHEM, AIDA, BILEL GARGOURI, ABDELMAJID BEN HAMADOU, and GIL FRANCOPOULO. "ISO standard modeling of a large Arabic dictionary." Natural Language Engineering 22, no. 6 (September 7, 2015): 849–79. http://dx.doi.org/10.1017/s1351324915000224.

Abstract:
In this paper, we address the problem of large-coverage dictionaries of the Arabic language usable both for direct human reading and for automatic Natural Language Processing. For these purposes, we propose a normalized and implemented modeling, based on the Lexical Markup Framework (LMF, ISO 24613) and the Data Category Registry (DCR, ISO 12620), which allows a stable and well-defined interoperability of lexical resources through a unification of the linguistic concepts. Starting from the features of the Arabic language, and because a large range of details and refinements need to be described specifically for Arabic, we follow a fine-grained structuring strategy. Besides its richness in morphological, syntactic and semantic knowledge, our model includes all the Arabic morphological patterns to generate the inflected forms from a given lemma, and highlights the syntactic–semantic relations. In addition, an appropriate codification has been designed for the management of all types of relationships among lexical entries and their related knowledge. According to this model, a dictionary named El Madar has been built and is now publicly available online. The data are managed by a user-friendly Web-based lexicographical workstation. This work has not been done in isolation, but is the result of a collaborative effort by an international team, mainly within the ISO network, over a period of eight years.
49

Alian, Marwah, Arafat Awajan, Ahmad Al-Hasan, and Raeda Akuzhia. "Building Arabic Paraphrasing Benchmark based on Transformation Rules." ACM Transactions on Asian and Low-Resource Language Information Processing 20, no. 4 (June 9, 2021): 1–17. http://dx.doi.org/10.1145/3446770.

Abstract:
Measuring semantic similarity between short texts is an important task in many applications of natural language processing, such as paraphrase identification. This process requires a benchmark of sentence pairs that are labeled by Arab linguists and considered a standard that researchers can use when evaluating their results. This research describes an Arabic paraphrasing benchmark intended as a standard for evaluating algorithms that measure semantic similarity between Arabic sentences in order to detect paraphrasing within the same language. The transformed sentences are in accordance with a set of rules for Arabic paraphrasing. These sentences are constructed from the words in the Arabic word semantic similarity dataset and from different Arabic books, educational texts, and lexicons. The proposed benchmark consists of 1,010 sentence pairs, wherein each pair is tagged with scores determining semantic similarity and paraphrasing. The quality of the data is assessed using statistical analysis of the distribution of the sentences over the Arabic transformation rules and exploration through hierarchical clustering (HCL). Our exploration using HCL shows that the sentences in the proposed benchmark are grouped into 27 clusters representing different subjects. The inter-annotator agreement measures show moderate agreement for the annotations of the graduate students and poor reliability for the annotations of the undergraduate students.
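Inter-annotator agreement of the kind reported here is often summarized with a chance-corrected statistic. The sketch below computes Cohen's kappa for two annotators assigning categorical labels. This is an assumption made for illustration: the abstract does not say which agreement measures were used, and the labels shown are invented.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same items")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if each annotator labeled at random with their own label rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    # Undefined (division by zero) when both annotators always use one identical label.
    return (observed - expected) / (1 - expected)

# Toy labels: 1 = paraphrase, 0 = not a paraphrase.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 1]
annotator_2 = [1, 0, 0, 0, 1, 1, 1, 1]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Raw percentage agreement overstates reliability when one label dominates, which is why a chance-corrected measure is usually preferred for benchmark annotation.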
50

H. Aliwy, Ahmed, and Duaa A. Al_Raza. "Part of Speech Tagging for Arabic Long Sentence." International Journal of Engineering & Technology 7, no. 3.27 (August 15, 2018): 125. http://dx.doi.org/10.14419/ijet.v7i3.27.17671.

Abstract:
Part-of-speech (POS) tagging of Arabic words is a difficult and non-trivial task. It has been studied in detail for the last twenty years, and its performance affects many applications and tasks in the area of natural language processing (NLP). Sentences in Arabic are very long compared with English sentences. This affects the tagging process for any approach that deals with a complete sentence at once, as in the hidden Markov model (HMM) tagger. In this paper, a new approach is suggested for using HMM and n-gram taggers for tagging Arabic words in a long sentence. The suggested approach is very simple and easy to implement. It is implemented on a data set of 1000 documents of 526321 tokens annotated manually (containing punctuation). The results show that the suggested approach has higher accuracy than the HMM and n-gram taggers. The F-measures were 0.888, 0.925 and 0.957 for n-grams, HMM and the suggested approach, respectively.
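An HMM tagger decodes a whole sentence at once, typically with the Viterbi algorithm, which is why sentence length matters for it. The sketch below shows that baseline decoding step only. It is not the authors' proposed long-sentence approach, which the abstract does not detail. The probabilities, the transliterated tokens, and the function name are invented, and the function assumes a non-empty token list.

```python
import math

def viterbi(tokens, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `tokens` under a first-order HMM."""
    def lp(p):
        return math.log(max(p, floor))  # floor keeps unseen events finite

    # best[i][tag] = (log-probability of the best path ending in `tag` at i, previous tag)
    best = [{t: (lp(start_p.get(t, 0)) + lp(emit_p[t].get(tokens[0], 0)), None) for t in tags}]
    for token in tokens[1:]:
        column = {}
        for t in tags:
            prev, score = max(
                ((p, best[-1][p][0] + lp(trans_p[p].get(t, 0))) for p in tags),
                key=lambda item: item[1],
            )
            column[t] = (score + lp(emit_p[t].get(token, 0)), prev)
        best.append(column)

    # Backtrack from the best final tag to the start of the sentence.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]

# Toy model with made-up probabilities and transliterated tokens.
tags = ["NOUN", "VERB"]
start_p = {"NOUN": 0.4, "VERB": 0.6}
trans_p = {"NOUN": {"NOUN": 0.6, "VERB": 0.4}, "VERB": {"NOUN": 0.9, "VERB": 0.1}}
emit_p = {"NOUN": {"al-walad": 0.5, "al-dars": 0.5}, "VERB": {"kataba": 1.0}}

print(viterbi(["kataba", "al-walad", "al-dars"], tags, start_p, trans_p, emit_p))
```

Working in log space avoids numeric underflow, which becomes a practical concern as the number of tokens per sentence grows.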