Journal articles on the topic 'Arabic language – Computer network resources'

Consult the top 50 journal articles for your research on the topic 'Arabic language – Computer network resources.'


1

Mahmoud, Adnen, and Mounir Zrigui. "Semantic Similarity Analysis for Corpus Development and Paraphrase Detection in Arabic." International Arab Journal of Information Technology 18, no. 1 (December 31, 2020): 1–7. http://dx.doi.org/10.34028/iajit/18/1/1.

Full text
Abstract:
Paraphrase detection determines whether an original document and a suspect document convey the same meaning. It has attracted attention from researchers in many Natural Language Processing (NLP) tasks such as plagiarism detection, question answering, and information retrieval. Traditional methods (e.g., Term Frequency-Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA), and Latent Semantic Analysis (LSA)) cannot efficiently capture hidden semantic relations when sentences share no common words or when word co-occurrences are rare. Therefore, we proposed a deep learning model based on Global Word embedding (GloVe) and a Recurrent Convolutional Neural Network (RCNN), which is efficient at capturing contextual dependencies between word vectors with precise semantic meanings. Given the lack of publicly available resources for the Arabic language, we automatically developed a paraphrased corpus that preserves the syntactic and semantic structures of Arabic sentences using a word2vec model and Part-Of-Speech (POS) annotation. Overall, experiments showed that our proposed model outperformed state-of-the-art methods in terms of precision and recall.
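To make the limitation above concrete, here is a minimal sketch of bag-of-words cosine similarity, the kind of surface-overlap measure that TF-IDF-style methods rely on. The example sentences are invented and the authors' GloVe + RCNN model is not reproduced here; the point is only that paraphrases sharing no words score zero under overlap-based similarity.

```python
import math
from collections import Counter

def cosine_bow(a, b):
    """Cosine similarity over raw term-frequency (bag-of-words) vectors."""
    ta, tb = Counter(a.split()), Counter(b.split())
    dot = sum(ta[w] * tb[w] for w in set(ta) & set(tb))
    na = math.sqrt(sum(v * v for v in ta.values()))
    nb = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Paraphrases with no shared surface words: overlap-based similarity
# collapses to 0.0, even though the meaning is nearly identical.
print(cosine_bow("the boy bought a car", "that lad purchased an automobile"))
# An identical pair scores ~1.0, as expected.
print(cosine_bow("the boy bought a car", "the boy bought a car"))
```

Embedding-based models aim to recover exactly these hidden relations that surface-overlap measures miss.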
APA, Harvard, Vancouver, ISO, and other styles
2

Alali, Muath, Nurfadhlina Mohd Sharef, Masrah Azrifah Azmi Murad, Hazlina Hamdan, and Nor Azura Husin. "Multitasking Learning Model Based on Hierarchical Attention Network for Arabic Sentiment Analysis Classification." Electronics 11, no. 8 (April 9, 2022): 1193. http://dx.doi.org/10.3390/electronics11081193.

Abstract:
Limited approaches have been applied to Arabic sentiment analysis for the five-point classification problem. These approaches are based on single-task learning with handcrafted features, which do not provide robust sentence representations. Recently, hierarchical attention networks have performed outstandingly well. However, when trained as single-task learners, these models do not exhibit superior performance or robust latent feature representations on small amounts of data, specifically for the Arabic language, which is considered a low-resource language. Moreover, these models are based on single-task learning and do not consider related tasks, such as ternary and binary tasks (cross-task transfer). Addressing these shortcomings, we regard the ternary and five-polarity tasks as related. We propose a multitask learning model based on a hierarchical attention network (MTLHAN) to learn the best sentence representation and improve model generalization, with a shared word encoder and attention network across both tasks, by training the three-polarity and five-polarity Arabic sentiment analysis tasks alternately and jointly. Experimental results showed outstanding performance of the proposed model, with high accuracies of 83.98%, 87.68%, and 84.59% on the LABR, HARD, and BRAD datasets, respectively, and a minimum macro mean absolute error of 0.632% on the Arabic tweets dataset for the five-point Arabic sentiment classification problem.
3

Chouikhi, Hasna, Mohammed Alsuhaibani, and Fethi Jarray. "BERT-Based Joint Model for Aspect Term Extraction and Aspect Polarity Detection in Arabic Text." Electronics 12, no. 3 (January 19, 2023): 515. http://dx.doi.org/10.3390/electronics12030515.

Abstract:
Aspect-based sentiment analysis (ABSA) is a method used to identify the aspects discussed in a given text and determine the sentiment expressed towards each aspect. This can provide a more fine-grained understanding of the opinions expressed in the text. The majority of Arabic ABSA techniques in use today rely significantly on repeated pre-processing and feature-engineering operations, as well as on outside resources (e.g., lexicons). In essence, there is a significant research gap in NLP with regard to the use of transfer learning (TL) techniques and language models for aspect term extraction (ATE) and aspect polarity detection (APD) in Arabic text. While TL has proven to be an effective approach for a variety of NLP tasks in other languages, its use in the context of Arabic has been relatively under-explored. This paper aims to address this gap by presenting a TL-based approach for ATE and APD in Arabic, leveraging the knowledge and capabilities of previously trained language models. The Arabic base version of the BERT model serves as the foundation for the proposed models, and different BERT implementations are contrasted. A reference ABSA dataset, the HAAD dataset, was used for the experiments. The experimental results demonstrate that our models surpass the baseline model and previously proposed approaches.
4

FATTAH, MOHAMED ABDEL, FUJI REN, and SHINGO KUROIWA. "SENTENCE ALIGNMENT USING FEED FORWARD NEURAL NETWORK." International Journal of Neural Systems 16, no. 06 (December 2006): 423–34. http://dx.doi.org/10.1142/s0129065706000822.

Abstract:
Parallel corpora have become an essential resource for work in multilingual natural language processing. However, sentence-aligned parallel corpora are more effective than non-aligned parallel corpora for cross-language information retrieval and machine translation applications. In this paper, we present a new approach to aligning sentences in bilingual parallel corpora based on a feed-forward neural network classifier. A feature parameter vector is extracted from the text pair under consideration. This vector contains text features such as length, punctuation score, and cognate score values. A set of manually prepared training data was used to train the feed-forward neural network; another set was used for testing. Using this new approach, we achieved an error reduction of 60% over the length-based approach when applied to English–Arabic parallel documents. Moreover, this approach is valid for any language pair and is quite flexible, since the feature parameter vector may contain more, fewer, or different features than those used in our system, such as a lexical match feature.
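The feature-vector idea can be sketched as follows. The concrete scoring formulas below (length ratio, punctuation overlap, shared-token cognate score) are illustrative guesses rather than the paper's exact definitions, and the feed-forward classifier that consumes the vector is omitted.

```python
# Hypothetical feature extraction for a candidate sentence pair.
def alignment_features(src, tgt):
    punct = set(".,;:?!")
    # Length feature: ratio of shorter to longer sentence (in characters).
    len_ratio = min(len(src), len(tgt)) / max(len(src), len(tgt))
    # Punctuation feature: agreement in punctuation-mark counts.
    p_src = sum(c in punct for c in src)
    p_tgt = sum(c in punct for c in tgt)
    punct_score = 1.0 if p_src == p_tgt == 0 else \
        min(p_src, p_tgt) / max(p_src, p_tgt, 1)
    # Cognate feature: tokens shared verbatim (digits, names, punctuation)
    # over the union of tokens.
    a, b = set(src.split()), set(tgt.split())
    cognate = len(a & b) / len(a | b)
    return [len_ratio, punct_score, cognate]

v = alignment_features("He arrived in 1999 .", "wasala fi 1999 .")
print(v)
```

A feed-forward network would then classify such vectors as aligned or not aligned; since only the vector changes per language pair, the scheme stays language-independent, as the abstract notes.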
5

Gul, Shabana, Rafi Ullah Khan, Mohib Ullah, Roman Aftab, Abdul Waheed, and Tsu-Yang Wu. "Tanz-Indicator: A Novel Framework for Detection of Perso-Arabic-Scripted Urdu Sarcastic Opinions." Wireless Communications and Mobile Computing 2022 (July 28, 2022): 1–9. http://dx.doi.org/10.1155/2022/9151890.

Abstract:
Automatic sarcasm detection in textual data is a crucial task in sentiment analysis. The problem is complex because sarcastic comments usually carry the opposite of their literal meaning and are context-driven. Sarcasm detection in comments written in Perso-Arabic-scripted Urdu is even more challenging due to limited online linguistic resources. In this research, we proposed Tanz-Indicator, a lexicon-based framework to detect sarcasm in user comments posted in Perso-Arabic-scripted Urdu. We use a lexicon of over 3000 sarcastic tweets and 100 sarcastic features for experimentation. We also train two machine learning models on the same data to compare the performance of the lexicon-based and machine learning-based models. The results show that the lexicon-based model correctly identified 48.5% of sarcastic and 23.5% of nonsarcastic tweets, with a recall of 69.6% and a precision of 87.9%. The recall rates of the Naïve Bayes and SVM-based machine learning models were 20.1% and 24.4%, with overall accuracies of 65.2% and 60.1%, respectively.
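The core of a lexicon-based detector can be illustrated with a toy sketch. The cue entries below are invented placeholders, not the paper's 3000-tweet lexicon or its 100 features, and real systems would score weighted cue matches rather than a single boolean.

```python
# Hypothetical sarcasm-cue lexicon (placeholder entries, romanized for
# readability; the real resource is Perso-Arabic-scripted Urdu).
SARCASM_LEXICON = {"wah", "great job", "zabardast"}

def is_sarcastic(comment, lexicon=SARCASM_LEXICON):
    """Flag a comment as sarcastic if any lexicon cue appears in it."""
    text = comment.lower()
    return any(cue in text for cue in lexicon)

print(is_sarcastic("Wah, what a performance"))   # cue match
print(is_sarcastic("the match starts at nine"))  # no cue
```

The appeal of this approach for low-resource settings is that it needs no labeled training corpus beyond the lexicon itself, which matches the comparison the paper draws against its Naïve Bayes and SVM baselines.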
6

Saleh, Hager, Sherif Mostafa, Lubna Abdelkareim Gabralla, Ahmad O. Aseeri, and Shaker El-Sappagh. "Enhanced Arabic Sentiment Analysis Using a Novel Stacking Ensemble of Hybrid and Deep Learning Models." Applied Sciences 12, no. 18 (September 7, 2022): 8967. http://dx.doi.org/10.3390/app12188967.

Abstract:
Sentiment analysis (SA) is a machine learning application that derives people’s opinions from text using natural language processing (NLP) techniques. Implementing Arabic SA is challenging for many reasons, including equivocation, numerous dialects, lack of resources, morphological diversity, lack of contextual information, and hiding of sentiment terms in implicit text. Deep learning models such as convolutional neural networks (CNN) and long short-term memory (LSTM) networks have brought significant improvements to the Arabic SA domain. Hybrid models that combine a CNN with an LSTM or a gated recurrent unit (GRU) have further improved the performance of single DL models. In addition, ensembles of deep learning models, especially stacking ensembles, are expected to increase the robustness and accuracy of the previous DL models. In this paper, we propose a stacking ensemble model that combines the prediction power of CNN and hybrid deep learning models to predict Arabic sentiment accurately. The stacking ensemble algorithm has two main phases. Three DL models were optimized in the first phase: a deep CNN, a hybrid CNN-LSTM, and a hybrid CNN-GRU. In the second phase, the outputs of these three separately pre-trained models were integrated with a support vector machine (SVM) meta-learner. To extract features for the DL models, the continuous bag of words (CBOW) and skip-gram models with 300-dimensional word embeddings were used. Arabic health services datasets (Main-AHS and Sub-AHS) and the Arabic sentiment tweets dataset (ASTD) were used to train and test the models. A number of well-known models, including the deep CNN, hybrid CNN-LSTM, hybrid CNN-GRU, and conventional ML algorithms, were used to compare the performance of the proposed ensemble model. We found that the proposed deep stacking model achieved the best performance compared to the previous models.
Based on the CBOW word embedding, the proposed model achieved the highest accuracy of 92.12%, 95.81%, and 81.4% for Main-AHS, Sub-AHS, and ASTD datasets, respectively.
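The two-phase stacking scheme can be shown schematically. The "base models" below are trivial keyword scorers standing in for the paper's CNN, CNN-LSTM, and CNN-GRU networks, and a thresholded average stands in for the SVM meta-learner; only the control flow of stacking is illustrated, not the actual models.

```python
def stack_predict(text, base_models, meta):
    # Phase 1: each pre-trained base model emits a sentiment score.
    scores = [m(text) for m in base_models]
    # Phase 2: the meta-learner combines the base models' outputs.
    return meta(scores)

# Hypothetical stand-ins for the three deep base learners.
cnn_like  = lambda t: 1.0 if "good" in t else 0.0
lstm_like = lambda t: 1.0 if "great" in t else 0.0
gru_like  = lambda t: 0.0 if "bad" in t else 0.5
# Stand-in for the SVM meta-learner: threshold the mean base score.
meta_svm  = lambda s: "positive" if sum(s) / len(s) >= 0.5 else "negative"

models = [cnn_like, lstm_like, gru_like]
print(stack_predict("good and great service", models, meta_svm))
print(stack_predict("bad service", models, meta_svm))
```

In the real pipeline the meta-learner is trained on the base models' held-out predictions, which is what lets stacking correct for the individual models' systematic errors.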
7

Butnaru, Andrei-Mădălin. "Machine learning applied in natural language processing." ACM SIGIR Forum 54, no. 1 (June 2020): 1–3. http://dx.doi.org/10.1145/3451964.3451979.

Abstract:
Machine learning is present in our lives now more than ever. One of the most researched areas in machine learning focuses on creating systems that are able to understand natural language. Natural language processing is a broad domain with a vast number of applications and a significant impact on society. In our current era, we rely on tools that can ease our lives. We can search through thousands of documents to find something that we need, but this can take a lot of time; having a system that can understand a simple query and return only relevant documents is more efficient. Although current approaches are well capable of understanding natural language, there is still room for improvement. This thesis studies multiple natural language processing tasks, presenting approaches on applications such as information retrieval, polarity detection, dialect identification [Butnaru and Ionescu, 2018], and automatic essay scoring [Cozma et al., 2018], as well as methods that can help other systems understand documents better. Some of the approaches described in this thesis employ kernel methods, especially string kernels. A method based on string kernels that can determine the dialect in which a document is written is presented in this thesis. The approach treats texts at the character level, extracting features in the form of p-grams of characters and combining several kernels, including the presence bits kernel and the intersection kernel. Kernel methods are also presented as a solution for measuring the complexity of a specific word. By combining multiple low-level features and high-level semantic features, the approach can determine whether a non-native speaker of a language perceives a word as complex. With a focus on string kernels, this thesis proposes two transductive methods that can improve the results obtained by employing string kernels.
One approach suggests using the pairwise string kernel similarities between samples from the training and test sets as features. The other method defines a simple self-training algorithm composed of two iterations: as usual, a classifier is trained over the training data, then it is used to predict the labels of the test samples; in the second iteration, the algorithm adds a predefined number of test samples to the training set for another round of training. These two transductive methods work by adapting the learning method to the test set. A novel cross-dialectal corpus is presented in this thesis. The Moldavian versus Romanian Corpus (MOROCO) [Butnaru and Ionescu, 2019a] contains over 30,000 samples collected from the news domain, split across six categories. Several studies can be conducted on this corpus, such as binary classification between Romanian and Moldavian samples, intra-dialect multi-class categorization by topic, and cross-dialect multi-class classification by topic. Two baseline approaches are presented for this collection of texts. One method is based on a simple string kernel model. The second approach consists of a character-level deep neural network that includes several Squeeze-and-Excitation blocks (SE-blocks). To the best of our knowledge, this is the first time an SE-block has been employed in a natural language processing context. This thesis also presents a method for German Dialect Identification based on a voting scheme that combines a character-level convolutional neural network, a long short-term memory network, and a model based on string kernels. Word sense disambiguation is still one of the challenges of the NLP domain. In this context, this thesis tackles the challenge and presents a novel disambiguation algorithm, known as ShotgunWSD [Butnaru and Ionescu, 2019b].
By treating the global disambiguation problem as multiple local disambiguation problems, ShotgunWSD is capable of determining the senses of words in an unsupervised and deterministic way, using WordNet as a resource. For this method to work, three functions that compute the similarity between two word senses are defined. The disambiguation algorithm works as follows. The document is split into multiple windows of words of a specific size. For each window, a brute-force algorithm computes every combination of senses for the words within that window, and each combination is scored using one of the three similarity functions. The last step merges the windows using prefix and suffix matching to form longer, more relevant windows. In the end, the formed windows are ranked by length and score, and the top ones, through a voting scheme, determine the sense of each word. Documents can contain a variable number of words, which can make them hard to use directly in machine learning. This thesis presents two novel approaches [Ionescu and Butnaru, 2019] that represent documents using a fixed number of features. Both methods are inspired by computer vision, and they work by first transforming the words within documents into word representations, such as word2vec. With words represented in this way, a k-means clustering algorithm can be applied over the words. The centroids of the formed clusters are gathered into a vocabulary, and each word in a document is then represented by the closest centroid from this vocabulary. Up to this point, both methods share the same steps. One approach computes the final representation of a document by calculating the frequency of each centroid found inside it. This method is named Bag of Super Word Embeddings (BOSWE) because each centroid can be viewed as a super word.
The second approach presented in this thesis, known as Vector of Locally-Aggregated Word Embeddings (VLAWE), computes the document representation by accumulating the differences between each centroid and each word vector associated with that centroid. This thesis also describes a new way to score essays automatically by combining a low-level string kernel model with a high-level semantic feature representation, namely the BOSWE representation. The methods described in this thesis exhibit state-of-the-art performance on multiple tasks. One fact supporting this claim is that the string kernel method employed for Arabic Dialect Identification obtained first place two years in a row at the Fourth and Fifth Workshops on NLP for Similar Languages, Varieties, and Dialects (VarDial). The same string kernel model obtained fifth place in the German Dialect Identification Closed Shared Task at the VarDial Workshop of EACL 2017. Second, the Complex Word Identification model scored third place in the CWI Shared Task of BEA-13 at NAACL 2018. Third, it is worth mentioning that the ShotgunWSD algorithm surpassed the MCS baseline on several datasets. Lastly, the model that combines string kernels and bag of super word embeddings obtained state-of-the-art performance on the Automated Student Assessment Prize dataset.
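The BOSWE representation described in this abstract can be sketched in a few lines: words map to their nearest centroid ("super word"), and a document becomes a histogram of centroid frequencies. The 2-d embeddings and centroids below are toy values, not vectors learned by k-means over word2vec.

```python
def nearest(v, centroids):
    """Index of the centroid closest to vector v (squared Euclidean)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(v, centroids[i])))

def boswe(doc_words, embed, centroids):
    """Bag of Super Word Embeddings: histogram of nearest-centroid counts."""
    hist = [0] * len(centroids)
    for w in doc_words:
        hist[nearest(embed[w], centroids)] += 1
    return hist

# Toy 2-d "embeddings" and centroids (in practice: word2vec + k-means).
embed = {"cat": (0.9, 0.1), "dog": (0.8, 0.2), "car": (0.1, 0.9)}
centroids = [(0.85, 0.15), (0.1, 0.9)]
print(boswe(["cat", "dog", "car", "cat"], embed, centroids))
```

VLAWE differs only in the aggregation step: instead of counting centroid hits, it sums the residuals between each word vector and its assigned centroid.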
8

Essam, Nader, Abdullah M. Moussa, Khaled M. Elsayed, Sherif Abdou, Mohsen Rashwan, Shaheen Khatoon, Md Maruf Hasan, Amna Asif, and Majed A. Alshamari. "Location Analysis for Arabic COVID-19 Twitter Data Using Enhanced Dialect Identification Models." Applied Sciences 11, no. 23 (November 30, 2021): 11328. http://dx.doi.org/10.3390/app112311328.

Abstract:
The recent surge of social media networks has provided a channel to gather and publish vital medical and health information. The focal role of these networks has become more prominent in periods of crisis, such as the recent COVID-19 pandemic. These social networks have been the leading platform for broadcasting health news updates, precaution instructions, and governmental procedures. They also provide an effective means for gathering public opinion and tracking breaking events and stories. To achieve location-based analysis of social media input, the location information of the users must be captured. Most of the time, this information is either missing or hidden. For some languages, such as Arabic, users’ locations can be predicted from their dialects. The Arabic language has many local dialects across most Arab countries. Natural Language Processing (NLP) techniques have provided several approaches for dialect identification. Recent advanced language models using contextual word representations in the continuous domain, such as BERT models, have provided significant improvements for many NLP applications. In this work, we present our efforts to use BERT-based models to improve the dialect identification of Arabic text. We show the results of the developed models in recognizing the source country, or Arabic region, from Twitter data. Our results show a 3.4% absolute enhancement in dialect identification accuracy at the regional level over the state-of-the-art result. When we excluded the Modern Standard Arabic (MSA) set, which is the formal Arabic language, we achieved a 3% absolute gain in accuracy among the three major Arabic dialects over the state-of-the-art level. Finally, we applied the developed models to a recently collected resource of COVID-19 Arabic tweets to recognize the source country from the users’ tweets. We achieved a weighted average accuracy of 97.36%, suggesting a tool that policymakers could use to support country-level disaster-related activities.
9

Husain, Fatemah, and Ozlem Uzuner. "A Survey of Offensive Language Detection for the Arabic Language." ACM Transactions on Asian and Low-Resource Language Information Processing 20, no. 1 (April 2021): 1–44. http://dx.doi.org/10.1145/3421504.

Abstract:
The use of offensive language in user-generated content is a serious problem that needs to be addressed with the latest technology. The field of Natural Language Processing (NLP) can support the automatic detection of offensive language. In this survey, we review previous NLP studies that cover Arabic offensive language detection. This survey investigates the state-of-the-art in offensive language detection for the Arabic language, providing a structured overview of previous approaches, including core techniques, tools, resources, methods, and main features used. This work also discusses the limitations and gaps of the previous studies. Findings from this survey emphasize the importance of investing further effort in detecting Arabic offensive language, including the development of benchmark resources and the invention of novel preprocessing and feature extraction techniques.
10

Al-Moslmi, Tareq, Mohammed Albared, Adel Al-Shabi, Nazlia Omar, and Salwani Abdullah. "Arabic senti-lexicon: Constructing publicly available language resources for Arabic sentiment analysis." Journal of Information Science 44, no. 3 (February 1, 2017): 345–62. http://dx.doi.org/10.1177/0165551516683908.

Abstract:
Sentiment analysis is one of the most dynamic recent research fields in Natural Language Processing, driven by the quickly growing volume of Web opinion data. Most approaches in this field focus on English due to the lack of sentiment resources in other languages, such as the Arabic language and its large variety of dialects. In most sentiment analysis applications, good sentiment resources play a critical role. Accordingly, this article introduces several publicly available sentiment analysis resources for Arabic: the Arabic senti-lexicon, a list of 3880 positive and negative synsets annotated with their part of speech, polarity scores, dialect synsets, and inflected forms, and a Multi-domain Arabic Sentiment Corpus (MASC) containing 8860 positive and negative reviews from different domains. An in-depth study has been conducted on five types of feature sets to exploit effective features and investigate their effect on the performance of Arabic sentiment analysis. The aim is to assess the quality of the developed language resources and to integrate different feature sets and classification algorithms to synthesise a more accurate sentiment analysis method. The Arabic senti-lexicon is used for generating feature vectors. Five well-known machine learning algorithms (naïve Bayes, k-nearest neighbours, support vector machines (SVMs), logistic linear regression, and neural networks) are employed as base classifiers for each of the feature sets. A wide range of comparative experiments on standard Arabic data sets was conducted; a discussion is presented and conclusions are drawn. The experimental results show that the Arabic senti-lexicon is a very useful resource for Arabic sentiment analysis.
Moreover, results show that classifiers which are trained on feature vectors derived from the corpus using the Arabic sentiment lexicon are more accurate than classifiers trained using the raw corpus.
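The lexicon-to-feature-vector step can be illustrated with a toy sketch. The entries below are English placeholders rather than the article's 3880 Arabic synsets, and a real system would also exploit the polarity scores, dialect synsets, and inflected forms the lexicon provides.

```python
# Hypothetical miniature sentiment lexicon (placeholder entries).
POS_WORDS = {"excellent", "good", "wonderful"}
NEG_WORDS = {"bad", "terrible", "boring"}

def lexicon_features(tokens):
    """Derive a small feature vector from lexicon hits:
    [positive count, negative count, net polarity]."""
    pos = sum(t in POS_WORDS for t in tokens)
    neg = sum(t in NEG_WORDS for t in tokens)
    return [pos, neg, pos - neg]

review = "the film was excellent but the ending was boring"
print(lexicon_features(review.split()))
```

Vectors of this kind, rather than raw token counts, are what the base classifiers (naïve Bayes, SVM, etc.) are trained on, which is why the article finds lexicon-derived features more accurate than the raw corpus.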
11

Luqman, Hamzah, and El-Sayed M. El-Alfy. "Towards Hybrid Multimodal Manual and Non-Manual Arabic Sign Language Recognition: mArSL Database and Pilot Study." Electronics 10, no. 14 (July 20, 2021): 1739. http://dx.doi.org/10.3390/electronics10141739.

Abstract:
Sign languages are the main visual communication medium between hard-of-hearing people and their societies. Similar to spoken languages, they are not universal and vary from region to region, but they are relatively under-resourced. Arabic sign language (ArSL) is one such language and has attracted increasing attention in the research community. However, most existing work on sign language recognition systems focuses on manual gestures, ignoring non-manual information, such as facial expressions, needed for other language signals. One of the main obstacles to considering these modalities is the lack of suitable datasets. In this paper, we propose a new multi-modality ArSL dataset that integrates various types of modalities. It consists of 6748 video samples of fifty signs performed by four signers and collected using Kinect V2 sensors. This dataset will be freely available for researchers to develop and benchmark their techniques for further advancement of the field. In addition, we evaluated the fusion of spatial and temporal features of different modalities, manual and non-manual, for sign language recognition using state-of-the-art deep learning techniques. This fusion boosted the accuracy of the recognition system in signer-independent mode by 3.6% compared with manual gestures alone.
12

Kamruzzaman, M. M. "Arabic Sign Language Recognition and Generating Arabic Speech Using Convolutional Neural Network." Wireless Communications and Mobile Computing 2020 (May 23, 2020): 1–9. http://dx.doi.org/10.1155/2020/3685614.

Abstract:
Sign language encompasses the movement of the arms and hands as a means of communication for people with hearing disabilities. An automated sign recognition system requires two main courses of action: the detection of particular features and the categorization of particular input data. In the past, many approaches for classifying and detecting sign languages have been put forward to improve system performance. However, recent progress in the computer vision field has geared us towards further exploration of hand sign/gesture recognition with the aid of deep neural networks. Arabic sign language has witnessed unprecedented research activity to recognize hand signs and gestures using deep learning models. This paper proposes a vision-based system that applies a CNN to recognize Arabic hand sign-based letters and translate them into Arabic speech. The proposed system automatically detects hand sign letters and speaks out the result in Arabic using a deep learning model. The system achieves 90% accuracy in recognizing Arabic hand sign-based letters, which makes it highly dependable. The accuracy could be further improved by using more advanced hand-gesture recognition devices such as Leap Motion or Xbox Kinect. After the Arabic hand sign-based letters are recognized, the outcome is fed into a text-to-speech engine, which produces Arabic audio as output.
13

Halabi, Dana, Ebaa Fayyoumi, and Arafat Awajan. "I3rab: A New Arabic Dependency Treebank Based on Arabic Grammatical Theory." ACM Transactions on Asian and Low-Resource Language Information Processing 21, no. 2 (March 31, 2022): 1–32. http://dx.doi.org/10.1145/3472295.

Abstract:
Treebanks are valuable linguistic resources that include the syntactic structure of a language’s sentences in addition to part-of-speech tags and morphological features. They are mainly utilized in modeling statistical parsers. Although statistical natural language parsers have recently become more accurate for languages such as English, those for the Arabic language still have low accuracy. The purpose of this article is to construct a new Arabic dependency treebank based on traditional Arabic grammatical theory and the characteristics of the Arabic language, and to investigate their effects on the accuracy of statistical parsers. The proposed Arabic dependency treebank, called I3rab, contrasts with existing Arabic dependency treebanks in two main concepts. The first is the approach to determining the main word of the sentence, and the second is the representation of joined and covert pronouns. To evaluate I3rab, we compared its performance against a subset of the Prague Arabic Dependency Treebank that shares a comparable level of detail. The conducted experiments show that the percentage improvement reached up to 10.24% in UAS and 18.42% in LAS.
14

Elfaik, Hanane, and El Habib Nfaoui. "Deep Bidirectional LSTM Network Learning-Based Sentiment Analysis for Arabic Text." Journal of Intelligent Systems 30, no. 1 (December 31, 2020): 395–412. http://dx.doi.org/10.1515/jisys-2020-0021.

Abstract:
Sentiment analysis aims to predict the sentiment polarity (positive, negative, or neutral) of a given piece of text. It lies at the intersection of many fields, such as Natural Language Processing (NLP), Computational Linguistics, and Data Mining. Sentiments can be expressed explicitly or implicitly. Arabic sentiment analysis is a challenging undertaking due to the language’s complexity, ambiguity, and various dialects, the scarcity of resources, the morphological richness of the language, the absence of contextual information, and the absence of explicit sentiment words in implicit text. Recently, deep learning has shown great success in the field of sentiment analysis and is considered the state of the art for Arabic sentiment analysis. However, the state-of-the-art accuracy for Arabic sentiment analysis still needs improvement regarding contextual information and implicit sentiment expressed in different real cases. In this paper, an efficient Bidirectional LSTM network (BiLSTM) is investigated to enhance Arabic sentiment analysis, applying forward and backward passes to encapsulate contextual information from Arabic feature sequences. The experimental results on six benchmark sentiment analysis datasets demonstrate that our model achieves significant improvements over state-of-the-art deep learning models and baseline traditional machine learning methods.
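The forward-backward idea behind a BiLSTM can be shown schematically: each position's representation concatenates a left-to-right summary of everything before it with a right-to-left summary of everything after it. Running sums stand in for LSTM hidden states here; this is a sketch of the bidirectional wiring only, not the paper's network.

```python
def bilstm_like(values):
    """Pair each position with (forward summary, backward summary)."""
    fwd, acc = [], 0.0
    for v in values:            # forward pass, left to right
        acc += v
        fwd.append(acc)
    bwd, acc = [], 0.0
    for v in reversed(values):  # backward pass, right to left
        acc += v
        bwd.append(acc)
    bwd.reverse()
    # Concatenated [h_forward; h_backward] per time step.
    return list(zip(fwd, bwd))

print(bilstm_like([1.0, 2.0, 3.0]))
```

Because every output pair sees context from both directions, a classifier reading these representations can use a word's full sentence context, which is the property the paper exploits for Arabic sentiment.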
15

Al-Yahya, Maha, Hend Al-Khalifa, Heyam Al-Baity, Duaa AlSaeed, and Amr Essam. "Arabic Fake News Detection: Comparative Study of Neural Networks and Transformer-Based Approaches." Complexity 2021 (April 16, 2021): 1–10. http://dx.doi.org/10.1155/2021/5516945.

Abstract:
Fake news detection (FND) involves predicting the likelihood that a particular news article (news report, editorial, expose, etc.) is intentionally deceptive. Arabic FND started to receive more attention in the last decade, and many detection approaches have demonstrated some ability to detect fake news on multiple datasets. However, most existing approaches do not consider recent advances in natural language processing, i.e., the use of neural networks and transformers. This paper presents a comprehensive comparative study of neural network and transformer-based language models used for Arabic FND. We examine the use of neural networks and transformer-based language models for Arabic FND and compare their performance. We also conduct an extensive analysis of the possible reasons for the differences in performance obtained by different approaches. The results demonstrate that transformer-based models outperform the neural network-based solutions, increasing the F1 score from 0.83 (best neural network-based model, GRU) to 0.95 (best transformer-based model, QARiB) and boosting accuracy by 16% compared to the best neural network-based solution. Finally, we highlight the main gaps in Arabic FND research and suggest future research directions.
APA, Harvard, Vancouver, ISO, and other styles
16

Soule, Robert, Shrutarshi Basu, Parisa Jalili Marandi, Fernando Pedone, Robert Kleinberg, Emin Gun Sirer, and Nate Foster. "Merlin: A Language for Managing Network Resources." IEEE/ACM Transactions on Networking 26, no. 5 (October 2018): 2188–201. http://dx.doi.org/10.1109/tnet.2018.2867239.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Sayed, Awny, and Amal Al Muqrishi. "CASONTO." International Journal of Web Information Systems 12, no. 2 (June 20, 2016): 242–62. http://dx.doi.org/10.1108/ijwis-12-2015-0047.

Full text
Abstract:
Purpose The purpose of this paper is to present an efficient and scalable Arabic semantic search engine based on a domain-specific ontological graph for the Colleges of Applied Science, Sultanate of Oman (CASOnto). It also supports factoid question answering and offers two types of searching, keyword-based and semantics-based, in both Arabic and English. The engine is built on a variety of technologies such as Resource Description Framework (RDF) data and an ontological graph. Furthermore, two experiments are conducted: the first compares entity search and classical search within the system itself; the second compares CASOnto with well-known semantic search engines such as Kngine, Wolfram Alpha and Google to measure their performance and efficiency. Design/methodology/approach The design and implementation of the system comprise the following phases: inference, storing, indexing, searching, query processing and a user-friendly interface. The system is designed for the specific domain of IBRI CAS (College of Applied Science) to cover the academic and non-academic departments. The ontologically inferred data are stored in the triple database (TDB) and MySQL to handle both the keyword-based search and the entity-based search. Indexing and searching are built on Lucene for the keyword search, while TDB is used for the entity search. Query processing is a very important component of search engines that helps to improve users' search results and makes the system efficient and scalable. CASOnto handles Arabic-specific issues such as spelling correction, query completion, stop-word removal and diacritics removal. It also supports the analysis of factoid question answering. Findings In this paper, an efficient and scalable Arabic semantic search engine is proposed.
The results show that the semantic search built on SPARQL performs better than the classical search on both simple and complex queries; the accuracy of the semantic search equals 100 per cent for both types of queries. The comparison of CASOnto with Wolfram Alpha, Kngine and Google also shows better results for CASOnto. Consequently, our proposed engine retrieves better and more efficient results than the other engines: it is built on a domain-specific ontology, offers highly scalable performance and handles complex queries well by understanding the context behind the query. Research limitations/implications The proposed engine is built on a specific domain (CAS Ibri, Oman); future work will address non-factoid question answering and expand the CASOnto domain to integrate more domains. Originality/value The main contribution of this paper is an efficient and scalable Arabic semantic search engine. Because of the widespread use of search engines, keeping up with the evolution of the Semantic Web creates a new dimension of challenge, and catering to users' needs has become a matter of paramount importance in the light of artificial intelligence and technological development, to access accurate and efficient information in the least possible time. However, research is still in its infancy due to the lack of search engines that support the Arabic language, which can be traced back to the complexity of Arabic morphological and grammatical rules.
APA, Harvard, Vancouver, ISO, and other styles
18

Abugharsa, Azza. "Sentiment Analysis in Poems in Misurata Sub-dialect." INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY 21 (September 15, 2021): 103–14. http://dx.doi.org/10.24297/ijct.v21i.9105.

Full text
Abstract:
Over recent decades, there has been a significant increase in and development of resources for Arabic natural language processing. This includes the task of exploring Arabic Language Sentiment Analysis (ALSA) from Arabic utterances in both Modern Standard Arabic (MSA) and different Arabic dialects. This study focuses on detecting sentiment in poems written in the Misurata Arabic sub-dialect spoken in Misurata, Libya. The tools used to detect sentiment in the dataset are Sklearn as well as the Mazajak sentiment tool. Logistic Regression, Random Forest, Naive Bayes (NB) and Support Vector Machine (SVM) classifiers are used with Sklearn, while a Convolutional Neural Network (CNN) is implemented with Mazajak. The results show that the traditional classifiers achieve higher accuracy than Mazajak, which is built on an algorithm that includes deep learning techniques. More research is suggested to analyze Arabic sub-dialect poetry in order to investigate the aspects that contribute to sentiment in these multi-line texts; for example, the use of figurative language such as metaphors.
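Among the traditional classifiers the study runs through Sklearn, Naive Bayes is simple enough to sketch from scratch. The toy multinomial NB below, with add-one (Laplace) smoothing, is illustrative only and is not the study's actual Sklearn setup:

```python
from collections import Counter, defaultdict
import math

def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns per-label log priors
    and add-one smoothed log likelihoods over the shared vocabulary."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    V = len(vocab)
    model = {}
    for label in label_counts:
        total = sum(word_counts[label].values())
        model[label] = (
            math.log(label_counts[label] / len(docs)),
            {w: math.log((word_counts[label][w] + 1) / (total + V)) for w in vocab},
            math.log(1 / (total + V)),  # fallback for unseen words
        )
    return model

def predict_nb(model, tokens):
    def score(label):
        prior, likes, unk = model[label]
        return prior + sum(likes.get(t, unk) for t in tokens)
    return max(model, key=score)
```

With two tiny training documents, `predict_nb(model, ["good"])` picks the positive class because the smoothed likelihood of "good" is higher there.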
APA, Harvard, Vancouver, ISO, and other styles
19

Ejbali, Ridha, Mourad Zaied, and Chokri Ben Amar. "Wavelet network for recognition system of Arabic word." International Journal of Speech Technology 13, no. 3 (July 13, 2010): 163–74. http://dx.doi.org/10.1007/s10772-010-9076-y.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Alani, Ali A., and Georgina Cosma. "ArSL-CNN a convolutional neural network for Arabic sign language gesture recognition." Indonesian Journal of Electrical Engineering and Computer Science 22, no. 2 (May 1, 2021): 1096. http://dx.doi.org/10.11591/ijeecs.v22.i2.pp1096-1107.

Full text
Abstract:
Sign language (SL) is a visual means of communication for people who are Deaf or have hearing impairments. In Arabic-speaking countries there are many Arabic sign languages (ArSL), and these use the same alphabets. This study proposes ArSL-CNN, a deep learning model based on a convolutional neural network (CNN) for translating Arabic SL (ArSL). Experiments were performed using a large ArSL dataset (ArSL2018) that contains 54049 images of 32 sign language gestures, collected from forty participants. The first experiments with the ArSL-CNN model returned a train and test accuracy of 98.80% and 96.59%, respectively. The results also revealed the impact of imbalanced data on model accuracy. For the second set of experiments, various re-sampling methods were applied to the dataset. Results revealed that applying the synthetic minority oversampling technique (SMOTE) improved the overall test accuracy from 96.59% to 97.29%, a statistically significant improvement (p = 0.016, α = 0.05). The proposed ArSL-CNN model can be trained on a variety of Arabic sign languages and reduce the communication barriers encountered by Deaf communities in Arabic-speaking countries.
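SMOTE, used here to rebalance the sign-image classes, synthesizes minority samples by interpolating between a minority point and one of its nearest neighbours. A minimal pure-Python sketch of that interpolation step (not the paper's implementation, which operates on image feature vectors):

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Minimal SMOTE sketch: for each synthetic sample, pick a minority
    point and interpolate toward one of its k nearest neighbours."""
    rng = random.Random(seed)

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random point on the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic
```

Because each synthetic point lies on a segment between two minority points, it always stays inside the minority region's convex hull.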
APA, Harvard, Vancouver, ISO, and other styles
21

Beseiso, Majdi, Samiksha Tripathi, Bashar Al-Shboul, and Renad Aljadid. "Semantics based English-Arabic machine translation evaluation." Indonesian Journal of Electrical Engineering and Computer Science 27, no. 1 (July 1, 2022): 189. http://dx.doi.org/10.11591/ijeecs.v27.i1.pp189-197.

Full text
Abstract:
Some classic machine translation (MT) evaluation methods, such as the bilingual evaluation understudy score (BLEU), have notably underperformed in evaluating machine translations for morphologically rich languages like Arabic. However, recent remarkable advancements in the domain of word vectors and sentence vectors have opened up new research avenues for low-resource languages. This paper proposes a novel linguistic-based evaluation method for English sentences translated into Arabic. The proposed approach includes penalties based on length and positions, and context-based schemes such as part-of-speech (POS) tagging and multilingual sentence-BERT (SBERT) models for machine translation evaluation. The proposed technique is tested using Pearson correlation as a performance evaluation parameter and compared with state-of-the-art techniques. The experimental results demonstrate that the proposed model evidently outperforms other MT evaluation methods such as BLEU.
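A length-based penalty of the kind the paper combines with context-based scores can be sketched with a BLEU-style brevity penalty; the `penalised_score` combination below is a hypothetical illustration, not the authors' exact formula:

```python
import math

def brevity_penalty(cand_len, ref_len):
    """BLEU-style brevity penalty: 1 for candidates at least as long as
    the reference, exp(1 - ref/cand) for shorter ones."""
    if cand_len == 0:
        return 0.0
    if cand_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / cand_len)

def penalised_score(similarity, cand_tokens, ref_tokens):
    """Hypothetical combination: scale a semantic similarity score
    (e.g. from sentence embeddings) by the length penalty."""
    return similarity * brevity_penalty(len(cand_tokens), len(ref_tokens))
```

A translation half the reference length keeps only exp(-1) ≈ 0.37 of its similarity score, so overly short outputs cannot win on semantics alone.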
APA, Harvard, Vancouver, ISO, and other styles
22

Fadel, Ali, Ibraheem Tuffaha, and Mahmoud Al-Ayyoub. "Neural Arabic Text Diacritization: State-of-the-Art Results and a Novel Approach for Arabic NLP Downstream Tasks." ACM Transactions on Asian and Low-Resource Language Information Processing 21, no. 1 (January 31, 2022): 1–25. http://dx.doi.org/10.1145/3470849.

Full text
Abstract:
In this work, we present several deep learning models for the automatic diacritization of Arabic text. Our models are built using two main approaches, viz. Feed-Forward Neural Network (FFNN) and Recurrent Neural Network (RNN), with several enhancements such as 100-hot encoding, embeddings, Conditional Random Field (CRF), and Block-Normalized Gradient (BNG). The models are tested on the only freely available benchmark dataset and the results show that our models are either better or on par with other models even those requiring human-crafted language-dependent post-processing steps, unlike ours. Moreover, we show how diacritics in Arabic can be used to enhance the models of downstream NLP tasks such as Machine Translation (MT) and Sentiment Analysis (SA) by proposing novel Translation over Diacritization (ToD) and Sentiment over Diacritization (SoD) approaches.
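The one-hot character encoding that feeds such diacritization networks can be sketched as follows; this is an illustrative encoding over an assumed alphabet, not the paper's 100-hot scheme:

```python
def one_hot(index, size):
    vec = [0] * size
    vec[index] = 1
    return vec

def encode_chars(text, alphabet):
    """Map each character to a one-hot vector over a fixed alphabet,
    reserving the final slot for unknown characters."""
    size = len(alphabet) + 1
    lookup = {ch: i for i, ch in enumerate(alphabet)}
    return [one_hot(lookup.get(ch, size - 1), size) for ch in text]
```

A diacritization model then classifies each encoded position into one of the possible diacritic labels.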
APA, Harvard, Vancouver, ISO, and other styles
23

Zanona, Marwan Abo, Anmar Abuhamdah, and Bassam Mohammed El-Zaghmouri. "Arabic Hand Written Character Recognition Based on Contour Matching and Neural Network." Computer and Information Science 12, no. 2 (April 30, 2019): 126. http://dx.doi.org/10.5539/cis.v12n2p126.

Full text
Abstract:
The complexity of the Arabic writing system makes handwritten Arabic recognition very complex in terms of computer algorithms, yet it has high importance in modern applications. Contour analysis of a word image can extract contour features that discriminate one character from another by means of feature vectors. This paper implements a set of pre-processing functions over handwritten Arabic characters, followed by contour analysis, feeding the contour vector to a neural network for recognition. The selection of this set of pre-processing algorithms was completed after hundreds of tests and validations. The feed-forward neural network architecture was trained on many patterns regardless of Arabic font style, building a robust recognition model. Because of the shortage of standard Arabic handwriting databases and datasets, testing was done on a non-standard dataset. The presented algorithm achieved a recognition rate of about 97%.
APA, Harvard, Vancouver, ISO, and other styles
24

Kaddoura, Sanaa, Maher Itani, and Chris Roast. "Analyzing the Effect of Negation in Sentiment Polarity of Facebook Dialectal Arabic Text." Applied Sciences 11, no. 11 (May 22, 2021): 4768. http://dx.doi.org/10.3390/app11114768.

Full text
Abstract:
With the increase in the number of users on social networks, sentiment analysis has been gaining attention. Sentiment analysis establishes the aggregation of these opinions to inform researchers about attitudes towards products or topics. Social network data commonly contain authors’ opinions about specific subjects, such as people’s opinions towards steps taken to manage the COVID-19 pandemic. Usually, people use dialectal language in their posts on social networks. Dialectal language has obstacles that make opinion analysis a challenging process compared to working with standard language. For the Arabic language, Modern Standard Arabic tools (MSA) cannot be employed with social network data that contain dialectal language. Another challenge of the dialectal Arabic language is the polarity of opinionated words affected by inverters, such as negation, that tend to change the word’s polarity from positive to negative and vice versa. This work analyzes the effect of inverters on sentiment analysis of social network dialectal Arabic posts. It discusses the different reasons that hinder the trivial resolution of inverters. An experiment is conducted on a corpus of data collected from Facebook. However, the same work can be applied to other social network posts. The results show the impact that resolution of negation may have on the classification accuracy. The results show that the F1 score increases by 20% if negation is treated in the text.
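A minimal lexicon-based sketch of the negation handling discussed above: if a negator appears within a small window before a sentiment word, its polarity is flipped. The negator list and lexicon below are tiny illustrative samples (dialectal and MSA forms), not the paper's resources:

```python
NEGATORS = {"ما", "لا", "مش", "ليس"}          # assumed sample negators
LEXICON = {"حلو": 1, "جميل": 1, "سيء": -1}     # toy sentiment lexicon

def score_with_negation(tokens, window=2):
    """Sum lexicon polarities, flipping any sentiment word that appears
    within `window` tokens after a negator."""
    score = 0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        polarity = LEXICON[tok]
        if any(t in NEGATORS for t in tokens[max(0, i - window):i]):
            polarity = -polarity
        score += polarity
    return score
```

So "حلو" (nice) scores +1 on its own but -1 when preceded by the dialectal negator "مش", which is exactly the kind of flip the paper measures the impact of.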
APA, Harvard, Vancouver, ISO, and other styles
25

Mohammed, Tawffeek. "Designing an Arabic Speaking and Listening Skills E- Course: Resources, Activities and Students' Perceptions." Electronic Journal of e-Learning 20, no. 1 (January 26, 2022): 53–68. http://dx.doi.org/10.34190/ejel.20.1.2177.

Full text
Abstract:
This paper presents a fully online course model for teaching speaking and listening skills for students learning Arabic as a foreign language at the International Peace College South Africa on the NEO learning management platform. It also investigates the students' attitudes towards the course. The course was developed by the researcher during the first semester of 2020. This period coincided with South Africa’s first wave of COVID-19, and the country’s first strict lockdown. The syllabus consists of three components: listening, speaking and conversational Arabic. It includes various technology-enhanced activities and resources which were developed by using LMS features, Web 2.0 tools, and e-learning specifications such as Learning Tools Interoperability (LTI) and Shareable Content Object Reference Model (SCORM). The integration of technology in the course is based on an approach that combines Bloom's taxonomy and Technology Integration Matrix (TIM). Apart from the description of the course, this study used a thirty-item questionnaire to investigate the attitudes of thirty-one learners who participated in the course. They answered questions about the course’s resources, activities as well as its impact on their language skills. Results from the questionnaire revealed that the respondents' attitudes towards the online course were positive and statistically significant at p < .05. The design and the approach adopted in this study can apply to any context of language teaching. It provides a myriad of technology-enhanced activities that can be effectively used to teach listening and speaking skills virtually. Foreign language teachers can adopt this approach in its entirety, or with idiosyncratic modifications to design their language courses, irrespective of the virtual learning ecology (VLE) they use.
APA, Harvard, Vancouver, ISO, and other styles
26

Daoud, Mohammad. "Topical and Non-Topical Approaches to Measure Similarity between Arabic Questions." Big Data and Cognitive Computing 6, no. 3 (August 22, 2022): 87. http://dx.doi.org/10.3390/bdcc6030087.

Full text
Abstract:
Questions are crucial expressions in any language. Many Natural Language Processing (NLP) or Natural Language Understanding (NLU) applications, such as question-answering computer systems, automatic chatting apps (chatbots), digital virtual assistants, and opinion mining, can benefit from accurately identifying similar questions in an effective manner. We detail methods for identifying similarities between Arabic questions that have been posted online by Internet users and organizations. Our novel approach uses a non-topical rule-based methodology and topical information (textual similarity, lexical similarity, and semantic similarity) to determine if a pair of Arabic questions are similarly paraphrased. Our method counts the lexical and linguistic distances between each question. Additionally, it identifies questions in accordance with their format and scope using expert hypotheses (rules) that have been experimentally shown to be useful and practical. Even if there is a high degree of lexical similarity between a When question (Timex Factoid—inquiring about time) and a Who inquiry (Enamex Factoid—asking about a named entity), they will not be similar. In an experiment using 2200 question pairs, our method attained an accuracy of 0.85, which is remarkable given the simplicity of the solution and the fact that we did not employ any language models or word embedding. In order to cover common Arabic queries presented by Arabic Internet users, we gathered the questions from various online forums and resources. In this study, we describe a unique method for detecting question similarity that does not require intensive processing, a sizable linguistic corpus, or a costly semantic repository. Because there are not many rich Arabic textual resources, this is especially important for informal Arabic text processing on the Internet.
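The non-topical rule described above (a When question and a Who question are never similar, however much they overlap lexically) can be combined with a simple lexical measure such as Jaccard similarity. A sketch with an assumed question-word mapping, not the authors' rule set:

```python
# Assumed mapping of Arabic interrogatives to question types:
# متى = when (time), من = who (person), أين = where (place)
QUESTION_TYPES = {"متى": "time", "من": "person", "أين": "place"}

def question_type(tokens):
    return QUESTION_TYPES.get(tokens[0], "other")

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def similar_questions(q1, q2, threshold=0.5):
    """Non-topical rule first: questions of different types are never
    similar, regardless of lexical overlap; otherwise use Jaccard."""
    if question_type(q1) != question_type(q2):
        return False
    return jaccard(q1, q2) >= threshold
```

Two questions sharing most of their words but opening with متى (when) versus من (who) are rejected immediately, mirroring the Timex/Enamex example in the abstract.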
APA, Harvard, Vancouver, ISO, and other styles
27

SamirElons, Ahmed, Magdy Abull-ela, and Mohamed F. Tolba. "Pulse-coupled neural network feature generation model for Arabic sign language recognition." IET Image Processing 7, no. 9 (December 1, 2013): 829–36. http://dx.doi.org/10.1049/iet-ipr.2012.0222.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Shaalan, Khaled. "A Survey of Arabic Named Entity Recognition and Classification." Computational Linguistics 40, no. 2 (June 2014): 469–510. http://dx.doi.org/10.1162/coli_a_00178.

Full text
Abstract:
As more and more Arabic textual information becomes available through the Web in homes and businesses, via Internet and Intranet services, there is an urgent need for technologies and tools to process the relevant information. Named Entity Recognition (NER) is an Information Extraction task that has become an integral part of many other Natural Language Processing (NLP) tasks, such as Machine Translation and Information Retrieval. Arabic NER has begun to receive attention in recent years. The characteristics and peculiarities of Arabic, a member of the Semitic languages family, make dealing with NER a challenge. The performance of an Arabic NER component affects the overall performance of the NLP system in a positive manner. This article attempts to describe and detail the recent increase in interest and progress made in Arabic NER research. The importance of the NER task is demonstrated, the main characteristics of the Arabic language are highlighted, and the aspects of standardization in annotating named entities are illustrated. Moreover, the different Arabic linguistic resources are presented and the approaches used in Arabic NER field are explained. The features of common tools used in Arabic NER are described, and standard evaluation metrics are illustrated. In addition, a review of the state of the art of Arabic NER research is discussed. Finally, we present our conclusions. Throughout the presentation, illustrative examples are used for clarification.
APA, Harvard, Vancouver, ISO, and other styles
29

Baniata, Laith H., Seyoung Park, and Seong-Bae Park. "A Multitask-Based Neural Machine Translation Model with Part-of-Speech Tags Integration for Arabic Dialects." Applied Sciences 8, no. 12 (December 5, 2018): 2502. http://dx.doi.org/10.3390/app8122502.

Full text
Abstract:
The statistical machine translation for the Arabic language integrates external linguistic resources such as part-of-speech tags. The current research presents a Bidirectional Long Short-Term Memory (Bi-LSTM)-Conditional Random Fields (CRF) segment-level Arabic dialect POS tagger model, which is integrated into a Multitask Neural Machine Translation (NMT) model. The proposed solution for NMT is based on the recently introduced recurrent neural network encoder-decoder NMT model. The study proposes and develops a unified multitask NMT model that shares an encoder between two tasks: the Arabic Dialect (AD) to Modern Standard Arabic (MSA) translation task and the segment-level POS tagging task. A shared layer and an invariant layer are shared between the translation tasks. By training the translation and POS tagging tasks alternately, the proposed model can leverage the characteristic information and improve the translation quality from Arabic dialects to Modern Standard Arabic. The experiments are conducted on Levantine Arabic (LA) to MSA and Maghrebi Arabic (MA) to MSA translation tasks. As an additional linguistic resource, segment-level part-of-speech tags for Arabic dialects were also exploited. The experiments suggest that translation quality and POS tagger performance improved with the implementation of the multitask learning approach.
APA, Harvard, Vancouver, ISO, and other styles
30

Oussous, Ahmed, Fatima-Zahra Benjelloun, Ayoub Ait Lahcen, and Samir Belfkih. "ASA: A framework for Arabic sentiment analysis." Journal of Information Science 46, no. 4 (May 21, 2019): 544–59. http://dx.doi.org/10.1177/0165551519849516.

Full text
Abstract:
Sentiment analysis (SA), also known as opinion mining, is a growing important research area. Generally, it helps to automatically determine if a text expresses a positive, negative or neutral sentiment. It enables to mine the huge increasing resources of shared opinions such as social networks, review sites and blogs. In fact, SA is used by many fields and for various languages such as English and Arabic. However, since Arabic is a highly inflectional and derivational language, it raises many challenges. In fact, SA of Arabic text should handle such complex morphology. To better handle these challenges, we decided to provide the research community and Arabic users with a new efficient framework for Arabic Sentiment Analysis (ASA). Our primary goal is to improve the performance of ASA by exploiting deep learning while varying the preprocessing techniques. For that, we implement and evaluate two deep learning models namely convolutional neural network (CNN) and long short-term memory (LSTM) models. The framework offers various preprocessing techniques for ASA (including stemming, normalisation, tokenization and stop words). As a result of this work, we first provide a new rich and publicly available Arabic corpus called Moroccan Sentiment Analysis Corpus (MSAC). Second, the proposed framework demonstrates improvement in ASA. In fact, the experimental results prove that deep learning models have a better performance for ASA than classical approaches (support vector machines, naive Bayes classifiers and maximum entropy). They also show the key role of morphological features in Arabic Natural Language Processing (NLP).
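Typical Arabic normalisation steps of the kind this framework offers (diacritic removal, alef unification) can be sketched in a few lines; this is a generic illustration, not the ASA framework's code:

```python
import re

DIACRITICS = re.compile("[\u064B-\u0652]")  # harakat: fathatan .. sukun

def normalise_arabic(text):
    """Strip harakat, unify alef variants, and normalise teh marbuta
    and alef maqsura -- common Arabic preprocessing steps."""
    text = DIACRITICS.sub("", text)
    text = re.sub("[\u0623\u0625\u0622]", "\u0627", text)  # أ إ آ -> ا
    text = text.replace("\u0629", "\u0647")                # ة -> ه
    text = text.replace("\u0649", "\u064A")                # ى -> ي
    return text
```

For example, the fully vocalised "أَحْمَد" collapses to the bare form "احمد", so differently diacritised spellings map to one token before classification.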
APA, Harvard, Vancouver, ISO, and other styles
31

KHEMAKHEM, AIDA, BILEL GARGOURI, ABDELMAJID BEN HAMADOU, and GIL FRANCOPOULO. "ISO standard modeling of a large Arabic dictionary." Natural Language Engineering 22, no. 6 (September 7, 2015): 849–79. http://dx.doi.org/10.1017/s1351324915000224.

Full text
Abstract:
In this paper, we address the problem of large-coverage dictionaries of the Arabic language usable both for direct human reading and for automatic Natural Language Processing. For these purposes, we propose a normalized and implemented model, based on the Lexical Markup Framework (LMF-ISO 24613) and the Data Category Registry (DCR-ISO 12620), which allows stable and well-defined interoperability of lexical resources through a unification of linguistic concepts. Starting from the features of the Arabic language, and because a large range of details and refinements needs to be described specifically for Arabic, we follow a fine-grained structuring strategy. Besides its richness in morphological, syntactic and semantic knowledge, our model includes all the Arabic morphological patterns to generate the inflected forms from a given lemma and highlights the syntactic–semantic relations. In addition, an appropriate codification has been designed for the management of all types of relationships among lexical entries and their related knowledge. According to this model, a dictionary named El Madar has been built and is now publicly available online. The data are managed by a user-friendly Web-based lexicographical workstation. This work has not been done in isolation, but is the result of a collaborative effort by an international team, mainly within the ISO network, over a period of eight years.
APA, Harvard, Vancouver, ISO, and other styles
32

Boudjellal, Nada, Huaping Zhang, Asif Khan, Arshad Ahmad, Rashid Naseem, Jianyun Shang, and Lin Dai. "ABioNER: A BERT-Based Model for Arabic Biomedical Named-Entity Recognition." Complexity 2021 (March 13, 2021): 1–6. http://dx.doi.org/10.1155/2021/6633213.

Full text
Abstract:
The web is being loaded daily with a huge volume of data, mainly unstructured textual data, which increases the need for information extraction and NLP systems significantly. Named-entity recognition task is a key step towards efficiently understanding text data and saving time and effort. Being a widely used language globally, English is taking over most of the research conducted in this field, especially in the biomedical domain. Unlike other languages, Arabic suffers from lack of resources. This work presents a BERT-based model to identify biomedical named entities in the Arabic text data (specifically disease and treatment named entities) that investigates the effectiveness of pretraining a monolingual BERT model with a small-scale biomedical dataset on enhancing the model understanding of Arabic biomedical text. The model performance was compared with two state-of-the-art models (namely, AraBERT and multilingual BERT cased), and it outperformed both models with 85% F1-score.
APA, Harvard, Vancouver, ISO, and other styles
33

Almanaseer, Waref, Mohammad Alshraideh, and Omar Alkadi. "A Deep Belief Network Classification Approach for Automatic Diacritization of Arabic Text." Applied Sciences 11, no. 11 (June 4, 2021): 5228. http://dx.doi.org/10.3390/app11115228.

Full text
Abstract:
Deep learning has emerged as a new area of machine learning research. It is an approach that can learn features and hierarchical representations purely from data and has been successfully applied to several fields such as images, sound, text and motion. The techniques developed from deep learning research have already been impacting research on Natural Language Processing (NLP). Arabic diacritics are vital components of Arabic text that remove ambiguity from words and reinforce the meaning of the text. In this paper, a Deep Belief Network (DBN) is used as a diacritizer for Arabic text. The DBN is a deep learning algorithm that has recently proved to be very effective for a variety of machine learning problems. We evaluate the use of DBNs as classifiers in automatic Arabic text diacritization. The DBN was trained to individually classify each input letter with the corresponding diacritized version. Experiments were conducted using two benchmark datasets, the LDC ATB3 and Tashkeela. Our best settings achieve a DER and WER of 2.21% and 6.73%, respectively, on the ATB3 benchmark, an improvement of 26% over the best published results. On the Tashkeela benchmark, our system continues to achieve high accuracy with a DER of 1.79%, a 14% improvement.
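DER and WER, the two metrics reported above, can be sketched over position-aligned sequences as follows; real evaluations align via edit distance, while this simplified version assumes equal lengths:

```python
def diacritic_error_rate(ref, hyp):
    """DER: fraction of positions whose predicted diacritic differs
    from the reference (sequences aligned by position here)."""
    assert len(ref) == len(hyp)
    errors = sum(r != h for r, h in zip(ref, hyp))
    return errors / len(ref)

def word_error_rate(ref_words, hyp_words):
    """WER over aligned words: a word counts as wrong if any of its
    diacritics differs from the reference."""
    assert len(ref_words) == len(hyp_words)
    errors = sum(r != h for r, h in zip(ref_words, hyp_words))
    return errors / len(ref_words)
```

WER is always at least as large as DER on the same output, since a single wrong diacritic flips its whole word to an error, which matches the 2.21% DER versus 6.73% WER gap reported.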
APA, Harvard, Vancouver, ISO, and other styles
34

Hamed Abd, Dhafar, Ahmed T. Sadiq, and Ayad R. Abbas. "PAAD: POLITICAL ARABIC ARTICLES DATASET FOR AUTOMATIC TEXT CATEGORIZATION." Iraqi Journal for Computers and Informatics 46, no. 1 (June 30, 2020): 1–10. http://dx.doi.org/10.25195/ijci.v46i1.246.

Full text
Abstract:
Nowadays, text classification and sentiment analysis are considered popular Natural Language Processing (NLP) tasks. These techniques play a significant role in human activities and have an impact on daily behaviour. Each article in fields such as politics and business represents different opinions according to the writer's tendency, and a huge amount of data can be acquired through that differentiation, enabling the political orientation of an online article to be determined automatically. However, no Arabic corpus has been directed towards political categorization, due to the lack of rich, representative resources for training an Arabic text classifier. We therefore introduce the Political Arabic Articles Dataset (PAAD), textual data collected from newspapers, social networks, general forums and ideology websites. The dataset comprises 206 articles distributed into three categories (Reform, Conservative and Revolutionary) that we offer to the research community on Arabic computational linguistics. We anticipate that this dataset will be a great aid for a variety of NLP tasks on Modern Standard Arabic, particularly political text classification. We present the data in raw form and as Excel files in four versions: V1 raw data, V2 preprocessed, V3 root-stemmed and V4 light-stemmed.
APA, Harvard, Vancouver, ISO, and other styles
35

Ismail, Mohammad H., Shefa A. Dawwd, and Fakhradeen H. Ali. "Static hand gesture recognition of Arabic sign language by using deep CNNs." Indonesian Journal of Electrical Engineering and Computer Science 24, no. 1 (October 1, 2021): 178. http://dx.doi.org/10.11591/ijeecs.v24.i1.pp178-188.

Full text
Abstract:
An Arabic sign language recognition system using two concatenated deep convolutional neural network models, DenseNet121 and VGG16, is presented. The pre-trained models are fed with images, and the system can then automatically recognize Arabic sign language. To evaluate the performance of the concatenated models, red-green-blue (RGB) images of various static signs were collected into a dataset. The dataset comprises 220,000 images in 44 categories: 32 letters, 11 numbers (0:10), and 1 for none. For each static sign, there are 5000 images collected from different volunteers. The pre-trained models were used, with some modification, and trained on the prepared Arabic sign language data. An attempt was also made to adopt two of the previously trained models as parallel deep feature extractors, whose outputs are combined and passed to the classification stage. The results compare the performance of single models and multi-models, and most multi-models prove better at feature extraction and classification than the single models. Judged by the total number of incorrectly recognized sign images across the training, validation and testing datasets, the best convolutional neural network (CNN) model for Arabic sign language feature extraction and classification is DenseNet121 among single models and DenseNet121 & VGG16 among multi-models.
APA, Harvard, Vancouver, ISO, and other styles
36

Alsubhi, Kholoud, Amani Jamal, and Areej Alhothali. "Deep learning-based approach for Arabic open domain question answering." PeerJ Computer Science 8 (May 4, 2022): e952. http://dx.doi.org/10.7717/peerj-cs.952.

Full text
Abstract:
Open-domain question answering (OpenQA) is one of the most challenging yet widely investigated problems in natural language processing. It aims at building a system that can answer any given question from large-scale unstructured text or structured knowledge-base. To solve this problem, researchers traditionally use information retrieval methods to retrieve the most relevant documents and then use answer extractions techniques to extract the answer or passage from the candidate documents. In recent years, deep learning techniques have shown great success in OpenQA by using dense representation for document retrieval and reading comprehension for answer extraction. However, despite the advancement in the English language OpenQA, other languages such as Arabic have received less attention and are often addressed using traditional methods. In this paper, we use deep learning methods for Arabic OpenQA. The model consists of document retrieval to retrieve passages relevant to a question from large-scale free text resources such as Wikipedia and an answer reader to extract the precise answer to the given question. The model implements dense passage retriever for the passage retrieval task and the AraELECTRA for the reading comprehension task. The result was compared to traditional Arabic OpenQA approaches and deep learning methods in the English OpenQA. The results show that the dense passage retriever outperforms the traditional Term Frequency-Inverse Document Frequency (TF-IDF) information retriever in terms of the top-20 passage retrieval accuracy and improves our end-to-end question answering system in two Arabic question-answering benchmark datasets.
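The top-20 passage retrieval accuracy used to compare the dense retriever with TF-IDF can be sketched as follows; plain vectors stand in for the learned embeddings, and scoring is a simple inner product as in dense passage retrieval:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(query_vec, passage_vecs, k):
    """Rank passage indices by inner product with the query embedding."""
    ranked = sorted(range(len(passage_vecs)),
                    key=lambda i: dot(query_vec, passage_vecs[i]),
                    reverse=True)
    return ranked[:k]

def top_k_accuracy(queries, gold, passage_vecs, k):
    """Fraction of queries whose gold passage appears in the top-k list."""
    hits = sum(gold[i] in top_k(q, passage_vecs, k)
               for i, q in enumerate(queries))
    return hits / len(queries)
```

With k = 20, this is exactly the "top-20 passage retrieval accuracy" figure the abstract reports; only the vectors (TF-IDF versus dense encoder output) differ between the compared systems.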
APA, Harvard, Vancouver, ISO, and other styles
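As a minimal illustration of the retrieval contrast the abstract draws, here is a toy TF-IDF retriever built from the standard library; the three-document corpus and the query are invented, and a dense retriever would replace the sparse term-weight dot product with a dot product of learned passage embeddings:

```python
import math
from collections import Counter

docs = [
    "the capital of france is paris",
    "arabic is a semitic language",
    "paris hosts the louvre museum",
]

def tfidf_vectors(texts):
    # Build per-document TF-IDF weight maps over a whitespace-tokenized corpus.
    tokenized = [t.split() for t in texts]
    n = len(tokenized)
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log(n / df[w]) for w in df}
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({w: tf[w] / len(toks) * idf[w] for w in tf})
    return vecs, idf

def score(query, vec, idf):
    # Dot product between query terms and document TF-IDF weights; a dense
    # passage retriever scores with embedding dot products instead.
    q = Counter(query.split())
    return sum(q[w] * vec.get(w, 0.0) for w in q)

vecs, idf = tfidf_vectors(docs)
ranked = sorted(range(len(docs)),
                key=lambda i: score("capital of france", vecs[i], idf),
                reverse=True)
print(ranked[0])  # document 0 scores highest for this query
```

Top-k retrieval accuracy, the metric the paper reports, is then simply whether the gold passage appears among the first k indices of `ranked`.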
37

Meftah, Ali H., Yousef A. Alotaibi, and Sid-Ahmed Selouani. "Arabic Emotional Voice Conversion Using English Pre-Trained StarGANv2-VC-Based Model." Applied Sciences 12, no. 23 (November 28, 2022): 12159. http://dx.doi.org/10.3390/app122312159.

Full text
Abstract:
The goal of emotional voice conversion (EVC) is to convert the emotion of a speaker’s voice from one state to another while maintaining the original speaker’s identity and the linguistic substance of the message. Research on EVC in the Arabic language is well behind that conducted on languages with a wider distribution, such as English. The primary objective of this study is to determine whether Arabic emotions may be converted using a model trained for another language. In this work, we used an unsupervised many-to-many non-parallel generative adversarial network (GAN) voice conversion (VC) model called StarGANv2-VC to perform an Arabic EVC (A-EVC). The latter is realized by using pre-trained phoneme-level automatic speech recognition (ASR) and fundamental frequency (F0) models in the English language. The generated voice is evaluated by prosody and spectrum conversion in addition to automatic emotion recognition and speaker identification using a convolutional recurrent neural network (CRNN). The results of the evaluation indicated that male voices were scored higher than female voices and that the evaluation score for the conversion from neutral to other emotions was higher than the evaluation scores for the conversion of other emotions.
APA, Harvard, Vancouver, ISO, and other styles
38

Dhouib, Amira, Achraf Othman, Oussama El Ghoul, Mohamed Koutheair Khribi, and Aisha Al Sinani. "Arabic Automatic Speech Recognition: A Systematic Literature Review." Applied Sciences 12, no. 17 (September 5, 2022): 8898. http://dx.doi.org/10.3390/app12178898.

Full text
Abstract:
Automatic Speech Recognition (ASR), also known as Speech-To-Text (STT) or computer speech recognition, has been an active field of research recently. This study aims to chart this field by performing a Systematic Literature Review (SLR) to give insight into the ASR studies proposed, especially for the Arabic language. The purpose is to highlight the trends of research about Arabic ASR and guide researchers with the most significant studies published over the ten years from 2011 to 2021. This SLR attempts to tackle seven specific research questions related to the toolkits used for developing and evaluating Arabic ASR, the supported variety of Arabic, the feature extraction/classification techniques used, the type of speech recognition, the performance of Arabic ASR, the gaps facing researchers, and some future research directions. Across five databases, 38 studies met our defined inclusion criteria. Our results showed different open-source toolkits that support Arabic speech recognition. The most prominent ones were KALDI and HTK, followed by the CMU Sphinx toolkits. A total of 89.47% of the retained studies cover Modern Standard Arabic, whereas 26.32% were dedicated to different dialects of Arabic. MFCC and HMM were the most used feature extraction and classification techniques, respectively: 63% of the papers were based on MFCC and 21% on HMM. The review also shows that the performance of Arabic ASR systems depends mainly on different criteria related to the availability of resources, the techniques used for acoustic modeling, and the datasets used.
APA, Harvard, Vancouver, ISO, and other styles
39

Baniata, Laith H., Seyoung Park, and Seong-Bae Park. "A Neural Machine Translation Model for Arabic Dialects That Utilizes Multitask Learning (MTL)." Computational Intelligence and Neuroscience 2018 (December 10, 2018): 1–10. http://dx.doi.org/10.1155/2018/7534712.

Full text
Abstract:
In this research article, we study the problem of employing a neural machine translation model to translate Arabic dialects to Modern Standard Arabic. The proposed solution is inspired by the recently proposed recurrent neural network-based encoder-decoder neural machine translation model, which generalizes machine translation as a sequence learning problem. We propose a multitask learning (MTL) model which shares one decoder among language pairs while giving every source language a separate encoder. The proposed model can be applied to limited volumes of data as well as extensive amounts of data. Experiments carried out show that the proposed MTL model ensures a higher quality of translation when compared to individually learned models.
APA, Harvard, Vancouver, ISO, and other styles
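The parameter-sharing layout the abstract describes (one encoder per source dialect, a single shared decoder) can be sketched with placeholder functions; the dialect names and token transforms below are invented stand-ins for real RNN components:

```python
# One encoder per source variety, one decoder shared across all pairs:
# the shared decoder receives training signal from every dialect, which is
# what lets low-resource pairs benefit from the higher-resource ones.
encoders = {
    "levantine": lambda tokens: [("lev", t) for t in tokens],
    "maghrebi":  lambda tokens: [("mag", t) for t in tokens],
}

def shared_decoder(states):
    # Stand-in for the single decoder that emits Modern Standard Arabic tokens.
    return [t.upper() for _, t in states]

def translate(dialect, sentence):
    states = encoders[dialect](sentence.split())
    return " ".join(shared_decoder(states))

print(translate("levantine", "kifak"))  # KIFAK
print(translate("maghrebi", "labas"))   # LABAS
```

In a real MTL setup the two `encoders` entries would be separately parameterized RNNs and `shared_decoder` a single RNN whose weights are updated by batches from every language pair.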
40

Fkih, Fethi, Tarek Moulahi, and Abdulatif Alabdulatif. "Machine Learning Model for Offensive Speech Detection in Online Social Networks Slang Content." WSEAS TRANSACTIONS ON INFORMATION SCIENCE AND APPLICATIONS 20 (January 17, 2023): 7–15. http://dx.doi.org/10.37394/23209.2023.20.2.

Full text
Abstract:
The majority of the world’s population (about 4 billion people) now uses social media such as Facebook, Twitter, Instagram, and others. Social media has evolved into a vital form of communication, allowing individuals to interact with each other and share their knowledge and experiences. On the other hand, social media can be a source of malevolent conduct. In fact, nasty and criminal activity, such as cyberbullying and threatening, has grown increasingly common on social media, particularly among those who use Arabic. Detecting such behavior, however, is a difficult endeavor since it involves natural language, particularly Arabic, which is grammatically and syntactically rich. Furthermore, social network users frequently employ Arabic slang and disregard obvious grammatical norms, making automatic recognition of bullying difficult. Meanwhile, only a few research studies in Arabic have addressed this issue. The goal of this study is to develop a method for recognizing and detecting Arabic slang offensive speech in Online Social Networks (OSNs). We therefore propose an effective strategy combining Artificial Intelligence with a statistical approach, since the absence of grammatical rules makes it difficult to set linguistic or semantic rules for modeling Arabic slang. An experimental study comparing common machine learning tools shows that Random Forest (RF) outperforms the others in terms of precision (90%), recall (90%), and f1-score (90%).
APA, Harvard, Vancouver, ISO, and other styles
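A minimal sketch of such a pipeline, assuming scikit-learn is available: TF-IDF n-gram features feeding a Random Forest. The six training comments and their labels are invented toy data, not the paper's corpus:

```python
# Statistical features (TF-IDF over word unigrams/bigrams) plus a Random
# Forest classifier; this mirrors the shape of the approach, not its data.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

texts = ["you are great", "have a nice day", "thanks a lot",
         "you are stupid", "shut up idiot", "go away loser"]
labels = [0, 0, 0, 1, 1, 1]   # 0 = clean, 1 = offensive

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(texts, labels)
print(model.predict(["shut up idiot loser"])[0])
```

With a real Arabic slang corpus, the vectorizer would typically also use character n-grams to cope with spelling variation, and precision/recall/F1 would be computed on a held-out split.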
41

Peng, Fei, and Tianjie Cao. "Software-Defined Network Resource Optimization of the Data Center Based on P4 Programming Language." Mobile Information Systems 2021 (August 4, 2021): 1–7. http://dx.doi.org/10.1155/2021/3601104.

Full text
Abstract:
This paper makes use of the new software-defined network (SDN) architecture in the cloud data center, based on the P4 language, to realize flexible management and configuration of network equipment and thereby achieve (a) data center virtualization management and (b) data center resource optimization based on the P4 programming language. Furthermore, the fault tolerance of dynamic network optimization relies on virtual machine (VM) online migration technology, and the load-balancing mechanism offers very good flexibility. At the same time, the paper proposes a multipath VM migration strategy based on a quality of service (QoS) mechanism, which divides VM migration traffic into different QoS flows according to dynamic network transmission and then selects a valid forwarding path for each flow to migrate VMs. This improves the overall migration performance of VMs and ultimately the dynamic optimization of network resources and their management. Our experimental evaluations show that the proposed model is approximately 13% and 17% better than traditional state-of-the-art methods in terms of minimum migration time and least downtime, respectively.
APA, Harvard, Vancouver, ISO, and other styles
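The multipath strategy in the abstract (split migration traffic into QoS flows, then pick a forwarding path per flow) can be sketched as a greedy assignment; the topology, bandwidths, and flow classes below are invented:

```python
# Invented topology: candidate paths with residual bandwidth (Mbps).
paths = {"p1": 400, "p2": 250, "p3": 100}

# Migration traffic split into QoS classes, highest priority first
# (e.g. memory-page sync ahead of bulk disk copy): (name, priority, demand).
flows = [("memory-pages", 1, 300), ("disk-image", 2, 200), ("metadata", 3, 50)]

def assign(flows, paths):
    residual = dict(paths)
    plan = {}
    for name, _prio, demand in sorted(flows, key=lambda f: f[1]):
        # Give each flow, in priority order, the least-loaded path
        # that still has enough residual bandwidth for its demand.
        fitting = [p for p, bw in residual.items() if bw >= demand]
        best = max(fitting, key=lambda p: residual[p])
        residual[best] -= demand
        plan[name] = best
    return plan

print(assign(flows, paths))
```

A production SDN controller would recompute this plan as link loads change and install the chosen paths as P4/OpenFlow forwarding rules; the greedy rule here is only one plausible policy.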
42

Du, Lian Yan. "Design on Structure and Integrity for Foreign Language Network Learning Resource Library." Applied Mechanics and Materials 543-547 (March 2014): 4581–84. http://dx.doi.org/10.4028/www.scientific.net/amm.543-547.4581.

Full text
Abstract:
Network learning resources are new educational resources that have developed alongside computer, communication, and network technology, providing a variety of teaching materials for foreign language teaching. To address the difficult problems in constructing foreign language network learning resources, this paper studies resource library structure and integrity design based on the SQL Server database management system. Building on requirements analysis, the system involves five entities: language classes, resource types, resources, resource files, and resource evaluation. The structural design part consists of conceptual design and logical design, while the integrity design part consists of entity integrity design and referential integrity design. The article provides practical solutions for constructing a foreign language learning resource library and is of great significance for improving the quality of foreign language teaching.
APA, Harvard, Vancouver, ISO, and other styles
43

Alashban, Adal A., Mustafa A. Qamhan, Ali H. Meftah, and Yousef A. Alotaibi. "Spoken Language Identification System Using Convolutional Recurrent Neural Network." Applied Sciences 12, no. 18 (September 13, 2022): 9181. http://dx.doi.org/10.3390/app12189181.

Full text
Abstract:
Following recent advancements in deep learning and artificial intelligence, spoken language identification applications are playing an increasingly significant role in our day-to-day lives, especially in the domain of multi-lingual speech recognition. In this article, we propose a spoken language identification system that depends on the sequence of feature vectors. The proposed system uses a hybrid Convolutional Recurrent Neural Network (CRNN), which combines a Convolutional Neural Network (CNN) with a Recurrent Neural Network (RNN), for spoken language identification on seven languages, including Arabic, chosen from subsets of the Mozilla Common Voice (MCV) corpus. The proposed system exploits the advantages of both CNN and RNN architectures to construct the CRNN architecture. At the feature extraction stage, it compares the Gammatone Cepstral Coefficient (GTCC) feature and the Mel Frequency Cepstral Coefficient (MFCC) feature, as well as a combination of both. Finally, the speech signals were represented as frames and used as the input to the CRNN architecture. After conducting experiments, the results of the proposed system indicate higher performance with combined GTCC and MFCC features compared to GTCC or MFCC features used individually. The average accuracy of the proposed system was 92.81% in the best spoken language identification experiment. Furthermore, the system can learn language-specific patterns in various filter-size representations of speech files.
APA, Harvard, Vancouver, ISO, and other styles
44

Moumen, Rajae, Raddouane Chiheb, and Rdouan Faizi. "Real-time Arabic scene text detection using fully convolutional neural networks." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 2 (April 1, 2021): 1634. http://dx.doi.org/10.11591/ijece.v11i2.pp1634-1640.

Full text
Abstract:
The aim of this research is to propose a fully convolutional approach to the problem of real-time scene text detection for the Arabic language. Text detection is performed using a two-step multi-scale approach. The first step uses a light-weight fully convolutional network, TextBlockDetector FCN, an adaptation of VGG-16, to eliminate non-textual elements, localize wide-scale text, and estimate text scale. The second step determines the narrow scale range of text using a fully convolutional network for maximum performance. To evaluate the system, we compare the framework's results to those obtained with a single VGG-16 fully deployed for one-shot text detection, in addition to previous results in the state of the art. For training and testing, we built a dataset of 575 manually processed images, along with data augmentation to enrich the training process. The system scores a precision of 0.651 vs 0.64 for the state of the art and an FPS of 24.3 vs 31.7 for a fully deployed VGG-16.
APA, Harvard, Vancouver, ISO, and other styles
45

Ismail, Mohammad H., Shefa A. Dawwd, and Fakhradeen H. Ali. "Dynamic hand gesture recognition of Arabic sign language by using deep convolutional neural networks." Indonesian Journal of Electrical Engineering and Computer Science 25, no. 2 (February 1, 2022): 952. http://dx.doi.org/10.11591/ijeecs.v25.i2.pp952-962.

Full text
Abstract:
In computer vision, one of the most difficult problems is recognizing human gestures in videos because of certain irrelevant environmental variables. This issue has been addressed by using single deep networks to learn spatiotemporal characteristics from video data, but this approach is still insufficient to handle both problems at the same time. As a result, researchers fuse various models to allow for the effective collection of important shape information as well as precise spatiotemporal variation of gestures. In this study, we collected a dynamic dataset for twenty meaningful words of Arabic sign language (ArSL) using a Microsoft Kinect v2 camera. The recorded data included 7,350 red, green, and blue (RGB) videos and 7,350 depth videos. We propose four deep neural network models using 2D and 3D convolutional neural networks (CNN) to cover all feature extraction methods, then pass these features to a recurrent neural network (RNN) for sequence classification. Long short-term memory (LSTM) and gated recurrent unit (GRU) are the two types of RNN used. The research also includes an evaluation of fusion techniques for several types of multiple models. The experiment results show that the best multi-model achieved 100% accuracy on the dynamic ArSL recognition dataset.
APA, Harvard, Vancouver, ISO, and other styles
46

Etaiwi, Wael, and Arafat Awajan. "SemG-TS: Abstractive Arabic Text Summarization Using Semantic Graph Embedding." Mathematics 10, no. 18 (September 6, 2022): 3225. http://dx.doi.org/10.3390/math10183225.

Full text
Abstract:
This study proposes a novel semantic graph embedding-based abstractive text summarization technique for the Arabic language, namely SemG-TS. SemG-TS employs a deep neural network to produce the abstractive summary. A set of experiments were conducted to evaluate the performance of SemG-TS and to compare the results to those of a popular baseline word embedding technique called word2vec. A new dataset was collected for the experiments. Two evaluation methodologies were followed in the experiments: automatic and human evaluations. The Rouge evaluation measure was used for the automatic evaluation, while for the human evaluation, Arabic native speakers were tasked to evaluate the relevancy, similarity, readability, and overall satisfaction of the generated summaries. The obtained results prove the superiority of SemG-TS.
APA, Harvard, Vancouver, ISO, and other styles
47

Gupta, Vedika, Nikita Jain, Shubham Shubham, Agam Madan, Ankit Chaudhary, and Qin Xin. "Toward Integrated CNN-based Sentiment Analysis of Tweets for Scarce-resource Language—Hindi." ACM Transactions on Asian and Low-Resource Language Information Processing 20, no. 5 (June 23, 2021): 1–23. http://dx.doi.org/10.1145/3450447.

Full text
Abstract:
Linguistic resources for commonly used languages such as English and Mandarin Chinese are available in abundance, hence the existing research in these languages. However, there are languages for which linguistic resources are scarcely available. One of these is the Hindi language. Hindi, despite being the fourth-most popular language, still lacks richly populated linguistic resources, owing to the challenges involved in dealing with it. This article first explores machine learning-based approaches (Naïve Bayes, Support Vector Machine, Decision Tree, and Logistic Regression) to analyze the sentiment contained in Hindi language text derived from Twitter. Further, the article presents lexicon-based approaches (Hindi Senti-WordNet, NRC Emotion Lexicon) for sentiment analysis in Hindi while also proposing a Domain-specific Sentiment Dictionary. Finally, an integrated model combining a convolutional neural network (CNN), a Recurrent Neural Network, and Long Short-Term Memory is proposed to analyze sentiment from Hindi language tweets: a total of 23,767 tweets classified as positive, negative, or neutral. The proposed CNN approach gives an accuracy of 85%.
APA, Harvard, Vancouver, ISO, and other styles
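The lexicon-based baselines mentioned in the abstract reduce to looking up per-word polarity scores and aggregating them; the miniature English lexicon below is an invented stand-in for Hindi Senti-WordNet or a domain-specific dictionary:

```python
# Invented miniature polarity lexicon (scores in [-1, 1]) standing in for
# Hindi Senti-WordNet / a domain-specific sentiment dictionary.
lexicon = {"good": 0.8, "happy": 0.9, "bad": -0.7, "terrible": -0.9}

def lexicon_sentiment(tokens):
    score, flip = 0.0, 1.0
    for t in tokens:
        if t == "not":          # simple one-step negation handling
            flip = -1.0
            continue
        score += flip * lexicon.get(t, 0.0)
        flip = 1.0
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_sentiment("the movie was good".split()))  # positive
print(lexicon_sentiment("not good at all".split()))     # negative
```

The neural models in the article replace this fixed lookup with learned representations, which is what lifts accuracy on out-of-lexicon and context-dependent words.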
48

Daqrouq, K., and K. Y. Al Azzawi. "Arabic vowels recognition based on wavelet average framing linear prediction coding and neural network." Speech Communication 55, no. 5 (June 2013): 641–52. http://dx.doi.org/10.1016/j.specom.2013.01.002.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Walid Al-Sayyed, Sa’ida. "The Causal Theory of Names: Between Theory and Practice." Arab World English Journal 12, no. 1 (March 15, 2021): 152–64. http://dx.doi.org/10.24093/awej/vol12no1.11.

Full text
Abstract:
This study explores to what extent a personal name has a causal relationship with its usage. Data were collected by means of a survey in which demographic data were elicited from the participants. Furthermore, the participants, whose ages were above 18 years, were asked to write their first names and the reasons behind being given such names. The sample comprised 400 subjects who participated in the online survey distributed through social media network groups. The results revealed that names and naming practices are not haphazard. By and large, there is a relationship between a name and its usage, as stated by the causal theory of names. Whenever people choose a name, they are influenced by: naming after people who are admired for their virtues, the aesthetic taste of the name, parents’ and relatives’ religious beliefs, maintaining rhyming names, circumstantial names, and respecting social and cultural traditions. Another striking finding is that nature and the environment are no longer rich resources for choosing names. Moreover, the analysis found evidence for the complete absence of names related to occupational and achievement names, death prevention and survival names, horrific names, and proverbial names. It is envisaged that the findings might be beneficial for sociolinguists, onomasticians, and learners of Arabic as a foreign language, i.e. non-native speakers of Arabic. It might also help people working on language and culture and how culture affects naming traditions in the Arabic context.
APA, Harvard, Vancouver, ISO, and other styles
50

Guellil, Imane, Ahsan Adeel, Faical Azouaou, Sara Chennoufi, Hanene Maafi, and Thinhinane Hamitouche. "Detecting hate speech against politicians in Arabic community on social media." International Journal of Web Information Systems 16, no. 3 (July 31, 2020): 295–313. http://dx.doi.org/10.1108/ijwis-08-2019-0036.

Full text
Abstract:
Purpose This paper aims to propose an approach for hate speech detection against politicians in the Arabic community on social media (e.g. Youtube). In the literature, similar works have been presented for other languages such as English. However, to the best of the authors’ knowledge, not much work has been conducted in the Arabic language. Design/methodology/approach This approach uses both classical classification algorithms and deep learning algorithms. For the classical algorithms, the authors use Gaussian NB (GNB), Logistic Regression (LR), Random Forest (RF), SGD Classifier (SGD) and Linear SVC (LSVC). For deep learning classification, four different algorithms (convolutional neural network (CNN), multilayer perceptron (MLP), long short-term memory (LSTM) and bi-directional long short-term memory (Bi-LSTM)) are applied. For extracting features, the authors use both Word2vec and FastText with their two implementations, namely, Skip Gram (SG) and Continuous Bag of Words (CBOW). Findings Simulation results demonstrate the best performance of LSVC, Bi-LSTM and MLP, achieving an accuracy of up to 91% when associated with the SG model. The results also show that classifications performed on the balanced corpus are more accurate than those performed on the unbalanced corpus. Originality/value The principal originality of this paper is the construction of a new hate speech corpus (Arabic_fr_en), annotated by three different annotators. This corpus contains the three languages used by Arabic people: Arabic, French and English. For Arabic, the corpus contains both Arabic script and Arabizi (i.e. Arabic words written with Latin letters). Another originality is relying on both shallow and deep learning classification, using different models for feature extraction such as Word2vec and FastText with their two implementations, SG and CBOW.
APA, Harvard, Vancouver, ISO, and other styles
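The two embedding training modes the abstract names, Skip Gram and CBOW, differ only in how training examples are cut from a context window, which can be shown with the standard library alone (the example sentence is invented):

```python
def skipgram_pairs(tokens, window=2):
    # Skip Gram: each (center, context) pair is one training example;
    # the center word predicts each context word separately.
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    # CBOW: the whole context window jointly predicts the center word.
    out = []
    for i, center in enumerate(tokens):
        ctx = [tokens[j] for j in range(max(0, i - window),
                                        min(len(tokens), i + window + 1))
               if j != i]
        out.append((ctx, center))
    return out

sent = "hate speech detection on social media".split()
print(skipgram_pairs(sent)[:3])
print(cbow_examples(sent)[0])
```

Libraries such as gensim expose this as a training flag rather than separate APIs; SG tends to work better for rare words, CBOW trains faster, which is one reason papers like this one compare both.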