Journal articles on the topic 'Paraphrase Detection'

Consult the top 50 journal articles for your research on the topic 'Paraphrase Detection.'

1

Altheneyan, Alaa, and Mohamed El Bachir Menai. "Evaluation of State-of-the-Art Paraphrase Identification and Its Application to Automatic Plagiarism Detection." International Journal of Pattern Recognition and Artificial Intelligence 34, no. 04 (August 22, 2019): 2053004. http://dx.doi.org/10.1142/s0218001420530043.

Abstract:
Paraphrase identification is a natural language processing (NLP) problem that involves the determination of whether two text segments have the same meaning. Various NLP applications rely on a solution to this problem, including automatic plagiarism detection, text summarization, machine translation (MT), and question answering. The methods for identifying paraphrases found in the literature fall into two main classes: similarity-based methods and classification methods. This paper presents a critical study and an evaluation of existing methods for paraphrase identification and its application to automatic plagiarism detection. It presents the classes of paraphrase phenomena, the main methods, and the sets of features used by each particular method. All the methods and features used are discussed and enumerated in a table for easy comparison. Their performances on benchmark corpora are also discussed and compared via tables. Automatic plagiarism detection is presented as an application of paraphrase identification. The performances on benchmark corpora of existing plagiarism detection systems able to detect paraphrases are compared and discussed. The main outcome of this study is the identification of word overlap, structural representations, and MT measures as feature subsets that lead to the best performance results for support vector machines in both paraphrase identification and plagiarism detection on corpora. The performance results achieved by deep learning techniques highlight that these techniques are the most promising research direction in this field.
2

Zhou, Ying, Xiaokang Hu, and Vera Chung. "Automatic Construction of Fine-Grained Paraphrase Corpora System Using Language Inference Model." Applied Sciences 12, no. 1 (January 5, 2022): 499. http://dx.doi.org/10.3390/app12010499.

Abstract:
Paraphrase detection and generation are important natural language processing (NLP) tasks. Yet the term paraphrase is broad enough to include many fine-grained relations. This leads to different tolerance levels of semantic divergence in the positive paraphrase class among publicly available paraphrase datasets. Such variation can affect the generalisability of paraphrase classification models. It may also impact the predictability of paraphrase generation models. This paper presents a new model that, starting from a few corpora, automatically constructs corpora of fine-grained paraphrase relations using language inference models. The fine-grained sentence-level paraphrase relations are defined based on their word- and phrase-level counterparts. We demonstrate that the fine-grained labels from our proposed system make it possible to generate paraphrases at a desired semantic level. The new labels could also contribute to general sentence embedding techniques.
3

Siswantining, Titin, Stanley Pratama, and Devvi Sarwinda. "SPRATAMA MODEL FOR INDONESIAN PARAPHRASE DETECTION USING BIDIRECTIONAL LONG SHORT-TERM MEMORY AND BIDIRECTIONAL GATED RECURRENT UNIT." MEDIA STATISTIKA 15, no. 2 (March 5, 2023): 129–38. http://dx.doi.org/10.14710/medstat.15.2.129-138.

Abstract:
Paraphrasing is a way to rewrite sentences in other words with the same intent or purpose. Automatic paraphrase detection can be done using Natural Language Sentence Matching (NLSM), which is part of Natural Language Processing (NLP). NLP is a computational technique for processing text in general, while NLSM is used specifically to find the relationship between two sentences. With the development of Neural Networks (NN), NLP can now be done more easily by computers. Many more models for paraphrase detection and paraphrasing have been developed for English than for Indonesian, which has less training data. This study proposes the SPratama Model, which performs paraphrase detection for Indonesian using Recurrent Neural Networks (RNN), namely Bidirectional Long Short-Term Memory (BiLSTM) and Bidirectional Gated Recurrent Unit (BiGRU). The data used is "Quora Question Pairs", taken from Kaggle and translated into Indonesian using Google Translate. The results of this study indicate that the proposed model has an accuracy of around 80% for the detection of paraphrased sentences.
4

Barrón-Cedeño, Alberto, Marta Vila, M. Martí, and Paolo Rosso. "Plagiarism Meets Paraphrasing: Insights for the Next Generation in Automatic Plagiarism Detection." Computational Linguistics 39, no. 4 (December 2013): 917–47. http://dx.doi.org/10.1162/coli_a_00153.

Abstract:
Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyze the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. With this aim in mind, we created the P4P corpus, a new resource that uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analyzed in the light of this annotation. The presented experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the paraphrase mechanisms used the most when plagiarizing, and (iii) paraphrase mechanisms tend to shorten the plagiarized text. For the first time, the paraphrase mechanisms behind plagiarism have been analyzed, providing critical insights for the improvement of automatic plagiarism detection systems.
5

Bamnote, Dr G. R., and Ms Deepti Ingole. "Design of Efficient Model to Predict Duplications in Questionnaire Forum using Machine Learning." International Journal for Research in Applied Science and Engineering Technology 11, no. 5 (May 31, 2023): 5893–97. http://dx.doi.org/10.22214/ijraset.2023.53088.

Abstract:
Detecting duplicate sentences in a corpus of sentence pairs means identifying whether the two sentences in a pair convey the same meaning. Detecting duplicates helps in deduplication, the process in which duplicates are removed. Traditional natural language processing techniques are less accurate in identifying similarity between sentences; such similar sentences can also be referred to as paraphrases. Using the Quora and Twitter paraphrase corpora, we explored various approaches, including several machine learning algorithms, to obtain a reliable approach that can identify duplicates given a pair of sentences. This paper discusses the performance of six supervised machine learning algorithms on two different paraphrase corpora, and it focuses on analyzing how accurately the algorithms classify sentences in the corpora as duplicates or non-duplicates.
6

Chitra, A., and Anupriya Rajkumar. "Plagiarism Detection Using Machine Learning-Based Paraphrase Recognizer." Journal of Intelligent Systems 25, no. 3 (July 1, 2016): 351–59. http://dx.doi.org/10.1515/jisys-2014-0146.

Abstract:
Plagiarism in free text has become a common occurrence due to the wide availability of voluminous information resources. Automatic plagiarism detection systems aim to identify plagiarized content present in large repositories. This task is rendered difficult by the use of sophisticated plagiarism techniques such as paraphrasing and summarization, which mask the occurrence of plagiarism. In this work, a monolingual plagiarism detection technique has been developed to tackle cases of paraphrased plagiarism. A support vector machine based paraphrase recognition system, which works by extracting lexical, syntactic, and semantic features from input text, has been used. Both sentence-level and passage-level approaches have been investigated. The performance of the system has been evaluated on various corpora, and the passage-level approach has registered promising results.
7

Vrbanec, Tedo, and Ana Meštrović. "Corpus-Based Paraphrase Detection Experiments and Review." Information 11, no. 5 (April 29, 2020): 241. http://dx.doi.org/10.3390/info11050241.

Abstract:
Paraphrase detection is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, and text mining in general. In this paper, we give a performance overview of various types of corpus-based models, especially deep learning (DL) models, on the task of paraphrase detection. We report the results of eight models (LSI, TF-IDF, Word2Vec, Doc2Vec, GloVe, FastText, ELMO, and USE) evaluated on three different publicly available corpora: the Microsoft Research Paraphrase Corpus, the Clough and Stevenson corpus, and the Webis Crowd Paraphrase Corpus 2011. Through a great number of experiments, we decided on the most appropriate approaches for text pre-processing, hyper-parameters, sub-model selection where sub-models exist (e.g., Skipgram vs. CBOW), distance measures, and the semantic similarity/paraphrase detection threshold. Our findings and those of other researchers who have used deep learning models show that DL models are very competitive with traditional state-of-the-art approaches and have potential that should be further developed.
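The distance-measure-plus-threshold setup mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes plain whitespace tokenization and raw term-frequency vectors in place of any of the eight models, and the function names and the threshold of 0.5 are hypothetical choices made only for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between raw term-frequency vectors of two texts."""
    # Assumption: lowercase whitespace tokens stand in for real pre-processing.
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(tf_a[w] * tf_b[w] for w in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_paraphrase(text_a: str, text_b: str, threshold: float = 0.5) -> bool:
    """Label a pair as a paraphrase when similarity reaches the threshold."""
    # Assumption: 0.5 is an arbitrary value; in practice it is tuned per corpus.
    return cosine_similarity(text_a, text_b) >= threshold
```

In a real experiment the term-frequency vectors would be replaced by model embeddings and the threshold would be selected on held-out data.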
8

Hudson, G. Thomas, and Noura Al Moubayed. "Ask me in your own words: paraphrasing for multitask question answering." PeerJ Computer Science 7 (October 27, 2021): e759. http://dx.doi.org/10.7717/peerj-cs.759.

Abstract:
Multitask learning has led to significant advances in Natural Language Processing, including the decaNLP benchmark where question answering is used to frame 10 natural language understanding tasks in a single model. In this work we show how models trained to solve decaNLP fail with simple paraphrasing of the question. We contribute a crowd-sourced corpus of paraphrased questions (PQ-decaNLP), annotated with paraphrase phenomena. This enables analysis of how transformations such as swapping the class labels and changing the sentence modality lead to a large performance degradation. Training both MQAN and the newer T5 model using PQ-decaNLP improves their robustness and for some tasks improves the performance on the original questions, demonstrating the benefits of a model which is more robust to paraphrasing. Additionally, we explore how paraphrasing knowledge is transferred between tasks, with the aim of exploiting the multitask property to improve the robustness of the models. We explore the addition of paraphrase detection and paraphrase generation tasks, and find that while both models are able to learn these new tasks, knowledge about paraphrasing does not transfer to other decaNLP tasks.
9

Kumova Metin, Senem, Bahar Karaoğlan, Tarık Kışla, and Katira Soleymanzadeh. "Certainty factor model in paraphrase detection." Pamukkale University Journal of Engineering Sciences 27, no. 2 (2021): 139–50. http://dx.doi.org/10.5505/pajes.2020.75350.

10

Kong, Leilei, Zhongyuan Han, Yong Han, and Haoliang Qi. "A Deep Paraphrase Identification Model Interacting Semantics with Syntax." Complexity 2020 (October 30, 2020): 1–14. http://dx.doi.org/10.1155/2020/9757032.

Abstract:
Paraphrase identification is central to many natural language applications. Based on the insight that a successful paraphrase identification model needs to adequately capture the semantics of the language objects as well as their interactions, we present a deep paraphrase identification model interacting semantics with syntax (DPIM-ISS). DPIM-ISS introduces the linguistic features manifested in syntactic features to produce more explicit structures, and encodes the semantic representation of a sentence on different syntactic structures by means of interacting semantics with syntax. Then, DPIM-ISS learns the paraphrase pattern from this representation by exploiting a convolutional neural network with a convolution-pooling structure. Experiments are conducted on the Microsoft Research Paraphrase (MSRP) corpus, the PAN 2010 corpus, and the PAN 2012 corpus for paraphrase plagiarism detection. The experimental results demonstrate that DPIM-ISS outperforms the classical word-matching approaches, the syntax-similarity approaches, the convolutional neural network-based models, and some deep paraphrase identification models.
11

I., Mohamed, and Wael H. "Exploring the Recent Trends of Paraphrase Detection." International Journal of Computer Applications 182, no. 46 (March 15, 2019): 1–5. http://dx.doi.org/10.5120/ijca2019918317.

12

Mahmoud, Adnen, and Mounir Zrigui. "Semantic Similarity Analysis for Corpus Development and Paraphrase Detection in Arabic." International Arab Journal of Information Technology 18, no. 1 (December 31, 2020): 1–7. http://dx.doi.org/10.34028/iajit/18/1/1.

Abstract:
Paraphrase detection determines whether original and suspect documents convey the same meaning. It has attracted attention from researchers in many Natural Language Processing (NLP) tasks such as plagiarism detection, question answering, and information retrieval. Traditional methods (e.g., Term Frequency-Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA), and Latent Semantic Analysis (LSA)) cannot efficiently capture hidden semantic relations when sentences do not contain any common words or when words rarely co-occur. Therefore, we proposed a deep learning model based on Global Vectors (GloVe) word embeddings and a Recurrent Convolutional Neural Network (RCNN). It was efficient at capturing more contextual dependencies between word vectors with precise semantic meanings. Given the lack of publicly available resources for the Arabic language, we developed a paraphrased corpus automatically. It preserved the syntactic and semantic structures of Arabic sentences using a word2vec model and Part-Of-Speech (POS) annotation. Overall, experiments showed that our proposed model outperformed the state-of-the-art methods in terms of precision and recall.
13

Anchiêta, Rafael T., Rogério F. de Sousa, and Thiago A. S. Pardo. "Modeling the Paraphrase Detection Task over a Heterogeneous Graph Network with Data Augmentation." Information 11, no. 9 (September 1, 2020): 422. http://dx.doi.org/10.3390/info11090422.

Abstract:
Paraphrase detection is a Natural-Language Processing (NLP) task that aims at automatically identifying whether two sentences convey the same meaning (even with different words). For the Portuguese language, most works model this task as a machine-learning problem, extracting features and training a classifier. In this paper, following a different line, we explore a graph structure representation and model the paraphrase identification task over a heterogeneous network. We also adopt a back-translation strategy for data augmentation to balance the dataset we use. Our approach, although simple, outperforms the best results reported for the paraphrase detection task in Portuguese, showing that graph structures may better capture the semantic relatedness among sentences.
14

Lei, Dajiang, Qingsheng Zhu, Peng Yang, Yifu Jin, Jun Chen, and Hai Lin. "An Outlying Paraphrase Subspace Search Algorithm for Outlier Detection." International Journal of Digital Content Technology and its Applications 5, no. 8 (August 31, 2011): 355–64. http://dx.doi.org/10.4156/jdcta.vol5.issue8.41.

15

Vrublevskyi, V. N., and A. A. Marchenko. "Review of approaches for paraphrase identification." Bulletin of Taras Shevchenko National University of Kyiv. Series: Physics and Mathematics, no. 1 (2023): 71–78. http://dx.doi.org/10.17721/1812-5409.2023/1.10.

Abstract:
The article reviews approaches to the problem of identifying paraphrases. It describes the relevance of this problem and its use in tasks such as plagiarism detection, text simplification, and information search. Several classes of solutions are considered. The first approach is based on manual rules: it uses manually selected features based on the fundamental properties of paraphrases. The second approach is based on lexical similarity and various databases and ontologies. Machine learning-based approaches are also presented, together with the different architectures that can be used to identify paraphrases. The last approach considered is based on deep learning and modern transformer models.
16

Srivastava, Shruti, and Sharvari Govilkar. "A Survey on Paraphrase Detection Techniques for Indian Regional Languages." International Journal of Computer Applications 163, no. 9 (April 17, 2017): 42–47. http://dx.doi.org/10.5120/ijca2017913757.

17

El-Alfy, El-Sayed M., Radwan E. Abdel-Aal, Wasfi G. Al-Khatib, and Faisal Alvi. "Boosting paraphrase detection through textual similarity metrics with abductive networks." Applied Soft Computing 26 (January 2015): 444–53. http://dx.doi.org/10.1016/j.asoc.2014.10.021.

18

Mahmoud, Adnen, and Mounir Zrigui. "Distributional Semantic Model Based on Convolutional Neural Network for Arabic Textual Similarity." International Journal of Cognitive Informatics and Natural Intelligence 14, no. 1 (January 2020): 35–50. http://dx.doi.org/10.4018/ijcini.2020010103.

Abstract:
The problem addressed is to develop a model that can reliably identify whether a previously unseen document pair is paraphrased or not. Its detection in Arabic documents is a challenge because of its variability in features and the lack of publicly available corpora. Faced with these problems, the authors propose a semantic approach. At the feature extraction level, the authors use global vectors representation combining global co-occurrence counting and a contextual skip gram model. At the paraphrase identification level, the authors apply a convolutional neural network model to learn more contextual and semantic information between documents. For experiments, the authors use Open Source Arabic Corpora as a source corpus. Then the authors collect different datasets to create a vocabulary model. For the paraphrased corpus construction, the authors replace each word from the source corpus by its most similar one which has the same grammatical class applying the word2vec algorithm and the part-of-speech annotation. Experiments show that the model achieves promising results in terms of precision and recall compared to existing approaches in the literature.
19

Et.al, Jia Jun, Dong. "Paraphrasing Chinese Idioms: Paraphrase Acquisition, Rewording and Scoring." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 3 (April 10, 2021): 1999–2005. http://dx.doi.org/10.17762/turcomat.v12i3.1037.

Abstract:
Paraphrasing is the process of restating the meaning of a text or passage using different words in the same language to give readers a clearer understanding of the original sentence. Paraphrasing is important in many natural language processing tasks such as plagiarism detection, information retrieval, and machine translation. In this article, we describe our work on paraphrasing Chinese idioms by using definitions from dictionaries. The definitions of the idioms are reworded and then scored to find the best paraphrase candidates for the given context. With the proposed approach to paraphrasing Chinese idioms in sentences, the BLEU score was 75.69%, compared to 66.34% for the baseline approach.
20

Saha, Rudradityo, and G. Bharadwaja Kumar. "A Novel Approach for Developing Paraphrase Detection System using Machine Learning." International Journal of Computer Applications 183, no. 9 (June 21, 2021): 29–36. http://dx.doi.org/10.5120/ijca2021921389.

21

Agarwal, Basant, Heri Ramampiaro, Helge Langseth, and Massimiliano Ruocco. "A deep network model for paraphrase detection in short text messages." Information Processing & Management 54, no. 6 (November 2018): 922–37. http://dx.doi.org/10.1016/j.ipm.2018.06.005.

22

I., Mohamed, Wael H., and Hawaf Abdalhakim. "A Hybrid Model for Paraphrase Detection Combines pros of Text Similarity with Deep Learning." International Journal of Computer Applications 178, no. 20 (June 18, 2019): 18–23. http://dx.doi.org/10.5120/ijca2019919011.

23

Shakeel, Muhammad Haroon, Asim Karim, and Imdadullah Khan. "A multi-cascaded model with data augmentation for enhanced paraphrase detection in short texts." Information Processing & Management 57, no. 3 (May 2020): 102204. http://dx.doi.org/10.1016/j.ipm.2020.102204.

24

Tsinganos, Nikolaos, Panagiotis Fouliras, and Ioannis Mavridis. "Applying BERT for Early-Stage Recognition of Persistence in Chat-Based Social Engineering Attacks." Applied Sciences 12, no. 23 (December 2, 2022): 12353. http://dx.doi.org/10.3390/app122312353.

Abstract:
Chat-based social engineering (CSE) attacks are attracting increasing attention in the Small-Medium Enterprise (SME) environment, given the ease and potential impact of such an attack. During a CSE attack, malicious users will repeatedly use linguistic tricks to eventually deceive their victims. Thus, to protect SME users, it would be beneficial to have a cyber-defense mechanism able to detect persistent interlocutors who repeatedly bring up critical topics that could lead to sensitive data exposure. We build a natural language processing model, called CSE-PersistenceBERT, for paraphrase detection to recognize persistency as a social engineering attacker’s behavior during a chat-based dialogue. The CSE-PersistenceBERT model consists of a pre-trained BERT model fine-tuned using our handcrafted CSE-Persistence corpus; a corpus appropriately annotated for the specific downstream task of paraphrase recognition. The model identifies the linguistic relationship between the sentences uttered during the dialogue and exposes the malicious intent of the attacker. The results are satisfactory and prove the efficiency of CSE-PersistenceBERT as a recognition mechanism of a social engineer’s persistent behavior during a CSE attack.
25

Bouarara, Hadj Ahmed, and Reda Mohamed Hamou. "A New Algorithm of Grouping Cockroaches Classifier (GCC) for Textual Plagiarism Detection." International Journal of Information Retrieval Research 6, no. 4 (October 2016): 51–73. http://dx.doi.org/10.4018/ijirr.2016100104.

Abstract:
In the last decade, with new technology, it has become important to allow users to access information freely while at the same time restricting them from illegally copying and distributing it. In the age of information technologies, plagiarism has become a topical subject in the digital world and has turned into a serious problem. The authors' work deals with the development of a new system for combating this phenomenon using a new insect behaviour algorithm called the Grouping Cockroaches Classifier (GCC). Each suspicious text (cockroach) is classified (hidden) in a class (shelter) that can be plagiarism or no-plagiarism, using a security function that is based on the attractiveness of each class (calculated using the aggregation operators: shelter darkness, congeners attraction, and security quality) and the displacement probability (calculated using the naive Bayes algorithm). The experimental results obtained on the PAN 09 dataset, using the validation measures recall, precision, F-measure, and entropy, demonstrate that GCC has clear advantages over other plagiarism detection techniques in the literature. Finally, a set of services was added in order to detect different cases of plagiarism, such as plagiarism with translation, plagiarism of ideas, plagiarism with synonymy, and paraphrase plagiarism.
26

Grefenstette, Edward, and Mehrnoosh Sadrzadeh. "Concrete Models and Empirical Evaluations for the Categorical Compositional Distributional Model of Meaning." Computational Linguistics 41, no. 1 (March 2015): 71–118. http://dx.doi.org/10.1162/coli_a_00209.

Abstract:
Modeling compositional meaning for sentences using empirical distributional methods has been a challenge for computational linguists. The categorical model of Clark, Coecke, and Sadrzadeh (2008) and Coecke, Sadrzadeh, and Clark (2010) provides a solution by unifying a categorial grammar and a distributional model of meaning. It takes into account syntactic relations during semantic vector composition operations. But the setting is abstract: It has not been evaluated on empirical data and applied to any language tasks. We generate concrete models for this setting by developing algorithms to construct tensors and linear maps and instantiate the abstract parameters using empirical data. We then evaluate our concrete models against several experiments, both existing and new, based on measuring how well models align with human judgments in a paraphrase detection task. Our results show the implementation of this general abstract framework to perform on par with or outperform other leading models in these experiments.
27

Tiwari, Prayag, Amit Kumar Jaiswal, Sahil Garg, and Ilsun You. "SANTM: Efficient Self-attention-driven Network for Text Matching." ACM Transactions on Internet Technology 22, no. 3 (August 31, 2022): 1–21. http://dx.doi.org/10.1145/3426971.

Abstract:
Self-attention mechanisms have recently been embraced for a broad range of text-matching applications. A self-attention model takes only one sentence as input with no extra information, i.e., one can utilize the final hidden state or pooling. However, text-matching problems can be interpreted in either symmetrical or asymmetrical scopes. For instance, paraphrase detection is a symmetrical task, while textual entailment classification and question-answer matching are considered asymmetrical tasks. In this article, we leverage attractive properties of the self-attention mechanism and propose an attention-based network that incorporates three key components for inter-sequence attention: global pointwise features, preceding attentive features, and contextual features, while updating the rest of the components. Our model is evaluated on two benchmark datasets covering the tasks of textual entailment and question-answer matching. The proposed efficient Self-attention-driven Network for Text Matching outperforms the state of the art on the Stanford Natural Language Inference and WikiQA datasets with far fewer parameters.
28

Kholodna, N., and V. Vysotska. "REWRITING IDENTIFICATION TECHNOLOGY FOR TEXT CONTENT BASED ON MACHINE LEARNING METHODS." Radio Electronics, Computer Science, Control, no. 4 (December 13, 2022): 126. http://dx.doi.org/10.15588/1607-3274-2022-4-11.

Abstract:
Context. Paraphrased textual content, or rewriting, is one of the difficult problems in detecting academic plagiarism. Most plagiarism detection systems are designed to detect common words, sequences of linguistic units, and minor changes, but are unable to detect significant semantic and structural changes. Therefore, most cases of plagiarism using paraphrasing remain unnoticed. Objective. The objective of the study is to develop a technology for detecting paraphrasing in text based on a classification model and machine learning methods, using Siamese neural networks of the recurrent and Transformer (RoBERTa) types to analyze the level of similarity between sentences of text content. Method. For this study, the following semantic similarity metrics or indicators were chosen as features: the Jaccard coefficient for shared N-grams, the cosine distance between vector representations of sentences, Word Mover's Distance, distances according to WordNet dictionaries, and the predictions of two ML models: Siamese neural networks of the recurrent and Transformer (RoBERTa) types. Results. An intelligent system for detecting paraphrasing in text based on a classification model and machine learning methods has been developed. The developed system uses the principle of model stacking and feature engineering. Additional features indicate the semantic affiliation of the sentences or the normalized number of common N-grams. An additional fine-tuned RoBERTa neural network (with additional fully connected layers) is less sensitive to pairs of sentences that are not paraphrases of each other. This specificity of the model may contribute to incorrect accusations of plagiarism or incorrect association of user-generated content. Additional features increase both the overall classification accuracy and the model's sensitivity to pairs of sentences that are not paraphrases of each other. Conclusions. The created model shows excellent classification results on PAWS test data: precision 93%, recall 92%, F1-score 92%, accuracy 92%. The results of the study showed that Transformer-type NNs can be successfully applied to detect paraphrasing in a pair of texts with fairly high accuracy without the need for additional feature generation.
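One of the features listed in this abstract, the Jaccard coefficient over shared N-grams, can be sketched as follows. This is an illustrative assumption about how such a feature might be computed, not the authors' code: the helper names are hypothetical, tokens are lowercase whitespace-separated words, and word bigrams are used by default.

```python
def word_ngrams(text: str, n: int) -> set:
    """Return the set of word n-grams in a text (lowercase, whitespace tokens)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_ngrams(text_a: str, text_b: str, n: int = 2) -> float:
    """Jaccard coefficient: shared n-grams divided by all distinct n-grams."""
    grams_a = word_ngrams(text_a, n)
    grams_b = word_ngrams(text_b, n)
    union = grams_a | grams_b
    if not union:
        # Assumption: texts too short to form any n-gram score 0.0.
        return 0.0
    return len(grams_a & grams_b) / len(union)
```

For example, "the cat sat" and "the cat ran" share one of three distinct bigrams, so the coefficient is 1/3. A value like this would be one column of the feature vector passed to the stacked classifier, alongside the other similarity measures.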
29

Mishra, Dr Ritu. "ESTIMATION OF PETRIFYING TRENDSETTING TWINGE BY VIBRATION PARAPHRASE." International Journal of Medical Sciences And Clinical Research 03, no. 03 (March 1, 2023): 1–4. http://dx.doi.org/10.37547/ijmscr/volume03issue03-01.

Abstract:
Introduction: Petrifying trendsetting twinge may be a common problem with varied aetiologies. MRI has excellent soft tissue resolution, is sensitive for detecting osseous, chondral, and marrow abnormalities, and will be the simplest modality for various intra-articular and extra-articular causes of trendsetting twinge. Material and method: This study was done at our tertiary health care centre for 2 years, during which 125 patients with a history of petrifying trendsetting twinge were included. Clinical history and laboratory parameters, with a proper MRI protocol and contrast administration wherever indicated, were undertaken to gauge various causes of trendsetting twinge and to assess MRI appearances of trendsetting pathologies. The main target of our study was to seek out MR characteristics of the varied disease and to gauge the simplest sequence for various trendsetting pathologies. We also evaluated various clinical and radiological parameters related to future femoral head collapse. Results: Amongst the 125 patients evaluated, the commonest pathology for the explanation for petrifying trendsetting twinge was avascular necrosis of the femoral head. Red blood cell disease, chronic alcohol use, and steroid use were common causes of AVN. Site and percentage of femoral head involvement are essential predictors for future collapse of the femoral head. Amongst the paediatric age bracket, transient synovitis followed by osteomyelitis may be the commonest cause of trendsetting twinge. Contrast-enhanced MRI can help within the differentiation of pyogenic vs tubercular arthritis. Gadolinium administration should be wiped out all cases of inflammatory arthritis to detect associated synovitis, enthesitis, and bursitis.
Conclusion: Various trendsetting pathologies cause progressive destruction of trendsetting, where early diagnosis helps in arresting the disease progression and in prompt management of the patient. MRI is non-invasive, accurate, and sensitive for trendsetting pathologies and proves to be the modality of choice within the guesstimation of trendsetting twinge altogether the age groups.
30

Haneef, Israr, Rao Muhammad Adeel Nawab, Ehsan Ullah Munir, and Imran Sarwar Bajwa. "Design and Development of a Large Cross-Lingual Plagiarism Corpus for Urdu-English Language Pair." Scientific Programming 2019 (March 17, 2019): 1–11. http://dx.doi.org/10.1155/2019/2962040.

Abstract:
Cross-lingual plagiarism occurs when the source (or original) text(s) is in one language and the plagiarized text is in another language. In recent years, cross-lingual plagiarism detection has attracted the attention of the research community because a large amount of digital text is easily accessible in many languages through online digital repositories and machine translation systems are readily available, making it easier to perform cross-lingual plagiarism and harder to detect it. To develop and evaluate cross-lingual plagiarism detection systems, standard evaluation resources are needed. The majority of earlier studies have developed cross-lingual plagiarism corpora for English and other European language pairs. However, for the Urdu-English language pair, the problem of cross-lingual plagiarism detection has not been thoroughly explored, although a large amount of digital text is readily available in Urdu and it is spoken in many countries of the world (particularly in Pakistan, India, and Bangladesh). To fill this gap, this paper presents a large benchmark cross-lingual corpus for the Urdu-English language pair. The proposed corpus contains 2,395 source-suspicious document pairs (540 are automatic translation, 539 are artificially paraphrased, 508 are manually paraphrased, and 808 are nonplagiarized). Furthermore, our proposed corpus contains three types of cross-lingual examples including artificial (automatic translation and artificially paraphrased), simulated (manually paraphrased), and real (nonplagiarized), which have not been previously reported in the development of cross-lingual corpora. Detailed analysis of our proposed corpus was carried out using n-gram overlap and longest common subsequence approaches. Using word unigrams, mean similarity scores of 1.00, 0.68, 0.52, and 0.22 were obtained for automatic translation, artificially paraphrased, manually paraphrased, and nonplagiarized documents, respectively.
These results show that documents in the proposed corpus are created using different obfuscation techniques, which makes the dataset more realistic and challenging. We believe that the corpus developed in this study will help to foster research on Urdu, an under-resourced language, and will be useful in the development, comparison, and evaluation of cross-lingual plagiarism detection systems for the Urdu-English language pair. Our proposed corpus is free and publicly available for research purposes.
31

Seifikar, Mahsa, and Saeed Farzi. "A comprehensive study of online event tracking algorithms in social networks." Journal of Information Science 45, no. 2 (July 3, 2018): 156–68. http://dx.doi.org/10.1177/0165551518785548.

Abstract:
Recently, social networks have provided an important platform to detect trends of real-world events. The trends of real-world events are detected by analysing the flow of massive volumes of data in continuous time steps over various social media platforms. Today, many researchers are interested in detecting social network trends in order to analyse the gathered information and enable users and organisations to satisfy their information needs. This article aims to give a complete survey of recent text-based trend detection approaches, which are studied from three perspectives (algorithms, dimension and diversity of events). The advantages and disadvantages of the considered approaches have also been paraphrased separately to illustrate a comprehensive view of the previous works and open problems.
32

Rodríguez-Cantelar, Mario, Marcos Estecha-Garitagoitia, Luis Fernando D’Haro, Fernando Matía, and Ricardo Córdoba. "Automatic Detection of Inconsistencies and Hierarchical Topic Classification for Open-Domain Chatbots." Applied Sciences 13, no. 16 (August 8, 2023): 9055. http://dx.doi.org/10.3390/app13169055.

Abstract:
Current State-of-the-Art (SotA) chatbots are able to produce high-quality sentences, handling different conversation topics and larger interaction times. Unfortunately, the generated responses depend greatly on the data on which they have been trained, the specific dialogue history and current turn used for guiding the response, the internal decoding mechanisms, and ranking strategies, among others. Therefore, it may happen that for semantically similar questions asked by users, the chatbot may provide a different answer, which can be considered as a form of hallucination or producing confusion in long-term interactions. In this research paper, we propose a novel methodology consisting of two main phases: (a) hierarchical automatic detection of topics and subtopics in dialogue interactions using a zero-shot learning approach, and (b) detecting inconsistent answers using k-means and the Silhouette coefficient. To evaluate the efficacy of topic and subtopic detection, we use a subset of the DailyDialog dataset and real dialogue interactions gathered during the Alexa Socialbot Grand Challenge 5 (SGC5). The proposed approach enables the detection of up to 18 different topics and 102 subtopics. For the purpose of detecting inconsistencies, we manually generate multiple paraphrased questions and employ several pre-trained SotA chatbot models to generate responses. Our experimental results demonstrate a weighted F-1 value of 0.34 for topic detection, a weighted F-1 value of 0.78 for subtopic detection in DailyDialog, then 81% and 62% accuracy for topic and subtopic classification in SGC5, respectively. Finally, to predict the number of different responses, we obtained a mean squared error (MSE) of 3.4 when testing smaller generative models and 4.9 in recent large language models.
33

Büchler, Marco, Gregory Crane, and Gerhard Heyer. "Historical Relevance Feedback Detection by Text Re-use Networks." Leonardo 46, no. 3 (June 2013): 276. http://dx.doi.org/10.1162/leon_a_00572.

Abstract:
Text re-use has been in the humanist's interest for centuries. Collecting parallel texts implies giving a certain information, e.g. a moral statement or report on wars and conflicts, a kind of witness. The more independent parallel texts are collected, the more feasible the information is. The contribution reported here is on automatic detection of text re-use and the usage of a text re-use network to derive a Cultural Heritage Aware PageRank technique given ancient text re-uses like quotations, paraphrases, and allusions.
34

Lee, Seongwoon, Seongsoon Kim, Donghyeon Park, and Jaewoo Kang. "A Crowdsourcing-Based Paraphrased Opinion Spam Dataset and Its Implication on Detection Performance." KIISE Transactions on Computing Practices 22, no. 7 (July 15, 2016): 338–43. http://dx.doi.org/10.5626/ktcp.2016.22.7.338.

35

Ansorge, Libor, Klára Ansorgeová, and Mark Sixsmith. "Plagiarism through Paraphrasing Tools—The Story of One Plagiarized Text." Publications 9, no. 4 (October 20, 2021): 48. http://dx.doi.org/10.3390/publications9040048.

Abstract:
This paper describes a unique case study wherein real plagiarism revealed in a scientific journal is compared with the original article. The plagiarized text contains many typical errors, such as inconsistent terminology, unclear meanings of sentence, missing tables and figures, and an incorrect literature list. The occurrence of similar errors in other manuscripts may serve as a warning against plagiarism. During the analysis of the plagiarized text, it was assumed that a paraphrasing tool was used for preparing this plagiarized text. To confirm this assumption, the chosen paraphrasing tool was used to create a paraphrased version of the article and this version was compared with the plagiarized text. The paraphrased version had far fewer changes from the plagiarized text than the plagiarized text had from the original article. Thus, it was confirmed that the plagiarized text was created using a paraphrasing tool. Information contained in this article can be used for detecting this type of plagiarism.
36

Lidwan, Nanang, Faizal Roni, Sabaruddin Siagian, Sopyan Sopyan, and Adianta Sebayang. "PERANAN PERANGKAT TURNITIN DALAM MENDORONG KARYA ILMIAH BERKUALITAS." Akrab Juara : Jurnal Ilmu-ilmu Sosial 7, no. 4 (November 5, 2022): 2888. http://dx.doi.org/10.58487/akrabjuara.v7i4.1960.

Abstract:
The purpose of this scientific paper is to explain the role of the Turnitin tool in detecting plagiarism in scientific work and in encouraging quality scientific work. To suppress plagiarism and encourage quality scientific work, authors need a detailed and thorough mastery of the subject matter so as to produce scientific writing with high originality. To suppress plagiarism in scientific writing, authors can also use a paraphrasing strategy and process words and sentences manually. Although the Turnitin tool is currently the best tool for detecting plagiarism, it still has a weakness, namely that it only detects the similarity of words and sentences. For this reason, there needs to be collaboration between students, lecturers and universities in suppressing thesis plagiarism and utilizing the role of the Turnitin tool in encouraging quality scientific works or theses.
37

Grespan Pensin, Taiana. "Plágio na construção do texto de alunos de Secretariado Executivo?" Revista Expectativa 20, no. 3 (July 29, 2021): 88–107. http://dx.doi.org/10.48075/revex.v20i3.26602.

Abstract:
Owing to a lack of knowledge of some norms and even a lack of writing skills, many students commit plagiarism in academic work. Faced with this problem, this qualitative and quantitative research aims to verify whether and, if so, in what way academic plagiarism is present in the texts of students in a Secretarial Science undergraduate course at a public university in Paraná. For this, students and teachers of the course were interviewed to verify the main difficulties in relation to the writing of the internship reports. In addition, a pedagogical intervention was carried out that made it possible to monitor the development of paraphrase and quotation writing skills in the 2019 graduating class, in addition to confirming the plagiarism problem. A set of eight internship reports, which are these students' final course work, was also analyzed. To assist in detecting plagiarism in these texts, an online tool, DOCxWEB, was used, which compares the text to be analyzed with the information available on the internet in order to find similarities in writing or possible copies. From this, the types of plagiarism found in the texts were classified according to their degree of severity. This research is based on the Academic Literacies approach proposed by Lea and Street (1998), which advocates that the teaching of writing at the university should provide mastery of both textual and social skills. Furthermore, it is in line with Ivanič's (2004) discourse on writing as a guide for social practices and with the teachings of Krokoscz (2011, 2012), Swale and Feak (2012), Coughlin (2015), Svincki and Mackeachie (2015) and Seide (2018) about plagiarism and paraphrases.
38

Belova, Polina. "Methods for Automated Comparative Analysis of Texts when Detecting Signs of Plagiarism in Expert Case Examinations of Copyright and Related Rights Infringement." Legal Linguistics, no. 27(38) (April 1, 2023): 94–98. http://dx.doi.org/10.14258/leglin(2023)2717.

Abstract:
Within the framework of linguistic expertise on cases of copyright and related rights infringement experts are increasingly faced with the challenge of comparing several texts and searching for full-text, partial and other (lexical, grammatical, semantic, etc.) coincidences in them, as well as determining the values of these coincidences. Comparing documents manually takes a lot of time, especially if the research materials are multi-page texts. This article suggests possible ways to automate and improve this work by using special online document comparison tools: "Copyscape", "Embedika Compare", "Draftable Online", "Compare texts", "Copyleaks Text Compare Tool". The given list of tools for comparing texts is compiled by the article author based on the experience of using them in expert practice. For each of the services, the article indicates its advantages and disadvantages, as well as describes the algorithm of operation and features of the presentation of comparison results. Some tools have simple functionality and display how many words matched, show the percentage of uniqueness of the compared texts, others have more advanced comparison analytics and, in addition to the percentage of matches and the number of identical words, determine the types of similarities of text fragments, highlighting among them identical (full-text), similar (with minimal changes) and paraphrased. Nevertheless, the obtained results of comparing text files still require their expert verification and further linguistic research with the interpretation of the established coincidences and the definition of their type, especially with regard to lexical, grammatical, semantic, syntactic coincidences.
39

Daoud, Mohammad. "Topical and Non-Topical Approaches to Measure Similarity between Arabic Questions." Big Data and Cognitive Computing 6, no. 3 (August 22, 2022): 87. http://dx.doi.org/10.3390/bdcc6030087.

Abstract:
Questions are crucial expressions in any language. Many Natural Language Processing (NLP) or Natural Language Understanding (NLU) applications, such as question-answering computer systems, automatic chatting apps (chatbots), digital virtual assistants, and opinion mining, can benefit from accurately identifying similar questions in an effective manner. We detail methods for identifying similarities between Arabic questions that have been posted online by Internet users and organizations. Our novel approach uses a non-topical rule-based methodology and topical information (textual similarity, lexical similarity, and semantic similarity) to determine if a pair of Arabic questions are similarly paraphrased. Our method counts the lexical and linguistic distances between each question. Additionally, it identifies questions in accordance with their format and scope using expert hypotheses (rules) that have been experimentally shown to be useful and practical. Even if there is a high degree of lexical similarity between a When question (Timex Factoid—inquiring about time) and a Who inquiry (Enamex Factoid—asking about a named entity), they will not be similar. In an experiment using 2200 question pairs, our method attained an accuracy of 0.85, which is remarkable given the simplicity of the solution and the fact that we did not employ any language models or word embedding. In order to cover common Arabic queries presented by Arabic Internet users, we gathered the questions from various online forums and resources. In this study, we describe a unique method for detecting question similarity that does not require intensive processing, a sizable linguistic corpus, or a costly semantic repository. Because there are not many rich Arabic textual resources, this is especially important for informal Arabic text processing on the Internet.
40

Hafeez, Hamza, Iqra Muneer, Muhammad Sharjeel, Muhammad Adnan Ashraf, and Rao Muhammad Adeel Nawab. "Urdu Short Paraphrase Detection at Sentence Level." ACM Transactions on Asian and Low-Resource Language Information Processing, February 28, 2023. http://dx.doi.org/10.1145/3586009.

Abstract:
Paraphrase detection systems uncover the relationship between two text fragments and classify them as paraphrased when they convey the same idea, and as non-paraphrased otherwise. Previously, researchers have mainly focused on developing resources for the English language for paraphrase detection. There have been very few efforts for paraphrase detection in South Asian languages. In particular, no research has been conducted on sentence-level paraphrase detection in Urdu, a low-resourced language. This is mainly due to the unavailability of corpora that focus on the sentence level. The available related studies on the Urdu language only focus on text reuse detection tasks at the passage and document levels. Therefore, this study aims to develop a large-scale manually annotated benchmark Urdu paraphrase detection corpus at the sentence level, based on real cases from journalism. The proposed Urdu Sentential Paraphrases (USP) corpus contains 4,900 sentences (2,941 paraphrased and 1,959 non-paraphrased), manually collected from Urdu newspapers. Moreover, several techniques were proposed, developed, and compared as a secondary contribution, including Word Embedding (WE), Sentence Transformers (ST), and feature-fusion techniques. N-gram is treated as the baseline technique for our research. The experimental results indicate that our proposed feature-fusion technique is the most suitable for the Urdu paraphrase detection task. Furthermore, the performance increases when features of the proposed (ST) and baseline (N-gram) techniques are combined for the classification task. In addition, the proposed techniques have also been applied to the UPPC corpus to check their performance at the document level. The best result was obtained using the feature-fusion technique ($F_1 = 0.855$). Our corpus is freely available to download for research purposes.
41

Iqbal, Hafiz Rizwan, Rashad Maqsood, Agha Ali Raza, and Saeed-Ul Hassan. "Urdu paraphrase detection: A novel DNN-based implementation using a semi-automatically generated corpus." Natural Language Engineering, May 29, 2023, 1–31. http://dx.doi.org/10.1017/s1351324923000189.

Abstract:
Automatic paraphrase detection is the task of measuring the semantic overlap between two given texts. A major hurdle in the development and evaluation of paraphrase detection approaches, particularly for South Asian languages like Urdu, is the inadequacy of standard evaluation resources. The very few available paraphrased corpora for these languages are manually created. As a result, they are constrained to smaller sizes and are not very feasible to evaluate mainstream data-driven and deep neural networks (DNNs)-based approaches. Consequently, there is a need to develop semi- or fully automated corpus generation approaches for the resource-scarce languages. There is currently no semi- or fully automatically generated sentence-level Urdu paraphrase corpus. Moreover, no study is available to localize and compare approaches for Urdu paraphrase detection that focus on various mainstream deep neural architectures and pretrained language models. This research study addresses this problem by presenting a semi-automatic pipeline for generating paraphrased corpora for Urdu. It also presents a corpus that is generated using the proposed approach. This corpus contains 3147 semi-automatically extracted Urdu sentence pairs that are manually tagged as paraphrased (854) and non-paraphrased (2293). Finally, this paper proposes two novel approaches based on DNNs for the task of paraphrase detection in Urdu text. These are Word Embeddings n-gram Overlap (henceforth called WENGO), and a modified approach, Deep Text Reuse and Paraphrase Plagiarism Detection (henceforth called D-TRAPPD). Both of these approaches have been evaluated on two related tasks: (i) paraphrase detection, and (ii) text reuse and plagiarism detection.
The results from these evaluations revealed that D-TRAPPD ($F_1 = 96.80$ for paraphrase detection and $F_1 = 88.90$ for text reuse and plagiarism detection) outperformed WENGO ($F_1 = 81.64$ for paraphrase detection and $F_1 = 61.19$ for text reuse and plagiarism detection) as well as other state-of-the-art approaches for these two tasks. The corpus, models, and our implementations have been made available as free to download for the research community.
42

Srivastava, Shruti, and Sharvari Govilkar. "Detecting Paraphrases in Marathi Language." BOHR International Journal of Smart Computing and Information Technology, 2020, 7–17. http://dx.doi.org/10.54646/bijscit.003.

Abstract:
Paraphrases are sentences that either differ in their textual content or in the arrangement of their words but convey the same meaning. Identifying a paraphrase is exceptionally important in various real-life applications such as Information Retrieval, Plagiarism Detection, Text Summarization and Question Answering. A large amount of work in Paraphrase Detection has been done in English and many Indian languages. However, there is no existing system to identify paraphrases in Marathi. This is the first such endeavor in the Marathi language. Because paraphrases are differently structured sentences and Marathi is a semantically strong language, this system is designed for checking both the statistical and the semantic similarity of Marathi sentences. The statistical similarity measure does not need any prior knowledge, as it is based only on the factual data of the sentences. The factual data is calculated on the basis of the degree of closeness between the word-set, word-order, word-vector and word-distance. Universal Networking Language (UNL) speaks about the semantic significance in the sentence without any syntactic point of interest. Hence, the semantic similarity calculated on the basis of the generated UNL graphs for two Marathi sentences renders the semantic equality of the two Marathi sentences. The total paraphrase score is calculated by joining the statistical and semantic similarity scores, which gives the judgement of paraphrase or non-paraphrase for the Marathi sentences.
43

Alvi, Faisal, Mark Stevenson, and Paul Clough. "Paraphrase type identification for plagiarism detection using contexts and word embeddings." International Journal of Educational Technology in Higher Education 18, no. 1 (August 4, 2021). http://dx.doi.org/10.1186/s41239-021-00277-8.

Abstract:
Paraphrase types have been proposed by researchers as the paraphrasing mechanisms underlying acts of plagiarism. Synonymous substitution, word reordering and insertion/deletion have been identified as some of the common paraphrasing strategies used by plagiarists. However, similarity reports generated by most plagiarism detection systems provide a similarity score and produce matching sections of text with their possible sources. In this research we propose methods to identify two important paraphrase types – synonymous substitution and word reordering in paraphrased, plagiarised sentence pairs. We propose a three staged approach that uses context matching and pretrained word embeddings for identifying synonymous substitution and word reordering. Our proposed approach indicates that the use of Smith Waterman Algorithm for Plagiarism Detection and ConceptNet Numberbatch pretrained word embeddings produces the best performance in terms of $F_1$ scores. This research can be used to complement similarity reports generated by currently available plagiarism detection systems by incorporating methods to identify paraphrase types for plagiarism detection.
44

Jain, Rachna, Abhishek Kathuria, Anubhav Singh, Anmol Saxena, and Anjali Khandelwal. "ParaCap: paraphrase detection model using capsule network." Multimedia Systems, January 22, 2021. http://dx.doi.org/10.1007/s00530-020-00746-6.

45

Cordeiro, João, Gaël Dias, and Pavel Brazdil. "New Functions for Unsupervised Asymmetrical Paraphrase Detection." Journal of Software 2, no. 4 (October 1, 2007). http://dx.doi.org/10.4304/jsw.2.4.12-23.

46

Mahmoud, Adnen, and Mounir Zrigui. "Hybrid Attention-based Approach for Arabic Paraphrase Detection." Applied Artificial Intelligence, September 5, 2021, 1–16. http://dx.doi.org/10.1080/08839514.2021.1975880.

47

"OPTIMAL RECURRENT NEURAL NETWORK MODEL IN PARAPHRASE DETECTION." Informatics and Applications, December 30, 2018. http://dx.doi.org/10.14357/19922264180409.

48

Shahmohammadi, Hassan, MirHossein Dezfoulian, and Muharram Mansoorizadeh. "Paraphrase detection using LSTM networks and handcrafted features." Multimedia Tools and Applications, October 16, 2020. http://dx.doi.org/10.1007/s11042-020-09996-y.

49

Basalamah, Anas. "A Language Tutoring Tool based on AI and Paraphrase Detection." International Journal of Advanced Computer Science and Applications 12, no. 12 (2021). http://dx.doi.org/10.14569/ijacsa.2021.0121295.

50

Vrbanec, Tedo, and Ana Meštrović. "Comparison study of unsupervised paraphrase detection: Deep learning—The key for semantic similarity detection." Expert Systems, June 22, 2023. http://dx.doi.org/10.1111/exsy.13386.
