Academic literature on the topic 'Paraphrase Detection'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Paraphrase Detection.'

Journal articles on the topic "Paraphrase Detection"

1

Altheneyan, Alaa, and Mohamed El Bachir Menai. "Evaluation of State-of-the-Art Paraphrase Identification and Its Application to Automatic Plagiarism Detection." International Journal of Pattern Recognition and Artificial Intelligence 34, no. 04 (August 22, 2019): 2053004. http://dx.doi.org/10.1142/s0218001420530043.

Abstract:
Paraphrase identification is a natural language processing (NLP) problem that involves the determination of whether two text segments have the same meaning. Various NLP applications rely on a solution to this problem, including automatic plagiarism detection, text summarization, machine translation (MT), and question answering. The methods for identifying paraphrases found in the literature fall into two main classes: similarity-based methods and classification methods. This paper presents a critical study and an evaluation of existing methods for paraphrase identification and its application to automatic plagiarism detection. It presents the classes of paraphrase phenomena, the main methods, and the sets of features used by each particular method. All the methods and features used are discussed and enumerated in a table for easy comparison. Their performances on benchmark corpora are also discussed and compared via tables. Automatic plagiarism detection is presented as an application of paraphrase identification. The performances on benchmark corpora of existing plagiarism detection systems able to detect paraphrases are compared and discussed. The main outcome of this study is the identification of word overlap, structural representations, and MT measures as feature subsets that lead to the best performance results for support vector machines in both paraphrase identification and plagiarism detection on corpora. The performance results achieved by deep learning techniques highlight that these techniques are the most promising research direction in this field.
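As a rough illustration of the similarity-based class of methods mentioned in this abstract, the sketch below scores a sentence pair by word overlap (Jaccard similarity) and applies a decision threshold. The function names, the simple tokenizer, and the 0.5 threshold are assumptions chosen for illustration; they are not taken from the paper, which evaluates much richer feature sets with support vector machines.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity: |A & B| / |A | B| over the two token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


def is_paraphrase(a, b, threshold=0.5):
    """Similarity-based decision: paraphrase if the overlap reaches the threshold.

    The threshold value is a hypothetical default, not a tuned one.
    """
    return jaccard(a, b) >= threshold
```

A classification method would instead feed scores like this one, together with other features, to a trained classifier rather than comparing a single score with a fixed threshold.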
2

Zhou, Ying, Xiaokang Hu, and Vera Chung. "Automatic Construction of Fine-Grained Paraphrase Corpora System Using Language Inference Model." Applied Sciences 12, no. 1 (January 5, 2022): 499. http://dx.doi.org/10.3390/app12010499.

Abstract:
Paraphrase detection and generation are important natural language processing (NLP) tasks. Yet the term paraphrase is broad enough to include many fine-grained relations. This leads to different tolerance levels of semantic divergence in the positive paraphrase class among publicly available paraphrase datasets. Such variation can affect the generalisability of paraphrase classification models. It may also impact the predictability of paraphrase generation models. This paper presents a new system that uses language inference models to automatically construct corpora of fine-grained paraphrase relations. The fine-grained sentence-level paraphrase relations are defined based on their word- and phrase-level counterparts. We demonstrate that the fine-grained labels from our proposed system make it possible to generate paraphrases at a desired semantic level. The new labels could also contribute to general sentence embedding techniques.
3

Siswantining, Titin, Stanley Pratama, and Devvi Sarwinda. "SPRATAMA MODEL FOR INDONESIAN PARAPHRASE DETECTION USING BIDIRECTIONAL LONG SHORT-TERM MEMORY AND BIDIRECTIONAL GATED RECURRENT UNIT." MEDIA STATISTIKA 15, no. 2 (March 5, 2023): 129–38. http://dx.doi.org/10.14710/medstat.15.2.129-138.

Abstract:
Paraphrasing is a way of rewriting a sentence in other words while keeping the same intent or purpose. Automatic paraphrase detection can be done using Natural Language Sentence Matching (NLSM), which is part of Natural Language Processing (NLP). NLP is a computational technique for processing text in general, while NLSM is used specifically to find the relationship between two sentences. With the development of neural networks (NN), NLP tasks can now be handled more easily by computers. Many more models for detecting and generating paraphrases have been developed for English than for Indonesian, which has less training data. This study proposes the SPratama Model, which models paraphrase detection for Indonesian using recurrent neural networks (RNN), namely Bidirectional Long Short-Term Memory (BiLSTM) and Bidirectional Gated Recurrent Unit (BiGRU). The data used is "Quora Question Pairs", taken from Kaggle and translated into Indonesian using Google Translate. The results of this study indicate that the proposed model has an accuracy of around 80% for the detection of paraphrased sentences.
4

Barrón-Cedeño, Alberto, Marta Vila, M. Martí, and Paolo Rosso. "Plagiarism Meets Paraphrasing: Insights for the Next Generation in Automatic Plagiarism Detection." Computational Linguistics 39, no. 4 (December 2013): 917–47. http://dx.doi.org/10.1162/coli_a_00153.

Abstract:
Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyze the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. With this aim in mind, we created the P4P corpus, a new resource that uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analyzed in the light of this annotation. The presented experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the paraphrase mechanisms used the most when plagiarizing, and (iii) paraphrase mechanisms tend to shorten the plagiarized text. For the first time, the paraphrase mechanisms behind plagiarism have been analyzed, providing critical insights for the improvement of automatic plagiarism detection systems.
5

Bamnote, Dr G. R., and Ms Deepti Ingole. "Design of Efficient Model to Predict Duplications in Questionnaire Forum using Machine Learning." International Journal for Research in Applied Science and Engineering Technology 11, no. 5 (May 31, 2023): 5893–97. http://dx.doi.org/10.22214/ijraset.2023.53088.

Abstract:
Detecting duplicate sentences in a corpus of sentence pairs means identifying whether the two sentences in a pair convey the same meaning. This detection of duplicates helps in deduplication, a process in which duplicates are removed. Traditional natural language processing techniques are less accurate in identifying similarity between sentences; such similar sentences can also be referred to as paraphrases. Using the Quora and Twitter paraphrase corpora, we explored various approaches, including several machine learning algorithms, to obtain a reliable approach that can identify duplicate sentences given a pair of sentences. This paper discusses the performance of six supervised machine learning algorithms on two different paraphrase corpora, and it focuses on analyzing how accurately the algorithms classify the sentences in each corpus as duplicates or non-duplicates.
6

Chitra, A., and Anupriya Rajkumar. "Plagiarism Detection Using Machine Learning-Based Paraphrase Recognizer." Journal of Intelligent Systems 25, no. 3 (July 1, 2016): 351–59. http://dx.doi.org/10.1515/jisys-2014-0146.

Abstract:
Plagiarism in free text has become a common occurrence due to the wide availability of voluminous information resources. Automatic plagiarism detection systems aim to identify plagiarized content present in large repositories. This task is rendered difficult by the use of sophisticated plagiarism techniques such as paraphrasing and summarization, which mask the occurrence of plagiarism. In this work, a monolingual plagiarism detection technique has been developed to tackle cases of paraphrased plagiarism. A support vector machine-based paraphrase recognition system, which works by extracting lexical, syntactic, and semantic features from input text, has been used. Both sentence-level and passage-level approaches have been investigated. The performance of the system has been evaluated on various corpora, and the passage-level approach has registered promising results.
7

Vrbanec, Tedo, and Ana Meštrović. "Corpus-Based Paraphrase Detection Experiments and Review." Information 11, no. 5 (April 29, 2020): 241. http://dx.doi.org/10.3390/info11050241.

Abstract:
Paraphrase detection is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, and text mining in general. In this paper, we give a performance overview of various types of corpus-based models, especially deep learning (DL) models, on the task of paraphrase detection. We report the results of eight models (LSI, TF-IDF, Word2Vec, Doc2Vec, GloVe, FastText, ELMO, and USE) evaluated on three different publicly available corpora: the Microsoft Research Paraphrase Corpus, the Clough and Stevenson corpus, and the Webis Crowd Paraphrase Corpus 2011. Through a great number of experiments, we decided on the most appropriate approaches for text pre-processing, hyper-parameters, sub-model selection where it exists (e.g., Skipgram vs. CBOW), distance measures, and the semantic similarity/paraphrase detection threshold. Our findings and those of other researchers who have used deep learning models show that DL models are very competitive with traditional state-of-the-art approaches and have potential that should be further developed.
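To make the pipeline described in this abstract more concrete (vectorize both texts, apply a distance measure, compare against a detection threshold), here is a minimal sketch using TF-IDF weights and cosine similarity in plain Python. The smoothed IDF formula, the function names, and the 0.8 threshold are assumptions for illustration only; the paper tunes these choices experimentally, and its exact settings are not reproduced here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency (an assumed, common variant).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def tfidf_paraphrase(a, b, threshold=0.8):
    """Flag a pair as a paraphrase if its cosine similarity reaches the threshold.

    The threshold is a hypothetical value; in practice it is tuned per corpus.
    """
    va, vb = tfidf_vectors([a, b])
    return cosine(va, vb) >= threshold
```

In a real experiment the IDF statistics would normally be estimated on a larger corpus rather than on the two sentences being compared.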
8

Hudson, G. Thomas, and Noura Al Moubayed. "Ask me in your own words: paraphrasing for multitask question answering." PeerJ Computer Science 7 (October 27, 2021): e759. http://dx.doi.org/10.7717/peerj-cs.759.

Abstract:
Multitask learning has led to significant advances in Natural Language Processing, including the decaNLP benchmark where question answering is used to frame 10 natural language understanding tasks in a single model. In this work we show how models trained to solve decaNLP fail with simple paraphrasing of the question. We contribute a crowd-sourced corpus of paraphrased questions (PQ-decaNLP), annotated with paraphrase phenomena. This enables analysis of how transformations such as swapping the class labels and changing the sentence modality lead to a large performance degradation. Training both MQAN and the newer T5 model using PQ-decaNLP improves their robustness and for some tasks improves the performance on the original questions, demonstrating the benefits of a model which is more robust to paraphrasing. Additionally, we explore how paraphrasing knowledge is transferred between tasks, with the aim of exploiting the multitask property to improve the robustness of the models. We explore the addition of paraphrase detection and paraphrase generation tasks, and find that while both models are able to learn these new tasks, knowledge about paraphrasing does not transfer to other decaNLP tasks.
9

Kumova Metin, Senem, Bahar Karaoğlan, Tarık Kışla, and Katira Soleymanzadeh. "Certainty factor model in paraphrase detection." Pamukkale University Journal of Engineering Sciences 27, no. 2 (2021): 139–50. http://dx.doi.org/10.5505/pajes.2020.75350.

10

Kong, Leilei, Zhongyuan Han, Yong Han, and Haoliang Qi. "A Deep Paraphrase Identification Model Interacting Semantics with Syntax." Complexity 2020 (October 30, 2020): 1–14. http://dx.doi.org/10.1155/2020/9757032.

Abstract:
Paraphrase identification is central to many natural language applications. Based on the insight that a successful paraphrase identification model needs to adequately capture the semantics of the language objects as well as their interactions, we present a deep paraphrase identification model interacting semantics with syntax (DPIM-ISS) for paraphrase identification. DPIM-ISS introduces the linguistic features manifested in syntactic features to produce more explicit structures and encodes the semantic representation of sentence on different syntactic structures by means of interacting semantics with syntax. Then, DPIM-ISS learns the paraphrase pattern from this representation interacting the semantics with syntax by exploiting a convolutional neural network with convolution-pooling structure. Experiments are conducted on the corpus of Microsoft Research Paraphrase (MSRP), PAN 2010 corpus, and PAN 2012 corpus for paraphrase plagiarism detection. The experimental results demonstrate that DPIM-ISS outperforms the classical word-matching approaches, the syntax-similarity approaches, the convolution neural network-based models, and some deep paraphrase identification models.

Dissertations / Theses on the topic "Paraphrase Detection"

1

Mayes, Robin James. "A Content Originality Analysis of HRD Focused Dissertations and Published Academic Articles using TurnItIn Plagiarism Detection Software." Thesis, University of North Texas, 2017. https://digital.library.unt.edu/ark:/67531/metadc984233/.

Abstract:
This empirical exploratory study quantitatively analyzed content similarity indices (potential plagiarism) from a corpus consisting of 360 dissertations and 360 published articles. The population was defined using the filtering search criteria human resource development, training and development, organizational development, career development, or HRD. This study described in detail the process of collecting content similarity analysis (CSA) metadata using Turnitin software (www.turnitin.com). This researcher conducted robust descriptive statistics, a Wilcoxon signed-rank statistic between the similarity indices before and after false positives were excluded, and a multinomial logistic regression analysis to predict levels of plagiarism for the dissertations and the published articles. The corpus of dissertations had an adjusted rate of document similarity (potential plagiarism) of M = 9%, (SD = 6%) with 88.1% of the dissertations in the low level of plagiarism, 9.7% in the high and 2.2% in the excessive group. The corpus of published articles had an adjusted rate of document similarity (potential plagiarism) of M = 11%, (SD = 10%) with 79.2% of the published articles in the low level of plagiarism, 12.8% in the high and 8.1% in the excessive group. Most of the difference between the dissertations and published articles were attributed to plagiarism-of-self issues which were absent in the dissertations. Statistics were also conducted which returned a statistically significant justification for employing the investigative process of removing false positives, thereby adjusting the Turnitin results. This study also found two independent variables (reference and word counts) that predicted dissertation membership in the high (.15-.24) and excessive level (.25-1.00) of plagiarism and published article membership in the excessive level (.25-1.00) of plagiarism. I used multinomial logistic regression to establish the optimal prediction model. 
The multinomial logistic regression results for the dissertations returned a Nagelkerke pseudo R² of .169 and for the published articles a Nagelkerke pseudo R² of .095.
2

Vila, Rigat Marta. "Paraphrase Scope and Typology. A Data-Driven Approach from Computational Linguistics / Abast i tipologia de la paràfrasi. Una aproximació empírica des de la lingüística computacional." Doctoral thesis, Universitat de Barcelona, 2013. http://hdl.handle.net/10803/117850.

Abstract:
Paraphrasing is generally understood as approximate sameness of meaning between snippets of text with a different wording. Paraphrases are omnipresent in natural languages, demonstrating all the aspects of their multifaceted nature. The pervasiveness of paraphrasing has made it a focus of several tasks in computational linguistics; its complexity has in turn resulted in paraphrase remaining a still unresolved challenge. Two basic issues, directly linked to the complex nature of paraphrasing, make its computational treatment particularly difficult, namely the absence of a precise and commonly accepted definition and the lack of reference corpora for paraphrasing. Based on the assumption that linguistic knowledge should underlie computational-linguistics research, this thesis aims to take a step forward on these two questions: paraphrase characterization and paraphrase-corpus building and annotation. The knowledge and resources created are then applied to natural language processing and, specifically, to automatic plagiarism detection in order to empirically analyse their potential. This thesis is built as an article compendium comprising six core articles divided into three blocks: (i) paraphrase scope and typology, (ii) paraphrase-corpus creation and annotation, and (iii) paraphrasing in automatic plagiarism detection. In the first block, assuming that paraphrase boundaries are not fixed but depend on the field, task, and objectives, three borderline paraphrase cases are presented: paraphrases involving content loss, pragmatic knowledge, and certain grammatical features. The limits between paraphrasing and related phenomena such as coreference are also analysed. Paraphrase characterization takes on a new dimension if we look at it in extensional terms. We have built a general and linguistically-grounded paraphrase typology in line with this approach. 
The third issue addressed in this block is paraphrase representation, which we consider to be essential in order to formally apprehend paraphrasing. In the second block, the Wikipedia-based Relational Paraphrase Acquisition method (WRPA) is presented. It allows for the automatic extraction of paraphrases expressing a concrete relation from Wikipedia. Using this method, the WRPA corpus, covering different relations and two languages (English and Spanish), was built. A subset of the Spanish WRPA corpus, together with paraphrases in two English paraphrase corpora that are different in nature were annotated applying a new annotation scheme derived from our paraphrase typology. These annotations were validated applying the Inter-annotator Agreement for Paraphrase-Type Annotation measures (IAPTA), also developed in the framework of this thesis. In the third and final block, our typology is applied to the field of automatic plagiarism detection, demonstrating that more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, and that lexical substitutions and text-snippet additions/deletions are the most widely used paraphrase mechanisms when plagiarizing. This provides insights for future research in automatic plagiarism detection and demonstrates, through a concrete example, the value of the knowledge and data provided in this thesis to computational-linguistics research.
3

Nawab, Rao Muhammad Adeel. "Mono-lingual paraphrased text reuse and plagiarism detection." Thesis, University of Sheffield, 2012. http://etheses.whiterose.ac.uk/2785/.

4

Kumar, Ashutosh. "Inducing Constraints in Paraphrase Generation and Consistency in Paraphrase Detection." Thesis, 2022. https://etd.iisc.ac.in/handle/2005/6055.

Abstract:
Deep learning models typically require a large volume of data. Manual curation of datasets is time-consuming and limited by imagination. As a result, natural language generation (NLG) has been employed to automate the process. However, in their vanilla formulation, NLG models are prone to producing degenerate, uninteresting, and often hallucinated outputs. Constrained generation aims to overcome these shortcomings by providing additional information to the generation process. Training data thus generated can help improve the robustness of deep learning models. Therefore, the central research question of the thesis is: “How can we constrain generation models, especially in NLP, to produce meaningful outputs and utilize them for building better classification models?” To demonstrate how generation models can be constrained, we present two approaches for paraphrase generation. Paraphrase generation involves the generation of text that conveys the same meaning as a reference text. We propose two strategies for paraphrase generation: (1) DiPS (Diversity in Paraphrases using Submodularity): The first approach deals with constraining paraphrase generation to ensure diversity, i.e., ensuring that generated text(s) are sufficiently different from each other. We propose a decoding algorithm for obtaining diverse texts. We provide a novel formulation of the problem in terms of monotone submodular function maximization, specifically targeted toward the task of paraphrase generation. We demonstrate the effectiveness of our method for data augmentation on multiple tasks such as intent classification and paraphrase recognition. (2) SGCP (Syntax Guided Controlled Paraphraser): The second approach deals with constraining paraphrase generation to ensure syntacticality, i.e., ensuring that the generated text is syntactically coherent with an exemplar sentence. 
We propose Syntax Guided Controlled Paraphraser (SGCP), an end-to-end framework for syntactic paraphrase generation without compromising relevance (fidelity). Through a battery of automated metrics and comprehensive human evaluation, we verify that this approach does better than prior works that utilize only limited syntactic information in the parse tree. The second part (meaningful outputs) of the research question pertains to ensuring that the generated output is meaningful. Towards this, we present an approach for paraphrase detection to ascertain that the generated output is semantically coherent with the reference text. Paraphrase detection is the task of detecting whether or not two input natural language statements are paraphrases of each other. Fine-tuning pre-trained models such as BERT and RoBERTa on paraphrastic datasets has become the go-to approach for such tasks. However, tasks like paraphrase detection are symmetric: they require the output to be invariant to the order of the inputs. In the traditional fine-tuned approach for paraphrase classification, inconsistency is often observed in the predicted labels or confidence scores based on the order of the inputs. We validate this shortcoming and apply a consistency loss function to alleviate inconsistency in symmetric classification. Our results show an improved consistency in predictions for three paraphrase detection datasets without a significant drop in the accuracy scores. While these works address the research question via paraphrase generation and detection, the approaches presented here apply broadly to NLP-based deep learning models that require imposing constraints and ensuring consistency. 
The work on paraphrase generation can be extended to impose new kinds of constraints (for example, sentiment coherence) on generation, while paraphrase detection can be applied to ensure consistency in other symmetric classification tasks (for example, sarcasm interpretation) that use deep learning models.
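The symmetry requirement described in this abstract can be sketched in a few lines. The abstract does not state which consistency loss the thesis uses, so the symmetrized KL divergence below is an assumption, as are all function names; the two inputs stand for the class probabilities a classifier outputs for the pair in order (a, b) and in order (b, a).

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(p_ab, p_ba):
    """Symmetrized KL between predictions for (a, b) and (b, a).

    This is an assumed form of a consistency penalty: it is zero when both
    input orders yield the same distribution and grows as they diverge.
    """
    return 0.5 * (kl_divergence(p_ab, p_ba) + kl_divergence(p_ba, p_ab))


def symmetric_predict(p_ab, p_ba):
    """Order-invariant prediction obtained by averaging both input orders."""
    return [(x + y) / 2 for x, y in zip(p_ab, p_ba)]
```

During training, a term like this would typically be added to the usual classification loss with a weighting factor, so that the model is penalized whenever swapping the two sentences changes its prediction.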
5

Maraev, Vladislav. "Modelling semantic relations with distributitional semantics and deep learning : question answering, entailment recognition and paraphrase detection." Master's thesis, 2017. http://hdl.handle.net/10451/30183.

Abstract:
This dissertation presents an approach to modelling semantic relations between two texts, based on distributional semantic models and deep learning. The work draws on several disciplines of cognitive science, mainly computation, linguistics and artificial intelligence, with strong influences from neuroscience and cognitive psychology. Distributional semantic models (also known as word embeddings) are used to represent the meaning of words. Word semantic representations can then be combined to obtain the meaning of a larger chunk of text using a deep learning approach, namely convolutional neural networks. These approaches are used to replicate the experiment carried out by Bogdanova et al. (2015) on detecting questions in online user forums that can be answered by exactly the same answer. The performance results obtained in my experiments are comparable to or better than those reported in that work. I also present a study of the impact of appropriate text preprocessing on the results obtainable with the approaches adopted in that work. Removing certain clues that can unduly help the system detect equivalent questions leads to a significant decrease in the performance of the system from that referenced work. I also present a study of the impact of pre-trained word embeddings on the task of detecting semantically equivalent questions. Replacing pre-trained word embeddings with randomly initialised ones improves the performance of the system. Additionally, the model was applied to the task of entailment recognition for Portuguese and achieved an accuracy on a par with the baseline. This dissertation also reports the results of an experimental study applying the adopted approach to the shared task of sentence paraphrase detection in Russian. The final setup contained two improvements: it uses several convolutional filters, and it uses character embeddings instead of word embeddings. It was tested in the Task 2 standard run of the relevant shared task and showed competitive results.
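The data flow this abstract describes (character embeddings, several convolutional filters, a pooled text representation) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the embeddings and filter weights are random and untrained, the sizes `EMB_DIM`, `WINDOW` and `N_FILTERS` are arbitrary, and scoring a pair by cosine similarity is an assumption made here for illustration.

```python
import math
import random

EMB_DIM = 8      # size of each character embedding (assumed)
WINDOW = 3       # convolution window, in characters (assumed)
N_FILTERS = 4    # number of convolutional filters (assumed)

rng = random.Random(0)
_char_emb = {}   # randomly initialised character embeddings, created on demand
FILTERS = [[rng.uniform(-1, 1) for _ in range(WINDOW * EMB_DIM)]
           for _ in range(N_FILTERS)]


def embed(ch):
    """Return the (random, untrained) embedding of one character."""
    if ch not in _char_emb:
        _char_emb[ch] = [rng.uniform(-1, 1) for _ in range(EMB_DIM)]
    return _char_emb[ch]


def encode(text):
    """Convolve over character windows, then max-pool each filter over positions."""
    vectors = [embed(ch) for ch in text.lower()]
    # Pad short inputs with zero vectors so at least one window exists.
    while len(vectors) < WINDOW:
        vectors.append([0.0] * EMB_DIM)
    pooled = [-math.inf] * N_FILTERS
    for start in range(len(vectors) - WINDOW + 1):
        window = [x for vec in vectors[start:start + WINDOW] for x in vec]
        for k, filt in enumerate(FILTERS):
            activation = math.tanh(sum(w * x for w, x in zip(filt, window)))
            pooled[k] = max(pooled[k], activation)
    return pooled


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity(text1, text2):
    """Score a text pair by the cosine of their pooled representations."""
    return cosine(encode(text1), encode(text2))
```

With untrained weights the scores carry no semantic meaning. In a trained system the embeddings and filters would be learned from labelled pairs, and a threshold on the score would separate paraphrases from non-paraphrases.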
6

Han, Nai-Hsuan (韓乃軒). "A Study of Selecting Machine Learning Features for Detecting Entailment, Paraphrase and Contradiction in Texts." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/44594957024872746237.

Abstract:
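Master's thesis
National Yunlin University of Science and Technology (國立雲林科技大學)
Department of Computer Science and Information Engineering, Master's Program
Academic year 101 (ROC calendar)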
The NTCIR-9 RITE task evaluates systems that automatically detect entailment, paraphrase, and contradiction in texts. We developed a preliminary rule-based system for the NTCIR-9 RITE task. For NTCIR-10, we tried machine learning approaches: we transformed the existing rules into features and then added further syntactic and semantic features for an SVM. The same straightforward assumption was kept in NTCIR-10: the relation between two sentences is determined by the parts in which they differ rather than by the parts they share. Therefore, the NTCIR-9 features (sentence lengths, the content of matched keywords, the number of matched keywords, and their parts of speech) were considered together with new features such as parse tree information, dependency relations, negation words and synonyms. We found that some features were useful for the BC subtask, while others helped more in the MC subtask.
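The kind of pair features this abstract lists (sentence lengths, matched keywords, differing parts, negation words) can be sketched as follows. This is an illustration only, not the thesis's feature set: the function name `pair_features`, the whitespace tokenisation, and the small English `NEGATION_WORDS` list are all assumptions, and the parse tree, dependency and synonym features are omitted.

```python
NEGATION_WORDS = {"not", "no", "never"}  # illustrative list, not from the thesis


def pair_features(t1, t2):
    """Return simple surface features for a sentence pair as a dict."""
    tokens1, tokens2 = t1.lower().split(), t2.lower().split()
    set1, set2 = set(tokens1), set(tokens2)
    has_neg1 = bool(set1 & NEGATION_WORDS)
    has_neg2 = bool(set2 & NEGATION_WORDS)
    return {
        "len_t1": len(tokens1),                 # sentence length of t1
        "len_t2": len(tokens2),                 # sentence length of t2
        "matched": len(set1 & set2),            # distinct words in both sentences
        "differing": len(set1 ^ set2),          # distinct words in only one sentence
        "negation_mismatch": int(has_neg1 != has_neg2),  # negated on one side only
    }
```

For example, `pair_features("the cat is not here", "the cat is here")` yields four matched words, one differing word and a negation mismatch of 1. A vector of such values per pair is what a classifier such as an SVM would be trained on.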

Book chapters on the topic "Paraphrase Detection"

1

Tian, Liuyang, Hui Ning, Leilei Kong, Kaisheng Chen, Haoliang Qi, and Zhongyuan Han. "Sentence Paraphrase Detection Using Classification Models." In Text Processing, 166–81. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-73606-8_13.

2

Senthil Kumar, B., D. Thenmozhi, and S. Kayalvizhi. "Tamil Paraphrase Detection Using Encoder-Decoder Neural Networks." In IFIP Advances in Information and Communication Technology, 30–42. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-63467-4_3.

3

Fujita, Atsushi, Kentaro Inui, and Yuji Matsumoto. "Detection of Incorrect Case Assignments in Paraphrase Generation." In Natural Language Processing – IJCNLP 2004, 555–65. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005. http://dx.doi.org/10.1007/978-3-540-30211-7_59.

4

Deléger, Louise, Bruno Cartoni, and Pierre Zweigenbaum. "Paraphrase Detection in Monolingual Specialized/Lay Comparable Corpora." In Building and Using Comparable Corpora, 223–41. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-20128-8_12.

5

Thenmozhi, Durairaj, C. Jerin Mahibha, S. Kayalvizhi, M. Rakesh, Y. Vivek, and V. Poojesshwaran. "Paraphrase Detection in Indian Languages Using Deep Learning." In Communications in Computer and Information Science, 138–54. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-33231-9_9.

6

Mahmoud, Adnen, Ahmed Zrigui, and Mounir Zrigui. "A Text Semantic Similarity Approach for Arabic Paraphrase Detection." In Computational Linguistics and Intelligent Text Processing, 338–49. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-77116-8_25.

7

Kravchenko, Dmitry. "Paraphrase Detection Using Machine Translation and Textual Similarity Algorithms." In Communications in Computer and Information Science, 277–92. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-71746-3_22.

8

Víta, Martin. "Cross-lingual Metaphor Paraphrase Detection – Experimental Corpus and Baselines." In Communications in Computer and Information Science, 345–56. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-59506-7_28.

9

Anchiêta, Rafael Torres, and Thiago Alexandre Salgueiro Pardo. "Exploring the Potentiality of Semantic Features for Paraphrase Detection." In Lecture Notes in Computer Science, 228–38. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-41505-1_22.

10

Singh, Arwinder, and Gurpreet Singh Josan. "A Deep Network Model for Paraphrase Detection in Punjabi." In Lecture Notes in Electrical Engineering, 173–85. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-15-8297-4_15.


Conference papers on the topic "Paraphrase Detection"

1

Chi, Xiaoqiang, Yang Xiang, and Ruchao Shen. "Paraphrase Detection with Dependency Embedding." In CSAI 2020: 2020 4th International Conference on Computer Science and Artificial Intelligence. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3445815.3445850.

2

Cordeiro, Joao, Gael Dias, and Pavel Brazdil. "A Metric for Paraphrase Detection." In 2007 International Multi-Conference on Computing in the Global Information Technology (ICCGI'07). IEEE, 2007. http://dx.doi.org/10.1109/iccgi.2007.4.

3

Uribe, Diego. "Monotonicity Analysis for Paraphrase Detection." In 2009 Electronics, Robotics and Automotive Mechanics Conference. IEEE, 2009. http://dx.doi.org/10.1109/cerma.2009.29.

4

An, Bo. "Chinese Paraphrase Dataset and Detection." In 2021 International Conference on Asian Language Processing (IALP). IEEE, 2021. http://dx.doi.org/10.1109/ialp54817.2021.9675232.

5

Bhargava, Rupal, Gargi Sharma, and Yashvardhan Sharma. "Deep Paraphrase Detection in Indian Languages." In ASONAM '17: Advances in Social Networks Analysis and Mining 2017. New York, NY, USA: ACM, 2017. http://dx.doi.org/10.1145/3110025.3122119.

6

Issa, Fuad, Marco Damonte, Shay B. Cohen, Xiaohui Yan, and Yi Chang. "Abstract Meaning Representation for Paraphrase Detection." In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics, 2018. http://dx.doi.org/10.18653/v1/n18-1041.

7

Duong, Phuc H., Hien T. Nguyen, Hieu N. Duong, Khoa Ngo, and Dat Ngo. "A Hybrid Approach to Paraphrase Detection." In 2018 5th NAFOSTED Conference on Information and Computer Science (NICS). IEEE, 2018. http://dx.doi.org/10.1109/nics.2018.8606845.

8

Rohith, Mathi, Mothukuri Jaswanth Venkat, Pasumarthy Venkata Akhil, Mandiga Sahasra Sai Tarun, and Deepa Gupta. "Telugu Paraphrase Detection Using Siamese Network." In 2022 13th International Conference on Computing Communication and Networking Technologies (ICCCNT). IEEE, 2022. http://dx.doi.org/10.1109/icccnt54827.2022.9984593.

9

Wu, Wei, Yun-Cheng Ju, Xiao Li, and Ye-Yi Wang. "Paraphrase detection on SMS messages in automobiles." In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2010. http://dx.doi.org/10.1109/icassp.2010.5494959.

10

Malajyan, Arthur, Karen Avetisyan, and Tsolak Ghukasyan. "ARPA: Armenian Paraphrase Detection Corpus and Models." In 2020 Ivannikov Memorial Workshop (IVMEM). IEEE, 2020. http://dx.doi.org/10.1109/ivmem51402.2020.00012.
