Theses on the topic "Text analysis"
Create an accurate citation in APA, MLA, Chicago, Harvard and other styles
Consult the top 50 theses for your research on the topic "Text analysis".
Next to every source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.
You can also download the full text of the academic publication in PDF format and read its abstract online, whenever these are available in the metadata.
Explore theses on a wide variety of disciplines and organise your bibliography correctly.
Haggren, Hugo. "Text Similarity Analysis for Test Suite Minimization". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-290239.
Full text
Software testing is the most costly phase of software development, which is why test optimisation is a critical area in the software industry. Within software testing, the gradual increase in the number of test cases places great demands on testing resources (budget and time). Test suite minimization is considered a promising approach to the problem of growing test suites, and several minimization methods have been proposed to manage test suite size effectively. Proposing a good solution for minimizing the number of test cases is a challenging task: several parameters, such as code coverage, requirements coverage and test cost, must be weighed before a test case is removed from the test cycle. This thesis proposes and evaluates two different NLP-based methods for similarity analysis between test cases for manual integration, which can be used for test suite minimization. One method is based on syntactic text-similarity analysis, while the other is a machine-learning-based semantic approach. The feasibility of the proposed solutions is studied through the analysis of industrial use cases at Ericsson AB in Sweden. The results show that the semantic method barely manages to outperform the syntactic one. While both approaches show promising results, follow-up studies are needed to further evaluate the semantic similarity-based method.
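The contrast drawn above between a syntactic and a semantic similarity method can be illustrated with a minimal sketch of the syntactic side: token-level cosine similarity between test-case descriptions, plus a greedy filter that drops near-duplicates. The feature choice and the threshold value here are illustrative assumptions, not the thesis's actual setup.

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Token-level cosine similarity between two test-case descriptions."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def minimize_suite(tests: dict[str, str], threshold: float = 0.8) -> list[str]:
    """Greedily keep a test only if it is not too similar to an already-kept one."""
    kept: list[str] = []
    for name, desc in tests.items():
        if all(cosine_similarity(desc, tests[k]) < threshold for k in kept):
            kept.append(name)
    return kept
```

Raising `threshold` keeps more tests; a semantic variant in the spirit of the thesis would swap `cosine_similarity` for a distance between sentence embeddings.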
Romsdorfer, Harald. "Polyglot text to speech synthesis text analysis & prosody control". Aachen Shaker, 2009. http://d-nb.info/993448836/04.
Full text
Kay, Roderick Neil. "Text analysis, summarising and retrieval". Thesis, University of Salford, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.360435.
Full text
Haselton, Curt B., and Gregory G. Deierlein. "Assessing seismic collapse safety of modern reinforced concrete moment-frame buildings". Berkeley, Calif. : Pacific Earthquake Engineering Research Center, 2008. http://nisee.berkeley.edu/elibrary/Text/200803261.
Full text
Ozsoy, Makbule Gulcin. "Text Summarization Using Latent Semantic Analysis". Master's thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12612988/index.pdf.
Full text
O'Connor, Brendan T. "Statistical Text Analysis for Social Science". Research Showcase @ CMU, 2014. http://repository.cmu.edu/dissertations/541.
Full text
Lin, Yuhao. "Text Analysis in Fashion : Keyphrase Extraction". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-290158.
Full text
The ability to extract useful information from texts and present it in the form of structured attributes is an important step towards smarter and better product-comparison algorithms. Some earlier work exploits statistical features such as word frequency and graph models to predict keyphrases. In recent years, deep neural networks have proven to be the state of the art for language modelling; successful examples include Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), Bidirectional Encoder Representations from Transformers (BERT) and their variants. In addition, word-embedding techniques such as word2vec [1] can also help to improve performance. Besides these techniques, a high-quality dataset is also important for the effectiveness of the models. In this project we aim to develop reliable and efficient machine learning models for keyphrase extraction. At Norna AB we have a collection of product descriptions from different suppliers without keyphrase annotations, which motivates the use of unsupervised methods; these should be able to extract useful keyphrases that capture the features of a product. To further explore the power of deep neural networks, we also implement several deep learning models. The dataset has two parts: the first, a fashion dataset, in which keyphrases are extracted with our unsupervised method; the second, a public dataset from the news domain. We find that the deep learning models can also extract meaningful keyphrases and outperform the unsupervised model. Precision, recall and F1 score are used as evaluation metrics. The results show that the model combining LSTM and CRF achieves the best performance. We also compare the performance of the different models with respect to keyphrase length and keyphrase count; the results indicate that all models perform better at predicting short keyphrases. We also show that our refined model has an advantage in predicting long keyphrases, which is challenging in this field.
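The precision, recall and F1 metrics used in the abstract above apply to keyphrase extraction as set overlap between predicted and gold phrases. A minimal sketch, assuming exact-match scoring (partial-match variants also exist):

```python
def keyphrase_prf(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1 for extracted keyphrases."""
    tp = len(predicted & gold)                       # true positives: phrases in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```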
Maisto, Alessandro. "A Hybrid Framework for Text Analysis". Doctoral thesis, Universita degli studi di Salerno, 2017. http://hdl.handle.net/10556/2481.
Full text
In Computational Linguistics there is an essential dichotomy between linguists and computer scientists. The first, with a strong knowledge of language structures, lack engineering skills; the second, expert in computing and mathematics, assign little value to the basic mechanisms and structures of language. This discrepancy has grown in recent decades with the growth of computational resources and the gradual computerisation of the world; the use of Machine Learning technologies in Artificial Intelligence problem solving, which allows machines, for example, to learn from manually generated examples, has been used more and more often in Computational Linguistics to overcome the obstacle represented by language structures and their formal representation. The dichotomy has resulted in two main approaches to Computational Linguistics: rule-based methods, which try to imitate the way humans use and understand language, reproducing the syntactic structures on which the understanding process is based and building lexical resources such as electronic dictionaries, taxonomies or ontologies; and statistics-based methods, which instead treat language as a group of elements, quantifying words mathematically and trying to extract information without identifying syntactic structures or, in some algorithms, trying to give the machine the ability to learn these structures. One of the main problems is the lack of communication between these two approaches, due to the substantial differences characterising them: on the one hand, a strong focus on how language works and on its characteristics, with a tendency towards analytical and manual work; on the other, an engineering perspective that finds in language an obstacle and sees algorithms as the fastest way to overcome it.
However, the lack of communication is not an outright incompatibility: following Harris, the best way to approach natural language may result from taking the best of both. At the moment there is a large number of open-source tools that perform text analysis and Natural Language Processing. A great part of these tools are based on statistical models and consist of separate modules which can be combined to create a pipeline for processing text. Many of these resources are code packages without a GUI (Graphical User Interface) and are impossible to use for users without programming skills. Furthermore, the vast majority of these open-source tools support only English and, when Italian is included, their performance decreases significantly; open-source tools for Italian are very few. In this work we aim to fill this gap by presenting a new hybrid framework for the analysis of Italian texts. It is not intended as a commercial tool; the purpose for which it was built is to help linguists and other scholars to perform rapid text analysis and to produce linguistic data. The framework, which performs both statistical and rule-based analysis, is called LG-Starship. The idea is to build a modular software package that initially includes the basic algorithms for different kinds of analysis. The modules perform the following tasks. Preprocessing Module: loads a text, normalises it and removes stop words; as output, it presents the list of tokens and letters which compose the text, with their occurrence counts, and the processed text. Mr. Ling Module: performs POS tagging and lemmatisation; it also returns the table of lemmas with occurrence counts and a table quantifying the grammatical tags.
Statistic Module: calculates Term Frequency and TF-IDF of tokens or lemmas, extracts bigram and trigram units, and exports the results as tables. Semantic Module: uses the Hyperspace Analogue to Language algorithm to calculate semantic similarity between words; it returns word-by-word similarity matrices which can be exported and analysed. Syntactic Module: analyses the syntactic structure of a selected sentence and tags the verbs and their arguments with semantic labels. The objective of the framework is to build an all-in-one platform for NLP which allows any kind of user to perform basic and advanced text analysis. To make the framework accessible to users without specific computer science and programming skills, the modules have been provided with an intuitive GUI. The framework can be considered hybrid in a double sense: as explained above, it uses both statistical and rule-based methods, relying on standard statistical algorithms and techniques and, at the same time, on Lexicon-Grammar syntactic theory; in addition, it has been written in both the Java and Python programming languages. The LG-Starship framework has a simple graphical user interface, but it will also be released as separate modules which can be included independently in any NLP pipeline. There are many resources of this kind, but the large majority work for English; free resources for Italian are very few, and this work tries to cover that need by proposing a tool which can be used both by linguists and other scientists interested in language and text analysis who know nothing about programming languages, and by computer scientists, who can use the free modules in their own code or in combination with different NLP algorithms. The framework starts from a text or corpus written directly by the user or loaded from an external resource.
The LG-Starship workflow is described in the flowchart shown in Fig. 1. The pipeline shows that the Preprocessing Module is applied to the original imported or generated text to produce a clean, normalised, preprocessed text. This module includes a text-splitting function, a stop-word list and a tokenisation method. To the preprocessed text, either the Statistic Module or the Mr. Ling Module can be applied. The first, which includes basic statistical algorithms such as Term Frequency, TF-IDF and n-gram extraction, produces as output databases of lexical and numerical data which can be used to produce charts or to perform further external analysis. The second is divided into two main tasks: a POS tagger, based on the Averaged Perceptron Tagger and trained on the Paisà Corpus [Lyding et al., 2014], performs Part-of-Speech tagging and produces an annotated text; a lemmatisation method, which relies on a set of electronic dictionaries developed at the University of Salerno [Elia, 1995; Elia et al., 2010], takes the POS-tagged text as input and produces a new lemmatised version of the original text with information about syntactic and semantic properties. This lemmatised text, which can also be processed with the Statistic Module, serves as input for two deeper levels of text analysis carried out by the Syntactic Module and the Semantic Module. The first rests on Lexicon-Grammar theory [Gross, 1971, 1975] and uses a database of predicate structures under development at the Department of Political, Social and Communication Science; its objective is to produce a dependency graph of the sentences that compose the text. The Semantic Module uses the Hyperspace Analogue to Language distributional-semantics algorithm [Lund and Burgess, 1996], trained on the Paisà Corpus, to produce a semantic network of the words of the text. This workflow has been applied in two different experiments involving two user-generated corpora.
The first experiment is a statistical study of the language of rap music in Italy through the analysis of a large corpus of rap lyrics downloaded from online databases of user-generated lyrics. The second is a feature-based Sentiment Analysis project performed on user product reviews. For this project we integrated a large domain database of linguistic resources for Sentiment Analysis, developed in past years by the Department of Political, Social and Communication Science of the University of Salerno, which consists of polarised dictionaries of verbs, adjectives, adverbs and nouns. These two experiments show how the linguistic framework can be applied to different levels of analysis and can produce both qualitative and quantitative data. As for the results obtained, the framework, which is only at a beta version, achieves fair results both in terms of processing time and in terms of precision. Nevertheless, the work is far from complete: more algorithms will be added to the Statistic Module, the Syntactic Module will be finished, the GUI will be improved and modernised and, in addition, an open-source online version of the modules will be published. [edited by author]
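As a rough illustration of what a statistic module like the one described above computes, here is a compact sketch of term frequency, TF-IDF and n-gram extraction over tokenised documents. The log-based IDF weighting is a common choice, not necessarily LG-Starship's exact formula.

```python
import math
from collections import Counter

def term_frequencies(doc: list[str]) -> Counter:
    """Raw occurrence count of each token in a document."""
    return Counter(doc)

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF per document: tf(t, d) * log(N / df(t))."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))   # document frequency of each term
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()} for doc in docs]

def ngrams(doc: list[str], size: int = 2) -> list[tuple[str, ...]]:
    """Contiguous n-gram units (bigrams by default)."""
    return [tuple(doc[i:i + size]) for i in range(len(doc) - size + 1)]
```

A term that appears in every document (like "il" below) gets a zero TF-IDF weight, which is exactly why the measure highlights document-specific vocabulary.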
XV n.s.
Algarni, Abdulmohsen. "Relevance feature discovery for text analysis". Thesis, Queensland University of Technology, 2011. https://eprints.qut.edu.au/48230/1/Abdulmohsen_Algarni_Thesis.pdf.
Full text
Romsdorfer, Harald [Verfasser]. "Polyglot Text-to-Speech Synthesis : Text Analysis & Prosody Control / Harald Romsdorfer". Aachen : Shaker, 2009. http://d-nb.info/1156517354/34.
Full text
Nikolaou, Angelos. "Texture analysis for Robust Reading Systems". Doctoral thesis, Universitat Autònoma de Barcelona, 2020. http://hdl.handle.net/10803/671279.
Full text
This thesis focuses on the use of texture analysis for Robust Reading Systems, exploring texture analysis for text images. An in-depth analysis of the established Local Binary Pattern (LBP) descriptor is presented. LBP descriptors are used in word spotting and achieve top performance among learning-free methods. A custom variant called Sparse Radial Sampling LBP is developed to exploit the unique properties of text and is used to achieve state-of-the-art performance in writer identification. The same feature descriptors are used in conjunction with deep neural networks in order to successfully address the problem of script and language identification in multiple modalities.
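For readers unfamiliar with the descriptor, a minimal sketch of the classic 3×3 Local Binary Pattern computation follows; the Sparse Radial Sampling variant developed in the thesis adds configurable radii and sampling schemes that are not reproduced here.

```python
def lbp_code(img: list[list[int]], y: int, x: int) -> int:
    """8-bit LBP code: threshold the 8 neighbours against the centre pixel."""
    centre = img[y][x]
    # Clockwise neighbour offsets starting at the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= centre:
            code |= 1 << bit
    return code

def lbp_histogram(img: list[list[int]]) -> list[int]:
    """256-bin histogram of LBP codes over all interior pixels (the texture descriptor)."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x)] += 1
    return hist
```

Two images are then compared through the distance between their histograms, which is what makes the descriptor usable for word spotting and writer identification.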
Oostendorp, Marcelyn Camereldia Antonette. "Investigating changing notions of "text": comparing news text in printed and electronic media". Thesis, University of the Western Cape, 2005. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_9984_1183428106.
Full text
This research aimed to give an account of the development of the concepts of text and discourse, and of the various approaches to the analysis of texts and discourses, as reflected in core linguistic literature since the late 1960s. The idea was to focus specifically on literature that notes the developments stimulated by the proliferation of electronic media. Secondly, this research aimed to describe the nature of electronic news texts found on the internet in comparison with an equivalent printed version, namely texts printed in newspapers and simultaneously on the newspaper's website.
Nyns, Roland. "Text grammar and text processing: a cognitivist approach". Doctoral thesis, Universite Libre de Bruxelles, 1989. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/213285.
Full text
Garrad, Mark. "Computer Aided Text Analysis in Personnel Selection". Griffith University. School of Applied Psychology, 2004. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20040408.093133.
Full text
Keenan, Francis Gerard. "Large vocabulary syntactic analysis for text recognition". Thesis, Nottingham Trent University, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.334311.
Full text
Rose, Tony Gerard. "Large vocabulary semantic analysis for text recognition". Thesis, Nottingham Trent University, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.333961.
Full text
Benbrahim, Mohamed. "Automatic text summarisation through lexical cohesion analysis". Thesis, University of Surrey, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.309200.
Full text
Ibáñez Jiménez, Jorge, Daniela Jiménez Cid, and Naiomi Vera Merino. "Error Analysis in Chilean Tourist Text Translations". Thesis, Universidad de Chile, 2014. http://www.repositorio.uchile.cl/handle/2250/129945.
Full text
Palerius, Viktor. "Affect analysis for text dialogue in movies". Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-353232.
Full text
Wu, Yingyu. "Using Text based Visualization in Data Analysis". Kent State University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=kent1398079502.
Full text
Garrad, Mark. "Computer Aided Text Analysis in Personnel Selection". Thesis, Griffith University, 2004. http://hdl.handle.net/10072/367424.
Full text
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Applied Psychology (Health)
Full Text
Coccetta, Francesca. "Multimodal Text Analysis and English Language Teaching". Doctoral thesis, Università degli studi di Padova, 2009. http://hdl.handle.net/11577/3426506.
Full text
It is common practice to investigate spoken corpora using approaches borrowed from the study of written corpora, partly because of the lack of adequate software for querying them. This practice has rather limited the potential that such corpora offer for the study of spoken language. This thesis takes up the theoretical models and software tools developed in multimodal corpus linguistics (Baldry and Thibault, 2001; 2006a; 2006b; forthcoming) and offers an alternative method for the study of spoken corpora in terms of language functions and notions (van Ek and Trim, 1998a; 1998b; 2001). In particular, the thesis applies the scalar model developed in multimodal corpus linguistics to a corpus of 52 texts carefully selected from the Padova Multimedia English Corpus (Ackerley and Coccetta, 2007a; 2007b), and shows how this approach facilitates the study of language functions and notions vis-à-vis what Baldry (2008a) calls the multimodal co-text. To illustrate this, the MCA (Multimodal Corpus Authoring System) software (Baldry, 2005; Baldry and Beltrami, 2005) was used, with which the corpus could be annotated and queried in terms of language functions and notions, as well as gesture, gaze and action, to highlight the interaction between language and other semiotic systems. The results of the research have been applied to English language learning in the context of the online course Le@rning Links (Ackerley, 2004; Ackerley and Cloke, 2005; Ackerley, Cloke and Mazurelle, 2006; Ackerley and Cloke, 2006; Ackerley and Coccetta, forthcoming).
Tirkkonen-Condit, Sonja. "Argumentative text structure and translation". Jyväskylä : University of Jyväskylä, 1985. http://catalog.hathitrust.org/api/volumes/oclc/13332106.html.
Full text
Bafuka, Freddy Nole. "Beyond text analysis : image-based evaluation of health-related text readability using style features". Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/53121.
Full text
Includes bibliographical references (p. 70-71).
Many studies have shown that the readability of health documents presented to consumers does not match their reading levels. An accurate assessment of the readability of health-related texts is an important step in providing material that matches readers' literacy. Current readability measurements depend heavily on text analysis (NLP) but neglect style (text layout). In this study, we show that style properties are important predictors of a document's readability. In particular, we build an automated computer program that uses a document's style to predict its readability score. The style features are extracted by analysing only one page of the document as an image. The scores produced by our system were tested against scores given by human experts. Our tool shows stronger correlation with experts' scores than the Flesch-Kincaid readability grading method. We provide an end-user program, VisualGrader, which offers a graphical user interface to the scoring model.
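The Flesch-Kincaid baseline mentioned above is computed purely from text statistics. A sketch of the grade formula with a crude vowel-run syllable counter (real implementations use pronunciation dictionaries or better heuristics):

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: runs of vowels, minimum one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59
```

Short sentences of monosyllables can push the score below zero, which is conventionally read as "below first grade".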
by Freddy Nole Bafuka.
M.Eng.
Valeš, Miroslav. "Seeking the Pattern: Using Quantitative Text Analysis to Assess Text Influence on Grant Program Results". Master's thesis, Vysoká škola ekonomická v Praze, 2014. http://www.nusl.cz/ntk/nusl-193924.
Full text
Sudhahar, Saatviga. "Automated analysis of narrative text using network analysis in large corpora". Thesis, University of Bristol, 2015. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.685924.
Full text
Abu Sheikha, Fadi. "Analysis and Generation of Formal and Informal Text". Thesis, University of Ottawa (Canada), 2010. http://hdl.handle.net/10393/28845.
Full text
Johansson, Christian. "Computer Forensic Text Analysis with Open Source Software". Thesis, Blekinge Tekniska Högskola, Institutionen för programvaruteknik och datavetenskap, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-4994.
Full text
Pulliam, John Mark. "An analysis of the Septuagint text of Habakkuk". Theological Research Exchange Network (TREN), 2006. http://www.tren.com/search.cfm?p001-1086.
Full text
Kowalczyk, Thomas L. "Performance analysis of text-oriented printing using PostScript /". Online version of thesis, 1988. http://hdl.handle.net/1850/10451.
Full text
Ramachandran, Venkateshwaran. "A temporal analysis of natural language narrative text". Thesis, This resource online, 1990. http://scholar.lib.vt.edu/theses/available/etd-03122009-040648/.
Full text
Shepherd, David. "TEFL methods articles : text analysis and reader interaction". Thesis, Durham University, 1992. http://etheses.dur.ac.uk/5710/.
Full text
Wharton, Chris. "Text and context : an analysis of advertising reception". Thesis, Northumbria University, 2005. http://nrl.northumbria.ac.uk/2831/.
Full text
Cohen, F. "TASS - Text Analysis System for Understanding News Stories". Thesis, University of Reading, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.383567.
Full text
Boulton, David. "Fine art image classification based on text analysis". Thesis, University of Surrey, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.252478.
Full text
Le, Thien-Hoa. "Neural Methods for Sentiment Analysis and Text Summarization". Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0037.
Full text
This thesis focuses on two Natural Language Processing tasks that require extracting semantic information from raw texts: Sentiment Analysis and Text Summarization. The dissertation discusses issues in, and seeks to improve, neural models on both tasks, which have become the dominant paradigm in the past several years. Accordingly, it is composed of two parts: the first part (Neural Sentiment Analysis) deals with the computational study of people's opinions and sentiments, while the second part (Neural Text Summarization) tries to extract salient information from a complex sentence and rewrite it in a human-readable form. Neural Sentiment Analysis. As in computer vision, numerous deep convolutional neural networks have been adapted to sentiment analysis and text classification tasks. However, unlike the image domain, these studies are carried out on different input data types and on different datasets, which makes it hard to know whether a deep network is truly needed. In this thesis, we seek elements to address this question, i.e. whether neural networks must compute deep hierarchies of features for textual data in the same way they do in vision. We thus propose a new adaptation of the deepest convolutional architecture (DenseNet) for text classification and study the importance of depth in convolutional models with different atom levels (word or character) of input. We show that deep models indeed give better performance than shallow networks when the text input is represented as a sequence of characters; however, a simple shallow-and-wide network outperforms the deep DenseNet models with word inputs. Besides, to further improve sentiment classifiers and contextualise them, we propose to model them jointly with dialogue acts, which are a factor of explanation and correlate with sentiments but are nevertheless often ignored.
We have manually annotated both dialogues and sentiments on a Twitter-like social medium and trained a multi-task hierarchical recurrent network on joint sentiment and dialogue act recognition. We show that transfer learning may be efficiently achieved between both tasks, and we further analyse some specific correlations between sentiments and dialogue acts on social media. Neural Text Summarization. Detecting sentiments and opinions in large digital documents does not always enable users of such systems to take informed decisions, as other important semantic information is missing: people also need the main arguments and supporting reasons from the source documents to truly understand and interpret them. To capture such information, we aim at making neural text summarization models more explainable. We propose a model that has better explainability properties and is flexible enough to support various shallow syntactic parsing modules. More specifically, we linearise the syntactic tree into overlapping text segments, which are then selected with reinforcement learning (RL) and regenerated into a compressed form; the proposed model can thus handle both extractive and abstractive summarization. Further, we observe that RL-based models are becoming increasingly ubiquitous for many text summarization tasks. We are interested in better understanding what types of information such models take into account, and we study this question from the syntactic perspective. We thus provide a detailed comparison of RL-based and syntax-aware approaches, and of their combination, along several dimensions that relate to the perceived quality of the generated summaries, such as the number of repetitions, sentence length, distribution of part-of-speech tags, relevance and grammaticality. We show that under a resource constraint (computation and memory) it is wise to train models with RL only and without any syntactic information, as they provide nearly as good results as syntax-aware models, with fewer parameters and faster training convergence.
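A small illustration of the character-level input representation that the abstract contrasts with word-level input: each text becomes a fixed-length sequence of character indices fed to the convolutional model. The alphabet and padding length here are arbitrary assumptions, not the thesis's configuration.

```python
# Index 0 is reserved for padding and unknown characters.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "

def encode_chars(text: str, length: int = 16) -> list[int]:
    """Map text to a fixed-length sequence of character indices (0 = padding/unknown)."""
    ids = [ALPHABET.find(c) + 1 for c in text.lower()[:length]]
    return ids + [0] * (length - len(ids))
```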
Widdowson, Henry George. "Text, context, pretext : critical issues in discourse analysis /". Oxford : Blackwell, 2004. http://catalogue.bnf.fr/ark:/12148/cb41322428h.
Full text
Gränsbo, Gustav. "Word Clustering in an Interactive Text Analysis Tool". Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-157497.
Full text
CANO, ERION. "Text-based Sentiment Analysis and Music Emotion Recognition". Doctoral thesis, Politecnico di Torino, 2018. http://hdl.handle.net/11583/2709436.
Full text
Cowie, James Reid. "Automatic analysis of descriptive texts". Thesis, University of Strathclyde, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.387066.
Full text
Ashton, Triss A. "Accuracy and Interpretability Testing of Text Mining Methods". Thesis, University of North Texas, 2013. https://digital.library.unt.edu/ark:/67531/metadc283791/.
Full text
Dumont-Le Brazidc, Joffrey. "An Object-Oriented Data Analysis approach for text population". Thesis, KTH, Matematisk statistik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-223244.
Full text
With the ever-increasing availability of text-valued data comes a growing need to cluster and classify such data. In this work we develop statistical tools for hypothesis testing, clustering and classification of text-valued data within the framework of Object-Oriented Data Analysis. The project includes research on semantic methods for representing texts, comparisons between representations, distances for such representations, and the performance of permutation tests. The main methods compared are vector-space models and topic models. More specifically, this work provides an algorithm for permutation tests, at the document or sentence level, for testing the hypothesis that two texts have the same distribution with respect to different representations and distances. Finally, a tree representation is used to describe the study of texts from a syntactic point of view.
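The permutation test described above can be sketched as follows: pool the sentences of both texts, repeatedly re-split them at random, and see how often the re-split distance matches or exceeds the observed one. The bag-of-words L1 distance used here is an illustrative choice of representation and distance, not necessarily the one studied in the thesis.

```python
import random
from collections import Counter

def distance(group_a: list[str], group_b: list[str]) -> float:
    """L1 distance between normalised word-frequency profiles of two sentence groups."""
    ca = Counter(w for s in group_a for w in s.split())
    cb = Counter(w for s in group_b for w in s.split())
    ta, tb = sum(ca.values()) or 1, sum(cb.values()) or 1
    return sum(abs(ca[w] / ta - cb[w] / tb) for w in set(ca) | set(cb))

def permutation_test(text_a: list[str], text_b: list[str],
                     n_perm: int = 999, seed: int = 0) -> float:
    """p-value for H0: both sentence lists are drawn from the same distribution."""
    rng = random.Random(seed)
    observed = distance(text_a, text_b)
    pooled = text_a + text_b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random re-assignment of sentences to the two groups
        if distance(pooled[:len(text_a)], pooled[len(text_a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

A small p-value suggests the two texts differ under the chosen representation; swapping `distance` for a topic-model or embedding distance changes the representation without touching the test.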
Maloney, Ross J. "Assisting Reading and Analysis of Text Documents by Visualization". Murdoch University, 2005. http://wwwlib.murdoch.edu.au/adt/browse/view/adt-MU20060502.150150.
Full text
Li, Yanjun. "High Performance Text Document Clustering". Wright State University / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=wright1181005422.
Full text
Bonora, Filippo. "Dynamic networks, text analysis and Gephi: the art math". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2013. http://amslaurea.unibo.it/6327/.
Full text
Boynukalin, Zeynep. "Emotion Analysis Of Turkish Texts By Using Machine Learning Methods". Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614521/index.pdf.
Texto completos research fields. The aim is to develop a machine that can detect type of user&rsquo
s emotion from his/her text. Emotion classification of English texts is studied by several researchers and promising results are achieved. In this thesis, an emotion classification study on Turkish texts is introduced. To the best of our knowledge, this is the first study on emotion analysis of Turkish texts. In English there exists some well-defined datasets for the purpose of emotion classification, but we could not find datasets in Turkish suitable for this study. Therefore, another important contribution is the generating a new data set in Turkish for emotion analysis. The dataset is generated by combining two types of sources. Several classification algorithms are applied on the dataset and results are compared. Due to the nature of Turkish language, new features are added to the existing methods to improve the success of the proposed method.
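A supervised emotion classifier of the kind this abstract describes can be sketched with a minimal multinomial Naive Bayes over word counts. This is a hypothetical illustration, not the thesis's method or dataset: the toy English examples and the choice of Naive Bayes with add-one smoothing are assumptions made for the sketch.

```python
import math
from collections import Counter

class NaiveBayesEmotion:
    """Multinomial Naive Bayes with add-one smoothing over word counts."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)  # class frequencies
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, text):
        best, best_score = None, float("-inf")
        n = sum(self.prior.values())
        v = len(self.vocab)
        for c in self.classes:
            # log prior plus smoothed log likelihood of each word
            score = math.log(self.prior[c] / n)
            total = sum(self.word_counts[c].values())
            for w in text.lower().split():
                score += math.log((self.word_counts[c][w] + 1) / (total + v))
            if score > best_score:
                best, best_score = c, score
        return best
```

For a morphologically rich language such as Turkish, the tokenizer here (plain `split`) would be the first thing to replace with stemming or morphological analysis, which is in the spirit of the language-specific features the thesis adds.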
Uchimoto, Kiyotaka. "Maximum Entropy Models for Japanese Text Analysis and Generation". 京都大学 (Kyoto University), 2004. http://hdl.handle.net/2433/147595.
Kof, Leonid. "Text analysis for requirements engineering : application of computational linguistics /". Saarbrücken : VDM Verl. Dr. Müller, 2007. http://deposit.d-nb.de/cgi-bin/dokserv?id=3021639&prov=M&dok_var=1&dok_ext=htm.
Green, Pamela Dilys. "Extracting group relationships within changing software using text analysis". Thesis, University of Hertfordshire, 2013. http://hdl.handle.net/2299/11896.
Stein, Roger Alan. "An analysis of hierarchical text classification using word embeddings". Universidade do Vale do Rio dos Sinos, 2018. http://www.repositorio.jesuita.org.br/handle/UNISINOS/7624.
Texto completoMade available in DSpace on 2019-03-07T14:41:05Z (GMT). No. of bitstreams: 1 Roger Alan Stein_.pdf: 476239 bytes, checksum: a87a32ffe84d0e5d7a882e0db7b03847 (MD5) Previous issue date: 2018-03-28
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Efficient distributed numerical word representation models (word embeddings) combined with modern machine learning algorithms have recently yielded considerable improvements on automatic document classification tasks. However, the effectiveness of such techniques has not yet been assessed for hierarchical text classification (HTC). This study investigates the application of those models and algorithms to this specific problem by means of experimentation and analysis. Classification models were trained with prominent machine learning implementations (fastText, XGBoost, and a Keras CNN) and notable word embedding generation methods (GloVe, word2vec, and fastText) on publicly available data, and evaluated with measures specifically appropriate for the hierarchical context. FastText achieved an LCA-F1 of 0.871 on a single-labeled version of the RCV1 dataset. The analysis of the results indicates that using word embeddings is a very promising approach for HTC.
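Hierarchy-aware measures such as the LCA-based F1 reported above give partial credit when a prediction lands near the true label in the class tree. A minimal sketch of the related ancestor-augmented hierarchical F1 follows; this is not the exact LCA variant used in the thesis, and the parent-map encoding of the hierarchy is an assumption made for illustration.

```python
def ancestors(label, parent):
    """All ancestors of a label under a parent map (the root maps to None)."""
    out = set()
    while parent.get(label) is not None:
        label = parent[label]
        out.add(label)
    return out

def hierarchical_f1(true_labels, pred_labels, parent):
    """Micro-averaged hierarchical F1: each label is expanded with its ancestors,
    so predicting a sibling of the true class still matches the shared ancestors."""
    tp = fp = fn = 0
    for t, p in zip(true_labels, pred_labels):
        t_set = {t} | ancestors(t, parent)
        p_set = {p} | ancestors(p, parent)
        tp += len(t_set & p_set)
        fp += len(p_set - t_set)
        fn += len(t_set - p_set)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with a tree news → sports → {football, tennis}, predicting "tennis" when the truth is "football" still matches the ancestors "sports" and "news", yielding a score of 2/3 rather than 0.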