
Dissertations / Theses on the topic 'Text-based'



Consult the top 50 dissertations / theses for your research on the topic 'Text-based.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Soares, Fabio de Azevedo. "Automatic text categorization based on text mining." Pontifícia Universidade Católica do Rio de Janeiro, 2013. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=23213@1.

Abstract:
Text Categorization, one of the tasks performed in Text Mining, can be described as obtaining a function that is able to assign a document to the previously defined category to which it belongs. The main goal of building a taxonomy of documents is to make it easier to obtain relevant information. However, the implementation and execution of Text Categorization is not a trivial task: Text Mining tools are still under development and require high technical expertise to be handled. Furthermore, the language in which the documents are written is of great significance in a Text Mining process and should be treated with the peculiarities of each idiom, yet there is a great lack of tools that provide proper handling of Brazilian Portuguese. Thus, the main aims of this work are to research, propose, implement and evaluate a Text Mining framework for Automatic Text Categorization that is capable of assisting the execution of the knowledge discovery process and that offers language processing for Brazilian Portuguese.
2

Nunes, Ian Monteiro. "Clustering text structured data based on text similarity." Pontifícia Universidade Católica do Rio de Janeiro, 2008. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=25796@1.

Abstract:
This document reports our findings on a set of text clustering experiments in which a wide variety of models and algorithms were applied. The objective of these experiments is to investigate which strategies are most feasible for processing large amounts of information in the face of growing demands on data quality in many fields. The deduplication process was accelerated by dividing the data set into subsets of similar items. In the best-case scenario, each subset contains all duplicates of each record, reducing clustering errors to zero. A tolerance of 5 percent is nevertheless established after the clustering process. The experiments show that the processing time is significantly lower, with a precision of up to 98.92 percent. The best accuracy/performance trade-off is achieved with the K-Means algorithm using a trigram-based model.
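As a rough illustration of the set-up the abstract describes (character trigram features plus K-Means blocking for deduplication), here is a minimal Python sketch using scikit-learn; the records and parameters are invented for the example, and this is not the thesis's pipeline.

```python
# Sketch: blocking near-duplicate records with character trigrams + K-Means
# (illustrative data and parameters; not the thesis's exact pipeline).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

records = [
    "Acme Corp., 123 Main St.",
    "ACME Corporation, 123 Main Street",
    "Foo Industries, 99 Side Ave.",
    "Foo Ind., 99 Side Avenue",
]

# Character trigrams are robust to small spelling variations between duplicates.
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(3, 3))
X = vectorizer.fit_transform(records)

# Each cluster becomes a small candidate block that is then searched for duplicates.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for label, record in sorted(zip(labels, records)):
    print(label, record)
```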
3

Biedert, Ralf. "Gaze-Based Human-Text Interaction / Text 2.0." München: Verlag Dr. Hut, 2014. http://d-nb.info/1050331605/34.

4

Lu, Su. "DCT coefficient based text detection." ProQuest, 2008. http://proquest.umi.com/pqdweb?did=1605147371&sid=4&Fmt=2&clientId=8331&RQT=309&VName=PQD.

5

Prabowo, Rudy. "Ontology-based automatic text classification." Thesis, University of Wolverhampton, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.418665.

Abstract:
This research investigates to what extent ontologies can be used to achieve an accurate classification performance for an automatic text classifier, called the Automatic Classification Engine (ACE). The task of the classifier is to classify Web pages with respect to the Dewey Decimal Classification (DDC) and Library of Congress Classification (LCC) schemes. In particular, this research focuses on how to 1. build a set of ontologies which can provide a mechanism to enable machine reasoning; 2. define the mappings between the ontologies and the two classification schemes; 3. implement an ontology-based classifier. The design and implementation of the classifier concentrate on developing an ontology-based classification model. Given a Web page, the classifier applies the model to carry out reasoning to determine terms - from within the Web page - which represent significant concepts. The classifier then uses the mappings to determine the associated DDC and LCC classes of the significant concepts, and assigns those classes to the Web page. The research also investigates a number of approaches which can be applied to extend the coverage of the ontologies in a semi-automatic way, since manually constructing ontologies is time consuming. The investigation leads to the design and implementation of a semi-automatic ontology construction system which can recognise new potential terms. By using an ontology editor, those new terms can be integrated into their associated ontologies. An experiment was conducted to validate the effectiveness of the classification model, in which the classifier classified a set of collections of Web pages. The performance of the classifier was measured in terms of its coverage and accuracy. The experimental evidence shows that the ontology-based automatic text classification approach achieved a better level of performance than the existing approaches.
6

Zhang, Xuan. "Hardware-based text-to-braille translation." Curtin University of Technology, Department of Computer Engineering, 2007. http://espace.library.curtin.edu.au:80/R/?func=dbin-jump-full&object_id=17220.

Abstract:
Braille, as a special written method of communication for the blind, has been globally accepted for years. It gives blind people another chance to learn and to communicate more efficiently with the rest of the world. It also makes possible the translation of printed languages into a written language that is recognisable for blind people. Recently, Braille has been experiencing a decline in popularity due to the use of alternative technologies, like speech synthesis. However, as a form of literacy, Braille still plays a significant role in the education of people with visual impairments. With the development of electronic technology, Braille has turned out to be well suited to computer-aided production because of its coded form. Software-based text-to-Braille translation has proven to be a successful solution in Assistive Technology (AT). However, the feasibility and advantages of algorithm reconfiguration based on hardware implementation have rarely been substantially discussed. A hardware-based translation system with algorithm reconfiguration is able to supply greater throughput than a software-based system. Further, it could also serve as a single component integrated into a multi-functional Braille system on a chip.
Therefore, this thesis presents the development of a system for text-to-Braille translation implemented in hardware. Differing from most commercial methods, this translator carries out the translation in hardware instead of in software. To find a translation algorithm suitable for a hardware-based solution, the history of, and previous contributions to, Braille translation are introduced and discussed. It is concluded that Markov systems, a formal language theory, are highly suitable for application to hardware-based Braille translation. Furthermore, the text-to-Braille algorithm is reconfigured to achieve parallel processing to accelerate the translation speed. Characteristics and advantages of Field Programmable Gate Arrays (FPGAs) and the application of the Very High Speed Integrated Circuit Hardware Description Language (VHDL) are introduced to explain how the translation algorithm can be transferred to hardware. Using a Xilinx hardware development platform, the algorithm for text-to-Braille translation is implemented and the structure of the translator is described hierarchically.
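The core lookup that any such translator implements is easy to illustrate in software. The sketch below shows an uncontracted (Grade 1) letter-to-cell mapping using the Unicode Braille block; the dot patterns are the standard ones, but the function is only a toy stand-in for the thesis's Markov-system-based, hardware-parallelised translator and omits Grade 2 contractions entirely.

```python
# Sketch: uncontracted (Grade 1) text-to-Braille lookup via Unicode cells.
# Dot d contributes bit 2**(d-1) to the offset from U+2800.
DOTS = {
    'a': (1,), 'b': (1, 2), 'c': (1, 4), 'd': (1, 4, 5), 'e': (1, 5),
    'f': (1, 2, 4), 'g': (1, 2, 4, 5), 'h': (1, 2, 5), 'i': (2, 4), 'j': (2, 4, 5),
    'k': (1, 3), 'l': (1, 2, 3), 'm': (1, 3, 4), 'n': (1, 3, 4, 5), 'o': (1, 3, 5),
    'p': (1, 2, 3, 4), 'q': (1, 2, 3, 4, 5), 'r': (1, 2, 3, 5), 's': (2, 3, 4),
    't': (2, 3, 4, 5), 'u': (1, 3, 6), 'v': (1, 2, 3, 6), 'w': (2, 4, 5, 6),
    'x': (1, 3, 4, 6), 'y': (1, 3, 4, 5, 6), 'z': (1, 3, 5, 6),
}

def to_braille(text: str) -> str:
    cells = []
    for ch in text.lower():
        if ch in DOTS:
            cells.append(chr(0x2800 + sum(1 << (d - 1) for d in DOTS[ch])))
        else:
            cells.append(ch)  # pass spaces/punctuation through untranslated
    return "".join(cells)

print(to_braille("hello world"))
```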
7

Gottlieb, Michael. "Text based methods for variant prioritization." Thesis, University of British Columbia, 2017. http://hdl.handle.net/2429/60358.

Abstract:
Despite improvements in sequencing technologies, DNA sequence variant interpretation for rare genetic diseases remains challenging. In a typical workflow for the Treatable Intellectual Disability Endeavor in B.C. (TIDE BC), a geneticist examines variant calls to establish a set of candidate variants that explain a patient's phenotype. Even with a sophisticated computational pipeline for variant prioritization, they may need to consider hundreds of variants. This typically involves literature searches on individual variants to determine how well they explain the reported phenotype, which is a time-consuming process. In this work, text-analysis-based variant prioritization methods are developed and assessed for their capacity to distinguish causal variants within exome analysis results for a reference set of individuals with metabolic disorders.
8

Liljeström, Monica. "Learning text talk online : Collaborative learning in asynchronous text based discussion forums." Doctoral thesis, Umeå universitet, Pedagogiska institutionen, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-34199.

Abstract:
The desire to translate constructivist and sociocultural approaches to learning into specific learning activities is evident in most forms of education at present, not least in online education. Teachers worldwide are struggling with the question of how to create conditions in this fairly new realm of education for learners to contribute to a good quality in their own and others' learning. Collaboration in the form of text talk in asynchronous, text-based discussion forums (ADF) is often used so that students can participate at the location and time that suit them best, given the other aspects of their life situation. But previous research shows that collaboration in the form of text talk does not always evolve with the expected quality, and that participation can sometimes be so low that no discussions take place at all. Perhaps it is time to move on and make use of the variety of user-friendly audio-visual technologies that offer conditions for collaboration similar to those in the physical environment? Is there any point in using ADF for collaboration beyond the flexible opportunity for participation it allows? If so, why, how and under what conditions is it worthwhile to use ADF for tasks meant to be worked on collaboratively? These questions were the starting point of the studies in this thesis, which were carried out as two case studies involving different techniques and data samples of various natures, with the aim of understanding more about collaborative text talk. The research approach differs from the vast majority of studies in the research field of Computer Supported Collaborative Learning (CSCL), where many studies are currently conducted by analysis of quantifiable data. The first case study was conducted in the context of non-formal learning in Swedish Liberal Adult Education online, and the second in the context of higher education online in Sweden. The studies in the thesis were made on the basis of sociocultural theory and empirical studies. Empirical data were collected from questionnaires, interviews and texts created by students participating in tasks that they jointly resolved through text talk. Some results were brought back to the students for further explanation. Findings from the data analysis were triangulated with other results and with sociocultural theory. The results indicate that students can create knowledge relevant to their studies through text talk, but can feel restrained or dismiss the activity as irrelevant if important conditions are lacking. Collaboration through text talk makes individual resources accessible in a specific place where they can be observed and their validity for the purpose of the task evaluated by others. Students with good insight into what they are supposed to accomplish seem able to consult relevant guidance for this evaluation - from teachers, textbooks, scientific articles and other valid experiences important to their studies - and thereby contribute to learning of the quality their studies are meant to produce. Text talk also increases teachers' possibilities to identify the guidance the study group needs when evaluating the gathered resources and, through their own active participation, to provide support in the students' 'zone of proximal development'. The contributions offered to the CSCL research field are the identification of important mechanisms related to learning collaboratively through text talk, and the use of case study methodology as inspiration for others to try these kinds of strategies to capture online learning.
9

Davis, Marcia H. "Effects of text markers and familiarity on component structures of text-based representations." College Park, Md. : University of Maryland, 2006. http://hdl.handle.net/1903/4086.

Abstract:
Thesis (Ph. D.) -- University of Maryland, College Park, 2006.
Thesis research directed by: Human Development. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
10

Zhang, Nan. "Transform based and search aware text compression schemes and compressed domain text retrieval." Doctoral diss., University of Central Florida, 2005. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3938.

Abstract:
In recent times, we have witnessed an unprecedented growth of textual information via the Internet, digital libraries and archival text in many applications. While a good fraction of this information is of transient interest, useful information of archival value will continue to accumulate. We need ways to manage, organize and transport this data from one point to the other on data communications links with limited bandwidth. We must also have means to speedily find the information we need from this huge mass of data. Sometimes, a single site may also contain large collections of data such as a library database, thereby requiring an efficient search mechanism even to search within the local data. To facilitate information retrieval, an emerging ad hoc standard for uncompressed text is XML, which preprocesses the text by adding user-defined metadata such as a DTD or hyperlinks to enable searching with better efficiency and effectiveness. This increases the file size considerably, underscoring the importance of applying text compression.
On account of efficiency (in terms of both space and time), there is a need to keep the data in compressed form for as long as possible. Text compression is concerned with techniques for representing digital text data in alternate representations that take less space. Not only does it help conserve storage space for archival and online data, it also helps system performance by requiring fewer secondary storage (disk or CD-ROM) accesses, and it improves network transmission bandwidth utilization by reducing the transmission time. Unlike static images or video, there is no international standard for text compression, although compressed formats like .zip, .gz and .Z files are increasingly being used. In general, data compression methods are classified as lossless or lossy. Lossless compression allows the original data to be recovered exactly. Although used primarily for text data, lossless compression algorithms are useful in special classes of images such as medical imaging, fingerprint data, astronomical images and databases containing mostly vital numerical data, tables and text information. Many lossy algorithms use lossless methods at the final stage of encoding, underscoring the importance of lossless methods for both lossy and lossless compression applications.
In order to effectively utilize the full potential of compression techniques for future retrieval systems, we need efficient information retrieval in the compressed domain. This means that techniques must be developed to search the compressed text without decompression, or with only partial decompression, independent of whether the search is done on the text or on some inversion table corresponding to a set of keywords for the text. In this dissertation, we make the following contributions:
(1) Star family compression algorithms: We have proposed an approach to develop a reversible transformation that can be applied to a source text and that improves existing algorithms' ability to compress. We use a static dictionary to convert English words into predefined symbol sequences. These transformed sequences create additional context information that is superior to the original text, so we achieve some compression at the preprocessing stage. We have a series of transforms which improve the performance. The star transform requires a static dictionary of a certain size. To avoid the considerable complexity of conversion, we employ the ternary tree data structure, which efficiently converts the words in the text to the words in the star dictionary in linear time.
(2) Exact and approximate pattern matching in Burrows-Wheeler transformed (BWT) files: We proposed a method to extract the useful context information in linear time from the BWT-transformed text. The auxiliary arrays obtained from the BWT inverse transform bring logarithmic search time. Meanwhile, approximate pattern matching can be performed based on the results of exact pattern matching to extract the possible candidates for the approximate match. A fast verifying algorithm can then be applied to those candidates, which may be just small parts of the original text. We present algorithms for both k-mismatch and k-approximate pattern matching in BWT compressed text. A typical compression system based on BWT has Move-to-Front and Huffman coding stages after the transformation. We propose a novel approach to replace the Move-to-Front stage in order to extend the compressed domain search capability all the way to the entropy coding stage. A modification to the Move-to-Front makes it possible to randomly access any part of the compressed text without referring to the part before the access point.
(3) Modified LZW algorithm that allows random access and partial decoding for compressed text retrieval: Although many compression algorithms provide good compression ratio and/or time complexity, LZW is the first one studied for compressed pattern matching because of its simplicity and efficiency. Modifications to the LZW algorithm provide the extra advantage of fast random access and partial decoding, which is especially useful for text retrieval systems. Based on this algorithm, we can provide a dynamic hierarchical semantic structure for the text, so that the text search can be performed on the expected level of granularity. For example, the user can choose to retrieve a single line, a paragraph, or a file, etc. that contains the keywords. More importantly, we show that parallel encoding and decoding are trivial with the modified LZW. Both encoding and decoding can be performed easily with multiple processors, and the encoding and decoding processes are independent with respect to the number of processors.
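As a toy illustration of the dictionary-preprocessing idea behind the star family of transforms described in contribution (1), the sketch below reversibly replaces dictionary words with short symbol codes before handing the text to an off-the-shelf compressor; the two-word dictionary and the codes are invented for the example.

```python
# Sketch: star-style reversible word substitution before standard compression.
# The two-word dictionary and '*' codes are illustrative only.
import zlib

DICTIONARY = {"information": "*a", "compression": "*b"}  # word -> symbol code
REVERSE = {v: k for k, v in DICTIONARY.items()}

def transform(text: str) -> str:
    return " ".join(DICTIONARY.get(w, w) for w in text.split())

def untransform(text: str) -> str:
    return " ".join(REVERSE.get(w, w) for w in text.split())

text = "compression of information needs information about compression"
encoded = transform(text)
assert untransform(encoded) == text  # the transform is reversible

# Compare compressed sizes (on toy input the gain may be negligible).
print(len(zlib.compress(text.encode())), len(zlib.compress(encoded.encode())))
```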
11

Mick, Alan A. "Knowledge based text indexing and retrieval utilizing case based reasoning." Online version of thesis, 1994. http://hdl.handle.net/1850/11715.

12

Krishnan, Sharenya. "Text-Based Information Retrieval Using Relevance Feedback." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-53603.

Abstract:
Europeana, a freely accessible digital library with the aim of making Europe's cultural and scientific heritage available to the public, was founded by the European Commission in 2008. The goal was to deliver semantically enriched digital content with multilingual access to it. Even though they managed to increase the amount of content, they gradually faced the problem of retrieving information held in unstructured form. So, to complement the Europeana portal services, ASSETS (Advanced Search Service and Enhanced Technological Solutions) was introduced, with services that sought to improve the usability and accessibility of Europeana. My contribution is to study different text-based information retrieval models and their relevance feedback techniques, and to implement one simple model. The thesis gives a detailed overview of the information retrieval process along with the implementation of the chosen strategy for relevance feedback, which generates automatic query expansion. Finally, the thesis concludes with the analysis made using relevance feedback, a discussion of the model implemented, and an assessment of the future use of this model, both as a continuation of my work and within ASSETS.
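A classic way to implement the kind of relevance feedback and automatic query expansion discussed here is the Rocchio update; the sketch below is illustrative (the weights and vectors are made up, and the thesis does not necessarily use these exact values).

```python
# Sketch: Rocchio relevance feedback on TF-IDF vectors (illustrative weights).
import numpy as np

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward relevant documents, away from non-relevant ones."""
    q = alpha * query
    if len(relevant):
        q = q + beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q = q - gamma * np.mean(nonrelevant, axis=0)
    return np.maximum(q, 0.0)  # negative term weights are usually dropped

query = np.array([1.0, 0.0, 0.5])
relevant = np.array([[0.9, 0.8, 0.1], [1.0, 0.6, 0.0]])
nonrelevant = np.array([[0.0, 0.1, 1.0]])
print(rocchio(query, relevant, nonrelevant))
```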
13

Branchetti, Simone. "Color Watermarking Techniques for Text-based Media." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/22170/.

Abstract:
The main focus of this thesis is text-based watermarking: embedding a secret payload of bits into a text while keeping the text as unaltered as possible, to make it harder to identify that watermarking has happened at all. Using two Google Workspace add-ons, we explore the efficacy, ease of use and portability of three structural watermarking techniques: homoglyph-based watermarking, space-coloring-based watermarking and grayscale-based watermarking. The latter two are techniques developed specifically for this thesis. Another important focus is the use of the Digital Object Identifier system as another level of protection for a document, using its metadata system as one would a zero-watermarking technique.
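To make the homoglyph technique concrete, here is a minimal sketch that hides bits by swapping Latin letters for visually near-identical Cyrillic ones. The character pairs are standard confusables, but the encoding scheme is invented for this example and is not the thesis's implementation.

```python
# Sketch: embed a bit string by choosing Latin vs. Cyrillic homoglyphs.
HOMOGLYPHS = {'a': '\u0430', 'e': '\u0435', 'o': '\u043e', 'c': '\u0441'}

def embed(text: str, bits: str) -> str:
    out, i = [], 0
    for ch in text:
        if ch in HOMOGLYPHS and i < len(bits):
            out.append(HOMOGLYPHS[ch] if bits[i] == '1' else ch)
            i += 1
        else:
            out.append(ch)
    return "".join(out)

def extract(text: str, n: int) -> str:
    cyrillic = set(HOMOGLYPHS.values())
    bits = ['1' if ch in cyrillic else '0'
            for ch in text if ch in HOMOGLYPHS or ch in cyrillic]
    return "".join(bits[:n])

marked = embed("a secret code once", "1011")
print(marked, extract(marked, 4))  # looks unchanged, but carries '1011'
```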
14

Vassiliou, Andrew. "Analysing film content : a text-based approach." Thesis, University of Surrey, 2006. http://epubs.surrey.ac.uk/2244/.

15

Westmacott, Mike. "Content based image retrieval : analogies with text." Thesis, University of Southampton, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.423038.

16

Wang, Xutao. "Chinese Text Classification Based On Deep Learning." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-35322.

Abstract:
Text classification has always been a concern in the area of natural language processing, especially now that data are becoming massive due to the development of the internet. The recurrent neural network (RNN) is one of the most popular methods for natural language processing due to its recurrent architecture, which gives it the ability to process serialized information. Meanwhile, the convolutional neural network (CNN) has shown its ability to extract features from visual imagery. This paper combines the advantages of RNNs and CNNs and proposes a model called BLSTM-C for Chinese text classification. BLSTM-C begins with a bidirectional long short-term memory (BLSTM) layer, a special kind of RNN, to produce a sequence output based on both the past and the future context. It then feeds this sequence to a CNN layer, which is utilized to extract features from the previous sequence. We evaluate the BLSTM-C model on several tasks such as sentiment classification and category classification, and the results show our model's remarkable performance on these text tasks.
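A minimal Keras sketch of the BLSTM-C shape described above (a bidirectional LSTM whose output sequence feeds a convolutional feature extractor); all layer sizes are placeholder hyperparameters rather than the paper's.

```python
# Sketch: BLSTM -> CNN text classifier in Keras (placeholder hyperparameters).
from tensorflow.keras import layers, models

vocab_size, seq_len, n_classes = 20000, 100, 10

model = models.Sequential([
    layers.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, 128),
    # The BLSTM returns the full sequence so the CNN can extract local features.
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.Conv1D(filters=128, kernel_size=3, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```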
17

Wu, Yingyu. "Using Text based Visualization in Data Analysis." Kent State University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=kent1398079502.

18

Bafuka, Freddy Nole. "Beyond text analysis : image-based evaluation of health-related text readability using style features." Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/53121.

Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.
Includes bibliographical references (p. 70-71).
Many studies have shown that the readability of health documents presented to consumers does not match their reading levels. An accurate assessment of the readability of health-related texts is an important step in providing material that matches readers' literacy. Current readability measurements depend heavily on text analysis (NLP) but neglect style (text layout). In this study, we show that style properties are important predictors of documents' readability. In particular, we build an automated computer program that uses documents' style to predict their readability score. The style features are extracted by analyzing only one page of the document as an image. The scores produced by our system were tested against scores given by human experts. Our tool shows stronger correlation with experts' scores than the Flesch-Kincaid readability grading method. We provide an end-user program, VisualGrader, which provides a graphical user interface to the scoring model.
19

Johansson, Vida. "Depending on VR : Rule-based Text Simplification Based on Dependency Relations." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-139043.

Abstract:
The amount of text that is written and made available increases all the time. However, it is not readily accessible to everyone. The goal of the research presented in this thesis was to develop a system for automatic text simplification based on dependency relations, to develop a set of simplification rules for the system, and to evaluate the performance of the system. The system was built on a previous tool, and developments were made to ensure that the system could perform the operations necessary for the rules included in the rule set. The rule set was developed by manual adaptation of the rules to a set of training texts. The evaluation method used was a classification task with both objective measures (precision and recall) and a subjective measure (correctness). The performance of the system was compared to that of a system based on constituency relations. The results showed that the current system scored higher on both precision (96% compared to 82%) and recall (86% compared to 53%), indicating that the syntactic information dependency relations provide is sufficient to perform text simplification. Further evaluation should account for how helpful the text simplification produced by the current system is for target readers.
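As an illustration of what a single dependency-based simplification rule can look like (this is not one of the thesis's rules, which target Swedish), the sketch below uses spaCy to split off a relative clause.

```python
# Sketch: one dependency-based simplification rule using spaCy
# (illustrative English rule; requires the en_core_web_sm model).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The report, which was published yesterday, criticises the plan.")

for token in doc:
    if token.dep_ == "relcl":  # verb heading a relative clause on token.head
        clause_tokens = list(token.subtree)
        clause_set = set(clause_tokens)
        # Punctuation cleanup is omitted in this sketch.
        main = " ".join(t.text for t in doc if t not in clause_set)
        clause = " ".join(t.text for t in clause_tokens)
        print("Main clause:", main)
        print("Extracted clause about:", token.head.text, "-", clause)
```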
20

Deniz, Onur. "Ontology Based Text Mining In Turkish Radiology Reports." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614145/index.pdf.

Abstract:
Vast amounts of radiology reports are produced in hospitals. Because these reports are in free-text format and contain errors due to rapid production, it is becoming ever more complicated for radiologists and physicians to reach meaningful information. Though the application of ontologies to biomedical text mining has gained increasing interest in recent years, little work has been offered for ontology-based retrieval tasks in the Turkish language. In this work, an information extraction and retrieval system based on the SNOMED-CT ontology is proposed for Turkish radiology reports. The main purpose of this work is to utilize semantic relations in the ontology to improve the precision and recall rates of search results in the domain. Practical problems encountered, such as spelling errors and the segmentation and tokenization of unstructured medical reports, are also addressed.
21

Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features." Griffith University, School of Microelectronic Engineering, 2001. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20040831.115646.

Abstract:
The speech signal is basically meant to carry information about the linguistic message. But it also contains speaker-specific information. It is generated by acoustically exciting the cavities of the mouth and nose, and can be used to recognize (identify/verify) a person. This thesis deals with the speaker identification task; i.e., finding the identity of a person using his/her speech from a group of persons already enrolled during the training phase. Listeners use many audible cues in identifying speakers. These cues range from high-level cues such as the semantics and linguistics of the speech, to low-level cues relating to the speaker's vocal tract and voice source characteristics. Generally, the vocal tract characteristics are modeled in modern-day speaker identification systems by cepstral coefficients. Although these coefficients are good at representing vocal tract information, they can be supplemented with pitch and voicing information. Pitch provides very important and useful information for identifying speakers. In current speaker recognition systems, it is very rarely used, as it cannot be reliably extracted and is not always present in the speech signal. In this thesis, an attempt is made to utilize this pitch and voicing information for speaker identification. This thesis illustrates, through the use of a text-independent speaker identification system, the reasonable performance of the cepstral coefficients, achieving an identification error of 6%. Using pitch as a feature in a straightforward manner results in identification errors in the range of 86% to 94%, which is not very helpful. The two main reasons why the direct use of pitch as a feature does not work for speaker recognition are listed below. First, the speech is not always periodic; only about half of the frames are voiced. Thus, pitch cannot be estimated for half of the frames (i.e. for unvoiced frames), and the problem is how to account for pitch information for the unvoiced frames during the recognition phase. Second, pitch estimation methods are not very reliable. They classify some of the frames as unvoiced when they are really voiced. Also, they make pitch estimation errors (such as doubling or halving of the pitch value, depending on the method). In order to use pitch information for speaker recognition, we have to overcome these problems. We need a method which does not use the pitch value directly as a feature and which works for voiced as well as unvoiced frames in a reliable manner. We propose here a method which uses the autocorrelation function of the given frame to derive pitch-related features. We call these features the maximum autocorrelation value (MACV) features. These features can be extracted for voiced as well as unvoiced frames and do not suffer from the pitch doubling or halving type of pitch estimation errors. Using these MACV features along with the cepstral features, the speaker identification performance is improved by 45%.
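The MACV idea is straightforward to express in code: compute the frame's normalised autocorrelation and keep the largest values within a plausible pitch-lag range, so the feature exists whether or not the frame is voiced. A numpy sketch with illustrative frame sizes and lag bounds:

```python
# Sketch: maximum autocorrelation value (MACV) features for one frame.
import numpy as np

def macv_features(frame, fs=8000, n_values=5, f_min=50, f_max=400):
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / (ac[0] + 1e-12)                  # normalise so ac[0] == 1
    lo, hi = int(fs / f_max), int(fs / f_min)  # plausible pitch lags
    return np.sort(ac[lo:hi])[::-1][:n_values]  # defined for any frame

rng = np.random.default_rng(0)
t = np.arange(240) / 8000.0
voiced = np.sin(2 * np.pi * 120 * t) + 0.1 * rng.standard_normal(240)
unvoiced = rng.standard_normal(240)
print(macv_features(voiced))    # values near 1: strong periodicity
print(macv_features(unvoiced))  # noticeably smaller values
```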
22

Boulton, David. "Fine art image classification based on text analysis." Thesis, University of Surrey, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.252478.

23

Mohamed, Muhidin Abdullahi. "Automatic text summarisation using linguistic knowledge-based semantics." Thesis, University of Birmingham, 2016. http://etheses.bham.ac.uk//id/eprint/6659/.

Abstract:
Text summarisation is the task of reducing a text document to a short substitute summary. Since the commencement of the field, almost all summarisation research implemented to date has involved identifying and extracting the most important document/cluster segments, an approach called extraction. This typically involves scoring each document sentence according to a composite scoring function consisting of surface-level and semantic features. Enabling machines to analyse text features and understand their meaning potentially requires both text semantic analysis and equipping computers with an external source of semantic knowledge. This thesis addresses extractive text summarisation by proposing a number of semantic and knowledge-based approaches. The work combines the high-quality semantic information in WordNet, the crowdsourced encyclopaedic knowledge in Wikipedia, and the manually crafted categorial variation in CatVar to improve summary quality. Such improvements are accomplished through sentence-level morphological analysis and the incorporation of Wikipedia-based named-entity semantic relatedness, using heuristic algorithms. The study also investigates how sentence-level semantic analysis based on semantic role labelling (SRL), leveraged with background world knowledge, influences sentence textual similarity and text summarisation. The proposed sentence similarity and summarisation methods were evaluated on standard publicly available datasets such as the Microsoft Research Paraphrase Corpus (MSRPC), TREC-9 Question Variants, and the Document Understanding Conference 2002, 2005 and 2006 (DUC 2002, DUC 2005, DUC 2006) corpora. The project also uses Recall-Oriented Understudy for Gisting Evaluation (ROUGE) for the quantitative assessment of the proposed summarisers' performance. The results showed the effectiveness of our systems compared to related state-of-the-art summarisation methods and baselines. Of the proposed summarisers, the SRL Wikipedia-based system demonstrated the best performance.
24

Thaper, Nitin. "Using compression for source-based classification of text." Thesis, Massachusetts Institute of Technology, 2001. http://hdl.handle.net/1721.1/86595.

25

Benveniste, Steven M. "Investigation into text classification with kernel based schemes." Thesis, Monterey, California : Naval Postgraduate School, 2010. http://edocs.nps.edu/npspubs/scholarly/theses/2010/Mar/10Mar%5FBenveniste.pdf.

Abstract:
Thesis (M.S. in Electrical Engineering)--Naval Postgraduate School, March 2010.
Thesis Advisor(s): Fargues, Monique P. Second Reader: Cristi, Roberto. "March 2010." Description based on title screen as viewed on May 6, 2010. Author(s) subject terms: Text Classification, Text Categorization, Kernel Based Schemes, Single Value Decomposition (SVD), Data Mining, Feature Vector Selection (FVS). Includes bibliographical references (p. 141-142). Also available in print.
26

Stigeborn, Olivia. "Text ranking based on semantic meaning of sentences." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-300442.

Abstract:
Finding a suitable candidate-to-client match is an important part of consultant companies' work. It takes a lot of time and effort for the recruiters at the company to read possibly hundreds of resumes to find a suitable candidate. Natural language processing is capable of performing a ranking task where the goal is to rank the resumes with the most suitable candidates ranked the highest. This ensures that the recruiters are only required to look at the top-ranked resumes and can quickly get candidates out in the field. Former research has used methods that count specific keywords in resumes and can make decisions on whether a candidate has a given experience or not. The main goal of this thesis is to use the semantic meaning of the text in the resumes to get a deeper understanding of a candidate's level of experience. It also evaluates whether the model can run on-device and whether the database can contain a mix of English and Swedish resumes. An algorithm was created that uses the word embedding model DistilRoBERTa, which is capable of capturing the semantic meaning of text. The algorithm was evaluated by generating job descriptions from the resumes by creating a summary of each resume. The run time, memory usage and rank achieved by the wanted candidate were documented and used to analyze the results. When the candidate who was used to generate the job description was ranked in the top 10, the classification was considered correct. The accuracy was calculated using this method, and an accuracy of 68.3% was achieved. The results show that the algorithm is capable of ranking resumes. The algorithm is able to rank both Swedish and English resumes, with an accuracy of 67.7% for Swedish resumes and 74.7% for English. The run time was fast enough, at an average of 578 ms, but the memory usage was too large to make it possible to use the algorithm on-device. In conclusion, the semantic meaning of resumes can be used to rank them, and possible future work would be to combine this method with a method that counts keywords to investigate whether the accuracy would increase.
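A minimal sketch of the ranking step using a public DistilRoBERTa checkpoint from the sentence-transformers library; the model name, data and scoring here are illustrative and may differ from the thesis's exact setup.

```python
# Sketch: rank resumes against a job description by embedding cosine similarity.
# Uses a public DistilRoBERTa checkpoint; not necessarily the thesis's model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-distilroberta-v1")

job = "Senior backend developer, Python and cloud infrastructure."
resumes = [
    "Five years building Python microservices on AWS.",
    "Graphic designer experienced in branding and print.",
    "Backend engineer, Django, Kubernetes, GCP.",
]

job_emb = model.encode(job, convert_to_tensor=True)
res_emb = model.encode(resumes, convert_to_tensor=True)
scores = util.cos_sim(job_emb, res_emb)[0]

for score, resume in sorted(zip(scores.tolist(), resumes), reverse=True):
    print(f"{score:.3f}  {resume}")
```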
27

Meyer, David, Kurt Hornik, and Ingo Feinerer. "Text Mining Infrastructure in R." American Statistical Association, 2008. http://epub.wu.ac.at/3978/1/textmining.pdf.

Abstract:
During the last decade text mining has become a widely used discipline utilizing statistical and machine learning methods. We present the tm package, which provides a framework for text mining applications within R. We give a survey of text mining facilities in R and explain how typical application tasks can be carried out using our framework. We present techniques for count-based analysis methods, text clustering, text classification and string kernels. (authors' abstract)
28

Goodrum, Abby A. "Evaluation of Text-Based and Image-Based Representations for Moving Image Documents." Thesis, University of North Texas, 1997. https://digital.library.unt.edu/ark:/67531/metadc500441/.

Abstract:
Document representation is a fundamental concept in information retrieval (IR), and has been relied upon in textual IR systems since the advent of library catalogs. The reliance upon text-based representations of stored information has been perpetuated in conventional systems for the retrieval of moving images as well. Although newer systems have added image-based representations of moving image documents as aids to retrieval, there has been little research examining how humans interpret these different types of representations. Such basic research has the potential to inform IR system designers about how best to aid users of their systems in retrieving moving images. One key requirement for the effective use of document representations in either textual or image form is the degree to which these representations are congruent with the original documents. A measure of congruence is the degree to which human responses to representations are similar to responses produced by the document being represented. The aim of this study was to develop a model for the representation of moving images based upon human judgements of representativeness. The study measured the degree of congruence between moving image documents and their representations, both text- and image-based, in a non-retrieval environment with and without task constraints. Multidimensional scaling (MDS) was used to examine the dimensional dispersions of human judgements for the full moving images and their representations.
29

Stymne, Sara. "Text Harmonization Strategies for Phrase-Based Statistical Machine Translation." Doctoral thesis, Linköpings universitet, NLPLAB - Laboratoriet för databehandling av naturligt språk, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-76766.

Abstract:
In this thesis I aim to improve phrase-based statistical machine translation (PBSMT) in a number of ways by the use of text harmonization strategies. PBSMT systems are built by training statistical models on large corpora of human translations. This architecture generally performs well for languages with similar structure. If the languages are different, for example with respect to word order or morphological complexity, however, the standard methods do not tend to work well. I address this problem through text harmonization, by making texts more similar before training and applying a PBSMT system. I investigate how text harmonization can be used to improve PBSMT with a focus on four areas: compounding, definiteness, word order, and unknown words. For the first three areas, the focus is on linguistic differences between languages, which I address by applying transformation rules, using either rule-based or machine learning-based techniques, to the source or target data. For the last area, unknown words, I harmonize the translation input to the training data by replacing unknown words with known alternatives. I show that translation into languages with closed compounds can be improved by splitting and merging compounds. I develop new merging algorithms that outperform previously suggested algorithms and show how part-of-speech tags can be used to improve the order of compound parts. Scandinavian definite noun phrases are identified as a problem for PBSMT in translation into Scandinavian languages, and I propose a preprocessing approach that addresses this problem and gives large improvements over a baseline. Several previous proposals for how to handle differences in reordering exist; I propose two types of extensions, iterating reordering and word alignment and using automatically induced word classes, which allow these methods to be used for less-resourced languages. Finally, I identify several ways of replacing unknown words in the translation input, most notably a spell checking-inspired algorithm, which can be trained using character-based PBSMT techniques. Overall I present several approaches for extending PBSMT by the use of pre- and postprocessing techniques for text harmonization, and show experimentally that these methods work. Text harmonization methods are an efficient way to improve statistical machine translation within the phrase-based approach, without resorting to more complex models.
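As a toy illustration of the compound-splitting idea (not the thesis's algorithms, which are considerably more sophisticated), here is a greedy lexicon-based splitter that also allows a linking 's' between parts:

```python
# Sketch: greedy lexicon-based compound splitting (toy version of the idea).
def split_compound(word, vocab, min_part=3):
    for i in range(min_part, len(word) - min_part + 1):
        head, tail = word[:i], word[i:]
        if head in vocab and tail in vocab:
            return [head, tail]
        # Allow a linking 's' between the parts, common in German and Swedish.
        if head.endswith("s") and head[:-1] in vocab and tail in vocab:
            return [head[:-1], tail]
    return [word]

vocab = {"arbeit", "markt"}  # tiny illustrative German lexicon
print(split_compound("arbeitsmarkt", vocab))  # ['arbeit', 'markt']
```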
30

Hossayni, Sayyed Ali. "Foundations of uncertainty management for text-based sentiment prediction." Doctoral thesis, Universitat de Girona, 2018. http://hdl.handle.net/10803/666765.

Abstract:
Analyzing the sentiment of Social Network users is an attractive task, well covered by the Sentiment Analysis research communities. Alongside it, predicting the rating/opinion of users in Social Networks or e-commerce platforms is another attractive task, covered by the Recommender Systems research communities. However, there is a rather new field of study that takes advantage of both of the mentioned scopes to predict the “unexpressed” opinion of users, based on their written sentiments and their similarity. Although the data extracted from Social Networks deal with high volumes of uncertainty (due to the sparsity of the items addressed by different users), none of the few dozen studies conducted in the Sentiment Prediction field focuses on managing this uncertainty. In this dissertation, we introduce the necessary foundations for constructing an Uncertainty-handling Sentiment Prediction system by means of possibility theory, fuzzy theory, and probability theory. Moreover, we define an international project called probabilistic/possibilistic Text-based Emotion Rating (pTER) to fill, and then enrich, the gap of uncertainty management in Sentiment Prediction. pTER comprises two sub-projects: Scalar and Interval pTER. This dissertation provides five foundational research studies in scalar pTER. Although these studies are sufficient for the targeted system, we let the scalar pTER system itself be disseminated only after it can use its entire potency by utilizing the in-progress research projects of the other researchers of the pTER project, as defined by this dissertation. In addition to the presented scalar pTER studies, we also propose one research study in the interval pTER project, which goes one step further in Uncertainty-handling and takes the measurement errors of the scalar pTER sub-systems into account. The presented studies in scalar and interval pTER belong to three phases: (I) an Uncertainty-handling NLP platform, (II) Uncertainty-handling Sentiment Analysis, and (III) Uncertainty-handling Collaborative Filtering. The experiments conducted in this dissertation prove the superiority of our Uncertainty-handling approaches in all of these phases, in comparison to the corresponding state of the art.
31

Hossain, Mahmud Shahriar. "Apriori approach to graph-based clustering of text documents." Thesis, Montana State University, 2008. http://etd.lib.montana.edu/etd/2008/hossain/HossainM0508.pdf.

Abstract:
This thesis report introduces a new technique of document clustering based on frequent senses. The developed system, named GDClust (Graph-Based Document Clustering) [1], works with frequent senses rather than dealing with frequent keywords used in traditional text mining techniques. GDClust presents text documents as hierarchical document-graphs and uses an Apriori paradigm to find the frequent subgraphs, which reflect frequent senses. Discovered frequent subgraphs are then utilized to generate accurate sense-based document clusters. We propose a novel multilevel Gaussian minimum support strategy for candidate subgraph generation. Additionally, we introduce another novel mechanism called Subgraph-Extension mining that reduces the number of candidates and overhead imposed by the traditional Apriori-based candidate generation mechanism. GDClust utilizes an English language thesaurus (WordNet [2]) to construct document-graphs and exploits graph-based data mining techniques for sense discovery and clustering. It is an automated system and requires minimal human interaction for the clustering purpose.
32

Lam, Yat-kin (林日堅). "Intelligent lexical access based on Chinese/English text queries." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2005. http://hub.hku.hk/bib/B30445474.

33

Curtmola, Emiran. "Democratic community-based search with XML full-text queries." Diss., [La Jolla] : University of California, San Diego, 2009. http://wwwlib.umi.com/cr/ucsd/fullcit?p3378521.

Abstract:
Thesis (Ph. D.)--University of California, San Diego, 2009.
Title from first page of PDF file (viewed October 22, 2009). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references (p. 184-193).
34

Botha, Gerrit Reinier. "Text-based language identification for the South African languages." Pretoria: [s.n.], 2007. http://upetd.up.ac.za/thesis/available/etd-090942008-133715/.

35

Kamal, Hasan. "An ATMS-based architecture for stylistics-aware text generation." Thesis, University of Edinburgh, 2002. http://hdl.handle.net/1842/23067.

Abstract:
This thesis is concerned with the effect of surface stylistic constraints (SSC) on syntactic and lexical choice within a unified generation architecture. Despite the fact that these issues have been investigated by researchers in the field, little work has been done with regard to system architectures that allow surface form constraints to influence earlier linguistic or even semantic decisions made throughout the NLG process. By SSC we mean those stylistic requirements that are known beforehand but cannot be tested until after the utterance or (in some lucky cases) a proper linearised part of it has been generated. These include collocational constraints, text size limits, and poetic aspects such as rhyme and metre to name a few. This thesis introduces a new NLG architecture that can be sensitive to surface stylistic requirements. It brings together a well-founded linguistic theory that has been used in many successful NLG systems (Systemic Functional Linguistics, SFL) and an existing AI search mechanism (the Assumption-based Truth Maintenance System, ATMS) which caches important search information and avoids work duplication. To this end, the thesis explores the logical relation between the grammar formalism and the search technique. It designs, based on that logical connection, an algorithm for the automatic translation of systemic grammar networks to ATMS dependency networks. The generator then uses the translated networks to generate natural language texts with a high paraphrasing power as a direct result of its ability to pursue multiple paths simultaneously. The thesis approaches the crucial notion of choice differently to previous systems using SFL. It relaxes the choice process in that choosers are not obliged to deterministically choose a single alternative allowing SSC to influence the final lexical and syntactic decisions. The thesis also develops a situation-action framework for the specification of stylistic requirements independently of the micro-semantic input. The user or application can state what surface requirements they wish to impose and the ATMS-based generator then attempts to satisfy these constraints.
36

Han, Changan. "Neural Network Based Off-line Handwritten Text Recognition System." FIU Digital Commons, 2011. http://digitalcommons.fiu.edu/etd/363.

Abstract:
This dissertation introduces a new system for handwritten text recognition based on an improved neural network design. Most existing neural networks treat the mean square error function as the standard error function. The system proposed in this dissertation utilizes the mean quartic error function, whose third and fourth derivatives are non-zero. Consequently, many improvements to the training methods were achieved. The training results are carefully assessed before and after the update. To evaluate the performance of a training system, there are three essential factors to be considered, listed here from high to low priority: 1) the error rate on the testing set, 2) the processing time needed to recognize a segmented character and 3) the total training time and, subsequently, the total testing time. It is observed that bounded training methods accelerate the training process, while semi-third-order training methods, next-minimal training methods and preprocessing operations reduce the error rate on the testing set. Empirical observations suggest that two combinations of training methods are needed for recognizing characters of different cases. Since character segmentation is required for word and sentence recognition, this dissertation also provides an effective rule-based segmentation method, which differs from conventional adaptive segmentation methods. Dictionary-based correction is utilized to correct mistakes resulting from the recognition and segmentation phases. The integration of the segmentation methods with the handwritten character recognition algorithm yielded an accuracy of 92% for lower-case characters and 97% for upper-case characters. In the testing phase, the database consisted of 20,000 handwritten characters, with 10,000 for each case. Recognition of the 10,000 handwritten characters in the testing phase required 8.5 seconds of processing time.
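The mean quartic error itself is simple to state: it is the fourth power of the residual averaged over the samples, which makes the third and fourth derivatives non-zero and penalises large errors far more heavily than the square error does. A PyTorch sketch (illustrative, not the dissertation's network):

```python
# Sketch: mean quartic error as a drop-in training loss.
import torch

def mean_quartic_error(pred, target):
    # Fourth-power residuals: non-zero third/fourth derivatives, and large
    # errors are penalised much more heavily than with the square error.
    return ((pred - target) ** 4).mean()

pred = torch.tensor([0.9, 0.2, 0.4], requires_grad=True)
target = torch.tensor([1.0, 0.0, 1.0])

loss = mean_quartic_error(pred, target)
loss.backward()  # usable as a criterion inside an ordinary training loop
print(loss.item(), pred.grad)
```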
APA, Harvard, Vancouver, ISO, and other styles
37

Massey, Louis. "A lazy text-based approach to foundational knowledge acquisition." Thesis, University of Ottawa (Canada), 1995. http://hdl.handle.net/10393/10084.

Full text
Abstract:
Knowledge Acquisition (KA) from text requires that a large quantity of prior knowledge be made available to the Natural Language Processing (NLP) system. This prior knowledge is called foundational knowledge. The question of where foundational knowledge comes from in the first place is one of the biggest problems facing NLP. Conventionally, foundational knowledge has been hand-crafted on a task- and domain-specific basis. However, it is difficult to determine beforehand exactly what knowledge will be required. It has been shown within the TANKA project that a potential solution to this problem is to use surface NLP. Surface NLP relies solely on syntax and on the help of a user to elicit knowledge from text, effectively eliminating the need for prior hand-crafting of foundational knowledge. However, the domain knowledge obtained in this manner from a text contains gaps. The work presented in this thesis consisted of finding a better method than prior hand-crafting to acquire the knowledge needed to fill those gaps. The method presented, called Lazy KA, uses examples (short NL stories) and failures of an explanation mechanism such as EBL to find these gaps and to interactively and incrementally learn the required new knowledge. When the explanation of a particular example fails, the user is guided through a process that leads to the acquisition of the missing knowledge. Initially, the user is heavily involved, but as more examples are processed, the user becomes less and less involved. The convergence hypothesis, namely that user interventions decrease as more examples are processed, was verified experimentally using FOKAS, a prototype system implementing these ideas. (Abstract shortened by UMI.)
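The failure-driven loop described here might look like the following sketch. The seed knowledge, the examples, and the user_oracle stand-in for interactive elicitation are all invented; the real FOKAS system works over parsed NL stories.

```python
# A sketch of the Lazy KA loop: try to explain each example from the current
# knowledge base; on failure, ask the user and add the missing knowledge.
# Interventions become rarer as the knowledge base converges.
knowledge = {"dog": "animal"}                     # invented seed knowledge
user_oracle = {"cat": "animal", "rose": "plant"}  # stands in for the user
examples = ["dog", "cat", "cat", "rose", "dog", "cat"]

def explain(item):
    # Stand-in for an EBL-style explanation attempt from current knowledge.
    return item in knowledge

interventions = 0
for item in examples:
    if not explain(item):
        # Explanation failure exposes a knowledge gap; the user is asked
        # once, and the answer is stored so later examples succeed.
        interventions += 1
        knowledge[item] = user_oracle[item]

print(f"{interventions} user interventions over {len(examples)} examples")
```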
APA, Harvard, Vancouver, ISO, and other styles
38

Schierz, Amanda Claire. "Monitoring innovation in emerging science : a text-based approach." Thesis, University of Surrey, 2005. http://epubs.surrey.ac.uk/842836/.

Full text
Abstract:
The transfer of knowledge from academia to industry is of critical importance to both academics and industrialists, and it can be argued that patent documents referring to a set of well-researched concepts may be used as a measure of such a transfer. Concepts are typically articulated as terms, and terms shared between research papers and patent documents are proposed as the monitoring index. Key developments in science and engineering are usually signalled by the introduction of new terms and the exclusion of established ones; this change in terminology may be construed as a change in the knowledge of the field. Early identification of these changes may provide opportunities for innovation and enhance an organisation's competitive intelligence. A corpus-linguistic approach has been taken to study the changes in terminology that have occurred in the development of Artificial Intelligence since 1936. We have examined the terminology used in a sample of journal papers and patents and found that the terminological preferences of authors change over time. Biological models of growth have been applied to model these diachronic changes, and the results show that the growth of term usage and the transfer of knowledge may be modelled using logistic growth techniques.
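The logistic growth modelling mentioned in this abstract can be illustrated with an ordinary curve fit, assuming cumulative term usage follows K / (1 + exp(-r(t - t0))). The yearly counts below are synthetic; the thesis's corpus and fitting procedure may differ.

```python
# Fitting a logistic growth curve to (synthetic) yearly counts of documents
# using a term, as a stand-in for the diachronic modelling described above.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    # K: saturation level, r: growth rate, t0: midpoint year.
    return K / (1.0 + np.exp(-r * (t - t0)))

years = np.arange(1980, 2000, dtype=float)
rng = np.random.default_rng(0)
counts = logistic(years, 120.0, 0.6, 1990.0) + rng.normal(0.0, 3.0, years.size)

(K, r, t0), _ = curve_fit(logistic, years, counts, p0=[100.0, 0.5, 1990.0])
print(f"fitted K={K:.0f}, r={r:.2f}, t0={t0:.0f}")
```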
APA, Harvard, Vancouver, ISO, and other styles
39

Scannavino, Katia Romero Felizardo. "Evidence-based software engineering: systematic literature review process based on visual text mining." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-18072012-102032/.

Full text
Abstract:
Context: Systematic literature review (SLR) is a methodology used to aggregate all relevant evidence for a specific research question. One of the activities associated with the SLR process is the selection of primary studies, which can be arduous, particularly when the researcher faces large volumes of primary studies. Another activity associated with an SLR is the presentation of the results of the primary studies that meet the SLR purpose. The results are generally summarized in tables, and an alternative that reduces the time needed to understand the data is the use of graphic representations. Systematic mapping (SM) is a more open form of SLR used to build a classification and categorization scheme of a field of interest. The categorization and classification activities in SM are not trivial tasks, since they require manual effort and domain knowledge from reviewers to achieve adequate results. Although clearly crucial, both the SLR and SM processes are time-consuming, and most activities are conducted manually. Objective: The aim of this research is to use Visual Text Mining (VTM) to support different activities of the SLR and SM processes, e.g., the selection of primary studies, the presentation of SLR results, and the categorization and classification in an SM. Method: Extensions to the SLR and SM processes based on VTM were proposed, and a series of case studies were conducted to demonstrate the usefulness of VTM techniques in the selection, review, presentation of results, and categorization contexts. Results: The findings show that the application of VTM provides positive support to the study-selection activity and that visual representations of SLR data lead to a reduction in the time taken for their analysis, with no loss of data comprehensibility. The application of VTM is also relevant in the context of SM. Conclusions: VTM techniques can be successfully employed to assist the SLR and SM processes.
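One typical visual-text-mining ingredient for the study-selection activity is a 2-D map of candidate studies, so that clusters of similar abstracts can be reviewed together. Below is a minimal sketch of that idea using a plain TF-IDF-plus-PCA projection over invented abstracts; it illustrates the general technique, not the thesis's actual tooling.

```python
# Project candidate-study abstracts into 2-D so reviewers can inspect
# clusters instead of a flat list. All texts are invented.
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

abstracts = [
    "software testing with mutation operators",
    "mutation testing of protocol implementations",
    "agile requirements elicitation in industry",
    "requirements engineering case study in agile teams",
]

# Vectorise the abstracts and project them to two dimensions.
X = TfidfVectorizer().fit_transform(abstracts).toarray()
coords = PCA(n_components=2).fit_transform(X)

for text, (x, y) in zip(abstracts, coords):
    print(f"({x:+.2f}, {y:+.2f})  {text}")
```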
APA, Harvard, Vancouver, ISO, and other styles
40

Twengström, Moira, and Viktor Mörsell. "Evaluating regular and speech-based text entry for creation of smartphone based addresses." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-282810.

Full text
Abstract:
Billions of people on Earth lack a home address. In this paper we investigate an approach to solving this using an address system in which an address consists of a GPS location and a description of how to find one's way to the house once within close distance of the GPS location. The aim of the paper is to measure whether this description is of higher quality when it is given using speech-based rather than regular text entry. Our findings indicate that speech-based text input gives 1.7 times more information in about 5.5 times less time. From a usability standpoint no difference was indicated, but since the experiments were carried out under ideal conditions, we conclude that speech-based text entry would likely present more of a challenge for users in practice. When and if speech recognition is more widely adopted into systems for everyday use, speech-based text entry will be a good asset for increasing the amount of information collected from users in navigational contexts.
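Taken at face value, the reported figures imply a large throughput difference: 1.7 times the information in about 5.5 times less time is roughly a ninefold information rate advantage, assuming the two ratios combine multiplicatively:

```python
# Rough information-rate advantage implied by the reported results
# (assuming the information and time ratios are independent).
info_ratio = 1.7  # speech entry captured 1.7x the information ...
time_ratio = 5.5  # ... in roughly 5.5x less time
print(f"implied information rate advantage: {info_ratio * time_ratio:.1f}x")
```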
APA, Harvard, Vancouver, ISO, and other styles
41

King, Karen D. "Comparison of Social Presence in Voice-based and Text-based Asynchronous computer Conferencing." NSUWorks, 2008. http://nsuworks.nova.edu/gscis_etd/637.

Full text
Abstract:
The significance of social presence in asynchronous computer conferencing has become an increasingly important factor in establishing high-quality online learning environments. Levels of social presence exhibited in asynchronous computer conferences influence students' perceptions of learning and satisfaction levels in a Web-based course. Evidence in the literature supports the use of text-based asynchronous computer conferences to enhance learning in online learning environments. Recently, faculty teaching online courses have begun to use voice-based asynchronous conferencing tools, with little research to support the appropriateness of the medium. A quasi-experimental design framed this examination of the levels of social presence, as measured by interaction patterns, in voice-based and text-based asynchronous computer conferences. Content transcripts of voice-based and text-based asynchronous computer conferences from one human physiology course at a state university in the southeastern United States were analyzed qualitatively. The analysis was based on the affective, communicative reinforcement, and cohesive interactions as defined by Rourke, Anderson, Garrison, and Archer. A social density score was derived from the transcripts. A multivariate analysis of variance was conducted to determine whether there were significant differences in levels of social presence between voice-based and text-based asynchronous computer conferences. The results showed statistically significantly higher levels of affective and communicative reinforcement interactions in the text-based asynchronous computer conferences. Voice-based asynchronous computer conferences contained higher levels of cohesive interaction patterns, although the difference was not statistically significant. Deployment of voice-based technology as a pedagogical tool comes at considerable cost to higher education institutions. These tools are often marketed on the claimed effectiveness of the technology in a learning environment. However, according to this study, there is no apparent benefit in using voice-based rather than text-based technology tools to facilitate asynchronous computer conferences in a Web-based learning environment.
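A social density score of the kind used here, coded indicator counts normalised by transcript length, might be computed as in the sketch below. The keyword lists are crude invented stand-ins for the manual Rourke et al. coding scheme.

```python
# A sketch of a social density score: coded social-presence indicator
# counts per 1000 words of transcript. Keyword matching crudely stands in
# for manual coding of the three interaction categories.
indicators = {
    "affective":     ["!", "feel", ":)"],
    "reinforcement": ["agree", "thanks", "good point"],
    "cohesive":      ["we", "our", "everyone"],
}

def social_density(transcript: str) -> dict:
    # Indicator occurrences normalised per 1000 words.
    text = transcript.lower()
    words = max(len(text.split()), 1)
    return {cat: sum(text.count(k) for k in kws) * 1000.0 / words
            for cat, kws in indicators.items()}

sample = "Thanks everyone, I agree! Good point. We feel our plan works."
print(social_density(sample))
```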
APA, Harvard, Vancouver, ISO, and other styles
42

Tran, Binh Giang. "Combining text-based and vision-based semantics." Master's thesis, 2011. http://www.nusl.cz/ntk/nusl-313292.

Full text
Abstract:
Learning and representing semantics is one of the most important tasks contributing to several growing areas, as shown by the success stories in the recent survey of Turney and Pantel (2010). In this thesis, we present an innovative (and first) framework for creating a multimodal distributional semantic model from state-of-the-art text- and image-based semantic models. We evaluate this multimodal semantic model on simulating similarity judgements, concept clustering, and the newly introduced BLESS benchmark. We also propose an effective algorithm, namely Parameter Estimation, to integrate text- and image-based features in order to obtain a robust multimodal system. Our experiments show that the technique is very promising: across all experiments, our best multimodal model claims the first position, and in comparison with other text-based models it remains on a par with the state of the art. We explore various types of visual features, including SIFT and other color SIFT channels, in order to gain preliminary insights into how computer-vision techniques should be applied in the natural language processing domain. Importantly, in this thesis, we show evidence that adding visual features (as the perceptual information coming from...
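The Parameter Estimation idea of weighting the two channels before combining them can be sketched with a single mixing parameter alpha over L2-normalised vectors. The random vectors and the choice of weighted concatenation below are illustrative assumptions, not the thesis's exact formulation.

```python
# A sketch of combining text- and image-based semantic vectors with one
# mixing weight, tuned in practice against human similarity judgements.
import numpy as np

def combine(text_vec, image_vec, alpha):
    # Weighted concatenation of L2-normalised channels.
    t = text_vec / np.linalg.norm(text_vec)
    v = image_vec / np.linalg.norm(image_vec)
    return np.concatenate([alpha * t, (1.0 - alpha) * v])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
text_cat, img_cat = rng.normal(size=50), rng.normal(size=40)  # "cat"
text_dog, img_dog = rng.normal(size=50), rng.normal(size=40)  # "dog"

for alpha in (0.0, 0.5, 1.0):
    sim = cosine(combine(text_cat, img_cat, alpha),
                 combine(text_dog, img_dog, alpha))
    print(f"alpha={alpha:.1f}  sim(cat, dog)={sim:+.3f}")
```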
APA, Harvard, Vancouver, ISO, and other styles
43

Zhong, Ming. "Concept-based biomedical text retrieval /." 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:MR29634.

Full text
Abstract:
Thesis (M.Sc.)--York University, 2007. Graduate Programme in Computer Science.
Typescript. Includes bibliographical references (leaves 96-101). Also available on the Internet; mode of access: via web browser at http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:MR29634
APA, Harvard, Vancouver, ISO, and other styles
44

Viswanath, Meghana. "Ontology-based automatic text summarization." 2009. http://purl.galileo.usg.edu/uga%5Fetd/viswanath%5Fmeghana%5F200912%5Fms.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Zeng, Shih-Chang, and 曾士昌. "A Sentences-Based Text Summarization Method." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/69039256186158365327.

Full text
Abstract:
Master's thesis
National Pingtung Institute of Commerce (國立屏東商業技術學院)
Department (Graduate Institute) of Information Management
Academic year 99 (ROC calendar)
Due to the prevalence of digital documents on the Internet, text summarization has become an important issue in many applications, such as information retrieval, briefing generation, and text categorization. Traditional methods for generating a summary of a single document rely heavily on a term-by-sentence matrix, which is less intuitive: the semantic unit of human discourse is the sentence rather than the term. Therefore, in this thesis we propose using a sentence-by-paragraph matrix to conduct probabilistic latent semantic analysis for single-document summarization. The proposed method is called sentence-based probabilistic latent semantic analysis, or spLSA. To evaluate its performance, we implemented four other summarizers: the NTU approach, Luhn, LSA, and term-based pLSA. The experimental task is to compare classification accuracy on 50 academic papers. Three common classifiers, K-nearest neighbors, Naïve Bayes, and Support Vector Machine, were implemented to eliminate possible moderator effects. Six experiments were conducted. The first compared term-based pLSA (tpLSA) with sentence-based pLSA (spLSA); the results show that spLSA largely outperforms tpLSA in processing speed and is slightly more accurate. In the second experiment we compared spLSA with the four other summarizers both qualitatively and quantitatively; the results show that spLSA yields better quality and higher precision/recall values. Based on these two experiments, we excluded tpLSA from the subsequent comparisons, mainly because of its slow processing speed and its poorer performance relative to spLSA. The third experiment examined the effect of class distribution on classification accuracy; all three classifiers misclassified documents toward the majority class. The fourth experiment studied the effect of term expansion, which improved classification accuracy significantly. The fifth experiment examined the effect of feature dimension, showing that accuracy can be boosted when the dimension is set to 100. The sixth experiment examined the effect of training size: using 30% of the dataset for training and the remaining 70% for testing, most classifiers maintained the same level of accuracy, and some even surpassed the original 60% training / 40% testing setting. Overall, the proposed spLSA performed better than the other summarizers in classification accuracy.
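The representational shift described here, scoring sentences against paragraphs instead of terms against sentences, can be sketched as follows. Truncated SVD stands in for the pLSA topic model, and the document, the overlap weighting, and the two-sentence summary length are invented.

```python
# A sketch of a sentence-by-paragraph matrix for a single document, with a
# latent decomposition standing in for sentence-based pLSA: the sentences
# scoring highest on the leading latent dimension form the summary.
import numpy as np
from sklearn.decomposition import TruncatedSVD

paragraphs = [
    ["Text summarization selects the key sentences of a document.",
     "Good summaries support retrieval and categorization."],
    ["Latent semantic analysis uncovers topical structure.",
     "Each sentence is scored against the latent topics."],
]
sentences = [s for p in paragraphs for s in p]

def weight(sentence, paragraph):
    # Crude association weight: word overlap between sentence and paragraph.
    para_words = set(" ".join(paragraph).lower().split())
    return float(len(set(sentence.lower().split()) & para_words))

# Sentence-by-paragraph matrix (rows: sentences, columns: paragraphs).
M = np.array([[weight(s, p) for p in paragraphs] for s in sentences])

scores = TruncatedSVD(n_components=1).fit_transform(M).ravel()
for i in np.argsort(-scores)[:2]:
    print(sentences[i])
```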
APA, Harvard, Vancouver, ISO, and other styles
46

Debnath, Sandip. "Automatic text-based explanation of events." 2005. http://etda.libraries.psu.edu/theses/approved/WorldWideIndex/ETD-1045/index.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Phillips, Dusty. "Improving mouse-based text selection interfaces /." 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:MR29603.

Full text
Abstract:
Thesis (M.Sc.)--York University, 2007. Graduate Programme in Computer Science and Engineering.
Typescript. Includes bibliographical references (leaves 105-108). Also available on the Internet; mode of access: via web browser at http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:MR29603
APA, Harvard, Vancouver, ISO, and other styles
48

Janik, Maciej. "Training-less ontology-based text categorization." 2008. http://purl.galileo.usg.edu/uga%5Fetd/janik%5Fmaciej%5Fg%5F200808%5Fphd.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Henriques, Daniel Filipe Rodrigues. "Automatic Completion of Text-based Tasks." Master's thesis, 2019. http://hdl.handle.net/10362/92296.

Full text
Abstract:
Crowdsourcing is a widespread problem-solving model in which tasks are assigned to an existing pool of workers in order to solve a problem; it is a scalable alternative to hiring a group of experts for labeling high volumes of data. It can provide results of similar quality, with the advantage of achieving such standards in a faster and more efficient manner. Modern approaches to crowdsourcing use Machine Learning models to label the data and ask the crowd to validate the results. Such approaches can only be applied if the data on which the model was trained (source data) and the data that needs labeling (target data) share some relation. Furthermore, since the model is not adapted to the target data, its predictions may contain a substantial number of errors, and validating these predictions can be very time-consuming. In this thesis, we propose an approach that leverages in-domain data, a labeled portion of the target data, to adapt the model. The remainder of the data is labeled based on the model's predictions. The crowd is tasked with generating the in-domain data and validating the model's predictions. Under this approach, we train the model both with in-domain data alone and with in-domain data combined with data from an outer domain. We apply these learning settings with the intent of optimizing a crowdsourcing pipeline for Natural Language Processing, more concretely for the task of Named Entity Recognition (NER). The optimization concerns the effort required by the crowd to perform the NER task. The results of the experiments show that the use of in-domain data achieves effort savings ranging from 6% to 53%. Furthermore, we observed such savings in nine distinct datasets, which demonstrates the robustness and breadth of application of this approach. In conclusion, the in-domain data approach is capable of optimizing a crowdsourcing pipeline for NER, and it has a broader range of use cases than reusing a model to generate predictions on the target data.
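One simple way the effort-savings figure might be operationalised: if the crowd corrects an adapted model's NER predictions rather than labelling from scratch, savings correspond to the fraction of tokens already tagged correctly. The tags below are invented placeholders, and the thesis's actual metric may be defined differently.

```python
# A sketch of an effort-savings metric for crowd-validated NER: the crowd
# only fixes wrong tags, so savings are the share of tokens whose predicted
# tag already matches the gold tag.
gold = ["O", "B-PER", "I-PER", "O", "B-LOC", "O", "O", "B-ORG"]  # invented
pred = ["O", "B-PER", "I-PER", "O", "O",     "O", "O", "B-ORG"]  # invented

corrections = sum(g != p for g, p in zip(gold, pred))
savings = 1.0 - corrections / len(gold)
print(f"{corrections} corrections for {len(gold)} tokens; "
      f"effort savings vs. full labelling: {savings:.0%}")
```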
APA, Harvard, Vancouver, ISO, and other styles
50

Kam, Ben W. Y. "Syntax-based Security Testing for Text-based Communication Protocols." Thesis, 2010. http://hdl.handle.net/1974/5652.

Full text
Abstract:
We introduce a novel Syntax-based Security Testing (SST) framework that uses a protocol specification to effectively perform security testing on text-based communication protocols. A protocol specification of a particular text-based protocol under test (TPUT) represents its syntactic grammar and static semantic contracts on the grammar. Mutators written in TXL break the syntactic and semantic constraints of the protocol specification to generate test cases. Different protocol-specification testing strategies can be joined together to yield a compositional testing approach. SST is independent of any particular text-based protocol; its power stems from the way it obtains test cases from the protocol specifications. We also use a robust parsing technique with TXL to parse a TPUT. SST has successfully revealed security faults in different text-based protocol applications such as web applications and KOrganizer. We also demonstrate that SST can mimic the venerable PROTOS test suite co-http-reply developed by the University of Oulu.
Thesis (Ph.D., Computing) -- Queen's University, 2010-04-30.
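The mutator idea, deliberately violating one grammar or semantic constraint at a time to produce security test cases, can be sketched for a toy HTTP-like request. The message and the four mutations are invented; the real framework expresses such transforms in TXL over a full protocol grammar.

```python
# A sketch of syntax-based security-test generation: start from a valid
# message and emit variants that each break one constraint of a toy
# text protocol.
valid = "GET /index.html HTTP/1.0\r\nHost: example.com\r\n\r\n"

def mutators(msg):
    # Each mutator violates exactly one syntactic or semantic constraint.
    yield msg.replace(" HTTP/1.0", "")                  # drop required version
    yield msg.replace("/index.html", "/" + "A" * 5000)  # oversize a field
    yield msg.replace("\r\n", "\n")                     # break line delimiters
    yield msg.replace("Host:", "Host\x00:")             # inject a NUL byte

for i, test_case in enumerate(mutators(valid), start=1):
    print(f"test {i}: {test_case[:40]!r} ...")
```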
APA, Harvard, Vancouver, ISO, and other styles