Dissertations / Theses on the topic 'LM. Automatic text retrieval'
Consult the top 38 dissertations / theses for your research on the topic 'LM. Automatic text retrieval.'
Viana, Hugo Henrique Amorim. "Automatic information retrieval through text-mining." Master's thesis, Faculdade de Ciências e Tecnologia, 2013. http://hdl.handle.net/10362/11308.
Nowadays, a huge number of firms in the European Union are catalogued as Small and Medium Enterprises (SMEs), and together they employ a great portion of Europe's active workforce. Nonetheless, SMEs cannot afford to implement methods or tools to systematically adopt innovation as part of their business processes, even though innovation is the engine of competitiveness in a globalised environment, especially in the current socio-economic situation. This thesis provides a platform that, when integrated with the ExtremeFactories (EF) project, helps SMEs become more competitive by means of a monitoring-schedule functionality: a text-mining platform able to schedule the gathering of information through keywords. Several implementation choices were made in developing the platform; the one that deserves particular emphasis is the framework, Apache Lucene Core 2, which supplies an efficient text-mining engine and is used heavily for the purposes of the thesis.
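As a loose illustration of what such keyword-driven gathering can look like, here is a minimal sketch in Python; it is not the EF platform or Apache Lucene, and every name in it is invented:

```python
import re
from collections import Counter

def keyword_score(text, keywords):
    """Score a document by how often the user's keywords occur in it."""
    tokens = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(tokens[k.lower()] for k in keywords)

def gather(documents, keywords, top_n=3):
    """Return the top-n documents mentioning at least one keyword,
    ranked by keyword frequency. This is the step a scheduler would
    re-run periodically to monitor new material."""
    ranked = sorted(documents, key=lambda d: keyword_score(d, keywords),
                    reverse=True)
    return [d for d in ranked if keyword_score(d, keywords) > 0][:top_n]
```

A real deployment would replace the list of strings with a Lucene index and attach the call to a scheduler.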
Lee, Hyo Sook. "Automatic text processing for Korean language free text retrieval." Thesis, University of Sheffield, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.322916.
Kay, Roderick Neil. "Text analysis, summarising and retrieval." Thesis, University of Salford, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.360435.
Goyal, Pawan. "Analytic knowledge discovery techniques for ad-hoc information retrieval and automatic text summarization." Thesis, Ulster University, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.543897.
McMurtry, William F. "Information Retrieval for Call Center Quality Assurance." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1587036885211228.
Brucato, Matteo. "Temporal Information Retrieval." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2013. http://amslaurea.unibo.it/5690/.
Full textErmakova, Liana. "Short text contextualization in information retrieval : application to tweet contextualization and automatic query expansion." Thesis, Toulouse 2, 2016. http://www.theses.fr/2016TOU20023/document.
Full textThe efficient communication tends to follow the principle of the least effort. According to this principle, using a given language interlocutors do not want to work any harder than necessary to reach understanding. This fact leads to the extreme compression of texts especially in electronic communication, e.g. microblogs, SMS, search queries. However, sometimes these texts are not self-contained and need to be explained since understanding them requires knowledge of terminology, named entities or related facts. The main goal of this research is to provide a context to a user or a system from a textual resource.The first aim of this work is to help a user to better understand a short message by extracting a context from an external source like a text collection, the Web or the Wikipedia by means of text summarization. To this end we developed an approach for automatic multi-document summarization and we applied it to short message contextualization, in particular to tweet contextualization. The proposed method is based on named entity recognition, part-of-speech weighting and sentence quality measuring. In contrast to previous research, we introduced an algorithm for smoothing from the local context. Our approach exploits topic-comment structure of a text. Moreover, we developed a graph-based algorithm for sentence reordering. The method has been evaluated at INEX/CLEF tweet contextualization track. We provide the evaluation results over the 4 years of the track. The method was also adapted to snippet retrieval. The evaluation results indicate good performance of the approach
Sequeira, José Francisco Rodrigues. "Automatic knowledge base construction from unstructured text." Master's thesis, Universidade de Aveiro, 2016. http://hdl.handle.net/10773/17910.
Given the overwhelming number of biomedical publications being produced, the effort required for a user to explore them efficiently and establish relationships between a wide range of concepts is staggering. This dissertation presents GRACE, a web-based platform providing an advanced graphical exploration interface that allows users to traverse the biomedical domain in order to find explicit and latent associations between annotated biomedical concepts belonging to a variety of semantic types (e.g., Genes, Proteins, Disorders, Procedures and Anatomy). The knowledge base used is a collection of MEDLINE articles with English abstracts. Annotations are stored in an efficient data store that supports complex queries and high-performance data delivery. Concept relationships are inferred through statistical analysis, applying association measures to annotated terms. These processes let the graphical interface create, in real time, a graph visualization for exploring these biomedical concept relationships.
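One common association measure that could serve the statistical inference step is pointwise mutual information (PMI) over concepts co-annotated in the same abstract; PMI is an assumption here, since the exact measure GRACE applies is not stated in this summary:

```python
import math
from collections import Counter
from itertools import combinations

def pmi_associations(annotated_docs):
    """PMI between every pair of concepts that are co-annotated in at
    least one document. `annotated_docs` is an iterable of sets of
    concept labels, one set per abstract."""
    n = len(annotated_docs)
    single = Counter()   # document frequency of each concept
    pair = Counter()     # document frequency of each concept pair
    for concepts in annotated_docs:
        cs = sorted(set(concepts))
        single.update(cs)
        pair.update(combinations(cs, 2))
    return {p: math.log2((c / n) / ((single[p[0]] / n) * (single[p[1]] / n)))
            for p, c in pair.items()}
```

Pairs scoring above zero co-occur more often than independence predicts and would be the natural candidates for edges in the exploration graph.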
Lipani, Aldo. "Query rewriting in information retrieval: automatic context extraction from local user documents to improve query results." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2012. http://amslaurea.unibo.it/4528/.
Martinez-Alvarez, Miguel. "Knowledge-enhanced text classification : descriptive modelling and new approaches." Thesis, Queen Mary, University of London, 2014. http://qmro.qmul.ac.uk/xmlui/handle/123456789/27205.
Conteduca, Antonio. "L’uso di tecniche di similarità nell’editing di documenti fortemente strutturati." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/19653/.
Full textSalvatori, Stefano. "Text-to-Image Information Retrieval Basato sul Transformer Lineare Performer: Sviluppo e Applicazioni per l'Industria della Moda." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.
Bonzi, Francesco. "Lyrics Instrumentalness: An Automatic System for Vocal and Instrumental Recognition in Polyphonic Music with Deep Learning." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.
Find full textDutra, Marcio Branquinho. "Busca guiada de patentes de Bioinformática." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/95/95131/tde-07022014-150130/.
Patents are temporary public licenses granted by the State to guarantee inventors and assignees rights of economic exploitation. Trademark and patent offices recommend performing wide searches in different databases, using classic patent search systems and specific tools, before filing a patent application. The goal of these searches is to ensure the invention has not been published yet, either in its original field or in other fields. Research has shown that the use of classification information improves the efficiency of patent searches. The objective of the research related to this work is to explore linguistic artifacts, Information Retrieval techniques and Automatic Classification techniques to guide searches for Bioinformatics patents. The result is the Bioinformatics Patent Search System (BPS), which uses automatic classification to guide searches for Bioinformatics patents. The utility of BPS is illustrated by comparison with other patent search tools. In the future, BPS should be evaluated on more robust collections.
Artchounin, Daniel. "Tuning of machine learning algorithms for automatic bug assignment." Thesis, Linköpings universitet, Programvara och system, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-139230.
Piscaglia, Nicola. "Deep Learning for Natural Language Processing: Novel State-of-the-art Solutions in Summarisation of Legal Case Reports." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20342/.
Full textDang, Quoc Bao. "Information spotting in huge repositories of scanned document images." Thesis, La Rochelle, 2018. http://www.theses.fr/2018LAROS024/document.
This work aims at developing a generic framework able to support camera-based applications for spotting information in huge repositories of heterogeneous document images via local descriptors. The targeted systems take as input a portion of an image acquired as a query, and return the focused portions of database images that best match the query. We first propose a set of generic feature descriptors for camera-based document image retrieval and spotting systems: SRIF, PSRIF, DELTRIF and SSKSRIF, which are built from the spatial information of the nearest keypoints around a given keypoint, where keypoints are extracted from the centroids of connected components. From these keypoints, invariant geometrical features are incorporated into the descriptor. SRIF and PSRIF are computed from a local set of m nearest keypoints around a keypoint, while DELTRIF and SSKSRIF combine local shape descriptions without parameters via a Delaunay triangulation formed from the keypoints extracted from a document image. Furthermore, we propose a framework to compute the descriptors based on the spatial layout of dedicated keypoints (e.g., SURF, SIFT or ORB) so that they can handle heterogeneous-content camera-based document image retrieval and spotting. In practice, a large-scale indexing system with an enormous number of descriptors places a heavy burden on memory, and the high dimensionality of the descriptors can reduce indexing accuracy. We therefore propose three robust indexing frameworks that can be employed without storing local descriptors in memory, saving memory and speeding up retrieval by discarding distance validation. The randomized clustering tree index inherits from kd-trees, k-means trees and random forests the way K dimensions are selected randomly, combined with the highest-variance dimension at each node of the tree.
We also propose a weighted Euclidean distance between two data points, oriented along the highest-variance dimension. The second proposed method relies on a hashing-based indexing system that employs a single simple hash table for indexing and retrieval without storing database descriptors. Besides, we propose an extended hashing-based method for indexing multiple kinds of features coming from multiple layers of the image. Along with the proposed descriptors and indexing frameworks, we propose a simple, robust way to compute the shape orientation of MSER regions so that they can be combined with dedicated descriptors (e.g., SIFT, SURF, ORB, etc.) in a rotation-invariant manner. When descriptors are able to capture neighbourhood information around MSER regions, we propose extending the MSER regions by increasing the radius of each region; this strategy can also be applied to other detected regions to make descriptors more distinctive. Moreover, we employ the extended hashing-based method to index multiple kinds of features from multiple image layers; this applies not only to a uniform feature type but also to multiple feature types from separate layers. Finally, to assess the performance of our contributions, and given that no public dataset existed for camera-based document image retrieval and spotting systems, we built a new dataset which has been made freely and publicly available to the scientific community. This dataset contains portions of document images acquired via a camera as queries, and is composed of three kinds of content: textual, graphical and heterogeneous.
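As one concrete reading of the weighted Euclidean distance mentioned above, the sketch below weights each dimension by its variance over the dataset; the exact weighting used in the thesis is not given in this abstract, so treat the formula as an assumption:

```python
import math

def variance_weights(points):
    """Per-dimension variance over the dataset, used as distance
    weights so that high-variance (more discriminative) dimensions
    dominate the comparison."""
    n, d = len(points), len(points[0])
    means = [sum(p[i] for p in points) / n for i in range(d)]
    return [sum((p[i] - means[i]) ** 2 for p in points) / n
            for i in range(d)]

def weighted_euclidean(a, b, w):
    """sqrt(sum_i w_i * (a_i - b_i)^2)"""
    return math.sqrt(sum(wi * (ai - bi) ** 2
                         for ai, bi, wi in zip(a, b, w)))
```

With these weights, low-variance dimensions contribute nothing to the distance.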
Wächter, Thomas. "Semi-automated Ontology Generation for Biocuration and Semantic Search." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2011. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-64838.
Vidal-Santos, Gerard. "Avaluació de processos de reconeixement d’entitats (NER) com a complement a interfícies de recuperació d’informació en dipòsits digitals complexos." Thesis, 2018. http://eprints.rclis.org/33589/1/VidalSantos_TFG_2018.pdf.
Vidal-Santos, Gerard. "Avaluació de processos de reconeixement d’entitats (NER) com a complement a interfícies de recuperació d’informació en dipòsits digitals complexos." Thesis, 2018. http://eprints.rclis.org/33692/1/VidalSantos_TFG_2018.pdf.
Çapkın, Çağdaş. "Türkçe metin tabanlı açık arşivlerde kullanılan dizinleme yönteminin değerlendirilmesi / Evaluation of indexing method used in Turkish text-based open archives." Thesis, 2011. http://eprints.rclis.org/28804/1/Cagdas_CAPKIN_Yuksek_Lisans_tezi.pdf.
Sifuentes, Raul. "Determinación de las causas de búsquedas sin resultados en un catálogo bibliográfico en línea mediante el análisis de bitácoras de transacciones : el caso de la Pontificia Universidad Católica del Perú." Thesis, 2013. http://eprints.rclis.org/23857/1/SIFUENTES_ARROYO_RAUL_CATALOGO_BIBLIOGRAFICO.pdf.
Moreira, Walter. "Biblioteca tradicional x biblioteca virtual: modelos de recuperação da informação." Thesis, 1998. http://eprints.rclis.org/8353/1/BibliotecaTradicionalXBibliotecaVirtual_ModelosDeRecuperacaoDaInformacao.pdf.
Yusan, Wang (王愚善). "Automatic Text Corpora Retrieval in Example-Based Machine Translation." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/67545771395405026129.
Full text國立成功大學
資訊管理研究所
89
Translation is often a matter of finding analogous examples in linguistic databanks and discovering how a particular source text has been translated before. Example-based approaches to machine translation (MT) are generally viewed as alternatives to knowledge-based methods and as a supplement to traditional rule-based methods, which comprise three major steps: analysis, transfer and generation. Researchers have shown many advantages of the example-based approach, one of which is that it allows the user to select text corpora for a specific domain, for the sake of matching efficiency and better translation quality in the target language. However, the selection of text corpora is generally accomplished by someone who, if not a translator, must be well able to read the source language and to judge the category of the text. As a result, a monolingual user cannot take this advantage and has to endure much longer text-corpus matching times in the bilingual (or multilingual) knowledge base. In this study, the proposed approach, namely Automatic Text Corpora Retrieval (ATCR), automates the process of identifying the corpora to which the source text is most closely related.
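The corpus-selection idea can be sketched with plain term-frequency cosine similarity; ATCR's actual matching procedure is not detailed in the abstract, so the functions below are illustrative only:

```python
import math
import re
from collections import Counter

def tf_vector(text):
    """Term-frequency vector of a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    num = sum(u[w] * v[w] for w in set(u) & set(v))
    den = (math.sqrt(sum(c * c for c in u.values()))
           * math.sqrt(sum(c * c for c in v.values())))
    return num / den if den else 0.0

def pick_corpus(source_text, corpora):
    """Return the name of the domain corpus most similar to the
    source text. `corpora` maps corpus names to their raw text."""
    sv = tf_vector(source_text)
    return max(corpora, key=lambda name: cosine(sv, tf_vector(corpora[name])))
```

In this spirit, a monolingual user would be routed to the right domain corpus without having to read and categorize the source text.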
"A concept-space based multi-document text summarizer." 2001. http://library.cuhk.edu.hk/record=b5890766.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2001.
Includes bibliographical references (leaves 88-94).
Abstracts in English and Chinese.
List of Figures
List of Tables
1. INTRODUCTION
1.1 Information Overloading and Low Utilization
1.2 Problem Needs To Solve
1.3 Research Contributions
1.3.1 Using Concept Space in Summarization
1.3.2 New Extraction Method
1.3.3 Experiments on New System
1.4 Organization of This Thesis
2. LITERATURE REVIEW
2.1 Classical Approach
2.1.1 Luhn's Algorithm
2.1.2 Edmundson's Algorithm
2.2 Statistical Approach
2.3 Natural Language Processing Approach
3. PROPOSED SUMMARIZATION APPROACH
3.1 Direction of Summarization
3.2 Overview of Summarization Algorithm
3.2.1 Document Pre-processing
3.2.2 Vector Space Model
3.2.3 Sentence Extraction
3.3 Evaluation Method
3.3.1 Recall, Precision and F-measure
3.4 Advantage of Concept Space Approach
4. SYSTEM ARCHITECTURE
4.1 Converge Process
4.2 Diverge Process
4.3 Backward Search
5. CONVERGE PROCESS
5.1 Document Merging
5.2 Word Phrase Extraction
5.3 Automatic Indexing
5.4 Cluster Analysis
5.5 Hopfield Net Classification
6. DIVERGE PROCESS
6.1 Concept Terms Refinement
6.2 Sentence Selection
6.3 Backward Searching
7. EXPERIMENT AND RESEARCH FINDINGS
7.1 System-generated Summary vs. Source Documents
7.1.1 Compression Ratio
7.1.2 Information Loss
7.2 System-generated Summary vs. Human-generated Summary
7.2.1 Background of EXTRACTOR
7.2.2 Evaluation Method
7.3 Evaluation of Different System-generated Summaries by Human Experts
8. CONCLUSIONS AND FUTURE RESEARCH
8.1 Conclusions
8.2 Future Work
A. EXTRACTOR SYSTEM FLOW AND TEN-STEP PROCEDURE
B. SUMMARY GENERATED BY MS WORD2000
C. SUMMARY GENERATED BY EXTRACTOR SOFTWARE
D. SUMMARY GENERATED BY OUR SYSTEM
E. SYSTEM-GENERATED WORD PHRASES FROM TEST SAMPLE
F. WORD PHRASES IDENTIFIED BY SUBJECTS
G. SAMPLE OF QUESTIONNAIRE
H. RESULT OF QUESTIONNAIRE
I. EVALUATION FOR DIVERGE PROCESS
BIBLIOGRAPHY
"A probabilistic approach for automatic text filtering." 1998. http://library.cuhk.edu.hk/record=b5889506.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1998.
Includes bibliographical references (leaves 165-168).
Abstract also in Chinese.
Abstract
Acknowledgment
1 Introduction
1.1 Overview of Information Filtering
1.2 Contributions
1.3 Organization of this thesis
2 Existing Approaches
2.1 Representational issues
2.1.1 Document Representation
2.1.2 Feature Selection
2.2 Traditional Approaches
2.2.1 NewsWeeder
2.2.2 NewT
2.2.3 SIFT
2.2.4 InRoute
2.2.5 Motivation of Our Approach
2.3 Probabilistic Approaches
2.3.1 The Naive Bayesian Approach
2.3.2 The Bayesian Independence Classifier Approach
2.4 Comparison
3 Our Bayesian Network Approach
3.1 Backgrounds of Bayesian Networks
3.2 Bayesian Network Induction Approach
3.3 Automatic Construction of Bayesian Networks
4 Automatic Feature Discretization
4.1 Predefined Level Discretization
4.2 Lloyd's Algorithm
4.3 Class Dependence Discretization
5 Experiments and Results
5.1 Document Collections
5.2 Batch Filtering Experiments
5.3 Batch Filtering Results
5.4 Incremental Session Filtering Experiments
5.5 Incremental Session Filtering Results
6 Conclusions and Future Work
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
"New learning strategies for automatic text categorization." 2001. http://library.cuhk.edu.hk/record=b5890838.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2001.
Includes bibliographical references (leaves 125-130).
Abstracts in English and Chinese.
1 Introduction
1.1 Automatic Textual Document Categorization
1.2 Meta-Learning Approach For Text Categorization
1.3 Contributions
1.4 Organization of the Thesis
2 Related Work
2.1 Existing Automatic Document Categorization Approaches
2.2 Existing Meta-Learning Approaches For Information Retrieval
2.3 Our Meta-Learning Approaches
3 Document Pre-Processing
3.1 Document Representation
3.2 Classification Scheme Learning Strategy
4 Linear Combination Approach
4.1 Overview
4.2 Linear Combination Approach - The Algorithm
4.2.1 Equal Weighting Strategy
4.2.2 Weighting Strategy Based On Utility Measure
4.2.3 Weighting Strategy Based On Document Rank
4.3 Comparisons of Linear Combination Approach and Existing Meta-Learning Methods
4.3.1 LC versus Simple Majority Voting
4.3.2 LC versus BORG
4.3.3 LC versus Restricted Linear Combination Method
5 The New Meta-Learning Model - MUDOF
5.1 Overview
5.2 Document Feature Characteristics
5.3 Classification Errors
5.4 Linear Regression Model
5.5 The MUDOF Algorithm
6 Incorporating MUDOF into Linear Combination Approach
6.1 Background
6.2 Overview of MUDOF2
6.3 Major Components of MUDOF2
6.4 The MUDOF2 Algorithm
7 Experimental Setup
7.1 Document Collection
7.2 Evaluation Metric
7.3 Component Classification Algorithms
7.4 Categorical Document Feature Characteristics for MUDOF and MUDOF2
8 Experimental Results and Analysis
8.1 Performance of Linear Combination Approach
8.2 Performance of the MUDOF Approach
8.3 Performance of MUDOF2 Approach
9 Conclusions and Future Work
9.1 Conclusions
9.2 Future Work
A Details of Experimental Results for Reuters-21578 corpus
B Details of Experimental Results for OHSUMED corpus
Bibliography
"Automatic text categorization for information filtering." 1998. http://library.cuhk.edu.hk/record=b5889734.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1998.
Includes bibliographical references (leaves 157-163).
Abstract also in Chinese.
Abstract
Acknowledgment
List of Figures
List of Tables
1 Introduction
1.1 Automatic Document Categorization
1.2 Information Filtering
1.3 Contributions
1.4 Organization of the Thesis
2 Related Work
2.1 Existing Automatic Document Categorization Approaches
2.1.1 Rule-Based Approach
2.1.2 Similarity-Based Approach
2.2 Existing Information Filtering Approaches
2.2.1 Information Filtering Systems
2.2.2 Filtering in TREC
3 Document Pre-Processing
3.1 Document Representation
3.2 Classification Scheme Learning Strategy
4 A New Approach - IBRI
4.1 Overview of Our New IBRI Approach
4.2 The IBRI Representation and Definitions
4.3 The IBRI Learning Algorithm
5 IBRI Experiments
5.1 Experimental Setup
5.2 Evaluation Metric
5.3 Results
6 A New Approach - GIS
6.1 Motivation of GIS
6.2 Similarity-Based Learning
6.3 The Generalized Instance Set Algorithm (GIS)
6.4 Using GIS Classifiers for Classification
6.5 Time Complexity
7 GIS Experiments
7.1 Experimental Setup
7.2 Results
8 A New Information Filtering Approach Based on GIS
8.1 Information Filtering Systems
8.2 GIS-Based Information Filtering
9 Experiments on GIS-based Information Filtering
9.1 Experimental Setup
9.2 Results
10 Conclusions and Future Work
10.1 Conclusions
10.2 Future Work
A Sample Documents in the Corpora
B Details of Experimental Results of GIS
C Computational Time of Reuters-21578 Experiments
Chou, Kuan-Ming (周冠銘). "Using automatic keywords extraction and text clustering methods for medical information retrieval improvement." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/80362319360586009723.
National Cheng Kung University
Institute of Medical Informatics
Academic year 101 (ROC calendar, i.e., 2012-13)
Because there is a huge amount of data on the web, a search typically returns many duplicate and near-duplicate results. The motivation of this thesis is to reduce the time users spend filtering out this duplicate and near-duplicate information. We propose a novel clustering method to address the near-duplicate problem. Our method transforms each document into a feature vector whose weights are the term frequencies of the corresponding words. To reduce the dimension of these feature vectors, we use principal component analysis (PCA) to transform them into another space. After PCA, we use cosine similarity to compute the similarity between documents, and then apply the EM algorithm with a Neyman-Pearson hypothesis test to cluster the duplicate documents. We compared our results with those of the K-means method; the experiments show that our method outperforms K-means.
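A drastically simplified stand-in for this pipeline is sketched below: it keeps the term-frequency vectors and cosine similarity, but replaces the PCA projection and the EM/Neyman-Pearson clustering with a greedy similarity threshold, so it conveys only the overall shape of the method:

```python
import math
import re
from collections import Counter

def tf(text):
    """Term-frequency vector of a document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    num = sum(u[w] * v[w] for w in set(u) & set(v))
    den = (math.sqrt(sum(c * c for c in u.values()))
           * math.sqrt(sum(c * c for c in v.values())))
    return num / den if den else 0.0

def group_near_duplicates(docs, threshold=0.8):
    """Greedy single-pass grouping: each document joins the first
    existing cluster whose representative it matches above the
    threshold, otherwise it starts a new cluster."""
    clusters = []  # list of (representative_vector, member_indices)
    for i, doc in enumerate(docs):
        v = tf(doc)
        for rep, members in clusters:
            if cosine(v, rep) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((v, [i]))
    return [members for _, members in clusters]
```

Documents landing in the same cluster would then be collapsed into a single search result.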
"Automatic index generation for the free-text based database." Chinese University of Hong Kong, 1992. http://library.cuhk.edu.hk/record=b5887040.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1992.
Includes bibliographical references (leaves 183-184).
1. Introduction
2. Background knowledge and linguistic approaches of automatic indexing
2.1 Definition of index and indexing
2.2 Indexing methods and problems
2.3 Automatic indexing and human indexing
2.4 Different approaches of automatic indexing
2.5 Example of semantic approach
2.6 Example of syntactic approach
2.7 Comments on semantic and syntactic approaches
3. Rationale and methodology of automatic index generation
3.1 Problems caused by natural language
3.2 Usage of word frequencies
3.3 Brief description of rationale
3.4 Automatic index generation
3.4.1 Training phase
3.4.1.1 Selection of training documents
3.4.1.2 Control and standardization of variants of words
3.4.1.3 Calculation of associations between words and indexes
3.4.1.4 Discarding false associations
3.4.2 Indexing phase
3.4.3 Example of automatic indexing
3.5 Related researches
3.6 Word diversity and its effect on automatic indexing
3.7 Factors affecting performance of automatic indexing
3.8 Application of semantic representation
3.8.1 Problem of natural language
3.8.2 Use of concept headings
3.8.3 Example of using concept headings in automatic indexing
3.8.4 Advantages of concept headings
3.8.5 Disadvantages of concept headings
3.9 Correctness prediction for proposed indexes
3.9.1 Example of using index proposing rate
3.10 Effect of subject matter on automatic indexing
3.11 Comparison with other indexing methods
3.12 Proposal for applying Chinese medical knowledge
4. Simulations of automatic index generation
4.1 Training phase simulations
4.1.1 Simulation of association calculation (word diversity uncontrolled)
4.1.2 Simulation of association calculation (word diversity controlled)
4.1.3 Simulation of discarding false associations
4.2 Indexing phase simulation
4.3 Simulation of using concept headings
4.4 Simulation for testing performance of predicting index correctness
4.5 Summary
5. Real case study in database of Chinese Medicinal Material Research Center
5.1 Selection of real documents
5.2 Case study one: Overall performance using real data
5.2.1 Sample results of automatic indexing for real documents
5.3 Case study two: Using multi-word terms
5.4 Case study three: Using concept headings
5.5 Case study four: Prediction of proposed index correctness
5.6 Case study five: Use of (Σ ΔRij) Fi to determine false association
5.7 Case study six: Effect of word diversity
5.8 Summary
6. Conclusion
Appendix A: List of stopwords
Appendix B: Index terms used in case studies
References
(9761117), Shayan Ali A. Akbar. "Source code search for automatic bug localization." Thesis, 2020.
Find full text
Khoo, Christopher S. G. "Automatic identification of causal relations in text and their use for improving precision in information retrieval." Thesis, 1995. http://hdl.handle.net/10150/105106.
Full text
This study represents one attempt to make use of relations expressed in text to improve information retrieval effectiveness. In particular, the study investigated whether the information obtained by matching causal relations expressed in documents with the causal relations expressed in users' queries could be used to improve document retrieval results in comparison to using just term matching without considering relations. An automatic method for identifying and extracting cause-effect information in Wall Street Journal text was developed. The method uses linguistic clues to identify causal relations without recourse to knowledge-based inferencing. The method was successful in identifying and extracting about 68% of the causal relations that were clearly expressed within a sentence or between adjacent sentences in Wall Street Journal text. Of the instances that the computer program identified as causal relations, 72% can be considered to be correct. The automatic method was used in an experimental information retrieval system to identify causal relations in a database of full-text Wall Street Journal documents. Causal relation matching was found to yield a small but significant improvement in retrieval results when the weights used for combining the scores from different types of matching were customized for each query -- as in an SDI or routing-query situation. The best results were obtained when causal relation matching was combined with word proximity matching (matching pairs of causally related words in the query with pairs of words that co-occur within document sentences). An analysis using manually identified causal relations indicates that bigger retrieval improvements can be expected with more accurate identification of causal relations. The best kind of causal relation matching was found to be one in which one member of the causal relation (either the cause or the effect) was represented as a wildcard that could match with any term.
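The abstract's two key ideas — extracting cause-effect pairs from surface linguistic clues, and matching them against a query in which one member is a wildcard — can be illustrated with a minimal sketch. The clue patterns and helper names below are hypothetical stand-ins, not the thesis's actual clue inventory:

```python
import re

# Illustrative sketch only: the thesis identified causal relations in Wall
# Street Journal text via linguistic clues; these two patterns are toy examples.
CAUSAL_PATTERNS = [
    re.compile(r"(?P<cause>[\w\s]+?)\s+(?:causes|caused|leads to|led to)\s+(?P<effect>[\w\s]+)", re.I),
    re.compile(r"because of\s+(?P<cause>[\w\s]+?),\s+(?P<effect>[\w\s]+)", re.I),
]

def extract_causal_pairs(sentence):
    """Return (cause, effect) pairs found via surface linguistic clues."""
    pairs = []
    for pat in CAUSAL_PATTERNS:
        m = pat.search(sentence)
        if m:
            pairs.append((m.group("cause").strip().lower(),
                          m.group("effect").strip().lower()))
    return pairs

def causal_match(query_pair, doc_pair):
    """Match a query relation against a document relation.

    A '*' in the query acts as the wildcard the abstract describes: it
    matches any cause (or any effect) in the document relation.
    """
    qc, qe = query_pair
    dc, de = doc_pair
    return (qc == "*" or qc in dc) and (qe == "*" or qe in de)

pairs = extract_causal_pairs("Rising oil prices caused higher inflation")
# a query asking "what leads to higher inflation?" uses a wildcard cause
assert causal_match(("*", "higher inflation"), pairs[0])
```

In a full system the match score would be combined, with per-query weights, with ordinary term-matching and word-proximity scores, as the abstract reports.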
The study also investigated whether using Roget's International Thesaurus (3rd ed.) to expand query terms with synonymous and related terms would improve retrieval effectiveness. Using Roget category codes in addition to keywords did give better retrieval results. However, the Roget codes were better at identifying the non-relevant documents than the relevant ones. Parts of the thesis were published in: 1. Khoo, C., Myaeng, S.H., & Oddy, R. (2001). Using cause-effect relations in text to improve information retrieval precision. Information Processing and Management, 37(1), 119-145. 2. Khoo, C., Kornfilt, J., Oddy, R., & Myaeng, S.H. (1998). Automatic extraction of cause-effect information from newspaper text without knowledge-based inferencing. Literary & Linguistic Computing, 13(4), 177-186. 3. Khoo, C. (1997). The use of relation matching in information retrieval. LIBRES: Library and Information Science Research Electronic Journal [Online], 7(2). Available at: http://aztec.lib.utk.edu/libres/libre7n2/. An update of the literature review on causal relations in text was published in: Khoo, C., Chan, S., & Niu, Y. (2002). The many facets of the cause-effect relation. In R.Green, C.A. Bean & S.H. Myaeng (Eds.), The semantics of relationships: An interdisciplinary perspective (pp. 51-70). Dordrecht: Kluwer
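The Roget-based expansion described above amounts to representing queries and documents by keywords plus thesaurus category codes, so that synonyms land on a shared code. A minimal sketch, with a hypothetical miniature thesaurus standing in for Roget's International Thesaurus:

```python
# Hypothetical miniature thesaurus mapping words to Roget-style category
# codes; the codes here are invented for illustration.
THESAURUS = {
    "rise": "C102", "increase": "C102", "growth": "C102",
    "fall": "C103", "decline": "C103", "drop": "C103",
}

def expand_query(terms):
    """Represent a bag of terms as keywords plus thesaurus category codes."""
    expanded = set(terms)
    for t in terms:
        code = THESAURUS.get(t)
        if code:
            expanded.add(code)
    return expanded

def score(query_rep, doc_terms):
    """Count overlapping keywords and category codes (simple term matching)."""
    doc_rep = expand_query(doc_terms)
    return len(query_rep & doc_rep)

q = expand_query(["rise", "profits"])
# "growth" shares category C102 with "rise", so the match survives synonymy
assert score(q, ["growth", "profits"]) == 2
```

The study's finding that codes helped mainly to identify non-relevant documents suggests such coarse categories add recall at some cost in precision.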
Williams, Kyle. "Learning to Read Bushman: Automatic Handwriting Recognition for Bushman Languages." Thesis, 2012. http://pubs.cs.uct.ac.za/archive/00000791/.
Full text
"Automatic construction of wrappers for semi-structured documents." 2001. http://library.cuhk.edu.hk/record=b5890663.
Full text
Thesis (M.Phil.)--Chinese University of Hong Kong, 2001.
Includes bibliographical references (leaves 114-123).
Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Information Extraction --- p.1
Chapter 1.2 --- IE from Semi-structured Documents --- p.3
Chapter 1.3 --- Thesis Contributions --- p.7
Chapter 1.4 --- Thesis Organization --- p.9
Chapter 2 --- Related Work --- p.11
Chapter 2.1 --- Existing Approaches --- p.11
Chapter 2.2 --- Limitations of Existing Approaches --- p.18
Chapter 2.3 --- Our HISER Approach --- p.20
Chapter 3 --- System Overview --- p.23
Chapter 3.1 --- Hierarchical record Structure and Extraction Rule learning (HISER) --- p.23
Chapter 3.2 --- Hierarchical Record Structure --- p.29
Chapter 3.3 --- Extraction Rule --- p.29
Chapter 3.4 --- Wrapper Adaptation --- p.32
Chapter 4 --- Automatic Hierarchical Record Structure Construction --- p.34
Chapter 4.1 --- Motivation --- p.34
Chapter 4.2 --- Hierarchical Record Structure Representation --- p.36
Chapter 4.3 --- Constructing Hierarchical Record Structure --- p.38
Chapter 5 --- Extraction Rule Induction --- p.43
Chapter 5.1 --- Rule Representation --- p.43
Chapter 5.2 --- Extraction Rule Induction Algorithm --- p.47
Chapter 6 --- Experimental Results of Wrapper Learning --- p.54
Chapter 6.1 --- Experimental Methodology --- p.54
Chapter 6.2 --- Results on Electronic Appliance Catalogs --- p.56
Chapter 6.3 --- Results on Book Catalogs --- p.60
Chapter 6.4 --- Results on Seminar Announcements --- p.62
Chapter 7 --- Adapting Wrappers to Unseen Information Sources --- p.69
Chapter 7.1 --- Motivation --- p.69
Chapter 7.2 --- Support Vector Machines --- p.72
Chapter 7.3 --- Feature Selection --- p.76
Chapter 7.4 --- Automatic Annotation of Training Examples --- p.80
Chapter 7.4.1 --- Building SVM Models --- p.81
Chapter 7.4.2 --- Seeking Potential Training Example Candidates --- p.82
Chapter 7.4.3 --- Classifying Potential Training Examples --- p.84
Chapter 8 --- Experimental Results of Wrapper Adaptation --- p.86
Chapter 8.1 --- Experimental Methodology --- p.86
Chapter 8.2 --- Results on Electronic Appliance Catalogs --- p.89
Chapter 8.3 --- Results on Book Catalogs --- p.93
Chapter 9 --- Conclusions and Future Work --- p.97
Chapter 9.1 --- Conclusions --- p.97
Chapter 9.2 --- Future Work --- p.100
Chapter A --- Sample Experimental Pages --- p.101
Chapter B --- Detailed Experimental Results of Wrapper Adaptation of HISER --- p.109
Bibliography --- p.114
"Automatic construction and adaptation of wrappers for semi-structured web documents." 2003. http://library.cuhk.edu.hk/record=b5891460.
Full text
Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.
Includes bibliographical references (leaves 88-94).
Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Wrapper Induction for Semi-structured Web Documents --- p.1
Chapter 1.2 --- Adapting Wrappers to Unseen Web Sites --- p.6
Chapter 1.3 --- Thesis Contributions --- p.7
Chapter 1.4 --- Thesis Organization --- p.8
Chapter 2 --- Related Work --- p.10
Chapter 2.1 --- Related Work on Wrapper Induction --- p.10
Chapter 2.2 --- Related Work on Wrapper Adaptation --- p.16
Chapter 3 --- Automatic Construction of Hierarchical Wrappers --- p.20
Chapter 3.1 --- Hierarchical Record Structure Inference --- p.22
Chapter 3.2 --- Extraction Rule Induction --- p.30
Chapter 3.3 --- Applying Hierarchical Wrappers --- p.38
Chapter 4 --- Experimental Results for Wrapper Induction --- p.40
Chapter 5 --- Adaptation of Wrappers for Unseen Web Sites --- p.52
Chapter 5.1 --- Problem Definition --- p.52
Chapter 5.2 --- Overview of Wrapper Adaptation Framework --- p.55
Chapter 5.3 --- Potential Training Example Candidate Identification --- p.58
Chapter 5.3.1 --- Useful Text Fragments --- p.58
Chapter 5.3.2 --- Training Example Generation from the Unseen Web Site --- p.60
Chapter 5.3.3 --- Modified Nearest Neighbour Classification --- p.63
Chapter 5.4 --- Machine Annotated Training Example Discovery and New Wrapper Learning --- p.64
Chapter 5.4.1 --- Text Fragment Classification --- p.64
Chapter 5.4.2 --- New Wrapper Learning --- p.69
Chapter 6 --- Case Study and Experimental Results for Wrapper Adaptation --- p.71
Chapter 6.1 --- Case Study on Wrapper Adaptation --- p.71
Chapter 6.2 --- Experimental Results --- p.73
Chapter 6.2.1 --- Book Domain --- p.74
Chapter 6.2.2 --- Consumer Electronic Appliance Domain --- p.79
Chapter 7 --- Conclusions and Future Work --- p.83
Bibliography --- p.88
Chapter A --- Detailed Performance of Wrapper Induction for Book Domain --- p.95
Chapter B --- Detailed Performance of Wrapper Induction for Consumer Electronic Appliance Domain --- p.99
"Statistical modeling for lexical chains for automatic Chinese news story segmentation." 2010. http://library.cuhk.edu.hk/record=b5894500.
Full text
Thesis (M.Phil.)--Chinese University of Hong Kong, 2010.
Includes bibliographical references (leaves 106-114).
Abstracts in English and Chinese.
Abstract --- p.i
Acknowledgements --- p.v
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Problem Statement --- p.2
Chapter 1.2 --- Motivation for Story Segmentation --- p.4
Chapter 1.3 --- Terminologies --- p.5
Chapter 1.4 --- Thesis Goals --- p.6
Chapter 1.5 --- Thesis Organization --- p.8
Chapter 2 --- Background Study --- p.9
Chapter 2.1 --- Coherence-based Approaches --- p.10
Chapter 2.1.1 --- Defining Coherence --- p.10
Chapter 2.1.2 --- Lexical Chaining --- p.12
Chapter 2.1.3 --- Cosine Similarity --- p.15
Chapter 2.1.4 --- Language Modeling --- p.19
Chapter 2.2 --- Feature-based Approaches --- p.21
Chapter 2.2.1 --- Lexical Cues --- p.22
Chapter 2.2.2 --- Audio Cues --- p.23
Chapter 2.2.3 --- Video Cues --- p.24
Chapter 2.3 --- Pros and Cons and Hybrid Approaches --- p.25
Chapter 2.4 --- Chapter Summary --- p.27
Chapter 3 --- Experimental Corpora --- p.29
Chapter 3.1 --- The TDT2 and TDT3 Multi-language Text Corpus --- p.29
Chapter 3.1.1 --- Introduction --- p.29
Chapter 3.1.2 --- Program Particulars and Structures --- p.31
Chapter 3.2 --- Data Preprocessing --- p.33
Chapter 3.2.1 --- Challenges of Lexical Chain Formation on Chinese Text --- p.33
Chapter 3.2.2 --- Word Segmentation for Word Units Extraction --- p.35
Chapter 3.2.3 --- Part-of-speech Tagging for Candidate Words Extraction --- p.36
Chapter 3.3 --- Chapter Summary --- p.37
Chapter 4 --- Indication of Lexical Cohesiveness by Lexical Chains --- p.39
Chapter 4.1 --- Lexical Chain as a Representation of Cohesiveness --- p.40
Chapter 4.1.1 --- Choice of Word Relations for Lexical Chaining --- p.41
Chapter 4.1.2 --- Lexical Chaining by Connecting Repeated Lexical Elements --- p.43
Chapter 4.2 --- Lexical Chain as an Indicator of Story Segments --- p.48
Chapter 4.2.1 --- Indicators of Absence of Cohesiveness --- p.49
Chapter 4.2.2 --- Indicator of Continuation of Cohesiveness --- p.58
Chapter 4.3 --- Chapter Summary --- p.62
Chapter 5 --- Indication of Story Boundaries by Lexical Chains --- p.63
Chapter 5.1 --- Formal Definition of the Classification Procedures --- p.64
Chapter 5.2 --- Theoretical Framework for Segmentation Based on Lexical Chaining --- p.65
Chapter 5.2.1 --- Evaluation of Story Segmentation Accuracy --- p.65
Chapter 5.2.2 --- Previous Approach of Story Segmentation Based on Lexical Chaining --- p.66
Chapter 5.2.3 --- Statistical Framework for Story Segmentation based on Lexical Chaining --- p.69
Chapter 5.2.4 --- Post Processing of Ratio for Boundary Identification --- p.73
Chapter 5.3 --- Comparing Segmentation Models --- p.75
Chapter 5.4 --- Chapter Summary --- p.79
Chapter 6 --- Analysis of Lexical Chains Features as Boundary Indicators --- p.80
Chapter 6.1 --- Error Analysis --- p.81
Chapter 6.2 --- Window Length in the LRT Model --- p.82
Chapter 6.3 --- The Relative Importance of Each Set of Features --- p.84
Chapter 6.4 --- The Effect of Removing Timing Information --- p.92
Chapter 6.5 --- Chapter Summary --- p.96
Chapter 7 --- Conclusions and Future Work --- p.98
Chapter 7.1 --- Contributions --- p.98
Chapter 7.2 --- Future Work --- p.100
Chapter 7.2.1 --- Further Extension of the Framework --- p.100
Chapter 7.2.2 --- Wider Applications of the Framework --- p.105
Bibliography --- p.106
Cerdeirinha, João Manuel Macedo. "Recuperação de imagens digitais com base no conteúdo: estudo na Biblioteca de Arte e Arquivos da Fundação Calouste Gulbenkian." Master's thesis, 2019. http://hdl.handle.net/10362/91474.
Full text
The massive growth of multimedia data on the Internet and the emergence of new sharing platforms have created major challenges for information retrieval. The limitations of text-based searches for this type of content have led to the development of a content-based information retrieval approach that has received increasing attention in recent decades. Given the research carried out in this area, and with digital images as the focus of this work, concepts and techniques associated with this approach are explored through a theoretical survey that traces the evolution of information retrieval and the importance of this subject for Information Management and Curation. In the context of the systems that have been developed using automatic indexing, the various applications of this type of process are indicated. Available CBIR tools are also identified for a case study applying this type of image retrieval in the context of the Art Library and Archives of the Calouste Gulbenkian Foundation and the photographic collections it holds, considering the particularities of the institution to which they belong. For the intended demonstration and according to the established criteria, online CBIR tools were used initially and, in the following phase, locally installed software was selected to search and retrieve within a specific collection. Through this case study, the strengths and weaknesses of content-based image retrieval are weighed against the more traditional approach based on textual metadata currently in use in these collections. Taking into consideration the needs of users of the systems in which these digital objects are indexed, combining these techniques may lead to more satisfactory results.
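The content-based retrieval the abstract contrasts with text-based search can be sketched in miniature: images represented by quantized colour histograms and compared by cosine similarity. This is a toy illustration of the general technique, not the feature set of any tool surveyed in the thesis; images are assumed to arrive as lists of RGB tuples:

```python
# Minimal CBIR sketch: compare images by coarse colour-histogram similarity.
from collections import Counter
from math import sqrt

def histogram(pixels, bins_per_channel=4):
    """Quantize each RGB channel into coarse bins and count occurrences."""
    step = 256 // bins_per_channel
    return Counter((r // step, g // step, b // step) for r, g, b in pixels)

def cosine_similarity(h1, h2):
    """Cosine similarity between two histograms, in [0, 1]."""
    keys = set(h1) | set(h2)
    dot = sum(h1[k] * h2[k] for k in keys)
    n1 = sqrt(sum(v * v for v in h1.values()))
    n2 = sqrt(sum(v * v for v in h2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Synthetic "images": a mostly-red query, an all-red and an all-blue document.
query = [(250, 10, 10)] * 80 + [(10, 10, 250)] * 20
red_doc = [(250, 10, 10)] * 100
blue_doc = [(10, 10, 250)] * 100

sim_red = cosine_similarity(histogram(query), histogram(red_doc))
sim_blue = cosine_similarity(histogram(query), histogram(blue_doc))
# the mostly-red query should rank the red document above the blue one
assert sim_red > sim_blue
```

Production CBIR systems add texture, shape, and learned features, but the ranking-by-feature-similarity structure is the same.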
Wächter, Thomas. "Semi-automated Ontology Generation for Biocuration and Semantic Search." Doctoral thesis, 2010. https://tud.qucosa.de/id/qucosa%3A25496.
Full text