Dissertations / Theses on the topic 'Query expansion'

Consult the top 50 dissertations / theses for your research on the topic 'Query expansion.'

1

Billerbeck, Bodo. "Efficient Query Expansion." RMIT University. Computer Science and Information Technology, 2006. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20060825.154852.

Abstract:
Hundreds of millions of users each day search the web and other repositories to meet their information needs. However, queries can fail to find documents due to a mismatch in terminology. Query expansion seeks to address this problem by automatically adding terms from highly ranked documents to the query. While query expansion has been shown to be effective at improving query performance, the gain in effectiveness comes at a cost: expansion is slow and resource-intensive. Current techniques for query expansion use fixed values for key parameters, determined by tuning on test collections. We show that these parameters may not be generally applicable, and, more significantly, that the assumption that the same parameter settings can be used for all queries is invalid. Using detailed experiments, we demonstrate that new methods for choosing parameters must be found. In conventional approaches to query expansion, the additional terms are selected from highly ranked documents returned from an initial retrieval run. We demonstrate a new method of obtaining expansion terms, based on past user queries that are associated with documents in the collection. The most effective query expansion methods rely on costly retrieval and processing of feedback documents. We explore alternative methods for reducing query-evaluation costs, and propose a new method based on keeping a brief summary of each document in memory. This method allows query expansion to proceed three times faster than previously, while approximating the effectiveness of standard expansion. We investigate the use of document expansion, in which documents are augmented with related terms extracted from the corpus during indexing, as an alternative to query expansion. The overheads at query time are small. We propose and explore a range of corpus-based document expansion techniques and compare them to corpus-based query expansion on TREC data. 
These experiments show that document expansion delivers at best limited benefits, while query expansion, including standard techniques and efficient approaches described in recent work, usually delivers good gains. We conclude that document expansion is unpromising, but it is likely that the efficiency of query expansion can be further improved.
2

Ekberg-Selander, Karin, and Johanna Enberg. "Query Expansion : en jämförande studie av Automatisk Query Expansion med och utan relevans-feedback." Thesis, Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-18416.

Abstract:
In query expansion (QE), terms are added to an initial query in order to improve retrieval effectiveness. In this thesis we use QE in the sense that the query is reformulated by deleting the terms in the initial query and replacing them with terms from the documents retrieved in the initial run. The aim of this thesis is to study and compare, in an experimental full-text environment, the retrieval results of two different query expansion strategies. The following questions are addressed by the study: How do the two strategies perform in relation to each other regarding recall? What may be causing the result? Are the two strategies retrieving the same relevant documents? Two strategies are designed to simulate a searcher using automatic query expansion (AQE) either with or without relevance feedback. Strategy I simulates AQE without relevance feedback by taking the top five documents retrieved in the initial run and extracting the ten most frequently occurring terms in these to create a new query. Correspondingly, Strategy II simulates AQE with relevance feedback by taking the top five relevant documents and extracting the top ten terms in these to create a new query. It is concluded that the retrieval performance of both strategies improved for most of the topics. On average, Strategy II achieved 54.63 percent recall compared to Strategy I, which achieved 45.59 percent recall. The two strategies retrieved different relevant documents for the majority of the topics. Hence, it would be reasonable to base a system on both of them.
Thesis level: D
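The two strategies described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function names, the whitespace tokenizer, the absence of stop-word removal, and the `is_relevant` callback standing in for relevance judgements are all assumptions.

```python
from collections import Counter

def top_terms(docs, k=10):
    """Return the k most frequent whitespace-separated terms in the documents."""
    counts = Counter()
    for text in docs:
        counts.update(text.lower().split())
    return [term for term, _ in counts.most_common(k)]

def expand_without_feedback(ranked_docs, n_docs=5, k=10):
    """Strategy I (assumed form): new query from the top-ranked documents."""
    return top_terms(ranked_docs[:n_docs], k)

def expand_with_feedback(ranked_docs, is_relevant, n_docs=5, k=10):
    """Strategy II (assumed form): new query from the top-ranked relevant documents."""
    relevant = [doc for doc in ranked_docs if is_relevant(doc)]
    return top_terms(relevant[:n_docs], k)

def recall(retrieved_ids, relevant_ids):
    """Fraction of the relevant documents that were retrieved."""
    relevant_ids = set(relevant_ids)
    if not relevant_ids:
        return 0.0
    return len(set(retrieved_ids) & relevant_ids) / len(relevant_ids)
```

The only difference between the two functions is which documents feed the term counter, which is why the strategies can end up retrieving different relevant documents.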
3

Cheang, Chan Wa. "Web query expansion by WordNet." Thesis, University of Macau, 2005. http://umaclib3.umac.mo/record=b1445899.

4

Brandao, Wladmir Cardoso. "Exploiting entities for query expansion." Universidade Federal de Minas Gerais, 2013. http://hdl.handle.net/1843/ESBF-9GMJW2.

Abstract:
A substantial fraction of web search queries contain references to entities, such as persons, organizations, and locations. In this work, we propose entity-oriented query expansion approaches that exploit semantic sources of evidence to devise discriminative term features, together with machine learning techniques that effectively combine these features to rank candidate expansion terms. In particular, our unsupervised approach (UQEE) uses taxonomic features derived from the semantic structure implicitly provided by infobox templates, while our learning-to-rank approach (L2EE) considers semantic evidence encoded in the content of Wikipedia article fields to automatically label training examples proportionally to their observed retrieval effectiveness. Lastly, we propose a self-supervised approach to autonomously generate infoboxes for Wikipedia articles (WAVE). Experiments attest to the effectiveness of our approaches, with significant gains compared to state-of-the-art PRF and ePRF approaches.
A substantial fraction of the queries submitted to web search engines refer to entities, such as people, organizations, and places. In the present work, we propose entity-oriented approaches to query expansion that exploit semantic aspects of knowledge bases to derive discriminative term evidence, along with machine learning techniques, in order to combine the evidence effectively and obtain a ranking of candidate expansion terms. In particular, our unsupervised approach (UQEE) uses evidence derived from the semantic structure implicit in the infobox templates of Wikipedia articles, while our learning-to-rank approach (L2EE) considers semantic evidence derived from the content of Wikipedia article fields to automatically label training examples in proportion to the effectiveness observed in retrieval. In addition, we propose a self-supervised approach for automatically generating infoboxes for Wikipedia articles (WAVE). Experiments confirm the effectiveness of our approaches, with significant gains compared to state-of-the-art approaches in pseudo-relevance feedback (PRF) and entity-based PRF.
5

Höglund, Sofia. "Query expansion med semantiskt relaterade termer." Thesis, Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-16707.

Abstract:
The aim of this master's thesis is to examine query expansion. Query expansion is the process of adding new terms to a query to improve retrieval effectiveness. In this study the baseline query was expanded in five different ways. The first expansion strategy used inflected forms of the terms in the baseline query: because the system does not allow truncation, I inflected the terms manually, e.g. airplane, airplanes, the airplanes, etc. The second expansion strategy consisted of synonyms from a dictionary of synonyms. For the terms in the remaining three expansion strategies, a general thesaurus containing hierarchical and associative term relationships was used in order to find broader, narrower and related expansion terms. Relative recall and average precision were measured. The average results show that the expansion with inflected terms from the baseline query gave the most effective retrieval, in terms of increase in both relative recall and average precision.
Thesis level: D
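The five expansion modes in this abstract (inflections, synonyms, broader, narrower and related terms) can be illustrated with a small lookup-based sketch. The thesaurus layout, the relation names and the sample entries below are hypothetical and are not taken from the thesis.

```python
# Hypothetical thesaurus: term -> relation -> list of expansion terms.
THESAURUS = {
    "airplane": {
        "inflected": ["airplanes"],
        "synonym": ["aeroplane"],
        "broader": ["aircraft"],
        "narrower": ["jet"],
        "related": ["airport"],
    },
}

def expand_query(terms, thesaurus, relations):
    """Return the baseline terms followed by expansion terms for the chosen relations."""
    expanded = list(terms)
    for term in terms:
        entry = thesaurus.get(term, {})
        for relation in relations:
            for candidate in entry.get(relation, []):
                if candidate not in expanded:  # avoid duplicate query terms
                    expanded.append(candidate)
    return expanded
```

Calling `expand_query(["airplane"], THESAURUS, ["broader"])` would produce one of the five expanded queries; each mode is then evaluated separately against the baseline.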
6

Bilotti, Matthew W. (Matthew William) 1981. "Query expansion techniques for question answering." Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/27083.

Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004.
Includes bibliographical references (p. 105-109).
Query expansion is a technique used to boost performance of a document retrieval engine, such as those commonly found in question answering (QA) systems. Common methods of query expansion for Boolean keyword-based document retrieval engines include inserting query terms, such as alternate inflectional or derivational forms generated from existing query terms, or dropping query terms that are, for example, deemed to be too restrictive. In this thesis, I present a quantitative evaluation against a test collection of my own design of five query expansion techniques: two term expansion methods and three term-dropping strategies. I present results that show that there exist best-performing query expansion algorithms that can be experimentally optimized for specific tasks. My findings pose questions that suggest interesting avenues for further study of query expansion algorithms.
by Matthew W. Bilotti.
M.Eng.
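As a rough illustration of the two families of techniques named in this abstract, the sketch below runs a Boolean AND-of-ORs query over a toy inverted index, widens terms with variant forms, and drops terms when too few documents match. The dropping criterion (remove the group that matches the fewest documents) and all names are assumptions made for illustration; they are not the strategies evaluated in the thesis.

```python
def boolean_and(index, groups):
    """Evaluate an AND of OR-groups; index maps term -> set of document ids."""
    result = None
    for group in groups:
        docs = set()
        for term in group:
            docs |= index.get(term, set())
        result = docs if result is None else result & docs
    return result or set()

def expand_terms(groups, variants):
    """Term expansion: add known variant forms to each OR-group."""
    return [set(g) | {v for t in g for v in variants.get(t, [])} for g in groups]

def drop_until(index, groups, min_hits):
    """Term dropping (assumed rule): drop the most restrictive group until enough hits."""
    groups = list(groups)
    hits = boolean_and(index, groups)
    while len(hits) < min_hits and len(groups) > 1:
        most_restrictive = min(groups, key=lambda g: len(boolean_and(index, [g])))
        groups.remove(most_restrictive)
        hits = boolean_and(index, groups)
    return groups, hits
```

Expansion makes each conjunct easier to satisfy, while dropping removes conjuncts altogether, so the two can be combined in either order.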
7

Khandpur, Rupinder P. "Augmenting Dynamic Query Expansion in Microblog Texts." Diss., Virginia Tech, 2018. http://hdl.handle.net/10919/84852.

Abstract:
Dynamic query expansion is a method of automatically identifying terms relevant to a target domain based on an incomplete query input. With the explosive growth of online media, such tools are essential for efficiently refining search results to track emerging themes in noisy, unstructured text streams. They are crucial for large-scale predictive analytics and decision-making systems, which use open source indicators to find meaningful information rapidly and accurately. The problems of information overload and semantic mismatch are systemic during the Information Retrieval (IR) tasks undertaken by such systems. In this dissertation, we develop approaches to dynamic query expansion algorithms that can help improve the efficacy of such systems using only a small set of seed queries and that require no training or labeled samples. We primarily investigate four significant problems related to the retrieval and assessment of event-related information, viz. (1) How can we adapt the query expansion process to support rank-based analysis when tracking a fixed set of entities? A scalable framework is essential to allow relative assessment of emerging themes such as airport threats. (2) What visual knowledge discovery framework should we adopt to incorporate users' feedback into the search result refinement process? This is a crucial step to efficiently integrate real-time 'situational awareness' when monitoring specific themes using open source indicators. (3) How can we contextualize query expansions? We focus on capturing semantic relatedness between a query and reference text so that the expansion can quickly adapt to different target domains. (4) How can we synchronously perform knowledge discovery and characterization (unstructured to structured) during the retrieval process? We mainly aim to model high-order, relational aspects of event-related information from microblog texts.
Ph. D.
8

Zhuang, Wenjie. "Query Expansion Study for Clinical Decision Support." Thesis, Virginia Tech, 2018. http://hdl.handle.net/10919/82068.

Abstract:
Information retrieval is widely used for retrieving relevant information from a variety of data, such as text documents, images, audio and videos. Since the first medical batch retrieval system was developed in the mid-1960s, significant research efforts have focused on applying information retrieval to medical data. However, despite the vast developments in medical information retrieval and accompanying technologies, the actual promise of this area remains unfulfilled due to properties of medical data and the huge volume of medical literature. Specifically, the recall and precision of the selected dataset from the TREC clinical decision support track are low. The overriding objective of this thesis is to improve the performance of information retrieval techniques applied to biomedical text documents. We have focused on improving recall and precision among the top retrieved results. To that end, we have removed redundant words, and then expanded queries by adding MeSH terms in TREC CDS topics. We have also used other external data sources and domain knowledge to implement the expansion. In addition, we have considered using the doc2vec model to optimize retrieval. Finally, we have applied learning to rank, which sorts documents based on relevance and puts relevant documents ahead of irrelevant ones, so that the relevant retrieved data is returned at the top. We have discovered that queries expanded with external data sources and domain knowledge perform better than applying the TREC topic information directly.
Master of Science
9

Seher, Indra. "A personalised query expansion approach using context." University of Western Sydney, 2007. http://handle.uws.edu.au:8081/1959.7/33427.

Abstract:
Thesis (Ph.D.)--University of Western Sydney, 2007.
A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy to the College of Health & Science, School of Computing and Mathematics, University of Western Sydney. Includes bibliography.
10

Qiu, Yonggang. "Automatic query expansion based on a similarity thesaurus." [S.l.] : [s.n.], 1995. http://e-collection.ethbib.ethz.ch/show?type=diss&nr=11158.

11

Ibrahim, Duraid M. "Natural language query translation and expansion in information retrieval." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape3/PQDD_0024/MQ51722.pdf.

12

Magennis, Mark. "The potential and actual effectiveness of interactive query expansion." Thesis, University of Glasgow, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.360116.

13

Axensten, Siri. "En komparativ litteraturstudie av olika termkällor för query expansion." Thesis, Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-17718.

Abstract:
The purpose of this thesis is to make a comparative literature study of ten laboratory experiments with different kinds of sources for query expansion terms. The experiments are grouped according to categories of term sources, which are: search results, collection dependent knowledge structures, collection independent knowledge structures and one that combines the last two sources into one. To enable a comparison, all other variables were held as constant as possible. The improvement in mean average precision is used to measure the potential of the various sources. The result of the study shows a strong connection between the kind of source for the expansion terms and the improvement of the result. The experiments were grouped into units A to D based on their results, the best result being A, followed by B, and so forth. The idea behind these units is also to show potential common characteristics of the query expansion strategies. Unit A consists of the combined knowledge structure and has shown considerably better results than the others. The hypothesis of this experiment was that different knowledge structures have various characteristics that together reinforce each other. The experiments in unit B all use the collection as term source, including the search result as such, and are also all statistically based. The only experiment using NLP techniques and a linguistically based measure between terms constitutes unit C. Unit D consists of all experiments in which collection independent sources were used for query expansion.
Thesis level: D
14

Eklund, Johan, and Anders Stenström. "En komparativ studie av fem rankningsalgoritmer för query expansion." Thesis, Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan, 2002. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-18322.

Abstract:
The purpose of this thesis is to compare five different ranking algorithms for query expansion. The algorithms compared are f4, f4mod, porter, wpq, and emim. This is done using a TREC collection, a selection of topics, and relevance judgements. Relative recall is measured before and after the expansion of the query. The study shows that all of the algorithms manage to increase the relative recall, with f4 being the most successful.
Thesis level: D
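For orientation, three of the term-ranking functions named in this abstract are commonly written as in the sketch below, where N is the number of documents in the collection, n the number containing the term, R the number of known relevant documents and r the number of those containing the term. These are the standard textbook forms; the exact variants used in the thesis (and the f4mod and emim functions, which are omitted here) may differ.

```python
import math

def f4(N, n, R, r):
    """Robertson/Sparck Jones relevance weight with the usual 0.5 correction."""
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

def porter(N, n, R, r):
    """Porter weight: share of relevant documents with the term minus its overall share."""
    return r / R - n / N

def wpq(N, n, R, r):
    """wpq: the f4 weight scaled by (p - q), with p = r/R and q = (n - r)/(N - R)."""
    p = r / R
    q = (n - r) / (N - R)
    return f4(N, n, R, r) * (p - q)

def rank_terms(stats, N, R, score=wpq):
    """stats maps term -> (n, r); return terms sorted by descending score."""
    return sorted(stats, key=lambda t: score(N, stats[t][0], R, stats[t][1]),
                  reverse=True)
```

The top-ranked candidate terms from `rank_terms` would then be appended to the query, and relative recall compared before and after expansion.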
15

Wien, Sigurd. "Efficient Top-K Fuzzy Interactive Query Expansion While Formulating a Query : From a Performance Perspective." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, 2013. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-23010.

Abstract:
Interactive query expansion and fuzzy search are two efficient techniques for assisting a user in an information retrieval process. Interactive query expansion helps the user refine a query by giving suggestions on how a query might be extended to further specify the actual information need of the user. Fuzzy search, on the other hand, supports the user by including results for terms that approximately equal the query string. This avoids reformulating queries with slight misspellings and will retrieve results for indexed terms not spelled as expected. This study will look at the performance aspects of combining these concepts to give the user real-time suggestions on how to complete the query as the query is formulated letter by letter. These suggestions will be a set of terms from the index that are fuzzy matches of the query string terms, and are chosen based on the individual rank of the term, the semantic correlation between the individual terms and the edit distance between the query and the suggestion. The combination of these techniques is challenging from a performance aspect because each of them requires a lot of computation, and their relationship is such that these computations will be multiplicative when combined. Giving suggestions letter by letter as the user types requires a lookup for each letter, and fuzzy search will expand each of these lookups with the fuzzy matches of the prefix to match against the index. For each of these different completions of the fuzzy matched prefixes, we will need to calculate the semantic correlation it has to the previously matched terms. This study will present three algorithms to give top-k suggestions for the single term case and then extend these in three ways to handle multi-term queries. These algorithms will use a trie-based term index with some extensions to enable fast lookup of top-k terms that match a given prefix and to assess the semantic correlation between the terms in the suggestion.
The performance review will demonstrate that our approach will be viable to use for presenting the user with suggestions in real time even with a fairly large number of terms.
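A minimal sketch of the single-term case may help make the idea concrete: a trie of indexed terms is walked while maintaining a Levenshtein row, so that every node within `max_edits` of the typed prefix contributes its completions. This is not one of the three algorithms from the thesis; the class names, the ordering (fewest edits first, then highest rank score) and the omission of semantic correlation are all simplifying assumptions.

```python
import heapq

class TrieNode:
    def __init__(self):
        self.children = {}
        self.term = None   # set when a complete term ends at this node
        self.score = 0.0   # rank score of that term

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, term, score):
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
        node.term, node.score = term, score

    def _completions(self, node):
        """Yield (term, score) for every term stored at or below node."""
        stack = [node]
        while stack:
            cur = stack.pop()
            if cur.term is not None:
                yield cur.term, cur.score
            stack.extend(cur.children.values())

    def suggest(self, prefix, k=5, max_edits=1):
        """Top-k terms having some prefix within max_edits edits of the typed prefix."""
        best = {}  # term -> (edits, score), keeping the fewest edits seen

        def walk(node, row):
            if row[-1] <= max_edits:  # path to this node fuzzily matches the prefix
                for term, score in self._completions(node):
                    if term not in best or row[-1] < best[term][0]:
                        best[term] = (row[-1], score)
            if min(row) > max_edits:  # no descendant can match any more
                return
            for ch, child in node.children.items():
                new_row = [row[0] + 1]
                for i in range(1, len(prefix) + 1):
                    cost = 0 if prefix[i - 1] == ch else 1
                    new_row.append(min(new_row[i - 1] + 1, row[i] + 1, row[i - 1] + cost))
                walk(child, new_row)

        walk(self.root, list(range(len(prefix) + 1)))
        ranked = heapq.nsmallest(k, best.items(),
                                 key=lambda kv: (kv[1][0], -kv[1][1], kv[0]))
        return [term for term, _ in ranked]
```

Calling `suggest` again after every keystroke repeats the whole walk, which hints at why the abstract stresses the cost of combining per-letter lookups with fuzzy matching.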
16

Hellström, Else-Britt. "Den kombinerade effekten av query-expansion och query-strukturer på återvinningseffektiviteten i ett probabilistiskt system." Thesis, Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-17554.

Abstract:
This thesis deals with query formulation in full text retrieval. The variables studied were query expansion and query structure, and the aim of the thesis is to study their co-effects on retrieval performance in a probabilistic system. The expansion was made with synonyms, and the structure by expressing facets and phrases. The study was performed using QPA, Query Performance Analyzer, which includes the InQuery retrieval system and a sub-collection of TREC documents with its topics. The measurement used was precision at 11 DCV points. The best result was obtained by queries with no expansion and a structure where all terms had equal influence. All other combinations of expansion and structure used in this study were counterproductive. The results indicate that extensive expansion including all kinds of term relations, and a faceted structure, as used in earlier studies, are more favourable than the rather limited expansion and structures used in this study. The possible influence of the fact that all queries included phrases is discussed. The methods to structure phrases and how to create their synonyms are discussed as important variables for further investigations. The importance of how expansion terms are chosen as well as good term sources is also underlined.
Thesis level: D
17

Guisado, Gámez Joan. "Query expansion by relying on the structure of knowledge bases." Doctoral thesis, Universitat Politècnica de Catalunya, 2017. http://hdl.handle.net/10803/460767.

Abstract:
Query expansion techniques aim at improving the results achieved by a user's query by introducing new expansion terms, called expansion features. Expansion features introduce new concepts that are semantically related to the concepts in the user's query and that allow retrieving documents that would otherwise not be retrieved. Thus, the challenge is to select those expansion features that are capable of improving the results the most. A bad choice of expansion features may be counterproductive. In this thesis, we use an external source of information, a Knowledge Base (KB), as a source of expansion features. A knowledge base consists of a set of entries, each of which represents a concept and has, at least, a name, which can be used as an expansion feature. The techniques in this family have become more popular due to the increase in available data, for example Wikipedia. In particular, we focus on exploiting those KBs whose entries are linked to each other, forming a graph of entries. To the best of our knowledge, most of the techniques in the KB family rely on some kind of text analysis, such as explicit semantic analysis, or are based on other existing query expansion techniques such as pseudo relevance feedback. However, the underlying network structure of KBs has barely been exploited. In this thesis, we show that the structure can be used to identify reliable expansion features for the query expansion process. Thus, we design a novel expansion technique, Structural Query Expansion (SQE). For SQE to benefit from the particular structures of KBs, we propose a methodology to identify the structural characteristics that, given a query, allow identifying those nodes in the KB that are good candidates to be used as a source of expansion features, called from now on expansion nodes.
The methodology consists in building a ground truth that connects each query from a query set with those nodes of the KB that, when used to extract the expansion features, achieve the best results in terms of precision; we call the set of those nodes the expansion query graph. Then, we compare the expansion query graphs of the queries to find shared characteristics. SQE materializes the revealed characteristics into a set of structural motifs. In the particular case of Wikipedia, we have found two motifs, called triangular and square. In the former, the query node and the expansion node are doubly linked and the expansion node belongs to, at least, the same categories as the query node. In the latter, the query node and the expansion node are also doubly linked and their categories are connected somehow. These motifs are used to identify, given a query and its query nodes, all the expansion nodes that are used as a source of expansion features. Notice that we have designed this technique to be orthogonal to others because it is fully decoupled from the search process and does not depend on the particular collection of documents. We have tested our techniques with three different datasets to avoid any kind of overfitting. The results are shown to be consistent among the three of them. Also, the results, which are validated with statistical significance tests, show that SQE is capable of achieving up to 150% improvement in precision. Finally, we show the performance of our technique, which runs in sub-second times (358.23 ms at maximum), making it feasible for a real query expansion system. This is especially relevant because, to the best of our knowledge, performance is an aspect that is ignored in most works and, thus, it is difficult to know whether they can be included in real systems or not.
Query expansion techniques aim to improve the results obtained by a user's query by introducing expansion terms, called expansion features. Expansion features introduce new concepts that are semantically related to the concepts in the user's query and that make it possible to retrieve documents that could not otherwise be retrieved. The challenge, therefore, is to select the expansion features that are capable of improving the results the most, since a bad choice can be counterproductive. In this thesis, we use an external source of information, a Knowledge Base (KB), as a source of expansion features. A KB is a set of entries, each of which represents a concept and has, at least, a name, which can be used as an expansion feature. The techniques in this family have become popular due to the growth of available information, for example Wikipedia. In particular, we focus on KBs whose entries are related to each other, thereby forming a graph of entries. To the best of our knowledge, most of the techniques in this family use some kind of linguistic analysis or are based on other techniques such as relevance feedback. However, the underlying structure of the network has hardly been used. In this thesis, we show that the structure can be used to identify reliable expansion features for the query expansion process. Indeed, we propose a novel expansion technique, Structural Query Expansion (SQE), that exploits it.
So that SQE can benefit from the structural particularities of KBs, we have also proposed a methodology to reveal the structural characteristics that, given a query, make it possible to identify the nodes that are a good source of expansion features, the so-called expansion nodes. This methodology consists of building a ground truth that relates a set of queries to their optimal expansion query graphs. The optimal expansion query graph is the set of expansion nodes that, when used as a source of expansion features, yield the best results in terms of precision. Once we have the optimal expansion query graphs, we compare them with each other to look for shared characteristics. SQE materializes these characteristics into a set of structural motifs. In the case of Wikipedia we have found two motifs: the triangular and the square. In both cases the query node must be doubly linked with the expansion node. In the triangular motif, the expansion node must belong to, at least, the same categories as the query node, while in the square motif it is enough that the categories of the query node and the expansion node are related. These motifs are used to identify, given a query, all of its expansion nodes. We have designed this technique to be orthogonal to others, since it is decoupled from the search process and does not depend on the document collection. We have tested our technique with three different datasets to avoid any kind of specialization. The results are consistent across the three. We have validated the results with statistical significance tests, obtaining improvements of 150% in precision. Finally, regarding the performance of our proposal, we show that it runs in milliseconds, which makes it suitable for use in real expansion systems.
This is especially relevant because, to the best of our knowledge, this is an aspect that is ignored in the literature and, therefore, it is difficult to know the viability of existing proposals in real environments.
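The triangular and square motifs described in this abstract can be sketched over a toy knowledge base as follows. The dictionary-based graph representation, the function names and, in particular, the reading of "categories are connected" as a direct link between one category of each node are assumptions made for illustration; the thesis may define these differently.

```python
def doubly_linked(links, a, b):
    """True when a links to b and b links back to a."""
    return b in links.get(a, set()) and a in links.get(b, set())

def is_triangular(links, categories, query, candidate):
    """Doubly linked, and the candidate has at least the query node's categories."""
    q_cats = categories.get(query, set())
    return (doubly_linked(links, query, candidate)
            and bool(q_cats)
            and q_cats <= categories.get(candidate, set()))

def is_square(links, categories, category_links, query, candidate):
    """Doubly linked, and some category of one node links to a category of the other."""
    if not doubly_linked(links, query, candidate):
        return False
    q_cats = categories.get(query, set())
    c_cats = categories.get(candidate, set())
    return any(c in category_links.get(q, set()) or q in category_links.get(c, set())
               for q in q_cats for c in c_cats)

def expansion_nodes(links, categories, category_links, query):
    """Map each expansion node reachable from the query node to the motif it satisfies."""
    found = {}
    for candidate in links.get(query, set()):
        if is_triangular(links, categories, query, candidate):
            found[candidate] = "triangular"
        elif is_square(links, categories, category_links, query, candidate):
            found[candidate] = "square"
    return found
```

Because the functions only inspect links and categories, the sketch reflects the abstract's point that the technique is decoupled from the search process and from the document collection.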
18

Efthimiadis, Efthimis Nikolaos. "Interactive query expansion and relevance feedback for document retrieval systems." Thesis, City University London, 1992. http://openaccess.city.ac.uk/7891/.

Abstract:
This thesis is aimed at investigating interactive query expansion within the context of a relevance feedback system that uses term weighting and ranking in searching online databases that are available through online vendors. Previous evaluations of relevance feedback systems have been made in laboratory conditions and not in a real operational environment. The research presented in this thesis followed the idea of testing probabilistic retrieval techniques in an operational environment. The overall aim of this research was to investigate the process of interactive query expansion (IQE) from various points of view including effectiveness. The INSPEC database, on both Data-Star and ESA-IRS, was searched online using CIRT, a front-end system that allows probabilistic term weighting, ranking and relevance feedback. The thesis is divided into three parts. Part I of the thesis covers background information and appropriate literature reviews with special emphasis on the relevance weighting theory (Binary Independence Model), the approaches to automatic and semi-automatic query expansion, the ZOOM facility of ESA/IRS and the CIRT front-end. Part II is comprised of three Pilot case studies. It introduces the idea of interactive query expansion and places it within the context of the weighted environment of CIRT. Each Pilot study looked at different aspects of the query expansion process by using a front-end. The Pilot studies were used to answer methodological questions and also research questions about the query expansion terms. The knowledge and experience that was gained from the Pilots was then applied to the methodology of the study proper (Part III). Part III discusses the Experiment and the evaluation of the six ranking algorithms. The Experiment was conducted under real operational conditions using a real system, real requests, and real interaction. 
Emphasis was placed on the characteristics of the interaction, especially on the selection of terms for query expansion. Data were collected from 25 searches. The data collection mechanisms included questionnaires, transaction logs, and relevance evaluations. The results of the Experiment are presented according to their treatment of query expansion as main results and other findings in Chapter 10. The main results discuss issues that relate directly to query expansion, retrieval effectiveness, the correspondence of the online-to-offline relevance judgements, and the performance of the w(p - q) ranking algorithm. Finally, a comparative evaluation of six ranking algorithms was performed. The yardstick for the evaluation was provided by the user relevance judgements on the lists of the candidate terms for query expansion. The evaluation focused on whether there are any similarities in the performance of the algorithms and how those algorithms with similar performance treat terms. This abstract refers only to the main conclusions drawn from the results of the Experiment: (1) On average, one third of the terms presented in the list of candidate terms were identified by the users as potentially useful for query expansion; (2) These terms were mainly judged as either variant expressions (synonyms) or alternative (related) terms to the initial query terms. However, a substantial portion of the selected terms were identified as representing new ideas. (3) The relationship of the 5 best terms chosen by the users for query expansion to the initial query terms was: (a) 34% have no relationship or other type of correspondence with a query term; (b) 66% of the query expansion terms have a relationship which makes the term: (b1) a narrower term (70%), (b2) a broader term (5%), (b3) a related term (25%). (4) The results provide some evidence for the effectiveness of interactive query expansion. 
The initial search produced on average 3 highly relevant documents at a precision of 34%; the query expansion search produced on average 9 further highly relevant documents at slightly higher precision. (5) The results demonstrated the effectiveness of the w(p - q) algorithm, for the ranking of terms for query expansion, within the context of the Experiment. (6) The main results of the comparative evaluation of the six ranking algorithms, i.e. w(p - q), EMIM, F4, F4modified, Porter and ZOOM, are that: (a) w(p - q) and EMIM performed best; and (b) the performance between w(p - q) and EMIM and between F4 and F4modified is very similar; (7) A new ranking algorithm is proposed as the result of the evaluation of the six algorithms. Finally, an investigation is by definition an exploratory study which generates hypotheses for future research. Recommendations and proposals for future research are given. The conclusions highlight the need for more research on weighted systems in operational environments, for a comparative evaluation of automatic vs interactive query expansion, and for user studies in searching weighted systems.
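To make the idea of ranking candidate expansion terms concrete, the following is a minimal sketch of a w(p - q) style score in the form usually given in the probabilistic retrieval literature: a relevance weight multiplied by (p - q), where p is the proportion of known relevant documents containing the term and q the proportion of the remaining documents containing it. The function names, the 0.5 smoothing constants, and the toy counts are assumptions for illustration; the abstract does not state the exact formula used in CIRT.

```python
import math


def wpq(r, n, R, N):
    """Score one candidate term (illustrative formulation, not taken from the thesis).

    r: known relevant documents containing the term
    n: documents in the collection containing the term
    R: known relevant documents
    N: documents in the collection
    """
    # Relevance weight with the customary 0.5 smoothing.
    rw = math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                  ((n - r + 0.5) * (R - r + 0.5)))
    p = r / R              # share of relevant documents containing the term
    q = (n - r) / (N - R)  # share of the other documents containing the term
    return rw * (p - q)


def rank_terms(stats, R, N):
    """Return candidate terms ordered by descending score.

    stats maps each term to a (r, n) pair.
    """
    return sorted(stats, key=lambda t: wpq(stats[t][0], stats[t][1], R, N),
                  reverse=True)


# Hypothetical counts for three candidate terms.
TOY_STATS = {"retrieval": (8, 50), "system": (9, 5000), "feedback": (2, 20)}
```

In this toy data, "system" occurs in most relevant documents but also in half the collection, so the (p - q) factor pushes it below rarer, more discriminating terms.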
19

Bhogal, Jagdev. "Investigating ontology based query expansion using a probabilistic retrieval model." Thesis, City University London, 2011. http://openaccess.city.ac.uk/2946/.

Abstract:
This research briefly outlines the problems of traditional information retrieval systems and discusses the different approaches to inferring context in document retrieval. By context we mean word disambiguation, which is achieved by exploring the generalisation-specialisation hierarchies within a given ontology. Specifically, we examine the use of ontology-based query expansion for defining query context. Query expansion can be done in many ways, and in this work we consider the use of relevance feedback and pseudo-relevance feedback for query expansion. We compare the two to ascertain whether their performance differs. The information retrieval system used is based on the probabilistic retrieval model, and the query expansion method is extended using information from a news domain ontology. The aim of this project is to assess the impact of the use of the ontology on the query expansion results. Our results show that ontology-based query expansion retrieves a higher number of relevant documents than the standard relevance feedback process. Overall, ontology-based query expansion improves recall but does not produce any significant improvements in precision. Pseudo-relevance feedback achieved better results than relevance feedback. We also found that reducing or increasing the relevance feedback parameters (number of terms or number of documents) does not correlate with the results. When comparing the effect of varying the number-of-terms parameter with that of varying the number-of-documents parameter, the former benefits the pseudo-relevance feedback results but the latter has an additional effect on the relevance feedback results. There are many factors which influence the success of ontology-based query expansion. The thesis discusses these factors and gives some guidelines on using ontologies for the purpose of query expansion.
20

Kojic, Kemal, and Emil Petersson. "Automatisk synonymgenerering med Word2Vec for query expansion inom e-handel." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20818.

Abstract:
In this thesis, we examine how well automatic synonym generation with the machine learning method Word2Vec, trained on a Google News data set of one hundred billion words, is suited to query expansion in e-commerce. We use product and event data from a well-known fashion company: synonyms are generated by different methods from search queries logged in the event data, producing thesauri that are used for query expansion in future searches. To answer the research questions, we first perform a quantitative analysis of data such as matched purchases, product hits, no-hits and search time. This data is generated by a search simulator that replays logged events from user sessions in an e-commerce system. The generated thesauri are then filtered by removing synonyms connected to search queries that produced worse results in the simulation with synonyms than without. To validate the quantitative results, we also perform a qualitative analysis of the differences in the search results produced by the different methods, examining which products the synonyms retrieve in order to assess their relevance. Our tests show that a lower threshold leads to more product hits and fewer no-hits: the number of product hits increased by 4%-10% and the number of no-hits fell by 11%-22%. Where a search query has been assigned good synonyms, product relevance improves because more relevant products appear in the search result. Where it has been assigned poorer synonyms, relevance suffers because some irrelevant products appear that the user probably does not want to see. In all tests using automatically generated synonyms, the majority of purchased products appear in the first half of the search result; however, the number of purchased products in the first position of the search result decreased in all cases.
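The role of the similarity threshold mentioned in the abstract can be illustrated with a small sketch. It assumes that synonyms are chosen by cosine similarity between embedding vectors and that a term is added to the query when its similarity reaches the threshold; the abstract does not describe the actual selection procedure, and the three-dimensional vectors, names, and threshold values below are invented stand-ins for real Word2Vec embeddings.

```python
import math

# Hypothetical toy embeddings standing in for trained Word2Vec vectors.
TOY_VECTORS = {
    "jacket": (1.0, 0.1, 0.0),
    "coat": (0.9, 0.2, 0.0),
    "blazer": (0.7, 0.6, 0.1),
    "sandal": (0.0, 0.1, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def synonyms(term, vectors, threshold):
    """Return, alphabetically, every other word whose similarity meets the threshold."""
    base = vectors[term]
    return sorted(w for w, v in vectors.items()
                  if w != term and cosine(base, v) >= threshold)


def expand_query(query, vectors, threshold):
    """Append synonyms of each known query term, keeping the original terms first."""
    terms = query.split()
    expanded = list(terms)
    for t in terms:
        if t in vectors:
            for s in synonyms(t, vectors, threshold):
                if s not in expanded:
                    expanded.append(s)
    return expanded
```

Lowering the threshold from 0.9 to 0.8 admits "blazer" in addition to "coat" for the query term "jacket", which mirrors the reported trade-off: more product hits, at the risk of less relevant ones.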
21

Dai, James Jian 1982. "Visual intelligence for online communities : commonsense image retrieval by query expansion." Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/26916.

Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2004.
Includes bibliographical references (leaves 65-67).
This thesis explores three weaknesses of keyword-based image retrieval through the design and implementation of an actual image retrieval system. The first weakness is the requirement of heavy manual annotation of keywords for images. We investigate this weakness by aggregating the annotations of an entire community of users to alleviate the annotation requirements on the individual user. The second weakness is the hit-or-miss nature of exact keyword matching used in many existing image retrieval systems. We explore this weakness by using linguistics tools (WordNet and the OpenMind Commonsense database) to locate image keywords in a semantic network of interrelated concepts so that retrieval by keywords is automatically expanded semantically to avoid the hit-or-miss problem. Such semantic query expansion further alleviates the requirement for exhaustive manual annotation. The third weakness of keyword-based image retrieval systems is the lack of support for retrieval by subjective content. We investigate this weakness by creating a mechanism to allow users to annotate images by their subjective emotional content and subsequently to retrieve images by these emotions. This thesis is primarily an exploration of different keyword-based image retrieval techniques in a real image retrieval system. The design of the system is grounded in past research that sheds light onto how people actually encounter the task of describing images with words for future retrieval. The image retrieval system's front-end and back-end are fully integrated with the Treehouse Global Studio online community - an online environment with a suite of media design tools and database storage of media files and metadata.
The focus of the thesis is on exploring new user scenarios for keyword-based image retrieval rather than quantitative assessment of retrieval effectiveness. Traditional information retrieval evaluation metrics are discussed but not pursued. The user scenarios for our image retrieval system are analyzed qualitatively in terms of system design and how they facilitate the overall retrieval experience.
James Jian Dai.
S.M.
22

Cui, Jun. "Query Expansion Research and Application in Search Engine Based on Concepts Lattice." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-5762.

Abstract:
Formal concept analysis is increasingly applied to query expansion and data mining problems. In this paper I analyze and compare current concept lattice construction algorithms and choose the iPred and Border algorithms to adapt for query expansion. After adapting these two concept lattice construction algorithms, I apply the four algorithms in a query expansion prototype system. The calculation times of the four algorithms are recorded and analyzed. The adapted algorithms perform well. Moreover, I find that the efficiency of concept lattice construction is not consistent with the results of complexity analysis. Instead, it depends heavily on the structure of the data set that serves as the data source for the concept lattice.
23

Ward, Erik. "Tweet Collect: short text message collection using automatic query expansion and classification." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-194961.

Abstract:
The growing number of Twitter users creates large amounts of messages that contain valuable information for market research. These messages, called tweets, are short, contain Twitter-specific writing styles and are often idiosyncratic, which gives rise to a vocabulary mismatch between typically chosen keywords for tweet collection and words used to describe television shows. A method is presented that uses a new form of query expansion that generates pairs of search terms and takes into consideration the language usage of Twitter to access user data that would otherwise be missed. Supervised classification, without manually annotated data, is used to maintain precision by comparing collected tweets with external sources. The method is implemented, as the Tweet Collect system, in Java, utilizing many processing steps to improve performance. The evaluation was carried out by collecting tweets about five different television shows during their time of airing and indicated, on average, a 66.5% increase in the number of relevant tweets compared with using the title of the show as the search terms, and 68.0% total precision. Classification gives a slightly lower average increase of 55.2% in the number of tweets and a greatly increased 82.0% total precision. The utility of an automatic system for tracking topics that can find additional keywords is demonstrated. Implementation considerations and possible improvements that can lead to improved performance are discussed.
24

Li, Zhihan. "Improvement to Chinese information retrieval by incorporating word segmentation and query expansion." Thesis, Queensland University of Technology, 2009. https://eprints.qut.edu.au/30422/1/Zhihan_Li_Thesis.pdf.

Abstract:
The increasing diversity of the Internet has created a vast number of multilingual resources on the Web. A huge number of these documents are written in languages other than English. Consequently, the demand for searching in non-English languages is growing exponentially. It is desirable that a search engine can search for information over collections of documents in other languages. This research investigates the techniques for developing high-quality Chinese information retrieval systems. A distinctive feature of Chinese text is that a Chinese document is a sequence of Chinese characters with no space or boundary between Chinese words. This feature makes Chinese information retrieval more difficult, since a retrieved document that contains the query term as a sequence of Chinese characters may not be truly relevant to the query: the query term (as a sequence of Chinese characters) may not be a valid Chinese word in that document. On the other hand, a document that is actually relevant may not be retrieved because it does not contain the query sequence but contains other relevant words. In this research, we propose two approaches to deal with these problems. In the first approach, we propose a hybrid Chinese information retrieval model by incorporating word-based techniques with the traditional character-based techniques. The aim of this approach is to investigate the influence of Chinese segmentation on the performance of Chinese information retrieval. Two ranking methods are proposed to rank retrieved documents based on the relevancy to the query calculated by combining character-based ranking and word-based ranking. Our experimental results show that Chinese segmentation can improve the performance of Chinese information retrieval, but the improvement is not significant if it incorporates only Chinese segmentation with the traditional character-based approach. 
In the second approach, we propose a novel query expansion method which applies text mining techniques in order to find the most relevant words to extend the query. Unlike most existing query expansion methods, which generally select the highly frequent indexing terms from the retrieved documents to expand the query, our approach utilizes text mining techniques to find patterns in the retrieved documents that highly correlate with the query term and then uses the relevant words in the patterns to expand the original query. This research project develops and implements a Chinese information retrieval system for evaluating the proposed approaches. There are two stages in the experiments. The first stage is to investigate whether high-accuracy segmentation can improve Chinese information retrieval. In the second stage, a text mining based query expansion approach is implemented, and a further experiment compares the performance of the standard Rocchio approach with that of the proposed text mining based query expansion method. The NTCIR5 Chinese collections are used in the experiments. The experimental results show that by incorporating the text mining based query expansion with the hybrid model, significant improvement has been achieved in both precision and recall assessments.
25

Johansson, Emma, and Birgitta Jonsson. "Query expansion med hjälp av en elektronisk tesaurus i en bibliografisk online-databas." Thesis, Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-20957.

Abstract:
In this Master's thesis we investigate query expansion using an electronic thesaurus. We formed 30 topics from the library and information science field. For each topic three searches were made in the bibliographic database LISA (Library and Information Science Abstracts) via CSA (Cambridge Scientific Abstracts). LISA has an electronic thesaurus attached. We used this thesaurus to form baseline queries, baseline queries expanded with synonyms, and baseline queries expanded with both synonyms and related terms. The terms used in the baseline queries were the terms preferred by the thesaurus, the "use for" terms. The three query types were analyzed with regard to precision and relative recall. The relative recall was set to 100% for the strategy where a baseline query was expanded with both synonyms and related terms. The average precision for the searches using baseline queries was 41% with a 60% relative recall. For the searches using baseline queries expanded with synonyms, the precision was 39% and the relative recall was 72%. The searches using baseline queries expanded with both synonyms and related terms resulted in an average precision of 37%. We also measured the "net effect", that is, the precision and recall with regard to the new documents retrieved by each search strategy. We found that, on average, 60% of the relevant documents were retrieved by the baseline queries. The strategy using a baseline query expanded with synonyms was the least effective one in retrieving new relevant documents; on average it brought only 12% of the relevant documents. When measuring the precision for the new documents retrieved by each search strategy we encountered a methodological problem, which is discussed in this thesis.
Thesis level: D
26

Wang, Xinkai. "Chinese-English cross-lingual information retrieval in biomedicine using ontology-based query expansion." Thesis, University of Manchester, 2011. https://www.research.manchester.ac.uk/portal/en/theses/chineseenglish-crosslingual-information-retrieval-in-biomedicine-using-ontologybased-query-expansion(1b7443d3-3baf-402b-83bb-f45e78876404).html.

Abstract:
In this thesis, we propose a new approach to Chinese-English Biomedical cross-lingual information retrieval (CLIR) using query expansion based on the eCMeSH Tree, a Chinese-English ontology extended from the Chinese Medical Subject Headings (CMeSH) Tree. The CMeSH Tree is not designed for information retrieval (IR), since it only includes heading terms and has no term weighting scheme for these terms. Therefore, we design an algorithm, which employs a rule-based parsing technique combined with the C-value term extraction algorithm and a filtering technique based on mutual information, to extract Chinese synonyms for the corresponding heading terms. We also develop a term-weighting mechanism. Following the hierarchical structure of CMeSH, we extend the CMeSH Tree to the eCMeSH Tree with synonymous terms and their weights. We propose an algorithm to implement CLIR using the eCMeSH Tree terms to expand queries. In order to evaluate the retrieval improvements obtained from our approach, the results of the query expansion based on the eCMeSH Tree are individually compared with the results of the experiments of query expansion using the CMeSH Tree terms, query expansion using pseudo-relevance feedback, and document translation. We also evaluate the combinations of these three approaches. This study also investigates the factors which affect the CLIR performance, including a stemming algorithm, retrieval models, and word segmentation.
27

Ermakova, Liana. "Short text contextualization in information retrieval : application to tweet contextualization and automatic query expansion." Thesis, Toulouse 2, 2016. http://www.theses.fr/2016TOU20023/document.

Abstract:
Efficient communication tends to follow the principle of least effort. According to this principle, interlocutors using a given language do not want to work any harder than necessary to be understood. This leads to extreme compression of texts, especially in electronic communication such as microblogs, SMS messages and search queries. However, these texts are often not self-contained, since understanding them requires knowledge of terminology, named entities or related facts. The main goal of this research is therefore to provide the context of a short text to a user or to a system such as a search engine. The first aim of this work is to help a user better understand a short message by extracting context from an external source, such as a text collection, the Web or Wikipedia, by means of automatically built summaries. To this end we developed an approach to automatic multi-document summarization and applied it to short message contextualization, in particular tweet contextualization. The proposed method is based on named entity recognition, part-of-speech weighting and sentence quality measurement. In contrast to previous research, we introduced an algorithm for smoothing from the local context. Our approach exploits the topic-comment structure of a text. Moreover, we developed a graph-based algorithm for sentence reordering. The method was evaluated in the INEX/CLEF Tweet Contextualization track, and we provide the evaluation results over the 4 years of the track. The method was also adapted to snippet retrieval. The evaluation results indicate good performance of the approach.
28

Zhu, Weizhong Allen Robert B. "Text clustering and active learning using a LSI subspace signature model and query expansion /." Philadelphia, Pa. : Drexel University, 2009. http://hdl.handle.net/1860/3077.

29

Nwesri, Abdusalam F. Ahmad. "Effective retrieval techniques for Arabic text." RMIT University. Computer Science and IT, 2008. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20081204.163422.

Abstract:
Arabic is a major international language, spoken in more than 23 countries, and the lingua franca of the Islamic world. The number of Arabic-speaking Internet users has grown over nine-fold in the Middle East between 2000 and 2007, yet research in Arabic Information Retrieval (AIR) has not advanced as in other languages such as English. In this thesis, we explore techniques that improve the performance of AIR systems. Stemming is considered one of the most important factors to improve retrieval effectiveness of AIR systems. Most current stemmers remove affixes without checking whether the removed letters are actually affixes. We propose lexicon-based improvements to light stemming that distinguish core letters from proper Arabic affixes. We devise rules to stem most affixes and show their effects on retrieval effectiveness. Using the TREC 2001 test collection, we show that applying relevance feedback with our rules produces significantly better results than light stemming. Techniques for Arabic information retrieval have been studied in depth on clean collections of newswire dispatches. However, the effectiveness of such techniques is not known on other noisy collections in which text is generated using automatic speech recognition (ASR) systems and queries are generated using machine translations (MT). Using noisy collections, we show that normalisation, stopping and light stemming improve results as in normal text collections but that n-grams and root stemming decrease performance. Most recent AIR research has been undertaken using collections that are far smaller than the collections used for English text retrieval; consequently, the significance of some published results is debatable. Using the LDC Arabic GigaWord collection that contains more than 1 500 000 documents, we create a test collection of 90 topics with their relevance judgements. Using this test collection, we show empirically that for a large collection, root stemming is not competitive. 
Of the approaches we have studied, lexicon-based stemming approaches perform better than light stemming approaches alone. Arabic text commonly includes foreign words transliterated into Arabic characters. Several transliterated forms may be in common use for a single foreign word, but users rarely use more than one variant during search tasks. We test the effectiveness of lexicons, Arabic patterns, and n-grams in distinguishing foreign words from native Arabic words. We introduce rules that help filter foreign words and improve the n-gram approach used in language identification. Our combined n-grams and lexicon approach successfully identifies 80% of all foreign words with a precision of 93%. To find variants of a specific foreign word, we apply phonetic and string similarity techniques and introduce novel algorithms to normalise them in Arabic text. We modify phonetic techniques used for English to suit the Arabic language, and compare several techniques to determine their effectiveness in finding foreign word variants. We show that our algorithms significantly improve recall. We also show that expanding queries using variants identified by our Soutex4 phonetic algorithm results in a significant improvement in precision and recall. Together, the approaches described in this thesis represent an important step towards realising highly effective retrieval of Arabic text.
30

Lundmark, Sofia. "Automatisk query expansion : en komparativ studie av olika strategier för termklustring baserade på lokal analys." Thesis, Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-16688.

Abstract:
Automatic query expansion has long been studied in information retrieval research as a technique that deals with the fundamental issue of word mismatch between query and document. The purpose of this thesis is to compare the retrieval effectiveness of different strategies for automatic query expansion. The strategies are based on local analysis of the corpus and use statistical information from the local document set to extract terms that suppose to adapt themselves to each individual search and therefore appear to be searchonyms to the index terms. The strategies compared are: association clusters, metric cluster and scalar cluster. Baseline queries of 24 topics are expanded using terms from the different clusters and searches are made. The study also explores the retrieval effectiveness of an expanded query when using terms derived from the result of a truncation algorithm. The searches were performed in the InQuery IR-system together with the web-based tool QPA and the Swedish database GP_HDINF. The retrieval effectiveness of baseline and the expanded queries are evaluated using relative recall and average precision. The study shows that all of the strategies manage to increase both recall and precision compared with the initial baseline search. No significant differences between the strategies were found.
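As a rough illustration of one of the strategies named above, the sketch below builds an association cluster in the usual textbook sense of local analysis: the correlation between two terms is the sum, over the locally retrieved documents, of the product of their frequencies, normalised as c_uv / (c_uu + c_vv - c_uv). Whether the thesis uses exactly this normalisation is an assumption, and the function name, whitespace tokenisation, and sample documents are hypothetical.

```python
from collections import Counter


def association_cluster(docs, term, k=2):
    """Return the k terms most strongly associated with `term` in the local document set.

    Ties are broken alphabetically so the output is deterministic.
    """
    # Term frequencies per document (naive whitespace tokenisation, an assumption).
    freqs = [Counter(d.lower().split()) for d in docs]
    vocab = set().union(*freqs)

    def c(u, v):
        # Unnormalised correlation: sum over documents of f(u, d) * f(v, d).
        return sum(f[u] * f[v] for f in freqs)

    c_tt = c(term, term)
    if c_tt == 0:
        return []  # the term does not occur in the local set

    scores = {}
    for v in vocab:
        if v == term:
            continue
        c_tv = c(term, v)
        if c_tv:
            # Normalised association between `term` and v.
            scores[v] = c_tv / (c_tt + c(v, v) - c_tv)
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


# Hypothetical top-ranked documents from an initial search.
LOCAL_DOCS = ["car engine repair", "car engine oil", "garden flower"]
```

Metric and scalar clusters differ in how the correlation is computed (term distance within documents, and similarity between whole correlation vectors, respectively), but they plug into the same expansion step: the top-k neighbours of each query term are added to the baseline query.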
Thesis level: D
31

Bettio, Raphael Winckler de. "Inter-relaão das técnicas Term Extration e Query Expansion aplicadas na recuperação de documentos textuais." Florianópolis, SC, 2007. http://repositorio.ufsc.br/xmlui/handle/123456789/90753.

Abstract:
Doctoral thesis, Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-graduação em Engenharia e Gestão do Conhecimento
According to Sighal (2006), people recognise the importance of storing and searching for information and, with the advent of computers, it became possible to store large quantities of it in databases. Consequently, cataloguing the information in these databases became indispensable. In this context, the field of Information Retrieval emerged in the 1950s with the aim of promoting the construction of computational tools that would allow users to make more efficient use of these databases. The main objective of this research is to develop a computational model that enables the retrieval of textual documents ordered by semantic similarity, based on the intersection of the Term Extraction and Query Expansion techniques.
32

Lyall-Wilson, Jennifer Rae. "Automatic Concept-Based Query Expansion Using Term Relational Pathways Built from a Collection-Specific Association Thesaurus." Diss., The University of Arizona, 2013. http://hdl.handle.net/10150/306773.

Abstract:
The dissertation research explores an approach to automatic concept-based query expansion to improve search engine performance. It uses a network-based approach for identifying the concept represented by the user's query and is founded on the idea that a collection-specific association thesaurus can be used to create a reasonable representation of all the concepts within the document collection as well as the relationships these concepts have to one another. Because the representation is generated using data from the association thesaurus, a mapping will exist between the representation of the concepts and the terms used to describe these concepts. The research applies to search engines designed for use in an individual website with content focused on a specific conceptual domain. Therefore, both the document collection and the subject content must be well-bounded, which makes it possible to use techniques not currently feasible for general-purpose search engines used on the entire web.
33

Shiri, Ali Asghar. "End-user interaction with thesaurus-enhanced search interfaces : an evaluation of search term selection for query expansion." Thesis, University of Strathclyde, 2003. http://oleg.lib.strath.ac.uk:80/R/?func=dbin-jump-full&object_id=21521.

Abstract:
A major challenge faced by end-users during the information search and retrieval process is the selection of search terms for query formulation and expansion. Thesauri are recognised as one source of search terms with the potential to assist users in the process of term selection. Research in search term selection, query expansion and interface evaluation has stressed the importance of providing end-users with terminological assistance. As the number of thesauri attached to information retrieval systems has grown, a range of interface facilities and features have been developed to aid users in formulating their queries. This study investigated end-user interaction with a thesaurus-enhanced search interface to evaluate their search term selection and query expansion behaviour. The main objectives of this study were: to evaluate how and to what extent a thesaurus-enhanced search interface assisted end-users in selecting search terms for query expansion, to ascertain users' attitude toward both the thesaurus and interface as tools for facilitating search term selection, and to identify searching and browsing behaviours of users interacting with a thesaurus-enhanced interface. The test environment involved the Ovid CAB Abstracts database, the CAB thesaurus, and 30 academic staff and postgraduate students with genuine search requests. The data gathering tools employed were pre-search questionnaires, screen capturing software, post-search questionnaires, and post-session interviews. The results demonstrated different patterns of thesaurus-based search term selection by academic staff and postgraduates. Academic staff with more extensive domain knowledge tended to select narrower terms whereas postgraduates more often chose related and broader terms. In general, all users selected a larger number of narrower and related terms for expanding their queries. 
The effect of topic characteristics such as topic complexity and topic familiarity on search behaviour was also investigated. It was shown that complex topics affected users' cognitive and physical moves, number of search terms selected and query expansion instances. Topic familiarity was also found to have an effect on users' browsing behaviour. An evaluation of users' perceptions of the interface indicated that usability was a factor affecting thesaurus browsing and navigating behaviour. This study was constrained by the limitations of the IR system utilised, the experimental design and the choice of subjects. However, this study can be viewed as the first investigation of variables such as topic complexity and topic familiarity within a thesaurus-enhanced search environment. The findings of this study contribute to research in the areas of user-centred search term selection, thesaurus-assisted query expansion and the evaluation of user interaction with IR search interfaces.
34

Carstens, Carola [Verfasser], and Christa [Akademischer Betreuer] Womser-Hacker. "Ontology Based Query Expansion - Retrieval Support for the Domain of Educational Research / Carola Carstens. Betreuer: Christa Womser-Hacker." Hildesheim : Universitätsbibliothek Hildesheim, 2012. http://d-nb.info/1023809400/34.

35

Johansson, Henrik. "Using clickstream data as implicit feedback in information retrieval systems." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233870.

Abstract:
This Master's thesis project investigates whether Wikipedia's clickstream data can be used to improve the retrieval performance of information retrieval systems. The project is conducted under the assumption that a traversal between two articles connects the two articles with regard to content. To extract useful terms from the clickstream data, the data needed to be structured so that, given a Wikipedia article, it is possible to find all of the in-going or out-going article traversals. The project settled on using the clickstream data in an automatic query expansion approach. Two expansion methods were investigated: one expands with the full article title so that the context is preserved, and the other expands with individual terms from the article titles. The structure of the data and the two proposed methods were evaluated using a set of queries and relevance judgments. The results of the evaluation show that the method that expands with individual terms performed better than the full article title expansion method, and that the individual term method increased MAP by 11.24%. The expansion method was evaluated on two different query collections, and it was found that the proposed expansion method only improves the results where the average recall of the original queries is low. The thesis concludes that the clickstream can be used to improve retrieval performance for an information retrieval system.
The aim of this degree project is to investigate whether Wikipedia's clickstream data can be used to improve the search performance of information retrieval systems. The work was carried out under the assumption that a transition between two Wikipedia articles connects the articles' content and is of interest to the user. To make use of the clickstream data, it must be structured in a useful way so that, given an article, one can see how readers have moved out from or in towards that article. We chose to exploit the data set through automatic query expansion. Two methods were developed: the first expands the query with full article titles, while the second expands it with individual words from an article title. The results of the study show that the word-based expansion method performs better than the method that expands with full article titles. The word-based expansion method achieved an improvement in MAP of 11.21%. The work also shows that the expansion method only improves performance when the recall of the original query is low. As for the structure of the clickstream data, the outgoing structure performed better than the ingoing one. The conclusion of the degree project is that this clickstream data is well suited to improving the search performance of an information retrieval system.
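The individual-term expansion described in this abstract can be sketched roughly as follows. Everything concrete here is an assumption made for illustration: the `outgoing` dictionary (article title mapped to outgoing targets with click counts), the exact-title lookup of the query, the click-count weighting, the stopword list and the function name. The abstract does not say how queries are matched to articles or how candidate terms are ranked.

```python
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = frozenset({"of", "the", "in", "and"})


def expand_with_title_terms(query, outgoing, max_terms=3):
    """Expand a query with individual terms from outgoing article titles."""
    query_terms = query.lower().split()
    # Assumption: the query is matched to an article by its exact title.
    targets = outgoing.get(query.lower(), {})
    weights = Counter()
    for title, clicks in targets.items():
        for term in title.lower().split():
            if term not in query_terms and term not in STOPWORDS:
                # Assumption: a term's weight is the total clicks of its titles.
                weights[term] += clicks
    # Highest weight first; alphabetical order breaks ties deterministically.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [term for term, _ in ranked[:max_terms]]


clickstream = {
    "query expansion": {
        "information retrieval": 120,
        "relevance feedback": 80,
        "query language": 10,
    }
}
print(expand_with_title_terms("Query expansion", clickstream, max_terms=2))
```

The full-title variant would append each target title as a phrase instead of splitting it into terms, which preserves context at the cost of a stricter match.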
36

Volpe, Isabel Cristina. "Cell assemblies para expansão de consultas." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2011. http://hdl.handle.net/10183/32858.

Abstract:
One of the main tasks of Information Retrieval is to find documents that are relevant to a query. This task is difficult because, in many cases, the search terms chosen by the user differ from the terms used by the authors of the documents. Over the years, several approaches have been proposed to deal with this problem. One of the most widely used techniques for increasing the number of relevant documents retrieved is Query Expansion, which consists of expanding the query by adding related terms. This work proposes a method that uses the Cell Assemblies model for query expansion. Cell Assemblies are groups of connected neurons, with firing patterns, that allow activity to persist even after external stimuli are removed. The synapses between neurons are modified through Hebbian learning rules. In this work, the Cell Assemblies model was adapted to learn the relationships between the terms of a document collection. These relationships are used to expand the original query with related terms. Experimental evaluation on a standard Information Retrieval test collection showed that some queries significantly improved their results with the proposed technique.
One of the main tasks in Information Retrieval is to match a user query to the documents that are relevant for it. This matching is challenging because in many cases the keywords the user chooses will be different from the words the authors of the relevant documents have used. Throughout the years, many approaches have been proposed to deal with this problem. One of the most popular consists of expanding the query with related terms with the goal of retrieving more relevant documents. In this work, we propose a new method in which a Cell Assembly model is applied for query expansion. Cell Assemblies are reverberating circuits of neurons that can persist long after the initial stimulus has ceased. They learn through Hebbian Learning rules and have been used to simulate the formation and the usage of human concepts. We adapted the Cell Assembly model to learn relationships between the terms in a document collection. These relationships are then used to augment the original queries. Our experiments use standard Information Retrieval test collections and show that some queries significantly improved their results with the proposed technique.
37

Mahendiran, Aravindan. "Automated Vocabulary Building for Characterizing and Forecasting Elections using Social Media Analytics." Thesis, Virginia Tech, 2014. http://hdl.handle.net/10919/25430.

Abstract:
Twitter has become a popular data source in the recent decade and garnered a significant amount of attention as a surrogate data source for many important forecasting problems. Strong correlations have been observed between Twitter indicators and real-world trends spanning elections, stock markets, book sales, and flu outbreaks. A key ingredient to all methods that use Twitter for forecasting is to agree on a domain-specific vocabulary to track the pertinent tweets, which is typically provided by subject matter experts (SMEs). The language used in Twitter drastically differs from other forms of online discourse, such as news articles and blogs. It constantly evolves over time as users adopt popular hashtags to express their opinions. Thus, the vocabulary used by forecasting algorithms needs to be dynamic in nature and should capture emerging trends over time. This thesis proposes a novel unsupervised learning algorithm that builds a dynamic vocabulary using Probabilistic Soft Logic (PSL), a framework for probabilistic reasoning over relational domains. Using eight presidential elections from Latin America, we show how our query expansion methodology improves the performance of traditional election forecasting algorithms. Through this approach we demonstrate how we can achieve close to a two-fold increase in the number of tweets retrieved for predictions and a 36.90% reduction in prediction error.
Master of Science
38

Janaite, Neto Jorge [UNESP]. "Recuperação de informação baseada em ontologia: uma proposta utilizando o modelo vetorial." Universidade Estadual Paulista (UNESP), 2018. http://hdl.handle.net/11449/154340.

Abstract:
No funding was received.
Information retrieval takes place by comparing the representations of the documents in a collection with the representation of the user's information need. A document is retrieved when its representation coincides, wholly or partially, with the representation of the user's information need. The information retrieval process can be seen as a linguistic problem in which the informational content of the documents and the user's information need are represented by a set of terms. The efficiency of the retrieval process depends on the quality of the document representations and on the terms the user employs to represent the information need. The more compatible these representations are, the more efficient the retrieval process will be. Drawing on exploratory and descriptive research grounded in the specialised literature, this work proposes the use of computational ontologies in information retrieval systems based on the Vector Space Model. The ontologies serve as an external terminological structure used both to expand the indexing terms and to expand the terms that make up the search expression. Indexing term expansion is performed right after the most representative terms of the document under analysis are extracted during indexing, and consists of adding new conceptually related terms in order to enrich the document's representation. Query expansion is achieved by adding new terms related to those already present in the search expression, with the aim of better contextualising them.
This proposal uses only the terminological and hierarchical structure offered by an OWL computational ontology, without considering the other possible types of relations or the logical restrictions that can be described; these resources may be used in future work in an attempt to further improve the efficiency of the retrieval process. The proposal presented in this study can be implemented and may in future become a fully operational information retrieval system.
Information retrieval works by matching the representations of the documents in a collection against the representation of the user's information need. A document is retrieved when its representation matches the user's information need totally or partially. The process of information retrieval can be seen as a linguistic issue in which the information content of the documents and the user's information need are each represented by a set of terms. Its efficiency depends on the quality of the representations of the documents and of the terms used to represent the user's information need. The more compatible these representations are, the more efficient the retrieval process. Based on exploratory and descriptive research grounded in a specific bibliography, this paper proposes using computational ontologies in information retrieval systems based on the Vector Space Model. The ontologies are applied as external terminological structures used in the expansion of the indexing terms as well as in the expansion of the terms that make up the query expression. The indexing terms are expanded as soon as the most representative terms of the document under analysis have been extracted during the indexing process, by adding new conceptually related terms in order to improve the document representation. Query expansion is obtained by adding new terms related to those already in the query expression to better contextualise them. In this proposal, only the terminological and hierarchical structure offered by an OWL computational ontology is used, disregarding other possible relations and the logical restrictions that could be described; these resources are left for future work in an attempt to improve the efficiency of the retrieval process. The proposal can be implemented and become a fully operational information retrieval system.
39

Hagberg, Lena, and Johanna Müntzing. "En tesaurus som ledsagare : En jämförande studie av tre sökstrategiers inverkan på återvinningsresultatet i en bibliografisk databas." Thesis, Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-18391.

Abstract:
This Master's thesis is a comparative study of information retrieval results for three distinct search strategies in simulated automatic query expansion in a bibliographic database. Our purpose is to investigate which of the search strategies achieves the highest precision and to what extent the same relevant documents are retrieved (overlap). A thesaurus attached to the database is used to select appropriate descriptors for the baseline query formulations, which are subsequently expanded with hierarchical relations. The search strategies are s1: a baseline query with two or three descriptors; s2: the baseline descriptors combined with at least one Narrower Term; s3: the baseline descriptors combined with Narrower Term and at least one Broader Term. A Document Cutoff Value of 15 is used, so only the 15 highest-ranked documents are judged for relevance. The measures used are precision for effectiveness and Jaccard's index for overlap. In terms of precision, the results reveal that s1 scores the highest value (average 84.8%), with s2 and s3 following in decreasing order (average 81.94% and 61.41% respectively). The overlap varies greatly depending on topic; the average is 78.81% between s1 and s2, 58.48% between s2 and s3, and 40.41% between s3 and s1. In short, average precision decreases, as does average overlap. Using the thesaurus in the applied strategy of automatic query expansion is not recommended in this specific database if the aim is to increase precision. However, in single searches with a structure like s1, the thesaurus can be of assistance in selecting specific search terms.
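The two measures named in this abstract can be written down in a few lines. The sketch below assumes that precision is computed over the documents actually returned up to the cutoff of 15 and that overlap is Jaccard's index over the sets of relevant documents each strategy retrieved; the function names and the toy document identifiers are made up for the example.

```python
def precision_at_cutoff(ranked_ids, relevant_ids, cutoff=15):
    """Share of the top `cutoff` retrieved documents that are relevant."""
    top = ranked_ids[:cutoff]
    if not top:
        return 0.0
    return sum(1 for doc_id in top if doc_id in relevant_ids) / len(top)


def jaccard_overlap(retrieved_a, retrieved_b):
    """Jaccard's index: size of the intersection divided by size of the union."""
    a, b = set(retrieved_a), set(retrieved_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty result sets
    return len(a & b) / len(a | b)


# Toy data: relevant documents found by two strategies for one topic.
s1_relevant = {"d1", "d3"}
s2_relevant = {"d3", "d5"}
print(precision_at_cutoff(["d1", "d2", "d3", "d4"], s1_relevant, cutoff=4))
print(jaccard_overlap(s1_relevant, s2_relevant))
```

Averaging both values over all topics gives per-strategy figures of the kind reported in the abstract.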
Thesis level: D
40

Bouchoucha, Arbi. "Diversified query expansion." Thèse, 2015. http://hdl.handle.net/1866/12335.

Abstract:
Search result diversification (SRD) aims to select diverse documents from the search results in order to cover as many intents as possible. Existing approaches assume that the initial results are sufficiently diverse and cover the query aspects well. In practice, however, the initial results often fail to cover some aspects. In this thesis, we propose a new approach to SRD that diversifies the query expansion (DQE) in order to obtain better coverage of the aspects. Expansion terms are selected from one or more resources following the Maximal Marginal Relevance principle. In our first contribution, we propose a term-level DQE method in which the similarity between terms is measured at the surface level using the resources. When several resources are used for DQE, the literature has combined them uniformly, which ignores the individual contribution of each resource with respect to the query. In the second contribution of this thesis, we propose a new query-dependent resource weighting method. Our method uses a set of features that are integrated into a linear regression model, and generates from each resource a number of expansion terms proportional to the weight of that resource. The proposed DQE methods focus on eliminating redundancy among expansion terms without considering whether the selected terms actually cover the different aspects of the query. To overcome this drawback, the third contribution of this thesis introduces a new aspect-level DQE method. Our method is trained in a supervised manner according to the principle that related terms should correspond to the same aspect.
This method makes it possible to select expansion terms at a latent semantic level in order to cover as many different aspects of the query as possible. In addition, it allows several resources to be integrated to suggest expansion terms, and supports several constraints, such as the sparsity constraint. We evaluate our methods using ClueWeb09B data and three query collections from the TREC Web track, and show the usefulness of our approaches compared with existing methods.
Search Result Diversification (SRD) aims to select diverse documents from the search results in order to cover as many search intents as possible. For the existing approaches, a prerequisite is that the initial retrieval results contain diverse documents and ensure a good coverage of the query aspects. In this thesis, we investigate a new approach to SRD by diversifying the query, namely diversified query expansion (DQE). Expansion terms are selected either from a single resource or from multiple resources following the Maximal Marginal Relevance principle. In the first contribution, we propose a new term-level DQE method in which word similarity is determined at the surface (term) level based on the resources. When different resources are used for the purpose of DQE, they are combined in a uniform way, thus totally ignoring the contribution differences among resources. In practice, the usefulness of a resource varies greatly depending on the query. In the second contribution, we propose a new method of query-level resource weighting for DQE. Our method is based on a set of features which are integrated into a linear regression model and generates for a resource a number of expansion candidates that is proportional to the weight of that resource. Existing DQE methods focus on removing the redundancy among selected expansion terms, and no attention has been paid to how well the selected expansion terms actually cover the query aspects. Consequently, it is not clear how we can cope with the semantic relations between terms. To overcome this drawback, our third contribution in this thesis aims to introduce a novel method for aspect-level DQE which relies on an explicit modeling of query aspects based on embedding. Our method (called latent semantic aspect embedding) is trained in a supervised manner according to the principle that related terms should correspond to the same aspects.
This method allows us to select expansion terms at a latent semantic level in order to cover as many aspects of a given query as possible. In addition, this method also incorporates several different external resources to suggest potential expansion terms, and supports several constraints, such as the sparsity constraint. We evaluate our methods using the ClueWeb09B dataset and three query sets from the TREC Web tracks, and show the usefulness of our proposed approaches compared to the state-of-the-art approaches.
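The Maximal Marginal Relevance principle mentioned in this abstract can be illustrated with a generic greedy selection over candidate expansion terms. This is a sketch of standard MMR, not the thesis's term-level or aspect-level method: the relevance scores, the similarity function, the trade-off parameter `lam` and the toy terms are all assumptions introduced for the example.

```python
def mmr_select(relevance, similarity, k, lam=0.5):
    """Greedily pick k terms that are relevant yet dissimilar to earlier picks."""
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:

        def mmr_score(term):
            # Redundancy is the highest similarity to any already selected term.
            redundancy = max((similarity(term, s) for s in selected), default=0.0)
            return lam * relevance[term] - (1 - lam) * redundancy

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy candidates for an ambiguous query such as "jaguar" (made-up scores).
term_relevance = {"car": 0.9, "auto": 0.85, "animal": 0.6}


def term_similarity(a, b):
    # Hypothetical similarity: "car" and "auto" are near-synonyms.
    return 0.9 if {a, b} == {"car", "auto"} else 0.1


print(mmr_select(term_relevance, term_similarity, k=2))
```

With these toy scores the second pick is "animal" rather than the more relevant "auto", because "auto" is nearly redundant with "car"; this is the behaviour that lets diversified expansion reach more query aspects.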
41

Tai, Chia-Hung, and 戴嘉宏. "Fuzzy Cluster-Based Query Expansion." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/41760976903310141825.

Abstract:
Master's
National Sun Yat-sen University
Graduate Institute of Information Management
92
Advances in information and network technologies have fostered the creation and availability of a vast amount of online information, typically in the form of text documents. Information retrieval (IR) pertains to determining the relevance between a user query and documents in the target collection, then returning those documents that are likely to satisfy the user’s information needs. One challenging issue in IR is word mismatch, which occurs when concepts can be described by different words in the user queries and/or documents. Query expansion is a promising approach for dealing with word mismatch in IR. In this thesis, we develop a fuzzy cluster-based query expansion technique to solve the word mismatch problem. Using existing expansion techniques (i.e., global analysis and non-fuzzy cluster-based query expansion) as performance benchmarks, our empirical results suggest that the fuzzy cluster-based query expansion technique can provide a more accurate query result than the benchmark techniques can.
42

Lin, Tien-Chien, and 林典鍵. "Query Expansion via Wikipedia Link." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/18697675479617353053.

Abstract:
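Master's
Chaoyang University of Technology
Department of Computer Science and Information Engineering, Master's Program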
96
Query expansion is a well-known technique for increasing recall. Previous work shows that good query expansion can also increase top-N precision. Since users usually browse the top N search results first, the precision of the top N results is very important. In this paper, we use the anchor texts in Wikipedia as a resource to expand the original query. A query term found in Wikipedia is expanded with the anchor texts in its Wikipedia page. We conduct experiments on TREC disks 4 and 5 and compare with Okapi BM25. The experimental results show an improvement in mean average precision.
43

Huang, Chun-Neng, and 黃群能. "Cluster-based Query Expansion Technique." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/64988777413472635153.

Abstract:
Master's
National Sun Yat-sen University
Graduate Institute of Information Management
91
With advances in information and networking technologies, a huge amount of information, typically in the form of text documents, is available online. To facilitate efficient and effective access to documents relevant to users' information needs, information retrieval systems play a more significant role than ever. One challenging issue in information retrieval is word mismatch, the phenomenon that concepts may be described by different words in user queries and/or documents. The word mismatch problem, if not appropriately addressed, can critically degrade the retrieval effectiveness of an information retrieval system. In this thesis, we develop a cluster-based query expansion technique to solve the word mismatch problem. Using the traditional query expansion techniques (i.e., global analysis and local feedback) as performance benchmarks, the empirical results suggest that when a user query consists of only one query term, the global analysis technique is more effective. However, if a user query consists of two or more query terms, the cluster-based query expansion technique can provide a more accurate query result, especially within the first few top-ranked documents retrieved.
44

Lin, Hsi-Ching, and 林錫慶. "New Methods for Query Expansion and Query Reweighting for Document Retrieval." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/84539972972055978983.

Abstract:
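Master's
National Taiwan University of Science and Technology
Department of Computer Science and Information Engineering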
93
In document retrieval systems, query terms play an important role and can affect retrieval performance. The performance of document retrieval systems can be improved by using query term expansion and query term reweighting techniques. In this thesis, we present two new methods for query term expansion and query term reweighting. The first method chooses additional query terms for query expansion according to the degrees of importance of relevant terms and uses fuzzy rules to infer their weights for document retrieval. The second method uses neural networks to adjust the weights of query terms towards optimal values for document retrieval. The proposed methods increase the performance of information retrieval systems for document retrieval.
45

Seher, Indra, University of Western Sydney, College of Health and Science, and School of Computing and Mathematics. "A personalised query expansion approach using context." 2007. http://handle.uws.edu.au:8081/1959.7/33427.

Abstract:
Users of the Web usually use search engines to find answers to a variety of questions. Although search engines can rapidly process a large number of Web documents, in many cases, the answers returned by search engines are not relevant to the user’s information need, although they do contain the same keywords as the query. This is because the Web contains information sources created by numerous authors independently, and the authors’ vocabularies vary greatly. Furthermore, most words in natural languages have inherent ambiguity. This vocabulary mismatch between user queries and Web sources is often addressed through query expansion. Moreover, user questions are often short. The results of a search can be improved when the length of the question is long. Various query expansion methods that add useful question-related terms before processing the question have been proposed and proven to increase the performance of the result. Some of these query expansion methods add contextual information related to the user and the question. On the other hand, human communications are quite successful and seem to be very easy. This is mainly due to the understanding of language and the world knowledge that humans have. Human communication is more successful when there is an implicit understanding of everyday situations of others who take part in the communication. Here the implicit situational information, or the “context” that humans share, enables them to have a more meaningful interaction amongst themselves. Similar to human–human communications, improving computers’ access to context can increase the richness of human–computer communications, giving more useful computational services to users. Based on the above factors, this research proposes a method to make use of context in order to understand and process user requests. Here, the term “context” means the meanings associated with key query terms and preferences that have to be decided in order to process the query. 
As in a natural environment, the results produced for different users for the same question could vary in an automated system. If the automated system knows users’ preferences related to the question, then it could make use of these preferences to process user queries, producing more relevant and useful results for the user. Hence, a new approach for personalised query expansion is proposed in this research, where user queries are expanded with user preferences, so the expanded queries used for processing vary between users. An architecture required for a Web application to carry out personalised query expansion with contextual information is also proposed in the thesis. The preferences that could be used for the query expansion are therefore user-specific. Users have different sets of preferences depending on the tasks they want to perform. Similar tasks that have the same types of preferences can be grouped into task-based domains. Hence, user preferences will be the same within a domain, and will vary across domains. Furthermore, different types of subtasks can be performed within a domain. The set of preferences that could be used for each subtask could vary, and it will be a subset of the set of preferences of the domain. Hence, an approach for personalised query expansion which adds user-, domain- and task-specific preferences to user queries is proposed in this research. The main stages of this expansion are identified and discussed in this thesis. Each of these stages requires different contextual information, which is represented in the context model. Of the main stages identified in the query expansion process, the first three (domain identification, task identification, and missing parameter identification) are explored in the thesis. As the preferences used for the expansion depend on the query domain, the domain of the query must be identified first.
Hence, a domain identification algorithm which makes use of eight different features is proposed in the thesis to identify the domains of given queries. This domain identification also reduces the ambiguity of query terms. When the query domain is identified, the context (the associated meanings) of the query terms is known. This limits the scope for misinterpreting query terms. A domain ontology, domain dictionary, and user profile are used by the domain identification algorithm. The domain ontology consists of objects and their categories, attributes of objects and their categories, relationships among objects, and instances and their categories in the domain. The domain dictionary consists of objects and attributes. It is created automatically from the domain ontology. The user profile holds the user's long-term preferences, both domain-specific and general. When the domain of the query is known, the task specified in the query has to be identified in order to decide the preferences of the user. This task identification process is found to be similar in domains with similar activities. Hence, domains are grouped at this stage. These domain groups and the rules that could be used to find the tasks in the domain groups are identified and discussed in the thesis. For each subtask in the domain groups, the types of preferences that could be used to expand user queries are identified and are used to expand user queries. An experiment is designed to evaluate the performance of the proposed approach. The first three stages of the query expansion (domain identification, task identification, and missing parameter identification) are implemented and evaluated. Samples of five domains are implemented, and queries are collected in these domains from various users. The system provides a wizard for creating new domains.
The system also allows editing of the existing domains, domain groups, and types of preferences in subtasks of the domain groups. Instances of the attributes are manually identified and added to the system using the interface provided by the system. In each of the stages of the query expansion, the results of the queries are manually identified and compared with the results produced by the system. The results have confirmed that the proposed method has a positive impact on query expansion. The experiments, results and evaluation of the proposed query expansion approach are also presented in the thesis. The proposed approach for query expansion could be used by search engines, organisations with a limited set of task domains, and any application that can be improved by making use of personalised query expansion.
Doctor of Philosophy (PhD)
46

Chiang, Shun-hsien, and 江舜絃. "A Knowledge-based Chinese Query Expansion System." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/94df24.

Abstract:
Master's
National Central University
Graduate Institute of Information Management
97
Search engines have become an essential tool in the era of the information explosion, so helping users filter an excess of information and taking their implicit search intentions into account to produce a personalized search ranking has always been an important topic. A knowledge ontology is used to describe the user’s preferences, and a Chinese keyword recommendation system is proposed to accomplish Chinese query expansion. The system analyzes the site maps of the user’s entire browsing history via a web crawler, automatically constructs a wider range of personalized domain knowledge through Formal Concept Analysis, and combines query expansion with a personal ontology that is learned automatically through HowNet, so that more complete information can be accessed easily. When the user submits keywords, the system compares them with the concepts of the personalized ontology in the user’s profile in order to produce extended keyword sets similar to the submitted keywords. These sets are recommended to the user to retrieve more documents that include the same concepts. The experimental results show that the system increases the retrieval precision over 70% and that the retrieval precision almost doubles. The system filters out most web documents unrelated to the user’s interests in order to obtain the information actually needed. The proposed algorithm provides an automatically generated user knowledge database, a wider range of training data sources, a semi-automatic recommendation mechanism for Chinese expansion words, and a HowNet sememe database in Traditional Chinese. It is shown to have better retrieval accuracy in the Chinese environment than ordinary ontology-based query expansion methods.
47

Cai, Zih-Long, and 蔡子龍. "Interactive Web Query Expansion Using Concept Hierarchy." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/36938834615691881275.

Abstract:
Master's
National Kaohsiung First University of Science and Technology
Institute of Information Management
97
Traditional data retrieval processes usually use Boolean operations to filter out unnecessary data. Such an approach produces only a ‘0’ or ‘1’ evaluation result, where ‘0’ means that two terms are different and ‘1’ means that they are equivalent. However, the Boolean method is not practical enough, as there may be some fuzzy space between these two results. How to analyze this fuzzy space is a Latent Semantic Analysis topic. Our objective in this thesis is to improve on the crisp analysis induced by the Boolean method. We applied Web Mining techniques to build an automatically constructed Knowledge Structure, which keeps useful information about terms, including Concept Hierarchy, Link Type and Information Distance. To utilize this information, our approach introduces a Query Expansion process that extends the search results with potential associations with the user’s search concept. In addition, for users who are inexperienced or lack professional domain knowledge, we provide various types of Query Expansion strategies to help them narrow or broaden the search scope. With our approach, users can spend less time and effort in the on-line data retrieval process while obtaining more search results, together with useful information close to their needs.
48

Chang, Jed Kao-Tung, and 張果通. "Query expansion based on attributes of objects." Thesis, 1999. http://ndltd.ncl.edu.tw/handle/71357177671969472146.

Abstract:
Master's
National Taiwan University
Graduate Institute of Computer Science and Information Engineering
87
The thesis proposes a query expansion mechanism based on the attributes of the objects in a digital library. The mechanism matches the desired attributes specified in the user's query against the attributes of the objects. Objects with matching attributes are then forwarded to the search utilities. The proposed mechanism effectively closes the gap between the user's concepts and the terms presented in the digital library. As a result, the precision and recall rates of the search utilities are improved. This thesis describes the data structures developed for conducting the proposed query expansion and demonstrates the effects achieved. It also reports an experiment that evaluates the proposed mechanism.
49

Lin, Shen-mu, and 林伸穆. "Applying Novel Relevance Feedback in Query Expansion Enhancement." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/28437136319094505566.

Abstract:
Master's
National Yunlin University of Science and Technology
Master's Program, Department of Information Management
94
Query Expansion was designed to compensate for the sparse query words issued by users and has been applied in many commercial products. It tries to expand the query words, based on semantic computation, to identify the user’s real requirement. This can be critical for dealing with information overload and for lowering the barrier to use. However, modern retrieval systems usually lack user modeling and are not adaptive to individual users, resulting in inherently non-optimal retrieval performance. In this study, we propose the LLSF method, which uses each individual’s search history to automatically generate a specific personalized profile matrix. This matrix is then used to generate context-based expanded query words. To improve retrieval accuracy, we apply query word re-weighting and a document pooling algorithm. Finally, the document list is ranked by stressed density distribution modeling. The experimental results show that our framework achieves personalization and that its performance is very promising.
50

Huang, Szu-Jui, and 黃思瑞. "Automatic Query Expansion based on Non-Ramdomness Model." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/70455312256143540096.

Abstract:
Master's
National Central University
Graduate Institute of Information Management
95
Automatic query expansion addresses the problem of word mismatch, in which the words users provide in a query are not consistent with the words used by document authors. Word mismatch can result in poor retrieval effectiveness. Many automatic query expansion techniques have been developed and shown to improve retrieval effectiveness. We apply the concept of non-randomness from probabilistic models to devise a method for automatic query expansion. Top-ranked documents retrieved in the initial retrieval are used as the source of expansion terms. The candidate expansion terms are re-weighted and selected within the Rocchio framework. Experimental results show that our approach can significantly improve retrieval effectiveness. The experiments also consider and analyze the parameters that can influence the performance of automatic query expansion, including the number of selected documents, the number of expansion terms, and the parameters of the Rocchio framework.
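The general shape of Rocchio-style expansion from top-ranked documents, as described in this abstract, can be sketched as follows. This is a minimal illustration and not the thesis's method: the function name `rocchio_expand`, the defaults for `alpha` and `beta`, and the use of raw term frequency as the candidate-term weight are all assumptions made here. The abstract's non-randomness weighting is not specified in this listing, so plain term frequency stands in for it, and the negative-feedback term of Rocchio is omitted because pseudo-relevance feedback has no judged non-relevant documents.

```python
from collections import Counter


def rocchio_expand(query_terms, ranked_docs, num_docs=3, num_terms=2,
                   alpha=1.0, beta=0.75):
    """Expand a query with terms from the top-ranked documents.

    query_terms: list of query tokens.
    ranked_docs: list of token lists, best-ranked first (initial retrieval).
    Returns a dict mapping term -> weight in the expanded query.
    All parameter names and defaults are illustrative assumptions.
    """
    query_set = set(query_terms)

    # Original query vector: alpha per occurrence of each query term.
    weights = Counter()
    for term in query_terms:
        weights[term] += alpha

    # Pseudo-relevance feedback: assume the top num_docs are relevant.
    feedback = ranked_docs[:num_docs]
    if not feedback:
        return dict(weights)

    # Centroid of the feedback documents, using raw term frequency
    # (a stand-in for the thesis's non-randomness weighting).
    centroid = Counter()
    for doc in feedback:
        for term, tf in Counter(doc).items():
            centroid[term] += tf / len(feedback)

    # Select the num_terms highest-weighted terms not already in the query.
    candidates = sorted((t for t in centroid if t not in query_set),
                        key=lambda t: (-centroid[t], t))
    chosen = set(candidates[:num_terms])

    # Rocchio update: q' = alpha * q + beta * centroid, restricted to
    # the original query terms and the selected expansion terms.
    for term, weight in centroid.items():
        if term in query_set or term in chosen:
            weights[term] += beta * weight

    return dict(weights)
```

The three parameters the abstract names as influencing performance map onto `num_docs`, `num_terms`, and the `alpha`/`beta` coefficients in this sketch.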