Dissertations / Theses: 'Information Retrieval'

1

Bassani, Elias. "Neural Approaches to Personalized Search." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2023. https://hdl.handle.net/10281/404515.

Abstract:

Recent advances in Neural Networks research have pushed forward the state of the art in many language-related tasks, including Information Retrieval, bringing new opportunities for representing and leveraging user-related information during personalization. However, their application to Personalized Search is still an open research area, with many issues and challenges still to be addressed. In this dissertation, we focus on representing user preferences from multiple perspectives, managing and selecting the user information used to personalize the current search, and improving query representations with user-specific data, proposing new approaches based on Neural Networks. Moreover, we address the lack of publicly available large-scale datasets suited for training and evaluating Neural Network-based approaches to Personalized Search. We first study the problem of leveraging user preferences represented from multiple perspectives by proposing a multi-representation re-ranking model. We show that the proposed approach achieves competitive performance while being fast and scalable, and that it can be extended to include additional representations and features. We then conduct an in-depth analysis of a Neural Network mechanism, Attention, when employed for user modeling, highlighting some shortcomings due to one of its internal components, the Softmax normalization function. We address these shortcomings by introducing a novel Attention variant, Denoising Attention, which adopts a more robust normalization scheme and employs a filtering mechanism. Experimental evaluations clearly show the benefits of the proposed approach over other Attention variants.
Furthermore, we address the enhancement of query representations with user-specific data by proposing a novel Personalized Query Expansion approach designed for contextualized word embeddings, which leverages an offline clustering-based procedure to identify the user-related terms that best represent the user's interests. We show that it improves retrieval effectiveness over state-of-the-art word embedding-based Query Expansion methods while also achieving sub-millisecond expansion time, thanks to an approximation we propose. Finally, we discuss the state of Personalized Information Retrieval evaluation and the publicly available datasets, and we propose and share a novel large-scale benchmark spanning four domains, with more than 18 million documents and 1.9 million queries. We present a detailed description of the benchmark construction procedure, highlighting its characteristics and challenges, and provide baselines for future work. The solutions and findings presented in this dissertation show that Personalized Search is still an open research area. Moreover, the new opportunities brought by recent advances in Neural Networks also introduce new challenges that need to be addressed correctly, both to take full advantage of their potential and to make them valuable for real-world Personalized Search applications.
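
The Softmax shortcoming mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's Denoising Attention; it only shows, for a hypothetical four-item user history with made-up query-to-item scores, two standard properties of Softmax: it is invariant to adding the same constant to every score, and its weights always sum to 1. Together they mean that a history in which no item is relevant to the current query still produces a full-strength user representation.

```python
import math

def softmax(scores):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical query-to-history-item scores for a four-item user history.
all_irrelevant = [-9.0, -9.0, -9.0, -9.0]
all_relevant = [9.0, 9.0, 9.0, 9.0]

# Shifting every score by the same constant leaves the output unchanged,
# so Softmax cannot distinguish "everything irrelevant" from "everything relevant".
print(softmax(all_irrelevant))  # [0.25, 0.25, 0.25, 0.25]
print(softmax(all_relevant))    # [0.25, 0.25, 0.25, 0.25]

# The weights always sum to 1, so the weighted user profile is never "switched off".
print(sum(softmax(all_irrelevant)))  # 1.0
```

Subtracting the maximum only prevents overflow in `math.exp`; it does not change the result, which is exactly the shift invariance being illustrated.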

2

Bartow, Paul J. "Information retrieval." Online version of thesis, 1991. http://hdl.handle.net/1850/12169.

3

Lui, Chang. "Synatic Information Retrieval." Thesis, University of Ulster, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.516287.

4

Dunlop, Mark David. "Multimedia information retrieval." Thesis, University of Glasgow, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.358626.

5

Keim, Michelle. "Bayesian information retrieval." Thesis, University of Washington (restricted access), 1997. http://hdl.handle.net/1773/8937.

6

Brucato, Matteo. "Temporal Information Retrieval." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2013. http://amslaurea.unibo.it/5690/.

7

Buscaldi, Davide. "Toponym Disambiguation in Information Retrieval." Doctoral thesis, Universitat Politècnica de València, 2010. http://hdl.handle.net/10251/8912.

Abstract:

In recent years, geography has acquired great importance in the context of Information Retrieval (IR) and, more generally, of the automated processing of information in text. Mobile devices that can browse the web and at the same time report their position are now a common reality, together with applications that can exploit this data to provide users with locally customised information, such as directions or advertisements. Therefore, it is important to deal properly with the geographic information included in electronic texts. Most of this information is contained in place names, or toponyms. Toponym ambiguity is an important issue in Geographical Information Retrieval (GIR), because queries are geographically constrained. There has been a struggle to find specific geographical IR methods that actually outperform traditional IR techniques, and toponym ambiguity may be a relevant factor in the inability of current GIR systems to take advantage of geographical knowledge. Recently, some Ph.D. theses have dealt with Toponym Disambiguation (TD) from different perspectives, from the development of resources for the evaluation of Toponym Disambiguation (Leidner (2007)) to the use of TD to improve geographical scope resolution (Andogah (2010)). The Ph.D. thesis presented here introduces a TD method based on WordNet and carries out a detailed study of the relationship of Toponym Disambiguation to some IR applications, such as GIR, Question Answering (QA) and Web retrieval. The work presented in this thesis starts with an introduction to the applications in which TD may be useful, together with an analysis of the ambiguity of toponyms in news collections. It would not be possible to study the ambiguity of toponyms without studying the resources that are used as place-name repositories; these resources are the equivalent of language dictionaries, which provide the different meanings of a given word.
Buscaldi, D. (2010). Toponym Disambiguation in Information Retrieval [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8912
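
To give a feel for what Toponym Disambiguation involves, the following Python sketch picks, for an ambiguous place name, the candidate referent whose description shares the most words with the surrounding text. It is a simple context-overlap heuristic and not the WordNet-based method of the thesis; the mini-gazetteer, its entries and the function name `disambiguate` are hypothetical.

```python
import re

# Hypothetical mini-gazetteer: each candidate referent of a toponym is described
# by a few related place names (a stand-in for a real placename repository).
GAZETTEER = {
    "cambridge": {
        "Cambridge, England": {"england", "uk", "cambridgeshire", "london"},
        "Cambridge, Massachusetts": {"massachusetts", "usa", "boston", "harvard"},
    },
}

def disambiguate(toponym, context):
    """Return the referent whose description overlaps most with the context words."""
    words = set(re.findall(r"[a-z]+", context.lower()))
    candidates = GAZETTEER[toponym.lower()]
    # Ties are resolved in favour of the first candidate listed in the gazetteer.
    return max(candidates, key=lambda ref: len(candidates[ref] & words))

print(disambiguate("Cambridge", "A new lab opened near Boston, Massachusetts."))
# Cambridge, Massachusetts
```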

8

Morgenroth, Karlheinz. "Kontextbasiertes Information-Retrieval: Modell, Konzeption und Realisierung kontextbasierter Information-Retrieval-Systeme" [Context-based information retrieval: model, design, and implementation of context-based information retrieval systems]. Berlin: Logos, 2006. http://deposit.ddb.de/cgi-bin/dokserv?id=2786087&prov=M&dok_var=1&dok_ext=htm.

9

Koenders, Michael. "FROM MUSIC INFORMATION RETRIEVAL (MIR) TO INFORMATION RETRIEVAL FOR MUSIC (IRM)." Thesis, The University of Sydney, 2016. http://hdl.handle.net/2123/16914.

Abstract:

This thesis reviews and discusses certain techniques from the domain of (Music) Information Retrieval, in particular some general data mining algorithms. It also describes their specific adaptations for use as building blocks in the CACE4 software application. The use of Augmented Transition Networks (ATN) from the field of (Music) Information Retrieval is, to a certain extent, adequate as long as one keeps the underlying tonal constraints and rules as a guide to understanding the structure one is looking for. However, since a large proportion of algorithmic music, including music composed by the author, is atonal, tonal constraints and rules are of little use. Analysis methods from Hierarchical Clustering Techniques (HCT) such as k-means and Expectation-Maximisation (EM) facilitate other approaches and are better suited for finding (clustered) structures in large data sets. ART2 (Adaptive Resonance Theory) neural networks, for example, can be used for analysing and categorising these data sets. Statistical tools such as histogram analysis, mean and variance, as well as correlation calculations, can provide information about connections between members of a data set. Altogether, this provides a diverse palette of usable data analysis methods and strategies for creating algorithmic atonal music. Now acting as (software) strategy tools, their use is determined by the quality of their output within a musical context, as demonstrated when they were developed and programmed into the Computer Assisted Composition Environment: CACE4. Music Information Retrieval techniques are therefore inverted: their specific techniques and associated methods of Information Retrieval and general data mining are used to access the organisation and constraints of abstract (non-specific musical) data in order to use and transform it in a musical composition.
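
The abstract mentions k-means as one way of finding clustered structures in data sets. The following is a minimal Python sketch of k-means (Lloyd's algorithm), a partitional clustering method; the (pitch, duration) feature vectors are hypothetical, and initializing the centroids with the first k points is a simplification chosen for reproducibility, not a detail taken from CACE4.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: alternate the assignment and centroid-update steps."""
    centroids = list(points[:k])  # simplistic deterministic initialization
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [centroid(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical (pitch, duration) features forming two groups.
notes = [(60, 1.0), (62, 1.0), (61, 2.0), (84, 0.25), (86, 0.5), (85, 0.25)]
centroids, clusters = kmeans(notes, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

Features on very different scales (here pitch values versus durations) are usually normalized first in practice, since otherwise the larger-scale feature dominates the distance.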

10

Osodo, Jennifer Akinyi. "An extended vector-based information retrieval system to retrieve e-learning content based on learner models." Thesis, University of Sunderland, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.542053.

11

Graf, Erik. "Human information processing based information retrieval." Thesis, University of Glasgow, 2011. http://theses.gla.ac.uk/5188/.

Abstract:

This work investigated the question of how the concept of relevance in Information Retrieval can be validated. The work is motivated by the persistent difficulty of defining the meaning of the concept, and by advances in the field of cognitive science. Analytical and empirical investigations are carried out with the aim of devising a principled approach to the validation of the concept. The foundation for this work was laid by interpreting relevance as a phenomenon occurring within the context of two systems: an IR system and the cognitive processing system of the user. In light of this cognitive interpretation of relevance, the lessons learnt in cognitive science with regard to the validation of cognitive phenomena were analysed. The analysis identified construct validity as the dominant approach to the validation of constructs in cognitive science. Construct validity is a way of conducting validation in scenarios where no direct observation of a phenomenon is possible. Given the limitations on directly observing a construct (i.e. a postulated theoretical concept), it bases validation on the evaluation of the construct's relations to other constructs. Based on the interpretation of relevance as a product of cognitive processing, it was concluded that the limitations on direct observation apply to its investigation. The evaluation of its applicability to an IR context focused on the exploration of the nomological network methodology. A nomological network is an analytically constructed set of constructs and their relations. The construction of such a network forms the basis for establishing construct validity through investigation of the relations between constructs. An analysis of contemporary insights into the nomological network methodology identified two important aspects with regard to its application in IR.
The first aspect is the choice of context and the identification of a pool of candidate constructs for inclusion in the network. The second consists of identifying criteria for selecting a set of constructs from the candidate pool. The identification of the pertinent constructs for the network was based on a review of the principles of cognitive exploration, and an analysis of the state of the art in text-based discourse processing and reasoning. On that basis, a list of known sub-processes contributing to the pertinent cognitive processing was presented. With a large number of potential candidates identified, the next step was to infer criteria for selecting an initial set of constructs for the network. The investigation of these criteria focused on pragmatic and meta-theoretical aspects. Based on a survey of experimental means in cognitive science and IR, five pragmatic criteria for the selection of constructs were presented. Considering meta-theoretically motivated criteria required investigating the specific challenges involved in validating highly abstract constructs. This question was explored based on the underlying considerations of the Information Processing paradigm and Newell's (1994) cognitive bands. This led to the identification of a set of three meta-theoretical criteria for the selection of constructs. Based on the criteria and the demarcated candidate pool, an IR-focused nomological network was defined. The network consists of the constructs of relevance and of the type and grade of word relatedness. A necessary prerequisite for making inferences based on a nomological network is the availability of validated measurement instruments for the constructs. To that end, two validation studies targeting the measurement of the type and grade of relations between words were conducted.
Clarifying the validity of the measurement instruments enabled the application of the nomological network. A first step of the application consisted of testing whether the constructs in the network are related to each other. Based on the alignment of measurements of relevance and the word-related constructs, this was concluded to be the case. The relation between the constructs was characterized by varying the word-related constructs over a large parameter space and observing the effect of this variation on relevance. Three hypotheses relating to different aspects of the relations between the word-related constructs and relevance were investigated. It was concluded that conclusive confirmation of the hypotheses requires an extension of the experimental means underlying the study. Based on converging observations from the empirical investigation of the three hypotheses, it was concluded that semantic and associative relations differ distinctly with regard to their impact on relevance estimation.

12

Abdulahhad, Karam. "Information retrieval modeling by logic and lattice : application to conceptual information retrieval." Thesis, Grenoble, 2014. http://www.theses.fr/2014GRENM014/document.

Abstract:

This thesis is situated in the context of logic-based Information Retrieval (IR) models. The work presented in this thesis is mainly motivated by the inadequate term-independence assumption, which is widely accepted in IR although terms are normally related, and also by the inferential nature of the relevance judgment process. Since formal logics are well suited to knowledge representation, and hence to representing relations between terms, and since they are also powerful systems for inference, logic-based IR is a promising line of work for building effective IR systems. However, a study of current logic-based IR models shows that these models generally have some shortcomings. First, logic-based IR models normally propose complex representations for documents and queries that are hard to obtain. Second, the retrieval decision d->q, which represents the matching between a document d and a query q, could be difficult to verify. Finally, the uncertainty measure U(d->q) is either ad hoc or hard to implement. In this thesis, we propose a new logic-based IR model to overcome most of these limitations. We use Propositional Logic (PL) as the underlying logical framework. We represent documents and queries as logical sentences written in Disjunctive Normal Form. We also argue that the retrieval decision d->q could be replaced by the validity of material implication. We then exploit the potential relation between PL and lattice theory to check whether d->q is valid. We first propose an intermediate representation of logical sentences, where they become nodes in a lattice whose partial order relation is equivalent to the validity of material implication. Accordingly, we transform checking the validity of d->q, which is a computationally intensive task, into a series of simple set-inclusion checks.
In order to measure the uncertainty of the retrieval decision U(d->q), we use the degree of inclusion function Z, which is capable of quantifying partial order relations defined on lattices. Finally, our model is capable of working efficiently on any logical sentence without restrictions, and is applicable to large-scale data. Our model also yields some theoretical results, including formalizing and showing the adequacy of van Rijsbergen's assumption about estimating the logical uncertainty U(d->q) through the conditional probability P(q|d), redefining the two notions of Exhaustivity and Specificity, and the possibility of reproducing most classical IR models as instances of our model. We build three operational instances of our model: one to study the importance of Exhaustivity and Specificity, and two others to show the inadequacy of the term-independence assumption. Our experimental results show a worthwhile gain in performance when integrating Exhaustivity and Specificity into one concrete IR model. However, the results of using semantic relations between terms were not sufficient to draw clear conclusions. By contrast, experiments on exploiting structural relations between terms were promising. The work presented in this thesis can be developed further either by more experiments, especially on using relations, or by more in-depth theoretical study, especially of the properties of the Z function.
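
The reduction of the validity check for d->q to set-inclusion tests can be sketched in Python for a restricted case. Assuming that documents and queries are DNF sentences over positive term atoms only (no negation), a conjunction of atoms implies another exactly when the second's atoms are a subset of the first's, and d implies q exactly when every disjunct of d implies some disjunct of q. This illustrates the idea only; it is not the thesis's lattice construction, the function names are hypothetical, and the uncertainty function Z is not modeled.

```python
from typing import FrozenSet, List

# Assumption: a clause is a conjunction of positive term atoms (no negation),
# and a sentence is a disjunction of such clauses (positive DNF).
Clause = FrozenSet[str]
Sentence = List[Clause]

def clause_implies(d_clause: Clause, q_clause: Clause) -> bool:
    """A conjunction of atoms implies another iff its atoms include the other's."""
    return q_clause <= d_clause

def implies(d: Sentence, q: Sentence) -> bool:
    """For positive DNF, d -> q is valid iff every disjunct of d implies some disjunct of q."""
    return all(any(clause_implies(di, qj) for qj in q) for di in d)

d = [frozenset({"information", "retrieval", "logic"})]
q = [frozenset({"retrieval", "logic"}), frozenset({"lattice"})]
print(implies(d, q))  # True: {retrieval, logic} is a subset of the document's atoms
print(implies(q, d))  # False
```

With negated literals this shortcut no longer holds in general: for example, a implies (a AND b) OR (a AND NOT b), even though a implies neither disjunct on its own. That is why the sketch restricts itself to positive atoms.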

13

Romano, Nicholas C., Dmitri G. Roussinov, Jay F. Nunamaker, and Hsinchun Chen. "Collaborative Information Retrieval Environment: Integration of Information Retrieval with Group Support Systems." HICSS, 1999. http://hdl.handle.net/10150/105688.

Abstract:

Artificial Intelligence Lab, Department of MIS, University of Arizona
Observations of Information Retrieval (IR) system user experiences reveal a strong desire for collaborative search, while at the same time suggesting that collaborative capabilities are rarely supported by current searching and visualization tools, and then only in a limited fashion. Equally interesting is the fact that observations of user experiences with Group Support Systems (GSS) reveal that, although access to external information and the ability to search for relevant material are often vital to the progress of GSS sessions, integrated support for collaborative searching and visualization of results is lacking in GSS systems. After reviewing user experiences described in the IR and GSS literature and observing and interviewing users of existing IR and GSS commercial and prototype systems, the authors conclude that there is an obvious demand for systems supporting multi-user IR. It is surprising to the authors that very little attention has been given to the common ground shared by these two important research domains. With this in mind, our paper describes how user experiences with IR and GSS systems have shed light on a promising new area of collaborative research and led to the development of a prototype that merges the two paradigms into a Collaborative Information Retrieval Environment (CIRE). Finally, the paper presents theory developed from initial user experiences with our prototype and describes plans to test the efficacy of this new paradigm empirically through controlled experimentation.

14

Malek, Behzad. "Efficient private information retrieval." Thesis, University of Ottawa (Canada), 2005. http://hdl.handle.net/10393/26966.

Abstract:

In this thesis, we study Private Information Retrieval and Oblivious Transfer, two strong cryptographic tools that are widely used in various security-related applications, such as private data-mining schemes and secure function evaluation protocols. We propose the first non-interactive secure dot-product protocol, a primitive widely used in private data-mining schemes, based on trace functions over finite fields. We further improve the communication overhead of the best previously known Oblivious Transfer protocol from O((log n)^2) to O(log n), where n is the size of the database. Our communication-efficient Oblivious Transfer protocol is a non-interactive, single-database scheme that is generally built on Homomorphic Encryption Functions. We also introduce a new protocol that reduces the computational overhead of Private Information Retrieval protocols. This protocol is shown to be computationally secure for users, depending on the security of the McEliece public-key cryptosystem. The total online computational overhead is the same as in the case where no privacy is required. The computation-saving protocol can be implemented entirely in software, without any need to install secure hardware or to replicate the database among servers.
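
To make the goal of Private Information Retrieval concrete, the following Python sketch implements the textbook two-server XOR scheme, in which the client learns one database bit while each server sees only a uniformly random set of indices. This is a different and much simpler scheme than those in the thesis: it assumes two identical, non-colluding replicas (exactly the replication the thesis's protocol avoids), and its O(n) communication per query is far higher than the overheads discussed in the abstract. The function names are illustrative.

```python
import secrets

def server_answer(db_bits, subset):
    """A server XORs together the database bits at the requested indices."""
    answer = 0
    for j in subset:
        answer ^= db_bits[j]
    return answer

def private_fetch(db_bits, i):
    """Retrieve bit i from two non-colluding replicas without revealing i to either."""
    n = len(db_bits)
    # Client: draw a uniformly random subset of indices.
    s1 = {j for j in range(n) if secrets.randbits(1)}
    # Toggle index i; on its own, s2 is also a uniformly random subset.
    s2 = s1 ^ {i}
    a1 = server_answer(db_bits, s1)  # query sent to replica 1
    a2 = server_answer(db_bits, s2)  # query sent to replica 2
    # Every index except i appears in both sets or in neither, so those bits cancel.
    return a1 ^ a2

database = [1, 0, 1, 1, 0, 0, 1, 0]
print([private_fetch(database, i) for i in range(len(database))])
# [1, 0, 1, 1, 0, 0, 1, 0]
```

Privacy here is information-theoretic but holds only if the two servers do not collude: anyone who sees both s1 and s2 learns i immediately from their symmetric difference.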

15

Arapakis, Ioannis. "Affect-based information retrieval." Thesis, University of Glasgow, 2010. http://theses.gla.ac.uk/1867/.

Abstract:

One of the main challenges that Information Retrieval (IR) systems face nowadays originates from the semantic gap problem: the semantic difference between a user's query representation and the internal representation of an information item in a collection. The gap is further widened when the user is driven by an ill-defined information need, often the result of an anomaly in his/her current state of knowledge. The formulated search queries, which are submitted to the retrieval systems to locate relevant items, produce poor results that do not address the users' information needs. To deal with uncertainty in the information need, IR systems have in the past employed a range of relevance feedback (RF) techniques, which vary from explicit to implicit. The first category of feedback techniques requires users to communicate explicit relevance judgments in return for better query reformulations and recommendations of relevant results. However, this comes at the expense of users' cognitive resources and, furthermore, introduces an additional layer of complexity to the search process. On the other hand, implicit feedback techniques infer what is relevant from observations of user search behaviour. By doing so, they relieve users of the cognitive burden of document rating and relevance assessment. However, both categories of RF techniques determine topical relevance with respect to the cognitive and situational levels of interaction, failing to acknowledge the importance of emotions in cognition and decision making. In this thesis I investigate the role of emotions in the information seeking process and develop affective feedback techniques for interactive IR. This novel feedback framework aims to aid the search process and facilitate a more natural and meaningful interaction. I develop affective models that determine topical relevance based on information gathered from various sensory channels, and enhance their performance using personalisation techniques.
Furthermore, I present an operational video retrieval system that employs affective feedback to enrich user profiles and offers meaningful recommendations of unseen videos. The use of affective feedback as a surrogate for the information need is formalised as the Affective Model of Browsing. This is a cognitive model that motivates the use of evidence extracted from the psycho-somatic mobilisation that occurs during cognitive appraisal. Finally, I address some of the ethical and privacy issues that arise from the social-emotional interaction between users and computer systems. This study involves questionnaire data gathered over three user studies from 74 participants with different educational backgrounds, ethnicities and levels of search experience. The results show that affective feedback is a promising area of research and that it can improve many aspects of the information seeking process, such as indexing, ranking and recommendation. Eventually, relevance inferences obtained from affective models may provide a more robust and personalised form of feedback, which would allow us to deal more effectively with issues such as the semantic gap.

16

Plachouras, Vasileios. "Selective web information retrieval." Thesis, University of Glasgow, 2006. http://theses.gla.ac.uk/1945/.

Abstract:

This thesis proposes selective Web information retrieval, a framework formulated in terms of statistical decision theory, with the aim of applying an appropriate retrieval approach on a per-query basis. The main component of the framework is a decision mechanism that makes this per-query selection. The selection of a particular retrieval approach is based on the outcome of an experiment, which is performed before the final ranking of the retrieved documents. The experiment is a process that extracts features from a sample of the set of retrieved documents. This thesis investigates three broad types of experiments. The first counts the occurrences of query terms in the retrieved documents, indicating the extent to which the query topic is covered in the document collection. The second considers information from the distribution of retrieved documents in larger aggregates of related Web documents, such as whole Web sites or directories within Web sites. The third estimates the usefulness of the hyperlink structure among a sample of the set of retrieved Web documents. The proposed experiments are evaluated in the context of both informational and navigational search tasks with an optimal Bayesian decision mechanism, where it is assumed that relevance information exists. This thesis further investigates the implications of applying selective Web information retrieval in an operational setting, where the tuning of a decision mechanism is based on limited existing relevance information and the information retrieval system’s input is a stream of queries related to mixed informational and navigational search tasks. First, the experiments are evaluated using different training and testing query sets, as well as a mixture of different types of queries.
Second, query sampling is introduced, in order to approximate the queries that a retrieval system receives, and to tune an ad-hoc decision mechanism with a broad set of automatically sampled queries.
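The following is a minimal sketch of the per-query decision step. It rests on several assumptions. Each query's experiment yields one score, for instance the share of sampled retrieved documents that contain all query terms. Scores are grouped into bins. For each training query, the best-performing approach is known. Under a 0/1 loss with empirical probabilities, the Bayes decision is to pick, for each bin, the approach that was most often best on the training queries. The thesis's actual experiments, utilities and decision rule may differ. The bin edges, approach labels and function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

def bin_index(value, edges):
    """Map an experiment outcome to a bin using sorted bin edges."""
    return bisect_right(edges, value)

def train_decision(training, edges):
    """training: list of (experiment_outcome, best_approach) pairs.

    Returns a per-bin rule (the approach most often best in that bin, i.e.
    the empirical MAP choice under 0/1 loss) and a global fallback used for
    bins that received no training queries.
    """
    counts = defaultdict(Counter)
    overall = Counter()
    for outcome, best in training:
        counts[bin_index(outcome, edges)][best] += 1
        overall[best] += 1
    rule = {b: c.most_common(1)[0][0] for b, c in counts.items()}
    fallback = overall.most_common(1)[0][0]
    return rule, fallback

def select_approach(outcome, rule, fallback, edges):
    """Choose a retrieval approach for a new query from its experiment outcome."""
    return rule.get(bin_index(outcome, edges), fallback)
```

Binning keeps the example small. A real mechanism could instead estimate class-conditional densities of the experiment outcome.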

17

S, Kralina G., and Tupota E. V. "The information retrieval technology." Thesis, Київ, Національний авіаційний університет, 2009. http://er.nau.edu.ua/handle/NAU/18794.

Abstract:

Information retrieval is the science of searching for documents, for information within documents and for metadata about documents, as well as that of searching relational databases and the World Wide Web. There is overlap in the usage of the terms data retrieval, document retrieval, information retrieval and text retrieval, but each also has its own body of literature, theory, praxis and technologies.

18

Adebayo, Kolawole John <1986>. "Multimodal Legal Information Retrieval." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amsdottorato.unibo.it/8634/1/ADEBAYO-JOHN-tesi.pdf.

Abstract:

The goal of this thesis is to present a multifaceted way of inducing semantic representations from legal documents and of accessing information in a precise and timely manner. The thesis explores approaches to semantic information retrieval (IR) in the legal context using a technique that maps specific parts of a text to the relevant concept. The technique segments text using Latent Dirichlet Allocation (LDA), a topic modeling algorithm. It then expands the concepts using Natural Language Processing techniques and associates the text segments with the concepts using a semi-supervised text similarity technique. This addresses two problems: user specificity in formulating queries, and information overload. Querying a large document collection with a set of concepts is more fine-grained, since specific information, rather than full documents, is retrieved. The second part of the thesis describes our Neural Network Relevance Model for E-Discovery Information Retrieval. Our algorithm is essentially a feature-rich ensemble system in which different component neural networks extract different relevance signals. The model has been trained and evaluated on the TREC Legal track 2010 data. The performance of our models across the board shows that they capture the semantics and relatedness between query and document, which is important in the legal information retrieval domain.

19

Koopman, Bevan Raymond. "Semantic search as inference : applications in health informatics." Thesis, Queensland University of Technology, 2014. https://eprints.qut.edu.au/71385/1/Bevan_Koopman_Thesis.pdf.

Abstract:

This thesis developed new search engine models that elicit the meaning behind the words found in documents and queries, rather than simply matching keywords. These new models were applied to searching medical records: an area where search is particularly challenging yet can have significant benefits to our society.

20

Powell, Allison L. "Database selection in distributed information retrieval: a study of multi-collection information retrieval." 2001. http://viva.lib.virginia.edu/etd/diss/SEAS/ComputerScience/2001/Powell/etd.pdf.

21

Putri, Divi Galih Prasetyo. "Multidimensional Relevance in Task-Specific Retrieval." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2021. http://hdl.handle.net/10281/329919.

Abstract:

Relevance is the core notion in Information Retrieval. Several criteria of relevance have been proposed in the literature, and they are strongly related to the search task. Thus, it is important to employ the criteria that are useful for the search task under consideration. This research explores the concept of multidimensional relevance in a specific search task. In the first phase of this PhD thesis, we investigate the relationship between search tasks and the relevance dimensions considered. We performed an exploratory study on different search tasks in the microblog search context and identified some related relevance dimensions. Our findings show that there is a relation between a task and specific relevance dimensions. This suggests that in different search tasks, some relevance dimensions should be prioritized while others should not be considered. In the second part, we propose an approach that combines more than one relevance dimension. In particular, recent advances in deep neural networks enable several learning tasks to be solved simultaneously. Building on this, we examine the possibility of modeling multidimensional relevance by jointly solving a retrieval task, to learn topical relevance, and a classification task, to learn additional relevance dimensions. To instantiate and evaluate the proposed model, we consider three query-independent relevance dimensions beyond topicality: readability, trustworthiness and credibility. The findings show that the proposed joint modeling can improve the performance of the retrieval task.
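Joint modelling can be illustrated with a multi-task objective: a ranking loss plus weighted auxiliary classification losses. This is a generic sketch, not the thesis's model. Several pieces are assumptions:

- the pairwise hinge loss;
- the binary cross-entropy for each query-independent dimension (e.g. readability or credibility);
- the value of `weight`.

The function names are hypothetical.

```python
import math

def pairwise_hinge(pos_score, neg_score, margin=1.0):
    """Ranking loss: zero once the relevant document outscores the
    non-relevant one by at least `margin`."""
    return max(0.0, margin - (pos_score - neg_score))

def bce_with_logit(logit, label):
    """Numerically stable binary cross-entropy for one auxiliary label
    (1 = e.g. readable, 0 = not readable), computed directly from a logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def joint_loss(pos_score, neg_score, aux_logits, aux_labels, weight=0.5):
    """Retrieval (ranking) loss plus `weight` times the summed auxiliary
    classification losses, one per additional relevance dimension."""
    aux = sum(bce_with_logit(z, y) for z, y in zip(aux_logits, aux_labels))
    return pairwise_hinge(pos_score, neg_score) + weight * aux
```

A shared encoder trained on this sum receives gradients from both tasks. This is the usual motivation for multi-task learning.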

22

Paulsen, Jon Rune. "Optimal Information Retrieval Model for Molecular Biology Information." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2007. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-8718.

Abstract:

Search engines for biological information are not a new technology: since the 1960s, computers have been an important tool for biologists. Online Mendelian Inheritance in Man (OMIM) is a comprehensive catalogue containing approximately 14,000 records with information about human genes and genetic disorders. Latent Semantic Indexing (LSI), an approach based on Singular Value Decomposition (SVD), was introduced in 1990; it improved retrieval and reduced storage requirements. This thesis applies LSI to the collection of OMIM records. To further improve retrieval effectiveness and efficiency, the author proposes a clustering method based on the standard k-means algorithm, called Two step k-means. The standard k-means and Two step k-means algorithms are tested and compared with each other.
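LSI followed by clustering can be sketched with NumPy under stated assumptions:

- the input is a small dense term-by-document count matrix;
- standard k-means runs on length-normalised LSI document vectors, with a deterministic farthest-point initialisation chosen here for reproducibility;
- retrieval is restricted to the cluster closest to the query.

This does not reproduce the thesis's Two step k-means. All function names are hypothetical.

```python
import numpy as np

def lsi(term_doc, k):
    """Truncated SVD of a term-by-document matrix.

    Returns U_k (terms x k) and doc_vectors (docs x k); row j of doc_vectors
    equals U_k^T times column j of term_doc (i.e. V_k scaled by s_k).
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], Vt[:k, :].T * s[:k]

def unit_rows(x):
    """Scale each row to unit length; zero rows stay zero."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)

def kmeans(points, n_clusters, n_iter=20):
    """Standard k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < n_clusters:
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = points[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return labels, centroids

def cluster_search(query, U_k, doc_vectors, labels, centroids):
    """Project the term-space query into LSI space, pick the closest cluster
    by cosine, and rank only that cluster's documents by cosine."""
    q = unit_rows((query @ U_k)[None, :])[0]
    best = int(np.argmax(unit_rows(centroids) @ q))
    members = np.flatnonzero(labels == best)
    sims = unit_rows(doc_vectors[members]) @ q
    return members[np.argsort(-sims)].tolist()
```

Restricting the ranking to one cluster trades some recall for fewer comparisons. That is the efficiency motivation for clustering mentioned above.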

23

Smith, Stephen C. "Reducing information overload by optimising information retrieval approaches." Thesis, Loughborough University, 2010. https://dspace.lboro.ac.uk/2134/35821.

Abstract:

The information within an organisation forms a fundamental part of its success. In recent years, the volume of information housed and processed by organisations has increased exponentially, growing at such a rate that it can be difficult to harness and make successful use of it. This growth has led to the increasing prevalence of the concept of information overload. Although information overload is not a new concept, it is still considered a large-scale problem, with increasingly detrimental effects on the workplace and employees. With the increase in available information comes the potential for greater overload. This research addresses some of the barriers that may prevent effective discovery, storage and sharing of information and thereby worsen the information overload problem.

24

Yu, Hongming. "A Personalized Information Environment System for Information Retrieval." University of Cincinnati / OhioLINK, 2003. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1060875911.

25

Htun, Nyi Nyi. "Non-uniform information access in collaborative information retrieval." Thesis, Glasgow Caledonian University, 2017. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.738690.

26

Grangier, David. "Machine learning for information retrieval." Lausanne : École polytechnique fédérale de Lausanne, 2008. http://aleph.unisg.ch/volltext/464553_Grangier_Machine_learning_for_information_retrieval.pdf.

27

Homann, Ingo R. "Fuzzy-Suchmethoden im Information-Retrieval [Fuzzy search methods in information retrieval]." [S.l. : s.n.], 2004. http://deposit.ddb.de/cgi-bin/dokserv?idn=971067163.

28

Craswell, Nicholas Eric. "Methods for Distributed Information Retrieval." The Australian National University. Faculty of Engineering and Information Technology, 2001. http://thesis.anu.edu.au./public/adt-ANU20020315.142540.

Abstract:

Published methods for distributed information retrieval generally rely on cooperation from search servers. But most real servers, particularly the tens of thousands available on the Web, are not engineered for such cooperation. This means that the majority of methods proposed, and evaluated in simulated environments of homogeneous cooperating servers, are never applied in practice.
This thesis introduces new methods for server selection and results merging. The methods do not require search servers to cooperate, yet are as effective as the best methods that do. Two large experiments evaluate the new methods against many previously published methods. In contrast to previous experiments, they simulate a Web-like environment, where servers employ varied retrieval algorithms and tend not to sub-partition documents from a single source.
The server selection experiment uses pages from 956 real Web servers, three different retrieval systems and TREC ad hoc topics. Results show that a broker using queries to sample servers' documents can perform selection over non-cooperating servers without loss of effectiveness. However, using the same queries to estimate the effectiveness of servers, in order to favour servers with high-quality retrieval systems, did not consistently improve selection effectiveness.
The results merging experiment uses documents from five TREC sub-collections, five different retrieval systems and TREC ad hoc topics. Results show that a broker using a reference set of collection statistics, rather than relying on cooperation to collate true statistics, can perform merging without loss of effectiveness. Applying the reference statistics method requires the broker to download the documents to be merged. Experiments were therefore also conducted on effective merging based on partial documents. The new ranking method developed was not highly effective on partial documents, but showed some promise on fully downloaded documents.
Using the new methods, an effective search broker can be built that is capable of addressing any given set of available search servers without their cooperation.
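Results merging with reference statistics can be sketched under these assumptions:

- the broker has downloaded each returned document as a token list;
- the reference statistics are document frequencies from a separate reference collection;
- documents are rescored with a simple length-normalised TF-IDF.

The thesis's actual scoring function is not given here. Names such as `make_reference_idf` and `merge_results` are hypothetical.

```python
import math
from collections import Counter

def make_reference_idf(reference_docs):
    """Build an IDF function from a reference collection (lists of tokens),
    so that every server's documents are scored with the same statistics."""
    n = len(reference_docs)
    df = Counter()
    for doc in reference_docs:
        df.update(set(doc))

    def idf(term):
        # Smoothed IDF; terms unseen in the reference set get the highest weight.
        return math.log(1 + n / (1 + df[term]))
    return idf

def merge_results(query, server_results, idf):
    """server_results: {server: [(doc_id, tokens), ...]} as downloaded.

    Server-specific scores are ignored; each document is rescored with the
    shared reference statistics, making the merged ranking comparable.
    """
    scored = []
    for server, docs in server_results.items():
        for doc_id, tokens in docs:
            tf = Counter(tokens)
            length_norm = math.sqrt(len(tokens)) or 1.0
            score = sum(tf[t] * idf(t) for t in query) / length_norm
            scored.append((score, server, doc_id))
    scored.sort(key=lambda x: (-x[0], x[1], x[2]))
    return [(server, doc_id) for _, server, doc_id in scored]
```

Because every document is scored with the same IDF values, a term that is common on one server does not inflate that server's results relative to others.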

29

Sigge, Arne-Christian. "Digitale Softwaredokumentationen und Information-Retrieval [Digital software documentation and information retrieval]." Berlin: Logos-Verl., 2005. http://deposit.ddb.de/cgi-bin/dokserv?id=2757168&prov=M&dok_var=1&dok_ext=htm.

30

Sundaram, Senthil Karthikeyan. "Requirements Tracing Using Information Retrieval." UKnowledge, 2007. http://uknowledge.uky.edu/gradschool_diss/539.

Abstract:

It is important to track how a requirement changes throughout the software lifecycle. Each requirement should be validated during and at the end of each phase of the lifecycle. It is common to build traceability matrices to demonstrate that requirements are satisfied by the design, and such matrices are needed in various tasks in the software development process. Unfortunately, developers and designers do not always build or maintain traceability matrices to the proper level of detail, so the matrices are often built after the fact. Generating traceability matrices is a time-consuming, error-prone and mundane process, and most of the time it is done manually. Consider the case where an analyst is tasked with tracing a high-level requirements document to a lower-level requirements specification. The analyst may have to look through M × N elements, where M and N are the numbers of high- and low-level requirements, respectively. Few tools are available to assist analysts in tracing unstructured textual artifacts, and those that exist require extensive pre-processing. The prime objective of this work was to dynamically generate traceability links for unstructured textual artifacts using information retrieval (IR) methods. Given a user query and a document collection, IR methods identify all the documents that match the query. A closer look at the requirements tracing process reveals that it can be stated as a recursive IR problem. The main goals of this work were to solve the requirements traceability problem using IR methods and to improve the accuracy of the generated traceability links while making the best use of the analyst's time. This work looked into adopting different IR methods and using user feedback to improve the generated traceability links. It also applied refinements such as filtering to the original IR methods.
It also analyzed the use of a voting mechanism to select the traceability links identified by different IR methods. Finally, the IR methods were evaluated on six datasets. The results showed that automating the requirements tracing process with IR methods helped save analysts' time and generated good-quality traceability matrices.
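The sketch below shows one simple instance of IR-based tracing. It assumes the following:

- each requirement is a token list;
- high- and low-level requirements share one TF-IDF vocabulary;
- all M × N pairs are compared by cosine similarity;
- a similarity threshold plays the role of the filtering step.

TF-IDF is one common IR method, not necessarily one of those evaluated in the thesis. The threshold value and function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per doc."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(d).items()} for d in docs]

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def trace(high, low, threshold=0.1):
    """Candidate links (i, j, score) from high-level requirement i to
    low-level requirement j, keeping only pairs above the threshold."""
    vecs = tfidf_vectors(high + low)  # one shared vocabulary and IDF
    hv, lv = vecs[:len(high)], vecs[len(high):]
    links = []
    for i, h in enumerate(hv):
        for j, l in enumerate(lv):
            s = cosine(h, l)
            if s >= threshold:
                links.append((i, j, round(s, 3)))
    return links
```

The analyst would then vet the candidate links rather than all M × N pairs. Their accept/reject decisions are where user feedback could re-weight terms.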

31

Kural, S. Yasemin. "Clustering information retrieval search outputs." Thesis, City University London, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.312900.

32

Rhodes, Bradley James. "Just-in-time information retrieval." Thesis, Massachusetts Institute of Technology, 2000. http://hdl.handle.net/1721.1/9022.

Abstract:

Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Architecture, 2000.
Includes bibliographical references (p. 145-150) and index.
This thesis defines Just-In-Time Information Retrieval agents (JITIRs): a class of software agents that proactively present potentially valuable information based on a person's local context in an easily accessible yet non-intrusive manner. The research described experimentally demonstrates that such systems encourage the viewing and use of information that would not otherwise be viewed, by reducing the cognitive effort required to find, evaluate and access information. Experiments and analysis of long-term use provide a deeper understanding of the different ways JITIRs can be valuable: by providing useful or supporting information that is relevant to the current task, by contextualizing the current task in a broader framework, by providing information that is not useful in the current task but leads to the discovery of other information that is useful, and by providing information that is not useful for the current task but is valuable for other reasons. Finally, this research documents heuristics and techniques for the design of JITIRs. These techniques are based on theory and are demonstrated by the field-testing of three complete systems: the Remembrance Agent, Margin Notes, and Jimminy. Specifically, these heuristics are designed to make information accessible with low effort, and yet ignorable should the user wish to concentrate entirely on his primary task.
by Bradley James Rhodes.
Ph.D.
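The just-in-time behaviour described above can be sketched as an agent with three parts:

- it keeps a sliding window of the user's recent words;
- it compares that window with a personal document collection;
- it shows at most a few suggestions, and only when they clear a similarity threshold, so it stays easy to ignore.

This is a hypothetical illustration using bag-of-words cosine similarity. It is not the retrieval engine of the Remembrance Agent, Margin Notes or Jimminy. The class name, window size and threshold are assumptions.

```python
import math
from collections import Counter, deque

def _cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class JustInTimeAgent:
    """Hypothetical sketch of a just-in-time retrieval agent."""

    def __init__(self, corpus, window=20, max_hints=3, min_score=0.2):
        # corpus: {document name: text}, stored as bags of lower-cased words.
        self.corpus = {name: Counter(text.lower().split())
                       for name, text in corpus.items()}
        self.context = deque(maxlen=window)  # only the most recent words count
        self.max_hints = max_hints
        self.min_score = min_score

    def observe(self, typed_text):
        """Feed newly typed text and return the current suggestions."""
        self.context.extend(typed_text.lower().split())
        return self.suggest()

    def suggest(self):
        ctx = Counter(self.context)
        scored = [(_cosine(ctx, bag), name) for name, bag in self.corpus.items()]
        # Showing nothing beats showing weak matches: the agent stays ignorable.
        hits = [(s, name) for s, name in scored if s >= self.min_score]
        hits.sort(key=lambda x: (-x[0], x[1]))
        return [name for _, name in hits[: self.max_hints]]
```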

33

Kramer, Joshua David. "Agent based personalized information retrieval." Thesis, Massachusetts Institute of Technology, 1997. http://hdl.handle.net/1721.1/43539.

Abstract:

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997.
Includes bibliographical references (p. 69-74).
by Joshua David Kramer.
M.Eng.

34

Karlgren, Jussi. "Stylistic Experiments for Information Retrieval." Doctoral thesis, Stockholm University, SICS, 2000. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-187749.

Abstract:

Information retrieval systems are built to handle texts as topical items: texts are tabulated by the occurrence frequencies of the content words in them, under the assumption that text topic is reasonably well modeled by content word occurrence. But texts have several interesting characteristics beyond topic. The experiments described in this text investigate stylistic variation. Roughly put, style is the difference between two ways of saying the same thing, and systematic stylistic variation can be used to characterize the genre of documents. These experiments investigate whether stylistic information is distinguishable using simple language engineering methods. If it is, they further ask whether this type of information can be used to improve information retrieval systems. A first set of experiments shows that simple measures of stylistic variation can be used to distinguish genres from each other quite adequately; how well depends on which genres are in question. A second set of experiments evaluates the utility of stylistic measures for the purposes of information retrieval, to identify common characteristics of relevant and non-relevant documents. The conclusion is that requests for information, as typically expressed to retrieval systems, are too terse and unspecific for non-topical information to improve retrieval results. Systems for information access need to be designed from the beginning to handle richer information about the texts and documents at hand: information about stylistic variation cannot easily be added to an existing system. A third set of experiments explores how an interactive system can be designed to incorporate stylistic information in the interface between user and system. These experiments resulted in the design of an interface for categorizing retrieval results by genre and displaying the retrieval results using this categorization. This interface is integrated into a prototype for retrieving information from the World Wide Web.
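The sketch below computes a few simple, largely topic-independent stylistic measures of the kind these experiments describe. The feature set is an assumption for illustration, not the measures used in the thesis: average word length, average sentence length, type/token ratio, and the rate of first- and second-person pronouns. The naive regex tokenisation is also an assumption.

```python
import re

# First- and second-person pronouns: a hypothetical marker of personal,
# interactive style.
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your"}

def stylistic_features(text):
    """Compute a few simple stylistic measures for one text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {"avg_word_len": 0.0, "avg_sent_len": 0.0,
                "type_token": 0.0, "pronoun_rate": 0.0}
    return {
        "avg_word_len": sum(map(len, words)) / len(words),
        "avg_sent_len": len(words) / max(1, len(sentences)),
        "type_token": len(set(words)) / len(words),
        "pronoun_rate": sum(w in PRONOUNS for w in words) / len(words),
    }
```

One way to test whether genres are separable is to compare such feature vectors across documents of known genre. For example, a nearest-centroid rule could be applied after scaling each feature.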

35

Wilhelm-Stein, Thomas. "Information Retrieval in der Lehre [Information retrieval in teaching]." Doctoral thesis, Universitätsbibliothek Chemnitz, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-qucosa-199778.

Abstract:

Information retrieval has achieved great significance in the form of Internet search engines. Retrieval systems are used in a variety of search scenarios, including corporate support databases and the organization of personal emails. A current challenge is to determine and predict the performance of the individual components of these retrieval systems, in particular the complex interactions between them. Professionals are needed to implement and configure retrieval systems and retrieval components. Using the web-based learning application Xtrieval Web Lab, students can gain practical knowledge of the information retrieval process. They assemble retrieval components into a retrieval system and evaluate it, without having to use a programming language. Game mechanics guide students through this discovery process, motivate them and prevent information overload by dividing the learning content into parts.

36

Maxwell, Kylie Tamsin. "Term selection in information retrieval." Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/20389.

Abstract:

Systems trained on linguistically annotated data achieve strong performance for many language processing tasks. This encourages the idea that annotations can improve any language processing task if applied in the right way. However, despite widespread acceptance and availability of highly accurate parsing software, it is not clear that ad hoc information retrieval (IR) techniques using annotated documents and requests consistently improve search performance compared to techniques that use no linguistic knowledge. In many cases, retrieval gains made using language processing components, such as part-of-speech tagging and head-dependent relations, are offset by significant negative effects. This results in a minimal positive, or even negative, overall impact for linguistically motivated approaches compared to approaches that do not use any syntactic or domain knowledge. In some cases, it may be that syntax does not reveal anything of practical importance about document relevance. Yet without a convincing explanation for why linguistic annotations fail in IR, the intuitive appeal of search systems that ‘understand’ text can result in the repeated application, and mis-application, of language processing to enhance search performance. This dissertation investigates whether linguistics can improve the selection of query terms by better modelling the alignment process between natural language requests and search queries. It is the most comprehensive work on the utility of linguistic methods in IR to date. Term selection in this work focuses on identification of informative query terms of 1-3 words that both represent the semantics of a request and discriminate between relevant and non-relevant documents. Approaches to word association are discussed with respect to linguistic principles, and evaluated with respect to semantic characterization and discriminative ability. 
Analysis is organised around three theories of language that emphasize different structures for the identification of terms: phrase structure theory, dependency theory and lexicalism. The structures identified by these theories play distinctive roles in the organisation of language. Evidence is presented regarding the value of different methods of word association based on these structures, and the effect of method and term combinations. Two highly effective, novel methods for the selection of terms from verbose queries are also proposed and evaluated. The first method focuses on the semantic phenomenon of ellipsis with a discriminative filter that leverages diverse text features. The second method exploits a term ranking algorithm, PhRank, that uses no linguistic information and relies on a network model of query context. The latter focuses queries so that 1-5 terms in an unweighted model achieve better retrieval effectiveness than weighted IR models that use up to 30 terms. In addition, unlike models that use a weighted distribution of terms or subqueries, the concise terms identified by PhRank are interpretable by users. Evaluation with newswire and web collections demonstrates that PhRank-based query reformulation significantly improves the performance of verbose queries, by up to 14% compared to highly competitive IR models. It is also at least as good for short keyword queries with the same models. Results illustrate that linguistic processing may help with the selection of word associations but does not necessarily translate into improved IR performance. Statistical methods are necessary to overcome the limits of syntactic parsing and word adjacency measures for ad hoc IR. As a result, probabilistic frameworks that discover, and make use of, many forms of linguistic evidence may deliver small improvements in IR effectiveness. However, methods that use simple features can be substantially more efficient and equally, or more, effective.
Various explanations for this finding are suggested, including the probabilistic nature of grammatical categories, a lack of homomorphism between syntax and semantics, the impact of lexical relations, variability in collection data, and systemic effects in language systems.
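PhRank's details are not given here. Still, ranking terms with a network model of query context can be sketched with a generic weighted PageRank over a word co-occurrence graph. The graph could be built, for example, from a verbose query and a few pseudo-relevant documents. This is a TextRank-style stand-in, not PhRank itself. The window size, damping factor and function names are assumptions.

```python
from collections import defaultdict

def cooccurrence_graph(texts, window=2):
    """Undirected weighted graph: the edge weight counts how often two
    distinct words occur within `window` positions of each other."""
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in texts:
        for i, w in enumerate(tokens):
            for j in range(i + 1, min(i + window + 1, len(tokens))):
                v = tokens[j]
                if v != w:
                    graph[w][v] += 1.0
                    graph[v][w] += 1.0
    return graph

def rank_terms(graph, damping=0.85, iterations=50):
    """Weighted PageRank over the co-occurrence graph; best terms first."""
    nodes = sorted(graph)
    if not nodes:
        return []
    n = len(nodes)
    score = {u: 1.0 / n for u in nodes}
    out_weight = {u: sum(graph[u].values()) for u in nodes}
    for _ in range(iterations):
        score = {
            u: (1 - damping) / n
            + damping * sum(score[v] * graph[v][u] / out_weight[v] for v in graph[u])
            for u in nodes
        }
    return sorted(nodes, key=lambda u: (-score[u], u))
```

Keeping only the top few ranked terms gives the kind of short, interpretable reformulation discussed above. PhRank additionally scores multi-word sequences, which this sketch does not attempt.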

37

Shao, Bo. "User-centric Music Information Retrieval." FIU Digital Commons, 2011. http://digitalcommons.fiu.edu/etd/416.

Abstract:

The rapid growth of the Internet and advances in Web technologies have given users access to large amounts of online music data, including acoustic signals, lyrics, style/mood labels and user-assigned tags. This progress has made music listening more fun, but has raised the issue of how to organize this data and, more generally, how computer programs can assist users in their music experience. An important subject in computer-aided music listening is music retrieval, i.e., efficiently helping users locate the music they are looking for. Traditionally, songs were organized in a hierarchical structure such as genre->artist->album->track to facilitate navigation. However, users' intentions are often hard to capture in such a simple structure. Users may want to listen to music of a particular mood, style or topic, and/or to songs similar to given music samples. This motivated us to work on a user-centric music retrieval system to improve users' satisfaction with the system. Traditional music information retrieval research was mainly concerned with the classification, clustering, identification and similarity search of music acoustic data by way of feature extraction algorithms and machine learning techniques. More recently, music information retrieval research has focused on other types of data, such as lyrics, user access patterns and user-defined tags, and on non-genre categories for classification, such as mood labels and styles. This dissertation focused on investigating and developing effective data mining techniques for (1) organizing and annotating music data with styles, moods and user-assigned tags; (2) performing effective analysis of music data with features from diverse information sources; and (3) recommending songs to users using both content features and user access patterns.
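Combining content features with user access patterns can be sketched as a hybrid recommender, under these assumptions:

- each song has a numeric content feature vector (e.g. derived from acoustic or tag data);
- access patterns are per-user sets of played songs;
- co-access similarity is the Jaccard overlap of the songs' listener sets;
- a weight `alpha` blends the two similarities.

This is an illustrative recommender, not the dissertation's technique. Names and values are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length content feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def co_access_similarity(histories, a, b):
    """Jaccard overlap of the sets of users who played songs a and b."""
    users_a = {u for u, songs in histories.items() if a in songs}
    users_b = {u for u, songs in histories.items() if b in songs}
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0

def recommend(user, histories, features, alpha=0.5, k=3):
    """Rank songs the user has not played by their best hybrid similarity
    to any song the user has played; return the top k."""
    heard = histories[user]
    scores = {}
    for song in features:
        if song in heard:
            continue
        scores[song] = max(
            (alpha * cosine(features[song], features[h])
             + (1 - alpha) * co_access_similarity(histories, song, h)
             for h in heard),
            default=0.0,
        )
    return sorted(scores, key=lambda s: (-scores[s], s))[:k]
```

Setting `alpha` to 1.0 gives a pure content-based recommender and 0.0 a pure co-access one. This makes it easy to compare the two sources of evidence.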

38

Pande, Ashwini K. "Table Understanding for Information Retrieval." Thesis, Virginia Tech, 2002. http://hdl.handle.net/10919/34820.

Abstract:

This thesis proposes a novel approach for finding tables in text files containing a mixture of unstructured and structured text. Tables may be arbitrarily complex because the data in the tables may themselves be tables and because the grouping of data elements displayed in a table may be very complex. Although investigators have proposed competence models to explain the structure of tables, there are no computationally feasible performance models for detecting and parsing general structures in real data. Our emphasis is placed on the investigation of a new statistical procedure for detecting basic tables in plain text documents. The main task here is defining and testing this theory in the context of the Odessa Digital Library.
Master of Science
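The sketch below is a rough illustration of table detection in plain text, not the statistical procedure proposed in the thesis. It treats runs of two or more spaces or tabs as column gaps. It then reports blocks of at least `min_rows` consecutive lines that split into the same number of cells. The thresholds and function names are assumptions. Nested or spanning structures of the kind described above would defeat this heuristic.

```python
import re

def cells(line):
    """Split a line into cells on runs of two or more whitespace characters
    or on tabs, a common column gap in plain-text tables."""
    return [c for c in re.split(r"\s{2,}|\t", line.strip()) if c]

def find_tables(lines, min_rows=3, min_cols=2):
    """Return (start, end) line ranges, end exclusive, of blocks of at least
    `min_rows` consecutive lines that all split into the same number
    (>= min_cols) of cells."""
    tables = []
    start, prev_cols = None, None
    for i, line in enumerate(lines + [""]):  # empty sentinel closes a final block
        n = len(cells(line))
        if n >= min_cols and (start is None or n == prev_cols):
            if start is None:
                start = i
            prev_cols = n
            continue
        # The current block (if any) ends here; keep it if it is long enough.
        if start is not None and i - start >= min_rows:
            tables.append((start, i))
        # A line with a different cell count may start a new block.
        start, prev_cols = (i, n) if n >= min_cols else (None, None)
    return tables
```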

39

Muthukrishnan, Arvind Kumar. "Information Retrieval Using Concept Lattices." University of Cincinnati / OhioLINK, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1141055777.

40

Whiting, Stewart William. "Temporal dynamics in information retrieval." Thesis, University of Glasgow, 2015. http://theses.gla.ac.uk/6850/.

Abstract:

The passage of time is unrelenting. Time is an omnipresent feature of our existence, serving as a context that frames change driven by events and phenomena in our personal lives and social constructs. Accordingly, various elements of time are woven throughout information itself and through information behaviours such as creation, seeking, and utilisation. Time plays a central role in many aspects of information retrieval (IR): it can not only affect the interpretation of information, but also profoundly influence the intentions and expectations behind users' information seeking. Many time-based patterns and trends, namely temporal dynamics, are evident in streams of information behaviour by individuals and crowds. A temporal dynamic refers to a periodic regularity, or a one-off or irregular past, present, or future change, in a particular element (e.g., word, topic, or query popularity), driven by predictable and unpredictable time-based events and phenomena. Several challenges and opportunities related to temporal dynamics are apparent throughout IR. This thesis explores temporal dynamics from the perspective of query popularity and meaning, and of word use and relationships over time. More specifically, the thesis posits that temporal dynamics provide tacit meaning and structure to information and information seeking. As such, temporal dynamics are a ‘two-way street’: they must be supported, but conversely they can also be exploited to improve the effectiveness of time-aware IR. Real-time temporal dynamics in information seeking must be supported for consistent user satisfaction over time. Uncertainty about what the user expects is a perennial problem for IR systems, further confounded by changes over time. To alleviate this issue, IR systems can (i) help the user submit an effective query (e.g., error-free and descriptive), and (ii) better anticipate what the user is most likely to want in relevance ranking.
I first explore methods to help users formulate queries through time-aware query auto-completion, which can suggest both recently popular and perennially popular queries. I propose and evaluate novel approaches for time-sensitive query auto-completion and demonstrate state-of-the-art performance, with improvements of up to 9.2% over the hard baseline. Notably, I find that the results hold across diverse search scenarios in different languages, confirming the pervasive and language-agnostic nature of temporal dynamics. Furthermore, I explore the impact of temporal dynamics on the motives behind users' information seeking, and thus how relevance itself is subject to temporal dynamics. I find that temporal dynamics have a dramatic impact on what users expect over time for a considerable proportion of queries. In particular, I find that the most likely meaning of ambiguous queries is affected over short and long periods (e.g., hours to months) by the temporal dynamics of several periodic and one-off events. Additionally, I find that for event-driven multi-faceted queries, relevance can often be inferred by modelling the temporal dynamics of changes in related information. Beyond real-time temporal dynamics, previously observed temporal dynamics offer a complementary opportunity: a tacit dimension that can be exploited to build more effective IR systems. IR approaches are typically based on methods that characterise information through the statistical distributions of words and phrases. In this thesis I model and exploit the temporal dimension of the collection, characterised by its temporal dynamics, within these established IR approaches. I explore how the similarity of the temporal dynamics of word and phrase use in a collection can be exploited to infer temporal semantic relationships between terms.
I propose an approach to uncover a query topic's "chronotype" terms, that is, its most distinctive and temporally interdependent terms, based on a mix of temporal and non-temporal evidence. I find that exploiting chronotype terms in temporal query expansion leads to significantly improved retrieval performance on several time-based collections. Temporal dynamics provide both a challenge and an opportunity for IR systems. Overall, the findings presented in this thesis demonstrate that temporal dynamics can be used to derive the tacit structure and meaning of information and information behaviour, which is valuable for improving IR. Hence, time-aware IR systems that take temporal dynamics into account can satisfy users more consistently by anticipating changing user expectations and maximising retrieval effectiveness over time.
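
To make the idea of "temporal dynamic similarity" between terms concrete, the sketch below correlates the per-period frequency series of terms and ranks other terms by how closely they track a query term over time. It is a minimal illustration under stated assumptions: plain Pearson correlation over raw counts is an illustrative choice, the counts are invented, and the function names are hypothetical. It is not the chronotype method described in the thesis, which combines temporal with non-temporal evidence.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length frequency series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a flat series carries no temporal signal
    return cov / (sx * sy)

def temporally_related_terms(query_term, series, k=3):
    """Rank other terms by how closely their usage over time tracks query_term.

    series maps each term to its frequency in consecutive time periods
    (e.g., one count per week); all series must cover the same periods.
    """
    target = series[query_term]
    scored = [(term, pearson(target, counts))
              for term, counts in series.items() if term != query_term]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical weekly counts: "medal" and "torch" spike together with
# "olympics", while "weather" does not.
series = {
    "olympics": [1, 2, 40, 3, 1, 2, 38, 2],
    "medal":    [2, 3, 35, 4, 2, 3, 33, 3],
    "torch":    [0, 1, 20, 1, 0, 1, 18, 1],
    "weather":  [10, 11, 9, 10, 12, 10, 9, 11],
}
print(temporally_related_terms("olympics", series, k=2))
```

Terms whose usage rises and falls together are natural candidates for temporal query expansion. A real system would at least normalise the counts by the size of the collection in each period before correlating them.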

41

Wu, Bin. "Statistical physics of information retrieval." 2002. http://library.ust.hk/cgi/db/thesis.pl?PHYS%202002%20WU.

42

Cosijn, Erica. "Relevance judgements in information retrieval." Thesis, Pretoria [s.n.], 2003. http://upetd.up.ac.za/thesis/available/etd-09192005-145624/.

43

Addy, Nicholas G. "Ontology driven geographic information retrieval." Thesis, Curtin University, 2009. http://hdl.handle.net/20.500.11937/2526.

Abstract:

The theory of modern information retrieval processes must be improved to keep pace with the growth and efficiency of the hardware architectures on which it depends. The growth in data sources facilitated by hardware improvements must be matched by parallel growth at the user end of the information retrieval paradigm, encompassing both an increasing demand for data services and a widening user base. Contemporary sources refer to such growth as three-dimensional, in reference to the expected, parallel growth in three key areas: hardware processing power, demand from current users of information services, and demand from an extended user base of institutions and organizations that are not characteristically defined by their use of geographic information. This extended user base is expected to grow because of the demand to utilise and incorporate geographic information in competitive business processes, to meet the need for advertising and spatial marketing demographics. In this sense, the vision of the Semantic Web is the challenge of managing integration between diverse and increasing data sources and diverse and increasing end users of information. Whilst data standardisation is one means of achieving this vision at the source end of the information flow, it is not a solution in a free market of ideas. Information in its elemental form should be accessible regardless of the domain of its creation. In an environment where users and sources are continually growing in scope and depth, managing data via precise and relevant information retrieval requires techniques that can integrate information seamlessly between machines and users, regardless of the application domain or the data storage methods.
This research studies a theory of geographic information structure that can be applied to all aspects of information systems development, governing at a conceptual level the representation of information to meet the requirements of inter-machine as well as inter-user operability. It entails a thorough study of the use of ontology, from its theoretical definition to its modern use in information systems development and retrieval, in the geographic domain. The study examines how the words used to describe geographic features can form the elements of a geographic ontology, and evaluates WordNet, an English-language ontology in the form of a lexical database, as a structure for improving geographic information recall on gazetteers. The results indicate that WordNet can be used to improve search results in geographic information retrieval as a source of additional query terms, but only for a narrow user domain.
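
The query-expansion idea evaluated in this thesis can be illustrated with a toy gazetteer lookup: a feature-type term is expanded with synonyms before matching, which raises recall. In this sketch the synonym table is a small, hand-written, hypothetical stand-in for WordNet synsets, and the gazetteer entries are invented. With NLTK, for example, candidate terms could instead be drawn from `wordnet.synsets(term)` and each synset's `lemma_names()`.

```python
# Hypothetical stand-in for WordNet synonym sets; a real system would
# look these up in a lexical database instead of a hard-coded table.
SYNONYMS = {
    "stream": {"brook", "creek", "watercourse"},
    "hill": {"mound", "knoll"},
}

# Invented gazetteer entries: (place name, feature type).
GAZETTEER = [
    ("Mill Brook", "brook"),
    ("Cedar Creek", "creek"),
    ("Swan River", "river"),
    ("Bald Knoll", "knoll"),
]

def expand(term):
    """Return the term together with its (hypothetical) synonyms."""
    return {term} | SYNONYMS.get(term, set())

def search(feature_term, gazetteer=GAZETTEER, use_expansion=True):
    """Return place names whose feature type matches the (expanded) query term."""
    terms = expand(feature_term) if use_expansion else {feature_term}
    return [name for name, feature_type in gazetteer if feature_type in terms]

print(search("stream", use_expansion=False))  # → []
print(search("stream"))                       # → ['Mill Brook', 'Cedar Creek']
```

The thesis finds that expansion helps only for a narrow user domain. Generic synonyms can also pull in unwanted word senses, a problem this toy table avoids by construction.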

44

Åkesson, Mattias. "Passage Retrieval: en litteraturstudie av ett forskningsområde inom information retrieval." [Passage retrieval: a literature study of a research area within information retrieval.] Thesis, Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan, 2000. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-18347.

Abstract:

The aim of this thesis is to describe passage retrieval (PR), based on results from various empirical experiments, and to critically investigate different approaches to PR. The main questions addressed are: (1) What characterizes PR? (2) What approaches have been proposed? (3) How well do these approaches work in experimental information retrieval (IR)? PR is a research topic in IR that, instead of retrieving the full text of documents, which can lead to information overload for the user, tries to retrieve the most relevant passages within the documents. The technique was investigated by studying a number of central articles in the research field. PR approaches can be divided into three types based on how the documents are segmented. First, the text can be divided according to its semantics, i.e., where the topics change. Second, the text can be divided based on the explicit structure of the documents, with the help of a markup language such as SGML. Third, the text can be divided into parts containing a fixed number of words; this method is called unmotivated segmentation. The study showed that unmotivated segmentation resulted in the best retrieval effectiveness, although the results are difficult to compare because of different evaluation methods and different types of test collections. A combination of full-text retrieval and PR also showed improved results.
Thesis level: D
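
The third, "unmotivated" segmentation strategy is simple enough to sketch directly. The code below is a minimal illustration, not any of the systems surveyed in the thesis. It splits a document into overlapping windows of a fixed number of words and scores each window by query-term density. It then mixes the best window score with a whole-document score, echoing the finding that combining full-text retrieval and PR helps. The density score, the window size and stride, the mixing weight `alpha`, and all function names are illustrative assumptions.

```python
import re

def tokenize(text):
    """Lower-case word tokens; no stemming or stopword removal."""
    return re.findall(r"\w+", text.lower())

def fixed_passages(words, size=50, stride=25):
    """Unmotivated segmentation: windows of `size` words starting every `stride` words.

    With stride < size the windows overlap, so a relevant stretch of text is
    less likely to be cut in half at a window boundary.
    """
    if len(words) <= size:
        return [words]
    starts = list(range(0, len(words) - size + 1, stride))
    if starts[-1] + size < len(words):
        starts.append(len(words) - size)  # final window aligned to the end of the text
    return [words[i:i + size] for i in starts]

def density(words, query_terms):
    """Fraction of tokens that are query terms (a deliberately simple score)."""
    return sum(w in query_terms for w in words) / max(len(words), 1)

def rank(docs, query, alpha=0.5, size=50, stride=25):
    """Mix a whole-document score with the best passage score.

    alpha = 1.0 gives pure full-text ranking, alpha = 0.0 pure passage ranking.
    """
    q = set(tokenize(query))
    results = []
    for doc_id, text in docs.items():
        words = tokenize(text)
        doc_score = density(words, q)
        best_passage = max(density(p, q) for p in fixed_passages(words, size, stride))
        results.append((doc_id, alpha * doc_score + (1 - alpha) * best_passage))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Two invented 100-word documents with the same number of query terms:
# concentrated in one spot versus scattered throughout.
docs = {
    "concentrated": "retrieval passage retrieval passage " + "filler " * 96,
    "scattered": ("retrieval " + "filler " * 24) * 4,
}
print(rank(docs, "passage retrieval", size=10, stride=5))
```

With `alpha=1.0` both documents score the same, because their whole-document term densities are equal. Any weight on the passage score lets the document with a dense, focused passage rank first.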

45

Gundelsweiler, Fredrik. "INVISIP - Implementation eines Scatterplots zur Visualisierung von geo-räumlichen Metadaten." [INVISIP: implementation of a scatterplot for visualizing geospatial metadata.] [S.l. : s.n.], 2002. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB10252261.

46

Wania, Christine Elizabeth; Atwood, Michael E. "Examining the impact of an information retrieval pattern language on the design of information retrieval interfaces." Philadelphia, Pa. : Drexel University, 2008. http://hdl.handle.net/1860/2829.

47

PANZERI, EMANUELE. "Enhanced XML Retrieval with Flexible Constraints Evaluation." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2014. http://hdl.handle.net/10281/50791.

Abstract:

Since its standardization by the World Wide Web Consortium (W3C) in 1998, XML (eXtensible Markup Language) has been acknowledged as the de facto standard format for data, and it is employed in a wide and increasing number of application domains. XML allows data and textual content to be structured; the structural elements are specified in plain text using strings of characters that can easily be read by computer programs while remaining human-readable. XPath and XQuery are the two main standard languages defined to query XML data: they allow users to select a subset of elements from an XML document, to further manipulate their contents, and to restructure the document tree. Both XPath and XQuery are based on a database perspective of XML documents, where query clauses are evaluated as in the database query language SQL, from which both XML languages took inspiration. The data-centric perspective adopted by XQuery and XPath has recently been extended with an Information Retrieval oriented approach, in which a new set of content-based constraints allows IR-style full-text search with element relevance scoring. This extension, called XQuery/XPath Full-Text, has been standardized by the W3C. In the Information Retrieval community, other approaches have appeared that take the document structure into account and propose approximate structural matching techniques, where the standard XQuery and XPath structural constraints are evaluated by path relaxation algorithms. Such approaches, however, do not allow the user to express vague structural constraints whose approximate evaluation produces a set of weighted fragments, where each weight expresses the relevance of the fragment with respect to the structural constraints.
This thesis describes the definition and implementation of a formal XQuery Full-Text extension named FleXy, which takes the user perspective into account in the formulation of structure-based constraints by allowing vagueness to be associated with their specification. FleXy has been designed as an extension of the XQuery Full-Text language, so that it inherits both the full-text search features of the Full-Text extension and the standard element selection provided by XQuery. The evaluation of the two new vague structural constraints defined in FleXy, named Below and Near, produces a set of weighted elements, where a structural score is computed from the distance between the target element required by the user and the element actually retrieved. Threshold variants of the Below and Near constraints have also been defined, which allow the user to specify the extent to which the vague structural constraints are applied. The formal definition of the FleXy language is provided through its syntax, its semantics, and the algorithms that define the Below and Near axes. The language has been implemented on top of BaseX, an open-source, fully featured XQuery and XPath engine that completely adheres to the Full-Text language specification. Performance evaluations then compare the FleXy constraints with their standard XQuery counterparts, where available. Finally, a patent search application has been developed on top of the FleXy implementation in BaseX: the XML structure of the US Patent Collection (USPTO) is exploited in conjunction with the textual content of the patents to help non-expert users effectively retrieve relevant patents, and a result categorization strategy is also offered.
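
A vague structural constraint whose score decays with distance can be sketched with Python's standard `xml.etree.ElementTree`. The function below is hypothetical and reproduces neither FleXy's syntax nor its actual scoring. It finds elements with a target tag at any depth below a context element. A direct child scores 1.0 and deeper matches score 1/d, where d is the nesting distance. An optional distance threshold applies, in the spirit of the threshold variants. The 1/d decay is an assumption chosen for illustration.

```python
import xml.etree.ElementTree as ET

def vague_below(root, context_tag, target_tag, max_dist=None):
    """Return (element, score) pairs for target_tag elements anywhere below context_tag.

    Hypothetical scoring: score = 1 / d, where d is the number of levels between
    the context element and the match (d = 1 for a direct child). If max_dist is
    given, matches deeper than max_dist levels are discarded.
    """
    best = {}  # id(element) -> (element, best score); nested contexts do not duplicate matches

    def walk(node, dist):
        if max_dist is not None and dist >= max_dist:
            return  # deeper levels would exceed the threshold
        for child in node:
            d = dist + 1
            if child.tag == target_tag:
                score = 1.0 / d
                key = id(child)
                if key not in best or best[key][1] < score:
                    best[key] = (child, score)
            walk(child, d)

    for context in root.iter(context_tag):
        walk(context, 0)
    return sorted(best.values(), key=lambda pair: pair[1], reverse=True)

# Invented patent-like document: claim C sits under <abstract>, not <claims>.
patent = ET.fromstring(
    "<patent><claims><claim>A</claim><group><claim>B</claim></group></claims>"
    "<abstract><claim>C</claim></abstract></patent>"
)
for elem, score in vague_below(patent, "claims", "claim"):
    print(elem.text, score)
```

A strict XPath step such as `claims/claim` would return only A. The vague version also returns B, ranked lower, which is the kind of weighted result the abstract contrasts with crisp structural matching.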

48

Teuber, Tobias. "Information Retrieval und Dokumentenmanagement in Büroinformationssystemen." [Information retrieval and document management in office information systems.] Göttingen : Unitext-Verl, 1996. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=007232155&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.

49

Valvåg, Ottar Viken. "Multiple evidence combination in information retrieval." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2004. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9151.

50

Kanhabua, Nattiya. "Time-aware Approaches to Information Retrieval." Doctoral thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, 2012. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-16477.

Abstract:

In this thesis, we address major challenges in searching temporal document collections. In such collections, documents are created and/or edited over time. Examples of temporal document collections are web archives, news archives, blogs, personal emails, and enterprise documents. Unfortunately, traditional IR approaches based only on term matching can give unsatisfactory results when searching temporal document collections. The reason for this is twofold: the contents of documents are strongly time-dependent, i.e., documents are about events that happened in particular time periods, and a query representing an information need can be time-dependent as well, i.e., a temporal query. Our contributions in this thesis are time-aware approaches within three topics in IR: content analysis, query analysis, and retrieval and ranking models. In particular, we aim to improve retrieval effectiveness by 1) analyzing the contents of temporal document collections, 2) analyzing temporal queries, and 3) explicitly modeling the time dimension in retrieval and ranking. Leveraging the time dimension in ranking can improve retrieval effectiveness if information about the creation or publication time of documents is available. In this thesis, we analyze the contents of documents in order to determine the time of non-timestamped documents using temporal language models. We subsequently employ the temporal language models to determine the time of implicit temporal queries, and the determined time is used to re-rank search results in order to improve retrieval effectiveness. We study the effect of terminology changes over time and propose an approach to handling terminology changes using time-based synonyms. In addition, we propose different methods for predicting the effectiveness of temporal queries, so that a particular query enhancement technique can be applied to improve overall performance.
When the time dimension is incorporated into ranking, documents are ranked according to both textual and temporal similarity. In this case, time uncertainty should also be taken into account. Thus, we propose a ranking model that considers time uncertainty, and we improve ranking by combining multiple features using learning-to-rank techniques. Through extensive evaluation, we show that our proposed time-aware approaches outperform traditional retrieval methods and improve retrieval effectiveness in searching temporal document collections.
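
Dating a non-timestamped document with temporal language models can be sketched as follows. The sketch builds one unigram model per time period from dated training documents. It then assigns an undated document to the period whose model makes its words most likely. This is a simplified stand-in, not the thesis's models. The Jelinek-Mercer interpolation with an add-one-smoothed collection model, the weight `lam`, whitespace tokenization, the function names, and the toy data are all assumptions for illustration.

```python
from collections import Counter
from math import log

def train_temporal_lms(dated_docs):
    """Count terms per time period and over the whole collection.

    dated_docs is an iterable of (period, text) pairs with known dates.
    """
    period_counts, collection = {}, Counter()
    for period, text in dated_docs:
        tokens = text.lower().split()
        period_counts.setdefault(period, Counter()).update(tokens)
        collection.update(tokens)
    return period_counts, collection

def date_document(text, period_counts, collection, lam=0.8):
    """Return the most likely period for an undated text, plus all log-likelihoods.

    Each period model is interpolated with an add-one-smoothed collection model,
    so terms unseen in a period (or anywhere) never yield log(0).
    """
    coll_total = sum(collection.values())
    vocab_size = len(collection) + 1  # one extra slot for unseen terms
    scores = {}
    for period, counts in period_counts.items():
        period_total = max(sum(counts.values()), 1)
        ll = 0.0
        for w in text.lower().split():
            p_period = counts[w] / period_total
            p_coll = (collection[w] + 1) / (coll_total + vocab_size)
            ll += log(lam * p_period + (1 - lam) * p_coll)
        scores[period] = ll
    return max(scores, key=scores.get), scores

# Invented training data with coarse period labels.
training = [
    ("2000s", "napster mp3 dialup modem"),
    ("2000s", "mp3 napster download"),
    ("2010s", "streaming smartphone app"),
    ("2010s", "streaming playlist app"),
]
period_counts, collection = train_temporal_lms(training)
print(date_document("napster mp3 download", period_counts, collection)[0])  # → 2000s
```

The same scoring could in principle be applied to the text of an implicit temporal query. This mirrors how the abstract describes determining a query's time before re-ranking.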
