Dissertations / Theses on the topic '080704 Information Retrieval and Web Search'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 dissertations / theses for your research on the topic '080704 Information Retrieval and Web Search.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Tjondronegoro, Dian W. "Content-based Video Indexing for Sports Applications using Multi-modal Approach." PhD thesis, Deakin University, 2005. https://eprints.qut.edu.au/2199/1/PhDThesis_Tjondronegoro.pdf.

Abstract:
Triggered by technology innovations, there has been a huge increase in the utilization of video, as one of the most preferred types of media due to its content richness, for many significant applications. To sustain an ongoing rapid growth of video information, there is an emerging demand for a sophisticated content-based video indexing system. However, current video indexing solutions are still immature and lack any standard. One solution, namely annotation-based indexing, allows video retrieval using textual annotations. However, its major limitations are the restriction to pre-defined keywords and the expensive manual work of annotating video. Another solution, called feature-based indexing, allows video search by comparison of low-level features, such as query by a sample image. Even though this approach can use automatically extracted features, users are not able to retrieve video intuitively, based on high-level concepts. This predicament is caused by the so-called semantic gap, which highlights the fact that users recall video contents in a high-level abstraction while video is generally stored as an arbitrary sequence of audio-visual tracks. To bridge the semantic gap, this thesis demonstrates the use of a domain-specific approach, which aims to utilize domain knowledge in facilitating the extraction of high-level concepts directly from the audio-visual features. The main idea behind the domain-specific approach is the use of domain knowledge to guide the integration of features from multi-modal tracks. For example, to extract goal segments from soccer and basketball video, slow-motion replay scenes (visual) and excitement (audio) should be detected, as they accompany most goal segments. Domain-specific indexing also exploits specific browsing and querying methods which are driven by specific users' and applications' requirements. Sports video is selected as the primary domain due to its content richness and popularity. Moreover, broadcast sports videos generally span hours with many redundant activities, and the key segments may make up only 30% to 60% of the entire data, depending on the progress of the match. This thesis presents research based on an integrated multi-modal approach for sports video indexing and retrieval. By combining specific features extractable from multiple (audio-visual) modalities, generic structure and specific events can be detected and classified. During browsing and retrieval, users benefit from the integration of high-level semantics and some descriptive mid-level features such as whistle sounds and close-up views of players. The main objective is to contribute to the three major components of sports video indexing systems. The first component is a set of powerful techniques to extract audio-visual features and semantic contents automatically. The main purposes are to reduce manual annotation and to summarize lengthy contents into a compact, meaningful and more enjoyable presentation. The second component is an expressive and flexible indexing technique that supports gradual index construction. The indexing scheme is essential in determining the methods by which users can access a video database. The third and last component is a query language that can generate dynamic video summaries for smart browsing and support user-oriented retrieval.
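The replay-plus-excitement heuristic described in the abstract can be made concrete with a small sketch. The interval representation, the upstream detectors, and the timings below are illustrative assumptions, not the thesis's actual implementation:

```python
def overlaps(a, b):
    """Return True if two (start, end) intervals in seconds overlap."""
    return a[0] < b[1] and b[0] < a[1]

def detect_goal_candidates(replay_scenes, excitement_spans):
    """Flag replay scenes that co-occur with audio excitement.

    Both arguments are lists of (start, end) tuples in seconds, assumed
    to come from upstream visual and audio classifiers respectively.
    """
    return [scene for scene in replay_scenes
            if any(overlaps(scene, span) for span in excitement_spans)]

# Toy usage: one replay coincides with excitement, one does not.
replays = [(120.0, 135.0), (400.0, 410.0)]
excitement = [(118.0, 130.0), (900.0, 905.0)]
print(detect_goal_candidates(replays, excitement))  # [(120.0, 135.0)]
```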
2

Lewandowski, Dirk. "Web Searching, Search Engines and Information Retrieval." IOS Press, 2005. http://hdl.handle.net/10150/106395.

Abstract:
This article discusses Web search engines, focusing on the challenges of indexing the World Wide Web, user behaviour, and the ranking factors used by these engines. Ranking factors are divided into query-dependent and query-independent factors, the latter of which have become more and more important in recent years. The potential of these factors is limited, however, particularly for those based on the widely used link popularity measures. The article concludes with an overview of factors that should be considered in determining the quality of Web search engines.
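To make the query-dependent / query-independent distinction concrete, here is a toy scoring sketch. The blend weights, the logarithmic damping, and the choice of link popularity as the query-independent factor are illustrative assumptions, not values taken from the article:

```python
import math

def combined_score(query_dependent, link_popularity, w_qd=0.7, w_qi=0.3):
    """Blend a query-dependent relevance score (e.g. term matching) with
    a query-independent one (here, log-damped link popularity)."""
    return w_qd * query_dependent + w_qi * math.log1p(link_popularity)

print(combined_score(query_dependent=0.8, link_popularity=1500))
```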
3

Craswell, Nicholas Eric. "Methods for Distributed Information Retrieval." The Australian National University, Faculty of Engineering and Information Technology, 2001. http://thesis.anu.edu.au./public/adt-ANU20020315.142540.

Abstract:
Published methods for distributed information retrieval generally rely on cooperation from search servers. But most real servers, particularly the tens of thousands available on the Web, are not engineered for such cooperation. This means that the majority of methods proposed, and evaluated in simulated environments of homogeneous cooperating servers, are never applied in practice.

This thesis introduces new methods for server selection and results merging. The methods do not require search servers to cooperate, yet are as effective as the best methods which do. Two large experiments evaluate the new methods against many previously published methods. In contrast to previous experiments they simulate a Web-like environment, where servers employ varied retrieval algorithms and tend not to sub-partition documents from a single source.

The server selection experiment uses pages from 956 real Web servers, three different retrieval systems and TREC ad hoc topics. Results show that a broker using queries to sample servers' documents can perform selection over non-cooperating servers without loss of effectiveness. However, using the same queries to estimate the effectiveness of servers, in order to favour servers with high quality retrieval systems, did not consistently improve selection effectiveness.

The results merging experiment uses documents from five TREC sub-collections, five different retrieval systems and TREC ad hoc topics. Results show that a broker using a reference set of collection statistics, rather than relying on cooperation to collate true statistics, can perform merging without loss of effectiveness. Since application of the reference statistics method requires that the broker download the documents to be merged, experiments were also conducted on effective merging based on partial documents. The new ranking method developed was not highly effective on partial documents, but showed some promise on fully downloaded documents.

Using the new methods, an effective search broker can be built, capable of addressing any given set of available search servers, without their cooperation.
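The query-based sampling idea (a broker probing uncooperative servers and ranking them for a query) can be sketched roughly as follows. The Dirichlet-style smoothing and the toy samples are assumptions for illustration, not the thesis's exact selection formula:

```python
from collections import Counter

def sample_language_model(sampled_docs):
    """Build a unigram model from documents sampled via probe queries."""
    counts = Counter(w for doc in sampled_docs for w in doc.lower().split())
    return counts, sum(counts.values())

def server_score(query, counts, total, mu=100):
    """Smoothed query likelihood of the server's sampled vocabulary."""
    score = 1.0
    vocab = len(counts) or 1
    for w in query.lower().split():
        score *= (counts[w] + mu / vocab) / (total + mu)
    return score

samples = {
    "server_a": ["soccer match replay goal", "league standings goal scorers"],
    "server_b": ["protein folding simulation", "molecular dynamics methods"],
}
models = {s: sample_language_model(d) for s, d in samples.items()}
query = "soccer goal highlights"
ranked = sorted(models, key=lambda s: server_score(query, *models[s]),
                reverse=True)
print(ranked)  # server_a ranks first for this query
```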
4

Costa, Miguel. "SIDRA: a Flexible Web Search System." Master's thesis, Department of Informatics, University of Lisbon, 2004. http://hdl.handle.net/10451/13914.

Abstract:
Sidra is a new indexing, searching and ranking system for Web content. It has a flexible, parallel, distributed and scalable architecture. Sidra maintains several data structures that provide multiple access methods to different data dimensions, giving it the capability to select results reflecting search contexts. Its design addresses the current challenges of Web search engines: high performance, short searching and indexing times, good quality of results, scalability and high service availability.
5

Limbu, Dilip Kumar. "Contextual information retrieval from the WWW." Thesis, 2008. http://hdl.handle.net/10292/450.

Abstract:
Contextual information retrieval (CIR) is a critical technique for today’s search engines in terms of facilitating queries and returning relevant information. Despite its importance, little progress has been made in its application, due to the difficulty of capturing and representing contextual information about users. This thesis details the development and evaluation of the contextual SERL search, designed to tackle some of the challenges associated with CIR from the World Wide Web. The contextual SERL search utilises a rich contextual model that exploits implicit and explicit data to modify queries to more accurately reflect the user’s interests as well as to continually build the user’s contextual profile and a shared contextual knowledge base. These profiles are used to filter results from a standard search engine to improve the relevance of the pages displayed to the user. The contextual SERL search has been tested in an observational study that has captured both qualitative and quantitative data about the ability of the framework to improve the user’s web search experience. A total of 30 subjects, with different levels of search experience, participated in the observational study experiment. The results demonstrate that when the contextual profile and the shared contextual knowledge base are used, the contextual SERL search improves search effectiveness, efficiency and subjective satisfaction. The effectiveness improves as subjects have actually entered fewer queries to reach the target information in comparison to the contemporary search engine. In the case of a particularly complex search task, the efficiency improves as subjects have browsed fewer hits, visited fewer URLs, made fewer clicks and have taken less time to reach the target information when compared to the contemporary search engine. Finally, subjects have expressed a higher degree of satisfaction on the quality of contextual support when using the shared contextual knowledge base in comparison to using their contextual profile. These results suggest that integration of a user’s contextual factors and information seeking behaviours are very important for successful development of the CIR framework. It is believed that this framework and other similar projects will help provide the basis for the next generation of contextual information retrieval from the Web.
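A rough sketch of the profile-based filtering step such a system performs: results from a standard engine are re-scored by similarity to the user's contextual profile. The bag-of-words profile and cosine measure are illustrative assumptions, not the SERL system's actual representation:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rerank(results, profile_terms):
    """Re-rank (url, snippet) results by similarity to a user profile."""
    profile = Counter(profile_terms)
    scored = [(cosine(Counter(snippet.lower().split()), profile), url, snippet)
              for url, snippet in results]
    return sorted(scored, reverse=True)

results = [("u1", "python web frameworks compared"),
           ("u2", "snake care and feeding guide")]
print(rerank(results, ["python", "programming", "web"]))
```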
6

Morrison, Patrick Jason. "Tagging and Searching: Search Retrieval Effectiveness of Folksonomies on the Web." [Kent, Ohio]: Kent State University, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=kent1177305096.

Abstract:
Thesis (M.S.)--Kent State University, 2007.
Title from PDF t.p. (viewed July 2, 2007). Advisor: David B. Robins. Keywords: information retrieval, search engine, social bookmarking, tagging, folksonomy, Internet, World Wide Web. Includes survey instrument. Includes bibliographical references (p. 137-141).
7

Nguyen, Qui V. "Enhancing a Web Crawler with Arabic Search." Thesis, Monterey, California: Naval Postgraduate School, 2012.

Abstract:
Many advantages of the Internet (ease of access, limited regulation, vast potential audience, and fast flow of information) have turned it into the most popular way to communicate and exchange ideas. Criminal and terrorist groups also use these advantages to turn the Internet into their new play/battle fields to conduct their illegal/terror activities. There are millions of Web sites in different languages on the Internet, but the lack of foreign language search engines makes it impossible to analyze foreign language Web sites efficiently. This thesis will enhance an open source Web crawler with Arabic search capability, thus improving an existing social networking tool to perform page correlation and analysis of Arabic Web sites. A social networking tool with Arabic search capabilities could become a valuable tool for the intelligence community. Its page correlation and analysis results could be used to collect open source intelligence and build a network of Web sites that are related to terrorist or criminal activities.
8

Tsukuda, Kosetsu. "A Study on Web Search and Analysis based on Typicality." 京都大学 (Kyoto University), 2014. http://hdl.handle.net/2433/192217.

9

Umemoto, Kazutoshi. "A Study on Fine-Grained User Behavior Analysis in Web Search." 京都大学 (Kyoto University), 2016. http://hdl.handle.net/2433/215679.

10

Halpin, Harry. "Sense and reference on the Web." Thesis, University of Edinburgh, 2010. http://hdl.handle.net/1842/3796.

Abstract:
This thesis builds a foundation for the philosophy of the Web by examining the crucial question: What does a Uniform Resource Identifier (URI) mean? Does it have a sense, and can it refer to things? A philosophical and historical introduction to the Web explains the primary purpose of the Web as a universal information space for naming and accessing information via URIs. A terminology, based on distinctions in philosophy, is employed to define precisely what is meant by information, language, representation, and reference. These terms are then employed to create a foundational ontology and principles of Web architecture. From this perspective, the Semantic Web is then viewed as the application of the principles of Web architecture to knowledge representation. However, the classical philosophical problems of sense and reference that have been the source of debate within the philosophy of language return. Three main positions are inspected: the logicist position, as exemplified by the descriptivist theory of reference and the first-generation Semantic Web; the direct reference position, as exemplified by Putnam and Kripke's causal theory of reference and the second-generation Linked Data initiative; and a Wittgensteinian position that views the Semantic Web as yet another public language. After identifying the public language position as the most promising, a solution of using people's everyday use of search engines as relevance feedback is proposed as a Wittgensteinian way to determine the sense of URIs. This solution is then evaluated on a sample of the Semantic Web discovered via queries from a hypertext search engine query log. The results are evaluated, and the technique of using relevance feedback from hypertext Web searches to determine relevant Semantic Web URIs in response to user queries is shown to considerably improve baseline performance. Future work for the Web that follows from our argument and experiments is detailed, and the outlines of a future philosophy of the Web are laid out.
11

Weldeghebriel, Zemichael Fesahatsion. "Evaluating and comparing search engines in retrieving text information from the web." Thesis, Stellenbosch : Stellenbosch University, 2004. http://hdl.handle.net/10019.1/53740.

Abstract:
Thesis (MPhil)--Stellenbosch University, 2004
ENGLISH ABSTRACT: With the introduction of the Internet and the World Wide Web (www), information can be easily accessed and retrieved from the web using information retrieval systems such as web search engines, or simply search engines. A number of search engines have been developed to provide access to the resources available on the web and to help users retrieve relevant information from the web. In particular, they are essential for finding text information on the web for academic purposes. But how effective and efficient are those search engines in retrieving the most relevant text information from the web? Which of the search engines are more effective and efficient? This study was conducted to see how effective and efficient search engines are, and to see which search engines are most effective and efficient in retrieving the required text information from the web. It is very important to know the most effective and efficient search engines, because such search engines can be used to retrieve a higher number of the most relevant text web pages with minimum time and effort. The study was based on nine major search engines, four search queries, and relevance judgments of relevant / partly-relevant / non-relevant. Precision and recall were calculated based on the experimental or test results, and these were used as the basis for the statistical evaluation and comparison of the retrieval effectiveness of the nine search engines. Duplicated items and broken links were also recorded and examined separately and were used as an additional measure of search engine effectiveness. Response times were also recorded and used as a basis for the statistical evaluation and comparison of the retrieval efficiency of the nine search engines. Additionally, since search engines involve indexing and searching in the information retrieval process from the web, this study first discusses, from a theoretical point of view, how the indexing and searching processes are performed in an information retrieval environment. It also discusses the influence of indexing and searching processes on the effectiveness and efficiency of information retrieval systems in general, and search engines in particular, in retrieving the most relevant text information from the web.
AFRIKAANS SUMMARY (translated): With the advent of the Internet and the World Wide Web (www), information is easily obtainable. It can be retrieved by making use of information retrieval systems such as search engines. A number of such search engines have been developed to provide access to the resources available on the web and to help users retrieve relevant information from the web. They are especially essential for obtaining text information for academic purposes. But how effective and efficient are these search engines in retrieving the most relevant text information from the web? Which of the search engines are the most effective? This study was undertaken to see which search engines are the most effective and efficient in retrieving the required text information. It is important to know which search engine is the most effective, because such an engine can be used to retrieve a higher number of the most relevant text web pages with a minimum of time and effort. This study is based on the nine major search engines, four search queries, and relevance judgments of relevant / partly relevant / non-relevant. Precision and recall were calculated based on the experiments and test results, and these were used as the basis for the statistical evaluation and comparison of the retrieval effectiveness of the nine search engines. Duplicated items and broken links were also recorded and examined separately and were used as an additional measure of effectiveness. Response time was also recorded and used as a basis for the statistical evaluation and comparison of the retrieval efficiency of the nine search engines. Since search engines are involved in indexing and search processes, this study first discusses, from a theoretical point of view, how indexing and search processes are performed in an information retrieval environment. The influence of indexing and search processes on the effectiveness of retrieval systems in general, and especially of search engines in retrieving the most relevant text information from the web, is also discussed.
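The precision and recall measures used in the study can be computed as in this small sketch; counting partly-relevant results as relevant is an assumption made here for illustration:

```python
def precision_at_n(judgments, n):
    """Fraction of the top-n results judged relevant.
    judgments: 'relevant' / 'partly-relevant' / 'non-relevant' in rank
    order; partly-relevant counts as relevant (an assumption)."""
    top = judgments[:n]
    hits = sum(j in ("relevant", "partly-relevant") for j in top)
    return hits / len(top)

def recall_at_n(judgments, n, total_relevant):
    hits = sum(j in ("relevant", "partly-relevant") for j in judgments[:n])
    return hits / total_relevant

ranked = ["relevant", "non-relevant", "partly-relevant", "relevant",
          "non-relevant", "relevant", "non-relevant", "non-relevant",
          "relevant", "non-relevant"]
print(precision_at_n(ranked, 5))                  # 0.6
print(recall_at_n(ranked, 10, total_relevant=8))  # 0.625
```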
12

Na, Jin-Cheon, Christopher S. G. Khoo, and Syin Chan. "A sentiment-based meta search engine." School of Communication & Information, Nanyang Technological University, 2006. http://hdl.handle.net/10150/106241.

Abstract:
This study is in the area of sentiment classification: classifying online review documents according to the overall sentiment expressed in them. This paper presents a prototype sentiment-based meta search engine that has been developed to perform sentiment categorization of Web search results. It assists users to quickly focus on recommended or non-recommended information by classifying Web search results into four categories: positive, negative, neutral, and non-review documents. It does this by using an automatic classifier based on a supervised machine learning algorithm, Support Vector Machine (SVM). This paper also discusses various issues we have encountered during the prototype development, and presents our approaches for resolving them. A user evaluation of the prototype was carried out with positive responses from users.
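A minimal sketch of the classification pipeline the abstract describes, using scikit-learn as a modern stand-in for the prototype's SVM implementation; the toy snippets and labels are invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy labelled snippets; the real prototype was trained on review corpora.
docs = ["great product highly recommended", "terrible quality do not buy",
        "the device has a 5 inch screen", "press release for new model"]
labels = ["positive", "negative", "neutral", "non-review"]

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(docs, labels)
print(clf.predict(["awful battery life, would not recommend"]))
```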
13

Meng, Zhao. "A Study on Web Search based on Coordinate Relationships." 京都大学 (Kyoto University), 2016. http://hdl.handle.net/2433/217205.

14

Varadarajan, Ramakrishna R. "Ranked Search on Data Graphs." FIU Digital Commons, 2009. http://digitalcommons.fiu.edu/etd/220.

Abstract:
Graph-structured databases are widely prevalent, and the problem of effective search and retrieval from such graphs has been receiving much attention recently. For example, the Web can be naturally viewed as a graph. Likewise, a relational database can be viewed as a graph where tuples are modeled as vertices connected via foreign-key relationships. Keyword search querying has emerged as one of the most effective paradigms for information discovery, especially over HTML documents on the World Wide Web. One of the key advantages of keyword search querying is its simplicity: users do not have to learn a complex query language, and can issue queries without any prior knowledge about the structure of the underlying data. The purpose of this dissertation was to develop techniques for user-friendly, high-quality and efficient searching of graph-structured databases. Several ranked search methods on data graphs have been studied in recent years. Given a top-k keyword search query on a graph and some ranking criteria, a keyword proximity search finds the top-k answers, where each answer is a substructure of the graph containing all query keywords, which illustrates the relationships between the keywords present in the graph. We applied keyword proximity search on the Web and the page graph of web documents to find top-k answers that satisfy the user's information need and increase user satisfaction. Another effective ranking mechanism applied to data graphs is the authority-flow based ranking mechanism. Given a top-k keyword search query on a graph, an authority-flow based search finds the top-k answers, where each answer is a node in the graph ranked according to its relevance and importance to the query. We developed techniques that improved authority-flow based search on data graphs by creating a framework to explain and reformulate such searches, taking into consideration user preferences and feedback. We also applied the proposed graph search techniques to information discovery over biological databases. Our algorithms were experimentally evaluated for performance and quality. The quality of our method was compared to current approaches by using user surveys.
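Authority-flow ranking is close in spirit to personalized PageRank, where authority is seeded at nodes matching the query and flows along edges. A sketch with networkx follows; the toy graph and seeding are assumptions, not the dissertation's exact formulation:

```python
import networkx as nx

G = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")])

# Seed authority at nodes containing the query keywords (here: 'a'),
# then let it flow along edges; scores rank nodes for the query.
scores = nx.pagerank(G, alpha=0.85,
                     personalization={"a": 1.0, "b": 0.0, "c": 0.0})
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```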
15

Zhang, Limin. "Contextual Web Search Based on Semantic Relationships: A Theoretical Framework, Evaluation and a Medical Application Prototype." Diss., Tucson, Arizona: University of Arizona, 2006. http://etd.library.arizona.edu/etd/GetFileServlet?file=file:///data1/pdf/etd/azu%5Fetd%5F1602%5F1%5Fm.pdf&type=application/pdf.

16

Chignell, Mark, Jacek Gwizdka, and Richard Bodner. "Discriminating Meta-Search: A Framework for Evaluation." Elsevier, 1999. http://hdl.handle.net/10150/105146.

Abstract:
DOI: 10.1016/S0306-4573(98)00065-X
There was a proliferation of electronic information sources and search engines in the 1990s. Many of these information sources became available through the ubiquitous interface of the Web browser. Diverse information sources became accessible to information professionals and casual end users alike. Much of the information was also hyperlinked, so that information could be explored by browsing as well as searching. While vast amounts of information were now just a few keystrokes and mouseclicks away, as the choices multiplied, so did the complexity of choosing where and how to look for the electronic information. Much of the complexity in information exploration at the turn of the twenty-first century arose because there was no common cataloguing and control system across the various electronic information sources. In addition, the many search engines available differed widely in terms of their domain coverage, query methods, and efficiency. Meta-search engines were developed to improve search performance by querying multiple search engines at once. In principle, meta-search engines could greatly simplify the search for electronic information by selecting a subset of first-level search engines and digital libraries to submit a query to based on the characteristics of the user, the query/topic, and the search strategy. This selection would be guided by diagnostic knowledge about which of the first-level search engines works best under what circumstances. Programmatic research is required to develop this diagnostic knowledge about first-level search engine performance. This paper introduces an evaluative framework for this type of research and illustrates its use in two experiments. The experimental results obtained are used to characterize some properties of leading search engines (as of 1998). Significant interactions were observed between search engine and two other factors (time of day, and Web domain). These findings supplement those of earlier studies, providing preliminary information about the complex relationship between search engine functionality and performance in different contexts. While the specific results obtained represent a time-dependent snapshot of search engine performance in 1998, the evaluative framework proposed should be generally applicable in the future.
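A broker that queries several first-level engines must merge their ranked lists. As one concrete illustration, reciprocal rank fusion, a standard merging heuristic rather than anything prescribed by this paper, combines lists as follows:

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked URL lists from several engines (RRF heuristic)."""
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

engine_a = ["u1", "u2", "u3"]
engine_b = ["u2", "u4", "u1"]
print(reciprocal_rank_fusion([engine_a, engine_b]))
# urls returned by both engines rise to the top
```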
17

Al-Dallal, Ammar Sami. "Enhancing recall and precision of web search using genetic algorithm." Thesis, Brunel University, 2012. http://bura.brunel.ac.uk/handle/2438/7379.

Abstract:
Due to the rapid growth of the number of Web pages, web users encounter two main problems: many of the retrieved documents are not related to the user query (low precision), and many relevant documents have not been retrieved (low recall). Information Retrieval (IR) is an essential and useful technique for Web search; thus, different approaches and techniques have been developed. Because of its parallel mechanism with high-dimensional space, the Genetic Algorithm (GA) has been adopted to solve many optimization problems, of which IR is one. This thesis proposes a GA-based search model to retrieve HTML documents, called IR Using GA, or IRUGA. It is composed of two main units. The first unit is the document indexing unit, which indexes the HTML documents. The second unit is the GA mechanism, which applies selection, crossover and mutation operators to produce the final result, while a specially designed fitness function is applied to evaluate the documents. The performance of IRUGA is investigated using the speed of convergence of the retrieval process, precision at rank N, recall at rank N, and precision at recall N. In addition, the proposed fitness function is compared experimentally with the Okapi BM25 function and the Bayesian inference network model function. Moreover, IRUGA is compared with traditional IR using the same fitness function to examine the performance in terms of the time required by each technique to retrieve the documents. The new techniques developed for document representation, the GA operators and the fitness function achieve an improvement of over 90% for the recall and precision measures, and the relevance of the retrieved documents is much higher than that achieved by the other models. Moreover, a broad comparison of techniques applied to GA operators is performed, highlighting the strengths and weaknesses of each existing technique. Overall, IRUGA is a promising technique in the Web search domain that provides high-quality search results in terms of recall and precision.
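A compact sketch of the GA loop the abstract outlines: selection, one-point crossover and bit-flip mutation over binary term-selection chromosomes. The toy fitness (precision plus recall) stands in for the thesis's specially designed function:

```python
import random

random.seed(0)
VOCAB = ["web", "search", "genetic", "algorithm", "retrieval", "football"]
DOCS = [{"web", "search", "retrieval"}, {"genetic", "algorithm"},
        {"football", "web"}]
RELEVANT = {0, 1}  # indices of documents judged relevant

def fitness(chrom):
    """Toy fitness: how well the terms switched on by the chromosome
    retrieve the relevant documents (precision + recall)."""
    terms = {t for t, bit in zip(VOCAB, chrom) if bit}
    retrieved = {i for i, d in enumerate(DOCS) if terms & d}
    if not retrieved:
        return 0.0
    hits = len(retrieved & RELEVANT)
    return hits / len(retrieved) + hits / len(RELEVANT)

def evolve(pop_size=20, generations=30, p_mut=0.1):
    pop = [[random.randint(0, 1) for _ in VOCAB] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]             # selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(VOCAB))  # one-point crossover
            child = [bit ^ (random.random() < p_mut)  # bit-flip mutation
                     for bit in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print([t for t, bit in zip(VOCAB, best) if bit])
```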
18

Knopke, Ian. "Building a search engine for music and audio on the World Wide Web." Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=85177.

Abstract:
The main contribution of this dissertation is a system for locating and indexing audio files on the World Wide Web. The idea behind this system is that the use of both web page and audio file analysis techniques can produce more relevant information for locating audio files on the web than is used in full-text search engines.
The most important part of this system is a web crawler that finds materials by following hyperlinks between web pages. The crawler is distributed and operates using multiple computers across a network, storing results to a database. There are two main components: a set of retrievers that retrieve pages and audio files from the web, and a central crawl manager that coordinates the retrievers and handles data storage tasks.
The crawler is designed to locate three types of audio files: AIFF, WAVE, and MPEG-1 (MP3), but other types can be easily added to the system. Once audio files are located, analyses are performed of both the audio files and the associated web pages that link to these files. Information extracted by the crawler can be used to build search indexes for resolving user queries. A set of results demonstrating aspects of the performance of the crawler are presented, as well as some statistics and points of interest regarding the nature of audio files on the web.
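A single-threaded sketch of the crawling idea: follow hyperlinks breadth-first and record links whose extensions mark AIFF, WAVE or MP3 files, together with the linking page for later analysis. This compresses the distributed retriever/manager design into one loop, and the seed URL is a placeholder:

```python
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

AUDIO_EXT = (".aif", ".aiff", ".wav", ".mp3")

class LinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=50):
    """Breadth-first crawl recording (audio file, linking page) pairs."""
    queue, seen, audio = deque([seed]), {seed}, []
    while queue and len(seen) <= max_pages:
        url = queue.popleft()
        try:
            html = urllib.request.urlopen(url, timeout=5).read()
        except Exception:
            continue
        parser = LinkParser()
        parser.feed(html.decode("utf8", "ignore"))
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute.lower().endswith(AUDIO_EXT):
                audio.append((absolute, url))
            elif absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return audio

# print(crawl("https://example.com/"))  # placeholder seed URL
```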
19

Shoji, Yoshiyuki. "A Study on Social Information Search and Analysis on the Web by Diversity Computation." 京都大学 (Kyoto University), 2015. http://hdl.handle.net/2433/199443.

20

Jing, Yushi. "Learning an integrated hybrid image retrieval system." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/43746.

Abstract:
Current Web image search engines, such as Google or Bing Images, adopt a hybrid search approach in which a text-based query (e.g. "apple") is used to retrieve a set of relevant images, which are then refined by the user (e.g. by re-ranking the retrieved images based on similarity to a selected example). This approach makes it possible to use both text information (e.g. the initial query) and image features (e.g. as part of the refinement stage) to identify images which are relevant to the user. One limitation of these current systems is that text and image features are treated as independent components and are often used in a decoupled manner. This work proposes to develop an integrated hybrid search method which leverages the synergies between text and image features. Recently, there has been tremendous progress in the computer vision community in learning models of visual concepts from collections of example images. While impressive performance has been achieved on standardized data sets, scaling these methods so that they are capable of working at web scale remains a significant challenge. This work will develop approaches to visual modeling that can be scaled to address the task of retrieving billions of images on the Web. Specifically, we propose to address two research issues related to integrated text- and image-based retrieval. First, we will explore whether models of visual concepts which are learned from collections of web images can be utilized to improve the image ranking associated with a text-based query. Second, we will investigate the hypothesis that the click-patterns associated with standard web image search engines can be utilized to learn query-specific image similarity measures that support improved query-refinement performance. We will evaluate our research by constructing a prototype integrated hybrid retrieval system based on the data from 300K real-world image queries. We will conduct user-studies to evaluate the effectiveness of our learned similarity measures and quantify the benefit of our method in real world search tasks such as target search.
21

Li, Ping. "Doctoral students' mental models of a web search engine: an exploratory study." Thesis, McGill University, 2007. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=94181.

Abstract:
This exploratory research investigates the factors that might influence a specific group of users' mental models of a Web search engine, Google, as measured in the dimension of completeness. A modified mental model completeness scale (MMCS) was developed based on Borgman's, Dimitroff's, and Saxon's models, encompassing the perception of (1) the nature of the Web search engine, (2) searching features of the Web search engine, and (3) the interaction between the searcher and the Web search engine. With this scale, a participant's mental model completeness level was determined by how many components of the first two parts of the scale were described and which level of interaction between the participant and Google was revealed during the searches. The choice of the factors was based on previous studies of individual differences among information seekers, including the user's search experience, cognitive style, learning style, technical aptitudes, training received, discipline, and gender. Sixteen Ph.D. students whose first language is English participated in the research. Individual semi-structured interviews were conducted to determine the students' mental model completeness level (MMCL) as well as their search experience, training received, discipline and gender. A direct observation technique was employed to observe students' actual interactions with Google. Standard tests were administered to determine the students' cognitive styles, learning styles and technical aptitudes.
(French abstract, translated:) This preliminary research examines the factors that may influence the mental models of a specific group of users of a Web search engine, Google, as measured by their degree of completeness. A mental model completeness scale was constructed by adapting the models presented by Borgman, Dimitroff and Saxon, encompassing the perception of (1) the nature of the Web search engine, (2) the search features specific to this engine, and (3) the interaction between the searcher and the search engine. Using this scale, the completeness level of a given subject's mental model was determined by how many components of the first two parts of the scale were described and by the level of interaction between the subject and Google, as revealed by his or her searches. The choice of factors was based on previous studies of individual differences among information seekers, including the user's degree of search experience, cognitive style, learning style, technical aptitudes, training received, discipline and gender. Sixteen doctoral students whose first language is English participated in this study. Semi-structured individual interviews were used to determine the students' mental model completeness level, as well as their search experience, training received, discipline and gender. A direct observation technique was used to observe the students' actual interactions with Google. Standardized tests were administered to determine the students' cognitive styles, learning styles and technical aptitudes.
22

Pothirattanachaikul, Suppanut. "A Study on Understanding and Encouraging Alternative Information Search." Kyoto University, 2020. http://hdl.handle.net/2433/259073.

23

Mendoza, Rocha Marcelo Gabriel. "Query log mining in search engines." Tesis, Universidad de Chile, 2007. http://www.repositorio.uchile.cl/handle/2250/102877.

Abstract:
Doctor of Sciences, Computer Science.
(Spanish abstract, translated:) The Web is a large information space where many resources such as documents, images and other multimedia content can be accessed. In this context, several information technologies have been developed to help users satisfy their search needs on the Web, the most widely used of which are search engines. Search engines allow users to find resources by formulating queries and reviewing a list of answers. One of the main challenges for the Web community is to design search engines that allow users to find resources semantically connected with their queries. The large size of the Web and the vagueness of the terms most commonly used in formulating queries are major obstacles to achieving this goal. In this thesis we propose to explore the user selections recorded in search engine logs, both to learn how users search and to design algorithms that improve the precision of the answers recommended to users. We begin by exploring the properties of these data; this exploration allows us to determine their sparse nature. We also present models that help us understand how users search in search engines. We then explore user selections to find useful associations between queries recorded in the logs, concentrating our efforts on the design of techniques that allow users to find better queries than the original one. As an application, we design query reformulation methods that help users find more useful terms, improving the representation of their needs. Using document terms, we construct vector representations for queries. By applying clustering techniques we can determine groups of similar queries. Using these query groups, we introduce methods for query and document recommendation that allow us to improve the precision of the recommendations. Finally, we design query classification techniques that allow us to find concepts semantically related to the original query. To achieve this, we classify user queries into Web directories. As an application, we introduce methods for the automatic maintenance of the directories.
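The query-clustering step can be sketched as follows: represent each query by the text of its clicked documents, cluster the vectors, and recommend queries from the same cluster. The library choices and toy data are assumptions for illustration:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Each query is represented by text from its clicked documents (toy data).
queries = ["cheap flights", "airline tickets", "python tutorial", "learn python"]
clicked_doc_text = [
    "book low cost flights airfare deals",
    "buy airfare airline tickets online deals",
    "python programming tutorial for beginners",
    "learn python programming course beginners",
]

X = TfidfVectorizer().fit_transform(clicked_doc_text)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

def recommend(query_index):
    """Recommend queries that share a cluster with the submitted query."""
    return [q for q, l in zip(queries, labels)
            if l == labels[query_index] and q != queries[query_index]]

print(recommend(0))  # e.g. ['airline tickets']
```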
24

Wilson, Mathew J. "The effects of search strategies and information interaction on sensemaking." Thesis, Swansea University, 2015. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.678376.

25

Zhu, Dengya. "Improving the relevance of search results via search-term disambiguation and ontological filtering." Thesis, Curtin University, 2007. http://hdl.handle.net/20.500.11937/2486.

Abstract:
With the exponential growth of the Web and the inherent polysemy and synonymy problems of natural languages, search engines are facing many challenges, such as information overload, mismatch of search results, missing relevant documents, poorly organized search results, and mismatch with the human mental model of clustering engines. To address these issues, much effort, including employing different information retrieval (IR) models, information categorization/clustering, personalization, the semantic Web, ontology-based IR, and so on, has been devoted to improving the relevance of search results. The major focus of this study is to dynamically re-organize Web search results under a socially constructed hierarchical knowledge structure, to help information seekers access and manipulate the retrieved search results, and consequently to improve the relevance of search results. To achieve the above research goal, a special search-browser is developed and its retrieval effectiveness is evaluated. The hierarchical structure of the Open Directory Project (ODP) is employed as the socially constructed knowledge structure, which is represented by the Tree component of Java. The Yahoo! Search Web Services API is utilized to obtain search results directly from the Yahoo! search engine databases. The Lucene text search engine calculates similarities between each returned search result and the semantic characteristics of each category in the ODP, and thus assigns the search results to the corresponding ODP categories by a Majority Voting algorithm. When an interesting category is selected by a user, only search results categorized under that category are presented to the user, and the quality of the search results is consequently improved. Experiments demonstrate that the proposed approach can improve the precision of Yahoo! search results at the 11 standard recall levels from an average of 41.7 per cent to 65.2 per cent; the improvement is as high as 23.5 per cent. This conclusion is verified by comparing the improvements of the P@5 and P@10 of Yahoo! search results and the categorized search results of the special search-browser. The improvements of P@5 and P@10 are 38.3 per cent (85 per cent - 46.7 per cent) and 28 per cent (70 per cent - 42 per cent) respectively. The experiment of this research is well designed and controlled. To minimize the subjectiveness of relevance judgments, five judges (experts) were asked to make their relevance judgments independently, and the final relevance judgment is a combination of the five judges' judgments. The judges were presented with only search-terms, information needs, and the 50 search results of the Yahoo! Search Web Services API. They were asked to make relevance judgments based on the information provided above; no categorization information was provided. The first contribution of this research is the use of an extracted category-document to represent the semantic characteristics of each of the ODP categories. A category-document is composed of the topic of the category, the description of the category, and the titles and brief descriptions of the submitted Web pages under the category. Experimental results demonstrate that the category-documents of the ODP can represent the semantic characteristics of the ODP in most cases. Furthermore, for machine learning algorithms, the extracted category-documents can be utilized as training data which would otherwise demand much human labor to create to ensure that a learning algorithm is properly trained.
The second contribution of this research is the suggestion of the new concepts of relevance judgment convergent degree and relevance judgment divergent degree, which are used to measure how well different judges agree with each other when they are asked to judge the relevance of a list of search results. When the relevance judgment convergent degree of a search-term is high, an IR algorithm should obtain a higher precision as well. On the other hand, if the relevance judgment convergent degree is low, or the relevance judgment divergent degree is high, it is arguable to use the data to evaluate the IR algorithm. This intuition is manifested by the experiment of this research. The last contribution of this research is that the developed search-browser is, to the best of my knowledge, the first IR system (IRS) to utilize the ODP hierarchical structure to categorize and filter search results.
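A rough sketch of the categorization step: score each result against every ODP category-document and assign the best match. The thesis's Majority Voting step is simplified here to an arg-max, and scikit-learn stands in for Lucene; the category-documents and results are toys:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy category-documents: topic + description + titles of submitted pages.
categories = {
    "Computers/Programming": "programming languages software code developer",
    "Sports/Soccer": "soccer football league goal match players",
}
results = ["python code tutorial for software developers",
           "premier league match report two goals"]

vec = TfidfVectorizer()
cat_names = list(categories)
cat_matrix = vec.fit_transform(categories.values())

def assign(result):
    """Return the category-document most similar to the result snippet."""
    sims = cosine_similarity(vec.transform([result]), cat_matrix)[0]
    return cat_names[sims.argmax()]

for r in results:
    print(r, "->", assign(r))
```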
26

Tang, Ling-Xiang. "Link discovery for Chinese/English cross-language web information retrieval." Thesis, Queensland University of Technology, 2012. https://eprints.qut.edu.au/58416/1/Ling-Xiang_Tang_Thesis.pdf.

Abstract:
Nowadays people rely heavily on the Internet for information and knowledge. Wikipedia is an online multilingual encyclopaedia that contains a very large number of detailed articles covering most written languages, and it is often considered a treasury of human knowledge. It includes extensive hypertext links between documents of the same language for easy navigation. However, the pages in different languages are rarely cross-linked except for direct equivalent pages on the same subject in different languages. This can pose serious difficulties for users seeking information or knowledge from different lingual sources, or where there is no equivalent page in one language or another. In this thesis, a new information retrieval task, cross-lingual link discovery (CLLD), is proposed to tackle the problem of the lack of cross-lingual anchored links in a knowledge base such as Wikipedia. In contrast to traditional information retrieval tasks, cross-lingual link discovery algorithms actively recommend a set of meaningful anchors in a source document and establish links to documents in an alternative language. In other words, cross-lingual link discovery is a way of automatically finding hypertext links between documents in different languages, which is particularly helpful for knowledge discovery across language domains. This study focuses specifically on Chinese / English link discovery (C/ELD), a special case of the cross-lingual link discovery task that involves natural language processing (NLP), cross-lingual information retrieval (CLIR) and cross-lingual link discovery. To justify the effectiveness of CLLD, a standard evaluation framework is also proposed, comprising topics, document collections, a gold standard dataset, evaluation metrics, and toolkits for run pooling, link assessment and system evaluation. With this evaluation framework, the performance of CLLD approaches and systems can be quantified. This thesis contributes to research on natural language processing and cross-lingual information retrieval in CLLD as follows: 1) a new simple but effective Chinese segmentation method, n-gram mutual information, is presented for determining the boundaries of Chinese text; 2) a voting mechanism for named entity translation is demonstrated to achieve high precision in English / Chinese machine translation; 3) a link mining approach that mines the existing link structure for anchor probabilities achieves encouraging results in suggesting cross-lingual Chinese / English links in Wikipedia. This approach was examined in the experiments, carried out as part of the study, on better automatic generation of cross-lingual links. The overall major contribution of this thesis is the provision of a standard evaluation framework for cross-lingual link discovery research, which helps in benchmarking the performance of various CLLD systems and in identifying good CLLD realisation approaches. The evaluation methods and the evaluation framework described in this thesis have been utilised to quantify system performance in the NTCIR-9 Crosslink task, the first information retrieval track of its kind.
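The n-gram mutual information segmentation method can be illustrated with pointwise mutual information over adjacent characters: boundaries are placed where PMI is low. The tiny corpus and threshold below are illustrative assumptions, not the thesis's tuned values:

```python
import math
from collections import Counter

def train(corpus):
    """Count characters and adjacent character pairs in a corpus."""
    chars, pairs = Counter(), Counter()
    for text in corpus:
        chars.update(text)
        pairs.update(text[i:i + 2] for i in range(len(text) - 1))
    return chars, pairs, sum(chars.values())

def pmi(a, b, chars, pairs, n):
    """Pointwise mutual information of adjacent characters a, b."""
    p_ab = pairs[a + b] / max(n - 1, 1)
    p_a, p_b = chars[a] / n, chars[b] / n
    return math.log(p_ab / (p_a * p_b)) if p_ab else float("-inf")

def segment(text, chars, pairs, n, threshold=1.5):
    """Insert a boundary wherever adjacent characters have low PMI.
    The threshold is tuned for this toy corpus."""
    out = [text[0]]
    for i in range(1, len(text)):
        if pmi(text[i - 1], text[i], chars, pairs, n) < threshold:
            out.append(" ")
        out.append(text[i])
    return "".join(out)

corpus = ["信息检索", "检索系统", "信息系统"]
model = train(corpus)
print(segment("信息检索系统", *model))  # 信息 检索 系统
```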
27

Kinley, Khamsum. "Towards modelling web search behaviour : integrating users’ cognitive styles." Thesis, Queensland University of Technology, 2013. https://eprints.qut.edu.au/63804/1/Kinley_Kinley_Thesis.pdf.

Abstract:
With the rapid growth of information on the Web, the study of information searching has attracted increased interest. Information behaviour (IB) researchers and information systems (IS) developers are continuously exploring user-Web search interactions to understand users and to provide assistance with their information searching. In attempting to develop models of IB, several studies have identified various factors that govern users' information searching and information retrieval (IR), such as age, gender, prior knowledge and task complexity. However, how users' contextual factors, such as cognitive styles, affect Web search interactions has not been clearly explained by current models of Web searching and IR. This study explores the influence of users' cognitive styles on their Web search behaviour. The main goal of the study is to enhance Web search models with a better understanding of how these cognitive styles affect Web searching. Modelling Web search behaviour with a greater understanding of users' cognitive styles can help information science researchers and IS designers to bridge the semantic gap between the user and the IS. To achieve the aims of the study, a user study with 50 participants was conducted. The study adopted a mixed-method approach incorporating several data collection strategies to gather a range of qualitative and quantitative data. The study utilised pre-search and post-search questionnaires to collect the participants' demographic information and their level of satisfaction with the search interactions. Riding's (1991) Cognitive Style Analysis (CSA) test was used to assess the participants' cognitive styles. Participants completed three pre-designed search tasks, and the entire user-Web search interaction, including think-aloud data, was captured using a monitoring program. Data analysis involved several qualitative and quantitative techniques: the quantitative data gave rise to detailed findings about users' Web searching and cognitive styles, and the qualitative data enriched the findings with illustrative examples. The study results provide valuable insights into Web searching behaviour among users with different cognitive styles. The findings of the study extend our understanding of Web search behaviour and how users search for information on the Web. Three key findings emerged: • Users' Web search behaviour was demonstrated through information searching strategies, Web navigation styles, query reformulation behaviour and information processing approaches while performing Web searches. The manner in which these Web search patterns were demonstrated varied among users in different cognitive style groups. • Users' cognitive styles influenced their information searching strategies, query reformulation behaviour, Web navigation styles and information processing approaches. Users with particular cognitive styles followed certain Web search patterns. • Fundamental relationships were evident between users' cognitive styles and their Web search behaviours, and these relationships can be illustrated through modelling Web search behaviour. Two models that depict the associations between Web search interactions, user characteristics and users' cognitive styles were developed. These models provide a greater understanding of Web search behaviour from the user perspective, particularly how users' cognitive styles influence their Web search behaviour.
The significance of this research is twofold: it will provide insights for information science researchers, information system designers, academics, educators, trainers and librarians who want to better understand how users with different cognitive styles perform information searching on the Web; at the same time, it will provide assistance and support to the users. The major outcomes of this study are 1) a comprehensive analysis of how users search the Web; 2) extensive discussion of the implications of the models developed in this study for future work; and 3) a theoretical framework to bridge high-level search models and cognitive models.
28

Edizel, Necati Bora. "Word embeddings with applications to web search and advertising." Doctoral thesis, Universitat Pompeu Fabra, 2019. http://hdl.handle.net/10803/669622.

Abstract:
Word embeddings are a building block of many practical applications across NLP and related disciplines. In this thesis, we present theoretical analysis and algorithms to learn word embeddings, as well as applications of word embeddings to Web search and advertising. We start by presenting theoretical insights into one of the most popular algorithms for learning word embeddings, word2vec. We also model word2vec in a reinforcement learning framework and show that it is an off-policy learner with a fixed behaviour policy. We then present an off-policy learning algorithm, word2vec_π, that uses word2vec as its behaviour policy; extensive experimentation shows that the proposed method performs better than word2vec. Next, we present a method to learn word embeddings that are resilient to misspellings. Existing word embeddings have limited applicability to malformed texts, which contain a non-negligible number of out-of-vocabulary words. We propose a method combining FastText with subwords and a supervised task of learning misspelling patterns. In our method, misspellings of each word are embedded close to their correct variants. Lastly, we propose two novel approaches (one working at the character level and the other at the word level) that use deep convolutional neural networks for a central task in NLP, semantic matching. We experimentally show the effectiveness of our approach using the click-through-rate prediction task for sponsored search.
(Catalan abstract, translated:) Within the world of Natural Language Processing (NLP) and related fields, latent word representations (word embeddings) have become a fundamental technology for developing practical applications. This thesis presents a theoretical analysis of these word embeddings as well as algorithms for training them. Moreover, as a practical application of this research, applications for Web search and marketing are also presented. First, some theoretical aspects of one of the most popular algorithms for learning word embeddings, word2vec, are introduced. word2vec is also presented in a reinforcement learning context, showing that it models off-policy learning in the presence of a fixed set of behaviour policies. Next, we present a new off-policy learning algorithm, word2vec_π, with word2vec as the behaviour policy. Experimental validation corroborates the superiority of this new algorithm over word2vec. Second, a method is presented for learning word embeddings that are resistant to spelling errors. Most word embeddings have limited applicability when faced with texts containing errors or out-of-vocabulary words. We propose a method combining FastText with subwords and a supervised task for learning misspelling patterns. The results show that misspelled words are close to the correct ones when compared within the embedding. Finally, this thesis proposes two new techniques (one at the character level and the other at the word level) that use deep neural networks (DNNs) for the task of semantic matching. It is demonstrated experimentally that these methods are effective for click-through-rate prediction in the context of sponsored search.
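The subword idea behind the misspelling-resilient embeddings can be sketched in a FastText-like way: a word vector is the average of hashed character n-gram vectors, so a misspelling that shares most n-grams lands near the correct word. The random (untrained) vectors here are an assumption for illustration:

```python
import numpy as np

DIM, BUCKETS = 32, 10_000
rng = np.random.default_rng(0)
table = rng.normal(size=(BUCKETS, DIM))  # untrained n-gram vectors

def ngrams(word, n_min=3, n_max=4):
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def embed(word):
    """FastText-style embedding: mean of hashed character n-gram vectors."""
    idx = [hash(g) % BUCKETS for g in ngrams(word)]
    return table[idx].mean(axis=0)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A misspelling shares most n-grams, so its cosine similarity to the
# correct word is markedly higher than for an unrelated word.
print(cos(embed("retrieval"), embed("retreival")))
print(cos(embed("retrieval"), embed("zebra")))
```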
29

Crain, Steven P. "Personalized search and recommendation for health information resources." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45805.

Abstract:
Consumers face several challenges in using the Internet to fill health-related needs. (1) In many cases, they face a language gap as they look for information that is written in unfamiliar technical language. (2) Medical information in social media is of variable quality and may be appealing even when it is dangerous. (3) Discussion groups provide valuable social support for necessary lifestyle changes, but are variable in their levels of activity. (4) Finding less popular groups is tedious. We present solutions to these challenges. We use a novel adaptation of topic models to address the language gap. Conventional topic models discover a set of unrelated topics that together explain the combinations of words in a collection of documents. We add additional structure that provides relationships between topics, corresponding to relationships between consumer and technical medical topics. This allows us to support search for technical information using informal consumer medical questions. We also analyze social media videos related to eating disorders. A third of these videos promote eating disorders, and consumers are twice as engaged by these dangerous videos. We study the interactions of two communities on a photo-sharing site. There, a community that encourages recovery from eating disorders interacts with the pro-eating-disorder community in an attempt to persuade it, but we found that this attempt entrenches the pro-eating-disorder community more firmly in its position. We study the process by which consumers participate in discussion groups in an online diabetes community. We develop novel event history analysis techniques to identify the characteristics of groups in a diabetes community that are correlated with consumer activity. This analysis reveals that uniformly advertising the popular groups to all consumers impairs the diversity of the groups and limits their value to the community. To help consumers find interesting discussion groups, we develop a system for personalized recommendation of social connections. We extend matrix factorization techniques that are effective for product recommendation so that they become suitable for implicit, power-law-distributed social ratings. We identify the best approaches for recommending a variety of social connections involving consumers, discussion groups and discussions.
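A toy sketch of matrix factorization over implicit consumer-group participation data, with plain SGD and unobserved cells treated as zeros (a simplification of implicit-feedback handling; the hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Implicit consumer-by-group participation matrix (1 = posted in group).
R = np.array([[1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 0]], dtype=float)

k, lr, reg = 2, 0.05, 0.01
U = rng.normal(scale=0.1, size=(R.shape[0], k))  # consumer factors
V = rng.normal(scale=0.1, size=(R.shape[1], k))  # group factors

for _ in range(500):                             # SGD over all cells
    for i in range(R.shape[0]):
        for j in range(R.shape[1]):
            u, v = U[i].copy(), V[j]
            err = R[i, j] - u @ v
            U[i] += lr * (err * v - reg * u)
            V[j] += lr * (err * u - reg * v)

# Recommend the unjoined group with the highest predicted affinity.
scores = U @ V.T
scores[R > 0] = -np.inf
print(scores.argmax(axis=1))  # best new group per consumer
```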
APA, Harvard, Vancouver, ISO, and other styles
30

Zhu, Dengya. "Improving the relevance of search results via search-term disambiguation and ontological filtering." Curtin University of Technology, School of Information Systems, 2007. http://espace.library.curtin.edu.au:80/R/?func=dbin-jump-full&object_id=9348.

Full text
Abstract:
With the exponential growth of the Web and the inherent polysemy and synonymy problems of natural languages, search engines face many challenges, such as information overload, mismatch of search results, missing relevant documents, poorly organized search results, and mismatch with the human mental model of clustering engines. To address these issues, much effort, including employing different information retrieval (IR) models, information categorization/clustering, personalization, the semantic Web, ontology-based IR, and so on, has been devoted to improving the relevance of search results. The major focus of this study is to dynamically re-organize Web search results under a socially constructed hierarchical knowledge structure, to facilitate information seekers' access to and manipulation of the retrieved search results, and consequently to improve the relevance of search results.
To achieve this research goal, a special search-browser is developed and its retrieval effectiveness evaluated. The hierarchical structure of the Open Directory Project (ODP) is employed as the socially constructed knowledge structure, represented by Java's Tree component. The Yahoo! Search Web Services API is used to obtain search results directly from the Yahoo! search engine databases. The Lucene text search engine calculates similarities between each returned search result and the semantic characteristics of each category in the ODP, and the search results are then assigned to the corresponding ODP categories by a majority-voting algorithm. When a user selects a category of interest, only the search results categorized under that category are presented, and the quality of the search results is consequently improved.
Experiments demonstrate that the proposed approach can improve the precision of Yahoo! search results at the 11 standard recall levels from an average of 41.7 per cent to 65.2 per cent, an improvement of 23.5 percentage points. This conclusion is verified by comparing the improvements in P@5 and P@10 between the raw Yahoo! search results and the categorized results of the special search-browser: the improvements in P@5 and P@10 are 38.3 percentage points (85 per cent - 46.7 per cent) and 28 percentage points (70 per cent - 42 per cent) respectively. The experiments are carefully designed and controlled. To minimize the subjectiveness of relevance judgments, five judges (experts) were asked to make their relevance judgments independently, and the final relevance judgment is a combination of the five judges' judgments. The judges were presented with only the search-terms, the information needs, and the 50 search results from the Yahoo! Search Web Services API; they made their relevance judgments based on this information alone, with no categorization information provided.
The first contribution of this research is the use of an extracted category-document to represent the semantic characteristics of each ODP category. A category-document is composed of the topic of the category, the description of the category, and the titles and brief descriptions of the Web pages submitted under the category. Experimental results demonstrate that the category-documents can represent the semantic characteristics of the ODP categories in most cases. Furthermore, for machine learning algorithms, the extracted category-documents can be used as training data which would otherwise demand much human labor to create in order to ensure that the learning algorithm is properly trained. The second contribution is the introduction of the new concepts of relevance judgment convergent degree and relevance judgment divergent degree, which measure how well different judges agree with each other when asked to judge the relevance of a list of search results. When the relevance judgment convergent degree of a search-term is high, an IR algorithm should obtain higher precision as well; on the other hand, if the relevance judgment convergent degree is low, or the relevance judgment divergent degree is high, it is questionable to use the data to evaluate the IR algorithm. This intuition is borne out by the experiments of this research. The last contribution is that the developed search-browser is, to the best of my knowledge, the first IR system (IRS) to utilize the ODP hierarchical structure to categorize and filter search results.
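A minimal sketch of the categorization step is given below, substituting scikit-learn's tf-idf cosine similarity for Lucene's scoring; the two toy category-documents and the fragment-level voting scheme are illustrative stand-ins for the ODP category-documents and majority-voting algorithm described above.

```python
# Assign a search result to the ODP category whose category-document it
# most resembles; each fragment of the result (title, snippet sentences)
# casts one vote, and the majority wins.
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

category_docs = {           # illustrative stand-ins for ODP category-documents
    "Health": "medicine disease treatment hospital clinical doctor",
    "Computers": "software programming search engine algorithm web",
}
names = list(category_docs)
vec = TfidfVectorizer()
cat_matrix = vec.fit_transform(category_docs.values())

def categorize(result_fragments):
    votes = Counter()
    for frag in result_fragments:
        sims = cosine_similarity(vec.transform([frag]), cat_matrix)[0]
        votes[names[sims.argmax()]] += 1    # each fragment casts one vote
    return votes.most_common(1)[0][0]

print(categorize(["web search engine results",
                  "ranking algorithm for software",
                  "hospital information system"]))  # -> Computers (2 of 3 votes)
```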
APA, Harvard, Vancouver, ISO, and other styles
31

Du, Jia (Tina). "Multitasking, cognitive coordination and cognitive shifts during web searching." Thesis, Queensland University of Technology, 2010. https://eprints.qut.edu.au/35717/1/Jia_Du_Thesis.pdf.

Full text
Abstract:
As Web searching becomes more prolific for information access worldwide, we need to better understand users’ Web searching behaviour and develop better models of their interaction with Web search systems. Web search modelling is a significant and important area of Web research. Searching on the Web is an integral element of information behaviour and human–computer interaction. Web searching includes multitasking processes, the allocation of cognitive resources among several tasks, and shifts in cognitive, problem and knowledge states. In addition to multitasking, cognitive coordination and cognitive shifts are also important, but are under-explored aspects of Web searching. During the Web searching process, beyond physical actions, users experience various cognitive activities. Interactive Web searching involves many users’ cognitive shifts at different information behaviour levels. Cognitive coordination allows users to trade off the dependences among multiple information tasks and the resources available. Much research has been conducted into Web searching. However, few studies have modelled the nature of and relationship between multitasking, cognitive coordination and cognitive shifts in the Web search context. Modelling how Web users interact with Web search systems is vital for the development of more effective Web IR systems. This study aims to model the relationship between multitasking, cognitive coordination and cognitive shifts during Web searching. A preliminary theoretical model is presented based on previous studies. The research is designed to validate the preliminary model. Forty-two study participants were involved in the empirical study. A combination of data collection instruments, including pre- and post-questionnaires, think-aloud protocols, search logs, observations and interviews were employed to obtain users’ comprehensive data during Web search interactions. Based on the grounded theory approach, qualitative analysis methods including content analysis and verbal protocol analysis were used to analyse the data. The findings were inferred through an analysis of questionnaires, a transcription of think-aloud protocols, the Web search logs, and notes on observations and interviews. Five key findings emerged. (1) Multitasking during Web searching was demonstrated as a two-dimensional behaviour. The first dimension was represented as multiple information problems searching by task switching. Users’ Web searching behaviour was a process of multiple tasks switching, that is, from searching on one information problem to searching another. The second dimension of multitasking behaviour was represented as an information problem searching within multiple Web search sessions. Users usually conducted Web searching on a complex information problem by submitting multiple queries, using several Web search systems and opening multiple windows/tabs. (2) Cognitive shifts were the brain’s internal response to external stimuli. Cognitive shifts were found as an essential element of searching interactions and users’ Web searching behaviour. The study revealed two kinds of cognitive shifts. The first kind, the holistic shift, included users’ perception on the information problem and overall information evaluation before and after Web searching. The second kind, the state shift, reflected users’ changes in focus between the different cognitive states during the course of Web searching. Cognitive states included users’ focus on the states of topic, strategy, evaluation, view and overview. 
(3) Three levels of cognitive coordination behaviour were identified: the information task coordination level, the coordination mechanism level, and the strategy coordination level. The three levels of cognitive coordination behaviour interplayed to support multiple information tasks switching. (4) An important relationship existed between multitasking, cognitive coordination and cognitive shifts during Web searching. Cognitive coordination as a management mechanism bound together other cognitive processes, including multitasking and cognitive shifts, in order to move through users’ Web searching process. (5) Web search interaction was shown to be a multitasking process which included information problems ordering, task switching and task and mental coordinating; also, at a deeper level, cognitive shifts took place. Cognitive coordination was the hinge behaviour linking multitasking and cognitive shifts. Without cognitive coordination, neither multitasking Web searching behaviour nor the complicated mental process of cognitive shifting could occur. The preliminary model was revisited with these empirical findings. A revised theoretical model (MCC Model) was built to illustrate the relationship between multitasking, cognitive coordination and cognitive shifts during Web searching. Implications and limitations of the study are also discussed, along with future research work.
APA, Harvard, Vancouver, ISO, and other styles
32

Fidan, Guven. "Identifying The Effectiveness Of A Web Search Engine With Turkish Domain Dependent Impacts And Global Scale Information Retrieval Improvements." Phd thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614116/index.pdf.

Full text
Abstract:
This study investigates the effectiveness of a Web search engine with newly added or improved features in its architecture. These features fall into three groups: the impact of link quality and usage information on the page importance calculation; the use of a Turkish stemmer for indexing and query substitution; and the use of thumbnails for the visualization of search results. As Web search engines have become the primary means of finding and accessing information on the Internet, their effectiveness should be evaluated by how effectively and efficiently they help users satisfy a query, a performance criterion that goes beyond the pure precision and recall measures developed for basic information retrieval. In this thesis, we propose three distinguishing features to increase the efficiency of a Web search engine. First, incorporating link quality and usage information into the page importance calculation notably outperforms classical hyperlink-graph-based methods such as PageRank. Second, using the Turkish stemmer for indexing and query substitution yields remarkable improvements in relevance when normal and stemmed forms are used in a mixed framework. Finally, we observed that users are able to find the most relevant results using webpage thumbnails even for queries with decreased precision scores, although their gaze behaviour remains strongly shaped by their preferred search engine.
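The first feature can be illustrated with a small sketch: a weighted PageRank iteration in which each link carries a weight blending link quality with a usage signal such as click counts. This is an illustrative stand-in for the thesis's method, with an invented toy graph.

```python
# Weighted PageRank over usage-weighted links; dangling rows fall back
# to a uniform transition. The weights and graph are illustrative.
import numpy as np

def usage_pagerank(W, d=0.85, iters=50):
    """W[i, j]: weight of the link i -> j (link quality x usage signal)."""
    n = W.shape[0]
    rowsum = W.sum(axis=1, keepdims=True)
    P = np.divide(W, rowsum, out=np.full_like(W, 1.0 / n), where=rowsum > 0)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * (P.T @ r)   # standard damped iteration
    return r

# Toy graph: page 2 receives fewer links but much heavier usage.
W = np.array([[0, 1, 4],
              [1, 0, 4],
              [1, 0, 0]], dtype=float)
print(usage_pagerank(W).round(3))
```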
APA, Harvard, Vancouver, ISO, and other styles
33

Rahuma, Awatef. "Semantically-enhanced image tagging system." Thesis, De Montfort University, 2013. http://hdl.handle.net/2086/9494.

Full text
Abstract:
In multimedia databases, data are images, audio, video, texts, etc. Research interest in these types of databases has increased in the last decade or so, especially with the advent of the Internet and the Semantic Web. Fundamental research issues range from unified data modelling and retrieval of data items to the dynamic nature of updates. The thesis builds on findings in the Semantic Web and retrieval techniques and explores novel tagging methods for identifying data items. Tagging systems, which enable users to add tags to Internet resources such as images, video and audio to make them more manageable, have become popular. Collaborative tagging is concerned with the relationship between people and resources. Most of these resources have metadata in machine-processable format and enable users to use free-text keywords (so-called tags) as search techniques. This research references some tagging systems, e.g. Flickr, Delicious and MyWeb2.0. The limitations of such techniques include polysemy (one word with different meanings), synonymy (different words with one meaning), different lexical forms (singular, plural and conjugated words) and misspelling errors or alternate spellings. The work presented in this thesis introduces a semantic characterization of web resources that describes the structure and organization of tagging, aiming to extend existing multimedia querying using similarity measures to cater for collaborative tagging. In addition, we discuss the semantic difficulties of tagging systems, suggesting improvements in their accuracy. The scope of our work is as follows: (i) increase the accuracy and confidence of multimedia tagging systems; (ii) increase the similarity measures of images by integrating a variety of measures. To address the first shortcoming, we use WordNet as a semantic lingual ontology resource in a tagging system for the social sharing and retrieval of images. For the second shortcoming, we combine similarity measures in different ways within the multimedia tagging system. Fundamental to our work is the novel information model that we have constructed for our computation. This is based on the fact that an image is a rich object that can be characterised and formulated in n dimensions, each of which contains valuable information that helps increase the accuracy of the search. For example, an image of a tree in a forest contains more information than an image of the same tree in a different environment. In this thesis we characterise a data item (an image) by a primary description followed by n secondary descriptions. As n increases, the accuracy of the search improves. We give various techniques to analyse data and its associated query. To increase the accuracy of the tagging system we have performed different experiments on many images using similarity measures and various techniques from VoI (Value of Information). The findings show the linkage/integration between similarity measures and that VoI improves searches and helps guide a tagger in choosing the most adequate tags.
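As an illustration of the WordNet-based component, the sketch below expands a user tag with the lemma names of its WordNet synsets (via NLTK), which addresses the synonymy and lexical-form limitations listed above; it assumes the NLTK WordNet data is available and is not the thesis's actual pipeline.

```python
# Expand a tag with WordNet synonyms before matching against queries,
# so that 'car' also matches 'automobile'. Illustrative sketch only.
import nltk
nltk.download("wordnet", quiet=True)   # fetch WordNet data if missing
from nltk.corpus import wordnet as wn

def expand_tag(tag):
    variants = {tag.lower()}
    for synset in wn.synsets(tag):
        for lemma in synset.lemma_names():
            variants.add(lemma.lower().replace("_", " "))
    return variants

print(expand_tag("car"))   # includes e.g. 'auto', 'automobile', 'machine'
```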
APA, Harvard, Vancouver, ISO, and other styles
34

De, Groc Clément. "Collecte orientée sur le Web pour la recherche d’information spécialisée." Thesis, Paris 11, 2013. http://www.theses.fr/2013PA112073/document.

Full text
Abstract:
Les moteurs de recherche verticaux, qui se concentrent sur des segments spécifiques du Web, deviennent aujourd'hui de plus en plus présents dans le paysage d'Internet. Les moteurs de recherche thématiques, notamment, peuvent obtenir de très bonnes performances en limitant le corpus indexé à un thème connu. Les ambiguïtés de la langue sont alors d'autant plus contrôlables que le domaine est bien ciblé. De plus, la connaissance des objets et de leurs propriétés rend possible le développement de techniques d'analyse spécifiques afin d'extraire des informations pertinentes.Dans le cadre de cette thèse, nous nous intéressons plus précisément à la procédure de collecte de documents thématiques à partir du Web pour alimenter un moteur de recherche thématique. La procédure de collecte peut être réalisée en s'appuyant sur un moteur de recherche généraliste existant (recherche orientée) ou en parcourant les hyperliens entre les pages Web (exploration orientée).Nous étudions tout d'abord la recherche orientée. Dans ce contexte, l'approche classique consiste à combiner des mot-clés du domaine d'intérêt, à les soumettre à un moteur de recherche et à télécharger les meilleurs résultats retournés par ce dernier.Après avoir évalué empiriquement cette approche sur 340 thèmes issus de l'OpenDirectory, nous proposons de l'améliorer en deux points. En amont du moteur de recherche, nous proposons de formuler des requêtes thématiques plus pertinentes pour le thème afin d'augmenter la précision de la collecte. Nous définissons une métrique fondée sur un graphe de cooccurrences et un algorithme de marche aléatoire, dans le but de prédire la pertinence d'une requête thématique. En aval du moteur de recherche, nous proposons de filtrer les documents téléchargés afin d'améliorer la qualité du corpus produit. Pour ce faire, nous modélisons la procédure de collecte sous la forme d'un graphe triparti et appliquons un algorithme de marche aléatoire biaisé afin d'ordonner par pertinence les documents et termes apparaissant dans ces derniers.Dans la seconde partie de cette thèse, nous nous focalisons sur l'exploration orientée du Web. Au coeur de tout robot d'exploration orientée se trouve une stratégie de crawl qui lui permet de maximiser le rapatriement de pages pertinentes pour un thème, tout en minimisant le nombre de pages visitées qui ne sont pas en rapport avec le thème. En pratique, cette stratégie définit l'ordre de visite des pages. Nous proposons d'apprendre automatiquement une fonction d'ordonnancement indépendante du thème à partir de données existantes annotées automatiquement
Vertical search engines, which focus on a specific segment of the Web, are becoming more and more present in the Internet landscape. Topical search engines, notably, can obtain a significant performance boost by limiting their index to a specific topic. By doing so, language ambiguities are reduced, and both the algorithms and the user interface can take advantage of domain knowledge, such as domain objects or characteristics, to satisfy user information needs. In this thesis, we tackle the first, inevitable step of any topical search engine: focused document gathering from the Web. A thorough study of the state of the art leads us to consider two strategies to gather topical documents from the Web: either relying on an existing search engine index (focused search) or directly crawling the Web (focused crawling). The first part of our research is dedicated to focused search. In this context, a standard approach consists of combining domain-specific terms into queries, submitting those queries to a search engine and downloading the top-ranked documents. After empirically evaluating this approach over 340 topics, we propose to enhance it in two different ways. Upstream of the search engine, we aim at formulating more relevant queries in order to increase the precision of the top retrieved documents. To do so, we define a metric based on a co-occurrence graph and a random walk algorithm, which aims at predicting the topical relevance of a query. Downstream of the search engine, we filter the retrieved documents in order to improve the quality of the document collection. We do so by modeling our gathering process as a tripartite graph and applying a random walk with restart algorithm so as to simultaneously rank by relevance the documents and the terms appearing in our corpus. In the second part of this thesis, we turn to focused crawling. We describe our focused crawler implementation, which was designed to scale horizontally. Then, we consider the problem of crawl frontier ordering, which is at the very heart of a focused crawler. Such an ordering strategy allows the crawler to prioritize its fetches, maximizing the number of in-domain documents retrieved while minimizing the non-relevant ones. We propose to apply learning-to-rank algorithms to efficiently order the crawl frontier, and define a method to learn a ranking function from existing, automatically annotated crawls.
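The random-walk machinery used in both enhancements can be sketched compactly as a random walk with restart over a graph; the toy adjacency matrix and parameters below are illustrative, not the thesis's co-occurrence or tripartite graphs.

```python
# Random walk with restart: at each step the walker either follows an
# edge or jumps back to the seed nodes; the stationary distribution
# ranks nodes (documents, terms, queries) by relevance to the seeds.
import numpy as np

def random_walk_with_restart(A, seeds, restart=0.15, iters=100):
    """A: adjacency matrix (no isolated nodes); seeds: relevant node indices."""
    n = A.shape[0]
    P = A / A.sum(axis=0, keepdims=True)       # column-stochastic transitions
    v = np.zeros(n)
    v[seeds] = 1.0 / len(seeds)                # restart distribution
    r = v.copy()
    for _ in range(iters):
        r = (1 - restart) * (P @ r) + restart * v
    return r                                   # stationary relevance scores

# Toy co-occurrence graph over 5 terms, seeded with term 0.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
print(random_walk_with_restart(A, seeds=[0]).round(3))
```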
APA, Harvard, Vancouver, ISO, and other styles
35

Marangon, Sílvio Luís. "Análise de métodos para programação de contextualização." Universidade de São Paulo, 2006. http://www.teses.usp.br/teses/disponiveis/3/3142/tde-14122006-112458/.

Full text
Abstract:
A localização de páginas relevantes na Internet em atividades como clipping de notícias, detecção de uso indevido de marcas ou em serviços anti-phishing torna-se cada vez mais complexa devido a vários fatores como a quantidade cada vez maior de páginas na Web e a grande quantidade de páginas irrelevantes retornadas por mecanismos de busca. Em muitos casos as técnicas tradicionais utilizadas em mecanismos de busca na Internet, isto é, localização de termos em páginas e ordenação por relevância, não são suficientes para resolver o problema de localização de páginas específicas em atividades como as citadas anteriormente. A contextualização das páginas, ou seja, a classificação de páginas segundo um contexto definido pelo usuário baseando-se nas necessidades de uma atividade específica deve permitir uma busca mais eficiente por páginas na Internet. Neste trabalho é estudada a utilização de métodos de mineração na Web para a composição de métodos de contextualização de páginas, que permitam definir contextos mais sofisticados como seu assunto ou alguma forma de relacionamento. A contextualização de páginas deve permitir a solução de vários problemas na busca de páginas na Internet pela composição de métodos, que permitam a localização de páginas através de um conjunto de suas características, diferentemente de mecanismos de busca tradicionais que apenas localizam páginas que possuam um ou mais termos especificados.
Internet services such as news clipping, anti-phishing and anti-plagiarism services, and others that require intensive searching on the Internet, face a difficult task because of the huge number of existing pages. Search engines try to address this problem, but their methods retrieve a lot of irrelevant pages, sometimes thousands of them, and more powerful methods are needed. Page content, subject, hyperlinks or location can be used to define a page's context and to create a more powerful method that retrieves more relevant pages, improving precision. Classification of page context is defined as the classification of a page by a set of its features. This report presents a study of Web mining, search engines and the application of Web mining technologies to classify page context. Page context classification applied to search engines can solve the problem of the flood of irrelevant pages by allowing search engines to retrieve only pages belonging to a given context.
APA, Harvard, Vancouver, ISO, and other styles
36

Moral, Ibrahim Utku. "Publication of the Bibliographies on the World Wide Web." Thesis, Virginia Tech, 1997. http://hdl.handle.net/10919/36748.

Full text
Abstract:
Every scientific research project begins with a literature review that includes an extensive bibliographic search. Such searches are known to be difficult and time-consuming because of the vast amount of topical material existing in today's ever-changing technology base. Keeping up to date with related literature and being aware of the most recent publications require extensive time and effort. The need for a WWW-based software tool for collecting and providing access to this scientific body of knowledge is undeniable. The study explained herein deals with this problem by developing an efficient, advanced, easy-to-use tool, WebBiblio, that provides a globally accessible WWW environment enabling the collection and dissemination of searchable bibliographies comprising abstracts and keywords. This thesis describes the design, structure and features of WebBiblio, and explains the ideas and approaches used in its development. The developed system is not a prototype, but a production system that exploits the capabilities of the WWW. Currently, it is used to publish three VV&T bibliographies at the WWW site: http://manta.cs.vt.edu/biblio. With its rich set of features and ergonomically engineered interface, WebBiblio brings a comprehensive solution to the problem of globally collecting and providing access to a diverse set of bibliographies.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
37

He, Hai. "Towards automatic understanding and integration of web databases for developing large-scale unified access systems." Diss., Online access via UMI:, 2006.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
38

Lunardi, Marcia Severo. "Visualização em nuvens de texto como apoio à busca exploratória na web." Universidade do Estado do Rio de Janeiro, 2008. http://www.bdtd.uerj.br/tde_busca/arquivo.php?codArquivo=1522.

Full text
Abstract:
A presente dissertação é o resultado de uma pesquisa que avalia as vantagens da utilização de nuvens de texto para apresentar os resultados de um sistema de busca na web. Uma nuvem de texto é uma técnica de visualização de informações textuais e tem como principal objetivo proporcionar um resumo de um ou mais conteúdos em uma única tela. Em uma consulta na web, os resultados aparecem listados em diversas páginas. Através de uma nuvem de texto integrada a um sistema de busca é possível a visualização de uma síntese, de um resumo automático, do conteúdo dos resultados listados em várias páginas sem que elas tenham que ser percorridas e os sites acessados individualmente. A nuvem de texto nesse contexto funciona como uma ferramenta auxiliar para que o usuário possa gerenciar a grande carga de informação que é disponibilizada nos resultados das consultas. Dessa forma os resultados podem ser vistos em contexto e, ainda, as palavras que compõem a nuvem, podem ser utilizadas como palavras-chave adicionais para complementar uma consulta inicial. Essa pesquisa foi desenvolvida em duas fases. A primeira consistiu no desenvolvimento de uma aplicação integrada a um sistema de buscas para mostrar seus resultados em nuvens de texto. A segunda fase foi a avaliação dessa aplicação, focada principalmente em buscas exploratórias, que são aquelas em que os objetivos dos usuários não são bem definidos ou o conhecimento sobre o assunto pesquisado é vago.
This dissertation presents the results of a research project that evaluates the advantages of text clouds for the visualization of web search results. A text cloud is a visualization technique for texts and textual data in general. Its main purpose is to enhance comprehension of a large body of text by summarizing it automatically, and it is generally applied to managing information overload. While continual improvements in search technology have made it possible to quickly find relevant information on the web, few search engines do anything to organize or summarize the contents of their responses beyond ranking the items in a list. In exploratory searches, users may be forced to scroll through many pages to identify the information they seek and are generally not provided with any way to visualize the totality of the results returned. This research is divided in two parts. Part one describes the development of an application that generates text clouds to summarize search results from the standard result list provided by the Yahoo search engine. The second part describes the evaluation of this application. Adapted to this specific context, a text cloud is generated from the text of the first sites returned by the search engine according to its relevance algorithms. The benefit of this application is that it enables users to obtain a visual overview of the main results at once. From this overview users can obtain keywords for navigating to potentially relevant subjects that would otherwise be hidden deep down in the response list. Also, by visualizing the results in context, users can realize that their initial query term was not the best choice.
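A minimal sketch of the text-cloud generation step might look as follows, assuming the top result snippets are already available; the stopword list, term limits and font-size mapping are illustrative choices.

```python
# Build a text cloud from search-result snippets: count terms, drop
# stopwords, and map counts to font sizes so the whole result set can
# be summarized on one screen.
import re
from collections import Counter

STOP = {"the", "of", "and", "to", "in", "a", "for", "is", "on", "as"}

def text_cloud(snippets, max_terms=20, min_pt=10, max_pt=36):
    words = [w for s in snippets for w in re.findall(r"[a-z]+", s.lower())
             if w not in STOP and len(w) > 2]
    counts = Counter(words).most_common(max_terms)
    top = counts[0][1]
    return [(word, min_pt + (max_pt - min_pt) * c / top) for word, c in counts]

snippets = ["Visualization of web search results",
            "Search results summarized as a text cloud",
            "Exploratory search and information visualization"]
for word, size in text_cloud(snippets):
    print(f"{word}: {size:.0f}pt")   # term and its font size in the cloud
```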
APA, Harvard, Vancouver, ISO, and other styles
39

Sahay, Saurav. "Socio-semantic conversational information access." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/42855.

Full text
Abstract:
The main contributions of this thesis revolve around the development of an integrated conversational recommendation system, combining data and information models with community network and interactions to leverage multi-modal information access. We have developed a real-time conversational information access community agent that leverages community knowledge by pushing relevant recommendations to users of the community. The recommendations are delivered in the form of web resources, past conversations and people to connect to. The information agent (cobot, for community/collaborative bot) monitors the community conversations, and is 'aware' of users' preferences by implicitly capturing their short-term and long-term knowledge models from conversations. The agent leverages health and medical domain knowledge to extract concepts, associations and relationships between concepts; it formulates queries for semantic search and provides socio-semantic recommendations in the conversation after applying various relevance filters to the candidate results. The agent also takes users' verbal intentions in conversations into account when making recommendation decisions. One of the goals of this thesis is to develop an innovative approach to delivering relevant information using a combination of social networking, information aggregation, semantic search and recommendation techniques. The idea is to facilitate timely and relevant social information access by mixing past community-specific conversational knowledge and web information access to recommend and connect users with relevant information. Language and interaction create usable memories, useful for making decisions about what actions to take and what information to retain. Cobot leverages these interactions to maintain users' episodic and long-term semantic models. The agent analyzes these memory structures to match and recommend users in conversations according to the contextual information need. The social feedback on the recommendations is registered in the system so that the algorithms can promote community-preferred, contextually relevant resources. The nodes of the semantic memory are frequent concepts extracted from a user's interactions. The concepts are connected by associations that develop when concepts co-occur frequently. Over time, as the user participates in more interactions, new concepts are added to the semantic memory. Different conversational facets are matched with episodic memories, and a spreading activation search on the semantic net is performed to generate the top candidate user recommendations for the conversation. The unifying themes in this thesis revolve around the informational and social aspects of a unified information access architecture that integrates semantic extraction and indexing with user modeling and recommendations.
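The spreading-activation search over the semantic memory can be sketched as follows; the graph, weights, decay factor and health-domain concepts are illustrative, not cobot's actual data structures.

```python
# Spreading activation over a semantic memory: activation starts at the
# concepts of the current conversation and decays as it spreads along
# weighted associations; the most activated nodes become candidates.
def spreading_activation(graph, sources, decay=0.5, depth=3):
    """graph: {concept: [(neighbor, weight), ...]}."""
    activation = {c: 1.0 for c in sources}
    frontier = dict(activation)
    for _ in range(depth):
        nxt = {}
        for node, act in frontier.items():
            for nbr, w in graph.get(node, []):
                nxt[nbr] = nxt.get(nbr, 0.0) + act * w * decay
        for node, act in nxt.items():
            activation[node] = activation.get(node, 0.0) + act
        frontier = nxt
    return sorted(activation.items(), key=lambda kv: -kv[1])

graph = {
    "diabetes": [("insulin", 0.9), ("diet", 0.6)],
    "insulin": [("dosage", 0.8)],
    "diet": [("exercise", 0.7)],
}
print(spreading_activation(graph, ["diabetes"]))
```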
APA, Harvard, Vancouver, ISO, and other styles
40

Angelini, Marco. "Un approccio per la concettualizzazione di insiemi di documenti." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2013. http://amslaurea.unibo.it/5604/.

Full text
Abstract:
An introduction to Semantic Web techniques and the realization of an approach able to recreate the familiar environment of any search engine with semantic-lexical functionality, together with the possibility of extracting, from the search results, the key concepts and terms that constitute the groups under which documents sharing a topic are collected.
APA, Harvard, Vancouver, ISO, and other styles
41

Souza, Jucimar Brito de. "Algoritmos para avaliação de confiança em apontadores encontrados na Web." Universidade Federal do Amazonas, 2009. http://tede.ufam.edu.br/handle/tede/2960.

Full text
Abstract:
Search engines have become an essential tool for Web users today. They use link analysis algorithms to explore the linkage relationships between pages and estimate a popularity score for each page, taking each link as a vote of quality for the target page. This information is used in the search engines' ranking algorithms. However, a large number of links found on the Web cannot be considered good votes of quality, as they convey information that acts as noise for ranking algorithms. This work aims to detect such noise in the link structure of search engine collections. We study the impact of the methods developed here for detecting noisy links, considering scenarios in which the reputation of pages is calculated using the PageRank and Indegree algorithms. The results of the experiments show improvements of up to 68.33% in the Mean Reciprocal Rank (MRR) metric for navigational queries and of up to 35.36% for randomly selected navigational queries.
Máquinas de busca têm se tornado uma ferramenta imprescindível para os usuários da Web. Elas utilizam algoritmos de análise de apontadores para explorar a estrutura dos apontadores da Web para atribuir uma estimativa de popularidade a cada página. Essa informação é usada na ordenação da lista de respostas dada por máquinas de busca a consultas submetidas por seus usuários. Contudo, alguns tipos de apontadores prejudicam a qualidade da estimativa de popularidade por apresentar informação ruidosa, podendo assim afetar negativamente a qualidade de respostas providas por máquinas de busca a seus usuários. Exemplos de tais apontadores incluem apontadores repetidos, apontadores resultantes da duplicação de páginas, SPAM, dentre outros. Esse trabalho tem como objetivo detectar ruídos na estrutura dos apontadores existentes em base de dados de máquinas de busca. Foi estudado o impacto dos métodos aqui desenvolvidos para detecção de apontadores ruidosos, considerando cenários nos quais a reputação das páginas é calculada tanto com o algoritmos Pagerank quanto com o algoritmo Indegree. Os resultados dos experimentos apresentaram melhoria de até 68,33% na métrica Mean Reciprocal Rank (MRR) para consultas navegacionais e de até 35,36% para as consultas navegacionais aleatórias quando uma máquina de busca utiliza o algoritmo Pagerank.
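Two pieces of this setup are easy to sketch: filtering one simple class of noisy links (repeated or same-host "nepotistic" links) before computing an indegree-based reputation, and scoring navigational queries with MRR. This is an illustrative simplification, not the thesis's detection methods.

```python
# Drop repeated and intra-host links as noise before computing indegree
# reputation, and evaluate navigational queries with MRR.
from urllib.parse import urlparse
from collections import Counter

def clean_indegree(links):
    """links: iterable of (source_url, target_url); drops same-host links."""
    deg = Counter()
    for src, dst in set(links):                 # set() drops repeated links
        if urlparse(src).netloc != urlparse(dst).netloc:
            deg[dst] += 1
    return deg

def mrr(rankings, answers):
    """rankings: ranked result lists; answers: the correct URL per query."""
    total = 0.0
    for ranked, answer in zip(rankings, answers):
        if answer in ranked:
            total += 1.0 / (ranked.index(answer) + 1)
    return total / len(rankings)

print(mrr([["a", "b", "c"], ["x", "y"]], ["b", "x"]))  # (1/2 + 1) / 2 = 0.75
```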
APA, Harvard, Vancouver, ISO, and other styles
42

Lisena, Pasquale. "Knowledge-based music recommendation : models, algorithms and exploratory search." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS614.

Full text
Abstract:
Représenter l'information décrivant la musique est une activité complexe, qui implique différentes sous-tâches. Ce manuscrit de thèse porte principalement sur la musique classique et étudie comment représenter et exploiter ses informations. L'objectif principal est l'étude de stratégies de représentation et de découverte des connaissances appliquées à la musique classique, dans des domaines tels que la production de base de connaissances, la prédiction de métadonnées et les systèmes de recommandation. Nous proposons une architecture pour la gestion des métadonnées de musique à l'aide des technologies du Web Sémantique. Nous introduisons une ontologie spécialisée et un ensemble de vocabulaires contrôlés pour les différents concepts spécifiques à la musique. Ensuite, nous présentons une approche de conversion des données, afin d’aller au-delà de la pratique bibliothécaire actuellement utilisée, en s’appuyant sur des règles de mapping et sur l’interconnexion avec des vocabulaires contrôlés. Enfin, nous montrons comment ces données peuvent être exploitées. En particulier, nous étudions des approches basées sur des plongements calculés sur des métadonnées structurées, des titres et de la musique symbolique pour classer et recommander de la musique. Plusieurs applications de démonstration ont été réalisées pour tester les approches et les ressources précédentes
Representing the information about music is a complex activity that involves different sub-tasks. This thesis manuscript mostly focuses on classical music, researching how to represent and exploit its information. The main goal is the investigation of strategies of knowledge representation and discovery applied to classical music, involving subjects such as Knowledge-Base population, metadata prediction, and recommender systems. We propose a complete workflow for the management of music metadata using Semantic Web technologies. We introduce a specialised ontology and a set of controlled vocabularies for the different concepts specific to music. Then, we present an approach for converting data, in order to go beyond the librarian practice currently in use, relying on mapping rules and interlinking with controlled vocabularies. Finally, we show how these data can be exploited. In particular, we study approaches based on embeddings computed on structured metadata, titles, and symbolic music for ranking and recommending music. Several demo applications have been realised for testing the previous approaches and resources
APA, Harvard, Vancouver, ISO, and other styles
43

Reis, Thiago. "Algoritmo rastreador web especialista nuclear." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/85/85133/tde-07012014-134548/.

Full text
Abstract:
Nos últimos anos a Web obteve um crescimento exponencial, se tornando o maior repositório de informações já criado pelo homem e representando uma fonte nova e relevante de informações potencialmente úteis para diversas áreas, inclusive a área nuclear. Entretanto, devido as suas características e, principalmente, devido ao seu grande volume de dados, emerge um problema desafiador relacionado à utilização das suas informações: a busca e recuperação informações relevantes e úteis. Este problema é tratado por algoritmos de busca e recuperação de informação que trabalham na Web, denominados rastreadores web. Neste trabalho é apresentada a pesquisa e desenvolvimento de um algoritmo rastreador que efetua buscas e recupera páginas na Web com conteúdo textual relacionado ao domínio nuclear e seus temas, de forma autônoma e massiva. Este algoritmo foi projetado sob o modelo de um sistema especialista, possuindo, desta forma, uma base de conhecimento que contem tópicos nucleares e palavras-chave que os definem e um mecanismo de inferência constituído por uma rede neural artificial perceptron multicamadas que efetua a estimação da relevância das páginas na Web para um determinado tópico nuclear, no decorrer do processo de busca, utilizando a base de conhecimento. Deste modo, o algoritmo é capaz de, autonomamente, buscar páginas na Web seguindo os hiperlinks que as interconectam e recuperar aquelas que são mais relevantes para o tópico nuclear selecionado, emulando a habilidade que um especialista nuclear tem de navegar na Web e verificar informações nucleares. Resultados experimentais preliminares apresentam uma precisão de recuperação de 80% para o tópico área nuclear em geral e 72% para o tópico de energia nuclear, indicando que o algoritmo proposto é efetivo e eficiente na busca e recuperação de informações relevantes para o domínio nuclear.
Over the last years the Web has grown exponentially, becoming the largest information repository ever created and representing a new and valuable source of potentially useful information for several topics, including nuclear-related themes. However, due to the Web's characteristics and, mainly, because of its huge data volume, finding and retrieving relevant and useful information are non-trivial tasks. This challenge is addressed by Web search and retrieval algorithms called web crawlers. This work presents the research and development of a crawler algorithm able to search for and retrieve webpages with nuclear-related textual content in an autonomous and massive fashion. The algorithm was designed under the expert-system model, having a knowledge base that contains a list of nuclear topics and the keywords that define them, and an inference engine composed of a multi-layer perceptron artificial neural network that estimates the relevance of webpages to a given nuclear topic from the knowledge base while searching the Web. Thus, the algorithm is able to autonomously search the Web by following the hyperlinks that interconnect webpages and to retrieve those that are most relevant to some predefined nuclear topic, emulating the ability a nuclear expert has to browse the Web and evaluate nuclear information. Preliminary experimental results show a retrieval precision of 80% for the general nuclear domain topic and 72% for the nuclear power topic, indicating that the proposed algorithm is effective and efficient in searching the Web and retrieving nuclear-related information.
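The crawler's control loop can be sketched as a priority-queue search in which pages are scored for topical relevance; here a keyword-overlap score stands in for the thesis's trained multi-layer perceptron, and the fetch and link-extraction functions are assumed to be supplied.

```python
# Focused crawler loop: a priority queue orders frontier URLs by the
# estimated relevance of the page that linked to them; pages scoring
# above a threshold are kept and their links enqueued.
import heapq

NUCLEAR_KEYWORDS = {"nuclear", "reactor", "radiation", "isotope", "uranium"}

def relevance(text):
    """Keyword-overlap stand-in for the thesis's MLP relevance estimator."""
    words = set(text.lower().split())
    return len(words & NUCLEAR_KEYWORDS) / len(NUCLEAR_KEYWORDS)

def focused_crawl(seed_urls, fetch, extract_links, budget=100, threshold=0.2):
    frontier = [(-1.0, url) for url in seed_urls]   # max-heap via negation
    heapq.heapify(frontier)
    seen, kept = set(seed_urls), []
    while frontier and len(kept) < budget:
        _, url = heapq.heappop(frontier)
        text = fetch(url)                            # assumed: returns page text
        score = relevance(text)
        if score >= threshold:
            kept.append((url, score))
            for link in extract_links(text):         # assumed: returns URLs
                if link not in seen:
                    seen.add(link)
                    heapq.heappush(frontier, (-score, link))
    return kept
```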
APA, Harvard, Vancouver, ISO, and other styles
44

Penatti, Otávio Augusto Bizetto 1984. "Estudo comparativo de descritores para recuperação de imagens por conteudo na web." [s.n.], 2009. http://repositorio.unicamp.br/jspui/handle/REPOSIP/276157.

Full text
Abstract:
Advisor: Ricardo da Silva Torres
Master's dissertation - Universidade Estadual de Campinas, Instituto de Computação
Resumo: A crescente quantidade de imagens geradas e disponibilizadas atualmente tem eito aumentar a necessidade de criação de sistemas de busca para este tipo de informação. Um método promissor para a realização da busca de imagens e a busca por conteúdo. Este tipo de abordagem considera o conteúdo visual das imagens, como cor, textura e forma de objetos, para indexação e recuperação. A busca de imagens por conteúdo tem como componente principal o descritor de imagens. O descritor de imagens é responsável por extrair propriedades visuais das imagens e armazená-las em vetores de características. Dados dois vetores de características, o descritor compara-os e retorna um valor de distancia. Este valor quantifica a diferença entre as imagens representadas pelos vetores. Em um sistema de busca de imagens por conteúdo, a distancia calculada pelo descritor de imagens é usada para ordenar as imagens da base em relação a uma determinada imagem de consulta. Esta dissertação realiza um estudo comparativo de descritores de imagens considerando a Web como cenário de uso. Este cenário apresenta uma quantidade muito grande de imagens e de conteúdo bastante heterogêneo. O estudo comparativo realizado nesta dissertação é feito em duas abordagens. A primeira delas considera a complexidade assinto tica dos algoritmos de extração de vetores de características e das funções de distancia dos descritores, os tamanhos dos vetores de características gerados pelos descritores e o ambiente no qual cada descritor foi validado originalmente. A segunda abordagem compara os descritores em experimentos práticos em quatro bases de imagens diferentes. Os descritores são avaliados segundo tempo de extração, tempo para cálculos de distancia, requisitos de armazenamento e eficácia. São comparados descritores de cor, textura e forma. Os experimentos são realizados com cada tipo de descritor independentemente e, baseado nestes resultados, um conjunto de descritores é avaliado em uma base com mais de 230 mil imagens heterogêneas, que reflete o conteúdo encontrado na Web. A avaliação de eficácia dos descritores na base de imagens heterogêneas é realizada por meio de experimentos com usuários reais. Esta dissertação também apresenta uma ferramenta para a realização automatizada de testes comparativos entre descritores de imagens.
Abstract: The growth in size of image collections and their worldwide availability have increased the demand for image retrieval systems. A promising approach to address this demand is to retrieve images based on image content (Content-Based Image Retrieval). This approach considers the image's visual properties, like color, texture and shape of objects, for indexing and retrieval. The main component of a content-based image retrieval system is the image descriptor. The image descriptor is responsible for encoding image properties into feature vectors. Given two feature vectors, the descriptor compares them and computes a distance value. This value quantifies the difference between the images represented by their vectors. In a content-based image retrieval system, these distance values are used to rank database images with respect to their distance to a given query image. This dissertation presents a comparative study of image descriptors considering the Web as the environment of use. This environment presents a huge amount of images with heterogeneous content. The comparative study was conducted using two approaches. The first approach considers the asymptotic complexity of the feature-vector extraction algorithms and distance functions, the size of the feature vectors generated by the descriptors, and the environment in which each descriptor was validated. The second approach compares the descriptors in practical experiments using four different image databases. The evaluation considers the time required for feature extraction, the time for computing distance values, the storage requirements and the effectiveness of each descriptor. Color, texture, and shape descriptors were compared. The experiments were performed with each kind of descriptor independently and, based on these results, a set of descriptors was evaluated on an image database containing more than 230 thousand heterogeneous images, reflecting the content found on the Web. The evaluation of descriptor effectiveness on the heterogeneous database was made through experiments with real users. This dissertation also presents a tool for executing experiments aimed at evaluating image descriptors.
Master's
Information Systems
Master in Computer Science
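The extract/compare/rank pipeline evaluated in the dissertation can be sketched with one simple color descriptor: a global color histogram compared with L1 distance; the descriptor choice, bin count and random toy images are illustrative.

```python
# A simple content-based image descriptor: a global color histogram as
# the feature vector and L1 distance for ranking database images.
import numpy as np

def color_histogram(image, bins=8):
    """image: H x W x 3 uint8 array; returns a normalized feature vector."""
    q = (image.astype(int) // (256 // bins)).reshape(-1, 3)
    codes = q[:, 0] * bins * bins + q[:, 1] * bins + q[:, 2]
    hist = np.bincount(codes, minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def l1_distance(a, b):
    return float(np.abs(a - b).sum())

# Toy ranking: the image closest to the query comes first.
rng = np.random.default_rng(0)
query = rng.integers(0, 256, (32, 32, 3), dtype=np.uint8)
database = [rng.integers(0, 256, (32, 32, 3), dtype=np.uint8) for _ in range(5)]
qv = color_histogram(query)
ranked = sorted(range(5), key=lambda i: l1_distance(qv, color_histogram(database[i])))
print(ranked)
```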
APA, Harvard, Vancouver, ISO, and other styles
45

Synek, Pavel. "Metody vyhledávání informací na webu první a druhé generace." Master's thesis, Vysoká škola ekonomická v Praze, 2009. http://www.nusl.cz/ntk/nusl-165288.

Full text
Abstract:
The thesis focuses on methods and strategies usable for information retrieval on the Internet as we know it today, and shows how the requirements on end-users' knowledge and their ability to use different information sources have been changing; in other words, it points to the increasing demands on information literacy. The document is divided into three parts. The first briefly provides an introduction to strategies of information retrieval. The second part deals, on a theoretical basis, with tools and information sources for searching both the surface and the deep web. The author also focuses on the social web and its special attributes from the searcher's point of view. The third part represents a practical application of the knowledge from the previous parts, demonstrating through model examples the nature and kind of information that can be retrieved from particular resources.
APA, Harvard, Vancouver, ISO, and other styles
46

Santos, Célia Francisca dos. "Métodos de poda estática para índices de máquinas de busca." Universidade Federal do Amazonas, 2006. http://tede.ufam.edu.br/handle/tede/2944.

Full text
Abstract:
In this work, new static pruning methods specially designed for web search engines are proposed and experimentally evaluated. The methods take into account the locality of occurrence of terms in documents when pruning search engine indices and are therefore called "locality-based pruning methods". Four new pruning methods that use locality information are proposed here: two-pass lbpm, full coverage, top fragments and random. The two-pass lbpm method is the most effective of the locality-based methods, but requires a full construction of the indices before the pruning process. On the other hand, full coverage, top fragments and random are single-pass methods that prune the indices without requiring the original indices to be built first. Single-pass methods are useful for environments where the document base changes continuously, as in large-scale search engines developed for the web. Experiments using a real search engine show that the methods proposed in this work can reduce the storage cost of the indices by up to 60%, while keeping the loss in precision minimal. More importantly, the experimental results indicate that this same 60% reduction in index size can reduce query processing time to almost 57% of the original time. Furthermore, the experiments show that, for conjunctive queries and phrases, the locality-based methods produce better results than Carmel's method, the best method proposed in the literature. For example, using only phrase queries, with a 67% reduction in index size, the locality-based two-pass lbpm method produced results with a similarity degree of 0.71 relative to the results obtained with the original indices, while Carmel's method produced results with a similarity degree of only 0.39. The results show that locality-based pruning methods are more effective at maintaining the quality of the results provided by search engines.
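A generic term-centric pruner, shown below, illustrates the basic idea of static pruning (keep only the best postings per term); it is deliberately simpler than the locality-based methods proposed in the thesis, and the index contents are invented.

```python
# Static index pruning: for each term, keep only the highest-scoring
# fraction of its postings and drop the rest, trading a smaller index
# for a bounded loss in result quality.
def prune_index(inverted_index, keep=0.4):
    """inverted_index: {term: [(doc_id, score), ...]} -> pruned copy."""
    pruned = {}
    for term, postings in inverted_index.items():
        postings = sorted(postings, key=lambda p: -p[1])
        n = max(1, int(len(postings) * keep))    # always keep the best posting
        pruned[term] = postings[:n]
    return pruned

index = {"search": [(1, 0.9), (2, 0.4), (3, 0.1), (4, 0.05)],
         "web":    [(1, 0.7), (5, 0.6)]}
print(prune_index(index))   # roughly 60% of the postings are discarded
```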
APA, Harvard, Vancouver, ISO, and other styles
47

Htait, Amal. "Sentiment analysis at the service of book search." Electronic Thesis or Diss., Aix-Marseille, 2019. http://www.theses.fr/2019AIXM0260.

Full text
Abstract:
Le Web est en croissance continue, et une quantité énorme de données est générée par les réseaux sociaux, permettant aux utilisateurs d'échanger une grande diversité d'informations. En outre, les textes au sein des réseaux sociaux sont souvent subjectifs. L'exploitation de cette subjectivité présente au sein des textes peut être un facteur important lors d'une recherche d'information. En particulier, cette thèse est réalisée pour répondre aux besoins de la plate-forme Books de Open Edition en matière d'amélioration de la recherche et la recommandation de livres, en plusieurs langues. La plateforme offre des informations générées par des utilisateurs, riches en sentiments. Par conséquent, l'analyse précédente, concernant l'exploitation de sentiment en recherche d'information, joue un rôle important dans cette thèse et peut servir l'objectif d'une amélioration de qualité de la recherche de livres en utilisant les informations générées par les utilisateurs. Par conséquent, nous avons choisi de suivre une voie principale dans cette thèse consistant à combiner les domaines analyse de sentiment (AS) et recherche d'information (RI), dans le but d'améliorer les suggestions de la recherche de livres. Nos objectifs peuvent être résumés en plusieurs points: • Une approche d'analyse de sentiment, facilement applicable sur différentes langues, peu coûteuse en temps et en données annotées. • De nouvelles approches pour l'amélioration de la qualité lors de la recherche de livres, basées sur l'utilisation de l'analyse de sentiment dans le filtrage, l'extraction et la classification des informations
Web technology is growing continuously, and a huge volume of data is generated on the social web, where users exchange a wide variety of information. Social web text is not only rich in information; its writers are often guided by sentiments that are reflected in their writing. Based on this observation, locating sentiment in a text can play an important role in information extraction. The purpose of this thesis is to improve the book search and recommendation quality of OpenEdition's multilingual Books platform. The Books platform also offers additional information through user-generated content (e.g. book reviews) connected to the books and rich in the emotions expressed in the users' writing. Therefore, the analysis above, concerning locating sentiment in a text for information extraction, plays an important role in this thesis and can serve the purpose of improving book search quality using the shared user-generated information. Accordingly, we choose to follow a main path in this thesis: combining the sentiment analysis (SA) and information retrieval (IR) fields for the purpose of improving the quality of book search. Two objectives, summarised below, serve this purpose: • An approach for SA prediction, easily applicable to different languages, with a low cost in time and annotated data. • New approaches for book search quality improvement, based on employing SA in information filtering, retrieval and classification.
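One way the two fields can be combined, sketched below under strong simplifications, is to blend each book's retrieval score with a lexicon-based sentiment score over its reviews; the lexicon, blending weight and toy books are illustrative assumptions rather than the thesis's methods.

```python
# Sentiment at the service of search: blend retrieval scores with a
# lexicon-based sentiment score over user reviews, so warmly reviewed
# books rise in the ranking.
import re

POSITIVE = {"great", "moving", "brilliant", "lovely"}
NEGATIVE = {"boring", "poor", "confusing", "dull"}

def sentiment(review):
    words = re.findall(r"[a-z]+", review.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / max(1, pos + neg)      # score in [-1, 1]

def rerank(candidates, weight=0.3):
    """candidates: [(book, retrieval_score, [reviews])]."""
    def blended(item):
        _, score, reviews = item
        s = sum(sentiment(r) for r in reviews) / max(1, len(reviews))
        return (1 - weight) * score + weight * s
    return sorted(candidates, key=blended, reverse=True)

books = [("A", 0.80, ["boring and dull"]),
         ("B", 0.75, ["a great, moving story", "brilliant"])]
print([title for title, _, _ in rerank(books)])   # ['B', 'A']
```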
APA, Harvard, Vancouver, ISO, and other styles
48

Silva, Thomaz Philippe Cavalcante. "Uma abordagem evolutiva para combinação de fontes de evidência de relevância em máquinas de busca." Universidade Federal do Amazonas, 2008. http://tede.ufam.edu.br/handle/tede/2966.

Full text
Abstract:
Modern search engines use different strategies to improve the quality of their answers. An important strategy is to derive a single ranked list of documents from the lists produced by different sources of evidence. This work studies the use of an evolutionary technique to generate good combination functions for three different sources of evidence: the textual content of the documents, the link structure between the documents of a collection, and the concatenation of the anchor texts pointing to each document. The combination functions found in this study were tested on two separate collections: the first contains queries and documents from a real Web search engine with about 12 million documents, and the second is the LETOR reference collection, created to allow fair comparison between methods for learning ranking functions. The experiments indicate that the approach studied here is a practical and effective alternative for combining different sources of evidence into a single list of answers. We also verified that different query classes require different combination functions and show that our approach is able to identify good functions.
Máquinas de busca modernas utilizam diferentes estratégias para melhorar a qualidade de suas respostas. Uma estratégia importante é obter uma única lista ordenada de documentos baseada em listas produzidas por diferentes fontes de evidência. Este trabalho estuda o uso de uma técnica evolutiva para gerar boas funções de combinação de três diferentes fontes de evidência: o conteúdo textual dos documentos, as estruturas de ligação entre os documentos de uma coleção e a concatenação dos textos de âncora que apontam para cada documento. As funções de combinação descobertas neste trabalho foram testadas em duas coleções distintas: a primeira contém consultas e documentos de uma máquina de busca real da Web que contém cerca de 12 milhões de documentos e a segunda é a coleção de referência LETOR, criada para permitir a justa comparação entre métodos de aprendizagem de funções de ordenação. Os experimentos indicam que a abordagem estudada aqui é uma alternativa prática e efetiva para combinação de diferentes fontes de evidência em uma única lista de respostas. Nós verificamos também que diferentes classes de consultas necessitam de diferentes funções de combinação de fontes de evidência e mostramos que nossa abordagem é viável em identificar boas funções.
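A minimal evolutionary sketch of the idea follows: individuals are weight vectors over the three evidence sources, fitness is precision on training queries, and generations evolve by mutating the best individuals. This simple evolution strategy is a stand-in for the genetic-programming approach actually studied; the toy queries are invented.

```python
# Evolve a linear combination of (text, link, anchor) evidence scores.
import random

def combine(weights, evidence):
    return sum(w * e for w, e in zip(weights, evidence))

def fitness(weights, queries):
    """queries: [(evidence_per_doc, relevance_flags)]; fraction of queries
    whose top-ranked document under these weights is actually relevant."""
    hits = 0
    for evidence, relevant in queries:
        best = max(range(len(evidence)),
                   key=lambda i: combine(weights, evidence[i]))
        hits += relevant[best]
    return hits / len(queries)

def evolve(queries, pop=30, gens=40, sigma=0.1):
    population = [[random.random() for _ in range(3)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda w: -fitness(w, queries))
        elite = population[: pop // 5]          # keep the fittest fifth
        population = elite + [
            [max(0.0, w + random.gauss(0, sigma)) for w in random.choice(elite)]
            for _ in range(pop - len(elite))
        ]
    return max(population, key=lambda w: fitness(w, queries))

# Toy data: per query, (text, link, anchor) evidence scores per document.
queries = [([(0.9, 0.1, 0.2), (0.4, 0.8, 0.9)], [0, 1]),
           ([(0.3, 0.2, 0.9), (0.8, 0.3, 0.1)], [1, 0])]
print(evolve(queries))
```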
APA, Harvard, Vancouver, ISO, and other styles
49

Andrade, Julietti de. "Interoperabilidade e mapeamentos entre sistemas de organização do conhecimento na busca e recuperação de informações em saúde: estudo de caso em ortopedia e traumatologia." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/27/27151/tde-29062015-121813/.

Full text
Abstract:
This research presents the development of a method for search and information retrieval in specialized databases, aiming at the production of scientific knowledge in healthcare, with emphasis on Evidence-Based Health. We used different methodologies according to the specificities of each stage: exploratory research, the hypothetico-deductive method, and a qualitative empirical case study. The work mobilizes the theoretical and methodological foundations of Information Science and Health, applying them to knowledge organization and information retrieval, the Semantic Web, Evidence-Based Health and scientific methodology. Two experiments were performed: a case study in Orthopedics and Traumatology to identify and establish criteria for the search, retrieval, organization and selection of information, so that these criteria can form part of the methodology of scientific work in healthcare; and an analysis of the types of search and retrieval and of the mappings between Knowledge Organization Systems (KOS) offered in the Metathesaurus, within the scope of the Unified Medical Language System (UMLS) of the US National Library of Medicine (NLM), and in the BioPortal of the National Center for Biomedical Ontology, both in the biomedical field. The UMLS provides access to 151 KOS, and BioPortal to a set of 302 ontologies. We present proposals for building search strategies using mapped and interoperable Knowledge Organization Systems, as well as for conducting the literature searches needed to prepare scientific papers in healthcare.
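As an illustrative sketch of one piece of such a method, query expansion through KOS mappings, consider the following (the mapping table is invented for the example; a real strategy would draw on Metathesaurus or BioPortal mappings rather than a hand-written dictionary):

    # Hedged sketch: expanding a search query via mappings between two
    # Knowledge Organization Systems. The vocabulary below is hypothetical.
    KOS_MAPPING = {
        # local thesaurus term -> equivalent terms in a second KOS
        "fratura de fêmur": ["femoral fractures", "fracture of femur"],
        "osteossíntese": ["fracture fixation, internal", "osteosynthesis"],
    }

    def expand_query(terms):
        """Return the original terms plus all mapped equivalents,
        joined as an OR-combined boolean search strategy."""
        expanded = []
        for t in terms:
            expanded.append(t)
            expanded.extend(KOS_MAPPING.get(t.lower(), []))
        return " OR ".join(f'"{t}"' for t in expanded)

    print(expand_query(["fratura de fêmur"]))
    # "fratura de fêmur" OR "femoral fractures" OR "fracture of femur"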
50

Craswell, Nicholas Eric. "Methods for Distributed Information Retrieval." PhD thesis, 2000. http://hdl.handle.net/1885/46255.

Full text
Abstract:
Published methods for distributed information retrieval generally rely on cooperation from search servers. But most real servers, particularly the tens of thousands available on the Web, are not engineered for such cooperation. This means that the majority of methods proposed, and evaluated in simulated environments of homogeneous cooperating servers, are never applied in practice.

This thesis introduces new methods for server selection and results merging. The methods do not require search servers to cooperate, yet are as effective as the best methods which do. Two large experiments evaluate the new methods against many previously published methods. In contrast to previous experiments they simulate a Web-like environment, where servers employ varied retrieval algorithms and tend not to sub-partition documents from a single source. ...
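As a hedged illustration of merging results from uncooperative servers, where scores are not comparable across engines, a rank-based merge is one workable strategy (the reciprocal-rank formula, constant k, and data below are assumptions for the example, not the thesis's specific method):

    # Sketch: merging ranked lists from servers that expose only rankings.
    from collections import defaultdict

    def merge_by_reciprocal_rank(server_results, k=60):
        """server_results: dict server_name -> ranked list of doc ids.
        Scores each doc by the sum of 1/(k + rank) over all servers,
        so documents ranked highly by several servers rise to the top."""
        scores = defaultdict(float)
        for server, ranking in server_results.items():
            for rank, doc in enumerate(ranking, start=1):
                scores[doc] += 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    merged = merge_by_reciprocal_rank({
        "serverA": ["d1", "d3", "d2"],
        "serverB": ["d3", "d4"],
    })
    print(merged)  # d3 comes first: it is ranked highly by both servers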