Dissertations / Theses on the topic 'Web search'

To see the other types of publications on this topic, follow the link: Web search.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Web search.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Bota, Horatiu S. "Composite web search." Thesis, University of Glasgow, 2018. http://theses.gla.ac.uk/38925/.

Abstract:
The figure above shows Google’s results page for the query “taylor swift”, captured in March 2016. Assembled around the long-established list of search results is content extracted from various sources — news items and tweets merged within the results ranking, images, songs and social media profiles displayed to the right of the ranking, in an interface element that is known as an entity card. Indeed, the entire page seems more like an assembly of content extracted from various sources, rather than just a ranked list of blue links. Search engine result pages have become increasingly diverse over the past few years, with most commercial web search providers responding to user queries with different types of results, merged within a unified page. The primary reason for this diversity on the results page is that the web itself has become more diverse, given the ease with which creating and hosting different types of content on the web is possible today. This thesis investigates the aggregation of web search results retrieved from various document sources (e.g., images, tweets, Wiki pages) within information “objects” to be integrated in the results page assembled in response to user queries. We use the terms “composite objects” or “composite results” to refer to such objects, and throughout this thesis use the terminology of Composite Web Search (e.g., result composition) to distinguish our approach from other methods of aggregating diverse content within a unified results page (e.g., Aggregated Search). In our definition, the aspects that differentiate composite information objects from aggregated search blocks are that composite objects (i) contain results from multiple sources of information, (ii) are specific to a common topic or facet of a topic rather than a grouping of results of the same type, and (iii) are not a uniform ranking of results ordered only by their topical relevance to a query. The most widely used type of composite result in web search today is the entity card. Entity cards have become extremely popular over the past few years, with some informal studies suggesting that entity cards are now shown on the majority of result pages generated by Google. As composite results are used more and more by commercial search engines to address information needs directly on the results page, understanding the properties of such objects and their influence on searchers is an essential aspect of modern web search science. The work presented throughout this thesis attempts the task of studying composite objects by exploring users’ perspectives on accessing and aggregating diverse content manually, by analysing the effect composite objects have on search behaviour and perceived workload, and by investigating different approaches to constructing such objects from diverse results. Overall, our experimental findings suggest that items which play a central role within composite objects are decisive in determining their usefulness, and that the overall properties of composite objects (i.e., relevance, diversity and coherence) play a combined role in mediating object usefulness.
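To make the three-part definition above concrete, here is a minimal sketch of how a composite result might be modelled as a data structure; all names are illustrative and not taken from the thesis:

```python
from dataclasses import dataclass, field

@dataclass
class ResultItem:
    source: str        # e.g. "news", "image", "tweet"
    url: str
    relevance: float   # topical relevance to the query

@dataclass
class CompositeResult:
    topic: str                # common topic or facet, property (ii)
    central_item: ResultItem  # the item found to mediate usefulness
    items: list[ResultItem] = field(default_factory=list)

    def sources(self) -> set[str]:
        # Property (i): a composite object spans multiple sources.
        return {item.source for item in self.items}
```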
2

Sawant, Anup Satish. "Semantic web search." Connect to this title online, 2009. http://etd.lib.clemson.edu/documents/1263410119/.

3

Williamson, Victor Lamont. "Goal-oriented Web search." Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/61247.

Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 57-58).
We have designed and implemented a Goal-oriented Web application to search videos, images, and news by querying the YouTube, Truveo, Google and Yahoo search services. The Planner module decomposes functionality into Goals and Techniques. Goals declare searches for specific types of content, and Techniques query the various Web services. We choose the Web service with the best rating at runtime and return the winning results. Users weight their preferred Web services and declare a repository of their own Techniques to upload and execute.
by Victor Lamont Williamson.
M.Eng.
4

Selberg, Erik Warren. "Towards comprehensive Web search." Thesis, Connect to this title online; UW restricted, 1999. http://hdl.handle.net/1773/6873.

5

Shen, Yipeng. "Meta-search and distributed search systems." View Abstract or Full-Text, 2002. http://library.ust.hk/cgi/db/thesis.pl?COMP%202002%20SHEN.

Abstract:
Thesis (Ph. D.)--Hong Kong University of Science and Technology, 2002.
Includes bibliographical references (leaves 138-144). Also available in electronic version. Access restricted to campus users.
6

Tadros, Rimon. "Accelerating web search using GPUs." Thesis, University of British Columbia, 2015. http://hdl.handle.net/2429/54722.

Abstract:
The amount of content on the Internet is growing rapidly, as is the number of online Internet users. As a consequence, web search engines need to continually increase their computing capabilities and data while maintaining low search latency and avoiding a significant rise in the cost per query. To serve this larger number of online users, web search engines utilize a large distributed system in their data centers. They partition their data across several hundreds of thousands of independent commodity servers called Index Serving Nodes (ISNs). These ISNs work together to serve search queries as a single coherent system in a distributed manner. The choice of a high number of commodity servers vs. a smaller number of supercomputers is due to the need for scalability, high availability/reliability, performance, and cost efficiency. To serve larger data, web search engines can be scaled either vertically or horizontally. Vertical scaling enables ranking more documents per query within a single node by employing machines with higher single-thread and throughput performance and with bigger and faster memory. Horizontal scaling supports a larger index by adding more index serving nodes, at the cost of increased synchronization, aggregation overhead, and power consumption. This thesis evaluates the potential for achieving better vertical scaling by using graphics processing units (GPUs) to reduce the document ranking latency per query at a reasonable initial cost increase. It achieves this by exploiting the parallelism in ranking the numerous potential documents that match a query, offloading this work to the GPU. We evaluate this approach using hundreds of rankers from a commercial web search engine on real production data. Our results show an 8.8x harmonic-mean reduction in latency and a 2x gain in power efficiency when ranking 10,000 web documents per query for a variety of rankers, using C++ AMP and a commodity GPU.
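The approach rests on the observation that scoring the candidate documents for a query is data-parallel. A minimal sketch of that idea, using NumPy on the CPU as a stand-in for a GPU kernel (the thesis itself used C++ AMP, and the linear ranker below is a placeholder, not one of the production rankers):

```python
import numpy as np

def rank_documents(features, weights, k=10):
    """Score every candidate document for a query in one data-parallel pass.

    features: (n_docs, n_features) matrix of per-document ranking features.
    weights:  (n_features,) vector of a hypothetical linear ranker.
    Returns the indices of the top-k documents by score.
    """
    scores = features @ weights        # one score per document, computed in parallel
    return np.argsort(-scores)[:k]     # keep only the k best for aggregation

# Example: 10,000 matching documents, 50 ranking features per document.
rng = np.random.default_rng(0)
features = rng.random((10_000, 50))
weights = rng.random(50)
print(rank_documents(features, weights))
```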
Faculty of Applied Science
Department of Electrical and Computer Engineering
Graduate
7

Santos, Rodrygo Luis Teodoro. "Explicit web search result diversification." Thesis, University of Glasgow, 2013. http://theses.gla.ac.uk/4106/.

Abstract:
Queries submitted to a web search engine are typically short and often ambiguous. With the enormous size of the Web, a misunderstanding of the information need underlying an ambiguous query can misguide the search engine, ultimately leading the user to abandon the originally submitted query. In order to overcome this problem, a sensible approach is to diversify the documents retrieved for the user's query. As a result, the likelihood that at least one of these documents will satisfy the user's actual information need is increased. In this thesis, we argue that an ambiguous query should be seen as representing not one, but multiple information needs. Based upon this premise, we propose xQuAD (Explicit Query Aspect Diversification), a novel probabilistic framework for search result diversification. In particular, the xQuAD framework naturally models several dimensions of the search result diversification problem in a principled yet practical manner. To this end, the framework represents the possible information needs underlying a query as a set of keyword-based sub-queries. Moreover, xQuAD accounts for the overall coverage of each retrieved document with respect to the identified sub-queries, so as to rank highly diverse documents first. In addition, it accounts for how well each sub-query is covered by the other retrieved documents, so as to promote novelty, and hence penalise redundancy, in the ranking. The framework also models the importance of each of the identified sub-queries, so as to appropriately cater for the interests of the user population when diversifying the retrieved documents. Finally, since not all queries are equally ambiguous, the xQuAD framework caters for the ambiguity level of different queries, so as to appropriately trade off relevance for diversity on a per-query basis. The xQuAD framework is general and can be used to instantiate several diversification models, including the most prominent models described in the literature. In particular, within xQuAD, each of the aforementioned dimensions of the search result diversification problem can be tackled in a variety of ways. In this thesis, as additional contributions besides the xQuAD framework, we introduce novel machine learning approaches for addressing each of these dimensions. These include a learning to rank approach for identifying effective sub-queries as query suggestions mined from a query log, an intent-aware approach for choosing the ranking models most likely to be effective for estimating the coverage and novelty of multiple documents with respect to a sub-query, and a selective approach for automatically predicting how much to diversify the documents retrieved for each individual query. In addition, we perform the first empirical analysis of the role of novelty as a diversification strategy for web search. As demonstrated throughout this thesis, the principles underlying the xQuAD framework are general, sound, and effective. In particular, to validate the contributions of this thesis, we thoroughly assess the effectiveness of xQuAD under the standard experimentation paradigm provided by the diversity task of the TREC 2009, 2010, and 2011 Web tracks. The results of this investigation demonstrate the effectiveness of our proposed framework.
Indeed, xQuAD attains consistent and significant improvements in comparison to the most effective diversification approaches in the literature, and across a range of experimental conditions, comprising multiple input rankings, multiple sub-query generation and coverage estimation mechanisms, as well as queries with multiple levels of ambiguity. Altogether, these results corroborate the state-of-the-art diversification performance of xQuAD.
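For reference, the greedy selection criterion at the heart of xQuAD, as it is commonly presented in the literature, chooses the next document d from the remaining ranking R given the already-selected set S, the identified sub-queries S_q, and a per-query relevance/diversity trade-off λ:

```latex
d^{*} \;=\; \underset{d \,\in\, R \setminus S}{\arg\max}\;
  (1-\lambda)\, P(d \mid q)
  \;+\; \lambda \sum_{s \in S_q} P(s \mid q)\, P(d \mid s)
  \prod_{d' \in S} \bigl(1 - P(d' \mid s)\bigr)
```

The P(d|s) factor captures coverage, the product term captures novelty with respect to documents already selected, P(s|q) captures sub-query importance, and λ captures query ambiguity, matching the four dimensions described in the abstract.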
8

Bian, Jiang. "Contextualized web search: query-dependent ranking and social media search." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/37246.

Abstract:
Due to the information explosion on the Internet, effective information search techniques are required to retrieve the desired information from the Web. Based on extensive analysis of users' search intentions and the varied forms of Web content, we find that both the query and the indexed web content are often associated with various context information, which can provide much essential information to indicate ranking relevance in Web search. This dissertation seeks to develop new search algorithms and techniques that take advantage of rich context information to improve search quality, and consists of two major parts. In the first part, we study the context of the query in terms of the various ranking objectives of different queries. In order to improve ranking relevance, we propose to incorporate such query context information into the ranking model. Two general approaches are introduced in this dissertation. The first incorporates query difference into ranking by introducing query-dependent loss functions, by optimizing which we can obtain a better ranking model. We then investigate another approach, which applies a divide-and-conquer framework for ranking specialization. The second part of this dissertation investigates how to extract the context of specific Web content and exploit it to build a more effective search system. This study is based on newly emerging social media content. Unlike traditional Web content, social media content is inherently associated with much new context information, including content semantics and quality, user reputation, and user interactions, all of which provide useful information for acquiring knowledge from social media. In this dissertation, we seek to develop algorithms and techniques for effective knowledge acquisition from collaborative social media environments by using this dynamic context information. We first propose a new general framework for searching social media content, which integrates both content features and user interactions. Then, a semi-supervised framework is proposed to explicitly compute content quality and user reputation, which are incorporated into the search framework to improve search quality. Furthermore, this dissertation also investigates techniques for extracting the structured semantics of social media content as new context information, which is essential for content retrieval and organization.
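As a toy illustration of the query-dependent loss idea (the query categories and loss definitions below are hypothetical, not the dissertation's actual formulation): a navigational query is penalised only on its top result, while an informational query is penalised over the whole top k:

```python
def query_dependent_loss(query_type, relevance, scores, k=10):
    """Toy query-dependent loss: which positions matter depends on the query."""
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    if query_type == "navigational":
        # Only the top result matters: full loss if it is not relevant.
        return 0.0 if relevance[ranked[0]] else 1.0
    # Informational: fraction of relevant documents missing from the top k.
    hits = sum(relevance[i] for i in ranked[:k])
    total = min(sum(relevance), k)
    return 1.0 - hits / total if total else 0.0

# Example with hypothetical relevance labels and ranker scores.
print(query_dependent_loss("navigational", [0, 1, 0], [0.9, 0.5, 0.1]))  # 1.0
```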
9

Sowmya, Mathukumalli. "Job search portal." Kansas State University, 2016. http://hdl.handle.net/2097/34518.

Abstract:
Master of Science
Department of Computer Science
Mitchell L. Neilsen
Finding jobs that best suit one's interests and skill set is quite a challenging task for job seekers. The difficulties arise from not having proper knowledge of an organization's objectives, its work culture and its current job openings. In addition, finding the right candidates with the desired qualifications to fill current job openings is an important task for the recruiters of any organization. Online job search portals have certainly made job seeking convenient on both sides. A job portal is a solution where the recruiter and the job seeker meet, each aiming at fulfilling their individual requirements. They are the cheapest as well as the fastest means of communication, reaching a wide audience with a single click irrespective of geographical distance. The web application "Job Search Portal" provides an easy and convenient search application for job seekers to find their desired jobs and for recruiters to find the right candidates. Job seekers from any background can search current job openings. Job seekers can register with the application and update their details and skill set. They can search available jobs and apply to their desired positions. Android, being open source, has already made its mark in mobile application development. To make things handy, the user functionalities are developed as an Android application. Employers can register with the application and post their current openings. They can view the job applicants and screen them for the best fit. Users can provide a review of an organization and share their interview experience, which can be viewed by employers.
10

Johansson, Dennis. "Search Engine Optimization and the Long Tail of Web Search." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-296388.

Abstract:
In the subject of search engine optimization, many methods exist and many aspects are important to keep in mind. This thesis studies the relation between keywords and website ranking in Google Search, and how one can create the biggest positive impact. Keywords with smaller search volume are called "long tail" keywords, and they bear the potential to expand the visibility of a website to a larger crowd by increasing its rank for the large fraction of keywords that might not be common on their own, but together make up a large share of total web searches. This thesis analyzes where on a web page these keywords should be placed, and a case study is performed in which the goal is to increase the rank of a website with knowledge from previous tests in mind.
11

Reimers, Axel, and Isak Gustafsson. "Indexing and Search Algorithms for Web shops." Thesis, KTH, Data- och elektroteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-193373.

Abstract:
Web shops today need to be more and more responsive, and one part of this responsiveness is fast product searches. One way of getting faster searches is by searching against an index instead of directly against a database. Network Expertise Sweden AB (Net Exp) wants to explore different methods of implementing an index in their future web shop, building upon the open-source web shop platform SmartStore.NET. Since SmartStore.NET does all of its searches directly against its database, it will not scale well and will wear more on the database. The aim was therefore to find different solutions to offload the database by using an index instead. A prototype that retrieved products from a database and made them searchable through an index was developed, evaluated and implemented. The prototype indexed the data with an inverted index algorithm, and was made searchable with a search algorithm that mixed boolean queries with normal queries.
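A minimal sketch of the two ingredients the prototype combines — an inverted index built from the product table, and a search that intersects posting sets for boolean AND queries or unions them for normal any-term queries. All names are illustrative, not SmartStore.NET code:

```python
from collections import defaultdict

def build_index(products):
    """Map each term to the set of product ids containing it (inverted index)."""
    index = defaultdict(set)
    for pid, text in products.items():
        for term in text.lower().split():
            index[term].add(pid)
    return index

def search(index, query, boolean_and=True):
    """AND-style boolean search intersects postings; normal search unions them."""
    postings = [index.get(t, set()) for t in query.lower().split()]
    if not postings:
        return set()
    combine = set.intersection if boolean_and else set.union
    return combine(*postings)

products = {1: "red running shoe", 2: "blue running jacket", 3: "red jacket"}
index = build_index(products)
print(search(index, "red jacket"))                     # {3}
print(search(index, "red jacket", boolean_and=False))  # {1, 2, 3}
```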
12

Wong, Brian Wai Fung. "Deep-web search engine ranking algorithms." Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/61246.

Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 79-80).
The deep web refers to content that is hidden behind HTML forms. The deep web contains a large collection of data that are unreachable by link-based search engines. A study conducted at the University of California, Berkeley estimated that the deep web consists of around 91,000 terabytes of data, whereas the surface web is only about 167 terabytes. To access this content, one must submit valid input values to the HTML form. Several researchers have studied methods for crawling deep web content. One of the most promising methods uses unique wrappers for HTML forms. User inputs are first filtered through the wrappers before being submitted to the forms. However, this method requires a new algorithm for ranking search results generated by the wrappers. In this paper, I explore methods for ranking search results returned from a wrapper-based deep web search engine.
by Brian Wai Fung Wong.
M.Eng.
13

Costa, Miguel. "SIDRA: a Flexible Web Search System." Master's thesis, Department of Informatics, University of Lisbon, 2004. http://hdl.handle.net/10451/13914.

Abstract:
Sidra is a new indexing, searching and ranking system for Web contents. It has a flexible, parallel, distributed and scalable architecture. Sidra maintains several data structures that provide multiple access methods to different data dimensions, giving it the capability to select results reflecting search contexts. Its design addresses the current challenges of Web search engines: high performance, short searching and indexing times, good quality of results, scalability and high service availability.
14

Zhao, Hongkun. "Automatic wrapper generation for the extraction of search result records from search engines." Diss., Online access via UMI, 2007.

15

Ogbonna, Antoine I. "The Psychology of a Web Search Engine." Youngstown State University / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1328897147.

16

Elbassuoni, Shady. "Adaptive personalization of web search: task sensitive approach to search personalization." Saarbrücken: VDM Verlag Dr. Müller, 2008. http://d-nb.info/988664186/04.

17

Hicks, Janette M. "Search algorithms for discovery of Web services." Diss., Online access via UMI, 2005. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:1425747.

18

Zhang, Lu, and Bernard J. Jansen. "A branding model for web search engines." [University Park, Pa.]: Pennsylvania State University, 2009. http://etda.libraries.psu.edu/theses/approved/WorldWideIndex/ETD-3996/index.html.

19

Erola, Cañellas Arnau. "Contributions to privacy in web search engines." Doctoral thesis, Universitat Rovira i Virgili, 2013. http://hdl.handle.net/10803/130934.

Abstract:
Web search engines collect and store information about their users in order to tailor their services better to their users' needs. Nevertheless, while receiving personalized attention, the users lose control over their own data. Search logs can disclose sensitive information and the identities of the users, creating risks of privacy breaches. In this thesis we discuss the problem of limiting the disclosure risks while minimizing the information loss. The first part of this thesis focuses on methods to prevent the gathering of information by WSEs. Since search logs are needed in order to receive an accurate service, the aim is to provide logs that are still suitable to provide personalization. We propose a protocol which uses a social network to obfuscate users' profiles. The second part deals with the dissemination of search logs. We propose microaggregation techniques which allow the publication of search logs, providing k-anonymity while minimizing the information loss.
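As a toy illustration of the microaggregation idea (not the algorithm proposed in the thesis): profiles are grouped into blocks of at least k and each value is replaced by its block centroid, so every published value is shared by at least k users:

```python
import numpy as np

def microaggregate(profiles, k=3):
    """Toy k-anonymous microaggregation: sort 1-D profiles, group them in
    blocks of k, and replace every value in a block by the block centroid."""
    order = np.argsort(profiles)
    anonymised = np.empty_like(profiles, dtype=float)
    for start in range(0, len(order), k):
        block = order[start:start + k]
        if len(block) < k and start > 0:   # merge a short tail into the last block
            block = order[start - k:]
        anonymised[block] = profiles[block].mean()
    return anonymised

profiles = np.array([1.0, 1.2, 5.0, 5.1, 5.3, 9.9, 10.0])
print(microaggregate(profiles, k=3))
```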
20

Wu, Le-Shin. "Adaptive peer networks for distributed Web search." [Bloomington, Ind.] : Indiana University, 2009. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3380141.

Abstract:
Thesis (Ph.D.)--Indiana University, Dept. of Computer Science, 2009.
Title from PDF t.p. (viewed on Jul 20, 2010). Source: Dissertation Abstracts International, Volume: 70-12, Section: B, page: 7684. Adviser: Filippo Menczer.
21

Borch, Hans Olaf. "On-Line Clustering of Web Search Results." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2006. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-10125.

Abstract:

Clustering in a data mining setting has been researched for decades. Lately, document clustering applied to web search engine results has received much attention. Large companies such as Google and Microsoft have shown their interest, and we have seen the emergence of commercial clustering engines such as Vivisimo. This thesis shows how a search engine with clustering capabilities can be developed. The approach described has been implemented as a working prototype that allows searching and browsing clusters through a web interface. The prototype has been evaluated in a user survey and through informal testing.

22

Ziembicki, Joanna. "Distributed Search in Semantic Web Service Discovery." Thesis, University of Waterloo, 2006. http://hdl.handle.net/10012/1103.

Abstract:
This thesis presents a framework for semantic Web Service discovery using descriptive (non-functional) service characteristics in a large-scale, multi-domain setting. The framework uses the Web Ontology Language for Services (OWL-S) to design a template for describing non-functional service parameters in a way that facilitates service discovery, and presents a layered scheme for organizing ontologies used in service description. This service description scheme serves as a core for designing the four main functions of a service directory: a template-based user interface, semantic query expansion algorithms, a two-level indexing scheme that combines Bloom filters with a Distributed Hash Table, and a distributed approach for storing service descriptions. The service directory is, in turn, implemented as an extension of the Open Service Discovery Architecture.

The search algorithms presented in this thesis are designed to maximize precision and completeness of service discovery, while the distributed design of the directory allows individual administrative domains to retain a high degree of independence and maintain access control to information about their services.
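A minimal sketch of the first index level described above — a Bloom filter that cheaply tests whether a node might hold matching service descriptions before the Distributed Hash Table is consulted (sizes and hash choices are illustrative, not the thesis's parameters):

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: set membership with no false negatives and tunable
    false positives; usable to prune nodes before a DHT lookup."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all(self.bits & (1 << pos) for pos in self._positions(item))

node_index = BloomFilter()
node_index.add("weather-service")
print(node_index.might_contain("weather-service"))  # True
print(node_index.might_contain("translation"))      # False (almost surely)
```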
23

Oyama, Satoshi. "Query Refinement for Domain-Specific Web Search." 京都大学 (Kyoto University), 2002. http://hdl.handle.net/2433/149746.

24

Nguyen, Qui V. "Enhancing a Web Crawler with Arabic Search." Thesis, Monterey, California: Naval Postgraduate School, 2012.

Abstract:
Many advantages of the Internet (ease of access, limited regulation, vast potential audience, and fast flow of information) have turned it into the most popular way to communicate and exchange ideas. Criminal and terrorist groups also use these advantages to turn the Internet into their new play/battle fields to conduct their illegal/terror activities. There are millions of Web sites in different languages on the Internet, but the lack of foreign language search engines makes it impossible to analyze foreign language Web sites efficiently. This thesis will enhance an open source Web crawler with Arabic search capability, thus improving an existing social networking tool to perform page correlation and analysis of Arabic Web sites. A social networking tool with Arabic search capabilities could become a valuable tool for the intelligence community. Its page correlation and analysis results could be used to collect open source intelligence and build a network of Web sites that are related to terrorist or criminal activities.
25

Tanudjaja, Francisco 1978. "Using web graph structures to personalize search." Thesis, Massachusetts Institute of Technology, 2001. http://hdl.handle.net/1721.1/86737.

Abstract:
Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001.
Includes bibliographical references (p. 93-97).
by Francisco Tanudjaja.
M.Eng.
26

Tifrea-Marciuska, Oana. "Personalised search for the Social Semantic Web." Thesis, University of Oxford, 2016. https://ora.ox.ac.uk/objects/uuid:27bda5a8-2360-46ad-bcef-e72ae1ae6f52.

Abstract:
Recently, the Web has been changing more and more into what is called the Social Semantic Web. As a consequence, the ranking of search results no longer depends solely on the structure of the interconnections among Web pages. In my research, I argue that such ranking can be based on user preferences from the Social Web and on ontological background knowledge from the Semantic Web. Therefore, I combine preference representation languages with Semantic Web technologies. There is related research in the database community on integrating preferences into database queries. However, one cannot directly use the ideas from databases, as we additionally have ontological knowledge, which may introduce unknown values, so-called nulls. Therefore, I need to define the exact semantics and check their feasibility for this context. In my thesis, as a first step towards closing the gap between the Semantic Web, databases, and preferences, I introduce families of expressive extensions of Datalog± with preferences as new paradigms for query answering over ontologies. I first define the syntax and semantics of the proposed frameworks, then propose top-k query answering algorithms under user preferences in semantic data for different types of queries and preference models. Each of the proposed frameworks comes with advantages and disadvantages; therefore, I provide formal properties of my algorithms and empirical experiments on the performance and quality of my results. Furthermore, I explore the combination of my framework with uncertainty and its generalisation to the preferences of a group of users, where I analyse properties of my algorithms related to social choice theory.
27

Zhou, Ke. "On the evaluation of aggregated web search." Thesis, University of Glasgow, 2014. http://theses.gla.ac.uk/7104/.

Abstract:
Aggregating search results from a variety of heterogeneous sources, or so-called verticals, such as news, image and video, into a single interface is a popular paradigm in web search. This search paradigm is commonly referred to as aggregated search. The heterogeneity of the information, the richer user interaction, and the more complex presentation strategy make the evaluation of the aggregated search paradigm quite challenging. The Cranfield paradigm, the use of test collections and evaluation measures to assess the effectiveness of information retrieval (IR) systems, is the de facto standard evaluation strategy in the IR research community, and it has its origins in work dating to the early 1960s. This thesis focuses on applying this evaluation paradigm to the context of aggregated web search, contributing to the long-term goal of a complete, reproducible and reliable evaluation methodology for aggregated search in the research community. The Cranfield paradigm for aggregated search consists of building a test collection and developing a set of evaluation metrics. In the context of aggregated search, a test collection should contain results from a set of verticals, some information needs relating to this task and a set of relevance assessments. The metrics proposed should utilize the information in the test collection in order to measure the performance of any aggregated search pages. The more complex user behavior of aggregated search should be reflected in the test collection through assessments and modeled in the metrics. Therefore, firstly, we aim to better understand the factors involved in determining relevance for aggregated search and subsequently build a reliable and reusable test collection for this task. By conducting several user studies to assess vertical relevance and creating a test collection by reusing existing test collections, we create a testbed with both vertical-level (user orientation) and document-level relevance assessments. In addition, we analyze the relationship between both types of assessments and find that they are correlated in terms of measuring the system performance for the user. Secondly, by utilizing the created test collection, we aim to investigate how to model the aggregated search user in a principled way in order to propose reliable, intuitive and trustworthy evaluation metrics to measure the user experience. We start our investigations by evaluating solely one key component of aggregated search: vertical selection, i.e. selecting the relevant verticals. Then we propose a general utility-effort framework to evaluate the ultimate aggregated search pages. We demonstrate the fidelity (predictive power) of the proposed metrics by correlating them to the user preferences of aggregated search pages. Furthermore, we meta-evaluate the reliability and intuitiveness of a variety of metrics and show that our proposed aggregated search metrics are the most reliable and intuitive, compared to adapted diversity-based and traditional IR metrics. To summarize, in this thesis we demonstrate the feasibility of applying the Cranfield paradigm to aggregated search for reproducible, cheap, reliable and trustworthy evaluation.
28

Lewandowski, Dirk. "Web Searching, Search Engines and Information Retrieval." IOS Press, 2005. http://hdl.handle.net/10150/106395.

Abstract:
This article discusses Web search engines, mainly the challenges in indexing the World Wide Web, user behaviour, and the ranking factors used by these engines. Ranking factors are divided into query-dependent and query-independent factors, the latter of which have become more and more important in recent years. These factors have limited possibilities, however, particularly those based on the widely used link popularity measures. The article concludes with an overview of factors that should be considered to determine the quality of Web search engines.
29

Petit, Albin. "Introducing privacy in current web search engines." Thesis, Lyon, 2017. http://www.theses.fr/2017LYSEI016/document.

Abstract:
During the last few years, technological progress in collecting, storing and processing large quantities of data for a reasonable cost has raised serious privacy issues. Privacy concerns many areas, but is especially important in frequently used services like search engines (e.g., Google, Bing, Yahoo!). These services allow users to retrieve relevant content on the Internet by exploiting their personal data. In this context, developing solutions to enable users to use these services in a privacy-preserving way is becoming increasingly important. In this thesis, we introduce SimAttack, an attack against existing protection mechanisms for querying search engines in a privacy-preserving way. This attack aims at retrieving the original user query. We show with this attack that three representative state-of-the-art solutions do not protect user privacy in a satisfactory manner. We therefore develop PEAS, a new protection mechanism that better protects user privacy. This solution leverages two types of protection: hiding the user identity (with a succession of two nodes) and masking users' queries (by combining them with several fake queries). To generate realistic fake queries, PEAS exploits previous queries sent by the users in the system. Finally, we present mechanisms to identify sensitive queries. Our goal is to adapt existing protection mechanisms to protect sensitive queries only, and thus save user resources (e.g., CPU, RAM). We design two modules to identify sensitive queries. By deploying these modules on real protection mechanisms, we establish empirically that they dramatically improve the performance of the protection mechanisms.
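As a rough sketch of the query-masking side of PEAS (the real protocol's fake-query generation and the two-node identity protection are considerably more involved), a real query can be hidden in a shuffled batch of fake queries drawn from past user queries:

```python
import random

def obfuscate(user_query, past_queries, n_fakes=3, seed=None):
    """Mask a real query by mixing it with fake queries drawn from the
    history of queries previously sent through the system."""
    rng = random.Random(seed)
    fakes = rng.sample(past_queries, n_fakes)
    batch = fakes + [user_query]
    rng.shuffle(batch)          # the search engine cannot tell which is real
    return batch

history = ["cheap flights", "python tutorial", "weather paris", "llm benchmarks"]
print(obfuscate("symptoms of flu", history, n_fakes=2, seed=42))
```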
30

Umemoto, Kazutoshi. "A Study on Fine-Grained User Behavior Analysis in Web Search." 京都大学 (Kyoto University), 2016. http://hdl.handle.net/2433/215679.

31

Ali, Halil. "Effective web crawlers." RMIT University. CS&IT, 2008. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20081127.164414.

Abstract:
Web crawlers are the component of a search engine that must traverse the Web, gathering documents in a local repository for indexing by a search engine so that they can be ranked by their relevance to user queries. Whenever data is replicated in an autonomously updated environment, there are issues with maintaining up-to-date copies of documents. When documents are retrieved by a crawler and have subsequently been altered on the Web, the effect is an inconsistency in user search results. While the impact depends on the type and volume of change, many existing algorithms do not take the degree of change into consideration, instead using simple measures that consider any change as significant. Furthermore, many crawler evaluation metrics do not consider index freshness or the amount of impact that crawling algorithms have on user results. Most of the existing work makes assumptions about the change rate of documents on the Web, or relies on the availability of a long history of change. Our work investigates approaches to improving index consistency: detecting meaningful change, measuring the impact of a crawl on collection freshness from a user perspective, developing a framework for evaluating crawler performance, determining the effectiveness of stateless crawl ordering schemes, and proposing and evaluating the effectiveness of a dynamic crawl approach. Our work is concerned specifically with cases where there are few or no past change statistics with which predictions can be made. Our work analyses different measures of change and introduces a novel approach to measuring the impact of recrawl schemes on search engine users. Our schemes detect important changes that affect user results. Other well-known and widely used schemes have to retrieve around twice the data to achieve the same effectiveness as our schemes. Furthermore, while many studies have assumed that the Web changes according to a model, our experimental results are based on real web documents. We analyse various stateless crawl ordering schemes that have no past change statistics with which to predict which documents will change, none of which, to our knowledge, has been tested to determine its effectiveness in crawling changed documents. We empirically show that the effectiveness of these schemes depends on the topology and dynamics of the domain crawled, and that no single static crawl ordering scheme can effectively maintain freshness, motivating our work on dynamic approaches. We present our novel approach to maintaining freshness, which uses the anchor text linking documents to determine the likelihood of a document changing, based on statistics gathered during the current crawl. We show that this scheme is highly effective when combined with existing stateless schemes. When we combine our scheme with PageRank, our approach allows the crawler to improve both the freshness and quality of a collection. Our scheme improves freshness regardless of which stateless scheme it is used in conjunction with, since it uses both positive and negative reinforcement to determine which document to retrieve. Finally, we present the design and implementation of Lara, our own distributed crawler, which we used to develop our testbed.
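A simplified sketch of the dynamic approach described above: the crawl frontier is kept ordered by an estimated change likelihood, which is updated from anchor-text evidence gathered during the current crawl. The scoring rule here is invented for illustration, not the thesis's actual model:

```python
import heapq

def crawl(frontier_scores, fetch, budget):
    """Fetch documents in order of estimated change likelihood.
    frontier_scores: url -> initial change-likelihood estimate (higher first).
    fetch(url) -> (changed, [(outlink, anchor_suggests_fresh_content)])."""
    heap = [(-score, url) for url, score in frontier_scores.items()]
    heapq.heapify(heap)
    fetched, changed_docs = set(), set()
    while heap and budget > 0:
        _, url = heapq.heappop(heap)
        if url in fetched:
            continue
        fetched.add(url)
        budget -= 1
        changed, hints = fetch(url)
        if changed:
            changed_docs.add(url)
        for link, fresh_hint in hints:
            if link not in fetched:
                # Invented rule: anchor text hinting at fresh content raises priority.
                heapq.heappush(heap, (-0.9 if fresh_hint else -0.1, link))
    return fetched, changed_docs

pages = {"a": (True, [("b", True)]), "b": (False, [])}
print(crawl({"a": 1.0}, lambda u: pages[u], budget=2))
```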
32

Kalinov, Pavel. "Intelligent Web Exploration." Thesis, Griffith University, 2012. http://hdl.handle.net/10072/365635.

Abstract:
The hyperlinked part of the internet known as "the Web" arose without much planning for a future of millions of publishers and countless pieces of online content. It has no in-built mechanism to find anything, so tools external to it were introduced: initially web directories and then search engines. Search engines are based on machine learning and have been extremely successful. However, they have some inherent limitations and cannot, by design, address some needs: they serve the "information locating" need only and not "information discovery". Search engine users have learned to accept them and in many cases do not realise how their search has been limited by shortcomings of the model. Before the advent of the search engine, web directories were the only information-finding tool on the web. They were manually built and could not compete economically with the efficiency of search engines. This led to their virtual extinction, with the effect that the "information discovery" need of users is no longer served by any major information provider. Furthermore, none of the dominant information-finding models account for the person of the user in any meaningful way controllable by (or even visible to) the user. This work proposes a method to combine a search engine, a web directory and a personal information management agent into an intelligent Web Exploration Engine in a way which bridges the gaps between these seemingly unrelated tools. Our hybrid, for which we have developed a proof-of-concept prototype [Kalinov et al., 2010b], allows users both to locate specific data and to discover new information. Information discovery is served by a web directory which is built with the assistance of a dynamic hierarchical classifier we developed [Kalinov et al., 2010a]. The category structure achieved by it is also the basis of a large number of nested search engines, allowing information locating both in general (similar to a "standard" search engine) and in a variety of contexts selectable by the user.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Information and Communication Technology
Science, Environment, Engineering and Technology
33

Gopinathan-Leela, Ligon, and n/a. "Personalisation of web information search: an agent based approach." University of Canberra. Information Sciences & Engineering, 2005. http://erl.canberra.edu.au./public/adt-AUC20060728.120849.

Abstract:
The main purpose of this research is to find an effective way to personalise information searching on the Internet using middleware search agents, namely Personalised Search Agents (PSA). The PSA acts between users and search engines, and applies new and existing techniques to mine and exploit relevant and personalised information for users. Much research has already been done in developing personalising filters: middleware techniques which act between the user and search engines to deliver more personalised results. These personalising filters apply one or more of the popular techniques for search result personalisation, such as the category concept, learning from user actions and using metasearch engines. By developing the PSA, these techniques have been investigated and incorporated to create an effective middleware agent for web search personalisation. In this thesis, a conceptual model for the Personalised Search Agent is developed, implemented as a prototype and benchmarked against existing web search practices. A system development methodology with flexible and iterative procedures that switch between conceptual design and prototype development was adopted as the research methodology. In the conceptual model of the PSA, a multi-layer client server architecture is used, applying generalisation-specialisation features. The client and the server are structurally the same, but differ in the level of generalisation and interface. The client handles personalising information regarding one user, whereas the server effectively combines the personalising information of all the clients (i.e. its users) to generate a global profile. Both client and server apply the category concept, where user-selected URLs are mapped against categories. The PSA learns the user-relevant URLs both by requesting explicit feedback and by implicitly capturing user actions (for instance the active time spent by the user on a URL). The PSA also employs a keyword-generating algorithm, and tries different combinations of words in a user search string by effectively combining them with the relevant category values. The core functionalities of the conceptual model for the PSA were implemented in a prototype, used to test the ideas in the real world. The results were benchmarked against the results from existing search engines to determine the efficiency of the PSA over conventional searching. A comparison of the test results revealed that the PSA is more effective and efficient in finding relevant and personalised results for individual users, and possesses a unique user sense rather than the general user sense of traditional search engines. The PSA is a novel architecture and contributes to the knowledge domain of web information searching by delivering new ideas such as active-time-based user relevancy calculations, automatic generation of sensible search keyword combinations, and the implementation of a multi-layer agent architecture. Moreover, the PSA has high potential for future extensions. Because it captures highly personalised data, data mining techniques which employ case-based reasoning could make the PSA a more responsive, more accurate and more effective tool for personalised information searching.
34

Lee, Ryong. "KyotoSearch: An integrated system for geographic web search using web contents analysis." 京都大学 (Kyoto University), 2003. http://hdl.handle.net/2433/148500.

35

Zhu, Jianhan. "Mining web site link structures for adaptive web site navigation and search." Thesis, University of Ulster, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.515890.

36

Zhu, Dengya. "Improving the relevance of web search results by combining web snippet categorization, clustering and personalization." Thesis, Curtin University, 2010. http://hdl.handle.net/20.500.11937/326.

Abstract:
Web search results are far from perfect due to the polysemous and synonymous characteristics of natural languages, information overload as a result of the information explosion on the Web, and the flat-list, "one size fits all" strategies of search engines, which present search results without concentrating on users' personal information needs. Re-organizing Web search results, or Web snippets, by means of text categorization and clustering are the two dominant approaches to attacking the issues above. Text categorization uses a collection of labeled documents to train a classifier which can then predict labels for new unlabeled documents, while text clustering groups unlabeled documents by finding common properties shared among the documents in the same group. The issue related to categorization is that human-labeled training documents are very expensive to obtain and thus surprisingly scarce at the moment, while how to label the generated groups is still an open research question for text clustering. In addition, a Web snippet, returned from search engines, contains only the title of a webpage and an optional very short (less than 30 words) description of the page. The less-informative aspect of Web snippets is another challenge for both text categorization and clustering. The primary objective of this research is to improve the relevance of Web search results and thus provide the user with a better search experience. To achieve this objective, the research combines Web snippet categorization, clustering and personalization techniques to recommend relevant results to search users. Using design research methodology, the study develops an IT artifact named RIB (Recommender Intelligent Browser). RIB categorizes Web snippets using a socially constructed Web directory such as the Open Directory Project (ODP), for which the semantic characteristics of the categories in ODP are extracted to generate a series of labeled document sets. At the same time, the Web snippets are clustered to boost the quality of the categorization. Based on search preferences in a user profile, which is automatically generated using information extracted from the user's personal computer with the user's approval, the proposed search method recommends personalized search results to users. Experimental data demonstrate that the mean average precision improvement of RIB over the Yahoo Search Web Services API, based on 25 search terms with 1250 Web snippets, is 7.84%, from 55.55% for Yahoo to 64.29% for RIB.
A novel boostingUp algorithm is also proposed in this research to improve the performance of text categorization by leveraging the power of text clustering, and vice versa. Experimental results illustrate that boostingUp can marginally improve the performance of both Web snippet categorization and clustering in terms of Adjusted Rand Index and F1. BoostingUp is able to produce a 0.97% improvement in macro-averaged F1, from 24.51% to 25.48%, for Naïve Bayes in combination with K-Means, and a 2.04% improvement in micro-averaged F1, from 32.17% to 34.21%. On the other hand, the improvement in terms of Adjusted Rand Index of K-Means in combination with Naïve Bayes is 2.35% (from 13.17% to 15.52%), and the improvement in F1 is 2.37% (from 21.45% to 23.82%). The issue of the lack of labeled data sets that can be used for Web snippet categorization, and as benchmark document collections to evaluate text categorization/clustering algorithms, is addressed by extracting semantic characteristics of ODP categories to generate a series of labeled categoryDocument sets. Statistical information about the generated data sets is provided as well. The generated categoryDocuments are used to evaluate the performance of the Naïve Bayes, AdaBoost, and kNN text categorization algorithms when a list of feature selection algorithms, including Chi-square, Mutual Information, Information Gain and Odds Ratio, is employed to pick 50, 80, 100, 200, 300, 500, 1000, 2000, 3000, 5000, and 10000 features. Other text categorization algorithms such as SVMlight and a statistical language model based algorithm, and feature selection algorithms such as GSS Coefficient, NGL Coefficient, and Relevancy Score, are also evaluated based on a specially designed small data set. Two other proposed algorithms, the R²Cut thresholding strategy and Z-tfidf, are evaluated at the same time and demonstrate the ability to slightly improve the performance of text categorization. Text clustering algorithms such as K-Means and Hierarchical Agglomerative Clustering are also evaluated using the generated categoryDocument sets. All algorithms involved in this research were implemented in Java. In addition, this research is the first to present detailed information about the hierarchy of the ODP, the world's most comprehensive human-edited Web directory, by analyzing the data in two publicly accessible files under the Free Use License. Although ODP is adopted as the core directory service for the world's most popular search engines, such as Google, AOL Search, Netscape Search, Lycos, HotBot and hundreds of others, and is used for a wide range of research purposes, no detailed hierarchical information about ODP has been published so far. The research further verifies the relationship between precision improvement and relevance judgment convergence when the effectiveness of an information retrieval system is evaluated based on the results of human relevance judgment, and reveals that the two variables are to some extent correlated in terms of the correlation coefficient. Improving the relevance of Web searching is challenging. This research proposes to combine text categorization, clustering and personalization to provide a better search experience to users. Comprehensive experimental evidence and favorable comparisons against the search results of the Yahoo API demonstrate that the designed search objectives have been achieved.
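As an illustrative sketch (not RIB's actual algorithm, which is considerably more elaborate), categorising a snippet against ODP-style categories can be done by cosine similarity to per-category term centroids; raw term counts stand in for the tf-idf weighting a real system would use, and the category names are hypothetical:

```python
from collections import Counter
import math

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def categorize(snippet, category_centroids):
    """Assign a web snippet to the directory category whose term centroid
    is closest in cosine similarity."""
    vec = Counter(snippet.lower().split())
    return max(category_centroids, key=lambda c: cosine(vec, category_centroids[c]))

centroids = {
    "Computers/Java": Counter({"java": 5, "programming": 3, "jvm": 2}),
    "Regional/Indonesia": Counter({"java": 4, "island": 3, "indonesia": 5}),
}
print(categorize("java island travel guide indonesia", centroids))
```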
37

Mouhoub, Mohamed Lamine. "Aggregated Search of Data and Services." Thesis, Paris Sciences et Lettres (ComUE), 2017. http://www.theses.fr/2017PSLED066/document.

Full text
Abstract:
The last years have witnessed the success of the Linked Open Data (LOD) project, as well as a significantly growing number of semantic data sources available on the web. However, a lot of data is still not published as fully materialized knowledge bases, such as sensor data, dynamic data, and data with limited access patterns. Such data is generally available through web APIs or web services. Integrating such data into the LOD or into mashups would have significant added value. However, discovering such services requires considerable effort from developers and a good knowledge of the existing service repositories, which current service discovery systems do not efficiently provide. In this thesis, we propose novel approaches and frameworks for searching for semantic web services from a data integration perspective. Firstly, we introduce LIDSEARCH, a SPARQL-driven framework to search for linked data and semantic web services. Moreover, we propose an approach to enrich semantic service descriptions with input-output relations from ontologies, to facilitate the automation of service discovery and composition. To achieve this, we apply natural language processing techniques and deep-learning-based text similarity techniques to derive I/O relations from text to ontologies. We validate our work with proof-of-concept frameworks and use OWLS-TC as a dataset for conducting our experiments on service search and enrichment.
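The SPARQL-driven search that LIDSEARCH performs can be illustrated with a plain HTTP call to a SPARQL endpoint. The sketch below uses only the Python standard library; the endpoint URL and the msm vocabulary are hypothetical placeholders, not artifacts of the thesis.

```python
# Minimal sketch: querying a SPARQL endpoint over HTTP with the standard
# library only. Endpoint and vocabulary below are hypothetical examples.
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint

# A query that looks for services whose declared output type matches a
# class the user is interested in, i.e. the data-integration angle above.
query = """
PREFIX msm: <http://example.org/msm#>
SELECT ?service ?output WHERE {
  ?service msm:hasOutput ?output .
  ?output a <http://dbpedia.org/ontology/City> .
} LIMIT 10
"""

params = urllib.parse.urlencode({"query": query})
req = urllib.request.Request(
    f"{ENDPOINT}?{params}",
    headers={"Accept": "application/sparql-results+json"})
with urllib.request.urlopen(req) as resp:
    results = json.load(resp)

for binding in results["results"]["bindings"]:
    print(binding["service"]["value"], "->", binding["output"]["value"])
```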
APA, Harvard, Vancouver, ISO, and other styles
38

Blaauw, Pieter. "Search engine poisoning and its prevalence in modern search engines." Thesis, Rhodes University, 2013. http://hdl.handle.net/10962/d1002037.

Full text
Abstract:
The prevalence of Search Engine Poisoning in trending topics and popular search terms on the web is investigated. Search Engine Poisoning is the act of manipulating search engines so that they display search results pointing to websites infected with malware. Research conducted between February and August 2012, using both manual and automated techniques, shows how easily the criminal element manages to insert malicious content into web pages related to popular search terms. To give the reader a clear overview and understanding of the motives and methods of the operators of Search Engine Poisoning campaigns, an in-depth review of automated and semi-automated web exploit kits is conducted, along with an examination of the motives for running these campaigns. Three high-profile case studies are examined, and the various Search Engine Poisoning campaigns associated with them are discussed in detail. From February to August 2012, data was collected on the top trending topics on Google's search engine, along with the top-listed sites related to those topics, and passed through various automated tools to discover whether these results had been infiltrated by the operators of Search Engine Poisoning campaigns; the results of these automated scans are discussed in detail. During the research period, manual searches for Search Engine Poisoning campaigns were also conducted, using high-profile news events and popular search terms. These results are analysed in detail to determine the methods of attack, the purpose of the attack and the parties behind it.
APA, Harvard, Vancouver, ISO, and other styles
39

Speicher, Maximilian. "Search Interaction Optimization." Doctoral thesis, Universitätsbibliothek Chemnitz, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-qucosa-208102.

Full text
Abstract:
Over the past 25 years, search engines have become one of the most important entry points to the World Wide Web, if not the most important one. This development has been primarily due to the continuously increasing number of available documents, which are highly unstructured. Moreover, the general trend is towards classifying search results into categories and presenting them in terms of semantic information that answers users' queries without their having to leave the search engine. With the growing number of documents and technological enhancements, the needs of users as well as search engines are continuously evolving. Users want to be presented with increasingly sophisticated results and interfaces, while companies have to place advertisements and make revenue to be able to offer their services for free. To address these needs, it is more and more important to provide highly usable and optimized search engine results pages (SERPs). Yet, existing approaches to usability evaluation are often costly or time-consuming and mostly rely on explicit feedback; they are either not efficient or not effective, and SERP interfaces are commonly optimized primarily from a company's point of view. Moreover, existing approaches to predicting search result relevance, which are mostly based on clicks, are not tailored to the evolving kinds of SERPs. For instance, they fail if queries are answered directly on a SERP and no clicks need to happen. Applying Human-Centered Design principles, we propose a solution to the above in terms of a holistic approach that intends to satisfy both searchers and developers. It provides novel means to counteract exclusively company-centric design and to make use of implicit user feedback for efficient and effective evaluation and optimization of usability and, in particular, relevance. We define personas and scenarios from which we infer unsolved problems and a set of well-defined requirements. Based on these requirements, we design and develop the Search Interaction Optimization toolkit. Using a bottom-up approach, we moreover define an eponymous, higher-level methodology. The Search Interaction Optimization toolkit comprises a total of six components. We start with INUIT [1], a novel minimal usability instrument specifically aimed at meaningful correlations with implicit user feedback in terms of client-side interactions. Hence, it serves as a basis for deriving usability scores directly from user behavior. INUIT was designed based on reviews of established usability standards and guidelines as well as interviews with nine dedicated usability experts. Its feasibility and effectiveness have been investigated in a user study, and a confirmatory factor analysis shows that the instrument can describe real-world perceptions of usability reasonably well. Subsequently, we introduce WaPPU [2], a context-aware A/B testing tool based on INUIT. WaPPU implements the novel concept of Usability-based Split Testing and enables automatic usability evaluation of arbitrary SERP interfaces based on a quantitative score derived directly from user interactions. For this, usability models are automatically trained and applied using machine learning techniques. In particular, the tool is not restricted to evaluating SERPs, but can be used with any web interface. Building on the above, we introduce S.O.S., the SERP Optimization Suite [3], which comprises WaPPU as well as a catalog of best practices [4].
Once WaPPU's scores indicate that an investigated SERP's usability is suboptimal, corresponding optimizations are automatically proposed based on the catalog of best practices. This catalog was compiled in a three-step process involving reviews of existing SERP interfaces and contributions by 20 dedicated usability experts. While the above focus on the general usability of SERPs, presenting the most relevant results is specifically important for search engines. Hence, our toolkit contains TellMyRelevance! (TMR) [5], the first end-to-end pipeline for predicting search result relevance based on users' interactions beyond clicks. TMR is a fully automatic approach that collects the necessary information on the client, processes it on the server side and trains corresponding relevance models using machine learning techniques. Predictions made by these models can then be fed back into the ranking process of the search engine, which improves result quality and hence also usability. StreamMyRelevance! (SMR) [6] takes the concept of TMR one step further by providing a streaming-based version; that is, SMR collects and processes interaction data and trains relevance models in near real time. Based on a user study and a large-scale log analysis involving real-world search engines, we have evaluated the components of the Search Interaction Optimization toolkit as a whole, also to demonstrate the interplay of the different components. S.O.S., WaPPU and INUIT were engaged in the evaluation and optimization of a real-world SERP interface. Results show that our tools are able to correctly identify even subtle differences in usability. Moreover, optimizations proposed by S.O.S. significantly improved the usability of the investigated and redesigned SERP. TMR and SMR were evaluated in a GB-scale interaction log analysis, likewise using data from real-world search engines. Our findings indicate that they are able to yield better predictions than competing state-of-the-art systems that consider clicks only. Also, a comparison of SMR to existing solutions shows its superiority in terms of efficiency, robustness and scalability. The thesis concludes with a discussion of the potential and limitations of the above contributions and provides an overview of potential future work.
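The idea behind TMR, learning relevance from interactions beyond clicks, amounts to supervised learning over per-result interaction features. A minimal sketch assuming scikit-learn follows; the feature set and the toy data are invented for illustration and do not reproduce the actual TMR pipeline.

```python
# Toy sketch of the "interactions beyond clicks -> relevance" idea.
# Feature layout and data are hypothetical; TMR's real pipeline differs.
from sklearn.linear_model import LogisticRegression

# One row per (query, result): [dwell_seconds, scroll_depth,
# cursor_hovers, clicked]. Labels are relevance judgments (1 = relevant).
X = [
    [35.0, 0.9, 4, 1],
    [ 2.0, 0.1, 0, 1],   # clicked, but bounced quickly
    [20.0, 0.7, 2, 0],   # never clicked: answer read on the SERP itself
    [ 1.0, 0.0, 0, 0],
]
y = [1, 0, 1, 0]

model = LogisticRegression().fit(X, y)

# Predictions can be fed back into the ranker as a relevance signal.
print(model.predict_proba([[18.0, 0.6, 1, 0]])[0][1])
```

Note the third row: a result can be relevant without any click at all, which is exactly the case click-only models miss.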
APA, Harvard, Vancouver, ISO, and other styles
40

Romero, Tris Cristina. "Client-side privacy-enhancing technologies in web search." Doctoral thesis, Universitat Rovira i Virgili, 2014. http://hdl.handle.net/10803/284036.

Full text
Abstract:
Web search engines (WSEs) are tools that allow users to locate specific information on the Internet. One of the objectives of WSEs is to return the results that best match the interests of each user. For this purpose, WSEs collect and analyze users' search histories in order to build profiles, so that a profiled user who submits a certain query will receive the results that are most interesting to her in the first positions. Although they offer a very useful service, they also represent a threat to their users' privacy: profiles are built from past queries and other related data that may contain private and personal information. In order to avoid this privacy threat, it is necessary to provide privacy-preserving mechanisms that protect users. Nowadays, there exist several solutions that intend to provide privacy in this field. One of the goals of this work is to survey the current solutions, analyzing their differences and remarking on the advantages and disadvantages of each approach. Then, based on the current state of the art, we present new proposals that protect users' privacy. More specifically, this dissertation proposes three different privacy-preserving multi-party protocols for web search. A multi-party protocol for web search arranges users into groups where they exchange their queries; this serves as an obfuscation method that hides the real queries of each user. The first multi-party protocol that we propose focuses on reducing the query delay, that is, the time that every group member has to wait in order to receive the query results. The second proposed multi-party protocol improves on the current literature because it is resilient against internal attacks, outperforming similar proposals in terms of computation and communication. The third proposal is a P2P protocol in which users are grouped according to their preferences. This obfuscates users' profiles while conserving their general interests; consequently, the WSE is able to rank the results of their queries better.
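The group-based obfuscation common to the three protocols can be reduced to a simple idea: each user submits a peer's query, so the WSE cannot attribute any query to its true owner. The toy simulation below shows only that shuffling step; the cryptographic exchange, the resistance to internal attacks and the P2P grouping developed in the thesis are omitted.

```python
# Toy simulation of query shuffling within a group of n users. The real
# protocols exchange queries cryptographically; this only shows the idea.
import random

users = ["alice", "bob", "carol", "dave"]
queries = {"alice": "knee surgery recovery",
           "bob": "cheap flights lisbon",
           "carol": "python sparse matrices",
           "dave": "tax deadline 2014"}

# Draw a random permutation with no fixed points (nobody submits their
# own query), retrying until one is found; fine for small groups.
while True:
    perm = users[:]
    random.shuffle(perm)
    if all(u != v for u, v in zip(users, perm)):
        break

# Each user forwards the query assigned to them; the WSE sees the
# sender, not the query's true owner, so per-user profiles are blurred.
for sender, owner in zip(users, perm):
    print(f"{sender} submits on behalf of {owner}: {queries[owner]!r}")
```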
APA, Harvard, Vancouver, ISO, and other styles
41

Kim, Bong-Seop. "Advanced web search based on formal concept analysis." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/MQ62230.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Fu, Xin Marchionini Gary. "Evaluating sources of implicit feedback for web search." Chapel Hill, N.C. : University of North Carolina at Chapel Hill, 2008. http://dc.lib.unc.edu/u?/etd,1797.

Full text
Abstract:
Thesis (Ph. D.)--University of North Carolina at Chapel Hill, 2008.
Title from electronic title page (viewed Sep. 16, 2008). "... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the School of Information and Library Science." Discipline: Information and Library Science; Department/School: Information and Library Science, School of.
APA, Harvard, Vancouver, ISO, and other styles
43

Jiang, Hao, and 江浩. "Personalized web search re-ranking and content recommendation." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2013. http://hdl.handle.net/10722/197548.

Full text
Abstract:
In this thesis, I propose a method for establishing a personalized recommendation system for re-ranking web search results and recommending web content. The method is based on personal reading interest, which can be reflected by the user's dwell time on each document or webpage. I acquire document-level dwell times via a customized web browser or a mobile device. To obtain better precision, I also explore the possibility of tracking gaze position and facial expression, from which I can determine the attractiveness of different parts of a document. Inspired by the idea of the Google Knowledge Graph, I establish a graph-based ontology to maintain a user profile describing the user's personal reading interest. Each node in the graph is a concept, which represents the user's potential interest in this concept. I also use dwell time to measure concept-level interest, which can be inferred from document-level dwell times. The graph is generated from Wikipedia. According to the estimated concept-level user interest, my algorithm can estimate a user's potential dwell time over a new document, based on which personalized webpage re-ranking can be carried out. I compare the rankings produced by my algorithm with rankings generated by popular commercial search engines and by a recently proposed personalized ranking algorithm; the results clearly show the superiority of my method. I also use my personalized recommendation framework in other applications. A good example is personalized document summarization. The same knowledge graph is employed to estimate the weight of every word in a document; combined with a traditional document summarization algorithm focused on text mining, I can generate a personalized summary that emphasizes the user's interest in the document. To deal with images and videos, I present a new image search and ranking algorithm for retrieving unannotated images by collaboratively mining online search results, which consist of online image and text search results. The online image search results are leveraged as reference examples to perform content-based image search over unannotated images. The online text search results are used to estimate individual reference images' relevance to the search query, as not all of the online image search results are closely related to the query. Overall, the key contribution of my method lies in its ability to deal with unreliable online image search results through jointly mining the visual and textual aspects of online search results. Through such collaborative mining, my algorithm infers the relevance of an online search result image to a text query. Once I estimate a query relevance score for each online image search result, I can selectively use query-specific online search result images as reference examples for retrieving and ranking unannotated images. To explore the performance of my algorithm, I tested it both on standard public image datasets and on several modestly sized personal photo collections. I also compared the performance of my method with that of two peer methods. The results are very positive and indicate that my algorithm is superior to existing content-based image search algorithms for retrieving and ranking unannotated images. Overall, the main advantage of my algorithm comes from its collaborative mining of online search results in both the visual and the textual domains.
published_or_final_version
Computer Science
Doctoral
Doctor of Philosophy
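The core of the dwell-time model can be sketched in a few lines: document-level dwell time is distributed over the concepts a document mentions, and an unseen document is scored by the accumulated interest of its concepts. This is a simplification; the thesis propagates interest over a Wikipedia-derived concept graph, which is omitted here.

```python
# Simplified sketch: concept-level interest inferred from document-level
# dwell times, then used to predict dwell time on an unseen document.
from collections import defaultdict

# (concepts mentioned in the document, observed dwell time in seconds)
history = [
    ({"machine learning", "neural networks"}, 120.0),
    ({"machine learning", "statistics"},       60.0),
    ({"football", "premier league"},            5.0),
]

interest = defaultdict(float)
for concepts, dwell in history:
    # Split the document's dwell time evenly across its concepts; the
    # thesis instead propagates interest over a Wikipedia-based graph.
    for c in concepts:
        interest[c] += dwell / len(concepts)

def predicted_dwell(concepts):
    """Score a new document by its concepts' accumulated interest."""
    return sum(interest[c] for c in concepts)

new_doc = {"neural networks", "statistics"}
print(predicted_dwell(new_doc))  # higher score -> rank the page higher
```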
APA, Harvard, Vancouver, ISO, and other styles
44

Lakshmanan, Hariharan 1980. "A client side tool for contextual Web search." Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/29385.

Full text
Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2004.
Includes bibliographical references (p. 76-77).
This thesis describes the design and development of an application that uses information relevant to the context of a web search to improve the results obtained from standard search engines. The representation of the contextual information is based on a Vector Space Model and is obtained from a set of documents that have been identified as relevant to the context of the search. Two algorithms have been developed that use this contextual representation to re-rank search results. In the first algorithm, re-ranking is based on a comparison of every search result with all the contextual documents. In the second algorithm, only a subset of the contextual documents that relate to the search query is used to measure the relevance of the search results; this subset is identified by mapping the search query onto the Vector Space representation of the contextual documents. A software application was developed using the .NET framework with C# as the implementation language. The software enables users to identify contextual documents and perform searches either with a standard search engine or with the above-mentioned algorithms. The software implementation details and preliminary results regarding the efficiency of the proposed algorithms are presented.
by Hariharan Lakshmanan.
S.M.
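Both re-ranking algorithms reduce to cosine similarity in a vector space: the first compares each result against all contextual documents, the second against a query-relevant subset. Below is a sketch of the first variant, assuming scikit-learn and placeholder documents (the original tool was a C#/.NET application).

```python
# Sketch of re-ranking search results by similarity to contextual
# documents (first algorithm above: compare against all context docs).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

context_docs = ["finite element analysis of concrete beams",
                "structural load modeling for bridges"]
results = ["beam deflection formulas for engineers",
           "jazz concert beam lighting rig",
           "bridge load rating and inspection guide"]

vec = TfidfVectorizer().fit(context_docs + results)
C = vec.transform(context_docs)
R = vec.transform(results)

# Score each result by its mean similarity to the contextual documents.
scores = cosine_similarity(R, C).mean(axis=1)
for score, result in sorted(zip(scores, results), reverse=True):
    print(f"{score:.3f}  {result}")
```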
APA, Harvard, Vancouver, ISO, and other styles
45

Svebrant, Henrik. "Latent variable neural click models for web search." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-232311.

Full text
Abstract:
User click modeling in web search is most commonly done through probabilistic graphical models. Given the successful use of machine learning techniques in other fields of research, it is interesting to evaluate how machine learning can be applied to click modeling. In this thesis, modeling is done using recurrent neural networks trained on a distributed representation of the state-of-the-art user browsing model (UBM). It is further evaluated how extending this representation with a set of latent variables that are easily derivable from click logs affects the model's prediction performance. Results show that a model using the original representation does not perform very well. However, the inclusion of simple variables can drastically increase performance on the click prediction task, where the model manages to outperform the two chosen baseline models, which are themselves already well performing. It also leads to increased performance on the relevance prediction task, although the results are not as significant. It can be argued that the relevance prediction task is not a fair comparison to the baseline functions, since they need significantly larger amounts of data to learn the respective probabilities. However, it is favorable that the neural models manage to perform quite well using smaller amounts of data. It would be interesting to see how well such models would perform when trained on far greater data quantities than were used in this project, and to tailor the model to use LSTMs, which could presumably increase performance even more. Evaluating representations other than the one used would also be of interest, as this representation did not perform remarkably on its own.
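For readers unfamiliar with UBM, the model factors a click probability into an attractiveness term and an examination term that depends on the result's rank and the rank of the last clicked result. The toy sketch below uses made-up parameter values; it is not the learned model from the thesis.

```python
# Toy UBM: P(click on doc at rank r) = alpha(q, d) * gamma(r, r'),
# where r' is the rank of the last clicked result (0 = none yet).
# All parameter values below are made up for illustration.

# Attractiveness alpha(q, d) for a fixed query q.
alpha = {"doc_a": 0.8, "doc_b": 0.4, "doc_c": 0.6}

def gamma(rank, last_click):
    """Examination probability, decaying with distance from last click."""
    return 1.0 / (1.0 + (rank - last_click))

ranking = ["doc_a", "doc_b", "doc_c"]
last_click = 0
for rank, doc in enumerate(ranking, start=1):
    p_click = alpha[doc] * gamma(rank, last_click)
    print(f"rank {rank}: P(click {doc}) = {p_click:.3f}")
    if p_click > 0.5:  # pretend the user clicks whenever p > 0.5
        last_click = rank
```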
APA, Harvard, Vancouver, ISO, and other styles
46

Lakshmi, Shriram. "Web-based search engine for Radiology Teaching File." [Gainesville, Fla.] : University of Florida, 2002. http://purl.fcla.edu/fcla/etd/UFE0000559.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Haque, Md Rakibul. "Decentralized Web Search." Thesis, 2012. http://hdl.handle.net/10012/6795.

Full text
Abstract:
Centrally controlled search engines will not be sufficient and reliable for indexing and searching the rapidly growing World Wide Web in the near future. A better solution is to enable the Web to index itself in a decentralized manner. Existing distributed approaches for ranking search results do not provide flexible searching, complete results and highly accurate ranking. This thesis presents a decentralized Web search mechanism, named DEWS, which enables existing webservers to collaborate with each other to form a distributed index of the Web. DEWS can rank search results based on query keyword relevance and the relative importance of websites in a distributed manner, preserving a hyperlink overlay on top of a structured P2P overlay. It also supports approximate matching of query keywords using phonetic codes and n-grams, along with list decoding of a linear covering code. DEWS supports incremental retrieval of search results in a decentralized manner, which reduces the network bandwidth required for query resolution. It uses an efficient routing mechanism that extends the Plexus routing protocol with a message aggregation technique. DEWS maintains replicas of indexes, which reduces routing hops and makes DEWS robust to webserver failures. The standard LETOR 3.0 dataset was used to validate the DEWS protocol. Simulation results show that the ranking accuracy of DEWS is close to the centralized case, while the network overhead for collaborative search and indexing is logarithmic in the network size. The results also show that DEWS is resilient to changes in the available pool of indexing webservers and works efficiently even under heavy query load.
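The approximate keyword matching described above can be sketched by combining a phonetic code with character n-gram overlap: the phonetic code retrieves candidate index terms, and n-gram similarity ranks them. The sketch below uses a simplified Soundex and trigram Jaccard similarity; the list decoding over a linear covering code that DEWS actually uses is omitted.

```python
# Approximate keyword matching via phonetic codes + n-gram overlap,
# as sketched above (the covering-code list decoding is omitted).

def soundex(word):
    """Simplified Soundex: first letter + up to 3 consonant-class digits."""
    codes = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3",
             "l": "4", "mn": "5", "r": "6"}
    digit = lambda ch: next((d for k, d in codes.items() if ch in k), "")
    out, prev = word[0].upper(), digit(word[0].lower())
    for ch in word[1:].lower():
        d = digit(ch)
        if d and d != prev:
            out += d
        prev = d
    return (out + "000")[:4]

def trigrams(word):
    w = f"  {word.lower()} "  # pad so word edges form trigrams too
    return {w[i:i + 3] for i in range(len(w) - 2)}

def similarity(a, b):
    """Jaccard overlap of trigram sets, used to rank phonetic candidates."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

query, index_terms = "receiv", ["receive", "recipe", "review"]
candidates = [t for t in index_terms if soundex(t) == soundex(query)]
print(sorted(candidates, key=lambda t: similarity(query, t), reverse=True))
```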
APA, Harvard, Vancouver, ISO, and other styles
48

Wang, Yuan. "Distributed web search." 2004. http://www.library.wisc.edu/databases/connect/dissertations.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Kim, YS. "An evaluation study of web monitoring : web monitoring vs. web crawling." Thesis, 2009. https://eprints.utas.edu.au/20802/1/whole_KimYangsok2009_thesis.pdf.

Full text
Abstract:
Nowadays people use web search engines to find information. Even though these engines endeavour to provide information in a complete and timely manner, there are significant delays and under-coverage in their services. People sometimes want to obtain new information from personally selected web pages without missing anything and with little delay; web monitoring tries to fulfil this goal by revisiting the selected web pages frequently. Initially, web monitoring research focused on the monitoring method itself, but the emphasis then shifted to the problems of information overload and of scheduling under limited resources. This dissertation focuses on the following research problems to improve the efficiency of web monitoring systems. Firstly, it analyses how efficiently a document classification system based on an incremental knowledge acquisition method called MCRDR (Multiple Classification Ripple-Down Rules) resolves individual information overload problems. Secondly, it discusses how MCRDR knowledge bases, standard web search engines, and appropriate web page locating heuristics can be employed in unison to locate relevant web pages for monitoring. Thirdly, it demonstrates that the web monitoring system exhibits better performance in terms of service coverage and delay than commercial web search engines. Lastly, it proposes a method for prioritizing monitored web pages that decides the order of the monitoring sequence using the estimated service coverage and delay of web search engines, obtained using predictor variables identified from web crawling policies and statistical regression methods.
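The prioritization problem can be illustrated with a toy scheduler that visits first the pages expected to have accumulated the most unseen changes. The change-rate estimate below is a simple observed frequency; the thesis instead estimates coverage and delay with statistical regression over crawling data, which is not reproduced here.

```python
# Toy monitoring scheduler: visit first the pages expected to have
# changed the most since their last visit. The change-rate estimate
# here is a simple frequency; the thesis uses regression models.
import time

pages = {
    # url: (changes observed, observation window in hours, last visit)
    "http://news.example.org": (48, 24.0, time.time() - 3 * 3600),
    "http://blog.example.org": ( 2, 24.0, time.time() - 3 * 3600),
    "http://docs.example.org": ( 1, 168.0, time.time() - 48 * 3600),
}

def expected_new_items(stats, now):
    changes, window_h, last_visit = stats
    rate_per_h = changes / window_h          # estimated change rate
    hours_since = (now - last_visit) / 3600
    return rate_per_h * hours_since          # expected changes missed

now = time.time()
schedule = sorted(pages, key=lambda u: expected_new_items(pages[u], now),
                  reverse=True)
for url in schedule:
    print(f"{expected_new_items(pages[url], now):6.2f}  {url}")
```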
APA, Harvard, Vancouver, ISO, and other styles
50

Chen, Wei-Lian, and 陳威良. "Web-based Collaborative Search." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/52025604220042466928.

Full text
Abstract:
Master's
National Taiwan University
Graduate Institute of Computer Science and Information Engineering
99
In the era of information explosion, it becomes increasingly difficult to find information on the internet that meets users' real needs. Because of limited domain knowledge, users often overlook information they are unfamiliar with but need, and may not know how to begin searching in an unfamiliar field. Such problems could be solved more easily if those who already have experience searching for the information could help other users. This thesis designs a web-based collaborative search system. Working together with an existing search engine, the system segments the snippets of search results and uses the segmentation to build its query profiles. When a user issues a query, the system automatically suggests terms from similar query profiles as queries, ranked by importance, relevance and novelty, with the aim of improving the efficiency of searching on the internet. Finally, three experiments are designed to examine and evaluate query precision and the novelty and relevance of the recommended terms; the strengths of this method and possible improvements are also discussed.
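The suggestion step can be sketched as scoring candidate terms, mined from the snippet segments of users with similar query profiles, by a combination of importance, relevance and novelty. The data and the multiplicative score below are invented for illustration; the thesis's actual weighting scheme is not reproduced.

```python
# Toy sketch of suggesting terms from similar users' query profiles,
# scored by importance x relevance x novelty as described above.

# Candidate terms mined from snippet segments of users with similar
# query profiles: term -> (importance, relevance to current query)
candidates = {
    "convolutional": (0.9, 0.8),
    "deep learning": (0.8, 0.9),
    "tutorial":      (0.4, 0.5),
    "neural":        (0.7, 0.9),
}
already_seen = {"neural"}  # terms the user has queried before

def score(term):
    importance, relevance = candidates[term]
    novelty = 0.0 if term in already_seen else 1.0  # crude novelty
    return importance * relevance * novelty

for term in sorted(candidates, key=score, reverse=True):
    print(f"{score(term):.2f}  {term}")
```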
APA, Harvard, Vancouver, ISO, and other styles