Dissertations on the topic "080704 Information Retrieval and Web Search"
Format your source in APA, MLA, Chicago, Harvard, and other citation styles
Browse the top 50 dissertations for your research on the topic "080704 Information Retrieval and Web Search".
Next to every work in the reference list there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, and others.
You can also download the full text of the publication as a .pdf file and read its abstract online, whenever these details are available in the metadata.
Browse dissertations from a wide variety of disciplines and compile your bibliography correctly.
Tjondronegoro, Dian W. "Content-based Video Indexing for Sports Applications using Multi-modal approach." PhD thesis, Deakin University, 2005. https://eprints.qut.edu.au/2199/1/PhDThesis_Tjondronegoro.pdf.
Lewandowski, Dirk. "Web Searching, Search Engines and Information Retrieval." IOS Press, 2005. http://hdl.handle.net/10150/106395.
Craswell, Nicholas Eric. "Methods for Distributed Information Retrieval." The Australian National University. Faculty of Engineering and Information Technology, 2001. http://thesis.anu.edu.au./public/adt-ANU20020315.142540.
Costa, Miguel. "SIDRA: a Flexible Web Search System." Master's thesis, Department of Informatics, University of Lisbon, 2004. http://hdl.handle.net/10451/13914.
Limbu, Dilip Kumar. "Contextual information retrieval from the WWW." 2008. http://hdl.handle.net/10292/450.
Morrison, Patrick Jason. "Tagging and Searching: Search Retrieval Effectiveness of Folksonomies on the Web." [Kent, Ohio] : Kent State University, 2007. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=kent1177305096.
Повний текст джерелаTitle from PDF t.p. (viewed July 2, 2007). Advisor: David B. Robins. Keywords: information retrieval, search engine, social bookmarking, tagging, folksonomy, Internet, World Wide Web. Includes survey instrument. Includes bibliographical references (p. 137-141).
Nguyen, Qui V. "Enhancing a Web Crawler with Arabic Search." Thesis, Monterey, California: Naval Postgraduate School, 2012.
Tsukuda, Kosetsu. "A Study on Web Search and Analysis based on Typicality." 京都大学 (Kyoto University), 2014. http://hdl.handle.net/2433/192217.
Umemoto, Kazutoshi. "A Study on Fine-Grained User Behavior Analysis in Web Search." 京都大学 (Kyoto University), 2016. http://hdl.handle.net/2433/215679.
Halpin, Harry. "Sense and reference on the Web." Thesis, University of Edinburgh, 2010. http://hdl.handle.net/1842/3796.
Weldeghebriel, Zemichael Fesahatsion. "Evaluating and comparing search engines in retrieving text information from the web." Thesis, Stellenbosch : Stellenbosch University, 2004. http://hdl.handle.net/10019.1/53740.
Повний текст джерелаENGLISH ABSTRACT: With the introduction of the Internet and the World Wide Web (www), information can be easily accessed and retrieved from the web using information retrieval systems such as web search engines or simply search engines. There are a number of search engines that have been developed to provide access to the resources available on the web and to help users in retrieving relevant information from the web. In particular, they are essential for finding text information on the web for academic purposes. But, how effective and efficient are those search engines in retrieving the most relevant text information from the web? Which of the search engines are more effective and efficient? So, this study was conducted to see how effective and efficient search engines are and to see which search engines are most effective and efficient in retrieving the required text information from the web. It is very important to know the most effective and efficient search engines because such search engines can be used to retrieve a higher number of the most relevant text web pages with minimum time and effort. The study was based on nine major search engines, four search queries and relevancy judgments as relevant/partly-relevanUnon-relevant. Precision and recall were calculated based on the experimental or test results and these were used as basis for the statistical evaluation and comparisons of the retrieval effectiveness of the nine search engines. Duplicated items and broken links were also recorded and examined separately and were used as an additional measure of search engine effectiveness. A response time was also recorded and used as a base for the statistical evaluation and comparisons of the retrieval efficiency of the nine search engines. Additionally, since search engines involve indexing and searching in the information retrieval processes from the web, this study first discusses, from the theoretical point of view, how the indexing and searching processes are performed in an information retrieval environment. It also discusses the influences of indexing and searching processes on the effectiveness and efficiency of information retrieval systems in general and search engines in particular in retrieving the most relevant text information from the web.
AFRIKAANSE OPSOMMING (translated): With the advent of the Internet and the World Wide Web (www), information has become easily obtainable. It can be retrieved by using information retrieval systems such as search engines. A whole range of such search engines has been developed to provide access to the resources available on the web and to help users retrieve relevant information from it. This is particularly essential for obtaining text information for academic purposes. But how effective and efficient are these search engines in retrieving the most relevant text information from the web? Which of the search engines is the most effective? This study was undertaken to determine which search engines are the most effective and efficient in retrieving the required text information. It is important to know which search engine is the most effective, because such an engine can be used to retrieve a higher number of the most relevant text web pages with a minimum of time and effort. The study was based on the nine major search engines, four search queries, and relevance judgments of relevant, partly relevant, and non-relevant. Precision and recall were calculated from the experimental and test results and were used as the basis for the statistical evaluation and comparison of the retrieval effectiveness of the nine search engines. Duplicated items and broken links were also recorded, examined separately, and used as an additional measure of effectiveness. Response time was likewise recorded and used as the basis for the statistical evaluation and comparison of the retrieval efficiency of the nine search engines. Since search engines involve indexing and searching processes, this study first discusses, from a theoretical point of view, how indexing and searching are performed in an information retrieval environment. The influence of indexing and searching processes on the effectiveness of retrieval systems in general, and of search engines in particular, in retrieving the most relevant text information from the web is also discussed.
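The evaluation described in this abstract rests on precision and recall computed from relevance judgments; the following minimal sketch, using invented judgments rather than the thesis data, illustrates how such figures are typically derived.

```python
# Toy illustration of precision/recall over a ranked result list,
# using invented relevance judgments (not data from the thesis).
retrieved = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
relevant = {"d2", "d3", "d7", "d11"}  # judged relevant (one was not retrieved)

retrieved_relevant = [d for d in retrieved if d in relevant]
precision = len(retrieved_relevant) / len(retrieved)
recall = len(retrieved_relevant) / len(relevant)

# Precision at cut-off k, e.g. P@5, as used when comparing engines.
def precision_at_k(ranked, judged_relevant, k):
    top_k = ranked[:k]
    return sum(1 for d in top_k if d in judged_relevant) / k

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"P@5={precision_at_k(retrieved, relevant, 5):.2f}")
```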
Na, Jin-Cheon, Christopher S. G. Khoo, and Syin Chan. "A sentiment-based meta search engine." School of Communication & Information, Nanyang Technological University, 2006. http://hdl.handle.net/10150/106241.
Meng, Zhao. "A Study on Web Search based on Coordinate Relationships." 京都大学 (Kyoto University), 2016. http://hdl.handle.net/2433/217205.
Varadarajan, Ramakrishna R. "Ranked Search on Data Graphs." FIU Digital Commons, 2009. http://digitalcommons.fiu.edu/etd/220.
Zhang, Limin. "Contextual Web Search Based on Semantic Relationships: A Theoretical Framework, Evaluation and a Medical Application Prototype." Diss., Tucson, Arizona : University of Arizona, 2006. http://etd.library.arizona.edu/etd/GetFileServlet?file=file:///data1/pdf/etd/azu%5Fetd%5F1602%5F1%5Fm.pdf&type=application/pdf.
Chignell, Mark, Jacek Gwizdka, and Richard Bodner. "Discriminating Meta-Search: A Framework for Evaluation." Elsevier, 1999. http://hdl.handle.net/10150/105146.
Повний текст джерелаThere was a proliferation of electronic information sources and search engines in the 1990s. Many of these information sources became available through the ubiquitous interface of the Web browser. Diverse information sources became accessible to information professionals and casual end users alike. Much of the information was also hyperlinked, so that information could be explored by browsing as well as searching. While vast amounts of information were now just a few keystrokes and mouseclicks away, as the choices multiplied, so did the complexity of choosing where and how to look for the electronic information. Much of the complexity in information exploration at the turn of the twenty-first century arose because there was no common cataloguing and control system across the various electronic information sources. In addition, the many search engines available differed widely in terms of their domain coverage, query methods, and efficiency. Meta-search engines were developed to improve search performance by querying multiple search engines at once. In principle, meta-search engines could greatly simplify the search for electronic information by selecting a subset of first-level search engines and digital libraries to submit a query to based on the characteristics of the user, the query/topic, and the search strategy. This selection would be guided by diagnostic knowledge about which of the first-level search engines works best under what circumstances. Programmatic research is required to develop this diagnostic knowledge about first-level search engine performance. This paper introduces an evaluative framework for this type of research and illustrates its use in two experiments. The experimental results obtained are used to characterize some properties of leading search engines (as of 1998). Significant interactions were observed between search engine and two other factors (time of day, and Web domain). These findings supplement those of earlier studies, providing preliminary information about the complex relationship between search engine functionality and performance in different contexts. While the specific results obtained represent a time-dependent snapshot of search engine performance in 1998, the evaluative framework proposed should be generally applicable in the future.
Al-Dallal, Ammar Sami. "Enhancing recall and precision of web search using genetic algorithm." Thesis, Brunel University, 2012. http://bura.brunel.ac.uk/handle/2438/7379.
Knopke, Ian. "Building a search engine for music and audio on the World Wide Web." Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=85177.
Повний текст джерелаThe most important part of this system is a web crawler that finds materials by following hyperlinks between web pages. The crawler is distributed and operates using multiple computers across a network, storing results to a database. There are two main components: a set of retrievers that retrieve pages and audio files from the web, and a central crawl manager that coordinates the retrievers and handles data storage tasks.
The crawler is designed to locate three types of audio files: AIFF, WAVE, and MPEG-1 (MP3), but other types can be easily added to the system. Once audio files are located, analyses are performed of both the audio files and the associated web pages that link to these files. Information extracted by the crawler can be used to build search indexes for resolving user queries. A set of results demonstrating aspects of the performance of the crawler is presented, as well as some statistics and points of interest regarding the nature of audio files on the web.
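For illustration only, the sketch below compresses the distributed design described above (crawl manager plus multiple retrievers storing to a database) into a single-process loop that follows hyperlinks and records links to audio files; the seed URL is a placeholder.

```python
# Drastically simplified, single-process sketch of an audio-seeking crawl loop.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

AUDIO_EXTENSIONS = (".aif", ".aiff", ".wav", ".mp3")

class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=20):
    frontier, visited, audio_urls = deque([seed]), set(), set()
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except (OSError, ValueError):
            continue  # unreachable or malformed URL: skip it
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute.lower().endswith(AUDIO_EXTENSIONS):
                audio_urls.add(absolute)          # candidate audio file
            elif absolute.startswith(("http://", "https://")):
                frontier.append(absolute)         # candidate page to visit
    return audio_urls

if __name__ == "__main__":
    print(crawl("https://example.org/"))  # placeholder seed URL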
Shoji, Yoshiyuki. "A Study on Social Information Search and Analysis on the Web by Diversity Computation." 京都大学 (Kyoto University), 2015. http://hdl.handle.net/2433/199443.
Jing, Yushi. "Learning an integrated hybrid image retrieval system." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/43746.
Li, Ping 1965. "Doctoral students’ mental models of a web search engine : an exploratory study." Thesis, McGill University, 2007. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=94181.
Повний текст джерелаCette recherche préliminaire examine les facteurs qui peuvent influencer les modèles mentaux d’un groupe spécifique d’utilisateurs d’un moteur de recherche sur le Web: Google, mesurés selon l’étendue de leur réussite.Une échelle de cette réussite en suivant un modèle mental a été constituée en adaptant les modèles présentés par Borgman, Dimitroff et Saxon, incluant la perception (1) de la nature du moteur de recherche sur le Web, (2) des caractéristiques de la recherche propres à ce moteur, (3) de l’interaction entre le chercheur et le moteur de recherche. A l’aide de cette échelle, le niveau de réussite par un sujet donné utilisant un modèle mental a été déterminé en fonction du nombre de composantes des deux premières parties de l’échelle décrites et du niveau d’interaction entre le sujet et le moteur Google, tel que révélé par ses recherches. Le choix des facteurs a été fondé sur des études précédentes portant sur les différences individuelles entre les chercheurs d’information, comprenant le degré d’expérience d’une telle recherche par l’utilisateur, son style cognitif, son style d’apprentissage, ses aptitudes techniques, la formation reçue, la discipline et le sexe. Seize étudiants en doctorat ayant l’anglais comme première langue ont participé à cette étude. Des entretiens individuels semi-dirigés ont permis de déterminer le niveau de réussite des étudiants suivant leur modèle mental, ainsi que leur expérience de la recherche, la formation reçue, la discipline et le sexe. Une observation technique directe a été utilisée pour observer l’interaction réelle des étudiants avec Google. Des tests standardisés ont été administrés pour déterminer le style cognitif des étudiants, leur style d’apprentissage et leurs aptitudes techniques. fr
Pothirattanachaikul, Suppanut. "A Study on Understanding and Encouraging Alternative Information Search." Kyoto University, 2020. http://hdl.handle.net/2433/259073.
Mendoza, Rocha Marcelo Gabriel. "Query log mining in search engines." Tesis, Universidad de Chile, 2007. http://www.repositorio.uchile.cl/handle/2250/102877.
Повний текст джерелаLa Web es un gran espacio de información donde muchos recursos como documentos, imágenes u otros contenidos multimediales pueden ser accesados. En este contexto, varias tecnologías de la información han sido desarrolladas para ayudar a los usuarios a satisfacer sus necesidades de búsqueda en la Web, y las más usadas de estas son los motores de búsqueda. Los motores de búsqueda permiten a los usuarios encontrar recursos formulando consultas y revisando una lista de respuestas. Uno de los principales desafíos para la comunidad de la Web es diseñar motores de búsqueda que permitan a los usuarios encontrar recursos semánticamente conectados con sus consultas. El gran tamaño de la Web y la vaguedad de los términos más comúnmente usados en la formulación de consultas es un gran obstáculo para lograr este objetivo. En esta tesis proponemos explorar las selecciones de los usuarios registradas en los logs de los motores de búsqueda para aprender cómo los usuarios buscan y también para diseñar algoritmos que permitan mejorar la precisión de las respuestas recomendadas a los usuarios. Comenzaremos explorando las propiedades de estos datos. Esta exploración nos permitirá determinar la naturaleza dispersa de estos datos. Además presentaremos modelos que nos ayudarán a entender cómo los usuarios buscan en los motores de búsqueda. Luego, exploraremos las selecciones de los usuarios para encontrar asociaciones útiles entre consultas registradas en los logs. Concentraremos los esfuerzos en el diseño de técnicas que permitirán a los usuarios encontrar mejores consultas que la consulta original. Como una aplicación, diseñaremos métodos de reformulación de consultas que ayudarán a los usuarios a encontrar términos más útiles mejorando la representación de sus necesidades. Usando términos de documentos construiremos representaciones vectoriales para consultas. Aplicando técnicas de clustering podremos determinar grupos de consultas similares. Usando estos grupos de consultas, introduciremos métodos para recomendación de consultas y documentos que nos permitirán mejorar la precisión de las recomendaciones. Finalmente, diseñaremos técnicas de clasificación de consultas que nos permitirán encontrar conceptos semánticamente relacionados con la consulta original. Para lograr esto, clasificaremos las consultas de los usuarios en directorios Web. Como una aplicación, introduciremos métodos para la manutención automática de los directorios.
Wilson, Mathew J. "The effects of search strategies and information interaction on sensemaking." Thesis, Swansea University, 2015. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.678376.
Zhu, Dengya. "Improving the relevance of search results via search-term disambiguation and ontological filtering." Thesis, Curtin University, 2007. http://hdl.handle.net/20.500.11937/2486.
Tang, Ling-Xiang. "Link discovery for Chinese/English cross-language web information retrieval." Thesis, Queensland University of Technology, 2012. https://eprints.qut.edu.au/58416/1/Ling-Xiang_Tang_Thesis.pdf.
Kinley, Khamsum. "Towards modelling web search behaviour : integrating users’ cognitive styles." Thesis, Queensland University of Technology, 2013. https://eprints.qut.edu.au/63804/1/Kinley_Kinley_Thesis.pdf.
Edizel, Necati Bora. "Word embeddings with applications to web search and advertising." Doctoral thesis, Universitat Pompeu Fabra, 2019. http://hdl.handle.net/10803/669622.
Повний текст джерелаDins del món del Processament del Llenguatge Natural (NLP) i d’altres camps relacionats amb aquest àmbit, les representaciones latents de paraules (word embeddings) s'han convertit en una tecnologia fonamental per a desenvolupar aplicacions pràctiques. En aquesta tesi es presenta un anàlisi teòric d’aquests word embeddings així com alguns algoritmes per a entrenar-los. A més a més, com a aplicació pràctica d’aquesta recerca també es presenten aplicacions per a cerques a la web i màrqueting. Primer, s’introdueixen alguns aspectes teòrics d’un dels algoritmes més populars per a aprendre word embeddings, el word2vec. També es presenta el word2vec en un context de Reinforcement Learning demostrant que modela les normes no explícites (off-policy) en presència d’un conjunt de normes (policies) de comportament fixes. A continuació, presentem un nou algoritme de d’aprenentatge de normes no explícites (off-policy), $word2vec_{\pi}$, com a modelador de normes de comportament. La validació experimental corrobora la superioritat d’aquest nou algorithme respecte \textit{word2vec}. Segon, es presenta un mètode per a aprendre word embeddings que són resistents a errors d’escriptura. La majoria de word embeddings tenen una aplicació limitada quan s’enfronten a textos amb errors o paraules fora del vocabulari. Nosaltres proposem un mètode combinant FastText amb sub-paraules i una tasca supervisada per a aprendre patrons amb errors. Els resultats proven com les paraules mal escrites estan pròximes a les correctes quan les comparem dins de l’embedding. Finalment, aquesta tesi proposa dues tècniques noves (una a nivell de caràcter i l’altra a nivell de paraula) que empren xarxes neuronals (DNNs) per a la tasca de similaritat semàntica. Es demostra experimentalment que aquests mètodes són eficaços per a la predicció de l’eficàcia (click-through rate) dins del context de cerces patrocinades.
Crain, Steven P. "Personalized search and recommendation for health information resources." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45805.
Zhu, Dengya. "Improving the relevance of search results via search-term disambiguation and ontological filtering." Curtin University of Technology, School of Information Systems, 2007. http://espace.library.curtin.edu.au:80/R/?func=dbin-jump-full&object_id=9348.
Повний текст джерелаTo achieve the above research goal, a special search-browser is developed, and its retrieval effectiveness is evaluated. The hierarchical structure of the Open Directory Project (ODP) is employed as the socially constructed knowledge structure which is represented by the Tree component of Java. Yahoo! Search Web Services API is utilized to obtain search results directly from Yahoo! search engine databases. The Lucene text search engine calculates similarities between each returned search result and the semantic characteristics of each category in the ODP; and thus to assign the search results to the corresponding ODP categories by Majority Voting algorithm. When an interesting category is selected by a user, only search results categorized under the category are presented to the user, and the quality of the search results is consequently improved.
Experiments demonstrate that the proposed approach of this research can improve the precision of Yahoo! search results at the 11 standard recall levels from an average 41.7 per cent to 65.2 per cent; the improvement is as high as 23.5 per cent. This conclusion is verified by comparing the improvements of the P@5 and P@10 of Yahoo! search results and the categorized search results of the special search-browser. The improvement of P@5 and P@10 are 38.3 per cent (85 per cent - 46.7 per cent) and 28 per cent (70 per cent - 42 per cent) respectively. The experiment of this research is well designed and controlled. To minimize the subjectiveness of relevance judgments, in this research five judges (experts) are asked to make their relevance judgments independently, and the final relevance judgment is a combination of the five judges’ judgments. The judges are presented with only search-terms, information needs, and the 50 search results of Yahoo! Search Web Service API. They are asked to make relevance judgments based on the information provided above, there is no categorization information provided.
The first contribution of this research is to use an extracted category-document to represent the semantic characteristics of each of the ODP categories. A category-document is composed of the topic of the category, description of the category, the titles and the brief descriptions of the submitted Web pages under this category. Experimental results demonstrate the category-documents of the ODP can represent the semantic characteristics of the ODP in most cases. Furthermore, for machine learning algorithms, the extracted category-documents can be utilized as training data which otherwise demand much human labor to create to ensure the learning algorithm to be properly trained. The second contribution of this research is the suggestion of the new concepts of relevance judgment convergent degree and relevance judgment divergent degree that are used to measure how well different judges agree with each other when they are asked to judge the relevance of a list of search results. When the relevance judgment convergent degree of a search-term is high, an IR algorithm should obtain a higher precision as well. On the other hand, if the relevance judgment convergent degree is low, or the relevance judgment divergent degree is high, it is arguable to use the data to evaluate the IR algorithm. This intuition is manifested by the experiment of this research. The last contribution of this research is that the developed search-browser is the first IR system (IRS) to utilize the ODP hierarchical structure to categorize and filter search results, to the best of my knowledge.
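As a hedged illustration of the categorization step described in this abstract, the sketch below scores search results against invented "category-documents" with TF-IDF cosine similarity and keeps only the results assigned to the category a user selects; the actual system relies on the ODP hierarchy, Lucene scoring and a majority-voting assignment.

```python
# Toy category-document assignment and filtering of search results.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

category_documents = {
    "Computers/Programming": "programming languages compilers software code java python",
    "Recreation/Travel": "travel destinations flights hotels tourism guides",
}
results = [
    "Java tutorial: writing your first program",
    "Top ten budget hotels for your next trip",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(list(category_documents.values()) + results)
cat_vecs = matrix[: len(category_documents)]
res_vecs = matrix[len(category_documents):]
similarity = cosine_similarity(res_vecs, cat_vecs)

names = list(category_documents)
assigned = [names[row.argmax()] for row in similarity]
selected = "Computers/Programming"  # category clicked by the user
print([r for r, c in zip(results, assigned) if c == selected])
```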
Du, Jia (Tina). "Multitasking, cognitive coordination and cognitive shifts during web searching." Thesis, Queensland University of Technology, 2010. https://eprints.qut.edu.au/35717/1/Jia_Du_Thesis.pdf.
Fidan, Guven. "Identifying The Effectiveness Of A Web Search Engine With Turkish Domain Dependent Impacts And Global Scale Information Retrieval Improvements." PhD thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614116/index.pdf.
Повний текст джерелаthe use of Turkish stemmer for indexing and query substitution
and, the use of thumbnails for Web search engine result visualization. As Web search engines have become the primary means for finding and accessing information on the Internet, the effectiveness of Web search engines should be evaluated on the idea of how effectively and efficiently they assist users achieve a query, which defines performance criteria rather than the pure precision and recall measures developed among basic information retrieval roles. In this thesis, we propose three distinguishing features to increase the efficiency of a Web search engine: The impact of link quality and usage information on page importance calculation outperforms classical hyperlink graph based methods notably, such as PageRank. The use of the Turkish stemmer for indexing and query substitution has remarkable improvements on Web relevance when used in a mixed framework with normal and stemmed forms. Finally, we have observed that users are able to find the most relevant results by using webpage thumbnails in the queries with decreased precision score values, despite their preferred search engine gazing behavior is much attributed.
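For comparison with the link-quality and usage-based importance measures discussed above, a minimal PageRank power iteration over a toy link graph looks as follows (the graph is invented and has no dangling pages).

```python
# Minimal PageRank power iteration over a toy link graph.
import numpy as np

links = {  # page -> pages it links to
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
pages = sorted(links)
n = len(pages)
index = {p: i for i, p in enumerate(pages)}

# Column-stochastic transition matrix of the link graph.
M = np.zeros((n, n))
for src, outs in links.items():
    for dst in outs:
        M[index[dst], index[src]] = 1.0 / len(outs)

damping = 0.85
rank = np.full(n, 1.0 / n)
for _ in range(50):  # power iteration
    rank = (1 - damping) / n + damping * M @ rank

print(dict(zip(pages, rank.round(3))))
```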
Rahuma, Awatef. "Semantically-enhanced image tagging system." Thesis, De Montfort University, 2013. http://hdl.handle.net/2086/9494.
De, Groc Clément. "Collecte orientée sur le Web pour la recherche d’information spécialisée." Thesis, Paris 11, 2013. http://www.theses.fr/2013PA112073/document.
Vertical search engines, which focus on a specific segment of the Web, are becoming more and more present in the Internet landscape. Topical search engines, notably, can obtain a significant performance boost by limiting their index to a specific topic. By doing so, language ambiguities are reduced, and both the algorithms and the user interface can take advantage of domain knowledge, such as domain objects or characteristics, to satisfy user information needs. In this thesis, we tackle the first, inevitable step of any topical search engine: focused document gathering from the Web. A thorough study of the state of the art leads us to consider two strategies to gather topical documents from the Web: either relying on an existing search engine index (focused search) or directly crawling the Web (focused crawling). The first part of our research is dedicated to focused search. In this context, a standard approach consists in combining domain-specific terms into queries, submitting those queries to a search engine and downloading the top-ranked documents. After empirically evaluating this approach over 340 topics, we propose to enhance it in two different ways. Upstream of the search engine, we aim at formulating more relevant queries in order to increase the precision of the top retrieved documents. To do so, we define a metric based on a co-occurrence graph and a random walk algorithm, which aims at predicting the topical relevance of a query. Downstream of the search engine, we filter the retrieved documents in order to improve the quality of the document collection. We do so by modelling our gathering process as a tripartite graph and applying a random-walk-with-restart algorithm so as to simultaneously rank by relevance the documents and terms appearing in our corpus. In the second part of this thesis, we turn to focused crawling. We describe our focused crawler implementation, which was designed to scale horizontally. Then, we consider the problem of crawl frontier ordering, which is at the very heart of a focused crawler. Such an ordering strategy allows the crawler to prioritise its fetches, maximising the number of in-domain documents retrieved while minimising the non-relevant ones. We propose to apply learning-to-rank algorithms to efficiently order the crawl frontier, and define a method to learn a ranking function from existing crawls.
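A crude illustration of the crawl-frontier ordering idea: keep candidate URLs in a priority queue keyed by a topical-relevance estimate, so the most promising pages are fetched first. The keyword heuristic below merely stands in for the learning-to-rank model of the thesis, and all URLs and anchor texts are invented.

```python
# Priority-queue ordering of a focused-crawl frontier.
import heapq

DOMAIN_TERMS = {"retrieval", "search", "index", "crawler"}

def relevance(anchor_text: str) -> float:
    # Fraction of anchor-text words that are domain terms (crude heuristic).
    words = set(anchor_text.lower().split())
    return len(words & DOMAIN_TERMS) / (len(words) or 1)

frontier = []  # max-priority queue via negated scores

def enqueue(url, anchor_text):
    heapq.heappush(frontier, (-relevance(anchor_text), url))

def next_url():
    return heapq.heappop(frontier)[1]

enqueue("http://example.org/blog", "holiday photos and recipes")
enqueue("http://example.org/ir", "an index structure for search and retrieval")
enqueue("http://example.org/crawl", "building a web crawler")
print(next_url())  # the most on-topic link is fetched first
```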
Marangon, Sílvio Luís. "Análise de métodos para programação de contextualização." Universidade de São Paulo, 2006. http://www.teses.usp.br/teses/disponiveis/3/3142/tde-14122006-112458/.
Internet services such as news clipping, anti-phishing and anti-plagiarism services, and others that require intensive searching of the Web, face a difficult task because of the huge number of existing pages. Search engines try to address this problem, but their methods retrieve many irrelevant pages, sometimes thousands of them, and more powerful methods are necessary to deal with it. Page content, subject, hyperlinks or location can be used to define the page context and to create a more powerful method that retrieves more relevant pages, improving precision. Classification of page context is defined as the classification of a page by a set of its features. This work presents a study of Web mining, search engines, and the application of web mining technologies to classify page context. Page context classification applied to search engines must solve the problem of the flood of irrelevant pages by allowing search engines to retrieve only pages belonging to a given context.
Moral, Ibrahim Utku. "Publication of the Bibliographies on the World Wide Web." Thesis, Virginia Tech, 1997. http://hdl.handle.net/10919/36748.
Master of Science
He, Hai. "Towards automatic understanding and integration of web databases for developing large-scale unified access systems." Diss., Online access via UMI, 2006.
Lunardi, Marcia Severo. "Visualização em nuvens de texto como apoio à busca exploratória na web." Universidade do Estado do Rio de Janeiro, 2008. http://www.bdtd.uerj.br/tde_busca/arquivo.php?codArquivo=1522.
This dissertation presents the results of a study that evaluates the advantages of text clouds for the visualization of web search results. A text cloud is a visualization technique for texts and textual data in general. Its main purpose is to enhance comprehension of a large body of text by summarizing it automatically, and it is generally applied to manage information overload. While continual improvements in search technology have made it possible to quickly find relevant information on the web, few search engines do anything to organize or summarize the contents of their responses beyond ranking the items in a list. In exploratory searches, users may be forced to scroll through many pages to identify the information they seek and are generally not given any way to visualize the totality of the results returned. This research is divided into two parts. Part one describes the development of an application that generates text clouds to summarize search results from the standard result list provided by the Yahoo search engine. The second part describes the evaluation of this application. Adapted to this specific context, a text cloud is generated from the text of the first sites returned by the search engine according to its relevance algorithms. The benefit of this application is that it enables users to obtain a visual overview of the main results at once. From this overview, users can obtain keywords to navigate to potentially relevant subjects that would otherwise be hidden deep down in the response list. Users can also realize, by viewing the results in context, that their initial query term was not the best choice.
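A minimal sketch of how a text cloud can be derived from result snippets: count terms, drop stop words, and map frequencies to font sizes. The snippets and stop-word list below are invented and much smaller than what the described application uses.

```python
# Toy text-cloud construction from search result snippets.
from collections import Counter

snippets = [
    "jaguar is a large cat native to the americas",
    "the jaguar car company builds luxury cars",
    "habitat loss threatens the jaguar population",
]
stop_words = {"is", "a", "the", "to", "of"}

counts = Counter(w for s in snippets for w in s.lower().split()
                 if w not in stop_words)

def font_size(count, smallest=10, largest=40):
    # Linearly map term frequency to a font size for the cloud.
    low, high = min(counts.values()), max(counts.values())
    if high == low:
        return (smallest + largest) // 2
    return smallest + (largest - smallest) * (count - low) // (high - low)

cloud = {word: font_size(c) for word, c in counts.most_common(10)}
print(cloud)  # e.g. {'jaguar': 40, 'large': 10, ...}
```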
Sahay, Saurav. "Socio-semantic conversational information access." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/42855.
Angelini, Marco. "Un approccio per la concettualizzazione di insiemi di documenti." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2013. http://amslaurea.unibo.it/5604/.
Souza, Jucimar Brito de. "Algoritmos para avaliação de confiança em apontadores encontrados na Web." Universidade Federal do Amazonas, 2009. http://tede.ufam.edu.br/handle/tede/2960.
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Search engines have become an essential tool for web users today. They use algorithms that analyze the linkage relationships between pages in order to estimate the popularity of each page, taking each link as a vote for the quality of the target page. This information is used in search engine ranking algorithms. However, a large number of links found on the Web cannot be considered good quality votes, and instead introduce information that acts as noise for search engine ranking algorithms. This work aims to detect noise in the link structure present in search engine collections. We studied the impact of the methods developed here for detecting noisy links, considering scenarios in which page reputation is computed using the PageRank and Indegree algorithms. The experimental results showed improvements of up to 68.33% in the Mean Reciprocal Rank (MRR) metric for navigational queries and of up to 35.36% for randomly selected navigational queries.
Search engines have become an indispensable tool for Web users. They use link-analysis algorithms to exploit the link structure of the Web and assign a popularity estimate to each page. This information is used to rank the list of answers that search engines return for the queries submitted by their users. However, some types of links harm the quality of the popularity estimate by introducing noisy information, and can thus negatively affect the quality of the answers that search engines provide to their users. Examples of such links include repeated links, links resulting from page duplication, spam, and others. The goal of this work is to detect noise in the link structure present in search engine databases. The impact of the methods developed here for detecting noisy links was studied in scenarios in which page reputation is computed both with the PageRank algorithm and with the Indegree algorithm. The experimental results showed improvements of up to 68.33% in the Mean Reciprocal Rank (MRR) metric for navigational queries and of up to 35.36% for random navigational queries when a search engine uses the PageRank algorithm.
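The Mean Reciprocal Rank metric reported above can be computed as in the following small sketch, which uses invented rankings and judgments.

```python
# Mean Reciprocal Rank: average of 1/rank of the first relevant answer.
def mean_reciprocal_rank(rankings, relevant):
    total = 0.0
    for query, ranked_docs in rankings.items():
        rr = 0.0
        for rank, doc in enumerate(ranked_docs, start=1):
            if doc in relevant[query]:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(rankings)

rankings = {"q1": ["d3", "d1", "d2"], "q2": ["d9", "d7", "d8"]}
relevant = {"q1": {"d1"}, "q2": {"d8"}}
print(mean_reciprocal_rank(rankings, relevant))  # (1/2 + 1/3) / 2 ≈ 0.417
```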
Lisena, Pasquale. "Knowledge-based music recommendation : models, algorithms and exploratory search." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS614.
Representing the information about music is a complex activity that involves different sub-tasks. This thesis manuscript mostly focuses on classical music, researching how to represent and exploit its information. The main goal is the investigation of strategies of knowledge representation and discovery applied to classical music, involving subjects such as Knowledge-Base population, metadata prediction, and recommender systems. We propose a complete workflow for the management of music metadata using Semantic Web technologies. We introduce a specialised ontology and a set of controlled vocabularies for the different concepts specific to music. Then, we present an approach for converting data, in order to go beyond the librarian practice currently in use, relying on mapping rules and interlinking with controlled vocabularies. Finally, we show how these data can be exploited. In particular, we study approaches based on embeddings computed on structured metadata, titles, and symbolic music for ranking and recommending music. Several demo applications have been realised for testing the previous approaches and resources.
Reis, Thiago. "Algoritmo rastreador web especialista nuclear." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/85/85133/tde-07012014-134548/.
Over the last years the Web has grown exponentially, becoming the largest information repository ever created and representing a new and valuable source of potentially useful information on many topics, including nuclear-related themes. However, due to the Web's characteristics and, mainly, because of its huge data volume, finding and retrieving relevant and useful information are non-trivial tasks. This challenge is addressed by web search and retrieval algorithms called web crawlers. This work presents the research and development of a crawler algorithm able to search for and retrieve webpages with nuclear-related textual content in an autonomous and massive fashion. The algorithm was designed under the expert-systems model and thus has a knowledge base containing a list of nuclear topics and the keywords that define them, and an inference engine composed of a multi-layer perceptron artificial neural network that estimates the relevance of webpages to some knowledge-base nuclear topic while searching the Web. Thus, the algorithm is able to autonomously search the Web by following the hyperlinks that interconnect webpages and retrieving those that are more relevant to some predefined nuclear topic, emulating the ability of a nuclear expert to browse the Web and evaluate nuclear information. Preliminary experimental results show a retrieval precision of 80% for the general nuclear domain topic and 72% for the nuclear power topic, indicating that the proposed algorithm is effective and efficient at searching the Web and retrieving nuclear-related information.
Penatti, Otávio Augusto Bizetto 1984. "Estudo comparativo de descritores para recuperação de imagens por conteudo na web." [s.n.], 2009. http://repositorio.unicamp.br/jspui/handle/REPOSIP/276157.
Dissertation (Master's) - Universidade Estadual de Campinas, Instituto de Computação
Resumo (translated from Portuguese): The growing quantity of images currently generated and made available has increased the need for search systems for this kind of information. A promising method for image search is content-based retrieval. This kind of approach considers the visual content of images, such as colour, texture and object shape, for indexing and retrieval. The main component of content-based image search is the image descriptor. The image descriptor is responsible for extracting visual properties from images and storing them in feature vectors. Given two feature vectors, the descriptor compares them and returns a distance value. This value quantifies the difference between the images represented by the vectors. In a content-based image search system, the distance computed by the image descriptor is used to rank the images of the database with respect to a given query image. This dissertation carries out a comparative study of image descriptors considering the Web as the usage scenario. This scenario presents a very large number of images with highly heterogeneous content. The comparative study is conducted under two approaches. The first considers the asymptotic complexity of the feature-vector extraction algorithms and of the descriptors' distance functions, the sizes of the feature vectors generated by the descriptors, and the environment in which each descriptor was originally validated. The second approach compares the descriptors in practical experiments on four different image databases. The descriptors are evaluated in terms of extraction time, distance-computation time, storage requirements and effectiveness. Colour, texture and shape descriptors are compared. The experiments are carried out with each type of descriptor independently and, based on these results, a set of descriptors is evaluated on a database of more than 230 thousand heterogeneous images, which reflects the content found on the Web. The evaluation of descriptor effectiveness on the heterogeneous image database is performed through experiments with real users. This dissertation also presents a tool for the automated execution of comparative tests between image descriptors.
Abstract: The growth in size of image collections and the worldwide availability of these collections has increased the demand for image retrieval systems. A promising approach to address this demand is to retrieve images based on image content (Content-Based Image Retrieval). This approach considers the image visual properties, like color, texture and shape of objects, for indexing and retrieval. The main component of a content-based image retrieval system is the image descriptor. The image descriptor is responsible for encoding image properties into feature vectors. Given two feature vectors, the descriptor compares them and computes a distance value. This value quantifies the difference between the images represented by their vectors. In a content-based image retrieval system, these distance values are used to rank database images with respect to their distance to a given query image. This dissertation presents a comparative study of image descriptors considering the Web as the environment of use. This environment presents a huge amount of images with heterogeneous content. The comparative study was conducted by taking into account two approaches. The first approach considers the asymptotic complexity of feature vectors extraction algorithms and distance functions, the size of the feature vectors generated by the descriptors and the environment where each descriptor was validated. The second approach compares the descriptors in practical experiments using four different image databases. The evaluation considers the time required for features extraction, the time for computing distance values, the storage requirements and the effectiveness of each descriptor. Color, texture, and shape descriptors were compared. The experiments were performed with each kind of descriptor independently and, based on these results, a set of descriptors was evaluated in an image database containing more than 230 thousand heterogeneous images, reflecting the content existent in the Web. The evaluation of descriptors effectiveness in the heterogeneous database was made by experiments using real users. This dissertation also presents a tool for executing experiments aiming to evaluate image descriptors.
Master's
Information Systems
Master in Computer Science
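In the spirit of the descriptor study above, the toy sketch below uses a grey-level histogram as the feature vector, Euclidean distance as the descriptor's distance function, and ranks a tiny invented image database against a query image; real colour, texture and shape descriptors are of course far more elaborate.

```python
# Toy content-based retrieval: histogram descriptor plus distance ranking.
import numpy as np

def histogram_descriptor(image, bins=8):
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / hist.sum()  # normalised feature vector

def distance(a, b):
    return float(np.linalg.norm(a - b))  # descriptor's distance function

rng = np.random.default_rng(1)
database = {f"img{i}": rng.integers(0, 256, size=(32, 32)) for i in range(5)}
query = rng.integers(0, 256, size=(32, 32))

query_vec = histogram_descriptor(query)
ranking = sorted(database,
                 key=lambda name: distance(query_vec,
                                           histogram_descriptor(database[name])))
print(ranking)  # database images ordered by similarity to the query
```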
Synek, Pavel. "Metody vyhledávání informací na webu první a druhé generace." Master's thesis, Vysoká škola ekonomická v Praze, 2009. http://www.nusl.cz/ntk/nusl-165288.
Santos, Célia Francisca dos. "Métodos de poda estática para índices de máquinas de busca." Universidade Federal do Amazonas, 2006. http://tede.ufam.edu.br/handle/tede/2944.
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
In this work, new static pruning methods specifically designed for web search engines are proposed and evaluated experimentally. The methods take into account the locality of term occurrences within documents when pruning search engine indexes and are therefore called "locality-based pruning methods". Four new pruning methods that use locality information are proposed here: two-pass lbpm, full coverage, top fragments, and random. The two-pass lbpm method is the most effective of the locality-based methods, but it requires a full construction of the indexes before the pruning process can be performed. On the other hand, full coverage, top fragments, and random are single-pass methods that prune the indexes without requiring the original indexes to be built beforehand. Single-pass methods are useful in environments where the document base changes continuously, as in large-scale search engines developed for the web. Experiments using a real search engine show that the methods proposed in this work can reduce the storage cost of the indexes by up to 60% while keeping the loss in precision minimal. More importantly, the experimental results indicate that this same 60% reduction in index size can reduce query processing time to almost 57% of the original time. Furthermore, the experiments show that, for conjunctive queries and phrases, the locality-based methods produce better results than Carmel's method, the best method proposed in the literature. For example, using only phrase queries, with a 67% reduction in index size, the locality-based two-pass lbpm method produced results with a similarity degree of 0.71 with respect to the results obtained with the original indexes, while Carmel's method produced results with a similarity degree of only 0.39. The results obtained show that locality-based pruning methods are more effective at preserving the quality of the results provided by search engines.
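The sketch below illustrates static index pruning in its simplest form, keeping only the k highest-scoring postings of each term; it is not one of the locality-based methods proposed in the thesis, just a baseline-style illustration on an invented index.

```python
# Simplest form of static index pruning: keep top-k postings per term.
def prune_index(inverted_index, k=2):
    pruned = {}
    for term, postings in inverted_index.items():
        best = sorted(postings, key=lambda p: p[1], reverse=True)[:k]
        pruned[term] = sorted(best)  # restore document-id order
    return pruned

# toy index: term -> list of (doc_id, score) postings
index = {
    "search": [(1, 0.2), (2, 0.9), (3, 0.4), (4, 0.1)],
    "engine": [(2, 0.5), (5, 0.3)],
}
print(prune_index(index))
# {'search': [(2, 0.9), (3, 0.4)], 'engine': [(2, 0.5), (5, 0.3)]}
```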
Htait, Amal. "Sentiment analysis at the service of book search." Electronic Thesis or Diss., Aix-Marseille, 2019. http://www.theses.fr/2019AIXM0260.
Web technology is in ongoing growth, and a huge volume of data is generated on the social web, where users exchange a variety of information. In addition to the fact that social web text can be rich in information, its writers are often guided by sentiments reflected in their writings. Based on that observation, locating sentiment in a text can play an important role in information extraction. The purpose of this thesis is to improve the book search and recommendation quality of OpenEdition's multilingual Books platform. The Books platform also offers additional information through user-generated content (e.g. book reviews) connected to the books and rich in emotions expressed in the users' writings. Therefore, the analysis above, concerning the location of sentiment in text for information extraction, plays an important role in this thesis and can serve the goal of improving book search quality using the shared user-generated content. Accordingly, the main path followed in this thesis is to combine the fields of sentiment analysis (SA) and information retrieval (IR) for the purpose of improving the quality of book search. Two objectives serve this main purpose: an approach for SA prediction that is easily applicable to different languages and has a low cost in time and annotated data; and new approaches for improving book search quality, based on employing SA in information filtering, retrieval and classification.
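As a rough illustration of combining retrieval with sentiment, the sketch below re-ranks retrieved books by blending a retrieval score with a crude lexicon-based sentiment score computed over user reviews; the lexicon, reviews and scores are all invented and far simpler than the approaches developed in the thesis.

```python
# Toy sentiment-aware re-ranking of retrieved books.
POSITIVE = {"great", "moving", "excellent", "beautiful"}
NEGATIVE = {"boring", "poor", "disappointing"}

def review_sentiment(review: str) -> float:
    words = review.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return score / (len(words) or 1)

def rerank(candidates, alpha=0.7):
    # candidates: list of (title, retrieval_score, list_of_reviews)
    def blended(item):
        title, ret_score, reviews = item
        sent = sum(map(review_sentiment, reviews)) / (len(reviews) or 1)
        return alpha * ret_score + (1 - alpha) * sent
    return sorted(candidates, key=blended, reverse=True)

books = [
    ("Book A", 0.80, ["boring and disappointing plot"]),
    ("Book B", 0.78, ["a great and moving story", "excellent pacing"]),
]
print([title for title, _, _ in rerank(books)])  # Book B overtakes Book A
```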
Silva, Thomaz Philippe Cavalcante. "Uma abordagem evolutiva para combinação de fontes de evidência de relevância em máquinas de busca." Universidade Federal do Amazonas, 2008. http://tede.ufam.edu.br/handle/tede/2966.
CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico
Modern search engines use different strategies to improve the quality of their answers. An important strategy is to obtain a single ordered list of documents based on lists produced by different sources of evidence. This work studies the use of an evolutionary technique to generate good functions for combining three different sources of evidence: the textual content of the documents, the link structure between the documents in a collection, and the concatenation of the anchor texts pointing to each document. The combination functions found in this study were tested on two separate collections: the first contains queries and documents from a real Web search engine with some 12 million documents, and the second is the LETOR reference collection, created to allow fair comparison between methods for learning ranking functions. The experiments indicate that the approach studied here is a practical and effective alternative for combining different sources of evidence into a single list of answers. We also verified that different query classes require different combination functions of the evidence sources, and show that our approach is able to identify good functions.
Modern search engines use different strategies to improve the quality of their answers. An important strategy is to obtain a single ranked list of documents based on lists produced by different sources of evidence. This work studies the use of an evolutionary technique to generate good combination functions for three different sources of evidence: the textual content of the documents, the link structure between the documents of a collection, and the concatenation of the anchor texts pointing to each document. The combination functions discovered in this work were tested on two distinct collections: the first contains queries and documents from a real Web search engine with about 12 million documents, and the second is the LETOR reference collection, created to allow fair comparison between methods for learning ranking functions. The experiments indicate that the approach studied here is a practical and effective alternative for combining different sources of evidence into a single answer list. We also verified that different query classes require different combination functions of the evidence sources and show that our approach is viable for identifying good functions.
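The sketch below conveys the underlying idea of learning a combination of evidence sources: it searches for weights that blend text, link and anchor-text scores so that relevant documents rank highest. A plain random search stands in for the evolutionary technique of the thesis, and all scores and judgments are invented.

```python
# Toy search for evidence-combination weights that maximise average precision.
import random

# per-document scores from three evidence sources, plus a relevance label
docs = [
    # (text, link, anchor, relevant)
    (0.9, 0.1, 0.2, True),
    (0.4, 0.8, 0.7, True),
    (0.8, 0.2, 0.1, False),
    (0.3, 0.3, 0.2, False),
]

def average_precision(weights):
    w1, w2, w3 = weights
    ranked = sorted(docs, key=lambda d: w1 * d[0] + w2 * d[1] + w3 * d[2],
                    reverse=True)
    hits, total = 0, 0.0
    for i, (_, _, _, rel) in enumerate(ranked, start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / hits

random.seed(0)
best = max((tuple(random.random() for _ in range(3)) for _ in range(500)),
           key=average_precision)
print(best, average_precision(best))
```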
Andrade, Julietti de. "Interoperabilidade e mapeamentos entre sistemas de organização do conhecimento na busca e recuperação de informações em saúde: estudo de caso em ortopedia e traumatologia." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/27/27151/tde-29062015-121813/.
This research presents the development of a method for search and information retrieval in specialized databases aimed at the production of scientific knowledge in healthcare, with emphasis on Evidence-Based Health. Different techniques were used, considering the specificities of each stage: exploratory research, the hypothetico-deductive method, and a qualitative empirical case study. The work mobilizes theoretical and methodological foundations from Information Science and Health, applying them to areas such as knowledge organization and information retrieval, the Semantic Web, Evidence-Based Health, and scientific methodology. Two experiments were performed: a case study in Orthopedics and Traumatology to identify and establish criteria for the search, retrieval, organization and selection of information, so that these criteria can become part of the methodology of scientific work in healthcare; and an analysis of the kinds of search, retrieval and mappings over the Knowledge Organization Systems (KOS) available in the Metathesaurus, within the scope of the Unified Medical Language System (UMLS) of the US National Library of Medicine (NLM), and in the BioPortal of the National Center for Biomedical Ontology, both in the biomedical field. The UMLS provides access to 151 KOS, and BioPortal provides a set of 302 ontologies. We present proposals for constructing search strategies using mapped and interoperable Knowledge Organization Systems, as well as for conducting literature searches in the preparation of scientific papers in healthcare.
Craswell, Nicholas Eric. "Methods for Distributed Information Retrieval." PhD thesis, 2000. http://hdl.handle.net/1885/46255.