Dissertations / Theses on the topic 'Web information retrieval'

To see the other types of publications on this topic, follow the link: Web information retrieval.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Web information retrieval.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Plachouras, Vasileios. "Selective web information retrieval." Thesis, University of Glasgow, 2006. http://theses.gla.ac.uk/1945/.

Full text
Abstract:
This thesis proposes selective Web information retrieval, a framework formulated in terms of statistical decision theory, with the aim of applying an appropriate retrieval approach on a per-query basis. The main component of the framework is a decision mechanism that performs this per-query selection. The selection of a particular retrieval approach is based on the outcome of an experiment, which is performed before the final ranking of the retrieved documents. The experiment is a process that extracts features from a sample of the set of retrieved documents. This thesis investigates three broad types of experiments. The first counts the occurrences of query terms in the retrieved documents, indicating the extent to which the query topic is covered in the document collection. The second considers information from the distribution of retrieved documents in larger aggregates of related Web documents, such as whole Web sites, or directories within Web sites. The third estimates the usefulness of the hyperlink structure among a sample of the set of retrieved Web documents. The proposed experiments are evaluated in the context of both informational and navigational search tasks with an optimal Bayesian decision mechanism, where it is assumed that relevance information exists. This thesis further investigates the implications of applying selective Web information retrieval in an operational setting, where the tuning of a decision mechanism is based on limited existing relevance information and the information retrieval system's input is a stream of queries related to mixed informational and navigational search tasks. First, the experiments are evaluated using different training and testing query sets, as well as a mixture of different types of queries.
Second, query sampling is introduced, in order to approximate the queries that a retrieval system receives, and to tune an ad-hoc decision mechanism with a broad set of automatically sampled queries.
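The selective-retrieval idea described in the abstract can be sketched in a few lines: run a cheap pre-retrieval "experiment" on a sample of retrieved documents, then let a decision rule pick a retrieval approach for that query. The coverage feature, the threshold and the two approach names below are illustrative assumptions, not the thesis's actual mechanism.

```python
# Toy sketch of a per-query decision mechanism: an "experiment" extracts a
# feature from a sample of retrieved documents, and a rule selects the
# retrieval approach. Feature, threshold and approach names are assumptions.

def query_term_coverage(query: list[str], sample_docs: list[list[str]]) -> float:
    """Experiment of the first type: fraction of sampled documents that
    contain every query term (how well the collection covers the topic)."""
    hits = sum(all(t in doc for t in query) for doc in sample_docs)
    return hits / len(sample_docs) if sample_docs else 0.0

def select_approach(query, sample_docs, threshold=0.5):
    """High coverage suggests content-only ranking suffices; otherwise fall
    back to an approach that also uses hyperlink evidence."""
    if query_term_coverage(query, sample_docs) >= threshold:
        return "content-based"
    return "content+link-evidence"

print(select_approach(["web", "retrieval"],
                      [["web", "retrieval", "intro"], ["unrelated", "page"]]))  # → content-based
```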
APA, Harvard, Vancouver, ISO, and other styles
2

Robinson, Martin H. "Intelligent information retrieval using web communities." Thesis, University of Ulster, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.424555.

Full text
3

Tomassen, Stein L. "Conceptual Ontology Enrichment for Web Information Retrieval." Doctoral thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, 2011. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-14270.

Full text
Abstract:
Searching for information on the Web can be frustrating. One of the reasons is the ambiguity of words. The work presented in this thesis concentrates on how the effectiveness of standard information retrieval systems can be enhanced with semantic technologies like ontologies. Ontologies are knowledge models that can represent knowledge of any universe of discourse by describing how the concepts of a domain are related. Creating and maintaining ontologies can be tedious and costly. However, we focus on reusing ontologies, rather than engineering them, and on their applicability for improving the retrieval effectiveness of existing search systems. The aim of this work is to find an effective approach for applying ontologies to existing search systems. The basic idea is that these ontologies can be used to tackle the problem of ambiguous words and hence improve retrieval effectiveness. Our approach to semantic search builds on feature vectors (FVs). The basic idea is to connect the (standardised) domain terminology encoded in an ontology to the actual terminology used in a text corpus. Therefore, we propose to associate every ontology entity (classes and individuals are both called entities in this work) with an FV that is tailored to the actual terminology used in a text corpus like the Web. These FVs are created off-line and later used on-line to filter (i.e. to disambiguate) and re-rank the search results from an underlying search system. This pragmatic approach is applicable to existing search systems since it only depends on extending the query and presentation components; in other words, there is no need to alter either the indexing or the ranking components of the existing systems. A set of experiments has been carried out, and the results show an improvement of more than 10%. Furthermore, we have shown that the approach depends neither on highly specific queries nor on a collection comprised only of relevant documents.
In addition, we have shown that the FVs are relatively persistent, i.e. little maintenance of the FVs is required. In this work, we focus on the creation and evaluation of these feature vectors. As a result, part of the contribution of this work is a framework for the construction of FVs. Furthermore, we have proposed a set of metrics to measure the quality of the created FVs. We have also provided a set of guidelines for the optimal construction of feature vectors for different categories of ontologies.
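The off-line/on-line split described in this abstract can be illustrated with a small sketch: each ontology entity carries a feature vector over corpus terminology, and an underlying engine's results are re-ranked by similarity to the FV of the entity a query is disambiguated to. The entity names, toy vectors and cosine scoring below are assumptions for illustration, not the thesis's implementation.

```python
# Minimal sketch of feature-vector (FV) based filtering/re-ranking of
# results from an underlying search system. Entities and weights are toy
# assumptions.
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Off-line: FVs connecting ontology entities to corpus terminology.
entity_fvs = {
    "Jaguar_Car": Counter({"engine": 0.9, "sedan": 0.7, "dealer": 0.5}),
    "Jaguar_Animal": Counter({"predator": 0.9, "jungle": 0.7, "prey": 0.6}),
}

def rerank(results: list[str], entity: str) -> list[str]:
    """On-line: re-rank an engine's results against the FV of the entity
    the ambiguous query was disambiguated to."""
    fv = entity_fvs[entity]
    return sorted(results,
                  key=lambda doc: cosine(Counter(doc.lower().split()), fv),
                  reverse=True)

docs = ["jaguar spotted stalking prey in the jungle",
        "new jaguar sedan engine review from your local dealer"]
print(rerank(docs, "Jaguar_Car")[0])  # the car-related page ranks first
```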
4

Yoo, Seung Yeol (Computer Science & Engineering, Faculty of Engineering, UNSW). "Topic-focused and summarized web information retrieval." Awarded by: University of New South Wales, Computer Science and Engineering, 2007. http://handle.unsw.edu.au/1959.4/26221.

Full text
Abstract:
Since the Web is getting bigger and bigger, with a rapidly increasing number of heterogeneous Web pages, Web users often suffer from two problems: P1) irrelevant information and P2) information overload. Irrelevant information indicates weak relevance between the retrieved information and a user's information need. Information overload indicates that the retrieved information may contain 1) redundant information (e.g., common information between two retrieved Web pages) or 2) too much information to be easily understood by a user. We consider four major causes of these two problems: firstly, ambiguous query terms; secondly, ambiguous terms in a Web page; thirdly, a query and a Web page that cannot be semantically matched, because of the first and second causes; and fourthly, the whole content of a Web page being too coarse a context boundary for measuring the similarity between the Web page and a query. To address problems P1) and P2), we consider the meanings of words in a Web page and a query to be primitive hints for understanding the related semantics of the Web page. Thus, in this dissertation, we developed three cooperative technologies: Word Sense Based Web Information Retrieval (WSBWIR), the Subjective Segment Importance Model (SSIM) and Topic Focused Web Page Summarization (TFWPS). WSBWIR 1) allows a user to describe their information needs at sense level and 2) provides one way for users to conceptually explore information existing within Web pages. SSIM discovers a semantic structure of a Web page; a semantic structure respects not only Web page authors' logical presentation structures but also a user's specific topic interests in the Web pages at query time. TFWPS dynamically generates extractive summaries respecting a user's topic interests. The WSBWIR, SSIM and TFWPS technologies were implemented and tested through several case studies and classification and clustering tasks.
Our experiments demonstrated that 1) exploration of Web pages using word senses is comparably effective, and 2) the segments partitioned by SSIM and the summaries generated by TFWPS can provide more topically coherent features for classification and clustering purposes.
5

Zayour, Iyad. "Information retrieval over the World Wide Web." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/mq22023.pdf.

Full text
6

Lee, Kwok-wai Joseph, and 李國偉. "Information retrieval on the world wide web." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2001. http://hub.hku.hk/bib/B42576192.

Full text
7

Lee, Kwok-wai Joseph. "Information retrieval on the world wide web." Click to view the E-thesis via HKUTO, 2001. http://sunzi.lib.hku.hk/hkuto/record/B42576192.

Full text
8

Lewandowski, Dirk. "Web Searching, Search Engines and Information Retrieval." IOS Press, 2005. http://hdl.handle.net/10150/106395.

Full text
Abstract:
This article discusses Web search engines: mainly the challenges in indexing the World Wide Web, user behaviour, and the ranking factors used by these engines. Ranking factors are divided into query-dependent and query-independent factors, the latter of which have become more and more important in recent years. The potential of these factors is limited, however, particularly of those based on the widely used link-popularity measures. The article concludes with an overview of factors that should be considered in determining the quality of Web search engines.
9

Craswell, Nicholas Eric. "Methods for Distributed Information Retrieval." The Australian National University, Faculty of Engineering and Information Technology, 2001. http://thesis.anu.edu.au./public/adt-ANU20020315.142540.

Full text
Abstract:
Published methods for distributed information retrieval generally rely on cooperation from search servers. But most real servers, particularly the tens of thousands available on the Web, are not engineered for such cooperation. This means that the majority of methods proposed, and evaluated in simulated environments of homogeneous cooperating servers, are never applied in practice.

This thesis introduces new methods for server selection and results merging. The methods do not require search servers to cooperate, yet are as effective as the best methods which do. Two large experiments evaluate the new methods against many previously published methods. In contrast to previous experiments, they simulate a Web-like environment, where servers employ varied retrieval algorithms and tend not to sub-partition documents from a single source.

The server selection experiment uses pages from 956 real Web servers, three different retrieval systems and TREC ad hoc topics. Results show that a broker using queries to sample servers' documents can perform selection over non-cooperating servers without loss of effectiveness. However, using the same queries to estimate the effectiveness of servers, in order to favour servers with high-quality retrieval systems, did not consistently improve selection effectiveness.

The results merging experiment uses documents from five TREC sub-collections, five different retrieval systems and TREC ad hoc topics. Results show that a broker using a reference set of collection statistics, rather than relying on cooperation to collate true statistics, can perform merging without loss of effectiveness. Since application of the reference statistics method requires that the broker download the documents to be merged, experiments were also conducted on effective merging based on partial documents. The new ranking method developed was not highly effective on partial documents, but showed some promise on fully downloaded documents.

Using the new methods, an effective search broker can be built, capable of addressing any given set of available search servers, without their cooperation.
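The reference-statistics merging idea can be sketched as follows: instead of collating true statistics from cooperating servers, the broker scores every downloaded document against one shared reference table of document frequencies, which makes scores comparable across servers. The reference table, documents and tf-idf form below are toy assumptions, not the thesis's exact method.

```python
# Sketch of merging results from non-cooperating servers via a single
# reference set of collection statistics. Numbers and documents are toy
# assumptions for illustration.
from math import log

# Reference statistics: document frequencies from one reference collection,
# standing in for the unavailable true statistics of each server.
REF_N = 1000
ref_df = {"web": 400, "retrieval": 50, "distributed": 20}

def ref_score(doc_terms: list[str], query: list[str]) -> float:
    """tf-idf computed only from the broker's reference statistics, so
    scores from different servers are directly comparable."""
    return sum(doc_terms.count(t) * log(REF_N / ref_df.get(t, 1)) for t in query)

def merge(server_results: dict[str, list[list[str]]], query: list[str]):
    """Re-score every downloaded document with ref_score and interleave
    into one ranking of (score, server, rank-on-server) tuples."""
    scored = [(ref_score(doc, query), server, i)
              for server, docs in server_results.items()
              for i, doc in enumerate(docs)]
    return sorted(scored, reverse=True)

results = {
    "serverA": [["web", "pages", "web"], ["distributed", "retrieval", "web"]],
    "serverB": [["retrieval", "of", "web", "documents"]],
}
merged = merge(results, ["distributed", "retrieval"])
```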
10

He, Bing. "Efficient information retrieval from the World Wide Web." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp01/MQ33938.pdf.

Full text
11

Stokoe, Christopher. "Automated word sense disambiguation for Web information retrieval." Thesis, University of Sunderland, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.408881.

Full text
12

Al-Shannaq, Moy'awiah Abdulla. "ALGORITHMS FOR ENHANCING INFORMATION RETRIEVAL USING SEMANTIC WEB." Kent State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=kent1437935877.

Full text
13

Bracamonte, Nole Teresa Jacqueline. "Improving web multimedia information retrieval using social data." Tesis, Universidad de Chile, 2018. http://repositorio.uchile.cl/handle/2250/168681.

Full text
Abstract:
Thesis submitted for the degree of Doctor of Science, Computer Science (Universidad de Chile).
Searching for multimedia content is one of the most common tasks users perform on the Web. Web search engines have improved the precision of their multimedia searches and now provide a better user experience. However, these engines still fail to obtain precise results for uncommon queries and for queries that refer to abstract concepts. In both scenarios, the main reason is the lack of prior information. This thesis focuses on improving Web multimedia information retrieval using data generated by the interaction between users and multimedia resources. To that end, it proposes improving multimedia retrieval from two perspectives: (1) extracting concepts relevant to multimedia resources, and (2) improving multimedia descriptions with user-generated data. In both cases, we propose systems that work independently of the multimedia type and of the language of the input data. Regarding the identification of concepts related to multimedia objects, we developed a system that goes from query-specific search results to the concepts detected for that query. Our approach demonstrates that we can exploit a partial view of a large collection of multimedia documents to detect concepts relevant to a given query. In addition, we designed a user-based evaluation showing that our concept-detection algorithm is more robust than other, similar approaches based on community detection. To improve multimedia descriptions, we developed a system that combines the audio-visual content of multimedia documents with information from their context to improve and generate new annotations for multimedia documents.
Specifically, we extract click data from query logs and use the queries as surrogates for manual annotations. After a first inspection, we show that queries provide a concise description of multimedia documents. The main objective of this thesis is to demonstrate the relevance of the context associated with multimedia documents for improving the process of retrieving multimedia documents on the Web. We also show that graphs provide a natural way to model multimedia problems.
Funding: Fondef D09I-1185; CONICYT-PCHA/Doctorado Nacional/2013-63130260; short-stay support from the Graduate School of the Universidad de Chile; and the Millennium Nucleus CIWS.
14

Nagypál, Gábor. "Possibly imperfect ontologies for effective information retrieval." Karlsruhe : Univ.-Verl. Karlsruhe, 2007. http://d-nb.info/986790028/34.

Full text
15

Limbu, Dilip Kumar. "Contextual information retrieval from the WWW." Click here to access this resource online, 2008. http://hdl.handle.net/10292/450.

Full text
Abstract:
Contextual information retrieval (CIR) is a critical technique for today's search engines in terms of facilitating queries and returning relevant information. Despite its importance, little progress has been made in its application, due to the difficulty of capturing and representing contextual information about users. This thesis details the development and evaluation of the contextual SERL search, designed to tackle some of the challenges associated with CIR from the World Wide Web. The contextual SERL search utilises a rich contextual model that exploits implicit and explicit data to modify queries to more accurately reflect the user's interests, as well as to continually build the user's contextual profile and a shared contextual knowledge base. These profiles are used to filter results from a standard search engine to improve the relevance of the pages displayed to the user. The contextual SERL search has been tested in an observational study that captured both qualitative and quantitative data about the ability of the framework to improve the user's web search experience. A total of 30 subjects, with different levels of search experience, participated in the observational study experiment. The results demonstrate that when the contextual profile and the shared contextual knowledge base are used, the contextual SERL search improves search effectiveness, efficiency and subjective satisfaction. Effectiveness improves because subjects entered fewer queries to reach the target information than with the contemporary search engine. In the case of a particularly complex search task, efficiency improves because subjects browsed fewer hits, visited fewer URLs, made fewer clicks and took less time to reach the target information than with the contemporary search engine.
Finally, subjects expressed a higher degree of satisfaction with the quality of contextual support when using the shared contextual knowledge base than when using their contextual profile alone. These results suggest that the integration of a user's contextual factors and information-seeking behaviours is very important for the successful development of the CIR framework. It is believed that this framework and other similar projects will help provide the basis for the next generation of contextual information retrieval from the Web.
16

Immaneni, Trivikram. "A HYBRID APPROACH TO RETRIEVING WEB DOCUMENTS AND SEMANTIC WEB DATA." Wright State University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=wright1199923822.

Full text
17

Sütö, Mihály. "Ortsbasierter Web-Zugriff [Location-based Web access]." [S.l. : s.n.], 2002. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB10361111.

Full text
18

Knight, Shirlee-ann. "User perceptions of information quality in world wide web information retrieval behaviour." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2007. https://ro.ecu.edu.au/theses/316.

Full text
Abstract:
In less than a generation, the World Wide Web has grown from a relatively small cyber playground of academic "geeks" into an 11.5-billion-page, heterogeneous, interconnected network of information and collective knowledge. As an information environment, the World Wide Web is informatically representative of all that is good and bad about the human need to both absorb and transmit knowledge. The 'open' nature of the Web makes instantly available to anyone who can "log on" a boundless digital library of information, the quality of which cannot be enforced before, during, or even after its publication. Scrutiny of Information Quality (IQ) is therefore left up to those publishers conscientious enough to care about the quality of the information they produce and the users who choose to employ the Web as an information retrieval tool. The following thesis is a qualitative investigation of how users of information make value judgments about the information they encounter and retrieve from the Web. Specifically, it examines perceptions of IQ from the perspective of eighty "academic" high-end users, who regularly engage the Web and its search engines to search for and retrieve high-quality information related to their research, teaching and learning. The investigation adopted an inductive approach in the qualitative analysis of quantitative data (10,080 separate pieces of user data) in the context of such established frameworks as Davis' (1986, 1989) Technology Acceptance Model (TAM), and Wang & Strong's (1996) contextual IQ framework, which conceptualised dimensions of quality into four IQ categories, namely: intrinsic; representational; contextual; and accessibility IQ.
Through detailed analysis of the driving theory behind these and other associated models of (1) user IT acceptance, (2) Information Seeking Behaviour (ISB), and (3) the multi-dimensional characteristics of IQ, the researcher has sought to find synergies and develop an innovative framework by which to explore the impact of users' attitudes, expectations and perceptions of IQ on their Web information retrieval behaviours. The findings associated with the thesis are consistent with the proposal of a new Ongoing Technology Acceptance Model (OTAM), which facilitates the measurement of users' perception of the predictability of their technology interactions, and has the capacity to more accurately investigate users' individual differences. Importantly, the OTAM allows the constructs of the original TAM, along with a new construct, "Perception of Interaction" (PoI), to be used to investigate users' ongoing use of technologies. Findings associated with user perceptions of information quality are also explored and discussed in relation to a proposed life-cycle model of IQ.
19

U, Leong Hou. "Web image clustering and retrieval." Thesis, University of Macau, 2005. http://umaclib3.umac.mo/record=b1445902.

Full text
20

Hess, Martin. "Verteiltes Information-Retrieval für nicht-kooperative Suchserver im WWW [Distributed information retrieval for non-cooperative search servers on the WWW]." [S.l. : s.n.], 2002. http://deposit.ddb.de/cgi-bin/dokserv?idn=965186687.

Full text
21

種市, 淳子 (Junko Taneichi), and 逸村, 裕 (Hiroshi Itsumura). "エンドユーザーのWeb探索行動 [End users' Web searching behaviour]." 三田図書館・情報学会 (Mita Society for Library and Information Science), 2006. http://hdl.handle.net/2237/92.

Full text
22

Jakob, Mihály. "Ortsbezug für Web-Inhalte [Location references for Web content]." [S.l. : s.n.], 2003. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB10605156.

Full text
23

Wang, Jiying. "Information extraction and integration for Web databases /." View abstract or full-text, 2004. http://library.ust.hk/cgi/db/thesis.pl?COMP%202004%20WANGJ.

Full text
Abstract:
Thesis (Ph. D.)--Hong Kong University of Science and Technology, 2004.
Includes bibliographical references (leaves 112-118). Also available in electronic version. Access restricted to campus users.
24

Balluck, Ashwinkoomarsing. "Optimising Information Retrieval from the Web in Low-bandwidth Environments." Thesis, University of Cape Town, 2007. http://pubs.cs.uct.ac.za/archive/00000397/.

Full text
Abstract:
The Internet has the potential to deliver information to Web users who have no other way of getting to those resources. However, information on the Web is scattered, without any proper semantics for classifying it, and this makes information discovery difficult. Thus, to ease the querying of this huge store of information, developers have built tools, among which are search engines and Web directories. However, for these tools to give optimal results, two factors need to be given due importance: the users' ability to use these tools and the bandwidth that is present in these environments. Unfortunately, an initial study showed that neither of these two factors was present in Mauritius, where low bandwidth prevails. Hence, this study helps us get a better idea of how users use search tools. To achieve this, we designed a survey in which Web users were asked about their skills in using search tools. Then, a jump page using the search boxes of different search engines was developed to provide directed guidance for effective searching in low-bandwidth environments. We then conducted a further evaluation, using a sample of users, to see if there were any changes in the way users access the search tools. The results from this study were then examined. We noticed that users were initially unaware of the specificities of the different search tools, preventing efficient use. However, during the survey, they were educated on how to use those tools, and this proved fruitful when a further evaluation was performed. Hence the efficient use of the search tools helped in reducing traffic flow in low-bandwidth environments.
25

Tan, Kok Fong. "Extending information retrieval system model to improve interactive web searching." Thesis, Middlesex University, 2005. http://eprints.mdx.ac.uk/8027/.

Full text
Abstract:
The research set out with the broad objective of developing new tools to support Web information searching. A survey showed that a substantial number of interactive search tools were being developed, but that there was little work on how these new developments fitted into the general aim of helping people find information. Because of this, it proved difficult to compare and analyse how tools help and affect users, and where they belong in a general scheme of information search tools. A key reason for the lack of better information searching tools was identified in the ill-suited nature of existing information retrieval system models. The traditional information retrieval model is extended by synthesising work in information retrieval and information seeking research. The purpose of this new holistic search model is to assist information system practitioners in identifying, hypothesising, designing and evaluating Web information searching tools. Using the model, a term relevance feedback tool called 'Tag and Keyword' (TKy) was developed in a Web browser, and it was hypothesised that it could improve query reformulation and reduce unnecessary browsing. The tool was evaluated in a laboratory experiment, and quantitative analysis showed statistically significant increases in query reformulation and reductions in Web browsing (per query). Subjects were interviewed after the experiment, and qualitative analysis revealed that they found the tool useful and time-saving. Interestingly, exploratory analysis of the collected data identified three different ways in which subjects had utilised the TKy tool. The research developed a holistic search model for Web searching and demonstrated that it can be used to hypothesise, design and evaluate information searching tools. Information system practitioners using it can better understand the context in which their search tools are developed and how these relate to users' search processes and other search tools.
26

Tsikrika, Theodora. "Combination of evidence for relevance criteria in web information retrieval." Thesis, Queen Mary, University of London, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.515454.

Full text
27

Dreilinger, Daniel Ethan 1970. "Scale free information retrieval : visually searching and navigating the web." Thesis, Massachusetts Institute of Technology, 1998. http://hdl.handle.net/1721.1/61097.

Full text
28

Tang, Ling-Xiang. "Link discovery for Chinese/English cross-language web information retrieval." Thesis, Queensland University of Technology, 2012. https://eprints.qut.edu.au/58416/1/Ling-Xiang_Tang_Thesis.pdf.

Full text
Abstract:
Nowadays people rely heavily on the Internet for information and knowledge. Wikipedia is an online multilingual encyclopaedia that contains a very large number of detailed articles covering most written languages. It is often considered to be a treasury of human knowledge. It includes extensive hypertext links between documents of the same language for easy navigation. However, the pages in different languages are rarely cross-linked except for direct equivalent pages on the same subject in different languages. This can pose serious difficulties to users seeking information or knowledge from different lingual sources, or where there is no equivalent page in one language or another. In this thesis, a new information retrieval task, cross-lingual link discovery (CLLD), is proposed to tackle the problem of the lack of cross-lingual anchored links in a knowledge base such as Wikipedia. In contrast to traditional information retrieval tasks, cross-language link discovery algorithms actively recommend a set of meaningful anchors in a source document and establish links to documents in an alternative language. In other words, cross-lingual link discovery is a way of automatically finding hypertext links between documents in different languages, which is particularly helpful for knowledge discovery in different language domains. This study is specifically focused on Chinese / English link discovery (C/ELD). Chinese / English link discovery is a special case of the cross-lingual link discovery task. It involves tasks including natural language processing (NLP), cross-lingual information retrieval (CLIR) and cross-lingual link discovery. To justify the effectiveness of CLLD, a standard evaluation framework is also proposed. The evaluation framework includes topics, document collections, a gold standard dataset, evaluation metrics, and toolkits for run pooling, link assessment and system evaluation.
With the evaluation framework, the performance of CLLD approaches and systems can be quantified. This thesis contributes to the research on natural language processing and cross-lingual information retrieval in CLLD: 1) a new simple but effective Chinese segmentation method, n-gram mutual information, is presented for determining the boundaries of Chinese text; 2) a voting mechanism for named entity translation is demonstrated, achieving high precision in English / Chinese machine translation; 3) a link mining approach that mines the existing link structure for anchor probabilities achieves encouraging results in suggesting cross-lingual Chinese / English links in Wikipedia. This approach was examined in the experiments on better, automatic generation of cross-lingual links that were carried out as part of the study. The overall major contribution of this thesis is the provision of a standard evaluation framework for cross-lingual link discovery research. It is important in CLLD evaluation to have this framework, which helps in benchmarking the performance of various CLLD systems and in identifying good CLLD realisation approaches. The evaluation methods and the evaluation framework described in this thesis have been utilised to quantify system performance in the NTCIR-9 Crosslink task, which is the first information retrieval track of its kind.
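One reading of the n-gram mutual information idea mentioned in the abstract can be sketched as follows: compute pointwise mutual information between adjacent characters and place a word boundary wherever the association falls below a threshold. The toy corpus, the character-bigram granularity and the threshold value are illustrative assumptions, not the thesis's exact formulation.

```python
# Illustrative sketch of Chinese boundary detection via pointwise mutual
# information (PMI) between adjacent characters. Corpus and threshold are
# toy assumptions.
from collections import Counter
from math import log2

corpus = "信息检索信息检索网络检索信息系统网络系统检索系统"
uni = Counter(corpus)                                   # character counts
bi = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))  # bigram counts
N = len(corpus)

def pmi(a: str, b: str) -> float:
    """PMI of the character bigram ab; a high value suggests the pair
    coheres as part of one word."""
    if not bi[a + b]:
        return float("-inf")
    return log2((bi[a + b] / (N - 1)) / ((uni[a] / N) * (uni[b] / N)))

def segment(text: str, threshold: float = 2.0) -> list[str]:
    """Insert a boundary wherever adjacent characters associate weakly."""
    words, current = [], text[0]
    for a, b in zip(text, text[1:]):
        if pmi(a, b) >= threshold:
            current += b          # strong association: extend current word
        else:
            words.append(current)  # weak association: word boundary
            current = b
    words.append(current)
    return words

print(segment("网络检索"))  # → ['网络', '检索']
```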
APA, Harvard, Vancouver, ISO, and other styles
29

Nde, Matulová Hana. "Crosswalks between the deep web and the surface web." Hamburg Kovač, 2008. http://d-nb.info/991114655/04.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Zakos, John, and n/a. "A Novel Concept and Context-Based Approach for Web Information Retrieval." Griffith University. School of Information and Communication Technology, 2005. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20060303.104937.

Full text
Abstract:
Web information retrieval is a relatively new research area that has attracted a significant amount of interest from researchers around the world since the emergence of the World Wide Web in the early 1990s. The problems facing successful web information retrieval are a combination of challenges that stem from traditional information retrieval and challenges characterised by the nature of the World Wide Web. The goal of any information retrieval system is to fulfil a user's information need. In a web setting, this means retrieving as many relevant web documents as possible in response to an input query that typically contains only a few terms expressing the user's information need. This thesis is primarily concerned with firstly reviewing pertinent literature related to various aspects of web information retrieval research and secondly proposing and investigating a novel concept and context-based approach. The approach consists of techniques that can be used together or independently and aim to provide an improvement in retrieval accuracy over other approaches. A novel concept-based term weighting technique is proposed as a new method of deriving query term significance from ontologies that can be used for the weighting of input queries. A technique that dynamically determines the significance of terms occurring in documents based on the matching of contexts is also proposed. Other contributions of this research include techniques for the combination of document and query term weights for the ranking of retrieved documents. All techniques were implemented and tested on benchmark data. This provides a basis for comparison with previous top-performing web information retrieval systems. High retrieval accuracy is reported as a result of utilising the proposed approach. This is supported through comprehensive experimental evidence and favourable comparisons against previously published results.
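One rough illustration of deriving query term significance from an ontology, of the kind the abstract describes, is to weight each term by the depth of its concept in an is-a hierarchy, on the assumption that more specific concepts are more discriminative. The toy hierarchy, the depth-based weighting and the default for unknown terms below are assumptions, not the thesis's actual formula.

```python
def concept_depth(term, parent_of):
    """Number of is-a links from the term's concept up to the root,
    in an ontology represented as a concept -> parent mapping."""
    depth = 0
    node = term
    while parent_of.get(node) is not None:
        node = parent_of[node]
        depth += 1
    return depth

def concept_weights(query_terms, parent_of):
    """Weight each query term in proportion to its concept depth,
    normalised to sum to 1; terms absent from the ontology are given
    a nominal depth of 1 rather than being dropped."""
    depths = [concept_depth(t, parent_of) if t in parent_of else 1
              for t in query_terms]
    total = sum(depths) or 1
    return {t: d / total for t, d in zip(query_terms, depths)}
```

For example, in a hierarchy spaniel &lt; dog &lt; animal &lt; entity, the query ["spaniel", "animal"] would weight "spaniel" three times as heavily as "animal", reflecting its greater specificity.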
APA, Harvard, Vancouver, ISO, and other styles
31

Zakos, John. "A Novel Concept and Context-Based Approach for Web Information Retrieval." Thesis, Griffith University, 2005. http://hdl.handle.net/10072/365878.

Full text
Abstract:
Web information retrieval is a relatively new research area that has attracted a significant amount of interest from researchers around the world since the emergence of the World Wide Web in the early 1990s. The problems facing successful web information retrieval are a combination of challenges that stem from traditional information retrieval and challenges characterised by the nature of the World Wide Web. The goal of any information retrieval system is to fulfil a user's information need. In a web setting, this means retrieving as many relevant web documents as possible in response to an input query that typically contains only a few terms expressing the user's information need. This thesis is primarily concerned with firstly reviewing pertinent literature related to various aspects of web information retrieval research and secondly proposing and investigating a novel concept and context-based approach. The approach consists of techniques that can be used together or independently and aim to provide an improvement in retrieval accuracy over other approaches. A novel concept-based term weighting technique is proposed as a new method of deriving query term significance from ontologies that can be used for the weighting of input queries. A technique that dynamically determines the significance of terms occurring in documents based on the matching of contexts is also proposed. Other contributions of this research include techniques for the combination of document and query term weights for the ranking of retrieved documents. All techniques were implemented and tested on benchmark data. This provides a basis for comparison with previous top-performing web information retrieval systems. High retrieval accuracy is reported as a result of utilising the proposed approach. This is supported through comprehensive experimental evidence and favourable comparisons against previously published results.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Information and Communication Technology
Full Text
APA, Harvard, Vancouver, ISO, and other styles
32

El-Khalili, Nuha H. "Surgical training on the World Wide Web." Thesis, University of Leeds, 1999. http://etheses.whiterose.ac.uk/643/.

Full text
Abstract:
The World Wide Web as a repository of information has had a great influence on our lives. This influence is increasing as the web introduces applications in addition to information. These applications have several advantages, such as worldwide accessibility, distance group learning and collaboration. Furthermore, the web encourages training applications since it offers multimedia that can support all stages of training. On the other hand, virtual reality technology has been utilised to provide new systematic training methods for surgical procedures. These solutions are usually expensive in terms of cost and computation. In this thesis we propose a novel solution to fulfil the training needs of radiologists performing one type of minimally invasive surgery known as interventional radiology. Our training method combines the capabilities of virtual reality, which provides a realistic simulation environment, with the web environment, which provides a platform-independent, scalable and accessible system. In this thesis we analyse this type of surgical procedure in order to deduce the training requirements of such an application. Then, we investigate the possibility of fulfilling these requirements within the server-client architecture of the web environment. We study the degree to which current web technologies, such as Java and VRML, can support the development of a three-dimensional virtual environment with complex interactions. Furthermore, we study the plausibility of providing a computationally demanding behaviour-modelling training environment on the web by utilising physically-based modelling techniques. We also discuss the effect of adopting the web environment on fulfilling the virtual reality and training requirements of our system. Finally, we evaluate the resulting system to find out how useful the proposed solution is from the clinical point of view.
APA, Harvard, Vancouver, ISO, and other styles
33

Wilhelm-Stein, Thomas. "Information Retrieval in der Lehre." Doctoral thesis, Universitätsbibliothek Chemnitz, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-qucosa-199778.

Full text
Abstract:
Information retrieval has achieved great significance, particularly in the form of Internet search engines. Retrieval systems are used in a variety of search scenarios, including corporate support databases, but also for the organisation of personal e-mails. A current challenge is to determine and predict the performance of individual components of these retrieval systems, and in particular the complex interactions between them. Professionals are needed for the implementation and configuration of retrieval systems and retrieval components. Using the web-based learning application Xtrieval Web Lab, students can gain practical knowledge of the information retrieval process by assembling retrieval components into a retrieval system and evaluating it, without having to use a programming language. Game mechanics guide the students in their discovery process, motivate them and prevent information overload by partitioning the learning content.
APA, Harvard, Vancouver, ISO, and other styles
34

Pitkow, James Edward. "Characterizing world wide web ecologies." Diss., Georgia Institute of Technology, 1997. http://hdl.handle.net/1853/8243.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Tabatabai, Diana. "Modeling information-seeking expertise on the Web." Thesis, McGill University, 2002. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=38522.

Full text
Abstract:
Searching for information pervades a wide spectrum of human activity, including learning and problem solving. With recent changes in the amount of information available and the variety of means of retrieval, there is even more need to understand why some searchers are more successful than others. This study was undertaken to advance our understanding of expertise in seeking information on the Web by identifying strategies and attributes that increase the chance of a successful search on the Web. A model illustrating the relationship between strategies, attributes and a successful search was also created. The strategies were: Evaluation, Navigation, Affect, Metacognition, Cognition, and Prior knowledge. Attributes included Age, Sex, Years of experience, Computer knowledge, and Info-seeking knowledge. Success was defined as finding a target topic within 30 minutes. Participants were from three groups. Novices were 10 undergraduate pre-service teachers who were trained in pedagogy but not specifically in information seeking. Intermediates were nine final-year master's students who had received training on how to search but typically had not put their knowledge into extensive practice. Experts were 10 highly experienced professional librarians working in a variety of settings including government, industry, and university. Participants' verbal protocols were transcribed verbatim into a text file and coded. These codes, along with Internet temporary files, a background questionnaire, and a post-task interview were the sources of the data. Since the variable of interest was the time to finding the topic, in addition to ANOVA and Pearson correlation, survival analysis was used to explore the data. The most significant differences in patterns of search between novices and experts were found in the Cognitive, Metacognitive, and Prior Knowledge strategies. Based on the fitted survival model, Typing Keyword, Criteria to evaluate sites, and Information-Seeking Knowledge
APA, Harvard, Vancouver, ISO, and other styles
36

Chang, Andrew Yee. "A web accessible clinical patient information networked system." CSUSB ScholarWorks, 2006. https://scholarworks.lib.csusb.edu/etd-project/2980.

Full text
Abstract:
Developed with the intention to make the patient data storage system in the clinical outpatient area more efficient, this system stores all pertinent and relevant patient data such as lab results, patient history and X-ray images. The system is accessible via the internet as well as operable over a local area network (LAN). The intended audience for this program is essentially the clinical staff (e.g., physicians, nursing staff, secretarial staff). The computer program was developed using Java Server Pages (JSP) and utilizes the Oracle 9i database.
APA, Harvard, Vancouver, ISO, and other styles
37

Wagner, Wiebke. "Semantische Agenten im Information Retrieval eine Studie über Semantic Web-Technologien /." [S.l. : s.n.], 2007. http://nbn-resolving.de/urn:nbn:de:bsz:16-opus-85230.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Liu, Qian. "Mining the Web to support Web image retrieval and image annotation." Thesis, University of Macau, 2007. http://umaclib3.umac.mo/record=b1677226.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Asonov, Dmitri. "Querying databases privately : a new approach to private information retrieval /." Berlin : Springer, 2004. http://springerlink.metapress.com/openurl.asp?genre=issue&issn=0302-9743&volume=3128.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Ahmed, S. M. Zabed. "A user-centered design of a web-based interface to bibliographic databases." Thesis, Loughborough University, 2002. https://dspace.lboro.ac.uk/2134/6893.

Full text
Abstract:
This thesis reports results of a research study into the usefulness of a user-centred approach for designing information retrieval interfaces. The main objective of the research was to examine the usability of an existing Web-based IR system in order to design a user-centred prototype Web interface. A series of usability experiments was carried out with the Web of Science. The first experiment was carried out using both novice and experienced users to observe their performance and satisfaction with the interface. A set of search tasks was obtained from a user survey and was used in the study. The results showed that there were no significant differences in the time taken to complete the tasks, and the number of different search terms used between the two search groups. Novice users were significantly more satisfied with the interface than the experienced group. However, the experienced group was significantly more successful, and made fewer errors than the novice users. The second experiment was conducted on novices' learning and retention with the Web of Science using the same equipment, tasks and environment. The results of the original learning phase of the experiment showed that novices could readily pick up interface functionality when brief training was provided. However, their retention of search skills weakened over time. Their subjective satisfaction with the interface also diminished from learning to retention. These findings suggested that the fundamental difficulties of searching IR systems still remain with the Web-based version. A heuristic evaluation was carried out to identify the usability problems in the Web of Science interface. Three human factors experts evaluated the interface. The heuristic evaluation was very helpful in identifying some interface design issues for Web IR systems. The most fundamental of these was increasing the match between the system and the real world.
The results of both the usability testing and the heuristic evaluations served as a baseline for designing a prototype Web interface. The prototype was designed based on a conceptual model of users' information seeking. Various usability evaluation methods were used to test the usability of the prototype system. After each round of testing, the interface was modified in accordance with the test findings. A summative evaluation of the prototype interface showed that both novice and experienced users improved their search performance. Comparative analysis with the earlier usability studies also showed significant improvements in performance and satisfaction with the prototype. These results show that user-centred methods can yield better interface design for IR systems.
APA, Harvard, Vancouver, ISO, and other styles
41

Mooney, Gabrielle Joanne. "Intelligent information retrieval from the World Wide Web using fuzzy user modelling." Thesis, De Montfort University, 1999. http://hdl.handle.net/2086/10685.

Full text
Abstract:
This thesis investigates the application of fuzzy logic techniques and user modelling to the process of information retrieval (IR) from the World Wide Web (WWW). The research issue is whether this process can be improved through such an application. The exponential rise of information itself as an invaluable global commodity, coupled with accelerating development in computing and telecommunications, and boosted by networked information sources such as the WWW, has led to the development of tools, such as search engines, to facilitate information search and retrieval. However, despite their sophistication, they are unable effectively to address users' information needs. Also, as the WWW can be seen as a dynamic, continuously changing global information corpus, these tools suffer from the problems of irrelevancy and redundancy. Therefore, in order to overcome these problems and remain effective, IR systems need to become 'intelligent' in some way. It is from this premise that the focus of this research has developed. Initially, theoretical and investigative research into the areas of IR from electronic sources and the nature of the Internet (including the WWW) revealed that highly sophisticated systems are being developed and there is a drive towards the integration of, for example, electronic libraries, CD-ROM networks, and the WWW. Research into intelligent IR, the use of AI techniques to improve the IR process, informed an evaluation of various approaches. This revealed that a number of techniques, for example, expert systems, neural networks and semantic networks, have been employed, with limited success. Owing to the nature of the WWW, though, many of the previous AI approaches are inapplicable as they rely too much on extensive knowledge of the retrieval corpus. However, the evaluation suggested that fuzzy logic, with its inherent ability to capture partial knowledge within fuzzy sets, is a valid approach. 
User modelling research indicated that adaptive user stereotypes are a fruitful way to represent different types of user and their information needs. Here, these stereotypes are represented as fuzzy sets, ensuring flexibility and adaptivity. The goal of the reported research, then, was not to develop an 'intelligent agent' but to apply fuzzy logic techniques and user modelling to the process of user query formulation, in order to test the research issue. This issue was whether the application of these techniques could improve the IR process. A prototype system, the Fuzzy Modelling Query Assistant (FMQA), was developed that attempts intelligently to assist the user in capturing their information need. The concept was to refine the user's query before submitting it to an existing search engine, in order to improve upon the IR results of using the search tool alone. To address the research issue, a user study of the FMQA was performed. The design and conduct are reported in depth. The study results were analysed and the findings are given. The results indicate that, for certain types of user especially, the FMQA does provide improvement in the IR process, in terms of the results. There is a critical review of the research aims in the light of the results, conclusions are drawn and recommendations for future research given.
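The idea of representing user stereotypes as fuzzy sets can be sketched roughly as follows. The attribute names, grades and the min-based fuzzy-AND are illustrative assumptions; the FMQA's actual membership functions are not described in the abstract.

```python
def membership(user, stereotype):
    """Fuzzy-AND (minimum) of the user's graded values on the
    stereotype's defining attributes: the result is 1.0 only if the
    user fully exhibits every attribute, and attributes absent from
    the profile count as 0."""
    return min(user.get(attr, 0.0) for attr in stereotype)

def classify(user, stereotypes):
    """Graded membership of a user in every stereotype, rather than a
    single hard classification -- the flexibility fuzzy sets provide."""
    return {name: membership(user, attrs)
            for name, attrs in stereotypes.items()}
```

A user who mostly issues short, operator-free queries would thus get a high membership in a hypothetical "novice" stereotype and a low one in "expert", and those graded memberships could then weight how a query is refined before submission.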
APA, Harvard, Vancouver, ISO, and other styles
42

Xhemali, Daniela. "Automated retrieval and extraction of training course information from unstructured web pages." Thesis, Loughborough University, 2010. https://dspace.lboro.ac.uk/2134/7022.

Full text
Abstract:
Web Information Extraction (WIE) is the discipline dealing with the discovery, processing and extraction of specific pieces of information from semi-structured or unstructured web pages. The World Wide Web comprises billions of web pages and there is much need for systems that will locate, extract and integrate the acquired knowledge into organisations' practices. There are some commercial, automated web extraction software packages; however, their success comes from heavily involving their users in the process of finding the relevant web pages, preparing the system to recognise items of interest on these pages and manually dealing with the evaluation and storage of the extracted results. This research has explored WIE, specifically with regard to the automation of the extraction and validation of online training information. The work also includes research and development in the area of automated Web Information Retrieval (WIR), more specifically in Web Searching (or Crawling) and Web Classification. Different technologies were considered; however, after much consideration, Naïve Bayes Networks were chosen as the most suitable for the development of the classification system. The extraction part of the system used Genetic Programming (GP) for the generation of web extraction solutions. Specifically, GP was used to evolve Regular Expressions, which were then used to extract specific training course information from the web such as: course names, prices, dates and locations. The experimental results indicate that all three aspects of this research perform very well, with the Web Crawler outperforming existing crawling systems, the Web Classifier performing with an accuracy of over 95% and a precision of over 98%, and the Web Extractor achieving an accuracy of over 94% for the extraction of course titles and an accuracy of just under 67% for the extraction of other course attributes such as dates, prices and locations.
Furthermore, the overall work is of great significance to the sponsoring company, as it simplifies and improves the existing time-consuming, labour-intensive and error-prone manual techniques, as will be discussed in this thesis. The prototype developed in this research works in the background and requires very little, often no, human assistance.
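When regular expressions are evolved with GP as described above, the core of the loop is a fitness function scoring each candidate pattern against hand-labelled pages. The sketch below uses F1 over labelled extractions; the labelled-page format and the F1 choice are assumptions, and the thesis's actual fitness function may differ.

```python
import re

def regex_fitness(pattern, labelled_pages):
    """Score a candidate regular expression (e.g. one evolved by GP)
    by F1 against hand-labelled extractions. labelled_pages is a list
    of (text, expected_matches) pairs; syntactically invalid patterns
    simply score 0 so evolution can discard them."""
    try:
        rx = re.compile(pattern)
    except re.error:
        return 0.0
    tp = fp = fn = 0
    for text, expected in labelled_pages:
        found = set(rx.findall(text))
        expected = set(expected)
        tp += len(found & expected)       # correct extractions
        fp += len(found - expected)       # spurious extractions
        fn += len(expected - found)       # missed extractions
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A GP driver would then mutate and recombine pattern strings, keeping those with higher fitness; for instance, on a page labelled with the prices "£250" and "£199", the candidate `£\d+` scores a perfect 1.0 while `\d+` scores 0, since its matches lack the currency symbol.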
APA, Harvard, Vancouver, ISO, and other styles
43

Heckner, Markus. "Tagging, rating, posting : studying forms of user contribution for web-based information management and information retrieval /." Boizenburg Hülsbusch, 2008. http://d-nb.info/992369916/04.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Halpin, Harry. "Sense and reference on the Web." Thesis, University of Edinburgh, 2010. http://hdl.handle.net/1842/3796.

Full text
Abstract:
This thesis builds a foundation for the philosophy of the Web by examining the crucial question: What does a Uniform Resource Identifier (URI) mean? Does it have a sense, and can it refer to things? A philosophical and historical introduction to the Web explains the primary purpose of the Web as a universal information space for naming and accessing information via URIs. A terminology, based on distinctions in philosophy, is employed to define precisely what is meant by information, language, representation, and reference. These terms are then employed to create a foundational ontology and principles of Web architecture. From this perspective, the Semantic Web is then viewed as the application of the principles of Web architecture to knowledge representation. However, the classical philosophical problems of sense and reference that have been the source of debate within the philosophy of language return. Three main positions are inspected: the logicist position, as exemplified by the descriptivist theory of reference and the first-generation Semantic Web; the direct reference position, as exemplified by Putnam and Kripke's causal theory of reference and the second-generation Linked Data initiative; and a Wittgensteinian position that views the Semantic Web as yet another public language. After identifying the public language position as the most promising, a solution of using people's everyday use of search engines as relevance feedback is proposed as a Wittgensteinian way to determine the sense of URIs. This solution is then evaluated on a sample of the Semantic Web discovered using queries from a hypertext search engine query log. The results are evaluated, and the technique of using relevance feedback from hypertext Web searches to determine relevant Semantic Web URIs in response to user queries is shown to considerably improve baseline performance.
Future work for the Web that follows from our argument and experiments is detailed, and outlines of a future philosophy of the Web laid out.
APA, Harvard, Vancouver, ISO, and other styles
45

Costa, Miguel. "SIDRA: a Flexible Web Search System." Master's thesis, Department of Informatics, University of Lisbon, 2004. http://hdl.handle.net/10451/13914.

Full text
Abstract:
Sidra is a new indexing, searching and ranking system for Web contents. It has a flexible, parallel, distributed and scalable architecture. Sidra maintains several data structures that provide multiple access methods to different data dimensions, giving it the capability to select results reflecting search contexts. Its design addresses current challenges of Web search engines: high performance, short searching and indexing times, good quality of results, scalability and high service availability.
APA, Harvard, Vancouver, ISO, and other styles
46

Roberts, Marcus James. "A cooperative approach to networked information resource discovery." Thesis, University of Nottingham, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.338531.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Morrison, Patrick Jason. "Tagging and Searching: Search Retrieval Effectiveness of Folksonomies on the Web." [Kent, Ohio] : Kent State University, 2007. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=kent1177305096.

Full text
Abstract:
Thesis (M.S.)--Kent State University, 2007.
Title from PDF t.p. (viewed July 2, 2007). Advisor: David B. Robins. Keywords: information retrieval, search engine, social bookmarking, tagging, folksonomy, Internet, World Wide Web. Includes survey instrument. Includes bibliographical references (p. 137-141).
APA, Harvard, Vancouver, ISO, and other styles
48

Jäschke, Robert, and Sebastian Rudolph. "Attribute Exploration on the Web." Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2013. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-113133.

Full text
Abstract:
We propose an approach for supporting attribute exploration by web information retrieval, in particular by posing appropriate queries to search engines, crowd sourcing systems, and the linked open data cloud. We discuss underlying general assumptions for this to work and the degree to which these can be taken for granted.
APA, Harvard, Vancouver, ISO, and other styles
49

Langford, James David. "Accessing Information on the World Wide Web: Predicting Usage Based on Involvement." Thesis, University of North Texas, 2003. https://digital.library.unt.edu/ark:/67531/metadc4198/.

Full text
Abstract:
Advice for Web designers often includes an admonition to use short, scannable, bullet-pointed text, reflecting the common belief that browsing the Web most often involves scanning rather than reading. Literature from several disciplines focuses on the myriad combinations of factors related to online reading, but studies of users' interests and motivations appear to offer a more promising avenue for understanding how users utilize information on Web pages. This study utilized the modified Personal Involvement Inventory (PII), a ten-item instrument used primarily in the marketing and advertising fields, to measure interest and motivation toward a topic presented on the Web. Two sites were constructed from Reader's Digest Association, Inc. online articles, and a program was written to track students' use of the site. Behavior was measured by the initial choice of short versus longer versions of the main page, the number of pages visited and the amount of time spent on the site. Data were gathered from students at a small, private university in the southwest part of the United States to answer six hypotheses, which posited that subjects with higher involvement in a topic presented on the Web and a more positive attitude toward the Web would tend to select the longer text version, visit more pages, and spend more time on the site. While attitude toward the Web did not correlate significantly with any of the behavioral factors, the level of involvement was associated with the use of the sites in two of three hypotheses, but only partially in the manner hypothesized. Increased involvement with a Web topic did correlate with the choice of a longer, more detailed initial Web page, but was inversely related to the number of pages viewed, so that the higher the involvement, the fewer pages visited. An additional indicator of usage, the average amount of time spent on each page, was measured and revealed that more involved users spent more time on each page.
APA, Harvard, Vancouver, ISO, and other styles
50

Zhang, Ying, and ying yzhang@gmail com. "Improved Cross-language Information Retrieval via Disambiguation and Vocabulary Discovery." RMIT University. Computer Science and Information Technology, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20090224.114940.

Full text
Abstract:
Cross-lingual information retrieval (CLIR) allows people to find documents irrespective of the language used in the query or document. This thesis is concerned with the development of techniques to improve the effectiveness of Chinese-English CLIR. In Chinese-English CLIR, the accuracy of dictionary-based query translation is limited by two major factors: translation ambiguity and the presence of out-of-vocabulary (OOV) terms. We explore alternative methods for translation disambiguation, and demonstrate new techniques based on a Markov model and the use of web documents as a corpus to provide context for disambiguation. This simple disambiguation technique has proved to be extremely robust and successful. Queries that seek topical information typically contain OOV terms that may not be found in a translation dictionary, leading to inappropriate translations and consequent poor retrieval performance. Our novel OOV term translation method is based on the Chinese authorial practice of including unfamiliar English terms in both languages. It automatically extracts correct translations from the web and can be applied to both Chinese-English and English-Chinese CLIR. Our OOV translation technique does not rely on prior segmentation and is thus free from segmentation error. It leads to a significant improvement in CLIR effectiveness and can also be used to improve Chinese segmentation accuracy. Good quality translation resources, especially bilingual dictionaries, are valuable resources for effective CLIR. We developed a system to facilitate construction of a large-scale translation lexicon of Chinese-English OOV terms using the web. Experimental results show that this method is reliable and of practical use in query translation. In addition, parallel corpora provide a rich source of translation information. 
We have also developed a system that uses multiple features to identify parallel texts via a k-nearest-neighbour classifier, to automatically collect high quality parallel Chinese-English corpora from the web. These two automatic web mining systems are highly reliable and easy to deploy. In this research, we provided new ways to acquire linguistic resources using multilingual content on the web. These linguistic resources not only improve the efficiency and effectiveness of Chinese-English cross-language web retrieval, but also have wider applications than CLIR.
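The k-nearest-neighbour identification of parallel texts could be sketched as follows. The two features shown, length ratio and overlap of language-independent digit tokens, are simplified stand-ins for the multiple features the thesis uses, and all names and data are hypothetical.

```python
import math
from collections import Counter

def pair_features(doc_a, doc_b):
    """Two simple cues that a pair of pages may be translations of
    each other: similarity of length, and overlap of tokens that
    survive translation unchanged (here, plain digit strings)."""
    len_ratio = min(len(doc_a), len(doc_b)) / max(len(doc_a), len(doc_b), 1)
    nums_a = {tok for tok in doc_a.split() if tok.isdigit()}
    nums_b = {tok for tok in doc_b.split() if tok.isdigit()}
    union = nums_a | nums_b
    num_overlap = len(nums_a & nums_b) / len(union) if union else 0.0
    return (len_ratio, num_overlap)

def knn_is_parallel(candidate, training, k=3):
    """Label a candidate pair by majority vote of its k nearest
    labelled pairs in feature space (Euclidean distance)."""
    fx = pair_features(*candidate)
    dists = sorted(
        (math.dist(fx, pair_features(a, b)), label)
        for a, b, label in training
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

Given a handful of labelled page pairs, a candidate pair with matching dates or figures and similar lengths lands near the parallel examples and is voted parallel, while an unrelated pair is voted out; a production system would add many more features and a much larger training set.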
APA, Harvard, Vancouver, ISO, and other styles
