Dissertations / Theses on the topic 'Cross-language information retrieval'
Consult the top 50 dissertations / theses for your research on the topic 'Cross-language information retrieval.'
Abusalah, Mustafa A. "Cross language information retrieval using ontologies." Thesis, University of Sunderland, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.505050.
Wang, Jianqiang. "Matching meaning for cross-language information retrieval." College Park, Md. : University of Maryland, 2005. http://hdl.handle.net/1903/3212.
Thesis research directed by: Library & Information Services. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
Nic Gearailt, Donnla Brighid. "Dictionary characteristics in cross-language information retrieval." Thesis, University of Cambridge, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.619885.
Nyman, Marie, and Maria Patja. "Cross-language information retrieval : sökfrågestruktur & sökfrågeexpansion." Thesis, Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-18892.
Thesis level: D
Adriani, Mirna. "A query ambiguity model for cross-language information retrieval." Thesis, University of Glasgow, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.407678.
Loza, Christian. "Cross Language Information Retrieval for Languages with Scarce Resources." Thesis, University of North Texas, 2009. https://digital.library.unt.edu/ark:/67531/metadc12157/.
Loza, Christian E., and Rada F. Mihalcea. "Cross language information retrieval for languages with scarce resources." [Denton, Tex.] : University of North Texas, 2009. http://digital.library.unt.edu/ark:/67531/metadc12157.
Lu, Chengye. "Peer to peer English/Chinese cross-language information retrieval." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/26444/1/Chengye_Lu_Thesis.pdf.
Lu, Chengye. "Peer to peer English/Chinese cross-language information retrieval." Queensland University of Technology, 2008. http://eprints.qut.edu.au/26444/.
Gupta, Parth Alokkumar. "Cross-view Embeddings for Information Retrieval." Doctoral thesis, Universitat Politècnica de València, 2017. http://hdl.handle.net/10251/78457.
In this dissertation we study multi-view problems in information retrieval using representation techniques in low-dimensional spaces. We review existing techniques and propose new ones to address some of their limitations. We formally present the concept of mixed-script information retrieval, which deals with the difficulties that information retrieval systems face when texts contain writing in different scripts for technological and sociocultural reasons. Mixed-script words are represented in a small, finite feature space composed of character n-grams. We propose cross-view autoencoders (CAE) to model such words in an abstract space, and this technique produces state-of-the-art results. Along the same lines, we study several models for cross-language information retrieval (CLIR) and propose a model based on compositional neural networks (XCNN), which overcomes the limitations of existing methods. The proposed XCNN method produces better results on different CLIR tasks such as ad hoc retrieval, identification of equivalent sentences in different languages, and cross-language plagiarism detection. To this end, we run experiments for these tasks on publicly available datasets and present the corresponding results and analysis. We also explore an efficient method for using the semantic similarity of contexts in the lexical selection process of machine translation. Specifically, we propose features extracted, by means of autoencoders, from the contexts available in the source sentences. Using the proposed features yields statistically significant improvements over strong translation systems for translation between English and Spanish and between English and Hindi. Finally, we explore methods for evaluating the quality of the text representations generated by autoencoders while analysing the properties of their architectures. As a result, we propose two new metrics to quantify the quality of the reconstructions generated by autoencoders: the structure preservation index (SPI) and the similarity accumulation index (SAI). We also present the concept of critical bottleneck dimensionality (CBD), below which structural information deteriorates. We show that, interestingly, the CBD is related to the perplexity of the language.
Gupta, PA. (2017). Cross-view Embeddings for Information Retrieval [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/78457
Zhang, Ying. "Improved Cross-language Information Retrieval via Disambiguation and Vocabulary Discovery." RMIT University. Computer Science and Information Technology, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20090224.114940.
Tang, Ling-Xiang. "Link discovery for Chinese/English cross-language web information retrieval." Thesis, Queensland University of Technology, 2012. https://eprints.qut.edu.au/58416/1/Ling-Xiang_Tang_Thesis.pdf.
Orengo, Viviane Moreira. "Assessing relevance using automatically translated documents for cross-language information retrieval." Thesis, Middlesex University, 2004. http://eprints.mdx.ac.uk/13606/.
Wigder, Chaya. "Word embeddings for monolingual and cross-language domain-specific information retrieval." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233028.
Several studies have shown that word embedding models are useful for many different language technology tasks. This thesis investigates how word embedding models can be used in search engines for both monolingual and cross-language domain-specific search. Experiments were carried out to optimise the hyperparameters of the word embedding models and to find the best way to weight words according to how important they are in the document or query. In addition, methods for creating domain-specific bilingual embeddings were investigated. The system was compared with a baseline without embeddings based on cosine similarity, and for both monolingual and cross-language search the system using monolingual embeddings outperformed the baseline. By contrast, the bilingual embeddings, especially for domain-specific words, were of low quality and gave results too poor for direct use in search engines.
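The idea of weighting words by importance before comparing a query with documents can be illustrated with a small sketch. It is not the system evaluated in the thesis: the toy vectors, the `idf` weights, and the function names (`weighted_embedding`, `cosine`, `rank`) are all assumptions made here for illustration. Each text is represented as a weighted average of its word vectors, and documents are ranked by cosine similarity to the query.

```python
import math


def weighted_embedding(tokens, vectors, idf):
    """Average the vectors of known tokens, weighting each by its IDF (default 1.0)."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok in vectors:
            w = idf.get(tok, 1.0)
            for i, x in enumerate(vectors[tok]):
                total[i] += w * x
            weight_sum += w
    if weight_sum == 0:
        return total  # no known tokens: all-zero vector
    return [x / weight_sum for x in total]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_tokens, documents, vectors, idf):
    """Return document ids ordered by decreasing similarity to the query."""
    q = weighted_embedding(query_tokens, vectors, idf)
    scored = [
        (cosine(q, weighted_embedding(tokens, vectors, idf)), doc_id)
        for doc_id, tokens in documents.items()
    ]
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical two-dimensional vectors, for illustration only.
toy_vectors = {
    "heart": [1.0, 0.0],
    "cardiac": [0.9, 0.1],
    "engine": [0.1, 0.9],
}
toy_documents = {"d1": ["cardiac"], "d2": ["engine"]}
print(rank(["heart"], toy_documents, toy_vectors, idf={}))
```

In a cross-language setting the same ranking code would apply unchanged if query and document vectors came from a shared bilingual embedding space, which is exactly the component the abstract reports as too low in quality for direct use.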
Hieber, Felix [author], and Stefan Riezler [academic advisor]. "Translation-based Ranking in Cross-Language Information Retrieval / Felix Hieber ; advisor: Stefan Riezler." Heidelberg : Universitätsbibliothek Heidelberg, 2015. http://d-nb.info/1180396189/34.
Cederlund, Petter. "Cross-Language Information Retrieval : En granskning av tre översättningsmetoder använda i experimentell CLIR-forskning." Thesis, Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan, 2002. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-20775.
Thesis level: D
Boström, Anna. "Cross-Language Information Retrieval : En studie av lingvistiska problem och utvecklade översättningsmetoder för lösningar angående informationsåtervinning över språkliga gränser." Thesis, Umeå University, Sociology, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-1017.
This essay deals with information retrieval across languages by examining different types of literature in the research areas of linguistics and multilingual information retrieval. The essay argues that the many different languages that co-exist around the globe must be recognised as a pressing problem for information science. The language barrier remains a major impediment to the expansion of international information retrieval that has otherwise been made technically and theoretically possible over the last few years by new technical developments, the Internet, digital libraries, globalisation, and major political changes in several countries around the world. The first part of the essay explores linguistic differences and difficulties related to general translation from one language to another, using examples mainly from European languages. It is suggested that these problems and differences must also be acknowledged and regarded as highly important when it comes to information retrieval across languages. The essay continues by reporting on Cross-Language Information Retrieval (CLIR), a relatively new research area where methods for multilingual information retrieval are studied and developed. The aim of CLIR is that people will eventually be able to search for information in their native language and still find relevant information in several other languages. A further goal is to make it possible to translate complete retrieved documents into a person's language of preference. The essay reports on four different CLIR methods currently established for automatically translating queries, subject headings, or, in some cases, complete documents, and thus aiding people with little or no knowledge of the language in which they are looking for information. The four methods (machine translation, translation using a multilingual thesaurus or a manually produced machine-readable dictionary, corpus-based translation, and no translation) are discussed in relation to the linguistic translation difficulties described in the first part of the essay. The conclusion drawn is that language is exceedingly complex and that, while the CLIR methods developed so far can often solve one or a few of the acknowledged linguistic difficulties, none is able to overcome them all. The essay also shows, however, that CLIR researchers are highly aware of the limitations of the different translation methods and that many are trying to come to terms with this by combining several translation methods in a single CLIR system. The essay concludes by looking at CLIR researchers' expectations and hopes for the future.
Richardson, W. Ryan. "Using Concept Maps as a Tool for Cross-Language Relevance Determination." Diss., Virginia Tech, 2007. http://hdl.handle.net/10919/28191.
Ph. D.
Franco Salvador, Marc. "A Cross-domain and Cross-language Knowledge-based Representation of Text and its Meaning." Doctoral thesis, Universitat Politècnica de València, 2017. http://hdl.handle.net/10251/84285.
Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between machines and human language. One of its greatest challenges is enabling machines to infer the meaning of human natural language. To this end, various representations of meaning and context have been proposed and have achieved competitive performance. However, these representations still have room for improvement in cross-domain and cross-language scenarios. In this thesis we study the use of knowledge graphs as a cross-domain and cross-language representation of text and its meaning. A knowledge graph is a graph that expands and relates the original concepts belonging to a set of words. Its properties are obtained by using a wide-coverage multilingual semantic network as the knowledge base. This provides coverage of hundreds of languages and millions of general and specific human concepts. As the starting point of our research, we employ knowledge graph-based features, together with traditional features and meta-learning, for the NLP task of single- and cross-domain polarity classification. The analysis and conclusions of that work provide evidence that knowledge graphs capture meaning in a domain-independent way. The next part of our research takes advantage of the multilingual semantic network and focuses on Information Retrieval (IR) tasks. First, we propose a similarity analysis model based entirely on knowledge graphs for cross-language plagiarism detection. Next, we improve that model to cover out-of-vocabulary words and verb tenses, and apply it to the cross-language tasks of document retrieval, classification, and plagiarism detection. Finally, we study the use of knowledge graphs for the NLP tasks of community question answering, native language identification, and language variety identification. The contributions of this thesis show the potential of knowledge graphs as a cross-domain and cross-language representation of text and its meaning in NLP and IR tasks. These contributions have been published in several international journals and conferences.
Franco Salvador, M. (2017). A Cross-domain and Cross-language Knowledge-based Representation of Text and its Meaning [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/84285
Bergstedt, Kenneth. "Lost in translation? En empirisk undersökning av användningen av tesaurer vid queryexpansion inom Cross Language Information Retrieval." Thesis, Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-16903.
Thesis level: D
Geraldo, André Pinto. "Aplicando algoritmos de mineração de regras de associação para recuperação de informações multilíngues." Biblioteca Digital de Teses e Dissertações da UFRGS, 2009. http://hdl.handle.net/10183/26506.
This work proposes the use of algorithms for mining association rules as an approach for Cross-Language Information Retrieval. These algorithms have been widely used to analyze market basket data. The idea is to map the problem of finding associations between sales items to the problem of finding term translations over a parallel corpus. The proposal was validated by means of experiments using different languages, queries and corpora. The results show that the performance of our proposed approach is comparable to the performance of the monolingual baseline and to query translation via machine translation, even though these systems employ more complex Natural Language Processing techniques. A prototype for cross-language web querying was implemented to test the proposed method. The system accepts keywords in Portuguese, translates them into English and submits the query to several web-sites that provide search functionalities.
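The mapping from market-basket analysis to term translation described in this abstract can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation: each aligned sentence pair is treated as one "basket", only one-term-to-one-term rules are mined, and the function name `mine_translation_rules`, the thresholds, and the toy Portuguese-English pairs are all hypothetical. Support is the number of sentence pairs in which a source term and a target term co-occur, and confidence is that count divided by the number of pairs containing the source term.

```python
from collections import Counter
from itertools import product


def mine_translation_rules(parallel_pairs, min_support=2, min_confidence=0.6):
    """Mine source_term -> target_term rules from aligned sentence pairs.

    Returns a dict mapping each source term to a list of
    (target_term, confidence) tuples, best candidates first.
    """
    source_count = Counter()  # number of pairs containing each source term
    pair_count = Counter()    # number of pairs containing (source, target)
    for source_sentence, target_sentence in parallel_pairs:
        source_terms = set(source_sentence.lower().split())
        target_terms = set(target_sentence.lower().split())
        source_count.update(source_terms)
        pair_count.update(product(source_terms, target_terms))

    rules = {}
    for (source, target), support in pair_count.items():
        confidence = support / source_count[source]
        if support >= min_support and confidence >= min_confidence:
            rules.setdefault(source, []).append((target, confidence))
    for candidates in rules.values():
        candidates.sort(key=lambda item: (-item[1], item[0]))
    return rules


# Toy aligned corpus, invented for illustration.
toy_corpus = [
    ("casa branca", "white house"),
    ("casa grande", "big house"),
    ("carro branco", "white car"),
    ("carro grande", "big car"),
]
for term, candidates in sorted(mine_translation_rules(toy_corpus).items()):
    print(term, candidates)
```

Terms that occur only once in the toy corpus (here `branca` and `branco`) fall below the support threshold and receive no translation, which shows why a rule-mining approach of this kind depends on a parallel corpus large enough for each query term to recur.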
Asian, Jelita. "Effective Techniques for Indonesian Text Retrieval." RMIT University. Computer Science and Information Technology, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080110.084651.
Qureshi, Karl. "Att maskinöversätta sökfrågor : En studie av Google Translate och Bing Translators förmåga att översätta svenska sammansättningar i ett CLIR-perspektiv." Thesis, Umeå universitet, Sociologiska institutionen, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-131813.
Wilhelm, Thomas. "Entwurf und Implementierung eines Frameworks zur Analyse und Evaluation von Verfahren im Information Retrieval." Master's thesis, [S.l. : s.n.], 2008. https://monarch.qucosa.de/id/qucosa%3A18962.
Feldman, Anna. "Portable language technology: a resource-light approach to morpho-syntactic tagging." Columbus, Ohio : Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1153344391.
Li, Bo. "Mesurer et améliorer la qualité des corpus comparables." Thesis, Grenoble, 2012. http://www.theses.fr/2012GRENM069.
Bilingual corpora are an essential resource used to cross the language barrier in multilingual Natural Language Processing (NLP) tasks. Most current work makes use of parallel corpora, which are mainly available for major languages and constrained domains. Comparable corpora, text collections comprised of documents covering overlapping information, are however less expensive to obtain in high volume. Previous work has shown that using comparable corpora is beneficial for several NLP tasks. Going beyond those studies, we try in this thesis to improve the quality of comparable corpora so as to improve the performance of applications exploiting them. The idea is advantageous since it can work with any existing method making use of comparable corpora. We first discuss the notion of comparability, inspired by experience with the use of bilingual corpora. The notion motivates several implementations of the comparability measure under the probabilistic framework, as well as a methodology to evaluate the ability of comparability measures to capture gold-standard comparability levels. The comparability measures are also examined in terms of robustness to dictionary changes. The experiments show that a symmetric measure relying on vocabulary overlap can correlate very well with gold-standard comparability levels and is robust to dictionary changes. Based on the comparability measure, two methods, namely the greedy approach and the clustering approach, are then developed to improve the quality of any given comparable corpus. The general idea of these two methods is to choose the high-quality subpart from the original corpus and to enrich the low-quality subpart with external resources. The experiments show that one can improve the quality, in terms of comparability scores, of the given comparable corpus by these two methods, with the clustering approach being more efficient than the greedy approach. The enhanced comparable corpus further results in better bilingual lexicons extracted with the standard extraction algorithm. Lastly, we investigate the task of Cross-Language Information Retrieval (CLIR) and the application of comparable corpora in CLIR. We develop novel CLIR models extending the recently proposed information-based models in monolingual IR. The information-based CLIR model is shown to give the best performance overall. Bilingual lexicons extracted from comparable corpora are then combined with the existing bilingual dictionary and used in CLIR experiments, which results in significant improvement of the CLIR system.
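A symmetric comparability measure based on vocabulary overlap, of the kind this abstract mentions, could look roughly like the sketch below. The exact definition used in the thesis is not given here, so the formula is an assumption: among the words of each side that the bilingual dictionary covers, count those with at least one translation present in the other side's vocabulary, then divide by the total number of covered words. The function name `comparability` and the toy French-English data are hypothetical.

```python
def comparability(source_vocab, target_vocab, dictionary):
    """Symmetric vocabulary-overlap score in [0, 1].

    dictionary maps a source word to an iterable of target translations.
    Words not covered by the dictionary are ignored on both sides.
    """
    source_vocab = set(source_vocab)
    target_vocab = set(target_vocab)

    # Build the target -> source direction from the same dictionary.
    reverse = {}
    for source_word, translations in dictionary.items():
        for target_word in translations:
            reverse.setdefault(target_word, set()).add(source_word)

    source_covered = [w for w in source_vocab if w in dictionary]
    target_covered = [w for w in target_vocab if w in reverse]
    total = len(source_covered) + len(target_covered)
    if total == 0:
        return 0.0

    # A covered word is a "hit" if one of its translations occurs on the other side.
    source_hits = sum(1 for w in source_covered if set(dictionary[w]) & target_vocab)
    target_hits = sum(1 for w in target_covered if reverse[w] & source_vocab)
    return (source_hits + target_hits) / total


# Toy data, invented for illustration.
toy_dictionary = {"maison": {"house", "home"}, "chat": {"cat"}, "voiture": {"car"}}
french_vocab = {"maison", "chat", "le"}
english_vocab = {"house", "car", "the"}
print(comparability(french_vocab, english_vocab, toy_dictionary))
```

Because both translation directions enter the numerator and the denominator, swapping the two corpora (with the dictionary reversed accordingly) leaves the score unchanged, which is the sense in which the sketch is symmetric.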
Pollettini, Juliana Tarossi. "Auxílio na prevenção de doenças crônicas por meio de mapeamento e relacionamento conceitual de informações em biomedicina." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/95/95131/tde-24042012-223141/.
Genomic medicine has suggested that exposure to risk factors from conception onwards may influence gene expression and consequently induce the development of chronic diseases in adulthood. Scientific papers reporting these discoveries indicate that epigenetics must be exploited to prevent diseases of high prevalence, such as cardiovascular diseases, diabetes and obesity. The large amount of scientific information burdens health care professionals who want to stay up to date, since searches for accurate information become complex and expensive. Some computational techniques might support the management of large biomedical information repositories and the discovery of knowledge. This study presents a framework to support surveillance systems that alert health professionals to human development problems by retrieving scientific papers that relate chronic diseases to risk factors detected in a patient's clinical record. As a contribution, healthcare professionals will be able to create a routine with the family, setting up the best growing conditions. According to Butte, the effective transformation of results from biomedical research into knowledge that actually improves public health has been considered an important domain of informatics and has been called Translational Bioinformatics. Since chronic diseases are a serious health problem worldwide and lead the causes of mortality with 60% of all deaths, this scientific investigation will probably enable results from bioinformatics research to directly benefit public health.
Luk, Wing-kong. "Concept space approach for cross-lingual information retrieval." Hong Kong : University of Hong Kong, 2000. http://sunzi.lib.hku.hk/hkuto/record.jsp?B2275345X.
陸穎剛 and Wing-kong Luk. "Concept space approach for cross-lingual information retrieval." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2000. http://hub.hku.hk/bib/B30147724.
Magableh, Murad. "A generic architecture for semantic enhanced tagging systems." Thesis, De Montfort University, 2011. http://hdl.handle.net/2086/5172.
Saad, Motaz. "Fouille de documents et d'opinions multilingue." Thesis, Université de Lorraine, 2015. http://www.theses.fr/2015LORR0003/document.
The aim of this thesis is to study sentiments in comparable documents. First, we collect English, French and Arabic comparable corpora from Wikipedia and Euronews, and we align each corpus at the document level. We further gather English-Arabic news documents from local and foreign news agencies. The English documents are collected from the BBC website and the Arabic documents are collected from the Al Jazeera website. Second, we present a cross-lingual document similarity measure to automatically retrieve and align comparable documents. Then, we propose a cross-lingual sentiment annotation method to label source and target documents with sentiments. Finally, we use statistical measures to compare the agreement of sentiments in the source and target documents of each comparable pair. The methods presented in this thesis are language independent and can be applied to any language pair.
Saad, Motaz. "Fouille de documents et d'opinions multilingue." Electronic Thesis or Diss., Université de Lorraine, 2015. http://www.theses.fr/2015LORR0003.
Beltrame, Walber Antonio Ramos. "Um sistema de disseminação seletiva da informação baseado em Cross-Document Structure Theory." Universidade Federal do Espírito Santo, 2011. http://repositorio.ufes.br/handle/10/6414.
A selective dissemination of information system is a type of information system that aims to channel new intellectual products, from any source, to environments where the probability of interest is high. The inherent computational challenge is to establish a model that maps specific information needs, for a large audience, in a personalized way. To this end, it is necessary to mediate the structuring of the information unit so that it covers the plurality of attributes to be considered by the content selection process. Recent academic publications propose systems based on data markup over texts (metadata models), in which information handling lies between the computation of semi-structured data and inference mechanisms over meta-models. Such approaches rely only on the association between the data structure and the profile of interest. To improve on this, this work proposes the construction of a selective dissemination of information system based on the analysis of multiple discourses through the automatic generation of conceptual graphs from texts, thereby also bringing unstructured data (texts) into the solution. The proposal is motivated by Cross-Document Structure Theory, a model recently disseminated in the area of Natural Language Processing and aimed at the automatic generation of summaries. The model seeks to establish semantic correlations between discourses, for example whether multiple texts contain identical, additional, or contradictory information. One of the aspects discussed in this dissertation is that these correlations can be used in the content selection process, as has already been shown in related work. Additionally, the algorithm of the original model is revised to make it easy to apply.
Kralisch, Anett. "The impact of culture and language on the use of the internet." Doctoral thesis, Humboldt-Universität zu Berlin, Wirtschaftswissenschaftliche Fakultät, 2006. http://dx.doi.org/10.18452/15501.
This thesis analyses the impact of culture and language on Internet use. Three main areas were investigated: (1) the impact of culture and language on preferences for information presentation and search options, (2) the impact of culture on the need for specific website content, and (3) language as a barrier to information access and as a determinant of website satisfaction. To test the 33 hypotheses, data were gathered by means of logfile analyses, online surveys, and laboratory studies. It was concluded that culture clearly correlated with patterns of navigation behaviour and the use of search options. In contrast, results concerning the impact of culture on the need for website content were less conclusive. Results concerning language showed that significantly fewer L1 users than L2 users accessed a website. This can be explained by language-related cognitive effort as well as by the fact that websites in different languages are less linked than websites in the same language. With regard to search option use, a strong mediation effect of domain knowledge was found. Furthermore, results revealed correlations between user satisfaction and language proficiency, as well as between satisfaction and the perceived amount of native language information online.
Kubalík, Jakub. "Mining of Textual Data from the Web for Speech Recognition." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237170.
"Information fusion for monolingual and cross-language spoken document retrieval." 2002. http://library.cuhk.edu.hk/record=b6073504.
"October 2002."
Thesis (Ph.D.)--Chinese University of Hong Kong, 2002.
Includes bibliographical references (p. 170-184).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Mode of access: World Wide Web.
Abstracts in English and Chinese.
Yu-Chun Ting and 丁鈺純. "The Establishment of English-Chinese Cross-language Information Retrieval System." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/07372899416113106412.
國立成功大學
資訊管理研究所
101
In recent years, fast information flow and convenient information sharing have led to information overload, so obtaining the information users need from large amounts of data has become important. Information retrieval systems perform well in monolingual retrieval, but not in cross-language retrieval. In the current globalized environment, users who want to understand non-native words or documents often need to search cross-language documents to obtain related information in their native language and overcome reading difficulties. Therefore, it is necessary to establish cross-language information retrieval (CLIR) systems that help users search for relevant cross-language documents. Current research indicates that query translation and query expansion can improve the retrieval accuracy of CLIR. However, the ambiguity of query terms and the growing number of out-of-vocabulary (OOV) terms easily lead to translation errors. Cheng et al. (2004) apply web resources to translate query terms and perform well on OOV terms, but not on general terms. Therefore, this study uses a bilingual corpus, Google search results, and Wikipedia to extract correct query translations, in order to reduce word ambiguity and to obtain translations of OOV terms. In addition, to improve CLIR performance, this study uses Google search results and Wikipedia to obtain expansion terms related to the query terms. Tests on the NTCIR-8 dataset show that this method can effectively improve retrieval accuracy.
Ballesteros, Lisa Ann. "Resolving ambiguity for cross-language information retrieval: A dictionary approach." 2001. https://scholarworks.umass.edu/dissertations/AAI3027176.
Nel, Johannes Gerhardus. "Zulu-English cross-language information retrieval : an analysis of errors." Diss., 2004. http://hdl.handle.net/2263/27720.
Liang, Je-Wei, and 梁哲瑋. "Resolving Translation Ambiguity By Ontological Chain for Cross Language Information Retrieval." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/73784787270503597835.
國立交通大學
資訊科學系所
92
Bilingual dictionaries are commonly used for query translation in cross-language information retrieval (CLIR). However, query translation suffers from translation ambiguity. Recent studies suggest traversing WordNet to select appropriate translations. This paper proposes an ontological chain approach to resolve translation ambiguity. First, we find the most similar ontology nodes for each query. Second, we construct a semantic graph according to the semantic distances between these nodes. Finally, we select the connected component with the highest score as our ontological chain. We show that our approach reaches 81% of the effectiveness of monolingual information retrieval systems. When there are many candidate translations, our system performs better than a monolingual information retrieval system.
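The three steps named in this abstract (candidate nodes, a semantic graph built from distances, and selection of the best-scoring connected component) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's implementation: the function names, the `max_distance` threshold, the inverse-distance scoring rule, and the toy data are all assumptions, since the abstract does not specify them.

```python
from itertools import combinations


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as sets."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def best_chain(candidates, distance, max_distance=2):
    """Pick the highest-scoring connected component of candidate translations.

    candidates: {query_term: [candidate_translation, ...]}
    distance:   {frozenset((a, b)): semantic_distance} (hypothetical input)
    """
    nodes = [t for translations in candidates.values() for t in translations]
    # Link two candidates only when their semantic distance is small enough.
    edges = []
    for a, b in combinations(nodes, 2):
        d = distance.get(frozenset((a, b)))
        if d is not None and d <= max_distance:
            edges.append((a, b, d))
    components = connected_components(nodes, [(a, b) for a, b, _ in edges])

    def score(component):
        # Assumed scoring rule: closer pairs contribute more.
        return sum(1.0 / (1 + d) for a, b, d in edges
                   if a in component and b in component)

    return max(components, key=score)


# Toy data: two query terms, each with two candidate senses.
candidates = {
    "term_a": ["bank_finance", "bank_river"],
    "term_b": ["interest_money", "interest_hobby"],
}
distance = {
    frozenset(("bank_finance", "interest_money")): 1,
    frozenset(("bank_river", "interest_hobby")): 4,
}
print(sorted(best_chain(candidates, distance)))
# → ['bank_finance', 'interest_money']
```

In this toy run only the two financial senses are close enough to be linked, so they form the single scored component and are selected together.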
Lee, Chia-Jung, and 李佳蓉. "The Impact of Query Term Translation on Cross-language Information Retrieval." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/69548766821333713446.
臺灣大學
資訊工程學研究所
98
Query translation is an important task in cross-language information retrieval (CLIR) aiming to translate queries into languages used in documents. The purpose of this paper is to investigate the necessity of translating query terms, which might differ from one term to another. Some untranslated terms cause irreparable performance drop while others do not. We propose an approach to estimate the translation probability of a query term, which helps decide if it should be translated or not. The approach learns regression and classification models based on a rich set of linguistic and statistical properties of the term. Experiments on NTCIR-4 and NTCIR-5 English-Chinese CLIR tasks demonstrate that the proposed approach can significantly improve CLIR performance. An in-depth analysis is provided for discussing the impact of untranslated out-of-vocabulary (OOV) query terms and translation quality of non-OOV query terms on CLIR performance. We also scrutinize how translation accuracy is related to translation quality, which eventually influences the translation necessity.
"Using web resources for effective English-to-Chinese cross language information retrieval." Thesis, 2005. http://library.cuhk.edu.hk/record=b6074036.
Jin Honglan.
"October 2005."
Adviser: Kam Fai Wong.
Source: Dissertation Abstracts International, Volume: 67-07, Section: B, page: 3899.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2005.
Includes bibliographical references (p. 115-121).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract in English and Chinese.
School code: 1307.
"A corpus-based approach for cross-lingual information retrieval." 2004. http://library.cuhk.edu.hk/record=b6073674.
"July 2004."
Thesis (Ph.D.)--Chinese University of Hong Kong, 2004.
Includes bibliographical references (p. 127-139).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Mode of access: World Wide Web.
Abstracts in English and Chinese.
Ha, Yoo Jin. "Accessing and using multilanguage information by users searching in different information retrieval systems." 2008. http://hdl.rutgers.edu/1782.2/rucore10001600001.ETD.000051091.
丁肇君. "A Chinese-English Cross-Language Information Retrieval System for On-line News Articles." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/41429848292650750654.
國立臺北科技大學
電機工程系碩士班
90
The accelerated growth of the Internet and of on-line news in English allows non-native English speakers to access English on-line news more frequently. However, Chinese-speaking Internet users often cannot retrieve relevant topics from the enormous amount of news because it is difficult for them to formulate a precise query in English. Moreover, non-native English speakers cannot retrieve relevant English news owing to limited vocabulary. This study proposes a novel Chinese-English cross-language information retrieval system for on-line news articles. With the proposed system, Chinese-speaking Internet users can formulate queries in Chinese and then retrieve relevant news in English. The system first collects on-line news from Chinese and English news web sites daily. Sentence segmentation is then performed on the Chinese query, which allows the original query to be expanded into more Chinese queries. Finally, these Chinese queries are translated into English queries and the relevant English news is retrieved. Additionally, the relation between the publication dates of Chinese and English news about the same event is considered to enhance the precision of the proposed system.
"Multi-lingual text retrieval and mining." 2003. http://library.cuhk.edu.hk/record=b5891637.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.
Includes bibliographical references (leaves 130-134).
Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Cross-Lingual Information Retrieval (CLIR) --- p.2
Chapter 1.2 --- Bilingual Term Association Mining --- p.5
Chapter 1.3 --- Our Contributions --- p.6
Chapter 1.3.1 --- CLIR --- p.6
Chapter 1.3.2 --- Bilingual Term Association Mining --- p.7
Chapter 1.4 --- Thesis Organization --- p.8
Chapter 2 --- Related Work --- p.9
Chapter 2.1 --- CLIR Techniques --- p.9
Chapter 2.1.1 --- Existing Approaches --- p.9
Chapter 2.1.2 --- Difference Between Our Model and Existing Approaches --- p.13
Chapter 2.2 --- Bilingual Term Association Mining Techniques --- p.13
Chapter 2.2.1 --- Existing Approaches --- p.13
Chapter 2.2.2 --- Difference Between Our Model and Existing Approaches --- p.17
Chapter 3 --- Cross-Lingual Information Retrieval (CLIR) --- p.18
Chapter 3.1 --- Cross-Lingual Query Processing and Translation --- p.18
Chapter 3.1.1 --- Query Context and Document Context Generation --- p.20
Chapter 3.1.2 --- Context-Based Query Translation --- p.23
Chapter 3.1.3 --- Query Term Weighting --- p.28
Chapter 3.1.4 --- Final Weight Calculation --- p.30
Chapter 3.2 --- Retrieval on Documents and Automated Summaries --- p.32
Chapter 4 --- Experiments on Cross-Lingual Information Retrieval --- p.38
Chapter 4.1 --- Experimental Setup --- p.38
Chapter 4.2 --- Results of English-to-Chinese Retrieval --- p.45
Chapter 4.2.1 --- Using Mono-Lingual Retrieval as the Gold Standard --- p.45
Chapter 4.2.2 --- Using Human Relevance Judgments as the Gold Standard --- p.49
Chapter 4.3 --- Results of Chinese-to-English Retrieval --- p.53
Chapter 4.3.1 --- Using Mono-lingual Retrieval as the Gold Standard --- p.53
Chapter 4.3.2 --- Using Human Relevance Judgments as the Gold Standard --- p.57
Chapter 5 --- Discovering Comparable Multi-lingual Online News for Text Mining --- p.61
Chapter 5.1 --- Story Representation --- p.62
Chapter 5.2 --- Gloss Translation --- p.64
Chapter 5.3 --- Comparable News Discovery --- p.67
Chapter 6 --- Mining Bilingual Term Association Based on Co-occurrence --- p.75
Chapter 6.1 --- Bilingual Term Cognate Generation --- p.75
Chapter 6.2 --- Term Mining Algorithm --- p.77
Chapter 7 --- Phonetic Matching --- p.87
Chapter 7.1 --- Algorithm Design --- p.87
Chapter 7.2 --- Discovering Associations of English Terms and Chinese Terms --- p.93
Chapter 7.2.1 --- Converting English Terms into Phonetic Representation --- p.93
Chapter 7.2.2 --- Discovering Associations of English Terms and Mandarin Chinese Terms --- p.100
Chapter 7.2.3 --- Discovering Associations of English Terms and Cantonese Chinese Terms --- p.104
Chapter 8 --- Experiments on Bilingual Term Association Mining --- p.111
Chapter 8.1 --- Experimental Setup --- p.111
Chapter 8.2 --- Result and Discussion of Bilingual Term Association Mining Based on Co-occurrence --- p.114
Chapter 8.3 --- Result and Discussion of Phonetic Matching --- p.121
Chapter 9 --- Conclusions and Future Work --- p.126
Chapter 9.1 --- Conclusions --- p.126
Chapter 9.1.1 --- CLIR --- p.126
Chapter 9.1.2 --- Bilingual Term Association Mining --- p.127
Chapter 9.2 --- Future Work --- p.128
Bibliography --- p.134
Chapter A --- Original English Queries --- p.135
Chapter B --- Manual translated Chinese Queries --- p.137
Chapter C --- Pronunciation symbols used by the PRONLEX Lexicon --- p.139
Chapter D --- Initial Letter-to-Phoneme Tags --- p.141
Chapter E --- English Sounds with their Chinese Equivalents --- p.143
"Named entity translation matching and learning with mining from multilingual news." 2004. http://library.cuhk.edu.hk/record=b5892099.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2004.
Includes bibliographical references (leaves 79-82).
Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Named Entity Translation Matching --- p.2
Chapter 1.2 --- Mining New Translations from News --- p.3
Chapter 1.3 --- Thesis Organization --- p.4
Chapter 2 --- Related Work --- p.5
Chapter 3 --- Named Entity Matching Model --- p.9
Chapter 3.1 --- Problem Nature --- p.9
Chapter 3.2 --- Matching Model Investigation --- p.12
Chapter 3.3 --- Tokenization --- p.15
Chapter 3.4 --- Hybrid Semantic and Phonetic Matching Algorithm --- p.16
Chapter 4 --- Phonetic Matching Model --- p.22
Chapter 4.1 --- Generating Phonetic Representation for English --- p.22
Chapter 4.1.1 --- Phoneme Generation --- p.22
Chapter 4.1.2 --- Training the Tagging Lexicon and Transformation Rules --- p.25
Chapter 4.2 --- Generating Phonetic Representation for Chinese --- p.29
Chapter 4.3 --- Phonetic Matching Algorithm --- p.31
Chapter 5 --- Learning Phonetic Similarity --- p.37
Chapter 5.1 --- The Widrow-Hoff Algorithm --- p.39
Chapter 5.2 --- The Exponentiated-Gradient Algorithm --- p.41
Chapter 5.3 --- The Genetic Algorithm --- p.42
Chapter 6 --- Experiments on Named Entity Matching Model --- p.43
Chapter 6.1 --- Results for Learning Phonetic Similarity --- p.44
Chapter 6.2 --- Results for Named Entity Matching --- p.46
Chapter 7 --- Mining New Entity Translations from News --- p.48
Chapter 7.1 --- Metadata Generation --- p.52
Chapter 7.2 --- Discovering Comparable News Cluster --- p.54
Chapter 7.2.1 --- News Preprocessing --- p.54
Chapter 7.2.2 --- Gloss Translation --- p.55
Chapter 7.2.3 --- Comparable News Cluster Discovery --- p.62
Chapter 7.3 --- Named Entity Cognate Generation --- p.64
Chapter 7.4 --- Entity Matching --- p.66
Chapter 7.4.1 --- Matching Algorithm --- p.66
Chapter 7.4.2 --- Matching Result Production --- p.68
Chapter 8 --- Experiments on Mining New Translations --- p.69
Chapter 9 --- Experiments on Context-based Gloss Translation --- p.72
Chapter 9.1 --- Results on Chinese News Translation --- p.73
Chapter 9.2 --- Results on Arabic News Translation --- p.75
Chapter 10 --- Conclusions and Future Work --- p.77
Bibliography --- p.79
A --- p.83
B --- p.85
C --- p.87
D --- p.89
E --- p.91
F --- p.94
G --- p.95
Bian, Guo-Wei, and 邊國維. "The Study of Query Translation and Document Translation in a Cross-Language Information Retrieval System." Thesis, 1999. http://ndltd.ncl.edu.tw/handle/42168106915261766587.
國立臺灣大學
資訊工程學研究所
87
The Internet and digital libraries make available heterogeneous collections in various languages. They provide many useful and powerful information dissemination services. However, about 80% of Web sites are in English and about 40% of Internet users do not speak English. The language barrier becomes a major problem for people who want to search, retrieve, and understand materials in different languages. Incorporating machine translation technologies into text processing has been shown to be very important in the information age. In this dissertation, we first present a general model of a multilingual information access system that integrates text processing systems and language translation systems. A distributed English-Chinese system on the WWW is introduced to illustrate how to integrate query translation, search engines, and a web translation system. This system can help users access and retrieve documents on the WWW in their native language(s). This dissertation deals with the translation ambiguity and target polysemy problems together. For translation disambiguation, we describe a new hybrid approach that combines the dictionary-based and corpus-based approaches to Chinese-English Cross-Language Information Retrieval (CLIR). The bilingual dictionary provides the translation equivalents of each query term, and word co-occurrence information trained from the retrieval document collection or a monolingual corpus can be used to disambiguate the translation. Further, we investigate the roles of phrase-level translation and short queries by comparing them with word-level translation and long queries under different selection strategies. Several experiments on query translation for CLIR have been simulated, and they have shown the applicability of the approach to short queries on the WWW. We also discuss the multiplication effects of translation ambiguity and target polysemy in cross-language information retrieval, and propose a new translation method to resolve these problems.
Two monolingual balanced corpora are employed to learn word co-occurrence for translation ambiguity resolution, and augmented translation restrictions for target polysemy resolution. We also analyze two factors: word sense ambiguity in the source language (translation ambiguity) and word sense ambiguity in the target language (target polysemy). The statistics of word sense ambiguities show that target polysemy resolution is critical in Chinese-English information retrieval. The capability of machine translation (MT) is incorporated into the World Wide Web. An on-line, real-time English-to-Chinese machine translation system has been developed and evaluated. It can be treated as a Chinese document generating system that dynamically produces Chinese or bilingual English-Chinese versions of documents from English web pages. A quantitative study of 100,000 web pages and the 30 most requested WWW sites reflects the importance of the tradeoff between speed and translation quality for document translation. On average, HTML analysis and machine translation take 4.66 seconds. The correct rates of tagging and lexical selection are 97.36% and 85.37%, respectively. For end users, this system can be used as a multilingual information access system or a cross-language information retrieval system on the Internet. It can help users retrieve and understand web pages while they navigate the WWW. Since July 1997, more than 90,000 users have accessed our system and about 450,000 English web pages have been translated into Chinese or bilingual English-Chinese versions. The average degree of user satisfaction at the document level is 67.47%.
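The hybrid idea described in this abstract, in which a bilingual dictionary supplies candidate translations and target-language co-occurrence statistics choose among them, can be illustrated with a small sketch. It is not the dissertation's algorithm: the function name, the exhaustive search over combinations, the raw-count scoring, and the toy dictionary and counts (with pinyin keys standing in for Chinese query terms) are all assumptions made for illustration.

```python
from itertools import product


def disambiguate(query_terms, dictionary, cooccurrence):
    """Choose one translation per query term.

    The combination whose translations co-occur most often in a
    target-language corpus wins. Exhaustive search is exponential in
    the query length, so this is only practical for short queries.
    """
    options = [dictionary[term] for term in query_terms]

    def score(combo):
        # Sum the co-occurrence counts over all pairs in the combination.
        return sum(
            cooccurrence.get(frozenset((a, b)), 0)
            for i, a in enumerate(combo)
            for b in combo[i + 1:]
        )

    return list(max(product(*options), key=score))


# Hypothetical dictionary entries and corpus counts.
dictionary = {
    "zhengzhi": ["politics", "political"],
    "yundong": ["sport", "exercise", "movement"],
}
cooccurrence = {
    frozenset(("political", "movement")): 42,
    frozenset(("politics", "sport")): 3,
}
print(disambiguate(["zhengzhi", "yundong"], dictionary, cooccurrence))
# → ['political', 'movement']
```

A real system would likely normalize the counts (for example with mutual information) and prune the search, but the abstract does not say which strategy was used.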
Wang, Yu-Chun, and 王昱鈞. "Web-based Named Entity Translation Method for Korean-Chinese and Japanese-Chinese Cross-language Information Retrieval." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/32466877086955994100.
國立臺灣大學
電機工程學研究所
96
Named entity (NE) translation plays an important role in many applications, such as information retrieval and machine translation. In this paper, we focus on translating NEs from Korean/Japanese to Chinese in order to improve Korean-Chinese and Japanese-Chinese cross-language information retrieval. The ideographic nature of Chinese makes NE translation difficult because one syllable may map to several Chinese characters. We propose a hybrid NE translation system. First, we integrate two online databases to extend the coverage of our bilingual dictionaries. We use Wikipedia as a translation tool based on the inter-language links between the Korean/Japanese edition and the Chinese or English editions. We also use Naver.com's people search engine to find a query name's Chinese or English translation. The second component of our system learns Korean-Chinese (K-C), Korean-English (K-E), and English-Chinese (E-C) translation patterns from the web. These patterns can be used to extract K-C, K-E and E-C pairs from Google snippets. We also have Japanese-Chinese (J-C) and Japanese-English (J-E) translation patterns for translating Japanese NEs. We found CLIR performance using this hybrid configuration to be over five times better than that of a dictionary-based configuration using only the bilingual dictionary. Mean average precision was as high as 0.3385 and recall reached 0.7578. Our method can handle Chinese, Japanese, Korean, and non-CJK NE translation and substantially improves CLIR performance.
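For readers unfamiliar with the two figures reported here, the following sketch shows how mean average precision and recall are conventionally computed from ranked result lists. The function names and the toy document IDs are hypothetical, and the thesis may have used an evaluation tool with slightly different conventions (for example, a cutoff on the ranking depth).

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at each rank where a relevant doc appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def recall(ranked_ids, relevant_ids):
    """Fraction of the relevant documents that appear in the ranking."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids)) / len(relevant)


# Toy example with two queries.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6"}),                    # AP = 1/2
]
print(round(mean_average_precision(runs), 4))  # → 0.6667
print(recall(["d1", "d2"], {"d1", "d3"}))      # → 0.5
```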
Wang, Yu-Chun. "Web-based Named Entity Translation Method for Korean-Chinese and Japanese-Chinese Cross-language Information Retrieval." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0001-2207200819095000.