Dissertations / Theses: 'XML Retrieval'

1

Blanke, Tobias. "Theoretical evaluation of XML retrieval." Thesis, University of Glasgow, 2011. http://theses.gla.ac.uk/2828/.

Abstract:

This thesis develops a theoretical framework to evaluate XML retrieval. XML retrieval deals with retrieving those document parts that specifically answer a query. It is concerned with using the document structure to improve the retrieval of information from documents by only delivering those parts of a document an information need is about. We define a theoretical evaluation methodology based on the idea of 'aboutness' and apply it to XML retrieval models. Situation Theory is used to express the aboutness properties of XML retrieval models. We develop a dedicated methodology for the evaluation of XML retrieval and apply this methodology to five XML retrieval models and other XML retrieval topics such as evaluation methodologies, filters and experimental results.

2

Pehcevski, Jovan. "Evaluation of Effective XML Information Retrieval." RMIT University. Computer Science and Information Technology, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080104.142709.

Abstract:

XML is being adopted as a common storage format in scientific data repositories, digital libraries, and on the World Wide Web. Accordingly, there is a need for content-oriented XML retrieval systems that can efficiently and effectively store, search and retrieve information from XML document collections. Unlike traditional information retrieval systems where whole documents are usually indexed and retrieved as information units, XML retrieval systems typically index and retrieve document components of varying granularity. To evaluate the effectiveness of such systems, test collections where relevance assessments are provided according to an XML-specific definition of relevance are necessary. Such test collections have been built during four rounds of the INitiative for the Evaluation of XML Retrieval (INEX). There are many different approaches to XML retrieval; most approaches either extend full-text information retrieval systems to handle XML retrieval, or use database technologies that incorporate existing XML standards to handle both XML presentation and retrieval. We present a hybrid approach to XML retrieval that combines text information retrieval features with XML-specific features found in a native XML database. Results from our experiments on the INEX 2003 and 2004 test collections demonstrate the usefulness of applying our hybrid approach to different XML retrieval tasks. A realistic definition of relevance is necessary for meaningful comparison of alternative XML retrieval approaches. The three relevance definitions used by INEX since 2002 comprise two relevance dimensions, each based on topical relevance. We perform an extensive analysis of the two INEX 2004 and 2005 relevance definitions, and show that assessors and users find them difficult to understand. We propose a new definition of relevance for XML retrieval, and demonstrate that a relevance scale based on this definition is useful for XML retrieval experiments. 
Finding the appropriate approach to evaluate XML retrieval effectiveness is the subject of ongoing debate within the XML information retrieval research community. We present an overview of the evaluation methodologies implemented in the current INEX metrics, which reveals that the metrics follow different assumptions and measure different XML retrieval behaviours. We propose a new evaluation metric for XML retrieval and conduct an extensive analysis of the retrieval performance of simulated runs to show what is measured. We compare the evaluation behaviour obtained with the new metric to the behaviours obtained with two of the official INEX 2005 metrics, and demonstrate that the new metric can be used to reliably evaluate XML retrieval effectiveness. To analyse the effectiveness of XML retrieval in different application scenarios, we use evaluation measures in our new metric to investigate the behaviour of XML retrieval approaches under the following two scenarios: the ad-hoc retrieval scenario, exploring the activities carried out as part of the INEX 2005 Ad-hoc track; and the multimedia retrieval scenario, exploring the activities carried out as part of the INEX 2005 Multimedia track. For both application scenarios we show that, although different values for retrieval parameters are needed to achieve the optimal performance, the desired textual or multimedia information can be effectively located using a combination of XML retrieval approaches.

3

Sanz, Blasco Ismael. "Flexible techniques for heterogeneous XML data retrieval." Doctoral thesis, Universitat Jaume I, 2007. http://hdl.handle.net/10803/10373.

Abstract:

The progressive adoption of XML by new communities of users has motivated the appearance of applications that require the management of large and complex collections, which present a large amount of heterogeneity. Some relevant examples are present in the fields of bioinformatics, cultural heritage, ontology management and geographic information systems, where heterogeneity is not only reflected in the textual content of documents, but also in the presence of rich structures which cannot be properly accounted for using fixed schema definitions. Current approaches for dealing with heterogeneous XML data are, however, mainly focused at the content level, whereas at the structural level only a limited amount of heterogeneity is tolerated; for instance, weakening the parent-child relationship between nodes into the ancestor-descendant relationship.
The main objective of this thesis is devising new approaches for querying heterogeneous XML collections. This general objective has several implications: First, a collection can present different levels of heterogeneity in different granularity levels; this fact has a significant impact in the selection of specific approaches for handling, indexing and querying the collection. Therefore, several metrics are proposed for evaluating the level of heterogeneity at different levels, based on information-theoretical considerations. These metrics can be employed for characterizing collections, and clustering together those collections which present similar characteristics.
Second, the high structural variability implies that query techniques based on exact tree matching, such as the standard XPath and XQuery languages, are not suitable for heterogeneous XML collections. As a consequence, approximate querying techniques based on similarity measures must be adopted. Within the thesis, we present a formal framework for the creation of similarity measures which is based on a study of the literature that shows that most approaches for approximate XML retrieval (i) are highly tailored to very specific problems and (ii) use similarity measures for ranking that can be expressed as ad-hoc combinations of a set of 'basic' measures. Some examples of these widely used measures are tf-idf for textual information and several variations of edit distances. Our approach wraps these basic measures into generic, parametrizable components that can be combined into complex measures by exploiting the composite pattern, commonly used in Software Engineering. This approach also allows us to seamlessly integrate highly specific measures, such as protein-oriented matching functions.
Finally, these measures are employed for the approximate retrieval of data in a context of high structural heterogeneity, using a new approach based on the concepts of pattern and fragment. In our context, a pattern is a concise representation of the information needs of a user, and a fragment is a match of a pattern found in the database. A pattern consists of a set of tree-structured elements, basically an XML subtree that is intended to be found in the database, but with a flexible semantics that is strongly dependent on a particular similarity measure. For example, depending on a particular measure, the particular hierarchy of elements, or the ordering of siblings, may or may not be deemed to be relevant when searching for occurrences in the database.
Fragment matching, as a query primitive, can deal with a much higher degree of flexibility than existing approaches. In this thesis we provide exhaustive and top-k query algorithms. In the latter case, we adopt an approach that does not require the similarity measure to be monotonic, as all previous XML top-k algorithms (usually based on Fagin's algorithm) do. We also present two extensions which are important in practical settings: a specification for the integration of the aforementioned techniques into XQuery, and a clustering algorithm that is useful to manage complex result sets.
All of the algorithms have been implemented as part of ArHeX, a toolkit for the development of multi-similarity XML applications, which supports fragment-based queries through an extension of the XQuery language, and includes graphical tools for designing similarity measures and querying collections. We have used ArHeX to demonstrate the effectiveness of our approach using both synthetic and real data sets, in the context of a biomedical research project.
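
The abstract above describes wrapping basic measures (such as edit distance) into components combined through the composite pattern. A minimal sketch of that idea follows; it is not taken from ArHeX, and the class names `Measure`, `TokenOverlap`, `EditSimilarity` and `WeightedSum` are hypothetical, chosen only for illustration.

```python
from abc import ABC, abstractmethod


class Measure(ABC):
    """Common interface (hypothetical): a similarity in [0, 1] between two strings."""

    @abstractmethod
    def score(self, a: str, b: str) -> float:
        ...


class TokenOverlap(Measure):
    """Basic measure: Jaccard overlap of lower-cased whitespace tokens."""

    def score(self, a: str, b: str) -> float:
        ta, tb = set(a.lower().split()), set(b.lower().split())
        if not ta and not tb:
            return 1.0
        return len(ta & tb) / len(ta | tb)


class EditSimilarity(Measure):
    """Basic measure: 1 minus the length-normalized Levenshtein distance."""

    def score(self, a: str, b: str) -> float:
        if not a and not b:
            return 1.0
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                # deletion
                               cur[j - 1] + 1,             # insertion
                               prev[j - 1] + (ca != cb)))  # substitution
            prev = cur
        return 1.0 - prev[-1] / max(len(a), len(b))


class WeightedSum(Measure):
    """Composite: weighted average of child measures, which may be composites."""

    def __init__(self, parts):
        self.parts = parts  # list of (weight, Measure) pairs

    def score(self, a: str, b: str) -> float:
        total = sum(w for w, _ in self.parts)
        return sum(w * m.score(a, b) for w, m in self.parts) / total
```

Because `WeightedSum` is itself a `Measure`, composites can be nested, for example `WeightedSum([(2.0, WeightedSum([(1.0, TokenOverlap()), (1.0, EditSimilarity())])), (1.0, TokenOverlap())])`.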

4

Dopichaj, Philipp. "Content oriented retrieval on document centric XML." München Verl. Dr. Hut, 2007. http://d-nb.info/987370731/04.

5

Shen, Yun. "Accelerating data retrieval steps in XML documents." Thesis, University of Hull, 2005. http://hydra.hull.ac.uk/resources/hull:8310.

Abstract:

The aim of this research is to accelerate the data retrieval steps in a collection of XML (eXtensible Markup Language) documents, a key task of current XML research. The following three inter-connected issues relating to state-of-the-art XML research are thus studied: semantically clustering XML documents, efficiently querying XML documents with an index structure and self-adaptively labelling dynamic XML documents, which form a basic but self-contained foundation of a native XML database system. This research is carried out by following a divide-and-conquer strategy. The issue of dividing a collection of XML documents into sub-clusters, in which semantically similar XML documents are grouped together, is addressed first. To achieve this purpose, a semantic component model to model the implicit semantics of an XML document is proposed. This model enables us to devise a set of heuristic algorithms to compute the degree of similarity among XML documents. In particular, the newly proposed semantic component model and the heuristic algorithms reflect the inaccuracy of the traditional edit-distance-based clustering mechanisms. After similar XML documents are grouped into sub-collections, the problem of querying XML documents with an index structure is carefully studied. A novel geometric sequence model is proposed to transform XML documents into numbered geometric sequences and XPath queries into geometric query sequences. The problem of evaluating an XPath query in an XML document is theoretically proved to be equivalent to the problem of finding the subsequence matchings of a geometric query sequence in a numbered geometric document sequence. This geometric sequence model then enables us to devise two new stack-based algorithms to perform both top-down and bottom-up XPath evaluation in XML documents.
In particular, the algorithms treat an XPath query as a whole unit, avoiding resource-consuming join operations and generating all the answers without semantic errors and false alarms. Finally, the issue of supporting update functions in XML documents is tackled. A new Bayesian allocation model is introduced for the index structure generated in the geometric sequence model. Based on the k-ary tree data structure and the level traversal mechanism, the correctness and efficiency of the Bayesian allocation model in supporting dynamic XML documents is theoretically proved. In particular, the Bayesian allocation model is general and can be applied to most of the current index structures.

6

PANZERI, EMANUELE. "Enhanced XML Retrieval with Flexible Constraints Evaluation." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2014. http://hdl.handle.net/10281/50791.

Abstract:

Since its standardization by the World Wide Web Consortium (W3C) in 1998, XML (eXtensible Markup Language) has been acknowledged as the de-facto standard format for data, besides being a data format employed by a wide and increasing number of application domains. XML allows data and textual contents to be structured; the structural elements are specified in plain text using strings of characters that can be easily read by computer programs, while maintaining human-readability. XPath and XQuery represent the two main standard languages that have been defined to query XML data; the two languages allow selecting a subset of elements from an XML document, further manipulating its contents and restructuring the document tree form. Both XPath and XQuery are based on a Database perspective of XML documents, where the evaluation of the query clauses is performed like in the database query language SQL, from which both the XML languages took inspiration. The data-centric perspective adopted by the XQuery and XPath languages has been recently extended by an Information Retrieval oriented approach, where a new set of content-based constraints have been defined that allow a full-text search in an IR-style, with an element relevance scoring computation. This extension is called XQuery/XPath Full-Text and has been standardized by the W3C. In the Information Retrieval community other approaches have appeared that take into account the document structure and propose a set of approximate structural matching techniques, where the standard XQuery and XPath structural constraints are evaluated by path relaxation algorithms. Such approaches, however, do not offer the user the possibility to express vague structural constraints, the approximate evaluation of which produces a set of weighted fragments, where the weight expresses the relevance of the fragment with respect to the structural constraints.
This thesis describes the definition and the implementation of a formal XQuery Full-Text extension named FleXy, aimed at taking into account the user perspective in the formulation of structure-based constraints, where vagueness can be associated with the specification of such constraints. FleXy has been designed as an extension of the XQuery Full-Text language to inherit both the full-text search features from the Full-Text extension, and the standard element selection provided by XQuery. The evaluation of two new vague structural constraints defined in the FleXy language, named Below and Near, produces a set of weighted elements, where a structural score is computed by taking into account the distance between the target element required by the user and the one actually retrieved. Threshold variants of the Below and Near constraints have also been defined, which allow specifying the extent of the application of the vague structural constraints. The formal definition of the FleXy language is here provided through its syntax, its semantics, and the algorithms that define the Below and the Near axes. The language implementation has been performed on top of an Open Source XQuery engine named BaseX, a fully featured XQuery and XPath engine with a complete adherence to the Full-Text language specification. Performance evaluations have been subsequently provided to compare the FleXy constraints with the standard XQuery counterparts, when available. Finally, a patent search application has been developed by leveraging the FleXy implementation provided on top of the BaseX engine: the XML structure of the US Patent Collection (USPTO) has been exploited in conjunction with the textual contents of the patents to help non-expert users to effectively retrieve relevant patents by also offering a result categorization strategy.
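
To make the idea of a weighted, vague structural constraint concrete, here is a small sketch in Python rather than XQuery. It is not FleXy's algorithm: the function name `below`, the sample document `DOC`, and the scoring rule (1 divided by the depth at which the element is found) are all assumptions made for illustration, since the abstract does not give the actual score formula.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one <title> is a direct child of <patent>,
# the other is nested three levels down.
DOC = ET.fromstring(
    "<patent><claims><claim><title>A</title></claim></claims>"
    "<title>B</title></patent>"
)


def below(context, tag):
    """Return (element, score) pairs for every `tag` element under `context`.

    Assumed scoring: 1 / depth, so a direct child scores 1.0 and deeper
    matches score less. This is an illustrative choice, not FleXy's formula.
    """
    results = []

    def walk(node, depth):
        for child in node:
            if child.tag == tag:
                results.append((child, 1.0 / depth))
            walk(child, depth + 1)

    walk(context, 1)
    # Best (closest) matches first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

For comparison, the strict child step `DOC.findall("title")` matches only the direct child, while the descendant step `DOC.findall(".//title")` matches both elements but without any weighting; `below(DOC, "title")` returns both and ranks the closer one first.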

7

Bremer, Jan-Marco. "Next-generation information retrieval : integrating document and data retrieval based on XML /." For electronic version search Digital dissertations database. Restricted to UC campuses. Access is free to UC campus dissertations, 2003. http://uclibs.org/PID/11984.

8

Wulff, Sascha. "Integration von Information-Retrieval-Funktionalität in XML-Repositories." Zürich : Eidgenössische Technische Hochschule Zürich, Institut für Informationssysteme, Fachgruppe Datenbanken, 2002. http://e-collection.ethbib.ethz.ch/show?type=dipl&nr=30.

9

Ashoori, Elham. "Using Topic Shifts in Content-Oriented XML Retrieval." Thesis, Queen Mary, University of London, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.509726.

10

Lam, Franky Shung Lai. "Optimization techniques for XML databases." Awarded by: University of New South Wales. Chemical Sciences & Engineering, 2007. http://handle.unsw.edu.au/1959.4/40702.

Abstract:

In this thesis, we address several fundamental concerns of maintaining and querying huge ordered label trees. We focus on practical implementation issues of storage, updating and query optimization in XML database management systems. Specifically, we address the XML order maintenance problem, efficient evaluation of structural joins, intrinsic skew handling of joins, succinct storage of XML data and update synchronization of mobile XML data.

11

Weigel, Felix. "Structural Summaries as a Core Technology for Efficient XML Retrieval." Diss., lmu, 2006. http://nbn-resolving.de/urn:nbn:de:bvb:19-62594.

12

Fuhr, Norbert, Kai Grossjohann, and Sascha Kriewel. "A Query Language and User Interface for XML Information Retrieval." Gerhard-Mercator-Universitaet Duisburg, 2004. http://www.ub.uni-duisburg.de/ETD-db/theses/available/duett-07022004-114955/.

Abstract:

In: Intelligent Search on XML Data: Applications, Languages, Models, Implementations, and Benchmarks / Henk Blanken ... [et al.] (eds.). Berlin: Springer, 2003. ISBN 3-540-40768-5, pp. 59-75.

13

Fuhr, Norbert, Norbert Goevert, and Mohammad Abolhassani. "Retrieval Quality vs. Effectiveness of Relevance-Oriented Search in XML Documents." Gerhard-Mercator-Universitaet Duisburg, 2004. http://www.ub.uni-duisburg.de/ETD-db/theses/available/duett-07072004-114441/.

14

Arslan, Serdar. "An Xml Based Content-based Image Retrieval System With Mpeg-7 Descriptors." Master's thesis, METU, 2004. http://etd.lib.metu.edu.tr/upload/12605750/index.pdf.

Abstract:

Recently, very large collections of images and videos have grown rapidly. In parallel with this growth, content-based retrieval and querying the indexed collections are required to access visual information. Three main components of the visual information are color, texture and shape. In this thesis, an XML based content-based image retrieval system is presented that combines three visual descriptors of MPEG-7 and measures similarity of images by applying a distance function. An XML database is used for storing these three descriptors. The system is also extended to support high dimensional indexing for efficient search and retrieval from its XML database. To do this, an index structure, called M-Tree, is implemented which uses weighted Euclidean distance function for similarity measure. Ordered Weighted Aggregation (OWA) operators are used to define the weights of the distance function and to combine three features' distance functions into one. The system supports nearest neighbor queries and three types of fuzzy queries: feature-based, image-based and color-based queries. Also it is shown through experimental results and analysis of retrieval effectiveness of querying that the content-based retrieval system is effective in terms of retrieval and scalability.
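
The two building blocks named in the abstract, a weighted Euclidean distance and an OWA operator, can be sketched as follows. This is a generic illustration of the standard definitions, not the thesis's implementation; the function names and the example values in the note below are assumptions.

```python
import math


def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def owa(values, weights):
    """OWA operator: weights are applied to the values sorted in descending
    order, so a weight belongs to a rank position, not to a particular input."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))
```

With weights `[1, 0, 0]` the OWA operator reduces to the maximum and with `[0, 0, 1]` to the minimum, which is why it is a convenient way to blend, say, hypothetical color, texture and shape distances of one image pair into a single value between those two extremes.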

15

Grabs, Torsten. "Storage and retrieval of XML documents with a cluster of database systems /." Berlin : Aka, 2003. http://www.loc.gov/catdir/toc/fy0713/2007435297.html.

16

Pridaphattharakun, Wilasini. "Information retrieval and answer extraction for an XML knowledge base in WebNL." [Gainesville, Fla.] : University of Florida, 2001. http://purl.fcla.edu/fcla/etd/UFE0000344.

Abstract:

Thesis (M.S.)--University of Florida, 2001.
Title from title page of source document. Document formatted into pages; contains xiii, 71 p.; also contains graphics. Includes vita. Includes bibliographical references.

17

Calvo, Antonio M. "MARC to XML : an enhanced name authority record /." Examples, 2000. http://senna.sjsu.edu/lmain/Nomen/.

18

Popovici, Eugen-Costin. "Information retrieval of text, structure and sequential data in heterogeneous XML document collections." Lorient, 2008. http://www.theses.fr/2008LORIS110.

Abstract:

Nowadays digital documents represent a complex and heterogeneous mixture of text, structure, meta-data and multimedia information. The XML description language is now the standard used to represent such documents in digital libraries, product catalogues, scientific data repositories and across the Web. The management of semi-structured data requires the development of appropriate indexing, filtering, searching and browsing methods and tools. In particular, the filtering and searching functions of the retrieval systems should be able to answer queries based on incomplete, imprecise or even erroneous knowledge about both the structure and the content of the XML documents. Moreover, these functions should maintain an algorithmic complexity compatible with the complexity of the data while maintaining the scalability of the system. In this thesis, we explore methods for managing and searching collections of heterogeneous multimedia XML documents. We focus on the flexible searching of structure, text, and sequential data embedded in heterogeneous XML document databases. Based on the structural part of the XML documents, we propose a flexible model for the representation, indexing and retrieval of heterogeneous types of sequential data. The matching mechanism simultaneously exploits the structural organization of the sequential/textual data as well as the relevance and the characteristics of the unstructured content of the indexed documents. We also design and evaluate methods both for the approximate matching of structural constraints in an XML Information Retrieval (IR) framework and for the detection of best entry points to locate given topics in XML documents. Finally, we explore the use of a dedicated hardware architecture to accelerate the most expensive processing steps of our XML IR application.

19

Kazai, Gabriella. "Evaluation of focused retrieval approaches in the context of context-oriented XML information retrieval : Test collection construction and effectiveness measures." Thesis, Queen Mary, University of London, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.515465.

20

Raymond, Scott P. "Operation and Maintenance Support Information (OMSI) creation, management, and repurposing with XML." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2004. http://library.nps.navy.mil/uhtbin/hyperion/04Sep%5FRaymond.pdf.

Abstract:

Thesis (M.S. in Information Technology Management)--Naval Postgraduate School, Sept. 2004.
Thesis Advisor(s): Daniel R. Dolk, Gordon H. Bradley. Includes bibliographical references (p. 119-120, 121-122). Also available online.

21

Miyake, Tina M. "Metacognition, proactive interference, and working memory can people monitor for proactive interference at encoding and retrieval? /." Greensboro, N.C. : University of North Carolina at Greensboro, 2007. http://libres.uncg.edu/edocs/etd/1441/umi-uncg-1441.xml.

Abstract:

Thesis (Ph.D.)--University of North Carolina at Greensboro, 2007.
Title from PDF t.p. (viewed Oct. 22, 2007). Directed by Michael J. Kane; submitted to the Dept. of Psychology. Includes bibliographical references (p. 76-82).

22

Li, Jianxin. "Adaptive query relaxation and processing over heterogeneous xml data sources." Swinburne Research Bank, 2009. http://hdl.handle.net/1959.3/66874.

Abstract:

Thesis (Ph.D) - Swinburne University of Technology, Faculty of Information & Communication Technologies, 2009.
A dissertation submitted to the Faculty of Information and Communication Technologies, Swinburne University of Technology in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2009. Typescript. "August 2009". Bibliography p. 161-171.

23

Woodley, Alan Paul. "NLPX : a natural language query interface for facilitating user-oriented XML-IR." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/16642/1/Alan_Woodley_Thesis.pdf.

Abstract:

Most information retrieval (IR) systems respond to users' representation of their information needs (queries) with a ranked list of relevant results, usually text documents. XML documents differ from traditional text documents by explicitly separating structure and content. XML-IR systems aim to exploit this separation by searching and retrieving relevant components of documents (called elements) rather than entire documents, thereby better fulfilling users' information needs. Despite the potential benefit of XML-IR systems, most research in this area has not been centered on the needs of users. In particular, current XML-IR query formation interfaces, namely keywords-only and formal language, are not able to optimally address the needs of users. Keywords-only interfaces are too unsophisticated to fully capture the users' complex information needs that contain both content and structural requirements. In contrast, while formal languages are able to capture users' content and structural requirements, they are too difficult to use, even for experts, and are too closely tied to the physical structure of the collection. This thesis presents a solution to these problems by presenting NLPX, a natural language interface for XML-IR systems. NLPX allows users to enter XML-IR queries in natural language and translates them into a formal language (NEXI) to be processed by existing XML retrieval systems. When evaluated by system testing, NLPX outperformed alternative translation approaches. When tested in a user-based experiment, NLPX performed comparably to a query-by-template interface, the baseline user-oriented interface for formulating structured queries. It is hoped that the outcomes of this thesis will help to refocus the field of XML-IR around the user. This will lead to the development of more useful XML-IR systems, which will hopefully result in the more widespread use of XML-IR systems.

24

Woodley, Alan Paul. "NLPX : a natural language query interface for facilitating user-oriented XML-IR." Queensland University of Technology, 2008. http://eprints.qut.edu.au/16642/.

Abstract:

Most information retrieval (IR) systems respond to users' representation of their information needs (queries) with a ranked list of relevant results, usually text documents. XML documents differ from traditional text documents by explicitly separating structure and content. XML-IR systems aim to exploit this separation by searching and retrieving relevant components of documents (called elements) rather than entire documents, thereby better fulfilling users' information needs. Despite the potential benefit of XML-IR systems, most research in this area has not been centered on the needs of users. In particular, current XML-IR query formation interfaces, namely keywords-only and formal language, are not able to optimally address the needs of users. Keywords-only interfaces are too unsophisticated to fully capture the users' complex information needs that contain both content and structural requirements. In contrast, while formal languages are able to capture users' content and structural requirements, they are too difficult to use, even for experts, and are too closely tied to the physical structure of the collection. This thesis presents a solution to these problems by presenting NLPX, a natural language interface for XML-IR systems. NLPX allows users to enter XML-IR queries in natural language and translates them into a formal language (NEXI) to be processed by existing XML retrieval systems. When evaluated by system testing, NLPX outperformed alternative translation approaches. When tested in a user-based experiment, NLPX performed comparably to a query-by-template interface, the baseline user-oriented interface for formulating structured queries. It is hoped that the outcomes of this thesis will help to refocus the field of XML-IR around the user. This will lead to the development of more useful XML-IR systems, which will hopefully result in the more widespread use of XML-IR systems.

25

Branko, Milosavljević. "Proširivi sistem za pronalaženje multimedijalnih dokumenata." Phd thesis, Univerzitet u Novom Sadu, Fakultet tehničkih nauka u Novom Sadu, 2003. http://dx.doi.org/10.2298/NS2003MILOSAVLJEVICBRANKO.

Abstract:

One of the fundamental problems in the field of information retrieval is finding the documents in a collection that are relevant from the user's point of view. This dissertation addresses the retrieval of structured multimedia documents, which may contain objects of different media types (text, images, audio, or video) as their elements. The subject of the dissertation is the formal specification of a model of a multimedia document retrieval system that is extensible both with support for different media types (including the use of existing solutions in this field) and with different document retrieval models. XML is used as the language for representing documents and for communication between the system and its clients. The system is verified on a real-world case, a digital library of doctoral and master's theses, by means of a developed prototype. The prototype implementation meets the functional goals set for the system and thereby confirms the practical value of the proposed model.

26

Buffoni, David. "Learning-to-rank consistent surrogates for information retrieval tasks." Paris 6, 2012. http://www.theses.fr/2012PA066568.

27

Fuhr, Norbert et al (Hrsg /Eds ). "Initiative for the Evaluation of XML Retrieval (INEX) : INEX 2003 Workshop Proceedings, Dagstuhl, Germany, December 15-17, 2003." Gerhard-Mercator-Universitaet Duisburg, 2004. http://www.ub.uni-duisburg.de/ETD-db/theses/available/duett-07012004-093151/.

28

Celik, Cigdem. "An Mpeg-7 Video Database System For Content-based Management And Retrieval." Master's thesis, METU, 2005. http://etd.lib.metu.edu.tr/upload/12606679/index.pdf.

Abstract:

A video data model that allows efficient and effective representation and querying of the spatio-temporal properties of objects has been developed previously. The data model focuses on the semantic content of video streams: objects, events, and the activities performed by objects are its main concerns. The model supports fuzzy spatial queries, including queries on spatial relationships between objects and on object trajectories. In this thesis, that work is used as the basis for an XML-based video database system. The system is designed to comply with the MPEG-7 Multimedia Description Schemes so that it adheres to a universal standard. It is implemented using a native XML database management system, and its query entry facilities are enhanced by integrating an NLP interface.

29

Jeong, ChulSung. "Integrating 14649 STEP-NC protocols with Web-based remote storage and retrieval utilizing XML, ASP.NET, and Oracle database architecture /." Available to subscribers only, 2005. http://proquest.umi.com/pqdweb?did=1079666831&sid=10&Fmt=2&clientId=1509&RQT=309&VName=PQD.

30

Li, Yong. "Contour Based 3D Biological Image Reconstruction and Partial Retrieval." Digital Archive @ GSU, 2007. http://digitalarchive.gsu.edu/cs_diss/29.

Abstract:

Image segmentation is one of the most difficult tasks in image processing. Segmentation algorithms are generally based on searching for a region whose pixels share similar gray-level intensity and satisfy a set of defined criteria. However, the segmented region cannot be used directly for partial image retrieval. In this dissertation, a Contour Based Image Structure (CBIS) model is introduced. In this model, images are divided into several objects defined by their bounding contours. The bounding contour structure allows individual object extraction, as well as partial object matching and retrieval, from a standard CBIS image structure. The CBIS model represents 3D objects by their bounding contours, which suits parallel implementation, particularly because extracting contour features and matching them for 3D images requires heavy computation. This computational burden grows for images with high resolution and large contour density. To address it, we designed two parallel algorithms: the Contour Parallelization Algorithm (CPA) and the Partial Retrieval Parallelization Algorithm (PRPA). Both algorithms considerably improve the performance of CBIS for contour shape matching as well as partial image retrieval. To improve the effectiveness of CBIS in segmenting images with inhomogeneous backgrounds, we used the phase congruency invariant features of Fourier transform components to highlight the boundaries of objects before extracting their contours. The contour matching process has also been improved by constructing a fuzzy contour matching system that allows unbiased matching decisions. Further improvements have been achieved through a contour-tailored Fourier descriptor that provides translation and rotation invariance. It proved suitable for general contour shape matching where translation, rotation, and scaling invariance are required.
For images that are hard to classify by object contours, such as bacterial images, we define a multi-level cosine transform to extract their texture features for image classification. The low-frequency Discrete Cosine Transform coefficients and Zernike moments derived from the images are used to train Support Vector Machines (SVMs), generating multiple classifiers.

31

Pérez, Martínez Juan Manuel. "Contextualizing a Data Warehouse with Documents." Doctoral thesis, Universitat Jaume I, 2007. http://hdl.handle.net/10803/10482.

Abstract:

Current data warehouse technology and OLAP techniques allow organizations to analyze the structured data they collect in their databases. The circumstances surrounding these data are described in documents, which are typically rich in text. This information about the context of the data recorded in the warehouse is very valuable, since it allows us to interpret the results obtained in historical analyses. For example, a financial crisis reported in a digital magazine on economics could explain a drop in sales in a particular region. However, this contextual information cannot be exploited directly with traditional OLAP tools. The main reason is the unstructured, text-rich nature of the documents that contain it. This thesis presents the contextualized warehouse: a new type of decision support system that combines data warehouse technology with information retrieval systems in order to integrate an organization's structured information sources and document sources, and to analyze these data under different contexts.

32

Ives, Zachary G. "Efficient query processing for data integration /." Thesis, Connect to this title online; UW restricted, 2002. http://hdl.handle.net/1773/6864.

33

Tatarinov, Igor. "Semantic data sharing with a peer data management system /." Thesis, Connect to this title online; UW restricted, 2004. http://hdl.handle.net/1773/6942.

34

Oztarak, Hakan. "Structural And Event Based Multimodal Video Data Modeling." Master's thesis, METU, 2005. http://etd.lib.metu.edu.tr/upload/12606919/index.pdf.

Abstract:

Investments in multimedia technology enable us to store many more reflections of the real world in the digital world as videos. By recording videos of real-world entities, we carry a great deal of information directly into the digital world. A video database system (VDBS) is necessary to store and efficiently query this information. In this thesis, we propose a structural, event-based and multimodal (SEBM) video data model for VDBSs. The SEBM video data model supports three modalities (visual, auditory and textual), and we propose that these three modalities can be handled within a single SEBM video data model. This proposal is supported by the way humans interpret video data. Because the video data is stored in the way that the user interprets real-world data, content-based, spatio-temporal and fuzzy queries can be answered more easily. We follow a divide-and-conquer technique when answering very complicated queries. We have implemented the SEBM video data model in a Java-based prototype system that uses XML to represent the SEBM data model and Berkeley XML DBMS to store the data.

35

Tang, Ling-Xiang. "Link discovery for Chinese/English cross-language web information retrieval." Thesis, Queensland University of Technology, 2012. https://eprints.qut.edu.au/58416/1/Ling-Xiang_Tang_Thesis.pdf.

Abstract:

Nowadays people heavily rely on the Internet for information and knowledge. Wikipedia is an online multilingual encyclopaedia that contains a very large number of detailed articles covering most written languages. It is often considered to be a treasury of human knowledge. It includes extensive hypertext links between documents of the same language for easy navigation. However, the pages in different languages are rarely cross-linked except for direct equivalent pages on the same subject in different languages. This could pose serious difficulties to users seeking information or knowledge from different lingual sources, or where there is no equivalent page in one language or another. In this thesis, a new information retrieval task—cross-lingual link discovery (CLLD) is proposed to tackle the problem of the lack of cross-lingual anchored links in a knowledge base such as Wikipedia. In contrast to traditional information retrieval tasks, cross language link discovery algorithms actively recommend a set of meaningful anchors in a source document and establish links to documents in an alternative language. In other words, cross-lingual link discovery is a way of automatically finding hypertext links between documents in different languages, which is particularly helpful for knowledge discovery in different language domains. This study is specifically focused on Chinese / English link discovery (C/ELD). Chinese / English link discovery is a special case of cross-lingual link discovery task. It involves tasks including natural language processing (NLP), cross-lingual information retrieval (CLIR) and cross-lingual link discovery. To justify the effectiveness of CLLD, a standard evaluation framework is also proposed. The evaluation framework includes topics, document collections, a gold standard dataset, evaluation metrics, and toolkits for run pooling, link assessment and system evaluation. 
With the evaluation framework, the performance of CLLD approaches and systems can be quantified. This thesis contributes to the research on natural language processing and cross-lingual information retrieval in CLLD: 1) a new, simple but effective Chinese segmentation method, n-gram mutual information, is presented for determining the boundaries of Chinese text; 2) a voting mechanism for named entity translation is demonstrated for achieving high precision in English / Chinese machine translation; 3) a link mining approach that mines the existing link structure for anchor probabilities achieves encouraging results in suggesting cross-lingual Chinese / English links in Wikipedia. This approach was examined in the experiments on better automatic generation of cross-lingual links that were carried out as part of the study. The overall major contribution of this thesis is the provision of a standard evaluation framework for cross-lingual link discovery research. Such a framework is important in CLLD evaluation because it helps in benchmarking the performance of various CLLD systems and in identifying good CLLD realisation approaches. The evaluation methods and the evaluation framework described in this thesis have been used to quantify system performance in the NTCIR-9 Crosslink task, which is the first information retrieval track of this kind.
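The general idea of segmenting text by mutual information can be illustrated with a small sketch. The abstract does not give the thesis's formulation, so the code below is a simplified stand-in: it scores adjacent character pairs with pointwise mutual information (PMI) estimated from a tiny corpus and inserts a boundary wherever the score falls below a threshold. The toy corpus, the threshold of 1.5, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter


def train(corpus):
    """Count single characters and adjacent character pairs in the corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(sentence[i:i + 2] for i in range(len(sentence) - 1))
    return unigrams, bigrams


def pmi(pair, unigrams, bigrams):
    """Pointwise mutual information of a two-character string."""
    if bigrams[pair] == 0:
        return float("-inf")  # unseen pair: always treated as a boundary
    p_pair = bigrams[pair] / sum(bigrams.values())
    total = sum(unigrams.values())
    p_first = unigrams[pair[0]] / total
    p_second = unigrams[pair[1]] / total
    return math.log(p_pair / (p_first * p_second))


def segment(text, unigrams, bigrams, threshold):
    """Split text wherever adjacent characters are weakly associated."""
    if not text:
        return []
    words = [text[0]]
    for prev, cur in zip(text, text[1:]):
        if pmi(prev + cur, unigrams, bigrams) >= threshold:
            words[-1] += cur   # strongly associated: same word
        else:
            words.append(cur)  # weakly associated: start a new word
    return words


# Toy corpus and threshold, both hypothetical.
corpus = ["信息检索", "信息系统", "检索系统", "信息", "检索"]
unigrams, bigrams = train(corpus)
print(segment("信息检索", unigrams, bigrams, threshold=1.5))
```

On a realistic corpus the threshold would need tuning, and longer n-grams would give more context than the character pairs used here.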

36

Fettke, Peter, and Peter Loos. "Ein Vorschlag zur Spezifikation von Fachkomponenten auf der Administrations-Ebene." Universitätsbibliothek Chemnitz, 2001. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-200100831.

Abstract:

The approach to specifying business components proposed by Turowski does not comprehensively cover several aspects of business component specification. These are, in particular, pieces of information that buyers and sellers of business components need in order to carry out their tasks properly and appropriately. To take such aspects into account when specifying business components, this paper introduces an administration level for the specification. The administration level comprises characteristics relating to the business and organizational manageability and administration of business components. Specifically, a distinction is made between business-semantic characteristics, technical characteristics, and other characteristics of business components. An XML-based proposal is made for specifying these characteristics, and the proposal is discussed critically. Finally, a development path for the specification approach developed by Turowski is presented.

37

Tjondronegoro, Dian W. "PhD Thesis: "Content-based Video Indexing for Sports Applications using Multi-modal approach"." Thesis, Deakin University, 2005. https://eprints.qut.edu.au/2199/1/PhDThesis_Tjondronegoro.pdf.

Abstract:

Triggered by technology innovations, there has been a huge increase in the utilization of video, one of the most preferred types of media due to its content richness, for many significant applications. To sustain the ongoing rapid growth of video information, there is an emerging demand for a sophisticated content-based video indexing system. However, current video indexing solutions are still immature and lack any standard. One solution, annotation-based indexing, allows video retrieval using textual annotations. Its major limitations are the restriction to pre-defined keywords and the expensive manual work of annotating video. Another solution, feature-based indexing, allows video search by comparison of low-level features, such as query by a sample image. Even though this approach can use automatically extracted features, users are not able to retrieve video intuitively, based on high-level concepts. This predicament is caused by the so-called semantic gap: users recall video contents at a high level of abstraction, while video is generally stored as an arbitrary sequence of audio-visual tracks. To bridge the semantic gap, this thesis demonstrates the use of a domain-specific approach, which aims to utilize domain knowledge in facilitating the extraction of high-level concepts directly from the audio-visual features. The main idea behind the domain-specific approach is the use of domain knowledge to guide the integration of features from multi-modal tracks. For example, to extract goal segments from soccer and basketball video, slow-motion replay scenes (visual) and excitement (audio) should be detected, as they are played during most goal segments. Domain-specific indexing also exploits specific browsing and querying methods which are driven by the requirements of specific users and applications. Sports video is selected as the primary domain due to its content richness and popularity.
Moreover, broadcast sports videos generally span hours with many redundant activities, and the key segments may make up only 30% to 60% of the entire data, depending on the progress of the match. This thesis presents research based on an integrated multi-modal approach for sports video indexing and retrieval. By combining specific features extractable from multiple (audio-visual) modalities, generic structure and specific events can be detected and classified. During browsing and retrieval, users benefit from the integration of high-level semantics and some descriptive mid-level features such as whistles and close-up views of players. The main objective is to contribute to the three major components of sports video indexing systems. The first component is a set of powerful techniques to extract audio-visual features and semantic contents automatically. The main purposes are to reduce manual annotation and to summarize lengthy contents into a compact, meaningful and more enjoyable presentation. The second component is an expressive and flexible indexing technique that supports gradual index construction. The indexing scheme is essential in determining the methods by which users can access a video database. The third and last component is a query language that can generate dynamic video summaries for smart browsing and support user-oriented retrieval.

38

Farfan, Fernando R. "Efficient Storage and Domain-Specific Information Discovery on Semistructured Documents." FIU Digital Commons, 2009. http://digitalcommons.fiu.edu/etd/126.

Abstract:

The increasing amount of available semistructured data demands efficient mechanisms to store, process, and search an enormous corpus of data to encourage its global adoption. Current techniques to store semistructured documents either map them to relational databases, or use a combination of flat files and indexes. These two approaches result in a mismatch between the tree-structure of semistructured data and the access characteristics of the underlying storage devices. Furthermore, the inefficiency of XML parsing methods has slowed down the large-scale adoption of XML into actual system implementations. The recent development of lazy parsing techniques is a major step towards improving this situation, but lazy parsers still have significant drawbacks that undermine the massive adoption of XML. Once the processing (storage and parsing) issues for semistructured data have been addressed, another key challenge to leverage semistructured data is to perform effective information discovery on such data. Previous works have addressed this problem in a generic (i.e. domain independent) way, but this process can be improved if knowledge about the specific domain is taken into consideration. This dissertation had two general goals: The first goal was to devise novel techniques to efficiently store and process semistructured documents. This goal had two specific aims: We proposed a method for storing semistructured documents that maps the physical characteristics of the documents to the geometrical layout of hard drives. We developed a Double-Lazy Parser for semistructured documents which introduces lazy behavior in both the pre-parsing and progressive parsing phases of the standard Document Object Model's parsing mechanism. The second goal was to construct a user-friendly and efficient engine for performing Information Discovery over domain-specific semistructured documents. 
This goal also had two aims: We presented a framework that exploits the domain-specific knowledge to improve the quality of the information discovery process by incorporating domain ontologies. We also proposed meaningful evaluation metrics to compare the results of search systems over semistructured documents.

39

Peng, Xiaobo. "Mediation on XQuery Views." Thesis, University of North Texas, 2006. https://digital.library.unt.edu/ark:/67531/metadc5442/.

Abstract:

The major goal of information integration is to provide efficient and easy-to-use access to multiple heterogeneous data sources with a single query. At the same time, one of the current trends is to use standard technologies for implementing solutions to complex software problems. In this dissertation, I used XML and XQuery as the standard technologies and have developed an extended projection algorithm to provide a solution to the information integration problem. In order to demonstrate my solution, I implemented a prototype mediation system called Omphalos based on XML related technologies. The dissertation describes the architecture of the system, its metadata, and the process it uses to answer queries. The system uses XQuery expressions (termed metaqueries) to capture complex mappings between global schemas and data source schemas. The system then applies these metaqueries in order to rewrite a user query on a virtual global database (representing the integrated view of the heterogeneous data sources) to a query (termed an outsourced query) on the real data sources. An extended XML document projection algorithm was developed to increase the efficiency of selecting the relevant subset of data from an individual data source to answer the user query. The system applies the projection algorithm to decompose an outsourced query into atomic queries which are each executed on a single data source. I also developed an algorithm to generate integrating queries, which the system uses to compose the answers from the atomic queries into a single answer to the original user query. I present a proof of both the extended XML document projection algorithm and the query integration algorithm. An analysis of the efficiency of the new extended algorithm is also presented. Finally I describe a collaborative schema-matching tool that was implemented to facilitate maintaining metadata.

40

Junqueira, Mirella Silva. "Uma proposta de interface de consulta para recuperação de informação em documentos semi-estruturados." Universidade Federal de Uberlândia, 2009. https://repositorio.ufu.br/handle/123456789/12474.

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico
Semi-structured information retrieval is a form of information retrieval that lies between textual retrieval and structured retrieval (typical of relational database systems). In structured retrieval systems, users generally know the structure of the data and the available query languages, so they can formulate queries that produce more accurate results. In textual retrieval, users do not know the data structure and formulate queries with keywords only, which produces less accurate results. In semi-structured retrieval, users generally do not know the data structure and formulate queries that mix textual search and structured retrieval mechanisms. In this context, the problem arises of how to improve the accuracy of results by using the structure contained in semi-structured documents. Semi-structured data is usually stored as XML documents, which can be seen as trees. Internal nodes of these trees hold the structure of the document, while leaf nodes contain the text. The design of user interfaces in this context is one of the biggest challenges in semi-structured information retrieval, especially because users do not know the document structure and have difficulty formulating structured queries. This dissertation presents a proposal and a prototype interface developed to help users formulate structured queries. The aim is to increase the precision of query results. The proposal is validated by experiments involving volunteer users and by comparing the results of textual queries with those of structured queries formulated with the help of the interface. The improvement reaches 440% for well-structured queries made by a user who knows the interface well, and 179.75% for reasonably structured queries made by users with no experience of the interface.
Master's degree in Computer Science

41

Acar, Esra. "Efficient index structures for video databases." Master's thesis, METU, 2008. http://etd.lib.metu.edu.tr/upload/12609322/index.pdf.

Abstract:

Content-based retrieval of multimedia data remains an active research area, and efficient retrieval of video data has proven to be a difficult task for content-based video retrieval systems. This thesis presents a Content-Based Video Retrieval (CBVR) system that adapts two different index structures, Slim-Tree and BitMatrix, for efficiently retrieving videos based on low-level features such as color, texture, shape and motion. The system represents low-level features of video data with MPEG-7 Descriptors, which are extracted from video shots using the MPEG-7 reference software and stored in a native XML database. The low-level descriptors used in the study are Color Layout (CL), Dominant Color (DC), Edge Histogram (EH), Region Shape (RS) and Motion Activity (MA). An Ordered Weighted Averaging (OWA) operator in Slim-Tree and BitMatrix aggregates these features to compute the final similarity between any two objects. The system supports three types of queries: exact match queries, k-NN queries and range queries. The experiments cover index construction, index update, query response time, and retrieval efficiency measured with the ANMRR performance metric and precision/recall scores. The experimental results show that BitMatrix combined with the Ordered Weighted Averaging method performs best for content-based video retrieval.
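The Ordered Weighted Averaging operator mentioned above has a standard definition: the inputs are sorted in descending order and the weights are applied by rank rather than by source. The sketch below shows that definition only; the per-descriptor distance values and the weight vectors are hypothetical, and the abstract does not say which weights the thesis uses.

```python
def owa(values, weights):
    """Ordered Weighted Averaging: weights attach to ranks, not to inputs."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ranked = sorted(values, reverse=True)  # largest value gets the first weight
    return sum(w * v for w, v in zip(weights, ranked))


# Hypothetical per-descriptor distances between two video shots.
distances = {"CL": 0.20, "DC": 0.50, "EH": 0.10, "RS": 0.40, "MA": 0.30}
values = list(distances.values())

print(owa(values, [1, 0, 0, 0, 0]))            # behaves like max
print(owa(values, [0, 0, 0, 0, 1]))            # behaves like min
print(owa(values, [0.2, 0.2, 0.2, 0.2, 0.2]))  # behaves like the mean
```

Choosing the weight vector moves the aggregate anywhere between the minimum and the maximum of the inputs, which is what makes the operator a tunable way to combine several feature distances into one score.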

42

Lee, Chin Siong. "NPS AUV workbench: collaborative environment for autonomous underwater vehicles (AUV) mission planning and 3D visualization." Thesis, Monterey, California. Naval Postgraduate School, 2004. http://hdl.handle.net/10945/1658.

Abstract:

Approved for public release, distribution is unlimited
alities. The extensible Markup Language (XML) is used for data storage and message exchange, Extensible 3D (X3D) Graphics for visualization and XML Schema-based Binary Compression (XSBC) for data compression. The AUV Workbench provides an intuitive cross-platform-capable tool with extensibility to provide for future enhancements such as agent-based control, asynchronous reporting and communication, loss-free message compression and built-in support for mission data archiving. This thesis also investigates the Jabber instant messaging protocol, showing its suitability for text and file messaging in a tactical environment. Exemplars show that the XML backbone of this open-source technology can be leveraged to enable both human and agent messaging with improvements over current systems. Integrated Jabber instant messaging support makes the NPS AUV Workbench the first custom application supporting XML Tactical Chat (XTC). Results demonstrate that the AUV Workbench provides a capable testbed for diverse AUV technologies, assisting in the development of traditional single-vehicle operations and agent-based multiple-vehicle methodologies. The flexible design of the Workbench further encourages integration of new extensions to serve operational needs. Exemplars demonstrate how in-mission and post-mission event monitoring by human operators can be achieved via simple web page, standard clients or custom instant messaging client. Finally, the AUV Workbench's potential as a tool in the development of multiple-AUV tactics and doctrine is discussed.
Civilian, Singapore Defence Science and Technology Agency

43

Chen, Chung Chen, and 陳鍾誠. "XML Retrieval - A Slot-Filling Approach." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/28440213557073524161.

Abstract:

博士
國立臺灣大學
資訊工程學研究所
90
Extensible Markup Language (XML) is widely used in data exchange and knowledge representation. A retrieval system that understands the content of XML documents is strongly desired. To improve the efficiency of XML retrieval systems, we design a set of methods based on an ontology called the slot-tree and use the slot-tree to support the XML retrieval process. One obstacle to building smart computers is that computers cannot understand natural language as well as humans can. This is called the semantic gap between human and computer. For XML retrieval systems, the semantic gap lies on both the query side and the document side. On the query side, it is due to the difficulty humans have in writing structured queries. On the document side, it is due to the difficulty computers have in understanding XML documents. To reduce these semantic gaps, we design an XML retrieval system based on the slot-tree ontology. The slot-tree ontology is an object-based knowledge representation. First, we design the slot-tree ontology to represent the inner structure of an object. Next, we design a slot-filling algorithm that maps XML documents into the slot-tree ontology in order to capture their semantics. We then design an XML retrieval system that uses the slot-tree ontology and the slot-filling algorithm to reduce the semantic gap. The system contains a slot-based query interface, a semantic retrieval model for XML, and a program that extracts summaries for browsing. However, constructing a slot-tree is not easy, so we design a slot-mining algorithm to construct it automatically. The slot-mining algorithm is a statistical approach based on correlation analysis between tags and words. Highly correlated terms are filled into the slot-tree as values. This algorithm eases the construction of the slot-tree. Two XML collections are used as the test bed for our XML retrieval system, one about butterflies and another about proteins.
We found that our XML retrieval system is easy to use and performs well in both retrieval effectiveness and quality of browsing. In addition, the slot-mining algorithm can fill important words into each slot. However, the mining result must still be modified to improve the quality of the slot-tree. Finally, we summarize our contributions to XML retrieval and compare our methods with other methods. A qualitative analysis is given in the last chapter, after which we propose directions for future research.
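
The tag-word correlation step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which correlation measure the thesis uses, so pointwise mutual information (PMI) is assumed here, and the function name `mine_slots` and the sample butterfly documents are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def mine_slots(xml_docs, top_n=2):
    """Return, per tag, the words most strongly associated with that tag.

    Association is scored with pointwise mutual information (PMI). This is
    an assumption; the abstract does not name the correlation measure.
    """
    pair_counts = Counter()  # occurrences of each (tag, word) pair
    tag_counts = Counter()   # number of words seen directly under each tag
    word_counts = Counter()  # occurrences of each word under any tag
    total = 0
    for doc in xml_docs:
        for elem in ET.fromstring(doc).iter():
            for word in (elem.text or "").lower().split():
                pair_counts[(elem.tag, word)] += 1
                tag_counts[elem.tag] += 1
                word_counts[word] += 1
                total += 1

    scored = {}
    for (tag, word), n in pair_counts.items():
        pmi = math.log((n * total) / (tag_counts[tag] * word_counts[word]))
        scored.setdefault(tag, []).append((pmi, word))

    # Highest PMI first; ties are broken alphabetically for determinism.
    return {
        tag: [w for _, w in sorted(pairs, key=lambda p: (-p[0], p[1]))[:top_n]]
        for tag, pairs in scored.items()
    }


# Hypothetical sample documents.
docs = [
    "<butterfly><color>orange black</color><habitat>forest</habitat></butterfly>",
    "<butterfly><color>orange white</color><habitat>forest meadow</habitat></butterfly>",
]
slots = mine_slots(docs, top_n=3)
```

Each tag acts as a slot, and the words returned for it are candidate slot values. PMI tends to favor rare words, which fits the abstract's remark that the mined result still needs manual modification.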

44

Lee, Chen-Hsuan, and 李正軒. "Study of Information Retrieval System for XML Documents." Thesis, 2000. http://ndltd.ncl.edu.tw/handle/19789045105833964289.

Abstract:

Master's
National Chiao Tung University
Department of Computer Science and Information Engineering
88
XML (Extensible Markup Language) is a document format standard that allows information providers to define their own document types. Compared with HTML (HyperText Markup Language), XML is much more extensible. XML is therefore an important standard in many applications, especially in web and business-to-business settings. As the number of XML documents increases, it becomes hard to find information in them. This paper therefore studies an information retrieval system for XML documents that helps users retrieve information easily and efficiently from a large number of XML documents.

45

Lu, Yung-hsiang, and 呂永祥. "A User-Expectation-Oriented XML Data Retrieval Technique." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/7g2vey.

Abstract:

Master's
Soochow University
Department of Information Science
93
Because the format and element types can be clearly defined inside an XML document, information can be exchanged between different organizations or communities based on their DTD definitions. XML has replaced HTML and become the optimum information exchange platform on the Internet. However, it can also cause problems, such as data becoming complicated as the number of XML documents grows. In this research we propose a User-Expectation-Oriented XML Retrieval Technique and implement a prototype XML data retrieval system that can find the documents that best match the user's semantic expectations and desires. This research has three major contributions. First, we propose a structure reforming mechanism and its sub-components to analyze the structure and content of XML documents at the beginning of the retrieval process. Second, we propose a semantic analysis module, based on the correlation between key terms and document structure information, that quantifies the degree of semantic correlation. Third, we propose an information exploration module that synthesizes the information coming from the previous modules and picks the most suitable result from the set of candidate documents. We compared our approach experimentally with one that uses only a structure-based retrieval technique. The performance evaluation shows that our approach is more effective than the other in terms of user satisfaction and the proportion of memory occupied by the index representation.

46

Lee, Zong-Han, and 李宗翰. "Design of An XML-Based Image Retrieval System." Thesis, 2000. http://ndltd.ncl.edu.tw/handle/22982470080564171761.

Abstract:

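Master's
National Kaohsiung First University of Science and Technology
Department of Computer and Communication Engineering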
88
With advances in Internet technologies, more and more books and multimedia documents have been digitized and shared over the Internet. Designing an effective and efficient information retrieval scheme has therefore become an important issue for querying multimedia data over the Internet. In this thesis, we propose a new object-based image retrieval system. In our system, the objects of an image are segmented and labeled with the nearest matching color in a carefully designed system color palette. The objects of an image form a spatial composition structure, which is recorded in a modified two-dimensional string structure proposed to solve the rotation-variant problem of general two-dimensional string methods. The spatial relationships among the objects of an image are used to index and retrieve images in the database. In addition, the proposed system is described and managed in XML. XML has proved to have powerful descriptive capability for interchanging text information over the Internet. We use it to achieve the goal of interchanging multimedia data.

47

Gu, Xin. "Comparing top-k algorithms in summary-based XML retrieval." 2007. http://link.library.utoronto.ca/eir/EIRdetail.cfm?Resources__ID=452807&T=F.

48

郭保惠. "Study and implementation on XML database searching and retrieval." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/40391988269136014752.

Abstract:

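Master's
National Chiao Tung University
Master's Degree Program, College of Electrical Engineering and Computer Science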
91
This thesis focuses on search and retrieval in XML database systems. It introduces the XML technologies related to XQuery, such as Schema and XPath. It also compares XML DOM and XSLT, which are the major search mechanisms for XML documents. There are currently two ways to store and manage XML documents. One is to store them in a relational database; the other is to store them in a newly developed native XML database. So far there is no conclusion as to which is better suited to contemporary commercial information systems. Although many researchers have studied this question, the functional evaluation of native XML databases is still incomplete. A prototype XML database system is developed using an examination question system as an example. A two-tier client-server architecture is used. ASP and DOM are used together to develop the web pages and to study the application of the XML database. The study shows that using XML documents can reduce the work of translating data into a relational database. However, for efficient search and retrieval, the size of the XML documents should be chosen carefully and their structure should not be too complicated. Through experiments, we validate the functionality and effectiveness of the system.
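
As a rough illustration of XPath-style search over an examination question document, the sketch below uses Python's standard `xml.etree.ElementTree` module in place of the ASP and DOM stack the thesis describes. The document layout, the `topic` attribute, and the function name `find_questions` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical examination question document; the thesis does not give its schema.
EXAM_XML = """
<exam>
  <question id="q1" topic="xpath"><text>What does // select?</text></question>
  <question id="q2" topic="schema"><text>What is a complexType?</text></question>
  <question id="q3" topic="xpath"><text>What does @ denote?</text></question>
</exam>
"""


def find_questions(xml_text, topic):
    """Return (id, text) pairs for every question with the given topic."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a limited XPath subset, including attribute predicates.
    matches = root.findall(f".//question[@topic='{topic}']")
    return [(q.get("id"), q.findtext("text")) for q in matches]


xpath_questions = find_questions(EXAM_XML, "xpath")
```

The whole document is parsed into memory before it is searched, which is one reason document size and structural complexity affect search efficiency, as the abstract notes.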

49

Liu, Chu-Yen, and 劉鉅彥. "Efficient Storage and Retrieval of XML Documents Using XQuery." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/26968089695001667139.

Abstract:

Master's
Tatung University
Department (Graduate Institute) of Computer Science and Engineering
92
In the last decade of the 20th century, the popularity of the Internet drove a trend toward e-solutions for businesses. These apply not only to electronic commerce but also to information exchange, reducing the time from raw material at the manufacturer to products bought by customers. However, the problem we confront today is that businesses are full of e-documents. XML has already become the standard for data interchange on the Internet. In the future, a large amount of data will be represented in XML format. However, most critical business data are still stored in relational database management systems. It is difficult to query XML databases because of their textual format. This thesis tackles this problem by proposing a system for managing XML documents that can be queried with the query language XQuery. XML documents are stored in relational format, and XQuery expressions are translated into appropriate SQL queries. The results of the SQL queries are transformed back into XML documents.
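
The store, translate, and reconstruct pipeline in this abstract can be sketched on a very small scale as follows. This is not the thesis's design: the single `node` table, the path-based storage scheme, and the helper names `shred`, `path_to_sql`, and `to_xml` are assumptions, and only a plain child-axis path such as `/catalog/book/title` is handled, not full XQuery.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(conn, xml_text):
    """Store every element as a row: (id, parent id, root-to-node path, text)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node "
        "(id INTEGER PRIMARY KEY, parent INTEGER, path TEXT, text TEXT)"
    )

    def walk(elem, parent_id, parent_path):
        path = parent_path + "/" + elem.tag
        cur = conn.execute(
            "INSERT INTO node (parent, path, text) VALUES (?, ?, ?)",
            (parent_id, path, (elem.text or "").strip()),
        )
        for child in elem:
            walk(child, cur.lastrowid, path)

    walk(ET.fromstring(xml_text), None, "")


def path_to_sql(path):
    """Translate a simple path such as /a/b/c into a parameterized SQL query."""
    return "SELECT text FROM node WHERE path = ? ORDER BY id", (path,)


def to_xml(tag, rows):
    """Wrap SQL result rows back into an XML fragment."""
    root = ET.Element("result")
    for (text,) in rows:
        ET.SubElement(root, tag).text = text
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
shred(conn, "<catalog><book><title>XML Basics</title></book>"
            "<book><title>SQL Basics</title></book></catalog>")
sql, params = path_to_sql("/catalog/book/title")
result = to_xml("title", conn.execute(sql, params).fetchall())
```

Storing the full root-to-node path in each row turns a simple path expression into a single equality lookup. Predicates, joins, and FLWOR expressions would need a considerably richer translation.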

50

Wu, Cheng-Lung, and 吳政隆. "Using XML in Audit Data Retrieval- An Implementation Study." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/83616203393004842250.

Abstract:

Master's
Chung Yuan Christian University
Graduate Institute of Accounting
90
Recently, with the rapid development of information technology, software and hardware standards have changed quickly. In addition, information systems are built to meet a wide variety of user needs. Auditors have begun to encounter difficulties in integrating electronic audit data across heterogeneous data sources. To address this problem, auditors can either design tailor-made auditing software for each system or use a data retrieval interface embedded in the auditing software to identify different data formats. However, this "problem-oriented" approach cannot provide general solutions to the many problems of system integration. To seek a better solution, this research attempts to find a general interface for audit data retrieval that integrates heterogeneous audit data. XML (eXtensible Markup Language), released by the W3C (World Wide Web Consortium) in 1998, has powerful information transfer capability in the Internet environment. It is capable of describing complicated data structures and is easy for users to learn and use. For these reasons, XML is employed as an effective tool to facilitate the integration of heterogeneous databases. It provides a unified structural view of data in different formats or schemas. The Securities and Futures Commission and most CPA firms have been preparing to adopt and promote the new technology in many accounting and auditing services. The objective of this research is to apply XML concepts and technologies to build a prototype system that uses XML in audit data retrieval. In this study, data retrieved from a database are transformed into XML documents according to a Document Type Definition (DTD). These documents are then audited by preprogrammed audit programs. With the system, we can see how XML enhances auditors' abilities by letting them perform their tasks remotely and more efficiently.
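
The flow described in this abstract, from database rows to an XML document to a preprogrammed audit check, might look roughly like the sketch below. The `entries` table, the `ledger`/`entry` element names, and the amount-limit rule are all hypothetical, and validation against a DTD is omitted because Python's standard library does not provide it.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn):
    """Export ledger rows from the database as one XML document."""
    root = ET.Element("ledger")
    query = "SELECT id, account, amount FROM entries ORDER BY id"
    for entry_id, account, amount in conn.execute(query):
        entry = ET.SubElement(root, "entry", id=str(entry_id))
        ET.SubElement(entry, "account").text = account
        ET.SubElement(entry, "amount").text = f"{amount:.2f}"
    return ET.tostring(root, encoding="unicode")


def audit(xml_text, limit):
    """Hypothetical audit rule: flag the ids of entries whose amount exceeds limit."""
    return [
        entry.get("id")
        for entry in ET.fromstring(xml_text).iter("entry")
        if float(entry.findtext("amount")) > limit
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, account TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO entries (id, account, amount) VALUES (?, ?, ?)",
    [(1, "cash", 120.0), (2, "travel", 9800.5), (3, "cash", 45.25)],
)
ledger_xml = rows_to_xml(conn)
flagged = audit(ledger_xml, limit=5000)
```

Because the audit routine reads only the XML document, the same check can run on data exported from any source system that produces the same structure, which is the integration benefit the abstract argues for.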
