Dissertations / Theses on the topic 'Data and metadata structures'

To see the other types of publications on this topic, follow the link: Data and metadata structures.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Data and metadata structures.'

Next to every source in the list of references there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Kruse, Sebastian [Verfasser], and Felix [Akademischer Betreuer] Naumann. "Scalable data profiling : distributed discovery and analysis of structural metadata / Sebastian Kruse ; Betreuer: Felix Naumann." Potsdam : Universität Potsdam, 2018. http://d-nb.info/1217717854/34.

Full text
2

Wu, Qinyi. "Partial persistent sequences and their applications to collaborative text document editing and processing." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/44916.

Full text
Abstract:
In a variety of text document editing and processing applications, it is necessary to keep track of the revision history of text documents by recording changes and the metadata of those changes (e.g., user names and modification timestamps). The recent Web 2.0 document editing and processing applications, such as real-time collaborative note taking and wikis, require fine-grained shared access to collaborative text documents as well as efficient retrieval of metadata associated with different parts of collaborative text documents. Current revision control techniques only support coarse-grained shared access and are inefficient to retrieve metadata of changes at the sub-document granularity. In this dissertation, we design and implement partial persistent sequences (PPSs) to support real-time collaborations and manage metadata of changes at fine granularities for collaborative text document editing and processing applications. As a persistent data structure, PPSs have two important features. First, items in the data structure are never removed. We maintain necessary timestamp information to keep track of both inserted and deleted items and use the timestamp information to reconstruct the state of a document at any point in time. Second, PPSs create unique, persistent, and ordered identifiers for items of a document at fine granularities (e.g., a word or a sentence). As a result, we are able to support consistent and fine-grained shared access to collaborative text documents by detecting and resolving editing conflicts based on the revision history as well as to efficiently index and retrieve metadata associated with different parts of collaborative text documents. We demonstrate the capabilities of PPSs through two important problems in collaborative text document editing and processing applications: data consistency control and fine-grained document provenance management. The first problem studies how to detect and resolve editing conflicts in collaborative text document editing systems. We approach this problem in two steps. In the first step, we use PPSs to capture data dependencies between different editing operations and define a consistency model more suitable for real-time collaborative editing systems. In the second step, we extend our work to the entire spectrum of collaborations and adapt transactional techniques to build a flexible framework for the development of various collaborative editing systems. The generality of this framework is demonstrated by its capabilities to specify three different types of collaborations as exemplified in the systems of RCS, MediaWiki, and Google Docs respectively. We precisely specify the programming interfaces of this framework and describe a prototype implementation over Oracle Berkeley DB High Availability, a replicated database management engine. The second problem of fine-grained document provenance management studies how to efficiently index and retrieve fine-grained metadata for different parts of collaborative text documents. We use PPSs to design both disk-economic and computation-efficient techniques to index provenance data for millions of Wikipedia articles. Our approach is disk economic because we only save a few full versions of a document and only keep delta changes between those full versions. Our approach is also computation-efficient because we avoid the necessity of parsing the revision history of collaborative documents to retrieve fine-grained metadata. 
Compared to MediaWiki, the revision control system for Wikipedia, our system uses less than 10% of disk space and achieves at least an order of magnitude speed-up to retrieve fine-grained metadata for documents with thousands of revisions.
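
The core mechanism described above (items that are only ever marked as deleted, plus persistent ordered identifiers and insert/delete timestamps) can be conveyed with a short sketch. The following Python is not taken from the dissertation; it is a minimal toy version of the idea, with fractional positions standing in for persistent identifiers, showing how an earlier document state can be reconstructed from the timestamps.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Item:
    """One element (e.g. a word); never physically removed from the sequence."""
    ident: float                        # persistent, ordered identifier
    value: str
    inserted_at: int                    # timestamp of the insert operation
    deleted_at: Optional[int] = None    # timestamp of the delete operation, if any

class PartialPersistentSequence:
    """Toy sketch of a partial persistent sequence: deletions only mark items,
    so the state of the document at any timestamp can be reconstructed."""

    def __init__(self) -> None:
        self.items: List[Item] = []     # kept sorted by persistent identifier
        self.clock = 0

    def _tick(self) -> int:
        self.clock += 1
        return self.clock

    def _visible(self, at: Optional[int] = None) -> List[Item]:
        """Items that are alive at the given timestamp (default: now)."""
        t = self.clock if at is None else at
        return [i for i in self.items
                if i.inserted_at <= t and (i.deleted_at is None or i.deleted_at > t)]

    def insert(self, index: int, value: str) -> Item:
        """Insert `value` at `index` among the currently visible items."""
        visible = self._visible()
        left = visible[index - 1].ident if index > 0 else 0.0
        right = visible[index].ident if index < len(visible) else 1.0
        item = Item(ident=(left + right) / 2.0, value=value, inserted_at=self._tick())
        self.items.append(item)
        self.items.sort(key=lambda i: i.ident)
        return item

    def delete(self, index: int) -> None:
        """Mark the item at `index` (among visible items) as deleted."""
        self._visible()[index].deleted_at = self._tick()

    def snapshot(self, at: Optional[int] = None) -> str:
        """Reconstruct the document text as it looked at timestamp `at`."""
        return " ".join(i.value for i in self._visible(at))

# Example: edit a tiny document and recover an earlier state.
doc = PartialPersistentSequence()
for pos, word in enumerate(["metadata", "matters", "here"]):
    doc.insert(pos, word)
doc.delete(1)                 # "matters" is only marked deleted
print(doc.snapshot())         # -> "metadata here"
print(doc.snapshot(at=3))     # -> "metadata matters here"
```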
3

Lima, João Alberto de Oliveira. "Modelo Genérico de Relacionamentos na Organização da Informação Legislativa e Jurídica." Thesis, reponame:Repositório Institucional da UnB, 2008. http://eprints.rclis.org/11352/1/tese_Joao_Lima_FINAL.pdf.

Full text
Abstract:
Most of the time, information does not exist in isolation: it belongs to a context and forms relationships with other entities. Legislative and legal information, in particular, is characterized by a high degree of interrelationship. Laws, bills, legal cases and doctrine are connected in several ways, creating a rich network of information. Efforts to organize information produce artificial models that try to represent the real world, creating systems and schemes of concepts used in the classification and indexing of information resources. The main objective of this research was to propose a Generic Model of Relationship (GMR), based on simple constructs that permit the establishment of relationships between concepts and information units. The conception of the GMR drew on Ingetraut Dahlberg's Theory of Concept and on the models CIDOC CRM (ISO 21127:2006), FRBRoo and Topic Maps (ISO 13250:1999). The relationships and the characteristics of information units in the legal domain were collected in the project "Coletânea Brasileira de Normas e Julgados de Telecomunicações", using the Action Research methodology. Besides the development of the GMR and its application to the legislative and legal information domain, the research also contributed a system for identifying document versions and a new meaning for the term "information unit".
4

Da Silva Carvalho, Paulo. "Plateforme visuelle pour l'intégration de données faiblement structurées et incertaines." Thesis, Tours, 2017. http://www.theses.fr/2017TOUR4020/document.

Full text
Abstract:
We hear a lot about Big Data, Open Data, Social Data, Scientific Data, etc. The importance currently given to data is, in general, very high. We are living in the era of massive data. The analysis of these data is important if the objective is to successfully extract value from them so that they can be used. The work presented in this thesis concerns the understanding, assessment, correction/modification, management and, finally, the integration of data, in order to allow their exploitation and reuse. Our research focuses exclusively on Open Data and, more precisely, on Open Data organized in tabular form (CSV being one of the most widely used formats in the Open Data domain). The term Open Data first appeared in 1995, when the GCDIS group (Global Change Data and Information System, United States) used the expression to encourage entities with the same interests and concerns to share their data [Data et System, 1995]. However, the Open Data movement has only recently undergone a sharp increase and has become a popular phenomenon all over the world. Being recent, it is a field that is still growing and whose importance is considerable. The encouragement given by governments and public institutions to publish their data openly undoubtedly plays an important role at this level.
5

Nadal, Francesch Sergi. "Metadata-driven data integration." Doctoral thesis, Universitat Politècnica de Catalunya, 2019. http://hdl.handle.net/10803/666947.

Full text
Abstract:
Data has an undoubtable impact on society. Storing and processing large amounts of available data is currently one of the key success factors for an organization. Nonetheless, we are recently witnessing a change represented by huge and heterogeneous amounts of data. Indeed, 90% of the data in the world has been generated in the last two years. Thus, in order to carry out these data exploitation tasks, organizations must first perform data integration, combining data from multiple sources to yield a unified view over them. Yet, the integration of massive and heterogeneous amounts of data requires revisiting the traditional integration assumptions to cope with the new requirements posed by such data-intensive settings. This PhD thesis aims to provide a novel framework for data integration in the context of data-intensive ecosystems, which entails dealing with vast amounts of heterogeneous data, from multiple sources and in their original format. To this end, we advocate an integration process consisting of sequential activities governed by a semantic layer, implemented via a shared repository of metadata. From a stewardship perspective, these activities are the deployment of a data integration architecture, followed by the population of the shared metadata repository. From a data consumption perspective, the activities are virtual and materialized data integration, the former an exploratory task and the latter a consolidation one. Following the proposed framework, we focus on providing contributions to each of the four activities. We begin by proposing a software reference architecture for semantic-aware data-intensive systems. Such an architecture serves as a blueprint to deploy a stack of systems, its core being the metadata repository. Next, we propose a graph-based metadata model as a formalism for metadata management. We focus on supporting schema and data source evolution, a predominant factor in the heterogeneous sources at hand. For virtual integration, we propose query rewriting algorithms that rely on the previously proposed metadata model. We additionally consider semantic heterogeneities in the data sources, which the proposed algorithms are capable of automatically resolving. Finally, the thesis focuses on the materialized integration activity and, to this end, proposes a method to select intermediate results to materialize in data-intensive flows. Overall, the results of this thesis serve as a contribution to the field of data integration in contemporary data-intensive ecosystems.
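
As a rough illustration of what a graph-based metadata model for schema and source evolution might look like (the thesis's actual formalism is not reproduced here), the following Python sketch records data sources, schema versions and attributes as nodes with typed edges, including a semantic link of the kind a query rewriting step could exploit. All node names are invented.

```python
from datetime import date

# Nodes: data sources, schema versions and attributes; edges: typed relationships.
metadata_graph = {
    "nodes": {
        "src:sensors_api": {"kind": "DataSource", "format": "JSON"},
        "schema:sensors_v1": {"kind": "SchemaVersion", "valid_from": date(2018, 1, 1)},
        "schema:sensors_v2": {"kind": "SchemaVersion", "valid_from": date(2018, 6, 1)},
        "attr:temperature": {"kind": "Attribute", "datatype": "float"},
        "attr:temp_celsius": {"kind": "Attribute", "datatype": "float"},
    },
    "edges": [
        ("src:sensors_api", "hasSchema", "schema:sensors_v1"),
        ("src:sensors_api", "hasSchema", "schema:sensors_v2"),
        ("schema:sensors_v1", "hasAttribute", "attr:temperature"),
        ("schema:sensors_v2", "hasAttribute", "attr:temp_celsius"),
        # Semantic link a query rewriting step could use to bridge the schema change.
        ("attr:temp_celsius", "sameConceptAs", "attr:temperature"),
    ],
}

def attributes_for(source: str, graph: dict) -> dict:
    """Collect, per schema version, the attributes exposed by a data source."""
    schemas = [t for (s, p, t) in graph["edges"] if s == source and p == "hasSchema"]
    return {
        schema: [t for (s, p, t) in graph["edges"]
                 if s == schema and p == "hasAttribute"]
        for schema in schemas
    }

print(attributes_for("src:sensors_api", metadata_graph))
# {'schema:sensors_v1': ['attr:temperature'], 'schema:sensors_v2': ['attr:temp_celsius']}
```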
6

Flamino, Adriana Nascimento. "MARCXML: um padrão de descrição para recursos informacionais em Open Archives." Thesis, Marília : [s.n], 2006. http://eprints.rclis.org/16623/1/FLAMINO_AN_DISSERTACAO.pdf.

Full text
Abstract:
Scientific communication is undergoing considerable change in its process as well as in its structure and philosophy. The open archives and open access initiatives are contributing significantly to the dismantling of the traditional model of scientific communication and to the construction of a new, decentralized and interoperable model that is fairer and more efficient in disseminating research results and, with them, the knowledge generated by scientific communities. Owing to advances in information and communication technologies, however, not only the structure and flow of scientific communication are changing, but also the very concept and medium of scientific documents. This has generated the need to develop tools that optimize the processes of organization, description, exchange and retrieval of information, as well as digital preservation, among others. For decades the MARC format has allowed institutions to describe and exchange bibliographic and cataloguing records, supporting access to the informational content of many collections. However, the exponential growth of information and of document production (above all digital) demands greater flexibility and interoperability among the many information systems available. In this scenario, the XML markup language is one of the current developments whose purpose is to facilitate and optimize the management, storage and transmission of content over the Internet, and it has been adopted by several sectors and areas of knowledge for its ease of handling and operational flexibility. Against this background, an exploratory theoretical study was carried out to assess the suitability of the MARCXML format for the descriptive representation of information resources in open archives, as a complex and flexible metadata standard that would enable interoperability among heterogeneous information systems as well as access to information. As a result of this research, MARCXML is considered an appropriate format for the description of data in a complex structure. It is concluded that, as the complexity of the documents in repositories and open archives increases, a metadata structure such as MARCXML, which supports the description of the specificities of informational resources, becomes all the more justified, since these initiatives are not, and will not be, restricted to scientific documents, but are expanding to other types of increasingly complex and specific informational resources, which also demand a description appropriate to the specificities of the bibliographic entities.
7

Park, June Young. "Data-driven Building Metadata Inference." Research Showcase @ CMU, 2016. http://repository.cmu.edu/theses/127.

Full text
Abstract:
Building technology has advanced alongside improvements in information technology. In particular, building operation can now be controlled and monitored through a large number of sensors and actuators installed on virtually every element of a building. The resulting stream of building data allows both quantitative and qualitative improvements. However, there are still limitations in mapping between the physical building elements and the cyber system. To address this mapping issue, a text mining methodology was developed last summer as part of a project conducted by the Consortium for Building Energy Innovation. Building data were extracted from Building 661 in Philadelphia, PA. The ground truth of the building data points, with their semantic information, was labeled by manual inspection, and a Support Vector Machine was implemented to investigate the relationship between data point names and semantic information. This algorithm achieves 93% accuracy on unseen Building 661 data points. The techniques and lessons gained from this project were used to develop a framework for analyzing building data from the Gates Hillman Center (GHC) building, Pittsburgh, PA. This framework consists of two stages. In the first stage, we initially tried to cluster the data points by similar semantic information using hierarchical clustering; however, the effectiveness and accuracy of the clustering method proved inadequate for this framework, so a filtering and classification model was developed to identify the semantic information of the data points. The filtering and classification method correctly identifies the damper position and supply air duct pressure data points with 90% accuracy using daily statistical features. Given the semantic information from the first stage, the second stage determines the relationship between Variable Air Volume (VAV) terminal units and Air Handling Units (AHUs). The intuitive thermal and flow relationship between VAVs and AHUs was investigated first, applying statistical-feature clustering to the VAV discharge temperature data; however, the control strategy of this building makes this relationship invisible. We therefore compared the similarity between the damper position at the VAVs and the supply air duct pressure at the AHUs by calculating the cross-correlation. This similarity scoring method achieved 80% accuracy in mapping the relationship between VAVs and AHUs. The suggested framework will guide the user in finding desired information, such as the VAV-AHU relationship, from a large number of heterogeneous sensor networks by using a data-driven methodology.
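
The similarity-scoring step described above can be sketched in a few lines of numpy: each VAV damper-position series is compared against each AHU supply-air duct pressure series via normalized cross-correlation, and the VAV is assigned to the best-scoring AHU. The signals below are synthetic; this is an illustration of the technique, not the thesis's code.

```python
import numpy as np

def normalized_xcorr(a: np.ndarray, b: np.ndarray) -> float:
    """Peak of the normalized cross-correlation between two equal-length signals."""
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float(np.max(np.correlate(a, b, mode="full")) / len(a))

def map_vavs_to_ahus(vav_signals: dict, ahu_signals: dict) -> dict:
    """Assign each VAV (damper position series) to the AHU (duct pressure series)
    whose signal it correlates with most strongly."""
    mapping = {}
    for vav, damper in vav_signals.items():
        scores = {ahu: normalized_xcorr(damper, pressure)
                  for ahu, pressure in ahu_signals.items()}
        mapping[vav] = max(scores, key=scores.get)
    return mapping

# Synthetic example: VAV-1 follows AHU-A's pressure pattern, VAV-2 follows AHU-B's.
rng = np.random.default_rng(0)
t = np.linspace(0, 6 * np.pi, 500)
ahu = {"AHU-A": np.sin(t), "AHU-B": np.sign(np.sin(0.5 * t))}
vav = {
    "VAV-1": np.sin(t) + 0.2 * rng.standard_normal(t.size),
    "VAV-2": np.sign(np.sin(0.5 * t)) + 0.2 * rng.standard_normal(t.size),
}
print(map_vavs_to_ahus(vav, ahu))   # expected: {'VAV-1': 'AHU-A', 'VAV-2': 'AHU-B'}
```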
8

Kumar, Aman. "Metadata-Driven Management of Scientific Data." The Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=osu1243898671.

Full text
9

Kyhlberg, Erik. "Data om Data : Metadata för geodata inom kommunal översiktsplanering." Thesis, Uppsala universitet, Kulturgeografiska institutionen, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-158528.

Full text
10

Pradhan, Anup. "The Geospatial Metadata Server : on-line metadata database, data conversion and GIS processing." Thesis, University of Edinburgh, 2000. http://hdl.handle.net/1842/30655.

Full text
Abstract:
This research proposes an on-line software demonstrator called the Geospatial Metadata Server (GMS), which is designed to catalogue, enter, maintain and query metadata using a centralised metadatabase. The system also converts geospatial data to and from a variety of different formats as well as proactively searches for data throughout the Web using an automated hyperlinks retrieval program. GMS is divided into six components, three of which constitute a Metadata Cataloguing tool. The metadatabase is implemented within an RDBMS, which is capable of querying large quantities of on-line metadata in standardised format. The database schema used to store the metadata was patterned in part off of the Content Standard for Digital Geographic Metadata, which provides geospatial data providers and users a common assemblage of descriptive terminology. Because of the excessive length of metadata records, GMS is equipped with a parsing algorithm and database entry utility that is used to enter delimited on-line metadata text files into the metadatabase. Alternatively the system provides a metadata entry and update utility, which is divided into a series of HTML forms each corresponding to a metadatabase table. The utility cleverly allows users to maintain their personal metadata records over the Web. The other three GMS components constitute a Metadata Querying tool. GMS consists of a search engine that can access its metadatabase, examine retrieved information and use the information within on-line server side software. The search engine integrates the metadatabase, on-line geospatial data and GIS software (i.e. the GMS conversion utilities) over the Web. The on-line conversion utilities are capable of converting geospatial data from anywhere on the Web solving many of the problems associated with the interoperability of vendor specific GIS data formats. Because the conversion utilities operate over the Web, the technology is easier to use and more accessible to a greater number of GIS users. Also integrated into the search engine is a Web robot designed to automatically seek out geospatial data from remote Web sites and index their hypertext links within a file or database.
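
A simplified sketch of the kind of parse-and-load utility described above is shown below: a delimited metadata text file is read and its records are inserted into a relational table that can then be queried, here by bounding box. The column names, delimiter and SQLite backend are assumptions made for illustration and do not reproduce the CSDGM-based schema used in GMS.

```python
import csv
import sqlite3

# Hypothetical pipe-delimited metadata export: title|originator|pub_date|west|east|...
SCHEMA = """
CREATE TABLE IF NOT EXISTS metadata (
    title TEXT, originator TEXT, pub_date TEXT,
    west REAL, east REAL, south REAL, north REAL, url TEXT
)
"""

def load_metadata(path: str, conn: sqlite3.Connection) -> int:
    """Parse a pipe-delimited metadata file and insert each record into the table."""
    conn.execute(SCHEMA)
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh, delimiter="|")
        rows = [(r["title"], r["originator"], r["pub_date"],
                 float(r["west"]), float(r["east"]),
                 float(r["south"]), float(r["north"]), r["url"])
                for r in reader]
    conn.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

def datasets_covering(conn: sqlite3.Connection, lon: float, lat: float):
    """Example query: datasets whose bounding box covers a point of interest."""
    return conn.execute(
        "SELECT title, url FROM metadata "
        "WHERE west <= ? AND east >= ? AND south <= ? AND north >= ?",
        (lon, lon, lat, lat)).fetchall()
```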
11

Ding, Luping. "Metadata-aware query processing over data streams." Worcester, Mass. : Worcester Polytechnic Institute, 2008. http://www.wpi.edu/Pubs/ETD/Available/etd-042208-194826/.

Full text
12

Kupferschmidt, Benjamin, and Eric Pesciotta. "METADATA MODELING FOR AIRBORNE DATA ACQUISITION SYSTEMS." International Foundation for Telemetering, 2007. http://hdl.handle.net/10150/604548.

Full text
Abstract:
ITC/USA 2007 Conference Proceedings / The Forty-Third Annual International Telemetering Conference and Technical Exhibition / October 22-25, 2007 / Riviera Hotel & Convention Center, Las Vegas, Nevada
Many engineers express frustration with the multitude of vendor specific tools required to describe measurements and configure data acquisition systems. In general, tools are incompatible between vendors, forcing the engineer to enter the same or similar data multiple times. With the emergence of XML technologies, user centric data modeling for the flight test community is now possible. With this new class of technology, a vendor neutral, standard language to define measurements and configure systems may finally be realized. However, the allure of such a universal language can easily become too abstract, making it untenable for hardware configuration and resulting in a low vendor adoption rate. Conversely, a language that caters too much to vendor specific configuration will defeat its purpose. Achieving this careful balance is not trivial, but is possible. Doing so will produce a useful standard without putting it out of the reach of equipment vendors. This paper discusses the concept, merits, and possible solutions for a standard measurement metadata model. Practical solutions using XML and related technologies are discussed.
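
As a hedged illustration of the balance discussed above, the sketch below builds a small vendor-neutral measurement description in XML, isolating vendor-specific settings in an extension block so the common part stays portable. The element names are invented and do not represent any existing standard or the model proposed in the paper.

```python
import xml.etree.ElementTree as ET

def measurement_element(name: str, units: str, sample_rate_hz: float,
                        data_type: str, vendor_settings: dict) -> ET.Element:
    """Build a vendor-neutral <Measurement> description, with vendor-specific
    configuration isolated in its own extension block."""
    m = ET.Element("Measurement", attrib={"name": name})
    ET.SubElement(m, "Units").text = units
    ET.SubElement(m, "SampleRateHz").text = str(sample_rate_hz)
    ET.SubElement(m, "DataType").text = data_type
    ext = ET.SubElement(m, "VendorExtension")
    for key, value in vendor_settings.items():
        ET.SubElement(ext, "Setting", attrib={"key": key}).text = str(value)
    return m

root = ET.Element("DataAcquisitionConfig")
root.append(measurement_element("left_wing_strain", "microstrain", 512.0, "int16",
                                {"channel": 7, "filter": "lowpass_200Hz"}))
print(ET.tostring(root, encoding="unicode"))
```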
13

Pozzoli, Alice. "Data and quality metadata for continuous fields." Lyon, INSA, 2008. http://theses.insa-lyon.fr/publication/2008ISAL0024/these.pdf.

Full text
Abstract:
This thesis deals with data processing in geomatics, ranging from data acquisition in photogrammetry to data representation in cartography. The objective of this research was to use statistical data processing techniques for the creation of digital surface models starting from photogrammetric images. The main function of photogrammetry is the transformation of data from the image space to the object space. An easy solution for three-image orientation is proposed. The orientation procedure described has relevant advantages for environmental and monitoring applications, which makes it a very powerful tool alongside more traditional methodologies. Among many possible applications, an interesting project for the survey of a hydraulic 3D model of a stream confluence in a mountain area has been carried out. From a computing point of view, we propose a description of the photogrammetric data based on the XML format for geographic data (GML, Geography Markup Language). The aim is to optimize the archiving and management of geo-data. Finally, an original software product that models terrain from three-image photogrammetry has been developed and tested.
14

Solomonides, Anthony Eugene. "Data, metadata, and workflow in healthcare informatics." Thesis, University of the West of England, Bristol, 2018. http://eprints.uwe.ac.uk/31070/.

Full text
Abstract:
This dissertation considers a number of interlinked concepts, propositions and relations, and puts forward a set of design theses, to support the role of informatics in the overall goal of knowledge-based, information-driven, integrated, patient-centred, collaborative healthcare and research. This rather ambitious scope may be delimited by exclusion: the work is not concerned explicitly with genomics or bioinformatics, but it does encompass certain aspects of translational medicine and personalized healthcare, which I take to be subsumed in some sense under "knowledge-based" and "information-driven". Although I do not exclude public health informatics, my exposure extends only to surveillance of infectious diseases, patient engagement, and the effectiveness of screening programmes. I do take ethical, legal, social and economic issues (ELSE) to be included, at least to the extent that I aim at an infrastructure that encompasses these issues and aims to incorporate them in technical designs in an effort to meet ethicists', lawyers', policy makers', and economists' concerns halfway. To a first approximation, the aim has been to integrate two strands of work over the last decade or more: the informatics of medical records on one hand and the distributed computational infrastructures for healthcare and biomedical research on the other. The papers assembled in this dissertation span a period of rapid growth in biomedical informatics (BMI). Their unifying theme was not declared programmatically at the beginning of this period, but rather developed, along with individual pieces of work, as my engagement – and that of my students – with BMI became more focused and penetrated deeper into the issues. Nevertheless, I believe I have learned something from each project I have been involved in and have brought this cumulative experience to bear on the central theme of my present work. My thematic vision is of a scientifically literate and engaged community whose members – citizens, patients, caregivers, advocates – are sufficiently interested in medical progress and in their own health to take ownership of their medical records, to subscribe to a research service that informs them about progress and about current studies that may interest them, and so take responsibility for their own health and the health of those close to them. This entails many things: agreements on what constitutes legitimate data sharing and when such sharing may be permitted or required by the patient as owner of the data. It calls for a means of recognizing the intellectual contribution, and in some healthcare economies the economic interest, of the physician who generates that record. Ethically, it requires a consenting policy that allows patients to control who may approach them for participation in a study, whether as a subject, as a co-investigator, as a patient advocate, or as a lay advisor. Educationally, it requires willingness on the part of physician-researchers and scientists to disseminate what they have discovered and what they have learned in terms that are comprehensible to the interested lay participant – but do not speak down to her.
15

Kamenieva, Iryna. "Research Ontology Data Models for Data and Metadata Exchange Repository." Thesis, Växjö University, School of Mathematics and Systems Engineering, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:vxu:diva-6351.

Full text
Abstract:

For research in the fields of data mining and machine learning, a necessary condition is the availability of various input data sets, and researchers now create databases of such sets. Examples of such systems are the UCI Machine Learning Repository, the Data Envelopment Analysis Dataset Repository, the XMLData Repository, and the Frequent Itemset Mining Dataset Repository. Along with these statistical repositories, a whole range of stores, from simple file stores to specialized repositories, can be used by researchers when solving applied tasks and investigating their own algorithms and scientific problems. At first glance, the only difficulty for the user would seem to be searching and understanding the structure of such scattered information stores. A closer study of these repositories, however, reveals deeper problems in data usage: in particular, a complete mismatch between the rigid structure of the data files and the SDMX (Statistical Data and Metadata Exchange) standard and structure used by many European organizations, the impossibility of tailoring data in advance to a concrete applied task, and the lack of a usage history recording which data have been used for which scientific and applied tasks.

There are now many data mining methods, as well as large quantities of data stored in various repositories, but the repositories contain no data mining (DM) methods and, moreover, the methods are not linked to application areas. An essential problem is therefore linking the subject domain (problem domain), the DM methods, and the datasets appropriate for each method. In this work we consider the problem of building ontological models of DM methods, describing the interaction of the methods with the corresponding data from repositories, and providing intelligent agents that allow the user of a statistical repository to choose the appropriate method and the data corresponding to the task being solved. The structure of such a system is proposed, and an intelligent search agent over the ontological model of DM methods, taking the user's personal inquiries into account, is implemented.

For the implementation of an intelligent data and metadata exchange repository, an agent-oriented approach has been selected, and the model uses a service-oriented architecture. The implementation uses the cross-platform programming language Java, the multi-agent platform Jadex, the database server Oracle Spatial 10g, and the ontology development environment Protégé version 3.4.
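
A toy illustration of the intended linkage between problem domains, data mining methods and datasets is given below; the real system uses an OWL ontology built in Protégé and agents running on Jadex, so this Python sketch only conveys the kind of recommendation such a model enables. All names are invented.

```python
# Toy "ontology": tasks -> suitable data-mining methods -> required data shape.
METHODS = {
    "classification": {"requires": "labelled tabular data",
                       "examples": ["decision tree", "SVM"]},
    "clustering":     {"requires": "unlabelled tabular data",
                       "examples": ["k-means", "hierarchical clustering"]},
    "association":    {"requires": "transaction data",
                       "examples": ["Apriori"]},
}

DATASETS = {
    "uci:iris":            {"shape": "labelled tabular data"},
    "fimi:retail":         {"shape": "transaction data"},
    "repo:sensor_streams": {"shape": "unlabelled tabular data"},
}

def recommend(task: str):
    """Return the methods suited to a task and the datasets they can run on."""
    method = METHODS[task]
    data = [name for name, meta in DATASETS.items()
            if meta["shape"] == method["requires"]]
    return method["examples"], data

print(recommend("classification"))   # (['decision tree', 'SVM'], ['uci:iris'])
```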

16

Savvidis, Evangelos. "Searching Metadata in Hadoop." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-177467.

Full text
Abstract:
The rapid expansion of the internet has led to the Big Data era. Companies that provide services dealing with Big Data face two major issues: i) storing petabytes of data and ii) manipulating this data. For the former, the open source Hadoop ecosystem, and particularly its distributed file system HDFS, provides persistent storage for unprecedented amounts of data. For the latter, there are many approaches to data analytics, from map-reduce jobs to information retrieval and data discovery. This thesis provides a novel approach to information discovery, firstly by providing the means to create, manage and associate metadata with HDFS files and secondly by searching for files through their metadata using Elasticsearch. The work is composed of three parts. The first is the metadata designer/manager, which is the AngularJS front end. The second is the J2EE back end, which enables the front end to perform all the metadata management actions using websockets. The third is the indexing of data into Elasticsearch, the distributed and scalable open source search engine. Our work has shown that this approach works and that it greatly helps in finding information in the vast sea of data in HDFS.
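
A minimal sketch of the indexing and search idea follows, using the official Elasticsearch Python client rather than the thesis's AngularJS/J2EE stack; the index name, fields and example document are assumptions made for illustration.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def index_hdfs_metadata(path: str, metadata: dict) -> None:
    """Store user-defined metadata for an HDFS file, keyed by its path."""
    doc = {"hdfs_path": path, **metadata}
    es.index(index="hdfs-metadata", id=path, document=doc)

def search_files(text: str):
    """Find HDFS files whose metadata matches a free-text query."""
    result = es.search(
        index="hdfs-metadata",
        query={"multi_match": {"query": text,
                               "fields": ["description", "tags", "owner"]}})
    return [hit["_source"]["hdfs_path"] for hit in result["hits"]["hits"]]

index_hdfs_metadata(
    "/datasets/2015/clickstream/part-0001.avro",
    {"owner": "analytics", "tags": ["clickstream", "avro"],
     "description": "Hourly clickstream export from the web frontend"},
)
print(search_files("clickstream"))
```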
17

Antle, Alissa N. "Interactive visualization tools for spatial data & metadata." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape4/PQDD_0010/NQ56495.pdf.

Full text
18

Guo, Fan 1966. "Implementing attribute metadata operators to support semistructured data." Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=82248.

Full text
Abstract:
This thesis documents the design and implementation of nine new features for supporting semistructured data in jRelix, including the augmentation of attribute metadata, the extension of the wildcard, and the expansion of recursive virtual nested relations.
The strategy for the implementation of semistructured data in the Aldat system is to augment the data types and operations in the programming language jRelix. The functionality of the attribute metadata operators (eval, quote, transpose) has been enhanced and new operators (relation, typeof, self) have been added. The capability of the wildcard has been extended so that it can represent top-level relations or nested relations in domain algebra and relational algebra, depending on the context. The expansion of recursive virtual nested relations has been implemented to support recursive nesting structures. Applications of these operators are also presented, including attribute path and schema discovery, data reorganization, and queries involving transitive closure on graph-structured data.
19

Enoksson, Fredrik. "Adaptable metadata creation for the Web of Data." Doctoral thesis, KTH, Medieteknik och interaktionsdesign, MID, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-154272.

Full text
Abstract:
One approach to managing a collection is to create data about the things in it. This descriptive data is called metadata, and the term is used in this thesis as a collective noun, i.e. no plural form exists. A library is a typical example of an organization that uses metadata to manage a collection of books. The metadata about a book describes certain attributes of it, for example who the author is. Metadata also makes it possible for a person to judge whether a book is interesting without having to deal with the book itself. The metadata of the things in a collection is a representation of the collection that is easier to deal with than the collection itself. Nowadays metadata is often managed in computer-based systems that enable search and the sorting of search results according to different principles. Metadata can be created both by computers and by humans. This thesis deals with certain aspects of the human activity of creating metadata and includes an explorative study of this activity. The growing amount of public information that is produced is also required to be easily accessible, and therefore the situation where metadata is part of the Semantic Web has been an important consideration in this thesis. This situation is also referred to as the Web of Data or Linked Data. With the Web of Data, metadata records that previously lived in isolation from each other can be linked together over the web. This will probably change not only what kind of metadata is being created, but also how it is being created. This thesis describes the construction and use of a framework called Annotation Profiles, a set of artifacts developed to enable a metadata creation environment that is adaptable with respect to what metadata can be created. The main artifact is the Annotation Profile Model (APM), a model that holds enough information for a software application to generate a customized metadata editor from it. An instance of this model is called an annotation profile, which can be seen as a configuration for metadata editors. Changes to what metadata can be edited in a metadata editor can thus be made without modifying the code of the application. Two code libraries that implement the APM have been developed and evaluated, both internally within the research group where they were developed and externally via interviews with software developers who have used one of the code libraries. Another artifact presented is a protocol for how RDF metadata can be remotely updated when metadata is edited through a metadata editor. It is also described how the APM opens up possibilities for end-user development, which is one of the avenues of pursuit in future research related to the APM.
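
The configuration-driven idea behind annotation profiles can be conveyed with a schematic sketch: a declarative profile lists the metadata fields an editor should expose, and the form is generated from it without changing application code. The real Annotation Profile Model is expressed over RDF; the Python below is only an illustration with invented field names.

```python
# A declarative "annotation profile": which metadata fields an editor should expose.
# (Schematic only -- the actual Annotation Profile Model is an RDF-based formalism.)
dc_profile = [
    {"property": "dc:title",   "label": "Title",    "widget": "text",   "required": True},
    {"property": "dc:creator", "label": "Author",   "widget": "text",   "required": False},
    {"property": "dc:subject", "label": "Keywords", "widget": "text",   "required": False},
    {"property": "lom:difficulty", "label": "Difficulty", "widget": "choice",
     "options": ["easy", "medium", "difficult"], "required": False},
]

def generate_editor(profile: list) -> str:
    """Render a simple HTML form from an annotation profile."""
    rows = []
    for field in profile:
        required = " required" if field["required"] else ""
        if field["widget"] == "choice":
            opts = "".join(f'<option>{o}</option>' for o in field["options"])
            control = f'<select name="{field["property"]}">{opts}</select>'
        else:
            control = f'<input name="{field["property"]}" type="text"{required}>'
        rows.append(f'<label>{field["label"]}: {control}</label>')
    return "<form>" + "".join(rows) + "</form>"

# Swapping in a different profile changes the editor without touching this code.
print(generate_editor(dc_profile))
```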

20

Alrehamy, Hassan. "Extensible metadata management framework for personal data lake." Thesis, Cardiff University, 2018. http://orca.cf.ac.uk/119636/.

Full text
Abstract:
Common Internet users today are inundated with a deluge of diverse data being generated and siloed in a variety of digital services, applications, and a growing body of personal computing devices as we enter the era of the Internet of Things. Alongside potential privacy compromises, users are facing increasing difficulties in managing their data and are losing control over it. There appears to be a de facto agreement in business and scientific fields that there is critical new value and interesting insight that can be attained by users from analysing their own data, if only it can be freed from its silos and combined with other data in meaningful ways. This thesis takes the point of view that users should have an easy-to-use modern personal data management solution that enables them to centralise and efficiently manage their data by themselves, under their full control, for their best interests, with minimum time and efforts. In that direction, we describe the basic architecture of a management solution that is designed based on solid theoretical foundations and state of the art big data technologies. This solution (called Personal Data Lake - PDL) collects the data of a user from a plurality of heterogeneous personal data sources and stores it into a highly-scalable schema-less storage repository. To simplify the user-experience of PDL, we propose a novel extensible metadata management framework (MMF) that: (i) annotates heterogeneous data with rich lineage and semantic metadata, (ii) exploits the garnered metadata for automating data management workflows in PDL - with extensive focus on data integration, and (iii) facilitates the use and reuse of the stored data for various purposes by querying it on the metadata level either directly by the user or through third party personal analytics services. We first show how the proposed MMF is positioned in PDL architecture, and then describe its principal components. Specifically, we introduce a simple yet effective lineage manager for tracking the provenance of personal data in PDL. We then introduce an ontology-based data integration component called SemLinker which comprises two new algorithms; the first concerns generating graph-based representations to express the native schemas of (semi) structured personal data, and the second algorithm metamodels the extracted representations to a common extensible ontology. SemLinker outputs are utilised by MMF to generate user-tailored unified views that are optimised for querying heterogeneous personal data through low-level SPARQL or high-level SQL-like queries. Next, we introduce an unsupervised automatic keyphrase extraction algorithm called SemCluster that specialises in extracting thematically important keyphrases from unstructured data, and associating each keyphrase with ontological information drawn from an extensible WordNet-based ontology. SemCluster outputs serve as semantic metadata and are utilised by MMF to annotate unstructured contents in PDL, thus enabling various management functionalities such as relationship discovery and semantic search. Finally, we describe how MMF can be utilised to perform holistic integration of personal data and jointly querying it in native representations.
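
As an illustration of the lineage-tracking role mentioned above (not the thesis's actual component), the sketch below records an ingest and a transformation for a personal data item and walks the derivation chain back to its source. Identifiers, sources and operations are invented.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LineageRecord:
    """Provenance entry for one object ingested into the personal data lake."""
    object_id: str
    source: str                           # where the raw data came from
    ingested_at: float
    derived_from: Optional[str] = None    # id of the parent object, if any
    operations: List[str] = field(default_factory=list)

class LineageManager:
    """Toy lineage manager: every ingest or transformation adds a record,
    so the full derivation chain of any object can be replayed."""

    def __init__(self) -> None:
        self.records = {}

    def _new_id(self, payload: bytes) -> str:
        return hashlib.sha1(payload).hexdigest()[:12]

    def ingest(self, payload: bytes, source: str) -> str:
        oid = self._new_id(payload)
        self.records[oid] = LineageRecord(oid, source, time.time())
        return oid

    def transform(self, parent_id: str, payload: bytes, operation: str) -> str:
        oid = self._new_id(payload)
        self.records[oid] = LineageRecord(oid, self.records[parent_id].source,
                                          time.time(), derived_from=parent_id,
                                          operations=[operation])
        return oid

    def lineage(self, object_id: str) -> List[LineageRecord]:
        """Walk back to the original source of an object."""
        chain, current = [], object_id
        while current is not None:
            record = self.records[current]
            chain.append(record)
            current = record.derived_from
        return chain

lm = LineageManager()
raw = lm.ingest(b'{"steps": 10342}', source="fitness-tracker-api")
clean = lm.transform(raw, b'{"steps": 10342, "date": "2018-03-01"}', "add_date_field")
for rec in lm.lineage(clean):
    print(rec.object_id, rec.source, rec.operations)
```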
21

Grunzke, Richard. "Generic Metadata Handling in Scientific Data Life Cycles." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-202070.

Full text
Abstract:
Scientific data life cycles define how data is created, handled, accessed, and analyzed by users. Such data life cycles become increasingly sophisticated as the sciences they deal with become more demanding and complex with the coming advent of exascale data and computing. The overarching data life cycle management background includes multiple abstraction categories with data sources, data and metadata management, computing and workflow management, security, data sinks, and methods for enabling utilization. Challenges in this context are manifold. One is to hide the complexity from the user and to enable the seamless use of resources for usability and efficiency. Another is to enable generic metadata management that is not restricted to one use case but can be adapted to further ones with limited effort. Metadata management is essential to enable scientists to save time by avoiding the need to manually keep track of data, for example by its content and location. As the number of files grows into the millions, managing data without metadata becomes increasingly difficult. Thus, the solution is to employ metadata management to enable the organization of data based on information about it. Previously, use cases tended to support only highly specific metadata management, or none at all. Now, a generic metadata management concept is available that can be used to efficiently integrate metadata capabilities with use cases. The concept was implemented within the MoSGrid data life cycle, which enables molecular simulations on distributed HPC-enabled data and computing infrastructures. The implementation enables easy-to-use and effective metadata management. Automated extraction, annotation, and indexing of metadata were designed, developed, and integrated, and search capabilities are provided via a seamless user interface. Further analysis runs can be started directly from search results. A complete evaluation of the concept, both in general and along the example implementation, is presented. In conclusion, the generic metadata management concept advances the state of the art in scientific data life cycle management.
22

Ravindran, Nimmy. "Hide-Metadata Based Data Integration Environment for Hydrological Datasets." Thesis, Virginia Tech, 2004. http://hdl.handle.net/10919/36157.

Full text
Abstract:
Efficient data integration is one of the most challenging problems in data management, interoperation and analysis. The Earth science data which are heterogeneous are collected at various geographical locations for scientific studies and operational uses. The intrinsic problem of archiving, distributing and searching such huge scientific datasets is compounded by the heterogeneity of data and queries, thus limiting scientific analysis, and generation/validation of hydrologic forecast models. The data models of hydrologic research communities such as National Weather Service (NWS), National Oceanic and Atmospheric Administration (NOAA), and US Geological Survey (USGS) are diverse and complex. A complete derivation of any useful hydrological models from data integrated from all these sources is often a time consuming process. One of the current trends of data harvesting in scientific community is towards a distributed digital library initiative. However, these approaches may not be adequate for data sources / entities who do not want to "upload" the data into a "data pool." In view of this, we present here an effective architecture to address the issues of data integration in such a diverse environment for hydrological studies. The heterogeneities in these datasets are addressed based on the autonomy of data source in terms of design, communication, association and execution using a hierarchical integration model. A metadata model is also developed for defining data as well as the data sources, thus providing a uniform view of the data for different kind of users. An implementation of the model using web based system that integrates widely varied hydrology datasets from various data sources is also being developed.
Master of Science
23

Demšar, Urška. "Exploring geographical metadata by automatic and visual data mining." Licentiate thesis, KTH, Infrastructure, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-1779.

Full text
Abstract:

Metadata are data about data. They describe characteristics and content of an original piece of data. Geographical metadata describe geospatial data: maps, satellite images and other geographically referenced material. Such metadata have two characteristics, high dimensionality and diversity of attribute data types, which present a problem for traditional data mining algorithms.

Other problems that arise during the exploration of geographical metadata are linked to the expertise of the user performing the analysis. The large amounts of metadata and hundreds of possible attributes limit the exploration for a non-expert user, which results in a potential loss of information that is hidden in metadata.

In order to solve some of these problems, this thesis presents an approach for exploration of geographical metadata by a combination of automatic and visual data mining.

Visual data mining is a principle that involves the human in the data exploration by presenting the data in some visual form, allowing the human to get insight into the data and to recognise patterns. The main advantages of visual data exploration over automatic data mining are that the visual exploration allows a direct interaction with the user, that it is intuitive and does not require complex understanding of mathematical or statistical algorithms. As a result the user has a higher confidence in the resulting patterns than if they were produced by computer only.

In the thesis we present the Visual data mining tool (VDM tool), which was developed for exploration of geographical metadata for site planning. The tool provides five different visualisations: a histogram, a table, a pie chart, a parallel coordinates visualisation and a clustering visualisation. The visualisations are connected using the interactive selection principle called brushing and linking.

In the VDM tool the visual data mining concept is integrated with an automatic data mining method, clustering, which finds a hierarchical structure in the metadata, based on similarity of metadata items. In the thesis we present a visualisation of the hierarchical structure in the form of a snowflake graph.

Keywords: visualisation, data mining, clustering, tree drawing, geographical metadata.
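
A small sketch of the kind of hierarchical clustering the VDM tool applies to metadata items is shown below, using scipy's agglomerative clustering on a few invented numeric attributes; the real tool deals with mixed attribute types and visualises the resulting tree as a snowflake graph.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy geographical-metadata items described by a few numeric attributes
# (e.g. scale denominator, extent area in km^2, number of themes, year).
items = ["topo_map_A", "ortho_B", "landuse_C", "topo_map_D", "ortho_E"]
features = np.array([
    [50_000, 1_200, 12, 1998],
    [10_000,   300,  1, 2003],
    [25_000,   900,  8, 2001],
    [50_000, 1_100, 11, 1999],
    [10_000,   350,  1, 2004],
], dtype=float)

# Standardise attributes so no single one dominates the similarity measure.
z = (features - features.mean(axis=0)) / features.std(axis=0)

# Agglomerative clustering; the resulting tree is what a snowflake graph would show.
tree = linkage(z, method="average", metric="euclidean")
labels = fcluster(tree, t=3, criterion="maxclust")

for name, label in zip(items, labels):
    print(label, name)
```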

24

Chen, Yin. "A binding approach to scientific data and metadata management." Thesis, University of Edinburgh, 2012. http://hdl.handle.net/1842/27784.

Full text
25

Klovning, Eric. "Metadata management in the support of data warehouse development." Online version, 2008. http://www.uwstout.edu/lib/thesis/2008/2008klovninge.pdf.

Full text
26

Marney, Katherine Anne. "Geospatial metadata and an ontology for water observations data." Thesis, [Austin, Tex. : University of Texas, 2009. http://hdl.handle.net/2152/ETD-UT-2009-05-135.

Full text
27

Tang, Yaobin. "Butterfly -- A model of provenance." Worcester, Mass. : Worcester Polytechnic Institute, 2009. http://www.wpi.edu/Pubs/ETD/Available/etd-031309-095511/.

Full text
28

Enoksson, Fredrik. "Flexible Authoring of Metadata for Learning : Assembling forms from a declarative data and view model." Licentiate thesis, KTH, Medieteknik och grafisk produktion, Media, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-32818.

Full text
Abstract:
With the vast amount of information in various formats that is produced today, it becomes necessary for consumers of this information to be able to judge whether it is relevant for them. One way to enable that is to provide information about each piece of information, i.e. to provide metadata. When metadata is to be edited by a human being, a metadata editor needs to be provided. This thesis describes the design and practical use of a configuration mechanism for metadata editors called annotation profiles, intended to enable a flexible metadata editing environment. An annotation profile is an instance of an Annotation Profile Model (APM), an information model that can gather information from many sources. The model has been developed by the author together with colleagues at the Royal Institute of Technology and Uppsala University in Sweden. It is designed so that an annotation profile can hold enough information for an application to generate a customized metadata editor from it. The APM works with metadata expressed in a format called RDF (Resource Description Framework), which forms the technical basis for the Semantic Web, and also with metadata expressed in models similar to RDF. The RDF model provides a simple way of combining metadata standards, which makes it possible for the resulting metadata editor to combine different metadata standards into one metadata description. Resources that are meant to be used in a learning situation can be of various media types (audio or video files, documents, etc.), which gives rise to a situation where different metadata standards have to be used in combination. Such a resource would typically contain educational metadata from one standard, while for each media type a different metadata standard might be used for the technical description. Combining all the metadata into a single metadata record is desirable and is made possible when using RDF. The focus in this thesis is on metadata for resources that can be used in such learning contexts.
One of the major advantages of annotation profiles is that they enable a change of metadata editor without having to modify the code of an application; instead, the annotation profile is updated to fit the required changes. In this way, the programmer of an application can avoid the responsibility of deciding which metadata can be edited and how it is structured. Such decisions can be left to the metadata specialist who creates the annotation profiles to be used.
The Annotation Profile Model can be divided into two models: the Graph Pattern Model, which holds information on what parts of the metadata can be edited, and the Form Template Model, which provides information about how the different parts of the metadata editor should be structured. An instance of the Graph Pattern Model is called a graph pattern, and it defines which parts of the metadata will be editable through the annotation profile. The author has developed an approach to how this information can be used when the RDF metadata to edit is stored on a remote system, e.g. a system that can only be accessed over a network. In such cases the graph pattern cannot be used directly, even though it defines the structures that can be affected in the editing process. The method developed describes how the specific parts of metadata are extracted for editing and then updated when the metadata author has finished editing.
A situation where annotation profiles have proven valuable is presented in chapter 6. Here the author has taken part in developing a portfolio system for learning resources in the area of blood diseases, hematology. A set of annotation profiles was developed in order to adapt the portfolio system for this particular community. The annotation profiles made use of an existing curriculum for hematology that provides a competence profile of the field, in two ways:
1. As part of the personal profile for each user, i.e. metadata about a person. Through the editor, created from an annotation profile, the user can express his or her skill, knowledge, and competence in the field of hematology.
2. The metadata can associate a learning resource with certain parts of the competence description, thus expressing that the learning resource deals with a specific part of the competence profile. This provides a mechanism for matching learning needs with available learning resources.
As the field of hematology evolves, the competence profile will need to be updated. Because annotation profiles are used, the metadata editors in question can be updated simply by changing the corresponding annotation profiles. This is an example of the benefits of annotation profiles within an installed application. Annotation profiles can also be used in applications that aim to support different metadata expressions, since the set of metadata editors can easily be changed. The portfolio system mentioned above provides this flexibility in metadata expression, and it has successfully been configured to work with resources from other domain areas, notably organic farming, by using another set of annotation profiles. Hence, using annotation profiles has proven useful in these settings due to the flexibility that the Annotation Profile Model enables. Plans for the future include developing an editor for annotation profiles in order to provide a simple way to create such profiles.
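To make the combination of metadata standards in a single RDF description more concrete, here is a minimal sketch using the Python rdflib library. It is not taken from the thesis; the TECH namespace and its property names stand in for a technical-metadata vocabulary and are invented for illustration.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS

# Invented namespace standing in for a technical-metadata vocabulary.
TECH = Namespace("http://example.org/techmd#")

g = Graph()
resource = URIRef("http://example.org/resources/lecture-42")

# Descriptive metadata from one standard (Dublin Core terms) ...
g.add((resource, DCTERMS.title, Literal("Introduction to haematopoiesis")))
g.add((resource, DCTERMS.description, Literal("Lecture recording aimed at medical students.")))

# ... combined with technical metadata from another vocabulary,
# all in one RDF description of the same resource.
g.add((resource, TECH.mediaType, Literal("video/mp4")))
g.add((resource, TECH.durationSeconds, Literal(1260)))

print(g.serialize(format="turtle"))

An annotation profile would then describe which of these statements an editor exposes (the graph pattern) and how the form presenting them is laid out (the form template).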
QC 20110426
APA, Harvard, Vancouver, ISO, and other styles
29

Towle, Jonathan E. Clotfelter Christopher T. "TwiddleNet metadata tagging and data dissemination in mobile device networks /." Monterey, Calif. : Naval Postgraduate School, 2007. http://bosun.nps.edu/uhtbin/hyperion-image.exe/07Sep%5FTowle.pdf.

Full text
Abstract:
Thesis (M.S. in Computer Science)--Naval Postgraduate School, September 2007.
Thesis Advisor(s): Singh, Gurminder ; Das, Arjit. "September 2007." Description based on title screen as viewed on October 23, 2007. Includes bibliographical references (p. 69-70). Also available in print.
APA, Harvard, Vancouver, ISO, and other styles
30

Clotfelter, Christopher T. "TwiddleNet metadata tagging and data dissemination in mobile device networks." Thesis, Monterey, California. Naval Postgraduate School, 2007. http://hdl.handle.net/10945/3333.

Full text
Abstract:
Mobile devices are no longer what they were only a few years ago; instead they offer a range of content capture capabilities, including high resolution photos, videos and sound recordings. Their communication modalities and processing power have also evolved significantly. Modern mobile devices are very capable platforms, many surpassing their desktop cousins of only a few years removed. TwiddleNet is a distributed architecture of personal servers that harnesses the power of these mobile devices, enabling real time information dissemination and file sharing of multiple data types from commercial-off-the-shelf platforms. This thesis focuses on two specific issues of the TwiddleNet design: metadata tagging and data dissemination. Through a combination of automatically generated and user-input metadata tag values, TwiddleNet users can locate files across participating devices. Metaphor-appropriate custom tags can be added as needed to ensure efficient, rich and successful file searches. Intelligent data dissemination algorithms provide context sensitive governance to the file transfer scheme. Smart dissemination reconciles device and operational states with the amount of requested data and content to send, enabling providers to meet their most pressing needs, whether that is continuing to generate content or servicing requests.
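Purely as an illustration of the kind of tagging and lookup described above (not code from the thesis), a file's searchable record could combine automatically generated metadata with user-supplied tags; the field names below are invented.

import os
import time

def build_tag_record(path, user_tags=None):
    """Combine automatically generated metadata with user-supplied tags."""
    stat = os.stat(path)
    return {
        "filename": os.path.basename(path),
        "size_bytes": stat.st_size,                        # auto-generated
        "created": time.ctime(stat.st_ctime),              # auto-generated
        "media_type": os.path.splitext(path)[1].lstrip("."),
        "user_tags": sorted(set(user_tags or [])),         # user input
    }

def search(records, term):
    """Return records whose filename or user tags mention the search term."""
    term = term.lower()
    return [r for r in records
            if term in r["filename"].lower()
            or any(term in t.lower() for t in r["user_tags"])]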
APA, Harvard, Vancouver, ISO, and other styles
31

Shi, Rong Shi. "Efficient data and metadata processing in large-scale distributed systems." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1534414418404428.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Pozzoli, Alice Laurini Robert Mussio Luigi. "Data and quality metadata for continuous fields terrains and photogrammetry /." Villeurbanne : Doc'INSA, 2009. http://docinsa.insa-lyon.fr/these/pont.php?id=pozzoli.

Full text
Abstract:
Doctoral thesis: Computer Science: Villeurbanne, INSA: 2008. Doctoral thesis: Computer Science: Politecnico di Milano, Scuola di dottorato in Geodesia e Geomatica: 2008.
Thesis defended under joint supervision (co-tutelle). Bilingual French-English thesis. Extended abstract in French. Title taken from the title screen. Bibliography p. 168-182.
APA, Harvard, Vancouver, ISO, and other styles
33

Dervos, Dimitris A., and Anita Sundaram Coleman. "A Common Sense Approach to Defining Data, Information, and Metadata." Ergon-Verlag, 2006. http://hdl.handle.net/10150/105530.

Full text
Abstract:
This is a presentation (~25 slides) made at ISKO 2006 in Vienna based on the paper (same title) published in the Proceedings of the Ninth International ISKO 2006 Conference, Vienna, Edited by Swertz, C. Berlin: Ergon, 2006.
APA, Harvard, Vancouver, ISO, and other styles
34

YAMAN, BEYZA. "Exploiting Context-Dependent Quality Metadata for Linked Data Source Selection." Doctoral thesis, Università degli studi di Genova, 2018. http://hdl.handle.net/11567/930633.

Full text
Abstract:
The traditional Web is evolving into the Web of Data, which consists of huge collections of structured data over poorly controlled distributed data sources. Live queries are needed to get current information out of this global data space. In live query processing, source selection deserves attention since it allows us to identify the sources which are likely to contain the relevant data. The thesis proposes a source selection technique in the context of live query processing on Linked Open Data, which takes into account the context of the request and the quality of data contained in the sources in order to enhance the relevance of the answers (since the context enables a better interpretation of the request) and their quality (obtained by processing the request on the selected sources). Specifically, the thesis proposes an extension of the QTree indexing structure, originally proposed as a data summary to support source selection based on source content, to take into account quality and contextual information. With reference to a specific case study, the thesis also contributes an approach, relying on the Luzzu framework, to assess the quality of a source with respect to a given context (according to different quality dimensions). An experimental evaluation of the proposed techniques is also provided.
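The following toy sketch illustrates the general idea of quality- and context-aware source selection; it is invented for illustration and does not reproduce the thesis's QTree extension. Source names, coverage estimates, and quality scores are all assumptions.

# Hypothetical source descriptions: coverage estimates per predicate and
# quality scores per context; the numbers and names are illustrative only.
sources = {
    "dbpedia":   {"coverage": {"dbo:population": 0.9}, "quality": {"geo": 0.8, "medical": 0.4}},
    "linkedmdb": {"coverage": {"dbo:population": 0.1}, "quality": {"geo": 0.2, "medical": 0.1}},
    "drugbank":  {"coverage": {"dbo:population": 0.0}, "quality": {"geo": 0.1, "medical": 0.9}},
}

def select_sources(predicate, context, threshold=0.3):
    """Rank sources by combined relevance (coverage) and context-dependent quality."""
    ranked = []
    for name, info in sources.items():
        score = info["coverage"].get(predicate, 0.0) * info["quality"].get(context, 0.0)
        if score >= threshold:
            ranked.append((name, score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

print(select_sources("dbo:population", "geo"))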
APA, Harvard, Vancouver, ISO, and other styles
35

Lawlor, Fiona. "Implementation of the metadata elements of the INSPIRE directive." [Denver, Colo.] : Regis University, 2008. http://165.236.235.140/lib/FLawlor2008.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Beránek, Lukáš. "Vizualizace technických a business metadat." Master's thesis, Vysoká škola ekonomická v Praze, 2011. http://www.nusl.cz/ntk/nusl-124775.

Full text
Abstract:
This master's degree thesis focuses on the visualization of previously preprocessed business and technical metadata in a business environment. Within the process of elaborating and using the collected data in a company, it is necessary to present the data to users in a comfortable, comprehensible and clear way. The first goal of this thesis is to describe and specify the term metadata, both theoretically and at the business level: its main structure, its occurrence in non-visual form, and the places where metadata can be found in a heterogeneous business environment. This part also includes a short introduction to the usage of metadata that relates to and originates from business intelligence, and a description of Company encyclopedia, which can syndicate these resources for further utilization. Once the sources, destinations and purpose of technical and business metadata are defined, they are used in the second part of the thesis, which models the use cases for a visual component that can be applied to business and technical metadata. The use cases focus on the roles of the users who will work with this component, in order to discover their primary demands and requirements and the functionality that will be indispensable. After the use cases are defined, the next stage of visual component development follows: the data itself must be visualized, and proper means to achieve this must be found with user experience as the main focus. The visualization is then encapsulated in a graphical user interface that meets the requirements and demands of the user roles specified in the use-case section, developed through prototyping. Lastly, the results of the previous chapters are used to prototype a visual component suitable for a web environment, based on the principles of reusability and a data-driven approach, and using modern web technologies such as the D3.js library, HTML5, CSS3, and SVG.
APA, Harvard, Vancouver, ISO, and other styles
37

Wahlquist, Gustav. "Improving Automatic Image Annotation Using Metadata." Thesis, Linköpings universitet, Datorseende, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-176941.

Full text
Abstract:
Detecting and outlining products in images is beneficial for many use cases in e-commerce, such as automatically identifying and locating products within images and proposing matches for the detections. This study investigated how the utilisation of metadata associated with images of products could help boost the performance of an existing approach, with the ultimate goal of reducing the manual labour needed to annotate images. The thesis explored whether approximate pseudo masks could be generated for products in images by leveraging metadata as image-level labels and subsequently using the masks to train a Mask R-CNN. However, this approach did not yield satisfactory results. Further, the study found that by incorporating the metadata directly in the Mask R-CNN, an mAP performance increase of nearly 5% was achieved. Furthermore, utilising the available metadata to divide the training samples for a KNN model into subsets resulted in an increased top-3 accuracy of up to 16%. By representing the data with embeddings created by a pre-trained CNN, the KNN model performed better, with both higher accuracy and more reasonable suggestions.
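As a rough illustration of the KNN part of the approach (not the thesis code), embeddings can be split into subsets according to a metadata field and one nearest-neighbour model trained per subset; the use of scikit-learn and all names below are assumptions.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def train_per_subset(embeddings, labels, subset_keys):
    """Train one KNN model per metadata-defined subset of the training data."""
    models = {}
    for key in set(subset_keys):
        mask = np.array([k == key for k in subset_keys])
        model = KNeighborsClassifier(n_neighbors=3)
        model.fit(embeddings[mask], np.asarray(labels)[mask])
        models[key] = model
    return models

# Toy data: 2-D "embeddings", product labels, and a metadata field (e.g. category).
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.15, 0.25], [0.85, 0.95]])
y = ["mug", "mug", "chair", "chair", "mug", "chair"]
category = ["kitchen", "kitchen", "furniture", "furniture", "kitchen", "furniture"]

models = train_per_subset(X, y, category)
print(models["kitchen"].predict([[0.12, 0.22]]))   # a query routed by its metadata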
APA, Harvard, Vancouver, ISO, and other styles
38

Jarvis, Kimberley James. "Transactional data structures." Thesis, University of Manchester, 2011. https://www.research.manchester.ac.uk/portal/en/theses/transactional-data-structures(7060eaec-7dbd-4d5a-be1a-a753d9aa32d5).html.

Full text
Abstract:
Concurrent programming is difficult and the effort is rarely rewarded by faster execution. The concurrency problem arises because information cannot pass instantly between processors resulting in temporal uncertainty. This thesis explores the idea that immutable data and distributed concurrency control can be combined to allow scalable concurrent execution and make concurrent programming easier. A concurrent system that does not impose a global ordering on events lends itself to a scalable distributed implementation. A concurrent programming environment in which the ordering of events affecting an object is enforced locally has intuitive concurrent semantics. This thesis introduces Transactional Data Structures which are data structures that permit access to past versions, although not all accesses succeed. These data structures form the basis of a concurrent programming solution that supports database type transactions in memory. Transactional Data Structures permit non-blocking concurrent access to familiar abstract data types such as deques, maps, vectors and priority queues. Using these data structures a programmer can write a concurrent program in C without having to reason about locks. The solution is evaluated by comparing the performance of a concurrent algorithm to calculate the minimum spanning tree of a graph with that of a similar algorithm which uses Transactional Memory and by comparing a non-blocking Producer Consumer Queue with its blocking counterpart.
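A very small sketch of the underlying idea, a data structure that keeps past versions and commits updates optimistically, is given below. It is invented for illustration: the thesis's structures are non-blocking and written in C, whereas this Python sketch uses a lock only to keep the commit step short.

import threading

class VersionedMap:
    """Keeps every committed version; writers commit optimistically against the
    version they started from, so conflicting commits fail instead of blocking."""

    def __init__(self):
        self._versions = [{}]          # version 0 is the empty map
        self._lock = threading.Lock()  # protects only the commit step in this sketch

    def snapshot(self):
        """Return (version number, copy of that version) to start a read or write from."""
        v = len(self._versions) - 1
        return v, dict(self._versions[v])

    def commit(self, base_version, updates):
        """Apply updates only if no other commit has happened since base_version."""
        with self._lock:
            if base_version != len(self._versions) - 1:
                return False           # conflict: caller retries on a fresh snapshot
            new_version = dict(self._versions[-1])
            new_version.update(updates)
            self._versions.append(new_version)
            return True

    def read(self, version, key):
        """Read from any past version."""
        return self._versions[version].get(key)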
APA, Harvard, Vancouver, ISO, and other styles
39

Price, Jeremy C., Michael S. Moore, and Bill A. Malatesta. "XML Data Modeling for Network-Based Telemetry Systems." International Foundation for Telemetering, 2008. http://hdl.handle.net/10150/606213.

Full text
Abstract:
ITC/USA 2008 Conference Proceedings / The Forty-Fourth Annual International Telemetering Conference and Technical Exhibition / October 27-30, 2008 / Town and Country Resort & Convention Center, San Diego, California
Network-based telemetry systems are often made up of many components from multiple vendors. The complexity involved in coordinating the design, integration, configuration, and operation of these systems has required instrumentation engineers to become experts in the tools and hardware from various vendors. Interoperation between the various tools and systems is very limited. One approach toward a more efficient method of managing these systems is to define a common language for describing the goals of the test, the measurements to be acquired, and the equipment that is available to compose a system. Through an open working group process, the iNET program is defining an eXtensible Markup Language (XML)-based language for describing instrumentation and telemetry systems. The language is designed with multiple aspects that allow filtered views into the instrumentation system, making the creation of the various parts of the documents more straight-forward and understandable to the type of user providing the information. This paper will describe the iNET metadata project, the model-driven approach that is being pursued, and the current state of the iNET metadata language.
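To give a flavour of what an XML description of measurements might look like, here is a small invented example parsed with Python's standard library; the element and attribute names are illustrative only and are not taken from the iNET metadata language.

import xml.etree.ElementTree as ET

# Invented, simplified description of an instrumentation setup; the real
# iNET metadata language defines its own, much richer, schema.
document = """
<TestConfiguration name="flight-042">
  <Measurement id="engine.temp" units="degC" sampleRateHz="50"/>
  <Measurement id="airspeed" units="m/s" sampleRateHz="200"/>
  <Recorder vendor="ExampleCorp" channels="16"/>
</TestConfiguration>
"""

root = ET.fromstring(document)
for m in root.findall("Measurement"):
    print(m.get("id"), m.get("units"), m.get("sampleRateHz"))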
APA, Harvard, Vancouver, ISO, and other styles
40

Jin, Hao. "A framework for capturing, querying, and restructuring metadata in XML data." Online access for everyone, 2005. http://www.dissertations.wsu.edu/Dissertations/Summer2005/h%5Fjin%5F072105.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Bolin, Pernilla, and Emma Liljebrand. "Kvalitativa intervjuer för framtagning av metadata och data/geodata : för blåljusaktörer." Thesis, Högskolan i Gävle, Avdelningen för Industriell utveckling, IT och Samhällsbyggnad, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-22068.

Full text
Abstract:
Today the blue-light actors (the rescue services, the police authority, and the ambulance services) do not work from the same map material, which means they work with different map views. This could be solved with a common map support and a touch of metadata. Metadata is data about data and is becoming increasingly common in data contexts. As data becomes richer and more complicated, metadata grows in importance. Metadata can be likened to the table of contents of a thesis: without a table of contents and headings, the work would be difficult to read through. Metadata helps us, among other things, to share datasets and also ensures that data is kept up to date. Data and geodata can be used without metadata, but with it we get a description of how current the data is, what rights we have to use it, and so on. The purpose of the study was to identify the geodata and metadata that the blue-light actors have in common in their operations, in order to facilitate the production of a common blue-light map so that the actors can carry out their assignments as time-efficiently as possible. Through interviews with the three blue-light actors, we learned which data and geodata each actor is interested in. By comparing the actors' answers with each other, common geodata such as cycle paths and house numbers could be identified. From the geodata, metadata such as update frequency and entrance side could then be extracted.
APA, Harvard, Vancouver, ISO, and other styles
42

Gu, Peng. "METADATA AND DATA MANAGEMENT IN HIGH PERFORMANCE FILE AND STORAGE SYSTEMS." Doctoral diss., University of Central Florida, 2008. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2155.

Full text
Abstract:
With the advent of emerging "e-Science" applications, today's scientific research increasingly relies on petascale-and-beyond computing over large data sets of the same magnitude. While the computational power of supercomputers has recently entered the era of petascale, the performance of their storage systems lags far behind by many orders of magnitude. This places an imperative demand on revolutionizing their underlying I/O systems, in which the management of both metadata and data is deemed to have significant performance implications. Prefetching/caching and data locality awareness optimizations, as conventional and effective management techniques for metadata and data I/O performance enhancement, still play crucial roles in current parallel and distributed file systems. In this study, we examine the limitations of existing prefetching/caching techniques and explore the untapped potential of data locality optimization techniques in the new era of petascale computing. For metadata I/O access, we propose a novel weighted-graph-based prefetching technique, built on both direct and indirect successor relationships, to reap performance benefits from prefetching specifically for clustered metadata servers, an arrangement envisioned necessary for petabyte-scale distributed storage systems. For data I/O access, we design and implement Segment-structured On-disk data Grouping and Prefetching (SOGP), a combined prefetching and data placement technique to boost local data read performance for parallel file systems, especially for applications with partially overlapped access patterns. One high-performance local I/O software package from the SOGP work for the Parallel Virtual File System, comprising about 2,000 lines of C, was released to Argonne National Laboratory in 2007 for potential integration into the production mode.
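A toy sketch of successor-based prefetch prediction is shown below; it is invented for illustration and only captures direct successors, whereas the proposed technique builds a weighted graph over both direct and indirect successor relationships.

from collections import defaultdict

class SuccessorGraph:
    """Learns weighted successor relationships between accessed items and
    predicts which items to prefetch after a given access."""

    def __init__(self):
        self.weights = defaultdict(lambda: defaultdict(int))
        self.previous = None

    def record_access(self, item):
        if self.previous is not None:
            self.weights[self.previous][item] += 1   # strengthen the edge prev -> item
        self.previous = item

    def prefetch_candidates(self, item, k=2):
        successors = self.weights[item]
        return sorted(successors, key=successors.get, reverse=True)[:k]

g = SuccessorGraph()
for path in ["/a", "/b", "/a", "/b", "/a", "/c"]:
    g.record_access(path)
print(g.prefetch_candidates("/a"))   # likely ['/b', '/c']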
Ph.D.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Engineering PhD
APA, Harvard, Vancouver, ISO, and other styles
43

Hargreaves, Steven. "Music metadata capture in the studio from audio and symbolic data." Thesis, Queen Mary, University of London, 2014. http://qmro.qmul.ac.uk/xmlui/handle/123456789/8816.

Full text
Abstract:
Music Information Retrieval (MIR) tasks, in the main, are concerned with the accurate generation of one of a number of different types of music metadata: beat onsets, or melody extraction, for example. Almost always, they operate on fully mixed digital audio recordings. Commonly, this means that a large amount of signal processing effort is directed towards the isolation, and then identification, of certain highly relevant aspects of the audio mix. In some cases, results of one MIR algorithm are useful, if not essential, to the operation of another: a chord detection algorithm, for example, is highly dependent upon accurate pitch detection. Although not clearly defined in all cases, certain rules exist which we may take from music theory in order to assist the task: the particular note intervals which make up a specific chord, for example. On the question of generating accurate, low level music metadata (e.g. chromatic pitch and score onset time), a potentially huge advantage lies in the use of multitrack, rather than mixed, audio recordings, in which the separate instrument recordings may be analysed in isolation. Additionally, in MIR, as in many other research areas currently, there is an increasing push towards the use of the Semantic Web for publishing metadata using the Resource Description Framework (RDF). Semantic Web technologies, though, also facilitate the querying of data via the SPARQL query language, as well as logical inferencing via the careful creation and use of web ontology language (OWL) ontologies. This, in turn, opens up the intriguing possibility of deferring our decision regarding which particular type of MIR query to ask of our low-level music metadata until some point later down the line, long after all the heavy signal processing has been carried out. In this thesis, we describe an over-arching vision for an alternative MIR paradigm, built around the principles of early, studio-based metadata capture, and exploitation of open, machine-readable Semantic Web data. Using the specific example of structural segmentation, we demonstrate that by analysing multitrack rather than mixed audio, we are able to achieve a significant and quantifiable increase in the accuracy of our segmentation algorithm. We also provide details of a new multitrack audio dataset with structural segmentation annotations, created as part of this research, and available for public use. Furthermore, we show that it is possible to fully implement a pair of pattern discovery algorithms (the SIA and SIATEC algorithms, highly applicable to, but not restricted to, symbolic music data analysis) using only Semantic Web technologies: the SPARQL query language, acting on RDF data, in tandem with a small OWL ontology. We describe the challenges encountered by taking this approach, the particular solution we've arrived at, and we evaluate the implementation both in terms of its execution time, and also within the wider context of our vision for a new MIR paradigm.
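As a minimal illustration of querying captured music metadata in RDF with SPARQL (an invented example; the property names are not taken from the thesis or any particular ontology):

from rdflib import Graph

# A tiny invented note-event graph in Turtle; real studio metadata would be richer.
turtle = """
@prefix ex: <http://example.org/music#> .
ex:note1 ex:pitch "C4" ; ex:onsetSeconds 0.00 ; ex:track ex:guitar .
ex:note2 ex:pitch "E4" ; ex:onsetSeconds 0.48 ; ex:track ex:guitar .
ex:note3 ex:pitch "C4" ; ex:onsetSeconds 0.50 ; ex:track ex:bass .
"""

g = Graph()
g.parse(data=turtle, format="turtle")

# Ask a question that did not need to be decided at capture time:
# which notes on the guitar track sound the pitch C4?
query = """
PREFIX ex: <http://example.org/music#>
SELECT ?note ?onset WHERE {
  ?note ex:track ex:guitar ; ex:pitch "C4" ; ex:onsetSeconds ?onset .
}
"""
for row in g.query(query):
    print(row.note, row.onset)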
APA, Harvard, Vancouver, ISO, and other styles
44

Chilvers, Alison H. "Managing long-term access to digital data objects : a metadata approach." Thesis, Loughborough University, 2000. https://dspace.lboro.ac.uk/2134/7239.

Full text
Abstract:
As society becomes increasingly reliant on information technology for data exchange and long-term data storage, the need for a system of data management to document and provide access to the 'societal memory' is becoming imperative. An examination of both the literature and current 'best practice' underlines the absence to date of a proven universal conceptual basis for digital data preservation. The examination of differences in nature and sources of origin between traditional 'print-based' and digital objects leads to a re-appraisal of current practices of data selection and preservation. The need to embrace past, present and future metadata developments in a rapidly changing environment is considered. Various hypotheses were formulated and supported regarding: the similarities and differences required in selection criteria for different types of Digital Data Objects (DDOs), the ability to define universal threshold standards for a framework of metadata for digital data preservation, and the role of selection criteria in such a framework. The research uses Soft Systems Methodology to investigate the potential of the metadata concept as the key to universal data management. Semi-structured interviews were conducted to explore the attitudes of information professionals in the United Kingdom towards the challenges facing information-dependent organisations attempting to preserve digital data over the long term. In particular, these covered the nature of DDOs being encountered by stakeholders, the reasons, policies, and procedures for preserving them, together with a range of specific issues such as the role of metadata and access to, and rights management of, DDOs. The societal need for selection to ensure efficient long-term access is considered. Drawing on SSM modelling, this research develops a flexible, long-term management framework for digital data at a level higher than metadata, with selection as an essential component. The framework's conceptual feasibility has been examined from both financial and societal benefit perspectives, together with the recognition of constraints. The super-metadata framework provides a possible systematic approach to managing a wide range of digital data in a variety of formats, created/owned by a spectrum of information-dependent organisations.
APA, Harvard, Vancouver, ISO, and other styles
45

Roxbergh, Linus. "Language Classification of Music Using Metadata." Thesis, Uppsala universitet, Avdelningen för systemteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-379625.

Full text
Abstract:
The purpose of this study was to investigate how metadata from Spotify could be used to identify the language of songs in a dataset containing nine languages. Features based on song name, album name, genre, regional popularity and vectors describing songs, playlists and users were analysed individually and in combination with each other in different classifiers. In addition, the report explored how different levels of prediction confidence affect performance and how the approach compares to a classifier based on audio input. A random forest classifier proved to have the best performance, with an accuracy of 95.4% for the whole data set. Performance was also investigated when the confidence of the model was taken into account: when only the more confident predictions were kept, accuracy was higher. When keeping the 70% most confident predictions, an accuracy of 99.4% was achieved. The model also proved to be robust to input in languages other than those it was trained on, and managed to filter out unwanted records not matching the languages of the model. A comparison was made to a classifier based on audio input, where the model using metadata performed better on the training and test set used. Finally, a number of possible improvements and future work were suggested.
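A compact sketch of the confidence-thresholding idea with a random forest follows; it is illustrative only, uses synthetic stand-in features, and the scikit-learn setup is an assumption, since the thesis's features were derived from Spotify metadata such as song and album names, genre, and regional popularity.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data; in the thesis the features come from song metadata.
X, y = make_classification(n_samples=2000, n_classes=5, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

proba = clf.predict_proba(X_test)
confidence = proba.max(axis=1)
predictions = proba.argmax(axis=1)

# Keep only the 70% most confident predictions and measure accuracy on those.
cutoff = np.quantile(confidence, 0.3)
kept = confidence >= cutoff
accuracy_kept = (predictions[kept] == y_test[kept]).mean()
print(f"kept {kept.mean():.0%} of predictions, accuracy on kept: {accuracy_kept:.3f}")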
APA, Harvard, Vancouver, ISO, and other styles
46

Weng, Li. "Automatic and efficient data virtualization system for scientific datasets." Columbus, Ohio : Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1154717945.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Zurek, Fiona. "Metadatenmanagement in Bibliotheken mit KNIME und Catmandu." Thesis, 2019. http://eprints.rclis.org/39887/1/Bachelorarbeit_Metadatenmanagement_KNIME_Catmandu.pdf.

Full text
Abstract:
This thesis deals with metadata management in libraries. It examines to what extent the tools KNIME and Catmandu can be used to support libraries in typical tasks of metadata management. The technical developments in the field of metadata have become more complex due to the multitude of formats, interfaces, and applications. In order to prepare and use metadata, information about the suitability of different programs is needed. KNIME and Catmandu are both theoretically analyzed and practically tested. For this purpose it is examined, among other things, how the documentation is designed, and which data formats and interfaces are supported. Typical tasks like filtering, analysis, content enhancement, and data enrichment will be tested. The work shows that both tools have different strengths and weaknesses. Catmandu's strength is an easier introduction into the program and a variety of options for using library data formats and interfaces. An advantage of KNIME is that after an initial familiarization many problems can be solved quickly and special features are made available for numerous cases.
APA, Harvard, Vancouver, ISO, and other styles
48

Ballarin, Matteo. "SKOS : un sistema per l'organizzazione della conoscenza." Thesis, 2006. http://eprints.rclis.org/7408/1/774752.pdf.

Full text
Abstract:
The development of the Semantic Web involves not only new technologies such as web services, search engines and ontologies, but also different worlds such as librarianship and other disciplines that have been working with knowledge organization systems (KOS) for hundreds of years. This thesis focuses on this type of tool: the rapid growth of digital content and resources on the Web has increased the need to organize an immense amount of information. Tools for knowledge organization such as thesauri, taxonomies and concept schemes are fundamental to the birth and development of the Semantic Web. The thesis considers and analyzes an emerging technology that is a candidate to soon become a standard recommended by the W3C: SKOS. The standards that govern the construction of thesauri are taken into consideration, existing technologies are analyzed, and some simple examples are modeled. Moreover, other alternative instruments for knowledge organization are analyzed, and some real applications based on the SKOS framework are introduced. Supervisor of the degree thesis: Prof. Renzo Orsini.
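A tiny sketch of a SKOS concept scheme expressed with the Python rdflib library is shown below; the concepts and labels are invented, while skos:ConceptScheme, skos:Concept, skos:prefLabel, skos:inScheme, and skos:broader are terms defined by SKOS itself.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/scheme/")
g = Graph()

scheme = EX.animals
g.add((scheme, RDF.type, SKOS.ConceptScheme))

mammal = EX.mammal
dog = EX.dog
for concept, label in [(mammal, "Mammals"), (dog, "Dogs")]:
    g.add((concept, RDF.type, SKOS.Concept))
    g.add((concept, SKOS.prefLabel, Literal(label, lang="en")))
    g.add((concept, SKOS.inScheme, scheme))

# Hierarchical relation: dogs are narrower than mammals.
g.add((dog, SKOS.broader, mammal))

print(g.serialize(format="turtle"))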
APA, Harvard, Vancouver, ISO, and other styles
49

Palomino, Norma. "The Bibliographic Concept of Work in Cataloguing and its Issues." Thesis, 2003. http://eprints.rclis.org/9531/1/TesisNorma_Palomino.pdf.

Full text
Abstract:
This report explores the IFLA document Functional Requirements for Bibliographic Records (FRBR). It discusses the notion of work in cataloguing as it has been built since the 1950s, inasmuch as this notion constitutes the conceptual framework for the proposal. The entity-relationship database modeling (ERDM) system is also described, insofar as this model provides FRBR with the operative elements that make it functional. ERDM also gives FRBR a user-centered approach. In its third chapter, the report tests the FRBR model by applying it to a set of items belonging to the novel Rayuela, by Julio Cortázar, held at the Benson Latin American Collection of the University of Texas at Austin. Finally, some critical issues are raised, along with general conclusions regarding the functionality of the model.
APA, Harvard, Vancouver, ISO, and other styles
50

Πεπονάκης, Μανόλης. "Σύνθεση FRBR εγγραφών αξιοποιώντας υπάρχουσες βιβλιογραφικές εγγραφές (FRBRization): ομαδοποίηση σχετικών εγγραφών (clustering) και εμφάνισή τους σε on line συστήματα." Thesis, 2010. http://eprints.rclis.org/16674/1/Peponakis_FRBRization_MSc_Thesis.pdf.

Full text
Abstract:
This MSc thesis focuses on two main issues. The first is to review international practices in FRBRization, with special interest in clustering procedures. The second is to study the effectiveness of these procedures on Greek library catalogs. The study begins with a short historical review of library catalogs, aiming to identify the reasons that led to conceptual models such as FRBR. A brief analysis of FRBR follows, emphasizing, among other changes, those it brings to the methods with which library catalogs are structured. In Chapter 3, the study attempts a rather holistic approach to international FRBRization practices as well as clustering procedures. The fourth chapter discusses the application of the international clustering procedures to Greek metadata. To carry out this part of the research, it was considered necessary to examine the existing metadata of Greek catalogs; this task mainly helps in identifying the problems which seem to affect the clustering procedure overall. The chapter continues with recommendations for modifications and adjustments of the existing international clustering techniques in order to match the special needs and features of Greek catalog structures, which will lead to better FRBRization results for Greek metadata. The fifth chapter concludes the study and is divided into two main sections. The first section presents the problems and objections regarding FRBRization and the ways it is currently put into practice. The second section briefly reviews the hands-on effort of implementing international FRBRization techniques on Greek metadata. The results show that it is a prerequisite for the FRBRization procedure in Greek catalogs to take their special features into consideration; but even then, the effectiveness of the FRBRization techniques already reported in the international literature is not fully demonstrated on Greek catalogs.
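As an illustration of the most common FRBRization step, grouping records that likely describe the same work under a normalized author/title key, here is a deliberately simplified sketch invented for this listing (it is not the clustering algorithm studied in the thesis):

import re
import unicodedata
from collections import defaultdict

def normalize(text):
    """Lower-case, strip accents, and drop punctuation so near-identical
    headings produce the same key."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def cluster_records(records):
    """Group bibliographic records into candidate works by author+title key."""
    clusters = defaultdict(list)
    for rec in records:
        key = (normalize(rec["author"]), normalize(rec["title"]))
        clusters[key].append(rec)
    return clusters

records = [
    {"author": "Cortázar, Julio", "title": "Rayuela", "year": 1963},
    {"author": "Cortazar, Julio", "title": "Rayuela.", "year": 1984},
    {"author": "Cortázar, Julio", "title": "Hopscotch", "year": 1966},
]
for key, recs in cluster_records(records).items():
    print(key, "->", len(recs), "record(s)")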
APA, Harvard, Vancouver, ISO, and other styles