Dissertations / Theses on the topic 'Data management'




Consult the top 50 dissertations / theses for your research on the topic 'Data management.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Morshedzadeh, Iman. "Data Classification in Product Data Management." Thesis, Högskolan i Skövde, Institutionen för teknik och samhälle, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-14651.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This report describes a product data classification methodology that is usable for the Volvo Cars Engine (VCE) factory's production data and can be implemented in the Teamcenter software. A great deal of data is generated during the life cycle of each product, and companies try to manage these data with product data management software. Data classification is the part of data management concerned with the most effective and efficient use of data. Surveys carried out in this project identified the items that affect data classification: the data themselves, their attributes, the classification method, the Volvo Cars Engine factory, and Teamcenter as the product data management software. Each of these items is explained separately in the report. Based on the knowledge obtained about them, a suitable hierarchical classification method for the Volvo Cars Engine factory is described. In the last part of the report, the defined classification method is implemented in the software to show that it is executable.
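The hierarchical classification idea can be illustrated with a small sketch. This is a hypothetical Python example, not the Teamcenter implementation described in the thesis; the class names and attributes are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ClassNode:
    """A node in a hierarchical classification tree (hypothetical structure)."""
    name: str
    attributes: list[str] = field(default_factory=list)
    children: dict = field(default_factory=dict)

    def add_child(self, child: "ClassNode") -> "ClassNode":
        self.children[child.name] = child
        return child

    def classify(self, item: dict, path=()):
        """Walk the tree and return the deepest class whose required
        attributes are all present on the item."""
        path = path + (self.name,)
        for child in self.children.values():
            if all(a in item for a in child.attributes):
                return child.classify(item, path)
        return path

# Tiny illustration: a two-level hierarchy for production data.
root = ClassNode("ProductData")
tooling = root.add_child(ClassNode("Tooling", ["tool_id"]))
tooling.add_child(ClassNode("CuttingTool", ["tool_id", "insert_type"]))

print(root.classify({"tool_id": "T-17", "insert_type": "CNMG"}))
# ('ProductData', 'Tooling', 'CuttingTool')
```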
2

Yang, Ying. "Interactive Data Management and Data Analysis." Thesis, State University of New York at Buffalo, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10288109.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:

Everyone today has a big data problem. Data is everywhere and comes in different formats; it may be referred to as data lakes, data streams, or data swamps. To extract knowledge or insights from the data, or to support decision-making, we need to go through a process of collecting, cleaning, managing, and analyzing the data. In this process, data cleaning and data analysis are two of the most important and time-consuming components.

One common challenge in these two components is a lack of interaction. Data cleaning and data analysis are typically done as batch processes, operating on the whole dataset without any feedback. This leads to long, frustrating delays during which users have no idea whether the process is effective. Because interaction is lacking, human experts are needed to decide which algorithms or parameters to use in the systems for these two components.

We should teach computers to talk to humans, not the other way around. This dissertation focuses on building systems --- Mimir and CIA --- that help users conduct data cleaning and analysis through interaction. Mimir is a system that allows users to clean big data in a cost- and time-efficient way through interaction, a process I call on-demand ETL. Convergent inference algorithms (CIA) are a family of inference algorithms in probabilistic graphical models (PGM) that enjoy the benefits of both exact and approximate inference algorithms through interaction.

Mimir provides a general language for users to express different data cleaning needs. It acts as a shim layer that wraps around the database, making it possible for the bulk of the ETL process to remain within a classical deterministic system. Mimir also helps users measure the quality of an analysis result and provides rankings of cleaning tasks to improve result quality in a cost-efficient manner. CIA focuses on providing user interaction throughout the process of inference in PGMs. The goal of CIA is to free users from an upfront commitment to either approximate or exact inference, and to give users more control over time/accuracy trade-offs to direct decision-making and the allocation of computation instances. This dissertation describes the Mimir and CIA frameworks to demonstrate that it is feasible to build efficient interactive data management and data analysis systems.
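As a rough illustration of the on-demand cleaning idea (not Mimir's actual language or API), the following hypothetical Python sketch repairs missing values lazily and reports which repairs would need user feedback:

```python
# Hypothetical mini-example of "on-demand" cleaning: missing values are
# repaired lazily, and only the repairs that actually influence a query
# result are surfaced to the user for confirmation.
rows = [
    {"id": 1, "price": 10.0},
    {"id": 2, "price": None},   # dirty cell
    {"id": 3, "price": 12.5},
]

def best_guess(row, column, fallback):
    """Cheap automatic repair; flagged so the user can override it later."""
    row[column] = fallback
    row.setdefault("_uncertain", set()).add(column)
    return row

def query_average_price(data):
    known = [r["price"] for r in data if r["price"] is not None]
    fallback = sum(known) / len(known)
    repaired = [best_guess(dict(r), "price", fallback) if r["price"] is None else r
                for r in data]
    avg = sum(r["price"] for r in repaired) / len(repaired)
    pending = [r["id"] for r in repaired if "_uncertain" in r]
    return avg, pending

avg, pending = query_average_price(rows)
print(f"avg price = {avg:.2f}; rows needing user feedback: {pending}")
```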

3

Mathew, Avin D. "Asset management data warehouse data modelling." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/19310/1/Avin_Mathew_Thesis.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Data are the lifeblood of an organisation, being employed by virtually all business functions within a firm. Data management, therefore, is a critical process in prolonging the life of a company and determining the success of each of an organisation's business functions. The last decade and a half has seen data warehousing rising in priority within corporate data management as it provides an effective supporting platform for decision support tools. A cross-sectional survey conducted by this research showed that data warehousing is starting to be used within organisations for their engineering asset management; however, industry uptake is slow and has much room for development and improvement. This conclusion is also evidenced by the lack of systematic scholarly research within asset management data warehousing as compared to data warehousing for other business areas. This research is motivated by the lack of dedicated research into asset management data warehousing and attempts to provide original contributions to the area, focussing on data modelling. Integration is a fundamental characteristic of a data warehouse and facilitates the analysis of data from multiple sources. While several integration models exist for asset management, these only cover select areas of asset management. This research presents a novel conceptual data warehousing data model that integrates the numerous asset management data areas. The comprehensive ethnographic modelling methodology involved a diverse set of inputs (including data model patterns, standards, information system data models, and business process models) that described asset management data. Used as an integrated data source, the conceptual data model was verified by more than 20 experts in asset management and validated against four case studies. A large portion of asset management data is stored in a relational format due to the maturity and pervasiveness of relational database management systems. Data warehousing offers the alternative approach of structuring data in a dimensional format, which suggests increased data retrieval speeds in addition to reduced analysis complexity for end users. To investigate the benefits of moving asset management data from a relational to a multidimensional format, this research presents an innovative relational vs. multidimensional model evaluation procedure. To undertake an equitable comparison, the compared multidimensional models are derived from an asset management relational model; as such, this research presents an original multidimensional modelling derivation methodology for asset management relational models. Multidimensional models were derived from the relational models in the asset management data exchange standard, MIMOSA OSA-EAI. The multidimensional and relational models were compared through a series of queries. It was discovered that multidimensional schemas reduced the data size and subsequently data insertion time, decreased the complexity of query conceptualisation, and improved the query execution performance across a range of query types. To facilitate the quicker uptake of these data warehouse multidimensional models within organisations, an alternate modelling methodology was investigated. This research presents an innovative approach that uses a case-based reasoning methodology for data warehouse schema design. Using unique case representation and indexing techniques, the system also uses a business vocabulary repository to augment case searching and adaptation. The system was validated through a case study in which multidimensional schema design speed and accuracy were measured. It was found that the case-based reasoning system provided a marginal benefit, with greater benefits gained when confronted with more difficult scenarios.
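To make the relational-versus-dimensional comparison concrete, here is a minimal sketch in Python with SQLite that answers the same aggregate question against a normalized layout and a star schema. The table and column names are invented for illustration and are far simpler than the MIMOSA OSA-EAI models used in the thesis.

```python
import sqlite3

# Same maintenance-cost question asked against a normalized layout and
# against a star schema (fact + dimension tables). Hypothetical schema.
con = sqlite3.connect(":memory:")
con.executescript("""
    -- normalized (relational) layout
    CREATE TABLE asset(asset_id INTEGER PRIMARY KEY, site TEXT);
    CREATE TABLE work_order(wo_id INTEGER PRIMARY KEY, asset_id INTEGER,
                            wo_date TEXT, cost REAL);

    -- dimensional layout
    CREATE TABLE dim_asset(asset_key INTEGER PRIMARY KEY, site TEXT);
    CREATE TABLE dim_date(date_key INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_maintenance(asset_key INTEGER, date_key INTEGER, cost REAL);
""")
con.executemany("INSERT INTO asset VALUES (?, ?)", [(1, "Plant A"), (2, "Plant B")])
con.executemany("INSERT INTO work_order VALUES (?, ?, ?, ?)",
                [(10, 1, "2008-03-01", 120.0), (11, 2, "2008-07-15", 80.0)])
con.executemany("INSERT INTO dim_asset VALUES (?, ?)", [(1, "Plant A"), (2, "Plant B")])
con.executemany("INSERT INTO dim_date VALUES (?, ?)", [(20080301, 2008), (20080715, 2008)])
con.executemany("INSERT INTO fact_maintenance VALUES (?, ?, ?)",
                [(1, 20080301, 120.0), (2, 20080715, 80.0)])

# Total maintenance cost per site and year, phrased against each schema.
relational = con.execute("""
    SELECT a.site, substr(w.wo_date, 1, 4) AS year, SUM(w.cost)
    FROM work_order w JOIN asset a ON a.asset_id = w.asset_id
    GROUP BY a.site, year
""").fetchall()
dimensional = con.execute("""
    SELECT da.site, dd.year, SUM(f.cost)
    FROM fact_maintenance f
    JOIN dim_asset da ON da.asset_key = f.asset_key
    JOIN dim_date dd ON dd.date_key = f.date_key
    GROUP BY da.site, dd.year
""").fetchall()
print(relational)
print(dimensional)
```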
4

Mathew, Avin D. "Asset management data warehouse data modelling." Queensland University of Technology, 2008. http://eprints.qut.edu.au/19310/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Sehat, Mahdis, and René Pavez Flores. "Customer Data Management." Thesis, KTH, Industriell ekonomi och organisation (Avd.), 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-109251.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
As business complexity and the number of customers continue to grow, and customers evolve into multinational organisations that operate across borders, many companies face great challenges in the way they manage their customer data. In today's business, a single customer may have a relationship with several entities of an organisation, which means that customer data is collected through different channels. One customer may be described in different ways by each entity, which makes it difficult to obtain a unified view of the customer. In companies where there are several sources of data and the data is distributed across several systems, data environments become heterogeneous. In this state, customer data is often incomplete, inaccurate, and inconsistent throughout the company. This thesis studies how organisations with heterogeneous customer data sources implement the Master Data Management (MDM) concept to achieve and maintain high customer data quality. The purpose is to provide recommendations for achieving successful customer data management using MDM, based on the existing literature on the topic and an interview-based empirical study. Successful customer data management is more an organisational issue than a technological one and requires a top-down approach in order to develop a common strategy for an organisation's customer data management. Proper central assessment and maintenance processes that can be adjusted according to the entities' needs must be in place, and responsibility for the maintenance of customer data should be delegated to several levels of an organisation in order to better manage it.
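A toy sketch of the consolidation step that MDM tooling performs is shown below. The match key, survivorship rules, and record fields are assumptions for illustration, not the thesis's recommendations.

```python
from collections import defaultdict

# Records describing the same customer arrive from several systems, are
# grouped by a match key, and merged into one "golden record" using simple
# survivorship rules. All field names and values are invented.
records = [
    {"source": "CRM",     "email": "anna@example.com", "name": "Anna Berg", "phone": None,        "updated": "2012-05-01"},
    {"source": "Billing", "email": "anna@example.com", "name": "A. Berg",   "phone": "+46 8 123", "updated": "2012-06-10"},
    {"source": "Support", "email": "anna@example.com", "name": "Anna Berg", "phone": "+46 8 123", "updated": "2011-12-03"},
]

def match_key(record):
    # Real MDM matching is fuzzier (names, addresses, identifiers);
    # a normalized e-mail is enough for this sketch.
    return record["email"].strip().lower()

def merge(group):
    golden = {}
    for field in ("name", "phone", "email"):
        # Survivorship rule: take the most recently updated non-null value.
        candidates = [r for r in group if r[field]]
        golden[field] = max(candidates, key=lambda r: r["updated"])[field] if candidates else None
    golden["sources"] = sorted({r["source"] for r in group})
    return golden

groups = defaultdict(list)
for r in records:
    groups[match_key(r)].append(r)

for key, group in groups.items():
    print(key, "->", merge(group))
```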
6

Scott, Mark. "Research data management." Thesis, University of Southampton, 2014. https://eprints.soton.ac.uk/374711/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Scientists within the materials engineering community produce a wide variety of data, ranging from large 3D volume densitometry files (voxel) generated by microfocus computer tomography (μCT) to simple text files containing results from tensile tests. Increasingly they need to share this data as part of international collaborations. The design of a suitable database schema and the architecture of a flexible system that can cope with the varying information is a continuing problem in the management of heterogeneous data. We discuss the issues with managing such varying data, and present a model flexible enough to meet users’ diverse requirements. Metadata is held using a database and its design allows users to control their own data structures. Data is held in a file store which, in combination with the metadata, gives huge flexibility and means the model is limited only by the file system. Using examples from materials engineering and medicine we illustrate how the model can be applied. We will also discuss how this data model can be used to support an institutional document repository, showing how data can be published in a remote data repository at the same time as a publication is deposited in a document repository. Finally, we present educational material used to introduce the concepts of research data management. Educating students about the challenges and opportunities of data management is a key part of the solution and helps the researchers of the future to start to think about the relevant issues early on in their careers. We have compiled a set of case studies to show the similarities and differences in data between disciplines, and produced documentation for students containing the case studies and an introduction to the data lifecycle and other data management practices. Managing in-use data and metadata is just as important to users as published data. Appropriate education of users and a data staging repository with a flexible and extensible data model supports this without precluding the ability to publish the data at a later date.
7

Tran, Viet-Trung. "Scalable data-management systems for Big Data." PhD thesis, École normale supérieure de Cachan - ENS Cachan, 2013. http://tel.archives-ouvertes.fr/tel-00920432.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Big Data can be characterized by three Vs: Big Volume refers to the unprecedented growth in the amount of data; Big Velocity refers to the growth in the speed at which data move in and out of management systems; and Big Variety refers to the growth in the number of different data formats. Managing Big Data requires fundamental changes in the architecture of data management systems. Data storage must keep being rethought in order to adapt to the growth of data: it needs to be scalable while maintaining high performance for data accesses. This thesis focuses on building scalable data management systems for Big Data. Our first and second contributions address the challenge of providing efficient support for Big Volume of data in data-intensive high performance computing (HPC) environments. In particular, we address the shortcoming of existing approaches in handling atomic, non-contiguous I/O operations in a scalable fashion. We propose and implement a versioning-based mechanism that can be leveraged to offer isolation for non-contiguous I/O without the need to perform expensive synchronizations. In the context of parallel array processing in HPC, we introduce Pyramid, a large-scale, array-oriented storage system. It revisits the physical organization of data in distributed storage systems for scalable performance. Pyramid favors multidimensional-aware data chunking that closely matches the access patterns generated by applications. Pyramid also favors distributed metadata management and versioning concurrency control to eliminate synchronizations under concurrency. Our third contribution addresses Big Volume at the scale of geographically distributed environments. We consider BlobSeer, a distributed versioning-oriented data management service, and we propose BlobSeer-WAN, an extension of BlobSeer optimized for such geographically distributed environments. BlobSeer-WAN takes the latency hierarchy into account by favoring local metadata accesses. BlobSeer-WAN features asynchronous metadata replication and a vector-clock implementation for collision resolution. To cope with the Big Velocity characteristic of Big Data, our last contribution features DStore, an in-memory document-oriented store that scales vertically by leveraging the large memory capacity of multicore machines. DStore demonstrates fast and atomic complex transaction processing for data writes while maintaining high-throughput read access. DStore follows a single-threaded execution model that executes update transactions sequentially, while relying on versioning concurrency control to enable a large number of simultaneous readers.
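The versioning idea that recurs in these contributions (isolation without synchronization, a single writer with many concurrent readers) can be sketched as follows. This is a simplified illustration, not the BlobSeer or DStore code.

```python
import threading

class VersionedStore:
    """Minimal sketch of versioning-based concurrency control: one writer
    installs immutable versions; readers pick a snapshot and never block
    the writer. Simplified assumptions compared to the thesis."""

    def __init__(self):
        self._versions = [{}]          # list of immutable snapshots
        self._lock = threading.Lock()  # only serializes writers

    def write(self, updates: dict):
        with self._lock:
            snapshot = dict(self._versions[-1])  # copy-on-write
            snapshot.update(updates)
            self._versions.append(snapshot)      # publish new version atomically

    def read(self, key, version=None):
        snapshot = self._versions[-1 if version is None else version]
        return snapshot.get(key)

store = VersionedStore()
store.write({"sensor/1": 10})
v1 = len(store._versions) - 1        # pin a snapshot
store.write({"sensor/1": 11, "sensor/2": 7})
print(store.read("sensor/1"))        # 11  (latest version)
print(store.read("sensor/1", v1))    # 10  (reader pinned to older snapshot)
```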
8

Schnyder, Martin. "Web 2.0 data management." Zürich : ETH, Eidgenössische Technische Hochschule Zürich, Department of Computer Science, Institute of Information Systems, Global Information Systems Group, 2008. http://e-collection.ethbib.ethz.ch/show?type=dipl&nr=403.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

He, Ying. "Spatial data quality management." University of New South Wales, Surveying & Spatial Information Systems, Faculty of Engineering, 2008. http://handle.unsw.edu.au/1959.4/43323.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The applications of geographic information systems (GIS) in various areas have highlighted the importance of data quality. Data quality research has been given a priority by GIS academics for three decades. However, the outcomes of data quality research have not been sufficiently translated into practical applications. Users still need a GIS capable of storing, managing and manipulating data quality information. To fill this gap, this research aims to investigate how we can develop a tool that effectively and efficiently manages data quality information to help data users better understand and assess the quality of their GIS outputs. Specifically, this thesis aims: 1. To develop a framework for establishing a systematic linkage between data quality indicators and appropriate uncertainty models; 2. To propose an object-oriented data quality model for organising and documenting data quality information; 3. To create data quality schemas for defining and storing the contents of metadata databases; 4. To develop a new conceptual model of data quality management; 5. To develop and implement a prototype system for enhancing the capability of data quality management in commercial GIS. Based on reviews of error and uncertainty modelling in the literature, a conceptual framework has been developed to establish the systematic linkage between data quality elements and appropriate error and uncertainty models. To overcome the limitations identified in the review and satisfy a series of requirements for representing data quality, a new object-oriented data quality model has been proposed. It enables data quality information to be documented and stored in a multi-level structure and to be integrally linked with spatial data to allow access, processing and graphic visualisation. A conceptual model for data quality management is proposed in which a data quality storage model, uncertainty models and visualisation methods are the three basic components. This model establishes the processes involved when managing data quality, emphasising the integration of uncertainty modelling and visualisation techniques. The above studies lay the theoretical foundations for the development of a prototype system with the ability to manage data quality. An object-oriented approach, database technology and programming technology have been integrated to design and implement the prototype system within the ESRI ArcGIS software. The object-oriented approach allows the prototype to be developed in a more flexible and easily maintained manner. The prototype allows users to browse and access data quality information at different levels. Moreover, a set of error and uncertainty models are embedded within the system. With the prototype, data quality elements can be extracted from the database and automatically linked with the appropriate error and uncertainty models, as well as with their implications in the form of simple maps. This function offers a set of different uncertainty models from which users can choose to assess how uncertainty inherent in the data can affect their specific application. It will significantly increase the users' confidence in using data for a particular situation. To demonstrate the enhanced capability of the prototype, the system has been tested against real data. The implementation has shown that the prototype can efficiently assist data users, especially non-expert users, to better understand data quality and utilise it in a more practical way.
The methodologies and approaches for managing quality information presented in this thesis should serve as an impetus for supporting further research.
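A small sketch of the kind of linkage the prototype automates (a stored quality element driving an uncertainty model) might look as follows; the feature, the accuracy value, and the Gaussian error model are assumptions for illustration.

```python
import numpy as np

# A stored positional-accuracy value (standard deviation in metres) is used
# to perturb coordinates and show how uncertainty propagates. Hypothetical data.
rng = np.random.default_rng(42)

feature = {
    "id": "road_17",
    "coords": np.array([[100.0, 200.0], [150.0, 260.0]]),
    "quality": {"positional_accuracy_m": 2.5},   # quality element from metadata
}

def simulate_positions(feat, n_realizations=100):
    """Gaussian error model chosen from the quality indicator."""
    sigma = feat["quality"]["positional_accuracy_m"]
    noise = rng.normal(0.0, sigma, size=(n_realizations,) + feat["coords"].shape)
    return feat["coords"] + noise

realizations = simulate_positions(feature)
print("spread of first vertex (std, metres):", realizations[:, 0, :].std(axis=0).round(2))
```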
10

Voigt, Hannes. "Flexibility in Data Management." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-136681.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
With the ongoing expansion of information technology, new fields of application requiring data management emerge virtually every day. In our knowledge culture, increasing amounts of data and a workforce organized in more creativity-oriented ways also radically change traditional fields of application and call established assumptions about data management into question. For instance, investigative analytics and agile software development move towards a very agile and flexible handling of data. As the primary facilitators of data management, database systems have to reflect and support these developments. However, traditional database management technology, in particular relational database systems, is built on assumptions of relatively stable application domains. The need to model all data up front in a prescriptive database schema earned relational database management systems the reputation among developers of being inflexible, dated, and cumbersome to work with. Nevertheless, relational systems still dominate the database market. They are a proven, standardized, and interoperable technology, well-known in IT departments with a work force of experienced and trained developers and administrators. This thesis aims at resolving the growing contradiction between the popularity and omnipresence of relational systems in companies and their increasingly bad reputation among developers. It adapts relational database technology towards more agility and flexibility. We envision a descriptive schema-comes-second relational database system, which is entity-oriented instead of schema-oriented; descriptive rather than prescriptive. The thesis provides four main contributions: (1) a flexible relational data model, which frees relational data management from having a prescriptive schema; (2) autonomous physical entity domains, which partition self-descriptive data according to their schema properties for better query performance; (3) a freely adjustable storage engine, which allows the physical data layout to be adapted to properties of the data and of the workload; and (4) a self-managed indexing infrastructure, which autonomously collects and adapts index information in the presence of dynamic workloads and evolving schemas. The flexible relational data model is the thesis's central contribution. It describes the functional appearance of the descriptive schema-comes-second relational database system. The other three contributions improve components in the architecture of database management systems to increase the query performance and the manageability of descriptive schema-comes-second relational database systems. We are confident that these four contributions can help pave the way to a more flexible future for relational database management technology.
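The schema-comes-second idea can be illustrated with a small sketch: entities are stored with whatever attributes they carry, and a descriptive schema is derived afterwards. This is a hypothetical illustration, not the storage engine described in the thesis.

```python
from collections import Counter

# Entities are stored with whatever attributes they bring, and a descriptive
# schema is derived from the data instead of being prescribed up front.
entities = [
    {"type": "customer", "name": "Anna", "city": "Dresden"},
    {"type": "customer", "name": "Ben"},                       # no city yet
    {"type": "order", "customer": "Anna", "total": 42.0},
]

def describe_schema(rows):
    """Derive, per entity type, how often each attribute actually occurs."""
    schema = {}
    for row in rows:
        counts = schema.setdefault(row["type"], Counter())
        counts.update(k for k in row if k != "type")
    return schema

for etype, attrs in describe_schema(entities).items():
    print(etype, dict(attrs))
# customer {'name': 2, 'city': 1}
# order {'customer': 1, 'total': 1}
```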
11

Nguyen, Benjamin. "Privacy-Centric Data Management." Habilitation à diriger des recherches, Université de Versailles-Saint Quentin en Yvelines, 2013. http://tel.archives-ouvertes.fr/tel-00936130.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This document focuses on my core computer science research since 2010, covering the topic of data management and privacy. More specifically, I present the following topics: a new paradigm, called Trusted Cells, for privacy-centric personal data management based on the Asymmetric Architecture, composed of trusted or open (low-power) distributed hardware devices acting as personal data servers and a highly powerful, highly available supporting server, such as a cloud (Chapter 2); adapting aggregate data computation techniques to the Trusted Cells environment, with the example of Privacy-Preserving Data Publishing (Chapter 3); and minimizing the data that leaves a Trusted Cell, i.e. enforcing the general privacy principle of Limited Data Collection (Chapter 4). This document contains only results that have already been published. As such, rather than focus on the details and technicalities of each result, I have tried to provide an easy way to gain a global understanding of the context behind the work, explain the problem addressed, and give a summary of the main scientific results and impact.
12

Monk, Kitty A. "Data management in MARRS." Thesis, Kansas State University, 1986. http://hdl.handle.net/2097/9939.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Uichanco, Joline Ann Villaranda. "Data-driven revenue management." Thesis, Massachusetts Institute of Technology, 2007. http://hdl.handle.net/1721.1/41728.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Computation for Design and Optimization Program, 2007.
Includes bibliographical references (p. 125-127).
In this thesis, we consider the classical newsvendor model and various important extensions. We do not assume that the demand distribution is known; rather, the only information available is a set of independent samples drawn from the demand distribution. In particular, the variants of the model we consider are: the classical profit-maximization newsvendor model, the risk-averse newsvendor model, and the price-setting newsvendor model. If the explicit demand distribution is known, then the exact solutions to these models can be found either analytically or numerically via simulation methods. However, in most real-life settings, the demand distribution is not available, and usually there is only historical demand data from past periods. Thus, data-driven approaches are appealing in solving these problems. In this thesis, we evaluate the theoretical and empirical performance of nonparametric and parametric approaches for solving the variants of the newsvendor model assuming partial information on the distribution. For the classical profit-maximization newsvendor model and the risk-averse newsvendor model, we describe general non-parametric approaches that do not make any prior assumption on the true demand distribution. We extend and significantly improve previous theoretical bounds on the number of samples required to guarantee with high probability that the data-driven approach provides a near-optimal solution. By near-optimal we mean that the approximate solution performs arbitrarily close to the optimal solution that is computed with respect to the true demand distributions. For the price-setting newsvendor problem, we analyze a previously proposed simulation-based approach for a linear-additive demand model, and again derive bounds on the number of samples required to ensure that the simulation-based approach provides a near-optimal solution. We also perform computational experiments to analyze the empirical performance of these data-driven approaches.
by Joline Ann Villaranda Uichanco.
S.M.
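The core data-driven idea for the classical newsvendor model is easy to sketch: the optimal order quantity is the critical-ratio quantile of demand, estimated directly from samples. The costs and demand samples below are invented for illustration.

```python
import numpy as np

# Non-parametric (data-driven) newsvendor: with unit underage cost c_u and
# overage cost c_o, the optimal order quantity is the c_u / (c_u + c_o)
# quantile of demand, estimated here from samples alone.
rng = np.random.default_rng(0)
demand_samples = rng.gamma(shape=4.0, scale=25.0, size=200)  # assumed historical demand

c_u, c_o = 5.0, 2.0                       # lost margin vs. leftover cost (assumed)
critical_ratio = c_u / (c_u + c_o)
order_quantity = np.quantile(demand_samples, critical_ratio)

print(f"critical ratio = {critical_ratio:.3f}")
print(f"data-driven order quantity = {order_quantity:.1f}")
```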
14

Garling, James, and David Cahill. "ENTERPRISE DATA MANAGEMENT SYSTEMS." International Foundation for Telemetering, 2003. http://hdl.handle.net/10150/605813.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
International Telemetering Conference Proceedings / October 20-23, 2003 / Riviera Hotel and Convention Center, Las Vegas, Nevada
This paper discusses ongoing regulatory effects on efforts aimed at developing data infrastructures that assist test engineers in achieving information superiority and in maintaining their information, and on possible architectural frameworks for reconciling the engineers' needs with the regulatory requirements. Since current commercial-off-the-shelf (COTS) Enterprise Content Management (ECM) systems are targeted primarily at business environments such as back-office applications, the financial sector, and manufacturing, these COTS systems do not provide sufficient focus for managing the unique aspects of flight test data and associated artifacts (documents, drawings, pretest data, etc.). This paper presents our ongoing efforts to deploy a storage-infrastructure-independent enterprise data management system for maintaining vital up-to-date information and for managing the archival of such data.
15

Anumalla, Kalyani. "DATA PREPROCESSING MANAGEMENT SYSTEM." University of Akron / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=akron1196650015.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Domingues, Sérgio Rafael de Oliveira. "Market Data information management." Master's thesis, Universidade de Aveiro, 2009. http://hdl.handle.net/10773/1740.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Master's in Industrial Engineering and Management
This work is the outcome of an academic internship carried out at Bosch Termotecnologia, S.A. in Cacia over eight months. The work consisted in gathering information from various markets in order to organise it and make it available, initially internally and later to the firm's other subsidiaries. The report is presented in four chapters. The first chapter gives a theoretical framing of Information Systems (IS) in organisations; the second presents the company and its evolution; and the third describes the project to be implemented. Following these chapters, the conclusions and recommendations are presented in the fourth chapter.
17

Jäkel, Tobias. "Role-based Data Management." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-224416.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Database systems form an integral component of today's software systems, and as such they are the central point for storing and sharing a software system's data while ensuring global data consistency at the same time. Introducing the primitives of roles and the accompanying metatype distinction into modeling and programming languages results in a novel paradigm of designing, extending, and programming modern software systems. In detail, roles as a modeling concept enable a separation of concerns within an entity. Along with its rigid core, an entity may acquire various roles in different contexts during its lifetime and thus adapt its behavior and structure dynamically at runtime. Unfortunately, database systems, as an important component and the global consistency provider of such systems, do not keep pace with this trend. The absence of a metatype distinction, in terms of an entity's separation of concerns, in the database system results in various problems for the software system in general, for the application developers, and finally for the database system itself. In the case of relational database systems, these problems are concentrated under the term role-relational impedance mismatch. In particular, the whole software system is designed using different semantics on various layers. In the case of role-based software systems combined with relational database systems, this gap in semantics between applications and the database system increases dramatically. Consequently, the database system cannot directly represent the richer semantics of roles or the accompanying consistency constraints. These constraints have to be ensured by the applications, and the database system loses its single-point-of-truth characteristic in the software system. As the applications are in charge of guaranteeing global consistency, their development requires more effort in data management. Moreover, the software system's data management is distributed over several layers, which results in an unstructured software system architecture. To overcome the role-relational impedance mismatch and bring the database system back into its rightful position as the single point of truth in a software system, this thesis introduces the novel and tripartite RSQL approach. It combines a novel database model that represents the metatype distinction as a first-class citizen in a database system, an adapted query language based on that database model, and a proper result representation. More precisely, RSQL's logical database model introduces Dynamic Data Types to directly represent the separation of concerns within an entity type on the schema level. On the instance level, the database model defines the notion of a Dynamic Tuple that combines an entity with the notion of roles and thus allows for dynamic structure adaptation at runtime without changing an entity's overall type. These definitions form the main data structures on which the database system operates. Moreover, formal operators connecting the query language statements with the database model's data structures complete the database model. The query language, as the external database system interface, features its own data definition, data manipulation, and data query language. Their statements directly represent the metatype distinction to address Dynamic Data Types and Dynamic Tuples, respectively. As a consequence of the novel data structures, the query processing of Dynamic Tuples is completely redesigned. As the last piece of a complete database integration of the role-based notion and its accompanying metatype distinction, we specify the RSQL Result Net as the result representation. It provides a novel result structure and features functionalities to navigate through query results. Finally, we evaluate all three RSQL components in comparison to a relational database system. This assessment clearly demonstrates the benefits of fully integrating the roles concept into the database.
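The role concept itself can be sketched in a few lines; this is a hypothetical illustration of roles and dynamic structure adaptation, not RSQL's data model or query language.

```python
# An entity keeps a rigid core and acquires or drops roles (with their own
# attributes) at runtime without changing its overall type. Names invented.
class Entity:
    def __init__(self, core_type, **core_attrs):
        self.core_type = core_type
        self.core = core_attrs
        self.roles = {}                      # role name -> role attributes

    def acquire(self, role_name, **attrs):
        self.roles[role_name] = attrs        # dynamic structure adaptation

    def relinquish(self, role_name):
        self.roles.pop(role_name, None)

    def __repr__(self):
        return f"{self.core_type}({self.core}, roles={self.roles})"

person = Entity("Person", name="Tobias")
person.acquire("Student", university="TU Dresden", matric_no="123")
person.acquire("Employee", employer="Database Group")
person.relinquish("Student")                 # context changed at runtime
print(person)
```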
18

Öhman, Mikael. "a Data-Warehouse Solution for OMS Data Management." Thesis, Umeå universitet, Institutionen för datavetenskap, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-80688.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
A database system for storing and querying data with a dynamic schema has been developed, based on the kdb+ database management system and the q programming language, for use in a financial setting of order and execution services. Some basic assumptions are made about mandatory fields of the data to be stored, including that the data are time-series based. A dynamic schema enables an Order Management System (OMS) to store information not suitable or usable when stored in log files or traditional databases. Log files are linear, cannot be queried effectively, and are not suitable for the volumes produced by modern OMSs. Traditional databases are typically row-oriented, which does not suit time-series-based data, and rely on the relational model, which uses statically typed sets to store relations. The created system includes software that is capable of mining the actual schema stored in the database and visualizing it. This eases exploratory querying and the production of applications that use the database. A feedhandler optimized for handling high volumes of data has been created. Volumes in finance are growing steadily as the industry continues to adopt computer automation of tasks. Feedhandler performance is important for reducing latency and for cost savings that result from not having to scale horizontally. A study of the area of algorithmic trading has been performed, with a focus on transaction-cost analysis, and fundamental algorithms have been reviewed. A proof-of-concept application has been created that simulates an OMS storing logs on the execution of a Volume Weighted Average Price (VWAP) trading algorithm. The stored logs are then used to improve the performance of the trading algorithm through basic data mining and machine learning techniques. The learning algorithm focuses on predicting intraday volume patterns.
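As a small illustration of the kind of derived analytics the stored logs enable, the following sketch computes a VWAP and an hourly volume profile from invented execution records; it is not the kdb+/q implementation.

```python
from collections import defaultdict

# Execution logs feed simple derived measures: the realized VWAP and an
# intraday volume profile that a VWAP algorithm could learn from.
trades = [
    {"time": "09:01", "price": 100.0, "qty": 300},
    {"time": "09:45", "price": 100.5, "qty": 200},
    {"time": "10:10", "price": 101.0, "qty": 500},
]

def vwap(executions):
    notional = sum(t["price"] * t["qty"] for t in executions)
    volume = sum(t["qty"] for t in executions)
    return notional / volume

def hourly_volume_profile(executions):
    """Fraction of total volume traded per hour bucket."""
    buckets = defaultdict(int)
    for t in executions:
        buckets[t["time"][:2]] += t["qty"]
    total = sum(buckets.values())
    return {hour: qty / total for hour, qty in sorted(buckets.items())}

print(f"VWAP = {vwap(trades):.3f}")
print("volume profile:", hourly_volume_profile(trades))
```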
19

Wang, Yi. "Data Management and Data Processing Support on Array-Based Scientific Data." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1436157356.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Parviainen, A. (Antti). "Product portfolio management requirements for product data management." Master's thesis, University of Oulu, 2014. http://urn.fi/URN:NBN:fi:oulu-201409021800.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
In large organisations today the number of products is large, and it is challenging for senior management to maintain proper control and understanding of all of them. As the product is the most important aspect for an organisation to consider, senior management must be able to manage investments in products and follow the development of product-related indicators. Managing products as investments at the portfolio level, where products are divided into a limited number of portfolios, is one way to achieve adequate control over product investments at the senior management level. Product portfolio management is decision-making oriented: the goal is to make the best possible strategic and financial decisions when allocating constrained resources across the entire product portfolio. Product portfolio management aims to increase the strategic fit of chosen new product projects, to balance the product portfolio, and to maximize the value of the products. It is a constantly ongoing, cross-functional decision-making function that is present in all lifecycle states of the portfolios. In this research the product portfolios are seen as investments, mainly for the internal use of a decision-making process. The product portfolios are items that are embodied in the case company's product data management system, and the product portfolios have their own lifecycle states. The approach in this research is constructive: the current state of the case company is analysed, and based on that analysis and a literature review a construction is established. The research questions are: 1) What product structures are required in product data management systems to support product portfolio management practices? 2) What are the information elements and their lifecycle states, and what should they be in product data management systems to support product portfolio decisions? The results of this research are the current state analysis conducted in the case company and the construction of a product portfolio management structure and lifecycle states. In the construction a portfolio package is defined; the portfolio package is the item used for embodying portfolios in the information systems. An information model for implementing the portfolio packages in the product data management system is introduced. The construction also presents a product structure for implementing the portfolio package in the product data management system. The relation of lifecycle states between the portfolio package and other items in a product hierarchy is assessed in a nested lifecycle model. Two models, required and recommended, are suggested for the company to consider for managing the lifecycle of the portfolio package item. All the results are validated from several perspectives.
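The nested lifecycle idea can be sketched as a simple rule relating a portfolio package's state to the states of its members; the state names and the rule below are illustrative assumptions, not the case company's model.

```python
# A portfolio package item may only advance to a lifecycle state that all of
# its member products have already reached (hypothetical state names).
ORDER = ["in_design", "active", "ramp_down", "obsolete"]

def can_transition(package_state, member_states, target):
    """Allow the package to move forward only as far as its slowest member."""
    slowest = min(ORDER.index(s) for s in member_states)
    return ORDER.index(package_state) <= ORDER.index(target) <= slowest

members = ["active", "ramp_down", "active"]
print(can_transition("active", members, "ramp_down"))  # False: a member is still only 'active'
print(can_transition("active", members, "active"))     # True
```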
21

Donohue, Christine M., Gregory A. Hayes, Daniel R. Dolk, and Tung X. Bui. "Data management: implementation and lessons learned from Department of the Army data management program." Monterey, Calif.: Naval Postgraduate School; Springfield, Va.: Available from the National Technical Information Service, 1992. http://handle.dtic.mil/100.2/ADA257858.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Donohue, Christine M. "Data management : implementation and lessons learned from Department of the Army data management program." Thesis, Monterey, California. Naval Postgraduate School, 1992. http://hdl.handle.net/10945/30610.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Approved for public release; distribution is unlimited.
Information systems executives within Department of Defense (DoD) activities are being challenged to develop innovative ways in which information technology can contribute to the streamlining of DoD organizations. A key step in developing information systems that will meet the future needs of DoD organizations is to manage the data resource. This thesis examines the concepts, implementation strategies, and issues relating to data management and illustrates, using a case study of the Department of the Army data management methodology, the critical success factors required to implement data management programs throughout the DoD. Keywords: data management, data standardization, information resource management.
23

Saravanan, Mahesh. "Expressions as Data in Relational Data Base Management Systems." ScholarWorks@UNO, 2006. http://scholarworks.uno.edu/td/500.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Numerous applications, such as publish/subscribe, website personalization, and applications involving continuous queries, require that users' interests be persistently maintained and matched with the expected data. Conditional expressions can be used to maintain user interests. This thesis focuses on support for an expression data type in relational database systems, allowing conditional expressions to be stored as "data" in columns of database tables and evaluated using an EVALUATE operator. In this context, expressions can be interpreted as descriptions, queries, and filters, which significantly broadens the use of a relational database system to support new types of applications. The thesis presents an overview of the expression data type, the storage and evaluation of stored expressions, and shows how these applications can be easily supported with improved functionality. A sample application is also explained in order to show the importance of expressions in an application context, with a comparison of the application with and without expressions.
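A minimal sketch of the expressions-as-data idea is shown below: stored conditions are evaluated against an incoming item, in the spirit of an EVALUATE operator. The table layout and operators are assumptions for illustration, not the thesis's SQL syntax.

```python
import operator

# User interests are stored as rows of simple conjunctive conditions and
# matched against incoming items, mimicking publish/subscribe filtering.
OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}

# Each stored row holds a subscriber and a conjunctive expression.
stored_expressions = [
    {"subscriber": "alice", "expr": [("category", "=", "car"), ("price", "<", 15000)]},
    {"subscriber": "bob",   "expr": [("category", "=", "car"), ("year", ">", 2015)]},
]

def evaluate(expression, item):
    """Return True if every condition in the stored expression holds for the item."""
    return all(OPS[op](item.get(column), value) for column, op, value in expression)

incoming = {"category": "car", "price": 12000, "year": 2010}
matches = [row["subscriber"] for row in stored_expressions if evaluate(row["expr"], incoming)]
print(matches)   # ['alice']
```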
24

Melander, Lars. "Integrating Visual Data Flow Programming with Data Stream Management." Doctoral thesis, Uppsala universitet, Datalogi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-286536.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Data stream management and data flow programming have many things in common. In both cases one wants to transfer possibly infinite sequences of data items from one place to another, while performing transformations to the data. This Thesis focuses on the integration of a visual programming language with a data stream management system (DSMS) to support the construction, configuration, and visualization of data stream applications. In the approach, analyses of data streams are expressed as continuous queries (CQs) that emit data in real-time. The LabVIEW visual programming platform has been adapted to support easy specification of continuous visualization of CQ results. LabVIEW has been integrated with the DSMS SVALI through a stream-oriented client-server API. Query programming is declarative, and it is desirable to make the stream visualization declarative as well, in order to raise the abstraction level and make programming more intuitive. This has been achieved by adding a set of visual data flow components (VDFCs) to LabVIEW, based on the LabVIEW actor framework. With actor-based data flows, visualization of data stream output becomes more manageable, avoiding the procedural control structures used in conventional LabVIEW programming while still utilizing the comprehensive, built-in LabVIEW visualization tools. The VDFCs are part of the Visual Data stream Monitor (VisDM), which is a client-server based platform for handling real-time data stream applications and visualizing stream output. VDFCs are based on a data flow framework that is constructed from the actor framework, and are divided into producers, operators, consumers, and controls. They allow a user to set up the interface environment, customize the visualization, and convert the streaming data to a format suitable for visualization. Furthermore, it is shown how LabVIEW can be used to graphically define interfaces to data streams and dynamically load them in SVALI through a general wrapper handler. As an illustration, an interface has been defined in LabVIEW for accessing data streams from a digital 3D antenna. VisDM has successfully been tested in two real-world applications, one at Sandvik Coromant and one at the Ångström Laboratory, Uppsala University. For the first case, VisDM was deployed as a portable system to provide direct visualization of machining data streams. The data streams can differ in many ways as do the various visualization tasks. For the second case, data streams are homogenous, high-rate, and query operations are much more computation-demanding. For both applications, data is visualized in real-time, and VisDM is capable of sufficiently high update frequencies for processing and visualizing the streaming data without obstructions. The uniqueness of VisDM is the combination of a powerful and versatile DSMS with visually programmed and completely customizable visualization, while maintaining the complete extensibility of both.
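The producer-operator-consumer data flow can be sketched as follows; the stream, operator, and visualization callback are invented stand-ins, not VisDM or LabVIEW components.

```python
import random
import time

# A continuous query consumes a stream, applies an operator, and pushes
# results to a visualization consumer in real time (the callback stands in
# for a front-panel display).
def sensor_stream(n=10):
    for _ in range(n):
        yield {"t": time.time(), "value": random.gauss(20.0, 2.0)}
        time.sleep(0.01)

def moving_average(stream, window=3):
    buf = []
    for item in stream:
        buf = (buf + [item["value"]])[-window:]
        yield {"t": item["t"], "avg": sum(buf) / len(buf)}

def visualize(result):             # consumer / visualization stand-in
    print(f"{result['t']:.2f}  avg={result['avg']:.2f}")

for out in moving_average(sensor_stream()):
    visualize(out)
```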
25

Tatarinov, Igor. "Semantic data sharing with a peer data management system." Thesis, Connect to this title online; UW restricted, 2004. http://hdl.handle.net/1773/6942.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Ofori-Duodu, Michael Samuel. "Exploring Data Security Management Strategies for Preventing Data Breaches." ScholarWorks, 2019. https://scholarworks.waldenu.edu/dissertations/7947.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Insider threat continues to pose a risk to organizations, and in some cases, the country at large. Data breach events continue to show the insider threat risk has not subsided. This qualitative case study sought to explore the data security management strategies used by database and system administrators to prevent data breaches by malicious insiders. The study population consisted of database administrators and system administrators from a government contracting agency in the northeastern region of the United States. The general systems theory, developed by Von Bertalanffy, was used as the conceptual framework for the research study. The data collection process involved interviewing database and system administrators (n = 8), organizational documents and processes (n = 6), and direct observation of a training meeting (n = 3). By using methodological triangulation and by member checking with interviews and direct observation, efforts were taken to enhance the validity of the findings of this study. Through thematic analysis, 4 major themes emerged from the study: enforcement of organizational security policy through training, use of multifaceted identity and access management techniques, use of security frameworks, and use of strong technical control operations mechanisms. The findings of this study may benefit database and system administrators by enhancing their data security management strategies to prevent data breaches by malicious insiders. Enhanced data security management strategies may contribute to social change by protecting organizational and customer data from malicious insiders that could potentially lead to espionage, identity theft, trade secrets exposure, and cyber extortion.
27

Tudoran, Radu-Marius. "High-Performance Big Data Management Across Cloud Data Centers." Electronic Thesis or Diss., Rennes, École normale supérieure, 2014. http://www.theses.fr/2014ENSR0004.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The easily accessible computing power offered by cloud infrastructures, coupled with the "Big Data" revolution, is increasing the scale and speed at which data analysis is performed. Cloud computing resources for compute and storage are spread across multiple data centers around the world. Enabling fast data transfers becomes especially important in scientific applications where moving the processing close to the data is expensive or even impossible. The main objectives of this thesis are to analyze how clouds can become "Big Data friendly" and to determine the best options for providing data management services able to meet the needs of applications. In this thesis, we present our contributions to improving the performance of data management for applications running on several geographically distributed data centers. We start with aspects concerning the scale of data processing on a single site, and continue with the development of MapReduce-type solutions allowing the distribution of computation between several centers. We then present a transfer-service architecture that optimizes the cost-performance ratio of transfers. This service is operated in the context of real-time data streaming between cloud data centers. Finally, we study the viability, for a cloud provider, of the solution consisting in integrating this architecture as a service based on a flexible pricing paradigm, referred to as "Transfer-as-a-Service".
28

Heerde, Harold Johann Wilhelm van. "Privacy-aware data management by means of data degradation." Versailles-St Quentin en Yvelines, 2010. http://www.theses.fr/2010VERS0031.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Les fournisseurs de services recueillent de plus en plus d'informations personnelles sensibles, bien qu’il soit réputé comme très difficile de protéger efficacement ces informations contre le pira-tage, la fuite d’information par négligence, le contournement de chartes de confidentialité peu précises, et les usages abusifs d’administrateurs de données peu scrupuleux. Dans cette thèse, nous conjecturons qu’une rétention sans limite de données sensibles dans une base de données mènera inévitablement à une divulgation non autorisée de ces données. Limiter dans le temps la rétention d'informations sensibles réduit la quantité de données emmagasinées et donc l'impact d'une telle divulgation. La première contribution de cette thèse porte sur la proposition d’un mo-dèle particulier de rétention basé sur une dégradation progressive et irréversible de données sensibles. Effacer les données d'une base de données est une tâche difficile à mettre en œuvre techniquement; la dégradation de données a en effet un impact sur les structures de stockage, l'indexation, la gestion de transactions et les mécanismes de journalisation. Pour permettre une dégradation irréversible des données, nous proposons plusieurs techniques telles que le stockage des don-nées ordonnées par le temps de dégradation et l'utilisation de techniques ad-hoc de chiffrement. Les techniques proposées sont validées par une analyse théorique ainsi que par l’implémentation d’un prototype
Service providers collect more and more privacy-sensitive information, even though it is hard to protect this information against hackers, abuse of weak privacy policies, negligence, and malicious database administrators. In this thesis, we take the position that endless retention of privacy-sensitive information will inevitably lead to unauthorized data disclosure. Limiting the retention of privacy-sensitive information limits the amount of stored data and therefore the impact of such a disclosure. Removing data from a database system is not a straightforward task; data degradation has an impact on the storage structures, indexing, transaction management, and logging mechanisms. To show the feasibility of data degradation, we provide several techniques to implement it, mainly a combination of keeping data sorted on degradation time and using encryption techniques where possible. The techniques are validated by a prototype implementation and a theoretical analysis.
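As a hedged sketch of the degradation idea named in the abstract (not the thesis's actual implementation; the class name and API are hypothetical, and the `cryptography` package is assumed to be installed), the snippet below keeps records ordered by degradation time and encrypts each record with its own key, so that degrading a record simply means discarding its key.

```python
# Hedged sketch: progressive, irreversible data degradation via key disposal.
# Records are kept ordered by degradation time; deleting a record's key makes
# its ciphertext unreadable without touching the storage layout.
import heapq
import time
from cryptography.fernet import Fernet   # third-party package, assumed installed

class DegradationStore:                   # hypothetical name
    def __init__(self):
        self._heap = []                   # (degrade_at, record_id), ordered by time
        self._ciphertexts = {}            # record_id -> encrypted payload
        self._keys = {}                   # record_id -> key (deleted on degradation)

    def insert(self, record_id, payload: bytes, degrade_at: float):
        key = Fernet.generate_key()
        self._keys[record_id] = key
        self._ciphertexts[record_id] = Fernet(key).encrypt(payload)
        heapq.heappush(self._heap, (degrade_at, record_id))

    def read(self, record_id):
        key = self._keys.get(record_id)
        if key is None:                   # already degraded
            return None
        return Fernet(key).decrypt(self._ciphertexts[record_id])

    def degrade_expired(self, now=None):
        now = time.time() if now is None else now
        while self._heap and self._heap[0][0] <= now:
            _, record_id = heapq.heappop(self._heap)
            self._keys.pop(record_id, None)   # irreversible: ciphertext stays, key is gone
```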
29

Lee, Yong Woo. "Data aggregation for capacity management." Thesis, [College Station, Tex.] : Texas A&M University, 2003. http://hdl.handle.net/1969.1/90.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Thesis (M.S.)--Texas A&M University, 2003.
"Major Subject: Industrial Engineering" Title from author supplied metadata (automated record created on Jul. 18, 2005.) Vita. Abstract. Includes bibliographical references.
30

Herrmann, Kai. "Multi-Schema-Version Data Management." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-231946.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Modern agile software development methods allow software systems to evolve continuously: new features are added, bugs are fixed, and the software is adapted to changing requirements and conditions while it is in continuous use. A major obstacle to this agile evolution is the underlying database that persists the software system's data from day one on. Evolving the database schema requires evolving the existing data accordingly, and the currently established solutions for this are expensive, error-prone, and far from agile. In this thesis, we present InVerDa, a multi-schema-version database system that facilitates agile database development. Multi-schema-version database systems provide multiple schema versions within the same database, where each schema version itself behaves like a regular single-schema database. Creating new schema versions is simple, which provides the desired agility for database development. All created schema versions can co-exist, and write operations are immediately propagated between schema versions with a best-effort strategy. Developers do not have to implement the propagation logic of data accesses between schema versions by hand; InVerDa generates it automatically. To facilitate multi-schema-version database systems, we equip developers with a relationally complete and bidirectional database evolution language (BiDEL) that lets them easily evolve existing schema versions into new ones. BiDEL expresses the evolution of both the schema and the data, forwards and backwards, in intuitive and consistent operations; BiDEL evolution scripts are orders of magnitude shorter than implementing the same behavior in standard SQL and are less likely to be erroneous, since they describe a developer's intention of the evolution exclusively at the level of tables, without further technical details. Having the developers' intentions explicitly captured in BiDEL scripts further allows a new schema version to be created by merging already existing ones. Having multiple co-existing schema versions in one database raises the need for a sophisticated physical materialization. Multi-schema-version database systems provide full data independence, so the database administrator can choose any feasible materialization while the system internally ensures that no data is lost. The search space of possible materializations can grow exponentially with the number of schema versions. Therefore, we present an adviser that relieves the database administrator from diving into the complex performance characteristics of multi-schema-version database systems and simply proposes an optimized materialization for a given workload within seconds. Optimized materializations have been shown to improve the performance for a given workload by orders of magnitude. We formally guarantee data independence for multi-schema-version database systems: we show that every single schema version behaves like a regular single-schema database independent of the chosen physical materialization. This important guarantee allows the database to be easily evolved and accessed in agile software development, while all the important features of relational databases, such as transaction guarantees, are preserved. To the best of our knowledge, we are the first to realize such a multi-schema-version database system, allowing agile evolution of production databases with full support of co-existing schema versions and formally guaranteed data independence.
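BiDEL's actual syntax is not reproduced here; purely as a generic, hypothetical illustration of the bidirectional-evolution idea behind it, the sketch below pairs a forward and a backward mapping for a single "split column" evolution step, so data written in either schema version can be propagated to the other.

```python
# Hypothetical illustration of bidirectional schema evolution (not BiDEL syntax):
# one evolution step is described by a forward and a backward mapping, so writes
# made against either schema version remain visible in the other.

def split_name_forward(row_v1: dict) -> dict:
    """v1 {'name': 'Ada Lovelace'} -> v2 {'first': 'Ada', 'last': 'Lovelace'}"""
    first, _, last = row_v1["name"].partition(" ")
    return {"first": first, "last": last}

def split_name_backward(row_v2: dict) -> dict:
    """v2 -> v1: writes against the new schema version stay visible in the old one."""
    return {"name": f"{row_v2['first']} {row_v2['last']}".strip()}

v1_row = {"name": "Ada Lovelace"}
v2_row = split_name_forward(v1_row)
assert split_name_backward(v2_row) == v1_row   # round-trip: no information lost
```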
31

Matus, Castillejos Abel, and n/a. "Management of Time Series Data." University of Canberra. Information Sciences & Engineering, 2006. http://erl.canberra.edu.au./public/adt-AUC20070111.095300.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Every day large volumes of data are collected in the form of time series. Time series are collections of events or observations, predominantly numeric in nature, sequentially recorded on a regular or irregular time basis. Time series are becoming increasingly important in nearly every organisation and industry, including banking, finance, telecommunication, and transportation. Banking institutions, for instance, rely on the analysis of time series for forecasting economic indices, elaborating financial market models, and registering international trade operations. More and more time series are being used in this type of investigation and are becoming a valuable resource in today's organisations. This thesis investigates and proposes solutions to some current and important issues in time series data management (TSDM), using Design Science Research Methodology. The thesis presents new models for mapping time series data to relational databases which optimise the use of disk space, can handle different time granularities and status attributes, and facilitate time series data manipulation in a commercial Relational Database Management System (RDBMS). These new models provide a good solution for current time series database applications with an RDBMS and are tested with a case study and prototype using financial time series information. Also included is a temporal data model for illustrating time series data lifetime behaviour based on a new set of time dimensions (confidentiality, definitiveness, validity, and maturity times) specially targeted at managing time series data, introduced to correctly represent the different statuses of time series data on a timeline. The proposed temporal data model gives a clear and accurate picture of the time series data lifecycle. Formal definitions of these time series dimensions are also presented. In addition, a time series grouping mechanism in an extensible commercial relational database system is defined, illustrated, and justified. The extension consists of a new data type and its corresponding rich set of routines that support modelling and operating on time series information at a higher level of abstraction. It extends the capability of the database server to organise and manipulate time series into groups. Thus, this thesis presents a new data type, referred to as GroupTimeSeries, and its corresponding architecture and support functions and operations. Implementation options for the GroupTimeSeries data type in relational-based technologies are also presented. Finally, a framework for TSDM expressive enough to capture the main requirements of time series applications and the management of their data is defined. The framework aims at providing initial domain know-how and requirements for time series data management, avoiding the impracticality of designing a TSDM system on paper from scratch. Many aspects of time series applications, including the way time series data are organised at the conceptual level, are addressed. The central abstractions of the proposed domain-specific framework are the notions of business sections, groups of time series, and the time series itself. The framework integrates comprehensive specifications regarding structural and functional aspects of time series data management. A formal framework specification using conceptual graphs is also explored.
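As a rough, hypothetical illustration of mapping grouped time series onto a relational schema (the thesis's actual models, time dimensions, and the GroupTimeSeries type are not reproduced; table and column names are invented), the sketch below stores series groups, series, and irregularly timestamped observations with a status attribute in SQLite.

```python
# Hypothetical relational mapping for grouped time series (illustrative only).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE series_group (
    group_id    INTEGER PRIMARY KEY,
    name        TEXT NOT NULL               -- e.g. a business section
);
CREATE TABLE series (
    series_id   INTEGER PRIMARY KEY,
    group_id    INTEGER NOT NULL REFERENCES series_group(group_id),
    name        TEXT NOT NULL,
    granularity TEXT NOT NULL               -- 'daily', 'irregular', ...
);
CREATE TABLE observation (
    series_id   INTEGER NOT NULL REFERENCES series(series_id),
    ts          TEXT NOT NULL,              -- observation timestamp
    value       REAL,
    status      TEXT DEFAULT 'valid',       -- status attribute per observation
    PRIMARY KEY (series_id, ts)
);
""")
conn.execute("INSERT INTO series_group VALUES (1, 'FX rates')")
conn.execute("INSERT INTO series VALUES (1, 1, 'EUR/USD close', 'daily')")
conn.executemany("INSERT INTO observation VALUES (1, ?, ?, 'valid')",
                 [("2024-01-02", 1.094), ("2024-01-03", 1.092)])

# Aggregate a whole group of series with one join:
for row in conn.execute("""
    SELECT s.name, MIN(o.value), MAX(o.value)
    FROM observation o JOIN series s USING (series_id)
    JOIN series_group g USING (group_id)
    WHERE g.name = 'FX rates' GROUP BY s.series_id"""):
    print(row)
```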
32

Deshmukh, Pritam. "Data uncertainity in bridge management." Diss., Columbia, Mo. : University of Missouri-Columbia, 2006. http://hdl.handle.net/10355/4510.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Thesis (M.S.) University of Missouri-Columbia, 2006.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file viewed on May 20, 2007. Vita. Includes bibliographical references.
33

Wang, Yanchao. "Protein Structure Data Management System." Digital Archive @ GSU, 2007. http://digitalarchive.gsu.edu/cs_diss/20.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
With advancements in laboratory instruments and experimental techniques, protein data are growing at an explosive rate. How to efficiently store, retrieve, and modify protein data is therefore becoming a challenging issue that most biological scientists have to face and solve. Traditional data models such as relational databases lack support for complex data types, which is a big issue for protein data applications. Hence many scientists switch to object-oriented databases, since the object-oriented nature of life science data perfectly matches the architecture of object-oriented databases, but many problems still need to be solved in order to apply OODB methodologies to protein data management. One major problem is that general-purpose OODBs have neither built-in data types for biological research nor built-in biological domain-specific functional operations. In this dissertation, we present an application system with built-in data types and built-in biological domain-specific functional operations that extends an Object-Oriented Database (OODB) system by adding the domain-specific layers Protein-QL, Protein Algebra Architecture, and Protein-OODB above the OODB to manage protein structure data. This system is composed of three parts: 1) a Client API that provides easy usage for different users; 2) Middleware, including Protein-QL, the Protein Algebra Architecture, and Protein-OODB, designed to implement a protein domain-specific query language and optimize complex queries, while encapsulating the details of the implementation so that users can easily understand and master Protein-QL; 3) Data Storage, used to store the protein data. The system is designed for the protein domain, but it can easily be extended to other biological domains to build a bio-OODBMS. In this system, protein, primary, secondary, and tertiary structures are defined as internal data types to simplify queries in Protein-QL, so that domain scientists can easily master the query language and formulate data requests, and EyeDB is used as the underlying OODB to communicate with Protein-OODB. In addition, protein data are usually stored in PDB format, which is old, ambiguous, and inadequate; therefore, PDB data curation is discussed in detail in the dissertation.
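Protein-QL and the Protein Algebra layer are not reproduced here; as a loose, hypothetical illustration of the idea of domain-specific built-in types, the sketch below defines protein structure levels as first-class Python types with one domain-level query helper (identifiers and the example sequence are invented).

```python
# Hypothetical sketch: protein structure levels as built-in domain types.
from dataclasses import dataclass, field

@dataclass
class SecondaryElement:
    kind: str          # 'helix', 'sheet', 'coil'
    start: int         # residue indices into the primary sequence
    end: int

@dataclass
class Protein:
    pdb_id: str
    primary: str                                   # amino-acid sequence
    secondary: list = field(default_factory=list)  # list of SecondaryElement
    tertiary: list = field(default_factory=list)   # list of (x, y, z) CA coordinates

def helices(protein: Protein):
    """Domain-level query: return the sub-sequences covered by helices."""
    return [protein.primary[e.start:e.end] for e in protein.secondary if e.kind == "helix"]

p = Protein("1ABC", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
            [SecondaryElement("helix", 3, 12)])
print(helices(p))   # ['AYIAKQRQI']
```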
34

Chiu, Chao-Ying. "Visualization of construction management data." Thesis, University of British Columbia, 2011. http://hdl.handle.net/2429/37903.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
To date, the research and development effort as reported in the literature for presenting input/output data in support of human judgment for conducting construction management (CM) functions and associated tasks has been relatively limited. In practice, CM practitioners often find it difficult to digest and interpret input/output information because of the sheer volume and high dimensionality of data. One way to address this need is to improve the data reporting capability of a construction management information system, which traditionally focuses mainly on using tabular/textual reports. Data visualization is a promising technology to enhance current reporting by creating a CM data visualization environment integrated within a CM information system. Findings from a literature review combined with a deep understanding of the CM domain were used to identify design guidelines for CM data visualization. A top-down design approach was utilized to analyze general requirements of a CM data visualization environment (e.g. common visualization features) that effect visual CM analytics for a broad range of CM functions/tasks. A bottom-up design process integrated with design guidelines and the top-down design process was then employed to implement individual visualizations in support of specific CM analytics and to acquire lessons learned for enriching the design guidelines and common visualization features. Taken together, these three components provide a potent approach for developing a data visualization tool tailored to supporting CM analytics. A research prototype CM data visualization environment that has an organization of thematic visualizations categorized by construction conditions and performance measures under multiple views of a project was created. Features of images generated from the foregoing visualizations can be characterized by different themes, types, contents, and/or formats. The visualization environment provides interaction features for changing/setting options that characterize images and enhancing readability of images as well as a mechanism for coordinating interaction features to increase efficiency of use. Case studies conducted using this environment provide the means for comparing its use with current (traditional) data reporting for CM functions related to time, quality, and change management. It is demonstrated that visual analytics enhances CM analytics capabilities applicable to a broad range of CM functions/tasks.
35

Bansal, Dheeraj Kumar. "Non-identifying Data Management Systems." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-172351.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
In this thesis we performed an initial analysis of a single business process from administrative data management, with respect to identifying the need to bind authenticated identities to actions at the different steps of the process. Based on this analysis we proposed a new model for the business process. We evaluated our model with different evaluation criteria, set during the initial phase. Based on discussion with the stakeholders, we arrived at the conclusion that even though our proposed system solves a lot of privacy-related problems for the stakeholders, in the case of business processes it is not easy to change the legacy systems. We also found an interesting set of problems that can arise with such systems.
36

Harley, Samuel, Michael Reil, Thea Blunt-Henderson, and George Bartlett. "Data, Information, and Knowledge Management." International Foundation for Telemetering, 2005. http://hdl.handle.net/10150/604784.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
ITC/USA 2005 Conference Proceedings / The Forty-First Annual International Telemetering Conference and Technical Exhibition / October 24-27, 2005 / Riviera Hotel & Convention Center, Las Vegas, Nevada
The Aberdeen Test Center Versatile Information System – Integrated, ONline (VISION) project has developed and deployed a telemetry capability based upon modular instrumentation, seamless communications, and the VISION Digital Library. Each of the three key elements of VISION contributes to a holistic solution to the data collection, distribution, and management requirements of Test and Evaluation. This paper provides an overview of VISION instrumentation, communications, and overall data management technologies, with a focus on engineering performance data.
37

Okkonen, O. (Olli). "RESTful clinical data management system." Master's thesis, University of Oulu, 2015. http://urn.fi/URN:NBN:fi:oulu-201505291735.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
In the era of digitalization, clinical trials have often been left behind in adopting the automation and cost-efficiency offered by computerized systems. Poor implementations, lack of technical experience, and inertia caused by overlapping old and new procedures have failed to prove the business value of data management systems. This has led to settling for inadequate data management tools, leaving many studies struggling with traditional approaches involving heavy paper usage, which further complicates management and drastically slows preparations for the final analysis. This Master's Thesis presents Genesis, a web-based clinical data management system developed for the LIRA study, which will take place in Finland and Sweden. Genesis has been developed to address the aforementioned obstacles to adopting information technology solutions in an agile manner, with security concerns integrated from the start. Furthermore, Genesis has been designed to offer long-term value through reusability, in terms of effortless portability to upcoming studies and interconnectability with web-enabled legacy systems and handheld devices via a uniform interface. In addition to presenting the design, implementation, and evaluation of Genesis, the future prospects of Genesis are discussed, noting the preliminary interest in utilizing Genesis in additional studies, including the world's largest type-1 diabetes study.
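Genesis's actual API is not documented here; purely as a generic, hedged illustration of a uniform web interface for clinical records (endpoint paths and field names are assumptions), a minimal Flask sketch might look like this.

```python
# Generic, hypothetical REST sketch for clinical records (not Genesis's actual API).
from flask import Flask, abort, jsonify, request

app = Flask(__name__)
records = {}          # in-memory store standing in for the study database
next_id = 1

@app.post("/subjects/<subject_id>/visits")
def add_visit(subject_id):
    global next_id
    payload = request.get_json(force=True)
    if "visit_date" not in payload:          # minimal validation
        abort(400, "visit_date is required")
    record = {"id": next_id, "subject": subject_id, **payload}
    records[next_id] = record
    next_id += 1
    return jsonify(record), 201

@app.get("/subjects/<subject_id>/visits")
def list_visits(subject_id):
    return jsonify([r for r in records.values() if r["subject"] == subject_id])

if __name__ == "__main__":
    app.run(debug=True)
```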
38

Owen, J. "Data management in engineering design." Thesis, University of Southampton, 2015. https://eprints.soton.ac.uk/385838/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Engineering design involves the production of large volumes of data. These data are a sophisticated mix of high-performance computational and experimental results, and must be managed, shared, and distributed across worldwide networks. Given limited storage and networking bandwidth, but rapidly growing rates of data production, effective data management is becoming increasingly critical. Within the context of Airbus, a leading aerospace engineering company, this thesis bridges the gap between academia and industry in the management of engineering data. It explores the high performance computing (HPC) environment used in aerospace engineering design, about which little was previously known, and applies the findings to the specific problem of file system cleaning. The properties of Airbus HPC file systems show many similarities with other environments, such as workstations and academic or public HPC file systems, but there are also some notably unique characteristics. In this research study it was found that Airbus file system volumes exhibit a greater disk usage by a smaller proportion of files than any other case, and a single file type accounts for 65% of the disk space but less than 1% of the files. The characteristics and retention requirements of this file type formed the basis of a new cleaning tool, cognizant of these properties, which we have researched and deployed within Airbus; it yielded disk space savings of 21.1 TB (15.2%) and 37.5 TB (28.2%) over two cleaning studies, and may be able to extend the life of existing storage systems by up to 5.5 years. It was also noted that the financial value of the savings already made exceeds the cost of this entire research programme. Furthermore, log files contain information about these key files, and further analysis reveals that direct associations can be made to infer valuable additional metadata about such files. These additional metadata were shown to be available for a significant proportion of the data, and could be used to improve the effectiveness and efficiency of future data management methods even further.
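As a hedged sketch of the kind of type-aware cleaning described above (the paths, extensions, and retention threshold are invented, not Airbus's), the snippet below measures how much disk space each file extension accounts for and flags files of a chosen type that have outlived a retention period.

```python
# Hypothetical sketch: per-extension disk usage and retention-based cleaning candidates.
import os
import time
from collections import defaultdict

def usage_by_extension(root):
    usage = defaultdict(int)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                usage[os.path.splitext(name)[1].lower()] += os.path.getsize(path)
            except OSError:
                pass                      # file vanished or is unreadable; skip it
    return dict(sorted(usage.items(), key=lambda kv: kv[1], reverse=True))

def cleaning_candidates(root, extension, retention_days):
    """Files of the chosen type that are older than the retention period."""
    cutoff = time.time() - retention_days * 86_400
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.lower().endswith(extension) and os.path.getmtime(path) < cutoff:
                yield path

if __name__ == "__main__":
    print(usage_by_extension("/scratch/results"))            # example path, assumed
    for path in cleaning_candidates("/scratch/results", ".h5", retention_days=180):
        print("candidate for cleaning:", path)
```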
39

Ingnäs, Joakim, Mikael Söderberg, Nicole Tutsch, and Conrad Åslund. "Digitized management of flight data." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-327984.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Digital flight logs were created through the use of an already existing application called Open Glider Network. Open Glider Network's system allows collection of data from airplanes that have FLARM installed. FLARM, which is an acronym for Flight Alarm, is the current safety precaution used to prevent collision between smaller aircraft. In addition, it also registers measurements such as the aircraft’s speed, altitude, and position. This project focuses on the use of collected data from the FLARM system to create digitized flight logs for Stockholms Segelflygklubb, made available through a website. This will allow them to compare the system with the analog one to ensure the safety aspects are the same. By doing this, they can minimize the human error concerning the logs when using the system.
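As a simplified, hypothetical illustration of turning position reports into a flight log (the speed threshold and sample format are assumptions; the actual OGN/FLARM feed is not parsed here), the sketch below derives takeoff and landing times from timestamped ground-speed samples.

```python
# Hypothetical sketch: derive takeoff/landing times from timestamped speed samples.
# Threshold and sample format are assumptions, not the club's or OGN's actual rules.
TAKEOFF_SPEED_KMH = 40.0

def flight_log(samples):
    """samples: iterable of (timestamp, ground_speed_kmh), ordered by time."""
    flights, takeoff = [], None
    for ts, speed in samples:
        if takeoff is None and speed >= TAKEOFF_SPEED_KMH:
            takeoff = ts                       # aircraft accelerated past threshold: takeoff
        elif takeoff is not None and speed < TAKEOFF_SPEED_KMH:
            flights.append({"takeoff": takeoff, "landing": ts})
            takeoff = None                     # back below threshold: landing
    return flights

samples = [("10:00", 5), ("10:05", 80), ("10:45", 90), ("11:00", 10)]
print(flight_log(samples))   # [{'takeoff': '10:05', 'landing': '11:00'}]
```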
40

Mühlberger, Ralf Maximilian. "Data management for interoperable systems /." [St. Lucia, Qld.], 2001. http://www.library.uq.edu.au/pdfserve.php?image=thesisabs/absthe16277.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

El, Husseini Wafaa. "Efficient ontology-based data management." Electronic Thesis or Diss., Université de Rennes (2023-....), 2023. https://ged.univ-rennes1.fr/nuxeo/site/esupversions/afaf2edb-f3f2-4765-b1e1-9c960c6a60b4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Ontology-mediated query answering (OMQA) consists in asking database queries on knowledge bases (KBs); a KB is a set of facts, called a database, described by domain knowledge called an ontology. A main OMQA technique is FO-rewriting, which reformulates a query asked on a KB with respect to the KB's ontology; query answers are then computed through the relational evaluation of the query reformulation on the KB's database. Essentially, because FO-rewriting compiles the domain knowledge relevant to queries into their reformulations, query reformulations may be complex, and their optimization is the crux of efficiency. We devise a novel optimization framework for a large set of OMQA settings that enjoy FO-rewriting: conjunctive queries, i.e., the core select-project-join queries, asked on KBs expressed in Datalog± and existential rules, description logic and OWL, or RDF/S. We optimize the query reformulations produced by any state-of-the-art algorithm for FO-rewriting by rapidly computing, using a summary of the KB's database, simpler queries with the same answers that can be evaluated faster by DBMSs. We show on a well-established OMQA benchmark that time performance is significantly improved by our optimization framework in general, by up to three orders of magnitude.
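As a heavily simplified, hypothetical illustration of the optimization idea (not the thesis's actual algorithm or summary structure), the sketch below prunes the disjuncts of a query reformulation, modeled as a union of conjunctive queries, whose atoms use predicates that the database summary shows to be empty; the surviving, simpler union has the same answers on that database.

```python
# Hypothetical sketch: prune a union of conjunctive queries (UCQ) using a database summary.
# A disjunct whose atoms mention a predicate with no stored facts cannot produce answers,
# so it can be dropped before handing the reformulation to the DBMS.

def summarize(database):
    """Summary used here: the set of predicate names that actually hold some facts."""
    return {predicate for predicate, _ in database}

def prune_ucq(ucq, summary):
    """ucq: list of conjunctive queries, each a list of (predicate, variables) atoms."""
    return [cq for cq in ucq if all(pred in summary for pred, _ in cq)]

database = [("Teaches", ("alice", "db")), ("Course", ("db",))]
ucq = [
    [("Teaches", ("x", "y"))],                        # kept: Teaches has facts
    [("Professor", ("x",)), ("Teaches", ("x", "y"))]  # dropped: no Professor facts stored
]
print(prune_ucq(ucq, summarize(database)))
```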
42

Antonov, Anton. "Product Information Management." Master's thesis, Vysoká škola ekonomická v Praze, 2012. http://www.nusl.cz/ntk/nusl-150108.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Product Information Management (PIM) is a field that deals with product master data management and combines in one platform the experience and principles of data integration and data quality. Product Information Management merges the specific attributes of products across all channels in the supply chain. By unifying, centralizing, and standardizing product information on one platform, quality and timely information with added value can be achieved. The goal of the theoretical part of the thesis is to construct a picture of PIM, to place PIM into a broader context, to define and describe the various parts of a PIM solution, to describe the main differences in characteristics between product data and data about clients, and to summarize the available information on the administration and management of the knowledge bases of PIM data quality relevant for solving practical problems. The practical part of the thesis focuses on designing the structure, the content, and the method of filling the knowledge base of a Product Information Management solution in the environment of the DataFlux software tools from SAS Institute. The practical part further includes the analysis of real product data, the design of the definitions and objects of the knowledge base, the creation of a reference database, and the testing of the knowledge base with the help of specially designed web services.
43

Angeles, Maria del Pilar. "Management of data quality when integrating data with known provenance." Thesis, Heriot-Watt University, 2007. http://hdl.handle.net/10399/64.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Kalibjian, Jeff. ""Big Data" Management and Security Application to Telemetry Data Products." International Foundation for Telemetering, 2013. http://hdl.handle.net/10150/579664.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
ITC/USA 2013 Conference Proceedings / The Forty-Ninth Annual International Telemetering Conference and Technical Exhibition / October 21-24, 2013 / Bally's Hotel & Convention Center, Las Vegas, NV
"Big Data" [1] and the security challenge of managing "Big Data" is a hot topic in the IT world. The term "Big Data" is used to describe very large data sets that cannot be processed by traditional database applications in "tractable" periods of time. Securing data in a conventional database is challenge enough; securing data whose size may exceed hundreds of terabytes or even petabytes is even more daunting! As the size of telemetry product and telemetry post-processed product continues to grow, "Big Data" management techniques and the securing of that data may have ever increasing application in the telemetry realm. After reviewing "Big Data", "Big Data" security and management basics, potential application to telemetry post-processed product will be explored.
45

Tatikonda, Shirish. "Towards Efficient Data Analysis and Management of Semi-structured Data." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1275414859.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Diallo, Thierno Mahamoudou. "Discovering data quality rules in a master data management context." Thesis, Lyon, INSA, 2013. http://www.theses.fr/2013ISAL0067.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Dirty data continues to be an important issue for companies. The Data Warehouse Institute [Eckerson, 2002], [Rockwell, 2012] stated that poor data costs US businesses $611 billion annually and that erroneously priced data in retail databases costs US customers $2.5 billion each year. Data quality is becoming more and more critical. The database community pays particular attention to this subject, and a variety of integrity constraints such as Conditional Functional Dependencies (CFDs) have been studied for data cleaning. Repair techniques based on these constraints are precise in catching inconsistencies but are limited in how to exactly correct data. Master data brings a new alternative for data cleaning thanks to its quality properties. With the growing importance of Master Data Management (MDM), a new class of data quality rule known as Editing Rules (ERs) tells how to fix errors, pointing out which attributes are wrong and what values they should take. The intuition is to correct dirty data using high-quality data from the master. However, finding data quality rules is an expensive process that involves intensive manual effort, and it remains unrealistic to rely on human designers. In this thesis, we develop pattern mining techniques for discovering ERs from existing source relations with respect to master relations. In this setting, we propose a new semantics of ERs taking advantage of both source and master data. Thanks to the semantics proposed in terms of satisfaction, the discovery problem of ERs turns out to be strongly related to the discovery of both CFDs and one-to-one correspondences between source and target attributes. We first attack the problem of discovering CFDs, concentrating on the particular class of constant CFDs, known to be very expressive for detecting inconsistencies. We extend some well-known concepts introduced for traditional Functional Dependencies to solve the discovery problem for CFDs. Secondly, we propose a method based on INclusion Dependencies to extract one-to-one correspondences from source to master attributes before automatically building ERs. Finally, we propose some heuristics for applying ERs to clean data. We have implemented and evaluated our techniques on both real-life and synthetic databases. Experiments show the feasibility, scalability, and robustness of our proposal.
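As a hedged, schematic illustration of applying an editing rule with master data (the attribute names and the example rule are invented, not from the thesis), the sketch below corrects a source attribute by looking up the matching master tuple on the rule's key attributes.

```python
# Hypothetical sketch of applying an editing rule: when a source tuple matches a master
# tuple on the rule's key attributes, the rule's target attribute is overwritten with
# the trusted master value. Attribute names are illustrative only.

def apply_editing_rule(source_rows, master_rows, key_attrs, target_attr):
    master_index = {tuple(m[a] for a in key_attrs): m for m in master_rows}
    fixed = []
    for row in source_rows:
        master = master_index.get(tuple(row[a] for a in key_attrs))
        if master is not None and row[target_attr] != master[target_attr]:
            row = {**row, target_attr: master[target_attr]}   # correct using master data
        fixed.append(row)
    return fixed

master = [{"zip": "69621", "city": "Villeurbanne"}]
source = [{"name": "Bob", "zip": "69621", "city": "Lyon"}]
print(apply_editing_rule(source, master, key_attrs=["zip"], target_attr="city"))
# [{'name': 'Bob', 'zip': '69621', 'city': 'Villeurbanne'}]
```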
47

Schubert, Chris, Georg Seyerl, and Katharina Sack. "Dynamic Data Citation Service-Subset Tool for Operational Data Management." MDPI, 2019. http://dx.doi.org/10.3390/data4030115.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
In earth observation and the climatological sciences, data and their data services grow on a daily basis over a large spatial extent, due to the high coverage rate of satellite sensors and model calculations, but also due to continuous meteorological in situ observations. In order to reuse such data, and especially data fragments as well as their data services, in a collaborative and reproducible manner by citing the origin source, data analysts, e.g., researchers or impact modelers, need a way to identify the exact version, precise time information, parameters, and names of the dataset used. A manual process would make the citation of data fragments, as a subset of an entire dataset, rather complex and imprecise. Data in climate research are in most cases multidimensional, structured grid data that can change partially over time. The citation of such evolving content requires the approach of "dynamic data citation". The applied approach is based on associating queries with persistent identifiers. These queries contain the subsetting parameters, e.g., the spatial coordinates of the desired study area or the time frame with a start and end date, which are automatically included in the metadata of the newly generated subset and thus represent the information about the data history, i.e., the data provenance, which has to be established in data repository ecosystems. The Research Data Alliance Data Citation Working Group (RDA Data Citation WG) summarized the scientific status quo as well as the state of the art of existing citation and data management concepts and developed a scalable dynamic data citation methodology for evolving data. The Data Centre at the Climate Change Centre Austria (CCCA) has implemented the given recommendations and has offered, since 2017, an operational dynamic data citation service for climate scenario data. Conscious that this topic has many dependencies on bibliographic citation research that are still under discussion, the CCCA service on Dynamic Data Citation focused on climate-domain-specific issues, such as the characteristics of the data, formats, software environment, and usage behavior. The current effort, beyond sharing the experience gained, concerns the scalability of the implementation, e.g., towards the potential of an Open Data Cube solution.
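As a rough, hypothetical sketch of the query-based citation idea (the identifier scheme, field names, dataset name, and hashing choice are assumptions, not the CCCA implementation), the snippet below records the subsetting query, a timestamp, and a result fingerprint under a newly minted identifier so the exact subset can be re-derived and cited later.

```python
# Hypothetical sketch of dynamic data citation: persist the subsetting query together
# with version/time information and a result checksum under a persistent identifier.
import hashlib
import uuid
from datetime import datetime, timezone

citations = {}   # stands in for the repository's citation store

def cite_subset(dataset_id, dataset_version, query_params, subset_bytes):
    pid = f"doi:10.0000/example.{uuid.uuid4().hex[:8]}"     # placeholder identifier scheme
    citations[pid] = {
        "dataset": dataset_id,
        "version": dataset_version,
        "query": query_params,                              # e.g. bbox and time range
        "issued": datetime.now(timezone.utc).isoformat(),
        "result_sha256": hashlib.sha256(subset_bytes).hexdigest(),
    }
    return pid

pid = cite_subset(
    "example-climate-scenario", "v2",
    {"bbox": [9.5, 46.4, 17.2, 49.0], "start": "1981-01-01", "end": "2010-12-31"},
    b"...subset file contents...",
)
print(pid, citations[pid]["query"])
```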
48

Fernández, Moctezuma Rafael J. "A Data-Descriptive Feedback Framework for Data Stream Management Systems." PDXScholar, 2012. https://pdxscholar.library.pdx.edu/open_access_etds/116.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Data Stream Management Systems (DSMSs) provide support for continuous query evaluation over data streams. Data streams provide processing challenges due to their unbounded nature and varying characteristics, such as rate and density fluctuations. DSMSs need to adapt stream processing to these changes within certain constraints, such as available computational resources and minimum latency requirements in producing results. The proposed research develops an inter-operator feedback framework, where opportunities for run-time adaptation of stream processing are expressed in terms of descriptions of substreams and actions applicable to the substreams, called feedback punctuations. Both the discovery of adaptation opportunities and the exploitation of these opportunities are performed in the query operators. DSMSs are also concerned with state management, in particular, state derived from tuple processing. The proposed research also introduces the Contracts Framework, which provides execution guarantees about state purging in continuous query evaluation for systems with and without inter-operator feedback. This research provides both theoretical and design contributions. The research also includes an implementation and evaluation of the feedback techniques in the NiagaraST DSMS, and a reference implementation of the Contracts Framework.
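As a minimal, hypothetical illustration of the inter-operator feedback idea (the structure and action names are assumptions, not NiagaraST's actual interface), the sketch below pairs a substream description with an action in a feedback punctuation, and an operator applies it to incoming tuples.

```python
# Hypothetical sketch of a feedback punctuation: a substream description plus an action
# that a downstream operator sends upstream so earlier operators can adapt at run time.
from dataclasses import dataclass
from typing import Callable

@dataclass
class FeedbackPunctuation:
    describes: Callable[[dict], bool]   # which tuples form the substream
    action: str                         # e.g. 'drop' to shed load on that substream

def apply_feedback(stream, punctuations):
    for tup in stream:
        if any(p.action == "drop" and p.describes(tup) for p in punctuations):
            continue                    # shed tuples the consumer said it no longer needs
        yield tup

stream = [{"sensor": "a", "late": True}, {"sensor": "b", "late": False}]
feedback = [FeedbackPunctuation(describes=lambda t: t["late"], action="drop")]
print(list(apply_feedback(stream, feedback)))   # [{'sensor': 'b', 'late': False}]
```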
49

Zhang, Yanling. "From theory to practice : environmental management in China /." Berlin : wvb, 2005. http://www.wvberlin.de/data/inhalt/zhang.htm.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Lehmann, Marek. "Data access in workflow management systems /." Berlin : Aka, 2006. http://aleph.unisg.ch/hsgscan/hm00172711.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles

To the bibliography