Academic literature on the topic 'Data quality'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Data quality.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Data quality"

1. Vaníček, J. "Software and data quality." Agricultural Economics (Zemědělská ekonomika) 52, no. 3 (February 17, 2012): 138–46. http://dx.doi.org/10.17221/5007-agricecon.

Abstract:
The paper presents new ideas from the international SQuaRE (Software Quality Requirements and Evaluation) standardisation project, which concerns the development of a special branch of international standards for software quality. Data can be considered an integral part of software. The current international standards and technical reports of the ISO/IEC 9126 and ISO/IEC 14598 series and the ISO/IEC 12119 standard cover the whole software product as an indivisible entity. However, data sets such as databases and data stores have a special character and need a different structure of quality characteristics. It was therefore decided in the SQuaRE project to create a special international standard for data quality. The main ideas for this standard and a critical discussion of them are presented in this paper. The main part of this contribution was presented at the conference Agricultural Perspectives XIV, organised by the Czech University of Agriculture in Prague, September 20–21, 2005.

2. Stoykov, Evgeni. "Data Quality for Hydrographic Measurements." Journal Scientific and Applied Research 21, no. 1 (November 15, 2021): 26–30. http://dx.doi.org/10.46687/jsar.v21i1.314.

Abstract:
The topic of preservation of cultural heritage is very important and is an integral part of National Security. It is up-to-date and timely. Its significance is determined by the scale and intensity of criminal attacks on cultural heritage, which have caused an increase in the need to update the system of measures to safeguard cultural values and overcome the underestimation of the protection of cultural heritage as a national security factor.

3. Batini, Carlo, Anisa Rula, Monica Scannapieco, and Gianluigi Viscusi. "From Data Quality to Big Data Quality." Journal of Database Management 26, no. 1 (January 2015): 60–82. http://dx.doi.org/10.4018/jdm.2015010103.

Abstract:
This article investigates the evolution of data quality issues from traditional structured data managed in relational databases to Big Data. In particular, the paper examines the nature of the relationship between Data Quality and several research coordinates that are relevant in Big Data, such as the variety of data types, data sources and application domains, focusing on maps, semi-structured texts, linked open data, sensor & sensor networks and official statistics. Consequently a set of structural characteristics is identified and a systematization of the a posteriori correlation between them and quality dimensions is provided. Finally, Big Data quality issues are considered in a conceptual framework suitable to map the evolution of the quality paradigm according to three core coordinates that are significant in the context of the Big Data phenomenon: the data type considered, the source of data, and the application domain. Thus, the framework allows ascertaining the relevant changes in data quality emerging with the Big Data phenomenon, through an integrative and theoretical literature review.
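To make the article's coordinate-based framing concrete, the following is a minimal sketch, assuming a simple lookup from data type (one of the three coordinates) to the quality dimensions typically emphasised for it; the pairings and names are illustrative assumptions, not the paper's actual systematization.

```python
# Hypothetical illustration of one of the article's three coordinates:
# look up the quality dimensions commonly emphasised for a data type.
# The pairings are illustrative, not the paper's actual correlations.
QUALITY_DIMENSIONS_BY_DATA_TYPE = {
    "relational": ["accuracy", "completeness", "consistency", "timeliness"],
    "semi-structured text": ["accuracy", "believability", "relevance"],
    "linked open data": ["accessibility", "interlinking", "licensing"],
    "sensor data": ["timeliness", "precision", "completeness"],
    "maps": ["positional accuracy", "thematic accuracy", "lineage"],
}

def relevant_dimensions(data_type: str) -> list[str]:
    """Return the quality dimensions associated with a data type."""
    return QUALITY_DIMENSIONS_BY_DATA_TYPE.get(data_type, [])

print(relevant_dimensions("sensor data"))  # ['timeliness', 'precision', 'completeness']
```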

4. Chicco, Joanne. "Data Quality." Health Information Management 27, no. 1 (March 1997): 12–21. http://dx.doi.org/10.1177/183335839702700106.

5. Sadiq, Shazia, Tamraparni Dasu, Xin Luna Dong, Juliana Freire, Ihab F. Ilyas, Sebastian Link, Renée J. Miller, Felix Naumann, Xiaofang Zhou, and Divesh Srivastava. "Data Quality." ACM SIGMOD Record 46, no. 4 (February 22, 2018): 35–43. http://dx.doi.org/10.1145/3186549.3186559.

6. Glaze, William H. "Data quality." Environmental Science & Technology 36, no. 11 (June 2002): 225A. http://dx.doi.org/10.1021/es0223170.

7. Fan, Wenfei. "Data Quality." ACM SIGMOD Record 44, no. 3 (December 3, 2015): 7–18. http://dx.doi.org/10.1145/2854006.2854008.

8. Huh, Y. U., F. R. Keller, T. C. Redman, and A. R. Watkins. "Data quality." Information and Software Technology 32, no. 8 (October 1990): 559–65. http://dx.doi.org/10.1016/0950-5849(90)90146-i.

9. Kennedy, Dale J., Douglas C. Montgomery, Dwayne A. Rollier, and J. Bert Keats. "Data Quality." International Journal of Life Cycle Assessment 2, no. 4 (December 1997): 229–39. http://dx.doi.org/10.1007/bf02978420.

10. Kennedy, Dale J., Douglas C. Montgomery, and Beth H. Quay. "Data quality." International Journal of Life Cycle Assessment 1, no. 4 (December 1996): 199–207. http://dx.doi.org/10.1007/bf02978693.


Dissertations / Theses on the topic "Data quality"

1. Grillo, Aderibigbe. "Developing a data quality scorecard that measures data quality in a data warehouse." Thesis, Brunel University, 2018. http://bura.brunel.ac.uk/handle/2438/17137.

Abstract:
The main purpose of this thesis is to develop a data quality scorecard (DQS) that aligns the data quality needs of the data warehouse (DW) stakeholder group with selected data quality dimensions. To comprehend the research domain, a general and systematic literature review (SLR) was carried out, after which the research scope was established. Using Design Science Research (DSR) as the methodology to structure the research, three iterations were carried out to achieve the research aim highlighted in this thesis. In the first iteration, the artefact was built from the results of the general and systematic literature review, and a data quality scorecard (DQS) was conceptualised. The results of the SLR and the recommendations for designing an effective scorecard provided the input for the development of the DQS. Using a System Usability Scale (SUS) to validate the usability of the DQS, the results of the first iteration suggest that the DW stakeholders found the DQS useful. The second iteration evaluated the DQS further through a run-through in the FMCG domain followed by semi-structured interviews. The thematic analysis of the interviews demonstrated that the stakeholder participants found the DQS to be transparent, a useful additional reporting tool, well integrated, easy to use, and consistent, and found that it increases confidence in the data. However, the timeliness data dimension was found to be redundant, necessitating a modification of the DQS. The third iteration followed similar steps to the second but applied the modified DQS in the oil and gas domain. The results from the third iteration suggest that the DQS is a useful tool that is easy to use on a daily basis. The research contributes to theory by demonstrating a novel approach to DQS design, achieved by aligning the design of the DQS with the data quality concern areas of the DW stakeholders and the data quality dimensions. Further, this research lays a good foundation for the future by establishing a DQS model that can be used as a base for further development.
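As a rough illustration of what such a scorecard computes, here is a minimal sketch, assuming each quality dimension receives a score in [0, 1] and a stakeholder-assigned weight; the dimension names, weights and function are hypothetical, not the thesis's design.

```python
# Minimal, hypothetical data quality scorecard (DQS) sketch:
# per-dimension scores are combined into a weighted overall score.

def scorecard(scores: dict[str, float], weights: dict[str, float]) -> dict:
    """Report per-dimension scores plus a weighted overall score."""
    total_weight = sum(weights[d] for d in scores)
    overall = sum(scores[d] * weights[d] for d in scores) / total_weight
    return {"dimensions": scores, "overall": round(overall, 3)}

# Example: stakeholders weight completeness most heavily.
print(scorecard(
    scores={"completeness": 0.94, "consistency": 0.88, "accuracy": 0.91},
    weights={"completeness": 0.5, "consistency": 0.3, "accuracy": 0.2},
))
```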

2. Sýkorová, Veronika. "Data Quality Metrics." Master's thesis, Vysoká škola ekonomická v Praze, 2008. http://www.nusl.cz/ntk/nusl-2815.

Abstract:
The aim of the thesis is to prove the measurability of data quality, which is relatively subjective and thus difficult to measure. In doing this, various aspects of measuring the quality of data are analyzed, and a Complex Data Quality Monitoring System is introduced with the aim of providing a concept for measuring and monitoring the overall data quality in an organization. The system is built on a metrics hierarchy decomposed into particular detailed metrics, dimensions enabling multidimensional analyses of the metrics, and processes being measured by the metrics. The first part of the thesis (Chapters 2 and 3) is focused on data quality itself: it provides various definitions of data quality, gives reasons for the importance of data quality in a company, and presents some of the most common tools and solutions that target managing data quality in an organization. The second part (Chapters 4 and 5) builds on the first and leads into measuring data quality using metrics: it contains the definition and purpose of data quality metrics, places them into a multidimensional context (dimensions, hierarchies), and states five possible decompositions of data quality metrics into detail. The third part (Chapter 6) contains the proposed Complex Data Quality Monitoring System, including a description of the dimensions and processes related to data quality management and, most importantly, a detailed definition of the bottom-level metrics used for the calculation of the overall data quality.
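The metrics hierarchy lends itself to a short sketch: bottom-level metrics are measured directly, and each parent metric aggregates its children, so an overall data quality score emerges at the root. The hierarchy, metric names and plain averaging below are illustrative assumptions, not the thesis's actual decomposition.

```python
# Hypothetical metrics hierarchy: leaves are measured, parents average
# their children, and the root yields the overall data quality score.
HIERARCHY = {
    "overall_dq": ["completeness", "validity"],
    "completeness": ["customer_null_rate", "order_null_rate"],
    "validity": ["format_error_rate", "range_error_rate"],
}
MEASURED = {  # bottom-level metric scores in [0, 1]; higher is better
    "customer_null_rate": 0.97,
    "order_null_rate": 0.92,
    "format_error_rate": 0.99,
    "range_error_rate": 0.95,
}

def score(metric: str) -> float:
    """Roll a metric up the hierarchy recursively."""
    children = HIERARCHY.get(metric)
    if children is None:          # bottom-level metric: measured directly
        return MEASURED[metric]
    return sum(score(c) for c in children) / len(children)

print(f"overall data quality: {score('overall_dq'):.3f}")
```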

3. Yu, Wenyuan. "Improving data quality: data consistency, deduplication, currency and accuracy." Thesis, University of Edinburgh, 2013. http://hdl.handle.net/1842/8899.

Abstract:
Data quality is one of the key problems in data management. An unprecedented amount of data has been accumulated and has become a valuable asset of an organization. The value of the data relies greatly on its quality. However, data is often dirty in real life. It may be inconsistent, duplicated, stale, inaccurate or incomplete, which can reduce its usability and increase the cost of businesses. Consequently, the need for improving data quality arises, comprising five central issues: data consistency, data deduplication, data currency, data accuracy, and information completeness. This thesis presents the results of our work on the first four of these issues: data consistency, deduplication, currency, and accuracy. The first part of the thesis investigates incremental verifications of data consistencies in distributed data. Given a distributed database D, a set S of conditional functional dependencies (CFDs), the set V of violations of the CFDs in D, and updates ΔD to D, it is to find, with minimum data shipment, changes ΔV to V in response to ΔD. Although the problems are intractable, we show that they are bounded: there exist algorithms to detect errors such that their computational cost and data shipment are both linear in the size of ΔD and ΔV, independent of the size of the database D. Such incremental algorithms are provided for both vertically and horizontally partitioned data, and we show that the algorithms are optimal. The second part of the thesis studies the interaction between record matching and data repairing. Record matching, the main technique underlying data deduplication, aims to identify tuples that refer to the same real-world object, and repairing is to make a database consistent by fixing errors in the data using constraints. These are treated as separate processes in most data cleaning systems, based on heuristic solutions. However, our studies show that repairing can effectively help us identify matches, and vice versa. To capture the interaction, a uniform framework that seamlessly unifies repairing and matching operations is proposed to clean a database based on integrity constraints, matching rules and master data. The third part of the thesis presents our study of finding certain fixes that are absolutely correct for data repairing. Data repairing methods based on integrity constraints are normally heuristic, and they may not find certain fixes. Worse still, they may even introduce new errors when attempting to repair the data, which may not work well when repairing critical data such as medical records, in which a seemingly minor error often has disastrous consequences. We propose a framework and an algorithm to find certain fixes, based on master data, a class of editing rules and user interactions. A prototype system is also developed. The fourth part of the thesis introduces inferring data currency and consistency for conflict resolution, where data currency aims to identify the current values of entities, and conflict resolution is to combine tuples that pertain to the same real-world entity into a single tuple and resolve conflicts, which is also an important issue for data deduplication. We show that data currency and consistency help each other in resolving conflicts. We study a number of associated fundamental problems, and develop an approach for conflict resolution by inferring data currency and consistency.
The last part of the thesis reports our study of data accuracy on the longstanding relative accuracy problem, which is to determine, given tuples t1 and t2 that refer to the same entity e, whether t1[A] is more accurate than t2[A], i.e., t1[A] is closer to the true value of the A attribute of e than t2[A]. We introduce a class of accuracy rules and an inference system with a chase procedure to deduce relative accuracy, and the related fundamental problems are studied. We also propose a framework and algorithms for inferring accurate values with user interaction.
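For readers unfamiliar with conditional functional dependencies (CFDs), the constraint class underlying the consistency work, the following is a minimal sketch of detecting violations of a single constant CFD; the thesis itself handles general CFDs, incremental detection and distributed data, which this toy example does not attempt.

```python
# Toy sketch of a constant CFD check: tuples matching the pattern
# {country: "44", area_code: "131"} must also satisfy {city: "Edinburgh"}.
# Data and rule are invented for illustration.
from typing import Iterable, Iterator

def cfd_violations(tuples: Iterable[dict], condition: dict, implied: dict) -> Iterator[dict]:
    """Yield tuples that match `condition` but do not satisfy `implied`."""
    for t in tuples:
        if all(t.get(a) == v for a, v in condition.items()):
            if any(t.get(a) != v for a, v in implied.items()):
                yield t

data = [
    {"country": "44", "area_code": "131", "city": "Edinburgh"},
    {"country": "44", "area_code": "131", "city": "London"},  # violation
]
print(list(cfd_violations(data, {"country": "44", "area_code": "131"},
                          {"city": "Edinburgh"})))
```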

4. Peralta, Veronika. "Data Quality Evaluation in Data Integration Systems." PhD thesis, Université de Versailles-Saint Quentin en Yvelines, 2006. http://tel.archives-ouvertes.fr/tel-00325139.

Abstract:
The need for uniform access to multiple data sources grows stronger every day, particularly in decision-support systems, which require a comprehensive analysis of data. With the development of Data Integration Systems (DIS), information quality has become a first-class property increasingly demanded by users. This thesis addresses data quality in DIS. More precisely, we focus on the problems of evaluating the quality of the data delivered to users in response to their queries and of satisfying users' quality requirements. We also analyse how quality measures can be used to improve the design of the DIS and the quality of its data. Our approach consists of studying one quality factor at a time, analysing its relationship with the DIS, proposing techniques for its evaluation, and proposing actions for its improvement. Among the quality factors that have been proposed, this thesis analyses two: data freshness and data accuracy. We analyse the various definitions and measures that have been proposed for data freshness and accuracy, and we identify the DIS properties that have a major impact on their evaluation. We summarise the analysis of each factor by means of a taxonomy, which serves to compare existing work and to highlight open problems. We propose a framework that models the various elements involved in quality evaluation, such as data sources, user queries, the DIS integration processes, DIS properties, quality measures, and quality evaluation algorithms. In particular, we model the DIS integration processes as workflow processes, in which activities perform the tasks that extract, integrate, and deliver data to users. Our reasoning support for quality evaluation is a directed acyclic graph, called a quality graph, which has the same structure as the DIS and is labelled with the DIS properties relevant to quality evaluation. We develop evaluation algorithms that take as input the quality values of the source data and the DIS properties, and combine these values to qualify the data delivered by the DIS. They rely on the graph representation and combine property values while traversing the graph. The evaluation algorithms can be specialised to take into account the properties that influence quality in a concrete application. The idea behind the framework is to define a flexible context that allows the evaluation algorithms to be specialised to specific application scenarios. The quality values obtained during evaluation are compared with those expected by users, and improvement actions can be carried out if the quality requirements are not met. We suggest elementary improvement actions that can be composed to improve quality in a concrete DIS. Our approach to improving data freshness consists of analysing the DIS at different levels of abstraction in order to identify its critical points and to target improvement actions at those points.
Our approach to improving data accuracy consists of partitioning query results into portions (certain attributes, certain tuples) with homogeneous accuracy. This allows user applications to display only the most accurate data, to filter out data that does not meet accuracy requirements, or to display data in bands according to its accuracy. Compared with existing source-selection approaches, our proposal makes it possible to select the most accurate portions instead of filtering out entire sources. The main contributions of this thesis are: (1) a detailed analysis of the freshness and accuracy quality factors; (2) techniques and algorithms for evaluating and improving data freshness and accuracy; and (3) a quality-evaluation prototype usable in DIS design.
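The quality-graph idea can be sketched briefly: a directed acyclic graph mirrors the DIS, sources carry measured quality values, and an evaluation algorithm combines those values while traversing the graph. The sketch below propagates freshness as "maximum input age plus local processing delay"; the combination rule and all node names are illustrative assumptions, not necessarily the thesis's exact algorithms.

```python
# Hypothetical quality graph: a DAG mirroring a small DIS. Sources carry
# measured freshness (age in minutes); each activity adds its delay.
GRAPH = {  # node -> upstream nodes it reads from
    "report": ["join"],
    "join": ["source_a", "source_b"],
    "source_a": [],
    "source_b": [],
}
SOURCE_FRESHNESS = {"source_a": 5.0, "source_b": 30.0}  # minutes
ACTIVITY_DELAY = {"join": 2.0, "report": 1.0}           # minutes

def freshness(node: str) -> float:
    """Combine input freshness bottom-up: max input age + local delay."""
    inputs = GRAPH[node]
    if not inputs:                       # a source: measured directly
        return SOURCE_FRESHNESS[node]
    return max(freshness(n) for n in inputs) + ACTIVITY_DELAY[node]

print(f"freshness of delivered data: {freshness('report')} minutes")  # 33.0
```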

5. Peralta Costabel, Veronika del Carmen. "Data quality evaluation in data integration systems." Versailles-St Quentin en Yvelines, 2006. http://www.theses.fr/2006VERS0020.

Abstract:
This thesis deals with data quality evaluation in Data Integration Systems (DIS). Specifically, we address the problems of evaluating the quality of the data conveyed to users in response to their queries and verifying whether users' quality expectations can be achieved. We also analyze how quality measures can be used for improving the DIS and enforcing data quality. Our approach consists in studying one quality factor at a time, analyzing its impact within a DIS, proposing techniques for its evaluation, and proposing improvement actions for its enforcement. Among the quality factors that have been proposed, this thesis analyzes two of the most used ones: data freshness and data accuracy.

6. Deb, Rupam. "Data Quality Enhancement for Traffic Accident Data." Thesis, Griffith University, 2017. http://hdl.handle.net/10072/367725.

Abstract:
Death, injury, and disability resulting from road traffic crashes continue to be a major global public health problem. Recent data suggest that the number of fatalities from traffic crashes is in excess of 1.25 million people each year with non-fatal injuries affecting a further 20-50 million people. It is predicted that by 2030, road traffic accidents will have progressed to be the 5th leading cause of death and that the number of people who will die annually from traffic accidents will have doubled from current levels. Both developed and developing countries suffer from the consequences of the increase in human population, and consequently, vehicle numbers. Therefore, methods to reduce accident severity are of great interest to traffic agencies and the public at large. To analyze traffic accident factors effectively, a complete traffic accident historical database is needed. Road accident fatality rates depend on many factors, so it is a very challenging task to investigate the dependencies between the attributes because of the many environmental and road accident factors. Missing data and noisy data in the database obscure the discovery of important factors and lead to invalid conclusions.
Thesis (PhD Doctorate), Doctor of Philosophy (PhD), School of Information and Communication Technology, Science, Environment, Engineering and Technology.

7. He, Ying. "Spatial data quality management." Thesis, Surveying & Spatial Information Systems, Faculty of Engineering, University of New South Wales, 2008. http://handle.unsw.edu.au/1959.4/43323.

Abstract:
The applications of geographic information systems (GIS) in various areas have highlighted the importance of data quality. Data quality research has been given priority by GIS academics for three decades. However, the outcomes of data quality research have not been sufficiently translated into practical applications, and users still need a GIS capable of storing, managing and manipulating data quality information. To fill this gap, this research investigates how to develop a tool that effectively and efficiently manages data quality information to help data users better understand and assess the quality of their GIS outputs. Specifically, this thesis aims: 1. To develop a framework for establishing a systematic linkage between data quality indicators and appropriate uncertainty models; 2. To propose an object-oriented data quality model for organising and documenting data quality information; 3. To create data quality schemas for defining and storing the contents of metadata databases; 4. To develop a new conceptual model of data quality management; 5. To develop and implement a prototype system for enhancing the capability of data quality management in commercial GIS. Based on reviews of error and uncertainty modelling in the literature, a conceptual framework has been developed to establish the systematic linkage between data quality elements and appropriate error and uncertainty models. To overcome the limitations identified in the review and satisfy a series of requirements for representing data quality, a new object-oriented data quality model has been proposed. It enables data quality information to be documented and stored in a multi-level structure and to be integrally linked with spatial data to allow access, processing and graphic visualisation. The conceptual model for data quality management is proposed with a data quality storage model, uncertainty models and visualisation methods as its three basic components. This model establishes the processes involved in managing data quality, emphasising the integration of uncertainty modelling and visualisation techniques. The above studies lay the theoretical foundations for the development of a prototype system with the ability to manage data quality. An object-oriented approach, database technology and programming technology have been integrated to design and implement the prototype system within the ESRI ArcGIS software. The object-oriented approach allows the prototype to be developed in a flexible and easily maintained manner. The prototype allows users to browse and access data quality information at different levels. Moreover, a set of error and uncertainty models is embedded within the system. With the prototype, data quality elements can be extracted from the database and automatically linked with the appropriate error and uncertainty models, as well as with their implications in the form of simple maps. This function offers users a set of different uncertainty models to choose from when assessing how uncertainty inherent in the data can affect their specific application, and significantly increases users' confidence in using data for a particular situation. To demonstrate the enhanced capability of the prototype, the system has been tested against real data. The implementation has shown that the prototype can efficiently assist data users, especially non-expert users, to better understand data quality and utilise it in a more practical way.
The methodologies and approaches for managing quality information presented in this thesis should serve as an impetus for supporting further research.
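As a loose illustration of an object-oriented data quality model of the kind described above, the sketch below attaches quality elements to spatial data at two levels (dataset and feature), so quality information can be stored and queried alongside the data; the class and attribute names are hypothetical, not the thesis's design.

```python
# Hypothetical multi-level data quality model for spatial data:
# quality elements attach both to whole datasets and to single features.
from dataclasses import dataclass, field

@dataclass
class QualityElement:
    name: str       # e.g. "positional accuracy"
    value: float    # e.g. RMSE in metres, or a ratio in [0, 1]
    method: str     # how the value was assessed

@dataclass
class Feature:
    geometry: tuple
    quality: list[QualityElement] = field(default_factory=list)

@dataclass
class Dataset:
    features: list[Feature]
    quality: list[QualityElement] = field(default_factory=list)

road = Feature((0.0, 1.0), [QualityElement("positional accuracy", 2.5, "GPS check")])
survey = Dataset([road], [QualityElement("completeness", 0.98, "field survey")])
print(survey.quality[0].name, "|", survey.features[0].quality[0].name)
```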

8. Redgert, Rebecca. "Evaluating Data Quality in a Data Warehouse Environment." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-208766.

Abstract:
The amount of data accumulated by organizations has grown significantly during the last couple of years, increasing the importance of data quality. Ensuring data quality for large amounts of data is a complicated task, but crucial to subsequent analysis. This study investigates how to maintain and improve data quality in a data warehouse. A case study of the errors in a data warehouse was conducted at the Swedish company Kaplan, and resulted in guiding principles on how to improve the data quality. The investigation was done by manually comparing data from the source systems to the data integrated in the data warehouse, and by applying a quality framework based on semiotic theory to identify errors. The three main guiding principles given are (1) to implement a standardized format for the source data, (2) to implement a check prior to integration where the source data are reviewed and corrected if necessary, and (3) to create and implement specific database integrity rules. Further work is encouraged on establishing a guide for the framework on how best to perform a manual comparison of data, and on quality assurance of source data.
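The second guiding principle, a check prior to integration, might look something like this sketch, in which source rows are validated against a standardized format and failing rows are set aside for correction; the field rules are invented examples, not the company's actual formats.

```python
# Sketch of a pre-integration check: rows that do not match the
# standardized source format are separated out for review and correction.
import re

RULES = {
    "customer_id": re.compile(r"^\d{6}$"),          # six digits
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),     # standardized ISO date
}

def pre_integration_check(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split rows into (accepted, rejected-for-correction)."""
    accepted, rejected = [], []
    for row in rows:
        ok = all(rule.match(str(row.get(f, ""))) for f, rule in RULES.items())
        (accepted if ok else rejected).append(row)
    return accepted, rejected

rows = [{"customer_id": "123456", "date": "2017-05-01"},
        {"customer_id": "12X456", "date": "01/05/2017"}]
good, bad = pre_integration_check(rows)
print(len(good), "accepted;", len(bad), "sent back for correction")
```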

9. Bringle, Per. "Data Quality in Data Warehouses: A Case Study." Thesis, University of Skövde, Department of Computer Science, 1999. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-404.

Abstract:
Companies today experience problems with poor data quality in their systems. Because of the enormous amount of data in companies, the data has to be of good quality if companies want to take advantage of it. Since the purpose of a data warehouse is to gather information from several databases for decision support, it is absolutely vital that data is of good quality. There exist several ways of determining or classifying data quality in databases. In this work the data quality management in a large Swedish company's data warehouse is examined through a case study, using a framework specialized for data warehouses. The quality of data is examined from syntactic, semantic and pragmatic points of view. The results of the examination are then compared with a similar case study previously conducted in order to find any differences and similarities.

10. Li, Lin. "Data quality and data cleaning in database applications." Thesis, Edinburgh Napier University, 2012. http://researchrepository.napier.ac.uk/Output/5788.

Abstract:
Today, data plays an important role in people's daily activities. With the help of database applications such as decision support systems and customer relationship management (CRM) systems, useful information or knowledge can be derived from large quantities of data. However, investigations show that many such applications fail to work successfully. There are many possible causes of failure, such as poor system infrastructure design or query performance, but nothing is more certain to yield failure than a lack of concern for the issue of data quality. High-quality data is a key to today's business success. The quality of any large real-world data set depends on a number of factors, among which the source of the data is often the crucial one. It is now recognized that an inordinate proportion of data in most data sources is dirty. Obviously, a database application with a high proportion of dirty data is not reliable for the purpose of data mining or deriving business intelligence, and the quality of decisions made on the basis of such business intelligence is equally unreliable. In order to ensure high-quality data, enterprises need processes, methodologies and resources to monitor and analyze the quality of data, and methodologies for preventing and/or detecting and repairing dirty data. This thesis focuses on the improvement of data quality in database applications with the help of current data cleaning methods. It provides a systematic and comparative description of the research issues related to improving the quality of data, and addresses a number of research issues related to data cleaning. In the first part of the thesis, the literature on data cleaning and data quality is reviewed and discussed. Building on this research, a rule-based taxonomy of dirty data is proposed in the second part. The proposed taxonomy not only summarizes the most common dirty data types but is also the basis on which the proposed method for solving the Dirty Data Selection (DDS) problem during the data cleaning process was developed. This helps us to design the DDS process in the data cleaning framework proposed in the third part of the thesis. This framework retains the most appealing characteristics of existing data cleaning approaches and improves the efficiency and effectiveness of data cleaning, as well as the degree of automation during the data cleaning process. Finally, a set of approximate string matching algorithms is studied and experimental work has been undertaken. Approximate string matching, an important part of many data cleaning approaches, has been well studied for many years. The experimental work in the thesis confirms that there is no clear best technique: the characteristics of the data, such as the size of a dataset, its error rate, the type of strings it contains and even the type of typo in a string, have a significant effect on the performance of the selected techniques. In addition, the characteristics of the data also affect the selection of suitable threshold values for the selected matching algorithms. The findings based on these experimental results provide a fundamental improvement in the design of the algorithm selection mechanism in the data cleaning framework, which enhances the performance of the data cleaning system in database applications.
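Approximate string matching, studied in the final part of the thesis, can be illustrated with a normalised edit-distance similarity and a tunable threshold; in line with the thesis's finding that there is no clear best technique, neither the algorithm nor the threshold below should be read as a universal choice.

```python
# Edit-distance-based approximate matching: two strings are treated as
# duplicates when their normalised similarity exceeds a threshold.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (two-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def is_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Similarity = 1 - distance / longer length; compare to threshold."""
    longest = max(len(a), len(b)) or 1
    return 1 - levenshtein(a, b) / longest >= threshold

print(is_match("Edinbrugh Napier", "Edinburgh Napier"))  # True: transposed letters
```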

Books on the topic "Data quality"

1. Tarpey, Simon. Data quality. [U.K.]: NHS Executive, 1996.

2. Wang, Richard Y. Data quality. Boston: Kluwer Academic Publishers, 2001.

3. Wang, Richard Y. Data quality. New York: Kluwer Academic Publishers, 2002.

4. Willett, Terrence, and Aeron Zentner. Assessing Data Quality. Thousand Oaks, CA: SAGE Publications, Inc., 2021. http://dx.doi.org/10.4135/9781071858769.

5. Otto, Boris, and Hubert Österle. Corporate Data Quality. Berlin, Heidelberg: Springer Berlin Heidelberg, 2016. http://dx.doi.org/10.1007/978-3-662-46806-7.

6. Fisher, Peter F., and Michael F. Goodchild. Spatial Data Quality. Edited by Wenzhong Shi. Abingdon, UK: Taylor & Francis, 2002. http://dx.doi.org/10.4324/9780203303245.

7. Wang, Y. Richard. Quality data objects. Cambridge, MA: Alfred P. Sloan School of Management, Massachusetts Institute of Technology, 1992.

8. O'Day, James. Accident data quality. Washington, D.C.: National Academy Press, 1993.

9. Buttenfield, Barbara Pfeil, ed. Mapping data quality. Toronto: University of Toronto Press, 1993.

10. Shi, Wenzhong, Michael F. Goodchild, and Peter Fisher, eds. Spatial data quality. London: Taylor & Francis, 2002.

Book chapters on the topic "Data quality"

1. Fürber, Christian. "Data Quality." In Data Quality Management with Semantic Technologies, 20–55. Wiesbaden: Springer Fachmedien Wiesbaden, 2015. http://dx.doi.org/10.1007/978-3-658-12225-6_3.

2. Treder, Martin. "Data Quality." In The Chief Data Officer Management Handbook, 139–56. Berkeley, CA: Apress, 2020. http://dx.doi.org/10.1007/978-1-4842-6115-6_10.

3. Nahler, Gerhard. "Data quality." In Dictionary of Pharmaceutical Medicine, 47. Vienna: Springer Vienna, 2009. http://dx.doi.org/10.1007/978-3-211-89836-9_352.

4. Piedmont, Ralph L. "Data Quality." In Encyclopedia of Quality of Life and Well-Being Research, 1447–48. Dordrecht: Springer Netherlands, 2014. http://dx.doi.org/10.1007/978-94-007-0753-5_667.

5. Fleckenstein, Mike, and Lorraine Fellows. "Data Quality." In Modern Data Strategy, 101–19. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-68993-7_11.

6. Shekhar, Shashi, and Hui Xiong. "Data Quality." In Encyclopedia of GIS, 220. Boston, MA: Springer US, 2008. http://dx.doi.org/10.1007/978-0-387-35973-1_248.

7. Weik, Martin H. "Data quality." In Computer Science and Communications Dictionary, 355. Boston, MA: Springer US, 2000. http://dx.doi.org/10.1007/1-4020-0613-6_4349.

8. Gayo, Jose Emilio Labra, Eric Prud’hommeaux, Iovka Boneva, and Dimitris Kontokostas. "Data Quality." In Validating RDF Data, 27–53. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-031-79478-0_3.

9. Barksdale, George "Buster", and Kecia Pierce. "Data Quality." In Using Metering to Perform Energy Management, 73–91. New York: River Publishers, 2024. http://dx.doi.org/10.1201/9781003467113-7.

10. Piedmont, Ralph L. "Data Quality." In Encyclopedia of Quality of Life and Well-Being Research, 1599–600. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-17299-1_667.

Conference papers on the topic "Data quality"

1. Shepperd, Martin. "Data quality." In Proceedings of the 2nd International Workshop. New York, NY, USA: ACM Press, 2011. http://dx.doi.org/10.1145/1985374.1985376.

2. Sidi, Fatimah, Payam Hassany Shariat Panahy, Lilly Suriani Affendey, Marzanah A. Jabar, Hamidah Ibrahim, and Aida Mustapha. "Data quality: A survey of data quality dimensions." In 2012 International Conference on Information Retrieval & Knowledge Management (CAMP). IEEE, 2012. http://dx.doi.org/10.1109/infrkm.2012.6204995.

3. Mehmood, Kashif, Samira Si-Said Cherfi, and Isabelle Comyn-Wattiau. "Data quality through model quality." In Proceedings of the First International Workshop. New York, NY, USA: ACM Press, 2009. http://dx.doi.org/10.1145/1651415.1651421.

4. Arruda, Darlan, and Nazim H. Madhavji. "QualiBD: A Tool for Modelling Quality Requirements for Big Data Applications." In 2019 IEEE International Conference on Big Data (Big Data). IEEE, 2019. http://dx.doi.org/10.1109/bigdata47090.2019.9006294.

5. Fu, Qian, and John M. Easton. "Understanding data quality: Ensuring data quality by design in the rail industry." In 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017. http://dx.doi.org/10.1109/bigdata.2017.8258380.

6. He, Tianxing, Shengcheng Yu, Ziyuan Wang, Jieqiong Li, and Zhenyu Chen. "From Data Quality to Model Quality." In Internetware '19: The 11th Asia-Pacific Symposium on Internetware. New York, NY, USA: ACM, 2019. http://dx.doi.org/10.1145/3361242.3361260.

7. Johnson, Theodore, and Tamraparni Dasu. "Data quality and data cleaning." In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. New York, NY, USA: ACM Press, 2003. http://dx.doi.org/10.1145/872757.872875.

8. Schieferdecker, Ina. "(Open) Data Quality." In 2012 IEEE 36th Annual Computer Software and Applications Conference (COMPSAC 2012). IEEE, 2012. http://dx.doi.org/10.1109/compsac.2012.120.

9. Pon, Raymond K., and Alfonso F. Cárdenas. "Data quality inference." In Proceedings of the 2nd International Workshop. New York, NY, USA: ACM Press, 2005. http://dx.doi.org/10.1145/1077501.1077519.

10. Song, Shaoxu, and Aoqian Zhang. "IoT Data Quality." In CIKM '20: The 29th ACM International Conference on Information and Knowledge Management. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3340531.3412173.

Reports on the topic "Data quality"

1. Defense Logistics Agency Alexandria VA. Data Quality Engineering Handbook. Fort Belvoir, VA: Defense Technical Information Center, June 1994. http://dx.doi.org/10.21236/ada315573.

2. Ichinose, G. Waveform Data Quality Assessment. Office of Scientific and Technical Information (OSTI), April 2022. http://dx.doi.org/10.2172/1863669.

3. Lockrem, L. L. Quality assurance and data management. Office of Scientific and Technical Information (OSTI), January 1998. http://dx.doi.org/10.2172/362612.

4. Hovorka, Susan. SECARB-USA: Data Quality Methodology. Office of Scientific and Technical Information (OSTI), September 2020. http://dx.doi.org/10.2172/1836835.

5. Stewart, Ron, and Sharath Kallaje. FedFleet 2024: Fleet Data Quality - Understanding and Improving Your Agency's FAST Fleet Data Quality Metric? Office of Scientific and Technical Information (OSTI), January 2024. http://dx.doi.org/10.2172/2305390.

6. Lombardini, Simone, Alexia Pretari, and Emily Tomkys Valteri. Going Digital: Improving data quality with digital data collection. Oxfam GB, July 2018. http://dx.doi.org/10.21201/2018.3071.

7. Matejka, L. A., Jr. Quality data validation: Comprehensive approach to environmental data validation. Office of Scientific and Technical Information (OSTI), October 1993. http://dx.doi.org/10.2172/10185693.

8. Woods, Taylor. Data from GSSE Water Quality 2020. University of Tennessee, Knoxville Libraries, June 2020. http://dx.doi.org/10.7290/91mdbwflvd.

9. Williams, R. J. Data Quality Statements for Spatial Databases. Fort Belvoir, VA: Defense Technical Information Center, July 1992. http://dx.doi.org/10.21236/ada264125.

10. Melito, Ivano, and Michael J. Briggs. Data Quality Assurance for Ribs XM2000. Fort Belvoir, VA: Defense Technical Information Center, September 2000. http://dx.doi.org/10.21236/ada383556.