Academic literature on the topic 'Cleaning of data'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Cleaning of data.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Journal articles on the topic "Cleaning of data"

1

Pahwa, Payal, and Rashmi Chhabra. "BST Algorithm for Duplicate Elimination in Data Warehouse." International Journal of Management & Information Technology 4, no. 1 (June 26, 2013): 190–97. http://dx.doi.org/10.24297/ijmit.v4i1.4636.

Abstract:
Data warehousing is an emerging technology that has proved to be very important for organizations. Today every business organization needs accurate information in large amounts to make proper decisions, and for business decisions the data should be of good quality. To improve data quality, data cleansing is needed. Data cleansing is fundamental to warehouse data reliability and to data warehousing success. There are various methods for data cleansing. This paper addresses issues related to data cleaning, focusing on the detection of duplicate records, and proposes an efficient algorithm for data cleaning. A review of data cleansing methods and a comparison between them is also presented.
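The duplicate-record detection this line of work centres on can be sketched with a simple normalized-key pass; this is an illustrative stand-in, not the paper's BST algorithm, and the sample records are made up:

```python
def normalize(record):
    """Canonicalize a record so trivially different duplicates collide."""
    return tuple(" ".join(str(field).lower().split()) for field in record)

def find_duplicates(records):
    """Return indices of records whose normalized form was already seen."""
    seen = {}
    duplicates = []
    for i, rec in enumerate(records):
        key = normalize(rec)
        if key in seen:
            duplicates.append(i)
        else:
            seen[key] = i
    return duplicates

rows = [
    ("Payal Pahwa", "Delhi"),
    ("payal  pahwa", "delhi"),   # duplicate after normalization
    ("Rashmi Chhabra", "Delhi"),
]
print(find_duplicates(rows))  # -> [1]
```

Real warehouse deduplication additionally needs approximate matching, since duplicates rarely differ only in case and spacing.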
2

Chu, Xu, and Ihab F. Ilyas. "Qualitative data cleaning." Proceedings of the VLDB Endowment 9, no. 13 (September 2016): 1605–8. http://dx.doi.org/10.14778/3007263.3007320.

3

Broman, Karl W. "Cleaning genotype data." Genetic Epidemiology 17, S1 (1999): S79–S83. http://dx.doi.org/10.1002/gepi.1370170714.

4

Singh, Mohini. "Cleaning Up Company Data." CFA Institute Magazine 27, no. 1 (March 2016): 53. http://dx.doi.org/10.2469/cfm.v27.n1.18.

5

Geerts, Floris, Giansalvatore Mecca, Paolo Papotti, and Donatello Santoro. "Cleaning data with Llunatic." VLDB Journal 29, no. 4 (November 8, 2019): 867–92. http://dx.doi.org/10.1007/s00778-019-00586-5.

6

Zhang, Aoqian, Shaoxu Song, Jianmin Wang, and Philip S. Yu. "Time series data cleaning." Proceedings of the VLDB Endowment 10, no. 10 (June 2017): 1046–57. http://dx.doi.org/10.14778/3115404.3115410.

7

Holstad, Mark S. "Data Driven Interceptor Cleaning." Proceedings of the Water Environment Federation 2010, no. 8 (January 1, 2010): 7636–64. http://dx.doi.org/10.2175/193864710798207792.

8

Karr, Alan F. "Exploratory Data Mining and Data Cleaning." Journal of the American Statistical Association 101, no. 473 (March 2006): 399. http://dx.doi.org/10.1198/jasa.2006.s81.

9

Rahul, Kumar, and Rohitash Kumar Banyal. "Detection and Correction of Abnormal Data with Optimized Dirty Data: A New Data Cleaning Model." International Journal of Information Technology & Decision Making 20, no. 02 (March 2021): 809–41. http://dx.doi.org/10.1142/s0219622021500188.

Abstract:
Every business enterprise requires noise-free, clean data. The chance of dirty data increases as the data warehouse continuously loads and refreshes large quantities of data from various sources. Hence, to avoid wrong conclusions, the data cleaning process becomes vital in data-centred projects. This paper introduces a novel data cleaning technique for the effective removal of dirty data. The process involves two steps: (i) dirty data detection and (ii) dirty data cleaning. Dirty data detection comprises data normalization, hashing, clustering, and finding the suspected data. In the clustering process, the optimal selection of centroids is carried out by employing an optimization concept. After dirty data detection finishes, the subsequent dirty data cleaning process begins. Cleaning likewise comprises several steps, namely a leveling process, Huffman coding, and cleaning of the suspected data, with the cleaning of suspected data again driven by optimization. To solve all the optimization problems, a new hybrid algorithm is proposed, the Firefly Update Enabled Rider Optimization Algorithm (FU-ROA), a hybridization of the Rider Optimization Algorithm (ROA) and the Firefly (FF) algorithm. Finally, the performance of the implemented data cleaning method is compared against traditional methods such as Particle Swarm Optimization (PSO), FF, Grey Wolf Optimizer (GWO), and ROA in terms of positive and negative measures. The results show that at iteration 12 the performance of the proposed FU-ROA model on test case 1 was 0.013%, 0.7%, 0.64%, and 0.29% better than the extant PSO, FF, GWO, and ROA models, respectively.
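The detection stage sketched in the abstract (normalize, cluster, flag suspects) can be illustrated minimally; the mean-and-spread cutoff below is an assumed stand-in for the paper's optimization-driven centroid selection, and the readings are made up:

```python
def minmax_normalize(values):
    """Scale values into [0, 1], the normalization step of the pipeline."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def flag_suspects(values, k=2.0):
    """Flag points far from the centre of the normalized data (a plain
    statistical stand-in for the paper's optimized clustering step)."""
    norm = minmax_normalize(values)
    mean = sum(norm) / len(norm)
    var = sum((v - mean) ** 2 for v in norm) / len(norm)
    std = var ** 0.5
    return [i for i, v in enumerate(norm) if abs(v - mean) > k * std]

readings = [10, 11, 12, 11, 10, 95]   # 95 is the dirty value
print(flag_suspects(readings))
```

Flagged indices would then feed the cleaning stage proper.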
10

Pandya, Sohil D., and Paresh V. Virparia. "Context Free Data Cleaning and its Application in Mechanism for Suggestive Data Cleaning." International Journal of Information Science 1, no. 1 (August 31, 2012): 32–35. http://dx.doi.org/10.5923/j.ijis.20110101.05.


Dissertations / Theses on the topic "Cleaning of data"

1

Li, Lin. "Data quality and data cleaning in database applications." Thesis, Edinburgh Napier University, 2012. http://researchrepository.napier.ac.uk/Output/5788.

Abstract:
Today, data plays an important role in people's daily activities. With the help of database applications such as decision support systems and customer relationship management (CRM) systems, useful information or knowledge can be derived from large quantities of data. However, investigations show that many such applications fail to work successfully. There are many possible reasons for failure, such as poor system infrastructure design or query performance, but nothing is more certain to yield failure than a lack of concern for data quality. High-quality data is a key to today's business success. The quality of any large real-world data set depends on a number of factors, among which the source of the data is often the crucial one. It is now recognized that an inordinate proportion of data in most data sources is dirty. Obviously, a database application with a high proportion of dirty data is not reliable for the purpose of data mining or deriving business intelligence, and the quality of decisions made on the basis of such business intelligence is also unreliable. In order to ensure high data quality, enterprises need processes, methodologies and resources to monitor and analyze the quality of data, and methodologies for preventing and/or detecting and repairing dirty data. This thesis focuses on the improvement of data quality in database applications with the help of current data cleaning methods. It provides a systematic and comparative description of the research issues related to the improvement of the quality of data, and addresses a number of research issues related to data cleaning. In the first part of the thesis, related literature on data cleaning and data quality is reviewed and discussed. Building on this research, a rule-based taxonomy of dirty data is proposed in the second part of the thesis.
The proposed taxonomy not only summarizes the most common dirty data types but is also the basis on which the proposed method for solving the Dirty Data Selection (DDS) problem during the data cleaning process was developed. This helps in designing the DDS process in the proposed data cleaning framework described in the third part of the thesis. This framework retains the most appealing characteristics of existing data cleaning approaches, and improves the efficiency and effectiveness of data cleaning as well as the degree of automation during the data cleaning process. Finally, a set of approximate string matching algorithms is studied and experimental work has been undertaken. Approximate string matching is an important part of many data cleaning approaches and has been well studied for many years. The experimental work in the thesis confirmed that there is no clear best technique: the characteristics of the data, such as the size of a dataset, its error rate, the type of strings it contains, and even the type of typo in a string, have a significant effect on the performance of the selected techniques. In addition, the characteristics of the data also affect the selection of suitable threshold values for the selected matching algorithms. The findings from these experimental results underpin the design of the 'algorithm selection mechanism' in the data cleaning framework, which enhances the performance of data cleaning systems in database applications.
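The approximate string matching studied in the thesis is commonly based on edit distance with a data-dependent threshold; a minimal sketch, with the threshold value purely illustrative:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def is_match(a, b, threshold=2):
    """Treat two strings as the same real-world entity when their edit
    distance is within the threshold; as the thesis's experiments show,
    the right threshold depends heavily on the data."""
    return levenshtein(a, b) <= threshold

print(levenshtein("cleaning", "claening"))  # a transposition costs 2 edits here
print(is_match("Edinburgh", "Edinburg"))
```

Variants such as Damerau-Levenshtein count a transposition as a single edit, one example of why no single technique dominates.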
2

Liebchen, Gernot Armin. "Data cleaning techniques for software engineering data sets." Thesis, Brunel University, 2010. http://bura.brunel.ac.uk/handle/2438/5951.

Abstract:
Data quality is an important issue which has been addressed and recognised in research communities such as data warehousing, data mining and information systems. It has been agreed that poor data quality will impact the quality of results of analyses and that it will therefore impact on decisions made on the basis of these results. Empirical software engineering has neglected the issue of data quality to some extent. This fact poses the question of how researchers in empirical software engineering can trust their results without addressing the quality of the analysed data. One widely accepted definition for data quality describes it as `fitness for purpose', and the issue of poor data quality can be addressed by either introducing preventative measures or by applying means to cope with data quality issues. The research presented in this thesis addresses the latter with the special focus on noise handling. Three noise handling techniques, which utilise decision trees, are proposed for application to software engineering data sets. Each technique represents a noise handling approach: robust filtering, where training and test sets are the same; predictive filtering, where training and test sets are different; and filtering and polish, where noisy instances are corrected. The techniques were first evaluated in two different investigations by applying them to a large real world software engineering data set. In the first investigation the techniques' ability to improve predictive accuracy in differing noise levels was tested. All three techniques improved predictive accuracy in comparison to the do-nothing approach. The filtering and polish was the most successful technique in improving predictive accuracy. The second investigation utilising the large real world software engineering data set tested the techniques' ability to identify instances with implausible values. These instances were flagged for the purpose of evaluation before applying the three techniques. 
Robust filtering and predictive filtering decreased the number of instances with implausible values, but substantially decreased the size of the data set too. The filtering and polish technique actually increased the number of implausible values, but it did not reduce the size of the data set. Since the data set contained historical software project data, it was not possible to know the real extent of noise detected. This led to the production of simulated software engineering data sets, which were modelled on the real data set used in the previous evaluations to ensure domain specific characteristics. These simulated versions of the data set were then injected with noise, such that the real extent of the noise was known. After the noise injection the three noise handling techniques were applied to allow evaluation. This procedure of simulating software engineering data sets combined the incorporation of domain specific characteristics of the real world with the control over the simulated data. This is seen as a special strength of this evaluation approach. The results of the evaluation of the simulation showed that none of the techniques performed well. Robust filtering and filtering and polish performed very poorly, and based on the results of this evaluation they would not be recommended for the task of noise reduction. The predictive filtering technique was the best performing technique in this evaluation, but it did not perform significantly well either. An exhaustive systematic literature review has been carried out investigating to what extent the empirical software engineering community has considered data quality. The findings showed that the issue of data quality has been largely neglected by the empirical software engineering community. The work in this thesis highlights an important gap in empirical software engineering. It provided clarification and distinctions of the terms noise and outliers. 
Noise and outliers are overlapping, but they are fundamentally different. Since noise and outliers are often treated the same in noise handling techniques, a clarification of the two terms was necessary. To investigate the capabilities of noise handling techniques, a single investigation was deemed insufficient. The reasons for this are that the distinction between noise and outliers is not trivial, and that the investigated noise cleaning techniques are derived from traditional noise handling techniques in which noise and outliers are combined. Therefore three investigations were undertaken to assess the effectiveness of the three presented noise handling techniques. Each investigation should be seen as part of a multi-pronged approach. This thesis also highlights possible shortcomings of current automated noise handling techniques. The poor performance of the three techniques led to the conclusion that noise handling should be integrated into a data cleaning process where the input of domain knowledge and the replicability of the data cleaning process are ensured.
3

Iyer, Vasanth. "Ensemble Stream Model for Data-Cleaning in Sensor Networks." FIU Digital Commons, 2013. http://digitalcommons.fiu.edu/etd/973.

Abstract:
Ensemble stream modeling and data cleaning are sensor information processing systems with different training and testing methods by which their goals are cross-validated. This research examines a mechanism that seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events so as to eliminate uncorrelated noise and choose the most likely model without overfitting, thus obtaining higher model confidence. Higher-quality streams can be realized by combining many short streams into an ensemble of the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction for an event such as a bush or natural forest fire, we take the burnt area (BA*), the sensed ground truth obtained from logs, as our target variable. Even though this is an obvious model choice, the results are disappointing, for two reasons: one, the histogram of fire activity is highly skewed; two, the measured sensor parameters are highly correlated. Since using non-descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory and conceptual knowledge is learned from the sensor streams. Second is the process of feature induction by cross-validating attributes with single or multi-target variables to minimize training error. We use the F-measure score, which combines precision and recall, to determine the false alarm rate of fire events. The multi-target data-cleaning trees use the information purity of the target leaf nodes to learn higher-order features. A sensitive variance measure, such as an F-test, is performed at each node's split to select the best attribute. The ensemble stream model approach proved to improve when using complicated features with a simpler tree classifier.
The ensemble framework for data cleaning, and the enhancements to quantify quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction) of the sensors, led to the formation of quality streams for sensor-enabled applications, which further motivates the novelty of stream quality labeling and its importance in handling the vast number of real-time mobile streams generated today.
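The F-measure this abstract relies on combines precision and recall into a single score. A minimal sketch over hypothetical sets of detected and actual fire-event ids:

```python
def f_measure(predicted, actual, beta=1.0):
    """F-score from two sets of event ids: the weighted harmonic
    combination of precision and recall (beta=1 gives the F1 score)."""
    tp = len(predicted & actual)       # true positives
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(actual)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

fires_detected = {1, 2, 3, 4}          # events the model raised
fires_actual = {2, 3, 4, 5, 6}         # events that really occurred
print(round(f_measure(fires_detected, fires_actual), 3))
```

Here precision is 3/4 and recall is 3/5, so the F1 score sits between them at 2/3.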
4

Kokkonen, H. (Henna). "Effects of data cleaning on machine learning model performance." Bachelor's thesis, University of Oulu, 2019. http://jultika.oulu.fi/Record/nbnfioulu-201911133081.

Abstract:
This thesis focuses on the preprocessing and challenges of a university student data set, and on how different levels of data preprocessing affect the performance of a prediction model, both in general and in selected groups of interest. The data set comprises the students at the University of Oulu who were admitted to the Faculty of Information Technology and Electrical Engineering during the years 2006–2015. This data set was cleaned at three different levels, which resulted in three differently processed data sets: the first is the original data set with only basic cleaning, the second has been cleaned of the most obvious anomalies, and the third has been systematically cleaned of possible anomalies. Each of these data sets was used to build a Gradient Boosting Machine model that predicted the cumulative number of ECTS credits the students would achieve by the end of their second-year studies, based on their first-year studies and their Matriculation Examination results. The effects of the cleaning on model performance were examined by comparing the prediction accuracy and the information the models gave about the factors that might indicate slow ECTS accumulation. The results showed that the prediction accuracy improved after each cleaning stage and that the influences of the features altered significantly, becoming more reasonable.
5

Jia, Xibei. "From relations to XML : cleaning, integrating and securing data." Thesis, University of Edinburgh, 2008. http://hdl.handle.net/1842/3161.

Abstract:
While relational databases are still the preferred approach for storing data, XML is emerging as the primary standard for representing and exchanging data. Consequently, it has become increasingly important to provide a uniform XML interface to various data sources (integration), and critical to protect sensitive and confidential information in XML data (access control). Moreover, it is preferable to first detect and repair the inconsistencies in the data to avoid the propagation of errors to other data processing steps. In response to these challenges, this thesis presents an integrated framework for cleaning, integrating and securing data. The framework contains three parts. First, the data cleaning sub-framework makes use of a new class of constraints specially designed for improving data quality, referred to as conditional functional dependencies (CFDs), to detect and remove inconsistencies in relational data. Both batch and incremental techniques are developed for efficiently detecting CFD violations with SQL and repairing them based on a cost model. The cleaned relational data, together with other non-XML data, is then converted to XML format using widely deployed XML publishing facilities. Second, the data integration sub-framework uses a novel formalism, XML integration grammars (XIGs), to integrate multi-source XML data that is either native or published from traditional databases. XIGs automatically support conformance to a target DTD and allow one to build a large, complex integration via composition of component XIGs. To efficiently materialize the integrated data, algorithms are developed for merging XML queries in XIGs and for scheduling them. Third, to protect sensitive information in the integrated XML data, the data security sub-framework allows users to access the data only through authorized views.
User queries posed on these views need to be rewritten into equivalent queries on the underlying document to avoid the prohibitive cost of materializing and maintaining a large number of views. Two algorithms are proposed to support virtual XML views: a rewriting algorithm that characterizes the rewritten queries as a new form of automata, and an evaluation algorithm to execute the automata-represented queries. They allow the security sub-framework to answer queries on views in linear time. Using both relational and XML technologies, this framework provides a uniform approach to clean, integrate and secure data. The algorithms and techniques in the framework have been implemented, and the experimental study verifies their effectiveness and efficiency.
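The CFD-based detection described above can be sketched in miniature. A CFD restricts a functional dependency to the tuples matching a pattern; the relation, pattern, and columns below are hypothetical, and the check is done in plain Python rather than via the SQL-based technique the thesis develops:

```python
# A CFD: for tuples matching the pattern, the left-hand attributes
# must determine the right-hand attribute.  '_' in the pattern
# matches any value.
def cfd_violations(rows, lhs, rhs, pattern):
    """Return indices of tuples that break the CFD."""
    seen = {}
    bad = []
    for i, row in enumerate(rows):
        if any(p != "_" and row[a] != p for a, p in pattern.items()):
            continue  # tuple not covered by this CFD's pattern
        key = tuple(row[a] for a in lhs)
        if key in seen and seen[key][1] != row[rhs]:
            bad.append(i)
        else:
            seen.setdefault(key, (i, row[rhs]))
    return bad

rows = [
    {"country": "UK", "zip": "EH8", "city": "Edinburgh"},
    {"country": "UK", "zip": "EH8", "city": "London"},     # violates the CFD
    {"country": "US", "zip": "EH8", "city": "Anywhere"},   # not covered
]
# CFD: within the UK, zip code determines city.
print(cfd_violations(rows, lhs=["zip"], rhs="city", pattern={"country": "UK"}))
```

The repair step would then pick which of the conflicting tuples to change, guided by a cost model.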
6

Bischof, Stefan, Benedikt Kämpgen, Andreas Harth, Axel Polleres, and Patrik Schneider. "Open City Data Pipeline." Department für Informationsverarbeitung und Prozessmanagement, WU Vienna University of Economics and Business, 2017. http://epub.wu.ac.at/5438/1/city%2Dqb%2Dpaper.pdf.

Abstract:
Statistical data about cities, regions and countries is collected for various purposes and from various institutions. Yet, while access to high-quality and recent such data is crucial both for decision makers and for the public, all too often such collections of data remain isolated and not re-usable, let alone properly integrated. In this paper we present the Open City Data Pipeline, a focused attempt to collect, integrate, and enrich statistical data collected at city level worldwide, and to republish this data in a reusable manner as Linked Data. The main features of the Open City Data Pipeline are: (i) we integrate and cleanse data from several sources in a modular, extensible, always up-to-date fashion; (ii) we use both machine learning techniques and ontological reasoning over equational background knowledge to enrich the data by imputing missing values; (iii) we assess the estimated accuracy of such imputations per indicator. Additionally, (iv) we make the integrated and enriched data available both in a web browser interface and as machine-readable Linked Data, using standard vocabularies such as QB and PROV, and linking to e.g. DBpedia. Lastly, in an exhaustive evaluation of our approach, we compare our enrichment and cleansing techniques to a preliminary version of the Open City Data Pipeline presented at ISWC2015: firstly, we demonstrate that the combination of equational knowledge and standard machine learning techniques significantly helps to improve the quality of our missing value imputations; secondly, we arguably show that the more data we integrate, the more reliable our predictions become. Hence, over time, the Open City Data Pipeline shall provide a sustained effort to serve Linked Data about cities in increasing quality.
Series: Working Papers on Information Systems, Information Business and Operations
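The imputation-plus-accuracy-assessment idea in the abstract above can be sketched with deliberately simple mean imputation; the pipeline itself uses machine learning and ontological reasoning, and the indicator values here are made up:

```python
def impute_mean(values):
    """Fill None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def imputation_error(values):
    """Estimate imputation accuracy per indicator by hiding each observed
    value in turn and measuring the absolute error of re-imputing it."""
    idx = [i for i, v in enumerate(values) if v is not None]
    errors = []
    for i in idx:
        hidden = values[:]
        truth, hidden[i] = hidden[i], None
        errors.append(abs(impute_mean(hidden)[i] - truth))
    return sum(errors) / len(errors)

population_density = [120.0, 130.0, None, 110.0]   # hypothetical indicator
print(impute_mean(population_density))
print(round(imputation_error(population_density), 2))
```

Reporting the estimated error alongside each imputed indicator is what lets consumers of the data decide how much to trust the filled-in values.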
7

Pumpichet, Sitthapon. "Novel Online Data Cleaning Protocols for Data Streams in Trajectory, Wireless Sensor Networks." FIU Digital Commons, 2013. http://digitalcommons.fiu.edu/etd/1004.

Abstract:
The promise of Wireless Sensor Networks (WSNs) is the autonomous collaboration of a collection of sensors to accomplish specific goals that a single sensor cannot achieve. Basically, sensor networking serves a range of applications by providing raw data as the foundation for further analyses and actions. Imprecision in the collected data can tremendously mislead the decision-making process of sensor-based applications, resulting in ineffectiveness or failure of the application objectives. Because inherent WSN characteristics normally spoil the raw sensor readings, many research efforts attempt to improve the accuracy of the corrupted or "dirty" sensor data: the dirty data need to be cleaned or corrected. However, existing data cleaning solutions restrict themselves to the scope of static WSNs, in which deployed sensors rarely move during operation. Nowadays, many emerging applications relying on WSNs need sensor mobility to enhance application efficiency and usage flexibility: the locations of deployed sensors need to be dynamic, and each sensor functions independently while contributing its resources. Sensors mounted on vehicles for monitoring traffic conditions are one prospective example. Sensor mobility causes transients in network topology and in the correlation among sensor streams. Because they are based on static relationships among sensors, the existing methods for cleaning sensor data in static WSNs are invalid in such mobile scenarios. Therefore, a data cleaning solution that considers sensor movements is actively needed. This dissertation aims to improve the quality of sensor data by considering the consequences of various trajectory relationships of autonomous mobile sensors in the system. First of all, we address the dynamic network topology due to sensor mobility.
The concept of a virtual sensor is presented and used for spatio-temporal selection of neighboring sensors to help clean sensor data streams. This method is one of the first to clean data in mobile sensor environments. We also study the mobility pattern of moving sensors relative to the boundaries of sub-areas of interest, and develop a belief-based analysis to determine reliable sets of neighboring sensors to improve cleaning performance, especially when node density is relatively low. Finally, we design a novel sketch-based technique to clean data from internal sensors where spatio-temporal relationships among sensors cannot lead to data correlations among sensor streams.
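The virtual-sensor idea (cleaning a suspect reading using spatio-temporally close neighbours) can be sketched as follows; the selection radius, time window, and plain averaging rule are assumptions for illustration, not the dissertation's belief-based analysis:

```python
def clean_reading(target, neighbors, radius=10.0, window=5.0):
    """Replace a suspect reading with the average of readings from
    neighbours that are spatially and temporally close; the selected
    neighbours together act as a 'virtual sensor' at the target's spot.
    target / neighbors: dicts with keys x, y, t, value."""
    close = [n["value"] for n in neighbors
             if ((n["x"] - target["x"]) ** 2
                 + (n["y"] - target["y"]) ** 2) ** 0.5 <= radius
             and abs(n["t"] - target["t"]) <= window]
    return sum(close) / len(close) if close else target["value"]

suspect = {"x": 0.0, "y": 0.0, "t": 100.0, "value": 999.0}  # obvious spike
nearby = [
    {"x": 1.0, "y": 0.0, "t": 99.0, "value": 21.0},
    {"x": 0.0, "y": 2.0, "t": 101.0, "value": 23.0},
    {"x": 50.0, "y": 0.0, "t": 100.0, "value": 5.0},   # too far away
]
print(clean_reading(suspect, nearby))  # -> 22.0
```

With mobile sensors, the neighbour set must be recomputed per reading, which is exactly why static-topology cleaning methods break down.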
8

Artilheiro, Fernando Manuel Freitas. "Analysis and procedures of multibeam data cleaning for bathymetric charting." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1996. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp04/mq23776.pdf.

9

Ramakrishnan, Ranjani. "A data cleaning and annotation framework for genome-wide studies." Thesis, Oregon Health & Science University, 2007. http://content.ohsu.edu/u?/etd,263.

10

Hallström, Fredrik, and David Adolfsson. "Data Cleaning Extension on IoT Gateway : An Extended ThingsBoard Gateway." Thesis, Karlstads universitet, Institutionen för matematik och datavetenskap (from 2013), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-84376.

Abstract:
Machine learning algorithms that run on Internet of Things sensory data require high data quality to produce relevant output. By providing data cleaning at the edge, cloud infrastructures performing AI computations are relieved of preprocessing. The main problem connected with edge cleaning is the dependency on unsupervised pre-processing, as it leaves no guarantee of high-quality output data. In this thesis, an IoT gateway is extended to provide cleaning and live configuration of cleaning parameters before forwarding the data to a server cluster. Live configuration is implemented so that the parameters can be fitted to a given time series, thereby mitigating quality issues. The gateway framework's performance and the container's resource usage were benchmarked using an MQTT stress tester. The gateway's performance was below expectation: with high-frequency data streams, the throughput was below 50%. However, these issues are not present for its Glava Energy Center connector, as that sensory data is generated at a slower pace.
AI4ENERGY
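A gateway-side cleaning step of the kind the thesis describes can be sketched as a range and spike filter applied before forwarding; the thresholds and filter rules here are illustrative assumptions, not the actual ThingsBoard extension:

```python
def edge_clean(stream, lo=-40.0, hi=85.0, max_jump=10.0):
    """Drop out-of-range readings and flatten sudden spikes before the
    gateway forwards the stream; in the thesis's design such parameters
    are live-configurable, but they are fixed here for brevity."""
    cleaned = []
    last = None
    for v in stream:
        if not lo <= v <= hi:
            continue                     # outside the sensor's valid range
        if last is not None and abs(v - last) > max_jump:
            v = last                     # treat a sudden jump as a spike
        cleaned.append(v)
        last = v
    return cleaned

temps = [20.1, 20.3, 250.0, 20.4, 55.0, 20.6]   # hypothetical sensor stream
print(edge_clean(temps))
```

Running this on the gateway means the cloud side receives a stream that needs no further preprocessing, at the cost of the unsupervised-cleaning risk the abstract notes.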

Books on the topic "Cleaning of data"

1

Exploratory data mining and data cleaning. Hoboken, NJ: John Wiley & Sons, 2004.

2

Best practices in data cleaning. Thousand Oaks: SAGE, 2013.

3

SAS Institute, ed. Cody's data cleaning techniques using SAS. 2nd ed. Cary, NC: SAS Institute Inc., 2008.

4

SAS Institute, ed. Cody's data cleaning techniques using SAS software. Cary, NC: SAS Institute Inc., 1999.

5

de Jonge, Edwin, and Mark van der Loo. Statistical Data Cleaning with Applications in R. Chichester, UK: John Wiley & Sons, Ltd, 2018. http://dx.doi.org/10.1002/9781118897126.

6

Buttrey, Samuel. A Data Scientist's Guide to Acquiring, Cleaning and Managing Data in R. Chichester, UK: John Wiley & Sons Ltd, 2017. http://dx.doi.org/10.1002/9781119080053.

7

Gibbs, Roger. A review of the data available on cleaning services. [London?: Department of Trade and Industry?], 1987.

8

Kimball, Ralph. The data warehouse ETL toolkit: Practical techniques for extracting, cleaning, conforming, and delivering data. Indianapolis, IN: Wiley, 2004.

9

Waschbusch, Robert J. Data and methods of a 1999-2000 street sweeping study on an urban freeway in Milwaukee County, Wisconsin. Middleton, Wis: U.S. Dept. of the Interior, U.S. Geological Survey, 2003.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Long, Stephen H., M. Susan Marquis, Robert Wood Johnson Foundation, and Rand Corporation, eds. Data cleaning procedures for the 1993 Robert Wood Johnson Foundation family health insurance survey. Santa Monica, CA: Rand, 1997.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Cleaning of data"

1

Van den Broeck, Jan, and Lars Thore Fadnes. "Data Cleaning." In Epidemiology: Principles and Practical Guidelines, 389–99. Dordrecht: Springer Netherlands, 2013. http://dx.doi.org/10.1007/978-94-007-5989-3_20.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Chu, Xu. "Data Cleaning." In Encyclopedia of Big Data Technologies, 535–41. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-319-77525-8_3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Chu, Xu. "Data Cleaning." In Encyclopedia of Big Data Technologies, 1–7. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-63962-8_3-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Whitmore, Nathan. "Data cleaning." In R for Conservation and Development Projects, 125–44. 1st ed. Chapman & Hall/CRC the R Series. Boca Raton: Chapman and Hall/CRC, 2020. http://dx.doi.org/10.1201/9780429262180-ch10.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Ganti, Venkatesh. "Data Cleaning." In Encyclopedia of Database Systems, 737–41. New York, NY: Springer New York, 2018. http://dx.doi.org/10.1007/978-1-4614-8265-9_592.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Ganti, Venkatesh. "Data Cleaning." In Encyclopedia of Database Systems, 561–64. Boston, MA: Springer US, 2009. http://dx.doi.org/10.1007/978-0-387-39940-9_592.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Winson-Geideman, Kimberly, Andy Krause, Clifford A. Lipscomb, and Nicholas Evangelopoulos. "Data cleaning." In Real Estate Analysis in the Information Age, 86–100. Abingdon, Oxon: Routledge, 2017. http://dx.doi.org/10.4324/9781315311135-9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Ganti, Venkatesh. "Data Cleaning." In Encyclopedia of Database Systems, 1–4. New York, NY: Springer New York, 2016. http://dx.doi.org/10.1007/978-1-4899-7993-3_592-2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Costello, Tim, and Lori Blackshear. "Cleaning." In Prepare Your Data for Tableau, 91–119. Berkeley, CA: Apress, 2019. http://dx.doi.org/10.1007/978-1-4842-5497-4_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Li, Deren, Shuliang Wang, and Deyi Li. "Spatial Data Cleaning." In Spatial Data Mining, 119–55. Berlin, Heidelberg: Springer Berlin Heidelberg, 2015. http://dx.doi.org/10.1007/978-3-662-48538-5_4.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Cleaning of data"

1

Chu, Xu, Ihab F. Ilyas, Sanjay Krishnan, and Jiannan Wang. "Data Cleaning." In SIGMOD/PODS'16: International Conference on Management of Data. New York, NY, USA: ACM, 2016. http://dx.doi.org/10.1145/2882903.2912574.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Volkovs, Maksims, Fei Chiang, Jaroslaw Szlichta, and Renee J. Miller. "Continuous data cleaning." In 2014 IEEE 30th International Conference on Data Engineering (ICDE). IEEE, 2014. http://dx.doi.org/10.1109/icde.2014.6816655.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Alipour-Langouri, Morteza, Zheng Zheng, Fei Chiang, Lukasz Golab, and Jaroslaw Szlichta. "Contextual Data Cleaning." In 2018 IEEE 34th International Conference on Data Engineering Workshops (ICDEW). IEEE, 2018. http://dx.doi.org/10.1109/icdew.2018.00010.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Tang, Jie, Hang Li, Yunbo Cao, and Zhaohui Tang. "Email data cleaning." In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. New York, New York, USA: ACM Press, 2005. http://dx.doi.org/10.1145/1081870.1081926.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Zhang, Aoqian, Shaoxu Song, and Jianmin Wang. "Sequential Data Cleaning." In SIGMOD/PODS'16: International Conference on Management of Data. New York, NY, USA: ACM, 2016. http://dx.doi.org/10.1145/2882903.2915233.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Johnson, Theodore, and Tamraparni Dasu. "Data quality and data cleaning." In Proceedings of the 2003 ACM SIGMOD international conference on Management of data. New York, New York, USA: ACM Press, 2003. http://dx.doi.org/10.1145/872757.872875.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Parulian, Nikolaus N., and Bertram Ludascher. "Towards Transparent Data Cleaning: The Data Cleaning Model Explorer (DCM/X)." In 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL). IEEE, 2021. http://dx.doi.org/10.1109/jcdl52503.2021.00054.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Hua, Ming, and Jian Pei. "Cleaning disguised missing data." In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. New York, New York, USA: ACM Press, 2007. http://dx.doi.org/10.1145/1281192.1281294.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Tang, Nan. "Big RDF data cleaning." In 2015 31st IEEE International Conference on Data Engineering Workshops (ICDEW). IEEE, 2015. http://dx.doi.org/10.1109/icdew.2015.7129549.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Musleh, Mashaal, Mourad Ouzzani, Nan Tang, and AnHai Doan. "CoClean: Collaborative Data Cleaning." In SIGMOD/PODS '20: International Conference on Management of Data. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3318464.3384698.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Cleaning of data"

1

International Food Policy Research Institute (IFPRI). A guide to data cleaning using Stata. Washington, DC: International Food Policy Research Institute, 2018. http://dx.doi.org/10.2499/1024320680.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Bollinger, Christopher, and Amitabh Chandra. Iatrogenic Specification Error: A Cautionary Tale of Cleaning Data. Cambridge, MA: National Bureau of Economic Research, March 2003. http://dx.doi.org/10.3386/t0289.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Marinshaw, Richard J., and Hazem Qawasmeh. Characterizing Water Use at Mosques in Abu Dhabi. RTI Press, April 2020. http://dx.doi.org/10.3768/rtipress.2020.mr.0042.2004.

Full text
Abstract:
In areas where Muslims constitute much of the population, mosques can account for a significant portion of overall water consumption. Among the various uses of water at mosques, ablution (i.e., ritual cleansing) is generally assumed to be the largest, by far. As part of an initiative to reduce water consumption at mosques in Abu Dhabi, we collected data on ablution and other end uses for water from hundreds of mosques in and around Abu Dhabi City. This paper takes a closer look at how water is used at mosques in Abu Dhabi and presents a set of water use profiles that provide a breakdown of mosque water consumption by end use. The results of this research indicate that cleaning the mosque (primarily the floors) and some of the other non-ablution end uses at mosques can account for a significant portion of the total water consumption and significantly more than was anticipated or has been found in other countries.
APA, Harvard, Vancouver, ISO, and other styles
4

Martin, Mark, Lance Vowell, Ian King, and Chris Augustus. Automated Data Cleansing in Data Harvesting and Data Migration. Office of Scientific and Technical Information (OSTI), March 2011. http://dx.doi.org/10.2172/949761.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Adjaye-Gbewonyo, Dzifa, and Lindsey Back. Dental Care Utilization Among Children Aged 1–17 Years: United States, 2019 and 2020. National Center for Health Statistics (U.S.), December 2021. http://dx.doi.org/10.15620/cdc:111175.

Full text
Abstract:
This report uses data from the 2019 and 2020 National Health Interview Survey (NHIS) to describe recent changes in the prevalence of dental examinations or cleanings in the past 12 months among children aged 1–17 years by selected sociodemographic characteristics.
APA, Harvard, Vancouver, ISO, and other styles