Journal articles on the topic 'Cleaning of data'

1

Pahwa, Payal, and Rashmi Chhabra. "BST Algorithm for Duplicate Elimination in Data Warehouse." International Journal of Management & Information Technology 4, no. 1 (June 26, 2013): 190–97. http://dx.doi.org/10.24297/ijmit.v4i1.4636.

Abstract:
Data warehousing is an emerging technology and has proved to be very important for organizations. Today every business organization needs large amounts of accurate information to make proper decisions. For taking business decisions, the data should be of good quality, and improving data quality requires data cleansing. Data cleansing is fundamental to warehouse data reliability and to data warehousing success. There are various methods for data cleansing. This paper addresses issues related to data cleaning, focusing on the detection of duplicate records. An efficient algorithm for data cleaning is also proposed, and a review of data cleansing methods with a comparison between them is presented.
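The abstract does not spell out the algorithm, but the title points to a binary search tree (BST). A minimal Python sketch of that general idea, assuming each record is a tuple of strings and that two records count as duplicates when their trimmed, case-folded fields match exactly. The helper names `normalize`, `insert`, and `eliminate_duplicates` are hypothetical, and this is not the authors' algorithm:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    key: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def normalize(record: Tuple[str, ...]) -> str:
    # Hypothetical matching key: trimmed, case-folded fields joined by "|"
    return "|".join(value.strip().casefold() for value in record)


def insert(root: Optional[Node], key: str) -> Tuple[Node, bool]:
    """Insert key into the BST; the bool is True if the key was already present."""
    if root is None:
        return Node(key), False
    node = root
    while True:
        if key == node.key:
            return root, True  # duplicate found, tree unchanged
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root, False
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root, False
            node = node.right


def eliminate_duplicates(records):
    """Keep the first occurrence of each record, preserving input order."""
    root = None
    unique = []
    for record in records:
        root, seen = insert(root, normalize(record))
        if not seen:
            unique.append(record)
    return unique
```

An unbalanced BST degrades to linear-time lookups when keys arrive in sorted order; a self-balancing tree or a hash set avoids that.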
2

Chu, Xu, and Ihab F. Ilyas. "Qualitative data cleaning." Proceedings of the VLDB Endowment 9, no. 13 (September 2016): 1605–8. http://dx.doi.org/10.14778/3007263.3007320.

3

Broman, Karl W. "Cleaning genotype data." Genetic Epidemiology 17, S1 (1999): S79–S83. http://dx.doi.org/10.1002/gepi.1370170714.

4

Singh, Mohini. "Cleaning Up Company Data." CFA Institute Magazine 27, no. 1 (March 2016): 53. http://dx.doi.org/10.2469/cfm.v27.n1.18.

5

Geerts, Floris, Giansalvatore Mecca, Paolo Papotti, and Donatello Santoro. "Cleaning data with Llunatic." VLDB Journal 29, no. 4 (November 8, 2019): 867–92. http://dx.doi.org/10.1007/s00778-019-00586-5.

6

Zhang, Aoqian, Shaoxu Song, Jianmin Wang, and Philip S. Yu. "Time series data cleaning." Proceedings of the VLDB Endowment 10, no. 10 (June 2017): 1046–57. http://dx.doi.org/10.14778/3115404.3115410.

7

Holstad, Mark S. "Data Driven Interceptor Cleaning." Proceedings of the Water Environment Federation 2010, no. 8 (January 1, 2010): 7636–64. http://dx.doi.org/10.2175/193864710798207792.

8

Karr, Alan F. "Exploratory Data Mining and Data Cleaning." Journal of the American Statistical Association 101, no. 473 (March 2006): 399. http://dx.doi.org/10.1198/jasa.2006.s81.

9

Rahul, Kumar, and Rohitash Kumar Banyal. "Detection and Correction of Abnormal Data with Optimized Dirty Data: A New Data Cleaning Model." International Journal of Information Technology & Decision Making 20, no. 02 (March 2021): 809–41. http://dx.doi.org/10.1142/s0219622021500188.

Abstract:
Every business enterprise requires noise-free, clean data. Because a data warehouse continuously loads and refreshes large quantities of data from various sources, the amount of dirty data tends to grow. Hence, to avoid wrong conclusions, data cleaning becomes vital in data-driven projects. This paper introduces a novel data cleaning technique for the effective removal of dirty data. The process involves two steps: (i) dirty data detection and (ii) dirty data cleaning. Dirty data detection consists of data normalization, hashing, clustering, and finding the suspected data. In the clustering process, the optimal selection of centroids is the key step and is carried out using an optimization approach. Once dirty data prediction is finished, the dirty data cleaning step begins. The cleaning step likewise consists of a leveling process, Huffman coding, and cleaning of the suspected data, the last of which is also performed using optimization. To solve all of these optimization problems, a new hybrid algorithm called the Firefly Update Enabled Rider Optimization Algorithm (FU-ROA) is proposed, which hybridizes the Rider Optimization Algorithm (ROA) and the Firefly (FF) algorithm. Finally, the performance of the implemented data cleaning method is compared with traditional methods such as Particle Swarm Optimization (PSO), FF, Grey Wolf Optimizer (GWO), and ROA in terms of their positive and negative measures. The results show that, at iteration 12, the performance of the proposed FU-ROA model for test case 1 was 0.013%, 0.7%, 0.64%, and 0.29% better than the existing PSO, FF, GWO, and ROA models, respectively.
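The first two stages of the detection step (normalization and hashing) can be illustrated without the optimization machinery. A minimal Python sketch, assuming records are dictionaries of field values and that "normalization" means sorting field names, collapsing whitespace, and lower-casing values; the paper's actual normalization is not described in the abstract. Records whose normalized forms hash to the same SHA-256 digest are grouped as suspected duplicates. The clustering step and the FU-ROA optimization are not reproduced, and the function names are hypothetical:

```python
import hashlib
from collections import defaultdict


def normalize(record):
    """Assumed normalization: sorted field names, collapsed whitespace, lower-cased values."""
    parts = []
    for name in sorted(record):
        value = " ".join(str(record[name]).split()).lower()
        parts.append(f"{name}={value}")
    return ";".join(parts)


def suspected_groups(records):
    """Return lists of record indices whose normalized forms share a hash digest."""
    buckets = defaultdict(list)
    for index, record in enumerate(records):
        digest = hashlib.sha256(normalize(record).encode("utf-8")).hexdigest()
        buckets[digest].append(index)
    # Only buckets holding more than one record are suspicious
    return [indices for indices in buckets.values() if len(indices) > 1]
```

Hashing only catches records that become identical after normalization; near-duplicates that differ by a typo need a similarity-based step such as clustering.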
10

Pandya, Sohil D., and Paresh V. Virparia. "Context Free Data Cleaning and its Application in Mechanism for Suggestive Data Cleaning." International Journal of Information Science 1, no. 1 (August 31, 2012): 32–35. http://dx.doi.org/10.5923/j.ijis.20110101.05.

11

Kishore, Kamal, and Amarjeet Singh. "Statistics Corner: Data Cleaning-I." Journal of Postgraduate Medicine, Education and Research 53, no. 3 (2019): 130–32. http://dx.doi.org/10.5005/jp-journals-10028-1330.

12

Hayes, Patricia. "The Ethics of Cleaning Data." Clinical Nursing Research 13, no. 2 (May 2004): 95–97. http://dx.doi.org/10.1177/1054773804263173.

13

Hunt, Neville, and Sidney Tyrrell. "Cleaning Dirty Data in Excel." Teaching Statistics 24, no. 3 (August 22, 2002): 90–92. http://dx.doi.org/10.1111/1467-9639.00096.

14

Geerts, Floris, Giansalvatore Mecca, Paolo Papotti, and Donatello Santoro. "The LLUNATIC data-cleaning framework." Proceedings of the VLDB Endowment 6, no. 9 (July 2013): 625–36. http://dx.doi.org/10.14778/2536360.2536363.

15

Rammelaere, Joeri, and Floris Geerts. "Cleaning Data with Forbidden Itemsets." IEEE Transactions on Knowledge and Data Engineering 32, no. 8 (August 1, 2020): 1489–501. http://dx.doi.org/10.1109/tkde.2019.2905548.

16

Richards, Kate, and Neville Davies. "Cleaning data: guess the olympian." Teaching Statistics 34, no. 1 (January 16, 2012): 31–37. http://dx.doi.org/10.1111/j.1467-9639.2011.00495.x.

17

Barhyte, Diana Y., and Lynd D. Bacon. "Approaches to Cleaning Data Sets." Nursing Research 34, no. 1 (January 1985): 62–64. http://dx.doi.org/10.1097/00006199-198501000-00013.

18

Roberts, Beverly L., Mary K. Anthony, Elizabeth A. Madigan, and Yan Chen. "Data Management: Cleaning and Checking." Nursing Research 46, no. 6 (November 1997): 350–52. http://dx.doi.org/10.1097/00006199-199711000-00010.

19

Ganti, Venkatesh, and Anish Das Sarma. "Data Cleaning: A Practical Perspective." Synthesis Lectures on Data Management 5, no. 3 (September 21, 2013): 1–85. http://dx.doi.org/10.2200/s00523ed1v01y201307dtm036.

20

Wong, Jing Ting, and Jer Lang Hong. "Data Cleaning Utilizing Ontology Tool." International Journal of Grid and Distributed Computing 9, no. 7 (July 31, 2016): 43–52. http://dx.doi.org/10.14257/ijgdc.2016.9.7.05.

21

Goerge, Robert M., and Bong Joo Lee. "Matching and cleaning administrative data." New Zealand Economic Papers 36, no. 1 (June 2002): 63–64. http://dx.doi.org/10.1080/00779950209544351.

22

Carey, Ronell, and Maurice Craig. "Cleaning Scattered Multi-Channel Data." Exploration Geophysics 35, no. 2 (June 2004): 131–36. http://dx.doi.org/10.1071/eg04131.

23

Bhattacharjee, Arup Kumar. "Data Cleaning in Text File." IOSR Journal of Computer Engineering 9, no. 2 (2013): 17–21. http://dx.doi.org/10.9790/0661-0921721.

24

P.N.V., Syamala Rao. "A Comprehensive Survey of Financial Data Modelling Processes & Data Cleaning Methods Using Composite Coefficient." Journal of Advanced Research in Dynamical and Control Systems 12, no. 01-Special Issue (February 13, 2020): 882–99. http://dx.doi.org/10.5373/jardcs/v12sp1/20201141.

25

Hodkiewicz, Melinda, and Mark Tien-Wei Ho. "Cleaning historical maintenance work order data for reliability analysis." Journal of Quality in Maintenance Engineering 22, no. 2 (May 9, 2016): 146–63. http://dx.doi.org/10.1108/jqme-04-2015-0013.

Abstract:
Purpose – The purpose of this paper is to identify quality issues with using historical work order (WO) data from computerised maintenance management systems for reliability analysis, and to develop an efficient and transparent process that corrects these data quality issues so that the data are fit for purpose in a timely manner. Design/methodology/approach – This paper develops a rule-based approach to data cleansing and demonstrates the process on data for heavy mobile equipment from a number of organisations. Findings – Although historical WO records frequently contain missing or incorrect functional location, failure mode, maintenance action and WO status fields, the authors demonstrate that it is possible to make these records fit for purpose by using data in the freeform text fields, an understanding of the maintenance tactics and practices at the operation, and knowledge of where the asset is in its life cycle. The authors demonstrate that it is possible to have a repeatable and transparent process to deal with the data cleaning activities. Originality/value – How engineers deal with raw maintenance data, and the decisions they make in order to produce a data set for reliability analysis, is seldom discussed in detail. Assumptions and actions are often left undocumented. This paper describes typical data cleaning decisions we all have to make as a routine part of the analysis and presents a process to support these decisions in a repeatable and transparent fashion.
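A rule-based pass of this kind stays repeatable and transparent if every change it makes is recorded. A minimal Python sketch, assuming a hypothetical `WorkOrder` record with a free-text description and an optional failure-mode field, and an illustrative keyword-to-failure-mode rule table that is not the authors' rule set. Missing failure modes are filled from the first matching keyword, recorded values are left untouched, and each change is appended to an audit log:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative (hypothetical) rules: keyword in free text -> failure mode to assign
RULES = [
    ("leak", "leaking"),
    ("crack", "structural failure"),
    ("overheat", "overheating"),
]


@dataclass
class WorkOrder:
    wo_id: str
    text: str
    failure_mode: Optional[str] = None


def apply_rules(orders, rules=RULES):
    """Fill missing failure modes from free text; return an audit log of changes."""
    log = []
    for wo in orders:
        if wo.failure_mode:  # keep recorded values; only fill gaps
            continue
        lowered = wo.text.lower()
        for keyword, mode in rules:
            if keyword in lowered:
                wo.failure_mode = mode
                log.append((wo.wo_id, "failure_mode", mode, f"matched keyword '{keyword}'"))
                break  # first matching rule wins, so rule order matters
    return log
```

Keeping the rules as data rather than code means the same table can be reviewed, versioned, and re-run on the next data extract.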
26

Kulkarni, Prerana S., and J. W. Bakal. "Hybrid Approaches for Data Cleaning in Data Warehouse." International Journal of Computer Applications 88, no. 18 (February 14, 2014): 7–10. http://dx.doi.org/10.5120/15450-3813.

27

Van den Broeck, Jan, Solveig Argeseanu Cunningham, Roger Eeckels, and Kobus Herbst. "Data Cleaning: Detecting, Diagnosing, and Editing Data Abnormalities." PLoS Medicine 2, no. 10 (September 6, 2005): e267. http://dx.doi.org/10.1371/journal.pmed.0020267.

28

Gueta, Tomer, Vijay Barve, Thiloshon Nagarajah, Ashwin Agrawal, and Yohay Carmel. "Introducing bdclean: a user friendly biodiversity data cleaning pipeline." Biodiversity Information Science and Standards 2 (May 22, 2018): e25564. http://dx.doi.org/10.3897/biss.2.25564.

Abstract:
A new R package for biodiversity data cleaning, 'bdclean', was initiated in the Google Summer of Code (GSoC) 2017 and is available on GitHub. Several R packages have great data validation and cleaning functions, but 'bdclean' provides features to manage a complete pipeline for biodiversity data cleaning, from data quality exploration to cleaning procedures and reporting. Users are able to go through the quality control process in a very structured, intuitive, and effective way. A modular approach to data cleaning functionality should make this package extensible for many biodiversity data cleaning needs. Under GSoC 2018, 'bdclean' will go through a comprehensive upgrade. New features will be highlighted in the demonstration.
29

Oni, Samson, Zhiyuan Chen, Susan Hoban, and Onimi Jademi. "A Comparative Study of Data Cleaning Tools." International Journal of Data Warehousing and Mining 15, no. 4 (October 2019): 48–65. http://dx.doi.org/10.4018/ijdwm.2019100103.

Abstract:
In the information era, data is crucial in decision making. Most data sets contain impurities that need to be weeded out before any meaningful decision can be made from the data. Hence, data cleaning is essential and often takes more than 80 percent of the time and resources of the data analyst. Adequate tools and techniques must be used for data cleaning. Many data cleaning tools exist, but it is unclear how to choose among them in various situations. This research aims to help researchers and organizations choose the right tools for data cleaning. This article conducts a comparative study of four commonly used data cleaning tools on two real data sets and answers the research question of which tool is useful in different scenarios.
30

Zhang, S. Z., Y. R. Yu, and M. Z. Shen. "Study on Preprocessing Method of TCM Prescription Data in Data Mining." Journal of Physics: Conference Series 2025, no. 1 (September 1, 2021): 012027. http://dx.doi.org/10.1088/1742-6596/2025/1/012027.

Abstract:
Traditional Chinese medicine (TCM) prescriptions have been developed over thousands of years. Their data take diverse forms, their content is discrete and often incomplete, and cultural and regional differences introduce many uncertainties, all of which make mining TCM prescriptions difficult. Taking 3,108 prescriptions for the treatment of typhoid fever as an example, this paper concentrates on the data cleaning and data transformation stages of preprocessing. Combining multiple functions, it describes the cleansing of unqualified prescription data, the normalization of drug names, the unification of doses, and a method for structuring the data, so that the processed data can be mined effectively. This provides strong support for exploring the compatibility laws of prescriptions and for the development of new drugs.
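Drug-name normalization and dose unification can be sketched as an alias lookup plus a unit conversion. A minimal Python sketch, assuming a hypothetical alias table and doses written as a number followed by `mg`, `g`, or `kg`; traditional dosage units and the paper's actual normalization rules are not covered:

```python
import re

# Hypothetical alias table mapping spelling variants to one canonical name
ALIASES = {"licorice root": "gancao", "liquorice root": "gancao", "gan cao": "gancao"}
# Conversion factors to grams for the units this sketch accepts
TO_GRAMS = {"mg": 0.001, "g": 1.0, "kg": 1000.0}
# "mg" and "kg" are tried before "g" so the longer unit wins
DOSE_RE = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*(mg|kg|g)\s*$", re.IGNORECASE)


def normalize_name(name):
    """Collapse whitespace, lower-case, then map known aliases to a canonical name."""
    key = " ".join(name.split()).lower()
    return ALIASES.get(key, key)


def dose_in_grams(dose):
    """Parse a dose string such as '6 g' or '3000mg' into grams."""
    match = DOSE_RE.match(dose)
    if match is None:
        raise ValueError(f"unrecognized dose: {dose!r}")
    value, unit = match.groups()
    return float(value) * TO_GRAMS[unit.lower()]


def normalize_prescription(items):
    """Turn (name, dose) pairs into (canonical name, grams) pairs."""
    return [(normalize_name(name), dose_in_grams(dose)) for name, dose in items]
```

Unrecognized dose strings raise an error instead of being silently dropped, so they can be reviewed by hand.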
31

Clark, Stephen D., S. Grant-Muller, and Haibo Chen. "Cleaning of Matched License Plate Data." Transportation Research Record: Journal of the Transportation Research Board 1804, no. 1 (January 2002): 1–7. http://dx.doi.org/10.3141/1804-01.

Abstract:
Three methods for identifying outlying journey time observations collected as part of a motorway license plate matching exercise are presented. Each method is examined to ensure that it is comprehensible to transport practitioners, is able to correctly classify outliers, and is efficient in its application. The first method is a crude method based on percentiles. The second uses a mean absolute deviation test. The third method is a modification of a traditional z- or t-statistical test. Results from each method and combinations of methods are compared. The preferred method is judged to be the third method alone, which uses the median rather than the mean as its measure of location and the inter-quartile range rather than the standard deviation as its measure of variability. This method is seen to be robust to both the outliers themselves and the presence of incident conditions. The effectiveness of the method is demonstrated under a number of typical and atypical road traffic conditions. In particular, the method is applied to a different section of motorway and is shown to still produce useful results.
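The preferred method replaces the mean with the median and the standard deviation with the inter-quartile range (IQR). A minimal Python sketch of such a robust score, assuming the IQR is divided by 1.349 (the IQR of a standard normal distribution) so the scale is comparable to a standard deviation, and assuming a cut-off of `k = 3`; the paper's exact scaling and threshold are not given in the abstract:

```python
import statistics


def robust_outliers(times, k=3.0):
    """Return indices whose robust score |x - median| / (IQR / 1.349) exceeds k."""
    med = statistics.median(times)
    q1, _, q3 = statistics.quantiles(times, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return []  # no spread to scale by; flag nothing rather than divide by zero
    # For normal data IQR is about 1.349 standard deviations
    scale = iqr / 1.349
    return [i for i, t in enumerate(times) if abs(t - med) / scale > k]
```

Because a single extreme value barely moves the median or the quartiles, the value 900 in a sample such as `[300, 305, 310, 298, 302, 307, 900]` is flagged while the others are not; a mean and standard deviation would both be pulled toward it.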
32

Jun, Sung-Hae, Seung-Joo Lee, and Kyung-Whan Oh. "Sparse Data Cleaning using Multiple Imputations." International Journal of Fuzzy Logic and Intelligent Systems 4, no. 1 (June 1, 2004): 119–24. http://dx.doi.org/10.5391/ijfis.2004.4.1.119.

33

Gomathi, L. "Text Classification Method for Data Cleaning." IOSR Journal of Computer Engineering 7, no. 5 (2012): 45–54. http://dx.doi.org/10.9790/0661-0754554.

34

Kumar, Rajnish. "Data Cleaning by Genetic Programming Technique." IOSR Journal of Engineering 03, no. 08 (August 2013): 45–51. http://dx.doi.org/10.9790/3021-03824551.

35

Khedri, Ridha, Fei Chiang, and Khair Eddin Sabri. "An Algebraic Approach Towards Data Cleaning." Procedia Computer Science 21 (2013): 50–59. http://dx.doi.org/10.1016/j.procs.2013.09.009.

36

Dijkers, Marcel P. J. M., and Cynthia L. Creighton. "Data Cleaning in Occupational Therapy Research." Occupational Therapy Journal of Research 14, no. 3 (July 1994): 144–56. http://dx.doi.org/10.1177/153944929401400302.

Abstract:
Errors in processing data prior to analysis can cause significant distortion of research findings. General principles and specific techniques for cleaning data sets are presented. Strategies are suggested for preventing errors in transcribing, coding, and keying research data.
37

Cheng, Reynold, Jinchuan Chen, and Xike Xie. "Cleaning uncertain data with quality guarantees." Proceedings of the VLDB Endowment 1, no. 1 (August 2008): 722–35. http://dx.doi.org/10.14778/1453856.1453935.

38

Prokoshyna, Nataliya, Jaroslaw Szlichta, Fei Chiang, Renée J. Miller, and Divesh Srivastava. "Combining quantitative and logical data cleaning." Proceedings of the VLDB Endowment 9, no. 4 (December 2015): 300–311. http://dx.doi.org/10.14778/2856318.2856325.

39

Mery, David Panda. "Does data cleaning disproportionately affect autistics?" Autism 22, no. 2 (November 29, 2016): 232. http://dx.doi.org/10.1177/1362361316673566.

40

Wang, Xi, and Chen Wang. "Time Series Data Cleaning: A Survey." IEEE Access 8 (2020): 1866–81. http://dx.doi.org/10.1109/access.2019.2962152.

41

Qahtan, Abdulhakim, Nan Tang, Mourad Ouzzani, Yang Cao, and Michael Stonebraker. "Pattern functional dependencies for data cleaning." Proceedings of the VLDB Endowment 13, no. 5 (January 2020): 684–97. http://dx.doi.org/10.14778/3377369.3377377.

42

Herbert, Katherine G., and Jason T. L. Wang. "Biological data cleaning: a case study." International Journal of Information Quality 1, no. 1 (2007): 60. http://dx.doi.org/10.1504/ijiq.2007.013376.

43

Tarter, Michael E. "Model-free data screening and cleaning." Wiley Interdisciplinary Reviews: Computational Statistics 3, no. 2 (January 4, 2011): 168–76. http://dx.doi.org/10.1002/wics.140.

44

Pahwa, Payal, Rajiv Arora, and Garima Thakur. "An Efficient Algorithm for Data Cleaning." International Journal of Knowledge-Based Organizations 1, no. 4 (October 2011): 56–71. http://dx.doi.org/10.4018/ijkbo.2011100104.

Abstract:
The quality of real-world data being fed into a data warehouse is a major concern today. Because the data comes from a variety of sources, it must be checked for errors and anomalies before it is loaded into the data warehouse. The source data may contain exact duplicate records or approximate duplicate records. The presence of incorrect or inconsistent data can significantly distort the results of analyses, often negating the potential benefits of information-driven approaches. This paper addresses issues related to the detection and correction of such duplicate records. It also analyzes data quality and the various factors that degrade it. Existing work is briefly analyzed and its major limitations are pointed out. On this basis, a new framework is proposed that improves on the existing technique.
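Exact and approximate duplicates can be separated with a normalized exact comparison followed by a string-similarity score. A minimal Python sketch, assuming each record is a single string and using `difflib.SequenceMatcher.ratio()` with an assumed threshold of 0.9 as a stand-in for whatever similarity measure the paper's framework uses:

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_duplicates(records, threshold=0.9):
    """Return (exact, approximate) lists of index pairs; threshold is an assumed cut-off."""
    # Normalize once: collapse whitespace and lower-case
    keys = [" ".join(r.split()).lower() for r in records]
    exact, approx = [], []
    for i, j in combinations(range(len(records)), 2):
        if keys[i] == keys[j]:
            exact.append((i, j))
        elif SequenceMatcher(None, keys[i], keys[j]).ratio() >= threshold:
            approx.append((i, j))
    return exact, approx
```

The pairwise loop is quadratic in the number of records, so larger tables would first need blocking or a sorted-neighborhood pass to limit which pairs are compared.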
45

Sorriso, Antonietta, Pierpaolo Sorrentino, Rosaria Rucco, Laura Mandolesi, Giampaolo Ferraioli, Stefano Franceschini, Michele Ambrosanio, and Fabio Baselice. "An automated magnetoencephalographic data cleaning algorithm." Computer Methods in Biomechanics and Biomedical Engineering 22, no. 14 (July 16, 2019): 1116–25. http://dx.doi.org/10.1080/10255842.2019.1634695.

46

Parker, Jennifer D., and Kenneth C. Schoendorf. "Implications of cleaning gestational age data." Paediatric and Perinatal Epidemiology 16, no. 2 (April 2002): 181–87. http://dx.doi.org/10.1046/j.1365-3016.2002.00407.x.

47

Lee, Mong Li, W. Hsu, and Vijay Kothari. "Cleaning the spurious links in data." IEEE Intelligent Systems 19, no. 2 (March 2004): 28–33. http://dx.doi.org/10.1109/mis.2004.1274908.

48

Krause, Todd B. "IE6.com. Cleaning data with OpenRefine." Folia Linguistica 55, s42-s2 (October 14, 2021): 527–33. http://dx.doi.org/10.1515/flin-2021-2038.

49

García, J. M. Barros, and C. M. Guillen Juan. "Cleaning Pictorial Heritage Management and Dissemination of Cleaning Records and Stratigraphic Data." International Journal of Heritage in the Digital Era 1, no. 1_suppl (January 2012): 159–64. http://dx.doi.org/10.1260/2047-4970.1.0.159.