
Journal articles on the topic 'Data cleaning'

Consult the top 50 journal articles for your research on the topic 'Data cleaning.'

You can also download the full text of each academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1. Pahwa, Payal, and Rashmi Chhabra. "BST Algorithm for Duplicate Elimination in Data Warehouse." International Journal of Management & Information Technology 4, no. 1 (June 26, 2013): 190–97. http://dx.doi.org/10.24297/ijmit.v4i1.4636.

Abstract:
Data warehousing is an emerging technology that has proved very important to organizations. Today every business organization needs accurate information in large amounts to make proper decisions, and for those decisions the data should be of good quality. Improving data quality requires data cleansing, which is fundamental to warehouse data reliability and to data warehousing success. There are various methods for data cleansing. This paper addresses issues related to data cleaning, focusing on the detection of duplicate records; an efficient algorithm for data cleaning is proposed, and a review of data cleansing methods and a comparison between them are presented.

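The core idea of tree-based duplicate elimination can be sketched briefly. The following is a minimal, hypothetical Python illustration (not the paper's actual BST algorithm): records are reduced to a normalized key, and a key already present in the tree marks the record as a duplicate.

```python
class Node:
    def __init__(self, key, record):
        self.key, self.record = key, record
        self.left = self.right = None

def normalize(record):
    # Crude normalization so near-identical records map to the same key;
    # real record matching would be far more sophisticated.
    return tuple(str(f).strip().lower() for f in record)

def insert_unique(root, record):
    """Insert into the BST unless the key exists; return (root, inserted)."""
    key = normalize(record)
    if root is None:
        return Node(key, record), True
    node = root
    while True:
        if key == node.key:
            return root, False                    # duplicate found
        branch = "left" if key < node.key else "right"
        child = getattr(node, branch)
        if child is None:
            setattr(node, branch, Node(key, record))
            return root, True
        node = child

records = [("Ann", "NY"), ("ann ", "ny"), ("Bob", "LA")]
root, unique = None, []
for r in records:
    root, inserted = insert_unique(root, r)
    if inserted:
        unique.append(r)
print(unique)   # [('Ann', 'NY'), ('Bob', 'LA')]
```
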
2. Chu, Xu, and Ihab F. Ilyas. "Qualitative data cleaning." Proceedings of the VLDB Endowment 9, no. 13 (September 2016): 1605–8. http://dx.doi.org/10.14778/3007263.3007320.

3. Broman, Karl W. "Cleaning genotype data." Genetic Epidemiology 17, S1 (1999): S79–S83. http://dx.doi.org/10.1002/gepi.1370170714.

4. Jafarov, Elvin. "Data Cleaning Before Uploading to Storage." ETM - Equipment, Technologies, Materials 13, no. 01 (February 7, 2023): 117–27. http://dx.doi.org/10.36962/etm13012023-117.

Abstract:
The article considers the problem of cleaning big data before uploading it to storage: the errors that arise and the methods for eliminating them are clarified. The technology for building a big data storage and analysis system is reviewed, and solutions for implementing the first stages of the data science process, namely data acquisition, cleaning, and loading, are described. The results of the research allow movement toward the realization of further steps in big data processing. Data cleansing is an essential step in working with big data, since any analysis based on inaccurate data can lead to erroneous results; cleaning and consolidation of data can also be performed when the data is loaded into a distributed file system. Methods of uploading data to the storage system were tested, using a Hortonworks distribution as the implementation. The easiest ways to upload are through the web interface of the Ambari system or with HDFS commands that load data into Hadoop HDFS from the local system. It is shown that the ETL process should be considered more broadly than just importing data from sources, minimal transformations, and loading procedures into the warehouse: data cleaning should become a mandatory stage of the work, because the cost of storage is determined not only by the amount of data but also by the quality of the information collected.
Keywords: Big Data, Data Cleaning, Storage System, ETL process, Loading methods.

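Since the abstract singles out HDFS commands as the simplest loading path, here is a hedged sketch of a "clean before load" step. File names, columns, and cleaning rules are invented for illustration; the only assumed external tool is the standard `hdfs dfs -put` CLI.

```python
import subprocess
import pandas as pd

# Hypothetical input file and columns; the cleaning rules are illustrative,
# not the paper's.
df = pd.read_csv("raw_events.csv")
df = df.drop_duplicates()                      # remove duplicate rows
df = df.dropna(subset=["event_id", "ts"])      # drop rows missing key fields
df["ts"] = pd.to_datetime(df["ts"], errors="coerce")
df = df.dropna(subset=["ts"])                  # discard unparseable timestamps
df.to_csv("clean_events.csv", index=False)

# Load the cleaned file into Hadoop HDFS from the local system, as the
# abstract suggests (`hdfs dfs -put` is the standard HDFS shell command).
subprocess.run(["hdfs", "dfs", "-put", "-f", "clean_events.csv",
                "/warehouse/staging/clean_events.csv"], check=True)
```
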
5. Singh, Mohini. "Cleaning Up Company Data." CFA Institute Magazine 27, no. 1 (March 2016): 53. http://dx.doi.org/10.2469/cfm.v27.n1.18.

6. Holstad, Mark S. "Data Driven Interceptor Cleaning." Proceedings of the Water Environment Federation 2010, no. 8 (January 1, 2010): 7636–64. http://dx.doi.org/10.2175/193864710798207792.

7. Zhang, Aoqian, Shaoxu Song, Jianmin Wang, and Philip S. Yu. "Time series data cleaning." Proceedings of the VLDB Endowment 10, no. 10 (June 2017): 1046–57. http://dx.doi.org/10.14778/3115404.3115410.

8. Geerts, Floris, Giansalvatore Mecca, Paolo Papotti, and Donatello Santoro. "Cleaning data with Llunatic." VLDB Journal 29, no. 4 (November 8, 2019): 867–92. http://dx.doi.org/10.1007/s00778-019-00586-5.

9. Karr, Alan F. "Exploratory Data Mining and Data Cleaning." Journal of the American Statistical Association 101, no. 473 (March 2006): 399. http://dx.doi.org/10.1198/jasa.2006.s81.

10. Borrohou, Sanae, Rachida Fissoune, and Hassan Badir. "Data cleaning survey and challenges – improving outlier detection algorithm in machine learning." Journal of Smart Cities and Society 2, no. 3 (October 9, 2023): 125–40. http://dx.doi.org/10.3233/scs-230008.

Abstract:
Data cleaning, also referred to as data cleansing, constitutes a pivotal phase in data processing subsequent to data collection. Its primary objective is to identify and eliminate incomplete data, duplicates, outdated information, anomalies, missing values, and errors. The influence of data quality on the effectiveness of machine learning (ML) models is widely acknowledged, prompting data scientists to dedicate substantial effort to data cleaning prior to model training. This study accentuates critical facets of data cleaning and the utilization of outlier detection algorithms. Additionally, our investigation encompasses the evaluation of prominent outlier detection algorithms through benchmarking, seeking to identify an efficient algorithm boasting consistent performance. As the culmination of our research, we introduce an innovative algorithm centered on the fusion of Isolation Forest and clustering techniques. By leveraging the strengths of both methods, this proposed algorithm aims to enhance outlier detection outcomes. This work endeavors to elucidate the multifaceted importance of data cleaning, underscored by its symbiotic relationship with ML models. Furthermore, our exploration of outlier detection methodologies aligns with the broader objective of refining data processing and analysis paradigms. Through the convergence of theoretical insights, algorithmic exploration, and innovative proposals, this study contributes to the advancement of data cleaning and outlier detection techniques in the realm of contemporary data-driven environments.

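The fusion the authors describe suggests a simple shape: flag a point only when an Isolation Forest and a distance-to-centroid test agree. A hypothetical scikit-learn sketch (the thresholds and the agreement rule are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)),      # first inlier cluster
               rng.normal(6, 0.5, (200, 2)),    # second inlier cluster
               rng.uniform(-8, 12, (10, 2))])   # scattered outliers

# Signal 1: Isolation Forest anomaly flags (predict() returns -1 for outliers).
iso = IsolationForest(contamination=0.05, random_state=0).fit(X)
iso_flag = iso.predict(X) == -1

# Signal 2: distance to the nearest k-means centroid, flagged beyond the
# 95th percentile (an illustrative threshold).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
dist_flag = dist > np.percentile(dist, 95)

# Fusion rule: a point counts as an outlier only if both signals agree.
outliers = iso_flag & dist_flag
print(f"{outliers.sum()} points flagged by both detectors")
```
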
11. Diachok, Roman, and Halyna Klym. "Data Cleaning Method in Wireless Sensor Networks Based on Intelligence Technology." Measuring Equipment and Metrology 83, no. 2 (2022): 5–10. http://dx.doi.org/10.23939/istcmtm2022.02.005.

Abstract:
A method for cleaning management data in wireless sensor networks based on intelligence technology is studied. Specific forms of application of wireless sensor networks are analyzed, the structural characteristics of such networks are presented, and a data cleaning technology based on a clustering model is offered. A cluster-based algorithm for deleting replicated records is proposed, and the accuracy of the data cleaning methods is tested. The results demonstrate the efficiency of the studied method.

12. Rahul, Kumar, and Rohitash Kumar Banyal. "Detection and Correction of Abnormal Data with Optimized Dirty Data: A New Data Cleaning Model." International Journal of Information Technology & Decision Making 20, no. 02 (March 2021): 809–41. http://dx.doi.org/10.1142/s0219622021500188.

Abstract:
Every business enterprise requires noise-free, clean data. The amount of dirty data grows as a data warehouse continuously loads and refreshes large quantities of data from various sources, so to avoid wrong conclusions the data cleaning process becomes vital in data-centred projects. This paper introduces a novel data cleaning technique for the effective removal of dirty data, involving two steps: (i) dirty data detection and (ii) dirty data cleaning. Dirty data detection comprises data normalization, hashing, clustering, and identification of suspected data; in the clustering step, the optimal selection of centroids is carried out using an optimization method. Once dirty data have been detected, the cleaning stage begins; it comprises a levelling process, Huffman coding, and cleaning of the suspected data, which is likewise performed using optimization. To solve these optimization problems, a new hybrid algorithm, the Firefly Update Enabled Rider Optimization Algorithm (FU-ROA), a hybridization of the Rider Optimization Algorithm (ROA) and the Firefly (FF) algorithm, is proposed. Finally, the performance of the implemented data cleaning method is scrutinized against traditional methods such as Particle Swarm Optimization (PSO), FF, Grey Wolf Optimizer (GWO), and ROA in terms of positive and negative measures. The results show that at iteration 12, the performance of the proposed FU-ROA model for test case 1 was 0.013%, 0.7%, 0.64%, and 0.29% better than the extant PSO, FF, GWO, and ROA models, respectively.

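A loose sketch of the detection pipeline outlined above (normalize, hash, cluster, flag suspects); plain k-means and a two-sigma distance threshold stand in for the paper's FU-ROA-optimized centroid selection:

```python
import hashlib
import numpy as np
from sklearn.cluster import KMeans

def min_max_normalize(X):
    # Scale each column to [0, 1], guarding against constant columns.
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / np.where(mx > mn, mx - mn, 1.0)

def record_hash(row):
    # Stable digest of a normalized record; equal digests flag exact duplicates.
    return hashlib.md5(",".join(f"{v:.6f}" for v in row).encode()).hexdigest()

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([1, 200], [0.1, 10], (50, 2)),
               rng.normal([5, 100], [0.1, 10], (50, 2)),
               [[9.0, 400.0]]])                  # one dirty record

Xn = min_max_normalize(X)
digests = [record_hash(r) for r in Xn]           # duplicate-detection signal

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(Xn)
dist = np.linalg.norm(Xn - km.cluster_centers_[km.labels_], axis=1)
suspected = dist > dist.mean() + 2 * dist.std()  # illustrative threshold
print(np.flatnonzero(suspected))                 # the last record should be flagged
```
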
13. Pandya, Sohil D., and Paresh V. Virparia. "Context Free Data Cleaning and its Application in Mechanism for Suggestive Data Cleaning." International Journal of Information Science 1, no. 1 (August 31, 2012): 32–35. http://dx.doi.org/10.5923/j.ijis.20110101.05.

14. Chauhan, Navneet Singh. "Data Cleaning: Challenges and Existing Solutions." International Journal of Scientific Research in Engineering and Management 08, no. 04 (April 8, 2024): 1–5. http://dx.doi.org/10.55041/ijsrem30377.

Abstract:
We classify data quality problems that are addressed by data cleaning and provide an overview of the main solution approaches. Data cleaning is especially required when integrating heterogeneous data sources and should be addressed together with schema-related data transformations. In data warehouses, data cleaning is a major part of the so-called ETL process. We also discuss current tool support for data cleaning.

15. Kishore, Kamal, and Amarjeet Singh. "Statistics Corner: Data Cleaning-I." Journal of Postgraduate Medicine, Education and Research 53, no. 3 (2019): 130–32. http://dx.doi.org/10.5005/jp-journals-10028-1330.

16. Bhattacharjee, Arup Kumar. "Data Cleaning in Text File." IOSR Journal of Computer Engineering 9, no. 2 (2013): 17–21. http://dx.doi.org/10.9790/0661-0921721.

17. Carey, Ronell, and Maurice Craig. "Cleaning Scattered Multi-Channel Data." Exploration Geophysics 35, no. 2 (June 2004): 131–36. http://dx.doi.org/10.1071/eg04131.

18. Barhyte, Diana Y., and Lynd D. Bacon. "Approaches to Cleaning Data Sets." Nursing Research 34, no. 1 (January 1985): 62–64. http://dx.doi.org/10.1097/00006199-198501000-00013.

19. Roberts, Beverly L., Mary K. Anthony, Elizabeth A. Madigan, and Yan Chen. "Data Management: Cleaning and Checking." Nursing Research 46, no. 6 (November 1997): 350–52. http://dx.doi.org/10.1097/00006199-199711000-00010.

20. Richards, Kate, and Neville Davies. "Cleaning data: guess the olympian." Teaching Statistics 34, no. 1 (January 16, 2012): 31–37. http://dx.doi.org/10.1111/j.1467-9639.2011.00495.x.

21. Geerts, Floris, Giansalvatore Mecca, Paolo Papotti, and Donatello Santoro. "The LLUNATIC data-cleaning framework." Proceedings of the VLDB Endowment 6, no. 9 (July 2013): 625–36. http://dx.doi.org/10.14778/2536360.2536363.

22. Hayes, Patricia. "The Ethics of Cleaning Data." Clinical Nursing Research 13, no. 2 (May 2004): 95–97. http://dx.doi.org/10.1177/1054773804263173.

23. Rammelaere, Joeri, and Floris Geerts. "Cleaning Data with Forbidden Itemsets." IEEE Transactions on Knowledge and Data Engineering 32, no. 8 (August 1, 2020): 1489–501. http://dx.doi.org/10.1109/tkde.2019.2905548.

24. Hunt, Neville, and Sidney Tyrrell. "Cleaning Dirty Data in Excel." Teaching Statistics 24, no. 3 (August 22, 2002): 90–92. http://dx.doi.org/10.1111/1467-9639.00096.

25. Ganti, Venkatesh, and Anish Das Sarma. "Data Cleaning: A Practical Perspective." Synthesis Lectures on Data Management 5, no. 3 (September 21, 2013): 1–85. http://dx.doi.org/10.2200/s00523ed1v01y201307dtm036.

26. Wong, Jing Ting, and Jer Lang Hong. "Data Cleaning Utilizing Ontology Tool." International Journal of Grid and Distributed Computing 9, no. 7 (July 31, 2016): 43–52. http://dx.doi.org/10.14257/ijgdc.2016.9.7.05.

27. Goerge, Robert M., and Bong Joo Lee. "Matching and cleaning administrative data." New Zealand Economic Papers 36, no. 1 (June 2002): 63–64. http://dx.doi.org/10.1080/00779950209544351.

28. Yang, Min, Bing Yang, Xin Zhang, Saisai Wu, Tao Yu, Hong Song, Fei Ren, Puchun He, and Yanhui Zhu. "Experimental Study of the Factors Influencing the Regeneration Performance of Reduced Graphite Oxide Filter Materials under Water Cleaning." Materials 16, no. 11 (May 28, 2023): 4033. http://dx.doi.org/10.3390/ma16114033.

Abstract:
With the normalization of epidemic prevention and control, air filters are being used and replaced more frequently. How to utilize air filter materials efficiently, and whether they have regenerative properties, have become current research hotspots. This paper discusses the regeneration performance of reduced graphite oxide filter materials, studied in depth under water cleaning with respect to the relevant parameters, including the number of cleanings. The results showed that water cleaning was most effective at a water flow velocity of 20 L/(s·m2) with a 17 s cleaning time. Filtration efficiency decreased as the number of cleanings increased. Compared to the blank group, the filter material's PM10 filtration efficiency decreased by 0.8%, 19.4%, 26.5%, and 32.4% after the first to fourth cleanings, respectively. The PM2.5 filtration efficiency increased by 12.5% after the first cleaning and decreased by 12.9%, 17.6%, and 30.2% after the second to fourth cleanings, respectively. The PM1.0 filtration efficiency increased by 22.7% after the first cleaning and decreased by 8.1%, 13.8%, and 24.5% after the second to fourth cleanings, respectively. Water cleaning mainly affected the filtration efficiency for particulates sized 0.3–2.5 μm. Reduced graphite oxide air filter materials could be water washed twice while maintaining cleanliness equal to 90% of the original filter material; washing more than twice could not achieve the standard of cleanliness equal to 85% of the original. These data provide useful reference values for evaluating the regeneration performance of filter materials.

29. Trabelsi, H. "Data Analytics and Development of a Wellbore Cleaning Coefficient Model." Petroleum & Petrochemical Engineering Journal 7, no. 4 (October 5, 2023): 1–14. http://dx.doi.org/10.23880/ppej-16000373.

Abstract:
In this study, wellbore cleaning coefficient (WCC) correlations were developed for three conventional coiled tubing sizes (2.375", 2.625", and 2.875"), corresponding to roughness-to-internal-diameter (ε/D) ratios of 0.000460828, 0.000510637, and 0.000572517, respectively. Dimensional analysis applying the Buckingham π theorem was used, together with a database of 150 wells in the Spraberry formation in West Texas. Key performance indicators (KPIs) that influence flow in a cased pipe around an object (coiled tubing) were identified and employed in model development: (1) slick water density (ρf), (2) slick water viscosity (μf), (3) hydraulic diameter (dc − dt) between the casing inner diameter (dc) and the coiled tubing outer diameter (dt), (4) average annular velocity (v), and (5) cleaning pressure gradient (ΔP), defined as the ratio of the circulating differential pressure (pu − pd) to measured depth (MD). A global model relating WCC to the Euler number and the inverse of the Reynolds number was attempted first, but it yielded a low coefficient of multiple determination (R² = 0.626). To better capture the physics of the cleaning process and improve the fit, the data were segregated into three sets, one for each ε/D ratio, giving R² values of 0.974, 0.945, and 0.877. The database was then separated further to create models that identify "clean" and "not clean" wellbores; these equations addressed operational conditions, since threshold values of annular velocity and of the Euler and Reynolds numbers were applied in the data partition to describe laminar and turbulent flow conditions. The predictive equations showed excellent degrees of fit, with R² of 0.979, 0.822, and 0.897 for clean wells at the three ε/D ratios, respectively. The findings were also validated using cumulative-debris-versus-elapsed-time data from 12 Woodford wells.

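Both dimensionless groups used in the models are standard and easy to compute. A small sketch with invented, order-of-magnitude values (not taken from the paper's database):

```python
def reynolds(rho, v, d_hydraulic, mu):
    # Re = rho * v * d / mu  (ratio of inertial to viscous forces)
    return rho * v * d_hydraulic / mu

def euler(dp, rho, v):
    # Eu = dp / (rho * v**2)  (ratio of pressure to inertial forces)
    return dp / (rho * v ** 2)

# Illustrative SI values only.
rho = 1000.0               # slick water density, kg/m^3
mu = 1.0e-3                # slick water viscosity, Pa*s
d_c, d_t = 0.1143, 0.0603  # casing ID and coiled tubing OD, m (4.5" and 2.375")
v = 1.5                    # average annular velocity, m/s
dp = 2.0e5                 # circulating differential pressure, Pa

re = reynolds(rho, v, d_c - d_t, mu)   # hydraulic diameter = d_c - d_t
eu = euler(dp, rho, v)
print(f"Re = {re:,.0f}, Eu = {eu:.1f}")
```
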
30. Li, Lan, and Bertram Ludäscher. "On the Reusability of Data Cleaning Workflows." International Journal of Digital Curation 17, no. 1 (September 27, 2022): 6. http://dx.doi.org/10.2218/ijdc.v17i1.828.

Abstract:
The goal of data cleaning is to make data fit for purpose, i.e., to improve data quality through updates and data transformations, such that downstream analyses can be conducted and lead to trustworthy results. A transparent and reusable data cleaning workflow can save time and effort through automation and make subsequent data cleaning on new data less error-prone. However, the reusability of data cleaning workflows has received little to no attention in the research community. We identify some challenges and opportunities for reusing data cleaning workflows, present a high-level conceptual model to clarify what we mean by reusability, and propose ways to improve reusability along different dimensions. We use the opportunity of presenting at IDCC to invite the community to share their use cases, experiences, and desiderata for the reuse of data cleaning workflows and recipes, in order to foster new collaborations and guide future work.

31. Alotaibi, Obaid, Eric Pardede, and Sarath Tomy. "Cleaning Big Data Streams: A Systematic Literature Review." Technologies 11, no. 4 (July 26, 2023): 101. http://dx.doi.org/10.3390/technologies11040101.

Abstract:
In today's big data era, cleaning big data streams has become a challenging task because of the different formats of big data and the massive amount of big data which is being generated. Many studies have proposed different techniques to overcome these challenges, such as cleaning big data in real time. This systematic literature review presents recently developed techniques that have been used for the cleaning process and for each data cleaning issue. Following the PRISMA framework, four databases are searched, namely IEEE Xplore, ACM Library, Scopus, and Science Direct, to select relevant studies. After selecting the relevant studies, we identify the techniques that have been utilized to clean big data streams and the evaluation methods that have been used to examine their efficiency. Also, we define the cleaning issues that may appear during the cleaning process, namely missing values, duplicated data, outliers, and irrelevant data. Based on our study, the future directions of cleaning big data streams are identified.

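The four issue types identified (missing values, duplicated data, outliers, and irrelevant data) map directly onto routine dataframe operations. A generic pandas sketch with invented columns, shown batch-style for brevity (stream cleaners would apply the same logic incrementally):

```python
import pandas as pd

# Toy data covering the four issue types named in the review.
df = pd.DataFrame({
    "sensor_id":  [1, 1, 2, 2, 3, 3],
    "reading":    [10.2, 10.2, 9.8, None, 250.0, 10.1],
    "debug_note": ["ok"] * 6,                 # irrelevant to the analysis
})

df = df.drop(columns=["debug_note"])          # irrelevant data
df = df.drop_duplicates()                     # duplicated data
df = df.dropna(subset=["reading"])            # missing values

# Outliers: a simple domain plausibility bound (robust statistical rules or
# stream-aware detectors would be used in a real pipeline).
df = df[df["reading"].between(0.0, 100.0)]
print(df)
```
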
32. Hawkins, Elizabeth M., and Dennis R. Buckmaster. "Improving Yield Data Analysis Using Contextual Data." Applied Engineering in Agriculture 39, no. 4 (2023): 391–98. http://dx.doi.org/10.13031/aea.14655.

Abstract:
Highlights: Context-driven yield data cleaning resulted in more accurate whole-field yield estimates. Using a context-driven yield data cleaning method can improve yield estimates for zones within fields. Identifying error-prone areas of a field where data quality is likely to be low, and removing that data in bulk, can reduce data cleaning bias.
As agriculture becomes more data driven, decision-making has become the focus of the industry, and data quality will be increasingly important. Traditionally, yield data cleaning techniques have removed individual data points based on criteria primarily focused on the yield values themselves. When these methods are used, however, the underlying causes of the errors are often overlooked, and as a result these techniques may fail to remove all of the inaccurate (error-prone) data and/or may remove legitimate data. In this research, an alternative to data cleaning was developed. Data integrity zones (DIZ) within each field were identified by evaluating metadata, which included data collected by the combine on the operating conditions of the machinery (i.e., travel speed, crop mass flow), data about the field environment (i.e., soil type, topography, weather), and data on field operations (e.g., field logs, as-applied maps). Data in DIZ were isolated using buffers, and analysis of the reduced datasets was compared to the raw data. The amount of data removed depended on the amount of variability (e.g., soil characteristics, topography) in the field. Statistical comparisons showed that mean yield estimates for soil type polygons increased by an average of 1.4 Mg/ha for corn when DIZ data were used compared to raw data. On average, the confidence around the mean remained similar even with a large amount (70%) of the data removed. Notably, none of the mean estimates derived from the raw datasets were contained in the confidence intervals produced from DIZ data. This metadata-driven (context-driven) alternative to data cleaning effectively removed errors and artifacts from yield data that would only be identified by looking beyond the yield measurements themselves. When similarly reduced datasets are used to analyze historical yield data, they should provide a clearer picture of the true yield effects of treatments, management zones, soil types, etc.; this will improve decisions on input and resource allocation, support wiser adoption of precision agriculture technologies, and refine future data collection.
Keywords: Combine yield monitor, Context, Data analysis, Integrity zones, Management zones, Metadata, Precision agriculture, Yield, Yield data.

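The contrast with value-based trimming can be made concrete. In this invented pandas sketch, points are kept or dropped based on machine-operating context rather than on the yield values themselves; column names and thresholds are illustrative only, not the study's criteria:

```python
import pandas as pd

df = pd.DataFrame({
    "yield_mg_ha": [9.8, 10.1, 3.2, 9.9, 12.5],
    "speed_kmh":   [7.9, 8.1, 2.0, 8.0, 8.2],   # combine travel speed
    "mass_flow":   [22, 23, 5, 22, 24],          # crop mass flow, kg/s
})

# Keep only points collected under steady operating conditions: the
# low-speed, low-flow row is a ramp-up artifact, not a real low-yield zone,
# so it is removed even though its yield value alone might pass a filter.
in_zone = df["speed_kmh"].between(5, 10) & (df["mass_flow"] >= 15)
clean = df[in_zone]
print(clean["yield_mg_ha"].mean())   # mean computed only from trusted data
```
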
33. Gueta, Tomer, Vijay Barve, Thiloshon Nagarajah, Ashwin Agrawal, and Yohay Carmel. "Introducing bdclean: a user friendly biodiversity data cleaning pipeline." Biodiversity Information Science and Standards 2 (May 22, 2018): e25564. http://dx.doi.org/10.3897/biss.2.25564.

Abstract:
A new R package for biodiversity data cleaning, 'bdclean', was initiated in the Google Summer of Code (GSoC) 2017 and is available on GitHub. Several R packages have great data validation and cleaning functions, but 'bdclean' provides features to manage a complete pipeline for biodiversity data cleaning, from data quality explorations to cleaning procedures and reporting. Users are able to go through the quality control process in a very structured, intuitive, and effective way. A modular approach to data cleaning functionality should make this package extensible for many biodiversity data cleaning needs. Under GSoC 2018, 'bdclean' will go through a comprehensive upgrade; new features will be highlighted in the demonstration.

34. Hodkiewicz, Melinda, and Mark Tien-Wei Ho. "Cleaning historical maintenance work order data for reliability analysis." Journal of Quality in Maintenance Engineering 22, no. 2 (May 9, 2016): 146–63. http://dx.doi.org/10.1108/jqme-04-2015-0013.

Abstract:
Purpose: To identify quality issues with using historical work order (WO) data from computerised maintenance management systems for reliability analysis, and to develop an efficient and transparent process for correcting these data quality issues so that the data are fit for purpose in a timely manner.
Design/methodology/approach: The paper develops a rule-based approach to data cleansing and demonstrates the process on data for heavy mobile equipment from a number of organisations.
Findings: Although historical WO records frequently contain missing or incorrect functional location, failure mode, maintenance action, and WO status fields, the authors demonstrate that it is possible to make these records fit for purpose by using the data in the free-form text fields, an understanding of the maintenance tactics and practices at the operation, and knowledge of where the asset is in its life cycle. The authors demonstrate a repeatable and transparent process for the data cleaning activities.
Originality/value: How engineers deal with raw maintenance data, and the decisions they make to produce a dataset for reliability analysis, are seldom discussed in detail; assumptions and actions are often left undocumented. This paper describes the typical data cleaning decisions analysts routinely make and presents a process to support those decisions in a repeatable and transparent fashion.

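The rule-based repair of work-order fields from free text can be sketched with invented rules and vocabulary (the paper's actual rules are organisation-specific and not reproduced here):

```python
import re
import pandas as pd

# Illustrative rules: map keywords in the free-text description to a
# failure-mode code.
RULES = [
    (re.compile(r"\b(leak|leaking|seep)\b", re.I), "LEAK"),
    (re.compile(r"\b(crack|fracture)\b", re.I),    "STRUCTURAL"),
    (re.compile(r"\b(overheat|hot)\b", re.I),      "THERMAL"),
]

def infer_failure_mode(text, current):
    # Only fill the field when it is missing; never overwrite recorded codes.
    if pd.notna(current):
        return current
    for pattern, code in RULES:
        if pattern.search(text or ""):
            return code
    return "UNKNOWN"

wo = pd.DataFrame({
    "description":  ["hyd hose leaking at boom", "engine overheat alarm",
                     "routine service"],
    "failure_mode": [None, None, "NONE"],
})
wo["failure_mode"] = [infer_failure_mode(t, c)
                      for t, c in zip(wo["description"], wo["failure_mode"])]
print(wo)   # LEAK, THERMAL, NONE
```
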
35. Oni, Samson, Zhiyuan Chen, Susan Hoban, and Onimi Jademi. "A Comparative Study of Data Cleaning Tools." International Journal of Data Warehousing and Mining 15, no. 4 (October 2019): 48–65. http://dx.doi.org/10.4018/ijdwm.2019100103.

Abstract:
In the information era, data is crucial in decision making, and most data sets contain impurities that need to be weeded out before any meaningful decision can be made from the data. Data cleaning is therefore essential, and it often takes more than 80 percent of a data analyst's time and resources. Adequate tools and techniques must be used for data cleaning. Many data cleaning tools exist, but it is unclear how to choose among them in various situations. This research aims at helping researchers and organizations choose the right tools for data cleaning. The article conducts a comparative study of four commonly used data cleaning tools on two real data sets and answers the research question of which tool is useful in which scenario.

36. Kulkarni, Prerana S., and J. W. Bakal. "Hybrid Approaches for Data Cleaning in Data Warehouse." International Journal of Computer Applications 88, no. 18 (February 14, 2014): 7–10. http://dx.doi.org/10.5120/15450-3813.

37. Van den Broeck, Jan, Solveig Argeseanu Cunningham, Roger Eeckels, and Kobus Herbst. "Data Cleaning: Detecting, Diagnosing, and Editing Data Abnormalities." PLoS Medicine 2, no. 10 (September 6, 2005): e267. http://dx.doi.org/10.1371/journal.pmed.0020267.

38. Syamala Rao, P. N. V. "A Comprehensive Survey of Financial Data Modelling Processes & Data Cleaning Methods Using Composite Coefficient." Journal of Advanced Research in Dynamical and Control Systems 12, no. 01-Special Issue (February 13, 2020): 882–99. http://dx.doi.org/10.5373/jardcs/v12sp1/20201141.

39. Zhang, S. Z., Y. R. Yu, and M. Z. Shen. "Study on Preprocessing Method of TCM Prescription Data in Data Mining." Journal of Physics: Conference Series 2025, no. 1 (September 1, 2021): 012027. http://dx.doi.org/10.1088/1742-6596/2025/1/012027.

Abstract:
Traditional Chinese medicine (TCM) prescriptions have been developed over thousands of years. Their data forms are diverse, the content is discrete and often missing, and there are many uncertainties due to cultural and regional differences, all of which complicate the mining of TCM prescription data. Taking 3,108 prescriptions for the treatment of typhoid fever as an example, this paper focuses on the data cleaning and data transformation stages of preprocessing. Combining prescriptions with multiple functions, it describes methods for cleansing unqualified prescription data, normalizing drug names, unifying doses, and structuring the data, so that the processed data can be mined effectively. This provides strong support for exploring the compatibility laws of prescriptions and for the development of new drugs.

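Two of the preprocessing steps named, drug-name normalization and dose unification, can be illustrated with placeholder tables; the synonym map and unit conversions below are invented for illustration and are not taken from the study:

```python
import pandas as pd

# Placeholder synonym table: variant spellings -> canonical drug name.
SYNONYMS = {"gui zhi": "cassia twig", "guizhi": "cassia twig",
            "gan cao": "licorice", "zhi gancao": "licorice"}

# Placeholder unit conversions to grams (historical units vary by era and
# region, so real values would need domain expertise).
TO_GRAMS = {"g": 1.0, "qian": 3.125, "liang": 31.25}

def normalize(name: str) -> str:
    return SYNONYMS.get(name.strip().lower(), name.strip().lower())

def dose_in_grams(amount: float, unit: str) -> float:
    return amount * TO_GRAMS[unit]

rx = pd.DataFrame({"drug": ["GuiZhi", "gan cao", "cassia twig"],
                   "amount": [3.0, 2.0, 9.0],
                   "unit": ["liang", "qian", "g"]})
rx["drug"] = rx["drug"].map(normalize)
rx["dose_g"] = [dose_in_grams(a, u) for a, u in zip(rx["amount"], rx["unit"])]
print(rx)   # all drug names canonical, all doses in grams
```
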
40. Kumar, Rajnish. "Data Cleaning by Genetic Programming Technique." IOSR Journal of Engineering 03, no. 08 (August 2013): 45–51. http://dx.doi.org/10.9790/3021-03824551.

41. Gomathi, L. "Text Classification Method for Data Cleaning." IOSR Journal of Computer Engineering 7, no. 5 (2012): 45–54. http://dx.doi.org/10.9790/0661-0754554.

42. Krause, Todd B. "IE6.com. Cleaning data with OpenRefine." Folia Linguistica 55, s42-s2 (October 14, 2021): 527–33. http://dx.doi.org/10.1515/flin-2021-2038.

43. Clark, Stephen D., S. Grant-Muller, and Haibo Chen. "Cleaning of Matched License Plate Data." Transportation Research Record: Journal of the Transportation Research Board 1804, no. 1 (January 2002): 1–7. http://dx.doi.org/10.3141/1804-01.

Abstract:
Three methods for identifying outlying journey time observations collected as part of a motorway license plate matching exercise are presented. Each method is examined to ensure that it is comprehensible to transport practitioners, is able to correctly classify outliers, and is efficient in its application. The first method is a crude method based on percentiles. The second uses a mean absolute deviation test. The third method is a modification of a traditional z- or t-statistical test. Results from each method and combinations of methods are compared. The preferred method is judged to be the third method alone, which uses the median rather than the mean as its measure of location and the inter-quartile range rather than the standard deviation as its measure of variability. This method is seen to be robust to both the outliers themselves and the presence of incident conditions. The effectiveness of the method is demonstrated under a number of typical and atypical road traffic conditions. In particular, the method is applied to a different section of motorway and is shown to still produce useful results.

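The preferred third method, a z-type test with the median as location and the inter-quartile range as scale, can be sketched generically; the cutoff of 3 is a conventional assumption rather than the paper's calibrated value:

```python
import numpy as np

def robust_outliers(times, threshold=3.0):
    """Flag journey times whose modified z-score exceeds the threshold.
    Location = median, scale = inter-quartile range (both robust to outliers)."""
    times = np.asarray(times, dtype=float)
    med = np.median(times)
    q1, q3 = np.percentile(times, [25, 75])
    iqr = q3 - q1
    z = (times - med) / iqr
    return np.abs(z) > threshold

# Seconds between plate matches; one implausibly long journey.
journey_s = [312, 305, 298, 330, 301, 1240, 295, 308]
print(robust_outliers(journey_s))
# [False False False False False  True False False]
```

Because the median and IQR are barely moved by the extreme value, the 1240 s observation is cleanly flagged, whereas a mean/standard-deviation version of the same test can be masked by the very outlier it is trying to detect.
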
44. Herbert, Katherine G., and Jason T. L. Wang. "Biological data cleaning: a case study." International Journal of Information Quality 1, no. 1 (2007): 60. http://dx.doi.org/10.1504/ijiq.2007.013376.

45. Jun, Sung-Hae, Seung-Joo Lee, and Kyung-Whan Oh. "Sparse Data Cleaning using Multiple Imputations." International Journal of Fuzzy Logic and Intelligent Systems 4, no. 1 (June 1, 2004): 119–24. http://dx.doi.org/10.5391/ijfis.2004.4.1.119.

46. Mery, David Panda. "Does data cleaning disproportionately affect autistics?" Autism 22, no. 2 (November 29, 2016): 232. http://dx.doi.org/10.1177/1362361316673566.

47. Qahtan, Abdulhakim, Nan Tang, Mourad Ouzzani, Yang Cao, and Michael Stonebraker. "Pattern functional dependencies for data cleaning." Proceedings of the VLDB Endowment 13, no. 5 (January 2020): 684–97. http://dx.doi.org/10.14778/3377369.3377377.

48. Cheng, Reynold, Jinchuan Chen, and Xike Xie. "Cleaning uncertain data with quality guarantees." Proceedings of the VLDB Endowment 1, no. 1 (August 2008): 722–35. http://dx.doi.org/10.14778/1453856.1453935.

49. Prokoshyna, Nataliya, Jaroslaw Szlichta, Fei Chiang, Renée J. Miller, and Divesh Srivastava. "Combining quantitative and logical data cleaning." Proceedings of the VLDB Endowment 9, no. 4 (December 2015): 300–311. http://dx.doi.org/10.14778/2856318.2856325.