A selection of scholarly literature on the topic "Data imbalance problem"
Browse the lists of current articles, books, dissertations, conference abstracts, and other scholarly sources on the topic "Data imbalance problem".
Journal articles on the topic "Data imbalance problem"
Tiwari, Himani. "Improvising Balancing Methods for Classifying Imbalanced Data." International Journal for Research in Applied Science and Engineering Technology 9, no. 9 (September 30, 2021): 1535–43. http://dx.doi.org/10.22214/ijraset.2021.38225.
Isabella, S. Josephine, Sujatha Srinivasan, and G. Suseendran. "A Framework Using Binary Cross Entropy - Gradient Boost Hybrid Ensemble Classifier for Imbalanced Data Classification." Webology 18, no. 1 (April 29, 2021): 104–20. http://dx.doi.org/10.14704/web/v18i1/web18076.
Yogi, Abhishek, and Ratul Dey. "CLASS IMBALANCE PROBLEM IN DATA SCIENCE: REVIEW." International Research Journal of Computer Science 9, no. 4 (April 30, 2022): 56–60. http://dx.doi.org/10.26562/irjcs.2021.v0904.002.
Rendón, Eréndira, Roberto Alejo, Carlos Castorena, Frank J. Isidro-Ortega, and Everardo E. Granda-Gutiérrez. "Data Sampling Methods to Deal With the Big Data Multi-Class Imbalance Problem." Applied Sciences 10, no. 4 (February 14, 2020): 1276. http://dx.doi.org/10.3390/app10041276.
SUN, YANMIN, ANDREW K. C. WONG, and MOHAMED S. KAMEL. "CLASSIFICATION OF IMBALANCED DATA: A REVIEW." International Journal of Pattern Recognition and Artificial Intelligence 23, no. 04 (June 2009): 687–719. http://dx.doi.org/10.1142/s0218001409007326.
Liu, Tian Yu. "Research on Feature Selection for Imbalanced Problem from Fault Diagnosis on Gear." Advanced Materials Research 466-467 (February 2012): 886–90. http://dx.doi.org/10.4028/www.scientific.net/amr.466-467.886.
Khoshgoftaar, Taghi M., Naeem Seliya, and Dennis J. Drown. "Evolutionary data analysis for the class imbalance problem." Intelligent Data Analysis 14, no. 1 (January 22, 2010): 69–88. http://dx.doi.org/10.3233/ida-2010-0409.
Hartono, Opim Salim Sitompul, Erna Budhiarti Nababan, Tulus, Dahlan Abdullah, and Ansari Saleh Ahmar. "A New Diversity Technique for Imbalance Learning Ensembles." International Journal of Engineering & Technology 7, no. 2.14 (April 8, 2018): 478. http://dx.doi.org/10.14419/ijet.v7i2.11251.
Naboureh, Amin, Ainong Li, Jinhu Bian, Guangbin Lei, and Meisam Amani. "A Hybrid Data Balancing Method for Classification of Imbalanced Training Data within Google Earth Engine: Case Studies from Mountainous Regions." Remote Sensing 12, no. 20 (October 11, 2020): 3301. http://dx.doi.org/10.3390/rs12203301.
Liu, Zhenyan, Yifei Zeng, Pengfei Zhang, Jingfeng Xue, Ji Zhang, and Jiangtao Liu. "An Imbalanced Malicious Domains Detection Method Based on Passive DNS Traffic Analysis." Security and Communication Networks 2018 (June 20, 2018): 1–7. http://dx.doi.org/10.1155/2018/6510381.
Dissertations on the topic "Data imbalance problem"
Gao, Jie. "Data Augmentation in Solving Data Imbalance Problems." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-289208.
This project mainly focuses on the various methods for solving data imbalance problems in the field of Natural Language Processing (NLP). Imbalanced text data is a common problem in many tasks, especially classification, where it leaves the model unable to predict the minority class. Sometimes even switching to a more sophisticated and complex model does not improve performance, while simple data strategies aimed at the imbalance itself, such as oversampling or undersampling, have a positive effect on the result. Common data strategies include resampling methods that duplicate data from the original data or remove original data to achieve balance. In addition, other methods such as word replacement, word swapping, and word deletion have been used in previous work. At the same time, some deep learning models such as BERT, GPT, and fastText have a strong capacity for general understanding of natural language, so we choose some of them to address the data imbalance problem. However, there is no systematic comparison of these methods in practice. For example, oversampling and undersampling are quick and easy to use on the small datasets of earlier work. As datasets grow, newly generated data from some deep network models is more compatible with the original data. Our work therefore focuses on how different data augmentation techniques perform when used to solve data imbalance problems, given the dataset and the task. Both the qualitative and the quantitative experimental results show that different methods have their advantages for different datasets. In general, data augmentation can improve the performance of classification models. Specifically, BERT, and especially our fine-tuned BERT, performs very well in most scenarios (different scales and types of dataset).
Nevertheless, other techniques such as back-translation perform better on long text data, even though they cost more time and require a complicated model. In conclusion, appropriate choices of data augmentation methods can help solve data imbalance problems.
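The two resampling strategies named in the abstract above (duplicating minority data or removing majority data) can be illustrated with a minimal Python sketch. This is not the thesis's code: the function names `random_oversample` and `random_undersample`, the seed parameter, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(samples, labels):
    """Collect the samples belonging to each label."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    return by_class


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(xs) for xs in by_class.values())
    out = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out.extend((x, y) for x in xs + extra)
    rng.shuffle(out)
    return [x for x, _ in out], [y for _, y in out]


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples until every class is as small as the smallest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(xs) for xs in by_class.values())
    out = []
    for y, xs in by_class.items():
        out.extend((x, y) for x in rng.sample(xs, target))
    rng.shuffle(out)
    return [x for x, _ in out], [y for _, y in out]


# Hypothetical toy data: 8 majority ("neg") and 2 minority ("pos") examples.
texts = [f"doc{i}" for i in range(10)]
labels = ["neg"] * 8 + ["pos"] * 2
_, over_labels = random_oversample(texts, labels)
_, under_labels = random_undersample(texts, labels)
print(Counter(over_labels), Counter(under_labels))
```

Oversampling keeps all original data but repeats minority examples, while undersampling discards majority examples; which trade-off is acceptable depends on the dataset size, as the abstract notes.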
Barella, Victor Hugo. "Técnicas para o problema de dados desbalanceados em classificação hierárquica." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-06012016-145045/.
Recent advances in science and technology have enabled data to grow in both quantity and availability. Along with this explosion of generated information comes a need to analyze data to discover new and useful knowledge. Thus, areas concerned with extracting knowledge and useful information from large datasets, such as Machine Learning (ML) and Data Mining (DM), have become great opportunities for the advancement of research. However, some limitations may reduce the accuracy of traditional algorithms in these areas, for example an imbalance in the number of samples per class in a dataset. To mitigate this drawback, several solutions have been the target of research in recent years, such as techniques for artificially balancing data, algorithm modifications, and new approaches for imbalanced data. One area little explored from the data imbalance perspective is hierarchical classification, in which the classes are organized into hierarchies, commonly in the form of a tree or DAG (Directed Acyclic Graph). This work investigates the limitations of, and approaches to minimize, the effects of imbalanced data in hierarchical classification problems. The experimental results show the need to take the characteristics of hierarchical classes into account when deciding which techniques for imbalanced data to apply in hierarchical classification.
Gao, Ming. "A study on imbalanced data classification problems." Thesis, University of Reading, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.602707.
Jeatrakul, Piyasak. "Enhancing classification performance over noise and imbalanced data problems." PhD thesis, Murdoch University, 2012. https://researchrepository.murdoch.edu.au/id/eprint/10044/.
Pan, Yi-Ying, and 潘怡瑩. "Clustering-based Data Preprocessing Approach for the Class Imbalance Problem." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/94nys8.
National Central University (國立中央大學), Department of Information Management (資訊管理學系), 106
The class imbalance problem is an important issue in data mining. It occurs when the number of samples in one class is much larger than in the other classes. Traditional classifiers tend to misclassify most samples of the minority class into the majority class in order to maximize overall accuracy. This phenomenon makes it hard to establish a good classification rule for the minority class. The class imbalance problem occurs in many real-world applications, such as fault diagnosis, medical diagnosis and face recognition. To deal with the class imbalance problem, a clustering-based data preprocessing approach is proposed, in which two different clustering techniques, affinity propagation clustering and K-means clustering, are used individually to divide the majority class into several subclasses, resulting in multiclass data. This approach can effectively reduce the class imbalance ratio of the training dataset, shorten the classifier training time and improve classification performance. Our experiments use forty-four small class imbalance datasets from KEEL and eight high-dimensional datasets from NASA to build five types of classification models: C4.5, MLP, Naïve Bayes, SVM and k-NN (k=5). In addition, we also employ the classifier ensemble algorithm. This research compares AUC results across different clustering techniques, different classification models and different numbers of clusters for K-means clustering in order to find the best configuration of the proposed approach and to compare it with other methods from the literature. Finally, the experimental results on the KEEL datasets show that the k-NN (k=5) algorithm is the best choice with either affinity propagation or K-means (K=5); the experimental results on the NASA datasets show that the proposed approach outperforms the literature methods on the high-dimensional datasets.
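The idea of dividing the majority class into subclasses can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's implementation: it uses a plain K-means with a deterministic farthest-point initialization (an assumption chosen for reproducibility), and the helper names `kmeans_labels`, `split_majority`, and `imbalance_ratio` are hypothetical.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans_labels(points, k, iters=20):
    """Plain Lloyd's K-means; returns a cluster index for each point."""
    # Assumption: deterministic farthest-point initialization.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    assign = []
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return assign


def split_majority(X, y, majority_label, k):
    """Relabel majority samples as k subclasses; other labels are unchanged."""
    idx = [i for i, label in enumerate(y) if label == majority_label]
    clusters = kmeans_labels([X[i] for i in idx], k)
    new_y = list(y)
    for i, c in zip(idx, clusters):
        new_y[i] = f"{majority_label}_{c}"
    return new_y


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical toy data: 12 majority points in three groups, 3 minority points.
majority = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11),
            (11, 10), (11, 11), (20, 0), (20, 1), (21, 0), (21, 1)]
minority = [(5, 5), (6, 5), (5, 6)]
X = majority + minority
y = ["maj"] * 12 + ["min"] * 3
new_y = split_majority(X, y, "maj", k=3)
print(imbalance_ratio(y), imbalance_ratio(new_y))
```

On this toy data the ratio drops from 12:3 to 4:3, because each majority subclass is now compared with the minority class on its own. A classifier trained on the relabeled data would need its subclass predictions mapped back to the majority label.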
Komba, Lyee. "Sampling Techniques for Class Imbalance Problem in Aviation Safety Incidents Data." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/jg2y52.
National Taipei University of Technology (國立臺北科技大學), International Program in Electrical Engineering and Computer Science (電資國際專班), 106
Like any other industry in the world, the aviation industry acquires a variety of data every day through numerous data management systems. Structured and unstructured data are collected through aircraft systems, maintenance systems, supply systems, ticketing and booking systems, and many other systems used in the daily operations of aviation business. Data mining can be used to analyze all these different types of data to generate meaningful information that can improve future performance, safety and profitability for aviation business and operations. This thesis presents details of data mining methods based on aviation incident data to predict incidents with fatal consequences. Other studies that have applied data mining techniques within the aviation industry include prediction of passenger travel, meteorological prediction, component failure prediction, and other fatal incident prediction work aimed at finding the right features. This study uses the public dataset from the Federal Aviation Administration Accident and Incident Data System (FAA AIDS) website, with data records from the year 2000 to the year 2017. Our goal is to build a prediction model for fatal incidents and generate decision rules or factors contributing to incidents that have fatal results. In this way, the model to be built will be a predictive risk management system for aviation safety. The aviation industry generally operates in a safe state because of the transition from reactive safety and risk management to a proactive safety management approach, and now to a predictive approach to safety management through the application of data mining techniques such as those in this study and others. Over time, the number of systems has increased and the number of aviation accidents and serious incidents has decreased. Accordingly, our analysis found that only 0.6% of incidents had fatal consequences.
During the data preprocessing stage, we encountered the problem of an unbalanced dataset, which led us to propose some techniques to solve the issue. Unbalanced datasets are datasets in which far fewer records represent the minority classes than the majority class, which matters especially when the analysis is aimed at the minority class. Not dealing with this issue correctly may result in poorly performing models or misclassified data. With the increase of the travelling population in the aviation community, safety is paramount, so coming up with a relatively precise model is important. To obtain a precise model or classifier, we need to preprocess and resample the data efficiently. This thesis also looks at combating the issue of unbalanced data in order to produce balanced data that can be used to train a classifier and design a precise model. We applied the following sampling techniques in R Studio to address the imbalance: oversampling, under-sampling, SMOTE and bootstrap sampling. The datasets resulting from these techniques are used to train different classifiers, and the performance of the classifiers is measured and discussed in this thesis.
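Of the sampling techniques listed above, SMOTE is the least self-explanatory, so a minimal sketch may help. The thesis worked in R Studio; the Python below is an independent illustration of the core interpolation step only, and the function name `smote`, its parameters, and the toy points are assumptions rather than anything taken from the thesis.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def smote(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority points by interpolating towards near neighbours.

    For each synthetic point: pick a random minority sample, pick one of its
    k nearest minority neighbours, and place the new point at a random
    position on the line segment between the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority class with four 2-D points.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority_points, n_synthetic=6, k=2)
print(len(new_points))
```

Unlike plain oversampling, which repeats existing records, each synthetic point lies between two real minority records, so the classifier sees new but plausible feature combinations.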
Yao, Guan-Ting, and 姚冠廷. "A Two-Stage Hybrid Data Preprocessing Approach for the Class Imbalance Problem." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/dm48kk.
National Central University (國立中央大學), Department of Information Management (資訊管理學系), 105
The class imbalance problem is an important issue in data mining. A skewed class distribution occurs when the number of examples representing one class is much lower than the number representing the other classes. Traditional classifiers tend to misclassify most samples of the minority class into the majority class because they maximize overall accuracy. This phenomenon limits the construction of effective classifiers for the precious minority class. The problem occurs in many real-world applications, such as fault diagnosis, medical diagnosis and face recognition. To deal with the class imbalance problem, this thesis proposes a two-stage hybrid data preprocessing framework based on clustering and instance selection techniques. This approach filters out the noisy data in the majority class and can reduce the execution time for classifier training. More importantly, it can decrease the effect of class imbalance and perform very well in the classification task. Our experiments use 44 class imbalance datasets from KEEL to build four types of classification models: C4.5, k-NN, Naïve Bayes and MLP. The classifier ensemble algorithm is also employed. In addition, two kinds of clustering techniques and three kinds of instance selection algorithms are used in order to find the combination best suited to the proposed method. The experimental results show that the proposed framework performs better than many well-known state-of-the-art approaches in terms of AUC. In particular, the proposed framework combined with bagging-based MLP ensemble classifiers performs best, achieving an AUC of 92%.
吳思翰. "Combine Particle Swarm Optimization and Mahalonobis-Taguchi System for Solving Classification Problem in Imbalance Data." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/06887158161687794935.
Chang, Yu-shan, and 張毓珊. "Developing Data Mining Models for Class Imbalance Problems." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/57781951199735409394.
Chaoyang University of Technology (朝陽科技大學), Master's Program, Department of Information Management (資訊管理系碩士班), 98
In classification problems, class imbalance biases the training of classifiers and results in low predictive accuracy on the minority class examples. The problem is caused by imbalanced data in which almost all examples belong to one class and far fewer instances belong to the others. Compared with the majority examples, the minority examples usually form the more interesting class, such as rare diseases in medical diagnosis data, failures in inspection data, frauds in credit screening data, and so on. When inducing knowledge from an imbalanced data set, traditional data mining algorithms seek high classification accuracy for the majority class but yield an unacceptable error rate for the minority class. Therefore, they are not suitable for handling class imbalanced data. In order to tackle the class imbalance problem, this study aims to (1) find a robust classifier among different candidates including Decision Tree (DT), Logistic Regression (LR), Mahalanobis Distance (MD), and Support Vector Machines (SVM); and (2) propose two novel methods called MD-SVM (a new two-phase learning scheme) and SWAI (SOM Weights As Input). Experimental results indicate that the proposed MD-SVM and SWAI perform better in identifying the minority class examples than traditional techniques such as under-sampling, cost adjusting, and cluster-based sampling.
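The claim that high overall accuracy can hide an unacceptable minority error rate is easy to check with a small calculation. The Python sketch below is an illustration only; the 990:10 class split, the always-majority "classifier", and the helper names are assumptions, not data or methods from the thesis.

```python
def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, cls):
    """Fraction of the true members of `cls` that were predicted as `cls`."""
    predictions = [p for t, p in zip(y_true, y_pred) if t == cls]
    if not predictions:
        raise ValueError(f"class {cls!r} does not occur in y_true")
    return sum(p == cls for p in predictions) / len(predictions)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical data: 990 majority (0) and 10 minority (1) examples,
# scored against a degenerate model that always predicts the majority class.
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000

print(accuracy(y_true, y_pred))           # share of correct predictions overall
print(recall(y_true, y_pred, 1))          # share of minority examples found
print(balanced_accuracy(y_true, y_pred))  # average of both class recalls
```

Here the overall accuracy is 99% although not a single minority example is detected, which is why the works in this list tend to report minority recall, balanced measures, or AUC rather than plain accuracy.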
Liu, Yi-Hsun, and 劉奕勛. "Deep Discriminative Features Learning and Sampling for Imbalanced Data Problem." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/3cc7k8.
National Chiao Tung University (國立交通大學), Institute of Computer Science and Engineering (資訊科學與工程研究所), 106
The imbalanced data problem occurs in many application domains and is considered a challenging problem in machine learning and data mining. Oversampling may lead to overfitting, while undersampling may discard representative data samples. Additionally, most resampling methods that generate synthetic data focus on the minority class without considering the data distribution of the majority classes. This paper presents an algorithm that combines feature embedding with the loss functions from discriminative feature learning in deep learning to generate synthetic data samples. In contrast to previous works, the proposed method considers both majority classes and minority classes to learn feature embeddings and utilizes appropriate loss functions to make the feature embeddings as discriminative as possible. The proposed method is a comprehensive framework, and different feature extractors can be utilized for different domains. We conduct experiments on eight numerical datasets and one image dataset based on multiclass classification tasks. The experimental results indicate that the proposed method provides accurate and stable results. Additionally, we thoroughly investigate the proposed method and utilize a visualization technique to determine why it can generate good data samples.
Books on the topic "Data imbalance problem"
Zabelina, Ol'ga, Irina Omel'chenko, Anna Mayorova, and Ekaterina Safonova. Human resource Development in the Digital Age: Strategic Challenges, Challenges, and Opportunities. ru: INFRA-M Academic Publishing LLC., 2021. http://dx.doi.org/10.12737/1243772.
Bennett, Jeremy, and Kara Siegrist. Myocardial Ischemia. Edited by Matthew D. McEvoy and Cory M. Furse. Oxford University Press, 2017. http://dx.doi.org/10.1093/med/9780190226459.003.0005.
Kirchman, David L. The nitrogen cycle. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780198789406.003.0012.
Book chapters on the topic "Data imbalance problem"
Ling, Charles X., and Victor S. Sheng. "Class Imbalance Problem." In Encyclopedia of Machine Learning and Data Mining, 204–5. Boston, MA: Springer US, 2017. http://dx.doi.org/10.1007/978-1-4899-7687-1_110.
Gosain, Anjana, Arushi Gupta, and Deepika Singh. "Hybrid Data-Level Techniques for Class Imbalance Problem." In Advances in Intelligent Systems and Computing, 1131–41. Singapore: Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-5113-0_95.
Kozal, Jȩdrzej, and Paweł Ksieniewicz. "Imbalance Reduction Techniques Applied to ECG Classification Problem." In Intelligent Data Engineering and Automated Learning – IDEAL 2019, 323–31. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-33617-2_33.
Hoens, T. Ryan, Qi Qian, Nitesh V. Chawla, and Zhi-Hua Zhou. "Building Decision Trees for the Multi-class Imbalance Problem." In Advances in Knowledge Discovery and Data Mining, 122–34. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-30217-6_11.
Himaja, D., T. Maruthi Padmaja, and P. Radha Krishna. "Oversample Based Large Scale Support Vector Machine for Online Class Imbalance Problem." In Big Data Analytics, 348–62. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-04780-1_24.
Hoens, T. Ryan, and Nitesh V. Chawla. "Generating Diverse Ensembles to Counter the Problem of Class Imbalance." In Advances in Knowledge Discovery and Data Mining, 488–99. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-13672-6_46.
Sasirekha, R., B. Kanisha, and S. Kaliraj. "Study on Class Imbalance Problem with Modified KNN for Classification." In Intelligent Data Communication Technologies and Internet of Things, 207–17. Singapore: Springer Singapore, 2022. http://dx.doi.org/10.1007/978-981-16-7610-9_15.
Malhotra, Ruchika, and Kusum Lata. "Tackling the Imbalanced Data in Software Maintainability Prediction Using Ensembles for Class Imbalance Problem." In Advances in Interdisciplinary Research in Engineering and Business Management, 391–99. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-16-0037-1_31.
Zangari, Murilo, Wesley Romão, and Ademir Aparecido Constantino. "Extensions of Ant-Miner Algorithm to Deal with Class Imbalance Problem." In Intelligent Data Engineering and Automated Learning - IDEAL 2012, 9–18. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-32639-4_2.
Al_Janabi, Samaher, and Fatma Razaq. "A Novel Tool DSMOTE to Handel Imbalance Customer Churn Problem in Telecommunication Industry." In Big Data and Networks Technologies, 36–50. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-23672-4_4.
Conference papers on the topic "Data imbalance problem"
Li, Yanling, Guoshe Sun, and Yehang Zhu. "Data Imbalance Problem in Text Classification." In 2010 Third International Symposium on Information Processing (ISIP). IEEE, 2010. http://dx.doi.org/10.1109/isip.2010.47.
Sarmanova, Akkenzhe, and S. Albayrak. "Alleviating class imbalance problem in data mining." In 2013 21st Signal Processing and Communications Applications Conference (SIU). IEEE, 2013. http://dx.doi.org/10.1109/siu.2013.6531574.
Johnson, Reid A., Nitesh V. Chawla, and Jessica J. Hellmann. "Species distribution modeling and prediction: A class imbalance problem." In 2012 Conference on Intelligent Data Understanding (CIDU). IEEE, 2012. http://dx.doi.org/10.1109/cidu.2012.6382186.
Wang, Jing, and Min-Ling Zhang. "Towards Mitigating the Class-Imbalance Problem for Partial Label Learning." In KDD '18: The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 2018. http://dx.doi.org/10.1145/3219819.3220008.
An, Chunsheng, Jingtong Sun, Yifeng Wang, and Qingjie Wei. "A K-means Improved CTGAN Oversampling Method for Data Imbalance Problem." In 2021 IEEE 21st International Conference on Software Quality, Reliability and Security (QRS). IEEE, 2021. http://dx.doi.org/10.1109/qrs54544.2021.00097.
Su, Guangxin, Weitong Chen, and Miao Xu. "Positive-Unlabeled Learning from Imbalanced Data." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/412.
Rashu, Raisul Islam, Naheena Haq, and Rashedur M. Rahman. "Data mining approaches to predict final grade by overcoming class imbalance problem." In 2014 17th International Conference on Computer and Information Technology (ICCIT). IEEE, 2014. http://dx.doi.org/10.1109/iccitechn.2014.7073095.
Mwangi, Peter Irungu, Lawrence Nderu, Leah Mutanu, and Dorcas Gicuku Mwigereri. "Hybrid Ensemble Model for Handling Class Imbalance Problem in Big Data Analytics." In 2022 International Conference on Electrical, Computer and Energy Technologies (ICECET). IEEE, 2022. http://dx.doi.org/10.1109/icecet55527.2022.9872764.
Baro, Pranita, and Malaya Dutta Borah. "A Hybrid Resampling Approach to Handle Class Imbalance Problem and Missing Data." In 2022 IEEE 9th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON). IEEE, 2022. http://dx.doi.org/10.1109/upcon56432.2022.9986452.
Zhang, Xiaowan, and Bao-Gang Hu. "Learning in the Class Imbalance Problem When Costs are Unknown for Errors and Rejects." In 2012 IEEE 12th International Conference on Data Mining Workshops. IEEE, 2012. http://dx.doi.org/10.1109/icdmw.2012.167.
Reports of organizations on the topic "Data imbalance problem"
Lurie, Susan, John Labavitch, Ruth Ben-Arie, and Ken Shackel. Woolliness in Peaches and Nectarines. United States Department of Agriculture, 1995. http://dx.doi.org/10.32747/1995.7570557.bard.