Journal articles on the topic 'XGBOOST PREDICTION MODEL'

To see the other types of publications on this topic, follow the link: XGBOOST PREDICTION MODEL.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'XGBOOST PREDICTION MODEL.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Zhao, Haolei, Yixian Wang, Xian Li, Panpan Guo, and Hang Lin. "Prediction of Maximum Tunnel Uplift Caused by Overlying Excavation Using XGBoost Algorithm with Bayesian Optimization." Applied Sciences 13, no. 17 (August 28, 2023): 9726. http://dx.doi.org/10.3390/app13179726.

Full text
Abstract:
The uplifting behaviors of existing tunnels due to overlying excavations are complex and non-linear. They are influenced by multiple factors and are therefore difficult to predict accurately. To address this issue, an extreme gradient boosting (XGBoost) prediction model based on Bayesian optimization (BO), namely, BO-XGBoost, was developed specifically for assessing tunnel uplift. The modified model incorporated various factors such as the engineering design, soil types, and site construction conditions as input parameters. The performance of the BO-XGBoost model was compared with other models such as support vector machines (SVMs), the classification and regression tree (CART) model, and the extreme gradient boosting (XGBoost) model. In preparation for the model, 170 datasets from a construction site were collected and divided into 70% for training and 30% for testing. The BO-XGBoost model demonstrated a superior predictive performance, providing the most accurate displacement predictions and exhibiting better generalization capabilities. Further analysis revealed that the accuracy of the BO-XGBoost model was primarily influenced by the site’s construction factors. The interpretability of the BO-XGBoost model will provide valuable guidance for geotechnical practitioners in their decision-making processes.
APA, Harvard, Vancouver, ISO, and other styles
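For readers who want to see the general shape of a BO-XGBoost regression pipeline like the one described in the abstract above, here is a minimal, hedged sketch. It assumes scikit-optimize and xgboost are installed; the synthetic data, feature count, and search ranges are illustrative placeholders, not the authors' actual setup.

```python
# Hedged sketch: Bayesian-optimized XGBoost regression (BO-XGBoost-style),
# using synthetic data as a stand-in for the 170-sample tunnel-uplift dataset.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from skopt import BayesSearchCV          # assumes scikit-optimize is installed
from skopt.space import Integer, Real
from xgboost import XGBRegressor

# Synthetic stand-in: 170 samples, a handful of toy design/soil/site features.
X, y = make_regression(n_samples=170, n_features=8, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)  # 70/30 split

search = BayesSearchCV(
    estimator=XGBRegressor(objective="reg:squarederror", random_state=0),
    search_spaces={                      # illustrative ranges, not the paper's
        "n_estimators": Integer(100, 500),
        "max_depth": Integer(3, 10),
        "learning_rate": Real(0.01, 0.3, prior="log-uniform"),
        "subsample": Real(0.6, 1.0),
    },
    n_iter=25, cv=5, scoring="neg_root_mean_squared_error", random_state=0,
)
search.fit(X_tr, y_tr)
print("best hyperparameters:", search.best_params_)
rmse = np.sqrt(mean_squared_error(y_te, search.predict(X_te)))
print("test RMSE:", round(float(rmse), 3))
```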
2

Gu, Xinqin, Li Yao, and Lifeng Wu. "Prediction of Water Carbon Fluxes and Emission Causes in Rice Paddies Using Two Tree-Based Ensemble Algorithms." Sustainability 15, no. 16 (August 13, 2023): 12333. http://dx.doi.org/10.3390/su151612333.

Full text
Abstract:
Quantification of water carbon fluxes in rice paddies and analysis of their causes are essential for agricultural water management and carbon budgets. In this regard, two tree-based machine learning models, extreme gradient boosting (XGBoost) and random forest (RF), were constructed to predict evapotranspiration (ET), net ecosystem carbon exchange (NEE), and methane flux (FCH4) in seven rice paddy sites. During the training process, the k-fold cross-validation algorithm was applied by splitting the available data into multiple subsets or folds to avoid overfitting, and the XGBoost model was used to assess the importance of input factors. When predicting ET, the XGBoost model outperformed the RF model at all sites. Solar radiation was the most important input to ET predictions. For NEE prediction, the XGBoost models also performed better at the other six sites, except for the KR-CRK site, and the root mean square error decreased by 0.90–11.21% compared to the RF models. Among all sites (except for the absence of net radiation (NETRAD) data at the JP-Mse site), NETRAD and normalized difference vegetation index (NDVI) performed well for predicting NEE. Air temperature, soil water content (SWC), and longwave radiation were particularly important at individual sites. Similarly, the XGBoost model was more capable of predicting FCH4 than the RF model, except for the IT-Cas site. FCH4 sensitivity to input factors varied from site to site. SWC, ecosystem respiration, NDVI, and soil temperature were important for FCH4 prediction. It is proposed to use the XGBoost model to model water carbon fluxes in rice paddies.
APA, Harvard, Vancouver, ISO, and other styles
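The core workflow in this abstract, k-fold cross-validation of an XGBoost regressor followed by a feature-importance ranking, can be sketched briefly. The driver columns and toy target below are hypothetical placeholders, not the flux-tower data used in the paper.

```python
# Hedged sketch: k-fold CV of an XGBoost regressor plus feature-importance
# ranking, mirroring the ET/NEE/FCH4 workflow at a high level on toy data.
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, cross_val_score
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
# Hypothetical driver columns standing in for radiation/temperature/soil inputs.
cols = ["NETRAD", "NDVI", "Tair", "SWC", "LW_in"]
X = pd.DataFrame(rng.normal(size=(500, len(cols))), columns=cols)
y = 2.0 * X["NETRAD"] + 0.5 * X["NDVI"] + rng.normal(scale=0.3, size=500)  # toy ET target

model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)          # k-fold to limit overfitting
rmse = -cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
print("CV RMSE per fold:", np.round(rmse, 3))

model.fit(X, y)                                               # rank the input factors
for name, imp in sorted(zip(cols, model.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```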
3

Liu, Jialin, Jinfa Wu, Siru Liu, Mengdie Li, Kunchang Hu, and Ke Li. "Predicting mortality of patients with acute kidney injury in the ICU using XGBoost model." PLOS ONE 16, no. 2 (February 4, 2021): e0246306. http://dx.doi.org/10.1371/journal.pone.0246306.

Full text
Abstract:
Purpose The goal of this study is to construct a mortality prediction model using the XGBoost (eXtreme Gradient Boosting) decision tree model for AKI (acute kidney injury) patients in the ICU (intensive care unit), and to compare its performance with that of three other machine learning models. Methods We used the eICU Collaborative Research Database (eICU-CRD) for model development and performance comparison. The prediction performance of the XGBoost model was compared with the other three machine learning models. These models included LR (logistic regression), SVM (support vector machines), and RF (random forest). In the model comparison, the AUROC (area under receiver operating curve), accuracy, precision, recall, and F1 score were used to evaluate the predictive performance of each model. Results A total of 7548 AKI patients were analyzed in this study. The overall in-hospital mortality of AKI patients was 16.35%. The best performing algorithm in this study was XGBoost with the highest AUROC (0.796, p < 0.01), F1 (0.922, p < 0.01) and accuracy (0.860). The precision (0.860) and recall (0.994) of the XGBoost model rank second among the four models. Conclusion The XGBoost model had clear performance advantages compared to the other machine learning models. This will be helpful for risk identification and early intervention for AKI patients at risk of death.
APA, Harvard, Vancouver, ISO, and other styles
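The model comparison described here (XGBoost vs. LR, SVM, and RF on AUROC, accuracy, precision, recall, and F1) follows a standard pattern that can be sketched as below. The synthetic cohort and class imbalance are placeholders for the eICU-CRD data, which are not reproduced here.

```python
# Hedged sketch: comparing XGBoost with LR, SVM, and RF on standard
# classification metrics, using synthetic data in place of the eICU-CRD cohort.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from xgboost import XGBClassifier

X, y = make_classification(n_samples=7548, n_features=20, weights=[0.84], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "XGBoost": XGBClassifier(n_estimators=300, max_depth=4, eval_metric="logloss"),
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(probability=True),
    "RF": RandomForestClassifier(n_estimators=300, random_state=0),
}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    proba = clf.predict_proba(X_te)[:, 1]
    pred = (proba >= 0.5).astype(int)
    print(name,
          f"AUROC={roc_auc_score(y_te, proba):.3f}",
          f"acc={accuracy_score(y_te, pred):.3f}",
          f"prec={precision_score(y_te, pred):.3f}",
          f"rec={recall_score(y_te, pred):.3f}",
          f"F1={f1_score(y_te, pred):.3f}")
```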
4

Wang, Jun, Wei Rong, Zhuo Zhang, and Dong Mei. "Credit Debt Default Risk Assessment Based on the XGBoost Algorithm: An Empirical Study from China." Wireless Communications and Mobile Computing 2022 (March 19, 2022): 1–14. http://dx.doi.org/10.1155/2022/8005493.

Full text
Abstract:
The bond market is an important part of China’s capital market. However, defaults have become frequent in the bond market in recent years, and consequently, the default risk of Chinese credit bonds has become increasingly prominent. Therefore, the assessment of default risk is particularly important. In this paper, we utilize 31 indicators at the macroeconomic level and the corporate microlevel for the prediction of bond defaults, and we conduct principal component analysis to extract 10 principal components from them. We use the XGBoost algorithm to analyze the importance of variables and assess the credit debt default risk based on the XGBoost prediction model through the calculation of evaluation indicators such as the area under the ROC curve (AUC), accuracy, precision, recall, and F1-score, in order to evaluate the classification prediction effect of the model. Finally, the grid search algorithm and k-fold cross-validation are used to optimize the parameters of the XGBoost model and determine the final classification prediction model. Existing research has focused on the selection of bond default risk prediction indicators and the application of the XGBoost algorithm in default risk prediction. After optimization of the parameters, the optimized XGBoost algorithm is found to be more accurate than the original algorithm. The grid search and k-fold cross-validation algorithms are used to optimize the XGBoost model for predicting the default risk of credit bonds, resulting in higher accuracy of the proposed model. Our research results demonstrate that the optimized XGBoost model has significantly improved prediction accuracy compared to the original model, which is beneficial to improving the prediction effect for practical applications.
APA, Harvard, Vancouver, ISO, and other styles
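The pipeline described above (PCA to 10 components, then a grid-searched XGBoost classifier with k-fold cross-validation) maps cleanly onto a scikit-learn pipeline. The sketch below uses synthetic data and an illustrative parameter grid, not the paper's 31 indicators or tuned values.

```python
# Hedged sketch: PCA (10 components) feeding a grid-searched XGBoost classifier
# with k-fold CV, echoing the bond-default pipeline on synthetic data.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=31, n_informative=12, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),            # 31 indicators -> 10 principal components
    ("xgb", XGBClassifier(eval_metric="logloss", random_state=0)),
])
grid = {                                       # illustrative grid, not the paper's
    "xgb__n_estimators": [100, 300],
    "xgb__max_depth": [3, 5],
    "xgb__learning_rate": [0.05, 0.1],
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, grid, cv=cv, scoring="roc_auc")
search.fit(X, y)
print("best CV AUC:", round(search.best_score_, 3), "best params:", search.best_params_)
```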
5

Gu, Zhongyuan, Miaocong Cao, Chunguang Wang, Na Yu, and Hongyu Qing. "Research on Mining Maximum Subsidence Prediction Based on Genetic Algorithm Combined with XGBoost Model." Sustainability 14, no. 16 (August 22, 2022): 10421. http://dx.doi.org/10.3390/su141610421.

Full text
Abstract:
The extreme gradient boosting (XGBoost) ensemble learning algorithm excels in solving complex nonlinear relational problems. In order to accurately predict the surface subsidence caused by mining, this work introduces a genetic algorithm (GA) and XGBoost integrated algorithm model for mining subsidence prediction and uses the Python language to develop the GA-XGBoost combined model. The hyperparameter vector of XGBoost is optimized by a genetic algorithm to improve the prediction accuracy and reliability of the XGBoost model. Using some domestic mining subsidence data sets to conduct a model prediction evaluation, the results show that the R2 (coefficient of determination) of the prediction results of the GA-XGBoost model is 0.941, the RMSE (root mean square error) is 0.369, and the MAE (mean absolute error) is 0.308. Compared with classic ensemble learning models such as XGBoost, random deep forest, and gradient boosting, the GA-XGBoost model has higher prediction accuracy and better performance than a single machine learning model.
APA, Harvard, Vancouver, ISO, and other styles
6

Kang, Leilei, Guojing Hu, Hao Huang, Weike Lu, and Lan Liu. "Urban Traffic Travel Time Short-Term Prediction Model Based on Spatio-Temporal Feature Extraction." Journal of Advanced Transportation 2020 (August 14, 2020): 1–16. http://dx.doi.org/10.1155/2020/3247847.

Full text
Abstract:
In order to improve the accuracy of short-term travel time prediction in an urban road network, a hybrid model for spatio-temporal feature extraction and prediction of urban road network travel time is proposed in this research, which combines empirical dynamic modeling (EDM) and complex networks (CN) with an XGBoost prediction model. Due to the highly nonlinear and dynamic nature of travel time series, it is necessary to consider the time dependence and spatial reliance of travel time series for predicting the travel time of road networks. The dynamic feature of the travel time series can be revealed by the EDM method, a nonlinear approach based on Chaos theory. Further, the spatial characteristic of urban traffic topology can be reflected from the perspective of complex networks. To fully guarantee the reasonableness and validity of the spatio-temporal features extracted by empirical dynamic modeling and complex networks (EDMCN) for urban traffic travel time prediction, an XGBoost prediction model is established on those characteristics. Through an in-depth exploration of the travel time and topology of a particular road network in Guiyang, the EDMCN-XGBoost prediction model’s performance is verified. The results show that, compared with the single XGBoost, autoregressive moving average, artificial neural network, support vector machine, and other models, the proposed EDMCN-XGBoost prediction model presents a better performance in forecasting.
APA, Harvard, Vancouver, ISO, and other styles
7

Wang, Wenle, Wentao Xiong, Jing Wang, Lei Tao, Shan Li, Yugen Yi, Xiang Zou, and Cui Li. "A User Purchase Behavior Prediction Method Based on XGBoost." Electronics 12, no. 9 (April 28, 2023): 2047. http://dx.doi.org/10.3390/electronics12092047.

Full text
Abstract:
With the increasing use of electronic commerce, the number of online purchasers has been rising rapidly. Predicting user behavior from the collected data has therefore become a vital issue. However, traditional machine learning algorithms for prediction require significant computing time and often produce unsatisfactory results. In this paper, a prediction model based on XGBoost is proposed to predict user purchase behavior. Firstly, a user value model (LDTD) utilizing multi-feature fusion is proposed to differentiate between user types based on the available user account data. The multi-feature behavior fusion is carried out to generate the user tag feature according to user behavior patterns. Next, the XGBoost feature importance model is employed to analyze multi-dimensional features and identify the feature with the most significant weight value as the key feature for constructing the model. This feature, together with other user features, is then used for prediction via the XGBoost model. Compared to existing machine learning models such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Random Forest (RF), and Back Propagation Neural Network (BPNN), the eXtreme Gradient Boosting (XGBoost) model outperforms with an accuracy of 0.9761, an F1 score of 0.9763, and an ROC value of 0.9768. Thus, the XGBoost model demonstrates superior stability and algorithm efficiency, making it an ideal choice for predicting user purchase behavior with high levels of accuracy.
APA, Harvard, Vancouver, ISO, and other styles
8

Oubelaid, Adel, Abdelhameed Ibrahim, and Ahmed M. Elshewey. "Bridging the Gap: An Explainable Methodology for Customer Churn Prediction in Supply Chain Management." Journal of Artificial Intelligence and Metaheuristics 4, no. 1 (2023): 16–23. http://dx.doi.org/10.54216/jaim.040102.

Full text
Abstract:
Customer churn prediction is a critical task for businesses aiming to retain their valuable customers. Nevertheless, the lack of transparency and interpretability in machine learning models hinders their implementation in real-world applications. In this paper, we introduce a novel methodology for customer churn prediction in supply chain management that addresses the need for explainability. Our approach takes advantage of XGBoost as the underlying predictive model. We recognize the importance of not only accurately predicting churn but also providing actionable insights into the key factors driving customer attrition. To achieve this, we employ Local Interpretable Model-agnostic Explanations (LIME), a state-of-the-art technique for generating intuitive and understandable explanations. By applying LIME to the predictions made by XGBoost, we enable decision-makers to gain insight into the decision process of the model and the reasons behind churn predictions. Through a comprehensive case study on customer churn data, we demonstrate the effectiveness of our explainable ML approach. Our methodology not only achieves high prediction accuracy but also offers interpretable explanations that highlight the underlying drivers of customer churn. These insights supply valuable guidance for decision-making processes within supply chain management.
APA, Harvard, Vancouver, ISO, and other styles
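Pairing LIME with an XGBoost classifier, as this abstract describes, typically looks like the sketch below. It assumes the lime package is installed; the synthetic features, class names, and churn labels are placeholders rather than the authors' supply-chain dataset.

```python
# Hedged sketch: explaining a single XGBoost churn prediction with LIME,
# on synthetic tabular data with hypothetical feature names.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from lime.lime_tabular import LimeTabularExplainer   # assumes the lime package

feature_names = [f"feature_{i}" for i in range(10)]  # hypothetical churn drivers
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss").fit(X_tr, y_tr)

explainer = LimeTabularExplainer(X_tr, feature_names=feature_names,
                                 class_names=["stay", "churn"], mode="classification")
exp = explainer.explain_instance(X_te[0], model.predict_proba, num_features=5)
for rule, weight in exp.as_list():                   # top local drivers of this prediction
    print(f"{rule}: {weight:+.3f}")
```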
9

Liu, Yuan, Wenyi Du, Yi Guo, Zhiqiang Tian, and Wei Shen. "Identification of high-risk factors for recurrence of colon cancer following complete mesocolic excision: An 8-year retrospective study." PLOS ONE 18, no. 8 (August 11, 2023): e0289621. http://dx.doi.org/10.1371/journal.pone.0289621.

Full text
Abstract:
Background Colon cancer recurrence is a common adverse outcome for patients after complete mesocolic excision (CME) and greatly affects the near-term and long-term prognosis of patients. This study aimed to develop a machine learning model that can identify high-risk factors before, during, and after surgery, and predict the occurrence of postoperative colon cancer recurrence. Methods The study included 1187 patients with colon cancer, including 110 patients who had recurrent colon cancer. The researchers collected 44 characteristic variables, including patient demographic characteristics, basic medical history, preoperative examination information, type of surgery, and intraoperative information. Four machine learning algorithms, namely extreme gradient boosting (XGBoost), random forest (RF), support vector machine (SVM), and k-nearest neighbor algorithm (KNN), were used to construct the model. The researchers evaluated the model using the k-fold cross-validation method, ROC curve, calibration curve, decision curve analysis (DCA), and external validation. Results Among the four prediction models, the XGBoost algorithm performed the best. The ROC curve results showed that the AUC value of XGBoost was 0.962 in the training set and 0.952 in the validation set, indicating high prediction accuracy. The XGBoost model was stable during internal validation using the k-fold cross-validation method. The calibration curve demonstrated high predictive ability of the XGBoost model. The DCA curve showed that patients who received interventional treatment had a higher benefit rate under the XGBoost model. The external validation set’s AUC value was 0.91, indicating good extrapolation of the XGBoost prediction model. Conclusion The XGBoost machine learning algorithm-based prediction model for colon cancer recurrence has high prediction accuracy and clinical utility.
APA, Harvard, Vancouver, ISO, and other styles
10

He, Wenwen, Hongli Le, and Pengcheng Du. "Stroke Prediction Model Based on XGBoost Algorithm." International Journal of Applied Sciences & Development 1 (December 13, 2022): 7–10. http://dx.doi.org/10.37394/232029.2022.1.2.

Full text
Abstract:
In this paper, randomly measured individual sample data are preprocessed: for example, outlier values are deleted and the characteristics of the samples are normalized to between 0 and 1. The correlation analysis approach is then used to determine and rank the relevance of stroke characteristics, and factors with poor correlation are discarded. The samples are randomly split into a 70% training set and a 30% testing set. Finally, the random forest model and the XGBoost algorithm combined with cross-validation and the grid search method are implemented to learn the stroke characteristics. The accuracy on the testing set with the XGBoost algorithm is 0.9257, which is better than that of the random forest model with 0.8991. Thus, the XGBoost model is selected to predict stroke for ten people, and the obtained conclusion is that two people have a stroke and eight people have no stroke.
APA, Harvard, Vancouver, ISO, and other styles
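The preprocessing and tuning steps described above (outlier removal, normalization to [0, 1], correlation-based feature screening, a 70/30 split, and grid search over XGBoost) can be sketched as follows. The toy data, the |z| < 3 outlier rule, and the 0.05 correlation cutoff are assumptions, not the paper's choices.

```python
# Hedged sketch of a stroke-style preprocessing and tuning pipeline on toy data.
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import MinMaxScaler
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 6)), columns=[f"x{i}" for i in range(6)])
df["stroke"] = (df["x0"] + 0.5 * df["x1"] + rng.normal(scale=0.5, size=1000) > 1).astype(int)

feats = df.drop(columns="stroke")
z = (feats - feats.mean()) / feats.std()
df = df[(z.abs() < 3).all(axis=1)]                        # crude outlier removal (|z| < 3)

X = df.drop(columns="stroke")
corr = X.corrwith(df["stroke"]).abs().sort_values(ascending=False)
keep = corr[corr > 0.05].index                            # discard weakly correlated features
X = pd.DataFrame(MinMaxScaler().fit_transform(X[keep]), columns=keep)  # scale to [0, 1]

X_tr, X_te, y_tr, y_te = train_test_split(X, df["stroke"], test_size=0.3, random_state=0)
grid = GridSearchCV(XGBClassifier(eval_metric="logloss", random_state=0),
                    {"max_depth": [3, 5], "n_estimators": [100, 300]}, cv=5)
grid.fit(X_tr, y_tr)
print("test accuracy:", round(grid.score(X_te, y_te), 4))
```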
11

Shin, Juyoung, Joonyub Lee, Taehoon Ko, Kanghyuck Lee, Yera Choi, and Hun-Sung Kim. "Improving Machine Learning Diabetes Prediction Models for the Utmost Clinical Effectiveness." Journal of Personalized Medicine 12, no. 11 (November 14, 2022): 1899. http://dx.doi.org/10.3390/jpm12111899.

Full text
Abstract:
The early prediction of diabetes can facilitate interventions to prevent or delay it. This study proposes a diabetes prediction model based on machine learning (ML) to encourage individuals at risk of diabetes to employ healthy interventions. A total of 38,379 subjects were included. We trained the model on 80% of the subjects and verified its predictive performance on the remaining 20%. Furthermore, the performances of several algorithms were compared, including logistic regression, decision tree, random forest, eXtreme Gradient Boosting (XGBoost), Cox regression, and XGBoost Survival Embedding (XGBSE). The area under the receiver operating characteristic curve (AUROC) of the XGBoost model was the largest, followed by those of the decision tree, logistic regression, and random forest models. For the survival analysis, XGBSE yielded an AUROC exceeding 0.9 for the 2- to 9-year predictions and a C-index of 0.934, while the Cox regression achieved a C-index of 0.921. After lowering the threshold from 0.5 to 0.25, the sensitivity increased from 0.011 to 0.236 for the 2-year prediction model and from 0.607 to 0.994 for the 9-year prediction model, while the specificity showed negligible changes. We developed a high-performance diabetes prediction model that applied the XGBSE algorithm with threshold adjustment. We plan to use this prediction model in real clinical practice for diabetes prevention after simplifying and validating it externally.
APA, Harvard, Vancouver, ISO, and other styles
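The threshold adjustment reported in this abstract (lowering the decision cutoff from 0.5 to 0.25 to raise sensitivity) is a generic post-processing step that can be illustrated briefly. The synthetic data and class imbalance below are placeholders, not the study's cohort.

```python
# Hedged sketch: how lowering a classification threshold (0.5 -> 0.25) trades
# specificity for sensitivity, shown with a plain XGBoost classifier on toy data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, n_features=15, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
proba = XGBClassifier(eval_metric="logloss").fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

for threshold in (0.5, 0.25):
    pred = (proba >= threshold).astype(int)
    tp = ((pred == 1) & (y_te == 1)).sum()
    tn = ((pred == 0) & (y_te == 0)).sum()
    fp = ((pred == 1) & (y_te == 0)).sum()
    fn = ((pred == 0) & (y_te == 1)).sum()
    print(f"threshold={threshold}: sensitivity={tp / (tp + fn):.3f}, "
          f"specificity={tn / (tn + fp):.3f}")
```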
12

Wu, Kehe, Yanyu Chai, Xiaoliang Zhang, and Xun Zhao. "Research on Power Price Forecasting Based on PSO-XGBoost." Electronics 11, no. 22 (November 16, 2022): 3763. http://dx.doi.org/10.3390/electronics11223763.

Full text
Abstract:
With the reform of the power system, the prediction of power market pricing has become one of the key problems that needs to be solved in time. Power price prediction plays an important role in maximizing the profits of the participants in the power market and making full use of power energy. In order to improve the prediction accuracy of the power price, this paper proposes a power price prediction method based on PSO optimization of the XGBoost model, which optimizes eight main parameters of the XGBoost model through particle swarm optimization to improve the prediction accuracy of the XGBoost model. Using the electricity price data of Australia from January to December 2019, the proposed model is compared with the XGBoost model. The experimental results show that PSO can effectively improve the performance of the model. In addition, the prediction results of PSO-XGBoost are compared with those of SVM, LSTM, ARIMA, RW and XGBoost, and the average relative error and root mean square error of different power price prediction models are calculated. The experimental results show that the prediction accuracy of the PSO-XGBoost model is higher and more in line with the actual trend of power price change.
APA, Harvard, Vancouver, ISO, and other styles
13

Xiong, Shuai, Zhixiang Liu, Chendi Min, Ying Shi, Shuangxia Zhang, and Weijun Liu. "Compressive Strength Prediction of Cemented Backfill Containing Phosphate Tailings Using Extreme Gradient Boosting Optimized by Whale Optimization Algorithm." Materials 16, no. 1 (December 28, 2022): 308. http://dx.doi.org/10.3390/ma16010308.

Full text
Abstract:
Unconfined compressive strength (UCS) is the most significant mechanical index for cemented backfill, and it is mainly determined by traditional mechanical tests. This study optimized the extreme gradient boosting (XGBoost) model by utilizing the whale optimization algorithm (WOA) to construct a hybrid model for the UCS prediction of cemented backfill. The PT proportion, the OPC proportion, the FA proportion, the solid concentration, and the curing age were selected as input variables, and the UCS of the cemented PT backfill was selected as the output variable. The original XGBoost model, the XGBoost model optimized by particle swarm optimization (PSO-XGBoost), and the decision tree (DT) model were also constructed for comparison with the WOA-XGBoost model. The results showed that the values of the root mean square error (RMSE), coefficient of determination (R2), and mean absolute error (MAE) obtained from the WOA-XGBoost model, XGBoost model, PSO-XGBoost model, and DT model were equal to (0.241, 0.967, 0.184), (0.426, 0.917, 0.336), (0.316, 0.943, 0.258), and (0.464, 0.852, 0.357), respectively. The results show that the proposed WOA-XGBoost has better prediction accuracy than the other machine learning models, confirming the ability of the WOA to enhance XGBoost in cemented PT backfill strength prediction. The WOA-XGBoost model could be a fast and accurate method for the UCS prediction of cemented PT backfill.
APA, Harvard, Vancouver, ISO, and other styles
14

Wang, Yu, Li Guo, Yanrui Zhang, and Xinyue Ma. "Research on CSI 300 Stock Index Price Prediction Based On EMD-XGBoost." Frontiers in Computing and Intelligent Systems 3, no. 1 (March 17, 2023): 72–77. http://dx.doi.org/10.54097/fcis.v3i1.6027.

Full text
Abstract:
The combination of artificial intelligence techniques and quantitative investment has given birth to various types of price prediction models based on machine learning algorithms. In this study, we verify the applicability of machine learning fused with statistical method models through the EMD-XGBoost model for stock price prediction. In the modeling process, specific solutions are proposed for the overfitting problems that arise. The stock prediction model of machine learning fused with statistical learning was constructed from an empirical perspective, and an XGBoost algorithm model based on empirical modal decomposition was proposed. The data set selected for the experiment was the closing price of the CSI 300 index, and the model was judged by four indicators: mean absolute error, mean error, root mean square error, etc. The method used for the experiment was the EMD-XGBoost network model, which has the following advantages: first, combining the empirical modal decomposition method with the XGBoost model is conducive to mining the time series data; second, the decomposition of the CSI 300 index data by the empirical modal decomposition method helps to improve the accuracy of the XGBoost model for time series data prediction. The experiments show that the EMD-XGBoost model outperforms the single ARIMA or LSTM network model as well as the EMD-LSTM network model in terms of mean absolute error, mean error, and root mean square error.
APA, Harvard, Vancouver, ISO, and other styles
15

Yang, Tian. "Sales Prediction of Walmart Sales Based on OLS, Random Forest, and XGBoost Models." Highlights in Science, Engineering and Technology 49 (May 21, 2023): 244–49. http://dx.doi.org/10.54097/hset.v49i.8513.

Full text
Abstract:
The technique of estimating future sales levels for a good or service is known as sales forecasting. The corresponding forecasting methods range from initially qualitative analysis to later time series methods, regression analysis and econometric models, as well as machine learning methods that have emerged in recent decades. This paper compares the different performances of OLS, Random Forest and XGBoost machine learning models in predicting the sales of Walmart stores. According to the analysis, XGBoost model has the best sales forecasting ability. In the case of logarithmic sales, R2 of the XGBoost model is as high as 0.984, while MSE and MAE are only 0.065 and 0.124, respectively. The XGBoost model is therefore an option when making sales forecasts. These results compare different types of models, find out the best prediction model, and provide suggestions for future prediction model selection.
APA, Harvard, Vancouver, ISO, and other styles
16

Li, Kunluo. "A Sales Prediction Method Based on XGBoost Algorithm Model." BCP Business & Management 36 (January 13, 2023): 367–71. http://dx.doi.org/10.54691/bcpbm.v36i.3487.

Full text
Abstract:
Reasonable and accurate sales forecasting is an important issue for large chain stores. Forecasting short- and long-term product sales helps companies develop marketing strategies and inventory turnover plans. In today's ever-changing business environment, the application of artificial intelligence technology allows for more efficient processing of large amounts of data while taking into account many external factors such as the climate, consumer patterns, and financial situation. An XGBoost linear regression model for the Kaggle competition was trained using the dataset of Ecuadorian Favorita chain stores that was made available. The suggested prediction model seeks to address the seasonality and data scarcity issues. In the context of machine learning, producing several samples for both training and testing aids in our ability to assess the model's efficacy. The most popular technique for detecting overfitting and underfitting issues is to create various samples of data for training and testing models. The experimental findings demonstrate that the XGBoost linear regression model can reasonably provide scientifically based predictions for chain store sales and has a high prediction accuracy.
APA, Harvard, Vancouver, ISO, and other styles
17

Yang, Hao, Jiaxi Li, Siru Liu, Xiaoling Yang, and Jialin Liu. "Predicting Risk of Hypoglycemia in Patients With Type 2 Diabetes by Electronic Health Record–Based Machine Learning: Development and Validation." JMIR Medical Informatics 10, no. 6 (June 16, 2022): e36958. http://dx.doi.org/10.2196/36958.

Full text
Abstract:
Background Hypoglycemia is a common adverse event in the treatment of diabetes. To efficiently cope with hypoglycemia, effective hypoglycemia prediction models need to be developed. Objective The aim of this study was to develop and validate machine learning models to predict the risk of hypoglycemia in adult patients with type 2 diabetes. Methods We used the electronic health records of all adult patients with type 2 diabetes admitted to West China Hospital between November 2019 and December 2021. The prediction model was developed based on XGBoost and natural language processing. F1 score, area under the receiver operating characteristic curve (AUC), and decision curve analysis (DCA) were used as the main criteria to evaluate model performance. Results We included 29,843 patients with type 2 diabetes, of whom 2804 patients (9.4%) developed hypoglycemia. In this study, the embedding machine learning model (XGBoost3) showed the best performance among all the models. The AUC and the accuracy of XGBoost are 0.82 and 0.93, respectively. XGBoost3 was also superior to the other models in DCA. Conclusions The Paragraph Vector–Distributed Memory model can effectively extract features and improve the performance of the XGBoost model, which can then effectively predict hypoglycemia in patients with type 2 diabetes.
APA, Harvard, Vancouver, ISO, and other styles
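The "embedding" model in this abstract combines Paragraph Vector (PV-DM) text features with XGBoost. A minimal sketch of that coupling is shown below; it assumes gensim and xgboost are installed, and the toy clinical notes and labels are placeholders with no relation to the West China Hospital data.

```python
# Hedged sketch: PV-DM (Doc2Vec) note embeddings fed into an XGBoost classifier,
# loosely mirroring the "XGBoost3" setup on toy text.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument   # assumes gensim
from xgboost import XGBClassifier

notes = ["insulin dose adjusted overnight", "metformin continued no events",
         "patient reported dizziness and sweating", "stable glucose on discharge"] * 50
labels = np.array([1, 0, 1, 0] * 50)                         # toy hypoglycemia labels

docs = [TaggedDocument(words=text.split(), tags=[i]) for i, text in enumerate(notes)]
d2v = Doc2Vec(docs, vector_size=32, dm=1, epochs=40, min_count=1, seed=0)  # dm=1 -> PV-DM

X = np.vstack([d2v.infer_vector(text.split()) for text in notes])
clf = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss").fit(X, labels)
print("training accuracy on toy notes:", clf.score(X, labels))
```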
18

Syafrudin, Muhammad, Ganjar Alfian, Norma Latif Fitriyani, Muhammad Anshari, Tony Hadibarata, Agung Fatwanto, and Jongtae Rhee. "A Self-Care Prediction Model for Children with Disability Based on Genetic Algorithm and Extreme Gradient Boosting." Mathematics 8, no. 9 (September 15, 2020): 1590. http://dx.doi.org/10.3390/math8091590.

Full text
Abstract:
Detecting self-care problems is one of the important and challenging issues for occupational therapists, since it requires a complex and time-consuming process. Machine learning algorithms have recently been applied to overcome this issue. In this study, we propose a self-care prediction model called GA-XGBoost, which combines genetic algorithms (GAs) with extreme gradient boosting (XGBoost) for predicting self-care problems of children with disability. Selecting the feature subset affects the model performance; thus, we utilize a GA to find the optimal feature subsets and thereby improve the model’s performance. To validate the effectiveness of GA-XGBoost, we present six experiments: comparing GA-XGBoost with other machine learning models and previous study results, a statistical significance test, an impact analysis of feature selection and comparison with other feature selection methods, and a sensitivity analysis of GA parameters. During the experiments, we use accuracy, precision, recall, and F1-score to measure the performance of the prediction models. The results show that GA-XGBoost obtains better performance than the other prediction models and the previous study results. In addition, we design and develop a web-based self-care prediction system to help therapists diagnose the self-care problems of children with disabilities. Therefore, appropriate treatment/therapy could be performed for each child to improve their therapeutic outcome.
APA, Harvard, Vancouver, ISO, and other styles
19

Li, Weihong, and Xiujuan Xu. "Ensemble learning algorithm - research analysis on the management of financial fraud and violation in listed companies." Decision Making: Applications in Management and Engineering 6, no. 2 (October 15, 2023): 722–33. http://dx.doi.org/10.31181/dmame622023785.

Full text
Abstract:
In recent years, despite the strict "zero tolerance" crackdown on the financial fraud and violation behavior of listed companies, the cases of financial fraud, revenue and profit overstatement, and suspected fraud have continued to be exposed. This study first established a financial fraud index system and used the XGBoost algorithm to construct a prediction model for financial fraud and violations of listed companies. The indicators were selected and input into the model. A dataset was obtained for experiments. The XGBoost algorithm was compared with two other algorithms. The receiver operator characteristic (ROC) curves showed that the XGBoost algorithm had the best prediction performance among the three algorithms. It was found that the precision of the XGBoost algorithm was 93.17%, the recall rate was 92.23%, the value was 0.9270, and the area under the curve was 0.90, indicating a better performance than the prediction models based on the Gradient Boosted Decision Tree (GBDT) algorithm and the Logistics algorithm. Considering the data of various evaluation indicators, it is found that the predictive effect of the financial fraud and violation prediction model built by the XGBoost algorithm is the best.
APA, Harvard, Vancouver, ISO, and other styles
20

Rasaizadi, Arash, and Seyedehsan Seyedabrishami. "Stacking Ensemble Learning Process to Predict Rural Road Traffic Flow." Journal of Advanced Transportation 2022 (June 1, 2022): 1–12. http://dx.doi.org/10.1155/2022/3198636.

Full text
Abstract:
By predicting and informing the future of traffic through intelligent transportation systems, there is more readiness to avoid traffic congestion. In this study, an ensemble learning process is proposed to predict the hourly traffic flow. First, three base models, including K-nearest neighbors, random forest, and recurrent neural network, are trained. Predictions of base models are given to the XGBoost stacking model and bagged average to determine the final prediction. Two groups of models predict traffic flow of short-term and mid-term future. In mid-term models, predictor features are cyclical temporal features, holidays, and weather conditions. In short-term models, in addition to the mentioned features, the observed traffic flow in the past 3 to 8 hours has been used. The results show that for both short-term and mid-term models, the least prediction error is obtained by the XGBoost model. In mid-term models, the root mean square error of the XGBoost for the Saveh to Tehran direction and Tehran to Saveh direction is 521 and 607 (veh/hr), respectively. For short-term models, these values are decreased to 453 and 386 (veh/hr). This model also brings less prediction error for predicting the first and fourth quartiles of the observed traffic flow as rare events.
APA, Harvard, Vancouver, ISO, and other styles
21

Lu, Xin, Cai Chen, RuiDan Gao, and ZhenZhen Xing. "Prediction of High-Speed Traffic Flow around City Based on BO-XGBoost Model." Symmetry 15, no. 7 (July 20, 2023): 1453. http://dx.doi.org/10.3390/sym15071453.

Full text
Abstract:
The prediction of high-speed traffic flow around the city is affected by multiple factors, which have certain particularity and difficulty. This study devised an asymmetric Bayesian optimization extreme gradient boosting (BO-XGBoost) model based on Bayesian optimization for the spatiotemporal and multigranularity prediction of high-speed traffic flow around a city. First, a traffic flow dataset for a ring expressway was constructed, and the data features were processed based on the original data. The data were then visualized, and their spatiotemporal distribution exhibited characteristics such as randomness, continuity, periodicity, and rising fluctuations. Secondly, a feature matrix was constructed monthly for the dataset, and the BO-XGBoost model was used for traffic flow prediction. The proposed model BO-XGBoost was compared with the symmetric model bidirectional long short-term memory and integrated models (random forest, extreme gradient boosting, and categorical boosting) that directly input temporal data. The R-squared (R2) of the BO-XGBoost model for predicting TF and PCU reached 0.90 and 0.87, respectively, with an average absolute percentage error of 2.88% and 3.12%, respectively. Thus, the proposed model achieved an accurate prediction of high-speed traffic flow around the province, providing a theoretical basis and data support for the development of central-city planning.
APA, Harvard, Vancouver, ISO, and other styles
22

Zhang, Chao, Yihang Zhao, and Huiru Zhao. "A Novel Hybrid Price Prediction Model for Multimodal Carbon Emission Trading Market Based on CEEMDAN Algorithm and Window-Based XGBoost Approach." Mathematics 10, no. 21 (November 1, 2022): 4072. http://dx.doi.org/10.3390/math10214072.

Full text
Abstract:
Accurate prediction of the carbon trading price (CTP) is crucial to the decision-making of relevant stakeholders, and can also provide a reference for policy makers. However, the time interval for the CTP is one day, resulting in a relatively small sample size of data available for predictions. When dealing with small sample data, deep learning algorithms can trade only a small improvement in prediction accuracy at the expense of efficiency and computing time. In contrast, fine-grained configurations of traditional model inputs and parameters often perform no less well than deep learning algorithms. In this context, this paper proposes a novel hybrid CTP prediction model based on the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and a window-based XGBoost approach. First, the initial CTP data is decomposed into multiple subsequences with relatively low volatility and randomness based on the CEEMDAN algorithm. Then, the decomposed carbon price series and covariates are subjected to windowed processing to become the inputs of the XGBoost model. Finally, the universality of the proposed model is verified through case studies of four carbon emission trading markets with different modal characteristics, and the superiority of the proposed model is verified by comparing with seven other models. The results show that the prediction error of the proposed XGBoost(W-b) algorithm is reduced by 4.72%~81.47% compared to other prediction algorithms. In addition, the introduction of CEEMDAN further reduces the prediction error by 25.24%~89.28% on the basis of XGBoost(W-b).
APA, Harvard, Vancouver, ISO, and other styles
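The window-based half of this pipeline, turning a univariate price series into lagged samples for XGBoost, is sketched below on a synthetic series. The CEEMDAN decomposition itself is not implemented here; applying the same windowing to each decomposed subsequence (e.g., via a package such as PyEMD) is assumed rather than shown.

```python
# Hedged sketch: sliding-window (lagged) features for one-step-ahead XGBoost
# forecasting of a toy carbon-price-like series; CEEMDAN is only indicated.
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
price = 50.0 + np.cumsum(rng.normal(scale=0.5, size=600))   # toy CTP-like series
# In the paper, each CEEMDAN subsequence would be windowed like this before modeling.

window = 10
X = np.array([price[i:i + window] for i in range(len(price) - window)])
y = price[window:]                                           # next-step target

split = int(0.8 * len(X))
model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05, random_state=0)
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])
mape = np.mean(np.abs((y[split:] - pred) / y[split:])) * 100
print(f"one-step-ahead MAPE on the toy series: {mape:.2f}%")
```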
23

Tang, Jinjun, Lanlan Zheng, Chunyang Han, Fang Liu, and Jianming Cai. "Traffic Incident Clearance Time Prediction and Influencing Factor Analysis Using Extreme Gradient Boosting Model." Journal of Advanced Transportation 2020 (June 9, 2020): 1–12. http://dx.doi.org/10.1155/2020/6401082.

Full text
Abstract:
Accurate prediction and reliable significant factor analysis of incident clearance time are two main objects of traffic incident management (TIM) system, as it could help to relieve traffic congestion caused by traffic incidents. This study applies the extreme gradient boosting machine algorithm (XGBoost) to predict incident clearance time on freeway and analyze the significant factors of clearance time. The XGBoost integrates the superiority of statistical and machine learning methods, which can flexibly deal with the nonlinear data in high-dimensional space and quantify the relative importance of the explanatory variables. The data collected from the Washington Incident Tracking System in 2011 are used in this research. To investigate the potential philosophy hidden in data, K-means is chosen to cluster the data into two clusters. The XGBoost is built for each cluster. Bayesian optimization is used to optimize the parameters of XGBoost, and the MAPE is considered as the predictive indicator to evaluate the prediction performance. A comparative study confirms that the XGBoost outperforms other models. In addition, response time, AADT (annual average daily traffic), incident type, and lane closure type are identified as the significant explanatory variables for clearance time.
APA, Harvard, Vancouver, ISO, and other styles
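The structure described in this abstract, K-means clustering of incidents into two groups and a separate XGBoost regressor per cluster scored by MAPE, can be outlined briefly. The synthetic incident features and toy clearance-time target below are stand-ins for the Washington Incident Tracking System data.

```python
# Hedged sketch: cluster records with K-means (k=2) and fit one XGBoost
# regressor per cluster, reporting MAPE, on synthetic incident features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))                 # stand-ins for response time, AADT, etc.
y = 20 + 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=2, size=2000)  # toy clearance time

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for c in (0, 1):
    Xc, yc = X[clusters == c], y[clusters == c]
    X_tr, X_te, y_tr, y_te = train_test_split(Xc, yc, test_size=0.2, random_state=0)
    model = XGBRegressor(n_estimators=300, max_depth=4, random_state=0).fit(X_tr, y_tr)
    mape = np.mean(np.abs((y_te - model.predict(X_te)) / y_te)) * 100
    print(f"cluster {c}: MAPE = {mape:.2f}%")
```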
24

Huang, Yongfen, Can Chen, and Yuqing Miao. "Prediction Model of Bone Marrow Infiltration in Patients with Malignant Lymphoma Based on Logistic Regression and XGBoost Algorithm." Computational and Mathematical Methods in Medicine 2022 (June 28, 2022): 1–7. http://dx.doi.org/10.1155/2022/9620780.

Full text
Abstract:
Objective. A prediction model of bone marrow infiltration (BMI) in patients with malignant lymphoma (ML) was established based on logistic regression and the XGBoost algorithm, and the model’s prediction efficiency was evaluated. Methods. A total of 120 patients diagnosed with ML in the department of hematology from January 2018 to January 2021 were retrospectively selected. The patients were randomly divided into a training set (n = 84) and a test set (n = 36) at a ratio of 7:3, and logistic regression and XGBoost algorithm models were constructed using the training set data. Predictors of BMI were screened based on laboratory indicators, and the models’ efficacy was evaluated using the test set data. Results. The prediction algorithm model’s top three essential characteristics are the blood platelet count, soluble interleukin-2 receptor, and non-Hodgkin’s lymphoma. The area under the curve of the logistic regression model for predicting the BMI of patients with ML was 0.843 (95% CI: 0.761~0.926). The area under the curve of the XGBoost model was 0.844 (95% CI: 0.765~0.937). Conclusion. The prediction models constructed in this study based on logistic regression and the XGBoost algorithm show good predictive performance. The results showed that blood platelet count and soluble interleukin-2 receptor were good predictors of BMI in ML patients.
APA, Harvard, Vancouver, ISO, and other styles
25

Thongprayoon, Charat, Pattharawin Pattharanitima, Andrea G. Kattah, Michael A. Mao, Mira T. Keddis, John J. Dillon, Wisit Kaewput, et al. "Explainable Preoperative Automated Machine Learning Prediction Model for Cardiac Surgery-Associated Acute Kidney Injury." Journal of Clinical Medicine 11, no. 21 (October 24, 2022): 6264. http://dx.doi.org/10.3390/jcm11216264.

Full text
Abstract:
Background: We aimed to develop and validate an automated machine learning (autoML) prediction model for cardiac surgery-associated acute kidney injury (CSA-AKI). Methods: Using 69 preoperative variables, we developed several models to predict post-operative AKI in adult patients undergoing cardiac surgery. Models included autoML and non-autoML types, including decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), and artificial neural network (ANN), as well as a logistic regression prediction model. We then compared model performance using the area under the receiver operating characteristic curve (AUROC) and assessed model calibration using the Brier score on the independent testing dataset. Results: The incidence of CSA-AKI was 36%. Stacked ensemble autoML had the highest predictive performance among autoML models, and was chosen for comparison with the other non-autoML and multivariable logistic regression models. The autoML had the highest AUROC (0.79), followed by RF (0.78), XGBoost (0.77), multivariable logistic regression (0.77), ANN (0.75), and DT (0.64). The autoML had a comparable AUROC with RF and outperformed the other models. The autoML was well calibrated. The Brier scores for autoML, RF, DT, XGBoost, ANN, and multivariable logistic regression were 0.18, 0.18, 0.21, 0.19, 0.19, and 0.18, respectively. We applied SHAP and LIME algorithms to our autoML prediction model to extract an explanation of the variables that drive patient-specific predictions of CSA-AKI. Conclusion: We were able to present a preoperative autoML prediction model for CSA-AKI that provided high predictive performance, comparable to RF and superior to the other ML and multivariable logistic regression models. The novel approach of the proposed explainable preoperative autoML prediction model for CSA-AKI may guide clinicians in advancing individualized medicine plans for patients undergoing cardiac surgery.
APA, Harvard, Vancouver, ISO, and other styles
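The explainability step mentioned above (applying SHAP to extract variable contributions) is commonly done with a tree explainer. The sketch below uses a plain XGBoost classifier as a stand-in for the stacked autoML model, with synthetic preoperative features, and assumes the shap package is installed.

```python
# Hedged sketch: SHAP explanations for an XGBoost stand-in model on toy data.
import numpy as np
import shap                                           # assumes the shap package
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=3000, n_features=20, weights=[0.64], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = XGBClassifier(n_estimators=300, max_depth=4, eval_metric="logloss").fit(X_tr, y_tr)

explainer = shap.TreeExplainer(model)                 # fast SHAP values for tree ensembles
shap_values = explainer.shap_values(X_te)             # one row of attributions per patient
# Mean |SHAP| per feature gives a global importance ranking; a single row explains one case.
importance = np.abs(shap_values).mean(axis=0)
print("top feature index:", int(importance.argmax()),
      "mean |SHAP|:", round(float(importance.max()), 4))
```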
26

Zhang, Ping, Rongqin Wang, and Nianfeng Shi. "IgA Nephropathy Prediction in Children with Machine Learning Algorithms." Future Internet 12, no. 12 (December 17, 2020): 230. http://dx.doi.org/10.3390/fi12120230.

Full text
Abstract:
Immunoglobulin A nephropathy (IgAN) is the most common primary glomerular disease all over the world and it is a major cause of renal failure. IgAN prediction in children with machine learning algorithms has been rarely studied. We retrospectively analyzed the electronic medical records from the Nanjing Eastern War Zone Hospital, chose eXtreme Gradient Boosting (XGBoost), random forest (RF), CatBoost, support vector machines (SVM), k-nearest neighbor (KNN), and extreme learning machine (ELM) models in order to predict the probability that the patient would not reach or reach end-stage renal disease (ESRD) within five years, used the chi-square test to select the most relevant 16 features as the input of the model, and designed a decision-making system (DMS) of IgAN prediction in children that is based on XGBoost and Django framework. The receiver operating characteristic (ROC) curve was used in order to evaluate the performance of the models and XGBoost had the best performance by comparison. The AUC value, accuracy, precision, recall, and f1-score of XGBoost were 85.11%, 78.60%, 75.96%, 76.70%, and 76.33%, respectively. The XGBoost model is useful for physicians and pediatric patients in providing predictions regarding IgAN. As an advantage, a DMS can be designed based on the XGBoost model to assist a physician to effectively treat IgAN in children for preventing deterioration.
APA, Harvard, Vancouver, ISO, and other styles
27

Feng, Dachun, Bing Zhou, Shahbaz Gul Hassan, Longqin Xu, Tonglai Liu, Liang Cao, Shuangyin Liu, and Jianjun Guo. "A Hybrid Model for Temperature Prediction in a Sheep House." Animals 12, no. 20 (October 17, 2022): 2806. http://dx.doi.org/10.3390/ani12202806.

Full text
Abstract:
Too high or too low a temperature in the sheep house will directly threaten the healthy growth of sheep. Prediction and early warning of temperature changes is an important measure to ensure the healthy growth of sheep. Aiming at the randomness and empiricism of parameter selection in the traditional single extreme gradient boosting (XGBoost) model, this paper proposes an optimization method based on Principal Component Analysis (PCA) and Particle Swarm Optimization (PSO), and the proposed PCA-PSO-XGBoost is then used to predict the temperature in the sheep house. First, PCA is used to screen the key influencing factors of the sheep house temperature, reducing the dimension of the model’s input vector; PSO-XGBoost is then used to build a temperature prediction model, with the PSO optimization algorithm selecting the main hyperparameters of XGBoost through a global search and iterative calculation. In a simulation experiment using data from the Xinjiang Manas intensive sheep breeding base, the evaluation indicators of the proposed PCA-PSO-XGBoost model, namely the root mean square error (RMSE), mean square error (MSE), coefficient of determination (R2), and mean absolute error (MAE), are 0.0433, 0.0019, 0.9995, and 0.0065, respectively; RMSE, MSE, and MAE are improved by 68, 90, and 94% compared with the traditional XGBoost model. The experimental results show that the model established in this paper has higher accuracy and better stability, can effectively provide guidance for monitoring and regulating temperature changes in intensive housing, and can be extended in the future to the prediction of other environmental parameters of other animal houses, such as pig houses and cow houses.
APA, Harvard, Vancouver, ISO, and other styles
28

Yuan, Yufei, Ruoran Wang, Mingyue Luo, Yidan Zhang, Fanfan Guo, Guiqin Bai, Yang Yang, and Jing Zhao. "A Machine Learning Approach Using XGBoost Predicts Lung Metastasis in Patients with Ovarian Cancer." BioMed Research International 2022 (October 12, 2022): 1–8. http://dx.doi.org/10.1155/2022/8501819.

Full text
Abstract:
Background. Lung metastasis (LM) is an independent risk factor that affects the prognosis of patients with ovarian cancer; however, there is still a lack of prediction tools. This study developed an extreme gradient boosting (XGBoost) model to predict the risk of lung metastasis in newly diagnosed patients with ovarian cancer, thereby improving prediction efficiency. Patients and Methods. Data of patients diagnosed with ovarian cancer in the Surveillance, Epidemiology, and End Results (SEER) database from 2010 to 2015 were retrospectively collected. The XGBoost algorithm was used to establish a lung metastasis model for patients with ovarian cancer. The performance of the predictive model was tested by the area under the curve (AUC) of the receiver operating characteristic curve (ROC). Results. The results of the XGBoost algorithm showed that the top five important factors were age, laterality, histological type, grade, and marital status. XGBoost showed good discriminative ability, with an AUC of 0.843. Accuracy, sensitivity, and specificity were 0.982, 1.000, and 0.686, respectively. Conclusion. This study is the first to develop a machine-learning-based prediction model for lung metastasis in patients with ovarian cancer. The prediction model based on the XGBoost algorithm has a higher accuracy rate than traditional logistic regression and can be used to predict the risk of lung metastasis in newly diagnosed patients with ovarian cancer.
APA, Harvard, Vancouver, ISO, and other styles
29

Chen, Mujun, Xiangmei Meng, Guangming Kan, Jingqiang Wang, Guanbao Li, Baohua Liu, Chenguang Liu, Yanguang Liu, Yuanxu Liu, and Junjie Lu. "Predicting the Sound Speed of Seafloor Sediments in the East China Sea Based on an XGBoost Algorithm." Journal of Marine Science and Engineering 10, no. 10 (September 24, 2022): 1366. http://dx.doi.org/10.3390/jmse10101366.

Full text
Abstract:
Based on the acoustic and physical data of typical seafloor sediment samples collected in the East China Sea, a study on hyperparameter selection and the contribution of the characteristic factors of a machine learning model for predicting the sound speed of seafloor sediments was conducted using the extreme gradient boosting (XGBoost) algorithm. An XGBoost model for predicting the sound speed of seafloor sediments was established based on five physical parameters: density (ρ), water content (w), void ratio (e), sand content (S), and average grain size (Mz). The results demonstrated that the model had the highest accuracy when n_estimators was 75 and max_depth was 5. The model training goodness of fit (R2) was as high as 0.92, and the mean absolute error and mean absolute percent error of the model prediction were 7.99 m/s and 0.51%, respectively. The results demonstrated that, in the study area, the XGBoost prediction method for the sound speed of seafloor sediments was superior to the traditional single- and two-parameter regression equation prediction methods, with higher prediction accuracy, thus providing a new approach to predicting the sound speed of seafloor sediments.
APA, Harvard, Vancouver, ISO, and other styles
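A regressor with the settings reported in this abstract (n_estimators = 75, max_depth = 5) over the five physical parameters can be sketched as below. The placeholder ranges and the toy sound-speed relationship are assumptions for illustration only, not the East China Sea survey data.

```python
# Hedged sketch: XGBoost regression of sediment sound speed from five physical
# parameters, using the reported n_estimators/max_depth on synthetic samples.
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n = 300
density = rng.uniform(1.3, 2.0, n)          # rho, placeholder range (g/cm^3)
water_content = rng.uniform(30, 150, n)     # w (%)
void_ratio = rng.uniform(0.8, 3.5, n)       # e
sand_content = rng.uniform(0, 80, n)        # S (%)
mean_grain_size = rng.uniform(2, 9, n)      # Mz (phi units)
X = np.column_stack([density, water_content, void_ratio, sand_content, mean_grain_size])
y = 1450 + 120 * density - 0.5 * water_content + rng.normal(scale=5, size=n)  # toy speed (m/s)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBRegressor(n_estimators=75, max_depth=5, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)
print("MAE (m/s):", round(mean_absolute_error(y_te, pred), 2))
print("MAPE (%):", round(float(np.mean(np.abs((y_te - pred) / y_te))) * 100, 3))
```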
30

Gopatoti, Anandbabu. "A novel metaheuristic prediction approach for COVID-19 cases using XGBoost algorithm." International Journal of Scientific Methods in Intelligence Engineering Networks 01, no. 01 (2023): 85–93. http://dx.doi.org/10.58599/ijsmien.2023.1108.

Full text
Abstract:
COVID-19 prediction is of great importance for building stronger government prevention and control of the global pandemic. This pandemic has had a devastating impact on people and their lives. Many industries are suffering and are struggling to overcome unexpected pandemic challenges. Therefore, it is extremely important to develop up-to-date and practicable models for time series prediction of COVID-19 that would give promising results. In this paper, we build a model for predicting daily confirmed new cases for the time series data of Europe by applying the Extreme Gradient Boosting (XGBoost) algorithm and using two performance parameters, RMSE and MAPE, to evaluate the model fit. Machine learning (ML) methods have shown favorable results recently, and among the tested models, XGBoost gave the best results. Another advantage of the model is that we can use the XGBoost model to determine the robustness of the predictive model by adjusting parameter features. Our results are in line with the expectations of the performance of the model based on the available data and the variability of the data. By analyzing important features and updating the prediction for the new cases of the pandemic, governments all around the globe could manage the situation much more effortlessly and with greater impact.
APA, Harvard, Vancouver, ISO, and other styles
31

Chen, Yuhuan, and Yingqing Jiang. "Construction of Prediction Model of Deep Vein Thrombosis Risk after Total Knee Arthroplasty Based on XGBoost Algorithm." Computational and Mathematical Methods in Medicine 2022 (January 25, 2022): 1–6. http://dx.doi.org/10.1155/2022/3452348.

Full text
Abstract:
Objective. Based on the XGBoost algorithm, a prediction model of the risk of deep vein thrombosis (DVT) in patients after total knee arthroplasty (TKA) was established, and its prediction performance was evaluated. Methods. A total of 100 patients with TKA from January 2019 to December 2020 were retrospectively selected as the study subjects and randomly divided into a training set (n = 60) and a test set (n = 40). The training set data were used to construct the XGBoost algorithm prediction model and to screen the predictive factors of postoperative DVT in TKA patients. The prediction effect of the model was evaluated using the test set data. An independent-sample t-test was used for comparison between groups, and the χ2 test was used for comparison between count data groups. Results. The top five items were combined multiple injuries (35 points), time from injury to operation (28 points), age (24 points), combined coronary heart disease (21 points), and D-dimer 1 day after operation (16 points). In the training set, the area under the curve of the XGBoost algorithm model was 0.832 (95% CI: 0.748-0.916). Conclusion. The model based on the XGBoost algorithm can predict the incidence of DVT in patients after TKA with good performance.
APA, Harvard, Vancouver, ISO, and other styles
32

Yang, Zhao, Yifan Wang, Jie Li, Liming Liu, Jiyang Ma, and Yi Zhong. "Airport Arrival Flow Prediction considering Meteorological Factors Based on Deep-Learning Methods." Complexity 2020 (October 26, 2020): 1–11. http://dx.doi.org/10.1155/2020/6309272.

Full text
Abstract:
This study presents a combined Long Short-Term Memory and Extreme Gradient Boosting (LSTM-XGBoost) method for flight arrival flow prediction at the airport. Correlation analysis is conducted between the historic arrival flow and input features. The XGBoost method is applied to identify the relative importance of various variables. The historic time-series data of airport arrival flow and selected features are taken as input variables, and the subsequent flight arrival flow is the output variable. The model parameters are sequentially updated based on the recently collected data and the new predicting results. It is found that the prediction accuracy is greatly improved by incorporating the meteorological features. The data analysis results indicate that the developed method can characterize well the dynamics of the airport arrival flow, thereby providing satisfactory prediction results. The prediction performance is compared with benchmark methods including backpropagation neural network, LSTM neural network, support vector machine, gradient boosting regression tree, and XGBoost. The results show that the proposed LSTM-XGBoost model outperforms baseline and state-of-the-art neural network models.
APA, Harvard, Vancouver, ISO, and other styles
33

Oh, Sejong, Yuli Park, Kyong Jin Cho, and Seong Jae Kim. "Explainable Machine Learning Model for Glaucoma Diagnosis and Its Interpretation." Diagnostics 11, no. 3 (March 13, 2021): 510. http://dx.doi.org/10.3390/diagnostics11030510.

Full text
Abstract:
The aim is to develop a machine learning prediction model for the diagnosis of glaucoma and an explanation system for a specific prediction. Clinical data of the patients based on a visual field test, a retinal nerve fiber layer optical coherence tomography (RNFL OCT) test, a general examination including an intraocular pressure (IOP) measurement, and fundus photography were provided for the feature selection process. Five selected features (variables) were used to develop a machine learning prediction model. The support vector machine, C5.0, random forest, and XGBoost algorithms were tested for the prediction model. The performance of the prediction models was tested with 10-fold cross-validation. Statistical charts, such as gauge, radar, and Shapley Additive Explanations (SHAP), were used to explain the prediction case. All four models achieved similarly high diagnostic performance, with accuracy values ranging from 0.903 to 0.947. The XGBoost model is the best model, with an accuracy of 0.947, sensitivity of 0.941, specificity of 0.950, and AUC of 0.945. Three statistical charts were established to explain the prediction based on the characteristics of the XGBoost model. Higher diagnostic performance was achieved with the XGBoost model. These three statistical charts can help us understand why the machine learning model produces a specific prediction result. This may be the first attempt to apply “explainable artificial intelligence” to eye disease diagnosis.
APA, Harvard, Vancouver, ISO, and other styles
34

Xu, Bing, Youcheng Tan, Weibang Sun, Tianxing Ma, Hengyu Liu, and Daguo Wang. "Study on the Prediction of the Uniaxial Compressive Strength of Rock Based on the SSA-XGBoost Model." Sustainability 15, no. 6 (March 15, 2023): 5201. http://dx.doi.org/10.3390/su15065201.

Full text
Abstract:
The uniaxial compressive strength of rock is one of the important parameters characterizing the properties of rock masses in geotechnical engineering. To quickly and accurately predict the uniaxial compressive strength of rock, a new SSA-XGBoost prediction model was built and applied to 290 rock samples. With four parameters, namely, porosity (n, %), Schmidt rebound number (Rn), longitudinal wave velocity (Vp, m/s), and point load strength (Is(50), MPa) as input variables and uniaxial compressive strength (UCS, MPa) as the output variable, a prediction model of uniaxial compressive strength was built based on the SSA-XGBoost model. To verify the effectiveness of the SSA-XGBoost model, empirical formulas, XGBoost, SVM, RF, BPNN, KNN, PLSR, and other models were also established and compared with the SSA-XGBoost model. All models were evaluated using the root mean square error (RMSE), correlation coefficient (R2), mean absolute error (MAE), and variance accounted for (VAF). The results calculated by the SSA-XGBoost model (R2 = 0.84, RMSE = 19.85, MAE = 14.79, and VAF = 81.36) are the best among all prediction models. Therefore, the SSA-XGBoost model is the best model to predict the uniaxial compressive strength of rock for the dataset tested.
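The four evaluation metrics used in the comparison (RMSE, R2, MAE, VAF) can be computed as in the following sketch; since scikit-learn has no built-in VAF, it is expressed here by its usual definition, and the two arrays are made-up example values rather than the paper's data.

# Regression evaluation in the spirit of the comparison above: RMSE, R2, MAE
# and variance accounted for (VAF). y_true / y_pred are illustrative arrays.
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def vaf(y_true, y_pred):
    # Variance accounted for, in percent: 100 * (1 - var(residual) / var(y_true))
    return 100.0 * (1.0 - np.var(y_true - y_pred) / np.var(y_true))

y_true = np.array([52.0, 80.5, 120.3, 95.7, 60.2])   # example measured UCS values (MPa)
y_pred = np.array([55.1, 78.0, 112.9, 101.4, 63.0])  # example model predictions (MPa)

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(f"RMSE={rmse:.2f}  R2={r2_score(y_true, y_pred):.2f}  "
      f"MAE={mean_absolute_error(y_true, y_pred):.2f}  VAF={vaf(y_true, y_pred):.2f}%")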
APA, Harvard, Vancouver, ISO, and other styles
35

Harriz, Muhammad Alfathan, Nurhaliza Vania Akbariani, Harlis Setiyowati, and Handri Santoso. "Enhancing the Efficiency of Jakarta's Mass Rapid Transit System with XGBoost Algorithm for Passenger Prediction." Jambura Journal of Informatics 5, no. 1 (April 27, 2023): 1–6. http://dx.doi.org/10.37905/jji.v5i1.18814.

Full text
Abstract:
This study is based on a machine learning algorithm known as XGBoost, which we used to forecast the capacity of Jakarta's mass transit system. Using preprocessed raw data obtained from the Jakarta Open Data website for the period 2020-2021 as training material, we obtained a mean absolute percentage error (MAPE) of 69; after the model was fine-tuned, the MAPE was reduced by 28.99% to 49.97. The XGBoost algorithm was found to be effective in detecting patterns and trends in the data, which can be used to improve routes and plan future studies by providing valuable insights. By implementing XGBoost, Jakarta's transportation system can optimize resource utilization and improve customer service in order to increase passenger satisfaction. Future studies may benefit from additional data points, such as holidays and weather conditions, to further enhance the model's accuracy.
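A before-and-after tuning comparison of the kind reported here might look like the sketch below; the ridership file, the parameter grid, and the 80/20 split are assumptions, not details from the study.

# Sketch: compare the MAPE of a default XGBoost regressor with a tuned one.
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_absolute_percentage_error
from xgboost import XGBRegressor

df = pd.read_csv("mrt_ridership.csv")            # hypothetical daily ridership data
X, y = df.drop(columns=["passengers"]), df["passengers"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)

baseline = XGBRegressor().fit(X_tr, y_tr)
print("Baseline MAPE:", mean_absolute_percentage_error(y_te, baseline.predict(X_te)))

search = GridSearchCV(
    XGBRegressor(),
    {"max_depth": [3, 5, 7], "learning_rate": [0.05, 0.1, 0.3],
     "n_estimators": [100, 300]},
    scoring="neg_mean_absolute_percentage_error", cv=3)
search.fit(X_tr, y_tr)
print("Tuned MAPE:", mean_absolute_percentage_error(y_te, search.predict(X_te)))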
APA, Harvard, Vancouver, ISO, and other styles
36

Liu, Linxiang, Yuan Nie, Qi Liu, and Xuan Zhu. "A Practical Model for Predicting Esophageal Variceal Rebleeding in Patients with Hepatitis B-Associated Cirrhosis." International Journal of Clinical Practice 2023 (August 3, 2023): 1–11. http://dx.doi.org/10.1155/2023/9701841.

Full text
Abstract:
Background. Variceal rebleeding is a significant and potentially life-threatening complication of cirrhosis. Unfortunately, there is currently no reliable method for stratifying high-risk patients. Liver stiffness measurements (LSM) have been shown to have predictive value in identifying complications associated with portal hypertension, including first-time bleeding. However, there is a lack of evidence confirming that LSM is reliable in predicting variceal rebleeding. The objective of our study was to evaluate whether an extreme gradient boosting (XGBoost) algorithm model could improve the prediction of variceal rebleeding. Methods. This retrospective analysis examined a cohort of 284 patients with hepatitis B-related cirrhosis. XGBoost models were developed using laboratory data, LSM, and imaging data to predict the risk of rebleeding in the patients. In addition, we compared the XGBoost models with traditional logistic regression (LR) models. We evaluated and compared the two models using the area under the receiver operating characteristic curve (AUROC) and other model performance parameters. Lastly, we validated the models using nomograms and decision curve analysis (DCA). Results. During a median follow-up of 66.6 weeks, 72 patients experienced rebleeding, including 21 (7.39%) and 61 (21.48%) patients who rebled within 6 weeks and 1 year, respectively. In brief, the AUC of the LR models in predicting rebleeding at 6 weeks and 1 year was 0.828 (0.759–0.897) and 0.799 (0.738–0.860), respectively. In contrast, the accuracy of the XGBoost model in predicting rebleeding at 6 weeks and 1 year was 0.985 (0.907–0.731) and 0.931 (0.806–0.935), respectively. LSM and high-density lipoprotein (HDL) levels differed significantly between the rebleeding and nonrebleeding groups, with LSM being a reliable predictor in those models. The XGBoost models outperformed the LR models in predicting rebleeding within 6 weeks and 1 year, as demonstrated by the ROC and DCA curves. Conclusion. The XGBoost algorithm model can achieve higher accuracy than the LR model in predicting rebleeding, making it a clinically beneficial tool. This implies that the XGBoost model is better suited for predicting the risk of esophageal variceal rebleeding in patients.
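The head-to-head comparison of XGBoost and logistic regression by AUROC can be illustrated as follows; the cohort file and the four predictor columns are placeholders, and cross-validation is used here simply as a generic evaluation scheme rather than the study's exact protocol.

# Sketch: compare XGBoost and logistic regression by cross-validated AUROC.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

df = pd.read_csv("rebleeding_cohort.csv")        # hypothetical cohort
X = df[["lsm", "hdl", "platelets", "albumin"]]   # illustrative predictors
y = df["rebleed_1yr"]

lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
xgb_model = XGBClassifier(n_estimators=300, max_depth=3, learning_rate=0.05)

print("LR  AUROC:", cross_val_score(lr, X, y, cv=5, scoring="roc_auc").mean())
print("XGB AUROC:", cross_val_score(xgb_model, X, y, cv=5, scoring="roc_auc").mean())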
APA, Harvard, Vancouver, ISO, and other styles
37

Guo, Jiang, Chen Zhang, Shoudong Xie, and Yi Liu. "Research on the Prediction Model of Blasting Vibration Velocity in the Dahuangshan Mine." Applied Sciences 12, no. 12 (June 8, 2022): 5849. http://dx.doi.org/10.3390/app12125849.

Full text
Abstract:
In order to improve the prediction accuracy of blast vibration velocity, the model for predicting the peak particle velocity of blast vibration using the XGBoost (Extreme Gradient Boosting) method is improved, and the EWT–XGBoost model is established by combining it with the EWT (Empirical Wavelet Transform) method. The relative error and root mean square error between the predicted and measured values of each test sample were calculated, and the prediction performance of the EWT–XGBoost model was compared with that of the original model. There is a large elevation difference between the vibration measurement locations on high and steep slopes, and such slopes are extremely dangerous, which is not conducive to the layout of blasting vibration monitoring equipment. Therefore, the numerical simulation method was adopted, the center position of each small platform was selected as the measurement point of the peak particle velocity, and the variation law of the blasting vibration velocity of high and steep slopes under the action of top blasting was studied. The research results show that the EWT–XGBoost model has higher accuracy than the original model in the prediction of blasting vibration velocity; the simultaneous detonation method on adjacent high and steep slopes cannot meet the relevant requirements of safety regulations, whereas the delayed detonation method can effectively reduce the blasting vibration of high and steep slopes. The shock absorption effect of the elevation difference within 45 m is obvious.
APA, Harvard, Vancouver, ISO, and other styles
38

Yuan, Jianming. "Predicting Death Risk of COVID-19 Patients Leveraging Machine Learning Algorithm." Applied and Computational Engineering 8, no. 1 (August 1, 2023): 186–90. http://dx.doi.org/10.54254/2755-2721/8/20230122.

Full text
Abstract:
The first cases of COVID-19 were found in Wuhan, China; the disease mainly damages the human body in the form of respiratory illness. In this study, an XGBoost prediction model was put forward based on the analysis of age, pneumonia, diabetes, and other attributes in the dataset, and it was employed to estimate COVID-19 patients' risk of death. Extensive preprocessing was carried out on the dataset, such as deleting null values. In addition, there is a strong correlation between sex, pneumonia, and death probability. XGBoost, CatBoost, logistic regression, and random forest models were built to forecast COVID-19 patients' probability of mortality. The findings revealed that XGBoost's prediction performance was the best, while the logistic regression model performed poorly on this reported dataset of COVID-19 patients when compared to the other approaches. From the feature importance map of XGBoost, it is found that age and pneumonia have a great influence on the prediction of death risk.
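The feature-importance map mentioned in the conclusion can be produced directly from a fitted XGBoost model, as in this sketch; the dataset path and the five predictor columns are illustrative.

# Sketch: inspect which features drive the mortality-risk predictions.
import matplotlib.pyplot as plt
import pandas as pd
from xgboost import XGBClassifier, plot_importance

df = pd.read_csv("covid_patients.csv")           # hypothetical preprocessed dataset
X = df[["age", "pneumonia", "diabetes", "sex", "hypertension"]]
y = df["death"]

model = XGBClassifier(n_estimators=200, max_depth=4).fit(X, y)

plot_importance(model, importance_type="gain")   # bar chart of feature importance
plt.tight_layout()
plt.show()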
APA, Harvard, Vancouver, ISO, and other styles
39

Li, Mingguang, Runyi Huang, and Yumiao Yang. "Short-term wind speed prediction based on combinatorial prediction model." Highlights in Science, Engineering and Technology 60 (July 25, 2023): 274–82. http://dx.doi.org/10.54097/hset.v60i.10534.

Full text
Abstract:
Improving the accuracy of wind speed forecasts can increase wind power generation and better achieve wind energy grid connection. Therefore, a two-stage wind speed prediction model is proposed based on Ensemble Empirical Mode Decomposition (EEMD) and the combined prediction of a Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), eXtreme Gradient Boosting (XGBoost), Gated Recurrent Unit (GRU), and Temporal Convolutional Network (TCN). First, the original wind speed series is separated into Intrinsic Mode Functions (IMFs) using EEMD. Then, RNN, LSTM, XGBoost, GRU, and TCN prediction models are established to learn features from each subsequence, and the prediction results of the subsequences are superimposed. Finally, Particle Swarm Optimization (PSO) is applied to assign weights to the results of the multiple prediction models, and the weighted sequences are combined to achieve more accurate and more robust wind speed prediction. Simulation analysis was carried out using data from the St. Thomas, Virgin Islands wind measurement station to validate the combined prediction model. The experimental results show that the model proposed in this paper performs well in improving wind speed prediction accuracy.
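The final weighting stage, assigning each sub-model's forecast a weight so that the combined prediction minimizes validation error, is sketched below with a generic constrained optimizer standing in for the PSO step; the forecast arrays are made-up example values.

# Sketch of a combination stage: choose non-negative weights summing to one
# that minimise validation RMSE. A standard constrained optimiser is used here
# in place of PSO purely for illustration.
import numpy as np
from scipy.optimize import minimize

# Illustrative validation-set forecasts from the individual models (one row per model)
preds = np.array([
    [5.1, 6.0, 4.8, 5.5],   # e.g. RNN
    [5.3, 5.8, 4.9, 5.7],   # e.g. LSTM
    [5.0, 6.2, 4.6, 5.4],   # e.g. XGBoost
])
y_val = np.array([5.2, 6.1, 4.7, 5.6])

def rmse_of_weights(w):
    combined = w @ preds
    return np.sqrt(np.mean((combined - y_val) ** 2))

n = preds.shape[0]
result = minimize(rmse_of_weights, x0=np.full(n, 1.0 / n),
                  bounds=[(0, 1)] * n,
                  constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
print("Combination weights:", result.x)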
APA, Harvard, Vancouver, ISO, and other styles
40

Li, Xiangcheng, Jialong Wang, Zhirui Geng, Yang Jin, and Jiawei Xu. "Short-term Wind Power Prediction Method Based on Genetic Algorithm Optimized XGBoost Regression Model." Journal of Physics: Conference Series 2527, no. 1 (June 1, 2023): 012061. http://dx.doi.org/10.1088/1742-6596/2527/1/012061.

Full text
Abstract:
In order to solve the problem of the accuracy and speed of short-term prediction of wind power output, the eXtreme Gradient Boosting (XGBoost) regression model is used in this paper to predict wind power output. For the models commonly used at the present stage, such as Long Short-Term Memory (LSTM), random forest, and the ordinary XGBoost model, the modelling time is long and the accuracy is insufficient. In this paper, a genetic algorithm (GA) is introduced to improve the accuracy and speed of prediction of the XGBoost regression model. Firstly, the learning rate of the XGBoost model is optimized by using the good search ability and flexibility of the genetic algorithm. Then variable-weight combination prediction is carried out. The objective function for this problem is the mean square error between the predicted value and the actual value in the training set, and the GA is responsible for determining the model's final weights. The historical output data of the wind plant are used in this paper to verify the XGBoost regression model based on the genetic algorithm, and the predicted values are compared with the prediction results of the LSTM and random forest algorithms. Example simulation and analysis show that the XGBoost regression model optimized by the genetic algorithm significantly improves the accuracy and speed of short-term wind power output prediction.
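A toy version of the GA-driven search over the XGBoost learning rate is sketched below; the population size, mutation scale, and data file are illustrative choices rather than the paper's settings.

# Toy genetic algorithm that searches the XGBoost learning rate by minimising
# validation mean squared error, in the spirit of the GA step described above.
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

df = pd.read_csv("wind_power.csv")               # hypothetical plant output data
X, y = df.drop(columns=["power"]), df["power"]
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, shuffle=False)

def fitness(lr):
    model = XGBRegressor(n_estimators=200, learning_rate=float(lr)).fit(X_tr, y_tr)
    return mean_squared_error(y_val, model.predict(X_val))

rng = np.random.default_rng(0)
population = rng.uniform(0.01, 0.5, size=8)      # initial learning-rate candidates
for generation in range(10):
    scores = np.array([fitness(lr) for lr in population])
    parents = population[np.argsort(scores)[:4]]                          # keep the best half
    children = rng.choice(parents, size=4) + rng.normal(0, 0.02, size=4)  # mutate copies
    population = np.concatenate([parents, np.clip(children, 0.001, 1.0)])

best_lr = population[np.argmin([fitness(lr) for lr in population])]
print("GA-selected learning rate:", best_lr)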
APA, Harvard, Vancouver, ISO, and other styles
41

Kuthe, Annaji, Chaitanya Bhake, Vaibhav Bhoyar, Aman Yenurkar, Vedant Khandekar, and Ketan Gawale. "Water Quality Prediction Using Machine Learning." International Journal of Computer Science and Mobile Computing 12, no. 4 (April 30, 2023): 52–59. http://dx.doi.org/10.47760/ijcsmc.2023.v12i04.006.

Full text
Abstract:
Various pollutants have threatened water quality over the past decades. As a result, predicting and modeling water quality have become essential to minimizing water contamination. This research developed a classification algorithm to predict the water quality classification (WQC). The WQC is classified based on the water quality index (WQI) from 7 parameters in a dataset using Support Vector Machine (SVM) and Extreme Gradient Boosting (XGBoost). The results from the proposed model can accurately classify the water quality based on its features. The research results demonstrated that the XGBoost model performed better, with an accuracy of 94%, compared to the SVM model, with only a 67% accuracy. Moreover, XGBoost resulted in only a 6% misclassification error compared to SVM, which had 33%. On top of that, XGBoost also obtained consistently superior results from 5-fold validation, with an average accuracy of 90%, whereas SVM had an average accuracy of 64%. Considering the enhanced performance, XGBoost is concluded to be better at water quality classification.
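The SVM-versus-XGBoost comparison with 5-fold cross-validation can be sketched as follows; the water-quality file and its target column are placeholders for the dataset described in the abstract.

# Sketch: compare SVM and XGBoost by 5-fold cross-validated accuracy.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from xgboost import XGBClassifier

df = pd.read_csv("water_quality.csv")            # hypothetical dataset with 7 parameters
X, y = df.drop(columns=["wqc"]), df["wqc"]       # wqc: encoded water quality class

svm = make_pipeline(StandardScaler(), SVC())
xgb_model = XGBClassifier(n_estimators=300, max_depth=4)

print("SVM 5-fold accuracy:", cross_val_score(svm, X, y, cv=5).mean())
print("XGBoost 5-fold accuracy:", cross_val_score(xgb_model, X, y, cv=5).mean())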
APA, Harvard, Vancouver, ISO, and other styles
42

Meng, Delin, Jun Xu, and Jijun Zhao. "Analysis and prediction of hand, foot and mouth disease incidence in China using Random Forest and XGBoost." PLOS ONE 16, no. 12 (December 22, 2021): e0261629. http://dx.doi.org/10.1371/journal.pone.0261629.

Full text
Abstract:
Hand, foot and mouth disease (HFMD) is an increasingly serious public health problem, and it has caused an outbreak in China every year since 2008. Predicting the incidence of HFMD and analyzing its influential factors are of great significance to its prevention. Machine learning has shown advantages in infectious disease models, but there are few studies on HFMD incidence based on machine learning that cover all the provinces in mainland China. In this study, we applied two different machine learning algorithms, Random Forest and eXtreme Gradient Boosting (XGBoost), to perform our analysis and prediction. We first used Random Forest to examine the association between HFMD incidence and potential influential factors for the 31 provinces in mainland China. Next, we established Random Forest and XGBoost prediction models using meteorological and social factors as the predictors. Finally, we applied our prediction models in four different regions of mainland China and evaluated their performance. Our results show that: 1) meteorological factors and social factors jointly affect the incidence of HFMD in mainland China, with average temperature and population density being the two most significant influential factors; 2) population flux has a different delayed effect on HFMD incidence in different regions, and from a national perspective the model using population flux data delayed by one month has better prediction performance; 3) the prediction capability of the XGBoost model was better overall than that of the Random Forest model, making XGBoost more suitable for predicting the incidence of HFMD in mainland China.
APA, Harvard, Vancouver, ISO, and other styles
43

Lin, Xiaoxuan, Lixin Chen, Defu Zhang, Shuyu Luo, Yuanyuan Sheng, Xiaohua Liu, Qian Liu, et al. "Prediction of Surgical Approach in Mitral Valve Disease by XGBoost Algorithm Based on Echocardiographic Features." Journal of Clinical Medicine 12, no. 3 (February 2, 2023): 1193. http://dx.doi.org/10.3390/jcm12031193.

Full text
Abstract:
In this study, we aimed to develop a prediction model to assist surgeons in choosing an appropriate surgical approach for mitral valve disease patients. We retrospectively analyzed a total of 143 patients who underwent surgery for mitral valve disease. The XGBoost algorithm was used to establish a predictive model to decide a surgical approach (mitral valve repair or replacement) based on the echocardiographic features of the mitral valve apparatus, such as leaflets, the annulus, and sub-valvular structures. The results showed that the accuracy of the predictive model was 81.09% in predicting the appropriate surgical approach based on the patient’s preoperative echocardiography. The result of the predictive model was superior to the traditional complexity score (81.09% vs. 75%). Additionally, the predictive model showed that the three main factors affecting the choice of surgical approach were leaflet restriction, calcification of the leaflet, and perforation or cleft of the leaflet. We developed a novel predictive model using the XGBoost algorithm based on echocardiographic features to assist surgeons in choosing an appropriate surgical approach for patients with mitral valve disease.
APA, Harvard, Vancouver, ISO, and other styles
44

Ding, Chao, Yuwen Guo, Qinqin Mo, and Jin Ma. "Prediction Model of Postoperative Severe Hypocalcemia in Patients with Secondary Hyperparathyroidism Based on Logistic Regression and XGBoost Algorithm." Computational and Mathematical Methods in Medicine 2022 (July 25, 2022): 1–7. http://dx.doi.org/10.1155/2022/8752826.

Full text
Abstract:
Objective. A predictive model was established based on logistic regression and the XGBoost algorithm to investigate the factors related to postoperative hypocalcemia in patients with secondary hyperparathyroidism (SHPT). Methods. A total of 60 SHPT patients who underwent parathyroidectomy (PTX) in our hospital were retrospectively enrolled. All patients were randomly divided into a training set (n = 42) and a test set (n = 18). The clinical data of the patients were analyzed, including gender, age, dialysis time, body mass, and several preoperative biochemical indicators. The multivariate logistic regression and XGBoost algorithm models were used to analyze the independent risk factors for severe postoperative hypocalcemia (SH), and the forecasting performance of the two prediction models was analyzed. Results. Multivariate logistic regression analysis showed that body mass (OR = 1.203, P = 0.032), age (OR = 1.214, P = 0.035), preoperative PTH (OR = 1.026, P = 0.043), preoperative Ca (OR = 1.062, P = 0.025), and preoperative ALP (OR = 1.031, P = 0.027) were positively correlated with postoperative SH. The top three important features of the XGBoost algorithm prediction model were preoperative Ca, preoperative PTH, and preoperative ALP. The area under the curve of the logistic regression and XGBoost algorithm models in the test set was 0.734 (95% CI: 0.595~0.872) and 0.827 (95% CI: 0.722~0.932), respectively. Conclusion. The predictive models based on logistic regression and the XGBoost algorithm can predict the occurrence of postoperative SH.
APA, Harvard, Vancouver, ISO, and other styles
45

Moore, Alexander, and Max Bell. "XGBoost, A Novel Explainable AI Technique, in the Prediction of Myocardial Infarction: A UK Biobank Cohort Study." Clinical Medicine Insights: Cardiology 16 (January 2022): 117954682211336. http://dx.doi.org/10.1177/11795468221133611.

Full text
Abstract:
We wanted to assess if “Explainable AI” in the form of extreme gradient boosting (XGBoost) could outperform traditional logistic regression in predicting myocardial infarction (MI) in a large cohort. Two machine learning methods, XGBoost and logistic regression, were compared in predicting the risk of MI. The UK Biobank is a population-based prospective cohort including 502,506 volunteers with active consent, aged 40 to 69 years at recruitment from 2006 to 2010. These subjects were followed until the end of 2019, and the primary outcome was myocardial infarction. Both models were trained using 90% of the cohort; the remaining 10% was used as a test set. Both models were equally precise, but the regression model classified more of the healthy class correctly, while XGBoost was more accurate in identifying individuals who later suffered a myocardial infarction. Receiver operating characteristic (ROC) scores are class-size invariant. In this metric, XGBoost outperformed the logistic regression model, with ROC scores of 0.86 (accuracy 0.75, CI ±0.00379) and 0.77 (accuracy 0.77, CI ±0.00369), respectively. Secondly, we demonstrate how SHAPley values can be used to visualize and interpret the predictions made by XGBoost models, both for the cohort test set and for individuals. The XGBoost machine learning model shows very promising results in evaluating the risk of MI in a large and diverse population. This model can be used, and visualized, both for individual assessments and in larger cohorts. The predictions made by the XGBoost models point toward a future where “Explainable AI” may help to bridge the gap between medicine and data science.
APA, Harvard, Vancouver, ISO, and other styles
46

Xu, Jialing, Jingxing He, Jinqiang Gu, Huayang Wu, Lei Wang, Yongzhen Zhu, Tiejun Wang, Xiaoling He, and Zhangyuan Zhou. "Financial Time Series Prediction Based on XGBoost and Generative Adversarial Networks." International Journal of Circuits, Systems and Signal Processing 16 (January 15, 2022): 637–45. http://dx.doi.org/10.46300/9106.2022.16.79.

Full text
Abstract:
To address the problems of model collapse and low forecast precision when using generative adversarial networks (GAN) to predict financial time series, we apply the WGAN-GP model to solve the gradient collapse. Extreme gradient boosting (XGBoost) is used for feature extraction to improve prediction accuracy. Alibaba stock is taken as the research object: XGBoost is used to optimize its characteristic factors, and the optimized characteristic variables are trained with WGAN-GP. We compare the prediction results of the WGAN-GP model with those of classical time series prediction models, long short-term memory (LSTM) and gated recurrent unit (GRU). In the experimental stage, root mean square error (RMSE) is chosen as the evaluation index. The results show that the RMSE of the WGAN-GP model is the smallest, 61.94% and 47.42% lower than that of the LSTM and GRU models, respectively. At the same time, the stock price data of Google and Amazon confirm the stability of the WGAN-GP model. The WGAN-GP model can obtain higher prediction accuracy than the classical time series prediction models.
APA, Harvard, Vancouver, ISO, and other styles
47

Jin, Deyan. "Risk Prediction Method of Obstetric Nursing Based on Data Mining." Contrast Media & Molecular Imaging 2022 (August 24, 2022): 1–11. http://dx.doi.org/10.1155/2022/5100860.

Full text
Abstract:
Obstetric nursing is not only complex but also prone to risks, which can have adverse effects on hospitals. Improper handling of existing risks in obstetric care can lead to enormous harm to patients and families. Therefore, it is necessary to pay attention to the risks of obstetric nursing, especially to predict the risks in a timely manner and take effective measures to prevent them, so that patients can recover as soon as possible. Data mining has a powerful forecasting function, so this paper proposes to combine the data-mining-based support vector machine (SVM) method and the XGBoost method into a combined forecasting model, which overcomes the shortcomings of unstable forecasting and low accuracy of a single forecasting model. The experimental results show that the prediction accuracy of the SVM-XGBoost combined prediction model reached 100%, the accuracy of the single SVM prediction model was about 78%, and the accuracy of the single XGBoost prediction model was about 75%. Compared with the single SVM and XGBoost prediction models, the accuracy increased by about 22% and 25%, respectively, and the precision and recall were also improved. Therefore, the SVM-XGBoost combined prediction model is well suited to predicting the risk of obstetric nursing.
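The abstract does not state the exact fusion rule for the SVM-XGBoost combination, so the sketch below shows one plausible reading, a soft-voting ensemble that averages the two models' predicted probabilities; the risk dataset and its columns are placeholders.

# Sketch of one possible SVM + XGBoost combination: a soft-voting ensemble.
import pandas as pd
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from xgboost import XGBClassifier

df = pd.read_csv("obstetric_nursing_risk.csv")   # hypothetical risk dataset
X, y = df.drop(columns=["risk"]), df["risk"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

ensemble = VotingClassifier(
    estimators=[("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
                ("xgb", XGBClassifier(n_estimators=200, max_depth=4))],
    voting="soft")
ensemble.fit(X_tr, y_tr)

pred = ensemble.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred),
      "precision:", precision_score(y_te, pred),
      "recall:", recall_score(y_te, pred))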
APA, Harvard, Vancouver, ISO, and other styles
48

Dai, Hongbin, Guangqiu Huang, Huibin Zeng, and Fan Yang. "PM2.5 Concentration Prediction Based on Spatiotemporal Feature Selection Using XGBoost-MSCNN-GA-LSTM." Sustainability 13, no. 21 (November 1, 2021): 12071. http://dx.doi.org/10.3390/su132112071.

Full text
Abstract:
With the rapid development of China’s industrialization, air pollution is becoming more and more serious. Predicting air quality is essential for identifying further preventive measures to avoid negative impacts. Existing predictions of atmospheric pollutant concentration ignore the problems of feature redundancy and spatio-temporal characteristics; model accuracy is not high and transferability is weak. Therefore, firstly, extreme gradient boosting (XGBoost) is applied to extract features from PM2.5; then a one-dimensional multi-scale convolution network (MSCNN) is used to extract local temporal and spatial feature relations from the air quality data, and linear splicing and fusion are carried out to obtain the spatio-temporal feature relationship of the multiple features. Finally, XGBoost and MSCNN are combined with the advantages of LSTM in dealing with time series, and a genetic algorithm (GA) is applied to optimize the parameter set of the long short-term memory (LSTM) network. The spatio-temporal relationship of the multiple features is input into the LSTM network, and the long-term feature dependence of the multi-feature selection is output to predict PM2.5 concentration. An XGBoost-MSCGL PM2.5 concentration prediction model based on spatio-temporal feature selection is thus established. The dataset comprises the hourly concentration data of six kinds of atmospheric pollutants and meteorological data in the Fen-Wei Plain in 2020. To verify the effectiveness of the model, the XGBoost-MSCGL model is compared with benchmark models such as the multilayer perceptron (MLP), CNN, LSTM, XGBoost, and CNN-LSTM, before and after using XGBoost feature selection. According to the forecast results for 12 cities, compared with the single models, the root mean square error (RMSE) decreased by about 39.07%, the average MAPE decreased by about 42.18%, the average MAE decreased by about 49.33%, and R2 increased by 23.7%. Compared with the models after feature selection, the RMSE decreased by an average of about 15%, the MAPE decreased by 16%, the MAE decreased by 21%, and R2 increased by 2.6%. The experimental results show that the XGBoost-MSCGL prediction model offers a more comprehensive representation, higher prediction accuracy, and better generalization ability in the prediction of PM2.5 concentration.
APA, Harvard, Vancouver, ISO, and other styles
49

Narvekar, Aditya, and Debashis Guha. "Bankruptcy prediction using machine learning and an application to the case of the COVID-19 recession." Data Science in Finance and Economics 1, no. 2 (2021): 180–95. http://dx.doi.org/10.3934/dsfe.2021010.

Full text
Abstract:
Bankruptcy prediction is an important problem in finance, since successful predictions would allow stakeholders to take early actions to limit their economic losses. In recent years many studies have explored the application of machine learning models to bankruptcy prediction with financial ratios as predictors. This study extends this research by applying machine learning techniques to a quarterly data set covering financial ratios for a large sample of public U.S. firms from 1970–2019. We find that tree-based ensemble methods, especially XGBoost, can achieve a high degree of accuracy in out-of-sample bankruptcy prediction. We next apply our best model, using XGBoost, to the problem of predicting the overall bankruptcy rate in USA in the second half of 2020, after the COVID-19 pandemic had necessitated a lockdown, leading to a deep recession. Our model supports the prediction, made by leading economists, that the rate of bankruptcies will rise substantially in 2020, but it also suggests that this elevated level will not be much higher than 2010.
APA, Harvard, Vancouver, ISO, and other styles
50

Kartina Diah Kusuma Wardani and Memen Akbar. "Diabetes Risk Prediction using Feature Importance Extreme Gradient Boosting (XGBoost)." Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 7, no. 4 (August 12, 2023): 824–31. http://dx.doi.org/10.29207/resti.v7i4.4651.

Full text
Abstract:
Diabetes results from impaired pancreas function as a producer of the insulin and glucagon hormones, which regulate glucose levels in the blood. Diabetes today is not found only in adults; pre-diabetes is now identified even in children and adolescents. Early prediction of diabetes can make it easier for doctors and patients to intervene as soon as possible so that the risk of complications can be reduced. Medical data from diabetes patients can be used to produce a model that helps medical staff predict and identify diabetes in patients. Various techniques are used to provide the earliest possible prediction of diabetes based on the symptoms experienced by diabetic patients, including machine learning, which can generate models from the historical data of diabetic patients and make predictions with those models. In this study, extreme gradient boosting (XGBoost) with XGBoost feature importance is the machine learning technique used to predict diabetes. The diabetes dataset used in this study comes from the Early Stage Diabetes Risk Prediction dataset published by UCI Machine Learning, which has 520 records and 16 attributes. The diabetes prediction model using XGBoost is displayed as a tree. The model accuracy in this study was 98.71%, and the F1 score was 98.18%, while the accuracy obtained with the 10 best attributes selected by XGBoost feature importance was 98.72%.
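The feature-importance workflow described here, training on all attributes, ranking them with XGBoost, and retraining on the top 10, can be sketched as follows; the CSV path is an assumption, and the column handling loosely follows the UCI early-stage diabetes dataset.

# Sketch: rank attributes with a fitted XGBoost model, keep the top 10, retrain.
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

df = pd.read_csv("early_stage_diabetes.csv")     # hypothetical local copy of the UCI data
X = pd.get_dummies(df.drop(columns=["class"]))   # one-hot encode the symptom attributes
y = (df["class"] == "Positive").astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1)

full_model = XGBClassifier(n_estimators=300, max_depth=4).fit(X_tr, y_tr)
pred = full_model.predict(X_te)
print("all features   acc:", accuracy_score(y_te, pred), "F1:", f1_score(y_te, pred))

top10 = pd.Series(full_model.feature_importances_, index=X.columns).nlargest(10).index
reduced_model = XGBClassifier(n_estimators=300, max_depth=4).fit(X_tr[top10], y_tr)
pred = reduced_model.predict(X_te[top10])
print("top-10 features acc:", accuracy_score(y_te, pred), "F1:", f1_score(y_te, pred))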
APA, Harvard, Vancouver, ISO, and other styles