Journal articles on the topic 'PREDICTION DATASET'

Author: Grafiati

Published: 11 September 2023

Consult the top 50 journal articles for your research on the topic 'PREDICTION DATASET.'

1

Burmakova, Anastasiya, and Diana Kalibatienė. "Applying Fuzzy Inference and Machine Learning Methods for Prediction with a Small Dataset: A Case Study for Predicting the Consequences of Oil Spills on a Ground Environment." Applied Sciences 12, no. 16 (August 18, 2022): 8252. http://dx.doi.org/10.3390/app12168252.

Abstract:

Applying machine learning (ML) and fuzzy inference systems (FIS) requires large datasets to obtain more accurate predictions. However, in the cases of oil spills on ground environments, only small datasets are available. Therefore, this research aims to assess the suitability of ML techniques and FIS for the prediction of the consequences of oil spills on ground environments using small datasets. Consequently, we present a hybrid approach for assessing the suitability of ML (Linear Regression, Decision Trees, Support Vector Regression, Ensembles, and Gaussian Process Regression) and the adaptive neural fuzzy inference system (ANFIS) for predicting the consequences of oil spills with a small dataset. This paper proposes enlarging the initial small dataset of an oil spill on a ground environment by using the synthetic data generated by applying a mathematical model. ML techniques and ANFIS were tested with the same generated synthetic datasets to assess the proposed approach. The proposed ANFIS-based approach shows significant performance and sufficient efficiency for predicting the consequences of oil spills on ground environments with a smaller dataset than the applied ML techniques. The main finding of this paper indicates that FIS is suitable for prediction with a small dataset and provides sufficiently accurate prediction results.

2

Abdullahi, Dauda Sani, Dr Muhammad Sirajo Aliyu, and Usman Musa Abdullahi. "Comparative analysis of resampling algorithms in the prediction of stroke diseases." UMYU Scientifica 2, no. 1 (March 30, 2023): 88–94. http://dx.doi.org/10.56919/usci.2123.011.

Abstract:

Stroke disease is a serious cause of death globally. Early prediction of the disease would save many lives, but most clinical datasets, including the stroke dataset, are imbalanced, which biases predictive algorithms towards the majority class. The objective of this research is to compare different data resampling algorithms on the stroke dataset to improve the prediction performance of machine learning models. This paper considered five (5) resampling algorithms, namely Random Oversampling (ROS), Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic (ADASYN), and the hybrid techniques SMOTE with Edited Nearest Neighbor (SMOTE-ENN) and SMOTE with Tomek Links (SMOTE-TOMEK), and trained six (6) machine learning classifiers, namely Logistic Regression (LR), Decision Tree (DT), K-Nearest Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF), and XGBoost (XGB). The hybrid technique SMOTE-ENN improved the machine learning classifiers the most, followed by SMOTE, while the combination of SMOTE and XGB performed best, with an accuracy of 97.99%, a G-mean score of 0.99, and an AUC-ROC score of 0.99. Resampling algorithms balance the dataset and enhance the predictive power of machine learning algorithms. Therefore, we recommend resampling the stroke dataset when predicting stroke disease rather than modeling on the imbalanced dataset.
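As an illustration of two ideas named in this abstract, the sketch below shows plain random oversampling and the G-mean score using only the Python standard library. It is not the authors' code: the function names `random_oversample` and `g_mean` are hypothetical, and SMOTE and its hybrids, which synthesize new points rather than duplicating existing ones, are not reproduced here.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class is as frequent as the largest one (plain random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels.
    Assumes both classes occur at least once in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Toy data (made up for illustration): four majority rows, one minority row.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
```

A classifier that always predicts the majority class has a G-mean of 0 regardless of its accuracy, which is why the score is commonly reported for imbalanced data.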

3

Gangil, Tarun, Krishna Sharan, B. Dinesh Rao, Krishnamoorthy Palanisamy, Biswaroop Chakrabarti, and Rajagopal Kadavigere. "Utility of adding Radiomics to clinical features in predicting the outcomes of radiotherapy for head and neck cancer using machine learning." PLOS ONE 17, no. 12 (December 15, 2022): e0277168. http://dx.doi.org/10.1371/journal.pone.0277168.

Abstract:

Background: Radiomics involves the extraction of quantitative information from annotated Computed-Tomography (CT) images, and has been used to predict outcomes in Head and Neck Squamous Cell Carcinoma (HNSCC). Subjecting combined Radiomics and Clinical features to Machine Learning (ML) could offer better predictions of clinical outcomes. This study is a comparative performance analysis of ML models with Clinical, Radiomics, and Clinico-Radiomic datasets for predicting four outcomes of HNSCC treated with Curative Radiation Therapy (RT): Distant Metastases, Locoregional Recurrence, New Primary, and Residual Disease. Methodology: The study used retrospective data of 311 HNSCC patients treated with radiotherapy between 2013–2018 at our centre. Binary prediction models were developed for the four outcomes with Clinical-only, Clinico-Radiomic, and Radiomics-only datasets, using three different ML classification algorithms, namely Random Forest (RF), Kernel Support Vector Machine (KSVM), and XGBoost. The best-performing ML algorithms of the three dataset groups were then compared. Results: The Clinico-Radiomic dataset using the KSVM classifier provided the best prediction. Predicted mean testing accuracy for Distant Metastases, Locoregional Recurrence, New Primary, and Residual Disease was 97%, 72%, 99%, and 96%, respectively. The mean area under the receiver operating curve (AUC) was calculated and displayed for all the models using the three dataset groups. Conclusion: The Clinico-Radiomic dataset improved the predictive ability of ML models over clinical features alone, while models built using Radiomics alone performed poorly. Radiomics data could therefore effectively supplement clinical data in predicting outcomes.

4

Rau, Cheng-Shyuan, Shao-Chun Wu, Jung-Fang Chuang, Chun-Ying Huang, Hang-Tsung Liu, Peng-Chen Chien, and Ching-Hua Hsieh. "Machine Learning Models of Survival Prediction in Trauma Patients." Journal of Clinical Medicine 8, no. 6 (June 5, 2019): 799. http://dx.doi.org/10.3390/jcm8060799.

Abstract:

Background: We aimed to build a model using machine learning for the prediction of survival in trauma patients and compared these model predictions to those predicted by the most commonly used algorithm, the Trauma and Injury Severity Score (TRISS). Methods: Enrolled hospitalized trauma patients from 2009 to 2016 were divided into a training dataset (70% of the original data set) for generation of a plausible model under supervised classification, and a test dataset (30% of the original data set) to test the performance of the model. The training and test datasets comprised 13,208 (12,871 survival and 337 mortality) and 5603 (5473 survival and 130 mortality) patients, respectively. With the provision of additional information such as pre-existing comorbidity status or laboratory data, logistic regression (LR), support vector machine (SVM), and neural network (NN) (with the Stuttgart Neural Network Simulator (RSNNS)) were used to build models of survival prediction and compared to the predictive performance of TRISS. Predictive performance was evaluated by accuracy, sensitivity, and specificity, as well as by area under the curve (AUC) measures of receiver operating characteristic curves. Results: In the validation dataset, NN and the TRISS presented the highest score (82.0%) for balanced accuracy, followed by SVM (75.2%) and LR (71.8%) models. In the test dataset, NN had the highest balanced accuracy (75.1%), followed by the TRISS (70.2%), SVM (70.6%), and LR (68.9%) models. All four models (LR, SVM, NN, and TRISS) exhibited a high accuracy of more than 97.5% and a sensitivity of more than 98.6%. However, NN exhibited the highest specificity (51.5%), followed by the TRISS (41.5%), SVM (40.8%), and LR (38.5%) models. Conclusions: These four models (LR, SVM, NN, and TRISS) exhibited a similar high accuracy and sensitivity in predicting the survival of the trauma patients. In the test dataset, the NN model had the highest balanced accuracy and predictive specificity.
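The gap between the reported accuracy (above 97.5%) and balanced accuracy (about 70 to 75%) follows from the class imbalance of the test dataset. The sketch below is a worked example and not taken from the paper: it uses the test-set class counts given in the abstract (5473 survival, 130 mortality) and a hypothetical baseline that predicts survival for every patient. Treating survival as the positive class is an assumption, chosen because it matches the high sensitivity and low specificity reported; `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard metrics derived from the four confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Class counts of the test dataset as stated in the abstract.
SURVIVED, DIED = 5473, 130

# Hypothetical baseline (not a model from the paper): predict "survival"
# for everyone, with survival assumed to be the positive class.
baseline = binary_metrics(tp=SURVIVED, fn=0, tn=0, fp=DIED)

for name, value in baseline.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the baseline reaches an accuracy of about 0.977 with a balanced accuracy of exactly 0.5, so balanced accuracy and specificity are the more informative figures for comparing the four models.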

5

Sinaga, Benyamin Langgu, Sabrina Ahmad, Zuraida Abal Abas, and Intan Ermahani A. Jalil. "A recommendation system of training data selection method for cross-project defect prediction." Indonesian Journal of Electrical Engineering and Computer Science 27, no. 2 (August 1, 2022): 990. http://dx.doi.org/10.11591/ijeecs.v27.i2.pp990-1006.

Abstract:

Cross-project defect prediction (CPDP) has been a popular approach to address the limited historical dataset when building a defect prediction model. Directly applying cross-project datasets to learn the prediction model produces an unsatisfactory predictive model. Therefore, the selection of training data is essential. Many studies have examined the effectiveness of training data selection methods, and the best-performing method varied across datasets. While no method consistently outperformed the others across all datasets, predicting the best method for a specific dataset is essential. This study proposed a recommendation system to select the most suitable training data selection method in the CPDP setting. We evaluated the proposed system using 44 datasets, 13 training data selection methods, and six classification algorithms. The findings concluded that the recommendation system effectively recommends the best method to select training data.

6

Morgan, Maria, Carla Blank, and Raed Seetan. "Plant disease prediction using classification algorithms." IAES International Journal of Artificial Intelligence (IJ-AI) 10, no. 1 (March 1, 2021): 257. http://dx.doi.org/10.11591/ijai.v10.i1.pp257-264.

Abstract:

This paper investigates the capability of six existing classification algorithms (Artificial Neural Network, Naïve Bayes, k-Nearest Neighbor, Support Vector Machine, Decision Tree and Random Forest) in classifying and predicting diseases in soybean and mushroom datasets using datasets with numerical or categorical attributes. While many similar studies have been conducted on datasets of images to predict plant diseases, the main objective of this study is to suggest classification methods that can be used for disease classification and prediction in datasets that contain raw measurements instead of images. A fungus and a plant dataset, which had many differences, were chosen so that the findings in this paper could be applied to future research for disease prediction and classification in a variety of datasets which contain raw measurements. A key difference between the two datasets, other than one being a fungus and one being a plant, is that the mushroom dataset is balanced and only contained two classes while the soybean dataset is imbalanced and contained eighteen classes. All six algorithms performed well on the mushroom dataset, while the Artificial Neural Network and k-Nearest Neighbor algorithms performed best on the soybean dataset. The findings of this paper can be applied to future research on disease classification and prediction in a variety of dataset types such as fungi, plants, humans, and animals.

7

Nunez, John-Jose, Teyden T. Nguyen, Yihan Zhou, Bo Cao, Raymond T. Ng, Jun Chen, Benicio N. Frey, et al. "Replication of machine learning methods to predict treatment outcome with antidepressant medications in patients with major depressive disorder from STAR*D and CAN-BIND-1." PLOS ONE 16, no. 6 (June 28, 2021): e0253023. http://dx.doi.org/10.1371/journal.pone.0253023.

Abstract:

Objectives: Antidepressants are first-line treatments for major depressive disorder (MDD), but 40–60% of patients will not respond; hence, predicting response would be a major clinical advance. Machine learning algorithms hold promise to predict treatment outcomes based on clinical symptoms and episode features. We sought to independently replicate recent machine learning methodology predicting antidepressant outcomes using the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) dataset, and then externally validate these methods to train models using data from the Canadian Biomarker Integration Network in Depression (CAN-BIND-1) dataset. Methods: We replicated methodology from Nie et al (2018) using common algorithms based on linear regressions and decision trees to predict treatment-resistant depression (TRD, defined as failing to respond to 2 or more antidepressants) in the STAR*D dataset. We then trained and externally validated models using the clinical features found in both datasets to predict response (≥50% reduction on the Quick Inventory for Depressive Symptomatology, Self-Rated [QIDS-SR]) and remission (endpoint QIDS-SR score ≤5) in the CAN-BIND-1 dataset. We evaluated additional models to investigate how different outcomes and features may affect prediction performance. Results: Our replicated models predicted TRD in the STAR*D dataset with slightly better balanced accuracy than Nie et al (70%-73% versus 64%-71%, respectively). Prediction performance on our external methodology validation on the CAN-BIND-1 dataset varied depending on outcome; performance was worse for response (best balanced accuracy 65%) compared to remission (77%). Using the smaller set of features found in both datasets generally improved prediction performance when evaluated on the STAR*D dataset. Conclusion: We successfully replicated prior work predicting antidepressant treatment outcomes using machine learning methods and clinical data. We found similar prediction performance using these methods on an external database, although prediction of remission was better than prediction of response. Future work is needed to improve prediction performance to be clinically useful.

8

Ahamed, B. Shamreen, Meenakshi S. Arya, and Auxilia Osvin V. Nancy. "Diabetes Mellitus Disease Prediction Using Machine Learning Classifiers with Oversampling and Feature Augmentation." Advances in Human-Computer Interaction 2022 (September 19, 2022): 1–14. http://dx.doi.org/10.1155/2022/9220560.

Abstract:

Technical improvements in the healthcare sector have given rise to many new inventions in the field of artificial intelligence. Patterns are identified for disease detection, and the onset of many diseases is predicted. Such diseases include diabetes mellitus, fatal heart diseases, and symptomatic cancer. Many algorithms have played a critical role in the prediction of diseases. This paper proposes an ML-based approach for diabetes mellitus prediction. Many ML algorithms are compared in the proposed work, and the three ML classifiers providing the highest accuracy are determined: RF, GBM, and LGBM. Prediction accuracy is obtained on two datasets: the Pima Indians dataset and a curated dataset. The ML classifiers LGBM, GB, and RF are used to build a predictive model, and the accuracy of each classifier is noted and compared. In addition to the generalized prediction mechanism, a data augmentation technique is used, and the final prediction accuracy is obtained for the classifiers LGBM, GB, and RF. A comparative study of augmentation versus non-augmentation is also presented for the two datasets in order to further improve the accuracy of diabetes prediction.

9

Partin, Alexander, Thomas S. Brettin, Yitan Zhu, Jamie Overbeek, Oleksandr Narykov, Priyanka Vasanthakumari, Austin Clyde, et al. "Abstract 5380: Systematic evaluation and comparison of drug response prediction models: a case study of prediction generalization across cell lines datasets." Cancer Research 83, no. 7_Supplement (April 4, 2023): 5380. http://dx.doi.org/10.1158/1538-7445.am2023-5380.

Abstract:

Predictive modeling holds great promise for improving personalized cancer treatment and efficiency of drug development. In recent years, deep learning (DL) has been extensively explored for drug response prediction (DRP), outperforming classical machine learning in prediction generalization to new data. Despite the considerable interest in DRP, no agreed-upon methodology for evaluating and comparing the diverse DL models yet exists. Existing papers generally demonstrate the performance of proposed models using cross-validation within a single cell line dataset and compare with baseline models of their choice, substantially limiting the scope and validity of model evaluation and comparison. In this work, we investigate the ability of DRP models for generalizing predictions across datasets of multiple drug screening studies, a more challenging scenario mimicking practical applications of DRP models. Five cell line datasets and six community DRP models with advanced DL architectures have been explored. Public cell line drug screening datasets have been curated and processed for this analysis, including CCLE, CTRP, GDSC1, GDSC2, and GCSI. For each dataset, the same preprocessing pipeline was used to generate cell line gene expressions, drug representations, and drug response values. The six DRP models include advanced architectures and feature engineering methods such as transformer, graph neural network, and image representation of tabular data. Systematic model curation and training have been applied, including consistent training and testing data splits across models and hyperparameter optimization (HPO). To cope with the large-scale model training and HPO, automatic workflows have been implemented and executed on high-performance computing systems. A 5-by-5 matrix of prediction scores, corresponding to the five datasets in both row and column dimensions, has been generated for each model, with off-diagonal values representing the cross-dataset generalization. Despite the advanced DL techniques, all models exhibit substantially inferior performance in cross-dataset analysis as compared with cross-validation within a single dataset. This result demonstrates the challenge of cross-dataset generalization for DRP and motivates the need for rigorous and systematic evaluation of DRP models, which simulates real-world applications.

Citation Format: Alexander Partin, Thomas S. Brettin, Yitan Zhu, Jamie Overbeek, Oleksandr Narykov, Priyanka Vasanthakumari, Austin Clyde, Sara E. Jones, Satishkumar Ranganathan Ganakammal, Justin M. Wozniak, Andreas Wilke, Jamaludin Mohd-Yusof, Michael R. Weil, Alexander T. Pearson, Rick L. Stevens. Systematic evaluation and comparison of drug response prediction models: a case study of prediction generalization across cell lines datasets. [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 5380.
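The 5-by-5 score matrix described in this abstract can be summarized by comparing its diagonal with its off-diagonal entries. The sketch below is only an illustration: the score values are placeholders invented for the example and are not results from the study, the helper name `summarize` is hypothetical, and reading rows as the training dataset and columns as the evaluation dataset is an assumption (the abstract only says the five datasets index both dimensions).

```python
DATASETS = ["CCLE", "CTRP", "GDSC1", "GDSC2", "GCSI"]


def summarize(scores):
    """Return (mean diagonal score, mean off-diagonal score) of a square matrix.

    Diagonal entries correspond to evaluation within one dataset;
    off-diagonal entries correspond to cross-dataset generalization.
    """
    n = len(scores)
    within = [scores[i][i] for i in range(n)]
    cross = [scores[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(within) / len(within), sum(cross) / len(cross)


# Placeholder scores, made up for illustration only:
# 0.8 on the diagonal and 0.5 everywhere else.
scores = [
    [0.8 if i == j else 0.5 for j in range(len(DATASETS))]
    for i in range(len(DATASETS))
]

within_mean, cross_mean = summarize(scores)
print(f"within-dataset mean: {within_mean:.2f}")
print(f"cross-dataset mean:  {cross_mean:.2f}")
```

Reporting the two means separately, once per model, keeps the comparison between within-dataset and cross-dataset performance explicit.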

10

Preethi, B. Meena, R. Gowtham, S. Aishvarya, S. Karthick, and D. G. Sabareesh. "Rainfall Prediction using Machine Learning and Deep Learning Algorithms." International Journal of Recent Technology and Engineering (IJRTE) 10, no. 4 (November 30, 2021): 251–54. http://dx.doi.org/10.35940/ijrte.d6611.1110421.

Abstract:

The project entitled “Rainfall Prediction using Machine Learning & Deep Learning Algorithms” is a research project developed in Python, with the dataset stored in Microsoft Excel. It applies various machine learning and deep learning algorithms to find which algorithm predicts most accurately. Rainfall prediction can be treated as a binary classification task in data mining. Predicting rainfall is important in several aspects of a country's life and can help prevent serious natural disasters. For this prediction, an Artificial Neural Network using forward and backward propagation, AdaBoost, Gradient Boosting and XGBoost algorithms are used. The project consists of five modules. The Data Analysis Module analyses the dataset and finds its missing values. The Data Pre-processing Module includes data cleaning, which is the process of filling the missing values in the dataset. The Feature Transformation Module is used to modify the features of the dataset. The Data Mining Module is used to train models on the dataset with a chosen algorithm so that they learn its patterns. The Model Evaluation Module is used to measure the performance of the models and determine the best overall prediction accuracy. The dataset used covers Australia. The main aim of the project is to compare the various boosting algorithms with the neural network and find the best algorithm among them. Such predictions can be a major advantage to farmers, who can choose which crops to plant according to their water needs. Overall, we analyse which algorithm is feasible for qualitatively predicting rainfall.

11

Tian, Simiao, Laurence Mioche, Jean-Baptiste Denis, and Béatrice Morio. "A multivariate model for predicting segmental body composition." British Journal of Nutrition 110, no. 12 (July 11, 2013): 2260–70. http://dx.doi.org/10.1017/s0007114513001803.

Abstract:

The aims of the present study were to propose a multivariate model for predicting simultaneously body, trunk and appendicular fat and lean masses from easily measured variables and to compare its predictive capacity with that of the available univariate models that predict body fat percentage (BF%). The dual-energy X-ray absorptiometry (DXA) dataset (52 % men and 48 % women) with White, Black and Hispanic ethnicities (1999–2004, National Health and Nutrition Examination Survey) was randomly divided into three sub-datasets: a training dataset (TRD), a test dataset (TED) and a validation dataset (VAD), comprising 3835, 1917 and 1917 subjects, respectively. For each sex, several multivariate prediction models were fitted from the TRD using age, weight, height and possibly waist circumference. The most accurate model was selected from the TED and then applied to the VAD and a French DXA dataset (French DB) (526 men and 529 women) to assess the prediction accuracy in comparison with that of five published univariate models, for which adjusted formulas were re-estimated using the TRD. Waist circumference was found to improve the prediction accuracy, especially in men. For BF%, the standard error of prediction (SEP) values were 3·26 (3·75) % for men and 3·47 (3·95) % for women in the VAD (French DB), as good as those of the adjusted univariate models. Moreover, the SEP values for the prediction of body and appendicular lean masses ranged from 1·39 to 2·75 kg for both the sexes. The prediction accuracy was best for age < 65 years, BMI < 30 kg/m² and the Hispanic ethnicity. The application of our multivariate model to large populations could be useful to address various public health issues.
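The random division into three sub-datasets can be sketched as follows. This is a minimal illustration and not the authors' procedure: it uses the subset sizes stated in the abstract (3835, 1917 and 1917, which sum to 7669), and the function name `three_way_split` and the seed are hypothetical choices. Any stratification by sex or ethnicity that the study may have applied is not modeled here.

```python
import random


def three_way_split(n_subjects, sizes, seed=0):
    """Shuffle subject indices and cut them into three disjoint parts."""
    if len(sizes) != 3 or sum(sizes) != n_subjects:
        raise ValueError("sizes must have three entries summing to n_subjects")
    indices = list(range(n_subjects))
    random.Random(seed).shuffle(indices)
    train_end = sizes[0]
    test_end = train_end + sizes[1]
    return indices[:train_end], indices[train_end:test_end], indices[test_end:]


# Subset sizes as stated in the abstract: training, test, validation.
trd, ted, vad = three_way_split(7669, (3835, 1917, 1917))
print(len(trd), len(ted), len(vad))
```

Keeping a validation part that is touched only after model selection on the test part mirrors the role the abstract assigns to the VAD.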

12

Yuda Syahidin, Aditya Pratama Ismail, and Fawwaz Nafis Siraj. "Application of Artificial Neural Network Algorithms to Heart Disease Prediction Models with Python Programming." Jurnal E-Komtek (Elektro-Komputer-Teknik) 6, no. 2 (December 31, 2022): 292–302. http://dx.doi.org/10.37339/e-komtek.v6i2.932.

Abstract:

Heart disease is one of the deadliest diseases and the number one killer in the world, so many studies are carried out to help predict a person's heart disease. This study aims to help create an early heart disease prediction model from the UCI Machine Learning Repository dataset. The method proposed in this study is a deep learning technique that applies an artificial neural network algorithm with hidden layers to build a heart disease prediction model. To improve accuracy on the dataset used, this research stage dealt with data pre-processing problems such as missing data and determining the form of data correlation. The model was then tested on a heart disease dataset and yielded 90% accuracy. It is hoped that this prediction model, built with Python programming, will not only help make disease predictions but also encourage further innovations in data science in the health sector.

13

Zulqarnain, Muhammad, Rozaida Ghazali, Muhammad Ghulam Ghouse, Yana Mazwin Mohmad Hassim, and Irfan Javid. "Predicting Financial Prices of Stock Market using Recurrent Convolutional Neural Networks." International Journal of Intelligent Systems and Applications 12, no. 6 (December 8, 2020): 21–32. http://dx.doi.org/10.5815/ijisa.2020.06.02.

Abstract:

Financial time-series prediction has long been one of the most challenging issues in financial market analysis. Deep neural networks, an excellent data mining approach, have received great attention from researchers in several areas of time-series prediction over the last 10 years. Convolutional neural network (CNN) and recurrent neural network (RNN) models have become the mainstream methods for financial predictions. In this paper, we propose a combined architecture that exploits the advantages of CNN and RNN simultaneously for the prediction of trading signals. Our model passes financial time series through a CNN layer, whose output is fed directly into a gated recurrent unit (GRU) layer to capture long-term signal dependencies. GRU models perform better in sequential learning tasks and address the vanishing and exploding gradient issues of standard RNNs. We evaluate our model on three datasets of stock indexes, the Hang Seng Index (HSI), the Deutscher Aktienindex (DAX) and the S&P 500 Index, covering 2008 to 2016, and compare the GRU-CNN based approach with existing deep learning models. Experimental results show that the proposed GRU-CNN model obtained the best prediction accuracy: 56.2% on the HSI dataset, 56.1% on the DAX dataset and 56.3% on the S&P 500 dataset.

14

Li, Wencui, Hongru Shen, Lizhu Han, Jiaxin Liu, Bohan Xiao, Xubin Li, and Zhaoxiang Ye. "A Multiparametric Fusion Radiomics Signature Based on Contrast-Enhanced MRI for Predicting Early Recurrence of Hepatocellular Carcinoma." Journal of Oncology 2022 (September 28, 2022): 1–12. http://dx.doi.org/10.1155/2022/3704987.

Abstract:

Objectives. The postoperative early recurrence (ER) rate of hepatocellular carcinoma (HCC) is 50%, and no highly reliable predictive tool has been developed yet. The aim of this study was to develop and validate a predictive model with radiomics analysis based on multiparametric magnetic resonance (MR) images to predict early recurrence of HCC. Methods. In total, 302 patients (training dataset: n = 211; validation dataset: n = 91) with pathologically confirmed HCC who underwent preoperative MR imaging were enrolled in this study. Three-dimensional regions of interest of the entire lesion were accessed by manually drawing along the tumor margins on the multiple sequences of MR images. Least absolute shrinkage and selection operator Cox regression were then applied to select ER-related radiomics features and construct radiomics signatures. Univariate analysis and multivariate Cox regression analysis were used to identify the significant clinico-radiological factors and establish a clinico-radiological model. A predictive model of ER incorporating the fusion radiomics signature and clinico-radiological risk factors was constructed. The diagnostic performance and clinical utility of this model were measured by receiver-operating characteristic (ROC), calibration curve, and decision curve analyses. Results. The fusion radiomics signature consisting of 6 radiomics features achieved good prediction performance (training dataset: AUC = 0.85, validation dataset: AUC = 0.79). The predictive model of ER integrating clinico-radiological risk factors and the fusion radiomics signature improved the prediction efficacy with AUCs of 0.91 and 0.87 in the training and validation datasets, respectively. Furthermore, the nomogram and ER risk stratification system based on the predictive model demonstrated encouraging predictions of the individualized risk of ER and gave three risk groups with low, intermediate, or high risk of ER. Conclusions. The proposed predictive model incorporating clinico-radiological factors and the fusion radiomics signature derived from multiparametric MR images may be an effective tool for the individualized prediction of postoperative ER in patients with HCC.

15

Ferenc, Rudolf, Zoltán Tóth, Gergely Ladányi, István Siket, and Tibor Gyimóthy. "A public unified bug dataset for java and its assessment regarding metrics and bug prediction." Software Quality Journal 28, no. 4 (June 3, 2020): 1447–506. http://dx.doi.org/10.1007/s11219-020-09515-0.

Abstract:

Bug datasets have been created and used by many researchers to build and validate novel bug prediction models. In this work, our aim is to collect existing public source code metric-based bug datasets and unify their contents. Furthermore, we wish to assess the plethora of collected metrics and the capabilities of the unified bug dataset in bug prediction. We considered 5 public datasets and we downloaded the corresponding source code for each system in the datasets and performed source code analysis to obtain a common set of source code metrics. This way, we produced a unified bug dataset at class and file level as well. We investigated the diversion of metric definitions and values of the different bug datasets. Finally, we used a decision tree algorithm to show the capabilities of the dataset in bug prediction. We found that there are statistically significant differences in the values of the original and the newly calculated metrics; furthermore, notations and definitions can severely differ. We compared the bug prediction capabilities of the original and the extended metric suites (within-project learning). Afterwards, we merged all classes (and files) into one large dataset which consists of 47,618 elements (43,744 for files) and we evaluated the bug prediction model built on this large dataset as well. Finally, we also investigated cross-project capabilities of the bug prediction models and datasets. We made the unified dataset publicly available for everyone. By using a public unified dataset as an input for different bug prediction related investigations, researchers can make their studies reproducible, and thus able to be validated and verified.

16

Du, Hao, Ziyuan Pan, Kee Yuan Ngiam, Fei Wang, Ping Shum, and Mengling Feng. "Self-Correcting Recurrent Neural Network for Acute Kidney Injury Prediction in Critical Care." Health Data Science 2021 (December 23, 2021): 1–10. http://dx.doi.org/10.34133/2021/9808426.

Abstract:

Background. In critical care, intensivists are required to continuously monitor high-dimensional vital signs and lab measurements to detect and diagnose acute patient conditions, which has always been a challenging task. Recently, deep learning models such as recurrent neural networks (RNNs) have demonstrated their strong potential in predicting such events. However, in real deployment, the patient data are continuously coming and there is no effective adaptation mechanism for RNNs to incorporate those new data and become more accurate. Methods. In this study, we propose a novel self-correcting mechanism for RNNs to fill this gap. Our mechanism feeds prediction errors from the predictions of previous timestamps into the prediction of the current timestamp, so that the model can “learn” from previous predictions. We also proposed a regularization method that takes into account not only the model’s prediction errors on the labels but also its estimation errors on the input data. Results. We compared the performance of our proposed method with the conventional deep learning models on two real-world clinical datasets for the task of acute kidney injury (AKI) prediction and demonstrated that the proposed model achieved an area under ROC curve at 0.893 on the MIMIC-III dataset and 0.871 on the Philips eICU dataset. Conclusions. The proposed self-correcting RNNs demonstrated effectiveness in AKI prediction and have the potential to be applied to clinical applications.

17

Wynants, L., Y. Vergouwe, S. Van Huffel, D. Timmerman, and B. Van Calster. "Does ignoring clustering in multicenter data influence the performance of prediction models? A simulation study." Statistical Methods in Medical Research 27, no. 6 (September 19, 2016): 1723–36. http://dx.doi.org/10.1177/0962280216668555.

Abstract:

Clinical risk prediction models are increasingly being developed and validated on multicenter datasets. In this article, we present a comprehensive framework for the evaluation of the predictive performance of prediction models at the center level and the population level, considering population-averaged predictions, center-specific predictions, and predictions assuming an average random center effect. We demonstrated in a simulation study that calibration slopes do not only deviate from one because of over- or underfitting of patterns in the development dataset, but also as a result of the choice of the model (standard versus mixed effects logistic regression), the type of predictions (marginal versus conditional versus assuming an average random effect), and the level of model validation (center versus population). In particular, when data is heavily clustered (ICC 20%), center-specific predictions offer the best predictive performance at the population level and the center level. We recommend that models should reflect the data structure, while the level of model validation should reflect the research question.

18

Kim, Eunhye, Tsatsral Amarbayasgalan, and Hoon Jung. "Efficient Weighted Ensemble Method for Predicting Peak-Period Postal Logistics Volume: A South Korean Case Study." Applied Sciences 12, no. 23 (November 23, 2022): 11962. http://dx.doi.org/10.3390/app122311962.

Abstract:

Demand prediction for postal delivery services is useful for managing logistic operations optimally. Particularly for holiday periods, namely the Lunar New Year and Korean Thanksgiving Day (Chuseok) in South Korea, the logistics service increases sharply compared with the usual period, which makes it hard to provide reliable operation in mail centers. This study proposes a Multilayer Perceptron-based weighted ensemble method for predicting the accepted parcel volumes during special periods. The proposed method consists of two main phases: the first phase enriches the training dataset via synthetic samples using unsupervised learning; the second phase builds two Multilayer Perceptron models using internal and external factor-derived features for prediction. The final result is estimated by the weighted average predictions of these models. We conducted experiments on 25 Korean mail center datasets. The experimental study on the dataset provided by Korea Post shows better performance than other compared methods.
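The final step described in this abstract, a weighted average of two model outputs, can be sketched as follows. This is only an illustration: the function name, the weights, and the sample volumes are hypothetical, and the abstract does not say how the authors choose their weights.

```python
def weighted_ensemble(preds_internal, preds_external, w_internal, w_external):
    """Combine two prediction lists by a normalized weighted average.

    The weights are illustrative inputs; how they are derived is an
    assumption left to the caller.
    """
    if len(preds_internal) != len(preds_external):
        raise ValueError("prediction lists must have the same length")
    total = w_internal + w_external
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        (w_internal * a + w_external * b) / total
        for a, b in zip(preds_internal, preds_external)
    ]


# Made-up parcel volumes from two hypothetical models.
combined = weighted_ensemble([100.0, 200.0], [120.0, 180.0], 3.0, 1.0)
print(combined)
```

Normalizing by the weight total keeps the combined value on the same scale as the individual predictions, whatever the magnitude of the weights.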

19

Sakiyama, Hiroshi, Motohisa Fukuda, and Takashi Okuno. "Prediction of Blood-Brain Barrier Penetration (BBBP) Based on Molecular Descriptors of the Free-Form and In-Blood-Form Datasets." Molecules 26, no. 24 (December 7, 2021): 7428. http://dx.doi.org/10.3390/molecules26247428.

Abstract:

The blood-brain barrier (BBB) controls the entry of chemicals from the blood to the brain. Since brain drugs need to penetrate the BBB, rapid and reliable prediction of BBB penetration (BBBP) is helpful for drug development. In this study, free-form and in-blood-form datasets were prepared by modifying the original BBBP dataset, and the effects of the data modification were investigated. For each dataset, molecular descriptors were generated and used for BBBP prediction by machine learning (ML). For ML, the dataset was split into training, validation, and test data by the scaffold split algorithm used in MoleculeNet. This creates an unbalanced split and makes the prediction difficult; however, we decided to use that algorithm to evaluate the predictive performance for unknown compounds dissimilar to existing ones. The highest prediction score was obtained by the random forest model using 212 descriptors from the free-form dataset, and this score was higher than the existing best score using the same split algorithm without using any external database. Furthermore, using a deep neural network, a comparable result was obtained with only 11 descriptors from the free-form dataset, and the resulting descriptors suggested the importance of recognizing the glucose-like characteristics in BBBP prediction.

20

Hijazi, Ala, Sameer Al-Dahidi, and Safwan Altarazi. "A Novel Assisted Artificial Neural Network Modeling Approach for Improved Accuracy Using Small Datasets: Application in Residual Strength Evaluation of Panels with Multiple Site Damage Cracks." Applied Sciences 10, no. 22 (November 20, 2020): 8255. http://dx.doi.org/10.3390/app10228255.

Abstract:

An artificial neural network (ANN) extracts knowledge from a training dataset and uses this acquired knowledge to forecast outputs for any new set of inputs. When the input/output relations are complex and highly non-linear, the ANN needs a relatively large training dataset (hundreds of data points) to capture these relations adequately. This paper introduces a novel assisted-ANN modeling approach that enables the development of ANNs using small datasets, while maintaining high prediction accuracy. This approach uses parameters that are obtained using the known input/output relations (partial or full relations). These so-called assistance parameters are included as ANN inputs in addition to the traditional direct independent inputs. The proposed assisted approach is applied for predicting the residual strength of panels with multiple site damage (MSD) cracks. Different assistance levels (four levels) and different training dataset sizes (from 75 down to 22 data points) are investigated, and the results are compared to the traditional approach. The results show that the assisted approach helps in achieving high prediction accuracy (<3% average error). The relative accuracy improvement is higher (up to 46%) for ANN learning algorithms that give lower prediction accuracy. Also, the relative accuracy improvement becomes more significant (up to 38%) for smaller dataset sizes.

21

Chen, Qi, Bihan Tang, Yinghong Zhai, Yuqi Chen, Zhichao Jin, Hedong Han, Yongqing Gao, Cheng Wu, Tao Chen, and Jia He. "Dynamic statistical model for predicting the risk of death among older Chinese people, using longitudinal repeated measures of the frailty index: a prospective cohort study." Age and Ageing 49, no. 6 (May 4, 2020): 966–73. http://dx.doi.org/10.1093/ageing/afaa056.

Abstract:

Background. Frailty is a common characteristic of older people with the ageing process. We aimed to develop and validate a dynamic statistical prediction model to calculate the risk of death in people aged ≥65 years, using a longitudinal frailty index (FI). Methods. One training dataset and three validation datasets from the Chinese Longitudinal Healthy Longevity Survey (CLHLS) were used in our study. The training dataset and validation datasets 1 to 3 included data from 9,748, 7,459, 9,093 and 6,368 individuals, respectively. We used 35 health deficits to construct the FI and a longitudinal FI based on repeated measurement of FI at every wave of the CLHLS. A joint model was used to build a dynamic prediction model considering both baseline covariates and the longitudinal FI. Areas under time-dependent receiver operating characteristic curves (AUCs) and calibration curves were employed to assess the predictive performance of the model. Results. A linear mixed-effects model used time, sex, residence (city, town, or rural), living alone, smoking and alcohol consumption to calculate a subject-specific longitudinal FI. The dynamic prediction model was built using the longitudinal FI, age, residence, sex and an FI–age interaction term. The AUCs ranged from 0.64 to 0.84, and both the AUCs and the calibration curves showed good predictive ability. Conclusions. We developed a dynamic prediction model that was able to update predictions of the risk of death as updated measurements of FI became available. This model could be used to estimate the risk of death in individuals aged >65 years.

22

Lee, Chia-Ying, Suzana J. Camargo, Fréderic Vitart, Adam H. Sobel, Joanne Camp, Shuguang Wang, Michael K. Tippett, and Qidong Yang. "Subseasonal Predictions of Tropical Cyclone Occurrence and ACE in the S2S Dataset." Weather and Forecasting 35, no. 3 (April 22, 2020): 921–38. http://dx.doi.org/10.1175/waf-d-19-0217.1.

Abstract:

Probabilistic forecasts of tropical cyclone (TC) occurrence, at lead times of weeks 1–4, in the Subseasonal to Seasonal (S2S) dataset are examined here. Forecasts are defined over 15° in latitude × 20° in longitude regions, and the prediction skill is measured using the Brier skill score with reference to climatological reference forecasts. Two types of reference forecasts are used: a seasonally constant one and a seasonally varying one, with the latter used for forecasts of anomalies from the seasonal climatology. Models from the European Centre for Medium-Range Weather Forecasts (ECMWF), Australian Bureau of Meteorology, and Météo-France/Centre National de Recherche Météorologiques have skill in predicting TC occurrence four weeks in advance. In contrast, only the ECMWF model is skillful in predicting the anomaly of TC occurrence beyond one week. Errors in genesis prediction largely limit models’ skill in predicting TC occurrence. Three calibration techniques, removing the mean genesis and occurrence forecast biases, and a linear regression method, are explored here. The linear regression method performs the best and guarantees a higher skill score when applied to the in-sample dataset. However, when applied to the out-of-sample data, especially in areas where the TC sample size is small, it may reduce the models’ prediction skill. Generally speaking, the S2S models are more skillful in predicting TC occurrence during favorable Madden–Julian oscillation phases. Last, we also report accumulated cyclone energy prediction skill using the ranked probability skill score.
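For readers unfamiliar with the Brier skill score mentioned in this abstract, a minimal sketch of the standard definition (one minus the ratio of the forecast Brier score to the reference Brier score) is given below. The function names and the sample probabilities are illustrative assumptions and are not taken from the paper, which uses climatological reference forecasts over regional boxes.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, ref_probs, outcomes):
    """Skill relative to a reference forecast: 1 is perfect, 0 matches the reference."""
    bs_ref = brier_score(ref_probs, outcomes)
    if bs_ref == 0:
        raise ValueError("reference forecast is perfect; skill score is undefined")
    return 1.0 - brier_score(probs, outcomes) / bs_ref


# Made-up example: four forecast periods, with a constant 50% reference forecast.
outcomes = [1, 0, 0, 1]
forecast = [0.8, 0.1, 0.3, 0.6]
reference = [0.5, 0.5, 0.5, 0.5]
bss = brier_skill_score(forecast, reference, outcomes)
print(round(bss, 3))
```

A positive score means the forecast beats the reference; a negative score means it does worse than the reference.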

23

Lo, Jui-En, Eugene Yu-Chuan Kang, Yun-Nung Chen, Yi-Ting Hsieh, Nan-Kai Wang, Ta-Ching Chen, Kuan-Jen Chen, et al. "Data Homogeneity Effect in Deep Learning-Based Prediction of Type 1 Diabetic Retinopathy." Journal of Diabetes Research 2021 (December 28, 2021): 1–9. http://dx.doi.org/10.1155/2021/2751695.

Abstract:

This study is aimed at evaluating a deep transfer learning-based model for identifying diabetic retinopathy (DR) that was trained using a dataset with high variability and predominant type 2 diabetes (T2D) and comparing model performance with that in patients with type 1 diabetes (T1D). The Kaggle dataset, which is a publicly available dataset, was divided into training and testing Kaggle datasets. In the comparison dataset, we collected retinal fundus images of T1D patients at Chang Gung Memorial Hospital in Taiwan from 2013 to 2020, and the images were divided into training and testing T1D datasets. The model was developed using 4 different convolutional neural networks (Inception-V3, DenseNet-121, VGG1, and Xception). The model performance in predicting DR was evaluated using testing images from each dataset, and area under the curve (AUC), sensitivity, and specificity were calculated. The model trained using the Kaggle dataset had an average (range) AUC of 0.74 (0.03) and 0.87 (0.01) in the testing Kaggle and T1D datasets, respectively. The model trained using the T1D dataset had an AUC of 0.88 (0.03), which decreased to 0.57 (0.02) in the testing Kaggle dataset. Heatmaps showed that the model focused on retinal hemorrhage, vessels, and exudation to predict DR. In wrong prediction images, artifacts and low-image quality affected model performance. The model developed with the high variability and T2D predominant dataset could be applied to T1D patients. Dataset homogeneity could affect the performance, trainability, and generalization of the model.

24

Liu, Yimo, Wanchang Zhang, Zhijie Zhang, Qiang Xu, and Weile Li. "Risk Factor Detection and Landslide Susceptibility Mapping Using Geo-Detector and Random Forest Models: The 2018 Hokkaido Eastern Iburi Earthquake." Remote Sensing 13, no. 6 (March 18, 2021): 1157. http://dx.doi.org/10.3390/rs13061157.

Abstract:

Landslide susceptibility mapping is an effective approach for landslide risk prevention and assessments. The occurrence of slope instability is highly correlated with intrinsic variables that contribute to the occurrence of landslides, such as geology, geomorphology, climate, hydrology, etc. However, feature selection of those conditioning factors to constitute datasets with optimal predictive capability effectively and accurately is still an open question. The present study aims to examine further the integration of the selected landslide conditioning factors with Q-statistic in Geo-detector for determining stratification and selection of landslide conditioning factors in landslide risk analysis so as to ultimately optimize landslide susceptibility model prediction. The location chosen for the study was Atsuma Town, which suffered from landslides following the Eastern Iburi Earthquake in 2018 in Hokkaido, Japan. A total of 13 conditioning factors were obtained from different sources belonging to six categories: geology, geomorphology, seismology, hydrology, land cover/use and human activity; these were selected to generate the datasets for landslide susceptibility mapping. The original datasets of landslide conditioning factors were analyzed with Q-statistic in Geo-detector to examine their explanatory powers regarding the occurrence of landslides. A Random Forest (RF) model was adopted for landslide susceptibility mapping. Subsequently, four subsets, including the Manually delineated landslide Points with 9 features Dataset (MPD9), the Randomly delineated landslide Points with 9 features Dataset (RPD9), the Manually delineated landslide Points with 13 features Dataset (MPD13), and the Randomly delineated landslide Points with 13 features Dataset (RPD13), were selected by an analysis of Q-statistic for training and validating the Geo-detector-RF-integrated model.
Overall, using dataset MPD9, the Geo-detector-RF-integrated model yielded the highest prediction accuracy (89.90%), followed by using dataset MPD13 (89.53%), dataset RPD13 (88.63%) and dataset RPD9 (87.07%), which implied that optimized conditioning factors can effectively improve the prediction accuracy of landslide susceptibility mapping.

25

M.G, Rahul, Srujan R. Rajanalli, Sammed Endoli, Mahantesh Magi, and Dr N. Ramavenkateswaran. "Machine Learning Algorithms for Classification of Gas Sensor Array Dataset." Journal of University of Shanghai for Science and Technology 23, no. 06 (June 17, 2021): 721–28. http://dx.doi.org/10.51201/jusst/21/05331.

Abstract:

To measure the accuracy of the data being sensed, predictive machine learning models have been used. These models take input in the form of datasets and predict the output based on them. By using a large dataset, better and more efficient predictive models can be designed because a large amount of data can be used to train the model. But having a larger dataset leads to a dimensionality problem. This problem is solved using the Principal Component Analysis (PCA) dimensionality reduction algorithm. PCA helps to reduce the redundant or correlated data present in the dataset, by which the dimensionality of the dataset is reduced. Classifier algorithms like K Nearest Neighbour (KNN), Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machine (SVM) are used, which give output in the form of a confusion matrix. From this confusion matrix, the prediction accuracy of the models is decided. From the accuracy measurements, it is found that the SVM model is the most accurate (94%) in predicting the output, whereas the NB model is the least accurate (60%).

26

Anorboev, Abdulaziz, Javokhir Musaev, Sarvinoz Anorboeva, Jeongkyu Hong, Yeong-Seok Seo, Thanh Nguyen, and Dosam Hwang. "Ensemble of top3 prediction with image pixel interval method using deep learning." Computer Science and Information Systems, no. 00 (2023): 56. http://dx.doi.org/10.2298/csis230223056a.

Abstract:

Computer vision (CV) has been successfully used in picture categorization applications in various fields, including medicine, production quality control, and transportation systems. CV models use an excessive number of photos to train potential models. Considering that image acquisition is typically expensive and time-consuming, in this study, we provide a multistep strategy to improve image categorization accuracy with less data. In the first stage, we constructed numerous datasets from a single dataset. Given that an image has pixels with values ranging from 0 to 255, the images were separated into pixel intervals based on the type of dataset. The pixel interval was split into two portions when the dataset was grayscale and five portions when it was composed of RGB images. Next, we trained the model using both the original and newly constructed datasets. Each image in the training process showed a non-identical prediction space, and we suggested using the top three prediction probability ensemble technique. The top three predictions for the newly created images were combined with the corresponding probability for the original image. The results showed that learning patterns from each interval of pixels and ensembling the top three predictions significantly improve the performance and accuracy, and this strategy can be used with any model.

27

Sumalatha, M., and Latha Parthiban. "Augmentation of Predictive Competence of Non-Small Cell Lung Cancer Datasets through Feature Pre-Processing Techniques." EAI Endorsed Transactions on Pervasive Health and Technology 8, no. 5 (November 2, 2022): e1. http://dx.doi.org/10.4108/eetpht.v8i5.3169.

Abstract:

The major objective of the study is to augment the predictive analytics of Non-Small Cell Lung Cancer (NSCLC) datasets with a Feature Pre-Processing (FPP) technique in three stages, viz. removing base errors with common analytics on empty, non-numerical or missing values in the dataset, removing repeated features through regression analysis, and eliminating irrelevant features through clustering methods. The FPP model is validated using classifiers like simple and complex Tree, Linear and Gaussian SVM, Weighted KNN and Boosted Trees in terms of accuracy, sensitivity, specificity, kappa, positive and negative likelihood. The result showed that the NSCLC dataset formed after FPP outperformed the raw NSCLC dataset in all performance levels and showed good augmentation in predictive analytics of NSCLC datasets. The research proved that preprocessing is essential for better prediction of complex medical datasets.

28

Ma, Yuexin, Xinge Zhu, Sibo Zhang, Ruigang Yang, Wenping Wang, and Dinesh Manocha. "TrafficPredict: Trajectory Prediction for Heterogeneous Traffic-Agents." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 6120–27. http://dx.doi.org/10.1609/aaai.v33i01.33016120.

Abstract:

To safely and efficiently navigate in complex urban traffic, autonomous vehicles must make responsible predictions in relation to surrounding traffic-agents (vehicles, bicycles, pedestrians, etc.). A challenging and critical task is to explore the movement patterns of different traffic-agents and predict their future trajectories accurately to help the autonomous vehicle make reasonable navigation decisions. To solve this problem, we propose a long short-term memory-based (LSTM-based) real-time traffic prediction algorithm, TrafficPredict. Our approach uses an instance layer to learn instances’ movements and interactions and has a category layer to learn the similarities of instances belonging to the same type to refine the prediction. In order to evaluate its performance, we collected trajectory datasets in a large city consisting of varying conditions and traffic densities. The dataset includes many challenging scenarios where vehicles, bicycles, and pedestrians move among one another. We evaluate the performance of TrafficPredict on our new dataset and highlight its higher accuracy for trajectory prediction by comparing with prior prediction methods.

29

Albahli, Saleh. "A Deep Ensemble Learning Method for Effort-Aware Just-In-Time Defect Prediction." Future Internet 11, no. 12 (November 20, 2019): 246. http://dx.doi.org/10.3390/fi11120246.

Abstract:

Since the introduction of just-in-time effort aware defect prediction, many researchers are focusing on evaluating the different learning methods, which can predict the defect inducing changes in a software product. In order to predict these changes, it is important for a learning model to consider the nature of the dataset, its unbalancing properties and the correlation between different attributes. In this paper, we evaluated the importance of these properties for a specific dataset and proposed a novel methodology for learning the effort aware just-in-time prediction of defect inducing changes. Moreover, we devised an ensemble classifier, which fuses the output of three individual classifiers (Random forest, XGBoost, Multi-layer perceptron) to build an efficient state-of-the-art prediction model. The experimental analysis of the proposed methodology showed significant performance with 77% accuracy on the sample dataset and 81% accuracy on different datasets. Furthermore, we proposed a highly competent reinforcement learning technique to avoid false alarms in real time predictions.

30

Liang, Yun-Chia, Yona Maimury, Angela Hsiang-Ling Chen, and Josue Rodolfo Cuevas Juarez. "Machine Learning-Based Prediction of Air Quality." Applied Sciences 10, no. 24 (December 21, 2020): 9151. http://dx.doi.org/10.3390/app10249151.

Abstract:

Air, an essential natural resource, has been compromised in terms of quality by economic activities. Considerable research has been devoted to predicting instances of poor air quality, but most studies are limited by insufficient longitudinal data, making it difficult to account for seasonal and other factors. Several prediction models have been developed using an 11-year dataset collected by Taiwan’s Environmental Protection Administration (EPA). Machine learning methods, including adaptive boosting (AdaBoost), artificial neural network (ANN), random forest, stacking ensemble, and support vector machine (SVM), produce promising results for air quality index (AQI) level predictions. A series of experiments, using datasets for three different regions to obtain the best prediction performance from the stacking ensemble, AdaBoost, and random forest, found that the stacking ensemble delivers consistently superior performance for R² and RMSE, while AdaBoost provides the best results for MAE.
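The three regression metrics named in this abstract have standard textbook definitions, sketched below in plain Python. The function name and the sample values are illustrative assumptions; the paper's own evaluation code is not shown in the source.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R-squared, RMSE and MAE for paired observed and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        # R-squared is undefined when the observed values are all identical.
        "r2": 1.0 - ss_res / ss_tot if ss_tot else None,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }


# Made-up observations and predictions.
scores = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(scores)
```

RMSE penalizes large errors more heavily than MAE, which is one reason different models can rank first on different metrics.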

31

Carton, Quinten, Bart Merema, and Hilde Breesch. "Recommendations for model identification for MPC of an all-Air HVAC system." E3S Web of Conferences 246 (2021): 11006. http://dx.doi.org/10.1051/e3sconf/202124611006.

Abstract:

Rule-based control (RBC) strategies are often unable to execute the optimal control action, which leads to unnecessary energy consumption and suboptimal comfort. Model predictive control (MPC) is a dynamic control strategy for heating, ventilation and air-conditioning (HVAC) systems that is generally more capable of performing optimal control actions. The identification process of predictive models is an essential aspect of MPC. However, this model identification process remains time consuming due to the large variation in buildings and systems. The aim of this paper is to determine guidelines to identify predictive grey-box models more time-efficiently, thus enhancing the applicability of MPC. This paper focusses on a case study building equipped with an all-air HVAC system, which combines ventilation, heating and cooling, making both temperature and CO2-concentration key parameters to predict. The grey-box model represents an open zone in a landscaped office, making the influence of neighbouring zones an additional challenge. Different models for predicting the zone temperature and CO2-concentration are identified, evaluated and validated using CTSM-R. The following aspects are studied: the dataset size, the influence of neighbouring zones, the difference between winter and summer conditions, number of states and the prediction horizon. A three state RC-model with the implementation of the zone temperature of one neighbouring zone is preferred for predicting the indoor temperature with an acceptable prediction horizon of one day. However, this temperature model is not suitable during sunny periods. A simple model representing a mass balance obtains accurate predictions of the zone CO2-concentration for a timestep of 15 minutes. For both model types the utilization of 5-day datasets is favoured over 12-day datasets due to a shorter monitoring period, lower prediction error and an easier parameter convergence.
The usage of 12-day datasets is only preferred when an accurate estimation of the thermal inertia is pursued.

32

Ozsert Yıgıt, Gozde, Mehmet Fatih Akay, and Hacer Alak. "Development of New Hybrid Admission Decision Prediction Models Using Support Vector Machines Combined with Feature Selection." New Trends and Issues Proceedings on Humanities and Social Sciences 3, no. 3 (March 22, 2017): 1–10. http://dx.doi.org/10.18844/prosoc.v3i3.1502.

Abstract:

The purpose of this paper is to develop new hybrid admission decision prediction models by using Support Vector Machines (SVM) combined with a feature selection algorithm to investigate the effect of the predictor variables on the admission decision of a candidate to the School of Physical Education and Sports at Cukurova University. Experiments have been conducted on the dataset, which contains data of participants who applied to the School in 2006. The dataset has been randomly split into training and test sets using 10-fold cross validation as well as different percentage ratios. The performance of the prediction models for the datasets has been assessed using classification accuracy, specificity, sensitivity, positive predictive value (PPV) and negative predictive value (NPV). The results show that a decrease in the number of predictor variables in the prediction models usually leads to a parallel decrease in classification accuracy. Keywords: machine learning; prediction; physical ability test; feature selection
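The five evaluation measures listed in this abstract all derive from the four cells of a binary confusion matrix. A minimal sketch using the standard definitions follows; the function name and the counts are made up for illustration and do not come from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity, PPV and NPV from counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. A metric is None when its denominator is zero.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
    }


# Made-up counts for 100 hypothetical candidates.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
print(metrics)
```

Reporting all five together is useful because accuracy alone can hide poor performance on the rarer class.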

33

Kneebone, D. G., and G. McL Dryden. "Prediction of diet quality for sheep from faecal characteristics: comparison of near-infrared spectroscopy and conventional chemistry predictive models." Animal Production Science 55, no. 1 (2015): 1. http://dx.doi.org/10.1071/an13252.

Abstract:

This study evaluated the ability of equations developed from the analysis of faecal material by conventional chemistry (F.CHEM), and by near-infrared spectroscopy (F.NIRS), to predict intake and digestibility of forages fed with or without supplements. In vivo datasets were obtained using 30 sheep and 25 diets to provide 124 diet–faecal pairs, with each sheep fed four or five of the diets. The diets were five forages fed alone or with urea, molasses, cottonseed meal or sorghum grain supplements. Ninety-nine diet–faecal pairs were selected at random, but ensuring that all diets were represented and both the F.CHEM and F.NIRS prediction equations were developed from this dataset. The remaining 25 diet–faecal pairs were used as a validation dataset. Regressions for F.CHEM were developed by stepwise regression, and F.NIRS prediction equations were developed by partial least-squares regression. Prediction equations based solely on faecal analyte concentrations (F.CHEMc) had poor predictive ability, and models incorporating faecal constituent excretion rates (F.CHEMe) were the best at predicting feed constituent intakes. These models had slightly lower standard errors of prediction (SEP) for organic matter (OM) intake and digestible OM intake compared with the F.NIRS models that did not include faecal excretion rates. However, F.NIRS models had lower SEP for protein intake and OM digestibility. Good agreement between the F.CHEMe and F.NIRS methods was evident (according to the 95% limits-of-agreement test), and both predicted the reference values precisely and with small bias. Equations derived from a dataset that included representatives of all diets used in the experiment gave much better prediction of diet characteristics than those developed from a dataset constructed entirely at random. 
Equations for F.NIRS developed in this way successfully predicted the characteristics of diets that included forages fed alone and with the type of supplements used in tropical Australia.

34

Mao, Yiwen, and Asgeir Sorteberg. "Improving Radar-Based Precipitation Nowcasts with Machine Learning Using an Approach Based on Random Forest." Weather and Forecasting 35, no. 6 (December 2020): 2461–78. http://dx.doi.org/10.1175/waf-d-20-0080.1.

Abstract:

A binary classification model is trained by random forest using data from 41 stations in Norway to predict the precipitation in a given hour. The predictors consist of results from radar nowcasts and numerical weather predictions. The results demonstrate that the random forest model can improve the precipitation predictions by the radar nowcasts and the numerical weather predictions. This study clarifies whether certain potential factors related to model training can influence the predictive skill of the random forest method. The results indicate that enforcing a balanced prediction by resampling the training datasets or lowering the threshold probability for classification cannot improve the predictive skill of the random forest model. The study reveals that the predictive skill of the random forest model shows seasonality, but is only weakly influenced by the geographic diversity of the training dataset. Finally, the study shows that the most important predictor is the precipitation predictions by the radar nowcasts followed by the precipitation predictions by the numerical weather predictions. Although meteorological variables other than precipitation are weaker predictors, the results suggest that they can help to reduce the false alarm ratio and to increase the success ratio of the precipitation prediction.

APA, Harvard, Vancouver, ISO, and other styles

35

Qian, Tingyu. "Used Car Price Prediction by Using XGBoost." BCP Business & Management 44 (April 27, 2023): 62–68. http://dx.doi.org/10.54691/bcpbm.v44i.4794.

Full text

Abstract:

This article demonstrates that by using methods such as Extreme Gradient Boosting (XGBoost), dummy variables, etc., the selling price can be accurately predicted according to the different conditions and variables of each used car. The used car dataset is divided into a training dataset and a test dataset according to the ratio of 83% and 17%. This article uses a total of three data processing methods to find the most accurate prediction method. The first is to remove the outliers of the training dataset and test dataset, and then directly use the xgboost prediction method for prediction. The second is to remove the outliers and remove the variable power that is most closely related to the price of the used car, and then use the xgboost prediction method to make predictions. The third method is to remove outliers and then normalize the training dataset and test dataset, finally using the xgboost prediction method to predict. The experimental results show that normalizing the dataset and then using XGBoost and dummy variables can be used to predict the selling price accurately and efficiently through the different usage conditions of each used car.

APA, Harvard, Vancouver, ISO, and other styles

36

Asif, Daniyal, Mairaj Bibi, Muhammad Shoaib Arif, and Aiman Mukheimer. "Enhancing Heart Disease Prediction through Ensemble Learning Techniques with Hyperparameter Optimization." Algorithms 16, no. 6 (June 20, 2023): 308. http://dx.doi.org/10.3390/a16060308.

Full text

Abstract:

Heart disease is a significant global health issue, contributing to high morbidity and mortality rates. Early and accurate heart disease prediction is crucial for effectively preventing and managing the condition. However, this remains a challenging task to achieve. This study proposes a machine learning model that leverages various preprocessing steps, hyperparameter optimization techniques, and ensemble learning algorithms to predict heart disease. To evaluate the performance of our model, we merged three datasets from Kaggle that have similar features, creating a comprehensive dataset for analysis. By employing the extra tree classifier, normalizing the data, utilizing grid search cross-validation (CV) for hyperparameter optimization, and splitting the dataset with an 80:20 ratio for training and testing, our proposed approach achieved an impressive accuracy of 98.15%. These findings demonstrated the potential of our model for accurately predicting the presence or absence of heart disease. Such accurate predictions could significantly aid in early prevention, detection, and treatment, ultimately reducing the mortality and morbidity associated with heart disease.

APA, Harvard, Vancouver, ISO, and other styles

37

Son, Hye Min, See Hyung Kim, Bo Ra Kwon, Mi Jeong Kim, Chan Sun Kim, and Seung Hyun Cho. "Preoperative prediction of suboptimal resection in advanced ovarian cancer based on clinical and CT parameters." Acta Radiologica 58, no. 4 (July 22, 2016): 498–504. http://dx.doi.org/10.1177/0284185116658683.

Full text

Abstract:

Background. Cytoreduction is important as a survival predictor in advanced ovarian cancer. Purpose. To determine the prediction of suboptimal resection (SOR) in advanced ovarian cancer based on clinical and computed tomography (CT) parameters. Material and Methods. Between 2007 and 2015, 327 consecutive patients with FIGO stage III–IV ovarian cancer and preoperative CT were included. During 2007–2012, patients were assigned to a derivation dataset (n = 220) and the others were assigned to a validation dataset (n = 107). Clinical parameters were reviewed and two radiologists assessed the presence or absence of tabulated parameters on CT images. Logistic regression analyses based on area under the receiver-operating characteristic curve (AUROC) were performed to identify variables predicting SOR, and a simple score was generated using a Cox proportional hazards model. Results. There was no statistical difference in patients’ characteristics in both datasets, except for residual disease (P = 0.001). Optimal resection improved from 45.0% (99/220) in the derivation dataset to 64.4% (69/107) in the validation dataset. Logistic regression identified that Eastern Cooperative Oncology Group-performance status (ECOG-PS 2), involvements of peritoneum, diaphragm, bowel mesentery and suprarenal lymph nodes, and pleural effusion were independent variables of SOR. Overall AUROC for score predicting SOR was 0.761 with sensitivity, specificity, and positive and negative predictive values of 70.6%, 73.2%, 68.7%, and 91.9%, respectively. In the derivation dataset, AUROC was 0.792, with sensitivity of 71.4% and specificity of 74.3%, and AUROC of 0.758 with sensitivity of 69.2% and specificity of 72.8% in the validation dataset. Conclusion. CT may be a useful preoperative predictor of SOR in advanced ovarian cancer.
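
Editorial illustration (not taken from the article): sensitivity, specificity and the positive and negative predictive values quoted above are all ratios of cells in a 2×2 confusion matrix. The Python sketch below computes them from hypothetical counts; the counts are invented for illustration and are not the study's data.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return sensitivity, specificity, PPV and NPV from confusion-matrix counts.

    tp/fn: cases with the condition, predicted positive/negative.
    fp/tn: cases without the condition, predicted positive/negative.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
        "ppv": tp / (tp + fp),          # share of positive calls that are right
        "npv": tn / (tn + fn),          # share of negative calls that are right
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = diagnostic_metrics(tp=40, fn=10, fp=20, tn=30)
```

Sensitivity and specificity depend only on the classifier's behaviour within each true class, whereas the predictive values also depend on how common the condition is in the dataset, which is one reason they can differ between a derivation and a validation cohort.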

APA, Harvard, Vancouver, ISO, and other styles

38

Mostofi, Fatemeh, Vedat Toğan, and Hasan Basri Başağa. "Real-estate price prediction with deep neural network and principal component analysis." Organization, Technology and Management in Construction: an International Journal 14, no. 1 (January 1, 2022): 2741–59. http://dx.doi.org/10.2478/otmcj-2022-0016.

Full text

Abstract:

Despite the wide application of deep neural networks (DNN) models, their application over small-sized real-estate price prediction is limited due to the reduced prediction accuracy and the high-dimensionality of the dataset. This study motivates small-sized real-estate agencies to take DNN-driven decisions using the available local dataset. To improve the high-dimensionality of real-estate price datasets and thus enhance the price-prediction accuracy of a DNN model, this paper adopts principal component analysis (PCA). The PCA benefits in improving the prediction accuracy of a DNN model are threefold: dimensionality reduction, dataset transformation and localisation of influential price features. The results indicate that, through the PCA-DNN model, the transformed dataset achieves higher accuracy (90%–95%) and better generalisation ability compared with other benchmark price predictors. The spatial and building age proved to have the most impact in determining the overall real-estate price. The application of PCA not only reduces the high-dimensionality of the dataset but also enhances the quality of the encoded feature attributes. The model is beneficial in real-estate and construction applications, where the absence of medium and big datasets decreases the price-prediction accuracy.

APA, Harvard, Vancouver, ISO, and other styles

39

Fjodorova, Natalja, and Marjana Novič. "Rodent Carcinogenicity Dataset." Dataset Papers in Medicine 2013 (January 17, 2013): 1–6. http://dx.doi.org/10.1155/2013/361615.

Full text

Abstract:

The rodent carcinogenicity dataset was compiled from the Carcinogenic Potency Database (CPDBAS) and was applied for the classification of quantitative structure-activity relationship (QSAR) models for the prediction of carcinogenicity based on the counter-propagation artificial neural network (CP ANN) algorithm. The models were developed within EU-funded project CAESAR for regulatory use. The dataset contains the following information: common information about chemicals (ID, chemical name, and their CASRN), molecular structure information (SDF files and SMILES), and carcinogenic (toxicological) properties information: carcinogenic potency (TD50_Rat_mg; carcinogen/noncarcinogen) and structural alert (SA) for carcinogenicity based on mechanistic data. Molecular structure information was used to get chemometrics information to calculate molecular descriptors (254 MDL and 784 Dragon descriptors), which were further used in predictive QSAR modeling. The dataset presented in the paper can be used in future research in oncology, ecology, or chemicals' risk assessment.

APA, Harvard, Vancouver, ISO, and other styles

40

Alshayeb, Mohammad, and Mashaan A. Alshammari. "The Effect of the Dataset Size on the Accuracy of Software Defect Prediction Models: An Empirical Study." Inteligencia Artificial 24, no. 68 (October 26, 2021): 72–88. http://dx.doi.org/10.4114/intartif.vol24iss68pp72-88.

Full text

Abstract:

The ongoing development of computer systems requires massive software projects. Running the components of these huge projects for testing purposes might be a costly process; therefore, parameter estimation can be used instead. Software defect prediction models are crucial for software quality assurance. This study investigates the impact of dataset size and feature selection algorithms on software defect prediction models. We use two approaches to build software defect prediction models: a statistical approach and a machine learning approach with support vector machines (SVMs). The fault prediction model was built based on four datasets of different sizes. Additionally, four feature selection algorithms were used. We found that applying the SVM defect prediction model on datasets with a reduced number of measures as features may enhance the accuracy of the fault prediction model. Also, it directs the test effort to maintain the most influential set of metrics. We also found that the running time of the SVM fault prediction model is not consistent with dataset size. Therefore, having fewer metrics does not guarantee a shorter execution time. From the experiments, we found that dataset size has a direct influence on the SVM fault prediction model. However, reduced datasets performed the same or slightly lower than the original datasets.

APA, Harvard, Vancouver, ISO, and other styles

41

Fernandes, Pedro Henrique Evangelista, Giovanni Corsetti Silva, Diogo Berta Pitz, Matteo Schnelle, Katharina Koschek, Christof Nagel, and Vinicius Carrillo Beber. "Data-Driven, Physics-Based, or Both: Fatigue Prediction of Structural Adhesive Joints by Artificial Intelligence." Applied Mechanics 4, no. 1 (March 8, 2023): 334–55. http://dx.doi.org/10.3390/applmech4010019.

Full text

Abstract:

Here, a comparative investigation of data-driven, physics-based, and hybrid models for the fatigue lifetime prediction of structural adhesive joints in terms of complexity of implementation, sensitivity to data size, and prediction accuracy is presented. Four data-driven models (DDM) are constructed using extremely randomized trees (ERT), eXtreme gradient boosting (XGB), LightGBM (LGBM) and histogram-based gradient boosting (HGB). The physics-based model (PBM) relies on the Findley’s critical plane approach. Two hybrid models (HM) were developed by combining data-driven and physics-based approaches obtained from invariant stresses (HM-I) and Findley’s stress (HM-F). A fatigue dataset of 979 data points of four structural adhesives is employed. To assess the sensitivity to data size, the dataset is split into three train/test ratios, namely 70%/30%, 50%/50%, and 30%/70%. Results revealed that DDMs are more accurate, but more sensitive to dataset size compared to the PBM. Among different regressors, the LGBM presented the best performance in terms of accuracy and generalization power. HMs increased the accuracy of predictions, whilst reducing the sensitivity to data size. The HM-I demonstrated that datasets from different sources can be utilized to improve predictions (especially with small datasets). Finally, the HM-I showed the highest accuracy with an improved sensitivity to data size.

APA, Harvard, Vancouver, ISO, and other styles

42

Chen, Hao, Taoyun Ji, Xiang Zhan, Xiaoxin Liu, Guojing Yu, Wen Wang, Yuwu Jiang, and Xiao-Hua Zhou. "An Explainable Statistical Method for Seizure Prediction Using Brain Functional Connectivity from EEG." Computational Intelligence and Neuroscience 2022 (December 8, 2022): 1–8. http://dx.doi.org/10.1155/2022/2183562.

Full text

Abstract:

Background. Epilepsy is a group of chronic neurological disorders characterized by recurrent and abrupt seizures. The accurate prediction of seizures can reduce the burdens of this disorder. Now, existing studies use brain network features to classify patients’ preictal or interictal states, enabling seizure prediction. However, most predicting methods are based on deep learning techniques, which have weak interpretability and high computational complexity. To address these issues, in this study, we proposed a novel two-stage statistical method that is interpretable and easy to compute. Methods. We used two datasets to evaluate the performance of the proposed method, including the well-known public dataset CHB-MIT. In the first stage, we estimated the dynamic brain functional connectivity network for each epoch. Then, in the second stage, we used the derived network predictor for seizure prediction. Results. We illustrated the results of our method in seizure prediction in two datasets separately. For the FH-PKU dataset, our approach achieved an AUC value of 0.963, a prediction sensitivity of 93.1%, and a false discovery rate of 7.7%. For the CHB-MIT dataset, our approach achieved an AUC value of 0.940, a prediction sensitivity of 93.0%, and a false discovery rate of 11.1%, outperforming existing state-of-the-art methods. Significance. This study proposed an explainable statistical method, which can estimate the brain network using the scalp EEG method and use the network predictor to predict epileptic seizures. Availability and Implementation. R Source code is available at https://github.com/HaoChen1994/Seizure-Prediction.

APA, Harvard, Vancouver, ISO, and other styles

43

Gan, Shengfeng, Mohammed Alshahrani, and Shichao Liu. "Positive-Unlabeled Learning for Network Link Prediction." Mathematics 10, no. 18 (September 15, 2022): 3345. http://dx.doi.org/10.3390/math10183345.

Full text

Abstract:

Link prediction is an important problem in network data mining, which is dedicated to predicting the potential relationship between nodes in the network. Normally, network link prediction based on supervised classification will be trained on a dataset consisting of a set of positive samples and a set of negative samples. However, well-labeled training datasets with positive and negative annotations are always inadequate in real-world scenarios, and the datasets contain a large number of unlabeled samples that may hinder the performance of the model. To address this problem, we propose a positive-unlabeled learning framework with network representation for network link prediction only using positive samples and unlabeled samples. We first learn representation vectors of nodes using a network representation method. Next, we concatenate representation vectors of node pairs and then feed them into different classifiers to predict whether the link exists or not. To alleviate data imbalance and enhance the prediction precision, we adopt three types of positive-unlabeled (PU) learning strategies to improve the prediction performance using traditional classifier estimation, bagging strategy and reliable negative sampling. We conduct experiments on three datasets to compare different PU learning methods and discuss their influence on the prediction results. The experimental results demonstrate that PU learning has a positive impact on predictive performances and the promotion effects vary with different network structures.

APA, Harvard, Vancouver, ISO, and other styles

44

Gubbi, Jayavardhana, Daniel T. H. Lai, Marimuthu Palaniswami, and Michael Parker. "Protein Secondary Structure Prediction Using Support Vector Machines and a New Feature Representation." International Journal of Computational Intelligence and Applications 06, no. 04 (December 2006): 551–67. http://dx.doi.org/10.1142/s1469026806002076.

Full text

Abstract:

Knowledge of the secondary structure and solvent accessibility of a protein plays a vital role in the prediction of fold, and eventually the tertiary structure of the protein. A challenging issue of predicting protein secondary structure from sequence alone is addressed. Support vector machines (SVM) are employed for the classification and the SVM outputs are converted to posterior probabilities for multi-class classification. The effect of using Chou–Fasman parameters and physico-chemical parameters along with evolutionary information in the form of position specific scoring matrix (PSSM) is analyzed. These proposed methods are tested on the RS126 and CB513 datasets. A new dataset is curated (PSS504) using recent release of CATH. On the CB513 dataset, sevenfold cross-validation accuracy of 77.9% was obtained using the proposed encoding method. A new method of calculating the reliability index based on the number of votes and the Support Vector Machine decision value is also proposed. A blind test on the EVA dataset gives an average Q3 accuracy of 74.5% and ranks in top five protein structure prediction methods. Supplementary material including datasets are available on .
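
Editorial illustration (not taken from the article): the Q3 accuracy mentioned above is conventionally the percentage of residues whose predicted three-state label (helix H, strand E, coil C) matches the observed label. A minimal Python sketch under that assumption follows; the two label strings are invented examples.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues with a correctly predicted three-state label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("at least one residue is required")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Invented 8-residue example: H = helix, E = strand, C = coil.
observed_states = "HHHEECCC"
predicted_states = "HHEEECCH"
q3 = q3_accuracy(predicted_states, observed_states)
```

Here six of the eight residues agree, so the sketch yields a Q3 of 75.0 for this toy pair of strings.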

APA, Harvard, Vancouver, ISO, and other styles

45

Lertampaiporn, Supatcha, Sirapop Nuannimnoi, Tayvich Vorapreeda, Nipa Chokesajjawatee, Wonnop Visessanguan, and Chinae Thammarongtham. "PSO-LocBact: A Consensus Method for Optimizing Multiple Classifier Results for Predicting the Subcellular Localization of Bacterial Proteins." BioMed Research International 2019 (November 19, 2019): 1–11. http://dx.doi.org/10.1155/2019/5617153.

Full text

Abstract:

Several computational approaches for predicting subcellular localization have been developed and proposed. These approaches provide diverse performance because of their different combinations of protein features, training datasets, training strategies, and computational machine learning algorithms. In some cases, these tools may yield inconsistent and conflicting prediction results. It is important to consider such conflicting or contradictory predictions from multiple prediction programs during protein annotation, especially in the case of a multiclass classification problem such as subcellular localization. Hence, to address this issue, this work proposes the use of the particle swarm optimization (PSO) algorithm to combine the prediction outputs from multiple different subcellular localization predictors with the aim of integrating diverse prediction models to enhance the final predictions. Herein, we present PSO-LocBact, a consensus classifier based on PSO that can be used to combine the strengths of several preexisting protein localization predictors specially designed for bacteria. Our experimental results indicate that the proposed method can resolve inconsistency problems in subcellular localization prediction for both Gram-negative and Gram-positive bacterial proteins. The average accuracy achieved on each test dataset is over 98%, higher than that achieved with any individual predictor.

APA, Harvard, Vancouver, ISO, and other styles

46

Wang, Xiao, Yinping Jin, and Qiuwen Zhang. "DeepPred-SubMito: A Novel Submitochondrial Localization Predictor Based on Multi-Channel Convolutional Neural Network and Dataset Balancing Treatment." International Journal of Molecular Sciences 21, no. 16 (August 9, 2020): 5710. http://dx.doi.org/10.3390/ijms21165710.

Full text

Abstract:

Mitochondrial proteins are physiologically active in different compartments, and their abnormal location will trigger the pathogenesis of human mitochondrial pathologies. Correctly identifying submitochondrial locations can provide information for disease pathogenesis and drug design. A mitochondrion has four submitochondrial compartments, the matrix, the outer membrane, the inner membrane, and the intermembrane space, but various existing studies ignored the intermembrane space. The majority of researchers used traditional machine learning methods for predicting mitochondrial protein localization. Those predictors required expert-level knowledge of biology to be encoded as features rather than allowing the underlying predictor to extract features through a data-driven procedure. Besides, few researchers have considered the imbalance in datasets. In this paper, we propose a novel end-to-end predictor employing deep neural networks, DeepPred-SubMito, for protein submitochondrial location prediction. First, we utilize random over-sampling to decrease the influence caused by unbalanced datasets. Next, we train a multi-channel bilayer convolutional neural network for multiple subsequences to learn high-level features. Third, the prediction result is outputted through the fully connected layer. The performance of the predictor is measured by 10-fold cross-validation and 5-fold cross-validation on the SM424-18 dataset and the SubMitoPred dataset, respectively. Experimental results show that the predictor outperforms state-of-the-art predictors. In addition, the prediction of results in the M983 dataset also confirmed its effectiveness in predicting submitochondrial locations.
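
Editorial illustration (not taken from the article): random over-sampling, as named in the abstract, balances classes by duplicating randomly chosen minority-class samples until every class matches the largest one. The Python sketch below shows the general idea with hypothetical labels; it is not the authors' implementation, and the function name `random_oversample` is an assumption.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    target = max(len(members) for members in by_label.values())
    out_samples, out_labels = [], []
    for label, members in by_label.items():
        # Draw with replacement to fill the gap up to the majority-class size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for sample in members + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical imbalanced toy set with four compartment labels.
toy_labels = ["matrix"] * 6 + ["inner"] * 3 + ["outer"] * 2 + ["intermembrane"]
toy_samples = [f"seq_{i}" for i in range(len(toy_labels))]
balanced_samples, balanced_labels = random_oversample(toy_samples, toy_labels)
class_counts = Counter(balanced_labels)
```

When over-sampling is combined with cross-validation, it is generally applied to the training folds only, so that duplicated samples do not leak into the fold used for evaluation.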

APA, Harvard, Vancouver, ISO, and other styles

47

Ann Romalt, A., and Mathusoothana S. Kumar. "A Novel Machine Learning Based Probabilistic Classification Model for Heart Disease Prediction." Journal of Medical Imaging and Health Informatics 12, no. 3 (March 1, 2022): 221–29. http://dx.doi.org/10.1166/jmihi.2022.3940.

Full text

Abstract:

Cardiovascular disease (CVD) is a most dreadful disease that results in fatal threats such as heart attacks. Accurate disease prediction is essential, and machine-learning techniques play a major part in predicting its occurrence. In this paper, a novel machine-learning-based model for accurate prediction of cardiovascular disease is developed that applies a unique feature selection technique called Chronic Fatigue Syndrome Best Known Method (CFSBKM). Each feature is ranked based on its feature importance score. The new learning model eliminates the most irrelevant and low-importance features from the datasets, thereby resulting in a robust heart disease risk prediction model. The multinomial Naive Bayes classifier is used for the classification. The performance of the CFSBKM model is evaluated using the benchmark Cleveland dataset from the UCI repository, and the proposed models outperform the existing techniques.

APA, Harvard, Vancouver, ISO, and other styles

48

Zareapoor, Masoumeh, and Pourya Shamsolmoali. "Boosting prediction performance on imbalanced dataset." International Journal of Information and Communication Technology 13, no. 2 (2018): 186. http://dx.doi.org/10.1504/ijict.2018.090556.

Full text

APA, Harvard, Vancouver, ISO, and other styles

49

Zareapoor, Masoumeh, and Pourya Shamsolmoali. "Boosting prediction performance on imbalanced dataset." International Journal of Information and Communication Technology 13, no. 2 (2018): 186. http://dx.doi.org/10.1504/ijict.2018.10011701.

Full text

APA, Harvard, Vancouver, ISO, and other styles

50

Wang, Liang, Zhiwen Yu, Bin Guo, Tao Ku, and Fei Yi. "Moving Destination Prediction Using Sparse Dataset." ACM Transactions on Knowledge Discovery from Data 11, no. 3 (April 14, 2017): 1–33. http://dx.doi.org/10.1145/3051128.

Full text

APA, Harvard, Vancouver, ISO, and other styles
