
Journal articles on the topic 'Pruning random forest'


Consult the top 50 journal articles for your research on the topic 'Pruning random forest.'


1

Yang, Fan, Wei-hang Lu, Lin-kai Luo, and Tao Li. "Margin optimization based pruning for random forest." Neurocomputing 94 (October 2012): 54–63. http://dx.doi.org/10.1016/j.neucom.2012.04.007.

2

Tarchoune, Ilhem, Akila Djebbar, and Hayet Farida Merouani. "Improving Random Forest with Pre-pruning technique for Binary classification." All Sciences Abstracts 1, no. 2 (July 25, 2023): 11. http://dx.doi.org/10.59287/as-abstracts.1202.

Abstract:
Random Forest (RF) is a popular machine learning algorithm. It is based on the concept of ensemble learning, which is a process of combining several classifiers to solve a complex problem and improve model performance. The random forest extends the notion of decision trees (DT) in order to build more stable models. In this work we propose to further improve the predictions of the trees in the forest by a pre-pruning technique, which aims to optimize the performance of the nodes and to minimize the size of the trees. Two experiments are performed to evaluate the performance of the proposed method: in the first experiment we applied the Classical Random Forest algorithm (CRF) with several different trees, while in the second a pre-pruning technique is applied to the trees in order to define the optimal size of the forest. Finally, we compared the results obtained. The main objective is to produce accurate decision trees with high precision. The effectiveness of the proposed method is validated on five medical databases; the reported prediction precision is 83%, 94%, 95%, 97%, and 81% for the Diabetes, Hepatitis, SaHeart, EEG-Eye-State, and Prostate-cancer databases, respectively. The performance results confirm that the proposed method performs better than the classical random forest algorithm.
3

Fawagreh, Khaled, and Mohamed Medhat Gaber. "eGAP: An Evolutionary Game Theoretic Approach to Random Forest Pruning." Big Data and Cognitive Computing 4, no. 4 (November 28, 2020): 37. http://dx.doi.org/10.3390/bdcc4040037.

Abstract:
To make healthcare available and easily accessible, the Internet of Things (IoT), which paved the way to the construction of smart cities, marked the birth of many smart applications in numerous areas, including healthcare. As a result, smart healthcare applications have been and are being developed to provide, using mobile and electronic technology, higher diagnosis quality of the diseases, better treatment of the patients, and improved quality of lives. Since smart healthcare applications that are mainly concerned with the prediction of healthcare data (like diseases for example) rely on predictive healthcare data analytics, it is imperative for such predictive healthcare data analytics to be as accurate as possible. In this paper, we will exploit supervised machine learning methods in classification and regression to improve the performance of the traditional Random Forest on healthcare datasets, both in terms of accuracy and classification/regression speed, in order to produce an effective and efficient smart healthcare application, which we have termed eGAP. eGAP uses the evolutionary game theoretic approach replicator dynamics to evolve a Random Forest ensemble. Trees of high resemblance in an initial Random Forest are clustered, and then clusters grow and shrink by adding and removing trees using replicator dynamics, according to the predictive accuracy of each subforest represented by a cluster of trees. All clusters have an initial number of trees that is equal to the number of trees in the smallest cluster. Cluster growth is performed using trees that are not initially sampled. The speed and accuracy of the proposed method have been demonstrated by an experimental study on 10 classification and 10 regression medical datasets.
4

El Habib Daho, Mostafa, Nesma Settouti, Mohammed El Amine Bechar, Amina Boublenza, and Mohammed Amine Chikh. "A new correlation-based approach for ensemble selection in random forests." International Journal of Intelligent Computing and Cybernetics 14, no. 2 (March 23, 2021): 251–68. http://dx.doi.org/10.1108/ijicc-10-2020-0147.

Abstract:
Purpose: Ensemble methods have been widely used in the field of pattern recognition due to the difficulty of finding a single classifier that performs well on a wide variety of problems. Despite the effectiveness of these techniques, studies have shown that ensemble methods generate a large number of hypotheses, which in most cases include redundant classifiers. Several state-of-the-art works attempt to reduce the set of hypotheses without affecting performance. Design/methodology/approach: In this work, the authors propose a pruning method that takes into consideration the correlation between classifiers and classes, and between each classifier and the rest of the set. The authors used the random forest algorithm as the tree-based ensemble classifier, and the pruning was performed by a technique inspired by the CFS (correlation feature selection) algorithm. Findings: The proposed method, CES (correlation-based Ensemble Selection), was evaluated on ten datasets from the UCI machine learning repository, and its performance was compared to six ensemble pruning techniques. The results showed that the proposed pruning method selects a small ensemble in a smaller amount of time while improving classification rates compared to the state-of-the-art methods. Originality/value: CES is a new ordering-based method that uses the CFS algorithm. CES selects, in a short time, a small sub-ensemble that outperforms the results obtained from the whole forest and from the other state-of-the-art techniques used in this study.
5

Gefeller, Olaf, Asma Gul, Folkert Horn, Zardad Khan, Berthold Lausen, and Werner Adler. "Ensemble Pruning for Glaucoma Detection in an Unbalanced Data Set." Methods of Information in Medicine 55, no. 06 (2016): 557–63. http://dx.doi.org/10.3414/me16-01-0055.

Abstract:
Background: Random forests are successful classifier ensemble methods consisting of typically 100 to 1000 classification trees. Ensemble pruning techniques reduce the computational cost, especially the memory demand, of random forests by reducing the number of trees without relevant loss of performance or even with increased performance of the sub-ensemble. The application to the problem of an early detection of glaucoma, a severe eye disease with low prevalence, based on topographical measurements of the eye background faces specific challenges. Objectives: We examine the performance of ensemble pruning strategies for glaucoma detection in an unbalanced data situation. Methods: The data set consists of 102 topographical features of the eye background of 254 healthy controls and 55 glaucoma patients. We compare the area under the receiver operating characteristic curve (AUC), and the Brier score on the total data set, in the majority class, and in the minority class of pruned random forest ensembles obtained with strategies based on the prediction accuracy of greedily grown sub-ensembles, the uncertainty weighted accuracy, and the similarity between single trees. To validate the findings and to examine the influence of the prevalence of glaucoma in the data set, we additionally perform a simulation study with lower prevalences of glaucoma. Results: In glaucoma classification all three pruning strategies lead to improved AUC and smaller Brier scores on the total data set with sub-ensembles as small as 30 to 80 trees compared to the classification results obtained with the full ensemble consisting of 1000 trees. In the simulation study, we were able to show that the prevalence of glaucoma is a critical factor and lower prevalence decreases the performance of our pruning strategies. Conclusions: The memory demand for glaucoma classification in an unbalanced data situation based on random forests could effectively be reduced by the application of pruning strategies without loss of performance in a population with increased risk of glaucoma.
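One of the strategies named in this abstract, pruning by the prediction accuracy of greedily grown sub-ensembles, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tie-breaking rules, and the toy prediction matrix are assumptions made for illustration, and plain accuracy on a validation set stands in for the AUC and Brier score used in the paper.

```python
from typing import List, Sequence, Tuple


def majority_vote(votes: Sequence[int]) -> int:
    """Most frequent label; ties go to the smallest label (an assumption)."""
    counts = {}
    for v in votes:
        counts[v] = counts.get(v, 0) + 1
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)


def ensemble_accuracy(tree_preds: Sequence[Sequence[int]],
                      members: Sequence[int],
                      y_true: Sequence[int]) -> float:
    """Accuracy of the majority vote of the trees listed in `members`."""
    correct = 0
    for i, y in enumerate(y_true):
        if majority_vote([tree_preds[t][i] for t in members]) == y:
            correct += 1
    return correct / len(y_true)


def greedy_prune(tree_preds: Sequence[Sequence[int]],
                 y_true: Sequence[int],
                 max_size: int) -> Tuple[List[int], float]:
    """Grow a sub-ensemble one tree at a time, always adding the tree that
    maximises validation accuracy, and return the best prefix seen."""
    selected: List[int] = []
    remaining = list(range(len(tree_preds)))
    best_subset: List[int] = []
    best_acc = -1.0
    while remaining and len(selected) < max_size:
        # Score every candidate; ties are broken in favour of the lower index.
        acc, neg_t = max(
            (ensemble_accuracy(tree_preds, selected + [t], y_true), -t)
            for t in remaining
        )
        t = -neg_t
        selected.append(t)
        remaining.remove(t)
        if acc > best_acc:
            best_acc, best_subset = acc, list(selected)
    return best_subset, best_acc


# Hypothetical validation labels and per-tree predictions (rows are trees).
Y_TRUE = [0, 1, 1, 0, 1, 0]
TREE_PREDS = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 1],
]

PRUNED, PRUNED_ACC = greedy_prune(TREE_PREDS, Y_TRUE, max_size=3)
FULL_ACC = ensemble_accuracy(TREE_PREDS, range(len(TREE_PREDS)), Y_TRUE)
```

On this toy matrix the three-tree sub-ensemble classifies every validation example correctly while the full four-tree vote does not, which mirrors the qualitative claim that a pruned forest can match or exceed the full ensemble. In practice the selection should be scored on held-out or out-of-bag data rather than on the data used for the final evaluation.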
6

Zhu, Wancai, Zhaogang Liu, Weiwei Jia, and Dandan Li. "Modelling the Tree Height, Crown Base Height, and Effective Crown Height of Pinus koraiensis Plantations Based on Knot Analysis." Forests 12, no. 12 (December 15, 2021): 1778. http://dx.doi.org/10.3390/f12121778.

Abstract:
Taking 1735 Pinus koraiensis knots in Mengjiagang Forest Farm plantations in Jiamusi City, Heilongjiang Province as the research object, a dynamic tree height, effective crown height, and crown base height growth model was developed using 349 screened knots. The Richards equation was selected as the basic model to develop a crown base height and effective crown height nonlinear mixed-effects model considering random tree-level effects. Model parameters were estimated with the nonlinear mixed-effects model (NLMIXED) module of the Statistical Analysis System (SAS). The Akaike information criterion (AIC), Bayesian information criterion (BIC), −2 log-likelihood (−2LL), adjusted coefficient of determination (Ra2), root mean square error (RMSE), and residual sum of squares (RSS) values were used for optimal model selection and performance evaluation. When tested with independent sample data, the mixed-effects model that considered tree effects outperformed the traditional model in goodness of fit and validation; the two-parameter mixed-effects model outperformed the one-parameter model. Pinus koraiensis pruning times and intensities were calculated using the developed model. The difference between the effective crown and crown base heights was 1.01 m at the 15th year; thus, artificial pruning could occur. Initial pruning was performed with a 1.01 m intensity in the 15th year. Five prunings were required throughout the young forest period; the average pruning intensity was 1.46 m. The pruning interval did not differ extensively in the half-mature forest period, while the intensity decreased significantly. The final pruning intensity was only 0.34 m.
7

Paudel, Nawaraj, and Jagdish Bhatta. "Mushroom Classification using Random Forest and REP Tree Classifiers." Nepal Journal of Mathematical Sciences 3, no. 1 (August 31, 2022): 111–16. http://dx.doi.org/10.3126/njmathsci.v3i1.44130.

Abstract:
The mushroom is the popular fruiting body of a much larger fungus; it is high in protein and a rich source of vitamin B. It aids in the prevention of cancer, weight loss, and immune system enhancement. There are many thousands of mushroom species in the world; some are edible and some are poisonous because of the significant toxins they contain. Hence, it is a vital task to distinguish between edible and poisonous mushrooms. This paper focuses on comparing the performance of the Random Forest and Reduced Error Pruning (REP) Tree classification algorithms for the classification of edible and poisonous mushrooms. In this paper, the mushroom dataset from the UCI machine learning repository has been classified using Random Forest and REP Tree classifiers. The results based on accuracy, precision, recall, and F-measure showed that Random Forest outperformed the REP Tree algorithm, with the highest accuracy value of 100%, precision value of 100%, recall value of 100%, and F-measure value of 100%. Random Forest thus reached 100% on every measure and performed better than the REP Tree classifier.
8

Yadav, Dhyan Chandra, and Saurabh Pal. "Analysis of Heart Disease Using Parallel and Sequential Ensemble Methods With Feature Selection Techniques." International Journal of Big Data and Analytics in Healthcare 6, no. 1 (January 2021): 40–56. http://dx.doi.org/10.4018/ijbdah.20210101.oa4.

Abstract:
This paper organizes a heart disease dataset from the UCI repository. The organized dataset describes the correlations of the variables with the class-level target variable. The experiment analyzes the variables with different machine learning algorithms. The authors review previous prediction work and find that some machine learning algorithms did not work properly or did not reach 100% classification accuracy because of overfitting, underfitting, noisy data, and residual errors in the base-level decision trees. This research uses Pearson correlation and chi-square feature selection algorithms to measure the correlation strength of heart disease attributes. The main objective of this research is to achieve the highest classification accuracy with the fewest errors. The authors therefore use parallel and sequential ensemble methods to reduce the above drawbacks in prediction. The parallel and sequential ensemble methods are built on decision-tree-based algorithms: J48, reduced error pruning, and decision stump. The paper uses the random forest ensemble method for parallel random selection in prediction, and several sequential ensemble methods such as the AdaBoost, Gradient Boosting, and XGBoost meta-classifiers. The experiment is divided into two parts. The first part deals with J48, reduced error pruning, and decision stump, and generates a random forest ensemble method; this parallel ensemble method achieved a high classification accuracy of 100% with low error. The second part deals with J48, reduced error pruning, and decision stump with three sequential ensemble methods, namely AdaBoostM1, XGBoost, and Gradient Boosting. The XGBoost ensemble method achieved better results (higher classification accuracy and lower error) than the AdaBoostM1 and Gradient Boosting ensemble methods. XGBoost achieved 98.05% classification accuracy, whereas the random forest ensemble method achieved 100% classification accuracy with low error.
9

González, Sergio, Francisco Herrera, and Salvador García. "Monotonic Random Forest with an Ensemble Pruning Mechanism based on the Degree of Monotonicity." New Generation Computing 33, no. 4 (July 2015): 367–88. http://dx.doi.org/10.1007/s00354-015-0402-4.

10

Mulyo, Harminto, and Nadia Annisa Maori. "PENINGKATAN AKURASI PREDIKSI PEMILIHAN PROGRAM STUDI CALON MAHASISWA BARU MELALUI OPTIMASI ALGORITMA DECISION TREE DENGAN TEKNIK PRUNING DAN ENSEMBLE." Jurnal Disprotek 15, no. 1 (January 2, 2024): 15–25. http://dx.doi.org/10.34001/jdpt.v15i1.5585.

Abstract:
Enhancing Prediction Accuracy of New Student Program Selection through Decision Tree Algorithm Optimization with Pruning Technique and Ensemble. In the current era of reform and globalization, the complexity of choosing the right study program is increasing with the many choices available. One of the challenges faced by the Nahdlatul Ulama Islamic University (UNISNU) Jepara is the increase in students with non-active status, which can have an impact on the reputation of the university. One contributing factor is that students choose an unsuitable study program and are then reluctant to continue because they lack enthusiasm for their studies. The proposed solution is to predict the right study program for prospective new students using the Decision Tree algorithm, optimized with pruning and with an ensemble technique (Random Forest) that can help overcome overfitting in the decision tree. The data used are UNISNU student data from 2013 to 2023, with a total of 15,289 records and 52 attributes. The results showed that the Decision Tree and Random Forest models provided the highest accuracy, namely 0.88 with a max_depth value of 20, and succeeded in overcoming the problem of overfitting in the decision tree. This model can then be used as a recommendation in predicting the selection of study programs for prospective new students at UNISNU Jepara.
11

Mawarni, Ajeng Citra, Rusdah Rusdah, Law Li Hin, and Dian Anubhakti. "DETEKSI DINI GEJALA AWAL PENYAKIT DIABETES MENGGUNAKAN ALGORITMA RANDOM FOREST." IDEALIS : InDonEsiA journaL Information System 6, no. 2 (July 15, 2023): 165–71. http://dx.doi.org/10.36080/idealis.v6i2.3018.

Abstract:
Diabetes is a chronic disease that occurs when the pancreas cannot produce enough insulin for the body's needs, or when the body cannot use insulin effectively. In 2021 Indonesia ranked fifth in the world for the largest population of people with diabetes, with more than 1 in 10 adults affected. The number of people with diabetes keeps rising in Indonesia and worldwide, including people who already have the disease but have not yet developed further complications or died from it. This is attributed to the absence of a classification model for the early detection of initial diabetes symptoms. This study therefore builds such a classification model following the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology, carried out through a review of journal literature. The study uses the Random Forest algorithm. The data are public and were obtained from www.kaggle.com, with a total of 520 records and 17 attributes: 320 records are positive for diabetes and 200 are negative. Classification was performed with a 90:10 split of training and testing data using stratified random sampling, with number of trees 5, maximal depth 5, and pruning applied. The model obtained an accuracy of 90.38%, precision of 100%, recall of 84.38%, and an AUC value of 1.00. It can therefore be concluded that the Random Forest classification model works very well on data for the early detection of initial diabetes symptoms.
12

Li, Xin, Baodong Qin, Yiyuan Luo, and Dong Zheng. "A Differential Privacy Budget Allocation Algorithm Based on Out-of-Bag Estimation in Random Forest." Mathematics 10, no. 22 (November 18, 2022): 4338. http://dx.doi.org/10.3390/math10224338.

Abstract:
The issue of how to improve the usability of data publishing under differential privacy has become one of the top questions in the field of machine learning privacy protection, and the key to solving this problem is to allocate a reasonable privacy protection budget. To solve this problem, we design a privacy budget allocation algorithm based on out-of-bag estimation in random forest. The algorithm firstly calculates the decision tree weights and feature weights by the out-of-bag data under differential privacy protection. Secondly, statistical methods are introduced to classify features into best feature set, pruned feature set, and removable feature set. Then, pruning is performed using the pruned feature set to avoid decision trees over-fitting when constructing an ϵ-differential privacy random forest. Finally, the privacy budget is allocated proportionally based on the decision tree weights and feature weights in the random forest. We conducted experimental comparisons with real data sets from Adult and Mushroom to demonstrate that this algorithm not only protects data security and privacy, but also improves model classification accuracy and data availability.
13

Arora, Gourav, Devender Kumar, and Balraj Singh. "Tree based Regression Models for Predicting the Compressive Strength of Concrete at High Temperature." IOP Conference Series: Earth and Environmental Science 1327, no. 1 (April 1, 2024): 012015. http://dx.doi.org/10.1088/1755-1315/1327/1/012015.

Abstract:
Predicting the compressive strength of concrete is a complicated process due to the heterogeneous mixture of concrete and highly variable materials. Researchers have predicted the compressive strength of concrete for various mixes using soft computing models. In this research, the compressive strength of concrete at high temperature with fly ash, super plasticizers, and fibre is predicted using three regression tree-based soft computing models (Random Forest, Random Tree, and Reduced-Error Pruning Tree (REP Tree)). The data used in this study is collected from the literature, and two-thirds of the total data is used for model training, while the remaining third is reserved for testing the prepared model. The model's performance is evaluated based on scatter plots, variation plots, box plots, and prediction error rates, i.e., R, RMSE, and MAE. The results highlight the highest performance of the Random Forest model, with R of 0.9142, RMSE of 9.6285 MPa, and MAE of 6.7931 MPa, outperforming the other competing models. Furthermore, the most influential parameter is determined using sensitivity analysis. Thus, the Random Forest model is the model that can be used for predicting the compressive strength of concrete at high temperatures.
14

Kong Qingqing, 孔清清, 丁香乾 Ding Xiangqian, and 宫会丽 Gong Huili. "Application of Improved Random Forest Pruning Algorithm in Tobacco Origin Identification of Near Infrared Spectrum." Laser & Optoelectronics Progress 55, no. 1 (2018): 013006. http://dx.doi.org/10.3788/lop55.013006.

15

Yosefian, Iman, Ehsan Mosa Farkhani, and Mohammad Reza Baneshi. "Application of Random Forest Survival Models to Increase Generalizability of Decision Trees: A Case Study in Acute Myocardial Infarction." Computational and Mathematical Methods in Medicine 2015 (2015): 1–6. http://dx.doi.org/10.1155/2015/576413.

Abstract:
Background. Tree models provide an easily interpretable prognostic tool but unstable results. Two approaches to enhance the generalizability of the results are pruning and random survival forest (RSF). The aim of this study is to assess the generalizability of saturated tree (ST), pruned tree (PT), and RSF. Methods. Data of 607 patients were randomly divided into training and test sets applying 10-fold cross-validation. Using the training sets, all three models were applied. Using the log-rank test, ST was constructed by searching for optimal cutoffs. PT was selected by plotting error rate versus minimum sample size in terminal nodes. In the construction of RSF, 1000 bootstrap samples were drawn from the training set. The C-index and integrated Brier score (IBS) statistics were used to compare models. Results. ST provides the most overoptimized statistics. The mean difference between the C-index in the training and test sets was 0.237. The corresponding figures for PT and RSF were 0.054 and 0.007. In terms of IBS, the difference was 0.136 in ST, 0.021 in PT, and 0.0003 in RSF. Conclusion. Pruning of a tree and assessment of its performance on a test set partially improve the generalizability of decision trees. RSF provides results that are highly generalizable.
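The C-index used in this abstract to compare training and test performance can be sketched as follows. This is a minimal, quadratic-time version of Harrell's concordance index written for illustration; the function name, the handling of tied risk scores (counted as half concordant), and the toy data are assumptions and are not taken from the paper.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[int],
                      risk_scores: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs in which the subject who
    failed earlier was assigned the higher predicted risk.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] == 1) and a strictly shorter time than subject j.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times, event indicators (1 = event, 0 = censored),
# and model risk scores for four patients.
TIMES = [2.0, 4.0, 6.0, 8.0]
EVENTS = [1, 1, 0, 1]
RISK = [0.9, 0.4, 0.5, 0.1]

C_INDEX = concordance_index(TIMES, EVENTS, RISK)
```

Computing this statistic once on the training folds and once on the test folds, then taking the difference, gives the kind of optimism gap the abstract reports for ST, PT, and RSF.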
16

Liu, Zhenzhen, Rui Zhou, Kangqian Huang, Xin Hu, Zhe Jiang, Binsi Cai, and Kaiguo Yuan. "Intrusion Detection Based on Feature Reduction and Model Pruning in Electricity Trading Network." 電腦學刊 34, no. 5 (October 2023): 213–27. http://dx.doi.org/10.53106/199115992023103405017.

Abstract:
The electricity trading network increases network flexibility and lowers trading costs with the aid of 5G and IoT technology. While it has improved trading efficiency and enhanced system intelligence, its security vulnerabilities pose significant challenges. In this study, we propose an intrusion detection method that focuses on feature reduction and model pruning in electricity trading networks. The method effectively addresses the imbalance issue of the IDS2017 dataset by employing the SMOTE algorithm, reduces feature size and computational complexity through the application of PCA, autoencoder, and random forest techniques, and develops a lightweight intrusion detection model specifically designed for electricity trading networks using model pruning and compression techniques. Experimental results demonstrate the effectiveness of the proposed model in detecting intrusions. The achieved precision, recall, F1 score, and false positive rate are at least 98.8%, 87.9%, 90.0%, and 0.08%, respectively. Furthermore, we conducted a comparative analysis of different pruning thresholds and determined that reducing the dimensionality to 49 dimensions yields superior model performance, making it particularly suitable for resource-constrained electricity trading networks.
17

Xu, Yonghao, Li Liu, Meizhen Huang, and Ning Xu. "High accuracy determination of Angelica dahurica origin based on near infrared spectroscopy and a random forest pruning algorithm." Journal of Near Infrared Spectroscopy 27, no. 4 (April 20, 2019): 278–85. http://dx.doi.org/10.1177/0967033519841127.

Abstract:
A near infrared spectroscopy method combined with a random forest pruning algorithm based on margin optimization and principal component analysis (PCA-MORFP) was proposed to identify the origin of Angelica dahurica. One hundred and ninety-six samples of A. dahurica were collected from four original cultivation regions; their NIR diffuse reflectance spectra were measured by a custom-built near infrared spectrometer which works in the range of 900–1700 nm with a resolution (full width at half maximum [FWHM]) of 4 nm. Combinations of Savitzky–Golay smoothing, standard normal variates, and first derivative transformations were used to preprocess the spectral data. Then the PCA-MORFP classification model was constructed. The model was also compared with other classification approaches, including principal component analysis-K-nearest neighbor, principal component analysis-support vector machine, and principal component analysis-random forest. Experimental results showed that PCA-MORFP achieved the best prediction performance among the compared methods. The recognition rates of the PCA-MORFP model were up to 100% for the calibration set and 98.2% for the prediction set, respectively. The method provides a rapid and convenient detection technique for the origin identification of A. dahurica.
18

Nhu, Viet-Ha, Himan Shahabi, Ebrahim Nohani, Ataollah Shirzadi, Nadhir Al-Ansari, Sepideh Bahrami, Shaghayegh Miraki, Marten Geertsema, and Hoang Nguyen. "Daily Water Level Prediction of Zrebar Lake (Iran): A Comparison between M5P, Random Forest, Random Tree and Reduced Error Pruning Trees Algorithms." ISPRS International Journal of Geo-Information 9, no. 8 (July 31, 2020): 479. http://dx.doi.org/10.3390/ijgi9080479.

Abstract:
Zrebar Lake is one of the largest freshwater lakes in Iran and it plays an important role in the ecosystem of the environment, while its desiccation has a negative impact on the surrounding ecosystem. In addition, the lake provides an interesting recreation setting in terms of ecotourism. The prediction and forecasting of the water level of the lake through simple but practical methods can provide a reliable tool for future lake water resource management. In the present study, we predict the daily water level of Zrebar Lake in Iran through well-known decision tree-based algorithms, including the M5 pruned (M5P), random forest (RF), random tree (RT) and reduced error pruning tree (REPT). We used five different water input combinations to find the most effective one. For our modeling, we chose 70% of the dataset for training (from 2011 to 2015) and 30% for model evaluation (from 2015 to 2017). We evaluated the models' performances using different quantitative measures (root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R²), percent bias (PBIAS) and ratio of the root mean square error to the standard deviation of measured data (RSR)) and visual frameworks (Taylor diagram and box plot). Our results showed that water level with a one-day lag time had the highest effect on the result and that its effect decreased as the lag time increased. All the developed models had a good prediction capability, but the M5P model outperformed the others, followed by RF and RT equally and then REPT. Our results showed that these algorithms can predict water level accurately with only a one-day lag time in water level as an input and that they are cost-effective tools for future predictions.
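The quantitative measures listed in this abstract can be written out in a few lines of Python. The sketch below is illustrative only: the function names and the sample values are hypothetical, and the PBIAS sign convention (observed minus simulated, so negative values indicate overestimation) is an assumption based on common hydrological usage rather than something stated in the abstract.

```python
import math
from typing import Sequence


def _check(obs: Sequence[float], sim: Sequence[float]) -> None:
    if len(obs) != len(sim) or len(obs) == 0:
        raise ValueError("obs and sim must be non-empty and of equal length")


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean square error."""
    _check(obs, sim)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(obs, sim)
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def pbias(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Percent bias: 100 * sum(obs - sim) / sum(obs) (assumed convention)."""
    _check(obs, sim)
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def rsr(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Ratio of RMSE to the standard deviation of the observations."""
    _check(obs, sim)
    mean_obs = sum(obs) / len(obs)
    num = math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)))
    den = math.sqrt(sum((o - mean_obs) ** 2 for o in obs))
    return num / den


# Hypothetical observed and simulated water levels.
OBS = [10.0, 12.0, 14.0, 16.0]
SIM = [11.0, 12.0, 13.0, 18.0]

METRICS = {
    "RMSE": rmse(OBS, SIM),
    "MAE": mae(OBS, SIM),
    "PBIAS": pbias(OBS, SIM),
    "RSR": rsr(OBS, SIM),
}
```

RMSE and MAE are in the units of the water level, whereas PBIAS and RSR are dimensionless, which is why studies like this one report them side by side when ranking models.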
19

Ed-Daoudi, Rachid, Altaf Alaoui, Badia Ettaki, and Jamal Zerouaoui. "A Machine Learning Approach to Identify Optimal Cultivation Practices for Sustainable apple Production in Precision Agriculture in Morocco." E3S Web of Conferences 469 (2023): 00052. http://dx.doi.org/10.1051/e3sconf/202346900052.

Abstract:
Precision agriculture techniques have been increasingly adopted worldwide to optimize cultivation practices and achieve sustainable crop production. In this study, we developed a Machine Learning approach to identify optimal cultivation practices for sustainable apple production in precision agriculture in the town of Msemrir, Morocco. We collected a dataset of cultivation practices and apple yield and size data from 10 farms in the town and used correlation-based feature selection and three Machine Learning algorithms (Linear Regression, Decision Tree, and Random Forest) to develop predictive models. The results showed that irrigation, fertilization, and pruning are the most important cultivation practices for apple production in the region, and the Random Forest model performed the best in predicting apple yield and size based on the selected practices. The use of Machine Learning techniques can help farmers optimize cultivation practices and achieve sustainable apple production by reducing inputs such as water and fertilizer and minimizing environmental impact. Moreover, the use of precision agriculture techniques can help farmers meet consumer demand for sustainable and high-quality apple products.
20

Venkatarathinam, R., R. Sivakami, Prasanna Ranjith Christodoss, Mahesh T R, E. Mohan, and Vinoth Kumar V. "Ensemble of Homogenous and Heterogeneous Classifiers using K-Fold Cross Validation with Reduced Entropy." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 8s (August 18, 2023): 315–24. http://dx.doi.org/10.17762/ijritcc.v11i8s.7211.

Abstract:
Chronic kidney disease (CKD) affects millions of people worldwide, greatly reducing their quality of life and creating serious economic, social, and medical problems. Some automated diagnosis methods can detect chronic renal disease. In-depth studies on data mining techniques have recently focused on accuracy in the diagnosis of chronic renal illnesses, either by taking advantage of the disease's simplicity or doing feature selection in addition to pre-processing. In order to handle the unbalanced dataset in this work, the Synthetic Minority Over-sampling Technique (SMOTE) is used during pre-processing. For this investigation, 400 records from the publicly accessible UCI machine learning (ML) repository are used. For the implementation, both homogeneous and heterogeneous ensemble classifiers, which combine two separate classifiers, have been used. Different machine learning (ML) techniques, such as the Classification and Regression Tree (CART), Adaboost classifier, Decision Tree (DT), Reduced Error Pruning Tree, Alternating Decision Tree, and Random Forests Algorithm and their ensembles with a significant reduction in entropy, are used to perform the classification. With a 99.12% accuracy rate and a 99.10% f1 score, the homogeneous classifier Adaboost-Random Forest outperforms other models in the prediction of CKD.
21

Liu, Ming, Zhongren Li, Haitao Zhang, Chunxia Yu, Xinghong Tang, and Xiangqian Ding. "Feature Selection Algorithm Application in Near-Infrared Spectroscopy Classification Based on Binary Search Combined with Random Forest Pruning." Laser & Optoelectronics Progress 54, no. 10 (2017): 103001. http://dx.doi.org/10.3788/lop54.103001.

22

Djafri, Laouni, Djamel Amar Bensaber, and Reda Adjoudj. "Big Data analytics for prediction: parallel processing of the big learning base with the possibility of improving the final result of the prediction." Information Discovery and Delivery 46, no. 3 (August 20, 2018): 147–60. http://dx.doi.org/10.1108/idd-02-2018-0002.

Abstract:
Purpose: This paper aims to solve the problems of big data analytics for prediction including volume, veracity and velocity by improving the prediction result to an acceptable level and in the shortest possible time. Design/methodology/approach: This paper is divided into two parts. The first one is to improve the result of the prediction. In this part, two ideas are proposed: the double pruning enhanced random forest algorithm and extracting a shared learning base from the stratified random sampling method to obtain a representative learning base of all original data. The second part proposes to design a distributed architecture supported by new technologies solutions, which in turn works in a coherent and efficient way with the sampling strategy under the supervision of the Map-Reduce algorithm. Findings: The representative learning base obtained by the integration of two learning bases, the partial base and the shared base, presents an excellent representation of the original data set and gives very good results of the Big Data predictive analytics. Furthermore, these results were supported by the improved random forests supervised learning method, which played a key role in this context. Originality/value: All companies are concerned, especially those with large amounts of information and want to screen them to improve their knowledge for the customer and optimize their campaigns.
23

Chang, Yangyang, and Fadi Abu-Amara. "An Efficient Hybrid Classifier for Cancer Detection." International Journal of Online and Biomedical Engineering (iJOE) 17, no. 03 (March 9, 2021): 76. http://dx.doi.org/10.3991/ijoe.v17i03.19683.

Abstract:
The early detection of cancer in both healthy and high-risk populations offers increased opportunity for treatment and curative intent. In this paper, we propose a hybrid classifier that produces an efficient classification system for cancer detection in cell datasets. The first part of this work investigates the performance of artificial neural networks (ANN) such as Self-Organizing Feature Map (SOM) and Learning Vector Quantization (LVQ), while in the second part, we present our investigation on the performances of Decision Tree (DT) and its pruning model. We also, in the third part, present our proposal for a new hybrid classifier that is based on the Random Forest (RF) and the combination of the LVQ and DT. Experimental results of the proposed hybrid classifier indicate that the hybrid classifier effectively avoids the drawbacks of individual classifiers and has high anti-noise performance.
24

Khozani, Zohreh Sheikh, Khabat Khosravi, Binh Thai Pham, Bjørn Kløve, Wan Hanna Melini Wan Mohtar, and Zaher Mundher Yaseen. "Determination of compound channel apparent shear stress: application of novel data mining models." Journal of Hydroinformatics 21, no. 5 (June 18, 2019): 798–811. http://dx.doi.org/10.2166/hydro.2019.037.

Abstract:
Momentum exchange in the mixing region between the floodplain and the main channel is an essential hydraulic process, particularly for the estimation of discharge. The current study investigated various data mining models to estimate apparent shear stress in a symmetric compound channel with smooth and rough floodplains. The applied predictive models include random forest (RF), random tree (RT), reduced error pruning tree (REPT), M5P, and the distinguished hybrid bagging-M5P model. The models are constructed based on several correlated physical channel characteristic variables to predict the apparent shear stress. A sensitivity analysis is applied to select the best function tuning parameters for each model. Results showed that input with six variables exhibited the best prediction results for the RF model, while input with four variables produced the best performance for the other models. Based on the optimised input variables for each model, the efficiency of the five predictive models discussed here was evaluated. It was found that the M5P and hybrid bagging-M5P models, with a coefficient of determination (R²) equal to 0.905 and 0.92, respectively, in the testing stage, are superior to the RF, RT and REPT models in estimating apparent shear stress in compound channels.
25

Mr. D Krishna, Erukulla Laasya, A Sowmya Sri, T Ravinder Reddy, and Akhil Sanjoy. "BIOMEDICAL TEXT DOCUMENT CLASSIFICATION." international journal of engineering technology and management sciences 7, no. 3 (2023): 788–92. http://dx.doi.org/10.46647/ijetms.2023.v07i03.121.

Abstract:
Information extraction, retrieval, and text categorization are only a few of the significant research fields covered by "bio medical text classification." This study examines many text categorization techniques utilised in practise, as well as their strengths and weaknesses, in order to improve knowledge of various information extraction opportunities in the field of data mining. We compiled a dataset with a focus on three categories: "Thyroid Cancer," "Lung Cancer," and "Colon Cancer." This paper presents an empirical study of a classifier. The investigation was carried out using biomedical literature benchmarks. Many metaheuristic algorithms are investigated, including genetic algorithms, particle swarm optimisation, firefly, cuckoo, and bat algorithms. In addition, the proposed multiple classifier system outperforms ensemble learning, ensemble pruning, and traditional classification methods. Based on the data, we forecast if it is Thyroid Cancer, Lung Cancer, or Colon Cancer using basic EDA, text preprocessing, and several models such as Logistic Regression, Decision Tree Classification, and Random Forest Classification.
26

Gao, Jun, Lingwei Sun, Shushan Zhang, Jiehuan Xu, Mengqian He, Defu Zhang, Caifeng Wu, and Jianjun Dai. "Screening Discriminating SNPs for Chinese Indigenous Pig Breeds Identification Using a Random Forests Algorithm." Genes 13, no. 12 (November 25, 2022): 2207. http://dx.doi.org/10.3390/genes13122207.

Abstract:
Chinese indigenous pig breeds have unique genetic characteristics and a rich diversity; however, effective breed identification methods have not yet been well established. In this study, a genotype file of 62,822 single-nucleotide polymorphisms (SNPs), which were obtained from 1059 individuals of 18 Chinese indigenous pig breeds and 5 cosmopolitan breeds, was used to screen the discriminating SNPs for pig breed identification. After linkage disequilibrium (LD) pruning filtering, this study excluded 396 SNPs on non-constant chromosomes and retained 20.92–27.84% of SNPs for each of the 18 autosomes, leaving a total of 14,823 SNPs. The principal component analysis (PCA) showed the largest differences between cosmopolitan and Chinese pig breeds (PC1 = 10.452%), while relatively small differences were found among the 18 indigenous pig breeds from the Yangtze River Delta region of China. Next, a random forest (RF) algorithm was used to filter these SNPs and obtain the optimal number of decision trees (ntree = 1000) using corresponding out-of-bag (OOB) error rates. By comparing two different SNP ranking methods in the RF analysis, the mean decreasing accuracy (MDA) and mean decreasing Gini index (MDG), the effects of panels with different numbers of SNPs on the assignment accuracy, and the statistics of SNP distribution on each chromosome in the panels, a panel of 1000 of the most breed-discriminative tagged SNPs was finally selected based on the MDA screening method. A high accuracy (>99.3%) was obtained by the breed prediction of 318 samples in the RF test set; thus, a machine learning classification method was established for the multi-breed identification of Chinese indigenous pigs based on a low-density panel of SNPs.
27

Grégoire, Guillaume, Josée Fortin, Isa Ebtehaj, and Hossein Bonakdari. "Novel Hybrid Statistical Learning Framework Coupled with Random Forest and Grasshopper Optimization Algorithm to Forecast Pesticide Use on Golf Courses." Agriculture 12, no. 7 (June 28, 2022): 933. http://dx.doi.org/10.3390/agriculture12070933.

Abstract:
Golf course maintenance requires the use of several inputs, such as pesticides and fertilizers, that can be harmful to human health or the environment. Understanding the factors associated with pesticide use on golf courses may help golf-course managers reduce their reliance on these products. In this study, we used a database of about 14,000 pesticide applications in the province of Québec, Canada, to develop a novel hybrid machine learning approach to predict pesticide use on golf courses. We created this proposed model, called RF-SVM-GOA, by coupling a support vector machine (SVM) with random forest (RF) and the grasshopper optimization algorithm (GOA). We applied RF to handle the wide range of datasets and GOA to find the optimal SVM settings. We considered five different dependent variables—region, golf course ID, number of holes, year, and treated area—as input variables. The experimental results confirmed that the developed hybrid RF-SVM-GOA approach was able to estimate the active ingredient total (AIT) with a high level of accuracy (R = 0.99; MAE = 0.84; RMSE = 0.84; NRMSE = 0.04). We compared the results produced by the developed RF-SVM-GOA model with those of four tree-based techniques including M5P, random tree, reduced error pruning tree (REP tree), and RF, as well as with those of two non-tree-based techniques including the generalized structure of group method of data handling (GSGMDH) and evolutionary polynomial regression (EPR). The computational results showed that the accuracy of the proposed RF-SVM-GOA approach was higher, outperforming the other methods. We analyzed sensitivity to find the most effective variables in AIT forecasting. The results indicated that the treated area is the most effective variable in AIT forecasting. The results of the current study provide a method for increasing the sustainability of golf course management.
28

Menéndez García, Luis Alfonso, Marta Menéndez Fernández, Violetta Sokoła-Szewioła, Laura Álvarez de Prado, Almudena Ortiz Marqués, David Fernández López, and Antonio Bernardo Sánchez. "A Method of Pruning and Random Replacing of Known Values for Comparing Missing Data Imputation Models for Incomplete Air Quality Time Series." Applied Sciences 12, no. 13 (June 25, 2022): 6465. http://dx.doi.org/10.3390/app12136465.

Abstract:
The data obtained from air quality monitoring stations, which are used to carry out studies using data mining techniques, present the problem of missing values. This paper describes a research work on missing data imputation. Among the most common methods, the method that best imputes values to the available data set is analysed. It uses an algorithm that randomly replaces all known values in a dataset once with imputed values and compares them with the actual known values, forming several subsets. Data from seven stations in the Silesian region (Poland) were analyzed for hourly concentrations of four pollutants: nitrogen dioxide (NO2), nitrogen oxides (NOx), particles of 10 μm or less (PM10) and sulphur dioxide (SO2) for five years. Imputations were performed using linear imputation (LI), predictive mean matching (PMM), random forest (RF), k-nearest neighbours (k-NN) and imputation by Kalman smoothing on structural time series (Kalman) methods and performance evaluations were performed. Once the comparison method was validated, it was determined that, in general, Kalman structural smoothing and the linear imputation methods best fitted the imputed values to the data pattern. It was observed that each imputation method behaves in an analogous way for the different stations. The variables with the best results are NO2 and SO2. The UMI method is the worst imputer for missing values in the data sets.
29

Xu, JunYi. "Systematic Analysis and Application Prospect of Decision Tree." Highlights in Science, Engineering and Technology 71 (November 28, 2023): 163–70. http://dx.doi.org/10.54097/hset.v71i.12687.

Abstract:
Decision making is common practice for everyone. One must make multiple decisions to move on during one's lifetime. People are always eager to make the optimum decisions so that they can save energy and step onto the right path. Although people try to avoid inferior options that come with risk and danger, things happen from time to time. In this study, a powerful tool, the decision tree, will be introduced to address this problem. With the assistance of a decision tree, one can make better choices more validly and more efficiently. Some concepts and algorithms of decision trees will also be included to help in understanding the examples in the application part. The purpose of this study is to introduce the concept of the decision tree, analyze its advantages and discuss its future. Although the decision tree is widely utilized in many aspects of society, it still has shortcomings like overfitting and underfitting. Fortunately, there are methods such as pruning and random forest to solve these problems. The future of the decision tree is promising.
30

Chen, Lei, Yu-Hang Zhang, Xiaoyong Pan, Min Liu, Shaopeng Wang, Tao Huang, and Yu-Dong Cai. "Tissue Expression Difference between mRNAs and lncRNAs." International Journal of Molecular Sciences 19, no. 11 (October 31, 2018): 3416. http://dx.doi.org/10.3390/ijms19113416.

Abstract:
Messenger RNA (mRNA) and long noncoding RNA (lncRNA) are two main subgroups of RNAs participating in transcription regulation. With the development of next generation sequencing, increasing lncRNAs are identified. Many hidden functions of lncRNAs are also revealed. However, the differences in lncRNAs and mRNAs are still unclear. For example, we need to determine whether lncRNAs have stronger tissue specificity than mRNAs and which tissues have more lncRNAs expressed. To investigate such tissue expression difference between mRNAs and lncRNAs, we encoded 9339 lncRNAs and 14,294 mRNAs with 71 expression features, including 69 maximum expression features for 69 types of cells, one feature for the maximum expression in all cells, and one expression specificity feature that was measured as Chao-Shen-corrected Shannon’s entropy. With advanced feature selection methods, such as maximum relevance minimum redundancy, incremental feature selection methods, and random forest algorithm, 13 features presented the dissimilarity of lncRNAs and mRNAs. The 11 cell subtype features indicated which cell types of the lncRNAs and mRNAs had the largest expression difference. Such cell subtypes may be the potential cell models for lncRNA identification and function investigation. The expression specificity feature suggested that the cell types to express mRNAs and lncRNAs were different. The maximum expression feature suggested that the maximum expression levels of mRNAs and lncRNAs were different. In addition, the rule learning algorithm, repeated incremental pruning to produce error reduction algorithm, was also employed to produce effective classification rules for classifying lncRNAs and mRNAs, which gave competitive results compared with random forest and could give a clearer picture of different expression patterns between lncRNAs and mRNAs. 
Results not only revealed the heterogeneous expression pattern of lncRNA and mRNA, but also gave rise to the development of a new tool to identify the potential biological functions of such RNA subgroups.
31

Kamarudin, Nur Fatihah, and Zuraini Ali Shah. "Feature Extraction And Classification On Single Nucleotide Polymorphism." International Journal of Advanced Science Computing and Engineering 1, no. 2 (September 2, 2019): 85–90. http://dx.doi.org/10.30630/ijasce.1.2.6.

Abstract:
Malays in Peninsular Malaysia can be divided into eight sub-ethnic groups: Malay Bugis, Malay, Malay Champa, Malay Jawa, Malay Kelantan, Malay Kedah, Malay Minang and Malay Pattani. Ancestry informative markers (AIM) can be used to represent the eight sub-ethnic groups of the Malay population in Peninsular Malaysia. In this research, single nucleotide polymorphism (SNP) datasets of the eight sub-ethnic groups are analysed in order to obtain the AIM for the Malay population in Peninsular Malaysia. However, the dataset may have outliers, missing data and redundancy that may impact the accuracy of the result. Data pre-processing is an important step that removes these problems. Iterative pruning principal component analysis (ipPCA) is one of the techniques usually used in the analysis of genome datasets to extract information. It can be applied to highly structured data and can improve the resolution of the data. It is also used to structure a sub-population. Random Forest and Hidden Naïve Bayes are used to classify the SNPs that can be used as AIM. Information Gain Ratio ranks the chosen AIM based on the value of each attribute.
32

Gadebe, Moses Lesiba, and Okuthe Paul Kogeda. "Top-K Human Activity Recognition Dataset." International Journal of Interactive Mobile Technologies (iJIM) 14, no. 18 (November 10, 2020): 68. http://dx.doi.org/10.3991/ijim.v14i18.16965.

Abstract:
The availability of Smartphones has increased the possibility of self-monitoring to increase physical activity and behavior change to prevent obesity. However, self-monitoring on a Smartphone comes with some challenges such as unavailability of lightweight classification algorithms, personalized datasets to completely capture bodily postures, subject sensitivity, limited storage and computational power. Moreover, most classification algorithms such as Support Vector Machines, C4.5, Naïve Bayes and K Neighbor rely on larger datasets to accurately predict human activities. In this paper, we present top-k of compressed small personalized dataset to reduce computational cost with increased accuracy. We collected top-k personalized dataset from 13 recruited subjects. After benchmarking our collected dataset we found that the dataset is suitable for tree-oriented algorithms, especially the Random Forest, C4.5 and Boosted tree with accuracy and precision of 100% except for KNN, Support Vector and Naïve Bayes. Further, our top-k personalized dataset improves pruning and overfitting of tree-oriented algorithms. Moreover, the linear consistence of static human activities reveals the potential of our top-k dataset to be replicated to multiple-subject to close subject sensitivity challenge.
33

Moura, Rebecca Silva de, Kellen Rabello de Souza, Daniel Da Silva Souza, Gabriel Mendes Santana, Guilherme Murilo De Oliveira, Fábio Venturoli, and Carlos De Melo e. Silva-Neto. "Damage in Khaya ivorensis caused by Trigona spinipes in Brazilian savannah." Acta Brasiliensis 1, no. 1 (January 15, 2017): 40. http://dx.doi.org/10.22571/actabra11201715.

Abstract:
Trigona spinipes (dog bee) attacks the apical bud of Khaya ivorensis, causing atrophy and budding that produce branches, which depreciate the stem if not managed. Damage to K. ivorensis plantations has been reported for Brazil, but never before for the Brazilian savannah. The aim of this study was to survey dog bee attacks and to report, as a first record, the presence of the bee and the damage it causes in African mahogany plantations in the Brazilian savannah. The study area comprises about 16.6 hectares of African mahogany monoculture in the municipality of Piracanjuba, Goiás. Twenty-one pre-defined plots of 400 m² were selected by simple random sampling; in each, a forest inventory was carried out, sprouts on the apical part of the stem were recorded, and the sprouts were then artificially pruned. Bees were also observed foraging by cutting the shoots of K. ivorensis. In the plantation, 6.14% of the trees had regrowth, and this percentage may indicate the number of attacked trees. The trees with regrowth represent a large number of trees that may develop defects, producing more than one stem or branch and thereby preventing the affected wood from being used for furniture.
34

Kamarudin, Nur Fatihah, Zuraini Ali Shah, Mohd Farhan Md Fudzee, and Shahreen Kasim. "Feature Extraction and Classification On Single Nucleotide Polymorphism." International Journal of Advanced Science Computing and Engineering 1, no. 2 (August 30, 2019): 85–90. http://dx.doi.org/10.62527/ijasce.1.2.6.

Abstract:
Malays in Peninsular Malaysia can be divided into eight sub-ethnic groups: Malay Bugis, Malay, Malay Champa, Malay Jawa, Malay Kelantan, Malay Kedah, Malay Minang and Malay Pattani. Ancestry informative markers (AIM) can be used to represent the eight sub-ethnic groups of the Malay population in Peninsular Malaysia. In this research, single nucleotide polymorphism (SNP) datasets of the eight sub-ethnic groups are analysed in order to obtain the AIM for the Malay population in Peninsular Malaysia. However, the dataset may have outliers, missing data and redundancy that may impact the accuracy of the result. Data pre-processing is an important step that removes these problems. Iterative pruning principal component analysis (ipPCA) is one of the techniques usually used in the analysis of genome datasets to extract information. It can be applied to highly structured data and can improve the resolution of the data. It is also used to structure a sub-population. Random Forest and Hidden Naïve Bayes are used to classify the SNPs that can be used as AIM. Information Gain Ratio ranks the chosen AIM based on the value of each attribute.
35

Nickele, Mariane Aparecida, and Wilson Reis Filho. "Population Dynamics of Acromyrmex crassispinus (Forel) (Hymenoptera: Formicidae) and Attacks on Pinus taeda Linnaeus (Pinaceae) plantations." Sociobiology 62, no. 3 (September 30, 2015): 340. http://dx.doi.org/10.13102/sociobiology.v62i3.422.

Abstract:
This work aimed to study the population dynamics of Acromyrmex crassispinus (Forel) in Pinus taeda L. plantations, evaluating the density and spatial distribution of nests over time, inferring about the period of the first nuptial flight of A. crassispinus colonies, and evaluating the levels of attack of this leaf-cutting ant on P. taeda plants. Assessments were performed monthly in the first year after planting, every three months until the third year and every six months until the plantation was six years old. The presence of nests was observed only after 15 months after planting. The nest density gradually increased until the planting completed 30 months, and decreased when the forest canopy began to close (after 54 months). Spatial distribution of A. crassispinus nests was random. Probably, the first nuptial flight of an A. crassispinus colony occurs after the third year of the colony foundation. Pinus taeda plants were not attacked by A. crassispinus throughout the evaluation period. Then, when dealing with a replanting area of Pinus plantation, where the previous forest has not been subject to pruning nor thinning, the problem with A. crassispinus is almost null if the clearcutting and the new planting occur during the winter. In this case, leaf-cutting ants control can be alleviated and it is not necessary to carry out systematic control of ants where A. crassispinus is the predominant leaf cutting ant species. Acromyrmex crassispinus control should be done only if nests are located or if attacked plants by ants are detected.
36

Jiang, Sheng, Ziyi Liu, Jiajun Hua, Zhenyu Zhang, Shuai Zhao, Fangnan Xie, Jiangbo Ao, et al. "A Real-Time Detection and Maturity Classification Method for Loofah." Agronomy 13, no. 8 (August 16, 2023): 2144. http://dx.doi.org/10.3390/agronomy13082144.

Abstract:
Fruit maturity is a crucial index for determining the optimal harvesting period of open-field loofah. Given the plant’s continuous flowering and fruiting patterns, fruits often reach maturity at different times, making precise maturity detection essential for high-quality and high-yield loofah production. Despite its importance, little research has been conducted in China on open-field young fruits and vegetables and a dearth of standards and techniques for accurate and non-destructive monitoring of loofah fruit maturity exists. This study introduces a real-time detection and maturity classification method for loofah, comprising two components: LuffaInst, a one-stage instance segmentation model, and a machine learning-based maturity classification model. LuffaInst employs a lightweight EdgeNeXt as the backbone and an enhanced pyramid attention-based feature pyramid network (PAFPN). To cater to the unique characteristics of elongated loofah fruits and the challenge of small target detection, we incorporated a novel attention module, the efficient strip attention module (ESA), which utilizes long and narrow convolutional kernels for strip pooling, a strategy more suitable for loofah fruit detection than traditional spatial pooling. Experimental results on the loofah dataset reveal that these improvements equip our LuffaInst with lower parameter weights and higher accuracy than other prevalent instance segmentation models. The mean average precision (mAP) on the loofah image dataset improved by at least 3.2% and the FPS increased by at least 10.13 f/s compared with Mask R-CNN, Mask Scoring R-CNN, YOLACT++, and SOLOv2, thereby satisfying the real-time detection requirement. Additionally, a random forest model, relying on color and texture features, was developed for three maturity classifications of loofah fruit instances (M1: fruit setting stage, M2: fruit enlargement stage, M3: fruit maturation stage). 
The application of a pruning strategy helped attain the random forest model with the highest accuracy (91.47% for M1, 90.13% for M2, and 92.96% for M3), culminating in an overall accuracy of 91.12%. This study offers promising results for loofah fruit maturity detection, providing technical support for the automated intelligent harvesting of loofah.
37

Almohammed, Fadi, Parveen Sihag, Saad Sh Sammen, Krzysztof Adam Ostrowski, Karan Singh, C. Venkata Siva Rama Prasad, and Paulina Zajdel. "Assessment of Soft Computing Techniques for the Prediction of Compressive Strength of Bacterial Concrete." Materials 15, no. 2 (January 10, 2022): 489. http://dx.doi.org/10.3390/ma15020489.

Abstract:
In this investigation, the potential of M5P, Random Tree (RT), Reduced Error Pruning Tree (REP Tree), Random Forest (RF), and Support Vector Regression (SVR) techniques have been evaluated and compared with the multiple linear regression-based model (MLR) to be used for prediction of the compressive strength of bacterial concrete. For this purpose, 128 experimental observations have been collected. The total data set has been divided into two segments such as training (87 observations) and testing (41 observations). The process of data set separation was arbitrary. Cement, Aggregate, Sand, Water to Cement Ratio, Curing time, Percentage of Bacteria, and type of sand were the input variables, whereas the compressive strength of bacterial concrete has been considered as the final target. Seven performance evaluation indices such as Correlation Coefficient (CC), Coefficient of determination (R2), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Bias, Nash-Sutcliffe Efficiency (NSE), and Scatter Index (SI) have been used to evaluate the performance of the developed models. Outcomes of performance evaluation indices recommend that the Polynomial kernel function based SVR model works better than other developed models with CC values as 0.9919, 0.9901, R2 values as 0.9839, 0.9803, NSE values as 0.9832, 0.9800, and lower values of RMSE are 1.5680, 1.9384, MAE is 0.7854, 1.5155, Bias are 0.2353, 0.1350 and SI are 0.0347, 0.0414 for training and testing stages, respectively. The sensitivity investigation shows that the curing time (T) is the vital input variable affecting the prediction of the compressive strength of bacterial concrete, using this data set.
38

Wang, Shuai, Yiping Yao, Feng Zhu, Wenjie Tang, and Yuhao Xiao. "A Probabilistic Prediction Approach for Memory Resource of Complex System Simulation in Cloud Computing Environment." Symmetry 12, no. 11 (November 4, 2020): 1826. http://dx.doi.org/10.3390/sym12111826.

Abstract:
Accurate memory resource prediction can achieve optimal performance for complex system simulation (CSS) using optimistic parallel execution in the cloud computing environment. However, because of the varying memory resource demands of CSS applications caused by the simulation entity scale and frequent optimistic synchronization, the existing approaches are unable to predict the memory resource required by a CSS application accurately, which cannot take full advantage of the elasticity and symmetry of cloud computing. In this paper, a probabilistic prediction approach based on ensemble learning, which regards the entity scale and frequent optimistic synchronization as the important features, is proposed. The approach using stacking strategy consists of a two-layer architecture. The first-layer architecture includes two kinds of base models, namely, back-propagation neural network (BPNN) and random forest (RF). The root mean squared error-based pruning algorithm is designed to choose the optimal subset of the base models. The second-layer is the Gaussian process regression (GPR) model, which is applied to quantify the uncertainty information in the probabilistic prediction for memory resources. A series of experiments are presented to prove that the proposed approach can achieve higher accuracy and performance compared to RF, BPNN, GPR, Bagging ensemble approach, and Regressive Ensemble Approach for Prediction.
39

AlHadid, Issam, Evon Abu-Taieh, Rami S. Alkhawaldeh, Sufian Khwaldeh, Ra’ed Masa’deh, Khalid Kaabneh, and Ala’Aldin Alrowwad. "Predictors for E-Government Adoption of SANAD App Services Integrating UTAUT, TPB, TAM, Trust, and Perceived Risk." International Journal of Environmental Research and Public Health 19, no. 14 (July 7, 2022): 8281. http://dx.doi.org/10.3390/ijerph19148281.

Abstract:
Using mobile applications in e-government for the purpose of health protection was a new idea during the COVID-19 epidemic. Hence, the goal of this study is to examine the various factors that influence the use of the SANAD App as a health protection tool. The factors were adopted from well-established models such as UTAUT, TAM, and extended TPB. Using survey data from 442 SANAD App users from Jordan, the model was empirically validated. Using AMOS 20, confirmatory factor analysis, structural equation modeling (SEM), and machine learning (ML) methods were performed to assess the study hypotheses. The ML methods used are ANN, SMO, the bagging reduced error pruning tree (RepTree), and random forest. The results suggested several key findings: the respondents’ performance expectancy, effort expectancy, social influence, facilitating conditions, perceived risk, trust, and perceived service quality of this digital technology were significant antecedents for their attitude to using it. The strength of these relationships is affected by the moderating variables, including age, gender, educational level, and internet experience on behavioral intention. Yet, perceived risk did not have a significant effect on attitude towards the SANAD App. The study adds to the literature by empirically testing and theorizing the effects of the SANAD App on public health protection.
40

Abu-Taieh, Evon, Issam AlHadid, Ra’ed Masa’deh, Rami S. Alkhawaldeh, Sufian Khwaldeh, and Ala’aldin Alrowwad. "Factors Influencing YouTube as a Learning Tool and Its Influence on Academic Achievement in a Bilingual Environment Using Extended Information Adoption Model (IAM) with ML Prediction—Jordan Case Study." Applied Sciences 12, no. 12 (June 9, 2022): 5856. http://dx.doi.org/10.3390/app12125856.

Abstract:
YouTube usage as a learning tool is evident among students. Hence, the goal of this study is to examine the various factors that influence the use of YouTube as a learning tool, which influences academic achievement in a bilingual academic context. Using survey data from 704 YouTube users from Jordan’s bilingual academic institutes, the research model was empirically validated. Using Amos 20, structural equation modeling (SEM) was performed to assess the study hypotheses. SEM permits concurrent checking of the direct and indirect effects of all hypotheses. Confirmatory factor analysis (CFA) was used to validate the instrument items’ properties in addition to machine learning methods: ANN, SMO, the bagging reduced error pruning tree (RepTree), and random forest. The empirical results offer several key findings: academic achievement (AA) is influenced by the information adoption (IA) of YouTube as a learning tool. Information adoption (IA) is influenced by information usefulness (IU). Source credibility (SC) and information quality (IQ) both influence information usefulness (IU), while information language (IL) does not. Information quality (IQ) is influenced by intrinsic, contextual, and accessibility information quality. This study adds to the literature by empirically testing and theorizing the effects of YouTube as a learning tool on the academic achievement of Jordanian university students who are studying in bilingual surroundings.
41

Tu, Yu-Hsuan, Kasper Johansen, Stuart Phinn, and Andrew Robson. "Measuring Canopy Structure and Condition Using Multi-Spectral UAS Imagery in a Horticultural Environment." Remote Sensing 11, no. 3 (January 30, 2019): 269. http://dx.doi.org/10.3390/rs11030269.

Abstract:
Tree condition, pruning and orchard management practices within intensive horticultural tree crop systems can be determined via measurements of tree structure. Multi-spectral imagery acquired from an unmanned aerial system (UAS) has been demonstrated as an accurate and efficient platform for measuring various tree structural attributes, but research in complex horticultural environments has been limited. This research established a methodology for accurately estimating tree crown height, extent, plant projective cover (PPC) and condition of avocado tree crops, from a UAS platform. Individual tree crowns were delineated using object-based image analysis. In comparison to field measured canopy heights, an image-derived canopy height model provided a coefficient of determination (R²) of 0.65 and relative root mean squared error of 6%. Tree crown length perpendicular to the hedgerow was accurately mapped. PPC was measured using spectral and textural image information and produced an R² value of 0.62 against field data. A random forest classifier was applied to assign tree condition into four categories in accordance with industry standards, producing out-of-bag accuracies >96%. Our results demonstrate the potential of UAS-based mapping for the provision of information to support the horticulture industry and facilitate orchard-based assessment and management.
42

Karri, Praveen Kumar, D. Jaya Kumari, and Sowmya Sree Karri. "A Scalable Malware Detection Approach through Significant Permission Identification for Android Devices." International Journal of Innovation in Multidisciplinary Scientific Research 02, no. 01 (2024): 24–29. http://dx.doi.org/10.61239/ijimsr.2024.2113.

Abstract:
The global ubiquity of smartphones has led to the availability of many free apps for gaming, communication, financial, and educational needs. However, hazardous malicious software targeting smartphones has increased as the global adoption of these devices has grown. Malware is growing rapidly, with reports predicting a new Android app every 10 seconds, threatening the mobile ecosystem. Due to Android's versatility, users can install apps from third-party app shops and file-sharing websites, compounding malware outbreaks. The seriousness of this situation requires scalable malware detection. Based on permission usage analysis, this project introduces Significant Permission Identification (SigPID), a novel malware detection technique. SigPID uses a three-tiered permission pruning mechanism to discover the most important permissions for distinguishing benign from malicious apps, unlike standard methods that scan all Android permissions. The system first uses the Random Forest method for machine learning classifications. The study reduces non-sensitive permissions by identifying benign and harmful permission lists. Support Vector Machine classification, K-Nearest Neighbor, and Linear Regression are then applied to a fresh dataset. SigPID, written in Python 3.7, is a powerful and scalable Android malware countermeasure. SigPID uses advanced machine learning and large permissions to protect the mobile ecosystem from harmful apps, making it safer and more secure.
43

Bilal, Boudy, Kaan Yetilmezsoy, and Mohammed Ouassaid. "Benchmarking of Various Flexible Soft-Computing Strategies for the Accurate Estimation of Wind Turbine Output Power." Energies 17, no. 3 (February 1, 2024): 697. http://dx.doi.org/10.3390/en17030697.

Abstract:
This computational study explores the potential of several soft-computing techniques for wind turbine (WT) output power (kW) estimation based on seven input variables of wind speed (m/s), wind direction (°), air temperature (°C), pitch angle (°), generator temperature (°C), rotating speed of the generator (rpm), and voltage of the network (V). In the present analysis, a nonlinear regression-based model (NRM), three decision tree-based methods (random forest (RF), random tree (RT), and reduced error pruning tree (REPT) models), and multilayer perceptron-based soft-computing approach (artificial neural network (ANN) model) were simultaneously implemented for the first time in the prediction of WT output power (WTOP). To identify the top-performing soft computing technique, the applied models’ predictive success was compared using over 30 distinct statistical goodness-of-fit parameters. The performance assessment indices corroborated the superiority of the RF-based model over other data-intelligent models in predicting WTOP. It was seen from the results that the proposed RF-based model obtained the narrowest uncertainty bands and the lowest quantities of increased uncertainty values across all sets. Although the determination coefficient values of all competitive decision tree-based models were satisfactory, the lower percentile deviations and higher overall accuracy score of the RF-based model indicated its superior performance and higher accuracy over other competitive approaches. The generator’s rotational speed was shown to be the most useful parameter for RF-based model prediction of WTOP, according to a sensitivity study. This study highlighted the significance and capability of the implemented soft-computing strategy for better management and reliable operation of wind farms in wind energy forecasting.
44

Almutairi, Saad, S. Manimurugan, Naveen Chilamkurti, Majed Mohammed Aborokbah, C. Narmatha, Subramaniam Ganesan, Riyadh A. Alzaheb, and Hani Almoamari. "A Context-Aware MRIPPER Algorithm for Heart Disease Prediction." Journal of Healthcare Engineering 2022 (July 11, 2022): 1–11. http://dx.doi.org/10.1155/2022/7853604.

Abstract:
These days, mobile computing devices are ubiquitous and are widely used in almost every facet of daily life. In addition, computing and the modern technologies are not really coexisting anymore. With a wide range of conditions and areas of concern, the medical domain was also concerned. New types of technologies, such as context-aware systems and applications, are constantly being infused into the medicine field. An IoT-enabled healthcare system based on context awareness is developed in this work. In order to collect and store the patient data, smart medical devices are employed. Context-aware data from the database includes the patient’s medical records and personal information. The MRIPPER (Modified Repeated Incremental Pruning to Produce Error) technique is used to analyze and classify the data. A rule-based machine learning method is used in this algorithm. The rules for analyzing datasets in order to make predictions about heart disease are framed using this algorithm. MATLAB is used to simulate the proposed model’s performance analysis. Other models like random forest, J48, CART, JRip, and OneR algorithms are also compared to validate the proposed model’s performance. The proposed model obtains 98.89 percent accuracy, 96.76 percent precision, 99.05 percent sensitivity, 94.35 percent specificity, and 97.60 percent f-score. Predictions for subjects in the normal and abnormal classes were both accurate with 97.38 for normal and 97.93 for abnormal subjects.
45

Islam, Abu Reza Md Towfiqul, Swapan Talukdar, Shumona Akhter, Kutub Uddin Eibek, Md Mostafizur Rahman, Swades Pal, Mohd Waseem Naikoo, Atiqur Rahman, and Amir Mosavi. "Assessing the Impact of the Farakka Barrage on Hydrological Alteration in the Padma River with Future Insight." Sustainability 14, no. 9 (April 26, 2022): 5233. http://dx.doi.org/10.3390/su14095233.

Abstract:
Climate change and human interventions (e.g., massive barrages, dams, sand mining, and sluice gates) in the Ganga–Padma River (India and Bangladesh) have escalated in recent decades, disrupting the natural flow regime and habitat. This study employed innovative trend analysis (ITA), range of variability approach (RVA), and continuous wavelet analysis (CWA) to quantify the past to future hydrological change in the river because of the building of the Farakka Barrage (FB). We also forecast flow regimes using unique hybrid machine learning techniques based on particle swarm optimization (PSO). The ITA findings revealed that the average discharge trended substantially negatively throughout the dry season (January–May). However, the RVA analysis showed that average discharge was lower than environmental flows. The CWA indicated that the FB has a significant influence on the periodicity of the streamflow regime. PSO-Reduced Error Pruning Tree (REPTree) was the best fit for average discharge prediction (RMSE = 0.14), PSO-random forest (RF) was the best match for maximum discharge (RMSE = 0.3), and PSO-M5P (RMSE = 0.18) was better for the lowest discharge prediction. Furthermore, the basin’s discharge has reduced over time, concerning the riparian environment. This research describes the measurement of hydrological change and forecasts the discharge for upcoming days, which might be valuable in developing sustainable water resource management plans in this location.
46

Zhu, Jun, Ziwu Pan, Hang Wang, Peijie Huang, Jiulin Sun, Fen Qin, and Zhenzhen Liu. "An Improved Multi-temporal and Multi-feature Tea Plantation Identification Method Using Sentinel-2 Imagery." Sensors 19, no. 9 (May 5, 2019): 2087. http://dx.doi.org/10.3390/s19092087.

Abstract:
As tea is an important economic crop in many regions, efficient and accurate methods for remotely identifying tea plantations are essential for the implementation of sustainable tea practices and for periodic monitoring. In this study, we developed and tested a method for tea plantation identification based on multi-temporal Sentinel-2 images and a multi-feature Random Forest (RF) algorithm. We used phenological patterns of tea cultivation in China’s Shihe District (such as the multiple annual growing, harvest, and pruning stages) to extract multi-temporal Sentinel-2 MSI bands, their derived first spectral derivative, NDVI and textures, and topographic features. We then assessed feature importance using RF analysis; the optimal combination of features was used as the input variable for RF classification to extract tea plantations in the study area. A comparison of our results with those achieved using the Support Vector Machine method and statistical data from local government departments showed that our method had a higher producer’s accuracy (96.57%) and user’s accuracy (96.02%). These results demonstrate that: (1) multi-temporal and multi-feature classification can improve the accuracy of tea plantation recognition, (2) RF classification feature importance analysis can effectively reduce feature dimensions and improve classification efficiency, and (3) the combination of multi-temporal Sentinel-2 images and the RF algorithm improves our ability to identify and monitor tea plantations.
47

Yallini, S. K. Komagal. "An Ensemble Methods of Predicting the New Labels with Concept Drift from a High-Dimensional Data Stream." International Journal of Advanced Research in Computer Science 15, no. 2 (April 20, 2024): 92–100. http://dx.doi.org/10.26483/ijarcs.v15i2.7068.

Abstract:
Multi-Label Learning (MLL) has arisen in data engineering to identify instances based on a specific feature associated with a collection of labels. Adaptive learning necessitates classifying features with New Labels (NLs) if a data stream contains newer perspectives. As a result, an MLL with Emerging Multiple NLs (MuEMNL) and managing High-Dimensional data streams (MuEMNLHD) approaches were developed that divide the NL sets into multiple NLs for efficient classification. However, they did not handle concept drift issues when huge amounts of data arrived at high speeds using limited resources. Hence, this article proposes an adaptive ensemble learning approach to cope with a huge amount of data streams and solve concept drift issues by constructing a MuEMNL-Ensemble Neural Network (ENN) rather than a random forest classifier. It defines the number of NNs in the ensemble, whether or not they use constructive pruning, how many hidden nodes each NN uses, and how many training samples are used to train each NN independently. Also, to solve the concept drifts, pairwise and non-pairwise diversity measures are analyzed while constructing ensemble NN for efficient training using the entire learning examples. Moreover, the tradeoff between the NN’s precision and diversity is maintained simultaneously. At last, the test outcomes reveal that the proposed approach attains a better performance contrasted with the existing MLL approaches.
48

Joshi, Ankur, Sukanya Sharma, N. V. M. Rao, and A. K. Vaish. "Usage of Machine Learning Algorithm Models to Predict Operational Efficiency Performance of Selected Banking Sectors of India." International Journal of Emerging Technology and Advanced Engineering 12, no. 6 (June 2, 2022): 105–14. http://dx.doi.org/10.46338/ijetae0622_14.

Abstract:
It was an attempt to predict the impact of NPAs in the selected public (SBI, BoI, BoB, BoM, CBoI, AB, CB, AlB) and private (AxB, ICB, HDFCB and KB) banking sectors from 2008 to 2019. The data was also used to predict operational performance efficiency of these banking sectors after extracting through machine learning (ML) algorithm models and statistical interpretation of prediction accuracy by using WEKA tool. We used different models viz. NaiveBayes (NB), BayesNet (BN), logistic regression (LgR), Sequential minimal optimization of Support Vector Machine regression (SMOreg), Linear Logistic Regression (SL), Classification via Regression (CR), LogitBoost (LB); Logistic Model Tree (LMT), Random Forest & Random tree (RF & RT), Pruned & unpruned decision tree C4.5 (J48), and Class implementing minimal cost-complexity pruning (Cart) related to 15 attributes viz. GNPA, NNPA, GDP, CPI, PSL, TL, STA, GDP-1, RR, CPI-1, TE, TP and USTA as numeric as well as Banks, Year, GNPA>6, and GNPA>7, as nominal categories of dataset where overall performance accuracy was determined. Under 10-fold cross-validation, the highest classification accuracies were obtained for LB (78.47%) and Cart (74.30%), followed by J48 (73.61%), CR (72.91%) and LMT (69.44%), and the lowest for SMO (34.72%). Additionally, these predicted results may have valuable implications for Indian banking sectors. We evaluated the operational efficiency as cumulative performance for 12 banking sectors as per assumed cut off values of GNPA. It may vary with other independent variables like credit risk parameters, etc. Future studies are suggested to include parameters of deposit collection and investment to determine the credit risk of these banking sectors. Keywords: Indian banking sectors, Machine learning models, Non-performing assets, Operational efficiency, WEKA tool
49

Abounoas, Zahira, Wassim Raphael, Yarob Badr, Rafic Faddoul, and Anne Guillaume. "Crash data reporting systems in fourteen Arab countries: challenges and improvement." Archives of Transport 56, no. 4 (December 30, 2020): 73–88. http://dx.doi.org/10.5604/01.3001.0014.5628.

Abstract:
Traffic crash fatalities and serious injuries still represent a big burden for most Arab countries because the actual policies, strategies, and interventions are based on poorly collected data. Through this paper, we assessed the crash data reporting systems in fourteen Arab countries via a survey conducted to identify the fundamental dysfunctions at the management and data collection levels. Then, to address some of the dataset problems, we applied data mining techniques to select a minimum of variables (crash, vehicle, and road user) that should be collected for a better understanding of crash circumstances. For this reason, three methods of selection (correlation, information gain, and gain ratio) and seven classifiers (naive Bayes, nearest neighbour, random forest, random tree, J48, reduced error pruning tree, and bagging) were tested and compared to identify the variables that significantly affect crash severity. The decision tree family of classifiers showed the best performance based on the analysis of the area under the curve. The explanatory variables obtained from the data mining process were combined with other descriptive variables to maintain traceability. As a result, we produced hybrid lists of variables for the crash, vehicle, and road user, each containing 25 variables. Finally, in order to propose a cost-effective solution to switch from manual to electronic data collection, we were inspired by a tool used to track animals to create and customize a unified e-form for handheld devices, in order to ensure easy entry of the harmonized data for the entire region based on our selected lists of variables. The tool met the countries' requirements, especially by enabling data collection and transfer with and without the internet, and by allowing data analysis through its built-in Geographic Information System (GIS) capabilities.
50

Singh, Sanjay, Rajiv Pandey, and Rameshwar Das. "Estimation of Diospyros melanoxylon Roxb. Leaves Production in Forests of Jharkhand, India." Asian Plant Research Journal 11, no. 6 (October 16, 2023): 1–8. http://dx.doi.org/10.9734/aprj/2023/v11i6226.

Abstract:
Diospyros melanoxylon Roxb. leaves, used in the manufacture of an indigenous traditional cigarette called bidi, contribute to the socio-economic livelihood of rural and tribal people in India, generating a source of subsidiary occupation and supplementary income apart from providing significant revenue to state forest departments. However, a reliable, scientific and statistically sound estimate of production is essential to obtain optimal revenue and livelihood opportunities in the sector. Thus, the present study was carried out to quantify production of D. melanoxylon leaves in the state of Jharkhand, India on a scientific basis. Collection of D. melanoxylon leaves being a time-bound seasonal activity, field survey and data collection were planned by dividing the state into five administrative zones, namely Palamu, Hazaribagh, Giridih, Singhbhum and Ranchi, divided into 45 MFP ranges comprising 295 lots and 686 collection units. The focus of sampling was estimation of the number of plants in a fixed area (i.e. plant density), number of pluckable leaves per plant, number of total pluckable leaves per plant, and plant growth geometry. A novel sampling strategy, designated Stratified Cluster Line Transect Quadrat Random Sampling, was designed, comprising essential elements of sampling strategy and vegetational survey methods, with a line transect of 100m x 10m and a quadrat of 5m x 5m. In all, 52% of collection units from all lots of all MFP ranges were surveyed. Three permanent plots per MFP range were also maintained to evaluate the quantum of leaf harvesting throughout the collection season by collecting all pluckable leaves. The productivity of D. melanoxylon leaves was found to be around 11.50 lakh standard bags (52000 tonnes). However, the realized yield may be less in the absence of pruning/pollarding and other silvicultural operations, and because of political disturbance severely hampering efficient working.