Journal articles on the topic 'Random Forests Classifiers'

Author: Grafiati

Published: 28 June 2021

Last updated: 14 February 2022

Consult the top 50 journal articles for your research on the topic 'Random Forests Classifiers.'

1

Sadorsky, Perry. "Predicting Gold and Silver Price Direction Using Tree-Based Classifiers." Journal of Risk and Financial Management 14, no. 5 (April 29, 2021): 198. http://dx.doi.org/10.3390/jrfm14050198.

Abstract:

Gold is often used by investors as a hedge against inflation or adverse economic times. Consequently, it is important for investors to have accurate forecasts of gold prices. This paper uses several machine learning tree-based classifiers (bagging, stochastic gradient boosting, random forests) to predict the price direction of gold and silver exchange traded funds. Decision tree bagging, stochastic gradient boosting, and random forests predictions of gold and silver price direction are much more accurate than those obtained from logit models. For a 20-day forecast horizon, tree bagging, stochastic gradient boosting, and random forests produce accuracy rates of between 85% and 90% while logit models produce accuracy rates of between 55% and 60%. Stochastic gradient boosting accuracy is a few percentage points less than that of random forests for forecast horizons over 10 days. For those looking to forecast the direction of gold and silver prices, tree bagging and random forests offer an attractive combination of accuracy and ease of estimation. For each of gold and silver, a portfolio based on the random forests price direction forecasts outperformed a buy and hold portfolio.
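A price-direction classifier needs a binary target before any tree-based model can be trained. The sketch below shows one plausible way to build that target for a fixed forecast horizon and to score predictions by accuracy. The function names, the toy prices, and the rule "up means strictly higher after `horizon` steps" are illustrative assumptions, not details taken from the paper.

```python
def direction_labels(prices, horizon):
    """Label each time step 1 if the price is strictly higher `horizon` steps later, else 0.

    Assumption for illustration: the paper's exact labelling rule may differ.
    """
    return [
        1 if prices[t + horizon] > prices[t] else 0
        for t in range(len(prices) - horizon)
    ]


def accuracy(true_labels, predicted_labels):
    """Fraction of time steps where the predicted direction matches the true one."""
    hits = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return hits / len(true_labels)


# Toy price series (made up) and a 2-step horizon.
prices = [10, 11, 9, 12, 13, 8]
labels = direction_labels(prices, horizon=2)

# Hypothetical classifier output for the same four time steps.
predictions = [0, 1, 0, 0]
hit_rate = accuracy(labels, predictions)
```

The last `horizon` observations get no label because their future price is not yet known, which is why the label list is shorter than the price series.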

2

Kulyukin, Vladimir, Nikhil Ganta, and Anastasiia Tkachenko. "On Image Classification in Video Analysis of Omnidirectional Apis Mellifera Traffic: Random Reinforced Forests vs. Shallow Convolutional Networks." Applied Sciences 11, no. 17 (September 2, 2021): 8141. http://dx.doi.org/10.3390/app11178141.

Abstract:

Omnidirectional honeybee traffic is the number of bees moving in arbitrary directions in close proximity to the landing pad of a beehive over a period of time. Automated video analysis of such traffic is critical for continuous colony health assessment. In our previous research, we proposed a two-tier algorithm to measure omnidirectional bee traffic in videos. Our algorithm combines motion detection with image classification: in tier 1, motion detection functions as class-agnostic object location to generate regions with possible objects; in tier 2, each region from tier 1 is classified by a class-specific classifier. In this article, we present an empirical and theoretical comparison of random reinforced forests and shallow convolutional networks as tier 2 classifiers. A random reinforced forest is a random forest trained on a dataset with reinforcement learning. We present several methods of training random reinforced forests and compare their performance with shallow convolutional networks on seven image datasets. We develop a theoretical framework to assess the complexity of image classification by an image classifier. We formulate and prove three theorems on finding optimal random reinforced forests. Our conclusion is that, despite their limitations, random reinforced forests are a reasonable alternative to convolutional networks when memory footprints and classification and energy efficiencies are important factors. We outline several ways in which the performance of random reinforced forests may be improved.

3

Daho, Mostafa El Habib, and Mohammed Amine Chikh. "Combining Bootstrapping Samples, Random Subspaces and Random Forests to Build Classifiers." Journal of Medical Imaging and Health Informatics 5, no. 3 (June 1, 2015): 539–44. http://dx.doi.org/10.1166/jmihi.2015.1423.

4

Alhudhaif, Adi. "A novel multi-class imbalanced EEG signals classification based on the adaptive synthetic sampling (ADASYN) approach." PeerJ Computer Science 7 (May 14, 2021): e523. http://dx.doi.org/10.7717/peerj-cs.523.

Abstract:

Background: Brain signals (EEG, electroencephalography) are a gold standard frequently used in epilepsy prediction. It is crucial to predict epilepsy, which is common in the community. Early diagnosis is essential to reduce the treatment process of the disease and to keep the process healthier. Methods: In this study, a five-class dataset was used: EEG signals from different individuals, healthy EEG signals from tumor document, EEG signal with epilepsy, EEG signal with eyes closed, and EEG signal with eyes open. Four different methods have been proposed to classify five classes of EEG signals. In the first approach, the EEG signal was first divided into four different bands (beta, alpha, theta, and delta), and then 25 time-domain features were extracted from each band, and the main EEG signal and these extracted features were combined to obtain 125 time-domain features (feature extraction). Using the Random Forests classifier, EEG activities were classified into five classes. In the second approach, each One-Against-One (OVO) approach with 125 attributes was split into ten parts, pairwise, and then each piece was classified with the Random Forests classifier. The majority voting scheme was used to combine decisions from the ten classifiers. In the third proposed method, each One-Against-All (OVA) approach with 125 attributes was divided into five parts, and then each piece was classified with the Random Forests classifier. The majority voting scheme was used to combine decisions from the five classifiers. In the fourth proposed approach, each One-Against-All (OVA) approach with 125 attributes was divided into five parts. Since each piece obtained had an imbalanced data distribution, an adaptive synthetic (ADASYN) sampling approach was used to stabilize each piece. Then, each balanced piece was classified with the Random Forests classifier. To combine the decisions obtained from each classifier, the majority voting scheme has been used.
Results: The first approach achieved 71.90% classification success in classifying five-class EEG signals. The second approach achieved a classification success of 91.08% in classifying five-class EEG signals. The third method achieved 89% success, while the fourth proposed approach achieved 91.72% success. The results obtained show that the proposed fourth approach (the combination of the ADASYN sampling approach and Random Forest Classifier) achieved the best success in classifying five-class EEG signals. This proposed method could be used in the detection of epilepsy events in the EEG signals.
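The One-Against-One decomposition with majority voting can be illustrated in a few lines. With five classes there are ten unordered class pairs, which matches the "ten parts" in the abstract. In this sketch the pairwise models are hypothetical nearest-prototype rules on a one-dimensional input, standing in for the trained Random Forests classifiers. All names and numbers are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def ovo_pairs(classes):
    """Return every unordered pair of classes (one binary problem per pair)."""
    return list(combinations(classes, 2))


def majority_vote(votes):
    """Return the most frequent label; ties go to the label that was seen first."""
    return Counter(votes).most_common(1)[0][0]


def ovo_predict(sample, classes, pairwise_classifiers):
    """Collect one vote from each pairwise classifier and combine by majority."""
    votes = [pairwise_classifiers[pair](sample) for pair in ovo_pairs(classes)]
    return majority_vote(votes)


# Hypothetical stand-in for trained binary models: each "classifier" picks
# whichever of its two class prototypes is closer to a 1-D sample.
classes = [0, 1, 2, 3, 4]
prototypes = {c: float(c) for c in classes}


def make_pair_classifier(a, b):
    return lambda x: a if abs(x - prototypes[a]) <= abs(x - prototypes[b]) else b


pairwise = {pair: make_pair_classifier(*pair) for pair in ovo_pairs(classes)}
n_pairs = len(pairwise)                         # 5 classes give 10 pairs
predicted = ovo_predict(2.2, classes, pairwise)
```

A One-Against-All variant would instead train one binary model per class (five here) and combine their decisions in the same voting step.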

5

Yu, Tianyu, Cuiwei Liu, Zhuo Yan, and Xiangbin Shi. "A Multi-Task Framework for Action Prediction." Information 11, no. 3 (March 16, 2020): 158. http://dx.doi.org/10.3390/info11030158.

Abstract:

Predicting the categories of actions in partially observed videos is a challenging task in the computer vision field. The temporal progress of an ongoing action is of great importance for action prediction, since actions can present different characteristics at different temporal stages. To this end, we propose a novel multi-task deep forest framework, which treats temporal progress analysis as a relevant task to action prediction and takes advantage of observation ratio labels of incomplete videos during training. The proposed multi-task deep forest is a cascade structure of random forests and multi-task random forests. Unlike the traditional single-task random forests, multi-task random forests are built upon incomplete training videos annotated with action labels as well as temporal progress labels. Meanwhile, incorporating both random forests and multi-task random forests can increase the diversity of classifiers and improve the discriminative power of the multi-task deep forest. Experiments on the UT-Interaction and the BIT-Interaction datasets demonstrate the effectiveness of the proposed multi-task deep forest.

6

Polaka, Inese, Igor Tom, and Arkady Borisov. "Decision Tree Classifiers in Bioinformatics." Scientific Journal of Riga Technical University. Computer Sciences 42, no. 1 (January 1, 2010): 118–23. http://dx.doi.org/10.2478/v10143-010-0052-4.

Abstract:

This paper presents a literature review of articles related to the use of decision tree classifiers in gene microarray data analysis published in the last ten years. The main focus is on research solving the cancer classification problem using single decision tree classifiers (algorithms C4.5 and CART) and decision tree forests (e.g. random forests), showing strengths and weaknesses of the proposed methodologies when compared to other popular classification methods. The article also touches on the use of decision tree classifiers in gene selection.

7

El Habib Daho, Mostafa, Nesma Settouti, Mohammed El Amine Bechar, Amina Boublenza, and Mohammed Amine Chikh. "A new correlation-based approach for ensemble selection in random forests." International Journal of Intelligent Computing and Cybernetics 14, no. 2 (March 23, 2021): 251–68. http://dx.doi.org/10.1108/ijicc-10-2020-0147.

Abstract:

Purpose: Ensemble methods have been widely used in the field of pattern recognition due to the difficulty of finding a single classifier that performs well on a wide variety of problems. Despite the effectiveness of these techniques, studies have shown that ensemble methods generate a large number of hypotheses and that these contain redundant classifiers in most cases. Several works proposed in the state of the art attempt to reduce all hypotheses without affecting performance. Design/methodology/approach: In this work, the authors are proposing a pruning method that takes into consideration the correlation between classifiers/classes and each classifier with the rest of the set. The authors have used the random forest algorithm as trees-based ensemble classifiers and the pruning was made by a technique inspired by the CFS (correlation feature selection) algorithm. Findings: The proposed method CES (correlation-based Ensemble Selection) was evaluated on ten datasets from the UCI machine learning repository, and the performances were compared to six ensemble pruning techniques. The results showed that our proposed pruning method selects a small ensemble in a smaller amount of time while improving classification rates compared to the state-of-the-art methods. Originality/value: CES is a new ordering-based method that uses the CFS algorithm. CES selects, in a short time, a small sub-ensemble that outperforms results obtained from the whole forest and the other state-of-the-art techniques used in this study.

8

Krautenbacher, Norbert, Fabian J. Theis, and Christiane Fuchs. "Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies." Computational and Mathematical Methods in Medicine 2017 (2017): 1–18. http://dx.doi.org/10.1155/2017/7847531.

Abstract:

Epidemiological studies often utilize stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when applying classifiers on nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods to resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits from only the parametric inverse-probability bagging proposed by us. For other classifiers, correction is mostly advantageous, and methods perform uniformly. We discuss consequences of inappropriate distribution assumptions and reason for different behaviors between the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if distribution assumptions are roughly fulfilled. We provide our implementation in the R package sambia.

9

Liu, Sheng, Yixin Chen, and Dawn Wilkins. "Large margin classifiers and Random Forests for integrated biological prediction." International Journal of Bioinformatics Research and Applications 8, no. 1/2 (2012): 38. http://dx.doi.org/10.1504/ijbra.2012.045975.

10

Van Assche, Anneleen, Celine Vens, Hendrik Blockeel, and Sašo Džeroski. "First order random forests: Learning relational classifiers with complex aggregates." Machine Learning 64, no. 1-3 (June 21, 2006): 149–82. http://dx.doi.org/10.1007/s10994-006-8713-9.

11

Forsey, D., B. Leblon, A. LaRocque, M. Skinner, and A. Douglas. "EELGRASS MAPPING IN ATLANTIC CANADA USING WORLDVIEW-2 IMAGERY." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B3-2020 (August 21, 2020): 685–92. http://dx.doi.org/10.5194/isprs-archives-xliii-b3-2020-685-2020.

Abstract:

Eelgrass (Zostera marina L.) is a marine angiosperm plant that grows throughout coastal areas in Atlantic Canada. Eelgrass meadows provide numerous ecosystem services, and while they have been acknowledged as important habitats, their location, extent, and health in Atlantic Canada are poorly understood. This study examined the effectiveness of WorldView-2 optical satellite imagery to map eelgrass presence in Tabusintac Bay, New Brunswick (Canada), an estuarine lagoon with extensive eelgrass coverage. The imagery was classified using two supervised classifiers: the parametric Maximum Likelihood Classifier (MLC) and the non-parametric Random Forests (RF) classifier. While Random Forests was expected to produce higher classification accuracies, it was shown not to be much better than MLC. The overall validation accuracy was 97.6% with RF and 99.8% with MLC.

12

Yao, Jianzhuang, Hong Guo, and Xiaohan Yang. "PPCM: Combing Multiple Classifiers to Improve Protein-Protein Interaction Prediction." International Journal of Genomics 2015 (2015): 1–7. http://dx.doi.org/10.1155/2015/608042.

Abstract:

Determining protein-protein interaction (PPI) in biological systems is of considerable importance, and prediction of PPI has become a popular research area. Although different classifiers have been developed for PPI prediction, no single classifier seems to be able to predict PPI with high confidence. We postulated that by combining individual classifiers the accuracy of PPI prediction could be improved. We developed a method called protein-protein interaction prediction classifiers merger (PPCM), and this method combines output from two PPI prediction tools, GO2PPI and Phyloprof, using Random Forests algorithm. The performance of PPCM was tested by area under the curve (AUC) using an assembled Gold Standard database that contains both positive and negative PPI pairs. Our AUC test showed that PPCM significantly improved the PPI prediction accuracy over the corresponding individual classifiers. We found that additional classifiers incorporated into PPCM could lead to further improvement in the PPI prediction accuracy. Furthermore, cross species PPCM could achieve competitive and even better prediction accuracy compared to the single species PPCM. This study established a robust pipeline for PPI prediction by integrating multiple classifiers using Random Forests algorithm. This pipeline will be useful for predicting PPI in nonmodel species.

13

Wu, David J., Tony Feng, Michael Naehrig, and Kristin Lauter. "Privately Evaluating Decision Trees and Random Forests." Proceedings on Privacy Enhancing Technologies 2016, no. 4 (October 1, 2016): 335–55. http://dx.doi.org/10.1515/popets-2016-0043.

Abstract:

Decision trees and random forests are common classifiers with widespread use. In this paper, we develop two protocols for privately evaluating decision trees and random forests. We operate in the standard two-party setting where the server holds a model (either a tree or a forest), and the client holds an input (a feature vector). At the conclusion of the protocol, the client learns only the model’s output on its input and a few generic parameters concerning the model; the server learns nothing. The first protocol we develop provides security against semi-honest adversaries. We then give an extension of the semi-honest protocol that is robust against malicious adversaries. We implement both protocols and show that both variants are able to process trees with several hundred decision nodes in just a few seconds and a modest amount of bandwidth. Compared to previous semi-honest protocols for private decision tree evaluation, we demonstrate a tenfold improvement in computation and bandwidth.

14

Ranzato, Francesco, and Marco Zanella. "Abstract Interpretation of Decision Tree Ensemble Classifiers." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 5478–86. http://dx.doi.org/10.1609/aaai.v34i04.5998.

Abstract:

We study the problem of formally and automatically verifying robustness properties of decision tree ensemble classifiers such as random forests and gradient boosted decision tree models. A recent stream of works showed how abstract interpretation, which is ubiquitously used in static program analysis, can be successfully deployed to formally verify (deep) neural networks. In this work we push forward this line of research by designing a general and principled abstract interpretation-based framework for the formal verification of robustness and stability properties of decision tree ensemble models. Our abstract interpretation-based method may induce complete robustness checks of standard adversarial perturbations and output concrete adversarial attacks. We implemented our abstract verification technique in a tool called silva, which leverages an abstract domain of not necessarily closed real hyperrectangles and is instantiated to verify random forests and gradient boosted decision trees. Our experimental evaluation on the MNIST dataset shows that silva provides a precise and efficient tool which advances the current state of the art in tree ensembles verification.
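The idea of checking robustness over a hyperrectangle of inputs can be sketched with plain interval reasoning. The code below propagates a box through each tree, collects every leaf label the box can reach, and then applies a conservative voting test. It is a minimal illustration under assumed data structures (tuple-encoded trees, closed intervals, majority voting). It is not the silva tool or its algorithm, and its check is sound but incomplete: a `False` result means "not proven", not "an attack exists".

```python
def reachable_labels(tree, box):
    """Labels of all leaves that some point of the box may reach.

    tree: ("leaf", label) or ("node", feature, threshold, left, right),
          where the left branch is taken when x[feature] <= threshold.
    box:  list of (low, high) closed intervals, one per feature.
    """
    if tree[0] == "leaf":
        return {tree[1]}
    _, feature, threshold, left, right = tree
    low, high = box[feature]
    labels = set()
    if low <= threshold:  # part of the box satisfies x <= threshold
        left_box = list(box)
        left_box[feature] = (low, min(high, threshold))
        labels |= reachable_labels(left, left_box)
    if high > threshold:  # part of the box satisfies x > threshold
        right_box = list(box)
        # Closed lower bound is a slight over-approximation of x > threshold.
        right_box[feature] = (max(low, threshold), high)
        labels |= reachable_labels(right, right_box)
    return labels


def certified_robust(forest, box):
    """True only if majority voting provably yields one label on the whole box."""
    label_sets = [reachable_labels(tree, box) for tree in forest]
    all_labels = set().union(*label_sets)
    for candidate in all_labels:
        # Trees that can only ever vote for the candidate.
        sure_votes = sum(1 for s in label_sets if s == {candidate})
        # Upper bound on the votes any single rival label could collect.
        rival_votes = max(
            (sum(1 for s in label_sets if rival in s)
             for rival in all_labels if rival != candidate),
            default=0,
        )
        if sure_votes > rival_votes:
            return True
    return False


def linf_box(point, eps):
    """Box of all inputs within L-infinity distance eps of the point."""
    return [(v - eps, v + eps) for v in point]


# Hypothetical forest of three decision stumps over two features.
forest = [
    ("node", 0, 0.5, ("leaf", "A"), ("leaf", "B")),
    ("node", 1, 0.5, ("leaf", "A"), ("leaf", "B")),
    ("node", 0, 0.3, ("leaf", "A"), ("leaf", "B")),
]
small_ok = certified_robust(forest, linf_box((0.2, 0.2), 0.05))
large_ok = certified_robust(forest, linf_box((0.2, 0.2), 0.4))
```

For the small perturbation every stump can only output "A", so the forest is certified. For the large one each stump can reach both leaves, so nothing can be proven with this test.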

15

Niculescu, S., J. Xia, D. Roberts, and A. Billey. "ROTATION FORESTS AND RANDOM FOREST CLASSIFIERS FOR MONITORING OF VEGETATION IN PAYS DE BREST (FRANCE)." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B3-2020 (August 21, 2020): 727–32. http://dx.doi.org/10.5194/isprs-archives-xliii-b3-2020-727-2020.

Abstract:

Remote sensing is a potentially very useful source of information for spatial monitoring of natural or cultivated vegetation. The latest advances, in particular the arrival of new image acquisition programs, are changing the temporal approach to monitoring vegetation. The latest European satellites launched, delivering an image every 5 days for each point on the globe, allow the end of a growing season to be monitored. The main objective of this work is to identify and map the vegetation in the Pays de Brest area by using a multi-sensor stack of Sentinel-1 and Sentinel-2 satellite data via Random Forest (RF), Rotation Forests (RoF) and Canonical Correlation Forests (CCFs). RoF and CCF create diverse base learners using data transformation and subset features. Twenty-four radar images and optical data representing different dates in 2017 were processed in time series stacks. The results of RoF and CCF were compared with those of RF.

16

Catal, Cagatay, Serkan Tugul, and Basar Akpinar. "Automatic Software Categorization Using Ensemble Methods and Bytecode Analysis." International Journal of Software Engineering and Knowledge Engineering 27, no. 07 (September 2017): 1129–44. http://dx.doi.org/10.1142/s0218194017500425.

Abstract:

Software repositories consist of thousands of applications and the manual categorization of these applications into domain categories is very expensive and time-consuming. In this study, we investigate the use of an ensemble of classifiers approach to solve the automatic software categorization problem when the source code is not available. Therefore, we used three data sets (package level/class level/method level) that belong to 745 closed-source Java applications from the Sharejar repository. We applied the Vote algorithm, AdaBoost, and Bagging ensemble methods and the base classifiers were Support Vector Machines, Naive Bayes, J48, IBk, and Random Forests. The best performance was achieved when the Vote algorithm was used. The base classifiers of the Vote algorithm were AdaBoost with J48, AdaBoost with Random Forest, and Random Forest algorithms. We showed that the Vote approach with method attributes provides the best performance for automatic software categorization; these results demonstrate that the proposed approach can effectively categorize applications into domain categories in the absence of source code.

17

Sharma, Ram, and Keitarou Hara. "Characterization of Vegetation Physiognomic Types Using Bidirectional Reflectance Data." Geosciences 8, no. 11 (October 29, 2018): 394. http://dx.doi.org/10.3390/geosciences8110394.

Abstract:

This paper presents an assessment of the bidirectional reflectance features for the classification and characterization of vegetation physiognomic types at a national scale. The bidirectional reflectance data at multiple illumination and viewing geometries were generated by simulating the Moderate Resolution Imaging Spectroradiometer (MODIS) Bidirectional Reflectance Distribution Function (BRDF) model parameters with Ross-Thick Li-Sparse-Reciprocal (RT-LSR) kernel weights. This research dealt with the classification and characterization of six vegetation physiognomic types—evergreen coniferous forest, evergreen broadleaf forest, deciduous coniferous forest, deciduous broadleaf forest, shrubs, and herbaceous—which are distributed all over the country. The supervised classification approach was used by employing four machine learning classifiers—k-Nearest Neighbors (KNN), Random Forests (RF), Support Vector Machines (SVM), and Multilayer Perceptron Neural Networks (NN)—with the support of ground truth data. The confusion matrix, overall accuracy, and kappa coefficient were calculated through a 10-fold cross-validation approach, and were also used as the metrics for quantitative evaluation. Among the classifiers tested, the accuracy metrics did not vary much with the classifiers; however, the Random Forests (RF; Overall accuracy = 0.76, Kappa coefficient = 0.72) and Support Vector Machines (SVM; Overall accuracy = 0.76, Kappa coefficient = 0.71) classifiers performed slightly better than other classifiers. The bidirectional reflectance spectra did not only vary with the vegetation physiognomic types, it also showed a pronounced difference between the backward and forward scattering directions. Thus, the bidirectional reflectance data provides additional features for improving the classification and characterization of vegetation physiognomic types at the broad scale.
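Overall accuracy and the kappa coefficient are both derived from a confusion matrix. The sketch below computes them with the standard definitions (observed agreement, and Cohen's kappa as agreement corrected for chance). The 2x2 matrix is made-up example data, not a result from the paper.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]        # true class counts
    col_totals = [sum(col) for col in zip(*matrix)]  # predicted class counts
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Made-up confusion matrix: rows are true classes, columns are predictions.
confusion = [
    [40, 10],
    [5, 45],
]
oa = overall_accuracy(confusion)
kappa = cohen_kappa(confusion)
```

Here the observed agreement is 0.85 and the chance agreement is 0.5, so kappa is 0.7. Kappa is always at most the overall accuracy, which is consistent with the paired values quoted in the abstract.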

18

Steyrl, David, Reinhold Scherer, Josef Faller, and Gernot R. Müller-Putz. "Random forests in non-invasive sensorimotor rhythm brain-computer interfaces: a practical and convenient non-linear classifier." Biomedical Engineering / Biomedizinische Technik 61, no. 1 (February 1, 2016): 77–86. http://dx.doi.org/10.1515/bmt-2014-0117.

Abstract:

There is general agreement in the brain-computer interface (BCI) community that although non-linear classifiers can provide better results in some cases, linear classifiers are preferable, particularly as non-linear classifiers often involve a number of parameters that must be carefully chosen. However, new non-linear classifiers were developed over the last decade. One of them is the random forest (RF) classifier. Although popular in other fields of science, RFs are not common in BCI research. In this work, we address three open questions regarding RFs in sensorimotor rhythm (SMR) BCIs: parametrization, online applicability, and performance compared to regularized linear discriminant analysis (LDA). We found that the performance of RF is constant over a large range of parameter values. We demonstrate, for the first time, that RFs are applicable online in SMR-BCIs. Further, we show in an offline BCI simulation that RFs statistically significantly outperform regularized LDA by about 3%. These results confirm that RFs are practical and convenient non-linear classifiers for SMR-BCIs. Taking into account further properties of RFs, such as independence from feature distributions, maximum margin behavior, multiclass and advanced data mining capabilities, we argue that RFs should be taken into consideration for future BCIs.

19

Dhamodaran, S., G. KipsonRoy, A. Kishor, J. Refonaa, and S. L. JanyShabu. "A Comparative Analysis of Rainfall Prediction Using Support Vector Machine and Random Forest." Journal of Computational and Theoretical Nanoscience 17, no. 8 (August 1, 2020): 3539–42. http://dx.doi.org/10.1166/jctn.2020.9227.

Abstract:

Classification and prediction of various data have become one of the most interesting areas of research. Numerous researchers are working on various data samples for predicting and classifying into various categories. Weather prediction is no exception. Rainfall is one of the most widely predicted data samples, as it needs to be predicted in advance in order to take preventive measures. Heavy rainfall can endanger day-to-day activities and could also lead to various other disasters such as floods. Though numerous research works have been carried out in this particular area, we have made a comparative analysis of predicting rainfall using an SVM classifier and a Random Forest classifier. A web portal is built which runs the SVM and Random Forest algorithms that are designed for predicting the rainfall. The comparison results show that SVM classifiers are able to predict better than the Random Forests algorithms by classifying the data sample for about 92%.

20

Ion-Margineanu, Adrian, Sofie Van Cauter, Diana M. Sima, Frederik Maes, Stefaan W. Van Gool, Stefan Sunaert, Uwe Himmelreich, and Sabine Van Huffel. "Tumour Relapse Prediction Using Multiparametric MR Data Recorded during Follow-Up of GBM Patients." BioMed Research International 2015 (2015): 1–13. http://dx.doi.org/10.1155/2015/842923.

Abstract:

Purpose. We have focused on finding a classifier that best discriminates between tumour progression and regression based on multiparametric MR data retrieved from follow-up GBM patients. Materials and Methods. Multiparametric MR data consisting of conventional and advanced MRI (perfusion, diffusion, and spectroscopy) were acquired from 29 GBM patients treated with adjuvant therapy after surgery over a period of several months. A 27-feature vector was built for each time point, although not all features could be obtained at all time points due to missing data or quality issues. We tested classifiers using LOPO method on complete and imputed data. We measure the performance by computing BER for each time point and wBER for all time points. Results. If we train random forests, LogitBoost, or RobustBoost on data with complete features, we can differentiate between tumour progression and regression with 100% accuracy, one time point (i.e., about 1 month) earlier than the date when doctors had put a label (progressive or responsive) according to established radiological criteria. We obtain the same result when training the same classifiers solely on complete perfusion data. Conclusions. Our findings suggest that ensemble classifiers (i.e., random forests and boost classifiers) show promising results in predicting tumour progression earlier than established radiological criteria and should be further investigated.
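The abstract does not spell out BER. Assuming it stands for the balanced error rate (the mean of the per-class error rates), the sketch below shows how it differs from the plain error rate on an imbalanced sample. The labels are invented for illustration, and the weighting used for wBER is not defined in the abstract, so it is left out.

```python
def balanced_error_rate(true_labels, predicted_labels):
    """Mean of the per-class error rates, so each class counts equally."""
    classes = sorted(set(true_labels))
    per_class_error = []
    for c in classes:
        indices = [i for i, t in enumerate(true_labels) if t == c]
        wrong = sum(1 for i in indices if predicted_labels[i] != c)
        per_class_error.append(wrong / len(indices))
    return sum(per_class_error) / len(classes)


def error_rate(true_labels, predicted_labels):
    """Plain share of misclassified samples, dominated by the larger class."""
    wrong = sum(1 for t, p in zip(true_labels, predicted_labels) if t != p)
    return wrong / len(true_labels)


# Invented, imbalanced example: 8 progressive cases and 2 responsive cases,
# scored against a trivial model that always predicts "progressive".
truth = ["progressive"] * 8 + ["responsive"] * 2
always_progressive = ["progressive"] * 10

ber = balanced_error_rate(truth, always_progressive)
plain = error_rate(truth, always_progressive)
```

The trivial model looks acceptable by plain error rate (0.2) but scores 0.5 on the balanced measure, which is why a balanced metric is a natural choice for small, imbalanced patient cohorts.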

21

Scott, I. M., W. Lin, M. Liakata, J. E. Wood, C. P. Vermeer, D. Allaway, J. L. Ward, et al. "Merits of random forests emerge in evaluation of chemometric classifiers by external validation." Analytica Chimica Acta 801 (November 2013): 22–33. http://dx.doi.org/10.1016/j.aca.2013.09.027.

22

Ahmed, Ismail Taha, Baraa Tareq Hammad, and Norziana Jamil. "Common Gabor Features for Image Watermarking Identification." Applied Sciences 11, no. 18 (September 8, 2021): 8308. http://dx.doi.org/10.3390/app11188308.

Abstract:

Image watermarking is one of many methods for preventing unauthorized alterations to digital images. The major goal of the research is to find and identify photos that include a watermark, regardless of the method used to add the watermark or the shape of the watermark. As a result, this study advocated using the best Gabor features and classifiers to improve the accuracy of image watermarking identification. As classifiers, discriminant analysis (DA) and random forests are used. The DA and random forest use mean squared energy feature, mean amplitude feature, and combined feature vector as inputs for classification. The performance of the classifiers is evaluated using a variety of feature sets, and the best results are achieved. To assess the performance of the proposed method, we use the public VOC2008 database. The findings reveal that our proposed method’s DA classifier with integrated features had the greatest TPR of 93.71 and the lowest FNR of 6.29. This shows that the performance outcomes of the proposed approach are consistent. The proposed method has the advantages of being able to find images with the watermark in any database and not requiring a specific type or algorithm for embedding the watermark.

23

Sharma, Ram C., Keitarou Hara, and Hidetake Hirayama. "A Machine Learning and Cross-Validation Approach for the Discrimination of Vegetation Physiognomic Types Using Satellite Based Multispectral and Multitemporal Data." Scientifica 2017 (2017): 1–8. http://dx.doi.org/10.1155/2017/9806479.

Abstract:

This paper presents the performance and evaluation of a number of machine learning classifiers for the discrimination between the vegetation physiognomic classes using the satellite based time-series of the surface reflectance data. Discrimination of six vegetation physiognomic classes, Evergreen Coniferous Forest, Evergreen Broadleaf Forest, Deciduous Coniferous Forest, Deciduous Broadleaf Forest, Shrubs, and Herbs, was dealt with in the research. Rich-feature data were prepared from time-series of the satellite data for the discrimination and cross-validation of the vegetation physiognomic types using machine learning approach. A set of machine learning experiments comprised of a number of supervised classifiers with different model parameters was conducted to assess how the discrimination of vegetation physiognomic classes varies with classifiers, input features, and ground truth data size. The performance of each experiment was evaluated by using the 10-fold cross-validation method. Experiment using the Random Forests classifier provided highest overall accuracy (0.81) and kappa coefficient (0.78). However, accuracy metrics did not vary much with experiments. Accuracy metrics were found to be very sensitive to input features and size of ground truth data. The results obtained in the research are expected to be useful for improving the vegetation physiognomic mapping in Japan.
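
Overall accuracy and the kappa coefficient, the two metrics reported in this abstract and in several others in this list, can both be computed from a confusion matrix. The sketch below assumes the usual definition of Cohen's kappa, (p_o - p_e) / (1 - p_e); the function name and the 2×2 example matrix are hypothetical and are not taken from the paper.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples of true class i that were
    predicted as class j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class example with 100 reference samples.
accuracy, kappa = overall_accuracy_and_kappa([[40, 10], [10, 40]])
```

For this toy matrix the overall accuracy is 0.8 and kappa is about 0.6, because half of the agreement would be expected by chance.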

24

Hanberry, Brice B. "Classifying Large Wildfires in the United States by Land Cover." Remote Sensing 12, no. 18 (September 12, 2020): 2966. http://dx.doi.org/10.3390/rs12182966.

Abstract:

Fire is an ecological process that also has socio-economic effects. To learn more about fire occurrence, I examined relationships between land classes and about 12,000 spatially delineated large wildfires (defined here as uncontrolled fires ≥200 ha, although definitions vary) during 1999 to 2017 in the conterminous United States. Using random forests, extreme gradient boosting, and c5.0 classifiers, I modeled all fires, first years (1999 to 2002), last years (2014 to 2017), the eastern, central, and western United States and seven ecoregions. The three classifiers performed well (true positive rates 0.82 to 0.94) at modeling all fires and fires by year, region, and ecoregion. The random forests classifier did not predict to other time intervals or regions as well as other classifiers and models were not constant in time and space. For example, the eastern region overpredicted fires in the western region and models for the western region underpredicted fires in the eastern region. Overall, greater abundance of herbaceous grasslands, or herbaceous wetlands in the eastern region, and evergreen forest and low abundance of crops and pasture characterized most large fires, even with regional differences. The 14 states in the northeastern United States with no or few large fires contained limited herbaceous area and abundant crops or developed lands. Herbaceous vegetation was the most important variable for fire occurrences in the western region. Lack of crops was most important for fires in the central region and a lack of pasture, crops, and developed open space was most important for fires in the eastern region. A combination of wildlands vegetation was most influential for most ecoregions, although herbaceous vegetation alone and lack of pasture, crops, and developed open space also were influential. 
Despite departure from historical fire regimes, these models demonstrated that herbaceous vegetation remains necessary for fires and that evergreen forests in particular are fire-prone, while reduction of vegetation surrounding housing developments will help provide a buffer to reduce large fires.

25

Mujtaba Khandy, Owais, and Samad Dadvandipour. "Analysis of machine learning algorithms for character recognition: a case study on handwritten digit recognition." Indonesian Journal of Electrical Engineering and Computer Science 21, no. 1 (January 1, 2021): 574. http://dx.doi.org/10.11591/ijeecs.v21.i1.pp574-581.

Abstract:

This paper covers the work done in handwritten digit recognition and the various classifiers that have been developed. Methods like MLP, SVM, Bayesian networks, and Random forests were discussed with their accuracy and are empirically evaluated. Boosted LeNet 4, an ensemble of various classifiers, has shown maximum efficiency among these methods.

26

Rajure, Pranita. "Prediction of Domestic Airline Tickets using Machine Learning." International Journal for Research in Applied Science and Engineering Technology 9, no. VI (June 14, 2021): 666–74. http://dx.doi.org/10.22214/ijraset.2021.35053.

Abstract:

Airlines usually keep their pricing strategies as commercial secrets, and information is always asymmetric, so it is difficult for ordinary customers to estimate future flight price changes. However, a reasonable prediction can help customers decide when to buy air tickets at a lower price. Flight price prediction can be regarded as a typical time series prediction problem. When you give customers a device that can help them save some money, they will pay you back with loyalty, which is priceless. Interesting fact: Fareboom users started spending twice as much time per session within a month of the release of an airfare price forecasting feature. Considering features such as departure time, the number of days left until departure, and the time of day, the model indicates the best time to buy the ticket. Features are extracted from the collected data to apply a Random Forest Machine Learning (ML) model. Using this information, we intend to build a system that can help buyers decide whether to buy a ticket or not. We have used the Random Forest algorithm, a popular supervised machine learning algorithm. It can be used for both classification and regression problems in ML. It is based on the concept of ensemble learning, which is a process of combining multiple classifiers to solve a complex problem and to improve the performance of the model. With that said, random forests are a strong modelling technique and much more robust than a single decision tree. They aggregate many decision trees to limit overfitting as well as error due to bias and therefore yield useful results. Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.

27

Vinasco, J. S., D. A. Rodríguez, S. Velásquez, D. F. Quintero, L. R. Livni, and F. L. Hernández. "Coverage Changes Detection at Ciénaga Grande, Santa Marta – Colombia Using Automatic Classification." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-3/W12-2020 (November 6, 2020): 195–200. http://dx.doi.org/10.5194/isprs-archives-xlii-3-w12-2020-195-2020.

Abstract:

The Ciénaga Grande, Santa Marta is the largest and most diverse ecosystem of its kind in Colombia. Its primary function is acting as a filter for the organic carbon cycle. Recently, this place has been suffering disruptions due to the anthropic activities taking place in its surroundings. In the present study, the changes in the surface of Ciénaga Grande, Santa Marta, Magdalena, Colombia between 2013 and 2018 were determined using semiautomatic detection methods with high resolution data from remote sensors (Landsat 8). The study zone was classified into six kinds of surfaces: 1) artificial territories, 2) agricultural territories, 3) forests and semi-natural areas, 4) wet areas, 5) deep water surfaces, and 6) a class related to clouds, used as a masking method. Random Forest classifiers were utilized and a feedforward multilayer perceptron artificial neural network (ANN) was simultaneously assessed. The training stage for both methods was performed with 300 samples, distributed in equal quantities over each coverage class. The semi-automatic classification was carried out with an annual frequency, but the monitoring was carried out throughout the analysis period through three indicators: Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI) and Normalized Difference Water Index (NDWI). It was found from the confusion matrix that the Random Forest method more accurately classified four classes, while Neural Networks Analysis (NNA) did so for just three. Finally, taking the Random Forest results into account, it was found that the agricultural expansion increased from 7% to 9% and the urban zone increased from 20% to 30% of the total area, while damp areas decreased from 27% to 12% and forests from 4% to 3% of the total area of study.

28

Jiang, Yufeng, Li Zhang, Min Yan, Jianguo Qi, Tianmeng Fu, Shunxiang Fan, and Bowei Chen. "High-Resolution Mangrove Forests Classification with Machine Learning Using Worldview and UAV Hyperspectral Data." Remote Sensing 13, no. 8 (April 15, 2021): 1529. http://dx.doi.org/10.3390/rs13081529.

Abstract:

Mangrove forests, as important ecological and economic resources, have suffered a loss in the area due to natural and human activities. Monitoring the distribution of and obtaining accurate information on mangrove species is necessary for ameliorating the damage and protecting and restoring mangrove forests. In this study, we compared the performance of UAV Rikola hyperspectral images, WorldView-2 (WV-2) satellite-based multispectral images, and a fusion of data from both in the classification of mangrove species. We first used recursive feature elimination‒random forest (RFE-RF) to select the vegetation’s spectral and texture feature variables, and then implemented random forest (RF) and support vector machine (SVM) algorithms as classifiers. The results showed that the accuracy of the combined data was higher than that of UAV and WV-2 data; the vegetation index features of UAV hyperspectral data and texture index of WV-2 data played dominant roles; the overall accuracy of the RF algorithm was 95.89% with a Kappa coefficient of 0.95, which is more accurate and efficient than SVM. The use of combined data and RF methods for the classification of mangrove species could be useful in biomass estimation and breeding cultivation.

29

Yang, Jianbo, Jianchu Xu, and De-Li Zhai. "Integrating Phenological and Geographical Information with Artificial Intelligence Algorithm to Map Rubber Plantations in Xishuangbanna." Remote Sensing 13, no. 14 (July 16, 2021): 2793. http://dx.doi.org/10.3390/rs13142793.

Abstract:

Most natural rubber trees (Hevea brasiliensis) are grown on plantations, making rubber an important industrial crop. Rubber plantations are also an important source of household income for over 20 million people. The accurate mapping of rubber plantations is important for both local governments and the global market. Remote sensing has been a widely used approach for mapping rubber plantations, typically using optical remote sensing data obtained at the regional scale. Improving the efficiency and accuracy of rubber plantation maps has become a research hotspot in rubber-related literature. To improve the classification efficiency, researchers have combined the phenology, geography, and texture of rubber trees with spectral information. Among these, there are three main classifiers: maximum likelihood, QUEST decision tree, and random forest methods. However, until now, no comparative studies have been conducted for the above three classifiers. Therefore, in this study, we evaluated the mapping accuracy based on these three classifiers, using four kinds of data input: Landsat spectral information, phenology–Landsat spectral information, topography–Landsat spectral information, and phenology–topography–Landsat spectral information. We found that the random forest method had the highest mapping accuracy when compared with the maximum likelihood and QUEST decision tree methods. We also found that adding either phenology or topography could improve the mapping accuracy for rubber plantations. When either phenology or topography were added as parameters within the random forest method, the kappa coefficient increased by 5.5% and 6.2%, respectively, compared to the kappa coefficient for the baseline Landsat spectral band data input. The highest accuracy was obtained from the addition of both phenology–topography–Landsat spectral bands to the random forest method, achieving a kappa coefficient of 97%. 
We therefore mapped rubber plantations in Xishuangbanna using the random forest method, with the addition of phenology and topography information from 1990–2020. Our results demonstrated the usefulness of integrating phenology and topography for mapping rubber plantations. The machine learning approach showed great potential for accurate regional mapping, particularly by incorporating plant habitat and ecological information. We found that during 1990–2020, the total area of rubber plantations had expanded to over three times their former area, while natural forests had lost 17.2% of their former area.

30

Moni, Vidya. "Human Papillomavirus Targeted Immunotherapy Outcome Prediction Using Machine Learning." International Journal for Research in Applied Science and Engineering Technology 9, no. VII (July 31, 2021): 3598–611. http://dx.doi.org/10.22214/ijraset.2021.37197.

Abstract:

Warts caused by the Human Papillomavirus (HPV) are highly contagious and affect several million people across the globe every year, appearing as small lesions on the skin. Warts can be treated effectively with several methods, the most effective being Immunotherapy and Cryotherapy. Our research is focused on the performance comparison of modern Machine Learning classification techniques to predict the outcome (positive or negative) of Immunotherapy treatment given to a patient, by using patient data as input features to our classifiers. The precision, recall, f-measure and accuracy were used to compare the performance of the various classifiers considered in this study. We considered Logistic Regression, ZeroR, AdaBoost, K-Nearest Neighbours (KNN), Support Vector Machines (SVM), Gradient Boosting, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), Decision Trees and Random Forests. The ZeroR classifier was used as a baseline to provide us with insights into the skewed nature of the data, so as to enable us to better understand the comparison in performance of the various classifiers.

31

Sothe, C., L. E. C. la Rosa, C. M. de Almeida, A. Gonsamo, M. B. Schimalski, J. D. B. Castro, R. Q. Feitosa, et al. "Evaluating a Convolutional Neural Network for Feature Extraction and Tree Species Classification Using UAV-Hyperspectral Images." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences V-3-2020 (August 3, 2020): 193–99. http://dx.doi.org/10.5194/isprs-annals-v-3-2020-193-2020.

Abstract:

The classification of tree species can significantly benefit from high spatial and spectral information acquired by unmanned aerial vehicles (UAVs) associated with advanced feature extraction and classification methods. Different from the traditional feature extraction methods, that highly depend on user’s knowledge, the convolutional neural network (CNN)-based method can automatically learn and extract the spatial-related features layer by layer. However, in order to capture significant features of the data, the CNN classifier requires a large number of training samples, which are hardly available when dealing with tree species in tropical forests. This study investigated the following topics concerning the classification of 14 tree species in a subtropical forest area of Southern Brazil: i) the performance of the CNN method associated with a previous step to increase and balance the sample set (data augmentation) for tree species classification as compared to the conventional machine learning methods support vector machine (SVM) and random forest (RF) using the original training data; ii) the performance of the SVM and RF classifiers when associated with a data augmentation step and spatial features extracted from a CNN. Results showed that the CNN classifier outperformed the conventional SVM and RF classifiers, reaching an overall accuracy (OA) of 84.37% and Kappa of 0.82. The SVM and RF had a poor accuracy with the original spectral bands (OA 62.67% and 59.24%) but presented an increase between 14% and 21% in OA when associated with a data augmentation and spatial features extracted from a CNN.

32

Nguyen, Huong Thi Thanh, Trung Minh Doan, Erkki Tomppo, and Ronald E. McRoberts. "Land Use/Land Cover Mapping Using Multitemporal Sentinel-2 Imagery and Four Classification Methods—A Case Study from Dak Nong, Vietnam." Remote Sensing 12, no. 9 (April 26, 2020): 1367. http://dx.doi.org/10.3390/rs12091367.

Abstract:

Information on land use and land cover (LULC) including forest cover is important for the development of strategies for land planning and management. Satellite remotely sensed data of varying resolutions have been an unmatched source of such information that can be used to produce estimates with a greater degree of confidence than traditional inventory estimates. However, use of these data has always been a challenge in tropical regions owing to the complexity of the biophysical environment, clouds, and haze, and atmospheric moisture content, all of which impede accurate LULC classification. We tested a parametric classifier (logistic regression) and three non-parametric machine learning classifiers (improved k-nearest neighbors, random forests, and support vector machine) for classification of multi-temporal Sentinel 2 satellite imagery into LULC categories in Dak Nong province, Vietnam. A total of 446 images, 235 from the year 2017 and 211 from the year 2018, were pre-processed to gain high quality images for mapping LULC in the 6516 km² study area. The Sentinel 2 images were tested and classified separately for four temporal periods: (i) dry season, (ii) rainy season, (iii) the entirety of the year 2017, and (iv) the combination of dry and rainy seasons. Eleven different LULC classes were discriminated of which five were forest classes. For each combination of temporal image set and classifier, a confusion matrix was constructed using independent reference data and pixel classifications, and the area on the ground of each class was estimated. For overall temporal periods and classifiers, overall accuracy ranged from 63.9% to 80.3%, and the Kappa coefficient ranged from 0.611 to 0.813. Area estimates for individual classes ranged from 70 km² (1% of the study area) to 2200 km² (34% of the study area) with greater uncertainties for smaller classes.

33

Ćwiklińska-Jurkowska, Małgorzata. "Gene selection ensembles and classifier ensembles for medical diagnosis." Biometrical Letters 56, no. 2 (December 1, 2019): 117–38. http://dx.doi.org/10.2478/bile-2019-0007.

Abstract:

The usefulness of combining methods is examined using the example of microarray cancer data sets, where expression levels of huge numbers of genes are reported. Problems of discrimination into two groups are examined on three data sets relating to the expression of huge numbers of genes. For the three examined microarray data sets, the cross-validation errors evaluated on the remaining half of the whole data set, not used earlier for the selection of genes, were used as measures of classifier performance. Common single procedures for the selection of genes—Prediction Analysis of Microarrays (PAM) and Significance Analysis of Microarrays (SAM)—were compared with the fusion of eight selection procedures, or of a smaller subset of five of them, excluding SAM or PAM. Merging five or eight selection methods gave similar results. Based on the misclassification rates for the three examined microarray data sets, for any examined ensemble of classifiers, the combining of gene selection methods was not superior to single PAM or SAM selection for two of the examined data sets. Additionally, the procedure of heterogeneous combining of five base classifiers—k-nearest neighbors, SVM linear and SVM radial with parameter c=1, shrunken centroids regularized classifier (SCRDA) and nearest mean classifier—proved to significantly outperform resampling classifiers such as bagging decision trees. Heterogeneously combined classifiers also outperformed double bagging for some ranges of gene numbers and data sets, but merging is generally not superior to random forests. The preliminary step of combining gene rankings was generally not essential for the performance for either heterogeneously or homogeneously combined classifiers.

34

Lee, Heeyoung, Mihai Surdeanu, and Dan Jurafsky. "A scaffolding approach to coreference resolution integrating statistical and rule-based models." Natural Language Engineering 23, no. 5 (March 21, 2017): 733–62. http://dx.doi.org/10.1017/s1351324917000109.

Abstract:

We describe a scaffolding approach to the task of coreference resolution that incrementally combines statistical classifiers, each designed for a particular mention type, with rule-based models (for sub-tasks well-matched to determinism). We motivate our design by an oracle-based analysis of errors in a rule-based coreference resolution system, showing that rule-based approaches are poorly suited to tasks that require a large lexical feature space, such as resolving pronominal and common-noun mentions. Our approach combines many advantages: it incrementally builds clusters integrating joint information about entities, uses rules for deterministic phenomena, and integrates rich lexical, syntactic, and semantic features with random forest classifiers well-suited to modeling the complex feature interactions that are known to characterize the coreference task. We demonstrate that all these decisions are important. The resulting system achieves 63.2 F1 on the CoNLL-2012 shared task dataset, outperforming the rule-based starting point by over seven F1 points. Similarly, our system outperforms an equivalent sieve-based approach that relies on logistic regression classifiers instead of random forests by over four F1 points. Lastly, we show that by changing the coreference resolution system from relying on constituent-based syntax to using dependency syntax, which can be generated in linear time, we achieve a runtime speedup of 550 per cent without considerable loss of accuracy.

35

Trawiński, Krzysztof, Oscar Cordón, and Arnaud Quirin. "On Designing Fuzzy Rule-Based Multiclassification Systems by Combining FURIA with Bagging and Feature Selection." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 19, no. 04 (August 2011): 589–633. http://dx.doi.org/10.1142/s0218488511007155.

Abstract:

In this work, we conduct a study considering a fuzzy rule-based multiclassification system design framework based on Fuzzy Unordered Rule Induction Algorithm (FURIA). This advanced method serves as the fuzzy classification rule learning algorithm to derive the component classifiers considering bagging and feature selection. We develop an exhaustive study on the potential of bagging and feature selection to design a final FURIA-based fuzzy multiclassifier dealing with high dimensional data. Several parameter settings for the global approach are tested when applied to twenty one popular UCI datasets. The results obtained show that FURIA-based fuzzy multiclassifiers outperform the single FURIA classifier and are competitive with C4.5 multiclassifiers and random forests.

36

Zararsiz, Gokmen, Dincer Goksuluk, Bernd Klaus, Selcuk Korkmaz, Vahap Eldem, Erdem Karabulut, and Ahmet Ozturk. "voomDDA: discovery of diagnostic biomarkers and classification of RNA-seq data." PeerJ 5 (October 6, 2017): e3890. http://dx.doi.org/10.7717/peerj.3890.

Abstract:

RNA-Seq is a recent and efficient technique that uses the capabilities of next-generation sequencing technology for characterizing and quantifying transcriptomes. One important task using gene-expression data is to identify a small subset of genes that can be used to build diagnostic classifiers particularly for cancer diseases. Microarray based classifiers are not directly applicable to RNA-Seq data due to its discrete nature. Overdispersion is another problem that requires careful modeling of mean and variance relationship of the RNA-Seq data. In this study, we present voomDDA classifiers: variance modeling at the observational level (voom) extensions of the nearest shrunken centroids (NSC) and the diagonal discriminant classifiers. VoomNSC is one of these classifiers and brings voom and NSC approaches together for the purpose of gene-expression based classification. For this purpose, we propose weighted statistics and put these weighted statistics into the NSC algorithm. The VoomNSC is a sparse classifier that models the mean-variance relationship using the voom method and incorporates voom’s precision weights into the NSC classifier via weighted statistics. A comprehensive simulation study was designed and four real datasets are used for performance assessment. The overall results indicate that voomNSC performs as the sparsest classifier. It also provides the most accurate results together with power-transformed Poisson linear discriminant analysis, rlog transformed support vector machines and random forests algorithms. In addition to prediction purposes, the voomNSC classifier can be used to identify the potential diagnostic biomarkers for a condition of interest. Through this work, statistical learning methods proposed for microarrays can be reused for RNA-Seq data. An interactive web application is freely available at http://www.biosoft.hacettepe.edu.tr/voomDDA/.

37

Dankovičová, Zuzana, Dávid Sovák, Peter Drotár, and Liberios Vokorokos. "Machine Learning Approach to Dysphonia Detection." Applied Sciences 8, no. 10 (October 15, 2018): 1927. http://dx.doi.org/10.3390/app8101927.

Abstract:

This paper addresses the processing of speech data and their utilization in a decision support system. The main aim of this work is to utilize machine learning methods to recognize pathological speech, particularly dysphonia. We extracted 1560 speech features and used these to train the classification model. As classifiers, three state-of-the-art methods were used: K-nearest neighbors, random forests, and support vector machine. We analyzed the performance of classifiers with and without gender taken into account. The experimental results showed that it is possible to recognize pathological speech with as high as a 91.3% classification accuracy.

38

Sideris, Nikolaos, Georgios Bardis, Athanasios Voulodimos, Georgios Miaoulis, and Djamchid Ghazanfarpour. "Using Random Forests on Real-World City Data for Urban Planning in a Visual Semantic Decision Support System." Sensors 19, no. 10 (May 16, 2019): 2266. http://dx.doi.org/10.3390/s19102266.

Abstract:

The constantly increasing amount and availability of urban data derived from varying sources leads to an assortment of challenges that include, among others, the consolidation, visualization, and maximal exploitation prospects of the aforementioned data. A preeminent problem affecting urban planning is the appropriate choice of location to host a particular activity (either commercial or common welfare service) or the correct use of an existing building or empty space. In this paper, we propose an approach to address these challenges availed with machine learning techniques. The proposed system combines, fuses, and merges various types of data from different sources, encodes them using a novel semantic model that can capture and utilize both low-level geometric information and higher level semantic information and subsequently feeds them to the random forests classifier, as well as other supervised machine learning models for comparisons. Our experimental evaluation on multiple real-world data sets comparing the performance of several classifiers (including Feedforward Neural Networks, Support Vector Machines, Bag of Decision Trees, k-Nearest Neighbors and Naïve Bayes), indicated the superiority of Random Forests in terms of the examined performance metrics (Accuracy, Specificity, Precision, Recall, F-measure and G-mean).

39

Kulkarni, Keerti, and Vijaya P. A. "Using Combination Technique for Land Cover Classification of Optical Multispectral Images." International Journal of Applied Geospatial Research 12, no. 4 (October 2021): 22–39. http://dx.doi.org/10.4018/ijagr.2021100102.

Abstract:

The need for efficient planning of the land is exponentially increasing because of the unplanned human activities, especially in the urban areas. A land cover map gives a detailed report on temporal dynamics of a given geographical area. The land cover map can be obtained by using machine learning classifiers on the raw satellite images. In this work, the authors propose a combination method for the land cover classification. This method combines the outputs of two classifiers, namely, random forests (RF) and support vector machines (SVM), using Dempster-Shafer combination theory (DSCT), also called the theory of evidence. This combination is possible because of the inherent uncertainties associated with the output of each classifier. The experimental results indicate an improved accuracy (89.6%, kappa = 0.86, versus RF [87.31%, kappa = 0.83] and SVM [82.144%, kappa = 0.76]). The results are validated using the normalized difference vegetation index (NDVI), and the overall accuracy (OA) has been used as a comparison basis.
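
The Dempster-Shafer combination mentioned in this abstract can be illustrated with Dempster's rule for two mass functions. This is a minimal sketch, not the authors' implementation: the class names, the mass values, and the idea of reserving part of each classifier's mass for the whole set of classes (to express its uncertainty) are assumptions made only for the example.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of class labels to a mass,
    and the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for set_a, mass_a in m1.items():
        for set_b, mass_b in m2.items():
            intersection = set_a & set_b
            if intersection:
                combined[intersection] = (
                    combined.get(intersection, 0.0) + mass_a * mass_b
                )
            else:
                # Mass assigned to contradictory hypotheses.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so the remaining masses sum to 1.
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


# Hypothetical masses for one pixel from two classifiers.
WATER = frozenset({"water"})
VEGETATION = frozenset({"vegetation"})
EITHER = WATER | VEGETATION  # mass left on "don't know"

m_rf = {WATER: 0.6, VEGETATION: 0.1, EITHER: 0.3}
m_svm = {WATER: 0.5, VEGETATION: 0.2, EITHER: 0.3}

fused = dempster_combine(m_rf, m_svm)
best_class = max((s for s in fused if len(s) == 1), key=fused.get)
```

In this toy case both sources lean towards water, so the fused mass for water (0.63 / 0.83, roughly 0.76) is higher than either source assigned on its own.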

40

A, Soumya, and G. Hemantha Kumar. "Performance Analysis of Random Forests with SVM and KNN in Classification of Ancient Kannada Scripts." International Journal of Computers & Technology 13, no. 9 (September 30, 2014): 4907–21. http://dx.doi.org/10.24297/ijct.v13i9.2392.

Abstract:

Ancient inscriptions which reveal the details of yester years are difficult to interpret by modern readers and efforts are being made in automating such tasks of deciphering historical records. The Kannada script which is used to write in Kannada language has gradually evolved from the ancient script known as Brahmi. Kannada script has traveled a long way from the earlier Brahmi model and has undergone a number of changes during the regimes of Ashoka, Shatavahana, Kadamba, Ganga, Rashtrakuta, Chalukya, Hoysala, Vijayanagara and Wodeyar dynasties. In this paper we discuss on Classification of ancient Kannada Scripts during three different periods Ashoka, Kadamba and Satavahana. A reconstructed grayscale ancient Kannada epigraph image is input, which is binarized using Otsu’s method. Normalized Central and Zernike Moment features are extracted for classification. The RF Classifier designed is tested on handwritten base characters belonging to Ashoka, Satavahana and Kadamba dynasties. For each dynasty, 105 handwritten samples with 35 base characters are considered. The classification rates for the training and testing base characters from Satavahana period, for varying number of trees and thresholds of RF are determined. Finally a Comparative analysis of the Classification rates is made for the designed RF with SVM and k-NN classifiers, for the ancient Kannada base characters from 3 different eras Ashoka, Kadamba and Satavahana period.

41

Pramanik, Moumita, Ratika Pradhan, Parvati Nandy, Akash Kumar Bhoi, and Paolo Barsocchi. "Machine Learning Methods with Decision Forests for Parkinson’s Detection." Applied Sciences 11, no. 2 (January 8, 2021): 581. http://dx.doi.org/10.3390/app11020581.

Abstract:

Biomedical engineers prefer decision forests over traditional decision trees to design state-of-the-art Parkinson’s Detection Systems (PDS) on massive acoustic signal data. However, the challenge that researchers face with decision forests is identifying the minimum number of decision trees required to achieve maximum detection accuracy with the lowest error rate. This article examines two recent decision forest algorithms, Systematically Developed Forest (SysFor) and Decision Forest by Penalizing Attributes (ForestPA), along with the popular Random Forest, to design three distinct Parkinson’s detection schemes with an optimum number of decision trees. The proposed approach uses the minimum number of decision trees needed to achieve maximum detection accuracy. The training and testing samples and the density of trees in the forest are kept dynamic and incremental to obtain decision forests with maximum capability for detecting Parkinson’s Disease (PD). The incremental tree densities with dynamic training and testing of decision forests proved to be a better approach for detection of PD. The proposed approaches are examined along with other state-of-the-art classifiers, including modern deep learning techniques, to observe the detection capability. The article also provides a guideline for generating an ideal training and testing split of two modern acoustic datasets of Parkinson’s and control subjects donated by the Department of Neurology in Cerrahpaşa, Istanbul and Departamento de Matemáticas, Universidad de Extremadura, Cáceres, Spain. Among the three proposed detection schemes, the Forest by Penalizing Attributes (ForestPA) proved to be a promising Parkinson’s disease detector, needing only a small number of decision trees in the forest to score the highest detection accuracy of 94.12% to 95.00%.

43

Gašparović, Mateo, and Dino Dobrinić. "Comparative Assessment of Machine Learning Methods for Urban Vegetation Mapping Using Multitemporal Sentinel-1 Imagery." Remote Sensing 12, no. 12 (June 17, 2020): 1952. http://dx.doi.org/10.3390/rs12121952.

Abstract:

Mapping of green vegetation in urban areas using remote sensing techniques can be used as a tool for integrated spatial planning to deal with urban challenges. In this context, multitemporal (MT) synthetic aperture radar (SAR) data have not been investigated as thoroughly as optical satellite data. This research compared various machine learning methods using single-date and MT Sentinel-1 (S1) imagery. The research was focused on vegetation mapping in urban areas across Europe. Urban vegetation was classified using six classifiers: random forests (RF), support vector machine (SVM), extreme gradient boosting (XGB), multi-layer perceptron (MLP), AdaBoost.M1 (AB), and extreme learning machine (ELM). Whereas SVM showed the best performance in the single-date image analysis, the MLP classifier yielded the highest overall accuracy in the MT classification scenario. Mean overall accuracy (OA) values for all machine learning methods increased from 57% to 77% with speckle filtering. Using MT SAR data, i.e., three and five S1 images, an additional increase in the OA of 8.59% and 13.66% occurred, respectively. Additionally, using three and five S1 images for classification, the F1 measure for the forest and low vegetation land-cover classes exceeded 90%. This research allowed us to confirm the potential of MT C-band SAR imagery for urban vegetation mapping.

44

Singh Sisodia, Dilip. "Ensemble Learning Approach for Clickbait Detection Using Article Headline Features." Informing Science: The International Journal of an Emerging Transdiscipline 22 (2019): 031–44. http://dx.doi.org/10.28945/4279.

Abstract:

Aim/Purpose: The aim of this paper is to propose an ensemble-learner-based classification model for distinguishing clickbaits from genuine article headlines. Background: Clickbaits are online articles with deliberately designed misleading titles for luring more and more readers to open the intended web page. Clickbaits are used to tempt visitors to click on a particular link, either to monetize the landing page or to spread false news for sensationalization. The presence of clickbaits on any news aggregator portal may lead to an unpleasant experience for readers. Therefore, it is essential to distinguish clickbaits from authentic headlines to mitigate their impact on readers’ perception. Methodology: A total of one hundred thousand article headlines, consisting of clickbaits and authentic news headlines, are collected from news aggregator sites. The collected data samples are divided into five training sets of balanced and unbalanced data. Natural language processing techniques are used to extract 19 manually selected features from article headlines. Contribution: Three ensemble learning techniques, including bagging, boosting, and random forests, are used to design a classifier model for classifying a given headline as clickbait or non-clickbait. The performances of the learners are evaluated using accuracy, precision, recall, and F-measure. Findings: It is observed that the random forest classifier detects clickbaits better than the other classifiers, with an accuracy of 91.16% and a total precision, recall, and F-measure of 91%.

45

Zhou, Xisheng, Long Li, Longqian Chen, Yunqiang Liu, Yifan Cui, Yu Zhang, and Ting Zhang. "Discriminating Urban Forest Types from Sentinel-2A Image Data through Linear Spectral Mixture Analysis: A Case Study of Xuzhou, East China." Forests 10, no. 6 (May 31, 2019): 478. http://dx.doi.org/10.3390/f10060478.

Abstract:

Urban forests are an important component of the urban ecosystem. Urban forest types are a key piece of information required for monitoring the condition of an urban ecosystem. In this study, we propose an urban forest type discrimination method based on linear spectral mixture analysis (LSMA) and a support vector machine (SVM) in the case study of Xuzhou, east China. From 10-m Sentinel-2A imagery data, three different vegetation endmembers, namely broadleaved forest, coniferous forest, and low vegetation, and their abundances were extracted through LSMA. Using a combination of image spectra, topography, texture, and vegetation abundances, four SVM classification models were performed and compared to investigate the impact of these features on classification accuracy. With a particular interest in the role that vegetation abundances play in classification, we also compared SVM and other classifiers, i.e., random forest (RF), artificial neural network (ANN), and quick unbiased efficient statistical tree (QUEST). Results indicate that (1) the LSMA method can derive accurate vegetation abundances from Sentinel-2A image data, and the root-mean-square error (RMSE) was 0.019; (2) the classification accuracies of the four SVM models were improved after adding topographic features, textural features, and vegetation abundances one after the other; (3) the SVM produced higher classification accuracies than the other three classifiers when identical classification features were used; and (4) vegetation endmember abundances improved classification accuracy regardless of which classifier was used. It is concluded that Sentinel-2A image data has a strong capability to discriminate urban forest types in spectrally heterogeneous urban areas, and that vegetation abundances derived from LSMA can enhance such discrimination.

46

Jordanov, Ivan, Nedyalko Petrov, and Alessio Petrozziello. "Classifiers Accuracy Improvement Based on Missing Data Imputation." Journal of Artificial Intelligence and Soft Computing Research 8, no. 1 (January 1, 2018): 31–48. http://dx.doi.org/10.1515/jaiscr-2018-0002.

Abstract:

In this paper we investigate further and extend our previous work on radar signal identification and classification based on a data set which comprises continuous, discrete and categorical data that represent radar pulse train characteristics such as signal frequencies, pulse repetition, type of modulation, intervals, scan period, scanning type, etc. Like most real-world datasets, it also contains a high percentage of missing values, and to deal with this problem we investigate three imputation techniques: Multiple Imputation (MI); K-Nearest Neighbour Imputation (KNNI); and Bagged Tree Imputation (BTI). We apply these methods to data samples with up to 60% missingness, thereby doubling the number of instances with complete values in the resulting dataset. The imputation models’ performance is assessed with Wilcoxon’s test for statistical significance and Cohen’s effect size metrics. To solve the classification task, we employ three intelligent approaches: Neural Networks (NN); Support Vector Machines (SVM); and Random Forests (RF). Subsequently, we critically analyse which imputation method most influences the classifiers’ performance, using a multiclass classification accuracy metric based on the area under the ROC curves. We consider two superclasses (‘military’ and ‘civil’), each containing several ‘subclasses’, and propose two new metrics: inner class accuracy (IA) and outer class accuracy (OA), in addition to the overall classification accuracy (OCA) metric. We conclude that they can be used as complementary to the OCA when choosing the best classifier for the problem at hand.

47

Hammad, Mahmoud, Mohammad Al-Smadi, Qanita Baker, Muntaha D, Nour Al-Khdour, Mutaz Younes, and Enas Khwaileh. "Question to Question Similarity Analysis Using Morphological, Syntactic, Semantic, and Lexical Features." JUCS - Journal of Universal Computer Science 26, no. 6 (June 28, 2020): 671–97. http://dx.doi.org/10.3897/jucs.2020.036.

Abstract:

In the digitally connected world that we are living in, people expect to get answers to their questions spontaneously. This expectation has increased the burden on Question/Answer platforms such as Stack Overflow and many others. A promising solution to this problem is to detect whether a question being asked is similar to a question in the database, and then present the answer to the detected question to the user. To address this challenge, we propose a novel Natural Language Processing (NLP) approach that detects whether two Arabic questions are similar using their extracted morphological, syntactic, semantic, lexical, overlapping, and semantic lexical features. Our approach involves several phases including Arabic text processing, novel feature extraction, and text classification. Moreover, we conducted a comparison between seven different machine learning classifiers. The included classifiers are: Support Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR), Extreme Gradient Boosting (XGB), Random Forests (RF), Adaptive Boosting (AdaBoost), and Multilayer Perceptron (MLP). To conduct our experiments, we used a real-world questions dataset consisting of 19,136 questions (9,568 pairs of questions), on which our approach achieved 82.93% accuracy using our XGB model on the best features selected by the Random Forest feature selection technique. This high accuracy shows the ability of our approach to correctly detect similar Arabic questions and hence increase user satisfaction.

48

Mishra, Sanket, Sarthak Rajwanshi, and Chittaranjan Hota. "Internet of Things Based Occupancy Detection Using Ensemble Classifier for Smart Buildings." Journal of Computational and Theoretical Nanoscience 17, no. 1 (January 1, 2020): 505–12. http://dx.doi.org/10.1166/jctn.2020.8698.

Abstract:

Buildings account for a large share of energy consumption in day-to-day life. Occupancy-based models can help in modeling whether a building or a particular room is occupied or not. Occupancy detection mechanisms can help in automating electrical appliances and making them operational only when a person is present in the room. This helps in creating energy-aware scenarios which can contribute to energy efficiency and a reduction in power tariff. In this work, we approach occupancy modeling with the help of Ensemble Models constructed using Random Forests, Logistic Regression and Support Vector Machine classifiers. The ensemble approach undertaken in this work is Voting, and the weights of the classifiers in the meta-model are fine-tuned using a Differential Evolution optimization algorithm. The results were found to be of high accuracy, i.e., 98.8% and 98.7% on the given test sets.
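
The weighted-voting idea in the abstract above can be sketched in a few lines. This is not the authors' code: the function names, the toy validation data, and the hyperparameters (`pop_size`, `f`, `cr`, `generations`) are assumptions made for illustration, and the optimizer is a bare-bones DE/rand/1/bin loop written in plain Python rather than any particular library routine. Each row of `val_probs` holds the hypothetical occupied-class probabilities from three base classifiers (standing in for RF, logistic regression, and SVM), and the optimizer searches for voting weights that minimize the validation error.

```python
import random


def weighted_vote(prob_rows, weights):
    """Weighted soft vote: average P(occupied) across classifiers, threshold at 0.5."""
    total = sum(weights) or 1.0  # avoid division by zero if all weights are 0
    return [
        1 if sum(w * p for w, p in zip(weights, row)) / total >= 0.5 else 0
        for row in prob_rows
    ]


def error_rate(weights, prob_rows, labels):
    """Fraction of samples the weighted vote gets wrong."""
    preds = weighted_vote(prob_rows, weights)
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)


def tiny_differential_evolution(objective, dim, pop_size=12, f=0.7, cr=0.9,
                                generations=30, seed=0):
    """Minimal DE/rand/1/bin minimizer over the box [0, 1]^dim."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial.append(min(1.0, max(0.0, v)))  # clip to bounds
                else:
                    trial.append(pop[i][d])
            score = objective(trial)
            if score <= scores[i]:  # greedy selection
                pop[i], scores[i] = trial, score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Hypothetical validation data: [RF, LR, SVM] probabilities per sample.
val_probs = [
    [0.9, 0.4, 0.7], [0.2, 0.6, 0.3], [0.8, 0.7, 0.4], [0.1, 0.3, 0.6],
    [0.7, 0.2, 0.8], [0.3, 0.7, 0.2], [0.6, 0.6, 0.9], [0.4, 0.1, 0.1],
]
val_labels = [1, 0, 1, 0, 1, 0, 1, 0]

best_w, best_err = tiny_differential_evolution(
    lambda w: error_rate(w, val_probs, val_labels), dim=3
)
print([round(w, 2) for w in best_w], best_err)
```

Because the error rate is piecewise constant in the weights, a gradient-free search such as differential evolution is a natural fit; in practice the weights would be tuned on held-out data rather than on the test sets.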

49

Tyryshkina, Anastasia, Nate Coraor, and Anton Nekrutenko. "Predicting runtimes of bioinformatics tools based on historical data: five years of Galaxy usage." Bioinformatics 35, no. 18 (January 30, 2019): 3453–60. http://dx.doi.org/10.1093/bioinformatics/btz054.

Abstract:

Motivation: One of the many technical challenges that arise when scheduling bioinformatics analyses at scale is determining the appropriate amount of memory and processing resources. Both over- and under-allocation lead to an inefficient use of computational infrastructure. Over-allocation locks resources that could otherwise be used for other analyses. Under-allocation causes job failure and requires analyses to be repeated with a larger memory or runtime allowance. We address this challenge by using a historical dataset of bioinformatics analyses run on the Galaxy platform to demonstrate the feasibility of an online service for resource requirement estimation. Results: Here we introduced the Galaxy job run dataset and tested popular machine learning models on the task of resource usage prediction. We include three popular forest models: the extra trees regressor, the gradient boosting regressor and the random forest regressor, and find that random forests perform best in the runtime prediction task. We also present two methods of choosing walltimes for previously unseen jobs. Quantile regression forests are more accurate in their predictions, and grant the ability to improve performance by changing the confidence of the estimates. However, the sizes of the confidence intervals are variable and cannot be absolutely constrained. Random forest classifiers address this problem by providing control over the size of the prediction intervals with an accuracy that is comparable to that of the regressor. We show that estimating the memory requirements of a job is possible using the same methods, which, as far as we know, has not been done before. Such estimation can be highly beneficial for accurate resource allocation. Availability and implementation: Source code available at https://github.com/atyryshkina/algorithm-performance-analysis, implemented in Python. Supplementary information: Supplementary data are available at Bioinformatics online.

50

Adler, W., A. Peters, and B. Lausen. "Comparison of Classifiers Applied to Confocal Scanning Laser Ophthalmoscopy Data." Methods of Information in Medicine 47, no. 01 (2008): 38–46. http://dx.doi.org/10.3414/me0348.

Abstract:

Summary. Objectives: Comparison of classification methods using data of one clinical study. The tuning of hyperparameters is assessed as part of the methods by nested-loop cross-validation. Methods: We assess the ability of 18 statistical and machine learning classifiers to detect glaucoma. The training data set is one case-control study consisting of confocal scanning laser ophthalmoscopy measurement values from 98 glaucoma patients and 98 healthy controls. We compare bootstrap estimates of the classification error by the Wilcoxon signed rank test and box-plots of a bootstrap distribution of the estimate. Results: The comparison of out-of-bag bootstrap estimators of classification errors is assessed by Spearman’s rank correlation, Wilcoxon signed rank tests and box-plots of a bootstrap distribution of the estimate. The classification methods random forests (15.4%), support vector machines (15.9%), bundling (16.3% to 17.8%), and penalized discriminant analysis (16.8%) show the best results. Conclusions: Using nested-loop cross-validation we account for the tuning of hyperparameters and demonstrate the assessment of different classifiers. We recommend a block design of the bootstrap simulation to allow a statistical assessment of the bootstrap estimates of the misclassification error. The results depend on the data of the clinical study and the given size of the bootstrap sample.
