Academic literature on the topic 'Random Forests Classifiers'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Random Forests Classifiers.'

Journal articles on the topic "Random Forests Classifiers"

1

Sadorsky, Perry. "Predicting Gold and Silver Price Direction Using Tree-Based Classifiers." Journal of Risk and Financial Management 14, no. 5 (April 29, 2021): 198. http://dx.doi.org/10.3390/jrfm14050198.

Abstract:
Gold is often used by investors as a hedge against inflation or adverse economic times. Consequently, it is important for investors to have accurate forecasts of gold prices. This paper uses several machine learning tree-based classifiers (bagging, stochastic gradient boosting, random forests) to predict the price direction of gold and silver exchange traded funds. Decision tree bagging, stochastic gradient boosting, and random forests predictions of gold and silver price direction are much more accurate than those obtained from logit models. For a 20-day forecast horizon, tree bagging, stochastic gradient boosting, and random forests produce accuracy rates of between 85% and 90% while logit models produce accuracy rates of between 55% and 60%. Stochastic gradient boosting accuracy is a few percentage points less than that of random forests for forecast horizons over 10 days. For those looking to forecast the direction of gold and silver prices, tree bagging and random forests offer an attractive combination of accuracy and ease of estimation. For each of gold and silver, a portfolio based on the random forests price direction forecasts outperformed a buy and hold portfolio.
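As a rough illustration of what "price direction" at a forecast horizon means, the sketch below labels each day 1 if the price is higher `horizon` steps later and 0 otherwise, then scores a set of predictions by accuracy. The function names, the toy prices, and the exact labeling rule are assumptions made for illustration; the paper's own definitions may differ.

```python
def direction_labels(prices, horizon):
    """Label day t with 1 if the price `horizon` steps ahead is higher, else 0."""
    return [
        1 if prices[t + horizon] > prices[t] else 0
        for t in range(len(prices) - horizon)
    ]


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical toy price series (not data from the paper).
prices = [10.0, 10.5, 10.2, 10.8, 11.0, 10.7]
labels = direction_labels(prices, horizon=2)

# Hypothetical classifier output for the same four days.
predictions = [1, 1, 0, 0]
score = accuracy(predictions, labels)
```

Any classifier (logit, bagging, boosting, random forest) would be trained to predict such labels from features available at day t; the accuracy rates quoted in the abstract are scores of this kind.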
2

Kulyukin, Vladimir, Nikhil Ganta, and Anastasiia Tkachenko. "On Image Classification in Video Analysis of Omnidirectional Apis Mellifera Traffic: Random Reinforced Forests vs. Shallow Convolutional Networks." Applied Sciences 11, no. 17 (September 2, 2021): 8141. http://dx.doi.org/10.3390/app11178141.

Abstract:
Omnidirectional honeybee traffic is the number of bees moving in arbitrary directions in close proximity to the landing pad of a beehive over a period of time. Automated video analysis of such traffic is critical for continuous colony health assessment. In our previous research, we proposed a two-tier algorithm to measure omnidirectional bee traffic in videos. Our algorithm combines motion detection with image classification: in tier 1, motion detection functions as class-agnostic object location to generate regions with possible objects; in tier 2, each region from tier 1 is classified by a class-specific classifier. In this article, we present an empirical and theoretical comparison of random reinforced forests and shallow convolutional networks as tier 2 classifiers. A random reinforced forest is a random forest trained on a dataset with reinforcement learning. We present several methods of training random reinforced forests and compare their performance with shallow convolutional networks on seven image datasets. We develop a theoretical framework to assess the complexity of image classification by an image classifier. We formulate and prove three theorems on finding optimal random reinforced forests. Our conclusion is that, despite their limitations, random reinforced forests are a reasonable alternative to convolutional networks when memory footprints and classification and energy efficiencies are important factors. We outline several ways in which the performance of random reinforced forests may be improved.
3

Daho, Mostafa El Habib, and Mohammed Amine Chikh. "Combining Bootstrapping Samples, Random Subspaces and Random Forests to Build Classifiers." Journal of Medical Imaging and Health Informatics 5, no. 3 (June 1, 2015): 539–44. http://dx.doi.org/10.1166/jmihi.2015.1423.

4

Alhudhaif, Adi. "A novel multi-class imbalanced EEG signals classification based on the adaptive synthetic sampling (ADASYN) approach." PeerJ Computer Science 7 (May 14, 2021): e523. http://dx.doi.org/10.7717/peerj-cs.523.

Abstract:
Background: Brain signals (EEG, electroencephalography) are a gold standard frequently used in epilepsy prediction. It is crucial to predict epilepsy, which is common in the community. Early diagnosis is essential to reduce the treatment process of the disease and to keep the process healthier. Methods: In this study, a five-class dataset was used: EEG signals from different individuals, healthy EEG signals from tumor document, EEG signal with epilepsy, EEG signal with eyes closed, and EEG signal with eyes open. Four different methods have been proposed to classify five classes of EEG signals. In the first approach, the EEG signal was first divided into four different bands (beta, alpha, theta, and delta), and then 25 time-domain features were extracted from each band, and the main EEG signal and these extracted features were combined to obtain 125 time-domain features (feature extraction). Using the Random Forests classifier, EEG activities were classified into five classes. In the second approach, each One-Against-One (OVO) approach with 125 attributes was split into ten parts, pairwise, and then each piece was classified with the Random Forests classifier. The majority voting scheme was used to combine decisions from the ten classifiers. In the third proposed method, each One-Against-All (OVA) approach with 125 attributes was divided into five parts, and then each piece was classified with the Random Forests classifier. The majority voting scheme was used to combine decisions from the five classifiers. In the fourth proposed approach, each One-Against-All (OVA) approach with 125 attributes was divided into five parts. Since each piece obtained had an imbalanced data distribution, an adaptive synthetic (ADASYN) sampling approach was used to stabilize each piece. Then, each balanced piece was classified with the Random Forests classifier. To combine the decisions obtained from each classifier, the majority voting scheme has been used.
Results: The first approach achieved 71.90% classification success in classifying five-class EEG signals. The second approach achieved a classification success of 91.08% in classifying five-class EEG signals. The third method achieved 89% success, while the fourth proposed approach achieved 91.72% success. The results obtained show that the proposed fourth approach (the combination of the ADASYN sampling approach and Random Forest Classifier) achieved the best success in classifying five-class EEG signals. This proposed method could be used in the detection of epilepsy events in the EEG signals.
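The one-against-one decomposition with majority voting described in this abstract can be sketched as follows. Five classes give ten class pairs, one binary classifier per pair, and the final label is the class that collects the most pairwise votes. The class names `A` to `E` and the vote list are hypothetical placeholders, not values from the paper, and no actual Random Forest is trained here.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_pairs(classes):
    """Enumerate every unordered pair of classes (one binary problem per pair)."""
    return list(combinations(classes, 2))


def majority_vote(votes):
    """Return the label that received the most votes."""
    return Counter(votes).most_common(1)[0][0]


# Hypothetical class names standing in for the five EEG classes.
classes = ["A", "B", "C", "D", "E"]
pairs = one_vs_one_pairs(classes)

# Hypothetical outputs of the ten pairwise classifiers for one sample,
# listed in the same order as `pairs`.
votes = ["A", "C", "A", "A", "C", "B", "E", "C", "C", "E"]
winner = majority_vote(votes)
```

The one-against-all variant would instead train one binary classifier per class (five in total) and combine their five decisions in the same way.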
5

Yu, Tianyu, Cuiwei Liu, Zhuo Yan, and Xiangbin Shi. "A Multi-Task Framework for Action Prediction." Information 11, no. 3 (March 16, 2020): 158. http://dx.doi.org/10.3390/info11030158.

Abstract:
Predicting the categories of actions in partially observed videos is a challenging task in the computer vision field. The temporal progress of an ongoing action is of great importance for action prediction, since actions can present different characteristics at different temporal stages. To this end, we propose a novel multi-task deep forest framework, which treats temporal progress analysis as a relevant task to action prediction and takes advantage of observation ratio labels of incomplete videos during training. The proposed multi-task deep forest is a cascade structure of random forests and multi-task random forests. Unlike the traditional single-task random forests, multi-task random forests are built upon incomplete training videos annotated with action labels as well as temporal progress labels. Meanwhile, incorporating both random forests and multi-task random forests can increase the diversity of classifiers and improve the discriminative power of the multi-task deep forest. Experiments on the UT-Interaction and the BIT-Interaction datasets demonstrate the effectiveness of the proposed multi-task deep forest.
6

Polaka, Inese, Igor Tom, and Arkady Borisov. "Decision Tree Classifiers in Bioinformatics." Scientific Journal of Riga Technical University. Computer Sciences 42, no. 1 (January 1, 2010): 118–23. http://dx.doi.org/10.2478/v10143-010-0052-4.

Abstract:
This paper presents a literature review of articles related to the use of decision tree classifiers in gene microarray data analysis published in the last ten years. The main focus is on research solving the cancer classification problem using single decision tree classifiers (algorithms C4.5 and CART) and decision tree forests (e.g. random forests), showing strengths and weaknesses of the proposed methodologies when compared to other popular classification methods. The article also touches on the use of decision tree classifiers in gene selection.
7

El Habib Daho, Mostafa, Nesma Settouti, Mohammed El Amine Bechar, Amina Boublenza, and Mohammed Amine Chikh. "A new correlation-based approach for ensemble selection in random forests." International Journal of Intelligent Computing and Cybernetics 14, no. 2 (March 23, 2021): 251–68. http://dx.doi.org/10.1108/ijicc-10-2020-0147.

Abstract:
Purpose: Ensemble methods have been widely used in the field of pattern recognition due to the difficulty of finding a single classifier that performs well on a wide variety of problems. Despite the effectiveness of these techniques, studies have shown that ensemble methods generate a large number of hypotheses that contain redundant classifiers in most cases. Several works proposed in the state of the art attempt to reduce all hypotheses without affecting performance. Design/methodology/approach: In this work, the authors are proposing a pruning method that takes into consideration the correlation between classifiers/classes and each classifier with the rest of the set. The authors have used the random forest algorithm as trees-based ensemble classifiers and the pruning was made by a technique inspired by the CFS (correlation feature selection) algorithm. Findings: The proposed method CES (correlation-based Ensemble Selection) was evaluated on ten datasets from the UCI machine learning repository, and the performances were compared to six ensemble pruning techniques. The results showed that our proposed pruning method selects a small ensemble in a smaller amount of time while improving classification rates compared to the state-of-the-art methods. Originality/value: CES is a new ordering-based method that uses the CFS algorithm. CES selects, in a short time, a small sub-ensemble that outperforms results obtained from the whole forest and the other state-of-the-art techniques used in this study.
8

Krautenbacher, Norbert, Fabian J. Theis, and Christiane Fuchs. "Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies." Computational and Mathematical Methods in Medicine 2017 (2017): 1–18. http://dx.doi.org/10.1155/2017/7847531.

Abstract:
Epidemiological studies often utilize stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when applying classifiers on nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods to resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits from only the parametric inverse-probability bagging proposed by us. For other classifiers, correction is mostly advantageous, and methods perform uniformly. We discuss consequences of inappropriate distribution assumptions and reason for different behaviors between the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if distribution assumptions are roughly fulfilled. We provide our implementation in the R package sambia.
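The general idea behind inverse-probability resampling can be sketched in a few lines: each observation is redrawn with weight 1 / (its selection probability), so strata that were enriched in the biased sample are scaled back toward their population share. This is only a generic illustration under assumed selection probabilities; it is not the authors' stochastic inverse-probability oversampling or parametric inverse-probability bagging, and it does not reproduce the `sambia` package. The function name and the toy numbers are hypothetical.

```python
import random


def inverse_probability_resample(samples, selection_probs, n, seed=0):
    """Draw n observations with replacement, weighting each by 1 / selection probability."""
    rng = random.Random(seed)
    weights = [1.0 / p for p in selection_probs]
    return rng.choices(samples, weights=weights, k=n)


# Hypothetical enriched sample: cases were kept with probability 0.5,
# controls with probability 0.05, so cases are heavily over-represented.
samples = ["case"] * 50 + ["control"] * 50
selection_probs = [0.5] * 50 + [0.05] * 50

resampled = inverse_probability_resample(samples, selection_probs, n=10_000)
case_share = resampled.count("case") / len(resampled)
```

In the biased sample half of the observations are cases; after reweighting, the expected case share is 100 / 1100, roughly 9%, which matches the population implied by the assumed selection probabilities.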
9

Liu, Sheng, Yixin Chen, and Dawn Wilkins. "Large margin classifiers and Random Forests for integrated biological prediction." International Journal of Bioinformatics Research and Applications 8, no. 1/2 (2012): 38. http://dx.doi.org/10.1504/ijbra.2012.045975.

10

Van Assche, Anneleen, Celine Vens, Hendrik Blockeel, and Sašo Džeroski. "First order random forests: Learning relational classifiers with complex aggregates." Machine Learning 64, no. 1-3 (June 21, 2006): 149–82. http://dx.doi.org/10.1007/s10994-006-8713-9.


Dissertations / Theses on the topic "Random Forests Classifiers"

1

Siegel, Kathryn I. (Kathryn Iris). "Incremental random forest classifiers in spark." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/106105.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
Cataloged from PDF version of thesis.
Includes bibliographical references (page 53).
The random forest is a machine learning algorithm that has gained popularity due to its resistance to noise, good performance, and training efficiency. Random forests are typically constructed using a static dataset; to accommodate new data, random forests are usually regrown. This thesis presents two main strategies for updating random forests incrementally, rather than entirely rebuilding the forests. I implement these two strategies (incrementally growing existing trees and replacing old trees) in Spark Machine Learning (ML), a commonly used library for running ML algorithms in Spark. My implementation draws from existing methods in online learning literature, but includes several novel refinements. I evaluate the two implementations, as well as a variety of hybrid strategies, by recording their error rates and training times on four different datasets. My benchmarks show that the optimal strategy for incremental growth depends on the batch size and the presence of concept drift in a data workload. I find that workloads with large batches should be classified using a strategy that favors tree regrowth, while workloads with small batches should be classified using a strategy that favors incremental growth of existing trees. Overall, the system demonstrates significant efficiency gains when compared to the standard method of regrowing the random forest.
2

Nygren, Rasmus. "Evaluation of hyperparameter optimization methods for Random Forest classifiers." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-301739.

Abstract:
In order to create a machine learning model, one is often tasked with selecting certain hyperparameters which configure the behavior of the model. The performance of the model can vary greatly depending on how these hyperparameters are selected, thus making it relevant to investigate the effects of hyperparameter optimization on the classification accuracy of a machine learning model. In this study, we train and evaluate a Random Forest classifier whose hyperparameters are set to default values and compare its classification accuracy to another classifier whose hyperparameters are obtained through the use of the hyperparameter optimization (HPO) methods Random Search, Bayesian Optimization and Particle Swarm Optimization. This is done on three different datasets, and each HPO method is evaluated based on the classification accuracy change it induces across the datasets. We found that every HPO method yielded a total classification accuracy increase of approximately 2-3% across all datasets compared to the accuracies obtained using the default hyperparameters. However, due to limitations of time, data and computational resources, no assertions can be made as to whether the observed positive effect is generalizable at a larger scale. Instead, we could conclude that the utility of HPO methods is dependent on the dataset at hand.
To create a machine learning model, one often needs to choose various hyperparameters that configure the model's properties. The performance of such a model depends strongly on the choice of these hyperparameters, which is why it is relevant to investigate how hyperparameter optimization can affect the classification accuracy of a machine learning model. In this study we train and evaluate a Random Forest classifier whose hyperparameters are set to particular default values and compare it with a classifier whose hyperparameters are determined by three different hyperparameter optimization (HPO) methods: Random Search, Bayesian Optimization and Particle Swarm Optimization. This is done on three different datasets, and each HPO method is evaluated based on the change in classification accuracy it brings about across these datasets. We found that each HPO method resulted in a total increase in classification accuracy of about 2-3% across all datasets compared with the accuracy the classifier achieved with the default hyperparameter values. Because of limitations in time and data, we could not establish whether the positive effect generalizes to a larger scale. The conclusion that could be drawn was instead that the usefulness of hyperparameter optimization methods depends on the dataset they are applied to.
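Of the three HPO methods named in this abstract, random search is the simplest to sketch: sample hyperparameter combinations at random from a search space, score each one, and keep the best. The search space, the parameter names `n_trees` and `max_depth`, and the `toy_accuracy` objective below are hypothetical stand-ins; in practice the objective would be something like the cross-validated accuracy of a Random Forest trained with those settings.

```python
import random


def random_search(objective, space, n_iter, seed=0):
    """Sample n_iter random configurations from `space` and return the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Pick one candidate value per hyperparameter, independently.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_accuracy(params):
    """Hypothetical stand-in for a validation accuracy; peaks at 200 trees, depth 8."""
    penalty_trees = 0.0001 * abs(params["n_trees"] - 200)
    penalty_depth = 0.01 * abs(params["max_depth"] - 8)
    return 0.9 - penalty_trees - penalty_depth


# Hypothetical search space.
space = {"n_trees": [50, 100, 200, 400], "max_depth": [2, 4, 8, 16]}
best_params, best_score = random_search(toy_accuracy, space, n_iter=20)
```

Comparing `best_score` with the score of a fixed default configuration gives the kind of accuracy change the study reports; Bayesian optimization and particle swarm optimization differ only in how the next candidate configuration is chosen.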
3

Sandsveden, Daniel. "Evaluation of Random Forests for Detection and Localization of Cattle Eyes." Thesis, Linköpings universitet, Datorseende, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-121540.

Abstract:
In a time when cattle herds grow continually larger, the need for automatic methods to detect diseases is ever increasing. One possible method to discover diseases is to use thermal images and automatic head and eye detectors. In this thesis an eye detector and a head detector are implemented using the Random Forests classifier. During the implementation the classifier is evaluated using three different descriptors: Histogram of Oriented Gradients, Local Binary Patterns, and a descriptor based on pixel differences. An alternative classifier, the Support Vector Machine, is also evaluated for comparison against Random Forests. The thesis results show that Histogram of Oriented Gradients performs well as a description of cattle heads, while Local Binary Patterns performs well as a description of cattle eyes. The provided descriptor performs almost equally well in both cases. The results also show that Random Forests performs approximately as well as the Support Vector Machine, when the Support Vector Machine is paired with Local Binary Patterns for both heads and eyes. Finally, the thesis results indicate that it is easier to detect and locate cattle heads than it is to detect and locate cattle eyes. For eyes, combining a head detector and an eye detector is shown to give a better result than only using an eye detector. In this combination heads are first detected in images, followed by using the eye detector in areas classified as heads.
4

Abd, El Meguid Mostafa. "Unconstrained facial expression recognition in still images and video sequences using Random Forest classifiers." Thesis, McGill University, 2012. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=107692.

Abstract:
The aim of this project is to construct and implement a comprehensive facial expression detection and classification framework through the use of a proprietary face detector (PittPatt) and a novel classifier consisting of a set of Random Forests paired with either support vector machine or k-nearest neighbour labellers. The system should perform at real-time rates under unconstrained image conditions, with no intermediate human intervention. The still-image Binghamton University 3D Facial Expression database was used for training purposes, while a number of other expression-labelled video databases were used for testing. Quantitative evidence for qualitative and intuitive facial expression recognition constitutes the main theoretical contribution to the field.
The aim of this project is to build and implement a comprehensive facial expression detection framework using a proprietary face detector (PittPatt) and a new classifier composed of a set of Random Forests accompanied by a support vector machine or k-nearest neighbour labeller. The system must run in real time, under unconstrained conditions, without any intermediate human intervention. The still-image Binghamton University 3D Facial Expression database was used for training. A number of still-image and video expression databases were used for evaluation. Quantitative evidence for the qualitative, and sometimes intuitive, analysis of subjects related to facial expression constituted the main theoretical contribution to the field.
5

Sjöqvist, Hugo. "Classifying Forest Cover type with cartographic variables via the Support Vector Machine, Naive Bayes and Random Forest classifiers." Thesis, Örebro universitet, Handelshögskolan vid Örebro Universitet, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-58384.

6

Halmann, Marju. "Email Mining Classifier : The empirical study on combining the topic modelling with Random Forest classification." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-14710.

Abstract:
Filtering and automatically replying to emails is of interest to many but is hard because of the complexity of language and the dependence on background information that is not present in the email itself. This paper investigates whether Latent Dirichlet Allocation (LDA) combined with a Random Forest classifier can be used for the more general email classification task and how it compares to other existing email classifiers. The comparison is based on a literature study and on empirical experimentation using two real-life datasets. Firstly, a literature study is performed to gain insight into the accuracy of other available email classifiers. Secondly, the proposed model's accuracy is explored with experimentation. The literature study shows that the accuracy of more general email classifiers differs greatly on different user sets. The proposed model's accuracy is within the reported accuracy range, although in the lower part. It indicates that the proposed model performs poorly compared to other classifiers. On average, the classifier performance improves 15 percentage points with additional information. This indicates that Latent Dirichlet Allocation (LDA) combined with a Random Forest classifier is promising; however, future studies are needed to explore the model and ways to further increase the accuracy.
7

Zhang, Qing Frankowski Ralph. "An empirical evaluation of the random forests classifier models for variable selection in a large-scale lung cancer case-control study /." See options below, 2006. http://proquest.umi.com/pqdweb?did=1324365481&sid=1&Fmt=2&clientId=68716&RQT=309&VName=PQD.

8

Xia, Junshi. "Multiple classifier systems for the classification of hyperspectral data." Thesis, Grenoble, 2014. http://www.theses.fr/2014GRENT047/document.

Abstract:
In this thesis, we propose several new techniques for the classification of hyperspectral images based on ensemble learning. The proposed framework introduces significant innovations compared with previous approaches in the same field, many of which are based mainly on an individual algorithm. First, we propose to use Rotation Forest with different linear feature extraction techniques, and we compare our methods with traditional ensemble approaches such as Bagging, Boosting, Random Subspace and Random Forests. Next, the integration of support vector machines (SVM) with the rotation subspace framework for context classification is studied. SVM and rotation subspace are two powerful tools for classifying high-dimensional data, which is why combining these two methods can improve classification performance. Then, we extend the work on Rotation Forest by integrating a local feature extraction technique and spatial contextual information with a Markov random field (MRF) to design robust spatial-spectral methods. Finally, we present a new general framework, the random subspace ensemble, to train a series of effective classifiers, including decision trees and the extreme learning machine (ELM), with extended multi-attribute profiles (EMAPs) for the classification of hyperspectral data. Six random subspace ensemble methods, including random subspaces with decision trees, Random Forests (RF), Rotation Forest (RoF), Rotation Random Forest (RoRF), RS with ELM (RSELM) and rotation subspace with ELM (RoELM), are built from multiple base learners. The effectiveness of the proposed techniques is illustrated by comparison with state-of-the-art methods using real hyperspectral data in different contexts.
In this thesis, we propose several new techniques for the classification of hyperspectral remote sensing images based on multiple classifier systems (MCS). Our proposed framework introduces significant innovations with regard to previous approaches in the same field, many of which are mainly based on an individual algorithm. First, we propose to use Rotation Forests with several linear feature extraction techniques and compare them with the traditional ensemble approaches, such as Bagging, Boosting, Random subspace and Random Forest. Second, the integration of the support vector machines (SVM) with the Rotation subspace framework for context classification is investigated. SVM and Rotation subspace are two powerful tools for high-dimensional data classification. Therefore, combining them can further improve the classification performance. Third, we extend the work of Rotation Forests by incorporating a local feature extraction technique and spatial contextual information with a Markov random field (MRF) to design robust spatial-spectral methods. Finally, we present a new general framework, Random subspace ensemble, to train a series of effective classifiers, including decision trees and extreme learning machine (ELM), with extended multi-attribute profiles (EMAPs) for classifying hyperspectral data. Six RS ensemble methods, including Random subspace with DT (RSDT), Random Forest (RF), Rotation Forest (RoF), Rotation Random Forest (RoRF), RS with ELM (RSELM) and Rotation subspace with ELM (RoELM), are constructed by the multiple base learners. The effectiveness of the proposed techniques is illustrated by comparing with state-of-the-art methods by using real hyperspectral data sets with different contexts.
9

Pettersson, Anders. "High-Dimensional Classification Models with Applications to Email Targeting." Thesis, KTH, Matematisk statistik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-168203.

Abstract:
Email communication is valuable for any modern company, since it offers an easy means of spreading important information or advertising new products, features or offers and much more. Being able to identify which customers would be interested in certain information would make it possible to significantly improve a company's email communication and thereby avoid customers starting to ignore messages and unnecessary badwill being created. This thesis focuses on trying to target customers by applying statistical learning methods to historical data provided by the music streaming company Spotify. An important aspect was the high dimensionality of the data, which places certain demands on the applied methods. A binary classification model was created, where the target was whether a customer will open the email or not. Two approaches were used for trying to target the customers, logistic regression, both with and without regularization, and the random forest classifier, for their ability to handle the high dimensionality of the data. The performance accuracy of the suggested models was then evaluated on both a training set and a test set using statistical validation methods, such as cross-validation, ROC curves and lift charts. The models were studied under both large-sample and high-dimensional scenarios. The high-dimensional scenario represents the case when the number of observations, N, is of the same order as the number of features, p, and the large-sample scenario represents the case when N ≫ p. Lasso-based variable selection was performed for both these scenarios, to study the informative value of the features. This study demonstrates that it is possible to greatly improve the opening rate of emails by targeting users, even in the high-dimensional scenario. The results show that increasing the amount of training data over a thousandfold will only improve the performance marginally. Rather efficient customer targeting can be achieved by using a few highly informative variables selected by the Lasso regularization.
Companies can use email to easily spread important information, advertise new products or offers and much more, but too many emails can cause customers to lose interest in the content, generate badwill and make future communication impossible. Being able to identify which customers are interested in the specific content would be an opportunity to significantly improve a company's use of email as a communication channel. This study focuses on identifying customers using statistical learning applied to historical data provided by the music streaming company Spotify. A binary classification model was chosen, where the response variable described whether or not the customer opened the email. Two different methods were used to try to identify the customers who would probably open the emails: logistic regression, both with and without regularization, and the random forest classifier, thanks to their ability to handle high-dimensional data. The methods were then evaluated on both a training set and a test set, using several statistical validation methods such as cross-validation and ROC curves. The models were studied under both large-sample and high-dimensional scenarios. The high-dimensional scenario is one in which the number of observations, N, is of similar size to the number of explanatory variables, p, and the large-sample scenario is one in which N ≫ p. Lasso-based variable selection was performed for both scenarios to study the informative value of the explanatory variables. This study shows that it is possible to significantly improve the opening rate of emails by selecting customers, even when only small amounts of data are used. The results show that an enormous increase in the number of training observations will improve the models' ability to identify customers only marginally.
10

Amlathe, Prakhar. "Standard Machine Learning Techniques in Audio Beehive Monitoring: Classification of Audio Samples with Logistic Regression, K-Nearest Neighbor, Random Forest and Support Vector Machine." DigitalCommons@USU, 2018. https://digitalcommons.usu.edu/etd/7050.

Abstract:
Honeybees are one of the most important pollinating species in agriculture. Three out of every four crops have the honeybee as their sole pollinator. Since 2006 there has been a drastic decrease in the bee population, which is attributed to Colony Collapse Disorder (CCD). Bee colonies fail or die without showing any traditional health symptoms that could otherwise alert beekeepers in advance. An Electronic Beehive Monitoring System has various embedded sensors that capture video, audio and temperature data, which could provide critical information on colony behavior and health without invasive beehive inspections. Previously, significant patterns and information have been extracted by processing the video/image data, but no work has been done using audio data. This research takes the first step towards the use of audio data in the Electronic Beehive Monitoring System (BeePi) by enabling a path towards the automatic classification of audio samples into different classes and categories. The experimental results give initial support to the claim that monitoring bee buzzing signals from the hive is feasible, that it can be a good indicator of hive health, and that it can help differentiate normal honeybee behavior from deviations.

Book chapters on the topic "Random Forests Classifiers"

1

Latinne, Patrice, Olivier Debeir, and Christine Decaestecker. "Limiting the Number of Trees in Random Forests." In Multiple Classifier Systems, 178–87. Berlin, Heidelberg: Springer Berlin Heidelberg, 2001. http://dx.doi.org/10.1007/3-540-48219-9_18.

2

Bernard, Simon, Laurent Heutte, and Sébastien Adam. "Influence of Hyperparameters on Random Forest Accuracy." In Multiple Classifier Systems, 171–80. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-642-02326-2_18.

3

Baumann, Florian, Fangda Li, Arne Ehlers, and Bodo Rosenhahn. "Thresholding a Random Forest Classifier." In Advances in Visual Computing, 95–106. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-14364-4_10.

4

Smith, R. S., M. Bober, and T. Windeatt. "A Comparison of Random Forest with ECOC-Based Classifiers." In Multiple Classifier Systems, 207–16. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-21557-5_23.

5

Svetnik, Vladimir, Andy Liaw, Christopher Tong, and Ting Wang. "Application of Breiman’s Random Forest to Modeling Structure-Activity Relationships of Pharmaceutical Molecules." In Multiple Classifier Systems, 334–43. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004. http://dx.doi.org/10.1007/978-3-540-25966-4_33.

6

Mishra, Sushruta, Yeshihareg Tadesse, Anuttam Dash, Lambodar Jena, and Piyush Ranjan. "Thyroid Disorder Analysis Using Random Forest Classifier." In Smart Innovation, Systems and Technologies, 385–90. Singapore: Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-6202-0_39.

7

Tiwari, Kamlesh, and Mayank Patel. "Facial Expression Recognition Using Random Forest Classifier." In Algorithms for Intelligent Systems, 121–30. Singapore: Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-1059-5_15.

8

Vakharia, V., S. Vaishnani, and H. Thakker. "Appliances Energy Prediction Using Random Forest Classifier." In Lecture Notes in Mechanical Engineering, 405–10. Singapore: Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-8704-7_50.

9

Zhang, Wenbin, Albert Bifet, Xiangliang Zhang, Jeremy C. Weiss, and Wolfgang Nejdl. "FARF: A Fair and Adaptive Random Forests Classifier." In Advances in Knowledge Discovery and Data Mining, 245–56. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-75765-6_20.

10

Camgöz, Necati Cihan, Ahmet Alp Kindiroglu, and Lale Akarun. "Gesture Recognition Using Template Based Random Forest Classifiers." In Computer Vision - ECCV 2014 Workshops, 579–94. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-16178-5_41.


Conference papers on the topic "Random Forests Classifiers"

1

Izza, Yacine, and Joao Marques-Silva. "On Explaining Random Forests with SAT." In Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21). California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/356.

Abstract:
Random Forests (RFs) are among the most widely used Machine Learning (ML) classifiers. Even though RFs are not interpretable, there are no dedicated non-heuristic approaches for computing explanations of RFs. Moreover, there is recent work on polynomial algorithms for explaining ML models, including naive Bayes classifiers. Hence, one question is whether finding explanations of RFs can be solved in polynomial time. This paper answers this question negatively, by proving that computing one PI-explanation of an RF is D^P-hard. Furthermore, the paper proposes a propositional encoding for computing explanations of RFs, thus enabling finding PI-explanations with a SAT solver. This contrasts with earlier work on explaining boosted trees (BTs) and neural networks (NNs), which requires encodings based on SMT/MILP. Experimental results, obtained on a wide range of publicly available datasets, demonstrate that the proposed SAT-based approach scales to RFs of sizes common in practical applications. Perhaps more importantly, the experimental results demonstrate that, for the vast majority of examples considered, the SAT-based approach proposed in this paper significantly outperforms existing heuristic approaches.
2

Sathe, Saket, and Charu C. Aggarwal. "Nearest Neighbor Classifiers Versus Random Forests and Support Vector Machines." In 2019 IEEE International Conference on Data Mining (ICDM). IEEE, 2019. http://dx.doi.org/10.1109/icdm.2019.00164.

3

Cohen, Joseph, Baoyang Jiang, and Jun Ni. "Fault Diagnosis of Timed Event Systems: An Exploration of Machine Learning Methods." In ASME 2020 15th International Manufacturing Science and Engineering Conference. American Society of Mechanical Engineers, 2020. http://dx.doi.org/10.1115/msec2020-8360.

Abstract:
Especially common in discrete manufacturing, timed event systems often require a high degree of synchronization for healthy operation. Discrete event system methods have been used as mathematical tools to detect known faults, but do not scale well for problems with extensive variability in the normal class. A hybridized discrete event and data-driven method is suggested to supplement fault diagnosis in the case where failure patterns are not known in advance. A unique fault diagnosis framework consisting of signal data from programmable logic controllers, a Timed Petri Net of the normal process behavior, and machine learning algorithms is presented to improve fault diagnosis of timed event systems. Various supervised and unsupervised machine learning algorithms are explored as the methodology is applied to a case study in semiconductor manufacturing. State-of-the-art classifiers such as artificial neural networks, support vector machines, and random forests are implemented and compared for handling multi-fault diagnosis using programmable logic controller signal data. For unsupervised learning, classifiers based on principal component analysis utilizing major and minor principal components are compared for anomaly detection. The rule-based extreme random forest classifier achieves the highest validation accuracy of 98% for multi-fault classification. Likewise, the unsupervised learning approach shows similar success, yielding anomaly detection rates of 98% with false alarms under 3%. The industrial feasibility of this method is notable, with the results achieved with a training set 99% smaller than that of the supervised learning classifiers.
4

"Ensemble Learning Approach for Clickbait Detection Using Article Headline Features." In InSITE 2019: Informing Science + IT Education Conferences: Jerusalem. Informing Science Institute, 2019. http://dx.doi.org/10.28945/4319.

Abstract:
[This Proceedings paper was revised and published in the 2019 issue of the journal Informing Science: The International Journal of an Emerging Transdiscipline, Volume 22] Aim/Purpose: The aim of this paper is to propose an ensemble-learner-based classification model for distinguishing clickbaits from genuine article headlines. Background: Clickbaits are online articles with deliberately misleading titles designed to lure more readers to open the intended web page. Clickbaits are used to tempt visitors to click on a particular link, either to monetize the landing page or to spread false news for sensationalization. The presence of clickbaits on any news aggregator portal may lead to an unpleasant experience for readers. Therefore, it is essential to distinguish clickbaits from authentic headlines to mitigate their impact on readers’ perception. Methodology: A total of one hundred thousand article headlines, consisting of clickbaits and authentic news headlines, are collected from news aggregator sites. The collected data samples are divided into five training sets of balanced and unbalanced data. Natural language processing techniques are used to extract 19 manually selected features from the article headlines. Contribution: Three ensemble learning techniques (bagging, boosting, and random forests) are used to design a classifier model that classifies a given headline as clickbait or non-clickbait. The performance of the learners is evaluated using accuracy, precision, recall, and F-measure. Findings: The random forest classifier detects clickbaits better than the other classifiers, with an accuracy of 91.16% and a total precision, recall, and F-measure of 91%.
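The four evaluation measures named in this abstract can be illustrated with a minimal sketch. It assumes a binary setup with clickbait as the positive class and uses the standard textbook definitions; the helper names (`confusion`, `scores`) and the toy labels are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion counts, with clickbait (label 1) as the positive class."""
    tp: int  # clickbait predicted as clickbait
    fp: int  # genuine headline predicted as clickbait
    fn: int  # clickbait predicted as genuine
    tn: int  # genuine headline predicted as genuine


def confusion(y_true, y_pred, positive=1):
    """Count the four outcomes from paired true and predicted labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return Confusion(tp, fp, fn, tn)


def scores(c):
    """Accuracy, precision, recall and F-measure (F1); 0.0 when a ratio is undefined."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, invented for illustration only (1 = clickbait, 0 = genuine).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
result = scores(confusion(y_true, y_pred))
```

On unbalanced training sets, such as some of those the abstract mentions, accuracy alone can look high for a classifier that rarely predicts the minority class, which is one reason to report precision and recall alongside it.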
5

Losi, Enzo, Mauro Venturini, Lucrezia Manservigi, Giuseppe Fabio Ceschini, Giovanni Bechini, Giuseppe Cota, and Fabrizio Riguzzi. "Prediction of Gas Turbine Trip: a Novel Methodology Based on Random Forest Models." In ASME Turbo Expo 2021: Turbomachinery Technical Conference and Exposition. American Society of Mechanical Engineers, 2021. http://dx.doi.org/10.1115/gt2021-58916.

Abstract:
A gas turbine trip is an unplanned shutdown, of which the most relevant consequences are business interruption and a reduction of equipment remaining useful life. Thus, understanding the underlying causes of gas turbine trip would allow predicting its occurrence in order to maximize gas turbine profitability and improve its availability. In the ever competitive Oil & Gas sector, data mining and machine learning are increasingly being employed to support a deeper insight and improved operation of gas turbines. Among the various machine learning tools, Random Forests are an ensemble learning method consisting of an aggregation of decision tree classifiers. This paper presents a novel methodology aimed at exploiting information embedded in the data and develops Random Forest models, aimed at predicting gas turbine trip based on information gathered during a timeframe of historical data acquired from multiple sensors. The novel approach exploits time series segmentation to increase the amount of training data, thus reducing overfitting. First, data are transformed according to a feature engineering methodology developed in a separate work by the same authors. Then, Random Forest models are trained and tested on unseen observations to demonstrate the benefits of the novel approach. The superiority of the novel approach is proved by considering two real-world case studies, involving field data taken during three years of operation of two fleets of Siemens gas turbines located in different regions. The novel methodology allows values of Precision, Recall and Accuracy in the range 75–85%, thus demonstrating the industrial feasibility of the predictive methodology.
6

Stein, Aviel J., Janith Weerasinghe, Spiros Mancoridis, and Rachel Greenstadt. "News Article Text Classification and Summary for Authors and Topics." In 9th International Conference on Natural Language Processing (NLP 2020). AIRCC Publishing Corporation, 2020. http://dx.doi.org/10.5121/csit.2020.101401.

Abstract:
News articles are important for providing timely, historic information. However, the Internet is replete with text that may contain irrelevant or unhelpful information, so means of processing it and distilling content are important and useful to human readers as well as to information extraction tools. Some common questions we may want to answer are “what is this article about?” and “who wrote it?”. In this work we compare machine learning models on two common NLP tasks, topic and authorship attribution, using the 2017 Vox Media dataset. Additionally, we use the models to classify on a subsection, about 20%, of the original text, which proves to be better for classification than the provided blurbs. Because of the large number of topics, we take into account topic overlap and address it via top-n accuracy and hierarchical groupings of topics. We also consider edge cases in authorship by classifying on inter-topic and intra-topic author distributions. Our results show that both topics and authors are readily identifiable, and that neural networks consistently perform best, ahead of support vector machines, random forests, and naive Bayes classifiers, although the latter methods perform acceptably.
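The abstract does not define top-n accuracy. Under the usual definition, a prediction counts as correct when the true label is among the n highest-scoring labels, which softens the penalty for overlapping topics. A minimal sketch follows; the function name, the topic labels, and the scores are invented for illustration and do not come from the Vox Media dataset.

```python
def top_n_accuracy(y_true, score_rows, n=3):
    """Fraction of samples whose true label is among the n highest-scoring labels.

    y_true     -- list of true labels
    score_rows -- list of dicts mapping each candidate label to a model score
    """
    if not y_true:
        return 0.0
    hits = 0
    for label, row in zip(y_true, score_rows):
        # Rank candidate labels by descending score and keep the first n.
        ranked = sorted(row, key=row.get, reverse=True)[:n]
        if label in ranked:
            hits += 1
    return hits / len(y_true)


# Hypothetical per-article topic scores (for example, class probabilities).
score_rows = [
    {"politics": 0.50, "tech": 0.30, "sports": 0.20},
    {"politics": 0.10, "tech": 0.30, "sports": 0.60},
    {"politics": 0.40, "tech": 0.35, "sports": 0.25},
]
y_true = ["politics", "tech", "sports"]

top1 = top_n_accuracy(y_true, score_rows, n=1)
top2 = top_n_accuracy(y_true, score_rows, n=2)
```

With n = 1 this reduces to ordinary accuracy, and the value can only stay equal or rise as n grows.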
7

Das, Dipankar, and Krishna Sharma. "Leveraging of Weighted Ensemble Technique for Identifying Medical Concepts from Clinical Texts at Word and Phrase Level." In 2nd International Conference on Machine Learning, IOT and Blockchain (MLIOB 2021). Academy and Industry Research Collaboration Center (AIRCC), 2021. http://dx.doi.org/10.5121/csit.2021.111213.

Abstract:
Concept identification from medical texts becomes important due to digitization. However, it is not always feasible to identify all such medical concepts manually. Thus, in the present attempt, we have applied five machine learning classifiers (Support Vector Machine, K-Nearest Neighbours, Logistic Regression, Random Forest and Naïve Bayes) and one deep learning classifier (Long Short Term Memory) to identify medical concepts by training on a total of 27.383K sentences. In addition, we have developed a rule-based phrase identification module to help the existing classifiers identify multi-word medical concepts. We have employed the word2vec technique for feature extraction, and PCA and t-SNE for conducting an ablation study over various features to select the important ones. Finally, we have adopted two different ensemble approaches, stacking and weighted sum, to improve the performance of the individual classifiers, and significant improvements were observed with respect to each of the classifiers. It has been observed that the phrase identification module plays an important role when dealing with an individual classifier in identifying higher-order n-gram medical concepts. Finally, the ensemble approach improves on the results of the SVM, which was showing initial improvement even after the application of the phrase-based module.
8

Schnebly, James, and Shamik Sengupta. "Random Forest Twitter Bot Classifier." In 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC). IEEE, 2019. http://dx.doi.org/10.1109/ccwc.2019.8666593.

9

Kocher, Geeta, and Gulshan Kumar. "Performance Analysis of Machine Learning Classifiers for Intrusion Detection using UNSW-NB15 Dataset." In 6th International Conference on Signal and Image Processing (SIGI 2020). AIRCC Publishing Corporation, 2020. http://dx.doi.org/10.5121/csit.2020.102004.

Abstract:
With the advancement of internet technology, the number of threats is also rising exponentially. To reduce the impact of these threats, researchers have proposed many solutions for intrusion detection. In the literature, various machine learning classifiers are trained on older datasets for intrusion detection, which limits their detection accuracy. So, there is a need to train the machine learning classifiers on the latest dataset. In this paper, UNSW-NB15, the latest dataset, is used to train machine learning classifiers. On the basis of theoretical analysis, a taxonomy is proposed in terms of lazy and eager learners. From this proposed taxonomy, K-Nearest Neighbors (KNN), Stochastic Gradient Descent (SGD), Decision Tree (DT), Random Forest (RF), Logistic Regression (LR) and Naïve Bayes (NB) classifiers are selected for training. The performance of these classifiers is tested in terms of Accuracy, Mean Squared Error (MSE), Precision, Recall, F1-Score, True Positive Rate (TPR) and False Positive Rate (FPR) on the UNSW-NB15 dataset, and a comparative analysis of these machine learning classifiers is carried out. The experimental results show that the RF classifier outperforms the other classifiers.
10

Mohandoss, Divya Pramasani, Yong Shi, and Kun Suo. "Outlier Prediction Using Random Forest Classifier." In 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC). IEEE, 2021. http://dx.doi.org/10.1109/ccwc51732.2021.9376077.
