Academic literature on the topic 'Outlier Detection, Random Forest, Pattern Recognition, Anomaly Detection'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Outlier Detection, Random Forest, Pattern Recognition, Anomaly Detection.'

Journal articles on the topic "Outlier Detection, Random Forest, Pattern Recognition, Anomaly Detection"

1

Kim, Taegong, and Cheong Hee Park. "Anomaly Pattern Detection in Streaming Data Based on the Transformation to Multiple Binary-Valued Data Streams." Journal of Artificial Intelligence and Soft Computing Research 12, no. 1 (October 8, 2021): 19–27. http://dx.doi.org/10.2478/jaiscr-2022-0002.

Abstract:
Anomaly pattern detection in a data stream aims to detect a time point where outliers begin to occur abnormally. Recently, a method for anomaly pattern detection has been proposed based on binary classification for outliers and statistical tests in the data stream of binary labels (normal or outlier). It showed that an anomaly pattern can be detected accurately even when outlier detection performance is relatively low. However, since the anomaly pattern detection method is based on the binary classification for outliers, most well-known outlier detection methods, with the output of real-valued outlier scores, cannot be used directly. In this paper, we propose an anomaly pattern detection method in a data stream using the transformation to multiple binary-valued data streams from real-valued outlier scores. By using three outlier detection methods, Isolation Forest (IF), Autoencoder-based outlier detection, and Local Outlier Factor (LOF), the proposed anomaly pattern detection method is tested using artificial and real data sets. The experimental results show that anomaly pattern detection using Isolation Forest gives the best performance.
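The transformation from real-valued outlier scores to multiple binary-valued streams might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how thresholds are chosen, so picking them as percentiles of a reference set of scores is an assumption, and the function names `percentile_thresholds` and `to_binary_streams` are hypothetical.

```python
from typing import List, Sequence


def percentile_thresholds(reference_scores: Sequence[float],
                          percentiles: Sequence[int]) -> List[float]:
    """Pick one threshold per percentile from reference outlier scores.

    Assumption: thresholds are empirical percentiles of scores observed
    on reference data; the abstract does not specify this choice.
    """
    ordered = sorted(reference_scores)
    n = len(ordered)
    # Integer arithmetic keeps the index computation exact.
    return [ordered[min(n - 1, (p * n) // 100)] for p in percentiles]


def to_binary_streams(scores: Sequence[float],
                      thresholds: Sequence[float]) -> List[List[int]]:
    """Turn one real-valued score stream into one binary stream per threshold.

    A label of 1 means the score exceeded that threshold (flagged as outlier).
    """
    return [[1 if s > t else 0 for s in scores] for t in thresholds]


reference = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
thresholds = percentile_thresholds(reference, [50, 90])
streams = to_binary_streams([2, 7, 11, 5], thresholds)
```

Each binary stream could then be passed to a statistical test on the rate of ones, which is the step the abstract attributes to the earlier binary-label method.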
2

Tan, Xu, Jiawei Yang, and Susanto Rahardja. "Sparse random projection isolation forest for outlier detection." Pattern Recognition Letters 163 (November 2022): 65–73. http://dx.doi.org/10.1016/j.patrec.2022.09.015.

3

Cheung, Catherine, Julio J. Valdés, Richard Salas Chavez, and Srishti Sehgal. "Failure Modeling of a Propulsion Subsystem: Unsupervised and Semi-Supervised Approaches to Anomaly Detection." International Journal of Pattern Recognition and Artificial Intelligence 33, no. 11 (October 2019): 1940019. http://dx.doi.org/10.1142/s0218001419400196.

Abstract:
In this work, the sensor data related to a diesel engine system and specifically its turbocharger subsystem were analyzed. An incident where the turbocharger seized was recorded by dozens of standard turbocharger-related sensors. By training models to distinguish between normal healthy operating conditions and deteriorated conditions, there is an opportunity to develop prognostic and predictive tools to ideally help prevent a similar occurrence in the future. Analysis of this event provides an opportunity to identify changes in equipment indicators with a known outcome. A number of data analysis tools were used to characterize the healthy and deteriorated states of the turbocharger system, including various supervised classification as well as semi-supervised and unsupervised anomaly detection techniques. The leader clustering algorithm was also implemented to reduce the amount of data to train and develop the models. This paper describes the results of this modeling process, validated by testing on healthy data from the same propulsion system and a second distinct one. Although this problem posed challenges due to the severely imbalanced class distribution, the supervised classifiers, in particular Support Vector Machine (SVM) and Random Forest (RF), performed very well in all metrics while the unsupervised anomaly detection models achieved near-perfect accuracy for identifying healthy turbocharger states.
4

Hao, Yinhui, and Fuqiang Qiu. "Research on the Application of DM Technology with RF in Enterprise Financial Audit." Mobile Information Systems 2022 (May 26, 2022): 1–9. http://dx.doi.org/10.1155/2022/4051469.

Abstract:
Data mining (DM), as a new technology in the information age, is applied to modern audit work, which is more effective than traditional audit methods. In view of the problems existing in traditional tax audit methods, such as the huge amount of audit data, limited knowledge and experience of auditors, and difficult tracking of audit data, this paper uses computer-aided audit technology to collect, clean up, convert, and analyze data, comprehensively uses data warehouse technology, pattern recognition method, data analysis method, and anomaly detection theory as research methods, and makes a comprehensive study on tax affairs. Then, a random forest (RF) algorithm is used to establish the classification and identification model of audit risk. Second, based on the RF algorithm, the audit early warning framework of accounts receivable and payable in enterprise financial sharing mode is constructed, and the financial data and business data in enterprise financial sharing mode are extracted by using big data technology. The comparison of the results shows that the RF model has higher prediction accuracy and better robustness, which can better improve the antirisk ability of listed companies in China.
5

Thi Ngoc Anh, Nguyen, Pham Ngoc Quang Anh, Vu Hoai Thu, Doan Van Thai, Vijender Kumar Solanki, and Dang Minh Tuan. "A novel approach for anomaly detection in automatic meter intelligence system using machine learning and pattern recognition." Journal of Intelligent & Fuzzy Systems, March 12, 2022, 1–10. http://dx.doi.org/10.3233/jifs-219285.

Abstract:
Anomaly detection for sensor systems is one of the most researched topics for Internet of Things systems. Researchers have been attracted to machine learning classification methods, which are considered the most effective techniques. A novel model is proposed that combines anomaly patterns based on Symbolic Aggregate Approximation (SAX), imbalanced data processing, and machine learning techniques for sensor anomaly detection. Combining anomaly patterns with machine learning gives the proposed model better performance. The proposed model consists of three phases: finding anomaly pattern features, processing imbalanced data, and exploring data with a machine learning model. In this paper, the main contributions with respect to previous works can be listed as follows: (i) Successfully modeling a new SAX method for time series data to find complex and dynamic anomaly patterns. (ii) Applying the anomaly pattern features to a Random Forest machine learning model and optimising the model's hyperparameters. (iii) Proposing a model that combines SAX, an imbalance technique, and Random Forest for anomaly detection. (iv) Applying the proposed model in an automatic meter intelligence system in Vietnam. The experimental results show the robustness and better performance of the proposed model in detecting anomalies of power meter sensors.
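For readers unfamiliar with SAX, the classic transform can be sketched in a few lines. This is the textbook form (z-normalisation, piecewise aggregate approximation, then Gaussian breakpoints), not the new SAX variant the abstract refers to; the function name `sax_word` and its parameters are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev
from typing import Sequence


def sax_word(series: Sequence[float], n_segments: int,
             alphabet: str = "abcd") -> str:
    """Convert a numeric series into a SAX word (textbook SAX).

    Assumes len(series) is divisible by n_segments to keep the sketch short.
    """
    if len(series) % n_segments != 0:
        raise ValueError("series length must be divisible by n_segments")

    # 1. z-normalise the series (a constant series maps to all zeros).
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]

    # 2. Piecewise aggregate approximation: mean of each equal-length segment.
    seg_len = len(z) // n_segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]

    # 3. Breakpoints that cut the standard normal into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    # 4. Map each segment mean to the symbol of the region it falls in.
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


word = sax_word([0, 0, 0, 0, 10, 10, 10, 10], n_segments=2)
```

Symbol sequences like this can serve as pattern features for a downstream classifier such as Random Forest, which is the role the abstract assigns to them.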
6

Qu, Haicheng, Jianzhong Zhou, Jitao Qin, and Xiaorong Tian. "Anomaly Detection for Industrial Control Networks Based on Improved One-Class Support Vector Machine." International Journal of Pattern Recognition and Artificial Intelligence, December 16, 2020, 2150012. http://dx.doi.org/10.1142/s0218001421500129.

Abstract:
In traditional network anomaly detection algorithms, the anomaly threshold needs to be defined manually. Against this background, this study proposes an anomaly detection algorithm (VAEOCSVM), which combines the variational auto-encoder (VAE) and one-class support vector machine (OCSVM) to realize anomaly detection in industrial control networks. First, the VAE model is used to obtain the distribution of the original normal sample data represented by the low-dimensional code; the reconstruction error of the VAE model is merged into the new input. Then, using OCSVM’s hinge-loss objective function and the random Fourier feature method for fitting the radial basis function (RBF) kernel, the OCSVM model is represented and solved using the deep neural network and gradient descent method. Finally, the decision function of the OCSVM model is constructed by using the solved parameter information to realize the detection of abnormal data. The proposed algorithm is compared with other machine-learning-based anomaly detection algorithms in terms of multiple indicators such as precision, recall, and [Formula: see text] score. The experimental results using various datasets show that the proposed algorithm has a better outlier recognition ability than the machine-learning-based anomaly detection algorithms.
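The random Fourier feature idea mentioned in the abstract can be illustrated in isolation. The sketch below follows the standard construction for the kernel exp(-gamma * ||x - y||^2) and is not the authors' implementation; the names `make_rff_map`, `rbf_kernel`, `gamma`, and `n_features` are assumptions made for this example.

```python
import math
import random
from typing import Callable, List, Sequence


def make_rff_map(dim: int, n_features: int, gamma: float,
                 seed: int = 0) -> Callable[[Sequence[float]], List[float]]:
    """Build a random Fourier feature map approximating an RBF kernel.

    Frequencies are drawn from N(0, 2*gamma) per coordinate and phases from
    U[0, 2*pi], so that dot(phi(x), phi(y)) approximates the kernel value.
    """
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)]
               for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def feature_map(x: Sequence[float]) -> List[float]:
        return [norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return feature_map


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Exact RBF kernel, used here only to check the approximation."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


phi = make_rff_map(dim=2, n_features=4000, gamma=0.5, seed=1)
x, y = [0.0, 0.0], [1.0, 0.5]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
exact = rbf_kernel(x, y, gamma=0.5)
```

Because the kernel becomes an ordinary dot product in the feature space, a linear model on `phi(x)` can be trained with gradient descent, which is presumably why the approach pairs well with a neural network solver.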

Dissertations / Theses on the topic "Outlier Detection, Random Forest, Pattern Recognition, Anomaly Detection"

1

Mensi, Antonella. "Advanced random forest approaches for outlier detection." Doctoral thesis, 2022. http://hdl.handle.net/11562/1067504.

Abstract:
Outlier Detection (OD) is a Pattern Recognition task which consists of finding those patterns in a set of data which are likely to have been generated by a different mechanism than the one underlying the rest of the data. The importance of OD is visible in everyday life. Indeed, fast and accurate detection of outliers is crucial: for example, in the electrocardiogram of a patient, an abnormality in the heart rhythm can cause severe health problems. Due to the high number of fields in which OD is needed, several approaches have been designed. Among them, Random Forest-based techniques have raised great interest in the research community: a Random Forest (RF) is an ensemble of Decision Trees where each tree is diverse and independent. They are characterized by a high degree of flexibility, robustness, and high generalization capabilities. Even though originally designed for classification and regression, in recent years, due to their success, there has been an increased development of RF-based approaches for other learning tasks, including OD. The forerunner of several RF methods for OD is Isolation Forest (iForest), a technique whose main principle is isolation, i.e. the separation of each object from the rest of the data. Since outliers are different from the rest of the data and thus easier to separate, we can easily identify them as those objects isolated after a few splits in the tree. iForests have been employed in a great variety of application fields, showing excellent performance. This thesis fits into the above scenario: even if some extensions of basic RF-based approaches for OD have been proposed, their potential has not been fully exploited and there is ample room for improvement. In this thesis, we introduce some advanced RF-based techniques for OD, investigating both methodological issues and alternative uses of these flexible approaches. In detail, we moved along four research directions.
The starting point of the first one is the absence of RF methods for OD able to work with non-vectorial data: here we propose ProxIForest, an approach which works with all types of data for which a distance measure can be defined, thus including non-vectorial data as well. Indeed, for the latter, many powerful distances have been proposed. The second direction focuses on how to measure the outlierness degree of an object in an RF, i.e. the anomaly score, since most extensions of iForest concern only the tree building procedure. In detail, we propose two novel classes of methods: the first class exploits the information contained within a tree. The second one focuses on the ensemble aspect of RFs: the aggregation of the anomaly scores extracted from each tree is crucial to correctly identify outliers. For the third research direction, we took a different perspective, exploiting the fact that each tree in a forest is a space partitioner encoding relations, i.e. distances, between objects. Whereas this aspect has been widely researched in the clustering field, it has never been investigated for OD: we extract from an iForest a distance measure and input it to an outlier detector. As the last research direction, we designed a new variant of iForest to characterize multiple sclerosis given a brain connectivity network: we cast the problem as an OD task, by making an analogy between disconnected brain regions, the hallmark of the disease, and outliers. All proposals have been thoroughly empirically validated on either classical or ad hoc datasets: we performed several analyses, including comparisons to state-of-the-art approaches and statistical tests. This thesis proves the suitability of RF-based approaches for OD from different perspectives: not only can they be successfully used for the task, but we can also use them to extract distances or features.
Further, by contributing to this field, this thesis proves that there are still many aspects requiring further investigation.
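The isolation principle described in this abstract can be illustrated with a small sketch: a point is repeatedly separated from the data by random axis-aligned splits, and the number of splits needed is averaged over many random trials. This is a deliberately simplified illustration, not iForest itself (it omits subsampling and the normalised anomaly score) and not any of the thesis's proposed methods; all function names are hypothetical.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def isolation_depth(point: Point, data: Sequence[Point],
                    rng: random.Random) -> int:
    """Count random axis-aligned splits needed to isolate `point` in `data`.

    Assumes `point` is an element of `data`.
    """
    current = list(data)
    depth = 0
    while len(current) > 1:
        # Only features on which the remaining points still differ can split.
        usable = [f for f in range(len(point))
                  if min(r[f] for r in current) != max(r[f] for r in current)]
        if not usable:
            break  # only exact duplicates of the point remain
        f = rng.choice(usable)
        values = [r[f] for r in current]
        split = rng.uniform(min(values), max(values))
        # Keep the side of the split that contains the point.
        if point[f] < split:
            current = [r for r in current if r[f] < split]
        else:
            current = [r for r in current if r[f] >= split]
        depth += 1
    return depth


def average_isolation_depth(point: Point, data: Sequence[Point],
                            n_trees: int = 100, seed: int = 0) -> float:
    """Average the isolation depth over several independent random trials."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng)
               for _ in range(n_trees)) / n_trees


# A 7x7 grid of inliers plus one far-away point.
inliers: List[Point] = [(float(i), float(j))
                        for i in range(7) for j in range(7)]
outlier: Point = (30.0, 30.0)
dataset = inliers + [outlier]

outlier_depth = average_isolation_depth(outlier, dataset)
mean_inlier_depth = sum(average_isolation_depth(p, dataset)
                        for p in inliers) / len(inliers)
```

In this toy dataset the far-away point is typically cut off within one or two splits, while grid points need several, which is the behaviour the abstract describes as being "isolated after a few splits".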

Conference papers on the topic "Outlier Detection, Random Forest, Pattern Recognition, Anomaly Detection"

1

Devshali, Sagun, Shailesh Kumar Tripathi, Dhiraj Dodda, Manish Kumar, Rishabh Uniyal, M. Yadav, and S. Malhotra. "Predicting ESP failures Using Artificial Intelligence for Improved Production Performance in One of the Offshore Fields in India." In ADIPEC. SPE, 2022. http://dx.doi.org/10.2118/211031-ms.

Abstract:
Field X is situated at a water depth of 90 meters in the western continental shelf at a distance of 200 kilometers from Mumbai. It is one of the few fields in the world operating entirely on Electric Submersible Pumps (ESPs), with 36 wells in 5 wellhead platforms producing 62907 barrels of liquid per day with an average water cut of 68%. The performance of ESPs is being continuously monitored in the field. With continuous improvement, the run life of ESPs has increased from a few months to an average of 3 years. Despite the improvement in the run life, unexpected failures still occur from time to time. These unanticipated ESP failures cause substantial production deferment leading to considerable losses in terms of revenue and resources. This paper presents the findings of an Artificial Intelligence based model developed for failure prediction of ESPs aiming to minimize the unexpected production loss for Field X. From the historical data obtained, 47 instances of pump failure have been identified. One of the challenges encountered during data exploration was missing data, which in many cases was due to downhole sensor failure before the pump failure. The missing values have been inferred and imputed from the known available parameters for each pump. Various machine learning algorithms including Random Forest Regressor, XGBoost Regression, Copula-based Outlier Detection, Scalable Unsupervised Outlier Detection and Long Short Term Memory (LSTM) Autoencoder have been applied on these failure instances to develop a model for predicting the run lives of ESPs. Out of all the methods, the LSTM Autoencoder model has been found to be the best suited model for anomaly detection before failure of ESPs. Autoencoders learn patterns in data over long sequences, which makes them suitable for anomaly detection before the actual pump failure.
The pattern recognition algorithms of Autoencoders have been able to predict the anomaly approximately 60 days before failure in a number of pump failure instances. The paper discusses a proactive approach by building a predictive model for estimating ESP lifespan based on machine learning algorithms. The model's predictive accuracy can be improved over time by adding information and further improving the model components.
2

Figueirêdo, Ilan Sousa, Tássio Farias Carvalho, Wenisten José Dantas Silva, Lílian Lefol Nani Guarieiro, and Erick Giovani Sperandio Nascimento. "Detecting Interesting and Anomalous Patterns In Multivariate Time-Series Data in an Offshore Platform Using Unsupervised Learning." In Offshore Technology Conference. OTC, 2021. http://dx.doi.org/10.4043/31297-ms.

Abstract:
Detection of anomalous events in practical operation of oil and gas (O&G) wells and lines can help to avoid production losses, environmental disasters, and human fatalities, besides decreasing maintenance costs. Supervised machine learning algorithms have been successful in detecting, diagnosing, and forecasting anomalous events in the O&G industry. Nevertheless, these algorithms need a large quantity of annotated data, and labelling data in real-world scenarios is typically unfeasible because of the exhaustive work required of experts. Therefore, as unsupervised machine learning does not require an annotated dataset, this paper intends to perform a comparative performance evaluation of unsupervised learning algorithms to support experts in anomaly detection and pattern recognition in multivariate time-series data. So, the goal is to allow experts to analyze a small set of patterns and label them, instead of analyzing large datasets. This paper used the public 3W database of three offshore naturally flowing wells. The experiment used real data of production of O&G from underground reservoirs with the following anomalous events: (i) spurious closure of Downhole Safety Valve (DHSV) and (ii) quick restriction in Production Choke (PCK). Six unsupervised machine learning algorithms were assessed: Cluster-based Algorithm for Anomaly Detection in Time Series Using Mahalanobis Distance (C-AMDATS), Luminol Bitmap, SAX-REPEAT, k-NN, Bootstrap, and Robust Random Cut Forest (RRCF). The comparative evaluation of unsupervised learning algorithms was performed using a set of metrics: accuracy (ACC), precision (PR), recall (REC), specificity (SP), F1-Score (F1), Area Under the Receiver Operating Characteristic Curve (AUC-ROC), and Area Under the Precision-Recall Curve (AUC-PRC). The experiments only used the data labels for assessment purposes.
The results revealed that unsupervised learning successfully detected the patterns of interest in multivariate data without prior annotation, with emphasis on the C-AMDATS algorithm. Thus, unsupervised learning can leverage supervised models through the support given to data annotation.
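The threshold-based metrics listed in this abstract (ACC, PR, REC, SP, F1) follow directly from the confusion-matrix counts. The sketch below shows the standard definitions; the function name `binary_metrics` and the convention that label 1 marks an anomaly are assumptions for this example, and the two area-under-curve metrics are omitted because they need scores rather than hard labels.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int],
                   y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute ACC, PR, REC, SP and F1 from binary labels (1 = anomaly)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Report 0.0 when a metric is undefined (empty denominator).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "ACC": ratio(tp + tn, len(pairs)),
        "PR": precision,
        "REC": recall,
        "SP": ratio(tn, tn + fp),
        "F1": ratio(2 * precision * recall, precision + recall),
    }


metrics = binary_metrics([1, 1, 0, 0, 0, 1], [1, 0, 0, 1, 0, 1])
```

With rare anomalies, accuracy alone can look high even for a detector that flags nothing, which is one reason evaluations like this one report precision, recall, and specificity alongside it.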