Journal articles on the topic 'Outliers'




Consult the top 50 journal articles for your research on the topic 'Outliers.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of each publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Seo, Han Son. "Outlier tests on potential outliers." Korean Journal of Applied Statistics 30, no. 1 (February 28, 2017): 159–67. http://dx.doi.org/10.5351/kjas.2017.30.1.159.

2

Srividya, S. Mohanavalli, N. Sripriya, and S. Poornima. "Outlier Detection using Clustering Techniques." International Journal of Engineering & Technology 7, no. 3.12 (July 20, 2018): 813. http://dx.doi.org/10.14419/ijet.v7i3.12.16508.

Abstract:
An outlier is a pattern that differs from the other patterns in a given dataset. In some applications it is very important to understand and identify outliers: outlier detection is of major importance in fields such as cybersecurity, machine learning, finance, and healthcare. A clustering-based method is proposed to detect outliers using algorithms such as k-means, PAM, CLARA, DBSCAN, and LOF on datasets including breast cancer, heart disease, and multi-shaped datasets. This work aims to identify the method best suited to detecting outliers accurately.
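As a concrete illustration of the clustering-based idea described in this abstract (not the authors' exact implementation), a fitted clustering can be turned into an outlier detector by flagging points that lie far from every centroid. The points, centroids, and cutoff below are hypothetical:

```python
import math

def kmeans_outliers(points, centroids, cutoff):
    """Flag points whose distance to the nearest centroid exceeds the cutoff:
    one simple way to turn a fitted clustering into an outlier detector."""
    return [p for p in points
            if min(math.dist(p, c) for c in centroids) > cutoff]

# Hypothetical 2-D data: two tight clusters plus one stray point.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (30, 30)]
cents = [(0.33, 0.33), (10, 10.5)]  # centroids assumed from a prior k-means run
print(kmeans_outliers(pts, cents, cutoff=5.0))  # the stray (30, 30) is flagged
```

Density-based detectors such as LOF refine this by comparing each point's local density to that of its neighbours rather than using a single global cutoff.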
3

Huda, Nur'ainul Miftahul, Utriweni Mukhaiyar, and Nurfitri Imro'ah. "AN ITERATIVE PROCEDURE FOR OUTLIER DETECTION IN GSTAR(1;1) MODEL." BAREKENG: Jurnal Ilmu Matematika dan Terapan 16, no. 3 (September 1, 2022): 975–84. http://dx.doi.org/10.30598/barekengvol16iss3pp975-984.

Abstract:
Outliers are observations that differ significantly from the others; they can affect the estimation results of a model and reduce the estimator's accuracy. One way to deal with outliers is to remove them from the data. However, important information is sometimes contained in an outlier, so eliminating outliers can lead to misinterpretation. There are two types of outliers in time series models: the Innovative Outlier (IO) and the Additive Outlier (AO). Outliers can also be detected in the GSTAR model, which accounts for both spatial and time correlations. We introduce an iterative procedure for detecting outliers in the GSTAR model. The first step is to fit a GSTAR model without outlier factors. Outliers are then detected from the model's residuals. If an outlier is detected, an outlier factor is added to the initial model and the parameters are re-estimated, yielding a new GSTAR model and new residuals. The process of detecting outliers and adding them to the model is repeated until a GSTAR model is obtained in which no outliers are detected. As a result, outliers are neither removed nor ignored; instead, outlier factors are added to the GSTAR model. This paper presents a case study of Dengue Hemorrhagic Fever cases in five locations in West Kalimantan Province, modeled with a GSTAR model with added outlier factors. The result is that the iterative procedure for detecting outliers from the GSTAR residuals provides better accuracy than the regular GSTAR model (without adding outliers to the model). The problem can thus be solved without removing outliers from the data: by adding outlier factors to the model, the critical information in the outlier is not lost, and a more accurate model is obtained.
4

Muhima, Rani Rotul, Muchamad Kurniawan, and Oktavian Tegar Pambudi. "A LOF K-Means Clustering on Hotspot Data." International Journal of Artificial Intelligence & Robotics (IJAIR) 2, no. 1 (July 1, 2020): 29. http://dx.doi.org/10.25139/ijair.v2i1.2634.

Abstract:
K-Means is the most popular clustering method, but its drawback is sensitivity to outliers. This paper discusses adding an outlier removal method to K-Means to improve clustering performance. The outlier removal method added is the Local Outlier Factor (LOF), a representative density-based outlier detection algorithm; in this research the combined method is called LOF K-Means. Clustering is first applied to hotspot data using K-Means, and outliers are then found using LOF. The objects detected as outliers are removed, and new centroids for each group are obtained by applying K-Means again. The dataset was taken from FIRMS, provided by the National Aeronautics and Space Administration (NASA). Clustering was done by varying the number of clusters (k = 10, 15, 20, 25, 30, 35, 40, 45, and 50), with the optimal number being k = 20. Based on the Sum of Squared Error (SSE), the LOF K-Means method performed better than the plain K-Means method.
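A minimal sketch of the pipeline's outlier-removal step, using a crude k-nearest-neighbour distance score as a simplified stand-in for the LOF score used in the paper (the points and parameters below are illustrative):

```python
import math

def knn_distance_score(points, k=3):
    """Crude outlier score: mean distance to the k nearest neighbours.
    (A simplified stand-in for the density-based LOF score.)"""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def remove_outliers(points, k=3, n_remove=1):
    """Drop the n_remove highest-scoring points before re-running k-means."""
    scores = knn_distance_score(points, k)
    keep = sorted(range(len(points)), key=lambda i: scores[i])[:len(points) - n_remove]
    return [points[i] for i in sorted(keep)]

# Hypothetical hotspots: one tight cluster plus a distant point.
hotspots = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
cleaned = remove_outliers(hotspots, k=3, n_remove=1)
print(cleaned)  # the distant (5.0, 5.0) is dropped
```

After this removal step, k-means would be re-run on `cleaned` to obtain the final centroids, as the abstract describes.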
5

Agyemang, Malik, Ken Barker, and Reda Alhajj. "Web outlier mining: Discovering outliers from web datasets." Intelligent Data Analysis 9, no. 5 (November 3, 2005): 473–86. http://dx.doi.org/10.3233/ida-2005-9505.

6

Syed Abd Mutalib, Sharifah Sakinah, Siti Zanariah Satari, and Wan Nur Syahidah Wan Yusoff. "SYNTHETIC MULTIVARIATE DATA GENERATION PROCEDURE WITH VARIOUS OUTLIER SCENARIOS USING R PROGRAMMING LANGUAGE." Jurnal Teknologi 84, no. 3 (March 31, 2022): 89–101. http://dx.doi.org/10.11113/jurnalteknologi.v84.17900.

Abstract:
A synthetic data generation procedure generates data from either a statistical or a mathematical model. Such procedures have been used in simulation studies to compare the performance of statistical methods or to propose a new statistical method under a specific distribution. In this study, a synthetic multivariate data generation procedure with various outlier scenarios is formulated using R. An outlier generating model is used to generate multivariate data that contain outliers, and the data generation procedures for the various outlier scenarios in R are explained. Three outlier scenarios are produced, and graphical representations using 3D scatterplots and Chernoff faces are shown for each. The graphical representations show that, in Outlier Scenario 1, as the distance between outliers and inliers increases by shifting the mean, the outliers and inliers become completely separated. The same pattern is seen in Outlier Scenario 2 as the distance between outliers and inliers increases by shifting the covariance. For Outlier Scenario 3, when both values increase, the separation of outliers and inliers is more apparent. The data generation procedure in this study can be used in further applications, such as identifying outliers using clustering methods.
7

Yulistiani, Selma, and Suliadi Suliadi. "Deteksi Pencilan pada Model ARIMA dengan Bayesian Information Criterion (BIC) Termodifikasi." STATISTIKA: Journal of Theoretical Statistics and Its Applications 19, no. 1 (June 20, 2019): 29–37. http://dx.doi.org/10.29313/jstat.v19i1.4740.

Abstract:
Time series data may be affected by special events or circumstances such as promotions, natural disasters, etc. These events can lead to inconsistent observations in the series, called outliers. Because outliers can lead to invalid conclusions, it is important to carry out procedures for detecting outlier effects. This study considers one type of outlier, the additive outlier (AO). The process of detecting additive outliers in an ARIMA model can be seen as a model selection problem, where each candidate model assumes additive outliers at certain time points. In model selection there are criteria that must be considered in order to produce the best model; a good criterion is the Bayesian Information Criterion (BIC) derived by Schwarz (1978). Galeano and Peña (2011) proposed a modified Bayesian Information Criterion for selecting models and detecting potential outliers. The modified BIC for outlier detection is applied to outstanding loan data of PT Pegadaian Cimahi for 2013-2017. The best model obtained is an ARIMA(1,0,0) model with two potential outliers added, at observations 48 and 58, since it has the minimum BICUP value of 1064.95650.
8

Knight, Nathan L., and Jinling Wang. "A Comparison of Outlier Detection Procedures and Robust Estimation Methods in GPS Positioning." Journal of Navigation 62, no. 4 (October 2009): 699–709. http://dx.doi.org/10.1017/s0373463309990142.

Abstract:
With more satellite systems becoming available, there is currently a need for Receiver Autonomous Integrity Monitoring (RAIM) to exclude multiple outliers. While the single outlier test can be applied iteratively, in the field of statistics robust methods are preferred when multiple outliers exist. This study compares the outlier test and numerous robust methods on simulated GPS measurements to identify which methods have the greatest ability to correctly exclude outliers. It was found that no method could correctly exclude outliers 100% of the time. However, for a single outlier the outlier test achieved the highest rates of correct exclusion, followed by the MM-estimator and the L1-norm. As the number of outliers increased, the MM-estimators and the L1-norm obtained the highest rates of correct exclusion, up to ten percent higher than the outlier test.
9

Hasanah, Siti Tabi'atul. "Pendeteksian Outlier pada Regresi Nonlinier dengan Metode statistik Likelihood Displacement." CAUCHY 2, no. 3 (November 15, 2012): 177. http://dx.doi.org/10.18860/ca.v2i3.3127.

Abstract:
An outlier is an observation that is very different (extreme) from the other observations, or data that do not follow the general pattern of the model. Sometimes outliers provide information that cannot be provided by other data; that is why outliers should not simply be eliminated. Outliers can also be influential observations. There are many methods that can be used to detect outliers. Previous studies addressed outlier detection in linear regression; here, outlier detection is developed for nonlinear regression, specifically multiplicative nonlinear regression. Detection uses the likelihood displacement (LD) statistical method, which detects outliers by removing the data suspected to be outliers. The maximum likelihood method is used to estimate the parameters. Using the LD method, the observations suspected to contain outliers are obtained. The accuracy of the LD method in detecting outliers is then shown by comparing the MSE of LD with the MSE of the regression in general. The test statistic used is Λ; when the initial hypothesis is rejected, the observation is declared an outlier.
10

Maia Lima, Luís Fernando, Alexandre Masson Maroldi, Dávilla Vieira Odízio da Silva, Carlos Roberto Massao Hayashi, and Maria Cristina Piumbato Innocentini Hayashi. "A influência de outliers nos estudos métricos da informação: uma análise de dados univariados." Em Questão 24 (December 31, 2018): 216. http://dx.doi.org/10.19132/1808-5245240.216-235.

Abstract:
This article presents a new formula for detecting outliers via Exploratory Data Analysis, taking the skewness of the data into account, and also studies the effect of removing outliers from the original data. The formula is applied to three data sets published in the literature on metric studies of information. The first data set contains five lower outliers. The mean of the aggregated data gives the false impression that 40 universities out of 49 are above average; removing the five lower outliers yields a new mean for which only 22 universities are above average. The second data set contains five lower outliers and one upper outlier; in this case, the upper outlier softens the effect of the lower outliers. In the third data set, five upper outliers and one lower outlier are detected. The mean of the aggregated data indicates that ten universities are above average; removing the six outliers from the original data, 28 universities are above the new mean. For the three data sets analyzed, the work also demonstrates the effect of outliers on interval estimation (statistical inference): removing the outliers produces more representative values for both the mean and the standard deviation of the sample analyzed. This shows how outliers can affect results and conclusions in metric studies of information. However, the formula for detecting outliers remains open to future research.
11

Parrinello, Christina M., Morgan E. Grams, Yingying Sang, David Couper, Lisa M. Wruck, Danni Li, John H. Eckfeldt, Elizabeth Selvin, and Josef Coresh. "Iterative Outlier Removal: A Method for Identifying Outliers in Laboratory Recalibration Studies." Clinical Chemistry 62, no. 7 (July 1, 2016): 966–72. http://dx.doi.org/10.1373/clinchem.2016.255216.

Abstract:
BACKGROUND Extreme values that arise for any reason, including those through nonlaboratory measurement procedure-related processes (inadequate mixing, evaporation, mislabeling), lead to outliers and inflate errors in recalibration studies. We present an approach termed iterative outlier removal (IOR) for identifying such outliers. METHODS We previously identified substantial laboratory drift in uric acid measurements in the Atherosclerosis Risk in Communities (ARIC) Study over time. Serum uric acid was originally measured in 1990–1992 on a Coulter DACOS instrument using an uricase-based measurement procedure. To recalibrate previously measured concentrations to a newer enzymatic colorimetric measurement procedure, uric acid was remeasured in 200 participants from stored plasma in 2011–2013 on a Beckman Olympus 480 autoanalyzer. To conduct IOR, we excluded data points >3 SDs from the mean difference. We continued this process using the resulting data until no outliers remained. RESULTS IOR detected more outliers and yielded greater precision in simulation. The original mean difference (SD) in uric acid was 1.25 (0.62) mg/dL. After 4 iterations, 9 outliers were excluded, and the mean difference (SD) was 1.23 (0.45) mg/dL. Conducting only one round of outlier removal (standard approach) would have excluded 4 outliers [mean difference (SD) = 1.22 (0.51) mg/dL]. Applying the recalibration (derived from Deming regression) from each approach to the original measurements, the prevalence of hyperuricemia (>7 mg/dL) was 28.5% before IOR and 8.5% after IOR. CONCLUSIONS IOR is a useful method for removal of extreme outliers irrelevant to recalibrating laboratory measurements, and identifies more extraneous outliers than the standard approach.
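The iterative 3-SD exclusion loop described in the METHODS section can be sketched as follows (the difference values below are invented for illustration):

```python
import statistics

def iterative_outlier_removal(diffs, n_sd=3.0):
    """Repeatedly drop values more than n_sd SDs from the mean of the
    remaining data until none are left to drop (the IOR loop)."""
    data = list(diffs)
    removed = []
    while True:
        mean, sd = statistics.mean(data), statistics.stdev(data)
        keep = [d for d in data if abs(d - mean) <= n_sd * sd]
        if len(keep) == len(data):        # no outliers left: stop iterating
            return data, removed
        removed += [d for d in data if abs(d - mean) > n_sd * sd]
        data = keep

# Hypothetical paired-measurement differences with one gross error.
diffs = [1.2, 1.3, 1.1, 1.25, 1.2, 1.3, 1.15, 1.2, 1.25, 1.1, 1.3, 1.2, 9.0]
clean, dropped = iterative_outlier_removal(diffs)
print(dropped)  # [9.0]
```

Note that the mean and SD are recomputed after each pass, which is what lets IOR catch outliers that a single round of removal would mask.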
12

Zhao, Xi, Yun Zhang, Shoulie Xie, Qianqing Qin, Shiqian Wu, and Bin Luo. "Outlier Detection Based on Residual Histogram Preference for Geometric Multi-Model Fitting." Sensors 20, no. 11 (May 27, 2020): 3037. http://dx.doi.org/10.3390/s20113037.

Abstract:
Geometric model fitting is a fundamental issue in computer vision, and fitting accuracy is affected by outliers. To eliminate the impact of outliers, an inlier threshold or scale estimator is usually adopted. However, a single inlier threshold cannot satisfy multiple models in the data, and scale estimators that assume a certain noise distribution work poorly in geometric model fitting. It can be observed that the residuals of outliers are large for all true models in the data, which forms a consensus among the outliers. Based on this observation, we propose a preference analysis method based on residual histograms to study this outlier consensus for outlier detection. We find that the outlier consensus makes the outliers gather away from the inliers in the designed residual histogram preference space, which makes it convenient to separate outliers from inliers through linkage clustering. After the outliers are detected and removed, a linkage clustering with permutation preference is introduced to segment the inliers. In addition, to make the linkage clustering process stable and robust, an alternative sampling and clustering framework is proposed for both the outlier detection and inlier segmentation processes. The experimental results show that the outlier detection scheme based on residual histogram preference can detect most of the outliers in the data sets, and the fitting results are better than those of most state-of-the-art methods in geometric multi-model fitting.
13

Massarweh, Nader N., Chung-Yuan Hu, Y. Nancy You, Brian K. Bednarski, Miguel A. Rodriguez-Bigas, John M. Skibber, Scott B. Cantor, Janice N. Cormier, Barry W. Feig, and George J. Chang. "Risk-Adjusted Pathologic Margin Positivity Rate As a Quality Indicator in Rectal Cancer Surgery." Journal of Clinical Oncology 32, no. 27 (September 20, 2014): 2967–74. http://dx.doi.org/10.1200/jco.2014.55.5334.

Abstract:
Purpose Margin positivity after rectal cancer resection is associated with poorer outcomes. We previously developed an instrument for calculating hospital risk-adjusted margin positivity rate (RAMP) that allows identification of performance-based outliers and may represent a rectal cancer surgery quality metric. Methods This was an observational cohort study of patients with rectal cancer within the National Cancer Data Base (2003 to 2005). Hospital performance was categorized as low outlier (better than expected), high outlier (worse than expected), or non-RAMP outlier using standard observed-to-expected methodology. The association between outlier status and overall risk of death at 5 years was evaluated using Cox shared frailty modeling. Results Among 32,354 patients with cancer (mean age, 63.8 ± 13.2 years; 56.7% male; 87.3% white) treated at 1,349 hospitals (4.9% high outlier, 0.7% low outlier), 5.6% of patients were treated at high outliers and 3.0% were treated at low outliers. Various structural (academic status and volume), process (pathologic nodal evaluation and neoadjuvant radiation therapy use), and outcome (sphincter preservation, readmission, and 30-day postoperative mortality) measures were significantly associated with outlier status. Five-year overall survival was better at low outliers (79.9%) compared with high outliers (64.9%) and nonoutliers (68.9%; log-rank test, P < .001). Risk of death was lower at low outliers compared with high outliers (hazard ratio [HR], 0.61; 95% CI, 0.50 to 0.75) and nonoutliers (HR, 0.69; 95% CI, 0.57 to 0.83). Risk of death was higher at high outliers compared with nonoutliers (HR, 1.12; 95% CI, 1.03 to 1.23). Conclusion Hospital RAMP outlier status is a rectal cancer surgery composite metric that reliably captures hospital quality across all levels of care and could be integrated into existing quality improvement initiatives for hospital performance.
14

Fitrianto, Anwar, Wan Zuki Azman Wan Muhamad, Suliana Kriswan, and Budi Susetyo. "Comparing Outlier Detection Methods using Boxplot Generalized Extreme Studentized Deviate and Sequential Fences." Aceh International Journal of Science and Technology 11, no. 1 (April 24, 2022): 38–45. http://dx.doi.org/10.13170/aijst.11.1.23809.

Abstract:
Outlier identification is essential in data analysis since outliers can lead to wrong inferential statistics. This study aimed to compare the performance of the Boxplot, Generalized Extreme Studentized Deviate (Generalized ESD), and Sequential Fences methods in identifying outliers. A published dataset was used in the study; based on preliminary outlier identification, the data did not contain outliers. Each outlier detection method's performance was evaluated by contaminating the original data with a few outliers. The contaminations were conducted by replacing the two smallest and largest observations with outliers. The analysis was conducted using SAS version 9.2 for both the original and contaminated data. We found that Sequential Fences had outstanding performance in identifying outliers compared to the Boxplot and Generalized ESD methods.
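For reference, the boxplot rule that the first compared method is based on (Tukey's fences) can be sketched directly; the sample data below is illustrative:

```python
import statistics

def tukey_fences(data, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR], the boxplot rule."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lo or x > hi]

sample = [10, 12, 11, 13, 12, 11, 10, 12, 40]
print(tukey_fences(sample))  # [40]
```

Generalized ESD and Sequential Fences refine this kind of rule to handle multiple outliers and control the detection error rate.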
15

Cram, Peter, Xin Lu, Stephen L. Kates, Yue Li, and Benjamin J. Miller. "Outliers." Geriatric Orthopaedic Surgery & Rehabilitation 2, no. 4 (July 2011): 135–47. http://dx.doi.org/10.1177/2151458511419847.

16

He, Zengyou, Xiaofei Xu, Zhexue Huang, and Shengchun Deng. "FP-outlier: Frequent pattern based outlier detection." Computer Science and Information Systems 2, no. 1 (2005): 103–18. http://dx.doi.org/10.2298/csis0501103h.

Abstract:
An outlier in a dataset is an observation or a point that is considerably dissimilar to or inconsistent with the remainder of the data. Detection of such outliers is important for many applications and has recently attracted much attention in the data mining research community. In this paper, we present a new method to detect outliers by discovering frequent patterns (or frequent itemsets) in the data set. Outliers are defined as the data transactions that contain fewer of the frequent patterns in their itemsets. We define a measure called FPOF (Frequent Pattern Outlier Factor) to score outlier transactions and propose the FindFPOF algorithm to discover outliers. The experimental results show that our approach outperformed the existing methods in identifying interesting outliers.
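A brute-force sketch of the FPOF idea, with a hypothetical minimum-support threshold and tiny example baskets (illustrative only, not the paper's FindFPOF algorithm, which avoids enumerating all itemsets):

```python
from itertools import combinations

def frequent_patterns(transactions, minsup):
    """Enumerate itemsets whose support meets minsup (brute force, small data)."""
    items = sorted({i for t in transactions for i in t})
    patterns = {}
    for r in range(1, len(items) + 1):
        for cand in combinations(items, r):
            sup = sum(1 for t in transactions if set(cand) <= t) / len(transactions)
            if sup >= minsup:
                patterns[cand] = sup
    return patterns

def fpof(transactions, minsup=0.5):
    """FPOF score per transaction: mean support of the frequent patterns it
    contains. Lower scores indicate likelier outliers."""
    patterns = frequent_patterns(transactions, minsup)
    return [sum(s for p, s in patterns.items() if set(p) <= t) / len(patterns)
            for t in transactions]

baskets = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"d"}]
scores = fpof(baskets)
# the {"d"} basket contains no frequent pattern and gets the lowest score
```

Transactions whose items rarely co-occur with common patterns thus fall to the bottom of the score ranking.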
17

Twumasi-Ankrah, Sampson, Simon Kojo Appiah, Doris Arthur, Wilhemina Adoma Pels, Jonathan Kwaku Afriyie, and Danielson Nartey. "Comparison of outlier detection techniques in non-stationary time series data." Global Journal of Pure and Applied Sciences 27, no. 1 (March 5, 2021): 55–60. http://dx.doi.org/10.4314/gjpas.v27i1.7.

Abstract:
This study examined the performance of six outlier detection techniques using a non-stationary time series dataset. Two key issues were of interest: scenario one asked which method could correctly detect the number of outliers introduced into the dataset, while scenario two sought the technique that would over-detect the number of outliers introduced, when a dataset contains only extreme maxima, only extreme minima, or both. The air passenger dataset was used with different numbers of outliers or extreme values, ranging from 1 to 10 and 40. The six outlier detection techniques used were the Mahalanobis distance, depth-based, robust kernel-based outlier factor (RKOF), generalized dispersion, kth nearest neighbour distance (KNND), and principal component (PC) methods. When detecting extreme maxima, the Mahalanobis and principal component methods performed better in correctly detecting outliers. The Mahalanobis method could also identify more outliers than the others, making it the "best" method for the extreme minima category. The kth nearest neighbour distance method was the "best" method for not over-detecting outliers for extreme minima, while the Mahalanobis distance and principal component methods were the "best" for not over-detecting outliers in the extreme maxima category. Therefore, the Mahalanobis outlier detection technique is recommended for detecting outliers in non-stationary time series data.
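As an illustration of the recommended Mahalanobis approach, the squared Mahalanobis distance from the sample mean can be computed directly for 2-D data by inverting the 2x2 covariance matrix by hand (the points below are invented):

```python
import statistics

def mahalanobis_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx, syy = statistics.variance(xs), statistics.variance(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (len(points) - 1)
    det = sxx * syy - sxy * sxy          # determinant of the covariance matrix
    dists = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # quadratic form [dx dy] * inverse(covariance) * [dx dy]^T, expanded
        dists.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return dists

pts = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9), (1.0, 2.1), (1.1, 1.9), (4.0, -1.0)]
d2 = mahalanobis_2d(pts)
# the last point is far from the bulk and gets the largest distance
```

Points whose squared distance exceeds a chi-squared cutoff (2 degrees of freedom here) would be flagged as outliers; a useful sanity check is that the squared distances sum to (n - 1) * p.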
18

Johansen, Martin Berg, and Peter Astrup Christensen. "A simple transformation independent method for outlier definition." Clinical Chemistry and Laboratory Medicine (CCLM) 56, no. 9 (August 28, 2018): 1524–32. http://dx.doi.org/10.1515/cclm-2018-0025.

Abstract:
Background: Definition and elimination of outliers is a key element for medical laboratories establishing or verifying reference intervals (RIs), especially as inclusion of just a few outlying observations may seriously affect the determination of the reference limits. Many methods have been developed for the definition of outliers. Several of these methods are developed for the normal distribution, and data often require transformation before outlier elimination. Methods: We have developed a non-parametric, transformation-independent outlier definition. The new method relies on drawing reproducible histograms, using defined bin sizes above and below the median. The method is compared to the method recommended by CLSI/IFCC, which uses the Box-Cox transformation (BCT) and Tukey's fences for outlier definition. The comparison is done on eight simulated distributions and an indirect clinical dataset. Results: The comparison on simulated distributions shows that, without added outliers, the recommended method in general defines fewer outliers. However, when outliers are added on one side, the proposed method often produces better results; with outliers on both sides, the methods are equally good. Furthermore, the presence of outliers affects the BCT, and subsequently the limits determined by the currently recommended methods, especially in skewed distributions. The proposed outlier definition reproduced current RI limits on clinical data containing outliers. Conclusions: We find our simple transformation-independent outlier detection method to be as good as or better than the currently recommended methods.
19

Read, Randy J. "Detecting outliers in non-redundant diffraction data." Acta Crystallographica Section D Biological Crystallography 55, no. 10 (October 1, 1999): 1759–64. http://dx.doi.org/10.1107/s0907444999008471.

Abstract:
Outliers are observations which are very unlikely to be correct, as judged by independent observations or other prior information. Such unexpected observations are treated, effectively, as being more informative about possible models, so they can seriously impede the course of structure determination and refinement. The best way to detect and eliminate outliers is to collect highly redundant data, but it is not always possible to make multiple measurements of every reflection. For non-redundant data, the prior expectation given either by a Wilson distribution of intensities or model-based structure-factor probability distributions can be used to detect outliers. This captures mostly the excessively strong reflections, which dominate the features of electron-density maps or, even more so, Patterson maps. The outlier rejection tests have been implemented in a program, Outliar.
20

Adikaram, K. K. L. B., M. A. Hussein, M. Effenberger, and T. Becker. "Outlier Detection Method in Linear Regression Based on Sum of Arithmetic Progression." Scientific World Journal 2014 (2014): 1–12. http://dx.doi.org/10.1155/2014/821623.

Abstract:
We introduce a new nonparametric outlier detection method for linear series, which requires no imputation of missing or removed data. For an arithmetic progression (a series without outliers) with n elements, the ratio R of the sum of the minimum and maximum elements to the sum of all elements is always 2/n ∈ (0, 1]. R ≠ 2/n always implies the existence of outliers; usually, R < 2/n implies that the minimum is an outlier, and R > 2/n implies that the maximum is an outlier. Based upon this, we derived a new method for identifying significant and nonsignificant outliers separately. Two different techniques were used to manage missing data and removed outliers: (1) recalculate the terms after (or before) the removed or missing element while maintaining the initial angle in relation to a certain point, or (2) transform the data into a constant value that is not affected by missing or removed elements. With a reference element that was not an outlier, the method detected all outliers in data sets with 6 to 1000 elements containing 50% outliers which deviated by a factor of ±1.0e-2 to ±1.0e+2 from the correct value.
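The ratio test at the heart of the method can be sketched directly (the example series are illustrative):

```python
def ratio_check(series):
    """R = (min + max) / sum; for an outlier-free arithmetic progression
    R equals 2/n, so any deviation from 2/n signals an outlier."""
    n = len(series)
    r = (min(series) + max(series)) / sum(series)
    return r, 2 / n

clean = [2, 4, 6, 8, 10]        # arithmetic progression: R == 2/5
r, expected = ratio_check(clean)
assert abs(r - expected) < 1e-12

dirty = [2, 4, 6, 8, 100]       # inflated maximum pushes R above 2/n
r, expected = ratio_check(dirty)
# r > expected, flagging the maximum as the suspected outlier
```

This works because in an arithmetic progression the minimum and maximum average to the same value as the whole series, so their sum is exactly 2/n of the total.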
21

Chung, Se Yeon, and Sang Cheol Kim. "Anomaly Detection in Livestock Environmental Time Series Data Using LSTM Autoencoders: A Comparison of Performance Based on Threshold Settings." Korean Institute of Smart Media 13, no. 4 (April 30, 2024): 48–56. http://dx.doi.org/10.30693/smj.2024.13.4.48.

Abstract:
In the livestock industry, detecting environmental outliers and predicting data are crucial tasks. Outliers in livestock environment data, typically gathered through time-series methods, can signal rapid changes in the environment and potential unexpected epidemics. Prompt detection of and response to these outliers are essential to minimize stress in livestock and reduce economic losses for farmers through early detection of epidemic conditions. This study compares the performance of two methods of setting the thresholds that define outliers in livestock environment data. The first method detects outliers using the Mean Squared Error (MSE); the second uses a Dynamic Threshold, which analyzes variability against the average of previous data to identify outliers. The MSE-based method demonstrated a 94.98% accuracy rate, while the Dynamic Threshold method, which uses the standard deviation, showed superior performance with 99.66% accuracy.
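A sketch of the dynamic-threshold idea, flagging a reading that deviates from the rolling statistics of the preceding window (the window size, the multiplier k, and the temperature series are illustrative assumptions, separate from the paper's LSTM-autoencoder pipeline):

```python
import statistics

def dynamic_threshold_flags(series, window=5, k=3.0):
    """Flag each point that deviates from the rolling mean of the previous
    `window` values by more than k rolling standard deviations."""
    flags = []
    for i in range(window, len(series)):
        prev = series[i - window:i]
        mean, sd = statistics.mean(prev), statistics.stdev(prev)
        flags.append(abs(series[i] - mean) > k * sd)
    return flags

# Hypothetical barn-temperature readings with one sudden spike.
temps = [22.0, 22.1, 21.9, 22.0, 22.2, 22.1, 29.5, 22.0]
print(dynamic_threshold_flags(temps))  # only the 29.5 spike is flagged
```

Because the threshold adapts to recent variability, slow seasonal drift is tolerated while sudden spikes are flagged, which is the advantage over a fixed cutoff.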
22

Al. Abri, Khoula, and Manjit Singh Sidhu. "Machine Learning Approaches to Advanced Outlier Detection in Psychological Datasets." International journal of electrical and computer engineering systems 15, no. 1 (January 19, 2024): 13–20. http://dx.doi.org/10.32985/ijeces.15.1.2.

Abstract:
The core aim of this study is to determine the most effective outlier detection methodologies for multivariate psychological datasets, particularly those derived from Omani students. Due to their complex nature, such datasets demand robust analytical methods. To this end, we employed three sophisticated algorithms: local outlier factor (LOF), one-class support vector machine (OCSVM), and isolation forest (IF). Our initial findings showed 155 outliers by both LOF and IF and 147 by OCSVM. A deeper analysis revealed that LOF detected 55 unique outliers based on differences in local density, OCSVM isolated 44 unique outliers utilizing its transformed feature space, and IF identified 76 unique outliers leveraging its tree-based mechanics. Despite these varying results, all methods had a consensus for just 44 outliers. Employing ensemble techniques, both averaging and voting methods identified 155 outliers, whereas the weighted method highlighted 151, with a consensus of 150 outliers across the board. In conclusion, while individual algorithms provide distinct perspectives, ensemble techniques enhance the accuracy and consistency of outlier detection. This underscores the necessity of using multiple algorithms with ensemble techniques in analyzing psychological datasets, facilitating a richer comprehension of inherent data structures.
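The voting ensemble described above can be sketched as a simple majority vote over per-detector boolean flags (the flag vectors below are invented):

```python
def vote_ensemble(flag_lists, min_votes=2):
    """Combine outlier flags from several detectors: a point is an outlier
    if at least min_votes detectors agree on it."""
    return [sum(flags) >= min_votes for flags in zip(*flag_lists)]

# Hypothetical per-point flags from three detectors (LOF, OCSVM, IF).
lof_flags   = [True, False, True, False]
ocsvm_flags = [True, False, False, False]
iso_flags   = [True, True, True, False]
print(vote_ensemble([lof_flags, ocsvm_flags, iso_flags]))
```

Averaging and weighted variants replace the vote count with a mean or weighted mean of normalized outlier scores before thresholding.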
APA, Harvard, Vancouver, ISO, and other styles
23

Bouguessa, Mohamed. "A Mixture Model-Based Combination Approach for Outlier Detection." International Journal on Artificial Intelligence Tools 23, no. 04 (August 2014): 1460021. http://dx.doi.org/10.1142/s0218213014600215.

Full text
Abstract:
In this paper, we propose an approach that combines different outlier detection algorithms in order to gain improved effectiveness. To this end, we first estimate an outlier score vector for each data object. Each element of the estimated vectors corresponds to an outlier score produced by a specific outlier detection algorithm. We then use the multivariate beta mixture model to cluster the outlier score vectors into several components so that the component that corresponds to the outliers can be identified. A notable feature of the proposed approach is the automatic identification of outliers, while most existing methods return only a ranked list of points, expecting the outliers to come first, or require empirical threshold estimation to identify outliers. Experimental results, on both synthetic and real data sets, show that our approach substantially enhances the accuracy of the base outlier detectors considered in the combination and overcomes their drawbacks.
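A rough sketch of the score-vector idea, with two caveats: scikit-learn has no multivariate beta mixture, so a Gaussian mixture stands in for it, and the two base detectors are illustrative choices. Per-point score vectors are clustered, and the component with the higher mean score is read as the outlier component.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(300, 2)),
               rng.uniform(6, 8, size=(10, 2))])

# One outlier score per detector per point (higher = more outlying).
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
s1 = -lof.negative_outlier_factor_
iso = IsolationForest(random_state=0).fit(X)
s2 = -iso.score_samples(X)
scores = np.column_stack([s1, s2])

# Cluster the 2-D score vectors; GaussianMixture stands in for the
# paper's multivariate beta mixture model.
gm = GaussianMixture(n_components=2, random_state=0).fit(scores)
labels = gm.predict(scores)
outlier_comp = np.argmax(gm.means_.sum(axis=1))  # component with higher scores
outliers = np.flatnonzero(labels == outlier_comp)
print(outliers)
```

Reading off the high-score component automates the threshold choice, which is the abstract's main selling point over ranked lists.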
APA, Harvard, Vancouver, ISO, and other styles
24

Chen, Ping, Ling Dong, Wanyi Chen, and Jin-Guan Lin. "Outlier Detection in Adaptive Functional-Coefficient Autoregressive Models Based on Extreme Value Theory." Mathematical Problems in Engineering 2013 (2013): 1–9. http://dx.doi.org/10.1155/2013/910828.

Full text
Abstract:
This paper proposes several test statistics to detect additive or innovative outliers in adaptive functional-coefficient autoregressive (AFAR) models based on extreme value theory and likelihood ratio tests. All the test statistics follow a tractable asymptotic Gumbel distribution. Also, we propose an asymptotic critical value at a fixed significance level and obtain an asymptotic p-value for testing, which is used to detect outliers in time series. Simulation studies indicate that the extreme value method for detecting outliers in AFAR models is effective both for AO and IO, for a lone outlier and multiple outliers, and for separate outliers and outlier patches. Furthermore, it is shown that our procedure can reduce the possible effects of masking and swamping.
APA, Harvard, Vancouver, ISO, and other styles
25

Baba, Ali Mohammed, Habshah Midi, and Nur Haizum Abd Rahman. "Spatial Outlier Accommodation Using a Spatial Variance Shift Outlier Model." Mathematics 10, no. 17 (September 3, 2022): 3182. http://dx.doi.org/10.3390/math10173182.

Full text
Abstract:
Outlier detection has been a long-debated subject among researchers due to its effect on model fitting. Spatial outlier detection has received considerable attention in the recent past. On the other hand, outlier accommodation, particularly in spatial applications, retains vital information about the model. It is pertinent to develop a method that is capable of accommodating detected spatial outliers in a fashion that retains vital information in the spatial models. In this paper, we formulate the spatial variance shift outlier model (SVSOM) in spatial regression as a robust spatial model using restricted maximum likelihood (REML) and use weights based on the detected outliers in the model. The spatial outliers are accommodated via a revised model for the outlier observations with the help of the SVSOM. Simulation results show that the SVSOM, based on the detected spatial outliers, is more efficient than the general spatial model (GSM). The findings of this study also reveal that contamination in the residuals and the x variable has little effect on the parameter estimates of the SVSOM, and that outliers in the y variable are always detectable. The asymptotic distribution of the squared spatial prediction residuals is obtained to confirm the outlyingness of an observation. The merit of our proposed SVSOM for detecting and accommodating outliers is also confirmed using artificial and COVID-19 data sets.
APA, Harvard, Vancouver, ISO, and other styles
26

Wang, Lihui, Kangyi Zhi, Bin Li, and Yuexin Zhang. "Dynamically Adjusting Filter Gain Method for Suppressing GNSS Observation Outliers in Integrated Navigation." Journal of Navigation 71, no. 6 (June 29, 2018): 1396–412. http://dx.doi.org/10.1017/s0373463318000334.

Full text
Abstract:
Global Navigation Satellite Systems (GNSSs) are easily influenced by the external environment. Signals may be lost or become abnormal, thereby causing outliers. The filter gain of the standard Kalman filter of a loosely coupled GNSS/inertial navigation system cannot change with the outliers of the GNSS, causing large deviations in the filtering results. In this paper, a method based on a χ2-test and a dynamically adjusting filter gain method are proposed to detect and, separately, to suppress GNSS observation outliers in integrated navigation. An indicator of an innovation vector is constructed, and a χ2-test is performed on this indicator. If it fails the test, the corresponding observation value is considered an outlier. A scale factor is constructed according to this outlier, which is then used to lower the filter gain dynamically to decrease the influence of outliers. The simulation results demonstrate that the observation outlier processing method does not affect normal values under normal circumstances; it can also discriminate between single and continuous outliers without errors or omissions. The impact time of outliers is greatly reduced, and the system performance is improved by more than 90%. Experimental results indicate that the proposed methods are effective in suppressing GNSS observation outliers in integrated navigation.
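The chi-square gate on the innovation vector can be made concrete as below. The normalized innovation squared is compared against a chi-square critical value; the gain-reduction factor returned here is an illustrative construction, not the paper's exact scale factor.

```python
import numpy as np
from scipy.stats import chi2

def innovation_gate(innovation, S, alpha=0.01):
    """Chi-square test on a Kalman innovation; returns (is_outlier, scale).

    If the normalized innovation squared exceeds the chi-square critical
    value at level alpha, the observation is treated as an outlier and a
    gain-reduction factor < 1 is returned (an illustrative choice, not
    the paper's exact construction).
    """
    nu = np.atleast_1d(innovation)
    m = len(nu)
    nis = float(nu @ np.linalg.solve(S, nu))  # normalized innovation squared
    threshold = chi2.ppf(1 - alpha, df=m)
    if nis <= threshold:
        return False, 1.0                     # normal: keep the full gain
    return True, threshold / nis              # outlier: shrink the gain

S = np.eye(2) * 0.5                           # innovation covariance
print(innovation_gate(np.array([0.1, -0.2]), S))  # small innovation: normal
print(innovation_gate(np.array([5.0, 4.0]), S))   # large innovation: flagged
```

In a full filter, the returned scale would multiply the Kalman gain for that update, so normal observations pass through untouched.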
APA, Harvard, Vancouver, ISO, and other styles
27

Zhou, Shi Bo, and Wei Xiang Xu. "Local Outlier Detection Algorithm Based on Coefficient of Variation." Applied Mechanics and Materials 635-637 (September 2014): 1723–28. http://dx.doi.org/10.4028/www.scientific.net/amm.635-637.1723.

Full text
Abstract:
Local outlier detection is an important issue in data mining. By analyzing the limitations of existing outlier detection algorithms, a local outlier detection algorithm based on the coefficient of variation is introduced. The algorithm applies k-means, which is strong in outlier searching, to divide the data set into sections, puts outliers and their neighbouring clusters into a local neighbourhood, and then computes the local deviation factor of each local neighbourhood using the coefficient of variation; as a result, local outliers are more likely to be found. Theoretical analysis and experimental results indicate that the method is effective and efficient.
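The paper's exact local deviation factor is not reproduced here; the sketch below follows the same outline under assumed rules: cluster with k-means, then within each cluster flag points whose distance to the centroid exceeds mean × (1 + k × CV), where CV = std / mean is the coefficient of variation of the cluster's distances.

```python
import numpy as np
from sklearn.cluster import KMeans

def cv_local_outliers(X, n_clusters=3, k=2.0, random_state=0):
    """Per-cluster outlier flags using the coefficient of variation (CV).

    Illustrative rule, not the paper's exact local deviation factor:
    a point is flagged when its distance to its centroid exceeds
    mean * (1 + k * CV) of that cluster's distances.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10,
                random_state=random_state).fit(X)
    d = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    flags = np.zeros(len(X), dtype=bool)
    for c in range(n_clusters):
        mask = km.labels_ == c
        dc = d[mask]
        mean = dc.mean()
        cv = dc.std() / mean if mean > 0 else 0.0
        flags[mask] = dc > mean * (1 + k * cv)
    return flags

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2))
               for c in ((0, 0), (5, 0), (0, 5))]
              + [np.array([[2.5, 2.5]])])   # one stray point between clusters
print(np.flatnonzero(cv_local_outliers(X)))  # index 150 should be flagged
```

Because the CV normalizes spread by scale, tight and loose clusters get comparable relative thresholds.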
APA, Harvard, Vancouver, ISO, and other styles
28

Rajalakshmi, P., and P. Geetha. "Detection of Outliers through Influence Function on Affinity." Mapana - Journal of Sciences 6, no. 2 (November 30, 2007): 34–44. http://dx.doi.org/10.12723/mjs.11.2.

Full text
Abstract:
Outliers are the atypical observations that lie at abnormal distances from the other observations in a random sample. Such outliers are often seen as contaminating the data. In general, the rejection of influential outliers improves the accuracy of the estimators and so the results with the identification of outliers have become the most important aspect in any data analysis. Outlier detection finds many applications in the areas such as data cleaning, fraud detection, network intrusion, pharmaceutical research and exploration in science data buses. The distance based outlier detection is the most commonly used method. In this paper, the influence function for affinity is explained and the detection of outliers in classification problems using influence function for affinity is illustrated for univariate data through a few examples.
APA, Harvard, Vancouver, ISO, and other styles
29

Steinbuss, Georg, and Klemens Böhm. "Benchmarking Unsupervised Outlier Detection with Realistic Synthetic Data." ACM Transactions on Knowledge Discovery from Data 15, no. 4 (June 2021): 1–20. http://dx.doi.org/10.1145/3441453.

Full text
Abstract:
Benchmarking unsupervised outlier detection is difficult. Outliers are rare, and existing benchmark data contains outliers with various and unknown characteristics. Fully synthetic data usually consists of outliers and regular instances with clear characteristics and thus allows for a more meaningful evaluation of detection methods in principle. Nonetheless, there have only been few attempts to include synthetic data in benchmarks for outlier detection. This might be due to the imprecise notion of outliers or to the difficulty to arrive at a good coverage of different domains with synthetic data. In this work, we propose a generic process for the generation of datasets for such benchmarking. The core idea is to reconstruct regular instances from existing real-world benchmark data while generating outliers so that they exhibit insightful characteristics. We propose and describe a generic process for the benchmarking of unsupervised outlier detection, as sketched so far. We then describe three instantiations of this generic process that generate outliers with specific characteristics, like local outliers. To validate our process, we perform a benchmark with state-of-the-art detection methods and carry out experiments to study the quality of data reconstructed in this way. Next to showcasing the workflow, this confirms the usefulness of our proposed process. In particular, our process yields regular instances close to the ones from real data. Summing up, we propose and validate a new and practical process for the benchmarking of unsupervised outlier detection.
APA, Harvard, Vancouver, ISO, and other styles
30

Vasuki, C. "OUTLIER DETECTION." International Scientific Journal of Engineering and Management 03, no. 03 (March 23, 2024): 1–9. http://dx.doi.org/10.55041/isjem01499.

Full text
Abstract:
To find observations that differ considerably from the bulk of the data points, outlier detection is an essential task in data analysis. To put it more simply, outliers are individual data points that stand out from the rest of the dataset. The Iris dataset, a machine learning benchmark, is used for outlier detection. The Rank SVM method, which is typically used for ranking tasks but has been adapted for outlier identification, is used to find outliers. The dataset, which consists of measurements of iris flower sepal and petal dimensions, is pre-processed by standardizing the features. The standardized dataset is used to train a Rank SVM model. According to the model's categorization, outliers are predicted to be labeled as -1 and inliers as 1. The study shows the indices of outliers found in the Iris dataset and sheds light on how well Rank SVM works for outlier identification tasks. Keywords: outlier detection, iris dataset, human behavior, machine learning
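The abstract's Rank SVM adaptation is not available off the shelf; as a sketch of the same pipeline (standardize Iris, train an SVM-family one-class model, read off -1/1 labels), scikit-learn's OneClassSVM is used as a stand-in.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

X = load_iris().data
X_std = StandardScaler().fit_transform(X)   # standardize the four features

# OneClassSVM stands in for the abstract's adapted Rank SVM; nu roughly
# bounds the fraction of training points labeled as outliers
# (an illustrative choice).
model = OneClassSVM(nu=0.05, gamma="scale").fit(X_std)
labels = model.predict(X_std)               # -1 = outlier, 1 = inlier

print("outlier indices:", np.flatnonzero(labels == -1))
```

The -1/-vs-1 convention matches the labeling described in the abstract, so the flagged indices can be reported directly.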
APA, Harvard, Vancouver, ISO, and other styles
31

Zhang, Yun, Bin Yang, Xi Zhao, Shiqian Wu, Bin Luo, and Liangpei Zhang. "Outlier Detection by Energy Minimization in Quantized Residual Preference Space for Geometric Model Fitting." Electronics 13, no. 11 (May 28, 2024): 2101. http://dx.doi.org/10.3390/electronics13112101.

Full text
Abstract:
Outliers significantly impact the accuracy of geometric model fitting. Previous approaches to handling outliers have involved threshold selection and scale estimation. However, many scale estimators assume that the inlier distribution follows a Gaussian model, which often does not accurately represent cases in geometric model fitting. Outliers, defined as points with large residuals to all true models, exhibit similar characteristics to high values in quantized residual preferences, thus causing outliers to cluster away from inliers in quantized residual preference space. In this paper, we leverage this consensus among outliers in quantized residual preference space by extending energy minimization to combine model error and spatial smoothness for outlier detection. The outlier detection process based on energy minimization follows an alternate sampling and labeling framework. Subsequently, an ordinary energy minimization method is employed within the same framework to optimize the inlier labels. Experimental results demonstrate that the energy minimization-based outlier detection method effectively identifies most outliers in the data. Additionally, the proposed energy minimization-based inlier segmentation accurately segments inliers into different models. Overall, the performance of the proposed method surpasses that of most state-of-the-art methods.
APA, Harvard, Vancouver, ISO, and other styles
32

Hekimoglu, S., B. Erdogan, and R. C. Erenoglu. "A New Outlier Detection Method Considering Outliers As Model Errors." Experimental Techniques 39, no. 1 (November 26, 2012): 57–68. http://dx.doi.org/10.1111/j.1747-1567.2012.00876.x.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Ahmar, Ansari Saleh, Suryo Guritno, Abdurakhman, Abdul Rahman, Awi, Alimuddin, Ilham Minggi, et al. "Modeling Data Containing Outliers using ARIMA Additive Outlier (ARIMA-AO)." Journal of Physics: Conference Series 954 (January 2018): 012010. http://dx.doi.org/10.1088/1742-6596/954/1/012010.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Lalitha, S., and Nirpeksh Kumar. "Multiple outlier test for upper outliers in an exponential sample." Journal of Applied Statistics 39, no. 6 (June 2012): 1323–30. http://dx.doi.org/10.1080/02664763.2011.645158.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Yang, Jiawei, Xu Tan, and Sylwan Rahardja. "MiPo: How to Detect Trajectory Outliers with Tabular Outlier Detectors." Remote Sensing 14, no. 21 (October 27, 2022): 5394. http://dx.doi.org/10.3390/rs14215394.

Full text
Abstract:
Trajectory outlier detection is one of the fundamental data mining techniques used to analyze the trajectory data of the Global Positioning System. A comprehensive literature review of trajectory outlier detectors published between 2000 and 2022 led to a conclusion that conventional trajectory outlier detectors suffered from drawbacks, either due to the detectors themselves or the pre-processing methods for the variable-length trajectory inputs utilized by detectors. To address these issues, we proposed a feature extraction method called middle polar coordinates (MiPo). MiPo extracted tabular features from trajectory data prior to the application of conventional outlier detectors to detect trajectory outliers. By representing variable-length trajectory data as fixed-length tabular data, MiPo granted tabular outlier detectors the ability to detect trajectory outliers, which was previously impossible. Experiments with real-world datasets showed that MiPo outperformed all baseline methods with 0.99 AUC on average; however, it only required approximately 10% of the computing time of the existing industrial best. MiPo exhibited linear time and space complexity. The features extracted by MiPo may aid other trajectory data mining tasks. We believe that MiPo has the potential to revolutionize the field of trajectory outlier detection.
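The exact MiPo feature definitions are not reproduced in the abstract; the sketch below captures only the core idea under assumed features: take the trajectory's middle point as the pole, convert every point to polar coordinates about it, and summarize both channels with fixed-length statistics, so trajectories of any length become equal-width table rows for a tabular detector.

```python
import numpy as np

def mipo_like_features(trajectory):
    """Fixed-length tabular features from a variable-length 2-D trajectory.

    Illustrative stand-in for MiPo: the middle point of the trajectory is
    the pole, every point is converted to polar coordinates (radius,
    angle) about it, and both channels are summarized with simple
    statistics. The exact MiPo feature definitions differ.
    """
    t = np.asarray(trajectory, dtype=float)
    pole = t[len(t) // 2]                    # middle point as the pole
    rel = t - pole
    r = np.hypot(rel[:, 0], rel[:, 1])
    theta = np.arctan2(rel[:, 1], rel[:, 0])
    return np.array([r.mean(), r.std(), r.max(),
                     theta.mean(), theta.std()])

# Two trajectories of different lengths map to equal-length rows.
t1 = [(0, 0), (1, 0), (2, 0), (3, 0)]
t2 = [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]
table = np.vstack([mipo_like_features(t) for t in (t1, t2)])
print(table.shape)  # → (2, 5)
```

Once trajectories are rows of a fixed-width table, any conventional tabular outlier detector can score them directly, which is the point the abstract makes.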
APA, Harvard, Vancouver, ISO, and other styles
36

Berki, S. E., and Nancy B. Schneier. "Frequency and Cost of Diagnosis-Related Group Outliers Among Newborns." Pediatrics 79, no. 6 (June 1, 1987): 874–81. http://dx.doi.org/10.1542/peds.79.6.874.

Full text
Abstract:
Analysis of outliers, as defined by the Health Care Financing Administration, among 47,776 newborns discharged from 33 short-term hospitals in Maryland in 1981 shows that the three prematurity diagnosis-related groups (DRGs) (386 to 388) represented only 5.3% of all discharges of newborns, but more than one fifth of all outliers and more than three fifths of outlier days of care for newborns. The disparity in charges for outliers and inliers (not exceeding the "trim point") is even more dramatic. Newborns with "extreme immaturity" (DRG 386) and "prematurity with major problems" (DRG 387) together accounted for less than 3% of all newborn discharges but for nearly one fourth of all outlier discharges. The mean length of stay in hospitals for outliers in those two DRGs was more than 2 months. The mean charge per outlier discharge in DRG 386 was $27,061 in 1981. Nearly one third of the discharges and more than two thirds of the days of care in this DRG were for outliers. Outliers occurred up to five times more often among premature neonates than among normal newborns and occurred preponderantly in teaching hospitals, especially those with more than 400 beds. This finding may require a reevaluation of the outlier trim points and the reimbursement method for newborn DRGs to assure adequate payment to the providers of neonatal intensive care, mainly large teaching hospitals.
APA, Harvard, Vancouver, ISO, and other styles
37

LI, SHUKAI, and WEE KEONG NG. "MAXIMUM VOLUME OUTLIER DETECTION AND ITS APPLICATIONS IN CREDIT RISK ANALYSIS." International Journal on Artificial Intelligence Tools 22, no. 05 (October 2013): 1360012. http://dx.doi.org/10.1142/s0218213013600129.

Full text
Abstract:
Because of the scarcity and diversity of outliers, it is very difficult to design a robust outlier detector. In this paper, we first propose to use the maximum margin criterion to sift unknown outliers, which demonstrates superior performance. However, the resultant learning task is formulated as a Mixed Integer Programming (MIP) problem, which is computationally hard. Therefore, we alter the recently developed label generating technique, which efficiently solves a convex relaxation of the MIP problem of outlier detection. Specifically, we propose an effective procedure to find a largely violated labeling vector for identifying rare outliers from abundant normal patterns, and its convergence is also presented. Then, a set of largely violated labeling vectors are combined by multiple kernel learning methods to robustly detect outliers. Besides these, in order to further enhance the efficacy of our outlier detector, we also explore the use of maximum volume criterion to measure the quality of separation between outliers and normal patterns. This criterion can be easily incorporated into our proposed framework by introducing an additional regularization. Comprehensive experiments on toy and real-world data sets verify that the outlier detectors using the two proposed criteria outperform existing outlier detection methods. Furthermore, our models are employed to detect corporate credit risk and demonstrate excellent performance.
APA, Harvard, Vancouver, ISO, and other styles
38

Lee, Kyuman, and Eric N. Johnson. "Robust Outlier-Adaptive Filtering for Vision-Aided Inertial Navigation." Sensors 20, no. 7 (April 4, 2020): 2036. http://dx.doi.org/10.3390/s20072036.

Full text
Abstract:
With the advent of unmanned aerial vehicles (UAVs), a major area of interest in the research field of UAVs has been vision-aided inertial navigation systems (V-INS). In the front-end of V-INS, image processing extracts information about the surrounding environment and determines features or points of interest. With the extracted vision data and inertial measurement unit (IMU) dead reckoning, the most widely used algorithm for estimating vehicle and feature states in the back-end of V-INS is an extended Kalman filter (EKF). An important assumption of the EKF is Gaussian white noise. In fact, measurement outliers that arise in various realistic conditions are often non-Gaussian. A lack of compensation for unknown noise parameters often leads to a serious impact on the reliability and robustness of these navigation systems. To compensate for uncertainties of the outliers, we require modified versions of the estimator or the incorporation of other techniques into the filter. The main purpose of this paper is to develop accurate and robust V-INS for UAVs, in particular for situations involving such unknown outliers. Feature correspondence in the image processing front-end rejects vision outliers, and then a statistical test in the filtering back-end detects the remaining outliers of the vision data. For frequent outlier occurrences, a variational approximation for Bayesian inference derives a way to compute the optimal noise precision matrices of the measurement outliers. The overall process of outlier removal and adaptation is referred to here as "outlier-adaptive filtering". Even though almost all approaches of V-INS remove outliers by some method, few researchers have treated outlier adaptation in V-INS in much detail. Here, results from flight datasets validate the improved accuracy of V-INS employing the proposed outlier-adaptive filtering framework.
APA, Harvard, Vancouver, ISO, and other styles
39

Yang, Zi Rong, and Zhen Zeng. "Outlier Analysis in Large Sample and High Dimensional Data Based on Feature Weighting." Applied Mechanics and Materials 571-572 (June 2014): 650–57. http://dx.doi.org/10.4028/www.scientific.net/amm.571-572.650.

Full text
Abstract:
The usual method of outlier analysis examines outliers according to the Anomaly Index and the Variable Contribution Measurement, but for large samples of high-dimensional data this method is difficult. Owing to this, this paper presents a method in which weight values for outlier features are introduced. The features of outliers are weighted by the Analytic Hierarchy Process (AHP). Through this method, the importance of each property of an outlier for the data mining target is quantified; that is, the weight of each property is calculated. The correlation values, which represent the degree of relevance between outliers and the data mining target, are then calculated by multiplying each weight by the corresponding property value. After the correlation values are computed, they are sorted from high to low, making outlier analysis more efficient. At the end of this paper, an instance is presented to demonstrate the practicality and feasibility of the method.
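The AHP weighting step itself is not reproduced here; the sketch below shows only the scoring stage, with made-up outlier property values and made-up AHP-style weights: each outlier's correlation value is the weighted sum of its property values, and outliers are then ranked from high to low.

```python
import numpy as np

# Hypothetical outliers with three properties and AHP-derived weights
# (both the values and the weights are invented for illustration).
outliers = np.array([[0.9, 0.2, 0.5],
                     [0.1, 0.8, 0.3],
                     [0.7, 0.7, 0.9]])
weights = np.array([0.5, 0.3, 0.2])      # from AHP; should sum to 1
assert np.isclose(weights.sum(), 1.0)

# Correlation value = sum over properties of weight * property value.
corr = outliers @ weights

# Rank outliers from most to least relevant to the data mining target.
order = np.argsort(corr)[::-1]
print(corr.round(3), order)              # → [0.61 0.35 0.74] [2 0 1]
```

Analysts then work down the ranked list, which is the efficiency gain the abstract describes.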
APA, Harvard, Vancouver, ISO, and other styles
40

Rajalakshmi, S., and P. Madhubala. "Certain Investigation on Perpetualistic Fuzzy Outlier Data for Efficiency Evaluation of Centroid Stability with Cluster Boundary Fitness." Data Analytics and Artificial Intelligence 3, no. 2 (January 1, 2023): 16–20. http://dx.doi.org/10.46632/daai/3/2/4.

Full text
Abstract:
This paper aims to investigate certain factors that hide outliers in two dimensions, namely boundary partitioning and space angular parameters. In the proposed algorithm's boundary representation of clusters, the data points that lie on a cluster boundary are stored geometrically as coordinate values, such as i_bound (inliers) and o_bound (outliers). Outliers present in the dataset are investigated by boundary fitness over centroid stability. In this paper we examine whether a data point lying on the boundary should be treated as an inlier or an outlier. Several iterations are performed to pin down the outlier point precisely. Using fuzzy clustering, data points are clustered and the boundary is fixed. If the space occupied by the cluster varies between iterations, the distance from inlier to outlier between the boundaries is calculated. After calculation, if the data point is below the threshold value, it is treated as an outlier. Our proposed method shows efficiency on evaluation metrics of outlier detection performance.
APA, Harvard, Vancouver, ISO, and other styles
41

Liu, Zhicheng, Yang Zhang, Ruihong Huang, Zhiwei Chen, Shaoxu Song, and Jianmin Wang. "EXPERIENCE: Algorithms and Case Study for Explaining Repairs with Uniform Profiles over IoT Data." Journal of Data and Information Quality 13, no. 3 (April 27, 2021): 1–17. http://dx.doi.org/10.1145/3436239.

Full text
Abstract:
IoT data with timestamps are often found with outliers, such as GPS trajectories or sensor readings. While existing systems mostly focus on detecting temporal outliers without explanations and repairs, a decision maker may be more interested in the cause of the outlier appearance such that subsequent actions would be taken, e.g., cleaning unreliable readings or repairing broken devices or adopting a strategy for data repairs. Such outlier detection, explanation, and repairs are expected to be performed in either offline (batch) or online modes (over streaming IoT data with timestamps). In this work, we present TsClean, a new prototype system for detecting and repairing outliers with explanations over IoT data. The framework defines uniform profiles to explain the outliers detected by various algorithms, including the outliers with variant time intervals, and take approaches to repair outliers. Both batch and streaming processing are supported in a uniform framework. In particular, by varying the block size, it provides a tradeoff between computing the accurate results and approximating with efficient incremental computation. In this article, we present several case studies of applying TsClean in industry, e.g., how this framework works in detecting and repairing outliers over excavator water temperature data, and how to get reasonable explanations and repairs for the detected outliers in tracking excavators.
APA, Harvard, Vancouver, ISO, and other styles
42

Yuliatin, Umi. "DETEKSI OUTLIERS DAN ANALISIS INTERVENSI DALAM MODEL ARMA." MAp (Mathematics and Applications) Journal 4, no. 1 (June 30, 2022): 76–84. http://dx.doi.org/10.15548/map.v4i1.4279.

Full text
Abstract:
The presence of outliers in time series analysis obscures the parameter estimates of a given model. Outliers also inflate the magnitude of the error. In time series analysis, additive outliers (AO) and innovational outliers (IO) were introduced as a way of modeling outliers. This addresses observations that disrupt the data pattern and thus helps to build a sound time series model, especially in ARMA processes. A least squares error (LSE) estimator is used to estimate the magnitude of the deviation from the base model. An iterative procedure is described for detecting both types of outliers. Intervention analysis is also introduced to accommodate external events as exogenous variables in the ARMA process. The combination of outlier and intervention analysis can then be used as a unified analysis for handling data far from the center. The simulation data in this case are the PDRB (gross regional domestic product) data of D.I. Yogyakarta for the mining and quarrying sector. The analysis shows that outlier detection within the model yields a smaller sum of squared errors than the model without outlier detection, so that a better model is obtained.
APA, Harvard, Vancouver, ISO, and other styles
43

Leng, Yong Lin, Hua Shen, and Fu Yu Lu. "Outlier Detection Clustering Algorithm Based on Density." Applied Mechanics and Materials 713-715 (January 2015): 1808–12. http://dx.doi.org/10.4028/www.scientific.net/amm.713-715.1808.

Full text
Abstract:
K-means is a classic clustering algorithm widely applied in various data mining fields. The traditional k-means algorithm selects the initial centroids randomly, so the clustering result is affected by noise points and is not stable. For this problem, this paper proposes a k-means algorithm based on density outlier detection. The algorithm first detects the outliers with a density model, avoiding the selection of outliers as initial cluster centers. After clustering the non-outlier points, the algorithm assigns each outlier to the corresponding cluster according to its distance to each centroid. The algorithm effectively reduces the influence of outliers on k-means and improves the accuracy of the clustering result. The experimental results demonstrate that this algorithm can effectively improve the accuracy and stability of the clustering.
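A minimal sketch of this pipeline, with one substitution stated plainly: a simple k-NN distance rule stands in for the paper's density model. Low-density points are flagged first, k-means runs on the rest so the centroids are not dragged by noise, and each outlier is then assigned to its nearest centroid.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(3)
X = np.vstack([rng.normal((0, 0), 0.3, size=(60, 2)),
               rng.normal((4, 4), 0.3, size=(60, 2)),
               np.array([[2.0, 2.0], [9.0, 9.0]])])  # two sparse points

# Density proxy: mean distance to the 5 nearest neighbors; points far
# above the typical value are treated as outliers (illustrative rule,
# standing in for the paper's density model).
nn = NearestNeighbors(n_neighbors=6).fit(X)          # self + 5 neighbors
dists, _ = nn.kneighbors(X)
density = dists[:, 1:].mean(axis=1)
is_out = density > density.mean() + 2 * density.std()

# Cluster the non-outliers only, so centroids stay on the dense regions.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X[~is_out])

# Assign each outlier to its nearest centroid afterwards.
labels = np.empty(len(X), dtype=int)
labels[~is_out] = km.labels_
labels[is_out] = km.predict(X[is_out])
print(np.flatnonzero(is_out), labels[is_out])
```

Fitting on the clean subset and predicting the outliers afterwards is exactly the ordering the abstract describes.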
APA, Harvard, Vancouver, ISO, and other styles
44

Park, Sangsu, No-Suk Park, Seong-su Kim, Gwirae Jo, and Sukmin Yoon. "Outlier Detection of Water Quality Data Using Ensemble Empirical Mode Decomposition." Journal of Korean Society of Environmental Engineers 43, no. 3 (April 7, 2021): 160–70. http://dx.doi.org/10.4491/ksee.2021.43.3.160.

Full text
Abstract:
Objectives: This study was conducted to propose a new methodology for efficiently identifying and removing various outliers that occur in data collected through automated water quality monitoring systems. The performance of the proposed methodology was tested on water temperature data collected from the domestic G_water supply system. Methods: We applied the following analytical procedure to identify outliers in the water quality data. First, a normality test was performed on the collected data. If the normality condition was satisfied, the Z-score was used; if not, outliers were identified using quartiles, and the limitations of the existing methodology were analyzed. Second, we decomposed the collected data into intrinsic mode functions using empirical mode decomposition and ensemble empirical mode decomposition, and then considered the occurrence of mode mixing. Finally, a group of intrinsic mode functions was selected using statistical characteristics to identify outliers. In addition, the performance of the method was verified after removing and interpolating outliers using regression analysis and Cook's distance. Results and Discussion: Since the water temperature data did not satisfy the normality condition, outlier identification was carried out by applying the modified quartile method, which could not identify outliers distributed within the seasonal component at all. In the case of empirical mode decomposition, mode mixing occurred because of the effect of outliers. With ensemble empirical mode decomposition, however, the mode mixing was resolved and the distinct seasonal components were decomposed as intrinsic mode functions. The intrinsic mode functions were synthesized, showing statistical correlation with the raw water temperature data. By developing a regression model using the synthesized intrinsic mode functions and the raw water temperature data and performing an outlier search based on Cook's distance, we concluded that various outliers distributed within the seasonal component could be effectively identified. Conclusions: Considering that satisfactory results could be derived from statistical analysis of the data collected from the automated water quality monitoring system, outlier identification procedures are essential. However, the conventional univariate outlier search method performs significantly poorly for data with strong inherent variability, and interpolation of the searched outliers cannot be performed. Conversely, the outlier identification method based on ensemble empirical mode decomposition and regression analysis proposed in this study shows excellent discrimination performance for outliers distributed in data with strong inherent variability. Moreover, this method reduces the analyst's dependence on subjective judgment by presenting statistical cutoff criteria. An additional advantage is that data can be interpolated after removing outliers using the intrinsic mode functions. Therefore, the outlier search and interpolation method proposed in this study is expected to have greater applicability as a more effective analysis tool than the existing univariate outlier search method.
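The final step, flagging outliers in a fitted regression by Cook's distance, can be sketched with plain numpy. The 4/n cutoff below is the common rule of thumb, not necessarily the study's criterion, and the data are synthetic.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for an OLS fit y ~ X (X includes the intercept column)."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix
    h = np.diag(H)                             # leverages
    resid = y - H @ y
    s2 = resid @ resid / (n - p)               # residual variance
    return (resid**2 / (p * s2)) * h / (1 - h) ** 2

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 50)
y = 2.0 * x + rng.normal(0, 0.5, size=50)
y[25] += 8.0                                   # inject one outlier
X = np.column_stack([np.ones_like(x), x])

D = cooks_distance(X, y)
print(np.flatnonzero(D > 4 / len(x)))          # common 4/n rule of thumb
```

A point's Cook's distance grows with both its residual and its leverage, so it catches influential observations that a residual cutoff alone would miss.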
APA, Harvard, Vancouver, ISO, and other styles
45

van der Spoel, Evie, Jungyeon Choi, Ferdinand Roelfsema, Saskia le Cessie, Diana van Heemst, and Olaf M. Dekkers. "Comparing Methods for Measurement Error Detection in Serial 24-h Hormonal Data." Journal of Biological Rhythms 34, no. 4 (June 12, 2019): 347–63. http://dx.doi.org/10.1177/0748730419850917.

Full text
Abstract:
Measurement errors commonly occur in 24-h hormonal data and may affect the outcomes of such studies. Measurement errors often appear as outliers in such data sets; however, no well-established method is available for their automatic detection. In this study, we aimed to compare performances of different methods for outlier detection in hormonal serial data. Hormones (glucose, insulin, thyroid-stimulating hormone, cortisol, and growth hormone) were measured in blood sampled every 10 min for 24 h in 38 participants of the Leiden Longevity Study. Four methods for detecting outliers were compared: (1) eyeballing, (2) Tukey’s fences, (3) stepwise approach, and (4) the expectation-maximization (EM) algorithm. Eyeballing detects outliers based on experts’ knowledge, and the stepwise approach incorporates physiological knowledge with a statistical algorithm. Tukey’s fences and the EM algorithm are data-driven methods, using interquartile range and a mathematical algorithm to identify the underlying distribution, respectively. The performance of the methods was evaluated based on the number of outliers detected and the change in statistical outcomes after removing detected outliers. Eyeballing resulted in the lowest number of outliers detected (1.0% of all data points), followed by Tukey’s fences (2.3%), the stepwise approach (2.7%), and the EM algorithm (11.0%). In all methods, the mean hormone levels did not change materially after removing outliers. However, their minima were affected by outlier removal. Although removing outliers affected the correlation between glucose and insulin on the individual level, when averaged over all participants, none of the 4 methods influenced the correlation. Based on our results, the EM algorithm is not recommended given the high number of outliers detected, even where data points are physiologically plausible. 
Since Tukey’s fences is not suitable for all types of data and eyeballing is time-consuming, we recommend the stepwise approach for outlier detection, which combines physiological knowledge with an automated process.
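Among the four compared methods, Tukey’s fences is the simplest to automate. A minimal sketch, using illustrative numbers rather than the study’s hormone series:

```python
import numpy as np

def tukey_fences(x, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
mask = tukey_fences(values)    # only the last value lies beyond the fences
```

The k = 1.5 multiplier is the conventional default; widening it to 3 flags only extreme outliers.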
APA, Harvard, Vancouver, ISO, and other styles
46

Tian, Chao, and Peng-Lang Shui. "Outlier-Robust Truncated Maximum Likelihood Parameter Estimation of Compound-Gaussian Clutter with Inverse Gaussian Texture." Remote Sensing 14, no. 16 (August 17, 2022): 4004. http://dx.doi.org/10.3390/rs14164004.

Full text
Abstract:
Compound-Gaussian distributions with inverse Gaussian textures, referred to as IGCG distributions, are often used to model the amplitude of moderate/high-resolution sea clutter. In moderate/high-resolution maritime radars, estimating the parameters of the IGCG distribution from radar returns plays an important role in adaptive target detection. Because high-amplitude outliers from targets and reefs are inevitably present in radar returns, parameter estimation must be outlier-robust. In this paper, an outlier-robust truncated maximum likelihood (TML) estimation method is proposed to mitigate the effect of high-amplitude outliers in the data. The data are first truncated by removing a given percentage of the largest-amplitude samples. From the truncated data, the truncated likelihood function is constructed, and its maximum yields the TML estimates of the scale and inverse shape parameters. Further, an iterative algorithm is presented to obtain the TML estimates from data with outliers, extending the ML estimation method to the case where the data contain outliers. In comparison with outlier-sensitive estimation methods and outlier-robust bipercentile estimation methods, the TML estimation method performs close to the best ML estimation method when the data contain no outliers, and better when they do.
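The IGCG likelihood itself is cumbersome to reproduce here, so the sketch below shows only the truncation mechanics on a simpler Rayleigh amplitude model. The Rayleigh stand-in, the 5% trim, and the grid search are all assumptions of this example, not the paper’s method: drop the largest samples, then maximize the likelihood renormalized to the kept range [0, c].

```python
import numpy as np

def truncated_ml_rayleigh(x, trim=0.05):
    """Truncated ML estimate of a Rayleigh scale: discard the largest
    `trim` fraction, then maximize the likelihood renormalized to [0, c]."""
    c = np.quantile(x, 1.0 - trim)                 # truncation threshold
    kept = x[x <= c]
    m = len(kept)
    sigmas = np.linspace(0.5, 6.0, 551)            # coarse 1-D grid search
    nll = [2 * m * np.log(s) + (kept**2).sum() / (2 * s**2)
           - np.log(kept).sum()
           + m * np.log(1.0 - np.exp(-c**2 / (2 * s**2)))
           for s in sigmas]
    return sigmas[int(np.argmin(nll))]

rng = np.random.default_rng(1)
clean = rng.rayleigh(2.0, 2000)                    # true scale sigma = 2
spikes = 20.0 + rng.standard_normal(100)           # high-amplitude outliers
x = np.concatenate([clean, spikes])
naive = np.sqrt((x**2).mean() / 2)                 # plain MLE, outlier-sensitive
robust = truncated_ml_rayleigh(x)                  # close to the true scale
```

Truncation removes the contaminating samples, and the renormalization term keeps the estimator roughly unbiased despite the missing upper tail.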
APA, Harvard, Vancouver, ISO, and other styles
47

Matkan, A. A., M. Hajeb, B. Mirbagheri, S. Sadeghian, and M. Ahmadi. "SPATIAL ANALYSIS FOR OUTLIER REMOVAL FROM LIDAR DATA." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XL-2/W3 (October 22, 2014): 187–90. http://dx.doi.org/10.5194/isprsarchives-xl-2-w3-187-2014.

Full text
Abstract:
Outlier detection in LiDAR point clouds is a necessary step before subsequent modelling. Many studies have addressed the removal of outliers from LiDAR data, but some of the existing algorithms require ancillary data, such as topographic maps, multiple laser returns, or intensity data, which may not be available, and some deal only with single isolated outliers. This paper presents an algorithm that removes both single and clustered outliers using exclusively the last-return data. Outliers are removed by spatial analysis of the LiDAR point cloud in a hierarchical scheme that uses a cross-validation technique. The algorithm is tested on a dataset containing many single and clustered outliers, and it can handle both irregular LiDAR point clouds and regular grid data. Experimental results show that the presented algorithm detects almost all single and clustered outliers, although some inlier points are wrongly removed as outliers. An accuracy assessment indicated a type α error of 0.018% and a type β error of 0.352%, which is very satisfactory.
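The paper’s hierarchical cross-validation scheme is not spelled out in the abstract; the snippet below is only a simplified neighbourhood filter illustrating the underlying spatial-analysis idea (the k and threshold values are arbitrary choices for this example):

```python
import numpy as np

def flag_elevation_outliers(points, k=8, thresh=5.0):
    """Flag points whose height deviates from the median height of their
    k nearest planimetric neighbours by more than thresh (brute force)."""
    xy, z = points[:, :2], points[:, 2]
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                    # exclude the point itself
    nn = np.argsort(d, axis=1)[:, :k]              # k nearest neighbours in xy
    return np.abs(z - np.median(z[nn], axis=1)) > thresh

# Hypothetical last-return point cloud: a flat 10x10 grid with one spike.
rng = np.random.default_rng(3)
gx, gy = np.meshgrid(np.arange(10.0), np.arange(10.0))
pts = np.column_stack([gx.ravel(), gy.ravel(), 0.1 * rng.standard_normal(100)])
pts[0, 2] = 50.0                                   # injected elevation outlier
mask = flag_elevation_outliers(pts)
```

Using the neighbourhood median rather than the mean keeps the filter from being dragged toward the very outliers it is trying to detect.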
APA, Harvard, Vancouver, ISO, and other styles
48

Horn, Paul S., Lan Feng, Yanmei Li, and Amadeo J. Pesce. "Effect of Outliers and Nonhealthy Individuals on Reference Interval Estimation." Clinical Chemistry 47, no. 12 (December 1, 2001): 2137–45. http://dx.doi.org/10.1093/clinchem/47.12.2137.

Full text
Abstract:
Background: Improvement in reference interval estimation using a new outlier detection technique is examined, even with a physician-determined healthy sample. The effect of including physician-determined nonhealthy individuals in the sample is also evaluated. Methods: Traditional data transformation coupled with robust and exploratory outlier detection methodology was used in conjunction with various reference interval determination techniques. A simulation study examined the effects of outliers on known reference intervals, and physician-defined healthy groups with and without nonhealthy individuals were compared on real data. Results: With 5% outliers in simulated samples, the described outlier detection techniques yielded narrower reference intervals. Applied to real data, the technique produced reference intervals that were, on average, 10% narrower than those obtained without outlier detection. Only 1.6% of the samples were identified as outliers and removed from reference interval determination in both the healthy and combined samples. Conclusions: Even in healthy samples, outliers may exist. Combining traditional and robust statistical techniques provides a good method of identifying outliers in a reference interval setting. Laboratories in general do not have a well-defined healthy group from which to compute reference intervals; including nonhealthy individuals in the computation increases reference interval width by ∼10%, although there is large variation among analytes.
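A hedged sketch of the general recipe, robust screening followed by a nonparametric interval; the median/MAD score, the 3.5 cutoff, and the synthetic analyte values are assumptions of this example, not the paper’s exact transformation:

```python
import numpy as np

def reference_interval(x, z_cut=3.5):
    """Nonparametric 95% reference interval after removing samples whose
    robust z-score (median/MAD based) exceeds z_cut."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    kept = x[np.abs(0.6745 * (x - med) / mad) <= z_cut]
    return np.percentile(kept, [2.5, 97.5])

# Hypothetical analyte: a healthy sample contaminated by nonhealthy values.
rng = np.random.default_rng(4)
healthy = rng.normal(100.0, 10.0, 1000)
x = np.concatenate([healthy, np.full(30, 200.0)])
naive_hi = np.percentile(x, 97.5)   # upper limit without outlier screening
lo, hi = reference_interval(x)      # markedly narrower after screening
```

The median/MAD score resists the outliers it screens for, unlike a mean/SD z-score, which the contaminating values would inflate.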
APA, Harvard, Vancouver, ISO, and other styles
49

Kircher, Magdalena, Josefin Säurich, Michael Selle, and Klaus Jung. "Assessing Outlier Probabilities in Transcriptomics Data When Evaluating a Classifier." Genes 14, no. 2 (February 1, 2023): 387. http://dx.doi.org/10.3390/genes14020387.

Full text
Abstract:
Outliers in the training or test set used to fit and evaluate a classifier on transcriptomics data can considerably change the estimated performance of the model. Either a too pessimistic or a too optimistic accuracy is then reported, and the estimated model performance cannot be reproduced on independent data; it then also becomes doubtful whether the classifier qualifies for clinical usage. We estimate classifier performance in simulated gene expression data with artificial outliers and in two real-world datasets. As a new approach, we use two outlier detection methods within a bootstrap procedure to estimate the outlier probability for each sample and evaluate classifiers before and after outlier removal by means of cross-validation. We found that the removal of outliers changed the classification performance notably; for the most part, removing outliers improved the classification results. Taking into account that there are various, sometimes unclear, reasons for a sample to be an outlier, we strongly advocate always reporting the performance of a transcriptomics classifier with and without outliers in the training and test data. This provides a more diverse picture of a classifier’s performance and prevents reporting models that later turn out to be unsuitable for clinical diagnoses.
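The bootstrap outlier-probability idea can be sketched with a simple robust z-score detector standing in for the paper’s two detection methods (the detector, cutoff, and data below are assumptions of this example):

```python
import numpy as np

def bootstrap_outlier_prob(x, n_boot=200, z_cut=4.0, seed=0):
    """Per-sample outlier probability: the fraction of bootstrap resamples
    in which the sample appears and is flagged by a robust z-score test."""
    rng = np.random.default_rng(seed)
    n = len(x)
    flagged = np.zeros(n)
    seen = np.zeros(n)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                # bootstrap resample
        xb = x[idx]
        med = np.median(xb)
        mad = np.median(np.abs(xb - med))
        out = np.abs(0.6745 * (xb - med) / mad) > z_cut
        seen[np.unique(idx)] += 1                  # resamples containing i
        flagged[np.unique(idx[out])] += 1          # ... that also flag i
    return flagged / np.maximum(seen, 1)

rng = np.random.default_rng(5)
x = rng.standard_normal(100)
x[0] = 10.0                                        # injected outlier
prob = bootstrap_outlier_prob(x)
```

Averaging over resamples turns a binary in/out decision into a graded probability, which is what allows borderline samples to be inspected rather than silently dropped.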
APA, Harvard, Vancouver, ISO, and other styles
50

Nowak-Brzezińska, Agnieszka, and Igor Gaibei. "How the Outliers Influence the Quality of Clustering?" Entropy 24, no. 7 (June 30, 2022): 917. http://dx.doi.org/10.3390/e24070917.

Full text
Abstract:
In this article, we evaluate the efficiency and performance of two clustering algorithms: AHC (Agglomerative Hierarchical Clustering) and K-Means. We are aware that various linkage options and distance measures influence the clustering results. We assess the quality of clustering using the Davies–Bouldin and Dunn cluster validity indexes. The main contribution of this research is to verify whether the quality of clusters is higher without outliers than with outliers in the data. To do this, we compare and analyze outlier detection algorithms depending on the applied clustering algorithm. We use and compare the LOF (Local Outlier Factor) and COF (Connectivity-based Outlier Factor) algorithms for detecting outliers before and after removing 1%, 5%, and 10% of outliers, and then analyze how the quality of clustering improves. In the experiments, three real data sets with different numbers of instances were used.
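For intuition, LOF (one of the two detectors compared) admits a compact brute-force implementation; this sketch follows the textbook definition rather than any particular library:

```python
import numpy as np

def lof(X, k=5):
    """Minimal brute-force Local Outlier Factor (scores near 1 = inlier)."""
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(d, np.inf)
    knn = np.argsort(d, axis=1)[:, :k]             # k nearest neighbours
    kdist = d[np.arange(n), knn[:, -1]]            # k-distance of each point
    # reachability distance of p from neighbour o: max(kdist[o], d(p, o))
    reach = np.maximum(kdist[knn], d[np.arange(n)[:, None], knn])
    lrd = k / reach.sum(axis=1)                    # local reachability density
    return lrd[knn].mean(axis=1) / lrd             # LOF score

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), [[10.0, 10.0]]])
scores = lof(X)    # the appended far point receives a large LOF score
```

Because LOF compares each point’s density to its neighbours’ densities, it flags points that are isolated relative to their local cluster, even when global distance thresholds would miss them.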
APA, Harvard, Vancouver, ISO, and other styles
