Journal articles on the topic 'Outlier analyses'


1

Bhushan, A., M. H. Sharker, and H. A. Karimi. "INCREMENTAL PRINCIPAL COMPONENT ANALYSIS BASED OUTLIER DETECTION METHODS FOR SPATIOTEMPORAL DATA STREAMS." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences II-4/W2 (July 10, 2015): 67–71. http://dx.doi.org/10.5194/isprsannals-ii-4-w2-67-2015.

Abstract:
In this paper, we address outliers in spatiotemporal data streams obtained from sensors placed across geographically distributed locations. Outliers may appear in such sensor data for various reasons, such as instrumental error and environmental change. Real-time detection of these outliers is essential to prevent propagation of errors in subsequent analyses and results. Incremental Principal Component Analysis (IPCA) is one possible approach for detecting outliers in such spatiotemporal data streams. IPCA has been widely used in many real-time applications such as credit card fraud detection, pattern recognition, and image analysis. However, the suitability of applying IPCA for outlier detection in spatiotemporal data streams is unknown and needs to be investigated. To fill this research gap, this paper presents two new IPCA-based outlier detection methods and compares them with the existing IPCA-based outlier detection methods to assess their suitability for spatiotemporal sensor data streams.
2

Singal, J., G. Silverman, E. Jones, T. Do, B. Boscoe, and Y. Wan. "Machine Learning Classification to Identify Catastrophic Outlier Photometric Redshift Estimates." Astrophysical Journal 928, no. 1 (March 1, 2022): 6. http://dx.doi.org/10.3847/1538-4357/ac53b5.

Abstract:
We present results of using a basic binary classification neural network model to identify likely catastrophic outlier photometric redshift estimates of individual galaxies, based only on the galaxies’ measured photometric band magnitude values. We find that a simple implementation of this classification can identify a significant fraction of galaxies with catastrophic outlier photometric redshift estimates while falsely categorizing only a much smaller fraction of non-outliers. These methods have the potential to reduce the errors introduced into science analyses by catastrophic outlier photometric redshift estimates.
3

Svabova, Lucia, and Marek Durica. "Being an outlier: a company non-prosperity sign?" Equilibrium 14, no. 2 (June 30, 2019): 359–75. http://dx.doi.org/10.24136/eq.2019.017.

Abstract:
Research background: Financial distress and imminent bankruptcy are very difficult situations that the management of every company wants to avoid. For these reasons, prediction of company bankruptcy or financial distress has recently been a focus of economists and scientists in many countries around the world. Purpose of the article: Various financial indicators, mostly financial ratios, are usually used to predict financial distress. In order to create a strong prediction model and a statistically significant prediction of bankruptcy, it is advisable to use a deep statistical analysis of the data. In this paper, we analysed the real financial ratios of Slovak companies from the year 2017. In the phase of data preparation for further analysis, we checked the existence of outliers and found that there are some companies that are multivariate outliers because they are significantly different from other companies in the database. Thus, we focused closely on these outlying companies and analysed whether being an outlier is a sign of financial distress. Methods: We analysed whether there are many more non-prosperous companies in the set of outlier companies and whether their financial indicators are significantly different from those of the prosperous companies. For these analyses, we used statistical hypothesis tests, such as the test for equality of means and the chi-square test. Findings & Value added: The ratio of non-prosperous companies among the outliers is significantly higher than 50 % and the attributes of non-prosperity and being an outlier are dependent. The means of almost all financial ratios of prosperous and non-prosperous companies among outliers are significantly different.
4

Wu, Zifeng, Zhouxiang Wu, and Laurence R. Rilett. "Innovative Nonparametric Method for Data Outlier Filtering." Transportation Research Record: Journal of the Transportation Research Board 2674, no. 10 (September 18, 2020): 167–76. http://dx.doi.org/10.1177/0361198120945697.

Abstract:
Outlier filtering of empirical travel time data is essential for traffic analyses. Most of the widely applied outlier filtering algorithms are parametric in nature and based on assumed data distributions. The assumption, however, might not hold under unstable traffic conditions. This paper proposes a nonparametric outlier filtering method based on a robust locally weighted regression scatterplot smoothing model. The proposed method identifies outliers based on a data point’s standard residual in the robust local regression model. This approach fits a regression surface with no constraint on parametric distributions and limited influence from outliers. The proposed outlier filtering algorithm can be applied to various data collection technologies and for real-time applications. The performance of the new outlier filtering algorithm is compared with the moving standard deviation method and other traditional filtering algorithms. The test sites include GPS data of an Interstate highway in Indiana and Bluetooth data of an urban arterial roadway in Texas. It is shown that the proposed filtering algorithm has several advantages over the traditional filtering algorithms.
5

Bae, Inhyeok, and Un Ji. "Outlier Detection and Smoothing Process for Water Level Data Measured by Ultrasonic Sensor in Stream Flows." Water 11, no. 5 (May 7, 2019): 951. http://dx.doi.org/10.3390/w11050951.

Abstract:
Water level data sets acquired by ultrasonic sensors in stream-scale channels exhibit relatively large numbers of outliers that are off the measurement range between the ultrasonic sensor and water surface, as well as data dispersion of approximately 2 cm due to random errors such as water waves. Therefore, this study develops a data processing algorithm for outlier removal and smoothing for water level data measured by ultrasonic sensors to consider these characteristics. The outlier removal process includes an initial cutoff process to remove outliers out of the measurement range and an outlier detection process using modified Z-scores based on the median absolute deviation (MAD) of a robust estimator. In addition, an exponentially weighted moving average (EWMA) method is applied to smooth the processed data. Sensitivity analyses are performed for factors that are subjectively set by the user, including the window size for the MAD outlier detection stage, the rejection criterion for the modified Z-score outlier removal stage, and the smoothing constant for the EWMA smoothing stage, based on four different water level data sets acquired by ultrasonic sensors in stream-scale experiments.
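The three stages named in this abstract (range cutoff, MAD-based modified Z-score rejection, EWMA smoothing) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name, the centred window, the constant 0.6745 with a rejection criterion of 3.5 (commonly used defaults for modified Z-scores), and the smoothing constant 0.3 are all assumed here, whereas the abstract treats window size, rejection criterion, and smoothing constant as user-set factors.

```python
from statistics import median


def clean_water_levels(levels, valid_range, window=5, threshold=3.5, alpha=0.3):
    """Range cutoff, windowed modified Z-score rejection, then EWMA smoothing.

    All parameter defaults are illustrative assumptions.
    """
    low, high = valid_range

    # Stage 1: initial cutoff, dropping readings outside the measurement range.
    in_range = [x for x in levels if low <= x <= high]

    # Stage 2: modified Z-score in a centred window, using the median and
    # the median absolute deviation (MAD) as robust location and scale.
    half = window // 2
    kept = []
    for i, x in enumerate(in_range):
        neighbourhood = in_range[max(0, i - half): i + half + 1]
        med = median(neighbourhood)
        mad = median(abs(v - med) for v in neighbourhood)
        if mad == 0:
            # Degenerate window: most values are identical, so any
            # deviation from the median is treated as an outlier.
            is_outlier = x != med
        else:
            is_outlier = abs(0.6745 * (x - med) / mad) > threshold
        if not is_outlier:
            kept.append(x)

    # Stage 3: exponentially weighted moving average of the retained data.
    smoothed = []
    for x in kept:
        previous = smoothed[-1] if smoothed else x
        smoothed.append(alpha * x + (1 - alpha) * previous)
    return kept, smoothed
```

Calling `clean_water_levels(readings, valid_range=(0.0, 3.0))` on a list of readings returns the retained readings and their smoothed counterpart; varying `window`, `threshold`, and `alpha` mirrors the kind of sensitivity analysis the abstract describes.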
6

Cobb, Natalie L., Sigrid Collier, Engi F. Attia, Orvalho Augusto, T. Eoin West, and Bradley H. Wagenaar. "Global influenza surveillance systems to detect the spread of influenza-negative influenza-like illness during the COVID-19 pandemic: Time series outlier analyses from 2015–2020." PLOS Medicine 19, no. 7 (July 19, 2022): e1004035. http://dx.doi.org/10.1371/journal.pmed.1004035.

Abstract:
Background: Surveillance systems are important in detecting changes in disease patterns and can act as early warning systems for emerging disease outbreaks. We hypothesized that analysis of data from existing global influenza surveillance networks early in the COVID-19 pandemic could identify outliers in influenza-negative influenza-like illness (ILI). We used data-driven methods to detect outliers in ILI that preceded the first reported peaks of COVID-19. Methods and findings: We used data from the World Health Organization’s Global Influenza Surveillance and Response System to evaluate time series outliers in influenza-negative ILI. Using automated autoregressive integrated moving average (ARIMA) time series outlier detection models and baseline influenza-negative ILI training data from 2015–2019, we analyzed 8,792 country-weeks across 28 countries to identify the first week in 2020 with a positive outlier in influenza-negative ILI. We present the difference in weeks between identified outliers and the first reported COVID-19 peaks in these 28 countries with high levels of data completeness for influenza surveillance data and the highest number of reported COVID-19 cases globally in 2020. To account for missing data, we also performed a sensitivity analysis using linear interpolation for missing observations of influenza-negative ILI. In 16 of the 28 countries (57%) included in this study, we identified positive outliers in cases of influenza-negative ILI that predated the first reported COVID-19 peak in each country; the average lag between the first positive ILI outlier and the reported COVID-19 peak was 13.3 weeks (standard deviation 6.8). In our primary analysis, the earliest outliers occurred during the week of January 13, 2020, in Peru, the Philippines, Poland, and Spain. Using linear interpolation for missing data, the earliest outliers were detected during the weeks beginning December 30, 2019, and January 20, 2020, in Poland and Peru, respectively. This contrasts with the reported COVID-19 peaks, which occurred on April 6 in Poland and June 1 in Peru. In many low- and middle-income countries in particular, the lag between detected outliers and COVID-19 peaks exceeded 12 weeks. These outliers may represent undetected spread of SARS-CoV-2, although a limitation of this study is that we could not evaluate SARS-CoV-2 positivity. Conclusions: Using an automated system of influenza-negative ILI outlier monitoring may have informed countries of the spread of COVID-19 more than 13 weeks before the first reported COVID-19 peaks. This proof-of-concept paper suggests that a system of influenza-negative ILI outlier monitoring could have informed national and global responses to SARS-CoV-2 during the rapid spread of this novel pathogen in early 2020.
7

Reddy, Y. Harshavardhan, M. Hari Srinivas, Adnan Ali, and A. Zaheer Sha. "A Review on Outliers in IoT." South Asian Research Journal of Engineering and Technology 4, no. 6 (November 11, 2022): 134–41. http://dx.doi.org/10.36346/sarjet.2022.v04i06.001.

Abstract:
In recent decades, the Internet of Things (IoT) has grown rapidly, attracting the attention of scientists and businesspeople. In extreme conditions, autonomously scattered sensor nodes pose a high risk of failure and intrusion into the IoT, skewing sensor values. Abnormal data, anomalies, or outliers are sensor values that depart from norms. When abnormalities are factored into data analytics, the ultimate judgment is affected. Using data-driven algorithms for IoT outlier detection is a cutting-edge tactic in Machine Learning (ML). However, evaluating the effectiveness of implemented ML techniques for outlier detection in IoT, which have minimal processing power and power sources to ensure data quality, raises several difficulties that have only recently begun to be addressed in the academic literature. This paper analyses the cutting-edge architecture, type, degree, technique, and detection mode of AI and statistical outlier detection strategies in IoT. Each outlier detection approach is also discussed in detail, along with ways to improve it.
8

Höhne, Jan Karem, and Stephan Schlosser. "Investigating the Adequacy of Response Time Outlier Definitions in Computer-Based Web Surveys Using Paradata SurveyFocus." Social Science Computer Review 36, no. 3 (June 1, 2017): 369–78. http://dx.doi.org/10.1177/0894439317710450.

Abstract:
Web surveys are commonly used in social research because they are usually cheaper, faster, and simpler to conduct than other modes. They also enable researchers to capture paradata such as response times. Particularly, the determination of proper values to define outliers in response time analyses has proven to be an intricate challenge. In fact, to a certain degree, researchers determine them arbitrarily. In this study, we use “SurveyFocus (SF)”—a paradata tool that records the activity of the web-survey pages—to assess outlier definitions based on response time distributions. Our analyses reveal that these common procedures provide relatively sufficient results. However, they are unable to detect all respondents who temporarily leave the survey, causing bias in the response times. Therefore, we recommend a two-step procedure consisting of the utilization of SF and a common outlier definition to attain a more appropriate analysis and interpretation of response times.
9

Mao, Jialin, Frederic Scott Resnic, Leonard N. Girardi, Mario Fl Gaudino, and Art Sedrakyan. "Challenges in outlier surgeon assessment in the era of public reporting." Heart 105, no. 9 (November 10, 2018): 721–27. http://dx.doi.org/10.1136/heartjnl-2018-313650.

Abstract:
Objective: To assess the effect of various evaluation and reporting strategies in determining outlier surgeons, defined by having worse-than-expected mortality after cardiac surgery. Methods: Our study included 33 394 isolated coronary artery bypass graft (CABG) procedures performed by 136 surgeons and 12 172 surgical aortic valve replacement (SAVR) procedures performed by 113 surgeons between 2010 and 2014. Three current methodologies based on the framework of comparing observed and expected (O/E ratio) mortality, with different distributional assumptions, were examined. We further assessed the consistency of outliers detected by these three methods and the impact of using different time windows and aggregating data of CABG and SAVR procedures. Results: The three methods were consistent and detected the same outliers, with the least conservative method detecting additional outliers (outliers detected for methods 1, 2 and 3: CABG 3 (2.2%), 2 (1.5%) and 8 (5.9%); SAVR 1 (0.9%), 0 (0.0%) and 11 (9.7%)). When numbers of cases recorded were low and events were rare, the two more conservative methods were unlikely to detect outliers unless the O/E ratios were extremely high. However, these two methods were more consistent in detecting the same surgeons as outliers across different time windows for assessment. Of the surgeons who performed both CABG and SAVR, none was an outlier for both procedures when assessed separately. Aggregating data from CABG and SAVR may lead to results being dominated by the procedure that had a higher caseload. Conclusions: The choices of outlier assessment method, time window for assessment and data aggregation have an intertwined impact on detecting outlier surgeons, often representing different value assumptions toward patient protection and provider penalty. It is desirable to use different methods as sensitivity analyses, avoid aggregating procedures and avoid rare-event endpoints if possible.
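To make the O/E framework concrete, here is a minimal sketch of an observed-to-expected mortality ratio together with one possible significance check. The abstract does not specify the three distributional assumptions it compares, so the Poisson upper-tail probability used here is purely an assumption for illustration, and the function names and the per-patient predicted risks are hypothetical.

```python
import math


def oe_ratio(observed_deaths, predicted_risks):
    """Observed deaths divided by the sum of per-patient predicted risks."""
    expected = sum(predicted_risks)
    return observed_deaths / expected


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected).

    Assumption: deaths are treated as Poisson counts; the paper's own
    distributional choices are not stated in the abstract.
    """
    below = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below
```

With few cases and rare events the tail probability stays comparatively large even for a high ratio: for example, 2 observed deaths against 0.5 expected gives an O/E ratio of 4, yet a Poisson tail probability of roughly 0.09. This is in line with the abstract's remark that conservative methods seldom flag low-volume surgeons unless the ratio is extreme.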
10

Beasley, Charles M., Brenda Crowe, Mary Nilsson, LieLing Wu, Rebeka Tabbey, Ryan T. Hietpas, Robert Dean, and Paul S. Horn. "Reference Limits for Outlier Analyses in Randomized Clinical Trials." Therapeutic Innovation & Regulatory Science 51, no. 6 (November 2017): 683–737. http://dx.doi.org/10.1177/2168479017700679.

11

Pereira, Francisco Melo, and Rute C. Sofia. "An Analysis of ML-Based Outlier Detection from Mobile Phone Trajectories." Future Internet 15, no. 1 (December 23, 2022): 4. http://dx.doi.org/10.3390/fi15010004.

Abstract:
This paper provides an analysis of two machine learning algorithms, density-based spatial clustering of applications with noise (DBSCAN) and the local outlier factor (LOF), applied in the detection of outliers in the context of a continuous framework for the detection of points of interest (PoI). This framework has as input mobile trajectories of users that are continuously fed to the framework in close to real time. Such frameworks are today still in their infancy and highly required in large-scale sensing deployments, e.g., Smart City planning deployments, where individual anonymous trajectories of mobile users can be useful to better develop urban planning. The paper’s contributions are twofold. Firstly, the paper provides the functional design for the overall PoI detection framework. Secondly, the paper analyses the performance of DBSCAN and LOF for outlier detection considering two different datasets, a dense and large dataset with over 170 mobile phone-based trajectories and a smaller and sparser dataset, involving 3 users and 36 trajectories. Results achieved show that LOF exhibits the best performance across the different datasets, thus showing better suitability for outlier detection in the context of frameworks that perform PoI detection in close to real time.
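For readers unfamiliar with the local outlier factor, the following is a small pure-Python sketch of the standard LOF definition (k-distance, reachability distance, local reachability density). It is not the framework evaluated in the paper; the function name and the choice `k=3` are assumptions, ties beyond the k-th neighbour are simply ignored, and the sketch assumes the data contain no heavily duplicated points.

```python
import math


def local_outlier_factors(points, k=3):
    """Return one LOF score per point; values well above 1 suggest outliers."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours and k-distance of every point.
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    def local_reachability_density(i):
        # Reachability distance from i to j is max(k-distance(j), d(i, j)).
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        mean_reach = sum(reach) / k
        return float("inf") if mean_reach == 0 else 1.0 / mean_reach

    lrd = [local_reachability_density(i) for i in range(n)]

    # LOF is the mean density of the neighbours relative to the point's own.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]
```

Points inside a cluster score close to 1 because their density matches that of their neighbours, while an isolated point far from the cluster scores much higher.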
12

Essick, Reed, Amanda Farah, Shanika Galaudage, Colm Talbot, Maya Fishbach, Eric Thrane, and Daniel E. Holz. "Probing Extremal Gravitational-wave Events with Coarse-grained Likelihoods." Astrophysical Journal 926, no. 1 (February 1, 2022): 34. http://dx.doi.org/10.3847/1538-4357/ac3978.

Abstract:
As catalogs of gravitational-wave transients grow, new records are set for the most extreme systems observed to date. The most massive observed black holes probe the physics of pair-instability supernovae while providing clues about the environments in which binary black hole systems are assembled. The least massive black holes, meanwhile, allow us to investigate the purported neutron star–black hole mass gap, and binaries with unusually asymmetric mass ratios or large spins inform our understanding of binary and stellar evolution. Existing outlier tests generally implement leave-one-out analyses, but these do not account for the fact that the event being left out was by definition an extreme member of the population. This results in a bias in the evaluation of outliers. We correct for this bias by introducing a coarse-graining framework to investigate whether these extremal events are true outliers or whether they are consistent with the rest of the observed population. Our method enables us to study extremal events while testing for population model misspecification. We show that this ameliorates biases present in the leave-one-out analyses commonly used within the gravitational-wave community. Applying our method to results from the second LIGO–Virgo transient catalog, we find qualitative agreement with the conclusions of Abbott et al. GW190814 is an outlier because of its small secondary mass. We find that neither GW190412 nor GW190521 is an outlier.
13

Kumar, Nishith, Md Aminul Hoque, Md Shahjaman, S. M. Shahinul Islam, and Md Nurul Haque Mollah. "A New Approach of Outlier-robust Missing Value Imputation for Metabolomics Data Analysis." Current Bioinformatics 14, no. 1 (December 6, 2018): 43–52. http://dx.doi.org/10.2174/1574893612666171121154655.

Abstract:
Background: Metabolomics data generation and quantification are different from other types of molecular “omics” data in bioinformatics. Mass spectrometry (MS) based (gas chromatography mass spectrometry (GC-MS), liquid chromatography mass spectrometry (LC-MS), etc.) metabolomics data frequently contain missing values that make some quantitative analysis complex. Typically metabolomics datasets contain 10% to 20% missing values that originate from several causes, such as analytical, computational and biological hazards. Imputation of missing values is a very important and interesting issue for further metabolomics data analysis. Objective: This paper introduces a new algorithm for missing value imputation in the presence of outliers for metabolomics data analysis. Method: Currently, the best-known missing value imputation techniques in metabolomics data are k-nearest neighbours (kNN), random forest (RF) and zero imputation. However, these techniques are sensitive to outliers. In this paper, we have proposed an outlier-robust missing value imputation technique by minimizing a two-way empirical mean absolute error (MAE) loss function for imputing missing values in metabolomics data. Results: We have investigated the performance of the proposed missing value imputation technique in comparison with the other traditional imputation techniques using both simulated and real data analysis in the absence and presence of outliers. Conclusion: Results of both simulated and real data analyses show that the proposed outlier-robust missing value imputation technique performs better than the traditional missing value imputation methods in both the absence and presence of outliers.
14

Verardi, Vincenzo, and Catherine Vermandele. "Univariate and Multivariate Outlier Identification for Skewed or Heavy-Tailed Distributions." Stata Journal: Promoting communications on statistics and Stata 18, no. 3 (September 2018): 517–32. http://dx.doi.org/10.1177/1536867x1801800303.

Abstract:
In univariate and in multivariate analyses, it is difficult to identify outliers in the case of skewed or heavy-tailed distributions. In this article, we propose simple univariate and multivariate outlier identification procedures that perform well with these types of distributions while keeping the computational complexity low. We describe the commands gboxplot (univariate case) and sdasym (multivariate case), which implement these procedures in Stata.
15

Chelishchev, Petr, Aleksandr Popov, and Knut Sørby. "An investigation of Outlier Detection Procedures for CMM Measurement Data." MATEC Web of Conferences 220 (2018): 04002. http://dx.doi.org/10.1051/matecconf/201822004002.

Abstract:
The paper analyses methods for outlier detection in dimensional measurement. The cross sections of an internal cylinder were inspected by CMM (coordinate measuring machine), and the resulting data sets were used for further investigation. The efficiency of Rosner’s and Grubbs’ methods for excluding outliers from the measurement data was estimated. Rosner’s method was found to be the most effective for this case study. The simulation results were confirmed by experimental verification.
16

McCoach, D. Betsy, Jessica Goldstein, Peter Behuniak, Sally M. Reis, Anne C. Black, Erin E. Sullivan, and Karen Rambo. "Examining the Unexpected: Outlier Analyses of Factors Affecting Student Achievement." Journal of Advanced Academics 21, no. 3 (May 2010): 426–68. http://dx.doi.org/10.1177/1932202x1002100304.

17

Domingues, Rémi, Maurizio Filippone, Pietro Michiardi, and Jihane Zouaoui. "A comparative evaluation of outlier detection algorithms: Experiments and analyses." Pattern Recognition 74 (February 2018): 406–21. http://dx.doi.org/10.1016/j.patcog.2017.09.037.

18

Lazarus, David, Manuel Weinkauf, and Patrick Diver. "Pacman profiling: a simple procedure to identify stratigraphic outliers in high-density deep-sea microfossil data." Paleobiology 38, no. 1 (2012): 144–61. http://dx.doi.org/10.1017/s0094837300000452.

Abstract:
The deep-sea microfossil record is characterized by an extraordinarily high density and abundance of fossil specimens, and by a very high degree of spatial and temporal continuity of sedimentation. This record provides a unique opportunity to study evolution at the species level for entire clades of organisms. Compilations of deep-sea microfossil species occurrences are, however, affected by reworking of material, age model errors, and taxonomic uncertainties, all of which combine to displace a small fraction of the recorded occurrence data both forward and backwards in time, extending total stratigraphic ranges for taxa. These data outliers introduce substantial errors into both biostratigraphic and evolutionary analyses of species occurrences over time. We propose a simple method—Pacman—to identify and remove outliers from such data, and to identify problematic samples or sections from which the outlier data have derived. The method consists of, for a large group of species, compiling species occurrences by time and marking as outliers calibrated fractions of the youngest and oldest occurrence data for each species. A subset of biostratigraphic marker species whose ranges have been previously documented is used to calibrate the fraction of occurrences to mark as outliers. These outlier occurrences are compiled for samples, and profiles of outlier frequency are made from the sections used to compile the data; the profiles can then identify samples and sections with problematic data caused, for example, by taxonomic errors, incorrect age models, or reworking of sediment. These samples/sections can then be targeted for re-study.
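The trimming and profiling steps described in this abstract can be sketched as below. This is a loose illustration under assumptions rather than the published procedure: the record layout `(species, sample, age)`, the function names, and the trimming fractions are hypothetical, and the calibration of those fractions against marker species with documented ranges is not shown.

```python
from collections import Counter, defaultdict


def flag_range_end_outliers(occurrences, young_fraction, old_fraction):
    """Flag the youngest and oldest fractions of each species' occurrences.

    occurrences: iterable of (species, sample, age) tuples, with a larger
    age meaning an older occurrence. The fractions are assumed to have
    been calibrated elsewhere.
    """
    if young_fraction < 0 or old_fraction < 0 or young_fraction + old_fraction > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")

    by_species = defaultdict(list)
    for species, sample, age in occurrences:
        by_species[species].append((age, sample))

    flagged = []
    for species, records in by_species.items():
        records.sort()  # ascending age, so the youngest occurrences come first
        n = len(records)
        n_young = int(n * young_fraction)
        n_old = int(n * old_fraction)
        for age, sample in records[:n_young] + records[n - n_old:]:
            flagged.append((species, sample, age))
    return flagged


def outlier_profile(flagged):
    """Count flagged occurrences per sample to expose problematic samples."""
    return Counter(sample for _species, sample, _age in flagged)
```

Samples that accumulate many flagged occurrences across species would be the candidates for re-study that the abstract mentions.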
19

Liao, Xiou, Tongtong Wang, and Guohua Zou. "A Method for Detecting Outliers from the Gamma Distribution." Axioms 12, no. 2 (January 19, 2023): 107. http://dx.doi.org/10.3390/axioms12020107.

Abstract:
Outliers often occur during data collection, which could impact the result seriously and lead to a large inference error; therefore, it is important to detect outliers before data analysis. Gamma distribution is a popular distribution in statistics; this paper proposes a method for detecting multiple upper outliers from gamma (m,θ). For computing the critical value of the test statistic in our method, we derive the density function for the case of a single outlier and design two algorithms based on the Monte Carlo and the kernel density estimation for the case of multiple upper outliers. A simulation study shows that the test statistic proposed in this paper outperforms some common test statistics. Finally, we propose an improved testing method to reduce the impact of the swamping effect, which is demonstrated by real data analyses.
20

Oliver, Gavin R., W. Garrett Jenkinson, Rory J. Olson, Laura E. Schultz-Rogers, and Eric W. Klee. "BOREALIS: an R/Bioconductor package to detect outlier methylation from bisulfite sequencing data." F1000Research 11 (December 20, 2022): 1538. http://dx.doi.org/10.12688/f1000research.128354.1.

Abstract:
Background: Rare genetic disease studies have benefited from the era of high throughput sequencing. DNA sequencing results in genetic diagnosis of 18-40% of previously unsolved cases, while the incorporation of RNA-Seq analysis has more recently been shown to generate significant numbers of previously unattainable diagnoses. While DNA methylation remains less explored, multiple inborn diseases resulting from disorders of genomic imprinting are well characterized and a growing body of literature suggests the causative or correlative role of aberrant methylation in diverse rare inherited conditions. Complex pictures of methylation patterning are also emerging, including the association of regional, multiple specific-site or even single-site methylation, with disease. The systematic application of genomic-wide methylation-based sequencing for undiagnosed cases of rare diseases is a logical progression from current testing paradigms. Similar to the rationale previously exploited in RNA-based rare disease studies, we can assume that disease-associated or causative methylation aberrations in an individual will demonstrate significant differences from other individuals with unrelated phenotypes. Thus, aberrantly methylated sites will be outliers from a heterogeneous cohort of individuals. Methods: Based on this rationale, we present BOREALIS: Bisulfite-seq OutlieR MEthylation At SingLeSIte ReSolution. BOREALIS uses a beta binomial model to identify outlier methylation at single CpG site resolution from bisulfite sequencing data. Results: Utilizing power analyses, we demonstrate that BOREALIS can identify outlier CpG methylation within a cohort of samples. Furthermore, we show that BOREALIS is tolerant to the inclusion of multiple identical outliers with sufficient cohort size and sequencing depth. Conclusions: The method demonstrates improved performance versus standard statistical testing and is suited for single or multi-site downstream analysis.
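As a rough illustration of how a beta-binomial model can score outlier methylation at a single CpG site, the sketch below computes the upper-tail probability of seeing at least `k` methylated reads out of `n` under a beta-binomial with shape parameters `a` and `b`. This is not the BOREALIS API: the function names are hypothetical, and the shape parameters, which would be estimated from the rest of the cohort, are assumed to be given.

```python
import math


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """P(X = k) for X ~ BetaBinomial(n, a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def upper_tail(k, n, a, b):
    """P(X >= k): small values flag unusually high methylation at a site."""
    return sum(beta_binomial_pmf(i, n, a, b) for i in range(k, n + 1))
```

For a site where the cohort is mostly unmethylated (say hypothetical parameters `a=2, b=18`), a sample with 25 methylated reads out of 30 receives a very small tail probability, while 5 out of 30 does not. The beta-binomial allows for between-sample variability in the methylation fraction that a plain binomial would ignore.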
21

Demir, Seda. "A comparison of fixed and random effect models by the number of research in the meta-analysis studies with and without an outlier." African Educational Research Journal 10, no. 3 (August 16, 2022): 277–90. http://dx.doi.org/10.30918/aerj.103.22.035.

Abstract:
The purpose of this research was to compare the performances of the Fixed Effect Model (FEM) and the Random Effects Model (REM) in meta-analyses conducted with 5, 10, 20 and 40 studies with an outlier and 4, 9, 19 and 39 studies without an outlier in terms of estimated common effect size, confidence interval coverage rate and heterogeneity measures. In this descriptive study, a real data set consisting of different studies examining teachers’ emotional burnout in terms of gender was used and a total of 72 meta-analyses were performed with the R program. The results indicated that REM was more advantageous when compared to FEM for the meta-analysis of data sets with an outlier. On the other hand, without an outlier, it was determined that the common effect size was generally estimated to be similar for all methods. Moreover, the increase in the number of studies included in the meta-analysis reduced the effect of the outlier on the effect size estimation and decreased the heterogeneity. When the confidence interval coverage accuracy rates of the meta-analysis methods were examined, it was concluded that the confidence intervals included the estimated effect sizes in all data sets and all methods. The findings of the current study showed that the methods used in meta-analysis studies with 20 or more studies were less affected by the outlier runs in the estimated common effect size.
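The difference between the two pooling models can be seen in a short sketch. The abstract does not say which between-study variance estimator was used, so the DerSimonian-Laird estimator below is an assumption, as are the function names; the study itself was carried out in R, and Python is used here only for illustration.

```python
def fixed_effect(effects, variances):
    """Inverse-variance weighted pooled effect (fixed effect model)."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def random_effects(effects, variances):
    """Pooled effect under a random effects model.

    Assumption: the between-study variance tau^2 is estimated with the
    DerSimonian-Laird method. Requires at least two studies.
    Returns (pooled_effect, tau_squared, q_statistic).
    """
    weights = [1.0 / v for v in variances]
    pooled = fixed_effect(effects, variances)

    # Cochran's Q measures heterogeneity around the fixed effect estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau_squared = max(0.0, (q - (len(effects) - 1)) / c)

    # Adding tau^2 to every variance flattens the weights across studies.
    re_weights = [1.0 / (v + tau_squared) for v in variances]
    pooled_re = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return pooled_re, tau_squared, q
```

When one precise study reports an outlying effect, the fixed effect estimate is pulled strongly toward it, whereas the random effects estimate spreads the weight more evenly because the outlier inflates `tau_squared`. With homogeneous studies `tau_squared` is zero and the two estimates coincide.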
22

Mori, Keita, Tomonori Oura, Hisashi Noma, and Shigeyuki Matsui. "Cancer Outlier Analysis Based on Mixture Modeling of Gene Expression Data." Computational and Mathematical Methods in Medicine 2013 (2013): 1–8. http://dx.doi.org/10.1155/2013/693901.

Abstract:
Molecular heterogeneity of cancer, partially caused by various chromosomal aberrations or gene mutations, can yield substantial heterogeneity in gene expression profile in cancer samples. To detect cancer-related genes which are active only in a subset of cancer samples or cancer outliers, several methods have been proposed in the context of multiple testing. Such cancer outlier analyses will generally suffer from a serious lack of power, compared with the standard multiple testing setting where common activation of genes across all cancer samples is supposed. In this paper, we consider information sharing across genes and cancer samples, via a parametric normal mixture modeling of gene expression levels of cancer samples across genes after a standardization using the reference, normal sample data. A gene-based statistic for gene selection is developed on the basis of a posterior probability of cancer outlier for each cancer sample. Some efficiency improvement by using our method was demonstrated, even under settings with misspecified, heavy-tailed t-distributions. An application to a real dataset from hematologic malignancies is provided.
23

Herrmann, Moritz, and Fabian Scheipl. "A Geometric Perspective on Functional Outlier Detection." Stats 4, no. 4 (November 24, 2021): 971–1011. http://dx.doi.org/10.3390/stats4040057.

Abstract:
We consider functional outlier detection from a geometric perspective, specifically: for functional datasets drawn from a functional manifold, which is defined by the data’s modes of variation in shape, translation, and phase. Based on this manifold, we developed a conceptualization of functional outlier detection that is more widely applicable and realistic than previously proposed taxonomies. Our theoretical and experimental analyses demonstrated several important advantages of this perspective: it considerably improves theoretical understanding and allows describing and analyzing complex functional outlier scenarios consistently and in full generality, by differentiating between structurally anomalous outlier data that are off-manifold and distributionally outlying data that are on-manifold, but at its margins. This improves the practical feasibility of functional outlier detection: we show that simple manifold-learning methods can be used to reliably infer and visualize the geometric structure of functional datasets. We also show that standard outlier-detection methods requiring tabular data inputs can be applied to functional data very successfully by simply using their vector-valued representations learned from manifold learning methods as the input features. Our experiments on synthetic and real datasets demonstrated that this approach leads to outlier detection performances at least on par with existing functional-data-specific methods in a large variety of settings, without the highly specialized, complex methodology and narrow domain of application these methods often entail.
24

Bulteau, T., D. Idier, J. Lambert, and M. Garcin. "How historical information can improve estimation and prediction of extreme coastal water levels: application to the Xynthia event at La Rochelle (France)." Natural Hazards and Earth System Sciences 15, no. 6 (June 5, 2015): 1135–47. http://dx.doi.org/10.5194/nhess-15-1135-2015.

Abstract:
The knowledge of extreme coastal water levels is useful for coastal flooding studies or the design of coastal defences. While deriving such extremes with standard analyses using tide-gauge measurements, one often needs to deal with limited effective duration of observation which can result in large statistical uncertainties. This is even truer when one faces the issue of outliers, those particularly extreme values distant from the others which increase the uncertainty on the results. In this study, we investigate how historical information, even partial, of past events reported in archives can reduce statistical uncertainties and relativise such outlying observations. A Bayesian Markov chain Monte Carlo method is developed to tackle this issue. We apply this method to the site of La Rochelle (France), where the storm Xynthia in 2010 generated a water level considered so far as an outlier. Based on 30 years of tide-gauge measurements and 8 historical events, the analysis shows that (1) integrating historical information in the analysis greatly reduces statistical uncertainties on return levels, (2) Xynthia's water level no longer appears as an outlier, (3) we could have reasonably predicted the annual exceedance probability of that level beforehand (predictive probability for 2010 based on data until the end of 2009 of the same order of magnitude as the standard estimative probability using data until the end of 2010). Such results illustrate the usefulness of historical information in extreme value analyses of coastal water levels, as well as the relevance of the proposed method to integrate heterogeneous data in such analyses.
25

Bulteau, T., D. Idier, J. Lambert, and M. Garcin. "How historical information can improve extreme coastal water levels probability prediction: application to the Xynthia event at La Rochelle (France)." Natural Hazards and Earth System Sciences Discussions 2, no. 11 (November 20, 2014): 7061–88. http://dx.doi.org/10.5194/nhessd-2-7061-2014.

Abstract:
The knowledge of extreme coastal water levels is useful for coastal flooding studies or the design of coastal defences. While deriving such extremes with standard analyses using tide gauge measurements, one often needs to deal with limited effective duration of observation which can result in large statistical uncertainties. This is even truer when one faces the issue of outliers, those particularly extreme values distant from the others which increase the uncertainty on the results. In this study, we investigate how historical information, even partial, of past events reported in archives can reduce statistical uncertainties and relativize such outlying observations. A Bayesian Markov Chain Monte Carlo method is developed to tackle this issue. We apply this method to the site of La Rochelle (France), where the storm Xynthia in 2010 generated a water level considered so far as an outlier. Based on 30 years of tide gauge measurements and 8 historical events, the analysis shows that: (1) integrating historical information in the analysis greatly reduces statistical uncertainties on return levels, (2) Xynthia's water level no longer appears as an outlier, (3) we could have reasonably predicted the annual exceedance probability of that level beforehand (predictive probability for 2010 based on data till end of 2009 of the same order of magnitude as the standard estimative probability using data till end of 2010). Such results illustrate the usefulness of historical information in extreme value analyses of coastal water levels, as well as the relevance of the proposed method to integrate heterogeneous data in such analyses.
26

Dyke, Ruth M. Van. "Space Syntax Analysis at the Chacoan Outlier of Guadalupe." American Antiquity 64, no. 3 (July 1999): 461–73. http://dx.doi.org/10.2307/2694146.

Abstract:
Space syntax analysis is a popular method for investigating social processes by quantifying the relationships among architectural spaces. Identification of spatial patterns is straightforward, but interpretation is less so. In this study, segregated spatial patterns were assumed to indicate the presence of social inequality. A space syntax analysis was conducted for Guadalupe Ruin, an excavated, outlying Chacoan great house with three well-dated construction episodes. The study investigated great house function and social context. Results seemed contradictory until room function and pueblo layout were incorporated into the interpretation. The great house can be understood as a group of separate but equal household units accessible primarily through the roof and plaza. Analyzed as a discrete entity, Guadalupe Ruin appears to have been a domestic building rather than an administrative or ceremonial facility. However, topographic restrictions and other differences with the surrounding community of small sites need to be explored through an expanded study at the community level. Comparison of the Guadalupe study with other great house space syntax analyses supports the recognition that Chacoan great houses varied considerably across time and space.
27

Zippi, Pierre A., and Andrew F. Bajc. "Recognition of a Cretaceous outlier in northwestern Ontario." Canadian Journal of Earth Sciences 27, no. 2 (February 1, 1990): 306–11. http://dx.doi.org/10.1139/e90-029.

Abstract:
Borehole F-88-33, located near Rainy River, Ontario, intersected Cretaceous nonmarine clastic sediments. This is the first documented occurrence in Ontario of Cretaceous sediments associated with the western interior. Lithologic and heavy-mineral analyses were used to differentiate this unit from the overlying Quaternary sediments. Seventy-five species of fossil angiosperm pollen, gymnosperm pollen, spores, megaspores, and algal cysts were recovered from borehole F-88-33 and used to date the pre-Quaternary sediments as late Albian to early Cenomanian. The occurrence of these nonmarine sediments in northwestern Ontario helps to better define the limits of Cretaceous sedimentation in the western interior.
28

Alvarez Prado, Santiago, Isabelle Sanchez, Llorenç Cabrera-Bosquet, Antonin Grau, Claude Welcker, François Tardieu, and Nadine Hilgert. "To clean or not to clean phenotypic datasets for outlier plants in genetic analyses?" Journal of Experimental Botany 70, no. 15 (April 25, 2019): 3693–98. http://dx.doi.org/10.1093/jxb/erz191.

Abstract:
Excluding outlier plants (biological replicates deviating from the expected distribution on a multi-criteria basis) from phenotypic datasets is necessary to avoid false-positive associations between genome markers and traits.
29

Hewitson, Steve, Hung Kyu Lee, and Jinling Wang. "Localizability Analysis for GPS/Galileo Receiver Autonomous Integrity Monitoring." Journal of Navigation 57, no. 2 (April 21, 2004): 245–59. http://dx.doi.org/10.1017/s0373463304002693.

Abstract:
With the European Commission (EC) and European Space Agency's (ESA) plans to develop a new satellite navigation system, Galileo, and the modernisation of GPS well underway, the integrity of such systems is as much, if not more, of a concern as ever. Receiver Autonomous Integrity Monitoring (RAIM) refers to the integrity monitoring of the GPS/Galileo navigation signals autonomously performed by the receiver independent of any external reference systems, apart from the navigation signals themselves. Quality measures need to be used to evaluate the RAIM performance at different locations and under various navigation modes, such as GPS only and GPS/Galileo integration, etc. The quality measures should include both the reliability and localizability measures. Reliability is used to assess the capability of GPS/Galileo receivers to detect the outliers while localizability is used to determine the capability of GPS/Galileo receivers to correctly identify the detected outlier from the measurements processed. Within this paper, the fundamental equations required for effective outlier detection and identification algorithms are described together with the measures of reliability and localizability. Detailed simulations and analyses have been performed to assess the performances of GPS only and integrated GPS/Galileo navigation solutions with respect to reliability and localizability. Simulation results show that, in comparison with the GPS-only solution, the localizability of the integrated GPS/Galileo solution can be improved by up to 270%. The results also indicate an expectation of a considerable increase in the sensitivity to outliers and accuracy of their estimation with the augmentation of the Galileo system with the existing GPS system.
30

Anitha Kumari, K., Avinash Sharma, S. Nivethitha, V. Dharini, V. Sanjith, R. Vaishnavi, G. Jothika, and K. Shophiya. "Automated Outlier Detection for Electrical Motors and Transformers." Journal of Computational and Theoretical Nanoscience 17, no. 9 (July 1, 2020): 4703–8. http://dx.doi.org/10.1166/jctn.2020.9304.

Abstract:
Electrical appliances most commonly consist of two electrical devices, namely, electrical motors and transformers. Typically, electrical motors are used for all sorts of industrial purposes. Failures of such motors result in serious problems, such as overheating, shutdown and even burning, in their host systems. Thus, more attention has to be paid to detecting the outliers. In a similar way, to avoid unexpected power reliability problems and system damage, the prediction of failures in transformers is expected to quantify the impacts. By predicting the failures, the lifetime of the transformers increases and unnecessary accidents are avoided. Therefore, this paper presents the detection of outliers in electrical motors and failures in transformers using supervised machine learning algorithms. Machine learning techniques such as Support Vector Machine (SVM), Random Forest (RF) and regression techniques like Support Vector Regression (SVR), Polynomial Regression (PR) are used to analyze the use cases of different motor specifications. Evaluation and the efficiency of findings are proved by considering accuracy, precision, F-measure, and recall for motors. Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Square Error (RMSE) and R-squared Error (R2) are considered as metrics for transformers. The proposed approach helps to identify anomalies like vibration loss, copper loss and overheating in the industrial motor and to determine the abnormal functioning of the transformer, which in turn helps to ascertain the lifetime. The proposed system analyses the behaviour of the electrical machines using the energy meter data and reports the outliers to users. It also analyses the abnormalities occurring in the transformer using the parameters involved in the degradation of the paper-oil insulation system and the voltage of operation, which as a whole leads to the prediction of the lifetime.
31

Cygańska, Małgorzata, Magdalena Kludacz-Alessandri, and Chris Pyke. "Healthcare Costs and Health Status: Insights from the SHARE Survey." International Journal of Environmental Research and Public Health 20, no. 2 (January 12, 2023): 1418. http://dx.doi.org/10.3390/ijerph20021418.

Abstract:
The substantial rise in hospital costs over recent years is associated with the rapid increase in the older age population. This study addresses an empirical gap in the literature concerning the determinants of high hospital costs in a group of older patients in Europe. The objective of the study is to examine the association of patient health status with in-hospital costs among older people across European countries. We used the data from the Survey of Health, Ageing and Retirement in Europe (SHARE) database. The analysis included 9671 patients from 18 European countries. We considered socio-demographic, lifestyle and clinical variables as possible factors influencing in-hospital costs. Univariate and multivariable logistic regression analyses were used to determine the determinants of in-hospital costs. To benchmark the hospital costs across European countries, we used the cost-outlier methodology. Rates of hospital cost outliers among older people vary from 5.80 to 12.65% across Europe. Factors associated with extremely high in-patient costs differ among European countries. In most countries, they include the length of stay in the hospital, comorbidities, functional mobility and physical activity. The treatment of older people reporting heart attack, diabetes, chronic lung disease and cancer is more often connected with cost outliers. The risk of being a cost outlier increased by 20% with each day spent in the hospital. We advocate that including patient characteristics in the reimbursement system could provide a relatively simple strategy for reducing hospitals’ financial risk connected with exceptionally costly cases.
32

Lee, Yujin, Mary M. Capraro, Robert M. Capraro, and Ali Bicer. "A Meta-Analysis: Improvement of Students’ Algebraic Reasoning through Metacognitive Training." International Education Studies 11, no. 10 (September 27, 2018): 42. http://dx.doi.org/10.5539/ies.v11n10p42.

Abstract:
Although algebraic reasoning has been considered as an important factor influencing students’ mathematical performance, many students struggle to build concrete algebraic reasoning. Metacognitive training has been regarded as one effective method to develop students’ algebraic reasoning; however, there are no published meta-analyses that include an examination of the effects of metacognitive training on students’ algebraic reasoning. Therefore, the purpose of this meta-analysis was to examine the impact of metacognitive training on students’ algebraic reasoning. Eighteen studies with 22 effect sizes were selected for inclusion in the present meta-analysis. In the process of the analysis, one study was determined as an outlier; therefore, another meta-analysis was reconstructed without the outlier to calculate more robust results. The findings indicated that the overall effect size without an outlier equaled d=0.973 with SE=0.196; Q=20.201 (p<.05) and I²=0.997, which indicated heterogeneity of the studies. These results showed that the metacognitive training had a statistically significant positive impact on students’ algebraic reasoning.
33

McKeown, Niall J., Piera Carpi, Joana F. Silva, Amy J. E. Healey, Paul W. Shaw, and Jeroen van der Kooij. "Genetic population structure and tools for the management of European sprat (Sprattus sprattus)." ICES Journal of Marine Science 77, no. 6 (August 8, 2020): 2134–43. http://dx.doi.org/10.1093/icesjms/fsaa113.

Abstract:
This study used RAD-seq-derived SNPs to explore population connectivity, local adaptation, and individual assignment in European sprat (Sprattus sprattus) and inform the alignment of management units with biological processes. FST, clustering, and outlier analyses support a genetically cohesive population spanning the Celtic Sea-English Channel-North Sea-Kattegat (NE Atlantic) region. The lack of structure among the NE Atlantic samples indicates connectivity across current management boundaries. However, the assumption of demographic panmixia is cautioned against unless verified by a multidisciplinary approach. The data confirm high genetic divergence of a Baltic population (average FST vs. NE Atlantic samples = 0.051) with signatures compatible with local adaptation in the form of outlier loci, some of which are shown to occur within exonic regions. The outliers permit diagnostic assignment of individuals between the NE Atlantic and Baltic populations and thus represent a “reduced panel” of markers for monitoring a potential mixed stock fishery within the western Baltic. Overall, this study provides information that may help refine spatial management boundaries of sprat and resources for genetic-assisted management.
34

Cottle, D. J., and R. J. Eckard. "Global beef cattle methane emissions: yield prediction by cluster and meta-analyses." Animal Production Science 58, no. 12 (2018): 2167. http://dx.doi.org/10.1071/an17832.

Abstract:
Methane yield values (MY; g methane/kg dry-matter intake) in beef cattle reported in the global literature (expanded MitiGate database of methane-mitigation studies) were analysed by cluster and meta-analyses. The Ward and k means cluster analyses included accounting for the categorical effects of methane measurement method, cattle breed type, country or region of study, age and sex of cattle, and proportion of grain in the diet and the standardised continuous variables of number of animals, liveweight and MY. After removal of data from outlier studies, meta-analyses were conducted on subsets of data to produce prediction equations for MY. Removing outliers with absolute studentised residual values of >1, followed by meta-analysis of data accounting for categorical effects, is recommended as a method for predicting MY. The large differences among some countries in MY values were significant but difficult to interpret. On the basis of the datasets available, a single, global MY or percentage of gross energy in feed converted to methane (Ym) value is not appropriate for use in Intergovernmental Panel on Climate Change (IPCC) greenhouse accounting methods around the world. Therefore, ideally country-specific MY values should be used in each country’s accounts (i.e. an IPCC Tier 2 or 3 approach) from data generated within that country.
35

Nicklin, Christopher, and Luke Plonsky. "Outliers in L2 Research in Applied Linguistics: A Synthesis and Data Re-Analysis." Annual Review of Applied Linguistics 40 (March 2020): 26–55. http://dx.doi.org/10.1017/s0267190520000057.

Abstract:
Data from self-paced reading (SPR) tasks are routinely checked for statistical outliers (Marsden, Thompson, & Plonsky, 2018). Such data points can be handled in a variety of ways (e.g., trimming, data transformation), each of which may influence study results in a different manner. This two-phase study sought, first, to systematically review outlier handling techniques found in studies that involve SPR and, second, to re-analyze raw data from SPR tasks to understand the impact of those techniques. Toward these ends, in Phase I, a sample of 104 studies that employed SPR tasks was collected and coded for different outlier treatments. As found in Marsden et al. (2018), wide variability was observed across the sample in terms of selection of time and standard deviation (SD)-based boundaries for determining what constitutes a legitimate reading time (RT). In Phase II, the raw data from the SPR studies in Phase I were requested from the authors. Nineteen usable datasets were obtained and re-analyzed using data transformations, SD boundaries, trimming, and winsorizing, in order to test their relative effectiveness for normalizing SPR reaction time data. The results suggested that, in the vast majority of cases, logarithmic transformation circumvented the need for SD boundaries, which blindly eliminate or alter potentially legitimate data. The results also indicated that choice of SD boundary had little influence on the data and revealed no meaningful difference between trimming and winsorizing, implying that blindly removing data from SPR analyses might be unnecessary. Suggestions are provided for future research involving SPR data and the handling of outliers in second language (L2) research more generally.
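The three treatments named in the abstract (SD-boundary trimming, winsorizing, and logarithmic transformation) differ in what they do to a flagged value, which a short sketch makes concrete. The reading times below are invented, the cut-off of 2.5 SD is an arbitrary assumption, and the function names are hypothetical; none of this reproduces the authors' re-analysis.

```python
import math
import statistics

def sd_bounds(values, k=2.5):
    """Lower and upper boundaries at mean +/- k sample standard deviations."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return m - k * s, m + k * s

def trim(values, k=2.5):
    """Trimming: drop every value outside the SD boundaries."""
    lo, hi = sd_bounds(values, k)
    return [v for v in values if lo <= v <= hi]

def winsorize(values, k=2.5):
    """Winsorizing: keep every value but clip it to the SD boundaries."""
    lo, hi = sd_bounds(values, k)
    return [min(max(v, lo), hi) for v in values]

def log_transform(values):
    """Natural-log transform; compresses the long right tail of RT data."""
    return [math.log(v) for v in values]

# Hypothetical reading times in milliseconds, with one very slow response.
rts = [310, 295, 342, 388, 301, 276, 330, 415, 299, 2950]

trimmed = trim(rts)
winsorized = winsorize(rts)
logged = log_transform(rts)
print(len(trimmed), max(winsorized), max(logged))
```

Trimming shrinks the sample, winsorizing keeps the sample size but alters the flagged value, and the log transform changes no membership at all, which is why the choice among them can affect downstream statistics differently.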
36

Lee, Jung Il, Joo Ho Lee, Seung Hwan Park, Han Shin Choi, Hoon Cho, Hyung Ho Jo, Skae K. Kim, Hyuk Chon Kwon, and Jung Eui Hong. "Design of Manufacturing Process of Oxygen-Free High Conductivity Copper Using Mahalanobis-Distance Outlier Detection Method." Materials Science Forum 544-545 (May 2007): 965–68. http://dx.doi.org/10.4028/www.scientific.net/msf.544-545.965.

Abstract:
The proper control of total impurities and oxygen contents of oxygen-free high conductivity (OFHC) copper prepared by vacuum high-frequency melting technique was studied using the Mahalanobis-Distance (MD) outlier detection method as functions of raw material purities, vacuum pressure, melting temperature and holding time. The properties of vacuum-melted OFHC copper were examined by thermo-gravimetric analysis, differential scanning calorimetry, hardness test, macro and optical microstructure analyses and ultimate tensile test. In multivariate systems, the existence of outliers makes it difficult to analyze the system, and outlier detection is among the most important tasks in experimental data analysis. Mahalanobis Distance is most commonly used to diagnose the existence of outliers in multivariate systems. The relationship between experiment conditions and total impurities and oxygen contents can be defined with the regression analysis results. In this research, the desired manufacturing conditions are those that yield total impurities under 40 ppm and oxygen contents under 5 ppm. After this statistical approach, the suggested minimum manufacturing conditions are a raw material purity of 4N, a vacuum pressure of 10⁻¹ torr, a melting temperature of 1150°C and a melt holding time of 20 minutes.
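As a rough illustration of why Mahalanobis distance suits multivariate outlier diagnosis, the sketch below computes it for two correlated variables using the closed-form inverse of a 2×2 sample covariance matrix. The data points and the function name `mahalanobis_2d` are hypothetical; the paper works with more process variables, and its actual computation is not described in the abstract.

```python
import math

def mahalanobis_2d(points):
    """Mahalanobis distance of each (x, y) point from the sample mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix entries (divisor n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form d^T S^{-1} d with the 2x2 inverse written out.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances

# Hypothetical measurements: eight points near the line y = x, plus one
# point whose x and y are each unremarkable but whose combination is not.
points = [(1, 1.1), (2, 2.0), (3, 2.9), (4, 4.2), (5, 5.1),
          (6, 5.9), (7, 7.0), (8, 8.1), (3, 7.5)]

distances = mahalanobis_2d(points)
suspect = distances.index(max(distances))
print(suspect, round(distances[suspect], 2))
```

The last point falls within the range of both variables taken separately, so a per-variable check would miss it; the distance accounts for the correlation between the variables and ranks it as the most extreme observation.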
37

Bardet, L., C. M. Duluc, V. Rebour, and J. L'Her. "Regional frequency analysis of extreme storm surges along the French coast." Natural Hazards and Earth System Sciences 11, no. 6 (June 14, 2011): 1627–39. http://dx.doi.org/10.5194/nhess-11-1627-2011.

Abstract:
A good knowledge of extreme storm surges is necessary to ensure protection against flood. In this paper we introduce a methodology to determine time series of skew surges in France as well as a statistical approach for estimating extreme storm surges. With the aim to cope with the outlier issue in surge series, a regional frequency analysis has been carried out for the surges along the Atlantic coast and the Channel coast. This methodology is not the current approach used to estimate extreme surges in France. First results showed that the extreme events identified as outliers in at-site analyses do not appear to be outliers any more in the regional empirical distribution. Indeed the regional distribution presents a curve to the top with these extreme events that a mixed exponential distribution seems to recreate. Thus, the regional approach appears to be more reliable for some sites than at-site analyses. A fast comparison at a given site showed surge estimates with the regional approach and a mixed exponential distribution are higher than surge estimates with an at-site fitting. In the case of Brest, the 1000-yr return surge is 167 cm in height with the regional approach instead of 126 cm with an at-site analysis.
38

Jeong, Jong-Hoon, Jiwen Fan, Cameron R. Homeyer, and Zhangshuan Hou. "Understanding Hailstone Temporal Variability and Contributing Factors over the U.S. Southern Great Plains." Journal of Climate 33, no. 10 (May 15, 2020): 3947–66. http://dx.doi.org/10.1175/jcli-d-19-0606.1.

Abstract:
Hailstones are a natural hazard that pose a significant threat to property and are responsible for significant economic losses each year in the United States. Detailed understanding of their characteristics is essential to mitigate their impact. Identifying the dynamic and physical factors contributing to hail formation and hailstone sizes is of great importance to weather and climate prediction and policymakers. In this study, we have analyzed the temporal and spatial variabilities of severe hail occurrences over the U.S. southern Great Plains (SGP) states from 2004 to 2016 using two hail datasets: hail reports from the Storm Prediction Center and the newly developed radar-retrieved maximum expected size of hail (MESH). It is found that severe and significant severe hail occurrences have a considerable year-to-year temporal variability in the SGP region. The interannual variabilities have a strong correspondence with sea surface temperature anomalies over the northern Gulf of Mexico and there is no outlier. The year 2016 is identified as an outlier for the correlations with both El Niño–Southern Oscillation (ENSO) and aerosol loading. The correlations with ENSO and aerosol loading are not statistically robust to inclusion of the outlier 2016. Statistical analysis without the outlier 2016 shows that 1) aerosols that may be mainly from northern Mexico have the largest correlation with hail interannual variability among the three factors and 2) meteorological covariation does not significantly contribute to the high correlation. These analyses warrant further investigations of aerosol impacts on hail occurrence.
39

Farnsworth, David L. "Modeling and Fitting Two-Way Tables Containing Outliers." International Journal of Mathematics and Mathematical Sciences 2023 (February 11, 2023): 1–6. http://dx.doi.org/10.1155/2023/6352058.

Abstract:
A model is proposed for two-way tables of measurement data containing outliers. The two independent variables are categorical and error-free. Neither missing values nor replication is present. The model consists of the sum of a customary additive part that can be fit using least squares and a part that is composed of outliers. Recommendations are made for methods for identifying cells containing outliers and fitting the model. A graph of the observations is used to determine the outliers’ locations. For all cells containing an outlier, replacement values are determined simultaneously using a classical missing-data tool. The result is called the adjusted table. The inserted values are such that, when a mean-based fitting of the adjusted table is performed, the residuals in those cells are zero. The outlying portion of the observation in each of those cells is the difference of the observation and the replacement value. In this way, outliers are removed from further analyses of the adjusted table. This is particularly helpful because outliers can greatly contaminate and alter computations and conclusions. Subsequently, the causes of the outliers might be determined, and statistical estimation and testing can be implemented on the adjusted table.
40

Zhu, Qi, Qiyu Chen, Ying Tian, Jing Zhang, Rui Ran, and Shiyu Shu. "Genetic Predisposition to a Higher Whole Body Water Mass May Increase the Risk of Atrial Fibrillation: A Mendelian Randomization Study." Journal of Cardiovascular Development and Disease 10, no. 2 (February 10, 2023): 76. http://dx.doi.org/10.3390/jcdd10020076.

Abstract:
Background: Observational studies have found an association between increased whole body water mass (BWM) and atrial fibrillation (AF). However, the causality has yet to be confirmed. To provide feasible protective measures on disease development, we used a Mendelian randomization (MR) design to estimate the potential causal relationship between increased BWM and AF. Methods: We implemented a two-sample MR study to assess whether increased BWM causally influences AF incidence. For exposure, 61 well-powered genetic instruments extracted from UK Biobank (N = 331,315) were used as the proxies of BWM. Summary genetic data of AF were obtained from FinnGen (Ncase = 22,068; Ncontrol = 116,926). Inverse-variance weighted (IVW), MR-Egger and weighted median methods were selected to infer causality, complemented with a series of sensitivity analyses. MR-Pleiotropy Residual Sum and Outlier (MR-PRESSO) and Radial MR were employed to identify outliers. Furthermore, risk factor analyses were performed to investigate the potential mechanisms between increased BWM and AF. Results: Genetic predisposition to increased BWM was demonstrated to be significantly associated with AF in the IVW model (OR = 2.23; 95% CI = 1.47–3.09; p = 1.60 × 10⁻⁷), and the result was consistent in other MR approaches. There was no heterogeneity or pleiotropy detected in sensitivity analysis. MR-PRESSO identified no outliers with potential pleiotropy after excluding outliers by Radial MR. Furthermore, our risk factor analyses supported a positive causal effect of genetically predicted increased BWM on edematous diseases. Conclusions: MR estimates showed that a higher BWM could increase the risk of AF. Pathological edema is an important intermediate link mediating this causal relationship.
41

Jordaan, A. C., E. V. D. M. Smit, and W. D. Hamman. "An investigation into the normality of the distributions of financial ratios of listed South African industrial companies." South African Journal of Business Management 25, no. 2 (June 30, 1994): 65–71. http://dx.doi.org/10.4102/sajbm.v25i2.844.

Abstract:
In this article we examine some of the inter-temporal and cross-sectional distributional properties of a selected number of financial ratios of South African industrial companies and we evaluate the effect of a simple procedure of outlier rejection. The normality assumption is rejected consistently in the case of the industry analysis and frequently in the sectoral and yearly analyses.
42

Fei, Ke, Qi Li, Can Cui, Xue Chen, Xinxin Xu, Benshan Xue, and Weifeng Cai. "Nontechnical Loss Detection using Neural Architecture Search and Outlier Detection." E3S Web of Conferences 256 (2021): 01025. http://dx.doi.org/10.1051/e3sconf/202125601025.

Abstract:
Electricity supply is essential to economy growth and improvement of people’s life. For a long time, illegal electricity theft not only affects the supply of power, but also causes significant economic loss. Traditional techniques for detecting electricity theft are inefficient and time-consuming. Data-based detecting algorithms become a new solution. This article analyses the features of electricity consumption, current, voltage and opening records under various electricity theft modes and proposes a new simulation method for electricity theft users. Based on the simulation dataset, a feature extraction method based on neural architecture search (NAS) is proposed. The advantage of this feature extraction model is demonstrated in the comparison experiments with other feature extraction model. Finally, the effectiveness and accuracy of the electricity theft detection method based on NAS model and outlier detection are verified through an industrial case study.
43

Baesjou, Jean-Paul, and Maren Wellenreuther. "Genomic Signatures of Domestication Selection in the Australasian Snapper (Chrysophrys auratus)." Genes 12, no. 11 (October 29, 2021): 1737. http://dx.doi.org/10.3390/genes12111737.

Abstract:
Domestication of teleost fish is a recent development, and in most cases started less than 50 years ago. Shedding light on the genomic changes in key economic traits during the domestication process can provide crucial insights into the evolutionary processes involved and help inform selective breeding programmes. Here we report on the recent domestication of a native marine teleost species in New Zealand, the Australasian snapper (Chrysophrys auratus). Specifically, we use genome-wide data from a three-generation pedigree of this species to uncover genetic signatures of domestication selection for growth. Genotyping-By-Sequencing (GBS) was used to generate genome-wide SNP data from a three-generation pedigree to calculate generation-wide averages of FST between every generation pair. The level of differentiation between generations was further investigated using ADMIXTURE analysis and Principal Component Analysis (PCA). After that, genome scans using Bayescan, LFMM and XP-EHH were applied to identify SNP variants under putative selection following selection for growth. Finally, genes near candidate SNP variants were annotated to gain functional insights. Analysis showed that between generations FST values slightly increased as generational time increased. The extent of these changes was small, and both ADMIXTURE analysis and PCA were unable to form clear clusters. Genome scans revealed a number of SNP outliers, indicative of selection, of which a small number overlapped across analyses methods and populations. Genes of interest within proximity of putative selective SNPs were related to biological functions, and revealed an association with growth, immunity, neural development and behaviour, and tumour repression. Even though few genes overlapped between outlier SNP methods, gene functionalities showed greater overlap between methods. 
While the genetic changes observed were small in most cases, a number of outlier SNPs could be identified, of which some were found by more than one method. Multiple outlier SNPs appeared to be predominately linked to gene functionalities that modulate growth and survival. Ultimately, the results help to shed light on the genomic changes occurring during the early stages of domestication selection in teleost fish species such as snapper, and will provide useful candidates for the ongoing selective breeding in the future of this and related species.
44

Kuhrij, Laurien, Erik van Zwet, Renske van den Berg-Vos, Paul Nederkoorn, and Perla J. Marang-van de Mheen. "Enhancing feedback on performance measures: the difference in outlier detection using a binary versus continuous outcome funnel plot and implications for quality improvement." BMJ Quality & Safety 30, no. 1 (February 7, 2020): 38–45. http://dx.doi.org/10.1136/bmjqs-2019-009929.

Abstract:
Background: Hospitals and providers receive feedback information on how their performance compares with others, often using funnel plots to detect outliers. These funnel plots typically use binary outcomes, and continuous variables are dichotomised to fit this format. However, information is lost using a binary measure, which is only sensitive to detect differences in higher values (the tail) rather than the entire distribution. This study therefore aims to investigate whether different outlier hospitals are identified when using a funnel plot for a binary vs a continuous outcome. This is relevant for hospitals with suboptimal performance to decide whether performance can be improved by targeting processes for all patients or a subgroup with higher values. Methods: We examined the door-to-needle time (DNT) of all (6080) patients with acute ischaemic stroke treated with intravenous thrombolysis in 65 hospitals in 2017, registered in the Dutch Acute Stroke Audit. We compared outlier hospitals in two funnel plots: the median DNT versus the proportion of patients with substantially delayed DNT (above the 90th percentile (P90)), whether these were the same or different hospitals. Two sensitivity analyses were performed using the proportion above the median and a continuous P90 funnel plot. Results: The median DNT was 24 min and P90 was 50 min. In the binary funnel plot for the proportion of patients above P90, 58 hospitals had average performance, whereas in the funnel plot around the median 14 of these hospitals had significantly higher median DNT (24%). These hospitals can likely improve their DNT by focusing on care processes for all patients, not shown by the binary outcome funnel plot. Similar results were shown in sensitivity analyses. Conclusion: Using funnel plots for continuous versus binary outcomes identify different outlier hospitals, which may enhance hospital feedback to direct more targeted improvement initiatives.
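To make the binary-outcome funnel plot in this abstract concrete, here is a minimal sketch that flags a hospital whose proportion of delayed patients falls outside control limits around a target proportion. The normal approximation to the binomial, the 1.96 multiplier, and the names `funnel_limits` and `is_outlier` are assumptions for illustration; the abstract does not state how the authors computed their limits.

```python
import math

def funnel_limits(p, n, z=1.96):
    """Approximate control limits for a proportion at hospital volume n.

    Uses the normal approximation p +/- z * sqrt(p * (1 - p) / n), clipped
    to [0, 1]. The limits narrow as n grows, giving the funnel shape.
    """
    se = math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - z * se), min(1.0, p + z * se)

def is_outlier(events, n, p, z=1.96):
    """True if the observed proportion events / n lies outside the limits."""
    lower, upper = funnel_limits(p, n, z)
    return not (lower <= events / n <= upper)

# Hypothetical hospital treating 100 patients; the target is 10% because the
# outcome is "door-to-needle time above the 90th percentile".
lower, upper = funnel_limits(0.10, 100)
flag_high = is_outlier(20, 100, 0.10)   # 20% delayed
flag_mid = is_outlier(12, 100, 0.10)    # 12% delayed
```

A continuous-outcome funnel plot would instead put limits around a summary such as the median, which is the comparison the study draws.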
45

Tigano, Anna, Allison J. Shultz, Scott V. Edwards, Gregory J. Robertson, and Vicki L. Friesen. "Outlier analyses to test for local adaptation to breeding grounds in a migratory arctic seabird." Ecology and Evolution 7, no. 7 (March 12, 2017): 2370–81. http://dx.doi.org/10.1002/ece3.2819.

46

Maroso, Francesco, Konstantinos Gkagkavouzis, Sabina De Innocentiis, Jasmien Hillen, Fernanda do Prado, Nikoleta Karaiskou, John Bernard Taggart, et al. "Genome-wide analysis clarifies the population genetic structure of wild gilthead sea bream (Sparus aurata)." PLOS ONE 16, no. 1 (January 11, 2021): e0236230. http://dx.doi.org/10.1371/journal.pone.0236230.

Abstract:
Gilthead sea bream is an important target for both recreational and commercial fishing in Europe, where it is also one of the most important cultured fish. Its distribution ranges from the Mediterranean to the African and European coasts of the North-East Atlantic. Until now, the population genetic structure of this species in the wild has largely been studied using microsatellite DNA markers, with minimal genetic differentiation being detected. In this geographically widespread study, 958 wild gilthead sea bream from 23 locations within the Mediterranean Sea and Atlantic Ocean were genotyped at 1159 genome-wide SNP markers by RAD sequencing. Outlier analyses identified 18 loci potentially under selection. Neutral marker analyses identified weak subdivision into three genetic clusters: Atlantic, West, and East Mediterranean. The latter group could be further subdivided into an Ionian/Adriatic and an Aegean group using the outlier markers alone. Seascape analysis suggested that this differentiation was mainly due to difference in salinity, this being also supported by preliminary genomic functional analysis. These results are of fundamental importance for the development of proper management of this species in the wild and are a first step toward the study of the potential genetic impact of the sea bream aquaculture industry.
47

Paulauskas, Nerijus, and Algirdas Baskys. "Application of Histogram-Based Outlier Scores to Detect Computer Network Anomalies." Electronics 8, no. 11 (November 1, 2019): 1251. http://dx.doi.org/10.3390/electronics8111251.

Abstract:
Misuse activity in computer networks constantly creates new challenges and difficulties to ensure data confidentiality, integrity, and availability. The capability to identify and quickly stop the attacks is essential, as the undetected and successful attack may cause losses of critical resources. The anomaly-based intrusion detection system (IDS) is a valuable security tool that is capable of detecting new, previously unseen attacks. Anomaly-based IDS sends an alarm when it detects an event that deviates from the behavior characterized as normal. This paper analyses the use of the histogram-based outlier score (HBOS) to detect anomalies in the computer network. Experimental results of different histogram creation methods and the influence of the number of bins on the performance of anomaly detection are presented. Experiments were conducted using an NSL-KDD dataset.
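For readers unfamiliar with the histogram-based outlier score (HBOS) analysed here, the sketch below shows the basic idea with equal-width bins: each feature gets its own histogram, and a sample's score sums the negative log of the normalised height of the bin it falls in. The function name `hbos_scores`, the equal-width binning, and the toy data are assumptions for illustration; the paper compares several histogram creation methods and bin counts that this sketch does not reproduce.

```python
import math

def hbos_scores(rows, n_bins=10):
    """HBOS with equal-width bins; a higher score means more anomalous."""
    n_features = len(rows[0])
    scores = [0.0] * len(rows)
    for j in range(n_features):
        col = [row[j] for row in rows]
        lo, hi = min(col), max(col)
        # A constant feature has zero range; fall back to a width of 1.0.
        width = (hi - lo) / n_bins or 1.0
        # Bin index per sample; the maximum value is folded into the last bin.
        idx = [min(int((v - lo) / width), n_bins - 1) for v in col]
        counts = [0] * n_bins
        for i in idx:
            counts[i] += 1
        top = max(counts)
        for k, i in enumerate(idx):
            # Normalised height counts[i] / top lies in (0, 1], so the
            # term log(top / counts[i]) is zero for the fullest bin.
            scores[k] += math.log(top / counts[i])
    return scores

# Toy data: 50 points clustered in [0, 0.9] plus one distant point.
rows = [[i * 0.9 / 49, i * 0.9 / 49] for i in range(50)] + [[10.0, 10.0]]
scores = hbos_scores(rows)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

Because the features are scored independently, the method is cheap but ignores dependencies between features. The choice of `n_bins` changes the scores, which is the sensitivity the paper investigates.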
48

Tappia, Paramjit S., Andrew W. Maksymiuk, Daniel S. Sitar, Parveen S. Akhtar, Nazrina Khatun, Rahnuma Parveen, Rashiduzzaman Ahmed, et al. "Predictive value and clinical significance of increased SSAT-1 activity in healthy adults." Future Science OA 5, no. 7 (August 2019): FSO400. http://dx.doi.org/10.2144/fsoa-2019-0023.

Abstract:
Aim: Spermidine/spermine N1-acetyltransferase (SSAT-1) regulates cell growth, proliferation and death. Amantadine is converted by SSAT-1 to acetylamantadine (AA). In our earlier studies, although SSAT-1 was activated in patients with cancer, a number of ostensibly healthy adult volunteers had higher than expected AA concentration. This study was therefore undertaken to examine the outlier group. Materials & methods: A follow up of urine analysis for AA by liquid chromatography-tandem mass spectrometry as well as clinical assessments and additional blood analyses were conducted. Results: In some of the outlier controls, higher than expected AA concentration was linked to increased serum carcinoembryonic antigen. Clinical and radiographic assessments revealed underlying abnormalities in other cases that could represent premalignant conditions. Hematology tests revealed elevations in white blood cells and platelets, which are markers of inflammation. Conclusion: High urine concentration of AA could be used as a simple and useful test for screening of cancer in high-risk populations.
49

Weekley, R. Andrew, Robert K. Goodrich, and Larry B. Cornman. "An Algorithm for Classification and Outlier Detection of Time-Series Data." Journal of Atmospheric and Oceanic Technology 27, no. 1 (January 1, 2010): 94–107. http://dx.doi.org/10.1175/2009jtecha1299.1.

Abstract:
An algorithm to perform outlier detection on time-series data is developed, the intelligent outlier detection algorithm (IODA). This algorithm treats a time series as an image and segments the image into clusters of interest, such as “nominal data” and “failure mode” clusters. The algorithm uses density clustering techniques to identify sequences of coincident clusters in both the time domain and delay space, where the delay-space representation of the time series consists of ordered pairs of consecutive data points taken from the time series. “Optimal” clusters that contain either mostly nominal or mostly failure-mode data are identified in both the time domain and delay space. A best cluster is selected in delay space and used to construct a “feature” in the time domain from a subset of the optimal time-domain clusters. Segments of the time series and each datum in the time series are classified using decision trees. Depending on the classification of the time series, a final quality score (or quality index) for each data point is calculated by combining a number of individual indicators. The performance of the algorithm is demonstrated via analyses of real and simulated time-series data.
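The delay-space representation described in this abstract, ordered pairs of consecutive data points, can be sketched in a few lines. The function name `delay_space` and the sample series are hypothetical. This covers only the embedding step, not the density clustering or decision-tree stages of IODA.

```python
def delay_space(series):
    """Return the ordered pairs (x[t], x[t+1]) of consecutive data points."""
    return list(zip(series[:-1], series[1:]))

# Hypothetical sensor readings with a single spike at index 3.
readings = [1.0, 1.1, 1.0, 9.0, 1.1, 1.0]
pairs = delay_space(readings)

# Steady data give pairs near the diagonal x[t] == x[t+1]; the spike produces
# two pairs far from it, one entering the spike and one leaving it.
off_diagonal = [p for p in pairs if abs(p[1] - p[0]) > 1.0]
```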
50

Bilska, Katarzyna, and Monika Szczecińska. "Comparison of the effectiveness of ISJ and SSR markers and detection of outlier loci in conservation genetics of Pulsatilla patens populations." PeerJ 4 (November 2, 2016): e2504. http://dx.doi.org/10.7717/peerj.2504.

Abstract:
Background: Research into the protection of rare and endangered plant species involves genetic analyses to determine their genetic variation and genetic structure. Various categories of genetic markers are used for this purpose. Microsatellites, also known as simple sequence repeats (SSR), are the most popular category of markers in population genetics research. In most cases, microsatellites account for a large part of the noncoding DNA and exert a neutral effect on the genome. Neutrality is a desirable feature in evaluations of genetic differences between populations, but it does not support analyses of a population’s ability to adapt to a given environment or its evolutionary potential. Despite the numerous advantages of microsatellites, non-neutral markers may supply important information in conservation genetics research. They are used to evaluate adaptation to specific environmental conditions and a population’s adaptive potential. The aim of this study was to compare the level of genetic variation in Pulsatilla patens populations revealed by neutral SSR markers and putatively adaptive ISJ markers (intron-exon splice junction). Methods: The experiment was conducted on 14 Polish populations of P. patens and three P. patens populations from the nearby region of Vitebsk in Belarus. A total of 345 individuals were examined. Analyses were performed with the use of eight SSR primers specific to P. patens and three ISJ primers. Results: SSR markers revealed a higher level of genetic variation than ISJ markers (He = 0.609, He = 0.145, respectively). An analysis of molecular variance (AMOVA) revealed that the overall genetic diversity between the analyzed populations defined by parameters FST and ΦPT for SSR (20%) and ΦPT for ISJ (21%) markers was similar. Analysis conducted in the Structure program divided analyzed populations into two groups (SSR loci) and three groups (ISJ markers).
Mantel test revealed correlations between the geographic distance and genetic diversity of Polish populations of P. patens for ISJ markers, but not for SSR markers. Conclusions: The results of the present study suggest that ISJ markers can complement the analyses based on SSRs. However, neutral and adaptive markers should not be alternatively applied. Neutral microsatellite markers cannot depict the full range of genetic variation in a population because they do not enable to analyze functional variation. Although ISJ markers are less polymorphic, they can contribute to the reliability of analyses based on SSRs.