Dissertations / Theses on the topic 'Outliers'

Author: Grafiati

Published: 4 June 2021

Last updated: 8 June 2024

Consult the top 50 dissertations / theses for your research on the topic 'Outliers.'

1

Sean, Viseth. "Exploration Framework For Detecting Outliers In Data Streams." Digital WPI, 2016. https://digitalcommons.wpi.edu/etd-theses/395.

Abstract:

Current real-world applications generate large volumes of data that are often continuously updated over time. Detecting outliers in such evolving datasets requires the result to be continuously updated, and response time is very important for these time-critical applications. This is challenging. First, the algorithm is complex; even mining outliers from a static dataset once is already very expensive. Second, users need to specify input parameters to approach the true outliers. Because the number of parameters is large, a trial-and-error approach online would be impractical, expensive and tedious for analysts. Worse yet, since the dataset is changing, the best parameters need to be updated to respond to user exploration requests. Overall, the large number of parameter settings and the evolving datasets make efficiently mining outliers from dynamic datasets very challenging. Thus, in this thesis, we design an exploration framework for detecting outliers in data streams, called EFO, which enables analysts to continuously explore anomalies in dynamic datasets. EFO is a continuous lightweight preprocessing framework. It embraces two optimization principles, namely "best life expectancy" and "minimal trial," to compress evolving datasets into a knowledge-rich abstraction of important interrelationships among data. An incremental sorting technique is also used to leverage the almost ordered lists in this framework. The knowledge abstraction generated by EFO supports not only traditional outlier detection requests but also novel outlier exploration operations on evolving datasets. Our experimental study on two real datasets demonstrates that EFO outperforms the state-of-the-art technique in terms of CPU processing costs when varying stream volume, velocity and outlier rate.

2

Beau, Thabiso. "Normality of JSE Returns: Macro-outliers, Micro-outliers: an Empirical Evaluation." Master's thesis, Faculty of Commerce, 2019. https://hdl.handle.net/11427/31721.

Abstract:

Previous work on the empirical distribution of security returns has found that equity returns are not normally distributed. These findings have brought the applicability of certain asset allocation and pricing frameworks into question. This study examines whether the removal of a priori macro-outliers and micro-outliers leads to improved fits to the Gaussian distribution for single-listed equities on the Johannesburg Stock Exchange (JSE). Single-listed equities are stocks that (i) were listed on the JSE Main Board over the period covered in this study, (ii) are among the exchange’s 100 largest stocks by market capitalisation, and (iii) have been determined, by comparing American Depository Receipt (ADR) trading volume to JSE trading volume, to be mainly exposed to the South African market. Regarding the predetermined outliers, the study categorises macro-outliers as days related to predictable market announcements, namely US nonfarm payrolls announcement days. Similarly, micro-outliers are classified as days linked to predictable sector-specific and firm-specific news, namely sectoral announcement days and company earnings announcement days, respectively. The study aims to contribute to the empirical and theoretical literature on the distributional properties of South African equity returns. It uses a filter to narrow the sample of stocks for empirical investigation over the period from 1 January 2016 to 31 December 2017, and analyses daily stock returns on a 65-day rolling basis. Using only those equities, goodness of fit is evaluated with graphical methods and with statistical goodness-of-fit tests sorted into (i) empirical distribution function tests, (ii) regression and correlation tests, and (iii) moment tests. The majority of the data exhibits significant departures from normality in the empirical distribution function tests and in the regression and correlation tests. The results were statistically significant at three confidence levels. In the case of moment tests, however, the results show a clear divergence between the methods. It is further demonstrated that while the daily stock returns have improved fits to the normal distribution, they remain predominantly positively skewed and thick-tailed even after the removal of the a priori outliers. On this basis, it is argued that some downside risk measures and asset allocation frameworks may not be applicable in the South African context.
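The moment tests mentioned in this abstract compare a sample's skewness and kurtosis with the Gaussian values. The following is a minimal sketch of that idea, not the thesis's actual procedure: it assumes the Jarque-Bera statistic as a representative moment test, and the function names and the simulated returns are hypothetical. Only the 65-day window length comes from the abstract.

```python
import random


def skewness_and_excess_kurtosis(returns):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(returns)
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    if m2 == 0:
        raise ValueError("a constant window has undefined skewness and kurtosis")
    # A Gaussian sample has skewness near 0 and excess kurtosis near 0.
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def jarque_bera(returns):
    """Jarque-Bera statistic: n/6 * (S^2 + K^2/4), with K the excess kurtosis.

    Under normality it is asymptotically chi-square with 2 degrees of freedom.
    """
    n = len(returns)
    skew, excess_kurt = skewness_and_excess_kurtosis(returns)
    return n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)


def rolling_jarque_bera(returns, window=65):
    """Apply the statistic to every consecutive window of `window` returns."""
    return [
        jarque_bera(returns[start:start + window])
        for start in range(len(returns) - window + 1)
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # hypothetical data, not JSE returns
    simulated = [rng.gauss(0.0, 0.01) for _ in range(250)]
    statistics_per_window = rolling_jarque_bera(simulated)
    print(len(statistics_per_window), max(statistics_per_window))
```

Dropping the announcement days from `returns` before calling `rolling_jarque_bera` would mimic, in spirit, the removal of a priori outliers described above.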

3

Mitchell, Napoleon. "Outliers and Regression Models." Thesis, University of North Texas, 1992. https://digital.library.unt.edu/ark:/67531/metadc279029/.

Abstract:

The mitigation of outliers serves to increase the strength of a relationship between variables. This study defined outliers in three different ways and used five regression procedures to describe the effects of outliers on 50 data sets. This study also examined the relationship among the shape of the distribution, skewness, and outliers.

4

Yin, Yong. "Outliers in Time Series." Connect to resource, 1995. http://rave.ohiolink.edu/etdc/view.cgi?acc%5Fnum=osu1262638388.

5

Halldestam, Markus. "ANOVA - The Effect of Outliers." Thesis, Uppsala universitet, Statistiska institutionen, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-295864.

Abstract:

This bachelor’s thesis focuses on the effect of outliers on the one-way analysis of variance and examines whether the estimates in ANOVA, and the test itself, are robust to the influence of extreme outliers. The robustness of the estimates is examined using the breakdown point, while the robustness of the test is examined by simulating the hypothesis test under some extreme situations. The study finds evidence that the estimates in ANOVA are sensitive to outliers, i.e. that the procedure is not robust. Samples with a larger proportion of extreme outliers have a higher type-I error probability than the nominal level.
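The breakdown point used in this abstract can be illustrated with a short sketch. It assumes that the ANOVA estimates in question are group means, and it contrasts them with the median purely for comparison; the data and the `contaminate` helper are hypothetical and not taken from the thesis.

```python
import statistics


def contaminate(sample, n_bad, value):
    """Return a copy of `sample` with its first `n_bad` observations set to `value`."""
    return [value] * n_bad + list(sample[n_bad:])


# Hypothetical measurements for one ANOVA group.
group = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1]

if __name__ == "__main__":
    # One bad observation is enough to drag the group mean arbitrarily far,
    # whereas the median stays inside the range of the clean data.
    for magnitude in (1e2, 1e4, 1e6):
        bad = contaminate(group, 1, magnitude)
        print(magnitude, statistics.mean(bad), statistics.median(bad))

    # The median only breaks down once more than half of the sample is replaced.
    for n_bad in (4, 5):
        print(n_bad, statistics.median(contaminate(group, n_bad, 1e6)))
```

With nine observations, four replaced values still leave the median at a clean value, while five do not. This matches the usual statement that the mean has a breakdown point of 1/n and the median of roughly one half.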

6

Schall, Robert. "Outliers and influence under arbitrary variance." Doctoral thesis, University of Cape Town, 1986. http://hdl.handle.net/11427/21913.

Abstract:

Using a geometric approach to best linear unbiased estimation in the general linear model, the additional sum of squares principle, used to generate decompositions, can be generalized allowing for an efficient treatment of augmented linear models. The notion of the admissibility of a new variable is useful in augmenting models. Best linear unbiased estimation and tests of hypotheses can be performed through transformations and reparametrizations of the general linear model. The theory of outliers and influential observations can be generalized so as to be applicable for the general univariate linear model, where three types of outlier and influence may be distinguished. The adjusted models, adjusted parameter estimates, and test statistics corresponding to each type of outlier are obtained, and data adjustments can be effected. Relationships to missing data problems are exhibited. A unified approach to outliers in the general linear model is developed. The concept of recursive residuals admits generalization. The typification of outliers and influential observations in the general linear model can be extended to normal multivariate models. When the outliers in a multivariate regression model follow a nested pattern, maximum likelihood estimation of the parameters in the model adjusted for the different types of outlier can be performed in closed form, and the corresponding likelihood ratio test statistic is obtained in closed form. For an arbitrary outlier pattern, and for the problem of outliers in the generalized multivariate regression model, three versions of the EM-algorithm corresponding to three types of outlier are used to obtain maximum likelihood estimates iteratively. A fundamental principle is the comparison of observations with a choice of distribution appropriate to the presumed type of outlier present. Applications are not necessarily restricted to multivariate normality.

7

Campos, Guilherme Oliveira. "Estudo, avaliação e comparação de técnicas de detecção não supervisionada de outliers." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-04082015-084412/.

Abstract:

Outlier detection (or anomaly detection) plays a fundamental role in discovering patterns in data that can be considered exceptional from some perspective. Detecting such patterns matters because, in many data mining applications, they represent extraordinary behaviour that deserves special attention. An important distinction is made between supervised and unsupervised detection techniques; this project focuses on the unsupervised ones. There are dozens of algorithms in this category in the literature, and new ones are proposed from time to time, but each adopts its own notion of what should be considered an outlier, which is a subjective concept in the unsupervised setting. This makes choosing a particular algorithm for a given practical application considerably harder. Although it is common knowledge that no machine learning algorithm can be superior to all others in every application scenario, it is a relevant question whether the performance of certain algorithms tends in general to dominate that of certain others, at least in particular classes of problems. This project proposes to contribute to the study, selection and pre-processing of data sets suitable for inclusion in a benchmark collection for evaluating unsupervised outlier detection algorithms. It also proposes a comparative evaluation of the performance of outlier detection methods. During part of my master's work, I had the intellectual collaboration of Erich Schubert, Ira Assent, Barbora Micenková, Michael Houle and, above all, Joerg Sander and Arthur Zimek. Their contribution was essential to the analysis of the results and to the compact way of presenting them.

8

Berton, Lilian. "Caracterização de classes e detecção de outliers em redes complexa." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-19072011-132701/.

Abstract:

Complex networks have emerged as a new and important way of representing and abstracting data, capable of capturing spatial, topological and functional relationships, among other characteristics present in many data sets. Among the various approaches to data analysis, classification and outlier detection stand out. Data classification assigns a class to each data item based on the characteristics of its attributes, while outlier detection searches for data whose characteristics differ from the rest. Classification and outlier detection methods based on complex networks are still little studied. Given the benefits of using complex networks to represent data, this work develops a complex-network-based outlier detection method that uses a random walk and a dissimilarity index. The method identifies different types of outliers with the same measure. Depending on the structure of the network, outlier vertices may be those far from the centre or the central ones, and may be hubs or vertices with few connections. In general, the proposed measure is a good estimator of outlier vertices in a network, adequately identifying vertices with a distinctive structure or a special function in the network. A network construction technique is also proposed that can represent similarity relationships between classes of data, based on an energy function that considers measures of purity and extension of the network. The resulting network was used to characterise mixing between data classes. Class characterisation is an important issue in data classification but remains little explored; this work is considered one of the first attempts in this direction.

9

Iranzo, Pérez David. "Análisis de outliers: un caso a estudio." Doctoral thesis, Universitat de València, 2007. http://hdl.handle.net/10803/9467.

Abstract:

One limitation of studying time series with ARIMA modelling, and with the Box-Jenkins approach in particular, is the difficulty of correctly identifying the model and, where applicable, choosing the most suitable one. The standard filtering procedure used to estimate the business cycle can require prior corrections to the series, since results could otherwise be seriously distorted. A prominent example is outlier correction, which is treated here together with the other preliminary adjustments.

Outliers are atypical observations that, generally speaking, cannot be explained by the ARIMA model and violate its underlying normality assumptions. Because the ARIMA models commonly used for time series are designed to capture information from processes with a certain degree of homogeneity, outliers and structural changes affect their efficiency and goodness of fit.

Following the seminal work of Fox, four types of outliers have been proposed, together with various procedures for detecting them. The four types considered in the literature are the additive outlier (AO), the level shift (LS), the temporary change (TC) and the innovational outlier (IO).

This study compares the programs TRAMO/SEATS and X12ARIMA, which are widely used (and recommended) by Eurostat and the European Central Bank. The comparison matters for deciding whether to promote one of the two in order to harmonise the treatment of time series. Both programs are highly configurable and offer a large number of user-settable parameters.

To illustrate the work, an experiment is first carried out with generated series: nine thousand white-noise series simulated from a random data generating function, the result of considering three different econometric models and three sample lengths in each case (60, 120 and 300 observations). In addition, the presence of three types of outliers (AO, LS and TC) is forced at three levels of impact intensity. One hundred series are studied for each of these specific cases.

Second, real series are used to analyse the effect of the shock caused by a terrorist attack on tourism activity in a given area. To this end, a detailed study is made of travellers' total overnight stays in hotel establishments by country of origin.

The theoretical framework draws on the work of Enders et al. (1992) and Drakos et al. (2001), while the methodology draws on time series analysis, specifically the proposal of A. Maravall and V. Gómez (1996). Among terrorist actions, those affecting tourism in general and the transport sector in particular stand out; these sectors are the most vulnerable to threats of insecurity.

In both the experiment with generated series and the one with real series, the series are analysed with both programs, TRAMO/SEATS and X12ARIMA, in order to compare results and establish differences between them.
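The three outlier types forced into the simulated series can be sketched as deterministic effects added to white noise. This is an illustrative sketch only, not the thesis's data generating function: the function names, the impact size and the seed are hypothetical, and the decay rate `delta=0.7` for the temporary change is an assumed conventional value. The innovational outlier is left out because its effect depends on the ARIMA model of the series.

```python
import random


def outlier_effect(kind, n, t0, omega, delta=0.7):
    """Effect of one outlier of size `omega` at index `t0` in a series of length `n`.

    AO: a single pulse at t0.
    LS: a permanent step from t0 onwards.
    TC: a jump at t0 that decays geometrically at rate `delta` (assumed value).
    """
    if kind not in ("AO", "LS", "TC"):
        raise ValueError("kind must be 'AO', 'LS' or 'TC'")
    effect = [0.0] * n
    for t in range(t0, n):
        if kind == "AO":
            effect[t] = omega if t == t0 else 0.0
        elif kind == "LS":
            effect[t] = omega
        else:  # TC
            effect[t] = omega * delta ** (t - t0)
    return effect


def simulate_series(n, kind, t0, omega, seed=0):
    """White noise of length `n` with one forced outlier (hypothetical setup)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    effect = outlier_effect(kind, n, t0, omega)
    return [e + x for e, x in zip(effect, noise)]


if __name__ == "__main__":
    # The sample lengths 60, 120 and 300 are the ones quoted in the abstract.
    for n in (60, 120, 300):
        series = simulate_series(n, "TC", n // 2, omega=5.0)
        print(n, round(series[n // 2], 3))
```

Series like these could then be passed to TRAMO/SEATS or X12ARIMA to check whether each program recovers the type and position of the forced outlier.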

10

Dunagan, John D. (John David) 1976. "A geometric theory of outliers and perturbation." Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/8396.

Abstract:

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Mathematics, 2002.
Includes bibliographical references (p. 91-94).
We develop a new understanding of outliers and the behavior of linear programs under perturbation. Outliers are ubiquitous in scientific theory and practice. We analyze a simple algorithm for removal of outliers from a high-dimensional data set and show the algorithm to be asymptotically good. We extend this result to distributions that we can access only by sampling, and also to the optimization version of the problem. Our results cover both the discrete and continuous cases. This is joint work with Santosh Vempala. The complexity of solving linear programs has interested researchers for half a century now. We show that an arbitrary linear program subject to a small random relative perturbation has good condition number with high probability, and hence is easy to solve. This is joint work with Avrim Blum, Daniel Spielman, and Shang-Hua Teng. This result forms part of the smoothed analysis project initiated by Spielman and Teng to better explain mathematically the observed performance of algorithms.

11

Astl, Stefan Ludwig. "Suboptimal LULU-estimators in measurements containing outliers." Thesis, Stellenbosch : Stellenbosch University, 2013. http://hdl.handle.net/10019.1/85833.

Abstract:

Thesis (MSc)--Stellenbosch University, 2013.
Techniques for estimating a signal in the presence of noise that contains outliers are currently not well developed. In this thesis, we consider a constant signal superimposed with noise drawn from a family of distributions structured as a tunable mixture f(x) = α g(x) + (1 − α) h(x) of two finite-support components: “well-behaved” noise g(x) with small variance and “impulsive” noise h(x) with large amplitude and strongly asymmetric character. When α ≈ 1, h(x) can, for example, model a cosmic ray striking an experimental detector. In the first part of our work, a method for obtaining the expected values of the positive and negative pulses in the first resolution level of a LULU Discrete Pulse Transform (DPT) is established. Subsequent analysis of sequences smoothed by the operators L1U1 or U1L1 of LULU theory shows that a robust estimator for the location parameter of g is achieved, in the sense that the contribution of h to the expected average of the smoothed sequences is suppressed to order (1 − α)^2 or higher. The specific shape of h, which can be difficult to guess given the assumed lack of data, is thus also shown to be of lesser importance. Furthermore, upon smoothing a sequence with L1U1 or U1L1, estimators for the scale parameters of the model distribution become easily available. In the second part of our work, the same problem and data are approached from a Bayesian inference perspective. The Bayesian estimators are found to be optimal in the sense that they make full use of the information available in the data. Heuristic comparison shows, however, that Bayes estimators do not always outperform the LULU estimators. Although the Bayesian perspective provides much insight into the logical connections inherent in the problem, its estimators can be difficult to obtain in analytic form and are slow to compute numerically. Suboptimal LULU-estimators are shown to be reasonable compromises in practical problems.
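The first-level LULU smoothers named in this abstract can be sketched in a few lines. This assumes the usual definitions, with L1 taking the maximum of the minima over the two-point windows containing each sample and U1 the minimum of the maxima, and it reads L1U1 as "apply U1, then L1". Repeating the end values at the boundaries, the function names and the sample data are assumptions made for illustration, not details taken from the thesis.

```python
def lulu_l1(x):
    """L1 smoother: removes isolated upward pulses of width one."""
    padded = [x[0]] + list(x) + [x[-1]]  # assumption: repeat the end values
    return [
        max(min(padded[i - 1], padded[i]), min(padded[i], padded[i + 1]))
        for i in range(1, len(padded) - 1)
    ]


def lulu_u1(x):
    """U1 smoother: removes isolated downward pulses of width one."""
    padded = [x[0]] + list(x) + [x[-1]]  # assumption: repeat the end values
    return [
        min(max(padded[i - 1], padded[i]), max(padded[i], padded[i + 1]))
        for i in range(1, len(padded) - 1)
    ]


def l1u1(x):
    """Composite operator L1U1, read as L1 applied to the output of U1."""
    return lulu_l1(lulu_u1(x))


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical measurements: a constant signal near 5 with small noise
    # and two large, one-sided impulses standing in for the h(x) component.
    measurements = [5.0, 5.1, 40.0, 4.9, 5.0, 5.2, 37.0, 4.8, 5.0]
    smoothed = l1u1(measurements)
    print("raw mean:", mean(measurements))
    print("smoothed mean:", mean(smoothed))
```

Averaging the smoothed sequence instead of the raw one is the location estimate the abstract refers to: isolated impulses are removed before they can contribute to the mean.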

12

Giroldo, Fabíola Rocha de Santana. "Alguns métodos robustos para detectar outliers multivariados." Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-20102009-211316/.

Abstract:

Atypical observations, or outliers, are almost always present in a data set, whether large or small. They may arise from errors in recording the data or because some points really are different from the rest. Their presence can distort the results of models and estimates, so detecting them is very important and should be done before any deeper analysis of the data. After this diagnosis, a decision can be made about the atypical points. One possibility is to correct them if an error occurred when the data were transcribed. If they are valid points, they should be treated differently from the rest, either by weighting or by a special analysis. In the univariate and bivariate cases, outliers can be detected by inspecting the scatter plot, which shows the behaviour of each observation in the data set of interest; points far from the bulk of the data should be considered atypical. In the multivariate case, graphical detection becomes more complex because the analysis would have to consider two variables at a time, which makes the process long and unreliable: a point can be atypical with respect to some variables and not others, so the result would be masked. This work presents some robust methods for detecting outliers in multivariate data. Each method is applied to an example, and the methods are compared according to their results on that example and by simulation.

13

Santos, Adriana Maria Rocha Trancoso. "Outliers em variáveis geoespaciais: proprosições utilizando geoestatística." Universidade Federal de Viçosa, 2016. http://www.locus.ufv.br/handle/123456789/9784.

Abstract:

Made available in DSpace on 2017-03-14. Previous issue date: 2016-12-16
Faculdades Adventistas de Minas Gerais
Observations that depart statistically from the rest of a data set are commonly called outliers. Such behaviour invites hypotheses, for example that the data belong to another population. Regardless of the hypotheses that may arise, it is important to keep checking whether existing methodologies suit the various types of variables involved in scientific investigations. In the specialized literature, the Box Plot is commonly used as the main detection mechanism, and the "discrepant" data it detects are excluded from the data set under study. Because the Box Plot does not take the geographic position of the data into account, the working hypothesis is that it is not applicable to continuous geospatial data. This work therefore studies the importance of proposing outlier detection methods that incorporate the location of the data, and compares their performance with that of the Box Plot. The first chapter proposes a new outlier detection method for continuous geospatial data; a real data set known to contain outliers was analyzed with both the Box Plot and the proposed method. The second chapter proposes a new outlier detection method for continuous geospatial data whose variables are non-negative; a real data set was analyzed with the Box Plot and with the new method. Finally, the third chapter proposes a methodological mechanism for deciding whether to exclude data with a high probability of discrepancy; it uses four data sets, three computationally simulated and one real.
To give the whole proposal a sound theoretical basis, the guiding principles adopted were a combination of theorems from Classical Statistics and the application of Geostatistics as the main supporting methodology. Geostatistics was adopted because it incorporates the geographic location of the data into the analysis, because it rests on statistically optimal properties (it is designed to be unbiased and to have minimum variance when predicting unobserved values), and because its modelling and prediction take into account the spatial dependence structure of the samples, which is inherent in geospatial data.
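As a reference point for the Box Plot rule criticized in this abstract, here is a minimal sketch. The 1.5 × IQR multiplier is the conventional choice and is an assumption (the abstract does not state it); the function name and the sample records are hypothetical. Note that the coordinates of each sample never enter the computation, which is the limitation the thesis addresses.

```python
from statistics import quantiles

def boxplot_outliers(values, k=1.5):
    """Indices of values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

# Hypothetical geospatial samples: (x, y, measured value)
samples = [
    (0, 0, 10), (0, 1, 11), (0, 2, 12),
    (1, 0, 13), (1, 1, 14), (1, 2, 15),
    (2, 0, 16), (2, 1, 17), (2, 2, 40),
]

# Only the measured values are used; the (x, y) locations are ignored.
flagged = boxplot_outliers([value for _, _, value in samples])
```

A location-aware method would instead compare each value with what its spatial neighbours predict; that step is outside this sketch.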

14

Soon, Shih Chung. "On detection of extreme data points in cluster analysis." Connect to resource, 1987. http://rave.ohiolink.edu/etdc/view.cgi?acc%5Fnum=osu1262886219.

15

Araújo, Bilzã Marques de. "Identificação de outliers em redes complexas baseado em caminhada aleatória." Universidade de São Paulo, 2010. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-06102010-141931/.

Abstract:

In nature and in science, data and information that deviate significantly from the average often have great relevance. Such data are usually called outliers in the literature. Outlier identification is important in many real applications, such as fraud detection, fault diagnosis, and monitoring of medical conditions. Recent years have seen great interest in the area of Complex Networks. Complex networks are large-scale graphs with non-trivial connection patterns, and they have proved to be a powerful way of representing and abstracting data. Although a large number of results have been reported in this research area, little has been explored about outlier detection in complex networks. Considering the dynamics of a random walk, this work proposes a distance measure and an outlier ranking method. With this technique, not only peripheral nodes but also central nodes (hubs) can be detected as outliers, depending on the network structure. We also found well-defined characteristics among the outlier nodes, related to their function in the network. Furthermore, we found that outlier nodes play an important role in a priori labelling for semi-supervised community detection, because central nodes are good disseminators of information and peripheral nodes lie in the border regions of communities. Based on this observation, we propose a semi-supervised community detection method. Simulation results show that this approach is promising.

16

Zamoner, Fabio Willian. "Técnica de aprendizado semissupervisionado para detecção de outliers." Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-07042014-100038/.

Abstract:

Outlier detection plays an important role in knowledge discovery in large data sets. The study is motivated by numerous real applications, such as credit card fraud, fault detection in industrial components, network intrusion detection, loan application processing, and medical condition monitoring. An outlier is defined as an observation that deviates from other observations with respect to a measure and exerts a substantial influence on data analysis. Although numerous machine learning techniques have been developed for this problem, most of them make no use of prior knowledge about the data. Semi-supervised outlier detection techniques are relatively new and use only a small number of labels of the normal class to build a classifier. Recently, a network-based semi-supervised model was proposed for data classification, employing a mechanism of particle competition and cooperation. The particles are responsible for propagating labels throughout the network. In this work, we adapt this model to detect outliers by defining an outlier score based on visit frequency. The number of visits received by an outlier is significantly different from that of the other objects of the same class. This approach leads to an unorthodox way of dealing with outliers. Empirical evaluations on both real and artificial data sets demonstrate that the proposed technique works well on unbalanced data sets and achieves precision comparable to that of traditional outlier detection techniques. Moreover, the technique may provide new insights into how to differentiate objects, because it considers not only physical distance but also the pattern formation of the data.

17

Zaharim, Azami. "Outliers and change points in time series data." Thesis, University of Newcastle Upon Tyne, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.295109.

18

Budzier, Alexander. "Theorizing outliers : explaining variation in IT project performance." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:9fd44230-32a0-41f0-861e-4ef999aea22f.

Abstract:

IT projects are temporary organizations of strategic importance. Companies invest large amounts of money, time, and resources in business-embedded IT projects in order to change and gain a competitive advantage. Extreme cases of failure were previously analyzed only as case studies, e.g., Denver Airport, London Stock Exchange Taurus, and the London Ambulance Service. The research poses two important questions: What is the risk of these outliers, that is, markedly deviant observations of IT project performance? And what causes outliers in IT project performance? Very few studies have problematized the frequency of outliers directly. Reported numbers range from 33% to as low as 0.2%. The variation has been explained through biases in the planning processes of organizations and as an artefact of data collection. An alternative explanation is that the true nature of IT projects contains more variation than commonly assumed. A rich body of organizational, project management, and IT project management literature offers antecedents of outliers. The extant literature falls broadly into three schools of thought: (1) system-centric, (2) event-centric, and (3) process-centric theories of why outliers occur. System-centric explanations focus on the question of system design, based on theories of normal accidents and high reliability organizations. Event-centric explanations focus on how organizations respond to rare events that impact the organization, based on theories of crisis management, management of organizational turbulence, and strategic surprises. Process-centric explanations focus on the role of managing uncertainty and risk over time, based on theories of man-made disasters, escalation of commitment to a failing course of action, and the normalization of deviance. The study is based on archival research of 4,307 IT projects from 190 organizations.
The findings show that the tails of the cost, schedule, and effort performance distributions are best fitted by a power law, with overwhelming goodness of fit. Moreover, the findings show that system-centric and process-centric theories offer explanations for the thickness of the tail and the odds of an outlier occurring. In particular, five variables were associated with outliers: estimated cost and duration, perceived uniqueness of the project, the qualification and motivation of the project team, and the effectiveness of monitoring and controlling. The results show that outliers are not chance events; they follow patterns that are describable. The study shows how design factors, which are often conceptualized as system complexities, and execution factors, which are often conceptualized as the effectiveness of project processes, explain project outliers. Lastly, the thesis draws implications for research and practice.

19

Monat, Andre Soares. "Exceptional values in relational databases." Thesis, University of East Anglia, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.359326.

20

Marques, Henrique Oliveira. "Avaliação e seleção de modelos em detecção não supervisionada de outliers." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-26062015-101457/.

Abstract:

Outlier detection (or anomaly detection) plays a fundamental role in discovering patterns in data that can be considered exceptional in some sense. An important distinction is between supervised and unsupervised techniques. This work focuses on unsupervised outlier detection techniques. There are dozens of such algorithms in the literature, but each uses its own intuition of what should be considered an outlier, which is naturally a subjective concept. This substantially complicates the choice of a particular algorithm and of an appropriate parameter configuration for the chosen algorithm in a given practical application. It also makes it highly complex to evaluate the quality of the solution obtained by the algorithm or configuration adopted by the analyst, especially given the difficulty of defining a quality measure that is not tied to the criterion used by the algorithm itself. These issues are interrelated and correspond, respectively, to the problems of model selection and evaluation (or validation) of results in unsupervised machine learning. This work develops a pioneering index for the unsupervised evaluation of outlier detection results. The index, called IREOS (Internal, Relative Evaluation of Outlier Solutions), evaluates and compares different candidate solutions (top-n, i.e., binary labelings) based only on the data and on the solutions being evaluated. The index is also statistically adjusted for chance and extensively evaluated in several experiments involving different collections of synthetic and real data sets.

21

Rodriguez, Gabriel. "Unit root, outliers and cointegration analysis with macroeconomic applications." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape3/PQDD_0028/NQ48794.pdf.

22

馮榮錦 and Wing-kam Tony Fung. "Analysis of outliers using graphical and quasi-Bayesian methods." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1987. http://hub.hku.hk/bib/B31230842.

23

Kawabata, Thatiane. "Detecção de outliers espaciais : refinamento de similaridade e desempenho /." São José do Rio Preto, 2015. http://hdl.handle.net/11449/127787.

Abstract:

Advisor: Carlos Roberto Valêncio
Examination committee: Rogéria Cristiane Gratão de Souza
Examination committee: Enzo Seraphim
Advances in the technologies used to collect georeferenced information have increased the amount of spatial data stored in databases. This has also brought many problems that are common in large databases, such as data redundancy, incomplete data, unknown values, and outliers. To obtain relevant information from spatial data, the application of spatial data mining algorithms, especially spatial clustering algorithms, has become common practice worldwide. However, many current algorithms ignore the presence of local outliers in spatial data, or consider only their location relative to the other data in the database, which can produce inconsistent results and hinder knowledge extraction. To contribute in this direction, the work surveys information related to spatial data mining and to the detection of conventional and spatial outliers, and presents the main state-of-the-art work. Finally, it proposes a configurable and portable approach that operates on the results of spatial clustering algorithms and includes an improvement to a spatial outlier detection algorithm, aimed at mining information from the data set.
Master's degree

24

Shaw, James H. M. "Identification of outliers with an application in seed testing." Thesis, University of Edinburgh, 1996. http://hdl.handle.net/1842/12921.

25

Silva, Flávio Roberto. "Uma abordagem para detecção de outliers em dados categoricos." [s.n.], 2004. http://repositorio.unicamp.br/jspui/handle/REPOSIP/276461.

Abstract:

Advisor: Geovane Cayres Magalhães
Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação
Made available in DSpace on 2018-08-04. Previous issue date: 2004
An outlier is an element that does not conform to the pattern of the data set to which it belongs. Outlier detection can yield unexpected and important information for some applications, for example the discovery of fraud in telephone and credit card systems, and intrusion detection systems. This dissertation presents a new approach for outlier detection in databases with categorical attributes. The proposed approach uses log-linear models as the pattern for the data set, which makes it easier for the user to interpret the results. It also presents FOCaD (Finding Outliers in Categorical Data), a prototype of a categorical data analysis system that fits and selects models, performs statistical tests, and detects outliers.
Master's
Computer Science
Master in Computer Science

26

Rodrigues, Rafael Delalibera. "Detecção de outliers baseada em caminhada determinística do turista." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/59/59143/tde-14062018-223903/.

Abstract:

Outlier detection is a fundamental task for knowledge discovery in data mining. It aims to identify data samples that deviate markedly from the patterns present in a data set. In this work, we present a new outlier detection technique based on the deterministic tourist walk. Specifically, a walker is started from each data sample and the memory size is varied; a sample receives a high outlier score if it participates in few tourist walk attractors, and a low score if it participates in a large number of attractors. Experimental results on artificial and real data sets show good performance of the proposed method. In comparison with classical methods, the proposed method has the following salient features: 1) It identifies outliers by determining structures in the data space instead of considering only physical features such as distance, similarity, or density. 2) It can detect not only external outliers, as classical methods do, but also internal outliers located in regions between two or more clusters. 3) By varying the memory size, the walkers can extract both local and global characteristics of the data set. 4) The proposed method is deterministic, so a single run is sufficient, in contrast to stochastic techniques, which require many runs. Moreover, this work characterizes, for the first time, that the dynamics of the tourist walk can generate complex attractors with multiple crossings. Such attractors can reveal even more detailed structures and consequently improve outlier detection.
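The scoring idea in this abstract can be sketched as follows. This is a simplified reading, not the thesis's implementation: it assumes the usual tourist walk rule (move to the nearest point not visited in the last `mu` steps, the current point included), breaks distance ties by lowest index, and simply counts attractor participation. The function names and the toy data are hypothetical.

```python
from math import dist

def attractor(points, start, mu):
    """Indices on the cycle reached by a walker with memory mu (needs mu < len(points))."""
    path, first_seen = [start], {}
    while True:
        state = tuple(path[-mu:])        # last mu visited points, current one included
        if state in first_seen:          # same state again: the walk is now periodic
            return set(path[first_seen[state]:-1])
        first_seen[state] = len(path) - 1
        here = path[-1]
        # Points in the memory window are forbidden as the next destination
        candidates = [j for j in range(len(points)) if j not in state]
        path.append(min(candidates, key=lambda j: dist(points[here], points[j])))

def participation_counts(points, memories):
    """How many attractors (over all starts and memory sizes) contain each point."""
    counts = [0] * len(points)
    for mu in memories:
        for start in range(len(points)):
            for i in attractor(points, start, mu):
                counts[i] += 1
    return counts

# Hypothetical data: a tight square of four points and one distant point
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
counts = participation_counts(points, memories=(1, 2))
```

Points with low counts take part in few attractors and would therefore receive a high outlier score under the scheme the abstract describes; how counts are turned into scores is left open here.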

27

Braumann, Maria Manuela São Pedro Abreu. "Sobre testes de detecção de "outliers" em populações exponenciais." Doctoral thesis, Universidade de Évora, 1994. http://hdl.handle.net/10174/11835.

Abstract:

The interest in detecting outliers in samples is obvious, since samples can be contaminated by these "surprising" observations; that is, the information given by the samples may be distorted. It is therefore essential to find ways of interpreting or recognizing outliers. Until now, however, outlier detection has not been carried out by rigorous and objective methods, since only intuitive procedures have been used to select the observations to be tested (the candidate outliers are chosen empirically, a priori). With the GAN method (generativo de alternativa natural, i.e. generative with natural alternative), Rosado's doctoral thesis (1984) treats the problem objectively: the observation rejected as an outlier is chosen a posteriori, once homogeneity of the observations has been rejected. The detection and treatment of outliers matter in all scientific areas and applications that rely on statistical studies. Techniques for detecting and identifying outliers are also important for eliminating foreign elements from population samples and for fitting regression or other models (by studying the existence of outliers in the residuals). The exponential distribution plays a relevant role in many applications, especially in the study of lifetimes of systems (mechanical, electronic, biological, or other) or of their components. The performance of the tests developed for detecting and identifying outliers in exponential populations, both the classical tests and the new tests obtained by Rosado, was not known, despite the existence of performance criteria for tests of this kind that have been proposed in the literature and accepted as relevant. One objective of this work is to obtain new test statistics by the GAN method, so as to cover all possible hypotheses (for one outlier) concerning the exponential distribution.
A further objective is to produce tables of critical values and, finally, to determine the performance measures of the existing tests and of the new ones obtained in this work. A comparative study of the performance of the various tests is carried out, with consequences for their practical application. The work is organized as follows. Chapter I begins by discussing what an outlier is, how it arises, how to detect it, and finally how to treat it; this part follows Braumann (1989) closely. It then describes discordancy tests for outliers, presenting the traditional tests and introducing a new test due to Rosado (1984). Section 1.5 covers the performance measures of the tests. Chapter II develops the new method of Rosado (1984), the GAN method, and uses it to obtain new test statistics for the exponential distribution. In addition to these new statistics, the statistics previously obtained by Rosado are presented, some of which coincide with the classical statistics. For all of these statistics the distribution functions are determined for the two possible cases: presence and absence of an outlier in the sample. In Chapter III, analytical expressions are derived for all the statistics for computing the critical values and the performance measures. Ways of computing them numerically are also presented, in some cases deriving alternative formulas intended to simplify and speed up a computation that would otherwise be practically infeasible. Chapter IV presents tables of critical values and tables of the performance measures.
These values are also analyzed, in particular by comparing the performance of the new tests obtained by the GAN method with that of traditional tests.


28

Miranda, Carla da Fonseca. "Modelação linear de séries temporais na presença de outliers." Master's thesis, Universidade do Porto. Reitoria, 2001. http://hdl.handle.net/10216/10001.


Abstract:

Master's dissertation in Statistics presented to the Faculty of Sciences of the University of Porto
In time series analysis, outliers and structural changes are frequently encountered. They may be associated with unexpected or uncontrollable events such as strikes, wars or political changes, or may simply result from errors in measuring or recording observations. Such observations can compromise the usual procedures for linear modelling of a time series; in particular, they can lead to incorrect identification of an ARIMA model and to biased estimation of the model parameters. The main aim of this work is to present some procedures for linear modelling of a time series in the presence of outliers and structural changes. The approach usually adopted consists of identifying the location and type of the outliers or structural changes and using the intervention models of Box and Tiao (1975) to accommodate their effects. This approach requires iterating between detection steps, which use likelihood ratio statistics to locate the outliers and structural changes and identify their type, and estimation steps, which fit a model generating these disturbances in order to accommodate their effects. The outliers usually considered are additive outliers (AO) and innovational outliers (IO), and the structural changes are permanent and transient level changes (LC and TC). An alternative to likelihood ratio statistics for detecting outliers and level changes is to use statistics based on deleting one observation or a group of observations and measuring the resulting changes in the estimates of the model parameters. This approach makes it possible to detect influential observations that may be outliers. Accordingly, this work also presents diagnostics that indicate influential observations and influential outliers.


29

Kawabata, Thatiane [UNESP]. "Detecção de outliers espaciais: refinamento de similaridade e desempenho." Universidade Estadual Paulista (UNESP), 2015. http://hdl.handle.net/11449/127787.


Abstract:

Advances in the technologies used to collect georeferenced information have increased the amount of spatial data stored in databases. This growth has also brought many problems common to large databases, such as data redundancy, incomplete data, unknown values and outliers. To obtain relevant information from spatial data, applying spatial data mining algorithms, especially spatial clustering algorithms, has become common practice worldwide. However, many current algorithms ignore the presence of local outliers in spatial data, or consider only their location relative to the other data in the database, which can produce inconsistent results and hinder knowledge extraction. To contribute in this direction, the work surveys information related to spatial data mining and to the detection of conventional and spatial outliers, and presents the main state-of-the-art works. Finally, it proposes a configurable and portable approach applied to the results of spatial clustering algorithms, which includes an improvement to a spatial outlier detection algorithm, aimed at mining information from the dataset.


30

Fung, Wing-kam Tony. "Analysis of outliers using graphical and quasi-Bayesian methods /." [Hong Kong] : University of Hong Kong, 1987. http://sunzi.lib.hku.hk/hkuto/record.jsp?B1236146X.


31

Page, Garritt L. "Bayesian mixture modeling and outliers in inter-laboratory studies." [Ames, Iowa : Iowa State University], 2009. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3389133.


32

Miranda, Carla da Fonseca. "Modelação linear de séries temporais na presença de outliers." Dissertação, Universidade do Porto. Reitoria, 2001. http://hdl.handle.net/10216/10001.


Abstract:

Master's dissertation in Statistics presented to the Faculty of Sciences of the University of Porto
In time series analysis, outliers and structural changes are frequently encountered. They may be associated with unexpected or uncontrollable events such as strikes, wars or political changes, or may simply result from errors in measuring or recording observations. Such observations can compromise the usual procedures for linear modelling of a time series; in particular, they can lead to incorrect identification of an ARIMA model and to biased estimation of the model parameters. The main aim of this work is to present some procedures for linear modelling of a time series in the presence of outliers and structural changes. The approach usually adopted consists of identifying the location and type of the outliers or structural changes and using the intervention models of Box and Tiao (1975) to accommodate their effects. This approach requires iterating between detection steps, which use likelihood ratio statistics to locate the outliers and structural changes and identify their type, and estimation steps, which fit a model generating these disturbances in order to accommodate their effects. The outliers usually considered are additive outliers (AO) and innovational outliers (IO), and the structural changes are permanent and transient level changes (LC and TC). An alternative to likelihood ratio statistics for detecting outliers and level changes is to use statistics based on deleting one observation or a group of observations and measuring the resulting changes in the estimates of the model parameters. This approach makes it possible to detect influential observations that may be outliers. Accordingly, this work also presents diagnostics that indicate influential observations and influential outliers.


33

Liu, Jie. "Exploring Ways of Identifying Outliers in Spatial Point Patterns." Digital Commons @ East Tennessee State University, 2015. https://dc.etsu.edu/etd/2528.


Abstract:

This work discusses alternative methods to detect outliers in spatial point patterns. Outliers are defined based on location only and also with respect to associated variables. Throughout the thesis we discuss five case studies: three come from experiments with spiders and bees, and the other two use data from earthquakes in a certain region. One of the main conclusions is that when detecting outliers from the point of view of location we need to take into consideration both the degree of clustering of the events and the context of the study. When detecting outliers from the point of view of an associated variable, outliers can be identified from a global or local perspective. For global outliers, one of the main questions addressed is whether the outliers tend to be clustered or randomly distributed in the region. All the work was done using the R programming language.


34

Hacini, Akram. "Une approche de détection d'outliers en présence de l'incertitude." Thesis, Paris 8, 2018. http://www.theses.fr/2018PA080068.


Abstract:

One aspect of the complexity of new data produced by various processing systems is its imprecision, uncertainty and incompleteness. These aspects are aggravated by the multiplicity and dispersion of data-generating sources, as is readily observed in control and monitoring systems. While data mining tools have become fairly effective on data for which reliable prior knowledge is available, they cannot be applied to data where the knowledge itself may be tainted by uncertainty and imprecision. New approaches that take this aspect into account should therefore improve the performance of data mining systems, including outlier detection, which is the subject of this thesis. The thesis accordingly proposes a new method for detecting outliers in uncertain and/or imprecise data. The imprecision and uncertainty of the expert assessments attached to the training data is itself an aspect of data complexity. To address this particular problem, we combined techniques from machine learning, in particular clustering, with techniques from fuzzy logic, in particular fuzzy sets, so that new observations can be projected onto the clusters of the training data and, after thresholding, the observations to be treated as aberrant (outliers) in the dataset can be identified. Specifically, using ambiguous decision tables (ADTs), we started from the ambiguity indices of the training data to compute the ambiguity indices of the new observations (test data) by fuzzy inference. After clustering the set of ambiguity indices, an α-cut operation defined a decision boundary within the clusters, which was then used to categorise the observations as normal (inliers) or aberrant (outliers). The strength of the proposed method lies in its ability to handle imprecise and/or uncertain training data using only the ambiguity indices, thereby overcoming various problems of incompleteness in the datasets. The false positive and recall metrics allowed us both to evaluate the performance of the method and to tune it according to the user's choices.


35

ALMutawa, Jaafar Hasan Mohamed Yusuf. "Subspace identification of linear systems in the presence of outliers." 京都大学 (Kyoto University), 2006. http://hdl.handle.net/2433/143896.


36

Sothinathan, Nalaiyini. "Bayesian Analysis for outliers in binomial, Normal and circular data." Thesis, Queen Mary, University of London, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.498204.


37

Derksen, Timothy J. (Timothy John). "Processing of outliers and missing data in multivariate manufacturing data." Thesis, Massachusetts Institute of Technology, 1996. http://hdl.handle.net/1721.1/38800.


Abstract:

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996.
Includes bibliographical references (leaf 64).
by Timothy J. Derksen.
M.Eng.


38

Masood, Adnan. "Measuring Interestingness in Outliers with Explanation Facility using Belief Networks." NSUWorks, 2014. http://nsuworks.nova.edu/gscis_etd/232.


Abstract:

This research explores the potential of improving the explainability of outliers using Bayesian Belief Networks as background knowledge. Outliers are deviations from the usual trends of data. Mining outliers may help discover potential anomalies and fraudulent activities. Meaningful outliers can be retrieved and analyzed by using domain knowledge. Domain knowledge (or background knowledge) is represented using probabilistic graphical models such as Bayesian belief networks. Bayesian networks are a graph-based representation used to model and encode mutual relationships between entities. Due to their probabilistic graphical nature, Belief Networks are an ideal way to capture the sensitivity, causal inference, uncertainty and background knowledge in real world data sets. Bayesian Networks effectively present the causal relationships between different entities (nodes) using conditional probability. This probabilistic relationship shows the degree of belief between entities. A quantitative measure which computes changes in this degree of belief acts as a sensitivity measure. The first contribution of this research is enhancing the performance for measurement of sensitivity based on earlier research work, the Interestingness Filtering Engine Miner algorithm. The algorithm developed (IBOX - Interestingness based Bayesian outlier eXplainer) provides progressive improvement in the performance and sensitivity scoring of earlier works. Earlier approaches compute sensitivity by measuring divergence among conditional probability of training and test data, while using only a couple of probabilistic interestingness measures such as Mutual information and Support to calculate belief sensitivity. With ingrained support from the literature as well as quantitative evidence, IBOX provides a framework to use multiple interestingness measures resulting in better performance and improved sensitivity analysis. 
The results provide improved performance, and therefore explainability of rare class entities. This research quantitatively validated probabilistic interestingness measures as an effective sensitivity analysis technique in rare class mining. This results in a novel, original, and progressive research contribution to the areas of probabilistic graphical models and outlier analysis.


39

Karlsson, Peter S. "Issues of incompleteness, outliers and asymptotics in high dimensional data." Doctoral thesis, Internationella Handelshögskolan, Högskolan i Jönköping, IHH, Economics, Finance and Statistics, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-14934.


Abstract:

This thesis consists of four individual essays and an introductory chapter. The essays are in the field of multivariate statistical analysis of high-dimensional data. The first essay addresses the estimation of the inverse covariance matrix, both on its own and when it is used within the Mahalanobis distance, in high-dimensional data. Three types of ridge-shrinkage estimators of the inverse covariance matrix are suggested and evaluated through Monte Carlo simulations. The second essay deals with incomplete observations in empirical applications of the Arbitrage Pricing Theory model, where the interest is in modelling the underlying covariance structure among the variables by a few common factors. Two possible solutions to the problem are considered and a case study using the Swedish OMX data is conducted for demonstration. The third essay treats outlier detection in high-dimensional data. A number of point estimators of the Mahalanobis distance are suggested and their properties are evaluated. The fourth and last essay considers the relation between the second central moment of a distribution and its first raw moment in a financial context. Three possible estimators are considered and it is shown that they are consistent even when the dimension increases proportionally to the number of observations.


40

Dishman, Tamarah Crouse. "Identifying Outliers in a Random Effects Model For Longitudinal Data." UNF Digital Commons, 1989. http://digitalcommons.unf.edu/etd/191.


Abstract:

Identifying non-tracking individuals in a population of longitudinal data has many applications as well as complications. The analysis of longitudinal data is a special study in itself. Of the several accepted methods, we chose a two-stage random effects model coupled with the Expectation-Maximization (E-M) algorithm. Our project consisted of first estimating population parameters using these methods. The Mahalanobis distance was then used to sequentially identify and eliminate non-trackers from the population. Computer simulations were run in order to measure the algorithm's effectiveness. Our results show that the average specificity for the repetitions for each simulation remained at the 99% level. The sensitivity was best when only a single non-tracker was present with a very different parameter a. The sensitivity of the program decreased when more than one non-tracker was present, indicating our method of identifying a non-tracker is not effective when the estimates of the population parameters are contaminated.
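The sequential identification step described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's procedure: it assumes each individual is summarised by a two-dimensional parameter estimate, uses the plain sample mean and covariance in place of the random effects model and E-M estimates, and takes the 0.99 quantile of a chi-square distribution with 2 degrees of freedom as a hypothetical cutoff. All function names are illustrative.

```python
import math

# Hypothetical cutoff: 0.99 quantile of chi-square with 2 degrees of freedom,
# whose CDF is 1 - exp(-x / 2), so the quantile is -2 * ln(0.01).
CHI2_2DF_99 = -2.0 * math.log(0.01)


def mean_and_cov(points):
    """Sample mean and (n - 1)-normalised covariance of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def squared_mahalanobis(point, mean, cov):
    """Squared Mahalanobis distance using the explicit 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def sequential_outliers(points, cutoff=CHI2_2DF_99):
    """Repeatedly drop the most distant point while it exceeds the cutoff.

    Returns the original indices of the removed points, in removal order.
    """
    active = list(range(len(points)))
    removed = []
    while len(active) > 3:  # keep enough points for a usable covariance
        subset = [points[i] for i in active]
        mean, cov = mean_and_cov(subset)
        dists = [squared_mahalanobis(p, mean, cov) for p in subset]
        worst = max(range(len(subset)), key=dists.__getitem__)
        if dists[worst] <= cutoff:
            break
        removed.append(active.pop(worst))
    return removed
```

Because the mean and covariance are re-estimated from whatever points remain, several outliers present at once can inflate the covariance and mask one another, which is consistent with the loss of sensitivity the abstract reports when the parameter estimates are contaminated.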


41

Ramos, Jonathan da Silva. "Algoritmos de casamento de imagens com filtragem adaptativa de outliers." Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-02022017-110428/.


Abstract:

Image matching plays an important role in many applications, such as 3D object reconstruction, pattern recognition and microscopic imaging. It comprises three main steps: (1) interest point selection; (2) feature extraction at the interest points; (3) matching of interest points between one image and the other. For steps 1 and 2, algorithms such as SIFT and SURF give satisfactory results. In step 3, however, outliers occur, that is, interest points that were matched incorrectly, and even a single incorrect match can lead to an undesirable final result. Outlier removal (consensus) algorithms have a high computational cost that grows as the number of outliers increases. To reduce the processing time these algorithms require, this work proposes and develops FOMP (Filtering out Outliers from Matched Points), a preprocessing approach that filters outliers from the set of initially matched points. FOMP treats each set of points as a complete graph whose edge weights are the distances between points; using the sum of the differences between edge weights, the vertex with the highest value is removed. To validate FOMP, experiments were performed on four image databases, each with its own characteristics: (a) differences in rotation or camera zoom; (b) repetitive patterns, which produce duplicate feature vectors; (c) deformable objects, such as plastic, paper or fabric; (d) affine transformations (different viewpoints). The experiments showed that FOMP removes more than 65% of the outliers while keeping around 98% of the inliers. The proposed approach preserves the precision of the consensus methods while halving the processing time of graph-based methods.
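The graph-based filtering idea in this abstract can be sketched in Python as follows. This is only one reading of the abstract, not the published FOMP algorithm: the number of vertices to remove (`n_remove`) is a hypothetical stopping rule, no scale normalisation between the two images is applied, and the function names are illustrative.

```python
import math


def edge_difference_scores(pts_a, pts_b):
    """For each matched pair i, sum |d_A(i, j) - d_B(i, j)| over all j.

    pts_a[i] and pts_b[i] are the two ends of match i; each point set is
    treated as a complete graph weighted by Euclidean distance.
    """
    n = len(pts_a)
    scores = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                d_a = math.dist(pts_a[i], pts_a[j])
                d_b = math.dist(pts_b[i], pts_b[j])
                scores[i] += abs(d_a - d_b)
    return scores


def filter_matches(pts_a, pts_b, n_remove):
    """Remove, one at a time, the match whose edges disagree the most.

    n_remove is an assumed stopping rule (the abstract does not state one).
    Returns the indices of the matches that are kept.
    """
    keep = list(range(len(pts_a)))
    for _ in range(n_remove):
        if len(keep) <= 2:
            break
        sub_a = [pts_a[k] for k in keep]
        sub_b = [pts_b[k] for k in keep]
        scores = edge_difference_scores(sub_a, sub_b)
        worst = max(range(len(keep)), key=scores.__getitem__)
        del keep[worst]
    return keep
```

Scores are recomputed after every removal, so a gross mismatch that distorts the sums of its neighbours stops influencing them once it is gone. Each pass costs quadratic time in the number of remaining matches.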


42

Bulhões, Rodrigo de Souza. "Contribuições à análise de outliers em modelos de equações estruturais." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-19062013-135858/.


Abstract:

A Structural Equation Model (SEM) is usually fitted to carry out a confirmatory analysis of a researcher's conjectures about the relationship between the observed and latent variables of a study. In practice, the most common way of assessing the quality of SEM estimates is through measures of how far the usual classical (ordinary) covariance matrix lies from the covariance matrix of the fitted model, or of the size of the gap between the discrepancy functions of the hypothesised model and the saturated model. However, these measures may fail to capture problems in the fit when there are many parameters to estimate or many observations. To detect irregularities in the fit caused by outliers in the data set, this work considered some indicators known in the literature and also modifications of the Goodness-of-Fit Index (GFI) and the Adjusted Goodness-of-Fit Index (AGFI), both in their expressions for parameter estimation by Maximum Likelihood. The modifications replace the traditional covariance matrix with robust covariance matrices computed with the following estimators: Minimum Volume Ellipsoid, Minimum Covariance Determinant, S, MM and Orthogonalized Gnanadesikan-Kettenring (OGK). Simulation studies of perturbations in skewness and excess kurtosis, at low and high contamination fractions, for different sample sizes and numbers of affected observed variables, showed that the modified GFI and AGFI based on the OGK estimator were the only proposals that remained informative in all these situations. The modified GFI should be chosen when the number of parameters to be estimated is low and the modified AGFI when it is high.


43

Katshunga, Dominique. "Identifying outliers and influential observations in general linear regression models." Thesis, University of Cape Town, 2004. http://hdl.handle.net/11427/6772.


Abstract:

Includes bibliographical references (leaves 140-149).
Identifying outliers and/or influential observations is a fundamental step in any statistical analysis, since their presence is likely to lead to erroneous results. Numerous measures have been proposed for detecting outliers and assessing the influence of observations on least squares regression results. Since outliers can arise in different ways, the above mentioned measures are based on motivational arguments and they are designed to measure the influence of observations on different aspects of various regression results. In what follows, we investigate how one can combine different test statistics based on residuals and diagnostic plots to identify outliers and influential observations (both in the single and multiple case) in general linear regression models.
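As a small illustration of residual-based measures of the kind this abstract refers to, the Python sketch below computes three standard diagnostics for simple linear regression with an intercept: leverage, internally studentized residuals and Cook's distance. The choice of these three measures and the function name are assumptions for illustration; the abstract does not say which statistics the thesis combines.

```python
import math


def simple_regression_diagnostics(x, y):
    """Leverage, internally studentized residuals and Cook's distance
    for the least squares fit y = intercept + slope * x.

    Assumes the fit is not exact (residual variance > 0) and that the
    x values are not all equal.
    """
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance

    # Diagonal of the hat matrix for simple linear regression.
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    studentized = [
        e / math.sqrt(s2 * (1.0 - h)) for e, h in zip(residuals, leverage)
    ]
    cooks = [
        r * r * h / (p * (1.0 - h)) for r, h in zip(studentized, leverage)
    ]
    return leverage, studentized, cooks
```

Leverage depends only on the x values and flags unusual design points, the studentized residual flags observations that the fitted line misses, and Cook's distance combines the two into a single measure of influence on the fit.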


44

Aghlmandi, Soheila. "Outliers detection in INAR(1) model with negative binomial innovations." Master's thesis, Universidade de Aveiro, 2012. http://hdl.handle.net/10773/9875.


Abstract:

Master's in Mathematics and Applications
Discrete-valued (integer-valued) time series are widely used in practice, yet they remain an active subject of research. In this setting, the variables of the process take values in finite or countably infinite sets. This work studies first-order INteger-valued AutoRegressive, INAR(1), processes. The main goal is to address the detection of additive outliers in INAR(1) processes whose innovations follow a negative binomial distribution; the binomial thinning operator is used in the process. A Bayesian approach is applied, using Gibbs sampling to estimate the probability that an observation is affected by an outlier. The computational part of the detection is carried out in R. The proposed methodology is illustrated with several simulated examples and real data sets.
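The data-generating model named in this abstract can be sketched in Python as follows. The sketch only simulates an INAR(1) series, X_t = alpha ∘ X_{t-1} + e_t with binomial thinning and negative binomial innovations, and contaminates it with one additive outlier; it does not implement the Gibbs sampler used for detection. The parameterisation of the negative binomial (failures before the r-th success, integer r) and all function names are assumptions for illustration.

```python
import random


def binomial_thinning(alpha, x, rng):
    """alpha ∘ x: each of the x counts survives independently with prob. alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def negative_binomial(r, p, rng):
    """Number of failures before the r-th success, success probability p."""
    failures = 0
    successes = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


def simulate_inar1(n, alpha, r, p, rng, x0=0):
    """Simulate n observations of X_t = alpha ∘ X_{t-1} + e_t."""
    series = []
    x = x0
    for _ in range(n):
        x = binomial_thinning(alpha, x, rng) + negative_binomial(r, p, rng)
        series.append(x)
    return series


def add_additive_outlier(series, position, size):
    """Return a copy of the series with `size` added at one time point.

    An additive outlier changes only the observed value at that time;
    it does not feed back into later values of the process.
    """
    contaminated = list(series)
    contaminated[position] += size
    return contaminated
```

For example, `simulate_inar1(100, 0.5, 3, 0.5, random.Random(0))` produces a clean series, and `add_additive_outlier(clean, 40, 15)` gives an observed series that differs from it at a single time point, which is the kind of input a detection method for additive outliers would be run on.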

45

Lowthian, Philip James. "Some studies on the perception of outliers in graphical displays." Thesis, Keele University, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.282633.

46

Cordner, Sheila Connors. "Educational outliers: exclusion as innovation in nineteenth-century British literature." Thesis, Boston University, 2013. https://hdl.handle.net/2144/12740.

Abstract:

Thesis (Ph.D.)--Boston University
This dissertation traces a genealogy of literary resistance to dominant pedagogies in nineteenth-century Britain. Although politicians, religious leaders, and literary authors celebrated the expansion of schools for people outside of privileged classes, a persistent tradition of writers registers the loss of non-institutional forms of learning. Excluded from Oxford and Cambridge because of their class or gender, Jane Austen, Elizabeth Barrett Browning, Thomas Hardy, and Virginia Woolf use their position outside of educational institutions to critique rote learning at universities for the elite as well as utilitarian schools for the masses. Hardy describes the "mental limitations" of Angel's Cambridge-educated brothers in Tess of the d'Urbervilles (1891), for example, mocking them as "such unimpeachable models as are turned out yearly by the lathe of systematic tuition." The radicalism of educational outliers emerges when read alongside educational pamphlets, working-men's club reports, college newspapers, and parliamentary debates. Educational outliers investigate the role that literature plays in un-teaching readers. They model alternative pedagogies centered on active learning instead of rote memorization. With Mansfield Park (1814), Austen inaugurates this tradition; at a time when proclamations on women's education proliferated, she offers novels as anti-treatises that constantly disrupt the reading experience instead of offering simplistic truths, forcing us to rely on our own judgment to make sense of the disorder that characterizes her model of self-education. Several decades later in her "novel-poem" Aurora Leigh (1856), Barrett Browning instructs us in a "headlong," empathic reading of her text as part of her experiential learning approach for women of different classes that stresses reform from within. 
Writing after more working-class schools had opened, Hardy tests the novel's capacity to un-teach assumptions about categories like "autodidact" itself and rewrites the celebratory self-made man's narrative by placing the reader in the position from which to weigh the positives and negatives of self-education. In the early twentieth century, Woolf imagines an education that "unfixes" students from their rigid class mindset in her "essay-novel" The Pargiters. Educational outliers' innovations ultimately prompt us to think about what outsiders' perspectives might be helpful today.

47

Aquino, Artur Ribeiro de. "Um método para interpretar outliers em trajetórias de objetos móveis." Repositório Institucional da UFSC, 2014. https://repositorio.ufsc.br/xmlui/handle/123456789/123274.

Abstract:

Master's dissertation, Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2014.
Devices for recording moving object traces are becoming very popular. These traces are called Trajectories of Moving Objects. The huge volume of these data raises the need for methods and algorithms that extract useful information from them. Many works on trajectory data mining find different types of patterns, but only a few focus on outlier detection between trajectories. Outliers between trajectories are those that behave differently from the majority. If most objects are travelling at 80 km/h on some stretch of road, for example, the objects travelling at 120 km/h are the outliers. Trajectory outliers are useful for discovering suspicious behavior in a group of people, finding alternative routes in traffic analysis, and even discovering the best and worst paths connecting two regions of interest. To the best of our knowledge, no work so far has made a deeper analysis to either understand or give meaning to the outliers. Outliers with semantic information can provide more information for decision making. In this work we present an algorithm that adds meaning to the trajectory outliers of vehicle drivers, considering three main possible reasons for a detour: stops outside the standard route, events, and traffic jams on the standard path. We show through experiments on real data that the method finds the different types of outliers and classifies them correctly.
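
The 80 km/h versus 120 km/h example in this abstract can be sketched as a simple robust rule: flag speeds that lie far from the median, measured in units of the median absolute deviation (MAD). This is only an illustration of "different from the majority" and is not the semantic algorithm proposed in the dissertation; the function name `speed_outliers` and the cutoff of 3.5 are assumptions.

```python
import numpy as np

def speed_outliers(speeds_kmh, cutoff=3.5):
    """Flag speeds whose robust z-score (median/MAD based) exceeds the cutoff."""
    speeds = np.asarray(speeds_kmh, dtype=float)
    median = np.median(speeds)
    mad = np.median(np.abs(speeds - median))
    if mad == 0:
        # Degenerate case: over half the speeds are identical.
        return speeds != median
    # 0.6745 rescales the MAD so the score is comparable to a normal z-score.
    robust_z = 0.6745 * (speeds - median) / mad
    return np.abs(robust_z) > cutoff

speeds = [78, 80, 82, 79, 81, 80, 120, 83, 77, 118]
print(np.flatnonzero(speed_outliers(speeds)))  # indices of the flagged speeds
```

The median and MAD are used instead of the mean and standard deviation so that the outliers themselves do not distort the notion of "the majority".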

48

Dall'Acqua, Fernando Maida. "Risco soberano Brasil: uma explicação do spread e dos outliers." Repositório Institucional do FGV, 2003. http://hdl.handle.net/10438/5611.

Abstract:

This work proposes a systematic examination of the so-called sovereign risk premium on securities issued by the Brazilian government, one that allows a categorization of the factors that can be understood as generating the concept of sovereign risk.

49

Barbosa, Josino José. "Identificação de outliers multivariados - Uma aplicação em dados de saúde." Universidade Federal de Viçosa, 2017. http://www.locus.ufv.br/handle/123456789/10041.

Abstract:

The identification of outliers plays an important role in statistical analysis, as such observations may contain important information about the data. If classical statistical models are blindly applied to data containing atypical values, the results may be misleading and mistaken decisions can be made. Moreover, in practical situations, the outliers themselves are often the points of special interest, and their identification may be the main objective of the investigation. The purpose of this work is therefore to propose a technique for detecting multivariate outliers based on cluster analysis, and to compare it with the method of identifying outliers via the Mahalanobis distance. The data were generated by Monte Carlo simulation from a mixture of multivariate normal distributions. The simulation results show that the proposed method was superior to the Mahalanobis method in both sensitivity and specificity; that is, it has a greater capacity to correctly classify outlying and non-outlying individuals. In addition, the proposed methodology was illustrated with an application to real data from the health area.
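
For readers unfamiliar with the baseline named in this abstract, the sketch below computes squared Mahalanobis distances from the sample mean and flags points beyond a chi-square cutoff. It shows only the comparison method, not the cluster-based technique the thesis proposes; the function name `mahalanobis_sq`, the simulated data, and the 0.975 quantile are assumptions.

```python
import numpy as np

def mahalanobis_sq(data):
    """Squared Mahalanobis distance of each row from the sample mean."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
    # Row-wise quadratic form: c_i' S^{-1} c_i
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)

rng = np.random.default_rng(0)
points = np.vstack([rng.standard_normal((200, 2)), [[8.0, 8.0]]])
d2 = mahalanobis_sq(points)
# For 2 dimensions the chi-square quantile has the closed form -2*ln(1 - q).
cutoff = -2.0 * np.log(1.0 - 0.975)
flagged = np.flatnonzero(d2 > cutoff)
```

The closed-form cutoff is valid only for two variables; in higher dimensions a chi-square quantile function would be needed. Because the mean and covariance are estimated from contaminated data, several clustered outliers can mask one another, which is one commonly cited weakness of this baseline.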

50

Kim, Younghui. "Seeking for outliers: Artistic exploration of data through creative practice." Thesis, Queensland University of Technology, 2020. https://eprints.qut.edu.au/206985/1/Younghui_Kim_Thesis.pdf.

Abstract:

Situating this art practice as data art in the field of digital art, I suggest a new landscape of data art by exploring data as an art material, medium, a concept-driver and as containing the current social-political issues of data bias. The creative outcomes of this research project result from my process of artistic exploration of data. This has been a journey that explored the context of data and artistic potential of outliers as a concept-driver.
