Dissertations / Theses on the topic 'Outliers'
Sean, Viseth. "Exploration Framework For Detecting Outliers In Data Streams." Digital WPI, 2016. https://digitalcommons.wpi.edu/etd-theses/395.
Beau, Thabiso. "Normality of JSE Returns: Macro-outliers, Micro-outliers: an Empirical Evaluation." Master's thesis, Faculty of Commerce, 2019. https://hdl.handle.net/11427/31721.
Mitchell, Napoleon. "Outliers and Regression Models." Thesis, University of North Texas, 1992. https://digital.library.unt.edu/ark:/67531/metadc279029/.
Yin, Yong. "Outliers in Time Series." 1995. http://rave.ohiolink.edu/etdc/view.cgi?acc%5Fnum=osu1262638388.
Halldestam, Markus. "ANOVA - The Effect of Outliers." Thesis, Uppsala universitet, Statistiska institutionen, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-295864.
Schall, Robert. "Outliers and influence under arbitrary variance." Doctoral thesis, University of Cape Town, 1986. http://hdl.handle.net/11427/21913.
Campos, Guilherme Oliveira. "Estudo, avaliação e comparação de técnicas de detecção não supervisionada de outliers." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-04082015-084412/.
The outlier detection area plays an essential role in discovering patterns in data that can be considered exceptional from some perspective. Detecting such patterns is important because, in many data mining applications, they represent extraordinary behaviors that deserve special attention. An important distinction is drawn between supervised and unsupervised detection techniques; this project focuses on the unsupervised ones. There are dozens of algorithms in this category in the literature, and new ones are proposed from time to time, but each uses its own notion of what should or should not be considered an outlier, which is a subjective concept in the unsupervised context. This considerably complicates the choice of a particular algorithm for a given practical application. While it is common knowledge that no machine learning algorithm can be superior to all others in all application scenarios, it is a relevant question whether the performance of certain algorithms tends in general to dominate that of certain others, at least for particular classes of problems. This project proposes to contribute to the study, selection, and pre-processing of databases suitable for inclusion in a benchmark collection for evaluating unsupervised outlier detection algorithms. It also proposes a comparative evaluation of the performance of outlier detection methods. During part of my master's thesis, I had the intellectual collaboration of Erich Schubert, Ira Assent, Barbora Micenková, Michael Houle, and especially Joerg Sander and Arthur Zimek. Their contribution was essential for the analysis of the results and for the compact way in which they are presented.
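To make the notion of an unsupervised outlier score concrete, here is a minimal sketch of one classic baseline: scoring each point by the distance to its k-th nearest neighbour. It is not the benchmark code from the thesis; the function names, the brute-force search, and the choice of k are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_outlier_scores(points: Sequence[Point], k: int) -> List[float]:
    """Score each point by the distance to its k-th nearest neighbour.

    Larger scores mean the point is more isolated, i.e. more outlying.
    Brute force, O(n^2): adequate for small illustrative data sets.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_n_outliers(points: Sequence[Point], k: int, n: int) -> List[int]:
    """Indices of the n highest-scoring points (a binary top-n labelling)."""
    scores = knn_outlier_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    print(top_n_outliers(data, k=2, n=1))  # [4], the isolated point (5, 5)
```

Each algorithm in this family encodes its own notion of "outlier" (here: sparse k-neighbourhoods), which is why benchmark comparisons like the one described above are needed to judge when one score dominates another.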
Berton, Lilian. "Caracterização de classes e detecção de outliers em redes complexa." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-19072011-132701/.
Complex networks have emerged as a new and important form of data representation and abstraction, capable of capturing the spatial, topological, functional, and other relationships present in many databases. Among the various approaches to data analysis, we highlight classification and outlier detection. Data classification assigns a class to each data item based on the characteristics of its attributes, while outlier detection searches for data whose characteristics differ from those of the rest. Methods for data classification and outlier detection based on complex networks are still little studied. Given the benefits of using complex networks for data representation, this study developed a complex-network method for detecting outliers based on random walks and on a dissimilarity index. The method allows different types of outliers to be identified with the same measure. Depending on the structure of the network, the outlier vertices can be either distant from the center or central, and can be either hubs or vertices with few connections. In general, the proposed measure is a good estimator of outlier vertices in a network, properly identifying vertices with a different structure or a special function in the network. We also propose a technique for building networks capable of representing similarity relationships between classes of data, based on an energy function that considers measures of purity and extension of the network. This network was used to characterize mixing among data classes. Class characterization is an important issue in data classification, but it is little explored. We consider this work to be one of the first attempts in this direction.
Iranzo Pérez, David. "Análisis de outliers: un caso a estudio." Doctoral thesis, Universitat de València, 2007. http://hdl.handle.net/10803/9467.
One of the limitations of using ARIMA modelling, and more specifically the Box-Jenkins approach, to study time series is the difficulty of correctly identifying the model and, where applicable, choosing the most suitable one. The standard filtering process used to estimate the business cycle can require the prior correction of some series, because otherwise the results could be seriously distorted. One outstanding example is outlier correction.
Outliers are unusual observations that, generally speaking, cannot be explained by the ARIMA model and violate its underlying normality assumptions. Because the ARIMA models frequently used for time series are designed to capture information in processes with some degree of homogeneity, their efficiency and goodness of fit can be affected by outliers and structural changes.
Following the seminal research by Fox, four different types of outliers are proposed, together with various procedures for detecting them. The four types of outliers considered in the literature are the Additive Outlier (AO), Level Shift (LS), Temporary Change (TC), and Innovational Outlier (IO).
To illustrate this research, an experiment is first carried out using nine thousand white noise series generated with a random data generation function, considering three different econometric models and, for each, three different sample sizes (60, 120, and 300 observations). Furthermore, three types of outliers (AO, LS, and TC) are introduced with three different levels of impact. A total of 100 series is studied for each of these specific cases.
Second, real series are used to analyse the influence of a shock caused by a terrorist attack on tourism activity in a given area. To do so, we carry out a detailed study of travellers' total overnight stays in hotels by country of origin.
Both programs, TRAMO/SEATS and X-12-ARIMA, are used to analyse the data in both the experiment with generated series and the study of real series, in order to compare the results and establish the differences between them.
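The four outlier types differ in how a single shock of size ω at time T propagates through later observations. The sketch below follows the common intervention-style formulation (AO as a one-off pulse, LS as a permanent step, TC as a step decaying at rate δ, IO as a shock passed through the model's ψ-weights). Restricting IO to an AR(1) model with coefficient `phi`, and the default δ = 0.7, are assumptions made for illustration, not choices taken from the thesis.

```python
from typing import List


def outlier_effect(kind: str, n: int, t0: int, omega: float,
                   delta: float = 0.7, phi: float = 0.0) -> List[float]:
    """Effect of one outlier of size omega at index t0 on a series of length n.

    kind: "AO" (additive outlier), "LS" (level shift),
          "TC" (temporary change, decay rate delta),
          "IO" (innovational outlier, here for an AR(1) model with coefficient phi).
    """
    effect = [0.0] * n
    for t in range(t0, n):
        lag = t - t0
        if kind == "AO":
            effect[t] = omega if lag == 0 else 0.0  # one isolated spike
        elif kind == "LS":
            effect[t] = omega                       # permanent change of level
        elif kind == "TC":
            effect[t] = omega * delta ** lag        # shift that dies out geometrically
        elif kind == "IO":
            effect[t] = omega * phi ** lag          # shock filtered by psi_j = phi**j
        else:
            raise ValueError(f"unknown outlier type: {kind}")
    return effect


if __name__ == "__main__":
    for kind in ("AO", "LS", "TC", "IO"):
        print(kind, outlier_effect(kind, 6, 3, 5.0, delta=0.5, phi=0.5))
```

For example, `outlier_effect("TC", 6, 3, 5.0, delta=0.5)` gives `[0.0, 0.0, 0.0, 5.0, 2.5, 1.25]`. For a pure AR(1) model, an IO with `phi` equal to `delta` produces exactly the same profile, so in that special case the two types cannot be told apart from the data.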
Dunagan, John D. (John David) 1976. "A geometric theory of outliers and perturbation." Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/8396.
Includes bibliographical references (p. 91-94).
We develop a new understanding of outliers and the behavior of linear programs under perturbation. Outliers are ubiquitous in scientific theory and practice. We analyze a simple algorithm for removal of outliers from a high-dimensional data set and show the algorithm to be asymptotically good. We extend this result to distributions that we can access only by sampling, and also to the optimization version of the problem. Our results cover both the discrete and continuous cases. This is joint work with Santosh Vempala. The complexity of solving linear programs has interested researchers for half a century now. We show that an arbitrary linear program subject to a small random relative perturbation has good condition number with high probability, and hence is easy to solve. This is joint work with Avrim Blum, Daniel Spielman, and Shang-Hua Teng. This result forms part of the smoothed analysis project initiated by Spielman and Teng to better explain mathematically the observed performance of algorithms.
Ph.D.
Astl, Stefan Ludwig. "Suboptimal LULU-estimators in measurements containing outliers." Thesis, Stellenbosch : Stellenbosch University, 2013. http://hdl.handle.net/10019.1/85833.
English abstract: Techniques for estimating a signal in the presence of noise that contains outliers are currently not well developed. In this thesis, we consider a constant signal superimposed with a family of noise distributions structured as a tunable mixture f(x) = α g(x) + (1 − α) h(x) between finite-support components: "well-behaved" noise g(x) with small variance, and "impulsive" noise h(x) with large amplitude and strongly asymmetric character. When α ≈ 1, h(x) can, for example, model a cosmic ray striking an experimental detector. In the first part of our work, a method is established for obtaining the expected values of the positive and negative pulses in the first resolution level of a LULU Discrete Pulse Transform (DPT). Subsequent analysis of sequences smoothed by the LULU operators L1U1 or U1L1 shows that they yield a robust estimator of the location parameter of g, in the sense that the contribution of h to the expected average of the smoothed sequences is suppressed to order (1 − α)² or higher. Consequently, the specific shape of h, which can be difficult to guess given the assumed scarcity of data, is shown to be of lesser importance. Furthermore, after smoothing a sequence with L1U1 or U1L1, estimators for the scale parameters of the model distribution become easily available. In the second part of our work, the same problem and data are approached from a Bayesian inference perspective. The Bayesian estimators are found to be optimal in the sense that they make full use of the information available in the data. Heuristic comparison shows, however, that Bayes estimators do not always outperform the LULU estimators. Although the Bayesian perspective provides much insight into the logical connections inherent in the problem, its estimators can be difficult to obtain in analytic form and slow to compute numerically. Suboptimal LULU estimators are shown to be reasonable compromises in practical problems.
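The operators L1U1 and U1L1 are compositions of the first-order LULU smoothers. A minimal sketch, assuming the usual definitions (L1 keeps the larger of the two local minima and so removes isolated upward pulses; U1 keeps the smaller of the two local maxima and removes isolated downward pulses) and, as a simplification not taken from the thesis, leaving the two end points of a finite sequence unchanged:

```python
import statistics
from typing import List, Sequence


def lower_1(x: Sequence[float]) -> List[float]:
    """L1: (L1 x)_i = max(min(x[i-1], x[i]), min(x[i], x[i+1])).

    Removes upward pulses of width 1; end points are kept as they are.
    """
    y = list(x)
    for i in range(1, len(x) - 1):
        y[i] = max(min(x[i - 1], x[i]), min(x[i], x[i + 1]))
    return y


def upper_1(x: Sequence[float]) -> List[float]:
    """U1: (U1 x)_i = min(max(x[i-1], x[i]), max(x[i], x[i+1])).

    Removes downward pulses of width 1; end points are kept as they are.
    """
    y = list(x)
    for i in range(1, len(x) - 1):
        y[i] = min(max(x[i - 1], x[i]), max(x[i], x[i + 1]))
    return y


def l1u1(x: Sequence[float]) -> List[float]:
    """Apply U1 first, then L1."""
    return lower_1(upper_1(x))


def u1l1(x: Sequence[float]) -> List[float]:
    """Apply L1 first, then U1."""
    return upper_1(lower_1(x))


if __name__ == "__main__":
    # Constant signal 1.0 hit by two isolated, strongly positive impulses.
    x = [1.0, 1.0, 9.0, 1.0, 1.0, 1.0, 9.0, 1.0, 1.0]
    print(statistics.mean(x))        # raw mean is pulled upwards by the impulses
    print(statistics.mean(l1u1(x)))  # 1.0: the impulses are smoothed away
```

Averaging the smoothed sequence then gives a location estimate that ignores such impulses. Pulses wider than one sample survive these first-order operators; higher-order operators L_n and U_n would be needed for them.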
Giroldo, Fabíola Rocha de Santana. "Alguns métodos robustos para detectar outliers multivariados." Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-20102009-211316/.
Unusual observations, or outliers, are frequent in any data set, whether large or not. Outliers may arise from typing mistakes or from observations that are genuinely different from the others. The presence of these observations may distort the results of models and estimates. Therefore, detecting them is very important, and it is recommended that detection be performed before any detailed analysis, so that a decision can be made about these atypical observations. One possibility is to correct them if the problem arose during the construction of the data set. If the observations are correct, different strategies can be adopted, such as weighting them or analysing them separately. In univariate and bivariate data sets, outliers can be detected by inspecting the scatter plot: observations far from the cloud formed by the data are considered unusual. In multivariate data sets, graphical detection of outliers is more difficult because only a pair of variables can be examined at a time, which results in a long and less reliable process: an observation may be unusual for one variable but not for the others, masking the results. In this work, some robust methods for detecting multivariate outliers are presented. Each is applied to an example. Moreover, the methods are compared using their results on the example and by simulation.
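The abstract does not name the robust methods compared. As one illustration of the general idea, here is a minimal two-dimensional sketch of the Minimum Covariance Determinant (MCD) "concentration step": the centre and covariance are estimated from the h most central points only, so that outliers cannot inflate the very estimates used to judge them. The deterministic start (the h points nearest the coordinate-wise median), the raw estimates without a consistency correction (which tend to understate scatter and so may flag a few extra points), and the 97.5% chi-square cutoff are simplifying assumptions; production implementations such as FAST-MCD use many random starts and a reweighting step.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _mean_cov(pts: Sequence[Point]) -> Tuple[Point, Tuple[float, float, float]]:
    """Sample mean and covariance (denominator h - 1) of 2-D points."""
    h = len(pts)
    mx = sum(p[0] for p in pts) / h
    my = sum(p[1] for p in pts) / h
    sxx = sum((p[0] - mx) ** 2 for p in pts) / (h - 1)
    syy = sum((p[1] - my) ** 2 for p in pts) / (h - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / (h - 1)
    return (mx, my), (sxx, sxy, syy)


def _mahalanobis_sq(p: Point, center: Point, cov: Tuple[float, float, float]) -> float:
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance of the subset is singular")
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def mcd_outliers(points: Sequence[Point], max_iter: int = 100) -> List[int]:
    """Flag points far from a centre/covariance fitted to the h most central points."""
    n = len(points)
    h = (n + 2 + 1) // 2  # subset size (n + p + 1) // 2 with p = 2
    med = (statistics.median(q[0] for q in points),
           statistics.median(q[1] for q in points))
    # Deterministic start: the h points closest to the coordinate-wise median.
    subset = sorted(range(n), key=lambda i: math.dist(points[i], med))[:h]
    for _ in range(max_iter):
        center, cov = _mean_cov([points[i] for i in subset])
        d2 = [_mahalanobis_sq(q, center, cov) for q in points]
        # C-step: keep the h points with the smallest distances; this never
        # increases the determinant of the subset covariance.
        new_subset = sorted(range(n), key=lambda i: d2[i])[:h]
        if set(new_subset) == set(subset):
            break
        subset = new_subset
    cutoff = -2.0 * math.log(0.025)  # 97.5% quantile of chi-square with 2 df
    return [i for i in range(n) if d2[i] > cutoff]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5),
           (0.2, 0.8), (0.8, 0.2), (0.3, 0.4), (0.6, 0.7), (10.0, 10.0)]
    print(mcd_outliers(pts))  # the far point (10, 10) is among those flagged
```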
Santos, Adriana Maria Rocha Trancoso. "Outliers em variáveis geoespaciais: proposições utilizando geoestatística." Universidade Federal de Viçosa, 2016. http://www.locus.ufv.br/handle/123456789/9784.
Faculdades Adventistas de Minas Gerais
Observations that differ statistically from the others in a data set are commonly called outliers. Such behavior gives rise to hypotheses such as that the data belong to another population. However, regardless of the hypotheses that may arise, it is important to routinely consider whether existing methodologies suit the many types of variables involved in scientific investigations. In the specialized literature, it is common to find the Box Plot used as the main detection mechanism, followed by exclusion from the data set of the "discrepant" data it detects. Since the Box Plot does not take the geographic position of the data into account, we hypothesize that it is not suitable for continuous geospatial data. This work therefore presents a study of the importance of proposing outlier detection methods that incorporate the location of the data, and compares their performance with that of the Box Plot. In the first chapter, a new outlier detection method for continuous geospatial data is proposed, and a real data set with known outliers is analyzed both with the Box Plot and with the proposed method. In the second chapter, a new outlier detection method is proposed for continuous geospatial data whose variables are non-negative; a real data set is analyzed using the Box Plot and the new method. Finally, in the third chapter, a methodological mechanism is proposed for deciding whether to exclude data with a high probability of discrepancy. Four data sets are used in this chapter: one real and three computer-simulated. To strengthen the theoretical basis of the whole work, a combination of theorems from classical statistics and the application of geostatistics, as the main supporting methodology, was adopted as guiding principles.
Geostatistics was adopted because it incorporates the geographic location of the data into the analytical process and is based on statistically optimal properties: it is a methodology designed to be unbiased and to have minimum variance when predicting unobserved values. In addition, its modeling and prediction take into account the spatial dependence structure of the samples, which is inherent to geospatial data.
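The contrast drawn above can be illustrated with a toy example. The sketch below applies the usual box-plot rule (flag values beyond 1.5 interquartile ranges from the quartiles; the "inclusive" quartile method is an assumption, since conventions vary) and, purely as a simple stand-in for a location-aware method, compares each value with the mean of its k nearest spatial neighbours. This neighbour comparison is not the geostatistical method proposed in the thesis; the function names and data are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def boxplot_outliers(values: Sequence[float], whisker: float = 1.5) -> List[int]:
    """Indices outside [Q1 - whisker*IQR, Q3 + whisker*IQR]; location is ignored."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def local_residuals(coords: Sequence[Tuple[float, float]],
                    values: Sequence[float], k: int = 2) -> List[float]:
    """Each value minus the mean value of its k nearest spatial neighbours."""
    residuals = []
    for i, c in enumerate(coords):
        others = [j for j in range(len(coords)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(c, coords[j]))[:k]
        residuals.append(values[i] - statistics.mean(values[j] for j in nearest))
    return residuals


if __name__ == "__main__":
    # Ten sites along a transect with a smooth trend; site 2 breaks the trend.
    coords = [(float(i), 0.0) for i in range(10)]
    values = [0.0, 1.0, 8.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    print(boxplot_outliers(values))  # []: 8.0 is unremarkable globally
    res = local_residuals(coords, values)
    print(max(range(len(res)), key=lambda i: abs(res[i])))  # 2: odd locally
```

The box-plot rule sees only the ten values, so moving the sites around would not change its answer; the neighbour comparison depends on where each value was observed.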
Soon, Shih Chung. "On detection of extreme data points in cluster analysis." 1987. http://rave.ohiolink.edu/etdc/view.cgi?acc%5Fnum=osu1262886219.
Araújo, Bilzã Marques de. "Identificação de outliers em redes complexas baseado em caminhada aleatória." Universidade de São Paulo, 2010. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-06102010-141931/.
In nature and science, information and data that deviate significantly from the average often have great relevance. Such data are often called outliers in the literature. Outlier identification is important in many real applications, such as fraud detection, fault diagnosis, and monitoring of medical conditions. In recent years there has been great interest in the area of complex networks. Complex networks are large-scale graphs with non-trivial connection patterns and have proved to be a powerful means of data representation and abstraction. Although a large number of results have been reported in this research area, little has been explored regarding outlier detection in complex networks. Based on the dynamics of a random walk, this work proposes a distance measure and an outlier ranking method. Using this technique, we can detect as outliers not only peripheral nodes but also central nodes (hubs), depending on the network structure. We also identified a well-defined relationship between the outlier nodes and the function those nodes perform in the network. Furthermore, we found that outliers play an important role as a priori labeled nodes in semi-supervised community detection. This is because hubs are good disseminators of information, and peripheral nodes are usually located in the border regions between communities. Based on this observation, we proposed a method for semi-supervised community detection. The simulation results show that this approach is promising.
Zamoner, Fabio Willian. "Técnica de aprendizado semissupervisionado para detecção de outliers." Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-07042014-100038/.
Outlier detection plays an important role in discovering knowledge in large data sets. The study is motivated by a plethora of real applications, such as credit card fraud, fault detection in industrial components, network intrusion detection, loan application processing, and medical condition monitoring. An outlier is defined as an observation that deviates from other observations with respect to some measure and exerts a substantial influence on data analysis. Although numerous machine learning techniques have been developed to attack this problem, most of them work with no prior knowledge of the data. Semi-supervised outlier detection techniques are relatively new and use only a few labels of the normal class to build a classifier. Recently, a network-based semi-supervised model was proposed for data classification, employing a mechanism based on particle competition and cooperation, in which particles are responsible for propagating labels throughout the network. In this work, we adapt this model by defining a new outlier score based on visit frequency counting: the number of visits received by an outlier differs significantly from that received by the remaining objects. This approach leads to an unorthodox way of dealing with outliers. Our empirical evaluations on both real and simulated data sets demonstrate that the proposed technique works well with unbalanced data sets and achieves precision comparable to that of traditional outlier detection techniques. Moreover, the technique might provide new insights into how to differentiate objects, because it considers not only the physical distance but also the pattern formation of the data.
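The idea that outliers receive unusually few visits can be illustrated without the full particle competition and cooperation model. The sketch below is a deliberately simplified stand-in, not the technique from the thesis: a single random walker moves along the edges of a directed k-nearest-neighbour graph, and each point's visit count is recorded. A point that no other point chooses as a neighbour is reached only when a walk starts there. The graph construction, walk length, and seed are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def knn_graph(points: Sequence[Point], k: int) -> List[List[int]]:
    """Out-neighbours of each point: its k nearest other points (ties by index)."""
    graph = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: (math.dist(p, points[j]), j))
        graph.append(others[:k])
    return graph


def visit_counts(points: Sequence[Point], k: int = 2,
                 steps: int = 50, seed: int = 0) -> List[int]:
    """Visits per point when one walk of `steps` moves starts at every point."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    graph = knn_graph(points, k)
    counts = [0] * len(points)
    for start in range(len(points)):
        node = start
        counts[node] += 1  # the starting point counts as one visit
        for _ in range(steps):
            node = rng.choice(graph[node])
            counts[node] += 1
    return counts


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
    print(visit_counts(pts))  # the last point is visited only once, at its start
```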
Zaharim, Azami. "Outliers and change points in time series data." Thesis, University of Newcastle Upon Tyne, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.295109.
Budzier, Alexander. "Theorizing outliers: explaining variation in IT project performance." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:9fd44230-32a0-41f0-861e-4ef999aea22f.
Monat, Andre Soares. "Exceptional values in relational databases." Thesis, University of East Anglia, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.359326.
Marques, Henrique Oliveira. "Avaliação e seleção de modelos em detecção não supervisionada de outliers." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-26062015-101457/.
Outlier detection (or anomaly detection) plays an important role in discovering patterns in data that can be considered exceptional in some sense. An important distinction is that between supervised and unsupervised techniques. In this work we focus on unsupervised outlier detection techniques. There are dozens of algorithms in this category in the literature; however, each uses its own intuition to judge what should or should not be considered an outlier, which is naturally a subjective concept. This substantially complicates both the selection of a particular algorithm and the choice of an appropriate parameter configuration for a given algorithm in a practical application. It also makes it very difficult to evaluate the quality of the solution obtained by the algorithm or configuration adopted by the analyst, especially given the problem of defining a quality measure that is not tied to the criterion used by the algorithm itself. These issues are interrelated and correspond, respectively, to the problems of model selection and of evaluation (or validation) of results in unsupervised learning. Here we developed a pioneering index for the unsupervised evaluation of outlier detection results. The index, called IREOS (Internal, Relative Evaluation of Outlier Solutions), can evaluate and compare different candidate solutions (top-n, i.e., binary labelings) based only on the data and the solution to be evaluated. The index is also statistically adjusted for chance and was extensively evaluated in several experiments involving different collections of synthetic and real data sets.
Rodriguez, Gabriel. "Unit root, outliers and cointegration analysis with macroeconomic applications." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape3/PQDD_0028/NQ48794.pdf.
Fung, Wing-kam Tony (馮榮錦). "Analysis of outliers using graphical and quasi-Bayesian methods." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1987. http://hub.hku.hk/bib/B31230842.
Kawabata, Thatiane. "Detecção de outliers espaciais: refinamento de similaridade e desempenho." São José do Rio Preto, 2015. http://hdl.handle.net/11449/127787.
Examining committee: Rogéria Cristiane Gratão de Souza; Enzo Seraphim
Abstract: The progress and development of technologies used to collect georeferenced information have increased the amount of spatial data stored in databases. This has also brought many problems common in large databases, such as data redundancy, incomplete data, unknown values, and outliers. To obtain relevant information from spatial data, applying spatial data mining algorithms, especially spatial clustering algorithms, has become common practice worldwide. However, many current algorithms ignore the presence of local outliers in spatial data, or consider only their location relative to the rest of the data, which can produce inconsistent results and hinder knowledge extraction. To contribute in this direction, this work surveys spatial data mining and the detection of conventional and spatial outliers, and presents the main state-of-the-art works. Finally, we propose a configurable and portable approach applied to the results of spatial clustering algorithms, which includes an improvement to a spatial outlier detection algorithm aimed at extracting information from the data set.
Master's degree
Shaw, James H. M. "Identification of outliers with an application in seed testing." Thesis, University of Edinburgh, 1996. http://hdl.handle.net/1842/12921.
Silva, Flávio Roberto. "Uma abordagem para detecção de outliers em dados categoricos." [s.n.], 2004. http://repositorio.unicamp.br/jspui/handle/REPOSIP/276461.
Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação.
Abstract: An outlier is an element that does not conform to the pattern of the data set to which it belongs. Outlier detection can yield unexpected and useful information in some applications, e.g., discovering fraud in telephone and credit card systems, and intrusion detection systems. This master's thesis presents a new approach to outlier detection in databases with categorical attributes. The proposed approach uses log-linear models as a pattern for the data set, which makes the results easier for the user to interpret. It also presents FOCaD (Finding Outliers in Categorical Data), a prototype categorical data analysis system that fits and selects models, performs statistical tests, and detects outliers.
Master's degree in Computer Science
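As a minimal illustration of using a log-linear model as the "pattern" for categorical data, the sketch below fits the simplest such model, independence in a two-way table (log m_ij = u + u_i + u_j, whose fitted counts are row total × column total / grand total), and flags cells with large Pearson residuals. FOCaD fits and selects among models; this sketch instead fixes the model, and the residual type and the threshold of 2 are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def independence_residuals(table: Sequence[Sequence[int]]) -> List[List[float]]:
    """Pearson residuals (observed - expected) / sqrt(expected) under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # fitted count
            out.append((observed - expected) / math.sqrt(expected))
        residuals.append(out)
    return residuals


def outlying_cells(table: Sequence[Sequence[int]],
                   threshold: float = 2.0) -> List[Tuple[int, int]]:
    """Cells whose counts the independence model explains poorly."""
    res = independence_residuals(table)
    return [(i, j) for i, row in enumerate(res)
            for j, r in enumerate(row) if abs(r) > threshold]


if __name__ == "__main__":
    # Hypothetical counts: one combination of categories is unusually rare.
    table = [[30, 30, 30], [30, 30, 30], [30, 30, 5]]
    print(outlying_cells(table))  # [(2, 2)]
```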
Rodrigues, Rafael Delalibera. "Detecção de outliers baseada em caminhada determinística do turista." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/59/59143/tde-14062018-223903/.
Outlier detection is a fundamental task for knowledge discovery in data mining. It aims to detect data items that deviate from the general pattern of a given data set. In this work, we present a new outlier detection technique using tourist walks. Specifically, starting from each data sample and varying the memory size, a data sample gets a higher outlier score if it participates in few tourist walk attractors and a lower score if it participates in many attractors. Experimental results on artificial and real data sets show good performance of the proposed method. Compared with classical methods, the proposed method has the following salient features: 1) It finds outliers by identifying the structure of the input data set instead of considering only physical features, such as distance, similarity, or density. 2) It can detect not only external outliers, as classical methods do, but also internal outliers located among various groups of normal data. 3) By varying the memory size, the tourist walks can characterize both local and global structures of the data set. 4) The proposed method is deterministic, so a single run is sufficient, in contrast to stochastic techniques, which require many runs. Moreover, in this work we find, for the first time, that tourist walks can generate complex attractors in various crossing shapes. Such complex attractors reveal data structure in more detail and can consequently improve outlier detection.
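The walk described above can be sketched in Python as follows. This is a minimal, hypothetical reading of the abstract with four assumptions:

- distances are Euclidean;
- the memory window includes the current point;
- the score is 1 / (1 + number of distinct attractors containing the point);
- that count is summed over memory sizes 1 and 2.

This scoring rule is illustrative and is not the thesis's exact rule.

```python
from collections import deque

import numpy as np


def tourist_walk_attractor(dist, start, mu):
    """Deterministic tourist walk from `start` with memory size `mu`.

    At each step the walker moves to the nearest point that is not among the
    last `mu` visited points (current point included). The walk's state is
    that memory window; when a state repeats, the points visited since its
    first occurrence form the attractor, returned as a frozenset of indices.
    """
    n = dist.shape[0]
    window = deque([start], maxlen=mu)
    path = [start]
    first_seen = {tuple(window): 0}
    while True:
        current = window[-1]
        candidates = [j for j in range(n) if j not in window]
        nxt = min(candidates, key=lambda j: dist[current, j])
        window.append(nxt)
        path.append(nxt)
        state = tuple(window)
        if state in first_seen:
            return frozenset(path[first_seen[state]:])
        first_seen[state] = len(path) - 1


def tourist_walk_scores(points, memories=(1, 2)):
    """Hypothetical outlier score: 1 / (1 + number of distinct attractors a
    point belongs to, summed over the memory sizes in `memories`)."""
    pts = np.asarray(points, dtype=float).reshape(len(points), -1)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    counts = np.zeros(n)
    for mu in memories:
        if not 1 <= mu < n:
            raise ValueError("memory size must satisfy 1 <= mu < number of points")
        attractors = {tourist_walk_attractor(dist, s, mu) for s in range(n)}
        for attractor in attractors:
            for i in attractor:
                counts[i] += 1
    return 1.0 / (1.0 + counts)


print(tourist_walk_scores([0.0, 1.0, 10.0, 11.0, 40.0]))
```

On this toy data, the two tight pairs form the attractors for both memory sizes. The isolated point 40.0 never enters an attractor, so it receives the highest score.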
Braumann, Maria Manuela São Pedro Abreu. "Sobre testes de detecção de 'outliers' em populações exponenciais." Doctoral thesis, Universidade de Évora, 1994. http://hdl.handle.net/10174/11835.
Miranda, Carla da Fonseca. "Modelação linear de séries temporais na presença de outliers." Master's thesis, Universidade do Porto. Reitoria, 2001. http://hdl.handle.net/10216/10001.
In time series analysis, outliers and structural changes are frequently encountered. They may be associated with unexpected or uncontrollable events, such as strikes, wars, or political changes, or may simply be due to errors in measuring or recording observations. These observations can compromise the usual procedures for linear modelling of a time series. In particular, they can lead to incorrect identification of an ARIMA model and to biased estimation of the model parameters. The main objective of this work is to present some procedures for linear modelling of a time series in the presence of outliers and structural changes. The approach usually adopted in such procedures consists of identifying the location and type of the outliers or structural changes and using the intervention models of Box and Tiao (1975) to accommodate their effects. This approach requires iterating between two stages. The detection stage uses likelihood ratio statistics to locate the outliers and structural changes and to identify their type. The estimation stage fits a model of the process generating these disturbances in order to accommodate their effects. The outliers usually considered are additive outliers (AO) and innovational outliers (IO); the structural changes are permanent level changes (LC) and transient changes (TC). An alternative to likelihood ratio statistics for detecting outliers and level changes is to use statistics based on deleting one observation or a group of observations and measuring the resulting changes in the model parameter estimates. This approach detects influential observations, which may be outliers. Accordingly, this work also presents diagnostics for identifying influential observations and outliers.
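The detection stage for the two outlier types named above can be illustrated on the simplest ARIMA case. The sketch below assumes a zero-mean AR(1) model with known parameters `phi` and `sigma`. In practice these are estimated, and the procedure iterates between detection and re-estimation. The function name `ar1_outlier_statistics` is hypothetical. For every time point, it computes the standardized estimate of an additive (AO) and an innovational (IO) outlier effect. The largest absolute value would then be compared with a chosen critical value.

```python
import numpy as np


def ar1_outlier_statistics(y, phi, sigma):
    """Standardized AO and IO statistics at each time T for a zero-mean AR(1)
    model y_t = phi * y_{t-1} + a_t with known phi and innovation s.d. sigma.

    Residuals are e_t = y_t - phi * y_{t-1}. An innovational outlier (IO) of
    size w at T shifts only e_T, so w_IO = e_T. An additive outlier (AO) of
    size w at T shifts e_T by w and e_{T+1} by -phi * w, so least squares gives
    w_AO = (e_T - phi * e_{T+1}) / (1 + phi**2), with variance
    sigma**2 / (1 + phi**2). Entries where a statistic is undefined are NaN.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    e = np.full(n, np.nan)
    e[1:] = y[1:] - phi * y[:-1]
    lam_io = e / sigma
    lam_ao = np.full(n, np.nan)
    w_ao = (e[1:-1] - phi * e[2:]) / (1.0 + phi**2)  # needs e_T and e_{T+1}
    lam_ao[1:-1] = w_ao * np.sqrt(1.0 + phi**2) / sigma
    return lam_ao, lam_io


# Simulated AR(1) series with one additive outlier of size 10 at t = 100.
rng = np.random.default_rng(0)
phi, n = 0.5, 200
a = rng.normal(size=n)
x = np.zeros(n)
x[0] = a[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + a[t]
y = x.copy()
y[100] += 10.0

lam_ao, lam_io = ar1_outlier_statistics(y, phi, sigma=1.0)
T = int(np.nanargmax(np.abs(lam_ao)))
print(T, round(lam_ao[T], 2), round(lam_io[T], 2))
```

An AO also distorts the residual that follows it. For that reason the AO statistic pools e_T and e_{T+1}, while the IO statistic uses e_T alone.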
Kawabata, Thatiane [UNESP]. "Detecção de outliers espaciais: refinamento de similaridade e desempenho." Universidade Estadual Paulista (UNESP), 2015. http://hdl.handle.net/11449/127787.
The advance of technologies for collecting georeferenced information has increased the amount of spatial data stored in databases. This has also brought problems common in large databases, such as data redundancy, incomplete data, unknown values, and outliers. To extract relevant information from spatial data, applying spatial data mining algorithms, especially spatial clustering algorithms, has become common practice worldwide. However, many current algorithms ignore the presence of local outliers in spatial data, or consider only their location relative to the rest of the data. This can produce inconsistent results and hinder knowledge extraction. To contribute in this direction, this work surveys spatial data mining and the detection of conventional and spatial outliers, and presents the main state-of-the-art work. Finally, it proposes a configurable and portable approach applied to the results of spatial clustering algorithms. The approach includes an improvement to a spatial outlier detection algorithm aimed at mining information from the data set.
Fung, Wing-kam Tony. "Analysis of outliers using graphical and quasi-Bayesian methods." University of Hong Kong, 1987. http://sunzi.lib.hku.hk/hkuto/record.jsp?B1236146X.
Page, Garritt L. "Bayesian mixture modeling and outliers in inter-laboratory studies." Iowa State University, 2009. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3389133.
Liu, Jie. "Exploring Ways of Identifying Outliers in Spatial Point Patterns." Digital Commons @ East Tennessee State University, 2015. https://dc.etsu.edu/etd/2528.
Hacini, Akram. "Une approche de détection d'outliers en présence de l'incertitude." Thesis, Paris 8, 2018. http://www.theses.fr/2018PA080068.
One aspect of the complexity of the new data produced by various processing systems is their inaccuracy, uncertainty, and incompleteness. These aspects are aggravated by the multiplicity and dissemination of data-generating sources, which can easily be observed in various control and monitoring systems. Data mining tools have become fairly efficient on data with reliable prior knowledge. However, they cannot be applied to data where that knowledge may itself be tainted with uncertainty and inaccuracy. New approaches that take this aspect into account should therefore improve the performance of data mining systems, including outlier detection, which is the subject of this thesis.
This thesis thus deals with a particular aspect of uncertainty and inaccuracy: it proposes a new method to detect outliers in uncertain and/or inaccurate data. The inaccuracy of the expertise associated with the training data is one aspect of this complexity. To overcome this problem, we combined techniques from machine learning, especially clustering, with techniques from fuzzy logic, especially fuzzy sets. New observations can then be projected onto the clusters of the training data. After thresholding, the observations to be considered aberrant (outliers) in the dataset can be identified.
Specifically, using ambiguous decision tables (ADTs), we proceeded from the ambiguity indices of the training data to compute the ambiguity indices of the new observations (test data) by fuzzy inference. After clustering the set of ambiguity indices, an α-cut operation allowed us to define a decision boundary within the clusters. This boundary was in turn used to classify observations as normal (inliers) or aberrant (outliers).
The strength of the proposed method lies in its ability to handle inaccurate and/or uncertain training data using only the ambiguity indices, thereby overcoming various problems of incompleteness in the datasets. The false positive and recall metrics allowed us both to evaluate the performance of our method and to tune its parameters according to the user's choices.
ALMutawa, Jaafar Hasan Mohamed Yusuf. "Subspace identification of linear systems in the presence of outliers." 京都大学 (Kyoto University), 2006. http://hdl.handle.net/2433/143896.
Sothinathan, Nalaiyini. "Bayesian Analysis for outliers in binomial, Normal and circular data." Thesis, Queen Mary, University of London, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.498204.
Derksen, Timothy J. (Timothy John). "Processing of outliers and missing data in multivariate manufacturing data." Thesis, Massachusetts Institute of Technology, 1996. http://hdl.handle.net/1721.1/38800.
M.Eng. thesis by Timothy J. Derksen. Includes bibliographical references (leaf 64).
Masood, Adnan. "Measuring Interestingness in Outliers with Explanation Facility using Belief Networks." NSUWorks, 2014. http://nsuworks.nova.edu/gscis_etd/232.
Karlsson, Peter S. "Issues of incompleteness, outliers and asymptotics in high dimensional data." Doctoral thesis, Internationella Handelshögskolan, Högskolan i Jönköping, IHH, Economics, Finance and Statistics, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-14934.
Dishman, Tamarah Crouse. "Identifying Outliers in a Random Effects Model For Longitudinal Data." UNF Digital Commons, 1989. http://digitalcommons.unf.edu/etd/191.
Full textRamos, Jonathan da Silva. "Algoritmos de casamento de imagens com filtragem adaptativa de outliers." Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-02022017-110428/.
Image matching plays a major role in many applications, such as pattern recognition and microscopic imaging. It encompasses three steps: 1) interest point selection; 2) feature extraction from each point; 3) feature point matching. For steps 1 and 2, traditional interest point detectors/extractors have worked well. For step 3, however, even a few incorrectly matched points (outliers) may lead to an undesirable result. State-of-the-art consensus algorithms have a high time cost as the number of outliers increases. To overcome this problem, we present FOMP, a preprocessing approach that reduces the number of outliers in the initial set of matched points. FOMP filters out the vertices whose edges show the largest differences in a complete-graph representation of the points. To validate the proposed method, experiments were performed on four image databases: (a) variations in rotation or camera zoom; (b) repetitive patterns, which lead to duplicate feature vectors; (c) deformable objects, such as plastics, clothes, or papers; (d) affine transformations (different viewpoints). The experimental results showed that FOMP removes more than 65% of the outliers while keeping over 98% of the inliers. Moreover, the precision of traditional methods is maintained, while the processing time of graph-based approaches is halved.
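The abstract does not give FOMP's exact vertex criterion, so the following Python sketch is a hypothetical illustration of filtering matches through a complete graph. It makes three assumptions:

- the two images are related by a similarity transform (rotation, uniform scale, and translation);
- each match is scored by the mean relative disagreement of its edge lengths in the two images;
- the worst-scoring match is removed repeatedly until every score is within a tolerance.

The names `filter_matches` and `tol` are illustrative.

```python
import numpy as np


def pairwise_distances(points):
    return np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)


def filter_matches(src, dst, tol=0.1):
    """Hypothetical complete-graph outlier filter for point matches.

    Match i pairs src[i] with dst[i]. For every pair of matches, the edge length
    in the destination image should equal a global scale times the edge length
    in the source image. Each vertex is scored by its mean relative edge
    discrepancy; the worst vertex is removed and the scores recomputed until
    every score is <= tol. Assumes distinct source points.
    Returns the indices of the retained matches.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    keep = list(range(len(src)))
    while len(keep) > 2:
        d_src = pairwise_distances(src[keep])
        d_dst = pairwise_distances(dst[keep])
        off_diag = ~np.eye(len(keep), dtype=bool)
        scale = np.median(d_dst[off_diag] / d_src[off_diag])  # robust global scale
        base = scale * d_src[off_diag]
        rel = np.zeros_like(d_src)
        rel[off_diag] = np.abs(d_dst[off_diag] - base) / base
        scores = rel.sum(axis=1) / (len(keep) - 1)
        worst = int(np.argmax(scores))
        if scores[worst] <= tol:
            break
        keep.pop(worst)
    return keep


# 20 matches related by a 30-degree rotation, scale 2 and a translation;
# two of them are corrupted.
src = np.array([[x, y] for x in range(5) for y in range(4)], dtype=float)
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
dst = 2.0 * src @ R.T + np.array([5.0, -3.0])
dst[3] = [40.0, 40.0]
dst[7] = [-30.0, 10.0]
print(filter_matches(src, dst))
```

Recomputing the scores after each removal stops the outliers' long, inconsistent edges from inflating the scores of the inliers. Under the deformable or affine cases listed above, the single global scale assumed here would not hold.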
Bulhões, Rodrigo de Souza. "Contribuições à análise de outliers em modelos de equações estruturais." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-19062013-135858/.
The Structural Equation Model (SEM) is usually specified to perform a confirmatory analysis of a researcher's hypotheses about the relationships between the observed variables and the latent variables of a study. In practice, SEM estimates are most commonly evaluated in one of two ways. One is to measure how far the usual (classical) covariance matrix is from the covariance matrix of the fitted model. The other is to examine the size of the gap between the discrepancy functions of the hypothesized model and the saturated model. However, these procedures may fail to detect fit problems when there are many parameters to estimate or many observations. This study considered indicators known in the literature for detecting fit irregularities caused by outliers in the data set. It also considered modifications of the Goodness-of-Fit Index (GFI) and the Adjusted Goodness-of-Fit Index (AGFI) in the expressions for Maximum Likelihood parameter estimation. These modifications replace the traditional covariance matrix with robust covariance matrices computed by the following estimators: Minimum Volume Ellipsoid, Minimum Covariance Determinant, S, MM, and Orthogonalized Gnanadesikan-Kettenring (OGK). The simulation studies covered disturbances of both symmetry deviations and excess kurtosis, low and high contamination fractions, different sample sizes, and different numbers of affected observed variables. They showed that only the GFI and AGFI modified with the OGK estimator were informative in all these situations. GFI should be used when the number of parameters to be estimated is low, and AGFI when it is high.
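For reference, the maximum-likelihood versions of the two indices can be written and checked in a few lines of Python. In this sketch, `S` is a covariance matrix of the observed variables. `Sigma` is the model-implied covariance from an already fitted SEM, which is taken as given. In the robust variant described above, `S` would be replaced by a robust estimate, for example an MCD estimate such as scikit-learn's `MinCovDet`; this sketch does not perform that swap. The function names are illustrative.

```python
import numpy as np


def gfi_ml(S, Sigma):
    """Goodness-of-Fit Index for maximum-likelihood estimation:
    GFI = 1 - tr[(Sigma^-1 S - I)^2] / tr[(Sigma^-1 S)^2],
    where S is the sample covariance matrix and Sigma the model-implied one."""
    S = np.asarray(S, dtype=float)
    A = np.linalg.solve(np.asarray(Sigma, dtype=float), S)  # Sigma^-1 S
    D = A - np.eye(S.shape[0])
    return 1.0 - np.trace(D @ D) / np.trace(A @ A)


def agfi(gfi, p, df):
    """Adjusted GFI for p observed variables and df model degrees of freedom:
    AGFI = 1 - [p (p + 1) / (2 df)] (1 - GFI)."""
    return 1.0 - p * (p + 1) / (2.0 * df) * (1.0 - gfi)


print(gfi_ml(2.0 * np.eye(2), np.eye(2)))  # 0.75
print(agfi(0.75, p=2, df=1))               # 0.25
```

A perfect fit (`S` equal to `Sigma`) gives GFI = 1. The AGFI penalty grows as the degrees of freedom shrink relative to the number of observed moments, which is why the two indices behave differently as the number of parameters changes.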
Katshunga, Dominique. "Identifying outliers and influential observations in general linear regression models." Thesis, University of Cape Town, 2004. http://hdl.handle.net/11427/6772.
Identifying outliers and/or influential observations is a fundamental step in any statistical analysis, since their presence is likely to lead to erroneous results. Numerous measures have been proposed for detecting outliers and assessing the influence of observations on least squares regression results. Since outliers can arise in different ways, these measures rest on different motivating arguments and are designed to measure the influence of observations on different aspects of the regression results. In what follows, we investigate how test statistics based on residuals can be combined with diagnostic plots to identify outliers and influential observations (in both the single and multiple case) in general linear regression models.
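Three standard single-case diagnostics of the kind discussed above can be sketched with NumPy:

- leverages from the hat matrix;
- internally studentized residuals;
- Cook's distance.

This is a minimal sketch for an OLS fit with an intercept, and the function name `regression_diagnostics` is illustrative. The thesis's particular combination of test statistics and plots, and its multiple-case diagnostics, are not reproduced.

```python
import numpy as np


def regression_diagnostics(X, y):
    """Leverages, internally studentized residuals and Cook's distances for an
    OLS fit with intercept.

    h_i = diag(H) with H = X (X'X)^-1 X', r_i = e_i / sqrt(s^2 (1 - h_i)),
    D_i = r_i^2 h_i / (p (1 - h_i)), where p counts the intercept.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n, p = X.shape
    H = X @ np.linalg.pinv(X)  # hat matrix (assumes full column rank)
    h = np.diag(H)
    resid = y - H @ y
    s2 = resid @ resid / (n - p)
    r = resid / np.sqrt(s2 * (1.0 - h))
    cooks = r**2 * h / (p * (1.0 - h))
    return h, r, cooks


x = np.arange(10.0)
y = 2.0 * x + 1.0
y[9] += 15.0  # outlier at the edge of the x range, where leverage is high
h, r, cooks = regression_diagnostics(x, y)
print(int(np.argmax(cooks)))  # 9
```

Leverage alone does not flag the point: x = 0 has the same leverage. Combining the large residual with the high leverage in Cook's distance singles it out.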
Aghlmandi, Soheila. "Outliers detection in INAR(1) model with negative binomial innovations." Master's thesis, Universidade de Aveiro, 2012. http://hdl.handle.net/10773/9875.
Discrete-valued, or integer-valued, time series are widely used in practice, but they remain a relatively new research subject. The variables of such processes take values in finite or countably infinite sets. This work studies first-order integer-valued autoregressive, INAR(1), processes. The main goal is to detect additive outliers in INAR(1) processes, taking the distribution of the innovations to be negative binomial; the binomial thinning operator is used in the process. The work takes a Bayesian approach to modelling a negative binomial integer-valued autoregressive time series contaminated with additive outliers. It focuses on the computational side of detecting the outliers of the INAR(1) process, using the R software. We show how Gibbs sampling can be used to detect outlying observations in INAR(1) processes by estimating the probability that each observation is affected by an outlier. The proposed methodology is illustrated with several simulated examples and real data sets.
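The contaminated model studied above can be simulated directly, which is useful for testing any detection procedure. The sketch below assumes NumPy's negative binomial parameterization: the number of failures before `nb_size` successes, with mean `nb_size * (1 - nb_prob) / nb_prob`. The parameter names are hypothetical. The Gibbs sampler that the thesis uses to estimate outlier probabilities is not reproduced.

```python
import numpy as np


def simulate_inar1_nb(n, alpha, nb_size, nb_prob, outlier_at=None, outlier_size=0, seed=0):
    """Simulate an INAR(1) process X_t = alpha o X_{t-1} + e_t, where "o" is
    binomial thinning and e_t ~ NegBin(nb_size, nb_prob) in NumPy's
    parameterization (number of failures before nb_size successes).
    Returns (x, y), where y equals x except for one optional additive outlier."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n, dtype=np.int64)
    x[0] = rng.negative_binomial(nb_size, nb_prob)
    for t in range(1, n):
        # alpha o X_{t-1}: each of the X_{t-1} units survives with probability alpha
        survivors = rng.binomial(x[t - 1], alpha)
        x[t] = survivors + rng.negative_binomial(nb_size, nb_prob)
    y = x.copy()
    if outlier_at is not None:
        y[outlier_at] += outlier_size  # AO: contaminates the observation only
    return x, y


x, y = simulate_inar1_nb(500, alpha=0.5, nb_size=2, nb_prob=0.5,
                         outlier_at=100, outlier_size=30)
# Stationary mean: (nb_size * (1 - nb_prob) / nb_prob) / (1 - alpha) = 2 / 0.5 = 4.
print(x.mean(), y[100] - x[100])
```

The latent series `x` is generated without the outlier, so the contamination does not propagate to later observations. This is what distinguishes an additive outlier from an innovational one.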
Lowthian, Philip James. "Some studies on the perception of outliers in graphical displays." Thesis, Keele University, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.282633.
Cordner, Sheila Connors. "Educational outliers: exclusion as innovation in nineteenth-century British literature." Thesis, Boston University, 2013. https://hdl.handle.net/2144/12740.
This dissertation traces a genealogy of literary resistance to dominant pedagogies in nineteenth-century Britain. Although politicians, religious leaders, and literary authors celebrated the expansion of schools for people outside of privileged classes, a persistent tradition of writers registers the loss of non-institutional forms of learning. Excluded from Oxford and Cambridge because of their class or gender, Jane Austen, Elizabeth Barrett Browning, Thomas Hardy, and Virginia Woolf use their position outside of educational institutions to critique rote learning at universities for the elite as well as utilitarian schools for the masses. Hardy describes the "mental limitations" of Angel's Cambridge-educated brothers in Tess of the d'Urbervilles (1891), for example, mocking them as "such unimpeachable models as are turned out yearly by the lathe of systematic tuition." The radicalism of educational outliers emerges when read alongside educational pamphlets, working-men's club reports, college newspapers, and parliamentary debates. Educational outliers investigate the role that literature plays in un-teaching readers. They model alternative pedagogies centered on active learning instead of rote memorization. With Mansfield Park (1814), Austen inaugurates this tradition; at a time when proclamations on women's education proliferated, she offers novels as anti-treatises that constantly disrupt the reading experience instead of offering simplistic truths, forcing us to rely on our own judgment to make sense of the disorder that characterizes her model of self-education. Several decades later in her "novel-poem" Aurora Leigh (1856), Barrett Browning instructs us in a "headlong," empathic reading of her text as part of her experiential learning approach for women of different classes that stresses reform from within.
Writing after more working-class schools had opened, Hardy tests the novel's capacity to un-teach assumptions about categories like "autodidact" itself and rewrites the celebratory self-made man's narrative by placing the reader in the position from which to weigh the positives and negatives of self-education. In the early twentieth century, Woolf imagines an education that "unfixes" students from their rigid class mindset in her "essay-novel" The Pargiters. Educational outliers' innovations ultimately prompt us to think about what outsiders' perspectives might be helpful today.
Aquino, Artur Ribeiro de. "Um método para interpretar outliers em trajetórias de objetos móveis." Repositório Institucional da UFSC, 2014. https://repositorio.ufsc.br/xmlui/handle/123456789/123274.
Abstract: Devices that record the traces of moving objects are increasingly popular. These traces are called trajectories of moving objects. The large volume of such data creates a need for methods and algorithms that extract useful information from it. Many trajectory data mining works detect different types of patterns, but few focus on detecting outliers among trajectories. Outliers among trajectories are those whose behavior or characteristics differ from the majority. If most objects travel at 80 km/h on a given stretch of road, for example, the objects traveling at 120 km/h are the outliers. Trajectory outliers can be useful for discovering suspicious behavior in a group of people, for finding alternative routes in traffic analysis, and for identifying the best and worst paths connecting two regions of interest. To the best of our knowledge, no previous work has analyzed outliers in depth to interpret them or give them meaning. Semantic information about outliers can provide more information for decision making. This work presents an algorithm that adds meaning to the trajectory outliers of vehicle drivers by considering three main possible reasons for a detour: stops outside the standard route, events, and traffic jams on the standard route. Experiments on real data show that the method correctly finds and classifies the different types of outliers.
Dall'Acqua, Fernando Maida. "Risco soberano Brasil: uma explicação do spread e dos outliers." Repositório Institucional do FGV, 2003. http://hdl.handle.net/10438/5611.
This work proposes a systematic examination of the so-called sovereign risk premium on securities issued by the Brazilian government. The examination allows the factors that can be understood as drivers of sovereign risk to be categorized.
Barbosa, Josino José. "Identificação de outliers multivariados - Uma aplicação em dados de saúde." Universidade Federal de Viçosa, 2017. http://www.locus.ufv.br/handle/123456789/10041.
The identification of outliers plays an important role in statistical analysis, as such observations may contain important information about the data. If classical statistical models are blindly applied to data containing atypical values, the results may be misleading and wrong decisions may be made. Moreover, in practice the outliers themselves are often the points of special interest, and identifying them may be the main objective of the investigation. The purpose of this work is therefore to propose a technique for detecting multivariate outliers based on cluster analysis and to compare it with outlier identification via the Mahalanobis distance. The data were generated by Monte Carlo simulation from mixtures of multivariate normal distributions. In the simulations, the proposed method outperformed the Mahalanobis method in both sensitivity and specificity. That is, it was better at correctly classifying both outlying and non-outlying individuals. In addition, the proposed methodology is illustrated with an application to real health data.
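The Mahalanobis-distance baseline mentioned above can be sketched as follows. This minimal version makes three assumptions:

- there are two variables;
- the classical sample mean and covariance are used;
- the cutoff is the chi-square(2) 0.975 quantile, an illustrative choice of level.

The function name `mahalanobis_flags` is hypothetical. The cluster-based method proposed in the thesis is not reproduced.

```python
import math

import numpy as np


def mahalanobis_flags(X, q=0.975):
    """Squared Mahalanobis distances from the sample mean (classical covariance)
    and flags for points beyond the chi-square(2) q-quantile.

    Restricted to two variables because the chi-square(2) quantile has the
    closed form -2 ln(1 - q); for p variables use scipy.stats.chi2.ppf(q, p).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[1] != 2:
        raise ValueError("expected an array of shape (n, 2)")
    diff = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # diff_i' S^-1 diff_i
    cutoff = -2.0 * math.log(1.0 - q)
    return d2, d2 > cutoff


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
X[0] = [8.0, -8.0]  # one planted multivariate outlier
d2, flags = mahalanobis_flags(X)
print(int(np.argmax(d2)), bool(flags[0]))
```

The classical mean and covariance are themselves pulled toward outliers. As a result, several outliers together can mask one another under this rule.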
Kim, Younghui. "Seeking for outliers: Artistic exploration of data through creative practice." Thesis, Queensland University of Technology, 2020. https://eprints.qut.edu.au/206985/1/Younghui_Kim_Thesis.pdf.