Dissertations / Theses on the topic 'Outlier analyses'
Consult the top 50 dissertations / theses for your research on the topic 'Outlier analyses.'
Zhang, Ji. "Towards outlier detection for high-dimensional data streams using projected outlier analysis strategy." University of Southern Queensland, Faculty of Sciences, 2008. http://eprints.usq.edu.au/archive/00005645/.
Cheng, Gongxian. "Outlier management in intelligent data analysis." Thesis, Birkbeck (University of London), 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.417120.
Abghari, Shahrooz. "Data Modeling for Outlier Detection." Licentiate thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-16580.
Scalable resource-efficient systems for big data analytics
Birch, Gary Edward. "Single trial EEG signal analysis using outlier information." Thesis, University of British Columbia, 1988. http://hdl.handle.net/2429/28626.
Applied Science, Faculty of
Electrical and Computer Engineering, Department of
Graduate
Mitchell, Napoleon. "Outliers and Regression Models." Thesis, University of North Texas, 1992. https://digital.library.unt.edu/ark:/67531/metadc279029/.
Soon, Shih Chung. "On detection of extreme data points in cluster analysis." Connect to resource, 1987. http://rave.ohiolink.edu/etdc/view.cgi?acc%5Fnum=osu1262886219.
Robson, Geoffrey. "Multiple outlier detection and cluster analysis of multivariate normal data." Thesis, Stellenbosch : Stellenbosch University, 2003. http://hdl.handle.net/10019.1/53508.
ENGLISH ABSTRACT: Outliers may be defined as observations that are sufficiently aberrant to arouse the suspicion of the analyst as to their origin. They could be the result of human error, in which case they should be corrected, but they may also be an interesting exception, and this would deserve further investigation. Identification of outliers typically consists of an informal inspection of a plot of the data, but this is unreliable for dimensions greater than two. A formal procedure for detecting outliers allows for consistency when classifying observations. It also enables one to automate the detection of outliers by using computers. The special case of univariate data is treated separately to introduce essential concepts, and also because it may well be of interest in its own right. We then consider techniques used for detecting multiple outliers in a multivariate normal sample, and go on to explain how these may be generalized to include cluster analysis. Multivariate outlier detection is based on the Minimum Covariance Determinant (MCD) subset, and is therefore treated in detail. Exact bivariate algorithms were refined and implemented, and the solutions were used to establish the performance of the commonly used heuristic, Fast–MCD.
AFRIKAANSE OPSOMMING: Uitskieters word gedefinieer as waarnemings wat tot só ’n mate afwyk van die verwagte gedrag dat die analis wantrouig is oor die oorsprong daarvan. Hierdie waarnemings mag die resultaat wees van menslike foute, in welke geval dit reggestel moet word. Dit mag egter ook ’n interressante verskynsel wees wat verdere ondersoek benodig. Die identifikasie van uitskieters word tipies informeel deur inspeksie vanaf ’n grafiese voorstelling van die data uitgevoer, maar hierdie benadering is onbetroubaar vir dimensies groter as twee. ’n Formele prosedure vir die bepaling van uitskieters sal meer konsekwente klassifisering van steekproefdata tot gevolg hê. Dit gee ook geleentheid vir effektiewe rekenaar implementering van die tegnieke. Aanvanklik word die spesiale geval van eenveranderlike data behandel om noodsaaklike begrippe bekend te stel, maar ook aangesien dit in eie reg ’n area van groot belang is. Verder word tegnieke vir die identifikasie van verskeie uitskieters in meerveranderlike, normaal verspreide data beskou. Daar word ook ondersoek hoe hierdie idees veralgemeen kan word om tros analise in te sluit. Die sogenaamde Minimum Covariance Determinant (MCD) subversameling is fundamenteel vir die identifikasie van meerveranderlike uitskieters, en word daarom in detail ondersoek. Deterministiese tweeveranderlike algoritmes is verfyn en geïmplementeer, en gebruik om die effektiwiteit van die algemeen gebruikte heuristiese algoritme, Fast–MCD, te ondersoek.
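The MCD idea in the abstract above can be illustrated for very small bivariate samples by brute force. The sketch below is only an illustration under stated assumptions: it tries every subset of size `h` rather than using the refined exact algorithms or Fast-MCD mentioned in the thesis, it applies no consistency correction to the covariance, and the function names and the 0.975 chi-square cutoff are choices made here, not taken from the source.

```python
from itertools import combinations
import math


def mean_and_cov(points):
    """Sample mean and 2x2 sample covariance (n - 1 divisor) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def exact_mcd(points, h):
    """Try every h-subset; keep the one with the smallest covariance determinant."""
    best = None
    for subset in combinations(range(len(points)), h):
        centre, (sxx, sxy, syy) = mean_and_cov([points[i] for i in subset])
        det = sxx * syy - sxy ** 2
        if best is None or det < best[0]:
            best = (det, subset, centre, (sxx, sxy, syy))
    return best


def squared_mahalanobis(p, centre, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = p[0] - centre[0], p[1] - centre[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def flag_outliers(points, h, level=0.975):
    """Indices whose robust distance exceeds the chi-square(2) quantile."""
    det, _subset, centre, cov = exact_mcd(points, h)
    if det <= 0:
        raise ValueError("degenerate MCD subset (collinear points)")
    # For 2 degrees of freedom the chi-square quantile is -2 * ln(1 - level).
    cutoff = -2.0 * math.log(1.0 - level)
    return [i for i, p in enumerate(points)
            if squared_mahalanobis(p, centre, cov) > cutoff]


points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (0.2, 0.8), (10, 10)]
print(flag_outliers(points, h=6))
```

Enumerating all subsets grows combinatorially with the sample size, which is why heuristics such as Fast-MCD are used in practice.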
Halldestam, Markus. "ANOVA - The Effect of Outliers." Thesis, Uppsala universitet, Statistiska institutionen, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-295864.
Astl, Stefan Ludwig. "Suboptimal LULU-estimators in measurements containing outliers." Thesis, Stellenbosch : Stellenbosch University, 2013. http://hdl.handle.net/10019.1/85833.
ENGLISH ABSTRACT: Techniques for estimating a signal in the presence of noise which contains outliers are currently not well developed. In this thesis, we consider a constant signal superimposed by a family of noise distributions structured as a tunable mixture f(x) = α g(x) + (1 − α) h(x) between finite-support components of “well-behaved” noise with small variance g(x) and of “impulsive” noise h(x) with a large amplitude and strongly asymmetric character. When α ≈ 1, h(x) can for example model a cosmic ray striking an experimental detector. In the first part of our work, a method for obtaining the expected values of the positive and negative pulses in the first resolution level of a LULU Discrete Pulse Transform (DPT) is established. Subsequent analysis of sequences smoothed by the operators L1U1 or U1L1 of LULU-theory shows that a robust estimator for the location parameter for g is achieved in the sense that the contribution by h to the expected average of the smoothed sequences is suppressed to order (1 − α)² or higher. In cases where the specific shape of h can be difficult to guess due to the assumed lack of data, it is thus also shown to be of lesser importance. Furthermore, upon smoothing a sequence with L1U1 or U1L1, estimators for the scale parameters of the model distribution become easily available. In the second part of our work, the same problem and data are approached from a Bayesian inference perspective. The Bayesian estimators are found to be optimal in the sense that they make full use of available information in the data. Heuristic comparison shows, however, that Bayes estimators do not always outperform the LULU estimators. Although the Bayesian perspective provides much insight into the logical connections inherent in the problem, its estimators can be difficult to obtain in analytic form and are slow to compute numerically. Suboptimal LULU-estimators are shown to be reasonable compromises in practical problems.
AFRIKAANSE OPSOMMING: Tegnieke om ’n sein af te skat in die teenwoordigheid van geraas wat uitskieters bevat is tans nie goed ontwikkel nie. In hierdie tesis aanskou ons ’n konstante sein gesuperponeer met ’n familie van geraasverdelings wat as verstelbare mengsel f(x) = α g(x) + (1 − α) h(x) tussen eindige-uitkomsruimte geraaskomponente g(x) wat “goeie gedrag” en klein variansie toon, plus “impulsiewe” geraas h(x) met groot amplitude en sterk asimmetriese karakter. Wanneer α ≈ 1 kan h(x) byvoorbeeld ’n kosmiese straal wat ’n eksperimentele apparaat tref modelleer. In die eerste gedeelte van ons werk word ’n metode om die verwagtingswaardes van die positiewe en negatiewe pulse in die eerste resolusievlak van ’n LULU Diskrete Pulse Transform (DPT) vasgestel. Die analise van rye verkry deur die inwerking van die gladstrykers L1U1 en U1L1 van die LULU-teorie toon dat hul verwagte gemiddelde waardes as afskatters van die liggingsparameter van g kan dien wat robuus is in die sin dat die bydrae van h tot die gemiddeld van orde grootte (1 − α)² of hoër is. Die spesifieke vorm van h word dan ook onbelangrik. Daar word verder gewys dat afskatters vir die relevante skaalparameters van die model maklik verkry kan word na gladstryking met die operatore L1U1 of U1L1. In die tweede gedeelte van ons werk word dieselfde probleem en data vanuit ’n Bayesiese inferensie perspektief benader. Die Bayesiese afskatters word as optimaal bevind in die sin dat hulle vol gebruikmaak van die beskikbare inligting in die data. Heuristiese vergelyking wys egter dat Bayesiese afskatters nie altyd beter vaar as die LULU afskatters nie. Alhoewel die Bayesiese sienswyse baie insig in die logiese verbindings van die probleem gee, kan die afskatters moeilik wees om analities af te lei en stadig om numeries te bereken. Suboptimale LULU-beramers word voorgestel as redelike praktiese kompromieë in praktiese probleme.
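As a rough illustration of the L1U1 and U1L1 smoothers named in the abstract above, the sketch below uses the standard first-level LULU definitions (L1 takes the larger of the two adjacent pairwise minima, U1 the smaller of the two adjacent pairwise maxima). The edge handling (repeating the end values), the function names, and the use of the plain average of the smoothed sequence as a location estimate are assumptions made here for illustration; the thesis's own estimators are not reproduced.

```python
def _pad(seq):
    """Repeat the end values so every index has two neighbours (an assumption).

    Expects a non-empty sequence.
    """
    return [seq[0]] + list(seq) + [seq[-1]]


def lower_1(seq):
    """L1: removes isolated upward spikes of width one."""
    p = _pad(seq)
    return [max(min(p[i - 1], p[i]), min(p[i], p[i + 1]))
            for i in range(1, len(p) - 1)]


def upper_1(seq):
    """U1: removes isolated downward spikes of width one."""
    p = _pad(seq)
    return [min(max(p[i - 1], p[i]), max(p[i], p[i + 1]))
            for i in range(1, len(p) - 1)]


def l1u1(seq):
    """Apply U1 first, then L1."""
    return lower_1(upper_1(seq))


def u1l1(seq):
    """Apply L1 first, then U1."""
    return upper_1(lower_1(seq))


def location_estimate(seq):
    """Average of the smoothed sequence, used here as a simple location estimate."""
    smoothed = l1u1(seq)
    return sum(smoothed) / len(smoothed)


# Constant signal 5 with one upward and one downward impulse.
noisy = [5, 5, 9, 5, 5, 1, 5, 5]
print(l1u1(noisy), location_estimate(noisy))
```

On this toy sequence both compositions remove the two impulses, while the plain mean of the raw data would be pulled away from the signal level.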
Lipkovich, Ilya A. "Bayesian Model Averaging and Variable Selection in Multivariate Ecological Models." Diss., Virginia Tech, 2002. http://hdl.handle.net/10919/11045.
Ph. D.
Rodriguez, Gabriel. "Unit root, outliers and cointegration analysis with macroeconomic applications." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape3/PQDD_0028/NQ48794.pdf.
馮榮錦 and Wing-kam Tony Fung. "Analysis of outliers using graphical and quasi-Bayesian methods." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1987. http://hub.hku.hk/bib/B31230842.
Fung, Wing-kam Tony. "Analysis of outliers using graphical and quasi-Bayesian methods /." [Hong Kong] : University of Hong Kong, 1987. http://sunzi.lib.hku.hk/hkuto/record.jsp?B1236146X.
Al-Kahwati, Kammal. "Outlier detection on sparse-encoded vibration signals from rolling element bearings." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-76592.
Sothinathan, Nalaiyini. "Bayesian Analysis for outliers in binomial, Normal and circular data." Thesis, Queen Mary, University of London, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.498204.
Schubert, Daniel Dice. "A multivariate adaptive trimmed likelihood algorithm /." Access via Murdoch University Digital Theses Project, 2005. http://wwwlib.murdoch.edu.au/adt/browse/view/adt-MU20061019.132720.
Kinns, David Jonathan. "Multiple case influence analysis with particular reference to the linear model." Thesis, University of Birmingham, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.368427.
Keller, Fabian [Verfasser], and K. [Akademischer Betreuer] Böhm. "Attribute Relationship Analysis in Outlier Mining and Stream Processing / Fabian Keller. Betreuer: K. Böhm." Karlsruhe : KIT-Bibliothek, 2015. http://d-nb.info/1075254019/34.
Goi, Yoshinao. "Bayesian Damage Detection for Vibration Based Bridge Health Monitoring." Kyoto University, 2018. http://hdl.handle.net/2433/232013.
Andrésen, Anton, and Adam Håkansson. "Comparing unsupervised clustering algorithms to locate uncommon user behavior in public travel data : A comparison between the K-Means and Gaussian Mixture Model algorithms." Thesis, Tekniska Högskolan, Jönköping University, JTH, Datateknik och informatik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-49243.
Stark, Love. "Outlier detection with ensembled LSTM auto-encoders on PCA transformed financial data." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-296161.
Financial institutions today generate large amounts of data, which may contain interesting information worth investigating to promote the economic growth of the institution. There is an interest in analyzing these data points, especially if they deviate from normal day-to-day operations. Detecting these deviations is, however, not an easy task and cannot be done manually because of the large volumes of data generated daily. Previous work on this problem has examined the use of machine learning to detect outliers in financial data. Previous studies have shown that preprocessing of the data usually accounts for a large part of the information lost from the data. This work aims to study whether there is a proper balance in how preprocessing is carried out, so as to retain as much information as possible while keeping the data from being too complex for the machine learning models. The dataset used consisted of foreign exchange transactions provided by the host company and was preprocessed using Principal Component Analysis (PCA). The main purpose of this work is to investigate whether an ensemble of Long Short-Term Memory Recurrent Neural Networks (LSTM), configured as autoencoders, can be used to detect outliers in the data, and whether the ensemble is more accurate in its predictions than a single LSTM autoencoder. Previous studies have shown that an ensemble of autoencoders can be more accurate than a single autoencoder, especially when SkipCells have been implemented (a configuration that skips some of the LSTM cells to make the models more varied). A data point is considered an outlier if the LSTM model has trouble reconstructing it well, i.e. it is a pattern that the network finds hard to reproduce, which makes the data point available for further investigation.
The results show that an ensemble of LSTM models predicted more accurately than a single LSTM model when reconstructing the dataset and therefore, under our definition of outliers, detected outliers more accurately. The preprocessing results show different methods for arriving at an optimal number of components for the data by studying retained variance and the precision of the PCA transformation against model performance. One conclusion of the work is that an ensemble of LSTM networks can prove very powerful, but that alternatives to the preprocessing should be explored, such as categorical embedding instead of PCA.
Van, Deventer Petrus Jacobus Uys. "Outliers, influential observations and robust estimation in non-linear regression analysis and discriminant analysis." Doctoral thesis, University of Cape Town, 1993. http://hdl.handle.net/11427/4363.
Arruda, Gabriel Domingos de. "Análise de viés em notícias na língua portuguesa." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/100/100131/tde-10012016-144315/.
The project described here proposes a model to study bias in newswire texts related to political entities. Three types of bias are analysed: selection bias, which refers to the number of times an entity is referenced by the media outlet; coverage bias, which assesses the amount of coverage given to an entity; and assertion bias, which analyses whether the news is a positive or negative report of an entity. To accomplish this, a corpus was systematically built by extracting news from 5 different newswires. These texts were manually classified according to their polarity alignment and associated entity. Sentiment Analysis techniques were applied and evaluated using the corpus. Based on the concept of outliers, a methodology for bias detection was created. Using the proposed methodology on the generated corpus, bias was analysed for candidates for governor of the state of São Paulo and for the presidency, and was identified in two newswires for all three types defined above.
Balasubramanian, Vijay. "Variance reduction and outlier identification for IDDQ testing of integrated chips using principal component analysis." Texas A&M University, 2006. http://hdl.handle.net/1969.1/4766.
Chen, Feng. "Efficient Algorithms for Mining Large Spatio-Temporal Data." Diss., Virginia Tech, 2013. http://hdl.handle.net/10919/19220.
growing interests. Recent advances in remote sensing technology mean
that massive amounts of spatio-temporal data are being collected,
and its volume keeps increasing at an ever faster pace. It becomes
critical to design efficient algorithms for identifying novel and
meaningful patterns from massive spatio-temporal datasets. Different
from the other data sources, this data exhibits significant
space-time statistical dependence, and the assumption of i.i.d. is
no longer valid. Exact modeling of space-time dependence makes
model complexity grow exponentially as the data size
increases. This research focuses on the construction of efficient
and effective approaches using approximate inference techniques for
three main mining tasks, including spatial outlier detection, robust
spatio-temporal prediction, and novel applications to real world
problems.
Spatial novelty patterns, or spatial outliers, are those data points
whose characteristics are markedly different from their spatial
neighbors. There are two major branches of spatial outlier detection
methodologies, which can be either global Kriging based or local
Laplacian smoothing based. The former approach requires the exact
modeling of spatial dependence, which is time intensive; and the
latter approach requires the i.i.d. assumption of the smoothed
observations, which is not statistically solid. These two approaches
are constrained to numerical data, but in real world applications we
are often faced with a variety of non-numerical data types, such as
count, binary, nominal, and ordinal. To summarize, the main research
challenges are: 1) how much spatial dependence can be eliminated via
Laplace smoothing; 2) how to effectively and efficiently detect
outliers for large numerical spatial datasets; 3) how to generalize
numerical detection methods and develop a unified outlier detection
framework suitable for large non-numerical datasets; 4) how to
achieve accurate spatial prediction even when the training data has
been contaminated by outliers; 5) how to deal with spatio-temporal
data for the preceding problems.
To address the first and second challenges, we mathematically
validated the effectiveness of Laplacian smoothing on the
elimination of spatial autocorrelations. This work provides
fundamental support for existing Laplacian smoothing based methods.
We also discovered a nontrivial side-effect of Laplacian smoothing,
which injects additional spatial variations into the data due to
convolution effects. To capture this extra variability, we proposed
a generalized local statistical model, and designed two fast forward
and backward outlier detection methods that achieve a better balance
between computational efficiency and accuracy than most existing
methods, and are well suited to large numerical spatial datasets.
We addressed the third challenge by mapping non-numerical variables
to latent numerical variables via a link function, such as the logit
function used in logistic regression, and then utilizing
error-buffer artificial variables, which follow a Student-t
distribution, to capture the large variations caused by outliers. We
proposed a unified statistical framework, which integrates the
advantages of spatial generalized linear mixed model, robust spatial
linear model, reduced-rank dimension reduction, and Bayesian
hierarchical model. A linear-time approximate inference algorithm
was designed to infer the posterior distribution of the error-buffer
artificial variables conditioned on observations. We demonstrated
that traditional numerical outlier detection methods can be directly
applied to the estimated artificial variables for outlier
detection. To the best of our knowledge, this is the first
linear-time outlier detection algorithm that supports a variety of
spatial attribute types, such as binary, count, ordinal, and
nominal.
To address the fourth and fifth challenges, we proposed a robust
version of the Spatio-Temporal Random Effects (STRE) model, namely
the Robust STRE (R-STRE) model. The regular STRE model is a recently
proposed statistical model for large spatio-temporal data that has a
linear order time complexity, but is not best suited for
non-Gaussian and contaminated datasets. This deficiency can be
systematically addressed by increasing the robustness of the model
using heavy-tailed distributions, such as the Huber, Laplace, or
Student-t distribution to model the measurement error, instead of
the traditional Gaussian. However, the resulting R-STRE model
becomes analytically intractable, and direct application of
approximate inference techniques still has a cubic order time
complexity. To address the computational challenge, we reformulated
the prediction problem as a maximum a posteriori (MAP) problem with a
non-smooth objective function, transformed it to an equivalent
quadratic programming problem, and developed an efficient
interior-point numerical algorithm with a near linear order
complexity. This work presents the first near linear time robust
prediction approach for large spatio-temporal datasets in both
offline and online cases.
Ph. D.
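The local, neighbour-difference view of spatial outliers described in the abstract above can be sketched in a few lines. This is a generic illustration, not the thesis's forward or backward detection methods: each value is compared with the average of its spatial neighbours, the differences are standardized with the median and MAD, and the single worst point is flagged and replaced by its neighbourhood average before the next pass, so that an outlier does not drag its neighbours over the threshold. The neighbour lists, the threshold of 3 and all function names are assumptions made here.

```python
import statistics


def neighbour_diffs(values, neighbours):
    """Difference between each value and the mean of its spatial neighbours."""
    diffs = []
    for i, v in enumerate(values):
        nb = neighbours[i]
        diffs.append(v - sum(values[j] for j in nb) / len(nb))
    return diffs


def detect_spatial_outliers(values, neighbours, threshold=3.0):
    """Iteratively flag the point with the largest robust z-score."""
    values = list(values)  # work on a copy; flagged points get overwritten
    flagged = []
    for _ in range(len(values)):
        diffs = neighbour_diffs(values, neighbours)
        med = statistics.median(diffs)
        mad = statistics.median([abs(d - med) for d in diffs])
        if mad == 0:
            break  # no spread left to standardize against
        scale = 1.4826 * mad  # makes the MAD comparable to a Gaussian sd
        scores = [abs(d - med) / scale for d in diffs]
        worst = max(range(len(values)), key=scores.__getitem__)
        if scores[worst] <= threshold or worst in flagged:
            break
        flagged.append(worst)
        nb = neighbours[worst]
        # Replace the flagged value so it stops distorting its neighbours.
        values[worst] = sum(values[j] for j in nb) / len(nb)
    return flagged


# Nine sites on a line; each site's neighbours are the adjacent sites.
readings = [1.0, 1.1, 0.9, 1.0, 5.0, 1.0, 1.1, 0.9, 1.0]
chain = [[j for j in (i - 1, i + 1) if 0 <= j < len(readings)]
         for i in range(len(readings))]
print(detect_spatial_outliers(readings, chain))
```

Without the replacement step, the two sites next to the spike would also exceed the threshold in this example, because the spike inflates their neighbourhood averages.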
Åkerberg, Ludvig. "Using Unsupervised Machine Learning for Outlier Detection in Data to Improve Wind Power Production Prediction." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-200336.
Wind power production as a source of sustainable electrical energy has increased in recent years and shows no sign of slowing down. This unpredictable energy source has contributed to destabilizing the power grid, causing large daily swings in prices on the electricity market. To let electricity producers and consumers make good investments, methods for predicting wind power production have been developed. These methods are often based on machine learning using historical data from weather forecasts and wind power production. This data can contain so-called outliers, which degrade the predictions of the machine learning methods. The goal of this thesis was to identify and remove outliers from the data so that the predictions of these methods can be improved. To that end, a method for outlier identification based on unsupervised machine learning was developed, and research was carried out on machine learning for outlier identification and on prediction of wind power production.
Kuzmak, Barbara R. "An examination of outliers and interaction in a nonreplicated two-way table." Diss., Virginia Tech, 1990. http://hdl.handle.net/10919/37747.
Ph. D.
He, Tian Ying. "Outline-based image content analysis using partial signature." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp03/MQ40415.pdf.
Masood, Adnan. "Measuring Interestingness in Outliers with Explanation Facility using Belief Networks." NSUWorks, 2014. http://nsuworks.nova.edu/gscis_etd/232.
Zhou, Bin. "Computational Analysis of LC-MS/MS Data for Metabolite Identification." Thesis, Virginia Tech, 2011. http://hdl.handle.net/10919/36109.
Master of Science
Kaltenbach, Kelley J. "Analysis of magnetic anomalies in determining fault displacement in the crystalline Precambrian basement underneath the Bellefontaine Outlier, Ohio /." Connect to resource, 1998. http://hdl.handle.net/1811/28551.
Fidler, Michael L. "Three dimensional digital analysis of 2,500 square kilometers of gravity and magnetic survey data, Bellefontaine Outlier area, Ohio /." Columbus, Ohio : Ohio State University, 2003. http://hdl.handle.net/1811/6110.
Kanneganti, Raghuveer. "CLASSIFICATION OF ONE-DIMENSIONAL AND TWO-DIMENSIONAL SIGNALS." OpenSIUC, 2014. https://opensiuc.lib.siu.edu/dissertations/892.
Gracie, Christina. "Bayesian analysis of agricultural treatment effects in the presence of a fertility trend and outliers." Thesis, Gracie, Christina (2005) Bayesian analysis of agricultural treatment effects in the presence of a fertility trend and outliers. Honours thesis, Murdoch University, 2005. https://researchrepository.murdoch.edu.au/id/eprint/40843/.
Anderson, Cynthia 1962. "A Comparison of Five Robust Regression Methods with Ordinary Least Squares: Relative Efficiency, Bias and Test of the Null Hypothesis." Thesis, University of North Texas, 2001. https://digital.library.unt.edu/ark:/67531/metadc5808/.
Liu, Yan. "Documenting the impact of outliers on decisions about the number of factors in exploratory factor analysis." Thesis, University of British Columbia, 2011. http://hdl.handle.net/2429/39802.
Sun, Yi. "New matching algorithm -- Outlier First Matching (OFM) and its performance on Propensity Score Analysis (PSA) under new Stepwise Matching Framework (SMF)." Thesis, State University of New York at Albany, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3633233.
An observational study is an empirical investigation of treatment effect when randomized experimentation is not ethical or feasible (Rosenbaum 2009). Observational studies are common in real life for the following reasons: a) randomization is not feasible for ethical or financial reasons; b) data are collected from surveys or other resources where the objective and design of the study have not been determined (e.g. a retrospective study using administrative records); c) little is known about the given area, so preliminary studies of observational data are conducted to formulate hypotheses to be tested in subsequent experiments. When statistical analyses are done using observational studies, the following issues need to be considered: a) the lack of randomization may lead to selection bias; b) representativeness of the sample with respect to the problem under consideration (e.g. a study of factors influencing a rare disease using a survey that is nationally representative with respect to race, income, and gender but not with respect to the rare disease condition). We will use the following example to illustrate the challenges of observational studies and possible mitigation measures.
Our example is based on the study by Lalonde (1986), which evaluated the impact of job training on the earnings improvement of low-skilled workers in 1970's (In Paper 1 section 1.5.2, we will discuss this data set in more detail). The treatment effect estimated from the observational study was quite different from the one obtained using the baseline randomized "National Supported Work (NSW) Experiment" carried out in the mid-1970's. Now we understand the treatment effect which is the impact of job training. Selection bias may contaminate the treatment effect, in other words, workers who receive the job training may be fundamentally different from those who do not. Furthermore, the sample of control group selected for observational study by Lalonde may not represent the sample of control group from the original NSW experiment.
In this study, we address the lack of randomization by applying a new matching algorithm (Outlier First Matching, OFM), which can be used in conjunction with Propensity Score Analysis (PSA) or other similar methods to achieve convincing treatment effect estimation in observational studies.
This dissertation consists of three papers.
Paper 1 proposes a new "Stepwise Matching Framework (SMF)" and rationalizes its usage in causal inference study (especially for PSA study using observational data). Furthermore, under the new framework of SMF, one new matching algorithm (Outlier First Matching or OFM in short) will be introduced. Its performance along with other well-known matching algorithms will be studied using the cross sectional data.
Paper 2 extends the methods of paper 1 to correlated data (especially longitudinal data). With correlated data, besides the selection bias found in cross-sectional observational data, the repeated measures introduce between-subject and within-subject correlation. Repeated measures can also bring missing value and rolling enrollment problems. All of these challenges complicate the data structure and need to be addressed using more complex models and methodology. Our methodology calculates the time-varying p-score of control subjects at each time point and generates the p-score difference from each control subject to every treatment subject at the treatment subject's time point. These p-score differences are then summarized to create the distance matrix for the next step of the analysis. Once again, the performance of OFM and other well-established matching algorithms is compared side by side, and conclusions are drawn from simulation and real data applications.
Paper 3 handles missing value problem in longitudinal data. As we have mentioned in paper 2, the complexity of data structure of longitudinal data often comes with the problem of missing data. Due to the possibility of between subject and within subject correlation, the traditional imputation methodology will probably ignore the above two correlations so that it may lead to biased or inefficient imputation of missing data. We adopt one missing value imputation strategy introduced by Schafer and Yucel (2002) through one R package "pan" to handle the above two correlations. The "imputed complete data" will be treated using the similar methodology as paper 2. Then MI results will be summarized using Rubin's rule (1987). The conclusion will be drawn based on the findings through simulation study and compared to what we have found in complete longitudinal data study in paper 2.
In the last section, we conclude the dissertation with a discussion of preliminary results and of the strengths and limitations of the present research. We also point out directions for future study and offer suggestions for practice.
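For readers unfamiliar with matching on propensity scores, the sketch below shows a generic greedy nearest-neighbour match with a caliper, one of the well-established baselines this kind of study compares against. It does not reproduce Outlier First Matching or the Stepwise Matching Framework, whose steps are not given in the abstract; the function name, the caliper value and the example scores are hypothetical.

```python
def greedy_match(treated_scores, control_scores, caliper=0.1):
    """Match each treated unit to the nearest unused control within the caliper.

    Returns a dict mapping treated index -> control index. Treated units are
    processed in the order given, so the result depends on that order.
    """
    available = set(range(len(control_scores)))
    pairs = {}
    for t, score in enumerate(treated_scores):
        if not available:
            break
        # Nearest remaining control; ties are broken by the lower index.
        best = min(available, key=lambda c: (abs(control_scores[c] - score), c))
        if abs(control_scores[best] - score) <= caliper:
            pairs[t] = best
            available.remove(best)  # matching without replacement
    return pairs


treated = [0.8, 0.3, 0.05]          # hypothetical propensity scores
controls = [0.25, 0.5, 0.75, 0.9]
print(greedy_match(treated, controls))
```

The third treated unit stays unmatched here because no remaining control lies within the caliper, which is the usual price of enforcing close matches.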
Patyk, Sylwia. "Forces analysis rolling burnishing rough surfaces with triangular outlines asperity : PhD thesis summary." Rozprawa doktorska, [s.n.], 2015. http://dlibra.tu.koszalin.pl/Content/1059.
Labaš, Dominik. "Analýza metod pro detekci odlehlých hodnot." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2021. http://www.nusl.cz/ntk/nusl-445527.
Aldas, Cem Nuri. "An Analysis Of Peculiarity Oriented Interestingness Measures On Medical Data." Master's thesis, METU, 2008. http://etd.lib.metu.edu.tr/upload/12609856/index.pdf.
Slezak, Thomas Joseph. "Quantitative Morphological Classification of Planetary Craterforms Using Multivariate Methods of Outline-Based Shape Analysis." BYU ScholarsArchive, 2017. https://scholarsarchive.byu.edu/etd/6639.
Wedlake, Ryan Stuart. "Robust principal component analysis biplots." Thesis, Link to the online version, 2008. http://hdl.handle.net/10019/929.
Fritsch, Virgile. "High-dimensional statistical methods for inter-subject studies in neuroimaging." Phd thesis, Université Paris Sud - Paris XI, 2013. http://tel.archives-ouvertes.fr/tel-00934695.
Bartholomäus, Jenny, Sven Wunderlich, and Zoltán Sasvári. "Identification of Suspicious Semiconductor Devices Using Independent Component Analysis with Dimensionality Reduction." Institute of Electrical and Electronics Engineers (IEEE), 2019. https://tud.qucosa.de/id/qucosa%3A35129.
Almeida, Júnior José de. "Detecção de outlier como suporte para o controle estatístico do processo multivariado: um estudo de caso em uma empresa do setor plástico." Universidade Federal da Paraíba, 2013. http://tede.biblioteca.ufpb.br:8080/handle/tede/5225.
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
This research project applied a forward search algorithm to support decision making in multivariate statistical process control in the manufacture of bottle crates at a plastic products company. In addition, principal component analysis (PCA) and the Hotelling T² chart can summarize part of the relevant information about this process. Two results of considerable importance were produced: the principal component scores and an adapted Hotelling T² chart highlighting the relationship among the ten variables analyzed. The forward search algorithm detects points that are discordant with the rest of the data cluster; when such points lie very far away or have very different characteristics, they are called outliers. The BACON algorithm was used to detect such occurrences. It starts from a small subset of the original data that is demonstrably free of outliers and keeps adding new observations that are also not outliers to this initial subset until no further observation can be absorbed. One advantage of this algorithm is that it counters the masking and swamping phenomena, which distort the mean and covariance estimates. The results showed that, for the dataset studied, the BACON algorithm did not detect any discordant point. A simulation was then developed, drawing random numbers from a uniform distribution within a range to modify the mean and standard deviation values, in order to show that the method is effective in detecting such outliers. For this simulation, the mean and standard deviation of 5% of the original data were randomly changed. The simulation showed that the BACON algorithm is perfectly applicable to this case study, and its use is recommended for other production processes that depend simultaneously on several variables.
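The BACON idea summarized in the abstract above (start from a small outlier-free subset, then absorb observations until none can be added) can be sketched roughly as follows. This is a simplified illustration, not the thesis's implementation: the function names, the initial subset size of 4p, the fixed squared-distance cutoff (the published algorithm also applies a finite-sample correction factor, omitted here), and the synthetic data with 5% shifted rows are all assumptions made for the example.

```python
import numpy as np

def mahalanobis_sq(X, subset):
    """Squared Mahalanobis distance of every row of X, using the
    mean and covariance estimated from the rows in `subset` only."""
    mu = X[subset].mean(axis=0)
    cov = np.cov(X[subset], rowvar=False)
    diff = X - mu
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

def bacon_sketch(X, cutoff_sq, m=None, max_iter=100):
    """Simplified BACON-style search (hypothetical helper).
    Returns the indices of rows never absorbed into the clean subset."""
    n, p = X.shape
    m = m or 4 * p  # assumed initial subset size
    # Robust start: the m rows closest to the coordinate-wise median.
    d0 = np.linalg.norm(X - np.median(X, axis=0), axis=1)
    subset = np.sort(np.argsort(d0)[:m])
    for _ in range(max_iter):
        d2 = mahalanobis_sq(X, subset)
        new_subset = np.flatnonzero(d2 < cutoff_sq)
        if np.array_equal(new_subset, subset):
            break  # nothing more can be absorbed
        subset = new_subset
    return np.setdiff1d(np.arange(n), subset)

# Synthetic demo (assumed data, not the thesis dataset): 200 bivariate
# normal rows, 5% of which are shifted by uniform random offsets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
planted = rng.choice(200, size=10, replace=False)
X[planted] += rng.uniform(8.0, 12.0, size=(10, 2))

# For p = 2 the chi-square quantile at level 1 - alpha/n has the closed
# form -2 * ln(alpha / n); alpha = 0.05 is an assumed choice.
cutoff_sq = -2.0 * np.log(0.05 / 200)
flagged = bacon_sketch(X, cutoff_sq)
print("planted:", sorted(planted.tolist()))
print("flagged:", flagged.tolist())
```

Because the mean and covariance are always estimated from the current clean subset, the shifted rows never influence the estimates, which is how this kind of procedure limits masking and swamping.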
Lausberg, Isabel. "Kundenpräferenzen für neue Angebotsformen im Einzelhandel eine Analyse am Beispiel von Factory Outlet Centern /." [S.l. : s.n.], 2002. http://d-nb.info/965502074/34.
Sienkiewicz, Stefan Fareed Abbas. "Five modes of scepticism : an analysis of the Agrippan modes in Sextus Empiricus' Outlines of Pyrrhonism." Thesis, University of Oxford, 2013. http://ora.ox.ac.uk/objects/uuid:2f49a75d-164c-4534-aa9e-9579d55be086.
Tuma, Josef. "Strategická analýza firmy "Marie Tumová" a nástin strategie." Master's thesis, Vysoká škola ekonomická v Praze, 2008. http://www.nusl.cz/ntk/nusl-76982.
Waddle, Ashleigh Danielle. "A Market Analysis for Specialty Beef in Virginia." Thesis, Virginia Tech, 2009. http://hdl.handle.net/10919/32656.
Master of Science
Cenonfolo, Filippo. "Signal cleaning techniques and anomaly detection algorithms for motorbike applications." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.