To view the other types of publications on this topic, follow this link: Mathematics Outlines.

Dissertations on the topic "Mathematics Outlines"


Get acquainted with the top 25 dissertations for research on the topic "Mathematics Outlines".

Next to every entry in the bibliography you will find an "Add to bibliography" option. Use it, and the bibliographic reference for the chosen work will be generated automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).

You can also download the full text of the scholarly publication as a PDF and read its online annotation, provided the relevant parameters are available in the metadata.

Browse dissertations from a wide range of disciplines and compile a correctly formatted bibliography.

1

Breetzke, Peter Roland. „Sequence in the mathematics syllabus : an investigation of the Senior Secondary Mathematics Syllabus (July 1984) of the Cape Education Department attempting to reconcile the demands of the strictly mathematical order and the developmental needs of pupils, modified by the mathematical potential of the electronic calculator : some teaching strategies resulting from new influences in the syllabus“. Thesis, Rhodes University, 1988. http://hdl.handle.net/10962/d1001430.

Annotation:
This study was motivated by the latest revision of the mathematics syllabuses of the Cape Education Department. The most important changes to content in the Senior Secondary Mathematics Syllabus (July 1984) are the introduction of calculus and linear programming, the substitution of a section on analytical geometry for vector algebra and the recall of the remainder and factor theorems. The way in which these changes were introduced left the task of integrating them into the teaching process in the hands of individual teachers. This is a task of extreme importance. If one's classroom practice is to simply plough one's way through the syllabus, one loses many opportunities to make the study of mathematics meaningful and worthwhile. Accepting the view of the spiral nature of the curriculum, where one returns to concepts and procedures at increasing levels of sophistication, one needs to identify the position of topics in this spiral and to trace their conceptual foundations. Analytical geometry is in particular need of this treatment. Similarly, there are many opportunities for preparing for the introduction of calculus. If the teaching of calculus is left until the last moments of the Standard 10 year without proper groundwork, the pupil will be left with little time to develop an understanding of the concepts involved. It is the advent of calculators which presents the greatest challenge to mathematics education. We ignore this challenge to the detriment of our teaching. Taken seriously, calculators have the potential to exert a radical influence on the content of curricula and examinations. They bring into question the time we spend on teaching arithmetic algorithms and the priority given to algebraic manipulation. Numerical methods gain new prominence. Calculators can even breathe new life into the existing curriculum. Their computing power can be harnessed not only to carry out specific calculations but also to introduce new topics and for concept reinforcement. The purpose of this study has been to bring about a proper integration of the new sections into the existing syllabus and to give some instances of how the calculator can become an integral part of the teaching/learning process.
2

Robson, Geoffrey. „Multiple outlier detection and cluster analysis of multivariate normal data“. Thesis, Stellenbosch : Stellenbosch University, 2003. http://hdl.handle.net/10019.1/53508.

Annotation:
Thesis (MscEng)--Stellenbosch University, 2003.
ENGLISH ABSTRACT: Outliers may be defined as observations that are sufficiently aberrant to arouse the suspicion of the analyst as to their origin. They could be the result of human error, in which case they should be corrected, but they may also be an interesting exception, and this would deserve further investigation. Identification of outliers typically consists of an informal inspection of a plot of the data, but this is unreliable for dimensions greater than two. A formal procedure for detecting outliers allows for consistency when classifying observations. It also enables one to automate the detection of outliers by using computers. The special case of univariate data is treated separately to introduce essential concepts, and also because it may well be of interest in its own right. We then consider techniques used for detecting multiple outliers in a multivariate normal sample, and go on to explain how these may be generalized to include cluster analysis. Multivariate outlier detection is based on the Minimum Covariance Determinant (MCD) subset, and is therefore treated in detail. Exact bivariate algorithms were refined and implemented, and the solutions were used to establish the performance of the commonly used heuristic, Fast–MCD.
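The MCD-based screening step that this line of work relies on is easy to illustrate. The sketch below is a minimal example, not the thesis's exact bivariate algorithms: it uses scikit-learn's MinCovDet (an implementation of the Fast-MCD heuristic benchmarked above), and the synthetic data, the planted outliers and the 97.5% chi-square cutoff are all illustrative assumptions.

```python
# Illustrative sketch: flag multivariate outliers whose robust Mahalanobis
# distance from the MCD fit exceeds a chi-square cutoff.
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet  # Fast-MCD heuristic

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=200)
X[:5] += 6  # plant a few gross outliers

mcd = MinCovDet(random_state=0).fit(X)
d2 = mcd.mahalanobis(X)                  # squared robust distances
cutoff = chi2.ppf(0.975, df=X.shape[1])  # conventional 97.5% cutoff
print(np.flatnonzero(d2 > cutoff))       # indices of flagged observations
```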
3

Dunagan, John D. (John David) 1976. „A geometric theory of outliers and perturbation“. Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/8396.

Annotation:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Mathematics, 2002.
Includes bibliographical references (p. 91-94).
We develop a new understanding of outliers and the behavior of linear programs under perturbation. Outliers are ubiquitous in scientific theory and practice. We analyze a simple algorithm for removal of outliers from a high-dimensional data set and show the algorithm to be asymptotically good. We extend this result to distributions that we can access only by sampling, and also to the optimization version of the problem. Our results cover both the discrete and continuous cases. This is joint work with Santosh Vempala. The complexity of solving linear programs has interested researchers for half a century now. We show that an arbitrary linear program subject to a small random relative perturbation has good condition number with high probability, and hence is easy to solve. This is joint work with Avrim Blum, Daniel Spielman, and Shang-Hua Teng. This result forms part of the smoothed analysis project initiated by Spielman and Teng to better explain mathematically the observed performance of algorithms.
by John D. Dunagan.
Ph.D.
4

Kuo, Jonny. „Outlier detection for overnight index swaps“. Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-173378.

Annotation:
In this thesis, methods for anomaly detection in time series data are investigated. Given data for overnight index swaps (SEK), synthetic data has been created with different types of anomalies. A comparison between the Isolation Forest and Local Outlier Factor algorithms is made by measuring their respective performance on the synthetic data sets against Accuracy, Precision, Recall, F-measure and the Matthews correlation coefficient.
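An experiment of this shape can be reproduced in outline with scikit-learn. The sketch below is a stand-in for the thesis's setup, not a reconstruction of it: the random-walk series playing the role of an OIS rate, the planted anomalies and all parameter choices are illustrative assumptions.

```python
# Sketch: compare Isolation Forest and Local Outlier Factor on a labeled
# synthetic series (1 = anomaly), scoring F1 and Matthews correlation.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import f1_score, matthews_corrcoef

rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(0, 0.02, 500))  # stand-in for an OIS rate series
labels = np.zeros(500, dtype=int)
spikes = rng.choice(500, size=10, replace=False)
x[spikes] += rng.choice([-1, 1], 10) * 0.5  # planted additive anomalies
labels[spikes] = 1
X = x.reshape(-1, 1)

pred_if = (IsolationForest(contamination=0.02, random_state=1)
           .fit_predict(X) == -1).astype(int)
pred_lof = (LocalOutlierFactor(n_neighbors=20, contamination=0.02)
            .fit_predict(X) == -1).astype(int)
for name, pred in [("IF", pred_if), ("LOF", pred_lof)]:
    print(name, f1_score(labels, pred), matthews_corrcoef(labels, pred))
```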
5

Schall, Robert. „Outliers and influence under arbitrary variance“. Doctoral thesis, University of Cape Town, 1986. http://hdl.handle.net/11427/21913.

Annotation:
Using a geometric approach to best linear unbiased estimation in the general linear model, the additional sum of squares principle, used to generate decompositions, can be generalized allowing for an efficient treatment of augmented linear models. The notion of the admissibility of a new variable is useful in augmenting models. Best linear unbiased estimation and tests of hypotheses can be performed through transformations and reparametrizations of the general linear model. The theory of outliers and influential observations can be generalized so as to be applicable for the general univariate linear model, where three types of outlier and influence may be distinguished. The adjusted models, adjusted parameter estimates, and test statistics corresponding to each type of outlier are obtained, and data adjustments can be effected. Relationships to missing data problems are exhibited. A unified approach to outliers in the general linear model is developed. The concept of recursive residuals admits generalization. The typification of outliers and influential observations in the general linear model can be extended to normal multivariate models. When the outliers in a multivariate regression model follow a nested pattern, maximum likelihood estimation of the parameters in the model adjusted for the different types of outlier can be performed in closed form, and the corresponding likelihood ratio test statistic is obtained in closed form. For an arbitrary outlier pattern, and for the problem of outliers in the generalized multivariate regression model, three versions of the EM-algorithm corresponding to three types of outlier are used to obtain maximum likelihood estimates iteratively. A fundamental principle is the comparison of observations with a choice of distribution appropriate to the presumed type of outlier present. Applications are not necessarily restricted to multivariate normality.
6

Sedman, Robin. „Online Outlier Detection in Financial Time Series“. Thesis, KTH, Matematisk statistik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-228069.

Annotation:
In this Master's thesis, different models for outlier detection in financial time series are examined. The financial time series are price series such as index prices or asset prices. Outliers are, in this thesis, defined as extreme and false points, but this definition is also investigated and revised. Two different time series models are examined: an autoregressive (AR) and a generalized autoregressive conditional heteroskedastic (GARCH) time series model, as well as one test statistic method based on the GARCH model. Additionally, a nonparametric model is examined, which utilizes kernel density estimation in order to detect outliers. The models are evaluated by how well they detect outliers, how often they misclassify inliers, and their run time. It is found that all the models perform approximately equally well on the data sets used in the thesis and the simulations done, in terms of how well they find outliers, apart from the test statistic method, which performs worse than the others. Furthermore, it is found that the definition of an outlier is crucial to how well a model detects outliers. For the application of this thesis, the run time is an important aspect, and with this in mind an autoregressive model with a Student's t noise distribution is found to be the best one, with respect to how well it detects outliers, how rarely it misclassifies inliers, and its run time.
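A minimal sketch of the winning idea, an autoregressive model with Student's t noise whose one-step-ahead residuals are thresholded, is given below. It is not the thesis's implementation; the AR(1) order, the degrees of freedom and the 99% flagging level are illustrative assumptions.

```python
# Sketch: flag points whose AR(1) residual falls outside a Student's t band.
import numpy as np
from scipy import stats

def ar1_outliers(y, df=4, level=0.99):
    # least-squares AR(1) fit: y_t = c + phi * y_{t-1} + e_t
    Y, X = y[1:], np.column_stack([np.ones(len(y) - 1), y[:-1]])
    (c, phi), *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - (c + phi * y[:-1])
    t = stats.t(df, scale=np.std(resid, ddof=2))
    lo, hi = t.ppf((1 - level) / 2), t.ppf(1 - (1 - level) / 2)
    return 1 + np.flatnonzero((resid < lo) | (resid > hi))  # indices into y

rng = np.random.default_rng(2)
y = np.empty(300); y[0] = 0.0
for i in range(1, 300):
    y[i] = 0.9 * y[i - 1] + rng.normal(0, 1)
y[150] += 8  # planted outlier
print(ar1_outliers(y))
```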
7

Dishman, Tamarah Crouse. „Identifying Outliers in a Random Effects Model For Longitudinal Data“. UNF Digital Commons, 1989. http://digitalcommons.unf.edu/etd/191.

Annotation:
Identifying non-tracking individuals in a population of longitudinal data has many applications as well as complications. The analysis of longitudinal data is a special study in itself. There are several accepted methods; of those, we chose a two-stage random effects model coupled with the Estimation Maximization algorithm (E-M algorithm). Our project consisted of first estimating population parameters using the previously mentioned methods. The Mahalanobis distance was then used to sequentially identify and eliminate non-trackers from the population. Computer simulations were run in order to measure the algorithm's effectiveness. Our results show that the average specificity over the repetitions of each simulation remained at the 99% level. The sensitivity was best when only a single non-tracker with a very different parameter a was present. The sensitivity of the program decreased when more than one non-tracker was present, indicating that our method of identifying a non-tracker is not effective when the estimates of the population parameters are contaminated.
8

Liu, Jie. „Exploring Ways of Identifying Outliers in Spatial Point Patterns“. Digital Commons @ East Tennessee State University, 2015. https://dc.etsu.edu/etd/2528.

Annotation:
This work discusses alternative methods to detect outliers in spatial point patterns. Outliers are defined based on location only and also with respect to associated variables. Throughout the thesis we discuss five case studies, three of them come from experiments with spiders and bees, and the other two are data from earthquakes in a certain region. One of the main conclusions is that when detecting outliers from the point of view of location we need to take into consideration both the degree of clustering of the events and the context of the study. When detecting outliers from the point of view of an associated variable, outliers can be identified from a global or local perspective. For global outliers, one of the main questions addressed is whether the outliers tend to be clustered or randomly distributed in the region. All the work was done using the R programming language.
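The thesis itself works in R; purely to illustrate the location-based notion of an outlier it describes, the Python sketch below flags events whose distance to their k-th nearest neighbour is extreme for the pattern. The choice k = 4 and the boxplot-style fence are illustrative assumptions; as the thesis stresses, the degree of clustering and the study context would have to inform such choices in practice.

```python
# Sketch: location outliers in a spatial point pattern, defined as events
# with an unusually large k-nearest-neighbour distance.
import numpy as np
from scipy.spatial import cKDTree

def location_outliers(points, k=4):
    tree = cKDTree(points)
    # distance to the k-th nearest neighbour (column 0 is the point itself)
    dk = tree.query(points, k=k + 1)[0][:, -1]
    q1, q3 = np.percentile(dk, [25, 75])
    fence = q3 + 1.5 * (q3 - q1)  # boxplot-style upper fence
    return np.flatnonzero(dk > fence)

rng = np.random.default_rng(3)
pts = np.vstack([rng.uniform(0, 1, (100, 2)), [[3.0, 3.0]]])  # one remote event
print(location_outliers(pts))
```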
9

Katshunga, Dominique. „Identifying outliers and influential observations in general linear regression models“. Thesis, University of Cape Town, 2004. http://hdl.handle.net/11427/6772.

Annotation:
Includes bibliographical references (leaves 140-149).
Identifying outliers and/or influential observations is a fundamental step in any statistical analysis, since their presence is likely to lead to erroneous results. Numerous measures have been proposed for detecting outliers and assessing the influence of observations on least squares regression results. Since outliers can arise in different ways, the above-mentioned measures are based on motivational arguments, and they are designed to measure the influence of observations on different aspects of various regression results. In what follows, we investigate how one can combine different test statistics based on residuals and diagnostic plots to identify outliers and influential observations (both in the single and the multiple case) in general linear regression models.
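Combining several residual-based statistics, as proposed here, might look as follows in outline. The sketch uses statsmodels' standard influence measures (externally studentized residuals and Cook's distance); the cutoffs 2 and 4/n are common rules of thumb, not the thesis's, and the data are synthetic.

```python
# Sketch: flag observations that trip either of two standard diagnostics.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=100)
y[10] += 8  # planted outlier

infl = sm.OLS(y, X).fit().get_influence()
suspect = (np.abs(infl.resid_studentized_external) > 2) \
          | (infl.cooks_distance[0] > 4 / len(y))
print(np.flatnonzero(suspect))
```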
10

Giziaki, Ernestini. „Tests for uncharacteristic changes in time series data and the effects of outliers on forecasts“. Thesis, London Metropolitan University, 1987. http://repository.londonmet.ac.uk/3115/.

Annotation:
The thesis deals with some of the anomalies that affect the predictive performance of univariate time series. This project should help to improve the forecasts made, and should also assist those engaged in time series forecasting in real-life situations in industry, government and elsewhere. The problem of testing a set of data for outliers is not new in statistics, methods having been proposed for the general linear model. However, there are very few papers on testing time series data for outliers. The greater part of the thesis is concerned with the effects of outliers on forecasts, statistical methods for the detection of outliers, and the comparison of these methods. Applications of these methods in real-life situations are also considered. A subsidiary part of the thesis is concerned with another type of anomaly, a shift in the level of the series. Very few papers have been published on this topic; these papers are reviewed, and tests for the detection of this type of anomaly are proposed. The final section considers the contribution made, the findings of the work, and areas for further research.
11

Van, Deventer Petrus Jacobus Uys. „Outliers, influential observations and robust estimation in non-linear regression analysis and discriminant analysis“. Doctoral thesis, University of Cape Town, 1993. http://hdl.handle.net/11427/4363.

12

Gustavsson, Hanna. „Clustering Based Outlier Detection for Improved Situation Awareness within Air Traffic Control“. Thesis, KTH, Optimeringslära och systemteori, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-264215.

Annotation:
The aim of this thesis is to examine clustering-based outlier detection algorithms on their ability to detect abnormal events in flight traffic. A nominal model is trained on a data set containing only flights which are labeled as normal. A detection scoring function based on the nominal model is used to decide whether a new, previously unseen data point behaves like the nominal model or not. Due to the unknown structure of the data set, three different clustering algorithms are examined for training the nominal model: K-means, Gaussian Mixture Model and Spectral Clustering. Depending on the nominal model, different methods to obtain a detection score are used, such as metric distance, probability and a One-Class Support Vector Machine. This thesis concludes that a clustering-based outlier detection algorithm is feasible for detecting abnormal events in flight traffic. The best performance was obtained by using Spectral Clustering combined with a One-Class Support Vector Machine. The accuracy on the test data set was 95.8%. The algorithm correctly classified 89.4% of the data points labeled as abnormal and 96.2% of the data points labeled as normal.
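A rough sketch of the best-performing combination, clustering the nominal data and wiring the result to a One-Class SVM, is given below. How the thesis actually couples the two is not reproduced here; fitting one OCSVM per spectral cluster and declaring a point abnormal when every detector rejects it is one plausible reading, and all parameters are illustrative assumptions.

```python
# Sketch: spectral clustering of nominal data, one One-Class SVM per cluster.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(5)
nominal = np.vstack([rng.normal(0, 0.3, (150, 2)),
                     rng.normal(3, 0.3, (150, 2))])  # two normal regimes

labels = SpectralClustering(n_clusters=2, random_state=5).fit_predict(nominal)
detectors = [OneClassSVM(nu=0.05, gamma="scale").fit(nominal[labels == c])
             for c in np.unique(labels)]

def is_abnormal(x):
    # abnormal only if every cluster's detector rejects the point
    return all(d.predict(x.reshape(1, -1))[0] == -1 for d in detectors)

print(is_abnormal(np.array([1.5, 1.5])), is_abnormal(np.array([0.1, 0.2])))
```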
13

Schnepper, Teresa [Verfasser]. „Location Problems with k-max Functions - Modelling and Analysing Outliers in Center Problems / Teresa Schnepper“. Wuppertal : Universitätsbibliothek Wuppertal, 2017. http://d-nb.info/1138639443/34.

14

Schubert, Erich. „Generalized and efficient outlier detection for spatial, temporal, and high-dimensional data mining“. Diss., Ludwig-Maximilians-Universität München, 2013. http://nbn-resolving.de/urn:nbn:de:bvb:19-166938.

Annotation:
Knowledge Discovery in Databases (KDD) is the process of extracting non-trivial patterns from large databases, with the focus of extracting novel, potentially useful, statistically valid and understandable patterns. The process involves multiple phases including selection, preprocessing, evaluation and the analysis step known as Data Mining. One of the key techniques of Data Mining is outlier detection, that is, the identification of observations that are unusual and seemingly inconsistent with the majority of the data set. Such rare observations can have various causes: they can be measurement errors, unusually extreme (but valid) measurements, data corruption or even manipulated data. Over the previous years, various outlier detection algorithms have been proposed that often appear to be only slightly different from previous ones but "clearly outperform" the others in the experiments. A key focus of this thesis is to unify and modularize the various approaches into a common formalism, to make the analysis of the actual differences easier, but at the same time to increase the flexibility of the approaches by allowing the addition and replacement of modules to adapt the methods to different requirements and data types. To show the benefits of the modularized structure, (i) several existing algorithms are formalized within the new framework, (ii) new modules are added that improve the robustness, efficiency, statistical validity and score usability and that can be combined with existing methods, (iii) modules are modified to allow existing and new algorithms to run on other, often more complex data types including spatial, temporal and high-dimensional data spaces, (iv) the combination of multiple algorithm instances into an ensemble method is discussed, and (v) the scalability to large data sets is improved using approximate as well as exact indexing. The starting point is the Local Outlier Factor (LOF) algorithm, which is extended with slight modifications to increase the robustness and the usability of the produced scores. In order to get the same benefits for other methods, these methods are abstracted to a general framework for local outlier detection. By abstracting from a single vector space, other data types that involve spatial and temporal relationships can be analyzed. The use of subspace and correlation neighborhoods allows the algorithms to detect new kinds of outliers in arbitrarily oriented subspaces. Improvements in the score normalization bring back a statistical intuition of probabilities to the outlier scores, which previously were only useful for ranking objects, while improved models also offer explanations of why an object was considered to be an outlier. Subsequently, improved versions of various modules of the framework are presented that, for example, allow the same algorithms to run on significantly larger data sets, in approximately linear instead of quadratic complexity, by accepting approximated neighborhoods at little loss in precision and effectiveness. Additionally, multiple algorithms with different intuitions can be run at the same time, and the results combined into an ensemble method that is able to detect outliers of different types. Finally, new outlier detection methods are constructed, customized for the specific problems of the real data sets at hand. The new methods allow insightful results to be obtained that could not be obtained with the existing methods. Since they are constructed from the same building blocks, however, there exists a strong and explicit connection to the previous approaches, and by using the indexing strategies introduced earlier, the algorithms can be executed efficiently even on large data sets.
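As a reference point for the framework described above, the sketch below runs the plain LOF baseline in scikit-learn and naively rescales its scores into [0, 1]. The thesis's statistically motivated score normalizations are considerably more careful than this min-max style rescaling, which is only an illustrative assumption.

```python
# Sketch: the LOF baseline, with a naive [0, 1] rescaling of the scores.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 1, (200, 2)), [[8.0, 8.0]]])

lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
scores = -lof.negative_outlier_factor_  # LOF values; ~1 means inlier
prob_like = np.clip((scores - 1.0) / (scores.max() - 1.0), 0.0, 1.0)
print(prob_like.argmax(), prob_like.max())  # the planted point scores ~1
```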
15

Ghawi, Christina. „Forecasting Volume of Sales During the Abnormal Time Period of COVID-19. An Investigation on How to Forecast, Where the Classical ARIMA Family of Models Fail“. Thesis, KTH, Matematisk statistik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302396.

Annotation:
During the COVID-19 pandemic, customer shopping habits have changed. Some industries experienced an abrupt shift during the pandemic outbreak while others navigate in new normal states. For some merchants, the highly uncertain new phenomenon of COVID-19 expresses itself as outliers in time series of volume of sales. As forecasting models tend to replicate the past behavior of a series, outliers complicate the procedure of forecasting; the abnormal events tend to be replicated unreliably in forecasts for the subsequent year(s). In this thesis, we investigate how to forecast volume of sales during the abnormal time period of COVID-19, where the classical ARIMA family of models produces unreliable forecasts. The research revolved around three time series exhibiting three types of outliers: a level shift, a transient change and an additive outlier. Upon detecting the time period of the abnormal behavior in each series, two experiments were carried out as attempts at increasing the predictive accuracy for the three extreme cases. The first experiment was related to imputing the abnormal data in the series, and the second was related to using a combined model of a pre-pandemic and a post-abnormal forecast. The results of the experiments pointed to a significant improvement in the mean absolute percentage error, at significance level alpha = 0.05, for the level shift when using a combined model compared to the pre-pandemic best-fit SARIMA model, and to a significant improvement for the additive outlier when using a linear impute. For the transient change, the results pointed to no significant improvement in the predictive accuracy of the experimental models compared to the pre-pandemic best-fit SARIMA model. For the purpose of generalizing to large-scale conclusions about methods' superiority or feasibility for particular abnormal behaviors, empirical evaluations are required. The proposed experimental models were discussed in terms of reliability, validity and quality. By residual diagnostics, it was argued that the models were valid, but that further improvements can be made. It was also argued that the models fulfilled the desired attributes of simplicity, scalability and flexibility. Due to the uncertain phenomena of the COVID-19 pandemic, it was suggested not to take the outputs as long-term reliable solutions, but rather as temporary solutions requiring more frequent updating of forecasts.
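The imputation experiment, which helped for the additive outlier, can be sketched as follows: replace the flagged abnormal window with a linear bridge and refit a SARIMA model. The monthly toy series, the flagged window and the (1,1,1)x(1,1,1,12) order are illustrative assumptions, not the thesis's fitted models.

```python
# Sketch: linear impute of an abnormal window, then SARIMA refit + forecast.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(7)
idx = pd.date_range("2017-01-01", periods=60, freq="MS")
sales = pd.Series(100 + np.arange(60)
                  + 10 * np.sin(np.arange(60) * 2 * np.pi / 12)
                  + rng.normal(0, 2, 60), index=idx)
sales.loc["2020-03":"2020-08"] *= 0.4  # abnormal pandemic dip

cleaned = sales.copy()
cleaned.loc["2020-03":"2020-08"] = np.nan
cleaned = cleaned.interpolate(method="linear")  # linear bridge over the window

fit = SARIMAX(cleaned, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
print(fit.forecast(steps=6))
```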
16

Paulin, Carl. „Detecting anomalies in data streams driven by ajump-diffusion process“. Thesis, Umeå universitet, Institutionen för fysik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-184230.

Annotation:
Jump-diffusion processes often model financial time series, as they can simulate the random jumps that these series frequently exhibit. These jumps can be seen as anomalies and are essential for financial analysis and model building, making them vital to detect. The realized variation, realized bipower variation, and realized semi-variation were tested to see if one could use them to detect jumps in a jump-diffusion process and if anomaly detection algorithms can use them as features to improve their accuracy. The algorithms tested were Isolation Forest, Robust Random Cut Forest, and the Isolation Forest Algorithm for Streaming Data, where the latter two use streaming data. This was done by generating a Merton jump-diffusion process with a varying jump rate and testing each algorithm with each of the features. The performance of each algorithm was measured using the F1-score to compare the differences between features and algorithms. It was found that the algorithms were improved by using the features; Isolation Forest saw improvement from using one or more of the named features. For the streaming algorithms, Robust Random Cut Forest performed the best for every jump rate except the lowest. Using a combination of the features gave the highest F1-score for both streaming algorithms. These results show one can use these features to extract jumps, as anomaly scores, and improve the accuracy of the algorithms, in both a batch and a stream setting.
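The realized measures used as features are simple functions of log-returns, sketched below; RV minus BPV is large in windows containing a jump, which is what makes these quantities useful anomaly scores. The window length and the planted jump size are illustrative.

```python
# Sketch: realized variation (RV), realized bipower variation (BPV),
# a jump proxy max(RV - BPV, 0), and signed semi-variations.
import numpy as np

def realized_features(returns):
    rv = np.sum(returns ** 2)                                 # realized variation
    bpv = (np.pi / 2) * np.sum(np.abs(returns[1:]) * np.abs(returns[:-1]))
    rsv_neg = np.sum(returns[returns < 0] ** 2)               # downside semi-variation
    rsv_pos = np.sum(returns[returns > 0] ** 2)               # upside semi-variation
    return rv, bpv, max(rv - bpv, 0.0), rsv_neg, rsv_pos

rng = np.random.default_rng(8)
r = rng.normal(0, 0.01, 390)  # one "day" of intraday log-returns
r[200] += 0.05                # a jump inside the window
print(realized_features(r))
```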
17

Razroev, Stanislav. „AUTOMATED OPTIMAL FORECASTING OF UNIVARIATE MONITORING PROCESSES : Employing a novel optimal forecast methodology to define four classes of forecast approaches and testing them on real-life monitoring processes“. Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-165990.

Annotation:
This work aims to explore practical one-step-ahead forecasting of structurally changing data, an unstable behaviour that real-life data connected to human activity often exhibit. This setting can be characterized as a monitoring process. Various forecast models, methods and approaches can range from being simple and computationally "cheap" to very sophisticated and computationally "expensive". Moreover, different forecast methods handle different data patterns and structural changes differently: for some particular data types or data intervals some particular forecast methods are better than others, something that is usually not known beforehand. This raises a question: "Can one design a forecast procedure that effectively and optimally switches between various forecast methods, adapting the forecast methods' usage to the changes in the incoming data flow?" The thesis answers this question by introducing an optimality concept that allows optimal switching between simultaneously executed forecast methods, thus "tailoring" forecast methods to the changes in the data. It is also shown how another forecast approach, combinational forecasting, where forecast methods are combined using a weighted average, can be utilized by the optimality principle and can therefore benefit from it. Thus, four classes of forecast results can be considered and compared: basic forecast methods, basic optimality, combinational forecasting, and combinational optimality. The thesis shows that the usage of optimality gives results where, most of the time, optimality is no worse than or better than the best of the forecast methods that it is based on. Optimality also reduces the scattering of a multitude of various forecast suggestions to a single number or only a few numbers (in a controllable fashion). Optimality additionally gives a lower bound for optimal forecasting: the hypothetically best achievable forecast result. The main conclusion is that the optimality approach makes more or less obsolete other traditional ways of treating monitoring processes: trying to find the single best forecast method for some structurally changing data. This search can still be pursued, of course, but it is best done within the optimality approach as an innate component of it. All this makes the proposed optimality approach for forecasting purposes a valid "representative" of a broader ensemble approach (which likewise motivated the development of the now popular Ensemble Learning concept as a valid part of the Machine Learning framework).
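One simplified reading of the optimality concept, running several one-step-ahead forecasters in parallel and emitting at each step the forecast of whichever method has the smallest trailing error, is sketched below. The three toy methods and the error window are illustrative assumptions; the thesis's optimality principle is more general than this.

```python
# Sketch: switch at each step to the forecaster with the smallest
# trailing mean absolute one-step-ahead error.
import numpy as np

def switching_forecast(y, window=10):
    methods = {
        "naive": lambda h: h[-1],                   # last value
        "mean":  lambda h: np.mean(h[-window:]),    # trailing mean
        "drift": lambda h: h[-1] + (h[-1] - h[-2]), # local linear drift
    }
    errs = {m: [] for m in methods}
    out = []
    for t in range(2, len(y)):
        preds = {m: f(y[:t]) for m, f in methods.items()}
        best = min(methods,
                   key=lambda m: np.mean(np.abs(errs[m][-window:]))
                   if errs[m] else 0.0)
        out.append(preds[best])
        for m in methods:
            errs[m].append(y[t] - preds[m])
    return np.array(out)

rng = np.random.default_rng(9)
y = np.concatenate([rng.normal(0, 1, 100), 5 + rng.normal(0, 1, 100)])  # level shift
print(np.round(switching_forecast(y)[:5], 2))
```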
18

Truran, John Maxwell. „The Teaching and Learning of Probability, with Special Reference to South Australian Schools from 1959-1994“. 2001. http://hdl.handle.net/2440/37837.

Annotation:
The teaching of probability in schools provides a good opportunity for examining how a new topic is integrated into a school curriculum. Furthermore, because probabilistic thinking is quite different from the deterministic thinking traditionally found in mathematics classrooms, such an examination is particularly able to highlight significant forces operating within educational practice. After six chapters which describe relevant aspects of the philosophical, cultural, and intellectual environment within which probability has been taught, a 'Broad-Spectrum Ecological Model' is developed to examine the forces which operate on a school system. The Model sees school systems and their various participants as operating according to general ecological principles, and interprets actions as responses to situations in ways which minimise energy expenditure and maximise chances of survival. The Model posits three principal forces (Physical, Social and Intellectual) as providing an adequate structure. The value of the Model as an interpretative framework is then assessed by examining three separate aspects of the teaching of probability. The first is a general survey of the history of the teaching of the topic from 1959 to 1994, paying particular attention to South Australia, but making some comparisons with other countries and other states of Australia. The second examines in detail attempts which have been made throughout the world to assess the understanding of probabilistic ideas. The third addresses the influence on classroom practice of research into the teaching and learning of probabilistic ideas. In all three situations the Model is shown to be a helpful way of interpreting the data, but to need some refinements. This involves the uniting of the Social and Physical forces, the division of the Intellectual force into Mathematics and Mathematics Education forces, and the addition of Pedagogical and Charismatic forces. A diagrammatic form of the Model is constructed which provides a way of indicating the relative strengths of these forces. The initial form is used throughout the thesis for interpreting the events described. The revised form is then defined and assessed, particularly against alternative explanations of the events described, and also used for drawing some comparisons with medical education. The Model appears to be effective in highlighting uneven forces and in predicting outcomes which are likely to arise from such asymmetries, and this potential predictive power is assessed for one small case study. All models have limitations, but this one seems to explain far more than the other models used for mathematics curriculum development in Australia, which have tended to see our practice as an imitation of that in other countries.
Thesis (Ph.D.)--Graduate School of Education and Department of Pure Mathematics, 2001.
19

Truran, J. M. (John M. ). „The teaching and learning of probability, with special reference to South Australian schools from 1959-1994“. 2001. http://web4.library.adelaide.edu.au/theses/09PH/09pht872.pdf.

20

Wen, Zhen-Fu, and 溫振甫. „High school mathematics textbooks inquiry generic example: 99 lesson outline (a), (b) Case Volume“. Thesis, 2015. http://ndltd.ncl.edu.tw/handle/73243089709604257213.

21

Rauch, Geraldine [author]. „The LORELIA residual test : a new outlier identification test for method comparison studies / by Geraldine Rauch“. 2009. http://d-nb.info/998406295/34.

22

Chen, Xin. „Robust second-order least squares estimation for linear regression models“. Thesis, 2009. http://hdl.handle.net/1828/3087.

Annotation:
The second-order least squares estimator (SLSE), which was proposed by Wang (2003), is asymptotically more efficient than the least squares estimator (LSE) if the third moment of the error distribution is nonzero. However, it is not robust against outliers. In this paper, we propose two robust second-order least squares estimators (RSLSEs) for linear regression models, RSLSE-I and RSLSE-II, where RSLSE-I is robust against X-outliers and RSLSE-II is robust against X-outliers and Y-outliers. The basic idea is to choose proper weight matrices, which give a zero weight to an outlier. The RSLSEs are asymptotically normally distributed and are highly efficient with a high breakdown point. Moreover, we compare the RSLSEs with the LSE, the SLSE and the robust MM-estimator through simulation studies and real data examples. The results show that they perform very well and are competitive with other robust regression estimators.
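The criterion behind these estimators can be sketched compactly. With identity weight matrices, the SLSE minimizes the sum over i of rho_i' rho_i, where rho_i = (y_i - x_i'beta, y_i^2 - (x_i'beta)^2 - sigma^2)'. The sketch below does exactly that and, in the spirit of RSLSE-I, optionally zero-weights high-leverage rows; the leverage rule used here is an illustrative assumption, not the paper's actual weighting scheme.

```python
# Sketch: second-order least squares with optional per-row weights.
import numpy as np
from scipy.optimize import minimize

def slse(X, y, w=None):
    n, p = X.shape
    w = np.ones(n) if w is None else w
    def crit(theta):
        b, s2 = theta[:p], theta[p]
        m = X @ b
        # sum_i w_i * rho_i' rho_i with identity weight matrices
        return np.sum(w * ((y - m) ** 2 + (y ** 2 - m ** 2 - s2) ** 2))
    b0 = np.linalg.lstsq(X, y, rcond=None)[0]  # LSE as starting value
    s0 = np.var(y - X @ b0)
    res = minimize(crit, np.append(b0, s0), method="Nelder-Mead")
    return res.x[:p], res.x[p]

rng = np.random.default_rng(10)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ [1.0, 2.0] + rng.standard_gamma(2.0, 200) - 2.0  # skewed errors
h = np.einsum("ij,ji->i", X, np.linalg.pinv(X))          # leverages
beta, sigma2 = slse(X, y, w=(h < 2 * X.shape[1] / len(y)).astype(float))
print(beta, sigma2)
```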
23

Chaka, Lyson. „Impact of unbalancedness and heteroscedasticity on classic parametric significance tests of two-way fixed-effects ANOVA tests“. Diss., 2016. http://hdl.handle.net/10500/23287.

Annotation:
Classic parametric statistical tests, like the analysis of variance (ANOVA), are powerful tools for comparing population means. These tests produce accurate results provided the data satisfy underlying assumptions such as homoscedasticity and balancedness; otherwise, biased results are obtained. However, these assumptions are rarely satisfied in real life, and alternative procedures must be explored. This thesis aims at investigating the impact of heteroscedasticity and unbalancedness on effect sizes in two-way fixed-effects ANOVA models. A real-life dataset, from which three different samples were simulated, was used to investigate the changes in effect sizes under the influence of unequal variances and unbalancedness. The parametric bootstrap approach was proposed for the case of unequal variances and non-normality. The results obtained indicated that heteroscedasticity significantly inflates effect sizes, while unbalancedness has a non-significant impact on effect sizes in two-way ANOVA models. However, the impact worsens when the data is both unbalanced and heteroscedastic.
Statistics
M. Sc. (Statistics)
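A parametric bootstrap for a two-way fixed-effects ANOVA under heteroscedasticity might proceed as sketched below: simulate responses under the null with each cell's own estimated variance and recompute the F statistic. The 2x2 design, the effect tested and the 500 replicates are illustrative assumptions, not the thesis's study design.

```python
# Sketch: parametric-bootstrap p-value for the A main effect in a
# heteroscedastic two-way fixed-effects ANOVA.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

def f_stat(df):
    table = sm.stats.anova_lm(ols("y ~ C(a) * C(b)", data=df).fit(), typ=2)
    return table.loc["C(a)", "F"]

rng = np.random.default_rng(11)
df = pd.DataFrame({"a": np.repeat([0, 1], 40),
                   "b": np.tile(np.repeat([0, 1], 20), 2)})
df["y"] = rng.normal(0, np.where(df["b"] == 0, 1.0, 3.0))  # unequal variances

f_obs = f_stat(df)
null_mean = ols("y ~ C(b)", data=df).fit().fittedvalues  # fit under H0: no A effect
cell_sd = df.groupby(["a", "b"])["y"].transform("std")   # cell-specific variances
f_boot = [f_stat(df.assign(y=null_mean + rng.normal(0, cell_sd)))
          for _ in range(500)]
print(np.mean(np.array(f_boot) >= f_obs))                # bootstrap p-value
```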
24

Dongmo, Jiongo Valéry. „Inférence robuste à la présence des valeurs aberrantes dans les enquêtes“. Thèse, 2015. http://hdl.handle.net/1866/13720.

Annotation:
This thesis focuses on the treatment of representative outliers in two important aspects of surveys: small area estimation and imputation for item non-response. Concerning small area estimation, robust estimators in unit-level models have been studied. Sinha & Rao (2009) proposed estimation procedures designed for small area means, based on robustified maximum likelihood estimates of the parameters of a linear mixed model and robust empirical best linear unbiased predictors of the random effect of the underlying model. Their robust methods for estimating area means are of the plug-in type, and in view of the results of Chambers (1986), the resulting robust estimators may be biased in some situations. Bias-corrected estimators have been proposed by Chambers et al. (2014). In addition, these robust small area estimators were associated with the estimation of the mean squared error (MSE). Sinha & Rao (2009) proposed a parametric bootstrap procedure based on the robust estimates of the parameters of the underlying linear mixed model to estimate the MSE. Analytical procedures for the estimation of the MSE have been proposed in Chambers et al. (2014). However, their theoretical validity has not been formally established and their empirical performances are not fully satisfactory. Here, we investigate two new approaches to a robust version of the empirical best linear unbiased predictor: the first one relies on the work of Chambers (1986), while the second proposal uses the concept of conditional bias as an influence measure to assess the impact of units in the population. These two classes of robust small area estimators also include a correction term for the bias. They are both fully bias-corrected, in the sense that the correction term takes into account the potential impact of the other domains on the small area of interest, unlike that of Chambers et al. (2014), which focuses only on the domain of interest. Under certain conditions, non-negligible bias is expected for the Sinha-Rao method, while the proposed methods exhibit significant bias reduction, controlled by appropriate choices of the influence function and tuning constants. Monte Carlo simulations are conducted, and comparisons are made between the new robust estimators, the Sinha-Rao estimator, and the bias-corrected estimator. Empirical results suggest that the Sinha-Rao method and the bias-adjusted estimator of Chambers et al. (2014) may exhibit a large bias, while the new procedures often offer better performance in terms of bias and mean squared error. In addition, we propose a new bootstrap procedure for MSE estimation of robust small area predictors. Unlike existing approaches, we formally prove the asymptotic validity of the proposed bootstrap method. Moreover, the proposed method is semi-parametric, i.e., it does not rely on specific distributional assumptions about the errors and random effects of the unit-level model underlying the small area estimation, so it is particularly attractive and more widely applicable. We assess the finite sample performance of our bootstrap estimator through Monte Carlo simulations. The results show that our procedure performs satisfactorily and outperforms existing ones. Application of the proposed method is illustrated by analyzing a well-known outlier-contaminated data set of small-county crop areas from North-Central Iowa farms and Landsat satellite images. Concerning imputation in the presence of item non-response, some single imputation methods have been studied.
Deterministic regression imputation, which includes ratio imputation and mean imputation, is often used in surveys. These imputation methods may lead to biased imputed estimators if the imputation model or the non-response model is not properly specified. Recently, doubly robust imputed estimators have been developed. However, in the presence of outliers, doubly robust imputed estimators can be very unstable. Using the concept of conditional bias as a measure of influence (Beaumont, Haziza and Ruiz-Gazen, 2013), we propose an outlier-robust version of the doubly robust imputed estimator; this estimator is therefore referred to as a triply robust imputed estimator. The results of simulation studies show that the proposed estimator performs satisfactorily for an appropriate choice of the tuning constant.
25

Gagnon, Philippe. „Sélection de modèles robuste : régression linéaire et algorithme à sauts réversibles“. Thèse, 2017. http://hdl.handle.net/1866/20583.
