Dissertations: "Kernel testing"

1

Lee, Kevin Sung-ho. "Kernel-adaptor interface testing of Project Timeliner." Thesis, Massachusetts Institute of Technology, 1995. http://hdl.handle.net/1721.1/49939.

2

Ozier-Lafontaine, Anthony. "Kernel-based testing and their application to single-cell data." Electronic Thesis or Diss., Ecole centrale de Nantes, 2023. http://www.theses.fr/2023ECDN0025.

Abstract:

Single-cell sequencing technologies measure information at the level of each cell in a population. The data they produce pose many challenges: numerous observations in high dimension, often sparse. Many biology experiments consist of comparing conditions. The aim of the thesis is to develop a set of tools that compare samples of data from single-cell sequencing technologies in order to detect and describe the differences between them. To this end, we propose applying existing two-sample comparison tests based on kernel methods. We propose generalizing these kernel tests to arbitrary experimental designs; the resulting test is inspired by the Hotelling-Lawley trace test. We implement these kernel tests for the first time in an R and Python package named ktest, and our applications to simulated and experimental data demonstrate their performance. Applying these methods to experimental data makes it possible to identify the observations that explain the detected differences. Finally, we propose an efficient implementation of these tests based on Nyström-type matrix factorizations, together with a set of diagnostic and interpretation tools that make these methods accessible and understandable to non-specialists.
Single-cell technologies generate data at the single-cell level. These data are composed of hundreds to thousands of observations (i.e. cells) and tens of thousands of variables (i.e. genes). New methodological challenges arose to fully exploit the potential of these complex data. A major statistical challenge is to distinguish biological information from technical noise in order to compare conditions or tissues. This thesis explores the application of kernel testing to single-cell datasets in order to detect and describe the potential differences between compared conditions. To overcome the limitations of existing kernel two-sample tests, we propose a kernel test inspired by the Hotelling-Lawley test that can apply to any experimental design. We implemented these tests in an R and Python package called ktest, which is their first user-oriented implementation. We demonstrate the performance of kernel testing on simulated datasets and on various experimental single-cell datasets. The geometrical interpretation of these methods makes it possible to identify the observations that drive a detected difference. Finally, we propose an efficient Nyström-based implementation of these kernel tests as well as a range of diagnostic and interpretation tools.
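As a rough illustration of the kind of existing kernel two-sample test the abstract refers to, the sketch below computes a biased maximum mean discrepancy (MMD) statistic with a Gaussian kernel on one-dimensional data and calibrates it by permutation. It is not the ktest package's API, nor the thesis's Hotelling-Lawley-inspired statistic; the function names, the bandwidth of 1.0 and the number of permutations are assumptions chosen for illustration.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two real numbers."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_k(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def permutation_pvalue(xs, ys, n_perm=200, seed=0):
    """Permutation p-value: reshuffle the pooled sample and recompute the statistic."""
    rng = random.Random(seed)  # hypothetical default seed, for reproducibility only
    observed = mmd2_biased(xs, ys)
    pooled = list(xs) + list(ys)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mmd2_biased(pooled[:len(xs)], pooled[len(xs):]) >= observed:
            exceed += 1
    # The "+1" terms keep the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

Calling `permutation_pvalue(xs, ys)` on two lists of floats returns a value in (0, 1]; small values suggest that the two samples come from different distributions.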

3

Kotlyarova, Yulia. "Kernel estimators : testing and bandwidth selection in models of unknown smoothness." Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=85179.

Abstract:

Semiparametric and nonparametric estimators are becoming indispensable tools in applied econometrics. Many of these estimators depend on the choice of smoothing bandwidth and kernel function. Optimality of such parameters is determined by unobservable smoothness of the model, that is, by differentiability of the distribution functions of random variables in the model. In this thesis we consider two estimators of this class: the smoothed maximum score estimator for binary choice models and the kernel density estimator.
We present theoretical results on the asymptotic distribution of the estimators under various smoothness assumptions and derive the limiting joint distributions for estimators with different combinations of bandwidths and kernel functions. Using these nontrivial joint distributions, we suggest a new way of improving accuracy and robustness of the estimators by considering a linear combination of estimators with different smoothing parameters. The weights in the combination minimize an estimate of the mean squared error. Monte Carlo simulations confirm suitability of this method for both smooth and non-smooth models.
For the original and smoothed maximum score estimators, a formal procedure is introduced to test for equivalence of the maximum likelihood estimators and these semiparametric estimators, which converge to the true value at slower rates. The test allows one to identify heteroskedastic misspecifications in the logit/probit models. The method has been applied to analyze the decision of married women to join the labour force.
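To make the role of the bandwidth concrete, here is a minimal sketch of a Gaussian kernel density estimator and of a linear combination of estimators with different bandwidths. The weights are supplied by hand here, which is an assumption for illustration; the thesis chooses them by minimizing an estimate of the mean squared error, and this sketch does not attempt that step.

```python
import math


def gaussian_kde(x, sample, bandwidth):
    """Kernel density estimate at x: (1 / (n h)) * sum of K((x - X_i) / h)."""
    n = len(sample)
    total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)
    return total / (n * bandwidth * math.sqrt(2.0 * math.pi))


def combined_kde(x, sample, bandwidths, weights):
    """Weighted combination of estimates that use different bandwidths.

    The weights are hypothetical inputs; they should sum to one so that
    the combination still integrates to one.
    """
    return sum(w * gaussian_kde(x, sample, h) for w, h in zip(weights, bandwidths))
```

For example, `combined_kde(0.0, sample, [0.3, 0.8], [0.4, 0.6])` mixes a narrow and a wide bandwidth at the point 0.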

4

Liero, Hannelore. "Testing the Hazard Rate, Part I." Universität Potsdam, 2003. http://opus.kobv.de/ubp/volltexte/2011/5151/.

Abstract:

We consider a nonparametric survival model with random censoring. To test whether the hazard rate has a parametric form, the unknown hazard rate is estimated by a kernel estimator. Based on a limit theorem stating the asymptotic normality of the quadratic distance of this estimator from the smoothed hypothesis, an asymptotic α-test is proposed. Since the test statistic depends on the maximum likelihood estimator for the unknown parameter in the hypothetical model, properties of this parameter estimator are investigated. Power considerations complete the approach.

5

Friedrichs, Stefanie [author], Heike Bickeböller [academic supervisor], Thomas Kneib [reviewer], and Tim Beißbarth [reviewer]. "Kernel-Based Pathway Approaches for Testing and Selection / Stefanie Friedrichs; reviewers: Thomas Kneib, Tim Beißbarth; supervisor: Heike Bickeböller." Göttingen: Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2017. http://d-nb.info/114137952X/34.

6

Li, Yinglei. "Genetic Association Testing of Copy Number Variation." UKnowledge, 2014. http://uknowledge.uky.edu/statistics_etds/8.

Abstract:

Copy-number variation (CNV) has been implicated in many complex diseases. It is of great interest to detect and locate such regions through genetic association tests. However, association testing is complicated by the fact that CNVs usually span multiple markers, so that these markers are correlated with each other. To overcome this difficulty, it is desirable to pool information across the markers. In this thesis, we propose a kernel-based method for aggregating marker-level tests: we first obtain a set of p-values from association tests for every marker, and the association test for the CNV is then based on a statistic that combines these p-values. In addition, we explore several aspects of its implementation. Since p-values among markers are correlated, it is difficult to obtain the null distribution of the test statistic for kernel-based aggregation of marker-level tests. To solve this problem, we develop two methods, one permutation-based and one correlation-based, both of which are demonstrated to preserve the family-wise error rate of the test procedure. Many implementation aspects of the kernel-based method are compared through empirical power studies in a number of simulations constructed from real data from a pharmacogenomic study of gemcitabine. Further performance comparisons between the permutation-based and correlation-based approaches are also shown. We also apply the two approaches to the real data. The main contribution of the dissertation is the development of marker-level association testing, a comparable and powerful approach for detecting phenotype-associated CNVs. Furthermore, the approach is extended to the high-dimensional setting with high efficiency.

7

Akcin, Haci Mustafa. "NONPARAMETRIC INFERENCES FOR THE HAZARD FUNCTION WITH RIGHT TRUNCATION." Digital Archive @ GSU, 2013. http://digitalarchive.gsu.edu/math_diss/12.

Abstract:

Incompleteness is a major feature of time-to-event data. As one type of incompleteness, truncation refers to the unobservability of the time-to-event variable because it is smaller (or greater) than the truncation variable. A truncated sample always involves left and right truncation. Left truncation has been studied extensively, while right truncation has not received the same level of attention. In one of the earliest studies on right truncation, Lagakos et al. (1988) proposed transforming a right-truncated variable into a left-truncated variable and then applying existing methods to the transformed variable. The reverse-time hazard function is introduced through this transformation. However, this quantity does not have a natural interpretation. Gaps remain in inference for the regular forward-time hazard function with right-truncated data. This dissertation discusses variance estimation of the cumulative hazard estimator, a one-sample log-rank test, and the comparison of hazard rate functions among a finite number of independent samples in the context of right truncation. First, the relation between the reverse- and forward-time cumulative hazard functions is clarified. This relation leads to nonparametric inference for the cumulative hazard function. Jiang (2010) recently conducted research in this direction and proposed two variance estimators of the cumulative hazard estimator. Some revisions to the variance estimators are suggested in this dissertation and evaluated in a Monte Carlo study. Second, this dissertation studies hypothesis testing for right-truncated data. A series of tests is developed with the hazard rate function as the target quantity. A one-sample log-rank test is discussed first, followed by a family of weighted tests for comparing a finite number $K$ of samples. Particular weight functions lead to the log-rank, Gehan and Tarone-Ware tests, and these three tests are evaluated in a Monte Carlo study.
Finally, this dissertation studies the nonparametric inference for the hazard rate function for the right truncated data. The kernel smoothing technique is utilized in estimating the hazard rate function. A Monte-Carlo study investigates the uniform kernel smoothed estimator and its variance estimator. The uniform, Epanechnikov and biweight kernel estimators are implemented in the example of blood transfusion infected AIDS data.
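The three kernels named above have standard closed forms, sketched below together with a generic kernel smoother for a hazard rate. The smoother takes the jump times and increments of some cumulative hazard estimator as inputs. Those inputs and all function names are assumptions for illustration; the dissertation's estimator for right-truncated data is not reproduced here.

```python
def uniform_kernel(u):
    """Uniform kernel: 1/2 on [-1, 1], zero elsewhere."""
    return 0.5 if abs(u) <= 1.0 else 0.0


def epanechnikov_kernel(u):
    """Epanechnikov kernel: (3/4)(1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def biweight_kernel(u):
    """Biweight kernel: (15/16)(1 - u^2)^2 on [-1, 1], zero elsewhere."""
    return (15.0 / 16.0) * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def smoothed_hazard(t, jump_times, increments, bandwidth, kernel):
    """Kernel-smoothed hazard at t: (1/b) * sum of K((t - s_i) / b) * dH(s_i).

    jump_times and increments are hypothetical inputs: the jump locations
    and jump sizes of a cumulative hazard estimator computed elsewhere.
    """
    weighted = sum(kernel((t - s) / bandwidth) * dh
                   for s, dh in zip(jump_times, increments))
    return weighted / bandwidth
```

Each kernel integrates to one, so the choice among them changes the smoothness of the estimate rather than its overall scale.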

8

Li, Na. "MMD and Ward criterion in a RKHS : application to Kernel based hierarchical agglomerative clustering." Thesis, Troyes, 2015. http://www.theses.fr/2015TROY0033/document.

Abstract:

Unsupervised classification consists of grouping objects to form groups that are homogeneous with respect to a similarity measure. It is a useful tool for exploring the structure of an unlabeled data set. Kernel methods, originally introduced in the supervised setting, have proved their value through their ability to process data nonlinearly while limiting algorithmic complexity. Indeed, they make it possible to transform a nonlinear problem into a linear one in a higher-dimensional space. In this work, we propose an agglomerative hierarchical clustering algorithm using the formalism of kernel methods. We first looked for similarity measures between probability distributions that can be easily computed with kernels. Among these, the maximum mean discrepancy caught our attention. To overcome the limitations inherent in its use, we proposed a modification that leads to Ward's criterion, well known in hierarchical clustering. Finally, we proposed an iterative clustering algorithm based on kernel hierarchical clustering that optimizes the kernel and determines the number of classes present.
Clustering, a useful tool for unsupervised classification, is the task of grouping objects according to some of their measured or perceived characteristics, and it has had great success in exploring the hidden structure of unlabeled data sets. Kernel-based clustering algorithms have become prominent. They provide competitive performance compared with conventional methods owing to their ability to transform nonlinear problems into linear ones in a higher-dimensional feature space. In this work, we propose a Kernel-based Hierarchical Agglomerative Clustering algorithm (KHAC) using Ward's criterion. Our method is motivated by a recently introduced criterion called Maximum Mean Discrepancy (MMD). This criterion was first proposed to measure the difference between distributions and can easily be embedded into an RKHS. Close relationships have been proved between MMD and Ward's criterion. In our KHAC method, the selection of the kernel parameter and the determination of the number of clusters have been studied, with satisfactory performance. Finally, an iterative KHAC algorithm is proposed which aims at determining the optimal kernel parameter, giving a meaningful number of clusters and partitioning the data set automatically.
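The link between MMD and Ward's criterion can be illustrated with a standard identity, given here as a sketch rather than as the thesis's own derivation. The biased MMD² estimate equals the squared distance between the two cluster means in feature space, and Ward's merge cost is that squared distance scaled by n_a n_b / (n_a + n_b). The function names and the default linear kernel are assumptions for illustration.

```python
def linear_kernel(x, y):
    """Linear kernel on real numbers; the feature space is the real line itself."""
    return x * y


def mean_kernel(a, b, kernel):
    """Average kernel value over all pairs (u, v) with u in a and v in b."""
    return sum(kernel(u, v) for u in a for v in b) / (len(a) * len(b))


def mmd2(a, b, kernel=linear_kernel):
    """Biased MMD^2: squared distance between the kernel mean embeddings of a and b."""
    return (mean_kernel(a, a, kernel) + mean_kernel(b, b, kernel)
            - 2.0 * mean_kernel(a, b, kernel))


def ward_merge_cost(a, b, kernel=linear_kernel):
    """Increase in within-cluster sum of squares (in feature space) when merging a and b."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * mmd2(a, b, kernel)
```

With the linear kernel this reduces to the usual Ward cost on the raw data; passing another positive definite kernel gives the same quantity computed in that kernel's feature space.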

9

Bissyande, Tegawende. "Contributions for improving debugging of kernel-level services in a monolithic operating system." Phd thesis, Université Sciences et Technologies - Bordeaux I, 2013. http://tel.archives-ouvertes.fr/tel-00821893.

Abstract:

While research on the quality of systems code has attracted tremendous interest, operating systems still struggle with reliability problems, notably due to programming bugs in kernel-level services such as device drivers and file system implementations. Studies have shown that each version of the Linux kernel contains between 600 and 700 faults, and that device drivers are up to seven times more likely to contain errors than any other part of the kernel. These figures suggest that kernel service code is not sufficiently tested and that many defects go unnoticed or are hard to fix for non-expert programmers, who nevertheless make up the majority of service developers. This thesis proposes a new approach to debugging and testing kernel services. Our approach focuses on the interaction between kernel services and the core kernel, addressing the issue of "safety holes" in the code that defines the kernel API functions. In the context of the Linux kernel, we have developed an automated approach, named Diagnosys, which relies on static analysis of the kernel code to identify, classify and expose the various API safety holes that could lead to runtime faults when the functions are used in service code written by developers with limited knowledge of the kernel's subtleties. To illustrate our approach, we implemented Diagnosys for version 2.6.32 of the Linux kernel. We have shown its benefits in supporting developers in their testing and debugging activities.

10

Singh, Yuvraj. "Regression Models to Predict Coastdown Road Load for Various Vehicle Types." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1595265184541326.

11

Li, Yuyi. "Empirical likelihood with applications in time series." Thesis, University of Manchester, 2011. https://www.research.manchester.ac.uk/portal/en/theses/empirical-likelihood-with-applications-in-time-series(29c74808-f784-4306-8df9-26f45b30b553).html.

Abstract:

This thesis investigates the statistical properties of Kernel Smoothed Empirical Likelihood (KSEL, e.g. Smith, 1997 and 2004) estimator and various associated inference procedures in weakly dependent data. New tests for structural stability are proposed and analysed. Asymptotic analyses and Monte Carlo experiments are applied to assess these new tests, theoretically and empirically. Chapter 1 reviews and discusses some estimation and inferential properties of Empirical Likelihood (EL, Owen, 1988) for identically and independently distributed data and compares it with Generalised EL (GEL), GMM and other estimators. KSEL is extensively treated, by specialising kernel-smoothed GEL in the working paper of Smith (2004), some of whose results and proofs are extended and refined in Chapter 2. Asymptotic properties of some tests in Smith (2004) are also analysed under local alternatives. These special treatments on KSEL lay the foundation for analyses in Chapters 3 and 4, which would not otherwise follow straightforwardly. In Chapters 3 and 4, subsample KSEL estimators are proposed to assist the development of KSEL structural stability tests to diagnose for a given breakpoint and for an unknown breakpoint, respectively, based on relevant work using GMM (e.g. Hall and Sen, 1999; Andrews and Fair, 1988; Andrews and Ploberger, 1994). It is also original in these two chapters that moment functions are allowed to be kernel-smoothed after or before the sample split, and it is rigorously proved that these two smoothing orders are asymptotically equivalent. The overall null hypothesis of structural stability is decomposed according to the identifying and overidentifying restrictions, as Hall and Sen (1999) advocate in GMM, leading to a more practical and precise structural stability diagnosis procedure. 
In this framework, these KSEL structural stability tests are also proved via asymptotic analysis to be capable of identifying different sources of instability, arising from parameter value change or violation of overidentifying restrictions. The analyses show that these KSEL tests follow the same limit distributions as their counterparts using GMM. To examine the finite-sample performance of KSEL structural stability tests in comparison to GMM's, Monte Carlo simulations are conducted in Chapter 5 using a simple linear model considered by Hall and Sen (1999). This chapter details some relevant computational algorithms and permits different smoothing order, kernel type and prewhitening options. In general, simulation evidence seems to suggest that compared to GMM's tests, these newly proposed KSEL tests often perform comparably. However, in some cases, the sizes of these can be slightly larger, and the false null hypotheses are rejected with much higher frequencies. Thus, these KSEL based tests are valid theoretical and practical alternatives to GMM's.

12

Bounliphone, Wacha. "Tests d’hypothèses statistiquement et algorithmiquement efficaces de similarité et de dépendance." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLC002/document.

Abstract:

This thesis presents new, efficient statistical hypothesis tests for relative similarity and relative dependence, and for precision matrix estimation. The main methodology adopted in this thesis is the class of U-statistic estimators. The first statistical test concerns relative similarity tests applied to the problem of model selection. Probabilistic generative models provide a powerful framework for representing data. Model selection in this generative setting can be difficult. To address this problem, we propose a new nonparametric hypothesis test of relative similarity and test whether a first candidate model generates a data sample significantly closer to a reference validation set. The second nonparametric statistical hypothesis test is for relative dependence. In the presence of multiple dependencies, existing methods answer the question of relative dependence only indirectly. Yet knowing whether one dependence is stronger than another is important for decision making. We present a statistical test that determines whether a variable depends much more on a first target variable or on a second variable. Finally, a new method for structure discovery in a graphical model is proposed. Starting from the fact that the zeros of a precision matrix represent conditional independencies, we develop a new statistical test that estimates a bound for an entry of the precision matrix. Existing structure discovery methods generally make restrictive assumptions of Gaussian or sparse distributions that do not necessarily match real data. We introduce here a new test that uses the properties of U-statistics applied to the covariance matrix, and derive from it a bound on the precision matrix.
The dissertation presents novel statistically and computationally efficient hypothesis tests for relative similarity and dependency, and precision matrix estimation. The key methodology adopted in this thesis is the class of U-statistic estimators. The class of U-statistics results in a minimum-variance unbiased estimation of a parameter. The first part of the thesis focuses on relative similarity tests applied to the problem of model selection. Probabilistic generative models provide a powerful framework for representing data. Model selection in this generative setting can be challenging. To address this issue, we provide a novel non-parametric hypothesis test of relative similarity and test whether a first candidate model generates a data sample significantly closer to a reference validation set. Subsequently, the second part of the thesis focuses on developing a novel non-parametric statistical hypothesis test for relative dependency. Tests of dependence are important tools in statistical analysis, and several canonical tests for the existence of dependence have been developed in the literature. However, the question of whether there exist dependencies is secondary. The determination of whether one dependence is stronger than another is frequently necessary for decision making. We present a statistical test which determines whether one variable is significantly more dependent on a first target variable or a second. Finally, a novel method for structure discovery in a graphical model is proposed. Making use of a result that zeros of a precision matrix can encode conditional independencies, we develop a test that estimates and bounds an entry of the precision matrix. Methods for structure discovery in the literature typically make restrictive distributional (e.g. Gaussian) or sparsity assumptions that may not apply to a data sample of interest. Consequently, we derive a new test that makes use of results for U-statistics and applies them to the covariance matrix, which then implies a bound on the precision matrix.
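For readers unfamiliar with U-statistics, the sketch below shows the textbook construction on the simplest example, not one of the thesis's tests: averaging the symmetric kernel h(x, y) = (x - y)² / 2 over all unordered pairs gives the unbiased sample variance. The function names are assumptions for illustration.

```python
from itertools import combinations


def u_statistic(sample, kernel):
    """Order-2 U-statistic: average of a symmetric kernel over all unordered pairs."""
    pairs = list(combinations(sample, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


def variance_kernel(x, y):
    """Symmetric kernel whose expectation is the variance of the distribution."""
    return 0.5 * (x - y) ** 2
```

Calling `u_statistic(data, variance_kernel)` on a list of at least two numbers returns the same value as the usual variance estimate with an n - 1 denominator.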

13

Chwialkowski, K. P. "Topics in kernel hypothesis testing." Thesis, University College London (University of London), 2016. http://discovery.ucl.ac.uk/1519607/.

Abstract:

This thesis investigates some unaddressed problems in kernel nonparametric hypothesis testing. The contributions are grouped around three main themes.
Wild Bootstrap for Degenerate Kernel Tests. A wild bootstrap method for nonparametric hypothesis tests based on kernel distribution embeddings is proposed. This bootstrap method is used to construct provably consistent tests that apply to random processes. It applies to a large group of kernel tests based on V-statistics, which are degenerate under the null hypothesis and non-degenerate elsewhere. In experiments, the wild bootstrap gives strong performance on synthetic examples, on audio data, and in performance benchmarking for the Gibbs sampler.
A Kernel Test of Goodness of Fit. A nonparametric statistical test for goodness of fit is proposed: given a set of samples, the test determines how likely it is that these were generated from a target density function. The measure of goodness of fit is a divergence constructed via Stein's method using functions from a Reproducing Kernel Hilbert Space. Construction of the test is based on the wild bootstrap method. We apply our test to quantifying convergence of approximate Markov Chain Monte Carlo methods, statistical model criticism, and evaluating quality of fit versus model complexity in nonparametric density estimation.
Fast Analytic Functions Based Two Sample Test. A class of nonparametric two-sample tests with a cost linear in the sample size is proposed. Two tests are given, both based on an ensemble of distances between analytic functions representing each of the distributions. Experiments on artificial benchmarks and on challenging real-world testing problems demonstrate a good power/time tradeoff that is retained even in high-dimensional problems.
The main contributions to science are the following. We prove that the kernel tests based on the wild bootstrap method tightly control the type one error at the desired level and are consistent, i.e. the type two error drops to zero with an increasing number of samples. We construct a kernel goodness-of-fit test that requires knowledge of the density only up to a normalizing constant. We use this test to construct the first consistent test for convergence of Markov chains and use it to quantify properties of approximate MCMC algorithms. Finally, we construct a linear-time two-sample test that uses a new, finite-dimensional feature representation of probability measures.

14

Heideklang, René. "Data Fusion for Multi-Sensor Nondestructive Detection of Surface Cracks in Ferromagnetic Materials." Doctoral thesis, Humboldt-Universität zu Berlin, 2018. http://dx.doi.org/10.18452/19586.

Abstract:

Fatigue cracking is a dangerous and costly phenomenon that must be detected early. However, because small flaws require high test sensitivity, inspection reliability is reduced by false indications. This work therefore exploits the diversity of different nondestructive surface inspection methods to increase the reliability of defect detection by means of data fusion. The first contribution of this work consists of novel approaches to the fusion of inspection images. These are obtained by surface scanning using eddy current testing, thermal testing and magnetic flux leakage testing. The results show that even simple algebraic fusion rules deliver good results, provided the data have been adequately preprocessed. Data fusion thus outperforms the best individual sensor in pixel-based false detection rate by a factor of six at a groove depth of 10 μm. Fusion in the image transform domain is also investigated. However, the theoretical advantages of such direction-sensitive transforms are not achieved in practice with the available data. Nevertheless, the advantage of fusion over single-sensor inspection is confirmed here as well. In addition, this work provides novel techniques for fusion at higher levels of signal abstraction. An approach based on kernel density functions is introduced to integrate spatially distributed detection hypotheses. It makes it possible to model explicitly the registration errors that are unavoidable in practice. Surface discontinuities 30 μm deep can be found reliably by fusion, whereas the best individual method only detects depths of 40–50 μm and above. The experiment is confirmed on a second test specimen. At the end of the work, guidelines for the use of data fusion are given, and the need for an initiative for sharing measurement data is emphasized in order to promote future research.
Fatigue cracking is a dangerous and cost-intensive phenomenon that requires early detection. But at high test sensitivity, the abundance of false indications limits the reliability of conventional materials testing. This thesis exploits the diversity of physical principles that different nondestructive surface inspection methods offer, by applying data fusion techniques to increase the reliability of defect detection. The first main contribution is a set of novel approaches for the fusion of NDT images. These surface scans are obtained from state-of-the-art inspection procedures in Eddy Current Testing, Thermal Testing and Magnetic Flux Leakage Testing. The implemented image fusion strategy demonstrates that simple algebraic fusion rules are sufficient for high performance, given adequate signal normalization. Data fusion reduces the rate of false positives by a factor of six over the best individual sensor at a 10 μm deep groove. Moreover, the utility of state-of-the-art image representations, like the Shearlet domain, is explored. However, the theoretical advantages of such directional transforms are not attained in practice with the given data. Nevertheless, the benefit of fusion over single-sensor inspection is confirmed a second time. Furthermore, this work proposes novel techniques for fusion at a high level of signal abstraction. A kernel-based approach is introduced to integrate spatially scattered detection hypotheses. This method explicitly deals with registration errors that are unavoidable in practice. Surface discontinuities as shallow as 30 μm are reliably found by fusion, whereas the best individual sensor requires depths of 40–50 μm for successful detection. The experiment is replicated on a similar second test specimen. Practical guidelines are given at the end of the thesis, and the need for a data sharing initiative is stressed to promote future research on this topic.

15

Coudret, Raphaël. "Stochastic modelling using large data sets : applications in ecology and genetics." Phd thesis, Université Sciences et Technologies - Bordeaux I, 2013. http://tel.archives-ouvertes.fr/tel-00865867.

Abstract:

There are two main parts in this thesis. The first one concerns valvometry, which is here the study of the distance between the two parts of an oyster's shell over time. The health status of oysters can be characterized using valvometry in order to obtain insights into the quality of their environment. We consider that a renewal process with four states underlies the behaviour of the studied oysters. Such a hidden process can be retrieved from a valvometric signal by assuming that some probability density function linked with this signal is bimodal. We then compare several estimators which take this assumption into account, including kernel density estimators. In another chapter, we compare several regression approaches aimed at analysing transcriptomic data. To understand which explanatory variables have an effect on gene expression, we apply a multiple testing procedure to these data through the linear model FAMT. The SIR method may find nonlinear relations in such a context. It is, however, more commonly used when the response variable is univariate. A multivariate version of SIR was therefore developed. Procedures to measure gene expression can be expensive, so the sample size n of the corresponding datasets is often small. That is why we also studied SIR when n is less than the number of explanatory variables p.

16

Nguyen, Van Hanh. "Modèles de mélange semi-paramétriques et applications aux tests multiples." PhD thesis, Université Paris Sud - Paris XI, 2013. http://tel.archives-ouvertes.fr/tel-00987035.

Abstract:

In a multiple testing context, we consider a semiparametric mixture model with two components. One component is assumed known and corresponds to the distribution of p-values under the null hypothesis, with prior probability p. The other component, f, is nonparametric and represents the distribution of p-values under the alternative hypothesis. The problem of estimating the parameters p and f of the model arises in procedures that control the false discovery rate (FDR). In the first part of this dissertation, we study the estimation of the proportion p. We discuss asymptotic efficiency results and establish that two different cases arise, depending on whether or not f vanishes on a whole non-empty interval. In the first case (f vanishes on a whole interval), we present estimators that converge at the parametric rate, compute the optimal asymptotic variance, and conjecture that no estimator is asymptotically efficient (i.e. attains the optimal asymptotic variance). In the second case, we prove that the quadratic risk of any estimator does not converge at the parametric rate. In the second part of the dissertation, we focus on estimating the unknown nonparametric component f of the mixture, relying on a preliminary estimator of p. We propose two different estimators of this unknown component and study their asymptotic properties. The first is a kernel estimator with random weights. We establish an upper bound on its pointwise quadratic risk, showing a classical nonparametric convergence rate over a Hölder class. The second is a regularized maximum likelihood estimator. It is computed by an iterative algorithm, for which we establish a descent property of a criterion. In addition, these estimators are used in a multiple testing procedure to estimate the local false discovery rate (lfdr).
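To make the last sentence concrete, the sketch below computes the local false discovery rate under the two-component mixture described in the abstract, assuming (as is standard for p-values) that the known null component is uniform on [0, 1], so the mixture density is g(x) = p + (1 - p) f(x) and lfdr(x) = p / g(x). The function name `local_fdr` and the toy alternative density f(x) = 2(1 - x) are assumptions for illustration; the thesis estimates p and f from data instead.

```python
import numpy as np

def local_fdr(x, p, f_alt):
    """Local FDR at p-values `x` for the mixture p * Uniform(0, 1) + (1 - p) * f_alt."""
    x = np.asarray(x, dtype=float)
    # Mixture density: the uniform null density equals 1 on [0, 1].
    mixture = p + (1.0 - p) * f_alt(x)
    # Posterior probability that a p-value at x comes from the null component.
    return p / mixture

def toy_alternative(x):
    # Hypothetical alternative density on [0, 1]; it integrates to 1 and vanishes at x = 1.
    return 2.0 * (1.0 - x)

lfdr_values = local_fdr([0.0, 0.5, 1.0], p=0.8, f_alt=toy_alternative)
```

Small p-values receive a lower lfdr because the alternative density puts more mass near zero; at x = 1, where this toy alternative vanishes, the lfdr equals 1.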

17

Vitale, Raffaele. "Novel chemometric proposals for advanced multivariate data analysis, processing and interpretation." Doctoral thesis, Universitat Politècnica de València, 2017. http://hdl.handle.net/10251/90442.

Abstract:

The present Ph.D. thesis, primarily conceived to support and reinforce the relation between the academic and industrial worlds, was developed in collaboration with Shell Global Solutions (Amsterdam, The Netherlands) with the aim of applying and possibly extending well-established latent variable-based approaches (i.e. Principal Component Analysis (PCA), Partial Least Squares regression (PLS) and Partial Least Squares Discriminant Analysis (PLSDA)) to complex problem solving, not only in the fields of manufacturing troubleshooting and optimisation but also in the wider environment of multivariate data analysis. To this end, novel efficient algorithmic solutions are proposed throughout the chapters to address very disparate tasks, from calibration transfer in spectroscopy to real-time modelling of streaming flows of data. The manuscript is divided into the following six parts, focused on various topics of interest: Part I - Preface, where an overview of this research work, its main aims and its justification is given together with a brief introduction to PCA, PLS and PLSDA; Part II - On kernel-based extensions of PCA, PLS and PLSDA, where the potential of kernel techniques, possibly coupled to specific variants of the recently rediscovered pseudo-sample projection formulated by the English statistician John C. Gower, is explored and their performance is compared to that of more classical methodologies in four different application scenarios: segmentation of Red-Green-Blue (RGB) images, discrimination of on-/off-specification batch runs, monitoring of batch processes and analysis of mixture designs of experiments; Part III - On the selection of the number of factors in PCA by permutation testing, where an extensive guideline on how to select PCA components by permutation testing is provided through the comprehensive illustration of an original algorithmic procedure implemented for this purpose; Part IV - On modelling common and distinctive sources of variability in multi-set data analysis, where several practical aspects of two-block common and distinctive component analysis (carried out by methods like Simultaneous Component Analysis (SCA), DIStinctive and COmmon Simultaneous Component Analysis (DISCO-SCA), Adapted Generalised Singular Value Decomposition (Adapted GSVD), ECO-POWER, Canonical Correlation Analysis (CCA) and 2-block Orthogonal Projections to Latent Structures (O2PLS)) are discussed, a new computational strategy for determining the number of common factors underlying two data matrices sharing the same row- or column-dimension is described, and two innovative approaches for calibration transfer between near-infrared spectrometers are presented; Part V - On the on-the-fly processing and modelling of continuous high-dimensional data streams, where a novel software system for rational handling of multi-channel measurements recorded in real time, the On-The-Fly Processing (OTFP) tool, is designed; Part VI - Epilogue, where final conclusions are drawn, future perspectives are delineated, and annexes are included.
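The idea of choosing the number of PCA components by permutation testing (Part III) can be sketched generically as follows. This is a common textbook-style variant and not the original procedure developed in the thesis: each column is permuted independently to destroy the correlation structure, and an observed eigenvalue is retained while it exceeds what the permuted data produce. The function names, the autoscaling step, the number of permutations and the synthetic two-factor data are all assumptions.

```python
import numpy as np

def pca_eigenvalues(X):
    """Eigenvalues of the correlation matrix of X, in decreasing order."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    singular_values = np.linalg.svd(Xs, compute_uv=False)
    return singular_values**2 / (X.shape[0] - 1)

def select_components(X, n_perm=200, alpha=0.05, seed=0):
    """Count leading components whose eigenvalue is significant under permutation."""
    rng = np.random.default_rng(seed)
    observed = pca_eigenvalues(X)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        # Permute every column independently to break between-variable correlation.
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        exceed += pca_eigenvalues(Xp) >= observed
    p_values = (1.0 + exceed) / (1.0 + n_perm)
    # Keep components until the first non-significant one.
    k = 0
    for p in p_values:
        if p > alpha:
            break
        k += 1
    return k, p_values

# Synthetic data: two latent factors, each driving five of the ten variables.
rng = np.random.default_rng(1)
factors = rng.normal(size=(200, 2))
X = np.hstack([np.repeat(factors[:, :1], 5, axis=1),
               np.repeat(factors[:, 1:], 5, axis=1)])
X = X + 0.5 * rng.normal(size=X.shape)
n_selected, p_values = select_components(X)
```

On this synthetic example the first two eigenvalues far exceed their permuted counterparts, while the remaining ones fall below them, so the stopping rule halts after two components.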
Vitale, R. (2017). Novel chemometric proposals for advanced multivariate data analysis, processing and interpretation [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90442

18

Wende, Ulrich. "Modellierung und Testverfahren für CMOS-kompatible Fluxgatesensoren mit planaren weichmagnetischen Kernen - Modeling and testing of a CMOS compatible fluxgate sensor with a planar softmagnetic core." Gerhard-Mercator-Universitaet Duisburg, 2001. http://www.ub.uni-duisburg.de/ETD-db/theses/available/duett-05222001-120352/.

Abstract:

This thesis describes the optimization and characterisation of an integrated fluxgate sensor. It is fabricated with a CMOS-compatible technology for planar coils with ferromagnetic cores. Limitations of the sensor's measurement range and linearity are analyzed by analytical and numerical calculations of stray fields and demagnetizing effects in the cores, coupled with signal analysis of the calculated coil output voltage. The sensor resolution is limited by magnetic domain effects. Based on these results, the sensor layout is optimized for the compass application. Electrical and magneto-optical methods for on-wafer characterisation of the ferromagnetic layer, the electrical and magnetic coil parameters and the sensor itself are developed and meet production requirements.

19

Jeunesse, Paulien. "Estimation non paramétrique du taux de mort dans un modèle de population générale : Théorie et applications. A new inference strategy for general population mortality tables Nonparametric adaptive inference of birth and death models in a large population limit Nonparametric inference of age-structured models in a large population limit with interactions, immigration and characteristics Nonparametric test of time dependance of age-structured models in a large population limit." Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLED013.

Abstract:

The study of the mortality rate in human population models and in biology is the core of this work. This thesis lies at the intersection of statistics for stochastic processes, nonparametric statistics and analysis. The first part, which focuses on an actuarial problem, proposes an algorithm for estimating mortality tables, which are useful in insurance. This algorithm is based on a deterministic population model. The new estimates improve on current results by taking the global dynamics of the population into account; in particular, births are incorporated into the model to compute the death rate. These estimates are also related to previous work, which ensures the theoretical continuity of our work. The second part deals with estimating the mortality rate in a stochastic population model, which calls for arguments specific to statistics for processes and to nonparametric statistics. We obtain adaptive nonparametric estimators, in an anisotropic setting, for the mortality rate and the population density, as well as non-asymptotic concentration inequalities quantifying the distance between the stochastic model and the limiting deterministic model used in the first part. We show that these estimators remain optimal in a model where the death rate depends on interactions, as in the case of the logistic population. The third part considers a test for detecting the presence of interactions in the mortality rate. This test in fact assesses the time dependence of the rate. Under an assumption, we show that it is then possible to detect the presence of interactions. A practical algorithm is proposed for carrying out the test.
In this thesis, we study the mortality rate in different population models in order to apply our results to demography or biology. The mathematical framework includes statistics for stochastic processes, nonparametric estimation and analysis. In the first part, an algorithm is proposed to estimate mortality tables. This problem comes from actuarial science, and the aim is to apply our results in the insurance field. The algorithm is founded on a deterministic population model. The new estimates we obtain improve on current results. Their advantage is that they take the global population dynamics into account: births are used in our model to compute the mortality rate. Finally, these estimates are linked with previous work, which is a point of great importance in the field of actuarial science. In the second part, we are interested in the estimation of the mortality rate in a stochastic population model. To do so, we need tools from nonparametric estimation and statistics for processes, since the mortality rate is a function of two parameters, time and age. We propose minimax optimal and adaptive estimators for the mortality rate and the population density. We also establish non-asymptotic concentration inequalities. These inequalities quantify the deviation between the stochastic process and the deterministic limit used in the first part. We prove that our estimators are still optimal in a model where mortality is influenced by interactions, as is the case, for example, for the logistic population. In the third part, we consider the testing problem of detecting the existence of interactions. This test is in fact designed to detect the time dependence of the mortality rate. Under the assumption that the time dependence of the mortality rate comes only from the interactions, we can detect the presence of interactions. Finally, we propose an algorithm to perform this test.

20

Friedrichs, Stefanie. "Kernel-Based Pathway Approaches for Testing and Selection." Doctoral thesis, 2017. http://hdl.handle.net/11858/00-1735-0000-0023-3F2D-5.

21

Gheorghe, Marian, R. Ceterchi, F. Ipate, Savas Konur, and Raluca Lefticaru. "Kernel P systems: from modelling to verification and testing." 2017. http://hdl.handle.net/10454/11720.

Abstract:

A kernel P system integrates in a coherent and elegant manner some of the most successfully used features of the P systems employed in modelling various applications. It also provides a theoretical framework for analysing these applications and a software environment for simulating and verifying them. In this paper, we illustrate the modelling capabilities of kernel P systems by showing how other classes of P systems can be represented with this formalism and providing a number of kernel P system models for a sorting algorithm and a broadcasting problem. We also show how formal verification can be used to validate that the given models work as desired. Finally, a test generation method based on automata is extended to non-deterministic kernel P systems.
The work of MG, FI and RL was supported by a grant of the Romanian National Authority for Scientific Research, CNCS-UEFISCDI (project number: PN-II-ID-PCE-2011-3-0688); RCUK

22

LIN, CHIH-HUA, and 林之華. "A Practical Application of the Kernel Method of Testing Equation." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/36842061472673580182.

Abstract:

Master's degree
National Taipei University
Department of Statistics
104
When a standardized achievement test is adopted, a major concern is whether the scores of current test takers and those of their predecessors can be compared fairly. To ensure the fairness of the tests, a statistical procedure called "equating" is used to standardize achievement test scores across different measurement times and different versions of the examination problems. In this study, the basic theory of "The Kernel Method of Test Equating" (2004) was extended to a theory named the "observed-score kernel method" in the area of score equating. For convenience, the "kequate" package of the free software R was used to carry out the analysis and the equating procedure. According to the Item Response Theory (IRT) described by Yu, M. N. (1992), the basis of "equating" theory must satisfy four basic assumptions. Only when all four assumptions are satisfied can IRT be applied to analyze test data. In addition, IRT has some valuable properties; in particular, it provides the standard error of the estimated ability of each individual test taker. Furthermore, the Item Information Function is defined as the inverse of the square of the estimated standard error, and it is an indicator of the accuracy of estimation similar to the "reliability" indicator in classical test theory. The calculus achievement test scores of NTPU freshmen in the 2014 and 2015 academic years were used as the practical data in this study to show how to apply the observed-score kernel method to equate scores between two different years. We also present how to collect data with the NEAT-CE design, how to sort the data, how to use generalized linear models to analyze the data and, last but not least, how to choose the best model for the "equating" stage with the "kequate" package. The characteristics of the test were also evaluated by comparing the descriptive statistics and the equating output.

23

Shen, Shu. "Three essays in econometrics." Thesis, 2011. http://hdl.handle.net/2152/26903.

Abstract:

My dissertation includes three essays that examine or relax classical restrictive assumptions used in econometrics estimation methods. The first chapter proposes methods for examining how a response variable is influenced by a covariate. Rather than focusing on the conditional mean I consider a test of whether a covariate has an effect on the entire conditional distribution of the response variable given the covariate and other conditioning variables. This type of analysis is useful in situations where the econometrician or policy maker is interested in knowing whether a variable or policy would improve the distribution of the response outcomes in a stochastic dominance sense. The response variable is assumed to be continuous, while both discrete and continuous covariate cases are considered. I derive the asymptotic distribution of the test statistics and show that they have simple known asymptotic distributions under the null by using and extending conditional empirical process results given by Horvath and Yandell (1988). Monte Carlo experiments are conducted, and the tests are shown to have good small sample behavior. The tests are applied to a study on father's labor supply. The second chapter is based on previous joint work with Jason Abrevaya. It considers estimation of censored panel-data models with individual-specific slope heterogeneity. The slope heterogeneity may be random (random-slopes model) or related to covariates (correlated-random-slopes model). Maximum likelihood and censored least-absolute deviations estimators are proposed for both models. Specification tests are provided to test the slope-heterogeneity models against nested alternatives. The proposed estimators and tests are used for an empirical study of Dutch household portfolio choice. Strong evidence of correlated random slopes for the age variables is found, indicating that the age profile of portfolio adjustment varies significantly with other household characteristics. 
The third chapter proposes specification tests in models with endogenous covariates. In empirical studies, econometricians often have little information on the functional form of the structural model, regardless of whether the covariates in the model are exogenous or endogenous. In this chapter, I propose tests for restricted structural model specifications with endogenous covariates against the fully nonparametric alternative. The restricted model specifications include the nonparametric specification with a restricted set of covariates, the semiparametric single index specification and the parametric linear specification. The test statistics are "leave-one-out" type kernel U-statistics as used in Fan and Lee (1996). They are constructed using the idea of the control function approach. Monte Carlo results are provided, and the tests are shown to have reasonable small sample behavior.

24

LIU, SHIFANG. "Statistical Methods for Testing Treatment-Covariate Interactions in Cancer Clinical Trials." Thesis, 2011. http://hdl.handle.net/1974/6758.

Abstract:

Treatment–covariate interaction is often used in clinical trials to assess the homogeneity of treatment effects across subgroups defined by a baseline covariate; this assessment is frequently conducted after the primary analysis including all patients is completed. When the endpoint is the time to an event, as in cancer clinical trials, the Cox proportional hazards model with an interaction term has been used exclusively to test the significance of treatment-covariate interaction in the oncology literature. However, the proportional hazards assumption may not be satisfied by the data from clinical trials. Although several procedures have been proposed in the statistical literature to assess the interaction based on a nonparametric measure of interaction or on nonparametric models, some of these procedures do not adequately account for the nature of the data, while others are very complicated, which may have limited their application in practice. In this thesis, a nonparametric procedure based on the smoothed estimate of the Patel–Hoel measure is first derived to test the interaction between the treatment and a binary covariate with censored data. The theoretical distribution of the test statistic of the proposed procedure is derived. The proposed procedure is also evaluated through Monte Carlo simulations and applications to data from a cancer clinical trial. Jackknifed versions of two test statistics based on nonparametric models are then derived by simplifying these test statistics and applying the jackknife method to estimate their variances. These jackknifed tests are also compared with the smoothed test and other related tests.
Thesis (Ph.D., Mathematics & Statistics) -- Queen's University, 2011-09-27

25

Yang, Jiasen. "Statistical Learning and Model Criticism for Networks and Point Processes." Thesis, 2019.

Abstract:

Networks and point processes provide flexible tools for representing and modeling complex dependencies in data arising from various social and physical domains. Graphs, or networks, encode relational dependencies between entities, while point processes characterize temporal or spatial interactions among events.

In the first part of this dissertation, we consider dynamic network data (such as communication networks) in which links connecting pairs of nodes appear continuously over time. We propose latent space point process models to capture two different aspects of the data: (i) communication occurs at a higher rate between individuals with similar latent attributes (i.e., homophily); and (ii) individuals tend to reciprocate communications from others, but in a varied manner. Our framework marries ideas from point process models, including Poisson and Hawkes processes, with ideas from latent space models of static networks. We evaluate our models on several real-world datasets and show that a dual latent space model, which accounts for heterogeneity in both homophily and reciprocity, significantly improves performance in various link prediction and network embedding tasks.

In the second part of this dissertation, we develop nonparametric goodness-of-fit tests for discrete distributions and point processes that contain intractable normalization constants, providing the first generally applicable and computationally feasible approaches under those circumstances. Specifically, we propose and characterize Stein operators for discrete distributions, and construct a general Stein operator for point processes using the Papangelou conditional intensity function. Based on the proposed Stein operators, we establish kernelized Stein discrepancy measures for discrete distributions and point processes, which enable us to develop nonparametric goodness-of-fit tests for un-normalized density/intensity functions. We apply the kernelized Stein discrepancy tests to discrete distributions (including network models) as well as temporal and spatial point processes. Our experiments demonstrate that the proposed tests typically outperform two-sample tests based on the maximum mean discrepancy, which, unlike our goodness-of-fit tests, assume the availability of exact samples from the null model.
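For context, the maximum mean discrepancy (MMD) baseline that this abstract compares against can be sketched in a few lines. The code below is a generic unbiased estimator of the squared MMD with a Gaussian kernel for one-dimensional samples; it does not reproduce the kernelized Stein discrepancy tests of the dissertation. The function names, the bandwidth of 1.0 and the simulated samples are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between two 1-D samples."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * bandwidth**2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of the squared MMD between samples x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = len(x), len(y)
    kxx = rbf_kernel(x, x, bandwidth)
    kyy = rbf_kernel(y, y, bandwidth)
    kxy = rbf_kernel(x, y, bandwidth)
    # Within-sample averages exclude the diagonal (i == j) terms.
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * kxy.mean()

rng = np.random.default_rng(0)
same = mmd2_unbiased(rng.normal(0, 1, 200), rng.normal(0, 1, 200))
shifted = mmd2_unbiased(rng.normal(0, 1, 200), rng.normal(2, 1, 200))
```

The estimate is close to zero when both samples come from the same distribution and clearly positive when one sample is shifted. Turning it into a test requires a null distribution, for example obtained by permuting the pooled sample, and, as the abstract notes, it requires exact samples from the null model.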
