Dissertations / Theses on the topic 'Sequential nonparametric kernel regression'

Consult the top 33 dissertations / theses for your research on the topic 'Sequential nonparametric kernel regression.'

1

Dharmasena, Tibbotuwa Deniye Kankanamge Lasitha Sandamali. "Sequential Procedures for Nonparametric Kernel Regression." RMIT University, Mathematical and Geospatial Sciences, 2008. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20090119.134815.

Full text
Abstract:
In a nonparametric setting, the functional form of the relationship between the response variable and the associated predictor variables is unspecified; however, it is assumed to be a smooth function. The main aim of nonparametric regression is to highlight an important structure in data without any assumptions about the shape of the underlying regression function. In regression, the random and fixed design models should be distinguished. Among the variety of nonparametric regression estimators currently in use, kernel-type estimators are the most popular. Kernel-type estimators provide a flexible class of nonparametric procedures by estimating the unknown function as a weighted average using a kernel function. The bandwidth, which determines the influence of the kernel, has to be adapted to any kernel-type estimator. Our focus is on the Nadaraya-Watson estimator and the local linear estimator, which belong to a class of kernel-type regression estimators called local polynomial kernel estimators. A closely related problem is the determination of an appropriate sample size that would be required to achieve a desired level of accuracy for the nonparametric regression estimators. Since sequential procedures allow an experimenter to make decisions based on the smallest number of observations without compromising accuracy, the application of sequential procedures to a nonparametric regression model at a given point or series of points is considered. The motivation for using such procedures is that in many applications the quality of estimating an underlying regression function in a controlled experiment is paramount; thus, it is reasonable to invoke a sequential procedure of estimation that chooses a sample size, based on recorded observations, that guarantees a preassigned accuracy. We have employed sequential techniques to develop a procedure for constructing a fixed-width confidence interval for the predicted value at a specific point of the independent variable. These fixed-width confidence intervals are developed using asymptotic properties of both the Nadaraya-Watson and local linear kernel estimators with data-driven bandwidths, and are studied for both fixed and random design contexts. The sample sizes for a preset confidence coefficient are optimized using sequential procedures, namely the two-stage procedure, the modified two-stage procedure and the purely sequential procedure. The proposed methodology is first tested in a large-scale simulation study. The performance of each kernel estimation method is assessed by comparing its coverage accuracy with the corresponding preset confidence coefficient, the proximity of the computed sample sizes to the optimal sample sizes, and the estimated values obtained from the two nonparametric methods against the actual values at a given series of design points of interest. We also employed the symmetric bootstrap method, an alternative method of estimating properties of unknown distributions: resampling is done from a suitably estimated residual distribution, and the percentiles of the approximate distribution are used to construct confidence intervals for the curve at a set of given design points. A methodology is developed for determining whether it is advantageous to use the symmetric bootstrap method to reduce the extent of oversampling that is normally known to plague Stein's two-stage sequential procedure. The procedure developed is validated in an extensive simulation study, and we also explore the asymptotic properties of the relevant estimators. Finally, our proposed sequential nonparametric kernel regression methods are applied to some problems in software reliability and finance.
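Below is a minimal sketch of the two ingredients this abstract combines: a Nadaraya-Watson estimate at a point, and a purely sequential stopping rule for a fixed-width confidence interval. The Gaussian kernel, the rule-of-thumb bandwidth and the plug-in standard error are illustrative assumptions, not the thesis's exact specification.

```python
import numpy as np
from scipy.stats import norm

def nw_estimate(x0, X, Y, h):
    """Nadaraya-Watson estimate of m(x0) = E[Y | X = x0] with a Gaussian kernel."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)
    return np.sum(w * Y) / np.sum(w)

def sequential_fixed_width_ci(draw, x0, half_width=0.1, alpha=0.05, n0=30):
    """Purely sequential rule: starting from a pilot sample of size n0, add one
    observation at a time until the estimated half-width of the asymptotic CI
    for m(x0) drops below `half_width`. `draw()` returns one (x, y) pair."""
    z = norm.ppf(1 - alpha / 2)
    X, Y = zip(*(draw() for _ in range(n0)))
    X, Y = np.array(X), np.array(Y)
    while True:
        n = len(X)
        h = np.std(X) * n ** (-1 / 5)    # rule-of-thumb bandwidth (an assumption)
        m_hat = nw_estimate(x0, X, Y, h)
        w = np.exp(-0.5 * ((X - x0) / h) ** 2)
        # plug-in standard error of the NW estimate at x0 (illustrative form)
        se = np.sqrt(np.sum((w * (Y - m_hat)) ** 2)) / np.sum(w)
        if z * se <= half_width:         # stopping rule satisfied
            return n, m_hat, (m_hat - half_width, m_hat + half_width)
        x_new, y_new = draw()
        X, Y = np.append(X, x_new), np.append(Y, y_new)
```

The two-stage variants discussed in the thesis would replace the one-at-a-time loop by a pilot stage that estimates the required final sample size in a single step.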
2

Signorini, David F. "Practical aspects of kernel smoothing for binary regression and density estimation." Thesis, n.p., 1998. http://oro.open.ac.uk/19923/.

Full text
3

Wang, Sejong. "Three nonparametric specification tests for parametric regression models : the kernel estimation approach." Connect to resource, 1994. http://rave.ohiolink.edu/etdc/view.cgi?acc%5Fnum=osu1261492759.

Full text
4

El, Ghouch Anouar. "Nonparametric statistical inference for dependent censored data." Université catholique de Louvain, 2007. http://edoc.bib.ucl.ac.be:81/ETD-db/collection/available/BelnUcetd-09262007-123927/.

Full text
Abstract:
A frequent problem in practical survival data analysis is censoring. A censored observation occurs when observation of the event time (duration or survival time) is prevented by the occurrence of an earlier competing event (the censoring time). Censoring may be due to different causes, for example the loss of some subjects under study, the end of the follow-up period, dropout, the termination of the study, or the limited sensitivity of a measurement instrument. The literature on censored data focuses on the i.i.d. case. However, in many real applications the data are collected sequentially in time or space, so the assumption of independence does not hold. Here we give only some typical examples from the literature involving correlated data that are subject to censoring. In the clinical trials domain, it frequently happens that patients from the same hospital have correlated survival times due to unmeasured variables such as the quality of the hospital equipment. Censored correlated data are also a common problem in environmental and spatial (geographical or ecological) statistics. In fact, due to the process used in the data sampling procedure, e.g. the analytical equipment, only the measurements which exceed some thresholds, for example the method detection limits or the instrumental detection limits, can be included in the data analysis. Many other examples can be found in fields like econometrics and financial statistics: observations on the duration of unemployment, for instance, may be right censored and are typically correlated. When the data are not independent and are subject to censoring, estimation and inference become more challenging mathematical problems with a wide area of applications. In this context, we propose some new and flexible tools based on a nonparametric approach. More precisely, allowing dependence between individuals, our main contributions to this domain concern the following aspects. First, we develop more suitable confidence intervals for a general class of functionals of a survival distribution via the empirical likelihood method. Second, we study the problem of conditional mean estimation using the local linear technique. Third, we develop and study a new estimator of the conditional quantile function, also based on the local linear method. For each proposed method, asymptotic results such as consistency and asymptotic normality are derived, and the finite sample performance is evaluated in a simulation study.
5

Kim, Byung-Jun. "Semiparametric and Nonparametric Methods for Complex Data." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/99155.

Full text
Abstract:
A wide variety of complex data has emerged in many research fields such as epidemiology, genomics, and analytical chemistry with the development of science, technology, and design schemes over the past few decades. For example, in epidemiology, the matched case-crossover study design is used to investigate the association between clustered binary outcomes of disease and a covariate measured with error within a certain period by stratifying subjects' conditions. In genomics, highly correlated and high-dimensional (HCHD) data are required to identify important genes and their interaction effects on diseases. In analytical chemistry, multiple time series data are generated to recognize complex patterns among multiple classes. Due to this great diversity, we encounter three problems in analyzing such complex data in this dissertation, and we provide several contributions to semiparametric and nonparametric methods for dealing with them: the first is to propose a method for testing the significance of a functional association under the matched study; the second is to develop a method to simultaneously identify important variables and build a network in HCHD data; the third is to propose a multi-class dynamic model for recognizing patterns in time-trend analysis. For the first topic, we propose a semiparametric omnibus test for the significance of a functional association between clustered binary outcomes and covariates with measurement error, taking into account the effect modification of matching covariates. We develop a flexible omnibus test that does not require a specific alternative form of the hypothesis. The advantages of our omnibus test are demonstrated through simulation studies and 1-4 bidirectional matched data analyses from an epidemiology study. For the second topic, we propose a joint semiparametric kernel machine network approach that connects variable selection and network estimation. Our approach is a unified and integrated method that can simultaneously identify important variables and build a network among them. We develop our approach under a semiparametric kernel machine regression framework, which allows for the possibility that each variable is nonlinear and is likely to interact with the others in a complicated way. We demonstrate our approach using simulation studies and a real application to genetic pathway analysis. Lastly, for the third project, we propose a Bayesian focal-area detection method for a multi-class dynamic model under a Bayesian hierarchical framework. Two-step Bayesian sequential procedures are developed to estimate patterns and detect focal intervals, which can be used for gas chromatography. We demonstrate the performance of our proposed method using a simulation study and a real application to gas chromatography on a Fast Odor Chromatographic Sniffer (FOX) system.
Doctor of Philosophy
A wide variety of complex data has emerged in many research fields such as epidemiology, genomics, and analytical chemistry with the development of science, technology, and design schemes over the past few decades. For example, in epidemiology, the matched case-crossover study design is used to investigate the association between clustered binary outcomes of disease and a covariate measured with error within a certain period by stratifying subjects' conditions. In genomics, highly correlated and high-dimensional (HCHD) data are required to identify important genes and their interaction effects on diseases. In analytical chemistry, multiple time series data are generated to recognize complex patterns among multiple classes. Due to this great diversity, we encounter three problems in analyzing the following three types of data: (1) matched case-crossover data, (2) HCHD data, and (3) time series data. We contribute to the development of statistical methods to deal with such complex data. First, under the matched study, we discuss hypothesis testing to effectively determine the association between observed factors and the risk of a disease of interest. Because, in practice, we do not know the specific form of the association, it can be challenging to set a specific alternative hypothesis. Reflecting reality, we consider the possibility that some observations are measured with error. Taking these measurement errors into account, we develop a testing procedure under the matched case-crossover framework. This testing procedure has the flexibility to make inferences under various hypothesis settings. Second, we consider data where the number of variables is very large compared to the sample size and the variables are correlated with each other. In this case, our goal is to identify the variables important for the outcome among a large number of variables and to build their network. For example, identifying a few genes associated with diabetes among the whole genome can be used to develop biomarkers. With our proposed approach in the second project, we can identify differentially expressed and important genes and their network structure with consideration of the outcome. Lastly, we consider the scenario of patterns of interest changing over time, with application to gas chromatography. We propose an efficient detection method to effectively distinguish the patterns of multi-level subjects in time-trend analysis. Our proposed method can give valuable information for efficiently searching for distinguishable patterns, reducing the burden of examining all observations in the data.
6

Maity, Arnab. "Efficient inference in general semiparametric regression models." College Station, Tex.: Texas A&M University, 2008. http://hdl.handle.net/1969.1/ETD-TAMU-3075.

Full text
7

Doruska, Paul F. "Methods for Quantitatively Describing Tree Crown Profiles of Loblolly pine (Pinus taeda L.)." Diss., Virginia Tech, 1998. http://hdl.handle.net/10919/30638.

Full text
Abstract:
Physiological process models, productivity studies, and wildlife abundance studies all require accurate representations of tree crowns. In the past, geometric shapes or flexible mathematical equations approximating geometric shapes were used to represent crown profiles. The crown profile of loblolly pine (Pinus taeda L.) was described using single-regressor, nonparametric regression analysis in an effort to improve crown representations, and the resulting profiles were compared to more traditional representations. Nonparametric regression may be applicable when an underlying parametric model cannot be identified. The modeler does not specify a functional form; rather, a data-driven technique is used to determine the shape of the curve, and the modeler determines the amount of local curvature to be depicted. A class of local polynomial estimators which contains the popular kernel estimator as a special case was investigated. Kernel regression appears to fit closely to the interior data points but often suffers bias problems at the boundaries of the data, a feature less exhibited by local linear or local quadratic regression. When using nonparametric regression, decisions must be made regarding polynomial order and bandwidth. Such decisions depend on the presence of local curvature, the desired degree of smoothing, and, for bandwidth in particular, the minimization of some global error criterion. In the present study, a penalized PRESS criterion (PRESS*) was selected as the global error criterion. When individual-tree crown profile data are available, nonparametric regression appears capable of capturing more of the tree-to-tree variation in crown shape than multiple linear regression and other published functional forms. Thus, modelers should consider nonparametric regression when describing crown profiles, as well as in any regression situation where traditional techniques perform unsatisfactorily or fail.
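The boundary behavior contrasted in this abstract is easy to demonstrate: a local linear fit solves a weighted least-squares problem at each evaluation point, and its intercept is the fitted value. A minimal sketch, assuming a Gaussian kernel and a fixed bandwidth:

```python
import numpy as np

def local_linear(x0, X, Y, h):
    """Local linear estimate of m(x0): weighted least squares of Y on (1, X - x0).
    The intercept is the fitted value, which removes the boundary bias that the
    plain kernel (Nadaraya-Watson) estimator exhibits near the edges of the data."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)
    Z = np.column_stack([np.ones_like(X), X - x0])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Z * sw[:, None], Y * sw, rcond=None)
    return beta[0]
```

A leave-one-out criterion such as the penalized PRESS* mentioned above would then be minimized over h to pick the bandwidth.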
Ph. D.
8

Chu, Chi-Yang. "Applied Nonparametric Density and Regression Estimation with Discrete Data: Plug-In Bandwidth Selection and Non-Geometric Kernel Functions." Thesis, The University of Alabama, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10262364.

Full text
Abstract:

Bandwidth selection plays an important role in kernel density estimation. Least-squares cross-validation and plug-in methods are commonly used as bandwidth selectors in the continuous data setting. The former is a data-driven approach, and the latter requires a priori assumptions about the unknown distribution of the data. A benefit of the plug-in method is its relatively quick computation, and hence it is often used for preliminary analysis. However, much less is known about the plug-in method in the discrete data setting, and this motivates us to propose a plug-in bandwidth selector. A related issue is undersmoothing in kernel density estimation. Least-squares cross-validation is a popular bandwidth selector, but in many applied situations it tends to select a relatively small bandwidth, i.e. it undersmooths. The literature suggests several methods to solve this problem, but most of them are modifications of extant error criteria for continuous variables. Here we discuss this problem in the discrete data setting and propose non-geometric discrete kernel functions as a possible solution. The issue also occurs in kernel regression estimation. Our proposed bandwidth selector and kernel functions perform well on simulated and real data.
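For context, the least-squares cross-validation baseline that a plug-in selector is compared against can be sketched for an unordered discrete variable with the Aitchison-Aitken kernel; the grid search and the kernel choice here are assumptions, not the dissertation's procedure:

```python
import numpy as np

def aitchison_aitken(x, xi, lam, c):
    """Aitchison-Aitken kernel for an unordered variable with c categories:
    weight 1 - lam on a match, lam / (c - 1) otherwise."""
    return np.where(x == xi, 1.0 - lam, lam / (c - 1))

def lscv_bandwidth(data, c, grid=np.linspace(1e-4, 0.5, 50)):
    """Least-squares cross-validation for the discrete kernel pmf estimator:
    minimize sum_x p_hat(x)^2 - (2/n) sum_i p_hat_{-i}(X_i) over a grid."""
    n = len(data)
    support = np.arange(c)
    best, best_score = None, np.inf
    for lam in grid:
        # estimated pmf over the whole support
        p_hat = np.array([aitchison_aitken(s, data, lam, c).mean() for s in support])
        # leave-one-out estimates at the observed points (self-weight is 1 - lam)
        p_loo = np.array([
            (aitchison_aitken(x, data, lam, c).sum() - (1.0 - lam)) / (n - 1)
            for x in data
        ])
        score = np.sum(p_hat ** 2) - 2.0 * p_loo.mean()
        if score < best_score:
            best, best_score = lam, score
    return best
```

A plug-in selector would instead compute the bandwidth in closed form from estimated moments of the pmf, avoiding the grid search.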

9

Edwards, Adam Michael. "Precision Aggregated Local Models." Diss., Virginia Tech, 2021. http://hdl.handle.net/10919/102125.

Full text
Abstract:
Large-scale Gaussian process (GP) regression is infeasible for larger data sets due to the cubic scaling of flops and quadratic storage involved in working with covariance matrices. Remedies in the recent literature focus on divide-and-conquer, e.g., partitioning into sub-problems and inducing functional (and thus computational) independence. Such approximations can be speedy, accurate, and sometimes even more flexible than an ordinary GP. However, a big downside is loss of continuity at partition boundaries. Modern methods like local approximate GPs (LAGPs) imply effectively infinite partitioning and are thus pathologically good and bad in this regard. Model averaging, an alternative to divide-and-conquer, can maintain absolute continuity but often over-smooths, diminishing accuracy. Here I propose putting LAGP-like methods into a local-experts-like framework, blending partition-based speed with model-averaging continuity, as a flagship example of what I call precision aggregated local models (PALM). Using N_C LAGPs, each selecting n from N data pairs, I illustrate a scheme that is at most cubic in n, quadratic in N_C, and linear in N, drastically reducing computational and storage demands. Extensive empirical illustration shows that PALM is at least as accurate as LAGP, can be much faster, and furnishes continuous predictive surfaces. Finally, I propose a sequential updating scheme which greedily refines a PALM predictor up to a computational budget, and several variations on the basic PALM that may provide predictive improvements.
Doctor of Philosophy
Occasionally, when describing the relationship between two variables, it may be helpful to use a so-called "non-parametric" regression that is agnostic to the function that connects them. Gaussian processes (GPs) are a popular method of non-parametric regression used for their relative flexibility and interpretability, but they have the unfortunate drawback of being computationally infeasible for large data sets. Past work on solving the scaling issues for GPs has focused on "divide and conquer" style schemes that spread the data out across multiple smaller GP models. While these models make GP methods much more accessible to large data sets, they do so at the expense of either local predictive accuracy or global surface continuity. Precision Aggregated Local Models (PALM) is a novel divide-and-conquer method for GP models that is scalable for large data while maintaining local accuracy and a smooth global model. I demonstrate that PALM can be built quickly and performs well predictively compared to other state-of-the-art methods. This document also provides a sequential algorithm for selecting the location of each local model, and variations on the basic PALM methodology.
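A toy version of the local approximate GP idea that PALM aggregates: fit a small GP to only the n nearest of the N points for each prediction site. This sketch uses scikit-learn's stock GP as a stand-in for LAGP and omits PALM's precision-weighted aggregation across local models:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def local_gp_predict(x_star, X, Y, n=50):
    """Fit a GP to the n nearest neighbors of x_star only: O(n^3) flops per
    prediction instead of the O(N^3) of a global GP, at the cost of a
    predictive surface that is discontinuous across neighborhoods."""
    d = np.linalg.norm(X - x_star, axis=1)
    idx = np.argsort(d)[:n]
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X[idx], Y[idx])
    mean, sd = gp.predict(x_star.reshape(1, -1), return_std=True)
    return mean[0], sd[0]
```

PALM's contribution, per the abstract, is to blend N_C such local predictors so that the aggregated surface stays continuous.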
10

Song, Song. "Confidence bands in quantile regression and generalized dynamic semiparametric factor models." Doctoral thesis, Humboldt-Universität zu Berlin, Wirtschaftswissenschaftliche Fakultät, 2010. http://dx.doi.org/10.18452/16341.

Full text
Abstract:
In many applications it is necessary to know the stochastic fluctuation of the maximal deviations of nonparametric quantile estimates, e.g. to check various parametric models. Uniform confidence bands are therefore constructed for nonparametric quantile estimates of regression functions. The first method is based on strong approximations of the empirical process and extreme value theory; the strong uniform consistency rate is also established under general conditions. The second method is based on the bootstrap resampling method, and it is proved that the bootstrap approximation provides a substantial improvement. The case of multidimensional and discrete regressor variables is dealt with using a partial linear model. A labor market analysis is provided to illustrate the method. High-dimensional time series which reveal nonstationary and possibly periodic behavior occur frequently in many fields of science, e.g. macroeconomics, meteorology, medicine and financial engineering. A common approach is to separate the modeling of a high-dimensional time series into the time propagation of a low-dimensional time series and high-dimensional, time-invariant functions, via dynamic factor analysis. We propose a two-step estimation procedure. In the first step, we detrend the time series by incorporating a time basis selected by a group Lasso-type technique, and choose the space basis based on smoothed functional principal component analysis. We show properties of this estimator under the dependent scenario. In the second step, we obtain the detrended low-dimensional (stationary) stochastic process.
11

Benelmadani, Djihad. "Contribution à la régression non paramétrique avec un processus erreur d'autocovariance générale et application en pharmacocinétique." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM034/document.

Full text
Abstract:
In this thesis, we consider the fixed design regression model with repeated measurements, where the errors form a process with a general autocovariance function, i.e. a second-order process (stationary or nonstationary) with a covariance function that is non-differentiable along the diagonal. We are interested, among other problems, in the nonparametric estimation of the regression function of this model. We first consider the well-known kernel regression estimator proposed by Gasser and Müller. We study its asymptotic performance when the number of experimental units and the number of observations tend to infinity. For a regular sequence of designs, we improve the higher-order rates of convergence of the variance and the bias, and we also prove the asymptotic normality of this estimator in the case of correlated errors. Second, we propose a new kernel estimator of the regression function based on a projection property. This estimator is constructed through the autocovariance function of the errors and a specific function belonging to the Reproducing Kernel Hilbert Space (RKHS) associated with the autocovariance function. We study its asymptotic performance using RKHS properties, which allow us to obtain the optimal convergence rate of the variance. We also prove its asymptotic normality, and we show that this new estimator has a smaller asymptotic variance than that of Gasser and Müller; a simulation study is conducted to confirm this theoretical result. Third, we propose a new kernel estimator for the regression function, constructed through the trapezoidal numerical approximation of the kernel regression estimator based on continuous observations. We study its asymptotic performance and prove its asymptotic normality. Moreover, this estimator allows us to obtain the asymptotically optimal sampling design for the estimation of the regression function. We run a simulation study to test the performance of the proposed estimator in a finite sample setting, where we observe its good behavior in terms of Integrated Mean Squared Error (IMSE), and we show the reduction in IMSE obtained by using the optimal sampling design instead of the uniform design. Finally, we consider an application of regression function estimation to pharmacokinetics problems. We propose using nonparametric kernel methods for concentration-time curve estimation instead of the classical parametric ones, and we demonstrate their good performance via a simulation study and real data analysis. We also investigate the problem of estimating the Area Under the concentration Curve (AUC), for which we introduce a new kernel estimator obtained by integrating the regression function estimator. We show, using a simulation study, that the proposed estimator outperforms the classical one in terms of Mean Squared Error. The crucial problem of finding the optimal sampling design for AUC estimation is investigated using the generalized simulated annealing algorithm.
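For reference, the Gasser-Müller estimator taken as the baseline in this thesis weights each response by the kernel mass over a cell around its design point. A minimal fixed-design sketch on [0, 1], assuming a Gaussian kernel:

```python
import numpy as np
from scipy.stats import norm

def gasser_muller(x0, t, y, h):
    """Gasser-Muller estimate at x0 for a fixed design 0 <= t_1 < ... < t_n <= 1:
    m_hat(x0) = sum_i y_i * integral_{s_{i-1}}^{s_i} K_h(x0 - u) du,
    where the s_i are midpoints between consecutive design points."""
    s = np.concatenate(([0.0], (t[:-1] + t[1:]) / 2, [1.0]))
    # kernel mass over each cell, from the Gaussian cdf:
    # Phi((x0 - s_{i-1}) / h) - Phi((x0 - s_i) / h)
    F = norm.cdf((x0 - s) / h)
    weights = F[:-1] - F[1:]
    return np.sum(weights * y)
```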
12

Sow, Mohamedou. "Développement de modèles non paramétriques et robustes : application à l’analyse du comportement de bivalves et à l’analyse de liaison génétique." Thesis, Bordeaux 1, 2011. http://www.theses.fr/2011BOR14257/document.

Full text
Abstract:
The development of robust and nonparametric approaches for the analysis and statistical treatment of high-dimensional data sets exhibiting high variability, as seen in the environmental and genetic fields, is instrumental. Here, we model complex biological data with application to the analysis of bivalves' behavior and to genetic linkage analysis. The application of mathematics to the analysis of mollusk bivalves' behavior allowed us to quantify and translate mathematically the animals' behavior in situ, in close or far field. We proposed a nonparametric regression model and compared three nonparametric estimators (recursive or not) of the regression function in order to select the best estimator. We then characterized biological rhythms, formalized the evolution of valve-opening states, proposed methods able to discriminate behaviors, used shot-noise analysis to characterize various transitory opening/closing states, and developed an original approach for measuring growth online. In genetics, we proposed a more general framework of robust statistics for linkage analysis. We developed estimators robust to distributional assumptions and to the presence of outlier observations. We also used a statistical approach in which the dependence between random variables is specified through copula theory. Our main results show the practical interest of these estimators on real QTL and eQTL data.
13

Amiri, Aboubacar. "Estimateurs fonctionnels récursifs et leurs applications à la prévision." PhD thesis, Université d'Avignon, 2010. http://tel.archives-ouvertes.fr/tel-00565221.

Full text
Abstract:
In this thesis we are interested in nonparametric estimation methods based on recursive kernels, and in their application to forecasting. In a first chapter we introduce a family of recursive density estimators indexed by a parameter ℓ ∈ [0, 1]. Their asymptotic behavior as a function of ℓ leads us to introduce comparison criteria based on asymptotic bias, variance and mean squared error. For these criteria, we compare the estimators with each other and also compare our family to the non-recursive Parzen-Rosenblatt density estimator. Then, from our family of density estimators, we define a family of recursive kernel estimators of the regression function and study its asymptotic properties as a function of the parameter ℓ. Finally, we use the results obtained on regression estimation to construct a nonparametric kernel predictor, yielding a family of nonparametric predictors that considerably reduce computation time. Application examples are given to validate the performance of our estimators.
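A sketch of the recursive construction: each new observation updates the estimate at O(1) cost per evaluation point. The parametrization below, in which ℓ = 1 recovers the Wolverton-Wagner estimator and ℓ = 0 the Deheuvels estimator, is one common form consistent with the abstract; the bandwidth sequence h_i = i^(-1/5) is an assumption:

```python
import numpy as np

class RecursiveKDE:
    """Recursive kernel density estimator indexed by l in [0, 1]:
    f_n(x) = (sum_i h_i^(1-l))^(-1) * sum_i h_i^(-l) * K((x - X_i) / h_i).
    Both running sums are updated in place, so old observations never
    need to be revisited."""

    def __init__(self, grid, l=0.5):
        self.grid = np.asarray(grid, dtype=float)
        self.l, self.i = l, 0
        self.num = np.zeros_like(self.grid)  # running weighted kernel sum
        self.den = 0.0                       # running sum of h_i^(1-l)

    def update(self, x_new):
        self.i += 1
        h = self.i ** (-1 / 5)               # assumed bandwidth sequence
        k = np.exp(-0.5 * ((self.grid - x_new) / h) ** 2) / np.sqrt(2 * np.pi)
        self.num += h ** (-self.l) * k
        self.den += h ** (1 - self.l)

    def density(self):
        return self.num / self.den
```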
14

Ternynck, Camille. "Contributions à la modélisation de données spatiales et fonctionnelles : applications." Thesis, Lille 3, 2014. http://www.theses.fr/2014LIL30062/document.

Full text
Abstract:
In this dissertation, we are interested in nonparametric modeling of spatial and/or functional data, based more specifically on the kernel method. In general, the samples we consider for establishing the asymptotic properties of the proposed estimators are constituted of dependent variables. The specificity of the studied methods lies in the fact that the estimators take into account the dependence structure of the data under consideration. In a first part, we study real-valued, spatially dependent variables. We propose a new kernel approach to estimating the spatial probability density and regression functions as well as the mode. The distinctive feature of this approach is that it takes into account both the proximity between observations and the proximity between sites. We study the asymptotic behavior of the proposed estimators as well as their application to simulated and real data. In a second part, we are interested in modeling data valued in a space of infinite dimension, so-called "functional data". As a first step, we adapt the nonparametric regression model introduced in the first part to the framework of spatially dependent functional data, and obtain asymptotic as well as numerical results. Then, we study a time series regression model in which the explanatory variables are functional and the innovation process is autoregressive; we propose a procedure which allows us to take into account the information contained in the error process. After showing the asymptotic behavior of the proposed kernel estimate, we study its performance on simulated and real data. The third part is devoted to applications. First of all, we present unsupervised classification results for simulated and real (multivariate) spatial data. The classification method considered is based on the estimation of the spatial mode, obtained from the spatial density function introduced in the first part of this thesis. Then, we apply this mode-based classification method, as well as other unsupervised classification methods from the literature, to hydrological data of a functional nature. Lastly, this classification of hydrological data leads us to apply change-point detection tools to these functional data.
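The double weighting described in the first part (one kernel on the distance between covariate values, another on the distance between sites) can be sketched as a spatial Nadaraya-Watson estimator; the Gaussian kernels and shared bandwidths are assumptions:

```python
import numpy as np

def spatial_nw(x0, s0, X, S, Y, h_x, h_s):
    """Spatial Nadaraya-Watson estimate at covariate x0 and site s0: the weight
    of observation i combines proximity in the covariate, |X_i - x0|, and
    proximity between sites, ||S_i - s0||."""
    wx = np.exp(-0.5 * ((X - x0) / h_x) ** 2)
    ws = np.exp(-0.5 * (np.linalg.norm(S - s0, axis=1) / h_s) ** 2)
    w = wx * ws
    return np.sum(w * Y) / np.sum(w)
```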
15

Tencaliec, Patricia. "Developments in statistics applied to hydrometeorology : imputation of streamflow data and semiparametric precipitation modeling." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAM006/document.

Full text
Abstract:
Precipitation and streamflow are the two most important meteorological and hydrological variables for the analysis of river watersheds. They provide fundamental insights for water resources management, design, and planning, such as urban water supply, hydropower, forecasting of flood or drought events, and irrigation systems for agriculture. In this PhD thesis we approach two different problems. The first originates from the study of observed streamflow data. In order to properly characterize the overall behavior of a watershed, long datasets spanning tens of years are needed. However, the quality of the measurement dataset decreases the further back in time we go, and blocks of data of different lengths are missing. These missing intervals represent a loss of information and can cause erroneous summary data interpretation or unreliable scientific analysis. The method we propose for approaching the problem of streamflow imputation is based on dynamic regression models (DRMs), more specifically a multiple linear regression with ARIMA modeling of the residuals. Unlike previous studies that address either the inclusion of multiple explanatory variables or the modeling of the residuals from a simple linear regression, the use of DRMs allows both aspects to be taken into account. We apply this method to reconstruct the data of eight stations situated in the Durance watershed in the south-east of France, each containing daily streamflow measurements over a period of 107 years. By applying the proposed method, we manage to reconstruct the data without making use of additional variables, as other models require. We compare the results of our model with those obtained from a complex approach based on analogs coupled to a hydrological model and from a nearest-neighbor approach, respectively. In the majority of cases, DRMs show increased performance when reconstructing missing blocks of various lengths, in some cases ranging up to 20 years. The second problem we approach in this PhD thesis addresses the statistical modeling of precipitation amounts. The research area on this topic is currently very active, as the distribution of precipitation is heavy-tailed and, at the moment, there is no general method for modeling the entire range of data with high performance. Recently, in order to model full-range precipitation amounts, a new class of distributions called the extended generalized Pareto distribution (EGPD) was introduced, with focus on EGPD models based on parametric families. These models provide improved performance compared to previously proposed distributions; however, they lack flexibility in modeling the bulk of the distribution. We improve on this aspect by proposing, in the second part of the thesis, two new models relying on semiparametric methods. The first is a transformed kernel estimator based on the EGPD transformation: we first transform the data with the EGPD cdf and then estimate the density of the transformed data by applying a nonparametric kernel density estimator. We compare the results of the proposed method with those obtained by applying EGPD on several simulated scenarios, as well as on two precipitation datasets from the south-east of France. The results show that the proposed method behaves better than the parametric EGPD, the MIAE of the density being in all cases almost twice as small. The second approach is a new model from the general EGPD class, i.e., a semiparametric EGPD based on Bernstein polynomials, more specifically a sparse mixture of beta densities. Once again, we compare our results with those obtained by EGPD on both simulated and real datasets. As before, the MIAE of the density is considerably reduced, this effect being even more obvious as the sample size increases.
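The first estimator has a compact recipe: push the data through the fitted cdf, smooth the transformed sample on [0, 1], and pull back by the chain rule. The sketch below substitutes scipy's plain generalized Pareto fit for the EGPD, which is an assumption; the fitted EGPD cdf would play the same role, and no boundary correction is applied:

```python
import numpy as np
from scipy.stats import genpareto, gaussian_kde

def transformed_kernel_density(data):
    """Transformed kernel estimator: f_hat(x) = k_hat(F(x)) * f(x), where F and f
    are the fitted parametric cdf/pdf and k_hat is a kernel density estimate of
    the transformed sample U_i = F(X_i) on [0, 1]."""
    params = genpareto.fit(data, floc=0.0)   # stand-in for the EGPD fit
    u = genpareto.cdf(data, *params)
    k_hat = gaussian_kde(u)                  # boundary correction omitted
    def f_hat(x):
        return k_hat(genpareto.cdf(x, *params)) * genpareto.pdf(x, *params)
    return f_hat
```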
16

Azaïs, Romain. "Estimation non paramétrique pour les processus markoviens déterministes par morceaux." PhD thesis, Université Sciences et Technologies - Bordeaux I, 2013. http://tel.archives-ouvertes.fr/tel-00844395.

Full text
Abstract:
M.H.A. Davis introduced piecewise deterministic Markov processes (PDMPs) as a general class of non-diffusive stochastic models, giving rise to deterministic trajectories punctuated, at random times, by random jumps. In this thesis, we present and analyze nonparametric estimators of the conditional distributions of the two sources of randomness involved in the dynamics of such processes. More precisely, in the framework of a long-time observation of a PDMP trajectory, we present estimators of the conditional density of the inter-jump times and of the Markov kernel governing the law of the jumps. We establish convergence results for our estimators, and numerical simulations for various applications illustrate these results. We also propose an estimator of the jump rate for renewal processes, as well as a numerical approximation method for a semiparametric regression model.
17

Kowolowski, Alexander. "Vývoj moderních akustických parametrů kvantifikujících hypokinetickou dysartrii." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-401990.

Full text
Abstract:
This work deals with the design and testing of new acoustic features for the analysis of the dysprosodic speech occurring in patients with hypokinetic dysarthria. 41 new features for dysprosody quantification (describing melody, loudness, rhythm and pace) are presented and tested. The new features can be divided into 7 groups; within each group, the features differ in the statistical values used. The first four groups are based on absolute differences and cumulative sums of the fundamental frequency and the short-time energy of the signal. The fifth group contains features based on multiples of the fundamental frequency and short-time energy, combined into one global intonation feature. The sixth group contains global time features, formed as ratios between conventional rhythm and pace features. The last group contains global features for the quantification of overall dysprosody, formed as ratios between the global intonation and global time features. All features were tested on the Czech Parkinsonian speech database PARCZ. First, a kernel density estimate was computed and plotted for each feature. Then a correlation analysis with medical metadata was performed, first for all features, then for the global features only. Next, classification and regression analyses were carried out using the classification and regression trees (CART) algorithm, first for each feature separately, then for all the data at once; finally, sequential floating feature selection was applied to find the best-fitting combination of features for the task at hand. Although no single feature emerged as universally best, a few features appeared among the best repeatedly, and a large drop between the best and second-best feature often marked the former as much better suited to the given task than the rest of those tested. The results are presented in the conclusion, together with a discussion.
18

Aloui, Nadia. "Localisation sonore par retournement temporel." Thesis, Grenoble, 2014. http://www.theses.fr/2014GRENT079/document.

Full text
Abstract:
The objective of this PhD is to propose a localization solution that is simple and robust to the multipath propagation that characterizes indoor environments. First, a localization system that exploits the time domain of the channel parameters has been proposed. The system adopts the time of arrival of the path of maximum amplitude as a signature and estimates the target position through nonparametric kernel regression. The system was evaluated in experiments for two main configurations: a privacy-oriented configuration with code-division multiple-access operation and a centralized configuration with time-division multiple-access operation. A comparison between our privacy-oriented system and another acoustic localization system based on code-division multiple-access operation and a lateration method confirms the results found in radiofrequency-based localization. However, our experiments are the first to demonstrate the detrimental effect that reverberation has on acoustic localization approaches. Second, a localization system based on the time reversal technique, able to localize several sources simultaneously with different location precisions, has been tested through simulations for different numbers of sources and then validated by experiments. Finally, we have been interested in reducing the audibility of the localization signal through psychoacoustics. A filter, set from the absolute threshold of hearing, is applied to the signal. Our results show an improvement in precision compared to the localization system without the psychoacoustic model, thanks to the use of a matched filter at the receiver. Moreover, we notice a significant reduction in the audibility of the filtered signal compared to that of the original signal.
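The position estimate described here is, at its core, a kernel-weighted average of training positions, keyed on the distance between the observed time-of-arrival signature and the stored fingerprints. A minimal sketch, assuming a Gaussian kernel:

```python
import numpy as np

def kernel_locate(toa, fingerprints, positions, h):
    """Estimate a position as the Nadaraya-Watson average of the training
    positions, weighted by the similarity of the observed time-of-arrival
    signature `toa` to each stored fingerprint (one row per training point)."""
    d = np.linalg.norm(fingerprints - toa, axis=1)
    w = np.exp(-0.5 * (d / h) ** 2)
    return (w[:, None] * positions).sum(axis=0) / w.sum()
```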
APA, Harvard, Vancouver, ISO, and other styles
19

Somé, Sobom Matthieu. "Estimations non paramétriques par noyaux associés multivariés et applications." Thesis, Besançon, 2015. http://www.theses.fr/2015BESA2030/document.

Full text
Abstract:
This work concerns a nonparametric approach using multivariate mixed associated kernels to estimate density, probability mass and regression functions whose supports are partially or totally discrete or continuous. Some key aspects of kernel estimation using multivariate continuous (classical) kernels and univariate (discrete and continuous) associated kernels are first recalled. Problems of support are revisited, together with a resolution of boundary effects for univariate associated kernels. The multivariate associated kernel is then defined and a construction by the multivariate mode-dispersion method is provided, leading to an illustration of the continuous case on the bivariate beta kernel with Sarmanov's correlation structure. Properties of these estimators, such as bias, variance and mean squared error, are studied. An algorithm for reducing the bias is proposed and illustrated on this bivariate beta kernel, and simulation studies and applications with it are presented. Three types of bandwidth matrices, namely full, Scott and diagonal, are used, and their relative performances are discussed. Furthermore, appropriate multiple associated kernels are used in a practical discriminant analysis task; these are the binomial, categorical, discrete triangular, gamma and beta kernels. Thereafter, associated kernels with and without correlation structure are used in multiple regression; in addition to the previous univariate associated kernels, bivariate beta kernels with and without correlation structure are taken into account. Simulation studies show the good performance of multivariate associated kernels with full or diagonal bandwidth matrices. Then, discrete and continuous associated kernels are combined to define mixed univariate associated kernels, and, using tools that unify discrete and continuous analysis, the properties of the mixed associated kernel estimators are established. This is followed by an R package, created for the univariate case, for density, probability mass function and regression estimation; several smoothing-parameter selection methods are implemented behind an easy-to-use interface. Throughout the work, bandwidth matrix selection is generally carried out by cross-validation and sometimes by Bayesian methods. Finally, additional material on the normalizing constants of associated kernel estimators of density and probability mass functions is presented.
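A minimal sketch of the boundary-respecting idea behind associated kernels follows, here in the simplest setting of a univariate beta kernel on [0, 1]: the smoothing kernel attached to a target point x is a beta density whose shape depends on x, so no mass leaks outside the support. The Beta(x/b + 1, (1 - x)/b + 1) parameterization and the bandwidth b are assumptions for illustration, not the thesis's multivariate construction.

```python
import numpy as np
from scipy.stats import beta

def beta_kernel_density(data, x, b=0.05):
    """Univariate beta (associated) kernel density estimate on [0, 1].

    The kernel at target point x is the Beta(x/b + 1, (1 - x)/b + 1) density
    evaluated at the data, so the support [0, 1] is respected without any
    boundary correction.  b and this parameterization are illustrative.
    """
    x = np.atleast_1d(x)[:, None]                  # target points, as a column
    return beta.pdf(data[None, :], x / b + 1.0, (1.0 - x) / b + 1.0).mean(axis=1)

rng = np.random.default_rng(0)
sample = rng.beta(2.0, 5.0, size=500)              # data supported on [0, 1]
grid = np.linspace(0.0, 1.0, 5)
print(beta_kernel_density(sample, grid))
```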
APA, Harvard, Vancouver, ISO, and other styles
20

Ahmed, Mohamed Salem. "Contribution à la statistique spatiale et l'analyse de données fonctionnelles." Thesis, Lille 3, 2017. http://www.theses.fr/2017LIL30047/document.

Full text
Abstract:
This thesis is about statistical inference for spatial and/or functional data. Indeed, we are interested in estimation of unknown parameters of some models from random or nonrandom (stratified) samples composed of independent or spatially dependent variables. The specificity of the proposed methods lies in the fact that they take into consideration the nature of the sample considered (stratified or spatial). We begin by studying data valued in a space of infinite dimension, or so-called "functional data". First, we study a functional binary choice model explored in a case-control or choice-based sample design context. The specificity of this study is that the proposed method takes into account the sampling scheme. We describe a conditional likelihood function under the sampling distribution and a reduction of dimension strategy to define a feasible conditional maximum likelihood estimator of the model. Asymptotic properties of the proposed estimates as well as their application to simulated and real data are given. Secondly, we explore a functional linear autoregressive spatial model whose particularity is the functional nature of the explanatory variable and the structure of the spatial dependence. The estimation procedure consists of reducing the infinite dimension of the functional variable and maximizing a quasi-likelihood function. We establish the consistency and asymptotic normality of the estimator. The usefulness of the methodology is illustrated via simulations and an application to some real data. In the second part of the thesis, we address some estimation and prediction problems for real-valued random spatial variables. We start by generalizing the k-nearest neighbors method, namely k-NN, to predict a spatial process at non-observed locations using some covariates. The specificity of the proposed k-NN predictor lies in the fact that it is flexible and allows for heterogeneity in the covariate. We establish the almost complete convergence with rates of the spatial predictor, whose performance is demonstrated by an application over simulated and environmental data. In addition, we generalize the partially linear probit model for independent data to the spatial case. We use a linear process for the disturbances, allowing various spatial dependencies, and propose a semiparametric estimation approach based on weighted likelihood and generalized method of moments. We establish the consistency and asymptotic distribution of the proposed estimators and investigate their finite-sample performance on simulated data. We end with an application of spatial binary choice models to identify UADT (upper aerodigestive tract) cancer risk factors in the north region of France, which displays the highest rates of such cancer incidence and mortality in the country.
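To make the k-NN spatial prediction idea concrete, here is a deliberately simplified sketch that ranks neighbours by a blend of geographic distance and covariate distance before averaging the k nearest observations. The blended metric, its mixing weight alpha, and k are illustrative assumptions rather than the thesis's exact heterogeneity-aware predictor.

```python
import numpy as np

def knn_spatial_predict(sites, covs, values, site0, cov0, k=5, alpha=0.5):
    """k-NN prediction of a spatial process at an unobserved site.

    Neighbours are ranked by a normalized blend of geographic distance and
    covariate distance; the prediction is the mean of the k nearest values.
    The blended metric, alpha and k are illustrative choices only.
    """
    d_geo = np.linalg.norm(sites - site0, axis=1)
    d_cov = np.abs(covs - cov0)
    rank = alpha * d_geo / d_geo.max() + (1 - alpha) * d_cov / d_cov.max()
    nearest = np.argsort(rank)[:k]
    return values[nearest].mean()

rng = np.random.default_rng(1)
sites = rng.uniform(0, 10, size=(200, 2))       # observation locations
covs = rng.normal(size=200)                     # one covariate per site
values = np.sin(sites[:, 0]) + 0.5 * covs + rng.normal(scale=0.1, size=200)
print(knn_spatial_predict(sites, covs, values, np.array([5.0, 5.0]), 0.2))
```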
APA, Harvard, Vancouver, ISO, and other styles
21

Lu, Fan. "Regularized nonparametric logistic regression and kernel regularization." 2006. http://www.library.wisc.edu/databases/connect/dissertations.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Lien, Ya-Ting, and 連雅亭. "Nonparametric Kernel Regression Estimation inDeterminants of Religious Giving." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/04580167685015635070.

Full text
Abstract:
Master's thesis
National Taiwan University
Graduate Institute of Political Science
104 (ROC year, i.e., 2015)
Parametric estimation methods impose several assumptions on the population and the data; in practice, researchers often have to ignore violations of these assumptions. Nonparametric methods require far fewer assumptions and can yield a better-fitting model for the data. Because applications of nonparametric methods in studies of religious giving are rare, this study introduces nonparametric kernel regression to estimate religious giving in Taiwan over 2013-2014. Comparing multiple linear regression, Tobit regression and nonparametric kernel regression, we find that the kernel regression model gives the best fit and the smallest residual standard error (RSE). Moreover, the significance of the coefficients in the kernel regression differs considerably from that in the multiple regression and Tobit models.
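A toy version of the kind of comparison reported here: a linear model and a Nadaraya-Watson kernel regression are fitted to simulated data and compared by residual standard error. The Gaussian kernel, the bandwidth, the crude degrees-of-freedom correction, and the data-generating process are all assumptions for illustration.

```python
import numpy as np

def nw_fit(x, y, h=0.1):
    """Nadaraya-Watson fitted values with a Gaussian kernel (illustrative)."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 300)                       # stand-in predictor, e.g. income
y = np.exp(2.0 * x) + rng.normal(scale=0.5, size=300)   # nonlinear giving amount

yhat_lin = np.polyval(np.polyfit(x, y, 1), x)    # multiple-regression stand-in
yhat_ker = nw_fit(x, y)                          # kernel regression fit
for name, yhat in [("linear", yhat_lin), ("kernel", yhat_ker)]:
    # Crude df correction of 2 for both fits, for comparability only.
    rse = np.sqrt(np.sum((y - yhat) ** 2) / (len(y) - 2))
    print(name, round(float(rse), 3))
```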
APA, Harvard, Vancouver, ISO, and other styles
23

Chih-Lung, Lu, and 陸治隆. "Double Smoothing of Kernel Estimator in Nonparametric Regression." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/88736964907361472522.

Full text
Abstract:
Master's thesis
Tamkang University
Department of Mathematics
90 (ROC year, i.e., 2001)
In the nonparametric regression model, ideally estimating the regression function over the whole support of the design density requires computing the kernel estimate of the regression function at every point of that support; the resulting estimate is called the ideal regression function estimate in this paper. In practice, the regression function is estimated at a sequence of equally spaced partition points of the real line, and every two consecutive kernel estimates are joined by a straight line segment, so the practical estimate is of polygon type. Estimates of this type were studied by Jones (1989) and Deng and Chu (1999), where they are called interpolated kernel regression estimates. Both the asymptotic bias and the IMSE of the polygon are worse than those of the ideal regression function estimate. To remedy this, Yen, Wu and Cheng (2001) proposed a quadratic interpolated regression estimate based on three points and used it to construct a new estimator that outperforms the local linear kernel estimate. In this paper, we propose two new versions of the Yen, Wu and Cheng estimator: one fits a piece of quadratic curve through four points, and the other is a moving quadratic kernel estimate. We prove that the asymptotic bias of the four-point quadratic-curve estimate equals that of the ideal regression function estimate, while its asymptotic variance is smaller. Simulation studies show that the moving quadratic kernel estimate with four points outperforms the second-order kernel estimate and the moving quadratic kernel estimate with three points; for samples smaller than 200, our method also outperforms the fourth-order kernel estimate.
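The contrast between the ideal estimate and the polygon-type interpolated estimate can be sketched numerically: the kernel estimator is evaluated on a fine grid (standing in for the "ideal" target) and on a coarse grid whose values are then joined by straight line segments. The kernel, bandwidth and grid sizes below are illustrative assumptions.

```python
import numpy as np

def nw(x_eval, x, y, h=0.08):
    """Nadaraya-Watson estimate at x_eval (Gaussian kernel; illustrative)."""
    w = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 400))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=400)

fine = np.linspace(0, 1, 1001)           # "ideal": kernel estimate everywhere
ideal = nw(fine, x, y)

knots = np.linspace(0, 1, 21)            # polygon: coarse grid of kernel values
polygon = np.interp(fine, knots, nw(knots, x, y))   # joined by line segments

print(np.max(np.abs(ideal - polygon)))   # interpolation error of the polygon
```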
APA, Harvard, Vancouver, ISO, and other styles
24

Schindler, Anja. "Bandwidth Selection in Nonparametric Kernel Estimation." Thesis, 2011. http://hdl.handle.net/11858/00-1735-0000-0006-AFD4-4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Lin, Yung-Li, and 林永立. "The Application of Fourier Series And Kernel Estimators in Nonparametric Regression Analysis." Thesis, 1999. http://ndltd.ncl.edu.tw/handle/52714711274185599593.

Full text
Abstract:
Master's thesis
National Chung Hsing University
Department of Agronomy
87 (ROC year, i.e., 1998)
One objective of regression analysis is to investigate the relationship between the design points tj and the expected value of the response variable yj for a data set {(tj, yj)}, j = 1, ..., n. In general, analyses can be classified as parametric or nonparametric, the main distinction being whether the functional form of the regression function μ is known. A parametric regression model assumes that the form of μ is known. In nonparametric regression analysis, observations yj in the neighborhood of a design point t are used to estimate μ(t); the simplest approach is to take a weighted average of these observations, and both the classical Fourier series estimator and kernel estimators are weighted averages of this kind. When the observed data exhibit periodic behavior, a linear model including sines and cosines is usually employed to estimate the regression function μ; the classical Fourier series estimator is of this type. However, if μ does not satisfy periodic boundary conditions, criteria based on the mean squared error, such as CV or GCV, used to choose the number of trigonometric functions in the regression will select too many terms and the fitted function will be wiggly. Combining low-order polynomial and trigonometric terms alleviates this problem and yields a smooth curve. For data on the interval [0, 1], a kernel estimator may also be used to estimate the regression function. The kernel estimator is biased: under a specified kernel function, a smaller bandwidth gives small bias but large variance, so the estimator undersmooths; conversely, a large bandwidth gives large bias but small variance, so the estimator oversmooths. When σ² is known, risk function criteria can be employed to trade off bias against variance and to choose a suitable value of the smoothing parameter; when σ² is unknown, the CV and GCV criteria can be used instead. Besides fitting data, kernel estimation is also applicable to estimating the median effective dose (ED50) and to constructing approximate confidence intervals for the dose-response curve in bioassay. Finney (1978) used data in which nine doses of an insulin preparation were administered to mice and the numbers of mice showing symptoms of convulsions were recorded. The fit of probit and logit models to this data set was examined by Pearson's chi-square test, and the ED50 estimates and 95% confidence intervals from the parametric method and from kernel estimation were compared: the kernel ED50 estimate is larger than the parametric one, and the confidence interval constructed by kernel estimation is wider. Additionally, for data on treating Meloidogyne incognita with carbofuran, a lack-of-fit test showed that neither the probit nor the logit model was appropriate. The ED50 estimate from the trimmed Spearman-Karber method was compared with the kernel estimate: the two ED50 estimates differ little, and the kernel-based confidence interval for the ED50 is narrower.
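A small sketch of the combined polynomial-plus-trigonometric fit described above, with the number of sine/cosine pairs chosen by GCV. The polynomial degree, the maximum frequency and the simulated (non-periodic) data are illustrative assumptions.

```python
import numpy as np

def trig_poly_design(t, n_freq, poly_deg=1):
    """Design matrix: low-order polynomial plus n_freq sine/cosine pairs."""
    cols = [t ** d for d in range(poly_deg + 1)]
    for j in range(1, n_freq + 1):
        cols += [np.cos(2 * np.pi * j * t), np.sin(2 * np.pi * j * t)]
    return np.column_stack(cols)

def gcv_fit(t, y, max_freq=10):
    """Pick the number of frequencies by generalized cross-validation (GCV)."""
    best = None
    for m in range(max_freq + 1):
        X = trig_poly_design(t, m)
        H = X @ np.linalg.pinv(X)                 # hat matrix of the LS fit
        resid = y - H @ y
        gcv = len(y) * (resid @ resid) / (len(y) - np.trace(H)) ** 2
        if best is None or gcv < best[0]:
            best = (gcv, m)
    return best

rng = np.random.default_rng(4)
t = np.linspace(0, 1, 200)
y = 2 * t + np.sin(4 * np.pi * t) + rng.normal(scale=0.2, size=200)  # non-periodic
print(gcv_fit(t, y))        # (GCV score, chosen number of trig pairs)
```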
APA, Harvard, Vancouver, ISO, and other styles
26

Mao, Kai. "Nonparametric Bayesian Models for Supervised Dimension Reduction and Regression." Diss., 2009. http://hdl.handle.net/10161/1581.

Full text
Abstract:

We propose nonparametric Bayesian models for supervised dimension reduction and regression problems. Supervised dimension reduction is a setting where one needs to reduce the dimensionality of the predictors or find the dimension reduction subspace and lose little or no predictive information. Our first method retrieves the dimension reduction subspace in the inverse regression framework by utilizing a dependent Dirichlet process that allows for natural clustering for the data in terms of both the response and predictor variables. Our second method is based on ideas from the gradient learning framework and retrieves the dimension reduction subspace through coherent nonparametric Bayesian kernel models. We also discuss and provide a new rationalization of kernel regression based on nonparametric Bayesian models allowing for direct and formal inference on the uncertain regression functions. Our proposed models apply to high dimensional cases where the number of variables far exceeds the sample size, and hold for both the classical setting of Euclidean subspaces and the Riemannian setting where the marginal distribution is concentrated on a manifold. Our Bayesian perspective adds appropriate probabilistic and statistical frameworks that allow for rich inference such as uncertainty estimation, which is important for assessing the estimates. Formal probabilistic models with likelihoods and priors are given, and efficient posterior sampling can be obtained by Markov chain Monte Carlo methodologies, particularly Gibbs sampling schemes. For supervised dimension reduction, as the posterior draws are linear subspaces, which are points on a Grassmann manifold, we do the posterior inference with respect to geodesics on the Grassmannian. The utility of our approaches is illustrated on simulated and real examples.
APA, Harvard, Vancouver, ISO, and other styles
27

Chang, Po-Jen, and 張博仁. "A Nonparametric Approach to Pricing and Hedging MBS Via Kernel-Density Regression Model." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/34145936749136671434.

Full text
Abstract:
Master's thesis
National Chung Cheng University
Graduate Institute of Finance
90 (ROC year, i.e., 2001)
The Financial Asset Securitization Law was passed by Taiwan's legislature in June 2002. The law is expected to address stagnancy in Taiwan's capital markets and could potentially reinvigorate domestic banks by allowing their asset-backed loans to be packaged into securities. It thus facilitates a mechanism whereby banks can repackage collateral and sell it to investors as securities, increasing banks' liquidity and diversifying their risks. Financial assets such as mortgages, car loans or credit card receivables can be repackaged into small denominations in the form of security certificates or beneficiary certificates for sale to investors. Among such products, mortgage-backed securities (MBS) are the most popular in the US market. The usual way of solving the valuation problem has been to assume a stochastic process for term structure movements and to employ either a simulation/forecasting pricing approach or an empirical/statistical approach to prepayment behavior and the price process. In this article, we propose a nonparametric pricing method, a kernel-density regression approach, to price weekly TBA (to-be-announced) GNMA securities. We have three goals: first, to find the best way to reduce the number of independent variables used in the kernel model and the other models, and to identify the remaining inputs; second, to assess the pricing performance of the kernel-density regression approach against other pricing models; and third, to examine the hedging effectiveness of the kernel-density regression approach and the other models. For comparison, we use two other popular pricing approaches: ordinary least squares (OLS) and a parametric model (a proprietary practitioner model). According to the empirical results, the kernel-density regression model estimates MBS prices more effectively than the other two models, except out-of-sample under time-series sampling. Moreover, the kernel-density regression model prices better under random sampling than under time-series sampling, especially out-of-sample. In addition, the SAS MAXR procedure and principal component analysis can effectively reduce the number of independent variables used in both the kernel-density regression model and the OLS model. As for the hedging results, the in-sample findings are broadly consistent with the pricing analysis; out-of-sample, however, the kernel model based on the 3-month rate is the best way to hedge the MBS, especially under random sampling.
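As an illustration of the kernel-density regression idea with a reduced input set, the sketch below compresses several simulated term-structure variables by principal components and then applies a multivariate Nadaraya-Watson smoother to price a synthetic security. The data-generating process, kernel and bandwidth are assumptions; none of this reproduces the paper's proprietary inputs.

```python
import numpy as np

def nw_multivariate(X_train, y_train, X_new, h=1.0):
    """Multivariate Nadaraya-Watson estimate with a Gaussian product kernel."""
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-0.5 * d2 / h**2)
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(5)
Z = rng.normal(size=(300, 6))                   # stand-in yield-curve variables
price = 100 - 2 * Z[:, 0] + 0.5 * Z[:, 1] ** 2 + rng.normal(scale=0.2, size=300)

# Reduce the input dimension by principal components before smoothing.
Zc = Z - Z.mean(axis=0)
_, _, Vt = np.linalg.svd(Zc, full_matrices=False)
X = Zc @ Vt[:2].T                               # first two principal components
print(nw_multivariate(X[:250], price[:250], X[250:255]))
```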
APA, Harvard, Vancouver, ISO, and other styles
28

Savchuk, Olga. "Choosing a Kernel for Cross-Validation." 2009. http://hdl.handle.net/1969.1/ETD-TAMU-2009-08-7002.

Full text
Abstract:
The statistical properties of cross-validation bandwidths can be improved by choosing an appropriate kernel, which is different from the kernels traditionally used for cross-validation purposes. In the light of this idea, we developed two new methods of bandwidth selection, termed Indirect cross-validation and Robust one-sided cross-validation. The kernels used in the Indirect cross-validation method yield an improvement in the relative bandwidth convergence rate to n^(-1/4), which is substantially better than the n^(-1/10) rate of the least squares cross-validation method. The robust kernels used in the Robust one-sided cross-validation method eliminate the bandwidth bias for the case of regression functions with discontinuous derivatives.
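For orientation, here is a sketch of the baseline least squares cross-validation criterion that indirect and one-sided cross-validation build on. It uses the standard Gaussian kernel throughout, whereas the dissertation's contribution is precisely to replace the selection kernel with better ones; the grid and sample below are illustrative.

```python
import numpy as np

def lscv_score(h, x):
    """Least squares cross-validation score for a Gaussian-kernel KDE."""
    n = len(x)
    d = x[:, None] - x[None, :]
    phi = lambda u, s: np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2 * np.pi))
    int_f2 = phi(d, np.sqrt(2) * h).sum() / n**2     # integral of fhat^2
    k = phi(d, h)
    np.fill_diagonal(k, 0.0)                         # leave-one-out terms
    loo = k.sum() / (n * (n - 1))
    return int_f2 - 2.0 * loo

rng = np.random.default_rng(6)
x = rng.normal(size=400)
grid = np.linspace(0.05, 1.0, 40)
scores = [lscv_score(h, x) for h in grid]
print(grid[int(np.argmin(scores))])                  # LSCV bandwidth choice
```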
APA, Harvard, Vancouver, ISO, and other styles
29

Krugell, Marike. "Bias reduction studies in nonparametric regression with applications : an empirical approach / Marike Krugell." Thesis, 2014. http://hdl.handle.net/10394/15345.

Full text
Abstract:
The purpose of this study is to determine the effect of three improvement methods on nonparametric kernel regression estimators. The improvement methods are applied to the Nadaraya-Watson estimator with cross-validation bandwidth selection, the Nadaraya-Watson estimator with plug-in bandwidth selection, the local linear estimator with plug-in bandwidth selection and a bias-corrected nonparametric estimator proposed by Yao (2012). The different resulting regression estimates are evaluated by minimising a global discrepancy measure, i.e. the mean integrated squared error (MISE). In the machine learning context, various improvement methods exist for the precision and accuracy of an estimator. The first two improvement methods introduced in this study are bootstrap based. Bagging is an acronym for bootstrap aggregating and was introduced by Breiman (1996a) from a machine learning viewpoint and by Swanepoel (1988, 1990) in a functional context. Bagging is primarily a variance reduction tool, i.e. bagging is implemented to reduce the variance of an estimator and in this way improve the precision of the estimation process. Bagging is performed by drawing repeated bootstrap samples from the original sample and generating multiple versions of an estimator; these replicates of the estimator are then used to obtain an aggregated estimator. Bragging stands for bootstrap robust aggregating: a robust estimator is obtained by using the sample median over the B bootstrap estimates instead of the sample mean as in bagging. The third improvement method aims to reduce the bias component of the estimator and is referred to as boosting. Boosting is a general method for improving the accuracy of any given learning algorithm; it starts off with a sensible estimator and improves it iteratively, based on its performance on a training dataset. Results and conclusions verifying existing literature are provided, as well as new results for the new methods.
MSc (Statistics), North-West University, Potchefstroom Campus, 2015
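A compact sketch of bagging and bragging applied to a Nadaraya-Watson estimator, following the resampling recipe described in the abstract: B bootstrap replicates of the fit are aggregated by the mean (bagging) or the median (bragging). The kernel, bandwidth and B are illustrative choices, and the boosting step is omitted.

```python
import numpy as np

def nw(xg, x, y, h=0.1):
    """Nadaraya-Watson estimate on grid xg (Gaussian kernel; illustrative)."""
    w = np.exp(-0.5 * ((xg[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

def bagged_nw(xg, x, y, B=100, robust=False, seed=0):
    """Bagging (mean) or bragging (median) over B bootstrap NW estimates."""
    rng = np.random.default_rng(seed)
    n = len(x)
    fits = np.empty((B, len(xg)))
    for b in range(B):
        idx = rng.integers(0, n, n)        # bootstrap resample of the pairs
        fits[b] = nw(xg, x[idx], y[idx])
    return np.median(fits, axis=0) if robust else fits.mean(axis=0)

rng = np.random.default_rng(7)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=200)
xg = np.linspace(0.05, 0.95, 10)
print(bagged_nw(xg, x, y))                 # bagging
print(bagged_nw(xg, x, y, robust=True))    # bragging
```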
APA, Harvard, Vancouver, ISO, and other styles
30

Coufal, David. "Sekvenční metody Monte Carlo." Master's thesis, 2013. http://www.nusl.cz/ntk/nusl-324557.

Full text
Abstract:
Title: Sequential Monte Carlo Methods Author: David Coufal Department: Department of Probability and Mathematical Statistics Supervisor: prof. RNDr. Viktor Beneš, DrSc. Abstract: The thesis summarizes theoretical foundations of sequential Monte Carlo methods with a focus on the application in the area of particle filters; and basic results from the theory of nonparametric kernel density estimation. The summary creates the basis for investigation of application of kernel methods for approximation of densities of distributions generated by particle filters. The main results of the work are the proof of convergence of kernel estimates to related theoretical densities and the specification of the development of approximation error with respect to time evolution of a filter. The work is completed by an experimental part demonstrating the work of presented algorithms by simulations in the MATLAB® computational environment. Keywords: sequential Monte Carlo methods, particle filters, nonparametric kernel estimates
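To illustrate the combination studied in the thesis, the sketch below runs a one-dimensional bootstrap particle filter and forms a Gaussian-kernel density estimate of each filtering distribution. The state-space model, bandwidth and particle count are illustrative assumptions; the thesis's experiments are in MATLAB, but the same idea is sketched here in Python.

```python
import numpy as np

def particle_filter_kde(obs, n_part=2000, h=0.2, seed=0):
    """Bootstrap particle filter with a kernel estimate of each filter density.

    Assumed toy model: x_t = 0.9 x_{t-1} + N(0, 0.5^2), y_t = x_t + N(0, 1).
    Returns the evaluation grid and one kernel density estimate per step.
    """
    rng = np.random.default_rng(seed)
    xg = np.linspace(-5, 5, 201)
    particles = rng.normal(0.0, 1.0, n_part)
    densities = []
    for y in obs:
        particles = 0.9 * particles + rng.normal(0.0, 0.5, n_part)  # propagate
        logw = -0.5 * (y - particles) ** 2                          # weight
        w = np.exp(logw - logw.max()); w /= w.sum()
        particles = rng.choice(particles, size=n_part, p=w)         # resample
        kde = np.exp(-0.5 * ((xg[:, None] - particles) / h) ** 2)
        densities.append(kde.mean(axis=1) / (h * np.sqrt(2 * np.pi)))
    return xg, densities

xg, dens = particle_filter_kde(obs=[0.3, -0.1, 0.8])
print(dens[-1].max())      # mode height of the last filtering density estimate
```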
APA, Harvard, Vancouver, ISO, and other styles
31

Schutte, Willem Daniël. "Nonparametric estimation of the off-pulse interval(s) of a pulsar light curve / Willem Daniël Schutte." Thesis, 2014. http://hdl.handle.net/10394/12199.

Full text
Abstract:
The main objective of this thesis is the development of a nonparametric sequential estimation technique for the off-pulse interval(s) of a source function originating from a pulsar. It is important to identify the off-pulse interval of each pulsar accurately, since the properties of the off-pulse emissions are further researched by astrophysicists in an attempt to detect potential emissions from the associated pulsar wind nebula (PWN). The identification technique currently used in the literature is subjective in nature, since it is based on the visual inspection of the histogram estimate of the pulsar light curve. The developed nonparametric estimation technique is not only objective in nature, but also accurate in the estimation of the off-pulse interval of a pulsar, as evident from the simulation study and the application of the developed technique to observed pulsar data. The first two chapters of this thesis are devoted to a literature study that provides background information on the pulsar environment and gamma-ray astronomy, together with an explanation of the on-pulse and off-pulse interval of a pulsar and the importance thereof for the present study. This is followed by a discussion on some fundamental circular statistical ideas, as well as an overview of kernel density estimation techniques. These two statistical topics are then united in order to illustrate kernel density estimation techniques applied to circular data, since this concept is the starting point of the developed nonparametric sequential estimation technique. Once the basic theoretical background of the pulsar environment and circular kernel density estimation has been established, the new sequential off-pulse interval estimator is formulated. The estimation technique will be referred to as "SOPIE". A number of tuning parameters form part of SOPIE, and therefore the performed simulation study not only serves as an evaluation of the performance of SOPIE, but also as a mechanism to establish which tuning parameter configurations consistently perform better than some other configurations. In conclusion, the optimal parameter configurations are utilised in the application of SOPIE to pulsar data. For several pulsars, the sequential off-pulse interval estimators are compared to the off-pulse intervals published in research papers, which were identified with the subjective "eye-ball" technique. It is found that the sequential off-pulse interval estimators are closely related to the off-pulse intervals identified with subjective visual inspection, with the benefit that the estimated intervals are objectively obtained with a nonparametric estimation technique.
PhD (Statistics), North-West University, Potchefstroom Campus, 2014
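The circular kernel density estimation that SOPIE starts from can be sketched with a von Mises kernel, a standard choice on the circle. The concentration parameter (the circular analogue of an inverse bandwidth) and the toy light curve below are illustrative assumptions; the sequential interval-estimation logic of SOPIE itself is not reproduced here.

```python
import numpy as np

def vonmises_kde(phases, grid, kappa=50.0):
    """Circular KDE of a pulsar light curve with a von Mises kernel.

    phases in [0, 1) are rotation phases; the wrap-around at phase 1 is
    handled naturally by the kernel.  kappa = 50 is an illustrative choice.
    """
    ang = 2 * np.pi * np.asarray(phases)
    g = 2 * np.pi * np.asarray(grid)[:, None]
    return np.exp(kappa * np.cos(g - ang)).mean(axis=1) / (2 * np.pi * np.i0(kappa))

rng = np.random.default_rng(8)
# Toy light curve: a pulse near phase 0.3 on top of a uniform background.
pulse = (rng.vonmises(2 * np.pi * 0.3, 20.0, 300) / (2 * np.pi)) % 1.0
phases = np.concatenate([pulse, rng.uniform(0.0, 1.0, 700)])
grid = np.linspace(0.0, 1.0, 200, endpoint=False)
dens = vonmises_kde(phases, grid)
print(grid[np.argmin(dens)])      # crude centre of the off-pulse region
```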
APA, Harvard, Vancouver, ISO, and other styles
32

Krishnan, Sunder Ram. "Optimum Savitzky-Golay Filtering for Signal Estimation." Thesis, 2013. http://hdl.handle.net/2005/3293.

Full text
Abstract:
Motivated by the classic works of Charles M. Stein, we focus on developing risk-estimation frameworks for denoising problems in both one and two dimensions. We assume a standard additive noise model, and formulate the denoising problem as one of estimating the underlying clean signal from noisy measurements by minimizing a risk corresponding to a chosen loss function. Our goal is to incorporate perceptually-motivated loss functions wherever applicable, as in the case of speech enhancement, with the squared error loss being considered for the other scenarios. Since the true risks are observed to depend on the unknown parameter of interest, we circumvent the roadblock by deriving finite-sample unbiased estimators of the corresponding risks based on Stein's lemma. We establish the link between the multivariate parameter estimation problem addressed by Stein and our denoising problem, and derive estimators of the oracle risks. In all cases, optimum values of the parameters characterizing the denoising algorithm are determined by minimizing Stein's unbiased risk estimator (SURE). The key contribution of this thesis is the development of a risk-estimation approach for choosing the two critical parameters affecting the quality of nonparametric regression, namely, the order and bandwidth/smoothing parameters. This is a classic problem in statistics, and certain algorithms relying on derivation of suitable finite-sample risk estimators for minimization have been reported in the literature (note that all these works consider the mean squared error (MSE) objective). We show that a SURE-based formalism is well-suited to the regression parameter selection problem, and that the optimum solution guarantees near-minimum MSE (MMSE) performance. We develop algorithms for both globally and locally choosing the two parameters, the latter referred to as spatially-adaptive regression. We observe that the parameters are so chosen as to trade off the squared bias and variance quantities that constitute the MSE. We also indicate the advantages accruing out of incorporating a regularization term in the cost function in addition to the data error term. In the more general case of kernel regression, which uses a weighted least-squares (LS) optimization, we consider the applications of image restoration from very few random measurements, in addition to denoising of uniformly sampled data. We show that local polynomial regression (LPR) becomes a special case of kernel regression, and extend our results for LPR on uniform data to non-uniformly sampled data also. The denoising algorithms are compared with other standard, performant methods available in the literature both in terms of estimation error and computational complexity. A major perspective provided in this thesis is that the problem of optimum parameter choice in nonparametric regression can be viewed as the selection of optimum parameters of a linear, shift-invariant filter. This interpretation is provided by deriving motivation out of the hallmark paper of Savitzky and Golay and Schafer's recent article in IEEE Signal Processing Magazine. It is worth noting that Savitzky and Golay had shown in their original Analytical Chemistry journal article that LS fitting of a fixed-order polynomial over a neighborhood of fixed size is equivalent to convolution with an impulse response that is fixed and can be pre-computed.
They had provided tables of impulse response coefficients for computing the smoothed function and smoothed derivatives for different orders and neighborhood sizes, the resulting filters being referred to as Savitzky-Golay (S-G) filters. Thus, we provide the new perspective that the regression parameter choice is equivalent to optimizing for the filter impulse response length/3dB bandwidth, which are inversely related. We observe that the MMSE solution is such that the S-G filter chosen is of longer impulse response length (equivalently smaller cutoff frequency) at relatively flat portions of the noisy signal so as to smooth noise, and vice versa at locally fast-varying portions of the signal so as to capture the signal patterns. Also, we provide a generalized S-G filtering viewpoint in the case of kernel regression. Building on the S-G filtering perspective, we turn to the problem of dynamic feature computation in speech recognition. We observe that the methodology employed for computing dynamic features from the trajectories of static features is in fact derivative S-G filtering. With this perspective, we note that the filter coefficients can be pre-computed, and that the whole problem of delta feature computation becomes efficient. Indeed, we observe an advantage by a factor of 10^4 on making use of S-G filtering over actual LS polynomial fitting and evaluation. Thereafter, we study the properties of first- and second-order derivative S-G filters of certain orders and lengths experimentally. The derivative filters are bandpass due to the combined effects of LPR and derivative computation, which are lowpass and highpass operations, respectively. The first- and second-order S-G derivative filters are also observed to exhibit an approximately constant-Q property. We perform a TIMIT phoneme recognition experiment comparing the recognition accuracies obtained using S-G filters and the conventional approach followed in HTK, where Furui's regression formula is made use of. The recognition accuracies for both cases are almost identical, with S-G filters of certain bandwidths and orders registering a marginal improvement. The accuracies are also observed to improve with longer filter lengths, for a particular order. In terms of computation latency, we note that S-G filtering achieves delta and delta-delta feature computation in parallel by linear filtering, whereas they need to be obtained sequentially in case of the standard regression formulas used in the literature. Finally, we turn to the problem of speech enhancement, where we are interested in denoising using perceptually-motivated loss functions such as Itakura-Saito (IS). We propose to perform enhancement in the discrete cosine transform domain using risk-minimization. The cost functions considered are non-quadratic, and derivation of the unbiased estimator of the risk corresponding to the IS distortion is achieved using an approximate Taylor-series analysis under a high signal-to-noise ratio assumption. The exposition is general since we focus on an additive noise model with the noise density assumed to fall within the exponential class of density functions, which comprises most of the common densities. The denoising function is assumed to be pointwise linear (modified James-Stein (MJS) estimator), and parallels between Wiener filtering and the optimum MJS estimator are discussed.
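Since the thesis leans on the equivalence between local LS polynomial fitting and fixed convolution coefficients, a short sketch of that equivalence may be helpful: the row of the least-squares pseudoinverse that evaluates the fit (or its derivative) at the window centre is the filter's impulse response, which is why the coefficients can be tabulated once per window length and order. The window and order below are arbitrary choices, and SciPy provides production versions via scipy.signal.savgol_coeffs and savgol_filter.

```python
import numpy as np
from math import factorial

def savgol_coeffs(window, polyorder, deriv=0):
    """Savitzky-Golay impulse response derived from least squares.

    LS-fitting a degree-`polyorder` polynomial over `window` points and
    evaluating the fit (or its `deriv`-th derivative) at the centre point is
    a fixed linear functional of the data, hence a convolution.
    """
    half = window // 2
    t = np.arange(-half, half + 1, dtype=float)
    A = np.vander(t, polyorder + 1, increasing=True)    # columns 1, t, t^2, ...
    return np.linalg.pinv(A)[deriv] * factorial(deriv)  # row deriv of the LS map

h = savgol_coeffs(7, 2)                                 # smoothing coefficients
rng = np.random.default_rng(9)
x = np.sin(np.linspace(0.0, 3.0, 50)) + 0.05 * rng.normal(size=50)
smoothed = np.convolve(x, h[::-1], mode="same")         # filtering = convolution
print(np.round(h, 4))    # cf. the tables in Savitzky and Golay's original paper
```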
APA, Harvard, Vancouver, ISO, and other styles
33

Singer, Marco. "Partial Least Squares for Serially Dependent Data." Doctoral thesis, 2016. http://hdl.handle.net/11858/00-1735-0000-0028-8831-B.

Full text
APA, Harvard, Vancouver, ISO, and other styles