To see the other types of publications on this topic, follow the link: Non-parametric learning.

Dissertations / Theses on the topic 'Non-parametric learning'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 26 dissertations / theses for your research on the topic 'Non-parametric learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Zewdie, Dawit (Dawit Habtamu). "Representation discovery in non-parametric reinforcement learning." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/91883.

Full text
Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 71-73).
Recent years have seen a surge of interest in non-parametric reinforcement learning. There are now practical non-parametric algorithms that use kernel regression to approximate value functions. The correctness guarantees of kernel regression require that the underlying value function be smooth. Most problems of interest do not satisfy this requirement in their native space, but can be represented in such a way that they do. In this thesis, we show that the ideal representation is one that maps points directly to their values. Existing representation discovery algorithms that have been used in parametric reinforcement learning settings do not, in general, produce such a representation. We go on to present Fit-Improving Iterative Representation Adjustment (FIIRA), a novel framework for function approximation and representation discovery, which interleaves steps of value estimation and representation adjustment to increase the expressive power of a given regression scheme. We then show that FIIRA creates representations that correlate highly with value, giving kernel regression the power to represent discontinuous functions. Finally, we extend kernel-based reinforcement learning to use FIIRA and show that this results in performance improvements on three benchmark problems: Mountain-Car, Acrobot, and PinBall.
by Dawit Zewdie.
M. Eng.
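As context for the kernel-regression machinery this abstract builds on, here is a minimal Nadaraya-Watson value estimator in Python. It is illustrative only: the Gaussian kernel, the bandwidth, and the toy step-function values are assumptions, not the thesis's FIIRA procedure.

```python
import numpy as np

def kernel_value_estimate(query, states, values, bandwidth=0.5):
    """Nadaraya-Watson kernel regression: V(query) is a similarity-weighted
    average of the values observed at nearby states."""
    d2 = np.sum((states - query) ** 2, axis=1)        # squared distances
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))          # Gaussian kernel weights
    return np.dot(w, values) / np.sum(w)

# A step-discontinuous value function is smoothed over by the kernel in the
# native space -- the failure mode that motivates adjusting the representation.
states = np.random.rand(200, 2)
values = np.where(states[:, 0] > 0.5, 1.0, 0.0)
print(kernel_value_estimate(np.array([0.49, 0.5]), states, values))
```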
APA, Harvard, Vancouver, ISO, and other styles
2

Guizilini, Vitor Campanholo. "Non-Parametric Learning for Monocular Visual Odometry." Thesis, The University of Sydney, 2013. http://hdl.handle.net/2123/9903.

Full text
Abstract:
This thesis addresses the problem of incremental localization from visual information, a scenario commonly known as visual odometry. Current visual odometry algorithms are heavily dependent on camera calibration, using a pre-established geometric model to provide the transformation between input (optical flow estimates) and output (vehicle motion estimates) information. A novel approach to visual odometry is proposed in this thesis where the need for camera calibration, or even for a geometric model, is circumvented by the use of machine learning principles and techniques. A non-parametric Bayesian regression technique, the Gaussian Process (GP), is used to elect the most probable transformation function hypothesis from input to output, based on training data collected prior to and during navigation. Other than eliminating the need for a geometric model and traditional camera calibration, this approach also allows for scale recovery even in a monocular configuration, and provides a natural treatment of uncertainties due to the probabilistic nature of GPs. Several extensions to the traditional GP framework are introduced and discussed in depth, and they constitute the core of the contributions of this thesis to the machine learning and robotics community. The proposed framework is tested in a wide variety of scenarios, ranging from urban and off-road ground vehicles to unconstrained 3D unmanned aircraft. The results show a significant improvement over traditional visual odometry algorithms, and also surpass results obtained using other sensors, such as laser scanners and IMUs. The incorporation of these results into a SLAM scenario, using an Exact Sparse Information Filter (ESIF), is shown to decrease global uncertainty by exploiting revisited areas of the environment. Finally, a technique for the automatic segmentation of dynamic objects is presented, as a way to increase the robustness of image information and further improve visual odometry results.
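As a rough illustration of the core idea, the sketch below regresses a motion quantity directly from optical-flow features with a Gaussian Process, with no camera model in the loop. All data shapes and the synthetic flow-to-speed relation are hypothetical, and the thesis's GP extensions are not reproduced here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical data: each row summarizes one frame's optical flow; the target
# is the vehicle's forward speed between frames.
rng = np.random.default_rng(0)
X_flow = rng.normal(size=(300, 8))
y_speed = X_flow[:, 0] + 0.3 * X_flow[:, 1] ** 2 + 0.05 * rng.normal(size=300)

# The GP elects the most probable flow-to-motion mapping from training data
# and attaches an uncertainty to every motion estimate.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=np.ones(8)) + WhiteKernel(),
                              normalize_y=True)
gp.fit(X_flow, y_speed)
mean, std = gp.predict(X_flow[:5], return_std=True)
```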
APA, Harvard, Vancouver, ISO, and other styles
3

Bratières, Sébastien. "Non-parametric Bayesian models for structured output prediction." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/274973.

Full text
Abstract:
Structured output prediction is a machine learning task in which an input object is not just assigned a single class, as in classification, but multiple, interdependent labels. This means that the presence or value of a given label affects the other labels, for instance in text labelling problems, where output labels are applied to each word and their interdependencies must be modelled. Non-parametric Bayesian (NPB) techniques are probabilistic modelling techniques which have the interesting property of allowing model capacity to grow, in a controllable way, with data complexity, while maintaining the advantages of Bayesian modelling. In this thesis, we develop NPB algorithms to solve structured output problems. We first study a map-reduce implementation of a stochastic inference method designed for the infinite hidden Markov model, applied to a computational linguistics task, part-of-speech tagging. We show that mainstream map-reduce frameworks do not easily support highly iterative algorithms. The main contribution of this thesis consists in a conceptually novel discriminative model, GPstruct. It is motivated by labelling tasks, and combines attractive properties of conditional random fields (CRF), structured support vector machines, and Gaussian process (GP) classifiers. In probabilistic terms, GPstruct combines a CRF likelihood with a GP prior on factors; it can also be described as a Bayesian kernelized CRF. To train this model, we develop a Markov chain Monte Carlo algorithm based on elliptical slice sampling and investigate its properties. We then validate it on real data experiments, and explore two topologies: sequence output with text labelling tasks, and grid output with semantic segmentation of images. The latter case poses scalability issues, which are addressed using likelihood approximations and an ensemble method which allows distributed inference and prediction. The experimental validation demonstrates: (a) the model is flexible and its constituent parts are modular and easy to engineer; (b) predictive performance and, most crucially, the probabilistic calibration of predictions are better than or equal to those of competitor models; and (c) model hyperparameters can be learnt from data.
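The elliptical slice sampler used to train GPstruct is a short, standard algorithm (Murray, Adams & MacKay, 2010). A minimal sketch, assuming a zero-mean Gaussian prior and a user-supplied log-likelihood (which here stands in for the CRF likelihood over factors):

```python
import numpy as np

def elliptical_slice(f, prior_sample, log_lik):
    """One elliptical slice sampling update for a latent vector f with a
    N(0, Sigma) prior; prior_sample must be a fresh draw from that prior.
    The update has no tuning parameters and always accepts."""
    log_y = log_lik(f) + np.log(np.random.rand())     # slice level under the current state
    theta = np.random.uniform(0.0, 2.0 * np.pi)       # initial proposal angle
    lo, hi = theta - 2.0 * np.pi, theta               # bracket to shrink
    while True:
        f_new = f * np.cos(theta) + prior_sample * np.sin(theta)
        if log_lik(f_new) > log_y:
            return f_new                              # accepted point on the ellipse
        if theta < 0.0:                               # shrink the bracket toward f
            lo = theta
        else:
            hi = theta
        theta = np.random.uniform(lo, hi)
```

Because every proposal lies on an ellipse through the current state and a prior draw, the sampler stays exactly invariant under the GP prior, which is what makes it attractive for kernelized models like GPstruct.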
APA, Harvard, Vancouver, ISO, and other styles
4

Prando, Giulia. "Non-Parametric Bayesian Methods for Linear System Identification." Doctoral thesis, Università degli studi di Padova, 2017. http://hdl.handle.net/11577/3426195.

Full text
Abstract:
Recent contributions have tackled the linear system identification problem by means of non-parametric Bayesian methods, which are built on widely adopted machine learning techniques, such as Gaussian Process regression and kernel-based regularized regression. Following the Bayesian paradigm, these procedures treat the impulse response of the system to be estimated as the realization of a Gaussian process. Typically, a Gaussian prior accounting for stability and smoothness of the impulse response is postulated, as a function of some parameters (called hyper-parameters in the Bayesian framework). These are generally estimated by maximizing the so-called marginal likelihood, i.e. the likelihood after the impulse response has been marginalized out. Once the hyper-parameters have been fixed in this way, the final estimator is computed as the conditional expected value of the impulse response w.r.t. the posterior distribution, which coincides with the minimum variance estimator. Assuming that the identification data are corrupted by Gaussian noise, the above-mentioned estimator coincides with the solution of a regularized estimation problem, in which the regularization term is the l2 norm of the impulse response, weighted by the inverse of the prior covariance function (a.k.a. kernel in the machine learning literature). Recent works have shown how such Bayesian approaches are able to jointly perform estimation and model selection, thus overcoming one of the main issues affecting parametric identification procedures, namely complexity selection.
While keeping the classical system identification methods (e.g. Prediction Error Methods and subspace algorithms) as a benchmark for numerical comparison, this thesis extends and analyzes some key aspects of the above-mentioned Bayesian procedure. In particular, four main topics are considered. 1. PRIOR DESIGN. Adopting Maximum Entropy arguments, a new type of l2 regularization is derived: the aim is to penalize the rank of the block Hankel matrix built with Markov coefficients, thus controlling the complexity of the identified model, measured by its McMillan degree. By accounting for the coupling between different input-output channels, this new prior is particularly suited to the identification of MIMO systems.
To reduce the computational cost of the estimation algorithm, a tailored version of the Scaled Gradient Projection algorithm is designed to optimize the marginal likelihood. 2. CHARACTERIZATION OF UNCERTAINTY. The confidence sets returned by the non-parametric Bayesian identification algorithm are analyzed and compared with those returned by parametric Prediction Error Methods. The comparison is carried out in the impulse response space, by deriving "particle" versions (i.e. Monte-Carlo approximations) of the standard confidence sets. 3. ONLINE ESTIMATION. The application of the non-parametric Bayesian system identification techniques is extended to an online setting, in which new data become available over time. Specifically, two key modifications of the original "batch" procedure are proposed in order to meet the real-time requirements. In addition, the identification of time-varying systems is tackled by introducing a forgetting factor in the estimation criterion and treating it as a hyper-parameter. 4. POST-PROCESSING: MODEL REDUCTION. Non-parametric Bayesian identification procedures estimate the unknown system in terms of its impulse response coefficients, thus returning a model with high (possibly infinite) McMillan degree. A tailored procedure is proposed to reduce such a model to one of lower degree, which is more suitable for filtering and control applications. Different criteria for the selection of the order of the reduced model are evaluated and compared.
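The regularized estimator described above fits in a few lines. A sketch under simplifying assumptions: a finite FIR model, a first-order stable-spline ("TC") kernel K[i,j] = c * lam**max(i,j), and hyper-parameters fixed by hand rather than estimated by marginal likelihood maximization as in the thesis.

```python
import numpy as np

def tc_kernel(n, lam=0.9, c=1.0):
    """TC / first-order stable-spline prior covariance, encoding smooth,
    exponentially decaying impulse responses."""
    idx = np.arange(1, n + 1)
    return c * lam ** np.maximum.outer(idx, idx)

def bayesian_fir_estimate(u, y, n=50, lam=0.9, sigma2=0.1):
    """Posterior mean of an n-tap FIR model under a Gaussian prior, i.e. the
    regularized estimate  g = (Phi' Phi + sigma2 * K^-1)^-1 Phi' y."""
    N = len(y)
    # Regressor matrix of past inputs (zero initial conditions assumed)
    Phi = np.array([[u[t - k] if t - k >= 0 else 0.0 for k in range(n)]
                    for t in range(N)])
    K = tc_kernel(n, lam)
    return np.linalg.solve(Phi.T @ Phi + sigma2 * np.linalg.inv(K), Phi.T @ y)
```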
APA, Harvard, Vancouver, ISO, and other styles
5

Angola, Enrique. "Novelty Detection Of Machinery Using A Non-Parametric Machine Learning Approach." ScholarWorks @ UVM, 2018. https://scholarworks.uvm.edu/graddis/923.

Full text
Abstract:
A novelty detection algorithm inspired by human audio pattern recognition is conceptualized and experimentally tested. This anomaly detection technique can be used to monitor the health of a machine, or could be coupled with a current state-of-the-art system to enhance its fault detection capabilities. Time-domain data obtained from a microphone are processed by applying a short-time FFT, which returns time-frequency patterns. Such patterns are fed to a machine learning algorithm, which is designed to detect novel signals and identify windows in the frequency domain where such novelties occur. The algorithm presented in this work uses one-dimensional kernel density estimation for different frequency bins. This process eliminates the need for data dimension reduction algorithms. The method of "pseudo-likelihood cross validation" is used to find an independent optimal kernel bandwidth for each frequency bin. Metrics such as the "Individual Node Relative Difference" and "Total Novelty Score" are presented in this work and used to assess the degree of novelty of a new signal. Experimental datasets containing synthetic and real novelties are used to illustrate and test the novelty detection algorithm. Novelties are successfully detected in all experiments. The presented novelty detection technique could greatly enhance the performance of current state-of-the-art condition monitoring systems, or could be used as a stand-alone system.
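A minimal sketch of the per-bin density modelling: one 1-D KDE per frequency bin, with novelty scored as negative log-density. scipy's default bandwidth rule is used here instead of the thesis's pseudo-likelihood cross-validation, and the summed score below is only a stand-in for the "Individual Node Relative Difference" and "Total Novelty Score" metrics.

```python
import numpy as np
from scipy.stats import gaussian_kde

def fit_bin_kdes(baseline_spectra):
    """Fit one 1-D kernel density estimate per frequency bin from healthy
    (baseline) short-time FFT magnitudes, shape (n_frames, n_bins)."""
    return [gaussian_kde(baseline_spectra[:, b])
            for b in range(baseline_spectra.shape[1])]

def novelty_scores(kdes, new_spectrum):
    """Per-bin novelty as negative log-density of the new frame; a large
    total indicates a signal unlike anything seen in the baseline."""
    per_bin = np.array([-kde.logpdf(x)[0]
                        for kde, x in zip(kdes, new_spectrum)])
    return per_bin, per_bin.sum()
```

Because each bin is modelled independently in one dimension, no dimension-reduction step is needed, and the per-bin scores point to the frequency windows where the novelty occurs.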
APA, Harvard, Vancouver, ISO, and other styles
6

Bartcus, Marius. "Bayesian non-parametric parsimonious mixtures for model-based clustering." Thesis, Toulon, 2015. http://www.theses.fr/2015TOUL0010/document.

Full text
Abstract:
This thesis focuses on statistical learning and multi-dimensional data analysis. It particularly focuses on unsupervised learning of generative models for model-based clustering. We study Gaussian mixture models, both in the context of maximum likelihood estimation via the EM algorithm and in the Bayesian context of maximum a posteriori estimation via Markov Chain Monte Carlo (MCMC) sampling techniques. We mainly consider parsimonious mixture models, which are based on a spectral decomposition of the covariance matrix and provide a flexible framework particularly for the analysis of high-dimensional data. Then, we investigate non-parametric Bayesian mixtures, which are based on general flexible processes such as the Dirichlet process and the Chinese Restaurant Process. This non-parametric model formulation is relevant both for learning the model and for dealing with the issue of model selection. We propose new Bayesian non-parametric parsimonious mixtures and derive an MCMC sampling technique where the mixture model and the number of mixture components are simultaneously learned from the data. The selection of the model structure is performed using Bayes factors. These models, by their non-parametric and sparse formulation, are useful for the analysis of large data sets when the number of classes is undetermined and increases with the data, and when the dimension is high. The models are validated on simulated data and standard real data sets. Then, they are applied to a difficult real problem: the automatic structuring of complex bioacoustic data derived from whale song signals. Finally, we open Markovian perspectives via hierarchical Dirichlet process hidden Markov models.
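The Chinese Restaurant Process underlying these mixtures can be simulated directly. The sketch below draws a random partition whose number of clusters is not fixed in advance; it is illustrative only, not the thesis's full MCMC sampler over parsimonious mixtures.

```python
import numpy as np

def crp_partition(n, alpha=1.0, rng=None):
    """Draw a partition of n items from the Chinese Restaurant Process:
    item i joins existing table k with probability n_k / (i + alpha),
    or opens a new table with probability alpha / (i + alpha)."""
    rng = rng or np.random.default_rng()
    assignments, counts = [0], [1]
    for i in range(1, n):
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)      # new table = new mixture component
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

# The number of occupied tables grows roughly as alpha * log(n), so the
# model size adapts to the data -- the property these mixtures exploit.
print(len(set(crp_partition(1000, alpha=2.0))))
```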
APA, Harvard, Vancouver, ISO, and other styles
7

Mahler, Nicolas. "Machine learning methods for discrete multi-scale flows: application to finance." PhD thesis, École normale supérieure de Cachan - ENS Cachan, 2012. http://tel.archives-ouvertes.fr/tel-00749717.

Full text
Abstract:
This research work studies the problem of identifying and predicting the trends of a single financial target variable in a multivariate setting. The machine learning point of view on this problem is presented in chapter I. The efficient market hypothesis, which stands in contradiction with the objective of trend prediction, is first recalled. The different schools of thought in market analysis, which disagree to some extent with the efficient market hypothesis, are reviewed as well. The tenets of the fundamental analysis, the technical analysis and the quantitative analysis are made explicit. We particularly focus on the use of machine learning techniques for computing predictions on time-series. The challenges of dealing with dependent and/or non-stationary features while avoiding the usual traps of overfitting and data snooping are emphasized. Extensions of the classical statistical learning framework, particularly transfer learning, are presented. The main contribution of this chapter is the introduction of a research methodology for developing trend predictive numerical models. It is based on an experimentation protocol, which is made of four interdependent modules. The first module, entitled Data Observation and Modeling Choices, is a preliminary module devoted to the statement of very general modeling choices, hypotheses and objectives. The second module, Database Construction, turns the target and explanatory variables into features and labels in order to train trend predictive numerical models. The purpose of the third module, entitled Model Construction, is the construction of trend predictive numerical models. The fourth and last module, entitled Backtesting and Numerical Results, evaluates the accuracy of the trend predictive numerical models over a "significant" test set via two generic backtesting plans. The first plan computes recognition rates of upward and downward trends. The second plan designs trading rules using predictions made over the test set. Each trading rule yields a profit and loss account (P&L), which is the money earned cumulatively over time. These backtesting plans are additionally completed by interpretation functionalities, which help to analyze the decision mechanism of the numerical models. These functionalities can be measures of feature prediction ability and measures of model and prediction reliability. They decisively contribute to formulating better data hypotheses and enhancing the time-series representation, database and model construction procedures. This is made explicit in chapter IV. Numerical models, aiming at predicting the trends of the target variables introduced in chapter II, are indeed computed for the model construction methods described in chapter III and thoroughly backtested. The switch from one model construction approach to another is particularly motivated. The dramatic influence of the choice of parameters, at each step of the experimentation protocol, on the formulation of conclusion statements is also highlighted. The RNN procedure, which does not require any parameter tuning, has thus been used to reliably study the efficient market hypothesis. New research directions for designing trend predictive models are finally discussed.
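The second backtesting plan reduces to a simple computation: map trend predictions to positions and accumulate the P&L. A minimal sketch, ignoring transaction costs and position sizing (both names and the +1/-1 trend encoding are assumptions):

```python
import numpy as np

def backtest(prices, predicted_trend):
    """Turn trend predictions (+1 up, -1 down, one per time step) into
    positions held for one step, and accumulate the resulting P&L."""
    returns = np.diff(prices)                          # one-step price changes
    pnl = np.cumsum(predicted_trend[:-1] * returns)    # cumulative earnings
    hit_rate = np.mean(np.sign(returns) == predicted_trend[:-1])
    return pnl, hit_rate
```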
APA, Harvard, Vancouver, ISO, and other styles
8

GONÇALVES JÚNIOR, Paulo Mauricio. "Multivariate non-parametric statistical tests to reuse classifiers in recurring concept drifting environments." Universidade Federal de Pernambuco, 2013. https://repositorio.ufpe.br/handle/123456789/12226.

Full text
Abstract:
Data streams are a recent processing model where data arrive continuously, in large quantities, at high speeds, so that they must be processed on-line. Besides that, several private and public institutions store large amounts of data that also must be processed. Traditional batch classifiers are not well suited to handle huge amounts of data for basically two reasons. First, they usually read the available data several times until convergence, which is impractical in this scenario. Second, they imply that the context represented by the data is stable in time, which may not be true. In fact, context change is a common situation in data streams, and is named concept drift. This thesis presents rcd, a framework that offers an alternative approach to handle data streams that suffer from recurring concept drifts. It creates a new classifier for each context found and stores a sample of the data used to build it. When a new concept drift occurs, rcd compares the new context to old ones using a non-parametric multivariate statistical test to verify whether both contexts come from the same distribution. If so, the corresponding classifier is reused. If not, a new classifier is generated and stored. Three kinds of tests were performed. One compares the rcd framework with several adaptive algorithms (among single and ensemble approaches) on artificial and real data sets, among the most used in the concept drift research area, with abrupt and gradual concept drifts. We observe the ability of the classifiers to represent each context, how they handle concept drift, and the training and testing times needed to evaluate the data sets. Results indicate that rcd had similar or better statistical results compared to the other classifiers. On the real-world data sets, rcd presented accuracies close to the best classifier in each data set. Another test compares two statistical tests (kNN and Cramér) in their capability to represent and identify contexts. Tests were performed using adaptive and batch classifiers as base learners of rcd, on artificial and real-world data sets, with several rates of change. Results indicate that, on average, kNN had better results compared to the Cramér test, and was also faster. Independently of the test used, rcd had higher accuracy values compared to the respective base learners. An improvement to the rcd framework is also presented, in which the statistical tests are performed in parallel through the use of a thread pool. Tests were performed on three processors with different numbers of cores. Better results were obtained when there was a high number of detected concept drifts, the buffer size used to represent each data distribution was large, and there was a high test frequency. Even if none of these conditions apply, parallel and sequential execution still have very similar performances. Finally, a comparison between six different drift detection methods was also performed, comparing predictive accuracies, evaluation times, and drift handling, including false alarm and miss detection rates, as well as the average distance to the drift point and its standard deviation.
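The distributional comparison at the core of rcd can be illustrated with a Cramér-style energy statistic and a permutation test (the kNN test, the other option studied, is not shown; the sample shapes are arbitrary):

```python
import numpy as np
from scipy.spatial.distance import cdist

def energy_statistic(X, Y):
    """Cramer/energy two-sample statistic: large when samples X and Y
    come from different distributions."""
    return 2 * cdist(X, Y).mean() - cdist(X, X).mean() - cdist(Y, Y).mean()

def permutation_test(X, Y, n_perm=500, rng=None):
    """P-value for H0 'same distribution' by permuting the pooled sample.
    In an rcd-style framework, a high p-value means the stored context
    matches the new one, so the old classifier can be reused."""
    rng = rng or np.random.default_rng()
    observed = energy_statistic(X, Y)
    pooled = np.vstack([X, Y])
    n, count = len(X), 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if energy_statistic(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```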
APA, Harvard, Vancouver, ISO, and other styles
9

Gonçalves Júnior, Paulo Mauricio. "Multivariate non-parametric statistical tests to reuse classifiers in recurring concept drifting environments." Universidade Federal de Pernambuco, 2013. https://repositorio.ufpe.br/handle/123456789/12288.

Full text
Abstract:
Data streams are a recent processing model where data arrive continuously, in large quantities, at high speeds, so that they must be processed on-line. Besides that, several private and public institutions store large amounts of data that also must be processed. Traditional batch classifiers are not well suited to handle huge amounts of data for basically two reasons. First, they usually read the available data several times until convergence, which is impractical in this scenario. Second, they imply that the context represented by the data is stable in time, which may not be true. In fact, context change is a common situation in data streams, and is named concept drift. This thesis presents rcd, a framework that offers an alternative approach to handle data streams that suffer from recurring concept drifts. It creates a new classifier for each context found and stores a sample of the data used to build it. When a new concept drift occurs, rcd compares the new context to old ones using a non-parametric multivariate statistical test to verify whether both contexts come from the same distribution. If so, the corresponding classifier is reused. If not, a new classifier is generated and stored. Three kinds of tests were performed. One compares the rcd framework with several adaptive algorithms (among single and ensemble approaches) on artificial and real data sets, among the most used in the concept drift research area, with abrupt and gradual concept drifts. We observe the ability of the classifiers to represent each context, how they handle concept drift, and the training and testing times needed to evaluate the data sets. Results indicate that rcd had similar or better statistical results compared to the other classifiers. On the real-world data sets, rcd presented accuracies close to the best classifier in each data set. Another test compares two statistical tests (kNN and Cramér) in their capability to represent and identify contexts. Tests were performed using adaptive and batch classifiers as base learners of rcd, on artificial and real-world data sets, with several rates of change. Results indicate that, on average, kNN had better results compared to the Cramér test, and was also faster. Independently of the test used, rcd had higher accuracy values compared to the respective base learners. An improvement to the rcd framework is also presented, in which the statistical tests are performed in parallel through the use of a thread pool. Tests were performed on three processors with different numbers of cores. Better results were obtained when there was a high number of detected concept drifts, the buffer size used to represent each data distribution was large, and there was a high test frequency. Even if none of these conditions apply, parallel and sequential execution still have very similar performances. Finally, a comparison between six different drift detection methods was also performed, comparing predictive accuracies, evaluation times, and drift handling, including false alarm and miss detection rates, as well as the average distance to the drift point and its standard deviation.
APA, Harvard, Vancouver, ISO, and other styles
10

Wei, Wei. "Probabilistic Models of Topics and Social Events." Research Showcase @ CMU, 2016. http://repository.cmu.edu/dissertations/941.

Full text
Abstract:
Structured probabilistic inference has been shown to be useful in modeling complex latent structures of data. One successful way in which this technique has been applied is in the discovery of latent topical structures of text data, which is usually referred to as topic modeling. With the recent popularity of mobile devices and social networking, we can now easily acquire text data attached to meta information, such as geo-spatial coordinates and time stamps. This metadata can provide rich and accurate information that is helpful in answering many research questions related to spatial and temporal reasoning. However, such data must be treated differently from text data. For example, spatial data is usually organized in terms of a two-dimensional region, while temporal information can exhibit periodicities. While some work exists in the topic modeling community that utilizes some of this meta information, such models have largely focused on incorporating metadata into text analysis, rather than on making full use of the joint distribution of meta information and text. In this thesis, I propose the event detection problem, which is a multidimensional latent clustering problem on spatial, temporal and topical data. I start with a simple parametric model to discover independent events using geo-tagged Twitter data. The model is then improved in two directions. First, I augment the model using the Recurrent Chinese Restaurant Process (RCRP) to discover events that are dynamic in nature. Second, I study a model that can detect events using data from multiple media sources, examining the characteristics of different media in terms of reported event times and linguistic patterns. The approaches studied in this thesis are largely based on Bayesian nonparametric methods, in order to deal with streaming data and an unpredictable number of clusters. The research will not only serve the event detection problem itself but also shed light on a more general structured clustering problem in spatial, temporal and textual data.
APA, Harvard, Vancouver, ISO, and other styles
11

Eamrurksiri, Araya. "Applying Machine Learning to LTE/5G Performance Trend Analysis." Thesis, Linköpings universitet, Statistik och maskininlärning, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-139126.

Full text
Abstract:
The core idea of this thesis is to reduce the workload of manual inspection when the performance analysis of an updated software version is required. The Central Processing Unit (CPU) utilization, which is one of the essential factors for evaluating performance, is analyzed. The purpose of this work is to apply machine learning techniques that are suitable for detecting the state of the CPU utilization and any changes in the test environment that affect the CPU utilization. The detection relies on a Markov switching model to identify structural changes, which are assumed to follow an unobserved Markov chain, in the time series data. The historical behavior of the data can be described by a first-order autoregression, in which case the Markov switching model becomes a Markov switching autoregressive model. Another approach based on a non-parametric analysis, a distribution-free method that requires fewer assumptions, called the E-divisive method, is proposed. This method uses a hierarchical clustering algorithm to detect multiple change point locations in the time series data. As the data used in this analysis do not contain any ground truth, the evaluation of the methods is carried out by generating simulated datasets with known states. Besides, these simulated datasets are used for studying and comparing the Markov switching autoregressive model and the E-divisive method. Results show that the former method is preferable because of its better performance in detecting changes. Some information about the state of the CPU utilization is also obtained by fitting the Markov switching model. The E-divisive method is shown to have less power in detecting changes and a higher rate of missed detections. The results from applying the Markov switching autoregressive model to the real data are presented with interpretations and discussions.
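A two-regime Markov switching autoregression of the kind used here can be fitted with statsmodels. A sketch on simulated CPU-utilization-like data with one known mean shift, mirroring the simulated datasets used for evaluation (the real data, model orders and settings differ):

```python
import numpy as np
import statsmodels.api as sm

# Simulated series: a clear mean shift halfway through, playing the role of
# a structural change in CPU utilization.
rng = np.random.default_rng(1)
y = np.concatenate([30 + rng.normal(0, 1, 200), 45 + rng.normal(0, 1, 200)])

# Two regimes, first-order autoregression, regime-specific variance
mod = sm.tsa.MarkovAutoregression(y, k_regimes=2, order=1,
                                  switching_variance=True)
res = mod.fit()
print(res.summary())                       # regime means, AR terms, transition probs
probs = res.smoothed_marginal_probabilities
# probs gives, for each time point, the probability of each regime; a
# sustained shift in these probabilities marks the detected change.
```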
APA, Harvard, Vancouver, ISO, and other styles
12

Landoni, E. "A COMPREHENSIVE PIPELINE FOR CLASS COMPARISON AND CLASS PREDICTION IN CANCER RESEARCH." Doctoral thesis, Università degli Studi di Milano, 2015. http://hdl.handle.net/2434/344575.

Full text
Abstract:
Personalized medicine is an emerging field that promises to bring radical changes in healthcare and may be defined as "a medical model using molecular profiling technologies for tailoring the right therapeutic strategy for the right person at the right time, and determine the predisposition to disease at the population level and to deliver timely and stratified prevention". The sequencing of the human genome, together with the development and implementation of new high-throughput technologies, has provided access to large 'omics' (e.g. genomics, proteomics) data, bringing a better understanding of cancer biology and enabling new approaches to diagnosis, drug development, and individualized therapy. 'Omics' data have potential as cancer biomarkers, but no consolidated guidelines have been established for discovery analyses. In the context of the EDERA project, funded by the Italian Association for Cancer Research, a structured pipeline was developed with innovative applications of existing bioinformatics methods, including: 1) the combination of the results of two statistical tests (t and Anderson-Darling) to detect features with significant fold change or general distributional differences in class comparison; 2) the application of a bootstrap selection procedure together with machine learning techniques to guarantee result generalizability and study the interconnections among the selected features in class prediction. The pipeline was successfully applied to plasmatic microRNA, identifying five hemolysis-related microRNAs, and to Secondary ElectroSpray Ionization-Mass Spectrometry data, in which case eight mass spectrometry signals were found able to discriminate the exhaled breath of breast cancer patients from that of healthy individuals.
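The class-comparison step combining the two tests can be sketched with scipy: a feature is retained if either the t-test (fold change) or the k-sample Anderson-Darling test (general distributional difference) fires. This is a minimal sketch: the multiple-testing correction and the bootstrap selection step are omitted, and scipy caps the Anderson-Darling p-value to the range [0.001, 0.25].

```python
import numpy as np
from scipy.stats import ttest_ind, anderson_ksamp

def screen_features(cases, controls, alpha=0.05):
    """Flag features whose two groups differ either in mean (Welch t-test)
    or in overall distribution (k-sample Anderson-Darling); cases and
    controls are (samples x features) arrays. Returns selected indices."""
    selected = []
    for j in range(cases.shape[1]):
        _, p_t = ttest_ind(cases[:, j], controls[:, j], equal_var=False)
        p_ad = anderson_ksamp([cases[:, j], controls[:, j]]).significance_level
        if min(p_t, p_ad) < alpha:      # union of the two tests' discoveries
            selected.append(j)
    return selected
```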
APA, Harvard, Vancouver, ISO, and other styles
13

Aghazadeh, Omid. "Data Driven Visual Recognition." Doctoral thesis, KTH, Datorseende och robotik, CVAP, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-145865.

Full text
Abstract:
This thesis is mostly about supervised visual recognition problems. Based on a general definition of categories, the contents are divided into two parts: one which models categories and one which is not category based. We are interested in data driven solutions for both kinds of problems. In the category-free part, we study novelty detection in temporal and spatial domains as a category-free recognition problem. Using data driven models, we demonstrate that based on a few reference exemplars, our methods are able to detect novelties in ego-motions of people, and changes in the static environments surrounding them. In the category level part, we study object recognition. We consider both object category classification and localization, and propose scalable data driven approaches for both problems. A mixture of parametric classifiers, initialized with a sophisticated clustering of the training data, is demonstrated to adapt to the data better than various baselines such as the same model initialized with less subtly designed procedures. A nonparametric large margin classifier is introduced and demonstrated to have a multitude of advantages in comparison to its competitors: better training and testing time costs, the ability to make use of indefinite/invariant and deformable similarity measures, and adaptive complexity are the main features of the proposed model. We also propose a rather realistic model of recognition problems, which quantifies the interplay between representations, classifiers, and recognition performances. Based on data-describing measures which are aggregates of pairwise similarities of the training data, our model characterizes and describes the distributions of training exemplars. The measures are shown to capture many aspects of the difficulty of categorization problems and correlate significantly to the observed recognition performances. Utilizing these measures, the model predicts the performance of particular classifiers on distributions similar to the training data. These predictions, when compared to the test performance of the classifiers on the test sets, are reasonably accurate. We discuss various aspects of visual recognition problems: what is the interplay between representations and classification tasks, how can different models better adapt to the training data, etc. We describe and analyze the aforementioned methods that are designed to tackle different visual recognition problems, but share one common characteristic: being data driven.


APA, Harvard, Vancouver, ISO, and other styles
14

van der Wilk, Mark. "Sparse Gaussian process approximations and applications." Thesis, University of Cambridge, 2019. https://www.repository.cam.ac.uk/handle/1810/288347.

Full text
Abstract:
Many tasks in machine learning require learning some kind of input-output relation (function), for example, recognising handwritten digits (from image to number) or learning the motion behaviour of a dynamical system like a pendulum (from positions and velocities now to future positions and velocities). We consider this problem using the Bayesian framework, where we use probability distributions to represent the state of uncertainty that a learning agent is in. In particular, we will investigate methods which use Gaussian processes to represent distributions over functions. Gaussian process models require approximations in order to be practically useful. This thesis focuses on understanding existing approximations and investigating new ones tailored to specific applications. We advance the understanding of existing techniques first through a thorough review. We propose desiderata for non-parametric basis function model approximations, which we use to assess the existing approximations. Following this, we perform an in-depth empirical investigation of two popular approximations (VFE and FITC). Based on the insights gained, we propose a new inter-domain Gaussian process approximation, which can be used to increase the sparsity of the approximation, in comparison to regular inducing point approximations. This allows GP models to be stored and communicated more compactly. Next, we show that inter-domain approximations can also allow the use of models which would otherwise be impractical, as opposed to improving existing approximations. We introduce an inter-domain approximation for the Convolutional Gaussian process - a model that makes Gaussian processes suitable to image inputs, and which has strong relations to convolutional neural networks. This same technique is valuable for approximating Gaussian processes with more general invariance properties. Finally, we revisit the derivation of the Gaussian process State Space Model, and discuss some subtleties relating to their approximation. We hope that this thesis illustrates some benefits of non-parametric models and their approximation in a non-parametric fashion, and that it provides models and approximations that prove to be useful for the development of more complex and performant models in the future.
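The basic mechanics of an inducing-point approximation can be shown compactly. Below is a minimal Subset-of-Regressors/DTC predictive mean with an assumed RBF kernel, where an M-point inducing set Z turns the O(N^3) solve into an O(N M^2) one; the thesis's VFE/FITC comparisons and inter-domain variants refine this basic construction.

```python
import numpy as np

def rbf(A, B, ls=1.0, var=1.0):
    """Squared-exponential kernel matrix between row-wise inputs A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / ls ** 2)

def sor_predict(X, y, Z, Xs, noise=0.1):
    """Sparse GP (SoR/DTC) predictive mean at test inputs Xs: the N x N
    kernel matrix is replaced by a rank-M Nystrom approximation built on
    the inducing inputs Z (e.g. a subset of X)."""
    Kzz = rbf(Z, Z) + 1e-8 * np.eye(len(Z))   # jitter for numerical stability
    Kzx = rbf(Z, X)
    A = noise * Kzz + Kzx @ Kzx.T             # M x M system instead of N x N
    return rbf(Xs, Z) @ np.linalg.solve(A, Kzx @ y)
```

Because only the M x M system involving Z has to be stored and solved, the model can be communicated compactly, which is the sparsity benefit the inter-domain construction pushes further.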
APA, Harvard, Vancouver, ISO, and other styles
15

Shandilya, Sharad. "ASSESSMENT AND PREDICTION OF CARDIOVASCULAR STATUS DURING CARDIAC ARREST THROUGH MACHINE LEARNING AND DYNAMICAL TIME-SERIES ANALYSIS." VCU Scholars Compass, 2013. http://scholarscompass.vcu.edu/etd/3198.

Full text
Abstract:
In this work, new methods of feature extraction, feature selection, stochastic data characterization/modeling, variance reduction and measures for parametric discrimination are proposed. These methods have implications for data mining, machine learning, and information theory. A novel decision-support system is developed in order to guide intervention during cardiac arrest. The models are built upon knowledge extracted with signal-processing, non-linear dynamic and machine-learning methods. The proposed ECG characterization, combined with information extracted from PetCO2 signals, shows viability for decision-support in clinical settings. The approach, which focuses on integration of multiple features through machine learning techniques, suits well to inclusion of multiple physiologic signals. Ventricular Fibrillation (VF) is a common presenting dysrhythmia in the setting of cardiac arrest whose main treatment is defibrillation through direct current countershock to achieve return of spontaneous circulation. However, often defibrillation is unsuccessful and may even lead to the transition of VF to more nefarious rhythms such as asystole or pulseless electrical activity. Multiple methods have been proposed for predicting defibrillation success based on examination of the VF waveform. To date, however, no analytical technique has been widely accepted. For a given desired sensitivity, the proposed model provides a significantly higher accuracy and specificity as compared to the state-of-the-art. Notably, within the range of 80-90% of sensitivity, the method provides about 40% higher specificity. This means that when trained to have the same level of sensitivity, the model will yield far fewer false positives (unnecessary shocks). Also introduced is a new model that predicts recurrence of arrest after a successful countershock is delivered. To date, no other work has sought to build such a model. I validate the method by reporting multiple performance metrics calculated on (blind) test sets.
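As a hedged illustration of waveform-based prediction (not the thesis's feature set or model), the sketch below computes two published VF waveform measures, mean absolute slope and amplitude spectrum area (AMSA), and feeds them to an off-the-shelf classifier; all data here are synthetic placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def vf_features(ecg, fs=250.0):
    """Two illustrative VF waveform features: mean absolute slope and
    amplitude spectrum area (AMSA) over the 4-48 Hz band."""
    slope = np.mean(np.abs(np.diff(ecg))) * fs
    freqs = np.fft.rfftfreq(len(ecg), 1.0 / fs)
    amps = np.abs(np.fft.rfft(ecg))
    band = (freqs >= 4.0) & (freqs <= 48.0)
    amsa = np.sum(amps[band] * freqs[band])
    return [slope, amsa]

# Hypothetical usage: X stacks features of pre-shock ECG segments, y marks
# whether the countershock restored spontaneous circulation.
rng = np.random.default_rng(0)
segments = rng.normal(size=(40, 1000))
X = np.array([vf_features(s) for s in segments])
y = rng.integers(0, 2, size=40)
clf = RandomForestClassifier(n_estimators=100).fit(X, y)
# In practice the decision threshold is tuned to a target sensitivity,
# trading off specificity as discussed in the abstract.
```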
APA, Harvard, Vancouver, ISO, and other styles
16

Hall, Otto. "Inference of buffer queue times in data processing systems using Gaussian Processes : An introduction to latency prediction for dynamic software optimization in high-end trading systems." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-214791.

Full text
Abstract:
This study investigates whether Gaussian Process Regression can be applied to evaluate buffer queue times in large-scale data processing systems. It additionally considers whether high-frequency data stream rates can be generalized into a small subset of the sample space. With the aim of providing a basis for dynamic software optimization, a promising foundation for continued research is introduced. The study is intended to contribute to Direct Market Access financial trading systems, which process immense amounts of market data daily. Due to certain limitations, we adopt a naïve approach and model latencies as a function of only the data throughput in eight small historical intervals. The training and test sets are built from raw market data, and we resort to pruning operations to shrink the datasets by a factor of approximately 0.0005 in order to achieve computational feasibility. We further consider four different implementations of Gaussian Process Regression. The resulting algorithms perform well on pruned datasets, with an average R2 statistic of 0.8399 over six test sets of approximately the same size as the training set. Testing on non-pruned datasets indicates shortcomings in the generalization procedure, where input vectors corresponding to low-latency target values are associated with less accuracy. We conclude that, depending on the application, these shortcomings may make the model intractable. However, for the purposes of this study, it is found that buffer queue times can indeed be modelled by regression algorithms. We discuss several methods for improvement, with regard to both pruning procedures and Gaussian Processes, and open up promising directions for continued research.
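The pipeline described above, pruning followed by GP regression and an R2 check, can be sketched as follows. The dataset is synthetic: the eight throughput intervals, the pruning factor of 2000 (keeping about 0.0005 of the data), and the kernel choice stand in for the study's actual extraction and four implementations.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def prune(X, y, factor=2000):
    """Illustrative pruning: keep every factor-th sample, shrinking the data
    by ~1/factor so that O(N^3) GP training becomes feasible."""
    return X[::factor], y[::factor]

# Synthetic stand-in: eight trailing throughput measurements per observation,
# with queue time growing nonlinearly in total throughput.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200_000, 8))
y = X.sum(axis=1) ** 2 + 0.1 * rng.normal(size=200_000)

X_small, y_small = prune(X, y)                     # ~100 training points
gp = GaussianProcessRegressor(kernel=RBF(length_scale=np.ones(8)) + WhiteKernel(),
                              normalize_y=True)
gp.fit(X_small, y_small)
print(gp.score(X[:1000], y[:1000]))                # R^2 on non-pruned samples
```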
APA, Harvard, Vancouver, ISO, and other styles
17

Dang, Hong-Phuong. "Approches bayésiennes non paramétriques et apprentissage de dictionnaire pour les problèmes inverses en traitement d'image." Thesis, Ecole centrale de Lille, 2016. http://www.theses.fr/2016ECLI0019/document.

Full text
Abstract:
Dictionary learning for sparse representation has been widely advocated for solving inverse problems. Optimization methods and parametric approaches to dictionary learning have been particularly explored. These methods suffer from some limitations, particularly related to the choice of parameters: in general, the dictionary size is fixed in advance, and knowledge of the sparsity or noise level may also be needed. In this thesis, we show how to learn the dictionary and these parameters jointly, with an emphasis on image processing. We propose and study the Indian Buffet Process for Dictionary Learning (IBP-DL) method, using a Bayesian nonparametric approach. A primer on Bayesian nonparametrics is first presented: the Dirichlet and Beta processes and their respective derivatives, the Chinese restaurant and Indian Buffet processes, are described. The proposed model for dictionary learning relies on an Indian Buffet prior, which permits learning a dictionary of adaptive size. The Monte-Carlo method for inference is detailed. Noise and sparsity levels are also inferred, so that in practice no parameter tuning is required. Numerical experiments illustrate the performance of the approach in different settings: image denoising, inpainting, and compressed sensing. Results are compared with state-of-the-art methods. Matlab and C sources are available for the sake of reproducibility.
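To make the Indian Buffet Process prior concrete, here is a minimal numpy sketch of its generative scheme. The parameter alpha and the number of observations are arbitrary; IBP-DL itself couples this prior with a dictionary-learning likelihood and Monte-Carlo inference, which the sketch omits.

```python
# Draw a binary feature-assignment matrix Z from an Indian Buffet Process
# prior: customer 1 samples Poisson(alpha) dishes; customer n takes each
# existing dish k with probability m_k / n, then samples Poisson(alpha/n)
# new dishes. Columns of Z play the role of dictionary atoms in IBP-DL.
import numpy as np

def sample_ibp(alpha, n_customers, rng):
    dishes = []                      # dishes[k] = customers who took dish k
    for n in range(1, n_customers + 1):
        for takers in dishes:
            if rng.random() < len(takers) / n:
                takers.append(n)
        for _ in range(rng.poisson(alpha / n)):
            dishes.append([n])
    Z = np.zeros((n_customers, len(dishes)), dtype=int)
    for k, takers in enumerate(dishes):
        Z[[t - 1 for t in takers], k] = 1
    return Z

Z = sample_ibp(alpha=2.0, n_customers=10, rng=np.random.default_rng(1))
print(Z.shape)   # the number of columns (atoms) is itself random
```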
APA, Harvard, Vancouver, ISO, and other styles
18

Knefati, Muhammad Anas. "Estimation non-paramétrique du quantile conditionnel et apprentissage semi-paramétrique : applications en assurance et actuariat." Thesis, Poitiers, 2015. http://www.theses.fr/2015POIT2280/document.

Full text
Abstract:
The thesis consists of two parts: one on the estimation of conditional quantiles and one on supervised learning. The "conditional quantile estimation" part is organized into 3 chapters. Chapter 1 introduces local linear regression and presents the methods most used in the literature to estimate the smoothing parameter. Chapter 2 reviews existing nonparametric estimation methods for the conditional quantile and compares them through numerical experiments on simulated and real data. Chapter 3 is devoted to a new conditional quantile estimator that we propose, based on kernels that are asymmetric in x. We show, under some hypotheses, that this new estimator is more efficient than the estimators in common use. The "supervised learning" part also consists of 3 chapters. Chapter 4 introduces statistical learning and the basic concepts used in this part. Chapter 5 reviews the conventional methods of supervised classification. Chapter 6 proposes a method for transferring a semiparametric model. The performance of this method is shown by numerical experiments on morphometric data and credit-scoring data.
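For orientation, the sketch below implements the standard symmetric-kernel baseline the thesis improves upon: a kernel-weighted empirical conditional CDF inverted at level tau. The bandwidth and data are illustrative; the proposed estimator replaces the Gaussian kernel with one that is asymmetric in x.

```python
# Baseline nonparametric conditional quantile: weight observations by a
# Gaussian kernel in x, then invert the weighted empirical CDF of y at tau.
# (The thesis replaces the symmetric kernel with an asymmetric one in x.)
import numpy as np

def conditional_quantile(x0, X, Y, tau=0.5, h=0.3):
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)       # kernel weights K_h(x0 - X_i)
    order = np.argsort(Y)
    cdf = np.cumsum(w[order]) / w.sum()          # weighted empirical CDF of y
    return Y[order][np.searchsorted(cdf, tau)]   # smallest y with F(y|x0) >= tau

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, 400)
Y = np.sin(X) + rng.normal(0, 0.3, 400)
print(conditional_quantile(0.0, X, Y, tau=0.9))
```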
APA, Harvard, Vancouver, ISO, and other styles
19

SARCIA', SALVATORE ALESSANDRO. "An Approach to improving parametric estimation models in the case of violation of assumptions based upon risk analysis." Doctoral thesis, Università degli Studi di Roma "Tor Vergata", 2009. http://hdl.handle.net/2108/1048.

Full text
Abstract:
In this work, we show the mathematical reasons why parametric models fall short of providing correct estimates, and we define an approach that overcomes the causes of these shortfalls. The approach aims at improving parametric estimation models when any regression model assumption is violated for the data being analyzed. Violations can be that the errors are x-correlated, the model is not linear, the sample is heteroscedastic, or the error probability distribution is not Gaussian. If the data violates the regression assumptions and we do not deal with the consequences of these violations, we cannot improve the model, and the estimates will remain incorrect. The novelty of this work is that we define and use a feed-forward multi-layer neural network for discrimination problems to calculate prediction intervals (i.e., evaluate uncertainty), make estimates, and detect improvement needs. The primary difference from traditional methodologies is that the proposed approach can deal with scope error, model error, and assumption error at the same time. The approach can be applied for prediction, inference, and model improvement in any situation and context without making specific assumptions. An important benefit of the approach is that it can be completely automated as a stand-alone estimation methodology or used to support experts and organizations together with other estimation techniques (e.g., human judgment, parametric models). Unlike other methodologies, the proposed approach focuses on model improvement by integrating the estimation activity into a wider process that we call the Estimation Improvement Process, an instantiation of the Quality Improvement Paradigm. This approach aids mature organizations in learning from their experience and improving their processes over time with respect to managing their estimation activities. To provide an exposition of the approach, we use an old NASA COCOMO data set to (1) build an evolvable neural network model and (2) show how a parametric model, e.g., a regression model, can be improved and evolved with new project data.
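The thesis builds prediction intervals with a multi-layer network; as a loose stand-in for that idea, the sketch below reads an interval off a bootstrap ensemble of small MLPs. This is an assumption-laden simplification for illustration, not the author's construction, and it captures model uncertainty only.

```python
# Rough stand-in for NN-based prediction intervals: train an ensemble of
# small MLPs on bootstrap resamples and read an empirical interval from
# the spread of their predictions (not the thesis's exact construction).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, (300, 1))
y = 2.0 * X.ravel() + rng.normal(0, 1.0, 300)    # synthetic effort-like data

preds = []
for seed in range(20):
    idx = rng.integers(0, len(X), len(X))        # bootstrap resample
    net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=seed)
    net.fit(X[idx], y[idx])
    preds.append(net.predict([[5.0]])[0])

lo, hi = np.percentile(preds, [5, 95])           # spread across the ensemble
print(f"90% interval for x=5: [{lo:.2f}, {hi:.2f}]")
```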
APA, Harvard, Vancouver, ISO, and other styles
20

Wang, Chunping. "Non-parametric Bayesian Learning with Incomplete Data." Diss., 2010. http://hdl.handle.net/10161/3075.

Full text
Abstract:

In most machine learning approaches, it is usually assumed that data are complete. When data are partially missing due to various reasons, for example, the failure of a subset of sensors, image corruption or inadequate medical measurements, many learning methods designed for complete data cannot be directly applied. In this dissertation we treat two kinds of problems with incomplete data using non-parametric Bayesian approaches: classification with incomplete features and analysis of low-rank matrices with missing entries.

Incomplete data in classification problems are handled by assuming input features to be generated from a mixture-of-experts model, with each individual expert (classifier) defined by a local Gaussian in feature space. With a linear classifier associated with each Gaussian component, nonlinear classification boundaries are achievable without the introduction of kernels. Within the proposed model, the number of components is theoretically "infinite", as defined by a Dirichlet process construction, with the actual number of mixture components (experts) inferred from the data under test. With a higher-level DP, we further extend the classifier to the analysis of multiple related tasks (multi-task learning), where model components may be shared across tasks. Available data can effectively be augmented through this form of information transfer even when tasks are similar only in some local regions of feature space, which is particularly critical when the incomplete training samples from each task are scarce. The proposed algorithms are implemented using efficient variational Bayesian inference, and robust performance is demonstrated on synthetic data, benchmark data sets, and real data with natural missing values.
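A truncated stick-breaking draw, sketched below, shows how the Dirichlet process construction yields a theoretically infinite pool of components while only a few carry appreciable weight; alpha and the truncation level are arbitrary choices.

```python
# Truncated stick-breaking draw from a Dirichlet process: beta variables
# v_k ~ Beta(1, alpha) carve up the unit interval, so the number of
# mixture components (experts) effectively used is inferred, not fixed.
import numpy as np

def stick_breaking(alpha, truncation, rng):
    v = rng.beta(1.0, alpha, truncation)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining                  # component weights, summing to < 1

w = stick_breaking(alpha=1.5, truncation=50, rng=np.random.default_rng(4))
print("components holding 99% of mass:", np.searchsorted(np.cumsum(w), 0.99) + 1)
```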

Another scenario of interest is completing a data matrix with missing entries. The recovery of missing matrix entries is not possible without additional assumptions on the matrix under test, and here we employ the common assumption that the matrix is low-rank. Unlike methods with a preset fixed rank, we propose a non-parametric Bayesian alternative based on the singular value decomposition (SVD), where missing entries are handled naturally and the number of underlying factors is constrained to be small and inferred in light of the observed entries. Although we assume entries are missing at random, the proposed model generalizes to incorporate auxiliary information, including missingness features. We also make a first attempt in the matrix-completion community to acquire new entries actively. By introducing a probit link function, we are able to handle counting matrices, with the decomposed low-rank matrices treated as latent. The basic model and its extensions are validated on synthetic data, a movie-rating benchmark, and a new data set presented for the first time.
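The dissertation infers the number of factors within a Bayesian SVD; the sketch below instead uses the simpler non-Bayesian soft-impute heuristic (iterative SVD with singular-value shrinkage) purely to make the low-rank completion idea concrete. The shrinkage threshold is arbitrary.

```python
# Soft-impute sketch for low-rank matrix completion: alternately fill
# missing entries with the current estimate and shrink singular values.
# (The dissertation instead infers the number of factors Bayesianly.)
import numpy as np

rng = np.random.default_rng(5)
U, V = rng.normal(size=(30, 3)), rng.normal(size=(3, 40))
M = U @ V                                     # true rank-3 matrix
mask = rng.random(M.shape) < 0.5              # observed entries

X = np.where(mask, M, 0.0)
for _ in range(200):
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    X_low = u @ np.diag(np.maximum(s - 1.0, 0.0)) @ vt   # shrink the spectrum
    X = np.where(mask, M, X_low)              # keep observed entries fixed

print("RMSE on missing entries:", np.sqrt(np.mean((X_low - M)[~mask] ** 2)))
```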


Dissertation
APA, Harvard, Vancouver, ISO, and other styles
21

Castro, Rui M. "Active learning and adaptive sampling for non-parametric inference." Thesis, 2008. http://hdl.handle.net/1911/22265.

Full text
Abstract:
This thesis presents a general discussion of active learning and adaptive sampling. In many practical scenarios it is possible to use information gleaned from previous observations to focus the sampling process, in the spirit of the "twenty-questions" game. As more samples are collected, one can learn how to improve the sampling process, for example by deciding where to sample next. These sampling feedback techniques are generically known as active learning or adaptive sampling. Although appealing, such methodologies are difficult to analyze, since there are strong dependencies between the observed data. This is especially important in the presence of measurement uncertainty or noise. The main thrust of this thesis is to characterize the potential and the fundamental limitations of active learning, particularly in non-parametric settings. First, we consider the probabilistic classification setting. Using minimax analysis techniques, we investigate the achievable rates of classification error convergence for broad classes of distributions characterized by decision boundary regularity and noise conditions (which describe the observation noise near the decision boundary). The results clearly indicate the conditions under which one can expect significant gains through active learning. Furthermore, we show that the learning rates derived are tight for "boundary fragment" classes in d-dimensional feature spaces when the feature marginal density is bounded from above and below. Second, we study the problem of estimating an unknown function from noisy point-wise samples, where the sample locations are adaptively chosen based on previous samples and observations, as described above. We present results characterizing the potential and fundamental limits of active learning for certain classes of nonparametric regression problems, and also present practical algorithms capable of exploiting the sampling adaptivity and provably improving upon non-adaptive techniques. Our active sampling procedure is based on a novel coarse-to-fine strategy, motivated by the success of spatially adaptive methods such as wavelet analysis in nonparametric function estimation. Using the ideas developed in solving the function regression problem, we present a greedy algorithm for estimating piecewise constant functions with smooth boundaries that is near minimax optimal but computationally much more efficient than the best dictionary-based method (in this case, wedgelet approximations). Finally, we compare adaptive sampling (where feedback guiding the sampling process is present) with non-adaptive compressive sampling (where non-traditional projection samples are used). It is shown that under mild noise, compressive sampling can be competitive with adaptive sampling, but adaptive sampling significantly outperforms compressive sampling in lower signal-to-noise conditions. This work also helps explain the differing behavior of compressive sampling in noisy and noiseless settings.
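The simplest form of the sampling feedback loop discussed above can be sketched in a few lines: query wherever the current model is least certain, refit, and repeat. The toy below uses GP posterior variance as the uncertainty signal; note that with a stationary kernel this mostly space-fills, and the thesis's coarse-to-fine strategy is what achieves genuine spatial adaptivity.

```python
# Toy sampling feedback loop: repeatedly query the point where the current
# GP posterior standard deviation is largest, then refit. (The thesis's
# coarse-to-fine procedure is what actually concentrates samples near
# boundaries; plain GP variance tends to space-fill instead.)
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

f = lambda x: float(x > 0.5)                  # piecewise-constant target
grid = np.linspace(0, 1, 200).reshape(-1, 1)
X, y = [[0.1], [0.9]], [f(0.1), f(0.9)]       # two seed samples

for _ in range(10):
    gp = GaussianProcessRegressor().fit(np.array(X), np.array(y))
    _, std = gp.predict(grid, return_std=True)
    x_next = float(grid[np.argmax(std), 0])   # sample where we know least
    X.append([x_next]); y.append(f(x_next))

print("queried locations:", sorted(round(v[0], 3) for v in X))
```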
APA, Harvard, Vancouver, ISO, and other styles
22

Amaro, Miguel Mendes. "Credit scoring: comparison of non‐parametric techniques against logistic regression." Master's thesis, 2020. http://hdl.handle.net/10362/99692.

Full text
Abstract:
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence
Over the past decades, financial institutions have been giving increased importance to credit risk management as a critical tool to control their profitability. More than ever, it has become crucial for these institutions to discriminate well between good and bad clients, so as to accept only the credit applications that are not likely to default. To calculate the probability of default of a particular client, most financial institutions have credit scoring models based on parametric techniques. Logistic regression is the current industry-standard technique in credit scoring models, and it is one of the techniques under study in this dissertation. Although it is regarded as a robust and intuitive technique, it is still not free from criticism regarding the model assumptions it makes, which can compromise its predictions. This dissertation evaluates the gains in performance from using more modern non-parametric techniques instead of logistic regression, performing a model comparison over four different real-life credit datasets. Specifically, the techniques compared against logistic regression in this study consist of two single classifiers (a decision tree and an SVM with RBF kernel) and two ensemble methods (random forest and stacking with cross-validation). The literature review demonstrates that heterogeneous ensemble approaches have a weaker presence in credit scoring studies; because of that, stacking with cross-validation was considered in this study. The results demonstrate that logistic regression outperforms the decision tree classifier, performs similarly to the SVM, and slightly underperforms both ensemble approaches, to a similar extent in each case.
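A skeleton of this kind of comparison, using scikit-learn with synthetic imbalanced data in place of the four credit datasets, might look as follows; the models, split, and AUC metric are illustrative choices, not the dissertation's exact protocol.

```python
# Skeleton of a credit-scoring model comparison: logistic regression versus
# a non-parametric ensemble, scored by AUC on a held-out split. The data
# here is synthetic and class-imbalanced, standing in for real credit data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9],
                           random_state=0)   # ~10% "bad" clients
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(random_state=0)):
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(type(model).__name__, f"AUC = {auc:.3f}")
```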
APA, Harvard, Vancouver, ISO, and other styles
23

"Computational Challenges in Non-parametric Prediction of Bradycardia in Preterm Infants." Master's thesis, 2020. http://hdl.handle.net/2286/R.I.63054.

Full text
Abstract:
Infants born before 37 weeks of pregnancy are considered preterm. Typically, preterm infants have to be strictly monitored, since they are highly susceptible to health problems like hypoxemia (low blood oxygen level), apnea, respiratory issues, cardiac problems, and neurological problems, as well as an increased chance of long-term health issues such as cerebral palsy, asthma, and sudden infant death syndrome. One of the leading health complications in preterm infants is bradycardia, defined as a slower-than-expected heart rate, generally below 60 beats per minute. Bradycardia is often accompanied by low oxygen levels and can cause additional long-term health problems in the premature infant. The implementation of a non-parametric method to predict the onset of bradycardia is presented. This method assumes no prior knowledge of the data and uses kernel density estimation to predict the future onset of bradycardia events. The data is preprocessed and then analyzed to detect the peaks in the ECG signals, following which different kernels are implemented to estimate the shared underlying distribution of the data. The performance of the algorithm is evaluated using various metrics, and the computational challenges and methods to overcome them are also discussed. It is observed that the performance of the algorithm with regard to the kernels used is consistent with the theoretical performance of the kernels as presented in a previous work. The theoretical approach has also been automated in this work, and the various implementation challenges have been addressed.
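As a toy illustration of the kernel-density idea (not the thesis's pipeline), one can estimate the density of inter-beat (RR) intervals and ask how improbable an unusually long interval is; the RR values below are synthetic, and the 0.45 s mean is an assumed stand-in for a fast preterm heart rate.

```python
# Kernel density estimation over inter-beat (RR) intervals: widening
# intervals correspond to a dropping heart rate (60 bpm ~ 1 s per beat).
# Synthetic data; the thesis works from peaks detected in ECG signals.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(6)
rr = rng.normal(0.45, 0.03, 1000)            # assumed RR intervals (seconds)
kde = gaussian_kde(rr)                       # Gaussian kernel, default bandwidth

new_rr = 0.9                                 # an unusually long interval
tail = kde.integrate_box_1d(new_rr, np.inf)  # estimated mass beyond it
print(f"P(RR >= {new_rr}s) under the estimated density: {tail:.2e}")
```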
Dissertation/Thesis
Masters Thesis Electrical Engineering 2020
APA, Harvard, Vancouver, ISO, and other styles
24

(11197908), Yicheng Cheng. "Machine Learning in the Open World." Thesis, 2021.

Find full text
Abstract:
By Machine Learning in the Open World, we mean building models that can be used in a more realistic setting where something "unknown" can always occur. Beyond traditional machine learning tasks such as classification and segmentation, where all classes are predefined, we deal with the challenges posed by newly emerging classes, irrelevant classes, outliers, and class imbalance.
We first focus on the Non-Exhaustive Learning (NEL) problem from a statistical perspective. By NEL, we assume that our training classes are non-exhaustive, so the testing data may contain unknown classes, and we aim to build models that simultaneously perform classification and class discovery. We propose a non-parametric Bayesian model that learns hyper-parameters from both training and discovered classes (the latter empty at the beginning), then infers the label partitioning under the guidance of the learned hyper-parameters, repeating this procedure until convergence.
After obtaining good results on applications with plain, low-dimensional data such as flow cytometry and some benchmark datasets, we move on to Non-Exhaustive Feature Learning (NEFL). For NEFL, we extend our work with deep learning techniques to learn representations on datasets with complex structural and spatial correlations. We propose a metric learning approach to learn a feature space that discriminates well between the training classes and generalizes well to unknown classes. We then develop variants of this metric learning algorithm to deal with outliers and irrelevant classes, and apply the final model to applications such as open-world image classification, image segmentation, and SRS hyperspectral image segmentation, obtaining promising results.
Finally, we explore Out-of-Distribution (OOD) detection to detect irrelevant samples and outliers, completing the story.
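The simplest decision rule such a learned feature space enables can be sketched as nearest-centroid classification with a rejection threshold; the embedding, centroids, and threshold below are all assumed for illustration, whereas the thesis learns the embedding with deep metric learning.

```python
# Minimal open-world classification rule in an embedding space: assign to
# the nearest class centroid, but reject as "unknown" when the distance
# exceeds a threshold. The thesis learns the embedding; here it is given.
import numpy as np

def predict_open_world(x, centroids, threshold):
    d = np.linalg.norm(centroids - x, axis=1)
    k = int(np.argmin(d))
    return k if d[k] <= threshold else -1    # -1 = newly emerged / unknown

centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
print(predict_open_world(np.array([0.2, -0.1]), centroids, threshold=2.0))   # 0
print(predict_open_world(np.array([10.0, -9.0]), centroids, threshold=2.0))  # -1
```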
APA, Harvard, Vancouver, ISO, and other styles
25

Pazis, Jason. "PAC-optimal, Non-parametric Algorithms and Bounds for Exploration in Concurrent MDPs with Delayed Updates." Diss., 2015. http://hdl.handle.net/10161/11334.

Full text
Abstract:

As the reinforcement learning community has shifted its focus from heuristic methods to methods that have performance guarantees, PAC-optimal exploration algorithms have received significant attention. Unfortunately, the majority of current PAC-optimal exploration algorithms are inapplicable in realistic scenarios: 1) they scale poorly to domains of realistic size; 2) they are only applicable to discrete state-action spaces; 3) they assume that experience comes from a single, continuous trajectory; 4) they assume that value function updates are instantaneous. The goal of this work is to bridge the gap between theory and practice by introducing an efficient and customizable PAC-optimal exploration algorithm that is able to explore in multiple continuous or discrete state MDPs simultaneously. Our algorithm does not assume that value function updates can be completed instantaneously, and it maintains PAC guarantees in real-time environments. Not only do we extend the applicability of PAC-optimal exploration algorithms to new, realistic settings, but even when instant value function updates are possible, our bounds present a significant improvement over previous single-MDP exploration bounds, and a drastic improvement over previous concurrent PAC bounds. We also present Bellman error MDPs, a new analysis methodology for online and offline reinforcement learning algorithms, and TCE, a new, fine-grained metric for the cost of exploration.
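The dissertation's algorithm is not reproduced here, but the generic count-based optimism mechanism that underlies many PAC exploration analyses can be sketched as follows; beta and the tabular setting are illustrative assumptions, not the dissertation's setting.

```python
# Generic count-based optimism (not the dissertation's algorithm):
# rarely tried state-action pairs receive an uncertainty bonus, so the
# agent is systematically driven to explore them until they are known.
import numpy as np

n_states, n_actions, beta = 5, 2, 1.0
Q = np.zeros((n_states, n_actions))
counts = np.ones((n_states, n_actions))   # visit counts (init 1 to avoid /0)

def act(s):
    bonus = beta / np.sqrt(counts[s])     # shrinks as experience accumulates
    return int(np.argmax(Q[s] + bonus))   # optimism in the face of uncertainty

a = act(0)
counts[0, a] += 1
print("chosen action:", a)
```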


Dissertation
APA, Harvard, Vancouver, ISO, and other styles
26

"Graph-based Estimation of Information Divergence Functions." Doctoral diss., 2017. http://hdl.handle.net/2286/R.I.38649.

Full text
Abstract:
Information divergence functions, such as the Kullback-Leibler divergence or the Hellinger distance, play a critical role in statistical signal processing and information theory; however, estimating them can be a challenge. Most often, parametric assumptions are made about the two distributions to estimate the divergence of interest. In cases where no parametric model fits the data, non-parametric density estimation is used. In statistical signal processing applications, Gaussianity is usually assumed, since closed-form expressions for common divergence measures have been derived for this family of distributions. Parametric assumptions are preferred when it is known that the data follows the model; however, this is rarely the case in real-world scenarios. Non-parametric density estimators are characterized by a very large number of parameters that have to be tuned with costly cross-validation. In this dissertation we focus on a specific family of non-parametric estimators, called direct estimators, that bypass density estimation completely and directly estimate the quantity of interest from the data. We introduce a new divergence measure, the $D_p$-divergence, that can be estimated directly from samples without parametric assumptions on the distribution. We show that the $D_p$-divergence bounds the binary, cross-domain, and multi-class Bayes error rates and, in certain cases, provides provably tighter bounds than the Hellinger divergence. In addition, we propose a new methodology that allows the experimenter to construct direct estimators for existing divergence measures or to construct new divergence measures with custom properties that are tailored to the application. To examine the practical efficacy of these new methods, we evaluate them in a statistical learning framework on a series of real-world data science problems involving speech-based monitoring of neuro-motor disorders.
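A minimal sketch of a direct graph-based estimator in this spirit is the Friedman-Rafsky construction: build a Euclidean minimum spanning tree over the pooled samples and count cross-sample edges. The estimator form 1 - R(m+n)/(2mn) is the standard one from this literature; whether it matches the dissertation's exact constants is an assumption.

```python
# Direct (density-free) divergence estimate via the Friedman-Rafsky
# statistic: build a Euclidean MST over both samples together and count
# edges joining points from different samples. With |X|=m and |Y|=n, the
# divergence is estimated as 1 - R*(m+n)/(2*m*n), R = cross-sample edges.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist

def dp_divergence(X, Y):
    Z = np.vstack([X, Y])
    labels = np.array([0] * len(X) + [1] * len(Y))
    mst = minimum_spanning_tree(cdist(Z, Z)).tocoo()
    R = np.sum(labels[mst.row] != labels[mst.col])   # cross-sample edges
    m, n = len(X), len(Y)
    return max(0.0, 1.0 - R * (m + n) / (2.0 * m * n))

rng = np.random.default_rng(7)
print(dp_divergence(rng.normal(0, 1, (200, 2)),    # well-separated samples
                    rng.normal(3, 1, (200, 2))))   # -> estimate near 1
```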
Dissertation/Thesis
Doctoral Dissertation Electrical Engineering 2017
APA, Harvard, Vancouver, ISO, and other styles