
Theses on the topic "Generalization bound"


The top 15 dissertations (master's and doctoral theses) on the research topic "Generalization bound" are listed below.


1

McDonald, Daniel J. "Generalization Error Bounds for Time Series". Research Showcase @ CMU, 2012. http://repository.cmu.edu/dissertations/184.

Full text
Abstract:
In this thesis, I derive generalization error bounds — bounds on the expected inaccuracy of the predictions — for time series forecasting models. These bounds allow forecasters to select among competing models, and to declare that, with high probability, their chosen model will perform well — without making strong assumptions about the data generating process or appealing to asymptotic theory. Expanding upon results from statistical learning theory, I demonstrate how these techniques can help time series forecasters to choose models which behave well under uncertainty. I also show how to estimate the β-mixing coefficients for dependent data so that my results can be used empirically. I use the bound explicitly to evaluate different predictive models for the volatility of IBM stock and for a standard set of macroeconomic variables. Taken together my results show how to control the generalization error of time series models with fixed or growing memory.
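To make the object of study concrete, the following display sketches the generic shape such a bound takes; this is a schematic of the standard i.i.d. form and of how a β-mixing correction typically enters via the blocking argument, not the precise statement proved in the thesis.

```latex
% Generic i.i.d.-style generalization bound: with probability at least 1 - \delta,
% the true risk is controlled by the empirical risk plus a capacity term and a
% confidence term.
\[
R(f) \;\le\; \widehat{R}_n(f) \;+\; \mathfrak{C}_n(\mathcal{F})
\;+\; \sqrt{\frac{\log(1/\delta)}{2n}} \qquad \text{for all } f \in \mathcal{F}.
\]
% Schematic dependent-data version: splitting the series into \mu blocks of length a,
% the failure probability acquires an additive term driven by the \beta-mixing
% coefficient \beta(a) of the process.
\[
\Pr\!\Big[\, \sup_{f \in \mathcal{F}} \big( R(f) - \widehat{R}_n(f) \big) > \varepsilon \Big]
\;\le\; p_{\mathrm{iid}}(\mu, \varepsilon) \;+\; O\big( \mu\, \beta(a) \big).
\]
```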
2

Kroon, Rodney Stephen. "Support vector machines, generalization bounds, and transduction". Thesis, Stellenbosch: University of Stellenbosch, 2003. http://hdl.handle.net/10019.1/16375.

Full text
3

Kelby, Robin J. "Formalized Generalization Bounds for Perceptron-Like Algorithms". Ohio University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1594805966855804.

Full text
4

Giulini, Ilaria. "Generalization bounds for random samples in Hilbert spaces". Thesis, Paris, Ecole normale supérieure, 2015. http://www.theses.fr/2015ENSU0026/document.

Full text
Abstract:
This thesis focuses on obtaining generalization bounds for random samples in reproducing kernel Hilbert spaces. The approach consists in first obtaining non-asymptotic dimension-free bounds in finite-dimensional spaces, using PAC-Bayesian inequalities related to Gaussian perturbations of the parameter, and then in extending the results to separable Hilbert spaces. We first investigate the question of estimating the Gram operator from an i.i.d. sample by a robust estimator, and we present uniform bounds that hold under weak moment assumptions. These results allow us to qualify principal component analysis independently of the dimension of the ambient space and to propose stable versions of it. In the last part of the thesis we present a new algorithm for spectral clustering. It consists in replacing the projection on the eigenvectors associated with the largest eigenvalues of the Laplacian matrix by a power of the normalized Laplacian. This iteration, justified by the analysis of clustering in terms of Markov chains, acts as a smooth truncation of the projection on the leading eigenvectors and yields an algorithm in which the number of clusters is determined automatically. We prove non-asymptotic bounds for the convergence of this spectral clustering algorithm applied to an i.i.d. sample of points from a distribution with compact support in a Hilbert space; these bounds are deduced from the bounds obtained for the estimation of a Gram operator in a Hilbert space. We conclude with an overview of the interest of spectral clustering in the context of image analysis.
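As a rough illustration of the clustering idea described above (replacing the projection on the leading eigenvectors by a power of the normalized Laplacian), here is a minimal sketch in Python; the Gaussian kernel, the bandwidth, the power t and the final grouping rule are all illustrative choices, not the estimator analysed in the thesis.

```python
import numpy as np

def iterated_laplacian_clustering(X, bandwidth=1.0, t=20, link_threshold=0.9):
    """Illustrative spectral clustering via a power of the normalized affinity matrix.

    A power of D^{-1/2} W D^{-1/2} damps the small eigenvalues and acts as a smooth
    (regularized) version of the projection on the leading eigenvectors.  Points are
    then grouped by the similarity of the corresponding rows, so the number of
    clusters is not fixed in advance.
    """
    # Gaussian affinity matrix.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dists / (2.0 * bandwidth ** 2))

    # Symmetrically normalized affinity M = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # M^t keeps the leading spectral directions and smoothly truncates the rest.
    Mt = np.linalg.matrix_power(M, t)

    # Group points whose normalized rows of M^t are nearly aligned.
    rows = Mt / np.linalg.norm(Mt, axis=1, keepdims=True)
    similar = rows @ rows.T > link_threshold

    labels = -np.ones(len(X), dtype=int)
    current = 0
    for i in range(len(X)):
        if labels[i] == -1:
            # Connected components of the "similar rows" graph define the clusters.
            stack, labels[i] = [i], current
            while stack:
                j = stack.pop()
                for k in np.flatnonzero(similar[j] & (labels == -1)):
                    labels[k] = current
                    stack.append(k)
            current += 1
    return labels
```

With link_threshold close to 1 and a moderate power t, the grouping step returns one cluster per block of nearly aligned rows, which is the sense in which the number of clusters is determined by the data rather than fixed in advance; both parameters are hypothetical defaults here.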
5

Wade, Modou. "Apprentissage profond pour les processus faiblement dépendants". Electronic Thesis or Diss., CY Cergy Paris Université, 2024. http://www.theses.fr/2024CYUN1299.

Full text
Abstract:
This thesis focuses on deep learning for weakly dependent processes. We consider a class of deep neural network estimators with sparsity regularization and/or penalty regularization. Chapter 1 is a summary of the work: it presents the deep learning framework and reviews the main results obtained in Chapters 2 to 6. Chapter 2 considers deep learning for psi-weakly dependent processes. We establish the convergence rate of the empirical risk minimization (ERM) algorithm over the class of deep neural network (DNN) estimators. For these estimators, we provide a generalization bound and obtain an asymptotic learning rate of order O(n^{-1/alpha}) for all alpha > 2. A bound on the excess risk for a large class of target predictors is also established. Chapter 3 presents the sparse-penalized deep neural network estimator under weak dependence. We consider nonparametric regression and classification problems for weakly dependent processes and use a regularization-by-penalization method. For nonparametric regression and binary classification, we establish an oracle inequality for the excess risk of the sparse-penalized deep neural network (SPDNN) estimator and provide a convergence rate for these estimators. Chapter 4 focuses on the penalized deep neural network estimator with a general loss function under weak dependence. We work within the psi-weak dependence structure and, in the specific case where the observations are bounded, with the theta_{infty}-weak dependence. For learning psi- and theta_{infty}-weakly dependent processes, we establish an oracle inequality for the excess risks of the sparse-penalized deep neural network estimator and show that, when the target function is sufficiently smooth, the convergence rate of these excess risks is close to O(n^{-1/3}). Chapter 5 presents robust deep learning from weakly dependent data. We assume that the output variable has finite moments of order r, with r >= 1. For learning strongly mixing and psi-weakly dependent processes, a non-asymptotic bound on the expected excess risk of the deep neural network estimator is established. We show that, when the target function belongs to the class of Hölder-smooth functions, the convergence rate of the expected excess risk for exponentially strongly mixing data is close to or equal to that obtained with an independent and identically distributed (i.i.d.) sample. Chapter 6 focuses on deep learning for strongly mixing observations with sparse-penalized regularization and minimax optimality. We provide an oracle inequality and a bound over the class of Hölder-smooth functions for the expected excess risk of the deep neural network estimator, and we also consider nonparametric regression from strongly mixing data with sub-exponential noise. When the target function belongs to the class of Hölder composition functions, we establish an upper bound for the oracle inequality of the L_2 error. In the specific case of autoregressive regression with standard Laplace or normal noise, we provide a lower bound on the L_2 error in this class that matches the upper bound up to a logarithmic factor; the deep neural network estimator thus achieves an optimal convergence rate.
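For orientation, sparse-penalized estimators of the kind referred to above are typically defined by an objective of the following generic form; the exact penalty and network class used in the thesis may differ, so this is only the shape of the criterion.

```latex
% Generic sparse-penalized empirical risk minimization over a class of deep networks:
\[
\widehat{f}_n \;\in\; \operatorname*{arg\,min}_{f_\theta \in \mathcal{F}_{\mathrm{DNN}}}
\left\{ \frac{1}{n} \sum_{i=1}^{n} \ell\big( f_\theta(X_i), Y_i \big)
\;+\; \lambda_n\, \mathrm{pen}(\theta) \right\},
\]
% where pen(\theta) is a sparsity-promoting penalty on the network weights (for
% instance an l_1 or clipped-l_1 norm) and \lambda_n is the regularization level that
% the oracle inequalities balance against the approximation error of the network class.
```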
6

Rakhlin, Alexander. "Applications of empirical processes in learning theory : algorithmic stability and generalization bounds". Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/34564.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2006.
Includes bibliographical references (p. 141-148).
This thesis studies two key properties of learning algorithms: their generalization ability and their stability with respect to perturbations. To analyze these properties, we focus on concentration inequalities and tools from empirical process theory. We obtain theoretical results and demonstrate their applications to machine learning. First, we show how various notions of stability upper- and lower-bound the bias and variance of several estimators of the expected performance for general learning algorithms. A weak stability condition is shown to be equivalent to consistency of empirical risk minimization. The second part of the thesis derives tight performance guarantees for greedy error minimization methods - a family of computationally tractable algorithms. In particular, we derive risk bounds for a greedy mixture density estimation procedure. We prove that, unlike what is suggested in the literature, the number of terms in the mixture is not a bias-variance trade-off for the performance. The third part of this thesis provides a solution to an open problem regarding the stability of Empirical Risk Minimization (ERM). This algorithm is of central importance in Learning Theory. By studying the suprema of the empirical process, we prove that ERM over Donsker classes of functions is stable in the L1 norm. Hence, as the number of samples grows, it becomes less and less likely that a perturbation of o(√n) samples will result in a very different empirical minimizer. Asymptotic rates of this stability are proved under metric entropy assumptions on the function class. Through the use of a ratio limit inequality, we also prove stability of expected errors of empirical minimizers. Next, we investigate applications of the stability result. In particular, we focus on procedures that optimize an objective function, such as k-means and other clustering methods. We demonstrate that stability of clustering, just like stability of ERM, is closely related to the geometry of the class and the underlying measure. Furthermore, our result on stability of ERM delineates a phase transition between stability and instability of clustering methods. In the last chapter, we prove a generalization of the bounded-difference concentration inequality for almost-everywhere smooth functions. This result can be utilized to analyze algorithms which are almost always stable. Next, we prove a phase transition in the concentration of almost-everywhere smooth functions. Finally, a tight concentration of empirical errors of empirical minimizers is shown under an assumption on the underlying space.
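For reference, the classical bounded-difference (McDiarmid) concentration inequality that the last chapter generalizes reads as follows.

```latex
% McDiarmid's bounded-difference inequality: if changing the i-th argument of g changes
% its value by at most c_i, then for independent X_1, ..., X_n and every t > 0,
\[
\Pr\big[\, g(X_1, \dots, X_n) - \mathbb{E}\, g(X_1, \dots, X_n) \ge t \,\big]
\;\le\; \exp\!\left( - \frac{2 t^2}{\sum_{i=1}^{n} c_i^2} \right).
\]
```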
7

Bellet, Aurélien. "Supervised metric learning with generalization guarantees". PhD thesis, Université Jean Monnet - Saint-Etienne, 2012. http://tel.archives-ouvertes.fr/tel-00770627.

Full text
Abstract:
In recent years, the crucial importance of metrics in machine learning algorithms has led to an increasing interest in optimizing distance and similarity functions using knowledge from training data to make them suitable for the problem at hand. This area of research is known as metric learning. Existing methods typically aim at optimizing the parameters of a given metric with respect to some local constraints over the training sample. The learned metrics are generally used in nearest-neighbor and clustering algorithms. When data consist of feature vectors, a large body of work has focused on learning a Mahalanobis distance, which is parameterized by a positive semi-definite matrix. Recent methods offer good scalability to large datasets. Less work has been devoted to metric learning from structured objects (such as strings or trees), because it often involves complex procedures. Most of the work has focused on optimizing a notion of edit distance, which measures (in terms of number of operations) the cost of turning an object into another. We identify two important limitations of current supervised metric learning approaches. First, they make it possible to improve the performance of local algorithms such as k-nearest neighbors, but metric learning for global algorithms (such as linear classifiers) has not really been studied so far. Second, and perhaps more importantly, the question of the generalization ability of metric learning methods has been largely ignored. In this thesis, we propose theoretical and algorithmic contributions that address these limitations. Our first contribution is the derivation of a new kernel function built from learned edit probabilities. Unlike other string kernels, it is guaranteed to be valid and parameter-free. Our second contribution is a novel framework for learning string and tree edit similarities inspired by the recent theory of (epsilon,gamma,tau)-good similarity functions and formulated as a convex optimization problem. Using uniform stability arguments, we establish theoretical guarantees for the learned similarity that give a bound on the generalization error of a linear classifier built from that similarity. In our third contribution, we extend the same ideas to metric learning from feature vectors by proposing a bilinear similarity learning method that efficiently optimizes the (epsilon,gamma,tau)-goodness. The similarity is learned based on global constraints that are more appropriate to linear classification. Generalization guarantees are derived for our approach, highlighting that our method minimizes a tighter bound on the generalization error of the classifier. Our last contribution is a framework for establishing generalization bounds for a large class of existing metric learning algorithms. It is based on a simple adaptation of the notion of algorithmic robustness and allows the derivation of bounds for various loss functions and regularizers.
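As a point of reference for the feature-vector setting mentioned above, the Mahalanobis distances targeted by most of that line of work are parameterized as follows; the notation is generic and not specific to the thesis.

```latex
% Mahalanobis distance parameterized by a positive semi-definite matrix M:
\[
d_M(x, x') \;=\; \sqrt{ (x - x')^{\top} M \, (x - x') }, \qquad M \succeq 0,
\]
% so that M = L^\top L for some linear map L and d_M(x, x') = \| L x - L x' \|_2;
% metric learning methods optimize M subject to (local or global) constraints built
% from the training sample.
```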
8

Nordenfors, Oskar. "A Literature Study Concerning Generalization Error Bounds for Neural Networks via Rademacher Complexity". Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-184487.

Full text
Abstract:
In this essay, some fundamental results from the theory of machine learning and neural networks are presented, with the goal of finally discussing bounds on the generalization error of neural networks via Rademacher complexity.
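To make the central quantity concrete, here is a small, self-contained Monte Carlo estimate of the empirical Rademacher complexity of a finite hypothesis class; the threshold classifiers and the data are illustrative stand-ins, not anything taken from the essay.

```python
import numpy as np

def empirical_rademacher(values, n_draws=2000, rng=None):
    """Monte Carlo estimate of the empirical Rademacher complexity.

    `values` is an (n_hypotheses, n_samples) array whose rows are the values of each
    hypothesis (or of its loss) on the fixed sample.  The empirical Rademacher
    complexity is E_sigma[ sup_h (1/n) sum_i sigma_i * h(x_i) ] over i.i.d. random
    signs sigma_i.
    """
    rng = np.random.default_rng(rng)
    n = values.shape[1]
    sigmas = rng.choice([-1.0, 1.0], size=(n_draws, n))   # random sign vectors
    # For each sign vector, take the supremum over the (finite) class, then average.
    correlations = sigmas @ values.T / n                   # shape (n_draws, n_hypotheses)
    return correlations.max(axis=1).mean()

# Illustrative finite class: threshold classifiers h_t(x) = 1{x >= t} on 1-D data.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
thresholds = np.linspace(0.0, 1.0, 51)
H = (x[None, :] >= thresholds[:, None]).astype(float)      # (n_hypotheses, n_samples)

print("estimated empirical Rademacher complexity:", empirical_rademacher(H, rng=0))
```

The bounds discussed in the essay then control the gap between true and empirical risk by a multiple of this quantity plus a confidence term of order sqrt(log(1/δ)/n).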
9

Katsikarelis, Ioannis. "Structurally Parameterized Tight Bounds and Approximation for Generalizations of Independence and Domination". Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLED048.

Full text
Abstract:
In this thesis we focus on the NP-hard problems (k, r)-CENTER and d-SCATTERED SET, which generalize the well-studied concepts of domination and independence over larger distances. In the first part we maintain a parameterized viewpoint and examine the standard parameterization as well as the most widely used graph parameters measuring the input's structure. We offer hardness results showing that there is no algorithm with running time below certain bounds, subject to the (Strong) Exponential Time Hypothesis; we produce essentially optimal algorithms whose complexity matches these lower bounds; and we further attempt to offer an alternative to exact computation in significantly reduced running time by way of approximation algorithms. In the second part we consider the (super-)polynomial (in-)approximability of the d-SCATTERED SET problem, i.e. we determine the exact relationship between an achievable approximation ratio ρ, the distance parameter d, and the running time of any ρ-approximation algorithm, expressed as a function of the above and of the size of the input n. We then consider strictly polynomial running times and improve our understanding of the approximability characteristics of the problem on graphs of bounded maximum degree as well as on bipartite graphs.
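For readers unfamiliar with the two problems, their standard definitions can be written as follows; this is the usual textbook formulation, independent of the thesis's parameterizations.

```latex
% (k, r)-CENTER: is there a set S of at most k vertices such that every vertex of the
% graph is within distance r of some vertex of S?
\[
\exists\, S \subseteq V,\ |S| \le k : \quad \forall u \in V,\ \min_{v \in S} \mathrm{dist}_G(u, v) \le r.
\]
% d-SCATTERED SET: is there a set S of at least k vertices that are pairwise at
% distance at least d?
\[
\exists\, S \subseteq V,\ |S| \ge k : \quad \forall u \ne v \in S,\ \mathrm{dist}_G(u, v) \ge d.
\]
% Taking r = 1 recovers DOMINATING SET and d = 2 recovers INDEPENDENT SET, which is the
% sense in which the two problems generalize domination and independence.
```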
10

Musayeva, Khadija. "Generalization Performance of Margin Multi-category Classifiers". Thesis, Université de Lorraine, 2019. http://www.theses.fr/2019LORR0096/document.

Full text
Abstract:
This thesis deals with the theory of margin multi-category classification and is based on the statistical learning theory founded by Vapnik and Chervonenkis. We are interested in deriving generalization bounds with explicit dependencies on the number C of categories, the sample size m and the margin parameter gamma, when the loss function considered is a Lipschitz continuous margin loss function. Generalization bounds rely on the empirical performance of the classifier as well as on its "capacity". In this work, the following scale-sensitive capacity measures are considered: the Rademacher complexity, the covering numbers and the fat-shattering dimension. Our main contributions are obtained under the assumption that the classes of component functions implemented by a classifier have polynomially growing fat-shattering dimensions and that the component functions are independent. In the context of the pathway of Mendelson, which relates the Rademacher complexity to the covering numbers and the latter to the fat-shattering dimension, we study the impact that decomposing at the level of one of these capacity measures has on the dependencies on C, m and gamma. In particular, we demonstrate that the dependency on C can be substantially improved over the state of the art if the decomposition is postponed to the level of the metric entropy or the fat-shattering dimension. On the other hand, this negatively impacts the rate of convergence (the dependency on m), an indication that optimizing the dependencies on the three basic parameters amounts to looking for a trade-off.
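For concreteness, the Lipschitz margin loss referred to above is typically the following ramp-type function of the margin; this is the standard definition rather than a statement specific to the thesis.

```latex
% Margin of a multi-category classifier f = (f_1, ..., f_C) on a labelled pair (x, y):
\[
m_f(x, y) \;=\; f_y(x) \;-\; \max_{k \ne y} f_k(x),
\]
% gamma-margin (ramp) loss, a (1/gamma)-Lipschitz surrogate of the 0-1 loss:
\[
\phi_\gamma(t) \;=\;
\begin{cases}
1, & t \le 0, \\
1 - t/\gamma, & 0 < t \le \gamma, \\
0, & t > \gamma.
\end{cases}
\]
% The bounds then control the risk P(m_f(X, Y) <= 0) by the empirical average of
% \phi_\gamma(m_f(X_i, Y_i)) plus capacity and confidence terms depending on C, m and gamma.
```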
11

Musayeva, Khadija. "Generalization Performance of Margin Multi-category Classifiers". Electronic Thesis or Diss., Université de Lorraine, 2019. http://www.theses.fr/2019LORR0096.

Full text
Abstract:
This thesis deals with the theory of margin multi-category classification and is based on the statistical learning theory founded by Vapnik and Chervonenkis. We are interested in deriving generalization bounds with explicit dependencies on the number C of categories, the sample size m and the margin parameter gamma, when the loss function considered is a Lipschitz continuous margin loss function. Generalization bounds rely on the empirical performance of the classifier as well as on its "capacity". In this work, the following scale-sensitive capacity measures are considered: the Rademacher complexity, the covering numbers and the fat-shattering dimension. Our main contributions are obtained under the assumption that the classes of component functions implemented by a classifier have polynomially growing fat-shattering dimensions and that the component functions are independent. In the context of the pathway of Mendelson, which relates the Rademacher complexity to the covering numbers and the latter to the fat-shattering dimension, we study the impact that decomposing at the level of one of these capacity measures has on the dependencies on C, m and gamma. In particular, we demonstrate that the dependency on C can be substantially improved over the state of the art if the decomposition is postponed to the level of the metric entropy or the fat-shattering dimension. On the other hand, this negatively impacts the rate of convergence (the dependency on m), an indication that optimizing the dependencies on the three basic parameters amounts to looking for a trade-off.
12

Philips, Petra Camilla. "Data-Dependent Analysis of Learning Algorithms". The Australian National University, Research School of Information Sciences and Engineering, 2005. http://thesis.anu.edu.au./public/adt-ANU20050901.204523.

Full text
Abstract:
This thesis studies the generalization ability of machine learning algorithms in a statistical setting. It focuses on the data-dependent analysis of the generalization performance of learning algorithms in order to make full use of the potential of the actual training sample from which these algorithms learn.

First, we propose an extension of the standard framework for the derivation of generalization bounds for algorithms taking their hypotheses from random classes of functions. This approach is motivated by the fact that the function produced by a learning algorithm based on a random sample of data depends on this sample and is therefore a random function. Such an approach avoids the detour of the worst-case uniform bounds as done in the standard approach. We show that the mechanism which allows one to obtain generalization bounds for random classes in our framework is based on a “small complexity” of certain random coordinate projections. We demonstrate how this notion of complexity relates to learnability and how one can explore geometric properties of these projections in order to derive estimates of rates of convergence and good confidence interval estimates for the expected risk. We then demonstrate the generality of our new approach by presenting a range of examples, among them the algorithm-dependent compression schemes and the data-dependent luckiness frameworks, which fall into our random subclass framework.

Second, we study in more detail generalization bounds for a specific algorithm which is of central importance in learning theory, namely the Empirical Risk Minimization algorithm (ERM). Recent results show that one can significantly improve the high-probability estimates for the convergence rates for empirical minimizers by a direct analysis of the ERM algorithm. These results are based on a new localized notion of complexity of subsets of hypothesis functions with identical expected errors and are therefore dependent on the underlying unknown distribution. We investigate the extent to which one can estimate these high-probability convergence rates in a data-dependent manner. We provide an algorithm which computes a data-dependent upper bound for the expected error of empirical minimizers in terms of the “complexity” of data-dependent local subsets. These subsets are sets of functions of empirical errors of a given range and can be determined based solely on empirical data. We then show that recent direct estimates, which are essentially sharp estimates on the high-probability convergence rate for the ERM algorithm, can not be recovered universally from empirical data.
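For reference, the Empirical Risk Minimization algorithm analysed in the second part is defined as follows (standard notation, not specific to the thesis).

```latex
% Empirical Risk Minimization over a hypothesis class F on a sample (z_1, ..., z_n):
\[
\widehat{f}_n \;\in\; \operatorname*{arg\,min}_{f \in \mathcal{F}} \widehat{R}_n(f),
\qquad
\widehat{R}_n(f) \;=\; \frac{1}{n} \sum_{i=1}^{n} \ell(f, z_i),
\]
% and the localized, data-dependent analysis described above bounds the expected error
% of \widehat{f}_n through the complexity of the subsets of functions whose empirical
% errors fall within a given range.
```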
13

Peel, Thomas. "Algorithmes de poursuite stochastiques et inégalités de concentration empiriques pour l'apprentissage statistique". Thesis, Aix-Marseille, 2013. http://www.theses.fr/2013AIXM4769/document.

Full text
Abstract:
The first part of this thesis introduces new algorithms for the sparse encoding of signals. Based on Matching Pursuit (MP), they address the following problem: how to reduce the computation time of the selection step of MP, which is often very costly. As an answer, we subsample the dictionary by rows and columns at each iteration. We show that this theoretically grounded approach has good empirical performance. We then propose a block coordinate gradient descent algorithm for feature selection problems in the multiclass classification setting. Thanks to the use of error-correcting output codes, this task can be seen as a simultaneous sparse encoding of signals problem. The second part presents new empirical Bernstein inequalities. First, they concern the theory of U-statistics and are applied to design generalization bounds for ranking algorithms. These bounds take advantage of a variance estimator, and we propose an efficient algorithm to compute it. Then, we present an empirical version of the Bernstein-type inequality for martingales by Freedman [1975]. Again, the strength of our result lies in the variance estimator, which is computable from the data. This allows us to propose generalization bounds for online learning algorithms that improve the state of the art and pave the way to a new family of learning algorithms taking advantage of this empirical information.
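To illustrate the selection-step speed-up described above, here is a minimal Matching Pursuit sketch in which the dictionary is randomly subsampled in columns (atoms) and rows (signal coordinates) at each iteration; the subsampling rates and the fixed iteration budget are illustrative, and this is not the algorithm analysed in the thesis.

```python
import numpy as np

def subsampled_matching_pursuit(D, y, n_iters=50, col_frac=0.3, row_frac=0.5, rng=None):
    """Matching Pursuit with a randomly subsampled selection step.

    D: dictionary of shape (m, p) with unit-norm columns (atoms); y: signal of shape (m,).
    At each iteration only a random subset of atoms and of coordinates is used to pick
    the next atom, which reduces the cost of the (usually dominant) selection step.
    """
    rng = np.random.default_rng(rng)
    m, p = D.shape
    residual = y.copy()
    coef = np.zeros(p)

    for _ in range(n_iters):
        cols = rng.choice(p, size=max(1, int(col_frac * p)), replace=False)
        rows = rng.choice(m, size=max(1, int(row_frac * m)), replace=False)

        # Selection step on the subsampled dictionary: atom most correlated with the residual.
        scores = D[np.ix_(rows, cols)].T @ residual[rows]
        j = cols[np.argmax(np.abs(scores))]

        # Update step uses the full atom, as in plain Matching Pursuit.
        step = D[:, j] @ residual
        coef[j] += step
        residual -= step * D[:, j]

    return coef, residual
```

Setting col_frac = row_frac = 1.0 recovers plain Matching Pursuit, which makes the trade-off between per-iteration cost and selection quality easy to probe empirically.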
14

Bello Medina, Kevin Segundo. "Structured Prediction: Statistical and Computational Guarantees in Learning and Inference". Thesis, 2021.

Find full text
Abstract:
Structured prediction consists of receiving a structured input and producing a combinatorial structure such as trees, clusters, networks, sequences, or permutations, among others. From the computational viewpoint, structured prediction is in general considered intractable because the size of the output space is exponential in the input size. For instance, in image segmentation tasks, the number of admissible segments is exponential in the number of pixels. A second challenge is the combination of high input dimensionality with the limited amount of available data: in structured prediction it is common for the input to live in a high-dimensional space, which requires jointly reasoning about thousands or millions of variables while contending with a limited amount of data. Thus, learning and inference methods with strong computational and statistical guarantees are desired. The focus of our research is therefore to propose principled methods for structured prediction that are both polynomial time, i.e., computationally efficient, and require a polynomial number of data samples, i.e., statistically efficient.

The main contributions of this thesis are as follows:

(i) We develop an efficient and principled learning method of latent variable models for structured prediction under Gaussian perturbations. We derive a Rademacher-based generalization bound and argue that the use of non-convex formulations in learning latent-variable models leads to tighter bounds of the Gibbs decoder distortion.

(ii) We study the fundamental limits of structured prediction, i.e., we characterize the necessary sample complexity for learning factor graph models in the context of structured prediction. In particular, we show that the finiteness of our novel MaxPair-dimension is necessary for learning. Lastly, we show a connection between the MaxPair-dimension and the VC-dimension, which allows existing results on the VC-dimension to be used to calculate the MaxPair-dimension.

(iii) We analyze a generative model based on connected graphs, and find the structural conditions of the graph that allow for the exact recovery of the node labels. In particular, we show that exact recovery is realizable in polynomial time for a large class of graphs. Our analysis is based on convex relaxations, where we thoroughly analyze a semidefinite program and a degree-4 sum-of-squares program. Finally, we extend this model to consider linear constraints (e.g., fairness), and formally explain the effect of the added constraints on the probability of exact recovery.
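As background for the three contributions above, inference in structured prediction is usually posed as a combinatorial maximization of the following generic form, given here for orientation only.

```latex
% Inference in structured prediction: given an input x, search an exponentially large
% output space Y(x) (trees, clusterings, permutations, ...) for the highest-scoring structure,
\[
\widehat{y}(x) \;=\; \operatorname*{arg\,max}_{y \in \mathcal{Y}(x)} \;\langle w, \phi(x, y) \rangle,
\]
% which is why the thesis seeks learning and inference procedures that are simultaneously
% polynomial-time and polynomial in the number of required samples.
```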

15

Philips, Petra. "Data-Dependent Analysis of Learning Algorithms". PhD thesis, 2005. http://hdl.handle.net/1885/47998.

Full text
Abstract:
This thesis studies the generalization ability of machine learning algorithms in a statistical setting. It focuses on the data-dependent analysis of the generalization performance of learning algorithms in order to make full use of the potential of the actual training sample from which these algorithms learn.

First, we propose an extension of the standard framework for the derivation of generalization bounds for algorithms taking their hypotheses from random classes of functions. ...

Second, we study in more detail generalization bounds for a specific algorithm which is of central importance in learning theory, namely the Empirical Risk Minimization algorithm (ERM). ...