Dissertations / Theses on the topic 'Generalization bounds'

Consult the top 21 dissertations / theses for your research on the topic 'Generalization bounds.'

1

McDonald, Daniel J. "Generalization Error Bounds for Time Series." Research Showcase @ CMU, 2012. http://repository.cmu.edu/dissertations/184.

Full text
Abstract:
In this thesis, I derive generalization error bounds — bounds on the expected inaccuracy of the predictions — for time series forecasting models. These bounds allow forecasters to select among competing models, and to declare that, with high probability, their chosen model will perform well — without making strong assumptions about the data generating process or appealing to asymptotic theory. Expanding upon results from statistical learning theory, I demonstrate how these techniques can help time series forecasters to choose models which behave well under uncertainty. I also show how to estimate the β-mixing coefficients for dependent data so that my results can be used empirically. I use the bound explicitly to evaluate different predictive models for the volatility of IBM stock and for a standard set of macroeconomic variables. Taken together my results show how to control the generalization error of time series models with fixed or growing memory.
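As a rough illustration of how a bound of this kind can drive model selection, here is a minimal Python sketch: each candidate forecaster is scored by its empirical risk plus a Hoeffding-style confidence term, with a hypothetical `n_effective` standing in for the effective number of approximately independent blocks that estimated β-mixing coefficients would provide. This is an illustrative sketch, not the thesis's actual bound.

```python
import numpy as np

def generalization_bound(losses, n_effective, delta=0.05, loss_range=1.0):
    """Hoeffding-style upper bound: empirical risk plus a confidence term.
    `n_effective` is a hypothetical stand-in for the number of approximately
    independent blocks obtained from estimated beta-mixing coefficients."""
    empirical_risk = np.mean(losses)
    penalty = loss_range * np.sqrt(np.log(1.0 / delta) / (2.0 * n_effective))
    return empirical_risk + penalty

# Model selection: pick the forecaster whose bound (not raw training loss) is smallest.
rng = np.random.default_rng(0)
candidate_losses = {"AR(1)": rng.uniform(0, 1, 500), "GARCH": rng.uniform(0, 1, 500)}
bounds = {name: generalization_bound(l, n_effective=200) for name, l in candidate_losses.items()}
best = min(bounds, key=bounds.get)
print(bounds, "->", best)
```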
APA, Harvard, Vancouver, ISO, and other styles
2

Kroon, Rodney Stephen. "Support vector machines, generalization bounds, and transduction." Thesis, Stellenbosch : University of Stellenbosch, 2003. http://hdl.handle.net/10019.1/16375.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Kelby, Robin J. "Formalized Generalization Bounds for Perceptron-Like Algorithms." Ohio University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1594805966855804.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Giulini, Ilaria. "Generalization bounds for random samples in Hilbert spaces." Thesis, Paris, Ecole normale supérieure, 2015. http://www.theses.fr/2015ENSU0026/document.

Full text
Abstract:
This thesis focuses on obtaining generalization bounds for random samples in reproducing kernel Hilbert spaces. The approach consists in first obtaining non-asymptotic, dimension-free bounds in finite-dimensional spaces, using PAC-Bayesian inequalities related to Gaussian perturbations of the parameter, and then in extending these results to separable Hilbert spaces. We first investigate the question of estimating the Gram operator from an i.i.d. sample by a robust estimator, and we present uniform bounds that hold under weak moment assumptions. These results make it possible to characterize principal component analysis independently of the dimension of the ambient space and to propose robust variants of it. In the last part of the thesis we present a new spectral clustering algorithm. Instead of keeping only the projection on the eigenvectors associated with the largest eigenvalues of the Laplacian, it computes a power of the normalized Laplacian. This iteration, justified by the analysis of clustering in terms of Markov chains, acts as a regularized version of the projection on the leading eigenvectors and yields an algorithm in which the number of clusters is determined automatically. We prove non-asymptotic bounds for the convergence of this spectral clustering algorithm applied to an i.i.d. sample of points drawn from a distribution with compact support in a Hilbert space; these bounds are deduced from the bounds obtained for the estimation of a Gram operator in a Hilbert space. We conclude with an overview of the interest of spectral clustering for image analysis.
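One plausible reading of the powered-Laplacian idea, sketched in Python under assumed choices (Gaussian kernel, hand-picked bandwidth, power and tolerance; the function name and parameters are illustrative, not from the thesis): points whose rows of the powered normalized similarity matrix nearly coincide are grouped together, so the number of clusters falls out of the data.

```python
import numpy as np

def iterated_laplacian_clustering(X, bandwidth=1.0, power=100, tol=0.1):
    """Sketch: cluster by powering the normalized similarity matrix instead of
    projecting on its top eigenvectors; points whose rows of the powered matrix
    point in nearly the same direction end up in the same cluster."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2 * bandwidth ** 2))        # Gaussian kernel Gram matrix
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))                     # normalized similarity D^{-1/2} W D^{-1/2}
    T = np.linalg.matrix_power(S, power)                # acts as a regularized spectral projection
    T = T / np.linalg.norm(T, axis=1, keepdims=True)    # compare directions of the rows
    labels, centers = -np.ones(len(X), dtype=int), []
    for i in range(len(X)):                             # greedy grouping of nearly identical rows
        for c, ref in enumerate(centers):
            if np.linalg.norm(T[i] - ref) < tol:
                labels[i] = c
                break
        else:
            centers.append(T[i])
            labels[i] = len(centers) - 1
    return labels

X = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 6.0])
print(iterated_laplacian_clustering(X))
```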
APA, Harvard, Vancouver, ISO, and other styles
5

Rakhlin, Alexander. "Applications of empirical processes in learning theory : algorithmic stability and generalization bounds." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/34564.

Full text
Abstract:
This thesis studies two key properties of learning algorithms: their generalization ability and their stability with respect to perturbations. To analyze these properties, we focus on concentration inequalities and tools from empirical process theory. We obtain theoretical results and demonstrate their applications to machine learning. First, we show how various notions of stability upper- and lower-bound the bias and variance of several estimators of the expected performance for general learning algorithms. A weak stability condition is shown to be equivalent to consistency of empirical risk minimization. The second part of the thesis derives tight performance guarantees for greedy error minimization methods - a family of computationally tractable algorithms. In particular, we derive risk bounds for a greedy mixture density estimation procedure. We prove that, unlike what is suggested in the literature, the number of terms in the mixture is not a bias-variance trade-off for the performance. The third part of this thesis provides a solution to an open problem regarding the stability of Empirical Risk Minimization (ERM). This algorithm is of central importance in Learning Theory.
By studying the suprema of the empirical process, we prove that ERM over Donsker classes of functions is stable in the L1 norm. Hence, as the number of samples grows, it becomes less and less likely that a perturbation of o(√n) samples will result in a very different empirical minimizer. Asymptotic rates of this stability are proved under metric entropy assumptions on the function class. Through the use of a ratio limit inequality, we also prove stability of expected errors of empirical minimizers. Next, we investigate applications of the stability result. In particular, we focus on procedures that optimize an objective function, such as k-means and other clustering methods. We demonstrate that stability of clustering, just like stability of ERM, is closely related to the geometry of the class and the underlying measure. Furthermore, our result on stability of ERM delineates a phase transition between stability and instability of clustering methods. In the last chapter, we prove a generalization of the bounded-difference concentration inequality for almost-everywhere smooth functions. This result can be utilized to analyze algorithms which are almost always stable. Next, we prove a phase transition in the concentration of almost-everywhere smooth functions. Finally, a tight concentration of empirical errors of empirical minimizers is shown under an assumption on the underlying space.
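For reference, the classical bounded-difference (McDiarmid) inequality that the last chapter generalizes to almost-everywhere smooth functions can be stated as follows.

```latex
% Classical bounded-difference (McDiarmid) inequality:
% if changing any single coordinate changes f by at most c_i, i.e.
\sup_{x_1,\dots,x_n,\,x_i'} \bigl| f(x_1,\dots,x_i,\dots,x_n) - f(x_1,\dots,x_i',\dots,x_n) \bigr| \le c_i \quad \text{for each } i,
% then for independent X_1,\dots,X_n and every t > 0,
\Pr\bigl( f(X_1,\dots,X_n) - \mathbb{E} f \ge t \bigr) \le \exp\!\left( -\frac{2 t^2}{\sum_{i=1}^n c_i^2} \right).
```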
APA, Harvard, Vancouver, ISO, and other styles
6

Nordenfors, Oskar. "A Literature Study Concerning Generalization Error Bounds for Neural Networks via Rademacher Complexity." Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-184487.

Full text
Abstract:
In this essay some fundamental results from the theory of machine learning and neural networks are presented, with the goal of ultimately discussing bounds on the generalization error of neural networks via Rademacher complexity.
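A small sketch of what computing an empirical Rademacher complexity looks like in practice, for the toy class of linear predictors with bounded Euclidean norm, where the supremum over the class has a closed form. The class, function name, and parameters are illustrative and not taken from the essay.

```python
import numpy as np

def empirical_rademacher_linear(X, B=1.0, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of
    {x -> <w, x> : ||w||_2 <= B}: for this class the supremum over w has the
    closed form (B / n) * ||sum_i sigma_i x_i||_2."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    values = []
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)   # Rademacher signs
        values.append(B * np.linalg.norm(X.T @ sigma) / n)
    return float(np.mean(values))

X = np.random.randn(200, 10)
print(empirical_rademacher_linear(X))  # shrinks roughly like B * sqrt(E||x||^2 / n)
```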
APA, Harvard, Vancouver, ISO, and other styles
7

Bellet, Aurélien. "Supervised metric learning with generalization guarantees." PhD thesis, Université Jean Monnet - Saint-Etienne, 2012. http://tel.archives-ouvertes.fr/tel-00770627.

Full text
Abstract:
In recent years, the crucial importance of metrics in machine learning algorithms has led to an increasing interest in optimizing distance and similarity functions using knowledge from training data to make them suitable for the problem at hand. This area of research is known as metric learning. Existing methods typically aim at optimizing the parameters of a given metric with respect to some local constraints over the training sample, and the learned metrics are generally used in nearest-neighbor and clustering algorithms. When data consist of feature vectors, a large body of work has focused on learning a Mahalanobis distance, which is parameterized by a positive semi-definite matrix; recent methods offer good scalability to large datasets. Less work has been devoted to metric learning from structured objects (such as strings or trees), because it often involves complex procedures. Most of that work has focused on optimizing a notion of edit distance, which measures (in terms of number of operations) the cost of turning one object into another. We identify two important limitations of current supervised metric learning approaches. First, they make it possible to improve the performance of local algorithms such as k-nearest neighbors, but metric learning for global algorithms (such as linear classifiers) has not really been studied so far. Second, and perhaps more importantly, the question of the generalization ability of metric learning methods has been largely ignored. In this thesis, we propose theoretical and algorithmic contributions that address these limitations. Our first contribution is the derivation of a new kernel function built from learned edit probabilities; unlike other string kernels, it is guaranteed to be valid and parameter-free. Our second contribution is a novel framework for learning string and tree edit similarities, inspired by the recent theory of (epsilon, gamma, tau)-good similarity functions and formulated as a convex optimization problem. Using uniform stability arguments, we establish theoretical guarantees for the learned similarity that give a bound on the generalization error of a linear classifier built from that similarity. In our third contribution, we extend the same ideas to metric learning from feature vectors by proposing a bilinear similarity learning method that efficiently optimizes the (epsilon, gamma, tau)-goodness. The similarity is learned based on global constraints that are more appropriate to linear classification. Generalization guarantees are derived for our approach, highlighting that our method minimizes a tighter bound on the generalization error of the classifier. Our last contribution is a framework for establishing generalization bounds for a large class of existing metric learning algorithms. It is based on a simple adaptation of the notion of algorithmic robustness and allows the derivation of bounds for various loss functions and regularizers.
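As a loose illustration of bilinear similarity learning, here is a deliberately simplified sketch: a pairwise hinge loss with a Frobenius regularizer optimized by stochastic gradient steps. It does not reproduce the thesis's (epsilon, gamma, tau)-goodness objective or its global constraints; the function name and parameters are hypothetical.

```python
import numpy as np

def learn_bilinear_similarity(X, y, lam=0.1, lr=0.01, epochs=1000, seed=0):
    """Simplified sketch: learn a bilinear similarity K_M(x, x') = x^T M x'
    so that same-label pairs score >= 1 and different-label pairs score <= -1,
    via a hinge loss plus a Frobenius regularizer."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    M = np.eye(d)
    for _ in range(epochs):
        i, j = rng.integers(n), rng.integers(n)
        target = 1.0 if y[i] == y[j] else -1.0
        margin = target * (X[i] @ M @ X[j])
        grad = lam * M                           # gradient of the regularizer
        if margin < 1.0:                         # hinge loss is active
            grad -= target * np.outer(X[i], X[j])
        M -= lr * grad
    return M

X = np.random.randn(100, 5)
y = np.random.randint(0, 2, 100)
print(learn_bilinear_similarity(X, y).shape)
```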
APA, Harvard, Vancouver, ISO, and other styles
8

Musayeva, Khadija. "Generalization Performance of Margin Multi-category Classifiers." Thesis, Université de Lorraine, 2019. http://www.theses.fr/2019LORR0096/document.

Full text
Abstract:
This thesis deals with the theory of margin multi-category classification and is based on the statistical learning theory founded by Vapnik and Chervonenkis. We are interested in deriving generalization bounds with explicit dependencies on the number C of categories, the sample size m, and the margin parameter gamma, when the loss function considered is a Lipschitz continuous margin loss function. Generalization bounds rely on the empirical performance of the classifier as well as on its "capacity". In this work, the following scale-sensitive capacity measures are considered: the Rademacher complexity, the covering numbers, and the fat-shattering dimension. Our main contributions are obtained under the assumption that the classes of component functions implemented by the classifier have polynomially growing fat-shattering dimensions and that the component functions are independent. In the context of the pathway of Mendelson, which relates the Rademacher complexity to the covering numbers and the latter to the fat-shattering dimension, we study the impact that decomposing at the level of one of these capacity measures has on the dependencies on C, m, and gamma. In particular, we demonstrate that the dependency on C can be substantially improved over the state of the art if the decomposition is postponed to the level of the metric entropy or the fat-shattering dimension. On the other hand, this negatively impacts the rate of convergence (the dependency on m), an indication that optimizing the dependencies on the three basic parameters amounts to looking for a trade-off.
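For orientation, the generic single-output margin bound that such multi-category analyses refine with an explicit dependency on C reads, with probability at least 1 − δ over a sample of size m:

```latex
% Generic margin-based Rademacher bound (binary template; margin
% multi-category analyses refine it with explicit dependencies on C):
R(f) \;\le\; \widehat{R}_{\gamma}(f) \;+\; \frac{2}{\gamma}\, \mathfrak{R}_m(\mathcal{F}) \;+\; \sqrt{\frac{\ln(1/\delta)}{2m}}
```

Here \widehat{R}_{\gamma}(f) is the empirical margin risk and \mathfrak{R}_m(\mathcal{F}) the Rademacher complexity of the class.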
APA, Harvard, Vancouver, ISO, and other styles
9

Philips, Petra Camilla. "Data-Dependent Analysis of Learning Algorithms." The Australian National University. Research School of Information Sciences and Engineering, 2005. http://thesis.anu.edu.au./public/adt-ANU20050901.204523.

Full text
Abstract:
This thesis studies the generalization ability of machine learning algorithms in a statistical setting. It focuses on the data-dependent analysis of the generalization performance of learning algorithms in order to make full use of the potential of the actual training sample from which these algorithms learn.

First, we propose an extension of the standard framework for the derivation of generalization bounds for algorithms taking their hypotheses from random classes of functions. This approach is motivated by the fact that the function produced by a learning algorithm based on a random sample of data depends on this sample and is therefore a random function. Such an approach avoids the detour of the worst-case uniform bounds as done in the standard approach. We show that the mechanism which allows one to obtain generalization bounds for random classes in our framework is based on a "small complexity" of certain random coordinate projections. We demonstrate how this notion of complexity relates to learnability and how one can explore geometric properties of these projections in order to derive estimates of rates of convergence and good confidence interval estimates for the expected risk. We then demonstrate the generality of our new approach by presenting a range of examples, among them the algorithm-dependent compression schemes and the data-dependent luckiness frameworks, which fall into our random subclass framework.

Second, we study in more detail generalization bounds for a specific algorithm which is of central importance in learning theory, namely the Empirical Risk Minimization algorithm (ERM). Recent results show that one can significantly improve the high-probability estimates for the convergence rates for empirical minimizers by a direct analysis of the ERM algorithm. These results are based on a new localized notion of complexity of subsets of hypothesis functions with identical expected errors and are therefore dependent on the underlying unknown distribution. We investigate the extent to which one can estimate these high-probability convergence rates in a data-dependent manner. We provide an algorithm which computes a data-dependent upper bound for the expected error of empirical minimizers in terms of the "complexity" of data-dependent local subsets. These subsets are sets of functions of empirical errors of a given range and can be determined based solely on empirical data. We then show that recent direct estimates, which are essentially sharp estimates on the high-probability convergence rate for the ERM algorithm, cannot be recovered universally from empirical data.
APA, Harvard, Vancouver, ISO, and other styles
10

Katsikarelis, Ioannis. "Structurally Parameterized Tight Bounds and Approximation for Generalizations of Independence and Domination." Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLED048.

Full text
Abstract:
In this thesis we focus on the NP-hard problems (k, r)-CENTER and d-SCATTERED SET, which generalize the well-studied concepts of domination and independence over larger distances. In the first part we maintain a parameterized viewpoint and examine the standard parameterization as well as the most widely used graph parameters measuring the input's structure. We offer hardness results showing that there is no algorithm with running time below certain bounds, subject to the (Strong) Exponential Time Hypothesis; we produce essentially optimal algorithms whose complexity matches these lower bounds; and we further attempt to offer an alternative to exact computation in significantly reduced running time by way of approximation algorithms. In the second part we consider the (super-)polynomial (in-)approximability of the d-SCATTERED SET problem, i.e. we determine the exact relationship between an achievable approximation ratio ρ, the distance parameter d, and the running time of any ρ-approximation algorithm, expressed as a function of the above and of the size of the input n. We then consider strictly polynomial running times and improve our understanding of the approximability characteristics of the problem on graphs of bounded maximum degree as well as on bipartite graphs.
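To make the object concrete, here is a short Python check of the defining property of a d-scattered set; the helper name and adjacency-list format are assumed for illustration.

```python
from collections import deque

def is_d_scattered(adj, subset, d):
    """Check the defining property of a d-SCATTERED SET: all selected vertices
    are pairwise at graph distance >= d (d = 2 recovers an ordinary independent
    set). `adj` is an adjacency list {vertex: list of neighbours}."""
    selected = set(subset)
    for s in subset:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if dist[u] >= d - 1:          # no need to explore beyond distance d - 1
                continue
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    if v in selected:     # another selected vertex closer than d
                        return False
                    queue.append(v)
    return True

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(is_d_scattered(path, [0, 4], d=3))  # True: distance 4 >= 3
print(is_d_scattered(path, [0, 2], d=3))  # False: distance 2 < 3
```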
APA, Harvard, Vancouver, ISO, and other styles
11

Fukihara, Yoji. "Generalization of Bounded Linear Logic and its Categorical Semantics." Doctoral thesis, Kyoto University, 2021. http://hdl.handle.net/2433/263441.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Schweighofer, Markus. "Iterated rings of bounded elements and generalizations of Schmüdgen's theorem." [S.l. : s.n.], 2002. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB9911683.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Peel, Thomas. "Algorithmes de poursuite stochastiques et inégalités de concentration empiriques pour l'apprentissage statistique." Thesis, Aix-Marseille, 2013. http://www.theses.fr/2013AIXM4769/document.

Full text
Abstract:
The first part of this thesis introduces new algorithms for the sparse decomposition of signals. Based on Matching Pursuit (MP), they address the following problem: how to reduce the computation time of the selection step of MP, which is often very costly. As an answer, we sub-sample the dictionary at each iteration, in both rows and columns. We show that this theoretically grounded approach has good empirical performance. We then propose a block coordinate gradient descent algorithm for feature selection in the multiclass classification setting. Thanks to the use of error-correcting output codes, this task can be cast as a simultaneous sparse encoding of signals problem. The second part presents new empirical Bernstein-type concentration inequalities. First, they concern the theory of U-statistics and are used to derive generalization bounds for ranking algorithms. These bounds take advantage of a variance estimator, for which we propose an efficient computation algorithm. Then, we present an empirical version of the Bernstein-type inequality for martingales of Freedman [1975]. Here again, the strength of our result lies in the variance estimator computable from the data. This allows us to propose generalization bounds for online learning algorithms which improve the state of the art and pave the way to a new family of learning algorithms taking advantage of this empirical information.
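A minimal sketch of the sub-sampling idea applied to Matching Pursuit, assuming unit-norm atoms and showing only column sub-sampling; the function name and parameters are illustrative, not the thesis's algorithm.

```python
import numpy as np

def subsampled_matching_pursuit(D, x, n_iter=20, col_frac=0.25, seed=0):
    """Sketch of Matching Pursuit where, at each iteration, the expensive
    atom-selection step only scans a random subset of the dictionary columns."""
    rng = np.random.default_rng(seed)
    n_atoms = D.shape[1]
    coeffs = np.zeros(n_atoms)
    residual = x.copy()
    for _ in range(n_iter):
        cols = rng.choice(n_atoms, size=max(1, int(col_frac * n_atoms)), replace=False)
        correlations = D[:, cols].T @ residual            # selection restricted to the sub-dictionary
        best = cols[np.argmax(np.abs(correlations))]
        step = D[:, best] @ residual                      # atoms assumed unit-norm
        coeffs[best] += step
        residual -= step * D[:, best]
    return coeffs, residual

rng = np.random.default_rng(1)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)                            # normalize the atoms
x = 2.0 * D[:, 3] - 1.5 * D[:, 100]
coeffs, res = subsampled_matching_pursuit(D, x)
print(np.linalg.norm(res))                                # residual should shrink over iterations
```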
APA, Harvard, Vancouver, ISO, and other styles
14

Polat, Faruk. "On The Generalizations And Properties Of Abramovich-wickstead Spaces." PhD thesis, METU, 2008. http://etd.lib.metu.edu.tr/upload/12610166/index.pdf.

Full text
Abstract:
In this thesis, we study two problems. The first problem is to introduce the general version of Abramovich-Wickstead type spaces and investigate its order properties. In particular, we study the ideals, order bounded sets, disjointness properties, Dedekind completion and the norm properties of this Riesz space. We also define a new concrete example of Riesz space-valued uniformly continuous functions, denoted by CDr0 which generalizes the original Abramovich-Wickstead space. It is also shown that similar spaces CD0 and CDw introduced earlier by Alpay and Ercan are decomposable lattice-normed spaces. The second problem is related to analytic representations of different classes of dominated operators on these spaces. Our main representation theorems say that regular linear operators on CDr0 or linear dominated operators on CD0 may be represented as the sum of integration with respect to operator-valued measure and summation operation. In the case when the operator is order continuous or bo-continuous, then these representations reduce to discrete parts.
APA, Harvard, Vancouver, ISO, and other styles
15

Chinot, Geoffrey. "Localization methods with applications to robust learning and interpolation." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAG002.

Full text
Abstract:
This PhD thesis deals with supervised machine learning and statistics. The main goal is to use localization techniques to derive fast rates of convergence, with a particular focus on robust learning and interpolation problems. Localization methods analyze localized properties of an estimator to obtain fast rates of convergence, that is, rates of order O(1/n), where n is the number of observations. Under assumptions such as the Bernstein condition, such rates are attainable. A robust estimator is an estimator with good theoretical guarantees under as few assumptions as possible. This question is becoming more and more popular in the current era of big data: large datasets are very likely to be corrupted, and one would like to build reliable estimators in such a setting. We show that the well-known regularized empirical risk minimizer (RERM) with a Lipschitz loss function is robust with respect to heavy-tailed noise and to outliers in the labels. When the class of predictors is heavy-tailed, RERM is not reliable; in this setting, we show that minmax Median-of-Means (minmax-MOM) estimators can be a solution. By construction, minmax-MOM estimators are also robust to adversarial contamination. Interpolation problems study learning procedures with zero training error. Surprisingly, in large dimension, interpolating the data does not necessarily imply over-fitting. We study a high-dimensional Gaussian linear model and show that over-fitting may sometimes be benign: the minimum-norm interpolating estimator is consistent and even achieves fast rates.
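The elementary building block behind MOM-type procedures, sketched in Python: median-of-means estimation of a mean. The minmax-MOM estimators of the thesis are built on the same blocking-and-median principle but are not reproduced here.

```python
import numpy as np

def median_of_means(x, n_blocks=10, seed=0):
    """Median-of-means estimate of a mean: split the sample into blocks, average
    each block, and return the median of the block means. Unlike the plain
    average, it keeps working under heavy tails and a few corrupted observations."""
    rng = np.random.default_rng(seed)
    x = rng.permutation(np.asarray(x, dtype=float))
    blocks = np.array_split(x, n_blocks)
    return float(np.median([b.mean() for b in blocks]))

rng = np.random.default_rng(42)
sample = rng.standard_t(df=2, size=1000)   # heavy-tailed data
sample[:5] = 1e6                            # a handful of gross outliers
print(np.mean(sample), median_of_means(sample))
```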
APA, Harvard, Vancouver, ISO, and other styles
16

Cherief-Abdellatif, Badr-Eddine. "Contributions to the theoretical study of variational inference and robustness." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAG001.

Full text
Abstract:
This PhD thesis deals with variational inference and robustness. More precisely, it focuses on the statistical properties of variational approximations and on the design of efficient algorithms for computing them in an online fashion, and it investigates Maximum Mean Discrepancy (MMD) based estimators as learning rules that are robust to model misspecification. In recent years, variational inference has been extensively studied from the computational viewpoint, but until very recently little attention had been paid in the literature to its theoretical properties. In this thesis, we investigate the consistency of variational approximations in various statistical models and the conditions that ensure it. In particular, we tackle the special cases of mixture models and deep neural networks. We also justify in theory the use of the ELBO maximization strategy, a model selection criterion that is widely used in the variational Bayes community and is known to work well in practice. Moreover, Bayesian inference provides an attractive online-learning framework for analyzing sequential data, and it offers generalization guarantees which hold even under model mismatch and with adversaries. Unfortunately, exact Bayesian inference is rarely feasible in practice and approximation methods are usually employed, but do such methods preserve the generalization properties of Bayesian inference? In this thesis, we show that this is indeed the case for some variational inference algorithms. We propose new online, tempered variational algorithms and derive their generalization bounds. Our theoretical result relies on the convexity of the variational objective, but we argue that it should hold more generally and present empirical evidence in support of this. Our work presents theoretical justifications in favor of online algorithms that rely on approximate Bayesian methods. Another point addressed in this thesis is the design of a universal estimation procedure. This question is of major interest, in particular because it leads to robust estimators, a very active topic in statistics and machine learning. We tackle the problem of universal estimation using a minimum distance estimator based on the Maximum Mean Discrepancy. We show that the estimator is robust both to dependence and to the presence of outliers in the dataset. We also highlight the connections that may exist with minimum distance estimators using the L2-distance. Finally, we provide a theoretical study of the stochastic gradient descent algorithm used to compute the estimator, and we support our findings with numerical simulations. We also propose a Bayesian version of our estimator, which we study from both theoretical and computational points of view.
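A small sketch of the criterion behind an MMD-based minimum distance estimator, under assumed choices (Gaussian kernel, biased V-statistic estimate, a toy Gaussian location model); it is illustrative rather than the thesis's exact procedure.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    d2 = np.sum(X ** 2, 1)[:, None] + np.sum(Y ** 2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * bandwidth ** 2))

def mmd_squared(X, Y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared Maximum Mean Discrepancy
    between samples X and Y with a Gaussian kernel. An MMD-based minimum
    distance estimator picks the parameter whose simulated sample makes this
    quantity smallest."""
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    return float(Kxx.mean() + Kyy.mean() - 2 * Kxy.mean())

rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=1.0, size=(500, 1))
candidates = np.linspace(-1.0, 3.0, 21)                         # hypothetical location parameters
scores = [mmd_squared(data, rng.normal(mu, 1.0, size=(500, 1))) for mu in candidates]
print(candidates[int(np.argmin(scores))])                       # should be close to the true location 1.0
```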
APA, Harvard, Vancouver, ISO, and other styles
17

Bello Medina, Kevin Segundo. "Structured Prediction: Statistical and Computational Guarantees in Learning and Inference." Thesis, 2021.

Find full text
Abstract:
Structured prediction consists of receiving a structured input and producing a combinatorial structure such as trees, clusters, networks, sequences, permutations, among others. From the computational viewpoint, structured prediction is in general considered intractable because of the size of the output space being exponential in the input size. For instance, in image segmentation tasks, the number of admissible segments is exponential in the number of pixels. A second factor is the combination of the input dimensionality along with the amount of data under availability. In structured prediction it is common to have the input live in a high-dimensional space, which involves to jointly reason about thousands or millions of variables, and at the same time contend with limited amount of data. Thus, learning and inference methods with strong computational and statistical guarantees are desired. The focus of our research is then to propose principled methods for structured prediction that are both polynomial time, i.e., computationally efficient, and require a polynomial number of data samples, i.e., statistically efficient.

The main contributions of this thesis are as follows:

(i) We develop an efficient and principled learning method of latent variable models for structured prediction under Gaussian perturbations. We derive a Rademacher-based generalization bound and argue that the use of non-convex formulations in learning latent-variable models leads to tighter bounds of the Gibbs decoder distortion.

(ii) We study the fundamental limits of structured prediction, i.e., we characterize the necessary sample complexity for learning factor graph models in the context of structured prediction. In particular, we show that the finiteness of our novel MaxPair-dimension is necessary for learning. Lastly, we show a connection between the MaxPair-dimension and the VC-dimension---which allows for using existing results on VC-dimension to calculate the MaxPair-dimension.

(iii) We analyze a generative model based on connected graphs, and find the structural conditions of the graph that allow for the exact recovery of the node labels. In particular, we show that exact recovery is realizable in polynomial time for a large class of graphs. Our analysis is based on convex relaxations, where we thoroughly analyze a semidefinite program and a degree-4 sum-of-squares program. Finally, we extend this model to consider linear constraints (e.g., fairness), and formally explain the effect of the added constraints on the probability of exact recovery.

APA, Harvard, Vancouver, ISO, and other styles
18

Philips, Petra. "Data-Dependent Analysis of Learning Algorithms." Phd thesis, 2005. http://hdl.handle.net/1885/47998.

Full text
Abstract:
This thesis studies the generalization ability of machine learning algorithms in a statistical setting. It focuses on the data-dependent analysis of the generalization performance of learning algorithms in order to make full use of the potential of the actual training sample from which these algorithms learn.

First, we propose an extension of the standard framework for the derivation of generalization bounds for algorithms taking their hypotheses from random classes of functions. ...

Second, we study in more detail generalization bounds for a specific algorithm which is of central importance in learning theory, namely the Empirical Risk Minimization algorithm (ERM). ...
APA, Harvard, Vancouver, ISO, and other styles
19

Makareh, Shireh Miad. "Topics in the Notion of Amenability and its Generalizations for Banach Algebras." 2010. http://hdl.handle.net/1993/4197.

Full text
Abstract:
This thesis has two parts. The first part deals with some questions in amenability. We show that for a Banach algebra A with a bounded approximate identity, the amenability of the projective tensor product of A with itself, the amenability of the projective tensor product of A with A^op, and the amenability of A are equivalent. Also, if A is a closed ideal in a commutative Banach algebra B, then the (weak) amenability of the projective tensor product of A and B implies the (weak) amenability of A. Finally, we show that if the Banach algebra A is amenable through a multiplication π, then A is also amenable through any multiplication ρ such that the norm of π − ρ is less than 1/11. The second part deals with questions in generalized notions of amenability such as approximate amenability and bounded approximate amenability. First we prove some new results about approximately amenable Banach algebras. Then we state a characterization of approximately amenable Banach algebras and a characterization of boundedly approximately amenable Banach algebras. Finally, we prove that B(l^p(E)) is not approximately amenable for Banach spaces E with certain properties. As a corollary of this part, we give a new proof that B(l^2) is not approximately amenable.
APA, Harvard, Vancouver, ISO, and other styles
20

Schweighofer, Markus [Verfasser]. "Iterated rings of bounded elements and generalizations of Schmüdgen's theorem / Markus Schweighofer." 2002. http://d-nb.info/964451824/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Nasre, Meghana. "Generalizations Of The Popular Matching Problem." Thesis, 2011. http://etd.iisc.ernet.in/handle/2005/2095.

Full text
Abstract:
Matching problems arise in several real-world scenarios like assigning posts to applicants, houses to trainees and room-mates to one another. In this thesis we consider the bipartite matching problem where one side of the bipartition specifies preferences over the other side. That is, we are given a bipartite graph G = (A ∪ P,E) where A denotes the set of applicants, P denotes the set of posts, and the preferences of applicants are specified by ranks on the edges. Several notions of optimality like pareto-optimality, rank-maximality, popularity have been studied in the literature; we focus on the notion of popularity. A matching M is more popular than another matching M′ if the number of applicants that prefer M to M′ exceeds the number of applicants that prefer M′ to M. A matching M is said to be popular if there exists no matching that is more popular than M. Popular matchings have the desirable property that no applicant majority can force a migration to another matching. However, popular matchings do not provide a complete answer since there exist simple instances that do not admit any popular matching. Abraham et al. (SICOMP 2007) characterized instances that admit a popular matching and also gave efficient algorithms to find one when it exists. We present several generalizations of the popular matchings problem in this thesis. Majority of our work deals with instances that do not admit any popular matching. We propose three different solution concepts for such instances. A reasonable solution when an instance does not admit a popular matching is to output a matching that is least unpopular amongst the set of unpopular matchings. McCutchen (LATIN 2008) introduced and studied measures of unpopularity, namely the unpopularity factor and unpopularity margin. He proved that computing either a least unpopularity factor matching or a least unpopularity margin matching is NP-hard. We build upon this work and design an O(km√n) time algorithm which produces matchings with bounded unpopularity provided a certain subgraph of G admits an A-complete matching (a matching that matches all the applicants). Here n and m denote the number of vertices and the number of edges in G respectively, and k, which is bounded by |A|, is the number of iterations taken by our algorithm to terminate. We also show that if a certain subgraph of G admits an A-complete matching, then we have computed a matching with the least unpopularity factor. Another feasible solution for instances without any popular matching is to output a mixed matching that is popular. A mixed matching is simply a probability distribution over the set of matchings. A mixed matching Q is popular if no mixed matching is more popular than Q. We seek to answer the existence and computation of popular mixed matchings in a given instance G. We begin with a linear programming formulation to compute a mixed matching with the least unpopularity margin. We show that although the linear program has exponentially many constraints, we have a polynomial time separation oracle and hence a least unpopularity margin mixed matching can be computed in polynomial time. By casting the popular mixed matchings problem as a two player zero-sum game, it is possible to prove that every instance of the popular matchings problem admits a popular mixed matching. Therefore, the matching returned by our linear program is indeed a popular mixed matching. Finally, we propose augmentation of the input graph for instances that do not admit any popular matching. 
Assume that we are dealing with a set of items B (say, DVDs/books) instead of posts and it is possible to make duplicates of these items. Our goal is to make duplicates of appropriate items such that the augmented graph admits a popular matching. However, since allowing arbitrarily many copies for items is not feasible in practice, we impose restrictions in two forms: (i) associating costs with items, and (ii) bounding the number of copies. In the first case, we assume that we pay a price of cost(b) for every extra copy of b that we make; the first copy is assumed to be given to us for free. The total cost of the augmented instance is the sum of costs of all the extra copies that we make. Our goal is to compute a minimum cost augmented instance which admits a popular matching. In the second case, along with the input graph G = (A ∪ B, E), we are given a vector ⟨c1, c2, . . . , c|B|⟩ denoting upper bounds on the number of copies of every item. We seek to answer whether there exists a vector ⟨x1, x2, . . . , x|B|⟩ such that having xi copies of item bi, where 1 ≤ xi ≤ ci, enables the augmented graph to admit a popular matching. We prove that several problems under both these models turn out to be NP-hard; in fact they remain NP-hard even under severe restrictions on the preference lists. Our final results deal with instances that admit popular matchings. When the input instance admits a popular matching, there may be several popular matchings; in fact there may be several maximum cardinality popular matchings. Hence one may not be content with returning any maximum cardinality popular matching and may instead ask for an optimal popular matching. Assuming that the notion of optimality is specified as a part of the problem, we present an O(m + n1²) time algorithm for computing an optimal popular matching in G. Here m denotes the number of edges in G and n1 denotes the number of applicants. We also consider the problem of computing a minimum cost popular matching where every post p has an associated price cost(p) and capacity cap(p). A post with capacity cap(p) can be matched with up to cap(p) many applicants. We present an O(m·n1) time algorithm to compute a minimum cost popular matching in such instances. We believe that the work provides interesting insights into the popular matchings problem and its variants.
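A direct transcription of the popularity comparison described above into Python (the data format, ranked preference lists and applicant-to-post dictionaries, is assumed for illustration).

```python
def more_popular(pref, M1, M2):
    """Compare two matchings by the popularity rule: count the applicants that
    strictly prefer their partner in M1 to their partner in M2 and vice versa.
    Being matched beats being unmatched; a lower rank in pref[a] is better.
    `pref[a]` is applicant a's ranked list of posts; M1, M2 map applicants to posts."""
    def rank(a, matching):
        post = matching.get(a)
        return pref[a].index(post) if post in pref[a] else len(pref[a])  # unmatched = worst
    votes_m1 = sum(1 for a in pref if rank(a, M1) < rank(a, M2))
    votes_m2 = sum(1 for a in pref if rank(a, M2) < rank(a, M1))
    if votes_m1 > votes_m2:
        return "M1", votes_m1 - votes_m2
    if votes_m2 > votes_m1:
        return "M2", votes_m2 - votes_m1
    return "tie", 0

pref = {"a1": ["p1", "p2"], "a2": ["p1"], "a3": ["p2", "p1"]}
M1 = {"a1": "p2", "a2": "p1"}
M2 = {"a1": "p1", "a3": "p2"}
print(more_popular(pref, M1, M2))  # M2 is preferred by a1 and a3, M1 only by a2
```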
APA, Harvard, Vancouver, ISO, and other styles