Dissertations / Theses on the topic 'Generalization bound'
Consult the top 15 dissertations / theses for your research on the topic 'Generalization bound.'
McDonald, Daniel J. "Generalization Error Bounds for Time Series." Research Showcase @ CMU, 2012. http://repository.cmu.edu/dissertations/184.
Kroon, Rodney Stephen. "Support vector machines, generalization bounds, and transduction." Thesis, Stellenbosch: University of Stellenbosch, 2003. http://hdl.handle.net/10019.1/16375.
Kelby, Robin J. "Formalized Generalization Bounds for Perceptron-Like Algorithms." Ohio University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1594805966855804.
Giulini, Ilaria. "Generalization bounds for random samples in Hilbert spaces." Thesis, Paris, Ecole normale supérieure, 2015. http://www.theses.fr/2015ENSU0026/document.
This thesis focuses on obtaining generalization bounds for random samples in reproducing kernel Hilbert spaces. The approach consists in first obtaining non-asymptotic, dimension-free bounds in finite-dimensional spaces using PAC-Bayesian inequalities related to Gaussian perturbations, and then in generalizing the results to a separable Hilbert space. We first investigate the question of estimating the Gram operator by a robust estimator from an i.i.d. sample, and we present uniform bounds that hold under weak moment assumptions. These results allow us to qualify principal component analysis independently of the dimension of the ambient space and to propose stable versions of it. In the last part of the thesis we present a new algorithm for spectral clustering. It consists in replacing the projection on the eigenvectors associated with the largest eigenvalues of the Laplacian matrix by a power of the normalized Laplacian. This iteration, justified by the analysis of clustering in terms of Markov chains, performs a smooth truncation. We prove non-asymptotic bounds for the convergence of our spectral clustering algorithm applied to a random sample of points in a Hilbert space; these bounds are deduced from the bounds for the Gram operator in a Hilbert space. Experiments are carried out in the context of image analysis.
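To make the clustering iteration concrete, the following is a minimal numpy sketch of spectral clustering by operator powers rather than eigenvector projection. The Gaussian affinity, the exponent t and the width sigma are illustrative assumptions rather than the choices analysed in the thesis, and the power is applied to the normalized affinity D^{-1/2} W D^{-1/2} as a stand-in for the normalized-Laplacian iteration described above.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def spectral_clustering_power(X, n_clusters, t=20, sigma=1.0):
    """Cluster the rows of X using a power of the normalized affinity
    operator (a smooth spectral truncation) instead of projecting on
    the top eigenvectors. Illustrative sketch only."""
    # Gaussian (RBF) affinity matrix
    W = np.exp(-cdist(X, X, "sqeuclidean") / (2.0 * sigma**2))
    d = W.sum(axis=1)
    # Symmetrically normalized operator D^{-1/2} W D^{-1/2}
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    M = D_inv_sqrt @ W @ D_inv_sqrt
    # Taking a power shrinks the small eigenvalues smoothly
    Mt = np.linalg.matrix_power(M, t)
    # Normalize the rows and cluster them with k-means
    rows = Mt / (np.linalg.norm(Mt, axis=1, keepdims=True) + 1e-12)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(rows)
```

Raising the operator to the power t damps the small eigenvalues continuously, which plays the role of the "smooth truncation" mentioned above, in place of the hard cut-off of keeping only the leading eigenvectors.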
Wade, Modou. "Apprentissage profond pour les processus faiblement dépendants." Electronic Thesis or Diss., CY Cergy Paris Université, 2024. http://www.theses.fr/2024CYUN1299.
This thesis focuses on deep learning for weakly dependent processes. We consider a class of deep neural network estimators with sparsity regularisation and/or penalty regularisation. Chapter 1 is a summary of the work: it presents the deep learning framework and reviews the main results obtained in chapters 2, 3, 4, 5 and 6. Chapter 2 considers deep learning for psi-weakly dependent processes. We establish the convergence rate of the empirical risk minimization (ERM) algorithm on the class of deep neural network (DNN) estimators. For these estimators, we provide a generalization bound and obtain an asymptotic learning rate of order O(n^{-1/alpha}) for all alpha > 2. A bound on the excess risk for a large class of target predictors is also established. Chapter 3 presents the sparse-penalized deep neural network estimator under weak dependence. We consider nonparametric regression and classification problems for weakly dependent processes and use a method of regularization by penalization. For nonparametric regression and binary classification, we establish an oracle inequality for the excess risk of the sparse-penalized deep neural network (SPDNN) estimator, and we also provide a convergence rate for these estimators. Chapter 4 focuses on the penalized deep neural network estimator with a general loss function under weak dependence. We consider the psi-weak dependence structure and, in the specific case where the observations are bounded, the theta_{infty}-weak dependence. For learning psi- and theta_{infty}-weakly dependent processes, we establish an oracle inequality for the excess risk of the sparse-penalized deep neural network estimator and show that when the target function is sufficiently smooth, the convergence rate of this excess risk is close to O(n^{-1/3}). Chapter 5 presents robust deep learning from weakly dependent data. We assume that the output variable has finite r-th moments, with r >= 1. For learning strongly mixing and psi-weakly dependent processes, a non-asymptotic bound for the expected excess risk of the deep neural network estimator is established. We show that when the target function belongs to the class of Hölder smooth functions, the convergence rate of the expected excess risk for exponentially strongly mixing data is close to or equal to that obtained with an independent and identically distributed sample. Chapter 6 focuses on deep learning for strongly mixing observations with sparse-penalized regularization and minimax optimality. We provide an oracle inequality and a bound over the class of Hölder smooth functions for the expected excess risk of the deep neural network estimator. We also consider the problem of nonparametric regression from strongly mixing data with sub-exponential noise. When the target function belongs to the class of Hölder composition functions, we establish an upper bound for the oracle inequality of the L_2 error. In the specific case of autoregressive regression with standard Laplace or normal errors, we provide a lower bound for the L_2 error in this class, which matches the upper bound up to a logarithmic factor; thus the deep neural network estimator achieves the optimal convergence rate.
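For orientation, the sparse-penalized estimators discussed in chapters 3, 4 and 6 are of the generic penalized empirical-risk-minimization form sketched below; the exact penalty, network class and the modifications used for robustness in chapter 5 may differ from this schematic form.

```latex
\widehat{f}_n \;\in\; \operatorname*{arg\,min}_{f_\theta \in \mathcal{F}_{\mathrm{DNN}}}
\;\frac{1}{n}\sum_{i=1}^{n} \ell\bigl(f_\theta(X_i), Y_i\bigr) \;+\; \lambda_n\, J(\theta),
\qquad J(\theta) \in \bigl\{\|\theta\|_0,\ \|\theta\|_1\bigr\},
```

where the observations (X_i, Y_i) come from the weakly dependent process and the quantity whose rate is bounded is the (expected) excess risk of the estimator over the best predictor in the target class.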
Rakhlin, Alexander. "Applications of empirical processes in learning theory : algorithmic stability and generalization bounds." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/34564.
This thesis studies two key properties of learning algorithms: their generalization ability and their stability with respect to perturbations. To analyze these properties, we focus on concentration inequalities and tools from empirical process theory. We obtain theoretical results and demonstrate their applications to machine learning. First, we show how various notions of stability upper- and lower-bound the bias and variance of several estimators of the expected performance for general learning algorithms. A weak stability condition is shown to be equivalent to consistency of empirical risk minimization. The second part of the thesis derives tight performance guarantees for greedy error minimization methods - a family of computationally tractable algorithms. In particular, we derive risk bounds for a greedy mixture density estimation procedure. We prove that, unlike what is suggested in the literature, the number of terms in the mixture is not a bias-variance trade-off for the performance. The third part of this thesis provides a solution to an open problem regarding the stability of Empirical Risk Minimization (ERM). This algorithm is of central importance in Learning Theory.
By studying the suprema of the empirical process, we prove that ERM over Donsker classes of functions is stable in the L1 norm. Hence, as the number of samples grows, it becomes less and less likely that a perturbation of o(√n) samples will result in a very different empirical minimizer. Asymptotic rates of this stability are proved under metric entropy assumptions on the function class. Through the use of a ratio limit inequality, we also prove stability of expected errors of empirical minimizers. Next, we investigate applications of the stability result. In particular, we focus on procedures that optimize an objective function, such as k-means and other clustering methods. We demonstrate that stability of clustering, just like stability of ERM, is closely related to the geometry of the class and the underlying measure. Furthermore, our result on stability of ERM delineates a phase transition between stability and instability of clustering methods. In the last chapter, we prove a generalization of the bounded-difference concentration inequality for almost-everywhere smooth functions. This result can be utilized to analyze algorithms which are almost always stable. Next, we prove a phase transition in the concentration of almost-everywhere smooth functions. Finally, a tight concentration of empirical errors of empirical minimizers is shown under an assumption on the underlying space.
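For reference, the classical bounded-difference (McDiarmid) concentration inequality that the last chapter generalizes to almost-everywhere smooth functions is the following standard statement:

```latex
\text{If } \bigl|g(x_1,\dots,x_i,\dots,x_n) - g(x_1,\dots,x_i',\dots,x_n)\bigr| \le c_i
\ \text{ for all } i \text{ and all arguments, then for independent } X_1,\dots,X_n,
\Pr\Bigl(g(X_1,\dots,X_n) - \mathbb{E}\,g(X_1,\dots,X_n) \ge t\Bigr)
\;\le\; \exp\!\Bigl(-\frac{2t^2}{\sum_{i=1}^{n} c_i^2}\Bigr).
```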
Bellet, Aurélien. "Supervised metric learning with generalization guarantees." PhD thesis, Université Jean Monnet - Saint-Etienne, 2012. http://tel.archives-ouvertes.fr/tel-00770627.
Full textNordenfors, Oskar. "A Literature Study Concerning Generalization Error Bounds for Neural Networks via Rademacher Complexity." Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-184487.
This thesis presents some fundamental results from the theory of machine learning and neural networks, with the goal of ultimately discussing upper bounds on the generalization error of neural networks via Rademacher complexity.
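As background for the bounds surveyed, the standard Rademacher-complexity generalization bound takes the following form for a loss class \mathcal{F} with values in [0, 1] and an i.i.d. sample of size n (constants vary slightly between statements in the literature):

```latex
\text{With probability at least } 1-\delta: \quad
R(f) \;\le\; \widehat{R}_n(f) \;+\; 2\,\mathfrak{R}_n(\mathcal{F}) \;+\; \sqrt{\frac{\ln(1/\delta)}{2n}}
\quad \text{for all } f \in \mathcal{F},
```

where R(f) is the risk, \widehat{R}_n(f) the empirical risk and \mathfrak{R}_n(\mathcal{F}) the Rademacher complexity of the class.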
Katsikarelis, Ioannis. "Structurally Parameterized Tight Bounds and Approximation for Generalizations of Independence and Domination." Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLED048.
In this thesis we focus on the NP-hard problems (k, r)-CENTER and d-SCATTERED SET, which generalize the well-studied concepts of domination and independence over larger distances. In the first part we maintain a parameterized viewpoint and examine the standard parameterization as well as the most widely used graph parameters measuring the input's structure. We offer hardness results showing that there is no algorithm with running time below certain bounds, subject to the (Strong) Exponential Time Hypothesis, produce essentially optimal algorithms whose complexity matches these lower bounds, and further attempt to offer an alternative to exact computation in significantly reduced running time by way of approximation algorithms. In the second part we consider the (super-)polynomial (in-)approximability of the d-SCATTERED SET problem, i.e. we determine the exact relationship between an achievable approximation ratio ρ, the distance parameter d, and the running time of any ρ-approximation algorithm, expressed as a function of these and the size of the input n. We then consider strictly polynomial running times and improve our understanding of the approximability characteristics of the problem on graphs of bounded maximum degree as well as on bipartite graphs.
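For context, the two problems generalize domination (the case r = 1) and independence (the case d = 2) as follows (standard definitions, paraphrased):

```latex
(k,r)\text{-\textsc{Center}: given } G=(V,E) \text{ and } k, r \in \mathbb{N},
\text{ is there } S \subseteq V \text{ with } |S| \le k \text{ and } d_G(v, S) \le r \text{ for every } v \in V?
d\text{-\textsc{Scattered Set}: given } G=(V,E) \text{ and } d \in \mathbb{N},
\text{ find a maximum } S \subseteq V \text{ with } d_G(u, v) \ge d \text{ for all distinct } u, v \in S.
```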
Musayeva, Khadija. "Generalization Performance of Margin Multi-category Classifiers." Thesis, Université de Lorraine, 2019. http://www.theses.fr/2019LORR0096/document.
This thesis deals with the theory of margin multi-category classification and is based on the statistical learning theory founded by Vapnik and Chervonenkis. We are interested in deriving generalization bounds with explicit dependencies on the number C of categories, the sample size m and the margin parameter gamma, when the loss function considered is a Lipschitz continuous margin loss function. Generalization bounds rely on the empirical performance of the classifier as well as on its "capacity". In this work, the following scale-sensitive capacity measures are considered: the Rademacher complexity, the covering numbers and the fat-shattering dimension. Our main contributions are obtained under the assumption that the classes of component functions implemented by a classifier have polynomially growing fat-shattering dimensions and that the component functions are independent. In the context of the pathway of Mendelson, which relates the Rademacher complexity to the covering numbers and the latter to the fat-shattering dimension, we study the impact that decomposing at the level of one of these capacity measures has on the dependencies on C, m and gamma. In particular, we demonstrate that the dependency on C can be substantially improved over the state of the art if the decomposition is postponed to the level of the metric entropy or the fat-shattering dimension. On the other hand, this negatively impacts the rate of convergence (the dependency on m), an indication that optimizing the dependencies on the three basic parameters amounts to looking for a trade-off.
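The pathway of Mendelson mentioned above chains these capacity measures together; its first link is Dudley-type chaining, which bounds the empirical Rademacher complexity by an entropy integral of covering numbers. One common form (constants differ across references) is:

```latex
\widehat{\mathfrak{R}}_m(\mathcal{F}) \;\le\; \inf_{\alpha > 0}
\left( 4\alpha \;+\; \frac{12}{\sqrt{m}}
\int_{\alpha}^{\sup_{f \in \mathcal{F}} \|f\|_{2,m}}
\sqrt{\ln \mathcal{N}\bigl(\epsilon, \mathcal{F}, d_{2,m}\bigr)}\; d\epsilon \right),
```

where \mathcal{N}(\epsilon, \mathcal{F}, d_{2,m}) is a covering number of the class under the empirical L_2 metric; the covering numbers are in turn controlled by the fat-shattering dimension, which is where the assumption of polynomially growing fat-shattering dimensions enters.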
Philips, Petra Camilla. "Data-Dependent Analysis of Learning Algorithms." The Australian National University, Research School of Information Sciences and Engineering, 2005. http://thesis.anu.edu.au./public/adt-ANU20050901.204523.
Full textPeel, Thomas. "Algorithmes de poursuite stochastiques et inégalités de concentration empiriques pour l'apprentissage statistique." Thesis, Aix-Marseille, 2013. http://www.theses.fr/2013AIXM4769/document.
The first part of this thesis introduces new algorithms for the sparse encoding of signals. Based on Matching Pursuit (MP), they focus on the following problem: how to reduce the computation time of the selection step of MP. As an answer, we sub-sample the dictionary, in both rows and columns, at each iteration. We show that this theoretically grounded approach has good empirical performance. We then propose a block coordinate gradient descent algorithm for feature selection problems in the multiclass classification setting. Thanks to the use of error-correcting output codes, this task can be seen as a problem of simultaneous sparse encoding of signals. The second part presents new empirical Bernstein inequalities. The first ones concern the theory of U-statistics and are applied to design generalization bounds for ranking algorithms. These bounds take advantage of a variance estimator, and we propose an efficient algorithm to compute it. Then, we present an empirical version of the Bernstein-type inequality for martingales of Freedman [1975]. Again, the strength of our result lies in the variance estimator being computable from the data. This allows us to propose generalization bounds for online learning algorithms which improve on the state of the art and pave the way to a new family of learning algorithms taking advantage of this empirical information.
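To illustrate the idea in the opening lines of the abstract, here is a toy numpy sketch in which the atom-selection step of Matching Pursuit looks only at a random subset of dictionary columns (atoms) and signal coordinates (rows) at each iteration. The function name, the uniform subsampling scheme and the fractions are illustrative assumptions, not the algorithms analysed in the thesis.

```python
import numpy as np

def subsampled_matching_pursuit(x, D, n_iter=10, col_frac=0.5, row_frac=0.5, seed=0):
    """Greedy sparse coding of a signal x over a dictionary D (columns = atoms).
    At each iteration, only a random subset of atoms and of signal coordinates
    is used to pick the next atom, which cheapens the selection step.
    Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    residual = x.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        cols = rng.choice(D.shape[1], max(1, int(col_frac * D.shape[1])), replace=False)
        rows = rng.choice(D.shape[0], max(1, int(row_frac * D.shape[0])), replace=False)
        # Approximate correlations computed on the sub-sampled rows and columns
        scores = np.abs(D[np.ix_(rows, cols)].T @ residual[rows])
        j = cols[np.argmax(scores)]
        # Standard MP update with the selected atom, using all coordinates
        atom = D[:, j]
        step = atom @ residual / (atom @ atom)
        coeffs[j] += step
        residual -= step * atom
    return coeffs
```

Only the selection step is approximated; the residual update still uses the full selected atom, so the usual MP decomposition structure is preserved.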
Bello Medina, Kevin Segundo. "Structured Prediction: Statistical and Computational Guarantees in Learning and Inference." Thesis, 2021.
Philips, Petra. "Data-Dependent Analysis of Learning Algorithms." PhD thesis, 2005. http://hdl.handle.net/1885/47998.