Academic literature on the topic 'Empirical risk minimization'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Empirical risk minimization.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Empirical risk minimization"

1. Clémençon, Stephan, Patrice Bertail, and Emilie Chautru. "Sampling and empirical risk minimization." Statistics 51, no. 1 (December 14, 2016): 30–42. http://dx.doi.org/10.1080/02331888.2016.1259810.

2. Lecué, Guillaume, and Shahar Mendelson. "Aggregation via empirical risk minimization." Probability Theory and Related Fields 145, no. 3-4 (November 12, 2008): 591–613. http://dx.doi.org/10.1007/s00440-008-0180-8.

3. Lugosi, G., and K. Zeger. "Nonparametric estimation via empirical risk minimization." IEEE Transactions on Information Theory 41, no. 3 (May 1995): 677–87. http://dx.doi.org/10.1109/18.382014.

4. Koltchinskii, Vladimir. "Sparsity in penalized empirical risk minimization." Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 45, no. 1 (February 2009): 7–57. http://dx.doi.org/10.1214/07-aihp146.

5. Klemelä, Jussi, and Enno Mammen. "Empirical risk minimization in inverse problems." Annals of Statistics 38, no. 1 (February 2010): 482–511. http://dx.doi.org/10.1214/09-aos726.

6. Liu, Liyuan, Biqin Song, Zhibin Pan, Chuanwu Yang, Chi Xiao, and Weifu Li. "Gradient Learning under Tilted Empirical Risk Minimization." Entropy 24, no. 7 (July 9, 2022): 956. http://dx.doi.org/10.3390/e24070956.

Abstract:
Gradient Learning (GL), aiming to estimate the gradient of target function, has attracted much attention in variable selection problems due to its mild structure requirements and wide applicability. Despite rapid progress, the majority of the existing GL works are based on the empirical risk minimization (ERM) principle, which may face the degraded performance under complex data environment, e.g., non-Gaussian noise. To alleviate this sensitiveness, we propose a new GL model with the help of the tilted ERM criterion, and establish its theoretical support from the function approximation viewpoint. Specifically, the operator approximation technique plays the crucial role in our analysis. To solve the proposed learning objective, a gradient descent method is proposed, and the convergence analysis is provided. Finally, simulated experimental results validate the effectiveness of our approach when the input variables are correlated.
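The tilted ERM criterion referred to in this abstract replaces the sample average of losses with an exponentially tilted average, L_t(θ) = (1/t) log((1/n) Σ_i exp(t ℓ_i(θ))). Below is a minimal sketch for linear regression with squared loss, assuming illustrative values for the tilt parameter t, step size, and iteration count; it is not the authors' gradient-learning model.

```python
# Sketch of the tilted empirical risk minimization (t-ERM) criterion for
# linear regression with squared loss. Illustrative only: t, the step size,
# and the iteration count are assumptions.
import numpy as np

def tilted_risk(theta, X, y, t):
    """(1/t) * log(mean_i exp(t * loss_i)); tends to the plain empirical risk as t -> 0."""
    losses = 0.5 * (X @ theta - y) ** 2
    m = np.max(t * losses)                                # log-sum-exp trick for stability
    return (m + np.log(np.mean(np.exp(t * losses - m)))) / t

def tilted_risk_gradient(theta, X, y, t):
    residuals = X @ theta - y
    losses = 0.5 * residuals ** 2
    w = np.exp(t * losses - np.max(t * losses))
    w /= w.sum()                                          # softmax weights over the samples
    return X.T @ (w * residuals)                          # tilting re-weights each sample's gradient

def fit_tilted_erm(X, y, t=-1.0, lr=0.05, iters=2000):
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= lr * tilted_risk_gradient(theta, X, y, t)
    return theta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.standard_t(df=2, size=200)  # heavy-tailed noise
theta_hat = fit_tilted_erm(X, y, t=-1.0)                  # negative t down-weights large losses
print(theta_hat, tilted_risk(theta_hat, X, y, t=-1.0))
```

Negative t down-weights samples with large losses, which is one way tilting can help under heavy-tailed noise; as t tends to 0 the criterion reduces to ordinary ERM.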
7. Perez-Cruz, F., A. Navia-Vazquez, A. R. Figueiras-Vidal, and A. Artes-Rodriguez. "Empirical risk minimization for support vector classifiers." IEEE Transactions on Neural Networks 14, no. 2 (March 2003): 296–303. http://dx.doi.org/10.1109/tnn.2003.809399.

8. Golubev, G. K. "On a Method of Empirical Risk Minimization." Problems of Information Transmission 40, no. 3 (July 2004): 202–11. http://dx.doi.org/10.1023/b:prit.0000044256.20595.e6.

9. Brownlees, Christian, Emilien Joly, and Gábor Lugosi. "Empirical risk minimization for heavy-tailed losses." Annals of Statistics 43, no. 6 (December 2015): 2507–36. http://dx.doi.org/10.1214/15-aos1350.

10. Loustau, Sébastien. "Penalized empirical risk minimization over Besov spaces." Electronic Journal of Statistics 3 (2009): 824–50. http://dx.doi.org/10.1214/08-ejs316.

Dissertations / Theses on the topic "Empirical risk minimization"

1. Csiba, Dominik. "Data sampling strategies in stochastic algorithms for empirical risk minimization." Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/29635.

Abstract:
Gradient descent methods and especially their stochastic variants have become highly popular in the last decade due to their efficiency on big data optimization problems. In this thesis we present the development of data sampling strategies for these methods. In the first four chapters we focus on four views on the sampling for convex problems, developing and analyzing new state-of-the-art methods using non-standard data sampling strategies. Finally, in the last chapter we present a more flexible framework, which generalizes to more problems as well as more sampling rules. In the first chapter we propose an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving the regularized empirical risk minimization (ERM) problem. Our modification consists in allowing the method to adaptively change the probability distribution over the dual variables throughout the iterative process. AdaSDCA achieves a provably better complexity bound than SDCA with the best fixed probability distribution, known as importance sampling. However, it is of a theoretical character as it is expensive to implement. We also propose AdaSDCA+: a practical variant which in our experiments outperforms existing non-adaptive methods. In the second chapter we extend the dual-free analysis of SDCA, to arbitrary mini-batching schemes. Our method is able to better utilize the information in the data defining the ERM problem. For convex loss functions, our complexity results match those of QUARTZ, which is a primal-dual method also allowing for arbitrary mini-batching schemes. The advantage of a dual-free analysis comes from the fact that it guarantees convergence even for non-convex loss functions, as long as the average loss is convex. We illustrate through experiments the utility of being able to design arbitrary mini-batching schemes. In the third chapter we study importance sampling of minibatches. Minibatching is a well studied and highly popular technique in supervised learning, used by practitioners due to its ability to accelerate training through better utilization of parallel processing power and reduction of stochastic variance. Another popular technique is importance sampling – a strategy for preferential sampling of more important examples also capable of accelerating the training process. However, despite considerable effort by the community in these areas, and due to the inherent technical difficulty of the problem, there is no existing work combining the power of importance sampling with the strength of minibatching. In this chapter we propose the first importance sampling for minibatches and give simple and rigorous complexity analysis of its performance. We illustrate on synthetic problems that for training data of certain properties, our sampling can lead to several orders of magnitude improvement in training time. We then test the new sampling on several popular datasets, and show that the improvement can reach an order of magnitude. In the fourth chapter we ask whether randomized coordinate descent (RCD) methods should be applied to the ERM problem or rather to its dual. When the number of examples (n) is much larger than the number of features (d), a common strategy is to apply RCD to the dual problem. On the other hand, when the number of features is much larger than the number of examples, it makes sense to apply RCD directly to the primal problem. In this paper we provide the first joint study of these two approaches when applied to L2-regularized ERM.
First, we show through a rigorous analysis that for dense data, the above intuition is precisely correct. However, we find that for sparse and structured data, primal RCD can significantly outperform dual RCD even if d ≪ n, and vice versa, dual RCD can be much faster than primal RCD even if n ≫ d. Moreover, we show that, surprisingly, a single sampling strategy minimizes both the (bound on the) number of iterations and the overall expected complexity of RCD. Note that the latter complexity measure also takes into account the average cost of the iterations, which depends on the structure and sparsity of the data, and on the sampling strategy employed. We confirm our theoretical predictions using extensive experiments with both synthetic and real data sets. In the last chapter we introduce two novel generalizations of the theory for gradient descent type methods in the proximal setting. Firstly, we introduce the proportion function, which we further use to analyze all the known block-selection rules for coordinate descent methods under a single framework. This framework includes randomized methods with uniform, non-uniform or even adaptive sampling strategies, as well as deterministic methods with batch, greedy or cyclic selection rules. We additionally introduce a novel block selection technique called greedy minibatches, for which we provide competitive convergence guarantees. Secondly, the whole theory of strongly-convex optimization was recently generalized to a specific class of non-convex functions satisfying the so-called Polyak-Łojasiewicz condition. To mirror this generalization in the weakly convex case, we introduce the Weak Polyak-Łojasiewicz condition, using which we give global convergence guarantees for a class of non-convex functions previously not considered in theory. Additionally, we give local convergence guarantees for an even larger class of non-convex functions satisfying only a certain smoothness assumption. By combining the two above mentioned generalizations we recover the state-of-the-art convergence guarantees for a large class of previously known methods and setups as special cases of our framework. Also, we provide new guarantees for many previously not considered combinations of methods and setups, as well as a huge class of novel non-convex objectives. The flexibility of our approach offers a lot of potential for future research, as any new block selection procedure will have a convergence guarantee for all objectives considered in our framework, while any new objective analyzed under our approach will have a whole fleet of block selection rules with convergence guarantees readily available.
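As a small illustration of the non-uniform sampling theme of this thesis, here is plain SDCA for ridge regression with a fixed importance-sampling distribution over the dual coordinates (not the adaptive AdaSDCA, dual-free, or minibatch schemes developed in the thesis); the regularization strength, the number of epochs, and the sampling rule p_i proportional to 1 + ||x_i||²/(λn) are illustrative assumptions.

```python
# Illustrative SDCA for ridge regression (squared loss) with a fixed
# importance-sampling distribution over dual coordinates. A sketch only,
# not the thesis's AdaSDCA/QUARTZ/minibatch algorithms.
import numpy as np

def sdca_importance(X, y, lam=0.1, epochs=30, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    sqnorms = np.sum(X ** 2, axis=1)
    p = 1.0 + sqnorms / (lam * n)              # per-coordinate "difficulty"
    p = p / p.sum()                            # importance-sampling distribution
    alpha = np.zeros(n)                        # dual variables
    w = np.zeros(d)                            # primal iterate, w = X^T alpha / (lam * n)
    for _ in range(epochs * n):
        i = rng.choice(n, p=p)
        delta = (y[i] - X[i] @ w - alpha[i]) / (1.0 + sqnorms[i] / (lam * n))
        alpha[i] += delta                      # closed-form dual coordinate step for squared loss
        w += delta * X[i] / (lam * n)          # keep the primal-dual link exact
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 20)) * rng.uniform(0.1, 5.0, size=(500, 1))   # very unequal row norms
y = X @ rng.normal(size=20) + 0.1 * rng.normal(size=500)
n, d = X.shape
w_exact = np.linalg.solve(X.T @ X / n + 0.1 * np.eye(d), X.T @ y / n)   # exact ridge solution
print(np.linalg.norm(sdca_importance(X, y) - w_exact))
```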
2. Mathieu, Timothée. "M-estimation and Median of Means applied to statistical learning Robust classification via MOM minimization MONK – outlier-robust mean embedding estimation by median-of-means Excess risk bounds in robust empirical risk minimization." Thesis, université Paris-Saclay, 2021. http://www.theses.fr/2021UPASM002.

Abstract:
The main objective of this thesis is to study methods for robust statistical learning. Traditionally, in statistics we use models or simplifying assumptions that allow us to represent the real world. However, some deviations from the hypotheses can strongly disrupt the statistical analysis of a database. By robust statistics, we mean methods that can handle on the one hand so-called abnormal data (sensor error, human error) but also data of a highly variable nature. We apply robust techniques to statistical learning, giving theoretical efficiency results of the proposed methods as well as illustrations on simulated and real data
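The median-of-means (MOM) estimator underlying these procedures fits in a few lines. A minimal sketch; the number of blocks K is a tuning choice and an assumption here, and K = 1 recovers the ordinary empirical mean.

```python
# Minimal sketch of the median-of-means (MOM) estimator: split the sample
# into K random, disjoint blocks, average within each block, and return the
# median of the block means.
import numpy as np

def median_of_means(x, K=10, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.permutation(np.asarray(x, dtype=float))    # random, disjoint blocks
    blocks = np.array_split(x, K)
    return np.median([b.mean() for b in blocks])

rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(0.0, 1.0, 990), np.full(10, 1e3)])  # 1% gross outliers
print(sample.mean())                  # the empirical mean is ruined by the outliers
print(median_of_means(sample, K=30))  # MOM stays close to the true mean 0
```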
3. Suzzi, Mattia. "Introduzione al Machine Learning e alle reti neurali." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Abstract:
The first chapter of this thesis presents the foundations of a machine learning classification algorithm. In particular, it focuses on the algorithm's objective, namely producing a hypothesis function able to carry out the required task. To check the success of this operation, the notion of error is introduced, that is, how close the solution found is to the one sought. Since the latter is unknown, the true error cannot actually be computed, so the empirical error is used instead, which reflects how effective the hypothesis function is on a training set. For this process to work, the empirical error must indeed be a good approximation of the true error, a property satisfied by algorithms enjoying uniform convergence. The thesis then explains what this means, under which conditions it holds, and a class of problems for which this approach is particularly efficient (convex optimization). The second chapter analyses a particular type of algorithm, neural networks. It shows how the principles previously established for machine learning do not carry over to neural networks, which makes new results necessary to guarantee the success of the algorithm. In particular, two-layer networks are studied, with some results on the convergence of the hypotheses they produce. On this topic the thesis describes two theorems showing that a fairly broad class of functions can be approximated by a two-layer neural network, and that the error achieved is inversely proportional to the number of neurons. The first result ensures the convergence of the approximation error, while the second indicates the number of neurons needed to reach that degree of approximation.
4. Achab, Mastane. "Ranking and risk-aware reinforcement learning." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT020.

Abstract:
This thesis divides into two parts: the first part is on ranking and the second on risk-aware reinforcement learning. While binary classification is the flagship application of empirical risk minimization (ERM), the main paradigm of machine learning, more challenging problems such as bipartite ranking can also be expressed through that setup. In bipartite ranking, the goal is to order, by means of scoring methods, all the elements of some feature space based on a training dataset composed of feature vectors with their binary labels. This thesis extends this setting to the continuous ranking problem, a variant where the labels are taking continuous values instead of being simply binary. The analysis of ranking data, initiated in the 18th century in the context of elections, has led to another ranking problem using ERM, namely ranking aggregation and more precisely the Kemeny's consensus approach. From a training dataset made of ranking data, such as permutations or pairwise comparisons, the goal is to find the single "median permutation" that best corresponds to a consensus order. We present a less drastic dimensionality reduction approach where a distribution on rankings is approximated by a simpler distribution, which is not necessarily reduced to a Dirac mass as in ranking aggregation.For that purpose, we rely on mathematical tools from the theory of optimal transport such as Wasserstein metrics. The second part of this thesis focuses on risk-aware versions of the stochastic multi-armed bandit problem and of reinforcement learning (RL), where an agent is interacting with a dynamic environment by taking actions and receiving rewards, the objective being to maximize the total payoff. In particular, a novel atomic distributional RL approach is provided: the distribution of the total payoff is approximated by particles that correspond to trimmed means
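Bipartite ranking can be cast as ERM over positive-negative pairs. The sketch below fits a linear scoring function by minimizing a logistic surrogate of the pairwise (AUC) ranking loss; it is a generic textbook construction rather than the thesis's continuous-ranking (IROC/IAUC) methodology, and the step size, iteration count, and simulated data are assumptions.

```python
# Bipartite ranking as pairwise ERM: minimize a logistic surrogate of the
# AUC loss over all positive-negative pairs with a linear scoring function.
import numpy as np

def fit_pairwise_ranker(X, y, lr=0.5, iters=300):
    pos, neg = X[y == 1], X[y == 0]
    D = (pos[:, None, :] - neg[None, :, :]).reshape(-1, X.shape[1])  # all pair differences
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        margins = np.clip(D @ w, -30.0, 30.0)
        sig = 1.0 / (1.0 + np.exp(margins))               # derivative weight of log(1+exp(-margin))
        w -= lr * (-(D * sig[:, None]).mean(axis=0))      # gradient step on the surrogate risk
    return w

def empirical_auc(scores, y):
    s_pos, s_neg = scores[y == 1], scores[y == 0]
    return np.mean(s_pos[:, None] > s_neg[None, :])       # fraction of correctly ordered pairs

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(size=300) > 0).astype(int)
w = fit_pairwise_ranker(X, y)
print(empirical_auc(X @ w, y))
```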
5. Philips, Petra Camilla. "Data-Dependent Analysis of Learning Algorithms." The Australian National University. Research School of Information Sciences and Engineering, 2005. http://thesis.anu.edu.au./public/adt-ANU20050901.204523.

Abstract:
This thesis studies the generalization ability of machine learning algorithms in a statistical setting. It focuses on the data-dependent analysis of the generalization performance of learning algorithms in order to make full use of the potential of the actual training sample from which these algorithms learn.

First, we propose an extension of the standard framework for the derivation of generalization bounds for algorithms taking their hypotheses from random classes of functions. This approach is motivated by the fact that the function produced by a learning algorithm based on a random sample of data depends on this sample and is therefore a random function. Such an approach avoids the detour of the worst-case uniform bounds as done in the standard approach. We show that the mechanism which allows one to obtain generalization bounds for random classes in our framework is based on a “small complexity” of certain random coordinate projections. We demonstrate how this notion of complexity relates to learnability and how one can explore geometric properties of these projections in order to derive estimates of rates of convergence and good confidence interval estimates for the expected risk. We then demonstrate the generality of our new approach by presenting a range of examples, among them the algorithm-dependent compression schemes and the data-dependent luckiness frameworks, which fall into our random subclass framework.

Second, we study in more detail generalization bounds for a specific algorithm which is of central importance in learning theory, namely the Empirical Risk Minimization algorithm (ERM). Recent results show that one can significantly improve the high-probability estimates for the convergence rates for empirical minimizers by a direct analysis of the ERM algorithm. These results are based on a new localized notion of complexity of subsets of hypothesis functions with identical expected errors and are therefore dependent on the underlying unknown distribution. We investigate the extent to which one can estimate these high-probability convergence rates in a data-dependent manner. We provide an algorithm which computes a data-dependent upper bound for the expected error of empirical minimizers in terms of the “complexity” of data-dependent local subsets. These subsets are sets of functions of empirical errors of a given range and can be determined based solely on empirical data. We then show that recent direct estimates, which are essentially sharp estimates on the high-probability convergence rate for the ERM algorithm, can not be recovered universally from empirical data.
6. Neirac, Lucie. "Learning with a linear loss function : excess risk and estimation bounds for ERM and minimax MOM estimators, with applications." Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAG012.

Abstract:
Community detection, phase recovery, signed clustering, angular group synchronization, Maxcut, sparse PCA, the single index model, and the list goes on, are all classical topics within the field of machine learning and statistics. At first glance, they are pretty different problems with different types of data and different goals. However, the literature of recent years shows that they do have one thing in common: they all are amenable to Semi-Definite Programming (SDP). And because they are amenable to SDP, we can go further and recast them in the classical machine learning framework of risk minimization, and this with the simplest possible loss function: the linear loss function. This, in turn, opens up the opportunity to leverage the vast literature related to risk minimization to derive excess risk and estimation bounds as well as algorithms to unravel these problems. The aim of this work is to propose a unified methodology to obtain statistical properties of classical machine learning procedures based on the linear loss function, which corresponds, for example, to the case of SDP procedures that we look as ERM procedures. Embracing a machine learning view point allows us to go into greater depth and introduce other estimators which are effective in handling two key challenges within statistical learning: sparsity, and robustness to adversarial contamination and heavy-tailed data. We attack the structural learning problem by proposing a regularized version of the ERM estimator. We then turn to the robustness problem and introduce an estimator based on the median of means (MOM) principle, which we call the minmax MOM estimator. This latter estimator addresses the problem of robustness and can be constructed whatever the loss function, including the linear loss function. We also present a regularized version of the minmax MOM estimator. For each of those estimators we are able to provide excess risk and estimation bounds, which are derived from two key tools: local complexity fixed points and curvature equations of the excess risk function. To illustrate the relevance of our approach, we apply our methodology to five classical problems within the frame of statistical learning, for which we improve the state-of-the-art results
7. Gazagnadou, Nidham. "Expected smoothness for stochastic variance-reduced methods and sketch-and-project methods for structured linear systems." Electronic Thesis or Diss., Institut polytechnique de Paris, 2021. http://www.theses.fr/2021IPPAT035.

Abstract:
The considerable increase in the number of data and features complicates the learning phase requiring the minimization of a loss function. Stochastic gradient descent (SGD) and variance reduction variants (SAGA, SVRG, MISO) are widely used to solve this problem. In practice, these methods are accelerated by computing these stochastic gradients on a "mini-batch": a small group of samples randomly drawn. Indeed, recent technological improvements allowing the parallelization of these calculations have generalized the use of mini-batches. In this thesis, we are interested in the study of variants of stochastic gradient algorithms with reduced variance by trying to find the optimal hyperparameters: step and mini-batch size. Our study allows us to give convergence results interpolating between stochastic methods drawing a single sample per iteration and the so-called "full-batch" gradient descent using all samples at each iteration. Our analysis is based on the expected smoothness constant which allows to capture the regularity of the random function whose gradient is calculated. We study another class of optimization algorithms: the "sketch-and-project" methods. These methods can also be applied as soon as the learning problem boils down to solving a linear system. This is the case of ridge regression. We analyze here variants of this method that use different strategies of momentum and acceleration. These methods also depend on the sketching strategy used to compress the information of the system to be solved at each iteration. Finally, we show that these methods can also be extended to numerical analysis problems. Indeed, the extension of sketch-and-project methods to Alternating-Direction Implicit (ADI) methods allows to apply them to large-scale problems, when the so-called "direct" solvers are too slow.
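The simplest member of the sketch-and-project family mentioned here is randomized Kaczmarz: single-row sketches for a consistent linear system, with rows sampled proportionally to their squared norms. A minimal sketch without the momentum, acceleration, or ADI extensions studied in the thesis; the iteration count and data are assumptions.

```python
# Randomized Kaczmarz as a sketch-and-project method for a consistent linear
# system Ax = b: at each step, project the iterate onto the hyperplane defined
# by one randomly chosen row.
import numpy as np

def randomized_kaczmarz(A, b, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    p = np.sum(A ** 2, axis=1)
    p = p / p.sum()                         # row-norm sampling distribution
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=p)
        a = A[i]
        x += (b[i] - a @ x) / (a @ a) * a   # project onto the i-th hyperplane
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(300, 50))
x_true = rng.normal(size=50)
b = A @ x_true                              # consistent system
print(np.linalg.norm(randomized_kaczmarz(A, b) - x_true))
```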
8. Caponnetto, Andrea, and Alexander Rakhlin. "Some Properties of Empirical Risk Minimization over Donsker Classes." 2005. http://hdl.handle.net/1721.1/30545.

Abstract:
We study properties of algorithms which minimize (or almost minimize) empirical error over a Donsker class of functions. We show that the L2-diameter of the set of almost-minimizers is converging to zero in probability. Therefore, as the number of samples grows, it is becoming unlikely that adding a point (or a number of points) to the training set will result in a large jump (in L2 distance) to a new hypothesis. We also show that under some conditions the expected errors of the almost-minimizers are becoming close with a rate faster than n^{-1/2}.
9. Mukherjee, Sayan, Partha Niyogi, Tomaso Poggio, and Ryan Rifkin. "Statistical Learning: Stability is Sufficient for Generalization and Necessary and Sufficient for Consistency of Empirical Risk Minimization." 2002. http://hdl.handle.net/1721.3/5507.

Abstract:
Solutions of learning problems by Empirical Risk Minimization (ERM) need to be consistent, so that they may be predictive. They also need to be well-posed, so that they can be used robustly. We show that a statistical form of well-posedness, defined in terms of the key property of L-stability, is necessary and sufficient for consistency of ERM.
revised July 2003
10. Philips, Petra. "Data-Dependent Analysis of Learning Algorithms." PhD thesis, 2005. http://hdl.handle.net/1885/47998.

Abstract:
This thesis studies the generalization ability of machine learning algorithms in a statistical setting. It focuses on the data-dependent analysis of the generalization performance of learning algorithms in order to make full use of the potential of the actual training sample from which these algorithms learn.

First, we propose an extension of the standard framework for the derivation of generalization bounds for algorithms taking their hypotheses from random classes of functions. ...

Second, we study in more detail generalization bounds for a specific algorithm which is of central importance in learning theory, namely the Empirical Risk Minimization algorithm (ERM). ...

Books on the topic "Empirical risk minimization"

1. Koltchinskii, Vladimir. Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-22147-7.

2. Koltchinskii, Vladimir. Oracle inequalities in empirical risk minimization and sparse recovery problems: École d'été de probabilités de Saint-Flour XXXVIII-2008. Berlin: Springer Verlag, 2011.

3. Ecole d'été de probabilités de Saint-Flour (38th : 2008), ed. Oracle inequalities in empirical risk minimization and sparse recovery problems: École d'été de probabilités de Saint-Flour XXXVIII-2008. Berlin: Springer Verlag, 2011.

4. Machine Learning from Weak Supervision: An Empirical Risk Minimization Approach. MIT Press, 2022.

5. Koltchinskii, Vladimir. Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems: École d'Été de Probabilités de Saint-Flour XXXVIII-2008. Springer London, Limited, 2011.

6. Koltchinskii, Vladimir. Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems: École d'Été de Probabilités de Saint-Flour XXXVIII-2008. Springer, 2011.
7. Baillo, Amparo, Antonio Cuevas, and Ricardo Fraiman. Classification methods for functional data. Edited by Frédéric Ferraty and Yves Romain. Oxford University Press, 2018. http://dx.doi.org/10.1093/oxfordhb/9780199568444.013.10.

Abstract:
This article reviews the literature concerning supervised and unsupervised classification of functional data. It first explains the meaning of unsupervised classification vs. supervised classification before discussing the supervised classification problem in the infinite-dimensional case, showing that its formal statement generally coincides with that of discriminant analysis in the classical multivariate case. It then considers the optimal classifier and plug-in rules, empirical risk and empirical minimization rules, linear discrimination rules, the k nearest neighbor (k-NN) method, and kernel rules. It also describes classification based on partial least squares, classification based on reproducing kernels, and depth-based classification. Finally, it examines unsupervised classification methods, focusing on K-means for functional data, K-means for data in a Hilbert space, and impartial trimmed K-means for functional data. Some practical issues, in particular real-data examples and simulations, are reviewed and some selected proofs are given.
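As a toy illustration of one of the rules surveyed in this chapter, here is a k-NN classification rule for functional data observed on a common grid, using a discretized L2-type distance between curves; the grid, the value of k, and the simulated classes are assumptions.

```python
# k-NN classification of curves observed on a common grid, using the mean
# squared difference over grid points as a discretized L2-type distance.
import numpy as np

def knn_functional(train_curves, labels, new_curve, k=5):
    d2 = np.mean((train_curves - new_curve) ** 2, axis=1)   # distance to each training curve
    nearest = np.argsort(d2)[:k]
    return np.bincount(labels[nearest]).argmax()             # majority vote among the k neighbours

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 100)
labels = rng.integers(0, 2, size=200)
# class 0: noisy sine curves, class 1: noisy cosine curves
curves = np.where(labels[:, None] == 0, np.sin(2 * np.pi * grid), np.cos(2 * np.pi * grid))
curves = curves + 0.3 * rng.normal(size=curves.shape)
new_curve = np.sin(2 * np.pi * grid) + 0.3 * rng.normal(size=grid.size)
print(knn_functional(curves, labels, new_curve, k=7))         # expected: 0
```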
8. Kuypers, Dirk R. J., and Maarten Naesens. Immunosuppression. Edited by Jeremy R. Chapman. Oxford University Press, 2018. http://dx.doi.org/10.1093/med/9780199592548.003.0281_update_001.

Abstract:
Combination immunosuppressive therapy produces excellent short-term results after kidney transplantation. Long-term graft survival has improved, but less dramatically. Death with a functioning graft remains the primary cause of graft loss. Dosing of current immunosuppressive therapy balances between careful clinical interpretation of time-driven immunological risk assessments and drug-related toxicity on the one hand, and the use of simple surrogate drug exposure indicators like blood/plasma concentrations on the other. The combined use of calcineurin-inhibitors (CNIs) with mycophenolic acids and corticosteroids has been fine-tuned over the last decade, based on empirically derived observations as well as on the results of large multicentre randomized clinical studies. Corticosteroid withdrawal and avoidance are feasible, at least in patients with a low immunological risk, but CNI-free protocols have had few long-term successes. Some minimization strategies have increased risk of developing acute rejection or (donor-specific) anti-HLA antibodies, with deleterious effects on the graft. Mammalian target of rapamycin inhibitors (mTORi) have shown limited benefit in early CNI replacement regimens and their long-term use as primary drug is hampered by intolerance. In the setting of particular malignant disease occurring after transplantation, such as squamous cell carcinoma of the skin and Kaposi’s sarcoma, mTORi seem promising. Induction agents (anti-interleukin 2 receptor monoclonal antibodies, antithymocyte globulins) effectively diminish the risk of early immunological graft loss in recipients with moderate to high immunological risk but at the price of more infectious or malignant complications. While personalized transplantation medicine is only in its early stages of development, attempts are made to quantitatively measure the clinical degree of immunosuppression, to tailor immunosuppressive therapy more specifically to the patient’s individual profile, and to monitor graft status by use of invasive (e.g. surveillance renal biopsies) and non-invasive biomarkers. These scientific endeavours are a necessity to further optimize the current immunosuppressive therapy which will remain for some time to come.

Book chapters on the topic "Empirical risk minimization"

1. Langford, John, Xinhua Zhang, Gavin Brown, Indrajit Bhattacharya, Lise Getoor, Thomas Zeugmann, et al. "Empirical Risk Minimization." In Encyclopedia of Machine Learning, 312. Boston, MA: Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-30164-8_251.
2. Jung, Alexander. "Empirical Risk Minimization." In Machine Learning: Foundations, Methodologies, and Applications, 81–98. Singapore: Springer Singapore, 2022. http://dx.doi.org/10.1007/978-981-16-8193-6_4.

3. Zhang, Xinhua. "Empirical Risk Minimization." In Encyclopedia of Machine Learning and Data Mining, 1. Boston, MA: Springer US, 2016. http://dx.doi.org/10.1007/978-1-4899-7502-7_79-1.

4. Zhang, Xinhua. "Empirical Risk Minimization." In Encyclopedia of Machine Learning and Data Mining, 392–93. Boston, MA: Springer US, 2017. http://dx.doi.org/10.1007/978-1-4899-7687-1_79.

5. Franco, Danilo, Luca Oneto, and Davide Anguita. "Fair Empirical Risk Minimization Revised." In Advances in Computational Intelligence, 29–42. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-43085-5_3.

6. Vapnik, Vladimir. "Methods of Expected-Risk Minimization." In Estimation of Dependences Based on Empirical Data, 27–44. New York, NY: Springer New York, 2006. http://dx.doi.org/10.1007/0-387-34239-7_2.

7. Bartlett, Peter L., Shahar Mendelson, and Petra Philips. "Local Complexities for Empirical Risk Minimization." In Learning Theory, 270–84. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004. http://dx.doi.org/10.1007/978-3-540-27819-1_19.

8. Vapnik, Vladimir. "The Method of Structural Minimization of Risk." In Estimation of Dependences Based on Empirical Data, 232–66. New York, NY: Springer New York, 2006. http://dx.doi.org/10.1007/0-387-34239-7_8.

9. Clémençon, Stéphan, Gábor Lugosi, and Nicolas Vayatis. "Ranking and Scoring Using Empirical Risk Minimization." In Learning Theory, 1–15. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005. http://dx.doi.org/10.1007/11503415_1.

10. Fukuchi, Kazuto, and Jun Sakuma. "Neutralized Empirical Risk Minimization with Generalization Neutrality Bound." In Machine Learning and Knowledge Discovery in Databases, 418–33. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014. http://dx.doi.org/10.1007/978-3-662-44848-9_27.

Conference papers on the topic "Empirical risk minimization"

1. Oneto, Luca, Michele Donini, and Massimiliano Pontil. "General Fair Empirical Risk Minimization." In 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020. http://dx.doi.org/10.1109/ijcnn48605.2020.9206819.

2. Xin, Ran, Anit Kumar Sahu, Soummya Kar, and Usman A. Khan. "Distributed empirical risk minimization over directed graphs." In 2019 53rd Asilomar Conference on Signals, Systems, and Computers. IEEE, 2019. http://dx.doi.org/10.1109/ieeeconf44664.2019.9049065.

3. Wang, Shaoshen, Yanbin Liu, Ling Chen, and Chengqi Zhang. "Diminishing Empirical Risk Minimization for Unsupervised Anomaly Detection." In 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, 2022. http://dx.doi.org/10.1109/ijcnn55064.2022.9892076.

4. Taheri, Omid, and Sergiy A. Vorobyov. "Empirical risk minimization-based analysis of segmented compressed sampling." In 2010 44th Asilomar Conference on Signals, Systems and Computers. IEEE, 2010. http://dx.doi.org/10.1109/acssc.2010.5757506.

5. Wang, Xinqiang, and Lei Guo. "A new convergent algorithm for online empirical risk minimization." In 2017 36th Chinese Control Conference (CCC). IEEE, 2017. http://dx.doi.org/10.23919/chicc.2017.8029140.

6. Deoras, Anoop, Denis Filimonov, Mary Harper, and Fred Jelinek. "Model combination for Speech Recognition using Empirical Bayes Risk minimization." In 2010 IEEE Spoken Language Technology Workshop (SLT 2010). IEEE, 2010. http://dx.doi.org/10.1109/slt.2010.5700857.

7. Bassily, Raef, Adam Smith, and Abhradeep Thakurta. "Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds." In 2014 IEEE 55th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 2014. http://dx.doi.org/10.1109/focs.2014.56.

8. Badiei Khuzani, Masoud. "Distributed Primal-Dual Proximal Method for Regularized Empirical Risk Minimization." In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2018. http://dx.doi.org/10.1109/icmla.2018.00152.
9. Huang, Jiaxuan, and Hongsen Huang. "Using empirical risk minimization to detect community structure in the blogosphere." In 2010 IEEE International Conference on Intelligent Systems and Knowledge Engineering (ISKE). IEEE, 2010. http://dx.doi.org/10.1109/iske.2010.5680843.
10. Shen, Zebang, Hui Qian, Tongzhou Mu, and Chao Zhang. "Accelerated Doubly Stochastic Gradient Algorithm for Large-scale Empirical Risk Minimization." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/378.

Abstract:
Nowadays, algorithms with fast convergence, small memory footprints, and low per-iteration complexity are particularly favorable for artificial intelligence applications. In this paper, we propose a doubly stochastic algorithm with a novel accelerating multi-momentum technique to solve large scale empirical risk minimization problem for learning tasks. While enjoying a provably superior convergence rate, in each iteration, such algorithm only accesses a mini batch of samples and meanwhile updates a small block of variable coordinates, which substantially reduces the amount of memory reference when both the massive sample size and ultra-high dimensionality are involved. Specifically, to obtain an ε-accurate solution, our algorithm requires only O(log(1/ε)/√ε) overall computation for the general convex case and O((n + √(nκ)) log(1/ε)) for the strongly convex case. Empirical studies on huge scale datasets are conducted to illustrate the efficiency of our method in practice.
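The 'doubly stochastic' idea, touching only a mini-batch of samples and a small block of coordinates at each iteration, can be sketched for L2-regularized least squares as follows; the paper's multi-momentum acceleration is omitted, and the batch size, block size, and step size are illustrative assumptions.

```python
# Unaccelerated "doubly stochastic" update for L2-regularized least squares:
# each iteration uses a random mini-batch of samples and updates only a random
# block of coordinates.
import numpy as np

def doubly_stochastic_gd(X, y, lam=1e-3, batch=32, block=8, iters=4000, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        B = rng.choice(n, size=batch, replace=False)       # mini-batch of samples
        J = rng.choice(d, size=block, replace=False)       # block of coordinates
        residual = X[B] @ w - y[B]
        g = X[B][:, J].T @ residual / batch + lam * w[J]   # partial gradient on the block
        w[J] -= lr * g                                     # update only the chosen coordinates
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 40))
w_true = rng.normal(size=40)
y = X @ w_true + 0.1 * rng.normal(size=1000)
w_hat = doubly_stochastic_gd(X, y)
print(np.linalg.norm(w_hat - w_true) / np.linalg.norm(w_true))
```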

Reports on the topic "Empirical risk minimization"

1. Caponnetto, Andrea, and Alexander Rakhlin. Some Properties of Empirical Risk Minimization Over Donsker Classes. Fort Belvoir, VA: Defense Technical Information Center, May 2005. http://dx.doi.org/10.21236/ada454986.

2. Mukherjee, Sayan, Partha Niyogi, Tomaso Poggio, and Ryan Rifkin. Statistical Learning: Stability is Sufficient for Generalization and Necessary and Sufficient for Consistency of Empirical Risk Minimization. Fort Belvoir, VA: Defense Technical Information Center, January 2004. http://dx.doi.org/10.21236/ada459857.