Academic literature on the topic 'Scaled gradient descent'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Scaled gradient descent.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Scaled gradient descent"

1

Farhana Husin, Siti, Mustafa Mamat, Mohd Asrul Hery Ibrahim, and Mohd Rivaie. "A modification of steepest descent method for solving large-scaled unconstrained optimization problems." International Journal of Engineering & Technology 7, no. 3.28 (August 17, 2018): 72. http://dx.doi.org/10.14419/ijet.v7i3.28.20969.

Full text
Abstract:
In this paper, we develop a new search direction for the Steepest Descent (SD) method by replacing the previous search direction from the Conjugate Gradient (CG) method with the gradient from the previous step, for solving large-scale optimization problems. We also use one of the conjugate gradient coefficients as a coefficient for the matrix. Under some reasonable assumptions, we prove that the proposed method with exact line search satisfies the descent property and possesses global convergence. Further, numerical results on some unconstrained optimization problems show that the proposed algorithm is promising.
APA, Harvard, Vancouver, ISO, and other styles
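The entry above modifies the steepest-descent search direction, but the specific coefficient and matrix are not reproduced in the abstract. The sketch below is therefore only a generic scaled steepest-descent iteration x_{k+1} = x_k - alpha * D * grad f(x_k), with an assumed fixed diagonal scaling D and toy quadratic, not the authors' method.

    import numpy as np

    def scaled_steepest_descent(grad, x0, D, alpha=0.1, tol=1e-8, max_iter=1000):
        # Illustrative sketch only: D is an assumed positive-definite scaling
        # matrix (here diagonal), not the matrix proposed in the paper.
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:
                break
            x = x - alpha * D @ g          # scale the negative gradient by D
        return x

    # Example: minimize a badly scaled quadratic f(x) = 0.5 * x^T A x.
    A = np.diag([1.0, 100.0])
    grad = lambda x: A @ x
    D = np.diag(1.0 / np.diag(A))          # Jacobi-style diagonal scaling
    print(scaled_steepest_descent(grad, x0=[1.0, 1.0], D=D))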
2

Okoubi, Firmin Andzembe, and Jonas Koko. "Parallel Nesterov Domain Decomposition Method for Elliptic Partial Differential Equations." Parallel Processing Letters 30, no. 01 (March 2020): 2050004. http://dx.doi.org/10.1142/s0129626420500048.

Full text
Abstract:
We study a parallel non-overlapping domain decomposition method, based on the Nesterov accelerated gradient descent, for the numerical approximation of elliptic partial differential equations. The problem is reformulated as a constrained (convex) minimization problem with the interface continuity conditions as constraints. The resulting domain decomposition method is an accelerated projected gradient descent with convergence rate [Formula: see text]. At each iteration, the proposed method needs only one matrix/vector multiplication. Numerical experiments show that significant (standard and scaled) speed-ups can be obtained.
APA, Harvard, Vancouver, ISO, and other styles
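The abstract above describes an accelerated projected gradient iteration. As a generic illustration of that family of methods (not the authors' domain-decomposition solver), here is a textbook Nesterov-accelerated projected gradient sketch with an assumed projection operator and a toy constrained quadratic:

    import numpy as np

    def accelerated_projected_gradient(grad, proj, x0, step, n_iter=500):
        # Standard Nesterov/FISTA-style momentum; shown only as an illustration.
        x_prev = x = np.asarray(x0, dtype=float)
        t_prev = 1.0
        for _ in range(n_iter):
            t = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t_prev**2))
            y = x + ((t_prev - 1.0) / t) * (x - x_prev)   # momentum extrapolation
            x_prev, x = x, proj(y - step * grad(y))       # projected gradient step
            t_prev = t
        return x

    # Example: min 0.5*||x - b||^2 over the nonnegative orthant.
    b = np.array([1.0, -2.0, 3.0])
    grad = lambda x: x - b
    proj = lambda z: np.maximum(z, 0.0)                    # projection onto x >= 0
    print(accelerated_projected_gradient(grad, proj, x0=np.zeros(3), step=1.0))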
3

Maduranga, Kehelwala D. G., Kyle E. Helfrich, and Qiang Ye. "Complex Unitary Recurrent Neural Networks Using Scaled Cayley Transform." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 4528–35. http://dx.doi.org/10.1609/aaai.v33i01.33014528.

Full text
Abstract:
Recurrent neural networks (RNNs) have been successfully used on a wide range of sequential data problems. A well-known difficulty in using RNNs is the vanishing or exploding gradient problem. Recently, there have been several different RNN architectures that try to mitigate this issue by maintaining an orthogonal or unitary recurrent weight matrix. One such architecture is the scaled Cayley orthogonal recurrent neural network (scoRNN) which parameterizes the orthogonal recurrent weight matrix through a scaled Cayley transform. This parametrization contains a diagonal scaling matrix consisting of positive or negative one entries that cannot be optimized by gradient descent. Thus the scaling matrix is fixed before training and a hyperparameter is introduced to tune the matrix for each particular task. In this paper, we develop a unitary RNN architecture based on a complex scaled Cayley transform. Unlike the real orthogonal case, the transformation uses a diagonal scaling matrix consisting of entries on the complex unit circle which can be optimized using gradient descent and no longer requires the tuning of a hyperparameter. We also provide an analysis of a potential issue of the modReLU activation function which is used in our work and several other unitary RNNs. In the experiments conducted, the scaled Cayley unitary recurrent neural network (scuRNN) achieves comparable or better results than scoRNN and other unitary RNNs without fixing the scaling matrix.
APA, Harvard, Vancouver, ISO, and other styles
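The abstract describes building the recurrent weight matrix through a scaled Cayley transform of a skew-symmetric matrix A and a diagonal scaling D. Below is a minimal NumPy sketch of that construction, assuming the real (scoRNN-style, plus/minus one diagonal) case with arbitrary illustrative sizes; the complex scuRNN variant would replace D with unit-modulus complex entries.

    import numpy as np

    def scaled_cayley(A, D):
        # Scaled Cayley transform: W = (I + A)^{-1} (I - A) D.
        # With A skew-symmetric and D orthogonal diagonal, W is orthogonal.
        n = A.shape[0]
        I = np.eye(n)
        return np.linalg.solve(I + A, (I - A) @ D)

    rng = np.random.default_rng(0)
    B = rng.standard_normal((4, 4))                   # illustrative size only
    A = B - B.T                                       # skew-symmetric: A^T = -A
    D = np.diag([1.0, -1.0, 1.0, -1.0])               # diagonal of +/-1 entries
    W = scaled_cayley(A, D)
    print(np.allclose(W.T @ W, np.eye(4)))            # W is orthogonal -> True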
4

Bayati. "New Scaled Sufficient Descent Conjugate Gradient Algorithm for Solving Unconstraint Optimization Problems." Journal of Computer Science 6, no. 5 (May 1, 2010): 511–18. http://dx.doi.org/10.3844/jcssp.2010.511.518.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Al-batah, Mohammad Subhi, Mutasem Sh Alkhasawneh, Lea Tien Tay, Umi Kalthum Ngah, Habibah Hj Lateh, and Nor Ashidi Mat Isa. "Landslide Occurrence Prediction Using Trainable Cascade Forward Network and Multilayer Perceptron." Mathematical Problems in Engineering 2015 (2015): 1–9. http://dx.doi.org/10.1155/2015/512158.

Full text
Abstract:
Landslides are one of the dangerous natural phenomena that hinder development on Penang Island, Malaysia. Therefore, finding a reliable method to predict the occurrence of landslides is still a research interest. In this paper, two artificial neural network models, namely the Multilayer Perceptron (MLP) and the Cascade Forward Neural Network (CFNN), are introduced to predict the landslide hazard map of Penang Island. These two models were tested and compared using eleven machine learning algorithms, that is, Levenberg-Marquardt, Broyden-Fletcher-Goldfarb, Resilient Back Propagation, Scaled Conjugate Gradient, Conjugate Gradient with Beale, Conjugate Gradient with Fletcher-Reeves updates, Conjugate Gradient with Polak-Ribière updates, One Step Secant, Gradient Descent, Gradient Descent with Momentum and Adaptive Learning Rate, and Gradient Descent with Momentum. Often, the performance of the landslide prediction depends on the input factors besides the prediction method. In this research work, 14 input factors were used. The prediction accuracies of the networks were verified using the area under the receiver operating characteristic (ROC) curve. The results indicated that the best prediction accuracy of 82.89% was achieved using the CFNN network with the Levenberg-Marquardt learning algorithm for the training data set and 81.62% for the testing data set.
APA, Harvard, Vancouver, ISO, and other styles
6

Al-Naemi, Ghada M., and Ahmed H. Sheekoo. "New scaled algorithm for non-linear conjugate gradients in unconstrained optimization." Indonesian Journal of Electrical Engineering and Computer Science 24, no. 3 (December 1, 2021): 1589. http://dx.doi.org/10.11591/ijeecs.v24.i3.pp1589-1595.

Full text
Abstract:
A new scaled conjugate gradient (SCG) method is proposed in this paper; the SCG technique is an important generalization of the conjugate gradient (CG) method and an efficient numerical method for solving nonlinear large-scale unconstrained optimization. Accordingly, a new SCG method with a strong Wolfe condition (SWC) line search is proposed. The proposed technique's descent property, as well as its global convergence property, are satisfied without the use of any line searches, under some suitable assumptions. The proposed technique's efficiency and feasibility are backed up by numerical experiments comparing it to traditional CG techniques.
APA, Harvard, Vancouver, ISO, and other styles
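The method above is proposed with a strong Wolfe condition (SWC) line search. Independently of the paper's particular SCG update, the standard strong Wolfe conditions can be checked as in this small sketch; the constants c1 and c2 are typical illustrative choices, not values from the paper.

    import numpy as np

    def satisfies_strong_wolfe(f, grad, x, d, alpha, c1=1e-4, c2=0.1):
        # Strong Wolfe conditions for step alpha along descent direction d:
        #   f(x + alpha*d) <= f(x) + c1*alpha*g(x)^T d     (sufficient decrease)
        #   |g(x + alpha*d)^T d| <= c2*|g(x)^T d|          (curvature)
        x_new = x + alpha * d
        gtd = grad(x) @ d                       # directional derivative at x
        armijo = f(x_new) <= f(x) + c1 * alpha * gtd
        curvature = abs(grad(x_new) @ d) <= c2 * abs(gtd)
        return armijo and curvature

    # Example on f(x) = ||x||^2 with the steepest-descent direction.
    f = lambda x: float(x @ x)
    grad = lambda x: 2.0 * x
    x = np.array([1.0, 1.0])
    d = -grad(x)
    print(satisfies_strong_wolfe(f, grad, x, d, alpha=0.5))   # True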
7

Arthur, C. K., V. A. Temeng, and Y. Y. Ziggah. "Performance Evaluation of Training Algorithms in Backpropagation Neural Network Approach to Blast-Induced Ground Vibration Prediction." Ghana Mining Journal 20, no. 1 (July 7, 2020): 20–33. http://dx.doi.org/10.4314/gm.v20i1.3.

Full text
Abstract:
Abstract Backpropagation Neural Network (BPNN) is an artificial intelligence technique that has seen several applications in many fields of science and engineering. It is well-known that, the critical task in developing an effective and accurate BPNN model depends on an appropriate training algorithm, transfer function, number of hidden layers and number of hidden neurons. Despite the numerous contributing factors for the development of a BPNN model, training algorithm is key in achieving optimum BPNN model performance. This study is focused on evaluating and comparing the performance of 13 training algorithms in BPNN for the prediction of blast-induced ground vibration. The training algorithms considered include: Levenberg-Marquardt, Bayesian Regularisation, Broyden–Fletcher–Goldfarb–Shanno (BFGS) Quasi-Newton, Resilient Backpropagation, Scaled Conjugate Gradient, Conjugate Gradient with Powell/Beale Restarts, Fletcher-Powell Conjugate Gradient, Polak-Ribiére Conjugate Gradient, One Step Secant, Gradient Descent with Adaptive Learning Rate, Gradient Descent with Momentum, Gradient Descent, and Gradient Descent with Momentum and Adaptive Learning Rate. Using ranking values for the performance indicators of Mean Squared Error (MSE), correlation coefficient (R), number of training epoch (iteration) and the duration for convergence, the performance of the various training algorithms used to build the BPNN models were evaluated. The obtained overall ranking results showed that the BFGS Quasi-Newton algorithm outperformed the other training algorithms even though the Levenberg Marquardt algorithm was found to have the best computational speed and utilised the smallest number of epochs. Keywords: Artificial Intelligence, Blast-induced Ground Vibration, Backpropagation Training Algorithms
APA, Harvard, Vancouver, ISO, and other styles
8

Abbaspour-Gilandeh, Yousef, Masoud Fazeli, Ali Roshanianfard, Mario Hernández-Hernández, Iván Gallardo-Bernal, and José Luis Hernández-Hernández. "Prediction of Draft Force of a Chisel Cultivator Using Artificial Neural Networks and Its Comparison with Regression Model." Agronomy 10, no. 4 (March 25, 2020): 451. http://dx.doi.org/10.3390/agronomy10040451.

Full text
Abstract:
In this study, artificial neural networks (ANNs) were used to predict the draft force of a rigid tine chisel cultivator. The factorial experiment based on the randomized complete block design (RCBD) was used to obtain the required data and to determine the factors affecting the draft force. The draft force of the chisel cultivator was measured using a three-point hitch dynamometer and data were collected using a DT800 datalogger. A recurrent back-propagation multilayer network was selected to predict the draft force of the cultivator. The gradient descent algorithm with momentum, the Levenberg–Marquardt algorithm, and the scaled conjugate gradient descent algorithm were used for network training. The tangent sigmoid transfer function was the activation function in the layers. The draft force was predicted based on the tillage depth, soil moisture content, soil cone index, and forward speed. The results showed that the developed ANNs with two hidden layers (24 and 26 neurons in the first and second layers, respectively) with the use of the scaled conjugate gradient descent algorithm outperformed the networks developed with other algorithms. The average simulation accuracy and the correlation coefficient for the prediction of draft force of a chisel cultivator were 99.83% and 0.9445, respectively. The linear regression model had a much lower accuracy and correlation coefficient for predicting the draft force compared to the ANNs.
APA, Harvard, Vancouver, ISO, and other styles
9

Sra, Suvrit. "On the Matrix Square Root via Geometric Optimization." Electronic Journal of Linear Algebra 31 (February 5, 2016): 433–43. http://dx.doi.org/10.13001/1081-3810.3196.

Full text
Abstract:
This paper is triggered by the preprint [P. Jain, C. Jin, S.M. Kakade, and P. Netrapalli. Computing matrix squareroot via non convex local search. Preprint, arXiv:1507.05854, 2015.], which analyzes gradient-descent for computing the square root of a positive definite matrix. Contrary to claims of Jain et al., the author’s experiments reveal that Newton-like methods compute matrix square roots rapidly and reliably, even for highly ill-conditioned matrices and without requiring commutativity. The author observes that gradient-descent converges very slowly primarily due to tiny step-sizes and ill-conditioning. The paper derives an alternative first-order method based on geodesic convexity; this method admits a transparent convergence analysis (< 1 page), attains linear rate, and displays reliable convergence even for rank deficient problems. Though superior to gradient-descent, ultimately this method is also outperformed by a well-known scaled Newton method. Nevertheless, the primary value of the paper is conceptual: it shows that for deriving gradient based methods for the matrix square root, the manifold geometric view of positive definite matrices can be much more advantageous than the Euclidean view.
APA, Harvard, Vancouver, ISO, and other styles
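The abstract reports that Newton-like methods compute matrix square roots quickly and reliably. One well-known Newton-type scheme is the Denman-Beavers iteration, sketched below as a generic illustration; it is not necessarily the exact scaled Newton variant the paper compares against.

    import numpy as np

    def denman_beavers_sqrt(A, n_iter=20):
        # Denman-Beavers iteration: Y_{k+1} = (Y_k + Z_k^{-1})/2,
        # Z_{k+1} = (Z_k + Y_k^{-1})/2, with Y_0 = A, Z_0 = I.
        # Then Y_k -> A^{1/2} and Z_k -> A^{-1/2}.
        Y, Z = A.astype(float), np.eye(A.shape[0])
        for _ in range(n_iter):
            Y, Z = 0.5 * (Y + np.linalg.inv(Z)), 0.5 * (Z + np.linalg.inv(Y))
        return Y                                   # approximate A^{1/2}

    # Example on a well-conditioned symmetric positive definite matrix.
    rng = np.random.default_rng(1)
    M = rng.standard_normal((5, 5))
    A = M @ M.T + 5.0 * np.eye(5)
    X = denman_beavers_sqrt(A)
    print(np.allclose(X @ X, A))                   # True up to numerical error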
10

Hamed, Eman T., Rana Z. Al-Kawaz, and Abbas Y. Al-Bayati. "New Investigation for the Liu-Story Scaled Conjugate Gradient Method for Nonlinear Optimization." Journal of Mathematics 2020 (January 25, 2020): 1–12. http://dx.doi.org/10.1155/2020/3615208.

Full text
Abstract:
This article considers modified formulas for the standard conjugate gradient (CG) technique proposed by Li and Fukushima. A new scalar parameter θ_k^New for this CG technique of unconstrained optimization is proposed. The descent condition and global convergence property are established under strong Wolfe conditions. Our numerical experiments show that the new proposed algorithms are more stable and economical as compared to some well-known standard CG methods.
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "Scaled gradient descent"

1

Doan, Thanh-Nghi. "Large scale support vector machines algorithms for visual classification." Thesis, Rennes 1, 2013. http://www.theses.fr/2013REN1S083/document.

Full text
Abstract:
We present two major contributions: 1) a combination of several image descriptors for large-scale classification, and 2) parallel SVM algorithms for large-scale image classification. We also propose an incremental, parallel classification algorithm for the case where the data can no longer fit in main memory.
We have proposed a novel method for combining multiple different features for image classification. For large-scale learning, we have developed parallel versions of both state-of-the-art linear and nonlinear SVM classifiers. We have also proposed a novel algorithm to extend stochastic gradient descent SVMs to large-scale learning. A class of large-scale incremental SVM classifiers has been developed in order to perform classification tasks on large datasets with a very large number of classes, where the training data cannot fit into memory.
APA, Harvard, Vancouver, ISO, and other styles
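The thesis abstract mentions extending stochastic gradient descent SVMs to large-scale learning. As a point of reference only (not the thesis's parallel or incremental algorithms), here is a textbook-style SGD loop for a linear SVM with hinge loss and L2 regularization on synthetic data:

    import numpy as np

    def sgd_linear_svm(X, y, lam=0.01, n_epochs=20, seed=0):
        # Generic Pegasos-style SGD sketch; parameters are illustrative only.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        t = 0
        for _ in range(n_epochs):
            for i in rng.permutation(n):
                t += 1
                eta = 1.0 / (lam * t)              # decaying step size
                margin = y[i] * (X[i] @ w)
                grad = lam * w - (y[i] * X[i] if margin < 1 else 0.0)
                w -= eta * grad                    # subgradient step on hinge loss
        return w

    # Example: two linearly separable Gaussian blobs with labels +/-1.
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(2, 1, (50, 2)), rng.normal(-2, 1, (50, 2))])
    y = np.hstack([np.ones(50), -np.ones(50)])
    w = sgd_linear_svm(X, y)
    print(np.mean(np.sign(X @ w) == y))            # training accuracy, close to 1.0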
2

Akata, Zeynep. "Contributions à l'apprentissage grande échelle pour la classification d'images." Thesis, Grenoble, 2014. http://www.theses.fr/2014GRENM003/document.

Full text
Abstract:
Building algorithms that classify images at large scale has become an essential task, given the difficulty of searching the immense collections of unlabeled visual data present on the Internet. The objective is to classify images according to their content in order to simplify the management of such databases. Large-scale image classification is a complex problem because of the size of the datasets, both in the number of images and in the number of classes. Some of these classes are "fine-grained" (semantically close to one another) and may even contain no labeled representatives. In this thesis, we use state-of-the-art image representations and concentrate on efficient learning methods. Our contributions are (1) a benchmark of learning algorithms for large-scale classification and (2) a new algorithm based on label embedding for learning with scarce data. First, we introduce a benchmark of learning algorithms for large-scale classification in a fully supervised setting. It compares several objective functions for learning linear classifiers, such as one-versus-all, multiclass, ranking, and weighted ranking, using stochastic gradient descent. This benchmark concludes with a set of recommendations for large-scale classification. With a simple reweighting of the data, the one-versus-all strategy performs better than all the others. Moreover, in online learning, a sufficiently small learning step turns out to be enough to obtain state-of-the-art results. Finally, early stopping of stochastic gradient descent introduces a regularization that improves training speed as well as regularization capacity. Second, when faced with thousands of classes, it is sometimes difficult to gather enough training data for each class. In particular, some classes may have no examples at all. Consequently, we propose a new algorithm adapted to this "zero-shot" learning scenario. Our algorithm uses side data, such as attributes, to embed the classes in a Euclidean space. We also introduce a function to measure the compatibility between an image and a label. The parameters of this function are learned using a ranking-type objective. Our algorithm outperforms the state of the art for zero-shot learning and demonstrates great flexibility by allowing other sources of side information, such as hierarchies, to be incorporated. It also allows a smooth transition from the zero-shot case to the case where few examples are available
Building algorithms that classify images on a large scale is an essential task due to the difficulty of searching the massive amount of unlabeled visual data available on the Internet. We aim at classifying images based on their content to simplify the manageability of such large-scale collections. Large-scale image classification is a difficult problem as datasets are large with respect to both the number of images and the number of classes. Some of these classes are fine grained and they may not contain any labeled representatives. In this thesis, we use state-of-the-art image representations and focus on efficient learning methods. Our contributions are (1) a benchmark of learning algorithms for large scale image classification, and (2) a novel learning algorithm based on label embedding for learning with scarce training data. Firstly, we propose a benchmark of learning algorithms for large scale image classification in the fully supervised setting. It compares several objective functions for learning linear classifiers, such as one-vs-rest, multiclass, ranking and weighted average ranking, using stochastic gradient descent optimization. The output of this benchmark is a set of recommendations for large-scale learning. We experimentally show that online learning is well suited for large-scale image classification. With simple data rebalancing, One-vs-Rest performs better than all other methods. Moreover, in online learning, using a small enough step size with respect to the learning rate is sufficient for state-of-the-art performance. Finally, regularization through early stopping results in fast training and good generalization performance. Secondly, when dealing with thousands of classes, it is difficult to collect sufficient labeled training data for each class. For some classes we might not even have a single training example. We propose a novel algorithm for this zero-shot learning scenario. Our algorithm uses side information, such as attributes, to embed classes in a Euclidean space. We also introduce a function to measure the compatibility between an image and a label. The parameters of this function are learned using a ranking objective. Our algorithm outperforms the state-of-the-art for zero-shot learning. It is flexible and can accommodate other sources of side information such as hierarchies. It also allows for a smooth transition from zero-shot to few-shot learning
APA, Harvard, Vancouver, ISO, and other styles
3

Silveti, Falls Antonio. "First-order noneuclidean splitting methods for large-scale optimization : deterministic and stochastic algorithms." Thesis, Normandie, 2021. http://www.theses.fr/2021NORMC204.

Full text
Abstract:
In this work, we develop and examine two new first-order splitting algorithms for solving large-scale composite optimization problems in infinite-dimensional spaces. Such problems are central to many scientific and engineering fields, in particular data science and imaging. Our work focuses on relaxing the Lipschitz-smoothness assumptions generally required by first-order splitting algorithms by replacing the Euclidean energy with a Bregman divergence. These developments make it possible to solve problems with more exotic geometry than that of the usual Euclidean setting. One of the algorithms developed is a hybridization of the conditional gradient algorithm, which uses a linear minimization oracle at each iteration, with the augmented Lagrangian method, thereby allowing affine constraints to be handled. The other algorithm is a primal-dual splitting scheme incorporating Bregman divergences for computing the associated proximal operators. For both algorithms, we show convergence of the Lagrangian values, weak convergence of the iterates to solutions, and convergence rates. In addition to these new deterministic algorithms, we also introduce and study their stochastic extensions from the viewpoint of a perturbation stability analysis. Our results in this part include almost-sure convergence results for the same quantities as in the deterministic setting, again with rates. Finally, we address new problems that are accessible only through the relaxed assumptions our algorithms allow. We demonstrate numerical efficiency and illustrate our theoretical results on problems such as low-rank sparse matrix completion, inverse problems on the simplex, and inverse problems involving the regularized Wasserstein distance
In this work we develop and examine two novel first-order splitting algorithms for solving large-scale composite optimization problems in infinite-dimensional spaces. Such problems are ubiquitous in many areas of science and engineering, particularly in data science and imaging sciences. Our work is focused on relaxing the Lipschitz-smoothness assumptions generally required by first-order splitting algorithms by replacing the Euclidean energy with a Bregman divergence. These developments allow one to solve problems having more exotic geometry than that of the usual Euclidean setting. One algorithm is a hybridization of the conditional gradient algorithm, making use of a linear minimization oracle at each iteration, with an augmented Lagrangian algorithm, allowing for affine constraints. The other algorithm is a primal-dual splitting algorithm incorporating Bregman divergences for computing the associated proximal operators. For both of these algorithms, our analysis shows convergence of the Lagrangian values, subsequential weak convergence of the iterates to solutions, and rates of convergence. In addition to these novel deterministic algorithms, we also introduce and study the stochastic extensions of these algorithms through a perturbation perspective. Our results in this part include almost sure convergence results for all the same quantities as in the deterministic setting, with rates as well. Finally, we tackle new problems that are only accessible through the relaxed assumptions our algorithms allow. We demonstrate numerical efficiency and verify our theoretical results on problems like low-rank, sparse matrix completion, inverse problems on the simplex, and entropically regularized Wasserstein inverse problems
APA, Harvard, Vancouver, ISO, and other styles
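The thesis abstract centers on replacing the Euclidean energy with a Bregman divergence. A classical instance of that idea is entropic mirror descent (exponentiated gradient) on the probability simplex, sketched here as a generic illustration rather than the thesis's primal-dual or conditional-gradient schemes:

    import numpy as np

    def entropic_mirror_descent(grad, x0, step=0.1, n_iter=500):
        # Bregman (negative-entropy) proximal step on the simplex reduces to a
        # multiplicative update followed by renormalization.
        x = np.asarray(x0, dtype=float)
        for _ in range(n_iter):
            x = x * np.exp(-step * grad(x))        # multiplicative (Bregman) update
            x /= x.sum()                           # renormalize onto the simplex
        return x

    # Example: min <c, x> over the simplex; the optimum puts all mass on argmin c.
    c = np.array([0.3, 0.1, 0.5])
    grad = lambda x: c
    x = entropic_mirror_descent(grad, x0=np.full(3, 1.0 / 3.0))
    print(np.round(x, 3))                          # mass concentrates on index 1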

Book chapters on the topic "Scaled gradient descent"

1

Bottou, Léon. "Large-Scale Machine Learning with Stochastic Gradient Descent." In Proceedings of COMPSTAT'2010, 177–86. Heidelberg: Physica-Verlag HD, 2010. http://dx.doi.org/10.1007/978-3-7908-2604-3_16.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Shi, Ziqiang, and Rujie Liu. "Large Scale Optimization with Proximal Stochastic Newton-Type Gradient Descent." In Machine Learning and Knowledge Discovery in Databases, 691–704. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-23528-8_43.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Yating, Chen. "Cooperation Coevolution Differential Evolution with Gradient Descent Strategy for Large Scale." In Lecture Notes in Computer Science, 429–39. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-61824-1_47.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Sharma, Sweta, and Reshma Rastogi. "Stochastic Conjugate Gradient Descent Twin Support Vector Machine for Large Scale Pattern Classification." In AI 2018: Advances in Artificial Intelligence, 590–602. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-03991-2_54.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Bottou, Léon. "Large-Scale Machine Learning with Stochastic Gradient Descent." In Chapman & Hall/CRC Computer Science & Data Analysis, 17–25. Chapman and Hall/CRC, 2011. http://dx.doi.org/10.1201/b11429-4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Surono, Sugiyarto, Aris Thobirin, Zani Anjani Rafsanjani Hsm, Asih Yuli Astuti, Berlin Ryan Kp, and Milla Oktavia. "Optimization of Fuzzy System Inference Model on Mini Batch Gradient Descent." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2022. http://dx.doi.org/10.3233/faia220387.

Full text
Abstract:
Optimization is one of the factors in machine learning that helps model training during backpropagation. It is conducted by adjusting the weights to minimize the loss function and to overcome dimensionality problems. The gradient descent method is a simple approach in the backpropagation model for solving minimum problems. Mini-batch gradient descent (MBGD) is one of the methods proven to be powerful for large-scale learning. The addition of several approaches to MBGD, such as AB, BN, and UR, can accelerate the convergence process, making the algorithm faster and more effective. The added method performs an optimization process on the results of the processed data rule as its objective function. The processing results showed that the MBGD-AB-BN-UR method has a more stable computational time across the three data sets than the other methods. For model evaluation, this research used RMSE, MAE, and MAPE.
APA, Harvard, Vancouver, ISO, and other styles
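The chapter builds on mini-batch gradient descent (MBGD). Its AB, BN, and UR additions are not reproduced here; this bare-bones sketch only shows the basic batching scheme, applied to a least-squares example for illustration:

    import numpy as np

    def mini_batch_gd(X, y, batch_size=32, lr=0.1, n_epochs=100, seed=0):
        # Minimal MBGD loop: shuffle, split into mini-batches, take a gradient
        # step per batch. Hyperparameters are illustrative, not from the chapter.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_epochs):
            order = rng.permutation(n)
            for start in range(0, n, batch_size):
                idx = order[start:start + batch_size]
                Xb, yb = X[idx], y[idx]
                grad = Xb.T @ (Xb @ w - yb) / len(idx)   # least-squares gradient
                w -= lr * grad
        return w

    # Example: recover the weights of a noisy linear model.
    rng = np.random.default_rng(3)
    X = rng.standard_normal((500, 4))
    w_true = np.array([1.0, -2.0, 0.5, 3.0])
    y = X @ w_true + 0.01 * rng.standard_normal(500)
    print(np.round(mini_batch_gd(X, y), 2))              # close to w_true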
7

"Large-Scale Machine Learning with Stochastic Gradient Descent Léon Bottou." In Statistical Learning and Data Science, 33–42. Chapman and Hall/CRC, 2011. http://dx.doi.org/10.1201/b11429-6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Naik, Bighnaraj, Janmenjoy Nayak, and H. S. Behera. "A Hybrid Model of FLANN and Firefly Algorithm for Classification." In Handbook of Research on Natural Computing for Optimization Problems, 491–522. IGI Global, 2016. http://dx.doi.org/10.4018/978-1-5225-0058-2.ch021.

Full text
Abstract:
Over the last decade, biologically inspired optimization techniques have been of keen interest to the optimization research community. Some of the well-developed and advanced popular algorithms, such as GA and PSO, have been found to perform well on large-scale problems. In this chapter, a recently developed nature-inspired firefly algorithm is combined with an efficient higher-order functional link neural network for the classification of real-world data. The main advantage of the firefly algorithm is its ability to obtain globally optimal solutions where some of the earlier swarm intelligence algorithms fail to do so. For learning the neural network, efficient gradient descent learning is used to optimize the weights. The proposed method is able to classify non-linear data more efficiently with a lower error rate. Under the null hypothesis, the proposed method has been tested with various statistical methods to prove its statistical significance.
APA, Harvard, Vancouver, ISO, and other styles
9

Mennour, Rostom, and Mohamed Batouche. "Novel Scalable Deep Learning Approaches for Big Data Analytics Applied to ECG Processing." In Deep Learning and Neural Networks, 633–53. IGI Global, 2020. http://dx.doi.org/10.4018/978-1-7998-0414-7.ch035.

Full text
Abstract:
Big data analytics and deep learning are nowadays two of the most active research areas in computer science. As data become bigger and bigger, deep learning has a very important role to play in data analytics, and big data technologies will give it huge opportunities in different sectors. Deep learning brings new challenges, especially when it comes to large amounts of data: the volume of the datasets has to be processed and managed, and data in many applications arrive in a streaming fashion, which deep learning approaches must also handle. In this paper, the authors propose two novel approaches for discriminative deep learning, namely LS-DSN and StreamDSN, that are inspired by the deep stacking network algorithm. Two versions of the gradient descent algorithm were used to train the proposed algorithms. The experimental results have shown that the algorithms give satisfying accuracy and scale well as the size of the data increases. In addition, the StreamDSN algorithm has been applied to classify beats of ECG signals and provided promising results.
APA, Harvard, Vancouver, ISO, and other styles
10

Amitab, Khwairakpam, Debdatta Kandar, and Arnab K. Maji. "Speckle Noise Filtering Using Back-Propagation Multi-Layer Perceptron Network in Synthetic Aperture Radar Image." In Research Advances in the Integration of Big Data and Smart Computing, 280–301. IGI Global, 2016. http://dx.doi.org/10.4018/978-1-4666-8737-0.ch016.

Full text
Abstract:
Synthetic Aperture Radar (SAR) is an imaging radar that uses electromagnetic radiation to illuminate the scanned surface and produce high-resolution images in all weather conditions, day and night. Interference of signals causes noise and degrades the quality of the image, which creates serious difficulties in analyzing the images. Speckle is multiplicative noise that inherently exists in SAR images. Artificial Neural Networks (ANNs) have the capability of learning and are gaining popularity in SAR image processing. The Multi-Layer Perceptron (MLP) is a feed-forward artificial neural network model that consists of an input layer, several hidden layers, and an output layer. We have simulated an MLP with two hidden layers in MATLAB. Speckle noise was added to the target SAR image and the MLP was applied for speckle noise reduction. It is found that speckle noise in SAR images can be reduced by using an MLP. We have considered the log-sigmoid, tan-sigmoid and linear transfer functions for the hidden layers. The MLP networks are trained using gradient descent with momentum backpropagation, resilient backpropagation, and Levenberg-Marquardt backpropagation, and their performance is comparatively evaluated.
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Scaled gradient descent"

1

Mishra, Bamdev, and Rodolphe Sepulchre. "Scaled stochastic gradient descent for low-rank matrix completion." In 2016 IEEE 55th Conference on Decision and Control (CDC). IEEE, 2016. http://dx.doi.org/10.1109/cdc.2016.7798689.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

"SCALED GRADIENT DESCENT LEARNING RATE - Reinforcement learning with light-seeking robot." In First International Conference on Informatics in Control, Automation and Robotics. SciTePress - Science and and Technology Publications, 2004. http://dx.doi.org/10.5220/0001138600030011.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Wu, Jui-Yu, and Pei-Ci Liu. "Identifying a Default of Credit Card Clients by using a LSTM Method: A Case Study." In 8th International Conference on Artificial Intelligence (ARIN 2022). Academy and Industry Research Collaboration Center (AIRCC), 2022. http://dx.doi.org/10.5121/csit.2022.121012.

Full text
Abstract:
Detecting fraudulent transactions is critical and challenging for financial banks and institutions. This study used a deep learning technique, the long short-term memory (LSTM) method, to identify defaults of credit card clients (an imbalanced dataset). To evaluate the performance of optimizers for the LSTM approach, this study employed three gradient-based optimizers: adaptive moment estimation (Adam), stochastic gradient descent with momentum (Sgdm), and root mean square propagation (Rmsprop). This study used 10-fold cross-validation. Moreover, this study compared the best numerical results of the LSTM method with those of supervised machine learning classifiers, namely a back-propagation neural network (BPNN) with a gradient descent algorithm (GDA) and with a scaled conjugate gradient algorithm (SCGA). Numerical results indicate that the LSTM-Adam and the BPNN-SCGA classifiers have identical performance, and that selecting an appropriate classification threshold value is important for an imbalanced dataset. Based on the numerical results, the LSTM-Adam classifier can be considered for dealing with credit scoring problems, which are binary classification problems.
APA, Harvard, Vancouver, ISO, and other styles
4

Zhou, Fan, and Guojing Cong. "On the Convergence Properties of a K-step Averaging Stochastic Gradient Descent Algorithm for Nonconvex Optimization." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/447.

Full text
Abstract:
We adopt and analyze a synchronous K-step averaging stochastic gradient descent algorithm which we call K-AVG for solving large scale machine learning problems. We establish the convergence results of K-AVG for nonconvex objectives. Our analysis of K-AVG applies to many existing variants of synchronous SGD. We explain why the K-step delay is necessary and leads to better performance than traditional parallel stochastic gradient descent which is equivalent to K-AVG with K = 1. We also show that K-AVG scales better with the number of learners than asynchronous stochastic gradient descent (ASGD). Another advantage of K-AVG over ASGD is that it allows larger stepsizes and facilitates faster convergence. On a cluster of 128 GPUs, K-AVG is faster than ASGD implementations and achieves better accuracies and faster convergence for training with the CIFAR-10 dataset.
APA, Harvard, Vancouver, ISO, and other styles
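The abstract describes K-step averaging SGD: each of several learners runs K local SGD steps and the local models are then averaged. Below is a serial simulation of that scheme on a simple least-squares problem, as a generic sketch rather than the paper's distributed implementation:

    import numpy as np

    def k_avg_sgd(X, y, P=4, K=8, lr=0.05, n_rounds=50, batch_size=16, seed=0):
        # Simulated K-AVG: P learners each take K local SGD steps starting from
        # the shared model, then the local models are averaged synchronously.
        # All hyperparameters here are illustrative assumptions.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)                            # shared (averaged) model
        for _ in range(n_rounds):
            local_models = []
            for _ in range(P):                     # each learner starts from w
                w_local = w.copy()
                for _ in range(K):                 # K local SGD steps
                    idx = rng.integers(0, n, batch_size)
                    g = X[idx].T @ (X[idx] @ w_local - y[idx]) / batch_size
                    w_local -= lr * g
                local_models.append(w_local)
            w = np.mean(local_models, axis=0)      # synchronous K-step averaging
        return w

    # Example: recover the weights of a noisy linear model.
    rng = np.random.default_rng(7)
    X = rng.standard_normal((1000, 5))
    w_true = rng.standard_normal(5)
    y = X @ w_true + 0.01 * rng.standard_normal(1000)
    print(np.allclose(k_avg_sgd(X, y), w_true, atol=0.05))   # True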
5

Tang, Jingjing, Yingjie Tian, Guoqiang Wu, and Dewei Li. "Stochastic gradient descent for large-scale linear nonparallel SVM." In WI '17: International Conference on Web Intelligence 2017. New York, NY, USA: ACM, 2017. http://dx.doi.org/10.1145/3106426.3109427.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Gemulla, Rainer, Erik Nijkamp, Peter J. Haas, and Yannis Sismanis. "Large-scale matrix factorization with distributed stochastic gradient descent." In the 17th ACM SIGKDD international conference. New York, New York, USA: ACM Press, 2011. http://dx.doi.org/10.1145/2020408.2020426.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Mu, Yang, Wei Ding, Tianyi Zhou, and Dacheng Tao. "Constrained stochastic gradient descent for large-scale least squares problem." In KDD' 13: The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 2013. http://dx.doi.org/10.1145/2487575.2487635.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Gupta, Suyog, Wei Zhang, and Fei Wang. "Model Accuracy and Runtime Tradeoff in Distributed Deep Learning: A Systematic Study." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/681.

Full text
Abstract:
Deep learning with a large number of parameters requires distributed training, where model accuracy and runtime are two important factors to be considered. However, there has been no systematic study of the tradeoff between these two factors during the model training process. This paper presents Rudra, a parameter server based distributed computing framework tuned for training large-scale deep neural networks. Using variants of the asynchronous stochastic gradient descent algorithm we study the impact of synchronization protocol, stale gradient updates, minibatch size, learning rates, and number of learners on runtime performance and model accuracy. We introduce a new learning-rate modulation strategy to counter the effect of stale gradients and propose a new synchronization protocol that can effectively bound the staleness in gradients, improve runtime performance and achieve good model accuracy. Our empirical investigation reveals a principled approach for distributed training of neural networks: the mini-batch size per learner should be reduced as more learners are added to the system to preserve the model accuracy. We validate this approach using commonly-used image classification benchmarks: CIFAR10 and ImageNet.
APA, Harvard, Vancouver, ISO, and other styles
9

Li, Fengan, Lingjiao Chen, Yijing Zeng, Arun Kumar, Xi Wu, Jeffrey F. Naughton, and Jignesh M. Patel. "Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent." In SIGMOD/PODS '19: International Conference on Management of Data. New York, NY, USA: ACM, 2019. http://dx.doi.org/10.1145/3299869.3300070.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Zhang, Tong. "Solving large scale linear prediction problems using stochastic gradient descent algorithms." In Twenty-first international conference. New York, New York, USA: ACM Press, 2004. http://dx.doi.org/10.1145/1015330.1015332.

Full text
APA, Harvard, Vancouver, ISO, and other styles
