Dissertations / Theses: 'Neural Network Pruning'

1

Scalco, Alberto <1993>. "Feature Selection Using Neural Network Pruning." Master's Degree Thesis, Università Ca' Foscari Venezia, 2019. http://hdl.handle.net/10579/14382.

Abstract:

Feature selection is a well-known data preprocessing technique that removes redundant and irrelevant information, with benefits that include improved generalization and a reduced curse of dimensionality. This paper investigates an approach based on a trained neural network model, in which features are selected by iteratively removing one node at a time from the input layer. The pruning process comprises a node selection criterion and a subsequent weight correction: after a node is eliminated, the remaining weights are adjusted so that the overall network behaviour does not worsen over the entire training set. The pruning problem is formulated as a system of linear equations solved in a least-squares sense. The method allows direct evaluation of performance at each iteration, and a stopping condition is also proposed. Finally, experimental results are presented in comparison with another feature selection method.
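The weight correction described in this abstract can be illustrated with a minimal sketch. It is not the thesis's actual formulation, which the abstract does not give. It assumes that the removed input column is fitted by least squares from the remaining input columns, and that each hidden unit then absorbs its share of the removed input. The function names `solve_linear` and `prune_input` are hypothetical, and the node selection criterion and stopping condition are not shown.

```python
def solve_linear(a, b):
    """Solve a @ x = b for a square matrix a (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def prune_input(weights, inputs, j):
    """Remove input feature j and correct the remaining first-layer weights.

    weights: one row per hidden unit, one column per input feature.
    inputs:  training samples, one row per sample.
    Assumption: the kept columns are linearly independent, so the
    normal equations below have a unique solution.
    """
    keep = [k for k in range(len(inputs[0])) if k != j]
    # Least-squares fit of column j from the kept columns:
    # minimise sum_p (x_pj - sum_k c_k * x_pk)^2 via the normal equations.
    gram = [[sum(row[a] * row[b] for row in inputs) for b in keep] for a in keep]
    rhs = [sum(row[a] * row[j] for row in inputs) for a in keep]
    coeffs = solve_linear(gram, rhs)
    # Each hidden unit redistributes its weight on the removed input.
    return [[w[k] + w[j] * c for k, c in zip(keep, coeffs)] for w in weights]
```

When the removed feature is an exact linear combination of the kept ones, the hidden-layer pre-activations on the training set are unchanged. Otherwise the correction is only the best fit in the least-squares sense.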

2

Labarge, Isaac E. "Neural Network Pruning for ECG Arrhythmia Classification." DigitalCommons@CalPoly, 2020. https://digitalcommons.calpoly.edu/theses/2136.

Abstract:

Convolutional Neural Networks (CNNs) are a widely accepted means of solving complex classification and detection problems in imaging and speech. However, problem complexity often leads to considerable increases in computation and parameter storage costs. Many successful attempts have been made in effectively reducing these overheads by pruning and compressing large CNNs with only a slight decline in model accuracy. In this study, two pruning methods are implemented and compared on the CIFAR-10 database and an ECG arrhythmia classification task. Each pruning method employs a pruning phase interleaved with a finetuning phase. It is shown that when performing the scale-factor pruning algorithm on ECG, finetuning time can be expedited by 1.4 times over the traditional approach with only 10% of expensive floating-point operations retained, while experiencing no significant impact on accuracy.

3

Brantley, Kiante. "BCAP: An Artificial Neural Network Pruning Technique to Reduce Overfitting." Thesis, University of Maryland, Baltimore County, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10140605.

Abstract:

Determining the optimal size of a neural network is complicated. Neural networks with many free parameters can be used to solve very complex problems. However, these networks are susceptible to overfitting. BCAP (Brantley-Clark Artificial Neural Network Pruning Technique) addresses overfitting by combining duplicate neurons in a hidden layer, thereby forcing the network to learn more distinct features. We compare hidden units using cosine similarity and combine those whose similarity to each other falls within a threshold ϵ. Doing so reduces the co-adaptation of neurons in the network, because hidden units that are highly correlated (i.e. similar) are combined. In this paper we present evidence that BCAP reduces network size while maintaining, or even improving, the accuracy of neural networks during and after training.
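The comparison step described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not BCAP itself. The abstract does not say which vectors are compared or how units are combined, so the sketch compares incoming weight vectors and sums the outgoing weights of merged units. The function names and the reading of "within a threshold ϵ" as `1 - cosine <= eps` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_similar_units(w_in, w_out, eps):
    """Greedily merge hidden units whose incoming weights are nearly parallel.

    w_in[i]:  incoming weight vector of hidden unit i.
    w_out[i]: outgoing weight vector of hidden unit i.
    Assumption: a merged unit keeps the incoming weights of the first unit
    in its group and the sum of the group's outgoing weights.
    """
    kept_in, kept_out = [], []
    for vec_in, vec_out in zip(w_in, w_out):
        for idx, rep in enumerate(kept_in):
            if 1.0 - cosine(vec_in, rep) <= eps:
                # Fold this unit into the existing representative.
                kept_out[idx] = [a + b for a, b in zip(kept_out[idx], vec_out)]
                break
        else:
            kept_in.append(list(vec_in))
            kept_out.append(list(vec_out))
    return kept_in, kept_out
```

Summing outgoing weights preserves the layer's output exactly only when the merged units have identical incoming weights (and biases, which are omitted here). For units that are merely similar, it is an approximation.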

4

Hubens, Nathan. "Towards lighter and faster deep neural networks with parameter pruning." Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAS025.

Abstract:

Since their resurgence in 2012, deep neural networks have become ubiquitous in most disciplines of artificial intelligence, such as image recognition, speech processing, and natural language processing. However, over the last few years, neural networks have grown exponentially deeper, involving more and more parameters. Nowadays, it is not unusual to encounter architectures involving several billion parameters, whereas less than ten years ago they mostly contained thousands. This generalized increase in the number of parameters makes such large models compute-intensive and essentially energy-inefficient. It makes deployed models costly to maintain and their use in resource-constrained environments very challenging. For these reasons, much research has been conducted to provide techniques that reduce the amount of storage and computing required by neural networks. Among those techniques, neural network pruning, which consists in creating sparsely connected models, has recently been at the forefront of research. However, although pruning is a prevalent compression technique, there is currently no standard way of implementing or evaluating novel pruning techniques, which makes comparison with previous research challenging. Our first contribution thus concerns a novel description of pruning techniques, developed along four axes, that allows us to define existing pruning techniques unequivocally and completely. Those components are: the granularity, the context, the criteria, and the schedule. Defining the pruning problem according to those components allows us to subdivide it into four mostly independent subproblems and to better identify potential lines of research. Moreover, pruning methods are still at an early stage of development and are primarily designed for the research community. Indeed, most pruning works are implemented in a self-contained and sophisticated way, making it troublesome for non-researchers to apply such techniques without having to learn all the intricacies of the field. To fill this gap, we proposed the FasterAI toolbox, intended to be helpful both to researchers eager to create and experiment with different compression techniques and to newcomers who wish to compress their neural networks for concrete applications. In particular, the sparsification capabilities of FasterAI have been built according to the previously defined pruning components, allowing for a seamless mapping between research ideas and their implementation. We then propose four theoretical contributions, each aiming to provide new insights and improve on state-of-the-art methods along one of the four identified description axes. These contributions were realized using the previously developed toolbox, thus validating its scientific utility. Finally, to validate the applicability of the pruning technique, we selected a use case: the detection of facial manipulation, also called DeepFake detection. The goal is to demonstrate that the developed tool, as well as the different proposed scientific contributions, can be applied to a complex, real-world problem. This last contribution is accompanied by a proof-of-concept application providing DeepFake detection capabilities in a web-based environment, thus allowing anyone to perform detection on an image or video of their choice. The current deep learning era has emerged thanks to considerable improvements in high-performance hardware and access to large amounts of data. However, since the decline of Moore's Law, experts suggest that we might observe a shift in how we conceptualize hardware, moving from task-agnostic to domain-specialized computation and thus leading to a new era of collaboration between the software, hardware, and machine learning communities. This new quest for more efficiency will thus undeniably go through neural network compression techniques, and particularly sparse computations.
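The four description axes named in this abstract (granularity, context, criteria, schedule) can be made concrete with a small sketch. The specific choices below are illustrative assumptions, not the thesis's definitions and not the FasterAI API: individual weights as granularity, a local versus global context, weight magnitude as criterion, and a linear schedule. The function names `magnitude_mask` and `linear_schedule` are hypothetical.

```python
def magnitude_mask(layers, sparsity, context="local"):
    """Return 0/1 masks that zero out the smallest-magnitude weights.

    layers:   dict mapping a layer name to a flat list of weights
              (granularity: individual weights).
    sparsity: fraction of weights to remove, between 0 and 1.
    context:  "local" ranks weights within each layer,
              "global" ranks them across all layers together.
    Criterion: absolute weight value. Ties at the threshold are all pruned,
    so slightly more than the requested fraction may be removed.
    """
    def threshold(values):
        ranked = sorted(abs(v) for v in values)
        k = int(sparsity * len(ranked))
        return ranked[k - 1] if k > 0 else -1.0

    if context == "global":
        t = threshold([v for ws in layers.values() for v in ws])
        limits = {name: t for name in layers}
    else:
        limits = {name: threshold(ws) for name, ws in layers.items()}
    return {name: [0 if abs(v) <= limits[name] else 1 for v in ws]
            for name, ws in layers.items()}


def linear_schedule(final_sparsity, steps):
    """Schedule: sparsity targets that grow linearly to final_sparsity."""
    return [final_sparsity * (i + 1) / steps for i in range(steps)]
```

With a local context every layer loses the same fraction of its weights, whereas a global context can empty a layer whose weights are all small. This is one reason the context axis matters independently of the criterion.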

5

Santacroce, Michael. "Neural Classification of Malware-As-Video with Considerations for In-Hardware Inferencing." University of Cincinnati / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1554216974556897.

6

Dupont, Robin. "Deep Neural Network Compression for Visual Recognition." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS565.

Abstract:

Thanks to the miniaturisation of electronics, embedded devices have become ubiquitous since the 2010s, performing various tasks around us. As their usage expands, there is an increasing demand for devices that process data and make complex decisions efficiently. Deep neural networks are powerful tools for this, but they are often too large and intricate for embedded systems. Therefore, methods to compress these networks without compromising their performance are crucial. This PhD thesis introduces two pruning-based methods that compress networks while maintaining accuracy. The thesis first details a budget-aware method for compressing large neural networks using weight reparametrisation and a budget loss, eliminating the need for fine-tuning. Traditional pruning methods often rely on post-training indicators to remove weights, ignoring the desired pruning rate. Our method incorporates a budget loss that steers pruning towards a precise sparsity during training, enabling simultaneous optimisation of topology and weights. By soft-pruning the smallest weights via reparametrisation during training, we reduce accuracy loss compared to standard pruning. We validate our method on several datasets and architectures. The thesis then examines the extraction of efficient subnetworks without weight training. We aim to find the best subnetwork topology within a large network, without optimising its weights, while still achieving strong performance. This is realised with our Arbitrarily Shifted Log Parametrisation, a differentiable method for sampling discrete topologies, which makes it possible to train masks that denote the probability of selecting each weight. Additionally, a weight recalibration technique, Smart Rescale, is presented. It boosts the performance of extracted subnetworks and speeds up their training. Our method also identifies the optimal pruning rate after a single training run, avoiding hyperparameter search and a separate training for each rate. Through extensive tests, our technique consistently surpasses comparable state-of-the-art methods, creating lightweight networks that achieve high sparsity without notable accuracy drops.

7

Prono, Luciano. "Methods and Applications for Low-power Deep Neural Networks on Edge Devices." Doctoral thesis, Politecnico di Torino, 2023. https://hdl.handle.net/11583/2976593.

8

Zullich, Marco. "Un'analisi delle Tecniche di Potatura in Reti Neurali Profonde: Studi Sperimentali ed Applicazioni." Doctoral thesis, Università degli Studi di Trieste, 2023. https://hdl.handle.net/11368/3041099.

Abstract:

Pruning, in the context of machine learning, denotes the act of removing parameters from parametric models such as linear models, decision trees, and Artificial Neural Networks (ANNs). Pruning can be motivated by several needs, first and foremost reducing the size and memory footprint of a model, ideally without hurting its accuracy. The scientific community's interest in pruning ANNs has increased substantially in the last decade due to the dramatic growth in the size of these models. This growth can hinder the deployment of ANNs on lower-end computers and poses an obstacle to the democratization of artificial intelligence. Recent advances in pruning techniques have shown empirically that a large portion of parameters (sometimes over 99%) can be removed with minimal or no loss in accuracy. Despite this, open questions remain, especially regarding the inner dynamics of pruning, e.g., how the features learned by pruned ANNs relate to those of their dense counterparts, or the ability of pruned ANNs to generalize to data or environments unseen during training. In addition, pruning is often computationally expensive and raises notable issues of high energy consumption and pollution. We present some approaches for tackling these issues: comparing the representations/features learned by pruned ANNs, improving the time efficiency of pruning techniques, and applying pruning to simulated robots, with an eye on generalization. Finally, we showcase the use of pruning to deploy, on a low-end device with limited memory, a large object detection model for face mask detection, envisioning a future application of the model to video surveillance.

9

Yvinec, Edouard. "Efficient Neural Networks: Post Training Pruning and Quantization." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS581.

Abstract:

Deep neural networks have grown to be the most widely adopted models for most computer vision and natural language processing tasks. Since the renewed interest in these architectures sparked in 2012, their size, in terms of both memory footprint and computational cost, has increased tremendously, which has hindered their deployment. In particular, with the rising interest in generative AI such as large language models and diffusion models, this phenomenon has recently reached new heights, as these models can comprise several billion parameters and require multiple high-end GPUs to run inference in real time. In response, the deep learning community has developed methods to compress and accelerate these models: efficient architecture design, tensor decomposition, pruning, and quantization. In this manuscript, I paint a landscape of the current state of the art in deep neural network compression and acceleration, as well as my contributions to the field. First, I give a general introduction to the aforementioned techniques and highlight their shortcomings and current challenges. Second, I provide a detailed discussion of my contributions to deep neural network pruning. These contributions led to the publication of three articles: RED, RED++ and SInGE. In RED and RED++, I introduced a novel way to perform data-free pruning and tensor decomposition based on redundancy reduction. By contrast, in SInGE, I proposed a new importance-based criterion for data-driven pruning. This criterion was inspired by attribution techniques, which rank inputs by their relative importance with respect to the final prediction. In SInGE, I adapted one of the most effective attribution techniques to rank weight importance for pruning. In the third chapter, I lay out my contributions to the field of deep quantization: SPIQ, PowerQuant, REx, NUPES, and a best-practice paper. Each of these methods addresses one of the previous limitations of post-training quantization. In SPIQ, PowerQuant and REx, I provide, respectively, a solution to the granularity limitations of quantization, a novel non-uniform format that is particularly effective on transformer architectures, and a technique for quantization decomposition that eliminates the need for unsupported bit-widths. In the two remaining articles, I provide significant improvements over existing gradient-based post-training quantization techniques, bridging the gap between such techniques and non-uniform quantization. In the last chapter, I propose a set of leads for future work on what I believe are currently the most important unanswered questions in the field.

10

Brigandì, Camilla. "Utilizzo della omologia persistente nelle reti neurali." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2022.

Abstract:

The aim of this thesis is to introduce some applications of algebraic topology, and in particular of persistent homology theory, to neural networks. To this end, the first chapter introduces the concepts of neuron and artificial neural network. Particular attention is given to the training of a network, including related issues and characteristics such as overfitting and the ability to generalize. The same chapter also presents the notion of similarity between two networks and the concept of pruning, and gives a rigorous definition of classification problems. The second chapter introduces the basic notions of persistent homology, provides tools for visualizing and comparing them (barcodes and persistence diagrams), and presents methods for constructing simplicial complexes from graphs or from sets of points in R^d. The third and final chapter reports the applications referred to at the beginning of the abstract. In particular, it presents research based on persistent homology concerning the construction of measures of expressivity and similarity for neural architectures, the development of a pruning method, the construction of a neural network robust to adversarial attacks of a given size, and some results on the topological modification of the data processed by a neural architecture.

11

Riera Villanueva, Marc. "Low-power accelerators for cognitive computing." Doctoral thesis, Universitat Politècnica de Catalunya, 2020. http://hdl.handle.net/10803/669828.

Abstract:

Deep Neural Networks (DNNs) have achieved tremendous success for cognitive applications, and are especially efficient in classification and decision making problems such as speech recognition or machine translation. Mobile and embedded devices increasingly rely on DNNs to understand the world. Smartphones, smartwatches and cars perform discriminative tasks, such as face or object recognition, on a daily basis. Despite the increasing popularity of DNNs, running them on mobile and embedded systems comes with several main challenges: delivering high accuracy and performance with a small memory and energy budget. Modern DNN models consist of billions of parameters requiring huge computational and memory resources and, hence, they cannot be directly deployed on low-power systems with limited resources. The objective of this thesis is to address these issues and propose novel solutions in order to design highly efficient custom accelerators for DNN-based cognitive computing systems. In first place, we focus on optimizing the inference of DNNs for sequence processing applications. We perform an analysis of the input similarity between consecutive DNN executions. Then, based on the high degree of input similarity, we propose DISC, a hardware accelerator implementing a Differential Input Similarity Computation technique to reuse the computations of the previous execution, instead of computing the entire DNN. We observe that, on average, more than 60% of the inputs of any neural network layer tested exhibit negligible changes with respect to the previous execution. Avoiding the memory accesses and computations for these inputs results in 63% energy savings on average. In second place, we propose to further optimize the inference of FC-based DNNs. We first analyze the number of unique weights per input neuron of several DNNs. 
Exploiting common optimizations, such as linear quantization, we observe a very small number of unique weights per input for several FC layers of modern DNNs. Then, to improve the energy-efficiency of FC computation, we present CREW, a hardware accelerator that implements a Computation Reuse and an Efficient Weight Storage mechanism to exploit the large number of repeated weights in FC layers. CREW greatly reduces the number of multiplications and provides significant savings in model memory footprint and memory bandwidth usage. We evaluate CREW on a diverse set of modern DNNs. On average, CREW provides 2.61x speedup and 2.42x energy savings over a TPU-like accelerator. In third place, we propose a mechanism to optimize the inference of RNNs. RNN cells perform element-wise multiplications across the activations of different gates, sigmoid and tanh being the common activation functions. We perform an analysis of the activation function values, and show that a significant fraction are saturated towards zero or one in popular RNNs. Then, we propose CGPA to dynamically prune activations from RNNs at a coarse granularity. CGPA avoids the evaluation of entire neurons whenever the outputs of peer neurons are saturated. CGPA significantly reduces the amount of computations and memory accesses while avoiding sparsity by a large extent, and can be easily implemented on top of conventional accelerators such as TPU with negligible area overhead, resulting in 12% speedup and 12% energy savings on average for a set of widely used RNNs. Finally, in the last contribution of this thesis we focus on static DNN pruning methodologies. DNN pruning reduces memory footprint and computational work by removing connections and/or neurons that are ineffectual. However, we show that prior pruning schemes require an extremely time-consuming iterative process that requires retraining the DNN many times to tune the pruning parameters. 
Then, we propose a DNN pruning scheme based on Principal Component Analysis and the relative importance of each neuron's connections that automatically finds the optimized DNN in one shot, without the need to manually tune multiple parameters.
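The differential reuse idea behind DISC in the abstract above can be illustrated in software for a single fully connected layer. This is only a minimal sketch under stated assumptions: the function name `fc_differential`, the `threshold` parameter, and the restriction to the linear (pre-activation) part of the layer are illustrative choices, not the accelerator design described in the thesis.

```python
import numpy as np

def fc_differential(weights, x_prev, y_prev, x_curr, threshold=0.0):
    """Update a fully connected layer's pre-activation output incrementally.

    weights: (n_out, n_in) matrix; x_prev and x_curr: (n_in,) inputs of two
    consecutive executions; y_prev: (n_out,) output already computed for x_prev.
    Inputs whose change is at most `threshold` are treated as unchanged
    (hypothetical parameter, chosen for illustration).
    """
    delta = x_curr - x_prev
    changed = np.abs(delta) > threshold  # inputs that need recomputation
    # Only the weight columns of changed inputs contribute to the correction.
    y_curr = y_prev + weights[:, changed] @ delta[changed]
    # Inputs treated as unchanged keep their old value, so the stored input
    # stays consistent with the stored output on the next execution.
    x_used = np.where(changed, x_curr, x_prev)
    return y_curr, x_used, int(changed.sum())
```

With `threshold=0.0` the result equals a full recomputation up to floating-point rounding; a positive threshold trades accuracy for fewer multiplications and weight reads.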

12

Faraone, Julian. "Simplification Of Deep Neural Networks For Efficient Inference." Thesis, The University of Sydney, 2021. https://hdl.handle.net/2123/25846.

Abstract:

In recent years, Deep Neural Networks (DNNs) have become an area of high interest owing to their ground-breaking results in many fields and applications. In many of these applications, however, the runtime and memory cost of inference matters more than the cost of training the model. Inference is computationally expensive, which makes these models difficult to deploy in constrained hardware environments. This has led to growing interest in model compression techniques. In this thesis, model compression techniques are presented for achieving efficient representations of DNNs for hardware acceleration. Firstly, a weight pruning technique to achieve unstructured sparse representations of bitwise DNNs is explored on the MNIST and CIFAR10 datasets, accompanied by a hardware exploration of the resulting representations. Secondly, a hardware-aware filter pruning technique to achieve structured sparse representations of bitwise DNNs is investigated on the ImageNet dataset, and hardware performance improvements are evaluated via a Field Programmable Gate Array (FPGA) implementation. Thirdly, a quantization method is introduced for training highly accurate bitwise networks with high computational efficiency on the ImageNet dataset. A hardware architecture is designed for this representation and its performance is evaluated via FPGA simulations. Lastly, a custom arithmetic is designed which utilizes FPGA-optimized multipliers. Additionally, a training methodology is presented which is customized for DNN models to be compatible with the multiplier. Together, this work illustrates the effectiveness of designing DNNs with hardware in mind. In addition, designing customized hardware helps in optimizing accuracy and hardware efficiency. This is very useful for many real-world DNN applications where hardware performance is paramount.

13

Chan, Kin Wah. "Pruning of hidden Markov model with optimal brain surgeon /." View Abstract or Full-Text, 2003. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202003%20CHAN.

Abstract:

Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2003.
Includes bibliographical references (leaves 72-76). Also available in electronic version. Access restricted to campus users.

14

Gaopande, Meghana Laxmidhar. "Exploring Accumulated Gradient-Based Quantization and Compression for Deep Neural Networks." Thesis, Virginia Tech, 2020. http://hdl.handle.net/10919/98617.

Abstract:

The growing complexity of neural networks makes their deployment on resource-constrained embedded or mobile devices challenging. With millions of weights and biases, modern deep neural networks can be computationally intensive, with large memory, power and computational requirements. In this thesis, we devise and explore three quantization methods (post-training, in-training and combined quantization) that quantize 32-bit floating-point weights and biases to lower bit width fixed-point parameters while also achieving significant pruning, leading to model compression. We use the total accumulated absolute gradient over the training process as the indicator of importance of a parameter to the network. The most important parameters are quantized by the smallest amount. The post-training quantization method sorts and clusters the accumulated gradients of the full parameter set and subsequently assigns a bit width to each cluster. The in-training quantization method sorts and divides the accumulated gradients into two groups after each training epoch. The larger group consisting of the lowest accumulated gradients is quantized. The combined quantization method performs in-training quantization followed by post-training quantization. We assume storage of the quantized parameters using compressed sparse row format for sparse matrix storage. On LeNet-300-100 (MNIST dataset), LeNet-5 (MNIST dataset), AlexNet (CIFAR-10 dataset) and VGG-16 (CIFAR-10 dataset), post-training quantization achieves 7.62x, 10.87x, 6.39x and 12.43x compression, in-training quantization achieves 22.08x, 21.05x, 7.95x and 12.71x compression and combined quantization achieves 57.22x, 50.19x, 13.15x and 13.53x compression, respectively. Our methods quantize at the cost of accuracy, and we present our work in the light of the accuracy-compression trade-off.
Master of Science
Neural networks are being employed in many different real-world applications. By learning the complex relationship between the input data and ground-truth output data during the training process, neural networks can predict outputs on new input data obtained in real time. To do so, a typical deep neural network often needs millions of numerical parameters, stored in memory. In this research, we explore techniques for reducing the storage requirements for neural network parameters. We propose software methods that convert 32-bit neural network parameters to values that can be stored using fewer bits. Our methods also convert a majority of numerical parameters to zero. Using special storage methods that only require storage of non-zero parameters, we gain significant compression benefits. On typical benchmarks like LeNet-300-100 (MNIST dataset), LeNet-5 (MNIST dataset), AlexNet (CIFAR-10 dataset) and VGG-16 (CIFAR-10 dataset), our methods can achieve up to 57.22x, 50.19x, 13.15x and 13.53x compression respectively. Storage benefits are achieved at the cost of classification accuracy, and we present our work in the light of the accuracy-compression trade-off.
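The importance measure described in this abstract (total accumulated absolute gradient) and the idea of quantizing the least important parameters can be sketched as follows. This is a simplified illustration, not the thesis's implementation: the function names, the flat 1-D weight vector, the fixed `fraction` split, and the fixed-point format are all assumptions made for the example.

```python
import numpy as np

def accumulate_abs_gradients(grad_history):
    """Sum |gradient| per parameter over all recorded training steps."""
    return np.sum(np.abs(np.asarray(grad_history)), axis=0)

def quantize_fixed_point(values, bits, frac_bits):
    """Round to signed fixed point with `bits` total and `frac_bits` fractional bits."""
    scale = 2.0 ** frac_bits
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(values * scale), lo, hi) / scale

def quantize_least_important(weights, importance, fraction, bits, frac_bits):
    """Quantize the `fraction` of a 1-D weight vector with the lowest importance.

    `fraction`, `bits` and `frac_bits` are illustrative parameters.
    """
    k = int(round(fraction * weights.size))
    idx = np.argsort(importance, kind="stable")[:k]  # least important first
    out = weights.copy()
    out[idx] = quantize_fixed_point(weights[idx], bits, frac_bits)
    return out, idx
```

Small weights that fall below the fixed-point resolution round to zero, which is one way a quantization step of this kind can also prune parameters.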

15

Wolinski, Pierre. "Structural Learning of Neural Networks." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASS026.

Abstract:

The structure of a neural network determines to a large extent its cost of training and use, as well as its ability to learn. These two aspects are usually in competition: the larger a neural network is, the better it will perform the task assigned to it, but the more memory and computing time it will require for training. Automating the search for efficient network structures (of reasonable size, yet performing well on the task) is therefore a widely studied question in this area. Within this context, neural networks with various structures are trained, which requires a new set of training hyperparameters for each new structure tested. The aim of the thesis is to address different aspects of this problem. The first contribution is a training method that operates across a wide range of network structures and tasks, without needing to adjust the learning rate. The second contribution is a network training and pruning technique, designed to be insensitive to the initial width of the network. The last contribution is mainly a theorem that makes it possible to translate an empirical training penalty into a theoretically well-founded Bayesian prior. This work results from a search for properties that training and pruning algorithms must theoretically satisfy to be valid over a wide range of neural networks and objectives.

16

Kubisz, Jan. "Využití umělé inteligence k monitorování stavu obráběcího stroje." Master's thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2020. http://www.nusl.cz/ntk/nusl-417752.

Abstract:

This diploma thesis focuses on designing the internal structure of an artificial neural network capable of monitoring the state of a machine and predicting its remaining useful life. The main goal is to create algorithms and a library for designing and training artificial neural networks, and in the process to gain a deeper understanding of the subject than would be obtained by using existing libraries. The selected method is a feed-forward network with a multilayer perceptron architecture trained by backpropagation. The resulting network was able to determine the state of a part from vibration measurements and, on that basis, predict its remaining useful life.

17

Weman, Nicklas. "Empirical Investigation of the Effect of Pruning Artificial Neural Networks With Respect to Increased Generalization Ability." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-60112.

Abstract:

This final thesis covers the basics of artificial neural networks, with focus on supervised learning, pruning and the problem of achieving good generalization ability. An empirical investigation is conducted on twelve different problems originating from the Proben1 benchmark collection. The results indicate that pruning is more likely to improve generalization if the data is sensitive to overfitting or if the networks are likely to be trapped in local minima.

18

Strömberg, Lucas. "Optimizing Convolutional Neural Networks for Inference on Embedded Systems." Thesis, Uppsala universitet, Signaler och system, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-444802.

Abstract:

Convolutional neural networks (CNNs) are state-of-the-art machine learning models used for various computer vision problems, such as image recognition. As these networks normally need a vast number of parameters, they can be computationally expensive, which complicates deployment on embedded hardware, especially if there are constraints on, for instance, latency, memory or power consumption. This thesis examines the CNN optimization methods pruning and quantization, in order to explore how they affect not only model accuracy, but also possible inference latency speedup. Four baseline CNN models, based on popular and relevant architectures, were implemented and trained on the CIFAR-10 dataset. The networks were then quantized or pruned for various optimization parameters. All models can be successfully quantized to both 5-bit weights and activations, or pruned with 70% sparsity without any substantial effect on accuracy. The larger baseline models are generally more robust and can be quantized more aggressively; however, they are also more sensitive to low-bit activations. Moreover, for 8-bit integer quantization the networks were implemented on an ARM Cortex-A72 microprocessor, where inference latency was studied. These fixed-point models achieve up to 5.5x inference speedup on the ARM processor, compared to the 32-bit floating-point baselines. The larger models gain more speedup from quantization than the smaller ones. While the results are not necessarily generalizable to different CNN architectures or datasets, the valuable insights obtained in this thesis can be used as starting points for further investigations in model optimization and possible effects on accuracy and embedded inference latency.
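The two optimizations named in this abstract, pruning to a target sparsity and 8-bit integer quantization, can be sketched on a single weight tensor. The abstract does not say which pruning criterion or quantization scheme the thesis uses, so magnitude-based pruning and symmetric per-tensor quantization are assumptions here, as are the function names.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(round(sparsity * weights.size))
    if k == 0:
        return weights.copy()
    order = np.argsort(np.abs(weights).ravel(), kind="stable")[:k]
    mask = np.ones(weights.size, dtype=bool)
    mask[order] = False  # drop the k smallest-magnitude entries
    return weights * mask.reshape(weights.shape)

def quantize_int8(weights):
    """Symmetric per-tensor quantization to int8; returns (q, scale)."""
    max_abs = np.max(np.abs(weights))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale
```

Multiplying `q` by `scale` recovers an approximation of the original weights with a per-element error of at most half a quantization step.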

19

Bonfiglioli, Luca. "Identificazione efficiente di reti neurali sparse basata sulla Lottery Ticket Hypothesis." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020.

Abstract:

Given a randomly initialized dense network, Frankle and Carbin (2018) show that it contains sparse subnetworks that can reach higher accuracy than the dense network and need fewer training iterations to reach early stopping. These subnetworks are called winning tickets. Identifying them, however, requires at least one complete training run of the dense model, which limits their practical use to that of a compression technique. This thesis seeks a more efficient variant of the magnitude-based pruning methods proposed in the literature, evaluating several heuristic and data-driven methods for obtaining winning tickets without completing the training of the dense network. In comparison with the results of Zhou et al. (2019), it shows that the accuracy of a winning ticket at initialization is not predictive of the final accuracy reached after training and that, consequently, optimizing accuracy at initialization does not guarantee equally high accuracy after retraining. It also shows the existence of good tickets, that is, a whole spectrum of sparse networks whose performance is comparable, along at least one dimension, to that of winning tickets, and that subnetworks in this category can be identified after only a few training iterations of the initial dense network. These sparse networks are identified in a manner similar to that proposed by You et al. (2020), by predicting the winning ticket before the training of the dense network is complete. Using heuristics other than magnitude-based pruning to make these predictions is shown to yield significantly better predictions on the architectures examined, at marginally higher computational cost.

20

Medeiros, Cláudio Marques de Sá. "Uma contribuição ao problema de seleção de modelos neurais usando o princípio de máxima correlação dos erros." Universidade Federal do Ceará, 2008. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=2132.

Abstract:

This thesis proposes a new pruning method which eliminates redundant weights in a multilayer perceptron (MLP). Conventional pruning techniques, like Optimal Brain Surgeon (OBS) and Optimal Brain Damage (OBD), are based on weight sensitivity analysis, which requires the inversion of the Hessian matrix of the loss function (e.g., mean squared error). This inversion is especially susceptible to numerical problems due to poor conditioning of the Hessian matrix and demands great computational effort. Another kind of pruning method is based on the regularization of the loss function, but it requires the determination of the regularization parameter by trial and error. The proposed method is based on the "Maximum Correlation Errors Principle" (MAXCORE). The idea in this principle is to evaluate the importance of each network connection by calculating the cross correlation between the errors in a layer and the back-propagated errors in the preceding layer, starting from the output layer and working through the network until the input layer is reached. The connections which have larger correlations remain and the others are pruned from the network. The evident advantage of this procedure is its simplicity, since neither matrix inversion nor parameter adjustment is necessary. The performance of the proposed method is evaluated in pattern classification tasks and the results are compared to those achieved by the OBS/OBD techniques and also by a regularization-based method. For this purpose, artificial data sets are used to highlight some important characteristics of the proposed methodology. Furthermore, well-known benchmarking data sets, such as IRIS, WINE and DERMATOLOGY, are also used for the sake of evaluation. A real-world biomedical data set related to pathologies of the vertebral column is also used.
The results obtained show that the proposed method achieves equivalent or superior performance compared to conventional pruning methods, with the additional advantages of low computational cost and simplicity. The proposed method also prunes input units efficiently, which suggests its use as a feature selection method.

21

You, Shi Xian, and 游世賢. "The growing and pruning of neural network learning." Thesis, 1996. http://ndltd.ncl.edu.tw/handle/91967731658427789731.

22

Gaikwad, Akash. "Pruning Convolution Neural Network (SqueezeNet) for Efficient Hardware Deployment." Thesis, 2019.

Abstract:

In recent years, deep learning models have become popular in real-time embedded applications, but hardware deployment is complicated by limited resources such as memory, computational power, and energy. Recent research in the field of deep learning focuses on reducing the model size of the Convolution Neural Network (CNN) by various compression techniques such as architectural compression, pruning, quantization, and encoding (e.g., Huffman encoding). Network pruning is one of the promising techniques for solving these problems.

This thesis proposes methods to prune the convolution neural network (SqueezeNet) without introducing network sparsity in the pruned model.

This thesis proposes three methods to prune the CNN to decrease the model size of CNN without a significant drop in the accuracy of the model.

1: Pruning based on Taylor expansion of change in cost function Delta C.

2: Pruning based on L₂ normalization of activation maps.

3: Pruning based on a combination of method 1 and method 2.

The proposed methods use various ranking methods to rank the convolution kernels and prune the lower-ranked filters; afterwards, the SqueezeNet model is fine-tuned by backpropagation. A transfer learning technique is used to train SqueezeNet on the CIFAR-10 dataset. Results show that the proposed approach reduces the SqueezeNet model by 72% without a significant drop in the accuracy of the model (optimal pruning efficiency result). Results also show that pruning based on a combination of the Taylor expansion of the cost function and L₂ normalization of activation maps achieves better pruning efficiency than either pruning criterion alone, and that most of the pruned kernels are from mid- and high-level layers. The pruned model was deployed on BlueBox 2.0 using RTMaps software and its performance was evaluated.
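The three ranking criteria listed in this abstract can be sketched for the channels of one convolutional layer, given its activation maps and the gradients of the cost with respect to them. This is a hedged illustration: the abstract does not state how the two criteria are combined, so summing the L2-normalized scores is an assumption, as are the function names and the array layout (batch, channels, height, width).

```python
import numpy as np

def taylor_scores(activations, gradients):
    """First-order Taylor estimate of |Delta C| per channel.

    gradients holds dC/d(activation) with the same shape as activations.
    """
    per_sample = np.abs(np.mean(activations * gradients, axis=(2, 3)))
    return per_sample.mean(axis=0)

def l2_scores(activations):
    """Mean L2 norm of each channel's activation map over the batch."""
    flat = activations.reshape(activations.shape[0], activations.shape[1], -1)
    return np.linalg.norm(flat, axis=2).mean(axis=0)

def normalize(scores, eps=1e-12):
    """Scale a score vector to unit L2 norm so the criteria are comparable."""
    return scores / (np.linalg.norm(scores) + eps)

def lowest_ranked(activations, gradients, n_prune):
    """Indices of the n_prune channels with the smallest combined score."""
    combined = (normalize(taylor_scores(activations, gradients))
                + normalize(l2_scores(activations)))  # assumed combination rule
    return np.argsort(combined, kind="stable")[:n_prune]
```

Removing the returned channels as whole filters, rather than zeroing individual weights, keeps the pruned layers dense, which is consistent with the abstract's goal of pruning without introducing sparsity.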

23

Gaikwad, Akash S. "Pruning Convolution Neural Network (SqueezeNet) for Efficient Hardware Deployment." Thesis, 2018. http://hdl.handle.net/1805/17923.

Abstract:

Indiana University-Purdue University Indianapolis (IUPUI)
In recent years, deep learning models have become popular in real-time embedded applications, but hardware deployment is complicated by limited resources such as memory, computational power, and energy. Recent research in the field of deep learning focuses on reducing the model size of the Convolution Neural Network (CNN) by various compression techniques such as architectural compression, pruning, quantization, and encoding (e.g., Huffman encoding). Network pruning is one of the promising techniques for solving these problems. This thesis proposes methods to prune the convolution neural network (SqueezeNet) without introducing network sparsity in the pruned model. This thesis proposes three methods to prune the CNN to decrease the model size of the CNN without a significant drop in the accuracy of the model. 1: Pruning based on Taylor expansion of change in cost function Delta C. 2: Pruning based on L2 normalization of activation maps. 3: Pruning based on a combination of method 1 and method 2. The proposed methods use various ranking methods to rank the convolution kernels and prune the lower-ranked filters; afterwards, the SqueezeNet model is fine-tuned by backpropagation. A transfer learning technique is used to train SqueezeNet on the CIFAR-10 dataset. Results show that the proposed approach reduces the SqueezeNet model by 72% without a significant drop in the accuracy of the model (optimal pruning efficiency result). Results also show that pruning based on a combination of the Taylor expansion of the cost function and L2 normalization of activation maps achieves better pruning efficiency than either pruning criterion alone, and that most of the pruned kernels are from mid- and high-level layers. The pruned model was deployed on BlueBox 2.0 using RTMaps software and its performance was evaluated.

24

Fan, En-Yu, and 樊恩宇. "Convolutional Neural Network Pruning by Training-based Important Channel Identification." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/rch56c.

Abstract:

Master's thesis
National Taiwan University
Graduate Institute of Electronics Engineering
107
Despite the tremendous success of convolutional neural networks (CNNs) in various applications, their deployment is greatly obstructed by their high computational cost and large memory usage. Many approaches have been proposed to prune the network channel-wise. Nevertheless, most either consider the interrelations of channels independently of training, or prune the network layer by layer using only the statistics of an individual layer or of two consecutive layers. In this work, we devise a strategy that introduces the concepts of Scoring Network (SN) and Importance of Channels (IofC) into training for channel pruning. Specifically, we take the interdependencies of channels into account by incorporating them into the training phase, and jointly prune the channels of every layer based on the trained model. Experimental results on multiple datasets with several modern CNN models demonstrate that our method can produce promising reductions in both parameters and floating point operations (FLOPs) for modern CNN frameworks, while the performance loss is negligible; in some cases performance even improves relative to the unpruned counterparts.

25

AlShahrani, Mona. "Towards an Efficient Artificial Neural Network Pruning and Feature Ranking Tool." Thesis, 2015. http://hdl.handle.net/10754/555862.

Abstract:

Artificial Neural Networks (ANNs) are known to be among the most effective and expressive machine learning models. Their impressive abilities to learn have been reflected in many broad application domains such as image recognition, medical diagnosis, online banking, robotics, dynamic systems, and many others. ANNs with multiple layers of complex non-linear transformations (a.k.a. Deep ANNs) have recently shown successful results in the areas of computer vision and speech recognition. ANNs are parametric models that approximate unknown functions in which parameter values (weights) are adapted during training. An ANN's weights can be large in number and thus render the trained model more complex, with a risk of "overfitting" the training data. In this study, we explore the effects of network pruning on the performance of ANNs and on the ranking of features that describe the data. A simplified ANN model has fewer parameters, requires less computation and trains faster. We investigate the use of Hessian-based pruning algorithms as well as simpler ones (i.e. non-Hessian-based) on nine datasets with varying numbers of input features and ANN parameters. The Hessian-based Optimal Brain Surgeon algorithm (OBS) is robust but slow. Therefore a faster parallel Hessian approximation is provided. An additional speedup is provided using a variant we name 'Simple n Optimal Brain Surgeon' (SNOBS), which represents a good compromise between robustness and time efficiency. For some of the datasets, the ANN pruning experiments show on average a 91% reduction in the number of ANN parameters and about 60% - 90% in the number of ANN input features, while maintaining accuracy comparable to or better than the case when no pruning is applied. Finally, we show through a comprehensive comparison with seven state-of-the-art feature filtering methods that the feature selection and ranking obtained as a byproduct of the ANN pruning is comparable in accuracy to these methods.

26

"Extended Kalman filter based pruning algorithms and several aspects of neural network learning." 1998. http://library.cuhk.edu.hk/record=b6073079.

Abstract:

by John Pui-Fai Sum.
Thesis (Ph.D.)--Chinese University of Hong Kong, 1998.
Includes bibliographical references (p. 155-[163]).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Mode of access: World Wide Web.

27

Xu, Zhiwei. "Applications of Markov Random Field Optimization and 3D Neural Network Pruning in Computer Vision." Phd thesis, 2022. http://hdl.handle.net/1885/258295.

Abstract:

Recent years have witnessed the rapid development of Convolutional Neural Networks (CNNs) in various computer vision applications that were traditionally addressed by Markov Random Field (MRF) optimization methods. Even though CNN-based methods achieve high accuracy in these tasks, finely detailed results remain difficult to achieve. For instance, a pairwise MRF optimization method can segment objects using auxiliary edge information through its second-order terms, which a deep neural network is not guaranteed to achieve. MRF optimization methods, in contrast, can enhance performance with explicit theoretical and experimental support through iterative energy minimization. Secondly, such an edge detector can be learned by CNNs, so transferring what one CNN has learned to another task becomes valuable. It is desirable to fuse the superpixel contours from one state-of-the-art CNN with the semantic segmentation results from another, so that the object contours in the semantic segmentation align with the superpixel contours. This kind of fusion is not limited to semantic segmentation; it also applies to other tasks that benefit from the collective effect of multiple off-the-shelf CNNs. While fusing multiple CNNs is useful for enhancing performance, each such CNN is usually specifically designed and trained with an empirical configuration of resources. With a large batch size, however, joint CNN training may run out of GPU memory. This problem commonly arises when CNNs must be trained efficiently with limited resources, and it is more obvious and severe in 3D CNNs than in 2D CNNs because of their higher training resource requirements. To solve the first problem, we propose two fast and differentiable message passing algorithms, namely Iterative Semi-Global Matching Revised (ISGMR) and Parallel Tree-Reweighted Message Passing (TRWP), for both energy minimization problems and deep learning applications.
Our experiments on stereo vision and image inpainting datasets validate the effectiveness and efficiency of our methods, which reach minimum energies comparable to the state-of-the-art algorithm TRWS and greatly improve forward and backward propagation speed using CUDA programming on massively parallel trees. Applying these two methods to deep learning semantic segmentation on PASCAL VOC 2012 with Canny edges yields enhanced segmentation results as measured by mean Intersection over Union (mIoU). For the second problem, to effectively fuse and finetune multiple CNNs, we present a transparent initialization module that identically maps the output of a multiple-layer module to its input at the early stage of finetuning. The pretrained model parameters then gradually diverge during training as the loss decreases. This transparent initialization has a higher initialization rate than Net2Net and a higher recovery rate than random initialization and Xavier initialization. Our experiments validate the effectiveness of the proposed transparent initialization and the sparse encoder with sparse matrix operations. The edges of segmented objects achieve a higher performance ratio and a higher F-measure than other comparable methods. For the third problem, to compress a CNN effectively, especially a resource-inefficient 3D CNN, we propose a single-shot neuron pruning method with resource constraints. The pruning principle is to remove neurons with low importance, that is, neurons whose connections have small sensitivities. A reweighting strategy based on the layerwise consumption of memory or FLOPs improves pruning by avoiding the infeasible pruning of whole layers. Our experiments on the point cloud dataset ShapeNet and the medical image dataset BraTS'18 demonstrate the effectiveness of our method. Applying our method to video classification on the UCF101 dataset using MobileNetV2 and I3D further demonstrates its benefits.
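
The pruning principle described in the abstract above (removing neurons whose connections have small sensitivities) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's method: connection sensitivity is taken to be the magnitude of gradient times weight, a neuron's importance is assumed to be the normalized sum over its incoming connections, and the resource-aware layerwise reweighting is omitted. The function names and the toy values are hypothetical.

```python
def neuron_importance(weights, grads):
    """Return one normalized importance score per neuron.

    weights[j][i] and grads[j][i] describe incoming connection i of neuron j;
    grads holds the loss gradient with respect to each weight.
    """
    # Connection sensitivity: magnitude of gradient times weight.
    raw = [
        [abs(w * g) for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, grads)
    ]
    total = sum(sum(row) for row in raw)
    if total == 0.0:
        # No gradient signal at all: treat every neuron as equally important.
        return [1.0 / len(raw)] * len(raw)
    # Assumption: neuron importance = sum of its connection sensitivities.
    return [sum(row) / total for row in raw]


def neurons_to_keep(importance, keep):
    """Indices of the `keep` most important neurons, in ascending order."""
    ranked = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    # Toy layer with three neurons and two incoming connections each.
    w = [[1.0, -2.0], [0.1, 0.1], [0.5, 0.5]]
    g = [[0.5, 0.5], [0.1, -0.1], [1.0, 1.0]]
    scores = neuron_importance(w, g)
    print(scores, neurons_to_keep(scores, keep=2))
```

Because the scores are computed once from a single set of gradients, the sketch is single-shot in the sense used in the abstract: no retraining happens between scoring and pruning.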

28

Alfarra, Motasem. "Applications of Tropical Geometry in Deep Neural Networks." Thesis, 2020. http://hdl.handle.net/10754/662473.

Abstract:

This thesis tackles the problem of understanding deep neural networks with piecewise linear activation functions. We leverage tropical geometry, a relatively new field in algebraic geometry, to characterize the decision boundaries of a single-hidden-layer neural network. This characterization is leveraged to understand and reformulate three interesting applications related to deep neural networks. First, we give a geometrical demonstration of the behaviour of the lottery ticket hypothesis. Moreover, we deploy the geometrical characterization of the decision boundaries to reformulate the network pruning problem. This new formulation aims to prune network parameters that do not contribute to the geometrical representation of the decision boundaries. In addition, we propose a dual view of adversarial attacks that tackles both designing perturbations to the input image and the equivalent perturbation to the decision boundaries.

29

Rao, Sreenivasa M. "DNN: A new neural network architecture of associative memory with pruning and order-sensitive learning and its applications." Thesis, 1998. http://hdl.handle.net/2009/726.

30

Laurent, César. "Advances in parameterisation, optimisation and pruning of neural networks." Thesis, 2020. http://hdl.handle.net/1866/25592.

Abstract:

Neural networks are a family of Machine Learning models able to learn complex tasks directly from the data. Although already producing impressive results in many areas such as speech recognition, computer vision or machine translation, there are still a lot of challenges in both training and deployment of neural networks. In particular, training neural networks typically requires huge amounts of computational resources, and trained models are often too big or too computationally expensive to be deployed on resource-limited devices, such as smartphones or low-power chips. The articles presented in this thesis investigate solutions to these different issues. The first two articles focus on improving the training of Recurrent Neural Networks (RNNs), networks specially designed to process sequential data. RNNs are notoriously hard to train, so we propose to improve their parameterisation by upgrading them with Batch Normalisation (BN), a very effective parameterisation which was hitherto used only in feed-forward networks. In the first article, we apply BN to the input-to-hidden connections of the RNNs, thereby reducing internal covariate shift between layers. In the second article, we show how to apply it to both input-to-hidden and hidden-to-hidden connections of the Long Short-Term Memory (LSTM), a popular RNN architecture, thus also reducing internal covariate shift between time steps. Our experiments show that these proposed parameterisations allow for faster and better training of RNNs on several benchmarks. In the third article, we propose a new optimiser to accelerate the training of neural networks. Traditional diagonal optimisers, such as RMSProp, operate in parameter coordinates, which is not optimal when several parameters are updated at the same time. Instead, we propose to apply such optimisers in a basis in which the diagonal approximation is likely to be more effective.
We leverage the same approximation used in Kronecker-factored Approximate Curvature (K-FAC) to efficiently build this Kronecker-factored Eigenbasis (KFE). Our experiments show improvements over K-FAC in training speed for several deep network architectures. The last article focuses on network pruning, the action of removing parameters from the network, in order to reduce its memory footprint and computational cost. Typical pruning methods rely on first or second order Taylor approximations of the loss landscape to identify which parameters can be discarded. We propose to study the impact of the assumptions behind such approximations. Moreover, we systematically compare methods based on first and second order approximations with Magnitude Pruning (MP), showing how they perform both before and after a fine-tuning phase. Our experiments show that better preserving the original network function does not necessarily transfer to better performing networks after fine-tuning, suggesting that only considering the impact of pruning on the loss might not be a sufficient objective to design good pruning criteria.
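
To make the comparison in the abstract above more concrete, the sketch below contrasts magnitude pruning with generic first-order and second-order Taylor criteria. It is an illustrative assumption rather than the thesis's implementation: the second-order score uses only a diagonal Hessian, and the function names and toy numbers are hypothetical. Setting a weight w to zero is a step of -w, so the Taylor estimates of the loss change are -g·w (first order) and -g·w + ½·h·w² (second order, diagonal).

```python
def magnitude_score(w, g, h):
    """Magnitude pruning: small absolute weights are removed first."""
    return abs(w)


def first_order_score(w, g, h):
    """First-order Taylor estimate of the loss change when w is set to 0."""
    return abs(g * w)


def second_order_score(w, g, h):
    """Second-order Taylor estimate using a diagonal Hessian entry h."""
    return abs(-g * w + 0.5 * h * w * w)


def prune(weights, grads, hess_diag, sparsity, score):
    """Zero out the fraction `sparsity` of weights with the lowest score."""
    n_prune = int(len(weights) * sparsity)
    scores = [score(w, g, h) for w, g, h in zip(weights, grads, hess_diag)]
    # sorted() is stable, so ties are broken by the original index order.
    order = sorted(range(len(weights)), key=lambda i: scores[i])
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    w = [0.5, -0.1, 2.0, 0.05]
    g = [0.0, 3.0, 0.01, 0.0]
    h = [1.0, 1.0, 1.0, 1.0]
    for criterion in (magnitude_score, first_order_score, second_order_score):
        print(criterion.__name__, prune(w, g, h, 0.5, criterion))
```

On this toy input the criteria disagree: magnitude pruning removes the small weight with a large gradient, while the Taylor criteria keep it. This only shows that the criteria can select different weights; it says nothing about which one performs better after fine-tuning.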

31

Fletcher, Lizelle. "Statistical modelling by neural networks." Thesis, 2002. http://hdl.handle.net/10500/600.

Abstract:

In this thesis the two disciplines of Statistics and Artificial Neural Networks are combined into an integrated study of a data set from a weather modification experiment. An extensive literature study on artificial neural network methodology has revealed the strongly interdisciplinary nature of the research and the applications in this field. As artificial neural networks are becoming increasingly popular with data analysts, statisticians are becoming more involved in the field. A recursive algorithm is developed to optimize the number of hidden nodes in a feedforward artificial neural network to demonstrate how existing statistical techniques such as nonlinear regression and the likelihood-ratio test can be applied in innovative ways to develop and refine neural network methodology. This pruning algorithm is an original contribution to the field of artificial neural network methodology that simplifies the process of architecture selection, thereby reducing the number of training sessions needed to find a model that fits the data adequately. In addition, a statistical model to classify weather modification data is developed using both a feedforward multilayer perceptron artificial neural network and a discriminant analysis. The two models are compared and the effectiveness of applying an artificial neural network model to a relatively small data set is assessed. The formulation of the problem, the approach that has been followed to solve it and the novel modelling application all combine to make an original contribution to the interdisciplinary fields of Statistics and Artificial Neural Networks as well as to the discipline of meteorology.
Mathematical Sciences
D. Phil. (Statistics)
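
The abstract above mentions using the likelihood-ratio test to decide on the number of hidden nodes. The abstract does not give the algorithm itself, so the following is only a generic sketch of how such a test could be set up. It assumes Gaussian errors, so that the statistic for two nested networks reduces to n·ln(RSS_reduced / RSS_full), and it compares that statistic against standard 5% chi-square critical values. The function names and the example figures are hypothetical, and the chi-square approximation may be unreliable for neural networks.

```python
import math

# Standard chi-square critical values at the 5% level, keyed by degrees of freedom.
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def lr_statistic(n, rss_reduced, rss_full):
    """Likelihood-ratio statistic for nested models under Gaussian errors.

    n:           number of training observations.
    rss_reduced: residual sum of squares of the smaller network.
    rss_full:    residual sum of squares of the larger network.
    """
    return n * math.log(rss_reduced / rss_full)


def hidden_node_is_redundant(n, rss_reduced, rss_full, extra_params):
    """True if the larger network does not fit significantly better.

    extra_params is the number of weights the larger network adds;
    it serves as the degrees of freedom of the test (an assumption).
    """
    return lr_statistic(n, rss_reduced, rss_full) < CHI2_CRIT_5PCT[extra_params]


if __name__ == "__main__":
    # Hypothetical fits: dropping one hidden node barely changes the residuals.
    print(hidden_node_is_redundant(100, 10.2, 10.0, extra_params=3))
```

A recursive procedure could repeat this comparison, removing one hidden node at a time until the test rejects the smaller network.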

32

Hübsch, Ondřej. "Redukce počtu parametrů v konvolučních neuronových sítích." Master's thesis, 2021. http://www.nusl.cz/ntk/nusl-447970.

Abstract:

In the current deep learning era, convolutional neural networks are commonly used as a backbone of systems that process images or videos. Many existing neural network architectures are, however, needlessly overparameterized, and their performance can be closely matched by an alternative that uses a much smaller number of parameters. Our aim is to design a method that is able to find such alternative(s) for a given convolutional architecture. We propose a general scheme for architecture reduction and evaluate three algorithms that search for the optimal smaller architecture. We perform multiple experiments with ResNet and Wide ResNet architectures as the base, using the CIFAR-10 dataset. The best method is able to reduce the number of parameters by 75-85% without any loss in accuracy, even in these already quite efficient architectures.

33

Petříčková, Zuzana. "Umělé neuronové sítě a jejich využití při extrakci znalostí." Doctoral thesis, 2015. http://www.nusl.cz/ntk/nusl-352245.

Abstract:

Title: Artificial Neural Networks and Their Usage For Knowledge Extraction Author: RNDr. Zuzana Petříčková Department: Department of Theoretical Computer Science and Mathematical Logic Supervisor: doc. RNDr. Iveta Mrázová, CSc., Department of Theoretical Computer Science and Mathematical Logic Abstract: The model of multi-layered feed-forward neural networks is well known for its ability to generalize well and to find complex non-linear dependencies in the data. On the other hand, it tends to create complex internal structures, especially for large data sets. Efficient solutions to demanding tasks currently dealt with require fast training, adequate generalization and a transparent and simple network structure. In this thesis, we propose a general framework for training of BP-networks. It is based on the fast and robust scaled conjugate gradient technique. This classical training algorithm is enhanced with analytical or approximative sensitivity inhibition during training and enforcement of a transparent internal knowledge representation. Redundant hidden and input neurons are pruned based on internal representation and sensitivity analysis. The performance of the developed framework has been tested on various types of data with promising results. The framework provides a fast training algorithm,...

34

ElAraby, Mostafa. "Optimizing ANN Architectures using Mixed-Integer Programming." Thesis, 2020. http://hdl.handle.net/1866/24312.

Abstract:

Over-parameterized networks, where the number of parameters surpasses the number of training samples, generalize well on various tasks. However, large networks are computationally expensive in terms of training and inference time. Furthermore, the lottery ticket hypothesis states that a subnetwork of a randomly initialized network can achieve marginal loss after training on a specific task compared to the original network. Therefore, there is a need to optimize the inference and training time, and a potential for more compact neural architectures. We introduce a novel approach, “Optimizing ANN Architectures using Mixed-Integer Programming” (OAMIP), to find these subnetworks by identifying critical neurons and removing non-critical ones, resulting in a faster inference time. The proposed OAMIP utilizes a Mixed-Integer Program (MIP) for assigning importance scores to each neuron in deep neural network architectures. Our MIP is guided by the impact on the main learning task of the network when simultaneously pruning subsets of neurons. Concretely, the optimization of the objective function drives the solver to minimize the number of neurons, limiting the network to critical neurons, i.e., those with a high importance score, that need to be kept to maintain the overall accuracy of the trained neural network. Further, the proposed formulation generalizes the recently considered lottery ticket hypothesis by identifying multiple “lucky” subnetworks, resulting in optimized architectures that not only perform well on a single dataset, but also generalize across multiple ones upon retraining of network weights. Finally, we present a scalable implementation of our method by decoupling the importance scores across layers using auxiliary networks and across different classes. We demonstrate the ability of OAMIP to prune neural networks with marginal loss in accuracy and generalizability on popular datasets and architectures.
