Academic literature on the topic 'Neural network accelerator'

Journal articles on the topic "Neural network accelerator"

1

Eliahu, Adi, Ronny Ronen, Pierre-Emmanuel Gaillardon, and Shahar Kvatinsky. "multiPULPly." ACM Journal on Emerging Technologies in Computing Systems 17, no. 2 (April 2021): 1–27. http://dx.doi.org/10.1145/3432815.

Abstract:
Computationally intensive neural network applications often need to run on resource-limited low-power devices. Numerous hardware accelerators have been developed to speed up the performance of neural network applications and reduce power consumption; however, most focus on data centers and full-fledged systems. Acceleration in ultra-low-power systems has been only partially addressed. In this article, we present multiPULPly, an accelerator that integrates memristive technologies within standard low-power CMOS technology, to accelerate multiplication in neural network inference on ultra-low-power systems. This accelerator was designated for PULP, an open-source microcontroller system that uses low-power RISC-V processors. Memristors were integrated into the accelerator to enable power consumption only when the memory is active, to continue the task with no context-restoring overhead, and to enable highly parallel analog multiplication. To reduce the energy consumption, we propose novel dataflows that handle common multiplication scenarios and are tailored for our architecture. The accelerator was tested on FPGA and achieved a peak energy efficiency of 19.5 TOPS/W, outperforming state-of-the-art accelerators by 1.5× to 4.5×.
2

Cho, Jaechan, Yongchul Jung, Seongjoo Lee, and Yunho Jung. "Reconfigurable Binary Neural Network Accelerator with Adaptive Parallelism Scheme." Electronics 10, no. 3 (January 20, 2021): 230. http://dx.doi.org/10.3390/electronics10030230.

Abstract:
Binary neural networks (BNNs) have attracted significant interest for the implementation of deep neural networks (DNNs) on resource-constrained edge devices, and various BNN accelerator architectures have been proposed to achieve higher efficiency. BNN accelerators can be divided into two categories: streaming and layer accelerators. Although streaming accelerators designed for a specific BNN network topology provide high throughput, they are infeasible for various sensor applications in edge AI because of their complexity and inflexibility. In contrast, layer accelerators with reasonable resources can support various network topologies, but they operate with the same parallelism for all the layers of the BNN, which degrades throughput performance at certain layers. To overcome this problem, we propose a BNN accelerator with adaptive parallelism that offers high throughput performance in all layers. The proposed accelerator analyzes target layer parameters and operates with optimal parallelism using reasonable resources. In addition, this architecture is able to fully compute all types of BNN layers thanks to its reconfigurability, and it can achieve a higher area–speed efficiency than existing accelerators. In performance evaluation using state-of-the-art BNN topologies, the designed BNN accelerator achieved an area–speed efficiency 9.69 times higher than previous FPGA implementations and 24% higher than existing VLSI implementations for BNNs.
3

Hong, JiUn, Saad Arslan, TaeGeon Lee, and HyungWon Kim. "Design of Power-Efficient Training Accelerator for Convolution Neural Networks." Electronics 10, no. 7 (March 26, 2021): 787. http://dx.doi.org/10.3390/electronics10070787.

Abstract:
Among deep neural networks (DNNs), the convolutional neural network (CNN) is one of the most widely used models for image recognition applications. However, there is growing demand for lightweight and low-power neural network accelerators, not only for inference but also for the training process. In this paper, we propose a training accelerator that provides low power and compact chip size targeted for mobile and edge computing applications. It achieves real-time processing of both inference and training using concurrent floating-point data paths. The proposed accelerator can be externally controlled and employs resource sharing and an integrated convolution-pooling block to achieve low area and low energy consumption. We implemented the proposed training accelerator in an FPGA (Field Programmable Gate Array) and evaluated its training performance using an MNIST CNN example in comparison with a PC with GPU (Graphics Processing Unit). While both methods achieved a similar training accuracy of 95.1%, the proposed accelerator, when implemented in a silicon chip, reduced the energy consumption by 480 times compared to the PC with GPU. Additionally, when implemented on an FPGA, an energy reduction of over 4.5 times was achieved compared to the existing FPGA training accelerator for the MNIST dataset. Therefore, the proposed accelerator is more suitable for deployment in mobile/edge nodes compared to the existing software and hardware accelerators.
4

Noskova, E. S., I. E. Zakharov, Y. N. Shkandybin, and S. G. Rykovanov. "Towards energy-efficient neural network calculations." Computer Optics 46, no. 1 (February 2022): 160–66. http://dx.doi.org/10.18287/2412-6179-co-914.

Abstract:
Nowadays, the problem of creating high-performance and energy-efficient hardware for Artificial Intelligence tasks is very acute. The most popular solution to this problem is the use of Deep Learning Accelerators, such as GPUs and Tensor Processing Units, to run neural networks. Recently, NVIDIA has announced the NVDLA project, which allows one to design neural network accelerators based on open-source code. This work describes a full cycle of creating a prototype NVDLA accelerator, as well as testing the resulting solution by running the ResNet-50 neural network on it. Finally, an assessment of the performance and power efficiency of the prototype NVDLA accelerator compared to the GPU and CPU is provided, the results of which show the superiority of NVDLA in many characteristics.
5

Ferianc, Martin, Hongxiang Fan, Divyansh Manocha, Hongyu Zhou, Shuanglong Liu, Xinyu Niu, and Wayne Luk. "Improving Performance Estimation for Design Space Exploration for Convolutional Neural Network Accelerators." Electronics 10, no. 4 (February 23, 2021): 520. http://dx.doi.org/10.3390/electronics10040520.

Abstract:
Contemporary advances in neural networks (NNs) have demonstrated their potential in different applications such as in image classification, object detection or natural language processing. In particular, reconfigurable accelerators have been widely used for the acceleration of NNs due to their reconfigurability and efficiency in specific application instances. To determine the configuration of the accelerator, it is necessary to conduct design space exploration to optimize the performance. However, the process of design space exploration is time consuming because of the slow performance evaluation for different configurations. Therefore, there is a demand for an accurate and fast performance prediction method to speed up design space exploration. This work introduces a novel method for fast and accurate estimation of different metrics that are of importance when performing design space exploration. The method is based on a Gaussian process regression model parametrised by the features of the accelerator and the target NN to be accelerated. We evaluate the proposed method together with other popular machine learning based methods in estimating the latency and energy consumption of our implemented accelerator on two different hardware platforms targeting convolutional neural networks. We demonstrate improvements in estimation accuracy, without the need for significant implementation effort or tuning.
6

Sunny, Febin P., Asif Mirza, Mahdi Nikdast, and Sudeep Pasricha. "ROBIN: A Robust Optical Binary Neural Network Accelerator." ACM Transactions on Embedded Computing Systems 20, no. 5s (October 31, 2021): 1–24. http://dx.doi.org/10.1145/3476988.

Abstract:
Domain-specific neural network accelerators have garnered attention because of their improved energy efficiency and inference performance compared to CPUs and GPUs. Such accelerators are thus well suited for resource-constrained embedded systems. However, mapping sophisticated neural network models on these accelerators still entails significant energy and memory consumption, along with high inference time overhead. Binarized neural networks (BNNs), which utilize single-bit weights, represent an efficient way to implement and deploy neural network models on accelerators. In this paper, we present a novel optical-domain BNN accelerator, named ROBIN, which intelligently integrates heterogeneous microring resonator optical devices with complementary capabilities to efficiently implement the key functionalities in BNNs. We perform detailed fabrication-process variation analyses at the optical device level, explore efficient corrective tuning for these devices, and integrate circuit-level optimization to counter thermal variations. As a result, our proposed ROBIN architecture possesses the desirable traits of being robust, energy-efficient, low latency, and high throughput, when executing BNN models. Our analysis shows that ROBIN can outperform the best-known optical BNN accelerators and many electronic accelerators. Specifically, our energy-efficient ROBIN design exhibits energy-per-bit values that are ∼4× lower than electronic BNN accelerators and ∼933× lower than a recently proposed photonic BNN accelerator, while a performance-efficient ROBIN design shows ∼3× and ∼25× better performance than electronic and photonic BNN accelerators, respectively.
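As a rough illustration of why single-bit values are cheap to process, the sketch below computes a dot product of two ±1 vectors with XNOR and popcount, the usual formulation in electronic BNN implementations. It is not ROBIN's optical method, and it assumes that activations are binarized as well as weights, which the abstract does not state. The helper names `pack_bits` and `binary_dot` are hypothetical.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int (bit 1 encodes +1, bit 0 encodes -1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
        elif v != -1:
            raise ValueError("values must be +1 or -1")
    return word


def binary_dot(a_bits, w_bits, n):
    """Dot product of two n-element +/-1 vectors given as packed bit words."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask  # XNOR: bit is 1 where the signs match
    matches = bin(agree).count("1")    # popcount of matching positions
    return 2 * matches - n             # matches minus mismatches


# Hypothetical example data, chosen only for illustration.
activations = [1, -1, 1, 1, -1, -1, 1, -1]
weights = [1, 1, -1, 1, -1, 1, 1, -1]

reference = sum(a * w for a, w in zip(activations, weights))
result = binary_dot(pack_bits(activations), pack_bits(weights), len(weights))
print(result, reference)
```

The multiply-accumulate loop collapses into one bitwise operation and one bit count per machine word, which is where the efficiency of binarized arithmetic comes from in digital hardware.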
7

Anmin, Kong, and Zhao Bin. "A Parallel Loading Based Accelerator for Convolution Neural Network." International Journal of Machine Learning and Computing 10, no. 5 (October 5, 2020): 669–74. http://dx.doi.org/10.18178/ijmlc.2020.10.5.989.

8

Xia, Chengpeng, Yawen Chen, Haibo Zhang, Hao Zhang, Fei Dai, and Jigang Wu. "Efficient neural network accelerators with optical computing and communication." Computer Science and Information Systems, no. 00 (2022): 66. http://dx.doi.org/10.2298/csis220131066x.

Abstract:
Conventional electronic Artificial Neural Network (ANN) accelerators focus on architecture design and numerical computation optimization to improve training efficiency. However, these approaches have recently encountered bottlenecks in energy efficiency and computing performance, which has led to increased interest in photonic accelerators. Photonic architectures, with their low energy consumption, high transmission speed, and high bandwidth, are considered to play an important role in new generations of computing architectures. In this paper, to provide a better understanding of the optical technology used in ANN acceleration, we present a comprehensive review of efficient photonic computing and communication in ANN accelerators. The related photonic devices are investigated in terms of their application in ANN acceleration, and a classification of existing solutions is proposed, categorizing them into optical computing acceleration and optical communication acceleration according to photonic effects and photonic architectures. Moreover, we discuss the challenges for these photonic neural network acceleration approaches to highlight the most promising future research opportunities in this field.
9

Tang, Wenkai, and Peiyong Zhang. "GPGCN: A General-Purpose Graph Convolution Neural Network Accelerator Based on RISC-V ISA Extension." Electronics 11, no. 22 (November 21, 2022): 3833. http://dx.doi.org/10.3390/electronics11223833.

Abstract:
In the past two years, various graph convolution neural network (GCN) accelerators have emerged, each with its own characteristics, but their common disadvantage is that the hardware architecture is not programmable and is optimized for a specific network and dataset. They may not support acceleration for different GCNs and may not achieve optimal hardware resource utilization for datasets of different sizes. Therefore, given these shortcomings, and following the development trend of traditional neural network accelerators, this paper proposes and implements GPGCN: a general-purpose GCN accelerator architecture based on a RISC-V instruction set extension, providing the software programming freedom to support acceleration for various GCNs and achieving the best acceleration efficiency for different GCNs with different datasets. Compared with a traditional CPU and a traditional CPU with vector extension, GPGCN achieves speedups of more than 1001× and 267×, respectively, for GCN with the Cora dataset. Compared with dedicated accelerators, GPGCN has software programmability and supports the acceleration of more GCNs.
10

An, Fubang, Lingli Wang, and Xuegong Zhou. "A High Performance Reconfigurable Hardware Architecture for Lightweight Convolutional Neural Network." Electronics 12, no. 13 (June 27, 2023): 2847. http://dx.doi.org/10.3390/electronics12132847.

Abstract:
Since the lightweight convolutional neural network EfficientNet was proposed by Google in 2019, this series of models has quickly become very popular due to its superior performance with a small number of parameters. However, the existing convolutional neural network hardware accelerators for EfficientNet still have much room to improve the performance of the depthwise convolution, squeeze-and-excitation module and nonlinear activation functions. In this paper, we first design a reconfigurable register array and computational kernel to accelerate the depthwise convolution. Next, we propose a vector unit to implement the nonlinear activation functions and the scale operation. An exchangeable-sequence dual-computational kernel architecture is proposed to improve the performance and the utilization. In addition, the memory architectures are designed to complete the hardware accelerator for the above computing architecture. Finally, in order to evaluate the performance of the hardware accelerator, the accelerator is implemented on a Xilinx XCVU37P FPGA. The results show that the proposed accelerator can work at the main system clock frequency of 300 MHz with the DSP kernel at 600 MHz. The performance of EfficientNet-B3 in our architecture can reach 69.50 FPS and 255.22 GOPS. Compared with the latest EfficientNet-B3 accelerator, which uses the same FPGA development board, the accelerator proposed in this paper can achieve a 1.28-fold improvement in single-core performance and a 1.38-fold improvement in performance per DSP.

Dissertations / Theses on the topic "Neural network accelerator"

1

Tianxu, Yue. "Convolutional Neural Network FPGA-accelerator on Intel DE10-Standard FPGA." Thesis, Linköpings universitet, Elektroniska Kretsar och System, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-178174.

Abstract:
Convolutional neural networks (CNNs) have been extensively used in many areas, such as face and speech recognition, image searching and classification, and autonomous driving. Hence, CNN accelerators have become a trending research topic. Generally, graphics processing units (GPUs) are widely applied in CNN accelerators. However, field-programmable gate arrays (FPGAs) have higher energy and resource efficiency compared with GPUs. Moreover, high-level synthesis tools based on Open Computing Language (OpenCL) can reduce the verification and implementation period for FPGAs. In this project, PipeCNN[1] is implemented on the Intel DE10-Standard FPGA. This OpenCL design accelerates AlexNet through the interaction between the Advanced RISC Machine (ARM) processor and the FPGA. Then, PipeCNN optimization based on memory read and convolution is analyzed and discussed.
2

Oudrhiri, Ali. "Performance of a Neural Network Accelerator Architecture and its Optimization Using a Pipeline-Based Approach." Electronic Thesis or Diss., Sorbonne université, 2023. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2023SORUS658.pdf.

Abstract:
In recent years, neural networks have gained widespread popularity for their versatility and effectiveness in solving a wide range of complex tasks. Their ability to learn and make predictions from large data-sets has revolutionized various fields. However, as neural networks continue to find applications in an ever-expanding array of domains, their significant computational requirements become a pressing challenge. This computational demand is particularly problematic when deploying neural networks in resource-constrained embedded devices, especially within the context of edge computing for inference tasks. Nowadays, neural network accelerator chips emerge as the optimal choice for supporting neural networks at the edge. These chips offer remarkable efficiency with their compact size, low power consumption, and reduced latency. Moreover, the fact that they are integrated on the same chip environment also enhances security by minimizing external data communication. In the frame of edge computing, diverse requirements have emerged, necessitating trade-offs in various performance aspects. This has led to the development of accelerator architectures that are highly configurable, allowing them to adapt to distinct performance demands. In this context, the focus lies on Gemini, a configurable inference neural network accelerator designed with imposed architecture and implemented using High-Level Synthesis techniques. The considerations for its design and implementation were driven by the need for parallelization configurability and performance optimization. Once this accelerator was designed, demonstrating the power of its configurability became essential, helping users select the most suitable architecture for their neural networks. To achieve this objective, this thesis contributed to the development of a performance prediction strategy operating at a high-level of abstraction, which considers the chosen architecture and neural network configuration. 
This tool assists clients in making decisions regarding the appropriate architecture for their specific neural network applications. During the research, we noticed that using a single accelerator has several limitations and that increasing parallelism yields limited performance gains. Consequently, we adopted a new strategy for optimizing neural network acceleration. This time, we took a high-level approach that did not require fine-grained accelerator optimizations. We organized multiple Gemini instances into a pipeline and allocated layers to different accelerators to maximize performance. We proposed solutions for two scenarios. In the user scenario, the pipeline structure is predefined with a fixed number of accelerators, accelerator configurations, and RAM sizes, and we proposed solutions to map the layers onto the different accelerators to optimize execution performance. In the designer scenario, the pipeline structure is not fixed, so the number and configuration of the accelerators can be chosen to optimize both execution and hardware performance. This pipeline strategy has proven to be effective for the Gemini accelerator. Although this thesis originated from a specific industrial need, certain solutions developed during the research can be applied or adapted to other neural network accelerators. Notably, the performance prediction strategy and high-level optimization of NN processing through pipelining multiple instances offer valuable insights for broader application.
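The user scenario above (a fixed number of accelerators, with layers to be mapped onto them) can be illustrated with a small sketch. This is not the thesis's algorithm. It assumes identical accelerators, contiguous layer assignment, known per-layer latencies, and no RAM or communication constraints, and it minimizes the latency of the slowest stage, which bounds pipeline throughput. The function name and the latency values are hypothetical.

```python
def map_layers_to_pipeline(layer_latencies, num_accelerators):
    """Split layers into contiguous stages so that the slowest stage is as fast as possible.

    Returns (bottleneck_latency, stages), where stages is a list of lists of layer indices.
    """
    n = len(layer_latencies)
    k = min(num_accelerators, n)  # never more stages than layers

    # prefix[i] is the total latency of the first i layers.
    prefix = [0]
    for t in layer_latencies:
        prefix.append(prefix[-1] + t)

    inf = float("inf")
    # best[s][i]: smallest bottleneck when the first i layers are split into s stages.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for s in range(1, k + 1):
        for i in range(s, n + 1):
            for j in range(s - 1, i):  # the last stage holds layers j .. i-1
                cost = max(best[s - 1][j], prefix[i] - prefix[j])
                if cost < best[s][i]:
                    best[s][i] = cost
                    cut[s][i] = j

    # Walk the recorded cut points backwards to recover the stages.
    stages, i = [], n
    for s in range(k, 0, -1):
        j = cut[s][i]
        stages.append(list(range(j, i)))
        i = j
    stages.reverse()
    return best[k][n], stages


# Hypothetical per-layer latencies in arbitrary time units.
latencies = [4, 2, 3, 7, 1, 5]
bottleneck, stages = map_layers_to_pipeline(latencies, 3)
print(bottleneck, stages)
```

A real mapping would also have to respect each accelerator's configuration and RAM size, which the abstract names as part of the problem. Those constraints would enter as extra feasibility checks on each candidate stage.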
3

Maltoni, Pietro. "Progetto di un acceleratore hardware per layer di convoluzioni depthwise in applicazioni di Deep Neural Network." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24205/.

Abstract:
Ongoing technological progress and the constant monitoring, control, and analysis of the surrounding environment have led to increasingly capable IoT devices, which is why Edge Computing has come into focus. These devices have the resources to process sensor data locally. This technology is a good fit for CNNs, neural networks for image analysis and recognition. Separable Convolutions represent a new frontier because they massively reduce the number of operations to be performed on data tensors by splitting the convolution into two parts: a Depthwise one and a Pointwise one. All this yields very reliable results in terms of accuracy and speed, but power consumption remains a central problem because the devices rely solely on an internal battery. A good trade-off between power consumption and computational capability is therefore necessary. To meet this technological challenge, the state of the art in this field offers various solutions, consisting of clusters with optimized cores and dedicated instructions, or FPGAs. In this thesis we propose a hardware accelerator developed in PULP and oriented to computing Depthwise convolution layers. Thanks to an HWC layout of the data in memory and to the Window Buffer, a window that slides over the image to perform the convolutions channel by channel, it was possible to develop a datapath architecture oriented to data reuse; this gives the accelerator a maximum output throughput of 4 pixels per clock cycle. With a performance of 6 GOP/s, an energy efficiency of 101 GOP/J, and power consumption on the order of mW, figures obtained by integrating the IP into the cluster of Darkside, a new research chip in TSMC 65 nm technology, the Depthwise accelerator is a candidate to be an ideal solution for this type of application.
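The reduction in operations from splitting a convolution into depthwise and pointwise parts can be checked with a short calculation. The sketch below counts multiply-accumulate operations under stated assumptions: stride 1, output spatial size equal to the input size, and no bias. The layer dimensions are hypothetical and are not taken from the thesis.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution producing an h x w x c_out output."""
    return h * w * c_in * c_out * k * k


def separable_conv_macs(h, w, c_in, c_out, k):
    """MACs for a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = h * w * c_in * k * k  # one k x k filter per input channel
    pointwise = h * w * c_in * c_out  # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


# Hypothetical layer shape, chosen only for illustration.
h, w, c_in, c_out, k = 56, 56, 64, 128, 3

standard = standard_conv_macs(h, w, c_in, c_out, k)
separable = separable_conv_macs(h, w, c_in, c_out, k)
ratio = separable / standard

print(f"standard:  {standard} MACs")
print(f"separable: {separable} MACs")
print(f"ratio:     {ratio:.4f}")
```

Under these assumptions the ratio simplifies to 1/c_out + 1/k², so for a 3 × 3 kernel and many output channels the separable form needs roughly one ninth of the operations.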
4

Xu, Hongjie. "Energy-Efficient On-Chip Cache Architectures and Deep Neural Network Accelerators Considering the Cost of Data Movement." Doctoral thesis, Kyoto University, 2021. http://hdl.handle.net/2433/263786.

Abstract:
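Degree program noted: Kyoto University WISE Program "Innovation of Advanced Photonic and Electronic Devices"
Kyoto University
Doctorate by coursework (new system)
Doctor of Informatics
Degree No. Kō 23325
Jōhaku No. 761
Department of Communications and Computer Engineering, Graduate School of Informatics, Kyoto University
(Chief examiner) Professor Hidetoshi Onodera, Professor Eiji Oki, Professor Takashi Sato
Conferred under Article 4, Paragraph 1 of the Degree Regulations
DFAM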
5

Riera, Villanueva Marc. "Low-power accelerators for cognitive computing." Doctoral thesis, Universitat Politècnica de Catalunya, 2020. http://hdl.handle.net/10803/669828.

Abstract:
Deep Neural Networks (DNNs) have achieved tremendous success for cognitive applications, and are especially efficient in classification and decision making problems such as speech recognition or machine translation. Mobile and embedded devices increasingly rely on DNNs to understand the world. Smartphones, smartwatches and cars perform discriminative tasks, such as face or object recognition, on a daily basis. Despite the increasing popularity of DNNs, running them on mobile and embedded systems comes with several main challenges: delivering high accuracy and performance with a small memory and energy budget. Modern DNN models consist of billions of parameters requiring huge computational and memory resources and, hence, they cannot be directly deployed on low-power systems with limited resources. The objective of this thesis is to address these issues and propose novel solutions in order to design highly efficient custom accelerators for DNN-based cognitive computing systems. First, we focus on optimizing the inference of DNNs for sequence processing applications. We perform an analysis of the input similarity between consecutive DNN executions. Then, based on the high degree of input similarity, we propose DISC, a hardware accelerator implementing a Differential Input Similarity Computation technique to reuse the computations of the previous execution, instead of computing the entire DNN. We observe that, on average, more than 60% of the inputs of any neural network layer tested exhibit negligible changes with respect to the previous execution. Avoiding the memory accesses and computations for these inputs results in 63% energy savings on average. Second, we propose to further optimize the inference of FC-based DNNs. We first analyze the number of unique weights per input neuron of several DNNs. 
Exploiting common optimizations, such as linear quantization, we observe a very small number of unique weights per input for several FC layers of modern DNNs. Then, to improve the energy-efficiency of FC computation, we present CREW, a hardware accelerator that implements a Computation Reuse and an Efficient Weight Storage mechanism to exploit the large number of repeated weights in FC layers. CREW greatly reduces the number of multiplications and provides significant savings in model memory footprint and memory bandwidth usage. We evaluate CREW on a diverse set of modern DNNs. On average, CREW provides 2.61x speedup and 2.42x energy savings over a TPU-like accelerator. Third, we propose a mechanism to optimize the inference of RNNs. RNN cells perform element-wise multiplications across the activations of different gates, sigmoid and tanh being the common activation functions. We perform an analysis of the activation function values, and show that a significant fraction are saturated towards zero or one in popular RNNs. Then, we propose CGPA to dynamically prune activations from RNNs at a coarse granularity. CGPA avoids the evaluation of entire neurons whenever the outputs of peer neurons are saturated. CGPA significantly reduces the amount of computations and memory accesses while avoiding sparsity to a large extent, and can be easily implemented on top of conventional accelerators such as TPU with negligible area overhead, resulting in 12% speedup and 12% energy savings on average for a set of widely used RNNs. Finally, in the last contribution of this thesis we focus on static DNN pruning methodologies. DNN pruning reduces memory footprint and computational work by removing connections and/or neurons that are ineffectual. However, we show that prior pruning schemes require an extremely time-consuming iterative process that requires retraining the DNN many times to tune the pruning parameters. 
Then, we propose a DNN pruning scheme based on Principal Component Analysis and relative importance of each neuron's connection that automatically finds the optimized DNN in one shot.
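The idea behind DISC, reusing the previous execution's results and recomputing only for inputs that changed, can be sketched in software for one fully connected layer. This is a minimal illustration under stated assumptions, not the accelerator's actual design: the layer is a plain matrix-vector product without bias or activation, and the function names, the `threshold` parameter, and the example values are hypothetical.

```python
def dense(weights, x):
    """Full matrix-vector product; weights[j][i] connects input i to output j."""
    return [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) for row in weights]


def differential_dense(weights, x, x_prev, y_prev, threshold=0.0):
    """Update y_prev using only the inputs that changed by more than threshold.

    Returns (y, x_used, updated_count). x_used is the input vector that y
    corresponds to; pass it as x_prev on the next call so that skipped
    small changes do not accumulate unnoticed.
    """
    y = list(y_prev)
    x_used = list(x_prev)
    updated = 0
    for i, (new, old) in enumerate(zip(x, x_prev)):
        delta = new - old
        if abs(delta) <= threshold:
            continue  # reuse the previous contribution of input i
        for j, row in enumerate(weights):
            y[j] += row[i] * delta  # correct each output by the change only
        x_used[i] = new
        updated += 1
    return y, x_used, updated


# Hypothetical layer and two consecutive inputs that differ in one element.
weights = [[1, 2, 3, 4],
           [0, -1, 2, 1],
           [5, 0, -2, 3]]
x_prev = [1, 0, 2, 1]
x = [1, 0, 3, 1]

y_prev = dense(weights, x_prev)
y, x_used, updated = differential_dense(weights, x, x_prev, y_prev)
print(y, updated)
```

With a threshold of zero the result is exact, and the work scales with the number of changed inputs rather than with the layer size. A nonzero threshold trades accuracy for fewer updates, which corresponds to the "negligible changes" the abstract describes.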
Deep neural networks (DNNs) have achieved enormous success in cognitive applications and are especially efficient in classification and decision-making problems such as speech recognition or machine translation. Mobile devices increasingly rely on DNNs to understand the world. Smartphones and smartwatches, and even cars, perform discriminative tasks such as face or object recognition on a daily basis. Despite the growing popularity of DNNs, running them on mobile systems poses several challenges: delivering high accuracy and performance within a small memory and energy budget. Modern DNNs consist of millions of parameters that require enormous computational and memory resources and therefore cannot be used directly in low-power systems with limited resources. The objective of this thesis is to address these problems and propose new solutions for designing efficient accelerators for DNN-based cognitive computing systems. First, we focus on optimizing DNN inference for sequence-processing applications. We analyze the similarity of the inputs between consecutive DNN executions. We then propose DISC, an accelerator that implements a differential computation technique, based on the high degree of input similarity, to reuse the computations of the previous execution instead of computing the whole network. We observe that, on average, more than 60% of the inputs of any layer of the DNNs used show minor changes with respect to the previous execution. Avoiding the memory accesses and computations for these inputs yields average energy savings of 63%. Second, we propose optimizing the inference of FC-based DNNs. We first analyze the number of unique weights per input neuron in several networks. 
Exploiting common optimizations such as linear quantization, we observe a very small number of unique weights per input in several FC layers of modern DNNs. Then, to improve the energy efficiency of FC-layer computation, we present CREW, an accelerator that implements an efficient mechanism for computation reuse and weight storage. CREW reduces the number of multiplications and provides significant savings in memory usage. We evaluate CREW on a diverse set of modern DNNs. On average, CREW provides a 2.61x speedup and 2.42x energy savings. Third, we propose a mechanism to optimize RNN inference. Recurrent network cells perform element-wise multiplications of the activations of different gates, with sigmoid and tanh being the usual activation functions. We analyze the values of the activation functions and show that a significant fraction are saturated towards zero or one in a set of popular RNNs. We then propose CGPA to dynamically prune RNN activations at a coarse granularity. CGPA avoids evaluating entire neurons whenever the outputs of peer neurons are saturated. CGPA significantly reduces the amount of computation and memory accesses, achieving on average a 12% improvement in performance and energy savings. Finally, in the last contribution of this thesis, we focus on static DNN pruning methodologies. Pruning reduces the memory footprint and computational work by removing redundant connections or neurons. However, we show that previous pruning schemes use a very long iterative process that requires training the DNNs many times to tune the pruning parameters. 
We then propose a pruning scheme based on principal component analysis and the relative importance of each neuron's connections that automatically finds the optimized DNN in one shot, without the need to manually tune multiple parameters.
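The differential computation idea that this abstract attributes to DISC can be illustrated with a minimal software sketch for a single fully connected layer. This is not the accelerator's actual design: the class name `DifferentialLayer`, the `threshold` parameter, and the bookkeeping are assumptions made for illustration. The sketch caches the previous outputs and corrects them only for inputs whose value changed by more than the threshold.

```python
def full_layer(weights, inputs):
    """Reference: compute every output of a fully connected layer from scratch."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


class DifferentialLayer:
    """Reuses cached outputs and corrects them only for inputs that changed."""

    def __init__(self, weights, threshold=0.0):
        self.weights = weights        # weights[j][i] connects input i to output j
        self.threshold = threshold    # hypothetical knob: smaller changes are ignored
        self.prev_inputs = None       # inputs that the cached outputs reflect
        self.outputs = None

    def run(self, inputs):
        """Return (outputs, number of inputs whose weights were actually used)."""
        if self.prev_inputs is None:  # first execution: nothing to reuse yet
            self.prev_inputs = list(inputs)
            self.outputs = full_layer(self.weights, inputs)
            return list(self.outputs), len(inputs)
        updated = 0
        for i, x in enumerate(inputs):
            delta = x - self.prev_inputs[i]
            if abs(delta) <= self.threshold:
                continue              # similar input: skip its weights and multiplies
            for j, row in enumerate(self.weights):
                self.outputs[j] += row[i] * delta
            self.prev_inputs[i] = x
            updated += 1
        return list(self.outputs), updated
```

With `threshold=0.0` the result matches a full recomputation; a positive threshold trades accuracy for fewer memory accesses and multiplications, which is the kind of saving the abstract reports.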
6

Khan, Muhammad Jazib. "Programmable Address Generation Unit for Deep Neural Network Accelerators." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-271884.

Abstract:
Convolutional Neural Networks are becoming more and more popular due to their applications in revolutionary technologies like Autonomous Driving, Biomedical Imaging, and Natural Language Processing. With this increase in adoption, the complexity of the underlying algorithms is also increasing. This trend has implications for the computation platforms as well, i.e., GPUs, FPGAs, or ASIC-based accelerators, especially for the Address Generation Unit (AGU), which is responsible for memory access. Existing accelerators typically have Parametrizable Datapath AGUs, which have minimal adaptability to the evolution of algorithms. Hence new hardware is required for new algorithms, which is a very inefficient approach in terms of time, resources, and reusability. In this research, six algorithms with different implications for hardware are evaluated for address generation, and a fully Programmable AGU (PAGU) is presented, which can adapt to these algorithms. These algorithms are Standard, Strided, Dilated, Upsampled, and Padded convolution, and MaxPooling. The proposed AGU architecture is a Very Long Instruction Word based Application Specific Instruction Processor with specialized components such as hardware counters and zero-overhead loops, and a powerful Instruction Set Architecture (ISA) that can model static and dynamic constraints and affine and non-affine Address Equations. The target has been to minimize the trade-off between flexibility and area, power, and performance. For a working test network for Semantic Segmentation, results have shown that PAGU achieves close to the ideal performance, one cycle per address, for all the algorithms under consideration except Upsampled Convolution, for which it is 1.7 cycles per address. The area of PAGU is approximately 4.6 times larger than that of the Parametrizable Datapath approach, which is still reasonable considering the high flexibility benefits. 
The potential of PAGU is not limited to neural network applications but extends to more general digital signal processing areas, which can be explored in the future.
Convolutional Neural Networks are becoming more and more popular due to their applications in revolutionary technologies such as autonomous driving, biomedical imaging, and natural language processing. With this increase in adoption, the complexity of the underlying algorithms is also increasing. This has implications for the computation platforms as well, such as GPUs and FPGA- or ASIC-based accelerators, especially for the Address Generation Unit (AGU), which is responsible for memory access. Existing accelerators typically have Parametrizable Datapath AGUs, which have very limited adaptability to the evolution of algorithms. Hence new hardware is required for new algorithms, which is a very inefficient approach in terms of time, resources, and reusability. In this research, six algorithms with different implications for hardware are evaluated for address generation, and a fully programmable AGU (PAGU) is presented that can adapt to these algorithms. These algorithms are Standard, Strided, Dilated, Upsampled, and Padded convolution, and MaxPooling. The proposed AGU architecture is a Very Long Instruction Word based application-specific instruction processor with specialized components such as hardware counters and zero-overhead loops, and a powerful Instruction Set Architecture (ISA) that can model static and dynamic constraints and affine and non-affine address equations. The goal has been to minimize the trade-off between flexibility and area, power, and performance. For a working test network for semantic segmentation, results have shown that PAGU achieves close to the ideal performance, 1 cycle per address, for all the algorithms under consideration except Upsampled Convolution, for which it is 1.7 cycles per address. The area of PAGU is approximately 4.6 times larger than that of the Parametrizable Datapath approach, which is still reasonable considering the large flexibility benefits. 
The potential of PAGU is not limited to neural network applications but extends to more general digital signal processing areas, which can be explored in the future.
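To make the notion of an affine address equation in this abstract concrete, here is a minimal sketch of the input addresses that one kernel window reads in a strided or dilated convolution. It is a software illustration only, not the PAGU instruction set: the function name `conv_read_addresses`, the row-major single-channel layout, and the absence of padding are all assumptions.

```python
def conv_read_addresses(in_h, in_w, k_h, k_w, stride=1, dilation=1, base=0):
    """Yield, per output pixel, the flat input addresses one kernel window reads."""
    span_h = (k_h - 1) * dilation + 1   # input rows covered by the dilated kernel
    span_w = (k_w - 1) * dilation + 1   # input columns covered by the dilated kernel
    out_h = (in_h - span_h) // stride + 1
    out_w = (in_w - span_w) // stride + 1
    for oy in range(out_h):
        for ox in range(out_w):
            window = []
            for ky in range(k_h):
                for kx in range(k_w):
                    row = oy * stride + ky * dilation
                    col = ox * stride + kx * dilation
                    # Affine in the four loop counters (oy, ox, ky, kx).
                    window.append(base + row * in_w + col)
            yield window
```

For example, `next(conv_read_addresses(5, 5, 2, 2, dilation=2))` gives `[0, 2, 10, 12]`. Standard, strided, and dilated convolution differ only in the `stride` and `dilation` coefficients here, which is why a parametrizable datapath can cover them; cases such as padding add conditions that go beyond a single affine expression.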
7

Jalasutram, Rommel. "Acceleration of spiking neural networks on multicore architectures." Connect to this title online, 2009. http://etd.lib.clemson.edu/documents/1252424720/.

8

Han, Bing. "Acceleration of Spiking Neural Network on General Purpose Graphics Processors." University of Dayton / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1271368713.

9

Chen, Yu-Hsin. "Architecture design for highly flexible and energy-efficient deep neural network accelerators." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/117838.

Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.
Deep neural networks (DNNs) are the backbone of modern artificial intelligence (AI). However, due to their high computational complexity and diverse shapes and sizes, dedicated accelerators that can achieve high performance and energy efficiency across a wide range of DNNs are critical for enabling AI in real-world applications. To address this, we present Eyeriss, a co-design of software and hardware architecture for DNN processing that is optimized for performance, energy efficiency, and flexibility. Eyeriss features a novel Row-Stationary (RS) dataflow to minimize data movement when processing a DNN, which is the bottleneck of both performance and energy efficiency. The RS dataflow supports highly parallel processing while fully exploiting data reuse in a multi-level memory hierarchy to optimize overall system energy efficiency for any DNN shape and size. It achieves 1.4x to 2.5x higher energy efficiency than other existing dataflows. To support the RS dataflow, we present two versions of the Eyeriss architecture. Eyeriss v1 targets large DNNs that have plenty of data reuse. It features a flexible mapping strategy for high performance and a multicast network-on-chip (NoC) for high data reuse, and further exploits data sparsity to reduce processing element (PE) power by 45% and off-chip bandwidth by up to 1.9x. Fabricated in 65 nm CMOS, Eyeriss v1 consumes 278 mW at 34.7 fps for the CONV layers of AlexNet, which is 10x more efficient than a mobile GPU. Eyeriss v2 adds support for the emerging compact DNNs that introduce higher variation in data reuse. It features an RS+ dataflow that improves PE utilization, and a flexible and scalable NoC that adapts to the bandwidth requirement while also exploiting available data reuse. Together, they provide over 10x higher throughput than Eyeriss v1 at 256 PEs. Eyeriss v2 also exploits sparsity and SIMD for an additional 6x increase in throughput.
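The core decomposition behind a row-stationary dataflow can be sketched in software: a 2D convolution is split into 1D row convolutions, each of which would be handled by one processing element that keeps a filter row local, and the resulting partial sums are accumulated across filter rows. The sketch below is a simplified functional model under stated assumptions (stride 1, no padding, single channel, hypothetical function names). It does not model the PE array, the memory hierarchy, or the NoC.

```python
def conv1d_row(filter_row, input_row):
    """Work of one PE: slide one filter row across one input row."""
    n_out = len(input_row) - len(filter_row) + 1
    return [sum(f * input_row[x + k] for k, f in enumerate(filter_row))
            for x in range(n_out)]


def conv2d_row_stationary(filt, image):
    """2D convolution (stride 1, no padding) built from per-row 1D convolutions."""
    out_h = len(image) - len(filt) + 1
    out_w = len(image[0]) - len(filt[0]) + 1
    output = []
    for oy in range(out_h):
        psum = [0] * out_w
        # Filter row r is paired with input row oy + r; partial sums accumulate.
        for r, filter_row in enumerate(filt):
            partial = conv1d_row(filter_row, image[oy + r])
            psum = [a + b for a, b in zip(psum, partial)]
        output.append(psum)
    return output
```

In this model each filter row is reused across a whole input row and each input row is reused by several output rows, which gives a rough picture of the data reuse the abstract refers to.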
10

Gaura, Elena Ioana. "Neural network techniques for the control and identification of acceleration sensors." Thesis, Coventry University, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.313132.


Books on the topic "Neural network accelerator"

1

Whitehead, P. A. Design considerations for a hardware accelerator for Kohonen unsupervised learning in artificial neural networks. Manchester: UMIST, 1997.

2

Jones, Steven P. Neural network models of simple mechanical systems illustrating the feasibility of accelerated life testing. [Washington, DC]: National Aeronautics and Space Administration, 1996.

3

Daglis, I. A., ed. Effects of space weather on technology infrastructure. Dordrecht: Kluwer Academic Publishers, 2004.

4

Kong, Joonho, and Mahmood Azhar Qureshi. Accelerators for Convolutional Neural Networks. John Wiley & Sons, Incorporated, 2023.

5

Kong, Joonho, and Mahmood Azhar Qureshi. Accelerators for Convolutional Neural Networks. John Wiley & Sons, Incorporated, 2023.

6

Kong, Joonho, and Mahmood Azhar Qureshi. Accelerators for Convolutional Neural Networks. John Wiley & Sons, Incorporated, 2023.

7

Munir. Accelerators for Convolutional Neural Networks. John Wiley & Sons, Limited, 2023.

8

Accelerated training for large feedforward neural networks. Moffett Field, Calif: National Aeronautics and Space Administration, Ames Research Center, 1998.

9

Raff, Lionel, Ranga Komanduri, Martin Hagan, and Satish Bukkapatnam. Neural Networks in Chemical Reaction Dynamics. Oxford University Press, 2012. http://dx.doi.org/10.1093/oso/9780199765652.001.0001.

Abstract:
This monograph presents recent advances in neural network (NN) approaches and applications to chemical reaction dynamics. Topics covered include: (i) the development of ab initio potential-energy surfaces (PES) for complex multichannel systems using modified novelty sampling and feedforward NNs; (ii) methods for sampling the configuration space of critical importance, such as trajectory and novelty sampling methods and gradient fitting methods; (iii) parametrization of interatomic potential functions using a genetic algorithm accelerated with an NN; (iv) parametrization of analytic interatomic potential functions using NNs; (v) self-starting methods for obtaining analytic PES from ab initio electronic structure calculations using direct dynamics; (vi) development of a novel method, namely, combined function derivative approximation (CFDA), for simultaneous fitting of a PES and its corresponding force fields using feedforward neural networks; (vii) development of generalized PES using many-body expansions, NNs, and moiety energy approximations; (viii) NN methods for data analysis, reaction probabilities, and statistical error reduction in chemical reaction dynamics; (ix) accurate prediction of higher-level electronic structure energies (e.g., MP4 or higher) for large databases using NNs, lower-level (Hartree-Fock) energies, and small subsets of the higher-energy database; and finally (x) illustrative examples of NN applications to chemical reaction dynamics of increasing complexity, starting from simple near-equilibrium structures (vibrational state studies) to more complex non-adiabatic reactions. 
The monograph is prepared by an interdisciplinary group of researchers who have worked as a team for nearly two decades at Oklahoma State University, Stillwater, OK, with expertise in gas phase reaction dynamics; neural networks; various aspects of MD and Monte Carlo (MC) simulations of nanometric cutting, tribology, and material properties at nanoscale; scaling laws from atomistic to continuum; and neural network applications to chemical reaction dynamics. It is anticipated that this emerging field of NN in chemical reaction dynamics will play an increasingly important role in MD, MC, and quantum mechanical studies in the years to come.
10

AI Ladder: Accelerate Your Journey to AI. O'Reilly Media, Incorporated, 2020.


Book chapters on the topic "Neural network accelerator"

1

Huang, Hantao, and Hao Yu. "Distributed-Solver for Networked Neural Network." In Compact and Fast Machine Learning Accelerator for IoT Devices, 107–43. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-3323-1_5.

2

Nakajima, Toshiya. "Architecture of the Neural Network Simulation Accelerator NEUROSIM/L." In International Neural Network Conference, 722–25. Dordrecht: Springer Netherlands, 1990. http://dx.doi.org/10.1007/978-94-009-0643-3_61.

3

Reagen, Brandon, Robert Adolf, Paul Whatmough, Gu-Yeon Wei, and David Brooks. "Neural Network Accelerator Optimization: A Case Study." In Deep Learning for Computer Architects, 43–61. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-031-01756-8_4.

4

Huang, Hantao, and Hao Yu. "Tensor-Solver for Deep Neural Network." In Compact and Fast Machine Learning Accelerator for IoT Devices, 63–105. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-3323-1_4.

5

Ae, Tadashi, and Reiji Aibara. "A Neural Network for 3-D VLSI Accelerator." In The Kluwer International Series in Engineering and Computer Science, 179–88. Boston, MA: Springer US, 1989. http://dx.doi.org/10.1007/978-1-4613-1619-0_16.

6

Huang, Hantao, and Hao Yu. "Least-Squares-Solver for Shallow Neural Network." In Compact and Fast Machine Learning Accelerator for IoT Devices, 29–62. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-3323-1_3.

7

Hu, Lili. "Frameworks for Efficient Convolutional Neural Network Accelerator on FPGA." In Advances in Intelligent Systems and Computing, 651–57. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-10-8944-2_75.

8

Cheung, Kit, Simon R. Schultz, and Wayne Luk. "A Large-Scale Spiking Neural Network Accelerator for FPGA Systems." In Artificial Neural Networks and Machine Learning – ICANN 2012, 113–20. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-33269-2_15.

9

Wu, Jin, Xiangyang Shi, Wenting Pang, and Yu Wang. "Research on FPGA Accelerator Optimization Based on Graph Neural Network." In Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery, 536–42. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-20738-9_61.

10

Jin, Shaopeng, Shuo Qi, Yilin Dai, and Yihu Hu. "Design of Convolutional Neural Network Accelerator Based on RISC-V." In Lecture Notes on Data Engineering and Communications Technologies, 446–54. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-29097-8_53.


Conference papers on the topic "Neural network accelerator"

1

Shiflett, Kyle, Dylan Wright, Avinash Karanth, and Ahmed Louri. "PIXEL: Photonic Neural Network Accelerator." In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2020. http://dx.doi.org/10.1109/hpca47549.2020.00046.

2

Xu, David, A. Barış Özgüler, Giuseppe Di Guglielmo, Nhan Tran, Gabriel Perdue, Luca Carloni, and Farah Fahim. "Neural network accelerator for quantum control." In Neural network accelerator for quantum control. US DOE, 2023. http://dx.doi.org/10.2172/1959815.

3

Yang, Zunming, Zhanzhuang He, Jing Yang, and Zhong Ma. "An LSTM Acceleration Method Based on Embedded Neural Network Accelerator." In ACAI'21: 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3508546.3508649.

4

Yi, Qian. "FPGA Implementation of Neural Network Accelerator." In 2018 2nd IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). IEEE, 2018. http://dx.doi.org/10.1109/imcec.2018.8469659.

5

Vogt, Michael C. "Neural network-based sensor signal accelerator." In Intelligent Systems and Smart Manufacturing, edited by Peter E. Orban and George K. Knopf. SPIE, 2001. http://dx.doi.org/10.1117/12.417242.

6

Wang, Hong, Xiao Zhang, Dehui Kong, Guoning Lu, Degen Zhen, Fang Zhu, and Ke Xu. "Convolutional Neural Network Accelerator on FPGA." In 2019 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA). IEEE, 2019. http://dx.doi.org/10.1109/icta48799.2019.9012821.

7

Xu, David, A. Barış Özgüler, Giuseppe Di Guglielmo, Nhan Tran, Gabriel N. Perdue, Luca Carloni, and Farah Fahim. "Neural network accelerator for quantum control." In 2022 IEEE/ACM Third International Workshop on Quantum Computing Software (QCS). IEEE, 2022. http://dx.doi.org/10.1109/qcs56647.2022.00010.

8

Miscuglio, Mario, Zibo Hu, Shurui Li, Puneet Gupta, Hamed Dalir, and Volker J. Sorger. "Fourier Optical Convolutional Neural Network Accelerator." In Signal Processing in Photonic Communications. Washington, D.C.: OSA, 2021. http://dx.doi.org/10.1364/sppcom.2021.spm5c.2.

9

Lai, Yeong-Kang, and Zheng-Xun Yeh. "An Efficient Convolutional Neural Network Accelerator." In 2023 International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan). IEEE, 2023. http://dx.doi.org/10.1109/icce-taiwan58799.2023.10226679.

10

Mody, Mihir, Prithvi Shankar, Veeramanikandan Raju, and Sriramakrishnan Govindarajan. "Fail-Safe Neural Network Inference Accelerator." In 2021 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT). IEEE, 2021. http://dx.doi.org/10.1109/conecct52877.2021.9622537.


Reports on the topic "Neural network accelerator"

1

Aimone, James, Christopher Bennett, Suma Cardwell, Ryan Dellana, and Tianyao Xiao. Mosaic The Best of Both Worlds: Analog devices with Digital Spiking Communication to build a Hybrid Neural Network Accelerator. Office of Scientific and Technical Information (OSTI), September 2020. http://dx.doi.org/10.2172/1673175.

2

Morgan, Nelson, Jerome Feldman, and John Wawrzynek. Accelerator Systems for Neural Networks, Speech, and Related Applications. Fort Belvoir, VA: Defense Technical Information Center, April 1995. http://dx.doi.org/10.21236/ada298954.

3

Garg, Raveesh, Eric Qin, Francisco Martinez, Robert Guirado, Akshay Jain, Sergi Abadal, Jose Abellan, et al. Understanding the Design Space of Sparse/Dense Multiphase Dataflows for Mapping Graph Neural Networks on Spatial Accelerators. Office of Scientific and Technical Information (OSTI), September 2021. http://dx.doi.org/10.2172/1821960.

4

Wideman, Jr., Robert F., Nicholas B. Anthony, Avigdor Cahaner, Alan Shlosberg, Michel Bellaiche, and William B. Roush. Integrated Approach to Evaluating Inherited Predictors of Resistance to Pulmonary Hypertension Syndrome (Ascites) in Fast Growing Broiler Chickens. United States Department of Agriculture, December 2000. http://dx.doi.org/10.32747/2000.7575287.bard.

Abstract:
Background: PHS (pulmonary hypertension syndrome, ascites syndrome) is a serious cause of loss in the broiler industry, and is a prime example of an undesirable side effect of successful genetic development that may be deleteriously manifested by factors in the environment of growing broilers. Basically, continuous and pinpointed selection for rapid growth in broilers has led to higher oxygen demand and consequently to more frequent manifestation of an inherent potential cardiopulmonary incapability to sufficiently oxygenate the arterial blood. The multifaceted causes and modifiers of PHS make research into finding solutions to the syndrome a complex and multi-threaded challenge. This research used several directions to better understand the development of PHS and to probe possible means of monitoring and increasing resistance to the syndrome. Research Objectives: (1) To evaluate the growth dynamics of individuals within breeding stocks and their correlation with individual susceptibility or resistance to PHS; (2) To compile data on diagnostic indices found in this work to be predictive for PHS, during exposure to experimental protocols known to trigger PHS; (3) To conduct detailed physiological evaluations of cardiopulmonary function in broilers; (4) To compile data on growth dynamics and other diagnostic indices in existing lines selected for susceptibility or resistance to PHS; (5) To integrate growth dynamics and other diagnostic data within appropriate statistical procedures to provide geneticists with predictive indices that characterize resistance or susceptibility to PHS. Revisions: In the first year, the US team acquired the costly Peckode weigh platform / individual bird I.D. system that was to provide continuous (several times each day), automated weighing of birds for comprehensive monitoring of growth dynamics. However, the data generated were found to be inaccurate and irreproducible, making its use impractical. 
Henceforth, weighing was manual; this highly labor-intensive work precluded some of the original objectives of using growth dynamics in selection procedures involving thousands of birds. Major conclusions, solutions, achievements: 1. Healthy broilers were found to have greater oscillations in growth velocity and acceleration than PHS-susceptible birds. This confirmed the scientific validity of our original hypothesis that such differences occur. 2. Growth rate in the first week is higher in PHS-susceptible than in PHS-resistant chicks. An artificial neural network accurately distinguished between the two groups based on growth patterns in this period. 3. In the US, the unilateral pulmonary occlusion technique was used in collaboration with a major broiler breeding company to create a commercial broiler line that is highly resistant to PHS induced by fast growth and low ambient temperatures. 4. In Israel, lines were obtained by genetic selection on PHS mortality after cold exposure in a dam-line population comprising 85 sire families. The wide range of PHS incidence per family (0-50%), the high heritability (about 0.6), and the results in cold-challenged progeny suggested a highly effective and relatively easy means of selection for PHS resistance. 5. The best minimally invasive diagnostic indices for prediction of PHS resistance were found to be oximetry, hematocrit values, heart rate, and electrocardiographic (ECG) lead II waves. Some differences in results were found between the US and Israeli teams, probably reflecting genetic differences in the broiler strains used in the two countries. For instance, the US team found the S wave amplitude to predict PHS susceptibility well, whereas the Israeli team found the P wave amplitude to be a better predictor. 6. 
Comprehensive physiological studies further increased knowledge on the development of PHS: cardiopulmonary characteristics of pre-ascitic birds, pulmonary arterial wedge pressures, hypotension/kidney response, and pulmonary hemodynamic responses to vasoactive mediators were all examined in depth. Implications, scientific and agricultural: Substantial progress has been made in understanding the genetic and environmental factors involved in PHS, and their interaction. The two teams each successfully developed different selection programs, by surgical means and by divergent selection under cold challenge. Monitoring of the progress and success of the programs was done by using the in-depth estimations that this research engendered on the reliability and value of non-invasive predictive parameters. These findings helped corroborate the validity of practical means to improve PHS resistance by research-based programs of selection.
5

Deep Learning Damage Identification Method for Steel-Frame Bracing Structures Using Time–Frequency Analysis and Convolutional Neural Networks. The Hong Kong Institute of Steel Construction, December 2023. http://dx.doi.org/10.18057/ijasc.2023.19.4.8.

Abstract:
Lattice bracing, commonly used in steel construction systems, is vulnerable to damage and failure when subjected to horizontal seismic pressure. To identify damage, manual examination is the conventional method applied. However, this approach is time-consuming and typically unable to detect damage in its early stage. Determining the exact location of damage has been problematic for researchers. Nevertheless, detecting the failure of lateral supports in various parts of a structure using time–frequency analysis and deep learning methods, such as convolutional neural networks, is possible. Then, the damaged structure can be rapidly rebuilt to ensure safety. Experiments are conducted to determine the vibration acceleration modes of a four-storey steel structure considering various support structure damage scenarios. The acceleration signals at each measurement point are then analysed with respect to time and frequency to generate appropriate three-dimensional spectral matrices. In this study, the MobileNetV2 deep learning model was trained on a labelled picture collection of damaged matrix images. Hyperparameter tweaking and training resulted in a prediction accuracy of 97.37% for the complete dataset and 99.30% and 96.23% for the training and testing sets, respectively. The findings indicate that a combination of time–frequency analysis and deep learning methods may pinpoint the position of the damaged steel frame support components more accurately.
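The time–frequency step described in this abstract, turning an acceleration signal into a matrix that an image classifier can consume, can be sketched with a short-time Fourier transform. The abstract does not say which transform, window, or frame sizes the authors used, so the Hann window, the function name `spectrogram`, and the parameters below are assumptions for illustration only.

```python
import cmath
import math


def spectrogram(signal, frame_len, hop):
    """Return a time-frequency matrix: one row of DFT magnitudes per frame."""
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Hann window to reduce spectral leakage at the frame edges.
        windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len))
                    for n, s in enumerate(frame)]
        mags = []
        for k in range(frame_len // 2 + 1):   # non-negative frequencies only
            acc = sum(v * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, v in enumerate(windowed))
            mags.append(abs(acc))
        rows.append(mags)
    return rows
```

Each row is one time frame and each column one frequency bin; a change in structural stiffness would be expected to shift energy between bins, which is the kind of pattern a convolutional network can learn from such matrices. A direct DFT is used here for clarity; a real pipeline would use an FFT library.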