Dissertations / Theses on the topic 'Neural network accelerator'
Consult the top 50 dissertations / theses for your research on the topic 'Neural network accelerator.'
Tianxu, Yue. "Convolutional Neural Network FPGA-accelerator on Intel DE10-Standard FPGA." Thesis, Linköpings universitet, Elektroniska Kretsar och System, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-178174.
Oudrhiri, Ali. "Performance of a Neural Network Accelerator Architecture and its Optimization Using a Pipeline-Based Approach." Electronic Thesis or Diss., Sorbonne université, 2023. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2023SORUS658.pdf.
In recent years, neural networks have gained widespread popularity for their versatility and effectiveness in solving a wide range of complex tasks. Their ability to learn and make predictions from large datasets has revolutionized various fields. However, as neural networks find applications in an ever-expanding array of domains, their significant computational requirements become a pressing challenge. This demand is particularly problematic when deploying neural networks in resource-constrained embedded devices, especially within the context of edge computing for inference tasks. Nowadays, neural network accelerator chips emerge as the optimal choice for supporting neural networks at the edge. These chips offer remarkable efficiency with their compact size, low power consumption, and reduced latency. Moreover, integration on a single chip also enhances security by minimizing external data communication. In edge computing, diverse requirements have emerged, necessitating trade-offs in various performance aspects. This has led to the development of accelerator architectures that are highly configurable, allowing them to adapt to distinct performance demands. In this context, the focus lies on Gemini, a configurable inference neural network accelerator designed with an imposed architecture and implemented using High-Level Synthesis techniques. The considerations for its design and implementation were driven by the need for parallelization configurability and performance optimization. Once this accelerator was designed, demonstrating the power of its configurability became essential, helping users select the most suitable architecture for their neural networks. To achieve this objective, this thesis contributed to the development of a performance prediction strategy operating at a high level of abstraction, which considers the chosen architecture and neural network configuration. This tool assists clients in making decisions regarding the appropriate architecture for their specific neural network applications. During the research, we noticed that a single accelerator has several limits and that increasing parallelism yields diminishing returns in performance. Consequently, we adopted a new strategy for optimizing neural network acceleration: a high-level approach that does not require fine-grained accelerator optimizations. We organized multiple Gemini instances into a pipeline and allocated layers to different accelerators to maximize performance. We proposed solutions for two scenarios. In the user scenario, the pipeline structure is predefined, with a fixed number of accelerators, accelerator configurations, and RAM sizes; we proposed solutions to map the layers onto the different accelerators to optimize execution performance. In the designer scenario, the pipeline structure is not fixed; the number and configuration of the accelerators may be chosen to optimize execution as well as hardware performance. This pipeline strategy has proven to be effective for the Gemini accelerator. Although this thesis originated from a specific industrial need, certain solutions developed during the research can be applied or adapted to other neural network accelerators. Notably, the performance prediction strategy and the high-level optimization of NN processing through pipelining multiple instances offer valuable insights for broader application.
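The user-scenario mapping described in this abstract is, at its core, a pipeline partitioning problem. As a hedged illustration only (the thesis's own algorithm and latency figures are not reproduced here), the Python sketch below splits an ordered list of hypothetical per-layer latencies into contiguous stages, one per accelerator, so that the bottleneck stage is as fast as possible:

```python
# Sketch of the "user scenario": map an ordered list of CNN layers onto a
# fixed pipeline of k accelerators so that the slowest stage (the pipeline
# bottleneck) is minimized. Layer latencies are hypothetical estimates,
# not numbers from the thesis.
from itertools import accumulate

def best_pipeline_split(latencies, k):
    """Classic linear-partition DP: contiguous layers per accelerator."""
    n = len(latencies)
    prefix = [0] + list(accumulate(latencies))
    INF = float("inf")
    # dp[i][j] = minimal bottleneck latency for the first i layers on j stages
    dp = [[INF] * (k + 1) for _ in range(n + 1)]
    cut = [[0] * (k + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(1, n + 1):
            for p in range(j - 1, i):            # last stage = layers p..i-1
                cost = max(dp[p][j - 1], prefix[i] - prefix[p])
                if cost < dp[i][j]:
                    dp[i][j], cut[i][j] = cost, p
    # Recover the stage boundaries
    bounds, i = [], n
    for j in range(k, 0, -1):
        bounds.append((cut[i][j], i))
        i = cut[i][j]
    return dp[n][k], list(reversed(bounds))

# Example: 8 layer latencies (ms) mapped onto 3 pipelined accelerators
bottleneck, stages = best_pipeline_split([4, 2, 7, 1, 3, 5, 2, 6], 3)
print(bottleneck, stages)   # throughput is 1/bottleneck once the pipe fills
```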
Maltoni, Pietro. "Progetto di un acceleratore hardware per layer di convoluzioni depthwise in applicazioni di Deep Neural Network." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24205/.
Xu, Hongjie. "Energy-Efficient On-Chip Cache Architectures and Deep Neural Network Accelerators Considering the Cost of Data Movement." Doctoral thesis, Kyoto University, 2021. http://hdl.handle.net/2433/263786.
Kyoto University (京都大学)
New-system doctoral program, Doctor of Informatics
Degree No. Kou 23325 (Joho-haku No. 761)
Department of Communications and Computer Engineering, Graduate School of Informatics, Kyoto University
Examination committee: Prof. Hidetoshi Onodera (chair), Prof. Eiji Oki, Prof. Takashi Sato
Qualified under Article 4, Paragraph 1 of the Degree Regulations
Riera Villanueva, Marc. "Low-power accelerators for cognitive computing." Doctoral thesis, Universitat Politècnica de Catalunya, 2020. http://hdl.handle.net/10803/669828.
Deep neural networks (DNNs) have achieved tremendous success in cognitive applications and are especially efficient in classification and decision-making problems such as speech recognition or machine translation. Mobile devices increasingly rely on DNNs to understand the world: smartphones, smartwatches, and even cars perform discriminative tasks such as face or object recognition on a daily basis. Despite the growing popularity of DNNs, running them on mobile systems presents several challenges: providing high accuracy and performance within a small memory and energy budget. Modern DNNs consist of millions of parameters that require enormous computational and memory resources and therefore cannot be used directly in low-power, resource-constrained systems. The goal of this thesis is to address these problems and propose new solutions for designing efficient accelerators for DNN-based cognitive computing systems. First, we focus on optimizing DNN inference for sequence-processing applications. We analyze the similarity of inputs between consecutive DNN executions. We then propose DISC, an accelerator implementing a differential computation technique, based on the high degree of similarity between inputs, that reuses the computations of the previous execution instead of computing the whole network. We observe that, on average, more than 60% of the inputs of any layer of the DNNs studied exhibit only minor changes with respect to the previous execution. Avoiding the memory accesses and computations for these inputs yields an average energy saving of 63%. Second, we propose to optimize the inference of DNNs based on FC layers. We first analyze the number of unique weights per input neuron in several networks. Exploiting common optimizations such as linear quantization, we observe a very small number of unique weights per input in several FC layers of modern DNNs. To improve the energy efficiency of FC-layer computation, we then present CREW, an accelerator implementing an efficient mechanism for computation reuse and weight storage. CREW reduces the number of multiplications and provides significant savings in memory usage. We evaluate CREW on a diverse set of modern DNNs; on average, it provides a 2.61x performance improvement and 2.42x energy savings. Third, we propose a mechanism to optimize RNN inference. Recurrent cells perform element-wise multiplications of the activations of different gates, with sigmoid and tanh as the usual activation functions. We analyze the activation values and show that a significant fraction of them is saturated towards zero or one in a set of popular RNNs. We then propose CGPA to dynamically prune RNN activations at a coarse granularity. CGPA avoids the evaluation of entire neurons whenever the outputs of paired neurons are saturated. CGPA significantly reduces the amount of computation and memory accesses, achieving on average a 12% improvement in performance and energy savings. Finally, in the last contribution of this thesis we focus on static DNN pruning methodologies. Pruning reduces the memory footprint and computational work by removing redundant connections or neurons. However, we show that previous pruning schemes use a very long iterative process that requires training the DNN many times to tune the pruning parameters. We then propose a pruning scheme, based on principal component analysis and the relative importance of each neuron's connections, that automatically optimizes the DNN in a single shot without the need to manually tune multiple parameters.
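The differential computation idea behind DISC can be sketched in a few lines: cache each input's partial products and recompute only the inputs that changed more than a threshold. Everything below (layer shape, threshold, change rate) is an illustrative assumption, not data from the thesis:

```python
# Minimal sketch of computation reuse via input similarity: if an input
# element barely changed since the previous execution, skip its memory
# accesses and multiplies and keep the cached partial product.
import numpy as np

def fc_layer_differential(x, prev_x, prev_products, weights, threshold=1e-3):
    """y = W @ x, recomputing only the inputs that changed noticeably."""
    changed = np.abs(x - prev_x) > threshold
    # prev_products[:, i] caches weights[:, i] * prev_x[i] for every input i
    for i in np.flatnonzero(changed):
        prev_products[:, i] = weights[:, i] * x[i]
    reuse_rate = 1.0 - changed.mean()
    return prev_products.sum(axis=1), reuse_rate

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256))
x0 = rng.standard_normal(256)
cache = W * x0                       # products from the "previous" execution
x1 = x0 + (rng.random(256) < 0.3) * rng.standard_normal(256) * 0.1
y, reuse = fc_layer_differential(x1, x0, cache, W)
print(f"{reuse:.0%} of the inputs reused")  # ~70% here, vs >60% in the thesis
```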
Khan, Muhammad Jazib. "Programmable Address Generation Unit for Deep Neural Network Accelerators." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-271884.
Convolutional Neural Networks are becoming more and more popular due to their applications in revolutionary technologies such as autonomous driving, biomedical image processing, and natural language processing. With this increase in adoption, the complexity of the underlying algorithms also grows. This has implications for the computational platforms, such as GPU-, FPGA-, or ASIC-based accelerators, and especially for the Address Generation Unit (AGU), which is responsible for memory accesses. Existing accelerators normally have parametrizable-datapath AGUs, which have very limited adaptability to evolving algorithms. New hardware is therefore required for new algorithms, which is a very inefficient approach in terms of time, resources, and reusability. In this research, six algorithms with different implications for address-generation hardware are evaluated, and a fully programmable AGU (PAGU) that can adapt to these algorithms is presented. The algorithms are standard, strided, dilated, upsampled, and padded convolution, and max pooling. The proposed AGU architecture is a Very Long Instruction Word based application-specific instruction-set processor with specialized components, such as hardware counters and zero-overhead loops, and a powerful Instruction Set Architecture (ISA) that can model static and dynamic constraints as well as affine and non-affine address equations. The goal has been to minimize the flexibility versus area, power, and performance trade-off. For a real test network for semantic segmentation, the results show that PAGU achieves close to ideal performance, 1 cycle per address, for all the algorithms considered, except upsampled convolution, for which it is 1.7 cycles per address. The area of PAGU is approximately 4.6 times larger than the parametrizable-datapath approach, which is still reasonable given the large flexibility benefits. The potential of PAGU is not limited to neural network applications; it extends to more general digital signal processing domains, which can be explored in the future.
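The address patterns PAGU must cover are all affine in the loop indices, which is what makes a programmable AGU with hardware counters and zero-overhead loops feasible. Below is a software rendering of those address equations, with an assumed row-major layout and illustrative parameter names:

```python
# Sketch of the affine address equations a convolution AGU must generate;
# strided, dilated, and padded convolution all fall out of one equation.
def conv_input_addresses(H, W, K, stride=1, dilation=1, pad=0):
    """Yield (out_y, out_x, addr) for a row-major HxW input feature map."""
    out_h = (H + 2 * pad - dilation * (K - 1) - 1) // stride + 1
    out_w = (W + 2 * pad - dilation * (K - 1) - 1) // stride + 1
    for oy in range(out_h):
        for ox in range(out_w):
            for ky in range(K):
                for kx in range(K):
                    iy = oy * stride + ky * dilation - pad
                    ix = ox * stride + kx * dilation - pad
                    if 0 <= iy < H and 0 <= ix < W:      # skip padding taps
                        yield oy, ox, iy * W + ix        # affine in all indices

# First generated access of a strided, dilated, padded 3x3 convolution:
print(next(iter(conv_input_addresses(8, 8, 3, stride=2, dilation=2, pad=1))))
```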
Jalasutram, Rommel. "Acceleration of spiking neural networks on multicore architectures." Connect to this title online, 2009. http://etd.lib.clemson.edu/documents/1252424720/.
Han, Bing. "Acceleration of Spiking Neural Network on General Purpose Graphics Processors." University of Dayton / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1271368713.
Chen, Yu-Hsin Ph D. Massachusetts Institute of Technology. "Architecture design for highly flexible and energy-efficient deep neural network accelerators." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/117838.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 141-147).
Deep neural networks (DNNs) are the backbone of modern artificial intelligence (AI). However, due to their high computational complexity and diverse shapes and sizes, dedicated accelerators that can achieve high performance and energy efficiency across a wide range of DNNs are critical for enabling AI in real-world applications. To address this, we present Eyeriss, a co-design of software and hardware architecture for DNN processing that is optimized for performance, energy efficiency and flexibility. Eyeriss features a novel Row-Stationary (RS) dataflow to minimize data movement when processing a DNN, which is the bottleneck of both performance and energy efficiency. The RS dataflow supports highly-parallel processing while fully exploiting data reuse in a multi-level memory hierarchy to optimize for the overall system energy efficiency given any DNN shape and size. It achieves 1.4x to 2.5x higher energy efficiency than other existing dataflows. To support the RS dataflow, we present two versions of the Eyeriss architecture. Eyeriss v1 targets large DNNs that have plenty of data reuse. It features a flexible mapping strategy for high performance and a multicast on-chip network (NoC) for high data reuse, and further exploits data sparsity to reduce processing element (PE) power by 45% and off-chip bandwidth by up to 1.9x. Fabricated in a 65nm CMOS, Eyeriss v1 consumes 278 mW at 34.7 fps for the CONV layers of AlexNet, which is 10x more efficient than a mobile GPU. Eyeriss v2 addresses support for the emerging compact DNNs that introduce higher variation in data reuse. It features a RS+ dataflow that improves PE utilization, and a flexible and scalable NoC that adapts to the bandwidth requirement while also exploiting available data reuse. Together, they provide over 10x higher throughput than Eyeriss v1 at 256 PEs. Eyeriss v2 also exploits sparsity and SIMD for an additional 6x increase in throughput.
by Yu-Hsin Chen.
Ph. D.
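A toy model of the energy argument in the Eyeriss abstract above: weight the access count at each memory level by its relative energy cost, so a dataflow that shifts traffic from DRAM to the register file wins even while doing the same arithmetic. The cost ratios and access counts below are ballpark illustrative figures, not numbers from the thesis:

```python
# Data movement, not arithmetic, as the energy bottleneck: a dataflow is
# scored by its accesses per memory level times the level's relative cost.
RELATIVE_ENERGY = {"DRAM": 200.0, "global_buffer": 6.0, "NoC": 2.0, "RF": 1.0}

def movement_energy(access_counts):
    """access_counts: accesses per level, e.g. from a dataflow simulator."""
    return sum(RELATIVE_ENERGY[lvl] * n for lvl, n in access_counts.items())

# More reuse shifts accesses from DRAM toward the register file, which is
# how a reuse-maximizing mapping like Row-Stationary wins on energy:
naive = {"DRAM": 1e6, "global_buffer": 2e6, "NoC": 4e6, "RF": 8e6}
row_stationary = {"DRAM": 2e5, "global_buffer": 1e6, "NoC": 4e6, "RF": 2e7}
print(movement_energy(naive) / movement_energy(row_stationary))  # > 1
```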
Gaura, Elena Ioana. "Neural network techniques for the control and identification of acceleration sensors." Thesis, Coventry University, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.313132.
Anderson, Thomas. "Built-In Self Training of Hardware-Based Neural Networks." University of Cincinnati / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1512039036199393.
Wijekoon, Jayawan. "Mixed signal VLSI circuit implementation of the cortical microcircuit models." Thesis, University of Manchester, 2011. https://www.research.manchester.ac.uk/portal/en/theses/mixed-signal-vlsi-circuit-implementation-of-the-cortical-microcircuit-models(6deb2d34-5811-42ec-a4f1-e11cdb6816f1).html.
Ngo, Kalle. "FPGA Hardware Acceleration of Inception Style Parameter Reduced Convolution Neural Networks." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-205026.
Reiche Myrgård, Martin. "Acceleration of deep convolutional neural networks on multiprocessor system-on-chip." Thesis, Uppsala universitet, Avdelningen för datorteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-385904.
Silfa, Franyell. "Energy-efficient architectures for recurrent neural networks." Doctoral thesis, Universitat Politècnica de Catalunya, 2021. http://hdl.handle.net/10803/671448.
Deep learning algorithms have achieved remarkable success in applications such as automatic speech recognition and machine translation. As a result, these applications are ubiquitous in our lives and are found in a multitude of devices. These algorithms are composed of Deep Neural Networks (DNNs), such as Convolutional Neural Networks and Recurrent Neural Networks (RNNs), which have a large number of parameters and computations. Implementing DNNs on mobile devices and servers is therefore challenging due to their memory and energy requirements. RNNs are used to solve sequence-to-sequence problems such as machine translation. They contain data dependencies between the executions of each time-step, so the amount of parallelism is limited, and evaluating RNNs in an energy-efficient manner is challenging. In this thesis, we study RNNs to improve their energy efficiency on specialized architectures, proposing energy-saving techniques and highly efficient architectures tailored to RNN evaluation. First, we characterize a set of RNNs running on a SoC. We identify that accessing memory to read the weights is the largest source of energy consumption, reaching up to 80%. We therefore create E-PUR, a processing unit for RNNs. E-PUR achieves a 6.8x speedup and improves energy consumption by 88x compared with the SoC. These improvements come from maximizing the temporal locality of the weights. In E-PUR, reading the weights accounts for most of the energy consumption, so we focus on reducing memory accesses and create a scheme that reuses previously computed results. The observation is that, when evaluating the input sequences of an RNN, the output of a given neuron tends to change only slightly between consecutive evaluations. We therefore devise a scheme that caches the neurons' outputs and reuses them whenever it detects a small change between the current output value and the previous one, avoiding the weight reads. To decide when to use a previous computation, we employ a Binary Neural Network (BNN) as a reuse predictor, since its output is highly correlated with the output of the RNN. This proposal avoids more than 24.2% of the computations and reduces average energy consumption by 18.5%. The memory footprint of RNN models is usually reduced by using low precision for the evaluation and storage of the weights. In this case, the minimum precision used is identified statically and is set such that the RNN maintains its accuracy. Normally, this method uses the same precision for all computations. However, we observe that some computations can be evaluated with lower precision without affecting accuracy. We therefore devise a technique that dynamically selects the precision used to compute each time-step. A challenge of this proposal is how to choose a lower precision: we address it by recognizing that the result of a previous evaluation can be used to determine the precision required in the current time-step. Our scheme evaluates 57% of the computations with a precision lower than the fixed precision employed by static methods. Finally, evaluation on E-PUR shows a 1.46x speedup with an average energy saving of 19.2%.
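The neuron-output caching scheme in this abstract can be illustrated with a small sketch: compare a cheap preview of each output against the cached previous value and keep the cache wherever the change is small. Here a half-precision preview stands in for the thesis's BNN reuse predictor, and the threshold and sizes are assumptions:

```python
# Sketch of output reuse across consecutive time-steps of a recurrent cell.
import numpy as np

def reuse_step(x, W, cached_out, threshold=0.05):
    # Cheap low-precision preview in place of the BNN predictor
    preview = np.tanh(W.astype(np.float16) @ x.astype(np.float16))
    reuse = np.abs(preview.astype(np.float32) - cached_out) < threshold
    # In hardware, weight reads happen only for the ~reuse rows
    out = np.where(reuse, cached_out, np.tanh(W @ x))
    return out, reuse.mean()

rng = np.random.default_rng(1)
W = rng.standard_normal((128, 128)).astype(np.float32) * 0.1
x_prev = rng.standard_normal(128).astype(np.float32)
cached = np.tanh(W @ x_prev)                      # previous time-step outputs
x_new = x_prev + 0.05 * rng.standard_normal(128).astype(np.float32)
out, frac = reuse_step(x_new, W, cached)
print(f"reused {frac:.0%} of neuron evaluations")
```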
Torcolacci, Veronica. "Implementation of Machine Learning Algorithms on Hardware Accelerators." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020.
Tran, Ba-Hien. "Advancing Bayesian Deep Learning : Sensible Priors and Accelerated Inference." Electronic Thesis or Diss., Sorbonne université, 2023. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2023SORUS280.pdf.
Over the past decade, deep learning has witnessed remarkable success in a wide range of applications, revolutionizing various fields with its unprecedented performance. However, a fundamental limitation of deep learning models lies in their inability to accurately quantify prediction uncertainty, posing challenges for applications that demand robust risk assessment. Fortunately, Bayesian deep learning provides a promising solution by adopting a Bayesian formulation for neural networks. Despite significant progress in recent years, several challenges remain that hinder the widespread adoption and applicability of Bayesian deep learning. In this thesis, we address some of these challenges by proposing solutions to choose sensible priors and accelerate inference for Bayesian deep learning models. The first contribution of the thesis is a study of the pathologies associated with poor choices of priors for Bayesian neural networks in supervised learning tasks, and a proposal to tackle this problem in a practical and effective way. Specifically, our approach involves reasoning in terms of functional priors, which are more easily elicited, and adjusting the priors of neural network parameters to align with these functional priors. The second contribution is a novel framework for conducting model selection for Bayesian autoencoders in unsupervised tasks such as representation learning and generative modeling. To this end, we reason about the marginal likelihood of these models in terms of functional priors and propose a fully sample-based approach for its optimization. The third contribution is a novel fully Bayesian autoencoder model that treats both the local latent variables and the global decoder in a Bayesian fashion. We propose an efficient amortized MCMC scheme for this model and impose sparse Gaussian process priors over the latent space to capture correlations between latent encodings. The last contribution is a simple yet effective approach to improve likelihood-based generative models through data mollification. This accelerates inference for these models by allowing accurate density estimation in low-density regions while addressing manifold overfitting.
Carreras, Marco. "Acceleration of Artificial Neural Networks at the edge: adapting flexibly to emerging devices and models." Doctoral thesis, Università degli Studi di Cagliari, 2022. http://hdl.handle.net/11584/333521.
Hofmann, Jaco [Verfasser], Andreas [Akademischer Betreuer] Koch, and Mladen [Akademischer Betreuer] Berekovic. "An Improved Framework for and Case Studies in FPGA-Based Application Acceleration - Computer Vision, In-Network Processing and Spiking Neural Networks / Jaco Hofmann ; Andreas Koch, Mladen Berekovic." Darmstadt : Universitäts- und Landesbibliothek Darmstadt, 2020. http://d-nb.info/1202923097/34.
Mealey, Thomas C. "Binary Recurrent Unit: Using FPGA Hardware to Accelerate Inference in Long Short-Term Memory Neural Networks." University of Dayton / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1524402925375566.
Li, Zhuoer. "Étude de l'accélération matérielle reconfigurable pour les réseaux de neurones embarqués faible consommation." Electronic Thesis or Diss., Université Côte d'Azur, 2024. http://www.theses.fr/2024COAZ4025.
The Artificial Intelligence (AI) application landscape is undergoing a significant transformation as AI continues to permeate various fields, from healthcare to the automotive industry, leading to a surge in the demand for deploying neural networks directly on edge devices. This shift towards edge computing presents a set of challenges and requirements, particularly in terms of latency and energy efficiency. However, as deep learning evolves to address increasingly complex tasks, the complexity and depth of neural network models have also grown. Given that edge devices often operate in environments where energy consumption is limited, it becomes crucial to maximize energy efficiency while ensuring the complex functionalities of deep neural networks. Despite the evident demand, there remains a relative lack of strategies specifically aimed at reducing the energy cost of neural networks. This thesis aims to address these questions by investigating and evaluating methodologies for low-power neural network implementations on Field-Programmable Gate Arrays (FPGAs), known for their advantageous performance-to-power ratio and adaptability. This work implements various CNN topologies on FPGA platforms using High-Level Synthesis (HLS) methods in cooperation with other existing design and exploration approaches. The thesis then delves into a detailed comparison of the performance and energy efficiency of Artificial Neural Networks (ANNs) and their corresponding Spiking Neural Networks (SNNs) implemented on FPGA platforms. This comparison reveals an unexpected inefficiency in the FPGA implementations of SNNs compared to their equivalent ANNs. Building on these conclusions, the work then explores power optimization methods for deep neural network inference on FPGAs, examining the potential of Partial Reconfiguration (PR) in this regard. The study investigates a full PR methodology, based on the cooperation of relevant system-level methodologies and synthesis tools, to evaluate the energy efficiency of PR solutions and static solutions on several CNN benchmarks of different complexities. The results indicate that, under the main condition of using an optimal reconfiguration control scheme, energy gains can reach a factor of two when PR is applied at the CNN layer level, and savings increase as the network size grows. This result stands as one of the pioneering demonstrations of the great potential of PR for deep learning applications on FPGAs. Research in this domain is still at an early stage, yet it already shows large promise for minimizing the size of the programmable logic, particularly in terms of memory requirements. Additionally, this thesis contributes to an open neural network acceleration library with a compact 8-layer CNN for traffic sign recognition and a ResNet-18, with the relevant FPGA IPs to be made available online later.
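As a rough, hedged illustration of the factor-of-two energy gain reported for layer-level partial reconfiguration, the toy model below charges a static design full power for the whole run, while the PR design powers a smaller region but pays a reconfiguration cost between layers. All numbers are invented placeholders, not measurements from the thesis:

```python
# Toy energy model for the partial-reconfiguration (PR) trade-off: smaller
# active logic per layer versus a reconfiguration cost between layers.
def static_energy(layer_times, p_full):
    """Whole CNN resident in fabric: full power for the full runtime."""
    return sum(t * p_full for t in layer_times)

def pr_energy(layer_times, p_layer, t_reconf, p_reconf):
    """One layer resident at a time, plus per-swap reconfiguration energy."""
    run = sum(t * p_layer for t in layer_times)
    reconf = (len(layer_times) - 1) * t_reconf * p_reconf
    return run + reconf

layers = [4e-3, 6e-3, 5e-3]                  # seconds per layer (illustrative)
print(static_energy(layers, p_full=2.0))     # whole CNN resident: 0.030 J
print(pr_energy(layers, 0.8, 1e-3, 1.5))     # per-layer bitstreams: 0.015 J
```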
Wu, Gang. "Using GPU acceleration and a novel artificial neural networks approach for ultra-fast fluorescence lifetime imaging microscopy analysis." Thesis, University of Sussex, 2017. http://sro.sussex.ac.uk/id/eprint/71657/.
Kong, Yat Sheng [Verfasser], and Dieter [Akademischer Betreuer] Schramm. "Establishment of artificial neural network for suspension spring fatigue life prediction using strain and acceleration data / Yat Sheng Kong ; Betreuer: Dieter Schramm." Duisburg, 2019. http://d-nb.info/1191692558/34.
Viebke, André. "Accelerated Deep Learning using Intel Xeon Phi." Thesis, Linnéuniversitetet, Institutionen för datavetenskap (DV), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-45491.
Vogel, Sebastian A. A. [Verfasser], Gerd [Akademischer Betreuer] Ascheid, and Walter [Akademischer Betreuer] Stechele. "Design and implementation of number representations for efficient multiplierless acceleration of convolutional neural networks / Sebastian A. A. Vogel ; Gerd Ascheid, Walter Stechele." Aachen : Universitätsbibliothek der RWTH Aachen, 2020. http://d-nb.info/1220082716/34.
Axillus, Viktor. "Comparing Julia and Python : An investigation of the performance on image processing with deep neural networks and classification." Thesis, Blekinge Tekniska Högskola, Institutionen för programvaruteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-19160.
Slouka, Lukáš. "Implementace neuronové sítě bez operace násobení." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2018. http://www.nusl.cz/ntk/nusl-386017.
Petrini, Alessandro. "High Performance Computing Machine Learning Methods for Precision Medicine." Doctoral thesis, Università degli Studi di Milano, 2021. http://hdl.handle.net/2434/817104.
Precision Medicine is a new paradigm which is reshaping several aspects of clinical practice, representing a major departure from the "one size fits all" approach of classical diagnosis and prevention. Its main goal is to find personalized prevention measures and treatments on the basis of the personal history, lifestyle, and specific genetic factors of each individual. Three factors contributed to the rapid rise of Precision Medicine approaches: the ability to quickly and cheaply generate vast amounts of biological and omics data, mainly thanks to Next-Generation Sequencing; the ability to efficiently access this vast amount of data, under the Big Data paradigm; and the ability to automatically extract relevant information from data, thanks to innovative and highly sophisticated data processing and analytical techniques. Machine Learning has in recent years revolutionized data analysis and predictive inference, influencing almost every field of research. Moreover, high-throughput bio-technologies pose additional challenges to effectively manage and process Big Data in Medicine, requiring novel specialized Machine Learning methods and High Performance Computing techniques well-tailored to process and extract knowledge from big bio-medical data. In this thesis we present three High Performance Computing Machine Learning techniques that have been designed and developed for tackling three fundamental and still open questions in the context of Precision and Genomic Medicine: (i) identification of pathogenic and deleterious genomic variants among the "sea" of neutral variants in the non-coding regions of the DNA; (ii) detection of the activity of regulatory regions across different cell lines and tissues; and (iii) automatic protein function prediction and drug repurposing in the context of biomolecular networks. For the first problem we developed parSMURF, a novel hyper-ensemble method able to deal with the huge data imbalance that characterizes the detection of pathogenic variants in the non-coding regulatory regions of the human genome. We implemented this approach with highly parallel computational techniques using supercomputing resources at CINECA (Marconi - KNL) and HPC Center Stuttgart (HLRS Apollo HAWK), obtaining state-of-the-art results. For the second problem we developed deep feed-forward and deep convolutional neural networks to process, respectively, epigenetic and DNA sequence data to detect active promoters and enhancers in specific tissues at genome-wide level, using GPU devices to parallelize the computation. Finally, we developed scalable semi-supervised graph-based Machine Learning algorithms based on parametrized Hopfield Networks to process large biological graphs in parallel on GPU devices, using a parallel coloring method that improves on the classical Luby greedy algorithm. We also present ongoing extensions of parSMURF, very recently awarded by the Partnership for Advanced Computing in Europe (PRACE) consortium, to further develop the algorithm, apply it to huge genomic data, and embed its results into Genomiser, a state-of-the-art computational tool for the detection of pathogenic variants associated with Mendelian genetic diseases, in the context of an international collaboration with the Jackson Lab for Genomic Medicine.
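The parallel evaluation of Hopfield networks mentioned above relies on graph coloring: nodes that share a color share no edge, so each color class can be updated concurrently in one GPU pass. The sketch below is plain sequential greedy coloring on a toy graph, shown only to make the invariant concrete; the thesis's contribution is a parallel variant improving on Luby's algorithm, which this does not reproduce:

```python
# Greedy graph coloring: same-colored nodes are non-adjacent, so they can
# be updated in the same parallel pass (one kernel launch per color class).
def greedy_coloring(adj):
    """adj: dict node -> set of neighbors. Returns node -> color."""
    colors = {}
    for node in sorted(adj, key=lambda n: -len(adj[n])):   # high degree first
        taken = {colors[nb] for nb in adj[node] if nb in colors}
        colors[node] = next(c for c in range(len(adj)) if c not in taken)
    return colors

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(greedy_coloring(adj))   # nodes sharing a color can run in the same pass
```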
Zmeškal, Jiří. "Extrémní učící se stroje pro předpovídání časových řad." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2018. http://www.nusl.cz/ntk/nusl-376967.
Jasovský, Filip. "Realizace superpočítače pomocí grafické karty." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2014. http://www.nusl.cz/ntk/nusl-220617.
Reoyo-Prats, Reine. "Etude du vieillissement de récepteurs solaires : estimation de propriétés thermophysiques par méthode photothermique associée aux outils issus de l'intelligence artificielle." Thesis, Perpignan, 2020. http://www.theses.fr/2020PERP0017.
The increasing energy consumption and the awareness of climate change induced by rising greenhouse gas emissions are driving a progressive change of the energy model. Technologies based on renewable resources, such as concentrated solar power (CSP) plants, have been developing for several decades, and the issue of their sustainability is studied in many research programs. This thesis contributes to the development of a methodology for the accelerated ageing of the materials used in CSP receivers, the component subjected to concentrated solar radiation. For this purpose, several experimental protocols are carried out, and their efficiency is examined in light of the evolution of the radiative properties of the materials (absorptivity, emissivity). On the other hand, thermophysical properties such as thermal conductivity and diffusivity are studied on a wider range of materials. Considering the limits of current characterization methods, a new method for estimating these properties is developed, based on artificial neural networks and photothermal experimental data.
Pradhan, Manoj Kumar. "Conformal Thermal Models for Optimal Loading and Elapsed Life Estimation of Power Transformers." Thesis, Indian Institute of Science, 2004. https://etd.iisc.ac.in/handle/2005/97.
Narmack, Kirill. "Dynamic Speed Adaptation for Curves using Machine Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233545.
Tomorrow's vehicles will be more sophisticated, intelligent, and safe than today's vehicles. The future is leaning towards fully autonomous vehicles. This master's thesis provides a data-driven solution for a speed adaptation system that can compute a vehicle speed for curves suitable for the driver's driving style, the road's properties, and the prevailing weather. A curve speed adaptation system aims to compute a vehicle speed for curves that can be used in Advanced Driver Assistance Systems (ADAS) or Autonomous Driving (AD) applications. This master's thesis was carried out at Volvo Car Corporation. Literature on speed adaptation systems and on factors affecting vehicle speed in curves was studied. Naturalistic driving data was collected by driving a car, extracted from Volvo's database, and processed. A new speed adaptation system was invented, implemented, and evaluated. The speed adaptation system proved capable of computing a vehicle speed appropriate for the driver's driving style under the prevailing weather conditions and road properties. Two different artificial neural networks and two mathematical models were used to compute the vehicle speed. These methods were compared and evaluated.
Jebelli, Ali. "Development of Sensors and Microcontrollers for Underwater Robots." Thesis, Université d'Ottawa / University of Ottawa, 2014. http://hdl.handle.net/10393/31283.
Lee, Heng, and 李亨. "Convolutional Neural Network Accelerator with Vector Quantization." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/w7kr56.
National Taiwan University
Graduate Institute of Electronics Engineering
Academic year 107 (2018-2019)
Deep neural networks (DNNs) have demonstrated impressive performance in many edge computer vision tasks, causing an increasing demand for DNN accelerators on mobile and internet of things (IoT) devices. However, the massive power consumption and storage requirements make the hardware design challenging. In this work, we introduce a DNN accelerator based on vector quantization (VQ), a model compression technique that reduces the network model size and computation cost simultaneously. Moreover, a specialized processing element (PE) is designed with various SRAM bank configurations as well as dataflows such that it can support different codebook/kernel sizes and keep high utilization under small input or output channel numbers. Compared to the state-of-the-art, the proposed accelerator architecture achieves a 3.94x reduction in memory access and 1.2x in latency for batch-one inference.
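To make the vector-quantization mechanic concrete, the sketch below runs a fully connected layer from 8-bit codebook indices: per-input lookup tables of partial dot products replace full-precision multiplies. Sub-vector width, codebook size, and layer shape are illustrative assumptions, not the accelerator's actual configuration:

```python
# Minimal sketch of VQ-based inference: weights are stored as indices into
# a small codebook of sub-vectors, so dot products become table lookups.
import numpy as np

rng = np.random.default_rng(0)
D, V, C_in, C_out = 4, 256, 64, 32           # sub-vector dim, codebook size
codebook = rng.standard_normal((V, D)).astype(np.float32)
# Each output row of the compressed model = C_in/D codebook indices
indices = rng.integers(0, V, size=(C_out, C_in // D))

def vq_fc(x):
    """y[o] = sum over sub-vectors of <codebook[idx], x_subvec>."""
    x_sub = x.reshape(C_in // D, D)
    # One lookup table per sub-vector position: V partial dot products each
    luts = x_sub @ codebook.T                 # shape (C_in/D, V)
    return luts[np.arange(C_in // D), indices].sum(axis=1)

y = vq_fc(rng.standard_normal(C_in).astype(np.float32))
print(y.shape)   # (32,): each output needs only C_in/D lookups and adds
```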
Chen, Chun-Chen, and 陳俊辰. "Design Exploration Methodology for Deep Convolutional Neural Network Accelerator." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/sj63xu.
Hu, Yu-Lin, and 胡雨霖. "General Accelerator Study and Design for Convolutional Neural Network." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/86u346.
National Cheng Kung University
Department of Electrical Engineering
Academic year 107 (2018-2019)
The hardware design of convolutional neural networks (CNNs) faces the following problems: high computational complexity, a large amount of data movement, and structural divergence among different neural networks. Previous work has dealt well with the first two problems but fails to consider the third broadly. After analyzing state-of-the-art CNN accelerators and the design space they exploit, we develop a format that can describe the full design space. Based on our design space exploration and hardware evaluation, we propose a novel general CNN hardware accelerator, which contains hierarchical memory storage, a two-dimensional set of hardware processing units with variable length and width, and an elastic data distributor. Our work shows higher multiplier usage in FPGA results compared with previous FPGA designs. On the other hand, our work is as efficient as the two latest works in ASIC synthesis estimates.
Acharjee, Suvajit, and 蘇沃杰. "Hardware Efficient Accelerator for Binary Convolution Neural Network Inference." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/2c8792.
National Chiao Tung University
International Graduate Program of Electrical Engineering and Computer Science
Academic year 107 (2018-2019)
Binary neural networks (BNNs) are improving day by day for use in computer vision tasks such as recognition, object detection, and depth perception. However, most existing designs suffer from low hardware utilization or complex circuits that result in high hardware cost, and a large amount of computational redundancy still exists in BNN inference. To overcome these issues, this design adopts a systolic array architecture that takes binarized inputs and weights. Storage is drastically reduced, since weights and activations can each be stored as a single bit: +1 is stored as 1, and -1 is stored as 0. In addition, the computational complexity is reduced by replacing the MAC operations with bitwise operations. In this design, eight PEs are used, each processed in parallel with its own accumulator, and a 3x3 convolution kernel is filtered in each PE block. The throughput is increased, with an operating frequency between 125 MHz and a maximum of 188.67 MHz. Our results show that with eight PEs the design achieves 63.168 GOPS, which is 10x more area-efficient than other results. The power consumption after RTL synthesis simulation is 0.014 W. The architecture was implemented successfully in Xilinx ISE 14.7 using a Spartan-6 series FPGA. This design also shows better area and bandwidth efficiency compared to other state-of-the-art works.
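The bitwise replacement of MACs that this abstract relies on is conventionally realized as XNOR plus popcount over packed {-1,+1} values. A self-contained sketch with a 3x3 window packed into 9 bits; the encoding (1 for +1, 0 for -1) matches the abstract, while the rest is generic BNN practice rather than this specific design:

```python
# Binary dot product via XNOR + popcount: with values in {-1,+1} packed as
# bits, signs agree exactly where XNOR is 1.
def bin_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of n packed {-1,+1} values."""
    xnor = ~(a_bits ^ w_bits) & ((1 << n) - 1)    # 1 where signs agree
    matches = bin(xnor).count("1")
    return 2 * matches - n                         # (+1)*matches + (-1)*(n-matches)

# 3x3 binary convolution window, packed row-major into 9 bits
acts    = 0b101101110
weights = 0b100101010
print(bin_dot(acts, weights, 9))   # equals the sum of 9 signed products (5)
```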
Lin, Chien-Yu, and 林建宇. "Merlin: A Sparse Neural Network Accelerator Utilizing Both Neuron and Weight Sparsity." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/6aq7yc.
Wu, Yi-Heng, and 吳奕亨. "Compressing Convolutional Neural Network by Vector Quantization: Implementation and Accelerator Design." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/959vy5.
National Taiwan University
Graduate Institute of Electronics Engineering
Academic year 105 (2016-2017)
In recent years, deep convolutional neural networks (CNNs) have achieved ground-breaking success in many computer vision research fields. Due to the large model size and tremendous computation of CNNs, they cannot be efficiently executed on small devices like mobile phones. Although several hardware accelerator architectures have been developed, most of them can only efficiently address one of the two major layer types in CNNs, convolutional (CONV) and fully connected (FC) layers. In this thesis, based on algorithm-architecture co-exploration, our architecture targets executing both layer types with high efficiency. The vector quantization technique is first selected to compress the parameters, reduce the computation, and unify the behaviors of both CONV and FC layers. To fully exploit the gain of vector quantization, we then propose an accelerator architecture for quantized CNNs. Different DRAM access schemes are employed to reduce DRAM access and, subsequently, power consumption. We also design a high-throughput processing element architecture to accelerate quantized layers. Compared to state-of-the-art CNN accelerators, the proposed architecture achieves 1.2-5x less DRAM access and 1.5-5x higher throughput for both CONV and FC layers.
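Complementing the inference-side sketch given earlier for a related design, the compression step itself can be as simple as k-means over weight sub-vectors, after which every sub-vector is stored as one codebook index; this is what unifies CONV and FC layers, since both reduce to the same kind of lookup. A minimal sketch with assumed shapes and plain Lloyd iterations, not the thesis's training pipeline:

```python
# k-means vector quantization of a weight matrix: every d-wide sub-vector
# is replaced by the index of its nearest codebook entry.
import numpy as np

def vq_compress(weights, d=4, v=256, iters=20, seed=0):
    """Return (codebook, indices) for weights split into d-wide sub-vectors."""
    rng = np.random.default_rng(seed)
    subs = weights.reshape(-1, d)                       # all sub-vectors
    codebook = subs[rng.choice(len(subs), v, replace=False)].copy()
    for _ in range(iters):                              # plain Lloyd iterations
        d2 = ((subs[:, None, :] - codebook[None]) ** 2).sum(-1)
        idx = d2.argmin(1)
        for c in range(v):
            members = subs[idx == c]
            if len(members):
                codebook[c] = members.mean(0)
    return codebook, idx.reshape(weights.shape[0], -1)

W = np.random.default_rng(1).standard_normal((64, 64)).astype(np.float32)
cb, idx = vq_compress(W)
ratio = W.nbytes / (cb.nbytes + idx.size)               # 1 byte per index
print(f"compression ratio ~{ratio:.1f}x")
```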
Kung, Chu King, and 江子近. "An Energy-Efficient Accelerator SOC for Convolutional Neural Network Training." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/y475rn.
National Taiwan University
Graduate Institute of Electronics Engineering
Academic year 107 (2018-2019)
The recent resurgence of artificial intelligence is due to advances in deep learning. Deep neural networks (DNNs) have exceeded human capability in many computer vision applications, such as object detection, image classification, and playing games like Go. The idea of deep learning dates back to as early as the 1950s, with the key algorithmic breakthroughs occurring in the 1980s. Yet it has only been in the past few years that powerful hardware accelerators became available to train neural networks. Even now, the demand for machine learning algorithms is still increasing, and it is affecting almost every industry. Therefore, designing a powerful and efficient hardware accelerator for deep learning algorithms is of critical importance. Accelerators that run deep learning algorithms must be general enough to support deep neural networks with various computational structures. For instance, general-purpose graphics processing units (GP-GPUs) were widely adopted for deep learning tasks since they allow users to execute arbitrary code. Beyond GPUs, researchers have also paid much attention to hardware acceleration of deep neural networks in the last few years: Google developed its own chip, the Tensor Processing Unit (TPU), to power its machine learning services [8], while Intel unveiled its first generation of ASIC processors for deep learning, called Nervana, a few years ago [9]. ASICs usually provide better performance than FPGA and software implementations. Nevertheless, existing accelerators mostly focus on inference. However, local DNN training is still required to meet the needs of new applications, such as incremental learning and on-device personalization. Unlike inference, training requires high dynamic range in order to deliver high learning quality. In this work, we introduce the floating-point signed digit (FloatSD) data representation format for reducing the computational complexity required for both the inference and the training of a convolutional neural network (CNN). By co-designing data representation and circuit, we demonstrate that we can achieve high raw performance and optimal efficiency, both energy and area, without sacrificing the quality of training. This work focuses on the design of a FloatSD-based system on chip (SOC) for AI training and inference. The SOC consists of an AI IP, an integrated DDR3 controller, and an ARC HS34 CPU, connected through AXI/AHB standard AMBA interfaces. The platform can be programmed by the CPU via the AHB slave port to fit various neural network topologies. The completed SOC has been tested and validated on the HAPS-80 FPGA platform. A synthesis and automated place and route (APR) flow was used to tape out a 28 nm test chip after testing and verifying the correctness of the SOC. At its normal operating condition (e.g. 400 MHz), the accelerator is capable of 1.38 TFLOPS peak performance and 2.34 TFLOPS/W.
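FloatSD itself is defined in the cited work; as a generic illustration of why signed-digit representations make multipliers cheap, the sketch below recodes an integer weight into canonical signed digits in {-1, 0, +1}, after which a multiplication reduces to a few shifts and adds:

```python
# Canonical signed-digit (CSD/NAF) recoding: few nonzero digits per weight,
# so multiply-by-weight becomes shift-and-add hardware.
def to_signed_digits(x: int, n_bits: int = 8):
    """CSD recoding of a non-negative integer (least significant digit first)."""
    digits = []
    for _ in range(n_bits + 1):
        if x & 1:
            d = 2 - (x & 3)        # +1 if the next bit is 0, else -1 (NAF rule)
            x -= d
        else:
            d = 0
        digits.append(d)
        x >>= 1
    return digits

def sd_multiply(a: int, digits) -> int:
    """Multiply by shift-and-add over the nonzero signed digits only."""
    return sum(d * (a << i) for i, d in enumerate(digits) if d)

w = 118                            # binary 1110110: CSD needs 3 nonzero digits, not 5
digits = to_signed_digits(w)
assert sd_multiply(3, digits) == 3 * w
print(digits)                      # [0, -1, 0, -1, 0, 0, 0, 1, 0]: 128 - 8 - 2
```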
Chen, Chih-Chiang, and 陳致強. "Energy-Efficient Accelerator and Data Processing Flow for Convolutional Neural Network." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/fy245e.
National Chiao Tung University
Institute of Electronics
Academic year 106 (2017-2018)
In recent years, machine learning and convolutional neural networks (CNNs) have become among the most popular research topics. Restricted by immature hardware techniques, the topic could not be fully developed before. Since CNNs need a great deal of calculation and a large amount of data access and movement, the energy cost of data access may even exceed that of computation. Therefore, how to manage data reuse efficiently and reduce data access has become a research theme. In this thesis, we propose a processing element (PE) that reuses data effectively. Meanwhile, we propose a data processing flow with which data can be propagated between PEs, making data reuse more frequent. Besides, we propose a 3D/2.5D accelerator system architecture: transmitting data with TSVs can further decrease the energy consumption. We provide a comparison table of 2D, 2.5D, and 3D implementations in speed, power, and other metrics. We also propose an FPGA implementation design flow for reference and future research. We present a new, innovative reconfigurable accelerator for deep learning networks that has advantages for both computation-intensive and data-intensive applications. This reconfigurable computing hardware technique can mitigate the power and memory walls for computation- and data-intensive applications such as computer vision, computer graphics, convolutional neural networks, and deep learning networks.
Chen, Yi-Kai, and 陳奕愷. "Architecture Design of Energy-Efficient Reconfigurable Deep Convolutional Neural Network Accelerator." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/46a96s.
Juang, Tzung-Han, and 莊宗翰. "Energy-Efficient Accelerator Architecture for Neural Network Training and Its Circuit Design." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/sffx7b.
National Taiwan University
Graduate Institute of Electronics Engineering
Academic year 106 (2017-2018)
Artificial intelligence (AI) has become one of the most popular research topics in recent years. AI can be applied to image classification, object detection, and natural language processing; in particular, researchers have made breakthroughs in such fields with neural networks. Neural networks are known for their versatile and deep architectures, which can have more than hundreds of layers. Such structures make neural networks require large amounts of computation and memory. Improvements in hardware acceleration on graphics processing units (GPUs) made it possible to apply neural networks to practical applications. However, GPUs tend to be bulky and power hungry, so much research has focused on reducing the computational resources used by neural networks and on implementations in dedicated hardware. Most of these works only support acceleration of the inference phase. Beyond inference, this thesis proposes an architecture that also supports the training phase, based on the backpropagation algorithm for finding optimal neural network models. Training includes the forward pass, backward pass, and weight update, while inference only contains the forward pass. This thesis is devoted to designing a unified architecture that can process these three stages of training for convolutional neural networks (CNNs). In addition, IO bandwidth is always the bottleneck of accelerator design. To reduce data bandwidth, this thesis builds on the floating-point signed digit (FloatSD) algorithm and the quantization techniques of previous work to reduce the neural network size and the bit width of data values; the previous work reaches 0.8% loss of top-5 accuracy on the ImageNet dataset compared to the floating-point version. This thesis designs a hardware accelerator for training neural networks, including the data flow for processing, the AMBA interface, and the memory settings. The design is an IP-level engine that can be integrated into an SOC platform. In addition, this thesis focuses on optimizing data reuse to give the system efficient DRAM access. Keywords: convolutional neural network, backpropagation, FloatSD
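The three training stages this architecture unifies can be written out for a toy 1D convolution so the tensors each phase reads and writes are explicit. Shapes and the loss are illustrative only:

```python
# Forward pass, backward pass, and weight update for a tiny 1D convolution.
import numpy as np

def conv1d(x, w):                       # forward: y[i] = sum_k w[k] x[i+k]
    K = len(w)
    return np.array([x[i:i + K] @ w for i in range(len(x) - K + 1)])

def conv1d_backward(x, w, grad_y):
    K = len(w)
    # Weight gradient: correlate the input with the output gradient
    grad_w = np.array([x[k:k + len(grad_y)] @ grad_y for k in range(K)])
    # Input gradient: scatter each output gradient through the kernel
    grad_x = np.zeros_like(x)
    for i, g in enumerate(grad_y):
        grad_x[i:i + K] += g * w
    return grad_w, grad_x

rng = np.random.default_rng(0)
x, w = rng.standard_normal(16), rng.standard_normal(3)
y = conv1d(x, w)                                  # phase 1: forward
grad_w, grad_x = conv1d_backward(x, w, y - 1.0)   # phase 2: backward (toy loss)
w -= 0.01 * grad_w                                # phase 3: weight update
```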
Hsu, Lien-Chih, and 徐連志. "ESSA: An Energy-Aware Bit-Serial Streaming Deep Convolutional Neural Network Accelerator." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/859cgm.
National Tsing Hua University
Department of Computer Science
Academic year 107 (2018-2019)
Over the past decade, deep convolutional neural networks (CNNs) have been widely embraced in various visual recognition applications due to their extraordinary accuracy, which even surpasses that of human beings. However, the high computational complexity and massive data storage are two challenges for CNN hardware design. Although GPUs can deal with the high computational complexity, the large energy consumption due to huge external memory access has pushed researchers towards dedicated CNN accelerator designs. Generally, the precision of modern CNN accelerators is set to 16-bit fixed point. To reduce data storage, Sakr et al. [1] show that less precision can be used under the constraint of 1% accuracy degradation in recognition, and that per-layer precision assignment can reach lower bit-width requirements than a uniform precision assignment for all layers. In this work, we propose an energy-aware bit-serial streaming deep CNN accelerator to tackle the computational complexity, data storage, and external memory access issues. With the ring streaming dataflow and an output reuse strategy to decrease data access, the amount of external DRAM access for the convolutional layers is reduced by 357.26x on AlexNet compared to the case with no output reuse. In addition, we optimize hardware utilization and avoid unnecessary computations through loop tiling and by mapping the strides of convolutional layers to unit strides for better computational performance. Furthermore, the bit-serial processing element (PE) is designed to use fewer bits in the weights, which reduces both the amount of computation and the external memory access. We evaluate our design with the well-known roofline model, an efficient way to perform evaluation compared to real hardware implementation, and explore the design space to find the solution with the best computational performance and communication-to-computation (CTC) ratio. Assuming the same FPGA as Chen et al. [2], we can reach a 1.36x speedup and reduce energy consumption for external memory access by 41% compared to the design in [2]. For the hardware implementation of our PE-array architecture, the design reaches an operating frequency of 119 MHz and consumes 68k gates with a power consumption of 10.08 mW under TSMC 90 nm technology. Compared to the 15.4 MB of external memory access of Eyeriss [3] on the convolutional layers of AlexNet, our work only needs 4.36 MB, dramatically reducing the most energy-consuming component of power consumption.
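The roofline evaluation used above takes one line of arithmetic: attainable throughput is the lesser of the compute peak and the memory bandwidth times the CTC ratio. A sketch with placeholder FPGA figures, not the thesis's numbers:

```python
# Roofline model: a design point is either compute bound or memory bound.
def roofline(peak_gops: float, bandwidth_gbs: float, ctc_ops_per_byte: float):
    """Attainable performance (GOP/s) for a design point."""
    return min(peak_gops, bandwidth_gbs * ctc_ops_per_byte)

# Output reuse raises the CTC ratio by cutting DRAM traffic, moving the
# design point rightward until it hits the compute ceiling:
for ctc in (2, 8, 32):
    print(ctc, roofline(peak_gops=180.0, bandwidth_gbs=12.8, ctc_ops_per_byte=ctc))
# 2 -> 25.6 (memory bound), 8 -> 102.4, 32 -> 180.0 (compute bound)
```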
Shen, En-Ho, and 沈恩禾. "Reconfigurable Low Arithmetic Precision Convolution Neural Network Accelerator VLSI Design and Implementation." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/7678c2.
National Taiwan University
Graduate Institute of Electronics Engineering
Academic year 107 (2018-2019)
Deep neural networks (DNNs) show promising results on various AI application tasks. However, such networks are typically executed on general-purpose GPUs, bulky in form factor and consuming hundreds of watts, which makes them unsuitable for mobile applications. In this thesis, we present a VLSI architecture able to process quantized, low numeric precision convolutional neural networks (CNNs), cutting down on the power consumed by memory access and speeding the model up within a limited area budget, making it particularly fit for mobile devices. We first propose a quantization retraining algorithm for training low-precision CNNs, then a dataflow with a high data reuse rate and a multiplication-accumulation strategy specially designed for such quantized models. To fully utilize the efficiency of computation with such low-precision data, we design a micro-architecture for low bit-length multiplication and accumulation, an on-chip memory hierarchy and data re-alignment flow that saves power and avoids buffer bank conflicts, and a PE array that takes broadcast data from the buffer and sends finished data sequentially back to the buffer under this dataflow. The architecture is highly flexible for various CNN shapes and reconfigurable for low bit-length quantized models. The design is synthesized with 180 KB of on-chip memory capacity and a 1340k logic gate count, and the implementation results show state-of-the-art hardware efficiency.
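Quantization retraining of this general kind is commonly implemented with fake quantization plus a straight-through estimator. Since the thesis's exact scheme is not spelled out here, the sketch below assumes a uniform symmetric 4-bit quantizer and a toy regression loss:

```python
# Quantization-aware retraining sketch: forward with quantized weights,
# gradients applied to the full-precision master copy (straight-through).
import numpy as np

def fake_quantize(w, bits=4):
    """Uniform symmetric quantizer: w -> nearest of 2^bits levels."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax + 1e-12
    return np.round(w / scale).clip(-qmax - 1, qmax) * scale

def train_step(w_fp, x, y_target, lr=0.01):
    w_q = fake_quantize(w_fp)                # forward uses quantized weights
    y = w_q @ x
    grad_y = 2 * (y - y_target)              # d(MSE)/dy
    grad_w = np.outer(grad_y, x)             # STE: gradient passes through quantizer
    return w_fp - lr * grad_w                # update the full-precision copy

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16)) * 0.1
for _ in range(100):
    x = rng.standard_normal(16)
    w = train_step(w, x, x[:8])              # toy regression target
print(np.abs(w - fake_quantize(w)).max())    # residual quantization error
```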
Mohammadi, Mahnaz. "An Accelerator for Machine Learning Based Classifiers." Thesis, 2017. http://etd.iisc.ac.in/handle/2005/4245.
Lee, Yen-Hsing, and 李彥興. "Design of Low Complexity Convolutional Neural Network Accelerator for Finger-Vein Identification System." Thesis, 2019. http://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/login?o=dnclcdr&s=id=%22107NCHU5441060%22.&searchmode=basic.
National Chung Hsing University
Department of Electrical Engineering
Academic year 107 (2018-2019)
Vein identification is a vital branch of biometrics with strong invisibility and unique features. Veins are a physiological characteristic of human beings and must be irradiated with specific bands of light to obtain a complete vein image; identification from vein images is highly reliable because each person's veins have a unique pattern. Thanks to advanced computer technology, neural networks have rapidly become the mainstream classification method for images: by establishing a large database, we can use a neural network to analyze features, which then become the classification basis of the database. Moreover, users usually do not want their personal information uploaded to the cloud, so edge computing has become a vital issue for protecting user privacy. Inspired by these concepts, we propose a low-complexity convolutional neural network for finger vein recognition with a top-1 accuracy of 95%. This neural network system can operate independently in client mode. After fetching the user's finger vein image through a near-infrared camera mounted on a Raspberry Pi embedded board, the vein features can be efficiently extracted by vein curving algorithms and the user's identity quickly returned. To implement the concept of edge computing, our proposed system is characterized by a silicon intellectual property (SIP) block designed to shorten the inference time of the neural network: compared to an ARM Cortex-A9 dual-core CPU running in a Linux environment at 650 MHz, a 120x acceleration is obtained. Simulation and verification used the SDUMLA-HMT finger vein database [28], provided by the Machine Learning and Data Mining Laboratory of Shandong University, as well as our laboratory's own database; both datasets reach 95% accuracy for identifying 10 users when inferenced by our low-complexity neural network. With its low complexity, real-time identification can be achieved on the inference side, bringing the system closer to commercialization.
Wu, I.-Chen, and 吳易真. "An Energy-Efficient Accelerator with Relative-Indexing Memory for Sparse Compressed Convolutional Neural Network." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/tx6yx4.