Academic literature on the topic 'Reconfigurable Hardware Accelerator'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Reconfigurable Hardware Accelerator.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Reconfigurable Hardware Accelerator"

1

Xiong, Hao, Kelin Sun, Bing Zhang, Jingchuan Yang, and Huiping Xu. "Deep-Sea: A Reconfigurable Accelerator for Classic CNN." Wireless Communications and Mobile Computing 2022 (February 2, 2022): 1–23. http://dx.doi.org/10.1155/2022/4726652.

Full text
Abstract:
To meet the changing real-time requirements of edge engineering applications of CNNs, and to address the lack of universality and flexibility in ARM+FPGA-based CNN hardware acceleration architectures, a general low-power, fully pipelined CNN hardware acceleration architecture is proposed to cope with continuously updated CNN algorithms and to accelerate on hardware platforms with different resource constraints. Within the framework of the general hardware architecture, a basic instruction set belonging to the architecture is proposed, which can be used to compute and configure different versions of CNN algorithms. Based on the instruction set, the configurable computing subsystem, memory management subsystem, on-chip cache subsystem, and instruction execution subsystem are designed and implemented. In addition, in the processing of convolution results, the on-chip storage unit is used to preprocess the convolution results to speed up the activation and pooling calculations in parallel. Finally, the accelerator is modeled at the RTL level and deployed on the XC7Z100 heterogeneous device. The lightweight networks YOLOv2-tiny and YOLOv3-tiny, commonly used in engineering applications, are verified on the accelerator. The results show that the peak performance of the accelerator reaches 198.37 GOP/s, the clock frequency reaches 210 MHz, and the power consumption is 4.52 W at 16-bit width.
APA, Harvard, Vancouver, ISO, and other styles
2

An, Fubang, Lingli Wang, and Xuegong Zhou. "A High Performance Reconfigurable Hardware Architecture for Lightweight Convolutional Neural Network." Electronics 12, no. 13 (June 27, 2023): 2847. http://dx.doi.org/10.3390/electronics12132847.

Abstract:
Since the lightweight convolutional neural network EfficientNet was proposed by Google in 2019, this series of models has quickly become very popular due to its superior performance with a small number of parameters. However, existing convolutional neural network hardware accelerators for EfficientNet still have much room to improve the performance of the depthwise convolution, the squeeze-and-excitation module, and the nonlinear activation functions. In this paper, we first design a reconfigurable register array and computational kernel to accelerate the depthwise convolution. Next, we propose a vector unit to implement the nonlinear activation functions and the scale operation. An exchangeable-sequence dual-computational-kernel architecture is proposed to improve performance and utilization. In addition, the memory architectures are designed to complete the hardware accelerator for the above computing architecture. Finally, to evaluate its performance, the accelerator is implemented on a Xilinx XCVU37P. The results show that the proposed accelerator can run at a main system clock frequency of 300 MHz with the DSP kernel at 600 MHz. EfficientNet-B3 on our architecture reaches 69.50 FPS and 255.22 GOPS. Compared with the latest EfficientNet-B3 accelerator, which uses the same FPGA development board, the accelerator proposed in this paper achieves a 1.28-fold improvement in single-core performance and a 1.38-fold improvement in per-DSP performance.
3

Nakasato, N., T. Hamada, and T. Fukushige. "Galaxy Evolution with Reconfigurable Hardware Accelerator." EAS Publications Series 24 (2007): 291–92. http://dx.doi.org/10.1051/eas:2007043.

4

Ebrahim, Ali. "Finding the Top-K Heavy Hitters in Data Streams: A Reconfigurable Accelerator Based on an FPGA-Optimized Algorithm." Electronics 12, no. 11 (May 24, 2023): 2376. http://dx.doi.org/10.3390/electronics12112376.

Abstract:
This paper presents a novel approach for accelerating the top-k heavy hitters query in data streams using Field Programmable Gate Arrays (FPGAs). Current hardware acceleration approaches rely on the direct and strict mapping of software algorithms into hardware, limiting their performance and practicality due to the lack of hardware optimizations at an algorithmic level. The presented approach optimizes a well-known software algorithm by carefully relaxing some of its requirements to allow for the design of a practical and scalable hardware accelerator that outperforms current state-of-the-art accelerators while maintaining near-perfect accuracy. This paper details the design and implementation of an optimized FPGA accelerator specifically tailored for computing the top-k heavy hitters query in data streams. The presented accelerator is entirely specified at the C language level and is easily reproducible with High-Level Synthesis (HLS) tools. Implementation on Intel Arria 10 and Stratix 10 FPGAs using Intel HLS compiler showed promising results—outperforming prior state-of-the-art accelerators in terms of throughput and features.
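The abstract does not name the "well-known software algorithm" whose requirements are relaxed; a common software baseline for the top-k heavy hitters query is Space-Saving, sketched below for reference. The algorithm choice here is an assumption for illustration, not a claim about the paper.

```python
# Space-Saving: one-pass approximate tracking of the k heaviest stream items.
# On a miss with all k counters occupied, the minimum counter is evicted and
# its count (+1) is inherited, bounding the overestimate by the evicted count.

def space_saving(stream, k):
    """Track approximate counts of the k heaviest items in one pass."""
    counters = {}  # item -> estimated count
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters

top = space_saving(list("aaabbcaaddaae"), 2)
print(top)  # 'a' (7 true occurrences) dominates the 2 tracked counters
```

The strict min-counter eviction is exactly the kind of bookkeeping an FPGA design might relax, since finding the global minimum every cycle is expensive in hardware.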
5

Zhang, Xvpeng, Bingqiang Liu, Yaqi Zhao, Xiaoyu Hu, Zixuan Shen, Zhaoxia Zheng, Zhenglin Liu, et al. "Design and Analysis of Area and Energy Efficient Reconfigurable Cryptographic Accelerator for Securing IoT Devices." Sensors 22, no. 23 (November 25, 2022): 9160. http://dx.doi.org/10.3390/s22239160.

Abstract:
Achieving low-cost, high-performance network security communication is necessary for Internet of Things (IoT) devices, including intelligent sensors and mobile robots. Designing hardware accelerators to accelerate multiple computationally intensive cryptographic primitives in various network security protocols is challenging. Unlike existing unified reconfigurable cryptographic accelerators with relatively low efficiency and high latency, this paper presents the design and analysis of a reconfigurable cryptographic accelerator consisting of a reconfigurable cipher unit and a reconfigurable hash unit to support widely used cryptographic algorithms for IoT devices, which require block ciphers and hash functions simultaneously. Based on a detailed and comprehensive algorithmic analysis of both the block ciphers and the hash functions in terms of basic algorithm structures and common cryptographic operators, the proposed reconfigurable cryptographic accelerator is designed by reusing key register files and operators to build unified data paths. The reconfigurable cipher unit and the reconfigurable hash unit each contain a unified data path to implement the Data Encryption Standard (DES)/Advanced Encryption Standard (AES)/ShangMi 4 (SM4) and Secure Hash Algorithm-1 (SHA-1)/SHA-256/SM3 algorithms, respectively. A reconfigurable S-Box for AES and SM4 is designed based on the composite Galois field GF(((2^2)^2)^2), which significantly reduces hardware overhead and power consumption compared with the conventional look-up-table implementation. Experimental results based on a 65-nm application-specific integrated circuit (ASIC) implementation show that the energy efficiency and area efficiency of the proposed design are 441 Gbps/W and 37.55 Gbps/mm^2, respectively, which is suitable for IoT devices with limited battery and form factor.
Delay analysis also shows that the number of delay cycles of our design is reduced by 83% compared with the state-of-the-art design, indicating that the proposed design is better suited to accelerating block ciphers and hash functions simultaneously in applications spanning 5G/Wi-Fi/ZigBee/Ethernet network standards.
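For orientation, the construction the composite-field S-Box competes with can be sketched in a few lines: the AES S-box is the multiplicative inverse in GF(2^8) followed by an affine transform. Composite-field designs such as the paper's decompose the inversion into GF(((2^2)^2)^2) subfield arithmetic to shrink the circuit; this sketch only reproduces the standard GF(2^8) definition, with the inverse found by brute-force search rather than hardware-friendly subfield math.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return r

def gf_inv(a):
    """Multiplicative inverse by exhaustive search (0 maps to 0 by convention)."""
    if a == 0:
        return 0
    return next(b for b in range(256) if gf_mul(a, b) == 1)

def sbox(a):
    """AES S-box: GF(2^8) inverse followed by the affine transform."""
    x = gf_inv(a)
    rot = lambda v, n: ((v << n) | (v >> (8 - n))) & 0xFF
    return x ^ rot(x, 1) ^ rot(x, 2) ^ rot(x, 3) ^ rot(x, 4) ^ 0x63

print(hex(sbox(0x00)), hex(sbox(0x53)))  # 0x63 0xed
```

Replacing the 256-entry lookup table with on-the-fly inversion is what saves area in hardware, and the subfield decomposition makes that inversion cheap.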
6

Milik, Adam, and Andrzej Pułka. "The Reconfigurable Hardware Accelerator for Searching Genome Patterns." IFAC Proceedings Volumes 42, no. 1 (2009): 33–38. http://dx.doi.org/10.3182/20090210-3-cz-4002.00010.

7

Ibrahim, Atef, Hamed Elsimary, Abdullah Aljumah, and Fayez Gebali. "Reconfigurable Hardware Accelerator for Profile Hidden Markov Models." Arabian Journal for Science and Engineering 41, no. 8 (May 18, 2016): 3267–77. http://dx.doi.org/10.1007/s13369-016-2162-y.

8

Vranjkovic, Vuk, Predrag Teodorovic, and Rastislav Struharik. "Universal Reconfigurable Hardware Accelerator for Sparse Machine Learning Predictive Models." Electronics 11, no. 8 (April 8, 2022): 1178. http://dx.doi.org/10.3390/electronics11081178.

Abstract:
This study presents a universal reconfigurable hardware accelerator for efficient processing of sparse decision trees, artificial neural networks, and support vector machines. The main idea is to develop a hardware accelerator that can directly process sparse machine learning models, resulting in shorter inference times and lower power consumption compared to existing solutions. To the authors' best knowledge, this is the first hardware accelerator of this type, as well as the first accelerator capable of processing sparse machine learning models of different types. Besides the hardware accelerator itself, algorithms for the induction of sparse decision trees and the pruning of support vector machines and artificial neural networks are presented. Such sparse machine learning classifiers are attractive since they require significantly less memory for storing model parameters. This reduces data movement between the accelerator and DRAM, as well as the number of operations required to process input instances, leading to faster and more energy-efficient processing. This could be of significant interest in edge-based applications with severely constrained memory, computation resources, and power consumption. The performance of the algorithms and the developed hardware accelerator is demonstrated using standard benchmark datasets from the UCI Machine Learning Repository. The experimental results reveal that the proposed algorithms and hardware accelerator are superior to some of the existing solutions: throughput is increased by up to 2 times for decision trees, 2.3 times for support vector machines, and 38 times for artificial neural networks.
When processing latency is considered, the maximum improvement is even higher: up to a 4.4-times reduction for decision trees, an 84.1-times reduction for support vector machines, and a 22.2-times reduction for artificial neural networks. Finally, since the accelerator supports sparse classifiers, its use leads to a significant reduction in energy spent on DRAM data transfers: 50.16% for decision trees, 93.65% for support vector machines, and as much as 93.75% for artificial neural networks.
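A minimal software analogue of what such an accelerator exploits: storing a pruned layer in a compressed sparse format so that dot products touch only nonzero weights. The CSR encoding below is illustrative; the abstract does not specify the accelerator's actual model format.

```python
# CSR (compressed sparse row) storage of a pruned weight matrix, plus a
# matrix-vector product that skips zeros entirely. Fewer stored weights means
# fewer DRAM transfers and fewer multiply-accumulates, mirroring the
# accelerator's claimed savings.

def dense_to_csr(w):
    """Flatten a dense matrix into (values, column indices, row pointers)."""
    vals, cols, row_ptr = [], [], [0]
    for row in w:
        for j, x in enumerate(row):
            if x != 0:
                vals.append(x)
                cols.append(j)
        row_ptr.append(len(vals))
    return vals, cols, row_ptr

def csr_matvec(vals, cols, row_ptr, x):
    """y = W @ x, visiting only the stored nonzeros of each row."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[k] * x[cols[k]]
        y.append(acc)
    return y

w = [[0, 2, 0], [1, 0, 3]]  # a 50%-sparse toy weight matrix
print(csr_matvec(*dense_to_csr(w), [1, 1, 1]))  # [2, 4]
```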
9

Schumacher, Tobias, Tim Süß, Christian Plessl, and Marco Platzner. "FPGA Acceleration of Communication-Bound Streaming Applications: Architecture Modeling and a 3D Image Compositing Case Study." International Journal of Reconfigurable Computing 2011 (2011): 1–11. http://dx.doi.org/10.1155/2011/760954.

Abstract:
Reconfigurable computers usually provide a limited number of different memory resources, such as host memory, external memory, and on-chip memory, with different capacities and communication characteristics. A key challenge for achieving high performance with reconfigurable accelerators is the efficient utilization of the available memory resources, and a detailed knowledge of the memories' parameters is key for generating an optimized communication layout. In this paper, we discuss a benchmarking environment for generating such a characterization. The environment is built on IMORC, our architectural template and on-chip network for creating reconfigurable accelerators. We provide a characterization of the memory resources available on the XtremeData XD1000 reconfigurable computer. Based on this data, we present as a case study the implementation of a 3D image compositing accelerator that is able to double the frame rate of a parallel renderer.
10

Pérez, Ignacio, and Miguel Figueroa. "A Heterogeneous Hardware Accelerator for Image Classification in Embedded Systems." Sensors 21, no. 8 (April 9, 2021): 2637. http://dx.doi.org/10.3390/s21082637.

Abstract:
Convolutional neural networks (CNN) have been extensively employed for image classification due to their high accuracy. However, inference is a computationally-intensive process that often requires hardware acceleration to operate in real time. For mobile devices, the power consumption of graphics processors (GPUs) is frequently prohibitive, and field-programmable gate arrays (FPGA) become a solution to perform inference at high speed. Although previous works have implemented CNN inference on FPGAs, their high utilization of on-chip memory and arithmetic resources complicate their application on resource-constrained edge devices. In this paper, we present a scalable, low power, low resource-utilization accelerator architecture for inference on the MobileNet V2 CNN. The architecture uses a heterogeneous system with an embedded processor as the main controller, external memory to store network data, and dedicated hardware implemented on reconfigurable logic with a scalable number of processing elements (PE). Implemented on a XCZU7EV FPGA running at 200 MHz and using four PEs, the accelerator infers with 87% top-5 accuracy and processes an image of 224×224 pixels in 220 ms. It consumes 7.35 W of power and uses less than 30% of the logic and arithmetic resources used by other MobileNet FPGA accelerators.
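MobileNet V2's efficiency rests on depthwise separable convolution, the operation the accelerator's processing elements implement. A toy software sketch of its two stages (per-channel 3x3 filtering, then a 1x1 cross-channel mix) shows why it is cheap: no stage ever multiplies across both spatial and channel dimensions at once.

```python
def depthwise3x3(img, kernels):
    """img: [C][H][W]; kernels: [C][3][3]. One filter per channel, 'valid' padding."""
    C, H, W = len(img), len(img[0]), len(img[0][0])
    return [[[sum(img[c][i + di][j + dj] * kernels[c][di][dj]
                  for di in range(3) for dj in range(3))
              for j in range(W - 2)]
             for i in range(H - 2)]
            for c in range(C)]

def pointwise(img, weights):
    """1x1 convolution: weights[o][c] mixes channels at every pixel."""
    C, H, W = len(img), len(img[0]), len(img[0][0])
    return [[[sum(weights[o][c] * img[c][i][j] for c in range(C))
              for j in range(W)]
             for i in range(H)]
            for o in range(len(weights))]

ones = [[[1, 1, 1], [1, 1, 1], [1, 1, 1]]]  # 1 channel, 3x3 input of ones
print(pointwise(depthwise3x3(ones, [[[1, 1, 1], [1, 1, 1], [1, 1, 1]]]), [[2]]))  # [[[18]]]
```

This is a reference sketch only; the paper's PEs additionally stream data from external memory and scale across a configurable PE count.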

Dissertations / Theses on the topic "Reconfigurable Hardware Accelerator"

1

Babecki, Christopher. "A Memory-Array Centric Reconfigurable Hardware Accelerator for Security Applications." Case Western Reserve University School of Graduate Studies / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=case1427381331.

2

Diniz, Claudio Machado. "Dedicated and reconfigurable hardware accelerators for high efficiency video coding standard." Biblioteca Digital de Teses e Dissertações da UFRGS, 2015. http://hdl.handle.net/10183/118394.

Abstract:
The demand for ultra-high resolution video (beyond 1920x1080 pixels) led to the need of developing new and more efficient video coding standards to provide high compression efficiency. The High Efficiency Video Coding (HEVC) standard, published in 2013, reaches double compression efficiency (or 50% reduction in size of coded video) compared to the most efficient video coding standard at that time, and most used in the market, the H.264/AVC (Advanced Video Coding) standard. HEVC reaches this result at the cost of high computational effort of the tools included in the encoder and decoder. The increased computational effort of HEVC standard and the power limitations of current silicon fabrication technologies makes it essential to develop hardware accelerators for compute-intensive computational kernels of HEVC application. Hardware accelerators provide higher performance and energy efficiency than general purpose processors for specific applications. An HEVC application analysis conducted in this work identified the most compute-intensive kernels of HEVC, namely the Fractional-pixel Interpolation Filter, the Deblocking Filter and the Sum of Absolute Differences calculation. A run-time analysis on Interpolation Filter indicates a great potential of power/energy saving by adapting the hardware accelerator to the varying workload. This thesis introduces new contributions in the field of dedicated and reconfigurable hardware accelerators for HEVC standard. Dedicated hardware accelerators for the Fractional Pixel Interpolation Filter, the Deblocking Filter and the Sum of Absolute Differences calculation are herein proposed, designed and evaluated. The interpolation filter hardware architecture achieves throughput similar to the state of the art, while reducing hardware area by 50%. Our deblocking filter hardware architecture also achieves similar throughput compared to state of the art with a 5X to 6X reduction in gate count and 3X reduction in power dissipation. 
The thesis also presents a new comparative analysis of Sum of Absolute Differences processing elements, introducing various architecture design alternatives with different area, performance, and power results. A novel reconfigurable interpolation filter hardware architecture for the HEVC standard is developed, providing 57% design-time area reduction and run-time power/energy adaptation on a picture-by-picture basis, compared to the state of the art. Additionally, a run-time accelerator binding scheme is proposed for tile-based mixed-grained reconfigurable architectures, which reduces communication overhead by up to 44% (23% on average) compared to a first-fit strategy with datapath reuse, for different numbers of tiles and internal tile organizations. This binding scheme is aware of the underlying architecture and binds datapaths efficiently, avoiding and minimizing inter-tile communication. The new dedicated and reconfigurable hardware accelerators and techniques proposed in this thesis enable next-generation video coding standard implementations beyond HEVC with improved area, performance, and power efficiency.
3

Das, Satyajit. "Architecture and Programming Model Support for Reconfigurable Accelerators in Multi-Core Embedded Systems." Thesis, Lorient, 2018. http://www.theses.fr/2018LORIS490/document.

Abstract:
Emerging trends in embedded systems and applications demand high throughput and low power consumption. Due to the increasing demand for low-power computing and diminishing returns from technology scaling, industry and academia are turning with renewed interest toward energy-efficient hardware accelerators. The main drawback of hardware accelerators is that they are not programmable: their utilization can be low, as each performs one specific function, and increasing the number of accelerators in a system on chip (SoC) causes scalability issues. Programmable accelerators provide flexibility and solve the scalability issues. A Coarse-Grained Reconfigurable Array (CGRA) architecture, consisting of several processing elements with word-level granularity, is a promising choice for a programmable accelerator. Inspired by these promising characteristics, the potential of CGRAs in near-threshold computing platforms is studied and an end-to-end CGRA research framework is developed in this thesis. The major contributions of this framework are CGRA design, implementation, integration in a computing system, and compilation for CGRAs. First, the design and implementation of a CGRA named Integrated Programmable Array (IPA) is presented. Next, the problem of mapping applications with control and data flow onto a CGRA is formulated. From this formulation, several efficient algorithms are developed using the internal resources of a CGRA, with a vision for low-power acceleration. The algorithms are integrated into an automated compilation flow. Finally, the IPA accelerator is integrated into PULP, a Parallel Ultra-Low-Power Processing Platform, to explore heterogeneous computing.
4

Jung, Lukas Johannes [Verfasser], Christian [Akademischer Betreuer] Hochberger, and Diana [Akademischer Betreuer] Göhringer. "Optimization of the Memory Subsystem of a Coarse Grained Reconfigurable Hardware Accelerator / Lukas Johannes Jung ; Christian Hochberger, Diana Göhringer." Darmstadt : Universitäts- und Landesbibliothek Darmstadt, 2019. http://d-nb.info/1187919810/34.

5

El-Hassan, Fadi. "Hardware Architecture of an XML/XPath Broker/Router for Content-Based Publish/Subscribe Data Dissemination Systems." Thèse, Université d'Ottawa / University of Ottawa, 2014. http://hdl.handle.net/10393/30660.

Abstract:
The dissemination of various types of data faces ongoing challenges with the growing need to access manifold information. Since interest in content is what drives data networks, new technologies attempt to cope with these challenges by developing content-based rather than address-based architectures. The Publish/Subscribe paradigm is a promising approach toward content-based data dissemination, especially as it provides total decoupling between publishers and subscribers. However, in content-based publish/subscribe systems, subscriptions are expressive, and information is delivered based on matching that expressive content, which poses considerable performance challenges. This dissertation explores a hardware solution for disseminating data in content-based publish/subscribe systems. The solution consists of an efficient hardware architecture for an XML/XPath broker that can route information based on content either to other XML/XPath brokers or to end users. A network of such brokers forms an overlay structure for XML content-based publish/subscribe data dissemination systems. Each broker can simultaneously process many XPath subscriptions, efficiently parse XML publications, and subsequently forward notifications that result from high-performance matching processes. At the core of the broker architecture lies an XML parser that utilizes a novel Skeleton CAM-Based XML Parsing (SCBXP) technique, in addition to an XPath processor and a high-performance matching engine. Moreover, the broker employs effective mechanisms for content-based routing, so that subscriptions, publications, and notifications are routed through the network based on content. The inherent reconfigurability of the broker's hardware allows the system architecture to reside in any FPGA device of moderate logic density.
Furthermore, the system-on-chip architecture is upgradable if future hardware add-ons are needed, while the current architecture is mature enough to be implemented effectively as an ASIC. Finally, this thesis presents and analyzes experiments conducted on an FPGA prototype implementation of the proposed broker/router. The experiments cover tests of the SCBXP alone and of two development phases of the whole broker. The results indicate the high performance that the parsing, storing, matching, and routing processes involved can achieve.
6

Abdelouahab, Kamel. "Reconfigurable hardware acceleration of CNNs on FPGA-based smart cameras." Thesis, Université Clermont Auvergne‎ (2017-2020), 2018. http://www.theses.fr/2018CLFAC042/document.

Abstract:
Deep Convolutional Neural Networks (CNNs) have become a de facto standard in computer vision. This success came at the price of a high computational cost, making the implementation of CNNs under real-time constraints a challenging task. To address this challenge, the literature exploits the large amount of parallelism exhibited by these algorithms, motivating the use of dedicated hardware platforms. In power-constrained environments, such as smart camera nodes, FPGA-based processing cores are known to be adequate solutions for accelerating computer vision applications. This is especially true for CNN workloads, whose streaming nature suits reconfigurable hardware architectures well. In this context, this thesis addresses the problem of mapping CNNs onto FPGAs. In particular, it aims to improve the efficiency of CNN implementations through two main optimization strategies: the first explores the CNN model and its parameters, while the second considers the hardware architecture and its fine-grained building blocks.
7

Vargun, Bilgin. "Acceleration Of Molecular Dynamics Simulation For Tersoff2 Potential Through Reconfigurable Hardware." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12615063/index.pdf.

Abstract:
In nanotechnology, carbon nanotube systems are studied with molecular dynamics simulation software to investigate the properties of molecular structures. The computational loads of such software are very demanding; three-body simulations in particular take a couple of weeks even for a small number of atoms, so researchers use supercomputers to study more complex systems. In recent years, with the development of sophisticated Field Programmable Gate Array (FPGA) technology, researchers have designed special-purpose co-processors to accelerate their simulations. Ongoing research shows that application-specific digital circuits achieve better performance than an ordinary computer. In this thesis, a new special-purpose co-processor, called TERSOFF2, is designed and implemented. The resulting design is a low-cost, low-power, high-performance computing solution that can solve the same computational problem 1000 times faster. Moreover, an optimized digital library of elementary mathematical functions is designed and implemented as part of this thesis study. The work on the digital circuits and the co-processor architecture is given in the related chapters, and performance results are presented at the end of the thesis.
APA, Harvard, Vancouver, ISO, and other styles
8

Martin, Phillip Murray. "Acceleration methodology for the implementation of scientific application on reconfigurable hardware." Connect to this title online, 2009. http://etd.lib.clemson.edu/documents/1246557242/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Silva, João Paulo Sá da. "Data processing in Zynq APSoC." Master's thesis, Universidade de Aveiro, 2014. http://hdl.handle.net/10773/14703.

Full text
Abstract:
Master's degree in Computer and Telematics Engineering
Field-Programmable Gate Arrays (FPGAs) were invented by Xilinx in 1985, i.e. less than 30 years ago. The influence of FPGAs on many directions in engineering is growing continuously and rapidly. There are many reasons for such progress, and the most important are the inherent reconfigurability of FPGAs and their relatively cheap development cost. Recent field-configurable micro-chips combine the capabilities of software and hardware by incorporating multi-core processors and reconfigurable logic, enabling the development of highly optimized computational systems for a vast variety of practical applications, including high-performance computing; data, signal, and image processing; embedded systems; and many others. In this context, the main goals of the thesis are to study the new micro-chips, namely the Zynq-7000 family, and to apply them to two selected case studies: data sort and Hamming weight calculation for long vectors.
Field-Programmable Gate Arrays (FPGAs) were invented by Xilinx in 1985, that is, less than 30 years ago. The influence of FPGAs is growing continuously and rapidly in many branches of engineering. There are several reasons for this evolution, the most important being their inherent reconfigurability and low development costs. The most recent FPGA-based micro-chips combine software and hardware capabilities by incorporating multi-core processors and reconfigurable logic, enabling the development of highly optimized computational systems for a wide variety of practical applications, including high-performance computing; data, signal, and image processing; embedded systems; and many others. In this context, the main goal of this work is to study these new micro-chips, namely the Zynq-7000 family, in order to find the best ways to exploit the advantages of this system, using data sorting and Hamming weight calculation for long vectors as case studies.
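For the Hamming-weight case study mentioned in the abstract above, the underlying computation can be sketched in software as follows. This is only a minimal illustration with an assumed byte-wise interface, not the thesis's Zynq implementation; an FPGA design would instead replicate small combinational popcount units and sum them in a parallel adder tree.

```python
def hamming_weight(vector: bytes) -> int:
    """Population count (number of 1-bits) over a long bit-vector.

    The 8-bit lookup table below plays the role of the small
    combinational popcount units a hardware design would instantiate
    once per byte lane; here they are applied sequentially.
    """
    table = [bin(b).count("1") for b in range(256)]
    return sum(table[b] for b in vector)
```

For example, `hamming_weight(bytes([0b10110000, 0xFF]))` counts 3 + 8 = 11 set bits.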
APA, Harvard, Vancouver, ISO, and other styles
10

Blumer, Aric David. "Register Transfer Level Simulation Acceleration via Hardware/Software Process Migration." Diss., Virginia Tech, 2007. http://hdl.handle.net/10919/29380.

Full text
Abstract:
The run-time reconfiguration of Field Programmable Gate Arrays (FPGAs) opens new avenues to hardware reuse. Through the use of process migration between hardware and software, an FPGA provides a parallel execution cache. Busy processes can be migrated into hardware-based, parallel processors, and idle processes can be migrated out, increasing the utilization of the hardware. The application of hardware/software process migration to the acceleration of Register Transfer Level (RTL) circuit simulation is developed and analyzed. RTL code can exhibit a form of locality of reference such that executing processes tend to be executed again. This property is termed executive temporal locality, and it can be exploited by migration systems to accelerate RTL simulation. In this dissertation, process migration is first formally modeled using Finite State Machines (FSMs). On this FSM foundation are built programs, processes, migration realms, and the migration of process state within a realm. From this model, a taxonomy of migration realms is developed. Second, process migration is applied to the RTL simulation of digital circuits. The canonical form of an RTL process is defined, and transformations of HDL code are justified and demonstrated. These transformations allow a simulator to identify basic active units within the simulation and combine them to balance the load across a set of processors. Through the use of input monitors, executive locality of reference is identified and demonstrated on a set of six RTL designs. Finally, the implementation of a migration system is described which utilizes Virtual Machines (VMs) and Real Machines (RMs) in existing FPGAs. Empirical and algorithmic models are developed from the data collected from the implementation to evaluate the effect of optimizations and migration algorithms.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Reconfigurable Hardware Accelerator"

1

Zhang, Hao, Deivalakshmi Subbian, G. Lakshminarayanan, and Seok-Bum Ko. "Application-Specific and Reconfigurable AI Accelerator." In Artificial Intelligence and Hardware Accelerators, 183–223. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-22170-5_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Wang, Gang, Du Chen, Jian Chen, Jianliang Ma, and Tianzhou Chen. "A Performance Model for Run-Time Reconfigurable Hardware Accelerator." In Lecture Notes in Computer Science, 54–66. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-642-03644-6_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Kamaraj, A., and J. Senthil Kumar. "Reconfigurable Binary Neural Networks Hardware Accelerator for Accurate Data Analysis in Intelligent Systems." In Data Science and Innovations for Intelligent Systems, 261–79. Boca Raton: CRC Press, 2021. http://dx.doi.org/10.1201/9781003132080-11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Claus, Christopher, and Walter Stechele. "AutoVision—Reconfigurable Hardware Acceleration for Video-Based Driver Assistance." In Dynamically Reconfigurable Systems, 375–94. Dordrecht: Springer Netherlands, 2010. http://dx.doi.org/10.1007/978-90-481-3485-4_18.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Angioli, Marco, Marcello Barbirotta, Abdallah Cheikh, Antonio Mastrandrea, Francesco Menichelli, Saeid Jamili, and Mauro Olivieri. "Contextual Bandits Algorithms for Reconfigurable Hardware Accelerators." In Lecture Notes in Electrical Engineering, 149–54. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-30333-3_19.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Mingas, Grigorios, and Christos-Savvas Bouganis. "Parallel Tempering MCMC Acceleration Using Reconfigurable Hardware." In Lecture Notes in Computer Science, 227–38. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-28365-9_19.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Platzner, Marco, and Giovanni De Micheli. "Acceleration of satisfiability algorithms by reconfigurable hardware." In Lecture Notes in Computer Science, 69–78. Berlin, Heidelberg: Springer Berlin Heidelberg, 1998. http://dx.doi.org/10.1007/bfb0055234.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Nagvajara, Prawat, Chika Nwankpa, and Jeremy Johnson. "Reconfigurable Hardware Accelerators for Power Transmission System Computation." In Power Systems, 211–28. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-32683-7_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Rodríguez, David, Juan M. Sánchez, and Arturo Duran. "Mobile Fingerprint Identification Using a Hardware Accelerated Biometric Service Provider." In Reconfigurable Computing: Architectures and Applications, 383–88. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11802839_46.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Soon, Voon Siew, Chee Yuen Lam, and Ehkan Phaklen. "Reconfigurable Hardware Acceleration of RGB to HSL Converter." In Lecture Notes in Electrical Engineering, 1275–82. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-24584-3_109.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Reconfigurable Hardware Accelerator"

1

Calderón, Humberto, Jesús Ortiz, and Jean-Guy Fontaine. "Disparity Map Hardware Accelerator." In 2008 International Conference on Reconfigurable Computing and FPGAs (ReConFig). IEEE, 2008. http://dx.doi.org/10.1109/reconfig.2008.29.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Reeves, Keith, Ken Sienski, and Calvin Field. "Reconfigurable hardware accelerator for embedded DSP." In Photonics East '96, edited by John Schewel, Peter M. Athanas, V. Michael Bove, Jr., and John Watson. SPIE, 1996. http://dx.doi.org/10.1117/12.255831.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Lin, Ming-Yi, Sheng-Hsien Hsieh, and Ching-Han Chen. "Reconfigurable Hardware Accelerator of Morphological Image Processor." In 2023 IEEE 3rd International Conference on Electronic Communications, Internet of Things and Big Data (ICEIB). IEEE, 2023. http://dx.doi.org/10.1109/iceib57887.2023.10170636.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Avey, Joe, Phillip Jones, and Joseph Zambreno. "An FPGA-based Hardware Accelerator for Iris Segmentation." In 2018 International Conference on ReConFigurable Computing and FPGAs (ReConFig). IEEE, 2018. http://dx.doi.org/10.1109/reconfig.2018.8641726.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Struharik, Rastislav, and Bogdan Vukobratovic. "AIScale — A coarse grained reconfigurable CNN hardware accelerator." In 2017 IEEE East-West Design & Test Symposium (EWDTS). IEEE, 2017. http://dx.doi.org/10.1109/ewdts.2017.8110048.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Hasnain, Syed Ali, and Rabi Mahapatra. "Towards reconfigurable optoelectronic hardware accelerator for reservoir computing." In Optoelectronic Devices and Integration IX, edited by Baojun Li, Changyuan Yu, Xuping Zhang, and Xinliang Zhang. SPIE, 2020. http://dx.doi.org/10.1117/12.2575401.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Rubin, G., M. Omieljanowicz, and A. Petrovsky. "Reconfigurable FPGA-Based Hardware Accelerator for Embedded DSP." In 2007 14th International Conference on Mixed Design of Integrated Circuits and Systems. IEEE, 2007. http://dx.doi.org/10.1109/mixdes.2007.4286138.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Yuan, Zhongda, Yuchun Ma, and Jinian Bian. "SMPP: Generic SAT Solver over Reconfigurable Hardware Accelerator." In 2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2012. http://dx.doi.org/10.1109/ipdpsw.2012.57.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Bouthaina, Damak, Mouna Baklouti, Smail Niar, and Mohamed Abid. "Shared hardware accelerator architectures for heterogeneous MPSoCs." In 2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC). IEEE, 2013. http://dx.doi.org/10.1109/recosoc.2013.6581549.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Davis, John D., Zhangxi Tan, Fang Yu, and Lintao Zhang. "A practical reconfigurable hardware accelerator for Boolean satisfiability solvers." In Proceedings of the 45th Annual Design Automation Conference. New York, New York, USA: ACM Press, 2008. http://dx.doi.org/10.1145/1391469.1391669.

Full text
APA, Harvard, Vancouver, ISO, and other styles
