Journal articles on the topic 'Reconfigurable Hardware Accelerator'


Consult the top 50 journal articles for your research on the topic 'Reconfigurable Hardware Accelerator.'


1. Xiong, Hao, Kelin Sun, Bing Zhang, Jingchuan Yang, and Huiping Xu. "Deep-Sea: A Reconfigurable Accelerator for Classic CNN." Wireless Communications and Mobile Computing 2022 (February 2, 2022): 1–23. http://dx.doi.org/10.1155/2022/4726652.

Abstract:
To meet the changing real-time requirements of edge engineering applications of CNNs, and to address the lack of universality and flexibility in ARM+FPGA-based CNN hardware acceleration architectures, a general low-power, fully pipelined CNN hardware acceleration architecture is proposed that can keep pace with continuously updated CNN algorithms and accelerate them on hardware platforms with different resource constraints. Within the framework of this general hardware architecture, a basic instruction set is proposed that can be used to compute and configure different versions of CNN algorithms. Based on the instruction set, a configurable computing subsystem, memory management subsystem, on-chip cache subsystem, and instruction execution subsystem are designed and implemented. In addition, when handling convolution results, an on-chip storage unit preprocesses them so that the activation and pooling calculations are accelerated in parallel. Finally, the accelerator is modeled at the RTL level and deployed on the XC7Z100 heterogeneous device. The lightweight networks YOLOv2-tiny and YOLOv3-tiny, commonly used in engineering applications, are verified on the accelerator. The results show that the peak performance of the accelerator reaches 198.37 GOP/s, the clock frequency reaches 210 MHz, and the power consumption is 4.52 W at a 16-bit width.

2. An, Fubang, Lingli Wang, and Xuegong Zhou. "A High Performance Reconfigurable Hardware Architecture for Lightweight Convolutional Neural Network." Electronics 12, no. 13 (June 27, 2023): 2847. http://dx.doi.org/10.3390/electronics12132847.

Abstract:
Since the lightweight convolutional neural network EfficientNet was proposed by Google in 2019, this series of models has quickly become very popular due to its superior performance with a small number of parameters. However, existing convolutional neural network hardware accelerators for EfficientNet still have much room to improve the performance of the depthwise convolution, the squeeze-and-excitation module and the nonlinear activation functions. In this paper, we first design a reconfigurable register array and computational kernel to accelerate the depthwise convolution. Next, we propose a vector unit to implement the nonlinear activation functions and the scale operation. An exchangeable-sequence dual-computational kernel architecture is proposed to improve the performance and the utilization. In addition, the memory architectures are designed to complete the hardware accelerator for the above computing architecture. Finally, in order to evaluate the performance of the hardware accelerator, the accelerator is implemented on a Xilinx XCVU37P. The results show that the proposed accelerator can work at a main system clock frequency of 300 MHz with the DSP kernel at 600 MHz. The performance of EfficientNet-B3 on our architecture reaches 69.50 FPS and 255.22 GOPS. Compared with the latest EfficientNet-B3 accelerator using the same FPGA development board, the accelerator proposed in this paper achieves a 1.28-fold improvement in single-core performance and a 1.38-fold improvement in per-DSP performance.

3. Nakasato, N., T. Hamada, and T. Fukushige. "Galaxy Evolution with Reconfigurable Hardware Accelerator." EAS Publications Series 24 (2007): 291–92. http://dx.doi.org/10.1051/eas:2007043.

4. Ebrahim, Ali. "Finding the Top-K Heavy Hitters in Data Streams: A Reconfigurable Accelerator Based on an FPGA-Optimized Algorithm." Electronics 12, no. 11 (May 24, 2023): 2376. http://dx.doi.org/10.3390/electronics12112376.

Abstract:
This paper presents a novel approach for accelerating the top-k heavy hitters query in data streams using Field Programmable Gate Arrays (FPGAs). Current hardware acceleration approaches rely on the direct and strict mapping of software algorithms into hardware, limiting their performance and practicality due to the lack of hardware optimizations at an algorithmic level. The presented approach optimizes a well-known software algorithm by carefully relaxing some of its requirements to allow for the design of a practical and scalable hardware accelerator that outperforms current state-of-the-art accelerators while maintaining near-perfect accuracy. This paper details the design and implementation of an optimized FPGA accelerator specifically tailored for computing the top-k heavy hitters query in data streams. The presented accelerator is entirely specified at the C language level and is easily reproducible with High-Level Synthesis (HLS) tools. Implementation on Intel Arria 10 and Stratix 10 FPGAs using Intel HLS compiler showed promising results—outperforming prior state-of-the-art accelerators in terms of throughput and features.
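
The abstract does not name the underlying software algorithm; a common baseline for the top-k heavy hitters query is the Space-Saving algorithm, sketched below in Python purely as a point of reference for what such an accelerator computes (the paper's FPGA-relaxed variant will differ).

```python
def space_saving(stream, k):
    # Space-Saving: keep at most k counters; on overflow, evict the minimum
    # counter and inherit its count. Counts may be over-estimated, but any
    # item with true frequency above N/k is guaranteed to be tracked.
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return sorted(counters.items(), key=lambda kv: -kv[1])

# The returned counts approximate the true top-3 frequencies from above.
print(space_saving("abracadabra", k=3))
```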

5. Zhang, Xvpeng, Bingqiang Liu, Yaqi Zhao, Xiaoyu Hu, Zixuan Shen, Zhaoxia Zheng, Zhenglin Liu, et al. "Design and Analysis of Area and Energy Efficient Reconfigurable Cryptographic Accelerator for Securing IoT Devices." Sensors 22, no. 23 (November 25, 2022): 9160. http://dx.doi.org/10.3390/s22239160.

Abstract:
Achieving low-cost and high-performance network security communication is necessary for Internet of Things (IoT) devices, including intelligent sensors and mobile robots. Designing hardware accelerators for the multiple computationally intensive cryptographic primitives in various network security protocols is challenging. Unlike existing unified reconfigurable cryptographic accelerators with relatively low efficiency and high latency, this paper presents the design and analysis of a reconfigurable cryptographic accelerator consisting of a reconfigurable cipher unit and a reconfigurable hash unit to support widely used cryptographic algorithms for IoT devices, which require block ciphers and hash functions simultaneously. Based on a detailed and comprehensive algorithmic analysis of both the block ciphers and the hash functions in terms of basic algorithm structures and common cryptographic operators, the proposed reconfigurable cryptographic accelerator is designed by reusing key register files and operators to build unified data paths. The reconfigurable cipher unit and the reconfigurable hash unit each contain a unified data path, implementing the Data Encryption Standard (DES)/Advanced Encryption Standard (AES)/ShangMi 4 (SM4) and the Secure Hash Algorithm-1 (SHA-1)/SHA-256/SM3 algorithms, respectively. A reconfigurable S-Box for AES and SM4 is designed based on the composite Galois field GF(((2²)²)²), which significantly reduces hardware overhead and power consumption compared with the conventional look-up-table implementation. Experimental results based on a 65-nm application-specific integrated circuit (ASIC) implementation show that the energy efficiency and area efficiency of the proposed design are 441 Gbps/W and 37.55 Gbps/mm², respectively, which is suitable for IoT devices with limited battery and form factor. Delay analysis also shows that the number of delay cycles of our design is reduced by 83% compared with the state-of-the-art design, indicating that the proposed design is better suited to applications such as the 5G/Wi-Fi/ZigBee/Ethernet network standards, which accelerate block ciphers and hash functions simultaneously.
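
For context, the AES S-Box that the composite-field design replaces is the multiplicative inverse in GF(2⁸) followed by an affine transform. A minimal Python sketch of that reference computation, using naive GF(2⁸) arithmetic rather than the paper's GF(((2²)²)²) decomposition:

```python
def gf_mul(a, b):
    # Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1.
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1B
        b >>= 1
    return r

def gf_inv(a):
    # a^254 = a^-1 in GF(2^8) (Fermat); 0 maps to 0 by AES convention.
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def aes_sbox(x):
    inv = gf_inv(x)
    res = inv
    for i in range(1, 5):  # AES affine transform: four rotations XOR 0x63
        res ^= ((inv << i) | (inv >> (8 - i))) & 0xFF
    return res ^ 0x63

assert aes_sbox(0x00) == 0x63 and aes_sbox(0x01) == 0x7C  # known S-Box entries
```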

6. Milik, Adam, and Andrzej Pułka. "The Reconfigurable Hardware Accelerator for Searching Genome Patterns." IFAC Proceedings Volumes 42, no. 1 (2009): 33–38. http://dx.doi.org/10.3182/20090210-3-cz-4002.00010.

7. Ibrahim, Atef, Hamed Elsimary, Abdullah Aljumah, and Fayez Gebali. "Reconfigurable Hardware Accelerator for Profile Hidden Markov Models." Arabian Journal for Science and Engineering 41, no. 8 (May 18, 2016): 3267–77. http://dx.doi.org/10.1007/s13369-016-2162-y.

8. Vranjkovic, Vuk, Predrag Teodorovic, and Rastislav Struharik. "Universal Reconfigurable Hardware Accelerator for Sparse Machine Learning Predictive Models." Electronics 11, no. 8 (April 8, 2022): 1178. http://dx.doi.org/10.3390/electronics11081178.

Abstract:
This study presents a universal reconfigurable hardware accelerator for efficient processing of sparse decision trees, artificial neural networks and support vector machines. The main idea is to develop a hardware accelerator that can directly process sparse machine learning models, resulting in shorter inference times and lower power consumption compared to existing solutions. To the authors' best knowledge, this is the first hardware accelerator of this type, and the first accelerator capable of processing sparse machine learning models of different types. Besides the hardware accelerator itself, algorithms for the induction of sparse decision trees and the pruning of support vector machines and artificial neural networks are presented. Such sparse machine learning classifiers are attractive because they require significantly less memory for storing model parameters. This reduces data movement between the accelerator and the DRAM memory, as well as the number of operations required to process input instances, leading to faster and more energy-efficient processing. This is of significant interest in edge-based applications with severely constrained memory, computational resources and power budgets. The performance of the algorithms and the developed hardware accelerator is demonstrated using standard benchmark datasets from the UCI Machine Learning Repository. The results of the experimental study reveal that the proposed algorithms and the presented hardware accelerator are superior to some of the existing solutions. Throughput is increased up to 2 times for decision trees, 2.3 times for support vector machines and 38 times for artificial neural networks. When processing latency is considered, the maximum performance improvement is even higher: up to a 4.4-fold reduction for decision trees, an 84.1-fold reduction for support vector machines and a 22.2-fold reduction for artificial neural networks. Finally, since it supports sparse classifiers, the proposed hardware accelerator significantly reduces the energy spent on DRAM data transfers: by 50.16% for decision trees, 93.65% for support vector machines and as much as 93.75% for artificial neural networks.
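
The abstract does not specify the sparse storage format; compressed sparse row (CSR) is a representative choice that illustrates why sparse models cut both memory footprint and operation count. A minimal sketch:

```python
import numpy as np

def csr_matvec(values, col_idx, row_ptr, x):
    # y = A @ x with A in compressed sparse row form: only nonzero weights
    # are stored and touched, so DRAM traffic and MAC count scale with the
    # number of nonzeros instead of the dense matrix size.
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# A = [[2, 0, 0], [0, 0, 3]] stored as CSR, multiplied by x = [1, 1, 1].
print(csr_matvec([2.0, 3.0], [0, 2], [0, 1, 2], np.ones(3)))  # [2. 3.]
```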

9. Schumacher, Tobias, Tim Süß, Christian Plessl, and Marco Platzner. "FPGA Acceleration of Communication-Bound Streaming Applications: Architecture Modeling and a 3D Image Compositing Case Study." International Journal of Reconfigurable Computing 2011 (2011): 1–11. http://dx.doi.org/10.1155/2011/760954.

Abstract:
Reconfigurable computers usually provide a limited number of different memory resources, such as host memory, external memory, and on-chip memory with different capacities and communication characteristics. A key challenge for achieving high-performance with reconfigurable accelerators is the efficient utilization of the available memory resources. A detailed knowledge of the memories' parameters is key for generating an optimized communication layout. In this paper, we discuss a benchmarking environment for generating such a characterization. The environment is built on IMORC, our architectural template and on-chip network for creating reconfigurable accelerators. We provide a characterization of the memory resources available on the XtremeData XD1000 reconfigurable computer. Based on this data, we present as a case study the implementation of a 3D image compositing accelerator that is able to double the frame rate of a parallel renderer.

10. Pérez, Ignacio, and Miguel Figueroa. "A Heterogeneous Hardware Accelerator for Image Classification in Embedded Systems." Sensors 21, no. 8 (April 9, 2021): 2637. http://dx.doi.org/10.3390/s21082637.

Abstract:
Convolutional neural networks (CNN) have been extensively employed for image classification due to their high accuracy. However, inference is a computationally-intensive process that often requires hardware acceleration to operate in real time. For mobile devices, the power consumption of graphics processors (GPUs) is frequently prohibitive, and field-programmable gate arrays (FPGA) become a solution to perform inference at high speed. Although previous works have implemented CNN inference on FPGAs, their high utilization of on-chip memory and arithmetic resources complicate their application on resource-constrained edge devices. In this paper, we present a scalable, low power, low resource-utilization accelerator architecture for inference on the MobileNet V2 CNN. The architecture uses a heterogeneous system with an embedded processor as the main controller, external memory to store network data, and dedicated hardware implemented on reconfigurable logic with a scalable number of processing elements (PE). Implemented on a XCZU7EV FPGA running at 200 MHz and using four PEs, the accelerator infers with 87% top-5 accuracy and processes an image of 224×224 pixels in 220 ms. It consumes 7.35 W of power and uses less than 30% of the logic and arithmetic resources used by other MobileNet FPGA accelerators.

11. Shi, Kaisheng, Mingwei Wang, Xin Tan, Qianghua Li, and Tao Lei. "Efficient Dynamic Reconfigurable CNN Accelerator for Edge Intelligence Computing on FPGA." Information 14, no. 3 (March 20, 2023): 194. http://dx.doi.org/10.3390/info14030194.

Abstract:
This paper proposes an efficient dynamic reconfigurable CNN accelerator (EDRCA) for FPGAs to tackle the issues of limited hardware resources and low energy efficiency in the deployment of convolutional neural networks on embedded edge computing devices. First, a configuration layer sequence optimization method is proposed to minimize the configuration time overhead and improve performance. Second, accelerator templates for dynamic regions are designed to create a unified high-speed interface and enhance operational performance. The dynamic reconfigurable technology is applied on the Xilinx KV260 FPGA platform to design the EDRCA accelerator, resolving the hardware resource constraints in traditional accelerator design. The YOLOV2-TINY object detection network is used to test the EDRCA accelerator on the Xilinx KV260 platform using floating point data. Results at 250 MHz show a computing performance of 75.1929 GOPS, peak power consumption of 5.25 W, and power efficiency of 13.6219 GOPS/W, indicating the potential of the EDRCA accelerator for edge intelligence computing.

12. Melnyk, Viktor A., and Vladyslav V. Hamolia. "Investigation of reconfigurable hardware platforms for 5G protocol stack functions acceleration." Applied Aspects of Information Technology 6, no. 1 (April 10, 2023): 84–99. http://dx.doi.org/10.15276/aait.06.2023.7.

Abstract:
Open RAN and 5G are two key technologies designed to qualitatively improve network infrastructure and provide greater flexibility and efficiency to mobile operators and users. 5G creates new capabilities for high-speed Internet, the Internet of Things, telemedicine and many other applications, while Open RAN enables open and standardized network architectures, which reduces cost and risk for operators and promotes innovation. Given the growing number of users and data volumes, a purely software implementation of certain 5G protocol functions, especially computationally complex ones, requires significant computing resources and energy. Examples include low-density parity-check (LDPC) coding and the FFT and iFFT algorithms at the physical (PHY) layer, and the NEA and NIA security algorithms at the Packet Data Convergence Protocol (PDCP) layer. Therefore, one area of activity in the development of 5G systems is hardware acceleration of such functions, which makes it possible to process large volumes of data in real time and with high efficiency. The high-performance hardware basis for implementing these functions today is the field-programmable gate array (FPGA). At the same time, the efficiency of hardware acceleration of 5G protocol stack functions depends significantly on the size of the data packets transmitted to the hardware accelerator. As experience shows, for certain computer system architectures with accelerators, the acceleration can even become negative. This necessitates the search for alternative architectural solutions for implementing such systems. In this article, approaches to hardware acceleration using reconfigurable FPGA-based computing components are explored, their comparative analysis is performed, and architectural alternatives are evaluated for the implementation of a computing platform that performs the functions of the 5G protocol stack with hardware acceleration of the PHY and medium access control (MAC) layer functions.
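
The packet-size effect mentioned above can be captured with a simple first-order model: offloading only pays off when the software time saved exceeds the per-packet transfer overhead. A hedged sketch with hypothetical timing numbers (not taken from the paper):

```python
def offload_speedup(t_sw, t_xfer, t_hw):
    # Effective speedup of moving a function to an accelerator: the transfer
    # overhead t_xfer is paid per packet, so small packets can yield a
    # "speedup" below 1, i.e. a net slowdown.
    return t_sw / (t_xfer + t_hw)

# A large packet amortizes the transfer cost...
print(offload_speedup(t_sw=100e-6, t_xfer=5e-6, t_hw=10e-6))  # ~6.7x faster
# ...while a small packet does not.
print(offload_speedup(t_sw=2e-6, t_xfer=5e-6, t_hw=0.2e-6))   # ~0.4x (slower)
```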

13. Ferianc, Martin, Hongxiang Fan, Divyansh Manocha, Hongyu Zhou, Shuanglong Liu, Xinyu Niu, and Wayne Luk. "Improving Performance Estimation for Design Space Exploration for Convolutional Neural Network Accelerators." Electronics 10, no. 4 (February 23, 2021): 520. http://dx.doi.org/10.3390/electronics10040520.

Abstract:
Contemporary advances in neural networks (NNs) have demonstrated their potential in different applications such as in image classification, object detection or natural language processing. In particular, reconfigurable accelerators have been widely used for the acceleration of NNs due to their reconfigurability and efficiency in specific application instances. To determine the configuration of the accelerator, it is necessary to conduct design space exploration to optimize the performance. However, the process of design space exploration is time consuming because of the slow performance evaluation for different configurations. Therefore, there is a demand for an accurate and fast performance prediction method to speed up design space exploration. This work introduces a novel method for fast and accurate estimation of different metrics that are of importance when performing design space exploration. The method is based on a Gaussian process regression model parametrised by the features of the accelerator and the target NN to be accelerated. We evaluate the proposed method together with other popular machine learning based methods in estimating the latency and energy consumption of our implemented accelerator on two different hardware platforms targeting convolutional neural networks. We demonstrate improvements in estimation accuracy, without the need for significant implementation effort or tuning.
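
As an illustration of the approach (not the authors' exact feature set or kernel choice), a Gaussian process regressor over accelerator/NN features can be fit with scikit-learn; the feature vectors and latency targets below are hypothetical placeholders:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical training data: each row is a design point described by
# accelerator and network features (e.g. parallelism, buffer sizes, layer
# shapes); targets are measured latencies from a handful of evaluated designs.
rng = np.random.default_rng(0)
X_train = rng.random((64, 6))
y_train = rng.random(64)

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
gpr.fit(X_train, y_train)

# Predict latency, with uncertainty, for unseen configurations during
# design space exploration, avoiding slow per-configuration evaluation.
latency_mean, latency_std = gpr.predict(rng.random((8, 6)), return_std=True)
```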

14. Irmak, Hasan, Federico Corradi, Paul Detterer, Nikolaos Alachiotis, and Daniel Ziener. "A Dynamic Reconfigurable Architecture for Hybrid Spiking and Convolutional FPGA-Based Neural Network Designs." Journal of Low Power Electronics and Applications 11, no. 3 (August 17, 2021): 32. http://dx.doi.org/10.3390/jlpea11030032.

Abstract:
This work presents a dynamically reconfigurable architecture for Neural Network (NN) accelerators implemented in Field-Programmable Gate Array (FPGA) that can be applied in a variety of application scenarios. Although the concept of Dynamic Partial Reconfiguration (DPR) is increasingly used in NN accelerators, the throughput is usually lower than pure static designs. This work presents a dynamically reconfigurable energy-efficient accelerator architecture that does not sacrifice throughput performance. The proposed accelerator comprises reconfigurable processing engines and dynamically utilizes the device resources according to model parameters. Using the proposed architecture with DPR, different NN types and architectures can be realized on the same FPGA. Moreover, the proposed architecture maximizes throughput performance with design optimizations while considering the available resources on the hardware platform. We evaluate our design with different NN architectures for two different tasks. The first task is the image classification of two distinct datasets, and this requires switching between Convolutional Neural Network (CNN) architectures having different layer structures. The second task requires switching between NN architectures, namely a CNN architecture with high accuracy and throughput and a hybrid architecture that combines convolutional layers and an optimized Spiking Neural Network (SNN) architecture. We demonstrate throughput results from quickly reprogramming only a tiny part of the FPGA hardware using DPR. Experimental results show that the implemented designs achieve a 7× faster frame rate than current FPGA accelerators while being extremely flexible and using comparable resources.

15. Huang, Xiaoying, Zhichuan Guo, Mangu Song, and Yunfei Guo. "AccelSDP: A Reconfigurable Accelerator for Software Data Plane Based on FPGA SmartNIC." Electronics 10, no. 16 (August 11, 2021): 1927. http://dx.doi.org/10.3390/electronics10161927.

Abstract:
Software-defined networking (SDN) has attracted much attention since it was proposed. The architecture of the SDN data plane is also evolving. To support the flexibility of the data plane, the software implementation approach is adopted. The software data plane of SDN is commonly implemented on a commercial off-the-shelf (COTS) server, executing an entire processing logic on a commodity CPU. With sharp increases in network capacity, CPU-based packet processing is overwhelmed. However, completely implementing the data plane on hardware weakens the flexibility. Therefore, hybrid implementation where a hardware device is adopted as the accelerator is proposed to balance the performance and flexibility. We propose an FPGA SmartNIC-based reconfigurable accelerator to offload some of the operation-intensive packet processing functions from the software data plane to reconfigurable hardware, thus improving the overall data plane performance while retaining flexibility. The accelerated software data plane has a powerful line-rate packet processing capability and flexible programmability at 100 Gbps and higher throughput. We offloaded a cached-rule table to the proposed accelerator and tested its performance with 100 GbE traffic. Compared with the software implementation, the evaluation result shows that the throughput can achieve a 600% improvement when processing small packets and a 100% increase in large packet processing, and the latency can be reduced by about 20× and 100×, respectively, when processing small packets and large packets.

16. Dondo Gazzano, Julio, Francisco Sanchez Molina, Fernando Rincon, and Juan Carlos López. "Integrating Reconfigurable Hardware-Based Grid for High Performance Computing." Scientific World Journal 2015 (2015): 1–19. http://dx.doi.org/10.1155/2015/272536.

Abstract:
FPGAs have shown several characteristics that make them very attractive for high performance computing (HPC). The impressive speed-up factors they are able to achieve, their reduced power consumption, and the ease and flexibility of a design process with fast iterations between consecutive versions are examples of the benefits of their use. However, some difficulties in using reconfigurable platforms as accelerators still need to be addressed: the need for an in-depth application study to identify potential for acceleration, the lack of tools for deploying computational problems on distributed hardware platforms, and the low portability of components, among others. This work proposes a complete grid infrastructure for distributed high performance computing based on dynamically reconfigurable FPGAs, and describes a set of services designed to facilitate application deployment. An example application and a comparison with other hardware and software implementations are shown. Experimental results show that the proposed architecture offers encouraging advantages for the deployment of high performance distributed applications while simplifying the development process.

17. Kuznar, Damian, Robert Szczygiel, Piotr Maj, and Anna Kozioł. "Design of artificial neural network hardware accelerator." Journal of Instrumentation 18, no. 04 (April 1, 2023): C04013. http://dx.doi.org/10.1088/1748-0221/18/04/c04013.

Abstract:
We present the design of a scalable processor providing artificial neural network (ANN) functionality, together with in-house developed tools for automatic conversion of an ANN model designed with the TensorFlow library into HDL code. The hardware is described in SystemVerilog, and the synthesized processor module can perform neural network calculations at clock frequencies exceeding 100 MHz. Our in-house software tool for ANN conversion supports the translation of an arbitrary multilayer perceptron neural network into a state machine module that performs the necessary calculations. The processor is also dynamically reconfigurable, so that the ANN operating on the hardware can be changed after it is deployed as an ASIC. The project targets an in-pixel implementation for X-ray photon energy estimation. The energy estimate shall be delivered with an accuracy exceeding that of the ADC converter that feeds the ANN with data.

18. Zamacola, Rafael, Andrés Otero, and Eduardo de la Torre. "Multi-grain reconfigurable and scalable overlays for hardware accelerator composition." Journal of Systems Architecture 121 (December 2021): 102302. http://dx.doi.org/10.1016/j.sysarc.2021.102302.

19. Babecki, Christopher, Wenchao Qian, Somnath Paul, Robert Karam, and Swarup Bhunia. "An Embedded Memory-Centric Reconfigurable Hardware Accelerator for Security Applications." IEEE Transactions on Computers 65, no. 10 (October 1, 2016): 3196–202. http://dx.doi.org/10.1109/tc.2015.2512858.

20. Kang, Sungho, Youngmin Hur, and Stephen A. Szygenda. "A Hardware Accelerator for Fault Simulation Utilizing a Reconfigurable Array Architecture." VLSI Design 4, no. 2 (January 1, 1996): 119–33. http://dx.doi.org/10.1155/1996/60318.

Abstract:
In order to reduce cost and achieve high speed, a new hardware accelerator for fault simulation has been designed. The architecture of the new accelerator is based on a reconfigurable mesh-type processing element (PE) array. Circuit elements at the same topological level are simulated concurrently, as in a pipelined process. A new parallel simulation algorithm expands all gates to two-input gates in order to limit the number of faults to two at each gate, so that the faults can be distributed uniformly throughout the PE array. The PE array reconfiguration operation provides a simulation speed advantage by maximizing the use of each PE cell. This new approach provides a high-performance, cost-effective gain over software simulation. Simulation results show that the hardware accelerator is orders of magnitude faster than the software simulation program.

21. Leon, Vasileios, Spyridon Mouselinos, Konstantina Koliogeorgi, Sotirios Xydis, Dimitrios Soudris, and Kiamal Pekmestzi. "A TensorFlow Extension Framework for Optimized Generation of Hardware CNN Inference Engines." Technologies 8, no. 1 (January 13, 2020): 6. http://dx.doi.org/10.3390/technologies8010006.

Abstract:
The workloads of Convolutional Neural Networks (CNNs) exhibit a streaming nature that makes them attractive for reconfigurable architectures such as Field-Programmable Gate Arrays (FPGAs), while their increased need for low power and speed has established Application-Specific Integrated Circuit (ASIC)-based accelerators as alternative efficient solutions. During the last five years, the development of Hardware Description Language (HDL)-based CNN accelerators, for either FPGA or ASIC, has seen huge academic interest due to their high performance and room for optimization. In this direction, we propose a library-based framework that extends TensorFlow, the well-established machine learning framework, and automatically generates high-throughput CNN inference engines for FPGAs and ASICs. The framework allows software developers to exploit the benefits of FPGA/ASIC acceleration without requiring any expertise in HDL development and low-level design. Moreover, it provides a set of optimization knobs concerning the model architecture and the inference engine generation, allowing the developer to tune the accelerator according to the requirements of the respective use case. Our framework is evaluated by optimizing the LeNet CNN model on the MNIST dataset and implementing FPGA- and ASIC-based accelerators using the generated inference engine. The optimal FPGA-based accelerator on Zynq-7000 delivers 93% less memory footprint, 54% less Look-Up Table (LUT) utilization, and up to 10× speedup in inference execution compared to different Graphics Processing Unit (GPU) and Central Processing Unit (CPU) implementations of the same model, in exchange for a negligible accuracy loss, i.e., 0.89%. For the same accuracy drop, the 45 nm standard-cell-based ASIC accelerator provides an implementation which operates at 520 MHz and occupies an area of 0.059 mm², while the power consumption is ∼7.5 mW.

22. Cho, Jaechan, Yongchul Jung, Seongjoo Lee, and Yunho Jung. "Reconfigurable Binary Neural Network Accelerator with Adaptive Parallelism Scheme." Electronics 10, no. 3 (January 20, 2021): 230. http://dx.doi.org/10.3390/electronics10030230.

Abstract:
Binary neural networks (BNNs) have attracted significant interest for the implementation of deep neural networks (DNNs) on resource-constrained edge devices, and various BNN accelerator architectures have been proposed to achieve higher efficiency. BNN accelerators can be divided into two categories: streaming and layer accelerators. Although streaming accelerators designed for a specific BNN network topology provide high throughput, they are infeasible for various sensor applications in edge AI because of their complexity and inflexibility. In contrast, layer accelerators with reasonable resources can support various network topologies, but they operate with the same parallelism for all the layers of the BNN, which degrades throughput performance at certain layers. To overcome this problem, we propose a BNN accelerator with adaptive parallelism that offers high throughput performance in all layers. The proposed accelerator analyzes target layer parameters and operates with optimal parallelism using reasonable resources. In addition, this architecture is able to fully compute all types of BNN layers thanks to its reconfigurability, and it can achieve a higher area–speed efficiency than existing accelerators. In performance evaluation using state-of-the-art BNN topologies, the designed BNN accelerator achieved an area–speed efficiency 9.69 times higher than previous FPGA implementations and 24% higher than existing VLSI implementations for BNNs.
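
The arithmetic that makes BNN layers cheap in hardware is the XNOR-popcount dot product over bit-packed {-1, +1} operands; a minimal Python sketch of that core operation (the accelerator's adaptive parallelism scheme is orthogonal to it):

```python
def bnn_dot(a_bits, w_bits, n):
    # Binarized dot product: with activations/weights in {-1, +1} packed as
    # bits (1 -> +1, 0 -> -1), the product reduces to XNOR plus popcount.
    xnor = ~(a_bits ^ w_bits) & ((1 << n) - 1)
    matches = bin(xnor).count("1")
    return 2 * matches - n  # each matching bit contributes +1, each mismatch -1

# 4-bit example: identical vectors give the maximum dot product, n.
assert bnn_dot(0b1010, 0b1010, 4) == 4
assert bnn_dot(0b1010, 0b0101, 4) == -4
```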

23. Manjith B.C. and Ramasubramanian N. "Securing AES Accelerator from Key-Leaking Trojans on FPGA." International Journal of Embedded and Real-Time Communication Systems 11, no. 3 (July 2020): 84–105. http://dx.doi.org/10.4018/ijertcs.2020070105.

Abstract:
Reconfigurable hardware presents a useful platform for building systems that are both high-performance and secure. A new method for protecting a 128-bit AES accelerator on FPGA for embedded systems and cloud servers is proposed. One of the major issues faced by the AES accelerator is the security of the key stored inside the FPGA memory. The article proposes a masking scheme which makes the secret key unidentifiable, so that an attacker has no way to leak and identify the secret key from the working device through an undetected hardware unit. To work with the masked key, a modified key expansion is proposed that maintains throughput through a properly designed multistage pipeline. The proposed method takes advantage of reconfigurable computing for flexibility and provides security against key-leaking Trojans. The efficiency of the masked AES implementation is found to be 28.5 Mbps, which is 17.87% higher than the existing best work. The security of the proposed masking scheme is validated through correlation and Hamming distance analysis.
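
The paper's specific masking scheme is not detailed in the abstract; as a generic illustration of the underlying idea, Boolean (XOR) masking stores only key ⊕ mask and the mask itself, so neither stored value alone reveals the key:

```python
import os

# Illustrative only, not the paper's scheme: the key never appears in
# memory in the clear; only (key XOR mask) and mask are stored.
key = os.urandom(16)
mask = os.urandom(16)
masked_key = bytes(k ^ m for k, m in zip(key, mask))

# A Trojan that dumps either stored value alone learns nothing about the
# key; the datapath removes the mask only inside the modified key expansion.
recovered = bytes(mk ^ m for mk, m in zip(masked_key, mask))
assert recovered == key
```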

24. Tahir, Ahsen, Gordon Morison, Dawn A. Skelton, and Ryan M. Gibson. "Hardware/Software Co-Design of Fractal Features Based Fall Detection System." Sensors 20, no. 8 (April 18, 2020): 2322. http://dx.doi.org/10.3390/s20082322.

Abstract:
Falls are a leading cause of death in older adults and result in high levels of mortality, morbidity and immobility. Fall Detection Systems (FDS) are imperative for timely medical aid and have been known to reduce the death rate by 80%. We propose a novel wearable-sensor FDS which exploits the fractal dynamics of fall accelerometer signals. Fractal dynamics can be used as an irregularity measure of signals, and our work shows that it is a key discriminant for classifying falls from other activities of life. We design, implement and evaluate a hardware feature accelerator that computes fractal features through a multi-level wavelet transform on a reconfigurable embedded System on Chip (a Zynq device), for evaluating wearable accelerometer sensors. The proposed FDS uses a hardware/software co-design approach, with a hardware accelerator for the fractal features and a software implementation of Linear Discriminant Analysis on an embedded ARM core, for high accuracy and energy efficiency. The proposed system achieves 99.38% fall detection accuracy, a 7.3× speed-up and a 6.53× improvement in power consumption compared to software-only execution, with an overall performance-per-Watt advantage of 47.6×, while using only 28.67% of the reconfigurable resources.
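
The abstract does not define the exact fractal features; as a hedged stand-in, per-scale log-energies of a multi-level discrete wavelet transform (computed here with the PyWavelets package) give one simple irregularity descriptor per accelerometer axis:

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_energies(signal, wavelet="db4", level=3):
    # Multi-level DWT of one accelerometer axis; the log-energy of each
    # scale's coefficients is a simple irregularity descriptor (a stand-in
    # for the paper's fractal features, which the abstract does not specify).
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return [np.log(np.sum(c ** 2) + 1e-12) for c in coeffs]

# Hypothetical 256-sample window of accelerometer data.
features = wavelet_energies(np.random.randn(256))
print(features)  # level + 1 = 4 per-scale features for the classifier
```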

25. Gowda, Kavitha Malali Vishveshwarappa, Sowmya Madhavan, Stefano Rinaldi, Parameshachari Bidare Divakarachari, and Anitha Atmakur. "FPGA-Based Reconfigurable Convolutional Neural Network Accelerator Using Sparse and Convolutional Optimization." Electronics 11, no. 10 (May 22, 2022): 1653. http://dx.doi.org/10.3390/electronics11101653.

Abstract:
Nowadays, the dataflow architecture is considered a general solution for the acceleration of deep neural networks (DNNs) because of its high parallelism. However, conventional DNN accelerators offer only restricted flexibility across diverse network models. To overcome this, a reconfigurable convolutional neural network (RCNN) accelerator is developed on the field-programmable gate array (FPGA) platform. In this paper, sparse optimization of weights (SOW) and convolutional optimization (CO) are proposed to improve the performance of the RCNN accelerator. The combination of SOW and CO is used to optimize the feature map and weight sizes of the RCNN accelerator; the hardware resources consumed by the RCNN on the FPGA are therefore minimized. The performance of RCNN-SOW-CO is analyzed in terms of feature map size, weight size, sparseness of the input feature map (IFM), weight parameter proportion, block random access memory (BRAM), digital signal processing (DSP) elements, look-up tables (LUTs), slices, delay, power, and accuracy. The existing architectures OIDSCNN, LP-CNN, and DPR-NN are used to assess the efficiency of RCNN-SOW-CO. The LUT count of RCNN-SOW-CO with AlexNet designed on the Zynq-7020 is 5150, which is lower than that of OIDSCNN and DPR-NN.

26. Chen, Hui, Kai Chen, Kaifeng Cheng, Qinyu Chen, Yuxiang Fu, and Li Li. "An Efficient Hardware Accelerator for the MUSIC Algorithm." Electronics 8, no. 5 (May 8, 2019): 511. http://dx.doi.org/10.3390/electronics8050511.

Abstract:
As a classical DOA (direction of arrival) estimation algorithm, the multiple signal classification (MUSIC) algorithm can estimate the direction of signal incidence. A major bottleneck in the application of this algorithm is its large amount of computation, so accelerating it to meet high real-time and high precision requirements is the focus of this work. In this paper, we design an efficient and reconfigurable accelerator to implement the MUSIC algorithm. First, we propose a hardware-friendly MUSIC algorithm without the eigenstructure decomposition of the covariance matrix, which is time consuming and accounts for about 60% of the whole computation. Furthermore, to reduce the computation of the covariance matrix, this paper exploits its conjugate symmetry property together with an iterative storage scheme, which also lessens memory access time. Finally, we adopt a stepwise search method to realize the spectral peak search, which can meet 1° and 0.1° precision requirements. The accelerator can operate at a maximum frequency of 1 GHz with an area of 4,765,475.4 μm², and the power dissipation is 238.27 mW after gate-level synthesis under TSMC 40-nm CMOS technology with the Synopsys Design Compiler. Our implementation accelerates the algorithm to meet the high real-time and high precision requirements of applications. Assuming an eight-element uniform linear array, a single signal source, and 128 snapshots, the computation times of the algorithm on our architecture are 2.8 μs for covariance matrix estimation and 22.7 μs for spectral peak search.
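
For reference, the classical MUSIC pseudospectrum that such accelerators compute is sketched below with NumPy, including the eigendecomposition step that the paper's hardware-friendly variant avoids:

```python
import numpy as np

def music_spectrum(R, n_sources, angles_deg, n_elements, d=0.5):
    # Classical MUSIC for a uniform linear array (element spacing d in
    # wavelengths): project steering vectors onto the noise subspace of the
    # covariance matrix R and search the resulting spectrum for peaks.
    w, v = np.linalg.eigh(R)                 # eigh sorts eigenvalues ascending
    En = v[:, : n_elements - n_sources]      # noise subspace (smallest eigenvalues)
    spectrum = []
    for theta in np.deg2rad(angles_deg):
        a = np.exp(-2j * np.pi * d * np.arange(n_elements) * np.sin(theta))
        spectrum.append(1.0 / np.abs(a.conj() @ En @ En.conj().T @ a))
    return np.array(spectrum)

# Hypothetical use: R estimated from 128 snapshots of an 8-element array,
# then a coarse 1-degree grid search followed by a 0.1-degree refinement.
```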

27. Lu, Anni, Xiaochen Peng, Yandong Luo, Shanshi Huang, and Shimeng Yu. "A Runtime Reconfigurable Design of Compute-in-Memory–Based Hardware Accelerator for Deep Learning Inference." ACM Transactions on Design Automation of Electronic Systems 26, no. 6 (June 28, 2021): 1–18. http://dx.doi.org/10.1145/3460436.

Abstract:
Compute-in-memory (CIM) is an attractive solution to address the "memory wall" challenges posed by the extensive computation in deep learning hardware accelerators. In a custom ASIC design, a specific chip instance is restricted to a specific network during runtime. However, the development cycle of the hardware normally lags far behind the emergence of new algorithms. Although some of the reported CIM-based architectures can adapt to different deep neural network (DNN) models, few details about the dataflow or control were disclosed to support such a claim. An instruction set architecture (ISA) could support high flexibility, but its complexity would be an obstacle to efficiency. In this article, a runtime reconfigurable design methodology for CIM-based accelerators is proposed to support a class of convolutional neural networks running on one prefabricated chip instance with ASIC-like efficiency. First, several design aspects are investigated: (1) the reconfigurable weight mapping method; (2) the input side of data transmission, mainly the weight reloading; and (3) the output side of data processing, mainly the reconfigurable accumulation. Then, a system-level performance benchmark is performed for the inference of different DNN models, such as VGG-8 on the CIFAR-10 dataset and AlexNet, GoogLeNet, ResNet-18, and DenseNet-121 on the ImageNet dataset, to measure the trade-offs between runtime reconfigurability, chip area, memory utilization, throughput, and energy efficiency.

28. Liu, Bing, Danyin Zou, Lei Feng, Shou Feng, Ping Fu, and Junbao Li. "An FPGA-Based CNN Accelerator Integrating Depthwise Separable Convolution." Electronics 8, no. 3 (March 3, 2019): 281. http://dx.doi.org/10.3390/electronics8030281.

Abstract:
The Convolutional Neural Network (CNN) has been used in many fields and has achieved remarkable results, such as in image classification, face detection, and speech recognition. Compared to GPUs (graphics processing units) and ASICs, an FPGA (field-programmable gate array)-based CNN accelerator has great advantages due to its low power consumption and reconfigurability. However, the FPGA's extremely limited resources and the CNN's huge number of parameters and computational complexity pose great challenges to the design. Based on the ZYNQ heterogeneous platform, and coordinating resource and bandwidth issues with the roofline model, the CNN accelerator we designed can accelerate both standard convolution and depthwise separable convolution with a high hardware resource utilization rate. The accelerator can handle network layers of different scales through parameter configuration, maximizes bandwidth, and achieves full pipelining by using a data stream interface and ping-pong on-chip caches. The experimental results show that the accelerator designed in this paper achieves 17.11 GOPS for 32-bit floating point while also accelerating depthwise separable convolution, which gives it obvious advantages over other designs.
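
A depthwise separable convolution factors a standard convolution into a per-channel (depthwise) pass followed by a 1×1 (pointwise) pass, which is why a single accelerator datapath can serve both. A naive NumPy sketch of the computation (loop-based for clarity, not how the pipelined hardware is organized):

```python
import numpy as np

def depthwise_separable_conv(x, dw_k, pw_k):
    # x: (H, W, C) feature map; dw_k: (k, k, C), one filter per channel;
    # pw_k: (C, C_out), the 1x1 pointwise filters. Valid padding, stride 1.
    H, W, C = x.shape
    k = dw_k.shape[0]
    Ho, Wo = H - k + 1, W - k + 1
    dw = np.zeros((Ho, Wo, C))
    for c in range(C):                       # depthwise: per-channel conv
        for i in range(Ho):
            for j in range(Wo):
                dw[i, j, c] = np.sum(x[i:i+k, j:j+k, c] * dw_k[:, :, c])
    return dw @ pw_k                         # pointwise: 1x1 channel mixing

out = depthwise_separable_conv(np.random.rand(8, 8, 3),
                               np.random.rand(3, 3, 3),
                               np.random.rand(3, 16))
print(out.shape)  # (6, 6, 16)
```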

29. Yan, Tianwei, Ning Zhang, Jie Li, Wenchao Liu, and He Chen. "Automatic Deployment of Convolutional Neural Networks on FPGA for Spaceborne Remote Sensing Application." Remote Sensing 14, no. 13 (June 29, 2022): 3130. http://dx.doi.org/10.3390/rs14133130.

Abstract:
In recent years, convolutional neural network (CNN)-based algorithms have been widely used in remote sensing image processing and show tremendous performance in a variety of application fields. However, large amounts of data and intensive computations make the deployment of CNN-based algorithms a challenging problem, especially for the spaceborne scenario where resources and power consumption are limited. To tackle this problem, this paper proposes an automatic CNN deployment solution on resource-limited field-programmable gate arrays (FPGAs) for spaceborne remote sensing applications. Firstly, a series of hardware-oriented optimization methods are proposed to reduce the complexity of the CNNs. Secondly, a hardware accelerator is designed. In this accelerator, a reconfigurable processing engine array with efficient convolutional computation architecture is used to accelerate CNN-based algorithms. Thirdly, to bridge the optimized CNNs and hardware accelerator, a compilation toolchain is introduced into the deployment solution. Through the automatic conversion from CNN models to hardware instructions, various networks can be deployed on hardware in real-time. Finally, we deployed an improved VGG16 network and an improved YOLOv2 network on Xilinx AC701 to evaluate the effectiveness of the proposed deployment solution. The experiments show that with only 3.407 W power consumption and 94 DSP consumption, our solution achieves 23.06 giga operations per second (GOPS) throughput in the improved VGG16 and 22.17 GOPS throughput in the improved YOLOv2. Compared to the related works, the DSP efficiency of our solution is improved by 1.3–2.7×.

30. Ibrahim, Atef, Hamed Elsimary, and Abdullah Aljumah. "Novel Reconfigurable Hardware Accelerator for Protein Sequence Alignment Using Smith-Waterman Algorithm." IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E99.A, no. 3 (2016): 683–90. http://dx.doi.org/10.1587/transfun.e99.a.683.

31. Müller, Jan, Dirk Fimmel, Renate Merker, and Rainer Schaffer. "A Hardware–Software System for Tomographic Reconstruction." Journal of Circuits, Systems and Computers 12, no. 02 (April 2003): 203–29. http://dx.doi.org/10.1142/s021812660300074x.

Abstract:
We present the design of a hardware–software system for the reconstruction of tomographic images. In a systematic approach we developed the parallel processor array, a reconfigurable hardware controller and processing kernel, and the software control, up to integration into a graphical user interface. The processor array, acting as a hardware accelerator, is constructed using theoretical results and methods of application-specific hardware design. The reconfigurability of the system allows one to utilize a much wider realm of algorithms than the three reconstruction algorithms implemented so far. In the paper we discuss the system design at different levels, from algorithm transformations to board development.

32. Barrios, Yubal, Alfonso Rodríguez, Antonio Sánchez, Arturo Pérez, Sebastián López, Andrés Otero, Eduardo de la Torre, and Roberto Sarmiento. "Lossy Hyperspectral Image Compression on a Reconfigurable and Fault-Tolerant FPGA-Based Adaptive Computing Platform." Electronics 9, no. 10 (September 26, 2020): 1576. http://dx.doi.org/10.3390/electronics9101576.

Abstract:
This paper describes a novel hardware implementation of a lossy multispectral and hyperspectral image compressor for on-board operation in space missions. The compression algorithm is a lossy extension of the Consultative Committee for Space Data Systems (CCSDS) 123.0-B-1 lossless standard that includes a bit-rate control stage, which in turn manages the losses the compressor may introduce to achieve higher compression ratios without compromising the recovered image quality. The algorithm has been implemented using High-Level Synthesis (HLS) techniques to increase design productivity by raising the abstraction level. The proposed lossy compression solution is deployed onto ARTICo3, a dynamically reconfigurable multi-accelerator architecture, obtaining a run-time adaptive solution that enables user-selectable performance (i.e., load more hardware accelerators to transparently increase throughput), power consumption, and fault tolerance (i.e., group hardware accelerators to transparently enable hardware redundancy). The whole compression solution is tested on a Xilinx Zynq UltraScale+ Field-Programmable Gate Array (FPGA)-based MPSoC using different input images, from multispectral to ultraspectral. For images acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS), the proposed implementation renders an execution time of approximately 36 s when 8 accelerators are compressing concurrently at 100 MHz, which in turn uses around 20% of the LUTs and 17% of the dedicated memory blocks available in the target device. In this scenario, a speedup of 15.6× is obtained in comparison with a pure software version of the algorithm running in an ARM Cortex-A53 processor.

33. Rashid, Muhammad, Omar S. Sonbul, Muhammad Yousuf Irfan Zia, Muhammad Arif, Asher Sajid, and Saud S. Alotaibi. "Throughput/Area-Efficient Accelerator of Elliptic Curve Point Multiplication over GF(2²³³) on FPGA." Electronics 12, no. 17 (August 26, 2023): 3611. http://dx.doi.org/10.3390/electronics12173611.

Abstract:
This paper presents a throughput/area-efficient hardware accelerator architecture for elliptic curve point multiplication (ECPM) computation over GF(2²³³). The throughput of the proposed accelerator design is optimized by reducing the total clock cycles using a bit-parallel Karatsuba modular multiplier. We employ two techniques to minimize the hardware resources: (i) a consolidated arithmetic unit, where we combine a single modular adder, multiplier, and square block instead of having multiple modular operators, and (ii) an Itoh–Tsujii inversion algorithm that leverages the existing hardware resources of the multiplier and square units for multiplicative inverse computation. An efficient finite-state-machine (FSM) controller is implemented to facilitate the control functionality. To evaluate and compare the proposed accelerator architecture against state-of-the-art solutions, a figure-of-merit (FoM) metric defined as throughput/area is used. Implementation results after post-place-and-route simulation are reported for reconfigurable field-programmable gate array (FPGA) devices. On a Virtex-7 FPGA in particular, the accelerator utilizes 3584 slices, needs 7208 clock cycles, operates at a maximum frequency of 350 MHz, computes one ECPM operation in 20.59 μs, and achieves an FoM of 13.54. Consequently, the results and comparisons reveal that our accelerator suits applications that demand throughput- and area-optimized ECPM implementations.
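
The Karatsuba idea behind the bit-parallel multiplier is to trade one half-size multiplication for a few extra additions, which over GF(2) are plain XORs. A recursive carry-less sketch in Python (illustrative only; the modular reduction by the field polynomial of GF(2²³³) is omitted):

```python
def clmul(a, b):
    # Carry-less schoolbook multiply of bit-polynomials over GF(2)[x].
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def karatsuba_clmul(a, b, n):
    # One Karatsuba step for n-bit polynomials: three half-size multiplies
    # instead of four. The hardware version unrolls this recursion into a
    # fully bit-parallel multiplier (reduction step not shown here).
    if n <= 8:
        return clmul(a, b)
    h = n // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    lo = karatsuba_clmul(a0, b0, h)
    hi = karatsuba_clmul(a1, b1, n - h)
    mid = karatsuba_clmul(a0 ^ a1, b0 ^ b1, n - h) ^ lo ^ hi
    return lo ^ (mid << h) ^ (hi << (2 * h))

# (x + 1)^2 = x^2 + 1 over GF(2): 0b11 * 0b11 -> 0b101.
assert karatsuba_clmul(0b11, 0b11, 16) == 0b101
```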

34. Tan, Yonghao, Mengying Sun, Huanshihong Deng, Haihan Wu, Minghao Zhou, Yifei Chen, Zhuo Yu, et al. "A Reconfigurable Visual–Inertial Odometry Accelerated Core with High Area and Energy Efficiency for Autonomous Mobile Robots." Sensors 22, no. 19 (October 9, 2022): 7669. http://dx.doi.org/10.3390/s22197669.

Abstract:
With the wide application of autonomous mobile robots (AMRs), visual–inertial odometry (VIO) systems, which realize positioning by fusing a camera and an inertial measurement unit (IMU), have developed rapidly, but they are still limited by the high complexity of the algorithms, the long development cycle of dedicated accelerators, and the low power supply capacity of AMRs. This work designs a reconfigurable accelerated core that supports different VIO algorithms and offers high area and energy efficiency, precision, and processing speed. Experimental results show that the accuracy loss of the proposed accelerator is negligible on the most authoritative dataset. Its on-chip memory usage of 70 KB is at least 10× smaller than that of state-of-the-art works. The FPGA implementation's hardware-resource consumption and power dissipation, as well as its synthesis in 28 nm CMOS, outperform previous works on the same platform.

35. A, Sasikumar, Logesh Ravi, Ketan Kotecha, Indragandhi V, and Subramaniyaswamy V. "Reconfigurable and hardware efficient adaptive quantization model-based accelerator for binarized neural network." Computers and Electrical Engineering 102 (September 2022): 108302. http://dx.doi.org/10.1016/j.compeleceng.2022.108302.

36. Guo, Shuaizhi, Tianqi Wang, Linfeng Tao, Teng Tian, Zikun Xiang, and Xi Jin. "RP-Ring: A Heterogeneous Multi-FPGA Accelerator." International Journal of Reconfigurable Computing 2018 (2018): 1–14. http://dx.doi.org/10.1155/2018/6784319.

Abstract:
To reduce the cost of designing new specialized FPGA boards as a direct-summation MOND (Modified Newtonian Dynamics) simulator, we propose a new heterogeneous architecture built from existing FPGA boards, called the RP-ring (reconfigurable processor ring). This design can be expanded conveniently with any available FPGA board and requires only quite low communication bandwidth between FPGA boards. The communication protocol is simple and can be implemented with limited hardware/software resources. In order to avoid overall performance loss caused by the slowest board, we build a mathematical model to decompose the workload among the FPGAs. The division of the workload is based on the logic resources, memory access bandwidth, and communication bandwidth of each FPGA chip. Our accelerator achieves two orders of magnitude speedup over a CPU implementation.

37. Ghani, Arfan, Rawad Hodeify, Chan H. See, Simeon Keates, Dah-Jye Lee, and Ahmed Bouridane. "Computer Vision-Based Kidney's (HK-2) Damaged Cells Classification with Reconfigurable Hardware Accelerator (FPGA)." Electronics 11, no. 24 (December 19, 2022): 4234. http://dx.doi.org/10.3390/electronics11244234.

Abstract:
In medical and health sciences, the detection of cell injury plays an important role in diagnosis, personal treatment and disease prevention. Despite recent advancements in tools and methods for image classification, it is challenging to classify cell images with higher precision and accuracy. Cell classification based on computer vision offers significant benefits in biomedicine and healthcare. There have been studies reported where cell classification techniques have been complemented by Artificial Intelligence-based classifiers such as Convolutional Neural Networks. These classifiers suffer from the drawback of the scale of computational resources required for training and hence do not offer real-time classification capabilities for an embedded system platform. Field Programmable Gate Arrays (FPGAs) offer the flexibility of hardware reconfiguration and have emerged as a viable platform for algorithm acceleration. Given that the logic resources and on-chip memory available on a single device are still limited, hardware/software co-design is proposed where image pre-processing and network training were performed in software, and trained architectures were mapped onto an FPGA device (Nexys4DDR) for real-time cell classification. This paper demonstrates that the embedded hardware-based cell classifier performs with almost 100% accuracy in detecting different types of damaged kidney cells.

38. Sestito, Cristian, Fanny Spagnolo, and Stefania Perri. "Design of Flexible Hardware Accelerators for Image Convolutions and Transposed Convolutions." Journal of Imaging 7, no. 10 (October 12, 2021): 210. http://dx.doi.org/10.3390/jimaging7100210.

Abstract:
Nowadays, computer vision relies heavily on convolutional neural networks (CNNs) to perform complex and accurate tasks. Among them, super-resolution CNNs represent a meaningful example, due to the presence of both convolutional (CONV) and transposed convolutional (TCONV) layers. While the former exploit multiply-and-accumulate (MAC) operations to extract features of interest from incoming feature maps (fmaps), the latter perform MACs to tune the spatial resolution of the received fmaps properly. The ever-growing real-time and low-power requirements of modern computer vision applications represent a stimulus for the research community to investigate the deployment of CNNs on well-suited hardware platforms, such as field programmable gate arrays (FPGAs). FPGAs are widely recognized as valid candidates for trading off computational speed and power consumption, thanks to their flexibility and their capability to also deal with computationally intensive models. In order to reduce the number of operations to be performed, this paper presents a novel hardware-oriented algorithm able to efficiently accelerate both CONVs and TCONVs. The proposed strategy was validated by employing it within a reconfigurable hardware accelerator purposely designed to adapt itself to different operating modes set at run-time. When characterized using the Xilinx XC7K410T FPGA device, the proposed accelerator achieved a throughput of up to 2022.2 GOPS and, in comparison to state-of-the-art competitors, it reached an energy efficiency up to 2.3 times higher, without compromising the overall accuracy.
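
A transposed convolution (TCONV) can be viewed as each input pixel scattering a scaled copy of the kernel into a larger output grid, which is what lets CONV and TCONV share MAC hardware. A naive NumPy sketch of that view (single channel, no padding handling):

```python
import numpy as np

def transposed_conv2d(x, k, stride=2):
    # TCONV as used for upsampling in super-resolution CNNs: every input
    # pixel x[i, j] adds x[i, j] * k into the output at offset (i, j) * stride.
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H * stride + kh - stride, W * stride + kw - stride))
    for i in range(H):
        for j in range(W):
            out[i*stride:i*stride+kh, j*stride:j*stride+kw] += x[i, j] * k
    return out

# A 4x4 map upsampled with a 3x3 kernel at stride 2 yields a 9x9 output.
print(transposed_conv2d(np.ones((4, 4)), np.ones((3, 3))).shape)  # (9, 9)
```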
APA, Harvard, Vancouver, ISO, and other styles
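
The CONV/TCONV unification in the entry above rests on a standard identity: a stride-s transposed convolution equals zero-insertion upsampling followed by an ordinary convolution with the flipped kernel. The NumPy sketch below models that identity functionally; it is not the paper's hardware-oriented algorithm.

    # Functional model of the CONV/TCONV relationship, for illustration only.
    import numpy as np

    def conv2d(x, k):
        # Plain 'valid' 2-D convolution via multiply-and-accumulate (MAC) loops.
        H, W = x.shape
        Kh, Kw = k.shape
        out = np.zeros((H - Kh + 1, W - Kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(x[i:i + Kh, j:j + Kw] * k)
        return out

    def tconv2d(x, k, stride=2):
        # Transposed convolution as zero-insertion + fully padded convolution.
        H, W = x.shape
        Kh, Kw = k.shape
        up = np.zeros(((H - 1) * stride + 1, (W - 1) * stride + 1))
        up[::stride, ::stride] = x                          # insert zeros between pixels
        pad = np.pad(up, ((Kh - 1, Kh - 1), (Kw - 1, Kw - 1)))
        return conv2d(pad, k[::-1, ::-1])                   # convolve with flipped kernel

    x = np.arange(9.0).reshape(3, 3)
    k = np.ones((3, 3))
    print(tconv2d(x, k).shape)  # (7, 7): spatial resolution increased
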
39

Kalomiros, John, and John Lygouras. "Robotic Mapping and Localization with Real-Time Dense Stereo on Reconfigurable Hardware." International Journal of Reconfigurable Computing 2010 (2010): 1–17. http://dx.doi.org/10.1155/2010/480208.

Full text
Abstract:
A reconfigurable architecture for dense stereo is presented as an observation framework for a real-time implementation of the simultaneous localization and mapping problem in robotics. The reconfigurable sensor detects point features from stereo image pairs to use at the measurement update stage of the procedure. The main hardware blocks are a dense depth stereo accelerator, a left and right image corner detector, and a stage performing a left-right consistency check. For the stereo-processor stage, we have implemented and tested a global-matching component based on a maximum-likelihood dynamic programming technique. The system includes a Nios II processor for data control and a USB 2.0 interface for host communication. Remote control is used to guide a vehicle equipped with a stereo head in an indoor environment. The FastSLAM Bayesian algorithm is applied in order to track and update observations and the robot path in real time. The system is assessed using real scene depth detection and public reference data sets. The paper also reports resource usage and a comparison of mapping and localization results with ground truth.
APA, Harvard, Vancouver, ISO, and other styles
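
A minimal software model of the left-right consistency check mentioned in the entry above: a left-image disparity is kept only if the right image agrees when mapped back. The invalidation value and tolerance threshold are illustrative assumptions.

    import numpy as np

    def lr_consistency(disp_left, disp_right, max_diff=1):
        # Invalidate (set to -1) disparities that fail the left-right check.
        H, W = disp_left.shape
        out = disp_left.copy()
        for y in range(H):
            for x in range(W):
                d = disp_left[y, x]
                xr = x - d                      # matching column in the right image
                if xr < 0 or abs(disp_right[y, xr] - d) > max_diff:
                    out[y, x] = -1              # occluded or mismatched pixel
        return out

    dl = np.array([[2, 2, 2, 2, 2]])
    dr = np.array([[2, 2, 0, 2, 2]])
    print(lr_consistency(dl, dr))   # pixels mapping onto the outlier are rejected
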
40

Zhang, Peiheng. "An Implementation of Reconfigurable Computing Accelerator Card Oriented Bioinformatics." Journal of Computer Research and Development 42, no. 6 (2005): 930. http://dx.doi.org/10.1360/crad20050605.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Chen, Yupeng, Bertil Schmidt, and Douglas L. Maskell. "Reconfigurable Accelerator for the Word-Matching Stage of BLASTN." IEEE Transactions on Very Large Scale Integration (VLSI) Systems 21, no. 4 (April 2013): 659–69. http://dx.doi.org/10.1109/tvlsi.2012.2196060.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Kurdi, Aous H., Janos L. Grantner, and Ikhlas M. Abdel-Qader. "Fuzzy Logic Based Hardware Accelerator with Partially Reconfigurable Defuzzification Stage for Image Edge Detection." International Journal of Reconfigurable Computing 2017 (2017): 1–13. http://dx.doi.org/10.1155/2017/1325493.

Full text
Abstract:
In this paper, the design and the implementation of a pipelined hardware accelerator based on a fuzzy logic approach for an edge detection system are presented. The fuzzy system comprises a preprocessing stage, a fuzzifier with four fuzzy inputs, an inference system with seven rules, and a defuzzification stage delivering a single crisp output, which represents the intensity value of a pixel in the output image. The hardware accelerator consists of seven stages with one clock cycle latency per stage. The defuzzification stage was implemented using three different defuzzification methods: the mean of maxima, the smallest of maxima, and the largest of maxima. The defuzzification modules are interchangeable while the system runs, using a partial reconfiguration design methodology. System development was carried out using Vivado High-Level Synthesis, Vivado Design Suite, Vivado Simulator, and a set of Xilinx 7000 FPGA devices. Depending on the speed grade of the device employed, the system can operate at frequencies from 83 MHz to 125 MHz. Its peak performance is up to 58 high-definition frames per second. A comparison of this system's performance with that of its software counterpart shows a speedup on the order of one hundred thousand times.
APA, Harvard, Vancouver, ISO, and other styles
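
The three interchangeable defuzzification methods named in the entry above all reduce to picking among the peaks of the aggregated output membership function. A minimal software sketch, assuming a discretized 8-bit intensity domain (not the paper's pipelined HDL):

    import numpy as np

    def maxima(domain, mu):
        # Domain points where the aggregated membership function peaks.
        return domain[mu == mu.max()]

    def smallest_of_maxima(domain, mu):
        return maxima(domain, mu).min()

    def largest_of_maxima(domain, mu):
        return maxima(domain, mu).max()

    def mean_of_maxima(domain, mu):
        return maxima(domain, mu).mean()

    # Example: a plateau of maxima between intensities 120 and 140.
    domain = np.arange(0, 256)
    mu = np.clip(1 - np.abs(domain - 130) / 50.0, 0, 1)
    mu[120:141] = mu.max()          # flatten the top to create several maxima
    print(smallest_of_maxima(domain, mu),
          mean_of_maxima(domain, mu),
          largest_of_maxima(domain, mu))   # 120 130.0 140
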
43

Sugiarto, Indar, Cristian Axenie, and Jörg Conradt. "FPGA-Based Hardware Accelerator for an Embedded Factor Graph with Configurable Optimization." Journal of Circuits, Systems and Computers 28, no. 02 (November 12, 2018): 1950031. http://dx.doi.org/10.1142/s0218126619500312.

Full text
Abstract:
A factor graph (FG) can be considered a unified model combining a Bayesian network (BN) and a Markov random field (MRF). The inference mechanism of an FG can be used to perform reasoning under incompleteness and uncertainty, which is a challenging task in many intelligent systems and robotics. Unfortunately, a complete inference mechanism requires intensive computation that introduces a long delay before the reasoning process completes. Furthermore, an energy-constrained system such as a mobile robot requires a highly efficient inference process. In this paper, we present an embedded FG inference engine that employs a neural-inspired discretization mechanism. The engine runs on a system-on-chip (SoC) and is accelerated by its FPGA. We optimized our design to balance the trade-off between speed and hardware resource utilization. In its fully optimized form, the design accelerates the inference process to eight times faster than normal execution, twice the speed-up achieved by a parallelized FG running on a PC. The experiments demonstrate that our design can be extended into an efficient reconfigurable computing machine.
APA, Harvard, Vancouver, ISO, and other styles
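
As a toy illustration of the factor-graph inference accelerated in the entry above, the sketch below computes an exact sum-product marginal on a two-variable discrete FG. The factor values and variable names are illustrative assumptions.

    import numpy as np

    # Unary factors (priors) over two 3-state variables, plus a pairwise factor.
    f_a = np.array([0.7, 0.2, 0.1])
    f_b = np.array([0.3, 0.3, 0.4])
    f_ab = np.array([[0.90, 0.05, 0.05],
                     [0.10, 0.80, 0.10],
                     [0.05, 0.15, 0.80]])

    # Sum-product message from variable a through factor f_ab to variable b,
    # then the normalized marginal of b.
    msg_a_to_b = f_ab.T @ f_a        # sums over a's states
    marg_b = f_b * msg_a_to_b
    marg_b /= marg_b.sum()           # normalize to a probability distribution
    print(marg_b)
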
44

Yazdani, Samar, Joël Cambonie, and Bernard Pottier. "Coordinated concurrent memory accesses on a reconfigurable multimedia accelerator." Microprocessors and Microsystems 33, no. 1 (February 2009): 13–23. http://dx.doi.org/10.1016/j.micpro.2008.08.005.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Schmitt, Christian, Moritz Schmid, Sebastian Kuckuk, Harald Köstler, Jürgen Teich, and Frank Hannig. "Reconfigurable Hardware Generation of Multigrid Solvers with Conjugate Gradient Coarse-Grid Solution." Parallel Processing Letters 28, no. 04 (December 2018): 1850016. http://dx.doi.org/10.1142/s0129626418500160.

Full text
Abstract:
Field programmable gate arrays (FPGAs) are an increasingly popular accelerator technology, and not only in the field of high-performance computing (HPC). However, they use a completely different programming paradigm and tool set compared to central processing units (CPUs) or even graphics processing units (GPUs), adding extra development steps and requiring special knowledge, which hinders widespread use in scientific computing. To bridge this programmability gap, domain-specific languages (DSLs) are a popular choice for generating low-level implementations from an abstract algorithm description. In this work, we demonstrate our approach for generating numerical solver implementations based on the multigrid method for FPGAs from the same code base that is also used to generate code for CPUs with a hybrid MPI and OpenMP parallelization. Our approach yields a hardware design that can compute up to 11 V-cycles per second on a mid-range FPGA for an input grid size of 4096 × 4096, with the coarsest grid solved using the conjugate gradient (CG) method, beating vectorized, multi-threaded execution on an Intel Xeon processor.
APA, Harvard, Vancouver, ISO, and other styles
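
The solver structure described in the entry above, a multigrid V-cycle with a conjugate-gradient (CG) coarse-grid solve, can be sketched for a 1-D Poisson problem as follows. Grid sizes, smoother, and iteration counts are illustrative assumptions, not the generated FPGA design.

    import numpy as np

    def poisson(n):
        # 1-D Laplacian with Dirichlet boundaries (tridiagonal 2, -1, -1).
        return (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
                - np.diag(np.ones(n - 1), -1))

    def cg(A, b, iters=50):
        # Textbook conjugate gradient, used only on the coarsest grid.
        x = np.zeros_like(b); r = b.copy(); p = r.copy()
        for _ in range(iters):
            rr = r @ r
            if rr < 1e-30:              # already converged
                break
            Ap = A @ p
            alpha = rr / (p @ Ap)
            x += alpha * p
            r = r - alpha * Ap
            p = r + ((r @ r) / rr) * p
        return x

    def v_cycle(b, level):
        n = b.size
        A = poisson(n)
        if level == 0:                  # coarsest grid: solve with CG
            return cg(A, b)
        x = np.zeros(n)
        for _ in range(3):              # pre-smoothing: damped Jacobi (diag = 2)
            x += 0.5 * (b - A @ x) / 2.0
        r = b - A @ x
        r_coarse = r[1::2]              # restriction by injection (crude but simple)
        e_coarse = v_cycle(r_coarse, level - 1)
        e = np.zeros(n)                 # prolongation by linear interpolation
        e[1::2] = e_coarse
        e[0] = 0.5 * e_coarse[0]
        e[-1] = 0.5 * e_coarse[-1]
        e[2:-1:2] = 0.5 * (e_coarse[:-1] + e_coarse[1:])
        x += e                          # coarse-grid correction
        for _ in range(3):              # post-smoothing
            x += 0.5 * (b - A @ x) / 2.0
        return x

    b = np.ones(31)
    x = v_cycle(b, level=3)
    print("residual norm after one V-cycle:", np.linalg.norm(b - poisson(31) @ x))
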
46

Lopes, Alba, and Monica Pereira. "Fast DSE of reconfigurable accelerator systems via ensemble machine learning." Analog Integrated Circuits and Signal Processing 108, no. 3 (May 28, 2021): 495–509. http://dx.doi.org/10.1007/s10470-021-01885-0.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Yang, Ruiheng, Zhikun Chen, Bin’an Wang, Yunfei Guo, and Lingtong Hu. "A Lightweight Detection Method for Remote Sensing Images and Its Energy-Efficient Accelerator on Edge Devices." Sensors 23, no. 14 (July 18, 2023): 6497. http://dx.doi.org/10.3390/s23146497.

Full text
Abstract:
Convolutional neural networks (CNNs) have been extensively employed in remote sensing image detection and have exhibited impressive performance over the past few years. However, such networks are generally limited by their complex structures, which make them difficult to deploy on power-sensitive and resource-constrained remote sensing edge devices. To tackle this problem, this study proposes a lightweight remote sensing detection network suitable for edge devices and an energy-efficient CNN accelerator based on field-programmable gate arrays (FPGAs). First, a series of network weight reduction and optimization methods is proposed to reduce the size of the network and the difficulty of hardware deployment. Second, a high-energy-efficiency CNN accelerator is developed. The accelerator employs a reconfigurable and efficient convolutional processing engine to perform CNN computations, with hardware optimizations tailored to the proposed network structure. The experimental results obtained with the Xilinx ZYNQ Z7020 show that the network achieves higher accuracy with a smaller size, and the CNN accelerator for the proposed network exhibits a throughput of 29.53 GOPS and power consumption of only 2.98 W while using only 113 DSPs. In comparison with related work, DSP efficiency at an identical level of energy consumption is increased by 1.1–2.5 times, confirming the superiority of the proposed solution and its potential for deployment on remote sensing edge devices.
APA, Harvard, Vancouver, ISO, and other styles
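
A back-of-the-envelope check of the efficiency figures quoted in the entry above (29.53 GOPS, 2.98 W, 113 DSPs), showing how the two derived metrics follow from the reported numbers:

    # Energy efficiency (GOPS/W) and DSP efficiency (GOPS per DSP slice)
    # computed directly from the throughput, power, and DSP counts above.
    throughput_gops = 29.53
    power_w = 2.98
    dsp_slices = 113

    print(f"energy efficiency: {throughput_gops / power_w:.2f} GOPS/W")
    print(f"DSP efficiency:    {throughput_gops / dsp_slices:.3f} GOPS/DSP")
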
48

Mehdipour, Farhad, Hiroaki Honda, Koji Inoue, Hiroshi Kataoka, and Kazuaki Murakami. "A design scheme for a reconfigurable accelerator implemented by single-flux quantum circuits." Journal of Systems Architecture 57, no. 1 (January 2011): 169–79. http://dx.doi.org/10.1016/j.sysarc.2010.07.009.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Chien, Shao-Yi, and Liang-Gee Chen. "Reconfigurable Morphological Image Processing Accelerator for Video Object Segmentation." Journal of Signal Processing Systems 62, no. 1 (November 18, 2008): 77–96. http://dx.doi.org/10.1007/s11265-008-0311-6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Tan, Cheng, Chenhao Xie, Tong Geng, Andres Marquez, Antonino Tumeo, Kevin Barker, and Ang Li. "ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing." IEEE Transactions on Parallel and Distributed Systems 32, no. 12 (December 1, 2021): 2880–92. http://dx.doi.org/10.1109/tpds.2021.3081074.

Full text
APA, Harvard, Vancouver, ISO, and other styles