Journal articles on the topic 'Hardware/algorithm co-design'

Author: Grafiati

Published: 10 December 2022

Last updated: 28 January 2023

Consult the top 50 journal articles for your research on the topic 'Hardware/algorithm co-design.'

1

Chen, Andrew, Rohaan Gupta, Anton Borzenko, Kevin Wang, and Morteza Biglari-Abhari. "Accelerating SuperBE with Hardware/Software Co-Design." Journal of Imaging 4, no. 10 (October 18, 2018): 122. http://dx.doi.org/10.3390/jimaging4100122.

Abstract:

Background Estimation is a common computer vision task, used for segmenting moving objects in video streams. This can be useful as a pre-processing step, isolating regions of interest for more complicated algorithms performing detection, recognition, and identification tasks, in order to reduce overall computation time. This is especially important in the context of embedded systems like smart cameras, which may need to process images with constrained computational resources. This work focuses on accelerating SuperBE, a superpixel-based background estimation algorithm that was designed for simplicity and reducing computational complexity while maintaining state-of-the-art levels of accuracy. We explore both software and hardware acceleration opportunities, converting the original algorithm into a greyscale, integer-only version, and using Hardware/Software Co-design to develop hardware acceleration components on FPGA fabric that assist a software processor. We achieved a 4.4× speed improvement with the software optimisations alone, and a 2× speed improvement with the hardware optimisations alone. When combined, these led to a 9× speed improvement on a Cyclone V System-on-Chip, delivering almost 38 fps on 320 × 240 resolution images.

2

Krawczyk, Kamil, Paweł Tomaszewicz, and Mariusz Rawski. "Whirlpool SoPC Implementation - Hardware/Software Co-Design Example." International Journal of Electronics and Telecommunications 58, no. 1 (March 1, 2012): 21–26. http://dx.doi.org/10.2478/v10177-012-0003-9.

Abstract:

The aim of this work was to design a System on Programmable Chip (SoPC) that implements the Whirlpool Hash Function (WHF) algorithm. The project assumed an embedded NIOS II soft processor controlling the whole system, whose functionality was extended with custom logic to improve the efficiency of the algorithm. This paper presents the Whirlpool Hash Function realized in several SoPC configurations, which differ in implementation complexity and performance.

3

López, M., J. Daugman, and E. Cantó. "Hardware–software co-design of an iris recognition algorithm." IET Information Security 5, no. 1 (2011): 60. http://dx.doi.org/10.1049/iet-ifs.2009.0267.

4

Li, Shih-An, Chen-Chien Hsu, Ching-Chang Wong, and Chia-Jun Yu. "Hardware/software co-design for particle swarm optimization algorithm." Information Sciences 181, no. 20 (October 2011): 4582–96. http://dx.doi.org/10.1016/j.ins.2010.07.017.

5

Alecsa, Bogdan, and Alexandru Onea. "Hardware-Software Co-Design for BLDC Motor Speed Controller Design." Advanced Materials Research 463-464 (February 2012): 1256–59. http://dx.doi.org/10.4028/www.scientific.net/amr.463-464.1256.

Abstract:

This paper proposes a combined hardware-software approach for a controller design. The case of a brushless DC (BLDC) motor speed controller is studied. A hardware controller is implemented inside a field programmable gate array (FPGA) device, together with soft core processors that implement by software non-critical tasks, like liquid crystal display (LCD) interface and serial data communication to a host computer. This way, the control algorithm is executed in hardware, as fast as possible, while the monitoring tasks are performed by the software. Experimental results are provided, showing the working design.

6

Zhang, Xinyi, Yawen Wu, Peipei Zhou, Xulong Tang, and Jingtong Hu. "Algorithm-hardware Co-design of Attention Mechanism on FPGA Devices." ACM Transactions on Embedded Computing Systems 20, no. 5s (October 31, 2021): 1–24. http://dx.doi.org/10.1145/3477002.

Abstract:

Multi-head self-attention (attention mechanism) has been employed in a variety of fields such as machine translation, language modeling, and image processing due to its superiority in feature extraction and sequential data analysis. This is benefited from a large number of parameters and sophisticated model architecture behind the attention mechanism. To efficiently deploy attention mechanism on resource-constrained devices, existing works propose to reduce the model size by building a customized smaller model or compressing a big standard model. A customized smaller model is usually optimized for the specific task and needs effort in model parameters exploration. Model compression reduces model size without hurting the model architecture robustness, which can be efficiently applied to different tasks. The compressed weights in the model are usually regularly shaped (e.g. rectangle) but the dimension sizes vary (e.g. differs in rectangle height and width). Such compressed attention mechanism can be efficiently deployed on CPU/GPU platforms as their memory and computing resources can be flexibly assigned with demand. However, for Field Programmable Gate Arrays (FPGAs), the data buffer allocation and computing kernel are fixed at run time to achieve maximum energy efficiency. After compression, weights are much smaller and different in size, which leads to inefficient utilization of FPGA on-chip buffer. Moreover, the different weight heights and widths may lead to inefficient FPGA computing kernel execution. Due to the large number of weights in the attention mechanism, building a unique buffer and computing kernel for each compressed weight on FPGA is not feasible. In this work, we jointly consider the compression impact on buffer allocation and the required computing kernel during the attention mechanism compressing. A novel structural pruning method with memory footprint awareness is proposed and the associated accelerator on FPGA is designed. 
The experimental results show that our work can compress Transformer (an attention mechanism based model) by 95x. The developed accelerator can fully utilize the FPGA resource, processing the sparse attention mechanism with the run-time throughput performance of 1.87 Tops in ZCU102 FPGA.

7

Ismael, Sarmad, Omar Tareq, and Yahya Taher Qassim. "Hardware/software co-design for a parallel three-dimensional bresenham’s algorithm." International Journal of Electrical and Computer Engineering (IJECE) 9, no. 1 (February 1, 2019): 148. http://dx.doi.org/10.11591/ijece.v9i1.pp148-156.

Abstract:

Line plotting is one of the basic operations in scan conversion. Bresenham’s line drawing algorithm is an efficient and highly popular algorithm used for this purpose. The algorithm proceeds from one end-point of the line to the other, calculating one point at each step. As a result, the calculation time for all the points depends on the length of the line and therefore on the total number of points. In this paper, we developed an approach to speed up the Bresenham algorithm by partitioning each line into a number of segments, finding the points that belong to those segments, and drawing them simultaneously to form the main line. As a result, the more segments are generated, the faster the points are calculated. By employing 32 cores in the Field Programmable Gate Array, a line of length 992 points is formed in only 0.31 μs. The complete system is implemented using a Zybo board that contains the Xilinx Zynq-7000 chip (Z-7010).
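
The segment idea in this abstract can be illustrated in software. The sketch below is not the authors' FPGA design: it assumes each segment can derive its own starting position and error terms from a closed-form expression, so that segments never wait on each other, and it uses a thread pool only as a stand-in for the 32 hardware cores. The names `bresenham_segment` and `segmented_line` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def bresenham_segment(p0, p1, start, stop):
    """Return points with indices start..stop-1 of the 3D line p0 -> p1.

    Assumption (not from the paper): the state at index `start` is computed
    in closed form, then the usual incremental error update takes over.
    """
    deltas = [b - a for a, b in zip(p0, p1)]
    mags = [abs(d) for d in deltas]
    signs = [(d > 0) - (d < 0) for d in deltas]
    n = max(mags)  # steps along the driving (longest) axis
    if n == 0:
        return [tuple(p0)] if start == 0 and stop > 0 else []

    pos, err = [], []
    for a, s, m in zip(p0, signs, mags):
        # Coordinate at index i is a + s * floor((2*i*m + n) / (2*n)).
        q, r = divmod(2 * start * m + n, 2 * n)
        pos.append(a + s * q)
        err.append(r)

    points = []
    for _ in range(start, stop):
        points.append(tuple(pos))
        for k in range(3):
            err[k] += 2 * mags[k]
            if err[k] >= 2 * n:  # accumulated error crosses one grid step
                err[k] -= 2 * n
                pos[k] += signs[k]
    return points


def segmented_line(p0, p1, segments):
    """Split the line into `segments` chunks and compute them independently."""
    total = max(abs(b - a) for a, b in zip(p0, p1)) + 1
    size = -(-total // segments)  # ceiling division
    bounds = [(s, min(s + size, total)) for s in range(0, total, size)]
    with ThreadPoolExecutor(max_workers=segments) as pool:
        parts = list(pool.map(lambda b: bresenham_segment(p0, p1, *b), bounds))
    return [pt for part in parts for pt in part]
```

Calling `segmented_line((0, 0, 0), (991, 400, -123), 32)` yields 992 points in 32 chunks of 31 points each, matching the single-pass result of `bresenham_segment` over the full index range. Python threads give no real speedup here; the point is only that the chunks share no state.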

8

Grout, Ian Andrew, and Lenore Mullin. "Realizing Mathematics of Arrays Operations as Custom Architecture Hardware-Software Co-Design Solutions." Information 13, no. 11 (November 4, 2022): 528. http://dx.doi.org/10.3390/info13110528.

Abstract:

In embedded electronic system applications being developed today, complex datasets are required to be obtained, processed, and communicated. These can be from various sources such as environmental sensors, still image cameras, and video cameras. Once obtained and stored in electronic memory, the data is accessed and processed using suitable mathematical algorithms. How the data are stored, accessed, processed, and communicated will impact on the cost to process the data. Such algorithms are traditionally implemented in software programs that run on a suitable processor. However, different approaches can be considered to create the digital system architecture that would consist of the memory, processing, and communications operations. When considering the mathematics at the centre of the design making processes, this leads to system architectures that can be optimized for the required algorithm or algorithms to realize. Mathematics of Arrays (MoA) is a class of operations that supports n-dimensional array computations using array shapes and indexing of values held within the array. In this article, the concept of MoA is considered for realization in software and hardware using Field Programmable Gate Array (FPGA) and Application Specific Integrated Circuit (ASIC) technologies. The realization of MoA algorithms will be developed along with the design choices that would be required to map a MoA algorithm to hardware, software or hardware-software co-designs.

9

Raghunathan, Shriram, Sumeet K. Gupta, Himanshu S. Markandeya, Kaushik Roy, and Pedro P. Irazoqui. "A hardware-algorithm co-design approach to optimize seizure detection algorithms for implantable applications." Journal of Neuroscience Methods 193, no. 1 (October 2010): 106–17. http://dx.doi.org/10.1016/j.jneumeth.2010.08.008.

10

Drumond, Mario, Alexandros Daglis, Nooshin Mirzadeh, Dmitrii Ustiugov, Javier Picorel, Babak Falsafi, Boris Grot, and Dionisios Pnevmatikatos. "Algorithm/Architecture Co-Design for Near-Memory Processing." ACM SIGOPS Operating Systems Review 52, no. 1 (August 28, 2018): 109–22. http://dx.doi.org/10.1145/3273982.3273992.

11

ANDO, Kota, Kodai UEYOSHI, Yuka OBA, Kazutoshi HIROSE, Ryota UEMATSU, Takumi KUDO, Masayuki IKEBE, Tetsuya ASAI, Shinya TAKAMAEDA-YAMAZAKI, and Masato MOTOMURA. "Dither NN: Hardware/Algorithm Co-Design for Accurate Quantized Neural Networks." IEICE Transactions on Information and Systems E102.D, no. 12 (December 1, 2019): 2341–53. http://dx.doi.org/10.1587/transinf.2019pap0009.

12

Ghaffari, Sina, Parastoo Soleimani, Kin Fun Li, and David W. Capson. "A Novel Hardware–Software Co-Design and Implementation of the HOG Algorithm." Sensors 20, no. 19 (October 2, 2020): 5655. http://dx.doi.org/10.3390/s20195655.

Abstract:

The histogram of oriented gradients is a commonly used feature extraction algorithm in many applications. Hardware acceleration can boost the speed of this algorithm due to its large number of computations. We propose a hardware–software co-design of the histogram of oriented gradients and the subsequent support vector machine classifier, which can be used to process data from digital image sensors. Our main focus is to minimize the resource usage of the algorithm while maintaining its accuracy and speed. This design and implementation make four contributions. First, we allocate the computationally expensive steps of the algorithm, including gradient calculation, magnitude computation, bin assignment, normalization and classification, to hardware, and the less complex windowing step to software. Second, we introduce a logarithm-based bin assignment. Third, we use parallel computation and a time-sharing protocol to create a histogram in order to achieve the processing of one pixel per clock cycle after the initialization (setup time) of the pipeline, and produce valid results at each clock cycle afterwards. Finally, we use a simplified block normalization logic to reduce hardware resource usage while maintaining accuracy. Our design attains a frame rate of 115 frames per second on a Xilinx® Kintex® Ultrascale™ FPGA while using less hardware resources, and only losing accuracy marginally, in comparison with other existing work.

13

Schumann, Thomas, Herbert Krauß, Yeong Kang Lai, and Yu Fan Lai. "Hardware/Software Co-Design of 2D-to-3D Video Conversion on FPGA." Applied Mechanics and Materials 284-287 (January 2013): 3230–34. http://dx.doi.org/10.4028/www.scientific.net/amm.284-287.3230.

Abstract:

With advances in technology, 3D video technology becomes possible and attractive. However, there are still many pre-recorded 2D videos/images which need to get transferred to 3D. Hence this paper presents a high quality view synthesis algorithm and architecture for 2D-to-3D video conversion. During the process of view synthesis, the monocular depth information together with the intermediate view is synthesized to the left-eye and right-eye view. The proposed view synthesis algorithm consists of two parts: 3D image warping and inpainting (hole filling). 3D image warping transforms a 2D camera image plane to a 3D coordinate plane. However the integer grid points of the reference are warped to irregularly spaced points in the virtual view, resulting in occlusion problems. Thus inpainting is needed to fix the virtual images. The proposed algorithm shows an improved PSNR gain of 0.2~1.5dB. We adopt hardware/software co-design to accomplish the proposed view synthesis algorithm. For this we implemented the image inpainting on a FPGA device and the remaining algorithm in software.

14

Hou, Neng, Xiaohu Yan, and Fazhi He. "A survey on partitioning models, solution algorithms and algorithm parallelization for hardware/software co-design." Design Automation for Embedded Systems 23, no. 1-2 (April 30, 2019): 57–77. http://dx.doi.org/10.1007/s10617-019-09220-7.

15

Zhou, Wenqian. "Fast Implementation of Genetic Algorithm Based on Software/Hardware Co-design Method." Journal of Physics: Conference Series 1952, no. 3 (June 1, 2021): 032044. http://dx.doi.org/10.1088/1742-6596/1952/3/032044.

16

Yang, Fu, Liu Xin, and Pei Yuan Guo. "A Multi-Objective Optimization Genetic Algorithm for SOPC Hardware-Software Partitioning." Advanced Materials Research 457-458 (January 2012): 1142–48. http://dx.doi.org/10.4028/www.scientific.net/amr.457-458.1142.

Abstract:

Hardware-software partitioning is the key technology in hardware-software co-design; the results will determine the design of system directly. Genetic algorithm is a classical search algorithm for solving such combinatorial optimization problem. A Multi-objective genetic algorithm for hardware-software partitioning is presented in this paper. This method can give consideration to both system performance and indicators such as time, power, area and cost, and achieve multi-objective optimization in system on programmable chip (SOPC). Simulation results show that the method can solve the SOPC hardware-software partitioning problem effectively.

17

JOHNSTON, S. P., G. PRASAD, L. MAGUIRE, and T. M. MCGINNITY. "AN FPGA HARDWARE/SOFTWARE CO-DESIGN TOWARDS EVOLVABLE SPIKING NEURAL NETWORKS FOR ROBOTICS APPLICATION." International Journal of Neural Systems 20, no. 06 (December 2010): 447–61. http://dx.doi.org/10.1142/s0129065710002541.

Abstract:

This paper presents an approach that permits the effective hardware realization of a novel Evolvable Spiking Neural Network (ESNN) paradigm on Field Programmable Gate Arrays (FPGAs). The ESNN possesses a hybrid learning algorithm that consists of a Spike Timing Dependent Plasticity (STDP) mechanism fused with a Genetic Algorithm (GA). The design and implementation direction utilizes the latest advancements in FPGA technology to provide a partitioned hardware/software co-design solution. The approach achieves the maximum FPGA flexibility obtainable for the ESNN paradigm. The algorithm was applied as an embedded intelligent system robotic controller to solve an autonomous navigation and obstacle avoidance problem.

18

Dessai, Sanket, and Sandeep G. "Embedded Hardware Circuit and Software Development of USB based Hardware Accelerator." International Journal of Reconfigurable and Embedded Systems (IJRES) 7, no. 1 (May 30, 2018): 21. http://dx.doi.org/10.11591/ijres.v7.i1.pp21-33.

Abstract:

This paper focuses on the design and development of a hardware accelerator that can plug into the Universal Serial Bus (USB) of any modern low-power, low-cost embedded development system to do complex processing in a plug-and-play development environment. Cryptographic algorithms, steganography, and encoding/decoding applications can use co-devices to accelerate performance. This paper presents an implementation of a hardware infrastructure for computing through the USB bus of any small-scale embedded controller board. The execution engine of the accelerator is an FPGA connected to a USB controller, with DDR memory to store user data. FPGAs can run such algorithms faster than low-power microcontrollers. For the implementation, a XILINX ARTIX 7 FPGA is used to offload the algorithm for faster processing. The system also has a Cypress USB interface chip for offloading the data path. The hardware also has DRAM for holding the data to be stored. The design also explores forward-looking features such as an interrupt connection for a faster response path, a shared memory architecture for the handshake mechanism, and GPIO connections for implementing faster interfaces for I/O expansion.

19

Farouk, Yasmeen, and Sherine Rady. "Optimizing MRI Registration using Software/Hardware Co-Design Model on FPGA." International Journal of Innovative Technology and Exploring Engineering 10, no. 2 (December 10, 2020): 128–37. http://dx.doi.org/10.35940/ijitee.b8300.1210220.

Abstract:

The correct localization of brain tissue deformation and determination of the tumor growth relies majorly on the accuracy of the process known by image registration. Poor registration may lead to misclassified diseases and highly affect image-guided surgery and radiation therapies. Voxel-based morphometry (VBM) is an image analytical technique encompassing accurate registration but suffers from intensive time computations, similar to most of image registration techniques. Achieving the compromise between accuracy and computations is a challenging mission. Field programmable gate arrays have fast-evolving and customizable hardware acceleration capabilities that promise to help speed up computational tasks. This paper presents a software/hardware co-design model for accelerating the implementation of the diffeomorphic image registration algorithm ‘DARTEL’ as a part of VBM that analyzes MRI images. An optimized and pipelined hardware architecture is proposed and integrated into the Statistical Parametric Mapping (SPM) software tool that runs the DARTEL. Acceleration of the DARTEL registration algorithm resulted in a speedup factor of 114x on function-level, compared to the CPU with a contribution of 8x faster for the overall performance in the registration process of the SPM. The proposed model is successfully validated for the identification of Alzheimer’s disease based on T1-weighted MRI. A proposed software/hardware co-design model for VBM achieves remarkable acceleration while maintaining classification accuracy and proving proficiency against other CPU and GPU implementations.
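
The gap between the 114x function-level figure and the 8x overall figure is the kind of relationship Amdahl's law describes. The sketch below is a reading aid, not a result from the paper: it assumes the two numbers can be treated as the kernel speedup s and the overall speedup S in S = 1 / ((1 - f) + f / s), and solves for the fraction f of original run time spent in the accelerated function. The abstract does not report f, and the function names are hypothetical.

```python
def amdahl_speedup(fraction, kernel_speedup):
    """Overall speedup when `fraction` of the run time is sped up by `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


def accelerated_fraction(kernel_speedup, overall):
    """Invert Amdahl's law: fraction of run time that must have been accelerated."""
    if kernel_speedup <= 1 or not (1 <= overall <= kernel_speedup):
        raise ValueError("need kernel_speedup > 1 and 1 <= overall <= kernel_speedup")
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / kernel_speedup)


# Assumed inputs taken from the abstract's two headline figures.
f = accelerated_fraction(kernel_speedup=114, overall=8)
print(f"implied accelerated fraction: {f:.3f}")  # about 0.883 under this assumption
```

Under that assumption, roughly 88% of the original registration time would sit in the accelerated function, and the remaining unaccelerated part caps the overall gain near 8x however fast the kernel becomes.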

20

Feng, Xiao Jing, Xi Li, Wang Chao, Xue Hai Zhou, and Jun Neng Zhang. "A Hardware/Software Co-Design Flow for Dynamic Partial Reconfiguration." Advanced Materials Research 433-440 (January 2012): 5172–77. http://dx.doi.org/10.4028/www.scientific.net/amr.433-440.5172.

Abstract:

The strict requirements on both performance and flexibility lead us to apply Dynamic Partial Reconfiguration (DPR) technology in embedded systems. However, existing DPR design flows are still immature, since previous works mainly focus on hardware designs while ignore software designs for DPR. To remedy this weakness, this paper proposes a hardware/software (HW/SW) co-design flow for DPR. The co-design flow aims at accelerating the process of DPR designs, and it merges software and hardware design flows to make them operate in parallel. Besides, in order to validate the effectiveness of our co-design flow, we implement a partial self-reconfigurable prototype system on Xilinx Virtex-5 platform and perform a set of experiments. Experimental results present that the reconfiguration overhead for partial reconfiguration is only 4.66% against global reconfiguration in our prototype. It’s also presented that our prototype can achieve a 23.6 × speedup over software algorithm solutions.

21

Xiao, Hao, Yuxuan Liu, Zhenmin Li, and Guangzhu Liu. "Algorithm-hardware co-design of ultra-high radix based high throughput modular multiplier." IEICE Electronics Express 18, no. 10 (May 25, 2021): 20210135. http://dx.doi.org/10.1587/elex.18.20210135.

22

Yan, Xiaohu, Fazhi He, Neng Hou, and Haojun Ai. "An Efficient Particle Swarm Optimization for Large-Scale Hardware/Software Co-Design System." International Journal of Cooperative Information Systems 27, no. 01 (March 2018): 1741001. http://dx.doi.org/10.1142/s0218843017410015.

Abstract:

In the co-design process of hardware/software (HW/SW) system, especially for large and complicated embedded systems, HW/SW partitioning is a challenging step. Among different heuristic approaches, particle swarm optimization (PSO) has the advantages of simple implementation and computational efficiency, which is suitable for solving large-scale problems. This paper presents a conformity particle swarm optimization with fireworks explosion operation (CPSO-FEO) to solve large-scale HW/SW partitioning. First, the proposed CPSO algorithm simulates the conformist mentality from biology research. The CPSO particles with psychological conformist always try to move toward a secure point and avoid being attacked by natural enemy. In this way, there is a greater possibility to increase population diversity and avoid local optimum in CPSO. Next, to enhance the search accuracy and solution quality, an improved FEO with new initialization strategy is presented and is combined with CPSO algorithm to search a better position for the global best position. This combination can keep both the diversified and intensified searching. At last, the experiments on benchmarks and large-scale HW/SW partitioning demonstrate the efficiency of the proposed algorithm.

23

An, Jianjing, Dezheng Zhang, Ke Xu, and Dong Wang. "An OpenCL-Based FPGA Accelerator for Faster R-CNN." Entropy 24, no. 10 (September 23, 2022): 1346. http://dx.doi.org/10.3390/e24101346.

Abstract:

In recent years, convolutional neural network (CNN)-based object detection algorithms have made breakthroughs, and much of the research corresponds to hardware accelerator designs. Although many previous works have proposed efficient FPGA designs for one-stage detectors such as Yolo, there are still few accelerator designs for faster regions with CNN features (Faster R-CNN) algorithms. Moreover, CNN’s inherently high computational complexity and high memory complexity bring challenges to the design of efficient accelerators. This paper proposes a software-hardware co-design scheme based on OpenCL to implement a Faster R-CNN object detection algorithm on FPGA. First, we design an efficient, deep pipelined FPGA hardware accelerator that can implement Faster R-CNN algorithms for different backbone networks. Then, an optimized hardware-aware software algorithm was proposed, including fixed-point quantization, layer fusion, and a multi-batch Regions of interest (RoIs) detector. Finally, we present an end-to-end design space exploration scheme to comprehensively evaluate the performance and resource utilization of the proposed accelerator. Experimental results show that the proposed design achieves a peak throughput of 846.9 GOP/s at the working frequency of 172 MHz. Compared with the state-of-the-art Faster R-CNN accelerator and the one-stage YOLO accelerator, our method achieves 10× and 2.1× inference throughput improvements, respectively.

24

Sabir, Brahim, Yassine Khazri, Mohamed Moussetad, and Bouzekri Touri. "Hardware and Software Co-Design of Arabic Alphabets Recognition Platform for Blind and Visually Impaired Persons." Open Electrical & Electronic Engineering Journal 11, no. 1 (November 16, 2017): 193–200. http://dx.doi.org/10.2174/1874129001711010193.

Abstract:

Background: Optical character recognition (OCR) is a technique that converts scanned or printed text images into editable text. Many OCR solutions have been proposed and used for Latin and Chinese alphabets. However, not much can be found about OCR for handwritten Arabic script, especially for use by blind and visually impaired persons. This paper is an attempt towards the development of an OCR for Arabic alphabets dedicated to blind and visually impaired persons. Method: The proposed Optical Arabic Alphabets Recognition algorithm includes binarization of the input image, segmentation, feature extraction, and a classification based on neural networks to match the read Arabic alphabets with trained patterns. The proposed algorithm has been developed using Matlab, and the solution was designed to be implemented on a hardware platform and can be customized for mobile phones. Conclusion: The presented method has the benefit that the accuracy of recognition is comparable to other OCR algorithms.

25

Zheng, Xin, Xianghong Hu, Jinglong Zhang, Jian Yang, Shuting Cai, and Xiaoming Xiong. "An Efficient and Low-Power Design of the SM3 Hash Algorithm for IoT." Electronics 8, no. 9 (September 14, 2019): 1033. http://dx.doi.org/10.3390/electronics8091033.

Abstract:

The Internet-of-Things (IoT) has a security problem that has become increasingly significant. New architecture of SM3 which can be implemented in IoT devices is proposed in this paper. The software/hardware co-design approach is put forward to implement the new architecture to achieve high performance and low costs. To facilitate software/hardware co-design, an AHB-SM3 interface controller (AHB-SIC) is designed as an AHB slave interface IP to exchange data with the embedded CPU. Task scheduling and hardware resource optimization techniques are adopted in the design of expansion modules. The task scheduling and critical path optimization techniques are utilized in the compression module design. The proposed architecture is implemented with ASIC using SMIC 130 nm technology. For the purpose of comparison, the proposed architecture is also implemented on Virtex 7 FPGA with a 36 MHz system clock. Compared with the standard implementation of SM3, the proposed architecture saves the number of registers for approximately 3.11 times, and 263 Mbps throughput is achieved under the 36 MHz clock. This design signifies an excellent trade-off between performance and the hardware area. Thus, the design accommodates the resource-limited IoT security devices very well. The proposed architecture is applied to an intelligent security gateway device.

26

Howard, Neil J., Andrew M. Tyrrell, and Nigel M. Allinson. "The Use of Field-Programmable Gate Arrays for the Hardware Acceleration of Design Automation Tasks." VLSI Design 4, no. 2 (January 1, 1996): 135–39. http://dx.doi.org/10.1155/1996/17505.

Abstract:

This paper investigates the possibility of using Field-Programmable Gate Arrays (FPGAs) as reconfigurable co-processors for workstations to produce moderate speedups for most tasks in the design process, resulting in a worthwhile overall design process speedup at low cost and allowing algorithm upgrades with no hardware modification. The use of FPGAs as hardware accelerators is reviewed and then achievable speedups are predicted for logic simulation and VLSI design rule checking tasks for various FPGA co-processor arrangements.

27

Zhou, Zhen, Debiao He, Zhe Liu, Min Luo, and Kim-Kwang Raymond Choo. "A Software/Hardware Co-Design of Crystals-Dilithium Signature Scheme." ACM Transactions on Reconfigurable Technology and Systems 14, no. 2 (June 5, 2021): 1–21. http://dx.doi.org/10.1145/3447812.

Abstract:

As quantum computers become more affordable and commonplace, existing security systems that are based on classical cryptographic primitives, such as RSA and Elliptic Curve Cryptography (ECC), will no longer be secure. Hence, there has been interest in designing post-quantum cryptographic (PQC) schemes, such as those based on lattice-based cryptography (LBC). The potential of LBC schemes is evidenced by the number of such schemes passing the selection of NIST PQC Standardization Process Round-3. One such scheme is the Crystals-Dilithium signature scheme, which is based on the hard module-lattice problem. However, there is no efficient implementation of the Crystals-Dilithium signature scheme. Hence, in this article, we present a compact hardware architecture containing elaborate modular multiplication units using the Karatsuba algorithm along with smart generators of address sequence and twiddle factors for NTT, which can complete polynomial addition/multiplication with the parameter setting of Dilithium in a short clock period. Also, we propose a fast software/hardware co-design implementation on Field Programmable Gate Array (FPGA) for the Dilithium scheme with a tradeoff between speed and resource utilization. Our co-design implementation outperforms a pure C implementation on a Nios-II processor of the platform Altera DE2-115, in the sense that our implementation is 11.2 and 7.4 times faster for signature and verification, respectively. In addition, we also achieve approximately 51% and 31% speed improvement for signature and verification, in comparison to the pure C implementation on processor ARM Cortex-A9 of ZYNQ-7020 platform.

28

Yusuf, Yusmardiah, Darmawaty Mohd Ali, and Norsuzila Ya’acob. "Hardware simulation for exponential blind equal throughput algorithm using system generator." International Journal of Electrical and Computer Engineering (IJECE) 9, no. 1 (February 1, 2019): 170. http://dx.doi.org/10.11591/ijece.v9i1.pp170-180.

Abstract:

Scheduling is the process of allocating radio resources to User Equipment (UE) that transmits different flows at the same time. It is performed by the scheduling algorithm implemented in the Long Term Evolution base station, the Evolved Node B. Most of the proposed algorithms do not handle real-time and non-real-time traffic simultaneously. Thus, a UE with bad channel quality may starve because no resources are allocated to it for a long time. To solve this problem, the Exponential Blind Equal Throughput (EXP-BET) algorithm is proposed. Resources are allocated first to the user with the highest priority metric, which is calculated using the EXP-BET metric equation. This study investigates the implementation of the EXP-BET scheduling algorithm on the FPGA platform. The metric equation of the EXP-BET is modelled and simulated using System Generator. The design uses only 10% of the available resources on the FPGA. Fixed numbers are used for all the inputs to the scheduler. The system is verified by running hardware co-simulation for the metric value of the EXP-BET algorithm. The output from the hardware co-simulation showed that the metric values of EXP-BET match the results from the Simulink environment. Thus, the algorithm is ready for prototyping, and the Virtex-6 FPGA is chosen as the platform.

29

Dang, Tuan Linh, and Yukinobu Hoshino. "Hardware/Software Co-design for a Neural Network Trained by Particle Swarm Optimization Algorithm." Neural Processing Letters 49, no. 2 (March 30, 2018): 481–505. http://dx.doi.org/10.1007/s11063-018-9826-4.

30

Ranjith, C., and S. P. Joy Vasantha Rani. "A Fast On-Chip Adaptive Genetic Algorithm Processor for Evolutionary FIR Filter Implementation Using Hardware–Software Co-Design." Journal of Circuits, Systems and Computers 29, no. 01 (April 4, 2019): 2050014. http://dx.doi.org/10.1142/s0218126620500140.

Abstract:

Recent studies show the impact of genetic algorithms (GA) in the design of evolutionary finite impulse response (FIR) filters. Studies have shown hardware and software method of GA implementation for design. Hardware method improves speed due to parallelism, pipelining and the absence of the function calls compared to software implementation. But area constraint was the main issue of hardware implementation. Therefore, this paper illustrates a hardware–software co-design concept to implement an Adaptive GA processor (AGAP) for FIR filter design. The architecture of AGAP uses adaptive crossover and mutation probabilities to speed up the convergence of the GA process. The AGAP architecture was implemented using Verilog Hardware Description Language (HDL) and instantiated as a custom intellectual property (IP) core to the soft-core MicroBlaze processor of Spartan 6 (XC6SLX45-3CSG324I) FPGA. The MicroBlaze processor controls the AGAP IP core and other interfaces using Embedded C programs. The experiment demonstrated a significant 134% improvement in speed over hardware implementation but with a marginal increase in area. The complete evaluation and evolution of the filter coefficients were executed on a single FPGA. The system on chip (SoC) concept enables a robust and flexible system.

31

Gan, Jiayan, Ang Hu, Ziyi Kang, Zhipeng Qu, Zhanxiang Yang, Rui Yang, Yibing Wang, Huaizong Shao, and Jun Zhou. "SAS-SEINet: A SNR-Aware Adaptive Scalable SEI Neural Network Accelerator Using Algorithm–Hardware Co-Design for High-Accuracy and Power-Efficient UAV Surveillance." Sensors 22, no. 17 (August 30, 2022): 6532. http://dx.doi.org/10.3390/s22176532.

Abstract:

As a potential air control measure, RF-based surveillance is one of the most commonly used unmanned aerial vehicles (UAV) surveillance methods that exploits specific emitter identification (SEI) technology to identify captured RF signal from ground controllers to UAVs. Recently many SEI algorithms based on deep convolution neural network (DCNN) have emerged. However, there is a lack of the implementation of specific hardware. This paper proposes a high-accuracy and power-efficient hardware accelerator using an algorithm–hardware co-design for UAV surveillance. For the algorithm, we propose a scalable SEI neural network with SNR-aware adaptive precision computation. With SNR awareness and precision reconfiguration, it can adaptively switch between DCNN and binary DCNN to cope with low SNR and high SNR tasks, respectively. In addition, a short-time Fourier transform (STFT) reusing DCNN method is proposed to pre-extract feature of UAV signal. For hardware, we designed a SNR sensing engine, denoising engine, and specialized DCNN engine with hybrid-precision convolution and memory access, aiming at SEI acceleration. Finally, we validate the effectiveness of our design on a FPGA, using a public UAV dataset. Compared with a state-of-the-art algorithm, our method can achieve the highest accuracy of 99.3% and an F1 score of 99.3%. Compared with other hardware designs, our accelerator can achieve the highest power efficiency of 40.12 Gops/W and 96.52 Gops/W with INT16 precision and binary precision.

32

WEI, WENLONG, BIN LI, YI ZOU, WENCONG ZHANG, and ZHENQUAN ZHUANG. "A MULTI-OBJECTIVE HW–SW CO-SYNTHESIS ALGORITHM BASED ON QUANTUM-INSPIRED EVOLUTIONARY ALGORITHM." International Journal of Computational Intelligence and Applications 07, no. 02 (June 2008): 129–48. http://dx.doi.org/10.1142/s146902680800220x.

Abstract:

Hardware–Software (HW–SW) co-synthesis is one of the key steps in modern embedded system design. Generally, HW–SW co-synthesis is to optimally allocate processors, assign tasks to processors, and schedule the processing of tasks to achieve a good balance among performance, cost, power consumption, etc. Hence, it is a typical multi-objective optimization problem. In this paper, a new multi-objective HW–SW co-synthesis algorithm based on the quantum-inspired evolutionary algorithm (MQEAC) is proposed. MQEAC utilizes multiple quantum probability amplitude vectors to model the promising areas of solution space. Meanwhile, this paper presents a new crossover operator to accelerate the convergence to the Pareto front and introduces a PE slot-filling strategy to improve the efficiency of scheduling. Experimental results show that the proposed algorithm can solve the typical multi-objective co-synthesis problems effectively and efficiently.

33

Khoud, Khaled Ben, Soufiene Bouallègue, and Mounir Ayadi. "Design and co-simulation of a fuzzy gain-scheduled PID controller based on particle swarm optimization algorithms for a quad tilt wing unmanned aerial vehicle." Transactions of the Institute of Measurement and Control 40, no. 14 (January 8, 2018): 3933–52. http://dx.doi.org/10.1177/0142331217740947.

Abstract:

This paper deals with the systematic design and hardware co-simulation of a fuzzy gain-scheduled proportional–integral–derivative (GS-PID) controller for a quad tilt wing (QTW) type of unmanned aerial vehicles (UAVs) based on different variants of the particle swarm optimization (PSO) algorithm. The fuzzy PID gains scheduling problem for the stabilization of the roll, pitch and yaw dynamics of the QTW vehicle is formulated as a constrained optimization problem and solved thanks to improved PSO algorithms. PSO algorithms with variable inertia weight (PSO-In), PSO with constriction factor (PSO-Co) and PSO with possibility updating strategies (PSO-gbest) are proposed. Such variants of the PSO algorithm aim further to improve the exploration and exploitation capabilities of such a stochastic algorithm as well as its convergence fastness. The robustness of the designed PSO-based fuzzy GS-PID controllers under actuators faults is shown on the non-linear model of the QTW. All optimized fuzzy GS-PID controllers are then co-simulated within a processor-in-the-loop (PIL) framework based on an embedded NI myRIO-1900 board and a host PC. Such a proposed software (SW) and hardware (HW) computer aided design (CAD) platform is based on the Control Design and Simulation (CDSim) module of the LabVIEW environment as well as a set-up Network Streams-based data communication protocol. Demonstrative simulation results are presented, compared and discussed in order to improve the effectiveness of the proposed PSO-based fuzzy gains scheduled PID controllers for the QTW’s attitude flight stabilization.
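
The abstract mentions PSO variants with an inertia weight and with a constriction factor but does not give their equations. As a rough sketch, the textbook forms of the two velocity updates are shown below for a single scalar dimension; the function names, default coefficients, and the choice of the Clerc-Kennedy constriction formula are assumptions, not details taken from the paper.

```python
import math


def constriction_factor(c1: float = 2.05, c2: float = 2.05) -> float:
    """Clerc-Kennedy constriction coefficient; requires c1 + c2 > 4."""
    phi = c1 + c2
    if phi <= 4.0:
        raise ValueError("c1 + c2 must exceed 4 for the constriction formula")
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))


def velocity_inertia(v, x, pbest, gbest, r1, r2, w=0.7, c1=2.0, c2=2.0):
    """Inertia-weight update: w scales the previous velocity.

    A variable-inertia scheme would pass a different w at each iteration.
    """
    return w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)


def velocity_constriction(v, x, pbest, gbest, r1, r2, c1=2.05, c2=2.05):
    """Constriction update: the whole velocity expression is scaled by chi."""
    chi = constriction_factor(c1, c2)
    return chi * (v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x))
```

Here `r1` and `r2` stand for the uniform random numbers in [0, 1) that a full optimizer would draw per particle and per dimension; they are passed in explicitly so that the update stays deterministic and easy to check.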

34

ISSAD, M., B. BOUDRAA, M. ANANE, and N. ANANE. "SOFTWARE/HARDWARE CO-DESIGN OF MODULAR EXPONENTIATION FOR EFFICIENT RSA CRYPTOSYSTEM." Journal of Circuits, Systems and Computers 23, no. 03 (March 2014): 1450032. http://dx.doi.org/10.1142/s0218126614500327.

Abstract:

This paper presents an implementation of Rivest, Shamir and Adleman (RSA) cryptosystem based on hardware/software (HW/SW) co-design. The main operation of RSA is the modular exponentiation (ME) which is performed by repeated modular multiplications (MMs). In this work, the right-to-left (R2L) algorithm is used for the implementation of the ME as a programmable system on chip (PSoC). The processor MicroBlaze of Xilinx is used for flexibility. The R2L method is often suggested to improve the timing performance, since it is based on parallel computations of MMs. However, if the optimization of HW resources is a constraint, this method can be executed sequentially using a single modular multiplier as a custom intellectual property (IP). Consequently, the execution time of the ME becomes dependent of three factors, namely the capability of the custom IP to perform the MMs, the nonzero bit string of the exponent and the communication link between the processor and the custom IP. In order to achieve the best trade-off between area, speed and flexibility, we propose three implementations in this work. The first one is a pure software solution. The second one takes benefit of a HW accelerator dedicated to the MM execution. The last one is based on a dual strategy. Two parallel MMs are implemented within a custom IP and local memories are used close to the arithmetic units to minimize the communication link influence. The results show that in the application to RSA 1024-bits, the ME runs in 22,25 ms, while using only 1,848 slices.
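
The right-to-left (R2L) method described in the abstract can be sketched in software as follows. This is a generic illustration of binary R2L modular exponentiation rather than the authors' PSoC implementation, and the function name `modexp_r2l` is an assumption.

```python
def modexp_r2l(base: int, exponent: int, modulus: int) -> int:
    """Compute (base ** exponent) % modulus, scanning exponent bits LSB first."""
    if modulus == 1:
        return 0
    result = 1
    square = base % modulus
    while exponent > 0:
        # Both multiplications below read the same old value of `square`,
        # so they are independent and could run on two parallel multipliers.
        if exponent & 1:
            result = (result * square) % modulus
        square = (square * square) % modulus
        exponent >>= 1
    return result
```

For instance, `modexp_r2l(4, 13, 497)` returns 445. With a single multiplier the two products run one after the other, which is why the run time then depends on the number of nonzero exponent bits.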

35

El-MALAKI, M. H., M. WATHEQ El-KHARASHI, S. HAMMAD, A. SALEM, and A. WAHDAN. "A PLATFORM APPROACH FOR HARDWARE/SOFTWARE CO-DESIGN WITH SUPPORT FOR RTOS-BASED SYSTEMS." Journal of Circuits, Systems and Computers 16, no. 06 (December 2007): 961–79. http://dx.doi.org/10.1142/s0218126607004015.

Abstract:

We propose a new flow for hardware/software co-design, based on the platform-based design, which forms a base for further automation attempts of the co-design process. We prove the applicability of the proposed flow on co-designing generic systems as well as RTOS-based systems. Our proposed flow starts with a software-only solution in which all system functionality is described as embedded software targeting a selected platform. Then, the flow iterates through co-verification, profiling, partitioning, and co-synthesis until the design criteria are met. We present four test cases to show the effectiveness of our proposed methodology. The main contribution added by the proposed methodology is incorporating the target application platform at the first stage of the flow then applying our iterative co-design algorithm without altering the main platform. This opposes other co-design methodologies that let the platform details be synthesized at later stages, widening the exploration space to be unrealistic and producing platforms that may vary to a large extent compared to the pre-verified application platform. The other contribution is the study provided on the effect of co-design on the behavior of RTOS-based platforms, which brings the flow closer to real-case problems, where most embedded systems utilize RTOS in their software stack.

36

Mekala, Priyanka, Jeffrey Fan, Wen-Cheng Lai, and Ching-Wen Hsue. "Gesture Recognition Using Neural Networks Based on HW/SW Cosimulation Platform." Advances in Software Engineering 2013 (February 24, 2013): 1–13. http://dx.doi.org/10.1155/2013/707248.

Abstract:

Hardware/software (HW/SW) cosimulation integrates software simulation and hardware simulation simultaneously. Usually, HW/SW co-simulation platform is used to ease debugging and verification for very large-scale integration (VLSI) design. To accelerate the computation of the gesture recognition technique, an HW/SW implementation using field programmable gate array (FPGA) technology is presented in this paper. The major contributions of this work are: (1) a novel design of memory controller in the Verilog Hardware Description Language (Verilog HDL) to reduce memory consumption and load on the processor. (2) The testing part of the neural network algorithm is being hardwired to improve the speed and performance. The American Sign Language gesture recognition is chosen to verify the performance of the approach. Several experiments were carried out on four databases of the gestures (alphabet signs A to Z). (3) The major benefit of this design is that it takes only few milliseconds to recognize the hand gesture which makes it computationally more efficient.

37

Chen, Yi-Jung, Chia-Lin Yang, and Yen-Sheng Chang. "An architectural co-synthesis algorithm for energy-aware Network-on-Chip design." Journal of Systems Architecture 55, no. 5-6 (May 2009): 299–309. http://dx.doi.org/10.1016/j.sysarc.2009.02.002.

38

XIAO, Hao, Yanming FAN, Fen GE, Zhang ZHANG, and Xin CHENG. "Algorithm-Hardware Co-Design of Real-Time Edge Detection for Deep-Space Autonomous Optical Navigation." IEICE Transactions on Information and Systems E103.D, no. 10 (October 1, 2020): 2047–58. http://dx.doi.org/10.1587/transinf.2020pcp0002.

39

Lee, Jinsu, Sanghoon Kang, Jinmook Lee, Dongjoo Shin, Donghyeon Han, and Hoi-Jun Yoo. "The Hardware and Algorithm Co-Design for Energy-Efficient DNN Processor on Edge/Mobile Devices." IEEE Transactions on Circuits and Systems I: Regular Papers 67, no. 10 (October 2020): 3458–70. http://dx.doi.org/10.1109/tcsi.2020.3021397.

40

Migliore, Vincent, Maria Mendez Real, Vianney Lapotre, Arnaud Tisserand, Caroline Fontaine, and Guy Gogniat. "Hardware/Software Co-Design of an Accelerator for FV Homomorphic Encryption Scheme Using Karatsuba Algorithm." IEEE Transactions on Computers 67, no. 3 (March 1, 2018): 335–47. http://dx.doi.org/10.1109/tc.2016.2645204.

41

Niu, Wen Liang, Wen Zheng Li, and Kai Shuang Yin. "Application of DFG Model on SOPC Technology." Applied Mechanics and Materials 198-199 (September 2012): 696–700. http://dx.doi.org/10.4028/www.scientific.net/amm.198-199.696.

Abstract:

HW/SW (hardware/software) co-design method based on analysis and optimization of DFG (data flow graphic) model is introduced for SOPC (System on a Programmable Chip) used for digital instrument design in this paper. The method is based on the DFG model of the digital signal process algorithm and implemented with SOPC technology. The DFG model could help designer to divide the function into hardware and software respectively, therefore, the optimizing analysis at system level and circuit level of a SOPC used for portable logic analyzer shows that the DFG model is very useful for not only optimizing architecture and power consumption, but also HW/SW co-design.

42

Javed, Hassan, Muhammad Bilal, and Shahid Masud. "A Hardware–Software Co-Design Framework for Real-Time Video Stabilization." Journal of Circuits, Systems and Computers 29, no. 02 (May 3, 2019): 2050027. http://dx.doi.org/10.1142/s0218126620500279.

Abstract:

Live digital video is a valuable source of information in security, broadcast and industrial quality control applications. Motion jitter due to camera and platform instability is a common artefact found in captured video which renders it less effective for subsequent computer vision tasks such as detection and tracking of objects, background modeling, mosaicking, etc. The process of algorithmically compensating for the motion jitter is hence a mandatory pre-processing step in many applications. This process, called video stabilization, requires estimation of global motion from consecutive video frames and is constrained by additional challenges such as preservation of intentional motion and native frame resolution. The problem is exacerbated in the presence of local motion of foreground objects and requires robust compensation of the same. As such, achieving real-time performance for this computationally intensive operation is a difficult task for embedded processors with limited computational and memory resources. In this work, development of an optimized hardware–software co-design framework for video stabilization has been investigated. Efficient video stabilization depends on the identification of key points in the frame which in turn requires dense feature calculation at the pixel level. This task has been identified to be most suitable for offloading to the pipelined hardware implemented in the FPGA fabric due to the involvement of complex memory and computation operations. Subsequent tasks to be performed for the overall stabilization algorithm utilize these sparse key points and have been found to be efficiently handled in the software. The proposed Hardware–Software (HW–SW) co-design framework has been implemented on the ZedBoard FPGA platform, which houses a Xilinx Zynq SoC equipped with an ARM A9 processor. The proposed implementation scheme can process real-time video stream input at 28 frames per second and is at least twice as fast as the corresponding software-only approach. Two different hardware accelerator designs have been implemented using different high-level synthesis tools following the rapid prototyping principle and consume less than 50% of the logic resources available on the host FPGA while being at least 30% faster than contemporary designs.

43

Fasfous, Nael, Manoj Rohit Vemparala, Alexander Frickenstein, Emanuele Valpreda, Driton Salihu, Nguyen Anh Vu Doan, Christian Unger, Naveen Shankar Nagaraja, Maurizio Martina, and Walter Stechele. "HW-FlowQ: A Multi-Abstraction Level HW-CNN Co-design Quantization Methodology." ACM Transactions on Embedded Computing Systems 20, no. 5s (October 31, 2021): 1–25. http://dx.doi.org/10.1145/3476997.

Abstract:

Model compression through quantization is commonly applied to convolutional neural networks (CNNs) deployed on compute and memory-constrained embedded platforms. Different layers of the CNN can have varying degrees of numerical precision for both weights and activations, resulting in a large search space. Together with the hardware (HW) design space, the challenge of finding the globally optimal HW-CNN combination for a given application becomes daunting. To this end, we propose HW-FlowQ, a systematic approach that enables the co-design of the target hardware platform and the compressed CNN model through quantization. The search space is viewed at three levels of abstraction, allowing for an iterative approach for narrowing down the solution space before reaching a high-fidelity CNN hardware modeling tool, capable of capturing the effects of mixed-precision quantization strategies on different hardware architectures (processing unit counts, memory levels, cost models, dataflows) and two types of computation engines (bit-parallel vectorized, bit-serial). To combine both worlds, a multi-objective non-dominated sorting genetic algorithm (NSGA-II) is leveraged to establish a Pareto-optimal set of quantization strategies for the target HW-metrics at each abstraction level. HW-FlowQ detects optima in a discrete search space and maximizes the task-related accuracy of the underlying CNN while minimizing hardware-related costs. The Pareto-front approach keeps the design space open to a range of non-dominated solutions before refining the design to a more detailed level of abstraction. With equivalent prediction accuracy, we improve the energy and latency by 20% and 45% respectively for ResNet56 compared to existing mixed-precision search methods.
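
The Pareto-front idea in the abstract can be illustrated with a small non-dominated filter. This sketch covers only the first-rank selection step, not the full NSGA-II algorithm or the HW-FlowQ tool; the function names are assumptions, and each candidate is a hypothetical tuple of costs to be minimized, for example (error rate, energy).

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every cost and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the candidates that no other candidate dominates (all costs minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

With the hypothetical candidates `[(1.0, 9.0), (2.0, 7.0), (3.0, 8.0), (4.0, 3.0), (5.0, 5.0)]`, the filter keeps `(1.0, 9.0)`, `(2.0, 7.0)` and `(4.0, 3.0)`; the other two are each beaten on both costs by another candidate. The pairwise scan is quadratic in the number of candidates, which is acceptable for a sketch but not for large populations.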

44

Ravi, Aadithya, Easwara E. A. Moorthy, D. Vidya, and G. Mahesh Kumar. "Hybrid Reconfigurable PC Add-on Card for Parallel Image Processing." Applied Mechanics and Materials 110-116 (October 2011): 5057–62. http://dx.doi.org/10.4028/www.scientific.net/amm.110-116.5057.

Abstract:

Specific hardware solutions are always faster than programmable architectures. But dedicated architectures have the inherent disadvantage of inflexibility. Changes in the algorithm or extensions of the application are handled easily by programmable architectures. The approach discussed here involves a hardware-software co-design to optimize on performance and programmability. The architecture houses two SHARC processors to aid in parallelizing the image processing algorithms, and a reconfigurable FPGA which may be configured on the fly to execute any of the real-time algorithms as desired. The functional memory would consist of pre-designs (FPGA based) of certain objects, each of which could be used to configure an FPGA to perform a particular function.

45

Zhang, Jun An, Ya Hong Guo, and Guo Min Mo. "A Software Hardware Co-Design Approach for FPGAs on Nios II Soft-Core Processors." Applied Mechanics and Materials 373-375 (August 2013): 1591–94. http://dx.doi.org/10.4028/www.scientific.net/amm.373-375.1591.

Abstract:

In order to prove the applicability of the design approach with complex System-on-Chip (SoC), equipments for real-time electrocardiographic (ECG) signal generator and corresponding algorithm have been implemented in this study. The study mainly focused on completing a SoC design which constructs a customizable system via user interface to an FPGA Chip in accordance with the need of a specific application. In the proposed design flow the architecture of the generated hardware is tailored to match the communication structure of the application. This allows the developer to meet the system's performance, size and power consumption requirements with short time to market. The feature-rich multimedia products can meet market expectations of high performance at low cost and lower energy consumption.

46

Memon, Farida, Aamir Hussain Memon, Shahnawaz Talpur, Fayaz Ahmed Memon, and Rafia Naz Memon. "Design and Co-Simulation of Depth Estimation Using Simulink HDL Coder and Modelsim." July 2016 35, no. 3 (July 1, 2016): 473–82. http://dx.doi.org/10.22581/muet1982.1603.17.

Abstract:

In this paper, a novel VHDL design procedure for a depth estimation algorithm using HDL (Hardware Description Language) Coder is presented. A framework is developed that takes a depth estimation algorithm described in MATLAB as input and generates VHDL code, which dramatically decreases the time required to implement an application on FPGAs (Field Programmable Gate Arrays). In the first phase, the design is carried out in MATLAB. Using HDL Coder, the MATLAB floating-point design is converted to an efficient fixed-point design, and VHDL code and a test bench are generated from the fixed-point MATLAB code. Further, the generated VHDL code of the design is verified with co-simulation using Mentor Graphics ModelSim 10.3d software. Simulation results are presented which indicate that the VHDL simulations match the MATLAB simulations and confirm the efficiency of the presented methodology.
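
The floating-point to fixed-point conversion step mentioned in the abstract can be illustrated with a generic signed fixed-point quantizer. This is a sketch of the general technique only and does not reproduce what HDL Coder does internally; the names `to_fixed` and `from_fixed` and the default 16-bit word with 8 fractional bits are assumptions.

```python
def to_fixed(value: float, frac_bits: int = 8, word_bits: int = 16) -> int:
    """Quantize a float to a signed fixed-point integer, saturating on overflow."""
    scaled = round(value * (1 << frac_bits))
    lowest = -(1 << (word_bits - 1))
    highest = (1 << (word_bits - 1)) - 1
    # Clamp to the representable two's-complement range.
    return max(lowest, min(highest, scaled))


def from_fixed(raw: int, frac_bits: int = 8) -> float:
    """Convert a fixed-point integer back to the float it represents."""
    return raw / (1 << frac_bits)
```

With the assumed defaults, `to_fixed(1.5)` gives 384 and `from_fixed(384)` gives 1.5 again, while inputs beyond the representable range saturate at 32767 or -32768. Comparing `from_fixed(to_fixed(x))` against `x` over representative inputs is one simple way to estimate the quantization error before committing to a word length.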

47

Li, Guihong, Sumit K. Mandal, Umit Y. Ogras, and Radu Marculescu. "FLASH: Fast Neural Architecture Search with Hardware Optimization." ACM Transactions on Embedded Computing Systems 20, no. 5s (October 31, 2021): 1–26. http://dx.doi.org/10.1145/3476994.

Abstract:

Neural architecture search (NAS) is a promising technique to design efficient and high-performance deep neural networks (DNNs). As the performance requirements of ML applications grow continuously, the hardware accelerators start playing a central role in DNN design. This trend makes NAS even more complicated and time-consuming for most real applications. This paper proposes FLASH, a very fast NAS methodology that co-optimizes the DNN accuracy and performance on a real hardware platform. As the main theoretical contribution, we first propose the NN-Degree, an analytical metric to quantify the topological characteristics of DNNs with skip connections (e.g., DenseNets, ResNets, Wide-ResNets, and MobileNets). The newly proposed NN-Degree allows us to do training-free NAS within one second and build an accuracy predictor by training as few as 25 samples out of a vast search space with more than 63 billion configurations. Second, by performing inference on the target hardware, we fine-tune and validate our analytical models to estimate the latency, area, and energy consumption of various DNN architectures while executing standard ML datasets. Third, we construct a hierarchical algorithm based on simplicial homology global optimization (SHGO) to optimize the model-architecture co-design process, while considering the area, latency, and energy consumption of the target hardware. We demonstrate that, compared to the state-of-the-art NAS approaches, our proposed hierarchical SHGO-based algorithm enables more than four orders of magnitude speedup (specifically, the execution time of the proposed algorithm is about 0.1 seconds). Finally, our experimental evaluations show that FLASH is easily transferable to different hardware architectures, thus enabling us to do NAS on a Raspberry Pi-3B processor in less than 3 seconds.

48

Rojas-Muñoz, Luis Felipe, Horacio Rostro-González, Carlos Hugo García-Capulín, and Santiago Sánchez-Solano. "Hardware/Software Co-Design of a Circle Detection System Based on Evolutionary Computing." Electronics 11, no. 17 (August 27, 2022): 2686. http://dx.doi.org/10.3390/electronics11172686.

Abstract:

In recent years, the strategy of co-designing Hardware/Software (HW/SW) systems has been widely adopted to exploit the synergy between both approaches thanks to technological advances that have led to more powerful devices providing an increasingly better cost–benefit trade-off. This paper presents an HW/SW system for the detection of multiple circles in digital images based on a genetic algorithm. It is implemented on an Ultra96-v2 development board, which contains a Xilinx Zynq UltraScale+ MPSoC device and supports a Linux operating system that facilitates application development. The design is powered by developing an interactive computing environment by means of the Jupyter Notebook platform, in which different programming languages coexist. The specific advantages of each of these languages have been used to describe the hardware component that accelerates the evolutionary computation for circle detection (VHDL), to execute SW-HW interaction functions, as well as the pre- and post-processing of the images (ANSI-C) and to code, evaluate, and document the system execution process (Python). As a result, a computationally efficient application was obtained, with high accuracy in the detection of circles in synthetic and real images, and with a high degree of reconfigurability that provides the user with the necessary tools to incorporate it in a specific area of interest.

49

Al-Musawi, Wisal Adnan, Wasan A. Wali, and Mohammed Abd Ali Al-Ibadi. "New artificial neural network design for Chua chaotic system prediction using FPGA hardware co-simulation." International Journal of Electrical and Computer Engineering (IJECE) 12, no. 2 (April 1, 2022): 1955. http://dx.doi.org/10.11591/ijece.v12i2.pp1955-1964.

Abstract:

This study aims to design a new architecture of artificial neural networks (ANNs) using the Xilinx System Generator (XSG) and its hardware co-simulation equivalent model using a field programmable gate array (FPGA) to predict the behavior of Chua’s chaotic system and use it in hiding information. The proposed work consists of two main sections. In the first section, MATLAB R2016a was used to build a 3×4×3 feed forward neural network (FFNN). The training results demonstrate that FFNN training with the Bayesian regularization algorithm is sufficiently accurate to implement directly. The second section demonstrates the hardware implementation of the network with the XSG on the Xilinx Artix-7 xc7a100t-1csg324 chip. Finally, the message was first encrypted using a dynamic Chua system and then decrypted using the ANN’s chaotic dynamics. ANN models were developed for hardware implementation in the FPGA system using the IEEE 754 single-precision floating-point format. The ANN design method illustrated can be extended to other chaotic systems in general.

50

Adiono, Trio, Aditya F. Ardyanto, Nur Ahmadi, Idham Hafizh, and Septian G. P. Putra. "An SoC Architecture for Real-Time Noise Cancellation System Using Variable Speech PDF Method." International Journal of Electrical and Computer Engineering (IJECE) 5, no. 6 (December 1, 2015): 1336. http://dx.doi.org/10.11591/ijece.v5i6.pp1336-1346.

Abstract:

This paper presents the architecture and implementation of a system-on-chip (SoC) for a real-time noise cancellation system which exploits a variable speech probability density function (PDF) and the maximum a posteriori (MAP) estimation rule as the noise cancelling algorithm. A hardware/software co-design approach is employed to achieve real-time performance while considering ease of implementation and design flexibility. The software module uses a LEON SPARC-v8 processor and an FPU co-processor as the processing unit. The AMBA-based Hanning filter and FFT/IFFT are used as processing accelerator modules to increase system performance. The FFT/IFFT module employs a custom Radix-2^2 Single Delay Feedback (R2^2SDF) architecture. In order to deliver a high data transfer rate between the buffer and the hardware accelerators, a DMA controller is incorporated. The overall system implementation uses 18,500 logic elements and consumes 21.87 kB of memory. The system has a latency of only 0.69 ms, which is appropriate for real-time applications. An Altera DE2-70 FPGA board is used for prototyping, and both the algorithms and the noise cancellation function have been verified.
