Journal articles on the topic 'Hardware-Aware Algorithm design'

Consult the top 50 journal articles for your research on the topic 'Hardware-Aware Algorithm design.'

1

An, Jianjing, Dezheng Zhang, Ke Xu, and Dong Wang. "An OpenCL-Based FPGA Accelerator for Faster R-CNN." Entropy 24, no. 10 (September 23, 2022): 1346. http://dx.doi.org/10.3390/e24101346.

Abstract:
In recent years, convolutional neural network (CNN)-based object detection algorithms have made breakthroughs, and much of the research corresponds to hardware accelerator designs. Although many previous works have proposed efficient FPGA designs for one-stage detectors such as Yolo, there are still few accelerator designs for faster regions with CNN features (Faster R-CNN) algorithms. Moreover, CNN’s inherently high computational complexity and high memory complexity bring challenges to the design of efficient accelerators. This paper proposes a software-hardware co-design scheme based on OpenCL to implement a Faster R-CNN object detection algorithm on FPGA. First, we design an efficient, deep pipelined FPGA hardware accelerator that can implement Faster R-CNN algorithms for different backbone networks. Then, an optimized hardware-aware software algorithm was proposed, including fixed-point quantization, layer fusion, and a multi-batch Regions of interest (RoIs) detector. Finally, we present an end-to-end design space exploration scheme to comprehensively evaluate the performance and resource utilization of the proposed accelerator. Experimental results show that the proposed design achieves a peak throughput of 846.9 GOP/s at the working frequency of 172 MHz. Compared with the state-of-the-art Faster R-CNN accelerator and the one-stage YOLO accelerator, our method achieves 10× and 2.1× inference throughput improvements, respectively.
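A minimal sketch of the fixed-point quantization step mentioned above may help the reader; the 8-bit word length, 4 fractional bits, and round-to-nearest mode below are illustrative assumptions, not the configuration reported in the paper (Python):

import numpy as np

def quantize_fixed_point(x, total_bits=8, frac_bits=4):
    # Map float values onto signed fixed-point codes (assumed Q4.4 format).
    scale = 2 ** frac_bits
    qmin, qmax = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    codes = np.clip(np.round(x * scale), qmin, qmax).astype(np.int32)
    return codes, codes.astype(np.float32) / scale  # integer codes, dequantized values

weights = np.array([[0.31, -1.7], [2.05, 0.49]], dtype=np.float32)
codes, approx = quantize_fixed_point(weights)
print(codes)   # the integers the accelerator would store
print(approx)  # the values it effectively computes with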
2

Vo, Quang Hieu, Faaiz Asim, Batyrbek Alimkhanuly, Seunghyun Lee, and Lokwon Kim. "Hardware Platform-Aware Binarized Neural Network Model Optimization." Applied Sciences 12, no. 3 (January 26, 2022): 1296. http://dx.doi.org/10.3390/app12031296.

Abstract:
Deep Neural Networks (DNNs) have shown superior accuracy at the expense of high memory and computation requirements. Optimizing DNN models regarding energy and hardware resource requirements is extremely important for applications with resource-constrained embedded environments. Although using binary neural networks (BNNs), one of the recent promising approaches, significantly reduces the design’s complexity, accuracy degradation is inevitable when reducing the precision of parameters and output activations. To balance between implementation cost and accuracy, in addition to proposing specialized hardware accelerators for corresponding specific network models, most recent software binary neural networks have been optimized based on generalized metrics, such as FLOPs or MAC operation requirements. However, with the wide range of hardware available today, independently evaluating software network structures is not good enough to determine the final network model for typical devices. In this paper, an architecture search algorithm based on estimating the hardware performance at the design time is proposed to achieve the best binary neural network models for hardware implementation on target platforms. With the XNOR-net used as a base architecture and target platforms, including Field Programmable Gate Array (FPGA), Graphic Processing Unit (GPU), and Resistive Random Access Memory (RRAM), the proposed algorithm shows its efficiency by giving more accurate estimation for the hardware performance at the design time than FLOPs or MAC operations.
3

Fung, Wing On, and Tughrul Arslan. "A power-aware algorithm for the design of reconfigurable hardware during high level placement." International Journal of Knowledge-based and Intelligent Engineering Systems 12, no. 3 (October 21, 2008): 237–44. http://dx.doi.org/10.3233/kes-2008-12306.

4

Petschenig, Horst, and Robert Legenstein. "Quantized rewiring: hardware-aware training of sparse deep neural networks." Neuromorphic Computing and Engineering 3, no. 2 (May 26, 2023): 024006. http://dx.doi.org/10.1088/2634-4386/accd8f.

Abstract:
Mixed-signal and fully digital neuromorphic systems have been of significant interest for deploying spiking neural networks in an energy-efficient manner. However, many of these systems impose constraints in terms of fan-in, memory, or synaptic weight precision that have to be considered during network design and training. In this paper, we present quantized rewiring (Q-rewiring), an algorithm that can train both spiking and non-spiking neural networks while meeting hardware constraints during the entire training process. To demonstrate our approach, we train both feedforward and recurrent neural networks with a combined fan-in/weight precision limit, a constraint that is, for example, present in the DYNAP-SE mixed-signal analog/digital neuromorphic processor. Q-rewiring simultaneously performs quantization and rewiring of synapses and synaptic weights through gradient descent updates and projecting the trainable parameters to a constraint-compliant region. Using our algorithm, we find trade-offs between the number of incoming connections to neurons and network performance for a number of common benchmark datasets.
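A rough sketch of the projection step described above (quantize, then enforce the fan-in limit) is given below; the weight-level count, weight range, and fan-in limit are illustrative assumptions rather than DYNAP-SE parameters (Python):

import numpy as np

def project_to_constraints(w, fan_in_limit=8, levels=16, w_max=1.0):
    # Quantize weights to a small set of discrete levels, then keep only the
    # strongest incoming connections of each neuron (one column per neuron).
    step = 2 * w_max / (levels - 1)
    w_q = np.clip(np.round(w / step) * step, -w_max, w_max)
    for j in range(w_q.shape[1]):
        col = np.abs(w_q[:, j])
        if np.count_nonzero(col) > fan_in_limit:
            keep = np.argsort(col)[-fan_in_limit:]   # indices of the strongest synapses
            mask = np.zeros_like(col, dtype=bool)
            mask[keep] = True
            w_q[~mask, j] = 0.0                      # rewire: drop the weakest connections
    return w_q

In an actual training loop this projection would be applied after each gradient descent update, which is the gist of the Q-rewiring procedure described in the abstract.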
5

SINHA, SHARAD, UDIT DHAWAN, and THAMBIPILLAI SRIKANTHAN. "EXTENDED COMPATIBILITY PATH BASED HARDWARE BINDING: AN ADAPTIVE ALGORITHM FOR HIGH LEVEL SYNTHESIS OF AREA-TIME EFFICIENT DESIGNS." Journal of Circuits, Systems and Computers 23, no. 09 (August 25, 2014): 1450131. http://dx.doi.org/10.1142/s021812661450131x.

Abstract:
Hardware binding is an important step in high level synthesis (HLS). The quality of hardware binding affects the area-time efficiency of a design. The goal of a synthesis process is to produce a design which meets the area-time requirements. In this paper, we present a new hardware binding algorithm with focus on area reduction. It is called extended compatibility path-based (ECPB) hardware binding and extends the compatibility path-based (CPB) hardware binding method by exploiting inter-operation flow dependencies, non-overlapping lifetimes of variables and modifying the weight relation in order to make it application aware and thus adaptive in nature. The presented methodology also takes into account bit width of functional units (FUs) and multi mode FUs. It performs simultaneous FU and register binding. Implemented within a C to register transfer level (RTL) framework, it produces binding results which are better than those produced by weighted bipartite matching (WBM) and CPB algorithms. The use of ECPB algorithm results in an average reduction of 34% and 17.44% in area-time product over WBM and CPB methods, respectively.
6

Gan, Jiayan, Ang Hu, Ziyi Kang, Zhipeng Qu, Zhanxiang Yang, Rui Yang, Yibing Wang, Huaizong Shao, and Jun Zhou. "SAS-SEINet: A SNR-Aware Adaptive Scalable SEI Neural Network Accelerator Using Algorithm–Hardware Co-Design for High-Accuracy and Power-Efficient UAV Surveillance." Sensors 22, no. 17 (August 30, 2022): 6532. http://dx.doi.org/10.3390/s22176532.

Abstract:
As a potential air control measure, RF-based surveillance is one of the most commonly used unmanned aerial vehicle (UAV) surveillance methods; it exploits specific emitter identification (SEI) technology to identify the captured RF signal from ground controllers to UAVs. Recently, many SEI algorithms based on deep convolution neural networks (DCNN) have emerged. However, there is a lack of specific hardware implementations. This paper proposes a high-accuracy and power-efficient hardware accelerator using algorithm–hardware co-design for UAV surveillance. For the algorithm, we propose a scalable SEI neural network with SNR-aware adaptive precision computation. With SNR awareness and precision reconfiguration, it can adaptively switch between DCNN and binary DCNN to cope with low-SNR and high-SNR tasks, respectively. In addition, a short-time Fourier transform (STFT) reusing DCNN method is proposed to pre-extract features of the UAV signal. For hardware, we designed an SNR sensing engine, a denoising engine, and a specialized DCNN engine with hybrid-precision convolution and memory access, aiming at SEI acceleration. Finally, we validate the effectiveness of our design on an FPGA, using a public UAV dataset. Compared with a state-of-the-art algorithm, our method can achieve the highest accuracy of 99.3% and an F1 score of 99.3%. Compared with other hardware designs, our accelerator achieves the highest power efficiency of 40.12 Gops/W and 96.52 Gops/W with INT16 precision and binary precision, respectively.
7

Zhang, Yue, Shuai Jiang, Yue Cao, Jiarong Xiao, Chengkun Li, Xuan Zhou, and Zhongjun Yu. "Hardware-Aware Design of Speed-Up Algorithms for Synthetic Aperture Radar Ship Target Detection Networks." Remote Sensing 15, no. 20 (October 17, 2023): 4995. http://dx.doi.org/10.3390/rs15204995.

Abstract:
Recently, synthetic aperture radar (SAR) target detection algorithms based on Convolutional Neural Networks (CNN) have received increasing attention. However, the large amount of computation required burdens the real-time detection of SAR ship targets on resource-limited and power-constrained satellite-based platforms. In this paper, we propose a hardware-aware model speed-up method for single-stage SAR ship target detection tasks, oriented towards the most widely used hardware for neural network computing, the Graphic Processing Unit (GPU). We first analyze the process by which the detection task is executed on GPUs and propose two strategies according to this process. Firstly, in order to speed up the execution of the model on a GPU, we propose SAR-aware model quantization to allow the original model to be stored and computed in a low-precision format. Next, to ensure the loss of accuracy is negligible after the acceleration and compression process, precision-aware scheduling is used to filter out layers that are not suitable for quantization and to store and execute them in a high-precision mode. The effectiveness of this model speed-up algorithm was demonstrated by compressing four models of different sizes (yolov5n, yolov5s, yolov5m, yolov5l) trained on the HRSID dataset. The experimental results show that the detection speeds of yolov5n, yolov5s, yolov5m, and yolov5l can reach 234.7785 fps, 212.8341 fps, 165.6523 fps, and 139.8758 fps on the NVIDIA AGX Xavier development board with negligible loss of accuracy, which is 1.230, 1.469, 1.955, and 2.448 times faster than the original models, respectively.
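The precision-aware scheduling idea, filtering out layers that are too sensitive to quantize, can be read as a greedy loop like the one below; evaluate_accuracy is a hypothetical user-supplied function that returns validation accuracy for a given set of quantized layers, and the 0.5-point tolerance is an assumption (Python):

def select_int8_layers(layer_names, evaluate_accuracy, tol=0.5):
    # Try quantizing one layer at a time; keep it in low precision only if the
    # validation metric drops by at most `tol` points, otherwise leave it high-precision.
    baseline = evaluate_accuracy(set())
    int8_layers = set()
    for name in layer_names:
        trial = int8_layers | {name}
        if baseline - evaluate_accuracy(trial) <= tol:
            int8_layers = trial
    return int8_layers  # the remaining layers stay in high-precision mode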
8

Perleberg, Murilo, Vinicius Borges, Vladimir Afonso, Daniel Palomino, Luciano Agostini, and Marcelo Porto. "6WR: A Hardware Friendly 3D-HEVC DMM-1 Algorithm and its Energy-Aware and High-Throughput Design." IEEE Transactions on Circuits and Systems II: Express Briefs 67, no. 5 (May 2020): 836–40. http://dx.doi.org/10.1109/tcsii.2020.2983959.

9

Arif, Muhammad, Omar S. Sonbul, Muhammad Rashid, Mohsin Murad, and Mohammed H. Sinky. "A Unified Point Multiplication Architecture of Weierstrass, Edward and Huff Elliptic Curves on FPGA." Applied Sciences 13, no. 7 (March 25, 2023): 4194. http://dx.doi.org/10.3390/app13074194.

Abstract:
This article presents an area-aware unified hardware accelerator of Weierstrass, Edward, and Huff curves over GF(2^233) for the point multiplication step in elliptic curve cryptography (ECC). The target implementation platform is a field-programmable gate array (FPGA). In order to explore the design space between processing time and various protection levels, this work employs two different point multiplication algorithms. The first is the Montgomery point multiplication algorithm for the Weierstrass and Edward curves. The second is the Double and Add algorithm for the Binary Huff curve. The area complexity is reduced by efficiently replacing storage elements that result in a 1.93 times decrease in the size of the memory needed. An efficient Karatsuba modular multiplier hardware accelerator is implemented to compute polynomial multiplications. We utilized the square arithmetic unit after the Karatsuba multiplier to execute the quad-block variant of a modular inversion, which preserves lower hardware resources and also reduces clock cycles. Finally, to support three different curves, an efficient controller is implemented. Our unified architecture can operate at a maximum of 294 MHz and utilizes 7423 slices on Virtex-7 FPGA. It takes less computation time than most recent state-of-the-art implementations. Thus, combining different security curves (Weierstrass, Edward, and Huff) in a single design is practical for applications that demand different reliability/security levels.
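The Karatsuba multiplier mentioned above works over GF(2), where addition is XOR; a software sketch of the recursive split is given below. The recursion threshold is arbitrary and the final reduction modulo the degree-233 field polynomial is omitted, so this illustrates the splitting idea rather than the paper's hardware datapath (Python):

def clmul(a, b):
    # Carry-less (GF(2)) multiplication of polynomials stored as integer bit vectors.
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def karatsuba_gf2(a, b, bits=233):
    # Karatsuba split: three half-size products instead of four.
    if bits <= 32:
        return clmul(a, b)
    h = bits // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    lo = karatsuba_gf2(a0, b0, h)
    hi = karatsuba_gf2(a1, b1, bits - h)
    mid = karatsuba_gf2(a0 ^ a1, b0 ^ b1, bits - h) ^ lo ^ hi
    return lo ^ (mid << h) ^ (hi << (2 * h))

assert karatsuba_gf2(0b1011, 0b110, bits=64) == clmul(0b1011, 0b110)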
10

Hsu, Bay-Yuan, Chih-Ya Shen, Hao Shan Yuan, Wang-Chien Lee, and De-Nian Yang. "Social-Aware Group Display Configuration in VR Conference." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (March 24, 2024): 8517–25. http://dx.doi.org/10.1609/aaai.v38i8.28695.

Abstract:
Virtual Reality (VR) has emerged due to advancements in hardware and computer graphics. During the pandemic, conferences and exhibitions leveraging VR have gained attention. However, large-scale VR conferences face a significant problem not yet studied in the literature: displaying too many irrelevant users on the screen, which may negatively impact the user experience. To address this issue, we formulate a new research problem, Social-Aware VR Conference Group Display Configuration (SVGD). Accordingly, we design the Social Utility-Aware VR Conference Group Formation (SVC) algorithm, which is a 2-approximation algorithm to SVGD. SVC iteratively selects either the P-Configuration or S-Configuration based on their effective ratios. This ensures that in each iteration, SVC identifies and chooses the solution with the highest current effectiveness. Experiments on real metaverse datasets show that the proposed SVC outperforms 11 baselines by 75% in terms of solution quality.
11

Perleberg, Murilo Roschildt, Vladimir Afonso, Ruhan Conceição, Altamiro Susin, Luciano Agostini, Marcelo Porto, and Bruno Zatt. "Energy and Rate-Aware Design for HEVC Motion Estimation Based on Pareto Efficiency." Journal of Integrated Circuits and Systems 13, no. 1 (August 24, 2018): 1–12. http://dx.doi.org/10.29292/jics.v13i1.18.

Abstract:
This paper presents a high-throughput energy and rate-aware hardware design for the Motion Estimation (ME) according to the High Efficiency Video Coding (HEVC) standard. The hardware design implements a modified Test Zone Search (TZS) algorithm to perform Integer Motion Estimation (IME) as well as the Fractional Motion Estimation (FME) defined by the HEVC standard. Based on evaluations with the HEVC Reference Software, a complexity-reduction strategy was adopted in the developed architecture that mainly consists of supporting only the 8x8, 16x16, 32x32, and 64x64 Prediction Unit (PU) sizes rather than using the 24 possible PU sizes. The architecture allows an external control unit to select a subset of these four PU sizes according to the energy and rate targets for a specific application. The possible operation points were determined based on Pareto Efficiency. The architecture was described in VHDL, and the synthesis results for ASIC 45nm Nangate standard cells show that the developed architecture can process at least 53 frames per second (fps) considering Ultra-High Definition (UHD) 4320p videos. When average-case processing is considered, the architecture is able to process 112 fps at UHD 4320p resolution.
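The Pareto Efficiency step mentioned above can be illustrated with a small helper that keeps only non-dominated operation points; the (energy, rate) numbers in the example are made up for illustration and are not taken from the paper (Python):

def pareto_front(points):
    # Keep the points not dominated by any other point; lower is better in both dimensions.
    front = []
    for p in points:
        dominated = any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in points)
        if not dominated:
            front.append(p)
    return front

candidates = [(1.0, 0.9), (0.7, 1.4), (1.1, 1.1), (1.3, 0.8)]  # hypothetical (energy, rate) costs
print(pareto_front(candidates))  # (1.1, 1.1) is dominated by (1.0, 0.9) and is dropped

An external controller would then pick one of the surviving configurations (here, PU-size subsets) according to the energy and rate targets of the application.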
12

Sulaiman, Muhammad Bintang Gemintang, Jin-Yu Lin, Jian-Bai Li, Cheng-Ming Shih, Kai-Cheung Juang, and Chih-Cheng Lu. "SRAM-Based CIM Architecture Design for Event Detection." Sensors 22, no. 20 (October 16, 2022): 7854. http://dx.doi.org/10.3390/s22207854.

Abstract:
Convolutional neural networks (CNNs) play a key role in deep learning applications. However, the high computational complexity and high-energy consumption of CNNs trammel their application in hardware accelerators. Computing-in-memory (CIM) is the technique of running calculations entirely in memory (in our design, we use SRAM). CIM architecture has demonstrated great potential to effectively compute large-scale matrix-vector multiplication. CIM-based architecture for event detection is designed to trigger the next stage of precision inference. To implement an SRAM-based CIM accelerator, a software and hardware co-design approach must consider the CIM macro’s hardware limitations to map the weight onto the AI edge devices. In this paper, we designed a hierarchical AI architecture to optimize the end-to-end system power in the AIoT application. In the experiment, the CIM-aware algorithm with 4-bit activation and 8-bit weight is examined on hand gesture and CIFAR-10 datasets, and determined to have 99.70% and 70.58% accuracy, respectively. A profiling tool to analyze the proposed design is also developed to measure how efficient our architecture design is. The proposed design system utilizes the operating frequency of 100 MHz, hand gesture and CIFAR-10 as the datasets, and nine CNNs and one FC layer as its network, resulting in a frame rate of 662 FPS, 37.6% processing unit utilization, and a power consumption of 0.853 mW.
13

Yang, Jiacheng, Xiaoming Wang, and Jianwu Dang. "On the Algorithm of the Medical Diagnostic Decision Support System under the Mobile Platform." Open Electrical & Electronic Engineering Journal 8, no. 1 (December 31, 2014): 589–93. http://dx.doi.org/10.2174/1874129001408010589.

Abstract:
A mobile platform usually refers to a complete platform that provides a reference design, a hardware chipset, upper communication protocols, and supporting software and development tools. Downstream vendors can design and manufacture appropriate mobile products in a shorter time as required. App developers need not pay attention to the details of the platform hardware, but should systematically understand the types of mobile devices, supporting software and appropriate development tools in a mobile platform. The key task for medical experts dealing with patients is decision-making. They make decisions in diagnosing the patient’s status, adjusting the therapeutic plan and monitoring the changing disease status. In recent decades, a series of methods and tools have been developed to assist clinicians with these decisions. Typically, these tools and methods are not intended to substitute for human judgement, but are used for assistance and support. Doctors and other healthcare professionals and staff have become aware of the importance of these algorithms and techniques for supporting the decision-making process, as information and communication technology has become increasingly important for the infrastructure of healthcare organizations.
14

Diaz, Kristian, and Ying-Khai Teh. "Design and Power Management of a Secured Wireless Sensor System for Salton Sea Environmental Monitoring." Electronics 9, no. 4 (March 25, 2020): 544. http://dx.doi.org/10.3390/electronics9040544.

Abstract:
An embedded system is composed of commercial off-the-shelf (COTS) peripherals and a microcontroller. The system will collect environmental data for the Salton Sea, Imperial Valley, California in order to understand the development of environmental and health hazards. Power analysis of each system feature (i.e., the Central Processing Unit (CPU) core, Input/Output (I/O) buses, and peripherals (temperature, humidity, and optical dust sensors)) is studied. Software-based power optimization utilizes this power information with hardware-assisted power gating to control system features. The control of these features extends system uptime in a field-deployed, finite-energy scenario. The proposed power optimization algorithm can collect more data by increasing system uptime when compared to a Low Power Energy Aware Processing (LEAP) approach. Lastly, the 128-bit Advanced Encryption Standard (AES) algorithm is applied to the collected data using various parameters. Hidden peripheral requirements that must be considered during design are also noted to impact the efficacy of this method.
15

Trevithick, Alex, Matthew Chan, Michael Stengel, Eric Chan, Chao Liu, Zhiding Yu, Sameh Khamis, Manmohan Chandraker, Ravi Ramamoorthi, and Koki Nagano. "Real-Time Radiance Fields for Single-Image Portrait View Synthesis." ACM Transactions on Graphics 42, no. 4 (July 26, 2023): 1–15. http://dx.doi.org/10.1145/3592460.

Abstract:
We present a one-shot method to infer and render a photorealistic 3D representation from a single unposed image (e.g., face portrait) in real-time. Given a single RGB input, our image encoder directly predicts a canonical triplane representation of a neural radiance field for 3D-aware novel view synthesis via volume rendering. Our method is fast (24 fps) on consumer hardware, and produces higher quality results than strong GAN-inversion baselines that require test-time optimization. To train our triplane encoder pipeline, we use only synthetic data, showing how to distill the knowledge from a pretrained 3D GAN into a feedforward encoder. Technical contributions include a Vision Transformer-based triplane encoder, a camera data augmentation strategy, and a well-designed loss function for synthetic data training. We benchmark against the state-of-the-art methods, demonstrating significant improvements in robustness and image quality in challenging real-world settings. We showcase our results on portraits of faces (FFHQ) and cats (AFHQ), but our algorithm can also be applied in the future to other categories with a 3D-aware image generator.
16

Sekanina, Lukas. "Evolutionary Algorithms in Approximate Computing: A Survey." Journal of Integrated Circuits and Systems 16, no. 2 (August 16, 2021): 1–12. http://dx.doi.org/10.29292/jics.v16i2.499.

Abstract:
In recent years, many design automation methods have been developed to routinely create approximate implementations of circuits and programs that show excellent trade-offs between the quality of output and required resources. This paper deals with evolutionary approximation as one of the popular approximation methods. The paper provides the first survey of evolutionary algorithm (EA)-based approaches applied in the context of approximate computing. The survey reveals that EAs are primarily applied as multi-objective optimizers. We propose to divide these approaches into two main classes: (i) parameter optimization in which the EA optimizes a vector of system parameters, and (ii) synthesis and optimization in which EA is responsible for determining the architecture and parameters of the resulting system. The evolutionary approximation has been applied at all levels of design abstraction and in many different applications. The neural architecture search enabling the automated hardware-aware design of approximate deep neural networks was identified as a newly emerging topic in this area.
17

Zhao, Zhongyuan, Weiguang Sheng, Jinchao Li, Pengfei Ye, Qin Wang, and Zhigang Mao. "Similarity-Aware Architecture/Compiler Co-Designed Context-Reduction Framework for Modulo-Scheduled CGRA." Electronics 10, no. 18 (September 9, 2021): 2210. http://dx.doi.org/10.3390/electronics10182210.

Abstract:
Modulo-scheduled coarse-grained reconfigurable array (CGRA) processors have shown their potential for exploiting loop-level parallelism at high energy efficiency. However, these CGRAs need frequent reconfiguration during their execution, which makes them suffer from large area and power overhead for context memory and context-fetching. To tackle this challenge, this paper uses an architecture/compiler co-designed method for context reduction. From an architecture perspective, we carefully partition the context into several subsections and only fetch the subsections that differ from the previous context word whenever fetching a new context. We package each differing subsection with an opcode and index value to formulate a context-fetching primitive (CFP) and explore the hardware design space by providing centralized and distributed CFP-fetching CGRAs to support this CFP-based context-fetching scheme. From the software side, we develop a similarity-aware tuning algorithm and integrate it into state-of-the-art modulo scheduling and memory access conflict optimization algorithms. The whole compilation flow can efficiently improve the similarities between contexts in each PE for the purpose of reducing both context-fetching latency and context footprint. Experimental results show that our HW/SW co-designed framework improves area efficiency and energy efficiency by up to 34% and 21%, respectively, with only 2% performance overhead.
18

Guo, Peng, Hong Ma, Ruizhi Chen, and Donglin Wang. "A High-Efficiency FPGA-Based Accelerator for Binarized Neural Network." Journal of Circuits, Systems and Computers 28, supp01 (December 1, 2019): 1940004. http://dx.doi.org/10.1142/s0218126619400048.

Abstract:
Although the convolutional neural network (CNN) has exhibited outstanding performance in various applications, the deployment of CNN on embedded and mobile devices is limited by the massive computations and memory footprint. To address these challenges, Courbariaux and co-workers put forward binarized neural network (BNN) which quantizes both the weights and activations to ±1. From the perspective of hardware, BNN can greatly simplify the computation and reduce the storage. In this work, we first present the algorithm optimizations to further binarize the first layer and the padding bits of BNN; then we propose a fully binarized CNN accelerator. With the Shuffle–Compute structure and the memory-aware computation schedule scheme, the proposed design can boost the performance for feature maps of different sizes and make full use of the memory bandwidth. To evaluate our design, we implement the accelerator on the Zynq ZC702 board, and the experiments on the SVHN and CIFAR-10 datasets show the state-of-the-art performance efficiency and resource efficiency.
19

Gao, Han, Zhangqin Huang, Xiaobo Zhang, and Huapeng Yang. "Research and Design of a Decentralized Edge-Computing-Assisted LoRa Gateway." Future Internet 15, no. 6 (May 27, 2023): 194. http://dx.doi.org/10.3390/fi15060194.

Abstract:
As a narrowband communication technology, long-range (LoRa) contributes to the long development of Internet of Things (IoT) applications. The LoRa gateway plays an important role in the IoT transport layer, and security and efficiency are the key issues of the current research. In the centralized working model of IoT systems built by traditional LoRa gateways, all the data generated and reported by end devices are processed and stored in cloud servers, which are susceptible to security issues such as data loss and data falsification. Edge computing (EC), as an innovative approach that brings data processing and storage closer to the endpoints, can create a decentralized security infrastructure for LoRa gateway systems, resulting in an EC-assisted IoT working model. Although this paradigm delivers unique features and an improved quality of service (QoS), installing IoT applications at LoRa gateways with limited computing and memory capabilities presents considerable obstacles. This article proposes the design and implementation of an “EC-assisted LoRa gateway” using edge computing. Our proposed latency-aware algorithm (LAA) can greatly improve the reliability of the network system by using a distributed edge computing network technology that can achieve maintenance operations, such as detection, repair, and replacement of failures of edge nodes in the network. Then, an EC-assisted LoRa gateway prototype was developed on an embedded hardware system. Finally, experiments were conducted to evaluate the performance of the proposed EC-assisted LoRa gateway. Compared with the conventional LoRa gateway, the proposed edge intelligent LoRa gateway had 41.1% lower bandwidth utilization and handled more end devices, ensuring system availability and IoT network reliability more effectively.
20

Di, Xinkai, Hai-Gang Yang, Yiping Jia, Zhihong Huang, and Ning Mao. "Exploring Efficient Acceleration Architecture for Winograd-Transformed Transposed Convolution of GANs on FPGAs." Electronics 9, no. 2 (February 7, 2020): 286. http://dx.doi.org/10.3390/electronics9020286.

Abstract:
The acceleration architecture of transposed convolution layers is essential since transposed convolution operations, as critical components in the generative model of generative adversarial networks, are inherently computationally intensive. In addition, the pre-processing of inserting and padding with zeros for input feature maps causes many ineffective operations. Most of the already known FPGA (Field Programmable Gate Array) based architectures for convolution layers cannot tackle these issues. In this paper, we firstly propose a novel dataflow exploration through splitting the filters and their corresponding input feature maps into four sets and then applying the Winograd algorithm for fast processing with a high efficiency. Secondly, we present an underlying FPGA-based accelerator architecture that features its own processing units, with an embedded parallel, pipelined, and buffered processing flow. At last, a parallelism-aware memory partition technique and the hardware-based design space are explored in coordination to determine, respectively, the required parallel operations and the optimal design parameters. Experiments on several state-of-the-art GANs using our methods achieve an average performance of 639.2 GOPS on Xilinx ZCU102 and 162.5 GOPS on Xilinx VC706. In reference to a conventional optimized accelerator baseline, this work demonstrates an 8.6× (up to 11.7×) increase in processing performance, compared to below 2.2× improvement by the prior studies in the literature.
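The Winograd idea referenced above can be shown in its smallest one-dimensional form, F(2,3), which produces two outputs of a 3-tap convolution with four multiplications instead of six; the paper applies the two-dimensional transform to split transposed-convolution tiles, which this sketch does not attempt (Python):

def winograd_f23(d, g):
    # d: 4-element input tile, g: 3-tap filter; returns 2 convolution outputs.
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

d, g = [1.0, 2.0, 3.0, 4.0], [0.5, 0.25, 0.125]
assert winograd_f23(d, g) == [d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                              d[1]*g[0] + d[2]*g[1] + d[3]*g[2]]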
21

Meng, Yang. "Analysis of Performance Improvement of Real-time Internet of Things Application Data Processing in the Movie Industry Platform." Computational Intelligence and Neuroscience 2022 (October 10, 2022): 1–9. http://dx.doi.org/10.1155/2022/5237252.

Abstract:
The goal of this study is to plan and develop complete strategies to improve the performance of the film industry. The primary objectives of this study are to investigate a dataset generated by an IoT application and the nature of the data forms obtained, the speed of the data arrival rate, and the required query response time, and to list the issues that the current film industry faces when attempting to handle IoT applications in real time. Finally, in film industry platforms, high performance with varied stream circulation levels of real-time IoT application information was realized. In this study, we proposed three alternative methods on top of the Storm platform, nicknamed Re-Storm, to improve the performance of IoT application data. The three proposed strategies are (1) a data stream graph optimization framework, (2) an energy-efficient self-scheduling strategy, and (3) real-time data stream computing with memory DVFS. The work proposes a methodology, Re-Storm, for dealing with the heterogeneous, traffic-aware incoming rate of data streams at multiple traffic points, resulting in a short response time and high energy efficiency. It is divided into three parts, the first of which is a scientific model for fast response time and high energy efficiency. The distribution of resources is then considered using DVFS approaches, and successful optimum association methods are shown. Third is the self-allocation of worker nodes towards optimizing the DSG using hot swapping and a makespan minimization technique. Furthermore, the testing findings suggest that Re-Storm outperforms Storm by 20–30% for real-time streaming data of IoT applications. This research focuses on high energy efficiency, short reaction time, and managing the data stream traffic arrival rate. A model for a specific phase of data coming via IoT and real-time computing devices was built on top of the Storm platform. There is no need to change any software approach or hardware component in this design, but merely to add an energy-efficient and traffic-aware algorithm. The design and development of this algorithm take into account all of the needs of the data produced by IoT applications. It is an open-source platform with fewer prerequisites for addressing more sophisticated big data challenges.
22

Le-Tuan, Anh, Conor Hayes, Manfred Hauswirth, and Danh Le-Phuoc. "Pushing the Scalability of RDF Engines on IoT Edge Devices." Sensors 20, no. 10 (May 14, 2020): 2788. http://dx.doi.org/10.3390/s20102788.

Abstract:
Semantic interoperability for the Internet of Things (IoT) is enabled by standards and technologies from the Semantic Web. As recent research suggests a move towards decentralised IoT architectures, we have investigated the scalability and robustness of RDF (Resource Description Framework) engines that can be embedded throughout the architecture, in particular at edge nodes. RDF processing at the edge facilitates the deployment of semantic integration gateways closer to low-level devices. Our focus is on how to enable scalable and robust RDF engines that can operate on lightweight devices. In this paper, we have first carried out an empirical study of the scalability and behaviour of solutions for RDF data management on standard computing hardware that have been ported to run on lightweight devices at the network edge. The findings of our study show that these RDF store solutions have several shortcomings on commodity ARM (Advanced RISC Machine) boards that are representative of IoT edge node hardware. Consequently, this has inspired us to introduce a lightweight RDF engine, which comprises an RDF storage and a SPARQL processor for lightweight edge devices, called RDF4Led. RDF4Led follows the RISC-style (Reduced Instruction Set Computer) design philosophy. The design constitutes a flash-aware storage structure, an indexing scheme, an alternative buffer management technique and a low-memory-footprint join algorithm that demonstrates improved scalability and robustness over competing solutions. With a significantly smaller memory footprint, we show that RDF4Led can handle 2 to 5 times more data than popular RDF engines such as Jena TDB (Tuple Database) and RDF4J, while consuming the same amount of memory. In particular, RDF4Led requires 10%–30% of the memory of its competitors to operate on datasets of up to 50 million triples. On memory-constrained ARM boards, it can perform faster updates and can scale better than Jena TDB and Virtuoso. Furthermore, we demonstrate considerably faster query operations than Jena TDB and RDF4J.
23

Hajj, Hazem, Wassim El-Hajj, Mehiar Dabbagh, and Tawfik R. Arabi. "An Algorithm-Centric Energy-Aware Design Methodology." IEEE Transactions on Very Large Scale Integration (VLSI) Systems 22, no. 11 (November 2014): 2431–35. http://dx.doi.org/10.1109/tvlsi.2013.2289906.

24

Ramos, Sabela, and Torsten Hoefler. "Cache Line Aware Algorithm Design for Cache-Coherent Architectures." IEEE Transactions on Parallel and Distributed Systems 27, no. 10 (October 1, 2016): 2824–37. http://dx.doi.org/10.1109/tpds.2016.2516540.

25

Goncalves, Paulo, Candido Moraes, Marcelo Porto, and Guilherme Correa. "Complexity-Aware TZS Algorithm for Mobile Video Encoders." Journal of Integrated Circuits and Systems 14, no. 3 (December 27, 2019): 1–9. http://dx.doi.org/10.29292/jics.v14i3.60.

Abstract:
Video applications have grown significantly in the last years, especially in embedded/mobile systems. Modern video compression algorithms and standards, such as the High Efficiency Video Coding (HEVC), achieve high efficiency in compression ratio. However, such efficiency has caused an increase in the complexity of encoding videos. This is a serious problem especially in mobile systems, which present restrictions on processing and energy consumption. This paper presents an enhanced Test Zone Search (TZS) algorithm, aiming at complexity reduction of the Motion Estimation (ME) process in the HEVC standard and focusing on efficient hardware design for mobile encoders. The proposed algorithm is composed of two strategies: an early termination scheme for TZS, called e-TZS, and the Octagonal-Axis Raster Search Pattern (OARP). When combined and implemented in the HEVC reference encoder, the strategies allowed an average complexity reduction of 75.16% in TZS, with a negligible BD-rate increase of only 0.1242% in comparison to the original algorithm. Besides, the approach presents an average block-matching operation reduction of 80%, allowing hardware simplification and decreasing memory access.
26

Chen, Yi-Jung, Chia-Lin Yang, and Yen-Sheng Chang. "An architectural co-synthesis algorithm for energy-aware Network-on-Chip design." Journal of Systems Architecture 55, no. 5-6 (May 2009): 299–309. http://dx.doi.org/10.1016/j.sysarc.2009.02.002.

27

Chatterjee, Subarna, Mark F. Pekala, Lev Kruglyak, and Stratos Idreos. "Limousine: Blending Learned and Classical Indexes to Self-Design Larger-than-Memory Cloud Storage Engines." Proceedings of the ACM on Management of Data 2, no. 1 (March 12, 2024): 1–28. http://dx.doi.org/10.1145/3639302.

Abstract:
We present Limousine, a self-designing key-value storage engine, that can automatically morph to the near-optimal storage engine architecture shape given a workload, a cloud budget, and target performance. At its core, Limousine identifies the fundamental design principles of storage engines as combinations of learned and classical data structures that collaborate through algorithms for data storage and access. By unifying these principles over diverse hardware and three major cloud providers (AWS, GCP, and Azure), Limousine creates a massive design space of quindecillion (10^48) storage engine designs, the vast majority of which do not exist in literature or industry. Limousine contains a distribution-aware IO model to accurately evaluate any candidate design. Using these models, Limousine searches within the exhaustive design space to construct a navigable continuum of designs connected along a Pareto frontier of cloud cost and performance. If storage engines contain learned components, Limousine also introduces efficient lazy write algorithms to optimize the holistic read-write performance. Once the near-optimal design is decided for the given context, Limousine automatically materializes the corresponding design in Rust code. Using the YCSB benchmark, we demonstrate that storage engines automatically designed and generated by Limousine scale better by up to 3 orders of magnitude when compared with state-of-the-art industry-leading engines such as RocksDB, WiredTiger, FASTER, and Cosine, over diverse workloads, data sets, and cloud budgets.
28

Chatterjee, Subarna, Meena Jagadeesan, Wilson Qin, and Stratos Idreos. "Cosine." Proceedings of the VLDB Endowment 15, no. 1 (September 2021): 112–26. http://dx.doi.org/10.14778/3485450.3485461.

Abstract:
We present a self-designing key-value storage engine, Cosine, which can always take the shape of the close to "perfect" engine architecture given an input workload, a cloud budget, a target performance, and required cloud SLAs. By identifying and formalizing the first principles of storage engine layouts and core key-value algorithms, Cosine constructs a massive design space comprising of sextillion (10^36) possible storage engine designs over a diverse space of hardware and cloud pricing policies for three cloud providers - AWS, GCP, and Azure. Cosine spans across diverse designs such as Log-Structured Merge-trees, B-trees, Log-Structured Hash-tables, in-memory accelerators for filters and indexes as well as trillions of hybrid designs that do not appear in the literature or industry but emerge as valid combinations of the above. Cosine includes a unified distribution-aware I/O model and a learned concurrency-aware CPU model that with high accuracy can calculate the performance and cloud cost of any possible design on any workload and virtual machines. Cosine can then search through that space in a matter of seconds to find the best design and materializes the actual code of the resulting storage engine design using a templated Rust implementation. We demonstrate that on average Cosine outperforms state-of-the-art storage engines such as write-optimized RocksDB, read-optimized WiredTiger, and very write-optimized FASTER by 53x, 25x, and 20x, respectively, for diverse workloads, data sizes, and cloud budgets across all YCSB core workloads and many variants.
29

Mirzaei, Shahnam, Ryan Kastner, and Anup Hosangadi. "Layout Aware Optimization of High Speed Fixed Coefficient FIR Filters for FPGAs." International Journal of Reconfigurable Computing 2010 (2010): 1–17. http://dx.doi.org/10.1155/2010/697625.

Abstract:
We present a method for implementing high speed finite impulse response (FIR) filters on field programmable gate arrays (FPGAs). Our algorithm is a multiplierless technique where fixed coefficient multipliers are replaced with a series of add and shift operations. The first phase of our algorithm uses registered adders and hardwired shifts. Here, a modified common subexpression elimination (CSE) algorithm reduces the number of adders while maintaining performance. The second phase optimizes routing delay using prelayout wire length estimation techniques to improve the final placed and routed design. The optimization target platforms are Xilinx Virtex FPGA devices where we compare the implementation results with those produced by Xilinx Coregen, which is based on distributed arithmetic (DA). We observed up to 50% reduction in the number of slices and up to 75% reduction in the number of look up tables (LUTs) for fully parallel implementations compared to DA method. Also, there is 50% reduction in the total dynamic power consumption of the filters. Our designs perform up to 27% faster than the multiply accumulate (MAC) filters implemented by Xilinx Coregen tool using DSP blocks. For placement, there is a saving up to 20% in number of routing channels. This results in lower congestion and up to 8% reduction in average wirelength.
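The add-and-shift replacement of fixed-coefficient multipliers can be illustrated directly: every set bit of the coefficient becomes one hardwired shift plus one adder, and the common subexpression elimination step described in the abstract would then share identical partial sums across coefficients (not shown in this sketch) (Python):

def shift_add_terms(coeff):
    # Positions of the set bits of a positive fixed coefficient.
    return [i for i in range(coeff.bit_length()) if (coeff >> i) & 1]

def multiply_by_constant(x, coeff):
    # y = coeff * x realized as a sum of hardwired shifts, one adder per nonzero bit.
    result = 0
    for s in shift_add_terms(coeff):
        result += x << s
    return result

assert multiply_by_constant(7, 23) == 7 * 23  # 23 = 10111b -> shifts by 0, 1, 2 and 4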
30

Sudarshan, Deeksha, Chirag Khandelwal, Linge Gowda B M, Kiran Kumar Bijjaragi, and Rekha S S. "Resource Centric Analysis of RSA and ECC Algorithms on FPGA." ITM Web of Conferences 56 (2023): 01006. http://dx.doi.org/10.1051/itmconf/20235601006.

Abstract:
The electronics industry’s shadow side is counterfeiting, and the problem is growing. Almost every business in the supply chain is impacted by the issue, including component suppliers, distributors, Electronics Manufacturing Services (EMS) providers, Original Design Manufacturers (ODMs), Original Equipment Manufacturers (OEMs), and their clients. In fact, any electronics firm that wishes to benefit from the cheap costs associated with globalization must be aware that someone along the supply chain may be persuaded to acquire fake items and sell them as genuine. A thorough grasp of chip designs, including partitioning and prioritizing data transit and storage, as well as a range of obfuscation techniques and activity monitoring, is necessary to reduce the danger of future hardware breaches. To battle this problem, we need to enforce various security measures at different levels of the supply chain. Recent methods include implementing cryptographic ciphers in the devices. The commonly used ciphers are hard ciphers. But owing to the advancements and increase in the number of low-power and resource-constrained devices, there has been a dire need to design ciphers that support such devices. This paper discusses the advantages of lightweight ciphers, aiming to secure low-power devices and other embedded devices. This work mainly compares two algorithms, RSA (hard cipher) and ECC (light cipher), in terms of their device utilization and power consumption on a Kintex-7. The presented results are justified by simulations performed in the Vivado design suite.
31

Li, Yihang. "Sparse-Aware Deep Learning Accelerator." Highlights in Science, Engineering and Technology 39 (April 1, 2023): 305–10. http://dx.doi.org/10.54097/hset.v39i.6544.

Abstract:
In view of the difficulty of implementing convolutional neural network computing in hardware, most previous convolutional neural network accelerator designs focused on solving the bottlenecks of computational performance and bandwidth, ignoring the importance of convolutional neural network sparsity for accelerator design. In recent years, a few convolutional neural network accelerators have been able to take advantage of sparsity, but they usually struggle to balance computational flexibility, parallel efficiency and resource overhead. The application of convolutional neural networks (CNNs) on the embedded side is limited by real-time requirements, and there is a large degree of sparsity in CNN convolution calculations. This paper summarizes sparsification methods at the algorithm level and at the FPGA level. The different methods of sparsification and research on different application layers are introduced, and the advantages and development trends of sparsification are analyzed and summarized.
32

Ye, Wenbin, and Ya Jun Yu. "Power Oriented Design of Linear Phase FIR Filters." Journal of Circuits, Systems and Computers 25, no. 07 (April 22, 2016): 1650075. http://dx.doi.org/10.1142/s0218126616500754.

Abstract:
In the design of low computational complexity and low power FIR filters, researchers have made every effort to reduce the number of adders when coefficients multipliers are considered as the multiple constant multiplication problem. In this paper, for the first time, we propose a power oriented optimization of linear phase FIR filters, where a power cost is used as the criteria in the discrete coefficient searches and synthesis. The power cost is computed based on a newly proposed power model, which takes both the static power and dynamic power into consideration. With the new power model, a new coefficient synthesis scheme is proposed such that the synthesized coefficient consumes the minimum power. Compared to the adder-cost oriented algorithm, the proposed power-oriented algorithm has two advantages: First, the algorithm can optimize filters with lower power consumption, and second, the optimal design in the sense of power consumption is frequency aware. Unlike the adder-cost oriented algorithms that generate the same final coefficient set and the same synthesis of the coefficient set regardless of the frequency for a given filter specification, the proposed algorithm search and synthesizes the coefficients with the awareness of the working frequency; different designs may be resulted for the same filter specification but different working frequency, and each designed filter has lower power consumption in its specified frequency. Transistor level simulations of benchmark filters verified the above claims.
33

Belakaria, Syrine, Aryan Deshwal, Nitthilan Kannappan Jayakodi, and Janardhan Rao Doppa. "Uncertainty-Aware Search Framework for Multi-Objective Bayesian Optimization." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 06 (April 3, 2020): 10044–52. http://dx.doi.org/10.1609/aaai.v34i06.6561.

Abstract:
We consider the problem of multi-objective (MO) blackbox optimization using expensive function evaluations, where the goal is to approximate the true Pareto set of solutions while minimizing the number of function evaluations. For example, in hardware design optimization, we need to find the designs that trade-off performance, energy, and area overhead using expensive simulations. We propose a novel uncertainty-aware search framework referred to as USeMO to efficiently select the sequence of inputs for evaluation to solve this problem. The selection method of USeMO consists of solving a cheap MO optimization problem via surrogate models of the true functions to identify the most promising candidates and picking the best candidate based on a measure of uncertainty. We also provide theoretical analysis to characterize the efficacy of our approach. Our experiments on several synthetic and six diverse real-world benchmark problems show that USeMO consistently outperforms the state-of-the-art algorithms.
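One way to read the selection rule sketched in this abstract is: take the candidates that are Pareto-optimal under the cheap surrogate problem, then spend the next expensive evaluation on the one whose surrogate predictions are most uncertain. The predict_std method below is an assumed surrogate-model interface, not the authors' code (Python):

def pick_next_input(pareto_candidates, surrogate_models):
    # Among surrogate-Pareto-optimal candidates, choose the one with the largest
    # total predictive uncertainty across all objectives.
    def uncertainty(x):
        return sum(m.predict_std(x) for m in surrogate_models)
    return max(pareto_candidates, key=uncertainty)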
34

Choudhury, Priyanka, Kanchan Manna, Vivek Rai, and Sambhu Nath Pradhan. "Thermal-Aware Partitioning and Encoding of Power-Gated FSM." Journal of Circuits, Systems and Computers 28, no. 09 (August 2019): 1950144. http://dx.doi.org/10.1142/s0218126619501445.

Abstract:
Miniaturization and the continued scaling of CMOS technology lead to high power dissipation and ever-increasing power densities. One of the major challenges for the designer at all design levels is temperature management, particularly of local hot spots, along with power dissipation. In this work, the controller circuits which are implemented as Finite State Machines (FSMs) are considered for their thermal-aware and power-aware realization. Using a Genetic Algorithm (GA), both encoding and bipartitioning of the FSM circuit are implemented to obtain two subFSMs such that only one subFSM is active at a particular instant of time, whereas the other one is power-gated. A thermal-aware realization (in terms of power density) of this power-gated FSM is then performed. The work therefore concerns the thermal-aware encoding and partitioning of an FSM for its power-gated realization. The average temperature saving obtained with this approach for a set of benchmark circuits over previous works is more than 16%. After obtaining the final partitioned circuit, which is optimized in terms of area and power density, thermal analysis of the subFSMs is performed to obtain the absolute temperature. As thermal-aware design may increase the area, a suitable area-temperature trade-off is also presented in this paper.
35

Wang, Rongrong, Rui Tan, Zhenyu Yan, and Chris Xiaoxuan Lu. "Orientation-Aware 3D SLAM in Alternating Magnetic Field from Powerlines." Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, no. 4 (December 19, 2023): 1–25. http://dx.doi.org/10.1145/3631446.

Abstract:
Identifying new sensing modalities for indoor localization is an interest of research. This paper studies powerline-induced alternating magnetic field (AMF) that fills the indoor space for the orientation-aware three-dimensional (3D) simultaneous localization and mapping (SLAM). While an existing study has adopted a uniaxial AMF sensor for SLAM in a plane surface, the design falls short of addressing the vector field nature of AMF and is therefore susceptible to sensor orientation variations. Moreover, although the higher spatial variability of AMF in comparison with indoor geomagnetism promotes location sensing resolution, extra SLAM algorithm designs are needed to achieve robustness to trajectory deviations from the constructed map. To address the above issues, we design a new triaxial AMF sensor and a new SLAM algorithm that constructs a 3D AMF intensity map regularized and augmented by a Gaussian process. The triaxial sensor's orientation estimation is free of the error accumulation problem faced by inertial sensing. From extensive evaluation in eight indoor environments, our AMF-based 3D SLAM achieves sub-1m to 3m median localization errors in spaces of up to 500 m2, sub-2° mean error in orientation sensing, and outperforms the SLAM systems based on Wi-Fi, geomagnetism, and uniaxial AMF by more than 30%.
36

Parane, Khyamling, B. M. Prabhu Prasad, and Basavaraj Talawar. "YaNoC: Yet Another Network-on-Chip Simulation Acceleration Engine Supporting Congestion-Aware Adaptive Routing Using FPGAs." Journal of Circuits, Systems and Computers 28, no. 12 (November 2019): 1950202. http://dx.doi.org/10.1142/s0218126619502025.

Abstract:
Many-core systems employ the Network on Chip (NoC) as the underlying communication architecture. To achieve an optimized design for an application under consideration, there is a need for a fast and flexible NoC simulator. This paper presents an FPGA-based NoC simulation acceleration framework supporting design space exploration of standard and custom NoC topologies considering a full set of microarchitectural parameters. The framework is capable of designing custom routing algorithms; various traffic patterns such as uniform random, transpose, bit complement and random permutation are supported. For conventional NoCs, the standard minimal routing algorithms are supported. For designing custom topologies, table-based routing has been implemented. A custom topology called the diagonal mesh has been evaluated using table-based routing and a novel shortest-path routing algorithm. A congestion-aware adaptive routing has been proposed to route the packets along the minimally congested path. The congestion-aware adaptive routing algorithm has negligible FPGA area overhead compared to the conventional XY routing. Employing the congestion-aware adaptive routing, network latency is reduced by 55% compared to the XY routing algorithm. The microarchitectural parameters such as buffer depth, traffic pattern and flit width have been varied to observe the effect on NoC behavior. For the mesh topology, the LUT and FF usage increases from 32.23% to 34.45% and from 12.62% to 15%, respectively, for a buffer depth of 4 as the flit width grows from 16 bits to 32 bits. Similar behavior has been observed for other configurations of buffer depth and flit width. The torus topology consumes 24% more resources than the mesh topology. The 56-node fat tree topology consumes 27% and 2.2% more FPGA resources than the mesh and torus topologies, respectively. The 56-node fat tree topology with buffer depths of 8 and 16 flits saturates at injection rates of 40% and 45%, respectively.
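A minimal congestion-aware route choice of the kind described above can be sketched as follows: at each hop the router considers the productive X and Y directions and forwards the flit to the neighbour whose buffer currently reports the lower occupancy. The congestion table passed in here is an assumed input for illustration, not part of the published RTL (Python):

def next_hop(cur, dst, congestion):
    # cur, dst: (x, y) mesh coordinates; congestion: {(node, neighbour): buffer occupancy}.
    (cx, cy), (dx, dy) = cur, dst
    options = []
    if dx != cx:
        options.append((cx + (1 if dx > cx else -1), cy))   # productive X direction
    if dy != cy:
        options.append((cx, cy + (1 if dy > cy else -1)))   # productive Y direction
    if not options:
        return cur                                           # already at the destination
    return min(options, key=lambda hop: congestion.get((cur, hop), 0))

Restricting the choice to productive directions keeps the route minimal, which is why such an adaptive router adds little area over plain XY routing while avoiding congested links.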
APA, Harvard, Vancouver, ISO, and other styles
37

Denoyelle, Nicolas, John Tramm, Kazutomo Yoshii, Swann Perarnau, and Pete Beckman. "NUMA-AWARE DATA MANAGEMENT FOR NEUTRON CROSS SECTION DATA IN CONTINUOUS ENERGY MONTE CARLO NEUTRON TRANSPORT SIMULATION." EPJ Web of Conferences 247 (2021): 04020. http://dx.doi.org/10.1051/epjconf/202124704020.

Full text
Abstract:
The calculation of macroscopic neutron cross-sections is a fundamental part of the continuous-energy Monte Carlo (MC) neutron transport algorithm. MC simulations of full nuclear reactor cores are computationally expensive, making high-accuracy simulations impractical for most routine reactor analysis tasks because of their long time to solution. Thus, preparing MC simulation algorithms for next-generation supercomputers is extremely important, as improvements in computational performance and efficiency will directly translate into improvements in achievable simulation accuracy. Due to the stochastic nature of the MC algorithm, cross-section data tables are accessed in a highly randomized manner, resulting in frequent cache misses and latency-bound memory accesses. Furthermore, contemporary and next-generation non-uniform memory access (NUMA) computer architectures, featuring very high latencies and less cache space per core, will exacerbate this behaviour. The absence of a topology-aware allocation strategy in existing high-performance computing (HPC) programming models is a major source of performance problems on NUMA systems. Thus, to improve performance of the MC simulation algorithm, we propose topology-aware data allocation strategies that allow full control over the location of data structures within a memory hierarchy. A new memory management library, known as AML, has recently been created to facilitate this mapping. To evaluate the usefulness of AML in the context of MC reactor simulations, we have converted two existing MC transport cross-section lookup “proxy-applications” (XSBench and RSBench) to utilize the AML allocation library. In this study, we use these proxy-applications to test several continuous-energy cross-section data lookup strategies (the nuclide grid, unionized grid, logarithmic hash grid, and multipole methods) with a number of AML allocation schemes on a variety of node architectures. We find that the AML library speeds up cross-section lookup performance by up to 2× on current-generation hardware (e.g., a dual-socket Skylake-based NUMA system) compared with naive allocation. These exciting results also show a path forward for efficient performance on next-generation exascale supercomputer designs that feature even more complex NUMA memory hierarchies.
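To make the data-access pattern concrete, here is an illustrative (not AML- or XSBench-accurate) sketch of the logarithmic hash-grid idea mentioned among the lookup strategies: energies are hashed into logarithmically spaced bins so that the subsequent binary search on a nuclide's energy grid only scans a short window.

```python
import bisect
import math

def build_hash_grid(energy_grid, n_bins):
    """For each log-spaced bin, record the lowest grid index that can fall in it."""
    e_min, e_max = energy_grid[0], energy_grid[-1]
    bounds = []
    for b in range(n_bins):
        e_lo = e_min * (e_max / e_min) ** (b / n_bins)
        bounds.append(bisect.bisect_left(energy_grid, e_lo))
    return bounds

def lookup(energy, energy_grid, hash_bounds):
    """Find the bracketing grid index, using the hash grid to narrow the search."""
    e_min, e_max = energy_grid[0], energy_grid[-1]
    n_bins = len(hash_bounds)
    b = min(int(n_bins * math.log(energy / e_min) / math.log(e_max / e_min)), n_bins - 1)
    lo = hash_bounds[b]
    hi = hash_bounds[b + 1] if b + 1 < n_bins else len(energy_grid)
    return lo + bisect.bisect_right(energy_grid[lo:hi], energy) - 1

grid = [1e-5 * 1.1 ** i for i in range(200)]   # toy nuclide energy grid (eV)
hb = build_hash_grid(grid, n_bins=32)
idx = lookup(3.0e-3, grid, hb)
assert grid[idx] <= 3.0e-3 <= grid[idx + 1]
```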
APA, Harvard, Vancouver, ISO, and other styles
38

Landmann, Christoph, and Rolf Kall. "Graphical Hardware Description as a High-Level Design Entry Method for FPGA-Based Data Acquisition Systems." Key Engineering Materials 613 (May 2014): 296–306. http://dx.doi.org/10.4028/www.scientific.net/kem.613.296.

Full text
Abstract:
Probably one of the most significant developments in the field of software-defined multifunction data acquisition systems and devices is the employment of FPGA (Field-Programmable Gate Array) technology, resulting in a tremendous digital processing potential close to the I/O pin. FPGA technology is based on reconfigurable semiconductor devices which can be employed as processing targets in heterogeneous computing architectures for a variety of data acquisition applications. They can primarily be characterized by generic properties, such as deterministic execution, inherent parallelism, fast processing speed and high availability, stability and reliability. Therefore FPGAs are particularly suitable for use in “intelligent” data acquisition applications that require either in-line digital signal co-processing or real-time system emulation in the fields of advanced control, protocol-aware communication, hardware-in-the-loop (HIL) as well as RF and wireless test. From the perspective of a domain expert, however, primarily focused on developing applications and algorithms, simple and intuitive design entry methods and tools are required that facilitate the FPGA configuration and design entry process. Traditional FPGA design entry methods and commercially available tools assume a comprehensive knowledge of hardware description languages (HDL), such as VHDL or Verilog®, and implement a process or function at register level. In contrast, graphical hardware description languages for FPGAs, such as the integrated development environment NI LabVIEW® with the FPGA module extension, abstract the design process by means of graphical objects, I/O nodes and interconnecting wires that represent the FPGA’s IP and implement processes, timing, I/O integration and data flow. This paper discusses the advantages of graphical system design for FPGAs over text-based alternatives and introduces interfaces for the integration of 3rd-party IP, all backed up by a detailed illustration of a COTS FPGA-based multifunction DAQ target compared to a traditional DAQ architecture.
APA, Harvard, Vancouver, ISO, and other styles
39

Annaz, Fawaz. "UAV Testbed Training Platform development using Panda3d." Industrial Robot: An International Journal 42, no. 5 (August 17, 2015): 450–56. http://dx.doi.org/10.1108/ir-01-2015-0017.

Full text
Abstract:
Purpose – The paper aims to report the development of an Unmanned Aerial Vehicle (UAV) Testbed Training Platform (TTP). The development enables users to safely fly and control the UAV in real time within a limited (yet unconstrained) virtually created environment. Thus, the paper introduces a hardware–virtual environment coupling concept, the use of the Panda3D gaming engine to develop the graphical user interface (GUI) and the 3D flying environment, as well as the interfacing electronics that enable tracking, monitoring and mapping of real-time movement onto the virtual domain and vice versa. Design/methodology/approach – The platform comprises a spring-shuttle assembly fixed to a heavy aluminium base. The spring supports a rotating platform (RP), which is intended to support UAVs. The RP yaw, pitch and roll are measured by an inertial measurement unit, its climb/descent is measured by a low-cost infrared proximity sensor and its rotation is measured by a rotary optical encoder. The hardware is coupled to a virtual environment (VE), which was developed using the Panda3D gaming engine. The VE includes a GUI to generate, edit, load and save real-life environments. Hardware manoeuvres are reflected into the VE. Findings – The prototype was proven effective in dynamically mapping and tracking the rotating platform movements in the virtual environment. This should not be confused with the hardware-in-the-loop approach, which requires the inclusion of a mathematical model of the hardware in a loop. The finding will provide future means of testing navigation and tracking algorithms. Research limitations/implications – The work is still new, and there is great room for improvement in many aspects. Here, this paper reports the concept and its technical implementation only. Practical implications – In the literature, various testbeds have been reported, and it is felt that there is still room to come up with a better design that enables UAVs to fly in safer and less constrained environments. This has many practical implications, particularly in testing control and navigation algorithms in hazardous fields. Social implications – The main social impact is to utilise the concept to develop systems that are capable of autonomous rescue-mission navigation in disaster zones. Originality/value – The authors are aware that various researchers have developed various testbeds with different degrees of freedom. Similarly, the authors are also aware that researchers have used game engines to simulate mobile robots, or sophisticated equipment (like the VICON Motion Capture System) to measure and perform complex manoeuvres. However, the cost of this kind of equipment is very high, autonomous movements are planned in restricted environments and tested systems are only autonomous in certain setups. In contrast, the idea of mapping the dynamics of an avatar flying object onto a 3D VE is novel. To improve productivity and rapid prototyping, this paper proposes the use of commercially available game engines, such as Panda3D, to create virtual environments.
APA, Harvard, Vancouver, ISO, and other styles
40

Srinath, B., Rajesh Verma, Abdulwasa Bakr Barnawi, Ramkumar Raja, Mohammed Abdul Muqeet, Neeraj Kumar Shukla, A. Ananthi Christy, C. Bharatiraja, and Josiah Lange Munda. "An Investigation of Clock Skew Using a Wirelength-Aware Floorplanning Process in the Pre-Placement Stages of MSV Layouts." Electronics 10, no. 22 (November 15, 2021): 2795. http://dx.doi.org/10.3390/electronics10222795.

Full text
Abstract:
Managing timing constraints has become an important factor in the physical design of multiple supply voltage (MSV) integrated circuits (ICs). Clock distribution and module scheduling are some of the conventional methods used to satisfy the timing constraints of a chip. In this paper, we propose a simulated annealing-based MSV floorplanning methodology for designing ICs within the timing budget. Additionally, we propose a modified SKB tree representation for floorplanning the modules in the design. Our algorithm finds the optimal dimensions and positions of the clocked modules in the design to reduce wirelength and satisfy the timing constraints. The proposed algorithm is applied to the IWLS 2005 benchmark circuits and considers power, wirelength, and timing as the optimization parameters. Simulation results were obtained with the Cadence Innovus digital implementation system at the 45 nm node. Our simulation results show that the proposed algorithm satisfies the timing constraints through a 30.6% reduction in wirelength.
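Since the abstract centres on simulated annealing over module placements with wirelength and timing in the cost, a heavily simplified sketch of that loop is given below; the cost model (half-perimeter wirelength plus a timing penalty) and the move set are illustrative assumptions, not the paper's SKB-tree formulation.

```python
import math
import random

def hpwl(placement, nets):
    """Half-perimeter wirelength over all nets; a net is a list of module names."""
    total = 0.0
    for net in nets:
        xs = [placement[m][0] for m in net]
        ys = [placement[m][1] for m in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def cost(placement, nets, timing_paths, budget):
    """Wirelength plus a penalty for timing paths whose length exceeds the budget."""
    wl = hpwl(placement, nets)
    penalty = sum(max(0.0, hpwl(placement, [p]) - budget) for p in timing_paths)
    return wl + 10.0 * penalty

def anneal(placement, nets, timing_paths, budget, t0=50.0, alpha=0.95, iters=2000):
    cur = dict(placement)
    best, best_c = dict(cur), cost(cur, nets, timing_paths, budget)
    t = t0
    for _ in range(iters):
        m = random.choice(list(cur))                       # perturb one module
        cand = dict(cur)
        cand[m] = (cur[m][0] + random.uniform(-1, 1),
                   cur[m][1] + random.uniform(-1, 1))
        dc = cost(cand, nets, timing_paths, budget) - cost(cur, nets, timing_paths, budget)
        if dc < 0 or random.random() < math.exp(-dc / t):  # Metropolis acceptance
            cur = cand
            c = cost(cur, nets, timing_paths, budget)
            if c < best_c:
                best, best_c = dict(cur), c
        t *= alpha
    return best, best_c

modules = {"a": (0, 0), "b": (5, 5), "c": (9, 1)}
nets = [["a", "b"], ["b", "c"], ["a", "c"]]
print(anneal(modules, nets, timing_paths=[["a", "c"]], budget=4.0))
```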
APA, Harvard, Vancouver, ISO, and other styles
41

Ren, Jiankang, Chunxiao Liu, Chi Lin, Ran Bi, Simeng Li, Zheng Wang, Yicheng Qian, Zhichao Zhao, and Guozhen Tan. "Protection Window Based Security-Aware Scheduling against Schedule-Based Attacks." ACM Transactions on Embedded Computing Systems 22, no. 5s (September 9, 2023): 1–22. http://dx.doi.org/10.1145/3609098.

Full text
Abstract:
With the widespread use of commercial off-the-shelf components and the drive towards connection with external environments, real-time systems are facing more and more security problems. In particular, real-time systems are vulnerable to schedule-based attacks because of their predictable and deterministic operation. In this paper, we present a security-aware real-time scheduling scheme to counteract schedule-based attacks by preventing untrusted tasks from executing during the attack effective window (AEW). In order to minimize the AEW untrusted coverage ratio for a system with uncertain AEW size, we introduce the protection window to characterize the limit of the system's protection capability imposed by the schedulability constraint. To increase the opportunity for priority inversion in security-aware scheduling, we design an online feasibility test method based on busy-interval analysis. In addition, to reduce the run-time overhead of the online feasibility test, we also propose an efficient online feasibility test method based on priority-inversion-budget analysis, which avoids online iterative calculation through an offline maximum-slack analysis. Owing to the protection window and the online feasibility test, our proposed approach can efficiently provide best-effort protection to mitigate schedule-based attack vulnerability while ensuring system schedulability. Experiments show the significant security capability improvement of our proposed approach over the state-of-the-art coverage-oriented scheduling algorithm.
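A toy sketch of the protection-window idea, i.e. deferring untrusted tasks that would otherwise execute inside the attack effective window, is shown below; the slack-based feasibility check is a stand-in for the paper's busy-interval and priority-inversion-budget analyses, and all task parameters are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    release: float
    deadline: float
    wcet: float
    trusted: bool

def can_defer(task, defer_until, now):
    """An untrusted task may be pushed past the protection window only if its
    remaining slack still covers its worst-case execution time."""
    start = max(defer_until, task.release, now)
    return start + task.wcet <= task.deadline

def schedule_next(ready, now, aew_end):
    """Prefer trusted tasks inside the attack effective window (AEW); run an
    untrusted task early only when deferring it would break its deadline."""
    trusted = [t for t in ready if t.trusted]
    untrusted = [t for t in ready if not t.trusted]
    if now < aew_end:
        if trusted:
            return min(trusted, key=lambda t: t.deadline)
        blocked = [t for t in untrusted if not can_defer(t, aew_end, now)]
        if blocked:                       # schedulability forces an exception
            return min(blocked, key=lambda t: t.deadline)
        return None                       # idle until the window closes
    return min(ready, key=lambda t: t.deadline) if ready else None

ready = [Task("victim_follower", 0, 20, 3, trusted=True),
         Task("untrusted_app", 0, 30, 4, trusted=False)]
print(schedule_next(ready, now=1.0, aew_end=6.0))
```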
APA, Harvard, Vancouver, ISO, and other styles
42

Das, Apangshu, Yallapragada C. Hareesh, and Sambhu Nath Pradhan. "NSGA-II Based Thermal-Aware Mixed Polarity Dual Reed–Muller Network Synthesis Using Parallel Tabular Technique." Journal of Circuits, Systems and Computers 29, no. 15 (July 2, 2020): 2020008. http://dx.doi.org/10.1142/s021812662020008x.

Full text
Abstract:
The proposed work presents an OR-XNOR-based thermal-aware synthesis approach that reduces peak temperature by eliminating local hotspots within a densely packed integrated circuit. The tremendous increase in packing density at nanometer-scale technology nodes leads to high power density, which generates high temperatures and creates hotspots. A non-exhaustive meta-heuristic, the non-dominated sorting genetic algorithm-II (NSGA-II), is employed to select suitable input polarities of the mixed polarity dual Reed–Muller (MPDRM) expansion function to reduce power density. A parallel tabular technique is used for input polarity conversion from Product-of-Sums (POS) to the MPDRM function. Without performance degradation, the proposed MPDRM approach shows more than 50% improvement in area and power savings and around 6% peak temperature reduction on the MCNC benchmark circuits compared with earlier work at the logic level. The algorithmically optimized circuit decompositions are implemented in the physical design domain using CADENCE INNOVUS and the HotSpot tool, and silicon area, power consumption, and absolute temperature are reported to validate the proposed technique.
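The NSGA-II selection at the heart of this method rests on Pareto dominance over the competing objectives (here, area, power, and peak temperature). The minimal sketch below shows only that dominance test and the extraction of the first non-dominated front; the polarity encoding and the tabular POS-to-MPDRM conversion are omitted, and the sample objective vectors are made up.

```python
def dominates(a, b):
    """True if solution a is no worse than b in every objective and better in one.

    Objectives are (area, power, peak_temperature), all to be minimised.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def first_front(population):
    """Return the non-dominated (Pareto-optimal) solutions of the population."""
    return [p for p in population
            if not any(dominates(q, p) for q in population if q is not p)]

# Hypothetical (area, power, peak temperature) triples for candidate polarities.
candidates = [(120, 4.1, 78.0), (150, 3.2, 74.5), (118, 4.3, 80.2), (160, 3.9, 76.0)]
print(first_front(candidates))
```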
APA, Harvard, Vancouver, ISO, and other styles
43

Ye, Yunfei, Ning Wu, Xiaoqiang Zhang, Liling Dong, and Fang Zhou. "An Optimized Design for Compact Masked AES S-Box Based on Composite Field and Common Subexpression Elimination Algorithm." Journal of Circuits, Systems and Computers 27, no. 11 (June 6, 2018): 1850171. http://dx.doi.org/10.1142/s0218126618501712.

Full text
Abstract:
As the only nonlinear operation, the masked S-box is the core component for resisting differential power analysis (DPA) in Advanced Encryption Standard (AES) cipher chips. To suit resource-constrained applications, a compact masked S-box based on composite fields is proposed in this paper. Firstly, the architecture of the masked S-box is designed with a composite-field masking method. Secondly, four masked S-boxes based on GF ((2[Formula: see text], built from four basis methods with the optimal coefficient and the corresponding optimal root, are implemented and optimized by the delay-aware common subexpression elimination (DACSE) algorithm. Finally, experimental results show that, while maintaining DPA resistance, our best masked S-box achieves better area performance and the fastest speed compared with existing works. Therefore, our masked S-box is suitable for resource-constrained applications with fast-speed requirements.
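The principle behind a masked S-box, independent of the composite-field optimisations discussed above, can be illustrated by table recomputation: a fresh table S' is built so that S'(x XOR m_in) = S(x) XOR m_out, keeping intermediate values decorrelated from the secret. The sketch uses a toy 4-bit S-box rather than the AES one, so it reflects the masking idea only, not the paper's GF-based construction.

```python
import secrets

# Toy 4-bit S-box (an arbitrary permutation of 0..15, not the AES S-box).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def masked_sbox(m_in, m_out):
    """Recompute the table so that looking it up with a masked input yields a
    masked output: S'(x ^ m_in) = S(x) ^ m_out."""
    table = [0] * 16
    for x in range(16):
        table[x ^ m_in] = SBOX[x] ^ m_out
    return table

m_in, m_out = secrets.randbelow(16), secrets.randbelow(16)
mtab = masked_sbox(m_in, m_out)
x = 0x7
assert mtab[x ^ m_in] ^ m_out == SBOX[x]   # unmasking recovers the true S-box output
```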
APA, Harvard, Vancouver, ISO, and other styles
44

Lee, Kyu-Bae, Jina Park, Eunjin Choi, Mingi Jeon, and Woojoo Lee. "Developing a TEI-Aware PMIC for Ultra-Low-Power System-on-Chips." Energies 15, no. 18 (September 16, 2022): 6780. http://dx.doi.org/10.3390/en15186780.

Full text
Abstract:
As the demand for ultra-low-power (ULP) devices has increased tremendously, system-on-chip (SoC) designs based on ultra-low-voltage (ULV) operation have been receiving great attention. Moreover, research has shown the remarkable potential for even more power savings in ULV SoCs by exploiting the temperature effect inversion (TEI) phenomenon, i.e., the delay of ULV SoCs decreases with increasing temperature. However, TEI-aware low-power (TEI-LP) techniques have a critical practical limitation in that dedicated power management integrated circuits (PMICs) have not yet been developed. In other words, it is essential to develop PMICs that automatically bring out the full potential of TEI-LP techniques as the chip temperature changes. With the aim of designing such PMICs, this paper first conducts a study to find the most suitable DC-DC converter for the PMIC and then develops a control algorithm to maximize the effectiveness of the TEI-LP techniques. Furthermore, we develop a compact hardware controller so that the algorithm operates in the most energy-efficient manner on ULP SoCs.
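To illustrate the control problem such a PMIC must solve, the sketch below exploits the TEI property described in the abstract (at ultra-low voltage, delay drops as temperature rises) by picking, at each temperature reading, the lowest supply voltage whose estimated delay still meets the clock period. The voltage/delay table and the scaling factor are invented for illustration; they are not the paper's measured characteristics.

```python
# Hypothetical critical-path delay (ns) of a ULV SoC at a 25 °C reference,
# indexed by supply voltage (V).
DELAY_AT_25C = {0.40: 95.0, 0.45: 70.0, 0.50: 52.0, 0.55: 40.0}

def delay_estimate(vdd, temp_c, tei_coeff=0.004):
    """Under TEI, delay shrinks roughly linearly as temperature rises (assumed model)."""
    return DELAY_AT_25C[vdd] * (1.0 - tei_coeff * (temp_c - 25.0))

def select_vdd(temp_c, clock_period_ns):
    """Lowest voltage that still meets timing at the current temperature."""
    for vdd in sorted(DELAY_AT_25C):
        if delay_estimate(vdd, temp_c) <= clock_period_ns:
            return vdd
    return max(DELAY_AT_25C)        # fall back to the highest available voltage

# As the chip heats up, the controller can drop the rail and save power.
for t in (25, 45, 65, 85):
    print(t, "°C ->", select_vdd(t, clock_period_ns=60.0))
```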
APA, Harvard, Vancouver, ISO, and other styles
45

LIM, PILOK, KI-SEOK CHUNG, and TAEWHAN KIM. "THERMAL-AWARE HIGH-LEVEL SYNTHESIS BASED ON NETWORK FLOW METHOD." Journal of Circuits, Systems and Computers 18, no. 05 (August 2009): 965–84. http://dx.doi.org/10.1142/s0218126609005472.

Full text
Abstract:
Controlling the chip temperature is becoming one of the important design considerations, since temperature adversely and seriously affects many design qualities, such as reliability, performance, and power of the chip, and increases packaging cost. In this work, we address a new problem of thermal-aware functional module binding in high-level synthesis, in which the objective is to minimize the peak temperature of the chip. The two key contributions are (1) solving the binding problem with the primary objective of minimizing the "peak" switched capacitance of modules and the secondary objective of minimizing the "total" switched capacitance of modules, and (2) controlling the switched capacitances with respect to the floorplan of modules so as to minimize the "peak" heat diffusion between modules. For (1), our proposed thermal-aware binding algorithm, called TA-b, formulates the thermal-aware binding problem as a problem of repeated application of a network flow method and solves it effectively. For (2), TA-b is extended into TA-bf, which takes into account the floorplan information of the functional modules, if available, to be practically effective. Through experiments on a set of benchmarks, it is shown that TA-bf achieves 10.1°C and 11.8°C lower peak temperature on average, compared with the conventional low-power and thermal-aware methods, which target minimizing only the total switched capacitance (Ref. 20) and only the peak switched capacitance (Ref. 16), respectively. Additionally, we confirm from the experiments that the reduced peak temperature saves leakage power significantly, implying that controlling chip temperature is critically important for reducing leakage current as well.
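A drastically simplified stand-in for the binding step is sketched below: operations are assigned to functional modules so that the largest per-module switched capacitance (the quantity TA-b minimizes as a proxy for peak temperature) grows as little as possible. The paper formulates this with repeated network-flow solves; the greedy loop and the numbers here are only illustrative.

```python
def bind_operations(op_caps, modules):
    """Assign each operation's switched capacitance to a module, greedily keeping
    the peak per-module total (a proxy for the hottest module) low."""
    load = {m: 0.0 for m in modules}
    binding = {}
    for op, cap in sorted(op_caps.items(), key=lambda kv: -kv[1]):
        m = min(load, key=load.get)          # currently "coolest" module
        binding[op] = m
        load[m] += cap
    return binding, max(load.values())

# Hypothetical switched capacitances of operations (arbitrary units) and two ALUs.
ops = {"add1": 3.0, "add2": 2.5, "mul1": 5.0, "mul2": 4.0, "add3": 1.0}
binding, peak = bind_operations(ops, ["ALU0", "ALU1"])
print(binding, "peak switched capacitance:", peak)
```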
APA, Harvard, Vancouver, ISO, and other styles
46

Chaudhry, M. A. R., Z. Asad, A. Sprintson, and J. Hu. "Efficient Congestion Mitigation Using Congestion-Aware Steiner Trees and Network Coding Topologies." VLSI Design 2011 (April 28, 2011): 1–9. http://dx.doi.org/10.1155/2011/892310.

Full text
Abstract:
With the advent of smaller devices, a significant increase in the density of on-chip components has made congestion and overflow critical issues in VLSI physical design automation. In this paper, we present novel techniques for reducing congestion and minimizing overflows. Our methods are based on ripping up nets that go through congested areas and replacing them with congestion-aware topologies. Our contributions can be summarized as follows. First, we present several efficient algorithms for finding congestion-aware Steiner trees, that is, trees that avoid congested areas of the chip. Next, we show that the novel technique of network coding can lead to further improvements in routability, reduction of congestion, and overflow avoidance. Finally, we present an algorithm for identifying efficient congestion-aware network coding topologies. We evaluate the performance of the proposed algorithms through extensive simulations.
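A compact way to picture congestion-aware tree construction is to route each two-pin connection with a shortest-path search whose edge costs grow with routing-cell usage, then rip up and reroute nets that still cross hot cells. The grid, cost function, and Dijkstra routine below are generic illustrations, not the paper's Steiner or network-coding algorithms.

```python
import heapq

def route(grid_usage, capacity, src, dst):
    """Dijkstra over a routing grid; congested cells cost more to enter."""
    rows, cols = len(grid_usage), len(grid_usage[0])

    def cell_cost(r, c):
        overflow = max(0, grid_usage[r][c] + 1 - capacity)
        return 1 + 10 * overflow          # heavy penalty once capacity is exceeded

    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == dst:
            break
        if d > dist.get((r, c), float("inf")):
            continue
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cell_cost(nr, nc)
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    path, node = [], dst
    while node != src:
        path.append(node)
        node = prev[node]
    path.append(src)
    return path[::-1]

usage = [[0, 0, 0, 0],
         [0, 3, 3, 0],      # a congested region the route should skirt around
         [0, 0, 0, 0]]
print(route(usage, capacity=2, src=(1, 0), dst=(1, 3)))
```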
APA, Harvard, Vancouver, ISO, and other styles
47

G., Muneeswari, Ahilan A., Rajeshwari R, Kannan K., and John Clement Singh C. "Trust And Energy-Aware Routing Protocol for Wireless Sensor Networks Based on Secure Routing." International journal of electrical and computer engineering systems 14, no. 9 (November 14, 2023): 1015–22. http://dx.doi.org/10.32985/ijeces.14.9.6.

Full text
Abstract:
A Wireless Sensor Network (WSN) comprises a large number of nodes with wireless transmission capability. WSNs are frequently employed for vital applications in which security and dependability are of utmost concern. The main objective of the proposed method is to design a WSN that maximizes network longevity while minimizing power usage. In a WSN, trust management is employed to encourage node collaboration, which is crucial for achieving dependable transmission. In this research, a novel Trust and Energy Aware Routing Protocol (TEARP) for wireless sensor networks is proposed, which uses blockchain technology to maintain the identities of the Sensor Nodes (SNs) and Aggregator Nodes (ANs). The proposed TEARP technique provides a thorough trust value for nodes based on their direct trust values, while filtering mechanisms generate the indirect trust values. Further, an enhanced threshold technique is employed to identify the most appropriate cluster heads based on dynamic changes in the comprehensive trust values and residual energy of the network. Lastly, routing among cluster heads is secured using the Sand Cat Swarm Optimization Algorithm (SCSOA). The proposed method has been evaluated using parameters such as Network Lifetime, Residual Energy, Throughput, Packet Delivery Ratio, and Detection Accuracy. The proposed TEARP method improves the network lifetime by 39.64%, 33.05%, and 27.16% compared with Energy-efficient and Secure Routing (ESR), the Multi-Objective nature-inspired algorithm based on the Shuffled Frog-Leaping Algorithm and Firefly Algorithm (MOSFA), and the Optimal Support Vector Machine (OSVM), respectively.
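One common way to realise the combination of direct and (filtered) indirect trust described above is a beta-reputation style estimate; the weights, deviation filter, and sample observations below are illustrative guesses rather than TEARP's exact formulas.

```python
def direct_trust(successes, failures):
    """Beta-reputation estimate from a node's own forwarding observations."""
    return (successes + 1) / (successes + failures + 2)

def indirect_trust(recommendations, own_estimate, max_deviation=0.3):
    """Average neighbour recommendations, filtering out outliers that deviate
    too far from our own estimate (a simple bad-mouthing filter)."""
    kept = [r for r in recommendations if abs(r - own_estimate) <= max_deviation]
    return sum(kept) / len(kept) if kept else own_estimate

def comprehensive_trust(successes, failures, recommendations, w_direct=0.7):
    d = direct_trust(successes, failures)
    i = indirect_trust(recommendations, d)
    return w_direct * d + (1 - w_direct) * i

# Node observed 18 good / 2 bad forwards; one neighbour tries to bad-mouth it.
print(comprehensive_trust(18, 2, recommendations=[0.9, 0.85, 0.1]))
```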
APA, Harvard, Vancouver, ISO, and other styles
48

Ciuffoletti, Augusto. "Power-Aware Synchronization of a Software Defined Clock." Journal of Sensor and Actuator Networks 8, no. 1 (January 18, 2019): 11. http://dx.doi.org/10.3390/jsan8010011.

Full text
Abstract:
In a distributed system, a common time reference allows each component to associate the same timestamp with events that occur simultaneously. It is a design option with benefits and drawbacks, since it simplifies and makes more efficient a number of functions but requires additional resources and control to keep component clocks synchronized. In this paper, we quantify how much power is spent to implement such a function, which helps to resolve this dilemma in a system of low-power sensors. To obtain widely applicable results, the formal model used in our investigation is agnostic of the communication pattern that components use to synchronize their clocks, and focuses on the scheduling of the clock synchronization operations needed to correct clock drift. This model helps us discover that dynamic calibration of clock drift significantly reduces power consumption. We derive an optimal algorithm to keep a software defined clock (SDCk) synchronized with the reference, and we find that its effectiveness is strongly influenced by hardware clock quality. To demonstrate the soundness of the formal statements, we introduce a proof of concept. For its implementation, we privilege low-cost components and standard protocols, and we use it to find that the power needed to keep a clock within 200 ms of UTC (Coordinated Universal Time) is on the order of 10⁻⁵ W. The prototype is fully documented and reproducible.
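The power argument can be made concrete with a back-of-the-envelope model: if synchronisation costs a fixed energy per exchange and the clock drifts at a bounded rate, the error bound dictates how often to resynchronise, and calibrating out the systematic part of the drift stretches that period. All numbers below (drift rates, per-sync energy) are assumed for illustration, not taken from the paper.

```python
def resync_period(error_bound_s, drift_rate):
    """Longest interval between synchronisations that keeps |offset| within bound."""
    return error_bound_s / drift_rate

def average_sync_power(energy_per_sync_j, period_s):
    return energy_per_sync_j / period_s

ERROR_BOUND = 0.2          # keep the software-defined clock within 200 ms of UTC
E_SYNC = 0.05              # assumed energy of one synchronisation exchange (J)

raw_drift = 50e-6          # 50 ppm crystal, uncalibrated
calibrated_drift = 2e-6    # residual drift after estimating the systematic part

for label, drift in (("uncalibrated", raw_drift), ("calibrated", calibrated_drift)):
    period = resync_period(ERROR_BOUND, drift)
    print(f"{label}: resync every {period:.0f} s, "
          f"~{average_sync_power(E_SYNC, period):.2e} W")
```

With these assumed figures the uncalibrated case lands around 10⁻⁵ W, the same order of magnitude the abstract reports, while dynamic calibration lengthens the resync period and cuts the average power further.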
APA, Harvard, Vancouver, ISO, and other styles
49

Tan, Junyan, and Chunhua Cai. "An Efficient Partitioning Algorithm Based on Hypergraph for 3D Network-On-Chip Architecture Floorplanning." Journal of Circuits, Systems and Computers 28, no. 05 (May 2019): 1950075. http://dx.doi.org/10.1142/s0218126619500750.

Full text
Abstract:
Network-on-Chip (NoC) provides a scalable and fast interconnect for communication between the different IP cores in a System-on-Chip (SoC). With the growing complexity of consumer embedded systems, emerging SoC architectures integrate more and more components for different signal processing tasks. The two-dimensional Network-on-Chip (2D NoC) becomes a bottleneck for the development of such SoC architectures because of its chip-area limitations and long latency. Consequently, SoC research is focusing on the exploration of three-dimensional (3D) technology for developing the next generation of large SoCs, which integrate a three-dimensional Network-on-Chip (3D NoC) as the communication architecture. 3D design technology resolves the vertical inter-layer connection issue with Through-Silicon Vias (TSVs). However, TSVs occupy significant silicon area, which limits the inter-layer links of the 3D NoC. Therefore, task partitioning on a 3D NoC must be judicious in large SoC designs. In this paper, we propose an efficient layer-aware partitioning algorithm based on hypergraphs (named ELAP-NoC) for task partitioning with TSV minimization in 3D NoC architecture floorplanning. ELAP-NoC contains a divergence stage and a convergence stage. In the divergence stage, ELAP-NoC first applies multi-way min-cut partitioning to gradually divide a given design layer by layer and obtain an initial solution; this solution is then refined in the convergence stage. The experiments show that ELAP-NoC achieves better partitioning capability across different numbers of cores, providing the first step of 3D NoC floorplanning.
APA, Harvard, Vancouver, ISO, and other styles
50

Yoon, Hyejung, Kyungwoon Cho, and Hyokyung Bahn. "Storage Type and Hot Partition Aware Page Reclamation for NVM Swap in Smartphones." Electronics 11, no. 3 (January 27, 2022): 386. http://dx.doi.org/10.3390/electronics11030386.

Full text
Abstract:
With the rapid advances in mobile app technologies, new activities using smartphones emerge every day, including social networking and location-based services. However, smartphones have problems handling high-priority tasks and often close apps without the user's agreement when no memory space is available. To cope with this situation, supporting swap with fast NVM storage has been suggested. Although swap in smartphones incurs serious slowdowns in I/O operations while saving and restoring app context, NVM has been shown to resolve this problem due to its fast I/O features. Unlike previous studies that only focused on the management of the NVM swap itself, this article discusses how the memory management system of smartphones can be further improved with NVM swap. Specifically, we design a new page reclamation algorithm for smartphone memory systems, which considers the following: (1) the storage type of each partition (i.e., file system on flash storage and swap on NVM), and (2) the access hotness of each partition, including operation types and workload characteristics. By considering the asymmetric I/O cost and access density of each partition, our algorithm improves the I/O performance of smartphones significantly. Specifically, it reduces I/O time by 15.0% on average and by up to 35.1% compared to the well-known CLOCK algorithm.
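A rough sketch of the reclamation policy's core decision follows: each partition (file-backed pages on flash versus anonymous pages swapped to NVM) is scored by its reclamation I/O cost weighted by how hot it is, and pages are reclaimed from the cheaper, colder partition first. The cost figures and the access-density metric are illustrative stand-ins, not the paper's calibrated parameters.

```python
from dataclasses import dataclass

@dataclass
class Partition:
    name: str
    evict_cost_us: float     # per-page cost to evict (write-back if dirty)
    refault_cost_us: float   # per-page cost to bring the page back later
    access_density: float    # recent references per resident page (hotness)

def reclaim_score(p):
    """Lower is better: cheap to evict and refault, and rarely accessed."""
    return (p.evict_cost_us + p.refault_cost_us) * p.access_density

def pick_victim(partitions):
    return min(partitions, key=reclaim_score)

partitions = [
    Partition("file_cache_on_flash", evict_cost_us=900, refault_cost_us=120,
              access_density=0.05),
    Partition("anon_swap_on_nvm", evict_cost_us=40, refault_cost_us=35,
              access_density=0.30),
]
print("reclaim from:", pick_victim(partitions).name)
```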
APA, Harvard, Vancouver, ISO, and other styles