Selection of scientific literature on the topic "Hardware-Aware Algorithm design"

Cite a source in APA, MLA, Chicago, Harvard, and other citation styles

Select a type of source:

Consult the lists of current articles, books, theses, reports, and other scholarly sources on the topic "Hardware-Aware Algorithm design".

Next to every entry in the bibliography there is an "Add to bibliography" option. Use it, and your citation of the chosen work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).

You can also download the full text of the publication in PDF format and read its online annotation, provided the relevant parameters are available in the metadata.

Journal articles on the topic "Hardware-Aware Algorithm design"

1

An, Jianjing, Dezheng Zhang, Ke Xu, and Dong Wang. "An OpenCL-Based FPGA Accelerator for Faster R-CNN". Entropy 24, no. 10 (September 23, 2022): 1346. http://dx.doi.org/10.3390/e24101346.

Abstract:
In recent years, convolutional neural network (CNN)-based object detection algorithms have made breakthroughs, and much of the related research concerns hardware accelerator designs. Although many previous works have proposed efficient FPGA designs for one-stage detectors such as YOLO, there are still few accelerator designs for Faster Regions with CNN features (Faster R-CNN) algorithms. Moreover, the inherently high computational and memory complexity of CNNs makes designing efficient accelerators challenging. This paper proposes a software-hardware co-design scheme based on OpenCL to implement a Faster R-CNN object detection algorithm on FPGA. First, we design an efficient, deeply pipelined FPGA hardware accelerator that can implement Faster R-CNN algorithms for different backbone networks. Then, an optimized hardware-aware software algorithm is proposed, including fixed-point quantization, layer fusion, and a multi-batch regions-of-interest (RoI) detector. Finally, we present an end-to-end design space exploration scheme to comprehensively evaluate the performance and resource utilization of the proposed accelerator. Experimental results show that the proposed design achieves a peak throughput of 846.9 GOP/s at a working frequency of 172 MHz. Compared with a state-of-the-art Faster R-CNN accelerator and a one-stage YOLO accelerator, our method achieves 10× and 2.1× inference throughput improvements, respectively.
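The fixed-point quantization step mentioned in this abstract can be illustrated with a minimal sketch; the word length and fraction width below are hypothetical choices for illustration, not the paper's actual number format:

```python
def quantize_fixed_point(x, word_bits=8, frac_bits=6):
    """Round a float to signed fixed-point (word_bits total bits,
    frac_bits fractional) and return the dequantized value,
    saturating on overflow."""
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))      # most negative representable code
    hi = (1 << (word_bits - 1)) - 1   # most positive representable code
    code = max(lo, min(hi, round(x * scale)))
    return code / scale
```

Values on the grid pass through exactly (e.g. 0.5), while out-of-range inputs saturate to the largest representable magnitude.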
2

Vo, Quang Hieu, Faaiz Asim, Batyrbek Alimkhanuly, Seunghyun Lee, and Lokwon Kim. "Hardware Platform-Aware Binarized Neural Network Model Optimization". Applied Sciences 12, no. 3 (January 26, 2022): 1296. http://dx.doi.org/10.3390/app12031296.

Abstract:
Deep Neural Networks (DNNs) have shown superior accuracy at the expense of high memory and computation requirements. Optimizing DNN models regarding energy and hardware resource requirements is extremely important for applications with resource-constrained embedded environments. Although using binary neural networks (BNNs), one of the recent promising approaches, significantly reduces the design’s complexity, accuracy degradation is inevitable when reducing the precision of parameters and output activations. To balance between implementation cost and accuracy, in addition to proposing specialized hardware accelerators for corresponding specific network models, most recent software binary neural networks have been optimized based on generalized metrics, such as FLOPs or MAC operation requirements. However, with the wide range of hardware available today, independently evaluating software network structures is not good enough to determine the final network model for typical devices. In this paper, an architecture search algorithm based on estimating the hardware performance at the design time is proposed to achieve the best binary neural network models for hardware implementation on target platforms. With the XNOR-net used as a base architecture and target platforms, including Field Programmable Gate Array (FPGA), Graphic Processing Unit (GPU), and Resistive Random Access Memory (RRAM), the proposed algorithm shows its efficiency by giving more accurate estimation for the hardware performance at the design time than FLOPs or MAC operations.
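The primitive such hardware-cost models reason about is the XNOR-popcount dot product that replaces multiply-accumulates in binary networks; a small sketch (the bit-packing convention is ours, not the paper's):

```python
def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1,+1} vectors packed as bitmasks
    (bit 1 encodes +1, bit 0 encodes -1). Mismatching positions each
    contribute -1 and matches +1, so the result is n - 2 * popcount(a XOR b)."""
    mismatches = bin((a_bits ^ b_bits) & ((1 << n) - 1)).count("1")
    return n - 2 * mismatches
```

On hardware this is one XNOR gate array plus a popcount tree per output, which is why FLOP counts alone misestimate the cost on FPGA, GPU, and RRAM targets.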
3

Fung, Wing On, and Tughrul Arslan. "A power-aware algorithm for the design of reconfigurable hardware during high level placement". International Journal of Knowledge-based and Intelligent Engineering Systems 12, no. 3 (October 21, 2008): 237–44. http://dx.doi.org/10.3233/kes-2008-12306.

4

Petschenig, Horst, and Robert Legenstein. "Quantized rewiring: hardware-aware training of sparse deep neural networks". Neuromorphic Computing and Engineering 3, no. 2 (May 26, 2023): 024006. http://dx.doi.org/10.1088/2634-4386/accd8f.

Abstract:
Mixed-signal and fully digital neuromorphic systems have been of significant interest for deploying spiking neural networks in an energy-efficient manner. However, many of these systems impose constraints in terms of fan-in, memory, or synaptic weight precision that have to be considered during network design and training. In this paper, we present quantized rewiring (Q-rewiring), an algorithm that can train both spiking and non-spiking neural networks while meeting hardware constraints during the entire training process. To demonstrate our approach, we train both feedforward and recurrent neural networks with a combined fan-in/weight precision limit, a constraint that is, for example, present in the DYNAP-SE mixed-signal analog/digital neuromorphic processor. Q-rewiring simultaneously performs quantization and rewiring of synapses and synaptic weights through gradient descent updates and projecting the trainable parameters to a constraint-compliant region. Using our algorithm, we find trade-offs between the number of incoming connections to neurons and network performance for a number of common benchmark datasets.
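The projection step described above, mapping weights onto a constraint-compliant region, can be sketched for a single neuron as follows; the quantization grid step and fan-in limit are illustrative values, and the real algorithm applies this projection inside gradient-descent training:

```python
def project_to_constraints(weights, fan_in_limit, step=0.125):
    """Project one neuron's incoming weights onto the constraint set:
    every weight on a fixed quantization grid, and at most fan_in_limit
    nonzero connections (the smallest-magnitude ones are pruned,
    i.e. 'rewired' away)."""
    q = [round(w / step) * step for w in weights]
    keep = set(sorted(range(len(q)), key=lambda i: abs(q[i]),
                      reverse=True)[:fan_in_limit])
    return [q[i] if i in keep else 0.0 for i in range(len(q))]
```

After each gradient update the trainable parameters are snapped back to this set, so the network satisfies the hardware limits at every point during training, not only at the end.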
5

Sinha, Sharad, Udit Dhawan, and Thambipillai Srikanthan. "Extended Compatibility Path Based Hardware Binding: An Adaptive Algorithm for High Level Synthesis of Area-Time Efficient Designs". Journal of Circuits, Systems and Computers 23, no. 09 (August 25, 2014): 1450131. http://dx.doi.org/10.1142/s021812661450131x.

Abstract:
Hardware binding is an important step in high level synthesis (HLS), and its quality affects the area-time efficiency of a design. The goal of a synthesis process is to produce a design which meets the area-time requirements. In this paper, we present a new hardware binding algorithm with a focus on area reduction. Called extended compatibility path-based (ECPB) hardware binding, it extends the compatibility path-based (CPB) hardware binding method by exploiting inter-operation flow dependencies and non-overlapping lifetimes of variables, and by modifying the weight relation to make it application aware and thus adaptive in nature. The presented methodology also takes into account the bit width of functional units (FUs) and multi-mode FUs, and performs simultaneous FU and register binding. Implemented within a C to register transfer level (RTL) framework, it produces binding results better than those produced by weighted bipartite matching (WBM) and CPB algorithms. The use of the ECPB algorithm results in an average reduction of 34% and 17.44% in area-time product over the WBM and CPB methods, respectively.
6

Gan, Jiayan, Ang Hu, Ziyi Kang, Zhipeng Qu, Zhanxiang Yang, Rui Yang, Yibing Wang, Huaizong Shao, and Jun Zhou. "SAS-SEINet: A SNR-Aware Adaptive Scalable SEI Neural Network Accelerator Using Algorithm–Hardware Co-Design for High-Accuracy and Power-Efficient UAV Surveillance". Sensors 22, no. 17 (August 30, 2022): 6532. http://dx.doi.org/10.3390/s22176532.

Abstract:
As a potential air control measure, RF-based surveillance is one of the most commonly used unmanned aerial vehicle (UAV) surveillance methods; it exploits specific emitter identification (SEI) technology to identify captured RF signals sent from ground controllers to UAVs. Recently, many SEI algorithms based on deep convolutional neural networks (DCNNs) have emerged, but dedicated hardware implementations are still lacking. This paper proposes a high-accuracy and power-efficient hardware accelerator using algorithm–hardware co-design for UAV surveillance. For the algorithm, we propose a scalable SEI neural network with SNR-aware adaptive precision computation. With SNR awareness and precision reconfiguration, it adaptively switches between a DCNN and a binary DCNN to cope with low-SNR and high-SNR tasks, respectively. In addition, a short-time Fourier transform (STFT) reusing the DCNN is proposed to pre-extract features of the UAV signal. For the hardware, we designed an SNR sensing engine, a denoising engine, and a specialized DCNN engine with hybrid-precision convolution and memory access, aiming at SEI acceleration. Finally, we validate the effectiveness of our design on an FPGA, using a public UAV dataset. Compared with a state-of-the-art algorithm, our method achieves the highest accuracy of 99.3% and an F1 score of 99.3%. Compared with other hardware designs, our accelerator achieves the highest power efficiency of 40.12 Gops/W and 96.52 Gops/W with INT16 precision and binary precision, respectively.
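The SNR-aware switching policy described above can be sketched as a simple decision rule; the power-ratio SNR estimate and the threshold are illustrative stand-ins for the paper's sensing engine, not its actual implementation:

```python
import math

def estimate_snr_db(signal, noise):
    """Estimate SNR in dB from the ratio of mean sample powers."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)

def select_engine(snr_db, threshold_db=10.0):
    """High-SNR inputs go to the cheap binary DCNN;
    low-SNR inputs fall back to the full-precision DCNN."""
    return "binary_dcnn" if snr_db >= threshold_db else "full_dcnn"
```

The accelerator benefits because the binary path handles the easy (clean-signal) majority of inputs at a fraction of the energy per inference.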
7

Zhang, Yue, Shuai Jiang, Yue Cao, Jiarong Xiao, Chengkun Li, Xuan Zhou, and Zhongjun Yu. "Hardware-Aware Design of Speed-Up Algorithms for Synthetic Aperture Radar Ship Target Detection Networks". Remote Sensing 15, no. 20 (October 17, 2023): 4995. http://dx.doi.org/10.3390/rs15204995.

Abstract:
Recently, synthetic aperture radar (SAR) target detection algorithms based on convolutional neural networks (CNNs) have received increasing attention. However, the large amount of computation required burdens real-time detection of SAR ship targets on resource-limited and power-constrained satellite-based platforms. In this paper, we propose a hardware-aware model speed-up method for single-stage SAR ship target detection tasks, oriented towards the most widely used hardware for neural network computing: the graphics processing unit (GPU). We first analyze how the detection task executes on GPUs and propose two strategies accordingly. First, to speed up execution of the model on a GPU, we propose SAR-aware model quantification, which allows the original model to be stored and computed in a low-precision format. Second, to ensure the loss of accuracy after acceleration and compression is negligible, precision-aware scheduling is used to filter out layers that are not suitable for quantification and to store and execute them in a high-precision mode. Trained on the HRSID dataset, the effectiveness of this model speed-up algorithm was demonstrated by compressing four differently sized models (yolov5n, yolov5s, yolov5m, yolov5l). The experimental results show that the detection speeds of yolov5n, yolov5s, yolov5m, and yolov5l reach 234.7785 fps, 212.8341 fps, 165.6523 fps, and 139.8758 fps on the NVIDIA AGX Xavier development board with negligible loss of accuracy, which is 1.230, 1.469, 1.955, and 2.448 times faster, respectively, than before the method was applied.
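The precision-aware scheduling idea, quantizing only the layers that tolerate it, reduces to a per-layer filter over measured quantization error; the layer names, error metric, and tolerance below are hypothetical:

```python
def schedule_precision(layer_errors, tolerance=0.01):
    """Assign INT8 to layers whose measured quantization error is within
    tolerance; keep the sensitive layers in a high-precision mode (FP16)."""
    return {layer: ("int8" if err <= tolerance else "fp16")
            for layer, err in layer_errors.items()}
```

In practice the per-layer error would be measured on a calibration set by comparing quantized and full-precision activations, then the schedule fed to the deployment runtime.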
8

Perleberg, Murilo, Vinicius Borges, Vladimir Afonso, Daniel Palomino, Luciano Agostini, and Marcelo Porto. "6WR: A Hardware Friendly 3D-HEVC DMM-1 Algorithm and its Energy-Aware and High-Throughput Design". IEEE Transactions on Circuits and Systems II: Express Briefs 67, no. 5 (May 2020): 836–40. http://dx.doi.org/10.1109/tcsii.2020.2983959.

9

Arif, Muhammad, Omar S. Sonbul, Muhammad Rashid, Mohsin Murad, and Mohammed H. Sinky. "A Unified Point Multiplication Architecture of Weierstrass, Edward and Huff Elliptic Curves on FPGA". Applied Sciences 13, no. 7 (March 25, 2023): 4194. http://dx.doi.org/10.3390/app13074194.

Abstract:
This article presents an area-aware unified hardware accelerator of Weierstrass, Edward, and Huff curves over GF(2^233) for the point multiplication step in elliptic curve cryptography (ECC). The target implementation platform is a field-programmable gate array (FPGA). In order to explore the design space between processing time and various protection levels, this work employs two different point multiplication algorithms. The first is the Montgomery point multiplication algorithm for the Weierstrass and Edward curves. The second is the Double and Add algorithm for the Binary Huff curve. The area complexity is reduced by efficiently replacing storage elements that result in a 1.93 times decrease in the size of the memory needed. An efficient Karatsuba modular multiplier hardware accelerator is implemented to compute polynomial multiplications. We utilized the square arithmetic unit after the Karatsuba multiplier to execute the quad-block variant of a modular inversion, which preserves lower hardware resources and also reduces clock cycles. Finally, to support three different curves, an efficient controller is implemented. Our unified architecture can operate at a maximum of 294 MHz and utilizes 7423 slices on Virtex-7 FPGA. It takes less computation time than most recent state-of-the-art implementations. Thus, combining different security curves (Weierstrass, Edward, and Huff) in a single design is practical for applications that demand different reliability/security levels.
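The Karatsuba multiplier at the heart of such a design works over GF(2)[x], where addition is XOR, so it is carry-less. A software sketch using Python integers as bit-packed polynomials (the real accelerator additionally reduces products modulo the field polynomial of GF(2^233), which is omitted here):

```python
def gf2_mul(a, b):
    """Multiply two GF(2)[x] polynomials packed as integer bitmasks,
    using Karatsuba recursion; addition in GF(2) is XOR."""
    n = max(a.bit_length(), b.bit_length())
    if n <= 8:
        # schoolbook carry-less multiply for small operands
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            b >>= 1
        return r
    h = n // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    z0 = gf2_mul(a0, b0)
    z2 = gf2_mul(a1, b1)
    # Karatsuba: the middle term reuses z0 and z2, saving one multiply
    z1 = gf2_mul(a0 ^ a1, b0 ^ b1) ^ z0 ^ z2
    return (z2 << (2 * h)) ^ (z1 << h) ^ z0
```

The three-multiplies-instead-of-four recursion is exactly what shrinks the multiplier's area on the FPGA relative to a schoolbook design.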
10

Hsu, Bay-Yuan, Chih-Ya Shen, Hao Shan Yuan, Wang-Chien Lee, and De-Nian Yang. "Social-Aware Group Display Configuration in VR Conference". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (March 24, 2024): 8517–25. http://dx.doi.org/10.1609/aaai.v38i8.28695.

Abstract:
Virtual Reality (VR) has emerged due to advancements in hardware and computer graphics. During the pandemic, conferences and exhibitions leveraging VR have gained attention. However, large-scale VR conferences face a significant problem not yet studied in the literature: displaying too many irrelevant users on the screen, which may negatively impact the user experience. To address this issue, we formulate a new research problem, Social-Aware VR Conference Group Display Configuration (SVGD). Accordingly, we design the Social Utility-Aware VR Conference Group Formation (SVC) algorithm, which is a 2-approximation algorithm for SVGD. SVC iteratively selects either the P-Configuration or S-Configuration based on their effective ratios, ensuring that in each iteration it identifies and chooses the solution with the highest current effectiveness. Experiments on real metaverse datasets show that the proposed SVC outperforms 11 baselines by 75% in terms of solution quality.

Dissertations on the topic "Hardware-Aware Algorithm design"

1

Seznec, Mickaël. "From the algorithm to the targets, optimization flow for high performance computing on embedded GPUs". Electronic thesis or dissertation, Université Paris-Saclay, 2021. http://www.theses.fr/2021UPASG074.

Abstract:
Current digital processing algorithms require more computing power to achieve more accurate results and process larger data. In the meantime, hardware architectures are becoming more specialized, with highly efficient accelerators designed for specific tasks. In this context, the path of deployment from the algorithm to the implementation becomes increasingly complex. It is, therefore, crucial to determine how algorithms can be modified to take advantage of new hardware capabilities. Our study focused on graphics processing units (GPUs), a type of massively parallel processor. Our algorithmic work was done in the context of radio astronomy and optical flow estimation and consisted of finding the best adaptation of the software to the hardware. At the level of a mathematical operator, we modified the traditional image convolution algorithm to use the matrix units and showed that its performance doubles for large convolution kernels. At a broader method level, we evaluated linear solvers for the combined local-global optical flow to find the most suitable one on GPU. With additional optimizations, such as iteration fusion or memory buffer re-utilization, the method is twice as fast as the initial implementation, running at 60 frames per second on an embedded platform (30 W). Finally, we also pointed out the interest of this hardware-aware algorithm design method in the context of deep neural networks, showing the hybridization of a convolutional neural network for optical flow estimation with a pre-trained image classification network, MobileNet, initially designed for efficient image classification on low-power platforms.
2

Wang, Ya-Ting (王雅婷). "Algorithm and Hardware Architecture Design of Perception-Aware Motion Compensated Frame Rate Up-Conversion". Thesis, 2010. http://ndltd.ncl.edu.tw/handle/50999594566970652643.

Abstract:
Master's thesis, National Taiwan University, Graduate Institute of Electronics Engineering, academic year 98 (2009/10).
Frame rate up-conversion (FRUC) is a technique for converting a video sequence from a lower frame rate to a higher one. It was originally widely used in video compression systems to reconstruct, at the decoder side, frames skipped by the encoder, and it is nowadays also applied in high-frame-rate LCD systems to reduce motion artifacts. Among motion blur reduction methods, motion-compensated frame interpolation (MCFI) yields the best interpolation results, since it takes motion information into consideration and does not decrease the overall brightness. However, its cost is high: estimating and compensating motion in an MCFI algorithm is computationally expensive and has high bandwidth and memory requirements. The target application of this work is an HDTV system using an LCD with a 120 Hz refresh rate, where cost-effective MCFI hardware is desirable; the frame rate, much higher than the sampling rate of the human eye, motivated us to seek cost reductions based on the perceptual characteristics of human vision. In this thesis, a psychophysical experiment was conducted to study the capability of humans to distinguish between motions displayed at 60 fps and 120 fps. The difference between 60 fps and 120 fps for motions with velocity under 3 °/sec and duration under 100 ms proved hard for human eyes to detect. Based on these results, a novel hardware-oriented perception-aware motion-compensated frame interpolation algorithm is proposed. For the VLSI hardware design, the target specification is a 1920x1080 frame size with a throughput of 60 interpolated frames per second. The hardware is implemented in Verilog-HDL and synthesized with the Synopsys Design Compiler using the Faraday 90 nm cell library. The total gate count is 212K.
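The core of motion-compensated interpolation is averaging along the halved motion trajectory; a deliberately simplified 1-D sketch under our own assumptions (the thesis operates on 2-D blocks with an estimated motion field, not a single global vector):

```python
def interpolate_midframe(prev, nxt, mv):
    """Interpolate the temporal midpoint of two 1-D 'frames' given a
    single motion vector mv (displacement in pixels from prev to nxt).
    Each output pixel averages the two pixels on its motion trajectory;
    indices are clamped at the frame borders."""
    n = len(prev)
    half = mv // 2
    clamp = lambda i: min(max(i, 0), n - 1)
    return [(prev[clamp(i - half)] + nxt[clamp(i + (mv - half))]) // 2
            for i in range(n)]
```

An object at position 2 in the previous frame and position 4 in the next (mv = 2) lands at position 3 in the interpolated frame, which is what removes the judder a simple frame-repeat would leave.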
3

Pedram, Ardavan. „Algorithm/architecture codesign of low power and high performance linear algebra compute fabrics“. 2013. http://hdl.handle.net/2152/21364.

Abstract:
In the past, we could rely on technology scaling and new micro-architectural techniques to improve the performance of processors. Nowadays, both of these methods are reaching their limits. The primary concern in future architectures with billions of transistors on a chip and limited power budgets is power/energy efficiency. Full-custom design of application-specific cores can yield up to two orders of magnitude better power efficiency over conventional general-purpose cores. However, a tremendous design effort is required in integrating a new accelerator for each new application. In this dissertation, we present the design of specialized compute fabrics that maintain the efficiency of full custom hardware while providing enough flexibility to execute a whole class of coarse-grain operations. The broad vision is to develop integrated and specialized hardware/software solutions that are co-optimized and co-designed across all layers ranging from the basic hardware foundations all the way to the application programming support through standard linear algebra libraries. We try to address these issues specifically in the context of dense linear algebra applications. In the process, we pursue the main questions that architects will face while designing such accelerators. How broad is this class of applications that the accelerator can support? What are the limiting factors that prevent utilization of these accelerators on the chip? What is the maximum achievable performance/efficiency? Answering these questions requires expertise and careful codesign of the algorithms and the architecture to select the best possible components, datapaths, and data movement patterns resulting in a more efficient hardware-software codesign. In some cases, codesign reduces complexities that are imposed on the algorithm side due to the initial limitations in the architectures. 
We design a specialized Linear Algebra Processor (LAP) architecture and discuss the details of mapping of matrix-matrix multiplication onto it. We further verify the flexibility of our design for computing a broad class of linear algebra kernels. We conclude that this architecture can perform a broad range of matrix-matrix operations as complex as matrix factorizations, and even Fast Fourier Transforms (FFTs), while maintaining its ASIC level efficiency. We present a power-performance model that compares state-of-the-art CPUs and GPUs with our design. Our power-performance model reveals sources of inefficiencies in CPUs and GPUs. We demonstrate how to overcome such inefficiencies in the process of designing our LAP. As we progress through this dissertation, we introduce modifications of the original matrix-matrix multiplication engine to facilitate the mapping of more complex operations. We observe the resulting performance and efficiencies on the modified engine using our power estimation methodology. When compared to other conventional architectures for linear algebra applications and FFT, our LAP is over an order of magnitude better in terms of power efficiency. Based on our estimations, up to 55 and 25 GFLOPS/W single- and double-precision efficiencies are achievable on a single chip in standard 45nm technology.
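The data-reuse principle behind such a linear algebra processor can be seen in software as blocked matrix multiplication, where the tile size plays the role of the on-chip memory hierarchy; the block size below is arbitrary, chosen only for illustration:

```python
def blocked_matmul(A, B, bs=2):
    """Blocked (tiled) matrix multiply C = A @ B. Each bs-by-bs tile of
    A, B, and C is reused across the inner loops before being evicted,
    which is what keeps an accelerator's datapath fed from local storage."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i0 in range(0, n, bs):
        for j0 in range(0, m, bs):
            for p0 in range(0, k, bs):
                for i in range(i0, min(i0 + bs, n)):
                    for j in range(j0, min(j0 + bs, m)):
                        acc = C[i][j]
                        for p in range(p0, min(p0 + bs, k)):
                            acc += A[i][p] * B[p][j]
                        C[i][j] = acc
    return C
```

The result is independent of the block size; only the memory traffic changes, which is the quantity the dissertation's power-performance model optimizes.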

Book chapters on the topic "Hardware-Aware Algorithm design"

1

Lin, Ji, Wei-Ming Chen, and Song Han. "Algorithm-System Co-design for Efficient and Hardware-Aware Embedded Machine Learning". In Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing, 349–70. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-39932-9_14.

2

Chien, Been-Chian, and Shiang-Yi He. "A Generic Context Interpreter for Pervasive Context-Aware Systems". In Mobile and Handheld Computing Solutions for Organizations and End-Users, 308–21. IGI Global, 2013. http://dx.doi.org/10.4018/978-1-4666-2785-7.ch017.

Abstract:
Developing pervasive context-aware systems to construct smart space applications has attracted much attention from researchers in recent decades. Although many different kinds of context-aware computing paradigms have been built in recent years, it is still a challenge for researchers to extend an existing system to different application domains and to interoperate with other service systems, due to heterogeneity among systems. This paper proposes a generic context interpreter to overcome the dependency between context and hardware devices. The proposed generic context interpreter contains two modules: the context interpreter generator and the generic interpreter. The context interpreter generator imports sensor data from sensor devices as an XML schema and produces interpretation scripts instead of interpretation widgets. The generic interpreter generates the semantic context for context-aware applications. A context editor is also designed, employing schema matching algorithms to support context mapping between devices and the context model.
3

Röhm, Uwe. „OLAP with a Database Cluster“. In Database Technologies, 829–46. IGI Global, 2009. http://dx.doi.org/10.4018/978-1-60566-058-5.ch047.

Abstract:
This chapter presents a new approach to online decision support systems that is scalable, fast, and capable of analysing up-to-date data. It is based on a database cluster: a cluster of commercial off-the-shelf computers as hardware infrastructure and off-the-shelf database management systems as transactional storage managers. We focus on central architectural issues and on the performance implications of such a cluster-based decision support system. In the first half, we present a scalable infrastructure and discuss physical data design alternatives for cluster-based online decision support systems. In the second half of the chapter, we discuss query routing algorithms and freshness-aware scheduling. This protocol enables users to seamlessly decide how fresh the data analysed should be by allowing for different degrees of freshness of the online analytical processing (OLAP) nodes. In particular, it then becomes possible to trade freshness of data for query performance.
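The freshness-aware scheduling described above can be sketched as a routing rule; the replica fields and the scalar freshness measure are illustrative, not the chapter's actual protocol:

```python
def route_query(replicas, required_freshness):
    """Send an OLAP query to the least-loaded replica whose data is
    fresh enough (freshness in [0, 1], 1.0 = fully up to date).
    Returns the replica name, or None if no replica currently qualifies."""
    eligible = [r for r in replicas if r["freshness"] >= required_freshness]
    if not eligible:
        return None  # e.g. wait for update propagation instead
    return min(eligible, key=lambda r: r["load"])["name"]
```

Lowering the required freshness widens the set of eligible replicas, which is precisely the freshness-for-performance trade the protocol exposes to users.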

Conference papers on the topic "Hardware-Aware Algorithm design"

1

Wells, Joshua W., Jayaram Natarajan, Abhijit Chatterjee, and Irtaza Barlas. "Real-Time, Content Aware Camera -- Algorithm -- Hardware Co-Adaptation for Minimal Power Video Encoding". In 2012 25th International Conference on VLSI Design. IEEE, 2012. http://dx.doi.org/10.1109/vlsid.2012.78.

2

Fung, Wing On, and T. Arslan. "A Power-Aware Algorithm for the Design of Reconfigurable Hardware during High Level Placement". In 2007 2nd NASA/ESA Conference on Adaptive Hardware and Systems. IEEE, 2007. http://dx.doi.org/10.1109/ahs.2007.15.

3

Bandic, Medina, Sebastian Feld, and Carmen G. Almudever. "Full-stack quantum computing systems in the NISQ era: algorithm-driven and hardware-aware compilation techniques". In 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2022. http://dx.doi.org/10.23919/date54114.2022.9774643.

4

Benmeziane, Hadjer, Kaoutar El Maghraoui, Hamza Ouarnoughi, Smail Niar, Martin Wistuba, and Naigang Wang. "Hardware-Aware Neural Architecture Search: Survey and Taxonomy". In Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21). California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/592.

Abstract:
There is no doubt that making AI mainstream by bringing powerful, yet power-hungry, deep neural networks (DNNs) to resource-constrained devices requires an efficient co-design of algorithms, hardware, and software. The increased popularity of DNN applications deployed on a wide variety of platforms, from tiny microcontrollers to data centers, has resulted in multiple questions and challenges related to constraints introduced by the hardware. In this survey on hardware-aware neural architecture search (HW-NAS), we present some of the answers proposed in the literature to the following questions: "Is it possible to build an efficient DL model that meets the latency and energy constraints of tiny edge devices?" and "How can we reduce the trade-off between the accuracy of a DL model and its ability to be deployed on a variety of platforms?". The survey provides a new taxonomy of HW-NAS and assesses hardware cost estimation strategies. We also highlight the challenges and limitations of existing approaches and potential future directions. We hope that this survey will help to fuel research towards efficient deep learning.
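At its simplest, hardware-aware search filters candidate architectures by estimated device cost before ranking by accuracy; the fields, budgets, and numbers below are illustrative, and real HW-NAS systems replace the lookup with learned or measured cost models:

```python
def hw_aware_select(candidates, latency_budget_ms, energy_budget_mj):
    """Return the highest-accuracy candidate whose estimated latency and
    energy fit the target device's budgets, or None if none fit."""
    feasible = [c for c in candidates
                if c["latency_ms"] <= latency_budget_ms
                and c["energy_mj"] <= energy_budget_mj]
    return max(feasible, key=lambda c: c["accuracy"], default=None)
```

Tightening the budgets changes which architecture wins, which is why the same search must be re-run (or re-estimated) per target platform.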
5

Villegas-Pachon, C., R. Carmona-Galan, J. Fernandez-Berni, and A. Rodriguez-Vazquez. "Hardware-aware performance evaluation for the co-design of image sensors and vision algorithms". In 2016 13th International Conference on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD). IEEE, 2016. http://dx.doi.org/10.1109/smacd.2016.7520722.

6

Kothari, Aadi, Timothy Talty, Scott Huxtable, and Haibo Zeng. "Energy-Efficient and Context-Aware Computing in Software-Defined Vehicles for Advanced Driver Assistance Systems (ADAS)". In WCX SAE World Congress Experience. Warrendale, PA: SAE International, 2024. http://dx.doi.org/10.4271/2024-01-2051.

Abstract:
The rise of Software-Defined Vehicles (SDV) has rapidly advanced the development of Advanced Driver Assistance Systems (ADAS), Autonomous Vehicle (AV), and Battery Electric Vehicle (BEV) technology. While AVs need power to compute data from perception to controls, BEVs need the efficiency to optimize their electric driving range and stand out compared to traditional Internal Combustion Engine (ICE) vehicles. AVs possess certain shortcomings in the current world, but SAE Level 2+ (L2+) Automated Vehicles are the focus of all major Original Equipment Manufacturers (OEMs). The most common form of an SDV today is the amalgamation of AV and BEV technology on the same platform, which is prominently available in most OEMs' lineups. As the compute and sensing architectures for L2+ automated vehicles lean towards a computationally expensive centralized design, it may hamper the most important purchasing factor of a BEV, the electric driving range.

This research asserts that the development of dynamic sensing and context-aware algorithms will allow a BEV to retain energy efficiency and the ADAS to maintain performance. Moreover, a decentralized computing architecture design will allow the system to utilize System-on-Module (SoM) boards that can process Artificial Intelligence (AI) algorithms at the edge. This will enable refined hardware acceleration using Edge-AI. The research will propose the use of a novel Software-in-the-Loop (SiL) simulation environment for a 2023 Cadillac LYRIQ provided by the EcoCAR EV Challenge competition. Future work will involve an in-depth evaluation and discussion of the simulation data. We will conclude that optimizing sensing and computation in an SDV platform will allow Automated and Electric Vehicles to prosper concurrently without impeding their technological progress.