Увійти

Готові списки джерел за темами / HW/SW Co-Optimization

Добірка наукової літератури з теми "HW/SW Co-Optimization"

Автор: Grafiati

Опубліковано: 7 липня 2024

Оновлено: 7 липня 2024

Оформте джерело за APA, MLA, Chicago, Harvard та іншими стилями

Оберіть тип джерела:

Ознайомтеся зі списками актуальних статей, книг, дисертацій, тез та інших наукових джерел на тему "HW/SW Co-Optimization".

Біля кожної праці в переліку літератури доступна кнопка «Додати до бібліографії». Скористайтеся нею – і ми автоматично оформимо бібліографічне посилання на обрану працю в потрібному вам стилі цитування: APA, MLA, «Гарвард», «Чикаго», «Ванкувер» тощо.

Також ви можете завантажити повний текст наукової публікації у форматі «.pdf» та прочитати онлайн анотацію до роботи, якщо відповідні параметри наявні в метаданих.

Зміст

Статті в журналах
Дисертації
Частини книг
Тези доповідей конференцій

Статті в журналах з теми "HW/SW Co-Optimization":

1

Yan, Xiaohu, Fazhi He, Neng Hou, and Haojun Ai. "An Efficient Particle Swarm Optimization for Large-Scale Hardware/Software Co-Design System." International Journal of Cooperative Information Systems 27, no. 01 (March 2018): 1741001. http://dx.doi.org/10.1142/s0218843017410015.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Анотація:

In the co-design process of hardware/software (HW/SW) system, especially for large and complicated embedded systems, HW/SW partitioning is a challenging step. Among different heuristic approaches, particle swarm optimization (PSO) has the advantages of simple implementation and computational efficiency, which is suitable for solving large-scale problems. This paper presents a conformity particle swarm optimization with fireworks explosion operation (CPSO-FEO) to solve large-scale HW/SW partitioning. First, the proposed CPSO algorithm simulates the conformist mentality from biology research. The CPSO particles with psychological conformist always try to move toward a secure point and avoid being attacked by natural enemy. In this way, there is a greater possibility to increase population diversity and avoid local optimum in CPSO. Next, to enhance the search accuracy and solution quality, an improved FEO with new initialization strategy is presented and is combined with CPSO algorithm to search a better position for the global best position. This combination can keep both the diversified and intensified searching. At last, the experiments on benchmarks and large-scale HW/SW partitioning demonstrate the efficiency of the proposed algorithm.

2

WEI, WENLONG, BIN LI, YI ZOU, WENCONG ZHANG, and ZHENQUAN ZHUANG. "A MULTI-OBJECTIVE HW–SW CO-SYNTHESIS ALGORITHM BASED ON QUANTUM-INSPIRED EVOLUTIONARY ALGORITHM." International Journal of Computational Intelligence and Applications 07, no. 02 (June 2008): 129–48. http://dx.doi.org/10.1142/s146902680800220x.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Анотація:

Hardware–Software (HW–SW) co-synthesis is one of the key steps in modern embedded system design. Generally, HW–SW co-synthesis is to optimally allocate processors, assign tasks to processors, and schedule the processing of tasks to achieve a good balance among performance, cost, power consumption, etc. Hence, it is a typical multi-objective optimization problem. In this paper, a new multi-objective HW–SW co-synthesis algorithm based on the quantum-inspired evolutionary algorithm (MQEAC) is proposed. MQEAC utilizes multiple quantum probability amplitude vectors to model the promising areas of solution space. Meanwhile, this paper presents a new crossover operator to accelerate the convergence to the Pareto front and introduces a PE slot-filling strategy to improve the efficiency of scheduling. Experimental results show that the proposed algorithm can solve the typical multi-objective co-synthesis problems effectively and efficiently.

3

Niu, Wen Liang, Wen Zheng Li, and Kai Shuang Yin. "Application of DFG Model on SOPC Technology." Applied Mechanics and Materials 198-199 (September 2012): 696–700. http://dx.doi.org/10.4028/www.scientific.net/amm.198-199.696.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Анотація:

HW/SW (hardware/software) co-design method based on analysis and optimization of DFG (data flow graphic) model is introduced for SOPC (System on a Programmable Chip) used for digital instrument design in this paper. The method is based on the DFG model of the digital signal process algorithm and implemented with SOPC technology. The DFG model could help designer to divide the function into hardware and software respectively, therefore, the optimizing analysis at system level and circuit level of a SOPC used for portable logic analyzer shows that the DFG model is very useful for not only optimizing architecture and power consumption, but also HW/SW co-design.

4

Yanhua Li, Youhui Zhang, and Weiming Zheng. "HW/SW co-optimization for stencil computation: Beginning with a customizable core." Tsinghua Science and Technology 21, no. 5 (October 2016): 570–80. http://dx.doi.org/10.1109/tst.2016.7590326.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

5

ISSAD, M., B. BOUDRAA, M. ANANE, and N. ANANE. "SOFTWARE/HARDWARE CO-DESIGN OF MODULAR EXPONENTIATION FOR EFFICIENT RSA CRYPTOSYSTEM." Journal of Circuits, Systems and Computers 23, no. 03 (March 2014): 1450032. http://dx.doi.org/10.1142/s0218126614500327.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Анотація:

This paper presents an implementation of Rivest, Shamir and Adleman (RSA) cryptosystem based on hardware/software (HW/SW) co-design. The main operation of RSA is the modular exponentiation (ME) which is performed by repeated modular multiplications (MMs). In this work, the right-to-left (R2L) algorithm is used for the implementation of the ME as a programmable system on chip (PSoC). The processor MicroBlaze of Xilinx is used for flexibility. The R2L method is often suggested to improve the timing performance, since it is based on parallel computations of MMs. However, if the optimization of HW resources is a constraint, this method can be executed sequentially using a single modular multiplier as a custom intellectual property (IP). Consequently, the execution time of the ME becomes dependent of three factors, namely the capability of the custom IP to perform the MMs, the nonzero bit string of the exponent and the communication link between the processor and the custom IP. In order to achieve the best trade-off between area, speed and flexibility, we propose three implementations in this work. The first one is a pure software solution. The second one takes benefit of a HW accelerator dedicated to the MM execution. The last one is based on a dual strategy. Two parallel MMs are implemented within a custom IP and local memories are used close to the arithmetic units to minimize the communication link influence. The results show that in the application to RSA 1024-bits, the ME runs in 22,25 ms, while using only 1,848 slices.

6

Khoud, Khaled Ben, Soufiene Bouallègue, and Mounir Ayadi. "Design and co-simulation of a fuzzy gain-scheduled PID controller based on particle swarm optimization algorithms for a quad tilt wing unmanned aerial vehicle." Transactions of the Institute of Measurement and Control 40, no. 14 (January 8, 2018): 3933–52. http://dx.doi.org/10.1177/0142331217740947.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Анотація:

This paper deals with the systematic design and hardware co-simulation of a fuzzy gain-scheduled proportional–integral–derivative (GS-PID) controller for a quad tilt wing (QTW) type of unmanned aerial vehicles (UAVs) based on different variants of the particle swarm optimization (PSO) algorithm. The fuzzy PID gains scheduling problem for the stabilization of the roll, pitch and yaw dynamics of the QTW vehicle is formulated as a constrained optimization problem and solved thanks to improved PSO algorithms. PSO algorithms with variable inertia weight (PSO-In), PSO with constriction factor (PSO-Co) and PSO with possibility updating strategies (PSO-gbest) are proposed. Such variants of the PSO algorithm aim further to improve the exploration and exploitation capabilities of such a stochastic algorithm as well as its convergence fastness. The robustness of the designed PSO-based fuzzy GS-PID controllers under actuators faults is shown on the non-linear model of the QTW. All optimized fuzzy GS-PID controllers are then co-simulated within a processor-in-the-loop (PIL) framework based on an embedded NI myRIO-1900 board and a host PC. Such a proposed software (SW) and hardware (HW) computer aided design (CAD) platform is based on the Control Design and Simulation (CDSim) module of the LabVIEW environment as well as a set-up Network Streams-based data communication protocol. Demonstrative simulation results are presented, compared and discussed in order to improve the effectiveness of the proposed PSO-based fuzzy gains scheduled PID controllers for the QTW’s attitude flight stabilization.

7

Zhao, Zhongyuan, Weiguang Sheng, Jinchao Li, Pengfei Ye, Qin Wang, and Zhigang Mao. "Similarity-Aware Architecture/Compiler Co-Designed Context-Reduction Framework for Modulo-Scheduled CGRA." Electronics 10, no. 18 (September 9, 2021): 2210. http://dx.doi.org/10.3390/electronics10182210.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Анотація:

Modulo-scheduled coarse-grained reconfigurable array (CGRA) processors have shown their potential for exploiting loop-level parallelism at high energy efficiency. However, these CGRAs need frequent reconfiguration during their execution, which makes them suffer from large area and power overhead for context memory and context-fetching. To tackle this challenge, this paper uses an architecture/compiler co-designed method for context reduction. From an architecture perspective, we carefully partition the context into several subsections and only fetch the subsections that are different to the former context word whenever fetching the new context. We package each different subsection with an opcode and index value to formulate a context-fetching primitive (CFP) and explore the hardware design space by providing the centralized and distributed CFP-fetching CGRA to support this CFP-based context-fetching scheme. From the software side, we develop a similarity-aware tuning algorithm and integrate it into state-of-the-art modulo scheduling and memory access conflict optimization algorithms. The whole compilation flow can efficiently improve the similarities between contexts in each PE for the purpose of reducing both context-fetching latency and context footprint. Experimental results show that our HW/SW co-designed framework can improve the area efficiency and energy efficiency to at most 34% and 21% higher with only 2% performance overhead.

8

Loupis, Michalis. "Embedded Systems Development Tools: A MODUS-oriented Market Overview." Business Systems Research Journal 5, no. 1 (March 1, 2014): 6–20. http://dx.doi.org/10.2478/bsrj-2014-0001.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Анотація:

Abstract Background: The embedded systems technology has perhaps been the most dominating technology in high-tech industries, in the past decade. The industry has correctly identified the potential of this technology and has put its efforts into exploring its full potential. Objectives: The goal of the paper is to explore the versatility of the application in the embedded system development based on one FP7-SME project. Methods/Approach: Embedded applications normally demand high resilience and quality, as well as conformity to quality standards and rigid performance. As a result embedded system developers have adopted software methods that yield high quality. The qualitative approach to examining embedded systems development tools has been applied in this work. Results: This paper presents a MODUS-oriented market analysis in the domains of Formal Verification tools, HW/SW co-simulation tools, Software Performance Optimization tools and Code Generation tools. Conclusions: The versatility of applications this technology serves is amazing. With all this performance potential, the technology has carried with itself a large number of issues which the industry essentially needs to resolve to be able to harness the full potential contained. The MODUS project toolset addressed four discrete domains of the ESD Software Market, in which corresponding open tools were developed

9

Panousopoulos, Vasileios, Emmanouil Papaloukas, Vasileios Leon, Dimitrios Soudris, Emmanuel Koumandakis, and George Lentaris. "HW/SW co-design on embedded SoC FPGA for star tracking optimization in space applications." Journal of Real-Time Image Processing 21, no. 1 (January 4, 2024). http://dx.doi.org/10.1007/s11554-023-01391-8.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Анотація:

AbstractStar trackers are crucial for satellite orientation. Improving their efficiency via reconfigurable COTS HW accommodates NewSpace missions. The current work considers SoC FPGAs to leverage both increased reprogramming and high-performance capabilities. Based on a custom sensor+FPGA system, we develop and optimize the algorithmic chain of star tracking by focusing on the acceleration of the image processing parts. We combine multiple circuit design techniques, such as low-level pipelining, word-length optimization, HW/SW co-processing, and parametric HLS+HDL coding, to fine-tune our implementation on Zynq-7020 FPGA when using real and synthetic input data. Overall, with 4-MPixel images, we achieve more than 24 FPS throughput by accelerating >95% of the computation by 8.9$$\times$$ × , at system level, while preserving the original SW accuracy and meeting the real-time requirements of the application.

10

Deng, Shao, Shanzhu Xiao, Qiuqun Deng, and Huanzhang Lu. "A hovering swarm particle swarm optimization algorithm based on node resource attributes for hardware/software partitioning." Journal of Supercomputing, September 19, 2023. http://dx.doi.org/10.1007/s11227-023-05603-7.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Анотація:

AbstractHardware/software (HW/SW) partitioning is a vital aspect of HW/SW co-design. With the development of the design complexity in heterogeneous computing systems, existing partitioning algorithms have demonstrated inadequate performance in addressing problems relating to large-scale task nodes. This paper presents a novel HW/SW partitioning algorithm based on node resource attributes hovering swarm particle swarm optimization (HSPSO). First, the system task graph is initialized via the node resource urgency partitioning algorithm; then, the iterative solution produced by HSPSO algorithm yields the partitioning result. We present new initialization by combining node resource attribute information and introduce two improvements to the learning strategy of HSPSO algorithm. For the main swarm, a directed sample set and the addition of perturbation particles are designed to direct the main swarm’s particle search process. For the secondary swarm, a dynamic particle update equation is formulated. Iterative updates are performed based on previous rounds’ prior information using adaptive inertia weight. The experimental results illustrate that, in large-scale systems task graph partitioning with more than 400 nodes, when compared with mainstream partitioning algorithms, the proposed algorithm improves partitioning performance by no less than 10% for compute-intensive task graphs and no <5% for communication-intensive task graphs, with higher solution stability.

Дисертації з теми "HW/SW Co-Optimization":

1

Deb, Abhishek. "HW/SW mechanisms for instruction fusion, issue and commit in modern u-processors." Doctoral thesis, Universitat Politècnica de Catalunya, 2012. http://hdl.handle.net/10803/81561.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Анотація:

In this thesis we have explored the co-designed paradigm to show alternative processor design points. Specifically, we have provided HW/SW mechanisms for instruction fusion, issue and commit for modern processors. We have implemented a co-designed virtual machine monitor that binary translates x86 instructions into RISC like micro-ops. Moreover, the translations are stored as superblocks, which are a trace of basic blocks. These superblocks are further optimized using speculative and non-speculative optimizations. Hardware mechanisms exists in-order to take corrective action in case of misspeculations. During the course of this PhD we have made following contributions. Firstly, we have provided a novel Programmable Functional unit, in-order to speed up general-purpose applications. The PFU consists of a grid of functional units, similar to CCA, and a distributed internal register file. The inputs of the macro-op are brought from the Physical Register File to the internal register file using a set of moves and a set of loads. A macro-op fusion algorithm fuses micro-ops at runtime. The fusion algorithm is based on a scheduling step that indicates whether the current fused instruction is beneficial or not. The micro-ops corresponding to the macro-ops are stored as control signals in a configuration. The macro-op consists of a configuration ID which helps in locating the configurations. A small configuration cache is present inside the Programmable Functional unit, that holds these configurations. In case of a miss in the configuration cache configurations are loaded from I-Cache. Moreover, in-order to support bulk commit of atomic superblocks that are larger than the ROB we have proposed a speculative commit mechanism. For this we have proposed a Speculative commit register map table that holds the mappings of the speculatively committed instructions. When all the instructions of the superblock have committed the speculative state is copied to Backend Register Rename Table. Secondly, we proposed a co-designed in-order processor with with two kinds of accelerators. These FU based accelerators run a pair of fused instructions. We have considered two kinds of instruction fusion. First, we fused a pair of independent loads together into vector loads and execute them on vector load units. For the second kind of instruction fusion we have fused a pair of dependent simple ALU instructions and execute them in Interlock Collapsing ALUs (ICALU). Moreover, we have evaluated performance of various code optimizations such as list-scheduling, load-store telescoping and load hoisting among others. We have compared our co-designed processor with small instruction window out-of-order processors. Thirdly, we have proposed a co-designed out-of-order processor. Specifically we have reduced complexity in two areas. First of all, we have co-designed the commit mechanism, that enable bulk commit of atomic superblocks. In this solution we got rid of the conventional ROB, instead we introduce the Superblock Ordering Buffer (SOB). SOB ensures program order is maintained at the granularity of the superblock, by bulk committing the program state. The program state consists of the register state and the memory state. The register state is held in a per superblock register map table, whereas the memory state is held in gated store buffer and updated in bulk. Furthermore, we have tackled the complexity of Out-of-Order issue logic by using FIFOs. We have proposed an enhanced steering heuristic that fixes the inefficiencies of the existing dependence-based heuristic. Moreover, a mechanism to release the FIFO entries earlier is also proposed that further improves the performance of the steering heuristic.
En aquesta tesis hem explorat el paradigma de les màquines issue i commit per processadors actuals. Hem implementat una màquina virtual que tradueix binaris x86 a micro-ops de tipus RISC. Aquestes traduccions es guarden com a superblocks, que en realitat no és més que una traça de virtuals co-dissenyades. En particular, hem proposat mecanismes hw/sw per a la fusió d’instruccions, blocs bàsics. Aquests superblocks s’optimitzen utilitzant optimizacions especualtives i d’altres no speculatives. En cas de les optimizations especulatives es consideren mecanismes per a la gestió de errades en l’especulació. Al llarg d’aquesta tesis s’han fet les següents contribucions: Primer, hem proposat una nova unitat functional programmable (PFU) per tal de millorar l’execució d’aplicacions de proposit general. La PFU està formada per un conjunt d’unitats funcionals, similar al CCA, amb un banc de registres intern a la PFU distribuït a les unitats funcionals que la composen. Les entrades de la macro-operació que s’executa en la PFU es mouen del banc de registres físic convencional al intern fent servir un conjunt de moves i loads. Un algorisme de fusió combina més micro-operacions en temps d’execució. Aquest algorisme es basa en un pas de planificació que mesura el benefici de les decisions de fusió. Les micro operacions corresponents a la macro operació s’emmagatzemen com a senyals de control en una configuració. Les macro-operacions tenen associat un identificador de configuració que ajuda a localitzar d’aquestes. Una petita cache de configuracions està present dintre de la PFU per tal de guardar-les. En cas de que la configuració no estigui a la cache, les configuracions es carreguen de la cache d’instruccions. Per altre banda, per tal de donar support al commit atòmic dels superblocks que sobrepassen el tamany del ROB s’ha proposat un mecanisme de commit especulatiu. Per aquest mecanisme hem proposat una taula de mapeig especulativa dels registres, que es copia a la taula no especulativa quan totes les instruccions del superblock han comitejat. Segon, hem proposat un processador en order co-dissenyat que combina dos tipus d’acceleradors. Aquests acceleradors executen un parell d’instruccions fusionades. S’han considerat dos tipus de fusió d’instructions. Primer, combinem un parell de loads independents formant loads vectorials i els executem en una unitat vectorial. Segon, fusionem parells d’instruccions simples d’alu que són dependents i que s’executaran en una Interlock Collapsing ALU (ICALU). Per altra aquestes tecniques les hem evaluat conjuntament amb diverses optimizacions com list scheduling, load-store telescoping i hoisting de loads, entre d’altres. Aquesta proposta ha estat comparada amb un processador fora d’ordre. Tercer, hem proposat un processador fora d’ordre co-dissenyat efficient reduint-ne la complexitat en dos areas principals. En primer lloc, hem co-disenyat el mecanisme de commit per tal de permetre un eficient commit atòmic del superblocks. En aquesta solució hem substituït el ROB convencional, i en lloc hem introduït el Superblock Ordering Buffer (SOB). El SOB manté l’odre de programa a granularitat de superblock. L’estat del programa consisteix en registres i memòria. L’estat dels registres es manté en una taula per superblock, mentre que l’estat de memòria es guarda en un buffer i s’actulitza atòmicament. La segona gran area de reducció de complexitat considerarada és l’ús de FIFOs a la lògica d’issue. En aquest últim àmbit hem proposat una heurística de distribució que solventa les ineficiències de l’heurística basada en dependències anteriorment proposada. Finalment, i junt amb les FIFOs, s’ha proposat un mecanisme per alliberar les entrades de la FIFO anticipadament.

2

Bouzidi, Halima. "Efficient Deployment of Deep Neural Networks on Hardware Devices for Edge AI." Electronic Thesis or Diss., Valenciennes, Université Polytechnique Hauts-de-France, 2024. http://www.theses.fr/2024UPHF0006.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Анотація:

Les réseaux de neurones (RN) sont devenus une force dominante dans le monde de la technologie. Inspirés par le cerveau humain, leur conception complexe leur permet d’apprendre des motifs, de prendre des décisions et même de prévoir des scénarios futurs avec une précision impressionnante. Les RN sont largement déployés dans les systèmes de l'Internet des Objets (IoT pour Internet of Things), renforçant davantage les capacités des dispositifs interconnectés en leur donnant la capacité d'apprendre et de s'auto-adapter dans un contexte temps réel. Cependant, la prolifération des données produites par les capteurs IoT rend difficile leur envoi vers un centre cloud pour le traitement. Par conséquent, le traitement des données plus près de leur origine, en edge, permet de prendre des décisions en temps réel, réduisant ainsi la congestion du réseau.L'intégration des RN à l'edge dans les systèmes IoT permet d'obtenir des solutions plus efficaces et réactives, inaugurant ainsi une nouvelle ère de edge AI. Néanmoins, le déploiement des RN sur des plateformes matérielles à ressources présente une multitude de défis. (i) La complexité inhérente des architectures des RN, qui nécessitent d'importantes capacités de calcul et de stockage. (ii) Le budget énergétique limité caractérisant les dispositifs matériels sur edge qui ne permet pas de supporter des RN complexes, réduisant drastiquement la durée de fonctionnement du système. (iii) Le défi d'assurer une harmonie entre la conception des RN et celle des dispositifs matériels de l’edge. (iv) L'absence de l'adaptabilité à l'environnement d'exécution dynamique et aux complexités des données.Pour pallier ces problèmes, cette thèse vise à établir des méthodes innovantes qui élargissent les cadres traditionnels de conception de RN (NAS pour Neural Architecture Search) en intégrant les caractéristiques contextuelles du matériel et de l’environnement d'exécution. Tout d'abord, nous intégrons les propriétés matérielles au NAS en adaptant les RN aux variations de la fréquence d'horloge. Deuxièmement, nous exploitons l’aspect dynamique au sein des RN d'un point de vue conceptuel, en introduisant un NAS dynamique. Troisièmement, nous explorons le potentiel des RN graphiques (GNN pour Graph Neural Network) en développant un NAS avec calcul distribué sur des multiprocesseurs hétérogènes sur puce (MPSoC pour Multi-Processors Système-on-Chip). Quatrièmement, nous abordons la co-optimisation software et matérielle sur les MPSoCs hétérogènes en proposant une stratégie d'ordonnancement innovante qui exploite l'adaptabilité et le parallélisme des RN. Cinquièmement, nous explorons la perspective de ML4ML (pour Machine Learning for Machine Learning) en introduisant des techniques d'estimation des performances des RN sur les plateformes matérielles sur edge en utilisant des méthodes basés sur ML. Enfin, nous développons un framework NAS évolutif et auto-adaptatif de bout en bout qui apprend progressivement l'importance des paramètres architecturaux du RN pour guider efficacement le processus de recherche du NAS vers l'optimalité.Nos méthodes aident à contribuer à la réalisation d’un framework de conception de bout en bout pour les RN sur les dispositifs matériels sur edge. Elles permettent ainsi de tirer avantage de plusieurs pistes d’optimisation au niveau logiciel et matériel, améliorant les performances et l’efficacité de l’Edge AI
Neural Networks (NN) have become a leading force in today's digital landscape. Inspired by the human brain, their intricate design allows them to recognize patterns, make informed decisions, and even predict forthcoming scenarios with impressive accuracy. NN are widely deployed in Internet of Things (IoT) systems, further elevating interconnected devices' capabilities by empowering them to learn and auto-adapt in real-time contexts. However, the proliferation of data produced by IoT sensors makes it difficult to send them to a centralized cloud for processing. This is where the allure of edge computing becomes captivating. Processing data closer to where it originates -at the edge- reduces latency, makes real-time decisions with less effort, and efficiently manages network congestion.Integrating NN on edge devices for IoT systems enables more efficient and responsive solutions, ushering in a new age of self-sustaining Edge AI. However, Deploying NN on resource-constrained edge devices presents a myriad of challenges: (i) The inherent complexity of neural network architectures, which requires significant computational and memory capabilities. (ii) The limited power budget of IoT devices makes the NN inference prone to rapid energy depletion, drastically reducing system utility. (iii) The hurdle of ensuring harmony between NN and HW designs as they evolve at different rates. (iv) The lack of adaptability to the dynamic runtime environment and the intricacies of input data.Addressing these challenges, this thesis aims to establish innovative methods that extend conventional NN design frameworks, notably Neural Architecture Search (NAS). By integrating HW and runtime contextual features, our methods aspire to enhance NN performances while abstracting the need for the human-in-loop}. Firstly, we incorporate HW properties into the NAS by tailoring the design of NN to clock frequency variations (DVFS) to minimize energy footprint. Secondly, we leverage dynamicity within NN from a design perspective, culminating in a comprehensive Hardware-aware Dynamic NAS with DVFS features. Thirdly, we explore the potential of Graph Neural Networks (GNN) at the edge by developing a novel HW-aware NAS with distributed computing features on heterogeneous MPSoC. Fourthly, we address the SW/HW co-optimization on heterogeneous MPSoCs by proposing an innovative scheduling strategy that leverages NN adaptability and parallelism across computing units. Fifthly, we explore the prospect of ML4ML -- Machine Learning for Machine Learning by introducing techniques to estimate NN performances on edge devices using neural architectural features and ML-based predictors. Finally, we develop an end-to-end self-adaptive evolutionary HW-aware NAS framework that progressively learns the importance of NN parameters to guide the search process toward Pareto optimality effectively.Our methods can contribute to elaborating an end-to-end design framework for neural networks on edge hardware devices. They enable leveraging multiple optimization opportunities at both the software and hardware levels, thus improving the performance and efficiency of Edge AI

3

Ming-ChungLi and 李明峻. "Hw/Sw Data Transfer Optimization Co-Design of Embedded JPEG Image Compress." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/12464595123506991299.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Частини книг з теми "HW/SW Co-Optimization":

1

Mohamed, Khaled Salah. "HW/SW Co-Optimization and Co-Protection." In Synthesis Lectures on Digital Circuits & Systems, 153–70. Cham: Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-56152-8_6.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

2

Salhi, Afef, Fahmi Ghozzi, and Ahmed Fakhfakh. "Approximation Algorithm for Scheduling a Chain of Tasks for Motion Estimation on Heterogeneous Systems MPSoC." In Engineering Problems - Uncertainties, Constraints and Optimization Techniques [Working Title]. IntechOpen, 2021. http://dx.doi.org/10.5772/intechopen.97676.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Анотація:

Co-design embedded system are very important step in digital vehicle and airplane. The multicore and multiprocessor SoC (MPSoC) started a new computing era. It is becoming increasingly used because it can provide designers much more opportunities to meet specific performances. Designing embedded systems includes two main phases: (i) HW/SW Partitioning performed from high-level (eclipse C/C++ or python (machine learning and deep learning)) functional and architecture models (with virtual prototype and real prototype). And (ii) Software Design performed with significantly more detailed models with scheduling and partitioning tasks algorithm DAG Directed Acyclic Graph and GGEN Generation Graph Estimation Nodes (there are automatic DAG algorithm). Partitioning decisions are made according to performance assumptions that should be validated on the more refined software models for ME block and GGEN algorithm. In this paper, we focus to optimize a execution time and amelioration for quality of video with a scheduling and partitioning tasks in video codec. We show how they can be modeled the video sequence test with the size of video in height and width (three models of scheduling tasks in four processor). This modeling with DAG and GGEN are partitioning at different platform in OVP (partitioning, SW design). We can know the optimization of consumption energy and execution time in SoC and MPSoC platform.

Тези доповідей конференцій з теми "HW/SW Co-Optimization":

1

Nezhadi, Ali, Shaahin Angizi, and Arman Roohi. "EaseMiss: HW/SW Co-Optimization for Efficient Large Matrix-Matrix Multiply Operations." In 2022 IEEE 15th Dallas Circuit And System Conference (DCAS). IEEE, 2022. http://dx.doi.org/10.1109/dcas53974.2022.9845629.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

2

Xu, Susan, and Hugh Pollitt-Smith. "Optimization of HW/SW Co-Design: Relevance to Configurable Processor and FPGA Technology." In 2007 Canadian Conference on Electrical and Computer Engineering. IEEE, 2007. http://dx.doi.org/10.1109/ccece.2007.423.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

3

Matsuoka, Yusuke, Patrick Schaumont, Kris Tiri, and Ingrid Verbauwhede. "Java cryptography on KVM and its performance and security optimization using HW/SW co-design techniques." In the 2004 international conference. New York, New York, USA: ACM Press, 2004. http://dx.doi.org/10.1145/1023833.1023874.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

4

Branković, Aleksandar, Kyriakos Stavrou, Enric Gibert, and Antonio González. "Warm-Up Simulation Methodology for HW/SW Co-Designed Processors." In CGO '14: 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization. New York, NY, USA: ACM, 2014. http://dx.doi.org/10.1145/2544137.2544142.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

5

Wu, Youfeng, Shiliang Hu, Edson Borin, and Cheng Wang. "A HW/SW co-designed heterogeneous multi-core virtual machine for energy-efficient general purpose computing." In 2011 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 2011. http://dx.doi.org/10.1109/cgo.2011.5764691.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.