Academic literature on the topic 'SW Co-Optimization'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'SW Co-Optimization.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "SW Co-Optimization":

1

Yan, Xiaohu, Fazhi He, Neng Hou, and Haojun Ai. "An Efficient Particle Swarm Optimization for Large-Scale Hardware/Software Co-Design System." International Journal of Cooperative Information Systems 27, no. 01 (March 2018): 1741001. http://dx.doi.org/10.1142/s0218843017410015.

Abstract:
In the co-design process of hardware/software (HW/SW) systems, especially for large and complicated embedded systems, HW/SW partitioning is a challenging step. Among heuristic approaches, particle swarm optimization (PSO) has the advantages of simple implementation and computational efficiency, which makes it suitable for solving large-scale problems. This paper presents a conformity particle swarm optimization with fireworks explosion operation (CPSO-FEO) to solve large-scale HW/SW partitioning. First, the proposed CPSO algorithm simulates the conformist mentality studied in biology research: CPSO particles with conformist psychology always try to move toward a secure point and avoid being attacked by a natural enemy. In this way, there is a greater chance of increasing population diversity and avoiding local optima in CPSO. Next, to enhance search accuracy and solution quality, an improved FEO with a new initialization strategy is presented and combined with the CPSO algorithm to search for a better global best position. This combination maintains both diversified and intensified searching. Finally, experiments on benchmarks and large-scale HW/SW partitioning demonstrate the efficiency of the proposed algorithm.
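The binary partitioning setting this abstract targets can be illustrated with a minimal plain-PSO sketch. This is not the authors' CPSO-FEO: the per-task (hw_cost, sw_cost) encoding, the weighted-cost fitness, and all parameter values are illustrative assumptions.

```python
import math
import random

def binary_pso(costs, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain binary PSO for HW/SW partitioning (a sketch, not CPSO-FEO).

    costs[i] = (hw_cost, sw_cost) of task i; a particle is a bit vector
    (1 = map task i to HW, 0 = SW); fitness = total cost of the mapping.
    """
    rng = random.Random(seed)
    n = len(costs)

    def fitness(x):
        return sum(costs[i][0] if x[i] else costs[i][1] for i in range(n))

    swarm = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]
    gbest = min(pbest, key=fitness)[:]
    for _ in range(iters):
        for p in range(n_particles):
            for i in range(n):
                vel[p][i] = (w * vel[p][i]
                             + c1 * rng.random() * (pbest[p][i] - swarm[p][i])
                             + c2 * rng.random() * (gbest[i] - swarm[p][i]))
                # The sigmoid turns the velocity into a bit-set probability.
                swarm[p][i] = 1 if rng.random() < 1 / (1 + math.exp(-vel[p][i])) else 0
            if fitness(swarm[p]) < fitness(pbest[p]):
                pbest[p] = swarm[p][:]
        best = min(pbest, key=fitness)
        if fitness(best) < fitness(gbest):
            gbest = best[:]
    return gbest, fitness(gbest)
```

The paper's contributions (conformist movement, fireworks explosion re-initialization) replace the plain velocity/position update above to fight premature convergence on large task graphs.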
2

Wei, Wenlong, Bin Li, Yi Zou, Wencong Zhang, and Zhenquan Zhuang. "A Multi-Objective HW–SW Co-Synthesis Algorithm Based on Quantum-Inspired Evolutionary Algorithm." International Journal of Computational Intelligence and Applications 07, no. 02 (June 2008): 129–48. http://dx.doi.org/10.1142/s146902680800220x.

Abstract:
Hardware–Software (HW–SW) co-synthesis is one of the key steps in modern embedded system design. Generally, HW–SW co-synthesis is to optimally allocate processors, assign tasks to processors, and schedule the processing of tasks to achieve a good balance among performance, cost, power consumption, etc. Hence, it is a typical multi-objective optimization problem. In this paper, a new multi-objective HW–SW co-synthesis algorithm based on the quantum-inspired evolutionary algorithm (MQEAC) is proposed. MQEAC utilizes multiple quantum probability amplitude vectors to model the promising areas of solution space. Meanwhile, this paper presents a new crossover operator to accelerate the convergence to the Pareto front and introduces a PE slot-filling strategy to improve the efficiency of scheduling. Experimental results show that the proposed algorithm can solve the typical multi-objective co-synthesis problems effectively and efficiently.
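Multi-objective co-synthesis of the kind described above is judged by Pareto dominance over objectives such as cost, latency, and power. A minimal sketch of the dominance test and front extraction (the three-objective tuples are illustrative assumptions, not the paper's benchmark data):

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly
    better in at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Return the non-dominated subset of (cost, latency, power) tuples."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]
```

An evolutionary co-synthesis algorithm such as MQEAC maintains and refines a population whose non-dominated subset, as computed above, approximates the true Pareto front.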
3

Niu, Wen Liang, Wen Zheng Li, and Kai Shuang Yin. "Application of DFG Model on SOPC Technology." Applied Mechanics and Materials 198-199 (September 2012): 696–700. http://dx.doi.org/10.4028/www.scientific.net/amm.198-199.696.

Abstract:
This paper introduces a HW/SW (hardware/software) co-design method, based on analysis and optimization of a DFG (data flow graph) model, for a SOPC (System on a Programmable Chip) used in digital instrument design. The method builds on the DFG model of the digital signal processing algorithm and is implemented with SOPC technology. The DFG model helps the designer divide the functionality between hardware and software; the optimization analysis at the system and circuit levels of a SOPC used for a portable logic analyzer shows that the DFG model is very useful not only for optimizing architecture and power consumption, but also for HW/SW co-design.
4

Li, Yanhua, Youhui Zhang, and Weiming Zheng. "HW/SW co-optimization for stencil computation: Beginning with a customizable core." Tsinghua Science and Technology 21, no. 5 (October 2016): 570–80. http://dx.doi.org/10.1109/tst.2016.7590326.

5

Khoud, Khaled Ben, Soufiene Bouallègue, and Mounir Ayadi. "Design and co-simulation of a fuzzy gain-scheduled PID controller based on particle swarm optimization algorithms for a quad tilt wing unmanned aerial vehicle." Transactions of the Institute of Measurement and Control 40, no. 14 (January 8, 2018): 3933–52. http://dx.doi.org/10.1177/0142331217740947.

Abstract:
This paper deals with the systematic design and hardware co-simulation of a fuzzy gain-scheduled proportional–integral–derivative (GS-PID) controller for a quad tilt wing (QTW) type of unmanned aerial vehicle (UAV), based on different variants of the particle swarm optimization (PSO) algorithm. The fuzzy PID gain-scheduling problem for the stabilization of the roll, pitch, and yaw dynamics of the QTW vehicle is formulated as a constrained optimization problem and solved with improved PSO algorithms. PSO with variable inertia weight (PSO-In), PSO with constriction factor (PSO-Co), and PSO with possibility updating strategies (PSO-gbest) are proposed. These variants aim to further improve the exploration and exploitation capabilities of this stochastic algorithm as well as its convergence speed. The robustness of the designed PSO-based fuzzy GS-PID controllers under actuator faults is shown on the non-linear model of the QTW. All optimized fuzzy GS-PID controllers are then co-simulated within a processor-in-the-loop (PIL) framework based on an embedded NI myRIO-1900 board and a host PC. The proposed software (SW) and hardware (HW) computer-aided design (CAD) platform is based on the Control Design and Simulation (CDSim) module of the LabVIEW environment as well as a Network Streams-based data communication protocol. Simulation results are presented, compared, and discussed to show the effectiveness of the proposed PSO-based fuzzy gain-scheduled PID controllers for the QTW's attitude flight stabilization.
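The inertia-weight (PSO-In) and constriction-factor (PSO-Co) variants named above differ only in the velocity update. A one-dimensional sketch follows; the coefficients c1 = c2 = 2.05 and the 0.9 to 0.4 linear inertia schedule are common defaults assumed here, not values taken from the paper.

```python
import math

def pso_velocity(v, x, pbest, gbest, r1, r2, variant="inertia", t=0, t_max=100):
    """One-dimensional velocity update for two common PSO variants.

    'inertia': linearly decreasing inertia weight w from 0.9 to 0.4.
    'constriction': Clerc-Kennedy constriction factor chi.
    r1, r2 are the uniform random draws of the cognitive/social terms.
    """
    c1 = c2 = 2.05
    cognitive = c1 * r1 * (pbest - x)
    social = c2 * r2 * (gbest - x)
    if variant == "inertia":
        w = 0.9 - (0.9 - 0.4) * t / t_max
        return w * v + cognitive + social
    phi = c1 + c2  # phi > 4 is required for the constriction derivation
    chi = 2 / abs(2 - phi - math.sqrt(phi * phi - 4 * phi))
    return chi * (v + cognitive + social)
```

With phi = 4.1 the constriction factor comes out near 0.73, which damps the velocity each step and is what gives PSO-Co its convergence guarantee.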
6

Zhao, Zhongyuan, Weiguang Sheng, Jinchao Li, Pengfei Ye, Qin Wang, and Zhigang Mao. "Similarity-Aware Architecture/Compiler Co-Designed Context-Reduction Framework for Modulo-Scheduled CGRA." Electronics 10, no. 18 (September 9, 2021): 2210. http://dx.doi.org/10.3390/electronics10182210.

Abstract:
Modulo-scheduled coarse-grained reconfigurable array (CGRA) processors have shown their potential for exploiting loop-level parallelism at high energy efficiency. However, these CGRAs need frequent reconfiguration during their execution, which makes them suffer from large area and power overhead for context memory and context-fetching. To tackle this challenge, this paper uses an architecture/compiler co-designed method for context reduction. From an architecture perspective, we carefully partition the context into several subsections and only fetch the subsections that differ from the former context word whenever fetching a new context. We package each differing subsection with an opcode and index value to form a context-fetching primitive (CFP) and explore the hardware design space by providing centralized and distributed CFP-fetching CGRAs to support this CFP-based context-fetching scheme. On the software side, we develop a similarity-aware tuning algorithm and integrate it into state-of-the-art modulo scheduling and memory access conflict optimization algorithms. The whole compilation flow can efficiently improve the similarities between contexts in each PE, reducing both context-fetching latency and context footprint. Experimental results show that our HW/SW co-designed framework can improve area efficiency and energy efficiency by up to 34% and 21%, respectively, with only 2% performance overhead.
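The subsection-diffing idea behind CFPs can be sketched as follows. The 4 x 8-bit split and the plain (index, value) packaging are simplifying assumptions, not the paper's exact encoding.

```python
def diff_context(prev, curr, n_sub=4, sub_bits=8):
    """Split a context word into n_sub subsections of sub_bits bits and
    emit (index, value) pairs only for subsections that changed since the
    previous word -- a sketch of CFP-style context-fetching."""
    mask = (1 << sub_bits) - 1
    cfps = []
    for i in range(n_sub):
        old = (prev >> (i * sub_bits)) & mask
        new = (curr >> (i * sub_bits)) & mask
        if old != new:
            cfps.append((i, new))
    return cfps

def apply_cfps(prev, cfps, sub_bits=8):
    """Reconstruct the current context word from the previous one plus CFPs."""
    mask = (1 << sub_bits) - 1
    word = prev
    for i, val in cfps:
        word = (word & ~(mask << (i * sub_bits))) | (val << (i * sub_bits))
    return word
```

When consecutive contexts are similar (which the compiler actively encourages), far fewer bits cross the context-fetch path than a full-word fetch would require.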
7

Issad, M., B. Boudraa, M. Anane, and N. Anane. "Software/Hardware Co-Design of Modular Exponentiation for Efficient RSA Cryptosystem." Journal of Circuits, Systems and Computers 23, no. 03 (March 2014): 1450032. http://dx.doi.org/10.1142/s0218126614500327.

Abstract:
This paper presents an implementation of the Rivest, Shamir and Adleman (RSA) cryptosystem based on hardware/software (HW/SW) co-design. The main operation of RSA is the modular exponentiation (ME), which is performed by repeated modular multiplications (MMs). In this work, the right-to-left (R2L) algorithm is used for the implementation of the ME as a programmable system on chip (PSoC). The Xilinx MicroBlaze processor is used for flexibility. The R2L method is often suggested to improve timing performance, since it is based on parallel computation of MMs. However, if the optimization of HW resources is a constraint, this method can be executed sequentially using a single modular multiplier as a custom intellectual property (IP). Consequently, the execution time of the ME becomes dependent on three factors, namely the capability of the custom IP to perform the MMs, the nonzero bit string of the exponent, and the communication link between the processor and the custom IP. In order to achieve the best trade-off between area, speed, and flexibility, we propose three implementations in this work. The first one is a pure software solution. The second one takes advantage of a HW accelerator dedicated to MM execution. The last one is based on a dual strategy: two parallel MMs are implemented within a custom IP, and local memories are used close to the arithmetic units to minimize the influence of the communication link. The results show that, applied to 1024-bit RSA, the ME runs in 22.25 ms while using only 1,848 slices.
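The R2L binary method interleaves an unconditional squaring with a conditional multiply per exponent bit; because the two modular multiplications within one iteration are independent, the hardware can run them on two MM units in parallel, which is the property the paper exploits. A minimal software sketch:

```python
def mod_exp_r2l(base, exp, mod):
    """Right-to-left binary modular exponentiation.

    Per exponent bit, the conditional multiply (MM #1) and the squaring
    (MM #2) read independent operands, so in HW they can execute in
    parallel on two modular multipliers.
    """
    result = 1
    square = base % mod
    while exp:
        if exp & 1:
            result = (result * square) % mod  # MM #1, only for set bits
        square = (square * square) % mod      # MM #2, every iteration
        exp >>= 1
    return result
```

This contrasts with the left-to-right method, where each step's multiply depends on the preceding square and the two MMs cannot overlap.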
8

Loupis, Michalis. "Embedded Systems Development Tools: A MODUS-oriented Market Overview." Business Systems Research Journal 5, no. 1 (March 1, 2014): 6–20. http://dx.doi.org/10.2478/bsrj-2014-0001.

Abstract:
Background: The embedded systems technology has perhaps been the most dominating technology in high-tech industries in the past decade. The industry has correctly identified the potential of this technology and has put its efforts into exploring its full potential. Objectives: The goal of the paper is to explore the versatility of applications in embedded system development, based on one FP7-SME project. Methods/Approach: Embedded applications normally demand high resilience and quality, as well as conformity to quality standards and rigid performance. As a result, embedded system developers have adopted software methods that yield high quality. A qualitative approach to examining embedded systems development tools has been applied in this work. Results: This paper presents a MODUS-oriented market analysis in the domains of formal verification tools, HW/SW co-simulation tools, software performance optimization tools, and code generation tools. Conclusions: The versatility of applications this technology serves is remarkable. With all this performance potential, the technology has carried with it a large number of issues that the industry needs to resolve to harness its full potential. The MODUS project toolset addressed four discrete domains of the ESD software market, in which corresponding open tools were developed.
9

Terenchenko, A. S., and A. S. Stryapunin. "Estimation of CO2 emissions from KAMAZ-54901 truck with the use of Regulation (EU) 2017/2400 methodology." Trudy NAMI, no. 4 (December 28, 2023): 61–68. http://dx.doi.org/10.51187/0135-3152-2023-4-61-68.

Abstract:
Introduction (problem statement and relevance). For several years already, the world’s leading countries have been running programs aimed at decreasing CO2 emissions from trucks. These programs involve actual (field) testing of components and simulation of vehicle driving according to test cycles in special software (SW). In Europe, Regulation (EU) 2017/2400 requires manufacturers to produce and publish such calculations when registering new trucks (heavy-duty vehicles) in the EU territory. The purpose of the study is to compare the CO2 emission level of trucks made by European manufacturers with that of a modern truck produced by KAMAZ PTC. Methodology and research methods. The truck driving was simulated using the Regulation (EU) 2017/2400 methodology in the VECTO software. Scientific novelty and results. The CO2 emissions of the modern truck produced by KAMAZ PTC were calculated using the Regulation (EU) 2017/2400 methodology in the VECTO software. Practical significance. The obtained calculation results can be used to optimize KAMAZ vehicle parameters in order to decrease fuel consumption and CO2 emissions.
10

Ahmed, M. Elmuzafar, Abdullah S. Sultan, Abdulkarim Al-Sofi, and Hasan S. Al-Hashim. "Optimization of surfactant-polymer flooding for enhanced oil recovery." Journal of Petroleum Exploration and Production Technology, June 15, 2023. http://dx.doi.org/10.1007/s13202-023-01651-0.

Abstract:
Chemical enhanced oil recovery applications continue to face a variety of obstacles, particularly in high saline and high-temperature reservoirs, in addition to high chemical prices. This issue creates difficulty in developing optimal recipes that can withstand these extreme circumstances and so achieve maximal hydrocarbon recovery at the lowest feasible cost. The usefulness of surfactant polymer (SP) in mobilizing oil and increasing sweep efficiency in carbonate rocks is assessed in this article. A thermo-viscosifying polymer and an acrylamido tertiary butyl sulfonate (ATBS)/acrylamide (AM) copolymer were employed. Surfactants of various grades of amphoteric carboxybetain are used. These potential chemicals were chosen after a thorough study of previous research, which included long-term thermal stability, fluid rheology, interfacial tension, adsorption, and microfluidic tests. The contact angles were measured using a captive drop analyzer at high pressure and high temperature. The core-flooding experiments for slug size and injection sequence optimization were carried out using 12-inch long and 1.5-inch diameter limestone cores. For two weeks, the samples were aged. The trials were carried out at 90 °C. The seawater (SW) salinity utilized in the injection was 57,000 ppm. The findings highlighted the importance of surfactant-polymer interactions in wettability and fluid rheology. The best chemical combination was carboxybetaine (0.05 wt%) and ATBS/AM (0.25 wt%) which recovered 31.29% of the residual oil saturation (ROS), or 11.63% of the original oil in place (OIIP). The optimal slug size was 3.5 PV, which recovered 34.21% of the ROS and 17.05% of the OIIP. The optimum injection sequence was the co-injection of surfactant and polymer SW-S1P1-SW, which extracted 31.29% of the ROS and 11.63% of the OIIP. The recoveries were discovered to be related to the slug’s size. The chemical injection sequence was critical to the eventual oil recovery.
Among the other sequences, SW-SP-SW had the highest recovery (SW-P-S-SW, SW-S-SW-P-SW, and SW-P-SW-S-SW). This is thought to be owing to the compounds' synergistic impact. We found that there is no systematic optimization process that combines the effect of chemicals, slug size, and sequence in one study, which gave us the motivation to cover the research gap.

Dissertations / Theses on the topic "SW Co-Optimization":

1

Deb, Abhishek. "HW/SW mechanisms for instruction fusion, issue and commit in modern u-processors." Doctoral thesis, Universitat Politècnica de Catalunya, 2012. http://hdl.handle.net/10803/81561.

Abstract:
In this thesis we have explored the co-designed paradigm to show alternative processor design points. Specifically, we have provided HW/SW mechanisms for instruction fusion, issue, and commit in modern processors. We have implemented a co-designed virtual machine monitor that binary-translates x86 instructions into RISC-like micro-ops. The translations are stored as superblocks, which are traces of basic blocks, and are further optimized using speculative and non-speculative optimizations. Hardware mechanisms exist to take corrective action in case of misspeculation. During the course of this PhD we have made the following contributions. Firstly, we have provided a novel Programmable Functional Unit (PFU) to speed up general-purpose applications. The PFU consists of a grid of functional units, similar to CCA, and a distributed internal register file. The inputs of the macro-op are brought from the physical register file to the internal register file using a set of moves and a set of loads. A macro-op fusion algorithm fuses micro-ops at runtime; it is based on a scheduling step that indicates whether the current fused instruction is beneficial. The micro-ops corresponding to the macro-ops are stored as control signals in a configuration, and each macro-op carries a configuration ID that helps locate its configuration. A small configuration cache inside the PFU holds these configurations; on a miss, configurations are loaded from the I-Cache. Moreover, to support bulk commit of atomic superblocks that are larger than the ROB, we have proposed a speculative commit mechanism with a speculative commit register map table that holds the mappings of the speculatively committed instructions.
When all the instructions of the superblock have committed, the speculative state is copied to the backend register rename table. Secondly, we have proposed a co-designed in-order processor with two kinds of FU-based accelerators, each running a pair of fused instructions. We have considered two kinds of instruction fusion: first, fusing a pair of independent loads into a vector load executed on vector load units; second, fusing a pair of dependent simple ALU instructions executed in an interlock-collapsing ALU (ICALU). Moreover, we have evaluated the performance of various code optimizations such as list scheduling, load-store telescoping, and load hoisting, among others, and compared our co-designed processor with small-instruction-window out-of-order processors. Thirdly, we have proposed a co-designed out-of-order processor, reducing complexity in two areas. First, we have co-designed the commit mechanism to enable bulk commit of atomic superblocks: the conventional ROB is replaced by a Superblock Ordering Buffer (SOB), which maintains program order at the granularity of the superblock by bulk-committing the program state. The program state consists of the register state, held in a per-superblock register map table, and the memory state, held in a gated store buffer and updated in bulk. Furthermore, we have tackled the complexity of the out-of-order issue logic by using FIFOs, with an enhanced steering heuristic that fixes the inefficiencies of the existing dependence-based heuristic, and a mechanism to release FIFO entries earlier that further improves the performance of the steering heuristic.
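The dependent-ALU-pair fusion that feeds the ICALU accelerators can be illustrated with a greedy pairing pass over a micro-op list. The (dest, (src1, src2)) encoding and the single-pass greedy policy are simplifying assumptions for illustration, not the thesis's scheduling-based fusion algorithm.

```python
def find_fusable_pairs(ops):
    """Greedily pair a dependent consumer ALU op with its nearest
    earlier producer, the pattern an interlock-collapsing ALU executes.

    ops: list of (dest, (src1, src2)) register-name tuples, in program
    order. Returns (producer_index, consumer_index) pairs; each op joins
    at most one pair.
    """
    fused = set()
    pairs = []
    for j, (_dst_j, srcs_j) in enumerate(ops):
        if j in fused:
            continue
        # Scan backward for the nearest unfused producer of one of j's sources.
        for i in range(j - 1, -1, -1):
            if i not in fused and ops[i][0] in srcs_j:
                pairs.append((i, j))
                fused.update((i, j))
                break
    return pairs
```

A real fusion pass would also check latency, port, and legality constraints before committing a pair; this sketch only captures the dependence pattern.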
2

Bouzidi, Halima. "Efficient Deployment of Deep Neural Networks on Hardware Devices for Edge AI." Electronic Thesis or Diss., Valenciennes, Université Polytechnique Hauts-de-France, 2024. http://www.theses.fr/2024UPHF0006.

Abstract:
Neural Networks (NN) have become a leading force in today's digital landscape. Inspired by the human brain, their intricate design allows them to recognize patterns, make informed decisions, and even predict forthcoming scenarios with impressive accuracy. NN are widely deployed in Internet of Things (IoT) systems, further elevating interconnected devices' capabilities by empowering them to learn and auto-adapt in real-time contexts. However, the proliferation of data produced by IoT sensors makes it difficult to send them to a centralized cloud for processing. This is where edge computing becomes attractive: processing data closer to where it originates, at the edge, reduces latency, enables real-time decisions with less effort, and efficiently manages network congestion. Integrating NN on edge devices for IoT systems enables more efficient and responsive solutions, ushering in a new age of self-sustaining Edge AI. However, deploying NN on resource-constrained edge devices presents a myriad of challenges: (i) the inherent complexity of neural network architectures, which require significant computational and memory capabilities; (ii) the limited power budget of IoT devices, which makes NN inference prone to rapid energy depletion, drastically reducing system utility; (iii) the hurdle of ensuring harmony between NN and HW designs as they evolve at different rates; and (iv) the lack of adaptability to the dynamic runtime environment and the intricacies of input data. Addressing these challenges, this thesis aims to establish innovative methods that extend conventional NN design frameworks, notably Neural Architecture Search (NAS). By integrating HW and runtime contextual features, our methods aspire to enhance NN performance while abstracting away the need for a human in the loop. Firstly, we incorporate HW properties into the NAS by tailoring the design of NN to clock frequency variations (DVFS) to minimize the energy footprint.
Secondly, we leverage dynamicity within NN from a design perspective, culminating in a comprehensive hardware-aware dynamic NAS with DVFS features. Thirdly, we explore the potential of Graph Neural Networks (GNN) at the edge by developing a novel HW-aware NAS with distributed computing features on heterogeneous MPSoC. Fourthly, we address SW/HW co-optimization on heterogeneous MPSoCs by proposing an innovative scheduling strategy that leverages NN adaptability and parallelism across computing units. Fifthly, we explore the prospect of ML4ML (Machine Learning for Machine Learning) by introducing techniques to estimate NN performance on edge devices using neural architectural features and ML-based predictors. Finally, we develop an end-to-end self-adaptive evolutionary HW-aware NAS framework that progressively learns the importance of NN parameters to guide the search process toward Pareto optimality effectively. Our methods contribute to an end-to-end design framework for neural networks on edge hardware devices, leveraging multiple optimization opportunities at both the software and hardware levels and thus improving the performance and efficiency of Edge AI.
3

Li, Ming-Chung (李明峻). "Hw/Sw Data Transfer Optimization Co-Design of Embedded JPEG Image Compress." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/12464595123506991299.


Book chapters on the topic "SW Co-Optimization":

1

Mohamed, Khaled Salah. "HW/SW Co-Optimization and Co-Protection." In Synthesis Lectures on Digital Circuits & Systems, 153–70. Cham: Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-56152-8_6.

2

Salhi, Afef, Fahmi Ghozzi, and Ahmed Fakhfakh. "Approximation Algorithm for Scheduling a Chain of Tasks for Motion Estimation on Heterogeneous Systems MPSoC." In Engineering Problems - Uncertainties, Constraints and Optimization Techniques [Working Title]. IntechOpen, 2021. http://dx.doi.org/10.5772/intechopen.97676.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Co-design of embedded systems is a very important step in digital vehicles and airplanes. The multicore and multiprocessor SoC (MPSoC) started a new computing era. It is increasingly used because it gives designers many more opportunities to meet specific performance targets. Designing embedded systems includes two main phases: (i) HW/SW partitioning, performed from high-level (Eclipse C/C++ or Python, for machine learning and deep learning) functional and architecture models (with virtual and real prototypes); and (ii) software design, performed with significantly more detailed models, using task scheduling and partitioning based on the DAG (Directed Acyclic Graph) and GGEN (Generation Graph Estimation Nodes) algorithms (GGEN generates DAGs automatically). Partitioning decisions are made according to performance assumptions that should be validated on the more refined software models for the ME block and the GGEN algorithm. In this paper, we focus on optimizing execution time and improving video quality through task scheduling and partitioning in a video codec. We show how the test video sequences can be modeled by video size in height and width (three models of task scheduling on four processors). This DAG and GGEN modeling is partitioned on different platforms in OVP (partitioning, SW design). We can thus evaluate the optimization of energy consumption and execution time on SoC and MPSoC platforms.

Conference papers on the topic "SW Co-Optimization":

1

Nezhadi, Ali, Shaahin Angizi, and Arman Roohi. "EaseMiss: HW/SW Co-Optimization for Efficient Large Matrix-Matrix Multiply Operations." In 2022 IEEE 15th Dallas Circuit And System Conference (DCAS). IEEE, 2022. http://dx.doi.org/10.1109/dcas53974.2022.9845629.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Xu, Susan, and Hugh Pollitt-Smith. "Optimization of HW/SW Co-Design: Relevance to Configurable Processor and FPGA Technology." In 2007 Canadian Conference on Electrical and Computer Engineering. IEEE, 2007. http://dx.doi.org/10.1109/ccece.2007.423.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Matsuoka, Yusuke, Patrick Schaumont, Kris Tiri, and Ingrid Verbauwhede. "Java cryptography on KVM and its performance and security optimization using HW/SW co-design techniques." In the 2004 international conference. New York, New York, USA: ACM Press, 2004. http://dx.doi.org/10.1145/1023833.1023874.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Branković, Aleksandar, Kyriakos Stavrou, Enric Gibert, and Antonio González. "Warm-Up Simulation Methodology for HW/SW Co-Designed Processors." In CGO '14: 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization. New York, NY, USA: ACM, 2014. http://dx.doi.org/10.1145/2544137.2544142.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Wu, Youfeng, Shiliang Hu, Edson Borin, and Cheng Wang. "A HW/SW co-designed heterogeneous multi-core virtual machine for energy-efficient general purpose computing." In 2011 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 2011. http://dx.doi.org/10.1109/cgo.2011.5764691.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Corsetto, Nicola, and Francesco Fittipaldi. "Five senses: integrated ergonomic/stylistic design for aircraft interiors." In 15th International Conference on Applied Human Factors and Ergonomics (AHFE 2024). AHFE International, 2024. http://dx.doi.org/10.54941/ahfe1004822.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
In the reference scenario for the design development of aircraft interiors defined through the use of new digital technologies, a related research and development activity supports the application of an innovative operational methodology aimed at achieving the best aesthetic/functional configuration of the final product. Specifically, this is done through the synergistic and equal use of modeling software for stylistic design (CAS), virtual-engineering CAD, and ergonomic design in a virtual environment, followed by a subsequent phase of physical production of carefully selected finished products. The application of this methodological and operational approach has made it possible to develop ergonomic and stylistic furnishing complements for seats and PIS (Personal Information System) for the tourist class of aircraft with low-range mission profiles, focusing on the realization of interior elements for passenger aircraft with significant modular and ergonomic features. The objective of the project proposal "Five senses" combines this operational path with the study and analysis of the state of the art of the design methodologies currently used by manufacturers of aircraft interiors, achieving the highest standards, with a specialized approach developed in multidisciplinary areas. The synthesis of "Five senses" consists in the development and production of interiors focused on the "centrality" of the passenger, with an ergonomic approach aware of anthropometric issues. This takes into account the different physical variations of the world's population, which represent the fundamental "dimensional standards" to consider for a correct design with the necessary sizing criteria.
The core of the project focused on the necessary revisiting of the SW tool (ergonomic simulator), characterizing it not only as a means of ergonomic verification of the designed elements, but above all as a functional-aesthetic co-design tool for the redesigned furniture, in this case the armchairs and the upper information panel. The definition and "identification of stylistic features and systematization of ergonomic design requirements" take place through a process of integration of the factors that concur in "human centered" design, i.e. the structuring basis necessary to set up the ergonomic project, where the preliminary phase (the first technical development of the concept) is obtained, and where the first dimensional, technical, functional and material constraints are placed on the project, functional to the maximum stylistic performance of the interiors. For the definition of volume requirements, habitability studies, visibility studies and reachability analysis, an ergonomic simulator was used as a design tool through the interaction of different specialized software for ergonomic verification and 3D modeling in a virtual environment, with anthropomorphic mannequins in three different dimensional standards of the user populations, in the three representative percentiles (5th percentile woman, 50th percentile man and 95th percentile man), then implemented in the 3D CAS models of the armchairs and upper information panels, verifying the degree of postural comfort. This innovative approach proves helpful to the functional design processes of the cross-disciplinary and multidisciplinary aptitude of the different specialized contexts for the elements involved in the final production of the "Five senses" project.
In conclusion, the key contribution of this approach has been to receive "real time" design support during the concept design phases, considering both aesthetic and functional canons while fully respecting the delicate balance between aesthetics and habitability. Currently, target market demands are increasingly evident, and verification activities in the virtual environment have defined functional pre-approval requirements assessing the designed aircraft against passenger-aircraft habitability standards. To date, performing only ergonomic verifications through virtual simulations and comparing the results with previous studies and musculoskeletal analysis has helped designers to establish the possible causes of discomfort associated with product use; with further methodological optimization of the integrated procedure, the results obtained can hopefully be improved. The approach illustrated here resulted in a "scientific and objective" method, redefining an interior design through a new line of aesthetic/functional adaptation, as an "excellent" stylistic/ergonomic interface between the seat, PIS and the user, implemented in the final elements and their assembly within the aircraft, ready for market.
