Dissertations / Theses: 'Heterogeneous embedded systems'

1

Diarra, Rokiatou. "Automatic Parallelization for Heterogeneous Embedded Systems." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS485.

Abstract:

Recent years have seen growing use of heterogeneous architectures that combine multi-core CPUs with accelerators such as GPUs, FPGAs, and the Intel Xeon Phi. GPUs can deliver significant performance for certain categories of applications. Achieving this performance with low-level APIs (e.g. CUDA, OpenCL), however, requires rewriting the sequential code, knowing the GPU architecture well, and applying complex optimizations that are sometimes not portable. Directive-based programming models (e.g. OpenACC, OpenMP), by contrast, offer a high-level abstraction of the underlying hardware, which simplifies code maintenance and improves productivity. They let users accelerate sequential code on GPUs simply by inserting directives. OpenACC/OpenMP compilers then have the daunting task of applying the necessary optimizations from the user-provided directives and generating efficient code that exploits the GPU architecture. Although these compilers are mature and can apply some optimizations automatically, the generated code may not reach the expected speedup, because the compilers do not have a full view of the whole application. As a result, there is generally a significant performance gap between code accelerated with OpenACC/OpenMP and code hand-optimized with CUDA/OpenCL. To help programmers efficiently accelerate legacy sequential code on GPUs with directive-based models, and to broaden the impact of OpenMP/OpenACC in both academia and industry, this dissertation addresses several research issues. We investigated the OpenACC and OpenMP programming models and proposed an effective methodology for parallelizing applications with directive-based approaches. Our application porting experience revealed that simply inserting OpenMP/OpenACC offloading directives, which tell the compiler that a code region must be compiled for GPU execution, is insufficient. It is essential to combine offloading directives with loop parallelization constructs. Although current compilers are mature and perform several optimizations, the user can give them more information through the clauses of the loop parallelization constructs in order to obtain better-optimized code. We also highlight the challenge of choosing a good loop schedule, including the number of threads that execute a loop: the compiler's default choice may not give the best performance, so the user has to try different settings manually. We demonstrate that the OpenMP and OpenACC programming models can achieve better performance with less programming effort, but that OpenMP/OpenACC compilers quickly reach their limits when the offloaded region is compute- or memory-bound (high arithmetic intensity, a very large number of global memory accesses) and contains several nested loops. In such cases, low-level languages may be needed. We also discuss the pointer aliasing problem in GPU code and propose two static analysis tools that automatically perform, at source level, type qualifier insertion and scalar promotion to resolve aliasing issues.

2

Valente, Frederico Miguel Goulão. "Static analysis on embedded heterogeneous multiprocessor systems." Master's thesis, Universidade de Aveiro, 2008. http://hdl.handle.net/10773/2180.

3

Fischaber, Scott Johan. "Memory-centric system level design of heterogeneous embedded DSP systems." Thesis, Queen's University Belfast, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.491885.

Abstract:

Modern embedded systems for DSP applications are increasingly being implemented on heterogeneous processing architectures, consisting of multiple processors and programmable hardware such as FPGAs. The layered memory structure of FPGAs provides an open platform for memory organisation which many algorithms can benefit from. High level design tools are being developed to target these platforms efficiently; for DSP applications, these tools have often been based around process networks, and as such, their memory architectures typically closely match the simple FIFO buffering employed by these models. This is not always ideal in a hardware implementation, where off-chip memory accesses may be required, particularly when there is data reuse inherent to the algorithm. This thesis proposes a formalised methodology to synthesise efficient memory architectures for FPGA-based DSP systems from a high level dataflow model. This includes reducing the memory requirements of the system through transformations, model refinements and by including the hardware characteristics into the dataflow analysis. Standard dataflow transformations have been characterised so that their effects on the memory subsystem are apparent, and these transformations have been placed appropriately in a memory-centric design flow. The memory generation techniques for hardware cores on these FPGA platforms are also analysed, providing extensions which can reduce memory requirements through automatic sub-scheduling using a range of MoCs. These techniques effectively target the distributed nature of FPGA memories to introduce memory hierarchies into the implementations, targeting any data reuse inherent to the application which can take advantage of the memory architecture. This layered memory approach is used to reduce the number of accesses required to large memories, which in turn can increase performance and reduce power consumption. For a motion estimation algorithm, the required bandwidth for off-chip memory accesses can vary by a factor of a thousand between two DFGs. For a 2-D convolution algorithm, the total required memory is reduced by half through refinement of the system level model. This methodology has been demonstrated in the design of a video encoder and a template matching algorithm and used to efficiently implement the memory sub-systems.

4

Hines, Kenneth J. "Coordination-centric debugging for heterogeneous distributed embedded systems." Thesis, University of Washington, 2000. http://hdl.handle.net/1773/6914.

5

Peterson, Thomas. "Dynamic Allocation for Embedded Heterogeneous Memory : An Empirical Study." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-223904.

Abstract:

Embedded systems are omnipresent and contribute to our lives in many ways by providing functionality within larger systems. To operate, they require well-functioning software and hardware, as well as an interface between the two. The hardware and software of these systems change constantly as new technologies arise. One current change is experimentation with different memory management techniques for RAM, prompted by the invention of novel non-volatile RAM (NVRAM) technologies. These NVRAM technologies often have asymmetric read and write latencies, which motivates designing memory that consists of multiple NVRAMs. As a consequence of these properties and memory designs, there is a need for memory management that minimizes latencies. This thesis addresses the problem of memory allocation on heterogeneous memory by conducting an empirical study. The first part of the study examines allocation techniques based on a free list, a bitmap, and a buddy system, and concludes that the free list technique is superior. Thereafter, multi-bank memory architectures are designed and memory bank selection strategies are established. These strategies are based on size thresholds as well as memory bank occupancies. The evaluation of these strategies did not lead to any major conclusions, but showed that some strategies were more appropriate for some application behaviors.
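As a rough illustration of the ideas in this abstract, the sketch below combines a first-fit free list allocator with a size-threshold bank selection rule. It is not the thesis's implementation: the class `FreeListBank`, the function `allocate`, the two-bank layout, and the fallback to the other bank when the preferred one is full are all assumptions made for the example.

```python
class FreeListBank:
    """One memory bank managed by a first-fit free list (illustrative only)."""

    def __init__(self, name, size):
        self.name = name
        self.size = size
        self.free = [(0, size)]   # sorted list of (offset, length) holes
        self.used = {}            # offset -> length of live allocations

    def occupancy(self):
        """Fraction of the bank currently allocated."""
        return sum(self.used.values()) / self.size

    def alloc(self, n):
        """Return the offset of the first hole that fits n units, or None."""
        for i, (off, length) in enumerate(self.free):
            if length >= n:
                if length == n:
                    del self.free[i]
                else:
                    self.free[i] = (off + n, length - n)
                self.used[off] = n
                return off
        return None

    def release(self, off):
        """Free an allocation and coalesce adjacent holes."""
        n = self.used.pop(off)
        self.free.append((off, n))
        self.free.sort()
        merged = []
        for o, length in self.free:
            if merged and merged[-1][0] + merged[-1][1] == o:
                merged[-1] = (merged[-1][0], merged[-1][1] + length)
            else:
                merged.append((o, length))
        self.free = merged


def allocate(small_bank, large_bank, n, threshold):
    """Hypothetical size-threshold strategy: requests below `threshold`
    prefer `small_bank`, larger ones prefer `large_bank`; the other bank
    is tried only if the preferred one cannot satisfy the request."""
    if n < threshold:
        order = (small_bank, large_bank)
    else:
        order = (large_bank, small_bank)
    for bank in order:
        off = bank.alloc(n)
        if off is not None:
            return bank.name, off
    return None
```

An occupancy-based strategy could be sketched in the same way by ordering the banks with `sorted(banks, key=FreeListBank.occupancy)` before trying them.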

6

Vincenzo, Stoico. "A Model-Driven Approach for modeling Heterogeneous Embedded Systems." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-44199.

Abstract:

Demand for high-performance systems has led designers toward heterogeneous embedded systems (HES). Their complexity highlights the need for methodologies and tools to ease their design. Model-Driven Engineering (MDE) can be crucial in facilitating the design of such systems. Research has demonstrated the use of MDE to create platform-specific models (PSMs). The aim of this work is to support HES design targeting platform-agnostic models. This work is based on a well-defined use case. It comprises a software application, written following the CUDA programming model, executing on a CPU-GPU hardware platform. The use case is analyzed to define the main characteristics of a HES. These concerns are included in a UML profile used to capture all the features of a HES. The profile is built as an extension of the MARTE modeling language. Finally, the Alf action language is applied to make the model executable. The results show the suitability of MARTE and Alf for creating executable HES models. Additional research is needed to further investigate the HES domain. In particular, the validity of the UML profile still needs to be shown for different programming models and hardware platforms.

7

Pop, Traian. "Analysis and Optimisation of Distributed Embedded Systems with Heterogeneous Scheduling Policies." Doctoral thesis, Linköping : Department of Computer and Information Science, Linköpings universitet, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-8934.

8

Eriksson, Jonas. "Partitioning methodology validation for embedded systems design." Thesis, Linköpings universitet, Programvara och system, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-129332.

Abstract:

As modern embedded systems become more sophisticated, the demands on their applications increase significantly. A current trend is to exploit heterogeneous platforms, i.e. platforms consisting of different computational units (e.g. CPU, FPGA or GPU), where different parts of the application can be distributed among the computational units as software and hardware implementations. This technology can improve the application characteristics to meet requirements (e.g. execution time, power consumption and design cost), but it leads to a new challenge: finding the best combination of hardware and software implementations (referred to as the system configuration). The decision whether a part of the application should be implemented in software (e.g. as C code) or hardware (e.g. as VHDL code) affects the entire product life-cycle. This decision is traditionally made manually by the developers early in the design phase. However, as applications grow more complex, there is a rising need for a systematic process that helps the developer make these decisions. Prior to this work, a methodology called MULTIPAR was designed to address this problem. MULTIPAR applies component-/model-based techniques to design the application, i.e. the application is modeled as a number of interconnected components, where some of the components will be implemented as software and the remaining ones as hardware. To perform the partitioning decisions, i.e. determining for each component whether it should be implemented as software or hardware, MULTIPAR proposes a set of formulas to calculate the properties of the entire system based on the properties of each component working in isolation. This thesis aims to show to what extent the proposed system formulas are valid. In particular, it focuses on validating the formulas that calculate the system response time, system power consumption, system static memory and system FPGA area. The formulas were validated through an industrial case study, where the system properties for different system configurations were measured and also calculated by applying these formulas. The measured and calculated values for the system properties were compared by conducting a statistical analysis. The case study demonstrated that the system properties can be accurately calculated by applying the system formulas.

9

Pop, Traian. "Scheduling and Optimisation of Heterogeneous Time/Event-Triggered Distributed Embedded Systems." Licentiate thesis, Linköping : Univ, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-5691.

10

Lifa, Adrian Alin. "Hardware/Software Codesign of Embedded Systems with Reconfigurable and Heterogeneous Platforms." Doctoral thesis, Linköpings universitet, Programvara och system, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-117637.

Abstract:

Modern applications running on today's embedded systems have very high requirements. Most often, these requirements have many dimensions: the applications need high performance as well as flexibility, energy-efficiency as well as real-time properties, fault tolerance as well as low cost. In order to meet these demands, the industry is adopting architectures that are more and more heterogeneous and that have reconfiguration capabilities. Unfortunately, this adds to the complexity of designing streamlined applications that can leverage the advantages of such architectures. In this context, it is very important to have appropriate tools and design methodologies for the optimization of such systems. This thesis addresses the topic of hardware/software codesign and optimization of adaptive real-time systems implemented on reconfigurable and heterogeneous platforms. We focus on performance enhancement for dynamically reconfigurable FPGA-based systems, energy minimization in multi-mode real-time systems implemented on heterogeneous platforms, and codesign techniques for fault-tolerant systems. The solutions proposed in this thesis have been validated by extensive experiments, ranging from computer simulations to proof of concept implementations on real-life platforms. The results have confirmed the importance of the addressed aspects and the applicability of our techniques for design optimization of modern embedded systems.

11

Nikov, Kris. "Power modelling and analysis on heterogeneous embedded systems : a systematic approach." Thesis, University of Bristol, 2018. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.743036.

12

Patel, Hiren Dhanji. "HEMLOCK: HEterogeneous ModeL Of Computation Kernel for SystemC." Thesis, Virginia Tech, 2003. http://hdl.handle.net/10919/9632.

Abstract:

As SystemC gains popularity as a System Level Design Language (SLDL) for System-On-Chip (SOC) designs, heterogeneous modelling and efficient simulation become increasingly important. The key to making an SLDL heterogeneous is the facility to express different Models Of Computation (MOC). Currently, all SystemC models employ a Discrete-Event simulation kernel, making it difficult to express most MOCs without specific designer guidelines. This often makes it unnatural to express different MOCs in SystemC. For the simulation framework, this sometimes results in unnecessary delta cycles for models that depart from the Discrete-Event MOC, hindering the simulation performance of the model. Our goal is to extend SystemC's simulation framework to allow for better modelling expressiveness and efficiency for the Synchronous Data Flow (SDF) MOC. The SDF MOC follows a paradigm where the production and consumption rates of data by a function block are known a priori. These systems are common in Digital Signal Processing applications where relative sample rates are specified for every component. Knowledge of these rates enables the use of static scheduling. Compared to dynamic scheduling of SDF models, static scheduling gives a noticeable improvement in simulation efficiency. We implement an extension to the SystemC kernel that exploits such static scheduling for SDF models and propose designer style guidelines for modelers to use this extension. The modelling paradigm becomes more natural to SDF, which results in better simulation efficiency. We will distribute our implementation to the SystemC community to demonstrate that SystemC can be a heterogeneous SLDL.
Master of Science
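
The static scheduling that the abstract relies on can be sketched independently of SystemC. The Python below is a minimal illustration, not the kernel extension described in the thesis: it solves the SDF balance equations for the smallest integer firing counts and then builds one sequential schedule by firing actors whose inputs hold enough tokens. The function names, the edge tuple layout `(src, dst, produce, consume)`, and the assumptions of a connected graph with at most one edge per actor pair are all choices made for this example.

```python
from fractions import Fraction
from math import lcm  # variadic lcm requires Python 3.9+


def repetition_vector(actors, edges):
    """Smallest positive integer firing counts q such that
    q[src] * produce == q[dst] * consume holds on every edge."""
    rate = {actors[0]: Fraction(1)}
    pending = list(edges)
    while pending:
        progressed = False
        for edge in list(pending):
            src, dst, p, c = edge
            if src in rate and dst in rate:
                if rate[src] * p != rate[dst] * c:
                    raise ValueError("inconsistent rates: no static schedule")
            elif src in rate:
                rate[dst] = rate[src] * p / c
            elif dst in rate:
                rate[src] = rate[dst] * c / p
            else:
                continue  # neither end reached yet; retry on a later pass
            pending.remove(edge)
            progressed = True
        if not progressed:
            raise ValueError("graph is not connected")
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(r * scale) for a, r in rate.items()}


def static_schedule(actors, edges, initial_tokens=None):
    """One firing order for a single graph iteration, computed before
    simulation; raises if the graph deadlocks."""
    q = repetition_vector(actors, edges)
    tokens = {(s, d): 0 for s, d, _, _ in edges}
    tokens.update(initial_tokens or {})
    remaining = dict(q)
    order = []
    while any(remaining.values()):
        for a in actors:
            if remaining[a] == 0:
                continue
            if all(tokens[(s, d)] >= c for s, d, _, c in edges if d == a):
                for s, d, p, c in edges:
                    if d == a:
                        tokens[(s, d)] -= c   # consume inputs
                for s, d, p, c in edges:
                    if s == a:
                        tokens[(s, d)] += p   # produce outputs
                remaining[a] -= 1
                order.append(a)
                break
        else:
            raise ValueError("deadlock: add initial tokens on a cycle")
    return order
```

For a chain in which A produces 2 tokens per firing, B consumes 3 and produces 1, and C consumes 2, `static_schedule(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)])` yields the order A, A, A, B, B, C. Because this order is fixed before simulation starts, a simulator can replay it without per-firing readiness checks, which is the efficiency argument the abstract makes for static over dynamic scheduling.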

13

Hegde, Sridhar. "FUNCTIONAL ENHANCEMENT AND APPLICATIONS DEVELOPMENT FOR A HYBRID, HETEROGENEOUS SINGLE-CHIP MULTIPROCESSOR ARCHITECTURE." UKnowledge, 2004. http://uknowledge.uky.edu/gradschool_theses/252.

Abstract:

Reconfigurable and dynamic computer architecture is an exciting area of research that is rapidly expanding to meet the requirements of compute-intensive real-time and non-real-time applications in key areas such as cryptography and signal/radar processing. To meet the demands of such applications, a parallel single-chip heterogeneous Hybrid Data/Command Architecture (HDCA) has been proposed. This single-chip multiprocessor architecture is reconfigurable at three levels: application, node and processor. It is currently being developed and experimentally verified via a three-phase prototyping process. A first-phase prototype with very limited functionality has been developed. This initial prototype was used as a base for further enhancements to functionality and performance, resulting in a second-phase virtual prototype, which is the subject of this thesis. The major contributions of the work reported here are: enhancing the functionality of the system by adding processors, making the system reconfigurable at the node level, enhancing the ability of the system to fork to more than two processes, and designing more complex real-time and non-real-time applications that use, and can be used to test and evaluate, the enhanced and new functionality added to the architecture. A working proof of concept of the architecture is achieved by Hardware Description Language (HDL) based development and use of a Virtual Prototype of the architecture. The Virtual Prototype was used to evaluate the architecture's functionality and performance in executing several newly developed example applications. Recommendations are made to further improve the system functionality.

14

Souza, Jeckson Dellagostin. "A reconfigurable heterogeneous multicore system with homogeneous ISA." Biblioteca Digital de Teses e Dissertações da UFRGS, 2016. http://hdl.handle.net/10183/140321.

Abstract:

Given the large diversity of embedded applications found in current portable devices, both thread-level and instruction-level parallelism must be exploited to obtain gains in performance and energy. While MPSoCs (multiprocessor systems-on-chip) are widely used for this purpose, they fall short on software productivity, since they comprise different ISAs (Instruction Set Architectures) that must be programmed separately. General purpose multicores, on the other hand, implement the same ISA, but are composed of a homogeneous set of very power-hungry superscalar processors. In this dissertation, we show how a reconfigurable unit can be used to provide a number of different heterogeneous configurations while still sustaining the same ISA, reaching high performance at low energy cost. To ensure ISA compatibility, we use a binary translation mechanism that transforms code to be executed on the reconfigurable fabric at run-time. Using representative benchmarks, we show that one version of the heterogeneous system can outperform its homogeneous counterpart on average by 59% in performance and 10% in energy, with EDP (Energy-Delay Product) improvements in almost every scenario. Furthermore, this work also proposes and evaluates six schedulers for the heterogeneous system: two static algorithms, which allocate the threads on the first free core, where they run for the entire execution; an Instruction Count (IC) Driven scheduler, which reallocates threads at synchronization points according to their instruction count; a Feedback scheduler, which uses data from inside the reconfigurable unit to reallocate threads; the PC-Feedback scheduler, which adds a data reuse mechanism to the Feedback scheduler; and an Oracle scheduler, which is capable of deciding the best possible thread allocation. We show that the static algorithm can reach high performance in applications with high parallelism; for more uniform performance across all applications, however, the Feedback and PC-Feedback algorithms are better suited.

15

Mendoza Cervantes, Francisco. "A Problem-Oriented Approach for Dynamic Verification of Heterogeneous Embedded Systems." Karlsruhe : KIT Scientific Publishing, 2014. http://www.ksp.kit.edu.

16

Oliva, Venegas Yaset. "High level modeling of run-time managers for the design of heterogeneous embedded systems." Rennes, INSA, 2012. http://www.theses.fr/2012ISAR0017.

Abstract:

To cope with the ever increasing difficulty of designing embedded systems, designers have had to turn to methods and tools that raise the level of abstraction of the description. These modern systems usually include multiple complex digital signal processing functions and are increasingly built around heterogeneous (software and hardware) blocks. In this context, the current trend is to incorporate a kernel that manages these different processing blocks in a flexible and dynamic manner. When these blocks become too complex to handle, additional services can be added to the kernel to manage them. This is particularly the case when the functions may run on multiple heterogeneous processors or in reconfigurable hardware areas. Considering all these aspects, the first part of this thesis is a contribution to a design tool and proposes a methodology for exploring the structure of an embedded system. The tool allows the three basic elements of a system, namely the application, the architecture and the operating system (kernel), to be specified from high level models. The methodology consists in specifying, simulating and analyzing these three basic elements. The exploration is performed iteratively until a satisfactory solution is reached. The second part of the thesis focuses on an extension of a kernel model that dynamically manages the migration of tasks between different processing blocks (processors or reconfigurable areas). The proposed service is designed for shared-memory architectures containing a kernel that supports a master/slave configuration. This Offloading service complements the kernel model by adding new features (task migration, heterogeneous task management and smart placement).

17

Nam, HyunSuk. "Security-driven Design Optimization of Mixed Cryptographic Implementations in Distributed, Reconfigurable, and Heterogeneous Embedded Systems." Diss., The University of Arizona, 2017. http://hdl.handle.net/10150/624287.

Abstract:

Distributed heterogeneous embedded systems are increasingly prevalent in numerous applications, including automotive, avionics, smart and connected cities, and the Internet of Things. With pervasive network access within these systems, security is a critical design concern. This dissertation presents a modeling and optimization framework for distributed, reconfigurable, and heterogeneous embedded systems. Distributed embedded systems consist of numerous interconnected embedded devices, each composed of different computing resources, such as single-core processors, asymmetric multicore processors, field-programmable gate arrays (FPGAs), and various combinations thereof. A dataflow-based modeling framework for streaming applications integrates models for computational latency, mixed cryptographic implementations for inter-task and intra-task communication, security levels, communication latency, and power consumption. For the security model, we present a level-based modeling of cryptographic algorithms using mixed cryptographic implementations, including both symmetric and asymmetric implementations. We use a multi-objective genetic optimization algorithm to optimize security and energy consumption subject to latency and minimum security level constraints. The presented methodology is evaluated using a video-based object detection and tracking application and several synthetic benchmarks representing various application types. Experimental results for these design and optimization frameworks demonstrate the benefits of the mixed cryptographic algorithm security model compared to single cryptographic algorithm alternatives. We further consider several distributed heterogeneous embedded system architectures.

18

Leija, Antonio M. "AN INVESTIGATION INTO PARTITIONING ALGORITHMS FOR AUTOMATIC HETEROGENEOUS COMPILERS." DigitalCommons@CalPoly, 2015. https://digitalcommons.calpoly.edu/theses/1546.

Abstract:

Automatic Heterogeneous Compilers (AHCs) allow blended hardware-software solutions to be explored without the cost of a full-fledged design team, but limited research exists on the partitioning algorithms responsible for separating hardware and software. The purpose of this thesis is to implement various partitioning algorithms on the same automatic heterogeneous compiler platform to create an apples-to-apples comparison of AHC partitioning algorithms. Both estimated outcomes and actual outcomes for the solutions generated are studied and scored. The platform used to implement the algorithms is Cal Poly’s own Twill compiler, created by Doug Gallatin last year. Twill’s original partitioning algorithm is chosen along with two other partitioning algorithms: Tabu Search + Simulated Annealing (TSSA) and Genetic Search (GS). These algorithms are implemented inside Twill, and test bench input code from the CHStone HLS benchmark suite is used as stimulus. Along with the algorithms’ cost models, one key attribute of interest is the number of queues generated, since every cut between hardware and software requires a queue to pass data across the partition crossing. These high communication costs can end up damaging the heterogeneous solution’s performance. The Genetic, TSSA, and Twill’s original partitioning algorithms are all scored against each other’s cost models as well, combining the fitness and performance cost models with queue counts to evaluate each partitioning algorithm. The solutions generated by TSSA are rated as better by both the cost model for the TSSA algorithm and the cost model for the Genetic algorithm while producing low queue counts.

19

Gantel, Laurent. "Hardware and software architecture facilitating the operation by the industry of dynamically adaptable heterogeneous embedded systems." Phd thesis, Université de Cergy Pontoise, 2014. http://tel.archives-ouvertes.fr/tel-01019909.

Abstract:

This thesis aims to define software and hardware mechanisms that help manage Heterogeneous and dynamically Reconfigurable Systems-on-Chip (HRSoC). The heterogeneity is due to the presence of general-purpose processing units and reconfigurable IPs. Our objective is to give the application developer an abstracted view of this heterogeneity with regard to mapping tasks onto the available processing elements. First, we homogenize the user interface by defining a hardware thread model. Then, we homogenize the management of hardware threads. We implemented OS services that save and restore a hardware thread context. Design tools have also been developed to overcome the relocation issue. The last step consisted in extending access to the distributed OS services to every thread running on the platform. This access is provided independently of the thread location and is realized by implementing the MRAPI API. With these three steps, we build a solid basis for providing the developer, in future work, with a design flow dedicated to HRSoC that allows precise architectural space exploration. Finally, to validate these mechanisms, we build a demonstration platform on a Virtex 5 FPGA running a dynamic tracking application.

20

Jädal, Thomas, and Dissel Dirk Postol. "Dynamic Bandwidth Allocation for Wireless Nodes with Software Defined Networking in Heterogeneous Networks with Embedded Systems." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-44115.

Abstract:

In the wireless computer networks of today, there is a limitation on how wireless access points handle bandwidth distribution to wireless nodes, both in Software Defined Networking (SDN) solutions, traditional networks with routers and switches, and in embedded systems. The limitation is the lack of control of the node bandwidth usage. Quality of Service exists, but it limits the bandwidth on the access point or the switch and not on the node itself. Thus, when several nodes connect to a single wireless access point, they compete for bandwidth, and therefore there is a need to allocate bandwidth to the nodes directly. Through the use of SDN, it would be possible to make this dynamic access control work with rapidly changing networks, both wired and wireless. This thesis is a proof of concept with actual hardware, and answers the question on how we can implement dynamic bandwidth allocation in a heterogeneous network with SDN and how to make it dynamic with the adding and removal of nodes. The solution achieves the dynamic bandwidth allocation by running a program in parallel to the SDN controller together with additional software on the nodes. The solution makes it possible to share the bandwidth between nodes and through priorities manipulate how much of the total bandwidth each node receives in comparison to the other nodes. The measured results show that the program has a manageable overhead and works with several nodes. The thesis aims to widen the view on viable SDN approaches and inspire future research on wireless SDN solutions.
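
The abstract above describes sharing bandwidth between nodes according to priorities, re-evaluated as nodes join and leave. The thesis's actual allocation rule is not stated here, so the following is only a minimal sketch assuming a simple proportional-share rule; the function name `allocate_bandwidth` and the node names are hypothetical.

```python
def allocate_bandwidth(total_mbps, priorities):
    """Split total_mbps among nodes in proportion to their priority weights.

    Assumption: proportional sharing. The thesis may use a different rule.
    """
    weight_sum = sum(priorities.values())
    if weight_sum <= 0:
        raise ValueError("at least one node must have a positive priority")
    return {node: total_mbps * w / weight_sum for node, w in priorities.items()}


# Three nodes share a 100 Mbit/s access point; node_a has twice the priority.
nodes = {"node_a": 2, "node_b": 1, "node_c": 1}
shares = allocate_bandwidth(100.0, nodes)

# When a node leaves, the allocation is simply recomputed for the remaining nodes.
del nodes["node_c"]
shares_after_removal = allocate_bandwidth(100.0, nodes)
```

Recomputing the whole allocation on every join or leave keeps the sketch stateless; a real controller would also have to push the new limits to the nodes.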

21

Ringenson, Josefin. "Efficiency of CNN on Heterogeneous Processing Devices." Thesis, Linköpings universitet, Programvara och system, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-155034.

Abstract:

In the development of advanced driver assistance systems, computer vision problems need to be optimized to run efficiently on embedded platforms. Convolutional neural network (CNN) accelerators have proven to be very efficient for embedded camera platforms, such as the ones used for automotive vision systems. Therefore, the focus of this thesis is to evaluate the efficiency of a CNN on a future embedded heterogeneous processing device. The memory size in an embedded system is often very limited, and it is necessary to divide the input into multiple tiles. In addition, there are power and speed constraints that need to be met to be able to use a computer vision system in a car. To increase efficiency and optimize the memory usage, different methods for CNN layer fusion are proposed and evaluated for a variety of tile sizes. Several different layer fusion methods and input tile sizes are chosen as optimal solutions, depending on the depth of the layers in the CNN. The solutions investigated in the thesis are most efficient for deep CNN layers, where the number of channels is high.

22

Dekkiche, Djamila. "Programming methodologies for ADAS applications in parallel heterogeneous architectures." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS388/document.

Abstract:

La vision par ordinateur est primordiale pour la compréhension et l’analyse d’une scène routière afin de construire des systèmes d’aide à la conduite (ADAS) plus intelligents. Cependant, l’implémentation de ces systèmes dans un réel environnement automobile et loin d’être simple. En effet, ces applications nécessitent une haute performance de calcul en plus d’une précision algorithmique. Pour répondre à ces exigences, de nouvelles architectures hétérogènes sont apparues. Elles sont composées de plusieurs unités de traitement avec différentes technologies de calcul parallèle: GPU, accélérateurs dédiés, etc. Pour mieux exploiter les performances de ces architectures, différents langages sont nécessaires en fonction du modèle d’exécution parallèle. Dans cette thèse, nous étudions diverses méthodologies de programmation parallèle. Nous utilisons une étude de cas complexe basée sur la stéréo-vision. Nous présentons les caractéristiques et les limites de chaque approche. Nous évaluons ensuite les outils employés principalement en terme de performances de calcul et de difficulté de programmation. Le retour de ce travail de recherche est crucial pour le développement de futurs algorithmes de traitement d’images en adéquation avec les architectures parallèles avec un meilleur compromis entre les performances de calcul, la précision algorithmique et la difficulté de programmation
Computer Vision (CV) is crucial for understanding and analyzing the driving scene in order to build more intelligent Advanced Driver Assistance Systems (ADAS). However, implementing CV-based ADAS in a real automotive environment is not straightforward. Indeed, CV algorithms combine the challenges of high computing performance and algorithmic accuracy. To meet these requirements, new heterogeneous architectures have been developed. They consist of several processing units with different parallel computing technologies, such as GPUs and dedicated accelerators. To better exploit the performance of such architectures, different languages are required depending on the underlying parallel execution model. In this work, we investigate various parallel programming methodologies based on a complex case study of stereo vision. We present the relevant features and limitations of each approach. We then evaluate the programming tools employed, mainly in terms of computing performance and programming effort. The feedback from this research is crucial for the development of future CV algorithms suited to parallel architectures, with a better trade-off between computing performance, algorithmic accuracy, and programming effort.

23

Wolvers, Adrianus Hendrikus Cornelis. "Integrating requirements authoring and design tools for heterogeneous and multicore embedded systems. : Using the iFEST Tool Integration Framework." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-18712.

Abstract:

In today’s practical reality, many different tools are used in their respective phases of the system development lifecycle. Every tool employs its own underlying metamodel, and these metamodels tend to vary greatly in size and complexity, making them difficult to integrate. One solution to this problem is to build a tool integration framework that is based on a single, shared metamodel. The iFEST project aims to specify and develop such a tool integration framework for tools used in the development of heterogeneous and multi-core embedded systems. This framework is known as the iFEST Tool Integration Framework or iFEST IF. The iFEST IF uses Web services based on the Open Services for Lifecycle Collaboration (OSLC) standards and specifications to make the tools within the tool chain communicate with each other. To validate the framework, an industrial case study called ‘Wind Turbine’, using several embedded systems tools, has been carried out. Tools used to design, implement and test a controller for a wind turbine have been integrated in a prototype tool chain. To expose tools’ internal data through Web services, a tool adaptor is needed. This work reports on the development of such a tool adaptor for the Requirements Management module of HP Application Lifecycle Management (ALM), one of the tools used in the Wind Turbine industrial case study. A generalization of the challenges faced while developing the tool adaptor is made. These challenges indicate that, despite having a tool integration framework, tool integration can still be a difficult task with many obstacles to overcome, especially when tools are not developed with tool integration in mind from the start.
Idag existerar det en mängd olika verktyg som kan appliceras i respektive fas i systemutvecklings livscykel. Varje verktyg använder sin egna underliggande metamodell. Dessa metamodeller kan variera avsevärt i både storlek och komplexitet, vilket gör dem svåra att integrera. En lösning på detta problem är att bygga ett ramverk för verktygsintegration som baseras på en enda, gemensam metamodell. iFEST-projektets mål är att specificera och utveckla ett ramverk för verktygsintegration för verktyg som används i utvecklingen av heterogena och multi-core inbyggda system. Detta ramverk benämns iFEST Tool Integration Framework eller iFEST IF. iFEST IF använder webbtjänster baserade på en standard som kallas OSLC, Open Services for Lifecycle Collaboration samt specifikationer som gör att verktygen i verktygskedjan kan kommunicera med varandra. För att validera ramverket har en fallstudie vid namn ”Wind Turbine” gjorts med flertal inbyggda systemverktyg. Verktyg som används för att designa, implementera och testa en styrenhet för vindturbiner har integrerats i prototyp av en verktygskedja. För att bearbeta och behandla intern data genom webbtjänster behövs en verktygsadapter. Detta arbete redogör utvecklingen av en verktygsadapter för kravhanteringsmodulen HP Application Lifecycle Management (ALM), ett av de verktyg som använts i fallstudien av vindturbinen. En generalisering av de utmaningar som uppstod under utvecklingen av verktygsadaptern har genomförts. Dessa utmaningar indikerar att, trots att det finns ett ramverk för verktygsintegration så är verktygsintegration fortfarande vara en svår uppgift att få bukt med. Detta gäller särskilt när verktyg inte är utvecklade med hänsyn till verktygsintegration från början.
ARTEMIS iFEST

24

Motta, Rodrigo Bittencourt. "Reduzindo o consumo de energia em MPSoCs heterogêneos via clock gating." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2008. http://hdl.handle.net/10183/15312.

Abstract:

Nesse trabalho é apresentada uma arquitetura que habilita a geração de MPSoCs (Multiprocessors Systems-on-Chip) heterogêneos escaláveis, baseados em barramento, suportando ainda o uso de diferentes organizações de memória. A comunicação entre as tarefas é especificada por meio de uma estrutura de memória compartilhada, que evita colisões e promove ganhos energéticos através do disparo dinâmico de clock gating. Também é introduzida a técnica DCF (Dynamic Core Freezing), que incrementa a eficiência energética do MPSoC tirando proveito dos ciclos ociosos dos processadores durante os acessos à memória. Mais, a combinação das organizações de memória propostas habilita a exploração de migração de tarefas na arquitetura proposta, por meio da troca de contexto das tarefas na memória compartilhada. Além disso, é mostrado o simulador de alto-nível, baseado na arquitetura proposta, criado com o propósito de extrair os ganhos energéticos propiciados com o uso do clock gating e da técnica DCF. O simulador aceita como entrada arquivos de trace de execução de aplicações Java, com os quais ele gera um novo arquivo contendo o mapeamento das instruções encontradas nos arquivos de trace para diferentes classes de instrução. Dessa forma, podem ser modeladas diferentes arquiteturas de processadores, usando o arquivo com o mapeamento para simular o MPSoC. Mais, o simulador habilita ainda a exploração das diferentes organizações de memória da arquitetura proposta, de maneira que se pode estimar o seu impacto no número de instruções executadas, contenções no barramento, e consumo energético. Experimentos baseados em uma aplicação sintética, executando em um MPSoC composto por diferentes versões de um processador Java mostram um grande aumento na eficiência energética com um custo mínimo em área. Além disso, também são apresentados experimentos baseados em aplicações do benchmark SPECjvm98, que mostram o impacto causado na eficiência energética quando o tipo de aplicação é alterado. 
Mais, os experimentos mostram drásticos ganhos energéticos obtidos com a aplicação da técnica DCF sobre as memórias do MPSoC.
In this work we present an architecture that enables the generation of scalable, bus-based heterogeneous Multiprocessor Systems-on-Chip (MPSoCs) and supports different memory organizations. Intertask communication is specified by means of a shared memory structure that avoids collisions and saves energy by dynamically triggering clock gating. We also introduce the Dynamic Core Freezing (DCF) technique, which increases energy savings by taking advantage of processor idle cycles during memory accesses. Moreover, combining the proposed memory organizations lets the architecture exploit task migration by saving task contexts in the shared data memory. We also present a high-level simulator, based on the proposed architecture, created to quantify the energy savings obtained with clock gating and the DCF technique. The simulator accepts as input execution trace files of Java applications, from which it generates a new file that maps the instructions found in the trace files to different instruction classes. In this way, different processor architectures can be modeled, using the mapping file to simulate the MPSoC. The simulator also lets us experiment with different memory organizations to estimate their impact on the number of executed instructions, bus contention, and energy consumption. As a case study, we modeled different versions of a Java processor in order to experiment with different execution patterns over different memory organizations. Experiments based on a synthetic application running on an MPSoC containing different versions of a Java processor show a large improvement in energy efficiency at a minimal area cost. We also present experiments based on applications from the SPECjvm98 benchmark, which show how energy efficiency changes with the application type.
Finally, the experiments show a very large improvement in energy efficiency when the DCF technique is applied to the MPSoC memories.

25

Radhakrishnan, Swarnalatha, Computer Science & Engineering, Faculty of Engineering, UNSW. "Heterogeneous multi-pipeline application specific instruction-set processor design and implementation." Awarded by: University of New South Wales. Computer Science and Engineering, 2006. http://handle.unsw.edu.au/1959.4/29161.

Abstract:

Embedded systems are becoming ubiquitous, primarily due to the fast evolution of digital electronic devices. Modern embedded systems must exhibit high performance and reliability, yet have short design time and low cost. Application-specific instruction-set processors (ASIPs) are widely used in embedded systems since they are economical, flexible, and reusable (thus saving design time). During the last decade, research on ASIPs has been carried out mainly for single-pipeline processors. Processor performance can be improved by exploiting the parallelism available in the program. Designing multiple parallel execution paths in the processor naturally incurs additional cost. The methodology presented in this dissertation addresses the problem of improving performance in ASIPs at minimal additional cost. The devised methodology explores the available parallelism of an application to generate a multi-pipeline heterogeneous ASIP. The processor design is application specific. No pre-defined IPs are used in the design. The generated processor contains multiple standalone pipelined data paths, which are not necessarily identical and are connected by the necessary bypass paths and control signals. Each pipeline has its own control unit (though all share the same clock), resulting in a simple and cost-effective design. By using separate instruction and data memories (Harvard architecture) and by allowing memory access by two separate pipes, the complexity of the controller and buses is reduced. The impact of higher memory latencies is nullified by utilizing parallel pipes during memory access. Efficient bypass network selection and encoding techniques provide a better implementation. The initial design approach, with only two pipelines and no bypass paths, shows speed improvements of up to 36% and switching activity reductions of up to 11%. The additional area cost is around 16%.
An improved design with an application-dependent number of pipelines (more than two) shows an average performance improvement of 77%, with overheads of 49% in area, 51% in leakage power, 17% in switching activity, and 69% in code size. The design was further trimmed with bypass path selection and encoding techniques, which save up to 32% of area and 34% of leakage power, with a 6% performance improvement and a 69% code size reduction, compared to the multi-pipeline design without these techniques.

26

Robino, Francesco. "A model-based design approach for heterogeneous NoC-based MPSoCs on FPGA." Licentiate thesis, KTH, Elektroniksystem, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-145521.

Abstract:

Network-on-chip (NoC) based multi-processor systems-on-chip (MPSoCs) are promising candidates for future multi-processor embedded platforms, which are expected to be composed of hundreds of heterogeneous processing elements (PEs) to potentially provide high performance. However, together with performance, system complexity will increase, and new high-level design techniques will be needed to efficiently model, simulate, debug and synthesize these systems. System-level design (SLD) is considered to be the next frontier in electronic design automation (EDA). It enables the description of embedded systems in terms of abstract functions and interconnected blocks. A promising complementary approach to SLD is the use of models of computation (MoCs) to formally describe the execution semantics of functions and blocks through a set of rules. However, even when this formalization is used, there is no clear way to synthesize system-level models into software (SW) and hardware (HW) towards a NoC-based MPSoC implementation, i.e., there is a lack of system design automation (SDA) techniques to rapidly synthesize and prototype system-level models onto heterogeneous NoC-based MPSoCs. In addition, many of the proposed solutions require large overhead in terms of SW components and memory requirements, resulting in complex and customized multi-processor platforms. In order to tackle the problem, a novel model-based SDA flow has been developed as part of the thesis. It starts from a system-level specification, where functions execute according to the synchronous MoC, and then it can rapidly prototype the system onto an FPGA configured as a heterogeneous NoC-based MPSoC. In the first part of the thesis, the HeartBeat model is proposed as a model-based technique which fills the abstraction gap between the abstract system-level representation and its implementation on the multiprocessor prototype.
Then details are provided to describe how this technique is automated to rapidly prototype the modeled system on a flexible platform, allowing the system specification to be adjusted until the designer is satisfied with the results. Finally, the proposed SDA technique is improved by defining a methodology to automatically explore possible design alternatives for the modeled system to be implemented on a heterogeneous NoC-based MPSoC. The goal of the exploration is to find an implementation satisfying the designer's requirements, which can be integrated in the proposed SDA flow. Through the proposed SDA flow, the designer is relieved from implementation details and the design time of systems targeting heterogeneous NoC-based MPSoCs on FPGA is significantly reduced. In addition, it reduces possible design errors by proposing a completely automated technique for fast prototyping. Compared to other SDA flows, the proposed technique targets a bare-metal solution, avoiding the use of an operating system (OS). This reduces the memory requirements on the FPGA platform compared to related work targeting MPSoCs on FPGA. At the same time, the performance (throughput) of the modeled applications can be increased when the number of processors of the target platform is increased. This is shown through a wide set of case studies implemented on FPGA.

27

Abdallah, Fadel. "Optimization and Scheduling on Heterogeneous CPU/FPGA Architecture with Communication Delays." Thesis, Université de Lorraine, 2017. http://www.theses.fr/2017LORR0301.

Abstract:

Le domaine de l'embarqué connaît depuis quelques années un essor important avec le développement d'applications de plus en plus exigeantes en calcul auxquels les architectures traditionnelles à base de processeurs (mono/multi cœur) ne peuvent pas toujours répondre en termes de performances. Si les architectures multiprocesseurs ou multi cœurs sont aujourd'hui généralisées, il est souvent nécessaire de leur adjoindre des circuits de traitement dédiés, reposant notamment sur des circuits reconfigurables, permettant de répondre à des besoins spécifiques et à des contraintes fortes particulièrement lorsqu'un traitement temps-réel est requis. Ce travail présente l'étude des problèmes d'ordonnancement dans les architectures hétérogènes reconfigurables basées sur des processeurs généraux (CPUs) et des circuits programmables (FPGAs). L'objectif principal est d'exécuter une application présentée sous la forme d'un graphe de précédence sur une architecture hétérogène CPU/FPGA, afin de minimiser le critère de temps d'exécution total ou makespan (Cmax). Dans cette thèse, nous avons considéré deux cas d'étude : un cas d'ordonnancement qui tient compte des délais d'intercommunication entre les unités de calcul CPU et FPGA, pouvant exécuter une seule tâche à la fois, et un autre cas prenant en compte le parallélisme dans le FPGA, qui peut exécuter plusieurs tâches en parallèle tout en respectant la contrainte surfacique. Dans un premier temps, pour le premier cas d'étude, nous proposons deux nouvelles approches d'optimisation, GAA (Genetic Algorithm Approach) et MGAA (Modified Genetic Algorithm Approach), basées sur des algorithmes génétiques. Nous proposons également de tester un algorithme par séparation et évaluation (méthode Branch & Bound). 
Les approches GAA et MGAA proposées offrent un très bon compromis entre la qualité des solutions obtenues (critère d'optimisation de makespan) et le temps de calcul nécessaire à leur obtention pour résoudre des problèmes à grande échelle, en comparant à la méthode par séparation et évaluation (Branch & Bound) proposée et l'autre méthode exacte proposée dans la littérature. Dans un second temps, pour le second cas d'étude, nous avons proposé et implémenté une méthode basée sur les algorithmes génétiques pour résoudre le problème du partitionnement temporel dans un circuit FPGA en utilisant la reconfiguration dynamique. Cette méthode fournit de bonnes solutions avec des temps de calcul raisonnables. Nous avons ensuite amélioré notre précédente approche MGAA afin d'obtenir une nouvelle approche intitulée MGA (Multithreaded Genetic Algorithm), permettent d'apporter des solutions au problème de partitionnement. De plus, nous avons également proposé un algorithme basé sur le recuit simulé, appelé MSA (Multithreaded Simulated Annealing). Ces deux approches proposées, basées sur les méthodes métaheuristiques, permettent de fournir des solutions approchées dans un intervalle de temps très raisonnable aux problèmes d'ordonnancement et de partitionnement sur système de calcul hétérogène
Embedded systems have grown considerably in recent years with the development of increasingly computationally demanding applications, whose performance requirements traditional processor-based architectures (single or multi-core) cannot always meet. While multiprocessor or multicore architectures are now widespread, it is often necessary to complement them with dedicated processing circuits, based in particular on reconfigurable circuits, to meet specific needs and strong constraints, especially when real-time processing is required. This work studies scheduling problems in reconfigurable heterogeneous architectures based on general-purpose processors (CPUs) and programmable circuits (FPGAs). The main objective is to run an application presented in the form of a Data Flow Graph (DFG) on a heterogeneous CPU/FPGA architecture so as to minimize the total running time, or makespan (Cmax). In this thesis, we consider two case studies: a scheduling case that takes into account the intercommunication delays and where the FPGA device can perform a single task at a time, and another case that takes into account parallelism in the FPGA, which can perform several tasks in parallel while respecting the area constraint. For the first case, we propose two new optimization approaches based on genetic algorithms, GAA (Genetic Algorithm Approach) and MGAA (Modified Genetic Algorithm Approach). We also compare these algorithms to a Branch & Bound method. The proposed approaches (GAA and MGAA) offer a very good compromise between the quality of the solutions obtained (makespan optimization criterion) and the computation time required to solve large-scale problems, in contrast to the proposed Branch & Bound method and the other exact methods found in the literature.
For the second case, we first implemented a method based on genetic algorithms to solve the temporal partitioning problem in an FPGA circuit using dynamic reconfiguration. This method provides good solutions in a reasonable running time. Then, we improved our previous MGAA approach to obtain a new approach called MGA (Multithreaded Genetic Algorithm), which provides solutions to the partitioning problem. In addition, we also proposed an algorithm based on simulated annealing, called MSA (Multithreaded Simulated Annealing). These two approaches, which are based on metaheuristic methods, provide approximate solutions within a reasonable time to the scheduling and partitioning problems on a heterogeneous computing system.
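
To make the optimization criterion in the abstract above concrete, the following is a minimal sketch of how the makespan (Cmax) of one candidate solution could be evaluated for the first case study. It is not the thesis's GAA or MGAA; it only illustrates the kind of fitness function a genetic algorithm would minimize. The assumptions are: each unit (CPU, FPGA) runs one task at a time, a fixed communication delay applies whenever a task and its predecessor sit on different units, and tasks are visited in a given topological order. The function name `makespan`, the task names, and all timing values are hypothetical.

```python
def makespan(order, preds, exec_time, assignment, comm_delay):
    """Return Cmax for a task-to-unit assignment under list scheduling.

    order      -- tasks in a topological order of the precedence graph
    preds      -- dict: task -> list of predecessor tasks
    exec_time  -- dict: task -> {"CPU": t, "FPGA": t}
    assignment -- dict: task -> "CPU" or "FPGA"
    comm_delay -- delay added when predecessor and task are on different units
    """
    unit_free = {"CPU": 0.0, "FPGA": 0.0}  # time at which each unit becomes idle
    finish = {}
    for task in order:
        unit = assignment[task]
        ready = 0.0
        for p in preds.get(task, []):
            delay = comm_delay if assignment[p] != unit else 0.0
            ready = max(ready, finish[p] + delay)
        start = max(ready, unit_free[unit])
        finish[task] = start + exec_time[task][unit]
        unit_free[unit] = finish[task]
    return max(finish.values())


# Hypothetical four-task graph: A -> C, B -> D, C -> D.
preds = {"C": ["A"], "D": ["B", "C"]}
exec_time = {
    "A": {"CPU": 2, "FPGA": 1},
    "B": {"CPU": 3, "FPGA": 2},
    "C": {"CPU": 4, "FPGA": 2},
    "D": {"CPU": 1, "FPGA": 1},
}
order = ["A", "B", "C", "D"]

mixed = makespan(order, preds, exec_time,
                 {"A": "CPU", "B": "FPGA", "C": "FPGA", "D": "CPU"}, comm_delay=1)
cpu_only = makespan(order, preds, exec_time,
                    {t: "CPU" for t in order}, comm_delay=1)
```

In this toy instance the mixed mapping finishes earlier than the CPU-only mapping even though it pays communication delays, which is the trade-off a search over assignments explores.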

28

Wahab, Muhammad Abdul. "Hardware support for the security analysis of embedded softwares : applications on information flow control and malware analysis." Thesis, CentraleSupélec, 2018. http://www.theses.fr/2018CSUP0003.

Abstract:

Information flow control, also known as Dynamic Information Flow Tracking (DIFT), makes it possible to detect several types of software attacks, such as buffer overflows and SQL injections. In this thesis, a solution targeting ARM Cortex-A9 hard-core processors is proposed. Our approach relies on ARM CoreSight components, which trace the programs executed by the processor, in order to perform information flow tracking. The proposed DIFT coprocessor is implemented in the Artix-7 FPGA fabric of a Xilinx Zynq System-on-Chip (SoC). It is shown that using ARM CoreSight components adds no execution-time overhead and improves the communication time between the ARM processor and the DIFT coprocessor.

29

Saussard, Romain. "Méthodologies et outils de portage d’algorithmes de traitement d’images sur cibles hardware mixte." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS176/document.

Abstract:

Car manufacturers increasingly offer Advanced Driver Assistance Systems (ADAS) based on cameras and image processing algorithms. To embed ADAS applications, semiconductor companies propose heterogeneous embedded architectures. These Systems-on-Chip (SoCs) integrate several processors of different kinds on the same chip. However, with the increasing complexity of such systems, it becomes more and more difficult for an automotive manufacturer to choose a SoC that can execute a given ADAS application while meeting real-time constraints. In addition, embedding algorithms on this type of hardware is not trivial: one needs to determine how to distribute the computational load among the different processors of the same SoC, in other words, the mapping of the computational load. In response to this issue, we defined in this thesis a global methodology for analyzing the embeddability of image processing algorithms for real-time execution. This methodology estimates the embeddability of a given image processing algorithm on several heterogeneous SoCs by automatically exploring the possible mappings. It is based on three major contributions: the modeling of an algorithm and its real-time constraints, the characterization of a heterogeneous SoC, and a multi-architecture performance prediction method.

30

Arras, Paul-Antoine. "Ordonnancement d'applications à flux de données pour les MPSoC embarqués hybrides comprenant des unités de calcul programmables et des accélérateurs matériels." Thesis, Bordeaux, 2015. http://www.theses.fr/2015BORD0031/document.

Abstract:

Although numerous digital devices are nowadays able to play video content in real time and offer high-quality reproduction, video decoding in embedded systems has not yet become a trivial operation. As a matter of fact, recent codecs such as H.264 and HEVC are so complex that resorting to mixed software/hardware architectures is almost unavoidable. However, platforms of this kind are notoriously difficult to program efficiently. This thesis addresses the challenge of developing dataflow (streaming) applications for hybrid embedded targets and executing them efficiently, and proposes several contributions. The first is an extension of classical list-scheduling heuristics to take memory constraints into account. The second is a dataflow execution model compatible with most existing models and with a large class of hardware platforms, together with a dynamic scheduler. Lastly, numerous developments were carried out on a real-world architecture from STMicroelectronics to demonstrate the feasibility of the approach.

31

Bergenhem, Carl, and Magnus Jonsson. "Two Protocols with Heterogeneous Real-Time Services for High-Performance Embedded Networks." Högskolan i Halmstad, Centrum för forskning om inbyggda system (CERES), 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-21296.

Abstract:

High-performance embedded networks are found in computer systems that run applications such as radar signal processing and multimedia rendering. The system can be composed of multiple computer nodes interconnected by the network. Properties of the network, such as latency and speed, affect the performance of the entire system. A node's access to the network is controlled by a medium access protocol. This protocol determines, for example, the real-time properties and services that the network offers its users, i.e. the nodes. Two such network protocols with heterogeneous real-time services are presented. The protocols offer different communication services as well as services for parallel and distributed real-time processing. The latter include barrier synchronisation, global reduction, and a short message service. A network topology of a unidirectional pipelined optical fibre-ribbon ring is assumed for both protocols. In such a network, several simultaneous transmissions in non-overlapping segments are possible. Both protocols are aimed at applications that require a high-performance embedded network, such as radar signal processing and multimedia. In these applications, the system can be organised as multiple interconnected computation nodes that co-operate in parallel to achieve higher performance. The computing performance of the whole system is greatly affected by the choice of network. Computing nodes in a system for radar signal processing should be tightly coupled, i.e. communication costs between nodes, such as latency, should be small. This is possible if a suitable network with an efficient protocol is used. The target applications have heterogeneous real-time requirements for communication, in that different classes of data traffic exist. The traffic can be classified according to its requirements. The proposed protocols partition data traffic into three classes with distinctly different qualities.
These classes are: traffic with hard real-time demands, such as mission-critical commands; traffic with soft real-time demands, such as application data (a deadline miss here only leads to decreased performance); and traffic with no real-time constraints at all. The protocols are analysed, and their performance is tested through simulation with different data-traffic patterns.
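
The three-class partition of traffic can be illustrated with a small priority queue. This is only a sketch of the classification idea, not the medium access protocols of the thesis. The names `TrafficClass` and `TransmitQueue`, and the tie-breaking rule (earliest deadline first within a class, then arrival order), are assumptions made for illustration.

```python
import heapq
from enum import IntEnum
from itertools import count


class TrafficClass(IntEnum):
    HARD_RT = 0       # e.g. mission-critical commands
    SOFT_RT = 1       # e.g. application data; a miss only degrades performance
    BEST_EFFORT = 2   # no real-time constraints


class TransmitQueue:
    """Serve messages by class first, then by earliest deadline, then FIFO."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # unique tie-breaker, so payloads are never compared

    def push(self, cls, payload, deadline=float("inf")):
        heapq.heappush(self._heap, (cls, deadline, next(self._seq), payload))

    def pop(self):
        return heapq.heappop(self._heap)[3]

    def __len__(self):
        return len(self._heap)


q = TransmitQueue()
q.push(TrafficClass.BEST_EFFORT, "log upload")
q.push(TrafficClass.SOFT_RT, "radar frame 1", deadline=5)
q.push(TrafficClass.HARD_RT, "abort command", deadline=9)
q.push(TrafficClass.SOFT_RT, "radar frame 2", deadline=3)

served = [q.pop() for _ in range(len(q))]
print(served)
# ['abort command', 'radar frame 2', 'radar frame 1', 'log upload']
```

A strict class ordering like this one can starve best-effort traffic under load. Whether and how the actual protocols bound that effect is not stated in the abstract.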

32

Swegert, Eric B. "RTOS Tutorials for a Heterogeneous Class of Senior and Beginning Graduate Students." University of Cincinnati / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1367934958.

33

Negreiros, ângelo Lemos Vidal de. "Desenvolvimento e Avaliação de Simulação Distribuída para Projeto de Sistemas Embarcados com Ptolemy." Universidade Federal da Paraíba, 2014. http://tede.biblioteca.ufpb.br:8080/handle/tede/6106.

Abstract:

Made available in DSpace on 2015-05-14T12:36:43Z (GMT). No. of bitstreams: 1 arquivototal.pdf: 3740448 bytes, checksum: df44ddc74f1029976a1e1beb1c698bf6 (MD5) Previous issue date: 2014-01-29
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Nowadays, embedded systems offer considerable computational power and, consequently, high complexity. It is common to find different applications running on embedded systems. Embedded system design demands methods and tools that allow simulation and verification in an efficient and practical way. This work proposes the development and evaluation of a solution for the distributed modeling and simulation of embedded systems with heterogeneous Models of Computation, through the integration of Ptolemy II and the High Level Architecture (HLA), a middleware for distributed discrete-event simulation. The aim is to create a high-performance environment for the large-scale execution of heterogeneous models. Experimental results demonstrated that non-distributed simulation, as well as distributed simulation with only a few machines (one, two, or three computers), can be infeasible in some situations. The feasibility of integrating the two technologies was also demonstrated, along with the advantages of using them together in many different simulation scenarios, through experiments that captured data such as execution time, data exchanged over the network, and CPU usage. In one experiment, a speedup of a factor of four was obtained when a model with 4,000 actors was distributed over 8 machines, in an experiment that used up to 16 machines. Furthermore, the experiments also showed that the use of HLA offers substantial advantages, although with certain limitations.

34

Bouhadiba, Tayeb Sofiane. "42, Une approche à composants pour le prototypage virtuel des systèmes embarqués hétérogènes." Grenoble, 2010. https://theses.hal.science/tel-00539648.

Abstract:

The work presented in this thesis deals with the virtual prototyping of heterogeneous embedded systems. The complexity of these systems makes it difficult to find an optimal solution. Hence, engineers usually rely on simulation, which requires virtual prototyping of the system. Virtual prototyping aims at providing executable models of embedded systems in order to study their functional and non-functional aspects. Our contribution is the definition of a new component-based approach for the virtual prototyping of embedded systems, called 42. 42 is not a new language for the design of embedded systems; rather, it is a tool for describing and assembling components for embedded systems at the system level. A model for the virtual prototyping of embedded systems must take their heterogeneity into account. Approaches such as Ptolemy propose a catalog of MoCCs (Models of Computation and Communication) that can be organized hierarchically in order to model heterogeneity. Like Ptolemy, 42 allows components and MoCCs to be organized in a hierarchy. However, the MoCCs in 42 are not provided as a catalog: they are described by programs that manipulate a small set of basic primitives to activate components and manage the communication between them. A component-based approach like 42 requires a formalism for specifying components. 42 offers several means of specifying components; we present them, with particular attention to 42 control contracts. 42 is independent of any language or formalism and is designed to be used jointly with existing approaches. We provide a proof of concept to demonstrate the value of using 42 and its control contracts together with existing approaches.

35

Endo, Fernando Akira. "Génération dynamique de code pour l'optimisation énergétique." Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAM044/document.

Abstract:

In computing systems, energy consumption has become the main factor limiting the performance growth experienced in previous decades. Consequently, computer architecture and software development paradigms will have to change if we want to avoid performance stagnation in the coming decades. In this new scenario, new architectural and micro-architectural designs can offer opportunities to increase the energy efficiency of hardware, thanks to hardware specialization such as heterogeneous configurations of cores, new computing units, and accelerators. On the other hand, with this new trend, software development will have to cope with the lack of performance portability across ever-changing hardware and with the increasing gap between the performance that programmers can extract and the maximum achievable performance of the hardware. To address this issue, this thesis contributes a methodology and proof of concept of a run-time auto-tuning framework for embedded systems. The proposed framework can both adapt code to a micro-architecture unknown prior to compilation and explore auto-tuning possibilities that are input-dependent. In order to study the capability of the proposed approach to adapt code to different micro-architectural configurations, I developed a simulation framework for heterogeneous in-order and out-of-order ARM cores, based on the gem5 and McPAT simulators. Validation experiments demonstrated average absolute timing errors of around 7 % when compared to real ARM Cortex-A8 and A9 processors, and relative energy/performance estimations within 6 % for the Dhrystone 2.1 benchmark when compared to Cortex-A7 and A15 (big.LITTLE) CPUs. The timing validation results show that gem5 is considerably more accurate than similar existing simulators, whose average errors exceed 15 %. An important component of the run-time auto-tuning framework is a run-time code generation tool called deGoal. It defines a low-level dynamic DSL for computing kernels. During this thesis, I ported deGoal to the ARM Thumb-2 ISA and added new features for run-time auto-tuning. A preliminary validation on ARM processors showed that deGoal can, on average, generate machine code of equivalent or higher quality compared to reference programs written in C, including manually vectorized code. The methodology and proof of concept of run-time auto-tuning in embedded processors were developed around two kernel-based applications, extracted from the PARSEC 3.0 suite and its hand-vectorized version, PARVEC. In the favorable application, average speedups of 1.26 and 1.38 were obtained on real and simulated cores, respectively, going up to 1.79 and 2.53 (all run-time overheads included). I also demonstrated through simulation that run-time auto-tuning of SIMD instructions on in-order cores can outperform the reference vectorized code run on similar out-of-order cores, with an average speedup of 1.03 and an energy efficiency improvement of 39 %. The unfavorable application was chosen to show that the proposed approach has negligible overhead when better kernel versions cannot be found. When both applications run on real hardware, the run-time auto-tuning performance is on average only 6 % away from the performance obtained by the best statically found kernel implementations.
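
The general idea of run-time auto-tuning, measuring candidate kernel versions on the actual input and keeping the fastest, can be sketched as follows. This is not deGoal, which generates machine code at run time; the sketch merely selects among pre-written Python functions. The names `autotune`, `sum_squares_loop`, and `sum_squares_builtin` are hypothetical.

```python
import time


def autotune(variants, sample_input, repeats=3):
    """Time each kernel variant on `sample_input` and return the fastest one's name."""
    best_name, best_time = None, float("inf")
    for name, kernel in variants.items():
        elapsed = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            kernel(sample_input)
            # Keep the best of several runs to reduce measurement noise.
            elapsed = min(elapsed, time.perf_counter() - t0)
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


def sum_squares_loop(xs):
    total = 0
    for x in xs:
        total += x * x
    return total


def sum_squares_builtin(xs):
    return sum(x * x for x in xs)


variants = {"loop": sum_squares_loop, "builtin": sum_squares_builtin}
data = list(range(10_000))

chosen = autotune(variants, data)   # depends on the machine and the input
result = variants[chosen](data)
print(chosen, result)
```

The exploration itself costs time, which is why the abstract reports speedups with all run-time overheads included. A practical tuner would bound how long it keeps measuring.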

36

Gómez, Cárdenas Carlos Ernesto. "Une approche multi-vue pour la modélisation système de propriétés fonctionnelles et non-fonctionnelles." Phd thesis, Université Nice Sophia Antipolis, 2013. http://tel.archives-ouvertes.fr/tel-00931001.

Abstract:

At the system level, a set of experts specify functional and non-functional properties, each using their own theoretical models, tools, and environments. Each tries to use the formalisms best suited to the properties to be verified. However, each domain's expert view rests on a common base and directly or indirectly impacts the models described by the other experts. It is therefore essential to maintain semantic consistency among the different viewpoints, and to be able to reconcile and aggregate them before proceeding with the various analysis phases. This thesis proposes a model, named PRISMSYS, that relies on a model-driven multi-view approach in which, for each domain, each expert describes the concepts of that domain and the relationship these concepts have with the base model. The approach maintains semantic consistency among the different views through the manipulation of events and logical clocks. PRISMSYS is based on a UML profile that relies as much as possible on the SysML and MARTE profiles. The semantic model that maintains consistency is specified in CCSL, a declarative formal language for specifying causal and temporal relations between events of different views. The environment provided by PRISMSYS supports co-simulation and analysis of the model. The approach is illustrated on a hardware architecture for which the main analysis domain is power consumption.

37

Butko, Anastasiia. "Techniques de simulation rapide quasi cycle-précise pour l'exploration d'architectures multicoeur." Thesis, Montpellier, 2015. http://www.theses.fr/2015MONTS144/document.

Abstract:

High-performance computing (HPC) is a driving force for numerous scientific fields. The peak performance of supercomputers has progressed from teraflops to petaflops within a decade. However, the extremely high power consumption and the associated cost have pushed researchers to explore more energy-efficient technologies, such as the use of processors from the low-power embedded domain. Emerging manycore systems are forecast to feature hundreds of cores by the end of the decade, which calls for efficient solutions for design space exploration and debugging. Available industrial and academic simulators differ in their trade-offs between simulation speed and accuracy, and their adoption is generally determined by the desired level of exploration. Cycle-approximate simulators are popular and attractive for architectural exploration. While simulation speed is trivially observed, the accuracy of these simulators often remains unclear. Moreover, although they enable flexible and detailed architecture evaluation, cycle-approximate simulators entail slow simulation speeds, which limits their applicability to systems with hundreds of cores. This calls for alternative approaches capable of providing high simulation speed while preserving the accuracy that is crucial to architectural exploration. In this thesis, models of complex multicore architectures were developed and evaluated with cycle-approximate simulation for performance and power exploration. To significantly reduce simulation time while preserving accuracy at the cycle-approximate level, we propose a hybrid trace-oriented approach that enables fast, flexible, and accurate simulation of large-scale manycore architectures. We design a set of simulation techniques to overcome the main weaknesses of the trace-oriented approach. The trace synchronization technique manages the control and data dependencies arising from the abstraction of processor cores. The trace replication technique simulates manycore architectures using a finite set of pre-collected traces. The computation phase scaling technique enables flexible switching between multiple processor models without considering microarchitectural differences but taking into account the computation speed ratio. Based on the proposed simulation environment, we build and evaluate several manycore configurations in terms of performance scalability and performance/energy-efficiency trade-offs. Finally, alternative heterogeneous multicore configurations are proposed and show significant improvements in energy efficiency.
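
One plausible reading of computation phase scaling is sketched below: the compute phases of a pre-collected trace are rescaled by a speed ratio, while communication events are left untouched. The abstract does not say how the ratio is applied, so the simple division used here, the `TraceEvent` record, and the function name `scale_compute_phases` are all assumptions.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class TraceEvent:
    kind: str     # "compute" or "comm"
    cycles: int   # duration recorded on the reference core


def scale_compute_phases(trace: List[TraceEvent], speed_ratio: float) -> List[TraceEvent]:
    """Rescale compute phases for a core `speed_ratio` times faster than the reference.

    Communication events are kept unchanged: they depend on the interconnect
    and memory system rather than on the core model (an assumption of this sketch).
    """
    return [
        replace(ev, cycles=round(ev.cycles / speed_ratio)) if ev.kind == "compute" else ev
        for ev in trace
    ]


reference = [TraceEvent("compute", 100), TraceEvent("comm", 30), TraceEvent("compute", 50)]
faster = scale_compute_phases(reference, speed_ratio=2.0)
print([ev.cycles for ev in faster])  # [50, 30, 25]
```

Because only a single ratio is needed, the same trace can be reused for several core models without collecting it again.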


38

Vodel, Matthias. "Funkstandardübergreifende Kommunikation in Mobilen Ad Hoc Netzwerken." Doctoral thesis, Universitätsbibliothek Chemnitz, 2010. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-201001164.


Abstract:

The ninth volume of the scientific series Eingebettete, Selbstorganisierende Systeme (Embedded, Self-Organizing Systems) is devoted to communication across radio standards in mobile ad hoc networks. As mobile devices become increasingly interconnected wirelessly, new and highly specialized communication standards keep emerging. Their transmission characteristics are closely tied to their respective application focus. Intelligently combining the available standards would significantly increase the integration options of end devices. At the same time, this offers many opportunities to optimize communication with respect to latency, reachability and energy consumption. In this work, Mr. Vodel presents a generic concept that enables such a combination independently of the application. The integration approach developed uses commercially available off-the-shelf radio modules, which are encapsulated at a level close to the hardware. The application focus lies specifically on embedded as well as mobile, resource-constrained systems. To implement the concept, three main problems are considered. First, the basic initialization and management of the heterogeneous topology spanning multiple radio standards must be ensured. Building on this, an efficient routing strategy is presented that can fully exploit the advantages of the resulting network structure. In addition, a smooth conversion process must be guaranteed in case the radio standard changes during transmission. The presented communication concept is evaluated on two levels. A specially developed simulation framework enables extensive test series in complex network topologies. In parallel, the development of a prototype hardware platform allows detailed measurements under real-world conditions.
The focus of this work thus covers the design, simulation and practical implementation of a new communication approach in the field of mobile ad hoc networks. I am therefore pleased to have won Mr. Vodel over to publishing the results of his work in this scientific series, and I wish all readers an interesting insight into this subject area.


39

Neill, Richard W. "Heterogeneous Cloud Systems Based on Broadband Embedded Computing." Thesis, 2013. https://doi.org/10.7916/D8HH6JG1.


Abstract:

Computing systems continue to evolve from homogeneous systems of commodity-based servers within a single data-center towards modern Cloud systems that consist of numerous data-center clusters virtualized at the infrastructure and application layers to provide scalable, cost-effective and elastic services to devices connected over the Internet. There is an emerging trend towards heterogeneous Cloud systems, driven by growth in wired as well as wireless devices, that incorporate the potential of millions, and soon billions, of embedded devices enabling new forms of computation and service delivery. Service providers such as broadband cable operators continue to contribute towards this expansion with growing Cloud system infrastructures combined with deployments of increasingly powerful embedded devices across broadband networks. Broadband networks enable access to service provider Cloud data-centers and the Internet from numerous devices. These include home computers, smart-phones, tablets, game-consoles, sensor-networks, and set-top box devices. With these trends in mind, I propose the concept of broadband embedded computing as the utilization of a broadband network of embedded devices for collective computation in conjunction with centralized Cloud infrastructures. I claim that this form of distributed computing results in a new class of heterogeneous Cloud systems, service delivery and application enablement. To support these claims, I present a collection of research contributions in adapting distributed software platforms that include MPI and MapReduce to support simultaneous application execution across centralized data-center blade servers and resource-constrained embedded devices. Leveraging these contributions, I develop two complete prototype system implementations to demonstrate an architecture for heterogeneous Cloud systems based on broadband embedded computing.
Each system is validated by executing experiments with applications taken from bioinformatics and image processing as well as communication and computational benchmarks. This vision, however, is not without challenges. The questions on how to adapt standard distributed computing paradigms such as MPI and MapReduce for implementation on potentially resource-constrained embedded devices, and how to adapt cluster computing runtime environments to enable heterogeneous process execution across millions of devices remain open-ended. This dissertation presents methods to begin addressing these open-ended questions through the development and testing of both experimental broadband embedded computing systems and in-depth characterization of broadband network behavior. I present experimental results and comparative analysis that offer potential solutions for optimal scalability and performance for constructing broadband embedded computing systems. I also present a number of contributions enabling practical implementation of both heterogeneous Cloud systems and novel application services based on broadband embedded computing.


40

Qiu, Meikang. "Time and power optimization for heterogeneous parallel embedded systems." 2007. http://proquest.umi.com/pqdweb?did=1296105641&sid=1&Fmt=2&clientId=10361&RQT=309&VName=PQD.


41

Liu, Jian-Hong, and 劉建宏. "A Micro-Kernel for Embedded Systems with Heterogeneous Multiprocessors." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/62080472684263138746.


Abstract:

Master's
National Cheng Kung University
Department of Electrical Engineering (Master's and Doctoral Program)
92
This thesis presents how to build a kernel based on a micro-kernel architecture on an SoC with heterogeneous multiprocessors. The micro-kernel architecture is based on a message-passing mechanism. The kernel has four essential parts: inter-process communication, the hardware interrupt handler, inter-processor communication and the scheduler. Inter-process communication is responsible for message passing between processes. The hardware interrupt handler services hardware interrupts when they are triggered. Inter-processor communication handles communication between processors. The scheduler chooses the next process to execute.
The kernel described in this thesis has been implemented on a reference design of the TI TMS320DSC25, a heterogeneous multiprocessor SoC containing an ARM7TDMI core and a C5409 DSP core. The ARM7 is a 32-bit general-purpose processor, while the C5409 is a 16-bit digital signal processor. The ARM processor and the DSP run their own copies of the kernel independently. Except for the hardware-dependent functionality, the two copies of the kernel are designed with the same structure and provide the same service functions through the same application program interfaces.
By executing the micro-kernel on the DSC25, processes running on different processors can communicate or request services via inter-processor communication. Finally, every critical section of the kernel, whichever processor it runs on, completes in bounded time.


42

Sousa, Luís Miguel Mendes Pimentel Alves de. "Runtime Management of Heterogeneous Compute Resources in Embedded Systems." Master's thesis, 2021. https://hdl.handle.net/10216/137152.


43

Ku, Chun-Wei, and 古君葳. "Heterogeneous Sensing Fusion for Safety Critical Embedded Real-time Systems." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/d73267.


Abstract:

Master's
National Taiwan University
Graduate Institute of Computer Science and Information Engineering
106
Nowadays, the bulk of road collisions is caused by human unawareness or distraction. Since the safety of the driver and of others matters most, ADAS has been developed to enhance vehicle systems for safer and better driving. AEBS, an important part of ADAS, has become a hot research topic. Computer vision, together with radar and lidar, is at the forefront of the technologies that enable the evolution of AEBS. Since long-range radar and lidar are very expensive, we want to build AEBS on a camera-based system. Instead of using a single monocular camera, we propose a heterogeneous camera-based system that uses sensor fusion to combine the strengths of cameras with different fields of view (FoV). We also use a heuristic false-positive removal method to decrease the false positive rate caused by the sensor fusion method. We optimize the sensor fusion method because of the limited computing resources of the embedded system. As a result, our heterogeneous camera-based system increases the recall of YOLO by up to 10%.


44

Liao, Han-chiang, and 廖翰強. "Real-time on-line Task Scheduling for Heterogeneous Multi-core Embedded Systems." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/ya9ata.


Abstract:

Master's
National Taiwan University of Science and Technology
Department of Electrical Engineering
99
This paper explores real-time scheduling problems for heterogeneous multi-core systems. Taking precedence constraints into consideration, we test the performance of heterogeneous dual-core systems under varying schedulers, protocols, preemption points and context-switch overheads. For heterogeneous multi-core systems, we discuss system performance under varying dispatchers, migration costs and task structures. We also propose an efficient algorithm to reduce the number of preemptions in heterogeneous multi-core systems.


45

Lin, Pochun, and 林柏君. "An Efficient Low Power Scheduling for Heterogeneous Dual-Core Embedded Real-Time Systems." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/89711075931157988423.


Abstract:

Master's
National Chiao Tung University
Institute of Network Engineering
96
In recent years, heterogeneous dual-core embedded real-time systems, such as personal digital assistants (PDAs) and cellular phones, have become more and more popular. To achieve real-time performance and low energy consumption, low-power scheduling becomes a critical issue. Most research on low-power scheduling with dynamic voltage scaling (DVS) has targeted only single-CPU or homogeneous multi-core systems. In this thesis, we propose a low-power scheduling algorithm called Longer Common Execution Time (LCET) for DVS-enabled heterogeneous dual-core embedded real-time systems. It consists of two steps. First, we reduce the total execution time of tasks by using LCET. Second, we exploit the reduced total execution time to adjust voltage and frequency levels in order to reduce total energy consumption. Simulation results show that, compared with the work by Kim et al., the proposed P-LCET (a preemptive version) and NP-LCET (a non-preemptive version) reduce total energy consumption by 8% and 16% to 25%, respectively, with dynamic voltage scaling, and by 13% and 33% to 38% without it.
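The second step described above, turning saved execution time into a lower voltage and frequency level, can be sketched as follows. This is not the LCET algorithm itself. It is a minimal illustration that assumes discrete operating points, a simple dynamic energy model E = C * V^2 * cycles, and a hypothetical table of levels; none of the numbers come from the thesis.

```python
from typing import List, Tuple

# (frequency in MHz, supply voltage in V): hypothetical operating points.
LEVELS: List[Tuple[int, float]] = [(200, 0.9), (400, 1.1), (600, 1.3)]


def pick_level(cycles: float, deadline_s: float,
               levels: List[Tuple[int, float]] = LEVELS) -> Tuple[int, float]:
    """Return the slowest (frequency, voltage) level that still finishes
    `cycles` of work before the deadline."""
    for f_mhz, volt in sorted(levels):
        if cycles / (f_mhz * 1e6) <= deadline_s:
            return f_mhz, volt
    raise ValueError("deadline cannot be met at any level")


def dynamic_energy(cycles: float, volt: float, capacitance: float = 1e-9) -> float:
    """Assumed model: switching energy per cycle is C * V^2."""
    return capacitance * volt ** 2 * cycles


cycles = 3e6      # assumed workload: 3 million cycles
deadline = 0.010  # assumed deadline: 10 ms
f_mhz, volt = pick_level(cycles, deadline)
saving = 1 - dynamic_energy(cycles, volt) / dynamic_energy(cycles, 1.3)
print(f_mhz, volt, round(saving, 3))
```

In this model, the more slack the scheduler creates in the first step, the lower the level that `pick_level` can choose, so energy falls roughly with the square of the voltage.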


46

李翰青. "A list-based task scheduling method for power-aware heterogeneous distributed real-time embedded systems." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/01917950348803090687.


47

Chuang, Chieh-Chun, and 莊俊傑. "The Design and Implementation of Dynamic Code Overlay Mechanism for Embedded Systems with Heterogeneous Processor Cores." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/65019078380647694384.


Abstract:

Master's
National Cheng Kung University
Institute of Computer and Communication Engineering
95
This thesis presents the design and implementation of a dynamic code loading mechanism with overlay functionality on a heterogeneous multiprocessor embedded platform. The platform is composed of an ARM7 general-purpose processor and a special-purpose DSP. The ARM processor handles the general operations of the platform, while the DSP runs digital signal processing software and acts as a programmable external device. Because the memory available on the DSP side is limited, application development is challenging. This thesis trades time for space by designing and implementing a code overlay mechanism that addresses the limited memory. The mechanism has three characteristics: programmers do not need to modify source code; it is not tied to a particular compiler; and it imposes only a little overhead on programmers and is built on a heterogeneous multiprocessor platform. The implementation consists of a preprocessing tool, a dynamic code loading service on the ARM side and an overlay manager on the DSP side. The latter two components implement the dynamic code overlay at run time, while the preprocessing tool divides the application's object program into segments, in units of function modules, to reduce the overhead of loading overlays. In general, code structure affects how well an overlay mechanism works. With this mechanism, programmers do not need to modify the structure of the source program, because the preprocessing operates on the object file, and testing shows that the performance impact is acceptable. The memory used by an application program can therefore be reduced effectively.
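The time-for-space trade behind code overlays can be illustrated with a toy model. The sketch below assumes a single overlay region and treats each segment as a named group of functions; the `OverlayManager` class, the segment names and the sizes are hypothetical and do not reproduce the thesis implementation.

```python
from typing import Callable, Dict, Optional, Tuple

# A segment is (size in words, {function name: callable}).
Segment = Tuple[int, Dict[str, Callable]]


class OverlayManager:
    """Toy model of an overlay manager with one shared overlay region."""

    def __init__(self, segments: Dict[str, Segment], region_size: int):
        for name, (size, _) in segments.items():
            if size > region_size:
                raise ValueError(f"segment {name!r} does not fit the overlay region")
        self.segments = segments
        self.resident: Optional[str] = None  # segment currently in the region
        self.loads = 0                       # number of overlay loads (time cost)

    def call(self, segment: str, function: str, *args):
        if self.resident != segment:
            # On real hardware a host-side loader would copy the segment's
            # code into the limited DSP memory here; we only count the load.
            self.resident = segment
            self.loads += 1
        return self.segments[segment][1][function](*args)


# Assumed example: two segments that together exceed the 512-word region.
segments = {
    "filter": (300, {"scale": lambda x: 2 * x}),
    "codec": (400, {"negate": lambda x: -x}),
}
mgr = OverlayManager(segments, region_size=512)
results = [
    mgr.call("filter", "scale", 3),
    mgr.call("filter", "scale", 4),   # already resident, no load
    mgr.call("codec", "negate", 5),   # evicts "filter"
    mgr.call("filter", "scale", 1),   # reloads "filter"
]
print(results, mgr.loads)
```

Grouping functions that call each other into the same segment keeps `loads` low, which is why the abstract notes that segmentation by function module reduces loading overhead.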


48

Chiu, Yu-Chen, and 邱于真. "Design and Optimization of Distributed Computing on Embedded Heterogeneous Manycore Systems: A Case Study of Singular Value Decomposition." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/zzqj52.


49

Wei, Chien-Hao, and 魏仟豪. "Integrating and Developing the Night Lights and Lane Tracking Detection and Recording Event Functions into the Driver Assistance Systems on the Heterogeneous Dual-Core Embedded Platform." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/pqmb7w.


Abstract:

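Master's
National Taipei University of Technology
Department of Computer Science and Information Engineering (Graduate Program)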
100
Car accidents are one of the most serious problems. On average, they cause over ten million casualties each year, especially at night. This thesis therefore proposes a nighttime embedded driving assistance system based on computer vision technology. By analyzing the captured images, the proposed system facilitates vehicle detection, lane detection, lane tracking, event detection and real-time event recording. Image segmentation and pattern analysis techniques are proposed to detect vehicles based on the features of their headlights and taillights. Lane features are obtained from lane patterns and spline curves; the lane features and the locations of moving vehicles are then used to determine possible traffic events and to activate real-time video recording based on H.264 compression. The computer vision techniques used in this thesis are integrated and implemented on a heterogeneous dual-core embedded platform. To optimize the algorithms, the vehicle detection, lane detection, event identification and video recording modules are integrated on the embedded platform using DSP platform-oriented optimization libraries. Finally, all modules are integrated into an embedded system to form an intelligent embedded nighttime driver assistance system.


50

Oliveira, Bruno Gonçalves. "Exploring energy efficient object classification on reconfigurable logic." Master's thesis, 2019. http://hdl.handle.net/10316/88114.


Abstract:

Integrated Master's dissertation in Electrical and Computer Engineering presented to the Faculty of Sciences and Technology
Object classification is a highly relevant problem in computer vision, since it can be integrated into a wide range of target applications, such as agriculture and security. A number of solutions to this problem exist today, the most successful of which rely on neural networks. Among the best known are GoogleNet, AlexNet and YOLO. However, the underlying processing requires high-performance computational platforms such as GPUs, CPU clusters or custom ASICs. Apart from ASICs, which are costly and less generic, these platforms typically consume a lot of power and are not well suited to embedded systems. Nevertheless, there has been some progress in low-power approaches, driven in part by the smartphone and tablet market, and heterogeneous platforms are now available that combine a mix of architectures (CPUs and GPUs) with reconfigurable logic (FPGAs). In this work, we propose a series of implementations of lightweight, quantized convolutional neural networks on hybrid platforms, thoroughly exploring the design space, the classification performance and the power efficiency. The underlying algorithm is analysed, and the key components for concurrent and parallel computation are identified. Mappings onto the heterogeneous platform are explored, ranging from a baseline CPU implementation to a fully custom implementation that maximises the use of the available resources. A set of metrics is considered for evaluating the different configurations. In the end, we obtained object classifiers with different characteristics running on two low-power devices. Analyses of the implementations supported the reliability of compressing a convolutional neural network to fit on the target device by reducing the precision of its parameters.
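Reducing the precision of network parameters, as mentioned at the end of the abstract, can be illustrated with a minimal sketch of symmetric uniform quantization. The abstract does not state which quantization scheme the dissertation uses, so the scheme, the function names and the example weights below are assumptions made purely for illustration.

```python
from typing import List, Tuple


def quantize(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric uniform quantization: map floats to signed integers that
    fit in `bits` bits, and return the integers with the scale factor."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed value")
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]


# Assumed example weights, stored as 8-bit integers instead of 32-bit floats.
weights = [1.27, -0.64, 0.0, 0.32]
q, scale = quantize(weights, 8)
restored = dequantize(q, scale)
print(q)
```

Storing `q` and one `scale` per tensor uses a quarter of the memory of 32-bit floats in this example, at the cost of a rounding error of at most half a quantization step per weight.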

