To see the other types of publications on this topic, follow the link: Heterogeneous programming.

Dissertations / Theses on the topic 'Heterogeneous programming'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 dissertations / theses for your research on the topic 'Heterogeneous programming.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Bhatia, Vishal. "Remote programming for heterogeneous sensor networks." Manhattan, Kan. : Kansas State University, 2008. http://hdl.handle.net/2097/1091.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Dastgeer, Usman. "Skeleton Programming for Heterogeneous GPU-based Systems." Licentiate thesis, Linköpings universitet, PELAB - Laboratoriet för programmeringsomgivningar, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-70234.

Full text
Abstract:
In this thesis, we address issues associated with programming modern heterogeneous systems while focusing on a special kind of heterogeneous systems that include multicore CPUs and one or more GPUs, called GPU-based systems.We consider the skeleton programming approach to achieve high level abstraction for efficient and portable programming of these GPU-based systemsand present our work on SkePU library which is a skeleton library for these systems. We extend the existing SkePU library with a two-dimensional (2D) data type and skeleton operations and implement several new applications using newly made skeletons. Furthermore, we consider the algorithmic choice present in SkePU and implement support to specify and automatically optimize the algorithmic choice for a skeleton call, on a given platform. To show how to achieve performance, we provide a case-study on optimized GPU-based skeleton implementation for 2D stencil computations and introduce two metrics to maximize resource utilization on a GPU. By devising a mechanism to automatically calculate these two metrics, performance can be retained while porting an application from one GPU architecture to another. Another contribution of this thesis is implementation of the runtime support for the SkePU skeleton library. This is achieved with the help of the StarPUruntime system. By this implementation,support for dynamic scheduling and load balancing for the SkePU skeleton programs is achieved. Furthermore, a capability to do hybrid executionby parallel execution on all available CPUs and GPUs in a system, even for a single skeleton invocation, is developed. SkePU initially supported only data-parallel skeletons. The first task-parallel skeleton (farm) in SkePU is implemented with support for performance-aware scheduling and hierarchical parallel execution by enabling all data parallel skeletons to be usable as tasks inside the farm construct. Experimental evaluations are carried out and presented for algorithmic selection, performance portability, dynamic scheduling and hybrid execution aspects of our work.
APA, Harvard, Vancouver, ISO, and other styles
3

Planas, Carbonell Judit. "Programming models and scheduling techniques for heterogeneous architectures." Doctoral thesis, Universitat Politècnica de Catalunya, 2015. http://hdl.handle.net/10803/327036.

Full text
Abstract:
There is a clear trend nowadays to use heterogeneous high-performance computers, as they offer considerably greater computing power than homogeneous CPU systems. Extending traditional CPU systems with specialized units (accelerators such as GPGPUs) has become a revolution in the HPC world. Both the traditional performance-per-Watt and the performance-per-Euro ratios have been increased with the use of such systems. Heterogeneous machines can adapt better to different application requirements, as each architecture type offers different characteristics. Thus, in order to maximize application performance in these platforms, applications should be divided into several portions according to their execution requirements. These portions should then be scheduled to the device that better fits their requirements. Hence, heterogeneity introduces complexity in application development, up to the point of reaching the programming wall: on the one hand, source codes must be adapted to fit new architectures and, on the other, resource management becomes more complicated. For example, multiple memory spaces that require explicit data movements or additional synchronizations between different code portions that run on different units. For all these reasons, efficient programming and code maintenance in heterogeneous systems is extremely complex and expensive. Although several approaches have been proposed for accelerator programming, like CUDA or OpenCL, these models do not solve the aforementioned programming challenges, as they expose low level hardware characteristics to the programmer. Therefore, programming models should be able to hide all these complex accelerator programming by providing a homogeneous development environment. In this context, this thesis contributes in two key aspects: first, it proposes a general design to efficiently manage the execution of heterogeneous applications and second, it presents several scheduling mechanisms to spread application execution among all the units of the system to maximize performance and resource utilization. The first contribution proposes an asynchronous design to manage execution, data movements and synchronizations on accelerators. This approach has been developed in two steps: first, a semi-asynchronous proposal and then, a fully-asynchronous proposal in order to fit contemporary hardware restrictions. The experimental results tested on different multi-accelerator systems showed that these approaches could reach the maximum expected performance. Even if compared to native, hand-tuned codes, they could get the same results and outperform native versions in selected cases. The second contribution presents four different scheduling strategies. They focus and combine different aspects related to heterogeneous programming to minimize application's execution time. For example, minimizing the amount of data shared between memory spaces, or maximizing resource utilization by scheduling each portion of code on the unit that fits better. The experimental results were performed on different heterogeneous platforms, including CPUs, GPGPU and Intel Xeon Phi devices. As shown in these tests, it is particularly interesting to analyze how all these scheduling strategies can impact application performance. Three general conclusions can be extracted: first, application performance is not guaranteed across new hardware generations. Then, source codes must be periodically updated as hardware evolves. Second, the most efficient way to run an application on a heterogeneous platform is to divide it into smaller portions and pick the unit that better fits to run each portion. Hence, system resources can cooperate together to execute the application. Finally, and probably the most important, the requirements derived from the first and second conclusions can be implemented inside runtime frameworks, so the complexity of programming heterogeneous architectures is completely hidden to the programmer.
Actualment, hi ha una clara tendència per l'ús de sistemes heterogenis d'alt rendiment, ja que ofereixen una major potència de càlcul que els sistemes homogenis amb CPUs tradicionals. L'addició d'unitats especialitzades (acceleradors com ara GPGPUs) als sistemes amb CPUs s'ha convertit en una revolució en el món de la computació d'alt rendiment. Els sistemes heterogenis poden adaptar-se millor a les diferents necessitats de les aplicacions, ja que cada tipus d'arquitectura ofereix diferents característiques. Per tant, per maximitzar el rendiment, les aplicacions s'han de dividir en diverses parts d'acord amb els seus requeriments computacionals. Llavors, aquestes parts s'han d'executar al dispositiu que s'adapti millor a les seves necessitats. Per tant, l'heterogeneïtat introdueix una complexitat addicional en el desenvolupament d'aplicacions: d'una banda, els codis font s'han d'adaptar a les noves arquitectures i, de l'altra, la gestió de recursos es fa més complicada. Per exemple, múltiples espais de memòria que requereixen moviments explícits de dades o sincronitzacions addicionals entre diferents parts de codi que s'executen en diferents unitats. Per això, la programació i el manteniment del codi en sistemes heterogenis són extremadament complexos i cars. Tot i que hi ha diverses propostes per a la programació d'acceleradors, com CUDA o OpenCL, aquests models no resolen els reptes de programació descrits anteriorment, ja que exposen les característiques de baix nivell del hardware al programador. Per tant, els models de programació han de poder ocultar les complexitats dels acceleradors de cara al programador, proporcionant un entorn de desenvolupament homogeni. En aquest context, la tesi contribueix en dos aspectes fonamentals: primer, proposa un disseny per a gestionar de manera eficient l'execució d'aplicacions heterogènies i, segon, presenta diversos mecanismes de planificació per dividir l'execució d'aplicacions entre totes les unitats del sistema, per tal de maximitzar el rendiment i la utilització de recursos. La primera contribució proposa un disseny d'execució asíncron per gestionar els moviments de dades i sincronitzacions en acceleradors. Aquest enfocament s'ha desenvolupat en dos passos: primer, una proposta semi-asíncrona i després, una proposta totalment asíncrona per tal d'adaptar-se a les restriccions del hardware contemporani. Els resultats en sistemes multi-accelerador mostren que aquests enfocaments poden assolir el màxim rendiment esperat. Fins i tot, en determinats casos, poden superar el rendiment de codis nadius altament optimitzats. La segona contribució presenta quatre mecanismes de planificació diferents, enfocats a la programació heterogènia, per minimitzar el temps d'execució de les aplicacions. Per exemple, minimitzar la quantitat de dades compartides entre espais de memòria, o maximitzar la utilització de recursos mitjançant l'execució de cada porció de codi a la unitat que s'adapta millor. Els experiments s'han realitzat en diferents plataformes heterogènies, incloent CPUs, GPGPUs i dispositius Intel Xeon Phi. És particularment interessant analitzar com totes aquestes estratègies de planificació poden afectar el rendiment de l'aplicació. Com a resultat, es poden extreure tres conclusions generals: en primer lloc, el rendiment de l'aplicació no està garantit en les noves generacions de hardware. Per tant, els codis s'han d'actualitzar periòdicament a mesura que el hardware evoluciona. En segon lloc, la forma més eficient d'executar una aplicació en una plataforma heterogènia és dividir-la en porcions més petites i escollir la unitat que millor s'adapta per executar cada porció. Finalment, i probablement la conclusió més important, és que les exigències derivades de les dues primeres conclusions poden ser implementades dins de llibreries de sistema, de manera que la complexitat de programació d'arquitectures heterogènies quedi completament oculta per al programador.
APA, Harvard, Vancouver, ISO, and other styles
4

VILLALOBOS, CRISTIAN ENRIQUE MUNOZ. "HETEROGENEOUS PARALLELIZATION OF QUANTUM-INSPIRED LINEAR GENETIC PROGRAMMING." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2014. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=27791@1.

Full text
Abstract:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
FUNDAÇÃO DE APOIO À PESQUISA DO ESTADO DO RIO DE JANEIRO
PROGRAMA DE EXCELENCIA ACADEMICA
BOLSA NOTA 10
Um dos principais desafios da ciência da computação é conseguir que um computador execute uma tarefa que precisa ser feita, sem dizer-lhe como fazê-la. A Programação Genética (PG) aborda este desafio a partir de uma declaração de alto nível sobre o que é necessário ser feito e cria um programa de computador para resolver o problema automaticamente. Nesta dissertação, é desenvolvida uma extensão do modelo de Programação Genética Linear com Inspiração Quântica (PGLIQ) com melhorias na eficiência e eficácia na busca de soluções. Para tal, primeiro o algoritmo é estruturado em um sistema de paralelização heterogênea visando à aceleração por Unidades de Processamento Gráfico e a execução em múltiplos processadores CPU, maximizando a velocidade dos processos, além de utilizar técnicas otimizadas para reduzir os tempos de transferências de dados. Segundo, utilizam-se as técnicas de Visualização Gráfica que interpretam a estrutura e os processos que o algoritmo evolui para entender o efeito da paralelização do modelo e o comportamento da PGLIQ. Na implementação da paralelização heterogênea, são utilizados os recursos de computação paralela como Message Passing Interface (MPI) e Open Multi-Processing (OpenMP), que são de vital importância quando se trabalha com multi-processos. Além de representar graficamente os parametros da PGLIQ, visualizando-se o comportamento ao longo das gerações, uma visualização 3D para casos de robôtica evolutiva é apresentada, na qual as ferramentas de simulação dinâmica como Bullet SDK e o motor gráfico OGRE para a renderização são utilizadas.
One of the main challenges of computer science is to get a computer execute a task that must be done, without telling it how to do it. Genetic Programming (GP) deals with this challenge from a high level statement of what is needed to be done and creates a computer program to solve the problem automatically. In this dissertation we developed an extension of Quantum-Inspired Linear Genetic Programming Model (QILGP), aiming to improve its efficiency and effectiveness in the search for solutions. For this, first the algorithm is structured in a Heterogeneous Parallelism System, Aiming to accelerated using Graphics Processing Units GPU and multiple CPU processors, reducing the timing of data transfers while maximizing the speed of the processes. Second, using the techniques of Graphic Visualization which interpret the structure and the processes that the algorithm evolves, understanding the behavior of QILGP. We used the highperformance features such as Message Passing Interface (MPI) and Open Multi- Processing (OpenMP), which are of vital importance when working with multiprocesses, as it is necessary to design a topology that has multiple levels of parallelism to avoid delaying the process for transferring the data to a local computer where the visualization is projected. In addition to graphically represent the parameters of PGLIQ devising the behavior over generations, a 3D visualization for cases of evolutionary robotics is presented, in which the tools of dynamic simulation as Bullet SDK and graphics engine OGRE for rendering are used . This visualization is used as a tool for a case study in this dissertation.
APA, Harvard, Vancouver, ISO, and other styles
5

Aji, Ashwin M. "Programming High-Performance Clusters with Heterogeneous Computing Devices." Diss., Virginia Tech, 2015. http://hdl.handle.net/10919/52366.

Full text
Abstract:
Today's high-performance computing (HPC) clusters are seeing an increase in the adoption of accelerators like GPUs, FPGAs and co-processors, leading to heterogeneity in the computation and memory subsystems. To program such systems, application developers typically employ a hybrid programming model of MPI across the compute nodes in the cluster and an accelerator-specific library (e.g.; CUDA, OpenCL, OpenMP, OpenACC) across the accelerator devices within each compute node. Such explicit management of disjointed computation and memory resources leads to reduced productivity and performance. This dissertation focuses on designing, implementing and evaluating a runtime system for HPC clusters with heterogeneous computing devices. This work also explores extending existing programming models to make use of our runtime system for easier code modernization of existing applications. Specifically, we present MPI-ACC, an extension to the popular MPI programming model and runtime system for efficient data movement and automatic task mapping across the CPUs and accelerators within a cluster, and discuss the lessons learned. MPI-ACC's task-mapping runtime subsystem performs fast and automatic device selection for a given task. MPI-ACC's data-movement subsystem includes careful optimizations for end-to-end communication among CPUs and accelerators, which are seamlessly leveraged by the application developers. MPI-ACC provides a familiar, flexible and natural interface for programmers to choose the right computation or communication targets, while its runtime system achieves efficient cluster utilization.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
6

Guerreiro, Pedro Miguel Rito. "Visual programming in a heterogeneous multi-core environment." Master's thesis, Universidade de Évora, 2009. http://hdl.handle.net/10174/18505.

Full text
Abstract:
É do conhecimento geral de que, hoje em dia, a tecnologia evolui rapidamente. São criadas novas arquitecturas para resolver determinadas limitações ou problemas. Por vezes, essa evolução é pacífica e não requer necessidade de adaptação e, por outras, essa evolução pode Implicar mudanças. As linguagens de programação são, desde sempre, o principal elo de comunicação entre o programador e o computador. Novas linguagens continuam a aparecer e outras estão sempre em desenvolvimento para se adaptarem a novos conceitos e paradigmas. Isto requer um esforço extra para o programador, que tem de estar sempre atento a estas mudanças. A Programação Visual pode ser uma solução para este problema. Exprimir funções como módulos que recebem determinado Input e retomam determinado output poderá ajudar os programadores espalhados pelo mundo, através da possibilidade de lhes dar uma margem para se abstraírem de pormenores de baixo nível relacionados com uma arquitectura específica. Esta tese não só mostra como combinar as capacidades do CeII/B.E. (que tem uma arquitectura multi­processador heterogénea) com o OpenDX (que tem um ambiente de programação visual), como também demonstra que tal pode ser feito sem grande perda de performance. ABSTRACT; lt is known that nowadays technology develops really fast. New architectures are created ln order to provide new solutions for different technology limitations and problems. Sometimes, this evolution is pacific and there is no need to adapt to new technologies, but things also may require a change every once ln a while. Programming languages have always been the communication bridge between the programmer and the computer. New ones keep coming and other ones keep improving ln order to adapt to new concepts and paradigms. This requires an extra-effort for the programmer, who always needs to be aware of these changes. Visual Programming may be a solution to this problem. Expressing functions as module boxes which receive determined Input and return determined output may help programmers across the world by giving them the possibility to abstract from specific low-level hardware issues. This thesis not only shows how the CeII/B.E. (which has a heterogeneous multi-core architecture) capabilities can be combined with OpenDX (which has a visual programming environment), but also demonstrates that lt can be done without losing much performance.
APA, Harvard, Vancouver, ISO, and other styles
7

FANFARILLO, ALESSANDRO. "Parallel programming techniques for heterogeneous exascale computing platforms." Doctoral thesis, Università degli Studi di Roma "Tor Vergata", 2014. http://hdl.handle.net/2108/202339.

Full text
Abstract:
Nowadays, the most powerful supercomputers in the world, needed for solving complex models and simulations of critical scientific problems, are able to perform tens of quadrillion (1015) floating point operations per second (tens of PetaFLOPS). Although such big amount of computational power may seem enough, scientists and engineers always need to solve more accurate models, run broader simulations and analyze huge amount of data in less time. In particular, experiments that are currently impossible, dangerous, or too expensive to be realized, can be accurately simulated by solving complex predictive models on an exascale machine (1018 FLOPS). A few examples of studies where the exascale computing can make a difference are: reduction of the carbon footprint of the transportation sector, innovative designs for cost-effective renewable energy resources, efficiency and safety of nuclear energy, reverse engineering of the human brain, design, control and manufacture of advanced materials. The importance of having an exascale supercomputer has been officially acknowledged on July 29th, 2015 by President Obama, who signed an executive order creating a National Strategic Computing Initiative calling for the accelerated development of an exascale system. Unfortunately, building an exascale system with the technology we currently use on petascale machines would represent an unaffordable project. Although the cost of the processing units is so inexpensive as to be considered as free, the energy required for moving data (from memories to processors and across the network) and to power-on the entire system (including the cooling system) represents the real limit for reaching the exascale era. Therefore, deep changes in hardware architectures, programming models and parallel algorithms are needed in order to reduce energy requirements and increase compute power. In this dissertation, we face the challanges related to data transfers on exascale architectures, proposing solutions in the field of heterogeneous architectures (CPUs + Accelerators), parallel programming models and parallel algorithms. In particular, we first explore the potential benefits brought by a hybrid CPUs+GPUs approach for sparse matrix computations, then we implement and analyze the performance of coar- VII ray Fortran as parallel programming system for exascale computing. Finally, we merge the world of accelerators and coarray Fortran in order to create a data-aware parallel programming model, suitable for exascale computing. The implementation of OpenCoarrays, the open-source communication library used by GNU Fortran for supporting coarrays, and its usage on heterogeneous devices, are the most relevant contributions presented in this dissertation.
APA, Harvard, Vancouver, ISO, and other styles
8

Schneider, Scott. "Shared Memory Abstractions for Heterogeneous Multicore Processors." Diss., Virginia Tech, 2010. http://hdl.handle.net/10919/30240.

Full text
Abstract:
We are now seeing diminishing returns from classic single-core processor designs, yet the number of transistors available for a processor is still increasing. Processor architects are therefore experimenting with a variety of multicore processor designs. Heterogeneous multicore processors with Explicitly Managed Memory (EMM) hierarchies are one such experimental design which has the potential for high performance, but at the cost of great programmer effort. EMM processors have cores that are divorced from the normal memory hierarchy, thus the onus is on the programmer to manage locality and parallelism. This dissertation presents the Cellgen source-to-source compiler which moves some of this complexity back into the compiler. Cellgen offers a directive-based programming model with semantics similar to OpenMP for the Cell Broadband Engine, a general-purpose processor with EMM. The compiler implicitly handles locality and parallelism, schedules memory transfers for data parallel regions of code, and provides performance predictions which can be leveraged to make scheduling decisions. We compare this approach to using a software cache, to a different programming model which is task based with explicit data transfers, and to programming the Cell directly using the native SDK. We also present a case study which uses the Cellgen compiler in a comparison across multiple kinds of multicore architectures: heterogeneous, homogeneous and radically data-parallel graphics processors.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
9

Podobas, Artur, Mats Brorsson, and Vladimir Vlassov. "Exploring heterogeneous scheduling using the task-centric programming model." KTH, Programvaru- och datorsystem, SCS, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-120436.

Full text
Abstract:
Computer architecture technology is moving towards more heteroge-neous solutions, which will contain a number of processing units with different capabilities that may increase the performance of the system as a whole. How-ever, with increased performance comes increased complexity; complexity that is now barely handled in homogeneous multiprocessing systems. The present study tries to solve a small piece of the heterogeneous puzzle; how can we exploit all system resources in a performance-effective and user-friendly way? Our proposed solution includes a run-time system capable of using a variety of different heterogeneous components while providing the user with the already familiar task-centric programming model interface. Furthermore, when dealing with non-uniform workloads, we show that traditional approaches based on centralized or work-stealing queue algorithms do not work well and propose a scheduling algorithm based on trend analysis to distribute work in a performance-effective way across resources.

QC 20130429


ENCORE
APA, Harvard, Vancouver, ISO, and other styles
10

Dekkiche, Djamila. "Programming methodologies for ADAS applications in parallel heterogeneous architectures." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS388/document.

Full text
Abstract:
La vision par ordinateur est primordiale pour la compréhension et l’analyse d’une scène routière afin de construire des systèmes d’aide à la conduite (ADAS) plus intelligents. Cependant, l’implémentation de ces systèmes dans un réel environnement automobile et loin d’être simple. En effet, ces applications nécessitent une haute performance de calcul en plus d’une précision algorithmique. Pour répondre à ces exigences, de nouvelles architectures hétérogènes sont apparues. Elles sont composées de plusieurs unités de traitement avec différentes technologies de calcul parallèle: GPU, accélérateurs dédiés, etc. Pour mieux exploiter les performances de ces architectures, différents langages sont nécessaires en fonction du modèle d’exécution parallèle. Dans cette thèse, nous étudions diverses méthodologies de programmation parallèle. Nous utilisons une étude de cas complexe basée sur la stéréo-vision. Nous présentons les caractéristiques et les limites de chaque approche. Nous évaluons ensuite les outils employés principalement en terme de performances de calcul et de difficulté de programmation. Le retour de ce travail de recherche est crucial pour le développement de futurs algorithmes de traitement d’images en adéquation avec les architectures parallèles avec un meilleur compromis entre les performances de calcul, la précision algorithmique et la difficulté de programmation
Computer Vision (CV) is crucial for understanding and analyzing the driving scene to build more intelligent Advanced Driver Assistance Systems (ADAS). However, implementing CV-based ADAS in a real automotive environment is not straightforward. Indeed, CV algorithms combine the challenges of high computing performance and algorithm accuracy. To respond to these requirements, new heterogeneous circuits are developed. They consist of several processing units with different parallel computing technologies as GPU, dedicated accelerators, etc. To better exploit the performances of such architectures, different languages are required depending on the underlying parallel execution model. In this work, we investigate various parallel programming methodologies based on a complex case study of stereo vision. We introduce the relevant features and limitations of each approach. We evaluate the employed programming tools mainly in terms of computation performances and programming productivity. The feedback of this research is crucial for the development of future CV algorithms in adequacy with parallel architectures with a best compromise between computing performance, algorithm accuracy and programming efforts
APA, Harvard, Vancouver, ISO, and other styles
11

Fagg, Graham Edward. "Enabling technologies for parallel heterogeneous computing." Thesis, University of Reading, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.266150.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Anthony, Patricia. "Bidding agents for multiple heterogeneous online auctions." Thesis, University of Southampton, 2003. https://eprints.soton.ac.uk/257838/.

Full text
Abstract:
Due to the proliferation of online auctions, there is an increasing need to monitor and bid in multiple auctions in order to procure the best deal for the desired good. To this end, this thesis reports on the development of a heuristic decision making framework that an autonomous agent can exploit to tackle the problem of bidding across multiple auctions with varying start and end times and with varying protocols (including English, Dutch and Vickrey). The framework is flexible, configurable, and enables the agent to adopt varying tactics and strategies that attempt to ensure that the desired item is delivered in a manner consistent with the user's preferences. Given this large space of possibilities, a genetic algorithm is employed to search (offline) for effective strategies in common classes of environment. The strategies that emerge from this evolution are then codified into the agent's reasoning behaviour so that it can select the most appropriate strategy to employ in its prevailing circumstances. The proposed framework has been implemented in a simulated marketplace environment and its effectiveness has been empirically demonstrated.
APA, Harvard, Vancouver, ISO, and other styles
13

Potter, Ralph. "Programming models for heterogeneous systems with application to computer graphics." Thesis, University of Bath, 2017. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.760928.

Full text
Abstract:
For over a decade, we have seen a plateauing of CPU clock rates, primarily due topower and thermal constraints. To counter these problems, processor architects have turned to both multi-core and heterogeneous processors. Whilst the use of heterogeneous processors provides a route to reducing energy con-sumption, this comes at the cost of increased complexity for software developers. In this thesis, we explore the development of C++-based programming models and frameworks which enable the efficient use of these heterogeneous platforms, and the application of these programming models to problems from the field of visual computing. Two recent specifications for heterogeneous computing: SYCL and Heterogeneous System Architecture, share the common goal of providing a foundation for developing heterogeneous programming models. In this thesis, we provide early evaluations of the suitability of these two new platforms as foundations for building higher-level domain-specific abstractions. We drawing upon two use cases from the field of visual computing: image processing and ray tracing; and explore the development and use of domain-specific C++ abstractions layered upon these platforms. We present a domain-specific language which generates optimized image processing kernels by deeply embedding within SYCL. By combining simple primitives into more complex kernels, we are able to eliminate intermediate memory accesses and improved performance. We also describe Offload for HSA: a single-source C++14 compiler and programming model for Heterogeneous System Architecture. The pervasive shared virtual memory offered by HSA allows us to reduce execution overheads and relax constraints imposed by SYCL’s programming model, leading to significant performance improvements. Performance optimization on heterogeneous systems is a challenging task. We build upon Offload to provide RTKit, a framework for exploring the optimization space of ray tracing algorithms on heterogeneous systems. Finally, we conclude by discussing challenges raised by our work and open problems that must be resolved in order to unify C++ and heterogeneous computing.
APA, Harvard, Vancouver, ISO, and other styles
14

Linderman, Michael David. "A programming model and processor architecture for heterogeneous multicore computers /." May be available electronically:, 2009. http://proquest.umi.com/login?COPT=REJTPTU1MTUmSU5UPTAmVkVSPTI=&clientId=12498.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

BARCHI, FRANCESCO. "Many-core and heterogeneous architectures: programming models and compilation toolchains." Doctoral thesis, Politecnico di Torino, 2020. http://hdl.handle.net/11583/2850608.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Xu, Lihui. "On the integration of heterogeneous deductive databases." Thesis, King's College London (University of London), 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.321953.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Müller, Georg. "Traffic profiles and performance modelling of heterogeneous networks." Thesis, University of Plymouth, 2000. http://hdl.handle.net/10026.1/2364.

Full text
Abstract:
This thesis considers the analysis and study of short and long-term traffic patterns of heterogeneous networks. A large number of traffic profiles from different locations and network environments have been determined. The result of the analysis of these patterns has led to a new parameter, namely the 'application signature'. It was found that these signatures manifest themselves in various granularities over time, and are usually unique to an application, permanent virtual circuit (PVC), user or service. The differentiation of the application signatures into different categories creates a foundation for short and long-term management of networks. The thesis therefore looks from the micro and macro perspective on traffic management, covering both aspects. The long-term traffic patterns have been used to develop a novel methodology for network planning and design. As the size and complexity of interconnected systems grow steadily, usually covering different time zones, geographical and political areas, a new methodology has been developed as part of this thesis. A part of the methodology is a new overbooking mechanism, which stands in contrast to existing overbooking methods created by companies like Bell Labs. The new overbooking provides companies with cheaper network design and higher average throughput. In addition, new requirements like risk factors have been incorporated into the methodology, which lay historically outside the design process. A large network service provider has implemented the overbooking mechanism into their network planning process, enabling practical evaluation. The other aspect of the thesis looks at short-term traffic patterns, to analyse how congestion can be controlled. Reoccurring short-term traffic patterns, the application signatures, have been used for this research to develop the "packet train model" further. Through this research a new congestion control mechanism was created to investigate how the application signatures and the "extended packet train model" could be used. To validate the results, a software simulation has been written that executes the proprietary congestion mechanism and the new mechanism for comparison. Application signatures for the TCP/IP protocols have been applied in the simulation and the results are displayed and discussed in the thesis. The findings show the effects that frame relay congestion control mechanisms have on TCP/IP, where the re-sending of segments, buffer allocation, delay and throughput are compared. The results prove that application signatures can be used effectively to enhance existing congestion control mechanisms.
APA, Harvard, Vancouver, ISO, and other styles
18

Lopes, Fernanda LiÂgia Rodrigues. "Access to data from Ontology Using Mappings Heterogeneous and Logic Programming." Universidade Federal do CearÃ, 2010. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=14156.

Full text
Abstract:
CoordenaÃÃo de AperfeiÃoamento de Pessoal de NÃvel Superior
Ontologies have been used in different areas, including Data Integration and Semantic Web, to provide formal descriptions to data sources as well as to associate semantics to them and to make information easier to discover and to recover. In this context, one of the most relevant issues is the Ontology-Based Data Access â OBDA, which is the problem of accessing one or more data sources by means of a conceptual representation expressed in terms of an ontology. The independence between the ontology layer and the data layer, and the ability of answering more expressive queries than the ones defined using description logics are some of the main distinguished issues of the ODBA. In this work, we specify an environment for OBDA, which deals with this problem considering a set of independent tasks. Our main contribution concerns the definition and implementation of a query rewriting process between ontologies structurally heterogeneous. In the proposed query rewriting approach, we combine the semantics and expressiveness of SPARQL with logic programming and we adopt a rulebased formalism to represent mappings between ontologies. We also deal with some relevant questions, including: the structural heterogeneity, the prune of irrelevant parts of the rewritten query and the representation of query results according to the target ontology. It is important to note that, although in this work we discuss the use of the proposed solution considering just two ontologies, it can also be extended and applied for data distributions cenarios with multiple ontologies.
Em vÃrias Ãreas, tais como IntegraÃÃo de Dados e Web SemÃntica, ontologias tÃm sido adotadas para descrever formalmente a semÃntica das fontes de dados, com o intuito de facilitar a descoberta e a recuperaÃÃo de informaÃÃes. Dentro desse contexto, o Acesso a Dados Baseado em Ontologias (Ontology-Based Data Access - OBDA) à um problema decorrente da necessidade de acessar tais fontes a partir das ontologias que representam seus modelos conceituais. Dentre as principais caracterÃsticas do OBDA, destacamos a independÃncia entre as ontologias e a camada de dados e a possibilidade de responder a consultas que sejam mais expressivas que as geralmente realizadas utilizando LÃgica Descritiva. Neste trabalho, especificamos um ambiente de OBDA no qual este problema à dividido em uma sÃrie de passos que podem ser tratados de maneira independente. Dentre cada um destes passos especificados, nossa principal contribuiÃÃo reside na definiÃÃo e implementaÃÃo de um processo para reescrita de consultas entre ontologias estruturalmente distintas. Em nossa abordagem de reescrita, manipulamos a consulta de entrada combinando a semÃntica e a expressividade da linguagem SPARQL com um mÃtodo baseado em noÃÃes de ProgramaÃÃo em LÃgica, uma vez que utilizamos mapeamentos heterogÃneos expressos atravÃs de regras. AlÃm disso, tratamos aspectos referentes Ãs diferenÃas estruturais entre as ontologias, possibilitamos que partes da consulta reescrita possam ser descartadas durante o processo, caso seja constatado que tais partes seriam desnecessÃrias, e permitimos que os resultados sejam reestruturados e apresentados conforme a ontologia alvo. Por fim, à vÃlido destacar que, embora a soluÃÃo apresentada tenha como foco duas ontologias, esta pode ser estendida para considerar aspectos especÃficos de distribuiÃÃo.
APA, Harvard, Vancouver, ISO, and other styles
19

Ernstsson, August. "Designing a Modern Skeleton Programming Framework for Parallel and Heterogeneous Systems." Licentiate thesis, Linköpings universitet, Programvara och system, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-170194.

Full text
Abstract:
Today's society is increasingly software-driven and dependent on powerful computer technology. Therefore it is important that advancements in the low-level processor hardware are made available for exploitation by a growing number of programmers of differing skill level. However, as we are approaching the end of Moore's law, hardware designers are finding new and increasingly complex ways to increase the accessible processor performance. It is getting more and more difficult to effectively target these processing resources without expert knowledge in parallelization, heterogeneous computation, communication, synchronization, and so on. To ensure that the software side can keep up, advanced programming environments and frameworks are needed to bridge the widening gap between hardware and software. One such example is the pattern-centric skeleton programming model and in particular the SkePU project. The work presented in this thesis first redesigns the SkePU framework based on modern C++ variadic template metaprogramming and state-of-the-art compiler technology. It then explores new ways to improve performance: by providing new patterns, improving the data access locality of existing ones, and using both static and dynamic knowledge about program flow. The work combines novel ideas with practical evaluation of the approach on several applications. The advancements also include the first skeleton API that allows variadic skeletons, new data containers, and finally an approach to make skeleton programming more customizable without compromising universal portability.

Ytterligare forskningsfinansiärer: EU H2020 project EXA2PRO (801015); SeRC.

APA, Harvard, Vancouver, ISO, and other styles
20

Peisert, Sean Philip. "A programming model for automated decomposition on heterogeneous clusters of multiprocessors /." Diss., Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 2000. http://wwwlib.umi.com/cr/ucsd/fullcit?p1398091.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Tuffen, John Anthony. "Load distribution within a heterogeneous multiprocessor computer music system." Thesis, University of York, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.242117.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Sodsong, Wasuwee. "Parallelization Techniques for Heterogeneous Multicores with Applications." Thesis, The University of Sydney, 2017. http://hdl.handle.net/2123/17987.

Full text
Abstract:
In the past decade, graphics processing units (GPUs) have gained wide-spread use as general purpose hardware accelerators. Equipped with several thousand cores, GPUs are suitable for data-intensive operations. Although a GPU provides a vast amount of raw, parallel compute power, it is nevertheless a daunting task to fully utilize this hardware resource. Doing so requires an in-depth understanding of the GPU architecture to utilize the software-exposed GPU memory hierarchy, and to mitigate main memory latencies. Because a GPU lacks complex control units, it under-performs for tasks with complex control flow. Control-flow intensive operations are thus more efficiently computed on a CPU. In contrast, CPUs lack ALUs and thus under-perform with data-intensive operations. In practice, we find applications to be composed of a mix of data-intensive operations and operations with complex control-flow. Heterogeneous computing aims at utilizing both the CPU and the GPU of a system. It offers the advantage of leveraging the key strengths of both architectures, while diminishing their weaknesses. This thesis proposes code-partitioning, which considers application characteristics and the capabilities of the underlying hardware to assign computations to either the CPU or the GPU. Dynamic scheduling techniques are proposed to leverage pipeline-parallelism and load-balance the work-load on a heterogeneous architecture. The proposed code-partitioning technique is applied with two major applications, JPEG decompression and Kronecker algebra-operations. The entropy decoding of JPEG decompression is difficult to parallelize because codewords are of variable length, and the start-position of a codeword in the bitstream is not known before the previous codeword has been decoded. The remaining JPEG decoding steps are compute-intensive with few dependencies. Similarly, Kronecker algebra, which has been shown to be effective with static program analysis, consists of data-intensive matrix operations. However, it has cross-iteration dependencies, such as bookkeeping of visited nodes, which is unsuitable for GPU computing. Despite improvement potential with a heterogeneous system, the domination of the JPEG format and the usefulness of Kronecker algebra, no approaches exist yet that are capable of joining forces of a system’s CPU and GPU. We investigate parallelization strategies that use heterogeneous multicores for JPEG decompression and Kronecker algebra. We propose algorithm-specific optimizations that minimize the known sequential bottlenecks. Our code-partitioning and scheduling scheme exploits task, data, and pipeline parallelism. We introduce an offline profiling step to determine the performance of a system’s CPU and GPU such that workloads are distributed accordingly. These applications are evaluated on several heterogeneous platforms, including an embedded system (for JPEG decompression). From the “lessons learned”, parallel software design patterns for heterogeneous computing have been distilled and put to work with the two major applications of this thesis.
APA, Harvard, Vancouver, ISO, and other styles
23

Vella, Kevin J. "Seamless parallel computing on heterogeneous networks of multiprocessor workstations." Thesis, University of Kent, 1998. https://kar.kent.ac.uk/21580/.

Full text
Abstract:
This thesis is concerned with portable, efficient, and, above all, seamless parallel programming of heterogeneous networks of shared memory multiprocessor workstations. The CSP model of concurrency as embodied in the occam language is used to purvey an architecture-independent and elegant view of concurrent systems. Tools and techniques for efficiently executing finely decomposed parallel programs on uniprocessor workstations, shared memory multiprocessor workstations and networks of both are examined in some detail. In particular, scheduling strategies that batch related processes together to reduce cache-related context switching overheads on uniprocessors, and to reduce contention and false sharing on shared memory multiprocessors are studied. New wait-free CP channel algorithms for shared memory multiprocessors are presented, as well as implementations of CSP channel algorithms across commodity network interconnects. A virtual parallel computer abstraction is applied to hide the inherent heterogeneity of workstation networks and enable seamless execution of parallel programs. An investigation of the performance of moderate to very fine grain parallelism on uniprocessors and shared memory multiprocessors is presented. The performance of CSP channels across TCP/IP networks is also scrutinized. The results indicate that fine grain parallelism can be handled efficiently in software on uniprocessors and shared memory multiprocessors, though issues related to caching warrant careful consideration. Other results also show that a limited amount of computation-communication overlap can be attained even with commodity network adapters which require significant processor interaction to sustain data transfer. This thesis demonstrates that seamless parallel programming across a variety of contemporary architectures using the CSP/occam model is a viable, as well as an attractive, option.
APA, Harvard, Vancouver, ISO, and other styles
24

Saà-Garriga, Albert. "Automatic source code adaptation for heterogeneous platforms." Doctoral thesis, Universitat Autònoma de Barcelona, 2016. http://hdl.handle.net/10803/399986.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Elteir, Marwa Khamis. "A MapReduce Framework for Heterogeneous Computing Architectures." Diss., Virginia Tech, 2012. http://hdl.handle.net/10919/28786.

Full text
Abstract:
Nowadays, an increasing number of computational systems are equipped with heterogeneous compute resources, i.e., following different architecture. This applies to the level of a single chip, a single node and even supercomputers and large-scale clusters. With its impressive price-to-performance ratio as well as power efficiently compared to traditional multicore processors, graphics processing units (GPUs) has become an integrated part of these systems. GPUs deliver high peak performance; however efficiently exploiting their computational power requires the exploration of a multi-dimensional space of optimization methodologies, which is challenging even for the well-trained expert. The complexity of this multi-dimensional space arises not only from the traditionally well known but arduous task of architecture-aware GPU optimization at design and compile time, but it also arises in the partitioning and scheduling of the computation across these heterogeneous resources. Even with programming models like the Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL), the developer still needs to manage the data transfer be- tween host and device and vice versa, orchestrate the execution of several kernels, and more arduously, optimize the kernel code. In this dissertation, we aim to deliver a transparent parallel programming environment for heterogeneous resources by leveraging the power of the MapReduce programming model and OpenCL programming language. We propose a portable architecture-aware framework that efficiently runs an application across heterogeneous resources, specifically AMD GPUs and NVIDIA GPUs, while hiding complex architectural details from the developer. To further enhance performance portability, we explore approaches for asynchronously and efficiently distributing the computations across heterogeneous resources. When applied to benchmarks and representative applications, our proposed framework significantly enhances performance, including up to 58% improvement over traditional approaches to task assignment and up to a 45-fold improvement over state-of-the-art MapReduce implementations.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
26

Metwly, Nevin. "Common application programming interface architecture bridge for connecting heterogeneous home computing middleware." Thesis, University of Ottawa (Canada), 2010. http://hdl.handle.net/10393/28492.

Full text
Abstract:
Microprocessors are now embedded in various equipments such as home appliances and digital audio/video devices, providing varieties of services. An integration of these services and creation of new ones is always needed to fulfill ongoing users' demands. There are a number of issues limiting the expandability of the intelligent home networks. One of these issues is that most middleware for home computing networks are not compatible with each other. To solve this problem of diversification of middleware and to ensure interoperability among them, the one-to-one bridge has been developed, which connects two specific middleware varieties and performs protocol conversion; also various one-to-any home network middleware bridges have lately been introduced. In this thesis, a proposal for any-to-any bridge connecting home computing middleware is introduced. It enables any appliance under any middleware to communicate and control any other appliances under different middleware. Also, new middleware can be easily integrated with the proposed architecture.
APA, Harvard, Vancouver, ISO, and other styles
27

Rodrigues, Tabajara Krausburg. "Constrained coalition formation among heterogeneous agents for the multi-agent programming contest." Pontif?cia Universidade Cat?lica do Rio Grande do Sul, 2018. http://tede2.pucrs.br/tede2/handle/tede/8102.

Full text
Abstract:
Submitted by PPG Ci?ncia da Computa??o (ppgcc@pucrs.br) on 2018-05-28T12:31:15Z No. of bitstreams: 1 TABAJARA_KRAUSBURG_RODRIGUES_DIS.pdf: 4049392 bytes, checksum: 154302eff9df959cfa74d6c0faec5d4e (MD5)
Approved for entry into archive by Sheila Dias (sheila.dias@pucrs.br) on 2018-06-06T13:05:51Z (GMT) No. of bitstreams: 1 TABAJARA_KRAUSBURG_RODRIGUES_DIS.pdf: 4049392 bytes, checksum: 154302eff9df959cfa74d6c0faec5d4e (MD5)
Made available in DSpace on 2018-06-06T13:35:29Z (GMT). No. of bitstreams: 1 TABAJARA_KRAUSBURG_RODRIGUES_DIS.pdf: 4049392 bytes, checksum: 154302eff9df959cfa74d6c0faec5d4e (MD5) Previous issue date: 2018-03-26
Esta disserta??o apresenta um estudo sobre forma??o de coaliz?es entre agentes heterog?neos para a competi??o de programa??o multiagente de 2017. Foi investigado e aplicado a forma??o de estruturas de coaliz?es entre agentes para resolver problemas log?sticos simulados sobre o mapa de uma cidade real. A fim de atingir o objetivo deste trabalho, foram integrados algoritmos formadores de coaliz?es na plataforma JaCaMo por meio de um artefato CArtAgO chamado CFArtefact. Foi utilizada a implementa??o provida pelo time SMART-JaCaMo (time participante da competi??o multiagente), para experimentar a forma??o de coaliz?es na competi??o. Tr?s abordagens foram avaliadas no dom?nio da competi??o em diferentes configura??es. A primeira abordagem utiliza somente aloca??o de tarefas para resolver o problema. A segunda e a terceira abordagem utilizam a t?cnica de forma??o de coaliz?es anteriormente ? aloca??o de tarefas; dentre estas abordagens, uma utiliza um algor?timo ?timo para resolver o problema e a outra um heur?stico. As an?lises dos experimentos realizados mostram que algor?timos formadores de coaliz?es podem melhorar a performance do time participante da competi??o quando a taxa de trabalhos gerados pelo simulador ? baixa. Entretanto, conforme a taxa de trabalhos aumenta, a abordagem que realiza somente aloca??o de tarefas obt?m um desempenho melhor quando comparada as demais. Mesmo a abordagem heur?stica tem desempenho pr?ximo ? abordagem ?tima para coaliz?es. Desta forma, ? poss?vel concluir que forma??o de coaliz?es possui grande valia para balancear os agentes para um conjunto de trabalhos que precisa ser completado.
This work focuses on coalition formation among heterogeneous agents for the 2017 multiagent programming contest. An agent is a computer system that is capable of independent action to achieve its goals. In order to increase the effectiveness of the agents, we can organise them into coalitions, in which the agents collaborate with each other to achieve individual or common goals. We investigate and apply coalition structure generation (the first activity of the coalition formation process) in simulated scenarios, specifically the 2017 contest scenario, where the agents forming a competing team cooperate to solve logistic problems simulated on the map of a real city. In order to achieve our goal, we integrate coalition formation algorithms into the JaCaMo platform by means of a CArtAgO artefact, named CFArtefact. We use the implementation of the SMART JaCaMo team for experimenting with the coalition formation approach in the contest scenario. We experiment on three approaches in the contest domain with different configurations. In the first, we use only a taskallocation mechanism, while the other approaches use an optimal coalition formation algorithm and a heuristic coalition formation algorithm. We conducted several experiments to compare the advantages of each approach. Our results show that coalition formation algorithms can improve the performance of a participating team when dealing with low job rates (i.e., how quickly new jobs are created by the simulation). However, as we increase the job rate, the approach using only task allocation has better performance. Even a heuristic coalition formation approach has close performance to the optimal one in that case. Coalition formation can play an important role when we aim to balance each group of agents to accomplish some particular goal given a larger team of cooperating agents.
APA, Harvard, Vancouver, ISO, and other styles
28

Harvey, Paul. "A linguistic approach to concurrent, distributed, and adaptive programming across heterogeneous platforms." Thesis, University of Glasgow, 2015. http://theses.gla.ac.uk/6749/.

Full text
Abstract:
Two major trends in computing hardware during the last decade have been an increase in the number of processing cores found in individual computer hardware platforms and an ubiquity of distributed, heterogeneous systems. Together, these changes can improve not only the performance of a range of applications, but the types of applications that can be created. Despite the advances in hardware technology, advances in programming of such systems has not kept pace. Traditional concurrent programming has always been challenging, and is only set to be come more so as the level of hardware concurrency increases. The different hardware platforms which make up heterogeneous systems come with domain-specific programming models, which are not designed to interact, or take into account the different resource-constraints present across different hardware devices, motivating a need for runtime reconfiguration or adaptation. This dissertation investigates the actor model of computation as an appropriate abstraction to address the issues present in programming concurrent, distributed, and adaptive applications across different scales and types of computing hardware. Given the limitations of other approaches, this dissertation describes a new actor-based programming language (Ensemble) and its runtime to address these challenges. The goal of this language is to enable non-specialist programmers to take advantage of parallel, distributed, and adaptive programming without the programmer requiring in-depth knowledge of hardware architectures or software frameworks. There is also a description of the design and implementation of the runtime system which executes Ensemble applications across a range of heterogeneous platforms. To show the suitability of the actor-based abstraction in creating applications for such systems, the language and runtime were evaluated in terms of linguistic complexity and performance. These evaluations covered programming embedded, concurrent, distributed, and adaptable applications, as well as combinations thereof. The results show that the actor provides an objectively simple way to program such systems without sacrificing performance.
APA, Harvard, Vancouver, ISO, and other styles
29

Sai, Ranga Prashanth C. "Algorithms for task scheduling in heterogeneous computing environments." Auburn, Ala., 2006. http://repo.lib.auburn.edu/2006%20Fall/SAI_RANGA_58.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Krommydas, Konstantinos. "Towards Enhancing Performance, Programmability, and Portability in Heterogeneous Computing." Diss., Virginia Tech, 2017. http://hdl.handle.net/10919/77582.

Full text
Abstract:
The proliferation of a diverse set of heterogeneous computing platforms in conjunction with the plethora of programming languages and optimization techniques on each language for each underlying architecture exacerbate widespread adoption of such platforms. This is especially true for novice programmers and the non-technical-savvy masses that are largely precluded from enjoying the advantages of high-performance computing. Moreover, different groups within the heterogeneous computing community (e.g., hardware architects, tool developers, and programmers) are presented with new challenges with respect to performance, programmability, and portability (or the three P's) of heterogeneous computing. In this work we discuss such challenges and identify benchmarking techniques based on computation and communication patterns as an appropriate means for the systematic evaluation of heterogeneous computing with respect to the three P's. Our proposed approach is based on OpenCL implementations of the Berkeley dwarfs. We use our benchmark suite (OpenDwarfs) in characterizing performance of state-of-the-art parallel architectures, and as the main component of a methodology (Telescoping Architectures) for identifying trends in future heterogeneous architectures. Furthermore, we employ OpenDwarfs in a multi-faceted study on the gaps between the three P's in the context of the modern heterogeneous computing landscape. Our case-study spans a variety of compilers, languages, optimizations, and target architectures, including the CPU, GPU, MIC, and FPGA. Based on our insights, and extending aspects of prior research (e.g., in compilers, programming languages, and auto-tuning), we propose the introduction of grid-based data structures as the basis of programming frameworks and present a prototype unified framework (GLAF) that encompasses a novel visual programming environment with code generation, auto-parallelization, and auto-tuning capabilities. Our results, which span scientific domains, indicate that our holistic approach constitutes a viable alternative towards enhancing the three P's and further democratizing heterogeneous, parallel computing for non-programming-savvy audiences, and especially domain scientists.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
31

Smith, Graham Leslie. "A distributed Linda server on a network of heterogeneous processors." Thesis, Rhodes University, 1993. http://hdl.handle.net/10962/d1004890.

Full text
Abstract:
Linda is an approach to parallelism which relies on a virtual associative shared memory called tuple space. Tuple space is accessed through a small set of primitive operations and is conceptually easy to understand and manipulate. The physical implementation of a Linda tuple space may of course be completely different from the conceptual model. Rhodes has implemented versions of Linda on a ring of RS-232 joined PC's and on a cluster of T800 transputers with a single copy of tuple space on one transputer. Current research targets the implementation of a distributed Linda server on a network of heterogeneous processors. This work describes the design and implementation of a distributed Linda server. Emphasis is placed on aspects of the design which enhance portability and efficiency.
APA, Harvard, Vancouver, ISO, and other styles
32

Sajjapongse, Kittisak. "Hierarchical scheduling and uniform access programming frameworks for heterogeneous CPU-GPU computing clusters." Thesis, University of Missouri - Columbia, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10178997.

Full text
Abstract:

The advance of the GPU hardware architecture has made GPUs attractive devices for general-purpose computing. Modern GPUs are equipped with an increasing number of cores, a flexible memory hierarchy, and a large memory capacity. While the computational power of modern GPU devices has allowed their introduction in high-performance computing (HPC) clusters and the efficient processing of ever larger workloads, existing software components for HPC clusters still offer basic support for hardware heterogeneity and often cause performance limitations in the presence of GPU devices. In particular, two kinds of limitations are associated with these software components: runtime support and programmability. We found that these limitations are due to the fact that existing software frameworks for heterogeneous clusters treat GPUs as dedicated coprocessor devices.

In this dissertation, we propose two software frameworks for addressing the performance and hardware underutilization issues found in heterogeneous CPU-GPU clusters as well as increasing their programmability. Our frameworks provide a uniform view of compute resources and treat CPUs and GPUs equally as first-class resources, allowing efficient management of heterogeneous compute resources. First, we propose a hierarchical scheduling framework consisting of a node-level runtime and a cluster-level scheduler that provides abstraction of heterogeneous compute resources at different granularities. This hierarchical framework targets existing applications and does not require their modification. In the node-level runtime, we identify and design mechanisms, such as virtual GPUs, GPU virtual memory, dynamic load balancing and pre-emption, which are necessary to support efficient sharing and load balancing schemes for GPUs within a compute node. In the cluster-level scheduler, we introduce mechanisms to abstract compute nodes and perform load balancing in concert with the node-level runtime. Our hierarchical scheduling framework allows supporting different load balancing policies and does not require additional inputs (such as profiling information) from users. Second, we propose a programming framework based on a novel memory and execution model. Our memory model hides disjoint addressing spaces (corresponding to different CPUs, GPUs and compute nodes) and provides a view of a single virtual memory space that can be accessed by all compute resources in a heterogeneous cluster. Our execution model provides uniform access to compute resources and allows our framework to treat all CPUs and GPUs equally and to access data in the virtual memory space.

APA, Harvard, Vancouver, ISO, and other styles
33

Ribeiro, Tiago Filipe Rodrigues. "Developing and evaluating clopencl applications for heterogeneous clusters." Master's thesis, Instituto Politécnico de Bragança, Escola Superior de Tecnologia e Gestão, 2012. http://hdl.handle.net/10198/7948.

Full text
Abstract:
In the last few years, the computing systems processing capabilities have increased significantly, changing from single-core to multi-core and even many-core systems. Accompanying this evolution, local networks have also become faster, with multi-gigabit technologies like Infiniband, Myrinet and 10G Ethernet. Parallel/distributed programming tools and standards, like POSIX Threads, OpenMP and MPI, have helped to explore these technologies and have been frequently combined, giving rise to Hybrid Programming Models. Recently, co-processors like GPUs and FPGAs, started to be used as accelerators, requiring specialized frameworks (like CUDA for NVIDIA GPUs). Presented with so much heterogeneity, the industry formulated the OpenCL specification, as a standard to explore heterogeneous systems. However, in the context of cluster computing, one problem surfaces: OpenCL only enables a developer to use the devices that are present in the local machine. With many processor devices scattered across cluster nodes (CPUs, GPUs and other co-processors), it then became important to enable software developers to take full advantage of the full cluster device set. This dissertation demonstrates and evaluates an OpenCL extension, named clOpenCL, which supports the simple deployment and efficient running of OpenCL-based parallel applications that may span several cluster nodes, thus expanding the original single-node OpenCL model. The main contributions are that clOpenCL i) offers a transparent approach to the porting of traditional OpenCL applications to cluster environments and ii) provides significant performance increases over classical (non-)hybrid parallel approaches. Nos últimos anos, a capacidade de processamento dos sistemas de computação aumentou significativamente, passando de CPUs com um núcleo para CPUs multi-núcleo. Acompanhando esta evolução, as redes locais também se tornaram mais rápidas, com tecnologias multi-gigabit como a Infiniband, Myrinet e 10G Ethernet. Ferramentas e standards paralelos/distribuídos, como POSIX Threads, OpenMP e MPI, ajudaram a explorar esses sistemas, e têm sido frequentemente combinados dando origem a Modelos de Programação Híbrida. Mais recentemente, co-processadores como GPUs e FPGAs, começaram a ser utilizados como aceleradores, exigindo frameworks especializadas (como o CUDA para GPUs NVIDIA). Deparada com tanta heterogeneidade, a indústria formulou a especificação OpenCL, como sendo um standard para exploração de sistemas heterogéneos. No entanto, no contexto da computação em cluster, um problema surge: o OpenCL só permite ao desenvolvedor utilizar dispositivos presentes na máquina local. Com tantos dispositivos de processamento espalhados pelos nós de um cluster (CPUs, GPUs e outros co-processadores), tornou-se assim importante habilitar os desenvolvedores de software, a tirarem o máximo proveito do conjunto total de dispositivos do cluster. Esta dissertação demonstra e avalia uma extensão OpenCL, chamada clOpenCL, que suporta a implementação simples e execução eficiente de aplicações paralelas baseadas em OpenCL que podem estender-se por vários nós do cluster, expandindo assim o modelo original de um único nó do OpenCL. As principais contribuições referem-se a que o clOpenCL i) oferece uma abordagem transparente à portabilidade de aplicações OpenCL tradicionais para ambientes cluster e ii) proporciona aumentos significativos de desempenho sobre abordagens paralelas clássicas (não) híbridas.
APA, Harvard, Vancouver, ISO, and other styles
34

Barfoursh, Ahmad Abdollahzadeh. "The design and implementation of distributed information systems based on heterogeneous data bases." Thesis, University of Bristol, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.314511.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Mathaikutty, Deepak Abraham. "Functional Programming and Metamodeling frameworks for System Design." Thesis, Virginia Tech, 2005. http://hdl.handle.net/10919/32639.

Full text
Abstract:
System-on-Chip (SoC) and other complex distributed hardware/software systems contain heterogeneous components whose behavior are best captured by different models of computations (MoCs). As a result, any system design framework for such systems requires the capability to express heterogeneous MoCs. Although a number of system level design languages (SLDL)s and frameworks have proliferated over the last few years, most of them are lacking in multiple ways. Some of the SLDLs and system design frameworks we have worked with are SpecC, Ptolemy II, SystemC-H, etc. From our analysis of these, we identify their following shortcomings: First, their dependence on specific programming language artifacts (Java or C/C++) make them less amenable to formal analysis. Second, the refinement strategies proposed in the design flows based on these languages lack formal semantics underpinnings making it difficult to prove that refinements preserve correctness, and third, none of the available SLDLs are easily customizable by users. In our work, we address these problems as follows: To alleviate the first problem, we follow Axel Jantschâ s paradigm of function-based semantic definitions of MoCs and formulate a functional programming framework called SML-Sys. We illustrate through a number of examples how to model heterogenous computing systems using SML-Sys. Our framework provides for formal reasoning due to its formal semantic underpinning inherited from SMLâ s precise denotational semantics. To handle the second problem and apply refinement strategies at a higher-level, we propose a refinement methodology and provide a semantics preserving transformation library within our framework. To address the third shortcoming, we have developed EWD, which allows users to customize MoC-specific visual modeling syntax defined as a metamodel. EWD is developed using a metamodeling framework GME (Generic Modeling Environment). It allows for automatic design-time syntactic and semantic checks on the models for conformance to their metamodel. Modeling in EWD facilitates saving the model in an XML-based interoperability language (IML) we defined for this purpose. The IML format is in turn automatically translated into Standard ML, or Haskell models. These may then be executed and analyzed either by our existing model analysis tools SMLSys, or the ForSyDe environment. We also generate SMV-based template from the XML representation to obtain verification models.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
36

Inuani, Maurice Kilavuka. "Technology mapping of heterogeneous lookup table based field programmable gate arrays." Thesis, University of Oxford, 1998. http://ora.ox.ac.uk/objects/uuid:8ec8745f-c0b2-43c0-994f-bd949d9fdefa.

Full text
Abstract:
A lot of work has been done over the last decade on the logic synthesis and technology mapping of field programmable gate arrays (FPGAs) based on a single size of lookup table (LUT). A significant part of the FPGA market is occupied by devices based on more than one type of lookup tables. Examples of these heterogeneous LUT-based FPGAs are the Xilinx 4000 series devices. The technology mapping for this class of FPGAs has hardly been considered. This thesis covers work on the synthesis for heterogeneous LUT-based FPGAs. The proposed scheme uses the typical steps of graph covering, decomposition, node elimination and Boolean graph simplification. The covering step is based on the concept of flow networks and cut-computation. A theory is devised that reduces the flow network sizes so that a dynamic programming approach can be used to compute the feasible cuts in the network. An iterative selection algorithm can then be used to compute the set cover of the network. For the decomposition, the conventional bin-packing (cube-packing) algorithm has been extended so that it produces two types of bins. It has also been enhanced to explore several packing possibilities and include cube division and cascading of nodes. The classical functional decomposition method is extended to heterogeneous graphs. In particular, variable partitioning is coupled with other decomposition methods and exploits the structure of the functions. Partial collapsing and re-decomposition are used to re-synthesise the graphs. A strategy for eliminating nodes within a heterogeneous graph is developed. A simplification strategy is also derived from logic optimisation techniques. Comparisons of the mapping results on Xilinx devices show an improvement of over 11% over existing mapping tools for the same devices.
APA, Harvard, Vancouver, ISO, and other styles
37

Awoyemi, Babatunde Seun. "Resource allocation optimisation in heterogeneous cognitive radio networks." Thesis, University of Pretoria, 2017. http://hdl.handle.net/2263/61327.

Full text
Abstract:
Cognitive radio networks (CRN) have been tipped as one of the most promising paradigms for next generation wireless communication, due primarily to its huge promise of mitigating the spectrum scarcity challenge. To help achieve this promise, CRN develop mechanisms that permit spectrum spaces to be allocated to, and used by more than one user, either simultaneously or opportunistically, under certain preconditions. However, because of various limitations associated with CRN, spectrum and other resources available for use in CRN are usually very scarce. Developing appropriate models that can efficiently utilise the scarce resources in a manner that is fair, among its numerous and diverse users, is required in order to achieve the utmost for CRN. 'Resource allocation (RA) in CRN' describes how such models can be developed and analysed. In developing appropriate RA models for CRN, factors that can limit the realisation of optimal solutions have to be identified and addressed; otherwise, the promised improvement in spectrum/resource utilisation would be seriously undermined. In this thesis, by a careful examination of relevant literature, the most critical limitations to RA optimisation in CRN are identified and studied, and appropriate solution models that address such limitations are investigated and proffered. One such problem, identified as a potential limitation to achieving optimality in its RA solutions, is the problem of heterogeneity in CRN. Although it is indeed the more realistic consideration, introducing heterogeneity into RA in CRN exacerbates the complex nature of RA problems. In the study, three broad classifications of heterogeneity, applicable to CRN, are identified; heterogeneous networks, channels and users. RA models that incorporate these heterogeneous considerations are then developed and analysed. By studying their structures, the complex RA problems are smartly reformulated as integer linear programming problems and solved using classical optimisation. This smart move makes it possible to achieve optimality in the RA solutions for heterogeneous CRN. Another serious limitation to achieving optimality in RA for CRN is the strictness in the level of permissible interference to the primary users (PUs) due to the activities of the secondary users (SUs). To mitigate this problem, the concept of cooperative diversity is investigated and employed. In the cooperative model, the SUs, by assisting each other in relaying their data, reduce their level of interference to PUs significantly, thus achieving greater results in the RA solutions. Furthermore, an iterative-based heuristic is developed that solves the RA optimisation problem timeously and efficiently, thereby minimising network complexity. Although results obtained from the heuristic are only suboptimal, the gains in terms of reduction in computations and time make the idea worthwhile, especially when considering large networks. The final problem identified and addressed is the limiting effect of long waiting time (delay) on the RA and overall productivity of CRN. To address this problem, queueing theory is investigated and employed. The queueing model developed and analysed helps to improve both the blocking probability as well as the system throughput, thus achieving significant improvement in the RA solutions for CRN. Since RA is an essential pivot on which the CRN's productivity revolves, this thesis, by providing viable solutions to the most debilitating problems in RA for CRN, stands out as an indispensable contribution to helping CRN realise its much-proclaimed promises.
Thesis (PhD)--University of Pretoria, 2017.
Electrical, Electronic and Computer Engineering
PhD
Unrestricted
APA, Harvard, Vancouver, ISO, and other styles
38

Castrillon, Mazo Jeronimo [Verfasser]. "Programming heterogeneous MPSoCs : tool flows to close the software productivity gap / Jeronimo Castrillon Mazo." Aachen : Hochschulbibliothek der Rheinisch-Westfälischen Technischen Hochschule Aachen, 2013. http://d-nb.info/1035622904/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Reiche, Oliver [Verfasser]. "A Domain-Specific Language Approach for Designing and Programming Heterogeneous Image Systems / Oliver Reiche." München : Verlag Dr. Hut, 2018. http://d-nb.info/1168534674/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Radeschnig, Jessica. "Heterogeneous Optimality of Lifetime Consumption and Asset Allocation : Growing Old in Sweden." Thesis, Mälardalens högskola, Akademin för utbildning, kultur och kommunikation, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-36119.

Full text
Abstract:
This thesis covers a utility optimizing model designed and calibrated for agents of the Swedish economy. The main ingredient providing for this specific country is the modeling of the pension accumulation and pension benefits, which closely mimics the Swedish system. This characteristic is important since it measures one of the only two diversities between genders, that is, the income. The second characteristic is the survival probability. Except for these differences in national statistics, men and women are equal. The reminding model parameters are realistically set estimates from the surrounding economy. When using the model, firstly a baseline agent representing the entire labor force is under the microscope for evaluating the model itself. Next, one representative woman and one representative man from the private and public sectors respectively, composes a set of four samples for investigation of heterogeneity in optimality. The optimum level of consumption and risk-proportion of liquid wealth are solved by maximizing an Epstein-Zin utility function using the method of dynamic programming. The results suggests that both genders benefit from adapting the customized solutions to the problem.
APA, Harvard, Vancouver, ISO, and other styles
41

McGarity, Michael Computer Science &amp Engineering Faculty of Engineering UNSW. "Heterogeneous representations for reinforcement learning control of dynamic systems." Awarded by:University of New South Wales. School of Computer Science and Engineering, 2004. http://handle.unsw.edu.au/1959.4/19350.

Full text
Abstract:
Intelligent agents are designed to interact with, and learn about, their environment so that they can act purposefully towards a goal. One class of problems encountered in building such agents is learning how to respond to dynamic systems with a continuous state space. The goals of this dissertation are to develop a framework for understanding the behaviour of partitioned dynamic systems with continuous underlying state and to translate this framework into algorithms which adaptively form a partition of the continuous space such that the partitioned system is more easily learned and controlled, and such that the control law may be easily explained in intuitive ways. Currently, algorithms which learn a control policy for partitioned continuous state space systems treat the partitioned system as an approximation to a Markov chain. I give conditions for the partitioned system to be a Markov chain, a semi-Markov process and a new class of system, a weak-semi-Markov process. The weak-semi-Markov model is shown to model partitioned dynamic systems with greater economy than other surveyed models. The behaviour of a partitioned state space system in the area around the region boundaries is also considered. I use the theory of sliding surfaces, and some heuristic arguments to recommend region boundary shape and position. The concept of 'staying on the boundary' then becomes a robust and relatively easy subgoal within the control algorithm. The concept of 'reaching the sliding surface' as a subgoal is used as the basis for an intuitive explanation of the learnt controller. I present an algorithm based on this concept which explains the behaviour of a learnt controller in ways not previously available to a machine learning algorithms. Finally, the Markov Property and the theory of Sliding Mode Control are used as the basis of a class of recursive algorithms. These algorithms adaptively find a partition, and simultaneously use this partition in conjunction with one of five reinforcement learning algorithms to find a control policy based on that partition. This technique is shown to work very well in learning, controlling and explaining a variety of physical systems, from a monorail to a container crane.
APA, Harvard, Vancouver, ISO, and other styles
42

Peccerillo, Biagio. "PHAST Library - A Productive Solution to Program Heterogeneity." Doctoral thesis, Università di Siena, 2020. http://hdl.handle.net/11365/1107894.

Full text
Abstract:
Oggigiorno, le architetture parallele sono ovunque: computer desktop, dispositivi mobile, workstation ad alte prestazioni e server sono solo alcuni esempi. Oltre ai classici multi-core, anche GPU, FPGA, TPU e altri dispositivi vengono utilizzati in un ampio spettro di applicazioni. In questa era post-PC, i programmatori devono necessariamente sfruttare l'eterogeneità per utilizzare pienamente l'hardware e raggiungere prestazioni senza precedenti. Scrivere codice eterogeneo può essere difficile: ogni dispositivo può avere i suoi linguaggi, i suoi framework e i suoi stili di programmazione. Pertanto, per essere produttivi, bisognerebbe preferire degli approcci eterogenei che consentano la scrittura di codice una volta sola. Questi approcci devono fornire come minimo la portabilità del codice tra dispositivi, ma anche la portabilità delle prestazioni, così da ridurre la necessità di programmare e ottimizzare diverse architetture in maniera specifica. Nonostante l'ampia diffusione di dispositivi eterogenei, tali framework sono ancora di là da venire. Questa tesi di Dottorato presenta il mio contributo alla programmazione parallela ed eterogenea. In una prima parte viene discusso approfonditamente lo stato dell'arte dell'ambito di appartenenza. In seguito viene presentato un framework che mira a porsi come punto di riferimento, tentando di conciliare i punti forti delle diverse soluzioni concorrenti, limitandone le debolezze. Questo framework, dal nome PHAST Library, è una libreria di programmazione single-source in C++ moderno che fornisce produttività e portabilità di codice e prestazioni tra dispositivi paralleli. In questo lavoro si discutono le caratteristiche in maniera approfondita e se ne valuta il valore tramite confronto con altri framework eterogenei ad alto livello di astrazione, sia da un punto di vista prestazionale che di produttività. Dal confronto, che annovera esempi semplici e applicazioni reali, emerge che il framework è maturo. Inoltre, PHAST Library rappresenta un miglioramento misurabile dello stato dell'arte e perciò può essere utilizzato per lo sviluppo di applicazioni parallele ed eterogenee in maniera produttiva e portabile. La tesi si conclude con la discussione delle sue possibili evoluzioni.
Nowadays, parallel architectures are everywhere: desktop computers, mobile devices, high-performance workstations, and servers are just a few examples. Not only multi-core microprocessors, but also GPUs, FPGAs, TPUs, and other computing devices are widely used in a broad range of applications. Programmers willing to code in this post-PC era need to take advantage of heterogeneity in order to fully utilize these devices and reach unprecedented performance. Programming heterogeneity can be difficult: each device can have its own languages, frameworks, and coding styles. Heterogeneous approaches that let programmers code once must be preferred for productivity reasons. They must provide portability as a minimum requirement, but also performance portability to reduce the need of device-specific coding and tuning. Despite the wide diffusion of heterogeneous computing devices, frameworks with these facilities are yet to come. This Ph.D. thesis presents my contribution to parallel and heterogeneous programming. After presenting an in-depth study of the state-of-the-art programming approaches for heterogeneous devices, it presents a programming framework that aims at embodying their strengths while limiting their weaknesses. This framework, called PHAST Library, is a modern C++ single-source programming library that provides near-native performance, productivity, portability, and performance portability across parallel computing devices. This thesis discusses it in-depth and evaluates it against other productive heterogeneous frameworks from both performance and productivity points of view. The comparison is done in the case of various toy and real-world applications, which shows that the framework is mature. It also improves the state-of-the-art in a measurable way, and thus its use can help programmers to write high-level, high-performance parallel heterogeneous code. In conclusion, a possible evolution of the framework is presented.
APA, Harvard, Vancouver, ISO, and other styles
43

Rafique, Muhammad Mustafa. "An Adaptive Framework for Managing Heterogeneous Many-Core Clusters." Diss., Virginia Tech, 2011. http://hdl.handle.net/10919/29119.

Full text
Abstract:
The computing needs and the input and result datasets of modern scientific and enterprise applications are growing exponentially. To support such applications, High-Performance Computing (HPC) systems need to employ thousands of cores and innovative data management. At the same time, an emerging trend in designing HPC systems is to leverage specialized asymmetric multicores, such as IBM Cell and AMD Fusion APUs, and commodity computational accelerators, such as programmable GPUs, which exhibit excellent price to performance ratio as well as the much needed high energy efficiency. While such accelerators have been studied in detail as stand-alone computational engines, integrating the accelerators into large-scale distributed systems with heterogeneous computing resources for data-intensive computing presents unique challenges and trade-offs. Traditional programming and resource management techniques cannot be directly applied to many-core accelerators in heterogeneous distributed settings, given the complex and custom instruction sets architectures, memory hierarchies and I/O characteristics of different accelerators. In this dissertation, we explore the design space of using commodity accelerators, specifically IBM Cell and programmable GPUs, in distributed settings for data-intensive computing and propose an adaptive framework for programming and managing heterogeneous clusters. The proposed framework provides a MapReduce-based extended programming model for heterogeneous clusters, which distributes tasks between asymmetric compute nodes by considering workload characteristics and capabilities of individual compute nodes. The framework provides efficient data prefetching techniques that leverage general-purpose cores to stage the input data in the private memories of the specialized cores. We also explore the use of an advanced layered-architecture based software engineering approach and provide mixin-layers based reusable software components to enable easy and quick deployment of heterogeneous clusters. The framework also provides multiple resource management and scheduling policies under different constraints, e.g., energy-aware and QoS-aware, to support executing concurrent applications on multi-tenant heterogeneous clusters. When applied to representative applications and benchmarks, our framework yields significantly improved performance in terms of programming efficiency and optimal resource management as compared to conventional, hand-tuned, approaches to program and manage accelerator-based heterogeneous clusters.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
44

Evans, B. J. "The construction of a virtual multicomputer based on heterogeneous processors by use of a lightweight multicast protocol." Thesis, University of Reading, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.357126.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Diarra, Rokiatou. "Automatic Parallelization for Heterogeneous Embedded Systems." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS485.

Full text
Abstract:
L'utilisation d'architectures hétérogènes, combinant des processeurs multicoeurs avec des accélérateurs tels que les GPU, FPGA et Intel Xeon Phi, a augmenté ces dernières années. Les GPUs peuvent atteindre des performances significatives pour certaines catégories d'applications. Néanmoins, pour atteindre ces performances avec des API de bas niveau comme CUDA et OpenCL, il est nécessaire de réécrire le code séquentiel, de bien connaître l’architecture des GPUs et d’appliquer des optimisations complexes, parfois non portables. D'autre part, les modèles de programmation basés sur des directives (par exemple, OpenACC, OpenMP) offrent une abstraction de haut niveau du matériel sous-jacent, simplifiant ainsi la maintenance du code et améliorant la productivité. Ils permettent aux utilisateurs d’accélérer leurs codes séquentiels sur les GPUs en insérant simplement des directives. Les compilateurs d'OpenACC/OpenMP ont la lourde tâche d'appliquer les optimisations nécessaires à partir des directives fournies par l'utilisateur et de générer des codes exploitant efficacement l'architecture sous-jacente. Bien que les compilateurs d'OpenACC/OpenMP soient matures et puissent appliquer certaines optimisations automatiquement, le code généré peut ne pas atteindre l'accélération prévue, car les compilateurs ne disposent pas d'une vue complète de l'ensemble de l'application. Ainsi, il existe généralement un écart de performance important entre les codes accélérés avec OpenACC/OpenMP et ceux optimisés manuellement avec CUDA/OpenCL. Afin d'aider les programmeurs à accélérer efficacement leurs codes séquentiels sur GPU avec les modèles basés sur des directives et à élargir l'impact d'OpenMP/OpenACC dans le monde universitaire et industrielle, cette thèse aborde plusieurs problématiques de recherche. Nous avons étudié les modèles de programmation OpenACC et OpenMP et proposé une méthodologie efficace de parallélisation d'applications avec les approches de programmation basées sur des directives. Notre expérience de portage d'applications a révélé qu'il était insuffisant d'insérer simplement des directives de déchargement OpenMP/OpenACC pour informer le compilateur qu'une région de code particulière devait être compilée pour être exécutée sur la GPU. Il est essentiel de combiner les directives de déchargement avec celles de parallélisation de boucle. Bien que les compilateurs actuels soient matures et effectuent plusieurs optimisations, l'utilisateur peut leur fournir davantage d'informations par le biais des clauses des directives de parallélisation de boucle afin d'obtenir un code mieux optimisé. Nous avons également révélé le défi consistant à choisir le bon nombre de threads devant exécuter une boucle. Le nombre de threads choisi par défaut par le compilateur peut ne pas produire les meilleures performances. L'utilisateur doit donc essayer manuellement différents nombres de threads pour améliorer les performances. Nous démontrons que les modèles de programmation OpenMP et OpenACC peuvent atteindre de meilleures performances avec un effort de programmation moindre, mais les compilateurs OpenMP/OpenACC atteignent rapidement leur limite lorsque le code de région déchargée a une forte intensité arithmétique, nécessite un nombre très élevé d'accès à la mémoire globale et contient plusieurs boucles imbriquées. Dans de tels cas, des langages de bas niveau doivent être utilisés. Nous discutons également du problème d'alias des pointeurs dans les codes GPU et proposons deux outils d'analyse statiques qui permettent d'insérer automatiquement les qualificateurs de type et le remplacement par scalaire dans le code source
Recent years have seen an increase of heterogeneous architectures combining multi-core CPUs with accelerators such as GPU, FPGA, and Intel Xeon Phi. GPU can achieve significant performance for certain categories of application. Nevertheless, achieving this performance with low-level APIs (e.g. CUDA, OpenCL) requires to rewrite the sequential code, to have a good knowledge of GPU architecture, and to apply complex optimizations that are sometimes not portable. On the other hand, directive-based programming models (e.g. OpenACC, OpenMP) offer a high-level abstraction of the underlying hardware, thus simplifying the code maintenance and improving productivity. They allow users to accelerate their sequential codes on GPU by simply inserting directives. OpenACC/OpenMP compilers have the daunting task of applying the necessary optimizations from the user-provided directives and generating efficient codes that take advantage of the GPU architecture. Although the OpenACC / OpenMP compilers are mature and able to apply some optimizations automatically, the generated code may not achieve the expected speedup as the compilers do not have a full view of the whole application. Thus, there is generally a significant performance gap between the codes accelerated with OpenACC/OpenMP and those hand-optimized with CUDA/OpenCL. To help programmers for speeding up efficiently their legacy sequential codes on GPU with directive-based models and broaden OpenMP/OpenACC impact in both academia and industry, several research issues are discussed in this dissertation. We investigated OpenACC and OpenMP programming models and proposed an effective application parallelization methodology with directive-based programming approaches. Our application porting experience revealed that it is insufficient to simply insert OpenMP/OpenACC offloading directives to inform the compiler that a particular code region must be compiled for GPU execution. It is highly essential to combine offloading directives with loop parallelization constructs. Although current compilers are mature and perform several optimizations, the user may provide them more information through loop parallelization constructs clauses in order to get an optimized code. We have also revealed the challenge of choosing good loop schedules. The default loop schedule chosen by the compiler may not produce the best performance, so the user has to manually try different loop schedules to improve the performance. We demonstrate that OpenMP and OpenACC programming models can achieve best performance with lesser programming effort, but OpenMP/OpenACC compilers quickly reach their limit when the offloaded region code is computed/memory bound and contain several nested loops. In such cases, low-level languages may be used. We also discuss pointers aliasing problem in GPU codes and propose two static analysis tools that perform automatically at source level type qualifier insertion and scalar promotion to solve aliasing issues
APA, Harvard, Vancouver, ISO, and other styles
46

Chang, Tao. "Evaluation of programming models for manycore and / or heterogeneous architectures for Monte Carlo neutron transport codes." Thesis, Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAX099.

Full text
Abstract:
Dans cette thèse nous nous proposons d’évaluer les différents modèles de programmation disponibles pour adresser les architectures de type manycore et/ou hétérogènes dans le cadre des codes de transport Monte Carlo. On considèrera dans un premier temps un cas test d’application simple mais représentatif pour couvrir un éventail assez large de solutions et les comparer en terme de performance, de portabilité de la performance, de facilité de mise en œuvre et de maintenabilité. Les architectures cibles sont les CPU `classique', Intel Xeon Phi et GPU. Les modèles de programmation les plus pertinents seront ensuite mis en place dans un code de transport Monte Carlo
In this thesis we propose to evaluate the different programming models available for addressing manycore and / or heterogeneous architectures within the framework of the Monte Carlo transport codes. A simple but representative application test case will be considered in order to cover a fairly wide range of solutions and compare them in terms of performance, portability of performance, ease of implementation and maintainability. The target architectures are `classic' CPUs, Intel Xeon Phi and GPUs. The most relevant programming models will then be set up in a Monte Carlo transport code
APA, Harvard, Vancouver, ISO, and other styles
47

Potluri, Sreeram. "Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1397797221.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Luong, Johannes [Verfasser], Wolfgang [Gutachter] Lehner, and Ziawasch [Gutachter] Abedjan. "A Common Programming Interface for Managed Heterogeneous Data Analysis / Johannes Luong ; Gutachter: Wolfgang Lehner, Ziawasch Abedjan." Dresden : Technische Universität Dresden, 2021. http://d-nb.info/1238140599/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Cabarcas, Jaramillo Felipe. "Castell: a heterogeneous cmp architecture scalable to hundreds of processors." Doctoral thesis, Universitat Politècnica de Catalunya, 2011. http://hdl.handle.net/10803/80542.

Full text
Abstract:
Technology improvements and power constrains have taken multicore architectures to dominate microprocessor designs over uniprocessors. At the same time, accelerator based architectures have shown that heterogeneous multicores are very efficient and can provide high throughput for parallel applications, but with a high-programming effort. We propose Castell a scalable chip multiprocessor architecture that can be programmed as uniprocessors, and provides the high throughput of accelerator-based architectures. Castell relies on task-based programming models that simplify software development. These models use a runtime system that dynamically finds, schedules, and adds hardware-specific features to parallel tasks. One of these features is DMA transfers to overlap computation and data movement, which is known as double buffering. This feature allows applications on Castell to tolerate large memory latencies and lets us design the memory system focusing on memory bandwidth. In addition to provide programmability and the design of the memory system, we have used a hierarchical NoC and added a synchronization module. The NoC design distributes memory traffic efficiently to allow the architecture to scale. The synchronization module is a consequence of the large performance degradation of application for large synchronization latencies. Castell is mainly an architecture framework that enables the definition of domain-specific implementations, fine-tuned to a particular problem or application. So far, Castell has been successfully used to propose heterogeneous multicore architectures for scientific kernels, video decoding (using H.264), and protein sequence alignment (using Smith-Waterman and clustalW). It has also been used to explore a number of architecture optimizations such as enhanced DMA controllers, and architecture support for task-based programming models. iii
APA, Harvard, Vancouver, ISO, and other styles
50

Kainth, Haresh S. "A data dependency recovery system for a heterogeneous multicore processor." Thesis, University of Derby, 2014. http://hdl.handle.net/10545/313343.

Full text
Abstract:
Multicore processors often increase the performance of applications. However, with their deeper pipelining, they have proven increasingly difficult to improve. In an attempt to deliver enhanced performance at lower power requirements, semiconductor microprocessor manufacturers have progressively utilised chip-multicore processors. Existing research has utilised a very common technique known as thread-level speculation. This technique attempts to compute results before the actual result is known. However, thread-level speculation impacts operation latency, circuit timing, confounds data cache behaviour and code generation in the compiler. We describe an software framework codenamed Lyuba that handles low-level data hazards and automatically recovers the application from data hazards without programmer and speculation intervention for an asymmetric chip-multicore processor. The problem of determining correct execution of multiple threads when data hazards occur on conventional symmetrical chip-multicore processors is a significant and on-going challenge. However, there has been very little focus on the use of asymmetrical (heterogeneous) processors with applications that have complex data dependencies. The purpose of this thesis is to: (i) define the development of a software framework for an asymmetric (heterogeneous) chip-multicore processor; (ii) present an optimal software control of hardware for distributed processing and recovery from violations;(iii) provides performance results of five applications using three datasets. Applications with a small dataset showed an improvement of 17% and a larger dataset showed an improvement of 16% giving overall 11% improvement in performance.
APA, Harvard, Vancouver, ISO, and other styles
We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!

To the bibliography