Dissertations / Theses on the topic 'Heterogeneous computing'

Consult the top 50 dissertations / theses for your research on the topic 'Heterogeneous computing.'

1

Lu, Howard J. (Howard Jason). "Heterogeneous multithreaded computing." Thesis, Massachusetts Institute of Technology, 1995. http://hdl.handle.net/1721.1/36584.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Jackson, Robert Owen. "Heterogeneous parallel computing." Thesis, University of Birmingham, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.366162.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Fagg, Graham Edward. "Enabling technologies for parallel heterogeneous computing." Thesis, University of Reading, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.266150.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Scogland, Thomas R. "Runtime Adaptation for Autonomic Heterogeneous Computing." Diss., Virginia Tech, 2014. http://hdl.handle.net/10919/71315.

Full text
Abstract:
Heterogeneity is increasing across all levels of computing, with the rise of accelerators such as GPUs, FPGAs, and other coprocessors into everything from cell phones to supercomputers. More quietly, it is increasing with the rise of NUMA systems, hierarchical caching, OS noise, and a myriad of other factors. As heterogeneity becomes a fact of life, efficiently managing heterogeneous compute resources is becoming a critical, and ever more complex, task. The focus of this dissertation is to lay the foundation for an autonomic system for heterogeneous computing, employing runtime adaptation to improve performance portability and performance consistency while maintaining or increasing programmability. We investigate heterogeneity arising from a myriad of factors, grouped into the dimensions of locality and capability. This work has resulted in runtime schedulers capable of automatically detecting and mitigating heterogeneity in physically homogeneous systems through MPI, in adaptive coscheduling for physically heterogeneous accelerator-based systems, and in a synthesis of the two that addresses multiple levels of heterogeneity as a coherent whole. We also discuss our current work towards the next generation of fine-grained scheduling and synchronization across heterogeneous platforms in the design of a highly scalable and portable concurrent queue for many-core systems. Each component addresses aspects of the urgent need for automated management of the extreme and ever-expanding complexity introduced by heterogeneity.
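The concurrent queue mentioned at the end of the abstract invites a concrete illustration. Below is a minimal sketch, assuming nothing about the dissertation's actual design: a thread-safe multi-producer/multi-consumer task queue built on a single mutex and condition variable. A truly scalable many-core queue would replace the global lock with finer-grained or lock-free synchronization.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdio>
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <vector>

// Minimal MPMC task queue: one lock guards the queue; close() lets
// consumers drain the remaining work and then exit cleanly.
class TaskQueue {
public:
    void push(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(task));
        }
        cv_.notify_one();
    }
    // Blocks until a task is available or the queue is closed and empty.
    std::optional<std::function<void()>> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return std::nullopt;  // closed and fully drained
        auto task = std::move(q_.front());
        q_.pop();
        return task;
    }
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> q_;
    bool closed_ = false;
};

int main() {
    TaskQueue q;
    std::atomic<int> done{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
        workers.emplace_back([&] { while (auto t = q.pop()) (*t)(); });
    for (int i = 0; i < 100; ++i) q.push([&done] { done.fetch_add(1); });
    q.close();
    for (auto& w : workers) w.join();
    std::printf("tasks executed: %d\n", done.load());
}
```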
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
5

Shum, Kam Hong. "Adaptive parallelism for computing on heterogeneous clusters." Thesis, University of Cambridge, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.627563.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Lee, Jaekyu. "Shared resource management for efficient heterogeneous computing." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/50217.

Full text
Abstract:
The demand for heterogeneous computing, because of its performance and energy efficiency, has made the on-chip heterogeneous chip multiprocessor (HCMP) the mainstream computing platform, as the recent trend shows across a wide spectrum of platforms from smartphone application processors to desktop and low-end server processors. The performance of on-chip GPUs is not yet comparable to that of discrete GPU cards, but vendors have been integrating more powerful GPUs and this trend will continue in upcoming processors. In this architecture, several system resources are shared between CPUs and GPUs. The sharing of system resources enables easier and cheaper data transfer between CPUs and GPUs, but it also causes resource contention problems between cores. The resource sharing problem has existed since the homogeneous (CPU-only) chip multiprocessor (CMP) was introduced. However, resource sharing in HCMPs shows different aspects because of the different nature of CPU and GPU cores. In order to solve the resource sharing problem in HCMPs, we consider efficient shared resource management schemes, in particular tackling the problem in the shared last-level cache and the on-chip interconnection network. In the thesis, we propose four resource sharing mechanisms. First, we propose an efficient cache sharing mechanism that exploits the different characteristics of CPU and GPU cores to effectively share cache space between them. Second, adaptive virtual channel partitioning for the on-chip interconnection network is proposed to isolate inter-application interference. By partitioning virtual channels between CPUs and GPUs, we can prevent the interference problem while guaranteeing quality-of-service (QoS) for both types of cores. Third, we propose a dynamic frequency control mechanism to efficiently share system resources. When both types of cores are active, the degree of resource contention as well as the system throughput is affected by the operating frequencies of the CPUs and GPUs. The proposed mechanism tries to find optimal operating frequencies for both, reducing resource contention while improving system throughput. Finally, we propose a second cache sharing mechanism that exploits GPU-semantic information. The programming and execution models of GPUs are stricter and simpler than those of CPUs, and programmers are asked to provide more information to the hardware. By exploiting these characteristics, GPUs can exercise the cache more energy-efficiently, and simpler but more effective cache partitioning can be enabled for HCMPs.
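As a rough illustration of utility-driven cache sharing (a generic scheme, not the mechanism proposed in the thesis), the sketch below greedily assigns last-level-cache ways to two sharers according to the marginal hit gain each extra way yields; the hit curves in the example are invented.

```cpp
#include <cstdio>
#include <utility>
#include <vector>

// Greedy utility-based partitioning of N cache ways between two sharers.
// hitsA[w] / hitsB[w] = expected hits if the sharer receives w ways
// (monotone non-decreasing). Each way goes to whichever sharer gains more.
std::pair<int, int> partitionWays(const std::vector<double>& hitsA,
                                  const std::vector<double>& hitsB,
                                  int totalWays) {
    int a = 0, b = 0;
    for (int w = 0; w < totalWays; ++w) {
        double gainA = hitsA[a + 1] - hitsA[a];
        double gainB = hitsB[b + 1] - hitsB[b];
        if (gainA >= gainB) ++a; else ++b;
    }
    return {a, b};
}

int main() {
    // Invented hit curves for an 8-way cache: the CPU saturates early,
    // while the GPU streams and barely benefits from extra ways.
    std::vector<double> cpu = {0, 50, 80, 95, 99, 100, 100, 100, 100};
    std::vector<double> gpu = {0, 10, 15, 18, 20, 21, 22, 22, 22};
    auto [c, g] = partitionWays(cpu, gpu, 8);
    std::printf("CPU ways: %d, GPU ways: %d\n", c, g);
}
```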
APA, Harvard, Vancouver, ISO, and other styles
7

Herdman, Andy. "The readying of applications for heterogeneous computing." Thesis, University of Warwick, 2017. http://wrap.warwick.ac.uk/102343/.

Full text
Abstract:
High performance computing is approaching a potentially significant change in architectural design. With pressures on the cost and sheer amount of power, additional architectural features are emerging which require a re-think of the programming models deployed over the last two decades. Today's emerging high performance computing (HPC) systems are maximising performance per unit of power consumed, resulting in systems whose constituent parts are made up of a range of different specialised building blocks, each with their own purpose. This heterogeneity is not limited to the hardware components but extends to the mechanisms that exploit them. These multiple levels of parallelism, instruction sets and memory hierarchies result in truly heterogeneous computing in all aspects of the global system. These emerging architectural solutions will require the software to exploit tremendous amounts of on-node parallelism, and indeed programming models to address this are emerging. In theory, the application developer can design new software using these models to exploit emerging low power architectures. However, in practice, real industrial scale applications last the lifetimes of many architectural generations and therefore require a migration path to these next generation supercomputing platforms. Identifying that migration path is non-trivial: with applications spanning many decades, consisting of many millions of lines of code and multiple scientific algorithms, any changes to the programming model will be extensive and invasive, and may turn out to be the incorrect model for the application in question. This makes exploration of these emerging architectures and programming models using the applications themselves problematic. Additionally, the source code of many industrial applications is not available, due either to commercial or to security sensitivity constraints. This thesis highlights this problem by assessing current and emerging hardware with an industrial strength code, demonstrating the issues described. In turn it looks at the methodology of using proxy applications in place of real industry applications to assess their suitability on the next generation of low power HPC offerings. It shows there are significant benefits to be realised in using proxy applications, in that fundamental issues inhibiting exploration of a particular architecture are easier to identify and hence address. The maturity and performance portability of a number of alternative programming methodologies are evaluated on a number of architectures, and the broader adoption of these proxy applications, both within the author's own organisation and across the industry as a whole, is highlighted.
APA, Harvard, Vancouver, ISO, and other styles
8

Sarjanoja, S. (Sampsa). "BM3D image denoising using heterogeneous computing platforms." Master's thesis, University of Oulu, 2015. http://urn.fi/URN:NBN:fi:oulu-201504141380.

Full text
Abstract:
Noise reduction is one of the most fundamental digital image processing problems, and is often designed to be solved at an early stage of the image processing path. Noise appears in images in many different ways, and it is inevitable. In general, various image processing algorithms perform better if their input is as error-free as possible. In order to keep processing delays small on different computing platforms, it is important that noise reduction is performed swiftly. Recent progress in the entertainment industry has led to major improvements in the computing capabilities of graphics cards. Today, graphics circuits consist of several hundred or even thousands of computing units. Using these computing units for general-purpose computation is possible with the OpenCL and CUDA programming interfaces. In applications where the processed data is relatively independent, using parallel computing units may increase performance significantly. Graphics chips enabled with general-purpose computation capabilities are becoming more common in mobile devices as well, and photography, especially with mobile devices, has never been as popular as it is today. This thesis implements the computation of a state-of-the-art noise reduction technique, block-matching and three-dimensional filtering (BM3D), for execution in heterogeneous computing environments. The study evaluates the performance of the presented implementations by comparing them with existing implementations. The presented implementations achieve significant benefits from the use of parallel computing devices. At the same time, the comparisons illustrate general problems in utilizing massively parallel processing for the computation of complex imaging algorithms.
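To make the block-matching step concrete, here is a small sketch, independent of the thesis code, that finds the K patches most similar to a reference patch within a search window using the sum of squared differences (SSD); in BM3D the matched blocks are then stacked and filtered jointly in a 3-D transform domain. All sizes and the test image are invented.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

struct Match { int x, y; double ssd; };

// Sum of squared differences between two B x B patches of a W-wide image.
double patchSSD(const std::vector<std::uint8_t>& img, int W,
                int rx, int ry, int cx, int cy, int B) {
    double s = 0.0;
    for (int dy = 0; dy < B; ++dy)
        for (int dx = 0; dx < B; ++dx) {
            double d = double(img[(ry + dy) * W + (rx + dx)]) -
                       double(img[(cy + dy) * W + (cx + dx)]);
            s += d * d;
        }
    return s;
}

// Return the K best-matching patch positions within +/-R of (rx, ry).
std::vector<Match> blockMatch(const std::vector<std::uint8_t>& img, int W,
                              int H, int rx, int ry, int B, int R, int K) {
    std::vector<Match> cands;
    for (int cy = std::max(0, ry - R); cy <= std::min(H - B, ry + R); ++cy)
        for (int cx = std::max(0, rx - R); cx <= std::min(W - B, rx + R); ++cx)
            cands.push_back({cx, cy, patchSSD(img, W, rx, ry, cx, cy, B)});
    std::sort(cands.begin(), cands.end(),
              [](const Match& a, const Match& b) { return a.ssd < b.ssd; });
    if ((int)cands.size() > K) cands.resize(K);
    return cands;
}

int main() {
    const int W = 16, H = 16;
    std::vector<std::uint8_t> img(W * H);
    for (int i = 0; i < W * H; ++i) img[i] = std::uint8_t(i % 7 * 30);
    for (const Match& m : blockMatch(img, W, H, 4, 4, 4, 3, 3))
        std::printf("(%d,%d) ssd=%.0f\n", m.x, m.y, m.ssd);
}
```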
APA, Harvard, Vancouver, ISO, and other styles
9

Elteir, Marwa Khamis. "A MapReduce Framework for Heterogeneous Computing Architectures." Diss., Virginia Tech, 2012. http://hdl.handle.net/10919/28786.

Full text
Abstract:
Nowadays, an increasing number of computational systems are equipped with heterogeneous compute resources, i.e., resources following different architectures. This applies at the level of a single chip, a single node, and even supercomputers and large-scale clusters. With its impressive price-to-performance ratio as well as power efficiency compared to traditional multicore processors, the graphics processing unit (GPU) has become an integral part of these systems. GPUs deliver high peak performance; however, efficiently exploiting their computational power requires the exploration of a multi-dimensional space of optimization methodologies, which is challenging even for the well-trained expert. The complexity of this multi-dimensional space arises not only from the traditionally well known but arduous task of architecture-aware GPU optimization at design and compile time, but also from the partitioning and scheduling of the computation across these heterogeneous resources. Even with programming models like the Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL), the developer still needs to manage the data transfer between host and device and vice versa, orchestrate the execution of several kernels, and, more arduously, optimize the kernel code. In this dissertation, we aim to deliver a transparent parallel programming environment for heterogeneous resources by leveraging the power of the MapReduce programming model and the OpenCL programming language. We propose a portable architecture-aware framework that efficiently runs an application across heterogeneous resources, specifically AMD GPUs and NVIDIA GPUs, while hiding complex architectural details from the developer. To further enhance performance portability, we explore approaches for asynchronously and efficiently distributing the computations across heterogeneous resources. When applied to benchmarks and representative applications, our proposed framework significantly enhances performance, including up to 58% improvement over traditional approaches to task assignment and up to a 45-fold improvement over state-of-the-art MapReduce implementations.
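The MapReduce model itself is compact enough to sketch. The toy word-count skeleton below is plain sequential CPU code, not the dissertation's OpenCL framework, and shows the three phases (map, group-by-key, reduce) that such a framework would instead partition across AMD and NVIDIA GPUs.

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

template <typename K, typename V>
using Pairs = std::vector<std::pair<K, V>>;

// Minimal sequential MapReduce: map each input, group by key, reduce groups.
template <typename In, typename K, typename V>
std::map<K, V> mapReduce(const std::vector<In>& inputs,
                         std::function<Pairs<K, V>(const In&)> mapFn,
                         std::function<V(const std::vector<V>&)> reduceFn) {
    std::map<K, std::vector<V>> groups;          // "shuffle": group by key
    for (const In& in : inputs)
        for (auto& [k, v] : mapFn(in)) groups[k].push_back(v);
    std::map<K, V> out;
    for (auto& [k, vs] : groups) out[k] = reduceFn(vs);
    return out;
}

int main() {
    std::vector<std::string> docs = {"a b a", "b c"};
    auto mapFn = [](const std::string& doc) {
        Pairs<std::string, int> out;
        std::string w;
        for (char c : doc + " ") {               // emit (word, 1) pairs
            if (c == ' ') { if (!w.empty()) out.push_back({w, 1}); w.clear(); }
            else w += c;
        }
        return out;
    };
    auto redFn = [](const std::vector<int>& vs) {
        int s = 0; for (int v : vs) s += v; return s;
    };
    for (auto& [w, n] : mapReduce<std::string, std::string, int>(docs, mapFn, redFn))
        std::cout << w << ": " << n << "\n";
}
```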
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
10

Lee, Young Choon. "Problem-centric scheduling for heterogeneous computing systems." Thesis, The University of Sydney, 2007. http://hdl.handle.net/2123/9321.

Full text
Abstract:
This project addresses key scheduling problems in heterogeneous computing environments. Heterogeneous computing systems (HCSs) have received increased attention since the 1990s, particularly over the past 10 years with the popularity of grid computing systems. These computing environments consist of a variety of resources interconnected by a high-speed network. Many parallel and distributed applications can take advantage of this computing platform; however, resource heterogeneity and dynamism impose scheduling restrictions. It is extremely difficult for a single scheduling scheme to efficiently and effectively handle the application scenarios that are required in grid computing environments. What further complicates the issue is that computing environments are controlled by different administrative authorities. Thus, application diversity, and resource heterogeneity and dynamism, point to the need to develop a set of scheduling algorithms to manage these scenarios. The thesis describes a number of key application and system models, and extensively discusses the characteristics of traditional multiprocessor scheduling and grid scheduling. The application models can be broadly classified as independent and precedence-constrained. The coupling of resources in our HCS model can be tight or loose; while static scheduling is applied to tightly coupled platforms, dynamic scheduling is adopted on loosely coupled platforms. The thesis presents the scheduling schemes that we have developed to address various challenging scheduling issues, and sets out and interprets the experimental results from our performance evaluation study. The data indicate that our novel scheduling algorithms—which appropriately incorporate application and system characteristics into their scheduling—significantly outperform previous approaches.
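For the independent-task application model mentioned above, a classic baseline heuristic is min-min scheduling, sketched below under the usual assumptions (a known expected-time-to-compute matrix and no communication costs). It illustrates the genre of algorithm discussed; it is not claimed to be one of the thesis's own schemes, and the timing matrix is invented.

```cpp
#include <cstdio>
#include <limits>
#include <vector>

// Min-min: repeatedly pick the unscheduled task whose earliest completion
// time (machine ready time + execution time) is smallest, and assign it.
// etc[t][m] = expected execution time of task t on machine m.
std::vector<int> minMin(const std::vector<std::vector<double>>& etc) {
    int T = etc.size(), M = etc[0].size();
    std::vector<double> ready(M, 0.0);
    std::vector<int> assign(T, -1);
    std::vector<bool> done(T, false);
    for (int round = 0; round < T; ++round) {
        int bestT = -1, bestM = -1;
        double bestCT = std::numeric_limits<double>::infinity();
        for (int t = 0; t < T; ++t) {
            if (done[t]) continue;
            for (int m = 0; m < M; ++m) {
                double ct = ready[m] + etc[t][m];
                if (ct < bestCT) { bestCT = ct; bestT = t; bestM = m; }
            }
        }
        done[bestT] = true;
        assign[bestT] = bestM;
        ready[bestM] = bestCT;
    }
    return assign;
}

int main() {
    // Three tasks on a fast machine and a slower one (invented times).
    std::vector<std::vector<double>> etc = {{2, 6}, {3, 4}, {9, 3}};
    auto a = minMin(etc);
    for (size_t t = 0; t < a.size(); ++t)
        std::printf("task %zu -> machine %d\n", t, a[t]);
}
```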
APA, Harvard, Vancouver, ISO, and other styles
11

Sai, Ranga Prashanth C. "Algorithms for task scheduling in heterogeneous computing environments." Auburn, Ala., 2006. http://repo.lib.auburn.edu/2006%20Fall/SAI_RANGA_58.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Chen, Keping. "Self-Organised computing in a dynamic heterogeneous environment." Thesis, University of Manchester, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.492715.

Full text
Abstract:
In a distributed, dynamic, and heterogeneous environment, achieving acceptable performance is challenging. This thesis describes research inspired by work in grid performance control and the vision of autonomic computing. It can be summarised in two parts: autonomous performance control and self-organised scheduling.
APA, Harvard, Vancouver, ISO, and other styles
13

Rodrigues, Gabriel Siqueira. "Autonomic goal-driven deployment in heterogeneous computing environments." Repositório Institucional da UnB, 2016. http://repositorio.unb.br/handle/10482/23185.

Full text
Abstract:
Dissertation (master's)—Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2016.
We see a growing interest in applications that must rely on heterogeneous computing environments, such as the Internet of Things (IoT). Such applications are intended to execute on a broad range of devices with different available computing resources. To handle some kinds of heterogeneity, such as two possible types of graphics processors in a desktop computer, we can use simple approaches such as a deployment-time script that chooses the right software library to be copied to a folder. These simple approaches are centralized and created at design time, and they require one specialist or team to control the entire space of variability. However, such approaches do not scale to highly heterogeneous environments. In a highly dynamic and heterogeneous environment it is hard to predict the computing environment at design time, making it likely impossible to decide on the correct configuration for each environment in advance. In our work, we propose GoalD: a method that allows autonomous deployment of systems by reflecting on the goals of the system and its computing environment. By autonomous deployment, we mean that the system can find the correct set of components for the target computing environment without human intervention. We evaluate our approach on the filling station advisor case study, where an application advises a driver where to refuel/recharge a vehicle. We design the application with variability at the requirements, architecture, and deployment levels, which allows the designed application to be executed on different devices. For scenarios with different environments, it was possible to plan the deployment autonomously. Additionally, the scalability of the algorithm that plans the deployment was evaluated in a simulated environment. Results show that, using the approach, it is possible to autonomously plan the deployment of a system with thousands of components in a few seconds.
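As a toy illustration of deployment planning driven by goals and environment capabilities (far simpler than GoalD itself), the sketch below selects, for each required capability, a component whose resource demands fit the target device. All component names, capabilities, and resource figures are invented.

```cpp
#include <iostream>
#include <map>
#include <optional>
#include <string>
#include <vector>

struct Component {
    std::string name;
    std::string provides;                  // capability it satisfies
    std::map<std::string, int> requires_;  // resource -> amount needed
};

// Pick, for each goal capability, a component whose requirements fit the
// environment; returns nothing if any goal cannot be satisfied.
std::optional<std::vector<std::string>>
plan(const std::vector<std::string>& goals,
     const std::vector<Component>& repo,
     const std::map<std::string, int>& env) {
    std::vector<std::string> chosen;
    for (const auto& goal : goals) {
        bool found = false;
        for (const auto& c : repo) {
            if (c.provides != goal) continue;
            bool fits = true;
            for (const auto& [res, need] : c.requires_) {
                auto it = env.find(res);
                if (it == env.end() || it->second < need) { fits = false; break; }
            }
            if (fits) { chosen.push_back(c.name); found = true; break; }
        }
        if (!found) return std::nullopt;   // no component satisfies the goal
    }
    return chosen;
}

int main() {
    std::vector<Component> repo = {
        {"gps-precise", "location", {{"gps", 1}, {"ram_mb", 64}}},
        {"cell-coarse", "location", {{"ram_mb", 8}}}};
    std::map<std::string, int> smallDevice = {{"ram_mb", 16}};
    auto p = plan({"location"}, repo, smallDevice);
    if (p) for (auto& n : *p) std::cout << "deploy: " << n << "\n";
}
```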
APA, Harvard, Vancouver, ISO, and other styles
14

Aji, Ashwin M. "Programming High-Performance Clusters with Heterogeneous Computing Devices." Diss., Virginia Tech, 2015. http://hdl.handle.net/10919/52366.

Full text
Abstract:
Today's high-performance computing (HPC) clusters are seeing an increase in the adoption of accelerators like GPUs, FPGAs and co-processors, leading to heterogeneity in the computation and memory subsystems. To program such systems, application developers typically employ a hybrid programming model of MPI across the compute nodes in the cluster and an accelerator-specific library (e.g., CUDA, OpenCL, OpenMP, OpenACC) across the accelerator devices within each compute node. Such explicit management of disjoint computation and memory resources leads to reduced productivity and performance. This dissertation focuses on designing, implementing and evaluating a runtime system for HPC clusters with heterogeneous computing devices. This work also explores extending existing programming models to make use of our runtime system for easier code modernization of existing applications. Specifically, we present MPI-ACC, an extension to the popular MPI programming model and runtime system for efficient data movement and automatic task mapping across the CPUs and accelerators within a cluster, and discuss the lessons learned. MPI-ACC's task-mapping runtime subsystem performs fast and automatic device selection for a given task. MPI-ACC's data-movement subsystem includes careful optimizations for end-to-end communication among CPUs and accelerators, which are seamlessly leveraged by the application developers. MPI-ACC provides a familiar, flexible and natural interface for programmers to choose the right computation or communication targets, while its runtime system achieves efficient cluster utilization.
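One of the data-movement optimizations alluded to above is pipelining. The generic double-buffering pattern below is a simplified stand-in (plain threads, with fake transfer and kernel functions) for the kind of overlap MPI-ACC automates; it is not MPI-ACC's API.

```cpp
#include <cstdio>
#include <future>
#include <numeric>
#include <vector>

// Stand-ins for a slow host-to-device copy and a device kernel.
void fakeTransfer(const double* src, double* dst, size_t n) {
    for (size_t i = 0; i < n; ++i) dst[i] = src[i];
}
double fakeKernel(const double* buf, size_t n) {
    return std::accumulate(buf, buf + n, 0.0);
}

// Double buffering: while chunk i is processed, chunk i+1 is transferred.
double pipelined(const std::vector<double>& data, size_t chunk) {
    std::vector<double> buf[2] = {std::vector<double>(chunk),
                                  std::vector<double>(chunk)};
    size_t nChunks = data.size() / chunk;    // assume an exact multiple
    double total = 0.0;
    fakeTransfer(data.data(), buf[0].data(), chunk);
    for (size_t i = 0; i < nChunks; ++i) {
        std::future<void> next;
        if (i + 1 < nChunks)                 // start next copy asynchronously
            next = std::async(std::launch::async, fakeTransfer,
                              data.data() + (i + 1) * chunk,
                              buf[(i + 1) % 2].data(), chunk);
        total += fakeKernel(buf[i % 2].data(), chunk);  // overlapped compute
        if (next.valid()) next.get();
    }
    return total;
}

int main() {
    std::vector<double> data(1 << 16, 1.0);
    std::printf("sum = %.0f\n", pipelined(data, 1 << 12));
}
```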
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
15

Bijanapalli, Chakri Ramakrishna. "Enabling the use of Heterogeneous Computing for Bioinformatics." Thesis, Virginia Tech, 2013. http://hdl.handle.net/10919/23866.

Full text
Abstract:
The huge amount of information in the encoded sequence of DNA and increasing interest in uncovering new discoveries have spurred interest in accelerating the DNA sequencing and alignment processes. The use of heterogeneous systems, which combine different types of computational units, has seen new light in high performance computing in recent years; however, the expertise in multiple domains and the skills required to program these systems hinder bioinformaticians from rapidly deploying their applications onto such systems. This work attempts to make a heterogeneous system, the Convey HC-1, with an x86-based host processor and an FPGA-based co-processor, accessible to bioinformaticians. First, a highly efficient dynamic programming based Smith-Waterman kernel is implemented in hardware, which is able to achieve a peak throughput of 307.2 Giga Cell Updates per Second (GCUPS) on the Convey HC-1. A dynamic programming accelerator interface is provided to any application that uses Smith-Waterman. This implementation is also extended to General Purpose Graphics Processing Units (GP-GPUs), achieving a peak throughput of 9.89 GCUPS on an NVIDIA GTX580 GPU. Second, a well known graphical programming tool, LabVIEW, is enabled as a programming tool for the Convey HC-1. A connection is established between the graphical interface and the Convey HC-1 to control and monitor the application running on the FPGA-based co-processor.
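For reference, the recurrence that the Smith-Waterman hardware kernel evaluates can be written in a few lines of sequential code. The sketch below uses linear gap penalties and invented scoring parameters; the Convey HC-1 and GPU implementations compute many such cells in parallel.

```cpp
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

// Smith-Waterman local alignment score with a linear gap penalty.
// H[i][j] = max(0, diag + s(a_i, b_j), up - gap, left - gap).
int smithWaterman(const std::string& a, const std::string& b,
                  int match = 2, int mismatch = -1, int gap = 2) {
    int n = a.size(), m = b.size(), best = 0;
    std::vector<std::vector<int>> H(n + 1, std::vector<int>(m + 1, 0));
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= m; ++j) {
            int s = (a[i - 1] == b[j - 1]) ? match : mismatch;
            H[i][j] = std::max({0, H[i - 1][j - 1] + s,
                                H[i - 1][j] - gap, H[i][j - 1] - gap});
            best = std::max(best, H[i][j]);   // best local alignment cell
        }
    return best;
}

int main() {
    std::printf("score = %d\n", smithWaterman("ACACACTA", "AGCACACA"));
}
```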
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
16

Winkleblack, Scott Kenneth. "ReGen: Optimizing Genetic Selection Algorithms for Heterogeneous Computing." DigitalCommons@CalPoly, 2014. https://digitalcommons.calpoly.edu/theses/1236.

Full text
Abstract:
GenSel is a genetic selection analysis tool used to determine which genetic markers are informational for a given trait. Performing genetic selection related analyses is a time consuming and computationally expensive task. Due to an expected increase in the number of genotyped individuals, analysis times will increase dramatically. Therefore, optimization efforts must be made to keep analysis times reasonable. This thesis focuses on optimizing one of GenSel's underlying algorithms for heterogeneous computing. The resulting algorithm exposes task-level and data-level parallelism present but inaccessible in the original algorithm. The heterogeneous computing solution, ReGen, outperforms the optimized CPU implementation, achieving a 1.84-times speedup.
APA, Harvard, Vancouver, ISO, and other styles
17

Fanfarillo, Alessandro. "Parallel programming techniques for heterogeneous exascale computing platforms." Doctoral thesis, Università degli Studi di Roma "Tor Vergata", 2014. http://hdl.handle.net/2108/202339.

Full text
Abstract:
Nowadays, the most powerful supercomputers in the world, needed for solving complex models and simulations of critical scientific problems, are able to perform tens of quadrillion (10^15) floating point operations per second (tens of PetaFLOPS). Although such a large amount of computational power may seem sufficient, scientists and engineers always need to solve more accurate models, run broader simulations and analyze huge amounts of data in less time. In particular, experiments that are currently impossible, dangerous, or too expensive to be realized can be accurately simulated by solving complex predictive models on an exascale machine (10^18 FLOPS). A few examples of studies where exascale computing can make a difference are: reduction of the carbon footprint of the transportation sector, innovative designs for cost-effective renewable energy resources, efficiency and safety of nuclear energy, reverse engineering of the human brain, and the design, control and manufacture of advanced materials. The importance of having an exascale supercomputer was officially acknowledged on July 29th, 2015 by President Obama, who signed an executive order creating a National Strategic Computing Initiative calling for the accelerated development of an exascale system. Unfortunately, building an exascale system with the technology we currently use on petascale machines would represent an unaffordable project. Although the cost of the processing units is so low as to be considered almost free, the energy required for moving data (from memories to processors and across the network) and to power on the entire system (including the cooling system) represents the real limit for reaching the exascale era. Therefore, deep changes in hardware architectures, programming models and parallel algorithms are needed in order to reduce energy requirements and increase compute power. In this dissertation, we face the challenges related to data transfers on exascale architectures, proposing solutions in the fields of heterogeneous architectures (CPUs + accelerators), parallel programming models and parallel algorithms. In particular, we first explore the potential benefits brought by a hybrid CPU+GPU approach for sparse matrix computations; then we implement and analyze the performance of coarray Fortran as a parallel programming system for exascale computing. Finally, we merge the world of accelerators and coarray Fortran in order to create a data-aware parallel programming model suitable for exascale computing. The implementation of OpenCoarrays, the open-source communication library used by GNU Fortran for supporting coarrays, and its usage on heterogeneous devices are the most relevant contributions presented in this dissertation.
APA, Harvard, Vancouver, ISO, and other styles
18

Chiesi, Matteo <1984&gt. "Heterogeneous Multi-core Architectures for High Performance Computing." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2014. http://amsdottorato.unibo.it/6469/1/strutt.pdf.

Full text
Abstract:
This thesis deals with heterogeneous architectures in standard workstations. Heterogeneous architectures represent an appealing alternative to traditional supercomputers because they are based on commodity components fabricated in large quantities. Hence their price-performance ratio is unparalleled in the world of high performance computing (HPC). In particular, different aspects related to the performance and power consumption of heterogeneous architectures have been explored. The thesis initially focuses on an efficient implementation of a parallel application where the execution time is dominated by a high number of floating point instructions. Then the thesis touches on the central problem of efficient management of power peaks in heterogeneous computing systems. Finally it discusses a memory-bound problem, where the execution time is dominated by memory latency. Specifically, the following main contributions have been carried out. First, a novel framework for the design and analysis of solar fields for Central Receiver Systems (CRS) has been developed; the implementation, based on a desktop workstation equipped with multiple Graphics Processing Units (GPUs), is motivated by the need for an accurate and fast simulation environment for studying mirror imperfections and non-planar geometries. Secondly, a power-aware scheduling algorithm on heterogeneous CPU-GPU architectures, based on an efficient distribution of the computing workload to the resources, has been realized. The scheduler manages the resources of several computing nodes with a view to reducing the peak power. The two main contributions of this work follow: the approach reduces the supply cost due to high peak power whilst having negligible impact on the parallelism of computational nodes; from another point of view, the developed model allows designers to increase the number of cores without increasing the capacity of the power supply unit. Finally, an implementation for efficient graph exploration on reconfigurable architectures is presented. The purpose is to accelerate graph exploration, reducing the number of random memory accesses.
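The peak-power constraint can be made concrete with a small sketch: given jobs with known power draw, admit work onto nodes only while per-node and cluster-wide draw stay under their caps. The greedy admission below illustrates the constraint only; it is not the thesis's scheduler, and all wattages are invented.

```cpp
#include <cstdio>
#include <vector>

struct Job { const char* name; double watts; };

// Admit jobs in order onto whichever node keeps total draw under the caps;
// jobs that would exceed a cap are deferred.
void schedule(const std::vector<Job>& jobs, int nodes,
              double nodeCapW, double clusterCapW) {
    std::vector<double> draw(nodes, 0.0);
    double total = 0.0;
    for (const Job& j : jobs) {
        int target = -1;
        for (int n = 0; n < nodes; ++n)
            if (draw[n] + j.watts <= nodeCapW &&
                total + j.watts <= clusterCapW) { target = n; break; }
        if (target >= 0) {
            draw[target] += j.watts; total += j.watts;
            std::printf("%s -> node %d (cluster %.0f W)\n", j.name, target, total);
        } else {
            std::printf("%s deferred (would exceed power cap)\n", j.name);
        }
    }
}

int main() {
    schedule({{"cfd", 250}, {"render", 300}, {"bio", 220}, {"ml", 280}},
             /*nodes=*/2, /*nodeCapW=*/400, /*clusterCapW=*/700);
}
```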
APA, Harvard, Vancouver, ISO, and other styles
19

Vasanta, Harikrishna. "Secure, privacy assured mechanisms for heterogeneous contextual environments." Thesis, Queensland University of Technology, 2006. https://eprints.qut.edu.au/16177/1/Harikrishna_Vasanta_Thesis.pdf.

Full text
Abstract:
Location information is used to provide a diverse range of services to users, such as emergency, navigation, billing, security, information and advertising services. This information is derived from a broad range of indoor and outdoor technologies. The location information thus derived is of different granularities, uses different coordinate systems, and is controlled by numerous service providers. In addition, a broad selection of devices is used for providing these services. The diverse range of applications requiring location information at different levels of granularity, the need to export location information across multiple devices, and the existence of different location determination technologies together necessitate a heterogeneous location network. These networks derive location information from multiple sources and provide various location-based services to users irrespective of the medium, device or technology used. Security, user privacy and management of location information are some of the important issues that need to be addressed. The main contribution of this thesis is the design of a secure and privacy assured heterogeneous location architecture. A formal methodology was chosen to design the heterogeneous location architecture. The design of the architecture resulted in a novel key distribution protocol and a model for information flow that can be easily encapsulated into applications or architectures having similar requirements. The research also resulted in the enhancement of a proposed location framework for securing critical infrastructures using context-aware self-defending objects. The proposed enhanced framework helps to negate the security vulnerabilities introduced through the use of general-purpose computer systems in critical infrastructures.
APA, Harvard, Vancouver, ISO, and other styles
20

Grewe, Dominik. "Mapping parallel programs to heterogeneous multi-core systems." Thesis, University of Edinburgh, 2014. http://hdl.handle.net/1842/8852.

Full text
Abstract:
Heterogeneous computer systems are ubiquitous in all areas of computing, from mobile to high-performance computing. They promise to deliver increased performance at lower energy cost than purely homogeneous, CPU-based systems. In recent years GPU-based heterogeneous systems have become increasingly popular. They combine a programmable GPU with a multi-core CPU. GPUs have become flexible enough to not only handle graphics workloads but also various kinds of general-purpose algorithms. They are thus used as a coprocessor or accelerator alongside the CPU. Developing applications for GPU-based heterogeneous systems involves several challenges. Firstly, not all algorithms are equally suited for GPU computing. It is thus important to carefully map the tasks of an application to the most suitable processor in a system. Secondly, current frameworks for heterogeneous computing, such as OpenCL, are low-level, requiring a thorough understanding of the hardware by the programmer. This high barrier to entry could be lowered by automatically generating and tuning this code from a high-level and thus more user-friendly programming language. Both challenges are addressed in this thesis. For the task mapping problem a machine learning-based approach is presented in this thesis. It combines static features of the program code with runtime information on input sizes to predict the optimal mapping of OpenCL kernels. This approach is further extended to also take contention on the GPU into account. Both methods are able to outperform competing mapping approaches by a significant margin. Furthermore, this thesis develops a method for targeting GPU-based heterogeneous systems from OpenMP, a directive-based framework for parallel computing. OpenMP programs are translated to OpenCL and optimized for GPU performance. At runtime a predictive model decides whether to execute the original OpenMP code on the CPU or the generated OpenCL code on the GPU. This approach is shown to outperform both a competing approach as well as hand-tuned code.
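A flavor of predictive mapping can be conveyed with a deliberately tiny stand-in for a learned model: a two-feature decision rule that picks CPU or GPU from a kernel's compute-to-memory ratio and input size. The features, thresholds, and sample kernels are invented; the thesis learns such models from training data rather than hard-coding them.

```cpp
#include <cstdio>

enum class Device { CPU, GPU };

// Toy stand-in for a learned mapping model: kernels with enough arithmetic
// per byte and a large enough input amortize the offload cost on the GPU.
Device predictMapping(double computePerByte, long inputBytes) {
    if (inputBytes < (1L << 20)) return Device::CPU;  // too small to offload
    if (computePerByte > 4.0)    return Device::GPU;  // compute-bound
    return Device::CPU;                               // memory-bound, stay local
}

int main() {
    struct { const char* kernel; double cpb; long bytes; } samples[] = {
        {"matmul",    16.0, 1L << 26},
        {"vecadd",     0.5, 1L << 26},
        {"filter3x3",  6.0, 1L << 14},
    };
    for (auto& s : samples)
        std::printf("%s -> %s\n", s.kernel,
                    predictMapping(s.cpb, s.bytes) == Device::GPU ? "GPU" : "CPU");
}
```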
APA, Harvard, Vancouver, ISO, and other styles
21

Gelado, Fernández Isaac. "On the programmability of heterogeneous massively-parallel computing systems." Doctoral thesis, Universitat Politècnica de Catalunya, 2010. http://hdl.handle.net/10803/6031.

Full text
Abstract:
Heterogeneous parallel computing combines general purpose processors with accelerators to efficiently execute both sequential control-intensive and data-parallel phases of applications. Existing programming models for heterogeneous parallel computing impose added coding complexity when compared to traditional sequential shared-memory programming models for homogeneous systems. This extra code complexity is acceptable in supercomputing environments, where programmability is sacrificed in pursuit of high performance. However, heterogeneous parallel systems are massively reaching the desktop market (e.g., 425.4 million GPU cards were sold in 2009), where the trade-off between performance and programmability is the opposite. The code complexity required when using accelerators and the lack of compatibility prevent programmers from exploiting the full computing capabilities of heterogeneous parallel systems in general purpose applications.
This dissertation aims to increase the programmability of CPU-accelerator systems without introducing major performance penalties. The key insight is that general purpose application programmers tend to favor programmability at the cost of system performance. This fact is illustrated by the tendency to use high-level programming languages, such as C++, to ease the task of programming at the cost of minor performance penalties. Moreover, many general purpose applications are currently being developed using interpreted languages, such as Java, C# or Python, which raise the abstraction level even further while introducing relatively large performance overheads. This dissertation also takes the approach of raising the level of abstraction for accelerators to improve programmability, and investigates hardware and software mechanisms to efficiently implement these high-level abstractions without introducing major performance overheads.
Heterogeneous parallel systems typically implement separate memories for CPUs and accelerators, although commodity systems might use a shared memory at the cost of lower performance. However, in these commodity shared memory systems, coherence between accelerator and CPUs is not guaranteed. This system architecture implies that CPUs can only access system memory, and accelerators can only access their own local memory. This dissertation assumes separate system and accelerator memory and shows that low-level abstractions for these disjoint address spaces are the source of poor programmability of heterogeneous parallel systems.
A first consequence of having separate system and accelerator memories is the current data transfer models for heterogeneous parallel systems. In this dissertation two data transfer paradigms are identified: per-call and double-buffered. In these two models, data structures used by accelerators are allocated in both system and accelerator memories. The models differ in how data is managed between accelerator and system memories. The per-call model transfers the input data needed by accelerators before each accelerator call, and transfers back the output data produced by accelerators when the call returns. The per-call model is quite simple, but might impose unacceptable performance penalties due to data transfer overheads. The double-buffered model aims to overlap data communication with CPU and accelerator computation. This model requires relatively complex code due to parallel execution and the need for synchronization between data communication and processing tasks. The extra code required for data transfers in these two models is necessary due to the lack of by-reference parameter passing to accelerators. This dissertation presents a novel accelerator-hosted data transfer model. In this model, data used by accelerators is hosted in the accelerator memory, so when the CPU accesses this data, it is effectively accessing the accelerator memory. Such a model cleanly supports by-reference parameter passing to accelerator calls, removing the need for explicit data transfers.
The second consequence of separate system and accelerator memories is that current programming models export separate virtual system and accelerator address spaces to application programmers. This dissertation identifies the double-pointer problem as a direct consequence of these separate virtual memory spaces. The double-pointer problem is that data structures used by both accelerators and CPUs are referenced by different virtual memory addresses (pointers) in the CPU and accelerator code. The double-pointer problem requires programmers to add extra code to ensure that both pointers contain consistent values (e.g., when reallocating a data structure). Keeping consistency between system and accelerator pointers might penalize accelerator performance and increase the accelerator memory requirements when pointers are embedded within data structures (e.g., a linked list). For instance, the double-pointer problem requires doubling the number of global memory accesses in a GPU code that reconstructs a linked list. This dissertation argues that a unified virtual address space that includes both system and accelerator memories is an efficient solution to the double-pointer problem. Moreover, such a unified virtual address space cleanly complements the accelerator-hosted data model previously discussed.
This dissertation introduces the Non-Uniform Accelerator Memory Access (NUAMA) architecture as a hardware implementation of the accelerator-hosted data transfer model and the unified virtual address space. In NUAMA, an Accelerator Memory Collector (AMC) is included within the system memory controller to identify memory requests for accelerator-hosted data. The AMC buffers and coalesces such memory requests to efficiently transfer data from the CPU to the accelerator memory. NUAMA also implements a hybrid L2 cache memory. The L2 cache in NUAMA follows a write-through/write-non-allocate policy for accelerator-hosted data. This policy ensures that the contents of the accelerator memory are updated eagerly and, therefore, when the accelerator is called, most of the data has already been transferred. The eager update of the accelerator memory contents effectively overlaps data communication and CPU computation. A write-back/write-allocate policy is used for the data hosted by the system memory, so the performance of applications that do not use accelerators is not affected. In NUAMA, accelerator-hosted data is identified using a TLB-assisted mechanism: the page table entries are extended with a bit, which is set for those memory pages that are hosted by the accelerator memory. NUAMA increases the average bandwidth requirements for the L2 cache memory and the interconnection network between the CPU and accelerators, but the instantaneous bandwidth requirements, which are the limiting factor, are lower than in traditional DMA-based architectures. The NUAMA architecture is compared to traditional DMA systems using cycle-accurate simulations. Experimental results show that NUAMA and traditional DMA-based architectures perform equally well. However, the application source code complexity of NUAMA is much lower than in DMA-based architectures.
A software implementation of the accelerator-hosted model and the unified virtual address space is also explored. This dissertation presents the Asymmetric Distributed Shared Memory (ADSM) model. ADSM maintains a shared logical memory space for CPUs to access data in the accelerator physical memory, but not vice versa. The asymmetry allows light-weight implementations that avoid common pitfalls of symmetrical distributed shared memory systems. ADSM allows programmers to assign data structures to performance-critical methods. When a method is selected for accelerator execution, its associated data objects are allocated within the shared logical memory space, which is hosted in the accelerator physical memory and transparently accessible by the methods executed on CPUs. ADSM reduces programming efforts for heterogeneous parallel computing systems and enhances application portability. The design and implementation of an ADSM run-time, called GMAC, on top of CUDA in a GNU/Linux environment are presented. Experimental results show that applications written in ADSM and running on top of GMAC achieve performance comparable to their counterparts using programmer-managed data transfers. This dissertation presents the GMAC system, evaluates different design choices, and further suggests additional architectural support that will likely allow GMAC to achieve higher application performance than the current CUDA model.
Finally, the execution model of heterogeneous parallel systems is considered. Accelerator execution is abstracted in different ways by existing programming models; this dissertation explores three approaches. OpenCL and the NVIDIA CUDA driver API use file descriptor semantics to abstract accelerators: user processes access accelerators through descriptors. This approach increases the complexity of using accelerators because accelerator descriptors are needed in any call involving the accelerator (e.g., memory allocations or passing a parameter to the accelerator). The IBM Cell SDK abstracts accelerators as separate execution threads. This approach requires adding the necessary code to create new execution threads and synchronization primitives in order to use accelerators. Finally, the NVIDIA CUDA run-time API abstracts accelerators as Remote Procedure Calls (RPC). This approach is fundamentally incompatible with ADSM, because it assumes separate virtual address spaces for accelerator and CPU code. The Heterogeneous Parallel Execution (HPE) model is presented in this dissertation. This model extends the execution thread abstraction to incorporate different execution modes. Execution modes define the capabilities (e.g., accessible virtual address space, code ISA, etc.) of the code being executed. In this execution model, accelerator calls are implemented as execution mode switches, analogously to system calls. Accelerator calls in HPE are synchronous, unlike in CUDA, OpenCL and the IBM Cell SDK. Synchronous accelerator calls provide full compatibility with the existing sequential execution model provided by most operating systems. Moreover, abstracting accelerator calls as execution mode switches allows applications that use accelerators to run on systems without accelerators: the execution mode switch falls back to an emulation layer, which emulates the accelerator execution on the CPU. This dissertation further presents different design and implementation choices for the HPE model in GMAC, together with the hardware support necessary for an efficient implementation of this model. Experimental results show that HPE introduces a low execution-time overhead while offering a clean and simple programming interface to applications.
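The ADSM idea (host code transparently reading and writing data that is hosted in accelerator memory) can be caricatured in ordinary C++. The toy registry below simulates the accelerator copy with a second host buffer and makes the acquire/release synchronization explicit; it is an invented illustration, not GMAC's implementation or API, which performs this bookkeeping transparently.

```cpp
#include <cstdio>
#include <cstring>
#include <map>
#include <memory>
#include <vector>

// Toy asymmetric shared-memory registry: each shared allocation has a host
// view and a simulated accelerator copy. release() publishes host writes to
// the accelerator side; acquire() pulls accelerator results back.
class AdsmRegistry {
    struct Entry { std::vector<char> host, device; };
    std::map<void*, std::unique_ptr<Entry>> table_;
public:
    void* alloc(std::size_t bytes) {
        auto e = std::make_unique<Entry>();
        e->host.resize(bytes);
        e->device.resize(bytes);
        void* p = e->host.data();
        table_[p] = std::move(e);
        return p;                 // host code uses this pointer directly
    }
    void release(void* p) {       // host -> "accelerator"
        Entry& e = *table_.at(p);
        std::memcpy(e.device.data(), e.host.data(), e.host.size());
    }
    void acquire(void* p) {       // "accelerator" -> host
        Entry& e = *table_.at(p);
        std::memcpy(e.host.data(), e.device.data(), e.host.size());
    }
    void* devicePtr(void* p) { return table_.at(p)->device.data(); }
};

int main() {
    AdsmRegistry adsm;
    int* data = static_cast<int*>(adsm.alloc(4 * sizeof(int)));
    for (int i = 0; i < 4; ++i) data[i] = i;      // plain host access
    adsm.release(data);                           // publish to accelerator
    int* dev = static_cast<int*>(adsm.devicePtr(data));
    for (int i = 0; i < 4; ++i) dev[i] *= 10;     // stand-in for a kernel
    adsm.acquire(data);                           // pull results back
    std::printf("%d %d %d %d\n", data[0], data[1], data[2], data[3]);
}
```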
APA, Harvard, Vancouver, ISO, and other styles
22

Banino-Rokkones, Cyril. "Algorithmic and Scheduling Techniques for Heterogeneous and Distributed Computing." Doctoral thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2007. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-1462.

Full text
Abstract:

The computing and communication resources of high performance computing systems are becoming heterogeneous, exhibit performance fluctuations, and fail in unforeseeable ways. The Master-Slave (MS) paradigm, which decomposes the computational load into independent tasks, is well-suited to operating in these environments due to its loose synchronization requirements. The application tasks can be computed in any order, by any slave, and can be resubmitted in case of slave failures. Although the MS paradigm naturally adapts to dynamic and unreliable environments, it nevertheless suffers from a lack of scalability.

This thesis provides models, techniques and scheduling strategies that improve the scalability and performance of MS applications. In particular, we claim that deploying multiple masters may be necessary to achieve scalable performance. We address the problem of finding the most profitable locations on a heterogeneous Grid for hosting a given number of master processes, such that the total task throughput of the system is maximized. Further, we provide distributed scheduling strategies that adapt better to system load fluctuations than traditional MS techniques. Our strategies are especially efficient when communication is expensive compared to computation (which constitutes the difficult case).

Furthermore, this thesis investigates the suitability of MS scheduling techniques for the parallelization of stencil code applications. These applications are usually parallelized with domain decomposition methods, which are highly scalable but rather impractical for dealing with heterogeneous, dynamic and unreliable environments. Our experimental results with two scientific applications show that traditional MS tasking techniques can successfully be applied to stencil code applications when the master is used to control the parallel execution. If the master is used as a data access point, then deploying multiple masters becomes necessary to achieve scalable performance.
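The MS paradigm described above maps directly onto a small self-scheduling skeleton: workers pull the next task index from a shared counter at their own pace, so faster workers naturally process more tasks, and a failed task could simply be re-issued. The sketch below uses trivial arithmetic tasks as stand-ins for real work.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

// Master-slave self-scheduling: workers repeatedly grab the next task index
// from a shared atomic counter, so faster workers naturally take more tasks.
// This loose coupling is what tolerates heterogeneous worker speeds.
int main() {
    const int numTasks = 32, numWorkers = 4;
    std::vector<double> results(numTasks, 0.0);
    std::atomic<int> next{0};

    auto worker = [&] {
        for (int t; (t = next.fetch_add(1)) < numTasks; )
            results[t] = double(t) * t;   // stand-in for an independent task
    };
    std::vector<std::thread> pool;
    for (int w = 0; w < numWorkers; ++w) pool.emplace_back(worker);
    for (auto& th : pool) th.join();

    double sum = 0;
    for (double r : results) sum += r;
    std::printf("sum of results: %.0f\n", sum);
}
```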

APA, Harvard, Vancouver, ISO, and other styles
23

Krommydas, Konstantinos. "Towards Enhancing Performance, Programmability, and Portability in Heterogeneous Computing." Diss., Virginia Tech, 2017. http://hdl.handle.net/10919/77582.

Full text
Abstract:
The proliferation of a diverse set of heterogeneous computing platforms, in conjunction with the plethora of programming languages and optimization techniques for each language on each underlying architecture, hampers the widespread adoption of such platforms. This is especially true for novice programmers and the non-technically-savvy masses, who are largely precluded from enjoying the advantages of high-performance computing. Moreover, different groups within the heterogeneous computing community (e.g., hardware architects, tool developers, and programmers) are presented with new challenges with respect to the performance, programmability, and portability (the three P's) of heterogeneous computing. In this work we discuss such challenges and identify benchmarking techniques based on computation and communication patterns as an appropriate means for the systematic evaluation of heterogeneous computing with respect to the three P's. Our proposed approach is based on OpenCL implementations of the Berkeley dwarfs. We use our benchmark suite (OpenDwarfs) to characterize the performance of state-of-the-art parallel architectures, and as the main component of a methodology (Telescoping Architectures) for identifying trends in future heterogeneous architectures. Furthermore, we employ OpenDwarfs in a multi-faceted study of the gaps between the three P's in the context of the modern heterogeneous computing landscape. Our case study spans a variety of compilers, languages, optimizations, and target architectures, including the CPU, GPU, MIC, and FPGA. Based on our insights, and extending aspects of prior research (e.g., in compilers, programming languages, and auto-tuning), we propose the introduction of grid-based data structures as the basis of programming frameworks and present a prototype unified framework (GLAF) that encompasses a novel visual programming environment with code generation, auto-parallelization, and auto-tuning capabilities. Our results, which span scientific domains, indicate that our holistic approach constitutes a viable alternative towards enhancing the three P's and further democratizing heterogeneous, parallel computing for non-programming-savvy audiences, especially domain scientists.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
26

Helal, Ahmed Elmohamadi Mohamed. "Automated Runtime Analysis and Adaptation for Scalable Heterogeneous Computing." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/96607.

Full text
Abstract:
In the last decade, there have been tectonic shifts in computer hardware because sequential CPU performance has reached its physical limits. As a consequence, current high-performance computing (HPC) systems integrate a wide variety of compute resources with different capabilities and execution models, ranging from multi-core CPUs to many-core accelerators. While such heterogeneous systems can enable dramatic acceleration of user applications, extracting optimal performance via manual analysis and optimization is a complicated and time-consuming process. This dissertation presents graph-structured program representations to reason about the performance bottlenecks on modern HPC systems and to guide novel automation frameworks for performance analysis, modeling, and runtime adaptation. The proposed program representations exploit domain knowledge and capture the inherent computation and communication patterns in user applications, at multiple levels of computational granularity, via compiler analysis and dynamic instrumentation. The empirical results demonstrate that the introduced modeling frameworks accurately estimate the realizable parallel performance and scalability of a given sequential code when ported to heterogeneous HPC systems. As a result, these frameworks enable efficient workload distribution schemes that utilize all the available compute resources in a performance-proportional way. In addition, the proposed runtime adaptation frameworks significantly improve the end-to-end performance of important real-world applications which suffer from limited parallelism and fine-grained data dependencies. Specifically, compared to the state-of-the-art methods, such adaptive parallel execution achieves up to an order-of-magnitude speedup on the target HPC systems while preserving the inherent data dependencies of user applications.
Doctor of Philosophy
Current supercomputers integrate a massive number of heterogeneous compute units with varying speed, computational throughput, memory bandwidth, and memory access latency. This trend represents a major challenge to end users, as their applications have been designed from the ground up to primarily exploit homogeneous CPUs. While heterogeneous systems can deliver several orders of magnitude speedup compared to traditional CPU-based systems, end users need extensive software and hardware expertise as well as significant time and effort to efficiently utilize all the available compute resources. To streamline such a daunting process, this dissertation presents automated frameworks for analyzing and modeling performance on parallel architectures and for transforming the execution of user applications at runtime. The proposed frameworks incorporate domain knowledge and adapt to the input data and the underlying hardware using novel static and dynamic analyses. The experimental results show the efficacy of the introduced frameworks across many important application domains, such as computational fluid dynamics (CFD) and computer-aided design (CAD). In particular, the adaptive execution approach on heterogeneous systems achieves up to an order-of-magnitude speedup over optimized parallel implementations.
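The idea of distributing work in a performance-proportional way can be made concrete with a small sketch. The code below is only an illustration under assumed device throughputs (e.g., obtained from a short profiling run); it is not the dissertation's actual framework:

```python
# A minimal sketch of performance-proportional workload distribution. The
# device names and throughput figures are hypothetical.

def partition_iterations(total_iters, throughputs):
    """Split a 1-D iteration space among devices in proportion to their
    measured throughput, returning (start, end) ranges covering all work."""
    total_rate = sum(throughputs.values())
    ranges, start = {}, 0
    devices = list(throughputs)
    for i, dev in enumerate(devices):
        share = round(total_iters * throughputs[dev] / total_rate)
        # The last device absorbs any rounding remainder.
        end = total_iters if i == len(devices) - 1 else min(start + share, total_iters)
        ranges[dev] = (start, end)
        start = end
    return ranges

print(partition_iterations(10_000, {"cpu": 12.0, "gpu0": 95.0, "gpu1": 88.0}))
# e.g. {'cpu': (0, 615), 'gpu0': (615, 5487), 'gpu1': (5487, 10000)}
```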
APA, Harvard, Vancouver, ISO, and other styles
27

Daga, Mayank. "Architecture-Aware Mapping and Optimization on Heterogeneous Computing Systems." Thesis, Virginia Tech, 2011. http://hdl.handle.net/10919/32535.

Full text
Abstract:
The emergence of scientific applications embedded with multiple modes of parallelism has made heterogeneous computing systems indispensable in high performance computing. The popularity of such systems is evident from the fact that three out of the top five fastest supercomputers in the world employ heterogeneous computing, i.e., they use dissimilar computational units. A closer look at the performance of these supercomputers reveals that they achieve only around 50% of their theoretical peak performance. This suggests that applications that were tuned for erstwhile homogeneous computing may not be efficient for today's heterogeneous computing, and hence novel optimization strategies are required. However, optimizing an application for heterogeneous computing systems is extremely challenging, primarily due to the architectural differences between the computational units in such systems. This thesis intends to act as a cookbook for optimizing applications on heterogeneous computing systems that employ graphics processing units (GPUs) as the preferred mode of accelerator. We discuss optimization strategies for multicore CPUs as well as for the two popular GPU platforms, i.e., GPUs from AMD and NVIDIA. Optimization strategies for NVIDIA GPUs have been well studied, but when applied on AMD GPUs they fail to measurably improve performance because of differences in the underlying architecture. To the best of our knowledge, this research is the first to propose optimization strategies for AMD GPUs. Even on NVIDIA GPUs, there exists a lesser known but extremely severe performance pitfall called partition camping, which can affect application performance by up to seven-fold. To facilitate the detection of this phenomenon, we have developed a performance prediction model that analyzes and characterizes the effect of partition camping in GPU applications. We have used a large-scale, molecular modeling application to validate and verify all the optimization strategies. Our results illustrate that, if appropriately optimized, AMD and NVIDIA GPUs can provide 371-fold and 328-fold improvement, respectively, over a hand-tuned, SSE-optimized serial implementation.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
28

Adurti, Devi Abhiseshu, and Mohit Battu. "Optimization of Heterogeneous Parallel Computing Systems using Machine Learning." Thesis, Blekinge Tekniska Högskola, Institutionen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-21834.

Full text
Abstract:
Background: Heterogeneous parallel computing systems utilize a combination of different resources (CPUs and GPUs) to achieve high performance as well as reduced latency and energy consumption. Programming applications that target various processing units requires employing different tools and programming models/languages. Furthermore, selecting the optimal implementation, which may target different processing units (i.e., CPU or GPU) or implement different algorithms, is not trivial for a given context. In this thesis, we investigate the use of machine learning to address the problem of selecting among implementation variants for an application running on a heterogeneous system. Objectives: This study is focused on providing an approach for the runtime optimization of heterogeneous parallel computing systems by building the most efficient machine learning model to predict the optimal implementation variant of an application. Methods: Six machine learning models (KNN, XGBoost, DTC, Random Forest Classifier, LightGBM, and SVM) are trained and tested using stratified k-fold cross-validation on a dataset generated from a matrix multiplication application, for square matrix input dimensions ranging from 16x16 to 10992x10992. Results: The findings for each machine learning algorithm are presented through accuracy, confusion matrix, and a classification report covering precision, recall, and F1-score, and a comparison between the models in terms of accuracy, training time, and prediction time is provided to determine the best model. Conclusions: The XGBoost, DTC, and SVM algorithms achieved 100% accuracy. In comparison to the other machine learning models, the DTC is found to be the most suitable due to its low training and prediction times when predicting the optimal implementation variant of the heterogeneous system application. Hence the DTC is the best suited algorithm for the optimization of heterogeneous parallel computing.
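To convey the flavour of this methodology, the following minimal sketch compares classifiers under stratified k-fold cross-validation with scikit-learn. A synthetic dataset stands in for the thesis's matrix-multiplication data, and XGBoost and LightGBM (also evaluated in the thesis) are omitted to keep the example dependency-free:

```python
# A minimal sketch of the model-comparison methodology, not the thesis's code.
# Features here are synthetic stand-ins (real features could be e.g. matrix
# dimensions); the label is the fastest implementation variant.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=0)

models = {
    "KNN": KNeighborsClassifier(),
    "DTC": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv)   # per-fold accuracy
    print(f"{name:12s} mean accuracy = {scores.mean():.3f}")
```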
APA, Harvard, Vancouver, ISO, and other styles
29

Srivatsan, Siddhartha Eluppai. "Integrating heterogeneous computing resources to form a campus grid." [Gainesville, Fla.] : University of Florida, 2009. http://purl.fcla.edu/fcla/etd/UFE0024690.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Vella, Kevin J. "Seamless parallel computing on heterogeneous networks of multiprocessor workstations." Thesis, University of Kent, 1998. https://kar.kent.ac.uk/21580/.

Full text
Abstract:
This thesis is concerned with portable, efficient, and, above all, seamless parallel programming of heterogeneous networks of shared memory multiprocessor workstations. The CSP model of concurrency as embodied in the occam language is used to convey an architecture-independent and elegant view of concurrent systems. Tools and techniques for efficiently executing finely decomposed parallel programs on uniprocessor workstations, shared memory multiprocessor workstations and networks of both are examined in some detail. In particular, scheduling strategies that batch related processes together to reduce cache-related context switching overheads on uniprocessors, and to reduce contention and false sharing on shared memory multiprocessors, are studied. New wait-free CSP channel algorithms for shared memory multiprocessors are presented, as well as implementations of CSP channel algorithms across commodity network interconnects. A virtual parallel computer abstraction is applied to hide the inherent heterogeneity of workstation networks and enable seamless execution of parallel programs. An investigation of the performance of moderate to very fine grain parallelism on uniprocessors and shared memory multiprocessors is presented. The performance of CSP channels across TCP/IP networks is also scrutinized. The results indicate that fine grain parallelism can be handled efficiently in software on uniprocessors and shared memory multiprocessors, though issues related to caching warrant careful consideration. Other results also show that a limited amount of computation-communication overlap can be attained even with commodity network adapters which require significant processor interaction to sustain data transfer. This thesis demonstrates that seamless parallel programming across a variety of contemporary architectures using the CSP/occam model is a viable, as well as an attractive, option.
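To make the channel concept concrete, the sketch below implements CSP-style rendezvous semantics with ordinary locks. Note that this only illustrates the semantics: the thesis's contribution is wait-free channel algorithms, which avoid locking altogether and are substantially more intricate than this toy:

```python
# A lock-based illustration of a CSP rendezvous channel (NOT the wait-free
# algorithms of the thesis): send() blocks until receive() takes the value.
import threading

class Channel:
    def __init__(self):
        self._lock = threading.Lock()
        self._ready = threading.Condition(self._lock)   # value available
        self._taken = threading.Condition(self._lock)   # value consumed
        self._value, self._full = None, False

    def send(self, value):
        with self._lock:
            while self._full:                 # wait out any rendezvous in progress
                self._taken.wait()
            self._value, self._full = value, True
            self._ready.notify()
            while self._full:                 # rendezvous: block until taken
                self._taken.wait()

    def receive(self):
        with self._lock:
            while not self._full:
                self._ready.wait()
            value, self._value, self._full = self._value, None, False
            self._taken.notify_all()
            return value

ch = Channel()
t = threading.Thread(target=lambda: [ch.send(i) for i in range(3)])
t.start()
print([ch.receive() for _ in range(3)])   # [0, 1, 2]
t.join()
```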
APA, Harvard, Vancouver, ISO, and other styles
31

Di, Giovanni Pasquale. "Enhancing Ubiquitous Computing Environments Through Composition of Heterogeneous Services." Doctoral thesis, Universita degli studi di Salerno, 2016. http://hdl.handle.net/10556/2231.

Full text
Abstract:
2012 - 2013
In recent years the substantial advancements in Information and Communication Technologies have enabled the development of original software solutions that support people in their daily activities. Among the technical advancements that have fostered the development of such innovative applications, the gradual transition from stand-alone and centralized architectures to distributed ones and the explosive growth in the area of mobile communication have played a central role. The profitable combination of these advancements has led to the rise of the so-called Mobile Information Systems. Unfortunately, realizing such systems is very challenging and several aspects have to be taken into account during the design and development of both the front and back ends of the proposed solution. Within this context, in this thesis we investigate two main aspects: 1) the elicitation of requirements and the design of usable mobile User Interfaces, and 2) the information exchange in a back end combining heterogeneous services, more specifically services based on the standards of the World Wide Web Consortium (W3C) and Open Geospatial Consortium (OGC). In particular, we develop a methodology to support the design of mobile solutions when usability requirements play a key role in the success of the whole system. We also present a solution for the seamless integration of services developed according to different standards, with specific focus on the proper management of geospatial metadata in a W3C standards-oriented infrastructure. The result of our investigation is an extension of a key W3C standard for metadata retrieval to support OGC metadata. The case study considered in our work is a Mobile Information System to be used by a community of farmers in Sri Lanka. [edited by Author]
XII n.s.
APA, Harvard, Vancouver, ISO, and other styles
32

MA, LIANG. "Low power and high performance heterogeneous computing on FPGAs." Doctoral thesis, Politecnico di Torino, 2019. http://hdl.handle.net/11583/2727228.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Ribeiro, Tiago Filipe Rodrigues. "Developing and evaluating clopencl applications for heterogeneous clusters." Master's thesis, Instituto Politécnico de Bragança, Escola Superior de Tecnologia e Gestão, 2012. http://hdl.handle.net/10198/7948.

Full text
Abstract:
In the last few years, the computing systems processing capabilities have increased significantly, changing from single-core to multi-core and even many-core systems. Accompanying this evolution, local networks have also become faster, with multi-gigabit technologies like Infiniband, Myrinet and 10G Ethernet. Parallel/distributed programming tools and standards, like POSIX Threads, OpenMP and MPI, have helped to explore these technologies and have been frequently combined, giving rise to Hybrid Programming Models. Recently, co-processors like GPUs and FPGAs started to be used as accelerators, requiring specialized frameworks (like CUDA for NVIDIA GPUs). Presented with so much heterogeneity, the industry formulated the OpenCL specification as a standard to explore heterogeneous systems. However, in the context of cluster computing, one problem surfaces: OpenCL only enables a developer to use the devices that are present in the local machine. With many processor devices scattered across cluster nodes (CPUs, GPUs and other co-processors), it then became important to enable software developers to take full advantage of the full cluster device set. This dissertation demonstrates and evaluates an OpenCL extension, named clOpenCL, which supports the simple deployment and efficient running of OpenCL-based parallel applications that may span several cluster nodes, thus expanding the original single-node OpenCL model. The main contributions are that clOpenCL i) offers a transparent approach to the porting of traditional OpenCL applications to cluster environments and ii) provides significant performance increases over classical (non-)hybrid parallel approaches.
APA, Harvard, Vancouver, ISO, and other styles
34

Lu, Kai. "Decentralized load balancing in heterogeneous computational grids." Thesis, The University of Sydney, 2007. http://hdl.handle.net/2123/9382.

Full text
Abstract:
With the rapid development of high-speed wide-area networks and powerful yet low-cost computational resources, grid computing has emerged as an attractive computing paradigm. The space limitations of conventional distributed systems can thus be overcome, fully exploiting under-utilised computing resources in every region around the world for distributed jobs. Workload and resource management are key grid services at the service level of grid software infrastructure, where issues of load balancing represent a common concern for most grid infrastructure developers. Although these are established research areas in parallel and distributed computing, grid computing environments present a number of new challenges, including large-scale computing resources, heterogeneous computing power, the autonomy of organisations hosting the resources, uneven job-arrival patterns among grid sites, considerable job transfer costs, and considerable communication overhead involved in capturing the load information of sites. This dissertation focuses on designing solutions for load balancing in computational grids that can cater for the unique characteristics of grid computing environments. To explore the solution space, we conducted a survey of load-balancing solutions, which enabled discussion and comparison of existing approaches and the delimiting and exploration of the solution space. A system model was developed to study the load-balancing problems in computational grid environments. In particular, we developed three decentralised algorithms for job dispatching and load balancing using only partial information: the desirability-aware load-balancing algorithm (DA), the performance-driven desirability-aware load-balancing algorithm (P-DA), and the performance-driven region-based load-balancing algorithm (P-RB). All three are scalable, dynamic, decentralised and sender-initiated. We conducted extensive simulation studies to analyse the performance of our load-balancing algorithms. Simulation results showed that the algorithms significantly outperform pre-existing decentralised algorithms that are relevant to this research.
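As a rough illustration of the sender-initiated, partial-information pattern underlying such algorithms, consider the toy dispatcher below. It is not the DA, P-DA, or P-RB algorithm itself (whose desirability and region mechanics are more elaborate), and the threshold and probe count are arbitrary:

```python
# A toy sender-initiated dispatcher: an overloaded site probes a few random
# peers (partial information only) and migrates one job to the least-loaded
# peer it found. Threshold and probe count are illustrative.
import random

def sender_initiated_dispatch(queues, site, threshold=4, probes=2, rng=random):
    """Returns the site that ends up holding the job."""
    if queues[site] <= threshold:
        return site                                       # not overloaded: keep it
    peers = [s for s in range(len(queues)) if s != site]
    probed = rng.sample(peers, min(probes, len(peers)))   # sample, don't poll all
    target = min(probed, key=lambda s: queues[s])
    if queues[target] < queues[site]:                     # worth transferring
        queues[site] -= 1
        queues[target] += 1
        return target
    return site

queues = [8, 1, 5, 0, 3]
print(sender_initiated_dispatch(queues, site=0), queues)
```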
APA, Harvard, Vancouver, ISO, and other styles
35

Padhye, Mohini. "Coordinating heterogeneous web services through handhelds using SyD." unrestricted, 2004. http://etd.gsu.edu/theses/available/etd-12062004-125228/.

Full text
Abstract:
Thesis (M.S.)--Georgia State University, 2004.
Title from title screen. Sushil K. Prasad, committee chair; Anu Bourgeois, Alex Zelikovsky, committee members. Description based on contents viewed Feb. 26, 2007. Includes bibliographical references (p. 55-59). Source code: p. 75-123.
APA, Harvard, Vancouver, ISO, and other styles
36

Li, Yue. "Edge computing-based access network selection for heterogeneous wireless networks." Thesis, Rennes 1, 2017. http://www.theses.fr/2017REN1S042/document.

Full text
Abstract:
Telecommunication networks have evolved from 1G to 4G in the past decades. One typical characteristic of the 4G network is the coexistence of heterogeneous radio access technologies, which offers end-users the capability to connect to, and switch between, access networks with new-generation mobile devices. However, selecting the right network is not an easy task for mobile users, since access network conditions change rapidly. Moreover, video streaming is becoming the major data service over the mobile network, where content providers and network operators should cooperate to guarantee the quality of video delivery. In this context, this thesis concerns the design of a novel approach for making optimal network selection decisions and an architecture for improving the performance of adaptive streaming in heterogeneous networks. Firstly, we introduce an analytical model (i.e., a linear discrete-time system) to describe the network selection procedure considering one traffic class. We then design a selection strategy based on foundations from linear optimal control theory, with the objective of maximizing network resource utilization while meeting the constraints of the supported services. Computer simulations with MATLAB are carried out to validate the efficiency of the proposed mechanism. Based on the same principle, we extend this model to a general analytical model describing the network selection procedures in heterogeneous network environments with multiple traffic classes. The proposed model is then used to derive a scalable mechanism based on control theory, which not only assists in steering traffic dynamically to the most appropriate network access but also helps in blocking the residual traffic dynamically when the network is congested, by adjusting the access probabilities. We discuss the advantages of a seamless integration with the ANDSF. A prototype is also implemented in ns-3. Simulation results show that the proposed scheme prevents network congestion and demonstrate the effectiveness of the controller design, which can maximize network resource allocation by converging the network workload to the targeted network occupancy. Thereafter, we focus on enhancing the performance of DASH for mobile users with a single access network. We introduce a novel architecture based on MEC. The proposed adaptation mechanism, running as an MEC service, can modify manifest files in real time, responding to network congestion and dynamic demand, thus driving clients towards selecting more appropriate quality/bitrate video representations. We have developed a virtualized testbed to run experiments with our proposed scheme. The simulation results demonstrate its QoE benefits compared to traditional, purely client-driven bitrate adaptation approaches, since our scheme notably improves both the achieved MOS and fairness in the face of congestion. Finally, we extend the proposed MEC-based architecture to support the DASH service in a multi-access heterogeneous network, in order to maximize the QoE and fairness of mobile users. In this scenario, our scheme should help users select both video quality and access network, and we formulate this as an optimization problem. The optimization problem can be solved by the IBM CPLEX tool; however, this tool is time-consuming and does not scale. Therefore, we introduce a heuristic algorithm to compute a near-optimal solution with less complexity. We then conduct experiments on our testbed, and the results demonstrate that, compared to the IBM CPLEX tool, our algorithm achieves similar performance on overall QoE and fairness while saving significant time.
APA, Harvard, Vancouver, ISO, and other styles
37

Janjic, Vladimir. "Load balancing of irregular parallel applications on heterogeneous computing environments." Thesis, University of St Andrews, 2012. http://hdl.handle.net/10023/2540.

Full text
Abstract:
Large-scale heterogeneous distributed computing environments (such as Computational Grids and Clouds) offer the promise of access to a vast amount of computing resources at a relatively low cost. In order to ease application development and deployment on such complex environments, high-level parallel programming languages exist that need to be supported by sophisticated runtime systems. One of the main problems that these runtime systems need to address is dynamic load balancing, ensuring that no resources in the environment are underutilised or overloaded with work. This thesis deals with the problem of obtaining good speedups for irregular applications on heterogeneous distributed computing environments. It focuses on work-stealing techniques that can be used for load balancing during the execution of irregular applications. It specifically addresses two problems that arise during work stealing: where thieves should look for work during the application execution, and how victims should respond to steal attempts. In particular, we describe and implement a new Feudal Stealing algorithm, and we describe and implement new granularity-driven task selection policies in the SCALES simulator, a work-stealing simulator developed for this thesis. In addition, we present a comprehensive evaluation of the Feudal Stealing algorithm and the granularity-driven task selection policies using simulations of a large class of regular and irregular parallel applications on a wide range of computing environments. We show how the Feudal Stealing algorithm and the granularity-driven task selection policies bring significant improvements in the speedups of irregular applications, compared to the state-of-the-art work-stealing algorithms. Furthermore, we also present the implementation of the task selection policies in the Grid-GUM runtime system [AZ06] for Glasgow Parallel Haskell (GpH) [THLPJ98], in addition to the implementation in SCALES, together with an evaluation of this implementation on a large set of synthetic applications.
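A toy, single-threaded simulation can convey the two questions above. In the sketch below, idle workers steal from random victims (unlike the hierarchical victim choice of Feudal Stealing), and victims respond in a granularity-driven way by handing over their largest task; all task sizes are made up:

```python
# A toy work-stealing simulation (not the thesis's SCALES simulator).
# Each task is a "granularity" in work units; all tasks start on worker 0.
import random

def simulate(initial_tasks, num_workers=4, seed=0):
    rng = random.Random(seed)
    queues = [[] for _ in range(num_workers)]
    queues[0] = list(initial_tasks)
    time, steals = 0, 0
    while any(queues):
        for w in range(num_workers):
            if not queues[w]:                          # idle: attempt one steal
                victim = rng.randrange(num_workers)
                if victim != w and len(queues[victim]) > 1:
                    # Granularity-driven response: hand over the LARGEST task.
                    biggest = max(range(len(queues[victim])),
                                  key=queues[victim].__getitem__)
                    queues[w].append(queues[victim].pop(biggest))
                    steals += 1
        for q in queues:                               # everyone works one unit
            if q:
                q[0] -= 1
                if q[0] == 0:
                    q.pop(0)
        time += 1
    return time, steals

print(simulate([9, 1, 1, 7, 2, 3, 8, 5]))   # (elapsed steps, steal count)
```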
APA, Harvard, Vancouver, ISO, and other styles
38

Kao, Yi-Hsuan. "Optimizing task assignment for collaborative computing over heterogeneous network devices." Thesis, University of Southern California, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10124490.

Full text
Abstract:

The Internet of Things promises to enable a wide range of new applications involving sensors, embedded devices and mobile devices. Different from traditional cloud computing, where centralized and powerful servers offer high quality computing service, in the era of the Internet of Things there are abundant computational resources distributed over the network. These devices are not as powerful as servers, but are easier to access, with faster setup and short-range communication. However, because of energy, computation, and bandwidth constraints on smart things and other edge devices, it will be imperative to collaboratively run a computation-intensive application that a single device cannot support individually. As many IoT applications, like data processing, can be divided into multiple tasks, we study the problem of assigning such tasks to multiple devices, taking into account their abilities and the costs and latencies associated with both task computation and data communication over the network.

A system that leverages collaborative computing over the network faces highly variant run-time environment. For example, the resource released by a device may suddenly decrease due to the change of states on local processes, or the channel quality may degrade due to mobility. Hence, such a system has to learn the available resources, be aware of changes and flexibly adapt task assignment strategy that efficiently makes use of these resources.

We take a step by step approach to achieve these goals. First, we assume that the amount of resources are deterministic and known. We formulate a task assignment problem that aims to minimize the application latency (system response time) subject to a single cost constraint so that we will not overuse the available resource. Second, we consider that each device has its own cost budget and our new multi-constrained formulation clearly attributes the cost to each device separately. Moving a step further, we assume that the amount of resources are stochastic processes with known distributions, and solve a stochastic optimization with a strong QoS constraint. That is, instead of providing a guarantee on the average latency, our task assignment strategy gives a guarantee that p% of time the latency is less than t, where p and t are arbitrary numbers. Finally, we assume that the amount of run-time resources are unknown and stochastic, and design online algorithms that learn the unknown information within limited amount of time and make competitive task assignment.

We aim to develop algorithms that efficiently make decisions at run-time. That is, the computational complexity should be as light as possible so that running the algorithm does not incur considerable overhead. For optimizations based on a known resource profile, we show these problems are NP-hard and propose polynomial-time approximation algorithms with performance guarantees, where the performance loss caused by a sub-optimal strategy is bounded. For online learning formulations, we propose lightweight algorithms for both stationary and non-stationary environments and show their competitiveness by comparing their performance with the optimal offline policy (solved by assuming the resource profile is known).

We perform comprehensive numerical evaluations, including simulations based on trace data measured at application run-time, and validate our analysis on algorithm's complexity and performance based on the numerical results. Especially, we compare our algorithms with the existing heuristics and show that in some cases the performance loss given by the heuristic is considerable due to the sub-optimal strategy. Hence, we conclude that to efficiently leverage the distributed computational resource over the network, it is essential to formulate a sophisticated optimization problem that well captures the practical scenarios, and provide an algorithm that is light in complexity and suggests a good assignment strategy with performance guarantee.

APA, Harvard, Vancouver, ISO, and other styles
39

Schultek, Brian Robert. "Design and Implementation of the Heterogeneous Computing Device Management Architecture." University of Dayton / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1417801414.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Pagani, Marco. "Enabling Predictable Hardware Acceleration in Heterogeneous SoC-FPGA Computing Platforms." Thesis, Lille 1, 2020. http://www.theses.fr/2020LIL1I016.

Full text
Abstract:
Modern computing platforms for embedded systems are evolving towards heterogeneous architectures comprising different types of processing elements and accelerators. Such an evolution is driven by the steadily increasing computational demand of modern cyber-physical systems. These systems need to acquire large amounts of data from multiple sensors and process them to perform the required control and monitoring tasks. These requirements translate into the need to execute complex computing workloads, such as machine learning, encryption, and advanced signal processing algorithms, within the timing constraints imposed by the physical world. Heterogeneous systems can meet this computational demand with a high level of energy efficiency by distributing the computational workload among the different processing elements. This thesis contributes to the development of system support for real-time systems on heterogeneous platforms by presenting novel methodologies and techniques for enabling predictable hardware acceleration on SoC-FPGA platforms. The first part of this thesis presents a framework designed for supporting the development of real-time applications on SoC-FPGAs, leveraging hardware acceleration and logic resource "virtualization" through dynamic partial reconfiguration. The proposed framework is based on a device model that matches the capabilities of modern SoC-FPGA devices, and it is centered around a custom scheduling infrastructure designed to guarantee bounded response times. This characteristic is crucial for making dynamic hardware acceleration viable for safety-critical applications. The second part of this thesis presents a full implementation of the proposed framework on Linux. Such an implementation allows developing predictable applications that leverage the large number of software systems available on GNU/Linux while relying on dynamic FPGA-based hardware acceleration for performing heavy computations. Finally, the last part of this thesis introduces a reservation mechanism for the AMBA AXI bus aimed at improving the predictability of hardware accelerators by regulating bus contention through bandwidth reservation.
APA, Harvard, Vancouver, ISO, and other styles
41

Brown, Grant Donald. "Application Of Heterogeneous Computing Techniques To Compartmental Spatiotemporal Epidemic Models." Diss., University of Iowa, 2015. https://ir.uiowa.edu/etd/1554.

Full text
Abstract:
The application of spatial methods to epidemic estimation and prediction problems is a vibrant and active area of research. In many cases, however, well thought out and laboratory supported models for epidemic patterns may be easy to specify but extremely difficult to fit efficiently. While this problem exists in many scientific disciplines, epidemic modeling is particularly prone to this challenge due to the rate at which the problem scope grows as a function of the size of the spatial and temporal domains involved. An additional barrier to widespread use of spatiotemporal epidemic models is the lack of user friendly software packages capable of fitting them. In particular, compartmental epidemic models are easy to understand, but in most cases difficult to fit. This class of epidemic models describes a set of states, or compartments, which captures the disease progression in a population. This dissertation attempts to expand the problem scope to which spatio-temporal compartmental epidemic models are applicable both computationally and practically. In particular, a general family of spatially heterogeneous SEIRS models is developed alongside a software library with the dual goals of high computational performance and ease of use in fitting models in this class. We emphasize the task of model specification, and develop a framework describing the components of epidemic behavior. In addition, we establish methods to estimate and interpret reproductive numbers, which are of fundamental importance to the study of infectious disease. Finally, we demonstrate the application of these techniques both under simulation, and in the context of a diverse set of real diseases, including Ebola Virus Disease, Smallpox, Methicillin-resistant Staphylococcus aureus, and Influenza.
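As background, the compartmental structure at the heart of such models can be sketched in a few lines. The following deterministic, non-spatial SEIRS step is only a conceptual illustration with made-up parameters; the dissertation's models are stochastic, spatially heterogeneous, and fitted in a Bayesian framework:

```python
# A minimal, deterministic, non-spatial SEIRS sketch for intuition only
# (all parameter values below are illustrative, not fitted).
def seirs_step(S, E, I, R, beta, sigma, gamma, omega, N):
    new_exposed    = beta * S * I / N    # S -> E (infection)
    new_infectious = sigma * E           # E -> I (end of latency)
    new_recovered  = gamma * I           # I -> R (recovery)
    waned          = omega * R           # R -> S (waning immunity)
    return (S - new_exposed + waned,
            E + new_exposed - new_infectious,
            I + new_infectious - new_recovered,
            R + new_recovered - waned)

S, E, I, R = 9990.0, 0.0, 10.0, 0.0
N = S + E + I + R
for day in range(200):
    S, E, I, R = seirs_step(S, E, I, R, beta=0.4, sigma=1/5,
                            gamma=1/7, omega=1/180, N=N)
print(f"day 200: S={S:.0f} E={E:.0f} I={I:.0f} R={R:.0f}")
```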
APA, Harvard, Vancouver, ISO, and other styles
42

Cumming, Benjamin Donald. "Modelling sea water intrusion in coastal aquifers using heterogeneous computing." Thesis, Queensland University of Technology, 2012. https://eprints.qut.edu.au/61038/1/Benjamin_Cumming_Thesis.pdf.

Full text
Abstract:
The objective of this PhD research program is to investigate numerical methods for simulating variably-saturated flow and sea water intrusion in coastal aquifers in a high-performance computing environment. The work is divided into three overlapping tasks: to develop an accurate and stable finite volume discretisation and numerical solution strategy for the variably-saturated flow and salt transport equations; to implement the chosen approach in a high performance computing environment that may have multiple GPUs or CPU cores; and to verify and test the implementation. The geological description of aquifers is often complex, with porous materials possessing highly variable properties that are best described using unstructured meshes. The finite volume method is a popular method for the solution of the conservation laws that describe sea water intrusion, and is well-suited to unstructured meshes. In this work we apply a control volume-finite element (CV-FE) method to an extension of a recently proposed formulation (Kees and Miller, 2002) for variably saturated groundwater flow. The CV-FE method evaluates fluxes at points where material properties and gradients in pressure and concentration are consistently defined, making it both suitable for heterogeneous media and mass conservative. Using the method of lines, the CV-FE discretisation gives a set of differential algebraic equations (DAEs) amenable to solution using higher-order implicit solvers. Heterogeneous computer systems that use a combination of computational hardware such as CPUs and GPUs are attractive for scientific computing due to the potential advantages offered by GPUs for accelerating data-parallel operations. We present a C++ library that implements data-parallel methods on both CPUs and GPUs. The finite volume discretisation is expressed in terms of these data-parallel operations, which gives an efficient implementation of the nonlinear residual function. This makes the implicit solution of the DAE system possible on the GPU, because the inexact Newton-Krylov method used by the implicit time stepping scheme can approximate the action of a matrix on a vector using residual evaluations. We also propose preconditioning strategies that are amenable to GPU implementation, so that all computationally-intensive aspects of the implicit time stepping scheme are implemented on the GPU. Results are presented that demonstrate the efficiency and accuracy of the proposed numerical methods and formulation. The formulation offers excellent conservation of mass, and higher-order temporal integration increases both the numerical efficiency and the accuracy of the solutions. Flux limiting produces accurate, oscillation-free solutions on coarse meshes, where much finer meshes would be required to obtain solutions of equivalent accuracy using upstream weighting. The computational efficiency of the software is investigated using CPUs and GPUs on a high-performance workstation. The GPU version offers considerable speedup over the CPU version, with one GPU giving a speedup factor of 3 over the eight-core CPU implementation.
APA, Harvard, Vancouver, ISO, and other styles
43

Faticanti, Francescomaria. "Resource Allocation Strategies in Highly Distributed and Heterogeneous Computing Systems." Doctoral thesis, Università degli studi di Trento, 2021. http://hdl.handle.net/11572/321482.

Full text
Abstract:
The spread of IoT devices has led to the design of new extensions of cloud computing, such as fog computing, providing IoT applications with reduced latency, location-awareness and mobility support. The term cloud-to-things continuum in this context refers to the fact that computation is no longer confined to a few data centers: workloads can be displaced from the central cloud to the edge of the network, involving multiple infrastructure owners and several devices with different computational characteristics. This heterogeneity impacts the capability of the infrastructure owner to satisfy the QoS requirements of clients. Consequently, solving the placement and orchestration problems across the cloud-to-things continuum becomes key to ensuring profitability for the involved stakeholders. This thesis focuses on algorithmic solutions for the placement and orchestration of microservice-based applications in such a distributed and heterogeneous context. On one hand, the placement problem involves the design of efficient solutions for the deployment of applications on a fog infrastructure, a problem that is typically NP-hard even assuming complete knowledge of applications' requirements and resource availability. In this thesis, the focus is on the design of approximate solutions for the NP-hard problems behind such resource allocation tasks. The orchestration of fog applications, on the other hand, deals with maintaining applications' QoS requirements under partial information about application request arrivals. For each application module, orchestration algorithms involve the decision to deploy either in the fog or in the cloud as application requests vary over time. In order to deal with this problem, we developed solutions based on stochastic optimisation techniques. The proposed methods outperform standard cloud-native solutions and suggest new approaches for interoperability between different fog regions. Additionally, numerical results confirm the scalability properties of all the proposed solutions and their efficiency in terms of the infrastructure owner's costs, on the placement side, and in terms of applications' QoS, on the orchestration side.
APA, Harvard, Vancouver, ISO, and other styles
44

Kerr, Andrew. "A model of dynamic compilation for heterogeneous compute platforms." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/47719.

Full text
Abstract:
Trends in computer engineering place renewed emphasis on increasing parallelism and heterogeneity. The rise of parallelism adds an additional dimension to the challenge of portability, as different processors support different notions of parallelism, whether vector parallelism executing in a few threads on multicore CPUs or large-scale thread hierarchies on GPUs. Thus, software experiences obstacles to portability and efficient execution beyond differences in instruction sets; rather, the underlying execution models of radically different architectures may not be compatible. Dynamic compilation applied to data-parallel heterogeneous architectures presents an abstraction layer decoupling program representations from optimized binaries, thus enabling portability without encumbering performance. This dissertation proposes several techniques that extend dynamic compilation to data-parallel execution models. These contributions include:
- characterization of data-parallel workloads
- machine-independent application metrics
- a framework for performance modeling and prediction
- execution model translation for vector processors
- region-based compilation and scheduling
We evaluate these claims via the development of a novel dynamic compilation framework, GPU Ocelot, with which we execute real-world workloads from GPU computing. This enables GPU computing workloads to run efficiently on multicore CPUs, GPUs, and a functional simulator. We show that data-parallel workloads exhibit performance scaling, take advantage of vector instruction set extensions, and effectively exploit data locality via scheduling which attempts to maximize control locality.
APA, Harvard, Vancouver, ISO, and other styles
45

Raman, Pirabhu. "GEMS Gossip-Enabled Monitoring Service for heterogeneous distributed systems /." [Gainesville, Fla.] : University of Florida, 2002. http://purl.fcla.edu/fcla/etd/UFE0000598.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Chang, He. "Server selection for heterogeneous cloud video services." HKBU Institutional Repository, 2017. http://repository.hkbu.edu.hk/etd_oa/419.

Full text
Abstract:
Server selection is an important problem in cloud computing, in which cloud service providers direct user demands to servers in one of multiple data centers located in different geographical locations. Existing solutions usually assume homogeneity of cloud services (i.e., all users request the same type of service) and handle user demands on an individual basis, which incurs high computational overhead. In this study, we propose a new and effective server selection scheme in which the diversity of cloud services is taken into account. We focus on a specific cloud service, online video, and assume that different videos have different bandwidth requirements. We group users into clusters and handle user demands on a cluster basis for a faster and more efficient process. Firstly, assuming that the user demands and the bandwidth capacities of the servers in the data centers are given, the problem is to assign the user demands to the servers under the bandwidth constraint, such that the overall latency (measured by network distance) between the user clusters and the selected servers is minimized. We design a server selection system and formulate this problem as a linear program that can be solved by existing techniques. The system periodically executes our scheme and computes an optimal solution for server selection. User demands are assigned to the servers according to the optimal solution, and the minimum overall latency is achieved. Simulation results show that our scheme is significantly better than a random algorithm and the YouTube server selection strategy. Building on this first part, we then take the storage capacities of servers into consideration. The new problem is to assign the user demands to the servers under both bandwidth and storage constraints, such that a combined objective of the overall latency (measured by network distance) between the user clusters and the selected servers and the standard deviation of the traffic load across servers is minimized. We design a server selection system and formulate this problem so that it can be solved by existing techniques. User demands are assigned to the servers according to the optimal solution, and the two goals (minimum overall latency and the most balanced traffic load) are achieved. Simulation results show the influence of different weightings of these two goals on the assignment of user demands.
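A toy instance of the first formulation can be written down directly. The sketch below uses scipy's linprog as a stand-in for whatever LP solver the thesis employs, with illustrative latencies, demands, and capacities:

```python
# A toy LP: route cluster demands to servers to minimise total
# demand-weighted latency under server bandwidth capacities.
# All numbers are illustrative.
import numpy as np
from scipy.optimize import linprog

latency = np.array([[10., 40., 30.],     # latency[i][j]: cluster i -> server j
                    [25., 15., 35.],
                    [40., 30., 10.]])
demand = np.array([50., 80., 60.])       # bandwidth demanded per cluster
capacity = np.array([70., 70., 70.])     # bandwidth capacity per server

n_c, n_s = latency.shape
c = latency.ravel()                      # decision var x[i*n_s + j], row-major

A_eq = np.zeros((n_c, n_c * n_s))        # each cluster's demand fully served
for i in range(n_c):
    A_eq[i, i * n_s:(i + 1) * n_s] = 1.0

A_ub = np.zeros((n_s, n_c * n_s))        # no server exceeds its capacity
for j in range(n_s):
    A_ub[j, j::n_s] = 1.0

res = linprog(c, A_ub=A_ub, b_ub=capacity, A_eq=A_eq, b_eq=demand,
              bounds=(0, None), method="highs")
print(res.x.reshape(n_c, n_s).round(1))  # optimal demand routing
print("total demand-weighted latency:", res.fun)
```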
APA, Harvard, Vancouver, ISO, and other styles
47

Rafique, Muhammad Mustafa. "An Adaptive Framework for Managing Heterogeneous Many-Core Clusters." Diss., Virginia Tech, 2011. http://hdl.handle.net/10919/29119.

Full text
Abstract:
The computing needs and the input and result datasets of modern scientific and enterprise applications are growing exponentially. To support such applications, High-Performance Computing (HPC) systems need to employ thousands of cores and innovative data management. At the same time, an emerging trend in designing HPC systems is to leverage specialized asymmetric multicores, such as IBM Cell and AMD Fusion APUs, and commodity computational accelerators, such as programmable GPUs, which exhibit an excellent price-to-performance ratio as well as the much-needed high energy efficiency. While such accelerators have been studied in detail as stand-alone computational engines, integrating them into large-scale distributed systems with heterogeneous computing resources for data-intensive computing presents unique challenges and trade-offs. Traditional programming and resource management techniques cannot be directly applied to many-core accelerators in heterogeneous distributed settings, given the complex and custom instruction set architectures, memory hierarchies, and I/O characteristics of different accelerators. In this dissertation, we explore the design space of using commodity accelerators, specifically IBM Cell and programmable GPUs, in distributed settings for data-intensive computing, and propose an adaptive framework for programming and managing heterogeneous clusters. The proposed framework provides a MapReduce-based extended programming model for heterogeneous clusters, which distributes tasks between asymmetric compute nodes by considering workload characteristics and the capabilities of individual compute nodes. The framework provides efficient data prefetching techniques that leverage general-purpose cores to stage the input data in the private memories of the specialized cores. We also explore the use of an advanced layered-architecture-based software engineering approach and provide mixin-layers-based reusable software components to enable easy and quick deployment of heterogeneous clusters. The framework also provides multiple resource management and scheduling policies under different constraints, e.g., energy-aware and QoS-aware, to support executing concurrent applications on multi-tenant heterogeneous clusters. When applied to representative applications and benchmarks, our framework yields significantly improved performance in terms of programming efficiency and optimal resource management, as compared to conventional, hand-tuned approaches to programming and managing accelerator-based heterogeneous clusters.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
48

Huang, Jun. "Heterogeneity-aware approaches to optimizing performance of computing and communication tasks." Auburn, Ala., 2005. http://repo.lib.auburn.edu/2005%20Fall/Dissertation/HUANG_JUN_28.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Porter, N. Wayne. "Resource usage for adaptive C4I models in a heterogeneous computing environment." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 1999. http://handle.dtic.mil/100.2/ADA366190.

Full text
Abstract:
Thesis (M.S. in Computer Science) Naval Postgraduate School, June 1999.
"June 1999". Thesis advisor(s): Debra Hensgen, William G. Kemple. Includes bibliographical references (p. 175-179). Also available online.
APA, Harvard, Vancouver, ISO, and other styles
50

Shan, Meijuan. "Distributed object-oriented parallel computing on heterogeneous workstation clusters using Java." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/mq43403.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles