Dissertations / Theses on the topic 'GPU-CPU'
Consult the top 50 dissertations / theses for your research on the topic 'GPU-CPU.'
Fang, Zhuowen. "Java GPU vs CPU Hashing Performance." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-33994.
Dollinger, Jean-François. "A framework for efficient execution on GPU and CPU+GPU systems." Thesis, Strasbourg, 2015. http://www.theses.fr/2015STRAD019/document.
Technological limitations faced by semiconductor manufacturers in the early 2000s restricted the performance growth of sequential computation units. Today, the trend is to increase the number of processor cores per socket and to progressively use GPU cards for highly parallel computations. The complexity of recent architectures makes it difficult to statically predict the performance of a program. We describe a reliable and accurate method for predicting the execution time of parallel loop nests on GPUs, based on three stages: static code generation, offline profiling, and online prediction. In addition, we present two techniques to fully exploit the computing resources available on a system. The first technique consists of jointly using the CPU and GPU to execute a code. To achieve higher performance, it is essential to balance the load, in particular by predicting execution time. The runtime uses the profiling results, and the scheduler computes the execution times and adjusts the load distributed to the processors. The second technique puts the CPU and GPU in competition: instances of the considered code are executed simultaneously on the CPU and GPU. The winner of the competition notifies the other instance of its completion, causing the latter to terminate.
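The load-balancing idea in this abstract (predict execution time, then adjust the share given to each processor) can be illustrated with a minimal sketch. It is not the thesis's scheduler: the function names and the assumption of a constant predicted time per iteration on each device are hypothetical, chosen only to show how a predicted time translates into a split.

```python
def split_iterations(total, cpu_time_per_iter, gpu_time_per_iter):
    """Split `total` loop iterations between CPU and GPU.

    Assumption (not from the source): each device has a constant predicted
    time per iteration. Work is assigned in proportion to device speed so
    that both devices are predicted to finish at about the same time.
    """
    if total < 0 or cpu_time_per_iter <= 0 or gpu_time_per_iter <= 0:
        raise ValueError("total must be >= 0 and predicted times must be > 0")
    cpu_rate = 1.0 / cpu_time_per_iter   # iterations per time unit on the CPU
    gpu_rate = 1.0 / gpu_time_per_iter   # iterations per time unit on the GPU
    cpu_share = round(total * cpu_rate / (cpu_rate + gpu_rate))
    return cpu_share, total - cpu_share


def predicted_makespan(cpu_iters, gpu_iters, cpu_time_per_iter, gpu_time_per_iter):
    """Predicted completion time when both devices run concurrently."""
    return max(cpu_iters * cpu_time_per_iter, gpu_iters * gpu_time_per_iter)


if __name__ == "__main__":
    cpu_iters, gpu_iters = split_iterations(1000, 4.0, 1.0)
    print(cpu_iters, gpu_iters)
    print(predicted_makespan(cpu_iters, gpu_iters, 4.0, 1.0))
```

With a GPU predicted to be four times faster per iteration, the sketch gives the GPU four times as many iterations, so neither device sits idle waiting for the other.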
Gjermundsen, Aleksander. "CPU and GPU Co-processing for Sound." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, 2010. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-11794.
CARLOS, EDUARDO TELLES. "HYBRID FRUSTUM CULLING USING CPU AND GPU." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2009. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=31453@1.
Visibility determination is one of the oldest problems in computer graphics. Several algorithms have been developed to make ever larger and more detailed models viable. Among these, frustum culling stands out; its role is to remove objects that are not visible to the observer. This algorithm, very common in many applications, has been improved over the years to further accelerate its execution. Although it is treated as a well-solved problem in computer graphics, some points can still be improved, and new forms of culling can be developed. Massive models require high-performance algorithms, since the amount of computation increases significantly. This work evaluates the frustum culling algorithm and its optimizations, with the aim of obtaining the best possible CPU implementation, and analyzes the influence of each of its parts on massive models. Based on this analysis, new frustum culling techniques that use the computational power of the GPU (Graphics Processing Unit) will be developed and compared with the CPU-only results. As a result, a hybrid form of frustum culling will be proposed that tries to take advantage of the best of the CPU and the GPU.
Farooqui, Naila. "Runtime specialization for heterogeneous CPU-GPU platforms." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54915.
Smith, Michael Shawn. "Performance Analysis of Hybrid CPU/GPU Environments." PDXScholar, 2010. https://pdxscholar.library.pdx.edu/open_access_etds/300.
Wong, Henry Ting-Hei. "Architectures and limits of GPU-CPU heterogeneous systems." Thesis, University of British Columbia, 2008. http://hdl.handle.net/2429/2529.
Gummadi, Deepthi. "Improving GPU performance by regrouping CPU-memory data." Thesis, Wichita State University, 2014. http://hdl.handle.net/10057/10959.
Thesis (M.S.)--Wichita State University, College of Engineering, Dept. of Electrical Engineering and Computer Science
Chen, Wei. "Dynamic Workload Division in GPU-CPU Heterogeneous Systems." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1364250106.
Ben, Romdhanne Bilel. "Simulation des réseaux à grande échelle sur les architectures de calculs hétérogènes." Thesis, Paris, ENST, 2013. http://www.theses.fr/2013ENST0088/document.
Simulation is a primary step in the evaluation process of modern networked systems. The scalability and efficiency of such a tool, in view of the increasing complexity of emerging networks, is key to deriving valuable results. Discrete event simulation is recognized as the most scalable model that copes with both parallel and distributed architectures. Nevertheless, recent hardware provides new heterogeneous computing resources that can be exploited in parallel. The main scope of this thesis is to provide new mechanisms and optimizations that enable efficient and scalable parallel simulation using heterogeneous computing node architectures including multicore CPUs and GPUs. To address efficiency, we propose to describe events that differ only in their data as a single entry, to reduce the event management cost. At run time, the proposed hybrid scheduler dispatches and injects events on the most appropriate computing target, based on the event descriptor and the current load obtained through a feedback mechanism, such that the hardware usage rate is maximized. Results have shown a significant gain of 100 times compared to traditional CPU-based approaches. To increase the scalability of the system, we propose a new simulation model, denoted general purpose coordinator-master-worker (GP-CMW), to jointly address the challenges of distributed and parallel simulation at different levels. The performance of a distributed simulation that relies on the GP-CMW architecture tends toward the maximal theoretical efficiency in a homogeneous deployment. The scalability of such a simulation model is validated on the largest European GPU-based supercomputer.
Sundberg, Andreas. "Skapa digitalt fingeravtryck med hjälp av CPU och GPU." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-12851.
The author had contact with Västgöta-Data AB, which provided feedback on the work and also suggested ways of carrying it out.
Krishnasamy, Ezhilmathi. "Hybrid CPU-GPU Parallel Simulations of 3D Front Propagation." Thesis, Linköpings universitet, Hållfasthetslära, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-114935.
Lindqvist, Sebastian. "Performance Evaluation of Boids on the GPU and CPU." Thesis, Blekinge Tekniska Högskola, Institutionen för kreativa teknologier, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-15970.
Venkatasubramanian, Sundaresan. "Tuned and asynchronous stencil kernels for CPU/GPU systems." Thesis, Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/29728.
Committee Chair: Vuduc, Richard; Committee Member: Kim, Hyesoon; Committee Member: Vetter, Jeffrey. Part of the SMARTech Electronic Thesis and Dissertation Collection.
Lind, Eric, and Velasquez Ävelin Pantigoso. "A performance comparison between CPU and GPU in TensorFlow." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-260240.
The rapidly growing field of machine learning has become more and more common in recent years, going from a research field to being used more generally in product development. Frameworks such as TensorFlow have been developed to make it possible to scale and analyze artificial neural networks, which are used in deep learning, a field within machine learning. This report examines how well the TensorFlow framework performs computations with respect to time and memory allocation on CPU and GPU, since these are the most limited resources during training. Three artificial neural networks were used to examine how TensorFlow allocates resources and how it uses the operations performed during the training phase of the neural networks. Using TensorFlow's profiler, we could follow how each operation was executed on both GPU and CPU. From the data we could analyze which operations took time and allocated memory during the whole training phase. The results showed that training more complex neural networks benefited from running on the GPU, while simpler neural networks had no or an insignificant gain from using the GPU instead of the CPU. The results also indicate possible findings that can be investigated in future research, for example processor utilization, since we found gaps in the operation schedule that were not investigated in this study.
Lagerhult, Christopher. "Smartphone CPU : An Energy efficient alternative to the GPU." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-397426.
Ospici, Matthieu. "Modèles de programmation et d'exécution pour les architectures parallèles et hybrides. Applications à des codes de simulation pour la physique." Phd thesis, Université de Grenoble, 2013. http://tel.archives-ouvertes.fr/tel-00934266.
Norgren, David. "Implementing and Evaluating CPU/GPU Real-Time Ray Tracing Solutions." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-32076.
Sandgren, Julius. "Transfer Time Reduction of Data Transfers between CPU and GPU." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-205272.
Erik, Liljeqvist. "Evaluating a CPU/GPU Implementation for Real-Time Ray Tracing." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-35768.
Svantesson, David, and Martin Eklund. "A naive implementation of Topological Sort on GPU : A comparative study between CPU and GPU performance." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186417.
Zhang, Junchi. "GPU computing of Heat Equations." Digital WPI, 2015. https://digitalcommons.wpi.edu/etd-theses/515.
Vekterli, Tor Brede. "Parallelization of Artificial Spiking Neural Networks on the CPU and GPU." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9838.
Conventional artificial neural networks have traditionally faced inherent problems with efficient parallelization of neuron processing. Recent research has shown how artificial spiking neural networks can, with the introduction of biologically plausible synaptic conduction delays, be fully parallelized regardless of their network topology. This, in conjunction with the influx of fast, massively parallel desktop-level computing hardware, leaves the field of efficient, large-scale spiking neural network simulations potentially open to even those with no access to supercomputers or large computing clusters. This thesis aims to show how such a parallelization is possible as well as present a network model that enables it. This model will then be used as a base for implementing a parallel artificial spiking neural network on both the CPU and the GPU and subsequently evaluating some of the challenges involved, the performance and scalability measured and the potential that is exhibited.
Enmyren, Johan. "A Skeleton Programming Library for Multicore CPU and Multi-GPU Systems." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-60319.
Berthou, Gautier. "Implementation of an object-detection algorithm on a CPU+GPU target." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-206178.
Systems such as autonomous vehicles may require embedded real-time image processing under hardware constraints. This thesis provides guidelines for designing time- and resource-efficient Haar cascade detection algorithms. In addition, some software architecture and hardware aspects are reviewed. The algorithms are intended for platforms equipped with a CPU and a GPU under limited energy supply. The main goal of the project was to design and develop real-time algorithms for underwater object detection. However, the concepts presented in the work are generic and can be applied to other domains where object detection may be needed, such as face detection. The results show how the solutions outperform OpenCV's cascade detector in execution time while maintaining equal accuracy.
Concha, Ramírez Francisca Andrea. "FADRA: A CPU-GPU framework for astronomical data reduction and Analysis." Tesis, Universidad de Chile, 2016. http://repositorio.uchile.cl/handle/2250/140769.
This thesis lays the foundations of FADRA: Framework for Astronomical Data Reduction and Analysis. The FADRA framework was designed to be efficient, simple to use, modular, extensible, and open source. Today, astronomy is inseparable from computing, but some of the most widely used software was developed three decades ago and is not designed to face current big data paradigms. The world of astronomical software must evolve not only toward practices that understand and adopt the big data era, but also toward practices focused on collaborative community work. The work consisted of designing and implementing the basic algorithms for astronomical data analysis, starting the development of the framework. This included implementing data structures that are efficient when working with a large number of images, implementing algorithms for the calibration or reduction of astronomical images, and designing and developing algorithms for computing photometry and obtaining light curves. Both the reduction and the light curve algorithms were implemented in CPU and GPU versions. For the GPU implementations, algorithms were designed to minimize the amount of data to be processed so as to reduce data transfer between CPU and GPU, a slow process that often eclipses the run-time gains obtainable through parallelization. Although FADRA was designed with the idea of using its algorithms inside scripts, a wrapper module for interacting through graphical interfaces was also implemented. One of the main goals of this thesis was to validate the results obtained with FADRA. To this end, reduction results and light curves were compared with results from AstroPy, a Python package with various utilities for astronomers.
The experiments were carried out on six datasets of real astronomical images. For the reduction of astronomical images, the Normalized Root Mean Squared Error (NRMSE) was used as the similarity metric between images. For the light curves, the shapes of the curves were shown to be equal by determining constant offsets between the numerical values of the points belonging to the different curves. In terms of validity, both reduction and light curve generation, in their CPU and GPU implementations, produced correct results when compared with those of AstroPy, which means that the developments and approximations designed for FADRA give results that can be safely used for the scientific analysis of astronomical images. In terms of run time, the data-intensive nature of the reduction process makes the GPU version even slower than the CPU version. However, for light curve generation, the GPU algorithm shows a significant decrease in run time compared with its CPU counterpart.
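As an illustration of the similarity metric named in this abstract, the following is a minimal sketch of an NRMSE computation between two images stored as nested lists. The abstract does not say which normalization FADRA uses; dividing the RMSE by the value range of the reference image is an assumption made here (normalizing by the mean is another common convention), and the function name is hypothetical.

```python
import math


def nrmse(reference, candidate):
    """RMSE between two equally sized images, divided by the reference range.

    Assumption (not from the source): normalization by max - min of the
    reference image. Images are given as lists of rows of numbers.
    """
    ref = [pixel for row in reference for pixel in row]
    cand = [pixel for row in candidate for pixel in row]
    if not ref or len(ref) != len(cand):
        raise ValueError("images must be non-empty and have the same size")
    mse = sum((r - c) ** 2 for r, c in zip(ref, cand)) / len(ref)
    value_range = max(ref) - min(ref)
    if value_range == 0:
        raise ValueError("reference image has zero value range")
    return math.sqrt(mse) / value_range


if __name__ == "__main__":
    cpu_result = [[0.0, 10.0], [20.0, 30.0]]
    gpu_result = [[0.0, 10.0], [20.0, 30.0]]
    print(nrmse(cpu_result, gpu_result))
```

A value of zero means the two images agree pixel for pixel; larger values indicate a larger average deviation relative to the dynamic range of the reference.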
This work was partially funded by Proyecto Fondecyt 1120299
Öhberg, Tomas. "Auto-tuning Hybrid CPU-GPU Execution of Algorithmic Skeletons in SkePU." Thesis, Linköpings universitet, Programvara och system, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-149605.
Vivanloc, Vincent. "Rendu distribué sur grappe de CPU/GPU et effets d'éclairage global." Toulouse 3, 2008. http://thesesups.ups-tlse.fr/823/.
Virtual prototyping and design review require realistic real-time rendering. This leads to two research axes: on the one hand, real-time rendering of indirect illumination effects, and on the other hand, real-time distributed high-resolution rendering. Simulating global illumination effects provides a noticeable improvement over computer graphics currently generated by rasterisation. We worked on indirect lighting and specular reflection rendering. Rendering indirect illumination is now possible for low-frequency lighting. For broader frequencies, indirect illumination rendering is limited to static scenes. This latter case requires a long preprocessing time or a lengthy mesh parametrisation. Our contribution consists of a fast reconstruction of global illumination from a photon map without any required parametrisation. The photon map is simplified into an octree of virtual directional lights. The radiance is then evaluated on the fly by a graphics card to provide real-time navigation in a globally illuminated scene. We also try to improve the quality of specular reflections in rasterisation to avoid a costly ray tracing simulation. Indeed, rasterised reflections are only valid for reflected items located at infinity. Thus, the quality improvement offered by existing solutions relies on oversimplified hypotheses about scene topology. Therefore, we devised a method based on an iterative search to provide a plausible solution for near reflections. However, the accuracy obtained is accompanied by some parallax artifacts. This problem is partly limited by a local reconstruction of geometry using our projected geometry buffer. Many existing solutions provide high-resolution real-time displays. On the one hand, distributed rendering hardware suffers from fast obsolescence and has only limited scalability. On the other hand, software distribution solutions are more extensible but are limited to rough renderings.
However, modifying these solutions to improve the quality of the rendered pictures with multipass shaders is relatively difficult: legacy software interleaves the rendering procedures with the data distribution algorithms. In contrast, a modular architecture might improve the reusability of a distributed system; the development of rendering methods becomes independent of any data distribution code. This is what HiD2RA tries to provide, assisted by its meta scenegraph. This implementation of the remote proxy design pattern offers an extensible interface for the development of real-time high-quality rendering applications on display walls.
Trichy, Ravi Vignesh. "Runtime Systems and Scheduling Support for High-End CPU-GPU Architectures." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1338324367.
He, Guanlin. "Parallel algorithms for clustering large datasets on CPU-GPU heterogeneous architectures." Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG062.
Clustering, which aims at achieving natural groupings of data, is a fundamental and challenging task in machine learning and data mining. Numerous clustering methods have been proposed in the past, among which k-means is one of the most famous and commonly used methods due to its simplicity and efficiency. Spectral clustering is a more recent approach that usually achieves higher clustering quality than k-means. However, classical algorithms of spectral clustering suffer from a lack of scalability due to their high complexities in terms of number of operations and memory space requirements. This scalability challenge can be addressed by applying approximation methods or by employing parallel and distributed computing. The objective of this thesis is to accelerate spectral clustering and make it scalable to large datasets by combining representatives-based approximation with parallel computing on CPU-GPU platforms. Considering different scenarios, we propose several parallel processing chains for large-scale spectral clustering. We design optimized parallel algorithms and implementations for each module of the proposed chains: parallel k-means on CPU and GPU, parallel spectral clustering on GPU using sparse storage format, parallel filtering of data noise on GPU, etc. Our various experiments reach high performance and validate the scalability of each module and the complete chains.
Kankatala, Sriram. "Performance Analysis of kNN on large datasets using CUDA & Pthreads : Comparing between CPU & GPU." Thesis, Blekinge Tekniska Högskola, Institutionen för kommunikationssystem, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-10830.
Topcu, Tumer. "Data Parallelism For Ray Casting Large Scenes On A Cpu-gpu Cluster." Master's thesis, METU, 2008. http://etd.lib.metu.edu.tr/upload/12609494/index.pdf.
[...]'s memory capacity. The algorithm proposed in this work uses a data parallel approach where the scene is partitioned and assigned to CPU-GPU couples in a cluster to overcome this problem. Our algorithm focuses on ray casting, which is a special case of ray tracing mainly used in visualization of volumetric data. CPUs are pretty efficient in flow control and branching, while GPUs are very fast at performing intense floating point operations. Using these facts, the GPUs in the cluster are assigned the task of performing ray casting while the CPUs are responsible for traversing the rays. In the end, we were able to visualize large scenes successfully by utilizing CPU-GPU couples effectively and observed that the performance is highly dependent on the viewing angle as a result of load imbalance.
Sharma, Vishist. "Sparse-Matrix support for the SkePU library for portable CPU/GPU programming." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-129687.
Ferenczi, Daniel. "Användning av Dynamisk Arbetslastbalansering mellan CPU och GPU för att Simulera Rök." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-11025.
Pinto, Vinícius Garcia. "Escalonamento por roubo de tarefas em sistemas Multi-CPU e Multi-GPU." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2013. http://hdl.handle.net/10183/71270.
In recent years, one of the alternatives adopted to increase performance in high-performance computing systems has been the use of hybrid architectures. These architectures consist of multicore processors and specialized coprocessors, such as GPUs. Coprocessors act as accelerators for some types of operations. On the other hand, current parallel programming models and tools are not suitable for hybrid scenarios and generate less portable applications. Task parallelism, considered a generic and high-level programming paradigm, can be used in this scenario. However, it requires the use of dynamic scheduling algorithms, such as work stealing. In this context, this work presents a middleware (WORMS) that supports task parallelism with work stealing scheduling in multi-CPU and multi-GPU systems. This middleware allows task implementations for both CPU and GPU, deciding at runtime which implementation will run according to the available hardware resources. The performance results obtained with WORMS showed that it is possible to outperform both CPU and GPU reference tools in some applications.
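The work stealing policy mentioned in this abstract can be sketched in a few lines. The following is a single-threaded simulation, not the WORMS middleware: workers take turns, each pops tasks from one end of its own double-ended queue, and an idle worker steals from the opposite end of a randomly chosen non-empty queue. The function name, the round-robin turn order, and the random victim selection are assumptions for illustration, and the choice between CPU and GPU task implementations is not modeled.

```python
import random
from collections import deque


def run_work_stealing(task_lists, seed=0):
    """Simulate work stealing; return the tasks executed by each worker.

    `task_lists[i]` holds the tasks initially queued on worker i.
    Hypothetical sketch: workers act in round-robin turns instead of
    running as real threads.
    """
    rng = random.Random(seed)
    queues = [deque(tasks) for tasks in task_lists]
    executed = [[] for _ in queues]
    while any(queues):
        for worker, own in enumerate(queues):
            if own:
                task = own.pop()  # owner takes its most recently queued task
            else:
                victims = [q for q in queues if q]
                if not victims:
                    break  # no work left anywhere
                # thief takes the oldest task of a random non-empty queue
                task = rng.choice(victims).popleft()
            executed[worker].append(task)
    return executed


if __name__ == "__main__":
    # All tasks start on worker 0; workers 1 and 2 must steal to get work.
    for worker, tasks in enumerate(run_work_stealing([list(range(8)), [], []])):
        print(worker, tasks)
```

Taking from opposite ends is the usual design choice: the owner keeps working on recently created tasks while thieves remove older ones, which limits contention on the same end of the queue in a real multithreaded runtime.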
Mestre, Nuno Roberto Pereira. "Comparação do desempenho do FDTD com implementação em CPU e em GPU." Master's thesis, Universidade de Aveiro, 2012. http://hdl.handle.net/10773/10939.
Full textO Finite-Difference Time-Domain é um método utilizado em electromagnetismo computacional para simular a propagação de ondas electromagnéticas em meios cujas características podem não ser uniformes. É um método com inúmeras aplicações, e como tal é vantajoso que o seu desempenho possa ser aumentado, de preferência recorrendo a sistemas computacionais de baixo custo. O propósito desta dissertação é aproveitar duas tecnologias emergentes e de relativo baixo custo para aumentar o desempenho do FDTD em uma e duas dimensões. Essas tecnologias são sistemas com processadores Multi-Core e placas gráficas que permitem utilizar as suas características de processamento massivamente paralelo para a execução de código de propósito geral. Para explorar as capacidades de um sistema com processador Multi-Core, o algoritmo originalmente sequencial foi alterado de modo a ser executado em múltiplas threads. Por sua vez, para tirar partido da tecnologia CUDA, o algoritmo foi convertido de forma a ser executado num GPU. Os acréscimos de desempenho obtidos indicam que é vantajoso o uso destas tecnologias comparativamente com implementações puramente sequenciais.
The Finite-Difference Time-Domain is a method used in computational electromagnetics to simulate the propagation of electromagnetic waves in fields that might not have uniform characteristics. It is a method with countless applications and so it is advantageous to increase its performance, preferably using low cost computer systems. The purpose of this thesis is to make use of two relatively low-cost emerging technologies to increase the FDTD performance in one and two dimensions. These technologies are Multi-Core systems and graphics cards that allow the use of its massive parallel processing characteristics to run general purpose code. To make use of a Multi-Core system, the original sequential code was changed to be executed by multiple threads. In order to use the CUDA technology, the algorithm was converted so that it could be executed on the GPU. The performance increase shows that the use of these technologies is advantageous in comparison to pure sequential implementations.
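For readers unfamiliar with the method, a purely sequential one-dimensional FDTD loop of the kind this abstract takes as its starting point can be sketched as follows. This is not the thesis's code: it assumes free space, normalized field units, a Courant factor of 0.5, a Gaussian pulse added at one cell, and no absorbing boundary (the outermost cells are simply left untouched). The function and parameter names are hypothetical.

```python
import math


def fdtd_1d(cells=200, steps=60, source_cell=100):
    """Run a 1D FDTD simulation and return the (ez, hy) field arrays.

    Assumptions (not from the source): free space, normalized units,
    Courant factor 0.5, Gaussian pulse source, no absorbing boundaries.
    """
    ez = [0.0] * cells  # electric field samples
    hy = [0.0] * cells  # magnetic field samples, staggered by half a cell
    for n in range(steps):
        # Update the electric field from the spatial difference of H.
        for k in range(1, cells):
            ez[k] += 0.5 * (hy[k - 1] - hy[k])
        # Add a Gaussian pulse centered at time step 40 at the source cell.
        ez[source_cell] += math.exp(-0.5 * ((n - 40) / 12.0) ** 2)
        # Update the magnetic field from the spatial difference of E.
        for k in range(cells - 1):
            hy[k] += 0.5 * (ez[k] - ez[k + 1])
    return ez, hy


if __name__ == "__main__":
    ez, _ = fdtd_1d()
    print(max(abs(v) for v in ez))
```

Within one time step, every cell update in each inner loop depends only on values from the other field array, so the cell updates are independent of one another. That independence is what makes the method a natural candidate for multithreaded and GPU execution.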
Ansaloni, Pietro. "Analisi di immagini con trasformata Ranklet: ottimizzazioni computazionali su CPU e GPU." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2013. http://amslaurea.unibo.it/5037/.
Wen, Hao. "IMPROVING PERFORMANCE AND ENERGY EFFICIENCY FOR THE INTEGRATED CPU-GPU HETEROGENEOUS SYSTEMS." VCU Scholars Compass, 2018. https://scholarscompass.vcu.edu/etd/5664.
Giuntoli, Guido. "Hybrid CPU/GPU implementation for the FE2 multi-scale method for composite problems." Doctoral thesis, Universitat Politècnica de Catalunya, 2020. http://hdl.handle.net/10803/668824.
This thesis aims to develop a high-performance computing implementation for solving large composite material problems with the FE2 multi-scale method. Previous work has not managed to scale the FE2 technique to problems of realistic size, with mesh resolutions of more than 10 K elements at the macro-scale and 100^3 elements at the micro-scale, because of the computational requirements of these calculations. This work identifies the most computationally expensive parts of the FE2 algorithm and ports several parts of the micro-scale computation to GPUs. The cases considered assume small deformations and steady-state equilibrium. The work provides a feasible strategy that can be used in real engineering problems to optimize the design of composite material structures. To this end, it presents a coupling scheme between the MPI multi-physics code Alya (macro-scale) and the CPU/GPU-accelerated version of Micropp (micro-scale). The coupled system is designed to work with multi-GPU architectures and to exploit GPU overloading. In addition, a multi-zone coupling method combined with weighted partitioning is proposed to reduce the computational cost and solve the load balancing problem. The thesis shows that the proposed method scales remarkably well for the model problems, especially on hybrid architectures with distributed CPU nodes communicating with multiple GPUs. Furthermore, the thesis clarifies the advantages achieved with the CPU/GPU-accelerated version over using CPUs alone.
Barrientos, Rojel Ricardo Javier. "Búsqueda por Similitud en Espacios Métricos Sobre Plataformas Multi-Core (CPU y GPU)." Tesis, Universidad de Chile, 2011. http://www.repositorio.uchile.cl/handle/2250/102738.
Sajjapongse, Kittisak. "Hierarchical scheduling and uniform access programming frameworks for heterogeneous CPU-GPU computing clusters." Thesis, University of Missouri - Columbia, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10178997.
The advance of the GPU hardware architecture has made GPUs attractive devices for general-purpose computing. Modern GPUs are equipped with an increasing number of cores, a flexible memory hierarchy, and a large memory capacity. While the computational power of modern GPU devices has allowed their introduction in high-performance computing (HPC) clusters and the efficient processing of ever larger workloads, existing software components for HPC clusters still offer basic support for hardware heterogeneity and often cause performance limitations in the presence of GPU devices. In particular, two kinds of limitations are associated with these software components: runtime support and programmability. We found that these limitations are due to the fact that existing software frameworks for heterogeneous clusters treat GPUs as dedicated coprocessor devices.
In this dissertation, we propose two software frameworks for addressing the performance and hardware underutilization issues found in heterogeneous CPU-GPU clusters as well as increasing their programmability. Our frameworks provide a uniform view of compute resources and treat CPUs and GPUs equally as first-class resources, allowing efficient management of heterogeneous compute resources. First, we propose a hierarchical scheduling framework consisting of a node-level runtime and a cluster-level scheduler that provides abstraction of heterogeneous compute resources at different granularities. This hierarchical framework targets existing applications and does not require their modification. In the node-level runtime, we identify and design mechanisms, such as virtual GPUs, GPU virtual memory, dynamic load balancing and pre-emption, which are necessary to support efficient sharing and load balancing schemes for GPUs within a compute node. In the cluster-level scheduler, we introduce mechanisms to abstract compute nodes and perform load balancing in concert with the node-level runtime. Our hierarchical scheduling framework allows supporting different load balancing policies and does not require additional inputs (such as profiling information) from users. Second, we propose a programming framework based on a novel memory and execution model. Our memory model hides disjoint addressing spaces (corresponding to different CPUs, GPUs and compute nodes) and provides a view of a single virtual memory space that can be accessed by all compute resources in a heterogeneous cluster. Our execution model provides uniform access to compute resources and allows our framework to treat all CPUs and GPUs equally and to access data in the virtual memory space.
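The dynamic load balancing mentioned in this abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the dissertation's runtime: the function name `simulate_dynamic_balancing`, the device names and the speed figures are all hypothetical. Each device pulls the next task from a shared queue as soon as it becomes idle, so no profiling input is needed to decide the split; the speeds are used only to simulate how long each task takes.

```python
import heapq


def simulate_dynamic_balancing(task_costs, device_speeds):
    """Simulate pull-based load balancing over a shared task queue.

    task_costs:    work per task in arbitrary units (hypothetical values).
    device_speeds: {device name: work units per second} (hypothetical);
                   used only to simulate task durations.
    Returns (assignment, makespan), where assignment maps each device
    name to the list of task indices it executed.
    """
    # Min-heap of (time at which the device becomes idle, device name).
    idle_at = [(0.0, name) for name in sorted(device_speeds)]
    heapq.heapify(idle_at)
    assignment = {name: [] for name in device_speeds}

    for task_id, cost in enumerate(task_costs):
        # The device that becomes idle first takes the next task.
        now, name = heapq.heappop(idle_at)
        assignment[name].append(task_id)
        heapq.heappush(idle_at, (now + cost / device_speeds[name], name))

    # The run ends when the last device finishes its last task.
    makespan = max(finish for finish, _ in idle_at)
    return assignment, makespan


if __name__ == "__main__":
    tasks = [4.0] * 10                  # ten equal tasks (assumed)
    speeds = {"cpu": 1.0, "gpu": 4.0}   # assumed: GPU is 4x faster
    assignment, makespan = simulate_dynamic_balancing(tasks, speeds)
    print(assignment)
    print("makespan:", makespan)
```

With these assumed numbers the faster device ends up with most of the tasks without the scheduler ever being told its speed, which is the property a pull-based scheme is meant to provide.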
Wahlberg, Björn. "Att procedurellt generera ett 2D landskap parallellt på GPU vs seriellt på CPU." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-18759.
Fauzia, Naznin. "Characterization of Data Locality Potential of CPU and GPU Applications through Dynamic Analysis." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1420759839.
Van, Winkle Scott E. "Dynamic Bandwidth and Laser Scaling for CPU-GPU Heterogenous Network-on-Chip Architectures." Ohio University / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1500992706350957.
Xue, Weicheng. "CPU/GPU Code Acceleration on Heterogeneous Systems and Code Verification for CFD Applications." Diss., Virginia Tech, 2021. http://hdl.handle.net/10919/102073.
Doctor of Philosophy
Computational Fluid Dynamics (CFD) is a numerical method for solving fluid problems, and it usually requires a large amount of computation. A large CFD problem can be decomposed into smaller sub-problems that are stored in discrete memory locations and accelerated by a large number of compute units. In addition to code acceleration, it is important to ensure that the code and algorithm are implemented correctly, which is called code verification. This dissertation focuses on CFD code acceleration as well as code verification for turbulence model implementation. Multiple Graphics Processing Units (GPUs) are used to accelerate two CFD codes, since the GPU offers high computational power and high memory bandwidth. A variety of optimizations are developed and applied to improve the performance of CFD codes on different parallel computing systems. The program execution time can be reduced significantly, especially when multiple GPUs are used. In addition, code-to-code comparisons with some NASA CFD codes and the method of manufactured solutions are used to verify the correctness of a research CFD code.
Said, Issam. "Apports des architectures hybrides à l'imagerie profondeur : étude comparative entre CPU, APU et GPU." Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066531/document.
In an exploration context, Oil and Gas (O&G) companies rely on HPC to accelerate depth imaging algorithms. Solutions based on CPU clusters and hardware accelerators are widely embraced by the industry. Graphics Processing Units (GPUs), with their huge compute power and high memory bandwidth, have attracted significant interest. However, deploying heavy imaging workflows on such hardware, the Reverse Time Migration (RTM) being the best known, has suffered from a few limitations: the lack of memory capacity, frequent CPU-GPU communications that may be bottlenecked by the PCI transfer rate, and high power consumption. Recently, AMD has launched the Accelerated Processing Unit (APU): a processor that merges a CPU and a GPU on the same die, with promising features, notably a unified CPU-GPU memory. Throughout this thesis, we explore how efficiently APU technology can be applied in an O&G context, and study whether it can overcome the limitations that characterize CPU-based and GPU-based solutions. The APU is evaluated with the help of memory, applicative and power efficiency OpenCL benchmarks. The feasibility of the hybrid utilization of APUs is surveyed. The efficiency of a directive-based approach is also investigated. By means of a thorough review of a selection of seismic applications (modeling and RTM) at the node level and at the large-scale level, a comparative study between the CPU, the APU and the GPU is conducted. We show the relevance of overlapping I/O and MPI communications with computations for APU and GPU clusters, that APUs deliver performance that ranges between that of CPUs and that of GPUs, and that the APU can be as power efficient as the GPU.
Said, Issam. "Apports des architectures hybrides à l'imagerie profondeur : étude comparative entre CPU, APU et GPU." Electronic Thesis or Diss., Paris 6, 2015. http://www.theses.fr/2015PA066531.
Sjölander, Erik. "Krypteringsalgoritmer i OpenCL : AES-256 och ECC ElGamal." Thesis, Linköpings universitet, Institutionen för systemteknik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-81660.
In recent years, graphics cards have become more powerful than ever before. Their evolution from pure rendering components into more general-purpose computing devices, together with languages like OpenCL, has created a new role for graphics cards. The goal of this thesis is to show that cryptography algorithms are well suited for acceleration with OpenCL on graphics cards. A second goal is to show that C code can easily be translated into an OpenCL kernel with only small syntax changes. The two algorithms used are AES-256, implemented in 8-bit and 32-bit variants, and Elliptic Curve Cryptography with the ElGamal scheme. The algorithms were chosen to represent both fast symmetric schemes and slower public-key schemes. AES-256 in ECB mode on the GPU reached a throughput of 7 Gbit/s, an acceleration of 25 times compared to a CPU. For Elliptic Curve, a single scalar point multiplication on the NIST B-163 curve is computed on the GPU in 65 µs. Using this in the ElGamal encryption scheme, accelerations of 55 and 67 times were obtained for encryption and decryption, respectively. The work was carried out at Syntronic Software Innovations AB in Linköping, Sweden.
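The figures quoted in this abstract imply a few further quantities that are easy to work out. The sketch below only does arithmetic on the reported numbers; the helper names are hypothetical, and the per-second rate for scalar multiplications assumes that multiplications run strictly one after another, which the abstract does not state.

```python
AES_BLOCK_BITS = 128  # AES always operates on 128-bit blocks


def implied_baseline(accelerated_rate, speedup):
    """Rate of the slower system implied by a reported speedup factor."""
    return accelerated_rate / speedup


def per_second(latency_seconds):
    """Operations per second if operations run strictly one after another."""
    return 1.0 / latency_seconds


# Figures quoted in the abstract.
gpu_aes_bits_per_s = 7e9        # 7 Gbit/s, AES-256 in ECB mode on the GPU
aes_speedup = 25                # GPU relative to CPU
scalar_mult_latency_s = 65e-6   # 65 microseconds per scalar multiplication

# Derived quantities (not reported in the abstract itself).
cpu_aes_bits_per_s = implied_baseline(gpu_aes_bits_per_s, aes_speedup)
gpu_blocks_per_s = gpu_aes_bits_per_s / AES_BLOCK_BITS
scalar_mults_per_s = per_second(scalar_mult_latency_s)

print(f"implied CPU AES throughput: {cpu_aes_bits_per_s / 1e9:.2f} Gbit/s")
print(f"GPU AES blocks per second:  {gpu_blocks_per_s:,.0f}")
print(f"scalar mults per second:    {scalar_mults_per_s:,.0f}")
```

The implied CPU baseline is a back-calculation from the stated speedup, not a measurement taken from the thesis.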
Löfgren, Robin, and Kristoffer Dahl. "Beräkningar med GPU vs CPU : En jämförelsestudie av beräkningseffektivitet med avseende på energi- och tidsförbrukning." Thesis, Linnaeus University, School of Computer Science, Physics and Mathematics, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-5782.
This thesis is a comparative study of computational efficiency, in terms of energy and time consumption, of graphics cards and processors in personal computers and PlayStation 3 consoles.
The problem is studied in order to make the public aware that part of the energy problem associated with computation can be addressed by increasing the energy efficiency of the computing units.
The study was conducted in an exploratory manner and examines the relationship between processors and graphics cards and which performs best in which context. Performance tests are carried out with the molecular computation program F@H and the file compression program WinRAR. The tests are run on multi-core and single-core PCs and PS3s of various kinds. In some tests, power consumption is measured in order to calculate how energy-efficient particular systems are.
The results clearly show how the average power consumption and energy efficiency of the various test systems differ under load, when idle, and for different types of computation.
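The relationship between average power draw, run time and energy efficiency that this abstract relies on can be written down in a few lines. This is a minimal sketch, not the authors' measurement procedure: the `Measurement` class and every number in the example are hypothetical. It assumes energy is average power multiplied by run time, and efficiency is completed work divided by energy.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One benchmark run on one system (all values are hypothetical)."""
    name: str
    avg_power_watts: float   # average power draw during the run
    runtime_seconds: float   # wall-clock time to finish the workload
    work_units: float        # amount of work completed, arbitrary unit

    @property
    def energy_joules(self) -> float:
        # Energy = average power x time.
        return self.avg_power_watts * self.runtime_seconds

    @property
    def units_per_joule(self) -> float:
        # Energy efficiency = work completed per joule consumed.
        return self.work_units / self.energy_joules


def most_efficient(runs):
    """Return the run with the highest work-per-joule figure."""
    return max(runs, key=lambda run: run.units_per_joule)


if __name__ == "__main__":
    runs = [
        Measurement("cpu", avg_power_watts=100.0, runtime_seconds=600.0, work_units=1.0),
        Measurement("gpu", avg_power_watts=200.0, runtime_seconds=120.0, work_units=1.0),
    ]
    for run in runs:
        print(run.name, run.energy_joules, "J")
    print("most efficient:", most_efficient(runs).name)
```

In this made-up example the second system draws twice the power but finishes five times sooner, so it uses less energy for the same work; comparing power draw alone would give the opposite impression.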
Chavez, Daniel. "Parallelizing Map Projection of Raster Data on Multi-core CPU and GPU Parallel Programming Frameworks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-190883.
Map projections are a central part of geographic information systems, and a countless number of map projections are in use today. Reprojection between different map projections takes place regularly in a geographic information system and can be parallelized with multi-core CPUs and GPUs. This master's thesis implements a parallel, analytic reprojection of raster data in C/C++ with the frameworks Pthreads, C++11 STL threads, OpenMP, Intel TBB, CUDA and OpenCL. The thesis compares the execution times of the different implementations on three raster datasets of varying size, where OpenMP had the best speedups of 6, 6.2 and 5.5. The GPU implementations were 293 % faster than the fastest CPU implementations, and profiling shows that the latter spent most of their time on trigonometric functions. The results show that the GPU is best suited for reprojection of raster data, while OpenMP is the fastest of the CPU frameworks.