Dissertations / Theses on the topic 'GPU'
Stodůlka, Martin. "Akcelerace ultrazvukových simulací pomocí multi-GPU systémů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2021. http://www.nusl.cz/ntk/nusl-445538.
Ma, Wenjing. "Automatic Transformation and Optimization of Applications on GPUs and GPU clusters." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1300972089.
Tanasić, Ivan. "Towards multiprogrammed GPUs." Doctoral thesis, Universitat Politècnica de Catalunya, 2017. http://hdl.handle.net/10803/405796.
Programmable Graphics Processing Units (GPUs) have recently become the most widespread massively parallel processors. They have come a long way from fixed-function ASICs designed to accelerate graphics tasks to a programmable architecture that can also execute general-purpose computations. Because of their performance and efficiency, a growing amount of software relies on them to accelerate the data-parallel, computationally intensive sections of code. They have earned a place in many systems, from low-power mobile devices to the world's largest data centers. However, GPUs are still plagued by the fact that they essentially lack multiprogramming support, which results in low system performance if the GPU is shared among multiple programs. In this dissertation we focus on providing multiprogramming support for GPUs by improving multitasking capabilities and virtual memory support. The main problem hindering multitasking support in GPUs is the non-preemptive execution of GPU kernels. We propose two preemption mechanisms with different design philosophies that a scheduler can use to preempt GPU cores and assign them to other processes. We also advocate spatial partitioning of the GPU and propose a concrete implementation of a hardware scheduler that dynamically partitions GPU cores among running kernels according to their assigned priorities. Contrary to assumptions made in related work, we demonstrate that preemptive execution is feasible and the desired approach to multitasking on GPUs. Furthermore, we show improved system fairness and responsiveness with our GPU core allocation policy.
We also point out that the primary cause of the insufficient virtual memory support in GPUs is the exception handling mechanism used by modern GPUs. Today, GPUs offload exception handling to the CPU while the faulting instruction waits in the GPU core. This stall-on-fault model prevents some virtual memory features and optimizations and is especially harmful in multiprogrammed environments, because it prevents context switching on the GPU unless all pending faults are resolved. In this dissertation we propose three implementations of the GPU core pipeline that offer different performance-complexity trade-offs and allow preemption of the core even with pending exceptions (i.e., the faulting instruction can be restarted later). Building on this new capability, we implement two use cases to demonstrate its utility. The first is a scheduler that assigns the core to other threads when a fault occurs, trying to do useful work while the fault is resolved, thereby hiding fault latency and improving system performance. The second allows fault-handling code to run locally on the GPU, instead of offloading the handling to the CPU, showing that local fault handling can also improve performance.
Hong, Changwan. "Code Optimization on GPUs." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1557123832601533.
Wang, Kaibo. "Algorithmic and Software System Support to Accelerate Data Processing in CPU-GPU Hybrid Computing Environments." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1447685368.
Pedersen, Stian Aaraas. "Progressive Photon Mapping on GPUs." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, 2013. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-22995.
Harb, Mohammed. "Quantum transport modeling with GPUs." Thesis, McGill University, 2013. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=114417.
Full textDans cette thèse, nous présentons un logiciel qui effectue des calculs de transport quantique en utilisant conjointement la théorie des fonctions de Green hors équilibre (non equilibrium Green function, NEGF) et le modèle des liens étroits (tight-binding model, TB). Notre logiciel tire avantage du parallélisme inhérent aux algorithmes utilisés en plus d'être accéléré grâce à l'utilisation de processeurs graphiques (GPU). Nous abordons également les problèmes théoriques, géométriques et numériques qui se posent lors de l'implémentation du code NEGF-TB. Nous démontrons ensuite qu'une implémentation hétérogène utilisant des CPU et des GPU est supérieure aux implémentations à processeur unique, à celles à processeurs multiples, et même aux implémentations massivement parallèles n'utilisant que des CPU. Le GPU-Matlab Interface (GMI) présenté dans cette thèse fut développé pour des fins de calculs de transport quantique NEGF-TB. Néanmoins, les capacités de GMI ne se limitent pas à l'utilisation que nous en faisons ici et GMI peut être utilisé par des chercheurs de tous les domaines n'ayant pas de connaissances préalables de la programmation GPU ou de la programmation "multi-thread". Nous démontrons également que GMI compétitionne avantageusement avec des logiciels commerciaux similaires.Enfin, nous utilisons notre logiciel NEGF-TB pour étudier certaines propriétés de transport électronique de nanofils de Si et de Nanobeams. Nous examinons l'effet de plusieurs sortes de lacunes sur la conductance de ces structures et démontrons que notre méthode peut étudier des systèmes de plus de 200 000 atomes en un temps raisonnable en utilisant de un à quatre GPU sur un seul poste de travail.
Hovland, Rune Johan. "Throughput Computing on Future GPUs." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9893.
The general-purpose computing capabilities of the Graphics Processing Unit (GPU) have recently received a great deal of attention from the High-Performance Computing (HPC) community. By allowing massively parallel applications to run efficiently on commodity graphics cards, personal supercomputers are now available in desktop form at a low price. For some applications, speedups of 70 times over a single-CPU implementation have been achieved. Among the most popular GPUs are those based on the NVIDIA Tesla architecture, which allows relatively easy development of GPU applications using the NVIDIA CUDA programming environment. While the GPU is gaining interest in the HPC community, others are more reluctant to embrace it as a computational device. A focus on throughput and large data volumes separates Information Retrieval (IR) from HPC: for IR it is critical to process large amounts of data efficiently, a task at which the GPU currently does not excel. Only recently has the IR community begun to explore the possibilities, and an implementation of a search engine for the GPU was published in April 2009. This thesis analyzes how GPUs can be improved to better suit large-data-volume applications. Current graphics cards have a bottleneck in the transfer of data between the host and the GPU. One approach to resolving this bottleneck is to include the host memory as part of the GPU's memory hierarchy. We develop a theoretical model, and based on this model, the expected performance improvement for high-data-volume applications is shown for both computationally bound and transfer-bound applications. The performance improvement for an existing search engine is also derived from the theoretical model. For this case, the improvements would yield a speedup of between 1.389 and 1.874 for the various query types supported by the search engine.
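The speedup figures in this abstract come from a transfer-versus-compute cost model. The idea can be sketched as a toy model (the function name, parameters, and numbers below are illustrative, not the thesis's actual model):

```python
def speedup_with_unified_memory(t_compute, t_transfer, residual_fraction=0.0):
    """Toy model: the baseline pays an explicit host-to-GPU transfer plus
    compute; the improved design removes (most of) the transfer cost."""
    baseline = t_transfer + t_compute
    improved = t_compute + residual_fraction * t_transfer
    return baseline / improved

# A transfer-bound workload benefits far more than a compute-bound one.
transfer_bound = speedup_with_unified_memory(t_compute=1.0, t_transfer=4.0)
compute_bound = speedup_with_unified_memory(t_compute=4.0, t_transfer=1.0)
```

Under this kind of model, the predicted gain depends only on the ratio of transfer time to compute time, which is why the thesis reports a range of speedups across query types.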
Kim, Jinsung. "Optimizing Tensor Contractions on GPUs." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1563237825735994.
Tadros, Rimon. "Accelerating web search using GPUs." Thesis, University of British Columbia, 2015. http://hdl.handle.net/2429/54722.
Full textApplied Science, Faculty of
Electrical and Computer Engineering, Department of
Graduate
Lapajne, Mikael Hellborg, and Daniel Slat. "Random Forests for CUDA GPUs." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-2953.
Polok, Lukáš. "WaldBoost na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236790.
Straňák, Marek. "Raytracing na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-237020.
Yanggratoke, Rerngvit. "GPU Network Processing." Thesis, KTH, Telekommunikationssystem, TSLab, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-103694.
Full textNätverksteknik ansluter fler och fler människor runt om i världen. Det har blivit en viktig del av vårt dagliga liv. För att denna anslutning skall vara sömlös, måste nätet vara snabbt. Den snabba tillväxten i nätverkstrafiken och olika kommunikationsprotokoll sätter stora krav på processorer som hanterar all trafik. Befintliga lösningar på detta problem, t.ex. ASIC, FPGA, NPU, och TOE är varken kostnadseffektivt eller lätta att hantera, eftersom de kräver speciell hårdvara och anpassade konfigurationer. Denna avhandling angriper problemet på ett annat sätt genom att avlasta nätverks processningen till grafikprocessorer som sitter i vanliga pc-grafikkort. Avhandlingen främsta mål är att ta reda på hur GPU bör användas för detta. Avhandlingen följer fallstudie modell och de valda fallen är lager 2 Bloom filter forwardering och ``flow lookup'' i Openflow switch. Implementerings alternativ och utvärderingsmetodik föreslås för både fallstudierna. Sedan utvecklas och utvärderas en prototyp för att jämföra mellan traditionell CPU- och GPU-offload. Det primära resultatet från detta arbete utgör kriterier för nätvärksprocessfunktioner lämpade för GPU offload och vilka kompromisser som måste göras. Kriterier är inget inter-paket beroende, liknande processflöde för alla paket. och möjlighet att köra fler processer på ett paket paralellt. GPU offloading ger ökad fördröjning och minneskonsumption till förmån för högre troughput.
Berrajaa, Achraf. "Parallélisation d'heuristiques d'optimisation sur les GPUs." Thesis, Normandie, 2018. http://www.theses.fr/2018NORMLH31/document.
This thesis presents contributions to the resolution (on GPUs) of large real-world optimization problems. The vehicle routing problem (VRP) and the hub location problem (HLP) are treated. Various approaches are implemented on the GPU to solve variants of the VRP. A parallel genetic algorithm (GA) on the GPU is proposed to solve different variants of the HLP. The proposed GA adapts its encoding, initial solution, genetic operators, and implementation to each of the variants treated. Finally, we used the GA to solve the HLP with uncertainty in the data. Numerical tests show that the proposed approaches effectively exploit the computing power of the GPU and make it possible to solve large instances of up to 6000 nodes.
SILVA, CLEOMAR PEREIRA DA. "MASSIVELY PARALLEL GENETIC PROGRAMMING ON GPUS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2014. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=24129@1.
Genetic programming enables computers to solve problems automatically, without being programmed for them. Inspired by Darwin's principle of natural selection, a population of programs, or individuals, is maintained, modified based on genetic variation, and evaluated according to a fitness function. Genetic programming has been successfully applied to many different applications, such as automatic design, pattern recognition, robotic control, data mining, and image analysis. However, the evaluation of the huge number of individuals generated requires excessive computation, leading to extremely long run times for large problems. This work exploits the high computational power of graphics processing units, or GPUs, to accelerate genetic programming and to enable the automatic generation of programs for large problems. We propose two new methodologies to exploit the power of the GPU in genetic programming: intermediate-language compilation and the creation of individuals in machine language. These methodologies have advantages over traditional methods used in the literature. The use of an intermediate language reduces the compilation steps and works with instructions that are well documented. The creation of individuals in machine language has no compilation step, but requires reverse engineering of instructions that are not documented at this level. Our methodologies are based on linear genetic programming and are inspired by quantum computing. The use of quantum computing allows rapid convergence, global search capability, and inclusion of individuals' past history. The proposed methodologies were compared against existing ones and showed considerable performance gains. A maximum performance of 2.74 trillion GPops (genetic programming operations per second) was observed on the 20-bit multiplexer benchmark, and it was possible to extend genetic programming to problems with databases of up to 7 million samples.
Dublish, Saumay Kumar. "Managing the memory hierarchy in GPUs." Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/31205.
Rawat, Prashant Singh. "Optimization of Stencil Computations on GPUs." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1523037713249436.
Suri, Bharath. "Accelerating the knapsack problem on GPUs." Thesis, Linköpings universitet, ESLAB - Laboratoriet för inbyggda system, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-70406.
Björn, Overå. "Skinning på GPUn : Med dubbel kvaternioner." Thesis, Linnéuniversitetet, Institutionen för datavetenskap, fysik och matematik, DFM, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-21364.
Dollinger, Jean-François. "A framework for efficient execution on GPU and CPU+GPU systems." Thesis, Strasbourg, 2015. http://www.theses.fr/2015STRAD019/document.
Technological limitations faced by semiconductor manufacturers in the early 2000s restricted the increase in performance of sequential computation units. Nowadays, the trend is to increase the number of processor cores per socket and to progressively use GPU cards for highly parallel computations. The complexity of recent architectures makes it difficult to statically predict the performance of a program. We describe a reliable and accurate execution-time prediction method for parallel loop nests on GPUs, based on three stages: static code generation, offline profiling, and online prediction. In addition, we present two techniques to fully exploit the computing resources available on a system. The first technique consists in jointly using the CPU and GPU to execute a code. To achieve higher performance, it is mandatory to balance the load, in particular by predicting execution time. The runtime uses the profiling results, and the scheduler computes the execution times and adjusts the load distributed to the processors. The second technique puts the CPU and GPU in competition: instances of the considered code are executed simultaneously on the CPU and GPU. The winner of the competition notifies the other instance of its completion, terminating the latter.
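The joint CPU+GPU technique above needs a load split that equalizes the predicted finish times of the two devices. A sketch of that balancing step, under the simplifying assumption (mine, not the thesis's) that execution time scales linearly with iteration count:

```python
def split_iterations(total, cpu_time_per_iter, gpu_time_per_iter):
    """Split 'total' iterations so predicted finish times match:
    n_cpu * t_cpu == n_gpu * t_gpu, with n_cpu + n_gpu == total."""
    gpu_share = cpu_time_per_iter / (cpu_time_per_iter + gpu_time_per_iter)
    n_gpu = round(total * gpu_share)
    return total - n_gpu, n_gpu

# Illustrative per-iteration times: the GPU is 4x faster per iteration,
# so it receives 4x the work and both devices finish together.
n_cpu, n_gpu = split_iterations(1000, cpu_time_per_iter=8e-6,
                                gpu_time_per_iter=2e-6)
```

In the actual framework the per-iteration times would come from the offline profiling stage rather than being constants.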
Arvid, Johnsson. "Analysis of GPU accelerated OpenCL applications on the Intel HD 4600 GPU." Thesis, Linköpings universitet, Programvara och system, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-140124.
Full textVan, Luong Thé. "Métaheuristiques parallèles sur GPU." Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2011. http://tel.archives-ouvertes.fr/tel-00638820.
Full textJensen, Jørgen Haavind. "Hydrodynamiske beregninger vha GPU." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for marin teknikk, 2010. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-11549.
Full textLien, Geir Josten. "Auto-tunable GPU BLAS." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, 2012. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-18411.
Full textTokdemir, Serpil. "DCT Implementation on GPU." Digital Archive @ GSU, 2006. http://digitalarchive.gsu.edu/cs_theses/33.
Full textYuan, George Lai. "GPU compute memory systems." Thesis, University of British Columbia, 2009. http://hdl.handle.net/2429/15877.
Full textTokdemir, Serpil. "Digital compression on GPU." unrestricted, 2006. http://etd.gsu.edu/theses/available/etd-12012006-154433/.
Title from dissertation title page. Saeid Belkasim, committee chair; Ying Zhu, A.P. Preethy, committee members. Electronic text (90 p. : ill. (some col.)). Description based on contents viewed May 2, 2007. Includes bibliographical references (p. 78-81).
Young, Bobby Dalton. "MPI WITHIN A GPU." UKnowledge, 2009. http://uknowledge.uky.edu/gradschool_theses/614.
Blomquist, Linus, and Hampus Engström. "GPU based IP forwarding." Thesis, Linköpings universitet, Programvara och system, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-119433.
Full textMondal, Siddhartha Sankar. "GPU: Power vs Performance." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-237364.
Full textBORDIGNON, ALEX LAIER. "NAVIER-STOKES EM GPU." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2006. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=8928@1.
In this work we show how to simulate fluids in two dimensions in a domain with arbitrary boundaries. Our work is based on the stable fluids scheme developed by Jos Stam. The implementation is done on the GPU (Graphics Processing Unit), allowing interactive speeds. We use the Cg language (C for Graphics), developed by NVIDIA. Our main contributions are the treatment of domains with multiple boundaries, where we apply bilinear interpolation to obtain better results; the storage of the boundary conditions in a single texture channel; and the use of vorticity confinement.
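Stable-fluids advection traces velocities backward in time and samples the previous field with bilinear interpolation, the same operation the abstract applies near boundaries. A minimal sketch of that sampling step on a 2D grid (on the GPU this is what the texture-filtering hardware computes):

```python
def bilerp(grid, x, y):
    """Sample a 2D field at fractional coordinates (x, y) by
    bilinear interpolation of the four surrounding cell values."""
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two bracketing rows, then along y.
    a = grid[y0][x0] * (1 - fx) + grid[y0][x0 + 1] * fx
    b = grid[y0 + 1][x0] * (1 - fx) + grid[y0 + 1][x0 + 1] * fx
    return a * (1 - fy) + b * fy

field = [[0.0, 1.0],
         [2.0, 3.0]]
```

Sampling at the cell center blends all four values equally, which is what smooths the field across irregular boundary cells.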
DUARTE, LEONARDO SEPERUELO. "GRAINS SIMULATION ON GPU." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2009. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=16008@1.
The purpose of this work is to enable and speed up the simulation of a grain system implemented entirely on the GPU, using the Discrete Element Method (DEM). The goal of implementing the whole system on the GPU is to avoid the cost of data transfer between the graphics hardware and the CPU. The proposed system simulates particles of different diameters, with collision handling between particles and between particles and the environment. The Discrete Element Method considers normal and tangential forces applied to the particles. Parallel algorithms were designed to construct and store the history of tangential forces present in each contact between particles. Two approaches to constructing the regular grid of cells used for collision detection are proposed and compared. The first is very efficient for particles with a fixed radius, while the second shows better scalability for models with varying radii. The system consists of several algorithms running in threads, each responsible for a step of the simulation. The simulation results were validated against the commercial program PFC3D. The GPU particle system can be up to 10 times faster than the commercial program.
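The fixed-radius grid approach maps each particle to a cell keyed by its position, so contact tests only need to consider the same and adjacent cells. A CPU-side sketch of the idea (the cell size is assumed to be at least one particle diameter; the GPU version would build and scan the grid in parallel):

```python
from collections import defaultdict
from itertools import product

def build_grid(positions, cell_size):
    """Bucket particle indices by the integer cell containing them."""
    grid = defaultdict(list)
    for i, (x, y, z) in enumerate(positions):
        key = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        grid[key].append(i)
    return grid

def candidate_pairs(positions, cell_size):
    """Contact candidates: pairs of particles in the same or adjacent cells."""
    grid = build_grid(positions, cell_size)
    pairs = set()
    for (cx, cy, cz), members in grid.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j:
                        pairs.add((i, j))
    return pairs

pts = [(0.1, 0.1, 0.1), (0.2, 0.1, 0.1), (5.0, 5.0, 5.0)]
pairs = candidate_pairs(pts, cell_size=1.0)
```

Only nearby particles survive the broad phase; the exact overlap test and the normal/tangential force computation then run on this much smaller candidate set.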
Lionetti, Fred. "GPU accelerated cardiac electrophysiology." Diss., [La Jolla] : University of California, San Diego, 2010. http://wwwlib.umi.com/cr/ucsd/fullcit?p1474756.
Title from first page of PDF file (viewed April 14, 2010). Available via ProQuest Digital Dissertations. Includes bibliographical references (p. 85-89).
Souza, Thársis Tuani Pinto. "Simulações Financeiras em GPU." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-23052013-234703/.
Given the uncertainty of their variables, it is common to model financial problems with stochastic processes. Furthermore, real problems in this area have a high computational cost, which suggests the use of High Performance Computing (HPC) to handle them. New generations of graphics hardware (GPUs) enable general-purpose computing while maintaining high memory bandwidth and large computing power. This type of architecture is therefore an excellent alternative for HPC and computational finance. The main purpose of this work is to study the computational and mathematical tools needed for stochastic modeling in finance using GPUs. We present GPUs as a platform for general-purpose computing. We then analyze a variety of random number generators, in both sequential and parallel architectures, and introduce the fundamental mathematical tools of stochastic calculus and Monte Carlo simulation. With this background, we present two case studies in finance: "Optimal Trading Stops" and "Market Risk Management". In the first case, we solve the problem of obtaining the optimal gain on a "stop gain" stock trading strategy. The proposed solution is scalable and has inherent parallelism on the GPU. For the second case, we propose a parallel algorithm to compute market risk, as well as techniques for improving the quality of the solutions. In our experiments, there was a 4-fold improvement in the quality of the stochastic simulation and an acceleration of over 50 times.
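A stop-gain strategy of the kind studied here can be explored by Monte Carlo over geometric Brownian motion price paths. A small sequential sketch (all parameters are illustrative; the GPU version would assign one path per thread):

```python
import math
import random

def stop_gain_hit_fraction(s0, target, mu, sigma, dt, steps, n_paths, seed=42):
    """Fraction of simulated GBM paths that reach the stop-gain target."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_paths):
        s = s0
        for _ in range(steps):
            z = rng.gauss(0.0, 1.0)
            # Exact GBM step: S *= exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z)
            s *= math.exp((mu - 0.5 * sigma**2) * dt
                          + sigma * math.sqrt(dt) * z)
            if s >= target:
                hits += 1
                break
    return hits / n_paths

frac = stop_gain_hit_fraction(100.0, 105.0, mu=0.05, sigma=0.2,
                              dt=1 / 252, steps=252, n_paths=2000)
```

Each path is independent, which is what makes the problem embarrassingly parallel on a GPU; the quality of the parallel random number generator then becomes the limiting factor, as the thesis discusses.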
Mäkelä, J. (Jussi). "GPU accelerated face detection." Master's thesis, University of Oulu, 2013. http://urn.fi/URN:NBN:fi:oulu-201303181103.
Graphics processors are capable of massively parallel computation, and their use for general-purpose computing is a topic of growing interest. One area of interest for acceleration is computationally heavy computer vision algorithms such as face detection and recognition. Face detection is used in many applications, such as camera autofocus, face and emotion recognition, and access control. In this work, a face detection algorithm was accelerated on the GPU using the OpenCL API. The goal was improved performance while keeping the implementations functionally identical. The OpenCL version was based on an optimized reference implementation. The opportunities and challenges of accelerating the different stages of the algorithm are examined. The accelerated and reference implementations are described, and the performance difference between them is compared. Performance was evaluated by running times. The tests used three image series, each with four images of different sizes, and three additional images representing special cases. The tests were run on two differently equipped computers. The results show that face detection is well suited to GPU acceleration, since the algorithm can be parallelized and can exploit the efficient texture-handling hardware. Initializing the OpenCL environment introduces latency that reduces some of the performance advantage. In the tests, the accelerated implementation delivered performance equal to or even lower than the reference implementation in cases with little computation, due to a small or easily processed image. On the other hand, the accelerated implementation performed well compared with the reference implementation on large and complex images. In the future, the latency caused by initializing the OpenCL environment should be reduced. This work will also be of interest in the future, when OpenCL acceleration becomes available on mobile phones.
Walton, Simon. "GPU-based volume deformation." Thesis, Swansea University, 2007. https://cronfa.swan.ac.uk/Record/cronfa43117.
Frank, Igor. "Simulace kapalin na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2021. http://www.nusl.cz/ntk/nusl-445570.
Jurák, Martin. "Detekce objektů na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-234897.
Macenauer, Pavel. "Detekce objektů na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-234942.
Schmied, Jan. "GPU akcelerované prolamování šifer." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236071.
Galacz, Roman. "Photon tracing na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-236175.
Potěšil, Josef. "Akcelerace kryptografie pomocí GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-237073.
Janošík, Ondřej. "Fyzikální simulace na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255365.
Graves, Alex. "GPU-Accelerated Feature Tracking." Wright State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=wright1462372516.
Luong, Thé Van. "Métaheuristiques parallèles sur GPU." Thesis, Lille 1, 2011. http://www.theses.fr/2011LIL10058/document.
Real-world optimization problems are often complex and NP-hard. Their modeling is continuously evolving in terms of constraints and objectives, and their resolution is CPU time-consuming. Although near-optimal algorithms such as metaheuristics (generic heuristics) make it possible to reduce the temporal complexity of their resolution, they fail to tackle large problems satisfactorily. Over the last decades, parallel computing has emerged as an unavoidable way to deal with large instances of difficult optimization problems. The design and implementation of parallel metaheuristics are strongly influenced by the computing platform. GPU computing has recently proven effective for time-intensive problems, and this emerging technology is believed to be extremely useful for speeding up many complex algorithms. One of the major issues for metaheuristics is to rethink existing parallel models and programming paradigms to allow their deployment on GPU accelerators. Generally speaking, the major issues to deal with are: the distribution of data processing between CPU and GPU, thread synchronization, the optimization of data transfer between the different memories, memory capacity constraints, etc. The contribution of this thesis is to address these issues by redesigning the parallel models of metaheuristics to allow the solution of large-scale optimization problems on GPU architectures. Our objective is to rethink the existing parallel models and to enable their deployment on GPUs. We therefore propose a new generic guideline for building efficient parallel metaheuristics on GPU. Our challenge is to arrive at a GPU-based design of the whole hierarchy of parallel models. To this end, very efficient approaches are proposed for CPU-GPU data transfer optimization, thread control, mapping of solutions to GPU threads, and memory management.
These approaches have been extensively tested on five optimization problems and four GPU configurations. Compared to a CPU-based execution, experiments report up to 80-fold acceleration for large combinatorial problems and up to 2000-fold speedups for a continuous problem. The work related to this thesis has been accepted in a dozen publications, including the IEEE Transactions on Computers journal.
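A recurring pattern in such parallel models is to map one candidate solution, for example one neighbor of the current solution in a local search, to one GPU thread. A CPU-side sketch of that evaluation pattern with a toy binary encoding and objective (both are illustrative; on the GPU the `map` becomes a kernel launch):

```python
def tweak(solution, i):
    """Neighbor i: flip one bit of a binary-encoded solution."""
    return solution[:i] + (1 - solution[i],) + solution[i + 1:]

def fitness(solution):
    return sum(solution)  # toy objective: maximize the number of ones

def best_neighbor(solution):
    """Generate the whole neighborhood, evaluate every neighbor
    independently (one thread per neighbor on a GPU), keep the best."""
    neighbors = [tweak(solution, i) for i in range(len(solution))]
    scores = list(map(fitness, neighbors))
    best = max(range(len(neighbors)), key=scores.__getitem__)
    return neighbors[best]

current = (0, 1, 0, 0)
current = best_neighbor(current)
```

Because every neighbor evaluation is independent, the only sequential points are the generation of the neighborhood and the final reduction, which is where the thread-control and memory-mapping issues discussed in the abstract arise.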
Olsson, Martin Wexö. "GPU based particle system." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-3761.
Full textPettersson, Jimmy, and Ian Wainwright. "Radar Signal Processing with Graphics Processors (GPUS)." Thesis, Uppsala University, Division of Scientific Computing, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-114003.
Full textSaffar, Shamshirgar Davoud. "Accelerated Pressure Projection using OpenCL on GPUs." Thesis, KTH, Mekanik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-102771.
Full textLiu, Chi-man, and 廖志敏. "Efficient solutions for bioinformatics applications using GPUs." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2015. http://hdl.handle.net/10722/211138.
Full textpublished_or_final_version
Computer Science
Doctoral
Doctor of Philosophy