Dissertations / Theses on the topic 'GPU Accelerated'

Consult the top 50 dissertations / theses for your research on the topic 'GPU Accelerated.'


1

Lionetti, Fred. "GPU accelerated cardiac electrophysiology." Diss., [La Jolla] : University of California, San Diego, 2010. http://wwwlib.umi.com/cr/ucsd/fullcit?p1474756.

Full text
Abstract:
Thesis (M.S.)--University of California, San Diego, 2010.
Title from first page of PDF file (viewed April 14, 2010). Available via ProQuest Digital Dissertations. Includes bibliographical references (p. 85-89).
APA, Harvard, Vancouver, ISO, and other styles
2

Mäkelä, J. (Jussi). "GPU accelerated face detection." Master's thesis, University of Oulu, 2013. http://urn.fi/URN:NBN:fi:oulu-201303181103.

Full text
Abstract:
Graphics processing units have massive parallel processing capabilities, and there is a growing interest in utilizing them for generic computing. One area of interest is computationally heavy computer vision algorithms, such as face detection and recognition. Face detection is used in a variety of applications, for example autofocus in cameras, face and emotion recognition, and access control. In this thesis, a face detection algorithm was accelerated on the GPU using OpenCL. The goal was to gain a performance benefit while keeping the implementations functionally equivalent. The OpenCL version was based on an optimized reference implementation. The possibilities and challenges in accelerating different parts of the algorithm were studied. The reference and the accelerated implementations are described in detail, and their performance is compared. Performance was evaluated by runtimes with three sets of four different-sized images, and three additional images presenting special cases. The tests were run on two differently configured computers. From the results, it can be seen that face detection is well suited for GPU acceleration; that is, the algorithm is well parallelizable and can utilize efficient texture processing hardware. There are delays related to initializing the OpenCL platform which mitigate the benefit to some degree. The accelerated implementation was found to deliver equal or lower performance when there was little computation, that is, when the image was small or easily analyzed. With bigger and more complex images, the accelerated implementation delivered good performance compared to the reference implementation. Future work should find some method of mitigating the delays introduced by OpenCL initialization. This work will remain of interest as OpenCL acceleration becomes available on mobile phones.
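The abstract does not name the specific detector, but sliding-window face detectors of the Viola-Jones family share a structure that maps naturally onto GPU work-items: every window position is evaluated independently over an integral image. A toy single-stage sketch of that structure (the threshold test and all names here are illustrative, not the thesis implementation):

```python
# Hypothetical sketch of a cascade-style sliding-window detector: every
# window position is independent, which is what makes it GPU-friendly.

def integral_image(img):
    """Summed-area table: sat[y][x] = sum of img[0..y][0..x]."""
    h, w = len(img), len(img[0])
    sat = [[0] * w for _ in range(h)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            sat[y][x] = row + (sat[y - 1][x] if y > 0 else 0)
    return sat

def window_sum(sat, x, y, size):
    """Sum of the size x size window with top-left corner (x, y), in O(1)."""
    a = sat[y + size - 1][x + size - 1]
    b = sat[y - 1][x + size - 1] if y > 0 else 0
    c = sat[y + size - 1][x - 1] if x > 0 else 0
    d = sat[y - 1][x - 1] if x > 0 and y > 0 else 0
    return a - b - c + d

def detect(img, size, threshold):
    """Return window origins whose mean intensity passes a toy one-stage test."""
    sat = integral_image(img)
    h, w = len(img), len(img[0])
    hits = []
    for y in range(h - size + 1):      # each (x, y) is independent:
        for x in range(w - size + 1):  # ideal for one GPU work-item each
            if window_sum(sat, x, y, size) / (size * size) > threshold:
                hits.append((x, y))
    return hits
```

A real cascade runs many such stages per window, rejecting most windows early; the integral image lets every stage evaluate its features in constant time.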
3

Graves, Alex. "GPU-Accelerated Feature Tracking." Wright State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=wright1462372516.

Full text
4

Baravdish, Gabriel. "GPU Accelerated Light Field Compression." Thesis, Linköpings universitet, Medie- och Informationsteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-150558.

Full text
Abstract:
This thesis presents a GPU accelerated method to compress light fields or light field videos. The implementation is based on earlier work on a full light field compression framework. The large amount of data produced when capturing light fields makes them a challenge to compress, and we seek to accelerate the encoding part. We compress by projecting each data point onto a set of dictionaries and seeking a sparse representation with the least error. A greedy algorithm optimized to suit computations on the GPU is presented. We exploit the algorithm's structure by encoding the data in segments in parallel, for faster computation while maintaining quality. The results show a significantly faster encoding time compared to other results in the same research field. We conclude that further improvements could increase the speed, and thus it is not too far from an interactive compression speed.
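The greedy projection step described above resembles matching pursuit. As a hedged illustration (the thesis' actual dictionaries, parallel segmentation, and stopping rule are not specified here), the serial core of such an encoder might look like:

```python
# Matching-pursuit-style sketch: greedily pick the dictionary atom with the
# largest projection onto the residual, subtract it, repeat.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(signal, dictionary, n_atoms):
    """dictionary: list of unit-norm atoms; returns (index, coeff) pairs."""
    residual = list(signal)
    chosen = []
    for _ in range(n_atoms):
        # atom that best explains the current residual
        best = max(range(len(dictionary)),
                   key=lambda i: abs(dot(residual, dictionary[i])))
        coeff = dot(residual, dictionary[best])
        chosen.append((best, coeff))
        residual = [r - coeff * a for r, a in zip(residual, dictionary[best])]
    return chosen, residual
```

In a GPU setting, many data points (or segments) run this loop independently, which is the parallelism the abstract exploits.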
5

Kottravel, Sathish. "GPU accelerated Nonlinear Soft Tissue Deformation." Thesis, Linköpings universitet, Medie- och Informationsteknik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-76895.

Full text
Abstract:
There are two types of structures in the human body: solid organs and hollow, membrane-like organs. The brain, liver, and other soft tissues such as tendons, muscles, and cartilage are examples of solid organs. The colon and blood vessels are examples of hollow organs. They differ greatly in structure and mechanical behavior. Deformation of these structures is an important phenomenon in medical simulation. The primary focus of this project is the deformation of soft tissues, which usually undergo large deformations. The deformation of an organ can be considered the mechanical response of that organ during medical simulation, and it can be modeled using continuum mechanics and the finite element method (FEM). Irrespective of the methods and models chosen, such a system must provide real-time response to obtain sufficient realism and accurate information; one example is a medical training system with haptic feedback. In the past two decades many models were developed, but very few considered the nonlinear nature of the material and geometry of solid organs. One that does is the total Lagrangian explicit dynamics (TLED) algorithm, a finite element formulation proposed by Miller in 2007. It is discussed here from an implementation point of view, deploying GPU acceleration (enabled by its largely parallel nature) for both pre-processing and the actual computation.
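As a sketch of why explicit schemes such as TLED suit the GPU (this is a generic central-difference update, not Miller's actual formulation): each node's next displacement depends only on its current state and local forces, so nodes can be updated by independent threads.

```python
# Generic explicit central-difference time stepping, the pattern behind
# TLED-style solvers: u_next = 2*u - u_prev + dt^2 * f / m per node.

def explicit_step(u_curr, u_prev, forces, mass, dt):
    """One central-difference step for every node (one thread each on a GPU)."""
    return [2.0 * uc - up + dt * dt * f / mass
            for uc, up, f in zip(u_curr, u_prev, forces)]

def simulate(u0, forces_fn, mass, dt, steps):
    """March the displacement field forward; forces_fn(u) returns nodal forces."""
    u_prev, u_curr = list(u0), list(u0)  # start at rest
    for _ in range(steps):
        u_prev, u_curr = u_curr, explicit_step(u_curr, u_prev,
                                               forces_fn(u_curr), mass, dt)
    return u_curr
```

No global system of equations is solved per step, which is why explicit schemes trade a small stable time step for perfectly parallel updates.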
6

Edespong, Erik. "GPU Accelerated Surface Reconstruction from Particles." Thesis, Linköpings universitet, Institutionen för teknik och naturvetenskap, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-93543.

Full text
Abstract:
Realistic fluid effects, such as smoke and water, have been pursued by the visual effects industry for a long time. In recent years, particle simulations have gained a lot of popularity for achieving such effects. One problem noted by researchers has been the difficulty of generating surfaces from the particles. This thesis investigates current techniques for particle surface reconstruction. In addition, a GPU-based implementation using constrained mesh smoothing is described. The result is globally smooth surfaces which closely follow the distribution of the particles, though some problems are still apparent. The performance of the algorithm is approximately an order of magnitude faster than its CPU counterpart, but is held back by bottlenecks in sections still running on the CPU.
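Constrained mesh smoothing can be sketched as Laplacian smoothing blended with a constraint that keeps vertices near their original positions, so the surface stays faithful to the particles. A minimal 1D sketch (the blend weight and neighbour structure are made-up parameters, not the thesis' values):

```python
# Constrained Laplacian smoothing: each vertex moves toward the average of
# its neighbours, blended with its original position (the constraint).

def smooth(vertices, neighbours, constraint=0.5, iterations=10):
    """vertices: list of floats (1D for clarity); neighbours: index lists."""
    original = list(vertices)
    v = list(vertices)
    for _ in range(iterations):
        v = [constraint * original[i] +
             (1.0 - constraint) * sum(v[j] for j in neighbours[i]) / len(neighbours[i])
             for i in range(len(v))]
    return v
```

Each vertex update reads only its neighbours, so one GPU thread per vertex is the natural mapping.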
7

BASTOS, THIAGO DE ALMEIDA. "GPU-ACCELERATED ADAPTIVELY SAMPLED DISTANCE FIELDS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2008. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=12160@1.

Full text
Abstract:
Shape representation is a fundamental problem in Computer Graphics. Among known representations for three-dimensional objects, adaptively sampled distance fields (ADFs) are noted for their versatility. ADFs combine the concepts of geometry with volume data, allow objects to be represented with arbitrary precision, and consolidate several operations (such as visualization, level-of-detail modeling, collision detection, proximity tests, morphing and boolean operations) into a single representation. This work proposes methods to accelerate the reconstruction of static ADFs, to improve the quality of reconstructed fields, and to visualize ADF iso-surfaces, making use of the massive computational power found in modern graphics hardware (GPUs). In order to efficiently represent ADFs on graphics cards, a hierarchical structure based on perfect spatial hashing is proposed. Rendering of ADFs is done completely on GPUs, using a ray casting technique based on sphere tracing. Means to overcome the C0 and C1 discontinuities inherent to ADFs are suggested in order to attain smoothly shaded iso-surfaces. Finally, a new reconstruction method for ADFs, which can better represent curved surfaces, is proposed. Results are presented through simple interactive visualization applications, with ADFs generated from both triangle meshes and primitive solids.
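Sphere tracing, the ray casting technique mentioned above, advances a ray by the value of the distance field, which is by definition a safe step size. A one-dimensional sketch of the loop (illustrative, not the thesis' GPU shader):

```python
# Sphere tracing: step along the ray by the distance-field value; the field
# guarantees no surface lies within that radius, so the step never overshoots.

def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-4):
    """March from origin along a unit direction until sdf ~ 0.

    Returns the hit distance t, or None if no surface was reached.
    """
    t = 0.0
    for _ in range(max_steps):
        d = sdf(origin + t * direction)
        if d < eps:
            return t
        t += d  # safe step: nearest surface is at least d away
    return None
```

In 2D/3D the same loop runs per pixel, one GPU thread each, with `origin + t * direction` as a vector expression.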
8

Zhao, Kaiyong. "GPU accelerated sequence alignment /Zhao Kaiyong." HKBU Institutional Repository, 2016. https://repository.hkbu.edu.hk/etd_oa/378.

Full text
Abstract:
DNA sequence alignment is a fundamental task in gene information processing: searching for the location of a string (usually based on newly collected DNA data) in existing, huge DNA sequence databases. Due to the huge amount of newly generated DNA data and the complexity of approximate string matching, sequence alignment is a time-consuming process, so reducing the alignment time becomes a significant research problem. Several string alignment algorithms based on hash comparison, suffix arrays, and the Burrows-Wheeler transform (BWT) have been proposed for DNA sequence alignment. Although these algorithms have reached O(N) running time, they still cannot meet the increasing demand when running on traditional CPUs. Recently, GPUs have been widely accepted as an efficient accelerator for many scientific and commercial applications. A typical GPU has thousands of processing cores which can speed up repetitive computations significantly compared to multi-core CPUs. However, sequence alignment is a computation with intensive data access, i.e., it is memory-bound. Access to GPU memory and IO has a more significant influence on performance than the computing capabilities of the GPU cores. By analyzing GPU memory and IO characteristics, this thesis produces novel parallel algorithms for DNA sequence alignment applications. The thesis consists of six parts. The first two parts explain basic knowledge of DNA sequence alignment and GPU computing. The third part investigates the performance of data access on different types of GPU memory. The fourth part describes a parallel method to accelerate short-read sequence alignment based on the BWT algorithm. The fifth part proposes a parallel algorithm for accelerating BLASTN, one of the most popular sequence alignment tools, showing how multi-threaded control and multiple GPU cards can accelerate the BLASTN algorithm significantly. The sixth part concludes the whole thesis. 
To summarize, by analyzing the layout of GPU memory and comparing data access under multithreading, this thesis develops an effective optimization method for sequence alignment on the GPU. The outcomes can help practitioners in bioinformatics improve their working efficiency by significantly reducing sequence alignment time.
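As background for the BWT-based alignment discussed above, backward search on the FM-index is the core lookup such aligners parallelize across reads. A compact sketch (deliberately naive: rank is computed by scanning rather than with sampled occurrence tables):

```python
# Burrows-Wheeler transform and FM-index backward search, the primitive
# behind BWT-based short-read aligners: count pattern occurrences without
# scanning the whole text.

def bwt(text):
    """BWT of text; text must end with the unique sentinel '$'."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(row[-1] for row in rotations)

def backward_search(bwt_str, pattern):
    """Number of occurrences of pattern in the original text."""
    first_col = sorted(bwt_str)
    # C[c]: index of the first occurrence of c in the sorted first column
    c_table = {c: first_col.index(c) for c in set(bwt_str)}
    rank = lambda c, i: bwt_str[:i].count(c)  # naive O(n) rank
    lo, hi = 0, len(bwt_str)
    for c in reversed(pattern):  # extend the match one character at a time
        if c not in c_table:
            return 0
        lo = c_table[c] + rank(c, lo)
        hi = c_table[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

Real aligners replace the naive `rank` with precomputed occurrence tables so each step is O(1), and run one search per read in parallel, which is the GPU-friendly structure the thesis exploits.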
9

Schmitt, Ryan Daniel. "GPU-Accelerated Point-Based Color Bleeding." DigitalCommons@CalPoly, 2012. https://digitalcommons.calpoly.edu/theses/810.

Full text
Abstract:
Traditional global illumination lighting techniques like Radiosity and Monte Carlo sampling are computationally expensive. This has prompted the development of the Point-Based Color Bleeding (PBCB) algorithm by Pixar in order to approximate complex indirect illumination while meeting the demands of movie production; namely, reduced memory usage, surface shading independent run time, and faster renders than the aforementioned lighting techniques. The PBCB algorithm works by discretizing a scene’s directly illuminated geometry into a point cloud (surfel) representation. When computing the indirect illumination at a point, the surfels are rasterized onto cube faces surrounding that point, and the constituent pixels are combined into the final, approximate, indirect lighting value. In this thesis we present a performance enhancement to the Point-Based Color Bleeding algorithm through hardware acceleration; our contribution incorporates GPU-accelerated rasterization into the cube-face raster phase. The goal is to leverage the powerful rasterization capabilities of modern graphics processors in order to speed up the PBCB algorithm over standard software rasterization. Additionally, we contribute a preprocess that generates triangular surfels that are suited for fast rasterization by the GPU, and show that new heterogeneous architecture chips (e.g. Sandy Bridge from Intel) simplify the code required to leverage the power of the GPU. Our algorithm reproduces the output of the traditional Monte Carlo technique with a speedup of 41.65x, and additionally achieves a 3.12x speedup over software-rasterized PBCB.
10

Pettersson, Niklas. "GPU-Accelerated Real-Time Surveillance De-Weathering." Thesis, Linköpings universitet, Datorseende, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-97401.

Full text
Abstract:
A fully automatic de-weathering system to increase the visibility and stability of surveillance applications during bad weather has been developed. Rain, snow and haze during daylight are handled in real time with acceleration from CUDA-implemented algorithms. Video from fixed cameras is processed on a PC with no need for special hardware except an NVidia GPU. The system does not use any background model and does not require any precalibration. An increase in contrast is obtained in all haze/rain/snow cases, while the system lags by at most one frame during rain or snow removal. De-hazing can be applied at any distance to simplify tracking or other algorithms operating on a surveillance system.
11

Prestegård, Elisabeth K. "A GPU Accelerated Simulator for CO2 Storage." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for fysikk, 2014. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-24542.

Full text
Abstract:
The goal of this thesis has been to develop a fast simulator for large-scale migration of CO2 in saline aquifers. We have also focused on letting the CO2 storage atlas from the Norwegian Petroleum Directorate specify the reservoir properties. In order to meet the demands of simulating on large data sets with high performance, we have investigated the possibilities of using graphics processing units (GPUs) to accelerate the computations. The Intergovernmental Panel on Climate Change (IPCC) considers CO2 to be one of the main factors influencing today's climate changes. Capture and storage of CO2 is one of the strategies which could reduce the amount of CO2 released into the atmosphere. However, there are still uncertainties related to the flow of CO2 in saline aquifers. Fast simulators which can predict this behavior are therefore necessary to minimize the risks involved in a storage project. GPUs were originally designed to accelerate graphics operations. As opposed to standard CPUs, where most of the transistor capacity is used on advanced logic, the GPU spends most of its transistors on parallel floating-point operations. As a result, the theoretical upper bound for floating-point throughput is 7-10 times higher on the GPU than on the CPU. Thus, GPUs have proven to be a strong tool for solving hyperbolic conservation laws using stencil-based schemes, as a large share of the computations can be parallelized. In compliance with the storage atlases, we have based our simulator on structured grids. Our numerical scheme consists of a finite volume method combined with an explicit Euler method.
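A minimal sketch of the stencil pattern such a scheme produces (a generic first-order upwind finite-volume step with explicit Euler time integration for 1D advection, not the thesis' actual CO2 flux function):

```python
# First-order upwind finite-volume step for u_t + a*u_x = 0 with a > 0,
# explicit Euler in time, periodic boundary. Each cell reads only its left
# neighbour, so cells map one-to-one onto GPU threads.

def upwind_step(u, a, dt, dx):
    """One explicit time step over all cells; returns the new cell averages."""
    n = len(u)
    c = a * dt / dx  # CFL number; stability requires c <= 1
    return [u[i] - c * (u[i] - u[i - 1]) for i in range(n)]
```

With c = 1 the scheme transports the profile exactly one cell per step, a handy sanity check for any stencil implementation.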
12

Pachev, Ivan. "GPUMap: A Transparently GPU-Accelerated Map Function." DigitalCommons@CalPoly, 2017. https://digitalcommons.calpoly.edu/theses/1704.

Full text
Abstract:
As GPGPU computing becomes more popular, it will be used to tackle a wider range of problems. However, due to the current state of GPGPU programming, programmers are typically required to be familiar with the architecture of the GPU in order to effectively program it. Fortunately, there are software packages that attempt to simplify GPGPU programming in higher-level languages such as Java and Python. However, these software packages do not attempt to abstract the GPU-acceleration process completely. Instead, they require programmers to be somewhat familiar with the traditional GPGPU programming model which involves some understanding of GPU threads and kernels. In addition, prior to using these software packages, programmers are required to transform the data they would like to operate on into arrays of primitive data. Typically, such software packages restrict the use of object-oriented programming when implementing the code to operate on this data. This thesis presents GPUMap, which is a proof-of-concept GPU-accelerated map function for Python. GPUMap aims to hide all the details of the GPU from the programmer, and allows the programmer to accelerate programs written in normal Python code that operate on arbitrarily nested objects using a majority of Python syntax. Using GPUMap, certain types of Python programs are able to be accelerated up to 100 times over normal Python code. There are also software packages that provide simplified GPU acceleration to distributed computing frameworks such as MapReduce and Spark. Unfortunately, these packages do not provide a completely abstracted GPU programming experience, which conflicts with the purpose of the distributed computing frameworks: to abstract the underlying distributed system. 
This thesis also presents the GPU-accelerated RDD (GPURDD), a type of Spark Resilient Distributed Dataset (RDD) which incorporates GPUMap into its map, filter, and foreach methods in order to allow Spark applications to make use of the abstracted GPU acceleration provided by GPUMap.
13

Arvid, Johnsson. "Analysis of GPU accelerated OpenCL applications on the Intel HD 4600 GPU." Thesis, Linköpings universitet, Programvara och system, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-140124.

Full text
Abstract:
GPU acceleration is the concept of accelerating the execution of an application by running it on the GPU. Researchers and developers have always wanted greater speed for their applications, and GPU acceleration is a very common way of achieving it. It has long been used for highly graphical applications running on powerful dedicated GPUs. However, researchers have become more and more interested in using GPU acceleration for everyday applications. Moreover, nowadays more or less every computer has some sort of integrated GPU, which is often underutilized. Integrated GPUs are not as powerful as dedicated ones, but they have other benefits, such as lower power consumption and faster data transfer. This thesis' purpose was therefore to examine whether the integrated Intel HD 4600 GPU can be used to accelerate two applications: image convolution and sparse matrix-vector multiplication (SpMV). This was done by analysing code from a previous thesis, which had produced some unexpected results, as well as a benchmark from the OpenDwarfs benchmark suite. The Intel HD 4600 was able to speed up both image convolution and SpMV by about two times compared to running them on the Intel i7-4790. However, the SpMV implementation was not well suited to the GPU, meaning that the speedup was only observed for ideal input configurations.
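Image convolution, the first of the two accelerated kernels, is embarrassingly parallel: each output pixel is an independent weighted sum of its neighbourhood. A plain-Python sketch of the "valid" variant with an unflipped kernel, as commonly implemented in image processing (the thesis' OpenCL kernel details are not reproduced here):

```python
# 2D "valid" convolution (kernel applied without flipping): every output
# pixel is an independent weighted sum, i.e. one GPU work-item each.

def convolve2d(img, kernel):
    """img, kernel: lists of rows; output shrinks by kernel size - 1."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):        # each (y, x) pair is independent
        for x in range(ow):
            out[y][x] = sum(img[y + i][x + j] * kernel[i][j]
                            for i in range(kh) for j in range(kw))
    return out
```

On a GPU, the two outer loops become the work-item grid, and the kernel weights typically live in constant or local memory.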
14

Lagergren, Mattias. "GPU accelerated SPH simulation of fluids for VFX." Thesis, Linköping University, Visual Information Technology and Applications (VITA), 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-57320.

Full text
15

Hrstic, Dusan Viktor. "Improving the performance of GPU-accelerated spatial joins." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210719.

Full text
Abstract:
Data collisions have been widely studied in various fields of science and industry. Combining the CPU and GPU for processing spatial joins has been broadly accepted due to the increased speed of computation. This should redirect efforts in GPGPU research from straightforward porting of applications to establishing principles and strategies that allow efficient mapping of computation to graphics hardware. As threads execute instructions while using the hardware resources that are available, the impact of different thread organizations and their effect on spatial join performance is analyzed and examined in this report. New perspectives and solutions to the problem of thread organization and warp scheduling may encourage others to program on the GPU side. The aim of this project is to examine the impact of different thread organizations in spatial join processes. The relationship between the items inside the datasets is examined by counting the number of collisions their join produces, in order to understand how different approaches may influence performance. Performance benchmarking, analysis, and measurement of different approaches to thread organization are investigated in this report in order to find the most time-efficient solution, which is the purpose of the conducted work. This report shows the results obtained for different thread techniques used to optimize the computational speed of the spatial join algorithms. There are two algorithms on the GPU: one implementing the thread techniques and one non-optimized solution. The GPU times are compared with the execution times on the CPU, and the GPU implementations are verified by observing that their collision counters match all of the collision counters from the CPU counterpart. In the analysis part of this report the implementations are discussed and compared to each other. 
The algorithm implementing the thread techniques proved to be around 80% faster than the non-optimized one, and around 56 times faster than the spatial joins on the CPU.
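The collision counting used above to verify the GPU results can be sketched as an all-pairs box-overlap test; this brute-force form (illustrative only) is exactly the structure that is then organized into GPU threads:

```python
# All-pairs spatial join on axis-aligned 2D boxes: every (a, b) pair is an
# independent overlap test, which is what gets mapped onto GPU threads.

def overlaps(a, b):
    """Boxes as (xmin, ymin, xmax, ymax); strict overlap (touching = no)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def spatial_join_count(boxes_a, boxes_b):
    """Number of overlapping pairs between the two datasets."""
    return sum(overlaps(a, b) for a in boxes_a for b in boxes_b)
```

A CPU run of this count serves as the ground truth against which a parallel implementation's collision counters are checked, as in the report.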
16

Young, Emily Clark. "GPU-Accelerated Demodulation for a Satellite Ground Station." DigitalCommons@USU, 2019. https://digitalcommons.usu.edu/etd/7635.

Full text
Abstract:
One consequence of the increasing number of small satellite missions is an increasing demand for high data rate downlinks. As the satellites transmit at high data rates, ground-side receivers need to demodulate the transmitted data as quickly as possible. While application specific hardware can be designed, software defined radio solutions for ground stations are attractive for their flexibility, adaptability, and portability. Another industry trend is the increasing use of Graphics Processing Units (GPUs) in general-purpose processing. By performing many operations simultaneously, GPUs are capable of accelerating processing when given a problem that can be implemented in a parallel manner. Furthermore, once a parallel algorithm is implemented, further speedups are possible by increasing hardware resources without need for any revision in the algorithm. This project combines the above ideas by implementing a software defined radio algorithm to quickly demodulate high-speed data on a GPU. It demonstrates the viability of the GPU in software defined radio applications and particularly in the area of fast demodulation.
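As an illustration of the per-symbol, embarrassingly parallel work a GPU demodulator performs, here is a toy BPSK correlator (an assumed example; the project's actual modulation scheme, timing recovery, and carrier synchronization are not described in the abstract):

```python
import math

# Toy BPSK demodulation: correlate each symbol period against the reference
# carrier; the sign of the correlation gives the bit. Each symbol period is
# independent work, which is what a GPU parallelizes.

def bpsk_demod(samples, samples_per_symbol, carrier):
    """carrier: one symbol period of the reference wave; returns bits."""
    bits = []
    for k in range(len(samples) // samples_per_symbol):
        chunk = samples[k * samples_per_symbol:(k + 1) * samples_per_symbol]
        corr = sum(s * c for s, c in zip(chunk, carrier))
        bits.append(1 if corr > 0 else 0)  # sign of correlation = phase
    return bits
```

A real receiver adds matched filtering, timing and carrier recovery before this step; the point here is only that the per-symbol correlations have no dependencies on each other.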
17

Qvick, Faxå Alexander, and Jonas Bromö. "GPU accelerated rendering of vector based maps on iOS." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-107064.

Full text
Abstract:
Digital maps can be represented as either raster (bitmap images) or vector data. Vector maps are often preferable as they can be stored more efficiently and rendered irrespective of screen resolution. Vector map rendering on demand can be a computationally intensive task and has to be implemented in an efficient manner to ensure good performance and a satisfied end-user, especially on mobile devices with limited computational resources. This thesis discusses different ways of utilizing the on-chip GPU to improve the vector map rendering performance of an existing iOS app. It describes an implementation that uses OpenGL ES 2.0 to achieve the same end-result as the old CPU-based implementation using the same underlying map infrastructure. By using the OpenGL based map renderer as well as implementing other performance optimizations, the authors were able to achieve an almost fivefold increase in rendering performance on an iPad Air.
18

Kienel, Enrico, and Guido Brunnett. "GPU-Accelerated Contour Extraction on Large Images Using Snakes." Universitätsbibliothek Chemnitz, 2009. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-200900358.

Full text
Abstract:
Active contours have proven to be a powerful semiautomatic image segmentation approach that copes with many applications and different image modalities. However, they exhibit inherent drawbacks, including sensitivity to contour initialization due to the limited capture range of image edges, and problems with concave boundary regions. The Gradient Vector Flow replaces the traditional image force and provides an enlarged capture range as well as enhanced concavity extraction capabilities, but it involves an expensive computational effort and considerably increased memory requirements at the time of computation. In this paper, we present an enhancement of the active contour model to facilitate semiautomatic contour detection in huge images. We propose a tile-based image decomposition, accompanied by an on-demand image force computation scheme, in order to minimize both computational and memory requirements. We show an efficient implementation of this approach on the basis of general-purpose GPU processing, providing continuous active contour deformation without considerable delay.
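One snake iteration can be sketched as a per-point update combining an internal smoothness force with an external image force. In the 1D toy below the external force is an arbitrary supplied field, standing in for the Gradient Vector Flow; the weights are made-up parameters, not the paper's:

```python
# One gradient-descent step of a snake (active contour): every control point
# is pulled toward its neighbours' average (smoothness) plus an external
# image force. Points update independently, hence the GPU mapping.

def snake_step(points, external_force, alpha=0.2, step=0.5):
    """points: list of floats (1D for clarity); closed contour."""
    n = len(points)
    new_points = []
    for i in range(n):
        # internal force: discrete Laplacian pulls toward neighbours' average
        internal = 0.5 * (points[i - 1] + points[(i + 1) % n]) - points[i]
        new_points.append(points[i] +
                          step * (alpha * internal + external_force(points[i])))
    return new_points
```

Iterating this step until the contour stops moving is the deformation loop that the tile-based, on-demand force computation feeds.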
APA, Harvard, Vancouver, ISO, and other styles
19

Brodén, Alexander, and Bohlin Gustav Pihl. "Towards Real-Time NavMesh Generation Using GPU Accelerated Scene Voxelization." Thesis, Blekinge Tekniska Högskola, Institutionen för kreativa teknologier, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-14381.

Full text
Abstract:
Context. Producing NavMeshes for pathfinding in computer games is a time-consuming process. Recast and Detour are a pair of state-of-the-art libraries that allow automation of NavMesh generation. Recast builds on a technique called Scene Voxelization, where triangle geometry is converted to voxels in heightfields. The algorithm is expensive in terms of execution time. A fast voxelization algorithm could be useful in real-time applications where geometry is dynamic. In recent years, voxelization implementations on the GPU have been shown to outperform CPU implementations in certain configurations. Objectives. The objective of this thesis is to find a GPU-based alternative to Recast’s voxelization algorithm, and to determine when the GPU-based solution is faster than the reference. Methods. This thesis proposes a GPU-based alternative to Recast’s voxelization algorithm, designed to be an interchangeable step in Recast’s pipeline in a real-time application where geometry is dynamic. Experiments were conducted to show how accurately the algorithm generates heightfields, how fast the execution time is in certain configurations, and how the algorithm scales with different sets of input data. Results. The proposed algorithm, when run on an AMD Radeon RX 480 GPU, was shown to be both accurate and fast in certain configurations. At low voxel field resolutions, it outperformed the reference algorithm on typical Recast reference models. The biggest performance gain was shown when the input contained large numbers of small triangles. The algorithm performs poorly when the input data has triangles that are big in relation to the size of the voxels, and an optional optimization was presented to address this issue. Another optimization was presented that further increases performance gain when many instances of the same mesh are voxelized. Conclusions. The objectives of the thesis were met. A fast, GPU-based algorithm for voxelization in Recast was presented, and conclusions about when it can outperform the reference algorithm were drawn. Possibilities for even greater performance gains were identified for future research.
APA, Harvard, Vancouver, ISO, and other styles
20

Liu, Bingchen, Alexander Bock, Timo Ropinski, Martyn Nash, Poul Nielsen, and Burkhard Wünsche. "GPU-Accelerated Direct Volume Rendering of Finite Element Data Sets." Linköpings universitet, Medie- och Informationsteknik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-92854.

Full text
Abstract:
Direct Volume Rendering of Finite Element models is challenging since the visualisation process is performed in world coordinates, whereas data fields are usually defined over the elements’ material coordinate system. In this paper we present a framework for Direct Volume Rendering of Finite Element models. We present several novel implementations visualising Finite Element data directly without requiring resampling into world coordinates. We evaluate the methods using several biomedical Finite Element models. Our GPU implementation of ray-casting in material coordinates using depth peeling is several orders of magnitude faster than the corresponding CPU approach, and our new ray interpolation approach achieves near interactive frame rates for high-order finite element models at high resolutions.
APA, Harvard, Vancouver, ISO, and other styles
21

Cluff, Stephen T. "A unified approach to GPU-accelerated aerial video enhancement techniques /." Diss., CLICK HERE for online access, 2009. http://contentdm.lib.byu.edu/ETD/image/etd2780.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Sreenibha, Reddy Byreddy. "Performance Metrics Analysis of GamingAnywhere with GPU accelerated Nvidia CUDA." Thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-16846.

Full text
Abstract:
The modern world has opened the gates to many advancements in cloud computing, particularly in the field of cloud gaming. The most recent development in this area is the open-source cloud gaming system called GamingAnywhere. The relationship between the CPU and the GPU is the main focus of this thesis. Graphical Processing Unit (GPU) performance plays a vital role in analyzing and enhancing the playing experience of GamingAnywhere. This paper concentrates on virtualization of the GPU and suggests that accelerating this unit using NVIDIA CUDA is the key to better performance when using GamingAnywhere. After extensive research, gVirtuS was chosen as the technique for employing NVIDIA CUDA. An experimental study was conducted to evaluate the feasibility and performance of GPU solutions by VMware in the cloud gaming scenarios provided by GamingAnywhere. Performance is measured in terms of bitrate, packet loss, jitter and frame rate. Different resolutions of the game are considered in our empirical research, and our results show that frame rate and bitrate increased across resolutions with the use of the NVIDIA CUDA enhanced GPU.
APA, Harvard, Vancouver, ISO, and other styles
23

Cluff, Stephen Thayn. "A Unified Approach to GPU-Accelerated Aerial Video Enhancement Techniques." BYU ScholarsArchive, 2009. https://scholarsarchive.byu.edu/etd/1680.

Full text
Abstract:
Video from aerial surveillance can provide a rich source of data for analysts. From the time-critical perspective of wilderness search and rescue operations, information extracted from aerial videos can mean the difference between a successful search and an unsuccessful search. When using low-cost, payload-limited mini-UAVs, as opposed to more expensive platforms, several challenges arise, including jittery video, narrow fields of view, low resolution, and limited time on screen for key features. These challenges make it difficult for analysts to extract key information in a timely manner. Traditional approaches may address some of these issues, but no existing system effectively addresses all of them in a unified and efficient manner. Building upon a hierarchical dense image correspondence technique, we create a unifying framework for reducing jitter, enhancing resolution, and expanding the field of view while lengthening the time that features remain on screen. It also provides for easy extraction of moving objects in the scene. Our method incorporates locally adaptive warps which allows for robust image alignment even in the presence of parallax and without the aid of internal or external camera parameters. We accelerate the image registration process using commodity Graphics Processing Units (GPUs) to accomplish all of these tasks in near real-time with no external telemetry data.
APA, Harvard, Vancouver, ISO, and other styles
24

Ulmstedt, Mattias, and Joacim Stålberg. "GPU Accelerated Ray-tracing for Simulating Sound Propagation in Water." Thesis, Linköpings universitet, Datorteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-160308.

Full text
Abstract:
The propagation paths of sound in water can be somewhat complicated due to the fact that the sound speed in water varies with properties such as water temperature and pressure, which has the effect of curving the propagation paths. This thesis shows how sound propagation in water can be simulated using a ray-tracing based approach on a GPU using Nvidia’s OptiX ray-tracing engine. In particular, it investigates how much speed-up can be achieved compared to CPU based implementations and whether the RT cores introduced in Nvidia’s Turing architecture, which provide hardware accelerated ray-tracing, can be used to speed up the computations. The presented GPU implementation is shown to be up to 310 times faster than the CPU-based Fortran implementation Bellhop. Although the speed-up is significant, it is hard to say how much speed-up is gained by utilizing the RT cores due to not having anything equivalent to compare the performance to.
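The curved paths described here follow from the ray equations of geometrical acoustics, in which a ray turns toward lower sound speed. A small forward-Euler sketch of that bending (a crude stand-in for Bellhop-style integration; the profile, step size and function names are illustrative):

```python
import math

def trace_ray(theta0, z0, c, dcdz, ds=1.0, steps=2000):
    # 2D ray over range x and depth z: dz/ds = sin(theta), and the ray
    # curvature d(theta)/ds = -cos(theta) * c'(z) / c(z), so a positive
    # sound speed gradient with depth bends the ray upward (toward lower c).
    x, z, theta = 0.0, z0, theta0
    path = [(x, z)]
    for _ in range(steps):
        theta -= math.cos(theta) * dcdz(z) / c(z) * ds
        x += math.cos(theta) * ds
        z += math.sin(theta) * ds
        path.append((x, z))
    return path

# Illustrative linear profile: sound speed increases with depth z
path = trace_ray(0.0, 100.0, lambda z: 1500.0 + 0.017 * z, lambda z: 0.017)
```

Each ray is independent of every other ray, which is why the computation is so amenable to massively parallel execution on a GPU ray-tracing engine.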
APA, Harvard, Vancouver, ISO, and other styles
25

Liberg, Tim, and Per-Erik Måhl. "GPU-accelerated Model Checking of Periodic Self-Suspending Real-Time Tasks." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-14661.

Full text
Abstract:
Efficient model checking is important in order to make this type of software verification useful for systems that are complex in their structure. If a system is too large or complex, then model checking simply does not scale, i.e., it could take too much time to verify the system. This is one strong argument for focusing on making model checking faster. Another interesting aim is to make model checking so fast that it can be used for predicting scheduling decisions for real-time schedulers at runtime. This of course requires the model checking to complete within an order of milliseconds or even microseconds. The aim is set very high, but the results of this thesis will at least give a hint on whether this seems possible or not. The magic card for (maybe) making this possible is called the Graphics Processing Unit (GPU). This thesis will investigate if and how a model checking algorithm can be ported to and executed on a GPU. Modern GPU architectures offer a high degree of processing power since they are equipped with up to 1000 (NVIDIA GTX 590) or 3000 (NVIDIA Tesla K10) processor cores. The drawback is that they offer poor thread-communication possibilities and memory caches compared to CPUs. This makes it very difficult to port CPU programs to GPUs. The example model (system) used in this thesis represents a real-time task scheduler that can schedule up to three periodic self-suspending tasks. The aim is to verify, i.e., find a feasible schedule for, these tasks, and to do it as fast as possible with the help of the GPU.
APA, Harvard, Vancouver, ISO, and other styles
26

Nottingham, Alastair Timothy. "GPU Accelerated protocol analysis for large and long-term traffic traces." Thesis, Rhodes University, 2016. http://hdl.handle.net/10962/910.

Full text
Abstract:
This thesis describes the design and implementation of GPF+, a complete general packet classification system developed using Nvidia CUDA for Compute Capability 3.5+ GPUs. This system was developed with the aim of accelerating the analysis of arbitrary network protocols within network traffic traces using inexpensive, massively parallel commodity hardware. GPF+ and its supporting components are specifically intended to support the processing of large, long-term network packet traces such as those produced by network telescopes, which are currently difficult and time-consuming to analyse. The GPF+ classifier is based on prior research in the field, which produced a prototype classifier called GPF, targeted at Compute Capability 1.3 GPUs. GPF+ greatly extends the GPF model, improving runtime flexibility and scalability, whilst maintaining high execution efficiency. GPF+ incorporates a compact, lightweight register-based state machine that supports massively-parallel, multi-match filter predicate evaluation, as well as efficient arbitrary field extraction. GPF+ tracks packet composition during execution, and adjusts processing at runtime to avoid redundant memory transactions and unnecessary computation through warp-voting. GPF+ additionally incorporates a 128-bit in-thread cache, accelerated through register shuffling, to accelerate access to packet data in slow GPU global memory. GPF+ uses a high-level DSL to simplify protocol and filter creation, whilst better facilitating protocol reuse. The system is supported by a pipeline of multi-threaded high-performance host components, which communicate asynchronously through 0MQ messaging middleware to buffer, index, and dispatch packet data on the host system. The system was evaluated using high-end Kepler (Nvidia GTX Titan) and entry level Maxwell (Nvidia GTX 750) GPUs. The results of this evaluation showed high system performance, limited only by device side IO (600MBps) in all tests.
GPF+ maintained high occupancy and device utilisation in all tests, without significant serialisation, and showed improved scaling to more complex filter sets. Results were used to visualise captures of up to 160 GB in seconds, and to extract and pre-filter captures small enough to be easily analysed in applications such as Wireshark.
APA, Harvard, Vancouver, ISO, and other styles
27

Zhang, Yun. "LARGE-SCALE MICROARRAY DATA ANALYSIS USING GPU- ACCELERATED LINEAR ALGEBRA LIBRARIES." OpenSIUC, 2012. https://opensiuc.lib.siu.edu/theses/878.

Full text
Abstract:
The biological datasets produced as a result of high-throughput genomic research, such as microarrays, contain vast amounts of knowledge about entire genomes and their expression affiliations. Gene clustering from such data is a challenging task due to the huge data size, the high complexity of the algorithms, and the visualization needs. Most existing analysis methods for genome-wide gene expression profiles are sequential programs using greedy algorithms and require subjective human decisions. Recently, Zhu et al. proposed a parallel random matrix theory (RMT) based approach for generating transcriptional networks, which is much more resistant to high levels of noise in the data [9] and requires no human intervention. Nowadays GPUs are designed to be used more efficiently for general purpose computing [1] and are vastly superior to CPUs [6] in terms of threading performance. Our kernel functions running on the GPU utilize functions from both the Compute Unified Basic Linear Algebra Subroutines (CUBLAS) library and the Compute Unified Linear Algebra (CULA) library, which implements the Linear Algebra Package (LAPACK). Our experimental results show that the GPU program can achieve an average speed-up of 2~3 times for some simulated datasets.
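The dense linear algebra at the heart of an RMT pipeline of this kind, forming a gene–gene correlation matrix and taking its eigenvalue spectrum, is exactly the step CUBLAS/CULA accelerate. A pure-NumPy stand-in for that step (function name illustrative):

```python
import numpy as np

def correlation_spectrum(expression):
    # expression: genes x samples matrix of expression profiles.
    # The correlation matrix is symmetric, so eigvalsh applies; on a GPU
    # this pair of calls maps to CUBLAS (GEMM-like) and CULA (LAPACK
    # symmetric eigensolver) routines.
    corr = np.corrcoef(expression)
    eigvals = np.linalg.eigvalsh(corr)
    return corr, eigvals
```

In the RMT approach, the transition in the eigenvalue spacing statistics of this spectrum is what separates signal from noise without a human-chosen threshold.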
APA, Harvard, Vancouver, ISO, and other styles
28

Tasoulas, Zois Gerasimos. "Resource management and application customization for hardware accelerated systems." OpenSIUC, 2021. https://opensiuc.lib.siu.edu/dissertations/1907.

Full text
Abstract:
Computational demands are continuously increasing, driven by the growing resource demands of applications. In the era of big data, big-scale applications, and real-time applications, there is an enormous need for quick processing of big amounts of data. To meet these demands, computer systems have shifted towards multi-core solutions. Technology scaling has allowed the incorporation of even larger numbers of transistors and cores into chips. Nevertheless, area constraints, power consumption limitations, and thermal dissipation limit the ability to design and sustain ever increasing chips. To overcome these limitations, system designers have turned towards the usage of hardware accelerators. These accelerators can take the form of modules attached to each core of a multi-core system, forming a network-on-chip of cores with attached accelerators. Another option is Graphics Processing Units (GPUs). GPUs can be connected through a host-device model with a general purpose system, and are used to off-load parts of a workload. Additionally, accelerators can be functionality-dedicated units: they can be part of a chip, and the main processor can offload specific workloads to the hardware accelerator unit. In this dissertation we present: (a) a microcoded synchronization mechanism for systems with hardware accelerators that provide distributed shared memory, (b) a Streaming Multiprocessor (SM) allocation policy for single application execution on GPUs, (c) an SM allocation policy for concurrent applications that execute on GPUs, and (d) a framework to map neural network (NN) weights to approximate multiplier accuracy levels. The aforementioned mechanisms coexist in the resource management domain. Specifically, the methodologies introduce ways to boost system performance by using hardware accelerators. In tandem with improved performance, the methodologies explore and balance trade-offs that the use of hardware accelerators introduces.
APA, Harvard, Vancouver, ISO, and other styles
29

Hamed, Maien Mohamed Osman. "On meshless methods : a novel interpolatory method and a GPU-accelerated implementation." Thesis, Nelson Mandela Metropolitan University, 2013. http://hdl.handle.net/10948/d1018227.

Full text
Abstract:
Meshless methods have been developed to avoid the numerical burden imposed by meshing in the Finite Element Method. Such methods are especially attractive in problems that require repeated updates to the mesh, such as problems with discontinuities or large geometrical deformations. Although meshing is not required for solving problems with meshless methods, the use of meshless methods gives rise to different challenges. One of the main challenges associated with meshless methods is imposition of essential boundary conditions. If exact interpolants are used as shape functions in a meshless method, imposing essential boundary conditions can be done in the same way as the Finite Element Method. Another attractive feature of meshless methods is that their use involves computations that are largely independent from one another. This makes them suitable for implementation to run on highly parallel computing systems. Highly parallel computing has become widely available with the introduction of software development tools that enable developing general-purpose programs that run on Graphics Processing Units. In the current work, the Moving Regularized Interpolation method has been developed, which is a novel method of constructing meshless shape functions that achieve exact interpolation. The method is demonstrated in data interpolation and in partial differential equations. In addition, an implementation of the Element-Free Galerkin method has been written to run on a Graphics Processing Unit. The implementation is described and its performance is compared to that of a similar implementation that does not make use of the Graphics Processing Unit.
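To illustrate the exact-interpolation property the abstract emphasises, here is a generic 1D radial-basis-function interpolant in NumPy. This is not the thesis's Moving Regularized Interpolation method, just a standard construction that shares the property: because the system matrix is built from the nodes themselves, the interpolant reproduces the nodal values exactly, which is what makes essential boundary conditions straightforward to impose.

```python
import numpy as np

def rbf_interpolate(nodes, values, query, eps=1.0):
    # Solve A w = values with A_ij = exp(-(eps * |x_i - x_j|)^2), then
    # evaluate the expansion at the query points. The interpolant passes
    # through the data exactly at the nodes.
    d = np.abs(nodes[:, None] - nodes[None, :])
    w = np.linalg.solve(np.exp(-(eps * d) ** 2), values)
    dq = np.abs(query[:, None] - nodes[None, :])
    return np.exp(-(eps * dq) ** 2) @ w
```

The per-query evaluations are independent of one another, which is the independence the abstract points to as the reason meshless methods map well onto GPUs.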
APA, Harvard, Vancouver, ISO, and other styles
30

de, Ruiter Niels Johannes Antonius. "GPU Accelerated Intermixing as a Framework for Interactively Visualizing Spectral CT Data." Thesis, University of Canterbury. Centre of Bioengineering, 2011. http://hdl.handle.net/10092/5328.

Full text
Abstract:
Computed Tomography (CT) is a medical imaging modality which acquires anatomical data via the unique x-ray attenuation of materials. Yet, some clinically important materials remain difficult to distinguish with current CT technology. Spectral CT is an emerging technology which acquires multiple CT datasets for specific x-ray spectra. These spectra provide a fingerprint that allows materials to be distinguished that would otherwise look the same on conventional CT. The unique characteristics of spectral CT data motivate research into novel visualization techniques. In this thesis, we aim to provide the foundation for visualizing spectral CT data. Our initial investigation of similar multi-variate data types identified intermixing as a promising visualization technique. This prompted the development of a generic, modular and extensible intermixing framework. Therefore, the contribution of our work is a framework supporting the construction, analysis and storage of algorithms for visualizing spectral CT studies. To allow evaluation, we implemented the intermixing framework in an application called MARSCTExplorer along with a standard set of volume visualization tools. These tools provide user-interaction as well as supporting traditional visualization techniques for comparison. We evaluated our work with four spectral CT studies containing materials indistinguishable by conventional CT. Our results confirm that spectral CT can distinguish these materials, and reveal how these materials might be visualized with our intermixing framework.
APA, Harvard, Vancouver, ISO, and other styles
31

Zhang, Chenggang, and 张呈刚. "Run-time loop parallelization with efficient dependency checking on GPU-accelerated platforms." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2011. http://hub.hku.hk/bib/B47167658.

Full text
Abstract:
General-Purpose computing on Graphics Processing Units (GPGPU) has attracted a lot of attention recently. Exciting results have been reported in using GPUs to accelerate applications in various domains such as scientific simulations, data mining, bio-informatics and computational finance. However, up to now GPUs can only accelerate data-parallel loops with statically analyzable parallelism. Loops with dynamic parallelism (e.g., with array accesses through subscripted subscripts), an important pattern in many general-purpose applications, cannot be parallelized on GPUs using existing technologies. Run-time loop parallelization using Thread Level Speculation (TLS) has been proposed in the literature to parallelize loops with statically un-analyzable dependencies. However, most of the existing TLS systems are designed for multiprocessor/multi-core CPUs. GPUs have fundamental differences with CPUs in both hardware architecture and execution model, making previous TLS designs either non-functional or inefficient when ported to GPUs. This thesis presents GPU-TLS, a runtime system designed to support speculative loop parallelization on GPUs. The design of GPU-TLS addresses several key problems encountered when adapting TLS to GPUs: (1) To reduce the possibility of mis-speculation, a deferred-update memory versioning scheme is adopted to avoid mis-speculations caused by inter-iteration WAR and WAW dependencies. A technique named intra-warp value forwarding is proposed to respect some inter-iteration RAW dependencies, which further reduces the mis-speculation possibility. (2) An incremental speculative execution scheme is designed to exploit partial parallelism within loops. This avoids excessive re-executions and reduces the mis-speculation penalty. (3) The dependency checking among thousands of speculative GPU threads poses a large overhead and can easily become the performance bottleneck.
To lower the overhead, we design several efficient dependency checking schemes named PRW+BDC, SW, SR, SRW+EDC, and SRW+LDC respectively. (4) We devise a novel parallel commit scheme to avoid the overhead incurred by the serial commit phase in most existing TLS designs. We have carried out extensive experiments on two platforms with different NVIDIA GPUs, using both a synthetic loop that can simulate loops with different characteristics and several loops from real-life applications. Testing results show that the proposed intra-warp value forwarding and eager dependency checking techniques can improve the performance for almost all kinds of loop patterns. We observe that compared with other dependency checking schemes, SR and SW can achieve better performance in most cases. It is also shown that the proposed parallel commit scheme is especially useful for loops with large write set size and small number of inter-iteration WAW dependencies. Overall, GPU-TLS can achieve speedups ranging from 5 to 105 for loops with dynamic parallelism.
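As a toy illustration of what a read/write-set dependency check decides (a sequential sketch, not GPU-TLS's actual parallel schemes): under deferred-update versioning, writes are buffered until commit, so WAR and WAW conflicts are harmless and only cross-iteration RAW conflicts force re-execution.

```python
def find_violations(iteration_accesses):
    # iteration_accesses[i] = (read_set, write_set) of speculative
    # iteration i, in original loop order. Iteration j must re-execute
    # only if it read an address written by an earlier iteration
    # (a cross-iteration RAW dependency).
    violators = set()
    for j, (reads_j, _) in enumerate(iteration_accesses):
        for _, writes_i in iteration_accesses[:j]:
            if reads_j & writes_i:
                violators.add(j)
                break
    return violators
```

The schemes named in the abstract differ in how this pairwise comparison is organised across thousands of GPU threads so it does not become the bottleneck that a naive all-pairs check would be.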
Computer Science
Master of Philosophy
APA, Harvard, Vancouver, ISO, and other styles
32

Ntemos, George. "GPU-accelerated high-order scale-resolving simulations using the flux reconstruction approach." Thesis, Imperial College London, 2017. http://hdl.handle.net/10044/1/59135.

Full text
Abstract:
High-order methods in Computational Fluid Dynamics (CFD) offer a potential route towards the resolution of hitherto intractable fluid-dynamics problems in industry. The Flux Reconstruction (FR) approach provides a unifying framework for a number of popular high-order methods such as the Discontinuous Galerkin (DG) method. Its suitability for use on unstructured grids, along with its ability to facilitate massively parallelised implementation on architectures such as GPUs, provides a means to tackle computationally challenging flows around complex geometries. Such a flow can be found in the rod-aerofoil tandem configuration: complex, unsteady flow structures generated by and interacting with more than a single solid body are central to a number of applications in the aerospace industry. The current thesis attempts to demonstrate the suitability of the FR approach in successfully simulating the flow around a rod-aerofoil configuration. The in-house CFD solver employed in the research is presented and the FR implementation analysed. Computational grid resolution issues arising from the rod-aerofoil problem are studied, and a novel strategy for the stabilisation of the computation is implemented in the form of local entropy stability. The results obtained are analysed and conclusions are drawn on the utility of the FR approach in the absence of a sub-grid scale model (implicit LES - under-resolved DNS). The present work confirms the utility of local entropy stability for the stabilisation of the rod-aerofoil simulation at an aerofoil-chord based Reynolds number of Re = 480,000. It also demonstrates that the under-resolved DNS setup, which incurred a computational cost of approximately six hours for a single flow pass over the aerofoil chord on 200 Nvidia P100 GPUs, achieved only moderate success: a significant portion of the flow dynamics was not adequately predicted when compared with experiment. This led to a series of useful conclusions, centred on the apparent over-prediction of time-averaged velocity and momentum deficits across wakes, as well as over-prediction of turbulent intensities. An identification of the problematic areas is therefore given and potential alleviation techniques are outlined.
APA, Harvard, Vancouver, ISO, and other styles
33

Dyson, Joshua. "GPU accelerated linear system solvers for OpenFOAM and their application to sprays." Thesis, Brunel University, 2018. http://bura.brunel.ac.uk/handle/2438/16005.

Full text
Abstract:
This thesis presents the development of GPU accelerated solvers for use in simulation of the primary atomization phenomenon. Using the open source continuum mechanics library OpenFOAM as a basis, along with the NVidia CUDA API, linear system solvers have been developed so that the multiphase solver runs in part on GPUs. This aims to reduce the enormous computational cost associated with modelling primary atomization, which is vital to understanding the mechanisms that make combustion efficient. Firstly, the OpenFOAM code is benchmarked to assess its suitability for atomization problems and to establish efficient operating parameters for comparison against the GPU accelerations. This benchmarking culminates in a comparison to an experimental test case from the literature, dominated by surface tension, in 3D. Finally, a comparison is made with a primary atomizing liquid sheet as published in the literature. A geometric multigrid method is employed to solve the pressure Poisson equations, the first use of a geometric multigrid method in 3D GPU accelerated VOF simulation. Detailed investigations are made into the compute efficiency of the GPU accelerated solver, comparing memory bandwidth usage to hardware maximums as well as GPU idling time. In addition, the components of the multigrid method are also investigated, including the effect of residual scaling. While the GPU based multigrid method shows some improvement over the equivalent CPU implementation, the overheads associated with running on the GPU prevent the gain from being significant.
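As a sketch of what a geometric multigrid cycle for a pressure Poisson equation does, here is a 1D toy V-cycle with weighted-Jacobi smoothing, full-weighting restriction and linear interpolation (the thesis's solver is 3D, GPU-resident and far more elaborate; all parameters here are illustrative):

```python
import numpy as np

def v_cycle(u, f, nu=3, level=0, max_level=4):
    # One V-cycle for -u'' = f on [0, 1] with zero Dirichlet boundaries,
    # grid of n = 2^k + 1 points. Smooth, restrict the residual, solve
    # the coarse error equation recursively, interpolate and correct.
    n = len(u)
    h2 = (1.0 / (n - 1)) ** 2

    def smooth(u, sweeps):
        for _ in range(sweeps):  # weighted Jacobi, omega = 2/3
            u[1:-1] += (2.0 / 3.0) * 0.5 * (u[:-2] + u[2:]
                                            + h2 * f[1:-1] - 2 * u[1:-1])
        return u

    u = smooth(u, nu)
    if level < max_level and n >= 5:
        r = np.zeros_like(u)
        r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / h2
        rc = r[::2].copy()  # full-weighting restriction to the coarse grid
        rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
        ec = v_cycle(np.zeros_like(rc), rc, nu, level + 1, max_level)
        e = np.zeros_like(u)  # linear interpolation of the coarse error
        e[::2] = ec
        e[1:-1:2] = 0.5 * (ec[:-1] + ec[1:])
        u += e
    return smooth(u, nu)
```

The smoothing sweeps and residual computations are pointwise-parallel, which is what a GPU accelerates; the coarse grids are cheap but hard to keep a GPU busy with, one source of the overheads the abstract mentions.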
APA, Harvard, Vancouver, ISO, and other styles
34

Es, S. Alphan. "Accelerated Ray Tracing Using Programmable Graphics Pipelines." Phd thesis, METU, 2008. http://etd.lib.metu.edu.tr/upload/12609307/index.pdf.

Full text
Abstract:
Graphics hardware has evolved from simple feed-forward triangle rasterization devices into flexible, programmable, and powerful parallel processors. This evolution allows researchers to use graphics processing units (GPUs) for both general purpose computations and advanced graphics rendering. Sophisticated GPUs hold great opportunities for the acceleration of computationally expensive photorealistic rendering methods. Rendering photorealistic images in real-time is a challenge. In this work, we investigate efficient ways to utilize GPUs for real-time photorealistic rendering. Specifically, we studied uniform grid based ray tracing acceleration methods and GPU friendly traversal algorithms. We show that our method is faster than or competitive with other GPU based ray tracing acceleration techniques. The proposed approach is also applicable to the fast rendering of volumetric data. Additionally, we devised GPU based solutions for real-time stereoscopic image generation which can be used in conjunction with GPU based ray tracers.
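Uniform grid acceleration of the kind studied here rests on a 3D-DDA traversal in the style of Amanatides and Woo; a 2D sketch of the cell-stepping loop (the third axis is handled identically, and unit-sized cells are assumed for simplicity):

```python
import math

def grid_cells(origin, direction, nx, ny):
    # Enumerate the grid cells pierced by a ray, one boundary crossing at
    # a time: always advance along the axis whose next cell boundary is
    # closest in ray parameter t.
    x, y = int(origin[0]), int(origin[1])
    step, t_max, t_delta = [], [], []
    for o, d in zip(origin, direction):
        step.append(1 if d > 0 else -1)
        if d == 0:
            t_max.append(math.inf)
            t_delta.append(math.inf)
        else:
            boundary = math.floor(o) + (1 if d > 0 else 0)
            t_max.append((boundary - o) / d)
            t_delta.append(abs(1.0 / d))
    cells = []
    while 0 <= x < nx and 0 <= y < ny:
        cells.append((x, y))
        if t_max[0] < t_max[1]:
            t_max[0] += t_delta[0]
            x += step[0]
        else:
            t_max[1] += t_delta[1]
            y += step[1]
    return cells
```

The appeal for GPUs is that the loop is branch-light and each ray traverses independently, so thousands of rays can step through the grid in parallel.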
APA, Harvard, Vancouver, ISO, and other styles
35

Mantell, Rosemary Genevieve. "Accelerated sampling of energy landscapes." Thesis, University of Cambridge, 2017. https://www.repository.cam.ac.uk/handle/1810/267990.

Full text
Abstract:
In this project, various computational energy landscape methods were accelerated using graphics processing units (GPUs). Basin-hopping global optimisation was treated using a version of the limited-memory BFGS algorithm adapted for CUDA, in combination with GPU-acceleration of the potential calculation. The Lennard-Jones potential was implemented using CUDA, and an interface to the GPU-accelerated AMBER potential was constructed. These results were then extended to form the basis of a GPU-accelerated version of hybrid eigenvector-following. The doubly-nudged elastic band method was also accelerated using an interface to the potential calculation on GPU. Additionally, a local rigid body framework was adapted for GPU hardware. Tests were performed for eight biomolecules represented using the AMBER potential, ranging in size from 81 to 22,811 atoms, and the effects of minimiser history size and local rigidification on the overall efficiency were analysed. Improvements relative to CPU performance of up to two orders of magnitude were obtained for the largest systems. These methods have been successfully applied to both biological systems and atomic clusters. An existing interface between a code for free energy basin-hopping and the SuiteSparse package for sparse Cholesky factorisation was refined, validated and tested. Tests were performed for both Lennard-Jones clusters and selected biomolecules represented using the AMBER potential. Significant acceleration of the vibrational frequency calculations was achieved, with negligible loss of accuracy, relative to the standard diagonalisation procedure. For the larger systems, exploiting sparsity reduces the computational cost by factors of 10 to 30. The acceleration of these computational energy landscape methods opens up the possibility of investigating much larger and more complex systems than previously accessible. A wide array of new applications is now computationally feasible.
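Basin-hopping itself is simple to state: perturb the coordinates, locally minimise, and keep the better minimum. A toy 1D sketch with a plain gradient-descent local minimiser (the thesis uses CUDA-adapted L-BFGS on molecular potentials; the function, potential and parameters here are purely illustrative):

```python
import random

def descend(f, grad, x, lr=0.01, iters=500):
    # Plain gradient descent as a stand-in for an L-BFGS local minimiser.
    for _ in range(iters):
        x -= lr * grad(x)
    return x

def basin_hop(f, grad, x0, steps=300, stepsize=2.0, seed=1):
    # Zero-temperature basin-hopping: random hop, local minimisation,
    # accept only downhill moves between local minima.
    rng = random.Random(seed)
    x = descend(f, grad, x0)
    for _ in range(steps):
        cand = descend(f, grad, x + rng.uniform(-stepsize, stepsize))
        if f(cand) < f(x):
            x = cand
    return x

# Tilted double well: local minimum near x = +1, global minimum near x = -1
f = lambda x: (x * x - 1.0) ** 2 + 0.3 * x
grad = lambda x: 4.0 * x * (x * x - 1.0) + 0.3
x_best = basin_hop(f, grad, 1.0)
```

The expensive part is the local minimisation, dominated by potential and gradient evaluations, which is precisely what the GPU-accelerated potentials in the thesis speed up.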
APA, Harvard, Vancouver, ISO, and other styles
36

Tarassu, Jonas. "GPU-Accelerated Frame Pre-Processing for Use in Low Latency Computer Vision Applications." Thesis, Linköpings universitet, Informationskodning, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-142019.

Full text
Abstract:
Attention to low-latency computer vision and video processing applications is growing every year, not least for VR and AR applications. In this thesis, the Contrast Limited Adaptive Histogram Equalization (CLAHE) and Radial Distortion algorithms are implemented using both CUDA and OpenCL to determine whether these types of algorithms are suitable for GPU implementations when low latency is of utmost importance. The result is an implementation of the block versions of the CLAHE algorithm which utilizes the built-in interpolation hardware that resides on the GPU to reduce block effects, and an implementation of the Radial Distortion algorithm that corrects a 1920x1080 frame in 0.3 ms. Further, this thesis concludes that the GPU platform might be a good choice if the data to be processed can be transferred to, and possibly from, the GPU fast enough, and that the choice of compute API is mostly a matter of taste.
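A per-pixel radial distortion correction of the kind timed above amounts to evaluating an inverse distortion model for every output pixel and then sampling the input frame, which is where the GPU's interpolation hardware helps. A scalar sketch of inverting the common Brown radial model by fixed-point iteration (the coefficients and function name are illustrative, not taken from the thesis):

```python
def undistort_point(xd, yd, k1, k2, iters=10):
    # Forward model: (xd, yd) = (xu, yu) * (1 + k1*r^2 + k2*r^4) with
    # r^2 = xu^2 + yu^2, coordinates normalised about the image centre.
    # Invert by fixed-point iteration, which converges quickly for the
    # mild distortions typical of camera lenses.
    xu, yu = xd, yd
    for _ in range(iters):
        r2 = xu * xu + yu * yu
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        xu, yu = xd / factor, yd / factor
    return xu, yu

# Round trip: distort a point with the forward model, then undo it
xu0, yu0, k1, k2 = 0.3, 0.2, 0.1, 0.01
r2 = xu0 * xu0 + yu0 * yu0
s = 1.0 + k1 * r2 + k2 * r2 * r2
xu, yu = undistort_point(xu0 * s, yu0 * s, k1, k2)
```

Because every pixel's mapping is independent, a GPU kernel can evaluate it for all 1920x1080 output pixels in parallel and let the texture unit do the final bilinear sampling.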
APA, Harvard, Vancouver, ISO, and other styles
37

Tong, Jason. "Providing an infrastructure for assertion-based test generation and GPU accelerated mutation testing." Thesis, McGill University, 2014. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=123078.

Full text
Abstract:
Functional verification of modern digital designs is a never-ending challenge in the Integrated Circuit (IC) industry. Fuelled by the continuous demand for more integration, the increased effort in verification does not always entail error-free circuits after first production. Emerging technologies, such as assertion-based verification, can help in verifying the functional correctness of digital designs and can be easily integrated into existing design verification methodologies. Simulation-based verification is still the most predominant method in industry because of its ability to scale with large designs. Assertions can be inserted into the design and treated as coverage points, where the input tests are responsible for exercising the design's conditions in evaluating those assertions. The effectiveness of this approach relies on the quality of the tests, where poor test quality can prevent the design from being thoroughly verified. This thesis presents novel techniques and algorithms for generating tests from assertions. Assertions serve as an invaluable source of information, where one can leverage the defined behaviours for generating the appropriate functional tests that can be used in simulation. A proposed set of coverage metrics helps in generating tests that thoroughly evaluate assertions during simulation. Verification engineers can make use of these tests in performing effective simulation in order to detect and then correct any design errors. The tool developed for generating tests from assertions was evaluated using nearly 300 assertions that were written for verifying the correctness of several industry-based designs. As a result, the proposed test generation approach was able to provide additional tests which led to an improvement in coverage compared to an assertion-based test generator developed by another research team.
This thesis also developed novel algorithms for graphics processing units (GPUs), used for accelerating mutation-based simulation, which is a computationally intensive application. It was empirically shown for a set of 10 industry-based designs that efficiently using the GPU's resources can drastically improve the simulation performance, when compared to a commercial tool. The additional performance is a necessity, as maximal acceleration is needed for rigorously assessing test quality when simulating large quantities of mutations. This can have a positive impact in the quest for improving assertion quality, ultimately leading to an effective dynamic verification of digital designs.
La vérification fonctionnelle de circuits numériques modernes comporte des défis sans fin dans l'industrie des circuits intégrés (CI). Alimentés par la demande continue d'intégration croissante, les efforts grandissants en vérification ne mènent pas toujours à des circuits sans erreur du premier coup. Une technologie émergente telle que la vérification par assertions peut aider à vérifier le bon fonctionnement des circuits numériques et peut être facilement intégrée aux méthodologies de vérification existantes. La simulation fonctionnelle représente toujours la méthode de vérification la plus répandue dans l'industrie, étant donné sa capacité à traiter des circuits plus volumineux. Les assertions peuvent être insérées dans un circuit et peuvent aussi servir comme repères de couverture, pour lesquels les tests d'entrée ont la responsabilité d'exercer le circuit évaluant ces assertions. L'efficacité de cette approche repose sur la qualité des tests, car de piètres tests peuvent empêcher une vérification complète. Cette thèse présente des techniques et algorithmes novateurs ayant pour but de produire des tests à partir des assertions. En raison des comportements qu'elles décrivent, les assertions représentent une source importante d'information permettant d'extraire des séries de tests fonctionnels, pouvant servir lors de la simulation. Un ensemble de métriques de couverture aide à produire des tests qui évaluent rigoureusement les assertions durant la simulation. Les ingénieurs en vérification peuvent ainsi utiliser ces tests pour effectuer des simulations efficaces dans le but de détecter et corriger des erreurs de conception. L'outil développé pour générer des tests à partir des assertions fut évalué avec près de 300 assertions créées dans le but de vérifier le bon fonctionnement de plusieurs circuits industriels.
Sur le plan des résultats, l'approche de génération de test proposée a été capable de produire des tests supplémentaires menant à une couverture de test améliorée comparativement à un générateur de test d'une autre équipe de recherche. Le test par mutation est une technique permettant d'évaluer la qualité des tests découlant des assertions. Les simulations de mutations exigent une grande puissance de calcul. Cette thèse présente aussi des algorithmes novateurs, basés sur des processeurs graphiques (GPU), dans le domaine des tests par mutations. Sur une série de 10 circuits industriels, les résultats expérimentaux démontrent une amélioration importante de la performance de simulation comparativement à un outil commercial. Cette amélioration des performances est nécessaire étant donné l'accélération de calcul requise pour évaluer la qualité des tests lors de simulations de plusieurs mutations. Cela a un impact bénéfique dans la quête visant à améliorer la qualité des assertions, menant ultimement vers une vérification dynamique efficace de circuits numériques.
APA, Harvard, Vancouver, ISO, and other styles
38

Phillips, Adam. "GPU Accelerated Approach to Numerical Linear Algebra and Matrix Analysis with CFD Applications." Honors in the Major Thesis, University of Central Florida, 2014. http://digital.library.ucf.edu/cdm/ref/collection/ETH/id/1635.

Full text
Abstract:
A GPU accelerated approach to numerical linear algebra and matrix analysis with CFD applications is presented. The work's objectives are to (1) develop stable and efficient algorithms utilizing multiple NVIDIA GPUs with CUDA to accelerate common matrix computations, (2) optimize these algorithms through CPU/GPU memory allocation, GPU kernel development, CPU/GPU communication, data transfer and bandwidth control, and (3) develop parallel CFD applications for Navier-Stokes and lattice Boltzmann analysis methods. Special consideration is given to performing the linear algebra algorithms on particular matrix types (banded, dense, diagonal, sparse, symmetric and triangular). Benchmarks are performed for all analyses, with baseline CPU times being determined to find speed-up factors and measure the computational capability of the GPU accelerated algorithms. The GPU implemented algorithms used in this work, along with the optimization techniques performed, are measured against preexisting work and test matrices available in the NIST Matrix Market. CFD analysis strengthens the assessment of this work by providing a direct engineering application that benefits from matrix optimization techniques and accelerated algorithms. Overall, this work seeks to develop optimizations for selected linear algebra and matrix computations performed with modern GPU architectures and CUDA, applied directly to mathematical and engineering applications through CFD analysis.
B.S.
Bachelors
Mathematics
Sciences
APA, Harvard, Vancouver, ISO, and other styles
39

Zhang, Michael Longqiang. "A partitioning approach for GPU accelerated level-based on-chip variation static timing analysis." Diss., [La Jolla] : University of California, San Diego, 2010. http://wwwlib.umi.com/cr/fullcit?p1477953.

Full text
Abstract:
Thesis (M.S.)--University of California, San Diego, 2010.
Title from first page of PDF file (viewed July 16, 2010). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references (leaves 52-53).
APA, Harvard, Vancouver, ISO, and other styles
40

Pinska, Adrianna. "Addition of flexible linkers to GPU-accelerated coarse-grained simulations of protein-protein docking." Thesis, Faculty of Science, 2019. http://pubs.cs.uct.ac.za/archive/00001307/.

Full text
Abstract:
Multiprotein complexes are responsible for many vital cellular functions, and understanding their formation has many applications in medical research. Computer simulation has become a valuable tool in the study of biochemical processes, but simulation of large molecular structures such as proteins on a useful scale is computationally expensive. A compromise must be made between the level of detail at which a simulation can be performed, the size of the structures which can be modelled and the time scale of the simulation. Techniques which can be used to reduce the cost of such simulations include the use of coarse-grained models and parallelisation of the code. Parallelisation has recently been made more accessible by the advent of Graphics Processing Units (GPUs), a consumer technology which has become an affordable alternative to more specialised parallel hardware. We extend an existing implementation of a Monte Carlo protein-protein docking simulation using the Kim and Hummer coarse-grained protein model [1] on a heterogeneous GPU-CPU architecture [2]. This implementation has achieved a significant speed-up over previous serial implementations as a result of the efficient parallelisation of its expensive non-bonded potential energy calculation on the GPU. Our contribution is the addition of the optional capability for modelling flexible linkers between rigid domains of a single protein. We implement additional Monte Carlo mutations to allow for movement of residues within linkers, and for movement of domains connected by a linker with respect to each other. We also add potential terms for pseudo-bonds, pseudo-angles and pseudo-torsions between residues to the potential calculation, and include additional residue pairs in the non-bonded potential sum. Our flexible linker code has been tested, validated and benchmarked. We find that the implementation is correct, and that the addition of the linkers does not significantly impact the performance of the simulation. 
This modification may be used to enable fast simulation of the interaction between component proteins in a multiprotein complex, in configurations which are constrained to preserve particular linkages between the proteins. We demonstrate this utility with a series of simulations of diubiquitin chains, comparing the structure of chains formed through all known linkages between two ubiquitin monomers. We find reasonable agreement between our simulated structures and experimental data on the characteristics of diubiquitin chains in solution.
APA, Harvard, Vancouver, ISO, and other styles
41

Ren, Qinlong, and Qinlong Ren. "GPU Accelerated Study of Heat Transfer and Fluid Flow by Lattice Boltzmann Method on CUDA." Diss., The University of Arizona, 2016. http://hdl.handle.net/10150/621746.

Full text
Abstract:
Lattice Boltzmann method (LBM) has been developed as a powerful numerical approach to simulate complex fluid flow and heat transfer phenomena during the past two decades. As a mesoscale method based on kinetic theory, LBM has several advantages compared with traditional numerical methods, such as physical representation of microscopic interactions, handling of complex geometries and a highly parallel nature. The lattice Boltzmann method has been applied to solve various fluid behaviors and heat transfer processes, such as conjugate heat transfer, magnetic and electric fields, diffusion and mixing processes, chemical reactions, multiphase flow, phase change processes, non-isothermal flow in porous media, microfluidics, fluid-structure interactions in biological systems and so on. In addition, as a non-body-conformal grid method, the immersed boundary method (IBM) can be applied to handle complex or moving geometries in the domain. The immersed boundary method can be coupled with the lattice Boltzmann method to study heat transfer and fluid flow problems: heat transfer and fluid flow are solved on Eulerian nodes by LBM, while complex solid geometries are captured by Lagrangian nodes using the immersed boundary method. Parallel computing has been a popular topic for many decades as a means of accelerating computation in engineering and scientific fields. Today, almost all laptops and desktops have central processing units (CPUs) with multiple cores which can be used for parallel computing. However, the cost of CPUs with hundreds of cores is still high, which limits their capability for high performance computing on personal computers. Graphics processing units (GPUs), originally used in computer video cards, have emerged as the most powerful high-performance workstations in recent years. Unlike CPUs, a GPU with thousands of cores is inexpensive.
For example, the GPU (GeForce GTX TITAN) used in the current work has 2688 cores and costs only 1,000 US dollars. The release of NVIDIA's CUDA architecture in 2007, which includes both hardware and a programming environment, has made GPU computing attractive. Due to its highly parallel nature, the lattice Boltzmann method has been successfully ported to GPUs with a performance benefit in recent years. In the current work, LBM CUDA code is developed for different fluid flow and heat transfer problems. In this dissertation, the lattice Boltzmann method and immersed boundary method are used to study natural convection in an enclosure with an array of conducting obstacles, double-diffusive convection in a vertical cavity with Soret and Dufour effects, the PCM melting process in a latent heat thermal energy storage system with internal fins, mixed convection in a lid-driven cavity with a sinusoidal cylinder, and AC electrothermal pumping in microfluidic systems on a CUDA computational platform. It is demonstrated that LBM is an efficient method to simulate complex heat transfer problems using GPUs on CUDA.
APA, Harvard, Vancouver, ISO, and other styles
42

Cheng, Wei-Hung. "MRI-Based Images Segmentation for GPU Accelerated Fuzzy Methods on Graphics Processing Units by CUDA." Kent State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=kent154349822159698.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Selke, Gunnar [Verfasser], and Dietmar P. F. [Akademischer Betreuer] Möller. "Design and Development of a GPU-Accelerated Micromagnetic Simulator / Gunnar Selke. Betreuer: Dietmar P. F. Möller." Hamburg : Staats- und Universitätsbibliothek Hamburg, 2014. http://d-nb.info/1051435609/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Kelly, Jesse. "Numerical solution of the two-phase incompressible navier-stokes equations using a gpu-accelerated meshless method." Honors in the Major Thesis, University of Central Florida, 2009. http://digital.library.ucf.edu/cdm/ref/collection/ETH/id/1277.

Full text
Abstract:
This item is only available in print in the UCF Libraries. If this is your Honors Thesis, you can help us make it available online for use by researchers around the world by following the instructions on the distribution consent form at http://library.ucf.edu/Systems/DigitalInitiatives/DigitalCollections/InternetDistributionConsentAgreementForm.pdf. You may also contact the project coordinator, Kerri Bottorff, at kerri.bottorff@ucf.edu for more information.
Bachelors
Engineering and Computer Science
Mechanical Engineering
APA, Harvard, Vancouver, ISO, and other styles
45

Hellmich, Stephan [Verfasser], and Tilman [Akademischer Betreuer] Spohn. "GPU accelerated n-body integrators for long-term simulations of planetary systems / Stephan Hellmich ; Betreuer: Tilman Spohn." Münster : Universitäts- und Landesbibliothek Münster, 2018. http://d-nb.info/1159955867/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Pacura, Dávid. "Hardware Accelerated Digital Image Stabilization in a Video Stream." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255435.

Full text
Abstract:
The aim of this work is the design of a new technique for image stabilization using hardware acceleration through GPGPU. This technique enables real-time stabilization of video sequences, even for high-resolution video. This is needed to ease further processing in computer vision or in military applications. Since multiple programming models exist for GPGPU, the proposed stabilization algorithm is implemented in the three most widely used of them. Their performance and results are then compared and discussed.
APA, Harvard, Vancouver, ISO, and other styles
47

Caplan, Ronald Meyer. "Study of Vortex Ring Dynamics in the Nonlinear Schrödinger Equation Utilizing GPU-Accelerated High-Order Compact Numerical Integrators." Scholarship @ Claremont, 2012. http://scholarship.claremont.edu/cgu_etd/52.

Full text
Abstract:
We numerically study the dynamics and interactions of vortex rings in the nonlinear Schrödinger equation (NLSE). Single-ring dynamics for both bright and dark vortex rings are explored, including their traverse velocity, stability, and perturbations resulting in quadrupole oscillations. Multi-ring dynamics of dark vortex rings are investigated, including scattering and merging of two colliding rings, leapfrogging interactions of co-traveling rings, as well as co-moving steady-state multi-ring ensembles. Simulations of choreographed multi-ring setups are also performed, leading to intriguing interaction dynamics. Due to the inherent lack of a closed-form solution for vortex rings and the dimensionality in which they live, efficient numerical methods to integrate the NLSE have to be developed in order to perform the extensive number of required simulations. To facilitate this, compact high-order numerical schemes for the spatial derivatives are developed, which include a new semi-compact modulus-squared Dirichlet boundary condition. The schemes are combined with a fourth-order Runge-Kutta time-stepping scheme in order to keep the overall method fully explicit. To ensure efficient use of the schemes, a stability analysis is performed to find bounds on the largest usable time step size as a function of the spatial step size. The numerical methods are implemented in codes which are run on NVIDIA graphics processing unit (GPU) parallel architectures. The codes running on the GPU are shown to be many times faster than their serial counterparts. The codes are developed with future usability in mind, and therefore are written to interface with MATLAB utilizing custom GPU-enabled C codes with a MEX-compiler interface. Reproducibility of results is achieved by combining the codes into a code package called NLSEmagic, which is freely distributed on a dedicated website.
APA, Harvard, Vancouver, ISO, and other styles
48

Rémy, Adrien. "Solving dense linear systems on accelerated multicore architectures." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112138/document.

Full text
Abstract:
Dans cette thèse de doctorat, nous étudions des algorithmes et des implémentations pour accélérer la résolution de systèmes linéaires denses en utilisant des architectures composées de processeurs multicœurs et d'accélérateurs. Nous nous concentrons sur des méthodes basées sur la factorisation LU. Le développement de notre code s'est fait dans le contexte de la bibliothèque MAGMA. Tout d'abord nous étudions différents solveurs CPU/GPU hybrides basés sur la factorisation LU. Ceux-ci visent à réduire le surcoût de communication dû au pivotage. Le premier est basé sur une stratégie de pivotage dite "communication avoiding" (CALU) alors que le deuxième utilise un préconditionnement aléatoire du système original pour éviter de pivoter (RBT). Nous montrons que ces deux méthodes surpassent le solveur utilisant la factorisation LU avec pivotage partiel quand elles sont utilisées sur des architectures hybrides multicœurs/GPUs. Ensuite nous développons des solveurs utilisant des techniques de randomisation appliquées sur des architectures hybrides utilisant des GPU Nvidia ou des coprocesseurs Intel Xeon Phi. Avec cette méthode, nous pouvons éviter l'important surcoût du pivotage tout en restant stable numériquement dans la plupart des cas. L'architecture hautement parallèle de ces accélérateurs nous permet d'effectuer la randomisation de notre système linéaire à un coût de calcul très faible par rapport à la durée de la factorisation. Finalement, nous étudions l'impact d'accès mémoire non uniformes (NUMA) sur la résolution de systèmes linéaires denses en utilisant un algorithme de factorisation LU. En particulier, nous illustrons comment un placement approprié des processus légers et des données sur une architecture NUMA peut améliorer les performances pour la factorisation du panel et accélérer de manière conséquente la factorisation LU globale. 
Nous montrons comment ces placements peuvent améliorer les performances quand ils sont appliqués à des solveurs hybrides multicœurs/GPU.
In this PhD thesis, we study algorithms and implementations to accelerate the solution of dense linear systems by using hybrid architectures with multicore processors and accelerators. We focus on methods based on the LU factorization, and our code development takes place in the context of the MAGMA library. We study different hybrid CPU/GPU solvers based on the LU factorization which aim at reducing the communication overhead due to pivoting. The first one is based on a communication-avoiding pivoting strategy (CALU), while the second uses a random preconditioning of the original system to avoid pivoting (RBT). We show that both of these methods outperform the solver using LU factorization with partial pivoting when implemented on hybrid multicore/GPU architectures. We also present new solvers based on randomization for hybrid architectures with Nvidia GPUs or Intel Xeon Phi coprocessors. With this method, we can avoid the high cost of pivoting while remaining numerically stable in most cases. The highly parallel architecture of these accelerators allows us to perform the randomization of our linear system at a very low computational cost compared to the time of the factorization. Finally, we investigate the impact of non-uniform memory accesses (NUMA) on the solution of dense general linear systems using an LU factorization algorithm. In particular, we illustrate how an appropriate placement of the threads and data on a NUMA architecture can improve the performance of the panel factorization and consequently accelerate the global LU factorization. We show how these placements can improve the performance when applied to hybrid multicore/GPU solvers.
APA, Harvard, Vancouver, ISO, and other styles
49

Riesinger, Christoph [Verfasser], Hans-Joachim [Akademischer Betreuer] [Gutachter] Bungartz, and Takayuki [Gutachter] Aoki. "Scalable scientific computing applications for GPU-accelerated heterogeneous systems / Christoph Riesinger ; Gutachter: Hans-Joachim Bungartz, Takayuki Aoki ; Betreuer: Hans-Joachim Bungartz." München : Universitätsbibliothek der TU München, 2017. http://d-nb.info/1138787892/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Tollefson, Mallory RaNae. "Accelerated many-body protein side-chain repacking using gpus: application to proteins implicated in hearing loss." Thesis, University of Iowa, 2017. https://ir.uiowa.edu/etd/6006.

Full text
Abstract:
With recent advances and cost reductions in next generation sequencing (NGS), the amount of genetic sequence data is increasing rapidly. However, before patient specific genetic information reaches its full potential to advance clinical diagnostics, the immense degree of genetic heterogeneity that contributes to human disease must be more fully understood. For example, although large numbers of genetic variations are discovered during clinical use of NGS, annotating and understanding the impact of such coding variations on protein phenotype remains a bottleneck (i.e. what is the molecular mechanism behind deafness phenotypes). Fortunately, computational methods are emerging that can be used to efficiently study protein coding variants, and thereby overcome the bottleneck brought on by rapid adoption of clinical sequencing. To study proteins via physics-based computational algorithms, high-quality 3D structural models are essential. These protein models can be obtained using a variety of numerical optimization methods that operate on physics-based potential energy functions. Accurate protein structures serve as input to downstream variation analysis algorithms. In this work, we applied a novel amino acid side-chain optimization algorithm, which operated on an advanced model of atomic interactions (i.e. the AMOEBA polarizable force field), to a set of 164 protein structural models implicated in deafness. The resulting models were evaluated with the MolProbity structure validation tool. MolProbity “scores” were originally calibrated to predict the quality of X-ray diffraction data used to generate a given protein model (i.e. a 1.0 Å or lower MolProbity score indicates a protein model from high quality data, while a score of 4.0 Å or higher reflects relatively poor data). In this work, the side-chain optimization algorithm improved mean MolProbity score from 2.65 Å (42nd percentile) to nearly atomic resolution at 1.41 Å (95th percentile). 
However, side-chain optimization with the AMOEBA many-body potential function is computationally expensive. Thus, a second contribution of this work is a parallelization scheme that utilizes NVIDIA graphics processing units (GPUs) to accelerate the side-chain repacking algorithm. With the use of one GPU, our side-chain optimization algorithm achieved a 25-fold speed-up compared to using two Intel Xeon E5-2680v4 central processing units (CPUs). We expect the GPU acceleration scheme to lessen demand on computing resources dedicated to protein structure optimization efforts and thereby dramatically expand the number of protein structures available to aid in interpretation of missense variations associated with deafness.
APA, Harvard, Vancouver, ISO, and other styles
