Dissertations / Theses on the topic 'GPU'
Stodůlka, Martin. "Akcelerace ultrazvukových simulací pomocí multi-GPU systémů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2021. http://www.nusl.cz/ntk/nusl-445538.
Ma, Wenjing. "Automatic Transformation and Optimization of Applications on GPUs and GPU clusters." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1300972089.
Tanasić, Ivan. "Towards multiprogrammed GPUs." Doctoral thesis, Universitat Politècnica de Catalunya, 2017. http://hdl.handle.net/10803/405796.
Programmable Graphics Processing Units (GPUs) have recently become the most widespread massively parallel processors. They have come a long way from fixed-function ASICs designed to accelerate graphics tasks to a programmable architecture that can also execute general-purpose computations. Because of their performance and efficiency, a growing amount of software relies on them to accelerate the data-parallel, computationally intensive sections of code. They have earned a place in many systems, from low-power mobile devices to the world's largest data centers. However, GPUs are still plagued by the fact that they essentially lack multiprogramming support, which results in low system performance if the GPU is shared among multiple programs. In this dissertation we focus on providing multiprogramming support for GPUs by improving multitasking capabilities and virtual memory support. The main problem hindering multitasking support in GPUs is the non-preemptive execution of GPU kernels. We propose two preemption mechanisms with different design philosophies that a scheduler can use to preempt GPU cores and assign them to other processes. We also advocate spatial partitioning of the GPU and propose a concrete implementation of a hardware scheduler that dynamically partitions GPU cores among running kernels according to their assigned priorities. Contrary to assumptions made in related work, we demonstrate that preemptive execution is feasible and the desired approach to multitasking on GPUs. Furthermore, we show improved system fairness and responsiveness with our GPU core allocation policy.
We also point out that the primary cause of the insufficient virtual memory support in GPUs is the exception handling mechanism used by modern GPUs. Today, GPUs offload exception handling to the CPU while the faulting instruction waits in the GPU core. This stall-on-fault model prevents some virtual memory features and optimizations and is especially harmful in multiprogrammed environments, because it prevents context switching on the GPU unless all pending faults are resolved. In this dissertation we propose three implementations of the GPU core pipeline that offer different performance-complexity trade-offs and allow preemption of the core even with pending exceptions (i.e., the faulting instruction can be restarted later). Building on this new capability, we implement two use cases to demonstrate its utility. The first is a scheduler that assigns the core to other threads when a fault occurs, trying to do useful work while the fault is resolved, thereby hiding fault latency and improving system performance. The second allows fault-handling code to run locally on the GPU, instead of offloading the handling to the CPU, showing that local fault handling can also improve performance.
Hong, Changwan. "Code Optimization on GPUs." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1557123832601533.
Wang, Kaibo. "Algorithmic and Software System Support to Accelerate Data Processing in CPU-GPU Hybrid Computing Environments." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1447685368.
Pedersen, Stian Aaraas. "Progressive Photon Mapping on GPUs." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, 2013. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-22995.
Harb, Mohammed. "Quantum transport modeling with GPUs." Thesis, McGill University, 2013. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=114417.
Full textDans cette thèse, nous présentons un logiciel qui effectue des calculs de transport quantique en utilisant conjointement la théorie des fonctions de Green hors équilibre (non equilibrium Green function, NEGF) et le modèle des liens étroits (tight-binding model, TB). Notre logiciel tire avantage du parallélisme inhérent aux algorithmes utilisés en plus d'être accéléré grâce à l'utilisation de processeurs graphiques (GPU). Nous abordons également les problèmes théoriques, géométriques et numériques qui se posent lors de l'implémentation du code NEGF-TB. Nous démontrons ensuite qu'une implémentation hétérogène utilisant des CPU et des GPU est supérieure aux implémentations à processeur unique, à celles à processeurs multiples, et même aux implémentations massivement parallèles n'utilisant que des CPU. Le GPU-Matlab Interface (GMI) présenté dans cette thèse fut développé pour des fins de calculs de transport quantique NEGF-TB. Néanmoins, les capacités de GMI ne se limitent pas à l'utilisation que nous en faisons ici et GMI peut être utilisé par des chercheurs de tous les domaines n'ayant pas de connaissances préalables de la programmation GPU ou de la programmation "multi-thread". Nous démontrons également que GMI compétitionne avantageusement avec des logiciels commerciaux similaires.Enfin, nous utilisons notre logiciel NEGF-TB pour étudier certaines propriétés de transport électronique de nanofils de Si et de Nanobeams. Nous examinons l'effet de plusieurs sortes de lacunes sur la conductance de ces structures et démontrons que notre méthode peut étudier des systèmes de plus de 200 000 atomes en un temps raisonnable en utilisant de un à quatre GPU sur un seul poste de travail.
Hovland, Rune Johan. "Throughput Computing on Future GPUs." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9893.
The general-purpose computing capabilities of the Graphics Processing Unit (GPU) have recently received a great deal of attention from the High-Performance Computing (HPC) community. By allowing massively parallel applications to run efficiently on commodity graphics cards, personal supercomputers are now available in desktop form at a low price. For some applications, speedups of 70 times over a single-CPU implementation have been achieved. Among the most popular GPUs are those based on the NVIDIA Tesla architecture, which allows relatively easy development of GPU applications using the NVIDIA CUDA programming environment. While the GPU is gaining interest in the HPC community, others are more reluctant to embrace it as a computational device. A focus on throughput and large data volumes separates Information Retrieval (IR) from HPC: for IR it is critical to process large amounts of data efficiently, a task at which the GPU currently does not excel. Only recently has the IR community begun to explore the possibilities, and an implementation of a search engine for the GPU was published in April 2009. This thesis analyzes how GPUs can be improved to better suit large-data-volume applications. Current graphics cards have a bottleneck in the transfer of data between the host and the GPU. One approach to resolving this bottleneck is to include the host memory as part of the GPU's memory hierarchy. We develop a theoretical model, and based on this model, the expected performance improvement for high-data-volume applications is shown for both computationally bound and transfer-bound applications. The performance improvement for an existing search engine is also derived from the theoretical model. For this case, the improvements would yield a speedup of between 1.389 and 1.874 for the various query types supported by the search engine.
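The speedup figures in this abstract come from a transfer-versus-compute cost model. The idea can be sketched as a toy model (the function name, parameters, and numbers below are illustrative, not the thesis's actual model):

```python
def speedup_with_unified_memory(t_compute, t_transfer, residual_fraction=0.0):
    """Toy model: the baseline pays an explicit host-to-GPU transfer plus
    compute; the improved design removes (most of) the transfer cost."""
    baseline = t_transfer + t_compute
    improved = t_compute + residual_fraction * t_transfer
    return baseline / improved

# A transfer-bound workload benefits far more than a compute-bound one.
transfer_bound = speedup_with_unified_memory(t_compute=1.0, t_transfer=4.0)
compute_bound = speedup_with_unified_memory(t_compute=4.0, t_transfer=1.0)
```

Under this kind of model, the predicted gain depends only on the ratio of transfer time to compute time, which is why the thesis reports a range of speedups across query types.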
Kim, Jinsung. "Optimizing Tensor Contractions on GPUs." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1563237825735994.
Tadros, Rimon. "Accelerating web search using GPUs." Thesis, University of British Columbia, 2015. http://hdl.handle.net/2429/54722.
Full textApplied Science, Faculty of
Electrical and Computer Engineering, Department of
Graduate
Lapajne, Mikael Hellborg, and Daniel Slat. "Random Forests for CUDA GPUs." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-2953.
Polok, Lukáš. "WaldBoost na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236790.
Straňák, Marek. "Raytracing na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-237020.
Yanggratoke, Rerngvit. "GPU Network Processing." Thesis, KTH, Telekommunikationssystem, TSLab, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-103694.
Full textNätverksteknik ansluter fler och fler människor runt om i världen. Det har blivit en viktig del av vårt dagliga liv. För att denna anslutning skall vara sömlös, måste nätet vara snabbt. Den snabba tillväxten i nätverkstrafiken och olika kommunikationsprotokoll sätter stora krav på processorer som hanterar all trafik. Befintliga lösningar på detta problem, t.ex. ASIC, FPGA, NPU, och TOE är varken kostnadseffektivt eller lätta att hantera, eftersom de kräver speciell hårdvara och anpassade konfigurationer. Denna avhandling angriper problemet på ett annat sätt genom att avlasta nätverks processningen till grafikprocessorer som sitter i vanliga pc-grafikkort. Avhandlingen främsta mål är att ta reda på hur GPU bör användas för detta. Avhandlingen följer fallstudie modell och de valda fallen är lager 2 Bloom filter forwardering och ``flow lookup'' i Openflow switch. Implementerings alternativ och utvärderingsmetodik föreslås för både fallstudierna. Sedan utvecklas och utvärderas en prototyp för att jämföra mellan traditionell CPU- och GPU-offload. Det primära resultatet från detta arbete utgör kriterier för nätvärksprocessfunktioner lämpade för GPU offload och vilka kompromisser som måste göras. Kriterier är inget inter-paket beroende, liknande processflöde för alla paket. och möjlighet att köra fler processer på ett paket paralellt. GPU offloading ger ökad fördröjning och minneskonsumption till förmån för högre troughput.
Berrajaa, Achraf. "Parallélisation d'heuristiques d'optimisation sur les GPUs." Thesis, Normandie, 2018. http://www.theses.fr/2018NORMLH31/document.
This thesis presents contributions to the resolution (on GPUs) of large real-world optimization problems. The vehicle routing problem (VRP) and the hub location problem (HLP) are treated. Various approaches are implemented on the GPU to solve variants of the VRP. A parallel genetic algorithm (GA) on the GPU is proposed to solve different variants of the HLP. The proposed GA adapts its encoding, initial solution, genetic operators, and implementation to each of the variants treated. Finally, we used the GA to solve the HLP with uncertainty in the data. Numerical tests show that the proposed approaches effectively exploit the computing power of the GPU and make it possible to solve large instances of up to 6000 nodes.
SILVA, CLEOMAR PEREIRA DA. "MASSIVELY PARALLEL GENETIC PROGRAMMING ON GPUS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2014. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=24129@1.
Genetic programming enables computers to solve problems automatically, without being programmed for them. Inspired by Darwin's principle of natural selection, a population of programs, or individuals, is maintained, modified based on genetic variation, and evaluated according to a fitness function. Genetic programming has been successfully applied to many different applications, such as automatic design, pattern recognition, robotic control, data mining, and image analysis. However, the evaluation of the huge number of individuals generated requires excessive computation, leading to extremely long run times for large problems. This work exploits the high computational power of graphics processing units, or GPUs, to accelerate genetic programming and to enable the automatic generation of programs for large problems. We propose two new methodologies to exploit the power of the GPU in genetic programming: intermediate-language compilation and the creation of individuals in machine language. These methodologies have advantages over traditional methods used in the literature. The use of an intermediate language reduces the compilation steps and works with instructions that are well documented. The creation of individuals in machine language has no compilation step, but requires reverse engineering of instructions that are not documented at this level. Our methodologies are based on linear genetic programming and are inspired by quantum computing. The use of quantum computing allows rapid convergence, global search capability, and inclusion of individuals' past history. The proposed methodologies were compared against existing ones and showed considerable performance gains. A maximum performance of 2.74 trillion GPops (genetic programming operations per second) was observed on the 20-bit multiplexer benchmark, and it was possible to extend genetic programming to problems with databases of up to 7 million samples.
Dublish, Saumay Kumar. "Managing the memory hierarchy in GPUs." Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/31205.
Rawat, Prashant Singh. "Optimization of Stencil Computations on GPUs." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1523037713249436.
Suri, Bharath. "Accelerating the knapsack problem on GPUs." Thesis, Linköpings universitet, ESLAB - Laboratoriet för inbyggda system, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-70406.
Björn, Overå. "Skinning på GPUn : Med dubbel kvaternioner." Thesis, Linnéuniversitetet, Institutionen för datavetenskap, fysik och matematik, DFM, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-21364.
Dollinger, Jean-François. "A framework for efficient execution on GPU and CPU+GPU systems." Thesis, Strasbourg, 2015. http://www.theses.fr/2015STRAD019/document.
Technological limitations faced by semiconductor manufacturers in the early 2000s restricted the increase in performance of sequential computation units. Nowadays, the trend is to increase the number of processor cores per socket and to progressively use GPU cards for highly parallel computations. The complexity of recent architectures makes it difficult to statically predict the performance of a program. We describe a reliable and accurate execution-time prediction method for parallel loop nests on GPUs, based on three stages: static code generation, offline profiling, and online prediction. In addition, we present two techniques to fully exploit the computing resources available on a system. The first technique consists in jointly using the CPU and GPU to execute a code. To achieve higher performance, it is mandatory to balance the load, in particular by predicting execution time. The runtime uses the profiling results, and the scheduler computes the execution times and adjusts the load distributed to the processors. The second technique puts the CPU and GPU in competition: instances of the considered code are executed simultaneously on the CPU and GPU. The winner of the competition notifies the other instance of its completion, terminating the latter.
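The joint CPU+GPU technique above needs a load split that equalizes the predicted finish times of the two devices. A sketch of that balancing step, under the simplifying assumption (mine, not the thesis's) that execution time scales linearly with iteration count:

```python
def split_iterations(total, cpu_time_per_iter, gpu_time_per_iter):
    """Split 'total' iterations so predicted finish times match:
    n_cpu * t_cpu == n_gpu * t_gpu, with n_cpu + n_gpu == total."""
    gpu_share = cpu_time_per_iter / (cpu_time_per_iter + gpu_time_per_iter)
    n_gpu = round(total * gpu_share)
    return total - n_gpu, n_gpu

# Illustrative per-iteration times: the GPU is 4x faster per iteration,
# so it receives 4x the work and both devices finish together.
n_cpu, n_gpu = split_iterations(1000, cpu_time_per_iter=8e-6,
                                gpu_time_per_iter=2e-6)
```

In the actual framework the per-iteration times would come from the offline profiling stage rather than being constants.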
Arvid, Johnsson. "Analysis of GPU accelerated OpenCL applications on the Intel HD 4600 GPU." Thesis, Linköpings universitet, Programvara och system, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-140124.
Full textVan, Luong Thé. "Métaheuristiques parallèles sur GPU." Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2011. http://tel.archives-ouvertes.fr/tel-00638820.
Full textJensen, Jørgen Haavind. "Hydrodynamiske beregninger vha GPU." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for marin teknikk, 2010. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-11549.
Full textLien, Geir Josten. "Auto-tunable GPU BLAS." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, 2012. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-18411.
Full textTokdemir, Serpil. "DCT Implementation on GPU." Digital Archive @ GSU, 2006. http://digitalarchive.gsu.edu/cs_theses/33.
Full textYuan, George Lai. "GPU compute memory systems." Thesis, University of British Columbia, 2009. http://hdl.handle.net/2429/15877.
Full textTokdemir, Serpil. "Digital compression on GPU." unrestricted, 2006. http://etd.gsu.edu/theses/available/etd-12012006-154433/.
Title from dissertation title page. Saeid Belkasim, committee chair; Ying Zhu, A.P. Preethy, committee members. Electronic text (90 p. : ill. (some col.)). Description based on contents viewed May 2, 2007. Includes bibliographical references (p. 78-81).
Young, Bobby Dalton. "MPI WITHIN A GPU." UKnowledge, 2009. http://uknowledge.uky.edu/gradschool_theses/614.
Blomquist, Linus, and Hampus Engström. "GPU based IP forwarding." Thesis, Linköpings universitet, Programvara och system, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-119433.
Full textMondal, Siddhartha Sankar. "GPU: Power vs Performance." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-237364.
Full textBORDIGNON, ALEX LAIER. "NAVIER-STOKES EM GPU." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2006. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=8928@1.
In this work we show how to simulate fluids in two dimensions in a domain with arbitrary boundaries. Our work is based on the stable fluids scheme developed by Jos Stam. The implementation is done on the GPU (Graphics Processing Unit), allowing interactive speeds. We use the Cg language (C for Graphics), developed by NVIDIA. Our main contributions are the treatment of domains with multiple boundaries, where we apply bilinear interpolation to obtain better results; the storage of the boundary conditions in a single texture channel; and the use of vorticity confinement.
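Stable-fluids advection traces velocities backward in time and samples the previous field with bilinear interpolation, the same operation the abstract applies near boundaries. A minimal sketch of that sampling step on a 2D grid (on the GPU this is what the texture-filtering hardware computes):

```python
def bilerp(grid, x, y):
    """Sample a 2D field at fractional coordinates (x, y) by
    bilinear interpolation of the four surrounding cell values."""
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two bracketing rows, then along y.
    a = grid[y0][x0] * (1 - fx) + grid[y0][x0 + 1] * fx
    b = grid[y0 + 1][x0] * (1 - fx) + grid[y0 + 1][x0 + 1] * fx
    return a * (1 - fy) + b * fy

field = [[0.0, 1.0],
         [2.0, 3.0]]
```

Sampling at the cell center blends all four values equally, which is what smooths the field across irregular boundary cells.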
DUARTE, LEONARDO SEPERUELO. "GRAINS SIMULATION ON GPU." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2009. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=16008@1.
The purpose of this work is to enable and speed up the simulation of a grain system implemented entirely on the GPU, using the Discrete Element Method (DEM). The goal of implementing the whole system on the GPU is to avoid the cost of data transfer between the graphics hardware and the CPU. The proposed system simulates particles of different diameters, with collision handling between particles and between particles and the environment. The Discrete Element Method considers normal and tangential forces applied to the particles. Parallel algorithms were designed to construct and store the history of tangential forces present in each contact between particles. Two approaches to constructing the regular grid of cells used for collision detection are proposed and compared. The first is very efficient for particles with a fixed radius, while the second shows better scalability for models with varying radii. The system consists of several algorithms running in threads, each responsible for a step of the simulation. The simulation results were validated against the commercial program PFC3D. The GPU particle system can be up to 10 times faster than the commercial program.
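The fixed-radius grid approach maps each particle to a cell keyed by its position, so contact tests only need to consider the same and adjacent cells. A CPU-side sketch of the idea (the cell size is assumed to be at least one particle diameter; the GPU version would build and scan the grid in parallel):

```python
from collections import defaultdict
from itertools import product

def build_grid(positions, cell_size):
    """Bucket particle indices by the integer cell containing them."""
    grid = defaultdict(list)
    for i, (x, y, z) in enumerate(positions):
        key = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        grid[key].append(i)
    return grid

def candidate_pairs(positions, cell_size):
    """Contact candidates: pairs of particles in the same or adjacent cells."""
    grid = build_grid(positions, cell_size)
    pairs = set()
    for (cx, cy, cz), members in grid.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j:
                        pairs.add((i, j))
    return pairs

pts = [(0.1, 0.1, 0.1), (0.2, 0.1, 0.1), (5.0, 5.0, 5.0)]
pairs = candidate_pairs(pts, cell_size=1.0)
```

Only nearby particles survive the broad phase; the exact overlap test and the normal/tangential force computation then run on this much smaller candidate set.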
Lionetti, Fred. "GPU accelerated cardiac electrophysiology." Diss., [La Jolla] : University of California, San Diego, 2010. http://wwwlib.umi.com/cr/ucsd/fullcit?p1474756.
Title from first page of PDF file (viewed April 14, 2010). Available via ProQuest Digital Dissertations. Includes bibliographical references (p. 85-89).
Souza, Thársis Tuani Pinto. "Simulações Financeiras em GPU." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-23052013-234703/.
Given the uncertainty of their variables, it is common to model financial problems with stochastic processes. Furthermore, real problems in this area have a high computational cost, which suggests the use of High Performance Computing (HPC) to handle them. New generations of graphics hardware (GPUs) enable general-purpose computing while maintaining high memory bandwidth and large computing power. This type of architecture is therefore an excellent alternative for HPC and computational finance. The main purpose of this work is to study the computational and mathematical tools needed for stochastic modeling in finance using GPUs. We present GPUs as a platform for general-purpose computing. We then analyze a variety of random number generators, in both sequential and parallel architectures, and introduce the fundamental mathematical tools of stochastic calculus and Monte Carlo simulation. With this background, we present two case studies in finance: "Optimal Trading Stops" and "Market Risk Management". In the first case, we solve the problem of obtaining the optimal gain on a "stop gain" stock trading strategy. The proposed solution is scalable and has inherent parallelism on the GPU. For the second case, we propose a parallel algorithm to compute market risk, as well as techniques for improving the quality of the solutions. In our experiments, there was a 4-fold improvement in the quality of the stochastic simulation and an acceleration of over 50 times.
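A stop-gain strategy of the kind studied here can be explored by Monte Carlo over geometric Brownian motion price paths. A small sequential sketch (all parameters are illustrative; the GPU version would assign one path per thread):

```python
import math
import random

def stop_gain_hit_fraction(s0, target, mu, sigma, dt, steps, n_paths, seed=42):
    """Fraction of simulated GBM paths that reach the stop-gain target."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_paths):
        s = s0
        for _ in range(steps):
            z = rng.gauss(0.0, 1.0)
            # Exact GBM step: S *= exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z)
            s *= math.exp((mu - 0.5 * sigma**2) * dt
                          + sigma * math.sqrt(dt) * z)
            if s >= target:
                hits += 1
                break
    return hits / n_paths

frac = stop_gain_hit_fraction(100.0, 105.0, mu=0.05, sigma=0.2,
                              dt=1 / 252, steps=252, n_paths=2000)
```

Each path is independent, which is what makes the problem embarrassingly parallel on a GPU; the quality of the parallel random number generator then becomes the limiting factor, as the thesis discusses.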
Mäkelä, J. (Jussi). "GPU accelerated face detection." Master's thesis, University of Oulu, 2013. http://urn.fi/URN:NBN:fi:oulu-201303181103.
Graphics processors are capable of massively parallel computation, and their use for general-purpose computing is a topic of growing interest. One area of interest for acceleration is computationally heavy computer vision algorithms such as face detection and recognition. Face detection is used in many applications, such as camera autofocus, face and emotion recognition, and access control. In this work, a face detection algorithm was accelerated on the GPU using the OpenCL API. The goal was improved performance while keeping the implementations functionally identical. The OpenCL version was based on an optimized reference implementation. The opportunities and challenges of accelerating the different stages of the algorithm are examined. The accelerated and reference implementations are described, and the performance difference between them is compared. Performance was evaluated by running times. The tests used three image series, each with four images of different sizes, and three additional images representing special cases. The tests were run on two differently equipped computers. The results show that face detection is well suited to GPU acceleration, since the algorithm can be parallelized and can exploit the efficient texture-handling hardware. Initializing the OpenCL environment introduces latency that reduces some of the performance advantage. In the tests, the accelerated implementation delivered performance equal to or even lower than the reference implementation in cases with little computation, due to a small or easily processed image. On the other hand, the accelerated implementation performed well compared with the reference implementation on large and complex images. In the future, the latency caused by initializing the OpenCL environment should be reduced. This work will also be of interest in the future, when OpenCL acceleration becomes available on mobile phones.
Walton, Simon. "GPU-based volume deformation." Thesis, Swansea University, 2007. https://cronfa.swan.ac.uk/Record/cronfa43117.
Frank, Igor. "Simulace kapalin na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2021. http://www.nusl.cz/ntk/nusl-445570.
Jurák, Martin. "Detekce objektů na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-234897.
Macenauer, Pavel. "Detekce objektů na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-234942.
Schmied, Jan. "GPU akcelerované prolamování šifer." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236071.
Galacz, Roman. "Photon tracing na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-236175.
Potěšil, Josef. "Akcelerace kryptografie pomocí GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-237073.
Janošík, Ondřej. "Fyzikální simulace na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255365.
Graves, Alex. "GPU-Accelerated Feature Tracking." Wright State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=wright1462372516.
Luong, Thé Van. "Métaheuristiques parallèles sur GPU." Thesis, Lille 1, 2011. http://www.theses.fr/2011LIL10058/document.
Real-world optimization problems are often complex and NP-hard. Their modeling is continuously evolving in terms of constraints and objectives, and their resolution is CPU time-consuming. Although near-optimal algorithms such as metaheuristics (generic heuristics) make it possible to reduce the temporal complexity of their resolution, they fail to tackle large problems satisfactorily. Over the last decades, parallel computing has emerged as an unavoidable way to deal with large instances of difficult optimization problems. The design and implementation of parallel metaheuristics are strongly influenced by the computing platform. GPU computing has recently proven effective for time-intensive problems, and this emerging technology is believed to be extremely useful for speeding up many complex algorithms. One of the major issues for metaheuristics is to rethink existing parallel models and programming paradigms to allow their deployment on GPU accelerators. Generally speaking, the major issues to deal with are: the distribution of data processing between CPU and GPU, thread synchronization, the optimization of data transfer between the different memories, memory capacity constraints, etc. The contribution of this thesis is to address these issues by redesigning the parallel models of metaheuristics to allow the solution of large-scale optimization problems on GPU architectures. Our objective is to rethink the existing parallel models and to enable their deployment on GPUs. We therefore propose a new generic guideline for building efficient parallel metaheuristics on GPU. Our challenge is to arrive at a GPU-based design of the whole hierarchy of parallel models. To this end, very efficient approaches are proposed for CPU-GPU data transfer optimization, thread control, mapping of solutions to GPU threads, and memory management.
These approaches have been extensively tested on five optimization problems and four GPU configurations. Compared to a CPU-based execution, experiments report up to 80-fold acceleration for large combinatorial problems and up to 2000-fold speedups for a continuous problem. The work related to this thesis has been accepted in a dozen publications, including the IEEE Transactions on Computers journal.
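A recurring pattern in such parallel models is to map one candidate solution, for example one neighbor of the current solution in a local search, to one GPU thread. A CPU-side sketch of that evaluation pattern with a toy binary encoding and objective (both are illustrative; on the GPU the `map` becomes a kernel launch):

```python
def tweak(solution, i):
    """Neighbor i: flip one bit of a binary-encoded solution."""
    return solution[:i] + (1 - solution[i],) + solution[i + 1:]

def fitness(solution):
    return sum(solution)  # toy objective: maximize the number of ones

def best_neighbor(solution):
    """Generate the whole neighborhood, evaluate every neighbor
    independently (one thread per neighbor on a GPU), keep the best."""
    neighbors = [tweak(solution, i) for i in range(len(solution))]
    scores = list(map(fitness, neighbors))
    best = max(range(len(neighbors)), key=scores.__getitem__)
    return neighbors[best]

current = (0, 1, 0, 0)
current = best_neighbor(current)
```

Because every neighbor evaluation is independent, the only sequential points are the generation of the neighborhood and the final reduction, which is where the thread-control and memory-mapping issues discussed in the abstract arise.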
Olsson, Martin Wexö. "GPU based particle system." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-3761.
Full textPettersson, Jimmy, and Ian Wainwright. "Radar Signal Processing with Graphics Processors (GPUS)." Thesis, Uppsala University, Division of Scientific Computing, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-114003.
Full textSaffar, Shamshirgar Davoud. "Accelerated Pressure Projection using OpenCL on GPUs." Thesis, KTH, Mekanik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-102771.
Full textLiu, Chi-man, and 廖志敏. "Efficient solutions for bioinformatics applications using GPUs." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2015. http://hdl.handle.net/10722/211138.
Full textpublished_or_final_version
Computer Science
Doctoral
Doctor of Philosophy