Dissertations / Theses on the topic 'NVIDIA CUDA GPU'


1

Ikeda, Patricia Akemi. "Um estudo do uso eficiente de programas em placas gráficas." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-25042012-212956/.

Abstract:
Originally designed for graphics processing, graphics cards (GPUs) have evolved into high-performance, general-purpose parallel coprocessors. Given the enormous potential they offer to many research and commercial areas, NVIDIA stands out as a pioneer for launching the CUDA architecture (compatible with several of its cards), an environment that exploits this computational power while making programming easier. To use the full capacity of the GPU, some practices must be followed. One of them is to keep the hardware as busy as possible. This work proposes a practical and extensible tool that helps the programmer choose the best configuration to achieve this goal.
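To illustrate what "keeping the hardware as busy as possible" can mean when choosing a launch configuration, here is a minimal Python sketch of a theoretical occupancy estimate. It is not the tool proposed in the thesis. The function names and all per-multiprocessor limits (threads, blocks, registers, shared memory) are illustrative assumptions, and real devices also apply allocation granularities that are ignored here.

```python
def theoretical_occupancy(threads_per_block, regs_per_thread, smem_per_block,
                          max_threads_per_sm=2048, max_blocks_per_sm=32,
                          regs_per_sm=65536, smem_per_sm=49152, warp_size=32):
    """Estimate the fraction of warp slots in use on one multiprocessor.

    All default limits are assumed, illustrative values; regs_per_thread
    must be greater than zero.
    """
    warps_per_block = -(-threads_per_block // warp_size)  # ceiling division
    max_warps_per_sm = max_threads_per_sm // warp_size

    # Each resource caps how many blocks can be resident at once.
    limit_warps = max_warps_per_sm // warps_per_block
    limit_regs = regs_per_sm // (regs_per_thread * warps_per_block * warp_size)
    limit_smem = smem_per_sm // smem_per_block if smem_per_block else max_blocks_per_sm

    blocks = min(max_blocks_per_sm, limit_warps, limit_regs, limit_smem)
    return blocks * warps_per_block / max_warps_per_sm


def best_block_size(regs_per_thread, smem_per_block,
                    candidates=(64, 128, 256, 512, 1024)):
    """Pick the candidate block size with the highest estimated occupancy."""
    return max(candidates,
               key=lambda t: theoretical_occupancy(t, regs_per_thread, smem_per_block))


if __name__ == "__main__":
    print(theoretical_occupancy(256, 32, 0))
    print(theoretical_occupancy(256, 64, 0))
```

Under these assumed limits, doubling the registers per thread halves the number of resident blocks, which is the kind of trade-off a configuration-selection tool has to weigh.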
2

Rivera-Polanco, Diego Alejandro. "COLLECTIVE COMMUNICATION AND BARRIER SYNCHRONIZATION ON NVIDIA CUDA GPU." Lexington, Ky. : [University of Kentucky Libraries], 2009. http://hdl.handle.net/10225/1158.

Abstract:
Thesis (M.S.)--University of Kentucky, 2009.
Title from document title page (viewed on May 18, 2010). Document formatted into pages; contains: ix, 88 p. : ill. Includes abstract and vita. Includes bibliographical references (p. 86-87).
3

Harvey, Jesse Patrick. "GPU acceleration of object classification algorithms using NVIDIA CUDA /." Online version of thesis, 2009. http://hdl.handle.net/1850/10894.

4

Lerchundi, Osa Gorka. "Fast Implementation of Two Hash Algorithms on nVidia CUDA GPU." Thesis, Norwegian University of Science and Technology, Department of Telematics, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9817.

Abstract:

User needs increase as time passes. We started with computers the size of a room, where punched cards served the same function as machine code object files do today, and we have now reached a point where the number of processors within our graphics processing unit is not enough for our requirements. A change in the evolution of computing is looming. We are in a transition in which sequential computation is losing ground to distributed computation. This trend is not new, and it did not begin with the arrival of easily accessible GPUs: long before, it was used in projects such as SETI@Home, fightAIDS@Home, and ClimatePrediction, which were shouting from the rooftops about what was to come. Grid computing was its formal name. Until now it was linked only to distributed systems over the network, but as the technology evolves it will take on a different meaning. nVidia, with CUDA, has been one of the first companies to make this kind of software package noteworthy. Instead of being a proof of concept, it is a real tool, one in which the true artist is the programmer who uses it and achieves performance increases. As with many innovations, a worldwide community has grown around this software package, with each member doing their bit. It is noteworthy that after the release of CUDA many software developments appeared, such as the cracking of the hitherto insurmountable WPA. The same could be said of the Sony-Toshiba-IBM (STI) alliance: it has a great community and great software (IBM is the company in charge of maintenance). Its platform is not as accessible as nVidia's, but IBM is powerful enough to enter the home-made supercomputing market. In this case, after IBM released the PS3 SDK, a well-known application named Folding@Home was created to exploit the benefits of parallel computing. Its purpose is, inter alia, to find a cure for cancer.
To sum up, this is only the beginning, and this thesis assesses the possibility of using this technology to accelerate cryptographic hash algorithms. BLUE MIDNIGHT WISH (the hash algorithm studied here) is adapted into parallel-capable code in order to produce empirical measurements that can be compared with current sequential implementations. It will answer questions that have not yet been answered. BLUE MIDNIGHT WISH is a candidate hash function for the next NIST standard, SHA-3, designed by Professor Danilo Gligoroski of NTNU and Vlastimil Klima, an independent cryptographer from the Czech Republic. So far, from the point of view of speed, BLUE MIDNIGHT WISH is at the top of the charts (generally in second place, right behind EDON-R, another hash function from Professor Danilo Gligoroski). One part of the work in this thesis was to investigate whether it is possible to achieve faster processing of Blue Midnight Wish when the computations are distributed among the cores of a CUDA device. My numerous experiments give a clear answer: NO. Although the answer is negative, it still has significant scientific value. The point is that my work supports the views of a part of the cryptographic community that doubts whether cryptographic primitives will benefit from being executed in parallel on many cores in one CPU. Indeed, my experiments show that the communication costs between cores in CUDA outweigh by a large margin the computational costs incurred inside one core (processor) unit.

5

Sreenibha, Reddy Byreddy. "Performance Metrics Analysis of GamingAnywhere with GPU accelerated Nvidia CUDA." Thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-16846.

Abstract:
The modern world has opened the gates to many advancements in cloud computing, particularly in the field of cloud gaming. The most recent development in this area is the open-source cloud gaming system called GamingAnywhere. The relationship between the CPU and the GPU is the main focus of this thesis. The performance of the Graphics Processing Unit (GPU) plays a vital role in analyzing the playing experience and in enhancing GamingAnywhere. This thesis concentrates on virtualization of the GPU and suggests that accelerating this unit using NVIDIA CUDA is the key to better performance when using GamingAnywhere. After extensive research, gVirtuS was chosen as the technique for employing NVIDIA CUDA. An experimental study was conducted to evaluate the feasibility and performance of GPU solutions by VMware in cloud gaming scenarios provided by GamingAnywhere. Performance is measured in terms of bitrate, packet loss, jitter, and frame rate. Different game resolutions are considered in our empirical research, and our results show that the frame rate and bitrate increased with different resolutions and with the use of an NVIDIA CUDA-enhanced GPU.
6

Savioli, Nicolo'. "Parallelization of the algorithm WHAM with NVIDIA CUDA." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2013. http://amslaurea.unibo.it/6377/.

Abstract:
The aim of my thesis is to parallelize the Weighted Histogram Analysis Method (WHAM), a popular algorithm used to calculate the free energy of a molecular system in Molecular Dynamics simulations. WHAM works as a post-processing step in cooperation with another algorithm called Umbrella Sampling. Umbrella Sampling adds a bias to the potential energy of the system in order to force the system to sample a specific region of the configurational space. N independent simulations are performed in order to sample the whole region of interest. Subsequently, the WHAM algorithm is used to estimate the original system energy starting from the N atomic trajectories. WHAM was parallelized with CUDA, a programming model that allows code to run on the GPUs of NVIDIA graphics cards, which have a parallel architecture. The parallel implementation may noticeably speed up WHAM execution compared to previous serial CPU implementations. However, the WHAM CPU code presents some runtime bottlenecks at very high numbers of interactions. The algorithm was written in C++ and executed on UNIX systems equipped with NVIDIA graphics cards. The results were satisfactory, showing a performance increase when the model was executed on graphics cards with higher compute capability. Nonetheless, the GPUs used to test the algorithm are quite old and not designed for scientific calculations. A further performance increase is likely if the algorithm is executed on clusters of GPUs with a high level of computational efficiency. The thesis is organized as follows: I first describe the mathematical formulation of Umbrella Sampling and the WHAM algorithm, with their applications in the study of ionic channels and in molecular docking (Chapter 1); then I present the CUDA architectures used to implement the model (Chapter 2); finally, the results obtained on model systems are presented (Chapter 3).
7

Zaahid, Mohammed. "Performance Metrics Analysis of GamingAnywhere with GPU acceletayed NVIDIA CUDA using gVirtuS." Thesis, Blekinge Tekniska Högskola, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-16852.

Abstract:
The modern world has opened the gates to many advancements in cloud computing, particularly in the field of cloud gaming. The most recent development in this area is the open-source cloud gaming system called GamingAnywhere. The relationship between the CPU and the GPU is the main focus of this thesis. The performance of the Graphics Processing Unit (GPU) plays a vital role in analyzing the playing experience and in enhancing GamingAnywhere. This thesis concentrates on virtualization of the GPU and suggests that accelerating this unit using NVIDIA CUDA is the key to better performance when using GamingAnywhere. After extensive research, gVirtuS was chosen as the technique for employing NVIDIA CUDA. An experimental study was conducted to evaluate the feasibility and performance of GPU solutions by VMware in cloud gaming scenarios provided by GamingAnywhere. Performance is measured in terms of bitrate, packet loss, jitter, and frame rate. Different game resolutions are considered in our empirical research, and our results show that the frame rate and bitrate increased with different resolutions and with the use of an NVIDIA CUDA-enhanced GPU.
8

Virk, Bikram. "Implementing method of moments on a GPGPU using Nvidia CUDA." Thesis, Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/33980.

Abstract:
This thesis concentrates on the algorithmic aspects of Method of Moments (MoM) and Locally Corrected Nyström (LCN) numerical methods in electromagnetics. The data dependency in each step of the algorithm is analyzed to implement a parallel version that can harness the powerful processing power of a General Purpose Graphics Processing Unit (GPGPU). The GPGPU programming model provided by NVIDIA's Compute Unified Device Architecture (CUDA) is described to learn the software tools at hand enabling us to implement C code on the GPGPU. Various optimizations such as the partial update at every iteration, inter-block synchronization and using shared memory enable us to achieve an overall speedup of approximately 10. The study also brings out the strengths and weaknesses in implementing different methods such as Crout's LU decomposition and triangular matrix inversion on a GPGPU architecture. The results suggest future directions of study in different algorithms and their effectiveness on a parallel processor environment. The performance data collected show how different features of the GPGPU architecture can be enhanced to yield higher speedup.
9

Ekstam, Ljusegren Hannes, and Hannes Jonsson. "Parallelizing Digital Signal Processing for GPU." Thesis, Linköpings universitet, Programvara och system, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-167189.

Abstract:
Because of the increasing importance of signal processing in today's society, there is a need to easily experiment with new ways to process signals. Usually, fast digital signal processing is done with special-purpose hardware that is difficult to develop for. GPUs offer an alternative for fast digital signal processing. The work in this thesis is an analysis and implementation of a GPU version of a digital signal processing chain provided by SAAB. Through an iterative process of development and testing, a final implementation was achieved. Two benchmarks, each comprising 4.2 M test samples, were made to compare the CPU implementation with the GPU implementation. The benchmarks were run on three different platforms: a desktop computer, an NVIDIA Jetson AGX Xavier, and an NVIDIA Jetson TX2. The results show that the parallelized version can reach throughput several orders of magnitude higher than the CPU implementation.
10

Araújo, João Manuel da Silva. "Paralelização de algoritmos de Filtragem baseados em XPATH/XML com recurso a GPUs." Master's thesis, FCT - UNL, 2009. http://hdl.handle.net/10362/2530.

Abstract:
Master's dissertation in Computer Engineering
This dissertation studies the feasibility of using GPUs for parallel processing applied to notification filtering algorithms in a publish/subscribe system. To this end, experimental results were compared between the sequential version (on CPUs) and the parallel version of a filtering algorithm chosen as a reference. This analysis sought to provide evidence for assessing whether any gains from exploiting GPUs are sufficient to compensate for the greater complexity of the process.
11

Shi, Bobo. "Implementation and Performance Analysis of Many-body Quantum Chemical Methods on the Intel Xeon Phi Coprocessor and NVIDIA GPU Accelerator." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1462793739.

12

Senthil, Kumar Nithin. "Designing optimized MPI+NCCL hybrid collective communication routines for dense many-GPU clusters." The Ohio State University, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=osu1619132252608831.

13

Macenauer, Pavel. "Detekce objektů na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-234942.

Abstract:
This thesis addresses the topic of object detection on graphics processing units. As a part of it, a system for object detection using NVIDIA CUDA was designed and implemented, allowing for realtime video object detection and bulk processing. Its contribution is mainly to study the options of NVIDIA CUDA technology and current graphics processing units for object detection acceleration. Also parallel algorithms for object detection are discussed and suggested.
14

Straňák, Marek. "Raytracing na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-237020.

Abstract:
Raytracing is a basic technique for displaying 3D objects. The goal of this thesis is to demonstrate the possibility of implementing a raytracer using a programmable GPU. The algorithm and its modified version, implemented in the "C for CUDA" language, are described. The raytracer is focused on displaying dynamic scenes. For this purpose, the KD tree structure, bounding volume hierarchies, and PBO transfer are used. To achieve realistic output, photon mapping was implemented.
15

Bartosch, Nadine. "Correspondence-based pairwise depth estimation with parallel acceleration." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-34372.

Abstract:
This report covers the implementation and evaluation of a stereo vision correspondence-based depth estimation algorithm on a GPU. The results and feedback are used for a multi-view camera system in combination with Jetson TK1 devices for parallelized image processing; the aim of this system is to estimate the depth of the scenery in front of it. The performance of the algorithm plays the key role. Alongside the implementation, the objective of this study is to investigate the advantages of parallel acceleration, inter alia the differences from execution on a CPU, which are significant for all the functions; the overheads particular to a GPU application, such as memory transfer from the CPU to the GPU and vice versa; and the challenges of real-time and concurrent execution. The study was conducted with the aid of CUDA on three NVIDIA GPUs with different characteristics, and with knowledge gained through an extensive literature study of different depth estimation algorithms, stereo vision and correspondence, and CUDA in general. Using the full set of components of the algorithm and expecting (near) real-time execution is utopian in this setup and implementation; the slowing factors include the semi-global matching. Investigating alternatives shows that disparity maps of a certain accuracy are also achieved by local methods such as the Hamming distance alone and by a filter that refines the results. Furthermore, it is demonstrated that the kernel launch configuration and the use of GPU memory types such as shared memory are crucial for GPU implementations and affect the performance of the algorithm. Concurrency alone proves to be a more complicated task, especially in the desired way of realization.
For future work and refinement of the algorithm, it is therefore recommended to invest more time in further optimization possibilities with regard to shared memory and in integrating the algorithm into the actual pipeline.
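The local matching idea mentioned in this abstract (choosing a disparity by Hamming distance alone) can be sketched in a few lines of Python. This is only an illustration under assumptions: it presumes each pixel has already been encoded as an integer bit string (for example by a census transform, which the abstract does not specify), and the function names are hypothetical rather than taken from the thesis.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded descriptors."""
    return bin(a ^ b).count("1")


def best_disparity(left_codes, right_codes, x, max_disp):
    """Return the disparity d in [0, max_disp] with the lowest Hamming cost.

    left_codes and right_codes are descriptors along one rectified scanline
    (assumed input). Pixel x in the left row is compared with pixel x - d in
    the right row; ties keep the smallest disparity.
    """
    best_d, best_cost = 0, None
    for d in range(0, min(max_disp, x) + 1):
        cost = hamming(left_codes[x], right_codes[x - d])
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


if __name__ == "__main__":
    left = [0b0000, 0b1111, 0b1010, 0b0110]
    right = [0b1010, 0b0110, 0b0000, 0b0001]
    print(best_disparity(left, right, x=2, max_disp=3))
```

Because every pixel is evaluated independently, this per-pixel search is the sort of work that maps naturally onto one GPU thread per pixel, whereas semi-global matching adds dependencies between neighbouring pixels.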
16

Chehaimi, Omar. "Parallelizzazione dell'algoritmo di ricostruzione di Feldkamp-Davis-Kress per architetture Low-Power di tipo System-On-Chip." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/13918/.

Abstract:
This thesis, carried out at CNAF, presents the results of work on parallelizing the Feldkamp-Davis-Kress (FDK) tomographic reconstruction algorithm in CUDA, based on software available in both sequential and MPI-parallel versions and developed at the laboratories of the X-ray Imaging Group. The work has two main objectives: to significantly reduce the execution time of the FDK reconstruction algorithm by parallelizing it on a Graphics Processing Unit (GPU), and to evaluate energy consumption on different types of architecture. The platforms examined are low-power SoC (System-on-Chip) boards, which have low energy consumption but limited computing power, and High Performance Computing (HPC) systems, which have high computing power but substantial energy consumption. The aim is to highlight the difference in performance in relation to the type of architecture and its energy consumption. Replacing HPC nodes with low-power SoC boards has the advantage of reducing consumption and hardware complexity and makes it possible to obtain results directly on site. The results show that parallelizing FDK on the GPU is the most efficient choice. It is always, and on every architecture tested, faster than the MPI version, even though the MPI version parallelizes the whole algorithm whereas the CUDA version parallelizes only the reconstruction phase. In addition, a GPU utilization efficiency of 100% was reached. Energy efficiency relative to time performance is better for the SoC architectures than for the HPC ones. Finally, a hybrid approach combining MPI with CUDA is proposed that further improves execution performance. Since filtering and reconstruction are independent operations, the most efficient implementation is used for each: filtering in MPI and reconstruction in CUDA.
17

Čermák, Michal. "Detekce pohyblivého objektu ve videu na CUDA." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-236992.

Abstract:
This thesis deals with a model-based approach to 3D tracking from monocular video. The 3D model pose is dynamically estimated by minimizing an objective function with a particle filter. The objective function is based on the similarity between the rendered scene and the real video.
18

Pecháček, Václav. "Akcelerace heuristických metod diskrétní optimalizace na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236550.

Abstract:
This thesis deals with discrete optimization problems. It focuses on faster ways to find good solutions by means of heuristics and parallel processing. Based on the ant colony optimization (ACO) algorithm coupled with a k-optimization local search approach, it aims at massively parallel computing on graphics processors provided by the Nvidia CUDA platform. The well-known travelling salesman problem (TSP) is used as a case study. The solution is based on dividing the task into subproblems using tour-based partitioning, processing the distinct parts in parallel, and then recombining them. The provided parallel code can perform the computation more than seventeen times faster than the sequential version.
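As a small illustration of the local search component named in this abstract, the following Python sketch applies 2-opt, the simplest member of the k-opt family, to a TSP tour. It is a sequential sketch under assumptions, not the thesis's CUDA code: the choice of k = 2, the Euclidean distance, and the function names are illustrative, and the ant colony construction and tour-based partitioning steps are not shown.

```python
import math


def tour_length(tour, pts):
    """Total Euclidean length of a closed tour over the points in pts."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def two_opt(tour, pts):
    """Repeatedly reverse tour segments while doing so shortens the tour."""
    tour = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                # Reversing tour[i..j] replaces two edges with two others.
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_length(candidate, pts) < tour_length(tour, pts) - 1e-12:
                    tour, improved = candidate, True
    return tour


if __name__ == "__main__":
    points = [(0, 0), (1, 1), (1, 0), (0, 1)]  # corners of a unit square
    print(two_opt([0, 1, 2, 3], points))
```

Recomputing the full tour length for every candidate keeps the sketch short; a practical implementation would evaluate only the change in the two swapped edges, and independent sub-tours could then be improved in parallel.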
19

Pospíchal, Petr. "Akcelerace genetického algoritmu s využitím GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236783.

Abstract:
This master's thesis focuses on accelerating genetic algorithms using the GPU. The first chapter analyses genetic algorithms in depth, along with related topics such as population, chromosome, crossover, mutation, and selection. The next part of the thesis shows GPU capabilities for unified computing using both DirectX/OpenGL with Cg and specialized GPGPU libraries such as CUDA. The fourth chapter focuses on the design of a GPU implementation using CUDA; coarse-grained and fine-grained GAs are discussed, complemented by GPU-accelerated sorting and random number generation. The next chapter covers implementation details: migration, crossover, and selection schemes mapped onto the CUDA software model. All GA elements and the quality of the GPU results are described in the last chapter.
20

Mintěl, Tomáš. "Interpolace obrazových bodů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236736.

Abstract:
This master's thesis deals with the acceleration of pixel interpolation methods using the GPU and the NVIDIA® CUDA™ architecture. The graphical output is a demonstration application for geometric image transforms using a chosen interpolation method. Time-critical parts of the code are moved to the GPU and executed in parallel. Highly optimized routines from the OpenCV library, created by Intel for image and video processing, are used.
21

Němeček, Petr. "Geometrické transformace obrazu." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236764.

Abstract:
This master's thesis deals with the acceleration of geometric image transforms using the GPU and the NVIDIA® CUDA™ architecture. Time-critical parts of the code are moved to the GPU and executed in parallel. One of the results is a demonstration application for comparing the performance of both architectures: the CPU alone, and the GPU in combination with the CPU. Highly optimized routines from the OpenCV library, created by Intel, are used as a reference implementation.
22

Hordemann, Glen J. "Exploring High Performance SQL Databases with Graphics Processing Units." Bowling Green State University / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1380125703.

23

Venkatasubramanian, Sundaresan. "Tuned and asynchronous stencil kernels for CPU/GPU systems." Thesis, Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/29728.

Abstract:
Thesis (M. S.)--Computing, Georgia Institute of Technology, 2009.
Committee Chair: Vuduc, Richard; Committee Member: Kim, Hyesoon; Committee Member: Vetter, Jeffrey. Part of the SMARTech Electronic Thesis and Dissertation Collection.
24

Music, Sani. "Grafikkort till parallella beräkningar." Thesis, Malmö högskola, Fakulteten för teknik och samhälle (TS), 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20150.

Abstract:
This study describes how graphics cards can be used for general-purpose computing, which differs from multimedia, the field where graphics cards are most commonly used. The study describes and discusses the main present-day alternatives for using graphics cards for general operations. In this study we use and describe the Nvidia CUDA architecture. The study describes how graphics cards can be used for general operations in practice, from the point of view of a reader who already has programming knowledge in some high-level programming language and basic knowledge of how a computer works. We use accelerated libraries on the graphics card (THRUST and CUBLAS) to achieve our goals, which are software development and benchmarking. The results are programs that solve certain problems (matrix multiplication, sorting, binary search, and vector inversion) together with the execution time and speedup of these programs. The graphics card is compared to the processor running serially and in parallel. The results show a speedup of up to approximately 50 times compared to serial implementations on the processor.
25

Adeboye, Taiyelolu. "Robot Goalkeeper : A robotic goalkeeper based on machine vision and motor control." Thesis, Högskolan i Gävle, Avdelningen för elektronik, matematik och naturvetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-27561.

Abstract:
This report shows a robust and efficient implementation of a speed-optimized algorithm for object recognition, 3D real-world location, and tracking in real time. It details a design focused on detecting and following objects in flight, as applied to a football in motion. An overall goal of the design was to develop a system capable of recognizing an object and its present and near-future location while also actuating a robotic arm in response to the motion of the ball in flight. The implementation made use of image processing functions in C++, an NVIDIA Jetson TX1, and Stereolabs' ZED stereoscopic camera setup, connected to an embedded system controller for the robot arm. The image processing was done against a textured background, and the 3D location coordinates were used in the correction step of a Kalman filter model that estimated and predicted the ball location. A capture and processing speed of 59.4 frames per second was obtained with good accuracy in depth detection, and the ball was well tracked in the tests carried out.
26

Subramoniapillai, Ajeetha Saktheesh. "Architectural Analysis and Performance Characterization of NVIDIA GPUs using Microbenchmarking." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1344623484.

27

Cazalas, Jonathan M. "Efficient and Scalable Evaluation of Continuous, Spatio-temporal Queries in Mobile Computing Environments." Doctoral diss., University of Central Florida, 2012. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5154.

Abstract:
A variety of research exists for the processing of continuous queries in large, mobile environments. Each method tries, in its own way, to address the computational bottleneck of constantly processing so many queries. For this research, we present a two-pronged approach to addressing this problem. Firstly, we introduce an efficient and scalable system for monitoring traditional, continuous queries by leveraging the parallel processing capability of the Graphics Processing Unit (GPU). We examine a naive CPU-based solution for continuous range-monitoring queries, and we then extend this system using the GPU. Additionally, with mobile communication devices becoming commodities, location-based services will become ubiquitous. To cope with the very high intensity of location-based queries, we propose a view-oriented approach to the location database, thereby reducing computation costs by exploiting computation sharing amongst queries requiring the same view. Our studies show that by exploiting the parallel processing power of the GPU, we are able to significantly scale the number of mobile objects, while maintaining an acceptable level of performance. Our second approach was to view this research problem as one belonging to the domain of data streams. Several works have convincingly argued that the two research fields of spatio-temporal data streams and the management of moving objects can naturally come together [IlMI10, ChFr03, MoXA04]. For example, the output of a GPS receiver, monitoring the position of a mobile object, is viewed as a data stream of location updates. This data stream of location updates, along with those from the plausibly many other mobile objects, is received at a centralized server, which processes the streams upon arrival, effectively updating the answers to the currently active queries in real time.
For this second approach, we present GEDS, a scalable, GPU-based framework for the evaluation of continuous spatio-temporal queries over spatio-temporal data streams. Specifically, GEDS employs the computation sharing and parallel processing paradigms to deliver scalability in the evaluation of continuous, spatio-temporal range queries and continuous, spatio-temporal kNN queries. The GEDS framework utilizes the parallel processing capability of the GPU, a stream processor by trade, to handle the computation required in this application. Experimental evaluation shows promising performance and demonstrates the scalability and efficacy of GEDS in spatio-temporal data streaming environments. Additional performance studies demonstrate that, even in light of the costs associated with memory transfers, the parallel processing power provided by GEDS clearly outweighs those costs. Finally, in an effort to move beyond the analysis of specific algorithms over the GEDS framework, we take a broader approach in our analysis of GPU computing. What algorithms are appropriate for the GPU? What types of applications can benefit from the parallel and stream processing power of the GPU? And can we identify a class of algorithms that are best suited for GPU computing? To answer these questions, we develop an abstract performance model, detailing the relationship between the CPU and the GPU. From this model, we are able to extrapolate a list of attributes common to successful GPU-based applications, thereby providing insight into which algorithms and applications are best suited for the GPU and also providing an estimated theoretical speedup for said GPU-based applications.
ID: 031001567; System requirements: World Wide Web browser and PDF reader.; Mode of access: World Wide Web.; Title from PDF title page (viewed August 26, 2013).; Thesis (Ph.D.)--University of Central Florida, 2012.; Includes bibliographical references (p. 103-112).
Ph.D.
Doctorate
Computer Science
Engineering and Computer Science
Computer Science
28

Maurer, Andreas. "Methods for Multisensory Detection of Light Phenomena on the Moon as a Payload Concept for a Nanosatellite Mission." Thesis, Luleå tekniska universitet, Rymdteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-80785.

Abstract:
For 500 years transient light phenomena (TLP) have been observed on the lunar surface by ground-based observers. The actual physical reason for most of these events is still unknown today. Current plans of NASA and SpaceX to send astronauts back to the Moon, together with already successful deep-space CubeSat missions, will in the future allow research nanosatellite missions to cislunar space. This thesis presents a new hardware and software concept for a future payload on such a nanosatellite. The main task was to develop and implement a high-performance image processing algorithm whose task is to detect short brightening flashes on the lunar surface. Based on a review of historically reported phenomena, possible explanatory theories for these phenomena, and currently active and planned ground- or space-based observatories, possible reference scenarios were analyzed. From the presented scenarios, one, the detection of brightening events, was chosen and the requirements for this scenario were stated. Afterwards, possible detectors, processing computers and image processing algorithms were researched and compared with regard to the specified requirements. This analysis of available algorithms was used to develop a new high-performance detection algorithm to detect transient brightening events on the Moon. The implementation of this algorithm, running on the processor and the internal GPU of a Mac mini, achieved a frame rate of 55 FPS when processing images with a resolution of 4.2 megapixels. Its functionality and performance were verified on the remote telescope operated by the Chair of Space Technology of the University of Würzburg. Furthermore, the developed algorithm was also successfully ported to the Nvidia Jetson Nano and its performance compared with an FPGA-based image processing algorithm. The results were used to choose an FPGA as the main processing computer of the payload. This concept uses two backside-illuminated CMOS image sensors connected to a single FPGA, on which the developed image processing algorithm is to be implemented. Further work is required to realize the proposed concept by building the actual hardware and porting the developed algorithm onto this platform.
29

Chi, Ping-Lin, and 机炳霖. "Simulation of Optical Properties for Thin Film Using CUDA on NVIDIA GPUs." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/23775928600592540874.

Abstract:
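Master's degree
National Kaohsiung First University of Science and Technology (國立高雄第一科技大學)
Graduate Institute of Electro-Optical Engineering
99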
First, the thesis discusses how parallel computing differs between multi-threaded and single-threaded arrangements of the data, and then measures whether the GPU improves efficiency while maintaining confidence in the accuracy. Compared with an Intel i7 series CPU, the NVIDIA G100 series GPU is more than 40 times faster, and the relative difference in the results is less than 10E-15. That is to say, the GPU can replace the CPU for large-scale calculations. Compared with the simulation program developed on the Matlab 2008 platform, the speed increases up to 200 times. Two optimizations were chosen for this program: effective use of cache memory and of PCI-E bandwidth. The thesis also covers the calculation method used to improve the simulation program and the solution for many-core processors and multiple GPUs. The program provides the following functions: calculation of the transmittance and reflectance of multi-selective absorbing films, the absorbance of sunlight, fitting of the optimal thickness, estimation of multiple thicknesses, and basic derivation of the refractive index and extinction coefficient. The match rate is up to 95% when the simulation results are compared with experimental data.
30

Chen, Wei-Sheng, and 陳威勝. "Hybrid Simulation of Optical Properties for Thin Film Using CUDA on NVIDIA GPUs." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/99410635297475769146.

Abstract:
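Master's degree
National Kaohsiung First University of Science and Technology (國立高雄第一科技大學)
Graduate Institute of Electro-Optical Engineering
101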
This study covers optical simulation and multifunction program design using CUDA. It comprises: 1. optimization of the thickness of solar selective absorbing films; 2. reflectivity fitting of the optimal thickness; 3. reflectivity simulation of double-sided coatings; 4. optimal film thickness fitting for superlattices; 5. the reflectivity of multilayer films and calculation of the absorption rate. The study then measures whether the GPU improves efficiency while maintaining confidence in the accuracy. Compared with an Intel i7 series CPU, the NVIDIA G100 series GPU is more than 40 times faster, and the relative difference in the results is less than 5%. That is to say, the GPU can replace the CPU for large-scale calculations. Compared with the simulation program developed on the Matlab 2008 platform, the speed increases up to 200 times.