A selection of scholarly literature on the topic "NVIDIA CUDA GPU"

Format your source in APA, MLA, Chicago, Harvard, and other citation styles


Consult the lists of current articles, books, dissertations, theses, and other scholarly sources on the topic "NVIDIA CUDA GPU".

Next to every work in the bibliography there is an "Add to bibliography" button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of a scholarly publication in .pdf format and read its abstract online, when these are available in the metadata.

Journal articles on the topic "NVIDIA CUDA GPU"

1

Nangla, Siddhante. "GPU Programming using NVIDIA CUDA." International Journal for Research in Applied Science and Engineering Technology 6, no. 6 (June 30, 2018): 79–84. http://dx.doi.org/10.22214/ijraset.2018.6016.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
2

Liu, Zhi Yuan, and Xue Zhang Zhao. "Research and Implementation of Image Rotation Based on CUDA." Advanced Materials Research 216 (March 2011): 708–12. http://dx.doi.org/10.4028/www.scientific.net/amr.216.708.

Full text of the source
Abstract:
GPU technology frees the CPU from the burden of graphics computation. NVIDIA, the leading GPU producer, added CUDA technology to its newer GPU models; it greatly extends GPU functionality and offers a significant advantage for complex matrix computation. This paper introduces general image-rotation algorithms and the structure of CUDA. An example of rotating an image with HALCON, implemented both with CPU instruction extensions and with CUDA, demonstrates the advantage of CUDA by comparing the two results.
APA, Harvard, Vancouver, ISO, and other styles
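The rotation task described in the entry above assigns each output pixel an independent computation, which is exactly what makes it a good fit for CUDA threads. As a point of reference, here is a minimal CPU sketch of inverse-mapped nearest-neighbour rotation in plain Python; it is an illustrative reconstruction, not the paper's HALCON or CUDA code:

```python
import math

def rotate_image(pixels, angle_deg):
    """Rotate a 2D grid of pixel values about its centre using
    inverse mapping with nearest-neighbour sampling."""
    h = len(pixels)
    w = len(pixels[0])
    theta = math.radians(angle_deg)
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Map each destination pixel back to its source location
            # (inverse rotation avoids holes in the output).
            sx = math.cos(theta) * (x - cx) + math.sin(theta) * (y - cy) + cx
            sy = -math.sin(theta) * (x - cx) + math.cos(theta) * (y - cy) + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = pixels[iy][ix]
    return out
```

On a GPU, the two nested loops become a 2D grid of threads, one per output pixel; the arithmetic per pixel is unchanged.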
3

Borcovas, Evaldas, and Gintautas Daunys. "CPU AND GPU (CUDA) TEMPLATE MATCHING COMPARISON / CPU IR GPU (CUDA) PALYGINIMAS VYKDANT ŠABLONŲ ATITIKTIES ALGORITMĄ." Mokslas – Lietuvos ateitis 6, no. 2 (April 24, 2014): 129–33. http://dx.doi.org/10.3846/mla.2014.16.

Full text of the source
Abstract:
Image processing, computer vision, and other complex optical-information-processing algorithms require large computational resources, and it is often desirable to execute them in real time. Such requirements are hard to meet with a single CPU. NVIDIA's CUDA technology lets the programmer use the GPU resources in the computer. The research was carried out on an Intel Pentium Dual-Core T4500 2.3 GHz processor with 4 GB DDR3 RAM (CPU I) with an NVIDIA GeForce GT320M CUDA-capable graphics card (GPU I), and on an Intel Core i5-2500K 3.3 GHz processor with 4 GB DDR3 RAM (CPU II) with an NVIDIA GeForce GTX 560 CUDA-capable graphics card (GPU II). The OpenCV 2.1 and CUDA-capable OpenCV 2.4.0 libraries were used for the testing. The main tests used the standard MatchTemplate function from the OpenCV libraries. The algorithm takes a main image and a template; the influence of both factors was tested by resizing the main image and the template while measuring the computing time and the throughput in Gtpix/s. According to the results, GPU computing on the hardware mentioned above is up to 24 times faster when processing a large amount of information, while for small images the performance of CPU and GPU does not differ significantly. The choice of template size influences the CPU computation. The difference in computing time between the two GPUs can be explained by the number of cores they have: in our case the faster GPU had 16 times as many cores, and its computations were 16 times faster.
APA, Harvard, Vancouver, ISO, and other styles
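OpenCV's MatchTemplate, used in the study above, slides a template over the image and scores every candidate window; each window's score is independent of the others, which is why the workload parallelizes so well on a GPU. A naive CPU sketch using sum-of-squared-differences scoring follows; the real MatchTemplate offers several scoring modes, so treat this as an assumption-laden illustration of the principle only:

```python
def match_template(image, template):
    """Exhaustive template matching: return the (row, col) of the
    window with the smallest sum of squared differences (SSD)."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best, best_pos = None, None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            # Score one candidate window; on a GPU each (r, c) pair
            # would be handled by its own thread.
            ssd = sum((image[r + i][c + j] - template[i][j]) ** 2
                      for i in range(th) for j in range(tw))
            if best is None or ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos
```

The cost per window grows with the template area, which matches the study's observation that template size influences CPU computing time.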
4

Gonzalez Clua, Esteban Walter, and Marcelo Panaro Zamith. "Programming in CUDA for Kepler and Maxwell Architecture." Revista de Informática Teórica e Aplicada 22, no. 2 (November 21, 2015): 233. http://dx.doi.org/10.22456/2175-2745.56384.

Full text of the source
Abstract:
Since the first version of CUDA was launched, many improvements have been made in GPU computing. Every new CUDA version has included important novel features, bringing the architecture ever closer to a typical parallel high-performance language. This tutorial presents the GPU architecture and CUDA principles and conceptualizes novel features introduced by NVIDIA, such as dynamic parallelism, unified memory, and concurrent kernels. The text also includes some optimization remarks for CUDA programs.
APA, Harvard, Vancouver, ISO, and other styles
5

Ahmed, Rafid, Md Sazzadul Islam, and Jia Uddin. "Optimizing Apple Lossless Audio Codec Algorithm using NVIDIA CUDA Architecture." International Journal of Electrical and Computer Engineering (IJECE) 8, no. 1 (February 1, 2018): 70. http://dx.doi.org/10.11591/ijece.v8i1.pp70-75.

Full text of the source
Abstract:
As the majority of compression algorithms are implemented for CPU architectures, the primary focus of our work was to exploit the opportunities of GPU parallelism in audio compression. This paper presents an implementation of the Apple Lossless Audio Codec (ALAC) algorithm using NVIDIA's Compute Unified Device Architecture (CUDA) framework. The core idea was to identify the areas where data parallelism could be applied and to execute the identified parallel components on CUDA's Single Instruction Multiple Thread (SIMT) model. The dataset was retrieved from the European Broadcasting Union's Sound Quality Assessment Material (SQAM). Faster execution of the algorithm reduced execution time when coding large audio files. The paper also reports reduced power usage from running the parallel components on the GPU. Experimental results reveal a speedup of about 80-90% through CUDA on the identified components over the CPU implementation, while saving CPU power consumption.
APA, Harvard, Vancouver, ISO, and other styles
6

Lin, Chun-Yuan, Chung-Hung Wang, Che-Lun Hung, and Yu-Shiang Lin. "Accelerating Multiple Compound Comparison Using LINGO-Based Load-Balancing Strategies on Multi-GPUs." International Journal of Genomics 2015 (2015): 1–9. http://dx.doi.org/10.1155/2015/950905.

Full text of the source
Abstract:
Compound comparison is an important task in computational chemistry: from the comparison results, potential inhibitors can be found and then used in pharmacy experiments. The time complexity of a pairwise compound comparison is O(n²), where n is the maximal length of the compounds. In general, compound lengths are tens to hundreds, and the computation time is small. However, more and more compounds have been synthesized and extracted, now numbering more than tens of millions, so comparison against a large set of compounds (the multiple compound comparison problem, abbreviated MCC) is still time-consuming. The intrinsic time complexity of the MCC problem is O(k²n²) for k compounds of maximal length n. In this paper, we propose a GPU-based algorithm for the MCC problem, called CUDA-MCC, for single and multiple GPUs. Four LINGO-based load-balancing strategies are considered in CUDA-MCC in order to accelerate the computation among thread blocks on the GPUs. CUDA-MCC was implemented in C + OpenMP + CUDA. In the experimental results, CUDA-MCC ran 45 times and 391 times faster than its CPU version on a single NVIDIA Tesla K20m GPU card and on dual NVIDIA Tesla K20m GPU cards, respectively.
APA, Harvard, Vancouver, ISO, and other styles
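The LINGO approach referenced in the entry above represents a compound as the multiset of fixed-length substrings (q-grams, typically of length 4) of its SMILES string and scores a pair of compounds with the Tanimoto coefficient over those multisets; every pair in the k² comparison matrix is independent, which is what the load-balancing strategies distribute across thread blocks. A small CPU sketch of the pairwise score follows; note the published method also normalizes the SMILES string (e.g. ring-closure digits), which is omitted here as a simplifying assumption:

```python
from collections import Counter

def lingos(smiles, q=4):
    """Multiset of all length-q substrings (LINGOs) of a SMILES string."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))

def lingo_similarity(a, b, q=4):
    """Tanimoto coefficient over LINGO multisets:
    sum of per-LINGO minimum counts / sum of per-LINGO maximum counts."""
    la, lb = lingos(a, q), lingos(b, q)
    keys = set(la) | set(lb)
    inter = sum(min(la[k], lb[k]) for k in keys)
    union = sum(max(la[k], lb[k]) for k in keys)
    return inter / union if union else 0.0
```

Each similarity evaluation touches only the two strings involved, so a GPU version can assign one compound pair per thread (or per thread block) with no synchronization between pairs.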
7

Blyth, Simon. "Meeting the challenge of JUNO simulation with Opticks: GPU optical photon acceleration via NVIDIA® OptiX™." EPJ Web of Conferences 245 (2020): 11003. http://dx.doi.org/10.1051/epjconf/202024511003.

Full text of the source
Abstract:
Opticks is an open source project that accelerates optical photon simulation by integrating NVIDIA GPU ray tracing, accessed via NVIDIA OptiX, with Geant4 toolkit based simulations. A single NVIDIA Turing architecture GPU has been measured to provide optical photon simulation speedup factors exceeding 1500 times single threaded Geant4 with a full JUNO analytic GPU geometry automatically translated from the Geant4 geometry. Optical physics processes of scattering, absorption, scintillator reemission and boundary processes are implemented within CUDA OptiX programs based on the Geant4 implementations. Wavelength-dependent material and surface properties as well as inverse cumulative distribution functions for reemission are interleaved into GPU textures providing fast interpolated property lookup or wavelength generation. Major recent developments enable Opticks to benefit from ray trace dedicated RT cores available in NVIDIA RTX series GPUs. Results of extensive validation tests are presented.
APA, Harvard, Vancouver, ISO, and other styles
8

FUJIMOTO, NORIYUKI. "DENSE MATRIX-VECTOR MULTIPLICATION ON THE CUDA ARCHITECTURE." Parallel Processing Letters 18, no. 04 (December 2008): 511–30. http://dx.doi.org/10.1142/s0129626408003545.

Full text of the source
Abstract:
Recently GPUs have acquired the ability to perform fast general purpose computation by running thousands of threads concurrently. This paper presents a new algorithm for dense matrix-vector multiplication on the NVIDIA CUDA architecture. The experiments are conducted on a PC with GeForce 8800GTX and 2.0 GHz Intel Xeon E5335 CPU. The results show that the proposed algorithm runs a maximum of 11.19 times faster than NVIDIA's BLAS library CUBLAS 1.1 on the GPU and 35.15 times faster than the Intel Math Kernel Library 9.1 on a single core x86 with SSE3 SIMD instructions. The performance of Jacobi's iterative method for solving linear equations, which includes the data transfer time between CPU and GPU, shows that the proposed algorithm is practical for real applications.
APA, Harvard, Vancouver, ISO, and other styles
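The two operations benchmarked in the entry above, a dense matrix-vector product and Jacobi's iterative method built on top of it, can be stated compactly. The paper's CUDA algorithm distributes portions of the dot products across threads; the plain Python sketch below only shows the arithmetic being accelerated, under the standard assumption of a diagonally dominant system for Jacobi convergence:

```python
def matvec(A, x):
    """Dense matrix-vector product y = A x: one independent dot
    product per output row, the natural unit of GPU parallelism."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def jacobi(A, b, iterations=50):
    """Jacobi iteration for A x = b: each sweep is essentially a
    matrix-vector product (off-diagonal part) plus diagonal scaling."""
    n = len(A)
    x = [0.0] * n
    for _ in range(iterations):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x
```

Because each Jacobi sweep reads the previous iterate and writes a fresh one, the sweeps themselves are sequential; only the work inside a sweep parallelizes, which is why the paper's timing includes the CPU-GPU transfer cost per iteration.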
9

Blyth, Simon. "Integration of JUNO simulation framework with Opticks: GPU accelerated optical propagation via NVIDIA® OptiX™." EPJ Web of Conferences 251 (2021): 03009. http://dx.doi.org/10.1051/epjconf/202125103009.

Full text of the source
Abstract:
Opticks is an open source project that accelerates optical photon simulation by integrating NVIDIA GPU ray tracing, accessed via NVIDIA OptiX, with Geant4 toolkit based simulations. A single NVIDIA Turing architecture GPU has been measured to provide optical photon simulation speedup factors exceeding 1500 times single threaded Geant4 with a full JUNO analytic GPU geometry automatically translated from the Geant4 geometry. Optical physics processes of scattering, absorption, scintillator reemission and boundary processes are implemented within CUDA OptiX programs based on the Geant4 implementations. Wavelength-dependent material and surface properties as well as inverse cumulative distribution functions for reemission are interleaved into GPU textures providing fast interpolated property lookup or wavelength generation. In this work we describe major recent developments to facilitate integration of Opticks with the JUNO simulation framework, including on-GPU collection-efficiency hit culling, which substantially reduces both the CPU memory needed for photon hits and the copying overheads. Progress with the migration of Opticks to the all-new NVIDIA OptiX 7 API is also described.
APA, Harvard, Vancouver, ISO, and other styles
10

Bi, Yujiang, Yi Xiao, WeiYi Guo, Ming Gong, Peng Sun, Shun Xu, and Yi-bo Yang. "Lattice QCD GPU Inverters on ROCm Platform." EPJ Web of Conferences 245 (2020): 09008. http://dx.doi.org/10.1051/epjconf/202024509008.

Full text of the source
Abstract:
The open-source ROCm/HIP platform for GPU computing provides a uniform framework that supports both NVIDIA and AMD GPUs, as well as the possibility of porting CUDA code to HIP-compatible code. We present the porting progress on the Overlap fermion inverter (GWU-code) and on the general Lattice QCD inverter package QUDA. A manual for using QUDA on HIP and tips for porting general CUDA code into the HIP framework are also provided.
APA, Harvard, Vancouver, ISO, and other styles

Dissertations on the topic "NVIDIA CUDA GPU"

1

Ikeda, Patricia Akemi. "Um estudo do uso eficiente de programas em placas gráficas." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-25042012-212956/.

Full text of the source
Abstract:
Initially designed for graphics processing, graphics cards (GPUs) have evolved into high-performance general-purpose parallel coprocessors. Given the huge potential they offer to many research and commercial areas, NVIDIA pioneered the CUDA architecture (compatible with many of its cards), an environment that exploits this computational power while being easier to program. To use the full capacity of the GPU, some practices must be followed; one of them is to keep the hardware as busy as possible. This work proposes a practical and extensible tool that helps the programmer choose the best configuration to achieve this goal.
APA, Harvard, Vancouver, ISO, and other styles
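The thesis above centres on keeping the GPU as busy as possible, i.e. choosing a kernel launch configuration that maximizes resident warps per streaming multiprocessor. Occupancy is bounded by whichever per-SM resource runs out first: threads, blocks, registers, or shared memory. The toy estimator below sketches that rule only; the limit values are illustrative placeholders, not the specifications of any particular GPU, and real occupancy calculators also account for allocation granularity:

```python
def occupancy(threads_per_block, regs_per_thread, smem_per_block,
              max_threads_per_sm=1536, max_blocks_per_sm=8,
              regs_per_sm=32768, smem_per_sm=49152, warp_size=32):
    """Rough occupancy estimate: resident blocks per SM are limited by
    threads, registers, and shared memory; occupancy is resident warps
    divided by the maximum number of warps the SM can hold."""
    by_threads = max_threads_per_sm // threads_per_block
    by_regs = regs_per_sm // (regs_per_thread * threads_per_block)
    by_smem = smem_per_sm // smem_per_block if smem_per_block else max_blocks_per_sm
    blocks = min(max_blocks_per_sm, by_threads, by_regs, by_smem)
    warps = blocks * threads_per_block // warp_size
    return warps / (max_threads_per_sm // warp_size)
```

A tool like the one the thesis proposes would sweep `threads_per_block` (and observe register/shared-memory usage from the compiler) to find configurations where this ratio approaches 1.0.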
2

Rivera-Polanco, Diego Alejandro. "COLLECTIVE COMMUNICATION AND BARRIER SYNCHRONIZATION ON NVIDIA CUDA GPU." Lexington, Ky. : [University of Kentucky Libraries], 2009. http://hdl.handle.net/10225/1158.

Full text of the source
Abstract:
Thesis (M.S.)--University of Kentucky, 2009.
Title from document title page (viewed on May 18, 2010). Document formatted into pages; contains: ix, 88 p. : ill. Includes abstract and vita. Includes bibliographical references (p. 86-87).
APA, Harvard, Vancouver, ISO, and other styles
3

Harvey, Jesse Patrick. "GPU acceleration of object classification algorithms using NVIDIA CUDA /." Online version of thesis, 2009. http://hdl.handle.net/1850/10894.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
4

Lerchundi, Osa Gorka. "Fast Implementation of Two Hash Algorithms on nVidia CUDA GPU." Thesis, Norwegian University of Science and Technology, Department of Telematics, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9817.

Full text of the source
Abstract:
User needs increase as time passes. We started with room-sized computers, where perforated cards played the role that machine code plays today, and we are now at a point where the number of processors in our graphics device is not enough for our requirements. A change in the evolution of computing is looming: we are in a transition in which sequential computation is losing ground to distributed computation. This trend did not begin with the arrival of easily accessible GPUs; long before, it was used in projects such as SETI@Home, fightAIDS@Home and ClimatePrediction, which heralded what was to come. Grid computing was its formal name. Until now it was linked only to systems distributed over the network, but as the technology evolves it will take on a different meaning. With CUDA, NVIDIA has been one of the first companies to make this kind of software package noteworthy: rather than a proof of concept, it is a real tool, and the true artist is the programmer who uses it to achieve performance increases. As with many innovations, a worldwide community has grown behind this software package, each member doing their bit. Notably, after the CUDA release many software developments appeared, such as the cracking of the hitherto insurmountable WPA. The same could be said of the Sony-Toshiba-IBM (STI) alliance: it has a great community and great software (IBM is the company in charge of maintenance). It is not as accessible as CUDA, but IBM is powerful enough to enter the home supercomputing market. After IBM released the PS3 SDK, a notable application named Folding@Home was created using the benefits of parallel computing; its purpose is, among other things, to find a cure for cancer.
To sum up, this is only the beginning, and this thesis sizes up the possibility of using this technology to accelerate cryptographic hash algorithms. BLUE MIDNIGHT WISH (the hash algorithm under examination) is adapted to parallel-capable code in order to produce empirical measurements that can be compared with the current sequential implementations, answering questions that have not been answered until now. BLUE MIDNIGHT WISH is a candidate hash function for the next NIST standard SHA-3, designed by professor Danilo Gligoroski from NTNU and Vlastimil Klima, an independent cryptographer from the Czech Republic. So far, from the speed point of view, BLUE MIDNIGHT WISH is at the top of the charts (generally in second place, right behind EDON-R, another hash function by professor Danilo Gligoroski). Part of the work in this thesis was to investigate whether it is possible to process Blue Midnight Wish faster when the computations are distributed among the cores of a CUDA device. My numerous experiments give a clear answer: no. Although the answer is negative, it still has significant scientific value: the work corroborates the viewpoint of the part of the cryptographic community that doubts cryptographic primitives will benefit from parallel execution on many cores. Indeed, my experiments show that the communication costs between cores in CUDA outweigh by a big margin the computational costs done inside one core.
APA, Harvard, Vancouver, ISO, and other styles
5

Sreenibha, Reddy Byreddy. "Performance Metrics Analysis of GamingAnywhere with GPU accelerated Nvidia CUDA." Thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-16846.

Full text of the source
Abstract:
The modern world has opened the gates to many advancements in cloud computing, particularly in the field of cloud gaming. The most recent development in this area is the open-source cloud gaming system GamingAnywhere. The relationship between the CPU and the GPU is the main object of this thesis. Graphics Processing Unit (GPU) performance plays a vital role in analyzing the playing experience and the enhancement of GamingAnywhere. This paper concentrates on virtualization of the GPU and suggests that accelerating this unit with NVIDIA CUDA is the key to better performance with GamingAnywhere. After extensive research, gVirtuS was chosen as the technique for employing NVIDIA CUDA. An experimental study evaluates the feasibility and performance of GPU solutions by VMware in the cloud gaming scenarios provided by GamingAnywhere. Performance is measured in terms of bitrate, packet loss, jitter, and frame rate. Different game resolutions are considered in our empirical research, and our results show that frame rate and bitrate increased across resolutions with the use of an NVIDIA CUDA-enhanced GPU.
APA, Harvard, Vancouver, ISO, and other styles
6

Savioli, Nicolo'. "Parallelization of the algorithm WHAM with NVIDIA CUDA." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2013. http://amslaurea.unibo.it/6377/.

Full text of the source
Abstract:
The aim of my thesis is to parallelize the Weighted Histogram Analysis Method (WHAM), a popular algorithm used to calculate the free energy of a molecular system in molecular dynamics simulations. WHAM works in post-processing in cooperation with another algorithm called umbrella sampling, whose purpose is to add a bias to the potential energy of the system in order to force it to sample a specific region of configurational space. Several independent simulations are performed in order to sample all regions of interest, and the WHAM algorithm is then used to estimate the original system energy from the N atomic trajectories. The parallelization of WHAM was performed with CUDA, a language for programming the parallel architecture of NVIDIA graphics cards. The parallel implementation can considerably speed up WHAM execution compared with previous serial CPU implementations; however, the WHAM CPU code presents some temporal criticalities at very high numbers of iterations. The algorithm was written in C++ and executed on UNIX systems equipped with NVIDIA graphics cards. The results were satisfying, with performance increasing when the model was executed on graphics cards of higher compute capability. Nonetheless, the GPUs used to test the algorithm are quite old and not designed for scientific computing; a further performance increase is likely if the algorithm were executed on GPU clusters of high computational efficiency. The thesis is organized as follows: Chapter 1 describes the mathematical formulation of umbrella sampling and the WHAM algorithm, with their applications to the study of ionic channels and to molecular docking; Chapter 2 presents the CUDA architectures used to implement the model; Chapter 3 presents the results obtained on model systems.
APA, Harvard, Vancouver, ISO, and other styles
7

Zaahid, Mohammed. "Performance Metrics Analysis of GamingAnywhere with GPU accelerated NVIDIA CUDA using gVirtuS." Thesis, Blekinge Tekniska Högskola, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-16852.

Full text of the source
Abstract:
The modern world has opened the gates to many advancements in cloud computing, particularly in the field of cloud gaming. The most recent development in this area is the open-source cloud gaming system GamingAnywhere. The relationship between the CPU and the GPU is the main object of this thesis. Graphics Processing Unit (GPU) performance plays a vital role in analyzing the playing experience and the enhancement of GamingAnywhere. This paper concentrates on virtualization of the GPU and suggests that accelerating this unit with NVIDIA CUDA is the key to better performance with GamingAnywhere. After extensive research, gVirtuS was chosen as the technique for employing NVIDIA CUDA. An experimental study evaluates the feasibility and performance of GPU solutions by VMware in the cloud gaming scenarios provided by GamingAnywhere. Performance is measured in terms of bitrate, packet loss, jitter, and frame rate. Different game resolutions are considered in our empirical research, and our results show that frame rate and bitrate increased across resolutions with the use of an NVIDIA CUDA-enhanced GPU.
APA, Harvard, Vancouver, ISO, and other styles
8

Virk, Bikram. "Implementing method of moments on a GPGPU using Nvidia CUDA." Thesis, Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/33980.

Full text of the source
Abstract:
This thesis concentrates on the algorithmic aspects of Method of Moments (MoM) and Locally Corrected Nyström (LCN) numerical methods in electromagnetics. The data dependency in each step of the algorithm is analyzed to implement a parallel version that can harness the powerful processing power of a General Purpose Graphics Processing Unit (GPGPU). The GPGPU programming model provided by NVIDIA's Compute Unified Device Architecture (CUDA) is described to learn the software tools at hand enabling us to implement C code on the GPGPU. Various optimizations such as the partial update at every iteration, inter-block synchronization and using shared memory enable us to achieve an overall speedup of approximately 10. The study also brings out the strengths and weaknesses in implementing different methods such as Crout's LU decomposition and triangular matrix inversion on a GPGPU architecture. The results suggest future directions of study in different algorithms and their effectiveness on a parallel processor environment. The performance data collected show how different features of the GPGPU architecture can be enhanced to yield higher speedup.
APA, Harvard, Vancouver, ISO, and other styles
9

Ekstam, Ljusegren Hannes, and Hannes Jonsson. "Parallelizing Digital Signal Processing for GPU." Thesis, Linköpings universitet, Programvara och system, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-167189.

Full text of the source
Abstract:
Because of the increasing importance of signal processing in today's society, there is a need to experiment easily with new ways to process signals. Usually, fast digital signal processing is done with special-purpose hardware that is difficult to develop for; GPUs offer an alternative for fast digital signal processing. The work in this thesis is an analysis and implementation of a GPU version of a digital signal processing chain provided by SAAB. Through an iterative process of development and testing, a final implementation was achieved. Two benchmarks, each comprising 4.2 M test samples, were made to compare the CPU implementation with the GPU implementation. The benchmark was run on three different platforms: a desktop computer, an NVIDIA Jetson AGX Xavier, and an NVIDIA Jetson TX2. The results show that the parallelized version can reach several orders of magnitude higher throughput than the CPU implementation.
APA, Harvard, Vancouver, ISO, and other styles
10

Araújo, João Manuel da Silva. "Paralelização de algoritmos de Filtragem baseados em XPATH/XML com recurso a GPUs." Master's thesis, FCT - UNL, 2009. http://hdl.handle.net/10362/2530.

Full text of the source
Abstract:
Master's dissertation in Informatics Engineering
This dissertation studies the feasibility of using GPUs for parallel processing applied to notification-filtering algorithms in a publish/subscribe system. The goal was to compare experimental results between the sequential version (on CPUs) and the parallel version of a filtering algorithm chosen as a reference. The analysis sought to determine whether any gains from exploiting GPUs are sufficient to compensate for the greater complexity of the process.
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "NVIDIA CUDA GPU"

1

Dagg, Michael. NVIDIA GPU Programming: Massively Parallel Programming with CUDA. Wiley & Sons, Incorporated, John, 2013.

Find the full text of the source
APA, Harvard, Vancouver, ISO, and other styles
2

Dagg, Michael. NVIDIA GPU Programming: Massively Parallel Programming with CUDA. Wiley & Sons, Incorporated, John, 2012.

Find the full text of the source
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "NVIDIA CUDA GPU"

1

Bhura, Mayank, Pranav H. Deshpande, and K. Chandrasekaran. "CUDA or OpenCL." In Research Advances in the Integration of Big Data and Smart Computing, 267–79. IGI Global, 2016. http://dx.doi.org/10.4018/978-1-4666-8737-0.ch015.

Full text of the source
Abstract:
The use of general-purpose graphics processing units (GPGPUs) in high-performance computing is increasing as heterogeneous systems become dominant. CUDA has been the programming environment for nearly all such NVIDIA GPU-based GPGPU applications, but the framework runs only on NVIDIA GPUs; utilizing other available computing devices requires reimplementation in another framework. OpenCL provides a vendor-neutral and open programming environment, with many implementations available for CPUs, GPUs, and other types of accelerators; OpenCL can thus be regarded as a write-once, run-anywhere framework. Despite this, both frameworks have their own pros and cons. This chapter presents a comparison of the performance of the CUDA and OpenCL frameworks, using an algorithm that finds the sum of all possible triple products over a list of integers, implemented on GPUs.
APA, Harvard, Vancouver, ISO, and other styles
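The benchmark named in the entry above, the sum of all possible triple products over a list of integers, is a convenient GPU test case because every triple contributes an independent multiply-add. The chapter does not spell out its exact definition, so the brute-force version below, over unordered triples of distinct elements, is one plausible reading and should be taken as an assumption:

```python
from itertools import combinations

def sum_triple_products(values):
    """Sum of products over every unordered triple of distinct elements.
    Each triple is an independent term, which is what makes the problem
    easy to map onto GPU threads in either CUDA or OpenCL."""
    return sum(a * b * c for a, b, c in combinations(values, 3))
```

A GPU version would enumerate the triples across threads and finish with a parallel reduction of the partial sums.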
2

Lin, Chun-Yuan, Jin Ye, Che-Lun Hung, Chung-Hung Wang, Min Su, and Jianjun Tan. "Constructing a Bioinformatics Platform with Web and Mobile Services Based on NVIDIA Jetson TK1." In Data Analytics in Medicine, 629–44. IGI Global, 2020. http://dx.doi.org/10.4018/978-1-7998-1204-3.ch035.

Full text of the source
Abstract:
Current high-end graphics processing units (GPUs), such as the NVIDIA Tesla, Fermi, and Kepler series cards with up to a thousand cores per chip, are widely used in high-performance computing. These GPU cards (desktop GPUs) must be installed in personal computers or servers with desktop CPUs, and the cost and power consumption of constructing a high-performance computing platform from such desktop CPUs and GPUs are high. NVIDIA's Tegra K1, released as the Jetson TK1, contains four ARM Cortex-A15 CPUs and 192 CUDA cores (Kepler GPU); it is an embedded board with the advantages of low cost, low power consumption, and high applicability for embedded applications, and it has become a new research direction. Hence, in this paper a bioinformatics platform was constructed based on the NVIDIA Jetson TK1. The ClustalWtk and MCCtk tools, for sequence alignment and compound comparison respectively, were designed on this platform, and web and mobile services with user-friendly interfaces were provided for both tools. The experimental results showed that the cost-performance ratio of the NVIDIA Jetson TK1 is higher than that of an Intel Xeon E5-2650 CPU and an NVIDIA Tesla K20m GPU card.
APA, Harvard, Vancouver, ISO, and other styles
3

Adhikari, Mainak, and Sukhendu Kar. "Advanced Topics GPU Programming and CUDA Architecture." In Advances in Systems Analysis, Software Engineering, and High Performance Computing, 175–203. IGI Global, 2016. http://dx.doi.org/10.4018/978-1-4666-8853-7.ch008.

Full text of the source
Abstract:
A graphics processing unit (GPU) traditionally handles computation only for computer graphics, but any GPU providing a functionally complete set of operations on arbitrary bits can compute any computable value. Additionally, the use of multiple graphics cards in one computer, or of large numbers of graphics chips, further parallelizes the already parallel nature of graphics processing. CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA and implemented on its graphics processing units (GPUs). CUDA gives program developers direct access to the virtual instruction set and memory of the parallel computational elements in CUDA GPUs. This chapter first discusses some features and challenges of GPU programming and the effort to address some of those challenges when building and running GPU programs in a high-performance computing (HPC) environment. Finally, the chapter points out the importance and standards of the CUDA architecture.
APA, Harvard, Vancouver, ISO, and other styles
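As a rough illustration of the execution model this chapter covers, the way a 1-D CUDA launch assigns each thread a global index can be emulated in plain Python. The names mirror CUDA's built-in variables (blockIdx.x, blockDim.x, threadIdx.x), but this is only a sequential model of the indexing arithmetic, not GPU code:

```python
def simulated_kernel_launch(grid_dim, block_dim, kernel, *args):
    """Sequentially emulate a 1-D CUDA launch: every (block, thread)
    pair runs the kernel body with its own indices."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

def vector_add(block_idx, block_dim, thread_idx, a, b, out):
    # Global index, exactly as a CUDA kernel would compute it:
    #   int i = blockIdx.x * blockDim.x + threadIdx.x;
    i = block_idx * block_dim + thread_idx
    if i < len(out):            # guard against out-of-range threads
        out[i] = a[i] + b[i]

n = 10
a = list(range(n))
b = [10 * x for x in a]
out = [0] * n
# Launch enough blocks of 4 "threads" to cover all n elements.
simulated_kernel_launch((n + 3) // 4, 4, vector_add, a, b, out)
print(out)  # [0, 11, 22, ..., 99]
```

The bounds guard is the standard CUDA idiom for the last, partially filled block; real kernels run these bodies concurrently rather than in nested loops.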
4

Khadtare, Mahesh Satish. "GPU Based Image Quality Assessment using Structural Similarity (SSIM) Index." In Advances in Systems Analysis, Software Engineering, and High Performance Computing, 276–82. IGI Global, 2016. http://dx.doi.org/10.4018/978-1-4666-8853-7.ch013.

Full text of the source
Abstract:
This chapter deals with performance analysis of a CUDA implementation of an image quality assessment tool based on the structural similarity index. Since it was first created at the University of Texas in 2002, the Structural SIMilarity (SSIM) image assessment algorithm has become a valuable tool for still-image and video processing analysis. SSIM provided a major advance over the MSE (Mean Square Error) and PSNR (Peak Signal to Noise Ratio) techniques because it is far more closely aligned with the results that would be obtained by subjective testing. For objective image analysis, this new technique represents as significant an advancement over SSIM as SSIM itself provided over PSNR. The method is computationally intensive, which poses issues wherever real-time quality assessment is desired. We develop a CUDA implementation of this technique that offers a speedup of approximately 30x on an NVIDIA GTX 275 and 80x on a C2050 over a single-core Intel processor.
APA, Harvard, Vancouver, ISO, and other styles
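In its global (single-window) form, the SSIM index this chapter accelerates is SSIM(x, y) = ((2 mu_x mu_y + C1)(2 sigma_xy + C2)) / ((mu_x^2 + mu_y^2 + C1)(sigma_x^2 + sigma_y^2 + C2)). A minimal pure-Python sketch follows; the constants use the customary K1 = 0.01, K2 = 0.03, L = 255 convention, which is an assumption, not a value taken from the chapter:

```python
def ssim(x, y, L=255, K1=0.01, K2=0.03):
    """Global SSIM between two equal-length pixel sequences."""
    n = len(x)
    c1, c2 = (K1 * L) ** 2, (K2 * L) ** 2   # stabilizing constants
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

img = [52, 55, 61, 59, 79, 61, 76, 61]
noisy = [v + d for v, d in zip(img, [3, -2, 4, -1, 0, 2, -3, 1])]
print(ssim(img, img))    # identical images -> 1.0
print(ssim(img, noisy))  # slightly below 1.0
```

Production SSIM is computed per local window and averaged over the image; that per-window independence is what the CUDA implementation exploits.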
5

Yamada, Susumu, Masahiko Machida, and Toshiyuki Imamura. "High Performance Eigenvalue Solver for Hubbard Model: Tuning Strategies for LOBPCG Method on CUDA GPU." In Parallel Computing: Technology Trends. IOS Press, 2020. http://dx.doi.org/10.3233/apc200030.

Full text of the source
Abstract:
The exact diagonalization is the most accurate approach for solving the Hubbard model: it calculates the ground state of the Hamiltonian derived exactly from the model. Since the Hamiltonian is a large sparse symmetric matrix, an iterative method is usually utilized, and the LOBPCG method has been reported to be one of the most effective solvers for this problem. Since most of its operations are linear operations, the method can be executed straightforwardly on a CUDA GPU, one of the mainstream processors, using the cuBLAS and cuSPARSE libraries. However, since those routines are executed one after another, cached data cannot be reused between routines. In this research, we tune the routines by fusing some of their loops in order to reuse cached data. Moreover, we propose tuning strategies for the Hamiltonian-vector multiplication using shared memory, taking the character of the Hamiltonian into consideration. A numerical test on an NVIDIA Tesla P100 shows that the tuned LOBPCG code is about 1.5 times faster than the code built from cuBLAS and cuSPARSE routines.
APA, Harvard, Vancouver, ISO, and other styles
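The loop-fusion idea in this abstract, combining consecutive BLAS-like routines so intermediate data is reused while still hot in cache, can be illustrated in a deliberately simplified CPU-side form: fusing a sparse matrix-vector product y = Ax with the dot product x.y needed for a Rayleigh quotient. The CSR layout and the function names here are illustrative, not the authors' code:

```python
def spmv_then_dot(indptr, indices, data, x):
    """Unfused: two passes, as two separate cuSPARSE/cuBLAS-style calls."""
    n = len(indptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y, sum(xi * yi for xi, yi in zip(x, y))

def spmv_dot_fused(indptr, indices, data, x):
    """Fused: accumulate x . (A x) in the same loop that forms each y[i]."""
    n = len(indptr) - 1
    y = [0.0] * n
    acc = 0.0
    for i in range(n):
        yi = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            yi += data[k] * x[indices[k]]
        y[i] = yi
        acc += x[i] * yi   # reuse y[i] immediately instead of re-reading it
    return y, acc

# 3x3 symmetric test matrix [[2,1,0],[1,2,1],[0,1,2]] in CSR form.
indptr, indices = [0, 2, 5, 7], [0, 1, 0, 1, 2, 1, 2]
data = [2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0]
x = [1.0, 0.0, -1.0]
print(spmv_then_dot(indptr, indices, data, x))   # ([2.0, 0.0, -2.0], 4.0)
print(spmv_dot_fused(indptr, indices, data, x))  # same result, one pass
```

On a GPU the same fusion saves a full round trip of y through device memory, which is the effect the authors' tuning targets.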
6

Peña-Cantillana, Francisco, Daniel Díaz-Pernil, Hepzibah A. Christinal, and Miguel A. Gutiérrez-Naranjo. "Implementation on CUDA of the Smoothing Problem with Tissue-Like P Systems." In Natural Computing for Simulation and Knowledge Discovery, 184–93. IGI Global, 2014. http://dx.doi.org/10.4018/978-1-4666-4253-9.ch012.

Full text of the source
Abstract:
Smoothing is often used in digital imagery to improve the quality of an image by reducing its level of noise. This paper presents a parallel implementation of an algorithm for smoothing 2D images in the framework of Membrane Computing; the chosen formal framework is tissue-like P systems. The algorithm has been implemented using a novel device architecture called CUDA (Compute Unified Device Architecture), which allows parallel NVIDIA Graphics Processing Units (GPUs) to solve many complex computational problems. Some examples are presented and compared, and research lines for the future are discussed.
APA, Harvard, Vancouver, ISO, and other styles
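The kind of neighbourhood smoothing that such a system computes in parallel can be sketched sequentially as a 3x3 mean filter: each pixel is replaced by the average of its neighbourhood, and that per-pixel independence is exactly what makes the problem a good fit for CUDA. This toy filter is an illustration, not the paper's membrane-computing algorithm:

```python
def mean_smooth(img):
    """3x3 mean filter; border pixels average over in-bounds neighbours."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[i + di][j + dj]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1)
                    if 0 <= i + di < h and 0 <= j + dj < w]
            out[i][j] = sum(vals) / len(vals)
    return out

noisy = [[10, 10, 10],
         [10, 100, 10],   # single "salt" pixel of noise
         [10, 10, 10]]
smoothed = mean_smooth(noisy)
print(smoothed[1][1])  # 20.0: the noise spike is heavily damped
```

In a parallel implementation every output pixel is computed by an independent thread (or, in the P-system formulation, membrane rule application), with no data dependencies between pixels.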
7

"Practical Examples of Automated Development of Efficient Parallel Programs." In Advances in Systems Analysis, Software Engineering, and High Performance Computing, 180–216. IGI Global, 2021. http://dx.doi.org/10.4018/978-1-5225-9384-3.ch006.

Full text of the source
Abstract:
In this chapter, some examples of applying the developed software tools for the design, generation, transformation, and optimization of programs for multicore processors and graphics processing units are considered. In particular, the algebra-algorithmic integrated toolkit for design and synthesis of programs (IDS) and the rewriting-rules system TermWare.NET are applied to design and parallelize programs for multicore central processing units. The developed algebra-dynamic models and the rewriting-rules toolkit are used to parallelize and optimize programs for NVIDIA GPUs supporting the CUDA technology. The TuningGenie framework is applied to parallel program auto-tuning: optimizing sorting, Brownian-motion simulation, and meteorological forecasting programs for a target platform. The parallelization of Fortran programs using the rewriting-rules technique is examined on sample problems from the field of quantum chemistry.
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "NVIDIA CUDA GPU"

1

Buck, Ian. "GPU computing with NVIDIA CUDA." In ACM SIGGRAPH 2007 courses. New York, New York, USA: ACM Press, 2007. http://dx.doi.org/10.1145/1281500.1281647.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
2

Harris, Mark. "Many-core GPU computing with NVIDIA CUDA." In the 22nd annual international conference. New York, New York, USA: ACM Press, 2008. http://dx.doi.org/10.1145/1375527.1375528.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
3

Kirk, David. "NVIDIA CUDA Software and GPU Parallel Computing Architecture." In the 6th international symposium. New York, New York, USA: ACM Press, 2007. http://dx.doi.org/10.1145/1296907.1296909.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
4

Li, Yazhou, and Yahong Rosa Zheng. "Profiling NVIDIA Jetson Embedded GPU Devices for Autonomous Machines." In 6th International Conference on Computer Science, Engineering And Applications (CSEA 2020). AIRCC Publishing Corporation, 2020. http://dx.doi.org/10.5121/csit.2020.101811.

Full text of the source
Abstract:
This paper presents two methods, jtop (a GUI version of tegrastats) and Nsight Systems, for profiling NVIDIA Jetson embedded GPU devices on a model race car, a great platform for prototyping and field-testing autonomous driving algorithms. The two profilers analyze the power consumption, CPU/GPU utilization, and run time of CUDA C threads on a Jetson TX2 in five different working modes. The performance differences among the five modes are demonstrated using three example programs: vector addition in C and CUDA C, a simple ROS (Robot Operating System) package implementing a wall-follow algorithm in Python, and a complex ROS package implementing a particle filter for SLAM (Simultaneous Localization and Mapping). The results show that these tools are effective means of selecting the operating mode of embedded GPU devices.
APA, Harvard, Vancouver, ISO, and other styles
5

Soni, Hemlata, and Pradeep Chhawcharia. "GPU-accelerated MoM based scattering/radiation analysis using NVIDIA CUDA." In 2015 IEEE International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN). IEEE, 2015. http://dx.doi.org/10.1109/icrcicn.2015.7434257.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
6

Ramroach, Sterling, Jonathan Herbert, and Ajay Joshi. "CUDA-ACCELERATED FEATURE SELECTION." In International Conference on Emerging Trends in Engineering & Technology (IConETech-2020). Faculty of Engineering, The University of the West Indies, St. Augustine, 2020. http://dx.doi.org/10.47412/juqg5057.

Full text of the source
Abstract:
Identifying important features in high-dimensional data is usually done using one-dimensional filtering techniques, which discard noisy attributes and those that are constant throughout the data. This is a time-consuming task with scope for acceleration via high-performance computing techniques involving the graphics processing unit (GPU). The proposed algorithm is accelerated via the Compute Unified Device Architecture (CUDA) framework developed by NVIDIA, which facilitates the seamless scaling of computation across any CUDA-enabled GPUs. The Pearson Correlation Coefficient can thus be applied in parallel to each feature with respect to the response variable, and the ranks obtained for each feature can be used to determine the most relevant features to select. Using data from the UCI Machine Learning Repository, our results show an increase in efficiency for multi-dimensional analysis with a more reliable feature-importance ranking. When tested on a high-dimensional dataset of 1,000 samples and 10,000 features, we achieved a 1,230x speedup using CUDA, and this acceleration grows with problem size, as with any embarrassingly parallel task.
APA, Harvard, Vancouver, ISO, and other styles
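The per-feature Pearson ranking described above is embarrassingly parallel because each feature's correlation with the response is independent of every other feature's. A small, hypothetical CPU reference follows (an illustration of the ranking scheme, not the authors' CUDA code):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_features(features, response):
    """Rank feature indices by |r| with the response, strongest first.
    On a GPU, each feature's coefficient would be one independent task."""
    scores = [(abs(pearson(col, response)), idx)
              for idx, col in enumerate(features)]
    return [idx for _, idx in sorted(scores, reverse=True)]

response = [1.0, 2.0, 3.0, 4.0, 5.0]
features = [
    [2.0, 4.0, 6.0, 8.0, 10.0],   # perfectly correlated
    [5.0, 1.0, 4.0, 2.0, 3.0],    # weakly related
    [9.0, 7.0, 5.0, 3.0, 1.0],    # perfectly anti-correlated
]
print(rank_features(features, response))  # features 0 and 2 outrank 1
```

Taking |r| treats strong negative correlation as equally informative, so feature 2 ranks alongside feature 0 despite the opposite sign.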
7

Carrigan, Travis J., Jacob Watt, and Brian H. Dennis. "Using GPU-Based Computing to Solve Large Sparse Systems of Linear Equations." In ASME 2011 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. ASMEDC, 2011. http://dx.doi.org/10.1115/detc2011-48452.

Full text of the source
Abstract:
Often thought of as tools for image rendering or data visualization, graphics processing units (GPUs) are becoming increasingly popular in scientific computing due to their low-cost, massively parallel architecture. With the introduction of CUDA C by NVIDIA and of CUDA-enabled GPUs, general-purpose computation is now possible without the need for shading languages. One application that benefits from the capabilities of NVIDIA hardware is computational continuum mechanics (CCM): the need to solve sparse linear systems of equations is common in CCM when partial differential equations are discretized. Often these systems are solved iteratively, using domain decomposition among distributed processors working in parallel. In this paper we explore the benefits of using GPUs to improve the performance of sparse matrix operations, more specifically sparse matrix-vector multiplication. Our approach does not require domain decomposition, so it is simpler than the corresponding implementation for distributed-memory parallel computers. We demonstrate that for matrices produced from finite element discretizations on unstructured meshes, the matrix-vector multiplication operation is just under 13 times faster than when run serially on an Intel i5 system. Furthermore, we show that when used in conjunction with the biconjugate gradient stabilized method (BiCGSTAB), a gradient-based iterative linear solver, the method is over 13 times faster than the serially executed C equivalent. Lastly, we apply the method to solving Poisson's equation with the Galerkin finite element method and demonstrate over 10.5 times higher performance on the GPU compared with the Intel i5 system.
APA, Harvard, Vancouver, ISO, and other styles
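The BiCGSTAB iteration mentioned above can be sketched in textbook form. This is a minimal dense-matrix sketch for a tiny system; the paper's implementation is sparse and GPU-resident, where each matrix-vector product is the accelerated kernel:

```python
def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bicgstab(A, b, tol=1e-10, max_iter=100):
    """Textbook BiCGSTAB for Ax = b, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    r_hat = r[:]                      # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for _ in range(max_iter):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        rho = rho_new
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(A, p)
        alpha = rho / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if dot(s, s) ** 0.5 < tol:    # early exit on half-step residual
            return [xi + alpha * pi for xi, pi in zip(x, p)]
        t = matvec(A, s)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if dot(r, r) ** 0.5 < tol:
            return x
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = bicgstab(A, b)
print(x)  # close to [1/11, 7/11]
```

Each iteration is dominated by two matrix-vector products plus a handful of dot products and vector updates, which is why accelerating SpMV speeds up the whole solver almost proportionally.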
8

Zhou, Tao, Qiang-Ming Cai, Xin Cao, Wen Jiang, Yuying Zhu, Yuyu Zhu, and Jun Fan. "GPU-Accelerated HO-SIE-DDM Using NVIDIA CUDA for Analysis of Multiscale Problems." In 2022 Asia-Pacific International Symposium on Electromagnetic Compatibility (APEMC). IEEE, 2022. http://dx.doi.org/10.1109/apemc53576.2022.9888565.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
9

Kelly, Jesse. "GPU-Accelerated Simulation of Two-Phase Incompressible Fluid Flow Using a Level-Set Method for Interface Capturing." In ASME 2009 International Mechanical Engineering Congress and Exposition. ASMEDC, 2009. http://dx.doi.org/10.1115/imece2009-13330.

Full text of the source
Abstract:
Computational fluid dynamics has seen a surge of popularity as a tool for visual-effects animators over the past decade since Stam's seminal Stable Fluids paper [1]. Complex fluid dynamics simulations can often be prohibitive to run because of the time required to perform all of the necessary computations. This project proposes an accelerated two-phase incompressible fluid flow solver implemented on programmable graphics hardware. Modern graphics processing units (GPUs) are highly parallel computing devices, and in problems with a large potential for parallel computation the GPU may vastly outperform the CPU. This project exploits the potential parallelism in the solution of the Navier-Stokes equations in writing a GPU-accelerated flow solver. NVIDIA's Compute Unified Device Architecture (CUDA) language is used to program the parallel portions of the solver. CUDA is a C-like language introduced by the NVIDIA Corporation with the goal of simplifying general-purpose computing on the GPU. CUDA takes advantage of data parallelism by executing the same or nearly the same code on different data streams simultaneously, so the algorithms used in the flow solver are designed to be highly data-parallel. Most finite-difference fluid solvers for computer graphics applications have used the traditional staggered marker-and-cell (MAC) grid introduced by Harlow and Welch [2]. The proposed approach improves the programmability of such solvers by using a non-staggered (collocated) grid, and an efficient technique is implemented to smooth the pressure oscillations that often result from using a collocated grid in the simulation of incompressible flows. To be appropriate for visual-effects use, a fluid solver must have some means of tracking fluid interfaces in order to produce a renderable fluid surface. This project uses the level-set method [3] for interface tracking. The level set is treated as a scalar property, and so its propagation in time is computed using the same transport algorithm used in the main fluid flow solver.
APA, Harvard, Vancouver, ISO, and other styles
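Level-set interface tracking transports a signed function phi with the flow and locates the interface at the zero crossing of phi. A minimal 1-D sketch with constant positive velocity and first-order upwind differencing follows; this only illustrates the idea and is far simpler than the solver's 3-D scheme:

```python
def advect_upwind(phi, c, dt, dx, steps):
    """First-order upwind advection of a level-set field; assumes c > 0."""
    phi = phi[:]
    for _ in range(steps):
        new = phi[:]
        for i in range(1, len(phi)):
            new[i] = phi[i] - c * dt / dx * (phi[i] - phi[i - 1])
        phi = new
    return phi

def interface_position(phi, dx):
    """Linearly interpolate the first zero crossing of phi."""
    for i in range(len(phi) - 1):
        if phi[i] <= 0.0 <= phi[i + 1] or phi[i] >= 0.0 >= phi[i + 1]:
            frac = phi[i] / (phi[i] - phi[i + 1])
            return (i + frac) * dx
    return None

dx, dt, c = 0.1, 0.05, 1.0                 # CFL number c*dt/dx = 0.5
phi0 = [i * dx - 0.5 for i in range(21)]   # signed distance, interface at x=0.5
phi = advect_upwind(phi0, c, dt, dx, 10)   # advance 10 steps -> t = 0.5
print(interface_position(phi, dx))         # interface near x = 1.0
```

Production level-set solvers use higher-order (e.g. WENO) transport and periodic reinitialization to keep phi a signed distance, but the zero-crossing bookkeeping is the same.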
10

Alam, Muhammad S., and Liang Cheng. "Parallelization of LBM Code Using CUDA Capable GPU Platform for 3D Single and Two-Sided Non-Facing Lid-Driven Cavity Flow." In ASME 2011 30th International Conference on Ocean, Offshore and Arctic Engineering. ASMEDC, 2011. http://dx.doi.org/10.1115/omae2011-50332.

Full text of the source
Abstract:
In this paper, a lattice Boltzmann model is developed and then parallelized on a Compute Unified Device Architecture (CUDA)-capable NVIDIA GPU platform. Numerical algorithms are developed for the solution of 3D single and two-sided non-facing lid-driven (TSNFL) cavity flow for Re = 10-1000. The algorithms are verified by solving both steady and unsteady 3D cavity and 3D TSNFL flow problems, and excellent agreement is obtained between the numerical predictions and results available in the literature. The results show that the CUDA-enabled LBM code is computationally efficient: the implementation of LBM on a GPU achieves at least thirty million lattice updates per second for the 3D lid-driven cavity flow. Computations have also been carried out for a 2D lid-driven cavity flow, where the LBM-GPU calculation achieves 641 million lattice updates per second.
APA, Harvard, Vancouver, ISO, and other styles
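The "lattice update" counted in the throughput figures above is one collision-plus-streaming step per cell, and each cell's collision is independent, which is what makes LBM map so well to CUDA. A deliberately tiny pure-Python D2Q9 BGK sketch (an illustration, not the paper's 3-D code) with a basic sanity check, conservation of total mass under collision and periodic streaming:

```python
# D2Q9 lattice: weights and discrete velocities
W = [4/9] + [1/9] * 4 + [1/36] * 4
CX = [0, 1, 0, -1, 0, 1, -1, -1, 1]
CY = [0, 0, 1, 0, -1, 1, 1, -1, -1]

def lbm_step(f, nx, ny, tau=0.8):
    """One BGK collision + periodic streaming step on an nx*ny D2Q9 lattice.
    f[q][y][x] are the particle distribution functions."""
    # Collision: relax each f_q toward its local equilibrium.
    for y in range(ny):
        for x in range(nx):
            rho = sum(f[q][y][x] for q in range(9))
            ux = sum(f[q][y][x] * CX[q] for q in range(9)) / rho
            uy = sum(f[q][y][x] * CY[q] for q in range(9)) / rho
            usq = ux * ux + uy * uy
            for q in range(9):
                cu = CX[q] * ux + CY[q] * uy
                feq = W[q] * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq)
                f[q][y][x] += (feq - f[q][y][x]) / tau
    # Streaming: pull each population along its lattice velocity (periodic).
    return [[[f[q][(y - CY[q]) % ny][(x - CX[q]) % nx]
              for x in range(nx)] for y in range(ny)] for q in range(9)]

nx = ny = 4
# Start at rest with a small density bump in one cell.
f = [[[W[q] * (1.2 if (x, y) == (1, 1) else 1.0)
       for x in range(nx)] for y in range(ny)] for q in range(9)]
mass0 = sum(f[q][y][x] for q in range(9) for y in range(ny) for x in range(nx))
for _ in range(10):
    f = lbm_step(f, nx, ny)
mass = sum(f[q][y][x] for q in range(9) for y in range(ny) for x in range(nx))
print(round(mass0, 6), round(mass, 6))  # total mass is conserved
```

A GPU version assigns one thread per cell; counting cells times steps per wall-clock second gives exactly the "lattice updates per second" metric quoted in the abstract.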
