Academic literature on the topic 'GPU code generation'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'GPU code generation.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "GPU code generation"

1

Emmart, Niall, and Charles Weems. "Search-Based Automatic Code Generation for Multiprecision Modular Exponentiation on Multiple Generations of GPU." Parallel Processing Letters 23, no. 04 (2013): 1340009. http://dx.doi.org/10.1142/s0129626413400094.

Abstract:
Multiprecision modular exponentiation has a variety of uses, including cryptography, prime testing and computational number theory. It is also a very costly operation to compute. GPU parallelism can be used to accelerate these computations, but to use the GPU efficiently, a problem must involve many simultaneous exponentiation operations. Handling a large number of TLS/SSL encrypted sessions in a data center is an important problem that fits this profile. We are developing a framework that enables generation of highly efficient implementations of exponentiation operations for different NVIDIA GPU architectures and problem instances. One of the challenges in generating such code is that NVIDIA's PTX is not a true assembly language, but is instead a virtual instruction set that is compiled and optimized in different ways for different generations of GPU hardware. Thus, the same PTX code runs with different levels of efficiency on different machines. And as the precision of the computations changes, each architecture has its own break-even points where a different algorithm or parallelization strategy must be employed. To make the code efficient for a given problem instance and architecture thus requires searching a multidimensional space of algorithms and configurations, by generating PTX code for each combination, executing it, validating the numerical result, and evaluating its performance. Our framework automates much of this process, and produces exponentiation code that is up to six times faster than the best known hand-coded implementations for the NVIDIA GTX 580. Our goal for the framework is to enable users to relatively quickly find the best configuration for each new GPU architecture. However, in migrating to the GTX 680, which has three times as many cores as the GTX 580, we found that the best performance our system could achieve was significantly less than for the GTX 580. 
The decrease was traced to a radical shift in the NVIDIA architecture that greatly reduces the storage resources for each core. Further analysis and feasibility simulations indicate that it should be possible, through changes in our code generators to adapt for different storage models, to take greater advantage of the parallelism on the GTX 680. That will add a new dimension to our search space, but will also give our framework greater flexibility for dealing with future architectures.
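The generate-execute-validate-evaluate loop described in this abstract can be sketched in plain Python. Everything here is illustrative: the algorithm names, block sizes, and toy cost model are hypothetical stand-ins for the paper's actual search dimensions and GPU timings.

```python
import itertools

# Hypothetical search space: these algorithm names and block sizes are
# illustrative, not the paper's actual dimensions.
ALGORITHMS = ["square-and-multiply", "fixed-window", "sliding-window"]
BLOCK_SIZES = [64, 128, 256]

def generate_kernel(algorithm, block_size):
    """Stand-in for emitting PTX source for one configuration."""
    return f"// PTX for {algorithm}, block size {block_size}"

def benchmark(algorithm, block_size):
    """Stand-in for executing and timing the generated kernel;
    a toy cost model replaces a real GPU run."""
    return len(algorithm) * 1.0 + 1000.0 / block_size

def search():
    """Exhaustively walk the configuration space and keep the fastest
    variant, mirroring the generate-execute-evaluate loop."""
    best = None
    for algorithm, block_size in itertools.product(ALGORITHMS, BLOCK_SIZES):
        source = generate_kernel(algorithm, block_size)  # generate
        elapsed = benchmark(algorithm, block_size)       # execute and time
        # a real framework would also validate the numerical result here
        if best is None or elapsed < best[0]:
            best = (elapsed, algorithm, block_size, source)
    return best

best_time, best_alg, best_bs, best_src = search()
```

A real framework would replace `benchmark` with compiling the emitted PTX, running it on the device, and checking the numerical result before accepting its timing.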
2

Allazov, Afar Nazim. "Automatic Generation of GPU Code in DVOR." University News. North-Caucasian Region. Technical Sciences Series, no. 3 (September 2015): 3–9. http://dx.doi.org/10.17213/0321-2653-2015-3-3-9.

3

Blazewicz, Marek, Ian Hinder, David M. Koppelman, et al. "From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation." Scientific Programming 21, no. 1-2 (2013): 1–16. http://dx.doi.org/10.1155/2013/167841.

Abstract:
Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.
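The "JIT compilation of GPU code tailored to the problem characteristics" mentioned in this abstract can be illustrated, in miniature, by run-time source generation in plain Python. This is a toy scalar stand-in under hypothetical naming (`specialize_stencil` is not Chemora's API); Chemora itself emits CUDA from tensor expressions, not Python.

```python
def specialize_stencil(coefficients):
    """Generate and compile a 1D stencil function with the coefficients
    baked in as literals, the way JIT code generators specialize a kernel
    to one problem's parameters. (Toy stand-in, not Chemora's actual API.)"""
    terms = " + ".join(
        f"{c!r} * u[i + {off}]" for off, c in coefficients.items()
    )
    src = f"def stencil(u, i):\n    return {terms}\n"
    namespace = {}
    exec(src, namespace)  # compile the generated source at run time
    return namespace["stencil"]

# Second-order centered difference for u'' with unit grid spacing.
laplacian = specialize_stencil({-1: 1.0, 0: -2.0, 1: 1.0})
```

Because the coefficients are literals in the generated source, the compiler sees constants rather than table lookups; the same motivation applies when the generated code is CUDA instead of Python.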
4

Rodrigues, A. Wendell O., Frédéric Guyomarc'h, Jean-Luc Dekeyser, and Yvonnick Le Menach. "Automatic Multi-GPU Code Generation Applied to Simulation of Electrical Machines." IEEE Transactions on Magnetics 48, no. 2 (2012): 831–34. http://dx.doi.org/10.1109/tmag.2011.2179527.

5

Rawat, Prashant Singh, Miheer Vaidya, Aravind Sukumaran-Rajam, et al. "Domain-Specific Optimization and Generation of High-Performance GPU Code for Stencil Computations." Proceedings of the IEEE 106, no. 11 (2018): 1902–20. http://dx.doi.org/10.1109/jproc.2018.2862896.

6

Basu, Protonu, Samuel Williams, Brian Van Straalen, Leonid Oliker, Phillip Colella, and Mary Hall. "Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers." Parallel Computing 64 (May 2017): 50–64. http://dx.doi.org/10.1016/j.parco.2017.04.002.

7

Klöckner, Andreas, Nicolas Pinto, Yunsup Lee, Bryan Catanzaro, Paul Ivanov, and Ahmed Fasih. "PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation." Parallel Computing 38, no. 3 (2012): 157–74. http://dx.doi.org/10.1016/j.parco.2011.09.001.

8

Hagiescu, Andrei, Bing Liu, R. Ramanathan, et al. "GPU code generation for ODE-based applications with phased shared-data access patterns." ACM Transactions on Architecture and Code Optimization 10, no. 4 (2013): 1–19. http://dx.doi.org/10.1145/2541228.2555311.

9

Holzer, Markus, Martin Bauer, Harald Köstler, and Ulrich Rüde. "Highly efficient lattice Boltzmann multiphase simulations of immiscible fluids at high-density ratios on CPUs and GPUs through code generation." International Journal of High Performance Computing Applications 35, no. 4 (2021): 413–27. http://dx.doi.org/10.1177/10943420211016525.

Abstract:
A high-performance implementation of a multiphase lattice Boltzmann method based on the conservative Allen-Cahn model supporting high-density ratios and high Reynolds numbers is presented. Meta-programming techniques are used to generate optimized code for CPUs and GPUs automatically. The coupled model is specified in a high-level symbolic description and optimized through automatic transformations. The memory footprint of the resulting algorithm is reduced through the fusion of compute kernels. A roofline analysis demonstrates the excellent efficiency of the generated code on a single GPU. The resulting single GPU code has been integrated into the multiphysics framework waLBerla to run massively parallel simulations on large domains. Communication hiding and GPUDirect-enabled MPI yield near-perfect scaling behavior. Scaling experiments are conducted on the Piz Daint supercomputer with up to 2048 GPUs, simulating several hundred fully resolved bubbles. Further, validation of the implementation is shown in a physically relevant scenario—a three-dimensional rising air bubble in water.
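The kernel fusion this abstract credits with reducing the memory footprint works by eliminating intermediate arrays between passes. A minimal sketch of the idea, with toy Python lists standing in for GPU buffers (the function names are illustrative, not the paper's):

```python
def two_pass(xs):
    """Two separate 'kernels': the intermediate list tmp plays the role of
    a temporary GPU buffer written by one kernel and re-read by the next."""
    tmp = [x * 2.0 for x in xs]    # kernel 1: scale, writes tmp
    return [t + 1.0 for t in tmp]  # kernel 2: shift, re-reads tmp

def fused(xs):
    """Fused 'kernel': both operations applied in one pass, so the
    intermediate values never materialize in memory."""
    return [x * 2.0 + 1.0 for x in xs]
```

On a GPU, the fused version reads and writes each element once instead of twice and needs no temporary buffer, which is the footprint and bandwidth saving the abstract describes.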
10

Walsh, Stuart D. C., and Martin O. Saar. "Developing Extensible Lattice-Boltzmann Simulators for General-Purpose Graphics-Processing Units." Communications in Computational Physics 13, no. 3 (2013): 867–79. http://dx.doi.org/10.4208/cicp.351011.260112s.

Abstract:
Lattice-Boltzmann methods are versatile numerical modeling techniques capable of reproducing a wide variety of fluid-mechanical behavior. These methods are well suited to parallel implementation, particularly on the single-instruction multiple-data (SIMD) parallel processing environments found in computer graphics processing units (GPUs). Although recent programming tools dramatically improve the ease with which GPU-based applications can be written, the programming environment still lacks the flexibility available to more traditional CPU programs. In particular, it may be difficult to develop modular and extensible programs that require variable on-device functionality with current GPU architectures. This paper describes a process of automatic code generation that overcomes these difficulties for lattice-Boltzmann simulations. It details the development of GPU-based modules for an extensible lattice-Boltzmann simulation package – LBHydra. The performance of the automatically generated code is compared to equivalent purpose-written codes for single-phase, multiphase, and multicomponent flows. The flexibility of the new method is demonstrated by simulating a rising, dissolving droplet moving through a porous medium with user-generated lattice-Boltzmann models and subroutines.
