Academic literature on the topic 'GPU code generation'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'GPU code generation.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "GPU code generation"

1

Emmart, Niall, and Charles Weems. "Search-Based Automatic Code Generation for Multiprecision Modular Exponentiation on Multiple Generations of GPU." Parallel Processing Letters 23, no. 04 (December 2013): 1340009. http://dx.doi.org/10.1142/s0129626413400094.

Full text
Abstract:
Multiprecision modular exponentiation has a variety of uses, including cryptography, prime testing and computational number theory. It is also a very costly operation to compute. GPU parallelism can be used to accelerate these computations, but to use the GPU efficiently, a problem must involve many simultaneous exponentiation operations. Handling a large number of TLS/SSL encrypted sessions in a data center is an important problem that fits this profile. We are developing a framework that enables generation of highly efficient implementations of exponentiation operations for different NVIDIA GPU architectures and problem instances. One of the challenges in generating such code is that NVIDIA's PTX is not a true assembly language, but is instead a virtual instruction set that is compiled and optimized in different ways for different generations of GPU hardware. Thus, the same PTX code runs with different levels of efficiency on different machines. And as the precision of the computations changes, each architecture has its own break-even points where a different algorithm or parallelization strategy must be employed. To make the code efficient for a given problem instance and architecture thus requires searching a multidimensional space of algorithms and configurations, by generating PTX code for each combination, executing it, validating the numerical result, and evaluating its performance. Our framework automates much of this process, and produces exponentiation code that is up to six times faster than the best known hand-coded implementations for the NVIDIA GTX 580. Our goal for the framework is to enable users to relatively quickly find the best configuration for each new GPU architecture. However, in migrating to the GTX 680, which has three times as many cores as the GTX 580, we found that the best performance our system could achieve was significantly less than for the GTX 580. 
The decrease was traced to a radical shift in the NVIDIA architecture that greatly reduces the storage resources for each core. Further analysis and feasibility simulations indicate that it should be possible, through changes in our code generators to adapt for different storage models, to take greater advantage of the parallelism on the GTX 680. That will add a new dimension to our search space, but will also give our framework greater flexibility for dealing with future architectures.
APA, Harvard, Vancouver, ISO, and other styles
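The search loop this abstract describes, generate a configuration, execute it, validate the numerical result, and keep the fastest, can be sketched in a few lines. Everything below (the algorithm names, the fixed timings, the helper signatures) is a hypothetical stand-in, not the authors' actual framework API:

```python
import itertools

def search_best_config(algorithms, layouts, benchmark, validate):
    """Exhaustively search the (algorithm, layout) space, keep only
    configurations whose numerical result validates, and return the
    fastest as (algorithm, layout, seconds)."""
    best = None
    for algo, layout in itertools.product(algorithms, layouts):
        result, seconds = benchmark(algo, layout)
        if not validate(result):
            continue  # a wrong answer disqualifies the configuration outright
        if best is None or seconds < best[2]:
            best = (algo, layout, seconds)
    return best

# Hypothetical stand-ins: each pair gets a fixed runtime instead of
# actually generating, compiling, and timing PTX on a GPU.
timings = {("montgomery", 128): 2.0, ("montgomery", 256): 1.2, ("barrett", 128): 1.5}
bench = lambda a, l: (42, timings.get((a, l), float("inf")))
best = search_best_config(["montgomery", "barrett"], [128, 256], bench, lambda r: r == 42)
# best == ("montgomery", 256, 1.2)
```

In the real framework, the benchmark step would involve emitting PTX for the combination, running it, and checking the numerical result, but the outer search structure is the same.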
2

Afar Nazim, Allazov. "Automatic Generation of GPU Code in DVOR." University News. North-Caucasian Region. Technical Sciences Series, no. 3 (September 2015): 3–9. http://dx.doi.org/10.17213/0321-2653-2015-3-3-9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Blazewicz, Marek, Ian Hinder, David M. Koppelman, Steven R. Brandt, Milosz Ciznicki, Michal Kierzynka, Frank Löffler, Erik Schnetter, and Jian Tao. "From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation." Scientific Programming 21, no. 1-2 (2013): 1–16. http://dx.doi.org/10.1155/2013/167841.

Full text
Abstract:
Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.
APA, Harvard, Vancouver, ISO, and other styles
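As a toy illustration of the description-to-code idea (not Chemora's actual machinery), a stencil given symbolically as offset/coefficient pairs can be turned into C source by a few lines of Python; the function name and code shape below are invented for the example:

```python
def generate_stencil_kernel(name, coefficients):
    """Emit C source for a 1-D finite-difference stencil described
    symbolically as {offset: coefficient}. A toy stand-in for a
    high-level-description-to-code pipeline."""
    terms = " + ".join(f"({c}) * u[i + ({off})]"
                       for off, c in sorted(coefficients.items()))
    return (f"void {name}(const double *u, double *out, int n) {{\n"
            f"  for (int i = 1; i < n - 1; ++i)\n"
            f"    out[i] = {terms};\n"
            f"}}\n")

# Second-order central difference for the 1-D Laplacian (dx = 1):
src = generate_stencil_kernel("laplacian", {-1: 1.0, 0: -2.0, 1: 1.0})
```

A real pipeline like the one the abstract describes additionally performs discretization, loop optimization, and architecture-specific code emission, but the core move, from a symbolic operator to compilable source text, is the same.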
4

Rodrigues, A. Wendell O., Frédéric Guyomarc'h, Jean-Luc Dekeyser, and Yvonnick Le Menach. "Automatic Multi-GPU Code Generation Applied to Simulation of Electrical Machines." IEEE Transactions on Magnetics 48, no. 2 (February 2012): 831–34. http://dx.doi.org/10.1109/tmag.2011.2179527.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Rawat, Prashant Singh, Miheer Vaidya, Aravind Sukumaran-Rajam, Mahesh Ravishankar, Vinod Grover, Atanas Rountev, Louis-Noel Pouchet, and P. Sadayappan. "Domain-Specific Optimization and Generation of High-Performance GPU Code for Stencil Computations." Proceedings of the IEEE 106, no. 11 (November 2018): 1902–20. http://dx.doi.org/10.1109/jproc.2018.2862896.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Basu, Protonu, Samuel Williams, Brian Van Straalen, Leonid Oliker, Phillip Colella, and Mary Hall. "Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers." Parallel Computing 64 (May 2017): 50–64. http://dx.doi.org/10.1016/j.parco.2017.04.002.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Klöckner, Andreas, Nicolas Pinto, Yunsup Lee, Bryan Catanzaro, Paul Ivanov, and Ahmed Fasih. "PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation." Parallel Computing 38, no. 3 (March 2012): 157–74. http://dx.doi.org/10.1016/j.parco.2011.09.001.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Hagiescu, Andrei, Bing Liu, R. Ramanathan, Sucheendra K. Palaniappan, Zheng Cui, Bipasa Chattopadhyay, P. S. Thiagarajan, and Weng-Fai Wong. "GPU code generation for ODE-based applications with phased shared-data access patterns." ACM Transactions on Architecture and Code Optimization 10, no. 4 (December 2013): 1–19. http://dx.doi.org/10.1145/2541228.2555311.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Holzer, Markus, Martin Bauer, Harald Köstler, and Ulrich Rüde. "Highly efficient lattice Boltzmann multiphase simulations of immiscible fluids at high-density ratios on CPUs and GPUs through code generation." International Journal of High Performance Computing Applications 35, no. 4 (May 13, 2021): 413–27. http://dx.doi.org/10.1177/10943420211016525.

Full text
Abstract:
A high-performance implementation of a multiphase lattice Boltzmann method based on the conservative Allen-Cahn model supporting high-density ratios and high Reynolds numbers is presented. Meta-programming techniques are used to generate optimized code for CPUs and GPUs automatically. The coupled model is specified in a high-level symbolic description and optimized through automatic transformations. The memory footprint of the resulting algorithm is reduced through the fusion of compute kernels. A roofline analysis demonstrates the excellent efficiency of the generated code on a single GPU. The resulting single GPU code has been integrated into the multiphysics framework waLBerla to run massively parallel simulations on large domains. Communication hiding and GPUDirect-enabled MPI yield near-perfect scaling behavior. Scaling experiments are conducted on the Piz Daint supercomputer with up to 2048 GPUs, simulating several hundred fully resolved bubbles. Further, validation of the implementation is shown in a physically relevant scenario—a three-dimensional rising air bubble in water.
APA, Harvard, Vancouver, ISO, and other styles
10

Walsh, Stuart D. C., and Martin O. Saar. "Developing Extensible Lattice-Boltzmann Simulators for General-Purpose Graphics-Processing Units." Communications in Computational Physics 13, no. 3 (March 2013): 867–79. http://dx.doi.org/10.4208/cicp.351011.260112s.

Full text
Abstract:
Lattice-Boltzmann methods are versatile numerical modeling techniques capable of reproducing a wide variety of fluid-mechanical behavior. These methods are well suited to parallel implementation, particularly on the single-instruction multiple-data (SIMD) parallel processing environments found in computer graphics processing units (GPUs). Although recent programming tools dramatically improve the ease with which GPU-based applications can be written, the programming environment still lacks the flexibility available to more traditional CPU programs. In particular, it may be difficult to develop modular and extensible programs that require variable on-device functionality with current GPU architectures. This paper describes a process of automatic code generation that overcomes these difficulties for lattice-Boltzmann simulations. It details the development of GPU-based modules for an extensible lattice-Boltzmann simulation package, LBHydra. The performance of the automatically generated code is compared to equivalent purpose-written codes for single-phase, multiphase, and multicomponent flows. The flexibility of the new method is demonstrated by simulating a rising, dissolving droplet moving through a porous medium with user-generated lattice-Boltzmann models and subroutines.
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "GPU code generation"

1

Holewinski, Justin A. "Automatic Code Generation for Stencil Computations on GPU Architectures." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1354545992.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Beaugnon, Ulysse. "Efficient code generation for hardware accelerators by refining partially specified implementation." Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLEE050.

Full text
Abstract:
Compilers looking for an efficient implementation of a function must find which optimizations are the most beneficial. This is a complex problem, especially in the early steps of the compilation process. Each decision may impact the transformations available in subsequent steps. We propose to represent the compilation process as the progressive refinement of a partially specified implementation. All potential decisions are exposed upfront and commute. This allows for making the most discriminative decisions first and for building a performance model aware of which optimizations may be applied in subsequent steps. We apply this approach to the generation of efficient GPU code for linear algebra and yield performance competitive with hand-tuned libraries.
APA, Harvard, Vancouver, ISO, and other styles
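The refinement idea in this thesis can be sketched as a small branch-and-bound over commuting decisions; the decision names, cost model, and bound below are illustrative placeholders, not the thesis's implementation:

```python
def refine(decisions, lower_bound, measure):
    """Branch-and-bound refinement of a partially specified implementation.
    `decisions` maps each open choice to its candidate values; because the
    choices commute, any fixing order covers the same space. `lower_bound`
    optimistically scores a partial assignment; `measure` gives the true
    cost of a fully specified one."""
    best_cost, best_impl = float("inf"), None

    def descend(assigned, remaining):
        nonlocal best_cost, best_impl
        if lower_bound(assigned) >= best_cost:
            return  # no completion of this partial choice can beat the incumbent
        if not remaining:
            cost = measure(assigned)
            if cost < best_cost:
                best_cost, best_impl = cost, dict(assigned)
            return
        key, rest = remaining[0], remaining[1:]
        for choice in decisions[key]:
            assigned[key] = choice
            descend(assigned, rest)
            del assigned[key]

    # One simple ordering heuristic: fix the decision with fewest options first.
    descend({}, sorted(decisions, key=lambda k: len(decisions[k])))
    return best_impl, best_cost

# Toy cost function standing in for a real performance model.
cost = lambda a: abs(a["tile"] - 32) + abs(a["unroll"] - 2)
impl, c = refine({"tile": [16, 32], "unroll": [1, 2, 4]}, lambda a: 0, cost)
# impl == {"tile": 32, "unroll": 2}, c == 0
```

The thesis's contribution lies in making the lower bound aware of optimizations still available to a partial implementation; the trivial bound used here only shows where that model plugs in.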
3

Membarth, Richard [Verfasser]. "Code Generation for GPU Accelerators from a Domain-Specific Language for Medical Imaging / Richard Membarth." München : Verlag Dr. Hut, 2013. http://d-nb.info/1037287142/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Mueller-Roemer, Johannes Sebastian [Verfasser], Dieter W. [Akademischer Betreuer] Fellner, André [Akademischer Betreuer] Stork, and Heinrich [Akademischer Betreuer] Müller. "GPU Data Structures and Code Generation for Modeling, Simulation, and Visualization / Johannes Sebastian Mueller-Roemer ; Dieter W. Fellner, André Stork, Heinrich Müller." Darmstadt : Universitäts- und Landesbibliothek Darmstadt, 2020. http://d-nb.info/1204200823/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Mueller-Roemer, Johannes Sebastian [Verfasser], Dieter W. [Akademischer Betreuer] Fellner, André [Akademischer Betreuer] Stork, and Heinrich [Akademischer Betreuer] Müller. "GPU Data Structures and Code Generation for Modeling, Simulation, and Visualization / Johannes Sebastian Mueller-Roemer ; Dieter W. Fellner, André Stork, Heinrich Müller." Darmstadt : Universitäts- und Landesbibliothek Darmstadt, 2020. http://d-nb.info/1204200823/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Shanmugam, Sakthivadivel Saravanakumar. "Fast-NetMF: Graph Embedding Generation on Single GPU and Multi-core CPUs with NetMF." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1557162076041442.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Zhengxuan, Zhang, Kou Yanhong, and Zhang Qishan. "DESIGN OF A SOFTWARE RADIO GPS RECEIVER." International Foundation for Telemetering, 2005. http://hdl.handle.net/10150/605032.

Full text
Abstract:
ITC/USA 2005 Conference Proceedings / The Forty-First Annual International Telemetering Conference and Technical Exhibition / October 24-27, 2005 / Riviera Hotel & Convention Center, Las Vegas, Nevada
The GPS receiver based on software radio technology is a general-purpose GPS signal processing platform that makes use of modern design ideas and design tools. Our design uses an FPGA device and the necessary peripherals, such as a DSP and a PCI controller, to improve flexibility and practicality. Various fast acquisition methods and accurate tracking algorithms can be implemented, improved, and validated on this platform, in addition to the basic GPS receiver functions.
APA, Harvard, Vancouver, ISO, and other styles
8

Kim, Jinsung. "Optimizing Tensor Contractions on GPUs." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1563237825735994.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Masliah, Ian. "Méthodes de génération automatique de code appliquées à l’algèbre linéaire numérique dans le calcul haute performance." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLS285/document.

Full text
Abstract:
Parallelism in today's computer architectures is ubiquitous whether it be in supercomputers, workstations or on portable devices such as smartphones. Exploiting efficiently these systems for a specific application requires a multidisciplinary effort that concerns Domain Specific Languages (DSL), code generation and optimization techniques and application-specific numerical algorithms. In this PhD thesis, we present a method of high level programming that takes into account the features of heterogenous architectures and the properties of matrices to build a generic dense linear algebra solver. Our programming model supports both implicit or explicit data transfers to and from General-Purpose Graphics Processing Units (GPGPU) and Integrated Graphic Processors (IGPs). As GPUs have become an asset in high performance computing, incorporating their use in general solvers is an important issue. Recent architectures such as IGPs also require further knowledge to program them efficiently. Our methodology aims at simplifying the development on parallel architectures through the use of high level programming techniques. As an example, we developed a least-squares solver based on semi-normal equations in mixed precision that cannot be found in current libraries. This solver achieves similar performance as other mixed-precision algorithms. We extend our approach to a new multistage programming model that alleviates the interoperability problems between the CPU and GPU programming models. Our multistage approach is used to automatically generate GPU code for CPU-based element-wise expressions and parallel skeletons while allowing for type-safe program generation. We illustrate that this work can be applied to recent architectures and algorithms. The resulting code has been incorporated into a C++ library called NT2. Finally, we investigate how to apply high level programming techniques to batched computations and tensor contractions. We start by explaining how to design a simple data container using modern C++14 programming techniques. Then, we study the issues around batched computations, memory locality and code vectorization to implement a highly optimized matrix-matrix product for small sizes using SIMD instructions. By combining a high level programming approach and advanced parallel programming techniques, we show that we can outperform state of the art numerical libraries.
APA, Harvard, Vancouver, ISO, and other styles
10

Mueller-Roemer, Johannes Sebastian. "GPU Data Structures and Code Generation for Modeling, Simulation, and Visualization." Phd thesis, 2020. https://tuprints.ulb.tu-darmstadt.de/11291/1/dissertation-2019-12-20.pdf.

Full text
Abstract:
Virtual prototyping, the iterative process of using computer-aided (CAx) modeling, simulation, and visualization tools to optimize prototypes and products before manufacturing the first physical artifact, plays an increasingly important role in the modern product development process. Especially due to the availability of affordable additive manufacturing (AM) methods (3D printing), it is becoming increasingly possible to manufacture customized products or even for customers to print items for themselves. In such cases, the first physical prototype is frequently the final product. In this dissertation, methods to efficiently parallelize modeling, simulation, and visualization operations are examined with the goal of reducing iteration times in the virtual prototyping cycle, while simultaneously improving the availability of the necessary CAx tools. The presented methods focus on parallelization on programmable graphics processing units (GPUs). Modern GPUs are fully programmable massively parallel manycore processors that are characterized by their high energy efficiency and good price-performance ratio. Additionally, GPUs are already present in many workstations and home computers due to their use in computer-aided design (CAD) and computer games. However, specialized algorithms and data structures are required to make efficient use of the processing power of GPUs. Using the novel GPU-optimized data structures and algorithms as well as the new applications of compiler technology introduced in this dissertation, speedups between approximately one (10×) and more than two orders of magnitude (> 100×) are achieved compared to the state of the art in the three core areas of virtual prototyping. Additionally, memory use and required bandwidths are reduced by up to nearly 86%. As a result, not only can computations on existing models be executed more efficiently but larger models can be created and processed as well. 
In the area of modeling, efficient discrete mesh processing algorithms are examined with a focus on volumetric meshes. In the field of simulation, the assembly of the large sparse system matrices resulting from the finite element method (FEM) and the simulation of fluid dynamics are accelerated. As sparse matrices form the foundation of the presented approaches to mesh processing and simulation, GPU-optimized sparse matrix data structures and hardware- and domain-specific automatic tuning of these data structures are developed and examined as well. In the area of visualization, visualization latencies in remote visualization of cloud-based simulations are reduced by using an optimizing query compiler. By using hybrid visualization, various user interactions can be performed without network round trip latencies.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "GPU code generation"

1

Konstantinidis, Athanasios, Paul H. J. Kelly, J. Ramanujam, and P. Sadayappan. "Parametric GPU Code Generation for Affine Loop Programs." In Languages and Compilers for Parallel Computing, 136–51. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-09967-5_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Trevisan Jost, Tiago, Arun Thangamani, Raphaël Colin, Vincent Loechner, Stéphane Genaud, and Bérenger Bramas. "GPU Code Generation of Cardiac Electrophysiology Simulation with MLIR." In Euro-Par 2023: Parallel Processing, 549–63. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-39698-4_37.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Hu, Weifang, Lin Han, Pu Han, and Jiandong Shang. "Automatic Thread Block Size Selection Strategy in GPU Parallel Code Generation." In Parallel Architectures, Algorithms and Programming, 390–404. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-16-0010-4_34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Membarth, Richard, Anton Lokhmotov, and Jürgen Teich. "Generating GPU Code from a High-Level Representation for Image Processing Kernels." In Euro-Par 2011: Parallel Processing Workshops, 270–80. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-29737-3_31.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Shashidhar, G., and Rupesh Nasre. "LightHouse: An Automatic Code Generator for Graph Algorithms on GPUs." In Languages and Compilers for Parallel Computing, 235–49. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-52709-3_18.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Sosa, J., Tomás Bautista, Daniel Alcaraz, S. García-Alonso, and Juan A. Montiel-Nelson. "Generation of New Detection Codes for GPS Satellites Using NSGA-II." In Computational Methods in Applied Sciences, 511–20. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-11541-2_34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Klöckner, Andreas, Nicolas Pinto, Bryan Catanzaro, Yunsup Lee, Paul Ivanov, and Ahmed Fasih. "GPU Scripting and Code Generation with PyCUDA." In GPU Computing Gems Jade Edition, 373–85. Elsevier, 2012. http://dx.doi.org/10.1016/b978-0-12-385963-1.00027-7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Eastman, Peter, and Vijay Pande. "Accelerating Development and Execution Speed with Just-in-Time GPU Code Generation." In GPU Computing Gems Jade Edition, 399–407. Elsevier, 2012. http://dx.doi.org/10.1016/b978-0-12-385963-1.00029-0.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Holm, Håvard H., André R. Brodtkorb, and Martin L. Sætra. "Performance and Energy Efficiency of CUDA and OpenCL for GPU Computing Using Python." In Parallel Computing: Technology Trends. IOS Press, 2020. http://dx.doi.org/10.3233/apc200089.

Full text
Abstract:
In this work, we examine the performance and energy efficiency when using Python for developing HPC codes running on the GPU. We investigate the portability of performance and energy efficiency between CUDA and OpenCL; between GPU generations; and between low-end, mid-range and high-end GPUs. Our findings show that for some combinations of GPU and GPU code, there is a significant speedup for CUDA over OpenCL, but that this does not hold in general. Our experiments show that performance in general varies more between different GPUs, than between using CUDA and OpenCL. Finally, we show that tuning for performance is a good way of tuning for energy efficiency.
APA, Harvard, Vancouver, ISO, and other styles
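The scripting-based run-time code generation that PyCUDA enables (see also entry 7 in the journal list above) can be illustrated without a GPU: Python assembles specialized CUDA source text which, in a real program, would be JIT-compiled via pycuda.compiler.SourceModule. The kernel and names below are a minimal hypothetical sketch, and only the string is built here:

```python
# Template for a SAXPY kernel with the block size baked in at run time.
# Doubled braces {{ }} are literal C braces in str.format templates.
KERNEL_TEMPLATE = """\
extern "C" __global__
void saxpy_{block_size}(float a, const float *x, float *y, int n) {{
    int i = blockIdx.x * {block_size} + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}}
"""

def specialize(block_size):
    """Build CUDA source specialized to a block size at run time. With
    PyCUDA, this string would be handed to pycuda.compiler.SourceModule
    for JIT compilation; building the string itself needs no GPU."""
    return KERNEL_TEMPLATE.format(block_size=block_size)

src = specialize(256)
# "saxpy_256" and "blockIdx.x * 256" now appear in src
```

Baking constants such as the block size directly into the source is one of the optimizations run-time generation makes cheap, since the compiler can then fold and unroll around them.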
10

Rockenbach, Dinei A., Dalvan Griebler, Marco Danelutto, and Luiz G. Fernandes. "High-Level Stream Parallelism Abstractions with SPar Targeting GPUs." In Parallel Computing: Technology Trends. IOS Press, 2020. http://dx.doi.org/10.3233/apc200083.

Full text
Abstract:
The combined exploitation of stream and data parallelism is demonstrating encouraging performance results in the literature for heterogeneous architectures, which are present in every computer system today. However, providing parallel software that efficiently targets those architectures requires significant programming effort and expertise. The SPar domain-specific language already represents a solution to this problem, providing proven high-level programming abstractions for multi-core architectures. In this paper, we enrich the SPar language, adding support for GPUs. New transformation rules are designed for generating parallel code using stream and data parallel patterns. Our experiments revealed that these transformation rules are able to improve performance while the high-level programming abstractions are maintained.
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "GPU code generation"

1

Zhou, Keren, Xiaozhu Meng, Ryuichi Sai, and John Mellor-Crummey. "GPA: A GPU Performance Advisor Based on Instruction Sampling." In 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 2021. http://dx.doi.org/10.1109/cgo51591.2021.9370339.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Buck, Ian. "GPU Computing: Programming a Massively Parallel Processor." In International Symposium on Code Generation and Optimization (CGO'07). IEEE, 2007. http://dx.doi.org/10.1109/cgo.2007.13.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Elmqvist, Hilding, Hans Olsson, Axel Goteman, Vilhelm Roxling, Dirk Zimmer, and Alexander Pollok. "Automatic GPU Code Generation of Modelica Functions." In The 11th International Modelica Conference. Linköping University Electronic Press, 2015. http://dx.doi.org/10.3384/ecp15118235.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Li, Ao, Bojian Zheng, Gennady Pekhimenko, and Fan Long. "Automatic Horizontal Fusion for GPU Kernels." In 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 2022. http://dx.doi.org/10.1109/cgo53902.2022.9741270.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Mishra, Alok, Martin Kong, and Barbara Chapman. "Kernel Fusion/Decomposition for Automatic GPU-Offloading." In 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 2019. http://dx.doi.org/10.1109/cgo.2019.8661188.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Remmelg, Toomas, Thibaut Lutz, Michel Steuwer, and Christophe Dubach. "Performance portable GPU code generation for matrix multiplication." In PPoPP '16: 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. New York, NY, USA: ACM, 2016. http://dx.doi.org/10.1145/2884045.2884046.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Motta, Paulo. "Declaring Lua data types for GPU code generation." In SPLASH '17: Conference on Systems, Programming, Languages, and Applications: Software for Humanity. New York, NY, USA: ACM, 2017. http://dx.doi.org/10.1145/3141865.3142466.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Vießmann, Hans-Nikolai, and Sven-Bodo Scholz. "Effective Host-GPU Memory Management Through Code Generation." In IFL 2020: 32nd Symposium on Implementation and Application of Functional Languages. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3462172.3462199.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Katel, Navdeep, Vivek Khandelwal, and Uday Bondhugula. "MLIR-based code generation for GPU tensor cores." In CC '22: 31st ACM SIGPLAN International Conference on Compiler Construction. New York, NY, USA: ACM, 2022. http://dx.doi.org/10.1145/3497776.3517770.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Brahmakshatriya, Ajay, Yunming Zhang, Changwan Hong, Shoaib Kamil, Julian Shun, and Saman Amarasinghe. "Compiling Graph Applications for GPUs with GraphIt." In 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 2021. http://dx.doi.org/10.1109/cgo51591.2021.9370321.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "GPU code generation"

1

Berney, Ernest, Jami Lynn Daugherty, and Lulu Edwards. Validation of the automatic dynamic cone penetrometer. Engineer Research and Development Center (U.S.), July 2022. http://dx.doi.org/10.21079/11681/44704.

Full text
Abstract:
The U.S. military requires a rapid means of measuring subsurface soil strength for construction and repair of expeditionary pavement surfaces. Traditionally, a dynamic cone penetrometer (DCP) has served this purpose, providing strength with depth profiles in natural and prepared pavement surfaces. To improve upon this device, the Engineer Research and Development Center (ERDC) validated a new battery-powered automatic dynamic cone penetrometer (A-DCP) apparatus that automates the driving process by using a motor-driven hammering cap placed on top of a traditional DCP rod. The device improves upon a traditional DCP by applying three to four blows per second while digitally recording depth, blow count, and California Bearing Ratio (CBR). An integrated Global Positioning System (GPS) sensor and Bluetooth® connection allow for real-time data capture and stationing. Similarities between the DCP and the A-DCP were illustrated by the generation of a new A-DCP calibration curve. This curve relates penetration rate to field CBR and nearly follows the DCP calibration, with the exception of a slight offset. Field testing of the A-DCP showed less variability and more consistent strength measurement with depth at a speed five times greater than that of the DCP with minimal physical exertion by the operator.
APA, Harvard, Vancouver, ISO, and other styles