Academic literature on the topic 'GPU1'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'GPU1.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "GPU1"

1

Nakada, Yuji, and Yoshifumi Itoh. "Pseudomonas aeruginosa PAO1 genes for 3-guanidinopropionate and 4-guanidinobutyrate utilization may be derived from a common ancestor." Microbiology 151, no. 12 (December 1, 2005): 4055–62. http://dx.doi.org/10.1099/mic.0.28258-0.

Abstract:
Pseudomonas aeruginosa PAO1 utilizes 3-guanidinopropionate (3-GP) and 4-guanidinobutyrate (4-GB), which differ in one methylene group only, via distinct enzymes: guanidinopropionase (EC 3.5.3.17; the gpuA product) and guanidinobutyrase (EC 3.5.3.7; the gbuA product). The authors cloned and characterized the contiguous gpuPAR genes (in that order) responsible for 3-GP utilization, and compared the deduced sequences of their putative protein products, and the potential regulatory mechanisms of gpuPA, with those of the corresponding gbu genes encoding the 4-GB catabolic system. GpuA and GpuR have similarity to GbuA (49 % identity) and GbuR (a transcription activator of gbuA; 37 % identity), respectively. GpuP resembles PA1418 (58 % identity), which is a putative membrane protein encoded by a potential gene downstream of gbuA. These features of the GpuR and GpuP sequences, and the impaired growth of gpuR and gpuP knockout mutants on 3-GP, support the notion that GpuR and GpuP direct the 3-GP-inducible expression of gpuA, and the uptake of 3-GP, respectively. Northern blots of mRNA from 3-GP-induced PAO1 cells revealed three transcripts of gpuA, gpuP, and gpuP and gpuA together, suggesting that gpuP and gpuA each have a 3-GP-responsible promoter, and that some transcription from the gpuP promoter is terminated after gpuP, or proceeds into gpuA. Knockout of gpuR abolished 3-GP-dependent synthesis of the transcripts, confirming that GpuR activates transcription from these promoters, with 3-GP as a specific co-inducer. The sequence conservation between the three functional pairs of the Gpu and Gbu proteins, and the absence of gpuAPR in closely related species, imply that the triad gpu genes have co-ordinately evolved from origins common to the gbu counterparts, to establish an independent catabolic system of 3-GP in P. aeruginosa.
2

Guo, Sen, San Feng Chen, and Yong Sheng Liang. "Global Shared Memory Design for Multi-GPU Graphics Cards on Personal Supercomputer." Applied Mechanics and Materials 263-266 (December 2012): 1236–41. http://dx.doi.org/10.4028/www.scientific.net/amm.263-266.1236.

Abstract:
When programming with CUDA or OpenCL on multi-GPU systems, programmers usually expect the GPUs in the same system to communicate quickly with each other. For instance, they hope a device memory copy from GPU1's memory to GPU2's memory can be done inside the graphics card, without employing the relatively slow PCIe bus. In this paper, we propose adding a multi-channel memory to the multi-GPU board that is used only for transferring data between different GPUs. This multi-channel memory should have multiple interfaces: one common interface shared by the GPUs, connected to an FPGA arbitration circuit, and several other interfaces connected independently to each GPU's frame buffer. To distinguish it from the shared memory of a stream multiprocessor, we call this memory Global Shared Memory. We analyze the expected performance improvement with this global shared memory, using the case of accelerating algebraic reconstruction for computed tomography on multiple GPUs.
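
On current CUDA systems, the direct GPU-to-GPU copy the authors wish for is issued through the peer-to-peer runtime API; whether it avoids the host path depends on the board topology (PCIe switch, NVLink). A minimal sketch, assuming a machine with at least two GPUs; the buffer size is arbitrary:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t n = 1 << 20;                  // 1M floats, arbitrary size
    float *buf0, *buf1;

    cudaSetDevice(0);
    cudaMalloc(&buf0, n * sizeof(float));
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1); // can GPU0 map GPU1's memory?
    if (canAccess) cudaDeviceEnablePeerAccess(1, 0);

    cudaSetDevice(1);
    cudaMalloc(&buf1, n * sizeof(float));

    // Device-to-device copy GPU1 -> GPU0; the driver routes it directly
    // over PCIe/NVLink when peer access is possible.
    cudaMemcpyPeer(buf0, 0, buf1, 1, n * sizeof(float));
    cudaDeviceSynchronize();
    printf("peer access available: %d\n", canAccess);

    cudaFree(buf1);
    cudaSetDevice(0);
    cudaFree(buf0);
    return 0;
}
```
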
3

Palmer, Daniel A., Jill K. Thompson, Lie Li, Ashton Prat, and Ping Wang. "Gib2, A Novel Gβ-like/RACK1 Homolog, Functions as a Gβ Subunit in cAMP Signaling and Is Essential in Cryptococcus neoformans." Journal of Biological Chemistry 281, no. 43 (September 1, 2006): 32596–605. http://dx.doi.org/10.1074/jbc.m602768200.

Abstract:
Canonical G proteins are heterotrimeric, consisting of α, β, and γ subunits. Despite multiple Gα subunits functioning in fungi, only a single Gβ subunit per species has been identified, suggesting that non-conventional G protein signaling exists in this diverse group of eukaryotic organisms. Using the Gα subunit Gpa1 that functions in cAMP signaling as bait in a two-hybrid screen, we have identified a novel Gβ-like/RACK1 protein homolog, Gib2, from the human pathogenic fungus Cryptococcus neoformans. Gib2 contains a seven WD-40 repeat motif and is predicted to form a seven-bladed β propeller structure characteristic of β transducins. Gib2 is also shown to interact, respectively, with two Gγ subunit homologs, Gpg1 and Gpg2, similar to the conventional Gβ subunit Gpb1. In contrast to Gpb1 whose overexpression promotes mating response, overproduction of Gib2 suppresses defects of gpa1 mutation in both melanization and capsule formation, the phenotypes regulated by cAMP signaling and associated with virulence. Furthermore, depletion of Gib2 by antisense suppression results in a severe growth defect, suggesting that Gib2 is essential. Finally, Gib2 is shown to also physically interact with a downstream target of Gpa1-cAMP signaling, Smg1, and the protein kinase C homolog Pkc1, indicating that Gib2 is also a multifunctional RACK1-like protein.
4

Harashima, Toshiaki, and Joseph Heitman. "Gα Subunit Gpa2 Recruits Kelch Repeat Subunits That Inhibit Receptor-G Protein Coupling during cAMP-induced Dimorphic Transitions in Saccharomyces cerevisiae." Molecular Biology of the Cell 16, no. 10 (October 2005): 4557–71. http://dx.doi.org/10.1091/mbc.e05-05-0403.

Abstract:
All eukaryotic cells sense extracellular stimuli and activate intracellular signaling cascades via G protein-coupled receptors (GPCR) and associated heterotrimeric G proteins. The Saccharomyces cerevisiae GPCR Gpr1 and associated Gα subunit Gpa2 sense extracellular carbon sources (including glucose) to govern filamentous growth. In contrast to conventional Gα subunits, Gpa2 forms an atypical G protein complex with the kelch repeat Gβ mimic proteins Gpb1 and Gpb2. Gpb1/2 negatively regulate cAMP signaling by inhibiting Gpa2 and an as yet unidentified target. Here we show that Gpa2 requires lipid modifications of its N-terminus for membrane localization but association with the Gpr1 receptor or Gpb1/2 subunits is dispensable for membrane targeting. Instead, Gpa2 promotes membrane localization of its associated Gβ mimic subunit Gpb2. We also show that the Gpa2 N-terminus binds both to Gpb2 and to the C-terminal tail of the Gpr1 receptor and that Gpb1/2 binding interferes with Gpr1 receptor coupling to Gpa2. Our studies invoke novel mechanisms involving GPCR-G protein modules that may be conserved in multicellular eukaryotes.
5

Lai, Jianqi, Hua Li, Zhengyu Tian, and Ye Zhang. "A Multi-GPU Parallel Algorithm in Hypersonic Flow Computations." Mathematical Problems in Engineering 2019 (March 17, 2019): 1–15. http://dx.doi.org/10.1155/2019/2053156.

Abstract:
Computational fluid dynamics (CFD) plays an important role in the optimal design of aircraft and the analysis of complex flow mechanisms in the aerospace domain. The graphics processing unit (GPU) has a strong floating-point operation capability and a high memory bandwidth in data parallelism, which brings great opportunities for CFD. A cell-centred finite volume method is applied to solve three-dimensional compressible Navier–Stokes equations on structured meshes with an upwind AUSM+UP numerical scheme for space discretization, and a four-stage Runge–Kutta method is used for time discretization. Compute unified device architecture (CUDA) is used as a parallel computing platform and programming model for GPUs, which reduces the complexity of programming. The main purpose of this paper is to design an extremely efficient multi-GPU parallel algorithm based on MPI+CUDA to study hypersonic flow characteristics. Solutions of hypersonic flow over an aerospace plane model are provided at different Mach numbers. The agreement between numerical computations and experimental measurements is favourable. Acceleration performance of the parallel platform is studied with a single GPU, two GPUs, and four GPUs. For the single-GPU implementation, the speedup reaches 63 for the coarser mesh and 78 for the finest mesh. GPUs are better suited for compute-intensive tasks than traditional CPUs. For multi-GPU parallelization, the speedup with four GPUs reaches 77 for the coarser mesh and 147 for the finest mesh, far greater than the acceleration achieved with one or two GPUs. The multi-GPU parallel algorithm is thus a promising approach to hypersonic flow computations.
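
The MPI+CUDA layering mentioned above typically binds one MPI rank to one GPU and exchanges block-boundary data between time steps. A skeleton of that pattern, with a made-up grid size and a ring exchange standing in for the paper's actual block connectivity:

```cuda
#include <mpi.h>
#include <cuda_runtime.h>
#include <vector>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    int ngpus = 0;
    cudaGetDeviceCount(&ngpus);
    cudaSetDevice(rank % ngpus);              // one GPU per MPI rank

    const int plane = 128 * 128;              // cells per halo plane (made up)
    float *d_field;
    cudaMalloc(&d_field, plane * 64 * sizeof(float));
    std::vector<float> sendbuf(plane), recvbuf(plane);

    // ... advance the Runge-Kutta stages on the GPU, then swap halos ...
    cudaMemcpy(sendbuf.data(), d_field, plane * sizeof(float),
               cudaMemcpyDeviceToHost);
    int right = (rank + 1) % nranks, left = (rank + nranks - 1) % nranks;
    MPI_Sendrecv(sendbuf.data(), plane, MPI_FLOAT, right, 0,
                 recvbuf.data(), plane, MPI_FLOAT, left, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    cudaMemcpy(d_field, recvbuf.data(), plane * sizeof(float),
               cudaMemcpyHostToDevice);

    cudaFree(d_field);
    MPI_Finalize();
    return 0;
}
```
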
6

Wang, Ping, John R. Perfect, and Joseph Heitman. "The G-Protein β Subunit GPB1 Is Required for Mating and Haploid Fruiting in Cryptococcus neoformans." Molecular and Cellular Biology 20, no. 1 (January 1, 2000): 352–62. http://dx.doi.org/10.1128/mcb.20.1.352-362.2000.

Abstract:
Cryptococcus neoformans is an opportunistic fungal pathogen with a defined sexual cycle. The gene encoding a heterotrimeric G-protein β subunit, GPB1, was cloned and disrupted. gpb1 mutant strains are sterile, indicating a role for this gene in mating. GPB1 plays an active role in mediating responses to pheromones in early mating steps (conjugation tube formation and cell fusion) and signals via a mitogen-activated protein (MAP) kinase cascade in both MATα and MATa cells. The functions of GPB1 are distinct from those of the Gα protein GPA1, which functions in a nutrient-sensing cyclic AMP (cAMP) pathway required for mating, virulence factor induction, and virulence. gpb1 mutant strains are also defective in monokaryotic fruiting in response to nitrogen starvation. We show that MATa cells stimulate monokaryotic fruiting of MATα cells, possibly in response to mating pheromone, which may serve to disperse cells and spores to locate mating partners. In summary, the Gβ subunit GPB1 and the Gα subunit GPA1 function in distinct signaling pathways: one (GPB1) senses pheromones and regulates mating and haploid fruiting via a MAP kinase cascade, and the other (GPA1) senses nutrients and regulates mating, virulence factors, and pathogenicity via a cAMP cascade.
7

Zhou, Chao, and Tao Zhang. "High Performance Graph Data Imputation on Multiple GPUs." Future Internet 13, no. 2 (January 31, 2021): 36. http://dx.doi.org/10.3390/fi13020036.

Abstract:
In real applications, massive data with graph structures are often incomplete due to various restrictions. Therefore, graph data imputation algorithms have been widely used in the fields of social networks, sensor networks, and MRI to solve the graph data completion problem. To keep the data relevant, a data structure is represented by a graph-tensor, in which each matrix is the vertex value of a weighted graph. The convolutional imputation algorithm has been proposed to solve the low-rank graph-tensor completion problem in which some data matrices are entirely unobserved. However, this imputation algorithm has limited application scope because it is compute-intensive and performs poorly on CPUs. In this paper, we propose a scheme to run the convolutional imputation algorithm with higher performance on GPUs (Graphics Processing Units) by exploiting multi-core GPUs with the CUDA architecture. We propose optimization strategies to achieve coalesced memory access for graph Fourier transform (GFT) computation and to improve the utilization of GPU SM resources for singular value decomposition (SVD) computation. Furthermore, we design a scheme that extends the GPU-optimized implementation to multiple GPUs for large-scale computing. Experimental results show that the GPU implementation is both fast and accurate. On synthetic data of varying sizes, the GPU-optimized implementation running on a single Quadro RTX6000 GPU achieves up to 60.50× speedups over the GPU-baseline implementation. The multi-GPU implementation achieves up to 1.81× speedups on two GPUs versus the GPU-optimized implementation on a single GPU. On the ego-Facebook dataset, the GPU-optimized implementation achieves up to 77.88× speedups over the GPU-baseline implementation. Meanwhile, the GPU implementation and the CPU implementation achieve similar, low recovery errors.
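
The coalescing the authors optimize for is a data-layout property: if the vertex index varies fastest in memory, consecutive threads of a warp touch consecutive addresses when combining vertex matrices. A toy kernel illustrating that layout choice; the sizes and the per-vertex scaling are placeholders, not the paper's GFT:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// With layout [entry][vertex], consecutive threads (one per vertex) touch
// consecutive addresses: each warp's loads and stores coalesce.
__global__ void scaleVertexSlices(const float *in, float *out,
                                  const float *coeff,  // one weight per vertex
                                  int nv, int elems) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;     // vertex index
    if (v >= nv) return;
    for (int e = 0; e < elems; ++e)
        out[e * nv + v] = coeff[v] * in[e * nv + v];
}

int main() {
    const int nv = 4096, elems = 64;                   // hypothetical sizes
    float *in, *out, *coeff;
    cudaMallocManaged(&in, nv * elems * sizeof(float));
    cudaMallocManaged(&out, nv * elems * sizeof(float));
    cudaMallocManaged(&coeff, nv * sizeof(float));
    for (int i = 0; i < nv; ++i) coeff[i] = 1.0f;
    scaleVertexSlices<<<(nv + 255) / 256, 256>>>(in, out, coeff, nv, elems);
    cudaDeviceSynchronize();
    printf("done\n");
    return 0;
}
```
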
8

Mittal, Sparsh. "A Survey of Techniques for Managing and Leveraging Caches in GPUs." Journal of Circuits, Systems and Computers 23, no. 08 (June 18, 2014): 1430002. http://dx.doi.org/10.1142/s0218126614300025.

Abstract:
Initially introduced as special-purpose accelerators for graphics applications, graphics processing units (GPUs) have now emerged as general-purpose computing platforms for a wide range of applications. To address the requirements of these applications, modern GPUs include sizable hardware-managed caches. However, several factors, such as the unique architecture of the GPU and the rise of CPU–GPU heterogeneous computing, demand effective management of caches to achieve high performance and energy efficiency. Recently, several techniques have been proposed for this purpose. In this paper, we survey several architectural and system-level techniques proposed for managing and leveraging GPU caches. We also discuss the importance and challenges of cache management in GPUs. The aim of this paper is to provide readers insights into cache management techniques for GPUs and motivate them to propose even better techniques for leveraging the full potential of caches in the GPUs of tomorrow.
9

Oden, Lena, and Holger Fröning. "InfiniBand Verbs on GPU: a case study of controlling an InfiniBand network device from the GPU." International Journal of High Performance Computing Applications 31, no. 4 (June 25, 2015): 274–84. http://dx.doi.org/10.1177/1094342015588142.

Abstract:
Due to their massive parallelism and high performance per Watt, GPUs have gained high popularity in high-performance computing and are a strong candidate for future exascale systems. But communication and data transfer in GPU-accelerated systems remain a challenging problem. Since the GPU normally is not able to control a network device, a hybrid programming model is preferred whereby the GPU is used for calculation and the CPU handles the communication. As a result, communication between distributed GPUs suffers from unnecessary overhead, introduced by switching control flow from GPUs to CPUs and vice versa. Furthermore, a designated CPU thread is often required to control GPU-related communication. In this work, we modify user-space libraries and device drivers of GPUs and the InfiniBand network device so as to enable the GPU to control an InfiniBand network device and independently source and sink communication requests without any involvement of the CPU. Our results show that complex networking protocols such as InfiniBand Verbs are better handled by CPUs, since the overhead of work request generation cannot be parallelized and is not suitable for the highly parallel programming model of GPUs. The massive number of instructions and accesses to host memory required to source and sink a communication request on the GPU slows down the performance. Only through a massive reduction in the complexity of the InfiniBand protocol can some performance improvements be achieved.
10

Gaurav and Steven F. Wojtkiewicz. "Use of GPU Computing for Uncertainty Quantification in Computational Mechanics: A Case Study." Scientific Programming 19, no. 4 (2011): 199–212. http://dx.doi.org/10.1155/2011/730213.

Abstract:
Graphics processing units (GPUs) are rapidly emerging as a more economical and highly competitive alternative to CPU-based parallel computing. As the degree of software control of GPUs has increased, many researchers have explored their use in non-gaming applications. Recent studies have shown that GPUs consistently outperform their best corresponding CPU-based parallel computing alternatives in single-instruction multiple-data (SIMD) strategies. This study explores the use of GPUs for uncertainty quantification in computational mechanics. Five types of analysis procedures that are frequently utilized for uncertainty quantification of mechanical and dynamical systems have been considered and their GPU implementations have been developed. The numerical examples presented in this study show that considerable gains in computational efficiency can be obtained for these procedures. It is expected that the GPU implementations presented in this study will serve as initial bases for further developments in the use of GPUs in the field of uncertainty quantification and will (i) aid the understanding of the performance constraints on the relevant GPU kernels and (ii) provide some guidance regarding the computational and the data structures to be utilized in these novel GPU implementations.

Dissertations / Theses on the topic "GPU1"

1

Stodůlka, Martin. "Akcelerace ultrazvukových simulací pomocí multi-GPU systémů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2021. http://www.nusl.cz/ntk/nusl-445538.

Abstract:
The main focus of this project is the usage of multi-GPU systems and CUDA unified memory. Its goal is to accelerate the computation of 2D and 3D FFTs, which are the main part of simulations in the k-Wave library. k-Wave is a C++/Matlab library used for simulations of the propagation of ultrasonic waves in 1D, 2D, or 3D space. Acceleration of these functions is necessary because the simulations are computationally intensive.
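
A minimal sketch of how the two ingredients named here fit together, assuming CUDA with cuFFT (link with -lcufft): unified memory gives host and device a single pointer, and the 3D FFT runs on the GPU. The toy grid stands in for k-Wave's simulation domains:

```cuda
#include <cufft.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int nx = 64, ny = 64, nz = 64;      // toy grid, not k-Wave's domains
    cufftComplex *data;
    cudaMallocManaged(&data, sizeof(cufftComplex) * nx * ny * nz);
    for (int i = 0; i < nx * ny * nz; ++i) data[i] = make_float2(1.0f, 0.0f);

    cufftHandle plan;
    cufftPlan3d(&plan, nx, ny, nz, CUFFT_C2C);
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);  // in-place 3D FFT
    cudaDeviceSynchronize();                        // pages data back to host
    printf("DC bin: %.0f\n", data[0].x);            // nx*ny*nz for all-ones

    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}
```
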
2

Ma, Wenjing. "Automatic Transformation and Optimization of Applications on GPUs and GPU clusters." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1300972089.

3

Tanasić, Ivan. "Towards multiprogrammed GPUs." Doctoral thesis, Universitat Politècnica de Catalunya, 2017. http://hdl.handle.net/10803/405796.

Abstract:
Programmable Graphics Processing Units (GPUs) have recently become the most pervasive massively parallel processors. They have come a long way, from fixed-function ASICs designed to accelerate graphics tasks to a programmable architecture that can also execute general-purpose computations. Because of their performance and efficiency, an increasing amount of software relies on them to accelerate data-parallel and computationally intensive sections of code. They have earned a place in many systems, from low-power mobile devices to the biggest data centers in the world. However, GPUs are still plagued by the fact that they essentially have no multiprogramming support, resulting in low system performance if the GPU is shared among multiple programs. In this dissertation we set out to provide rich GPU multiprogramming support by improving the multitasking capabilities and increasing the virtual memory functionality and performance. The main issue hindering multitasking support in GPUs is the nonpreemptive execution of GPU kernels. Here we propose two preemption mechanisms with different design philosophies that can be used by a scheduler to preempt execution on GPU cores and make room for some other process. We also argue for spatial sharing of the GPU and propose a concrete hardware scheduler implementation that dynamically partitions the GPU cores among running kernels, according to their set priorities. Opposing the assumptions made in the related work, we demonstrate that preemptive execution is feasible and the desired approach to GPU multitasking. We further show improved system fairness and responsiveness with our scheduling policy. We also pinpoint that at the core of the insufficient virtual memory support lies the exception handling mechanism used by modern GPUs. Currently, GPUs offload the actual exception handling work to the CPU, while the faulting instruction is stalled in the GPU core. This stall-on-fault model prevents some virtual memory features and optimizations and is especially harmful in multiprogrammed environments because it prevents context switching the GPU unless all the in-flight faults are resolved. In this dissertation, we propose three GPU core organizations with varying performance-complexity trade-offs that get rid of stall-on-fault execution and enable preemptible exceptions on the GPU (i.e., the faulting instruction can be squashed and restarted later). Building on this support, we implement two use cases and demonstrate their utility. One is a scheme that performs a context switch of the faulted threads and tries to find some other useful work to do in the meantime, hiding the latency of the fault and improving system performance. The other enables the fault handling code to run locally, on the GPU, instead of relying on CPU offloading, and shows that local fault handling can also improve performance.
4

Hong, Changwan. "Code Optimization on GPUs." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1557123832601533.

5

Wang, Kaibo. "Algorithmic and Software System Support to Accelerate Data Processing in CPU-GPU Hybrid Computing Environments." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1447685368.

6

Pedersen, Stian Aaraas. "Progressive Photon Mapping on GPUs." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, 2013. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-22995.

Abstract:
Physically based rendering using ray tracing is capable of producing realistic images of much higher quality than other methods. However, the computational costs associated with exploring all paths of light are huge; it can take hours to render high-quality images of complex scenes. Using graphics processing units has emerged as a popular way to speed up this process. The recent appearance of libraries like Nvidia's CUDA and OptiX makes the processing power of modern GPUs more available than ever before. This project includes an overview of current photon mapping techniques. We present a complete render application based on photon mapping which runs entirely on the GPU. Several different photon map implementations suitable for modern GPU architectures are considered and evaluated. A uniform grid approach based on photon sorting on the GPU is found to be fast to construct and efficient for photon gathering. The application is extended to support volumetric effects like fog and smoke. Our implementation is tested on a set of benchmark scenes exhibiting phenomena like global illumination, reflections and refractions, and participating volumetric media. A major contribution of this thesis is to demonstrate how recent advances in photon mapping can be used to render an image using many GPUs simultaneously. Our results show that we are able to get close to linear speedup employing up to six GPUs in a distributed system. We can emit up to 18 million photons per second on six Nvidia GTX 480 cards and generate images of our test scenes with little to no noise in a few minutes. Our implementation is straightforward to extend to a cluster of GPUs.
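
A uniform grid of this kind is commonly built by sorting photons by cell key, after which each cell's photons are contiguous and gathering reduces to a range lookup. A sketch of the sorting step with Thrust; the cell hashing and the gather kernel are omitted:

```cuda
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <cstdio>

int main() {
    const int nPhotons = 1 << 20;
    thrust::device_vector<unsigned int> cellHash(nPhotons);
    thrust::device_vector<int> photonIdx(nPhotons);
    thrust::sequence(photonIdx.begin(), photonIdx.end());
    // ... fill cellHash with each photon's grid-cell key on the GPU ...
    // After the sort, photons of the same cell are contiguous, so a
    // per-cell start/end table makes gathering a simple range lookup.
    thrust::sort_by_key(cellHash.begin(), cellHash.end(), photonIdx.begin());
    printf("sorted %d photons by cell\n", nPhotons);
    return 0;
}
```
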
7

Harb, Mohammed. "Quantum transport modeling with GPUs." Thesis, McGill University, 2013. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=114417.

Abstract:
In this thesis, we have developed a parallel GPU-accelerated code for carrying out transport calculations within the Non-Equilibrium Green's Function (NEGF) framework using the Tight-Binding (TB) model. We also discuss the theoretical, modelling, and computational issues that arise in this implementation. We demonstrate that a heterogeneous implementation with CPUs and GPUs is superior to single-processor, multiple-processor, and massively parallel CPU-only implementations. The GPU-Matlab Interface (GMI) developed in this work for use in our NEGF-TB code is not application-specific and can be used by researchers in any field without previous knowledge of GPU programming or multi-threaded programming. We also demonstrate that GMI competes very well with commercial packages. Finally, we apply our heterogeneous NEGF-TB code to the study of electronic transport properties of Si nanowires and nanobeams. We investigate the effect of several kinds of structural defects on the conductance of such devices and demonstrate that our method can handle systems of over 200,000 atoms in a reasonable time scale while using just 1-4 GPUs.
8

Hovland, Rune Johan. "Throughput Computing on Future GPUs." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9893.

Abstract:

The general-purpose computing capabilities of the Graphics Processing Unit (GPU) have recently been given a great deal of attention by the High-Performance Computing (HPC) community. By allowing massively parallel applications to run efficiently on commodity graphics cards, "personal supercomputers" are now available in desktop versions at a low price. For some applications, speedups of 70 times that of a single CPU implementation have been achieved. Among the most popular GPUs are those based on the NVIDIA Tesla Architecture, which allows relatively easy development of GPU applications using the NVIDIA CUDA programming environment. While the GPU is gaining interest in the HPC community, others are more reluctant to embrace the GPU as a computational device. The focus on throughput and large data volumes separates Information Retrieval (IR) from HPC, since for IR it is critical to process large amounts of data efficiently, a task which the GPU currently does not excel at. Only recently has the IR community begun to explore the possibilities, and an implementation of a search engine for the GPU was published in April 2009. This thesis analyzes how GPUs can be improved to better suit large-data-volume applications. Current graphics cards have a bottleneck regarding the transfer of data between the host and the GPU. One approach to resolving this bottleneck is to include the host memory as part of the GPUs' memory hierarchy. We develop a theoretical model, and based on this model, the expected performance improvement for high-data-volume applications is shown for both computationally bound and data-transfer-bound applications. The performance improvement for an existing search engine is also given based on the theoretical model. For this case, the improvements would result in a speedup between 1.389 and 1.874 for the various query types supported by the search engine.
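
The nearest mechanism in today's CUDA to treating host memory as part of the GPU's memory hierarchy is mapped ("zero-copy") pinned memory, which kernels read across the bus on demand. A minimal sketch, not the thesis's proposal itself:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Kernels read mapped host memory across the PCIe bus on demand.
__global__ void sum(const int *data, int n, unsigned long long *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(out, (unsigned long long)data[i]);
}

int main() {
    cudaSetDeviceFlags(cudaDeviceMapHost);   // allow mapped host allocations
    const int n = 1 << 20;
    int *h_data;
    unsigned long long *h_out;
    cudaHostAlloc(&h_data, n * sizeof(int), cudaHostAllocMapped);
    cudaHostAlloc(&h_out, sizeof(unsigned long long), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_data[i] = 1;
    *h_out = 0;

    int *d_data;
    unsigned long long *d_out;
    cudaHostGetDevicePointer(&d_data, h_data, 0);
    cudaHostGetDevicePointer(&d_out, h_out, 0);
    sum<<<(n + 255) / 256, 256>>>(d_data, n, d_out);
    cudaDeviceSynchronize();
    printf("sum = %llu\n", *h_out);          // 1048576

    cudaFreeHost(h_data);
    cudaFreeHost(h_out);
    return 0;
}
```
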

9

Kim, Jinsung. "Optimizing Tensor Contractions on GPUs." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1563237825735994.

10

Tadros, Rimon. "Accelerating web search using GPUs." Thesis, University of British Columbia, 2015. http://hdl.handle.net/2429/54722.

Abstract:
The amount of content on the Internet is growing rapidly, as is the number of online Internet users. As a consequence, web search engines need to continually increase their computing capabilities and data while maintaining low search latency and without a significant rise in the cost per query. To serve this larger number of online users, web search engines utilize a large distributed system in the data centers. They partition their data across several hundreds of thousands of independent commodity servers called Index Serving Nodes (ISNs). These ISNs work together to serve search queries as a single coherent system in a distributed manner. The choice of a high number of commodity servers vs. a smaller number of supercomputers is due to the need for scalability, high availability/reliability, performance, and cost efficiency. To serve larger data, web search engines can be scaled either vertically or horizontally. Vertical scaling enables ranking more documents per query within a single node by employing machines with higher single-thread and throughput performance with bigger and faster memory. Horizontal scaling supports a larger index by adding more index serving nodes at the cost of increased synchronization, aggregation overhead, and power consumption. This thesis evaluates the potential for achieving better vertical scaling by using graphics processing units (GPUs) to reduce the document ranking latency per query at a reasonable initial cost increase. It achieves this by exploiting the parallelism in ranking the numerous potential documents that match a query to offload to the GPU. We evaluate this approach using hundreds of rankers from a commercial web search engine on real production data. Our results show an 8.8x harmonic mean reduction in latency and 2x power efficiency when ranking 10,000 web documents per query for a variety of rankers using C++AMP and a commodity GPU.

Books on the topic "GPU1"

1

Kindratenko, Volodymyr, ed. Numerical Computations with GPUs. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-06548-9.

2

GPU computing gems. Boston, MA: Morgan Kaufmann, 2011.

3

Engel, Wolfgang, ed. GPU Pro 360. First edition. Boca Raton, FL: A K Peters/CRC Press, 2018. http://dx.doi.org/10.1201/9781351052108.

4

Engel, Wolfgang, ed. GPU Pro 360. First edition. Boca Raton, FL: A K Peters/CRC Press, 2018. http://dx.doi.org/10.1201/9781351208352.

5

Engel, Wolfgang, ed. GPU Pro 360. Boca Raton, FL: A K Peters/CRC Press, 2018. http://dx.doi.org/10.1201/9781351261524.

6

Engel, Wolfgang, ed. GPU Pro 360. Boca Raton, FL: A K Peters/CRC Press, 2018. http://dx.doi.org/10.1201/b22483.

7

Designing scientific applications on GPUs. Boca Raton, [Florida]: CRC/Taylor & Francis, 2014.

8

Rodengen, Jeffrey L. The legacy of GPU. Fort Lauderdale, FL: Write Stuff Enterprises, 2000.

9

Cai, Yiyu, and Simon See, eds. GPU Computing and Applications. Singapore: Springer Singapore, 2015. http://dx.doi.org/10.1007/978-981-287-134-3.

10

GPU Pro 2: Advanced rendering techniques. Natick, Mass.: A K Peters, 2011.


Book chapters on the topic "GPU1"

1

Reinders, James, Ben Ashbaugh, James Brodman, Michael Kinsner, John Pennycook, and Xinmin Tian. "Programming for GPUs." In Data Parallel C++, 353–85. Berkeley, CA: Apress, 2020. http://dx.doi.org/10.1007/978-1-4842-5574-2_15.

Abstract:
Over the last few decades, Graphics Processing Units (GPUs) have evolved from specialized hardware devices capable of drawing images on a screen to general-purpose devices capable of executing complex parallel kernels. Nowadays, nearly every computer includes a GPU alongside a traditional CPU, and many programs may be accelerated by offloading part of a parallel algorithm from the CPU to the GPU.
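
The chapter develops this offload pattern in SYCL/DPC++; for comparison, the same copy-compute-copy-back shape is sketched here in CUDA terms (a hypothetical example, not the book's code):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Data-parallel kernel: one thread per array element.
__global__ void saxpy(float a, const float *x, float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 16;
    size_t bytes = n * sizeof(float);
    float *hx = new float[n], *hy = new float[n];
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);
    saxpy<<<(n + 255) / 256, 256>>>(2.0f, dx, dy, n);  // offloaded section
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]);                      // 4.0

    cudaFree(dx);
    cudaFree(dy);
    delete[] hx;
    delete[] hy;
    return 0;
}
```
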
2

Osama, Muhammad, Anton Wijs, and Armin Biere. "SAT Solving with GPU Accelerated Inprocessing." In Tools and Algorithms for the Construction and Analysis of Systems, 133–51. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-72016-2_8.

Abstract:
Since 2013, the leading SAT solvers in the SAT competition all use inprocessing, which, unlike preprocessing, interleaves search with simplifications. However, applying inprocessing frequently can still be a bottleneck, i.e., for hard or large formulas. In this work, we introduce the first attempt to parallelize inprocessing on GPU architectures. As memory is a scarce resource in GPUs, we present new space-efficient data structures and devise a data-parallel garbage collector. It runs in parallel on the GPU to reduce memory consumption and improve memory access locality. Our new parallel variable elimination algorithm is twice as fast as previous work. In experiments, our new solver ParaFROST solves many benchmarks faster on the GPU than its sequential counterparts.
3

Andrzejewski, Witold, and Robert Wrembel. "GPU-WAH: Applying GPUs to Compressing Bitmap Indexes with Word Aligned Hybrid." In Lecture Notes in Computer Science, 315–29. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-15251-1_26.

4

Lombardi, Luca, and Piercarlo Dondi. "GPU." In Encyclopedia of Systems Biology, 844. New York, NY: Springer New York, 2013. http://dx.doi.org/10.1007/978-1-4419-9863-7_1308.

5

Ketkar, Nikhil. "Introduction to GPUs." In Deep Learning with Python, 149–58. Berkeley, CA: Apress, 2017. http://dx.doi.org/10.1007/978-1-4842-2766-4_10.

6

Kalé, Laxmikant V., Abhinav Bhatele, Eric J. Bohm, James C. Phillips, David H. Bailey, Ananth Y. Grama, Joseph Fogarty, et al. "NVIDIA GPU." In Encyclopedia of Parallel Computing, 1339–45. Boston, MA: Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-09766-4_276.

7

Uelschen, Michael. "GPU-Programmierung." In Software Engineering Paralleler Systeme, 313–39. Wiesbaden: Springer Fachmedien Wiesbaden, 2019. http://dx.doi.org/10.1007/978-3-658-25343-1_6.

8

Rauber, Thomas, and Gudula Rünger. "GPU-Programmierung." In Parallele Programmierung, 387–416. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-13604-7_7.

9

Ranta, Sunil Mohan, Jag Mohan Singh, and P. J. Narayanan. "GPU Objects." In Computer Vision, Graphics and Image Processing, 352–63. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11949619_32.

10

Vázquez, Francisco, José Antonio Martínez, and Ester M. Garzón. "GPU Computing." In Encyclopedia of Systems Biology, 845–49. New York, NY: Springer New York, 2013. http://dx.doi.org/10.1007/978-1-4419-9863-7_998.


Conference papers on the topic "GPU1"

1

Tarashima, Shuhei, Satoshi Someya, and Koji Okamoto. "Acceleration of Recursive Cross-Correlation PIV Using Multiple GPUs." In ASME/JSME 2011 8th Thermal Engineering Joint Conference. ASMEDC, 2011. http://dx.doi.org/10.1115/ajtec2011-44442.

Abstract:
A large number of PIV algorithms and systems have been proposed, many of which are highly sophisticated in terms of accuracy and spatial and temporal resolution. However, a general problem with PIV is the time cost to compute vector fields from images, which often imposes specific constraints on the measurement methods. In this paper, focusing on recursive direct cross-correlation PIV with window deformation, which is one of the most popular algorithms for PIV, we propose a technique to accelerate PIV processing using a single Graphics Processing Unit (single-GPU) and multiple GPUs (multi-GPU). In the single-GPU case, we show that PIV data can be processed over 100 times faster than using a CPU alone and that about 30 PIV image pairs per second can be processed for certain image sizes. The scalability of the algorithm used is also discussed. In the multi-GPU case, the image-split method and the parallel method of single-GPU-based PIV processing are measured. We show that the effect of multi-GPU can be observed over a certain amount of image data whichever of these two methods is used. Data transfer between the CPU and GPUs is shown to be a bottleneck if the number of GPUs used increases.
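
The core of direct cross-correlation PIV is scoring every candidate displacement of an interrogation window between two exposures. A toy CUDA kernel for a single window; the window size, search radius, and one-thread-per-displacement mapping are illustrative choices, not the paper's implementation:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

#define W  32                 // interrogation window size (illustrative)
#define R  8                  // search radius in pixels (illustrative)
#define SB (W + 2 * R)        // padded width of the second-exposure window

// Each thread scores one candidate displacement (dx, dy); the peak over
// all (2R+1)^2 scores gives the displacement vector for this window.
__global__ void xcorr(const float *a, const float *b, float *score) {
    int dx = threadIdx.x, dy = blockIdx.x;    // 0 .. 2R
    float s = 0.0f;
    for (int y = 0; y < W; ++y)
        for (int x = 0; x < W; ++x)
            s += a[y * W + x] * b[(y + dy) * SB + (x + dx)];
    score[dy * (2 * R + 1) + dx] = s;
}

int main() {
    float *a, *b, *score;
    cudaMallocManaged(&a, W * W * sizeof(float));
    cudaMallocManaged(&b, SB * SB * sizeof(float));
    cudaMallocManaged(&score, (2 * R + 1) * (2 * R + 1) * sizeof(float));
    // ... load the two interrogation windows into a and b ...
    xcorr<<<2 * R + 1, 2 * R + 1>>>(a, b, score);
    cudaDeviceSynchronize();
    printf("score(0,0) = %f\n", score[0]);
    return 0;
}
```
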
2

Ferro, Mariza, André Yokoyama, Vinicius Klõh, Gabrieli Silva, Rodrigo Gandra, Ricardo Bragança, Andre Bulcão, and Bruno Schulze. "Analysis of GPU Power Consumption Using Internal Sensors." In XVI Workshop em Desempenho de Sistemas Computacionais e de Comunicação. Sociedade Brasileira de Computação - SBC, 2017. http://dx.doi.org/10.5753/wperformance.2017.3360.

Abstract:
GPUs have been widely used in scientific computing, offering exceptional performance as well as power-efficient hardware. Their established position in the high-performance and scientific computing communities has increased the urgency of understanding the power cost of GPU usage through accurate measurements. For this, the use of internal sensors is extremely important. In this work, we employ the GPU sensors to obtain high-resolution power profiles of real and benchmark applications. We wrote our own tools to query the sensors of two NVIDIA GPUs from different generations and compare their accuracy. We also compare the power profile of the GPU with that of the CPU using IPMItool.
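
The paper's in-house query tools are not public; on NVIDIA hardware the same on-board sensor is exposed through the NVML library, which a small program can poll, as in this sketch (link with -lnvidia-ml; device index 0 is arbitrary):

```cuda
#include <nvml.h>
#include <cstdio>

int main() {
    nvmlInit();
    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);       // first GPU in the system
    unsigned int mw = 0;
    if (nvmlDeviceGetPowerUsage(dev, &mw) == NVML_SUCCESS)
        printf("GPU0 draws %.1f W\n", mw / 1000.0);  // sensor reports mW
    nvmlShutdown();
    return 0;
}
```
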
3

Santos, Ricardo, Rhayssa Sonohata, Casio Krebs, Daniela Catelan, Liana Duenha, Diego Segovia, and Mateus Tostes Santos. "Exploração do Projeto de Sistemas Baseados em GPU ciente de Dark Silicon." In XX Simpósio em Sistemas Computacionais de Alto Desempenho. Sociedade Brasileira de Computação, 2019. http://dx.doi.org/10.5753/wscad.2019.8682.

Abstract:
This paper proposes an infrastructure for exploring the design space of computing systems that combine graphics processing units (GPUs) with general-purpose processing cores, with the goal of reducing dark silicon and increasing system performance at design time. The GPGPUSim simulation and physical design estimation tool was extended to produce dark silicon estimates for GPU platforms and was then integrated into the MultiExplorer framework. In addition, a strategy was developed for estimating the performance of GPU platforms, along with the modeling of databases that now include both GPU cores and multicore platforms (general-purpose cores), thus enabling design space exploration targeting heterogeneous GP-GPU architectures.
4

Gisbert, Fernando, Roque Corral, and Guillermo Pastor. "Implementation of an Edge-Based Navier-Stokes Solver for Unstructured Grids in Graphics Processing Units." In ASME 2011 Turbo Expo: Turbine Technical Conference and Exposition. ASMEDC, 2011. http://dx.doi.org/10.1115/gt2011-46224.

Abstract:
The implementation of an edge-based three-dimensional RANS equations solver for unstructured grids that runs on both central processing units (CPUs) and graphics processing units (GPUs) is presented. This CPU/GPU duality is kept without double-writing the code, reducing programming and maintenance costs. The GPU implementation is based on the standard OpenCL language. The code has been parallelized using MPI. Some turbomachinery benchmark cases are presented. For all cases, an order of magnitude reduction in computational time is achieved when the code is executed on GPUs instead of CPUs.
5

Konobrytskyi, Dmytro, Thomas Kurfess, Joshua Tarbutton, and Tommy Tucker. "GPGPU Accelerated 3-Axis CNC Machining Simulation." In ASME 2013 International Manufacturing Science and Engineering Conference collocated with the 41st North American Manufacturing Research Conference. American Society of Mechanical Engineers, 2013. http://dx.doi.org/10.1115/msec2013-1096.

Abstract:
GPUs (Graphics Processing Units), traditionally used for 3D graphics calculations, have recently gained the ability to perform general-purpose calculations through GPGPU (General-Purpose GPU) technology. Moreover, GPUs can be much faster than CPUs (Central Processing Units) by performing hundreds or even thousands of commands concurrently. This parallel processing allows the GPU to achieve extremely high performance, but it also requires highly parallel algorithms that can provide enough commands on each clock cycle. This work formulates a methodology for selecting the right geometry representation and a data structure suitable for parallel processing on the GPU. The methodology is then used to design a 3-axis CNC milling simulation algorithm accelerated with GPGPU technology. The developed algorithm is validated by performing an experimental machining simulation and evaluating the performance results. The experimental simulation shows the importance of the optimization process and of using algorithms that provide enough work to the GPU. The test configuration used also demonstrates almost an order-of-magnitude difference between CPU and GPU performance.
6

Gajjar, Mrugesh, Christian Amann, and Kai Kadau. "High-Performance Computing Probabilistic Fracture Mechanics Implementation for Gas Turbine Rotor Disks on Distributed Architectures Including Graphics Processing Units (GPUs)." In ASME Turbo Expo 2021: Turbomachinery Technical Conference and Exposition. American Society of Mechanical Engineers, 2021. http://dx.doi.org/10.1115/gt2021-59295.

Abstract:
We present an efficient Monte Carlo-based probabilistic fracture mechanics simulation implementation for heterogeneous high-performance computing (HPC) architectures including CPUs and GPUs. The specific application focuses on large heavy-duty gas turbine rotor components for the energy sector. A reliable probabilistic risk quantification requires the simulation of millions to billions of Monte Carlo (MC) samples. We apply a modified Runge-Kutta algorithm in order to solve numerically the fatigue crack growth for this large number of cracks for varying initial crack sizes, locations, material and service conditions. This compute-intensive simulation has already been demonstrated to perform efficiently and scalably on parallel and distributed HPC architectures including hundreds of CPUs utilizing the Message Passing Interface (MPI) paradigm. In this work, we go a step further and include GPUs in our parallelization strategy. We develop a load distribution scheme to share one or more GPUs on compute nodes distributed over a network. We detail technical challenges and solution strategies in performing the simulations on GPUs efficiently. We show that the key computation of the modified Runge-Kutta integration step speeds up over two orders of magnitude on a typical GPU compared to a single-threaded CPU. This is supported by our use of GPU textures for efficient interpolation of multi-dimensional tables utilized in the implementation. We demonstrate weak and strong scaling of our GPU implementation, i.e., that we can efficiently utilize a large number of GPUs/CPUs in order to solve for more MC samples, or reduce the computational turn-around time, respectively. On seven different GPUs spanning four generations, the presented probabilistic fracture mechanics simulation tool ProbFM achieves a speed-up ranging from 16.4x to 47.4x compared to a single-threaded CPU implementation.
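
Each Monte Carlo sample is an independent ODE integration, which maps naturally to one CUDA thread per crack. A sketch with classical fourth-order Runge-Kutta and a toy Paris-law growth model; the constants and the stress-intensity expression are placeholders, not ProbFM's tables:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Toy stress-intensity factor and Paris-law growth rate (placeholders).
__device__ float dadN(float a, float C, float m) {
    float K = sqrtf(3.14159265f * a);
    return C * powf(K, m);
}

// One thread integrates the crack-growth ODE for one Monte Carlo sample.
__global__ void growCracks(float *a, int n, float C, float m,
                           float dN, int steps) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = a[i];
    for (int s = 0; s < steps; ++s) {          // classical RK4 steps
        float k1 = dadN(x, C, m);
        float k2 = dadN(x + 0.5f * dN * k1, C, m);
        float k3 = dadN(x + 0.5f * dN * k2, C, m);
        float k4 = dadN(x + dN * k3, C, m);
        x += dN / 6.0f * (k1 + 2.0f * k2 + 2.0f * k3 + k4);
    }
    a[i] = x;
}

int main() {
    const int n = 1 << 20;                     // one crack per MC sample
    float *a;
    cudaMallocManaged(&a, n * sizeof(float));
    for (int i = 0; i < n; ++i) a[i] = 1e-3f;  // initial crack sizes
    growCracks<<<(n + 255) / 256, 256>>>(a, n, 1e-8f, 3.0f, 100.0f, 1000);
    cudaDeviceSynchronize();
    printf("a[0] = %e\n", a[0]);
    cudaFree(a);
    return 0;
}
```
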
7

Romanelli, G., L. Mangani, E. Casartelli, A. Gadda, and M. Favale. "Implementation of Explicit Density-Based Unstructured CFD Solver for Turbomachinery Applications on Graphical Processing Units." In ASME Turbo Expo 2015: Turbine Technical Conference and Exposition. American Society of Mechanical Engineers, 2015. http://dx.doi.org/10.1115/gt2015-43396.

Abstract:
For the aerodynamic design of multistage compressors and turbines Computational Fluid Dynamics (CFD) plays a fundamental role. In fact it allows the characterization of the complex behaviour of turbomachinery components with high fidelity. Together with the availability of more and more powerful computing resources, current trends pursue the adoption of such high-fidelity tools and state-of-the-art technology even in the preliminary design phases. Within such a framework Graphical Processing Units (GPUs) yield further growth potential, allowing a significant reduction of CFD process turn-around times at relatively low costs. The target of the present work is to illustrate the design and implementation of an explicit density-based RANS coupled solver for the efficient and accurate numerical simulation of multi-dimensional time-dependent compressible fluid flows on polyhedral unstructured meshes. The solver has been developed within the object-oriented OpenFOAM framework, using OpenCL bindings to interface CPU and GPU and using MPI to interface multiple GPUs. The overall structure of the code, the numerical strategies adopted and the algorithms implemented are specifically designed in order to best exploit the huge computational peak power offered by modern GPUs, by minimizing memory transfers between CPUs and GPUs and potential branch divergence occurrences. This has a significant impact in terms of the speedup factor and is especially challenging within a polyhedral unstructured mesh framework. Specific tools for turbomachinery applications, such as Arbitrary Mesh Interface (AMI) and mixing-plane (MP), are implemented within the GPU context. The credibility of the proposed CFD solver is assessed by tackling a number of benchmark test problems, including Rotor 67 axial compressor, C3X stator blade with conjugate heat transfer and Aachen multi-stage turbine. An average GPU speedup factor of approximately S ≃ 50 with respect to CPU is achieved (single precision, both GPU and CPU in 100 USD price range). Preliminary parallel scalability test run on multiple GPUs show a parallel efficiency factor of approximately E ≃ 75%.
8

Bergmann, Ryan M., and Jasmina L. Vujić. "Monte Carlo Neutron Transport on GPUs." In 2014 22nd International Conference on Nuclear Engineering. American Society of Mechanical Engineers, 2014. http://dx.doi.org/10.1115/icone22-30148.

Abstract:
GPUs have gradually increased in computational power from the small, job-specific boards of the early 90s to the programmable powerhouses of today. Compared to CPUs, they have a higher aggregate memory bandwidth, much higher floating-point operations per second (FLOPS), and lower energy consumption per FLOP. Because one of the main obstacles in exascale computing is power consumption, many new supercomputing platforms are gaining much of their computational capacity by incorporating GPUs into their compute nodes. Since CPU optimized parallel algorithms are not directly portable to GPU architectures (or at least without losing substantial performance gain), transport codes need to be rewritten in order to execute efficiently on GPUs. Unless this is done, we cannot take full advantage of these new supercomputers for reactor simulations. In this work, we attempt to efficiently map the Monte Carlo transport algorithm on the GPU while preserving its benefits, namely, very few physical and geometrical simplifications. Regularizing memory access and introducing parallel-efficient search and sorting algorithms are the main factors in completing the task.
9

Breder, Bernardo, Eduardo Charles, Rommel Cruz, Esteban Clua, Cristiana Bentes, and Lucia Drummond. "Maximizando o Uso dos Recursos de GPU Através da Reordenação da Submissão de Kernels Concorrentes." In XVII Simpósio em Sistemas Computacionais de Alto Desempenho. Sociedade Brasileira de Computação - SBC, 2016. http://dx.doi.org/10.5753/wscad.2016.14264.

Abstract:
The growing amount of resources available in modern GPUs has sparked new interest in the problem of sharing those resources among different kernels. The latest generation of GPUs allows concurrent kernel execution, but it is still limited by the fact that scheduling decisions are made by the hardware at run time. These decisions depend on the order in which kernels are submitted for execution, producing executions in which the GPU does not necessarily reach its best occupancy. In this work, we propose an optimization that reorders kernel submission with two goals: maximizing resource utilization and improving the average turnaround time. We model the assignment of kernels to the GPU as a series of knapsack problems and use a dynamic programming approach to solve them. We evaluate our proposal using kernels with different sizes and resource requirements. Our results show significant gains in average turnaround time and throughput compared with the standard kernel submission implemented in modern GPUs.
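
The knapsack view of the problem: each pending kernel demands some amount of a GPU resource, the hardware limit is the capacity, and dynamic programming selects the subset that fills it best. A single-resource toy version (real GPUs juggle several capacities at once: threads, registers, shared memory):

```cuda
#include <cstdio>
#include <vector>
#include <algorithm>

int main() {
    const int capacity = 2048;                 // threads per SM (example value)
    std::vector<int> demand = {512, 768, 1024, 256, 640};  // per pending kernel
    std::vector<int> best(capacity + 1, 0);    // best[c] = max threads packed

    // Classic 0/1 knapsack: iterate kernels, sweep capacities downward so
    // each kernel is selected at most once.
    for (size_t k = 0; k < demand.size(); ++k)
        for (int c = capacity; c >= demand[k]; --c)
            best[c] = std::max(best[c], best[c - demand[k]] + demand[k]);

    printf("max occupancy: %d / %d threads\n", best[capacity], capacity);
    return 0;
}
```
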
10

Brandvik, Tobias, and Graham Pullan. "An Accelerated 3D Navier-Stokes Solver for Flows in Turbomachines." In ASME Turbo Expo 2009: Power for Land, Sea, and Air. ASMEDC, 2009. http://dx.doi.org/10.1115/gt2009-60052.

Abstract:
A new three-dimensional Navier-Stokes solver for flows in turbomachines has been developed. The new solver is based on the latest version of the Denton codes, but has been implemented to run on Graphics Processing Units (GPUs) instead of the traditional Central Processing Unit (CPU). The change in processor enables an order-of-magnitude reduction in run-time due to the higher performance of the GPU. Scaling results for a 16 node GPU cluster are also presented, showing almost linear scaling for typical turbomachinery cases. For validation purposes, a test case consisting of a three-stage turbine with complete hub and casing leakage paths is described. Good agreement is obtained with previously published experimental results. The simulation runs in less than 10 minutes on a cluster with four GPUs.

Reports on the topic "GPU1"

1

Holladay, Daniel. Non-LTE Opacity Computation on GPUs. Office of Scientific and Technical Information (OSTI), August 2014. http://dx.doi.org/10.2172/1148954.

2

Long, Alex Roberts. Jayenne GPU Strategy Update. Office of Scientific and Technical Information (OSTI), May 2020. http://dx.doi.org/10.2172/1634935.

3

Hughes, Clayton, Simon Hammond, Mengchi Zhang, Yechen Liu, Tim Rogers, and Robert Hoekstra. SST-GPU: A Scalable SST GPU Component for Performance Modeling and Profiling. Office of Scientific and Technical Information (OSTI), January 2021. http://dx.doi.org/10.2172/1762830.

4

Gong, Qian, Wenji Wu, and Phil DeMar. GoldenEye: Stream-Based Network Packet Inspection using GPUs. Office of Scientific and Technical Information (OSTI), October 2018. http://dx.doi.org/10.2172/1508017.

5

Cawkwell, Marc J., Anders M. Niklasson, and Susan M. Mniszewski. Quantum molecular dynamics on parallel GPUs: w13_qmdgpu. Office of Scientific and Technical Information (OSTI), May 2014. http://dx.doi.org/10.2172/1131015.

6

Monroe, Laura Marie, Sarah E. Michalak, and Joanne R. Wendelberger. Randomized selection on the GPU. Office of Scientific and Technical Information (OSTI), August 2011. http://dx.doi.org/10.2172/1090658.

7

Monroe, Laura Marie, Sarah E. Michalak, and Joanne R. Wendelberger. Randomized selection on the GPU. Office of Scientific and Technical Information (OSTI), August 2011. http://dx.doi.org/10.2172/1090659.

8

Sexton-Kennedy, Liz, and Philippe Canal. Geant Exascale / GPU Pilot Project. Office of Scientific and Technical Information (OSTI), November 2019. http://dx.doi.org/10.2172/1599619.

9

Monroe, Laura M. GPGPU Computing and Visualization on GPUs at LANL. Office of Scientific and Technical Information (OSTI), October 2012. http://dx.doi.org/10.2172/1053889.

10

Boggan, Sha'Kia, and Daniel M. Pressel. GPUs: An Emerging Platform for General-Purpose Computation. Fort Belvoir, VA: Defense Technical Information Center, August 2007. http://dx.doi.org/10.21236/ada471188.
