Academic literature on the topic 'GPU Systems'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'GPU Systems.'

Next to every source in the list of references there is an 'Add to bibliography' button. Press it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "GPU Systems"

1

Jararweh, Yaser, Moath Jarrah, and Abdelkader Bousselham. "GPU Scaling." International Journal of Information Technology and Web Engineering 9, no. 4 (October 2014): 13–23. http://dx.doi.org/10.4018/ijitwe.2014100102.

Abstract:
Current state-of-the-art GPU-based systems offer unprecedented performance advantages by accelerating the most compute-intensive portions of applications by an order of magnitude. GPU computing presents a viable solution to the ever-increasing complexity of applications and the growing demand for immense computational resources. In this paper the authors investigate different platforms of GPU-based systems, from Personal Supercomputing (PSC) to cloud-based GPU systems. They explore and evaluate these GPU-based platforms and present a comparison against conventional high-performance cluster-based computing systems. Their evaluation shows the potential advantages of using GPU-based systems for high-performance computing applications while meeting different scaling granularities.
2

Dematte, L., and D. Prandi. "GPU computing for systems biology." Briefings in Bioinformatics 11, no. 3 (March 7, 2010): 323–33. http://dx.doi.org/10.1093/bib/bbq006.

3

Ban, Zhihua, Jianguo Liu, and Jeremy Fouriaux. "GMMSP on GPU." Journal of Real-Time Image Processing 17, no. 2 (March 17, 2018): 245–57. http://dx.doi.org/10.1007/s11554-018-0762-3.

4

Georgii, Joachim, and Rüdiger Westermann. "Mass-spring systems on the GPU." Simulation Modelling Practice and Theory 13, no. 8 (November 2005): 693–702. http://dx.doi.org/10.1016/j.simpat.2005.08.004.

5

Huynh, Huynh Phung, Andrei Hagiescu, Ong Zhong Liang, Weng-Fai Wong, and Rick Siow Mong Goh. "Mapping Streaming Applications onto GPU Systems." IEEE Transactions on Parallel and Distributed Systems 25, no. 9 (September 2014): 2374–85. http://dx.doi.org/10.1109/tpds.2013.195.

6

Deniz, Etem, and Alper Sen. "MINIME-GPU." ACM Transactions on Architecture and Code Optimization 12, no. 4 (January 7, 2016): 1–25. http://dx.doi.org/10.1145/2818693.

7

Braak, Gert-Jan Van Den, and Henk Corporaal. "R-GPU." ACM Transactions on Architecture and Code Optimization 13, no. 1 (April 5, 2016): 1–24. http://dx.doi.org/10.1145/2890506.

8

Ino, Fumihiko, Shinta Nakagawa, and Kenichi Hagihara. "GPU-Chariot: A Programming Framework for Stream Applications Running on Multi-GPU Systems." IEICE Transactions on Information and Systems E96.D, no. 12 (2013): 2604–16. http://dx.doi.org/10.1587/transinf.e96.d.2604.

9

Rosenfeld, Viktor, Sebastian Breß, and Volker Markl. "Query Processing on Heterogeneous CPU/GPU Systems." ACM Computing Surveys 55, no. 1 (January 31, 2023): 1–38. http://dx.doi.org/10.1145/3485126.

Abstract:
Due to their high computational power and internal memory bandwidth, graphics processing units (GPUs) have been extensively studied by the database systems research community. A heterogeneous query processing system that employs CPUs and GPUs at the same time has to solve many challenges, including how to distribute the workload on processors with different capabilities; how to overcome the data transfer bottleneck; and how to support implementations for multiple processors efficiently. In this survey we devise a classification scheme to categorize techniques developed to address these challenges. Based on this scheme, we categorize query processing systems on heterogeneous CPU/GPU systems and identify open research problems.
10

Besozzi, Daniela, Giulio Caravagna, Paolo Cazzaniga, Marco Nobile, Dario Pescini, and Alessandro Re. "GPU-powered Simulation Methodologies for Biological Systems." Electronic Proceedings in Theoretical Computer Science 130 (September 30, 2013): 87–91. http://dx.doi.org/10.4204/eptcs.130.14.


Dissertations / Theses on the topic "GPU Systems"

1

Yuan, George Lai. "GPU compute memory systems." Thesis, University of British Columbia, 2009. http://hdl.handle.net/2429/15877.

Abstract:
Modern Graphics Processing Units (GPUs) offer orders of magnitude more raw computing power than contemporary CPUs by using many simpler in-order single-instruction, multiple-data (SIMD) cores optimized for multi-thread performance rather than single-thread performance. As such, GPUs operate much closer to the "Memory Wall", thus requiring much more careful memory management. This thesis proposes changes to the memory system of our detailed GPU performance simulator, GPGPU-Sim, to allow proper simulation of general-purpose applications written using NVIDIA's Compute Unified Device Architecture (CUDA) framework. To test these changes, fourteen CUDA applications with varying degrees of memory intensity were collected. With these changes, we show that our simulator predicts performance of commodity GPU hardware with 86% correlation. Furthermore, we show that increasing chip resources to allow more threads to run concurrently does not necessarily increase performance due to increased contention for the shared memory system. Moreover, this thesis proposes a hybrid analytical DRAM performance model that uses memory address traces to predict the efficiency of a DRAM system when using a conventional First-Ready First-Come First-Serve (FR-FCFS) memory scheduling policy. To stress the proposed model, a massively multithreaded architecture based upon contemporary high-end GPUs is simulated to generate the memory address trace needed. The results show that the hybrid analytical model predicts DRAM efficiency to within 11.2% absolute error when arithmetically averaged across a memory-intensive subset of the CUDA applications introduced in the first part of this thesis. Finally, this thesis proposes a complexity-effective solution to memory scheduling that recovers most of the performance loss incurred by a naive in-order First-In First-Out (FIFO) DRAM scheduler compared to an aggressive out-of-order FR-FCFS scheduler.
While FR-FCFS scheduling re-orders memory requests to improve row access locality, we instead employ an interconnection network arbitration scheme that preserves the inherently high row access locality of memory request streams from individual "shader cores" and, in doing so, achieve DRAM efficiency and system performance close to that of FR-FCFS with a simpler design. We evaluate our interconnection network arbitration scheme using crossbar, ring, and mesh networks and show that, when coupled with a banked FIFO in-order scheduler, it obtains up to 91.0% of the performance obtainable with an out-of-order memory scheduler with eight-entry DRAM controller queues.
2

Arnau, Jose Maria. "Energy-efficient mobile GPU systems." Doctoral thesis, Universitat Politècnica de Catalunya, 2015. http://hdl.handle.net/10803/290736.

Abstract:
The design of mobile GPUs is all about saving energy. Smartphones and tablets are battery-operated, and thus any type of rendering needs to use as little energy as possible. Furthermore, smartphones do not include sophisticated cooling systems due to their small size, making heat dissipation a primary concern. Improving the energy efficiency of mobile GPUs will be absolutely necessary to achieve the performance required to satisfy consumer expectations, while maintaining operating time per battery charge and keeping the GPU within its thermal limits. The first step in optimizing energy consumption is to identify the sources of energy drain. Previous studies have demonstrated that the register file is one of the main sources of energy consumption in a GPU. As graphics workloads are highly data- and memory-parallel, GPUs rely on massive multithreading to hide the memory latency and keep the functional units busy. However, aggressive multithreading requires a huge register file to hold the registers of thousands of simultaneous threads. Such a big register file exceeds the power budget typically available for an embedded graphics processor, and hence more energy-efficient memory latency tolerance techniques are necessary. On the other hand, prior research showed that off-chip accesses to system memory are among the most expensive operations in terms of energy in a mobile GPU. Therefore, optimizing memory bandwidth usage is a primary concern in mobile GPU design. Many bandwidth-saving techniques, such as texture compression or ARM's transaction elimination, have been proposed in both industry and academia. The purpose of this thesis is to study the characteristics of mobile graphics processors and mobile workloads in order to propose different energy-saving techniques specifically tailored for the low-power segment. Firstly, we focus on energy-efficient memory latency tolerance.
We analyze several techniques such as multithreading and prefetching and conclude that they are effective but not energy-efficient. Next, we propose an architecture for the fragment processors of a mobile GPU that is based on the decoupled access/execute paradigm. The results obtained by using a cycle-accurate mobile GPU simulator and several commercial Android games show that the decoupled architecture, combined with a small degree of multithreading, provides the most energy-efficient solution for hiding memory latency. More specifically, the decoupled access/execute-like design with just 4 SIMD threads/processor is able to achieve 97% of the performance of a larger GPU with 16 SIMD threads/processor, while providing 20.5% energy savings on average. Secondly, we focus on optimizing memory bandwidth in a mobile GPU. We analyze the bandwidth usage in a set of commercial Android games and find that most of the bandwidth is employed for fetching textures, and also that consecutive frames share most of the texture dataset, as they tend to be very similar. However, the GPU cannot capture inter-frame texture re-use due to the large size of the texture dataset for one frame. Based on this analysis, we propose Parallel Frame Rendering (PFR), a technique that overlaps the processing of multiple frames in order to exploit inter-frame texture re-use and save bandwidth. By processing multiple frames in parallel, textures are fetched once every two frames instead of once per frame as in conventional GPUs. PFR provides 23.8% memory bandwidth savings on average in our set of Android games, which results in a 12% speedup and 20.1% energy savings. Finally, we improve PFR by introducing a hardware memoization system on top. We analyze the redundancy in mobile games and find that more than 38% of the Fragment Program executions are redundant on average.
We thus propose a task-level hardware-based memoization system that provides 15% speedup and 12% energy savings on average over a PFR-enabled GPU.
3

Dollinger, Jean-François. "A framework for efficient execution on GPU and CPU+GPU systems." Thesis, Strasbourg, 2015. http://www.theses.fr/2015STRAD019/document.

Abstract:
Technological limitations faced by semiconductor manufacturers in the early 2000s halted the rapid increase in performance of sequential computation units. Nowadays, the trend is to increase the number of processor cores per socket and to progressively use GPU cards for highly parallel computations. The complexity of recent architectures makes it difficult to statically predict the performance of a program. We describe a reliable and accurate method for predicting the execution time of parallel loop nests on GPUs, based on three stages: static code generation, offline profiling, and online prediction. In addition, we present two techniques to fully exploit the computing resources available on a system. The first technique consists in jointly using the CPU and GPU to execute a code. To achieve high performance, it is mandatory to consider load balance, in particular by predicting execution times. The runtime uses the profiling results, and the scheduler computes execution times and adjusts the load distributed to the processors. The second technique puts the CPU and GPU in competition: instances of the considered code are executed simultaneously on CPU and GPU. The winner of the competition notifies the other instance of its completion, causing its termination.
4

Yanggratoke, Rerngvit. "GPU Network Processing." Thesis, KTH, Telekommunikationssystem, TSLab, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-103694.

Abstract:
Networking technology is connecting more and more people around the world and has become an essential part of our daily life. For this connectivity to be seamless, networks need to be fast. Nonetheless, rapid growth in network traffic and the variety of communication protocols overwhelm the Central Processing Units (CPUs) processing packets in the networks. Existing solutions to this problem, such as ASICs, FPGAs, NPUs, and TOEs, are neither cost-effective nor easy to manage, because they require special hardware and custom configurations. This thesis approaches the problem differently, by offloading the network processing to off-the-shelf Graphics Processing Units (GPUs). The thesis's primary goal is to find out how the GPUs should be used for the offloading. The thesis follows the case-study approach, and the selected case studies are layer-2 Bloom filter forwarding and flow lookup in an OpenFlow switch. Implementation alternatives and an evaluation methodology are proposed for both case studies. Then, a prototype implementation for comparing the traditional CPU-only and the GPU-offloading approach is developed and evaluated. The primary findings from this work are criteria for network processing functions suitable for GPU offloading and the tradeoffs involved. The criteria are: no inter-packet dependency, similar processing flows for all packets, and within-packet parallel processing opportunity. This offloading trades higher latency and memory consumption for higher throughput.
5

Spampinato, Daniele. "Modeling Communication on Multi-GPU Systems." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9068.

Abstract:

Coupling commodity CPUs with modern GPUs yields heterogeneous systems that are cheap and deliver high performance with impressive FLOPS counts. The recent evolution of GPGPU models and technologies makes these systems even more appealing as compute devices for a range of HPC applications, including image processing, seismic processing and other physical modeling, as well as linear programming applications. In fact, graphics vendors such as NVIDIA and AMD are now targeting HPC with some of their products. Due to the power and frequency walls, the trend is now to use multiple GPUs in a given system, much like you will find multiple cores in CPU-based systems. However, deepening the resource hierarchy widens the spectrum of factors that may impact the performance of the system. The lack of good models for GPU-based, heterogeneous systems also makes it harder to understand which factors impact performance the most. The goal of this thesis is to analyze such factors by investigating and benchmarking NVIDIA's multi-GPU solution, the NVIDIA Tesla S1070 Computing System. This system combines four T10 GPUs, making up to 4 TFLOPS of computational power available. Based on a comparative study of fundamental parallel computing models and on the specific heterogeneous features exposed by the system, we define a test space for performance analysis. As a case study, we develop a red-black SOR PDE solver for Laplace equations with Dirichlet boundaries, well known for requiring constant communication to exchange neighboring data. To aid both design and analysis, we propose a model for multi-GPU systems targeting communication between the several GPUs. The main variables exposed by the benchmark application are: domain size and shape, kind of data partitioning, number of GPUs, width of the borders to exchange, kernels to use, and kind of synchronization between the GPU contexts.
Among other results, the framework is able to point out the most critical bounds of the S1070 system when dealing with applications like the one in our case study. We show that the multi-GPU system greatly benefits from using all four of its GPUs on very large data volumes. Our results show the four GPUs running almost four times faster than a single GPU, and twice as fast as two. Our analysis outcomes also allow us to refine our static communication model, enriching it with regression-based predictions.

7

Lulec, Andac. "Solution Of Sparse Systems On Gpu Architecture." Master's thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12613355/index.pdf.

Full text
Abstract:
The solution of linear systems of equations is one of the core components of Finite Element Analysis (FEA) software. Since a large number of arithmetic operations is required to solve the system obtained by FEA, the solution of linear equations has a very significant influence on the performance of the software. In recent years, the increasing demand for performance in the game industry drove significant improvements in the performance of Graphics Processing Units (GPUs). With their massive floating-point capability, they became attractive sources of performance for general-purpose programmers. For this reason, GPUs were chosen as the target hardware for developing an efficient parallel direct solver for the linear equations obtained from FEA.
APA, Harvard, Vancouver, ISO, and other styles
8

Dastgeer, Usman. "Skeleton Programming for Heterogeneous GPU-based Systems." Licentiate thesis, Linköpings universitet, PELAB - Laboratoriet för programmeringsomgivningar, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-70234.

Full text
Abstract:
In this thesis, we address issues associated with programming modern heterogeneous systems, focusing on a particular kind of heterogeneous system that includes multicore CPUs and one or more GPUs, called GPU-based systems. We consider the skeleton programming approach to achieve high-level abstraction for efficient and portable programming of these GPU-based systems, and present our work on the SkePU library, a skeleton library for these systems. We extend the existing SkePU library with a two-dimensional (2D) data type and skeleton operations and implement several new applications using the new skeletons. Furthermore, we consider the algorithmic choice present in SkePU and implement support to specify and automatically optimize the algorithmic choice for a skeleton call on a given platform. To show how to achieve performance, we provide a case study on an optimized GPU-based skeleton implementation for 2D stencil computations and introduce two metrics to maximize resource utilization on a GPU. By devising a mechanism to automatically calculate these two metrics, performance can be retained while porting an application from one GPU architecture to another. Another contribution of this thesis is the implementation of runtime support for the SkePU skeleton library, achieved with the help of the StarPU runtime system. This implementation provides dynamic scheduling and load balancing for SkePU skeleton programs. Furthermore, a capability for hybrid execution, by parallel execution on all available CPUs and GPUs in a system even for a single skeleton invocation, is developed. SkePU initially supported only data-parallel skeletons. The first task-parallel skeleton (farm) in SkePU is implemented with support for performance-aware scheduling and hierarchical parallel execution by enabling all data-parallel skeletons to be usable as tasks inside the farm construct.
Experimental evaluations are carried out and presented for algorithmic selection, performance portability, dynamic scheduling and hybrid execution aspects of our work.
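The core idea of skeleton programming described above can be illustrated with a toy "map" skeleton. This is a hypothetical Python sketch, not SkePU's API (SkePU itself is a C++ template library): the user supplies only the per-element function, while the skeleton owns the execution strategy, so the same call site could be dispatched to a sequential, multicore, or GPU backend without source changes.

```python
from multiprocessing.dummy import Pool  # thread pool stands in for a GPU backend

class MapSkeleton:
    """Toy data-parallel 'map' skeleton: the backend choice is hidden
    behind the call, which is the essence of the skeleton approach."""

    def __init__(self, func, backend="sequential"):
        self.func = func
        self.backend = backend

    def __call__(self, data):
        if self.backend == "parallel":
            with Pool() as pool:  # stand-in for an accelerated backend
                return pool.map(self.func, data)
        return [self.func(x) for x in data]  # sequential fallback

# The user code stays identical regardless of backend:
square = MapSkeleton(lambda x: x * x, backend="parallel")
```

The algorithmic-choice support described in the abstract corresponds to the skeleton selecting the `backend` automatically, per call and per platform, instead of the user fixing it.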
APA, Harvard, Vancouver, ISO, and other styles
9

Lee, Kenneth Sydney. "Characterization and Exploitation of GPU Memory Systems." Thesis, Virginia Tech, 2012. http://hdl.handle.net/10919/34215.

Full text
Abstract:
Graphics Processing Units (GPUs) are workhorses of modern computing due to their ability to achieve massive speedups on parallel applications. The massive number of threads that can run concurrently on these systems allows applications with data-parallel computations to achieve better performance than on traditional CPU systems. However, the GPU is not ideal for all types of computation; the massively parallel SIMT architecture can still be constraining in terms of achievable performance, and GPU-based systems typically reach only 40%-60% of their peak. One of the major problems affecting this efficiency is the GPU memory system, which is tailored to the needs of graphics workloads instead of general-purpose computation. This thesis demonstrates the importance of memory optimizations for GPU systems. In particular, this work addresses the problems of data transfer and global atomic memory contention. Using the novel AMD Fusion architecture, we gain overall performance improvements over discrete GPU systems for data-intensive applications. Fused-architecture systems offer an interesting trade-off, increasing data transfer rates at the cost of some raw computational power. We characterize the performance of the different memory paths made possible by the shared memory space present on the fused architecture. In addition, we provide a theoretical model that can correctly predict the comparative performance of memory-movement techniques for a given data-intensive application and system. In terms of global atomic memory contention, we show improvements in scalability and performance for global synchronization primitives by avoiding contentious global atomic memory accesses. In general, this work shows the importance of understanding the memory system of the GPU architecture to achieve better application performance.
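The transfer-versus-compute trade-off this abstract describes can be captured by a first-order cost model. The sketch below is illustrative only; the bandwidth and throughput numbers are assumptions for the sake of the example, not measurements from the thesis.

```python
def total_time(bytes_moved, flops, bandwidth_gbs, gflops):
    """First-order cost model: transfer time plus compute time, in seconds."""
    transfer = bytes_moved / (bandwidth_gbs * 1e9)
    compute = flops / (gflops * 1e9)
    return transfer + compute

def better_device(bytes_moved, flops):
    """Compare a discrete GPU (fast compute, PCIe-limited transfers)
    against a fused CPU+GPU part (slower compute, shared memory).
    All four parameters below are illustrative assumptions."""
    discrete = total_time(bytes_moved, flops, bandwidth_gbs=8.0, gflops=1000.0)
    fused = total_time(bytes_moved, flops, bandwidth_gbs=25.0, gflops=400.0)
    return "fused" if fused < discrete else "discrete"
```

Under this model, data-intensive workloads (many bytes moved per flop) favor the fused architecture, while compute-intensive workloads favor the discrete GPU, which matches the qualitative conclusion of the thesis.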
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
10

Rustico, Eugenio. "Fluid Dynamics Simulations on Multi-GPU Systems." Doctoral thesis, Università di Catania, 2012. http://hdl.handle.net/10761/1030.

Full text
Abstract:
This thesis describes the original design, implementation, and testing of multi-GPU versions of two fluid flow simulation models, focusing on the cellular automaton MAGFLOW lava-flow simulator and the GPU-SPH model for the Navier-Stokes equations. In both cases, a spatial subdivision of the domain is performed, with a minimal overlap to ensure the correct evaluation of the bordering elements (cells in MAGFLOW, particles in GPUSPH). The latencies introduced by the continuous transfer of the overlapping borders are completely hidden through the use of asynchronous transfers performed concurrently with computations. Different load-balancing techniques are used (a priori for MAGFLOW, a posteriori for GPUSPH) and compared. The obtained speedup is linear in the number of devices used and closely follows the ideal speedup. The performance results are formally analyzed and discussed. While the results are close to the ideal achievable speedups, some future improvements are hypothesized and open problems are mentioned.
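The spatial subdivision with overlapping borders described here is a standard halo (ghost-cell) decomposition. A minimal pure-Python sketch for a 1D domain follows; the actual implementations are multi-dimensional CUDA code that additionally overlaps the halo exchanges with computation.

```python
def split_with_halo(domain, n_devices, halo=1):
    """Split a 1D sequence into n_devices slabs, each padded with up to
    `halo` cells from its neighbours, so border elements can be evaluated
    locally on each device. Assumes len(domain) divides evenly."""
    size = len(domain) // n_devices
    slabs = []
    for d in range(n_devices):
        lo = max(0, d * size - halo)                 # left halo (if any)
        hi = min(len(domain), (d + 1) * size + halo)  # right halo (if any)
        slabs.append(domain[lo:hi])
    return slabs
```

After every simulation step, only the halo cells need to be re-exchanged between neighbouring devices; hiding that traffic behind asynchronous transfers is what lets the thesis approach the ideal linear speedup.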
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "GPU Systems"

1

GPU computing gems. Boston, MA: Morgan Kaufmann, 2011.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

GPU Pro 3: Advanced rendering techniques. Boca Raton, FL: A K Peters/CRC Press, 2012.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Heatly, Ralph. GIS-GPS sources. Cleveland, Ohio: Advanstar Marketing Services, 1995.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Heatly, Ralph O. GIS-GPS sources. Cleveland, OH: Advanstar Communications, Marketing Services, 1995.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Geoff, Blewitt, ed. Intelligent positioning: GIS-GPS unification. England: John Wiley, 2006.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

GPS/GNSS antennas. Boston: Artech House, 2013.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Debian GNU/Linux bible. Foster City, CA: IDG Books Worldwide, 2001.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Williams, Kevin W. GPS user-interface design problems. Washington, D.C: Office of Aviation Medicine, 1999.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Williams, Kevin W. GPS user-interface design problems. Washington, D.C: Office of Aviation Medicine, 1999.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Gabaglio, Vincent. GPS/INS integration for pedestrian navigation. Zürich: Schweizerische Geodätische Kommission, 2003.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "GPU Systems"

1

Lombardi, Luca, and Piercarlo Dondi. "GPU." In Encyclopedia of Systems Biology, 844. New York, NY: Springer New York, 2013. http://dx.doi.org/10.1007/978-1-4419-9863-7_1308.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Vázquez, Fransisco, José Antonio Martínez, and Ester M. Garzón. "GPU Computing." In Encyclopedia of Systems Biology, 845–49. New York, NY: Springer New York, 2013. http://dx.doi.org/10.1007/978-1-4419-9863-7_998.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Nam, Byeong-Gyu, and Hoi-Jun Yoo. "Embedded GPU Design." In Embedded Systems, 85–106. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2012. http://dx.doi.org/10.1002/9781118468654.ch3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Osama, Muhammad, Anton Wijs, and Armin Biere. "SAT Solving with GPU Accelerated Inprocessing." In Tools and Algorithms for the Construction and Analysis of Systems, 133–51. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-72016-2_8.

Full text
Abstract:
Since 2013, the leading SAT solvers in the SAT competition all use inprocessing, which, unlike preprocessing, interleaves search with simplifications. However, applying inprocessing frequently can still be a bottleneck, i.e., for hard or large formulas. In this work, we introduce the first attempt to parallelize inprocessing on GPU architectures. As memory is a scarce resource on GPUs, we present new space-efficient data structures and devise a data-parallel garbage collector. It runs in parallel on the GPU to reduce memory consumption and improve memory access locality. Our new parallel variable elimination algorithm is twice as fast as previous work. In experiments, our new solver ParaFROST solves many benchmarks faster on the GPU than its sequential counterparts.
APA, Harvard, Vancouver, ISO, and other styles
5

Shi, Lin, Hao Chen, and Ting Li. "Hybrid CPU/GPU Checkpoint for GPU-Based Heterogeneous Systems." In Communications in Computer and Information Science, 470–81. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014. http://dx.doi.org/10.1007/978-3-642-53962-6_42.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Wijs, Anton, and Muhammad Osama. "A GPU Tree Database for Many-Core Explicit State Space Exploration." In Tools and Algorithms for the Construction and Analysis of Systems, 684–703. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-30823-9_35.

Full text
Abstract:
Various techniques have been proposed to accelerate explicit-state model checking with GPUs, but none address the compact storage of states, or if they do, it is at the cost of losing completeness of the checking procedure. We investigate how to implement a tree database to store states as binary trees in GPU memory. We present fine-grained parallel algorithms to find and store trees, experiment with a number of GPU-specific configurations, and propose a novel hashing technique, called Cleary-Cuckoo hashing, which enables the use of Cleary compression on GPUs. We are the first to assess the effectiveness of using a tree database, and Cleary compression, on GPUs. Experiments show processing speeds of up to 131 million states per second.
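For readers unfamiliar with the hashing scheme the paper builds on, here is a sketch of plain cuckoo hashing, not the Cleary-Cuckoo variant the paper introduces: two hash functions and two tables, where an insert evicts whatever occupies its slot and re-inserts the evicted key in the other table. The hash functions below are arbitrary toy choices.

```python
def cuckoo_insert(tables, key, max_kicks=32):
    """Insert an integer key into a pair of cuckoo hash tables
    (lists pre-filled with None). Returns False if a cycle is
    suspected, which in a real implementation triggers a rehash."""
    h = [lambda k: k % len(tables[0]),          # toy hash for table 0
         lambda k: (k // 7) % len(tables[1])]   # toy hash for table 1
    t = 0
    for _ in range(max_kicks):
        slot = h[t](key)
        if tables[t][slot] is None:
            tables[t][slot] = key
            return True
        tables[t][slot], key = key, tables[t][slot]  # evict the occupant
        t = 1 - t                                    # retry it in the other table
    return False  # too many kicks: likely a cycle
```

The appeal for GPUs is that a lookup probes at most one slot per table, a fixed, branch-light number of memory accesses, which is why cuckoo-style schemes are a common starting point for GPU hash tables.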
APA, Harvard, Vancouver, ISO, and other styles
7

Yang, Ying, Yu Gu, Chuanwen Li, Changyi Wan, and Ge Yu. "GPU-Accelerated Dynamic Graph Coloring." In Database Systems for Advanced Applications, 296–99. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-18590-9_32.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Guri, Mordechai. "GPU-FAN: Leaking Sensitive Data from Air-Gapped Machines via Covert Noise from GPU Fans." In Secure IT Systems, 194–211. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-22295-5_11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Andrzejewski, Witold, and Pawel Boinski. "GPU-Accelerated Collocation Pattern Discovery." In Advances in Databases and Information Systems, 302–15. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-40683-6_23.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Alba, Enrique, and Pablo Vidal. "Systolic Optimization on GPU Platforms." In Computer Aided Systems Theory – EUROCAST 2011, 375–83. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-27549-4_48.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "GPU Systems"

1

Arafa, Yehia, Abdel-Hameed A. Badawy, Gopinath Chennupati, Nandakishore Santhi, and Stephan Eidenbenz. "PPT-GPU." In MEMSYS '18: The International Symposium on Memory Systems. New York, NY, USA: ACM, 2018. http://dx.doi.org/10.1145/3240302.3270315.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Feng, Wu-chun, and Shucai Xiao. "To GPU synchronize or not GPU synchronize?" In 2010 IEEE International Symposium on Circuits and Systems. ISCAS 2010. IEEE, 2010. http://dx.doi.org/10.1109/iscas.2010.5537722.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Pandey, Shweta, Aditya K. Kamath, and Arkaprava Basu. "GPM: leveraging persistent memory from a GPU." In ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. New York, NY, USA: ACM, 2022. http://dx.doi.org/10.1145/3503222.3507758.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Alglave, Jade, Mark Batty, Alastair F. Donaldson, Ganesh Gopalakrishnan, Jeroen Ketema, Daniel Poetzl, Tyler Sorensen, and John Wickerson. "GPU Concurrency." In ASPLOS '15: Architectural Support for Programming Languages and Operating Systems. New York, NY, USA: ACM, 2015. http://dx.doi.org/10.1145/2694344.2694391.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Laflin, Jeremy J., Kurt S. Anderson, and Michael Hans. "Investigation of GPU Use in Conjunction With DCA-Based Articulated Multibody Systems Simulation." In ASME 2015 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2015. http://dx.doi.org/10.1115/detc2015-47207.

Full text
Abstract:
Since computational performance is critically important for simulations to be an effective tool to study and design dynamic systems, the computing performance gains offered by Graphics Processing Units (GPUs) cannot be ignored. Since the GPU is designed to execute a very large number of simultaneous tasks (nominally Single Instruction, Multiple Data (SIMD)), recursive algorithms in general, such as the DCA, are not well suited to execution on GPU-type architectures, because each level of recursion depends on the previous level. However, there are ways the GPU can be leveraged to increase computational performance when using the DCA to form and solve the equations of motion for articulated multibody systems with a very large number of degrees of freedom. Computational performance of dynamic simulations is highly dependent on the nature of the underlying formulation and the number of generalized coordinates used to characterize the system. Therefore, algorithms that scale in a more desirable (lower-order) fashion with the number of degrees of freedom are generally preferred when dealing with large (N > 10) systems. However, the utility of using simulations as a scientific tool is directly related to actual compute time. The DCA, and other top-performing methods, have demonstrated the desirable property of compute time scaling linearly (O(n)) with the number of degrees of freedom (n) and sublinearly (O(log n)) when implemented in parallel. For the DCA, however, total compute time could be further reduced by exploiting the large number of independent operations involved in the first few levels of recursion. A simple chain-type pendulum example is used to explore the feasibility of using the GPU to execute the assembly and disassembly operations for the levels of recursion that contain enough bodies for this process to be computationally advantageous.
A multi-core CPU is used to perform the operations in parallel using OpenMP for the remaining levels. The number of levels of recursion that utilize the GPU is varied from zero to all levels. The data corresponding to zero utilization of the GPU provide the reference compute time, in which the assembly and disassembly operations necessary at each level are performed in parallel using OpenMP. The computational time required to simulate the system for one time step when the GPU is utilized for various levels of recursion is compared to the reference compute time, also varying the number of bodies in the system. A decrease in compute time when using the GPU is demonstrated relative to the reference, even for systems of moderate size (n < 1000). This is a lower number of bodies than was expected for this test case and confirms that the GPU can bring significant increases in computational efficiency for large systems, while preserving the attractive sublinear scalability (w.r.t. compute time) of the DCA.
APA, Harvard, Vancouver, ISO, and other styles
6

Carrigan, Travis J., Jacob Watt, and Brian H. Dennis. "Using GPU-Based Computing to Solve Large Sparse Systems of Linear Equations." In ASME 2011 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. ASMEDC, 2011. http://dx.doi.org/10.1115/detc2011-48452.

Full text
Abstract:
Often thought of as tools for image rendering or data visualization, graphics processing units (GPUs) are becoming increasingly popular in scientific computing due to their low-cost, massively parallel architecture. With the introduction of CUDA C by NVIDIA and CUDA-enabled GPUs, general-purpose computations can now be performed without the need for shading languages. One application that benefits from the capabilities provided by NVIDIA hardware is computational continuum mechanics (CCM). The need to solve sparse linear systems of equations is common in CCM when partial differential equations are discretized. Often these systems are solved iteratively using domain decomposition among distributed processors working in parallel. In this paper we explore the benefits of using GPUs to improve the performance of sparse matrix operations, more specifically, sparse matrix-vector multiplication. Our approach does not require domain decomposition, so it is simpler than corresponding implementations for distributed-memory parallel computers. We demonstrate that for matrices produced from finite element discretizations on unstructured meshes, the matrix-vector multiplication operation is just under 13 times faster than when run serially on an Intel i5 system. Furthermore, we show that when used in conjunction with the biconjugate gradient stabilized method (BiCGSTAB), a gradient-based iterative linear solver, the method is over 13 times faster than the serially executed C equivalent. Lastly, we apply the method to solving Poisson's equation using the Galerkin finite element method, and demonstrate over 10.5 times higher performance on the GPU compared with the Intel i5 system.
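The kernel this paper accelerates, sparse matrix-vector multiplication over the compressed sparse row (CSR) format, is simple enough to sketch. The following is a pure-Python illustration of the operation, not the paper's GPU code; on a GPU, each row's dot product is typically assigned to a thread or a warp, since rows are independent.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for a CSR matrix: the nonzeros of row i are
    values[row_ptr[i]:row_ptr[i+1]], with column indices in col_idx."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```

Iterative solvers such as BiCGSTAB spend most of their time in exactly this operation (plus dot products and vector updates), which is why accelerating SpMV alone yields the solver-level speedups the paper reports.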
APA, Harvard, Vancouver, ISO, and other styles
7

Saiz, Victor Bautista, and Fernan Gallego. "GPU: Application for CCTV systems." In 2014 International Carnahan Conference on Security Technology (ICCST). IEEE, 2014. http://dx.doi.org/10.1109/ccst.2014.6987028.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Green, Simon G. "GPU-accelerated iterated function systems." In ACM SIGGRAPH 2005 Sketches. New York, New York, USA: ACM Press, 2005. http://dx.doi.org/10.1145/1187112.1187128.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Martinez-del-Amor, Miguel Angel, D. Orellana-Martin, A. Riscos-Nunez, and M. J. Perez-Jimenez. "On GPU-Oriented P Systems." In 2018 International Conference on High Performance Computing & Simulation (HPCS). IEEE, 2018. http://dx.doi.org/10.1109/hpcs.2018.00125.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Bollweg, Dennis, Luis Altenkort, David Anthony Clarke, Olaf Kaczmarek, Lukas Mazur, Christian Schmidt, Philipp Scior, and Hai-Tao Shu. "HotQCD on multi-GPU Systems." In The 38th International Symposium on Lattice Field Theory. Trieste, Italy: Sissa Medialab, 2022. http://dx.doi.org/10.22323/1.396.0196.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "GPU Systems"

1

Owens, John. A Programming Framework for Scientific Applications on CPU-GPU Systems. Office of Scientific and Technical Information (OSTI), March 2013. http://dx.doi.org/10.2172/1069280.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Dongarra, Jack J., and Stanimire Tomov. Matrix Algebra for GPU and Multicore Architectures (MAGMA) for Large Petascale Systems. Office of Scientific and Technical Information (OSTI), March 2014. http://dx.doi.org/10.2172/1126489.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Cook, Samantha, Marissa Torres, Nathan Lamie, Lee Perren, Scott Slone, and Bonnie Jones. Automated ground-penetrating-radar post-processing software in R programming. Engineer Research and Development Center (U.S.), September 2022. http://dx.doi.org/10.21079/11681/45621.

Full text
Abstract:
Ground-penetrating radar (GPR) is a nondestructive geophysical technique used to create images of the subsurface. A major limitation of GPR is that a subject matter expert (SME) needs to post-process and interpret the data, limiting the technique’s use. Post-processing is time-intensive and, for detailed processing, requires proprietary software. The goal of this study is to develop automated GPR post-processing software, compatible with Geophysical Survey Systems, Inc. (GSSI) data, in open-source R programming. This would eliminate the need for an SME to process GPR data, remove proprietary software dependencies, and render GPR more accessible. This study collected GPR profiles by using a GSSI SIR4000 control unit, a 100 MHz antenna, and a Trimble GPS. A standardized method for post-processing data was then established, which includes static data removal, time-zero correction, distance normalization, data filtering, and stacking. These steps were scripted and automated in R programming, excluding data filtering, which was used from an existing package, RGPR. The study compared profiles processed using GSSI software to profiles processed using the R script developed here to ensure comparable functionality and output. While an SME is currently still necessary for interpretations, this script eliminates the need for one to post-process GSSI GPR data.
APA, Harvard, Vancouver, ISO, and other styles
4

Robert, J., and Michael Forte. Field evaluation of GNSS/GPS based RTK, RTN, and RTX correction systems. Engineer Research and Development Center (U.S.), September 2021. http://dx.doi.org/10.21079/11681/41864.

Full text
Abstract:
This Coastal and Hydraulic Engineering Technical Note (CHETN) details an evaluation of three Global Navigation Satellite System (GNSS)/Global Positioning System (GPS) real-time correction methods capable of providing centimeter-level positioning. Internet and satellite-delivered correction systems, Real Time Network (RTN) and Real Time eXtended (RTX), respectively, are compared to a traditional ground-based two-way radio transmission correction system, generally referred to as Local RTK, or simply RTK. Results from this study will provide prospective users background information on each of these positioning systems and comparisons of their respective accuracies during in field operations.
APA, Harvard, Vancouver, ISO, and other styles
5

Bergen, Benjamin K. OpenCL: Free Your GPU... and the rest of your system too! Office of Scientific and Technical Information (OSTI), May 2013. http://dx.doi.org/10.2172/1078372.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Hoey, David, and Paul Benshoof. Civil GPS Systems and Potential Vulnerabilities. Fort Belvoir, VA: Defense Technical Information Center, October 2005. http://dx.doi.org/10.21236/ada440372.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Hoey, David, and Paul Benshoof. Civil GPS Systems and Potential Vulnerabilities. Fort Belvoir, VA: Defense Technical Information Center, October 2005. http://dx.doi.org/10.21236/ada440379.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Dickson, Dick. Standard Report Format for Global Positioning System (GPS) Receivers and Systems Accuracy Tests and Evaluations. Fort Belvoir, VA: Defense Technical Information Center, February 2000. http://dx.doi.org/10.21236/ada375388.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Lombardi, Michael A. An Evaluation of Dependencies of Critical Infrastructure Timing Systems on the Global Positioning System (GPS). National Institute of Standards and Technology, November 2021. http://dx.doi.org/10.6028/nist.tn.2189.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Bragge, Peter, Veronica Delafosse, Ngo Cong-Lem, Diki Tsering, and Breanna Wright. General practitioners raising and discussing sensitive health issues with patients. The Sax Institute, June 2023. http://dx.doi.org/10.57022/rseh3974.

Full text
Abstract:
This Evidence Check was commissioned by the NSW Ministry of Health as part of a project to improve how preventive, sensitive health issues are raised in general practice. The review looked at what is known about discussing sensitive preventive health issues from both patients' and GPs' perspectives, and at approaches and factors that have been shown to be effective. The identified evidence was generally of moderate to high methodological quality. General behaviour-change approaches that are applicable to this challenge include creating non-judgemental environments that normalise sensitive health issues; simulation training; and public campaigns that reduce stigma and challenge unhelpful cultural norms. Lack of time in consultations was identified as a challenging issue. Significant system-level change would be required to extend standard consultation times; focusing on optimising workflows may therefore be more feasible. Addressing GP patient–gender mismatch through diverse GP representation may also be feasible in larger practices. The key theme identified was the use of prompting, screening or other structured tools by GPs. Collectively, these approaches have two main features. First, they are a way of approaching sensitive health conversations less directly, for example by focusing on underlying risk factors for sensitive health conditions such as obesity and mental illness rather than addressing the issues directly. Second, through either risk-factor or more general question prompts, these approaches take the onus away from GPs and patients to come up with a way of asking the question in their own words.
APA, Harvard, Vancouver, ISO, and other styles