Journal articles on the topic 'GPU Systems'


Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'GPU Systems.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Jararweh, Yaser, Moath Jarrah, and Abdelkader Bousselham. "GPU Scaling." International Journal of Information Technology and Web Engineering 9, no. 4 (October 2014): 13–23. http://dx.doi.org/10.4018/ijitwe.2014100102.

Abstract:
Current state-of-the-art GPU-based systems offer unprecedented performance advantages by accelerating the most compute-intensive portions of applications by an order of magnitude. GPU computing presents a viable solution to the ever-increasing complexity of applications and the growing demand for immense computational resources. In this paper the authors investigate different platforms of GPU-based systems, from personal supercomputing (PSC) to cloud-based GPU systems. They explore and evaluate these GPU-based platforms and compare them against conventional high-performance cluster-based computing systems. Their evaluation shows potential advantages of using GPU-based systems for high-performance computing applications while meeting different scaling granularities.
2

Dematte, L., and D. Prandi. "GPU computing for systems biology." Briefings in Bioinformatics 11, no. 3 (March 7, 2010): 323–33. http://dx.doi.org/10.1093/bib/bbq006.

3

Ban, Zhihua, Jianguo Liu, and Jeremy Fouriaux. "GMMSP on GPU." Journal of Real-Time Image Processing 17, no. 2 (March 17, 2018): 245–57. http://dx.doi.org/10.1007/s11554-018-0762-3.

4

Georgii, Joachim, and Rüdiger Westermann. "Mass-spring systems on the GPU." Simulation Modelling Practice and Theory 13, no. 8 (November 2005): 693–702. http://dx.doi.org/10.1016/j.simpat.2005.08.004.

5

Huynh, Huynh Phung, Andrei Hagiescu, Ong Zhong Liang, Weng-Fai Wong, and Rick Siow Mong Goh. "Mapping Streaming Applications onto GPU Systems." IEEE Transactions on Parallel and Distributed Systems 25, no. 9 (September 2014): 2374–85. http://dx.doi.org/10.1109/tpds.2013.195.

6

Deniz, Etem, and Alper Sen. "MINIME-GPU." ACM Transactions on Architecture and Code Optimization 12, no. 4 (January 7, 2016): 1–25. http://dx.doi.org/10.1145/2818693.

7

Braak, Gert-Jan Van Den, and Henk Corporaal. "R-GPU." ACM Transactions on Architecture and Code Optimization 13, no. 1 (April 5, 2016): 1–24. http://dx.doi.org/10.1145/2890506.

8

Ino, Fumihiko, Shinta Nakagawa, and Kenichi Hagihara. "GPU-Chariot: A Programming Framework for Stream Applications Running on Multi-GPU Systems." IEICE Transactions on Information and Systems E96.D, no. 12 (2013): 2604–16. http://dx.doi.org/10.1587/transinf.e96.d.2604.

9

Rosenfeld, Viktor, Sebastian Breß, and Volker Markl. "Query Processing on Heterogeneous CPU/GPU Systems." ACM Computing Surveys 55, no. 1 (January 31, 2023): 1–38. http://dx.doi.org/10.1145/3485126.

Abstract:
Due to their high computational power and internal memory bandwidth, graphics processing units (GPUs) have been extensively studied by the database systems research community. A heterogeneous query processing system that employs CPUs and GPUs at the same time has to solve many challenges, including how to distribute the workload on processors with different capabilities; how to overcome the data transfer bottleneck; and how to support implementations for multiple processors efficiently. In this survey we devise a classification scheme to categorize techniques developed to address these challenges. Based on this scheme, we categorize query processing systems on heterogeneous CPU/GPU systems and identify open research problems.
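The survey itself is technique-agnostic, but the flavor of a GPU-side query operator can be sketched in a few lines of CUDA. The kernel name (filter_gt), the single-column layout, and the threshold predicate below are illustrative assumptions, not anything taken from the paper; the host-to-device copy is precisely the transfer bottleneck the authors highlight.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical GPU selection operator: mark rows whose column value exceeds a threshold.
__global__ void filter_gt(const float* col, int n, float threshold, unsigned char* match) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) match[i] = (col[i] > threshold) ? 1 : 0;
}

int main() {
    const int n = 1 << 20;
    const float threshold = 0.5f;

    float* h_col = new float[n];
    unsigned char* h_match = new unsigned char[n];
    for (int i = 0; i < n; ++i) h_col[i] = (float)i / n;

    float* d_col; unsigned char* d_match;
    cudaMalloc(&d_col, n * sizeof(float));
    cudaMalloc(&d_match, n * sizeof(unsigned char));

    // The host-to-device copy below is the data transfer bottleneck discussed in the survey.
    cudaMemcpy(d_col, h_col, n * sizeof(float), cudaMemcpyHostToDevice);

    filter_gt<<<(n + 255) / 256, 256>>>(d_col, n, threshold, d_match);
    cudaMemcpy(h_match, d_match, n * sizeof(unsigned char), cudaMemcpyDeviceToHost);

    long hits = 0;
    for (int i = 0; i < n; ++i) hits += h_match[i];
    printf("rows matching predicate: %ld of %d\n", hits, n);

    cudaFree(d_col); cudaFree(d_match);
    delete[] h_col; delete[] h_match;
    return 0;
}
```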
10

Besozzi, Daniela, Giulio Caravagna, Paolo Cazzaniga, Marco Nobile, Dario Pescini, and Alessandro Re. "GPU-powered Simulation Methodologies for Biological Systems." Electronic Proceedings in Theoretical Computer Science 130 (September 30, 2013): 87–91. http://dx.doi.org/10.4204/eptcs.130.14.

11

Odaka, Fumihiro, and Kenkichi Sato. "S2030201 GPU Computing Systems: History and Application." Proceedings of Mechanical Engineering Congress, Japan 2014 (2014): S2030201. http://dx.doi.org/10.1299/jsmemecj.2014._s2030201-.

12

Maza, Marc Moreno, and Wei Pan. "Solving Bivariate Polynomial Systems on a GPU." Journal of Physics: Conference Series 341 (February 9, 2012): 012022. http://dx.doi.org/10.1088/1742-6596/341/1/012022.

13

Odagawa, Masato, Yuriko Takeshima, Issei Fujishiro, Gota Kikugawa, and Taku Ohara. "GPU-Based Adaptive Visualization for Particle Systems." Transactions of the Japan Society of Mechanical Engineers, Series B 77, no. 781 (2011): 1767–78. http://dx.doi.org/10.1299/kikaib.77.1767.

14

Maza, Marc Moreno, and Wei Pan. "Solving bivariate polynomial systems on a GPU." ACM Communications in Computer Algebra 45, no. 1/2 (July 25, 2011): 127–28. http://dx.doi.org/10.1145/2016567.2016589.

15

Jiang, Hai, Yi Chen, Zhi Qiao, Kuan-Ching Li, WonWoo Ro, and Jean-Luc Gaudiot. "Accelerating MapReduce framework on multi-GPU systems." Cluster Computing 17, no. 2 (May 30, 2013): 293–301. http://dx.doi.org/10.1007/s10586-013-0276-5.

16

Bernaschi, M., M. Fatica, G. Parisi, and L. Parisi. "Multi-GPU codes for spin systems simulations." Computer Physics Communications 183, no. 7 (July 2012): 1416–21. http://dx.doi.org/10.1016/j.cpc.2012.02.015.

17

Ino, Fumihiko, Akihiro Ogita, Kentaro Oita, and Kenichi Hagihara. "Cooperative multitasking for GPU-accelerated grid systems." Concurrency and Computation: Practice and Experience 24, no. 1 (March 22, 2011): 96–107. http://dx.doi.org/10.1002/cpe.1722.

18

Lamas-Rodríguez, Julián, Dora B. Heras, Francisco Argüello, Dagmar Kainmueller, Stefan Zachow, and Montserrat Bóo. "GPU-accelerated level-set segmentation." Journal of Real-Time Image Processing 12, no. 1 (November 26, 2013): 15–29. http://dx.doi.org/10.1007/s11554-013-0378-6.

19

Meng, Wanwan, Yongguang Cheng, Jiayang Wu, Zhiyan Yang, Yunxian Zhu, and Shuai Shang. "GPU Acceleration of Hydraulic Transient Simulations of Large-Scale Water Supply Systems." Applied Sciences 9, no. 1 (December 27, 2018): 91. http://dx.doi.org/10.3390/app9010091.

Abstract:
Simulating hydraulic transients in ultra-long water (oil, gas) transmission or large-scale distribution systems is time-consuming, and exploring ways to improve simulation efficiency is an essential research direction. The parallel implementation of the method of characteristics (MOC) on graphics processing unit (GPU) chips is a promising approach for accelerating the simulations, because the GPU has great parallelization ability for massive but simple computations, and the explicit and local features of MOC suit the GPU quite well. In this paper, we propose and verify a GPU implementation of MOC on a single chip for more efficient simulations of hydraulic transients. Details of the GPU-MOC parallel strategies are introduced, and the accuracy and efficiency of the proposed method are verified by simulating the benchmark single-pipe water hammer problem. The transient processes of a large-scale water distribution system and a long-distance water transmission system are simulated to investigate the computing capability of the proposed method. The results show that the GPU-MOC method can achieve significant performance gains, with speedup ratios of up to several hundred compared to the traditional method. This preliminary work demonstrates that GPU-MOC parallel computing has great prospects in practical applications with large computing loads.
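The paper's solver is not reproduced here; the following is only a minimal CUDA sketch of how one interior-node MOC update (the standard C+ and C- characteristic equations for a single pipe) maps to one GPU thread. The constants B and R, the node count, and the kernel name are illustrative assumptions, and boundary conditions are left to the host.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// One MOC time step for the interior nodes of a single pipe: each thread updates one node
// from the C+ and C- characteristics of its neighbours. B (= a/(g*A)) and R (friction
// coefficient) are treated as constants; boundary nodes 0 and n-1 are handled elsewhere.
__global__ void moc_step(const double* H, const double* Q, double* Hn, double* Qn,
                         int n, double B, double R) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1) {
        double cp = H[i - 1] + B * Q[i - 1] - R * Q[i - 1] * fabs(Q[i - 1]);
        double cm = H[i + 1] - B * Q[i + 1] + R * Q[i + 1] * fabs(Q[i + 1]);
        Hn[i] = 0.5 * (cp + cm);
        Qn[i] = (cp - cm) / (2.0 * B);
    }
}

int main() {
    const int n = 1024;
    const double B = 100.0, R = 0.02;
    double *H, *Q, *Hn, *Qn;
    cudaMallocManaged(&H, n * sizeof(double));
    cudaMallocManaged(&Q, n * sizeof(double));
    cudaMallocManaged(&Hn, n * sizeof(double));
    cudaMallocManaged(&Qn, n * sizeof(double));
    for (int i = 0; i < n; ++i) { H[i] = 50.0; Q[i] = 0.1; Hn[i] = H[i]; Qn[i] = Q[i]; }

    moc_step<<<(n + 255) / 256, 256>>>(H, Q, Hn, Qn, n, B, R);
    cudaDeviceSynchronize();
    printf("H[512] after one step: %f\n", Hn[512]);

    cudaFree(H); cudaFree(Q); cudaFree(Hn); cudaFree(Qn);
    return 0;
}
```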
20

Zhou, Zhe, Wenrui Diao, Xiangyu Liu, Zhou Li, Kehuan Zhang, and Rui Liu. "Vulnerable GPU Memory Management: Towards Recovering Raw Data from GPU." Proceedings on Privacy Enhancing Technologies 2017, no. 2 (April 1, 2017): 57–73. http://dx.doi.org/10.1515/popets-2017-0016.

Abstract:
According to previous reports, information can be leaked from GPU memory; however, the security implications of such a threat were mostly overlooked, because only limited information could be indirectly extracted through side-channel attacks. In this paper, we propose a novel algorithm for recovering raw data directly from the GPU memory residues of many popular applications such as Google Chrome and Adobe PDF reader. Our algorithm enables harvesting highly sensitive information, including credit card numbers and email contents, from GPU memory residues. Evaluation results also indicate that nearly all GPU-accelerated applications are vulnerable to such attacks, and adversaries can launch attacks without requiring any special privileges, both on traditional multi-user operating systems and in emerging cloud computing scenarios.
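As a hedged illustration of the underlying observation only (not the paper's recovery algorithm): device memory returned by cudaMalloc is not guaranteed to be zeroed, so copying an uninitialized allocation back to the host lets one inspect whatever residue the platform happens to expose. Whether anything meaningful appears depends on the driver, platform, and prior workloads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustration only: allocate GPU memory and dump it WITHOUT initializing it first.
// The paper's actual recovery pipeline is far more involved and application-specific.
int main() {
    const size_t bytes = 64 * 1024 * 1024;
    unsigned char* d_buf;
    if (cudaMalloc(&d_buf, bytes) != cudaSuccess) return 1;

    unsigned char* h_buf = new unsigned char[bytes];
    cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost);  // read residues, no writes first

    size_t nonzero = 0;
    for (size_t i = 0; i < bytes; ++i) nonzero += (h_buf[i] != 0);
    printf("non-zero residue bytes: %zu of %zu\n", nonzero, bytes);

    cudaFree(d_buf);
    delete[] h_buf;
    return 0;
}
```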
21

Campeanu, Gabriel, and Mehrdad Saadatmand. "A Two-Layer Component-Based Allocation for Embedded Systems with GPUs." Designs 3, no. 1 (January 19, 2019): 6. http://dx.doi.org/10.3390/designs3010006.

Abstract:
Component-based development is a software engineering paradigm that can facilitate the construction of embedded systems and tackle their complexities. Modern embedded systems have increasingly demanding requirements. One way to cope with such a versatile and growing set of requirements is to employ heterogeneous processing power, i.e., CPU–GPU architectures. The new CPU–GPU embedded boards deliver increased performance but also introduce additional complexity and challenges. In this work, we address the component-to-hardware allocation for CPU–GPU embedded systems. The allocation for such systems is much more complex due to the increased amount of GPU-related information. For example, while in traditional embedded systems the allocation mechanism may consider only the CPU memory usage of components to find an appropriate allocation scheme, in heterogeneous systems the GPU memory usage also needs to be taken into account in the allocation process. This paper aims at decreasing the component-to-hardware allocation complexity by introducing a two-layer component-based architecture for heterogeneous embedded systems. The detailed CPU–GPU information of the system is abstracted at a high layer by compacting connected components into single units that behave as regular components. The allocator, based on the compacted information received from the high-level layer, computes feasible allocation schemes with decreased complexity. In the last part of the paper, the two-layer allocation method is evaluated using an existing embedded system demonstrator, namely an underwater robot.
22

Chen, Yong, Hai Jin, Han Jiang, Dechao Xu, Ran Zheng, and Haocheng Liu. "Implementation and Optimization of GPU-Based Static State Security Analysis in Power Systems." Mobile Information Systems 2017 (2017): 1–10. http://dx.doi.org/10.1155/2017/1897476.

Abstract:
Static state security analysis (SSSA) is one of the most important computations to check whether a power system is in normal and secure operating state. It is a challenge to satisfy real-time requirements with CPU-based concurrent methods due to the intensive computations. A sensitivity analysis-based method with Graphics processing unit (GPU) is proposed for power systems, which can reduce calculation time by 40% compared to the execution on a 4-core CPU. The proposed method involves load flow analysis and sensitivity analysis. In load flow analysis, a multifrontal method for sparse LU factorization is explored on GPU through dynamic frontal task scheduling between CPU and GPU. The varying matrix operations during sensitivity analysis on GPU are highly optimized in this study. The results of performance evaluations show that the proposed GPU-based SSSA with optimized matrix operations can achieve a significant reduction in computation time.
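The paper performs full AC load flow and sensitivity analysis with a multifrontal sparse LU on the GPU; as a much simpler stand-in, the sketch below shows why contingency screening parallelizes so naturally, using precomputed DC line outage distribution factors (LODFs) with one thread per (monitored line, outage) pair. All names, sizes, and constants are illustrative assumptions, not the authors' method.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// DC-approximation contingency screening: for the outage of line k, the post-contingency
// flow on line l is F[l] + LODF[l*K + k] * F[k]. One thread checks one (l, k) pair.
__global__ void screen(const float* F, const float* lodf, const float* limit,
                       int L, int K, unsigned char* overload) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < L * K) {
        int l = idx / K, k = idx % K;
        float post = F[l] + lodf[l * K + k] * F[k];
        overload[idx] = (fabsf(post) > limit[l]) ? 1 : 0;
    }
}

int main() {
    const int L = 512, K = 512;            // monitored lines x simulated outages
    float *F, *lodf, *limit; unsigned char *overload;
    cudaMallocManaged(&F, L * sizeof(float));
    cudaMallocManaged(&lodf, L * K * sizeof(float));
    cudaMallocManaged(&limit, L * sizeof(float));
    cudaMallocManaged(&overload, L * K);
    for (int l = 0; l < L; ++l) { F[l] = 80.0f; limit[l] = 100.0f; }
    for (int i = 0; i < L * K; ++i) lodf[i] = 0.3f;

    int n = L * K;
    screen<<<(n + 255) / 256, 256>>>(F, lodf, limit, L, K, overload);
    cudaDeviceSynchronize();

    int violations = 0;
    for (int i = 0; i < n; ++i) violations += overload[i];
    printf("overload flags: %d of %d checks\n", violations, n);

    cudaFree(F); cudaFree(lodf); cudaFree(limit); cudaFree(overload);
    return 0;
}
```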
23

Tran, Giang Son, Thi Phuong Nghiem, and Jean-Christophe Burie. "Fast parallel blur detection on GPU." Journal of Real-Time Image Processing 17, no. 4 (November 12, 2018): 903–13. http://dx.doi.org/10.1007/s11554-018-0837-1.

24

Abell, Stephen, Nhan Do, and John Jaehwan Lee. "GPU-OSDDA: a bit-vector GPU-based deadlock detection algorithm for single-unit resource systems." International Journal of Parallel, Emergent and Distributed Systems 31, no. 5 (October 24, 2015): 450–68. http://dx.doi.org/10.1080/17445760.2015.1100301.

25

Abell, Stephen, Nhan Do, and John Jaehwan Lee. "GPU-LMDDA: a bit-vector GPU-based deadlock detection algorithm for multi-unit resource systems." International Journal of Parallel, Emergent and Distributed Systems 31, no. 6 (February 19, 2016): 562–90. http://dx.doi.org/10.1080/17445760.2016.1140761.

26

Wang, Long, Masaki Iwasawa, Keigo Nitadori, and Junichiro Makino. "petar: a high-performance N-body code for modelling massive collisional stellar systems." Monthly Notices of the Royal Astronomical Society 497, no. 1 (July 24, 2020): 536–55. http://dx.doi.org/10.1093/mnras/staa1915.

Abstract:
Numerical simulations of massive collisional stellar systems, such as globular clusters (GCs), are very time consuming. Until now, only a few realistic million-body simulations of GCs with a small fraction of binaries (5 per cent) have been performed, using the nbody6++gpu code. Such models took half a year of computational time on a Graphics Processing Unit (GPU)-based supercomputer. In this work, we develop a new N-body code, petar, by combining the methods of Barnes–Hut tree, Hermite integrator and slow-down algorithmic regularization. The code can accurately handle an arbitrary fraction of multiple systems (e.g. binaries and triples) while keeping high performance by using hybrid parallelization with MPI, OpenMP, SIMD instructions and the GPU. A few benchmarks indicate that petar and nbody6++gpu agree very well on the long-term evolution of the global structure, binary orbits and escapers. On a highly configured GPU desktop computer, a million-body simulation with all stars in binaries using petar is 11 times faster than with nbody6++gpu. Moreover, petar scales well on the Cray XC50 supercomputer as the number of cores increases. The 10-million-body problem, which covers the regime of ultracompact dwarfs and nuclear star clusters, thus becomes tractable.
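petar's tree, Hermite, and regularization machinery is far beyond a snippet, but the GPU-friendly core of any such code is the pairwise force loop. The direct-summation kernel below (softening eps2, G = 1 units, illustrative names and particle layout) only sketches that part; it is not the petar implementation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Minimal direct-summation gravity kernel (O(N^2)): one thread accumulates the acceleration
// on one particle. Real codes use tree methods, Hermite integration and regularization; this
// only illustrates why the pairwise force loop maps well to the GPU.
__global__ void accel(const float4* pos_mass, float3* acc, int n, float eps2) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float4 pi = pos_mass[i];
    float3 a = make_float3(0.f, 0.f, 0.f);
    for (int j = 0; j < n; ++j) {
        float4 pj = pos_mass[j];
        float dx = pj.x - pi.x, dy = pj.y - pi.y, dz = pj.z - pi.z;
        float r2 = dx * dx + dy * dy + dz * dz + eps2;
        float inv_r3 = rsqrtf(r2) / r2;         // 1 / r^3 in G = 1 units
        a.x += pj.w * dx * inv_r3;
        a.y += pj.w * dy * inv_r3;
        a.z += pj.w * dz * inv_r3;
    }
    acc[i] = a;
}

int main() {
    const int n = 4096;
    float4* p; float3* a;
    cudaMallocManaged(&p, n * sizeof(float4));
    cudaMallocManaged(&a, n * sizeof(float3));
    for (int i = 0; i < n; ++i) p[i] = make_float4((float)(i % 64), (float)(i / 64), 0.f, 1.f / n);

    accel<<<(n + 127) / 128, 128>>>(p, a, n, 1e-4f);
    cudaDeviceSynchronize();
    printf("a[0] = (%g, %g, %g)\n", a[0].x, a[0].y, a[0].z);

    cudaFree(p); cudaFree(a);
    return 0;
}
```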
27

Kopysov, S. P., A. K. Novikov, and Yu A. Sagdeeva. "Solving of discontinuous Galerkin method systems on GPU." Vestnik Udmurtskogo Universiteta. Matematika. Mekhanika. Komp'yuternye Nauki, no. 4 (December 2011): 121–31. http://dx.doi.org/10.20537/vm110411.

28

Martínez-del-Amor, Miguel A., Manuel García-Quismondo, Luis F. Macías-Ramos, Luis Valencia-Cabrera, Agustin Riscos-Núñez, and Mario J. Pérez-Jiménez. "Simulating P Systems on GPU Devices: A Survey." Fundamenta Informaticae 136, no. 3 (2015): 269–84. http://dx.doi.org/10.3233/fi-2015-1157.

29

van Pelt, Roy, Anna Vilanova, and Huub van de Wetering. "Illustrative Volume Visualization Using GPU-Based Particle Systems." IEEE Transactions on Visualization and Computer Graphics 16, no. 4 (July 2010): 571–82. http://dx.doi.org/10.1109/tvcg.2010.32.

30

Anzt, Hartwig, Stanimire Tomov, Mark Gates, Jack Dongarra, and Vincent Heuveline. "Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems." Procedia Computer Science 9 (2012): 7–16. http://dx.doi.org/10.1016/j.procs.2012.04.002.

31

Galiano, V., H. Migallón, V. Migallón, and J. Penadés. "GPU-based parallel algorithms for sparse nonlinear systems." Journal of Parallel and Distributed Computing 72, no. 9 (September 2012): 1098–105. http://dx.doi.org/10.1016/j.jpdc.2011.10.016.

32

Nere, Andrew, Sean Franey, Atif Hashmi, and Mikko Lipasti. "Simulating cortical networks on heterogeneous multi-GPU systems." Journal of Parallel and Distributed Computing 73, no. 7 (July 2013): 953–71. http://dx.doi.org/10.1016/j.jpdc.2012.02.006.

33

Mastrostefano, Enrico, and Massimo Bernaschi. "Efficient breadth first search on multi-GPU systems." Journal of Parallel and Distributed Computing 73, no. 9 (September 2013): 1292–305. http://dx.doi.org/10.1016/j.jpdc.2013.05.007.

34

Acosta, Alejandro, Vicente Blanco, and Francisco Almeida. "Dynamic load balancing on heterogeneous multi-GPU systems." Computers & Electrical Engineering 39, no. 8 (November 2013): 2591–602. http://dx.doi.org/10.1016/j.compeleceng.2013.08.004.

35

Dastgeer, Usman, and Christoph Kessler. "Performance-aware composition framework for GPU-based systems." Journal of Supercomputing 71, no. 12 (January 30, 2014): 4646–62. http://dx.doi.org/10.1007/s11227-014-1105-1.

36

Jo, Heeseung, Seung-Tae Hong, Jae-Woo Chang, and Dong Hoon Choi. "Offloading data encryption to GPU in database systems." Journal of Supercomputing 69, no. 1 (March 21, 2014): 375–94. http://dx.doi.org/10.1007/s11227-014-1159-0.

37

Vuduc, Richard, and Kent Czechowski. "What GPU Computing Means for High-End Systems." IEEE Micro 31, no. 4 (July 2011): 74–78. http://dx.doi.org/10.1109/mm.2011.78.

38

da Silva Junior, Jose Ricardo, Esteban Clua, and Leonardo Murta. "Efficient image-aware version control systems using GPU." Software: Practice and Experience 46, no. 8 (June 24, 2015): 1011–33. http://dx.doi.org/10.1002/spe.2340.

39

Gembris, Daniel, Markus Neeb, Markus Gipp, Andreas Kugel, and Reinhard Männer. "Correlation analysis on GPU systems using NVIDIA’s CUDA." Journal of Real-Time Image Processing 6, no. 4 (June 17, 2010): 275–80. http://dx.doi.org/10.1007/s11554-010-0162-9.

40

Yoo, Seung-Hun, and Chang-Sung Jeong. "Image Registration and Fusion System Based on GPU." Journal of Circuits, Systems and Computers 19, no. 01 (February 2010): 173–89. http://dx.doi.org/10.1142/s0218126610006049.

Abstract:
The graphics processing unit (GPU) has surfaced as a high-quality platform for computer vision-related systems. In this paper, we propose a straightforward system consisting of a registration and a fusion method on the GPU, which generates good results at high speed compared to non-GPU-based systems. Our GPU-accelerated system utilizes existing methods by porting them to the GPU-based platform. The registration method uses point correspondences to find a registering transformation estimated with incremental parameters in a coarse-to-fine way, while the fusion algorithm uses multi-scale methods to fuse the results from the registration stage. We evaluate performance with the same methods executed in both CPU-only and GPU-equipped environments. The experimental results present convincing evidence of the efficiency of our system, which is tested on a few pairs of aerial images taken by electro-optical and infrared sensors to provide visual information of a scene for environmental observatories.
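As an illustrative sketch only (not the authors' pipeline): the GPU-heavy step in such a system is resampling an image under the estimated transformation, for example an affine warp with bilinear interpolation, one thread per output pixel. The transform values, image size, and function name below are placeholders.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Apply a 2x3 affine transform to an image with bilinear sampling: one thread per output pixel.
// Parameter estimation and the multi-scale fusion stage of the paper are not shown.
__global__ void warp_affine(const float* src, float* dst, int w, int h, const float* M) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    float sx = M[0] * x + M[1] * y + M[2];
    float sy = M[3] * x + M[4] * y + M[5];
    int x0 = (int)floorf(sx), y0 = (int)floorf(sy);
    float fx = sx - x0, fy = sy - y0;
    float v = 0.f;
    if (x0 >= 0 && y0 >= 0 && x0 + 1 < w && y0 + 1 < h) {
        v = (1 - fx) * (1 - fy) * src[y0 * w + x0]
          + fx * (1 - fy) * src[y0 * w + x0 + 1]
          + (1 - fx) * fy * src[(y0 + 1) * w + x0]
          + fx * fy * src[(y0 + 1) * w + x0 + 1];
    }
    dst[y * w + x] = v;
}

int main() {
    const int w = 256, h = 256;
    float *src, *dst, *M;
    cudaMallocManaged(&src, w * h * sizeof(float));
    cudaMallocManaged(&dst, w * h * sizeof(float));
    cudaMallocManaged(&M, 6 * sizeof(float));
    for (int i = 0; i < w * h; ++i) src[i] = (float)(i % w);
    float shift[6] = {1.f, 0.f, 10.5f, 0.f, 1.f, -3.25f};   // small translation
    for (int i = 0; i < 6; ++i) M[i] = shift[i];

    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    warp_affine<<<grid, block>>>(src, dst, w, h, M);
    cudaDeviceSynchronize();
    printf("dst(128,128) = %f\n", dst[128 * w + 128]);

    cudaFree(src); cudaFree(dst); cudaFree(M);
    return 0;
}
```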
41

Kumar, Anshuman, Pablo R. Arantes, Aakash Saha, Giulia Palermo, and Bryan M. Wong. "GPU-Enhanced DFTB Metadynamics for Efficiently Predicting Free Energies of Biochemical Systems." Molecules 28, no. 3 (January 28, 2023): 1277. http://dx.doi.org/10.3390/molecules28031277.

Abstract:
Metadynamics calculations of large chemical systems with ab initio methods are computationally prohibitive due to the extensive sampling required to simulate the large degrees of freedom in these systems. To address this computational bottleneck, we utilized a GPU-enhanced density functional tight binding (DFTB) approach on a massively parallelized cloud computing platform to efficiently calculate the thermodynamics and metadynamics of biochemical systems. To first validate our approach, we calculated the free-energy surfaces of alanine dipeptide and showed that our GPU-enhanced DFTB calculations qualitatively agree with computationally-intensive hybrid DFT benchmarks, whereas classical force fields give significant errors. Most importantly, we show that our GPU-accelerated DFTB calculations are significantly faster than previous approaches by up to two orders of magnitude. To further extend our GPU-enhanced DFTB approach, we also carried out a 10 ns metadynamics simulation of remdesivir, which is prohibitively out of reach for routine DFT-based metadynamics calculations. We find that the free-energy surfaces of remdesivir obtained from DFTB and classical force fields differ significantly, where the latter overestimates the internal energy contribution of high free-energy states. Taken together, our benchmark tests, analyses, and extensions to large biochemical systems highlight the use of GPU-enhanced DFTB simulations for efficiently predicting the free-energy surfaces/thermodynamics of large biochemical systems.
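The DFTB engine itself is out of scope for a snippet, but the metadynamics bookkeeping, summing the deposited Gaussian hills to evaluate the bias potential along a collective variable, is easy to sketch in CUDA. The hill width, weight, 1-D grid, and kernel name below are illustrative assumptions, not the paper's implementation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Metadynamics bias on a 1-D collective-variable grid:
// V(s) = sum_k w * exp(-(s - s_k)^2 / (2 * sigma^2)).
// One thread evaluates the bias at one grid point by summing all deposited hills.
__global__ void bias_on_grid(const float* hill_centers, int n_hills, float w, float sigma,
                             const float* grid, float* bias, int n_grid) {
    int g = blockIdx.x * blockDim.x + threadIdx.x;
    if (g >= n_grid) return;
    float s = grid[g], v = 0.f;
    float inv2s2 = 1.f / (2.f * sigma * sigma);
    for (int k = 0; k < n_hills; ++k) {
        float d = s - hill_centers[k];
        v += w * expf(-d * d * inv2s2);
    }
    bias[g] = v;
}

int main() {
    const int n_hills = 10000, n_grid = 512;
    float *hills, *grid, *bias;
    cudaMallocManaged(&hills, n_hills * sizeof(float));
    cudaMallocManaged(&grid, n_grid * sizeof(float));
    cudaMallocManaged(&bias, n_grid * sizeof(float));
    for (int k = 0; k < n_hills; ++k) hills[k] = -3.14f + 6.28f * k / n_hills;   // hill centres along the CV
    for (int g = 0; g < n_grid; ++g) grid[g] = -3.14f + 6.28f * g / n_grid;

    bias_on_grid<<<(n_grid + 255) / 256, 256>>>(hills, n_hills, 0.1f, 0.35f, grid, bias, n_grid);
    cudaDeviceSynchronize();
    printf("bias at grid point 256: %f\n", bias[256]);

    cudaFree(hills); cudaFree(grid); cudaFree(bias);
    return 0;
}
```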
42

Ngo, Long Thanh, Dzung Dinh Nguyen, Long The Pham, and Cuong Manh Luong. "Speedup of Interval Type 2 Fuzzy Logic Systems Based on GPU for Robot Navigation." Advances in Fuzzy Systems 2012 (2012): 1–11. http://dx.doi.org/10.1155/2012/698062.

Abstract:
As the number of rules and the sample rate of type-2 fuzzy logic systems (T2FLSs) increase, the speed of calculation becomes a problem. The T2FLS has a large amount of inherent algorithmic parallelism that modern CPU architectures do not exploit. In the T2FLS, many rules and algorithms can be sped up on a graphics processing unit (GPU) as long as the majority of computations at the various stages and components are not dependent on each other. This paper demonstrates how to implement interval type-2 fuzzy logic systems (IT2-FLSs) on the GPU and presents experiments on obstacle-avoidance behavior for robot navigation. GPU-based calculation is a high-performance solution and frees up the CPU. The experimental results show that the GPU is many times faster than the CPU.
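A minimal sketch of the per-rule parallelism such systems exploit, assuming Gaussian antecedents with an uncertain mean [m1, m2] and one thread per rule; type reduction and defuzzification (e.g. Karnik-Mendel) are deliberately omitted, and all names and parameters are illustrative rather than taken from the paper.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Interval type-2 Gaussian membership with uncertain mean [m1, m2] and fixed sigma.
// One thread evaluates the upper/lower membership of one rule's antecedent for input x.
__global__ void it2_memberships(const float* m1, const float* m2, const float* sigma,
                                float x, float* mu_upper, float* mu_lower, int n_rules) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= n_rules) return;
    float s2 = 2.f * sigma[r] * sigma[r];
    float gl = expf(-(x - m1[r]) * (x - m1[r]) / s2);   // Gaussian centred at m1
    float gr = expf(-(x - m2[r]) * (x - m2[r]) / s2);   // Gaussian centred at m2

    // Upper MF: 1 inside [m1, m2], nearest Gaussian outside.
    if (x < m1[r])      mu_upper[r] = gl;
    else if (x > m2[r]) mu_upper[r] = gr;
    else                mu_upper[r] = 1.f;

    // Lower MF: the smaller of the two Gaussians.
    mu_lower[r] = fminf(gl, gr);
}

int main() {
    const int n_rules = 1024;
    float *m1, *m2, *sigma, *up, *lo;
    cudaMallocManaged(&m1, n_rules * sizeof(float));
    cudaMallocManaged(&m2, n_rules * sizeof(float));
    cudaMallocManaged(&sigma, n_rules * sizeof(float));
    cudaMallocManaged(&up, n_rules * sizeof(float));
    cudaMallocManaged(&lo, n_rules * sizeof(float));
    for (int r = 0; r < n_rules; ++r) { m1[r] = r * 0.01f; m2[r] = m1[r] + 0.5f; sigma[r] = 1.0f; }

    it2_memberships<<<(n_rules + 255) / 256, 256>>>(m1, m2, sigma, 2.0f, up, lo, n_rules);
    cudaDeviceSynchronize();
    printf("rule 100: upper=%f lower=%f\n", up[100], lo[100]);

    cudaFree(m1); cudaFree(m2); cudaFree(sigma); cudaFree(up); cudaFree(lo);
    return 0;
}
```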
43

Ding, Yifan, Nicholas Botzer, and Tim Weninger. "HetSeq: Distributed GPU Training on Heterogeneous Infrastructure." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 17 (May 18, 2021): 15432–38. http://dx.doi.org/10.1609/aaai.v35i17.17813.

Abstract:
Modern deep learning systems like PyTorch and Tensorflow are able to train enormous models with billions (or trillions) of parameters on a distributed infrastructure. These systems require that the internal nodes have the same memory capacity and compute performance. Unfortunately, most organizations, especially universities, have a piecemeal approach to purchasing computer systems, resulting in a heterogeneous infrastructure, which cannot be used to compute large models. The present work describes HetSeq, a software package adapted from the popular PyTorch package that provides the capability to train large neural network models on heterogeneous infrastructure. Experiments with language translation and text and image classification show that HetSeq scales over heterogeneous systems. Additional information, support documents, and source code are publicly available at https://github.com/yifding/hetseq.
44

Fu, Yaosheng, Evgeny Bolotin, Niladrish Chatterjee, David Nellans, and Stephen W. Keckler. "GPU Domain Specialization via Composable On-Package Architecture." ACM Transactions on Architecture and Code Optimization 19, no. 1 (March 31, 2022): 1–23. http://dx.doi.org/10.1145/3484505.

Abstract:
As GPUs scale their low-precision matrix math throughput to boost deep learning (DL) performance, they upset the balance between math throughput and memory system capabilities. We demonstrate that a converged GPU design trying to address diverging architectural requirements between FP32 (or larger)-based HPC and FP16 (or smaller)-based DL workloads results in sub-optimal configurations for either of the application domains. We argue that a Composable On-PAckage GPU (COPA-GPU) architecture to provide domain-specialized GPU products is the most practical solution to these diverging requirements. A COPA-GPU leverages multi-chip-module disaggregation to support maximal design reuse, along with memory system specialization per application domain. We show how a COPA-GPU enables DL-specialized products by modular augmentation of the baseline GPU architecture with up to 4× higher off-die bandwidth, 32× larger on-package cache, and 2.3× higher DRAM bandwidth and capacity, while conveniently supporting scaled-down HPC-oriented designs. This work explores the microarchitectural design necessary to enable composable GPUs and evaluates the benefits composability can provide to HPC, DL training, and DL inference. We show that when compared to a converged GPU design, a DL-optimized COPA-GPU featuring a combination of 16× larger cache capacity and 1.6× higher DRAM bandwidth scales per-GPU training and inference performance by 31% and 35%, respectively, and reduces the number of GPU instances by 50% in scale-out training scenarios.
45

Rapaport, D. C. "GPU molecular dynamics: Algorithms and performance." Journal of Physics: Conference Series 2241, no. 1 (March 1, 2022): 012007. http://dx.doi.org/10.1088/1742-6596/2241/1/012007.

Abstract:
A previous study of MD algorithms designed for GPU use is extended to cover more recent developments in GPU architecture. Algorithm modifications are described, together with extensions to more complex systems. New measurements include the effects of increased parallelism on GPU performance, as well as comparisons with multiple-core CPUs using multitasking based on CPU threads and message passing. The results show that the GPU retains a significant performance advantage.
46

Zhu, Rui, Chang Nian Chen, and Lei Hua Qin. "An Transfer Latency Optimized Solution in GPU-Accelerated De-Duplication." Applied Mechanics and Materials 336-338 (July 2013): 2059–62. http://dx.doi.org/10.4028/www.scientific.net/amm.336-338.2059.

Abstract:
Recently, the GPU has been introduced as an important tool in general-purpose programming due to its powerful computing capacity. In data de-duplication systems, the GPU has been used to accelerate the chunking and hashing algorithms. However, the data transfer latency between CPU and GPU memory is one of the main challenges in GPU-accelerated de-duplication. To alleviate this challenge, our solution strives to reduce the data transfer time between host and GPU memory for the parallelized content-defined chunking and hashing algorithms. In our experiments, it shows 15%–20% performance improvements over an already accelerated baseline GPU implementation of data de-duplication.
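The paper's exact pipeline is not reproduced here; the sketch below only shows standard ingredients such an optimization typically relies on, namely pinned host buffers plus cudaMemcpyAsync on two streams so one chunk's transfer overlaps with processing of the previous chunk. The XOR "fingerprint" kernel is a toy stand-in for real content-defined chunking and hashing, and all sizes and names are illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in "fingerprint" kernel: XOR-reduce a chunk into one 32-bit word with atomics.
// Real deduplication uses content-defined chunking plus cryptographic hashes; this kernel
// only gives the streams something to overlap with.
__global__ void xor_fingerprint(const unsigned int* chunk, int words, unsigned int* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < words) atomicXor(out, chunk[i]);
}

int main() {
    const int n_chunks = 8;
    const int words = 1 << 20;                       // 4 MiB chunks
    const size_t bytes = words * sizeof(unsigned int);

    // Pinned host buffers make cudaMemcpyAsync truly asynchronous.
    unsigned int* h_chunks[n_chunks];
    for (int c = 0; c < n_chunks; ++c) {
        cudaMallocHost((void**)&h_chunks[c], bytes);
        for (int i = 0; i < words; ++i) h_chunks[c][i] = i * (c + 1);
    }

    unsigned int *d_buf[2], *d_fp;
    cudaMalloc(&d_buf[0], bytes);
    cudaMalloc(&d_buf[1], bytes);
    cudaMalloc(&d_fp, n_chunks * sizeof(unsigned int));
    cudaMemset(d_fp, 0, n_chunks * sizeof(unsigned int));

    cudaStream_t stream[2];
    cudaStreamCreate(&stream[0]);
    cudaStreamCreate(&stream[1]);

    // Double-buffer: while one stream hashes chunk c, the other stream copies chunk c+1.
    for (int c = 0; c < n_chunks; ++c) {
        int s = c % 2;
        cudaMemcpyAsync(d_buf[s], h_chunks[c], bytes, cudaMemcpyHostToDevice, stream[s]);
        xor_fingerprint<<<(words + 255) / 256, 256, 0, stream[s]>>>(d_buf[s], words, d_fp + c);
    }
    cudaDeviceSynchronize();

    unsigned int h_fp[n_chunks];
    cudaMemcpy(h_fp, d_fp, sizeof(h_fp), cudaMemcpyDeviceToHost);
    for (int c = 0; c < n_chunks; ++c) printf("chunk %d fingerprint: %08x\n", c, h_fp[c]);

    for (int c = 0; c < n_chunks; ++c) cudaFreeHost(h_chunks[c]);
    cudaFree(d_buf[0]); cudaFree(d_buf[1]); cudaFree(d_fp);
    cudaStreamDestroy(stream[0]); cudaStreamDestroy(stream[1]);
    return 0;
}
```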
47

DeFrancisco, Richard, Shenghsun Cho, Michael Ferdman, and Scott A. Smolka. "Swarm model checking on the GPU." International Journal on Software Tools for Technology Transfer 22, no. 5 (June 16, 2020): 583–99. http://dx.doi.org/10.1007/s10009-020-00576-x.

48

Wang, Qihan, Zhen Peng, Bin Ren, Jie Chen, and Robert G. Edwards. "MemHC: An Optimized GPU Memory Management Framework for Accelerating Many-body Correlation." ACM Transactions on Architecture and Code Optimization 19, no. 2 (June 30, 2022): 1–26. http://dx.doi.org/10.1145/3506705.

Abstract:
The many-body correlation function is a fundamental computation kernel in modern physics computing applications, e.g., hadron contractions in lattice quantum chromodynamics (QCD). This kernel is both computation and memory intensive, involving a series of tensor contractions, and thus usually runs on accelerators like GPUs. Existing optimizations of many-body correlation mainly focus on individual tensor contractions (e.g., cuBLAS libraries and others). In contrast, this work discovers a new optimization dimension for many-body correlation by exploring the optimization opportunities among tensor contractions. More specifically, it targets general GPU architectures (both NVIDIA and AMD) and optimizes many-body correlation's memory management by exploiting a set of memory allocation and communication redundancy elimination opportunities: first, GPU memory allocation redundancy: the intermediate output frequently occurs as input in the subsequent calculations; second, CPU-GPU communication redundancy: although all tensors are allocated on both CPU and GPU, many of them are used (and reused) on the GPU side only, and thus many CPU/GPU communications (like those in existing Unified Memory designs) are unnecessary; third, GPU oversubscription: limited GPU memory size causes oversubscription issues, and existing memory management usually results in near-reuse data eviction, thus incurring extra CPU/GPU memory communications. Targeting these memory optimization opportunities, this article proposes MemHC, an optimized systematic GPU memory management framework that aims to accelerate the calculation of many-body correlation functions using a series of new memory reduction designs. These designs involve optimizations for GPU memory allocation, CPU/GPU memory movement, and GPU memory oversubscription, respectively. More specifically, first, MemHC employs duplication-aware management and lazy release of GPU memories to corresponding host managing for better data reusability. Second, it implements data reorganization and on-demand synchronization to eliminate redundant (or unnecessary) data transfer. Third, MemHC exploits an optimized Least Recently Used (LRU) eviction policy called Pre-Protected LRU to reduce evictions and leverage memory hits. Additionally, MemHC is portable across various platforms, including NVIDIA and AMD GPUs. The evaluation demonstrates that MemHC outperforms unified memory management by 2.18× to 10.73×. The proposed Pre-Protected LRU policy outperforms the original LRU policy by up to 1.36×.
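MemHC is a full framework; the following is only a tiny host-side sketch of the "lazy release and reuse" idea, a size-keyed pool that hands freed device buffers back out instead of calling cudaMalloc/cudaFree repeatedly. Class and method names are illustrative, and MemHC's duplication tracking, transfer elimination, and Pre-Protected LRU policy are not reproduced.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <map>

// Tiny caching allocator in the spirit of "lazy release": freed device buffers are kept in a
// size-keyed pool and reused, avoiding repeated cudaMalloc/cudaFree round trips.
class DevicePool {
    std::multimap<size_t, void*> free_list_;   // size -> cached device pointer
public:
    void* acquire(size_t bytes) {
        auto it = free_list_.find(bytes);
        if (it != free_list_.end()) {          // reuse a cached buffer of the same size
            void* p = it->second;
            free_list_.erase(it);
            return p;
        }
        void* p = nullptr;
        cudaMalloc(&p, bytes);
        return p;
    }
    void release(size_t bytes, void* p) {      // lazy release: keep the buffer for later reuse
        free_list_.emplace(bytes, p);
    }
    ~DevicePool() {
        for (auto& kv : free_list_) cudaFree(kv.second);
    }
};

int main() {
    DevicePool pool;
    void* a = pool.acquire(1 << 20);
    pool.release(1 << 20, a);
    void* b = pool.acquire(1 << 20);           // comes back from the pool, no new cudaMalloc
    printf("reused buffer: %s\n", (a == b) ? "yes" : "no");
    pool.release(1 << 20, b);
    return 0;
}
```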
49

Zhang, Yu, Da Peng, Xiaofei Liao, Hai Jin, Haikun Liu, Lin Gu, and Bingsheng He. "LargeGraph." ACM Transactions on Architecture and Code Optimization 18, no. 4 (December 31, 2021): 1–24. http://dx.doi.org/10.1145/3477603.

Abstract:
Many out-of-GPU-memory systems have recently been designed to support iterative processing of large-scale graphs. However, these systems still suffer from long convergence times because of inefficient propagation of active vertices' new states along graph paths. To efficiently support out-of-GPU-memory graph processing, this work designs a system, LargeGraph. Different from existing out-of-GPU-memory systems, LargeGraph proposes a dependency-aware data-driven execution approach, which can significantly accelerate active vertices' state propagations along graph paths with low data access cost and high parallelism. Specifically, according to the dependencies between the vertices, it only loads and processes the graph data associated with dependency chains originating from active vertices, for smaller access cost. Because of the power-law property, most active vertices frequently use a small, evolving set of paths to propagate their new states; this small set of paths is dynamically identified, maintained, and efficiently handled on the GPU to accelerate most propagations for faster convergence, whereas the remaining graph data are handled on the CPU. For out-of-GPU-memory graph processing, LargeGraph outperforms four cutting-edge systems: Totem (5.19–11.62×), Graphie (3.02–9.41×), Garaph (2.75–8.36×), and Subway (2.45–4.15×).
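LargeGraph's dependency-aware, out-of-GPU-memory scheduling cannot be captured in a snippet; the minimal level-synchronous BFS kernel below (CSR layout, one thread per vertex, illustrative names and toy graph) only shows what propagating active vertices' states on the GPU looks like in its simplest form.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Level-synchronous BFS on a CSR graph: vertices discovered in the previous level relax
// their outgoing edges; concurrent writes of the same distance value are benign.
__global__ void bfs_level(const int* row_ptr, const int* col_idx, int* dist,
                          int n, int level, int* changed) {
    int u = blockIdx.x * blockDim.x + threadIdx.x;
    if (u >= n || dist[u] != level) return;
    for (int e = row_ptr[u]; e < row_ptr[u + 1]; ++e) {
        int v = col_idx[e];
        if (dist[v] == -1) {
            dist[v] = level + 1;
            *changed = 1;
        }
    }
}

int main() {
    // A small chain graph 0 -> 1 -> 2 -> 3 in CSR form.
    const int n = 4;
    int h_row[] = {0, 1, 2, 3, 3};
    int h_col[] = {1, 2, 3};
    int *row, *col, *dist, *changed;
    cudaMallocManaged(&row, sizeof(h_row));
    cudaMallocManaged(&col, sizeof(h_col));
    cudaMallocManaged(&dist, n * sizeof(int));
    cudaMallocManaged(&changed, sizeof(int));
    memcpy(row, h_row, sizeof(h_row));
    memcpy(col, h_col, sizeof(h_col));
    for (int i = 0; i < n; ++i) dist[i] = -1;
    dist[0] = 0;                                   // source vertex

    for (int level = 0;; ++level) {
        *changed = 0;
        bfs_level<<<(n + 255) / 256, 256>>>(row, col, dist, n, level, changed);
        cudaDeviceSynchronize();
        if (*changed == 0) break;
    }
    for (int i = 0; i < n; ++i) printf("dist[%d] = %d\n", i, dist[i]);

    cudaFree(row); cudaFree(col); cudaFree(dist); cudaFree(changed);
    return 0;
}
```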
50

Wong, Un-Hong, Takayuki Aoki, and Hon-Cheng Wong. "Efficient magnetohydrodynamic simulations on distributed multi-GPU systems using a novel GPU Direct–MPI hybrid approach." Computer Physics Communications 185, no. 7 (July 2014): 1901–13. http://dx.doi.org/10.1016/j.cpc.2014.03.018.
