Journal articles on the topic "GPU Systems"

Follow this link to see other types of publications on the topic: GPU Systems.

Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 journal articles for your research on the topic "GPU Systems".

Next to each source in the list of references you will find an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organize your bibliography correctly.

1

Jararweh, Yaser, Moath Jarrah, and Abdelkader Bousselham. "GPU Scaling". International Journal of Information Technology and Web Engineering 9, no. 4 (October 2014): 13–23. http://dx.doi.org/10.4018/ijitwe.2014100102.

Full text
Abstract
Current state-of-the-art GPU-based systems offer unprecedented performance advantages by accelerating the most compute-intensive portions of applications by an order of magnitude. GPU computing presents a viable solution to the ever-increasing complexity of applications and the growing demand for immense computational resources. In this paper the authors investigate different platforms of GPU-based systems, ranging from personal supercomputing (PSC) to cloud-based GPU systems. They explore and evaluate these platforms and present a comparative discussion against conventional high-performance cluster-based computing systems. Their evaluation shows the potential advantages of using GPU-based systems for high-performance computing applications while meeting different scaling granularities.
2

Dematte, L., and D. Prandi. "GPU computing for systems biology". Briefings in Bioinformatics 11, no. 3 (March 7, 2010): 323–33. http://dx.doi.org/10.1093/bib/bbq006.

Full text
3

Ban, Zhihua, Jianguo Liu, and Jeremy Fouriaux. "GMMSP on GPU". Journal of Real-Time Image Processing 17, no. 2 (March 17, 2018): 245–57. http://dx.doi.org/10.1007/s11554-018-0762-3.

Full text
4

Georgii, Joachim, and Rüdiger Westermann. "Mass-spring systems on the GPU". Simulation Modelling Practice and Theory 13, no. 8 (November 2005): 693–702. http://dx.doi.org/10.1016/j.simpat.2005.08.004.

Full text
5

Huynh, Huynh Phung, Andrei Hagiescu, Ong Zhong Liang, Weng-Fai Wong, and Rick Siow Mong Goh. "Mapping Streaming Applications onto GPU Systems". IEEE Transactions on Parallel and Distributed Systems 25, no. 9 (September 2014): 2374–85. http://dx.doi.org/10.1109/tpds.2013.195.

Full text
6

Deniz, Etem, and Alper Sen. "MINIME-GPU". ACM Transactions on Architecture and Code Optimization 12, no. 4 (January 7, 2016): 1–25. http://dx.doi.org/10.1145/2818693.

Full text
7

Braak, Gert-Jan Van Den, and Henk Corporaal. "R-GPU". ACM Transactions on Architecture and Code Optimization 13, no. 1 (April 5, 2016): 1–24. http://dx.doi.org/10.1145/2890506.

Full text
8

INO, Fumihiko, Shinta NAKAGAWA, and Kenichi HAGIHARA. "GPU-Chariot: A Programming Framework for Stream Applications Running on Multi-GPU Systems". IEICE Transactions on Information and Systems E96.D, no. 12 (2013): 2604–16. http://dx.doi.org/10.1587/transinf.e96.d.2604.

Full text
9

Rosenfeld, Viktor, Sebastian Breß, and Volker Markl. "Query Processing on Heterogeneous CPU/GPU Systems". ACM Computing Surveys 55, no. 1 (January 31, 2023): 1–38. http://dx.doi.org/10.1145/3485126.

Full text
Abstract
Due to their high computational power and internal memory bandwidth, graphics processing units (GPUs) have been extensively studied by the database systems research community. A heterogeneous query processing system that employs CPUs and GPUs at the same time has to solve many challenges, including how to distribute the workload on processors with different capabilities; how to overcome the data transfer bottleneck; and how to support implementations for multiple processors efficiently. In this survey we devise a classification scheme to categorize techniques developed to address these challenges. Based on this scheme, we categorize query processing systems on heterogeneous CPU/GPU systems and identify open research problems.
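For readers unfamiliar with GPU offloading in query engines, the minimal CUDA C++ sketch below (with an assumed column layout, predicate, and kernel name; it is not code from any system covered by the survey) shows the kind of single-operator offload, here a selection filter, around which the workload-distribution and data-transfer questions discussed above arise.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Illustrative selection kernel: mark rows whose value exceeds a threshold.
// Real heterogeneous engines generate far more elaborate operator code and
// must also decide *where* (CPU or GPU) each operator should run.
__global__ void filterGreaterThan(const int* col, int* flags, int n, int threshold) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) flags[i] = (col[i] > threshold) ? 1 : 0;
}

int main() {
    const int n = 1 << 20;
    std::vector<int> h_col(n);
    for (int i = 0; i < n; ++i) h_col[i] = i % 1000;

    int *d_col = nullptr, *d_flags = nullptr;
    cudaMalloc(&d_col, n * sizeof(int));
    cudaMalloc(&d_flags, n * sizeof(int));
    cudaMemcpy(d_col, h_col.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    int threads = 256, blocks = (n + threads - 1) / threads;
    filterGreaterThan<<<blocks, threads>>>(d_col, d_flags, n, 900);

    std::vector<int> h_flags(n);
    cudaMemcpy(h_flags.data(), d_flags, n * sizeof(int), cudaMemcpyDeviceToHost);

    long matches = 0;
    for (int f : h_flags) matches += f;
    std::printf("selected %ld of %d rows\n", matches, n);

    cudaFree(d_col);
    cudaFree(d_flags);
    return 0;
}
```

For such a cheap predicate the host-to-device copy usually dominates the runtime, which is exactly the transfer bottleneck the survey highlights.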
10

Besozzi, Daniela, Giulio Caravagna, Paolo Cazzaniga, Marco Nobile, Dario Pescini, and Alessandro Re. "GPU-powered Simulation Methodologies for Biological Systems". Electronic Proceedings in Theoretical Computer Science 130 (September 30, 2013): 87–91. http://dx.doi.org/10.4204/eptcs.130.14.

Full text
11

ODAKA, Fumihiro, and Kenkichi SATO. "S2030201 GPU Computing Systems: History and Application". Proceedings of Mechanical Engineering Congress, Japan 2014 (2014): _S2030201——_S2030201—. http://dx.doi.org/10.1299/jsmemecj.2014._s2030201-.

Full text
12

Maza, Marc Moreno, and Wei Pan. "Solving Bivariate Polynomial Systems on a GPU". Journal of Physics: Conference Series 341 (February 9, 2012): 012022. http://dx.doi.org/10.1088/1742-6596/341/1/012022.

Full text
13

ODAGAWA, Masato, Yuriko TAKESHIMA, Issei FUJISHIRO, Gota KIKUGAWA, and Taku OHARA. "GPU-Based Adaptive Visualization for Particle Systems". TRANSACTIONS OF THE JAPAN SOCIETY OF MECHANICAL ENGINEERS Series B 77, no. 781 (2011): 1767–78. http://dx.doi.org/10.1299/kikaib.77.1767.

Full text
14

Maza, Marc Moreno, and Wei Pan. "Solving bivariate polynomial systems on a GPU". ACM Communications in Computer Algebra 45, no. 1/2 (July 25, 2011): 127–28. http://dx.doi.org/10.1145/2016567.2016589.

Full text
15

Jiang, Hai, Yi Chen, Zhi Qiao, Kuan-Ching Li, WonWoo Ro, and Jean-Luc Gaudiot. "Accelerating MapReduce framework on multi-GPU systems". Cluster Computing 17, no. 2 (May 30, 2013): 293–301. http://dx.doi.org/10.1007/s10586-013-0276-5.

Full text
16

Bernaschi, M., M. Fatica, G. Parisi, and L. Parisi. "Multi-GPU codes for spin systems simulations". Computer Physics Communications 183, no. 7 (July 2012): 1416–21. http://dx.doi.org/10.1016/j.cpc.2012.02.015.

Full text
17

Ino, Fumihiko, Akihiro Ogita, Kentaro Oita, and Kenichi Hagihara. "Cooperative multitasking for GPU-accelerated grid systems". Concurrency and Computation: Practice and Experience 24, no. 1 (March 22, 2011): 96–107. http://dx.doi.org/10.1002/cpe.1722.

Full text
18

Lamas-Rodríguez, Julián, Dora B. Heras, Francisco Argüello, Dagmar Kainmueller, Stefan Zachow, and Montserrat Bóo. "GPU-accelerated level-set segmentation". Journal of Real-Time Image Processing 12, no. 1 (November 26, 2013): 15–29. http://dx.doi.org/10.1007/s11554-013-0378-6.

Full text
19

Meng, Wanwan, Yongguang Cheng, Jiayang Wu, Zhiyan Yang, Yunxian Zhu, and Shuai Shang. "GPU Acceleration of Hydraulic Transient Simulations of Large-Scale Water Supply Systems". Applied Sciences 9, no. 1 (December 27, 2018): 91. http://dx.doi.org/10.3390/app9010091.

Full text
Abstract
Simulating hydraulic transients in ultra-long water (oil, gas) transmission or large-scale distribution systems is time-consuming, and exploring ways to improve simulation efficiency is an essential research direction. The parallel implementation of the method of characteristics (MOC) on graphics processing unit (GPU) chips is a promising approach for accelerating the simulations, because the GPU has great parallelization ability for massive but simple computations, and the explicit and local features of MOC match the features of the GPU quite well. In this paper, we propose and verify a GPU implementation of MOC on a single chip for more efficient simulations of hydraulic transients. Details of the GPU-MOC parallel strategies are introduced, and the accuracy and efficiency of the proposed method are verified by simulating the benchmark single-pipe water hammer problem. The transient processes of a large-scale water distribution system and a long-distance water transmission system are simulated to investigate the computing capability of the proposed method. The results show that the GPU-MOC method can achieve significant performance gains, with speedup ratios of up to hundreds compared to the traditional method. This preliminary work demonstrates that GPU-MOC parallel computing has great prospects in practical applications with large computing loads.
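For illustration, the sketch below shows how the interior-node update of the classical single-pipe MOC scheme maps onto one GPU thread per node. The coefficient names B and R, the friction form, and the launch values are simplifying assumptions; the paper's GPU-MOC implementation for networks and boundary devices is considerably more involved.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// One MOC time step for the interior nodes of a single pipe.
// H, Q  : head and discharge at the previous time level (size n)
// Hn, Qn: values at the new time level (size n)
// B     : characteristic impedance a / (g*A)
// R     : lumped friction coefficient f*dx / (2*g*D*A*A)
// Boundary nodes (0 and n-1) are assumed to be handled separately on the host.
__global__ void mocInteriorStep(const double* H, const double* Q,
                                double* Hn, double* Qn,
                                int n, double B, double R) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i <= 0 || i >= n - 1) return;  // interior nodes only

    double Qm = Q[i - 1], Qp = Q[i + 1];
    // C+ characteristic from node i-1, C- characteristic from node i+1.
    double Cp = H[i - 1] + B * Qm - R * Qm * fabs(Qm);
    double Cm = H[i + 1] - B * Qp + R * Qp * fabs(Qp);

    Hn[i] = 0.5 * (Cp + Cm);
    Qn[i] = (Cp - Cm) / (2.0 * B);
}

int main() {
    const int n = 1 << 16;
    double *H, *Q, *Hn, *Qn;
    cudaMalloc(&H, n * sizeof(double));  cudaMalloc(&Q, n * sizeof(double));
    cudaMalloc(&Hn, n * sizeof(double)); cudaMalloc(&Qn, n * sizeof(double));
    cudaMemset(H, 0, n * sizeof(double)); cudaMemset(Q, 0, n * sizeof(double));

    int threads = 256, blocks = (n + threads - 1) / threads;
    mocInteriorStep<<<blocks, threads>>>(H, Q, Hn, Qn, n, /*B=*/12.0, /*R=*/0.03);
    cudaDeviceSynchronize();

    cudaFree(H); cudaFree(Q); cudaFree(Hn); cudaFree(Qn);
    return 0;
}
```

Each node depends only on its two neighbours at the previous time level, which is why the MOC update maps so naturally onto one GPU thread per node.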
20

Zhou, Zhe, Wenrui Diao, Xiangyu Liu, Zhou Li, Kehuan Zhang, and Rui Liu. "Vulnerable GPU Memory Management: Towards Recovering Raw Data from GPU". Proceedings on Privacy Enhancing Technologies 2017, no. 2 (April 1, 2017): 57–73. http://dx.doi.org/10.1515/popets-2017-0016.

Full text
Abstract
According to previous reports, information could be leaked from GPU memory; however, the security implications of such a threat were mostly overlooked, because only limited information could be indirectly extracted through side-channel attacks. In this paper, we propose a novel algorithm for recovering raw data directly from the GPU memory residues of many popular applications such as Google Chrome and Adobe PDF reader. Our algorithm enables harvesting highly sensitive information, including credit card numbers and email contents, from GPU memory residues. Evaluation results also indicate that nearly all GPU-accelerated applications are vulnerable to such attacks, and adversaries can launch attacks without requiring any special privileges, both on traditional multi-user operating systems and in emerging cloud computing scenarios.
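The underlying observation, that device memory returned by the allocator is not cleared, can be probed with a few lines of CUDA. Whether the residue actually contains another application's data depends on the driver, operating system, and GPU, so the sketch below is only an illustrative probe and not a reproduction of the paper's recovery algorithm.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Probe freshly allocated (uninitialized) device memory for non-zero residue.
// cudaMalloc does not zero memory, so the buffer may still hold data written
// by an earlier kernel or, on some systems, by another application.
int main() {
    const size_t bytes = 64UL * 1024 * 1024;  // 64 MiB probe
    unsigned char* d_buf = nullptr;
    if (cudaMalloc(&d_buf, bytes) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    std::vector<unsigned char> h_buf(bytes);
    cudaMemcpy(h_buf.data(), d_buf, bytes, cudaMemcpyDeviceToHost);

    size_t nonzero = 0;
    for (unsigned char b : h_buf) nonzero += (b != 0);
    std::printf("%zu of %zu bytes were non-zero in the uninitialized buffer\n",
                nonzero, bytes);

    cudaFree(d_buf);
    return 0;
}
```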
21

Campeanu, Gabriel, and Mehrdad Saadatmand. "A Two-Layer Component-Based Allocation for Embedded Systems with GPUs". Designs 3, no. 1 (January 19, 2019): 6. http://dx.doi.org/10.3390/designs3010006.

Full text
Abstract
Component-based development is a software engineering paradigm that can facilitate the construction of embedded systems and tackle their complexity. Modern embedded systems have increasingly demanding requirements. One way to cope with such a versatile and growing set of requirements is to employ heterogeneous processing power, i.e., CPU–GPU architectures. The new CPU–GPU embedded boards deliver increased performance but also introduce additional complexity and challenges. In this work, we address the component-to-hardware allocation for CPU–GPU embedded systems. The allocation for such systems is much more complex due to the increased amount of GPU-related information. For example, while in traditional embedded systems the allocation mechanism may consider only the CPU memory usage of components to find an appropriate allocation scheme, in heterogeneous systems the GPU memory usage also needs to be taken into account in the allocation process. This paper aims to decrease the component-to-hardware allocation complexity by introducing a two-layer component-based architecture for heterogeneous embedded systems. The detailed CPU–GPU information of the system is abstracted at a high layer by compacting connected components into single units that behave as regular components. The allocator, based on the compacted information received from the high-level layer, computes feasible allocation schemes with reduced complexity. In the last part of the paper, the two-layer allocation method is evaluated using an existing embedded system demonstrator, namely an underwater robot.
22

Chen, Yong, Hai Jin, Han Jiang, Dechao Xu, Ran Zheng, and Haocheng Liu. "Implementation and Optimization of GPU-Based Static State Security Analysis in Power Systems". Mobile Information Systems 2017 (2017): 1–10. http://dx.doi.org/10.1155/2017/1897476.

Full text
Abstract
Static state security analysis (SSSA) is one of the most important computations for checking whether a power system is in a normal and secure operating state. It is a challenge to satisfy real-time requirements with CPU-based concurrent methods due to the intensive computations. A sensitivity analysis-based method with a graphics processing unit (GPU) is proposed for power systems, which can reduce calculation time by 40% compared to execution on a 4-core CPU. The proposed method involves load flow analysis and sensitivity analysis. In load flow analysis, a multifrontal method for sparse LU factorization is explored on the GPU through dynamic frontal task scheduling between CPU and GPU. The varying matrix operations during sensitivity analysis on the GPU are highly optimized in this study. The results of performance evaluations show that the proposed GPU-based SSSA with optimized matrix operations can achieve a significant reduction in computation time.
23

Tran, Giang Son, Thi Phuong Nghiem, and Jean-Christophe Burie. "Fast parallel blur detection on GPU". Journal of Real-Time Image Processing 17, no. 4 (November 12, 2018): 903–13. http://dx.doi.org/10.1007/s11554-018-0837-1.

Full text
24

Abell, Stephen, Nhan Do, and John Jaehwan Lee. "GPU-OSDDA: a bit-vector GPU-based deadlock detection algorithm for single-unit resource systems". International Journal of Parallel, Emergent and Distributed Systems 31, no. 5 (October 24, 2015): 450–68. http://dx.doi.org/10.1080/17445760.2015.1100301.

Full text
25

Abell, Stephen, Nhan Do, and John Jaehwan Lee. "GPU-LMDDA: a bit-vector GPU-based deadlock detection algorithm for multi-unit resource systems". International Journal of Parallel, Emergent and Distributed Systems 31, no. 6 (February 19, 2016): 562–90. http://dx.doi.org/10.1080/17445760.2016.1140761.

Full text
26

Wang, Long, Masaki Iwasawa, Keigo Nitadori, and Junichiro Makino. "petar: a high-performance N-body code for modelling massive collisional stellar systems". Monthly Notices of the Royal Astronomical Society 497, no. 1 (July 24, 2020): 536–55. http://dx.doi.org/10.1093/mnras/staa1915.

Full text
Abstract
The numerical simulations of massive collisional stellar systems, such as globular clusters (GCs), are very time-consuming. Until now, only a few realistic million-body simulations of GCs with a small fraction of binaries (5 per cent) have been performed by using the nbody6++gpu code. Such models took half a year of computational time on a Graphics Processing Unit (GPU)-based supercomputer. In this work, we develop a new N-body code, petar, by combining the methods of Barnes–Hut tree, Hermite integrator and slow-down algorithmic regularization. The code can accurately handle an arbitrary fraction of multiple systems (e.g. binaries and triples) while keeping a high performance by using hybrid parallelization methods with mpi, openmp, simd instructions and GPU. A few benchmarks indicate that petar and nbody6++gpu agree very well on the long-term evolution of the global structure, binary orbits and escapers. On a highly configured GPU desktop computer, the performance of a million-body simulation with all stars in binaries using petar is 11 times faster than that of nbody6++gpu. Moreover, on the Cray XC50 supercomputer, petar scales well as the number of cores increases. The 10 million-body problem, which covers the region of ultracompact dwarfs and nuclear star clusters, becomes feasible to solve.
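petar's speed comes from the tree plus Hermite plus slow-down regularization combination described above, none of which is reproduced here. The sketch below shows only the textbook O(N²) direct-summation gravity kernel that N-body codes use as a building block and correctness reference; the softening parameter eps2 and the float4 particle layout are assumed conventions, not petar's data structures.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Direct-summation gravitational acceleration: one thread per target particle.
// pos[i] = (x, y, z, mass) packed in a float4; acc[i] receives the acceleration.
__global__ void directForces(const float4* pos, float3* acc, int n, float eps2) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float4 pi = pos[i];
    float ax = 0.f, ay = 0.f, az = 0.f;
    for (int j = 0; j < n; ++j) {
        float4 pj = pos[j];
        float dx = pj.x - pi.x, dy = pj.y - pi.y, dz = pj.z - pi.z;
        float r2 = dx * dx + dy * dy + dz * dz + eps2;   // softened distance^2
        float inv = rsqrtf(r2);
        float inv3 = inv * inv * inv;
        ax += pj.w * dx * inv3;   // pj.w holds the mass of particle j
        ay += pj.w * dy * inv3;
        az += pj.w * dz * inv3;
    }
    acc[i] = make_float3(ax, ay, az);
}

int main() {
    const int n = 4096;
    float4* d_pos; float3* d_acc;
    cudaMalloc(&d_pos, n * sizeof(float4));
    cudaMalloc(&d_acc, n * sizeof(float3));
    cudaMemset(d_pos, 0, n * sizeof(float4));   // placeholder particle data

    int threads = 256, blocks = (n + threads - 1) / threads;
    directForces<<<blocks, threads>>>(d_pos, d_acc, n, 1e-4f);
    cudaDeviceSynchronize();

    cudaFree(d_pos); cudaFree(d_acc);
    return 0;
}
```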
27

Kopysov, S. P., A. K. Novikov, and Yu A. Sagdeeva. "Solving of discontinuous Galerkin method systems on GPU". Vestnik Udmurtskogo Universiteta. Matematika. Mekhanika. Komp'yuternye Nauki, no. 4 (December 2011): 121–31. http://dx.doi.org/10.20537/vm110411.

Full text
28

Martínez-del-Amor, Miguel A., Manuel García-Quismondo, Luis F. Macías-Ramos, Luis Valencia-Cabrera, Agustin Riscos-Núñez, and Mario J. Pérez-Jiménez. "Simulating P Systems on GPU Devices: A Survey". Fundamenta Informaticae 136, no. 3 (2015): 269–84. http://dx.doi.org/10.3233/fi-2015-1157.

Full text
29

van Pelt, Roy, Anna Vilanova, and Huub van de Wetering. "Illustrative Volume Visualization Using GPU-Based Particle Systems". IEEE Transactions on Visualization and Computer Graphics 16, no. 4 (July 2010): 571–82. http://dx.doi.org/10.1109/tvcg.2010.32.

Full text
30

Anzt, Hartwig, Stanimire Tomov, Mark Gates, Jack Dongarra, and Vincent Heuveline. "Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems". Procedia Computer Science 9 (2012): 7–16. http://dx.doi.org/10.1016/j.procs.2012.04.002.

Full text
31

Galiano, V., H. Migallón, V. Migallón, and J. Penadés. "GPU-based parallel algorithms for sparse nonlinear systems". Journal of Parallel and Distributed Computing 72, no. 9 (September 2012): 1098–105. http://dx.doi.org/10.1016/j.jpdc.2011.10.016.

Full text
32

Nere, Andrew, Sean Franey, Atif Hashmi, and Mikko Lipasti. "Simulating cortical networks on heterogeneous multi-GPU systems". Journal of Parallel and Distributed Computing 73, no. 7 (July 2013): 953–71. http://dx.doi.org/10.1016/j.jpdc.2012.02.006.

Full text
33

Mastrostefano, Enrico, and Massimo Bernaschi. "Efficient breadth first search on multi-GPU systems". Journal of Parallel and Distributed Computing 73, no. 9 (September 2013): 1292–305. http://dx.doi.org/10.1016/j.jpdc.2013.05.007.

Full text
34

Acosta, Alejandro, Vicente Blanco, and Francisco Almeida. "Dynamic load balancing on heterogeneous multi-GPU systems". Computers & Electrical Engineering 39, no. 8 (November 2013): 2591–602. http://dx.doi.org/10.1016/j.compeleceng.2013.08.004.

Full text
35

Dastgeer, Usman, and Christoph Kessler. "Performance-aware composition framework for GPU-based systems". Journal of Supercomputing 71, no. 12 (January 30, 2014): 4646–62. http://dx.doi.org/10.1007/s11227-014-1105-1.

Full text
36

Jo, Heeseung, Seung-Tae Hong, Jae-Woo Chang, and Dong Hoon Choi. "Offloading data encryption to GPU in database systems". Journal of Supercomputing 69, no. 1 (March 21, 2014): 375–94. http://dx.doi.org/10.1007/s11227-014-1159-0.

Full text
37

Vuduc, Richard, and Kent Czechowski. "What GPU Computing Means for High-End Systems". IEEE Micro 31, no. 4 (July 2011): 74–78. http://dx.doi.org/10.1109/mm.2011.78.

Full text
38

da Silva Junior, Jose Ricardo, Esteban Clua, and Leonardo Murta. "Efficient image-aware version control systems using GPU". Software: Practice and Experience 46, no. 8 (June 24, 2015): 1011–33. http://dx.doi.org/10.1002/spe.2340.

Full text
39

Gembris, Daniel, Markus Neeb, Markus Gipp, Andreas Kugel, and Reinhard Männer. "Correlation analysis on GPU systems using NVIDIA’s CUDA". Journal of Real-Time Image Processing 6, no. 4 (June 17, 2010): 275–80. http://dx.doi.org/10.1007/s11554-010-0162-9.

Full text
40

YOO, SEUNG-HUN, and CHANG-SUNG JEONG. "IMAGE REGISTRATION AND FUSION SYSTEM BASED ON GPU". Journal of Circuits, Systems and Computers 19, no. 01 (February 2010): 173–89. http://dx.doi.org/10.1142/s0218126610006049.

Full text
Abstract
Graphics processing units (GPUs) have surfaced as a high-quality platform for computer vision-related systems. In this paper, we propose a straightforward system consisting of a registration and a fusion method on the GPU, which generates good results at high speed compared to non-GPU-based systems. Our GPU-accelerated system utilizes existing methods by porting them to the GPU platform. The registration method uses point correspondences to find a registering transformation estimated with incremental parameters in a coarse-to-fine way, while the fusion algorithm uses multi-scale methods to fuse the results from the registration stage. We evaluate performance with the same methods executed in both CPU-only and GPU-equipped environments. The experimental results present convincing evidence of the efficiency of our system, which is tested on a few pairs of aerial images taken by electro-optical and infrared sensors to provide visual information of a scene for environmental observatories.
41

Kumar, Anshuman, Pablo R. Arantes, Aakash Saha, Giulia Palermo, and Bryan M. Wong. "GPU-Enhanced DFTB Metadynamics for Efficiently Predicting Free Energies of Biochemical Systems". Molecules 28, no. 3 (January 28, 2023): 1277. http://dx.doi.org/10.3390/molecules28031277.

Full text
Abstract
Metadynamics calculations of large chemical systems with ab initio methods are computationally prohibitive due to the extensive sampling required to simulate the large degrees of freedom in these systems. To address this computational bottleneck, we utilized a GPU-enhanced density functional tight binding (DFTB) approach on a massively parallelized cloud computing platform to efficiently calculate the thermodynamics and metadynamics of biochemical systems. To first validate our approach, we calculated the free-energy surfaces of alanine dipeptide and showed that our GPU-enhanced DFTB calculations qualitatively agree with computationally-intensive hybrid DFT benchmarks, whereas classical force fields give significant errors. Most importantly, we show that our GPU-accelerated DFTB calculations are significantly faster than previous approaches by up to two orders of magnitude. To further extend our GPU-enhanced DFTB approach, we also carried out a 10 ns metadynamics simulation of remdesivir, which is prohibitively out of reach for routine DFT-based metadynamics calculations. We find that the free-energy surfaces of remdesivir obtained from DFTB and classical force fields differ significantly, where the latter overestimates the internal energy contribution of high free-energy states. Taken together, our benchmark tests, analyses, and extensions to large biochemical systems highlight the use of GPU-enhanced DFTB simulations for efficiently predicting the free-energy surfaces/thermodynamics of large biochemical systems.
42

Ngo, Long Thanh, Dzung Dinh Nguyen, Long The Pham, and Cuong Manh Luong. "Speedup of Interval Type 2 Fuzzy Logic Systems Based on GPU for Robot Navigation". Advances in Fuzzy Systems 2012 (2012): 1–11. http://dx.doi.org/10.1155/2012/698062.

Full text
Abstract
As the number of rules and the sample rate of type-2 fuzzy logic systems (T2FLSs) increase, the speed of calculation becomes a problem. The T2FLS has a large amount of inherent algorithmic parallelism that modern CPU architectures do not exploit. Many rules and algorithms of the T2FLS can be sped up on a graphics processing unit (GPU) as long as the majority of computations at the various stages and components do not depend on each other. This paper demonstrates how to implement interval type-2 fuzzy logic systems (IT2-FLSs) on the GPU, with experiments on the obstacle-avoidance behavior of robot navigation. GPU-based calculation is a high-performance solution that also frees up the CPU. The experimental results show that the performance on the GPU is many times faster than on the CPU.
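The data parallelism exploited here is easiest to see in the fuzzification stage, where every input sample can be evaluated independently. The sketch below computes lower and upper Gaussian membership bounds for an interval type-2 set, assuming an uncertain-standard-deviation parameterization (sigmaLo, sigmaHi), which is a common choice but not necessarily the one used in the paper.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Fuzzification for an interval type-2 Gaussian set with uncertain sigma:
// the footprint of uncertainty is bounded by Gaussians with sigmaLo <= sigmaHi.
// One thread handles one input sample.
__global__ void it2GaussianMembership(const float* x, float* muLower, float* muUpper,
                                      int n, float center, float sigmaLo, float sigmaHi) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float d = x[i] - center;
    float gLo = expf(-0.5f * d * d / (sigmaLo * sigmaLo));
    float gHi = expf(-0.5f * d * d / (sigmaHi * sigmaHi));
    muLower[i] = fminf(gLo, gHi);   // lower membership function
    muUpper[i] = fmaxf(gLo, gHi);   // upper membership function
}

int main() {
    const int n = 1 << 20;
    float *x, *lo, *hi;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&lo, n * sizeof(float));
    cudaMalloc(&hi, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));        // placeholder input samples

    int threads = 256, blocks = (n + threads - 1) / threads;
    it2GaussianMembership<<<blocks, threads>>>(x, lo, hi, n, 0.5f, 0.08f, 0.15f);
    cudaDeviceSynchronize();

    cudaFree(x); cudaFree(lo); cudaFree(hi);
    return 0;
}
```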
43

Ding, Yifan, Nicholas Botzer, and Tim Weninger. "HetSeq: Distributed GPU Training on Heterogeneous Infrastructure". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 17 (May 18, 2021): 15432–38. http://dx.doi.org/10.1609/aaai.v35i17.17813.

Full text
Abstract
Modern deep learning systems like PyTorch and TensorFlow are able to train enormous models with billions (or trillions) of parameters on a distributed infrastructure. These systems require that the internal nodes have the same memory capacity and compute performance. Unfortunately, most organizations, especially universities, have a piecemeal approach to purchasing computer systems, resulting in a heterogeneous infrastructure that cannot be used to compute large models. The present work describes HetSeq, a software package adapted from the popular PyTorch package that provides the capability to train large neural network models on heterogeneous infrastructure. Experiments with language translation and text and image classification show that HetSeq scales over heterogeneous systems. Additional information, support documents, and source code are publicly available at https://github.com/yifding/hetseq.
44

Fu, Yaosheng, Evgeny Bolotin, Niladrish Chatterjee, David Nellans, and Stephen W. Keckler. "GPU Domain Specialization via Composable On-Package Architecture". ACM Transactions on Architecture and Code Optimization 19, no. 1 (March 31, 2022): 1–23. http://dx.doi.org/10.1145/3484505.

Full text
Abstract
As GPUs scale their low-precision matrix math throughput to boost deep learning (DL) performance, they upset the balance between math throughput and memory system capabilities. We demonstrate that a converged GPU design trying to address diverging architectural requirements between FP32 (or larger)-based HPC and FP16 (or smaller)-based DL workloads results in sub-optimal configurations for either of the application domains. We argue that a Composable On-PAckage GPU (COPA-GPU) architecture to provide domain-specialized GPU products is the most practical solution to these diverging requirements. A COPA-GPU leverages multi-chip-module disaggregation to support maximal design reuse, along with memory system specialization per application domain. We show how a COPA-GPU enables DL-specialized products by modular augmentation of the baseline GPU architecture with up to 4× higher off-die bandwidth, 32× larger on-package cache, and 2.3× higher DRAM bandwidth and capacity, while conveniently supporting scaled-down HPC-oriented designs. This work explores the microarchitectural design necessary to enable composable GPUs and evaluates the benefits composability can provide to HPC, DL training, and DL inference. We show that when compared to a converged GPU design, a DL-optimized COPA-GPU featuring a combination of 16× larger cache capacity and 1.6× higher DRAM bandwidth scales per-GPU training and inference performance by 31% and 35%, respectively, and reduces the number of GPU instances by 50% in scale-out training scenarios.
45

Rapaport, D. C. "GPU molecular dynamics: Algorithms and performance". Journal of Physics: Conference Series 2241, no. 1 (March 1, 2022): 012007. http://dx.doi.org/10.1088/1742-6596/2241/1/012007.

Full text
Abstract
A previous study of MD algorithms designed for GPU use is extended to cover more recent developments in GPU architecture. Algorithm modifications are described, together with extensions to more complex systems. New measurements include the effects of increased parallelism on GPU performance, as well as comparisons with multiple-core CPUs using multitasking based on CPU threads and message passing. The results show that the GPU retains a significant performance advantage.
46

Zhu, Rui, Chang Nian Chen, and Lei Hua Qin. "An Transfer Latency Optimized Solution in GPU-Accelerated De-Duplication". Applied Mechanics and Materials 336-338 (July 2013): 2059–62. http://dx.doi.org/10.4028/www.scientific.net/amm.336-338.2059.

Full text
Abstract
Recently, the GPU has been introduced as an important tool in general-purpose programming due to its powerful computing capacity. In data de-duplication systems, the GPU has been used to accelerate the chunking and hashing algorithms. However, the data transfer latency between CPU and GPU memory is one of the main challenges in GPU-accelerated de-duplication. To alleviate this challenge, our solution strives to reduce the data transfer time between host and GPU memory for the parallelized content-defined chunking and hashing algorithm. In our experiments, it has shown 15%–20% performance improvements over an already accelerated baseline GPU implementation of data de-duplication.
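Transfer latency of this kind is commonly attacked with pinned host buffers and asynchronous copies overlapped with kernel work on CUDA streams. The double-buffering skeleton below illustrates only that generic pattern, with a placeholder kernel standing in for the chunking and hashing stages; it is not the specific scheme evaluated in the paper.

```cuda
#include <cuda_runtime.h>
#include <cstring>

// Placeholder for the content-defined chunking / hashing work done per batch.
__global__ void processBatch(const unsigned char* data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) { volatile unsigned char b = data[i]; (void)b; }
}

int main() {
    const size_t batch = 8UL << 20;       // 8 MiB per batch
    const int numBatches = 16;

    unsigned char *h_buf[2], *d_buf[2];
    cudaStream_t stream[2];
    for (int s = 0; s < 2; ++s) {
        cudaHostAlloc(&h_buf[s], batch, cudaHostAllocDefault);  // pinned host memory
        cudaMalloc(&d_buf[s], batch);
        cudaStreamCreate(&stream[s]);
        std::memset(h_buf[s], s, batch);                        // placeholder payload
    }

    // Double buffering: while one batch is in flight on one stream,
    // the other stream can copy and process the next batch.
    for (int b = 0; b < numBatches; ++b) {
        int s = b & 1;
        cudaMemcpyAsync(d_buf[s], h_buf[s], batch, cudaMemcpyHostToDevice, stream[s]);
        int threads = 256;
        int blocks = (int)((batch + threads - 1) / threads);
        processBatch<<<blocks, threads, 0, stream[s]>>>(d_buf[s], batch);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < 2; ++s) {
        cudaStreamDestroy(stream[s]);
        cudaFree(d_buf[s]);
        cudaFreeHost(h_buf[s]);
    }
    return 0;
}
```

In a real de-duplication pipeline the host would refill each pinned buffer with new file data between iterations; the skeleton omits that step for brevity.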
47

DeFrancisco, Richard, Shenghsun Cho, Michael Ferdman, and Scott A. Smolka. "Swarm model checking on the GPU". International Journal on Software Tools for Technology Transfer 22, no. 5 (June 16, 2020): 583–99. http://dx.doi.org/10.1007/s10009-020-00576-x.

Full text
48

Wang, Qihan, Zhen Peng, Bin Ren, Jie Chen, and Robert G. Edwards. "MemHC: An Optimized GPU Memory Management Framework for Accelerating Many-body Correlation". ACM Transactions on Architecture and Code Optimization 19, no. 2 (June 30, 2022): 1–26. http://dx.doi.org/10.1145/3506705.

Full text
Abstract
The many-body correlation function is a fundamental computation kernel in modern physics computing applications, e.g., Hadron Contractions in Lattice quantum chromodynamics (QCD). This kernel is both computation and memory intensive, involving a series of tensor contractions, and thus usually runs on accelerators like GPUs. Existing optimizations on many-body correlation mainly focus on individual tensor contractions (e.g., cuBLAS libraries and others). In contrast, this work discovers a new optimization dimension for many-body correlation by exploring the optimization opportunities among tensor contractions. More specifically, it targets general GPU architectures (both NVIDIA and AMD) and optimizes many-body correlation's memory management by exploiting a set of memory allocation and communication redundancy elimination opportunities: first, GPU memory allocation redundancy: the intermediate output frequently occurs as input in the subsequent calculations; second, CPU-GPU communication redundancy: although all tensors are allocated on both CPU and GPU, many of them are used (and reused) on the GPU side only, and thus, many CPU/GPU communications (like that in existing Unified Memory designs) are unnecessary; third, GPU oversubscription: limited GPU memory size causes oversubscription issues, and existing memory management usually results in near-reuse data eviction, thus incurring extra CPU/GPU memory communications. Targeting these memory optimization opportunities, this article proposes MemHC, an optimized systematic GPU memory management framework that aims to accelerate the calculation of many-body correlation functions utilizing a series of new memory reduction designs. These designs involve optimizations for GPU memory allocation, CPU/GPU memory movement, and GPU memory oversubscription, respectively. More specifically, first, MemHC employs duplication-aware management and lazy release of GPU memories to corresponding host managing for better data reusability. Second, it implements data reorganization and on-demand synchronization to eliminate redundant (or unnecessary) data transfer. Third, MemHC exploits an optimized Least Recently Used (LRU) eviction policy called Pre-Protected LRU to reduce evictions and leverage memory hits. Additionally, MemHC is portable for various platforms including NVIDIA GPUs and AMD GPUs. The evaluation demonstrates that MemHC outperforms unified memory management by 2.18× to 10.73×. The proposed Pre-Protected LRU policy outperforms the original LRU policy by up to 1.36×.
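MemHC itself is a research framework rather than a reusable API, so the sketch below only illustrates the general ideas its optimizations build on: device buffers cached by key so repeated requests reuse the same allocation, with least-recently-used buffers freed once a byte budget is exceeded. The class name, keying scheme, and eviction rule are assumptions for illustration and are not MemHC's actual policies (in particular, Pre-Protected LRU is not reproduced).

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <list>
#include <string>
#include <unordered_map>

// A toy device-buffer cache: buffers are looked up by key so repeated requests
// reuse the same allocation (and can skip re-uploads); least recently used
// buffers are evicted once the byte budget is exceeded.
class DeviceBufferCache {
public:
    explicit DeviceBufferCache(size_t budgetBytes) : budget_(budgetBytes) {}

    void* get(const std::string& key, size_t bytes) {
        auto it = index_.find(key);
        if (it != index_.end()) {                       // hit: refresh LRU position
            lru_.splice(lru_.begin(), lru_, it->second);
            return it->second->ptr;
        }
        while (used_ + bytes > budget_ && !lru_.empty()) evictOldest();
        void* p = nullptr;
        cudaMalloc(&p, bytes);                          // error handling omitted in this sketch
        lru_.push_front({key, p, bytes});
        index_[key] = lru_.begin();
        used_ += bytes;
        return p;
    }

    ~DeviceBufferCache() { while (!lru_.empty()) evictOldest(); }

private:
    struct Entry { std::string key; void* ptr; size_t bytes; };

    void evictOldest() {
        Entry& victim = lru_.back();
        cudaFree(victim.ptr);
        used_ -= victim.bytes;
        index_.erase(victim.key);
        lru_.pop_back();
    }

    size_t budget_;
    size_t used_ = 0;
    std::list<Entry> lru_;
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};

int main() {
    DeviceBufferCache cache(256UL << 20);               // 256 MiB budget
    void* a = cache.get("tensor_A", 64UL << 20);
    void* b = cache.get("tensor_A", 64UL << 20);        // same key: reused, no new cudaMalloc
    std::printf("reused: %s\n", a == b ? "yes" : "no");
    return 0;
}
```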
49

Zhang, Yu, Da Peng, Xiaofei Liao, Hai Jin, Haikun Liu, Lin Gu, and Bingsheng He. "LargeGraph". ACM Transactions on Architecture and Code Optimization 18, no. 4 (December 31, 2021): 1–24. http://dx.doi.org/10.1145/3477603.

Full text
Abstract
Many out-of-GPU-memory systems have recently been designed to support iterative processing of large-scale graphs. However, these systems still suffer from long times to converge because of inefficient propagation of active vertices' new states along graph paths. To efficiently support out-of-GPU-memory graph processing, this work designs a system, LargeGraph. Different from existing out-of-GPU-memory systems, LargeGraph proposes a dependency-aware data-driven execution approach, which can significantly accelerate active vertices' state propagations along graph paths with low data access cost and also high parallelism. Specifically, according to the dependencies between the vertices, it only loads and processes the graph data associated with dependency chains originating from active vertices, for smaller access cost. Because most active vertices frequently use a small evolving set of paths for their new states' propagation owing to the power-law property, this small set of paths is dynamically identified, maintained, and efficiently handled on the GPU to accelerate most propagations for faster convergence, whereas the remaining graph data are handled on the CPU. For out-of-GPU-memory graph processing, LargeGraph outperforms four cutting-edge systems: Totem (5.19–11.62×), Graphie (3.02–9.41×), Garaph (2.75–8.36×), and Subway (2.45–4.15×).
50

Wong, Un-Hong, Takayuki Aoki, and Hon-Cheng Wong. "Efficient magnetohydrodynamic simulations on distributed multi-GPU systems using a novel GPU Direct–MPI hybrid approach". Computer Physics Communications 185, no. 7 (July 2014): 1901–13. http://dx.doi.org/10.1016/j.cpc.2014.03.018.

Full text