
Journal articles on the topic "GPU Systems"

Create an accurate reference in APA, MLA, Chicago, Harvard, and many other styles


Consult the 50 best scholarly journal articles on the topic "GPU Systems".

An "Add to bibliography" button is available next to every work in the list. Use it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication in .pdf format and read its abstract online, if the relevant parameters are provided in the work's metadata.

Browse journal articles from a wide range of disciplines and compile an accurate bibliography.

1

Jararweh, Yaser, Moath Jarrah, and Abdelkader Bousselham. "GPU Scaling". International Journal of Information Technology and Web Engineering 9, no. 4 (October 2014): 13–23. http://dx.doi.org/10.4018/ijitwe.2014100102.

Full text source
Abstract:
Current state-of-the-art GPU-based systems offer unprecedented performance advantages through accelerating the most compute-intensive portions of applications by an order of magnitude. GPU computing presents a viable solution for the ever-increasing complexities in applications and the growing demands for immense computational resources. In this paper, the authors investigate different platforms of GPU-based systems, from personal supercomputing (PSC) to cloud-based GPU systems. They explore and evaluate these GPU-based platforms and present a comparison with conventional high-performance cluster-based computing systems. Their evaluation shows potential advantages of using GPU-based systems for high-performance computing applications while meeting different scaling granularities.
APA, Harvard, Vancouver, ISO, and other styles
2

Dematte, L., and D. Prandi. "GPU computing for systems biology". Briefings in Bioinformatics 11, no. 3 (March 7, 2010): 323–33. http://dx.doi.org/10.1093/bib/bbq006.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
3

Ban, Zhihua, Jianguo Liu, and Jeremy Fouriaux. "GMMSP on GPU". Journal of Real-Time Image Processing 17, no. 2 (March 17, 2018): 245–57. http://dx.doi.org/10.1007/s11554-018-0762-3.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
4

Georgii, Joachim, and Rüdiger Westermann. "Mass-spring systems on the GPU". Simulation Modelling Practice and Theory 13, no. 8 (November 2005): 693–702. http://dx.doi.org/10.1016/j.simpat.2005.08.004.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
5

Huynh, Huynh Phung, Andrei Hagiescu, Ong Zhong Liang, Weng-Fai Wong, and Rick Siow Mong Goh. "Mapping Streaming Applications onto GPU Systems". IEEE Transactions on Parallel and Distributed Systems 25, no. 9 (September 2014): 2374–85. http://dx.doi.org/10.1109/tpds.2013.195.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
6

Deniz, Etem, and Alper Sen. "MINIME-GPU". ACM Transactions on Architecture and Code Optimization 12, no. 4 (January 7, 2016): 1–25. http://dx.doi.org/10.1145/2818693.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
7

Braak, Gert-Jan Van Den, and Henk Corporaal. "R-GPU". ACM Transactions on Architecture and Code Optimization 13, no. 1 (April 5, 2016): 1–24. http://dx.doi.org/10.1145/2890506.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
8

INO, Fumihiko, Shinta NAKAGAWA, and Kenichi HAGIHARA. "GPU-Chariot: A Programming Framework for Stream Applications Running on Multi-GPU Systems". IEICE Transactions on Information and Systems E96.D, no. 12 (2013): 2604–16. http://dx.doi.org/10.1587/transinf.e96.d.2604.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
9

Rosenfeld, Viktor, Sebastian Breß, and Volker Markl. "Query Processing on Heterogeneous CPU/GPU Systems". ACM Computing Surveys 55, no. 1 (January 31, 2023): 1–38. http://dx.doi.org/10.1145/3485126.

Full text source
Abstract:
Due to their high computational power and internal memory bandwidth, graphic processing units (GPUs) have been extensively studied by the database systems research community. A heterogeneous query processing system that employs CPUs and GPUs at the same time has to solve many challenges, including how to distribute the workload on processors with different capabilities; how to overcome the data transfer bottleneck; and how to support implementations for multiple processors efficiently. In this survey we devise a classification scheme to categorize techniques developed to address these challenges. Based on this scheme, we categorize query processing systems on heterogeneous CPU/GPU systems and identify open research problems.
APA, Harvard, Vancouver, ISO, and other styles
10

Besozzi, Daniela, Giulio Caravagna, Paolo Cazzaniga, Marco Nobile, Dario Pescini, and Alessandro Re. "GPU-powered Simulation Methodologies for Biological Systems". Electronic Proceedings in Theoretical Computer Science 130 (September 30, 2013): 87–91. http://dx.doi.org/10.4204/eptcs.130.14.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
11

ODAKA, Fumihiro, and Kenkichi SATO. "S2030201 GPU Computing Systems: History and Application". Proceedings of Mechanical Engineering Congress, Japan 2014 (2014): _S2030201——_S2030201—. http://dx.doi.org/10.1299/jsmemecj.2014._s2030201-.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
12

Maza, Marc Moreno, and Wei Pan. "Solving Bivariate Polynomial Systems on a GPU". Journal of Physics: Conference Series 341 (February 9, 2012): 012022. http://dx.doi.org/10.1088/1742-6596/341/1/012022.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
13

ODAGAWA, Masato, Yuriko TAKESHIMA, Issei FUJISHIRO, Gota KIKUGAWA, and Taku OHARA. "GPU-Based Adaptive Visualization for Particle Systems". TRANSACTIONS OF THE JAPAN SOCIETY OF MECHANICAL ENGINEERS Series B 77, no. 781 (2011): 1767–78. http://dx.doi.org/10.1299/kikaib.77.1767.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
14

Maza, Marc Moreno, and Wei Pan. "Solving bivariate polynomial systems on a GPU". ACM Communications in Computer Algebra 45, no. 1/2 (July 25, 2011): 127–28. http://dx.doi.org/10.1145/2016567.2016589.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
15

Jiang, Hai, Yi Chen, Zhi Qiao, Kuan-Ching Li, WonWoo Ro, and Jean-Luc Gaudiot. "Accelerating MapReduce framework on multi-GPU systems". Cluster Computing 17, no. 2 (May 30, 2013): 293–301. http://dx.doi.org/10.1007/s10586-013-0276-5.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
16

Bernaschi, M., M. Fatica, G. Parisi, and L. Parisi. "Multi-GPU codes for spin systems simulations". Computer Physics Communications 183, no. 7 (July 2012): 1416–21. http://dx.doi.org/10.1016/j.cpc.2012.02.015.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
17

Ino, Fumihiko, Akihiro Ogita, Kentaro Oita, and Kenichi Hagihara. "Cooperative multitasking for GPU-accelerated grid systems". Concurrency and Computation: Practice and Experience 24, no. 1 (March 22, 2011): 96–107. http://dx.doi.org/10.1002/cpe.1722.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
18

Lamas-Rodríguez, Julián, Dora B. Heras, Francisco Argüello, Dagmar Kainmueller, Stefan Zachow, and Montserrat Bóo. "GPU-accelerated level-set segmentation". Journal of Real-Time Image Processing 12, no. 1 (November 26, 2013): 15–29. http://dx.doi.org/10.1007/s11554-013-0378-6.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
19

Meng, Wanwan, Yongguang Cheng, Jiayang Wu, Zhiyan Yang, Yunxian Zhu, and Shuai Shang. "GPU Acceleration of Hydraulic Transient Simulations of Large-Scale Water Supply Systems". Applied Sciences 9, no. 1 (December 27, 2018): 91. http://dx.doi.org/10.3390/app9010091.

Full text source
Abstract:
Simulating hydraulic transients in ultra-long water (oil, gas) transmission or large-scale distribution systems is time-consuming, and exploring ways to improve simulation efficiency is an essential research direction. The parallel implementation of the method of characteristics (MOC) on graphics processing unit (GPU) chips is a promising approach for accelerating the simulations, because the GPU has a great parallelization ability for massive but simple computations, and the explicit and local features of MOC match the features of the GPU quite well. In this paper, we propose and verify a GPU implementation of MOC on a single chip for more efficient simulations of hydraulic transients. Details of the GPU-MOC parallel strategies are introduced, and the accuracy and efficiency of the proposed method are verified by simulating the benchmark single-pipe water hammer problem. The transient processes of a large-scale water distribution system and a long-distance water transmission system are simulated to investigate the computing capability of the proposed method. The results show that the GPU-MOC method can achieve significant performance gains, with speedup ratios of up to hundreds compared to the traditional method. This preliminary work demonstrates that GPU-MOC parallel computing has great prospects in practical applications with large computing loads.
APA, Harvard, Vancouver, ISO, and other styles
20

Zhou, Zhe, Wenrui Diao, Xiangyu Liu, Zhou Li, Kehuan Zhang, and Rui Liu. "Vulnerable GPU Memory Management: Towards Recovering Raw Data from GPU". Proceedings on Privacy Enhancing Technologies 2017, no. 2 (April 1, 2017): 57–73. http://dx.doi.org/10.1515/popets-2017-0016.

Full text source
Abstract:
According to previous reports, information could be leaked from GPU memory; however, the security implications of such a threat were mostly overlooked, because only limited information could be indirectly extracted through side-channel attacks. In this paper, we propose a novel algorithm for recovering raw data directly from the GPU memory residues of many popular applications such as Google Chrome and Adobe PDF reader. Our algorithm enables harvesting highly sensitive information, including credit card numbers and email contents, from GPU memory residues. Evaluation results also indicate that nearly all GPU-accelerated applications are vulnerable to such attacks, and adversaries can launch attacks without requiring any special privileges, both on traditional multi-user operating systems and in emerging cloud computing scenarios.
APA, Harvard, Vancouver, ISO, and other styles
21

Campeanu, Gabriel, and Mehrdad Saadatmand. "A Two-Layer Component-Based Allocation for Embedded Systems with GPUs". Designs 3, no. 1 (January 19, 2019): 6. http://dx.doi.org/10.3390/designs3010006.

Full text source
Abstract:
Component-based development is a software engineering paradigm that can facilitate the construction of embedded systems and tackle their complexities. Modern embedded systems have increasingly demanding requirements. One way to cope with such a versatile and growing set of requirements is to employ heterogeneous processing power, i.e., CPU–GPU architectures. The new CPU–GPU embedded boards deliver increased performance but also introduce additional complexity and challenges. In this work, we address component-to-hardware allocation for CPU–GPU embedded systems. The allocation for such systems is much more complex due to the increased amount of GPU-related information. For example, while in traditional embedded systems the allocation mechanism may consider only the CPU memory usage of components to find an appropriate allocation scheme, in heterogeneous systems the GPU memory usage also needs to be taken into account in the allocation process. This paper aims at decreasing the component-to-hardware allocation complexity by introducing a two-layer component-based architecture for heterogeneous embedded systems. The detailed CPU–GPU information of the system is abstracted at a high layer by compacting connected components into single units that behave as regular components. The allocator, based on the compacted information received from the high-level layer, computes feasible allocation schemes with reduced complexity. In the last part of the paper, the two-layer allocation method is evaluated using an existing embedded system demonstrator, namely an underwater robot.
APA, Harvard, Vancouver, ISO, and other styles
22

Chen, Yong, Hai Jin, Han Jiang, Dechao Xu, Ran Zheng, and Haocheng Liu. "Implementation and Optimization of GPU-Based Static State Security Analysis in Power Systems". Mobile Information Systems 2017 (2017): 1–10. http://dx.doi.org/10.1155/2017/1897476.

Full text source
Abstract:
Static state security analysis (SSSA) is one of the most important computations for checking whether a power system is in a normal and secure operating state. It is a challenge to satisfy real-time requirements with CPU-based concurrent methods due to the intensive computations involved. A sensitivity analysis-based method using a graphics processing unit (GPU) is proposed for power systems, which can reduce calculation time by 40% compared to execution on a 4-core CPU. The proposed method involves load flow analysis and sensitivity analysis. In load flow analysis, a multifrontal method for sparse LU factorization is explored on the GPU through dynamic frontal task scheduling between CPU and GPU. The varying matrix operations during sensitivity analysis on the GPU are highly optimized in this study. The results of performance evaluations show that the proposed GPU-based SSSA with optimized matrix operations can achieve a significant reduction in computation time.
APA, Harvard, Vancouver, ISO, and other styles
23

Tran, Giang Son, Thi Phuong Nghiem, and Jean-Christophe Burie. "Fast parallel blur detection on GPU". Journal of Real-Time Image Processing 17, no. 4 (November 12, 2018): 903–13. http://dx.doi.org/10.1007/s11554-018-0837-1.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
24

Abell, Stephen, Nhan Do, and John Jaehwan Lee. "GPU-OSDDA: a bit-vector GPU-based deadlock detection algorithm for single-unit resource systems". International Journal of Parallel, Emergent and Distributed Systems 31, no. 5 (October 24, 2015): 450–68. http://dx.doi.org/10.1080/17445760.2015.1100301.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
25

Abell, Stephen, Nhan Do, and John Jaehwan Lee. "GPU-LMDDA: a bit-vector GPU-based deadlock detection algorithm for multi-unit resource systems". International Journal of Parallel, Emergent and Distributed Systems 31, no. 6 (February 19, 2016): 562–90. http://dx.doi.org/10.1080/17445760.2016.1140761.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
26

Wang, Long, Masaki Iwasawa, Keigo Nitadori, and Junichiro Makino. "petar: a high-performance N-body code for modelling massive collisional stellar systems". Monthly Notices of the Royal Astronomical Society 497, no. 1 (July 24, 2020): 536–55. http://dx.doi.org/10.1093/mnras/staa1915.

Full text source
Abstract:
The numerical simulations of massive collisional stellar systems, such as globular clusters (GCs), are very time consuming. Until now, only a few realistic million-body simulations of GCs with a small fraction of binaries (5 per cent) have been performed by using the nbody6++gpu code. Such models took half a year of computational time on a Graphic Processing Unit (GPU)-based supercomputer. In this work, we develop a new N-body code, petar, by combining the methods of Barnes–Hut tree, Hermite integrator and slow-down algorithmic regularization. The code can accurately handle an arbitrary fraction of multiple systems (e.g. binaries and triples) while keeping a high performance by using hybrid parallelization methods with mpi, openmp, simd instructions and GPU. A few benchmarks indicate that petar and nbody6++gpu have a very good agreement on the long-term evolution of the global structure, binary orbits and escapers. On a highly configured GPU desktop computer, the performance of a million-body simulation with all stars in binaries by using petar is 11 times faster than that of nbody6++gpu. Moreover, on the Cray XC50 supercomputer, petar scales well as the number of cores increases. The 10 million-body problem, which covers the region of ultracompact dwarfs and nuclear star clusters, becomes possible to solve.
APA, Harvard, Vancouver, ISO, and other styles
27

Kopysov, S. P., A. K. Novikov, and Yu A. Sagdeeva. "Solving of discontinuous Galerkin method systems on GPU". Vestnik Udmurtskogo Universiteta. Matematika. Mekhanika. Komp'yuternye Nauki, no. 4 (December 2011): 121–31. http://dx.doi.org/10.20537/vm110411.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
28

Martínez-del-Amor, Miguel A., Manuel García-Quismondo, Luis F. Macías-Ramos, Luis Valencia-Cabrera, Agustin Riscos-Núñez, and Mario J. Pérez-Jiménez. "Simulating P Systems on GPU Devices: A Survey". Fundamenta Informaticae 136, no. 3 (2015): 269–84. http://dx.doi.org/10.3233/fi-2015-1157.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
29

van Pelt, Roy, Anna Vilanova, and Huub van de Wetering. "Illustrative Volume Visualization Using GPU-Based Particle Systems". IEEE Transactions on Visualization and Computer Graphics 16, no. 4 (July 2010): 571–82. http://dx.doi.org/10.1109/tvcg.2010.32.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
30

Anzt, Hartwig, Stanimire Tomov, Mark Gates, Jack Dongarra, and Vincent Heuveline. "Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems". Procedia Computer Science 9 (2012): 7–16. http://dx.doi.org/10.1016/j.procs.2012.04.002.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
31

Galiano, V., H. Migallón, V. Migallón, and J. Penadés. "GPU-based parallel algorithms for sparse nonlinear systems". Journal of Parallel and Distributed Computing 72, no. 9 (September 2012): 1098–105. http://dx.doi.org/10.1016/j.jpdc.2011.10.016.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
32

Nere, Andrew, Sean Franey, Atif Hashmi, and Mikko Lipasti. "Simulating cortical networks on heterogeneous multi-GPU systems". Journal of Parallel and Distributed Computing 73, no. 7 (July 2013): 953–71. http://dx.doi.org/10.1016/j.jpdc.2012.02.006.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
33

Mastrostefano, Enrico, and Massimo Bernaschi. "Efficient breadth first search on multi-GPU systems". Journal of Parallel and Distributed Computing 73, no. 9 (September 2013): 1292–305. http://dx.doi.org/10.1016/j.jpdc.2013.05.007.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
34

Acosta, Alejandro, Vicente Blanco, and Francisco Almeida. "Dynamic load balancing on heterogeneous multi-GPU systems". Computers & Electrical Engineering 39, no. 8 (November 2013): 2591–602. http://dx.doi.org/10.1016/j.compeleceng.2013.08.004.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
35

Dastgeer, Usman, and Christoph Kessler. "Performance-aware composition framework for GPU-based systems". Journal of Supercomputing 71, no. 12 (January 30, 2014): 4646–62. http://dx.doi.org/10.1007/s11227-014-1105-1.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
36

Jo, Heeseung, Seung-Tae Hong, Jae-Woo Chang, and Dong Hoon Choi. "Offloading data encryption to GPU in database systems". Journal of Supercomputing 69, no. 1 (March 21, 2014): 375–94. http://dx.doi.org/10.1007/s11227-014-1159-0.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
37

Vuduc, Richard, and Kent Czechowski. "What GPU Computing Means for High-End Systems". IEEE Micro 31, no. 4 (July 2011): 74–78. http://dx.doi.org/10.1109/mm.2011.78.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
38

da Silva Junior, Jose Ricardo, Esteban Clua, and Leonardo Murta. "Efficient image-aware version control systems using GPU". Software: Practice and Experience 46, no. 8 (June 24, 2015): 1011–33. http://dx.doi.org/10.1002/spe.2340.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
39

Gembris, Daniel, Markus Neeb, Markus Gipp, Andreas Kugel, and Reinhard Männer. "Correlation analysis on GPU systems using NVIDIA’s CUDA". Journal of Real-Time Image Processing 6, no. 4 (June 17, 2010): 275–80. http://dx.doi.org/10.1007/s11554-010-0162-9.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
40

YOO, SEUNG-HUN, and CHANG-SUNG JEONG. "IMAGE REGISTRATION AND FUSION SYSTEM BASED ON GPU". Journal of Circuits, Systems and Computers 19, no. 01 (February 2010): 173–89. http://dx.doi.org/10.1142/s0218126610006049.

Full text source
Abstract:
The graphics processing unit (GPU) has surfaced as a high-quality platform for computer vision-related systems. In this paper, we propose a straightforward system consisting of a registration and a fusion method over GPU, which generates good results at high speed compared to non-GPU-based systems. Our GPU-accelerated system utilizes existing methods by converting them to the GPU-based platform. The registration method uses point correspondences to find a registering transformation estimated with the incremental parameters in a coarse-to-fine way, while the fusion algorithm uses multi-scale methods to fuse the results from the registration stage. We evaluate performance with the same methods executed over both CPU-only and GPU-mounted environments. The experimental results present convincing evidence of the efficiency of our system, which is tested on a few pairs of aerial images taken by electro-optical and infrared sensors to provide visual information of a scene for environmental observatories.
APA, Harvard, Vancouver, ISO, and other styles
41

Kumar, Anshuman, Pablo R. Arantes, Aakash Saha, Giulia Palermo, and Bryan M. Wong. "GPU-Enhanced DFTB Metadynamics for Efficiently Predicting Free Energies of Biochemical Systems". Molecules 28, no. 3 (January 28, 2023): 1277. http://dx.doi.org/10.3390/molecules28031277.

Full text source
Abstract:
Metadynamics calculations of large chemical systems with ab initio methods are computationally prohibitive due to the extensive sampling required to simulate the large degrees of freedom in these systems. To address this computational bottleneck, we utilized a GPU-enhanced density functional tight binding (DFTB) approach on a massively parallelized cloud computing platform to efficiently calculate the thermodynamics and metadynamics of biochemical systems. To first validate our approach, we calculated the free-energy surfaces of alanine dipeptide and showed that our GPU-enhanced DFTB calculations qualitatively agree with computationally-intensive hybrid DFT benchmarks, whereas classical force fields give significant errors. Most importantly, we show that our GPU-accelerated DFTB calculations are significantly faster than previous approaches by up to two orders of magnitude. To further extend our GPU-enhanced DFTB approach, we also carried out a 10 ns metadynamics simulation of remdesivir, which is prohibitively out of reach for routine DFT-based metadynamics calculations. We find that the free-energy surfaces of remdesivir obtained from DFTB and classical force fields differ significantly, where the latter overestimates the internal energy contribution of high free-energy states. Taken together, our benchmark tests, analyses, and extensions to large biochemical systems highlight the use of GPU-enhanced DFTB simulations for efficiently predicting the free-energy surfaces/thermodynamics of large biochemical systems.
APA, Harvard, Vancouver, ISO, and other styles
42

Ngo, Long Thanh, Dzung Dinh Nguyen, Long The Pham, and Cuong Manh Luong. "Speedup of Interval Type 2 Fuzzy Logic Systems Based on GPU for Robot Navigation". Advances in Fuzzy Systems 2012 (2012): 1–11. http://dx.doi.org/10.1155/2012/698062.

Full text source
Abstract:
As the number of rules and the sample rate of type-2 fuzzy logic systems (T2FLSs) increase, the speed of calculation becomes a problem. T2FLSs contain a large amount of inherent algorithmic parallelism that modern CPU architectures do not exploit. In a T2FLS, many rules and algorithms can be sped up on a graphics processing unit (GPU) as long as the majority of computations at the various stages and components are not dependent on each other. This paper demonstrates how to implement interval type-2 fuzzy logic systems (IT2-FLSs) on the GPU, with experiments on obstacle-avoidance behavior for robot navigation. GPU-based calculation is a high-performance solution that frees up the CPU. The experimental results show that the performance of the GPU is many times faster than that of the CPU.
APA, Harvard, Vancouver, ISO, and other styles
43

Ding, Yifan, Nicholas Botzer, and Tim Weninger. "HetSeq: Distributed GPU Training on Heterogeneous Infrastructure". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 17 (May 18, 2021): 15432–38. http://dx.doi.org/10.1609/aaai.v35i17.17813.

Full text source
Abstract:
Modern deep learning systems like PyTorch and Tensorflow are able to train enormous models with billions (or trillions) of parameters on a distributed infrastructure. These systems require that the internal nodes have the same memory capacity and compute performance. Unfortunately, most organizations, especially universities, have a piecemeal approach to purchasing computer systems, resulting in a heterogeneous infrastructure that cannot be used to compute large models. The present work describes HetSeq, a software package adapted from the popular PyTorch package that provides the capability to train large neural network models on heterogeneous infrastructure. Experiments with language translation, text classification, and image classification show that HetSeq scales over heterogeneous systems. Additional information, support documents, and source code are publicly available at https://github.com/yifding/hetseq.
APA, Harvard, Vancouver, ISO, and other styles
44

Fu, Yaosheng, Evgeny Bolotin, Niladrish Chatterjee, David Nellans, and Stephen W. Keckler. "GPU Domain Specialization via Composable On-Package Architecture". ACM Transactions on Architecture and Code Optimization 19, no. 1 (March 31, 2022): 1–23. http://dx.doi.org/10.1145/3484505.

Full text source
Abstract:
As GPUs scale their low-precision matrix math throughput to boost deep learning (DL) performance, they upset the balance between math throughput and memory system capabilities. We demonstrate that a converged GPU design trying to address diverging architectural requirements between FP32 (or larger)-based HPC and FP16 (or smaller)-based DL workloads results in sub-optimal configurations for either of the application domains. We argue that a Composable On-PAckage GPU (COPA-GPU) architecture to provide domain-specialized GPU products is the most practical solution to these diverging requirements. A COPA-GPU leverages multi-chip-module disaggregation to support maximal design reuse, along with memory system specialization per application domain. We show how a COPA-GPU enables DL-specialized products by modular augmentation of the baseline GPU architecture with up to 4× higher off-die bandwidth, 32× larger on-package cache, and 2.3× higher DRAM bandwidth and capacity, while conveniently supporting scaled-down HPC-oriented designs. This work explores the microarchitectural design necessary to enable composable GPUs and evaluates the benefits composability can provide to HPC, DL training, and DL inference. We show that when compared to a converged GPU design, a DL-optimized COPA-GPU featuring a combination of 16× larger cache capacity and 1.6× higher DRAM bandwidth scales per-GPU training and inference performance by 31% and 35%, respectively, and reduces the number of GPU instances by 50% in scale-out training scenarios.
APA, Harvard, Vancouver, ISO, and other styles
45

Rapaport, D. C. "GPU molecular dynamics: Algorithms and performance". Journal of Physics: Conference Series 2241, no. 1 (March 1, 2022): 012007. http://dx.doi.org/10.1088/1742-6596/2241/1/012007.

Full text source
Abstract:
A previous study of MD algorithms designed for GPU use is extended to cover more recent developments in GPU architecture. Algorithm modifications are described, together with extensions to more complex systems. New measurements include the effects of increased parallelism on GPU performance, as well as comparisons with multiple-core CPUs using multitasking based on CPU threads and message passing. The results show that the GPU retains a significant performance advantage.
APA, Harvard, Vancouver, ISO, and other styles
46

Zhu, Rui, Chang Nian Chen, and Lei Hua Qin. "An Transfer Latency Optimized Solution in GPU-Accelerated De-Duplication". Applied Mechanics and Materials 336-338 (July 2013): 2059–62. http://dx.doi.org/10.4028/www.scientific.net/amm.336-338.2059.

Full text source
Abstract:
Recently, the GPU has been introduced as an important tool in general-purpose programming due to its powerful computing capacity. In data de-duplication systems, the GPU has been used to accelerate the chunking and hashing algorithms. However, the data transfer latency between CPU and GPU memory is one of the main challenges in GPU-accelerated de-duplication. To alleviate this challenge, our solution strives to reduce the data transfer time between host and GPU memory for the parallelized content-defined chunking and hashing algorithm. In our experiments, it shows 15%–20% performance improvement over an already accelerated baseline GPU implementation in data de-duplication.
APA, Harvard, Vancouver, ISO, and other styles
47

DeFrancisco, Richard, Shenghsun Cho, Michael Ferdman, and Scott A. Smolka. "Swarm model checking on the GPU". International Journal on Software Tools for Technology Transfer 22, no. 5 (June 16, 2020): 583–99. http://dx.doi.org/10.1007/s10009-020-00576-x.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
48

Wang, Qihan, Zhen Peng, Bin Ren, Jie Chen, and Robert G. Edwards. "MemHC: An Optimized GPU Memory Management Framework for Accelerating Many-body Correlation". ACM Transactions on Architecture and Code Optimization 19, no. 2 (June 30, 2022): 1–26. http://dx.doi.org/10.1145/3506705.

Full text source
Abstract:
The many-body correlation function is a fundamental computation kernel in modern physics computing applications, e.g., Hadron Contractions in Lattice quantum chromodynamics (QCD). This kernel is both computation and memory intensive, involving a series of tensor contractions, and thus usually runs on accelerators like GPUs. Existing optimizations for many-body correlation mainly focus on individual tensor contractions (e.g., cuBLAS libraries and others). In contrast, this work discovers a new optimization dimension for many-body correlation by exploring the optimization opportunities among tensor contractions. More specifically, it targets general GPU architectures (both NVIDIA and AMD) and optimizes the memory management of many-body correlation by exploiting a set of memory allocation and communication redundancy elimination opportunities: first, GPU memory allocation redundancy: the intermediate output frequently occurs as input in subsequent calculations; second, CPU-GPU communication redundancy: although all tensors are allocated on both CPU and GPU, many of them are used (and reused) on the GPU side only, and thus many CPU/GPU communications (like those in existing Unified Memory designs) are unnecessary; third, GPU oversubscription: limited GPU memory size causes oversubscription issues, and existing memory management usually results in near-reuse data eviction, thus incurring extra CPU/GPU memory communications. Targeting these memory optimization opportunities, this article proposes MemHC, an optimized systematic GPU memory management framework that aims to accelerate the calculation of many-body correlation functions using a series of new memory reduction designs. These designs involve optimizations for GPU memory allocation, CPU/GPU memory movement, and GPU memory oversubscription, respectively. More specifically, first, MemHC employs duplication-aware management and lazy release of GPU memories to the corresponding host management for better data reusability. Second, it implements data reorganization and on-demand synchronization to eliminate redundant (or unnecessary) data transfer. Third, MemHC exploits an optimized Least Recently Used (LRU) eviction policy called Pre-Protected LRU to reduce evictions and leverage memory hits. Additionally, MemHC is portable across various platforms, including NVIDIA GPUs and AMD GPUs. The evaluation demonstrates that MemHC outperforms unified memory management by 2.18× to 10.73×. The proposed Pre-Protected LRU policy outperforms the original LRU policy by up to 1.36×.
APA, Harvard, Vancouver, ISO, and other styles
49

Zhang, Yu, Da Peng, Xiaofei Liao, Hai Jin, Haikun Liu, Lin Gu, and Bingsheng He. "LargeGraph". ACM Transactions on Architecture and Code Optimization 18, no. 4 (December 31, 2021): 1–24. http://dx.doi.org/10.1145/3477603.

Full text source
Abstract:
Many out-of-GPU-memory systems have recently been designed to support iterative processing of large-scale graphs. However, these systems still suffer from long times to converge because of inefficient propagation of active vertices' new states along graph paths. To efficiently support out-of-GPU-memory graph processing, this work designs a system, LargeGraph. Different from existing out-of-GPU-memory systems, LargeGraph proposes a dependency-aware data-driven execution approach, which can significantly accelerate active vertices' state propagation along graph paths with low data access cost and high parallelism. Specifically, according to the dependencies between the vertices, it only loads and processes the graph data associated with dependency chains originating from active vertices, for a smaller access cost. Because most active vertices frequently use a small, evolving set of paths to propagate their new states, owing to the power-law property, this small set of paths is dynamically identified, maintained, and efficiently handled on the GPU to accelerate most propagations for faster convergence, whereas the remaining graph data are handled on the CPU. For out-of-GPU-memory graph processing, LargeGraph outperforms four cutting-edge systems: Totem (5.19–11.62×), Graphie (3.02–9.41×), Garaph (2.75–8.36×), and Subway (2.45–4.15×).
APA, Harvard, Vancouver, ISO, and other styles
50

Wong, Un-Hong, Takayuki Aoki, and Hon-Cheng Wong. "Efficient magnetohydrodynamic simulations on distributed multi-GPU systems using a novel GPU Direct–MPI hybrid approach". Computer Physics Communications 185, no. 7 (July 2014): 1901–13. http://dx.doi.org/10.1016/j.cpc.2014.03.018.

Full text source
APA, Harvard, Vancouver, ISO, and other styles