Academic literature on the topic 'High-performance, graph processing, GPU'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'High-performance, graph processing, GPU.'

Journal articles on the topic "High-performance, graph processing, GPU"

1

Zhou, Chao, and Tao Zhang. "High Performance Graph Data Imputation on Multiple GPUs." Future Internet 13, no. 2 (January 31, 2021): 36. http://dx.doi.org/10.3390/fi13020036.

Abstract:
In real applications, massive data with graph structures are often incomplete due to various restrictions. Therefore, graph data imputation algorithms have been widely used in the fields of social networks, sensor networks, and MRI to solve the graph data completion problem. To keep the data relevant, a data structure is represented by a graph-tensor, in which each matrix is the vertex value of a weighted graph. The convolutional imputation algorithm has been proposed to solve the low-rank graph-tensor completion problem that some data matrices are entirely unobserved. However, this data imputation algorithm has limited application scope because it is compute-intensive and low-performance on CPU. In this paper, we propose a scheme to perform the convolutional imputation algorithm with higher time performance on GPUs (Graphics Processing Units) by exploiting multi-core GPUs of CUDA architecture. We propose optimization strategies to achieve coalesced memory access for graph Fourier transform (GFT) computation and improve the utilization of GPU SM resources for singular value decomposition (SVD) computation. Furthermore, we design a scheme to extend the GPU-optimized implementation to multiple GPUs for large-scale computing. Experimental results show that the GPU implementation is both fast and accurate. On synthetic data of varying sizes, the GPU-optimized implementation running on a single Quadro RTX6000 GPU achieves up to 60.50× speedups over the GPU-baseline implementation. The multi-GPU implementation achieves up to 1.81× speedups on two GPUs versus the GPU-optimized implementation on a single GPU. On the ego-Facebook dataset, the GPU-optimized implementation achieves up to 77.88× speedups over the GPU-baseline implementation. Meanwhile, the GPU implementation and the CPU implementation achieve similar, low recovery errors.
2

Wang, Yangzihao, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. "Gunrock: a high-performance graph processing library on the GPU." ACM SIGPLAN Notices 50, no. 8 (December 18, 2015): 265–66. http://dx.doi.org/10.1145/2858788.2688538.

3

Choudhury, Dwaipayan, Aravind Sukumaran Rajam, Ananth Kalyanaraman, and Partha Pratim Pande. "High-Performance and Energy-Efficient 3D Manycore GPU Architecture for Accelerating Graph Analytics." ACM Journal on Emerging Technologies in Computing Systems 18, no. 1 (January 31, 2022): 1–19. http://dx.doi.org/10.1145/3482880.

Abstract:
Recent advances in GPU-based manycore accelerators provide the opportunity to efficiently process large-scale graphs on chip. However, real world graphs have a diverse range of topology and connectivity patterns (e.g., degree distributions) that make the design of input-agnostic hardware architectures a challenge. Network-on-Chip (NoC)-based architectures provide a way to overcome this challenge as the architectural topology can be used to approximately model the expected traffic patterns that emerge from graph application workloads. In this paper, we first study the mix of long- and short-range traffic patterns generated on-chip using graph workloads, and subsequently use the findings to adapt the design of an optimal NoC-based architecture. In particular, by leveraging emerging three-dimensional (3D) integration technology, we propose the design of a small-world NoC (SWNoC)-enabled manycore GPU architecture, where the placement of the links connecting the streaming multiprocessors (SM) and the memory controllers (MC) follows a power-law distribution. The proposed 3D manycore GPU architecture outperforms the traditional planar (2D) counterparts in both performance and energy consumption. Moreover, by adopting a joint performance-thermal optimization strategy, we address the thermal concerns in a 3D design without noticeably compromising the achievable performance. The 3D integration technology is also leveraged to incorporate Near Data Processing (NDP) to complement the performance benefits introduced by the SWNoC architecture. As graph applications are inherently memory intensive, off-chip data movement gives rise to latency and energy overheads in the presence of external DRAM. In conventional GPU architectures, as the main memory layer is not integrated with the logic, off-chip data movement negatively impacts overall performance and energy consumption.
We demonstrate that NDP significantly reduces the overheads associated with such frequent and irregular memory accesses in graph-based applications. The proposed SWNoC-enabled NDP framework that integrates 3D memory (like Micron's HMC) with a massive number of GPU cores achieves 29.5% performance improvement and 30.03% less energy consumption on average compared to a conventional planar Mesh-based design with external DRAM.
4

Pan, Xiao Hui. "Efficient Graph Component Labeling on Hybrid CPU and GPU Platforms." Applied Mechanics and Materials 596 (July 2014): 276–79. http://dx.doi.org/10.4028/www.scientific.net/amm.596.276.

Abstract:
Graph component labeling, which is a subset of the general graph coloring problem, is a computationally expensive operation in many important applications and simulations. A number of data-parallel algorithmic variations to the component labeling problem are possible and we explore their use with general purpose graphical processing units (GPGPUs) and with the CUDA GPU programming language. We discuss implementation issues and performance results on CPUs and GPUs using CUDA. We evaluated our system with real-world graphs. We show how to consider different architectural features of the GPU and the host CPUs and achieve high performance.
5

Lü, Yashuai, Hui Guo, Libo Huang, Qi Yu, Li Shen, Nong Xiao, and Zhiying Wang. "GraphPEG." ACM Transactions on Architecture and Code Optimization 18, no. 3 (June 2021): 1–24. http://dx.doi.org/10.1145/3450440.

Abstract:
Due to massive thread-level parallelism, GPUs have become an attractive platform for accelerating large-scale data parallel computations, such as graph processing. However, achieving high performance for graph processing with GPUs is non-trivial. Processing graphs on GPUs introduces several problems, such as load imbalance, low utilization of hardware unit, and memory divergence. Although previous work has proposed several software strategies to optimize graph processing on GPUs, there are several issues beyond the capability of software techniques to address. In this article, we present GraphPEG, a graph processing engine for efficient graph processing on GPUs. Inspired by the observation that many graph algorithms have a common pattern on graph traversal, GraphPEG improves the performance of graph processing by coupling automatic edge gathering with fine-grain work distribution. GraphPEG can also adapt to various input graph datasets and simplify the software design of graph processing with hardware-assisted graph traversal. Simulation results show that, in comparison with two representative highly efficient GPU graph processing software framework Gunrock and SEP-Graph, GraphPEG improves graph processing throughput by 2.8× and 2.5× on average, and up to 7.3× and 7.0× for six graph algorithm benchmarks on six graph datasets, with marginal hardware cost.
6

Zhang, Yu, Da Peng, Xiaofei Liao, Hai Jin, Haikun Liu, Lin Gu, and Bingsheng He. "LargeGraph." ACM Transactions on Architecture and Code Optimization 18, no. 4 (December 31, 2021): 1–24. http://dx.doi.org/10.1145/3477603.

Abstract:
Many out-of-GPU-memory systems have recently been designed to support iterative processing of large-scale graphs. However, these systems still take a long time to converge because of inefficient propagation of active vertices’ new states along graph paths. To efficiently support out-of-GPU-memory graph processing, this work designs a system, LargeGraph. Different from existing out-of-GPU-memory systems, LargeGraph proposes a dependency-aware data-driven execution approach, which can significantly accelerate active vertices’ state propagations along graph paths with low data access cost and also high parallelism. Specifically, according to the dependencies between the vertices, it only loads and processes the graph data associated with dependency chains originated from active vertices for smaller access cost. Because of the power-law property, most active vertices frequently use a small evolving set of paths for their new states’ propagation; this small set of paths is dynamically identified and maintained and efficiently handled on the GPU to accelerate most propagations for faster convergence, whereas the remaining graph data are handled on the CPU. For out-of-GPU-memory graph processing, LargeGraph outperforms four cutting-edge systems: Totem (5.19–11.62×), Graphie (3.02–9.41×), Garaph (2.75–8.36×), and Subway (2.45–4.15×).
7

SOMAN, JYOTHISH, KISHORE KOTHAPALLI, and P. J. NARAYANAN. "SOME GPU ALGORITHMS FOR GRAPH CONNECTED COMPONENTS AND SPANNING TREE." Parallel Processing Letters 20, no. 04 (December 2010): 325–39. http://dx.doi.org/10.1142/s0129626410000272.

Abstract:
Graphics Processing Units (GPU) are application specific accelerators which provide high performance to cost ratio and are widely available and used, hence places them as a ubiquitous accelerator. A computing paradigm based on the same is the general purpose computing on the GPU (GPGPU) model. The GPU due to its graphics lineage is better suited for the data-parallel, data-regular algorithms. The hardware architecture of the GPU is not suitable for the data parallel but data irregular algorithms such as graph connected components and list ranking. In this paper, we present results that show how to use GPUs efficiently for graph algorithms which are known to have irregular data access patterns. We consider two fundamental graph problems: finding the connected components and finding a spanning tree. These two problems find applications in several graph theoretical problems. In this paper we arrive at efficient GPU implementations for the above two problems. The algorithms focus on minimising irregularity at both algorithmic and implementation level. Our implementation achieves a speedup of 11-16 times over a corresponding best sequential implementation.
8

Seliverstov, E. Yu. "Structural Mapping of Global Optimization Algorithms to Graphics Processing Unit Architecture." Herald of the Bauman Moscow State Technical University. Series Instrument Engineering, no. 2 (139) (June 2022): 42–59. http://dx.doi.org/10.18698/0236-3933-2022-2-42-59.

Abstract:
Graphics processing units (GPU) deliver a high execution efficiency for modern metaheuristic algorithms with a high computation complexity. It is crucial to have an optimal task mapping of the optimization algorithm to the parallel system architecture which strongly affects the efficiency of the optimization process. The paper proposes a novel task mapping algorithm of the parallel metaheuristic algorithm to the GPU architecture, describes the problem statement for the mapping of the algorithm graph model to the GPU model, and gives a formal definition of graph mapping and mapping restrictions. The algorithm graph model is a hierarchical graph model consisting of an island parallel model and a metaheuristic optimization algorithm model. A set of feasible mappings using mapping restrictions makes it possible to formalize GPU architecture and parallel model features. The structural mapping algorithm is based on cooperative solving of the optimization problem and the discrete optimization problem of the structural model mapping. The study outlines the parallel efficiency criteria which can be evaluated both experimentally and analytically to predict a model efficiency. The experimental section introduces the parallel optimization algorithm based on the proposed structural mapping algorithm. Experimental results for parallel efficiency comparison between parallel and sequential algorithms are presented and discussed.
9

Toledo, Leonel, Pedro Valero-Lara, Jeffrey S. Vetter, and Antonio J. Peña. "Towards Enhancing Coding Productivity for GPU Programming Using Static Graphs." Electronics 11, no. 9 (April 20, 2022): 1307. http://dx.doi.org/10.3390/electronics11091307.

Abstract:
The main contribution of this work is to increase the coding productivity of GPU programming by using the concept of Static Graphs. GPU capabilities have been increasing significantly in terms of performance and memory capacity. However, there are still some problems in terms of scalability and limitations to the amount of work that a GPU can perform at a time. To minimize the overhead associated with the launch of GPU kernels, as well as to maximize the use of GPU capacity, we have combined the new CUDA Graph API with the CUDA programming model (including CUDA math libraries) and the OpenACC programming model. We use as test cases two different, well-known and widely used problems in HPC and AI: the Conjugate Gradient method and the Particle Swarm Optimization. In the first test case (Conjugate Gradient) we focus on the integration of Static Graphs with CUDA. In this case, we are able to significantly outperform the NVIDIA reference code, reaching an acceleration of up to 11× thanks to a better implementation, which can benefit from the new CUDA Graph capabilities. In the second test case (Particle Swarm Optimization), we complement the OpenACC functionality with the use of CUDA Graph, achieving again accelerations of up to one order of magnitude, with average speedups ranging from 2× to 4×, and performance very close to a reference and optimized CUDA code. Our main target is to achieve a higher coding productivity model for GPU programming by using Static Graphs, which provides, in a very transparent way, a better exploitation of the GPU capacity. The combination of using Static Graphs with two of the current most important GPU programming models (CUDA and OpenACC) is able to reduce considerably the execution time w.r.t. the use of CUDA and OpenACC only, achieving accelerations of up to more than one order of magnitude. Finally, we propose an interface to incorporate the concept of Static Graphs into the OpenACC Specifications.
10

Quer, Stefano, and Andrea Calabrese. "Graph Reachability on Parallel Many-Core Architectures." Computation 8, no. 4 (December 2, 2020): 103. http://dx.doi.org/10.3390/computation8040103.

Abstract:
Many modern applications are modeled using graphs of some kind. Given a graph, reachability, that is, discovering whether there is a path between two given nodes, is a fundamental problem as well as one of the most important steps of many other algorithms. The rapid accumulation of very large graphs (up to tens of millions of vertices and edges) from a diversity of disciplines demand efficient and scalable solutions to the reachability problem. General-purpose computing has been successfully used on Graphics Processing Units (GPUs) to parallelize algorithms that present a high degree of regularity. In this paper, we extend the applicability of GPU processing to graph-based manipulation, by re-designing a simple but efficient state-of-the-art graph-labeling method, namely the GRAIL (Graph Reachability Indexing via RAndomized Interval) algorithm, to many-core CUDA-based GPUs. This algorithm firstly generates a label for each vertex of the graph, then it exploits these labels to answer reachability queries. Unfortunately, the original algorithm executes a sequence of depth-first visits which are intrinsically recursive and cannot be efficiently implemented on parallel systems. For that reason, we design an alternative approach in which a sequence of breadth-first visits substitute the original depth-first traversal to generate the labeling, and in which a high number of concurrent visits is exploited during query evaluation. The paper describes our strategy to re-design these steps, the difficulties we encountered to implement them, and the solutions adopted to overcome the main inefficiencies. To prove the validity of our approach, we compare (in terms of time and memory requirements) our GPU-based approach with the original sequential CPU-based tool. Finally, we report some hints on how to conduct further research in the area.

Dissertations / Theses on the topic "High-performance, graph processing, GPU"

1

Segura, Salvador Albert. "High-performance and energy-efficient irregular graph processing on GPU architectures." Doctoral thesis, Universitat Politècnica de Catalunya, 2021. http://hdl.handle.net/10803/671449.

Abstract:
Graph processing is an established and prominent domain that is the foundation of new emerging applications in areas such as Data Analytics and Machine Learning, empowering applications such as road navigation, social networks and automatic speech recognition. The large amount of data employed in these domains requires high throughput architectures such as GPGPU. Although the processing of large graph-based workloads exhibits a high degree of parallelism, memory access patterns tend to be highly irregular, leading to poor efficiency due to memory divergence. In order to ameliorate these issues, GPGPU graph applications perform stream compaction operations which process active nodes/edges so subsequent steps work on a compacted dataset. We propose to offload this task to the Stream Compaction Unit (SCU) hardware extension tailored to the requirements of these operations, which additionally performs pre-processing by filtering and reordering elements processed. We show that memory divergence inefficiencies prevail in GPGPU irregular graph-based applications, yet we find that it is possible to relax the strict relationship between thread and processed data to empower new optimizations. As such, we propose the Irregular accesses Reorder Unit (IRU), a novel hardware extension integrated in the GPU pipeline that reorders and filters data processed by the threads on irregular accesses improving memory coalescing. Finally, we leverage the strengths of both previous approaches to achieve synergistic improvements. We do so by proposing the IRU-enhanced SCU (ISCU), which employs the efficient pre-processing mechanisms of the IRU to improve SCU stream compaction efficiency and NoC throughput limitations due to SCU pre-processing operations. We evaluate the ISCU with state-of-the-art graph-based applications achieving a 2.2x performance improvement and 10x energy-efficiency.
2

McLaughlin, Adam Thomas. "Power-constrained performance optimization of GPU graph traversal." Thesis, Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/50209.

Abstract:
Graph traversal represents an important class of graph algorithms that is the nucleus of many large scale graph analytics applications. While improving the performance of such algorithms using GPUs has received attention, understanding and managing performance under power constraints has not yet received similar attention. This thesis first explores the power and performance characteristics of breadth first search (BFS) via measurements on a commodity GPU. We utilize this analysis to address the problem of minimizing execution time below a predefined power limit or power cap exposing key relationships between graph properties and power consumption. We modify the firmware on a commodity GPU to measure power usage and use the GPU as an experimental system to evaluate future architectural enhancements for the optimization of graph algorithms. Specifically, we propose and evaluate power management algorithms that scale i) the GPU frequency or ii) the number of active GPU compute units for a diverse set of real-world and synthetic graphs. Compared to scaling either frequency or compute units individually, our proposed schemes reduce execution time by an average of 18.64% by adjusting the configuration based on the inter- and intra-graph characteristics.
3

Lee, Dongwon. "High-performance computer system architectures for embedded computing." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/42766.

Abstract:
The main objective of this thesis is to propose new methods for designing high-performance embedded computer system architectures. To achieve the goal, three major components - multi-core processing elements (PEs), DRAM main memory systems, and on/off-chip interconnection networks - in multi-processor embedded systems are examined in each section respectively. The first section of this thesis presents architectural enhancements to graphics processing units (GPUs), one of the multi- or many-core PEs, for improving performance of embedded applications. An embedded application is first mapped onto GPUs to explore the design space, and then architectural enhancements to existing GPUs are proposed for improving throughput of the embedded application. The second section proposes high-performance buffer mapping methods, which exploit useful features of DRAM main memory systems, in DSP multi-processor systems. The memory wall problem becomes increasingly severe in multiprocessor environments because of communication and synchronization overheads. To alleviate the memory wall problem, this section exploits bank concurrency and page mode access of DRAM main memory systems for increasing the performance of multiprocessor DSP systems. The final section presents a network-centric Turbo decoder and network-centric FFT processors. In the era of multi-processor systems, an interconnection network is another performance bottleneck. To handle heavy communication traffic, this section applies a crossbar switch - one of the indirect networks - to the parallel Turbo decoder, and applies a mesh topology to the parallel FFT processors. When designing the mesh FFT processors, a very different approach is taken to improve performance; an optical fiber is used as a new interconnection medium.
4

Sedaghati, Mokhtari Naseraddin. "Performance Optimization of Memory-Bound Programs on Data Parallel Accelerators." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1452255686.

5

Hong, Changwan. "Code Optimization on GPUs." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1557123832601533.

6

Hassan, Mohamed Wasfy Abdelfattah. "Using Workload Characterization to Guide High Performance Graph Processing." Diss., Virginia Tech, 2021. http://hdl.handle.net/10919/103469.

Abstract:
Graph analytics represent an important application domain widely used in many fields such as web graphs, social networks, and Bayesian networks. The sheer size of the graph data sets combined with the irregular nature of the underlying problem pose a significant challenge for performance, scalability, and power efficiency of graph processing. With the exponential growth of the size of graph datasets, there is an ever-growing need for faster more power efficient graph solvers. The computational needs of graph processing can take advantage of the FPGAs' power efficiency and customizable architecture paired with CPUs' general purpose processing power and sophisticated cache policies. CPU-FPGA hybrid systems have the potential for supporting performant and scalable graph solvers if both devices can work coherently to make up for each other's deficits. This study aims to optimize graph processing on heterogeneous systems through interdisciplinary research that would impact both the graph processing community, and the FPGA/heterogeneous computing community. On one hand, this research explores how to harness the computational power of FPGAs and how to cooperatively work in a CPU-FPGA hybrid system. On the other hand, graph applications have a data-driven execution profile; hence, this study explores how to take advantage of information about the graph input properties to optimize the performance of graph solvers. The introduction of High Level Synthesis (HLS) tools allowed FPGAs to be accessible to the masses but they are yet to be performant and efficient, especially in the case of irregular graph applications. Therefore, this dissertation proposes automated frameworks to help integrate FPGAs into mainstream computing. This is achieved by first exploring the optimization space of HLS-FPGA designs, then devising a domain-specific performance model that is used to build an automated framework to guide the optimization process. 
Moreover, the architectural strengths of both CPUs and FPGAs are exploited to maximize graph processing performance via an automated framework for workload distribution on the available hardware resources.
Doctor of Philosophy
Graph processing is a very important application domain, which is emphasized by the fact that many real-world problems can be represented as graph applications. For instance, looking at the internet, web pages can be represented as the graph vertices while hyper links between them represent the edges. Analyzing these types of graphs is used for web search engines, ranking websites, and network analysis among other uses. However, graph processing is computationally demanding and very challenging to optimize. This is due to the irregular nature of graph problems, which can be characterized by frequent indirect memory accesses. Such a memory access pattern is dependent on the data input and impossible to predict, which renders CPUs' sophisticated caching policies useless to performance. With the rise of heterogeneous computing that enabled using hardware accelerators, a new research area was born, attempting to maximize performance by utilizing the available hardware devices in a heterogeneous ecosystem. This dissertation aims to improve the efficiency of utilizing such heterogeneous systems when targeting graph applications. More specifically, this research focuses on the collaboration of CPUs and FPGAs (Field Programmable Gate Arrays) in a CPU-FPGA hybrid system. Innovative ideas are presented to exploit the strengths of each available device in such a heterogeneous system, as well as addressing some of the inherent challenges of graph processing. Automated frameworks are introduced to efficiently utilize the FPGA devices, in addition to distributing and scheduling the workload across multiple devices to maximize the performance of graph applications.
7

Smith, Michael Shawn. "Performance Analysis of Hybrid CPU/GPU Environments." PDXScholar, 2010. https://pdxscholar.library.pdx.edu/open_access_etds/300.

Abstract:
We present two metrics to assist the performance analyst to gain a unified view of application performance in a hybrid environment: GPU Computation Percentage and GPU Load Balance. We analyze the metrics using a matrix multiplication benchmark suite and a real scientific application. We also extend an experiment management system to support GPU performance data and to calculate and store our GPU Computation Percentage and GPU Load Balance metrics.
8

Cyrus, Sam. "Fast Computation on Processing Data Warehousing Queries on GPU Devices." Scholar Commons, 2016. http://scholarcommons.usf.edu/etd/6214.

Abstract:
Current database management systems use Graphic Processing Units (GPUs) as dedicated accelerators to process each individual query, which results in underutilization of GPU. When a single query data warehousing workload was run on an open source GPU query engine, the utilization of main GPU resources was found to be less than 25%. The low utilization then leads to low system throughput. To resolve this problem, this paper suggests a way to transfer all of the desired data into the global memory of GPU and keep it until all queries are executed as one batch. The PCIe transfer time from CPU to GPU is minimized, which results in better performance in less time of overall query processing. The execution time was improved by up to 40% when running multiple queries, compared to dedicated processing.
9

Madduri, Kamesh. "A high-performance framework for analyzing massive complex networks." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/24712.

Abstract:
Thesis (Ph.D.)--Computing, Georgia Institute of Technology, 2009.
Committee Chair: Bader, David; Committee Member: Berry, Jonathan; Committee Member: Fujimoto, Richard; Committee Member: Saini, Subhash; Committee Member: Vuduc, Richard
10

Hordemann, Glen J. "Exploring High Performance SQL Databases with Graphics Processing Units." Bowling Green State University / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1380125703.


Books on the topic "High-performance, graph processing, GPU"

1

Pharr, Matt, and Randima Fernando, eds. GPU gems: Programming techniques for high-performance graphics and general-purpose computation. Upper Saddle River, NJ: Addison-Wesley, 2005.

2

Pharr, Matt, and Randima Fernando, eds. GPU gems 2: Programming techniques for high-performance graphics and general-purpose computation. Upper Saddle River, NJ: Addison-Wesley, 2005.

3

Yuen, David A. GPU Solutions to Multi-scale Problems in Science and Engineering. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013.

4

Fernando, Randima, and Matt Pharr. GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation (Gpu Gems). Addison-Wesley Professional, 2005.

5

GPU Solutions to Multiscale Problems in Science and Engineering. Springer, 2012.

6

Ge, Wei, Lennart Johnsson, Long Wang, David A. Yuen, Xuebin Chi, and Yaolin Shi. GPU Solutions to Multi-scale Problems in Science and Engineering. Springer, 2016.


Book chapters on the topic "High-performance, graph processing, GPU"

1

Kaczmarski, Krzysztof, Piotr Przymus, and Paweł Rzążewski. "Improving High-Performance GPU Graph Traversal with Compression." In Advances in Intelligent Systems and Computing, 201–14. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-10518-5_16.

2

Sørensen, Hans Henrik Brandenborg. "High-Performance Matrix-Vector Multiplication on the GPU." In Euro-Par 2011: Parallel Processing Workshops, 377–86. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-29737-3_42.

3

Sengupta, Dipanjan, Narayanan Sundaram, Xia Zhu, Theodore L. Willke, Jeffrey Young, Matthew Wolf, and Karsten Schwan. "GraphIn: An Online High Performance Incremental Graph Processing Framework." In Euro-Par 2016: Parallel Processing, 319–33. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-43659-3_24.

4

Cabarle, Francis George, Henry Adorna, Miguel A. Martínez-del-Amor, and Mario J. Pérez-Jiménez. "Spiking Neural P System Simulations on a High Performance GPU Platform." In Algorithms and Architectures for Parallel Processing, 99–108. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-24669-2_10.

5

Potluri, Sasanka, Alireza Fasih, Laxminand Kishore Vutukuru, Fadi Al Machot, and Kyandoghere Kyamakya. "CNN Based High Performance Computing for Real Time Image Processing on GPU." In Autonomous Systems: Developments and Trends, 255–66. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-24806-1_20.

6

Rubinpur, Yaniv, and Sivan Toledo. "High-performance GPU and CPU Signal Processing for a Reverse-GPS Wildlife Tracking System." In Euro-Par 2020: Parallel Processing Workshops, 96–108. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-71593-9_8.

7

Chu, Tianshu, Jian Dai, Depei Qian, Weiwei Fang, and Yi Liu. "A Novel Scheme for High Performance Finite-Difference Time-Domain (FDTD) Computations Based on GPU." In Algorithms and Architectures for Parallel Processing, 441–53. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-13119-6_38.

8

Chang, Dong, Yanfeng Zhang, and Ge Yu. "MaiterStore: A Hot-Aware, High-Performance Key-Value Store for Graph Processing." In Database Systems for Advanced Applications, 117–31. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014. http://dx.doi.org/10.1007/978-3-662-43984-5_9.

9

Blas, Javier Garcia, Manuel F. Dolz, J. Daniel Garcia, Jesus Carretero, Alessandro Daducci, Yasser Aleman, and Erick Jorge Canales-Rodriguez. "Porting Matlab Applications to High-Performance C++ Codes: CPU/GPU-Accelerated Spherical Deconvolution of Diffusion MRI Data." In Algorithms and Architectures for Parallel Processing, 630–43. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-49583-5_49.

10

de Melo Menezes, Breno Augusto, Luis Filipe de Araujo Pessoa, Herbert Kuchen, and Fernando Buarque De Lima Neto. "Parallelization Strategies for GPU-Based Ant Colony Optimization Applied to TSP." In Parallel Computing: Technology Trends. IOS Press, 2020. http://dx.doi.org/10.3233/apc200057.

Abstract:
Graphics Processing Units (GPUs) have been widely used to speed up the execution of various meta-heuristics for solving hard optimization problems. In the case of Ant Colony Optimization (ACO), many implementations with very distinct parallelization strategies and speedups have been already proposed and evaluated on the Traveling Salesman Problem (TSP). On the one hand, a coarse-grained strategy applies the parallelization on the ant-level and is the most intuitive and common strategy found in the literature. On the other hand, a fine-grained strategy also parallelizes the internal work of each ant, creating a higher degree of parallelization. Although many parallel implementations of ACO exist, the influence of the algorithm parameters (e.g., the number of ants) and the problem configurations (e.g., the number of nodes in the graph) on the performance of coarse- and fine-grained parallelization strategies has not been investigated so far. Thus, this work performs a series of experiments and provides speedup analyses of two distinct ACO parallelization strategies compared to a sequential implementation for different TSP configurations and colony sizes. The results show that the considered factors can significantly impact the performance of parallelization strategies, particularly for larger problems. Furthermore, we provide a recommendation for the parallelization strategy and colony size to use for a given problem size and some insights for the development of other GPU-based meta-heuristics.

Conference papers on the topic "High-performance, graph processing, GPU"

1

Wang, Yangzihao, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. "Gunrock: a high-performance graph processing library on the GPU." In PPoPP '15: 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. New York, NY, USA: ACM, 2015. http://dx.doi.org/10.1145/2688500.2688538.

2

Lai, Siyan, Guangda Lai, Guojun Shen, Jing Jin, and Xiaola Lin. "GPregel: A GPU-Based Parallel Graph Processing Model." In 2015 IEEE 17th International Conference on High-Performance Computing and Communications; 2015 IEEE 7th International Symposium on Cyberspace Safety and Security; and 2015 IEEE 12th International Conference on Embedded Software and Systems. IEEE, 2015. http://dx.doi.org/10.1109/hpcc-css-icess.2015.184.

3

Heldens, Stijn, Ana Lucia Varbanescu, and Alexandru Iosup. "Dynamic Load Balancing for High-Performance Graph Processing on Hybrid CPU-GPU Platforms." In 2016 6th Workshop on Irregular Applications: Architecture and Algorithms (IA3). IEEE, 2016. http://dx.doi.org/10.1109/ia3.2016.016.

4

Guo, Yong, Ana Lucia Varbanescu, Alexandru Iosup, and Dick Epema. "An Empirical Performance Evaluation of GPU-Enabled Graph-Processing Systems." In 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). IEEE, 2015. http://dx.doi.org/10.1109/ccgrid.2015.20.

5

Tang, Yu-Hang, Oguz Selvitopi, Doru Thom Popovici, and Aydin Buluc. "A High-Throughput Solver for Marginalized Graph Kernels on GPU." In 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 2020. http://dx.doi.org/10.1109/ipdps47924.2020.00080.

6

Bulavintsev, Vadim, and Dmitry Zhdanov. "Method for Adaptation of Algorithms to GPU Architecture." In 31st International Conference on Computer Graphics and Vision. Keldysh Institute of Applied Mathematics, 2021. http://dx.doi.org/10.20948/graphicon-2021-3027-930-941.

Abstract:
We propose a generalized method for adapting and optimizing algorithms for efficient execution on modern graphics processing units (GPUs). The method consists of several steps. First, build a control flow graph (CFG) of the algorithm. Next, transform the CFG into a tree of loops and merge non-parallelizable loops into parallelizable ones. Finally, map the resulting loop tree to the tree of GPU computational units, unrolling the algorithm’s loops as necessary for the match. The mapping should be performed bottom-up, from the lowest GPU architecture levels to the highest ones, to minimize off-chip memory access and maximize register file usage. The method provides the programmer with a convenient and robust mental framework and strategy for GPU code optimization. We demonstrate the method by adapting to a GPU the DPLL backtracking search algorithm for solving the Boolean satisfiability problem (SAT). The resulting GPU version of DPLL outperforms the CPU version in raw tree search performance sixfold for regular Boolean satisfiability problems and twofold for irregular ones.
7

Bisson, Mauro, and Massimiliano Fatica. "Static graph challenge on GPU." In 2017 IEEE High-Performance Extreme Computing Conference (HPEC). IEEE, 2017. http://dx.doi.org/10.1109/hpec.2017.8091034.

8

Goodarzi, Bahareh, Farzad Khorasani, Vivek Sarkar, and Dhrubajyoti Goswami. "High Performance Multilevel Graph Partitioning on GPU." In 2019 International Conference on High Performance Computing & Simulation (HPCS). IEEE, 2019. http://dx.doi.org/10.1109/hpcs48598.2019.9188120.

9

Bisson, Mauro, and Massimiliano Fatica. "Update on Static Graph Challenge on GPU." In 2018 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 2018. http://dx.doi.org/10.1109/hpec.2018.8547514.

10

Yang, Haoduo, Huayou Su, Mei Wen, and Chunyuan Zhang. "HPGA: A High-Performance Graph Analytics Framework on the GPU." In 2018 International Conference on Information Systems and Computer Aided Education (ICISCAE). IEEE, 2018. http://dx.doi.org/10.1109/iciscae.2018.8666877.
