Academic literature on the topic 'SpMV Multiplication'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'SpMV Multiplication.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "SpMV Multiplication"

1

Giannoula, Christina, Ivan Fernandez, Juan Gómez-Luna, Nectarios Koziris, Georgios Goumas, and Onur Mutlu. "Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures." ACM SIGMETRICS Performance Evaluation Review 50, no. 1 (June 20, 2022): 33–34. http://dx.doi.org/10.1145/3547353.3522661.

Full text
Abstract:
Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures, after decades of research efforts. Near-bank PIM architectures place simple cores close to DRAM banks. Recent research demonstrates that they can yield significant performance and energy improvements in parallel applications by alleviating data access costs. Real PIM systems can provide high levels of parallelism, large aggregate memory bandwidth and low memory access latency, thereby being a good fit to accelerate the Sparse Matrix Vector Multiplication (SpMV) kernel. SpMV has been characterized as one of the most significant and thoroughly studied scientific computation kernels. It is primarily a memory-bound kernel with intensive memory accesses due to its algorithmic nature, the compressed matrix format used, and the sparsity patterns of the input matrices given. This paper provides the first comprehensive analysis of SpMV on a real-world PIM architecture, and presents SparseP, the first SpMV library for real PIM architectures. We make two key contributions. First, we design efficient SpMV algorithms to accelerate the SpMV kernel in current and future PIM systems, while covering a wide variety of sparse matrices with diverse sparsity patterns. Second, we provide the first comprehensive analysis of SpMV on a real PIM architecture. Specifically, we conduct our rigorous experimental analysis of SpMV kernels in the UPMEM PIM system, the first publicly-available real-world PIM architecture. Our extensive evaluation provides new insights and recommendations for software designers and hardware architects to efficiently accelerate the SpMV kernel on real PIM systems. For more information about our thorough characterization of the SpMV PIM execution, results, insights and the open-source SparseP software package [21], we refer the reader to the full version of the paper [3, 4]. The SparseP software package is publicly and freely available at https://github.com/CMU-SAFARI/SparseP.
APA, Harvard, Vancouver, ISO, and other styles
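The memory-bound behaviour described in this abstract is visible in the basic CSR kernel itself. Below is a minimal sequential reference sketch, assuming the usual CSR arrays (row_ptr, col_idx, val); it is a generic illustration, not code from the SparseP library.

```cuda
// Generic sequential CSR SpMV reference (host code). Each nonzero costs one
// multiply-add (2 flops) but requires reading a matrix value, a column index,
// and an irregularly indexed element of x, which is why SpMV is memory-bound.
void spmv_csr(int n_rows,
              const int    *row_ptr,   // length n_rows + 1
              const int    *col_idx,   // length nnz
              const double *val,       // length nnz
              const double *x,         // dense input vector
              double       *y)         // dense output vector
{
    for (int i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
            sum += val[j] * x[col_idx[j]];   // irregular, sparsity-dependent read
        y[i] = sum;
    }
}
```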
2

He, Guixia, and Jiaquan Gao. "A Novel CSR-Based Sparse Matrix-Vector Multiplication on GPUs." Mathematical Problems in Engineering 2016 (2016): 1–12. http://dx.doi.org/10.1155/2016/8471283.

Full text
Abstract:
Sparse matrix-vector multiplication (SpMV) is an important operation in scientific computations. Compressed sparse row (CSR) is the most frequently used format to store sparse matrices. However, CSR-based SpMVs on graphic processing units (GPUs), for example, CSR-scalar and CSR-vector, usually have poor performance due to irregular memory access patterns. This motivates us to propose a perfect CSR-based SpMV on the GPU that is called PCSR. PCSR involves two kernels and accesses CSR arrays in a fully coalesced manner by introducing a middle array, which greatly alleviates the deficiencies of CSR-scalar (rare coalescing) and CSR-vector (partial coalescing). Test results on a single C2050 GPU show that PCSR fully outperforms CSR-scalar, CSR-vector, and CSRMV and HYBMV in the vendor-tuned CUSPARSE library and is comparable with a recently proposed CSR-based algorithm, CSR-Adaptive. Furthermore, we extend PCSR on a single GPU to multiple GPUs. Experimental results on four C2050 GPUs show that, no matter whether the communication between GPUs is considered or not, PCSR on multiple GPUs achieves good performance and has high parallel efficiency.
APA, Harvard, Vancouver, ISO, and other styles
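For context, the CSR-scalar baseline that PCSR is contrasted against assigns one thread per row, so neighbouring threads in a warp read val and col_idx from distant addresses and coalescing is rare. The kernel below is the textbook version of that baseline, not one of the PCSR kernels.

```cuda
// CSR-scalar: one thread per row. Lanes of a warp traverse different rows,
// so their reads of val/col_idx are scattered and rarely coalesced.
__global__ void spmv_csr_scalar(int n_rows,
                                const int *row_ptr, const int *col_idx,
                                const float *val, const float *x, float *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        float sum = 0.0f;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += val[j] * x[col_idx[j]];
        y[row] = sum;
    }
}
```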
3

Gao, Jiaquan, Yuanshen Zhou, and Kesong Wu. "A Novel Multi-GPU Parallel Optimization Model for The Sparse Matrix-Vector Multiplication." Parallel Processing Letters 26, no. 04 (December 2016): 1640001. http://dx.doi.org/10.1142/s0129626416400016.

Full text
Abstract:
Accelerating the sparse matrix-vector multiplication (SpMV) on the graphics processing units (GPUs) has attracted considerable attention recently. We observe that on a specific multiple-GPU platform, the SpMV performance can usually be greatly improved when a matrix is partitioned into several blocks according to a predetermined rule and each block is assigned to a GPU with an appropriate storage format. This motivates us to propose a novel multi-GPU parallel SpMV optimization model. Our model involves two stages. In the first stage, a simple rule is defined to divide any given matrix among multiple GPUs, and then a performance model, which is independent of the problems and dependent on the resources of devices, is proposed to accurately predict the execution time of SpMV kernels. Using these models, we construct in the second stage an optimal multi-GPU parallel SpMV algorithm that is automatically and rapidly generated for the platform for any problem. Given that our model for SpMV is general, independent of the problems, and dependent on the resources of devices, this model is constructed only once for each type of GPU. The experiments validate the high efficiency of our proposed model.
APA, Harvard, Vancouver, ISO, and other styles
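One simple instance of the kind of partitioning rule the abstract mentions is to cut the rows into contiguous blocks with roughly equal nonzero counts, one block per GPU. The helper below is a hypothetical illustration of that idea only; the paper's actual rule, per-block storage-format choice, and performance model are more elaborate.

```cuda
#include <vector>

// Hypothetical helper: split the rows of a CSR matrix into one contiguous
// block per GPU so each block carries roughly the same number of nonzeros.
// Rows split[g] .. split[g + 1] - 1 are assigned to GPU g.
std::vector<int> partition_rows_by_nnz(const int *row_ptr, int n_rows, int n_gpus)
{
    std::vector<int> split(n_gpus + 1, 0);
    const int nnz    = row_ptr[n_rows];
    const int target = (nnz + n_gpus - 1) / n_gpus;   // ~nonzeros per GPU

    int g = 1;
    for (int i = 1; i <= n_rows && g < n_gpus; ++i)
        if (row_ptr[i] >= g * target)
            split[g++] = i;                            // first row of the next block

    for (int k = g; k <= n_gpus; ++k)                  // close any remaining blocks
        split[k] = n_rows;
    return split;
}
```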
4

AlAhmadi, Sarah, Thaha Mohammed, Aiiad Albeshri, Iyad Katib, and Rashid Mehmood. "Performance Analysis of Sparse Matrix-Vector Multiplication (SpMV) on Graphics Processing Units (GPUs)." Electronics 9, no. 10 (October 13, 2020): 1675. http://dx.doi.org/10.3390/electronics9101675.

Full text
Abstract:
Graphics processing units (GPUs) have delivered a remarkable performance for a variety of high performance computing (HPC) applications through massive parallelism. One such application is sparse matrix-vector (SpMV) computation, which is central to many scientific, engineering, and other applications including machine learning. No single SpMV storage or computation scheme provides consistent and sufficiently high performance for all matrices due to their varying sparsity patterns. An extensive literature review reveals that the performance of SpMV techniques on GPUs has not been studied in sufficient detail. In this paper, we provide a detailed analysis of SpMV performance on GPUs using four notable sparse matrix storage schemes (compressed sparse row (CSR), ELLPACK (ELL), hybrid ELL/COO (HYB), and compressed sparse row 5 (CSR5)), five performance metrics (execution time, giga floating point operations per second (GFLOPS), achieved occupancy, instructions per warp, and warp execution efficiency), five matrix sparsity features (nnz, anpr, nprvariance, maxnpr, and distavg), and 17 sparse matrices from 10 application domains (chemical simulations, computational fluid dynamics (CFD), electromagnetics, linear programming, economics, etc.). Subsequently, based on the deeper insights gained through the detailed performance analysis, we propose a technique called the heterogeneous CPU–GPU Hybrid (HCGHYB) scheme. It utilizes both the CPU and GPU in parallel and provides better performance over the HYB format by an average speedup of 1.7x. Heterogeneous computing is an important direction for SpMV and other application areas. Moreover, to the best of our knowledge, this is the first work where the SpMV performance on GPUs has been discussed in such depth. We believe that this work on SpMV performance analysis and the heterogeneous scheme will open up many new directions and improvements for the SpMV computing field in the future.
APA, Harvard, Vancouver, ISO, and other styles
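Several of the sparsity features listed in the abstract can be computed directly from the CSR row pointer. The sketch below assumes that nnz, anpr, nprvariance, and maxnpr denote the total nonzero count and the mean, variance, and maximum of nonzeros per row; the paper's exact definitions (and the distavg feature) may differ.

```cuda
// Row-based sparsity features derived from the CSR row pointer (host code).
struct RowFeatures {
    long   nnz;            // total nonzeros
    double anpr;           // average nonzeros per row
    double npr_variance;   // variance of nonzeros per row
    int    maxnpr;         // maximum nonzeros in any row
};

RowFeatures sparsity_features(const int *row_ptr, int n_rows)
{
    RowFeatures f{};
    f.nnz  = row_ptr[n_rows];
    f.anpr = (double)f.nnz / n_rows;
    double acc = 0.0;
    for (int i = 0; i < n_rows; ++i) {
        int npr = row_ptr[i + 1] - row_ptr[i];
        if (npr > f.maxnpr) f.maxnpr = npr;
        acc += (npr - f.anpr) * (npr - f.anpr);
    }
    f.npr_variance = acc / n_rows;
    return f;
}
```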
5

Liu, Sheng, Yasong Cao, and Shuwei Sun. "Mapping and Optimization Method of SpMV on Multi-DSP Accelerator." Electronics 11, no. 22 (November 11, 2022): 3699. http://dx.doi.org/10.3390/electronics11223699.

Full text
Abstract:
Sparse matrix-vector multiplication (SpMV) solves the product of a sparse matrix and dense vector, and the sparseness of a sparse matrix is often more than 90%. Usually, the sparse matrix is compressed to save storage resources, but this causes irregular access to dense vectors in the algorithm, which takes a lot of time and degrades the SpMV performance of the system. In this study, we design a dedicated channel in the DMA to implement an indirect memory access process to speed up the SpMV operation. On this basis, we propose six SpMV algorithm schemes and map them to optimize the performance of SpMV. The results show that the M processor’s SpMV performance reached 6.88 GFLOPS. Besides, the average performance of the HPCG benchmark is 2.8 GFLOPS.
APA, Harvard, Vancouver, ISO, and other styles
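The irregular access that the dedicated DMA channel implements in hardware is essentially an index-driven gather from the dense vector. The snippet below sketches that gather in software; gather_vector is a hypothetical helper used only to illustrate the access pattern, not part of the paper's design.

```cuda
// Gather the dense-vector elements matching each stored nonzero. After this
// step, the per-row multiply-accumulate becomes a regular streaming pass over
// val[] and x_gathered[].
void gather_vector(const int *col_idx, const double *x,
                   double *x_gathered, int nnz)
{
    for (int j = 0; j < nnz; ++j)
        x_gathered[j] = x[col_idx[j]];   // indirect (index-driven) read
}
```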
6

Anzt, Hartwig, Stanimire Tomov, and Jack Dongarra. "On the performance and energy efficiency of sparse linear algebra on GPUs." International Journal of High Performance Computing Applications 31, no. 5 (October 5, 2016): 375–90. http://dx.doi.org/10.1177/1094342016672081.

Full text
Abstract:
In this paper we unveil some performance and energy efficiency frontiers for sparse computations on GPU-based supercomputers. We compare the resource efficiency of different sparse matrix–vector products (SpMV) taken from libraries such as cuSPARSE and MAGMA for GPU and Intel’s MKL for multicore CPUs, and develop a GPU sparse matrix–matrix product (SpMM) implementation that handles the simultaneous multiplication of a sparse matrix with a set of vectors in block-wise fashion. While a typical sparse computation such as the SpMV reaches only a fraction of the peak of current GPUs, we show that the SpMM succeeds in exceeding the memory-bound limitations of the SpMV. We integrate this kernel into a GPU-accelerated Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) eigensolver. LOBPCG is chosen as a benchmark algorithm for this study as it combines an interesting mix of sparse and dense linear algebra operations that is typical for complex simulation applications, and allows for hardware-aware optimizations. In a detailed analysis we compare the performance and energy efficiency against a multi-threaded CPU counterpart. The reported performance and energy efficiency results are indicative of sparse computations on supercomputers.
APA, Harvard, Vancouver, ISO, and other styles
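The reason SpMM can exceed the memory-bound ceiling of SpMV, as the abstract notes, is that each loaded nonzero is reused across the whole block of vectors. A generic CSR SpMM loop (not the MAGMA kernel evaluated in the paper) makes that reuse explicit.

```cuda
// Multiply a CSR matrix by k dense vectors at once (host sketch).
// X and Y are stored row-major: X[col * k + v], Y[row * k + v].
void spmm_csr(int n_rows, int k,
              const int *row_ptr, const int *col_idx, const double *val,
              const double *X, double *Y)
{
    for (int i = 0; i < n_rows; ++i) {
        for (int v = 0; v < k; ++v) Y[i * k + v] = 0.0;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j) {
            double a = val[j];                       // loaded once ...
            const double *xrow = &X[(long)col_idx[j] * k];
            for (int v = 0; v < k; ++v)
                Y[i * k + v] += a * xrow[v];         // ... reused k times
        }
    }
}
```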
7

Liu, Jie. "Accuracy Controllable SpMV Optimization on GPU." Journal of Physics: Conference Series 2363, no. 1 (November 1, 2022): 012008. http://dx.doi.org/10.1088/1742-6596/2363/1/012008.

Full text
Abstract:
Sparse matrix vector multiplication (SpMV) is a key kernel widely used in a variety of fields, and mixed-precision calculation brings opportunities to SpMV optimization. Researchers have proposed to store nonzero elements in the interval (-1, 1) in single precision and calculate SpMV in mixed precision. Though it leads to high performance, it also brings loss of accuracy. This paper proposes an accuracy controllable optimization method for SpMV. By limiting the error caused by converting double-precision floating-point numbers in the interval (-1, 1) into single-precision format, the calculation accuracy of mixed-precision SpMV is effectively improved. We tested sparse matrices from the SuiteSparse Matrix Collection on Tesla V100. Compared with the existing mixed-precision MpSpMV kernel, the mixed-precision SpMV proposed in this paper achieves an accuracy improvement.
APA, Harvard, Vancouver, ISO, and other styles
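A minimal sketch of the splitting step described in the abstract, assuming that nonzeros with magnitude below one are stored in single precision while the rest remain in double precision; the error-bounding rule that is the paper's actual contribution is not reproduced here. During SpMV, both parts would then be accumulated into a double-precision result.

```cuda
#include <cmath>
#include <vector>

// Split the nonzero values into a single-precision part (|a| < 1) and a
// double-precision part (|a| >= 1); is_single records the choice per nonzero.
void split_by_magnitude(const std::vector<double> &val,
                        std::vector<float>  &val_single,
                        std::vector<double> &val_double,
                        std::vector<char>   &is_single)
{
    for (double a : val) {
        if (std::fabs(a) < 1.0) {
            val_single.push_back((float)a);
            is_single.push_back(1);
        } else {
            val_double.push_back(a);
            is_single.push_back(0);
        }
    }
}
```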
8

Zeng, Guangsen, and Yi Zou. "Leveraging Memory Copy Overlap for Efficient Sparse Matrix-Vector Multiplication on GPUs." Electronics 12, no. 17 (August 31, 2023): 3687. http://dx.doi.org/10.3390/electronics12173687.

Full text
Abstract:
Sparse matrix-vector multiplication (SpMV) is central to many scientific, engineering, and other applications, including machine learning. Compressed Sparse Row (CSR) is a widely used sparse matrix storage format. SpMV using the CSR format on GPU computing platforms is widely studied, where the access behavior of the GPU is often the performance bottleneck. NVIDIA's recent Ampere GPU architecture provides a new asynchronous memory copy instruction, memcpy_async, for more efficient data movement in shared memory. Leveraging the capability of this new memcpy_async instruction, we first propose the CSR-Partial-Overlap to carefully overlap the data copy from global memory to shared memory and computation, allowing us to take full advantage of the data transfer time. In addition, we design the dynamic batch partition and the dynamic threads distribution to achieve effective load balancing, avoid the overhead of fixing up partial sums, and improve thread utilization. Furthermore, we propose the CSR-Full-Overlap based on the CSR-Partial-Overlap, which takes the overlap of data transfer from host to device and SpMV kernel execution into account as well. The CSR-Full-Overlap unifies the two major overlaps in SpMV and hides the computation as much as possible in the two important access behaviors of the GPU. This allows CSR-Full-Overlap to achieve the best performance gains from both overlaps. As far as we know, this paper is the first in-depth study of how memcpy_async can be potentially applied to help accelerate SpMV computation on GPU platforms. We compare CSR-Full-Overlap to the current state-of-the-art cuSPARSE, where our experimental results show an average 2.03x performance gain and up to 2.67x performance gain.
APA, Harvard, Vancouver, ISO, and other styles
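The host-to-device half of the overlap (the part CSR-Full-Overlap adds on top of the in-kernel memcpy_async pipeline) can be sketched with the standard CUDA runtime API and two streams. The chunked kernel and the chunk metadata below are hypothetical stand-ins, not the authors' implementation; only the value array is streamed for brevity, and h_val should be pinned (cudaMallocHost) for the copies to be truly asynchronous.

```cuda
#include <cuda_runtime.h>

// Hypothetical per-chunk kernel: plain CSR-scalar SpMV over rows
// [first_row, last_row).
__global__ void spmv_chunk(int first_row, int last_row,
                           const int *row_ptr, const int *col_idx,
                           const float *val, const float *x, float *y)
{
    int row = first_row + blockIdx.x * blockDim.x + threadIdx.x;
    if (row < last_row) {
        float sum = 0.0f;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += val[j] * x[col_idx[j]];
        y[row] = sum;
    }
}

// Copy the value array chunk by chunk on one stream while the SpMV kernel for
// the previously copied chunk runs on another stream, so transfer and compute
// overlap. chunk_row / chunk_val_off hold n_chunks + 1 boundaries each.
void spmv_overlapped(int n_chunks, const int *chunk_row, const long *chunk_val_off,
                     const float *h_val, float *d_val,
                     const int *d_row_ptr, const int *d_col_idx,
                     const float *d_x, float *d_y)
{
    cudaStream_t copy_s, comp_s;
    cudaStreamCreate(&copy_s);
    cudaStreamCreate(&comp_s);
    for (int c = 0; c < n_chunks; ++c) {
        long   off   = chunk_val_off[c];
        size_t bytes = (chunk_val_off[c + 1] - off) * sizeof(float);
        cudaMemcpyAsync(d_val + off, h_val + off, bytes,
                        cudaMemcpyHostToDevice, copy_s);
        cudaStreamSynchronize(copy_s);   // chunk c is resident; the kernel for
                                         // chunk c - 1 may still be running
        int rows = chunk_row[c + 1] - chunk_row[c];
        spmv_chunk<<<(rows + 255) / 256, 256, 0, comp_s>>>(
            chunk_row[c], chunk_row[c + 1], d_row_ptr, d_col_idx, d_val, d_x, d_y);
    }
    cudaStreamSynchronize(comp_s);
    cudaStreamDestroy(copy_s);
    cudaStreamDestroy(comp_s);
}
```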
9

Gao, Jiaquan, Panpan Qi, and Guixia He. "Efficient CSR-Based Sparse Matrix-Vector Multiplication on GPU." Mathematical Problems in Engineering 2016 (2016): 1–14. http://dx.doi.org/10.1155/2016/4596943.

Full text
Abstract:
Sparse matrix-vector multiplication (SpMV) is an important operation in computational science and needs to be accelerated because it often represents the dominant cost in many widely used iterative methods and eigenvalue problems. We achieve this objective by proposing a novel SpMV algorithm based on the compressed sparse row (CSR) format on the GPU. Our method dynamically assigns different numbers of rows to each thread block and executes different optimization implementations on the basis of the number of rows it involves for each block. The process of accesses to the CSR arrays is fully coalesced, and the GPU’s DRAM bandwidth is efficiently utilized by loading data into the shared memory, which alleviates the bottleneck of many existing CSR-based algorithms (i.e., CSR-scalar and CSR-vector). Test results on C2050 and K20c GPUs show that our method outperforms a perfect-CSR algorithm that inspires our work, the vendor-tuned CUSPARSE V6.5 and CUSP V0.5.1, and three popular algorithms clSpMV, CSR5, and CSR-Adaptive.
APA, Harvard, Vancouver, ISO, and other styles
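The CSR-vector baseline mentioned alongside CSR-scalar assigns a whole warp to each row, so the lanes read val and col_idx contiguously and reduce their partial sums within the warp. The kernel below is the textbook variant of that idea (assuming the block size is a multiple of 32); the paper's own kernels additionally vary the number of rows per thread block and stage data in shared memory.

```cuda
// CSR-vector style SpMV: one warp per row, lanes stride over the row's
// nonzeros, then a shuffle-based warp reduction combines the partial sums.
__global__ void spmv_csr_vector(int n_rows,
                                const int *row_ptr, const int *col_idx,
                                const float *val, const float *x, float *y)
{
    int warp_id = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
    int lane    = threadIdx.x & 31;
    if (warp_id < n_rows) {
        float sum = 0.0f;
        for (int j = row_ptr[warp_id] + lane; j < row_ptr[warp_id + 1]; j += 32)
            sum += val[j] * x[col_idx[j]];
        for (int off = 16; off > 0; off >>= 1)           // warp-level reduction
            sum += __shfl_down_sync(0xffffffff, sum, off);
        if (lane == 0) y[warp_id] = sum;
    }
}
```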
10

Chen, Shizhao, Jianbin Fang, Chuanfu Xu, and Zheng Wang. "Adaptive Hybrid Storage Format for Sparse Matrix–Vector Multiplication on Multi-Core SIMD CPUs." Applied Sciences 12, no. 19 (September 29, 2022): 9812. http://dx.doi.org/10.3390/app12199812.

Full text
Abstract:
Optimizing sparse matrix–vector multiplication (SpMV) is challenging due to the non-uniform distribution of the non-zero elements of the sparse matrix. The best-performing SpMV format changes depending on the input matrix and the underlying architecture, and there is no “one-size-fits-all” format. A hybrid scheme combining multiple SpMV storage formats allows one to choose an appropriate format to use for the target matrix and hardware. However, existing hybrid approaches are inadequate for utilizing the SIMD cores of modern multi-core CPUs, and it remains unclear how to best mix different SpMV formats for a given matrix. This paper presents a new hybrid storage format for sparse matrices, specifically targeting multi-core CPUs with SIMDs. Our approach partitions the target sparse matrix into two segments based on the regularity of the memory access pattern, where each segment is stored in a format suitable for its memory access patterns. Unlike prior hybrid storage schemes that rely on the user to determine the data partition among storage formats, we employ machine learning to build a predictive model to automatically determine the partition threshold on a per matrix basis. Our predictive model is first trained offline, and the trained model can be applied to any new, unseen sparse matrix. We apply our approach to 956 matrices and evaluate its performance on three distinct multi-core CPU platforms: a 72-core Intel Knights Landing (KNL) CPU, a 128-core AMD EPYC CPU, and a 64-core Phytium ARMv8 CPU. Experimental results show that our hybrid scheme, combined with the predictive model, outperforms the best-performing alternative by 2.9%, 17.5% and 16% on average on KNL, AMD, and Phytium, respectively.
APA, Harvard, Vancouver, ISO, and other styles
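A minimal sketch of the two-way split the abstract describes, with the partition threshold passed in as a plain parameter; in the paper the threshold is predicted per matrix by a trained model, and the two segments are then stored in formats matched to their access regularity.

```cuda
#include <vector>

// Assign each row to a "regular" segment (at most `threshold` nonzeros, suited
// to a SIMD-friendly format) or an "irregular" segment (suited to CSR).
void split_rows(const int *row_ptr, int n_rows, int threshold,
                std::vector<int> &regular_rows, std::vector<int> &irregular_rows)
{
    for (int i = 0; i < n_rows; ++i) {
        int npr = row_ptr[i + 1] - row_ptr[i];
        (npr <= threshold ? regular_rows : irregular_rows).push_back(i);
    }
}
```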

Dissertations / Theses on the topic "SpMV Multiplication"

1

Ashari, Arash. "Sparse Matrix-Vector Multiplication on GPU." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1417770100.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Godwin, Jeswin Samuel. "High-Performance Sparse Matrix-Vector Multiplication on GPUs for Structured Grid Computations." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1357280824.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Boyer, Brice. "Multiplication matricielle efficace et conception logicielle pour la bibliothèque de calcul exact LinBox." Phd thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00767915.

Full text
Abstract:
In this thesis, we first develop efficient matrix multiplications. We create new schedules that reduce the amount of extra memory required during a Winograd-type multiplication while retaining good complexity, thanks to the development of ad hoc external tools (pebble games), fine-grained complexity calculations, and new hybrid algorithms. We then use parallel technologies (multicore and GPU) to efficiently accelerate the multiplication of a sparse matrix by a dense vector (SpMV), which is essential to so-called black-box algorithms, and we create suitable new hybrid formats. Finally, we establish generic design methods oriented towards efficiency, notably through design from basic building blocks and via self-optimization. We also propose methods to improve and standardize code quality so as to make the produced code more sustainable and more robust. These methods are applied in particular to the LinBox exact computation library.
APA, Harvard, Vancouver, ISO, and other styles
4

Hong, Changwan. "Code Optimization on GPUs." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1557123832601533.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Singh, Kunal. "High-Performance Sparse Matrix-Multi Vector Multiplication on Multi-Core Architecture." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1524089757826551.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Ramesh, Chinthala. "Hardware-Software Co-Design Accelerators for Sparse BLAS." Thesis, 2017. http://etd.iisc.ac.in/handle/2005/4276.

Full text
Abstract:
Sparse Basic Linear Algebra Subroutines (Sparse BLAS) is an important library. Sparse BLAS includes three levels of subroutines: Level 1, Level 2, and Level 3. Level 1 Sparse BLAS routines perform computations over a sparse vector and a sparse/dense vector. Level 2 deals with sparse matrix and vector operations. Level 3 deals with sparse matrix and dense matrix operations. The computations of these Sparse BLAS routines on General Purpose Processors (GPPs) not only underutilize hardware resources but also take more compute time than the workload warrants, due to the poor data locality of sparse vector/matrix storage formats. In the literature, tremendous effort has been put into software to improve the performance of these Sparse BLAS routines on GPPs. GPPs are best suited to applications with high data locality, whereas Sparse BLAS routines operate on data with little locality; hence, GPP performance is poor. Various custom function units (hardware accelerators) proposed in the literature have proved more efficient than software approaches to accelerating Sparse BLAS subroutines. Although existing hardware accelerators improve Sparse BLAS performance compared to software routines, there is still considerable scope to improve them. This thesis describes both the existing software and hardware/software co-designs (HW/SW co-designs) and identifies the limitations of these existing solutions. We propose a new sparse data representation called Sawtooth Compressed Row Storage (SCRS) and corresponding SpMV and SpMM algorithms. SCRS-based SpMV and SpMM perform better than existing software solutions, but they still cannot reach theoretical peak performance. The knowledge gained from studying the limitations of these existing solutions, including the proposed SCRS-based SpMV and SpMM, is used to propose new HW/SW co-designs. Software accelerators are limited by the hardware properties of GPPs and GPUs themselves; hence, we propose HW/SW co-designs to accelerate a few basic Sparse BLAS operations (SpVV and SpMV). Our proposed Parallel Sparse BLAS HW/SW co-design achieves near theoretical peak performance with reasonable hardware resources.
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "SpMV Multiplication"

1

Bisseling, Rob H. Parallel Scientific Computation. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780198788348.001.0001.

Full text
Abstract:
This book explains how to use the bulk synchronous parallel (BSP) model to design and implement parallel algorithms in the areas of scientific computing and big data. Furthermore, it presents a hybrid BSP approach towards new hardware developments such as hierarchical architectures with both shared and distributed memory. The book provides a full treatment of core problems in scientific computing and big data, starting from a high-level problem description, via a sequential solution algorithm to a parallel solution algorithm and an actual parallel program written in the communication library BSPlib. Numerical experiments are presented for parallel programs on modern parallel computers ranging from desktop computers to massively parallel supercomputers. The introductory chapter of the book gives a complete overview of BSPlib, so that the reader already at an early stage is able to write his/her own parallel programs. Furthermore, it treats BSP benchmarking and parallel sorting by regular sampling. The next three chapters treat basic numerical linear algebra problems such as linear system solving by LU decomposition, sparse matrix-vector multiplication (SpMV), and the fast Fourier transform (FFT). The final chapter explores parallel algorithms for big data problems such as graph matching. The book is accompanied by a software package BSPedupack, freely available online from the author’s homepage, which contains all programs of the book and a set of test programs.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "SpMV Multiplication"

1

Khan, Muhammad Hannan, Osman Hassan, and Shahid Khan. "Accelerating SpMV Multiplication in Probabilistic Model Checkers Using GPUs." In Theoretical Aspects of Computing – ICTAC 2021, 86–104. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-85315-0_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Stoyanov, Dimitar, Rui Machado, and Franz-Josef Pfreundt. "Task-Based Parallel Sparse Matrix-Vector Multiplication (SpMVM) with GPI-2." In Large-Scale Scientific Computing, 153–60. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-26520-9_16.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Zekri, Ahmed S. "Three Dimensional SPMD Matrix–Matrix Multiplication Algorithm and a Stacked Many-Core Processor Architecture." In Lecture Notes in Electrical Engineering, 1139–50. New York, NY: Springer New York, 2012. http://dx.doi.org/10.1007/978-1-4614-3535-8_94.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Guo, Mingfeng, Yaobin Wang, Jun Huang, Qingfeng Wang, Yaqing Zhang, Mu Xu, and Fang Lu. "Rgs-SpMM: Accelerate Sparse Matrix-Matrix Multiplication by Row Group Splitting Strategy on the GPU." In Lecture Notes in Computer Science, 61–66. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-21395-3_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Bisseling, Rob H. "Sparse matrix–vector multiplication." In Parallel Scientific Computation, 190–290. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780198788348.003.0004.

Full text
Abstract:
This chapter introduces irregular algorithms and presents the example of parallel sparse matrix-vector multiplication (SpMV), which is the central operation in iterative linear system solvers. The irregular sparsity pattern of the matrix does not change during the multiplication, which may be repeated many times. This justifies putting a lot of effort into finding a good data distribution. The Mondriaan distribution of a sparse matrix is a useful non-Cartesian distribution that can be found by hypergraph-based partitioning. The Mondriaan package implements such a partitioning and also the newer medium-grain partitioning method. The chapter analyses the special cases of random sparse matrices and Laplacian matrices. It uses performance profiles and geometric means to compare different partitioning methods. Furthermore, it presents the hybrid-BSP model and a hybrid-BSP SpMV, which are aimed at hybrid distributed/shared-memory architectures. The parallel SpMV can be incorporated in applications, ranging from PageRank computation to artificial neural networks.
APA, Harvard, Vancouver, ISO, and other styles
6

Mpakos, Panagiotis, Nikela Papadopoulou, Chloe Alverti, Georgios Goumas, and Nectarios Koziris. "On the Performance and Energy Efficiency of Sparse Matrix-Vector Multiplication on FPGAs." In Parallel Computing: Technology Trends. IOS Press, 2020. http://dx.doi.org/10.3233/apc200092.

Full text
Abstract:
The Sparse Matrix-Vector Multiplication kernel (SpMV) has been one of the most popular kernels in high-performance computing, as the building block of many iterative solvers. At the same time, it has been one of the most notorious kernels, due to its low flop per byte ratio, which leads to under-utilization of modern processing system resources and a huge gap between the peak system performance and the observed performance of the kernel. However, moving forward to exascale, performance by itself is no longer the holy grail; the requirement for energy efficient high-performance computing systems is driving a trend towards processing units with better performance per watt ratios. Following this trend, FPGAs have emerged as an alternative, low-power accelerator for high-end systems. In this paper, we implement the SpMV kernel on FPGAs, towards an accelerated library for sparse matrix computations, for single-precision floating point values. Our implementation focuses on optimizing access to the data for the SpMV kernel and applies common optimizations to improve the parallelism and the performance of the SpMV kernel on FPGAs. We evaluate the performance and energy efficiency of our implementation, in comparison to modern CPUs and GPUs, for a diverse set of sparse matrices and demonstrate that FPGAs can be an energy-efficient solution for the SpMV kernel.
APA, Harvard, Vancouver, ISO, and other styles
7

"Sparse Linear Algebra." In Advances in Systems Analysis, Software Engineering, and High Performance Computing, 94–137. IGI Global, 2022. http://dx.doi.org/10.4018/978-1-7998-7082-1.ch004.

Full text
Abstract:
This chapter shows several new programming strategies based on tasking to parallelize sparse linear algebra kernels. The reader will explore different approaches to improve the performance of these kernels thanks to a better workload distribution and comprehension of the data layout. This will be accomplished through the study of some of the most popular and widely used sparse operations, such as SpMV (sparse matrix vector multiplication), GTSV (triangular solve), or CG (conjugate gradient). Those strategies have been tested on multicore systems, some of them equipped with GPU devices, showcasing how to overcome the peculiarities of task-based parallelized kernels in the context of sparse linear algebra computations.
APA, Harvard, Vancouver, ISO, and other styles
8

Jamalmohammed, Saira Banu, Lavanya K., Sumaiya Thaseen I., and Biju V. "Review on Sparse Matrix Storage Formats With Space Complexity Analysis." In Applications of Artificial Intelligence for Smart Technology, 122–45. IGI Global, 2021. http://dx.doi.org/10.4018/978-1-7998-3335-2.ch009.

Full text
Abstract:
Sparse matrix-vector multiplication (SpMV) is a challenging computational kernel in linear algebra applications, like data mining, image processing, and machine learning. The performance of this kernel is greatly dependent on the size of the input matrix and the underlying hardware features. Various sparse matrix storage formats, referred to commonly as sparse formats, have been proposed in the literature to reduce the size of the matrix. In modern multi-core and many-core architectures, the performance of the kernel is mainly dependent on the memory wall and power wall problems. Normally, a review of sparse formats is done for a specific architecture or a specific application. This chapter presents a comparative study of various sparse formats across platform architectures like the CPU, graphics processing unit (GPU), and single instruction multiple data stream (SIMD) registers. A space complexity analysis of various formats with their representations is discussed. Finally, the merits and demerits of each format are summarized in a table.
APA, Harvard, Vancouver, ISO, and other styles
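As a concrete anchor for the space-complexity comparison such a review performs, the two most common formats can be written as plain structs: for an m by n matrix with nnz nonzeros, COO stores 3*nnz entries while CSR stores 2*nnz + (m + 1), with exact byte counts depending on the index and value types. This is a generic sketch, not taken from the chapter.

```cuda
// Coordinate (COO) format: O(3 * nnz) stored entries.
struct COOMatrix {
    int    *row_idx;   // nnz entries
    int    *col_idx;   // nnz entries
    double *val;       // nnz entries
};

// Compressed sparse row (CSR) format: O(2 * nnz + m + 1) stored entries.
struct CSRMatrix {
    int    *row_ptr;   // m + 1 entries
    int    *col_idx;   // nnz entries
    double *val;       // nnz entries
};
```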

Conference papers on the topic "SpMV Multiplication"

1

Page, Brian A., and Peter M. Kogge. "Scalability of Hybrid Sparse Matrix Dense Vector (SpMV) Multiplication." In 2018 International Conference on High Performance Computing & Simulation (HPCS). IEEE, 2018. http://dx.doi.org/10.1109/hpcs.2018.00072.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Merrill, Duane, and Michael Garland. "Merge-based sparse matrix-vector multiplication (SpMV) using the CSR storage format." In PPoPP '16: 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. New York, NY, USA: ACM, 2016. http://dx.doi.org/10.1145/2851141.2851190.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Hou, Kaixi, Wu-chun Feng, and Shuai Che. "Auto-Tuning Strategies for Parallelizing Sparse Matrix-Vector (SpMV) Multiplication on Multi- and Many-Core Processors." In 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2017. http://dx.doi.org/10.1109/ipdpsw.2017.155.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Suresh, Krishnan, and Praveen Yadav. "Large-Scale Modal Analysis on Multi-Core Architectures." In ASME 2012 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2012. http://dx.doi.org/10.1115/detc2012-70281.

Full text
Abstract:
We propose here a subspace augmented Rayleigh-Ritz conjugate gradient method (SaRCG) for solving large-scale eigenvalue problems. The method is highly scalable and well suited for multi-core architectures since it only requires sparse matrix-vector multiplications (SpMV). As a specific application, we consider the modal analysis of geometrically complex structures that are discretized via non-conforming voxels. The voxelization process is robust and relatively insensitive to geometric complexity, but it leads to large eigenvalue problems that are difficult to solve via standard eigensolvers such as block-Lanczos. Such problems are easily solved via the proposed SaRCG, where one can, in addition, exploit the voxelization structure to render the SpMV assembly-free. As the numerical experiments indicate, the resulting implementation on multi-core CPUs and graphics programmable units is a practical solution to automated eigenvalue estimation during early stages of design.
APA, Harvard, Vancouver, ISO, and other styles
5

Huang, Guyue, Guohao Dai, Yu Wang, and Huazhong Yang. "GE-SpMM: General-Purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks." In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2020. http://dx.doi.org/10.1109/sc41405.2020.00076.

Full text
APA, Harvard, Vancouver, ISO, and other styles
