Academic literature on the topic 'SpMM Multiplication'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'SpMM Multiplication.'

Next to every source in the list of references is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "SpMM Multiplication"

1

Wilkinson, Lucas, Kazem Cheshmi, and Maryam Mehri Dehnavi. "Register Tiling for Unstructured Sparsity in Neural Network Inference." Proceedings of the ACM on Programming Languages 7, PLDI (June 6, 2023): 1995–2020. http://dx.doi.org/10.1145/3591302.

Abstract:
Unstructured sparse neural networks are an important class of machine learning (ML) models, as they compact model size and reduce floating point operations. The execution time of these models is frequently dominated by the sparse matrix multiplication (SpMM) kernel, C = A × B, where A is a sparse matrix, and B and C are dense matrices. The unstructured sparsity pattern of matrices in pruned machine learning models, along with their sparsity ratio, has rendered ineffective the large class of libraries and systems that optimize sparse matrix multiplications. Reusing registers is particularly difficult because accesses to memory locations must be known statically. This paper proposes Sparse Register Tiling, a new technique composed of an unroll-and-sparse-jam transformation followed by data compression that is specifically tailored to sparsity patterns in ML matrices. Unroll-and-sparse-jam uses sparsity information to jam the code while improving register reuse. Sparse register tiling is evaluated across 2396 weight matrices from transformer and convolutional models with a sparsity range of 60-95% and provides an average speedup of 1.72× and 2.65× over MKL SpMM and dense matrix multiplication, respectively, on a multicore CPU. It also provides an end-to-end speedup of 2.12× for MobileNetV1 with 70% sparsity on an ARM processor commonly used in edge devices.
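As a concrete illustration of the kernel this abstract describes, here is a minimal, unoptimized CSR-based SpMM in Python. The function and variable names are ours, not the paper's; Sparse Register Tiling compresses and unrolls far beyond this baseline.

```python
import numpy as np

def csr_spmm(indptr, indices, data, B):
    """Sparse-times-dense multiplication C = A @ B, with A stored in CSR
    (indptr/indices/data) and B, C dense NumPy arrays."""
    n_rows = len(indptr) - 1
    C = np.zeros((n_rows, B.shape[1]), dtype=B.dtype)
    for i in range(n_rows):
        # Each nonzero A[i, indices[k]] scales one full row of B.
        for k in range(indptr[i], indptr[i + 1]):
            C[i, :] += data[k] * B[indices[k], :]
    return C

# A = [[1, 0], [0, 2]] in CSR form
indptr = np.array([0, 1, 2])
indices = np.array([0, 1])
data = np.array([1.0, 2.0])
B = np.array([[1.0, 2.0], [3.0, 4.0]])
C = csr_spmm(indptr, indices, data, B)
```

Production kernels tile this loop nest over registers and cache blocks; the triple loop above gives only the reference semantics.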
2

Anzt, Hartwig, Stanimire Tomov, and Jack Dongarra. "On the performance and energy efficiency of sparse linear algebra on GPUs." International Journal of High Performance Computing Applications 31, no. 5 (October 5, 2016): 375–90. http://dx.doi.org/10.1177/1094342016672081.

Abstract:
In this paper we unveil some performance and energy efficiency frontiers for sparse computations on GPU-based supercomputers. We compare the resource efficiency of different sparse matrix–vector products (SpMV) taken from libraries such as cuSPARSE and MAGMA for GPU and Intel’s MKL for multicore CPUs, and develop a GPU sparse matrix–matrix product (SpMM) implementation that handles the simultaneous multiplication of a sparse matrix with a set of vectors in block-wise fashion. While a typical sparse computation such as the SpMV reaches only a fraction of the peak of current GPUs, we show that the SpMM succeeds in exceeding the memory-bound limitations of the SpMV. We integrate this kernel into a GPU-accelerated Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) eigensolver. LOBPCG is chosen as a benchmark algorithm for this study as it combines an interesting mix of sparse and dense linear algebra operations that is typical for complex simulation applications, and allows for hardware-aware optimizations. In a detailed analysis we compare the performance and energy efficiency against a multi-threaded CPU counterpart. The reported performance and energy efficiency results are indicative of sparse computations on supercomputers.
3

Ernst, Thomas. "On the q-Lie group of q-Appell polynomial matrices and related factorizations." Special Matrices 6, no. 1 (February 1, 2018): 93–109. http://dx.doi.org/10.1515/spma-2018-0009.

Abstract:
In the spirit of our earlier paper [10] and Zhang and Wang [16], we introduce the matrix of multiplicative q-Appell polynomials of order M ∈ ℤ. This is the representation of the respective q-Appell polynomials in ke-ke basis. Based on the fact that the q-Appell polynomials form a commutative ring [11], we prove that this set constitutes a q-Lie group with two dual q-multiplications in the sense of [9]. A comparison with earlier results on q-Pascal matrices gives factorizations according to [7], which are specialized to q-Bernoulli and q-Euler polynomials. We also show that the corresponding q-Bernoulli and q-Euler matrices form q-Lie subgroups. In the limit q → 1 we obtain corresponding formulas for Appell polynomial matrices. We conclude by presenting the commutative ring of generalized q-Pascal functional matrices, which operates on all functions f ∈ C∞q.
4

Guzu, D., T. Hoffmann-Ostenhof, and A. Laptev. "On a class of sharp multiplicative Hardy inequalities." St. Petersburg Mathematical Journal 32, no. 3 (May 11, 2021): 523–30. http://dx.doi.org/10.1090/spmj/1659.

Abstract:
A class of weighted Hardy inequalities is treated. The sharp constants depend on the lowest eigenvalues of auxiliary Schrödinger operators on a sphere. In particular, for some block radial weights these sharp constants are given in terms of the lowest eigenvalue of a Legendre type equation.
5

Bakhadly, Bakhad, Alexander Guterman, and María Jesús de la Puente. "Orthogonality for (0, −1) tropical normal matrices." Special Matrices 8, no. 1 (February 17, 2020): 40–60. http://dx.doi.org/10.1515/spma-2020-0006.

Abstract:
We study pairs of mutually orthogonal normal matrices with respect to tropical multiplication. Minimal orthogonal pairs are characterized. The diameter and girth of three graphs arising from the orthogonality equivalence relation are computed.
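For readers unfamiliar with the tropical setting: tropical (max-plus) arithmetic replaces addition with max and multiplication with +, so the matrix product becomes a max of sums. The sketch below is a generic illustration, not code from the paper, applied to a small (0, −1) matrix:

```python
import numpy as np

def tropical_matmul(A, B):
    """Tropical (max-plus) matrix product: C[i, j] = max over k of A[i, k] + B[k, j]."""
    n, m = A.shape[0], B.shape[1]
    C = np.empty((n, m))
    for i in range(n):
        for j in range(m):
            C[i, j] = np.max(A[i, :] + B[:, j])
    return C

# A (0, -1) normal matrix: zeros on the diagonal, -1 elsewhere
A = np.array([[0.0, -1.0], [-1.0, 0.0]])
C = tropical_matmul(A, A)
```

Here the tropical square of A equals A itself, one of the idempotence-style identities that make normal matrices convenient in this algebra.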
6

Liu, Jie. "Accuracy Controllable SpMV Optimization on GPU." Journal of Physics: Conference Series 2363, no. 1 (November 1, 2022): 012008. http://dx.doi.org/10.1088/1742-6596/2363/1/012008.

Abstract:
Sparse matrix-vector multiplication (SpMV) is a key kernel widely used in a variety of fields, and mixed-precision calculation brings opportunities for SpMV optimization. Researchers have proposed storing nonzero elements in the interval (-1, 1) in single precision and calculating SpMV in mixed precision. Though this leads to high performance, it also brings a loss of accuracy. This paper proposes an accuracy-controllable optimization method for SpMV. By limiting the error caused by converting double-precision floating-point numbers in the interval (-1, 1) into single-precision format, the calculation accuracy of mixed-precision SpMV is effectively improved. We tested sparse matrices from the SuiteSparse Matrix Collection on a Tesla V100. Compared with the existing mixed-precision MpSpMV kernel, the mixed-precision SpMV proposed in this paper achieves an accuracy improvement.
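The split this abstract refers to can be sketched as follows: nonzeros in (-1, 1) are stored and multiplied in single precision, the rest in double, with accumulation in double. This is an illustrative scheme with made-up names, not the paper's kernel:

```python
import numpy as np

def mixed_precision_spmv(rows, cols, vals, x, n_rows):
    """COO-format SpMV where nonzeros in (-1, 1) are multiplied in float32
    and the rest in float64; all partial sums accumulate in float64."""
    y = np.zeros(n_rows, dtype=np.float64)
    small = np.abs(vals) < 1.0
    v32 = vals[small].astype(np.float32)
    x32 = x.astype(np.float32)
    # float32 products for the small entries, promoted on accumulation
    np.add.at(y, rows[small], (v32 * x32[cols[small]]).astype(np.float64))
    # float64 products for the remaining entries
    np.add.at(y, rows[~small], vals[~small] * x[cols[~small]])
    return y

rows = np.array([0, 1])
cols = np.array([0, 1])
vals = np.array([0.5, 2.0])   # 0.5 falls in (-1, 1), 2.0 does not
x = np.array([2.0, 3.0])
y = mixed_precision_spmv(rows, cols, vals, x, n_rows=2)
```

Halving the storage of in-range values roughly halves the memory traffic they cost, which is where the speedup of such schemes comes from.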
7

Nikolski, N., and A. Pushnitski. "Szegő-type limit theorems for “multiplicative Toeplitz” operators and non-Følner approximations." St. Petersburg Mathematical Journal 32, no. 6 (October 20, 2021): 1033–50. http://dx.doi.org/10.1090/spmj/1683.

8

Merrill, Duane, and Michael Garland. "Merge-based sparse matrix-vector multiplication (SpMV) using the CSR storage format." ACM SIGPLAN Notices 51, no. 8 (November 9, 2016): 1–2. http://dx.doi.org/10.1145/3016078.2851190.

9

Ernst, Thomas. "On the q-exponential of matrix q-Lie algebras." Special Matrices 5, no. 1 (January 26, 2017): 36–50. http://dx.doi.org/10.1515/spma-2017-0003.

Abstract:
In this paper, we define several new concepts in the borderline between linear algebra, Lie groups and q-calculus. We first introduce the ring epimorphism r, the set of all inversions of the basis q, and then the important q-determinant and corresponding q-scalar products from an earlier paper. Then we discuss matrix q-Lie algebras with a modified q-addition, and compute the matrix q-exponential to form the corresponding n × n matrix, a so-called q-Lie group, or manifold, usually with q-determinant 1. The corresponding matrix multiplication is twisted under τ, which makes it possible to draw diagrams similar to Lie group theory for the q-exponential, or the so-called q-morphism. There is no definition of letter multiplication in a general alphabet, but in this article we introduce new q-number systems, the biring of q-integers, and the extended q-rational numbers. Furthermore, we provide examples of matrices in suq(4), and its corresponding q-Lie group. We conclude with an example of a system of equations with Ward number coefficients.
10

Alkenani, Ahmad N., Mohammad Ashraf, and Aisha Jabeen. "Nonlinear generalized Jordan (σ, Γ)-derivations on triangular algebras." Special Matrices 6, no. 1 (December 20, 2017): 216–28. http://dx.doi.org/10.1515/spma-2017-0008.

Abstract:
Let R be a commutative ring with an identity element, let A and B be unital algebras over R, and let M be an (A,B)-bimodule which is faithful as a left A-module and as a right B-module. Suppose that A = Tri(A,M,B) is a triangular algebra which is 2-torsion free, and let σ, Γ be automorphisms of A. A map δ: A → A (not necessarily linear) is called a multiplicative generalized (σ, Γ)-derivation (resp. multiplicative generalized Jordan (σ, Γ)-derivation) on A associated with a (σ, Γ)-derivation (resp. Jordan (σ, Γ)-derivation) d on A if δ(xy) = δ(x)Γ(y) + σ(x)d(y) (resp. δ(x²) = δ(x)Γ(x) + σ(x)d(x)) holds for all x, y ∈ A. In the present paper it is shown that if δ: A → A is a multiplicative generalized Jordan (σ, Γ)-derivation on A, then δ is an additive generalized (σ, Γ)-derivation on A.

Dissertations / Theses on the topic "SpMM Multiplication"

1

Singh, Kunal. "High-Performance Sparse Matrix-Multi Vector Multiplication on Multi-Core Architecture." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1524089757826551.

2

Hong, Changwan. "Code Optimization on GPUs." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1557123832601533.

3

Ashari, Arash. "Sparse Matrix-Vector Multiplication on GPU." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1417770100.

4

Godwin, Jeswin Samuel. "High-Performance Sparse Matrix-Vector Multiplication on GPUs for Structured Grid Computations." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1357280824.

5

Boyer, Brice. "Multiplication matricielle efficace et conception logicielle pour la bibliothèque de calcul exact LinBox." PhD thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00767915.

Abstract:
In this thesis, we first develop efficient matrix multiplications. We design new schedules that reduce the amount of extra memory needed by a Winograd-type multiplication while preserving good complexity, through ad hoc external tools (a pebble game), fine-grained complexity analysis, and new hybrid algorithms. We then use parallel technologies (multicore and GPU) to efficiently accelerate the multiplication of a sparse matrix by a dense vector (SpMV), which is essential to so-called black-box algorithms, and we design suitable new hybrid formats. Finally, we establish efficiency-oriented generic design methods, notably through composition from basic building blocks and through self-optimization. We also propose methods to improve and standardize code quality so as to make the produced code more sustainable and robust. These methods are applied in particular to the exact computation library LinBox.
6

Ramesh, Chinthala. "Hardware-Software Co-Design Accelerators for Sparse BLAS." Thesis, 2017. http://etd.iisc.ac.in/handle/2005/4276.

Abstract:
Sparse Basic Linear Algebra Subroutines (Sparse BLAS) is an important library. Sparse BLAS includes three levels of subroutines: Level 1, Level 2, and Level 3. Level 1 routines compute over sparse vectors and sparse/dense vector pairs. Level 2 deals with sparse matrix and vector operations. Level 3 deals with sparse matrix and dense matrix operations. On General Purpose Processors (GPPs), these routines not only under-utilize hardware resources but also take more compute time than the workload warrants, due to the poor data locality of sparse vector/matrix storage formats. In the literature, tremendous effort has been put into software to improve the performance of these Sparse BLAS routines on GPPs. GPPs best suit applications with high data locality, whereas Sparse BLAS routines operate on data with little locality; hence, GPP performance is poor. Various custom function units (hardware accelerators) proposed in the literature have proved more efficient than software attempts to accelerate Sparse BLAS subroutines. Although existing hardware accelerators improve Sparse BLAS performance compared to software routines, there is still much scope to improve these accelerators. This thesis describes both the existing software and hardware/software co-designs (HW/SW co-designs) and identifies the limitations of these existing solutions. We propose a new sparse data representation called Sawtooth Compressed Row Storage (SCRS) and corresponding SpMV and SpMM algorithms. SCRS-based SpMV and SpMM perform better than existing software solutions, but they still do not reach theoretical peak performance. The knowledge gained from studying the limitations of these existing solutions, including the proposed SCRS-based SpMV and SpMM, is used to propose new HW/SW co-designs.
Software accelerators are limited by the hardware properties of GPPs and GPUs themselves; hence, we propose HW/SW co-designs to accelerate a few basic Sparse BLAS operations (SpVV and SpMV). Our proposed parallel Sparse BLAS HW/SW co-design achieves near theoretical peak performance with reasonable hardware resources.

Books on the topic "SpMM Multiplication"

1

Bisseling, Rob H. Parallel Scientific Computation. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780198788348.001.0001.

Abstract:
This book explains how to use the bulk synchronous parallel (BSP) model to design and implement parallel algorithms in the areas of scientific computing and big data. Furthermore, it presents a hybrid BSP approach towards new hardware developments such as hierarchical architectures with both shared and distributed memory. The book provides a full treatment of core problems in scientific computing and big data, starting from a high-level problem description, via a sequential solution algorithm to a parallel solution algorithm and an actual parallel program written in the communication library BSPlib. Numerical experiments are presented for parallel programs on modern parallel computers ranging from desktop computers to massively parallel supercomputers. The introductory chapter of the book gives a complete overview of BSPlib, so that the reader already at an early stage is able to write his/her own parallel programs. Furthermore, it treats BSP benchmarking and parallel sorting by regular sampling. The next three chapters treat basic numerical linear algebra problems such as linear system solving by LU decomposition, sparse matrix-vector multiplication (SpMV), and the fast Fourier transform (FFT). The final chapter explores parallel algorithms for big data problems such as graph matching. The book is accompanied by a software package BSPedupack, freely available online from the author’s homepage, which contains all programs of the book and a set of test programs.

Book chapters on the topic "SpMM Multiplication"

1

Guo, Mingfeng, Yaobin Wang, Jun Huang, Qingfeng Wang, Yaqing Zhang, Mu Xu, and Fang Lu. "Rgs-SpMM: Accelerate Sparse Matrix-Matrix Multiplication by Row Group Splitting Strategy on the GPU." In Lecture Notes in Computer Science, 61–66. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-21395-3_6.

2

Khan, Muhammad Hannan, Osman Hassan, and Shahid Khan. "Accelerating SpMV Multiplication in Probabilistic Model Checkers Using GPUs." In Theoretical Aspects of Computing – ICTAC 2021, 86–104. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-85315-0_6.

3

Stoyanov, Dimitar, Rui Machado, and Franz-Josef Pfreundt. "Task-Based Parallel Sparse Matrix-Vector Multiplication (SpMVM) with GPI-2." In Large-Scale Scientific Computing, 153–60. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-26520-9_16.

4

Zekri, Ahmed S. "Three Dimensional SPMD Matrix–Matrix Multiplication Algorithm and a Stacked Many-Core Processor Architecture." In Lecture Notes in Electrical Engineering, 1139–50. New York, NY: Springer New York, 2012. http://dx.doi.org/10.1007/978-1-4614-3535-8_94.

5

Bisseling, Rob H. "Sparse matrix–vector multiplication." In Parallel Scientific Computation, 190–290. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780198788348.003.0004.

Abstract:
This chapter introduces irregular algorithms and presents the example of parallel sparse matrix-vector multiplication (SpMV), which is the central operation in iterative linear system solvers. The irregular sparsity pattern of the matrix does not change during the multiplication, which may be repeated many times. This justifies putting a lot of effort into finding a good data distribution. The Mondriaan distribution of a sparse matrix is a useful non-Cartesian distribution that can be found by hypergraph-based partitioning. The Mondriaan package implements such a partitioning and also the newer medium-grain partitioning method. The chapter analyses the special cases of random sparse matrices and Laplacian matrices. It uses performance profiles and geometric means to compare different partitioning methods. Furthermore, it presents the hybrid-BSP model and a hybrid-BSP SpMV, which are aimed at hybrid distributed/shared-memory architectures. The parallel SpMV can be incorporated in applications, ranging from PageRank computation to artificial neural networks.
6

Mpakos, Panagiotis, Nikela Papadopoulou, Chloe Alverti, Georgios Goumas, and Nectarios Koziris. "On the Performance and Energy Efficiency of Sparse Matrix-Vector Multiplication on FPGAs." In Parallel Computing: Technology Trends. IOS Press, 2020. http://dx.doi.org/10.3233/apc200092.

Abstract:
The Sparse Matrix-Vector Multiplication kernel (SpMV) has been one of the most popular kernels in high-performance computing, as the building block of many iterative solvers. At the same time, it has been one of the most notorious kernels, due to its low flop per byte ratio, which leads to under-utilization of modern processing system resources and a huge gap between the peak system performance and the observed performance of the kernel. However, moving forward to exascale, performance by itself is no longer the holy grail; the requirement for energy efficient high-performance computing systems is driving a trend towards processing units with better performance per watt ratios. Following this trend, FPGAs have emerged as an alternative, low-power accelerator for high-end systems. In this paper, we implement the SpMV kernel on FPGAs, towards an accelerated library for sparse matrix computations, for single-precision floating point values. Our implementation focuses on optimizing access to the data for the SpMV kernel and applies common optimizations to improve the parallelism and the performance of the SpMV kernel on FPGAs. We evaluate the performance and energy efficiency of our implementation, in comparison to modern CPUs and GPUs, for a diverse set of sparse matrices and demonstrate that FPGAs can be an energy-efficient solution for the SpMV kernel.
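The "low flop per byte ratio" this abstract mentions can be made concrete with a back-of-the-envelope estimate for a double-precision CSR SpMV. The assumptions here are ours: 8-byte values, 4-byte indices, and vectors held in cache.

```python
def csr_spmv_arithmetic_intensity(n_rows, nnz, value_bytes=8, index_bytes=4):
    """Rough flop-per-byte estimate for y = A @ x with A in CSR, assuming
    every matrix entry is streamed from memory once and the vectors stay
    in cache."""
    flops = 2 * nnz  # one multiply and one add per nonzero
    bytes_moved = nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes
    return flops / bytes_moved

# A matrix with one million rows and ten nonzeros per row:
ai = csr_spmv_arithmetic_intensity(n_rows=1_000_000, nnz=10_000_000)
```

An intensity of roughly 0.16 flop/byte sits far below the machine balance of modern CPUs and GPUs, which is why SpMV is memory-bound and why performance-per-watt accelerators become attractive.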
7

"Sparse Linear Algebra." In Advances in Systems Analysis, Software Engineering, and High Performance Computing, 94–137. IGI Global, 2022. http://dx.doi.org/10.4018/978-1-7998-7082-1.ch004.

Abstract:
This chapter shows several new programming strategies based on tasking to parallelize sparse linear algebra kernels. The reader will explore different approaches to improve the performance of these kernels thanks to a better workload distribution and comprehension of the data layout. This will be accomplished through the study of some of the most popular and widely used sparse operations, such as SpMV (sparse matrix-vector multiplication), GTSV (triangular solve), or CG (conjugate gradient). Those strategies have been tested on multicore systems, some of them equipped with GPU devices, showcasing how to overcome the peculiarities of task-based parallelized kernels in the context of sparse linear algebra computations.
8

Jamalmohammed, Saira Banu, Lavanya K., Sumaiya Thaseen I., and Biju V. "Review on Sparse Matrix Storage Formats With Space Complexity Analysis." In Applications of Artificial Intelligence for Smart Technology, 122–45. IGI Global, 2021. http://dx.doi.org/10.4018/978-1-7998-3335-2.ch009.

Abstract:
Sparse matrix-vector multiplication (SpMV) is a challenging computational kernel in linear algebra applications, like data mining, image processing, and machine learning. The performance of this kernel is greatly dependent on the size of the input matrix and the underlying hardware features. Various sparse matrix storage formats, referred to commonly as sparse formats, have been proposed in the literature to reduce the size of the matrix. In modern multi-core and many-core architectures, the performance of the kernel is mainly dependent on the memory wall and power wall problems. Normally, a review of sparse formats is done for a specific architecture or a specific application. This chapter presents a comparative study of various sparse formats across platforms like the CPU, the graphics processing unit (GPU), and single instruction multiple data stream (SIMD) registers. A space complexity analysis of the various formats, with their representations, is discussed. Finally, the merits and demerits of each format are summarized in a table.
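The space-complexity comparison such a review performs can be illustrated for the two most common formats, COO and CSR. The byte sizes assumed below (4-byte indices, 8-byte values) and the function names are ours, for illustration only:

```python
def coo_bytes(nnz, idx_bytes=4, val_bytes=8):
    """COO stores one (row, col, value) triple per nonzero."""
    return nnz * (2 * idx_bytes + val_bytes)

def csr_bytes(n_rows, nnz, idx_bytes=4, val_bytes=8):
    """CSR stores a row-pointer array of length n_rows + 1,
    plus one (col, value) pair per nonzero."""
    return (n_rows + 1) * idx_bytes + nnz * (idx_bytes + val_bytes)

# For a matrix with many more nonzeros than rows, CSR is smaller:
coo = coo_bytes(nnz=10_000)
csr = csr_bytes(n_rows=1_000, nnz=10_000)
```

CSR replaces one of COO's two index arrays with a short row-pointer array, which is exactly the kind of trade-off a space-complexity analysis quantifies.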
9

Janecek, Andreas, and Ying Tan. "Swarm Intelligence for Non-Negative Matrix Factorization." In Recent Algorithms and Applications in Swarm Intelligence Research, 168–92. IGI Global, 2013. http://dx.doi.org/10.4018/978-1-4666-2479-5.ch009.

Abstract:
The Non-negative Matrix Factorization (NMF) is a special low-rank approximation which allows for an additive, parts-based and interpretable representation of the data. This article presents efforts to improve the convergence, approximation quality, and classification accuracy of NMF using five different meta-heuristics based on swarm intelligence. Several properties of the NMF objective function motivate the utilization of meta-heuristics: this function is non-convex, discontinuous, and may possess many local minima. The proposed optimization strategies are two-fold: on the one hand, a new initialization strategy for NMF is presented in order to initialize the NMF factors prior to the factorization; on the other hand, an iterative update strategy is proposed, which improves the accuracy per runtime for the multiplicative update NMF algorithm. The success of the proposed optimization strategies is shown by applying them to synthetic data and to data sets from the areas of spam filtering and email classification, and by evaluating them in their application context. Experimental results show that both optimization strategies are able to improve NMF in terms of faster convergence, lower approximation error, and better classification accuracy. In particular, the initialization strategy leads to significant reductions of the runtime-per-accuracy ratio for both the NMF approximation and the classification results achieved with NMF.
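The multiplicative update algorithm that the article's iterative strategy builds on is the classical Lee-Seung scheme. A minimal NumPy sketch follows; the random initialization here merely stands in for the swarm-based initialization the article studies:

```python
import numpy as np

def nmf_multiplicative(V, rank, iters=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates minimizing ||V - W H||_F,
    keeping W and H entrywise non-negative."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(iters):
        # Each update rescales the factors; non-negativity is preserved
        # because every term in the ratio is non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.array([[1.0, 2.0], [2.0, 4.0]])   # an exactly rank-1 matrix
W, H = nmf_multiplicative(V, rank=1)
```

On an exactly low-rank non-negative matrix, the updates drive the Frobenius error close to zero; on real data they converge to a local minimum, which is what motivates better initializations.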

Conference papers on the topic "SpMM Multiplication"

1

Huang, Guyue, Guohao Dai, Yu Wang, and Huazhong Yang. "GE-SpMM: General-Purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks." In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2020. http://dx.doi.org/10.1109/sc41405.2020.00076.

2

Page, Brian A., and Peter M. Kogge. "Scalability of Hybrid Sparse Matrix Dense Vector (SpMV) Multiplication." In 2018 International Conference on High Performance Computing & Simulation (HPCS). IEEE, 2018. http://dx.doi.org/10.1109/hpcs.2018.00072.

3

Wu, Di, Wei Cao, and Lingli Wang. "SpWMM: A High-Performance Sparse-Winograd Matrix-Matrix Multiplication Accelerator for CNNs." In 2019 International Conference on Field-Programmable Technology (ICFPT). IEEE, 2019. http://dx.doi.org/10.1109/icfpt47387.2019.00041.

4

Merrill, Duane, and Michael Garland. "Merge-based sparse matrix-vector multiplication (SpMV) using the CSR storage format." In PPoPP '16: 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. New York, NY, USA: ACM, 2016. http://dx.doi.org/10.1145/2851141.2851190.

5

Suresh, Krishnan, and Praveen Yadav. "Large-Scale Modal Analysis on Multi-Core Architectures." In ASME 2012 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2012. http://dx.doi.org/10.1115/detc2012-70281.

Abstract:
We propose here a subspace augmented Rayleigh-Ritz conjugate gradient method (SaRCG) for solving large-scale eigen-value problems. The method is highly scalable and well suited for multi-core architectures since it only requires sparse matrix-vector multiplications (SpMV). As a specific application, we consider the modal analysis of geometrically complex structures that are discretized via non-conforming voxels. The voxelization process is robust and relatively insensitive to geometric complexity, but it leads to large eigen-value problems that are difficult to solve via standard eigen-solvers such as block-Lanczos. Such problems are easily solved via the proposed SaRCG, where one can, in addition, exploit the voxelization structure to render the SpMV assembly-free. As the numerical experiments indicate, the resulting implementation on multi-core CPUs and graphics processing units is a practical solution for automated eigen-value estimation during the early stages of design.
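The abstract's key point, that an eigensolver can be driven by SpMV alone, is easiest to see in the simplest such method, power iteration. SaRCG itself is far more sophisticated; the callback interface below is purely illustrative:

```python
import numpy as np

def power_iteration(spmv, n, iters=100, seed=0):
    """Estimate the dominant eigenvalue of an operator given only a
    matvec callback, the way matrix-free eigensolvers use SpMV."""
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    for _ in range(iters):
        y = spmv(x)
        x = y / np.linalg.norm(y)
    return x @ spmv(x)  # Rayleigh quotient of the converged vector

# Dense stand-in operator; a real solver would pass an assembly-free sparse matvec.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
lam = power_iteration(lambda v: A @ v, n=2)
```

Because the operator is only ever touched through the `spmv` callback, the matrix never needs to be assembled, which is exactly the property the voxel-based solver exploits.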
6

Hou, Kaixi, Wu-chun Feng, and Shuai Che. "Auto-Tuning Strategies for Parallelizing Sparse Matrix-Vector (SpMV) Multiplication on Multi- and Many-Core Processors." In 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2017. http://dx.doi.org/10.1109/ipdpsw.2017.155.

7

Kang, Jinsong, Shoujian Yang, Wei Xia, and Shuo Wang. "An improved deadbeat control strategy for photovoltaic grid-connected inverter based on unipolar & frequency multiplication SPWM and its implementation." In 2014 IEEE International Power Electronics and Application Conference and Exposition (PEAC). IEEE, 2014. http://dx.doi.org/10.1109/peac.2014.7037934.

