Journal articles on the topic 'Sparse Basic Linear Algebra Subroutines'


1

Yang, Bing, Xi Chen, Xiang Yun Liao, Mian Lun Zheng, and Zhi Yong Yuan. "FEM-Based Modeling and Deformation of Soft Tissue Accelerated by CUSPARSE and CUBLAS." Advanced Materials Research 671-674 (March 2013): 3200–3203. http://dx.doi.org/10.4028/www.scientific.net/amr.671-674.3200.

Abstract:
Realistic modeling and deformation of soft tissue is one of the key technologies of virtual surgery simulation, a challenging research field that stimulates the development of new clinical applications such as the virtual surgery simulator. In this paper we adopt the linear FEM (Finite Element Method) and store the sparse matrix in compressed CSR (Compressed Sparse Row) format, which enables fast modeling and deformation of soft tissue on GPU hardware with NVIDIA’s CUSPARSE (Compute Unified Device Architecture Sparse Matrix) and CUBLAS (Compute Unified Device Architecture Basic Linear Algebra Subroutines) libraries. We focus on the CGS (Conjugate Gradient Solver), which is the most time-consuming part of FEM, and port it to the GPU with these two libraries. The experimental results show that the proposed acceleration method achieves realistic and fast modeling and deformation simulation of soft tissue.
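The abstract combines a sparse matrix stored in CSR format with a conjugate gradient solver whose kernels (sparse matrix-vector product, dot products, vector updates) are handed to CUSPARSE and CUBLAS. A minimal CPU-only sketch in Python of that structure follows, assuming a small symmetric positive definite test matrix; the helpers `csr_from_dense`, `csr_matvec` and `cg` are hypothetical names, not part of either library, and the paper's GPU code is not reproduced here.

```python
import math


def csr_from_dense(a):
    """Build CSR arrays (values, col_idx, row_ptr) from a dense list of lists."""
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0.0:  # store only nonzero entries
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # end of this row's slice
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """y = A x for A in CSR format (the kernel a sparse library would run on the GPU)."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y[i] = s
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg(csr, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned conjugate gradient for a symmetric positive definite CSR matrix."""
    x = [0.0] * len(b)
    r = b[:]  # residual b - A x for the initial guess x = 0
    p = r[:]
    rr = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rr) < tol:
            break
        ap = csr_matvec(*csr, p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x
```

On a GPU, `csr_matvec` would correspond to a CUSPARSE CSR matrix-vector product and `dot` and the vector updates to CUBLAS dot and axpy routines; the loop structure itself stays the same.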
2

Magnin, H., and J. L. Coulomb. "A parallel and vectorial implementation of basic linear algebra subroutines in iterative solving of large sparse linear systems of equations." IEEE Transactions on Magnetics 25, no. 4 (July 1989): 2895–97. http://dx.doi.org/10.1109/20.34317.

3

Kramer, David, S. Lennart Johnsson, and Yu Hu. "Local Basic Linear Algebra Subroutines (LBLAS) for the CM-5/5E." International Journal of Supercomputer Applications and High Performance Computing 10, no. 4 (December 1996): 300–335. http://dx.doi.org/10.1177/109434209601000403.

4

Shaeffer, John. "BLAS IV: A BLAS for Rk Matrix Algebra." Applied Computational Electromagnetics Society 35, no. 11 (February 3, 2021): 1266–67. http://dx.doi.org/10.47037/2020.aces.j.351102.

Abstract:
Basic Linear Algebra Subroutines (BLAS) are well-known, low-level workhorse subroutines for vector-vector, matrix-vector and matrix-matrix linear algebra operations on full-rank matrices. With the advent of block low-rank (Rk) full-wave direct solvers, in which most blocks of the system matrix are Rk, an extension to the BLAS III matrix-matrix workhorse routine is needed because of the agony of Rk addition. This note outlines the problem of BLAS III for Rk LU and solve operations and then outlines an alternative approach, which we call BLAS IV. This approach exploits the thrill of Rk matrix-matrix multiplication and uses the Adaptive Cross Approximation (ACA) to evaluate sums of Rk terms, circumventing the agony of low-rank addition.
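A small sketch illustrates why Rk addition is awkward and how ACA can sidestep it. Adding two Rk blocks U1 V1ᵀ + U2 V2ᵀ by concatenating their factors gives rank k1 + k2 even when the sum has lower rank, so the result must be recompressed. Partially pivoted ACA needs only individual entries of the sum, which are cheap to evaluate from the factors, and builds a new low-rank factorization from a few rows and columns. This is a minimal Python sketch assuming list-based factors and a simple absolute pivot tolerance; `aca` and `rk_sum_aca` are hypothetical names, not Shaeffer's BLAS IV interface.

```python
def aca(entry, m, n, tol=1e-12, max_rank=None):
    """Partially pivoted adaptive cross approximation of an m x n matrix
    available only through entry(i, j). Returns factor lists us, vs with
    M ~= sum_k outer(us[k], vs[k])."""
    us, vs = [], []
    used_rows = set()
    i = 0  # current pivot row
    for _ in range(max_rank or min(m, n)):
        # Row i of the residual M - sum_k outer(us[k], vs[k]).
        row = [entry(i, j) - sum(u[i] * v[j] for u, v in zip(us, vs)) for j in range(n)]
        used_rows.add(i)
        j = max(range(n), key=lambda c: abs(row[c]))
        pivot = row[j]
        if abs(pivot) < tol:
            # Simple stopping rule (assumption); production ACA uses a relative
            # norm test, since a zero residual row does not prove convergence.
            break
        v = [x / pivot for x in row]
        # Column j of the residual (before adding the new term).
        u = [entry(r, j) - sum(uu[r] * vv[j] for uu, vv in zip(us, vs)) for r in range(m)]
        us.append(u)
        vs.append(v)
        remaining = [r for r in range(m) if r not in used_rows]
        if not remaining:
            break
        i = max(remaining, key=lambda r: abs(u[r]))  # next pivot row
    return us, vs


def rk_sum_aca(terms, m, n, tol=1e-12):
    """Recompress a sum of Rk terms. Each term is (U, V): lists of length-m and
    length-n vectors representing the block sum_k outer(U[k], V[k])."""
    def entry(i, j):
        # Entry (i, j) of the sum, evaluated directly from the factors.
        return sum(u[i] * v[j] for U, V in terms for u, v in zip(U, V))
    return aca(entry, m, n, tol)
```

For a rank-2 term plus a rank-1 term that repeats one of its factor pairs, concatenation would give rank 3, whereas this sketch returns two factor pairs.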
5

Demmel, James W., Michael T. Heath, and Henk A. van der Vorst. "Parallel numerical linear algebra." Acta Numerica 2 (January 1993): 111–97. http://dx.doi.org/10.1017/s096249290000235x.

Abstract:
We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem, the nonsymmetric eigenvalue problem, and the singular value decomposition. We consider dense, band and sparse matrices.
6

Duff, Iain S., Michele Marrone, Giuseppe Radicati, and Carlo Vittoli. "Level 3 basic linear algebra subprograms for sparse matrices." ACM Transactions on Mathematical Software 23, no. 3 (September 1997): 379–401. http://dx.doi.org/10.1145/275323.275327.

7

Dodson, David S., Roger G. Grimes, and John G. Lewis. "Sparse extensions to the FORTRAN Basic Linear Algebra Subprograms." ACM Transactions on Mathematical Software 17, no. 2 (June 1991): 253–63. http://dx.doi.org/10.1145/108556.108577.

8

Dodson, David S., and John G. Lewis. "Proposed sparse extensions to the Basic Linear Algebra Subprograms." ACM SIGNUM Newsletter 20, no. 1 (January 1985): 22–25. http://dx.doi.org/10.1145/1057935.1057938.

9

Duff, Iain S., Michael A. Heroux, and Roldan Pozo. "An overview of the sparse basic linear algebra subprograms." ACM Transactions on Mathematical Software 28, no. 2 (June 2002): 239–67. http://dx.doi.org/10.1145/567806.567810.

10

Aliaga, José I., Rocío Carratalá-Sáez, and Enrique S. Quintana-Ortí. "Parallel Solution of Hierarchical Symmetric Positive Definite Linear Systems." Applied Mathematics and Nonlinear Sciences 2, no. 1 (June 22, 2017): 201–12. http://dx.doi.org/10.21042/amns.2017.1.00017.

Abstract:
We present a prototype task-parallel algorithm for the solution of hierarchical symmetric positive definite linear systems via the ℋ-Cholesky factorization that builds upon the parallel programming standards and associated runtimes for OpenMP and OmpSs. In contrast with previous efforts, our proposal decouples the numerical aspects of the linear algebra operation from the complexities associated with high performance computing. Our experiments provide an exhaustive analysis of the efficiency attained by different parallelization approaches that exploit either task-parallelism or loop-parallelism via a runtime. Alternatively, we also evaluate a solution that leverages multi-threaded parallelism via the parallel implementation of the Basic Linear Algebra Subroutines (BLAS) in Intel MKL.
11

Johnsson, S. Lennart, and Luis F. Ortiz. "Local Basic Linear Algebra Subroutines (Lblas) for Distributed Memory Architectures and Languages With Array Syntax." International Journal of Supercomputing Applications 6, no. 4 (December 1992): 322–50. http://dx.doi.org/10.1177/109434209200600403.

12

Dodson, David S., Roger G. Grimes, and John G. Lewis. "Algorithm 692: Model implementation and test package for the Sparse Basic Linear Algebra Subprograms." ACM Transactions on Mathematical Software 17, no. 2 (June 1991): 264–72. http://dx.doi.org/10.1145/108556.108582.

13

Vuik, C. "Krylov Subspace Solvers and Preconditioners." ESAIM: Proceedings and Surveys 63 (2018): 1–43. http://dx.doi.org/10.1051/proc/201863001.

Abstract:
These lecture notes present an introduction to Krylov subspace solvers and preconditioners. After the discretization of partial differential equations, large sparse systems of linear equations have to be solved, and their fast solution is of great importance today. Problems can have 10¹³ unknowns and 10¹³ equations. Iterative solution methods are the methods of choice for these large linear systems. We start with a short introduction to Basic Iterative Methods. Thereafter, preconditioned Krylov subspace methods, which are state of the art, are described. A distinction is made between various classes of matrices. At the end of the lecture notes, many references are given to state-of-the-art Scientific Computing methods. Here, we discuss a number of books that are useful for an overview of background material. First of all, the books of Golub and Van Loan [19] and Horn and Johnson [26] are classical works on all aspects of numerical linear algebra. These books also contain most of the material used for direct solvers. Varga [50] is a good starting point for studying the theory of basic iterative methods. Krylov subspace methods and multigrid are discussed in Saad [38] and Trottenberg, Oosterlee and Schüller [42]. Other books on Krylov subspace methods are [1, 6, 21, 34, 39].
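The notes begin with basic iterative methods before turning to preconditioned Krylov solvers. A minimal sketch of the simplest such method, Jacobi iteration x_{k+1} = x_k + D⁻¹(b - A x_k) with D the diagonal of A, is shown below, assuming a dense list-of-lists matrix with a nonzero diagonal; the function name `jacobi` is illustrative only. Convergence is guaranteed for the test matrix because it is strictly diagonally dominant, and the same diagonal scaling D⁻¹ is also the simplest preconditioner for Krylov methods.

```python
def jacobi(a, b, tol=1e-10, max_iter=10_000):
    """Jacobi iteration x <- x + D^{-1} (b - A x) for a dense list-of-lists A.
    Returns (x, iterations); stops when the max-norm of the residual < tol."""
    n = len(b)
    x = [0.0] * n
    for it in range(max_iter):
        # Residual r = b - A x for the current iterate.
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(ri) for ri in r) < tol:
            return x, it
        # Diagonal (Jacobi) correction: every component updated from the old x.
        x = [x[i] + r[i] / a[i][i] for i in range(n)]
    return x, max_iter
```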
14

Morris, Karla, Damian W. I. Rouson, M. Nicole Lemaster, and Salvatore Filippone. "Exploring Capabilities within ForTrilinos by Solving the 3D Burgers Equation." Scientific Programming 20, no. 3 (2012): 275–92. http://dx.doi.org/10.1155/2012/378791.

Abstract:
We present the first three-dimensional partial differential equation solver to be built atop the recently released, open-source ForTrilinos package (http://trilinos.sandia.gov/packages/fortrilinos). ForTrilinos currently provides portable, object-oriented Fortran 2003 interfaces to the C++ packages Epetra, AztecOO and Pliris in the Trilinos library and framework [ACM Trans. Math. Softw. 31(3) (2005), 397–423]. Epetra provides distributed matrix and vector storage and basic linear algebra calculations. Pliris provides direct solvers for dense linear systems. AztecOO provides iterative sparse linear solvers. We demonstrate how to build a parallel application that encapsulates the Message Passing Interface (MPI) without requiring the user to make direct calls to MPI except for startup and shutdown. The example demonstrates the level of effort required to set up a high-order, finite-difference solution on a Cartesian grid. It employs an abstract data type (ADT) calculus [Sci. Program. 16(4) (2008), 329–339] that empowers programmers to write serial code that lower-level abstractions resolve into distributed-memory, parallel implementations. The ADT calculus uses compilable Fortran constructs that resemble the mathematical formulation of the partial differential equation of interest.
15

Egunov, V. A., and A. G. Kravets. "A Method for Improving the Caching Strategy for Computing Systems with Shared Memory." Programmnaya Ingeneria 14, no. 7 (July 27, 2023): 329–38. http://dx.doi.org/10.17587/prin.14.329-338.

Abstract:
This paper considers the problem of increasing software efficiency, in terms of reducing the costs of its development and operation, when solving production and research tasks. We analyse existing approaches to this problem using the example of parameterized algorithms implementing the BLAS (Basic Linear Algebra Subroutines) operations mVm (matrix-vector multiplication) and MMM (matrix-matrix multiplication). To increase software efficiency, we propose a new design method intended to improve data caching algorithms in software developed for computing systems with a hierarchical memory structure. Using the proposed design procedure, we implement an analytical approach to evaluating software effectiveness with respect to its use of a hierarchical memory subsystem. We apply the proposed method to the two-sided Householder transformation for reducing a general matrix to Hessenberg form, and then present new algorithms for this problem that are optimized variants of the classical Householder transformation: Row-Oriented Householder and Single-Pass Householder. These algorithms can significantly reduce software execution time. Computational experiments were carried out on a parallel computing system with shared memory, one of the nodes of the computing cluster of Volgograd State Technical University. We compare the execution time of software that reduces general matrices to Hessenberg form using the proposed algorithms with that of the LAPACKE_dgehrd() function of the Intel MKL library. The conclusions of the work are confirmed by the results of these computational experiments.
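For reference, the classical (unblocked) two-sided Householder reduction to upper Hessenberg form, the baseline that the Row-Oriented and Single-Pass variants optimize, can be sketched in plain Python. This is an illustrative implementation on a dense list-of-lists matrix, not the authors' algorithms and not the blocked algorithm behind LAPACK's dgehrd; the name `hessenberg` is hypothetical. Each reflector is applied from the left to rows k+1..n-1 and from the right to columns k+1..n-1, so every step makes two sweeps over a large part of the matrix, which is the memory traffic that cache-oriented variants try to reduce.

```python
import math


def hessenberg(a):
    """Return an upper Hessenberg matrix orthogonally similar to the square
    matrix a (list of lists), using classical Householder reflectors."""
    n = len(a)
    h = [row[:] for row in a]  # work on a copy
    for k in range(n - 2):
        # Part of column k below the diagonal: x = h[k+1:, k].
        x = [h[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # already zero below the subdiagonal
        # alpha = -sign(x0) * ||x|| avoids cancellation when forming v0.
        alpha = -norm_x if x[0] > 0 else norm_x
        v = x[:]
        v[0] -= alpha
        vv = sum(t * t for t in v)
        # Left application of P = I - 2 v v^T / (v^T v) to rows k+1..n-1.
        for j in range(n):
            f = 2.0 * sum(v[i] * h[k + 1 + i][j] for i in range(len(v))) / vv
            for i in range(len(v)):
                h[k + 1 + i][j] -= f * v[i]
        # Right application to columns k+1..n-1 completes the similarity.
        for i in range(n):
            f = 2.0 * sum(h[i][k + 1 + j] * v[j] for j in range(len(v))) / vv
            for j in range(len(v)):
                h[i][k + 1 + j] -= f * v[j]
    return h
```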
16

Stringer, James C., L. Kent Thomas, and Ray G. Pierson. "Efficiency of D4 Gaussian Elimination on a Vector Computer." Society of Petroleum Engineers Journal 25, no. 01 (February 1, 1985): 121–24. http://dx.doi.org/10.2118/11082-pa.

Abstract:
The efficiency of D4 Gaussian elimination on a vector computer, the Cray-1/S, is examined. The algorithm used in this work is employed routinely in Phillips Petroleum Co. reservoir simulation models. Comparisons of scalar and vector Cray-1/S times are given for various example cases, including multiple unknowns per gridblock. Vectorization of the program on the Cray-1/S is discussed.
Introduction
In reservoir simulation, the solution of large systems of linear equations accounts for a substantial percentage of the computation time. Methods used today include both iterative and direct solution algorithms. Because of its theoretical savings in both storage and computing labor, D4 Gaussian elimination is a popular direct solution algorithm and is used widely on conventional scalar computers. In this paper we investigate the efficiency of the D4 algorithm on a computer with vector processing capabilities, the Cray-1/S. The D4 (or alternate diagonal) algorithm was originally presented by Price and Coats in 1973. Since that time much work has been done on the algorithm, including an investigation by Nolen of the vector performance of D4 on the CDC Star 100 and Cyber 203 for single-unknown-per-gridblock example cases. Levesque has presented a comparison of the Cray-1 and Cyber 205 in reservoir simulation that includes the D4 algorithm. Vector performance of the Cray-1 on linear algebra kernels, both sparse and dense, has also been reported. Vector performance on these kernels typically is expressed in millions of floating-point operations per second (MFLOPS). Our objective here is to evaluate vector performance on a typical production code written in FORTRAN for a scalar computer. Therefore, performance, or efficiency, is evaluated in terms of both scalar and vector CPU times on the Cray-1/S. We report vector performance for the original code with automatic vectorization enabled, and for the same code with minor restructuring, automatic vectorization enabled, and the use of Cray assembly language (CAL) basic linear algebra kernels. Example cases for multiple unknowns per gridblock are presented.
Reservoir Flow Equations
The reservoir flow equations written using a seven-point finite-difference formulation can be expressed in block form, where the terms A, B, ..., G are matrices of order N, N being the number of unknowns per gridblock; the unknowns at cell i, j, k form a vector of length N, and H is the vector of residuals of the flow equations at cell i, j, k at the current iteration. Values of N from 1 to 10 typically are encountered, depending on the type of simulator and the degree of implicitness used. For example, N is one for an implicit pressure, explicit saturation (IMPES) black-oil model; three for a fully implicit black-oil model; five for an implicit three-component steamflood model; and usually 10 or less for an implicit compositional model.
Driver Program
To facilitate timing studies in this work, a driver program was written to calculate coefficients for the D4 Gaussian elimination routine. Input to the program consists of grid dimensions and the number of unknowns per gridblock. All elements of the off-diagonal matrices (A, C, D, ..., G) were set equal to 1. To guarantee a nonsingular system, the B matrix was set equal to -5 for one unknown and to a specified N × N matrix for N unknowns. Right-side coefficients, H, were calculated by assuming a unit solution. No-flow boundary conditions were used, which require specific matrices, such as A for I = 1 and C for I = NX, to be set equal to zero.
Description of Hardware and Software
All run times reported in this work were obtained on the Cray-1/S, Serial No. 23, at United Computing Systems in Kansas City, MO. Serial No. 23 contains 1 million 64-bit words of central memory interleaved in 16 memory banks and no input/output (I/O) subsystems. The FORTRAN compiler used was CFT 1.09. CPU times were obtained by calling SECOND, a FORTRAN-callable utility routine that returns CPU time since the start of the job in FPS'S. CPU overhead incurred for each call to SECOND is approximately 2.5 microseconds. For all reported Cray-1/S times, "vector" refers to the original FORTRAN code run with automatic vectorization enabled, which is the normal operating mode.
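The D4 (alternate diagonal) ordering can be illustrated with a short sketch. It assumes the usual description of D4: the cells of an NX × NY × NZ grid are split by the parity of i + j + k, and each group is numbered diagonal by diagonal. Because a seven-point stencil only couples cells of opposite parity, the leading block of the reordered matrix is block diagonal (one N × N block per cell), so eliminating the first group is cheap and leaves a reduced system for the other half of the cells. Which parity is numbered first is an assumption here, and `d4_order` and `seven_point_neighbors` are hypothetical helpers, not code from the paper.

```python
def d4_order(nx, ny, nz):
    """Split grid cells (i, j, k) into two alternate-diagonal groups:
    cells with odd i+j+k first (assumed), then cells with even i+j+k,
    each group sorted by its diagonal index i+j+k."""
    cells = [(i, j, k) for i in range(nx) for j in range(ny) for k in range(nz)]
    first = sorted((c for c in cells if sum(c) % 2 == 1), key=sum)
    second = sorted((c for c in cells if sum(c) % 2 == 0), key=sum)
    return first, second


def seven_point_neighbors(c, nx, ny, nz):
    """Yield the in-grid neighbors of cell c in a seven-point stencil."""
    i, j, k = c
    for di, dj, dk in ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)):
        a, b, e = i + di, j + dj, k + dk
        if 0 <= a < nx and 0 <= b < ny and 0 <= e < nz:
            yield (a, b, e)
```

Every neighbor of a first-group cell lies in the second group, so the first group's unknowns can be eliminated block by block before Gaussian elimination is applied to the reduced system.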
17

Zhao, Liming, Zhikuan Zhao, Patrick Rebentrost, and Joseph Fitzsimons. "Compiling basic linear algebra subroutines for quantum computers." Quantum Machine Intelligence 3, no. 2 (June 18, 2021). http://dx.doi.org/10.1007/s42484-021-00048-8.

18

Šimeček, I. "Acceleration of Sparse Matrix-Vector Multiplication by Region Traversal." Acta Polytechnica 48, no. 4 (January 4, 2008). http://dx.doi.org/10.14311/1029.

Abstract:
Sparse matrix-vector multiplication (SpM×V for short) is one of the most common subroutines in numerical linear algebra. The problem is that the memory access patterns during SpM×V are irregular, and cache utilization can suffer from low spatial or temporal locality. Approaches to improving the performance of SpM×V are based on matrix reordering and register blocking. These matrix transformations are designed to handle randomly occurring dense blocks in a sparse matrix, so their efficiency depends strongly on the presence of suitable blocks. The overhead of converting a matrix from one format to another is often of the order of tens of executions of SpM×V. For this reason, such a reorganization pays off only if the same matrix A is multiplied by multiple different vectors, e.g., in iterative linear solvers.
This paper introduces an unusual approach to accelerating SpM×V. It can be combined with other acceleration approaches and consists of three steps:
1) dividing matrix A into non-empty regions,
2) choosing an efficient way to traverse these regions (in other words, choosing an efficient ordering of partial multiplications),
3) choosing the optimal type of storage for each region.
All three steps are tightly coupled. The first step divides the whole matrix into smaller parts (regions) that can fit in the cache. The second step improves locality during multiplication through better utilization of distant references. The last step maximizes the machine computation performance of the partial multiplication for each region.
In this paper, we describe these three steps in more detail (including fast and inexpensive algorithms for all steps). Our measurements show that our approach gives a significant speedup for almost all matrices arising from various technical areas.
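Steps 1 and 2 can be sketched in a few lines. The matrix, given as COO triples, is cut into fixed-size rectangular regions; empty regions are never stored; and the regions are visited in Z-order (Morton order) so that consecutive partial multiplications touch nearby parts of x and y. The fixed region size, the Z-order traversal and the single COO storage format for every region are assumptions made for illustration; the paper selects regions, traversal order and per-region storage (step 3) with its own algorithms, which are not reproduced here. `build_regions`, `morton_key` and `spmv_by_regions` are hypothetical names.

```python
def build_regions(entries, block_rows, block_cols):
    """Step 1: group COO entries (i, j, a) into non-empty rectangular regions,
    keyed by region coordinates (block row, block column)."""
    regions = {}
    for i, j, a in entries:
        regions.setdefault((i // block_rows, j // block_cols), []).append((i, j, a))
    return regions


def morton_key(bi, bj, bits=16):
    """Interleave the bits of (bi, bj) to obtain a Z-order (Morton) key."""
    key = 0
    for b in range(bits):
        key |= ((bi >> b) & 1) << (2 * b + 1)
        key |= ((bj >> b) & 1) << (2 * b)
    return key


def spmv_by_regions(regions, x, m):
    """Step 2: visit regions in Z-order and accumulate y = A x one partial
    multiplication (one region) at a time."""
    y = [0.0] * m
    for rc in sorted(regions, key=lambda rc: morton_key(*rc)):
        for i, j, a in regions[rc]:
            y[i] += a * x[j]
    return y
```

The result is identical to an ordinary SpM×V up to floating-point summation order; only the order in which regions, and hence cache lines of x and y, are touched changes.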