Journal articles on the topic "Sparse Basic Linear Algebra Subroutines"

Consult the top 18 journal articles for research on the topic "Sparse Basic Linear Algebra Subroutines".

1

Yang, Bing, Xi Chen, Xiang Yun Liao, Mian Lun Zheng, and Zhi Yong Yuan. "FEM-Based Modeling and Deformation of Soft Tissue Accelerated by CUSPARSE and CUBLAS." Advanced Materials Research 671-674 (March 2013): 3200–3203. http://dx.doi.org/10.4028/www.scientific.net/amr.671-674.3200.

Abstract:
Realistic modeling and deformation of soft tissue is one of the key technologies of virtual surgery simulation, a challenging research field that stimulates the development of new clinical applications such as the virtual surgery simulator. In this paper we adopt the linear FEM (Finite Element Method) with the sparse matrix stored in CSR (Compressed Sparse Row) format, which enables fast modeling and deformation of soft tissue on GPU hardware with NVIDIA's CUSPARSE (Compute Unified Device Architecture Sparse Matrix) and CUBLAS (Compute Unified Device Architecture Basic Linear Algebra Subroutines) libraries. We focus on the CGS (Conjugate Gradient Solver), which is the most time-consuming part of FEM, and port it to the GPU with the two libraries mentioned above. The experimental results show that the acceleration method in this paper can achieve realistic and fast modeling and deformation simulation of soft tissue.
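As an illustration of the two ingredients this abstract names, CSR storage and a conjugate gradient solver, the following is a minimal CPU-only sketch in plain Python. It is not the authors' GPU code: the function names `csr_matvec` and `conjugate_gradient` and the 3x3 test matrix are assumptions made for this example. On a GPU, the sparse matrix-vector product would typically come from a sparse library and the dot products and vector updates from a dense BLAS.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a dense vector x."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        s = 0.0
        # Nonzeros of row i live in values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y[i] = s
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite CSR matrix A."""
    x = [0.0] * len(b)
    r = b[:]            # residual b - A x for the zero initial guess
    p = r[:]            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical test system: A = [[4,-1,0],[-1,4,-1],[0,-1,4]], b = A @ [1,2,3].
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
b = [2.0, 4.0, 10.0]
x_cg = conjugate_gradient(values, col_idx, row_ptr, b)
```

Each iteration needs one sparse matrix-vector product plus a few dot products and vector updates, which is why the solver maps naturally onto a sparse library and a dense BLAS.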
2

Magnin, H., and J. L. Coulomb. "A parallel and vectorial implementation of basic linear algebra subroutines in iterative solving of large sparse linear systems of equations." IEEE Transactions on Magnetics 25, no. 4 (July 1989): 2895–97. http://dx.doi.org/10.1109/20.34317.

3

Kramer, David, S. Lennart Johnsson, and Yu Hu. "Local Basic Linear Algebra Subroutines (LBLAS) for the CM-5/5E." International Journal of Supercomputer Applications and High Performance Computing 10, no. 4 (December 1996): 300–335. http://dx.doi.org/10.1177/109434209601000403.

4

Shaeffer, John. "BLAS IV: A BLAS for Rk Matrix Algebra." Applied Computational Electromagnetics Society 35, no. 11 (February 3, 2021): 1266–67. http://dx.doi.org/10.47037/2020.aces.j.351102.

Abstract:
Basic Linear Algebra Subroutines (BLAS) are well-known low-level workhorse subroutines for linear algebra vector-vector, matrix-vector and matrix-matrix operations for full rank matrices. With the advent of block low rank (Rk) full wave direct solvers, where most blocks of the system matrix are Rk, an extension to the BLAS III matrix-matrix workhorse routine is needed due to the agony of Rk addition. This note outlines the problem of BLAS III for Rk LU and solve operations and then outlines an alternative approach, which we will call BLAS IV. This approach utilizes the thrill of Rk matrix-matrix multiply and uses the Adaptive Cross Approximation (ACA) as a methodology to evaluate sums of Rk terms to circumvent the agony of low rank addition.
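The contrast between Rk multiplication and Rk addition can be seen in a small sketch. Assume each low-rank block is stored as a pair of factors (U, V) with A = U V^T. Then a product of two such blocks stays in factored form with no rank growth, while a naive sum concatenates the factors and the stored rank adds up. The helper names below are hypothetical, and the sketch does not implement ACA, which the note uses to recompress such sums.

```python
def matmul(a, b):
    """Dense product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def to_dense(f):
    """Expand a factored block (U, V) into the dense matrix U V^T."""
    u, v = f
    return matmul(u, transpose(v))


def stored_rank(f):
    """Number of columns in U, i.e. the rank of the stored representation."""
    return len(f[0][0])


def rk_multiply(f1, f2):
    """(U1 V1^T)(U2 V2^T) = (U1 (V1^T U2)) V2^T: stored rank is that of f2."""
    u1, v1 = f1
    u2, v2 = f2
    core = matmul(transpose(v1), u2)   # small r1 x r2 matrix
    return (matmul(u1, core), v2)


def rk_add(f1, f2):
    """U1 V1^T + U2 V2^T = [U1 U2][V1 V2]^T: stored ranks add up."""
    u1, v1 = f1
    u2, v2 = f2
    return ([r1 + r2 for r1, r2 in zip(u1, u2)],
            [r1 + r2 for r1, r2 in zip(v1, v2)])


# Two hypothetical rank-1 blocks of size 3 x 3.
a_rk = ([[1], [2], [3]], [[1], [0], [1]])
b_rk = ([[0], [1], [1]], [[2], [1], [0]])
prod_rk = rk_multiply(a_rk, b_rk)
sum_rk = rk_add(a_rk, b_rk)
```

Here `stored_rank(prod_rk)` is 1 while `stored_rank(sum_rk)` is 2, so repeated additions keep inflating the factors unless the sum is recompressed.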
5

Demmel, James W., Michael T. Heath, and Henk A. van der Vorst. "Parallel numerical linear algebra." Acta Numerica 2 (January 1993): 111–97. http://dx.doi.org/10.1017/s096249290000235x.

Abstract:
We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then, we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem, the nonsymmetric eigenvalue problem, and the singular value decomposition. We consider dense, band and sparse matrices.
6

Duff, Iain S., Michele Marrone, Giuseppe Radicati, and Carlo Vittoli. "Level 3 basic linear algebra subprograms for sparse matrices." ACM Transactions on Mathematical Software 23, no. 3 (September 1997): 379–401. http://dx.doi.org/10.1145/275323.275327.

7

Dodson, David S., Roger G. Grimes, and John G. Lewis. "Sparse extensions to the FORTRAN Basic Linear Algebra Subprograms." ACM Transactions on Mathematical Software 17, no. 2 (June 1991): 253–63. http://dx.doi.org/10.1145/108556.108577.

8

Dodson, David S., and John G. Lewis. "Proposed sparse extensions to the Basic Linear Algebra Subprograms." ACM SIGNUM Newsletter 20, no. 1 (January 1985): 22–25. http://dx.doi.org/10.1145/1057935.1057938.

9

Duff, Iain S., Michael A. Heroux, and Roldan Pozo. "An overview of the sparse basic linear algebra subprograms." ACM Transactions on Mathematical Software 28, no. 2 (June 2002): 239–67. http://dx.doi.org/10.1145/567806.567810.

10

Aliaga, José I., Rocío Carratalá-Sáez, and Enrique S. Quintana-Ortí. "Parallel Solution of Hierarchical Symmetric Positive Definite Linear Systems." Applied Mathematics and Nonlinear Sciences 2, no. 1 (June 22, 2017): 201–12. http://dx.doi.org/10.21042/amns.2017.1.00017.

Abstract:
We present a prototype task-parallel algorithm for the solution of hierarchical symmetric positive definite linear systems via the ℋ-Cholesky factorization that builds upon the parallel programming standards and associated runtimes for OpenMP and OmpSs. In contrast with previous efforts, our proposal decouples the numerical aspects of the linear algebra operation from the complexities associated with high performance computing. Our experiments make an exhaustive analysis of the efficiency attained by different parallelization approaches that exploit either task-parallelism or loop-parallelism via a runtime. Alternatively, we also evaluate a solution that leverages multi-threaded parallelism via the parallel implementation of the Basic Linear Algebra Subroutines (BLAS) in Intel MKL.
11

Johnsson, S. Lennart, and Luis F. Ortiz. "Local Basic Linear Algebra Subroutines (Lblas) for Distributed Memory Architectures and Languages With Array Syntax." International Journal of Supercomputing Applications 6, no. 4 (December 1992): 322–50. http://dx.doi.org/10.1177/109434209200600403.

12

Dodson, David S., Roger G. Grimes, and John G. Lewis. "Algorithm 692: Model implementation and test package for the Sparse Basic Linear Algebra Subprograms." ACM Transactions on Mathematical Software 17, no. 2 (June 1991): 264–72. http://dx.doi.org/10.1145/108556.108582.

13

Vuik, C. "Krylov Subspace Solvers and Preconditioners." ESAIM: Proceedings and Surveys 63 (2018): 1–43. http://dx.doi.org/10.1051/proc/201863001.

Abstract:
In these lecture notes an introduction to Krylov subspace solvers and preconditioners is presented. After discretization of partial differential equations, large, sparse systems of linear equations have to be solved. Fast solution of these systems is very urgent nowadays. The size of the problems can be 10^13 unknowns and 10^13 equations. Iterative solution methods are the methods of choice for these large linear systems. We start with a short introduction of Basic Iterative Methods. Thereafter preconditioned Krylov subspace methods, which are state of the art, are described. A distinction is made between various classes of matrices. At the end of the lecture notes many references are given to state of the art Scientific Computing methods. Here, we will discuss a number of books which are nice to use for an overview of background material. First of all the books of Golub and Van Loan [19] and Horn and Johnson [26] are classical works on all aspects of numerical linear algebra. These books also contain most of the material which is used for direct solvers. Varga [50] is a good starting point to study the theory of basic iterative methods. Krylov subspace methods and multigrid are discussed in Saad [38] and Trottenberg, Oosterlee and Schüller [42]. Other books on Krylov subspace methods are [1, 6, 21, 34, 39].
14

Morris, Karla, Damian W. I. Rouson, M. Nicole Lemaster, and Salvatore Filippone. "Exploring Capabilities within ForTrilinos by Solving the 3D Burgers Equation." Scientific Programming 20, no. 3 (2012): 275–92. http://dx.doi.org/10.1155/2012/378791.

Abstract:
We present the first three-dimensional, partial differential equation solver to be built atop the recently released, open-source ForTrilinos package (http://trilinos.sandia.gov/packages/fortrilinos). ForTrilinos currently provides portable, object-oriented Fortran 2003 interfaces to the C++ packages Epetra, AztecOO and Pliris in the Trilinos library and framework [ACM Trans. Math. Softw. 31(3) (2005), 397–423]. Epetra provides distributed matrix and vector storage and basic linear algebra calculations. Pliris provides direct solvers for dense linear systems. AztecOO provides iterative sparse linear solvers. We demonstrate how to build a parallel application that encapsulates the Message Passing Interface (MPI) without requiring the user to make direct calls to MPI except for startup and shutdown. The presented example demonstrates the level of effort required to set up a high-order, finite-difference solution on a Cartesian grid. The example employs an abstract data type (ADT) calculus [Sci. Program. 16(4) (2008), 329–339] that empowers programmers to write serial code that lower-level abstractions resolve into distributed-memory, parallel implementations. The ADT calculus uses compilable Fortran constructs that resemble the mathematical formulation of the partial differential equation of interest.
15

Egunov, V. A., and A. G. Kravets. "A Method for Improving the Caching Strategy for Computing Systems with Shared Memory." Programmnaya Ingeneria 14, no. 7 (July 27, 2023): 329–38. http://dx.doi.org/10.17587/prin.14.329-338.

Abstract:
This paper considers the problem of increasing software efficiency by reducing the costs of software development and operation when solving production and research tasks. We analysed the existing approaches to this problem using the example of parameterized algorithms implementing the mVm (matrix-vector multiplication) and MMM (matrix-matrix multiplication) BLAS (Basic Linear Algebra Subroutines) operations. To increase software efficiency, we proposed a new design method intended to improve data caching algorithms in software development for computing systems with a hierarchical memory structure. Using the proposed design procedure, we developed an analytical approach to evaluating software effectiveness from the point of view of using a memory subsystem with a hierarchical structure. We applied the proposed method to the two-sided Householder transformation for the task of reducing a general-form matrix to Hessenberg form. Then we presented new algorithms for solving the problem, which are optimized variants of the classical Householder transformation: Row-Oriented Householder and Single-Pass Householder. The use of these algorithms can significantly reduce the software execution time. Computational experiments were carried out on a parallel computing system with shared memory, which is one of the nodes of the computing cluster of the Volgograd State Technical University. We compared the execution time of software that reduces general-form matrices to Hessenberg form, written using the proposed algorithms, with that of the LAPACKE_dgehrd() function of the Intel MKL library. The conclusions made in the work are confirmed by the results of the conducted computational experiments.
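For readers unfamiliar with the operation being optimized, here is a minimal sketch of the classical two-sided Householder reduction of a general matrix to upper Hessenberg form, in plain Python. It is the textbook variant, not the Row-Oriented or Single-Pass algorithms proposed in the paper, and it makes no attempt at cache efficiency. The function name `hessenberg` and the 4x4 test matrix are assumptions made for this example.

```python
import math


def hessenberg(a):
    """Return H = Q^T A Q with zeros below the first subdiagonal."""
    n = len(a)
    h = [row[:] for row in a]
    for k in range(n - 2):
        # Part of column k that must be reduced to a multiple of e1.
        x = [h[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column is already in Hessenberg form
        # Sign chosen to avoid cancellation in x[0] - alpha.
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        norm_v = math.sqrt(sum(t * t for t in v))
        v = [t / norm_v for t in v]
        m = len(v)
        # Left update: rows k+1..n-1 become (I - 2 v v^T) times those rows.
        for j in range(n):
            d = sum(v[i] * h[k + 1 + i][j] for i in range(m))
            for i in range(m):
                h[k + 1 + i][j] -= 2.0 * v[i] * d
        # Right update: columns k+1..n-1 become those columns times (I - 2 v v^T).
        for i in range(n):
            d = sum(h[i][k + 1 + j] * v[j] for j in range(m))
            for j in range(m):
                h[i][k + 1 + j] -= 2.0 * d * v[j]
    return h


# Hypothetical nonsymmetric test matrix.
a_example = [[4.0, 1.0, 2.0, 3.0],
             [2.0, 3.0, 0.0, 1.0],
             [1.0, 0.0, 2.0, 5.0],
             [3.0, 1.0, 1.0, 1.0]]
h_example = hessenberg(a_example)
```

The left update sweeps the matrix by rows and the right update by columns, so each step touches the trailing part of the matrix twice. That double pass over memory is the kind of access pattern a cache-oriented redesign would target.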
16

Stringer, James C., L. Kent Thomas, and Ray G. Pierson. "Efficiency of D4 Gaussian Elimination on a Vector Computer." Society of Petroleum Engineers Journal 25, no. 01 (February 1, 1985): 121–24. http://dx.doi.org/10.2118/11082-pa.

Abstract:
The efficiency of D4 Gaussian elimination on a vector computer, the Cray-1/S, is examined. The algorithm used in this work is employed routinely in Phillips Petroleum Co. reservoir simulation models. Comparisons of scalar and vector Cray-1/S times are given for various example cases including multiple unknowns per gridblock. Vectorization of the program on the Cray-1/S is discussed.

Introduction. In reservoir simulation, the solution of large systems of linear equations accounts for a substantial percentage of the computation time. Methods used today consist of both iterative and direct solution algorithms. Because of the theoretical savings in both storage and computing labor, D4 Gaussian elimination is a popular direct solution algorithm and is used widely on conventional scalar computers. In this paper we investigate the efficiency of the D4 algorithm on a computer with vector processing capabilities, the Cray-1/S. The D4 (or alternate diagonal) algorithm originally was presented by Price and Coats in 1973. Since that time much work has been done on the algorithm, including an investigation by Nolen on the vector performance of D4 on the CDC Star 100 and Cyber 203 on single-unknown-per-gridblock example cases. Levesque has presented a comparison of the Cray-1 and Cyber 205 in reservoir simulation that includes the D4 algorithm. Vector performance of the Cray-1 on linear algebra kernels, both sparse and dense, also has been reported. Vector performance on these kernels typically is expressed in terms of million floating point operations per second (MFLOPS). Our objective here is to evaluate vector performance on a typical production code written in FORTRAN for a scalar computer. Therefore, performance, or efficiency, will be evaluated in terms of both scalar and vector CPU times on the Cray-1/S. We include vector performance on the original code with automatic vectorization enabled, and vector performance on the same code with minor restructuring, automatic vectorization enabled, and the use of Cray assembly language (CAL) basic linear algebra kernels. Example cases for multiple unknowns per gridblock are presented.

Reservoir Flow Equations. The reservoir flow equations written using a seven-point finite difference formulation can be expressed as Eq. (1) (not reproduced in this excerpt), where the terms A, B, ..., G are matrices of order N equal to the number of unknowns per gridblock. The solution vector in Eq. (1) represents the unknowns at cell i, j, k, and H is the vector of residuals of the flow equations at cell i, j, k at a given iteration. Values of N from 1 to 10 typically are encountered depending on the type of simulator and the degree of implicitness used. For example, N is equal to one for an implicit pressure, explicit saturation (IMPES) black-oil model; three for a fully implicit black-oil model; five for an implicit three-component steamflood model and usually 10 or less for an implicit compositional model.

Driver Program. To facilitate timing studies in this work, a driver program was written to calculate coefficients for the D4 Gaussian elimination routine. Input to the program consists of grid dimensions and the number of unknowns per gridblock. All elements of the off-diagonal matrices (A, C, D, ..., G) were set equal to 1. To guarantee a nonsingular solution, the B matrix was set equal to -5 for one unknown and as given by Eq. (2) (not reproduced in this excerpt) for N unknowns. Right-side coefficients, H, were calculated by assuming a unit solution for the unknowns. No-flow boundary conditions were used, which require specific matrices, such as A for I = 1 and C for I = NX, to be set equal to zero.

Description of Hardware and Software. All run times reported in this work were obtained on the Cray-1/S, Serial No. 23, at United Computing Systems in Kansas City, MO. Serial No. 23 contains 1 million 64-bit words of central memory interleaved in 16 memory banks and no input/output (I/O) subsystems. The FORTRAN compiler used was CFT 1.09. CPU times were obtained by calling SECOND, a FORTRAN-callable utility routine that returns CPU time since the start of the job in FPS'S. CPU overhead incurred for each call to SECOND is approximately 2.5 microseconds. For all reported Cray-1/S times, "vector" refers to the original FORTRAN code run with automatic vectorization enabled, which is the normal operating mode.
17

Zhao, Liming, Zhikuan Zhao, Patrick Rebentrost, and Joseph Fitzsimons. "Compiling basic linear algebra subroutines for quantum computers." Quantum Machine Intelligence 3, no. 2 (June 18, 2021). http://dx.doi.org/10.1007/s42484-021-00048-8.

18

Šimeček, I. "Acceleration of Sparse Matrix-Vector Multiplication by Region Traversal." Acta Polytechnica 48, no. 4 (January 4, 2008). http://dx.doi.org/10.14311/1029.

Abstract:
Sparse matrix-vector multiplication (SpM×V for short) is one of the most common subroutines in numerical linear algebra. The problem is that the memory access patterns during SpM×V are irregular, and utilization of the cache can suffer from low spatial or temporal locality. Approaches to improve the performance of SpM×V are based on matrix reordering and register blocking. These matrix transformations are designed to handle randomly occurring dense blocks in a sparse matrix. The efficiency of these transformations depends strongly on the presence of suitable blocks. The overhead of reorganization of a matrix from one format to another is often of the order of tens of executions of SpM×V. For this reason, such a reorganization pays off only if the same matrix A is multiplied by multiple different vectors, e.g., in iterative linear solvers.

This paper introduces an unusual approach to accelerate SpM×V. This approach can be combined with other acceleration approaches and consists of three steps:
1) dividing matrix A into non-empty regions,
2) choosing an efficient way to traverse these regions (in other words, choosing an efficient ordering of partial multiplications),
3) choosing the optimal type of storage for each region.

All three steps are tightly coupled. The first step divides the whole matrix into smaller parts (regions) that can fit in the cache. The second step improves the locality during multiplication due to better utilization of distant references. The last step maximizes the machine computation performance of the partial multiplication for each region. In this paper, we describe aspects of these three steps in more detail (including fast and time-inexpensive algorithms for all steps). Our measurements prove that our approach gives a significant speedup for almost all matrices arising from various technical areas.
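The first two steps can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the paper's algorithm: regions are fixed square tiles of a chosen size, they are traversed in plain row-major order, and every region uses the same coordinate (row, column, value) storage, whereas the paper chooses the traversal order and the storage type per region. The names `split_into_regions` and `spmv_by_regions` are hypothetical.

```python
def split_into_regions(entries, region_size):
    """Group (row, col, value) triples by the square tile they fall in.

    Only non-empty regions get a key, matching step 1 of the abstract.
    """
    regions = {}
    for i, j, v in entries:
        key = (i // region_size, j // region_size)
        regions.setdefault(key, []).append((i, j, v))
    return regions


def spmv_by_regions(entries, n_rows, x, region_size):
    """Compute y = A x as a sum of partial products, one per region."""
    regions = split_into_regions(entries, region_size)
    y = [0.0] * n_rows
    # Assumed traversal: row-major order over the non-empty regions.
    for key in sorted(regions):
        for i, j, v in regions[key]:
            # Within one region, only region_size entries of x and y are touched.
            y[i] += v * x[j]
    return y


def spmv_naive(entries, n_rows, x):
    """Reference product that ignores regions."""
    y = [0.0] * n_rows
    for i, j, v in entries:
        y[i] += v * x[j]
    return y


# Hypothetical 4 x 4 sparse matrix in coordinate form.
entries = [(0, 0, 2.0), (0, 3, 1.0), (1, 1, 3.0), (2, 0, 4.0),
           (2, 2, 5.0), (3, 1, 6.0), (3, 3, 7.0)]
x = [1.0, 2.0, 3.0, 4.0]
y_regions = spmv_by_regions(entries, 4, x, region_size=2)
y_naive = spmv_naive(entries, 4, x)
```

Because each partial product reads and writes only a small slice of x and y, a region sized to fit in cache keeps those slices resident while the region is processed.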