Journal articles on the topic 'Sparse Matrix Storage Formats'

Consult the top 50 journal articles for your research on the topic 'Sparse Matrix Storage Formats.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Langr, Daniel, and Pavel Tvrdik. "Evaluation Criteria for Sparse Matrix Storage Formats." IEEE Transactions on Parallel and Distributed Systems 27, no. 2 (February 1, 2016): 428–40. http://dx.doi.org/10.1109/tpds.2015.2401575.

2

Mukaddes, Abul Mukid Mohammad, Masao Ogino, and Ryuji Shioya. "Performance Evaluation of Domain Decomposition Method with Sparse Matrix Storage Schemes in Modern Supercomputer." International Journal of Computational Methods 11, supp01 (November 2014): 1344007. http://dx.doi.org/10.1142/s0219876213440076.

Abstract:
The use of proper data structures with corresponding algorithms is critical to achieving good performance in scientific computing. The need for sparse matrix-vector multiplication in each iteration of the iterative domain decomposition method has led to the implementation of a variety of sparse matrix storage formats. Many storage formats have been proposed to represent sparse matrices and have been integrated into the method. In this paper, the storage efficiency of these sparse matrix storage formats is evaluated and compared, and the performance of the sparse matrix-vector multiplication used in the domain decomposition method is considered. Based on our experiments on the FX10 supercomputer system, some useful conclusions that can serve as guidelines for the optimization of the domain decomposition method are extracted.
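Several entries below assume familiarity with the compressed sparse row (CSR) layout and the SpMV kernel it supports. As a reference point, here is a minimal C++ sketch of both; the struct and array names are our own and are not taken from any of the cited papers.

```cpp
#include <vector>

// Compressed Sparse Row (CSR): nonzero values and their column indices,
// plus one offset per row delimiting where that row's entries start.
struct CsrMatrix {
    int rows = 0;
    std::vector<int> row_ptr;       // size rows + 1
    std::vector<int> col_idx;       // size nnz
    std::vector<double> values;     // size nnz
};

// y = A * x: the SpMV kernel that many of the cited studies benchmark.
void spmv_csr(const CsrMatrix& A, const std::vector<double>& x, std::vector<double>& y) {
    for (int i = 0; i < A.rows; ++i) {
        double sum = 0.0;
        for (int p = A.row_ptr[i]; p < A.row_ptr[i + 1]; ++p)
            sum += A.values[p] * x[A.col_idx[p]];
        y[i] = sum;
    }
}
```

The indirect access x[col_idx[p]] is the irregular memory pattern that most of the formats surveyed in this list try to tame.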
3

Chen, Shizhao, Jianbin Fang, Chuanfu Xu, and Zheng Wang. "Adaptive Hybrid Storage Format for Sparse Matrix–Vector Multiplication on Multi-Core SIMD CPUs." Applied Sciences 12, no. 19 (September 29, 2022): 9812. http://dx.doi.org/10.3390/app12199812.

Abstract:
Optimizing sparse matrix–vector multiplication (SpMV) is challenging due to the non-uniform distribution of the non-zero elements of the sparse matrix. The best-performing SpMV format changes depending on the input matrix and the underlying architecture, and there is no "one-size-fits-all" format. A hybrid scheme combining multiple SpMV storage formats allows one to choose an appropriate format for the target matrix and hardware. However, existing hybrid approaches are inadequate for utilizing the SIMD units of modern multi-core CPUs, and it remains unclear how to best mix different SpMV formats for a given matrix. This paper presents a new hybrid storage format for sparse matrices, specifically targeting multi-core CPUs with SIMD units. Our approach partitions the target sparse matrix into two segments based on the regularity of the memory access pattern, where each segment is stored in a format suited to its access pattern. Unlike prior hybrid storage schemes that rely on the user to determine the data partition among storage formats, we employ machine learning to build a predictive model that automatically determines the partition threshold on a per-matrix basis. Our predictive model is first trained offline, and the trained model can be applied to any new, unseen sparse matrix. We apply our approach to 956 matrices and evaluate its performance on three distinct multi-core CPU platforms: a 72-core Intel Knights Landing (KNL) CPU, a 128-core AMD EPYC CPU, and a 64-core Phytium ARMv8 CPU. Experimental results show that our hybrid scheme, combined with the predictive model, outperforms the best-performing alternative by 2.9%, 17.5%, and 16% on average on KNL, AMD, and Phytium, respectively.
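A rough illustration of the row-partitioning idea described in this abstract: rows are split by their nonzero count into a regular segment (suited to a vectorization-friendly layout) and an irregular remainder (kept in CSR, say). The fixed threshold and segment labels below are assumptions for illustration; the paper instead learns the partition threshold per matrix.

```cpp
#include <vector>

// Split row indices into two segments by nonzeros per row, given CSR row
// pointers. In the paper the partition threshold is predicted by a trained
// model; here it is a fixed, illustrative parameter.
void partition_rows(const std::vector<int>& row_ptr, int threshold,
                    std::vector<int>& regular_rows, std::vector<int>& irregular_rows) {
    const int rows = static_cast<int>(row_ptr.size()) - 1;
    for (int i = 0; i < rows; ++i) {
        const int nnz_in_row = row_ptr[i + 1] - row_ptr[i];
        (nnz_in_row <= threshold ? regular_rows : irregular_rows).push_back(i);
    }
}
```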
4

Sanderson, Conrad, and Ryan Curtin. "Practical Sparse Matrices in C++ with Hybrid Storage and Template-Based Expression Optimisation." Mathematical and Computational Applications 24, no. 3 (July 19, 2019): 70. http://dx.doi.org/10.3390/mca24030070.

Abstract:
Despite the importance of sparse matrices in numerous fields of science, software implementations remain difficult to use for non-expert users, generally requiring the understanding of the underlying details of the chosen sparse matrix storage format. In addition, to achieve good performance, several formats may need to be used in one program, requiring explicit selection and conversion between the formats. This can be both tedious and error-prone, especially for non-expert users. Motivated by these issues, we present a user-friendly and open-source sparse matrix class for the C++ language, with a high-level application programming interface deliberately similar to the widely-used MATLAB language. This facilitates prototyping directly in C++ and aids the conversion of research code into production environments. The class internally uses two main approaches to achieve efficient execution: (i) a hybrid storage framework, which automatically and seamlessly switches between three underlying storage formats (compressed sparse column, red-black tree, coordinate list) depending on which format is best suited and/or available for specific operations, and (ii) a template-based meta-programming framework to automatically detect and optimise the execution of common expression patterns. Empirical evaluations on large sparse matrices with various densities of non-zero elements demonstrate the advantages of the hybrid storage framework and the expression optimisation mechanism.
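A simplified sketch of the hybrid-storage idea from this abstract: accumulate entries in a coordinate list, which is cheap to append to, and convert to compressed sparse column when arithmetic is needed. The class and member names are invented for illustration and do not reflect the cited library's actual design.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Simplified hybrid storage: entries are accumulated in coordinate (COO)
// form, which is cheap to append to, and converted to compressed sparse
// column (CSC) when arithmetic needs it (duplicates are not merged here).
struct HybridSparse {
    int rows = 0, cols = 0;
    std::vector<std::tuple<int, int, double>> coo;   // (col, row, value) triples

    std::vector<int> col_ptr, row_idx;               // CSC arrays
    std::vector<double> values;

    void insert(int r, int c, double v) { coo.emplace_back(c, r, v); }

    void to_csc() {
        std::sort(coo.begin(), coo.end());           // order by column, then row
        col_ptr.assign(cols + 1, 0);
        row_idx.clear();
        values.clear();
        for (const auto& [c, r, v] : coo) {
            ++col_ptr[c + 1];
            row_idx.push_back(r);
            values.push_back(v);
        }
        for (int c = 0; c < cols; ++c)                // prefix sums give column offsets
            col_ptr[c + 1] += col_ptr[c];
    }
};
```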
5

Fraguela, Basilio B., Ramón Doallo, and Emilio L. Zapata. "Memory Hierarchy Performance Prediction for Blocked Sparse Algorithms." Parallel Processing Letters 09, no. 03 (September 1999): 347–60. http://dx.doi.org/10.1142/s0129626499000323.

Abstract:
Nowadays, the performance gap between processors and main memory makes efficient usage of the memory hierarchy necessary for good program performance. Several techniques have been proposed for this purpose. Nevertheless, most of them consider only regular access patterns, while many scientific and numerical applications give rise to irregular patterns. A typical case is that of indirect accesses due to the use of compressed storage formats for sparse matrices. This paper describes an analytic approach to model both regular and irregular access patterns. The application modeled is an optimized sparse matrix-dense matrix product algorithm with several levels of blocking. Our model can be directly applied to any memory hierarchy consisting of K-way associative caches. Results are shown for several current microprocessor architectures.
6

Smith, Barry F., and William D. Gropp. "The Design of Data-Structure-Neutral Libraries for the Iterative Solution of Sparse Linear Systems." Scientific Programming 5, no. 4 (1996): 329–36. http://dx.doi.org/10.1155/1996/417629.

Abstract:
Over the past few years several proposals have been made for the standardization of sparse matrix storage formats in order to allow for the development of portable matrix libraries for the iterative solution of linear systems. We believe that this is the wrong approach. Rather than define one standard (or a small number of standards) for matrix storage, the community should define an interface (i.e., the calling sequences) for the functions that act on the data. In addition, we cannot ignore the interface to the vector operations because, in many applications, vectors may not be stored as consecutive elements in memory. With the acceptance of shared memory, distributed memory, and cluster memory parallel machines, the flexibility of the distribution of the elements of vectors is also extremely important. This issue is ignored in most proposed standards. In this article we demonstrate how such libraries may be written using data encapsulation techniques.
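A minimal sketch of the data-structure-neutral approach advocated here: the iterative solver is written against an abstract operator interface, and each storage format supplies its own implementation. The interface below is invented for illustration, not the authors' actual API.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// The iterative solver is written against this interface only; it never
// sees how the matrix (or the vectors) are actually stored.
class LinearOperator {
public:
    virtual ~LinearOperator() = default;
    virtual std::size_t size() const = 0;
    virtual void apply(const std::vector<double>& x, std::vector<double>& y) const = 0;  // y = A * x
};

// One possible backing store: a diagonal operator. CSR, blocked, or
// matrix-free implementations plug in without changing the solver code.
class DiagonalOperator : public LinearOperator {
    std::vector<double> d_;
public:
    explicit DiagonalOperator(std::vector<double> d) : d_(std::move(d)) {}
    std::size_t size() const override { return d_.size(); }
    void apply(const std::vector<double>& x, std::vector<double>& y) const override {
        for (std::size_t i = 0; i < d_.size(); ++i) y[i] = d_[i] * x[i];
    }
};
```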
7

Guo, Dahai, and William Gropp. "Applications of the streamed storage format for sparse matrix operations." International Journal of High Performance Computing Applications 28, no. 1 (January 3, 2013): 3–12. http://dx.doi.org/10.1177/1094342012470469.

8

Akhunov, R. R., S. P. Kuksenko, V. K. Salov, and T. R. Gazizov. "Sparse matrix storage formats and acceleration of iterative solution of linear algebraic systems with dense matrices." Journal of Mathematical Sciences 191, no. 1 (April 21, 2013): 10–18. http://dx.doi.org/10.1007/s10958-013-1296-7.

9

Merrill, Duane, and Michael Garland. "Merge-based sparse matrix-vector multiplication (SpMV) using the CSR storage format." ACM SIGPLAN Notices 51, no. 8 (November 9, 2016): 1–2. http://dx.doi.org/10.1145/3016078.2851190.

10

Zhang, Jilin, Jian Wan, Fangfang Li, Jie Mao, Li Zhuang, Junfeng Yuan, Enyi Liu, and Zhuoer Yu. "Efficient sparse matrix–vector multiplication using cache oblivious extension quadtree storage format." Future Generation Computer Systems 54 (January 2016): 490–500. http://dx.doi.org/10.1016/j.future.2015.03.005.

11

Smith, Barry, and Hong Zhang. "Sparse triangular solves for ILU revisited: data layout crucial to better performance." International Journal of High Performance Computing Applications 25, no. 4 (December 5, 2010): 386–91. http://dx.doi.org/10.1177/1094342010389857.

Abstract:
A key to good processor utilization for sparse matrix computations is storing the data in the format that is most conducive to fast access by the memory system. In particular, for sparse matrix triangular solves the traditional compressed sparse matrix format is poor, and minor adjustments to the data structure can increase the processor utilization dramatically. Such adjustments involve storing the L and U factors separately and storing the U rows 'backwards' so that they are accessed in a simple streaming fashion during the triangular solves. Changes to the PETSc libraries to use this modified storage format resulted in over twice the floating-point rate for some matrices. This improvement can be accounted for by a decrease in the cache misses and TLB (translation lookaside buffer) misses in the modified code.
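A rough sketch of the layout adjustment described above: if the rows of the upper-triangular factor U are stored in reverse order, the backward-substitution sweep reads the value and index arrays front to back in a streaming fashion. The arrays and names below are assumptions for illustration and are not the PETSc data structures.

```cpp
#include <vector>

// Solve U x = b by backward substitution, with the rows of U stored in
// reverse order (row n-1 first) so that val/col are streamed front to back.
// Only strictly upper entries are stored; diag_inv[i] = 1 / U(i,i).
void solve_upper_reversed(int n,
                          const std::vector<int>& rowptr,   // size n+1; segment k holds row n-1-k
                          const std::vector<int>& col,
                          const std::vector<double>& val,
                          const std::vector<double>& diag_inv,
                          const std::vector<double>& b,
                          std::vector<double>& x) {
    for (int k = 0; k < n; ++k) {
        const int i = n - 1 - k;                      // original row index
        double s = b[i];
        for (int p = rowptr[k]; p < rowptr[k + 1]; ++p)
            s -= val[p] * x[col[p]];                  // columns > i, already solved
        x[i] = s * diag_inv[i];
    }
}
```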
12

Ji, Guo Liang, Yang De Feng, Wen Kai Cui, and Liang Gang Lu. "Implementation Procedures of Parallel Preconditioning with Sparse Matrix Based on FEM." Applied Mechanics and Materials 166-169 (May 2012): 3166–73. http://dx.doi.org/10.4028/www.scientific.net/amm.166-169.3166.

Abstract:
A technique to assemble the global stiffness matrix in a sparse storage format and two parallel solvers for FEM-based sparse linear systems are presented. The assembly method uses a data structure named the associated node at intermediate stages to finally arrive at the Compressed Sparse Row (CSR) format. The associated nodes record information about the connection of nodes in the mesh. The technique can greatly reduce memory usage because it only stores the nonzero elements of the global stiffness matrix. This method is simple and effective. The solvers are restarted GMRES iterative solvers with Jacobi and sparse approximate inverse (SPAI) preconditioning, respectively. Numerical experiments show that both preconditioners can improve the convergence of the iterative method, and that SPAI is more powerful than Jacobi in the sense of reducing the number of iterations and improving parallel efficiency. Both solvers can be used to solve large sparse linear systems.
13

Oganesyan, P. A., and O. O. Shtein. "Implementation of Basic Operations for Sparse Matrices when Solving a Generalized Eigenvalue Problem in the ACELAN-COMPOS Complex." Advanced Engineering Research 23, no. 2 (July 14, 2023): 121–29. http://dx.doi.org/10.23947/2687-1653-2023-23-2-121-129.

Abstract:
Introduction. The widespread use of piezoelectric materials in various industries stimulates the study of their physical characteristics and determines the urgency of such research. In this case, modal analysis makes it possible to determine the operating frequency and the coefficient of electromechanical coupling of piezoelectric elements of various devices. These indicators are of serious theoretical and applied interest. The study was aimed at the development of numerical methods for solving the problem of determining resonance frequencies in a system of elastic bodies. To achieve this goal, we needed new approaches to the discretization of the problem based on the finite element method, and a software implementation of the selected method in C# on the .NET platform. Current solutions were created in the context of the ACELAN-COMPOS class library. The known methods of solving the generalized eigenvalue problem based on matrix inversion are not applicable to large-dimensional matrices. To overcome this limitation, the presented work implemented the logic of constructing mass matrices and created software interfaces for exchanging data on eigenvalue problems with pre- and postprocessing modules. Materials and Methods. The .NET platform and the C# programming language were used to implement the numerical methods. Validation of the research results was carried out by comparing the values found with solutions obtained in well-known CAE (computer-aided engineering) packages. The created routines were evaluated in terms of performance and applicability for large-scale tasks. Numerical experiments were carried out to validate the new algorithms on small-dimensional problems that were solved by known methods in MATLAB. Next, the approach was tested on tasks with a large number of unknowns, taking into account the parallelization of individual operations. To avoid computing the inverse matrix, a modified Lanczos method was implemented. We examined the formats for storing matrices in RAM: triplets, CSR, CSC, and Skyline. To solve the system of linear algebraic equations (SLAE), an iterative symmetric LQ method adapted to these storage formats was used. Results. New calculation modules integrated into the class library of the ACELAN-COMPOS complex were developed. Calculations were carried out to determine the applicability of various formats for storing sparse matrices in RAM and various methods for implementing operations with sparse matrices. The structure of stiffness matrices constructed for the same task, but with different renumbering of the nodes of a finite element grid, was graphically visualized. In relation to the problem of the theory of electroelasticity, data on the time required to perform basic operations with stiffness matrices in various storage formats were summarized and presented in the form of a table. It has been established that the renumbering of grid nodes gives a significant increase in performance even without changing the internal structure of the matrix in memory. Taking into account the objectives of the study, the advantages and weaknesses of the known matrix storage formats were identified: CSR was optimal when multiplying a matrix by a vector, and Skyline (SKS) was optimal when inverting a matrix. In problems with a number of unknowns of the order of 10^3, iterative methods for solving the generalized eigenvalue problem were faster. The performance of the software implementation of the Lanczos method was evaluated. The contribution of all operations to the total solution time was measured. It was found that solving the SLAE takes up to 95% of the total time of the algorithm. When solving the SLAE by the symmetric LQ method, the greatest computational cost was in multiplying the matrix by a vector. To increase the performance of the algorithm, shared-memory parallelization was employed; with eight threads, performance improved by 40–50%. Discussion and Conclusion. The software modules developed as part of this work were implemented in the ACELAN-COMPOS package. Their performance for model problems with quasi-regular finite element grids was estimated. Taking into account the features of the structures of the stiffness and mass matrices obtained when solving the generalized eigenvalue problem for an electroelastic body, the preferred methods for their processing were determined.
14

Bramas, Bérenger, and Pavel Kus. "Computing the sparse matrix vector product using block-based kernels without zero padding on processors with AVX-512 instructions." PeerJ Computer Science 4 (April 30, 2018): e151. http://dx.doi.org/10.7717/peerj-cs.151.

Abstract:
The sparse matrix-vector product (SpMV) is a fundamental operation in many scientific applications from various fields. The High Performance Computing (HPC) community has therefore continuously invested a lot of effort to provide an efficient SpMV kernel on modern CPU architectures. Although it has been shown that block-based kernels help to achieve high performance, they are difficult to use in practice because of the zero padding they require. In the current paper, we propose new kernels using the AVX-512 instruction set, which makes it possible to use a blocking scheme without any zero padding in the matrix memory storage. We describe mask-based sparse matrix formats and their corresponding SpMV kernels highly optimized in assembly language. Considering that the optimal blocking size depends on the matrix, we also provide a method to predict the best kernel to be used utilizing a simple interpolation of results from previous executions. We compare the performance of our approach to that of the Intel MKL CSR kernel and the CSR5 open-source package on a set of standard benchmark matrices. We show that we can achieve significant improvements in many cases, both for sequential and for parallel executions. Finally, we provide the corresponding code in an open source library, called SPC5.
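To make the blocking-without-zero-padding idea concrete, the sketch below stores, for each small block, a bitmask marking which of its columns are occupied, together with only the occupied values. This is a plain C++ illustration with invented names; the paper's actual mask-based formats and AVX-512 assembly kernels (the SPC5 library) are considerably more elaborate.

```cpp
#include <bit>
#include <cstdint>
#include <vector>

// A block covers 8 consecutive columns of one row; the mask says which of
// those columns hold a nonzero, and only those values are stored (no padding).
struct MaskedBlock {
    int row;             // row index of the block
    int col_start;       // first of the 8 consecutive columns covered
    std::uint8_t mask;   // bit k set => column col_start + k holds a nonzero
    int val_offset;      // start of this block's values in the packed array
};

void spmv_masked(const std::vector<MaskedBlock>& blocks,
                 const std::vector<double>& vals,    // packed nonzeros, no padding
                 const std::vector<double>& x,
                 std::vector<double>& y) {
    for (const MaskedBlock& b : blocks) {
        double sum = 0.0;
        int v = b.val_offset;
        std::uint8_t m = b.mask;
        while (m) {
            const int k = std::countr_zero(m);            // next occupied column
            sum += vals[v++] * x[b.col_start + k];
            m = static_cast<std::uint8_t>(m & (m - 1));   // clear lowest set bit
        }
        y[b.row] += sum;
    }
}
```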
15

Mahmoud, Mohammed, Mark Hoffmann, and Hassan Reza. "Developing a New Storage Format and a Warp-Based SpMV Kernel for Configuration Interaction Sparse Matrices on the GPU." Computation 6, no. 3 (August 24, 2018): 45. http://dx.doi.org/10.3390/computation6030045.

Abstract:
Sparse matrix-vector multiplication (SpMV) can be used to solve linear systems and eigenvalue problems of diverse scales that exist in numerous and varied scientific applications. One of the scientific applications that SpMV is involved in is known as Configuration Interaction (CI). CI is a linear method for solving the nonrelativistic Schrödinger equation for quantum chemical multi-electron systems, and it can deal with the ground state as well as multiple excited states. In this paper, we have developed a hybrid approach in order to deal with CI sparse matrices. The proposed model includes a newly-developed hybrid format for storing CI sparse matrices on the Graphics Processing Unit (GPU). In addition to the newly developed format, the proposed model includes the SpMV kernel for multiplying the CI matrix (proposed format) by a vector using the C language and the Compute Unified Device Architecture (CUDA) platform. The proposed SpMV kernel is a vector kernel that uses the warp approach. We have gauged the newly developed model in terms of two primary factors, memory usage and performance. Our proposed kernel was compared to the cuSPARSE library and the CSR5 (Compressed Sparse Row 5) format and outperformed both.
16

Mohammed, Saira Banu Jamal, M. Rajasekhara Babu, and Sumithra Sriram. "GPU Implementation of Image Convolution Using Sparse Model with Efficient Storage Format." International Journal of Grid and High Performance Computing 10, no. 1 (January 2018): 54–70. http://dx.doi.org/10.4018/ijghpc.2018010104.

Abstract:
With the growth of data-parallel computing, the role of GPU computing in non-graphics applications such as image processing has become a research focus. Convolution is an integral operation in filtering, smoothing and edge detection. In this article, the process of convolution is realized as a sparse linear system and is solved using Sparse Matrix Vector Multiplication (SpMV). The Compressed Sparse Row (CSR) format of SpMV shows better CPU performance compared to normal convolution. To overcome the stalling of threads for short rows in the GPU implementation of CSR SpMV, a more efficient model is proposed, which uses the Adaptive-Compressed Row Storage (A-CSR) format. Using CSR in the convolution process achieves a 1.45x and a 1.159x increase in speed compared to the normal convolution of image smoothing and edge detection operations, respectively. An average speedup of 2.05x is achieved for the image smoothing technique and 1.58x for the edge detection technique on the GPU platform using the adaptive CSR format.
17

Zhang, Jianfei, and Lei Zhang. "Efficient CUDA Polynomial Preconditioned Conjugate Gradient Solver for Finite Element Computation of Elasticity Problems." Mathematical Problems in Engineering 2013 (2013): 1–12. http://dx.doi.org/10.1155/2013/398438.

Abstract:
The graphics processing unit (GPU) has achieved great success in scientific computing thanks to its tremendous computational horsepower and very high memory bandwidth. This paper discusses an efficient way to implement a polynomial preconditioned conjugate gradient solver for the finite element computation of elasticity on NVIDIA GPUs using the compute unified device architecture (CUDA). The sliced block ELLPACK (SBELL) format is introduced to store the sparse matrices arising from the finite element discretization of elasticity with fewer padding zeros than traditional ELLPACK-based formats. Polynomial preconditioning methods have been investigated both in terms of convergence and running time. Based on overall performance, the least-squares (L-S) polynomial method is chosen as the preconditioner in the PCG solver for the finite element equations derived from elasticity, as it gives the best results on different example meshes. In the PCG solver, a mixed-precision algorithm is used not only to reduce the overall computational and storage requirements and bandwidth but also to make full use of the capacity of the GPU devices. With the SBELL format and the mixed-precision algorithm, the GPU-based L-S preconditioned CG achieves a speedup of about 7–9x over the CPU implementation.
18

Li, Yishui, Peizhen Xie, Xinhai Chen, Jie Liu, Bo Yang, Shengguo Li, Chunye Gong, Xinbiao Gan, and Han Xu. "VBSF: a new storage format for SIMD sparse matrix–vector multiplication on modern processors." Journal of Supercomputing 76, no. 3 (April 10, 2019): 2063–81. http://dx.doi.org/10.1007/s11227-019-02835-4.

19

Boo, Hee-Hyung, and Sung-Ho Kim. "Two dimensional variable-length vector storage format for efficient storage of sparse matrix in the finite element method." Journal of the Korea Society of Computer and Information 17, no. 9 (September 30, 2012): 9–16. http://dx.doi.org/10.9708/jksci/2012.17.9.009.

20

AlAhmadi, Sarah, Thaha Mohammed, Aiiad Albeshri, Iyad Katib, and Rashid Mehmood. "Performance Analysis of Sparse Matrix-Vector Multiplication (SpMV) on Graphics Processing Units (GPUs)." Electronics 9, no. 10 (October 13, 2020): 1675. http://dx.doi.org/10.3390/electronics9101675.

Abstract:
Graphics processing units (GPUs) have delivered remarkable performance for a variety of high performance computing (HPC) applications through massive parallelism. One such application is sparse matrix-vector (SpMV) computation, which is central to many scientific, engineering, and other applications, including machine learning. No single SpMV storage or computation scheme provides consistent and sufficiently high performance for all matrices due to their varying sparsity patterns. An extensive literature review reveals that the performance of SpMV techniques on GPUs has not been studied in sufficient detail. In this paper, we provide a detailed performance analysis of SpMV performance on GPUs using four notable sparse matrix storage schemes (compressed sparse row (CSR), ELLPACK (ELL), hybrid ELL/COO (HYB), and compressed sparse row 5 (CSR5)), five performance metrics (execution time, giga floating point operations per second (GFLOPS), achieved occupancy, instructions per warp, and warp execution efficiency), five matrix sparsity features (nnz, anpr, nprvariance, maxnpr, and distavg), and 17 sparse matrices from 10 application domains (chemical simulations, computational fluid dynamics (CFD), electromagnetics, linear programming, economics, etc.). Subsequently, based on the deeper insights gained through the detailed performance analysis, we propose a technique called the heterogeneous CPU–GPU Hybrid (HCGHYB) scheme. It utilizes both the CPU and GPU in parallel and provides better performance over the HYB format by an average speedup of 1.7x. Heterogeneous computing is an important direction for SpMV and other application areas. Moreover, to the best of our knowledge, this is the first work where the SpMV performance on GPUs has been discussed in such depth. We believe that this work on SpMV performance analysis and the heterogeneous scheme will open up many new directions and improvements for the SpMV computing field in the future.
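For reference, the ELLPACK (ELL) layout mentioned in this abstract pads every row to the length of the longest row, trading stored zeros for regular accesses. A minimal, illustrative construction from CSR follows (names are our own; GPU implementations typically store the arrays column-major for coalesced access, while row-major is used here for brevity).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// ELL stores a rows-by-width grid of values and column indices, where width
// is the maximum nonzeros in any row; short rows are padded with zeros.
struct EllMatrix {
    int rows = 0, width = 0;
    std::vector<int> col;        // rows * width entries, row-major here
    std::vector<double> val;     // rows * width entries, row-major here
};

EllMatrix csr_to_ell(int rows, const std::vector<int>& row_ptr,
                     const std::vector<int>& col_idx, const std::vector<double>& values) {
    EllMatrix e;
    e.rows = rows;
    for (int i = 0; i < rows; ++i)
        e.width = std::max(e.width, row_ptr[i + 1] - row_ptr[i]);
    e.col.assign(static_cast<std::size_t>(rows) * e.width, 0);
    e.val.assign(static_cast<std::size_t>(rows) * e.width, 0.0);
    for (int i = 0; i < rows; ++i)
        for (int p = row_ptr[i], k = 0; p < row_ptr[i + 1]; ++p, ++k) {
            e.col[static_cast<std::size_t>(i) * e.width + k] = col_idx[p];
            e.val[static_cast<std::size_t>(i) * e.width + k] = values[p];
        }
    return e;
}
```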
21

Ahmed, Muhammad, Sardar Usman, Nehad Ali Shah, M. Usman Ashraf, Ahmed Mohammed Alghamdi, Adel A. Bahadded, and Khalid Ali Almarhabi. "AAQAL: A Machine Learning-Based Tool for Performance Optimization of Parallel SPMV Computations Using Block CSR." Applied Sciences 12, no. 14 (July 13, 2022): 7073. http://dx.doi.org/10.3390/app12147073.

Abstract:
The sparse matrix–vector product (SpMV), considered one of the seven dwarfs (numerical methods of significance), is essential in high-performance real-world scientific and analytical applications requiring the solution of large sparse linear equation systems, where SpMV is a key computing operation. As the sparsity patterns of sparse matrices are unknown before runtime, we used machine learning-based performance optimization of the SpMV kernel by exploiting the structure of the sparse matrices using the Block Compressed Sparse Row (BCSR) storage format. As the structure of sparse matrices varies across application domains, optimizing the block size is important for reducing the overall execution time. Manual allocation of block sizes is error prone and time consuming. Thus, we propose AAQAL, a data-driven, machine learning-based tool that automates the process of data distribution and selection of near-optimal block sizes based on the structure of the matrix. We trained and tested the tool using different machine learning methods—decision tree, random forest, gradient boosting, ridge regressor, and AdaBoost—and nearly 700 real-world matrices from 43 application domains, including computer vision, robotics, and computational fluid dynamics. AAQAL achieved 93.47% of the maximum attainable performance, a substantial improvement over the manual or random selection of block sizes used in practice. This is the first attempt at exploiting matrix structure using BCSR to select optimal block sizes for SpMV computations using machine learning techniques.
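The Block Compressed Sparse Row (BCSR) layout referenced above stores small dense r x c blocks instead of individual nonzeros, so one stored block index covers several values. A minimal sketch with a fixed 2 x 2 block and dimensions assumed to be multiples of the block size follows; the paper's tool instead selects the block size per matrix with machine learning.

```cpp
#include <cstddef>
#include <vector>

// Block CSR with fixed r x c blocks: one column index per block, r*c values
// per block (zero-padded inside the block), and row pointers over block rows.
struct BcsrMatrix {
    int r = 2, c = 2;              // block dimensions
    int block_rows = 0;            // number of block rows
    std::vector<int> row_ptr;      // size block_rows + 1
    std::vector<int> block_col;    // block-column index of each stored block
    std::vector<double> vals;      // r*c values per block, row-major
};

// y = A * x (matrix dimensions assumed to be multiples of the block size).
void spmv_bcsr(const BcsrMatrix& A, const std::vector<double>& x, std::vector<double>& y) {
    for (int bi = 0; bi < A.block_rows; ++bi)
        for (int p = A.row_ptr[bi]; p < A.row_ptr[bi + 1]; ++p) {
            const double* blk = &A.vals[static_cast<std::size_t>(p) * A.r * A.c];
            for (int i = 0; i < A.r; ++i)
                for (int j = 0; j < A.c; ++j)
                    y[bi * A.r + i] += blk[i * A.c + j] * x[A.block_col[p] * A.c + j];
        }
}
```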
22

Moussaoui, Mohammed Lamine, Abderrahmane Kibboua, and Mohamed Chabaat. "Contribution to Bridge Damage Analysis." Applied Mechanics and Materials 704 (December 2014): 435–41. http://dx.doi.org/10.4028/www.scientific.net/amm.704.435.

Abstract:
Structural damage detection has become an important research area, since several works [2] have focused on detecting crack zones in order to foresee the appropriate solutions. The present research aims to carry out reinforced concrete bridge damage detection with the finite element mathematical model updating method (MMUM). Unknown degrees of freedom (dof) are expanded from measured ones. The partitioned system of equations provides a large sub-system of equations which can be solved efficiently by handling sparse matrix algorithms at each time step of the forward-time centered-space (FTCS) discretization. A new and efficient method for the calculation of the constant strain tetrahedron shape functions has been developed [1,3,4,5,6]. The topological and analytical geometry of the tetrahedron and its useful formulae enabled us to develop its shape functions and its corresponding finite element matrices. The global finite element matrices and sparse matrix computations have been achieved with a calculus source code. The reinforced concrete mixture has been modeled with the mixture laws [16], which led to its material properties matrix as an orthotropic case with 9 constants and 2 planes of symmetry from the generalized Hooke's law [1]. It is noticed that the material is made of steel, cement, gravels, sand and impurities. The data computations have been implemented with optimized CPU time and data storage using vectorial programming of efficient algorithms [11,12]. The sparse matrix algorithms used in this study are: solution of symmetric systems of equations U^T D U d = R, multiplication, addition, transposition, permutation of rows and columns, and ordering of the matrix representations. All the sparse matrices are given in row-wise sparse format.
23

Zeng, Guangsen, and Yi Zou. "Leveraging Memory Copy Overlap for Efficient Sparse Matrix-Vector Multiplication on GPUs." Electronics 12, no. 17 (August 31, 2023): 3687. http://dx.doi.org/10.3390/electronics12173687.

Abstract:
Sparse matrix-vector multiplication (SpMV) is central to many scientific, engineering, and other applications, including machine learning. Compressed Sparse Row (CSR) is a widely used sparse matrix storage format. SpMV using the CSR format on GPU computing platforms is widely studied, where the access behavior of the GPU is often the performance bottleneck. The Ampere GPU architecture, recently introduced by NVIDIA, provides a new asynchronous memory copy instruction, memcpy_async, for more efficient data movement in shared memory. Leveraging the capability of this new memcpy_async instruction, we first propose CSR-Partial-Overlap to carefully overlap the data copy from global memory to shared memory with computation, allowing us to take full advantage of the data transfer time. In addition, we design the dynamic batch partition and the dynamic thread distribution to achieve effective load balancing, avoid the overhead of fixing up partial sums, and improve thread utilization. Furthermore, we propose CSR-Full-Overlap, based on CSR-Partial-Overlap, which also takes into account the overlap of data transfer from host to device with SpMV kernel execution. CSR-Full-Overlap unifies the two major overlaps in SpMV and hides the computation as much as possible in the two important access behaviors of the GPU. This allows CSR-Full-Overlap to achieve the best performance gains from both overlaps. As far as we know, this paper is the first in-depth study of how memcpy_async can be potentially applied to help accelerate SpMV computation on GPU platforms. We compare CSR-Full-Overlap to the current state-of-the-art cuSPARSE, where our experimental results show an average 2.03x performance gain and up to a 2.67x performance gain.
24

Gao, Jiaquan, Yuanshen Zhou, and Kesong Wu. "A Novel Multi-GPU Parallel Optimization Model for The Sparse Matrix-Vector Multiplication." Parallel Processing Letters 26, no. 04 (December 2016): 1640001. http://dx.doi.org/10.1142/s0129626416400016.

Abstract:
Accelerating the sparse matrix-vector multiplication (SpMV) on graphics processing units (GPUs) has attracted considerable attention recently. We observe that on a specific multiple-GPU platform, the SpMV performance can usually be greatly improved when a matrix is partitioned into several blocks according to a predetermined rule and each block is assigned to a GPU with an appropriate storage format. This motivates us to propose a novel multi-GPU parallel SpMV optimization model. Our model involves two stages. In the first stage, a simple rule is defined to divide any given matrix among multiple GPUs, and then a performance model, which is independent of the problems and dependent on the resources of devices, is proposed to accurately predict the execution time of SpMV kernels. Using these models, in the second stage we construct an optimized multi-GPU parallel SpMV algorithm that is automatically and rapidly generated for the platform for any problem. Given that our model for SpMV is general, independent of the problems, and dependent on the resources of devices, this model is constructed only once for each type of GPU. The experiments validate the high efficiency of our proposed model.
25

Huang, Lan, Jia Zeng, Shiqi Sun, Wencong Wang, Yan Wang, and Kangping Wang. "Coarse-Grained Pruning of Neural Network Models Based on Blocky Sparse Structure." Entropy 23, no. 8 (August 13, 2021): 1042. http://dx.doi.org/10.3390/e23081042.

Abstract:
Deep neural networks may achieve excellent performance in many research fields. However, many deep neural network models are over-parameterized. The computation of weight matrices often consumes a lot of time, which requires plenty of computing resources. In order to solve these problems, a novel block-based division method and a special coarse-grained block pruning strategy are proposed in this paper to simplify and compress the fully connected structure, and the pruned weight matrices with a blocky structure are then stored in the format of Block Sparse Row (BSR) to accelerate the calculation of the weight matrices. First, the weight matrices are divided into square sub-blocks based on spatial aggregation. Second, a coarse-grained block pruning procedure is utilized to scale down the model parameters. Finally, the BSR storage format, which is much more friendly to block sparse matrix storage and computation, is employed to store these pruned dense weight blocks to speed up the calculation. In the following experiments on MNIST and Fashion-MNIST datasets, the trend of accuracies with different pruning granularities and different sparsity is explored in order to analyze our method. The experimental results show that our coarse-grained block pruning method can compress the network and can reduce the computational cost without greatly degrading the classification accuracy. The experiment on the CIFAR-10 dataset shows that our block pruning strategy can combine well with the convolutional networks.
26

Liu, Guangwei, Zhen Hao, Zheng Niu, Ka Mu, Jixian Ma, and Wenzhe Zhang. "Lossless Compression and Optimization Method of Data Flow Efficiency of Infrared Image Depth Learning Model of Substation Equipment." Journal of Physics: Conference Series 2320, no. 1 (August 1, 2022): 012026. http://dx.doi.org/10.1088/1742-6596/2320/1/012026.

Abstract:
Infrared detection of substation equipment has become one of the important means for live detection of power grid equipment. Insulation faults and equipment defects of electrical equipment in operation can be found by using infrared thermal imaging technology. An infrared image deep learning model usually requires a very large computational cost and storage space, which greatly limits the application of deep learning models in embedded terminals. In order to apply the deep learning model well on embedded devices with limited resources, this paper proposes a lossless compression method for the infrared image deep learning model of substation equipment, which can reduce the size of the deep learning network model by 35 to 49 times without loss of recognition accuracy. A hybrid sparse matrix storage format based on recursion and a cache blocking method based on multi-core/many-core processors are proposed to optimize the data flow efficiency between processors.
27

Zhang, Yi, Zebin Wu, Jin Sun, Yan Zhang, Yaoqin Zhu, Jun Liu, Qitao Zang, and Antonio Plaza. "A Distributed Parallel Algorithm Based on Low-Rank and Sparse Representation for Anomaly Detection in Hyperspectral Images." Sensors 18, no. 11 (October 25, 2018): 3627. http://dx.doi.org/10.3390/s18113627.

Abstract:
Anomaly detection aims to separate anomalous pixels from the background, and has become an important application of remotely sensed hyperspectral image processing. Anomaly detection methods based on low-rank and sparse representation (LRASR) can accurately detect anomalous pixels. However, with the significant volume increase of hyperspectral image repositories, such techniques consume a significant amount of time (mainly due to the massive amount of matrix computations involved). In this paper, we propose a novel distributed parallel algorithm (DPA) by redesigning key operators of LRASR in terms of MapReduce model to accelerate LRASR on cloud computing architectures. Independent computation operators are explored and executed in parallel on Spark. Specifically, we reconstitute the hyperspectral images in an appropriate format for efficient DPA processing, design the optimized storage strategy, and develop a pre-merge mechanism to reduce data transmission. Besides, a repartitioning policy is also proposed to improve DPA’s efficiency. Our experimental results demonstrate that the newly developed DPA achieves very high speedups when accelerating LRASR, in addition to maintaining similar accuracies. Moreover, our proposed DPA is shown to be scalable with the number of computing nodes and capable of processing big hyperspectral images involving massive amounts of data.
28

Benatia, Akrem, Weixing Ji, Yizhuo Wang, and Feng Shi. "Sparse matrix partitioning for optimizing SpMV on CPU-GPU heterogeneous platforms." International Journal of High Performance Computing Applications 34, no. 1 (November 14, 2019): 66–80. http://dx.doi.org/10.1177/1094342019886628.

Abstract:
The sparse matrix–vector multiplication (SpMV) kernel dominates the computing cost in numerous applications. Most of the existing studies dedicated to improving this kernel have targeted just one type of processing unit, mainly multicore CPUs or graphics processing units (GPUs), and have not explored the potential of the recent, rapidly emerging, CPU-GPU heterogeneous platforms. To take full advantage of these heterogeneous systems, the input sparse matrix has to be partitioned on the different available processing units. The partitioning problem is more challenging with the existence of many sparse formats whose performance depends on both the sparsity of the input matrix and the hardware used. Thus, the best performance depends not only on how to partition the input sparse matrix but also on which sparse format to use for each partition. To address this challenge, we propose in this article a new CPU-GPU heterogeneous method for computing the SpMV kernel that combines different sparse formats to achieve better performance and better utilization of CPU-GPU heterogeneous platforms. The proposed solution horizontally partitions the input matrix into multiple block-rows and predicts their best sparse formats using machine learning-based performance models. A mapping algorithm is then used to assign the block-rows to the CPU and GPU(s) available in the system. Our experimental results using real-world large unstructured sparse matrices on two different machines show a noticeable performance improvement.
29

Langr, Daniel, and Ivan Šimeček. "Analysis of Memory Footprints of Sparse Matrices Partitioned Into Uniformly-Sized Blocks." Scalable Computing: Practice and Experience 19, no. 3 (September 14, 2018): 275–92. http://dx.doi.org/10.12694/scpe.v19i3.1358.

Abstract:
The presented study analyses memory footprints of 563 representative benchmark sparse matrices with respect to their partitioning into uniformly-sized blocks. Different block sizes and different ways of storing blocks in memory are considered and statistically evaluated. Memory footprints of partitioned matrices are then compared with their lower bounds and CSR, index-compressed CSR, and EBF storage formats. The results show that blocking-based storage formats may significantly reduce memory footprints of sparse matrices arising from a wide range of application domains. Additionally, measured consistency of results is presented and discussed, benefits of individual formats for storing blocks are evaluated, and an analysis of best-case and worst-case matrices is provided for in-depth understanding of causes of memory savings of blocking-based formats.
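As a back-of-the-envelope companion to this kind of footprint analysis, the following sketch estimates the memory footprint of a plain CSR matrix for different index widths. The formula covers only the three CSR arrays and is an illustration, not the study's actual footprint model or its blocking-based bounds.

```cpp
#include <cstddef>
#include <cstdio>

// Bytes needed for the three CSR arrays: values, column indices, row pointers.
template <typename ValueT, typename IndexT>
std::size_t csr_bytes(std::size_t rows, std::size_t nnz) {
    return nnz * sizeof(ValueT) + nnz * sizeof(IndexT) + (rows + 1) * sizeof(IndexT);
}

int main() {
    const std::size_t rows = 1000000, nnz = 50000000;   // illustrative sizes
    std::printf("CSR, double + 32-bit indices: %zu MiB\n",
                csr_bytes<double, int>(rows, nnz) >> 20);
    std::printf("CSR, double + 64-bit indices: %zu MiB\n",
                csr_bytes<double, long long>(rows, nnz) >> 20);
    return 0;
}
```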
30

Kabeer, V. "Sparse Matrix Storage Using Decimal Coding." International Journal of Research in Engineering and Technology 05, no. 34 (October 25, 2016): 12–15. http://dx.doi.org/10.15623/ijret.2016.0534003.

31

Bani-Ismail, Basel, and Ghassan Kanaan. "Comparing Different Sparse Matrix Storage Structures as Index Structure for Arabic Text Collection." International Journal of Information Retrieval Research 2, no. 2 (April 2012): 52–67. http://dx.doi.org/10.4018/ijirr.2012040105.

Abstract:
In this study, the authors evaluate and compare the storage efficiency of different sparse matrix storage structures as index structures for an Arabic text collection, along with their corresponding sparse matrix-vector multiplication algorithms for performing query processing in an Information Retrieval (IR) system. The study covers six sparse matrix storage structures including the Coordinate Storage (COO), Compressed Sparse Row (CSR), Compressed Sparse Column (CSC), Block Coordinate (BCO), Block Sparse Row (BSR), and Block Sparse Column (BSC). Evaluation depends on the storage space requirements for each storage structure and the efficiency of the query processing algorithm. The experimental results demonstrate that CSR is more efficient in terms of storage space requirements and query processing time than the other sparse matrix storage structures. The results also show that CSR requires the least amount of disk space and performs the best in terms of query processing time compared with the other point entry storage structures (COO, CSC). The results demonstrate that BSR requires the least amount of disk space and performs the best in terms of query processing time compared with the other block entry storage structures (BCO, BSC).
32

Wang, Ying, and Korhan Cengiz. "Implementation of the Spark technique in a matrix distributed computing algorithm." Journal of Intelligent Systems 31, no. 1 (January 1, 2022): 660–71. http://dx.doi.org/10.1515/jisys-2022-0051.

Abstract:
This paper analyzes Spark engine performance strategies for implementing the Spark technique in a distributed matrix computing algorithm, using sparse matrix multiplication as the operational test model. The dimensions of the two input sparse matrices were fixed at 30,000 × 30,000, and the density of the input matrices was varied. The experimental results show that when the density reaches about 0.3, the original dense matrix multiplication can outperform sparse-sparse matrix multiplication, which is basically consistent with the relationship between the sparse matrix multiplication implementation in the single-machine sparse matrix test and the computational performance of the local native library. When the density of the fixed sparse matrix is 0.01, the distributed dense-sparse matrix multiplication outperforms the version with the same sparsity that uses dense matrix storage, and the acceleration ratio increases from 1.88× to 5.71× with the increase in dimension. The overall performance of distributed operations is improved.
33

Hanumanthu, Sudha, et al. "Universal Measurement Matrix Design for Sparse and Co-Sparse Signal Recovery." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 6 (April 10, 2021): 404–11. http://dx.doi.org/10.17762/turcomat.v12i6.1407.

Abstract:
Compressed Sensing (CS) uses the mutual coherence metric to choose a measurement matrix that is incoherent with the dictionary matrix. Random measurement matrices are incoherent with any dictionary, but their highly uncertain elements necessitate large storage and make hardware realization difficult. In this paper, deterministic matrices are employed, which greatly reduces memory space and computational complexity. To avoid randomness completely, deterministic sub-sampling is done by choosing rows deterministically rather than randomly, so that the matrix can be regenerated during reconstruction without storing it. The matrices are also generated by orthonormalization, which makes them highly incoherent with any dictionary basis. Random matrices such as Gaussian and Bernoulli, semi-deterministic matrices such as Toeplitz and Circulant, and fully deterministic matrices such as DFT, DCT, and FZC-Circulant are compared. The DFT matrix is found to be effective in terms of recovery error and recovery time for all cases of signal sparsity and is applicable to signals that are sparse in any basis, hence universal.
34

Grasedyck, Lars, and Wolfgang Hackbusch. "An Introduction to Hierarchical (H-) Rank and TT-Rank of Tensors with Examples." Computational Methods in Applied Mathematics 11, no. 3 (2011): 291–304. http://dx.doi.org/10.2478/cmam-2011-0016.

Abstract:
We review two similar concepts of hierarchical rank of tensors (which extend the matrix rank to higher order tensors): the TT-rank and the H-rank (hierarchical or H-Tucker rank). Based on this notion of rank, one can define a data-sparse representation of tensors involving O(dnk + dk^3) data for order d tensors with mode sizes n and rank k. Simple examples underline the differences and similarities between the different formats and ranks. Finally, we derive rank bounds for tensors in one of the formats based on the ranks in the other format.
35

Song, Qi, Pu Chen, and Shuli Sun. "Partial Refactorization in Sparse Matrix Solution: A New Possibility for Faster Nonlinear Finite Element Analysis." Mathematical Problems in Engineering 2013 (2013): 1–7. http://dx.doi.org/10.1155/2013/403912.

Abstract:
This paper proposes a partial refactorization for faster nonlinear analysis based on sparse matrix solution, which is nowadays the default solution choice in finite element analysis and can solve finite element models with up to millions of degrees of freedom. Among the various fill-in reducing strategies for sparse matrix solution, graph partitioning is in general the best in terms of resultant fill-ins and floating-point operations, and it furthermore produces a particular graph of the sparse matrix that prevents local changes of entries from spreading widely in the factorization. Based on this feature, an explicit partial triangular refactorization with local change is efficiently constructed with a limited additional storage requirement in a row-sparse storage scheme. The partial refactorization of the changed stiffness matrix inherits a large percentage of the original factor and is carried out only on part of the factor entries. The proposed method provides a new possibility for faster nonlinear analysis and is mainly suitable for material nonlinear problems and optimization problems. Compared to full factorization, it can significantly reduce the factorization time and can make nonlinear analysis more efficient.
36

Liu, Sheng, Yasong Cao, and Shuwei Sun. "Mapping and Optimization Method of SpMV on Multi-DSP Accelerator." Electronics 11, no. 22 (November 11, 2022): 3699. http://dx.doi.org/10.3390/electronics11223699.

Abstract:
Sparse matrix-vector multiplication (SpMV) computes the product of a sparse matrix and a dense vector, and the sparsity of a sparse matrix is often more than 90%. Usually, the sparse matrix is compressed to save storage resources, but this causes irregular access to the dense vector in the algorithm, which takes a lot of time and degrades the SpMV performance of the system. In this study, we design a dedicated channel in the DMA to implement an indirect memory access process to speed up the SpMV operation. On this basis, we propose six SpMV algorithm schemes and map them to optimize the performance of SpMV. The results show that the M processor's SpMV performance reached 6.88 GFLOPS. In addition, the average performance of the HPCG benchmark is 2.8 GFLOPS.
37

Mukaddes, Abul Mukid Mohammad, Masao Ogino, Ryuji Shioya, and Hiroshi Kanayama. "Treatment of Block-Based Sparse Matrices in Domain Decomposition Method." International Journal of System Modeling and Simulation 2, no. 1 (March 30, 2017): 1. http://dx.doi.org/10.24178/ijsms.2017.2.1.01.

Abstract:
The domain decomposition method involves the finite element solution of problems on a parallel computer. The finite element discretization leads to the solution of large systems of linear equations whose matrix is naturally sparse. The use of proper storage techniques for sparse matrices is fundamental, especially when dealing with the large-scale problems typical of industrial applications. The aim of this research is to review the sparsity pattern of the matrices originating from the discretization of elasto-plastic and thermal-convection problems. Some practical strategies for dealing with the sparsity pattern in the finite element code of the ADVENTURE system are recalled. Several efficient storage schemes to store the matrices originating from elasto-plastic and thermal-convection problems have been proposed. In the proposed technique, the inherent block pattern of the matrix is exploited to locate the matrix elements. The computation on a high performance computer shows better performance compared to the conventional skyline storage method used by most researchers.
38

Zhou, Huilin, Youwen Liu, Yuhao Wang, Liangbing Chen, and Rongxing Duan. "Nonlinear Electromagnetic Inverse Scattering Imaging Based on IN-LSQR." International Journal of Antennas and Propagation 2018 (August 2, 2018): 1–9. http://dx.doi.org/10.1155/2018/2794646.

Abstract:
A nonlinear inversion scheme is proposed for electromagnetic inverse scattering imaging. It exploits inexact Newton (IN) and least-squares QR factorization (LSQR) methods to tackle the nonlinearity and ill-posedness of the electromagnetic inverse scattering problem. A nonlinear model of the inverse scattering in functional form is developed. At every IN iteration, the sparse storage method is adopted to solve the storage and computational bottleneck of the Fréchet derivative matrix, a large-scale sparse Jacobian matrix. Moreover, to address the slow convergence problem encountered in the inexact Newton solution via Landweber iterations, an LSQR algorithm is proposed for obtaining a better solution of the internal large-scale sparse linear equations in the IN step. Numerical results demonstrate the applicability of the proposed IN-LSQR method to quantitative inversion of scatterer electric performance parameters. Moreover, compared with the inexact Newton method based on Landweber iterations, the proposed method significantly improves the convergence rate with less computational and storage cost.
39

Zhang, Xu Dong, Jian Ye Yuan, Jing Ping Zhang, and Jian Ying Feng. "Two-Port Characteristic Analysis for Transformers with the Large Scale Windings Based on Sparse Matrix." Advanced Materials Research 986-987 (July 2014): 2035–38. http://dx.doi.org/10.4028/www.scientific.net/amr.986-987.2035.

Abstract:
In order to calculate the two-port wide-frequency parameters of a large transformer quickly and accurately, parameter calculation of a multi-conductor transmission line model based on sparse matrix operations is proposed. It solves the problem caused by the large matrix. Firstly, the multi-conductor transmission line model of the transformer windings is established. Secondly, the matrix characteristics of Y and Z are analyzed, based on which the block storage and computation method is applied. Then the frequency-domain model is solved based on sparse matrix operations. Finally, taking an SS11-20000/110 transformer as an example, the correctness of this method is verified by comparing calculation results with measurement results.
40

Cui, Hang, Shoichi Hirasawa, Hiroaki Kobayashi, and Hiroyuki Takizawa. "A Machine Learning-Based Approach for Selecting SpMV Kernels and Matrix Storage Formats." IEICE Transactions on Information and Systems E101.D, no. 9 (September 1, 2018): 2307–14. http://dx.doi.org/10.1587/transinf.2017edp7176.

41

Tang, Wai Teng, Wen Jun Tan, Rick Siow Mong Goh, Stephen John Turner, and Weng-Fai Wong. "A Family of Bit-Representation-Optimized Formats for Fast Sparse Matrix-Vector Multiplication on the GPU." IEEE Transactions on Parallel and Distributed Systems 26, no. 9 (September 1, 2015): 2373–85. http://dx.doi.org/10.1109/tpds.2014.2357437.

42

Wang, Pei, Xu Sheng Yang, Zhuo Yuan Wang, Lin Gong Li, Ji Chang He, and Qing Jie Wang. "Solving Large-Scale Asymmetric Sparse Linear Equations Based on SuperLU Algorithm." Advanced Materials Research 230-232 (May 2011): 1355–61. http://dx.doi.org/10.4028/www.scientific.net/amr.230-232.1355.

Abstract:
This article introduces recent research on SuperLU algorithms and the optimal storage method for the coefficient matrices of sparse linear equations [1]. How to solve large-scale non-symmetric sparse linear equations with the SuperLU algorithm is the key part of this article. The advantages of the SuperLU algorithm compared to other algorithms are summarized at the end. The SuperLU algorithm not only saves memory space but also reduces the computation time. Because the algorithm needs less storage, it can solve larger-scale equations, which makes it much more useful.
43

Lu, Xinmiao, Cunfang Yang, Qiong Wu, Jiaxu Wang, Yuhan Wei, Liyu Zhang, Dongyuan Li, and Lanfei Zhao. "Improved Reconstruction Algorithm of Wireless Sensor Network Based on BFGS Quasi-Newton Method." Electronics 12, no. 6 (March 7, 2023): 1267. http://dx.doi.org/10.3390/electronics12061267.

Abstract:
Aiming at the problems of low reconstruction rate and poor reconstruction precision when reconstructing sparse signals in wireless sensor networks, a sparse signal reconstruction algorithm based on the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) quasi-Newton method is proposed. The L-BFGS quasi-Newton method uses a two-loop recursion algorithm to find the descent direction dk directly by calculating the step differences between m adjacent iteration points, so that a matrix Hk approximating the inverse of the Hessian matrix is applied without being formed explicitly. This avoids the BFGS requirement of calculating and storing Hk, reduces the algorithm complexity, and improves the reconstruction rate. Finally, the experimental results show that the L-BFGS quasi-Newton method performs well in solving the problem of sparse signal reconstruction in wireless sensor networks.
44

Stevanović, Dragoljub, Marko Topalović, and Miroslav Živković. "Improvement of the Sparse Matrices Storage Routines for Large FEM Calculations." Journal of the Serbian Society for Computational Mechanics 15, no. 1 (November 1, 2021): 81–97. http://dx.doi.org/10.24874/jsscm.2021.15.01.06.

Abstract:
Efficient memory handling is one of the key issues that engineers and programmers face in developing software for numerical analysis such as the Finite Element Method. This method operates on huge matrices that have a large number of zero coefficients, which waste memory, so it is necessary to save memory by working only with the non-zero coefficients using so-called "sparse" matrices. An analysis of two methods used for improving sparse matrix creation is presented in this paper and their pseudocode is given. The comparison is made over a wide range of problem sizes. Results show that the "indexing" method is superior to the "dotting" method both in memory usage and in elapsed time.
45

Yuhendri, Muldi, Ahyanuardi Ahyanuardi, and Aswardi Aswardi. "Direct Torque Control Strategy of PMSM Employing Ultra Sparse Matrix Converter." International Journal of Power Electronics and Drive Systems (IJPEDS) 9, no. 1 (March 1, 2018): 64. http://dx.doi.org/10.11591/ijpeds.v9.i1.pp64-72.

Abstract:
The matrix converter is a good choice for Permanent Magnet Synchronous Motor (PMSM) drives because it has high power density and does not require dc-link energy storage. The disadvantage of the conventional matrix converter is that it uses 18 active switches, so it becomes expensive and the modulation method becomes more complicated than that of a back-to-back converter. To minimize this problem, this paper proposes a variable speed drive of a PMSM using an Ultra Sparse Matrix Converter (USMC) based on Direct Torque Control (DTC) methods. This converter uses only 9 active switches, making it cheaper than the conventional matrix converter. The DTC is designed based on Space Vector Modulation (SVM) to reduce the torque and flux ripples caused by the hysteresis control in conventional DTC. The simulation results show that SVM-based DTC using the USMC effectively controls the rotor speed with low torque and flux ripples.
46

Mukaddes, A. M. M., Masao Ogino, and Ryuji Shioya. "403 A Computational Study of Sparse Matrix Storage Schemes in the Domain Decomposition Method." Proceedings of The Computational Mechanics Conference 2012.25 (2012): 95–96. http://dx.doi.org/10.1299/jsmecmd.2012.25.95.

47

Fernandes, P., and P. Girdinio. "A new storage scheme for an efficient implementation of the sparse matrix-vector product." Parallel Computing 12, no. 3 (December 1989): 327–33. http://dx.doi.org/10.1016/0167-8191(89)90090-2.

48

Koslowski, T., and W. Von Niessen. "Linear combination of Lanczos vectors: A storage-efficient algorithm for sparse matrix eigenvector computations." Journal of Computational Chemistry 14, no. 7 (July 1993): 769–74. http://dx.doi.org/10.1002/jcc.540140703.

49

Vasco, Don W., John E. Peterson, and Ernest L. Majer. "Resolving seismic anisotropy: Sparse matrix methods for geophysical inverse problems." GEOPHYSICS 63, no. 3 (May 1998): 970–83. http://dx.doi.org/10.1190/1.1444408.

Abstract:
Two techniques for the singular value decomposition (SVD) of large sparse matrices are directly applicable to geophysical inverse problems: subspace iteration and the Lanczos method. Both methods require very little in‐core storage and efficiently compute a set of singular values and singular vectors. A comparison of the singular value and vector estimates of these iterative approaches with the results of a conventional in‐core SVD algorithm demonstrates their accuracy. Hence, it is possible to conduct an assessment of inversion results for larger sparse inverse problems such as those arising in seismic tomography. As an example, we examine the resolution matrix associated with a crosswell seismic inversion of first arrival times for lateral variations in anisotropy. The application to a set of first arrival times from a crosswell survey at the Grimsel Laboratory emphasizes the utility of including anisotropy in a traveltime inversion. The isotropic component of the estimated velocity structure appears to be well constrained even when anisotropy is included in the inversion. In the case of the Grimsel experiment, we are able to resolve a fracture zone, in agreement with borehole fracture intersections. Elements of the resolution matrix reveal moderate averaging among anisotropy coefficients as well as between anisotropy coefficients and source‐receiver static terms. The information on anisotropy, such as the directions of maximum velocity, appears sensitive to lateral variations in velocity and must be interpreted with some caution.
50

Kai, Hong. "Application of Internet of Things Audio Technology Based on Parallel Storage System in Music Classroom." Advances in Multimedia 2022 (September 14, 2022): 1–12. http://dx.doi.org/10.1155/2022/5883238.

Abstract:
In order to improve the effect of music classroom teaching, this paper combines a parallel storage system and Internet of Things audio technology to construct a music education system that improves the effect of music education in colleges and universities. Moreover, this paper expounds a grid-free method for sparsity and parameter estimation from the model and mathematical principles, and proposes a grid-free DOA method based on fast reconstruction of the T-matrix. This method is guaranteed to produce sparse parameter estimates. At the same time, numerical simulation shows that the method has stable estimation performance as a sparsity and parameter estimation method. In addition, this paper constructs a music classroom teaching system based on the parallel storage system and Internet of Things audio technology. The experimental research shows that the parallel storage system and Internet of Things audio technology have an obvious application effect in the music classroom and can effectively improve the teaching effect of the music classroom.