Dissertations / Theses on the topic 'Sparse Accelerator'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 31 dissertations / theses for your research on the topic 'Sparse Accelerator.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.
Syed, Akber. "A Hardware Interpreter for Sparse Matrix LU Factorization." University of Cincinnati / OhioLINK, 2002. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1024934521.
Jamal, Aygul. "A parallel iterative solver for large sparse linear systems enhanced with randomization and GPU accelerator, and its resilience to soft errors." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS269/document.
In this PhD thesis, we address three challenges faced by linear algebra solvers in view of future exascale systems: accelerating convergence using innovative techniques at the algorithm level, taking advantage of GPU (Graphics Processing Units) accelerators to enhance the performance of computations on hybrid CPU/GPU systems, and evaluating the impact of errors in the context of an increasing level of parallelism in supercomputers. We are interested in studying methods that enable us to accelerate the convergence and execution time of iterative solvers for large sparse linear systems. The solver specifically considered in this work is the parallel Algebraic Recursive Multilevel Solver (pARMS), a distributed-memory parallel solver based on Krylov subspace methods. First, we integrate a randomization technique referred to as Random Butterfly Transformations (RBT), which has been successfully applied to remove the cost of pivoting in the solution of dense linear systems. Our objective is to apply this method in the ARMS preconditioner to solve the last Schur complement system more efficiently in the application of the recursive multilevel process in pARMS. The experimental results show an improvement in convergence and accuracy. Due to memory concerns for some test problems, we also propose to use a sparse variant of RBT followed by a sparse direct solver (SuperLU), resulting in an improvement of the execution time. Then we explain how a non-intrusive approach can be applied to implement GPU computing in the pARMS solver, especially for the local preconditioning phase, which represents a significant part of the time needed to compute the solution. We compare the CPU-only and hybrid CPU/GPU variants of the solver on several test problems coming from physical applications. The performance results of the hybrid CPU/GPU solver using the ARMS preconditioning combined with RBT, or the ILU(0) preconditioning, show a performance gain of up to 30% on the test problems considered in our experiments. Finally, we study the effect of soft fault errors on the convergence of the commonly used flexible GMRES (FGMRES) algorithm, which is also used to solve the preconditioned system in pARMS. The test problem in our experiments is an elliptic PDE problem on a regular grid. We consider two types of preconditioners: an incomplete LU factorization with dual threshold (ILUT), and the ARMS preconditioner combined with RBT randomization. We consider two soft fault error modeling approaches in which we perturb the matrix-vector multiplication and the application of the preconditioner, and we compare their potential impact on the convergence of the solver.
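As an editorial aside, the Random Butterfly Transformations mentioned in this abstract can be illustrated with a minimal Python sketch: a depth-1 two-sided RBT is applied to a small dense system, the randomized system is solved, and the solution is mapped back. The function names are ours and the dense solve merely stands in for LU without pivoting; this is not code from the thesis.

    import numpy as np

    def butterfly(n, rng):
        # B = 1/sqrt(2) * [[R0, R1], [R0, -R1]] with random diagonal blocks
        r0 = np.diag(np.exp(rng.uniform(-0.05, 0.05, n // 2)))
        r1 = np.diag(np.exp(rng.uniform(-0.05, 0.05, n // 2)))
        return np.block([[r0, r1], [r0, -r1]]) / np.sqrt(2.0)

    def rbt_solve(A, b, rng):
        n = A.shape[0]                    # n assumed even for a depth-1 butterfly
        U, V = butterfly(n, rng), butterfly(n, rng)
        Ar = U.T @ A @ V                  # randomized matrix U^T A V
        y = np.linalg.solve(Ar, U.T @ b)  # stands in for LU without pivoting
        return V @ y                      # map back to the original unknowns

    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 8)) + 8.0 * np.eye(8)
    b = rng.standard_normal(8)
    x = rbt_solve(A, b, rng)
    print(np.linalg.norm(A @ x - b))      # small residual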
Pradels, Léo. "Efficient CNN inference acceleration on FPGAs : a pattern pruning-driven approach." Electronic Thesis or Diss., Université de Rennes (2023-....), 2024. http://www.theses.fr/2024URENS087.
CNN-based deep learning models provide state-of-the-art performance in image and video processing tasks, particularly for image enhancement or classification. However, these models are compute- and memory-intensive, making them unsuitable for real-time constraints on embedded FPGA systems. As a result, compressing these CNNs and designing accelerator architectures for inference that integrate compression in a hardware-software co-design approach is essential. While software optimizations like pruning have been proposed, they often lack the structured approach needed for effective accelerator integration. To address these limitations, this thesis focuses on accelerating CNNs on FPGAs while complying with real-time constraints on embedded systems. This is achieved through several key contributions. First, it introduces pattern pruning, which imposes structure on network sparsity, enabling efficient hardware acceleration with minimal accuracy loss due to compression. Second, a scalable accelerator for CNN inference is presented, which adapts its architecture based on input performance criteria, FPGA specifications, and the target CNN model architecture. An efficient method for integrating pattern pruning within the accelerator and a complete flow for CNN acceleration are proposed. Finally, improvements in network compression are explored through Shift&Add quantization, which modifies FPGA computation methods while maintaining baseline network accuracy.
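As an illustration of the pattern pruning mentioned in this abstract, the sketch below keeps, for each 3x3 kernel, only the weights selected by one of a few fixed masks, choosing the mask that preserves the most weight magnitude. The mask set and the selection rule are assumptions for illustration, not the patterns used in the thesis.

    import numpy as np

    PATTERNS = np.array([
        [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
        [[0, 1, 1], [0, 1, 1], [0, 0, 0]],
        [[0, 0, 0], [1, 1, 0], [1, 1, 0]],
        [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
    ], dtype=np.float32)                      # 4 candidate masks, 4 non-zeros each

    def pattern_prune(weights):
        # weights: (out_ch, in_ch, 3, 3) -> same shape, each kernel masked
        pruned = np.empty_like(weights)
        for o in range(weights.shape[0]):
            for i in range(weights.shape[1]):
                k = weights[o, i]
                scores = [np.abs(k * p).sum() for p in PATTERNS]
                pruned[o, i] = k * PATTERNS[int(np.argmax(scores))]
        return pruned

    w = np.random.randn(8, 4, 3, 3).astype(np.float32)
    print((pattern_prune(w) != 0).mean())     # density close to 4/9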
Ramachandran, Shridhar. "Incremental PageRank acceleration using Sparse Matrix-Sparse Vector Multiplication." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1462894358.
Fernández, Becerra David. "Multicore acceleration of sparse electromagnetics computations." Thesis, McGill University, 2011. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=104641.
Multicore processors have become the dominant industry trend for increasing the performance of computing systems, forcing electromagnetic (EM) system designers to redesign their applications using parallel programming paradigms. This is particularly true for computations involving complex data structures, such as the sparse matrix computations that often arise in electromagnetic (EM) simulations with the finite element method (FEM). These computations require pointer manipulations that defeat many compiler optimizations and parallel shared-memory libraries (e.g., OpenMP). This work presents new sparse data structures and new techniques to efficiently exploit multicore parallelism and short-vector units (the latter of which had not been exploited by state-of-the-art sparse matrix libraries) for the compute-intensive kernels recurring in EM simulations, such as sparse matrix-vector multiplications (SMVM) and conjugate gradient (CG) algorithms. Speed-ups of up to 14x are demonstrated for the accelerated SMVM kernel and up to 5.8x for the CG kernel using the proposed methods, compared with conventional approaches, on two different multicore architectures. Finally, a new method for solving the FEM suited to parallel processing is presented, and an optimized implementation is realized on two generations of NVIDIA GPU (many-core) accelerators, with performance increases of up to 27.53x over compiler-optimized CPU results.
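For reference, the sparse matrix-vector multiplication (SMVM) kernel discussed in this abstract can be sketched generically in Python over the common CSR layout; the irregular, indirect accesses to x are what make the kernel hard to vectorize. This is a textbook sketch, not the thesis' data structures.

    import numpy as np

    def csr_spmv(indptr, indices, data, x):
        y = np.zeros(len(indptr) - 1)
        for row in range(len(indptr) - 1):
            for k in range(indptr[row], indptr[row + 1]):
                y[row] += data[k] * x[indices[k]]   # indirect access to x
        return y

    # 3x3 example matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
    indptr  = np.array([0, 2, 3, 5])
    indices = np.array([0, 2, 1, 0, 2])
    data    = np.array([4.0, 1.0, 3.0, 2.0, 5.0])
    print(csr_spmv(indptr, indices, data, np.ones(3)))  # [5. 3. 7.]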
Grigoras, Paul. "Instance directed tuning for sparse matrix kernels on reconfigurable accelerators." Thesis, Imperial College London, 2018. http://hdl.handle.net/10044/1/62634.
Segura, Salvador Albert. "High-performance and energy-efficient irregular graph processing on GPU architectures." Doctoral thesis, Universitat Politècnica de Catalunya, 2021. http://hdl.handle.net/10803/671449.
Graph processing is a prominent domain, established as the basis of new emerging applications in areas such as data analytics and machine learning, which enable applications such as road navigation, social networks and automatic speech recognition. The large amount of data used in these domains requires high-performance architectures, such as GPGPUs. Although the processing of large graph-based workloads exhibits a high degree of parallelism, memory access patterns tend to be irregular, which reduces efficiency due to memory access divergence. To mitigate these problems, GPGPU graph applications perform stream compaction operations that process nodes/edges so that subsequent steps operate on a compacted dataset. We propose to offload this task to the Stream Compaction Unit (SCU), a hardware extension tailored to the requirements of these operations, which in addition performs a pre-processing step that filters and reorders the processed elements. We show that memory divergence inefficiencies prevail in irregular graph-based GPGPU applications, yet we find that it is possible to relax the strict relationship between threads and the data they process to obtain new optimizations. As such, we propose the Irregular accesses Reorder Unit (IRU), a novel hardware extension integrated into the GPU pipeline that reorders and filters the data processed by the threads in irregular accesses, improving memory access convergence. Finally, we leverage the strengths of the previous proposals to achieve synergistic improvements. We do so by proposing the IRU-enhanced SCU (ISCU), which uses the IRU's efficient pre-processing mechanisms to improve the stream compaction efficiency of the SCU and to address the NoC performance limitations caused by the SCU's pre-processing operations.
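As a software illustration of the stream compaction step this abstract refers to, the sketch below gathers the active frontier of a BFS-style traversal into a contiguous array so that the next step operates on compacted data. It is a generic model of the concept, not the SCU/IRU hardware.

    import numpy as np

    def compact_frontier(active_mask):
        return np.flatnonzero(active_mask)        # indices of active nodes, densely packed

    def bfs_step(indptr, indices, frontier, visited):
        next_mask = np.zeros_like(visited)
        for u in frontier:                        # iterate over the compacted frontier
            for k in range(indptr[u], indptr[u + 1]):
                v = indices[k]
                if not visited[v]:
                    visited[v] = True
                    next_mask[v] = True
        return compact_frontier(next_mask)

    # 4-node graph in CSR form: 0->1, 0->2, 1->3, 2->3
    indptr  = np.array([0, 2, 3, 4, 4])
    indices = np.array([1, 2, 3, 3])
    visited = np.zeros(4, dtype=bool); visited[0] = True
    frontier = np.array([0])
    while frontier.size:
        frontier = bfs_step(indptr, indices, frontier, visited)
        print(frontier)                           # [1 2] then [3] then []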
Yee, Wai Min. "Cache Design for a Hardware Accelerated Sparse Texture Storage System." Thesis, University of Waterloo, 2004. http://hdl.handle.net/10012/1197.
Mantell, Rosemary Genevieve. "Accelerated sampling of energy landscapes." Thesis, University of Cambridge, 2017. https://www.repository.cam.ac.uk/handle/1810/267990.
Chen, Dong. "Acceleration of the spatial selective excitation of MRI via sparse approximation." kostenfrei, 2009. https://mediatum2.ub.tum.de/node?id=956913.
Samarawickrama, Mahendra. "Acceleration Techniques for Sparse Recovery Based Plane-wave Decomposition of a Sound Field." Thesis, The University of Sydney, 2017. http://hdl.handle.net/2123/17302.
Thürck, Daniel [Verfasser], Kristian [Akademischer Betreuer] Kersting, Matthias [Akademischer Betreuer] Bollhöfer, and Michael [Akademischer Betreuer] Goesele. "Irregularity mitigation and portability abstractions for accelerated sparse matrix factorization / Daniel Thürck ; Kristian Kersting, Matthias Bollhöfer, Michael Goesele." Darmstadt : Universitäts- und Landesbibliothek, 2021. http://d-nb.info/1232914843/34.
Karkouri, Jabrane. "Exploiting sparse spectrum to accelerate spiral magnetic resonance spectroscopic imaging : method, simulation and applications to the functional exploration of skeletal muscle." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE1295.
Quantifying energetic muscular metabolism and mitochondrial capacity is of crucial interest to reveal muscular disorders, metabolic diseases or cardiovascular diseases such as mitochondrial myopathy, diabetes or peripheral artery disease. 31P spectroscopy is a non-invasive way to monitor energetic metabolism and the dynamic concentrations of 31P metabolites during exercise or during recovery, and provides information on mitochondrial and oxidative capacity. The assessment of energetic metabolism via 31P spectroscopy can be done with non-localized spectroscopy, single-voxel spectroscopy or Magnetic Resonance Spectroscopic Imaging (MRSI). In clinical practice, mostly non-localized 31P spectroscopy is performed, which prevents metabolic information from being measured in individual muscles; instead, an average signal from the whole muscle is collected at once by the surface coil used for the experiment. Localized 31P spectroscopy would give access to spatially resolved information and motivates the development of new home-made sequences integrating the most advanced technical developments. The magnetic resonance Chemical Shift Imaging (CSI) sequences available on clinical systems have very long acquisition times, which limit their clinical use to static acquisitions, whereas it is essentially the capacity to measure 31P dynamically during an exercise protocol that is of interest. The methodological developments on MRSI realized in the context of this thesis aimed precisely at reducing the acquisition time, with a view to clinical applications. A fast MRSI acquisition method has thus been developed, involving non-Cartesian k-space sampling (spiral sampling) coupled to a smart under-sampling of the temporal dimension, exploiting an a priori known spectral support and a least-squares estimation for signal reconstruction. This method has been validated using simulations, implemented on an MR scanner, optimized and then tested in vivo on the calf muscle for 1H and 31P MRSI applications. Dynamic 31P acquisitions were also performed at 3T, and the developed under-sampled CSI_spiral MRSI sequence has been shown to adequately reveal the expected dynamic changes in PCr. Quantification of the signal further enabled us to assess mitochondrial capacity with twice the dynamic temporal resolution of the fully sampled CSI_spiral MRSI case, and a temporal resolution similar to that of the classically used non-localized MRS sequence. Those developments are of crucial interest for a spatially resolved assessment of mitochondrial capacity in different muscles, i.e. to point out individual muscle alterations related to specific damage or differences in muscle energy consumption during exercise. Sequence improvements for 1D-localized 31P spectroscopy were also integrated into the clinical sequence and used in an ongoing clinical protocol, in order, in the long term, to apply the sequence developments carried out during this thesis in a clinical context. First tested on healthy volunteers for reproducibility, the protocol involves patients suffering from lower-leg arteriopathy. The objective was to assess the mitochondrial capacity of these patients before and after revascularization of the damaged artery. Results showed a significant improvement in mitochondrial capacity after revascularization.
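To give a rough idea of the reconstruction principle described in this abstract (exploiting an a priori known spectral support together with least-squares estimation from under-sampled data), here is a generic, noise-free Python sketch. The grid size, support and sampling pattern are assumptions for illustration, not the thesis' spiral MRSI pipeline.

    import numpy as np

    n = 128                                   # full spectral grid
    support = np.array([5, 17, 40])           # a priori known peak locations
    rng = np.random.default_rng(1)
    true_amps = rng.standard_normal(3) + 1j * rng.standard_normal(3)

    t = np.arange(n)
    keep = rng.choice(n, size=32, replace=False)        # under-sampled time points
    F = np.exp(2j * np.pi * np.outer(t, support) / n)   # Fourier matrix restricted to the support
    y = F[keep] @ true_amps                             # measured samples

    est, *_ = np.linalg.lstsq(F[keep], y, rcond=None)   # least-squares estimate
    print(np.allclose(est, true_amps))                  # True in the noise-free case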
Kulunchakov, Andrei. "Optimisation stochastique pour l'apprentissage machine à grande échelle : réduction de la variance et accélération." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM057.
A goal of this thesis is to explore several topics in optimization for high-dimensional stochastic problems. The first task is related to various incremental approaches which rely on exact gradient information, such as SVRG, SAGA, MISO, SDCA. While the minimization of large finite sums of functions was thoroughly analyzed, we suggest in Chapter 2 a new technique, which allows us to consider all these methods in a generic fashion and demonstrate their robustness to possible stochastic perturbations in the gradient information. Our technique is based on extending the concept of estimate sequence introduced originally by Yu. Nesterov in order to accelerate deterministic algorithms. Using the finite-sum structure of the problems, we are able to modify the aforementioned algorithms to take into account stochastic perturbations. At the same time, the framework allows us to naturally derive new algorithms with the same guarantees as existing incremental methods. Finally, we propose a new accelerated stochastic gradient descent algorithm and a new accelerated SVRG algorithm that is robust to stochastic noise. This acceleration essentially performs the typical deterministic acceleration in the sense of Nesterov, while preserving the optimal variance convergence. Next, we address the problem of generic acceleration in stochastic optimization. For this task, we generalize in Chapter 3 the multi-stage approach called Catalyst, which was originally aimed at accelerating deterministic methods. In order to apply it to stochastic problems, we improve its flexibility in the choice of surrogate functions minimized at each stage. Finally, given an optimization method with mild convergence guarantees for strongly convex problems, our developed multi-stage procedure accelerates convergence to a noise-dominated region, and then achieves the optimal (up to a logarithmic factor) worst-case convergence depending on the noise variance of the gradients. Thus, we successfully address the acceleration of various stochastic methods, including the variance-reduced approaches considered and generalized in Chapter 2. Again, the developed framework bears similarities with the acceleration performed by Yu. Nesterov using the estimate sequences. In this sense, we try to fill the gap between deterministic and stochastic optimization in terms of Nesterov's acceleration. A side contribution of this chapter is a generic analysis that can handle inexact proximal operators, providing new insights about the robustness of stochastic algorithms when the proximal operator cannot be exactly computed. In Chapter 4, we study properties of non-Euclidean stochastic algorithms applied to the problem of sparse signal recovery. A sparse structure significantly reduces the effects of noise in gradient observations. We propose a new stochastic algorithm, called SMD-SR, allowing us to make better use of this structure. This method is a multi-step procedure which uses the stochastic mirror descent algorithm as a building block over its stages. Essentially, SMD-SR has two phases of convergence: a linear bias convergence during the preliminary phase and the optimal asymptotic rate during the asymptotic phase. Compared to the most effective existing solutions to sparse stochastic optimization problems, we offer an improvement in several aspects.
First, we establish the linear bias convergence (similar to that of the deterministic gradient descent algorithm when the full gradient observation is available), while showing optimal robustness to noise. Second, we achieve this rate for a large class of noise models, including sub-Gaussian, Rademacher, multivariate Student distributions and scale mixtures. Finally, these results are obtained under the optimal condition on the level of sparsity, which can approach the total number of iterations of the algorithm (up to a logarithmic factor).
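For context, the variance-reduced incremental step that methods such as SVRG build on (and that the thesis accelerates and makes robust to perturbations) can be sketched in a few lines of Python on a least-squares finite sum. This is the textbook version only; the step-size rule and the problem are illustrative assumptions.

    import numpy as np

    def svrg(A, b, epochs=30, inner=200):
        n, d = A.shape
        lr = 1.0 / (4.0 * (A ** 2).sum(axis=1).max())   # conservative step size
        x = np.zeros(d)
        rng = np.random.default_rng(0)
        for _ in range(epochs):
            full_grad = A.T @ (A @ x - b) / n           # full gradient at the snapshot
            x_snap = x.copy()
            for _ in range(inner):
                i = rng.integers(n)
                g = A[i] * (A[i] @ x - b[i])            # stochastic gradient
                g_snap = A[i] * (A[i] @ x_snap - b[i])  # control variate
                x -= lr * (g - g_snap + full_grad)      # variance-reduced update
        return x

    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 10))
    x_true = rng.standard_normal(10)
    b = A @ x_true
    print(np.linalg.norm(svrg(A, b) - x_true))          # close to 0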
Pham, Duc-An, and 范德安. "Analysis and Optimization Methodology for Sparse CNN Accelerator." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/qgub7n.
Wang, Bang-Chyuan, and 王邦全. "A Sparse Optimization Design for Pointwise Convolutions on the CNN Accelerator." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/xzx4ds.
National Chiao Tung University
Institute of Computer Science and Engineering
106 (ROC academic year, 2017-2018)
In the past, IoT devices only processed simple computations or collected sensor data. With the development of Artificial Intelligence (AI), IoT devices need enough computing power to handle AI algorithms such as image recognition and speech recognition, so machine learning models such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) should be supported on IoT devices, even though their computing ability and power budget are restricted. For these complex applications, Application-Specific Integrated Circuit (ASIC) designs, i.e. accelerators, are used to speed up the computation. Our analysis shows that, compared with memory access time, most of the CNN computation time is spent on convolutional computations. On the other hand, analysis of CNN models shows that their weights contain many zeros, which means that the accelerator performs many redundant zero-value multiplications at runtime and wastes energy. Therefore, we propose a sparse optimization mechanism based on a real accelerator to reduce this overhead and improve performance. In addition, we implement a tool that combines the accelerator's behavior and hardware parameters with a deep learning compiler, NNVM, to evaluate performance.
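As a plain software model of the zero-skipping idea described in this abstract, the sketch below never issues a multiplication whose weight is zero, so weight sparsity directly removes MAC operations. It is an illustration of the mechanism only, not the accelerator or the NNVM-based evaluation tool from the thesis.

    import numpy as np

    def conv2d_skip_zeros(x, w):
        # x: (H, W) input, w: (k, k) kernel; returns (output, MACs actually executed)
        k = w.shape[0]
        H, W = x.shape[0] - k + 1, x.shape[1] - k + 1
        y = np.zeros((H, W))
        nz = [(r, c) for r in range(k) for c in range(k) if w[r, c] != 0.0]
        for i in range(H):
            for j in range(W):
                for r, c in nz:                 # only non-zero weights issue MACs
                    y[i, j] += w[r, c] * x[i + r, j + c]
        return y, H * W * len(nz)

    x = np.random.rand(8, 8)
    w = np.array([[0.5, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 0.25]])
    y, macs = conv2d_skip_zeros(x, w)
    print(macs, "MACs instead of", y.size * w.size)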
Lin, Chien-Yu, and 林建宇. "Merlin: A Sparse Neural Network Accelerator Utilizing Both Neuron and Weight Sparsity." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/6aq7yc.
Pan, Jyun-Wei, and 潘俊瑋. "Enhancing PE Utilization on SIMD-like Accelerator for Sparse Convolutional Neural Networks." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/6qt4ud.
Chang, Chia-Hung, and 張嘉宏. "Design of an Inference Accelerator for CNN with Sparse Row-wise Kernel." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/vgqj7n.
"Algorithm Architecture Co-design for Dense and Sparse Matrix Computations." Master's thesis, 2018. http://hdl.handle.net/2286/R.I.51737.
Dissertation/Thesis
Masters Thesis Computer Engineering 2018
Wu, I.-Chen, and 吳易真. "An Energy-Efficient Accelerator with Relative-Indexing Memory for Sparse Compressed Convolutional Neural Network." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/tx6yx4.
Ramesh, Chinthala. "Hardware-Software Co-Design Accelerators for Sparse BLAS." Thesis, 2017. http://etd.iisc.ac.in/handle/2005/4276.
"Algorithm and Hardware Design for High Volume Rate 3-D Medical Ultrasound Imaging." Doctoral diss., 2019. http://hdl.handle.net/2286/R.I.55684.
Dissertation/Thesis
Doctoral Dissertation Engineering 2019
"On-Chip Learning and Inference Acceleration of Sparse Representations." Doctoral diss., 2019. http://hdl.handle.net/2286/R.I.54867.
Dissertation/Thesis
Doctoral Dissertation Electrical Engineering 2019
Thürck, Daniel. "Irregularity mitigation and portability abstractions for accelerated sparse matrix factorization." Phd thesis, 2021. https://tuprints.ulb.tu-darmstadt.de/17951/1/20210420_dthuerck_dissertation_pdfa.pdf.
Hamlett, Matthew. "A scalable architecture for hardware acceleration of large sparse matrix calculations." 2006. http://www.lib.ncsu.edu/theses/available/etd-11062006-023159/unrestricted/etd.pdf.
Jheng, Hong-Yuan, and 鄭弘元. "FPGA Acceleration of Sparse Matrix-Vector Multiplication Based on Network-on-Chip." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/y884tf.
National Taiwan University of Science and Technology
Department of Electronic Engineering
99 (ROC academic year, 2010-2011)
Sparse Matrix-Vector Multiplication (SMVM) is a pervasive operation in many scientific and engineering applications. Moreover, SMVM is a computationally intensive operation that dominates the performance of most iterative linear system solvers. Computations involving SMVM pose optimization challenges due to its high memory access rate and irregular memory access pattern. In this thesis, a new design concept for SMVM on an FPGA using a Network-on-Chip (NoC) is presented. In traditional circuit design, on-chip communications have been implemented with dedicated point-to-point interconnections or shared buses, so regular data transfer is the major concern of many parallel implementations. However, when dealing with the SMVM operation, the required data transfers usually depend on the sparsity structure of the matrix and can be extremely irregular. Using an NoC architecture makes it possible to deal with arbitrarily structured data transfers, i.e. with arbitrarily structured sparse matrices. In addition, the size of the pipelined SMVM calculator based on the NoC architecture can be customized to 2×2, 4×4, ..., p×p (p∈N) thanks to its high scalability and flexibility. The implementation is done in IEEE-754 single-precision floating-point on a Xilinx Virtex-6 FPGA. The experimental results show that the proposed NoC-based implementation achieves approximately a 2.3-5.6x speed-up over the MATLAB-based software implementation on Matrix Market benchmark applications.
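As a software stand-in for the idea of spreading SMVM work over a p×p array of processing elements and reducing the partial results over the network, here is a small Python sketch; the round-robin partitioning rule is an assumption for illustration, not the thesis' NoC design.

    import numpy as np

    def smvm_on_pe_grid(indptr, indices, data, x, p=2):
        n_rows = len(indptr) - 1
        partial = np.zeros((p * p, n_rows))          # one row of partial sums per PE
        for row in range(n_rows):
            for k in range(indptr[row], indptr[row + 1]):
                pe = k % (p * p)                     # spread non-zeros over the PEs
                partial[pe, row] += data[k] * x[indices[k]]
        return partial.sum(axis=0)                   # models the reduction over the NoC

    # 3x3 example matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
    indptr  = np.array([0, 2, 3, 5])
    indices = np.array([0, 2, 1, 0, 2])
    data    = np.array([4.0, 1.0, 3.0, 2.0, 5.0])
    print(smvm_on_pe_grid(indptr, indices, data, np.ones(3), p=2))  # [5. 3. 7.]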
Chen, Dong [Verfasser]. "Acceleration of the spatial selective excitation of MRI via sparse approximation / Dong Chen." 2009. http://d-nb.info/1000072541/34.
Meila, Marina. "An Accelerated Chow and Liu Algorithm: Fitting Tree Distributions to High Dimensional Sparse Data." 1999. http://hdl.handle.net/1721.1/6676.
Sandhu, Ali Imran. "Efficient and Accurate Numerical Techniques for Sparse Electromagnetic Imaging." Diss., 2020. http://hdl.handle.net/10754/662627.
Sitaraman, Hariswaran. "Magneto-hydrodynamics simulation study of high density thermal plasmas in plasma acceleration devices." 2013. http://hdl.handle.net/2152/21618.