Dissertations / Theses: 'Sparse Vector Vector Multiplication'

1

Ashari, Arash. "Sparse Matrix-Vector Multiplication on GPU." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1417770100.

Full text

APA, Harvard, Vancouver, ISO, and other styles

2

Ramachandran, Shridhar. "Incremental PageRank acceleration using Sparse Matrix-Sparse Vector Multiplication." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1462894358.

Full text

APA, Harvard, Vancouver, ISO, and other styles

3

Balasubramanian, Deepan Karthik. "Efficient Sparse Matrix Vector Multiplication for Structured Grid Representation." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1339730490.

Full text

APA, Harvard, Vancouver, ISO, and other styles

4

Mansour, Ahmad [Verfasser]. "Sparse Matrix-Vector Multiplication Based on Network-on-Chip / Ahmad Mansour." München : Verlag Dr. Hut, 2015. http://d-nb.info/1075409470/34.

Full text

APA, Harvard, Vancouver, ISO, and other styles

5

Singh, Kunal. "High-Performance Sparse Matrix-Multi Vector Multiplication on Multi-Core Architecture." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1524089757826551.

Full text

APA, Harvard, Vancouver, ISO, and other styles

6

El-Kurdi, Yousef M. "Sparse Matrix-Vector floating-point multiplication with FPGAs for finite element electromagnetics." Thesis, McGill University, 2006. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=98958.

Full text

Abstract:

The Finite Element Method (FEM) is a computationally intensive scientific and engineering analysis tool that has diverse applications ranging from structural engineering to electromagnetic simulation. Field Programmable Gate Arrays (FPGAs) have been shown to have higher peak floating-point performance than general purpose CPUs, and the trends are moving in favor of FPGAs. We present an architecture and implementation of an FPGA-based Sparse Matrix-Vector Multiplier (SMVM) for use in the iterative solution of large, sparse systems of equations arising from FEM applications. Our architecture exploits the FEM matrix sparsity structure to achieve a balance between performance and hardware resource requirements. The architecture is based on a pipelined linear array of Processing Elements (PEs). A hardware-oriented matrix "striping" scheme is developed which reduces the number of required processing elements. The implemented SMVM-pipeline prototype contains 8 PEs and is clocked at 110 MHz obtaining a peak performance of 1.76 GFLOPS. For 8 GB/s of memory bandwidth typical of recent FPGA reconfigurable systems, this architecture can achieve 1.5 GFLOPS sustained performance. A single pipeline uses 30% of the logic resources and 40% of the memory resources of a Stratix S80 FPGA. Using multiple instances of the pipeline, linear scaling of the peak and sustained performance can be achieved. Our stream-through architecture provides the added advantage of enabling an iterative implementation of the SMVM computation required by iterative solvers such as the conjugate gradient method, avoiding initialization time due to data loading and setup inside the FPGA internal memory.

APA, Harvard, Vancouver, ISO, and other styles

7

Godwin, Jeswin Samuel. "High-Performancs Sparse Matrix-Vector Multiplication on GPUS for Structured Grid Computations." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1357280824.

Full text

APA, Harvard, Vancouver, ISO, and other styles

8

Pantawongdecha, Payut. "Autotuning divide-and-conquer matrix-vector multiplication." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/105968.

Full text

Abstract:

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 73-75).
Divide and conquer is an important concept in computer science. It is used ubiquitously to simplify and speed up programs. However, it needs to be optimized, with respect to parameter settings for example, in order to achieve the best performance. The problem boils down to searching for the best implementation choice on a given set of requirements, such as which machine the program is running on. The goal of this thesis is to apply and evaluate the Ztune approach [14] on serial divide-and-conquer matrix-vector multiplication. We implemented Ztune to autotune serial divide-and-conquer matrix-vector multiplication on machines with different hardware configurations, and found that Ztuneoptimized codes ran 1%-5% faster than the hand-optimized counterparts. We also compared Ztune-optimized results with other matrix-vector multiplication libraries including the Intel Math Kernel Library and OpenBLAS. Since the matrix-vector multiplication problem is a level 2 BLAS, it is not as computationally intensive as level 3 BLAS problems such as matrix-matrix multiplication and stencil computation. As a result, the measurement in matrix-vector multiplication is more prone to error from factors such as noise, cache alignment of the matrix, and cache states, which lead to wrong decision choices for Ztune. We explored multiple options to get more accurate measurements and demonstrated the techniques that remedied these issues. Lastly, we applied the Ztune approach to matrix-matrix multiplication, and we were able to achieve 2%-85% speedup compared to the hand-tuned code. This thesis represents joint work with Ekanathan Palamadai Natarajan.
by Payut Pantawongdecha.
M. Eng.

APA, Harvard, Vancouver, ISO, and other styles

9

Hopkins, T. M. "The design of a sparse vector processor." Thesis, University of Edinburgh, 1993. http://hdl.handle.net/1842/14094.

Full text

Abstract:

This thesis describes the development of a new vector processor architecture capable of high efficiency when computing with very sparse vector and matrix data, of irregular structure. Two applications are identified as of particular importance: sparse Gaussian elimination, and Linear Programming, and the algorithmic steps involved in the solution of these problems are analysed. Existing techniques for sparse vector computation, which are only able to achieve a small fraction of the arithmetic performance commonly expected on dense matrix problems, are critically examined. A variety of new techniques with potential for hardware support is discussed. From these, the most promising are selected, and efficient hardware implementations developed. The architecture of a complete vector processor incorporating the new vector and matrix mechanisms is described - the new architecture also uses an innovative control structure for the vector processor, which enables high efficiency even when computing with vectors with very small numbers of non-zeroes. The practical feasibility of the design is demonstrated by describing the prototype implementation, under construction from off-the-shelf components. The expected performance of the new architecture is analysed, and simulation results are presented which demonstrate that the machine could be expected to provide an order of magnitude speed-up on many large sparse Linear Programming problems, compared to a scalar processor with the same clock rate. The simulation results indicate that the vector processor control structure is successful - the vector half-performance length is as low as 8 for standard vector instruction loop tests. In some cases, simulations indicate that the performance of the machine is limited by the speed of some scalar processor operations. Finally, the scope for re-implementing the new architecture in technology faster than the prototype's 8MHz is briefly discussed, and particular potential difficulties identified.

APA, Harvard, Vancouver, ISO, and other styles

10

Belgin, Mehmet. "Structure-based Optimizations for Sparse Matrix-Vector Multiply." Diss., Virginia Tech, 2010. http://hdl.handle.net/10919/30260.

Full text

Abstract:

This dissertation introduces two novel techniques, OSF and PBR, to improve the performance of Sparse Matrix-vector Multiply (SMVM) kernels, which dominate the runtime of iterative solvers for systems of linear equations. SMVM computations that use sparse formats typically achieve only a small fraction of peak CPU speeds because they are memory bound due to their low flops:byte ratio, they access memory irregularly, and exhibit poor ILP due to inefficient pipelining. We particularly focus on improving the flops:byte ratio, which is the main limiter on performance, by exploiting recurring structures or sub-structures in matrices. Our techniques also support micro-architecture level optimizations to further improve performance. Operation Stacking Framework (OSF) stacks problems in large ensemble computations, which run the same sparse kernel using an identical matrix structure, such that they share a single copy of the indexing information to significantly reduce memory bandwidth usage. OSF provides performance improvements of up to 1.94x on an AMD Opteron compared to the CSR method. We validate performance results using hardware event counters, which demonstrate significantly improved cache and pipeline utilization. Pattern-based Representation (PBR) exploits recurring block nonzero patterns by generating custom code for each recurring block pattern. In this way, no indexing data for individual nonzero elements are read from memory, reducing the overall size of the indices by up to 98%. Our code generator emits highly tuned codes that utilize SSE vectorization and software prefetching. PBR accurately identifies a block size that achieves optimal or near-optimal performance using a linear multiple regression performance model. On recent multicore machines, PBR provides performance improvements of up to 3.4x sequentially and 5x in parallel, compared to the CSR method. The PBR library we provide converts matrices at runtime, allowing our method to be used as a drop-in replacement for existing methods. We compare PBRâ s overhead relative to its benefits and show that PBR is beneficial for many applications that repetitively call the SMVM kernel for the same matrix structure.
Ph. D.

APA, Harvard, Vancouver, ISO, and other styles

11

DeLorimier, Michael DeHon André. "Floating-point sparse matrix-vector multiply for FPGAs /." Diss., Pasadena, Calif. : California Institute of Technology, 2005. http://resolver.caltech.edu/CaltechETD:etd-05132005-144347.

Full text

APA, Harvard, Vancouver, ISO, and other styles

12

Xia, Xiao-Lei. "Sparse learning for support vector machines with biological applications." Thesis, Queen's University Belfast, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.527935.

Full text

APA, Harvard, Vancouver, ISO, and other styles

13

Mellet, Dieter Sydney-Charles. "An integrated continuous output linear power sensor using Hall effect vector multiplication." Diss., Pretoria : [s.n.], 2002. http://upetd.up.ac.za/thesis/available/etd-09012005-120807.

Full text

APA, Harvard, Vancouver, ISO, and other styles

14

Chen, Jihong. "Sparse Modeling in Classification, Compression and Detection." Diss., Georgia Institute of Technology, 2004. http://hdl.handle.net/1853/5051.

Full text

Abstract:

The principal focus of this thesis is the exploration of sparse structures in a variety of statistical modelling problems. While more comprehensive models can be useful to solve a larger number of problems, its calculation may be ill-posed in most practical instances because of the sparsity of informative features in the data. If this sparse structure can be exploited, the models can often be solved very efficiently. The thesis is composed of four projects. Firstly, feature sparsity is incorporated to improve the performance of support vector machines when there are a lot of noise features present. The second project is about an empirical study on how to construct an optimal cascade structure. The third project involves the design of a progressive, rate-distortionoptimized shape coder by combining zero-tree algorithm with beamlet structure. Finally, the longest run statistics is applied for the detection of a filamentary structure in twodimensional rectangular region. The fundamental ideas of the above projects are common — extract an efficient summary from a large amount of data. The main contributions of this work are to develop and implement novel techniques for the efficient solutions of several dicult problems that arise in statistical signal/image processing.

APA, Harvard, Vancouver, ISO, and other styles

15

Muradov, Feruz. "Development, Implementation, Optimization and Performance Analysis of Matrix-Vector Multiplication on Eight-Core Digital Signal Processor." Thesis, KTH, Numerisk analys, NA, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-131289.

Full text

Abstract:

This thesis work aims at implementing the sparse matrix vector multiplication on eight-core Digital Signal Processor (DSP) and giving insights on how to optimize matrix multiplication on DSP to achieve high energy efficiency. We used two sparse matrix formats: the Compressed Sparse Row (CSR) and the Block Compressed Sparse Row (BCSR) formats. We carried out loop unrolling optimization of the naive algorithm. In addition, we implemented the Registerblocked and the Cache-blocked sparse matrix vector multiplications to optimize the naive algorithm. The computational performance improvement with loop unrolling technique was promising (≈12%). With this optimization, we observed a decrease of power usage (0.3 W) when using a matrix size of 600 and an increase of power usage (1.2 W), when using larger size matrices. The Register-blocked algorithm resulted to be the most efficient technique on DSP. With this algorithm, we were able to increase performance by a factor of six when compared to the naive algorithm, still retaining low power consumption (≈ 14 W). The Cache-blocked sparse matrix vector multiplication is known to be most convenient for large number of architectures with coherent caches. However, because DSP does not support coherency between caches, this method did not show large improvement in computational performance. In fact, we confirm that power consumption for the Cache-blocked method was higher when compared to other effective algorithms such as Register-blocked sparse matrix vector multiplication and loop unrolling of naive algorithm. In conclusion, we found that the DSP delivers low power consumption, excellent computational performance and energy efficiency when the Register-blocked sparse matrix vector multiplication technique is used.

APA, Harvard, Vancouver, ISO, and other styles

16

Determe, Jean-François. "Greedy algorithms for multi-channel sparse recovery." Doctoral thesis, Universite Libre de Bruxelles, 2018. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/265808.

Full text

Abstract:

During the last decade, research has shown compressive sensing (CS) to be a promising theoretical framework for reconstructing high-dimensional sparse signals. Leveraging a sparsity hypothesis, algorithms based on CS reconstruct signals on the basis of a limited set of (often random) measurements. Such algorithms require fewer measurements than conventional techniques to fully reconstruct a sparse signal, thereby saving time and hardware resources. This thesis addresses several challenges. The first is to theoretically understand how some parameters—such as noise variance—affect the performance of simultaneous orthogonal matching pursuit (SOMP), a greedy support recovery algorithm tailored to multiple measurement vector signal models. Chapters 4 and 5 detail novel improvements in understanding the performance of SOMP. Chapter 4 presents analyses of SOMP for noiseless measurements; using those analyses, Chapter 5 extensively studies the performance of SOMP in the noisy case. A second challenge consists in optimally weighting the impact of each measurement vector on the decisions of SOMP. If measurement vectors feature unequal signal-to-noise ratios, properly weighting their impact improves the performance of SOMP. Chapter 6 introduces a novel weighting strategy from which SOMP benefits. The chapter describes the novel weighting strategy, derives theoretically optimal weights for it, and presents both theoretical and numerical evidence that the strategy improves the performance of SOMP. Finally, Chapter 7 deals with the tendency for support recovery algorithms to pick support indices solely for mapping a particular noise realization. To ensure that such algorithms pick all the correct support indices, researchers often make the algorithms pick more support indices than the number strictly required. Chapter 7 presents a support reduction technique, that is, a technique removing from a support the supernumerary indices solely mapping noise. The advantage of the technique, which relies on cross-validation, is that it is universal, in that it makes no assumption regarding the support recovery algorithm generating the support. Theoretical results demonstrate that the technique is reliable. Furthermore, numerical evidence proves that the proposed technique performs similarly to orthogonal matching pursuit with cross-validation (OMP-CV), a state-of-the-art algorithm for support reduction.
Doctorat en Sciences de l'ingénieur et technologie
info:eu-repo/semantics/nonPublished

APA, Harvard, Vancouver, ISO, and other styles

17

Murugandi, Iyyappa Thirunavukkarasu. "A New Representation of Structured Grids for Matrix-vector Operation and Optimization of Doitgen Kernel." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1276878729.

Full text

APA, Harvard, Vancouver, ISO, and other styles

18

Fourie, Christoff. "A one-class object-based system for sparse geographic feature identification." Thesis, Stellenbosch : University of Stellenbosch, 2011. http://hdl.handle.net/10019.1/6666.

Full text

Abstract:

Thesis (MSc (Geography and Environmental Studies))--University of Stellenbosch, 2011.
ENGLISH ABSTRACT: The automation of information extraction from earth observation imagery has become a field of active research. This is mainly due to the high volumes of remotely sensed data that remain unused and the possible benefits that the extracted information can provide to a wide range of interest groups. In this work an earth observation image processing system is presented and profiled that attempts to streamline the information extraction process, without degradation of the quality of the extracted information, for geographic object anomaly detection. The proposed system, implemented as a software application, combines recent research in automating image segment generation and automatically finding statistical classifier parameters and attribute subsets using evolutionary inspired search algorithms. Exploratory research was conducted on the use of an edge metric as a fitness function to an evolutionary search heuristic to automate the generation of image segments for a region merging segmentation algorithm having six control parameters. The edge metric for such an application is compared with an area based metric. The use of attribute subset selection in conjunction with a free parameter tuner for a one class support vector machine (SVM) classifier, operating on high dimensional object based data, was also investigated. For common earth observation anomaly detection problems using typical segment attributes, such a combined free parameter tuning and attribute subset selection system provided superior statistically significant results compared to a free parameter tuning only process. In some extreme cases, due to the stochastic nature of the search algorithm employed, the free parameter only strategy provided slightly better results. The developed system was used in a case study to map a single class of interest on a 22.5 x 22.5km subset of a SPOT 5 image and is compared with a multiclass classification strategy. The developed system generated slightly better classification accuracies than the multiclass classifier and only required samples from the class of interest.
AFIKAANSE OPSOMMING: Die outomatisering van die verkryging van inligting vanaf aardwaarnemingsbeelde het in sy eie reg 'n navorsingsveld geword as gevolg van die groot volumes data wat nie benut word nie, asook na aanleiding van die moontlike bydrae wat inligting wat verkry word van hierdie beelde aan verskeie belangegroepe kan bied. In hierdie tesis word 'n aardwaarneming beeldverwerkingsstelsel bekend gestel en geëvalueer. Hierdie stelsel beoog om die verkryging van inligting van aardwaarnemingsbeelde te vergemaklik deur verbruikersinteraksie te minimaliseer, sonder om die kwaliteit van die resultate te beïnvloed. Die stelsel is ontwerp vir geografiese voorwerp anomalie opsporing en is as 'n sagteware program geïmplementeer. Die program kombineer onlangse navorsing in die gebruik van evolusionêre soek-algoritmes om outomaties goeie beeldsegmente te verkry en parameters te vind, sowel as om kenmerke vir 'n statistiese klassifikasie van beeld segmente te selekteer. Verkennende navorsing is gedoen op die benutting van 'n rand metriek as 'n passings funksie in 'n evolusionêre soek heuristiek om outomaties goeie parameters te vind vir 'n streeks kombinering beeld segmentasie algoritme met ses beheer parameters. Hierdie rand metriek word vergelyk met 'n area metriek vir so 'n toepassing. Die nut van atribuut substel seleksie in samewerking met 'n vrye parameter steller vir 'n een klas steun vektor masjien (SVM) klassifiseerder is ondersoek op hoë dimensionele objek georiënteerde data. Vir algemene aardwaarneming anomalie opsporings probleme met 'n tipiese segment kenmerk versameling, het so 'n stelsel beduidend beter resultate as 'n eksklusiewe vrye parameter stel stelsel gelewer in sommige uiterste gevalle. As gevolg van die stogastiese aard van die soek algoritme het die eksklusiewe vrye parameter stel strategie effens beter resultate gelewer. Die stelsel is getoets in 'n gevallestudie waar 'n enkele klas op 'n 22.5 x 22.5km substel van 'n SPOT 5 beeld geïdentifiseer word. Die voorgestelde stelsel, wat slegs monsters van die gekose klas gebruik het, het beter klassifikasie akkuraathede genereer as die multi klas klassifiseerder.

APA, Harvard, Vancouver, ISO, and other styles

19

Tordsson, Pontus. "Compressed sensing for error correction on real-valued vectors." Thesis, Linnéuniversitetet, Institutionen för matematik (MA), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-85499.

Full text

Abstract:

Compressed sensing (CS) is a relatively new branch of mathematics with very interesting applications in signal processing, statistics and computer science. This thesis presents some theory of compressed sensing, which allows us to recover (high-dimensional) sparse vectors from (low-dimensional) compressed measurements by solving the L1-minimization problem. A possible application of CS to the problem of error correction is also presented, where sparse vectors are that of arbitrary noise. Successful sparse recovery by L1-minimization relies on certain properties of rectangular matrices. But these matrix properties are extremely subtle and diﬃcult to numerically verify. Therefore, to get an idea of how sparse (or dense) errors can be, numerical simulation of error correction was done. These simulations show the performance of error correction with respect to various levels of error sparsity and matrix dimensions. It turns out that error correction degrades slower for low matrix dimensions than for high matrix dimensions, while for suﬃciently sparse errors, high matrix dimensions oﬀer a higher likelihood of guaranteed error correction.

APA, Harvard, Vancouver, ISO, and other styles

20

Eibner, Tino, and Jens Markus Melenk. "Fast algorithms for setting up the stiffness matrix in hp-FEM: a comparison." Universitätsbibliothek Chemnitz, 2006. http://nbn-resolving.de/urn:nbn:de:swb:ch1-200601623.

Full text

Abstract:

We analyze and compare different techniques to set up the stiffness matrix in the hp-version of the finite element method. The emphasis is on methods for second order elliptic problems posed on meshes including triangular and tetrahedral elements. The polynomial degree may be variable. We present a generalization of the Spectral Galerkin Algorithm of [7], where the shape functions are adapted to the quadrature formula, to the case of triangles/tetrahedra. Additionally, we study on-the-fly matrix-vector multiplications, where merely the matrix-vector multiplication is realized without setting up the stiffness matrix. Numerical studies are included.

APA, Harvard, Vancouver, ISO, and other styles

21

Flegar, Goran. "Sparse Linear System Solvers on GPUs: Parallel Preconditioning, Workload Balancing, and Communication Reduction." Doctoral thesis, Universitat Jaume I, 2019. http://hdl.handle.net/10803/667096.

Full text

Abstract:

With the breakdown of Dennard scaling in the mid-2000s and the end of Moore's law on the horizon, the high performance computing community is turning its attention towards unconventional accelerator hardware to ensure the continued growth of computational capacity. This dissertation presents several contributions related to the iterative solution of sparse linear systems on the most widely used general purpose accelerator - the Graphics Processing Unit (GPU). Specifically, it accelerates the major building blocks of Krylov solvers, and describes their realization as part of a software library of reusable building blocks. The first part of the dissertation focuses on the sparse matrix-vector product and effective load balancing in the presence of irregular sparsity patterns. The second part describes the design of high-performance preconditioners. Finally, the third part demonstrates the potential of adaptive precision techniques for constructing preconditioners with lower memory footprint, and accuracy comparable to their full precision equivalents.
Con el final de la ley de Dennard y el cercano fin de la ley de Moore, la comunidad en computación de altas prestaciones se está centrando en tecnologías de aceleración no convencionales para asegurar el crecimiento exponencial de la capacidad de computación. Esta tesis contribuye a la solución iterativa de sistemas lineales dispersos en el acelerador más difundido: el procesador gráfico. Específicamente, el trabajo acelera los bloques fundamentales de los métodos de Krylov, y describe su implementación como parte de una biblioteca de bloques reutilizables. La primera parte del trabajo se centra en el producto matriz-vector disperso y el equilibrado de la carga ante patrones de dispersidad irregulares. La segunda parte describe el diseño de precondicionadores de alto rendimiento. Finalmente, la tercera parte demuestra el potencial de las técnicas de precisión adaptativa para construir precondicionadores con menor consumo de memoria, y fiabilidad comparable con las versiones de precisión completa.

APA, Harvard, Vancouver, ISO, and other styles

22

Chew, Sien Wei. "Recognising facial expressions with noisy data." Thesis, Queensland University of Technology, 2013. https://eprints.qut.edu.au/63523/1/Sien_Wei_Chew_Thesis.pdf.

Full text

Abstract:

Techniques to improve the automated analysis of natural and spontaneous facial expressions have been developed. The outcome of the research has applications in several fields including national security (eg: expression invariant face recognition); education (eg: affect aware interfaces); mental and physical health (eg: depression and pain recognition).

APA, Harvard, Vancouver, ISO, and other styles

23

Grah, Joana Sarah. "Mathematical imaging tools in cancer research : from mitosis analysis to sparse regularisation." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/273243.

Full text

Abstract:

This dissertation deals with customised image analysis tools in cancer research. In the field of biomedical sciences, mathematical imaging has become crucial in order to account for advancements in technical equipment and data storage by sound mathematical methods that can process and analyse imaging data in an automated way. This thesis contributes to the development of such mathematically sound imaging models in four ways: (i) automated cell segmentation and tracking. In cancer drug development, time-lapse light microscopy experiments are conducted for performance validation. The aim is to monitor behaviour of cells in cultures that have previously been treated with chemotherapy drugs, since atypical duration and outcome of mitosis, the process of cell division, can be an indicator of successfully working drugs. As an imaging modality we focus on phase contrast microscopy, hence avoiding phototoxicity and influence on cell behaviour. As a drawback, the common halo- and shade-off effect impede image analysis. We present a novel workflow uniting both automated mitotic cell detection with the Hough transform and subsequent cell tracking by a tailor-made level-set method in order to obtain statistics on length of mitosis and cell fates. The proposed image analysis pipeline is deployed in a MATLAB software package called MitosisAnalyser. For the detection of mitotic cells we use the circular Hough transform. This concept is investigated further in the framework of image regularisation in the general context of imaging inverse problems, in which circular objects should be enhanced, (ii) exploiting sparsity of first-order derivatives in combination with the linear circular Hough transform operation. Furthermore, (iii) we present a new unified higher-order derivative-type regularisation functional enforcing sparsity of a vector field related to an image to be reconstructed using curl, divergence and shear operators. The model is able to interpolate between well-known regularisers such as total generalised variation and infimal convolution total variation. Finally, (iv) we demonstrate how we can learn sparsity promoting parametrised regularisers via quotient minimisation, which can be motivated by generalised Eigenproblems. Learning approaches have recently become very popular in the field of inverse problems. However, the majority aims at fitting models to favourable training data, whereas we incorporate knowledge about both fit and misfit data. We present results resembling behaviour of well-established derivative-based sparse regularisers, introduce novel families of non-derivative-based regularisers and extend this framework to classification problems.

APA, Harvard, Vancouver, ISO, and other styles

24

Simeão, Sandra Fiorelli de Almeida Penteado. "Técnicas de esparsidade em sistemas estáticos de energia elétrica." Universidade de São Paulo, 2001. http://www.teses.usp.br/teses/disponiveis/18/18133/tde-07032016-105658/.

Full text

Abstract:

Neste trabalho foi realizado um grande levantamento de técnicas de esparsidade relacionadas a sistemas estáticos de energia elétrica. Tais técnicas visam, do ponto de vista computacional, ao aumento da eficiência na solução de rede elétrica objetivando, além da resolução em si, a redução dos requisitos de memória, armazenamento e tempo de processamento. Para tanto, uma extensa revisão bibliográfica foi compilada, apresentando um posicionamento histórico e uma ampla visão do desenvolvimento teórico. Os testes comparativos realizados para sistemas de 14, 30, 57 e 118 barras, sobre a implantação de três das técnicas mais empregadas, apontou a Bi-fatoração como tendo o melhor desempenho médio. Para sistemas pequenos, a Eliminação Esparsa e Sintética de Gauss apresentou melhores resultados. Este trabalho fornecerá subsídios conceituais e metodológicos a técnicos e pesquisadores da área.
In this work a great survey of sparsity techniques related to static systems of electric power was accomplished. Such techniques seek, for of the computational point of view, the increase of the efficiency in the solution of the electric net aiming, besides the resolution of itself, the reduction of memory requirements, the storage and time processing. For that, an extensive bibliographic review was compiled providing a historic positioning and a broad view of theoretic development. The comparative tests accomplished for systems of 14,30, 57 and 118 buses, on the implementation of three of the most employed techniques, it pointed out an bi-factorisation as best medium performance. For small systems, the sparse symmetric Gaussian elimination showed the best results. This work will supply conceptual and methodological subsidies to technicians and researchers of the area.

APA, Harvard, Vancouver, ISO, and other styles

25

LE, COZ SERGE. "La rhizomanie de la betterave sucriere : multiplication du virus et aspects agronomiques de la maladie." Paris 6, 1986. http://www.theses.fr/1986PA066644.

Full text

Abstract:

Les chloroplastes des feuilles de betteraves rhizomaniees sont appauvries en pigments, en proteines, en liquides polaires et leur activite photosynthetique est reduite. Dans les cellules virosees, l'etude ultrastructure montre une association des amas de virus avec le reticulum endoplasmique granuleux. Un protocole pour la preparation de suspensions enrichies en cytosores isoles de polymyxa betae est propose. Le champignon est retrouve a tous les niveaux du sol de quatre parcelles rhizomaniees ou saines de la region de pithiviers. Une terre rhizomaniee reste infectieuse aprese un an et demi de lagune en bassin

APA, Harvard, Vancouver, ISO, and other styles

26

Herrmann, Felix J., Deli Wang, and Gilles Hennenfent. "Multiple prediction from incomplete data with the focused curvelet transform." Society of Exploration Geophysicists, 2007. http://hdl.handle.net/2429/563.

Full text

Abstract:

Incomplete data represents a major challenge for a successful prediction and subsequent removal of multiples. In this paper, a new method will be represented that tackles this challenge in a two-step approach. During the first step, the recenly developed curvelet-based recovery by sparsity-promoting inversion (CRSI) is applied to the data, followed by a prediction of the primaries. During the second high-resolution step, the estimated primaries are used to improve the frequency content of the recovered data by combining the focal transform, defined in terms of the estimated primaries, with the curvelet transform. This focused curvelet transform leads to an improved recovery, which can subsequently be used as input for a second stage of multiple prediction and primary-multiple separation.

APA, Harvard, Vancouver, ISO, and other styles

27

Behúň, Kamil. "Příznaky z videa pro klasifikaci." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-236367.

Full text

Abstract:

This thesis compares hand-designed features with features learned by feature learning methods in video classification. The features learned by Principal Component Analysis whitening, Independent subspace analysis and Sparse Autoencoders were tested in a standard Bag of Visual Word classification paradigm replacing hand-designed features (e.g. SIFT, HOG, HOF). The classification performance was measured on Human Motion DataBase and YouTube Action Data Set. Learned features showed better performance than the hand-desined features. The combination of hand-designed features and learned features by Multiple Kernel Learning method showed even better performance, including cases when hand-designed features and learned features achieved not so good performance separately.

APA, Harvard, Vancouver, ISO, and other styles

28

Pradat, Yannick. "Retraite et risque financier." Thesis, Paris Sciences et Lettres (ComUE), 2017. http://www.theses.fr/2017PSLED022/document.

Full text

Abstract:

Le premier chapitre examine les caractéristiques statistiques à long terme des rendements financiers en France et aux USA. Les propriétés des différents actifs font apparaître qu’à long terme les actions procurent un risque sensiblement moins élevé. En outre, les propriétés de retour à la moyenne des actions justifient qu’elles soient utilisées dans une stratégie de cycle de vie comme « option par défaut » de plans d’épargne retraite. Le chapitre deux fournit une explication au débat sur l'hypothèse d’efficience des marchés. La cause du débat est souvent attribuée à la petite taille des échantillons et à la faible puissance des tests statistiques dédiés. Afin de contourner ce problème, nous utilisons l'approche développée par Campbell et Viceira (2005) qui utilisent une méthode VAR pour mettre en évidence l’existence de retour vers la moyenne dans le cours des actifs risqués.Le troisième chapitre évalue la vitesse de convergence des cours des actions. Un moyen classique pour caractériser la vitesse de retour vers la moyenne est la « demi-vie ». En comparant les indices boursiers de quatre pays développés (États-Unis, Royaume-Uni, France et Japon) sur la période 1950-2014, nous établissons une vitesse de convergence significative, avec une demi-vie entre 4,0 et 5,8 ans.Le dernier chapitre présente les résultats d'un modèle conçu pour étudier les interactions entre la démographie et les régimes de retraite. Afin d’étudier les risques inhérents à l’utilisation des revenus du capital pour financer les retraites, nous utilisons un « Trending OU process » au lieu d’un MBG classique pour modéliser les rendements. Pour un épargnant averse au risque le marché pourrait concurrencer les régimes par répartition
Chapter one examines the long run statistical characteristics of financial returns in France and the USA for selected assets. This study clearly shows that the returns’ distributions diverge from the Gaussian strategy as regards longholding periods. Thereafter we analyze the consequences of the non-Gaussian nature of stock returns on default-option retirement plans.Chapter two provides a reasonable explanation to the strong debate on the Efficient Market Hypothesis. The cause of the debate is often attributed to small sample sizes in combination with statistical tests for mean reversion that lackpower. In order to bypass this problem, we use the approach developed by Campbell and Viceira (2005) who have settled a vectorial autoregressive methodology (VAR) to measure the mean reversion of asset returns.The third chapter evaluates the speed of convergence of stock prices. A convenient way to characterize the speed of mean reversion is the half-life. Comparing the stock indexes of four developed countries (US, UK, France and Japan) during the period 1950-2014, we establish significant mean reversion, with a half-life lying between 4,0 and 5,8 years.The final chapter provides some results from a model built in order to study the linked impacts of demography and economy on the French pension scheme. In order to reveal the risks that are contained in pension fund investment, we use a Trending Ornstein-Uhlenbeck process instead of the typical GBM for modeling stock returns. We find that funded scheme returns, net of management fees, are slightly lower thanthe PAYG internal rate of return

APA, Harvard, Vancouver, ISO, and other styles

29

Tsai, Sung-Han, and 蔡松翰. "Optimization for sparse matrix-vector multiplication based on NVIDIA CUDA platform." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/qw23p7.

Full text

Abstract:

碩士
國立彰化師範大學
資訊工程學系
105
In recent years, large size sparse matrices are often used in fields such as science and engineering which usually apply in computing linear model. Using the ELLPACK format to store sparse matrices, it can reduce the matrix storage space. But if there is too much nonzero elements in one of row of the original sparse matrix, it still waste too much memory space. There are many research focusing on the Sparse Matrix–Vector Multiplication（SpMV）with ELLPACK format on Graphic Processing Unit（GPU）. Therefore, the purpose of our research is reducing the access space of sparse matrix which is transformed in Compressed Sparse Row（CSR）format after Reverse Cutthill-McKee（RCM）algorithm to accelerate for SpMV on GPU. Due to lower data access ratio from SpMV, the performance is restricted by memory bandwidth. Our propose is based on CSR format from two aspects:（1）reduce cache misses to enhance the vector locality and raise the performance, and（2）reduce accessed matrix data by index reduction to optimize the performance.

APA, Harvard, Vancouver, ISO, and other styles

30

Jheng, Hong-Yuan, and 鄭弘元. "FPGA Acceleration of Sparse Matrix-Vector Multiplication Based on Network-on-Chip." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/y884tf.

Full text

Abstract:

碩士
國立臺灣科技大學
電子工程系
99
The Sparse Matrix-Vector Multiplication (SMVM) is a pervasive operation in many scientific and engineering applications. Moreover, SMVM is a computational intensive operation that dominates the performance in most iterative linear system solvers. There are some optimization challenges in computations involving SMVM due to its high memory access rate and irregular memory access pattern. In this thesis, a new design concept for SMVM in an FPGA by using Network-on-Chip (NoC) is presented. In traditional circuit design on-chip communications have been designed with dedicated point-to-point interconnections or shared buses. Therefore, regular data transfer is the major concern of many parallel implementations. However, when dealing with the SMVM operation, the required data transfers are usually dependent on the sparsity structure of the matrix and can be extremely irregular. Using an NoC architecture makes it possible to deal with arbitrary structure of the data transfers, i.e. with arbitrary structured sparse matrices. In addition, the size of the pipelined SMVM calculator based on NoC architecture can be customized to 2×2, 4×4, ..., p×p (p∈N) due to its high scalibility and flexibility. The implementation is done in IEEE-754 single floating-point precision on the Xilinx Virtex-6 FPGA. The experimental results show that the proposed NoC-based implementation can achieve approximate 2.3 - 5.6 speed-up over the MATLAB-based software implementation in Matrix Market benchmark applications.

APA, Harvard, Vancouver, ISO, and other styles

31

Hsu, Wei-chun, and 徐偉郡. "Sparse Matrix-Vector Multiplication: A Low Communication Cost Data Mapping-Based Architecture." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/09761233547687794389.

Full text

Abstract:

碩士
國立臺灣科技大學
電子工程系
103
The performance of the sparse matrix-vector multiplication (SMVM) on a parallel system is strongly conditioned by the distribution of data among its components. Two costs arise as a result of the used data mapping method: arithmetic and communication. The communication cost of an algorithm often dominates the arithmetic cost, and the gap between these costs tends to increase. Therefore, finding a mapping method that reduces the communication cost is of high importance. On the other hand, the load distribution among the processing units must not be sacrificed as well. In this paper, a data mapping method is proposed for SMVM on Network-on-Chip (NoC) which achieves balanced working load and reduces the communication cost. Afterwards, an FPGA-based architecture is introduced which is designed to fit the proposed data mapping method. The experimental results show that the communication cost of the proposed design is 40\% lower than the previous work.

APA, Harvard, Vancouver, ISO, and other styles

32

TSAI, NIAN-YING, and 蔡念穎. "On Job Allocation Strategies for Running Sparse Matrix-Vector Multiplication on GPUs." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/mpwmh4.

Full text

Abstract:

碩士
國立彰化師範大學
資訊工程學系
105
In the era of big data, Graphic Processing Unit (GPU) has been widely used to deal with many parallelization problems as the amount of data to be processed. Sparse Matrix – Vector Multiplication is an important and basic operation in many fields, there are still many improved space for raising the performance of the GPU operation. This paper is mainly about job allocation strategies for running Sparse Matrix-Vector Multiplication on GPUs. The LightSpMV algorithm is based on the standard CSR format. The CSR format is a common sparse matrix storage format which is more flexible and better than other formats. The LightSpMV algorithm uses two dynamic configuration methods, whose matrix row is distributed to one for vector and the other for warp. Both of the methods use Atomic operations to get the Row index values. Because Atomic operations spent too much execution time, we proposed three strategies for this part of the workload allocation: (1) Using warp as the basic unit, through doubling the number of rows which have to be executed for each allocation, to make the number of Atomic operations reduced. (2) Using block as the basic unit, the number of rows are allocated dynamically. Compared to the dynamic configuration of using warp as basic unit, this strategy can reduce the number of Atomic operations. (3) Using block as the basic unit, the number of rows executed by blocks are static allocation. In each block we reuse warp as the basic unit and the warp are allocated dynamically instead of Atomic operations.After the implementation of our experiments in the work environment of the GTX980 GPU, the best performance is the third strategy and the performance improvement is nearly 100%.

APA, Harvard, Vancouver, ISO, and other styles

33

Mirza, Salma. "Scalable, Memory-Intensive Scientific Computing on Field Programmable Gate Arrays." 2010. https://scholarworks.umass.edu/theses/404.

Full text

Abstract:

Cache-based, general purpose CPUs perform at a small fraction of their maximum floating point performance when executing memory-intensive simulations, such as those required for many scientific computing problems. This is due to the memory bottleneck that is encountered with large arrays that must be stored in dynamic RAM. A system of FPGAs, with a large enough memory bandwidth, and clocked at only hundreds of MHz can outperform a CPU clocked at GHz in terms of floating point performance. An FPGA core designed for a target performance that does not unnecessarily exceed the memory imposed bottleneck can then be distributed, along with multiple memory interfaces, into a scalable architecture that overcomes the bandwidth limitation of a single interface. Interconnected cores can work together to solve a scientific computing problem and exploit a bandwidth that is the sum of the bandwidth available from all of their connected memory interfaces. The implementation demonstrates this concept of scalability with two memory interfaces through the use of available FPGA prototyping platforms. Even though the FPGAs operate at 133 MHz, which is twenty one times slower than an AMD Phenom X4 processor operating at 2.8 GHz, the system of two FPGAs performs eight times slower than the processor for the example problem of SMVM in heat transfer. However, the system is demonstrated to be scalable with a run-time that decreases linearly with respect to the available memory bandwidth. The floating point performance of a single board implementation is 12 GFlops which doubles to 24 GFlops for a two board implementation, for a gather or scatter operation on matrices of varying sizes.

APA, Harvard, Vancouver, ISO, and other styles

34

Batjargal, Delgerdalai, and 白德格. "Parallel Matrix Transposition and Vector Multiplication Using OpenMP." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/50550637149183575586.

Full text

Abstract:

碩士
靜宜大學
資訊工程學系
101
In this thesis, we propose two parallel algorithms for sparse matrix-transpose and vector multiplication using CSR (Compressed Sparse Row) format. Even though this storage format is simple and hence easy to understand and maintained, one of its limitation is difficult to parallelized, and a performance of a naïve parallel algorithm can be worst. But by preprocessing useful information that is hidden and indirect in its data structure during reading a matrix from a file, our algorithm of the matrix transposition can then be performed in parallel using OpenMP. Our codes are run on a quad-core Intel Xeon64 CPU E5507 platform. We measure, and compare the performance of our algorithms with that of using Compressed Sparse Block (CSB) format. Our experimental results show that our algorithms are comparable to the CSB based algorithm when the nonzero are scatter around the matrix and size of matrix is growing.

APA, Harvard, Vancouver, ISO, and other styles

35

deLorimier, Michael John. "Floating-Point Sparse Matrix-Vector Multiply for FPGAs." Thesis, 2005. https://thesis.library.caltech.edu/1776/1/smvm_thesis.pdf.

Full text

Abstract:

Large, high density FPGAs with high local distributed memory bandwidth surpass the peak floating-point performance of high-end, general-purpose processors. Microprocessors do not deliver near their peak floating-point performance on efficient algorithms that use the Sparse Matrix-Vector Multiply (SMVM) kernel. In fact, microprocessors rarely achieve 33% of their peak floating-point performance when computing SMVM. We develop and analyze a scalable SMVM implementation on modern FPGAs and show that it can sustain high throughput, near peak, floating-point performance. Our implementation consists of logic design as well as scheduling and data placement techniques. For benchmark matrices from the Matrix Market Suite we project 1.5 double precision Gflops/FPGA for a single VirtexII-6000-4 and 12 double precision Gflops for 16 Virtex IIs (750 Mflops/FPGA). We also analyze the asymptotic efficiency of our architecture as parallelism scales using a constant rent-parameter matrix model. This demonstrates that our data placement techniques provide an asymptotic scaling benefit.

While FPGA performance is attractive, higher performance is possible if we re-balance the hardware resources in FPGAs with embedded memories. We show that sacrificing half the logic area for memory area rarely degrades performance and improves performance for large matrices, by up to 5 times. We also 0 the performance effect of adding custom floating-point using a simple area model to preserve total chip area. Sacrificing logic for memory and custom floating-point units increases single FPGA performance to 5 double precision Gflops.

APA, Harvard, Vancouver, ISO, and other styles

36

Girosi, Federico. "An Equivalence Between Sparse Approximation and Support Vector Machines." 1997. http://hdl.handle.net/1721.1/7289.

Full text

Abstract:

In the first part of this paper we show a similarity between the principle of Structural Risk Minimization Principle (SRM) (Vapnik, 1982) and the idea of Sparse Approximation, as defined in (Chen, Donoho and Saunders, 1995) and Olshausen and Field (1996). Then we focus on two specific (approximate) implementations of SRM and Sparse Approximation, which have been used to solve the problem of function approximation. For SRM we consider the Support Vector Machine technique proposed by V. Vapnik and his team at AT&T Bell Labs, and for Sparse Approximation we consider a modification of the Basis Pursuit De-Noising algorithm proposed by Chen, Donoho and Saunders (1995). We show that, under certain conditions, these two techniques are equivalent: they give the same solution and they require the solution of the same quadratic programming problem.

APA, Harvard, Vancouver, ISO, and other styles

37

"Sparse learning under regularization framework." Thesis, 2011. http://library.cuhk.edu.hk/record=b6075111.

Full text

Abstract:

Regularization is a dominant theme in machine learning and statistics due to its prominent ability in providing an intuitive and principled tool for learning from high-dimensional data. As large-scale learning applications become popular, developing efficient algorithms and parsimonious models become promising and necessary for these applications. Aiming at solving large-scale learning problems, this thesis tackles the key research problems ranging from feature selection to learning with unlabeled data and learning data similarity representation. More specifically, we focus on the problems in three areas: online learning, semi-supervised learning, and multiple kernel learning.
The first part of this thesis develops a novel online learning framework to solve group lasso and multi-task feature selection. To the best our knowledge, the proposed online learning framework is the first framework for the corresponding models. The main advantages of the online learning algorithms are that (1) they can work on the applications where training data appear sequentially; consequently, the training procedure can be started at any time; (2) they can handle data up to any size with any number of features. The efficiency of the algorithms is attained because we derive closed-form solutions to update the weights of the corresponding models. At each iteration, the online learning algorithms just need O (d) time complexity and memory cost for group lasso, while they need O (d x Q) for multi-task feature selection, where d is the number of dimensions and Q is the number of tasks. Moreover, we provide theoretical analysis for the average regret of the online learning algorithms, which also guarantees the convergence rate of the algorithms. In addition, we extend the online learning framework to solve several related models which yield more sparse solutions.
The second part of this thesis addresses a general scenario of semi-supervised learning for the binary classification problern, where the unlabeled data may be a mixture of relevant and irrelevant data to the target binary classification task. Without specifying the relatedness in the unlabeled data, we develop a novel maximum margin classifier, named the tri-class support vector machine (3C-SVM), to seek an inductive rule that can separate these data into three categories: --1, +1, or 0. This is achieved by adopting a novel min loss function and following the maximum entropy principle. For the implementation, we approximate the problem and solve it by a standard concaveconvex procedure (CCCP). The approach is very efficient and it is possible to solve large-scale datasets.
The third part of this thesis focuses on multiple kernel learning (MKL) to solve the insufficiency of the L1-MKL and the Lp-MKL models. Hence, we propose a generalized MKL (GMKL) model by introducing an elastic net-type constraint on the kernel weights. More specifically, it is an MKL model with a constraint on a linear combination of the L1-norm and the square of the L2-norm on the kernel weights to seek the optimal kernel combination weights. Therefore, previous MKL problems based on the L1-norm or the L2-norm constraints can be regarded as its special cases. Moreover, our GMKL enjoys the favorable sparsity property on the solution and also facilitates the grouping effect. In addition, the optimization of our GMKL is a convex optimization problem, where a local solution is the globally optimal solution. We further derive the level method to efficiently solve the optimization problem.
Yang, Haiqin.
Advisers: Kuo Chin Irwin King; Michael Rung Tsong Iyu.
Source: Dissertation Abstracts International, Volume: 73-04, Section: B, page: .
Thesis (Ph.D.)--Chinese University of Hong Kong, 2011.
Includes bibliographical references (leaves 152-173).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.

APA, Harvard, Vancouver, ISO, and other styles

38

"Bayesian Framework for Sparse Vector Recovery and Parameter Bounds with Application to Compressive Sensing." Master's thesis, 2019. http://hdl.handle.net/2286/R.I.55639.

Full text

Abstract:

abstract: Signal compressed using classical compression methods can be acquired using brute force (i.e. searching for non-zero entries in component-wise). However, sparse solutions require combinatorial searches of high computations. In this thesis, instead, two Bayesian approaches are considered to recover a sparse vector from underdetermined noisy measurements. The first is constructed using a Bernoulli-Gaussian (BG) prior distribution and is assumed to be the true generative model. The second is constructed using a Gamma-Normal (GN) prior distribution and is, therefore, a different (i.e. misspecified) model. To estimate the posterior distribution for the correctly specified scenario, an algorithm based on generalized approximated message passing (GAMP) is constructed, while an algorithm based on sparse Bayesian learning (SBL) is used for the misspecified scenario. Recovering sparse signal using Bayesian framework is one class of algorithms to solve the sparse problem. All classes of algorithms aim to get around the high computations associated with the combinatorial searches. Compressive sensing (CS) is a widely-used terminology attributed to optimize the sparse problem and its applications. Applications such as magnetic resonance imaging (MRI), image acquisition in radar imaging, and facial recognition. In CS literature, the target vector can be recovered either by optimizing an objective function using point estimation, or recovering a distribution of the sparse vector using Bayesian estimation. Although Bayesian framework provides an extra degree of freedom to assume a distribution that is directly applicable to the problem of interest, it is hard to find a theoretical guarantee of convergence. This limitation has shifted some of researches to use a non-Bayesian framework. This thesis tries to close this gab by proposing a Bayesian framework with a suggested theoretical bound for the assumed, not necessarily correct, distribution. In the simulation study, a general lower Bayesian Cram\'er-Rao bound (BCRB) bound is extracted along with misspecified Bayesian Cram\'er-Rao bound (MBCRB) for GN model. Both bounds are validated using mean square error (MSE) performances of the aforementioned algorithms. Also, a quantification of the performance in terms of gains versus losses is introduced as one main finding of this report.
Dissertation/Thesis
Masters Thesis Computer Engineering 2019

APA, Harvard, Vancouver, ISO, and other styles

39

"Exploring the potential for accelerating sparse matrix-vector product on a Processing-in-Memory architecture." Thesis, 2009. http://hdl.handle.net/1911/61946.

Full text

Abstract:

As the importance of memory access delays on performance has mushroomed over the past few decades, researchers have begun exploring Processing-in-Memory (PIM) technology, which offers higher memory bandwidth, lower memory latency, and lower power consumption. In this study, we investigate whether an emerging PIM design from Sandia National Laboratories can boost performance for sparse matrix-vector product (SMVP). While SMVP is in the best-case bandwidth-bound, factors related to matrix structure and representation also limit performance. We analyze SMVP both in the context of an AMD Opteron processor and the Sandia PIM, exploring the performance limiters for each and the degree to which these can be ameliorated by data and code transformations. Over a range of sparse matrices, SMVP on the PIM outperformed the Opteron by a factor of 1.82. On the PIM, computational kernel and data structure transformations improved performance by almost 40% over conventional implementations using compressed-sparse row format.

APA, Harvard, Vancouver, ISO, and other styles

40

Zein, Ahmed H. El. "Use of graphics processing units for sparse matrix-vector products in statistical machine learning applications." Master's thesis, 2009. http://hdl.handle.net/1885/148368.

Full text

APA, Harvard, Vancouver, ISO, and other styles

41

Miron, David John. "The parallel solution of sparse linear least squares problems." Phd thesis, 1998. http://hdl.handle.net/1885/145188.

Full text

APA, Harvard, Vancouver, ISO, and other styles

42

Wannenwetsch, Katrin Ulrike. "Deterministic Sparse FFT Algorithms." Doctoral thesis, 2016. http://hdl.handle.net/11858/00-1735-0000-002B-7C10-0.

Full text

APA, Harvard, Vancouver, ISO, and other styles

43

Tiyyagura, Sunil Reddy [Verfasser]. "Efficient solution of sparse linear systems arising in engineering applications on vector hardware / vorgelegt von Sunil Reddy Tiyyagura." 2010. http://d-nb.info/1004991452/34.

Full text

APA, Harvard, Vancouver, ISO, and other styles

44

Gadekar, Ameet. "On Learning k-Parities and the Complexity of k-Vector-SUM." Thesis, 2016. http://hdl.handle.net/2005/3060.

Full text

Abstract:

In this work, we study two problems: first is one of the central problem in learning theory of learning sparse parities and the other k-Vector-SUM is an extension of the not oriousk-SUM problem. We first consider the problem of learning k-parities in the on-line mistake-bound model: given a hidden vector ∈ {0,1}nwith|x|=kand a sequence of “questions” a ,a ,12··· ∈{0,1}n, where the algorithm must reply to each question with〈a ,xi〉(mod 2), what is the best trade off between the number of mistakes made by the algorithm and its time complexity? We improve the previous best result of Buhrman et. al. By an exp (k) factor in the timecomplexity. Next, we consider the problem of learning k-parities in the presence of classification noise of rate η∈(0,12). A polynomial time algorithm for this problem (whenη >0 andk=ω(1))is a longstanding challenge in learning theory. Grigorescu et al. Showed an algorithm running in time(no/2)1+4η2+o(1). Note that this algorithm inherently requires time(nk/2)even when the noise rateη is polynomially small. We observe that for sufficiently small noise rate, it ispossible to break the(nk/2)barrier. In particular, if for some function f(n) =ω(logn) andα∈[12,1),k=n/f(n) andη=o(f(n)−α/logn), then there is an algorithm for the problem with running time poly(n)·( )nk1−α·e−k/4.01.Moving on to the k-Vector-SUM problem, where given n vectors〈v ,v ,...,v12n〉over the vector space Fdq, a target vector tand an integer k>1, determine whether there exists k vectors whose sum list, where sum is over the field Fq. We show a parameterized reduction fromk-Clique problem to k-Vector-SUM problem, thus showing the hardness of k-Vector-SUM. In parameterized complexity theory, our reduction shows that the k-Vector-SUM problem is hard for the class W[1], although, Downey and Fellows have shown the W[1]-hardness result for k-Vector-SUM using other techniques. In our next attempt, we try to show connections between k-Vector-SUM and k-LPN. First we prove that, a variant of k-Vector-SUM problem, called k-Noisy-SUM is at least as hard as the k-LPN problem. This implies that any improvements ink-Noisy-SUM would result into improved algorithms fork-LPN. In our next result, we show a reverse reduction from k-Vector-SUM to k-LPN with high noise rate. Providing lower bounds fork-LPN problem is an open challenge and many algorithms in cryptography have been developed assuming its1 2hardness. Our reduction shows that k-LPN with noise rate η=12−12·nk·2−n(k−1k)cannot be solved in time no(k)assuming Exponential Time Hypothesis and, thus providing lower bound for a weaker version of k-LPN

APA, Harvard, Vancouver, ISO, and other styles

45

(6642491), Jingzhao Dai. "SPARSE DISCRETE WAVELET DECOMPOSITION AND FILTER BANK TECHNIQUES FOR SPEECH RECOGNITION." Thesis, 2019.

Find full text

Abstract:

Speech recognition is widely applied to translation from speech to related text, voice driven commands, human machine interface and so on [1]-[8]. It has been increasingly proliferated to Human’s lives in the modern age. To improve the accuracy of speech recognition, various algorithms such as artificial neural network, hidden Markov model and so on have been developed [1], [2].

In this thesis work, the tasks of speech recognition with various classifiers are investigated. The classifiers employed include the support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF) and convolutional neural network (CNN). Two novel features extraction methods of sparse discrete wavelet decomposition (SDWD) and bandpass filtering (BPF) based on the Mel filter banks [9] are developed and proposed. In order to meet diversity of classification algorithms, one-dimensional (1D) and two-dimensional (2D) features are required to be obtained. The 1D features are the array of power coefficients in frequency bands, which are dedicated for training SVM, KNN and RF classifiers while the 2D features are formed both in frequency domain and temporal variations. In fact, the 2D feature consists of the power values in decomposed bands versus consecutive speech frames. Most importantly, the 2D feature with geometric transformation are adopted to train CNN.

Speech recognition including males and females are from the recorded data set as well as the standard data set. Firstly, the recordings with little noise and clear pronunciation are applied with the proposed feature extraction methods. After many trials and experiments using this dataset, a high recognition accuracy is achieved. Then, these feature extraction methods are further applied to the standard recordings having random characteristics with ambient noise and unclear pronunciation. Many experiment results validate the effectiveness of the proposed feature extraction techniques.

APA, Harvard, Vancouver, ISO, and other styles

46

Rossi, Stefano, and Piero Tortoli. "Development and validation of novel approaches for real-time ultrasound vector velocity measurements." Doctoral thesis, 2021. http://hdl.handle.net/2158/1239650.

Full text

Abstract:

Ultrasound imaging techniques have become increasingly successful in the medical field as they provide relatively low cost and totally safe diagnosis. Doppler methods focus on blood flow for the diagnosis and follow-up of cardiovascular diseases. First Doppler methods only measured the axial component of the motion. More recently, advanced methods have solved this problem, by estimating two or even all three velocity components. In this context, high frame rate (HFR) imaging techniques, based on the transmission of plane waves (PW), lead to the reconstruction of 2-D and 3-D vector maps of blood velocity distribution. The aim of this Ph.D. project was to develop novel acquisition schemes and processing methods for advanced ultrasound Doppler systems. Each development step was based on simulations and experimental tests. Simulations were based on Field II©, while experiments were conducted by using the ULA OP 256 open scanner. In particular, the recently proposed 2-D HFR vector flow imaging (VFI) method (DOI: 10.1109/TUFFC.2014.3064), based on the frequency domain for displacement estimation, was thoroughly investigated. Three main issues were addressed: the high underestimation of blood flow velocity observed when examining vessels at great depths, the high computational load, which hindered any real-time implementation and the lack of information about the third velocity component. Specifically, the progressive broadening of the transmitted beam on the elevation plane due to the acoustic lens was demonstrated to be responsible for the underestimation. The computational cost was reduced by processing demodulated and down-sampled baseband data instead of radiofrequency data, and a preliminary real time version of the 2-D VFI method was implemented. It was also found that a more efficient implementation could be obtained by exploiting parallel computing and graphic processing units (GPUs). An expansion circuit board for the ULA-OP 256 hardware, which allocates GPU resources, was thus designed and built. This new system architecture may allow the implementation of even more complex algorithms, such as the 3-D VFI methods. In particular, it will be possible to implement the novel method for 3D VFI that was developed and tested during this Ph.D. project. Such method suitably extended the 2D VFI approach by proposing an efficient estimation strategy that considerably limits the overall computational load.

APA, Harvard, Vancouver, ISO, and other styles

47

Παπαδήμα, Ελισσάβετ. "Πειραματική αξιολόγηση μεθοδολογίας βελτιστοποίησης του αλγόριθμου πολλαπλασιασμού πίνακα επί διάνυσμα σε μονοπύρηνες και πολυπύρηνες αρχιτεκτονικές." Thesis, 2013. http://hdl.handle.net/10889/7283.

Full text

Abstract:

Στην παρούσα διπλωματική εργασία έγινε υλοποίηση και πειραματική αξιολόγηση μιας μεθοδολογίας η οποία έχει αναπτυχθεί στο Εργαστήριο Ολοκληρωμένων Κυκλωμάτων και αφορά τη βελτιστοποίηση του Πολλαπλασιασμού Πίνακα επί Διάνυσμα (ΠΠΔ) σε μονοπύρηνους και πολυπύρηνους επεξεργαστές. Η μεθοδολογία εκμεταλλεύεται το σύνολο των χαρακτηριστικών της αρχιτεκτονικής που χρησιμοποιείται και συγκεκριμένα (α) την ιεραρχία της μνήμης, (β) το μέγεθος της κρυφής μνήμης, (γ) το βαθμό συσχέτισης κάθε επιπέδου της κρυφής μνήμης, (δ) την καθυστέρηση της μνήμης και (ε) το πλήθος των πυρήνων. Είναι η πρώτη φορά που λαμβάνεται υπόψη ο βαθμός συσχέτισης της μνήμης. Σκοπός της μεθοδολογίας είναι η βελτιστοποίηση με βάση όλες τις παραμέτρους μαζί και όχι καθεμία ξεχωριστά. Για να βελτιωθεί η απόδοση προτείνεται διαφορετικός χρονοπρογραμματισμός ανάλογα με το μέγεθος του πίνακα. Για την πειραματική αξιολόγηση χρησιμοποιήθηκαν οι επεξεργαστές γενικού σκοπού Intel Core 2 Duo E6065 και Τ6600, ο Intel i7-3930K και ο ενσωματωμένος επεξεργαστής ειδικού σκοπού Microblaze από το Virtex-5 FPGA (Xilinx). Τα αποτελέσματα συγκρίνονται με την state-of-the-art βιβλιοθήκη ATLAS (Automatically Tuned Linear Algebra Software) και παρουσιάζουν βελτίωση 30%. Από τα πειραματικά αποτελέσματα είναι φανερό ότι η κύρια μνήμη είναι το bottleneck του προβλήματος. Επίσης, η απόδοση βελτιώνεται όταν αλλάζει το layout του πίνακα τόσο σε μονοπύρηνες όσο σε πολυπύρηνες αρχιτεκτονικές. Όσον αφορά τη μέθοδο του tiling τα πειραματικά αποτελέσματα δείχνουν ότι η μείωση των αστοχιών δεν βελτιώνει πάντα την απόδοση γιατί υπάρχει trade-off ανάμεσα στο μέγεθος του tile και στις εντολές διευθυνσιοδότησης. Επίσης, είναι φανερό ότι στις πολυπύρηνες αρχιτεκτονικές δεν υπάρχει γραμμική σχέση της απόδοσης και του πλήθους των πυρήνων που χρησιμοποιούνται. Αυτό οφείλεται στο περιορισμένο εύρος ζώνης της μνήμης.
The subject of this MSc Thesis is the implementation and the experimental evaluation of a methodology that has been developed at the Laboratory of Integrated Circuits and optimizes the Matrix Vector Multiplication (MVM) in single-core and multi-core processors. The methodology fully exploits the characteristics of the architecture. Specifically, it exploits (a) the hierarchy of the memory, (b) the cache size, (c) the cache associativity, (d) the memory latency and (e) the number of the cores. It is the first time that the cache associativity is taken into account. The methodology optimizes all the parameters together as one problem and not separately. A different scheduling is proposed according to the size of the matrix. The general purpose processors Intel Core 2 Duo E6065, Intel Core 2 Duo T6600 and Intel i7-3930K and the embedded processor Virtex-5 Microblaze have been used. The results have been compared with the state-of-the-art library ATLAS (Automatically Tuned Linear Algebra Software) and the performance is improved by 30%. According to the experimental results, it is obvious that the bottleneck is the memory latency. Moreover, the performance is increased when a new way of saving the matrix in the main memory (data array layout) is used in both single-core and multi-core architectures. As far as the tiling is concerned, the experimental results indicate that the decrease of the misses does not always improve the performance because there is a trade-off between the tile size and the addressing instructions. According to the experimental results, as far as multicore architectures are concerned, there is no linear relation between the performance and the number of the cores, because of the limited memory bandwidth.

APA, Harvard, Vancouver, ISO, and other styles

48

Bapat, Tanuja. "Sparse Multiclass And Multi-Label Classifier Design For Faster Inference." Thesis, 2011. http://etd.iisc.ernet.in/handle/2005/2065.

Full text

Abstract:

Many real-world problems like hand-written digit recognition or semantic scene classiﬁcation are treated as multiclass or multi-label classiﬁcation prob-lems. Solutions to these problems using support vector machines (SVMs) are well studied in literature. In this work, we focus on building sparse max-margin classiﬁers for multiclass and multi-label classiﬁcation. Sparse representation of the resulting classiﬁer is important both from eﬃcient training and fast inference viewpoints. This is true especially when the training and test set sizes are large.Very few of the existing multiclass and multi-label classiﬁcation algorithms have given importance to controlling the sparsity of the designed classiﬁers directly. Further, these algorithms were not found to be scalable. Motivated by this, we propose new formulations for sparse multiclass and multi-label classiﬁer design and also give eﬃcient algorithms to solve them. The formulation for sparse multi-label classiﬁcation also incorporates the prior knowledge of label correlations. In both the cases, the classiﬁcation model is designed using a common set of basis vectors across all the classes. These basis vectors are greedily added to an initially empty model, to approximate the target function. The sparsity of the classiﬁer can be controlled by a user deﬁned parameter, dmax which indicates the max-imum number of common basis vectors. The computational complexity of these algorithms for multiclass and multi-label classiﬁer designisO(lk2d2 max), Where l is the number of training set examples and k is the number of classes. The inference time for the proposed multiclass and multi-label classiﬁers is O(kdmax). Numerical experiments on various real-world benchmark datasets demonstrate that the proposed algorithms result in sparse classiﬁers that require lesser number of basis vectors than required by state-of-the-art algorithms, to attain the same generalization performance. Very small value of dmax results in signiﬁcant reduction in inference time. Thus, the proposed algorithms provide useful alternatives to the existing algorithms for sparse multiclass and multi-label classiﬁer design.

APA, Harvard, Vancouver, ISO, and other styles

49

Heinemeyer, Eric. "Integral Equation Methods for Rough Surface Scattering Problems in three Dimensions." Doctoral thesis, 2008. http://hdl.handle.net/11858/00-1735-0000-000D-F15F-2.

Full text

APA, Harvard, Vancouver, ISO, and other styles

50

Balamurugan, P. "Efficient Algorithms for Structured Output Learning." Thesis, 2014. http://etd.iisc.ernet.in/2005/3488.

Full text

Abstract:

Structured output learning is the machine learning task of building a classiﬁer to predict structured outputs. Structured outputs arise in several contexts in diverse applications like natural language processing, computer vision, bioinformatics and social networks. Unlike the simple two(or multi)-class outputs which belong to a set of distinct or univariate categories, structured outputs are composed of multiple components with complex interdependencies amongst them. As an illustrative example ,consider the natural language processing task of tagging a sentence with its corresponding part-of-speech tags. The part-of-speech tag sequence is an example of a structured output as it is made up of multiple components, the interactions among them being governed by the underlying properties of the language. This thesis provides eﬃcient solutions for diﬀerent problems pertaining to structured output learning. The classiﬁer for structured outputs is generally built by learning a suitable model from a set of training examples labeled with their associated structured outputs. Discriminative techniques like Structural Support Vector Machines(Structural SVMs) and Conditional Random Fields(CRFs) are popular alternatives developed for structured output learning. The thesis contributes towards developing eﬃcient training strategies for structural SVMs. In particular, an eﬃcient sequential optimization method is proposed for structural SVMs, which is faster than several competing methods. An extension of the sequential method to CRFs is also developed. The sequential method is adapted to a variant of structural SVM with linear cumulative loss. The thesis also presents a systematic empirical evaluation of various training methods available for structured output learning, which will be useful to the practitioner. To train structural SVMs in the presence of a vast number of training examples without labels, the thesis develops a simple semi-supervised technique based on switching the labels of the components of the structured output. The proposed technique is general and its eﬃcacy is demonstrated using experiments on diﬀerent benchmark applications. Another contribution of the thesis is towards the design of fast algorithms for sparse structured output learning. Eﬃcient alternating optimization algorithms are developed for sparse classiﬁer design. These algorithms are shown to achieve sparse models faster, when compared to existing methods.

APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic 'Sparse Vector Vector Multiplication'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles