
Journal articles on the topic 'Algorithmes GPU'


Consult the top 50 journal articles for your research on the topic 'Algorithmes GPU.'


Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Boulay, Thomas, Nicolas Gac, Ali Mohammad-Djafari, and Julien Lagoutte. "Algorithmes de reconnaissance NCTR et parallélisation sur GPU." Traitement du signal 30, no. 6 (April 28, 2013): 309–42. http://dx.doi.org/10.3166/ts.30.309-342.

2

Rios-Willars, Ernesto, Jennifer Velez-Segura, and María Magdalena Delabra-Salinas. "Enhancing Multiple Sequence Alignment with Genetic Algorithms: A Bioinformatics Approach in Biomedical Engineering." Revista Mexicana de Ingeniería Biomédica 45, no. 2 (May 1, 2024): 62–77. http://dx.doi.org/10.17488/rmib.45.2.4.

Abstract:
This study aimed to create a genetic information processing technique for the problem of multiple alignment of genetic sequences in bioinformatics. The objective was to take advantage of the computer hardware's capabilities and to analyze the results obtained in terms of quality, processing time, and the number of function evaluations. The methodology was based on developing a genetic algorithm in Java, which resulted in four different versions: Gp1, Gp2, Gp3, and Gp4. A set of genetic sequences was processed, and the results were evaluated by analyzing numerical behavior profiles. The research found that algorithms that maintained diversity in the population produced better-quality solutions, and that parallel processing reduced processing time; the generated performance profiles confirm the decrease in the time required to perform the process. The study concluded that conventional computer equipment can produce excellent results when processing genetic information if algorithms are optimized to exploit hardware resources, and that the computational effort of the hardware used is directly related to the number of function evaluations. Additionally, the comparison method based on performance profiles is highlighted as a strategy for comparing algorithm results across different metrics of interest, which can guide the development of more efficient genetic information processing techniques.
3

SOMAN, JYOTHISH, KISHORE KOTHAPALLI, and P. J. NARAYANAN. "SOME GPU ALGORITHMS FOR GRAPH CONNECTED COMPONENTS AND SPANNING TREE." Parallel Processing Letters 20, no. 04 (December 2010): 325–39. http://dx.doi.org/10.1142/s0129626410000272.

Abstract:
Graphics Processing Units (GPUs) are application-specific accelerators that provide a high performance-to-cost ratio and are widely available and used, which makes them ubiquitous accelerators. The computing paradigm based on them is the general-purpose computing on the GPU (GPGPU) model. Due to its graphics lineage, the GPU is better suited to data-parallel, data-regular algorithms; its hardware architecture is less suitable for data-parallel but data-irregular algorithms such as graph connected components and list ranking. In this paper, we present results that show how to use GPUs efficiently for graph algorithms, which are known to have irregular data access patterns. We consider two fundamental graph problems: finding the connected components and finding a spanning tree, both of which find applications in several graph-theoretical problems. We arrive at efficient GPU implementations for these two problems, with algorithms that focus on minimizing irregularity at both the algorithmic and implementation levels. Our implementation achieves a speedup of 11-16 times over the corresponding best sequential implementation.
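The label-equivalence idea underlying such algorithms is compact enough to sketch. Below is a minimal CUDA illustration of one common GPU-friendly approach, iterative hooking plus pointer jumping over an edge list; it mirrors the irregular access pattern the paper targets but is not Soman et al.'s exact implementation, and all names are illustrative.

```cuda
// Minimal sketch: connected components by iterative hooking and pointer
// jumping over an edge list. Labels start as the vertex ids; hooking always
// points a larger label at a smaller one, so the label forest stays acyclic.
#include <cuda_runtime.h>

// One thread per edge: merge the two endpoint labels toward the smaller one.
__global__ void hook(const int* src, const int* dst, int* label,
                     int numEdges, int* changed) {
    int e = blockIdx.x * blockDim.x + threadIdx.x;
    if (e >= numEdges) return;
    int u = label[src[e]], v = label[dst[e]];
    if (u == v) return;
    label[max(u, v)] = min(u, v);  // benign race: any smaller winner is valid
    *changed = 1;
}

// One thread per vertex: flatten label chains by pointer jumping.
__global__ void compress(int* label, int numVertices) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= numVertices) return;
    while (label[v] != label[label[v]])
        label[v] = label[label[v]];
}

// Host side: alternate hook and compress until `changed` stays 0; two
// vertices then share a label iff they lie in the same component.
```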
4

Schnös, Florian, Dirk Hartmann, Birgit Obst, and Glenn Glashagen. "GPU accelerated voxel-based machining simulation." International Journal of Advanced Manufacturing Technology 115, no. 1-2 (May 8, 2021): 275–89. http://dx.doi.org/10.1007/s00170-021-07001-w.

Abstract:
The simulation of subtractive manufacturing processes has a long history in engineering. The corresponding predictions are used for planning, validation, and optimization, e.g., of CNC-machining processes. With the rise of flexible robotic machining and advances in computational and algorithmic capability, simulating the coupled machine-process behaviour of complex machining processes and large workpieces is within reach. These simulations require fast material removal predictions and analysis at high spatial resolution for multi-axis operations. Within this contribution, we propose to leverage voxel-based concepts introduced in the computer graphics industry to accelerate material removal simulations. The corresponding schemes are well suited to massive parallelization. By leveraging the computational power offered by modern graphics hardware, the performance of high-spatial-accuracy volumetric voxel-based algorithms is further improved, now allowing very fast and accurate volume removal simulation and analysis of machining processes. The paper provides a detailed description of the data structures and algorithms along with a detailed benchmark of common machining operations.
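As a rough illustration of why voxel-based material removal maps so well to the GPU, the sketch below assigns one CUDA thread per voxel and clears every voxel covered by the current tool position; the dense grid and spherical tool are simplifying assumptions, not the paper's (more elaborate) data structures.

```cuda
// Sketch: one thread per voxel tests whether the voxel center lies inside
// the tool (idealized as a sphere) and clears its occupancy bit. A dense
// grid and the sphere tool are simplifying assumptions for illustration.
#include <cuda_runtime.h>

__global__ void removeMaterial(unsigned char* occupancy,
                               int nx, int ny, int nz, float voxelSize,
                               float3 toolCenter, float toolRadius) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= nx * ny * nz) return;
    int x = idx % nx, y = (idx / nx) % ny, z = idx / (nx * ny);
    float dx = (x + 0.5f) * voxelSize - toolCenter.x;
    float dy = (y + 0.5f) * voxelSize - toolCenter.y;
    float dz = (z + 0.5f) * voxelSize - toolCenter.z;
    if (dx * dx + dy * dy + dz * dz <= toolRadius * toolRadius)
        occupancy[idx] = 0;  // voxel removed by the cutting tool
}
```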
5

Zatolokin, Y. A., E. I. Vatutin, and V. S. Titov. "ALGORITHMIC OPTIMIZATION OF SOFTWARE IMPLEMENTATION OF ALGORITHMS FOR MULTIPLYING DENSE REAL MATRICES ON GRAPHICS PROCESSORS WITH OPENGL TECHNOLOGY SUPPORT." Proceedings of the Southwest State University 21, no. 5 (October 28, 2017): 6–15. http://dx.doi.org/10.21869/2223-1560-2017-21-5-06-15.

Abstract:
The article states the problem of matrix multiplication. It is shown that the problem is simply formulated, but solving it efficiently may require both heuristic methods and a set of modifications involving algorithmic and high-level software optimization that take the particulars of the problem into account and increase multiplication performance. These include a comparative analysis of the performance achieved with and without GPU-specific optimizations, which showed that computations that do not optimize their use of global GPU memory have low processing performance, whereas optimizing the distribution of data across the GPU's global and local memory reduces calculation time and increases real performance. To compare the performance of the developed software implementations based on OpenGL and CUDA, identical calculations were performed on identical GPUs, showing higher real performance with CUDA cores. Specific performance values measured for the multi-threaded GPU implementation are given for all of the described optimizations. It is shown that the most effective approach is caching sub-blocks of the matrices (tiles) in the GPU's on-chip local memory, which, in a specialized software implementation, provides a performance of 275.3 GFLOP/s on a GeForce GTX 960M GPU.
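The tiling optimization credited with the best performance is the standard shared-memory scheme; a minimal CUDA version is sketched below (the paper's own implementation is OpenGL-based, so this only illustrates the technique). It assumes square n x n matrices with n divisible by TILE and a matching grid of thread blocks.

```cuda
// Sketch of shared-memory tiling: each block stages TILE x TILE sub-blocks
// of A and B on chip so each global-memory element is loaded once per tile
// pass instead of TILE times. Assumes square n x n matrices, n % TILE == 0,
// and an (n/TILE, n/TILE) grid of (TILE, TILE) blocks.
#include <cuda_runtime.h>

#define TILE 16

__global__ void matmulTiled(const float* A, const float* B, float* C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < n / TILE; ++t) {
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * n + col] = acc;
}
```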
6

MERRILL, DUANE, and ANDREW GRIMSHAW. "HIGH PERFORMANCE AND SCALABLE RADIX SORTING: A CASE STUDY OF IMPLEMENTING DYNAMIC PARALLELISM FOR GPU COMPUTING." Parallel Processing Letters 21, no. 02 (June 2011): 245–72. http://dx.doi.org/10.1142/s0129626411000187.

Abstract:
The need to rank and order data is pervasive, and many algorithms are fundamentally dependent upon sorting and partitioning operations. Prior to this work, GPU stream processors have been perceived as challenging targets for problems with dynamic and global data-dependences such as sorting. This paper presents: (1) a family of very efficient parallel algorithms for radix sorting; and (2) our allocation-oriented algorithmic design strategies that match the strengths of GPU processor architecture to this genre of dynamic parallelism. We demonstrate multiple factors of speedup (up to 3.8x) compared to state-of-the-art GPU sorting. We also reverse the performance differentials observed between GPU and multi/many-core CPU architectures by recent comparisons in the literature, including those with 32-core CPU-based accelerators. Our average sorting rates exceed 1B 32-bit keys/sec on a single GPU microprocessor. Our sorting passes are constructed from a very efficient parallel prefix scan "runtime" that incorporates three design features: (1) kernel fusion for locally generating and consuming prefix scan data; (2) multi-scan for performing multiple related, concurrent prefix scans (one for each partitioning bin); and (3) flexible algorithm serialization for avoiding unnecessary synchronization and communication within algorithmic phases, allowing us to construct a single implementation that scales well across all generations and configurations of programmable NVIDIA GPUs.
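For readers unfamiliar with the primitive, the sketch below shows a simple single-block Hillis-Steele inclusive scan in CUDA; Merrill and Grimshaw's scan "runtime" is far more sophisticated (kernel fusion, multi-scan, algorithm serialization), so this is only the textbook building block, with illustrative names.

```cuda
// Textbook single-block Hillis-Steele inclusive scan: simple, though not
// work-efficient. Each round adds the value `offset` positions to the left.
#include <cuda_runtime.h>

__global__ void inclusiveScanBlock(const int* in, int* out, int n) {
    extern __shared__ int buf[];
    int tid = threadIdx.x;
    buf[tid] = (tid < n) ? in[tid] : 0;
    __syncthreads();
    for (int offset = 1; offset < blockDim.x; offset *= 2) {
        int val = (tid >= offset) ? buf[tid - offset] : 0;
        __syncthreads();           // read everything before overwriting
        buf[tid] += val;
        __syncthreads();
    }
    if (tid < n) out[tid] = buf[tid];
}
```

A launch such as inclusiveScanBlock<<<1, 256, 256 * sizeof(int)>>>(d_in, d_out, n) handles one block's worth of keys; larger inputs additionally require scanning the per-block totals and adding them back.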
7

Gremse, Felix, Andreas Höfter, Lukas Razik, Fabian Kiessling, and Uwe Naumann. "GPU-accelerated adjoint algorithmic differentiation." Computer Physics Communications 200 (March 2016): 300–311. http://dx.doi.org/10.1016/j.cpc.2015.10.027.

8

Rapaport, D. C. "GPU molecular dynamics: Algorithms and performance." Journal of Physics: Conference Series 2241, no. 1 (March 1, 2022): 012007. http://dx.doi.org/10.1088/1742-6596/2241/1/012007.

Abstract:
A previous study of MD algorithms designed for GPU use is extended to cover more recent developments in GPU architecture. Algorithm modifications are described, together with extensions to more complex systems. New measurements include the effects of increased parallelism on GPU performance, as well as comparisons with multiple-core CPUs using multitasking based on CPU threads and message passing. The results show that the GPU retains a significant performance advantage.
9

Mikhayluk, M. V., and A. M. Trushin. "Spheres Collision Detection Algorithms on GPU." PROGRAMMNAYA INGENERIA 8, no. 8 (August 15, 2017): 354–58. http://dx.doi.org/10.17587/prin.8.354-358.

10

Matei, Adrian, Cristian Lupașcu, and Ion Bica. "On GPU Implementations of Encryption Algorithms." Journal of Military Technology 2, no. 2 (December 18, 2019): 29–34. http://dx.doi.org/10.32754/jmt.2019.2.04.

11

Zhang, Xingyi, Bangju Wang, Zhuanlian Ding, Jin Tang, and Juanjuan He. "Implementation of Membrane Algorithms on GPU." Journal of Applied Mathematics 2014 (2014): 1–7. http://dx.doi.org/10.1155/2014/307617.

Abstract:
Membrane algorithms are a new class of parallel algorithms that attempt to incorporate components of membrane computing models, such as the structure of the models and the way cells communicate, into the design of efficient optimization algorithms. Although the importance of the parallelism of such algorithms is well recognized, membrane algorithms have usually been implemented on a serial computing device, the central processing unit (CPU), which prevents them from working efficiently. In this work, we consider the implementation of membrane algorithms on a parallel computing device, the graphics processing unit (GPU), where all cells of a membrane algorithm can work simultaneously. Experimental results on two classical intractable problems, the point set matching problem and the TSP, show that the GPU implementation of membrane algorithms is much more efficient than the CPU implementation in terms of runtime, especially for problems of high complexity.
12

Singh, Dhirendra Pratap, Ishan Joshi, and Jaytrilok Choudhary. "Survey of GPU Based Sorting Algorithms." International Journal of Parallel Programming 46, no. 6 (April 11, 2017): 1017–34. http://dx.doi.org/10.1007/s10766-017-0502-5.

13

Ngo, Long Thanh, Dzung Dinh Nguyen, Long The Pham, and Cuong Manh Luong. "Speedup of Interval Type 2 Fuzzy Logic Systems Based on GPU for Robot Navigation." Advances in Fuzzy Systems 2012 (2012): 1–11. http://dx.doi.org/10.1155/2012/698062.

Abstract:
As the number of rules and the sample rate of type-2 fuzzy logic systems (T2FLSs) increase, the speed of calculation becomes a problem. The T2FLS has a large amount of inherent algorithmic parallelism that modern CPU architectures do not exploit: many rules and algorithms can be sped up on a graphics processing unit (GPU) as long as the majority of computations at the various stages and components do not depend on each other. This paper demonstrates how to implement interval type-2 fuzzy logic systems (IT2-FLSs) on the GPU, with experiments on the obstacle avoidance behavior of robot navigation. GPU-based calculation is a high-performance solution that also frees up the CPU. The experimental results show that the GPU performs many times faster than the CPU.
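To make the rule-level parallelism concrete, here is a hedged CUDA sketch, not the authors' system: each thread computes the lower and upper firing grades of one rule for a crisp input, using Gaussian membership functions with an uncertain mean in [m1, m2]; the data layout and names are assumptions.

```cuda
// Hedged sketch, not the authors' system: one thread per rule computes the
// lower/upper firing grades of an interval type-2 Gaussian membership
// function with uncertain mean in [m1[r], m2[r]] for a crisp input x.
#include <cuda_runtime.h>

__global__ void fireRules(const float* m1, const float* m2, const float* sigma,
                          float x, float* lower, float* upper, int numRules) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= numRules) return;
    float u1 = (x - m1[r]) / sigma[r];
    float u2 = (x - m2[r]) / sigma[r];
    float g1 = expf(-0.5f * u1 * u1);
    float g2 = expf(-0.5f * u2 * u2);
    // Upper membership: 1 between the two means, else the nearer Gaussian.
    upper[r] = (x >= m1[r] && x <= m2[r]) ? 1.0f : fmaxf(g1, g2);
    lower[r] = fminf(g1, g2);
}
```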
14

Lee, Taekhee, and Young J. Kim. "Massively parallel motion planning algorithms under uncertainty using POMDP." International Journal of Robotics Research 35, no. 8 (August 21, 2015): 928–42. http://dx.doi.org/10.1177/0278364915594856.

Abstract:
We present new parallel algorithms that solve continuous-state partially observable Markov decision process (POMDP) problems using the GPU (gPOMDP) and a hybrid of the GPU and CPU (hPOMDP). We choose the Monte Carlo value iteration (MCVI) method as our base algorithm and parallelize this algorithm using the multi-level parallel formulation of MCVI. For each parallel level, we propose efficient algorithms to utilize the massive data parallelism available on modern GPUs. Our GPU-based method uses two workload distribution techniques, compute/data interleaving and workload balancing, in order to obtain the maximum parallel performance at the highest level. We also present a CPU-GPU hybrid method that takes advantage of both CPU and GPU parallelism in order to solve highly complex POMDP planning problems. The CPU is responsible for data preparation, while the GPU performs Monte Carlo simulations; these operations are performed concurrently using the compute/data overlap technique between the CPU and GPU. To the best of the authors' knowledge, our algorithms are the first parallel algorithms that efficiently execute POMDP in a massively parallel fashion utilizing the GPU or a hybrid of the GPU and CPU. Our algorithms outperform the existing CPU-based algorithm by a factor of 75-99 based on the chosen benchmark.
15

Andrzejewski, Witold, Artur Gramacki, and Jarosław Gramacki. "Graphics processing units in acceleration of bandwidth selection for kernel density estimation." International Journal of Applied Mathematics and Computer Science 23, no. 4 (December 1, 2013): 869–85. http://dx.doi.org/10.2478/amcs-2013-0065.

Abstract:
The Probability Density Function (PDF) is a key concept in statistics. Constructing the most adequate PDF from observed data is still an important and interesting scientific problem, especially for large datasets. PDFs are often estimated using nonparametric data-driven methods, and one of the most popular nonparametric methods is the Kernel Density Estimator (KDE). However, a very serious drawback of using KDEs is the large number of calculations required to compute them, especially to find the optimal bandwidth parameter. In this paper we investigate the possibility of utilizing Graphics Processing Units (GPUs) to accelerate the search for the bandwidth. The contribution of this paper is threefold: (a) we propose an algorithmic optimization of one of the bandwidth-finding algorithms, (b) we propose efficient GPU versions of three bandwidth-finding algorithms, and (c) we experimentally compare three of our GPU implementations with ones that utilize only CPUs. Our experiments show orders-of-magnitude improvements over CPU implementations of classical algorithms.
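The computational bottleneck is easy to see from the estimator itself: evaluating a one-dimensional Gaussian KDE at m query points over n samples costs O(nm) per candidate bandwidth h. A minimal CUDA sketch of that evaluation, with illustrative names rather than the paper's optimized code, follows.

```cuda
// Sketch of the O(n*m) evaluation at the heart of bandwidth selection:
// one thread per query point sums Gaussian kernels over all n samples for
// a candidate bandwidth h. Names are illustrative, not the paper's code.
#include <cuda_runtime.h>

__global__ void kdeEvaluate(const float* samples, int n,
                            const float* queries, float* density,
                            int m, float h) {
    int q = blockIdx.x * blockDim.x + threadIdx.x;
    if (q >= m) return;
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        float u = (queries[q] - samples[i]) / h;
        sum += expf(-0.5f * u * u);
    }
    density[q] = sum / (2.5066283f * h * n);  // 2.5066... = sqrt(2*pi)
}
```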
16

Lin, Chun-Yuan, Wei Sheng Lee, and Chuan Yi Tang. "Parallel Shellsort Algorithm for Many-Core GPUs with CUDA." International Journal of Grid and High Performance Computing 4, no. 2 (April 2012): 1–16. http://dx.doi.org/10.4018/jghpc.2012040101.

Abstract:
Sorting is a classic algorithmic problem whose importance has led to the design and implementation of various sorting algorithms on many-core graphics processing units (GPUs). CUDPP radix sort is the most efficient sort on GPUs and GPU sample sort is the best comparison-based sort. Although the implementations of these algorithms are efficient, they either need extra space for data rearrangement or rely on atomic operations for acceleration. Sorting applications usually deal with large amounts of data, so memory utilization is an important consideration; moreover, sorting algorithms that depend on atomic operations can suffer performance degradation, or fail to work, on GPUs without atomic operation support. In this paper, an efficient implementation of a parallel shellsort algorithm, CUDA shellsort, is proposed for many-core GPUs with CUDA. Experimental results show that, on average, CUDA shellsort is nearly twice as fast as GPU quicksort and 37% faster than Thrust mergesort under a uniform distribution. Moreover, its performance matches GPU sample sort for up to 32 million data elements while needing only constant extra space. CUDA shellsort is also robust over various data distributions and could be suitable for other many-core architectures.
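The parallelism shellsort exposes is easy to illustrate: for a gap g the array splits into g independent interleaved chains, each of which one thread can insertion-sort without synchronization. The CUDA sketch below shows one such phase; it is not the paper's CUDA shellsort, which treats the small-gap phases quite differently.

```cuda
// Sketch of one shellsort phase: for gap g the array splits into g disjoint
// interleaved chains, so one thread can gapped-insertion-sort each chain
// with no synchronization. Small gaps expose little parallelism, which the
// paper's CUDA shellsort addresses with a different strategy.
#include <cuda_runtime.h>

__global__ void shellPass(int* data, int n, int gap) {
    int chain = blockIdx.x * blockDim.x + threadIdx.x;
    if (chain >= gap) return;
    for (int i = chain + gap; i < n; i += gap) {
        int key = data[i];
        int j = i;
        while (j >= gap && data[j - gap] > key) {
            data[j] = data[j - gap];  // shift larger elements right
            j -= gap;
        }
        data[j] = key;
    }
}
```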
17

Andrecut, M. "Parallel GPU Implementation of Iterative PCA Algorithms." Journal of Computational Biology 16, no. 11 (November 2009): 1593–99. http://dx.doi.org/10.1089/cmb.2008.0221.

18

Blanchard, Jeffrey D., and Jared Tanner. "GPU accelerated greedy algorithms for compressed sensing." Mathematical Programming Computation 5, no. 3 (July 13, 2013): 267–304. http://dx.doi.org/10.1007/s12532-013-0056-5.

19

Cecilia, José M., Andy Nisbet, Martyn Amos, José M. García, and Manuel Ujaldón. "Enhancing GPU parallelism in nature-inspired algorithms." Journal of Supercomputing 63, no. 3 (May 3, 2012): 773–89. http://dx.doi.org/10.1007/s11227-012-0770-1.

20

Jin, Jing, Xianggao Cai, Guoming Lai, and Xiaola Lin. "GPU-accelerated parallel algorithms for linear rankSVM." Journal of Supercomputing 71, no. 11 (August 27, 2015): 4141–71. http://dx.doi.org/10.1007/s11227-015-1509-6.

21

Barrientos, Ricardo J., Fabricio Millaguir, José L. Sánchez, and Enrique Arias. "GPU-based exhaustive algorithms processing kNN queries." Journal of Supercomputing 73, no. 10 (July 17, 2017): 4611–34. http://dx.doi.org/10.1007/s11227-017-2110-y.

22

Wang, Hongzhi, Ning Li, Zheng Wang, and Jianing Li. "GPU-based efficient join algorithms on Hadoop." Journal of Supercomputing 77, no. 1 (April 3, 2020): 292–321. http://dx.doi.org/10.1007/s11227-020-03262-6.

23

Ferreira, João Filipe, Jorge Lobo, and Jorge Dias. "Bayesian real-time perception algorithms on GPU." Journal of Real-Time Image Processing 6, no. 3 (February 26, 2010): 171–86. http://dx.doi.org/10.1007/s11554-010-0156-7.

24

Lee, Kwan-Ho, and Chi-Yong Kim. "Thread Distribution Method of GP-GPU for Accelerating Parallel Algorithms." Journal of IKEEE 21, no. 1 (March 31, 2017): 92–95. http://dx.doi.org/10.7471/ikeee.2017.21.1.92.

25

Acer, Seher, Ariful Azad, Erik G. Boman, Aydın Buluç, Karen D. Devine, SM Ferdous, Nitin Gawande, et al. "EXAGRAPH: Graph and combinatorial methods for enabling exascale applications." International Journal of High Performance Computing Applications 35, no. 6 (September 30, 2021): 553–71. http://dx.doi.org/10.1177/10943420211029299.

Abstract:
Combinatorial algorithms in general and graph algorithms in particular play a critical enabling role in numerous scientific applications. However, the irregular memory access nature of these algorithms makes them one of the hardest algorithmic kernels to implement on parallel systems. With tens of billions of hardware threads and deep memory hierarchies, the exascale computing systems in particular pose extreme challenges in scaling graph algorithms. The codesign center on combinatorial algorithms, ExaGraph, was established to design and develop methods and techniques for efficient implementation of key combinatorial (graph) algorithms chosen from a diverse set of exascale applications. Algebraic and combinatorial methods have a complementary role in the advancement of computational science and engineering, including playing an enabling role on each other. In this paper, we survey the algorithmic and software development activities performed under the auspices of ExaGraph from both a combinatorial and an algebraic perspective. In particular, we detail our recent efforts in porting the algorithms to manycore accelerator (GPU) architectures. We also provide a brief survey of the applications that have benefited from the scalable implementations of different combinatorial algorithms to enable scientific discovery at scale. We believe that several applications will benefit from the algorithmic and software tools developed by the ExaGraph team.
26

Agibalov, Oleg, and Nikolay Ventsov. "On the issue of fuzzy timing estimations of the algorithms running at GPU and CPU architectures." E3S Web of Conferences 135 (2019): 01082. http://dx.doi.org/10.1051/e3sconf/201913501082.

Abstract:
We consider the task of comparing fuzzy estimates of the execution parameters of genetic algorithms implemented on GPU (graphics processing unit) and CPU (central processing unit) architectures. The fuzzy estimates are calculated from the averaged dependence of the genetic algorithm's running time on the number of individuals in the populations it processes, for each architecture. Analysis of these averaged dependences showed that a genetic algorithm can process 10,000 chromosomes on the GPU architecture, or 5,000 chromosomes on the CPU architecture, in approximately 2,500 ms. For the cases under consideration the statement "genetic algorithms (GA) are performed in approximately 2,500 ms (on average)" holds, and the α-sections of the fuzzy sets at α = 0.5 correspond to the intervals [2000, 2399] for the GA executed on the GPU architecture and [1400, 1799] for the GA executed on the CPU architecture. It can therefore be said that, in this case, the actual execution time of the algorithm on the GPU architecture deviates less from its average value than on the CPU.
27

Retamosa, Germán, Luis de Pedro, Ivan González, and Javier Tamames. "Prefiltering Model for Homology Detection Algorithms on GPU." Evolutionary Bioinformatics 12 (January 2016): EBO.S40877. http://dx.doi.org/10.4137/ebo.s40877.

28

Bogle, Ian, George M. Slota, Erik G. Boman, Karen D. Devine, and Sivasankaran Rajamanickam. "Parallel graph coloring algorithms for distributed GPU environments." Parallel Computing 110 (May 2022): 102896. http://dx.doi.org/10.1016/j.parco.2022.102896.

29

FUKUI, Masahiro, and Kenichi HAYASHI. "Techniques for Accelerating LSI Design Algorithms by GPU." IEICE ESS Fundamentals Review 6, no. 3 (2013): 210–17. http://dx.doi.org/10.1587/essfr.6.210.

30

LUONG, THÉ VAN, NOUREDINE MELAB, and EL-GHAZALI TALBI. "NEIGHBORHOOD STRUCTURES FOR GPU-BASED LOCAL SEARCH ALGORITHMS." Parallel Processing Letters 20, no. 04 (December 2010): 307–24. http://dx.doi.org/10.1142/s0129626410000260.

Abstract:
Local search algorithms are powerful heuristics for solving computationally hard problems in science and industry. In these methods, designing neighborhood operators that explore large promising regions of the search space may improve the quality of the obtained solutions, at the expense of a high-cost computation process. As a consequence, GPU computing provides an efficient way to speed up the search. However, designing applications on a GPU is still complex and many issues have to be faced. We provide a methodology to design and implement different neighborhood structures for LS algorithms on a GPU. The work has been evaluated for binary problems, and the obtained results are convincing in terms of the efficiency, quality, and robustness of the solutions provided at run time.
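As a toy illustration of the neighborhood-to-threads mapping for binary problems, the sketch below evaluates the full Hamming-distance-1 neighborhood of an n-bit solution in one kernel, one flipped bit per thread; the linear objective and all names are assumptions for brevity, not the paper's benchmarks.

```cuda
// Toy illustration: evaluate the Hamming-distance-1 neighborhood of an
// n-bit solution, one flipped bit per thread. The linear objective
// (maximize the weighted sum of set bits) is an assumption for brevity.
#include <cuda_runtime.h>

__global__ void evalFlipNeighbors(const unsigned char* sol, const float* w,
                                  int n, float* delta) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // neighbor = flip bit i
    if (i >= n) return;
    // Flipping bit i gains w[i] if it was 0, loses w[i] if it was 1.
    delta[i] = sol[i] ? -w[i] : w[i];
}
```

The host then selects the best neighbor, e.g. with a parallel reduction over delta, applies the flip, and iterates.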
31

Schmitz, L., L. F. Scheidegger, D. K. Osmari, C. A. Dietrich, and J. L. D. Comba. "Efficient and Quality Contouring Algorithms on the GPU." Computer Graphics Forum 29, no. 8 (September 27, 2010): 2569–78. http://dx.doi.org/10.1111/j.1467-8659.2010.01825.x.

32

Gonakhchyan, V. I. "Survey of polygonal surface simplification algorithms on GPU." Proceedings of the Institute for System Programming of RAS 26, no. 2 (2014): 159–74. http://dx.doi.org/10.15514/ispras-2014-26(2)-7.

33

Luong, The Van, Nouredine Melab, and El-Ghazali Talbi. "GPU Computing for Parallel Local Search Metaheuristic Algorithms." IEEE Transactions on Computers 62, no. 1 (January 2013): 173–85. http://dx.doi.org/10.1109/tc.2011.206.

34

Chakroun, I., and N. Melab. "Operator-level GPU-Accelerated Branch and Bound Algorithms." Procedia Computer Science 18 (2013): 280–89. http://dx.doi.org/10.1016/j.procs.2013.05.191.

35

Galiano, V., H. Migallón, V. Migallón, and J. Penadés. "GPU-based parallel algorithms for sparse nonlinear systems." Journal of Parallel and Distributed Computing 72, no. 9 (September 2012): 1098–105. http://dx.doi.org/10.1016/j.jpdc.2011.10.016.

36

Wang, Hongzhi, Zheng Wang, Ning Li, and Xinxin Kong. "Efficient OLAP algorithms on GPU-accelerated Hadoop clusters." Distributed and Parallel Databases 37, no. 4 (July 31, 2018): 507–42. http://dx.doi.org/10.1007/s10619-018-7239-z.

37

Ploskas, Nikolaos, and Nikolaos Samaras. "Efficient GPU-based implementations of simplex type algorithms." Applied Mathematics and Computation 250 (January 2015): 552–70. http://dx.doi.org/10.1016/j.amc.2014.10.096.

38

Cazalas, Jonathan, and Ratan K. Guha. "Performance Modeling of Spatio-Temporal Algorithms Over GEDS Framework." International Journal of Grid and High Performance Computing 4, no. 3 (July 2012): 63–84. http://dx.doi.org/10.4018/jghpc.2012070104.

Abstract:
The efficient processing of spatio-temporal data streams is an area of intense research. However, all methods rely on an unsuitable processor (Govindaraju, 2004), namely a CPU, to evaluate concurrent, continuous spatio-temporal queries over these data streams. This paper presents a performance model of the execution of spatio-temporal queries over the authors’ GEDS framework (Cazalas & Guha, 2010). GEDS is a scalable, Graphics Processing Unit (GPU)-based framework, employing computation sharing and parallel processing paradigms to deliver scalability in the evaluation of continuous, spatio-temporal queries over spatio temporal data streams. Experimental evaluation shows the scalability and efficacy of GEDS in spatio-temporal data streaming environments and demonstrates that, despite the costs associated with memory transfers, the parallel processing power provided by GEDS clearly counters and outweighs any associated costs. To move beyond the analysis of specific algorithms over the GEDS framework, the authors developed an abstract performance model, detailing the relationship of the CPU and the GPU. From this model, they are able to extrapolate a list of attributes common to successful GPU-based applications, thereby providing insight into which algorithms and applications are best suited for the GPU and also providing an estimated theoretical speedup for said GPU-based applications.
39

WEIGEL, MARTIN. "SIMULATING SPIN MODELS ON GPU: A TOUR." International Journal of Modern Physics C 23, no. 08 (August 2012): 1240002. http://dx.doi.org/10.1142/s0129183112400025.

Abstract:
The use of graphics processing units (GPUs) in scientific computing has gathered considerable momentum in the past five years. While GPUs in general promise high performance and excellent performance-per-Watt ratios, not every class of problems is equally well suited to exploiting the massively parallel architecture they provide. Lattice spin models appear to be prototypic examples of problems suited to this architecture, at least as long as local update algorithms are employed. In this review, I summarize our recent experience with the simulation of a wide range of spin models on GPU, employing an equally wide range of update algorithms ranging from Metropolis and heat bath updates, over cluster algorithms, to generalized ensemble simulations.
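The local-update pattern the review refers to can be sketched for the 2D Ising model: a checkerboard decomposition lets all sites of one parity update concurrently, since each site's neighbors lie on the opposite parity. The CUDA kernel below is a hedged illustration with a crude inline RNG (a real code would use cuRAND) and assumes an even lattice size L.

```cuda
// Hedged sketch: checkerboard Metropolis sweep for the 2D Ising model with
// periodic boundaries. All sites of one parity update in parallel because
// their neighbors belong to the other parity. Assumes L is even; the inline
// xorshift RNG is a stand-in for a proper generator such as cuRAND.
#include <cuda_runtime.h>

__device__ float uniformRand(unsigned int* state) {
    unsigned int x = *state;
    x ^= x << 13; x ^= x >> 17; x ^= x << 5;
    *state = x;
    return x * 2.3283064e-10f;  // map to [0, 1)
}

__global__ void metropolisSweep(int* spin, int L, float beta, int parity,
                                unsigned int* rng) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= L * L / 2) return;
    int row = idx / (L / 2);
    int col = 2 * (idx % (L / 2)) + ((row + parity) & 1);  // this parity only
    int s = spin[row * L + col];
    int nb = spin[((row + 1) % L) * L + col]
           + spin[((row - 1 + L) % L) * L + col]
           + spin[row * L + (col + 1) % L]
           + spin[row * L + (col - 1 + L) % L];
    float dE = 2.0f * s * nb;  // energy change of flipping s (J = 1)
    if (dE <= 0.0f || uniformRand(&rng[idx]) < expf(-beta * dE))
        spin[row * L + col] = -s;
}
```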
40

Gajic, Dusan, and Radomir Stankovic. "GPU accelerated computation of fast spectral transforms." Facta universitatis - series: Electronics and Energetics 24, no. 3 (2011): 483–99. http://dx.doi.org/10.2298/fuee1103483g.

Abstract:
This paper discusses techniques for accelerated computation of several fast spectral transforms on graphics processing units (GPUs) using the Open Computing Language (OpenCL). We present a reformulation of the fast algorithms that takes into account the peculiar properties of the transforms to make them suitable for GPU implementation. Special attention is paid to the organization of computations, the reduction of memory transfers, the impact of integer and Boolean arithmetic, different structures of the algorithms, etc. The performance of the GPU implementations is compared with classical C/C++ implementations for the central processing unit (CPU). Experiments confirm that, even though the spectral transforms considered involve only simple arithmetic, significant speedups are achieved by implementing the algorithms in OpenCL and executing them on the GPU.
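The "simple arithmetic" structure of these transforms is visible in a single butterfly stage; below is a CUDA sketch of one stage of a fast Walsh-Hadamard transform, launched log2(n) times with doubling stride. The paper's implementations are in OpenCL, so this only mirrors the computational pattern, not the authors' code.

```cuda
// Sketch of one butterfly stage of a fast Walsh-Hadamard transform; the
// host launches it log2(n) times with stride 1, 2, 4, ... Only additions
// and subtractions are needed, matching the "simple arithmetic" remark.
#include <cuda_runtime.h>

__global__ void fwhtStage(float* data, int n, int stride) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n / 2) return;
    // Upper index of this thread's butterfly pair.
    int i = (tid / stride) * (2 * stride) + (tid % stride);
    float a = data[i];
    float b = data[i + stride];
    data[i] = a + b;
    data[i + stride] = a - b;
}

// Host side: for (int s = 1; s < n; s *= 2) launch fwhtStage(data, n, s);
```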
41

Озерицкий, А. В. "Computational simulation using particles on GPU and GLSL language." Numerical Methods and Programming (Vychislitel'nye Metody i Programmirovanie), no. 1 (January 19, 2023): 37–54. http://dx.doi.org/10.26089/nummet.v24r104.

Abstract:
The N-body gravitational problem is simulated using the PM and P3M algorithms. The GPU implementation of the algorithms uses compute shaders; the CPU code only synchronizes and launches the shaders, with no computational parts implemented on the CPU and no data copying between CPU and GPU. Parallel algorithms for placing particles in grid cells and for distributing mass over grid nodes are presented; both are based on the parallel construction of linked lists corresponding to the grid cells. The algorithms are fully parallel and contain no single-threaded parts. To compute simultaneously with visualization, part of the calculation is done in a vertex shader, made possible by using buffer objects and specially prepared data, instead of vertices, as vertex shader input. Numerical results are presented for the example of galaxy cluster formation in a flat universe expanding according to the Friedmann model; the model universe is a cube with periodic boundary conditions along all axes. The maximum number of particles used in the calculations is 10^8. The simulation uses the modern cross-platform Vulkan API and the GLSL language. Results computed on Apple M1 and Ryzen 3700X processors are compared with results on ordinary video cards, the Apple M1 GPU and the NVIDIA RTX 3060; the parallel CPU algorithm is implemented with OpenMP. The computed results and running times are compared with those of other authors, and the running time of the GPU program is compared with that of a similar program for a multi-node cluster.
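The parallel linked-list construction described here translates naturally to CUDA as well as GLSL; the hedged sketch below pushes each particle onto the head of its cell's list with a single atomic exchange, so all lists are built in one fully parallel pass. Layout and names are illustrative assumptions.

```cuda
// Hedged CUDA analogue of the per-cell linked-list construction: each
// particle atomically swaps itself in as its cell's new list head, so all
// lists are built in one parallel pass. Positions are assumed non-negative.
#include <cuda_runtime.h>

__global__ void buildCellLists(const float3* pos, int numParticles,
                               float cellSize, int3 cells,
                               int* cellHead,      // one per cell, init -1
                               int* particleNext)  // successor per particle
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= numParticles) return;
    int cx = (int)(pos[p].x / cellSize);
    int cy = (int)(pos[p].y / cellSize);
    int cz = (int)(pos[p].z / cellSize);
    int cell = (cz * cells.y + cy) * cells.x + cx;
    // Old head becomes this particle's successor; particle becomes head.
    particleNext[p] = atomicExch(&cellHead[cell], p);
}
```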
42

CHEN, Ying, Jin-xian LIN, and Tun LU. "Implementation of LU decomposition and Laplace algorithms on GPU." Journal of Computer Applications 31, no. 3 (May 18, 2011): 851–55. http://dx.doi.org/10.3724/sp.j.1087.2011.00851.

43

Date, Ketan, and Rakesh Nagi. "GPU-accelerated Hungarian algorithms for the Linear Assignment Problem." Parallel Computing 57 (September 2016): 52–72. http://dx.doi.org/10.1016/j.parco.2016.05.012.

44

Castaño-Díez, Daniel, Dominik Moser, Andreas Schoenegger, Sabine Pruggnaller, and Achilleas S. Frangakis. "Performance evaluation of image processing algorithms on the GPU." Journal of Structural Biology 164, no. 1 (October 2008): 153–60. http://dx.doi.org/10.1016/j.jsb.2008.07.006.

45

Lobeiras, Jacobo, Margarita Amor, and Ramon Doallo. "Designing Efficient Index-Digit Algorithms for CUDA GPU Architectures." IEEE Transactions on Parallel and Distributed Systems 27, no. 5 (May 1, 2016): 1331–43. http://dx.doi.org/10.1109/tpds.2015.2450718.

46

Wang, Hongjian, Naiyu Zhang, Jean-Charles Créput, Yassine Ruichek, and Julien Moreau. "Massively parallel GPU computing for fast stereo correspondence algorithms." Journal of Systems Architecture 65 (April 2016): 46–58. http://dx.doi.org/10.1016/j.sysarc.2016.03.002.

47

Cheng, John Runwei, and Mitsuo Gen. "Accelerating genetic algorithms with GPU computing: A selective overview." Computers & Industrial Engineering 128 (February 2019): 514–25. http://dx.doi.org/10.1016/j.cie.2018.12.067.

48

Krüger, Jens, and Rüdiger Westermann. "Linear algebra operators for GPU implementation of numerical algorithms." ACM Transactions on Graphics 22, no. 3 (July 2003): 908–16. http://dx.doi.org/10.1145/882262.882363.

49

Agarwal, Nipun, Aman Goyal, Gaurav Maheshwari, and Alok Dugtal. "Parallel Implementation of Scheduling Algorithms on GPU using CUDA." International Journal of Computer Applications 127, no. 2 (October 15, 2015): 44–49. http://dx.doi.org/10.5120/ijca2015906339.

50

Ben Boudaoud, Lynda, Basel Solaiman, and Abdelkamel Tari. "Implementation and comparison of binary thinning algorithms on GPU." Computing 101, no. 8 (August 16, 2018): 1091–117. http://dx.doi.org/10.1007/s00607-018-0653-2.
