Journal articles on the topic "Algorithmes GPU"

Follow this link to see other types of publications on the topic: Algorithmes GPU.

Create an accurate reference in APA, MLA, Chicago, Harvard, and other styles.

Consult the 50 best journal articles for research on the topic "Algorithmes GPU".

Next to each source in the list of references there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic citation of the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scientific publication in .pdf format and read the abstract of the work online, if it is present in the metadata.

Browse journal articles from a wide variety of academic fields and compile an accurate bibliography.

1

Boulay, Thomas, Nicolas Gac, Ali Mohammad-Djafari, and Julien Lagoutte. "Algorithmes de reconnaissance NCTR et parallélisation sur GPU". Traitement du signal 30, no. 6 (April 28, 2013): 309–42. http://dx.doi.org/10.3166/ts.30.309-342.
2

Rios-Willars, Ernesto, Jennifer Velez-Segura, and María Magdalena Delabra-Salinas. "Enhancing Multiple Sequence Alignment with Genetic Algorithms: A Bioinformatics Approach in Biomedical Engineering". Revista Mexicana de Ingeniería Biomédica 45, no. 2 (May 1, 2024): 62–77. http://dx.doi.org/10.17488/rmib.45.2.4.

Abstract:
This study aimed to create a genetic information processing technique for the problem of multiple alignment of genetic sequences in bioinformatics. The objective was to take advantage of the computer hardware's capabilities and analyze the results obtained regarding quality, processing time, and the number of evaluated functions. The methodology was based on developing a genetic algorithm in Java, which resulted in four different versions: Gp1, Gp2, Gp3 and Gp4. A set of genetic sequences was processed, and the results were evaluated by analyzing numerical behavior profiles. The research found that algorithms that maintained diversity in the population produced better-quality solutions, and that parallel processing reduced processing time. The generated performance profile confirmed that the time required to perform the process decreased. The study concluded that conventional computer equipment can produce excellent results when processing genetic information if algorithms are optimized to exploit hardware resources. The computational effort of the hardware used is directly related to the number of evaluated functions. Additionally, the comparison method based on determining the performance profile is highlighted as a strategy for comparing algorithm results across different metrics of interest, which can guide the development of more efficient genetic information processing techniques.
3

SOMAN, JYOTHISH, KISHORE KOTHAPALLI, and P. J. NARAYANAN. "SOME GPU ALGORITHMS FOR GRAPH CONNECTED COMPONENTS AND SPANNING TREE". Parallel Processing Letters 20, no. 04 (December 2010): 325–39. http://dx.doi.org/10.1142/s0129626410000272.

Abstract:
Graphics Processing Units (GPUs) are application-specific accelerators that provide a high performance-to-cost ratio and are widely available and used, which makes them a ubiquitous accelerator. The computing paradigm based on them is the general-purpose computing on the GPU (GPGPU) model. Owing to its graphics lineage, the GPU is better suited to data-parallel, data-regular algorithms; its hardware architecture is less suitable for data-parallel but data-irregular algorithms such as graph connected components and list ranking. In this paper, we present results that show how to use GPUs efficiently for graph algorithms which are known to have irregular data access patterns. We consider two fundamental graph problems: finding the connected components and finding a spanning tree. These two problems find applications in several graph-theoretical problems. In this paper we arrive at efficient GPU implementations for the above two problems. The algorithms focus on minimising irregularity at both the algorithmic and the implementation level. Our implementation achieves a speedup of 11–16 times over the corresponding best sequential implementation.
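The contrast between irregular graph computation and the GPU's regular data-parallel model can be illustrated with a minimal label-propagation sketch (our own illustration, not the paper's algorithm; numpy's bulk edge updates stand in for one GPU thread per edge):

```python
import numpy as np

def connected_components(num_nodes, edges):
    """Label propagation: every vertex starts with its own label and the
    smaller endpoint label is propagated across every edge, in bulk, until
    a fixed point is reached. Each sweep is a regular per-edge operation."""
    labels = np.arange(num_nodes)
    u, v = np.array(edges).T
    while True:
        new = labels.copy()
        # scatter the minimum label across both directions of every edge
        np.minimum.at(new, u, labels[v])
        np.minimum.at(new, v, labels[u])
        if np.array_equal(new, labels):
            return labels  # labels[i] == smallest vertex id in i's component
        labels = new
```

Convergence can take as many sweeps as the component diameter, which is exactly why work such as this paper minimises irregularity rather than simply iterating like this.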
4

Schnös, Florian, Dirk Hartmann, Birgit Obst, and Glenn Glashagen. "GPU accelerated voxel-based machining simulation". International Journal of Advanced Manufacturing Technology 115, no. 1-2 (May 8, 2021): 275–89. http://dx.doi.org/10.1007/s00170-021-07001-w.

Abstract:
The simulation of subtractive manufacturing processes has a long history in engineering. Corresponding predictions are utilized for planning, validation and optimization, e.g., of CNC-machining processes. With the rise of flexible robotic machining and the advancements of computational and algorithmic capability, the simulation of the coupled machine-process behaviour for complex machining processes and large workpieces is within reach. These simulations require fast material removal predictions and analysis with high spatial resolution for multi-axis operations. Within this contribution, we propose to leverage voxel-based concepts introduced in the computer graphics industry to accelerate material removal simulations. Corresponding schemes are well suited for massive parallelization. By leveraging the computational power offered by modern graphics hardware, the computational performance of high spatial accuracy volumetric voxel-based algorithms is further improved. They now allow for very fast and accurate volume removal simulation and analysis of machining processes. Within this paper, a detailed description of the data structures and algorithms is provided along with a detailed benchmark for common machining operations.
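As an illustration of the voxel idea (our sketch, not the paper's data structures), material removal for one tool position reduces to an independent in/out test per voxel; numpy stands in for one GPU thread per voxel:

```python
import numpy as np

def carve_sphere(workpiece, center, radius):
    """Remove material wherever a spherical tool volume intersects the
    voxel grid. Every voxel is tested independently of the others, which
    is what makes the scheme so amenable to massive parallelization."""
    x, y, z = np.indices(workpiece.shape)
    cx, cy, cz = center
    inside = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= radius ** 2
    return workpiece & ~inside

# a solid 20x20x20 stock block; carve a single (hypothetical) tool position
stock = np.ones((20, 20, 20), dtype=bool)
cut = carve_sphere(stock, center=(10, 10, 20), radius=6)
removed_voxels = int(stock.sum() - cut.sum())
```

A machining simulation sweeps such tool positions along the programmed path, subtracting at each step.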
5

Zatolokin, Y. A., E. I. Vatutin, and V. S. Titov. "ALGORITHMIC OPTIMIZATION OF SOFTWARE IMPLEMENTATION OF ALGORITHMS FOR MULTIPLYING DENSE REAL MATRICES ON GRAPHICS PROCESSORS WITH OPENGL TECHNOLOGY SUPPORT". Proceedings of the Southwest State University 21, no. 5 (October 28, 2017): 6–15. http://dx.doi.org/10.21869/2223-1560-2017-21-5-06-15.

Abstract:
The article states the matrix multiplication problem. It is shown that, although the problem is simply formulated, solving it efficiently may require both heuristic methods and a set of algorithmic modifications, involving algorithmic and high-level software optimization, that take the particular problem into account and increase multiplication performance. These include a comparative analysis of performance with and without GPU-specific optimizations, which showed that computations that do not optimize work with global GPU memory have low processing performance. Optimizing the distribution of data between the global and local memory of the GPU reduces calculation time and increases real performance. To compare the performance of the developed software implementations for OpenGL and CUDA technologies, identical calculations were performed on identical GPUs, which showed higher real performance when using CUDA cores. Specific performance values measured for the multi-threaded software implementation on the GPU are given for all of the described optimizations. It is shown that the most effective approach is caching sub-blocks of the matrices (tiles) in the GPU's on-chip local memory, which in a specialized software implementation provides a performance of 275.3 GFLOP/s on a GeForce GTX 960M GPU.
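The tiling optimization credited with the best performance can be sketched on the CPU (a numpy illustration of the blocking idea, not the article's OpenGL/CUDA code):

```python
import numpy as np

TILE = 4  # stands in for the tile edge held in on-chip local memory

def tiled_matmul(A, B, tile=TILE):
    """Blocked matrix multiplication: each (tile x tile) block of C is
    accumulated from pairs of operand tiles, so every loaded tile is
    reused many times -- the caching strategy the abstract credits for
    its speedup, with numpy slices standing in for local memory."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must agree"
    C = np.zeros((n, m), dtype=np.result_type(A, B))
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            acc = np.zeros((min(tile, n - i), min(tile, m - j)),
                           dtype=C.dtype)
            for p in range(0, k, tile):
                # "load" one tile of A and one of B, multiply-accumulate
                acc += A[i:i + tile, p:p + tile] @ B[p:p + tile, j:j + tile]
            C[i:i + tile, j:j + tile] = acc
    return C
```

On a GPU, each (i, j) block maps to one thread block and the operand tiles are staged in shared/local memory before the inner product.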
6

MERRILL, DUANE, and ANDREW GRIMSHAW. "HIGH PERFORMANCE AND SCALABLE RADIX SORTING: A CASE STUDY OF IMPLEMENTING DYNAMIC PARALLELISM FOR GPU COMPUTING". Parallel Processing Letters 21, no. 02 (June 2011): 245–72. http://dx.doi.org/10.1142/s0129626411000187.

Abstract:
The need to rank and order data is pervasive, and many algorithms are fundamentally dependent upon sorting and partitioning operations. Prior to this work, GPU stream processors have been perceived as challenging targets for problems with dynamic and global data-dependences such as sorting. This paper presents: (1) a family of very efficient parallel algorithms for radix sorting; and (2) our allocation-oriented algorithmic design strategies that match the strengths of GPU processor architecture to this genre of dynamic parallelism. We demonstrate multiple factors of speedup (up to 3.8x) compared to state-of-the-art GPU sorting. We also reverse the performance differentials observed between GPU and multi/many-core CPU architectures by recent comparisons in the literature, including those with 32-core CPU-based accelerators. Our average sorting rates exceed 1B 32-bit keys/sec on a single GPU microprocessor. Our sorting passes are constructed from a very efficient parallel prefix scan "runtime" that incorporates three design features: (1) kernel fusion for locally generating and consuming prefix scan data; (2) multi-scan for performing multiple related, concurrent prefix scans (one for each partitioning bin); and (3) flexible algorithm serialization for avoiding unnecessary synchronization and communication within algorithmic phases, allowing us to construct a single implementation that scales well across all generations and configurations of programmable NVIDIA GPUs.
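The counting-plus-prefix-scan structure of a radix sorting pass can be sketched serially (our Python illustration; the paper's contribution is fusing and parallelizing exactly these steps in GPU kernels):

```python
import numpy as np

def radix_sort(keys, bits_per_pass=4):
    """Least-significant-digit radix sort: each pass buckets keys by one
    digit using a histogram plus an exclusive prefix scan -- the scan
    primitive the paper builds its sorting passes from. Each pass is
    stable, so sorting digit by digit is correct overall."""
    keys = np.asarray(keys, dtype=np.uint32)
    radix = 1 << bits_per_pass
    for shift in range(0, 32, bits_per_pass):
        digits = (keys >> np.uint32(shift)) & np.uint32(radix - 1)
        counts = np.bincount(digits, minlength=radix)
        # exclusive prefix scan: first output slot of each digit bucket
        offsets = np.concatenate(([0], np.cumsum(counts)[:-1]))
        out = np.empty_like(keys)
        for i, d in enumerate(digits):  # stable scatter, serial stand-in
            out[offsets[d]] = keys[i]
            offsets[d] += 1
        keys = out
    return keys
```

The GPU version computes the histogram and scan in parallel per thread block and fuses them with the scatter to avoid extra memory passes.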
7

Gremse, Felix, Andreas Höfter, Lukas Razik, Fabian Kiessling, and Uwe Naumann. "GPU-accelerated adjoint algorithmic differentiation". Computer Physics Communications 200 (March 2016): 300–311. http://dx.doi.org/10.1016/j.cpc.2015.10.027.
8

Rapaport, D. C. "GPU molecular dynamics: Algorithms and performance". Journal of Physics: Conference Series 2241, no. 1 (March 1, 2022): 012007. http://dx.doi.org/10.1088/1742-6596/2241/1/012007.

Abstract:
A previous study of MD algorithms designed for GPU use is extended to cover more recent developments in GPU architecture. Algorithm modifications are described, together with extensions to more complex systems. New measurements include the effects of increased parallelism on GPU performance, as well as comparisons with multiple-core CPUs using multitasking based on CPU threads and message passing. The results show that the GPU retains a significant performance advantage.
9

Mikhayluk, M. V., and A. M. Trushin. "Spheres Collision Detection Algorithms on GPU". PROGRAMMNAYA INGENERIA 8, no. 8 (August 15, 2017): 354–58. http://dx.doi.org/10.17587/prin.8.354-358.
10

Matei, Adrian, Cristian Lupașcu, and Ion Bica. "On GPU Implementations of Encryption Algorithms". Journal of Military Technology 2, no. 2 (December 18, 2019): 29–34. http://dx.doi.org/10.32754/jmt.2019.2.04.
11

Zhang, Xingyi, Bangju Wang, Zhuanlian Ding, Jin Tang, and Juanjuan He. "Implementation of Membrane Algorithms on GPU". Journal of Applied Mathematics 2014 (2014): 1–7. http://dx.doi.org/10.1155/2014/307617.

Abstract:
Membrane algorithms are a new class of parallel algorithms that attempt to incorporate components of membrane computing models, such as the structure of the models and the way cells communicate, into the design of efficient optimization algorithms. Although the importance of the parallelism of such algorithms has been well recognized, membrane algorithms were usually implemented on a serial computing device, the central processing unit (CPU), which prevents them from working efficiently. In this work, we consider the implementation of membrane algorithms on a parallel computing device, the graphics processing unit (GPU). In such an implementation, all cells of a membrane algorithm can work simultaneously. Experimental results on two classical intractable problems, the point set matching problem and the TSP, show that the GPU implementation of membrane algorithms is much more efficient than the CPU implementation in terms of runtime, especially for solving problems with high complexity.
12

Singh, Dhirendra Pratap, Ishan Joshi, and Jaytrilok Choudhary. "Survey of GPU Based Sorting Algorithms". International Journal of Parallel Programming 46, no. 6 (April 11, 2017): 1017–34. http://dx.doi.org/10.1007/s10766-017-0502-5.
13

Ngo, Long Thanh, Dzung Dinh Nguyen, Long The Pham, and Cuong Manh Luong. "Speedup of Interval Type 2 Fuzzy Logic Systems Based on GPU for Robot Navigation". Advances in Fuzzy Systems 2012 (2012): 1–11. http://dx.doi.org/10.1155/2012/698062.

Abstract:
As the number of rules and the sample rate of type-2 fuzzy logic systems (T2FLSs) increase, the speed of calculation becomes a problem. T2FLSs have a large amount of inherent algorithmic parallelism that modern CPU architectures do not exploit. Many rules and algorithms in a T2FLS can be sped up on a graphics processing unit (GPU) as long as the majority of computations at the various stages and components do not depend on each other. This paper demonstrates how to implement interval type-2 fuzzy logic systems (IT2-FLSs) on the GPU, with experiments on the obstacle avoidance behavior of robot navigation. GPU-based calculation is a high-performance solution that also frees up the CPU. The experimental results show that the GPU performs many times faster than the CPU.
14

Lee, Taekhee, and Young J. Kim. "Massively parallel motion planning algorithms under uncertainty using POMDP". International Journal of Robotics Research 35, no. 8 (August 21, 2015): 928–42. http://dx.doi.org/10.1177/0278364915594856.

Abstract:
We present new parallel algorithms that solve continuous-state partially observable Markov decision process (POMDP) problems using the GPU (gPOMDP) and a hybrid of the GPU and CPU (hPOMDP). We choose the Monte Carlo value iteration (MCVI) method as our base algorithm and parallelize this algorithm using the multi-level parallel formulation of MCVI. For each parallel level, we propose efficient algorithms to utilize the massive data parallelism available on modern GPUs. Our GPU-based method uses two workload distribution techniques, compute/data interleaving and workload balancing, in order to obtain the maximum parallel performance at the highest level. We also present a CPU–GPU hybrid method that takes advantage of both CPU and GPU parallelism in order to solve highly complex POMDP planning problems. The CPU is responsible for data preparation, while the GPU performs Monte Carlo simulations; these operations are performed concurrently using the compute/data overlap technique between the CPU and GPU. To the best of the authors' knowledge, our algorithms are the first parallel algorithms that efficiently execute POMDP in a massively parallel fashion utilizing the GPU or a hybrid of the GPU and CPU. Our algorithms outperform the existing CPU-based algorithm by a factor of 75–99 based on the chosen benchmark.
15

Andrzejewski, Witold, Artur Gramacki, and Jarosław Gramacki. "Graphics processing units in acceleration of bandwidth selection for kernel density estimation". International Journal of Applied Mathematics and Computer Science 23, no. 4 (December 1, 2013): 869–85. http://dx.doi.org/10.2478/amcs-2013-0065.

Abstract:
The Probability Density Function (PDF) is a key concept in statistics. Constructing the most adequate PDF from the observed data is still an important and interesting scientific problem, especially for large datasets. PDFs are often estimated using nonparametric data-driven methods. One of the most popular nonparametric methods is the Kernel Density Estimator (KDE). However, a very serious drawback of using KDEs is the large number of calculations required to compute them, especially to find the optimal bandwidth parameter. In this paper we investigate the possibility of utilizing Graphics Processing Units (GPUs) to accelerate the finding of the bandwidth. The contribution of this paper is threefold: (a) we propose an algorithmic optimization to one of the bandwidth finding algorithms, (b) we propose efficient GPU versions of three bandwidth finding algorithms and (c) we experimentally compare three of our GPU implementations with the ones which utilize only CPUs. Our experiments show orders of magnitude improvements over CPU implementations of classical algorithms.
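For reference, the kernel density estimate itself can be written in a few lines (a minimal sketch; the paper accelerates the far more expensive bandwidth search, which here is replaced by Silverman's rule of thumb):

```python
import numpy as np

def gaussian_kde(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate on a grid. The full
    grid-by-sample matrix of kernel evaluations is exactly the kind of
    independent bulk work that maps well to a GPU; data-driven bandwidth
    selection repeats this cost many times over."""
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, 500)
xs = np.linspace(-5.0, 5.0, 201)
h = 1.06 * data.std() * len(data) ** (-1 / 5)  # Silverman's rule of thumb
density = gaussian_kde(data, xs, h)
```

Optimal bandwidth selectors (cross-validation, plug-in) evaluate objective functions over many candidate bandwidths, which is where the GPU acceleration pays off.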
16

Lin, Chun-Yuan, Wei Sheng Lee, and Chuan Yi Tang. "Parallel Shellsort Algorithm for Many-Core GPUs with CUDA". International Journal of Grid and High Performance Computing 4, no. 2 (April 2012): 1–16. http://dx.doi.org/10.4018/jghpc.2012040101.

Abstract:
Sorting is a classic algorithmic problem and its importance has led to the design and implementation of various sorting algorithms on many-core graphics processing units (GPUs). CUDPP Radix sort is the most efficient sort on GPUs and GPU Sample sort is the best comparison-based sort. Although the implementations of these algorithms are efficient, they either need extra space for data rearrangement or atomic operations for acceleration. Sorting applications usually deal with a large amount of data, so memory utilization is an important consideration. Furthermore, these sorting algorithms on GPUs without atomic operation support can suffer performance degradation or fail to work. In this paper, an efficient implementation of a parallel shellsort algorithm, CUDA shellsort, is proposed for many-core GPUs with CUDA. Experimental results show that, on average, CUDA shellsort is nearly twice as fast as GPU quicksort and 37% faster than Thrust mergesort under uniform distribution. Moreover, its performance matches GPU sample sort up to 32 million data elements, while needing only constant space. CUDA shellsort is also robust over various data distributions and could be suitable for other many-core architectures.
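A serial reference shellsort for comparison (our sketch; the CUDA version parallelizes the independent gap-separated chains within each pass):

```python
def shellsort(a):
    """Shellsort with the simple n/2, n/4, ... gap sequence. Within one
    pass, the gap-separated chains are independent of each other -- the
    parallelism a GPU implementation exploits; here they run serially."""
    a = list(a)
    n = len(a)
    gap = n // 2
    while gap > 0:
        for i in range(gap, n):  # gapped insertion sort
            item, j = a[i], i
            while j >= gap and a[j - gap] > item:
                a[j] = a[j - gap]
                j -= gap
            a[j] = item
        gap //= 2
    return a
```

Because each pass works in place, the algorithm needs only constant extra space, which is the memory advantage the abstract highlights.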
17

Andrecut, M. "Parallel GPU Implementation of Iterative PCA Algorithms". Journal of Computational Biology 16, no. 11 (November 2009): 1593–99. http://dx.doi.org/10.1089/cmb.2008.0221.
18

Blanchard, Jeffrey D., and Jared Tanner. "GPU accelerated greedy algorithms for compressed sensing". Mathematical Programming Computation 5, no. 3 (July 13, 2013): 267–304. http://dx.doi.org/10.1007/s12532-013-0056-5.
19

Cecilia, José M., Andy Nisbet, Martyn Amos, José M. García, and Manuel Ujaldón. "Enhancing GPU parallelism in nature-inspired algorithms". Journal of Supercomputing 63, no. 3 (May 3, 2012): 773–89. http://dx.doi.org/10.1007/s11227-012-0770-1.
20

Jin, Jing, Xianggao Cai, Guoming Lai, and Xiaola Lin. "GPU-accelerated parallel algorithms for linear rankSVM". Journal of Supercomputing 71, no. 11 (August 27, 2015): 4141–71. http://dx.doi.org/10.1007/s11227-015-1509-6.
21

Barrientos, Ricardo J., Fabricio Millaguir, José L. Sánchez, and Enrique Arias. "GPU-based exhaustive algorithms processing kNN queries". Journal of Supercomputing 73, no. 10 (July 17, 2017): 4611–34. http://dx.doi.org/10.1007/s11227-017-2110-y.
22

Wang, Hongzhi, Ning Li, Zheng Wang, and Jianing Li. "GPU-based efficient join algorithms on Hadoop". Journal of Supercomputing 77, no. 1 (April 3, 2020): 292–321. http://dx.doi.org/10.1007/s11227-020-03262-6.
23

Ferreira, João Filipe, Jorge Lobo, and Jorge Dias. "Bayesian real-time perception algorithms on GPU". Journal of Real-Time Image Processing 6, no. 3 (February 26, 2010): 171–86. http://dx.doi.org/10.1007/s11554-010-0156-7.
24

Lee, Kwan-Ho, and Chi-Yong Kim. "Thread Distribution Method of GP-GPU for Accelerating Parallel Algorithms". Journal of IKEEE 21, no. 1 (March 31, 2017): 92–95. http://dx.doi.org/10.7471/ikeee.2017.21.1.92.
25

Acer, Seher, Ariful Azad, Erik G. Boman, Aydın Buluç, Karen D. Devine, SM Ferdous, Nitin Gawande, et al. "EXAGRAPH: Graph and combinatorial methods for enabling exascale applications". International Journal of High Performance Computing Applications 35, no. 6 (September 30, 2021): 553–71. http://dx.doi.org/10.1177/10943420211029299.

Abstract:
Combinatorial algorithms in general and graph algorithms in particular play a critical enabling role in numerous scientific applications. However, the irregular memory access nature of these algorithms makes them one of the hardest algorithmic kernels to implement on parallel systems. With tens of billions of hardware threads and deep memory hierarchies, the exascale computing systems in particular pose extreme challenges in scaling graph algorithms. The codesign center on combinatorial algorithms, ExaGraph, was established to design and develop methods and techniques for efficient implementation of key combinatorial (graph) algorithms chosen from a diverse set of exascale applications. Algebraic and combinatorial methods have a complementary role in the advancement of computational science and engineering, including playing an enabling role on each other. In this paper, we survey the algorithmic and software development activities performed under the auspices of ExaGraph from both a combinatorial and an algebraic perspective. In particular, we detail our recent efforts in porting the algorithms to manycore accelerator (GPU) architectures. We also provide a brief survey of the applications that have benefited from the scalable implementations of different combinatorial algorithms to enable scientific discovery at scale. We believe that several applications will benefit from the algorithmic and software tools developed by the ExaGraph team.
26

Agibalov, Oleg, and Nikolay Ventsov. "On the issue of fuzzy timing estimations of the algorithms running at GPU and CPU architectures". E3S Web of Conferences 135 (2019): 01082. http://dx.doi.org/10.1051/e3sconf/201913501082.

Abstract:
We consider the task of comparing fuzzy estimates of the execution parameters of genetic algorithms implemented on GPU (graphics processing unit) and CPU (central processing unit) architectures. The fuzzy estimates are calculated from the averaged dependences of the running time of the genetic algorithms on the GPU and CPU architectures on the number of individuals in the populations processed by the algorithm. Analysis of these averaged dependences showed that a genetic algorithm can process 10,000 chromosomes on the GPU architecture, or 5,000 chromosomes on the CPU architecture, in approximately 2,500 ms. The following holds for the cases under consideration: "genetic algorithms (GA) run in approximately 2,500 ms (on average)", and the α-sections of the fuzzy sets at α = 0.5 correspond to the interval [2000, 2399] for the GA performed on the GPU architecture and [1400, 1799] for the GA performed on the CPU architecture. It can therefore be said that, in this case, the actual execution time of the algorithm on the GPU architecture deviates less from the average value than on the CPU.
27

Retamosa, Germán, Luis de Pedro, Ivan González, and Javier Tamames. "Prefiltering Model for Homology Detection Algorithms on GPU". Evolutionary Bioinformatics 12 (January 2016): EBO.S40877. http://dx.doi.org/10.4137/ebo.s40877.
28

Bogle, Ian, George M. Slota, Erik G. Boman, Karen D. Devine, and Sivasankaran Rajamanickam. "Parallel graph coloring algorithms for distributed GPU environments". Parallel Computing 110 (May 2022): 102896. http://dx.doi.org/10.1016/j.parco.2022.102896.
29

FUKUII, Masahiro, and Kenichi HAYASHI. "Techniques for Accelerating LSI Design Algorithms by GPU". IEICE ESS Fundamentals Review 6, no. 3 (2013): 210–17. http://dx.doi.org/10.1587/essfr.6.210.
30

LUONG, THÉ VAN, NOUREDINE MELAB, and EL-GHAZALI TALBI. "NEIGHBORHOOD STRUCTURES FOR GPU-BASED LOCAL SEARCH ALGORITHMS". Parallel Processing Letters 20, no. 04 (December 2010): 307–24. http://dx.doi.org/10.1142/s0129626410000260.

Abstract:
Local search algorithms are powerful heuristics for solving computationally hard problems in science and industry. In these methods, designing neighborhood operators to explore large promising regions of the search space may improve the quality of the obtained solutions at the expense of a high-cost computation process. As a consequence, the use of GPU computing provides an efficient way to speed up the search. However, designing applications on a GPU is still complex and many issues have to be faced. We provide a methodology to design and implement different neighborhood structures for LS algorithms on a GPU. The work has been evaluated for binary problems and the obtained results are convincing both in terms of efficiency, quality and robustness of the provided solutions at run time.
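A minimal illustration of a binary (bit-flip) neighborhood and one best-improvement step (our sketch, not the paper's neighborhood structures; on the GPU each neighbor evaluation would map to one thread):

```python
def flip_neighborhood(solution):
    """Hamming-distance-1 neighborhood of a binary vector: flip each bit
    in turn. Neighbor k is derived from (solution, k) alone, so on a GPU
    one thread per k can build and evaluate it with no neighbor list."""
    return [solution[:k] + [1 - solution[k]] + solution[k + 1:]
            for k in range(len(solution))]

def best_improvement_step(solution, cost):
    """One steepest-descent move: evaluate the whole neighborhood (the
    embarrassingly parallel part) and keep the best neighbor only if it
    improves on the current solution."""
    best = min(flip_neighborhood(solution), key=cost)
    return best if cost(best) < cost(solution) else solution
```

Larger neighborhoods (2-flip, swaps) grow quadratically or worse, which is where offloading the evaluations to the GPU becomes worthwhile.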
31

Schmitz, L., L. F. Scheidegger, D. K. Osmari, C. A. Dietrich, and J. L. D. Comba. "Efficient and Quality Contouring Algorithms on the GPU". Computer Graphics Forum 29, no. 8 (September 27, 2010): 2569–78. http://dx.doi.org/10.1111/j.1467-8659.2010.01825.x.
32

Gonakhchyan, V. I. "Survey of polygonal surface simplification algorithms on GPU". Proceedings of the Institute for System Programming of RAS 26, no. 2 (2014): 159–74. http://dx.doi.org/10.15514/ispras-2014-26(2)-7.
33

Luong, The Van, Nouredine Melab, and El-Ghazali Talbi. "GPU Computing for Parallel Local Search Metaheuristic Algorithms". IEEE Transactions on Computers 62, no. 1 (January 2013): 173–85. http://dx.doi.org/10.1109/tc.2011.206.
34

Chakroun, I., and N. Melab. "Operator-level GPU-Accelerated Branch and Bound Algorithms". Procedia Computer Science 18 (2013): 280–89. http://dx.doi.org/10.1016/j.procs.2013.05.191.
35

Galiano, V., H. Migallón, V. Migallón, and J. Penadés. "GPU-based parallel algorithms for sparse nonlinear systems". Journal of Parallel and Distributed Computing 72, no. 9 (September 2012): 1098–105. http://dx.doi.org/10.1016/j.jpdc.2011.10.016.
36

Wang, Hongzhi, Zheng Wang, Ning Li, and Xinxin Kong. "Efficient OLAP algorithms on GPU-accelerated Hadoop clusters". Distributed and Parallel Databases 37, no. 4 (July 31, 2018): 507–42. http://dx.doi.org/10.1007/s10619-018-7239-z.
37

Ploskas, Nikolaos, and Nikolaos Samaras. "Efficient GPU-based implementations of simplex type algorithms". Applied Mathematics and Computation 250 (January 2015): 552–70. http://dx.doi.org/10.1016/j.amc.2014.10.096.
38

Cazalas, Jonathan, and Ratan K. Guha. "Performance Modeling of Spatio-Temporal Algorithms Over GEDS Framework". International Journal of Grid and High Performance Computing 4, no. 3 (July 2012): 63–84. http://dx.doi.org/10.4018/jghpc.2012070104.

Abstract:
The efficient processing of spatio-temporal data streams is an area of intense research. However, all methods rely on an unsuitable processor (Govindaraju, 2004), namely a CPU, to evaluate concurrent, continuous spatio-temporal queries over these data streams. This paper presents a performance model of the execution of spatio-temporal queries over the authors’ GEDS framework (Cazalas & Guha, 2010). GEDS is a scalable, Graphics Processing Unit (GPU)-based framework, employing computation sharing and parallel processing paradigms to deliver scalability in the evaluation of continuous, spatio-temporal queries over spatio temporal data streams. Experimental evaluation shows the scalability and efficacy of GEDS in spatio-temporal data streaming environments and demonstrates that, despite the costs associated with memory transfers, the parallel processing power provided by GEDS clearly counters and outweighs any associated costs. To move beyond the analysis of specific algorithms over the GEDS framework, the authors developed an abstract performance model, detailing the relationship of the CPU and the GPU. From this model, they are able to extrapolate a list of attributes common to successful GPU-based applications, thereby providing insight into which algorithms and applications are best suited for the GPU and also providing an estimated theoretical speedup for said GPU-based applications.
39

WEIGEL, MARTIN. "SIMULATING SPIN MODELS ON GPU: A TOUR". International Journal of Modern Physics C 23, no. 08 (August 2012): 1240002. http://dx.doi.org/10.1142/s0129183112400025.

Abstract:
The use of graphics processing units (GPUs) in scientific computing has gathered considerable momentum in the past five years. While GPUs in general promise high performance and excellent performance-per-Watt ratios, not every class of problems is equally well suited for exploiting the massively parallel architecture they provide. Lattice spin models appear to be prototypic examples of problems suited to this architecture, at least as long as local update algorithms are employed. In this review, I summarize our recent experience with the simulation of a wide range of spin models on GPU employing an equally wide range of update algorithms, ranging from Metropolis and heat bath updates, over cluster algorithms, to generalized ensemble simulations.
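A checkerboard Metropolis sweep for the 2D Ising model shows why local update algorithms suit the GPU (a numpy sketch under assumed parameters, not the author's code):

```python
import numpy as np

def metropolis_sweep(spins, beta, rng):
    """One checkerboard Metropolis sweep of a 2D Ising model with periodic
    boundaries. Sites of one parity have no same-parity neighbours, so
    each half-sweep may update ~N/2 sites simultaneously -- the locality
    that makes local update algorithms a good fit for the GPU."""
    ii, jj = np.indices(spins.shape)
    for parity in (0, 1):
        mask = (ii + jj) % 2 == parity
        # sum of the four nearest neighbours via periodic shifts
        nn = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0) +
              np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
        dE = 2.0 * spins * nn  # energy cost of flipping each site
        accept = rng.random(spins.shape) < np.exp(-beta * dE)
        spins = np.where(mask & accept, -spins, spins)
    return spins

rng = np.random.default_rng(1)
lattice = rng.choice([-1, 1], size=(32, 32))
for _ in range(50):  # quench at beta = 1.0, well below the critical point
    lattice = metropolis_sweep(lattice, beta=1.0, rng=rng)
```

Cluster and generalized-ensemble algorithms, also covered in the review, are harder to parallelize because their updates are non-local.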
40

Gajic, Dusan, and Radomir Stankovic. "GPU accelerated computation of fast spectral transforms". Facta universitatis - series: Electronics and Energetics 24, no. 3 (2011): 483–99. http://dx.doi.org/10.2298/fuee1103483g.

Abstract:
This paper discusses techniques for accelerated computation of several fast spectral transforms on graphics processing units (GPUs) using the Open Computing Language (OpenCL). We present a reformulation of the fast algorithms which takes into account peculiar properties of the transforms to make them suitable for GPU implementation. Special attention is paid to the organization of computations, memory transfer reductions, the impact of integer and Boolean arithmetic, different structures of the algorithms, etc. Performance of the GPU implementations is compared with classical C/C++ implementations for the central processing unit (CPU). Experiments confirm that, even though the spectral transforms considered involve only simple arithmetic, significant speedups are achieved by implementing the algorithms in OpenCL and executing them on the GPU.
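One transform in this family, the Walsh–Hadamard transform, has a fast algorithm built purely from additions and subtractions (a serial Python sketch of the butterfly structure, not the paper's OpenCL implementation):

```python
import numpy as np

def fwht(x):
    """Iterative fast Walsh-Hadamard transform: log2(n) stages of
    butterflies using only additions and subtractions -- the 'simple
    arithmetic' the abstract refers to. Length must be a power of two."""
    a = np.array(x, dtype=float)
    n = a.size
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            top = a[start:start + h].copy()
            bot = a[start + h:start + 2 * h].copy()
            a[start:start + h] = top + bot          # butterfly: sum
            a[start + h:start + 2 * h] = top - bot  # butterfly: difference
        h *= 2
    return a
```

The transform is its own inverse up to a factor of n, which makes a convenient correctness check; on the GPU, the butterflies within each stage run in parallel.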
41

Ozeritskiy, A. V. "Computational simulation using particles on GPU and GLSL language". Numerical Methods and Programming (Vychislitel'nye Metody i Programmirovanie), no. 1 (January 19, 2023): 37–54. http://dx.doi.org/10.26089/nummet.v24r104.

Abstract:
Simulation of the gravitational N-body problem using the PM and P3M algorithms is considered. The GPU implementation of the algorithms uses compute shaders. The proposed approach uses CPU code only for synchronizing and launching the shaders and contains no computational parts executed on the CPU; in particular, there is no data copying between the CPU and GPU. A parallel algorithm for placing particles into grid cells and a parallel algorithm for distributing masses over grid nodes are presented. Both are based on the parallel construction of lists corresponding to grid cells, are fully parallel, and contain no single-threaded parts. To compute simultaneously with visualization, part of the calculations is performed in a vertex shader; this was made possible by using buffer objects in the vertex shader and specially prepared data instead of vertices as input. Numerical results are presented for the formation of galaxy clusters in a flat universe expanding according to the Friedmann model; the model universe is a cube with periodic boundary conditions along all axes. The maximum number of particles used in the calculations is 10^8. The modern cross-platform Vulkan API and the GLSL language were used for the simulation. Results obtained on Apple M1 and Ryzen 3700X processors are compared with results on consumer graphics hardware (Apple M1 and NVIDIA RTX 3060); the parallel CPU algorithm is implemented with OpenMP. The results and running times are compared qualitatively and quantitatively with those of other authors, and the running time of the GPU program is compared with that of a similar program for a multi-node cluster.
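The per-cell particle lists used by the PM/P3M steps can be built without any sequential pass by a counting-sort pattern: count particles per cell, prefix-sum the counts, then group particle indices by cell. A small CPU sketch of that idea (NumPy, 2D grid for brevity; the names and the argsort-based grouping are illustrative, not the paper's shader code):

```python
import numpy as np

def bin_particles(pos, grid_n, box):
    """Group particle indices by grid cell (counting-sort layout):
    counts[c] -- number of particles in cell c,
    starts[c] -- offset of cell c in the flat 'order' array,
    order     -- particle indices grouped by cell, stably ordered."""
    cell = np.floor(pos / box * grid_n).astype(np.int64) % grid_n  # periodic box
    flat = cell[:, 0] * grid_n + cell[:, 1]                 # flatten 2D cell index
    counts = np.bincount(flat, minlength=grid_n * grid_n)
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))  # exclusive prefix sum
    order = np.argsort(flat, kind="stable")
    return counts, starts, order

counts, starts, order = bin_particles(
    np.array([[0.1, 0.1], [0.9, 0.9], [0.15, 0.05]]), grid_n=2, box=1.0)
# Particles in cell c are order[starts[c] : starts[c] + counts[c]].
```

On the GPU the same three phases (count, scan, scatter) are each a parallel pass, which is what allows the list construction to avoid any single-threaded section.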
42

Chen, Ying, Jin-xian Lin, and Tun Lu. "Implementation of LU decomposition and Laplace algorithms on GPU". Journal of Computer Applications 31, no. 3 (May 18, 2011): 851–55. http://dx.doi.org/10.3724/sp.j.1087.2011.00851.

43

Date, Ketan, and Rakesh Nagi. "GPU-accelerated Hungarian algorithms for the Linear Assignment Problem". Parallel Computing 57 (September 2016): 52–72. http://dx.doi.org/10.1016/j.parco.2016.05.012.

44

Castaño-Díez, Daniel, Dominik Moser, Andreas Schoenegger, Sabine Pruggnaller, and Achilleas S. Frangakis. "Performance evaluation of image processing algorithms on the GPU". Journal of Structural Biology 164, no. 1 (October 2008): 153–60. http://dx.doi.org/10.1016/j.jsb.2008.07.006.

45

Lobeiras, Jacobo, Margarita Amor, and Ramon Doallo. "Designing Efficient Index-Digit Algorithms for CUDA GPU Architectures". IEEE Transactions on Parallel and Distributed Systems 27, no. 5 (May 1, 2016): 1331–43. http://dx.doi.org/10.1109/tpds.2015.2450718.

46

Wang, Hongjian, Naiyu Zhang, Jean-Charles Créput, Yassine Ruichek, and Julien Moreau. "Massively parallel GPU computing for fast stereo correspondence algorithms". Journal of Systems Architecture 65 (April 2016): 46–58. http://dx.doi.org/10.1016/j.sysarc.2016.03.002.

47

Cheng, John Runwei, and Mitsuo Gen. "Accelerating genetic algorithms with GPU computing: A selective overview". Computers & Industrial Engineering 128 (February 2019): 514–25. http://dx.doi.org/10.1016/j.cie.2018.12.067.

48

Krüger, Jens, and Rüdiger Westermann. "Linear algebra operators for GPU implementation of numerical algorithms". ACM Transactions on Graphics 22, no. 3 (July 2003): 908–16. http://dx.doi.org/10.1145/882262.882363.

49

Agarwal, Nipun, Aman Goyal, Gaurav Maheshwari, and Alok Dugtal. "Parallel Implementation of Scheduling Algorithms on GPU using CUDA". International Journal of Computer Applications 127, no. 2 (October 15, 2015): 44–49. http://dx.doi.org/10.5120/ijca2015906339.

50

Ben Boudaoud, Lynda, Basel Solaiman, and Abdelkamel Tari. "Implementation and comparison of binary thinning algorithms on GPU". Computing 101, no. 8 (August 16, 2018): 1091–117. http://dx.doi.org/10.1007/s00607-018-0653-2.
