Academic literature on the topic 'Optimisations for GPU'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Optimisations for GPU.'

Journal articles on the topic "Optimisations for GPU"

1

Amadio, G., J. Apostolakis, P. Buncic, G. Cosmo, D. Dosaru, A. Gheata, S. Hageboeck, et al. "Offloading electromagnetic shower transport to GPUs." Journal of Physics: Conference Series 2438, no. 1 (February 1, 2023): 012055. http://dx.doi.org/10.1088/1742-6596/2438/1/012055.

Abstract:
Making general particle transport simulation for high-energy physics (HEP) single-instruction-multiple-thread (SIMT) friendly, to take advantage of accelerator hardware, is an important alternative for boosting the throughput of simulation applications. To date, this challenge is not yet resolved, due to difficulties in mapping the complexity of Geant4 components and workflow to the massive parallelism features exposed by graphics processing units (GPUs). The AdePT project is one of the R&D initiatives tackling this limitation and exploring GPUs as potential accelerators for offloading some part of the CPU simulation workload. Our main target is to implement a complete electromagnetic shower demonstrator working on the GPU. The project is the first to create a full prototype of a realistic electron, positron, and gamma electromagnetic shower simulation on GPU, implemented as either a standalone application or as an extension of the standard Geant4 CPU workflow. Our prototype currently provides a platform to explore many optimisations and different approaches. We present the most recent results and initial conclusions of our work, using both a standalone GPU performance analysis and a first implementation of a hybrid workflow based on Geant4 on the CPU and AdePT on the GPU.
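For readers new to the SIMT vocabulary used above, the toy kernel below shows the simplest one-thread-per-track mapping that such transport codes start from. Every name in it (Track, transport_step, the crude energy-loss model) is invented for illustration and has nothing to do with the actual AdePT code; divergent physics per thread is precisely what makes the real problem hard to keep SIMT-friendly.

```cuda
#include <cuda_runtime.h>

// Toy track state; a real transport engine carries far more fields.
struct Track {
    float x, y, z;      // position
    float dx, dy, dz;   // normalised direction
    float energy;       // kinetic energy
};

// One thread advances one track by a fixed step and deposits energy.
__global__ void transport_step(Track* tracks, float* edep, int n, float step)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    Track t = tracks[i];
    t.x += t.dx * step;
    t.y += t.dy * step;
    t.z += t.dz * step;
    float loss = 0.01f * t.energy;   // crude continuous energy loss
    t.energy -= loss;
    atomicAdd(edep, loss);           // accumulate deposited energy
    tracks[i] = t;
}

int main()
{
    const int n = 1 << 20;
    Track* d_tracks; float* d_edep;
    cudaMalloc(&d_tracks, n * sizeof(Track));
    cudaMalloc(&d_edep, sizeof(float));
    cudaMemset(d_edep, 0, sizeof(float));
    // ... initialise the tracks on the host and copy them over ...
    transport_step<<<(n + 255) / 256, 256>>>(d_tracks, d_edep, n, 0.1f);
    cudaDeviceSynchronize();
    cudaFree(d_tracks); cudaFree(d_edep);
    return 0;
}
```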
2

Yao, Shujun, Shuo Zhang, and Wanhua Guo. "Electromagnetic transient parallel simulation optimisation based on GPU." Journal of Engineering 2019, no. 16 (March 1, 2019): 1737–42. http://dx.doi.org/10.1049/joe.2018.8587.

3

Ebrahim, Abdulla, Andrea Bocci, Wael Elmedany, and Hesham Al-Ammal. "Optimising the Configuration of the CMS GPU Reconstruction." EPJ Web of Conferences 295 (2024): 11015. http://dx.doi.org/10.1051/epjconf/202429511015.

Abstract:
Particle track reconstruction for high energy physics experiments like CMS is computationally demanding but can benefit from GPU acceleration if properly tuned. This work develops an autotuning framework to automatically optimise the throughput of GPU-accelerated CUDA kernels in CMSSW. The proposed system navigates the complex parameter space by generating configurations, benchmarking performance, and leveraging multi-fidelity optimisation from simplified applications. The autotuned launch parameters improved CMSSW tracking throughput over the default settings by finding optimised, GPU-specific configurations. The successful application of autotuning to CMSSW demonstrates both performance portability across diverse accelerators and the potential of the methodology to optimise other HEP codebases.
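The autotuning loop described here can be illustrated outside CMSSW in a few lines of CUDA: benchmark a kernel over a set of candidate launch configurations and keep the fastest. The saxpy kernel and the candidate list below are placeholders, not the paper's kernels or search strategy.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void saxpy(float a, const float* x, float* y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 24;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0); cudaEventCreate(&t1);

    int candidates[] = {64, 128, 256, 512, 1024};
    int best_block = 0; float best_ms = 1e30f;

    for (int block : candidates) {
        int grid = (n + block - 1) / block;
        saxpy<<<grid, block>>>(2.0f, x, y, n);    // warm-up launch
        cudaEventRecord(t0);
        for (int rep = 0; rep < 10; ++rep)        // average over repetitions
            saxpy<<<grid, block>>>(2.0f, x, y, n);
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms; cudaEventElapsedTime(&ms, t0, t1);
        printf("block=%4d  %.3f ms\n", block, ms);
        if (ms < best_ms) { best_ms = ms; best_block = block; }
    }
    printf("best block size: %d\n", best_block);
    cudaFree(x); cudaFree(y);
    return 0;
}
```

The paper's framework layers multi-fidelity optimisation on top of this basic measure-and-compare loop instead of exhaustively sweeping the space.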
4

Quan, H., Z. Cui, R. Wang, and Zongjie Cao. "GPU parallel implementation and optimisation of SAR target recognition method." Journal of Engineering 2019, no. 21 (November 1, 2019): 8129–33. http://dx.doi.org/10.1049/joe.2019.0669.

5

Träff, Erik A., Anton Rydahl, Sven Karlsson, Ole Sigmund, and Niels Aage. "Simple and efficient GPU accelerated topology optimisation: Codes and applications." Computer Methods in Applied Mechanics and Engineering 410 (May 2023): 116043. http://dx.doi.org/10.1016/j.cma.2023.116043.

6

Szénási, Sándor. "Solving the inverse heat conduction problem using NVLink capable Power architecture." PeerJ Computer Science 3 (November 20, 2017): e138. http://dx.doi.org/10.7717/peerj-cs.138.

Abstract:
The accurate knowledge of Heat Transfer Coefficients is essential for the design of precise heat transfer operations. The determination of these values requires Inverse Heat Transfer Calculations, which are usually based on heuristic optimisation techniques, like Genetic Algorithms or Particle Swarm Optimisation. The main bottleneck of these heuristics is the high computational demand of the cost function calculation, which is usually based on heat transfer simulations producing the thermal history of the workpiece at given locations. This Direct Heat Transfer Calculation is a highly parallelisable process, making it feasible to implement an efficient GPU kernel for this purpose. This paper presents a novel step forward: based on the special requirements of the heuristics solving the inverse problem (executing hundreds of simulations in a parallel fashion at the end of each iteration), it is possible to gain a higher level of parallelism using multiple graphics accelerators. The results show that this implementation (running on 4 GPUs) is about 120 times faster than a traditional CPU implementation using 20 cores. The latest developments in the area of GPU-based high-performance computing were also analysed, such as the new NVLink connection between the host and the devices, which aims to remove the long-standing data-transfer bottleneck of GPU programming.
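The multi-GPU scheme the abstract outlines (hundreds of independent cost-function simulations per heuristic iteration) maps naturally onto a round-robin distribution of candidates across devices. A minimal sketch follows; the simulate kernel is a dummy stand-in for a direct heat-transfer simulation.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Placeholder for one direct heat-transfer simulation evaluating
// candidate j; a real kernel would integrate the thermal history.
__global__ void simulate(const float* params, float* cost, int j)
{
    if (threadIdx.x == 0 && blockIdx.x == 0)
        cost[j] = params[j] * params[j];   // dummy cost
}

int main()
{
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    const int ncand = 256;                 // candidates per heuristic iteration

    std::vector<float*> d_params(ndev), d_cost(ndev);
    for (int d = 0; d < ndev; ++d) {
        cudaSetDevice(d);
        cudaMalloc(&d_params[d], ncand * sizeof(float));
        cudaMalloc(&d_cost[d],   ncand * sizeof(float));
        // ... copy the candidate parameters to every device ...
    }

    // Round-robin: candidate j runs on device j % ndev. The devices work
    // concurrently because kernel launches are asynchronous.
    for (int j = 0; j < ncand; ++j) {
        int d = j % ndev;
        cudaSetDevice(d);
        simulate<<<1, 32>>>(d_params[d], d_cost[d], j);
    }
    for (int d = 0; d < ndev; ++d) { cudaSetDevice(d); cudaDeviceSynchronize(); }
    // ... gather costs back and let the heuristic build the next generation ...
    return 0;
}
```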
7

Bitam, Salim, NourEddine Djedi, and Maroua Grid. "GPU-based distributed bee swarm optimisation for dynamic vehicle routing problem." International Journal of Ad Hoc and Ubiquitous Computing 31, no. 3 (2019): 155. http://dx.doi.org/10.1504/ijahuc.2019.10022343.

8

Khemiri, Randa, Hassan Kibeya, Fatma Ezahra Sayadi, Nejmeddine Bahri, Mohamed Atri, and Nouri Masmoudi. "Optimisation of HEVC motion estimation exploiting SAD and SSD GPU-based implementation." IET Image Processing 12, no. 2 (February 1, 2018): 243–53. http://dx.doi.org/10.1049/iet-ipr.2017.0474.

9

Uchida, Akihiro, Yasuaki Ito, and Koji Nakano. "Accelerating ant colony optimisation for the travelling salesman problem on the GPU." International Journal of Parallel, Emergent and Distributed Systems 29, no. 4 (October 8, 2013): 401–20. http://dx.doi.org/10.1080/17445760.2013.842568.

10

Spalding, Myles, Anthony Walsh, and Trent Aland. "Evaluation of a new GPU-enabled VMAT multi-criteria optimisation plan generation algorithm." Medical Dosimetry 45, no. 4 (2020): 368–73. http://dx.doi.org/10.1016/j.meddos.2020.05.007.


Dissertations / Theses on the topic "Optimisations for GPU"

1

Romera, Thomas. "Adéquation algorithme architecture pour flot optique sur GPU embarqué." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS450.

Abstract:
This thesis focuses on the optimization and efficient implementation of pixel motion (optical flow) estimation algorithms on embedded graphics processing units (GPUs). Two iterative algorithms have been studied: the Total Variation - L1 (TV-L1) method and the Horn-Schunck method. The primary objective of this work is to achieve real-time processing, with a target frame processing time of less than 40 milliseconds, on low-power platforms, while maintaining acceptable image resolution and flow estimation quality for the intended applications. Various levels of optimization strategies have been explored. High-level algorithmic transformations, such as operator fusion and operator pipelining, have been implemented to maximize data reuse and enhance spatial/temporal locality. Additionally, GPU-specific low-level optimizations, including the use of vector instructions and vector number formats, as well as efficient memory access management, have been incorporated. The impact of floating-point number representation (single precision versus half precision) has also been investigated. The implementations have been assessed on Nvidia's Jetson Xavier, TX2, and Nano embedded platforms in terms of execution time, power consumption, and optical flow accuracy. Notably, the TV-L1 method exhibits higher complexity and computational intensity than Horn-Schunck. The fastest versions of these algorithms achieve a processing time of 0.21 nanoseconds per pixel per iteration in half precision on the Xavier platform, representing a 22x reduction in execution time over efficient and parallel CPU versions. Furthermore, energy consumption is reduced by a factor of 5.3. Among the tested boards, the Xavier embedded platform, being both the most powerful and the most recent, consistently delivers the best results in terms of speed and energy efficiency. Operator fusion and pipelining have proven to be instrumental in improving GPU performance by enhancing data reuse. This data reuse is made possible through GPU shared memory, a small, high-speed memory that enables data sharing among threads within the same GPU thread block. While fusing multiple iterations yields performance gains, it is constrained by the size of the shared memory, necessitating trade-offs between resource utilization and speed. The adoption of half-precision numbers accelerates iterative algorithms and achieves superior optical flow accuracy within the same time frame compared to single-precision counterparts. Half-precision implementations converge more rapidly due to the increased number of iterations possible within a given time window. Specifically, the use of half-precision numbers on the best GPU architecture accelerates execution by up to 2.2x for TV-L1 and 3.7x for Horn-Schunck. This work underscores the significance of GPU-specific optimizations for computer vision algorithms, as well as the use and study of reduced-precision floating-point numbers. It paves the way for future enhancements through new algorithmic transformations, alternative numerical formats, and new hardware architectures. This approach can potentially be extended to other families of iterative algorithms.
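To make the two key ingredients above concrete (operator fusion backed by shared memory, and half-precision arithmetic), here is a minimal CUDA sketch of two fused smoothing passes over a 1D field. It only illustrates the mechanism; it is not the thesis's TV-L1 or Horn-Schunck code, and it assumes the problem size is a multiple of the block size.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Two fused 3-point smoothing passes in half precision. The intermediate
// result stays in shared memory and is never written to global memory
// between the two "operators". Assumes n is a multiple of blockDim.x.
__global__ void fused_smooth(const __half* in, __half* out, int n)
{
    extern __shared__ __half tile[];                 // blockDim.x + 2 halo cells
    int g = blockIdx.x * blockDim.x + threadIdx.x;   // global index
    int s = threadIdx.x + 1;                         // shared index (halo offset)

    tile[s] = in[g];
    if (threadIdx.x == 0)
        tile[0] = (g > 0) ? in[g - 1] : in[g];
    if (threadIdx.x == blockDim.x - 1)
        tile[s + 1] = (g < n - 1) ? in[g + 1] : in[g];
    __syncthreads();

    // Pass 1: 3-point average computed in float, stored back as half.
    float v = (__half2float(tile[s - 1]) + __half2float(tile[s]) +
               __half2float(tile[s + 1])) * (1.0f / 3.0f);
    __syncthreads();                 // all pass-0 reads done before overwrite
    tile[s] = __float2half(v);
    __syncthreads();

    // Pass 2, fused in the same kernel. The halo cells still hold pass-0
    // values, so fusion is approximate at block borders: exactly the
    // shared-memory trade-off the thesis discusses.
    float w = (__half2float(tile[s - 1]) + __half2float(tile[s]) +
               __half2float(tile[s + 1])) * (1.0f / 3.0f);
    out[g] = __float2half(w);
}

int main()
{
    const int n = 1 << 20, block = 256;   // n must be a multiple of block
    __half *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(__half));
    cudaMalloc(&d_out, n * sizeof(__half));
    fused_smooth<<<n / block, block, (block + 2) * sizeof(__half)>>>(d_in, d_out, n);
    cudaDeviceSynchronize();
    cudaFree(d_in); cudaFree(d_out);
    return 0;
}
```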
2

Fumero, Alfonso Juan José. "Accelerating interpreted programming languages on GPUs with just-in-time compilation and runtime optimisations." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/28718.

Abstract:
Nowadays, most computer systems are equipped with powerful parallel devices such as Graphics Processing Units (GPUs). They are present in almost every computer system, including mobile devices, tablets, desktop computers and servers. These parallel systems have unlocked the possibility for many scientists and companies to process significant amounts of data in a shorter time. But the usage of these parallel systems is very challenging due to their programming complexity. The most common programming languages for GPUs, such as OpenCL and CUDA, are created for expert programmers, and developers are required to know hardware details to use GPUs. However, many users of heterogeneous and parallel hardware, such as economists, biologists, physicists or psychologists, are not necessarily expert GPU programmers. They need to speed up their applications, which are often written in high-level and dynamic programming languages, such as Java, R or Python. Little work has been done to generate GPU code automatically from these high-level interpreted and dynamic programming languages. This thesis presents a combination of a programming interface and a set of compiler techniques which enable an automatic translation of a subset of Java and R programs into OpenCL to execute on a GPU. The goal is to reduce the programmability and usability gaps between interpreted programming languages and GPUs. The first contribution is an Application Programming Interface (API) for programming heterogeneous and multi-core systems. This API combines ideas from functional programming and algorithmic skeletons to compose and reuse parallel operations. The second contribution is a new OpenCL Just-In-Time (JIT) compiler that automatically translates a subset of the Java bytecode to GPU code. This is combined with a new runtime system that optimises data management and avoids data transformations between Java and OpenCL. This OpenCL framework and the runtime system achieve speedups of up to 645x compared to Java, with at most a 23% slowdown compared to handwritten native OpenCL code. The third contribution is a new OpenCL JIT compiler for dynamic and interpreted programming languages. While the R language is used in this thesis, the developed techniques are generic for dynamic languages. This JIT compiler uniquely combines a set of existing compiler techniques, such as specialisation and partial evaluation, for OpenCL compilation, together with an optimising runtime that compiles and executes R code on GPUs. This JIT compiler for the R language achieves speedups of up to 1300x compared to GNU-R, with a 1.8x slowdown compared to native OpenCL.
3

Hopson, Benjamin Thomas Ken. "Techniques of design optimisation for algorithms implemented in software." Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/20435.

Abstract:
The overarching objective of this thesis was to develop tools for parallelising, optimising, and implementing algorithms on parallel architectures, in particular General Purpose Graphics Processors (GPGPUs). Two projects were chosen from different application areas in which GPGPUs are used: a defence application involving image compression, and a modelling application in bioinformatics (computational immunology). Each project had its own specific objectives, as well as supporting the overall research goal. The defence / image compression project was carried out in collaboration with the Jet Propulsion Laboratories. The specific questions were: to what extent an algorithm designed for bit-serial hardware for the lossless compression of hyperspectral images on board unmanned aerial vehicles (UAVs) could be parallelised, whether GPGPUs could be used to implement that algorithm, and whether a software implementation with or without GPGPU acceleration could match the throughput of a dedicated hardware (FPGA) implementation. The dependencies within the algorithm were analysed, and the algorithm parallelised. The algorithm was implemented in software for GPGPU, and optimised. During the optimisation process, profiling revealed less than optimal device utilisation, but no further optimisations resulted in an improvement in speed: the design had hit a local maximum of performance. Analysis of the arithmetic intensity and data flow exposed flaws in kernel occupancy, the standard metric used for GPU optimisation. Redesigning the implementation with revised criteria (fused kernels, lower occupancy, and greater data locality) led to a new implementation with 10x higher throughput. GPGPUs were shown to be viable for on-board implementation of the CCSDS lossless hyperspectral image compression algorithm, exceeding the performance of the hardware reference implementation and providing sufficient throughput for the next generation of image sensor as well. The second project was carried out in collaboration with biologists at the University of Arizona and involved modelling a complex biological system: VDJ recombination, involved in the formation of T-cell receptors (TCRs). Generation of immune receptors (T-cell receptors and antibodies) by VDJ recombination is an enormously complex process, which can theoretically synthesise more than 10^18 variants. Originally thought to be a random process, the underlying mechanisms clearly have a non-random nature that preferentially creates a small subset of immune receptors in many individuals. Understanding this bias is a longstanding problem in the field of immunology. Modelling the process of VDJ recombination to determine the number of ways each immune receptor can be synthesised, previously thought to be untenable, is a key first step in determining how this special population is made. The computational tools developed in this thesis have allowed immunologists for the first time to comprehensively test and invalidate a longstanding theory (convergent recombination) for how this special population is created, while generating the data needed to develop novel hypotheses.
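The occupancy metric the thesis found misleading can be queried directly from the CUDA runtime. The sketch below, with an invented toy kernel, shows how theoretical occupancy is typically measured before deciding, as this work did, that lower occupancy with better data locality can win.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void toy_kernel(float* data)
{
    data[blockIdx.x * blockDim.x + threadIdx.x] *= 2.0f;
}

int main()
{
    int block = 256;
    int max_blocks_per_sm = 0;
    // Ask the runtime how many blocks of this kernel fit on one SM,
    // given the kernel's register and shared-memory usage.
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &max_blocks_per_sm, toy_kernel, block, /*dynamicSmem=*/0);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    float occupancy = (float)(max_blocks_per_sm * block) /
                      (float)prop.maxThreadsPerMultiProcessor;
    printf("theoretical occupancy at block=%d: %.0f%%\n",
           block, occupancy * 100.0f);
    return 0;
}
```

The thesis's point is that maximising this number is not the same as maximising throughput: a fused, lower-occupancy kernel with better locality beat the "optimal" configuration by 10x.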
4

Luong, Thé Van. "Métaheuristiques parallèles sur GPU." Thesis, Lille 1, 2011. http://www.theses.fr/2011LIL10058/document.

Abstract:
Real-world optimization problems are often complex and NP-hard. Their modeling is continuously evolving in terms of constraints and objectives, and their resolution is CPU time-consuming. Although near-optimal algorithms such as metaheuristics (generic heuristics) make it possible to reduce the temporal complexity of their resolution, they fail to tackle large problems satisfactorily. Over the last decades, parallel computing has emerged as an unavoidable way to deal with large instances of difficult optimization problems. The design and implementation of parallel metaheuristics are strongly influenced by the computing platform. Nowadays, GPU computing has proved effective for dealing with time-intensive problems. This emerging technology is believed to be extremely useful for speeding up many complex algorithms. One of the major issues for metaheuristics is to rethink existing parallel models and programming paradigms to allow their deployment on GPU accelerators. Generally speaking, the major issues to deal with are: the distribution of data processing between CPU and GPU, thread synchronization, the optimization of data transfer between the different memories, memory capacity constraints, etc. The contribution of this thesis is to address these issues in the redesign of parallel models of metaheuristics, to allow large-scale optimization problems to be solved on GPU architectures. Our objective is to rethink the existing parallel models and to enable their deployment on GPUs. Thereby, we propose in this document a new generic guideline for building efficient parallel metaheuristics on GPU. Our challenge is to come up with a GPU-based design of the whole hierarchy of parallel models. To this end, very efficient approaches are proposed for CPU-GPU data transfer optimization, thread control, mapping of solutions to GPU threads, and memory management. These approaches have been experimented exhaustively using five optimization problems and four GPU configurations. Compared to a CPU-based execution, experiments report up to 80-fold acceleration for large combinatorial problems and up to 2000-fold speed-up for a continuous problem. The different works related to this thesis have been accepted in a dozen publications, including the journal IEEE Transactions on Computers.
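A recurring building block in this line of work is mapping one candidate solution (or one neighbour of the current solution) to one GPU thread, and moving only the best result back to the CPU. A minimal sketch, with an invented quadratic cost function standing in for a real objective:

```cuda
#include <cuda_runtime.h>

// Each thread evaluates one candidate solution of dimension `dim`.
// In a GPU local search, `solutions` would hold the neighbourhood of
// the current solution, generated on the device to avoid transfers.
__global__ void evaluate(const float* solutions, float* fitness,
                         int nsol, int dim)
{
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s >= nsol) return;
    float f = 0.0f;
    for (int k = 0; k < dim; ++k) {       // toy sphere function
        float v = solutions[s * dim + k];
        f += v * v;
    }
    fitness[s] = f;
}

int main()
{
    const int nsol = 4096, dim = 64;
    float *d_sol, *d_fit;
    cudaMalloc(&d_sol, nsol * dim * sizeof(float));
    cudaMalloc(&d_fit, nsol * sizeof(float));
    // ... generate the neighbourhood on the device ...
    evaluate<<<(nsol + 127) / 128, 128>>>(d_sol, d_fit, nsol, dim);
    cudaDeviceSynchronize();
    // Only the best fitness and its index need to travel back to the
    // CPU, which is the transfer optimisation the thesis emphasises.
    cudaFree(d_sol); cudaFree(d_fit);
    return 0;
}
```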
5

Chrétien, Benjamin. "Optimisation semi-infinie sur GPU pour le contrôle corps-complet de robots." Thesis, Montpellier, 2016. http://www.theses.fr/2016MONTT315/document.

Abstract:
A humanoid robot is a complex system with numerous degrees of freedom, whose behavior is subject to the nonlinear equations of motion. As a result, planning its motion is a difficult task from a computational perspective. In this thesis, we aim at developing a method that can leverage the computing power of GPUs in the context of optimization-based whole-body motion planning. We first exhibit the properties of the optimization problem, and show that several avenues can be exploited in the context of parallel computing. Then, we present our approach to the dynamics computation, suitable for highly parallel processing architectures. Next, we propose a many-core GPU implementation of the motion planning problem. Our approach computes the constraints and their gradients in parallel, and feeds the result to a nonlinear optimization solver running on the CPU. Because each constraint and its gradient can be evaluated independently for each time interval, we end up with a highly parallelizable problem that can take advantage of GPUs. We also propose a new parametrization of contact forces adapted to our optimization problem. Finally, we investigate the extension of our work to model predictive control.
6

Van, Luong Thé. "Métaheuristiques parallèles sur GPU." Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2011. http://tel.archives-ouvertes.fr/tel-00638820.

Abstract:
Real-world optimization problems are often complex and NP-hard. Their modeling is continuously evolving in terms of constraints and objectives, and their resolution is CPU time-consuming. Although near-optimal algorithms such as metaheuristics (generic heuristics) make it possible to reduce the temporal complexity of their resolution, they fail to tackle large problems satisfactorily. Over the last decades, parallel computing has emerged as an unavoidable way to deal with large instances of difficult optimization problems. The design and implementation of parallel metaheuristics are strongly influenced by the computing platform. Nowadays, GPU computing has proved effective for dealing with time-intensive problems. This emerging technology is believed to be extremely useful for speeding up many complex algorithms. One of the major issues for metaheuristics is to rethink existing parallel models and programming paradigms to allow their deployment on GPU accelerators. Generally speaking, the major issues to deal with are: the distribution of data processing between CPU and GPU, thread synchronization, the optimization of data transfer between the different memories, memory capacity constraints, etc. The contribution of this thesis is to address these issues in the redesign of parallel models of metaheuristics, to allow large-scale optimization problems to be solved on GPU architectures. Our objective is to rethink the existing parallel models and to enable their deployment on GPUs. Thereby, we propose in this document a new generic guideline for building efficient parallel metaheuristics on GPU. Our challenge is to come up with a GPU-based design of the whole hierarchy of parallel models. To this end, very efficient approaches are proposed for CPU-GPU data transfer optimization, thread control, mapping of solutions to GPU threads, and memory management. These approaches have been experimented exhaustively using five optimization problems and four GPU configurations. Compared to a CPU-based execution, experiments report up to 80-fold acceleration for large combinatorial problems and up to 2000-fold speed-up for a continuous problem. The different works related to this thesis have been accepted in a dozen publications, including the journal IEEE Transactions on Computers.
7

Delevacq, Audrey. "Métaheuristiques pour l'optimisation combinatoire sur processeurs graphiques (GPU)." Thesis, Reims, 2013. http://www.theses.fr/2013REIMS011/document.

Abstract:
Several combinatorial optimization problems are NP-hard and can only be solved optimally by exact algorithms for small instances. Metaheuristics have proved to be effective in solving many of these problems by finding approximate solutions in a reasonable time. However, when dealing with large instances, they may require considerable computation time and memory space to explore the search space efficiently. Therefore, the interest devoted to their deployment on high-performance computing architectures has increased over the past years. Existing parallelization approaches generally follow the message-passing and shared-memory computing paradigms, which are suitable for traditional architectures based on microprocessors, also called CPUs (Central Processing Units). However, research in the field of parallel computing is rapidly evolving and new architectures are emerging, including hardware accelerators that offload the CPU of some of its tasks. Among them, graphics processors or GPUs (Graphics Processing Units) have a massively parallel architecture with great potential, but also imply new algorithmic and programming challenges. In fact, existing parallelization models of metaheuristics are generally unsuited to computing environments like GPUs. A few works have tackled this subject, but without providing a comprehensive and fundamental view of it. The general purpose of this thesis is to propose a framework for the effective implementation of metaheuristics on parallel architectures based on GPUs. It begins with a state of the art describing existing works on GPU parallelization of metaheuristics and general classifications of parallel metaheuristics. An original taxonomy is then designed to classify the identified implementations and to formalize GPU parallelization strategies in a coherent methodological framework. This thesis also aims to validate this taxonomy by exploiting its main components to propose original parallelization strategies specifically tailored to GPU architectures. Several effective implementations based on the Ant Colony Optimization and Iterated Local Search metaheuristics are thus proposed for solving the Travelling Salesman Problem. A structured and thorough experimental study is conducted to evaluate and compare the performance of the approaches in terms of both solution quality and computing-time reduction.
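Among the components of Ant Colony Optimisation, the pheromone update parallelises particularly cleanly. The sketch below (one thread per city pair for evaporation, one thread per ant plus atomics for deposit) is a generic illustration, not one of the thesis's specific strategies.

```cuda
#include <cuda_runtime.h>

// Evaporate all pheromone trails: one thread per city pair.
__global__ void evaporate(float* tau, int ncities, float rho)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < ncities * ncities) tau[i] *= (1.0f - rho);
}

// Deposit pheromone along each ant's tour: one thread per ant.
// Atomics resolve the case of two ants sharing an edge.
__global__ void deposit(float* tau, const int* tours,
                        const float* tour_len, int nants, int ncities)
{
    int a = blockIdx.x * blockDim.x + threadIdx.x;
    if (a >= nants) return;
    float d = 1.0f / tour_len[a];
    for (int k = 0; k < ncities; ++k) {
        int from = tours[a * ncities + k];
        int to   = tours[a * ncities + (k + 1) % ncities];
        atomicAdd(&tau[from * ncities + to], d);
        atomicAdd(&tau[to * ncities + from], d);   // symmetric TSP
    }
}

int main()
{
    const int ncities = 512, nants = 256;
    float *tau, *len; int* tours;
    cudaMalloc(&tau, ncities * ncities * sizeof(float));
    cudaMalloc(&len, nants * sizeof(float));
    cudaMalloc(&tours, nants * ncities * sizeof(int));
    // ... a tour-construction kernel would fill `tours` and `len` ...
    evaporate<<<(ncities * ncities + 255) / 256, 256>>>(tau, ncities, 0.5f);
    deposit<<<(nants + 63) / 64, 64>>>(tau, tours, len, nants, ncities);
    cudaDeviceSynchronize();
    return 0;
}
```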
8

Quinto, Michele Arcangelo. "Méthode de reconstruction adaptive en tomographie par rayons X : optimisation sur architectures parallèles de type GPU." Thesis, Grenoble, 2013. http://www.theses.fr/2013GRENT109/document.

Abstract:
Tomographic reconstruction from projection data is an inverse problem widely used in the medical imaging field. With a sufficiently large number of projections over the required angle, FBP (filtered backprojection) algorithms allow fast and accurate reconstructions. However, in the cases of limited views (low-dose imaging) and/or limited angle (specific constraints of the setup), the data available for inversion are not complete, the problem becomes more ill-conditioned, and the results show significant artifacts. In these situations, an alternative approach to reconstruction, based on a discrete model of the problem, consists in using an iterative algorithm or a statistical modelisation of the problem to compute an estimate of the unknown object. These methods are classically based on a discretization of the volume into a set of voxels and provide 3D maps of densities. Computation time and memory storage are their main disadvantages. Moreover, whatever the application, the volumes are then segmented for quantitative analysis. Numerous methods of segmentation, with different interpretations of the contours and various minimized energy functionals, are available, and the results can depend on their use. This thesis presents a novel approach in which tomographic reconstruction is performed simultaneously with the segmentation of the different materials of the object. The reconstruction process is no longer based on a regular grid of pixels (resp. voxels) but on a mesh composed of non-regular triangles (resp. tetrahedra) adapted to the shape of the studied object. After an initialization step, the method runs through three main steps (reconstruction, segmentation and adaptation of the mesh) that iteratively alternate until convergence. Iterative reconstruction algorithms commonly used with a conventional image representation have been adapted and optimized to run on irregular grids of triangular or tetrahedral elements. For segmentation, two methods, one based on a parametric approach (snake) and the other on a geometric approach (level set), have been implemented to handle mono- and multi-material objects. The adaptation of the mesh to the content of the estimated image is based on the previously segmented contours, refining the mesh around the details of the object and making it coarser in zones containing little information. At the end of the process, the result is a classical grey-level tomographic image whose representation by a mesh adapted to its content directly provides a corresponding segmentation. The results show that the adaptive part of the method represents objects efficiently and drastically decreases the memory needed for storage. In this context, a 2D version of the reconstruction operators on a GPU-type parallel architecture demonstrates the feasibility of the full process, and an optimized version of the 3D operators provides even more efficient computations.
9

O'Connell, Jonathan F. "A dynamic programming model to solve optimisation problems using GPUs." Thesis, Cardiff University, 2017. http://orca.cf.ac.uk/97930/.

Abstract:
This thesis presents a parallel, dynamic programming based model which is deployed on the GPU of a system to accelerate the solving of optimisation problems. This is achieved by simultaneously running GPU-based computations and memory transactions, so that computation never pauses, overcoming the memory constraints of solving large problem instances. As a result, some optimisation problems that are currently not solved in an exact manner for real-world-sized instances, due to their complexity, are moved into the solvable realm. The model is implemented to solve a range of different test problems, where artificially constructed test data is used to ensure good performance even in the worst cases. Through this extensive testing, we can be confident the model will perform well when used to solve real-world test cases. Testing of the model was carried out using a range of different implementation parameters relating to deployment on the GPU, in order to identify both optimal implementation parameters and how the model will operate when running on different systems. All problems, when implemented in parallel using the model, show run-time improvements compared to the sequential implementations, in some instances up to hundreds of times faster, but more importantly they also show high efficiency metrics for the utilisation of GPU resources. Throughout testing, emphasis has been placed on GPU-based metrics to ensure the wider generic applicability of the model. Finally, the parallel model allows new problems to be defined through the use of a simple file format, enabling wider usage of the model.
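The claim that computation never pauses rests on overlapping kernels with memory transactions; the standard CUDA idiom is double buffering across streams with pinned host memory, sketched here on a dummy kernel rather than the thesis's dynamic-programming stages.

```cuda
#include <cuda_runtime.h>

__global__ void process(float* chunk, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) chunk[i] += 1.0f;   // stand-in for one DP stage
}

int main()
{
    const int nchunks = 8, chunk = 1 << 20;
    float* h_data;                 // pinned memory enables async copies
    cudaMallocHost(&h_data, (size_t)nchunks * chunk * sizeof(float));

    cudaStream_t streams[2];
    float* d_buf[2];
    for (int s = 0; s < 2; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&d_buf[s], chunk * sizeof(float));
    }

    // Double buffering: while one stream computes chunk k, the other is
    // already moving chunk k+1, so the GPU never idles on transfers.
    for (int k = 0; k < nchunks; ++k) {
        int s = k % 2;
        cudaMemcpyAsync(d_buf[s], h_data + (size_t)k * chunk,
                        chunk * sizeof(float), cudaMemcpyHostToDevice, streams[s]);
        process<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d_buf[s], chunk);
        cudaMemcpyAsync(h_data + (size_t)k * chunk, d_buf[s],
                        chunk * sizeof(float), cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();
    cudaFreeHost(h_data);
    for (int s = 0; s < 2; ++s) { cudaFree(d_buf[s]); cudaStreamDestroy(streams[s]); }
    return 0;
}
```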
10

Pospíchal, Petr. "Akcelerace genetického algoritmu s využitím GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236783.

Abstract:
This master's thesis focuses on the acceleration of genetic algorithms using GPUs. The first chapter analyses genetic algorithms in depth, covering corresponding topics such as population, chromosome, crossover, mutation and selection. The next part of the thesis surveys GPU capabilities for general-purpose computing, using both DirectX/OpenGL with Cg and specialised GPGPU libraries like CUDA. The fourth chapter focuses on the design of a GPU implementation using CUDA; coarse-grained and fine-grained GAs are discussed, complemented by GPU-accelerated sorting and random-number generation. The next chapter covers implementation details: migration, crossover and selection schemes mapped onto the CUDA software model. All GA elements and the quality of the GPU results are described in the last chapter.
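To illustrate the fine-grained mapping such a design uses, here is a toy kernel in which each thread owns one individual and mutates it with a device-side cuRAND generator; the binary representation and parameters are invented for the example.

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

// One thread owns one individual; each gene flips with probability pm.
__global__ void mutate(unsigned char* pop, int npop, int genes,
                       float pm, unsigned long long seed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= npop) return;
    curandState st;
    curand_init(seed, i, 0, &st);     // independent RNG stream per thread
    for (int g = 0; g < genes; ++g)
        if (curand_uniform(&st) < pm)
            pop[i * genes + g] ^= 1;  // bit-flip mutation
}

int main()
{
    const int npop = 1024, genes = 128;
    unsigned char* d_pop;
    cudaMalloc(&d_pop, npop * genes);
    cudaMemset(d_pop, 0, npop * genes);
    mutate<<<(npop + 127) / 128, 128>>>(d_pop, npop, genes, 0.01f, 1234ULL);
    cudaDeviceSynchronize();
    cudaFree(d_pop);
    return 0;
}
```

A production GA would persist the curandState objects between generations instead of re-seeding in every kernel call.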

Book chapters on the topic "Optimisations for GPU"

1

Nagy, Szilárd, Károly Jármai, and Attila Baksa. "Combination of GPU Programming and FEM Analysis in Structural Optimisation." In Vehicle and Automotive Engineering 4, 756–67. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-15211-5_63.

2

Prata, Paula, Paulo Fazendeiro, Pedro Sequeira, and Chandrashekhar Padole. "A Comment on Bio-inspired Optimisation via GPU Architecture: The Genetic Algorithm Workload." In Swarm, Evolutionary, and Memetic Computing, 670–78. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-35380-2_78.

3

González-Arribas, Daniel, Manuel Sanjurjo-Rivo, and Manuel Soler. "Multiobjective Optimisation of Aircraft Trajectories Under Wind Uncertainty Using GPU Parallelism and Genetic Algorithms." In Computational Methods in Applied Sciences, 453–66. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-89890-2_29.

4

Youcef, Bouras. "Research Information." In Advanced Deep Learning Applications in Big Data Analytics, 218–72. IGI Global, 2021. http://dx.doi.org/10.4018/978-1-7998-2791-7.ch011.

Abstract:
This chapter describes the framework of an analytical study of computational intelligence algorithms, which are inspired by natural mechanisms and complex biological phenomena. These algorithms are numerous and can be classified into two large families: first, the family of evolutionary algorithms (EA), such as genetic algorithms (GAs), genetic programming (GP), evolution strategies (ES), differential evolution (DE), and the paddy field algorithm (PFA); second, the swarm intelligence algorithms (SIA), such as particle swarm optimisation (PSO), ant colony optimization (ACO), bacteria foraging optimisation (BFO), the wolf colony algorithm (WCA), the fireworks algorithm (FA), the bat algorithm (BA), the cockroaches algorithm (CA), the social spiders algorithm (SSA), the cuckoo search algorithm (CSA), wasp swarm optimisation (WSO), and the mosquito optimisation algorithm (MOA). The authors detail the functioning of each algorithm following a structured organization (the descent of the algorithm, the inspiration source, the summary, and the general process) that offers readers a thorough understanding. This study is the fruit of many years of research, presented as a synthesis that groups the contributions offered by several researchers in the metaheuristics field. It can be a starting point for planning and modelling new algorithms or improving existing ones.
5

Soares, Adroaldo Santos, Lilian Lefol Nani Guarieiro, Oberdan Rocha Pinheiro, Marcelo Albano Moret Simões Gonçalves, Fabio de Sousa Santos, and Fernando Luiz Pellegrini Pessoa. "Metamodeling of the deposition process in oil pre-processing to optimise the cleaning of the heat exchanger network: A systematic review." In Themes focused on interdisciplinarity and sustainable development worldwide V. 02. Seven Editora, 2024. http://dx.doi.org/10.56238/sevened2024.003-009.

Abstract:
Identifying and analysing possible metamodelling techniques to optimise the performance of heat exchangers in oil pre-processing, from the point of view of the deposition process, is of great importance for evaluating the performance of heat exchangers in different operating and maintenance configurations in order to increase their energy efficiency, since during the operation of heat exchanger networks, deposition on the heat-exchange surfaces is common and reduces their effectiveness. In this article, a systematic review was carried out to study the metamodelling techniques and optimisation tools used. The results of the study showed that several techniques are in use, such as Recurrent Neural Networks (RNN), the Multi-Layer Perceptron (MLP), Long Short-Term Memory (LSTM), the Gated Recurrent Unit (GRU), and the Recurrent Convolutional Neural Network (RCNN), along with tools that are covered in this study.
6

Bhalla, Ishan, and Kamlesh Chaudhary. "Applying Service Oriented Architecture and Cloud Computing for a Greener Traffic Management." In Green Technologies, 678–93. IGI Global, 2011. http://dx.doi.org/10.4018/978-1-60960-472-1.ch406.

Abstract:
Traffic Management System (TMS) is a possible implementation of a Green IT application, as it can have a direct impact on reducing greenhouse gases. The focus of this report is to illustrate how event-driven SOA design principles can be applied in designing a traffic management system. It also discusses how the cloud computing concept can be used for a TMS application. Traffic during peak hours is a problem in any major city where population growth far exceeds the infrastructure. Frequent stopping and starting of cars on heavily trafficked roads and slow-moving traffic cause greater fuel consumption, which results in greater emissions of carbon gases. If an efficient traffic management system can raise the average traffic speed, it will help reduce carbon emissions. As WiMAX technology reaches maturity and achieves greater reliability and speed for wireless data transmission, new mobile applications become possible; a traffic management system is one such example. WiMAX can facilitate communication to and from fast-moving cars, and WiMAX combined with GPS (Global Positioning System) technology can facilitate building an efficient traffic management system. The authors also discuss various scenarios where cloud computing technology can be utilised, resulting in further optimisation of computing resources and therefore reduced carbon emissions.
7

Bhalla, Ishan, and Kamlesh Chaudhary. "Applying Service Oriented Architecture and Cloud Computing for a Greener Traffic Management." In Handbook of Research on Green ICT, 332–47. IGI Global, 2011. http://dx.doi.org/10.4018/978-1-61692-834-6.ch023.

Abstract:
Traffic Management System (TMS) is a possible implementation of a Green IT application, as it can have a direct impact on reducing greenhouse gases. The focus of this report is to illustrate how event-driven SOA design principles can be applied in designing a traffic management system. It also discusses how the cloud computing concept can be used for a TMS application. Traffic during peak hours is a problem in any major city where population growth far exceeds the infrastructure. Frequent stopping and starting of cars on heavily trafficked roads and slow-moving traffic cause greater fuel consumption, which results in greater emissions of carbon gases. If an efficient traffic management system can raise the average traffic speed, it will help reduce carbon emissions. As WiMAX technology reaches maturity and achieves greater reliability and speed for wireless data transmission, new mobile applications become possible; a traffic management system is one such example. WiMAX can facilitate communication to and from fast-moving cars, and WiMAX combined with GPS (Global Positioning System) technology can facilitate building an efficient traffic management system. The authors also discuss various scenarios where cloud computing technology can be utilised, resulting in further optimisation of computing resources and therefore reduced carbon emissions.
8

Ciampoli, L. Bianchini, F. D’Amico, A. Calvi, F. Benedetto, and F. Tosti. "Signal processing for optimisation of low-powered GPR data with application in transportation engineering (roads and railways)." In Bearing Capacity of Roads, Railways and Airfields, 1553–57. CRC Press, 2017. http://dx.doi.org/10.1201/9781315100333-206.


Conference papers on the topic "Optimisations for GPU"

1

Lemos, Dayllon V. X., Humberto J. Longo, Wellington S. Martins, and Les R. Foulds. "A GPU-based DP algorithm for solving multiple instances of the knapsack problem." In Simpósio em Sistemas Computacionais de Alto Desempenho. Sociedade Brasileira de Computação, 2023. http://dx.doi.org/10.5753/wscad.2023.235875.

Abstract:
The knapsack problem is a classic and fundamental optimisation problem that serves as a subproblem in various optimisation algorithms. It is therefore important to be able to solve several instances of the knapsack problem quickly and efficiently. In this work we present a parallel algorithm, based on dynamic programming, that can take advantage of parallelism as more knapsacks need to be solved. The algorithm makes use of fine-grained data parallelism and is easily mapped to GPU accelerators. Extensive experiments with diverse datasets demonstrate the superiority of the proposed algorithm, achieving significant speedups compared to a serial algorithm.
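The fine-grained data parallelism comes from the classic 0/1 knapsack recurrence dp_i[c] = max(dp_{i-1}[c], dp_{i-1}[c - w_i] + v_i): within one item row, all capacities are independent and can each be handled by one thread, while rows remain sequential. A minimal single-knapsack sketch follows; the paper batches many knapsacks, which would add another grid dimension.

```cuda
#include <cuda_runtime.h>
#include <algorithm>

// One DP row: thread c computes dp_new[c] from the previous row only,
// so all capacities 0..C can run in parallel.
__global__ void dp_row(const int* dp_old, int* dp_new,
                       int capacity, int w, int v)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c > capacity) return;
    int best = dp_old[c];
    if (c >= w) best = max(best, dp_old[c - w] + v);
    dp_new[c] = best;
}

int main()
{
    const int capacity = 10000, nitems = 4;
    int w[nitems] = {12, 7, 11, 8}, v[nitems] = {4, 10, 7, 3};

    int *d_a, *d_b;
    cudaMalloc(&d_a, (capacity + 1) * sizeof(int));
    cudaMalloc(&d_b, (capacity + 1) * sizeof(int));
    cudaMemset(d_a, 0, (capacity + 1) * sizeof(int));

    int threads = 256, blocks = (capacity + 1 + threads - 1) / threads;
    for (int i = 0; i < nitems; ++i) {    // rows stay sequential
        dp_row<<<blocks, threads>>>(d_a, d_b, capacity, w[i], v[i]);
        std::swap(d_a, d_b);              // ping-pong the two row buffers
    }
    int result;
    cudaMemcpy(&result, d_a + capacity, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_a); cudaFree(d_b);
    return 0;
}
```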
2

Paukste, Andrius. "Monte Carlo optimisation auto-tuning on a multi-GPU cluster." In 2012 2nd IEEE International Conference on Parallel, Distributed and Grid Computing (PDGC). IEEE, 2012. http://dx.doi.org/10.1109/pdgc.2012.6449942.

3

Wainwright, Thomas R., Daniel J. Poole, and Christian B. Allen. "GPU-accelerated aerodynamic shape optimisation framework for large turbine blades." In AIAA SCITECH 2022 Forum. Reston, Virginia: American Institute of Aeronautics and Astronautics, 2022. http://dx.doi.org/10.2514/6.2022-1292.

4

Maknickienė, Nijolė, Ieva Kekytė, and Algirdas Maknickas. "COMPUTATION INTELLIGENCE BASED DAILY ALGORITHMIC STRATEGIES FOR TRADING IN THE FOREIGN EXCHANGE MARKET." In Business and Management 2018. VGTU Technika, 2018. http://dx.doi.org/10.3846/bm.2018.53.

Abstract:
Successful trading in financial markets is not possible without a support system that manages data preparation, prediction and risk management, and evaluates trading efficiency. Selected orthogonal data was used to predict exchange rates by applying recurrent neural network (RNN) software based on the open-source framework Keras and the graphics processing unit (GPU) NVIDIA GTX1070 to accelerate RNN learning. The newly developed software on the GPU predicted ten high-low distributions in approximately 90 minutes. This paper compares different daily algorithmic trading strategies based on four methods of portfolio creation: split equally, optimisation, orthogonality, and maximal expectations. Each investigated portfolio has opportunities and limitations dependent on the market state and the behaviour of investors. The efficiency of the trading support systems for investors in the foreign exchange market was tested in a demo FOREX market in real time and compared with similar results obtained for risk-free rates.
5

Li, Da, Hancheng Wu, and Michela Becchi. "Exploiting Dynamic Parallelism to Efficiently Support Irregular Nested Loops on GPUs." In COSMIC '15: International Workshop on Code Optimisation for Multi and Many Cores. New York, NY, USA: ACM, 2015. http://dx.doi.org/10.1145/2723772.2723780.

6

Jaros, Jiri, Jan Marek, and Pavel Mensik. "Optimisation of Water Management Systems Using a GPU-Accelerated Differential Evolution." In 2015 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2015. http://dx.doi.org/10.1109/ssci.2015.266.

7

Thomas, B., A. El Ouardi, S. Bouaziz, R. Le Goff Latimier, and H. Ben Ahmed. "GPU Optimisation of an Endogenous Peer-to-Peer Market with Product Differentiation." In 2023 IEEE Belgrade PowerTech. IEEE, 2023. http://dx.doi.org/10.1109/powertech55446.2023.10202823.

8

Cecilia, J. M., J. M. Garcia, M. Ujaldon, A. Nisbet, and M. Amos. "Parallelization strategies for ant colony optimisation on GPUs." In Distributed Processing, Workshops and Phd Forum. IEEE, 2011. http://dx.doi.org/10.1109/ipdps.2011.170.

9

Nagabandi, K., S. Mills, X. Zhang, D. J. J. Toal, and A. J. Keane. "Surrogate Based Design Optimisation of Combustor Tile Cooling Feed Holes." In ASME 2017 Gas Turbine India Conference. American Society of Mechanical Engineers, 2017. http://dx.doi.org/10.1115/gtindia2017-4586.

Abstract:
Gas turbine operating temperatures are projected to continue to increase, and this leads to drawing more cooling air to keep the metals below their operational temperatures. This cooling air is chargeable, as it has gone through several stages of compressor work. In this paper, a surrogate-based design optimisation approach is used to reduce cooling mass flow on combustor tiles while attaining pre-defined maximum metal surface temperatures dictated by different service life requirements. A series of Kriging-based surrogate models are constructed using an efficient GPU-based particle swarm algorithm. Various mechanical and manufacturing constraints, such as hole ligament size and the encroachment of holes onto other features like side rails, pedestals, dilution ports and retention pins, are built into the models, and these models are trained using a number of high-fidelity simulations. Furthermore, these simulations employ the proprietary Rolls-Royce Finite Element Analysis (FEA) package SCO3 to run thermal analyses predicting surface heat transfer coefficients, fluid temperatures and, finally, metal surface temperatures. These temperature predictions are compared against the pre-defined surface temperature limits for a given service life and fed back to the surrogate model to run a new hole configuration. The loop continues in this way until an optimised hole configuration is attained. Results demonstrate the potential of this optimisation technique to improve the life of the combustor tile by reducing tile temperature and also to reduce the amount of cooling air required.
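The GPU-based particle swarm step used to drive the surrogate can be pictured as one thread per particle-dimension entry updating velocity and position. The constants and structure below are generic PSO, not the authors' implementation, and the surrogate evaluation is left as a placeholder.

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Standard PSO update, one thread per (particle, dimension) entry.
__global__ void pso_update(float* x, float* vel, const float* pbest,
                           const float* gbest, int nparticles, int dim,
                           unsigned long long seed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nparticles * dim) return;
    int d = i % dim;

    curandState st;
    curand_init(seed, i, 0, &st);    // a real solver would persist states
    float r1 = curand_uniform(&st), r2 = curand_uniform(&st);

    const float wgt = 0.72f, c1 = 1.49f, c2 = 1.49f;  // common PSO constants
    vel[i] = wgt * vel[i]
           + c1 * r1 * (pbest[i] - x[i])
           + c2 * r2 * (gbest[d] - x[i]);
    x[i] += vel[i];
}

int main()
{
    const int np = 512, dim = 16, n = np * dim;
    float *x, *v, *pb, *gb;
    cudaMalloc(&x, n * sizeof(float));  cudaMalloc(&v, n * sizeof(float));
    cudaMalloc(&pb, n * sizeof(float)); cudaMalloc(&gb, dim * sizeof(float));
    // ... evaluate the Kriging surrogate here to refresh pbest/gbest ...
    pso_update<<<(n + 255) / 256, 256>>>(x, v, pb, gb, np, dim, 42ULL);
    cudaDeviceSynchronize();
    return 0;
}
```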
10

Dybedal, Joacim, and Geir Hovland. "GPU-Based Optimisation of 3D Sensor Placement Considering Redundancy, Range and Field of View." In 2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA). IEEE, 2020. http://dx.doi.org/10.1109/iciea48937.2020.9248170.
