Academic literature on the topic 'Optimisations for GPU'
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Optimisations for GPU.'
Journal articles on the topic "Optimisations for GPU"
Amadio, G., J. Apostolakis, P. Buncic, G. Cosmo, D. Dosaru, A. Gheata, S. Hageboeck, et al. "Offloading electromagnetic shower transport to GPUs." Journal of Physics: Conference Series 2438, no. 1 (February 1, 2023): 012055. http://dx.doi.org/10.1088/1742-6596/2438/1/012055.
Yao, Shujun, Shuo Zhang, and Wanhua Guo. "Electromagnetic transient parallel simulation optimisation based on GPU." Journal of Engineering 2019, no. 16 (March 1, 2019): 1737–42. http://dx.doi.org/10.1049/joe.2018.8587.
Ebrahim, Abdulla, Andrea Bocci, Wael Elmedany, and Hesham Al-Ammal. "Optimising the Configuration of the CMS GPU Reconstruction." EPJ Web of Conferences 295 (2024): 11015. http://dx.doi.org/10.1051/epjconf/202429511015.
Quan, H., Z. Cui, R. Wang, and Zongjie Cao. "GPU parallel implementation and optimisation of SAR target recognition method." Journal of Engineering 2019, no. 21 (November 1, 2019): 8129–33. http://dx.doi.org/10.1049/joe.2019.0669.
Träff, Erik A., Anton Rydahl, Sven Karlsson, Ole Sigmund, and Niels Aage. "Simple and efficient GPU accelerated topology optimisation: Codes and applications." Computer Methods in Applied Mechanics and Engineering 410 (May 2023): 116043. http://dx.doi.org/10.1016/j.cma.2023.116043.
Szénási, Sándor. "Solving the inverse heat conduction problem using NVLink capable Power architecture." PeerJ Computer Science 3 (November 20, 2017): e138. http://dx.doi.org/10.7717/peerj-cs.138.
Bitam, Salim, NourEddine Djedi, and Maroua Grid. "GPU-based distributed bee swarm optimisation for dynamic vehicle routing problem." International Journal of Ad Hoc and Ubiquitous Computing 31, no. 3 (2019): 155. http://dx.doi.org/10.1504/ijahuc.2019.10022343.
Khemiri, Randa, Hassan Kibeya, Fatma Ezahra Sayadi, Nejmeddine Bahri, Mohamed Atri, and Nouri Masmoudi. "Optimisation of HEVC motion estimation exploiting SAD and SSD GPU-based implementation." IET Image Processing 12, no. 2 (February 1, 2018): 243–53. http://dx.doi.org/10.1049/iet-ipr.2017.0474.
Uchida, Akihiro, Yasuaki Ito, and Koji Nakano. "Accelerating ant colony optimisation for the travelling salesman problem on the GPU." International Journal of Parallel, Emergent and Distributed Systems 29, no. 4 (October 8, 2013): 401–20. http://dx.doi.org/10.1080/17445760.2013.842568.
Spalding, Myles, Anthony Walsh, and Trent Aland. "Evaluation of a new GPU-enabled VMAT multi-criteria optimisation plan generation algorithm." Medical Dosimetry 45, no. 4 (2020): 368–73. http://dx.doi.org/10.1016/j.meddos.2020.05.007.
Dissertations / Theses on the topic "Optimisations for GPU"
Romera, Thomas. "Adéquation algorithme architecture pour flot optique sur GPU embarqué." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS450.
This thesis focuses on the optimization and efficient implementation of pixel motion (optical flow) estimation algorithms on embedded graphics processing units (GPUs). Two iterative algorithms have been studied: the Total Variation - L1 (TV-L1) method and the Horn-Schunck method. The primary objective of this work is to achieve real-time processing, with a target frame processing time of less than 40 milliseconds, on low-power platforms, while maintaining acceptable image resolution and flow estimation quality for the intended applications. Various levels of optimization strategies have been explored. High-level algorithmic transformations, such as operator fusion and operator pipelining, have been implemented to maximize data reuse and enhance spatial/temporal locality. Additionally, GPU-specific low-level optimizations, including the utilization of vector instructions and numbers, as well as efficient memory access management, have been incorporated. The impact of floating-point number representation (single-precision versus half-precision) has also been investigated. The implementations have been assessed on Nvidia's Jetson Xavier, TX2, and Nano embedded platforms in terms of execution time, power consumption, and optical flow accuracy. Notably, the TV-L1 method exhibits higher complexity and computational intensity compared to Horn-Schunck. The fastest versions of these algorithms achieve a processing rate of 0.21 nanoseconds per pixel per iteration in half-precision on the Xavier platform, representing a 22x time reduction over efficient and parallel CPU versions. Furthermore, energy consumption is reduced by a factor of 5.3. Among the tested boards, the Xavier embedded platform, being both the most powerful and the most recent, consistently delivers the best results in terms of speed and energy efficiency. Operator merging and pipelining have proven to be instrumental in improving GPU performance by enhancing data reuse.
This data reuse is made possible through GPU shared memory, a small, high-speed memory that enables data sharing among threads within the same GPU thread block. While merging multiple iterations yields performance gains, it is constrained by the size of the shared memory, necessitating trade-offs between resource utilization and speed. The adoption of half-precision numbers accelerates iterative algorithms and achieves superior optical flow accuracy within the same time frame compared to single-precision counterparts. Half-precision implementations converge more rapidly due to the increased number of iterations possible within a given time window. Specifically, the use of half-precision numbers on the best GPU architecture accelerates execution by up to 2.2x for TV-L1 and 3.7x for Horn-Schunck. This work underscores the significance of both GPU-specific optimizations for computer vision algorithms and the use and study of reduced-precision floating-point numbers. They pave the way for future enhancements through new algorithmic transformations, alternative numerical formats, and hardware architectures. This approach can potentially be extended to other families of iterative algorithms.
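As a rough illustration of what the figures quoted in this abstract imply, the sketch below turns the 40 ms frame budget and the 0.21 ns per pixel per iteration rate into an upper bound on solver iterations per frame. The two frame sizes and the function name `max_iterations` are assumptions chosen for illustration; they do not come from the thesis. The bound also assumes the whole budget is spent on iterations, which ignores data transfers and other overheads.

```python
# Rough iteration budget implied by the figures quoted in the abstract.
# Assumptions (not from the abstract): the two frame sizes below, and that
# the whole 40 ms budget is spent on solver iterations (no transfers,
# kernel-launch overhead, or pre/post-processing).

FRAME_BUDGET_PS = 40 * 10**9     # 40 ms expressed in picoseconds
COST_PS_PER_PIXEL_ITER = 210     # 0.21 ns per pixel per iteration


def max_iterations(width: int, height: int) -> int:
    """Upper bound on full-frame iterations that fit in one frame budget."""
    cost_per_iteration_ps = COST_PS_PER_PIXEL_ITER * width * height
    return FRAME_BUDGET_PS // cost_per_iteration_ps


for width, height in [(640, 480), (1920, 1080)]:
    print(f"{width}x{height}: at most {max_iterations(width, height)} iterations")
```

Under these assumptions a 640x480 frame leaves room for at most 620 iterations and a 1920x1080 frame for at most 91, which gives a sense of why the per-iteration cost matters for the real-time target. Integer picoseconds are used so the result does not depend on floating-point rounding.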
Fumero, Alfonso Juan José. "Accelerating interpreted programming languages on GPUs with just-in-time compilation and runtime optimisations." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/28718.
Hopson, Benjamin Thomas Ken. "Techniques of design optimisation for algorithms implemented in software." Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/20435.
Luong, Thé Van. "Métaheuristiques parallèles sur GPU." Thesis, Lille 1, 2011. http://www.theses.fr/2011LIL10058/document.
Real-world optimization problems are often complex and NP-hard. Their modeling is continuously evolving in terms of constraints and objectives, and their resolution is CPU time-consuming. Although near-optimal algorithms such as metaheuristics (generic heuristics) make it possible to reduce the temporal complexity of their resolution, they fail to tackle large problems satisfactorily. Over the last decades, parallel computing has emerged as an unavoidable way to deal with large instances of difficult optimization problems. The design and implementation of parallel metaheuristics are strongly influenced by the computing platform. GPU computing has recently proved effective for time-intensive problems. This emerging technology is believed to be extremely useful for speeding up many complex algorithms. One of the major issues for metaheuristics is to rethink existing parallel models and programming paradigms to allow their deployment on GPU accelerators. Generally speaking, the major issues we have to deal with are: the distribution of data processing between CPU and GPU, thread synchronization, the optimization of data transfer between the different memories, memory capacity constraints, etc. The contribution of this thesis is to deal with such issues for the redesign of parallel models of metaheuristics to allow large-scale optimization problems to be solved on GPU architectures. Our objective is to rethink the existing parallel models and to enable their deployment on GPUs. To that end, we propose in this document a new generic guideline for building efficient parallel metaheuristics on GPU. Our challenge is to come up with a GPU-based design of the whole hierarchy of parallel models. For this purpose, very efficient approaches are proposed for CPU-GPU data transfer optimization, thread control, mapping of solutions to GPU threads, and memory management.
These approaches have been tested extensively on five optimization problems and four GPU configurations. Compared to a CPU-based execution, experiments report up to 80-fold acceleration for large combinatorial problems and up to 2000-fold speed-up for a continuous problem. The work related to this thesis has been accepted in a dozen publications, including the IEEE Transactions on Computers journal.
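The abstract lists the mapping of solutions to GPU threads among the issues it addresses, without giving details. As a hypothetical illustration only (not taken from the thesis), the sketch below shows one simple way such a mapping can work for a pairwise-swap neighbourhood: a flat thread index is decoded into a pair (i, j) with i < j, so that each thread would evaluate exactly one candidate move. The names `index_to_pair` and `swap_neighbours` are invented for this example, and the Python loop merely stands in for GPU threads.

```python
from typing import List, Tuple


def index_to_pair(k: int, n: int) -> Tuple[int, int]:
    """Decode flat index k into the k-th pair (i, j), i < j, of n items."""
    if not 0 <= k < n * (n - 1) // 2:
        raise ValueError("index out of range")
    i = 0
    row_len = n - 1  # number of pairs whose first element is i
    while k >= row_len:
        k -= row_len
        i += 1
        row_len -= 1
    return i, i + 1 + k


def swap_neighbours(solution: List[int]) -> List[List[int]]:
    """Build every pairwise-swap neighbour of a solution.

    Each loop pass stands in for one GPU thread (hypothetical mapping).
    """
    n = len(solution)
    neighbours = []
    for thread_id in range(n * (n - 1) // 2):
        i, j = index_to_pair(thread_id, n)
        candidate = solution.copy()
        candidate[i], candidate[j] = candidate[j], candidate[i]
        neighbours.append(candidate)
    return neighbours


print([index_to_pair(k, 4) for k in range(6)])
```

For four items the six indices decode to (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), so every move is covered once and no two threads share a move. A real GPU kernel would typically prefer a closed-form decode over the loop used here; the loop is kept for clarity.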
Chrétien, Benjamin. "Optimisation semi-infinie sur GPU pour le contrôle corps-complet de robots." Thesis, Montpellier, 2016. http://www.theses.fr/2016MONTT315/document.
A humanoid robot is a complex system with numerous degrees of freedom, whose behavior is subject to the nonlinear equations of motion. As a result, planning its motion is a difficult task from a computational perspective. In this thesis, we aim at developing a method that can leverage the computing power of GPUs in the context of optimization-based whole-body motion planning. We first exhibit the properties of the optimization problem, and show that several avenues can be exploited in the context of parallel computing. Then, we present our approach to the dynamics computation, suitable for highly-parallel processing architectures. Next, we propose a many-core GPU implementation of the motion planning problem. Our approach computes the constraints and their gradients in parallel, and feeds the result to a nonlinear optimization solver running on the CPU. Because each constraint and its gradient can be evaluated independently for each time interval, we end up with a highly parallelizable problem that can take advantage of GPUs. We also propose a new parametrization of contact forces adapted to our optimization problem. Finally, we investigate the extension of our work to model predictive control.
Van, Luong Thé. "Métaheuristiques parallèles sur GPU." Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2011. http://tel.archives-ouvertes.fr/tel-00638820.
Delevacq, Audrey. "Métaheuristiques pour l'optimisation combinatoire sur processeurs graphiques (GPU)." Thesis, Reims, 2013. http://www.theses.fr/2013REIMS011/document.
Several combinatorial optimization problems are NP-hard and can only be solved optimally by exact algorithms for small instances. Metaheuristics have proved to be effective in solving many of these problems by finding approximate solutions in a reasonable time. However, when dealing with large instances, they may require considerable computation time and memory space to explore the search space efficiently. Therefore, the interest devoted to their deployment on high performance computing architectures has increased over the past years. Existing parallelization approaches generally follow the message-passing and shared-memory computing paradigms, which are suitable for traditional architectures based on microprocessors, also called CPUs (Central Processing Units). However, research in the field of parallel computing is rapidly evolving and new architectures are emerging, including hardware accelerators which offload some tasks from the CPU. Among them, graphics processors or GPUs (Graphics Processing Units) have a massively parallel architecture with great potential but also imply new algorithmic and programming challenges. In fact, existing parallelization models of metaheuristics are generally unsuited to computing environments like GPUs. Few works have tackled this subject, and none provides a comprehensive and fundamental view of it. The general purpose of this thesis is to propose a framework for the effective implementation of metaheuristics on parallel architectures based on GPUs. It begins with a state of the art describing existing works on GPU parallelization of metaheuristics and general classifications of parallel metaheuristics. An original taxonomy is then designed to classify identified implementations and to formalize GPU parallelization strategies in a coherent methodological framework.
This thesis also aims to validate this taxonomy by exploiting its main components to propose original parallelization strategies specifically tailored to GPU architectures. Several effective implementations based on Ant Colony Optimization and Iterated Local Search metaheuristics are thus proposed for solving the Travelling Salesman Problem. A structured and thorough experimental study is conducted to evaluate and compare the performance of the approaches on criteria related to solution quality and computing time reduction.
Quinto, Michele Arcangelo. "Méthode de reconstruction adaptive en tomographie par rayons X : optimisation sur architectures parallèles de type GPU." Thesis, Grenoble, 2013. http://www.theses.fr/2013GRENT109/document.
Tomographic reconstruction from projection data is an inverse problem widely used in the medical imaging field. With a sufficiently large number of projections over the required angle, FBP (filtered backprojection) algorithms allow fast and accurate reconstructions. However, in the cases of limited views (low-dose imaging) and/or limited angle (specific constraints of the setup), the data available for inversion are not complete, the problem becomes more ill-conditioned, and the results show significant artifacts. In these situations, an alternative approach to reconstruction, based on a discrete model of the problem, consists in using an iterative algorithm or a statistical model of the problem to compute an estimate of the unknown object. These methods are classically based on a discretization of the volume into a set of voxels and provide 3D maps of densities. Computation time and memory storage are their main disadvantages. Moreover, whatever the application, the volumes are segmented for quantitative analysis. Numerous segmentation methods, with different interpretations of the contours and various minimized energy functionals, are available, and the results can depend on which one is used. This thesis presents a novel approach that performs tomographic reconstruction simultaneously with segmentation of the different materials of the object. The reconstruction process is no longer based on a regular grid of pixels (resp. voxels) but on a mesh composed of irregular triangles (resp. tetrahedra) adapted to the shape of the studied object. After an initialization step, the method runs in three main steps: reconstruction, segmentation, and adaptation of the mesh, which alternate iteratively until convergence. Conventional iterative reconstruction algorithms have been adapted and optimized to run on irregular grids of triangular or tetrahedral elements.
For segmentation, two methods, one based on a parametric approach (snake) and the other on a geometric approach (level set), have been implemented to handle single-material and multi-material objects. The adaptation of the mesh to the content of the estimated image is based on the previously segmented contours, which makes the mesh progressively coarser from the edges to the limits of the reconstruction domain. At the end of the process, the result is a classical tomographic image in gray levels, but its representation by a mesh adapted to its content provides a corresponding segmentation. The results show that the method provides reliable reconstruction and drastically decreases memory storage. In this context, the projection operators have been implemented on a parallel GPU architecture. A first 2D version shows the feasibility of the full process, and an optimized version of the 3D operators provides more efficient computations.
O'Connell, Jonathan F. "A dynamic programming model to solve optimisation problems using GPUs." Thesis, Cardiff University, 2017. http://orca.cf.ac.uk/97930/.
Pospíchal, Petr. "Akcelerace genetického algoritmu s využitím GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236783.
Book chapters on the topic "Optimisations for GPU"
Nagy, Szilárd, Károly Jármai, and Attila Baksa. "Combination of GPU Programming and FEM Analysis in Structural Optimisation." In Vehicle and Automotive Engineering 4, 756–67. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-15211-5_63.
Prata, Paula, Paulo Fazendeiro, Pedro Sequeira, and Chandrashekhar Padole. "A Comment on Bio-inspired Optimisation via GPU Architecture: The Genetic Algorithm Workload." In Swarm, Evolutionary, and Memetic Computing, 670–78. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-35380-2_78.
González-Arribas, Daniel, Manuel Sanjurjo-Rivo, and Manuel Soler. "Multiobjective Optimisation of Aircraft Trajectories Under Wind Uncertainty Using GPU Parallelism and Genetic Algorithms." In Computational Methods in Applied Sciences, 453–66. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-89890-2_29.
Youcef, Bouras. "Research Information." In Advanced Deep Learning Applications in Big Data Analytics, 218–72. IGI Global, 2021. http://dx.doi.org/10.4018/978-1-7998-2791-7.ch011.
Soares, Adroaldo Santos, Lilian Lefol Nani Guarieiro, Oberdan Rocha Pinheiro, Marcelo Albano Moret Simões Gonçalves, Fabio de Sousa Santos, and Fernando Luiz Pellegrini Pessoa. "Metamodeling of the deposition process in oil pre-processing to optimise the cleaning of the heat exchanger network: A systematic review." In Themes focused on interdisciplinarity and sustainable development worldwide V. 02. Seven Editora, 2024. http://dx.doi.org/10.56238/sevened2024.003-009.
Bhalla, Ishan, and Kamlesh Chaudhary. "Applying Service Oriented Architecture and Cloud Computing for a Greener Traffic Management." In Green Technologies, 678–93. IGI Global, 2011. http://dx.doi.org/10.4018/978-1-60960-472-1.ch406.
Bhalla, Ishan, and Kamlesh Chaudhary. "Applying Service Oriented Architecture and Cloud Computing for a Greener Traffic Management." In Handbook of Research on Green ICT, 332–47. IGI Global, 2011. http://dx.doi.org/10.4018/978-1-61692-834-6.ch023.
Ciampoli, L. Bianchini, F. D’Amico, A. Calvi, F. Benedetto, and F. Tosti. "Signal processing for optimisation of low-powered GPR data with application in transportation engineering (roads and railways)." In Bearing Capacity of Roads, Railways and Airfields, 1553–57. CRC Press, 2017. http://dx.doi.org/10.1201/9781315100333-206.
Conference papers on the topic "Optimisations for GPU"
Lemos, Dayllon V. X., Humberto J. Longo, Wellington S. Martins, and Les R. Foulds. "A GPU-based DP algorithm for solving multiple instances of the knapsack problem." In Simpósio em Sistemas Computacionais de Alto Desempenho. Sociedade Brasileira de Computação, 2023. http://dx.doi.org/10.5753/wscad.2023.235875.
Paukste, Andrius. "Monte Carlo optimisation auto-tuning on a multi-GPU cluster." In 2012 2nd IEEE International Conference on Parallel, Distributed and Grid Computing (PDGC). IEEE, 2012. http://dx.doi.org/10.1109/pdgc.2012.6449942.
Wainwright, Thomas R., Daniel J. Poole, and Christian B. Allen. "GPU-accelerated aerodynamic shape optimisation framework for large turbine blades." In AIAA SCITECH 2022 Forum. Reston, Virginia: American Institute of Aeronautics and Astronautics, 2022. http://dx.doi.org/10.2514/6.2022-1292.
Maknickienė, Nijolė, Ieva Kekytė, and Algirdas Maknickas. "COMPUTATION INTELLIGENCE BASED DAILY ALGORITHMIC STRATEGIES FOR TRADING IN THE FOREIGN EXCHANGE MARKET." In Business and Management 2018. VGTU Technika, 2018. http://dx.doi.org/10.3846/bm.2018.53.
Li, Da, Hancheng Wu, and Michela Becchi. "Exploiting Dynamic Parallelism to Efficiently Support Irregular Nested Loops on GPUs." In COSMIC '15: International Workshop on Code Optimisation for Multi and Many Cores. New York, NY, USA: ACM, 2015. http://dx.doi.org/10.1145/2723772.2723780.
Jaros, Jiri, Jan Marek, and Pavel Mensik. "Optimisation of Water Management Systems Using a GPU-Accelerated Differential Evolution." In 2015 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2015. http://dx.doi.org/10.1109/ssci.2015.266.
Thomas, B., A. El Ouardi, S. Bouaziz, R. Le Goff Latimier, and H. Ben Ahmed. "GPU Optimisation of an Endogenous Peer-to-Peer Market with Product Differentiation." In 2023 IEEE Belgrade PowerTech. IEEE, 2023. http://dx.doi.org/10.1109/powertech55446.2023.10202823.
Cecilia, J. M., J. M. Garcia, M. Ujaldon, A. Nisbet, and M. Amos. "Parallelization strategies for ant colony optimisation on GPUs." In Distributed Processing, Workshops and Phd Forum. IEEE, 2011. http://dx.doi.org/10.1109/ipdps.2011.170.
Nagabandi, K., S. Mills, X. Zhang, D. J. J. Toal, and A. J. Keane. "Surrogate Based Design Optimisation of Combustor Tile Cooling Feed Holes." In ASME 2017 Gas Turbine India Conference. American Society of Mechanical Engineers, 2017. http://dx.doi.org/10.1115/gtindia2017-4586.
Dybedal, Joacim, and Geir Hovland. "GPU-Based Optimisation of 3D Sensor Placement Considering Redundancy, Range and Field of View." In 2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA). IEEE, 2020. http://dx.doi.org/10.1109/iciea48937.2020.9248170.