Academic literature on the topic "Optimisations for GPU"
Browse the topical lists of articles, books, theses, conference papers, and other scholarly sources on the topic "Optimisations for GPU".
Journal articles on the topic "Optimisations for GPU"
Amadio, G., J. Apostolakis, P. Buncic, G. Cosmo, D. Dosaru, A. Gheata, S. Hageboeck et al. "Offloading electromagnetic shower transport to GPUs". Journal of Physics: Conference Series 2438, no. 1 (February 1, 2023): 012055. http://dx.doi.org/10.1088/1742-6596/2438/1/012055.
Yao, Shujun, Shuo Zhang, and Wanhua Guo. "Electromagnetic transient parallel simulation optimisation based on GPU". Journal of Engineering 2019, no. 16 (March 1, 2019): 1737–42. http://dx.doi.org/10.1049/joe.2018.8587.
Ebrahim, Abdulla, Andrea Bocci, Wael Elmedany, and Hesham Al-Ammal. "Optimising the Configuration of the CMS GPU Reconstruction". EPJ Web of Conferences 295 (2024): 11015. http://dx.doi.org/10.1051/epjconf/202429511015.
Quan, H., Z. Cui, R. Wang, and Zongjie Cao. "GPU parallel implementation and optimisation of SAR target recognition method". Journal of Engineering 2019, no. 21 (November 1, 2019): 8129–33. http://dx.doi.org/10.1049/joe.2019.0669.
Träff, Erik A., Anton Rydahl, Sven Karlsson, Ole Sigmund, and Niels Aage. "Simple and efficient GPU accelerated topology optimisation: Codes and applications". Computer Methods in Applied Mechanics and Engineering 410 (May 2023): 116043. http://dx.doi.org/10.1016/j.cma.2023.116043.
Szénási, Sándor. "Solving the inverse heat conduction problem using NVLink capable Power architecture". PeerJ Computer Science 3 (November 20, 2017): e138. http://dx.doi.org/10.7717/peerj-cs.138.
Bitam, Salim, NourEddine Djedi, and Maroua Grid. "GPU-based distributed bee swarm optimisation for dynamic vehicle routing problem". International Journal of Ad Hoc and Ubiquitous Computing 31, no. 3 (2019): 155. http://dx.doi.org/10.1504/ijahuc.2019.10022343.
Khemiri, Randa, Hassan Kibeya, Fatma Ezahra Sayadi, Nejmeddine Bahri, Mohamed Atri, and Nouri Masmoudi. "Optimisation of HEVC motion estimation exploiting SAD and SSD GPU-based implementation". IET Image Processing 12, no. 2 (February 1, 2018): 243–53. http://dx.doi.org/10.1049/iet-ipr.2017.0474.
Uchida, Akihiro, Yasuaki Ito, and Koji Nakano. "Accelerating ant colony optimisation for the travelling salesman problem on the GPU". International Journal of Parallel, Emergent and Distributed Systems 29, no. 4 (October 8, 2013): 401–20. http://dx.doi.org/10.1080/17445760.2013.842568.
Spalding, Myles, Anthony Walsh, and Trent Aland. "Evaluation of a new GPU-enabled VMAT multi-criteria optimisation plan generation algorithm". Medical Dosimetry 45, no. 4 (2020): 368–73. http://dx.doi.org/10.1016/j.meddos.2020.05.007.
Theses on the topic "Optimisations for GPU"
Romera, Thomas. "Adéquation algorithme architecture pour flot optique sur GPU embarqué". Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS450.
This thesis focuses on the optimization and efficient implementation of pixel motion (optical flow) estimation algorithms on embedded graphics processing units (GPUs). Two iterative algorithms have been studied: the Total Variation - L1 (TV-L1) method and the Horn-Schunck method. The primary objective of this work is to achieve real-time processing, with a target frame processing time of less than 40 milliseconds, on low-power platforms, while maintaining acceptable image resolution and flow estimation quality for the intended applications. Various levels of optimization strategies have been explored. High-level algorithmic transformations, such as operator fusion and operator pipelining, have been implemented to maximize data reuse and enhance spatial/temporal locality. Additionally, GPU-specific low-level optimizations, including the utilization of vector instructions and numbers, as well as efficient memory access management, have been incorporated. The impact of floating-point number representation (single-precision versus half-precision) has also been investigated. The implementations have been assessed on Nvidia's Jetson Xavier, TX2, and Nano embedded platforms in terms of execution time, power consumption, and optical flow accuracy. Notably, the TV-L1 method exhibits higher complexity and computational intensity compared to Horn-Schunck. The fastest versions of these algorithms achieve a processing rate of 0.21 nanoseconds per pixel per iteration in half-precision on the Xavier platform, representing a 22x time reduction over efficient and parallel CPU versions. Furthermore, energy consumption is reduced by a factor of 5.3. Among the tested boards, the Xavier embedded platform, being both the most powerful and the most recent, consistently delivers the best results in terms of speed and energy efficiency. Operator merging and pipelining have proven to be instrumental in improving GPU performance by enhancing data reuse. This data reuse is made possible through GPU shared memory, which is a small, high-speed memory that enables data sharing among threads within the same GPU thread block. While merging multiple iterations yields performance gains, it is constrained by the size of the shared memory, necessitating trade-offs between resource utilization and speed. The adoption of half-precision numbers accelerates iterative algorithms and achieves superior optical flow accuracy within the same time frame compared to single-precision counterparts. Half-precision implementations converge more rapidly due to the increased number of iterations possible within a given time window. Specifically, the use of half-precision numbers on the best GPU architecture accelerates execution by up to 2.2x for TV-L1 and 3.7x for Horn-Schunck. This work underscores the significance of both GPU-specific optimizations for computer vision algorithms and the use and study of reduced-precision floating-point numbers. They pave the way for future enhancements through new algorithmic transformations, alternative numerical formats, and hardware architectures. This approach can potentially be extended to other families of iterative algorithms.
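The figures quoted in this abstract can be combined in a short back-of-the-envelope calculation. The Python sketch below estimates how many iterations would fit inside the 40 ms frame budget at 0.21 ns per pixel per iteration. The function name and the 1280x720 resolution are assumptions made for illustration (the abstract does not state a resolution), and the estimate ignores data transfers and any other fixed per-frame overhead.

```python
def iterations_within_budget(width, height, ns_per_pixel_iter, budget_ms):
    """Estimate how many whole iterations fit in a per-frame time budget.

    Hypothetical helper: it assumes the cost scales linearly with the
    number of pixels and that there is no fixed per-frame overhead.
    """
    seconds_per_iteration = width * height * ns_per_pixel_iter * 1e-9
    return int((budget_ms * 1e-3) // seconds_per_iteration)


# Assumed 1280x720 frame; 0.21 ns/pixel/iteration and 40 ms come from the abstract.
hd_iterations = iterations_within_budget(1280, 720, 0.21, 40.0)
print(hd_iterations)
```

Under these assumptions one iteration over a 1280x720 frame takes roughly 0.19 ms, so the budget leaves room for about two hundred iterations. This is consistent with the abstract's remark that a faster per-iteration rate allows more iterations within a given time window.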
Fumero, Alfonso Juan José. "Accelerating interpreted programming languages on GPUs with just-in-time compilation and runtime optimisations". Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/28718.
Hopson, Benjamin Thomas Ken. "Techniques of design optimisation for algorithms implemented in software". Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/20435.
Luong, Thé Van. "Métaheuristiques parallèles sur GPU". Thesis, Lille 1, 2011. http://www.theses.fr/2011LIL10058/document.
Real-world optimization problems are often complex and NP-hard. Their modeling is continuously evolving in terms of constraints and objectives, and their resolution is CPU time-consuming. Although near-optimal algorithms such as metaheuristics (generic heuristics) make it possible to reduce the temporal complexity of their resolution, they fail to tackle large problems satisfactorily. Over the last decades, parallel computing has emerged as an unavoidable way to deal with large instances of difficult optimization problems. The design and implementation of parallel metaheuristics are strongly influenced by the computing platform. GPU computing has recently proven effective for time-intensive problems. This emerging technology is believed to be extremely useful for speeding up many complex algorithms. One of the major issues for metaheuristics is to rethink existing parallel models and programming paradigms to allow their deployment on GPU accelerators. Generally speaking, the major issues we have to deal with are: the distribution of data processing between CPU and GPU, thread synchronization, the optimization of data transfer between the different memories, memory capacity constraints, etc. The contribution of this thesis is to deal with such issues in redesigning the parallel models of metaheuristics so that large-scale optimization problems can be solved on GPU architectures. Our objective is to rethink the existing parallel models and to enable their deployment on GPUs. We therefore propose in this document a new generic guideline for building efficient parallel metaheuristics on GPU. Our challenge is to come up with a GPU-based design of the whole hierarchy of parallel models. To this end, very efficient approaches are proposed for CPU-GPU data transfer optimization, thread control, mapping of solutions to GPU threads, and memory management. These approaches have been extensively tested on five optimization problems and four GPU configurations. Compared to a CPU-based execution, experiments report up to 80-fold acceleration for large combinatorial problems and up to 2000-fold speed-up for a continuous problem. The work related to this thesis has been accepted in a dozen publications, including the IEEE Transactions on Computers journal.
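The idea of mapping solutions to GPU threads, mentioned in this abstract, can be illustrated with a small sketch. The Python code below is an illustration under stated assumptions, not the thesis's implementation: it assumes a swap neighborhood on a tour and a one-neighbor-per-thread layout in which a flat thread index `k` is decoded into a pair of positions `(i, j)`. All function names are hypothetical, and a plain loop stands in for the GPU thread grid.

```python
def index_to_pair(k, n):
    """Decode flat index k into a pair (i, j) with 0 <= i < j < n.

    Pairs are enumerated row by row: (0,1), (0,2), ..., (1,2), ...
    """
    i = 0
    row = n - 1  # number of pairs whose first element is i
    while k >= row:
        k -= row
        i += 1
        row -= 1
    return i, i + 1 + k


def tour_length(tour, dist):
    """Length of the closed tour under the distance matrix dist."""
    n = len(tour)
    return sum(dist[tour[p]][tour[(p + 1) % n]] for p in range(n))


def evaluate_neighbor(k, tour, dist):
    """Work assigned to one 'thread': build neighbor k and evaluate it."""
    i, j = index_to_pair(k, len(tour))
    neighbor = list(tour)
    neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
    return tour_length(neighbor, dist), (i, j)


def best_swap(tour, dist):
    """Evaluate every swap neighbor and return (cost, (i, j)) of the best."""
    n = len(tour)
    neighbor_count = n * (n - 1) // 2
    # On a GPU each k would be handled by its own thread; the loop emulates that.
    return min(evaluate_neighbor(k, tour, dist) for k in range(neighbor_count))


# Four cities on a line at positions 0, 1, 2, 3 (assumed toy instance).
positions = [0, 1, 2, 3]
dist = [[abs(a - b) for b in positions] for a in positions]
best_cost, best_move = best_swap([0, 2, 1, 3], dist)
print(best_cost, best_move)
```

Because every neighbor is evaluated independently from the same read-only tour, the only coordination needed is the final minimum, which on a GPU would typically be a reduction.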
Chrétien, Benjamin. "Optimisation semi-infinie sur GPU pour le contrôle corps-complet de robots". Thesis, Montpellier, 2016. http://www.theses.fr/2016MONTT315/document.
A humanoid robot is a complex system with numerous degrees of freedom, whose behavior is subject to the nonlinear equations of motion. As a result, planning its motion is a difficult task from a computational perspective. In this thesis, we aim at developing a method that can leverage the computing power of GPUs in the context of optimization-based whole-body motion planning. We first exhibit the properties of the optimization problem, and show that several avenues can be exploited in the context of parallel computing. Then, we present our approach to the dynamics computation, suitable for highly parallel processing architectures. Next, we propose a many-core GPU implementation of the motion planning problem. Our approach computes the constraints and their gradients in parallel, and feeds the result to a nonlinear optimization solver running on the CPU. Because each constraint and its gradient can be evaluated independently for each time interval, we end up with a highly parallelizable problem that can take advantage of GPUs. We also propose a new parametrization of contact forces adapted to our optimization problem. Finally, we investigate the extension of our work to model predictive control.
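The independence property described in this abstract, where each constraint and its gradient can be evaluated separately per time interval, can be sketched in a few lines. The Python example below is a toy illustration under stated assumptions and not the thesis's formulation: it assumes a single joint following a quadratic trajectory, an upper joint limit as the only constraint, and discrete time samples standing in for the time intervals. All names are hypothetical.

```python
def joint_position(coeffs, t):
    """Assumed toy trajectory q(t) = c0 + c1*t + c2*t^2 for one joint."""
    c0, c1, c2 = coeffs
    return c0 + c1 * t + c2 * t * t


def constraint_and_gradient(coeffs, t, q_max):
    """Joint-limit constraint g(t) = q(t) - q_max <= 0 at time t.

    Returns the constraint value and its gradient with respect to
    the trajectory coefficients (c0, c1, c2).
    """
    value = joint_position(coeffs, t) - q_max
    gradient = (1.0, t, t * t)
    return value, gradient


def evaluate_all(coeffs, times, q_max):
    """Evaluate the constraint and gradient at every time sample.

    No entry depends on another, so on a GPU each sample could be
    assigned to its own thread; the list comprehension emulates that.
    The results would then be handed to a solver running on the CPU.
    """
    return [constraint_and_gradient(coeffs, t, q_max) for t in times]


results = evaluate_all((0.0, 1.0, 0.0), [0.0, 0.5, 1.0], q_max=0.75)
for value, gradient in results:
    print(value, gradient)
```

A positive value in the output marks a time sample where the assumed joint limit is violated, which is the kind of information a nonlinear solver would use together with the gradients.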
Van, Luong Thé. "Métaheuristiques parallèles sur GPU". PhD thesis, Université des Sciences et Technologie de Lille - Lille I, 2011. http://tel.archives-ouvertes.fr/tel-00638820.
Delevacq, Audrey. "Métaheuristiques pour l'optimisation combinatoire sur processeurs graphiques (GPU)". Thesis, Reims, 2013. http://www.theses.fr/2013REIMS011/document.
Several combinatorial optimization problems are NP-hard and can only be solved optimally by exact algorithms for small instances. Metaheuristics have proved to be effective in solving many of these problems by finding approximate solutions in a reasonable time. However, when dealing with large instances, they may require considerable computation time and memory space to explore the search space efficiently. Therefore, interest in their deployment on high-performance computing architectures has increased over the past years. Existing parallelization approaches generally follow the message-passing and shared-memory computing paradigms, which are suitable for traditional architectures based on microprocessors, also called CPUs (Central Processing Units). However, research in the field of parallel computing is rapidly evolving and new architectures are emerging, including hardware accelerators that offload some tasks from the CPU. Among them, graphics processors or GPUs (Graphics Processing Units) have a massively parallel architecture with great potential but also imply new algorithmic and programming challenges. In fact, existing parallelization models of metaheuristics are generally unsuited to computing environments like GPUs. Few works have tackled this subject, and none provides a comprehensive and fundamental view of it. The general purpose of this thesis is to propose a framework for the effective implementation of metaheuristics on parallel architectures based on GPUs. It begins with a state of the art describing existing works on GPU parallelization of metaheuristics and general classifications of parallel metaheuristics. An original taxonomy is then designed to classify identified implementations and to formalize GPU parallelization strategies in a coherent methodological framework. This thesis also aims to validate this taxonomy by exploiting its main components to propose original parallelization strategies specifically tailored to GPU architectures. Several effective implementations based on Ant Colony Optimization and Iterated Local Search metaheuristics are thus proposed for solving the Travelling Salesman Problem. A structured and thorough experimental study is conducted to evaluate and compare the performance of the approaches on criteria related to solution quality and computing time reduction.
Quinto, Michele Arcangelo. "Méthode de reconstruction adaptive en tomographie par rayons X : optimisation sur architectures parallèles de type GPU". Thesis, Grenoble, 2013. http://www.theses.fr/2013GRENT109/document.
Tomographic reconstruction from projection data is an inverse problem widely used in the medical imaging field. With a sufficiently large number of projections over the required angle, FBP (filtered backprojection) algorithms allow fast and accurate reconstructions. However, in cases of limited views (low-dose imaging) and/or limited angle (specific constraints of the setup), the data available for inversion are incomplete, the problem becomes more ill-conditioned, and the results show significant artifacts. In these situations, an alternative approach to reconstruction, based on a discrete model of the problem, consists in using an iterative algorithm or a statistical model of the problem to compute an estimate of the unknown object. These methods are classically based on a discretization of the volume into a set of voxels and provide 3D density maps. Computation time and memory storage are their main disadvantages. Moreover, whatever the application, the volumes are segmented for quantitative analysis. Numerous segmentation methods, with different interpretations of the contours and various minimized energy functionals, are available, and the results can depend on which one is used. This thesis presents a novel approach that performs tomographic reconstruction simultaneously with segmentation of the different materials of the object. The reconstruction process is no longer based on a regular grid of pixels (resp. voxels) but on a mesh composed of irregular triangles (resp. tetrahedra) adapted to the shape of the studied object. After an initialization step, the method runs in three main steps: reconstruction, segmentation, and adaptation of the mesh, which alternate iteratively until convergence. Iterative reconstruction algorithms in conventional use have been adapted and optimized to run on irregular grids of triangular or tetrahedral elements. For segmentation, two methods, one based on a parametric approach (snake) and the other on a geometric approach (level set), have been implemented to handle single- and multi-material objects. The adaptation of the mesh to the content of the estimated image is based on the previously segmented contours, which makes the mesh progressively coarser from the edges to the limits of the reconstruction domain. At the end of the process, the result is a classical grayscale tomographic image whose representation by a mesh adapted to its content provides a corresponding segmentation. The results show that the method provides reliable reconstruction and drastically decreases memory storage. In this context, the projection operators have been implemented on a parallel GPU architecture. A first 2D version shows the feasibility of the full process, and an optimized version of the 3D operators provides more efficient computations.
O'Connell, Jonathan F. "A dynamic programming model to solve optimisation problems using GPUs". Thesis, Cardiff University, 2017. http://orca.cf.ac.uk/97930/.
Pospíchal, Petr. "Akcelerace genetického algoritmu s využitím GPU". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236783.
Book chapters on the topic "Optimisations for GPU"
Nagy, Szilárd, Károly Jármai, and Attila Baksa. "Combination of GPU Programming and FEM Analysis in Structural Optimisation". In Vehicle and Automotive Engineering 4, 756–67. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-15211-5_63.
Prata, Paula, Paulo Fazendeiro, Pedro Sequeira, and Chandrashekhar Padole. "A Comment on Bio-inspired Optimisation via GPU Architecture: The Genetic Algorithm Workload". In Swarm, Evolutionary, and Memetic Computing, 670–78. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-35380-2_78.
González-Arribas, Daniel, Manuel Sanjurjo-Rivo, and Manuel Soler. "Multiobjective Optimisation of Aircraft Trajectories Under Wind Uncertainty Using GPU Parallelism and Genetic Algorithms". In Computational Methods in Applied Sciences, 453–66. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-89890-2_29.
Youcef, Bouras. "Research Information". In Advanced Deep Learning Applications in Big Data Analytics, 218–72. IGI Global, 2021. http://dx.doi.org/10.4018/978-1-7998-2791-7.ch011.
Soares, Adroaldo Santos, Lilian Lefol Nani Guarieiro, Oberdan Rocha Pinheiro, Marcelo Albano Moret Simões Gonçalves, Fabio de Sousa Santos, and Fernando Luiz Pellegrini Pessoa. "Metamodeling of the deposition process in oil pre-processing to optimise the cleaning of the heat exchanger network: A systematic review". In Themes focused on interdisciplinarity and sustainable development worldwide V. 02. Seven Editora, 2024. http://dx.doi.org/10.56238/sevened2024.003-009.
Bhalla, Ishan, and Kamlesh Chaudhary. "Applying Service Oriented Architecture and Cloud Computing for a Greener Traffic Management". In Green Technologies, 678–93. IGI Global, 2011. http://dx.doi.org/10.4018/978-1-60960-472-1.ch406.
Bhalla, Ishan, and Kamlesh Chaudhary. "Applying Service Oriented Architecture and Cloud Computing for a Greener Traffic Management". In Handbook of Research on Green ICT, 332–47. IGI Global, 2011. http://dx.doi.org/10.4018/978-1-61692-834-6.ch023.
Ciampoli, L. Bianchini, F. D'Amico, A. Calvi, F. Benedetto, and F. Tosti. "Signal processing for optimisation of low-powered GPR data with application in transportation engineering (roads and railways)". In Bearing Capacity of Roads, Railways and Airfields, 1553–57. CRC Press, 2017. http://dx.doi.org/10.1201/9781315100333-206.
Conference papers on the topic "Optimisations for GPU"
Lemos, Dayllon V. X., Humberto J. Longo, Wellington S. Martins, and Les R. Foulds. "A GPU-based DP algorithm for solving multiple instances of the knapsack problem". In Simpósio em Sistemas Computacionais de Alto Desempenho. Sociedade Brasileira de Computação, 2023. http://dx.doi.org/10.5753/wscad.2023.235875.
Paukste, Andrius. "Monte Carlo optimisation auto-tuning on a multi-GPU cluster". In 2012 2nd IEEE International Conference on Parallel, Distributed and Grid Computing (PDGC). IEEE, 2012. http://dx.doi.org/10.1109/pdgc.2012.6449942.
Wainwright, Thomas R., Daniel J. Poole, and Christian B. Allen. "GPU-accelerated aerodynamic shape optimisation framework for large turbine blades". In AIAA SCITECH 2022 Forum. Reston, Virginia: American Institute of Aeronautics and Astronautics, 2022. http://dx.doi.org/10.2514/6.2022-1292.
Maknickienė, Nijolė, Ieva Kekytė, and Algirdas Maknickas. "COMPUTATION INTELLIGENCE BASED DAILY ALGORITHMIC STRATEGIES FOR TRADING IN THE FOREIGN EXCHANGE MARKET". In Business and Management 2018. VGTU Technika, 2018. http://dx.doi.org/10.3846/bm.2018.53.
Li, Da, Hancheng Wu, and Michela Becchi. "Exploiting Dynamic Parallelism to Efficiently Support Irregular Nested Loops on GPUs". In COSMIC '15: International Workshop on Code Optimisation for Multi and Many Cores. New York, NY, USA: ACM, 2015. http://dx.doi.org/10.1145/2723772.2723780.
Jaros, Jiri, Jan Marek, and Pavel Mensik. "Optimisation of Water Management Systems Using a GPU-Accelerated Differential Evolution". In 2015 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2015. http://dx.doi.org/10.1109/ssci.2015.266.
Thomas, B., A. El Ouardi, S. Bouaziz, R. Le Goff Latimier, and H. Ben Ahmed. "GPU Optimisation of an Endogenous Peer-to-Peer Market with Product Differentiation". In 2023 IEEE Belgrade PowerTech. IEEE, 2023. http://dx.doi.org/10.1109/powertech55446.2023.10202823.
Cecilia, J. M., J. M. Garcia, M. Ujaldon, A. Nisbet, and M. Amos. "Parallelization strategies for ant colony optimisation on GPUs". In Distributed Processing, Workshops and Phd Forum. IEEE, 2011. http://dx.doi.org/10.1109/ipdps.2011.170.
Nagabandi, K., S. Mills, X. Zhang, D. J. J. Toal, and A. J. Keane. "Surrogate Based Design Optimisation of Combustor Tile Cooling Feed Holes". In ASME 2017 Gas Turbine India Conference. American Society of Mechanical Engineers, 2017. http://dx.doi.org/10.1115/gtindia2017-4586.
Dybedal, Joacim, and Geir Hovland. "GPU-Based Optimisation of 3D Sensor Placement Considering Redundancy, Range and Field of View". In 2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA). IEEE, 2020. http://dx.doi.org/10.1109/iciea48937.2020.9248170.