
Journal articles on the topic 'CUDA FRAMEWORK'


Consult the top 50 journal articles for your research on the topic 'CUDA FRAMEWORK.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Trujillo, Leonardo, Jose Manuel Muñoz Contreras, Daniel E. Hernandez, Mauro Castelli, and Juan J. Tapia. "GSGP-CUDA — A CUDA framework for Geometric Semantic Genetic Programming." SoftwareX 18 (June 2022): 101085. http://dx.doi.org/10.1016/j.softx.2022.101085.

2

Rosenberg, Duane, Pablo D. Mininni, Raghu Reddy, and Annick Pouquet. "GPU Parallelization of a Hybrid Pseudospectral Geophysical Turbulence Framework Using CUDA." Atmosphere 11, no. 2 (February 8, 2020): 178. http://dx.doi.org/10.3390/atmos11020178.

Abstract:
An existing hybrid MPI-OpenMP scheme is augmented with a CUDA-based fine-grain parallelization approach for multidimensional distributed Fourier transforms, in a well-characterized pseudospectral fluid turbulence code. Basics of the hybrid scheme are reviewed, and heuristics are provided to show a potential benefit of the CUDA implementation. The method draws heavily on the CUDA runtime library to handle memory management and on the cuFFT library for computing local FFTs. The manner in which the interfaces to these libraries are constructed, and ISO bindings utilized to facilitate platform portability, are discussed. CUDA streams are implemented to overlap data transfer with cuFFT computation. Testing with a baseline solver demonstrated significant aggregate speed-up over the hybrid MPI-OpenMP solver by offloading to GPUs on an NVLink-based test system. While the batch streamed approach provided little benefit with NVLink, we saw a performance gain of 30% when tuned for the optimal number of streams on a PCIe-based system. Strong GPU scaling was found to be nearly ideal in all cases. Profiling of the CUDA kernels shows that the transform computation achieves 15% of the attainable peak FLOP rate based on a roofline model for the system. In addition to speed-up measurements for the fiducial solver, we also considered several other solvers with different numbers of transform operations and found that aggregate speed-ups are nearly constant for all solvers.
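The stream-based overlap of transfers with cuFFT work described above follows a standard CUDA pattern. The following is a minimal illustrative sketch, not the authors' code: it assumes batched one-dimensional complex-to-complex transforms, pinned host memory (required for truly asynchronous copies), and a tunable stream count.

#include <cuda_runtime.h>
#include <cufft.h>

// Round-robin batches of FFT work over a few streams so that the
// host-device copies of one batch overlap the transforms of another.
// h_data must be allocated with cudaMallocHost (pinned) for real overlap.
void streamedFFT(cufftComplex* h_data, int fft_size, int n_batches) {
    const int n_streams = 4;                   // tunable, as in the paper
    cudaStream_t streams[n_streams];
    cufftHandle plans[n_streams];
    size_t batch_bytes = sizeof(cufftComplex) * fft_size;

    cufftComplex* d_data;
    cudaMalloc(&d_data, batch_bytes * n_batches);
    for (int i = 0; i < n_streams; ++i) {
        cudaStreamCreate(&streams[i]);
        cufftPlan1d(&plans[i], fft_size, CUFFT_C2C, 1);
        cufftSetStream(plans[i], streams[i]);  // bind each plan to its stream
    }
    for (int b = 0; b < n_batches; ++b) {
        int s = b % n_streams;
        cufftComplex* d_batch = d_data + (size_t)b * fft_size;
        cudaMemcpyAsync(d_batch, h_data + (size_t)b * fft_size, batch_bytes,
                        cudaMemcpyHostToDevice, streams[s]);
        cufftExecC2C(plans[s], d_batch, d_batch, CUFFT_FORWARD);
        cudaMemcpyAsync(h_data + (size_t)b * fft_size, d_batch, batch_bytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();
    for (int i = 0; i < n_streams; ++i) {
        cufftDestroy(plans[i]);
        cudaStreamDestroy(streams[i]);
    }
    cudaFree(d_data);
}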
3

Wang, Fan, Xiao Jiang, and Xiao Peng Hu. "A TBB-CUDA Implementation for Background Removal in a Video-Based Fire Detection System." Mathematical Problems in Engineering 2014 (2014): 1–6. http://dx.doi.org/10.1155/2014/692921.

Abstract:
This paper presents a parallel TBB-CUDA implementation for accelerating the single-Gaussian distribution model, which is effective for background removal in video-based fire detection systems. In this framework, TBB handles the initialization of the estimated Gaussian model on the CPU, while CUDA performs background removal and adaptation of the model on the GPU. This implementation exploits the combined computational power of TBB and CUDA and can be applied in real-time environments. Over 220 video sequences are utilized in the experiments. The experimental results illustrate that TBB+CUDA achieves a higher speedup than either TBB or CUDA alone. The proposed framework effectively overcomes the disadvantages of the CPU's limited memory bandwidth and few execution units, and it reduces data transfer latency and memory latency between the CPU and GPU.
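The per-pixel test and update that CUDA performs here is naturally data parallel: one thread per pixel, with no inter-thread communication. Below is a minimal sketch of a single-Gaussian background kernel; the deviation threshold k and learning rate alpha are illustrative assumptions, not values from the paper.

// One thread per pixel: classify against the current Gaussian model and
// adapt the model on background pixels only.
__global__ void gaussianBackground(const unsigned char* frame, float* mean,
                                   float* var, unsigned char* foreground,
                                   int n_pixels, float k, float alpha) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_pixels) return;
    float x = (float)frame[i];
    float d = x - mean[i];
    bool fg = d * d > k * k * var[i];      // deviates by more than k sigmas?
    foreground[i] = fg ? 255 : 0;
    if (!fg) {                             // exponential moving update
        mean[i] += alpha * d;
        var[i]  += alpha * (d * d - var[i]);
    }
}
// Launch: gaussianBackground<<<(n_pixels + 255) / 256, 256>>>(...);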
4

Bi, Yujiang, Yi Xiao, WeiYi Guo, Ming Gong, Peng Sun, Shun Xu, and Yi-bo Yang. "Lattice QCD GPU Inverters on ROCm Platform." EPJ Web of Conferences 245 (2020): 09008. http://dx.doi.org/10.1051/epjconf/202024509008.

Abstract:
The open source ROCm/HIP platform for GPU computing provides a uniform framework that supports both NVIDIA and AMD GPUs, as well as the possibility of porting CUDA code to HIP-compatible code. We present the porting progress on the Overlap fermion inverter (GWU-code) and on the general Lattice QCD inverter package, QUDA. A manual for using QUDA on HIP and tips for porting general CUDA code into the HIP framework are also provided.
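Porting CUDA to HIP is largely a mechanical renaming of host runtime calls; device kernel syntax is essentially unchanged, and the hipify tools automate most of the rewrite. A small assumed example of the correspondence, not taken from the QUDA port:

// CUDA version                      // HIP version after porting
#include <cuda_runtime.h>            // #include <hip/hip_runtime.h>

__global__ void axpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // built-ins identical in HIP
    if (i < n) y[i] += a * x[i];
}

void runAxpy(int n, float a, const float* d_x, float* d_y) {
    // cudaMalloc / cudaMemcpy map one-to-one to hipMalloc / hipMemcpy.
    axpy<<<(n + 255) / 256, 256>>>(n, a, d_x, d_y);  // same launch syntax in HIP
                                                     // (or hipLaunchKernelGGL)
    cudaDeviceSynchronize();                         // -> hipDeviceSynchronize()
}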
5

Peng, Bo, Junliang Lai, Yang Wang, Ling Wang, and Dong C. Liu. "CUDA-Based Parallel Computation Framework for Phase Root Seeking Algorithm." Journal of Medical Imaging and Health Informatics 4, no. 6 (December 1, 2014): 922–31. http://dx.doi.org/10.1166/jmihi.2014.1343.

6

Ahmed, Rafid, Md Sazzadul Islam, and Jia Uddin. "Optimizing Apple Lossless Audio Codec Algorithm using NVIDIA CUDA Architecture." International Journal of Electrical and Computer Engineering (IJECE) 8, no. 1 (February 1, 2018): 70. http://dx.doi.org/10.11591/ijece.v8i1.pp70-75.

Abstract:
As the majority of compression algorithms are implemented for CPU architectures, the primary focus of our work was to exploit the opportunities for GPU parallelism in audio compression. This paper presents an implementation of the Apple Lossless Audio Codec (ALAC) algorithm using the NVIDIA Compute Unified Device Architecture (CUDA) framework for GPUs. The core idea was to identify the areas where data parallelism could be applied and to use the CUDA parallel programming model to execute the identified parallel components on CUDA's Single Instruction Multiple Thread (SIMT) model. The dataset was retrieved from the European Broadcasting Union's Sound Quality Assessment Material (SQAM). Faster execution of the algorithm reduced execution time when applied to the coding of large audio files. This paper also presents the reduction in power usage achieved by running the parallel components on the GPU. Experimental results reveal that we achieve about 80-90% speedup through CUDA on the identified components over the CPU implementation while saving CPU power consumption.
7

Michailidis, P. D., and K. G. Margaritis. "Accelerating kernel density estimation on the GPU using the CUDA framework." Applied Mathematical Sciences 7 (2013): 1447–76. http://dx.doi.org/10.12988/ams.2013.13133.

8

Bruel, Pedro, Marcos Amarís, and Alfredo Goldman. "Autotuning CUDA compiler parameters for heterogeneous applications using the OpenTuner framework." Concurrency and Computation: Practice and Experience 29, no. 22 (March 6, 2017): e3973. http://dx.doi.org/10.1002/cpe.3973.

9

Hadji-Kyriacou, Avelina, and Ognjen Arandjelović. "Raymarching Distance Fields with CUDA." Electronics 10, no. 22 (November 9, 2021): 2730. http://dx.doi.org/10.3390/electronics10222730.

Abstract:
Raymarching is a technique for rendering implicit surfaces using signed distance fields. It has been known and used since the 1980s for rendering fractals and CSG (constructive solid geometry) surfaces, but has rarely been used for commercial rendering applications such as film and 3D games. Raymarching was first used for photorealistic rendering in the mid-2000s by demoscene developers and hobbyist graphics programmers, receiving little to no attention from the academic community and professional graphics engineers. In the present work, we explain why the use of the Simple and Fast Multimedia Library (SFML) by nearly all existing approaches leads to a number of inefficiencies, and hence set out to develop a CUDA-oriented approach instead. We next show that the usual data handling pipeline leads to further unnecessary data flow overheads and therefore propose a novel pipeline structure that eliminates much of the redundancy in the manner in which data are processed and passed. We proceed to introduce a series of data structures which were designed with the specific aim of exploiting the pipeline's strengths in terms of efficiency while achieving a high degree of photorealism, as well as the accompanying models and optimizations that ultimately result in an engine which is capable of photorealistic and real-time rendering on complex scenes and arbitrary objects. Lastly, the effectiveness of our framework is demonstrated in a series of experiments which compare our engine, both in terms of visual fidelity and computational efficiency, with the leading commercial and open source solutions, namely Unreal Engine and Blender.
10

Lee, Seongjae, and Taehyoun Kim. "Parallel Dislocation Model Implementation for Earthquake Source Parameter Estimation on Multi-Threaded GPU." Applied Sciences 11, no. 20 (October 11, 2021): 9434. http://dx.doi.org/10.3390/app11209434.

Abstract:
Graphics processing units (GPUs) have been in the spotlight in various fields because they can process a massive amount of computation at a relatively low price. This research proposes a performance acceleration framework applied to Monte Carlo method-based earthquake source parameter estimation using a multi-threaded compute unified device architecture (CUDA) GPU. The Monte Carlo method imposes a heavy computational burden because iterative nonlinear optimization is performed more than 1000 times. To alleviate this problem, we parallelize the rectangular dislocation model, i.e., the Okada model, since the model consists of independent point-wise computations and takes up most of the time in the nonlinear optimization. Adjusting the degree of common subexpression elimination, thread block size, and constant caching, we obtained the best CUDA optimization configuration, which achieves 134.94x, 14.00x, and 2.99x speedups over the sequential CPU, 16-thread CPU, and baseline CUDA GPU implementations at the 1000x1000 mesh size, respectively. Then, we evaluated the performance and correctness of four different line search algorithms for the limited memory Broyden-Fletcher-Goldfarb-Shanno with boundaries (L-BFGS-B) optimization on a real earthquake dataset. The results demonstrated that the Armijo line search was the most efficient of these algorithms. The visualization results with the best-fit parameters finally derived by the proposed framework confirm that our framework also approximates the earthquake source parameters in excellent agreement with the geodetic data, i.e., at most 0.5 cm root-mean-square error (RMSE) of residual displacement.
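The structure described here, independent point-wise evaluations sharing a small set of fault parameters, maps naturally onto one CUDA thread per mesh point with the parameters held in constant memory, which is the constant-caching optimization the authors tune. A minimal sketch with a placeholder expression standing in for the actual Okada displacement formulas:

#include <cuda_runtime.h>

__constant__ float d_fault[9];   // hypothetical fault parameters, read by all
                                 // threads through the cached constant memory

__global__ void okadaLike(const float* xs, const float* ys,
                          float* ux, float* uy, float* uz, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;              // one independent surface point per thread
    float x = xs[i], y = ys[i];
    // Placeholder for the real Okada surface-displacement expressions.
    float r3 = powf(x * x + y * y + d_fault[0] * d_fault[0], 1.5f);
    ux[i] = d_fault[3] * x / r3;
    uy[i] = d_fault[3] * y / r3;
    uz[i] = d_fault[3] * d_fault[0] / r3;
}

// Host side, once per optimization iteration:
//   cudaMemcpyToSymbol(d_fault, h_params, 9 * sizeof(float));
//   okadaLike<<<(n + 255) / 256, 256>>>(xs, ys, ux, uy, uz, n);  // block size tuned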
11

Javeed, Danish, Tianhan Gao, Muhammad Taimoor Khan, and Ijaz Ahmad. "A Hybrid Deep Learning-Driven SDN Enabled Mechanism for Secure Communication in Internet of Things (IoT)." Sensors 21, no. 14 (July 18, 2021): 4884. http://dx.doi.org/10.3390/s21144884.

Abstract:
The Internet of Things (IoT) has emerged as a new technological world connecting billions of devices. Despite providing several benefits, the heterogeneous nature and the extensive connectivity of the devices make it a target of different cyberattacks that result in data breaches and financial loss. There is a severe need to secure the IoT environment from such attacks. In this paper, an SDN-enabled deep-learning-driven framework is proposed for threat detection in an IoT environment. The state-of-the-art CUDA deep neural network gated recurrent unit (Cu-DNNGRU) and CUDA bidirectional long short-term memory (Cu-BLSTM) classifiers are adopted for effective threat detection. We performed 10-fold cross-validation to show the unbiasedness of the results. The up-to-date, publicly available CICIDS2018 dataset is used to train our hybrid model. The achieved accuracy of the proposed scheme is 99.87%, with a recall of 99.96%. Furthermore, we compare the proposed hybrid model with the CUDA gated recurrent unit, long short-term memory (Cu-GRULSTM) and the CUDA deep neural network, long short-term memory (Cu-DNNLSTM), as well as with existing benchmark classifiers. Our proposed mechanism achieves impressive results in terms of accuracy, F1-score, precision, speed efficiency, and other evaluation metrics.
12

Cabodi, Gianpiero, Paolo Camurati, Alessandro Garbo, Michele Giorelli, Stefano Quer, and Francesco Savarese. "A Smart Many-Core Implementation of a Motion Planning Framework along a Reference Path for Autonomous Cars." Electronics 8, no. 2 (February 2, 2019): 177. http://dx.doi.org/10.3390/electronics8020177.

Abstract:
Research on autonomous cars, which first intensified in the 1990s, has become one of the main research paths in the automotive industry. Recent works use Rapidly-exploring Random Trees to explore the state space along a given reference path, and to compute the minimum-time collision-free path in real time. Those methods do not require good approximations of the reference path, they are able to cope with discontinuous routes, they are capable of navigating in realistic traffic scenarios, and they derive their power from an extensive computational effort directed to improve the quality of the trajectory from step to step. In this paper, we focus on re-engineering an existing state-of-the-art sequential algorithm to obtain a CUDA-based GPGPU (General Purpose Graphics Processing Units) implementation. To do that, we show how to partition the original algorithm among several working threads running on the GPU, how to propagate information among threads, and how to synchronize those threads. We also give detailed evidence on how to organize memory transfers between the CPU and the GPU (and among different CUDA kernels) such that planning times are optimized and the available memory is not exceeded while storing massive amounts of fused data. To sum up, in our application the GPU is used for all main operations, the entire application is developed in the CUDA language, and specific attention is paid to concurrency, synchronization, and data communication. We run experiments on several real scenarios, comparing the GPU implementation with the CPU one in terms of the quality of the generated paths and in terms of computation (wall-clock) times. The results of our experiments show that embedded GPUs can be used as an enabler for real-time applications of computationally expensive planning approaches.
13

Peña-Cantillana, Francisco, Daniel Díaz-Pernil, Hepzibah A. Christinal, and Miguel A. Gutiérrez-Naranjo. "Implementation on CUDA of the Smoothing Problem with Tissue-Like P Systems." International Journal of Natural Computing Research 2, no. 3 (July 2011): 25–34. http://dx.doi.org/10.4018/jncr.2011070103.

Abstract:
Smoothing is often used in digital imagery to improve the quality of an image by reducing its level of noise. This paper presents a parallel implementation of an algorithm for smoothing 2D images in the framework of Membrane Computing; the chosen formal framework is tissue-like P systems. The algorithm has been implemented using CUDA (Compute Unified Device Architecture), a device architecture that allows NVIDIA Graphics Processing Units (GPUs) to solve many complex computational problems in parallel. Some examples are presented and compared; research lines for the future are also discussed.
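Membrane-computing semantics aside, the data-parallel core of image smoothing is one thread per pixel averaging its neighborhood. A minimal CUDA sketch for a single-channel image, with a 3x3 mean filter assumed for illustration:

// One thread per pixel: average the 3x3 neighborhood, clamping at borders.
__global__ void smooth3x3(const unsigned char* in, unsigned char* out,
                          int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    int sum = 0, count = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = x + dx, ny = y + dy;
            if (nx >= 0 && nx < width && ny >= 0 && ny < height) {
                sum += in[ny * width + nx];
                ++count;
            }
        }
    out[y * width + x] = (unsigned char)(sum / count);
}
// Launch with a 2D grid: dim3 block(16, 16);
// dim3 grid((width + 15) / 16, (height + 15) / 16);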
14

Kinsner, M., D. Capson, and A. Spence. "A modular CUDA-based framework for scale-space feature detection in video streams." Journal of Physics: Conference Series 256 (November 1, 2010): 012005. http://dx.doi.org/10.1088/1742-6596/256/1/012005.

15

Chen, DeHao, WenGuang Chen, and WeiMin Zheng. "CUDA-Zero: a framework for porting shared memory GPU applications to multi-GPUs." Science China Information Sciences 55, no. 3 (February 25, 2012): 663–76. http://dx.doi.org/10.1007/s11432-011-4497-z.

16

Tang, Ying, Xiaoying Shi, Tingzhe Xiao, and Jing Fan. "An improved image analogy method based on adaptive CUDA-accelerated neighborhood matching framework." Visual Computer 28, no. 6-8 (April 19, 2012): 743–53. http://dx.doi.org/10.1007/s00371-012-0701-4.

17

Sokolovskyy, Yaroslav, Denys Manokhin, Yaroslav Kaplunsky, and Olha Mokrytska. "Development of software and algorithms of parallel learning of artificial neural networks using CUDA technologies." Technology audit and production reserves 5, no. 2(61) (September 23, 2021): 21–25. http://dx.doi.org/10.15587/2706-5448.2021.239784.

Abstract:
The object of research is parallelizing the training of artificial neural networks to automate medical image analysis, using the Python programming language, the PyTorch framework, and Compute Unified Device Architecture (CUDA) technology. The operation of this framework is based on the Define-by-Run model. An analysis of the available cloud technologies for the task and of training algorithms for artificial neural networks is carried out. A modified U-Net architecture from the MedicalTorch library was used; it was chosen because a network was needed that can learn effectively from small datasets, since in medicine one of the most problematic issues is the availability of large datasets, owing to the confidentiality requirements for data of this nature. The resulting information system implements the tasks set before it, contains a user-friendly interface, and provides all the tools necessary to simplify and automate the visualization and analysis of data. The efficiency of neural network training on the central processing unit (CPU) is compared with training on the graphics processing unit (GPU) using CUDA technology. Cloud technology was used in the study: Google Colab and Microsoft Azure were considered among cloud services. Colab was first used to build a prototype, and the Azure service was then used to train the finished artificial neural network architecture effectively. Measurements were performed using cloud technologies in both services, and the Adam optimizer was used to train the model. CPU run times were also measured to assess the acceleration provided by CUDA technology, and an estimate of the acceleration obtained through GPU computing and cloud technologies was produced. The model developed during the research showed satisfactory results according to the Jaccard and Dice metrics. A key factor in the success of this study was cloud computing services.
18

Hwang, Jaemin, Jong-Wook Choi, Seongrim Choi, and Byeong-Gyu Nam. "A Simulation Framework for CUDA Computing on Non-x86 Platforms based on QEMU and GPGPU-Sim." Journal of the Korea Industrial Information Systems Research 19, no. 2 (April 30, 2014): 15–22. http://dx.doi.org/10.9723/jksiis.2014.19.2.015.

19

Poli, G., E. Llapa, J. R. Cecatto, J. H. Saito, J. F. Peters, S. Ramanna, and M. C. Nicoletti. "Solar flare detection system based on tolerance near sets in a GPU–CUDA framework." Knowledge-Based Systems 70 (November 2014): 345–60. http://dx.doi.org/10.1016/j.knosys.2014.07.012.

20

Tovkach, Serhii. "CUDA-інтеграція контурів керування авіаційного газотурбінного двигуна [CUDA integration of aviation gas turbine engine control loops]." Aerospace Technic and Technology, no. 6 (November 27, 2023): 31–39. http://dx.doi.org/10.32620/aktt.2022.6.04.

Abstract:
Within experimental design bureaus (EDB) and the industry, the acceleration of the design process for aircraft gas turbine engines and their control systems, the "AIRCRAFT-AVIATION ENGINE-FUEL" system, and the formation of the technical type of an aircraft engine adapted to new operating conditions are hampered by automated systems with low computing performance and incomplete descriptions. Information technologies for developing engines allow duplication and mismatch of data, and loss of information and time during transmission and processing, when making parametric and structural decisions. To better adapt the characteristics of an aviation engine (AE) to the tasks solved by an aircraft in flight, it is necessary to integrate the control systems. Integrated control systems are especially effective for managing today's multi-mode aircraft; on their basis, optimal control programs for the power plant (PP) are formed using criteria for evaluating the effectiveness of the aircraft. This article proposes a paradigm for building integrated control loops for an aircraft gas turbine engine, formed by automating control processes, an automatic control system, and combined control programs. The object of this research is the process of constructing adaptive control loops for aircraft gas turbine engines. The subject of this study is the adaptive control of aircraft gas turbine engines using embedded control loops and the CUDA architecture. The goal is to improve the dynamic characteristics of an aircraft gas turbine engine through adaptive control using control loops, considering various aircraft flight modes and engine operating modes. Objectives: to determine the main controllable elements of an aircraft engine, the adjustable parameters, and the factors for constructing control loops according to the principle of adaptation; to describe the mechanism of joint management of gas turbine engines; to study the processes of building an "aircraft - power plant" integration circuit and develop the concept of an integrated ACS; and to define the CUDA paradigm for parallel computing of control loops. Conclusions. The scientific novelty lies in the formation of a paradigm for developing adaptive control models for gas turbine engines, considering different aircraft flight modes and engine operation modes.
21

Zhao, Yuxuan, Qi Sun, Zhuolun He, Yang Bai, and Bei Yu. "AutoGraph: Optimizing DNN Computation Graph for Parallel GPU Kernel Execution." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 9 (June 26, 2023): 11354–62. http://dx.doi.org/10.1609/aaai.v37i9.26343.

Abstract:
Deep learning frameworks optimize the computation graphs and intra-operator computations to boost the inference performance on GPUs, while inter-operator parallelism is usually ignored. In this paper, a unified framework, AutoGraph, is proposed to obtain highly optimized computation graphs in favor of parallel executions of GPU kernels. A novel dynamic programming algorithm, combined with backtracking search, is adopted to explore the optimal graph optimization solution, with the fast performance estimation from the mixed critical path cost. Accurate runtime information based on GPU Multi-Stream launched with CUDA Graph is utilized to determine the convergence of the optimization. Experimental results demonstrate that our method achieves up to 3.47x speedup over existing graph optimization methods. Moreover, AutoGraph outperforms state-of-the-art parallel kernel launch frameworks by up to 1.26x.
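Launching independent operators on separate streams inside a captured CUDA Graph is the mechanism the runtime measurements here rely on. A minimal sketch of the fork-join capture pattern, with two hypothetical kernels standing in for DNN operators (the cudaGraphInstantiate signature shown is the CUDA 12 form):

#include <cuda_runtime.h>

__global__ void opA(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}
__global__ void opB(float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] += 1.0f;
}

void captureAndReplay(float* d_x, float* d_y, int n) {
    cudaStream_t s1, s2;
    cudaEvent_t fork, join;
    cudaStreamCreate(&s1); cudaStreamCreate(&s2);
    cudaEventCreate(&fork); cudaEventCreate(&join);

    cudaGraph_t graph;
    cudaStreamBeginCapture(s1, cudaStreamCaptureModeGlobal);
    cudaEventRecord(fork, s1);
    cudaStreamWaitEvent(s2, fork, 0);       // pull s2 into the same capture
    opA<<<(n + 255) / 256, 256, 0, s1>>>(d_x, n);   // two independent operators
    opB<<<(n + 255) / 256, 256, 0, s2>>>(d_y, n);   // may run concurrently
    cudaEventRecord(join, s2);
    cudaStreamWaitEvent(s1, join, 0);       // rejoin before ending the capture
    cudaStreamEndCapture(s1, &graph);

    cudaGraphExec_t exec;
    cudaGraphInstantiate(&exec, graph, 0);  // instantiate once...
    cudaGraphLaunch(exec, s1);              // ...then replay cheaply, e.g. in a loop
    cudaStreamSynchronize(s1);
}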
22

Shaikh, Alim, Isha Goski, Parth Bhosale, Somesh Bhosale, and Prof S. S. Pawar. "HORUS - Heuristic Object Recognition Unified System Using YOLO and CUDA." International Journal for Research in Applied Science and Engineering Technology 11, no. 5 (May 31, 2023): 974–76. http://dx.doi.org/10.22214/ijraset.2023.51688.

Abstract:
Object detection based on deep learning is an important application of deep learning technology, characterized by strong feature learning and feature representation capabilities compared with traditional object detection methods. The paper first introduces the classical methods in object detection and expounds the relationship and differences between the classical methods and the deep learning methods. It then introduces the emergence of object detection methods based on deep learning and elaborates the most typical current methods for object detection via deep learning. In presenting the methods, the paper focuses on the framework design and working principles of the models and analyzes model performance in terms of real-time behavior and detection accuracy. Finally, it discusses the challenges in object detection based on deep learning and offers some solutions for reference.
23

Hanousek, Vít, and Tomáš Oberhuber. "Efficient Transfer of C++ Objects on Intel Xeon Phi KNC in Offload Mode." Computer Methods in Material Science 17, no. 2 (2017): 94–100. http://dx.doi.org/10.7494/cmms.2017.2.0594.

Abstract:
The Intel Xeon Phi KNC is a modern coprocessor designed for high performance computing. In this paper we describe an efficient method for transferring C++ objects in offload mode. Our aim is a consistent interface with the NVIDIA CUDA framework in the Template Numerical Library (TNL). As a working example, we use this library and the heat equation problem to demonstrate the efficiency of the implementation on the Intel Xeon Phi and to compare the CPU with this coprocessor.
24

Torti, Emanuele, Alessandro Fontanella, Antonio Plaza, Javier Plaza, and Francesco Leporati. "Hyperspectral Image Classification Using Parallel Autoencoding Diabolo Networks on Multi-Core and Many-Core Architectures." Electronics 7, no. 12 (December 8, 2018): 411. http://dx.doi.org/10.3390/electronics7120411.

Abstract:
One of the most important tasks in hyperspectral imaging is the classification of the pixels in the scene in order to produce thematic maps. This problem can typically be solved through machine learning techniques. In particular, deep learning algorithms have emerged in recent years as a suitable methodology to classify hyperspectral data. Moreover, the high dimensionality of hyperspectral data, together with the increasing availability of unlabeled samples, makes deep learning an appealing approach to process and interpret those data. However, the limited number of labeled samples often complicates the exploitation of supervised techniques. Indeed, in order to guarantee a suitable precision, a large number of labeled samples is normally required. This hurdle can be overcome by resorting to unsupervised classification algorithms. In particular, autoencoders can be used to analyze a hyperspectral image using only unlabeled data. However, the high data dimensionality leads to prohibitive training times. In this regard, it is important to realize that the operations involved in autoencoder training are intrinsically parallel. Therefore, in this paper we present an approach that exploits multi-core and many-core devices in order to achieve efficient autoencoder training in hyperspectral imaging applications. Specifically, we present new OpenMP and CUDA frameworks for autoencoder training. The obtained results show that the CUDA framework provides a speed-up of about two orders of magnitude as compared to an optimized serial processing chain.
25

Bocci, Andrea, David Dagenhart, Vincenzo Innocente, Christopher Jones, Matti Kortelainen, Felice Pantaleo, and Marco Rovere. "Bringing heterogeneity to the CMS software framework." EPJ Web of Conferences 245 (2020): 05009. http://dx.doi.org/10.1051/epjconf/202024505009.

Abstract:
The advent of computing resources with co-processors, for example Graphics Processing Units (GPUs) or Field-Programmable Gate Arrays (FPGAs), for use cases like the CMS High-Level Trigger (HLT) or data processing at leadership-class supercomputers imposes challenges for the current data processing frameworks. These challenges include developing a model for algorithms to offload their computations onto the co-processors as well as keeping the traditional CPU busy doing other work. The CMS data processing framework, CMSSW, implements multithreading using the Intel Threading Building Blocks (TBB) library, which utilizes tasks as concurrent units of work. In this paper we discuss a generic mechanism, implemented in CMSSW, for interacting effectively with non-CPU resources. In addition, configuring such a heterogeneous system is challenging. In CMSSW an application is configured with a configuration file written in the Python language, and the algorithm types are part of the configuration. The challenge therefore is to unify the CPU and co-processor settings while allowing their implementations to be separate. We explain how we solved these challenges while minimizing the necessary changes to the CMSSW framework. We also discuss, using a concrete example, how algorithms offload work to NVIDIA GPUs directly via the CUDA API.
26

KUMAR, PIYUSH, and ANUPAM AGRAWAL. "GPU-ACCELERATED INTERACTIVE VISUALIZATION OF 3D VOLUMETRIC DATA USING CUDA." International Journal of Image and Graphics 13, no. 02 (April 2013): 1340003. http://dx.doi.org/10.1142/s0219467813400032.

Abstract:
Improving image quality and rendering speed have always been challenges for programmers involved in large-scale volume rendering, especially in the field of medical image processing. The paper aims to perform volume rendering on the graphics processing unit (GPU), which, with its massively parallel capability, has the potential to revolutionize this field. The final results allow doctors to diagnose and analyze 2D computed tomography (CT) scan data using three-dimensional visualization techniques. The system has been used on multiple types of datasets, from 10 MB to 350 MB of medical volume data. Further, the use of the compute unified device architecture (CUDA) framework, a low-learning-curve technology, for this purpose would greatly reduce the cost involved in CT scan analysis and hence bring it to the common masses. The volume rendering has been performed on an Nvidia Tesla C1060 card (240 CUDA cores, providing parallel execution over the data), and its performance has also been benchmarked.
27

Guzzi, Francesco, George Kourousias, Fulvio Billè, Roberto Pugliese, Alessandra Gianoncelli, and Sergio Carrato. "A modular software framework for the design and implementation of ptychography algorithms." PeerJ Computer Science 8 (July 25, 2022): e1036. http://dx.doi.org/10.7717/peerj-cs.1036.

Abstract:
Computational methods are driving high impact microscopy techniques such as ptychography. However, the design and implementation of new algorithms is often a laborious process, as many parts of the code are written in close-to-the-hardware programming constructs to speed up the reconstruction. In this article, we present SciComPty, a new ptychography software framework aiming at simulating ptychography datasets and testing state-of-the-art and new reconstruction algorithms. Despite its simplicity, the software leverages GPU accelerated processing through the PyTorch CUDA interface. This is essential for designing new methods that can readily be employed. As an example, we present an improved position refinement method based on Adam and a new version of the rPIE algorithm, adapted for partial coherence setups. Results are shown on both synthetic and real datasets. The software is released as open-source.
28

Sheng, Yanyan, William S. Welling, and Michelle M. Zhu. "A GPU-Based Gibbs Sampler for a Unidimensional IRT Model." International Scholarly Research Notices 2014 (October 30, 2014): 1–11. http://dx.doi.org/10.1155/2014/368149.

Abstract:
Item response theory (IRT) is a popular approach used for addressing large-scale statistical problems in psychometrics as well as in other fields. The fully Bayesian approach for estimating IRT models is usually memory- and computationally expensive due to the large number of iterations, which limits the use of the procedure in many applications. In an effort to overcome such restraint, previous studies focused on utilizing the message passing interface (MPI) in a distributed-memory Linux cluster to achieve certain speedups. However, given the high data dependencies in a single Markov chain for IRT models, the communication overhead grows rapidly as the number of cluster nodes increases, making it difficult to further improve the performance under such a parallel framework. This study aims to tackle the problem using many-core graphics processing units (GPUs), which are practical, cost-effective, and convenient in actual applications. The performance comparisons among serial CPU, MPI, and compute unified device architecture (CUDA) programs demonstrate that the CUDA GPU approach has many advantages over the CPU-based approach and is therefore preferred.
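On the GPU side, a natural decomposition for such a sampler is one thread per examinee, each with its own random number generator state. A minimal cuRAND sketch; the normal conditional drawn below is an illustrative stand-in for the exact IRT full conditionals in the paper:

#include <curand_kernel.h>

__global__ void initRng(curandState* states, unsigned long long seed, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) curand_init(seed, i, 0, &states[i]);   // one RNG stream per thread
}

// One Gibbs sub-step: every thread draws its examinee's latent trait from
// the current conditional distribution (assumed normal here).
__global__ void sampleTheta(float* theta, const float* cond_mean,
                            const float* cond_sd, curandState* states, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    curandState local = states[i];
    theta[i] = cond_mean[i] + cond_sd[i] * curand_normal(&local);
    states[i] = local;                                // persist state across sweeps
}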
29

Munawar, Asim, Mohamed Wahib, Masaharu Munetomo, and Kiyoshi Akama. "Hybrid of genetic algorithm and local search to solve MAX-SAT problem using nVidia CUDA framework." Genetic Programming and Evolvable Machines 10, no. 4 (October 20, 2009): 391–415. http://dx.doi.org/10.1007/s10710-009-9091-4.

30

Blazewicz, Marek, Ian Hinder, David M. Koppelman, Steven R. Brandt, Milosz Ciznicki, Michal Kierzynka, Frank Löffler, Erik Schnetter, and Jian Tao. "From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation." Scientific Programming 21, no. 1-2 (2013): 1–16. http://dx.doi.org/10.1155/2013/167841.

Abstract:
Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.
31

Samaké, A., M. Alassane, A. Mahamane, and O. Diallo. "A SCALABLE HYBRID CPU-GPU COMPUTATIONAL FRAMEWORK FOR A FINITE ELEMENT-BASED AIR QUALITY MODEL." Advances in Mathematics: Scientific Journal 12, no. 1 (January 2, 2023): 45–61. http://dx.doi.org/10.37418/amsj.12.1.3.

Abstract:
We propose a scalable computational framework for the hybrid CPU-GPU implementation of a traffic-induced and finite element-based air quality model. The hybrid computing paradigm we investigate consists in combining the CPU-based distributed-memory programming approach using the Message Passing Interface (MPI) with GPU programming of the finite element numerical integration using the Compute Unified Device Architecture (CUDA), a general purpose parallel computing platform released by NVIDIA Corporation and featured on its own GPUs. The scalability results obtained from numerical experiments on two major road-traffic-induced air pollutants, namely the fine and inhalable particulate matter PM2.5 and PM10, are illustrated. These achievements, including speedup and efficiency analyses, support the conclusion that this framework scales well up to 256 CPU cores used concurrently with GPUs in a hybrid computing system.
32

T. A. Valencia-Pérez, J. M. Hernández-López, E. Moreno-Barbosa, and B. de Celis-Alonso. "Study of CT Images Processing with the Implementation of MLEM Algorithm using CUDA on NVIDIA’S GPU Framework." Journal of Nuclear Physics, Material Sciences, Radiation and Applications 7, no. 2 (February 28, 2020): 165–71. http://dx.doi.org/10.15415/jnp.2020.72021.

Abstract:
In medicine, computed tomography (CT) images are obtained through a reconstruction algorithm. The classical method for image reconstruction is Filtered Back Projection (FBP). This method is fast and simple but does not use any statistical information about the measurements; the appearance of artifacts and the low spatial resolution of the reconstructed images must be considered. Furthermore, FBP requires optimal projection conditions and complete sets of data. In this paper, a methodology to accelerate the acquisition process for CT, based on the Maximum Likelihood Estimation Method (MLEM) algorithm, is presented. This statistical iterative reconstruction algorithm uses a GPU programming paradigm and was compared with sequential algorithms; the reconstruction time was reduced by up to 3 orders of magnitude while preserving image quality. Furthermore, it showed good performance when compared with reconstruction methods provided by commercial software. The system, which would consist exclusively of a commercial laptop and GPU, could be used as a fast, portable, simple and cheap image reconstruction platform in the future.
33

Montella, Raffaele, Giulio Giunta, Giuliano Laccetti, Marco Lapegna, Carlo Palmieri, Carmine Ferraro, Valentina Pelliccia, Cheol-Ho Hong, Ivor Spence, and Dimitrios S. Nikolopoulos. "On the Virtualization of CUDA Based GPU Remoting on ARM and X86 Machines in the GVirtuS Framework." International Journal of Parallel Programming 45, no. 5 (October 13, 2016): 1142–63. http://dx.doi.org/10.1007/s10766-016-0462-1.

34

Qiu, Liu Chao. "OpenCL-Based GPU Acceleration of ISPH Simulation for Incompressible Flows." Applied Mechanics and Materials 444-445 (October 2013): 380–84. http://dx.doi.org/10.4028/www.scientific.net/amm.444-445.380.

Abstract:
Thanks to the recent development of tools such as CUDA and OpenCL, it has become possible to fully utilize Graphics Processing Units (GPUs) for scientific computing. OpenCL promises huge savings in parallel code development and optimization effort because it is not restricted to a specific architecture. We have developed an OpenCL-based GPU acceleration framework for numerical simulations of incompressible flows using incompressible Smoothed Particle Hydrodynamics (ISPH). To assess the performance of the GPU implementation presented in this work, a comparison was made against an OpenCL implementation of the same ISPH method on the CPU.
35

Blazewicz, Marek, Steven R. Brandt, Michal Kierzynka, Krzysztof Kurowski, Bogdan Ludwiczak, Jian Tao, and Jan Weglarz. "CaKernel – A Parallel Application Programming Framework for Heterogenous Computing Architectures." Scientific Programming 19, no. 4 (2011): 185–97. http://dx.doi.org/10.1155/2011/457030.

Abstract:
With the recent advent of new heterogeneous computing architectures, there is still a lack of parallel problem-solving environments that can help scientists use hybrid supercomputers easily and efficiently. Many scientific simulations that use structured grids to solve partial differential equations in fact rely on stencil computations. Stencil computations have become crucial in solving many challenging problems in various domains, e.g., engineering or physics. Although many parallel stencil computing approaches have been proposed, in most cases they solve only particular problems. As a result, scientists struggle when it comes to implementing a new stencil-based simulation, especially on high performance hybrid supercomputers. In response to this need, we extend our previous work on a parallel programming framework for CUDA, CaCUDA, which now supports OpenCL. We present CaKernel, a tool that simplifies the development of parallel scientific applications on hybrid systems. CaKernel is built on the highly scalable and portable Cactus framework. In the CaKernel framework, Cactus manages the inter-process communication via MPI, while CaKernel manages the code running on Graphics Processing Units (GPUs) and the interactions between them. As a non-trivial test case, we have developed a 3D CFD code to demonstrate the performance and scalability of the automatically generated code.
36

Madrigal Díaz, Jorge Francisco, and Jean-Bernard Hayet. "Color and motion-based particle filter target tracking in a network of overlapping cameras with multi-threading and GPGPU." Acta Universitaria 23, no. 1 (February 28, 2013): 9–16. http://dx.doi.org/10.15174/au.2013.355.

Abstract:
This paper describes an efficient implementation of multiple-target, multiple-view tracking in video-surveillance sequences. It takes advantage of the capabilities of multi-core Central Processing Units (CPUs) and of graphics processing units under the Compute Unified Device Architecture (CUDA) framework. The principle of our algorithm is (1) in each video sequence, to track all target persons with independent particle filters and (2) to fuse the tracking results of all sequences. Particle filters belong to the category of recursive Bayesian filters. They update a Monte-Carlo representation of the posterior distribution over the target position and velocity. For this purpose, they combine a probabilistic motion model, i.e. prior knowledge about how targets move (e.g. constant velocity), and a likelihood model associated with the observations on targets. At the first level of single video sequences, the multi-threading library Threading Building Blocks (TBB) has been used to parallelize the processing of the per-target independent particle filters. At the higher level, we rely on General Purpose Programming on Graphical Processing Units (generally termed GPGPU) through CUDA in order to fuse target-tracking data collected on multiple video sequences by solving the data association problem. Tracking results are presented on various challenging tracking datasets.
37

Su, Huayou, Mei Wen, Nan Wu, Ju Ren, and Chunyuan Zhang. "Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation." Scientific World Journal 2014 (2014): 1–19. http://dx.doi.org/10.1155/2014/716020.

Abstract:
Through reorganizing the execution order and optimizing the data structure, we propose an efficient parallel framework for the H.264/AVC encoder based on a massively parallel architecture, implemented with CUDA on NVIDIA's GPU. Not only are the compute-intensive components of the H.264 encoder parallelized, but the control-intensive components, such as CAVLC and the deblocking filter, are also realized effectively. In addition, we propose serial optimization methods, including multiresolution multiwindow motion estimation, a multilevel parallel strategy to enhance the parallelism of intracoding as much as possible, component-based parallel CAVLC, and a direction-priority deblocking filter. More than 96% of the H.264 encoder workload is offloaded to the GPU. Experimental results show that the parallel implementation achieves a 20x speedup over the serial program and satisfies the requirement of real-time HD encoding at 30 fps. The loss of PSNR ranges from 0.14 dB to 0.77 dB at the same bitrate. Through analysis of the kernels, we found that the speedup ratios of the compute-intensive algorithms are proportional to the computational power of the GPU, whereas the performance of the control-intensive parts (CAVLC) is closely related to memory bandwidth, which gives an insight for new architecture designs.
38

Blyth, Simon. "Integration of JUNO simulation framework with Opticks: GPU accelerated optical propagation via NVIDIA® OptiX™." EPJ Web of Conferences 251 (2021): 03009. http://dx.doi.org/10.1051/epjconf/202125103009.

Abstract:
Opticks is an open source project that accelerates optical photon simulation by integrating NVIDIA GPU ray tracing, accessed via NVIDIA OptiX, with Geant4 toolkit based simulations. A single NVIDIA Turing architecture GPU has been measured to provide optical photon simulation speedup factors exceeding 1500x over single-threaded Geant4 with a full JUNO analytic GPU geometry automatically translated from the Geant4 geometry. Optical physics processes of scattering, absorption, scintillator reemission and boundary processes are implemented within CUDA OptiX programs based on the Geant4 implementations. Wavelength-dependent material and surface properties as well as inverse cumulative distribution functions for reemission are interleaved into GPU textures, providing fast interpolated property lookup or wavelength generation. In this work we describe major recent developments to facilitate integration of Opticks with the JUNO simulation framework, including on-GPU collection efficiency hit culling, which substantially reduces both the CPU memory needed for photon hits and copying overheads. Progress with the migration of Opticks to the all-new NVIDIA OptiX 7 API is also described.
39

Gomes, Raphael De Souza Rosa, Victor Hugo De Morais Danelichen, Marcelo Sacardi Biudes, Maisa Caldas Souza Velasque, Josiel Maimome De Figueiredo, and José De Souza Nogueira. "The Surface Energy Balance Algorithm for Land (SEBAL) framework in Graphics Processing Units (GPU) using Cuda and OpenCL." Revista Brasileira de Geografia Física 14, no. 3 (July 20, 2021): 1805. http://dx.doi.org/10.26848/rbgf.v14.3.p1805-1814.

40

Cuomo, S., A. Galletti, G. Giunta, and L. Marcellino. "Toward a Multi-level Parallel Framework on GPU Cluster with PetSC-CUDA for PDE-based Optical Flow Computation." Procedia Computer Science 51 (2015): 170–79. http://dx.doi.org/10.1016/j.procs.2015.05.220.

41

Holmvall, P., N. Wall Wennerdal, M. Håkansson, P. Stadler, O. Shevtsov, T. Löfwander, and M. Fogelström. "SuperConga: An open-source framework for mesoscopic superconductivity." Applied Physics Reviews 10, no. 1 (March 2023): 011317. http://dx.doi.org/10.1063/5.0100324.

Abstract:
We present SuperConga, an open-source framework for simulating equilibrium properties of unconventional and ballistic singlet superconductors, confined to two-dimensional (2D) mesoscopic grains in a perpendicular external magnetic field, at arbitrary low temperatures. It aims at being both fast and easy to use, enabling research without access to a computer cluster, and visualization in real-time with OpenGL. The core is written in C++ and CUDA, exploiting the embarrassingly parallel nature of the quasiclassical theory of superconductivity by utilizing the parallel computational power of modern graphics processing units. The framework self-consistently computes both the superconducting order-parameter and the induced vector potential and finds the current density, free energy, induced flux density, local density of states (LDOS), and the magnetic moment. A user-friendly Python frontend is provided, enabling simulation parameters to be defined via intuitive configuration files, or via the command-line interface, without requiring a deep understanding of implementation details. For example, complicated geometries can be created with relative ease. The framework ships with simple tools for analyzing and visualizing the results, including an interactive plotter for spectroscopy. An overview of the theory is presented, as well as examples showcasing the framework's capabilities and ease of use. The framework is free to download from https://gitlab.com/superconga/superconga , which also links to the extensive user manual, containing even more examples, tutorials, and guides. To demonstrate and benchmark SuperConga, we study the magnetostatics, thermodynamics, and spectroscopy of various phenomena. In particular, we study flux quantization in solenoids, vortex physics, surface Andreev bound-states, and a “phase crystal.” We compare our numeric results with analytics and present experimental observables, e.g., the magnetic moment and LDOS, measurable with, for example, scanning probes, STM, and magnetometry.
42

Wang, Zhenwu, Benting Wan, and Mengjie Han. "A Three-Dimensional Visualization Framework for Underground Geohazard Recognition on Urban Road-Facing GPR Data." ISPRS International Journal of Geo-Information 9, no. 11 (November 11, 2020): 668. http://dx.doi.org/10.3390/ijgi9110668.

Abstract:
The identification of underground geohazards is always a difficult issue in the field of underground public safety. This study proposes an interactive visualization framework for underground geohazard recognition on urban roads, which constructs a complete recognition workflow incorporating data collection, preprocessing, modeling, rendering and analysis. In this framework, two proposed sampling point selection methods are adopted to enhance the interpolation accuracy of the Kriging algorithm based on ground penetrating radar (GPR) technology. An improved Kriging algorithm is put forward that applies a particle swarm optimization (PSO) algorithm to optimize the Kriging parameters and adopts the Compute Unified Device Architecture (CUDA) to run the PSO algorithm in parallel on the GPU in order to raise the interpolation efficiency. Furthermore, a layer-constrained triangulated irregular network algorithm is proposed to construct the 3D geohazard bodies, and the space geometry method is used to compute their volume information. The study also presents an implementation system to demonstrate the application of the framework and its related algorithms. This system makes a significant contribution to the demonstration and understanding of underground geohazard recognition in a three-dimensional environment.
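Running PSO on the GPU typically means evaluating all particles' fitness in parallel, one thread per candidate Kriging parameter vector. A minimal sketch with a hypothetical two-parameter exponential variogram scored by squared error at validation points; the actual objective in the paper may differ:

// One thread per PSO particle: score a candidate (range, sill) pair against
// held-out validation points; the host-side PSO loop updates velocities.
__global__ void psoFitness(const float* particles,     // n_particles x 2, packed
                           const float* val_dist, const float* val_gamma,
                           float* fitness, int n_particles, int n_val) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= n_particles) return;
    float range = particles[2 * p], sill = particles[2 * p + 1];
    float err = 0.0f;
    for (int v = 0; v < n_val; ++v) {
        float pred = sill * (1.0f - expf(-val_dist[v] / range));  // variogram model
        float d = pred - val_gamma[v];
        err += d * d;
    }
    fitness[p] = err;
}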
43

Dressler, Sven, and Daniel N. Wilke. "PyBONDEM-GPU: A discrete element bonded particle Python research framework – Development and examples." EPJ Web of Conferences 249 (2021): 14009. http://dx.doi.org/10.1051/epjconf/202124914009.

Abstract:
Discrete element modelling (DEM) is widely used to simulate granular systems, nowadays routinely on graphics processing units. Graphics processing units (GPUs) are inherently designed for parallel computation, and recent advances in architecture, compiler design and language development are allowing general-purpose computation to be performed on multiple GPUs. Application of DEM to bonded particle systems is much less common, with a number of open research questions remaining. This study outlines a bonded-particle research DEM framework, PyBONDEM-GPU, written in Python. The framework leverages the parallel nature of GPUs for computational speed-up and the rapid-prototyping flexibility of Python, which is quicker and easier to learn than classical compiled languages, making computational simulation development accessible to undergraduate and graduate engineers. PyBONDEM-GPU leverages the Numba-CUDA module to compile Python syntax for execution on GPUs. The framework enables research on fibre pull-out from fibre-matrix embeddings; bonds are simulated between all interacting particles. The performance of PyBONDEM-GPU is compared against Python CPU implementations of PyBONDEM using the Numpy and Numba-CPU Python modules. PyBONDEM-GPU was found to be 1000 times faster than the Numpy implementation and 4 times faster than the Numba-CPU implementation at resolving forces and integrating the equations of motion.
44

Adeshina, Adekunle. "Automated Medical Visualization: Application of Supervised Learning to Clinical Diagnosis, Disease and Therapy Management." SLU Journal of Science and Technology 5, no. 1&2 (December 29, 2022): 104–14. http://dx.doi.org/10.56471/slujst.v5i.311.

Abstract:
The rapid advancement of high-performance computing, ultrafast computing, autonomous technologies, and the complexity of biomedical data for visualization and image guidance play a significant role in modern surgery, helping surgeons perform their procedures. Brain tumour diagnosis requires an enhanced, effective, and accurate 3-D visualization system for navigation, reference, diagnosis, and documentation. An automatic and effective 3-D high-performance artificial-intelligence-enabled medical visualization framework was designed and implemented using automated machine learning (AutoML), which takes advantage of the complexity of the underlying datasets to help specialists identify the most appropriate regions of interest and their associated hyperparameters for optimizing performance, while simultaneously attempting to maximize the reliability of the resulting predictions. C# and the Compute Unified Device Architecture (CUDA) in the Microsoft .NET environment, compared side by side with Visual Basic, were used for the implementation. The framework was evaluated for rendering speed with brain datasets obtained from the Department of Surgery, University of North Carolina, United States. Interestingly, our framework achieves 3-D visualization of the human brain reliable enough to detect and locate possible brain tumors with high interactive speed and accuracy.
45

Luo, Xun, Wei Zhao, Bao Shun Liu, and Yan Ming Zhang. "A Feasible Parallel Monte Carlo Algorithm to Simulate Templated Grain Growth." Advanced Materials Research 332-334 (September 2011): 1868–71. http://dx.doi.org/10.4028/www.scientific.net/amr.332-334.1868.

Abstract:
A parallel Monte Carlo algorithm is proposed to simulate templated grain growth in sintered ceramic materials. The algorithm applies the general Potts model, treating the matrix as discrete lattice sites for simulating grain growth, with many lattice sites computed synchronously. The scheme is implemented in the CUDA GPU parallel programming framework, which is far more feasible and lower in cost than the former conventional program. Most importantly, the parallel algorithm has excellent temporal performance, meaning it takes less time to complete a simulation. The results of comparative experiments show that the algorithm is unquestionably effective, while the other statistical numerical features of the simulations remain almost the same.
46

Fedorov, Eugene, Peter Nikolyuk, Olga Nechporenko, and Esta Chioma. "Intellectualization of a method for solving a logistics problem to optimize costs within the framework of Lean Production technology." Electronic Scientific Journal Intellectualization of Logistics and Supply Chain Management #1 2020 1, no. 3 (2020): 7–17. http://dx.doi.org/10.46783/smart-scm/2020-3-1.

Abstract:
In this article, within the framework of intellectualizing Lean Production technology, it is proposed to optimize the costs arising from insufficiently efficient placement of goods in the warehouse by creating an optimization method based on the immune metaheuristic of the T-cell model, which allows solving the knapsack constrained optimization problem. The proposed metaheuristic method does not require specifying the probability of mutation, the number of mutations, or the number of selected new cells, and it uses only binary potential solutions, which makes discrete optimization possible and reduces computational complexity by preventing repeated transformations of real potential solutions into intermediate binary ones and vice versa. An immune metaheuristic algorithm based on the T-cell model has been created, intended for implementation on the GPU using CUDA parallel processing technology. The proposed optimization method based on immune metaheuristics can be used to intellectualize Lean Production technology. The prospect for further research is to test the proposed methods on a wider set of test databases.
47

Yuan, Jianying, Dequan Guo, Gexiang Zhang, Prithwineel Paul, Ming Zhu, and Qiang Yang. "A Resolution-Free Parallel Algorithm for Image Edge Detection within the Framework of Enzymatic Numerical P Systems." Molecules 24, no. 7 (March 29, 2019): 1235. http://dx.doi.org/10.3390/molecules24071235.

Abstract:
Image edge detection is a fundamental problem in image processing and computer vision, particularly in the area of feature extraction. However, in the conventional serial computing mode, the time complexity increases quadratically with image resolution, which becomes unbearably time-consuming when dealing with large amounts of image data. In this paper, a novel resolution-free parallel implementation algorithm for gradient-based edge detection, namely EDENP, is proposed. The key point of our method is the introduction of an enzymatic numerical P system (ENPS) to design a parallel computing algorithm for image processing for the first time. The proposed algorithm is based on a cell-like P system with a nested membrane structure containing four membranes. The start and stop of the system are controlled by the variables in the skin membrane, and the edge detection calculation is performed in the inner three membranes in a parallel way. The performance and efficiency of this algorithm are evaluated on the CUDA platform. The main advantage of EDENP is that a time complexity of O(1) can theoretically be achieved regardless of image resolution.
48

Ho, Nhut-Minh, Himeshi De silva, and Weng-Fai Wong. "GRAM." ACM Transactions on Architecture and Code Optimization 18, no. 2 (March 2021): 1–24. http://dx.doi.org/10.1145/3441830.

Abstract:
This article presents GRAM (GPU-based Runtime Adaption for Mixed-precision), a framework for the effective use of mixed-precision arithmetic in CUDA programs. Our method provides a fine-grain tradeoff between output error and performance. It can create many variants that satisfy different accuracy requirements by adaptively assigning different groups of threads to different precision levels at runtime. To widen the range of applications that can benefit from its approximation, GRAM comes with an optional half-precision approximate math library. Using GRAM, we can trade off precision for performance improvements of up to 540%, depending on the application and accuracy requirement.
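The half-precision path such a framework dispatches to can be written with CUDA's packed __half2 type, which processes two values per instruction. A minimal sketch of one kernel at two precision levels; the dispatch policy itself is GRAM's contribution and is not reproduced here:

#include <cuda_fp16.h>

__global__ void scaleFloat(const float* in, float* out, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a * in[i];            // full-precision reference path
}

__global__ void scaleHalf2(const __half2* in, __half2* out, __half2 a, int n2) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) out[i] = __hmul2(a, in[i]);   // two half values per operation
}

// Host side: __half2 a2 = __float2half2_rn(scale); launch scaleHalf2 over n/2
// packed elements; measured error against the float path decides which groups
// of threads may take the low-precision variant.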
49

Tang, Chin-I., Xianyue Deng, and Yuzuru Takashima. "Real-Time CGH Generation by CUDA-OpenGL Interoperability for Adaptive Beam Steering with a MEMS Phase SLM." Micromachines 13, no. 9 (September 15, 2022): 1527. http://dx.doi.org/10.3390/mi13091527.

Abstract:
Real-time, simultaneous, and adaptive beam steering into multiple regions of interest replaces conventional raster scanning with a less time-consuming and flexible beam steering framework, where only regions of interest are scanned by a laser beam. CUDA-OpenGL interoperability with a computationally time-efficient computer-generated hologram (CGH) calculation algorithm enables such beam steering by employing a MEMS-based phase light modulator (PLM) and a Texas Instruments Phase Light Modulator (TI-PLM). The real-time CGH generation and display algorithm is incorporated into the beam steering system with variable power and scan resolution, which are adaptively controlled by camera-based object recognition. With a mid-range laptop GPU and the current version of the MEMS-PLM, the demonstrated scanning speed can exceed 1000 points/s (number of beams > 5) and potentially exceeds 4000 points/s with state-of-the-art GPUs.
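The CUDA-OpenGL interoperability used here lets the hologram kernel write straight into an OpenGL buffer that the display path consumes, avoiding a host round trip per frame. A minimal sketch of the register/map/unmap cycle; the GL buffer creation and the CGH kernel itself are assumed and omitted:

#include <cuda_gl_interop.h>

// gl_buffer is an existing OpenGL buffer object (e.g., a pixel buffer).
void writeCghIntoGlBuffer(unsigned int gl_buffer) {
    cudaGraphicsResource* res = nullptr;
    cudaGraphicsGLRegisterBuffer(&res, gl_buffer,
                                 cudaGraphicsMapFlagsWriteDiscard);

    cudaGraphicsMapResources(1, &res, 0);           // hand the buffer to CUDA
    unsigned char* d_ptr = nullptr;
    size_t n_bytes = 0;
    cudaGraphicsResourceGetMappedPointer((void**)&d_ptr, &n_bytes, res);

    // cghKernel<<<grid, block>>>(d_ptr, ...);      // hypothetical CGH kernel

    cudaGraphicsUnmapResources(1, &res, 0);         // return it to OpenGL
    cudaGraphicsUnregisterResource(res);
}

In a real-time loop the registration would be done once at startup, with only the map, kernel launch, and unmap repeated per frame.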
50

Peng, Feng, Xiaoli Hao, and Fuxin Chai. "A GPU-Accelerated Two-Dimensional Hydrodynamic Model for Unstructured Grids." Water 15, no. 7 (March 25, 2023): 1300. http://dx.doi.org/10.3390/w15071300.

Abstract:
The precision of numerical overland flow models is limited by their computational cost. A GPU-accelerated 2D shallow-flow model is developed to overcome this challenge in this study. The model employs a Godunov-type finite volume method (FVM) to solve the shallow water equations (SWEs) on unstructured grids, while also considering rainfall, infiltration, bottom slope, and friction source terms. Numerical simulation demonstrates that the model is well-balanced and robust. In an experiment on urban rain-runoff and flooding, the accuracy and stability of the model are further demonstrated. The model is programmed with CUDA, and each numerical computation term is processed in parallel using multi-thread GPU acceleration technology. With the GPU computation framework, this model achieves a speedup ratio of around 75 relative to a single-threaded CPU for a large-scale dam-break flow application.