Journal articles on the topic 'GPU pipeline'


Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'GPU pipeline.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Magro, A., K. Zarb Adami, and J. Hickish. "GPU-Powered Coherent Beamforming." Journal of Astronomical Instrumentation 04, no. 01n02 (June 2015): 1550002. http://dx.doi.org/10.1142/s2251171715500026.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Graphics processing unit (GPU)-based beamforming is a relatively unexplored area in radio astronomy, possibly due to the assumption that any such system will be severely limited by the PCIe bandwidth required to transfer data to the GPU. We have developed a CUDA-based GPU implementation of a coherent beamformer, specifically designed and optimized for deployment at the BEST-2 array, which can generate an arbitrary number of synthesized beams for a wide range of parameters. It achieves [Formula: see text] TFLOPs on an NVIDIA Tesla K20, approximately 10x faster than an optimized, multithreaded CPU implementation. This kernel has been integrated into two real-time, GPU-based time-domain software pipelines deployed at the BEST-2 array in Medicina: a standalone beamforming pipeline and a transient detection pipeline. We present performance benchmarks for the beamforming kernel and for the transient detection pipeline with beamforming capabilities, along with results of test observations.
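At its core, the coherent beamforming such a kernel performs is a phase-weighted sum over antenna voltages. The following pure-Python sketch illustrates only that idea, with a hypothetical 4-antenna array and made-up phase delays (it is not the paper's CUDA kernel):

```python
import cmath

def coherent_beamform(samples, weights):
    """Sum per-antenna complex samples after applying phase weights.

    samples: per-antenna complex voltages for one time instant
    weights: complex steering weights (one per antenna)
    """
    return sum(s * w for s, w in zip(samples, weights))

# Hypothetical steering weights: unit-magnitude phasors that compensate
# each antenna's geometric delay for the chosen look direction.
phases = [0.0, 0.1, 0.2, 0.3]
weights = [cmath.exp(-1j * p) for p in phases]

# A signal arriving with exactly those phase offsets adds coherently,
# so the beamformed amplitude approaches the number of antennas (~4.0).
aligned = [cmath.exp(1j * p) for p in phases]
print(abs(coherent_beamform(aligned, weights)))
```

A real pipeline applies this sum per frequency channel and per beam, which is what makes the problem embarrassingly parallel and GPU-friendly.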
2

Movania, Muhammad Mobeen, and Lin Feng. "A Novel GPU-Based Deformation Pipeline." ISRN Computer Graphics 2012 (December 15, 2012): 1–8. http://dx.doi.org/10.5402/2012/936315.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
We present a new deformation pipeline that is independent of the integration solver used and allows fast rendering of deformable soft bodies on the GPU. The proposed method exploits the transform feedback mechanism of the modern GPU to bypass the CPU read-back, thus reusing the modified positions and/or velocities of the deformable object in a single pass in real time. The whole process is carried out on the GPU. Prior approaches have resorted to CPU read-back along with the GPGPU mechanism; in contrast, our approach does not require these steps, saving GPU bandwidth for other tasks. We describe our algorithm along with implementation details on the modern GPU and conclude with a look at the experimental results. We show how easy it is to integrate any existing integration solver into the proposed pipeline by implementing explicit Euler integration in the vertex shader on the GPU.
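The explicit Euler solver the authors move into the vertex shader is, by itself, a simple position/velocity update. A CPU-side Python sketch of that integration step, using an illustrative point mass rather than the paper's shader code:

```python
def euler_step(pos, vel, force, mass, dt):
    """One explicit Euler step: v += (F/m)*dt, then x += v*dt."""
    ax, ay, az = (f / mass for f in force)
    vel = (vel[0] + ax * dt, vel[1] + ay * dt, vel[2] + az * dt)
    pos = (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt, pos[2] + vel[2] * dt)
    return pos, vel

# A particle under gravity, starting at rest: after a few steps it has
# acquired downward velocity and its height has decreased.
p, v = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0)
g = (0.0, -9.81, 0.0)
for _ in range(10):
    p, v = euler_step(p, v, g, mass=1.0, dt=0.01)
```

In the transform-feedback formulation, this per-particle update runs in the vertex shader and the updated positions are captured directly into a GPU buffer, avoiding any CPU round trip.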
3

Vasyliv, O. B., O. S. Titlov, and T. A. Sagala. "Modeling of the modes of natural gas transportation by main gas pipelines in the conditions of underloading." Oil and Gas Power Engineering, no. 2(32) (December 27, 2019): 35–42. http://dx.doi.org/10.31471/1993-9868-2019-2(32)-35-42.

Abstract:
The current state of transit of natural gas through the Ukrainian gas transmission system (GTS) is assessed in the paper. The prerequisites for further reduction of the GTS load in the coming years are considered, in particular in the direction of Europe through the gas measuring station "Orlivka" (southern direction), taking into account the construction of alternative bypass gas pipelines. Based on a review of the literature on the problem of efficient operation of gas pipelines under conditions of underloading, a method for determining the capacity and energy consumption of a gas pipeline for a given combination of working gas pumping units (GPU) was developed. The Tarutino-Orlivka section of the Ananyev-Tiraspol-Izmail gas pipeline was selected as the object of research. The methodology includes the calculation of the physical properties of gas from its composition, the calculation of gas compression, the calculation of the linear part, the gas flow to the compressor station's own needs, and the calculation of the total power of the gas-pumping units under the specified technological limitations. With the help of original software developed in the MATLAB programming language, cyclical multivariate calculations of the capacity and energy consumption of the gas pipeline were carried out and the operating modes of the compressor shop were optimized in the load range of 23–60 million m³/day. The optimization criterion is the minimum total capacity of the GPU; the variable parameters are the speeds of the superchargers, the combination of working GPU, and the load factor. Based on the optimization results, graphical dependences were constructed: the optimum rotor speed of the supercharger versus the throughput of the pipeline, and the changes in power and pressure versus the throughput of the pipeline when operating different combinations of superchargers.
Recommendations have been developed to minimize fuel gas costs at the compressor station.
4

Kingyens, Jeffrey, and J. Gregory Steffan. "The Potential for a GPU-Like Overlay Architecture for FPGAs." International Journal of Reconfigurable Computing 2011 (2011): 1–15. http://dx.doi.org/10.1155/2011/514581.

Abstract:
We propose a soft processor programming model and architecture inspired by graphics processing units (GPUs) that are well-matched to the strengths of FPGAs, namely, highly parallel and pipelinable computation. In particular, our soft processor architecture exploits multithreading, vector operations, and predication to supply a floating-point pipeline of 64 stages via hardware support for up to 256 concurrent thread contexts. The key new contributions of our architecture are mechanisms for managing threads and register files that maximize data-level and instruction-level parallelism while overcoming the challenges of port limitations of FPGA block memories as well as memory and pipeline latency. Through simulation of a system that (i) is programmable via NVIDIA's high-level Cg language, (ii) supports AMD's CTM r5xx GPU ISA, and (iii) is realizable on an XtremeData XD1000 FPGA-based accelerator system, we demonstrate the potential for such a system to achieve 100% utilization of a deeply pipelined floating-point datapath.
5

Wang, Ke Nian, and Hui Min Du. "The FPGA Design and Implementation of Pipeline Image Processing in the GPU System." Applied Mechanics and Materials 380-384 (August 2013): 3807–10. http://dx.doi.org/10.4028/www.scientific.net/amm.380-384.3807.

Abstract:
In the GPU system, pipeline image processing faces several problems: a large amount of data to process, complicated processing procedures, many data transmission channels, etc. All of these lead to low processing speed and a large circuit area. This paper proposes an FPGA design for pipeline image processing in the GPU. The design has been implemented with a foam extrusion pipeline architecture and validated on a Xilinx Virtex XC6VLX550T FPGA. The results show that resource consumption is 390726.09 and the speed is 200 MHz.
6

Xiang, Yue, Peng Wang, Bo Yu, and Dongliang Sun. "GPU-accelerated hydraulic simulations of large-scale natural gas pipeline networks based on a two-level parallel process." Oil & Gas Science and Technology – Revue d’IFP Energies nouvelles 75 (2020): 86. http://dx.doi.org/10.2516/ogst/2020076.

Abstract:
The numerical simulation efficiency of large-scale natural gas pipeline networks is usually unsatisfactory. In this paper, Graphics Processing Unit (GPU)-accelerated hydraulic simulations for large-scale natural gas pipeline networks are presented. First, based on the Decoupled Implicit Method for Efficient Network Simulation (DIMENS) method presented in our previous study, a novel two-level parallel simulation process and the corresponding parallel numerical method for hydraulic simulations of natural gas pipeline networks are proposed. Then, the implementation of the two-level parallel simulation on the GPU is introduced in detail. Finally, some numerical experiments are provided to test the performance of the proposed method. The results show that the proposed method yields a notable speedup. For five large-scale pipe networks, compared with the well-known commercial simulation software SPS, the speedup ratio of the proposed method is up to 57.57 with comparable calculation accuracy. More encouragingly, the proposed method adapts well to large pipeline networks: the larger the pipeline network, the larger the speedup ratio. The speedup ratio of the GPU method depends approximately linearly on the total number of discrete points in the network.
7

Akyüz, Ahmet Oğuz. "High dynamic range imaging pipeline on the GPU." Journal of Real-Time Image Processing 10, no. 2 (September 12, 2012): 273–87. http://dx.doi.org/10.1007/s11554-012-0270-9.

8

Cao, Wei, Zheng Hua Wang, and Chuan Fu Xu. "A Survey of General Purpose Computation of GPU for Computational Fluid Dynamics." Advanced Materials Research 753-755 (August 2013): 2731–35. http://dx.doi.org/10.4028/www.scientific.net/amr.753-755.2731.

Abstract:
The graphics processing unit (GPU) has evolved from a configurable graphics processor to a powerful engine for high-performance computing. In this paper, we describe the graphics pipeline of the GPU and introduce the history and evolution of GPU architecture. We also provide a summary of software environments used on the GPU, from graphics APIs to non-graphics APIs. Finally, we present GPU computing in computational fluid dynamics applications, including GPGPU computing for Navier-Stokes equation methods and for the lattice Boltzmann method.
9

Abdellah, Marwan, Ayman Eldeib, and Amr Sharawi. "High Performance GPU-Based Fourier Volume Rendering." International Journal of Biomedical Imaging 2015 (2015): 1–13. http://dx.doi.org/10.1155/2015/590727.

Abstract:
Fourier volume rendering (FVR) is a significant visualization technique that has been used widely in digital radiography. As a result of its O(N² log N) time complexity, it provides a faster alternative to spatial domain volume rendering algorithms that are O(N³) computationally complex. Relying on the Fourier projection-slice theorem, this technique operates on the spectral representation of a 3D volume instead of processing its spatial representation to generate attenuation-only projections that look like X-ray radiographs. Due to the rapid evolution of its underlying architecture, the graphics processing unit (GPU) became an attractive competent platform that can deliver giant computational raw power compared to the central processing unit (CPU) on a per-dollar basis. The introduction of the compute unified device architecture (CUDA) technology enables embarrassingly-parallel algorithms to run efficiently on CUDA-capable GPU architectures. In this work, a high performance GPU-accelerated implementation of the FVR pipeline on CUDA-enabled GPUs is presented. This proposed implementation can achieve a speed-up of 117x compared to a single-threaded hybrid implementation that uses the CPU and GPU together by taking advantage of executing the rendering pipeline entirely on recent GPU architectures.
10

Cheng, Sining, Huiyan Qu, and Xianjun Chen. "Ray tracing collision detection based on GPU pipeline reorganization." Journal of Physics: Conference Series 1732 (January 2021): 012057. http://dx.doi.org/10.1088/1742-6596/1732/1/012057.

11

Gong, Qian, Esteban Vera, Dathon R. Golish, Steven D. Feller, David J. Brady, and Michael E. Gehm. "Model-Based Multiscale Gigapixel Image Formation Pipeline on GPU." IEEE Transactions on Computational Imaging 3, no. 3 (September 2017): 493–502. http://dx.doi.org/10.1109/tci.2016.2612942.

12

Fu, Zhisong, T. James Lewis, Robert M. Kirby, and Ross T. Whitaker. "Architecting the finite element method pipeline for the GPU." Journal of Computational and Applied Mathematics 257 (February 2014): 195–211. http://dx.doi.org/10.1016/j.cam.2013.09.001.

13

Ye, Chang, Yuchen Li, Shixuan Sun, and Wentian Guo. "gSWORD: GPU-accelerated Sampling for Subgraph Counting." Proceedings of the ACM on Management of Data 2, no. 1 (March 12, 2024): 1–26. http://dx.doi.org/10.1145/3639288.

Abstract:
Subgraph counting is a fundamental component for many downstream applications such as graph representation learning and query optimization. Since obtaining the exact count is often intractable, there has been a plethora of approximation methods based on graph sampling techniques. Nonetheless, the state-of-the-art sampling methods still require massive samples to produce accurate approximations on large data graphs. We propose gSWORD, a GPU framework that leverages the massive parallelism of GPUs to accelerate iterative sampling algorithms for subgraph counting. Despite the embarrassingly parallel nature of the samples, there are unique challenges in accelerating subgraph counting due to its irregular computation logic. To address these challenges, we introduce two GPU-centric optimizations: (1) sample inheritance, enabling threads to inherit samples from neighboring threads to avoid idling, and (2) warp streaming, effectively distributing workloads among threads through a streaming process. Moreover, we propose a CPU-GPU co-processing pipeline that overlaps the sampling and enumeration processes to mitigate the underestimation issue. Experimental results demonstrate that deploying state-of-the-art sampling algorithms on gSWORD can perform millions of samples per second. The co-processing pipeline substantially improves the estimation accuracy in cases where existing methods encounter severe underestimation, with negligible overhead.
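For intuition on the kind of iterative sampling estimator a framework like gSWORD accelerates, here is a textbook edge-sampling triangle estimator in plain Python (a generic scheme for illustration, not gSWORD's algorithm):

```python
import random

def estimate_triangles(edges, adj, n_samples, rng):
    """Estimate the triangle count: sample an edge uniformly, count the
    common neighbors of its endpoints, then rescale by |E| / 3
    (each triangle is seen once from each of its 3 edges)."""
    total = 0
    for _ in range(n_samples):
        u, v = rng.choice(edges)
        total += len(adj[u] & adj[v])
    return total * len(edges) / (3 * n_samples)

# A 4-clique: 6 edges, 4 triangles.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
adj = {u: set() for u in range(4)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

rng = random.Random(7)
print(estimate_triangles(edges, adj, 3000, rng))  # 4.0 (exact for a clique:
# every sampled edge sees the same number of common neighbors)
```

Each sample is independent, which is exactly why such estimators map naturally onto thousands of GPU threads; the irregularity gSWORD addresses comes from the per-sample neighbor-intersection work varying wildly across threads.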
14

Kim, Do-Hyun, and Chi-Yong Kim. "Design of a SIMT architecture GP-GPU Using Tile based on Graphic Pipeline Structure." Journal of IKEEE 20, no. 1 (March 31, 2016): 75–81. http://dx.doi.org/10.7471/ikeee.2016.20.1.075.

15

Georgii, Joachim, and Rudiger Westermann. "A Generic and Scalable Pipeline for GPU Tetrahedral Grid Rendering." IEEE Transactions on Visualization and Computer Graphics 12, no. 5 (September 2006): 1345–52. http://dx.doi.org/10.1109/tvcg.2006.110.

16

Kenzel, Michael, Bernhard Kerbl, Dieter Schmalstieg, and Markus Steinberger. "A high-performance software graphics pipeline architecture for the GPU." ACM Transactions on Graphics 37, no. 4 (August 10, 2018): 1–15. http://dx.doi.org/10.1145/3197517.3201374.

17

Hou, Yi, Rongke Liu, Hao Peng, and Ling Zhao. "High Throughput Pipeline Decoder for LDPC Convolutional Codes on GPU." IEEE Communications Letters 19, no. 12 (December 2015): 2066–69. http://dx.doi.org/10.1109/lcomm.2015.2486764.

18

Magro, A., J. Hickish, and K. Z. Adami. "Multibeam GPU Transient Pipeline for the Medicina BEST-2 Array." Journal of Astronomical Instrumentation 02, no. 01 (September 2013): 1350008. http://dx.doi.org/10.1142/s2251171713500086.

Abstract:
Radio transient discovery using next generation radio telescopes will pose several digital signal processing and data transfer challenges, requiring specialized high-performance backends. Several accelerator technologies are being considered as prototyping platforms, including Graphics Processing Units (GPUs). In this paper we present a real-time pipeline prototype capable of processing multiple beams concurrently, performing Radio Frequency Interference (RFI) rejection through thresholding, correcting for the delay in signal arrival times across the frequency band using brute-force dedispersion, event detection and clustering, and finally candidate filtering, with the capability of persisting data buffers containing interesting signals to disk. This setup was deployed at the BEST-2 SKA pathfinder in Medicina, Italy, where several benchmarks and test observations of astrophysical transients were conducted. These tests show that on the deployed hardware eight 20 MHz beams can be processed simultaneously for ~640 Dispersion Measure (DM) values. Furthermore, the clustering and candidate filtering algorithms employed prove to be good candidates for online event detection techniques. The number of beams which can be processed increases proportionally to the number of servers deployed and number of GPUs, making it a viable architecture for current and future radio telescopes.
19

Braga, Giani, Marcio M. Gonçalves, and José Rodrigo Azambuja. "Software-controlled pipeline parity in GPU architectures for error detection." Microelectronics Reliability 148 (September 2023): 115155. http://dx.doi.org/10.1016/j.microrel.2023.115155.

20

Li, Ping, Hanqiu Sun, Jianbing Shen, and Chen Huang. "HDR Image Rerendering Using GPU-Based Processing." International Journal of Image and Graphics 12, no. 01 (January 2012): 1250007. http://dx.doi.org/10.1142/s0219467812500076.

Abstract:
One essential process in image rerendering is to replace existing texture in the region of interest with other user-preferred textures, while preserving the shading and similar texture distortion. In this paper, we propose graphics processing unit (GPU)-accelerated high dynamic range (HDR) image rerendering using revisited NLM processing in parallel on the GPU-CUDA platform, to reproduce realistic rendering of HDR images with retexturing and transparent/translucent effects. Our image-based approach using a GPU-based pipeline in the gradient domain provides efficient processing with easy-to-control image retexturing and special shading effects. The experimental results showed the efficiency and high-quality performance of our approach.
21

Garba, Michael T., and Horacio González-Vélez. "Asymptotic Peak Utilisation in Heterogeneous Parallel CPU/GPU Pipelines: A Decentralised Queue Monitoring Strategy." Parallel Processing Letters 22, no. 02 (May 16, 2012): 1240008. http://dx.doi.org/10.1142/s0129626412400087.

Abstract:
Widespread heterogeneous parallelism is unavoidable given the emergence of General-Purpose computing on graphics processing units (GPGPU). The characteristics of a Graphics Processing Unit (GPU)—including significant memory transfer latency and complex performance characteristics—demand new approaches to ensuring that all available computational resources are efficiently utilised. This paper considers the simple case of a divisible workload based on widely-used numerical linear algebra routines and the challenges that prevent efficient use of all resources available to a naive SPMD application using the GPU as an accelerator. We suggest a possible queue monitoring strategy that facilitates resource usage with a view to balancing the CPU/GPU utilisation for applications that fit the pipeline parallel architectural pattern on heterogeneous multicore/multi-node CPU and GPU systems. We propose a stochastic allocation technique that may serve as a foundation for heuristic approaches to balancing CPU/GPU workloads.
22

Um, Taegeon, Byungsoo Oh, Byeongchan Seo, Minhyeok Kweun, Goeun Kim, and Woo-Yeon Lee. "FastFlow: Accelerating Deep Learning Model Training with Smart Offloading of Input Data Pipeline." Proceedings of the VLDB Endowment 16, no. 5 (January 2023): 1086–99. http://dx.doi.org/10.14778/3579075.3579083.

Abstract:
When training a deep learning (DL) model, input data are pre-processed on CPUs and transformed into tensors, which are then fed into GPUs for gradient computations of model training. Expensive GPUs must be fully utilized during training to accelerate the training speed. However, intensive CPU operations for input data preprocessing (input pipeline) often lead to CPU bottlenecks; correspondingly, various DL training jobs suffer from GPU under-utilization. We propose FastFlow, a DL training system that automatically mitigates the CPU bottleneck by offloading (scaling out) input pipelines to remote CPUs. FastFlow carefully decides various offloading decisions based on performance metrics specific to applications and allocated resources, while leveraging both local and remote CPUs to prevent the inefficient use of remote resources and minimize the training time. FastFlow's smart offloading policy and mechanisms are seamlessly integrated with TensorFlow for users to enjoy the smart offloading features without modifying the main logic. Our evaluations on our private DL cloud with diverse workloads on various resource environments show that FastFlow improves the training throughput by 1 ~ 4.34X compared to TensorFlow without offloading, by 1 ~ 4.52X compared to TensorFlow with manual CPU offloading (tf.data.service), and by 0.63 ~ 2.06X compared to GPU offloading (DALI).
23

Mileff, Péter, and Judit Dudra. "Effective Pixel Rendering in Practice." Production Systems and Information Engineering 10, no. 1 (2022): 1–15. http://dx.doi.org/10.32968/psaie.2022.1.1.

Abstract:
The graphics processing unit (GPU) has become an integral part of our lives through both desktop and portable devices. Thanks to dedicated hardware, visualization has been significantly accelerated, and software today uses the GPU for rasterization. As a result of this development, rendering is now exclusively triangle-based, and pixel-based image manipulations can only be performed using shaders. It can be stated that today's GPU pipeline cannot provide the same flexibility as earlier software implementations. This paper discusses an efficient software implementation of pixel-based rasterization. After reviewing the current GPU-based drawing process, we show how to access pixel-level drawing in this environment. Finally, a storage and display format more efficient than the classic solution is presented, whose performance far exceeds the previous solution.
24

Carrazza, Stefano, Juan Cruz-Martinez, Marco Rossi, and Marco Zaro. "MadFlow: towards the automation of Monte Carlo simulation on GPU for particle physics processes." EPJ Web of Conferences 251 (2021): 03022. http://dx.doi.org/10.1051/epjconf/202125103022.

Abstract:
In these proceedings we present MadFlow, a new framework for the automation of Monte Carlo (MC) simulation on graphics processing units (GPU) for particle physics processes. In order to automate MC simulation for a generic number of processes, we design a program which lets the user simulate custom processes through the MadGraph5_aMC@NLO framework. The pipeline includes a first stage where the analytic expressions for matrix elements and phase space are generated and exported in a GPU-like format. The simulation is then performed using the VegasFlow and PDFFlow libraries, which automatically deploy the full simulation on systems with different hardware acceleration capabilities, such as multi-threading CPU, single-GPU and multi-GPU setups. We show some preliminary results for leading-order simulations on different hardware configurations.
25

Li, Tao, Qiankun Dong, Yifeng Wang, Xiaoli Gong, and Yulu Yang. "Dual buffer rotation four-stage pipeline for CPU–GPU cooperative computing." Soft Computing 23, no. 3 (September 6, 2017): 859–69. http://dx.doi.org/10.1007/s00500-017-2795-0.

26

Gou, Chunyang, and Georgi N. Gaydadjiev. "Addressing GPU On-Chip Shared Memory Bank Conflicts Using Elastic Pipeline." International Journal of Parallel Programming 41, no. 3 (July 3, 2012): 400–429. http://dx.doi.org/10.1007/s10766-012-0201-1.

27

Sánchez-Rojas, José Armando, José Aníbal Arias-Aguilar, Hiroshi Takemura, and Alberto Elías Petrilli-Barceló. "Staircase Detection, Characterization and Approach Pipeline for Search and Rescue Robots." Applied Sciences 11, no. 22 (November 14, 2021): 10736. http://dx.doi.org/10.3390/app112210736.

Abstract:
Currently, most rescue robots are mainly teleoperated and integrate some level of autonomy to reduce the operator's workload, allowing them to focus on the primary mission tasks. One of the main causes of mission failure is human error, and increasing the robot's autonomy can increase the probability of success. For this reason, in this work, a stair detection and characterization pipeline is presented. The pipeline is tested on a differential drive robot using the ROS middleware, YOLOv4-tiny and a region-growing-based clustering algorithm. The pipeline's staircase detector was implemented using the Neural Compute Engines (NCEs) of the OpenCV AI Kit with Depth (OAK-D) RGB-D camera, which allowed implementation on the robot's computer without a GPU and, thus, could be implemented in similar robots to increase autonomy. Furthermore, using this pipeline we were able to implement a fuzzy controller that allows the robot to align itself, autonomously, with the staircase. Our work can be used in different robots running the ROS middleware and can increase autonomy, allowing the operator to focus on the primary mission tasks. Furthermore, due to the design of the pipeline, it can be used with different types of RGB-D cameras, including those that generate noisy point clouds from low-disparity depth images.
28

Zhuo, Jianghao, Ling Wang, Ke Xu, and Jianwei Wan. "A Coupling Graphic Pipeline with Normal Mode Model for Rapid Calculation of Underwater Acoustic Field." Shock and Vibration 2021 (January 29, 2021): 1–7. http://dx.doi.org/10.1155/2021/8847664.

Abstract:
Rapid execution is required in operation-oriented applications of underwater acoustic modelling. In this paper, the GPU graphics pipeline is used to accelerate the calculation of high-resolution sound field images in the normal mode model of underwater acoustic propagation. The computation times of the proposed graphic pipeline method, the MATLAB code, and the C# code are compared for a stratified shallow water waveguide using the KRAKEN model at different frequencies. The research validates that the graphic pipeline method outperforms the classic CPU-based methods in terms of execution speed at the frequencies where the eigenvalue equation in normal mode models can be solved.
29

Nie, Xiao, Leiting Chen, and Tao Xiang. "Real-Time Incompressible Fluid Simulation on the GPU." International Journal of Computer Games Technology 2015 (2015): 1–12. http://dx.doi.org/10.1155/2015/417417.

Abstract:
We present a parallel framework for simulating incompressible fluids with predictive-corrective incompressible smoothed particle hydrodynamics (PCISPH) on the GPU in real time. To this end, we propose an efficient GPU streaming pipeline to map the entire computational task onto the GPU, fully exploiting the massive computational power of state-of-the-art GPUs. In PCISPH-based simulations, neighbor search is the major performance obstacle because this process is performed several times at each time step. To eliminate this bottleneck, an efficient parallel sorting method for this time-consuming step is introduced. Moreover, we discuss several optimization techniques, including using fast on-chip shared memory to avoid global memory bandwidth limitations and thus further improve performance on modern GPU hardware. With our framework, the realism of real-time fluid simulation is significantly improved, since our method enforces the incompressibility constraint, which is typically ignored for efficiency reasons in previous GPU-based SPH methods. The performance results illustrate that our approach can efficiently simulate realistic incompressible fluid in real time, with a speed-up factor of up to 23 on a high-end NVIDIA GPU in comparison to a single-threaded CPU-based implementation.
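The neighbor search that dominates SPH cost follows the standard uniform-grid idea: bucket particles into cells whose edge length equals the support radius, then scan only the 27 adjacent cells. A serial Python sketch of that idea (the paper's version is a parallel GPU sort; this is only illustrative, with made-up particle positions):

```python
from collections import defaultdict
from math import floor

def build_grid(positions, h):
    """Bucket particle indices into cells of edge length h (the support radius)."""
    grid = defaultdict(list)
    for i, (x, y, z) in enumerate(positions):
        grid[(floor(x / h), floor(y / h), floor(z / h))].append(i)
    return grid

def neighbors(i, positions, grid, h):
    """Indices of particles within distance h of particle i, found by
    scanning only the 27 cells adjacent to i's cell."""
    x, y, z = positions[i]
    cx, cy, cz = floor(x / h), floor(y / h), floor(z / h)
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                    if j != i:
                        px, py, pz = positions[j]
                        if (px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2 <= h * h:
                            out.append(j)
    return out

pts = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (0.5, 0.0, 0.0)]
grid = build_grid(pts, h=0.1)
print(neighbors(0, pts, grid, h=0.1))  # [1] — particle 2 is too far away
```

On the GPU, the hash-map step is typically replaced by sorting particles by cell index so that each cell's particles are contiguous in memory, which is the sorting bottleneck the paper optimizes.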
30

Wu, Jiawen, Fengquan Zhang, and Xukun Shen. "GPU-Based Fluid Simulation with Fast Collision Detection on Boundaries." International Journal of Modeling, Simulation, and Scientific Computing 03, no. 01 (March 2012): 1240003. http://dx.doi.org/10.1142/s179396231240003x.

Abstract:
In this paper, we present a method for fluid simulation based on smoothed particle hydrodynamics (SPH) with fast collision detection on boundaries on the GPU. The major goal of our algorithm is fast SPH simulation and rendering on the GPU. Our algorithm has the following three features. First, to make the SPH method GPU-friendly, we introduce a spatial hash method for neighbor search; after sorting the particles by their grid index, neighbor search can be done quickly on the GPU. Second, we propose a fast particle-boundary collision detection method: by precomputing the distance field of the scene boundaries, the computing cost of collision detection is reduced to O(n), which is much faster than the traditional way. Third, we propose a pipeline with fine-detail surface reconstruction and progressive photon mapping working on the GPU. We evaluated our algorithm on scenes of various configurations and particle counts and obtained good results. Our experimental data show that we can simulate 100K particles, and scenes of up to 1000K particles at a rate of approximately 2 frames per second.
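The distance-field trick described above can be illustrated in miniature: precompute, once, each grid cell's distance to the nearest boundary, so that a per-particle collision test becomes a single lookup. A 2D pure-Python sketch under simplified assumptions (nearest-cell sampling, a hypothetical 8x8 grid, not the paper's implementation):

```python
def build_distance_field(is_solid, w, h):
    """Brute-force precompute: for each cell, the distance (in cells)
    to the nearest solid cell. Done once, offline."""
    solid = [(x, y) for y in range(h) for x in range(w) if is_solid(x, y)]
    field = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            field[y][x] = min(((x - sx) ** 2 + (y - sy) ** 2) ** 0.5
                              for sx, sy in solid)
    return field

def collides(field, x, y, radius):
    """Per-particle test at runtime: one lookup and one compare."""
    return field[int(y)][int(x)] <= radius

# An 8x8 domain whose left column (x == 0) is a solid wall.
field = build_distance_field(lambda x, y: x == 0, 8, 8)
print(collides(field, 1.2, 4.0, radius=1.5))  # True: one cell from the wall
print(collides(field, 6.8, 4.0, radius=1.5))  # False: far from the wall
```

Because each particle costs a constant amount of work at runtime, testing n particles is linear in n, matching the O(n) cost the abstract cites; in practice the field would be sampled with interpolation and built with a fast-marching or jump-flood pass rather than brute force.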
31

Zamikhovskyi, L. M., O. L. Zamikhovska, and V. V. Pavlyk. "Methodology for monitoring the technical condition of GPU type GTK-25i in the process of operation." Scientific Bulletin of Ivano-Frankivsk National Technical University of Oil and Gas, no. 2(49) (December 30, 2020): 106–16. http://dx.doi.org/10.31471/1993-9965-2020-2(49)-106-116.

Abstract:
In the early 1980s, 120 gas compressor units (GPU) of type GTK-25i were installed on the Urengoy-Pomary-Uzhgorod transcontinental gas pipeline; three of them are in operation at CS-39 "U-P-U" of the Bogorodchansk Linear Production Department of trunk gas pipelines. Today, about 80% of GTK-25i units have reached, or are close to, the end of their established service life. Their further operation does not ensure reliable and efficient performance, so numerous failures and accidents occur, leading to significant economic losses. Methods of parametric and vibroacoustic diagnostics of GPUs are analyzed. It is noted that the most fruitful years for the development of vibroacoustic GPU diagnostics were the 1970s-90s; today, development is moving toward the use of modern information technologies and various transformations in the processing of vibroacoustic processes to identify diagnostic signs of the technical state of the GPU. An analysis of diagnostic methods specific to the GTK-25i showed that, apart from certain methods based on modern information technologies developed by the authors of this article, none exist. At the same time, the improvement of the automatic control system (ACS) of the GTK-25i, in terms of both hardware and software, makes it possible to obtain information about technological parameters of GTK-25i operation beyond those of the standard ACS, as well as about the vibroacoustic processes that accompany its operation, and this information can be used to create diagnostic methods for the GTK-25i. A methodology for monitoring the technical condition of the GTK-25i is considered, based on determining the highest values of the discriminant functions for each of the three technical states of the GTK-25i over 16 technological parameters and acoustic and vibration characteristics.
Here, the best "nominal" condition is the state of the GTK-25i after repair work, the "defective" state is that before repair work, and the "current" state is that after the corresponding operating time. The use of this technique made it possible to develop a complex method combining parametric and vibroacoustic diagnostics. It is shown that the proposed method allows tracing the trend of changes in the technical state of the GTK-25i over time and predicting the moment of its decommissioning. The developed method does not require additional technical means for its implementation, as it receives information from the improved GTK-25i ACS, which, in turn, can use the diagnostic results to control the gas compression process, taking into account the technical condition of the GTK-25i.
32

Peng, Bo, Tianqi Wang, Xi Jin, and Chuanjun Wang. "An Accelerating Solution for N-Body MOND Simulation with FPGA-SoC." International Journal of Reconfigurable Computing 2016 (2016): 1–10. http://dx.doi.org/10.1155/2016/4592780.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
As a modified-gravity proposal to handle the dark matter problem on galactic scales, Modified Newtonian Dynamics (MOND) has shown great success. However, the N-body MOND simulation is quite challenged by its computational complexity, which calls for acceleration of the simulation calculation. In this paper, we present a highly integrated accelerating solution for N-body MOND simulations. By using the FPGA-SoC, which integrates both FPGA and SoC (system on chip) in one chip, our solution exhibits potential for better performance, higher integration, and lower power consumption. To handle the calculation bottleneck of potential summation, on one hand, we develop a strategy to simplify the pipeline, in which the square calculation task is conducted by the DSP48E1 of Xilinx 7 series FPGAs, so as to reduce the logic resource utilization of each pipeline; on the other hand, advantages of the particle-mesh scheme are taken to overcome the bottleneck on bandwidth. Our experimental results show that 2 more pipelines can be integrated in the Zynq-7020 FPGA-SoC with the simplified pipeline, and the bandwidth requirement is reduced significantly. Furthermore, our accelerating solution has a full range of advantages over different processors. Compared with GPU, our work is about 10 times better in performance per watt and 50% better in performance per cost.
33

Vázquez, Sergio, and Margarita Amor. "Texture Mapping on NURBS Surface." Proceedings 2, no. 18 (September 17, 2018): 1197. http://dx.doi.org/10.3390/proceedings2181197.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Texture mapping allows for high-resolution details over 3D surfaces. Nevertheless, texture mapping has a number of unresolved problems, such as distortion, boundaries between textures, and filtering. On the other hand, NURBS surfaces are usually decomposed into a set of Bézier surfaces, since NURBS surfaces cannot be directly rendered by the GPU. In this work, we propose texture mapping directly on NURBS surfaces using the RPNS (Rendering Pipeline for NURBS Surface) method, which allows the rendering of NURBS surfaces directly on the GPU. Our proposal facilitates the implementation while minimizing the cost of storage, mitigating distortions and stitching between textures.
34

Kunimoto, Michelle, Evan Tey, Willie Fong, Katharine Hesse, Glen Petitpas, and Avi Shporer. "QLP Data Release Notes 003: GPU-based Transit Search." Research Notes of the AAS 7, no. 2 (February 16, 2023): 28. http://dx.doi.org/10.3847/2515-5172/acbc13.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The Quick-Look Pipeline (QLP; Huang et al. 2020; Kunimoto et al. 2021, and references therein) searches for transit signals in the multi-sector light curves of several hundred thousand stars observed by TESS every 27.4-day sector. The computational expense of the planet search has grown considerably over time, especially as the TESS observing baseline continues to increase in the second Extended Mission. Starting in Sector 59, QLP has switched to a significantly faster GPU-based transit search capable of searching an entire sector in only ∼1 day. We describe its implementation and performance.
35

Xiong, Ruicheng, Yang Lu, Cong Chen, Jiaming Zhu, Yajun Zeng, and Ligang Liu. "ETER: Elastic Tessellation for Real-Time Pixel-Accurate Rendering of Large-Scale NURBS Models." ACM Transactions on Graphics 42, no. 4 (July 26, 2023): 1–13. http://dx.doi.org/10.1145/3592419.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
We present ETER, an elastic tessellation framework for rendering large-scale NURBS models with pixel-accurate and crack-free quality at real-time frame rates. We propose a highly parallel adaptive tessellation algorithm to achieve pixel accuracy, measured by the screen space error between the exact surface and its triangulation. To resolve a bottleneck in NURBS rendering, we present a novel evaluation method based on uniform sampling grids and accelerated by GPU Tensor Cores. Compared to evaluation based on hardware tessellation, our method has achieved a significant speedup of 2.9 to 16.2 times depending on the degrees of the patches. We develop an efficient crack-filling algorithm based on conservative rasterization and visibility buffer to fill the tessellation-induced cracks while greatly reducing the jagged effect introduced by conservative rasterization. We integrate all our novel algorithms, implemented in CUDA, into a GPU NURBS rendering pipeline based on Mesh Shaders and hybrid software/hardware rasterization. Our performance data on a commodity GPU show that the rendering pipeline based on ETER is capable of rendering up to 3.7 million patches (0.25 billion tessellated triangles) in real-time (30FPS). With its advantages in performance, scalability, and visual quality in rendering large-scale NURBS models, a real-time tessellation solution based on ETER can be a powerful alternative or even a potential replacement for the existing pre-tessellation solution in CAD systems.
36

Li, Zhifang, Beicheng Peng, and Chuliang Weng. "XeFlow: Streamlining Inter-Processor Pipeline Execution for the Discrete CPU-GPU Platform." IEEE Transactions on Computers 69, no. 6 (June 1, 2020): 819–31. http://dx.doi.org/10.1109/tc.2020.2968302.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Babbitt, Gregory A., Jamie S. Mortensen, Erin E. Coppola, Lily E. Adams, and Justin K. Liao. "DROIDS 1.20: A GUI-Based Pipeline for GPU-Accelerated Comparative Protein Dynamics." Biophysical Journal 114, no. 5 (March 2018): 1009–17. http://dx.doi.org/10.1016/j.bpj.2018.01.020.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Guo, Xiangyu, Qi Chu, Shin Kee Chung, Zhihui Du, Linqing Wen, and Yanqi Gu. "GPU-acceleration on a low-latency binary-coalescence gravitational wave search pipeline." Computer Physics Communications 231 (October 2018): 62–71. http://dx.doi.org/10.1016/j.cpc.2018.05.002.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Nicolas-Barreales, Gonzalo, Aaron Sujar, and Alberto Sanchez. "A Web-Based Tool for Simulating Molecular Dynamics in Cloud Environments." Electronics 10, no. 2 (January 15, 2021): 185. http://dx.doi.org/10.3390/electronics10020185.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Molecular dynamics simulations take advantage of supercomputing environments, e.g., to solve molecular systems composed of millions of atoms. Supercomputers are increasing their computing and memory power while becoming more complex with the introduction of multi-GPU environments. Despite these capabilities, molecular dynamics simulation is not an easy process. It requires properly preparing the simulation data and configuring the entire operation, e.g., installing and managing specific software packages to take advantage of the potential of multi-GPU supercomputers. We propose a web-based tool that facilitates the management of molecular dynamics workflows to be used in combination with a multi-GPU cloud environment. The tool allows users to set up the data pipeline and run the simulation in a cloud environment, even for those who are not specialized in the development of molecular dynamics simulators or cloud management.
40

Va, Hongly, Min-Hyung Choi, and Min Hong. "Real-Time Cloth Simulation Using Compute Shader in Unity3D for AR/VR Contents." Applied Sciences 11, no. 17 (September 6, 2021): 8255. http://dx.doi.org/10.3390/app11178255.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
While the cloth component in the Unity engine has been used to represent 3D cloth objects for augmented reality (AR) and virtual reality (VR), it has several limitations in terms of resolution and performance. The purpose of our research is to develop a stable cloth simulation based on a parallel algorithm. The method of a mass–spring system is applied to real-time cloth simulation with three types of springs. However, cloth simulation using the mass–spring system requires a small integration time-step to use a large stiffness coefficient. Furthermore, constraint enforcement is applied to obtain stable behavior of the cloth model. To reduce the computational burden of constraint enforcement, the adaptive constraint activation and deactivation (ACAD) technique, which combines the mass–spring system and the constraint enforcement method, is applied to prevent excessive elongation of the cloth. The proposed algorithm utilizes graphics processing unit (GPU) parallel processing and is implemented in a Compute Shader that executes in a pipeline separate from the rendering pipeline. In this paper, we investigate the performance and compare the behavior of the mass–spring system, constraint enforcement, and ACAD techniques using a GPU-based parallel method.
41

Fang, Juan, Zelin Wei, and Huijing Yang. "Locality-Based Cache Management and Warp Scheduling for Reducing Cache Contention in GPU." Micromachines 12, no. 10 (October 17, 2021): 1262. http://dx.doi.org/10.3390/mi12101262.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
GPGPUs have gradually become a mainstream acceleration component in high-performance computing. The long latency of memory operations is the bottleneck of GPU performance. In the GPU, multiple threads are grouped into a warp for scheduling and execution. The L1 data caches have small capacity, and multiple warps share one small cache, so the cache suffers a large amount of contention and pipeline stalls. We propose Locality-Based Cache Management (LCM), combined with Locality-Based Warp Scheduling (LWS), to reduce cache contention and improve GPU performance. Each load instruction can be classified into one of three types according to locality: used only once (streaming data locality), accessed multiple times in the same warp (intra-warp locality), and accessed in different warps (inter-warp data locality). According to the locality of the load instruction, LCM applies cache bypassing to streaming-locality requests to improve the cache utilization rate and extends inter-warp memory request coalescing to make full use of inter-warp locality, combining with LWS to alleviate cache contention. LCM and LWS can effectively improve cache performance, thereby improving overall GPU performance. Through experimental evaluation, our LCM and LWS obtain an average performance improvement of 26% over the baseline GPU.
42

Lee, Seokwon, Inmo Ban, Myeongjin Lee, Yunho Jung, and Wookyung Lee. "Architecture Exploration of a Backprojection Algorithm for Real-Time Video SAR." Sensors 21, no. 24 (December 10, 2021): 8258. http://dx.doi.org/10.3390/s21248258.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This paper explores novel architectures for fast backprojection based video synthetic aperture radar (BP-VISAR) with multiple GPUs. The video SAR frame rate is analyzed for non-overlapped and overlapped aperture modes. For the parallelization of the backprojection process, a processing data unit is defined as the phase history data or range profile data from partial synthetic-apertures divided from the full resolution target data. Considering whether full-aperture processing is performed and range compression or backprojection are parallelized on a GPU basis, we propose six distinct architectures, each having a single-stream pipeline with a single GPU. The performance of these architectures is evaluated in both non-overlapped and overlapped modes. The efficiency of the BP-VISAR architecture with sub-aperture processing in the overlapped mode is accelerated further by filling the processing gap from the idling GPU resources with multi-stream based backprojection on multiple GPUs. The frame rate of the proposed BP-VISAR architecture with sub-aperture processing is scalable with the number of GPU devices for large pixel resolution. It can generate 4096 × 4096 video SAR frames of 0.5 m cross-range resolution in 23.0 Hz on a single GPU and 73.5 Hz on quad GPUs.
43

Mo, Tiexiang, and Guodong Li. "Parallel Accelerated Fifth-Order WENO Scheme-Based Pipeline Transient Flow Solution Model." Applied Sciences 12, no. 14 (July 21, 2022): 7350. http://dx.doi.org/10.3390/app12147350.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The water hammer phenomenon is the main problem in long-distance pipeline networks. The MOC (method of characteristics) and finite difference methods lead to severe constraints on the mesh and Courant number, while the finite volume method of the second-order Godunov scheme has limited discontinuity-capturing capability. These methods will produce severe numerical dissipation, affecting the computational efficiency at low Courant numbers. Based on the Lax-Friedrichs flux splitting method, combined with the upstream and downstream virtual grid boundary conditions, this paper uses the high-precision fifth-order WENO scheme to reconstruct the interface flux and establishes a finite volume numerical model for solving the transient flow in the pipeline. The model adopts GPU parallel acceleration technology to improve the program’s computational efficiency. The results show that the model maintains excellent shock-capturing performance without spurious oscillations even at a low Courant number. Simultaneously, the model has a high degree of flexibility in meshing due to its insensitivity to the Courant number. The number of grids in the model can be significantly reduced and higher computational efficiency can be obtained compared with MOC and the second-order Godunov scheme. Furthermore, this paper analyzes the acceleration effect on different grids. Accordingly, the acceleration effect of the GPU technique increases significantly with the number of computational grids. This model can support efficient and accurate fast simulation and prediction of unsteady transient processes in long-distance water pipeline systems.
44

Kozlenko, Mykola, Olena Zamikhovska, and Leonid Zamikhovskyi. "Software implemented fault diagnosis of natural gas pumping unit based on feedforward neural network." Eastern-European Journal of Enterprise Technologies 2, no. 2 (110) (April 30, 2021): 99–109. http://dx.doi.org/10.15587/1729-4061.2021.229859.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
In recent years, more and more attention has been paid to the use of artificial neural networks (ANN) for the diagnostics of gas pumping units (GPU). Usually, ANN training is carried out on GPU workflow models, and generated sets of diagnostic data are used to simulate defect conditions. At the same time, the results obtained do not allow assessing the real state of the GPU. It is proposed to use the characteristics of the acoustic and vibration processes of the GPU as the input data of the ANN. A descriptive statistical analysis of real vibration and acoustic processes generated by the operation of the GPU type GTK-25-i (Nuovo Pignone, Italy) was carried out. Batches of diagnostic features arriving at the input of the ANN were formed. The diagnostic features are the five maximum amplitude components of the acoustic and vibration signals, as well as the standard deviation of each sample. The diagnostic features are calculated directly in the ANN input data pipeline in real time for three technical states of the GPU. Using the frameworks TensorFlow, Keras, NumPy, and pandas, in the Python 3 programming language, an architecture was developed for a deep, fully connected feedforward ANN trained with the backpropagation algorithm. The results of training and testing the developed ANN are presented. During testing, it was found that the signal classification precision for the "nominal" state of all 1,475 signal samples is 1.0000, for the "current" state precision equals 0.9853, and for the "defective" state precision is 0.9091. The use of the developed ANN makes it possible to classify the technical states of the GPU with an accuracy sufficient for practical use, which will prevent the occurrence of GPU failures. The ANN can be used to diagnose GPUs of any type and power.
45

Střelák, David, Carlos Óscar S. Sorzano, José María Carazo, and Jiří Filipovič. "A GPU acceleration of 3-D Fourier reconstruction in cryo-EM." International Journal of High Performance Computing Applications 33, no. 5 (March 11, 2019): 948–59. http://dx.doi.org/10.1177/1094342019832958.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Cryo-electron microscopy is a popular method for macromolecular structure determination. Reconstruction of a 3-D volume from raw data obtained from a microscope is highly computationally demanding. Thus, acceleration of the reconstruction has great practical value. In this article, we introduce a novel graphics processing unit (GPU)-friendly algorithm for direct Fourier reconstruction, one of the main computational bottlenecks in the 3-D volume reconstruction pipeline for some experimental cases (particularly those with a large number of images and a high internal symmetry). Contrary to the state of the art, our algorithm uses a gather memory pattern, improving cache locality and removing race conditions in parallel writing into the 3-D volume. We also introduce a finely tuned CUDA implementation of our algorithm, using auto-tuning to search for a combination of optimization parameters maximizing performance on a given GPU architecture. Our CUDA implementation is integrated in the widely used software Xmipp, version 3.19, reaching an 11.4× speedup compared to the original parallel CPU implementation using a GPU with comparable power consumption. Moreover, we have reached a 31.7× speedup using four GPUs and a 2.14×–5.96× speedup compared to an optimized GPU implementation based on a scatter memory pattern.
46

Konnurmath, Guruprasad, and Satyadhyan Chickerur. "GPU Shader Analysis and Power Optimization Model." Engineering, Technology & Applied Science Research 14, no. 1 (February 8, 2024): 12925–30. http://dx.doi.org/10.48084/etasr.6695.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
With the rapid advancements in 3D game technology, workload characterization has become crucial for each new generation of games. The increased complexity of scenes in 3D games allows for stunning real-time visual quality. However, handling such workloads results in significant power consumption over the GPU rendering pipeline. The focus of the current paper is low power optimization, targeting texture memory, geometry engine, pixel, and rasterization, as these components are significant contributors to the power consumption of a typical GPU. The proposed methodology integrates the Dynamic Voltage Frequency Scaling (DVFS) technique, adjusting voltage and frequency based on the workload analysis of frame rates with respect to the scenes of 3D games. Frame rates of 60 fps and 30 fps are set up to understand and manage the workload on frames. Furthermore, for comparative analysis, various frame-level power analysis schemes such as No DVFS implemented, Frame History Method, Frame Signature Method, and Tiled History-based are introduced. The proposed scheme consistently surpasses these frame-level schemes, with fewer missed deadlines, while having the lowest energy consumption per frame rate. The implementation resulted in a remarkable 65% improvement in quality, indicated by a reduction in deadline misses, along with a substantial 60% energy saving.
47

Khalid, Muhammad Farhan, Kanzal Iman, Amna Ghafoor, Mujtaba Saboor, Ahsan Ali, Urwa Muaz, Abdul Rehman Basharat, et al. "PERCEPTRON: an open-source GPU-accelerated proteoform identification pipeline for top-down proteomics." Nucleic Acids Research 49, W1 (May 17, 2021): W510–W515. http://dx.doi.org/10.1093/nar/gkab368.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
PERCEPTRON is a next-generation freely available web-based proteoform identification and characterization platform for top-down proteomics (TDP). PERCEPTRON search pipeline brings together algorithms for (i) intact protein mass tuning, (ii) de novo sequence tags-based filtering, (iii) characterization of terminal as well as post-translational modifications, (iv) identification of truncated proteoforms, (v) in silico spectral comparison, and (vi) weight-based candidate protein scoring. High-throughput performance is achieved through the execution of optimized code via multiple threads in parallel, on graphics processing units (GPUs) using NVidia Compute Unified Device Architecture (CUDA) framework. An intuitive graphical web interface allows for setting up of search parameters as well as for visualization of results. The accuracy and performance of the tool have been validated on several TDP datasets and against available TDP software. Specifically, results obtained from searching two published TDP datasets demonstrate that PERCEPTRON outperforms all other tools by up to 135% in terms of reported proteins and 10-fold in terms of runtime. In conclusion, the proposed tool significantly enhances the state-of-the-art in TDP search software and is publicly available at https://perceptron.lums.edu.pk. Users can also create in-house deployments of the tool by building code available on the GitHub repository (http://github.com/BIRL/Perceptron).
48

Cali, Damla Senol, Thomas Anantharaman, Martin Muggli, Samer Al-Saffar, Charles Schoonover, and Neil Miller. "Abstract 2337: Accelerated optical genome mapping analysis with Stratys Compute and Guided Assembly." Cancer Research 84, no. 6_Supplement (March 22, 2024): 2337. http://dx.doi.org/10.1158/1538-7445.am2024-2337.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Background: Optical genome maps (OGM) from Bionano enable the detection of genomic structural and copy number variants that cannot be detected by next-generation sequencing (NGS) technologies and are often missed by conventional cytogenetic techniques. Bionano has developed bioinformatics pipelines for calling structural and copy number variants including the Bionano Solve de novo assembly pipeline for constitutional analysis and the Rare Variant Analysis (RVA) pipeline for low-allele-fraction cancer applications. Both pipelines are computationally intensive and currently take 5-10 hours on the latest generation of the Bionano Saphyr® Compute which is deployed as a four-node compute cluster and requires significant IT resources. Methods: To increase throughput and simplify the deployment of the compute resources needed to support the Stratys™ optical genome mapping instrument, Bionano has developed Stratys™ Compute. Stratys Compute improves both compute and analytical performance by adapting compute-intensive stages in the Solve pipeline to run on GPUs and by developing the Guided Assembly pipeline. Stratys Compute is powered by state-of-the-art NVIDIA RTX 6000 Ada generation cards and CUDA-optimized refinement, alignment, and structural variation detection kernels to accelerate OGM analysis. Stratys Compute is a standalone workstation placed adjacent to the Stratys instrument. This will minimize the IT footprint and bypass the integration with the customer site’s data center. We have incorporated these optimizations into the new Guided Assembly pipeline, which aims to combine the low-allele-fraction detection capability of the RVA pipeline with the whole genome coverage and ability to detect smaller structural variants enabled by the de novo assembly pipeline. The Guided Assembly pipeline uses the reference genome as an initial seed followed by extension, refinement, and structural variant calling.
This new analysis method has been evaluated through comparison to previous results from both existing pipelines and standard benchmarking datasets used to estimate structural variant calling performance and is deployed on Stratys Compute for both constitutional and low allele-fraction applications. Results Guided Assembly has been adapted to run on GPU hardware in a simplified compute tower that can be deployed to a lab along with the Stratys instrument without the need for a dedicated server room. We found concordance between the guided assembly results and our previous de novo and RVA pipeline results. We also found increased sensitivity at low allele fractions for detecting insertion variants smaller than 5 kb and larger than 200 kb while finding equivalent performance for other variant types with the updated methods. The accelerated Guided Assembly for constitutional and low-allele-fraction applications will be available to early access customers in Q4 2023 with full commercial release in Q2 2024. Citation Format: Damla Senol Cali, Thomas Anantharaman, Martin Muggli, Samer Al-Saffar, Charles Schoonover, Neil Miller. Accelerated optical genome mapping analysis with Stratys Compute and Guided Assembly [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 2337.
49

Lazar, Alina, Xiangyang Ju, Daniel Murnane, Paolo Calafiura, Steven Farrell, Yaoyuan Xu, Maria Spiropulu, et al. "Accelerating the Inference of the Exa.TrkX Pipeline." Journal of Physics: Conference Series 2438, no. 1 (February 1, 2023): 012008. http://dx.doi.org/10.1088/1742-6596/2438/1/012008.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Recently, graph neural networks (GNNs) have been successfully used for a variety of particle reconstruction problems in high energy physics, including particle tracking. The Exa.TrkX pipeline based on GNNs demonstrated promising performance in reconstructing particle tracks in dense environments. It includes five discrete steps: data encoding, graph building, edge filtering, GNN, and track labeling. All steps were written in Python and run on both GPUs and CPUs. In this work, we accelerate the Python implementation of the pipeline through customized and commercial GPU-enabled software libraries, and develop a C++ implementation for inferencing the pipeline. The implementation features an improved, CUDA-enabled fixed-radius nearest neighbor search for graph building and a weakly connected component graph algorithm for track labeling. GNNs and other trained deep learning models are converted to ONNX and inferenced via the ONNX Runtime C++ API. The complete C++ implementation of the pipeline allows integration with existing tracking software. We report the memory usage and average event latency tracking performance of our implementation applied to the TrackML benchmark dataset.
50

Zhao, Hanyu, Zhi Yang, Yu Cheng, Chao Tian, Shiru Ren, Wencong Xiao, Man Yuan, et al. "GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning." Proceedings of the ACM on Management of Data 1, no. 2 (June 13, 2023): 1–25. http://dx.doi.org/10.1145/3589773.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Training data pre-processing pipelines are essential to deep learning (DL). As the performance of model training keeps increasing with both hardware advancements (e.g., faster GPUs) and various software optimizations, the data pre-processing on CPUs is becoming more resource-intensive and a severe bottleneck of the pipeline. This problem is even worse in the cloud, where training jobs exhibit diverse CPU-GPU demands that usually result in mismatches with fixed hardware configurations and resource fragmentation, degrading both training performance and cluster utilization. We introduce GoldMiner, an input data processing service for stateless operations used in pre-processing data for DL model training. GoldMiner decouples data pre-processing from model training into a new role called the data worker. Data workers facilitate scaling of data pre-processing to anywhere in a cluster, effectively pooling the resources across the cluster to satisfy the diverse requirements of training jobs. GoldMiner achieves this decoupling in a fully automatic and elastic manner. The key insight is that data pre-processing is inherently stateless, thus can be executed independently and elastically. This insight guides GoldMiner to automatically extract stateless computation out of a monolithic training program, efficiently disaggregate it across data workers, and elastically scale data workers to tune the resource allocations across jobs to optimize cluster efficiency. We have applied GoldMiner to industrial workloads, and our evaluation shows that GoldMiner can transform unmodified training programs to use data workers, accelerating individual training jobs by up to 12.1x. GoldMiner also improves average job completion time and aggregate GPU utilization by up to 2.5x and 2.1x in a 64-GPU cluster, respectively, by scheduling data workers with elasticity.

To the bibliography