Journal articles on the topic 'GPU code generation'

Consult the top 50 journal articles for your research on the topic 'GPU code generation.'

1

Emmart, Niall, and Charles Weems. "Search-Based Automatic Code Generation for Multiprecision Modular Exponentiation on Multiple Generations of GPU." Parallel Processing Letters 23, no. 04 (2013): 1340009. http://dx.doi.org/10.1142/s0129626413400094.

Abstract:
Multiprecision modular exponentiation has a variety of uses, including cryptography, prime testing and computational number theory. It is also a very costly operation to compute. GPU parallelism can be used to accelerate these computations, but to use the GPU efficiently, a problem must involve many simultaneous exponentiation operations. Handling a large number of TLS/SSL encrypted sessions in a data center is an important problem that fits this profile. We are developing a framework that enables generation of highly efficient implementations of exponentiation operations for different NVIDIA
2

Afar Nazim, Allazov. "Automatic Generation of GPU Code in DVOR." University News. North-Caucasian Region. Technical Sciences Series, no. 3 (September 2015): 3–9. http://dx.doi.org/10.17213/0321-2653-2015-3-3-9.

3

Blazewicz, Marek, Ian Hinder, David M. Koppelman, et al. "From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation." Scientific Programming 21, no. 1-2 (2013): 1–16. http://dx.doi.org/10.1155/2013/167841.

Abstract:
Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversa
4

Rodrigues, A. Wendell O., Frédéric Guyomarc'h, Jean-Luc Dekeyser, and Yvonnick Le Menach. "Automatic Multi-GPU Code Generation Applied to Simulation of Electrical Machines." IEEE Transactions on Magnetics 48, no. 2 (2012): 831–34. http://dx.doi.org/10.1109/tmag.2011.2179527.

5

Rawat, Prashant Singh, Miheer Vaidya, Aravind Sukumaran-Rajam, et al. "Domain-Specific Optimization and Generation of High-Performance GPU Code for Stencil Computations." Proceedings of the IEEE 106, no. 11 (2018): 1902–20. http://dx.doi.org/10.1109/jproc.2018.2862896.

6

Basu, Protonu, Samuel Williams, Brian Van Straalen, Leonid Oliker, Phillip Colella, and Mary Hall. "Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers." Parallel Computing 64 (May 2017): 50–64. http://dx.doi.org/10.1016/j.parco.2017.04.002.

7

Klöckner, Andreas, Nicolas Pinto, Yunsup Lee, Bryan Catanzaro, Paul Ivanov, and Ahmed Fasih. "PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation." Parallel Computing 38, no. 3 (2012): 157–74. http://dx.doi.org/10.1016/j.parco.2011.09.001.

8

Hagiescu, Andrei, Bing Liu, R. Ramanathan, et al. "GPU code generation for ODE-based applications with phased shared-data access patterns." ACM Transactions on Architecture and Code Optimization 10, no. 4 (2013): 1–19. http://dx.doi.org/10.1145/2541228.2555311.

9

Holzer, Markus, Martin Bauer, Harald Köstler, and Ulrich Rüde. "Highly efficient lattice Boltzmann multiphase simulations of immiscible fluids at high-density ratios on CPUs and GPUs through code generation." International Journal of High Performance Computing Applications 35, no. 4 (2021): 413–27. http://dx.doi.org/10.1177/10943420211016525.

Abstract:
A high-performance implementation of a multiphase lattice Boltzmann method based on the conservative Allen-Cahn model supporting high-density ratios and high Reynolds numbers is presented. Meta-programming techniques are used to generate optimized code for CPUs and GPUs automatically. The coupled model is specified in a high-level symbolic description and optimized through automatic transformations. The memory footprint of the resulting algorithm is reduced through the fusion of compute kernels. A roofline analysis demonstrates the excellent efficiency of the generated code on a single GPU. Th
10

Walsh, Stuart D. C., and Martin O. Saar. "Developing Extensible Lattice-Boltzmann Simulators for General-Purpose Graphics-Processing Units." Communications in Computational Physics 13, no. 3 (2013): 867–79. http://dx.doi.org/10.4208/cicp.351011.260112s.

Abstract:
Lattice-Boltzmann methods are versatile numerical modeling techniques capable of reproducing a wide variety of fluid-mechanical behavior. These methods are well suited to parallel implementation, particularly on the single-instruction multiple data (SIMD) parallel processing environments found in computer graphics processing units (GPUs). Although recent programming tools dramatically improve the ease with which GPU-based applications can be written, the programming environment still lacks the flexibility available to more traditional CPU programs. In particular, it may be difficult to d
11

Xu, Shuqi, and Gilles Noguere. "Generation of thermal scattering files with the CINEL code." EPJ Nuclear Sciences & Technologies 8 (2022): 8. http://dx.doi.org/10.1051/epjn/2022004.

Abstract:
The CINEL code, dedicated to generating thermal neutron scattering files in ENDF-6 format for solid crystalline, free gas materials and liquid water, is presented. Compared to the LEAPR module of the NJOY code, CINEL is able to calculate the coherent and incoherent elastic scattering cross sections for any solid crystalline materials. Specific material properties such as anharmonicity and texture can be taken into account in CINEL. The calculation of the thermal scattering laws can be accelerated by using a graphics processing unit (GPU), which makes it possible to remove the short collision time approxim
12

Lapillonne, Xavier, and Oliver Fuhrer. "Using Compiler Directives to Port Large Scientific Applications to GPUs: An Example from Atmospheric Science." Parallel Processing Letters 24, no. 01 (2014): 1450003. http://dx.doi.org/10.1142/s0129626414500030.

Abstract:
For many scientific applications, Graphics Processing Units (GPUs) can be an interesting alternative to conventional CPUs as they can deliver higher memory bandwidth and computing power. While it is conceivable to re-write the most execution time intensive parts using a low-level API for accelerator programming, it may not be feasible to do it for the entire application. But, having only selected parts of the application running on the GPU requires repetitively transferring data between the GPU and the host CPU, which may lead to a serious performance penalty. In this paper we assess the poten
13

Cesare, Valentina, Ugo Becciani, Alberto Vecchiato, et al. "The MPI + CUDA Gaia AVU–GSR Parallel Solver Toward Next-generation Exascale Infrastructures." Publications of the Astronomical Society of the Pacific 135, no. 1049 (2023): 074504. http://dx.doi.org/10.1088/1538-3873/acdf1e.

Abstract:
We ported to the GPU with CUDA the Astrometric Verification Unit–Global Sphere Reconstruction (AVU–GSR) Parallel Solver developed for the ESA Gaia mission, by optimizing a previous OpenACC porting of this application. The code aims to find, with a [10, 100] μarcsec precision, the astrometric parameters of ∼10^8 stars, the attitude and instrumental settings of the Gaia satellite, and the global parameter γ of the parametrized Post-Newtonian formalism, by solving a system of linear equations, A × x = b, with the LSQR iterative algorithm. The coefficient matrix A of the final Gaia data s
14

Noguere, G., S. Xu, L. Desgrange, et al. "Generation of thermal scattering laws with the CINEL code." EPJ Web of Conferences 284 (2023): 17002. http://dx.doi.org/10.1051/epjconf/202328417002.

Abstract:
The thermal scattering laws (TSL) take into account the crystalline structure and atomic motions of isotopes bound in materials. This paper presents the CINEL code, which was developed to generate temperature-dependent TSL for solid, liquid and free gas materials of interest for nuclear reactors. CINEL is able to calculate TSL from the phonon density of states (PDOS) of materials under the Gaussian-Incoherent approximations. The PDOS can be obtained by using theoretical approaches (e.g., ab initio density functional theory and molecular dynamics) or experimental results. In this work, the PDOS
15

Morales-Hernández, M., M. B. Sharif, S. Gangrade, et al. "High-performance computing in water resources hydrodynamics." Journal of Hydroinformatics 22, no. 5 (2020): 1217–35. http://dx.doi.org/10.2166/hydro.2020.163.

Abstract:
This work presents a vision of future water resources hydrodynamics codes that can fully utilize the strengths of modern high-performance computing (HPC). The advances to computing power, formerly driven by the improvement of central processing unit processors, now focus on parallel computing and, in particular, the use of graphics processing units (GPUs). However, this shift to a parallel framework requires refactoring the code to make efficient use of the data as well as changing even the nature of the algorithm that solves the system of equations. These concepts along with other fe
16

Frolov, Vladimir, Vadim Sanzharov, Vladimir Galaktionov, and Alexander Shcherbakov. "Development in Vulkan: a domain-specific approach." Proceedings of the Institute for System Programming of the RAS 33, no. 5 (2021): 181–204. http://dx.doi.org/10.15514/ispras-2021-33(5)-11.

Abstract:
In this paper we propose a high-level approach to developing GPU applications based on the Vulkan API. The purpose of the work is to reduce the complexity of developing and debugging applications that implement complex algorithms on the GPU using Vulkan. The proposed approach uses the technology of code generation by translating a C++ program into an optimized implementation in Vulkan, which includes automatic shader generation, resource binding, and the use of synchronization mechanisms (Vulkan barriers). The proposed solution is not a general-purpose programming technology, but specializes i
17

Golosio, Bruno, Jose Villamar, Gianmarco Tiddia, et al. "Runtime Construction of Large-Scale Spiking Neuronal Network Models on GPU Devices." Applied Sciences 13, no. 17 (2023): 9598. http://dx.doi.org/10.3390/app13179598.

Abstract:
Simulation speed matters for neuroscientific research: this includes not only how quickly the simulated model time of a large-scale spiking neuronal network progresses but also how long it takes to instantiate the network model in computer memory. On the hardware side, acceleration via highly parallel GPUs is being increasingly utilized. On the software side, code generation approaches ensure highly optimized code at the expense of repeated code regeneration and recompilation after modifications to the network model. Aiming for a greater flexibility with respect to iterative model changes, her
18

Hoffmann, Lars, Paul F. Baumeister, Zhongyin Cai, et al. "Massive-Parallel Trajectory Calculations version 2.2 (MPTRAC-2.2): Lagrangian transport simulations on graphics processing units (GPUs)." Geoscientific Model Development 15, no. 7 (2022): 2731–62. http://dx.doi.org/10.5194/gmd-15-2731-2022.

Abstract:
Lagrangian models are fundamental tools to study atmospheric transport processes and for practical applications such as dispersion modeling for anthropogenic and natural emission sources. However, conducting large-scale Lagrangian transport simulations with millions of air parcels or more can become rather numerically costly. In this study, we assessed the potential of exploiting graphics processing units (GPUs) to accelerate Lagrangian transport simulations. We ported the Massive-Parallel Trajectory Calculations (MPTRAC) model to GPUs using the open accelerator (OpenACC) programming
19

Kiran, Utpal, Deepak Sharma, and Sachin Singh Gautam. "GPU-warp based finite element matrices generation and assembly using coloring method." Journal of Computational Design and Engineering 6, no. 4 (2018): 705–18. http://dx.doi.org/10.1016/j.jcde.2018.11.001.

Abstract:
Finite element method has been successfully implemented on the graphics processing units to achieve a significant reduction in simulation time. In this paper, new strategies for the finite element matrix generation including numerical integration and assembly are proposed by using a warp per element for a given mesh. These strategies are developed using the well-known coloring method. The proposed strategies use a specialized algorithm to realize fine-grain parallelism and efficient use of on-chip memory resources. The warp shuffle feature of Compute Unified Device Architecture (CUDA)
20

Сентябов, А. В., А. А. Гаврилов, М. А. Кривов, А. А. Дектерев, and М. Н. Притула. "Efficiency analysis of hydrodynamic calculations on GPU and CPU clusters." Numerical Methods and Programming (Vychislitel'nye Metody i Programmirovanie), no. 3 (September 20, 2016): 329–38. http://dx.doi.org/10.26089/nummet.v17r331.

Abstract:
The speedup of parallel hydrodynamic calculations on clusters with CPU and GPU nodes is considered. The in-house CFD code SigmaFlow, ported to graphics accelerators using CUDA, is used for testing. The algorithm for simulating incompressible fluid flow is based on a SIMPLE-like procedure and control-volume discretization on unstructured meshes of hexahedral cells. A comparison of computation speed shows the high performance of new-generation graphics accelerators in GPGPU calculations.
21

Song, Yankan, Ying Chen, Shaowei Huang, Yin Xu, Zhitong Yu, and Wei Xue. "Efficient GPU-Based Electromagnetic Transient Simulation for Power Systems With Thread-Oriented Transformation and Automatic Code Generation." IEEE Access 6 (2018): 25724–36. http://dx.doi.org/10.1109/access.2018.2833506.

22

Khalid, Muhammad Farhan, Kanzal Iman, Amna Ghafoor, et al. "PERCEPTRON: an open-source GPU-accelerated proteoform identification pipeline for top-down proteomics." Nucleic Acids Research 49, W1 (2021): W510–W515. http://dx.doi.org/10.1093/nar/gkab368.

Abstract:
PERCEPTRON is a next-generation freely available web-based proteoform identification and characterization platform for top-down proteomics (TDP). PERCEPTRON search pipeline brings together algorithms for (i) intact protein mass tuning, (ii) de novo sequence tags-based filtering, (iii) characterization of terminal as well as post-translational modifications, (iv) identification of truncated proteoforms, (v) in silico spectral comparison, and (vi) weight-based candidate protein scoring. High-throughput performance is achieved through the execution of optimized code via multiple threads
23

Lessley, Brenton, Shaomeng Li, and Hank Childs. "HashFight: A Platform-Portable Hash Table for Multi-Core and Many-Core Architectures." Electronic Imaging 2020, no. 1 (2020): 376–1. http://dx.doi.org/10.2352/issn.2470-1173.2020.1.vda-376.

Abstract:
We introduce a new platform-portable hash table and collision-resolution approach, HashFight, for use in visualization and data analysis algorithms. Designed entirely in terms of data-parallel primitives (DPPs), HashFight is atomics-free and consists of a single code base that can be invoked across a diverse range of architectures. To evaluate its hashing performance, we compare the single-node insert and query throughput of HashFight to that of two best-in-class GPU and CPU hash table implementations, using several experimental configurations and factors. Overall, HashFight maintains competi
24

He, Q., A. Rezaei, and S. Pursiainen. "Zeffiro User Interface for Electromagnetic Brain Imaging: a GPU Accelerated FEM Tool for Forward and Inverse Computations in Matlab." Neuroinformatics 18, no. 2 (2019): 237–50. http://dx.doi.org/10.1007/s12021-019-09436-9.

Abstract:
This article introduces the Zeffiro interface (ZI) version 2.2 for brain imaging. ZI aims to provide a simple, accessible and multimodal open source platform for finite element method (FEM) based and graphics processing unit (GPU) accelerated forward and inverse computations in the Matlab environment. It allows one to (1) generate a given multi-compartment head model, (2) to evaluate a lead field matrix as well as (3) to invert and analyze a given set of measurements. GPU acceleration is applied in each of the processing stages (1)–(3). In its current configuration, ZI includes forwar
25

Wrede, Fabian, and Herbert Kuchen. "Towards High-Performance Code Generation for Multi-GPU Clusters Based on a Domain-Specific Language for Algorithmic Skeletons." International Journal of Parallel Programming 48, no. 4 (2020): 713–28. http://dx.doi.org/10.1007/s10766-020-00659-x.

26

Concas, M. "A vendor-agnostic, single code-based GPU tracking for the Inner Tracking System of the ALICE experiment." Journal of Physics: Conference Series 2438, no. 1 (2023): 012134. http://dx.doi.org/10.1088/1742-6596/2438/1/012134.

Abstract:
During the LHC Run 3 the ALICE online computing farm will process up to 50 times more Pb-Pb events per second than in Run 2. The implied computing resource scaling requires a shift in the approach that comprises the extensive usage of Graphics Processing Units (GPU) for the processing. We will give an overview of the state of the art for the data reconstruction on GPUs in ALICE, with additional focus on the Inner Tracking System detector. A detailed teardown of adopted techniques, implemented algorithms and approaches and performance report will be shown. Additionally, we will show ho
27

Ikuyajolu, Olawale James, Luke Van Roekel, Steven R. Brus, Erin E. Thomas, Yi Deng, and Sarat Sreepathi. "Porting the WAVEWATCH III (v6.07) wave action source terms to GPU." Geoscientific Model Development 16, no. 4 (2023): 1445–58. http://dx.doi.org/10.5194/gmd-16-1445-2023.

Abstract:
Surface gravity waves play a critical role in several processes, including mixing, coastal inundation, and surface fluxes. Despite the growing literature on the importance of ocean surface waves, wind–wave processes have traditionally been excluded from Earth system models (ESMs) due to the high computational costs of running spectral wave models. The development of the Next Generation Ocean Model for the DOE’s (Department of Energy) E3SM (Energy Exascale Earth System Model) Project partly focuses on the inclusion of a wave model, WAVEWATCH III (WW3), into E3SM. WW3, which was origin
28

Vasilev, Eugene, Dmitry Lachinov, Anton Grishin, and Vadim Turlapov. "Fast tetrahedral mesh generation and segmentation of an atlas-based heart model using a periodic uniform grid." Russian Journal of Numerical Analysis and Mathematical Modelling 33, no. 5 (2018): 315–23. http://dx.doi.org/10.1515/rnam-2018-0026.

Abstract:
A fast procedure for generation of regular tetrahedral finite element mesh for objects with complex shape cavities is proposed. The procedure like LBIE-Mesher can generate tetrahedral meshes for the volume interior to a polygonal surface, or for an interval volume between two surfaces having a complex shape and defined in STL-format. This procedure consists of several stages: generation of a regular tetrahedral mesh that fills the volume of the required object; generation of clipping for the uniform grid parts by a boundary surface; shifting vertices of the boundary layer to align ont
29

Moscibrodzka, Monika A., and Aristomenis I. Yfantis. "Prospects for Ray-tracing Light Intensity and Polarization in Models of Accreting Compact Objects Using a GPU." Astrophysical Journal Supplement Series 265, no. 1 (2023): 22. http://dx.doi.org/10.3847/1538-4365/acb6f9.

Abstract:
The Event Horizon Telescope (EHT) has recently released high-resolution images of accretion flows onto two supermassive black holes. Our physical understanding of these images depends on the accuracy and precision of numerical models of plasma and radiation around compact objects. The goal of this work is to speed up radiative-transfer simulations used to create mock images of black holes for comparison with the EHT observations. A ray-tracing code for general relativistic and fully polarized radiative transfer through plasma in strong gravity is ported onto a graphics processing unit
30

Torky, Ahmed A., and Youssef F. Rashed. "High-performance practical stiffness analysis of high-rise buildings using superfloor elements." Journal of Computational Design and Engineering 7, no. 2 (2020): 211–27. http://dx.doi.org/10.1093/jcde/qwaa018.

Abstract:
This study develops a high-performance computing method using OpenACC (Open Accelerator) for the stiffness matrix and load vector generation of shear-deformable plates in bending using the boundary element method on parallel processors. The boundary element formulation for plates in bending is used to derive fully populated displacement-based stiffness matrices and load vectors at degrees of freedom of interest. The computed stiffness matrix of the plate is defined as a single superfloor element and can be solved using stiffness analysis, $Ku = F$, instead of the conventional boundary
31

Frontiere, Nicholas, J. D. Emberson, Michael Buehlmann, et al. "Simulating Hydrodynamics in Cosmology with CRK-HACC." Astrophysical Journal Supplement Series 264, no. 2 (2023): 34. http://dx.doi.org/10.3847/1538-4365/aca58d.

Abstract:
We introduce CRK-HACC, an extension of the Hardware/Hybrid Accelerated Cosmology Code (HACC), to resolve gas hydrodynamics in large-scale structure formation simulations of the universe. The new framework couples the HACC gravitational N-body solver with a modern smoothed-particle hydrodynamics (SPH) approach called conservative reproducing kernel SPH (CRKSPH). CRKSPH utilizes smoothing functions that exactly interpolate linear fields while manifestly preserving conservation laws (momentum, mass, and energy). The CRKSPH method has been incorporated to accurately model baryonic effects
32

Cecilia, José M., Juan-Carlos Cano, Juan Morales-García, Antonio Llanes, and Baldomero Imbernón. "Evaluation of Clustering Algorithms on GPU-Based Edge Computing Platforms." Sensors 20, no. 21 (2020): 6335. http://dx.doi.org/10.3390/s20216335.

Abstract:
Internet of Things (IoT) is becoming a new socioeconomic revolution in which data and immediacy are the main ingredients. IoT generates large datasets on a daily basis but it is currently considered as “dark data”, i.e., data generated but never analyzed. The efficient analysis of this data is mandatory to create intelligent applications for the next generation of IoT applications that benefits society. Artificial Intelligence (AI) techniques are very well suited to identifying hidden patterns and correlations in this data deluge. In particular, clustering algorithms are of the utmost importan
33

Winter, Robin, Joren Retel, Frank Noé, Djork-Arné Clevert, and Andreas Steffen. "grünifai: interactive multiparameter optimization of molecules in a continuous vector space." Bioinformatics 36, no. 13 (2020): 4093–94. http://dx.doi.org/10.1093/bioinformatics/btaa271.

Abstract:
Summary: Optimizing small molecules in a drug discovery project is a notoriously difficult task as multiple molecular properties have to be considered and balanced at the same time. In this work, we present our novel interactive in silico compound optimization platform termed grünifai to support the ideation of the next generation of compounds under the constraints of a multiparameter objective. grünifai integrates adjustable in silico models, a continuous representation of the chemical space, a scalable particle swarm optimization algorithm and the possibility to actively steer the co
34

Lehmann, Moritz. "Esoteric Pull and Esoteric Push: Two Simple In-Place Streaming Schemes for the Lattice Boltzmann Method on GPUs." Computation 10, no. 6 (2022): 92. http://dx.doi.org/10.3390/computation10060092.

Abstract:
I present two novel thread-safe in-place streaming schemes for the lattice Boltzmann method (LBM) on graphics processing units (GPUs), termed Esoteric Pull and Esoteric Push, that result in the LBM only requiring one copy of the density distribution functions (DDFs) instead of two, greatly reducing memory demand. These build upon the idea of the existing Esoteric Twist scheme, to stream half of the DDFs at the end of one stream-collide kernel and the remaining half at the beginning of the next and offer the same beneficial properties over the AA-Pattern scheme—reduced memory bandwidth due to i
35

Grosser, Tobias, Sven Verdoolaege, Albert Cohen, and P. Sadayappan. "The Relation Between Diamond Tiling and Hexagonal Tiling." Parallel Processing Letters 24, no. 03 (2014): 1441002. http://dx.doi.org/10.1142/s0129626414410023.

Abstract:
Iterative stencil computations are important in scientific computing and also in the embedded and mobile domain. Recent publications have shown that tiling schemes that ensure concurrent start provide efficient ways to execute these kernels. Diamond tiling and hybrid-hexagonal tiling are two tiling schemes that enable concurrent start. Both have different advantages: diamond tiling has been integrated in a general purpose optimization framework and uses a cost function to choose among tiling hyperplanes, whereas the greater flexibility with tile sizes for hybrid-hexagonal tiling has been
36

Bloch, Aurelien, Simone Casale-Brunet, and Marco Mattavelli. "Performance Estimation of High-Level Dataflow Program on Heterogeneous Platforms by Dynamic Network Execution." Journal of Low Power Electronics and Applications 12, no. 3 (2022): 36. http://dx.doi.org/10.3390/jlpea12030036.

Abstract:
The performance of programs executed on heterogeneous parallel platforms largely depends on the design choices regarding how to partition the processing on the various different processing units. In other words, it depends on the assumptions and parameters that define the partitioning, mapping, scheduling, and allocation of data exchanges among the various processing elements of the platform executing the program. The advantage of programs written in languages using the dataflow model of computation (MoC) is that executing the program with different configurations and parameter settings does n
37

Abdelfattah, A., H. Anzt, J. Dongarra, et al. "Linear algebra software for large-scale accelerated multicore computing." Acta Numerica 25 (May 1, 2016): 1–160. http://dx.doi.org/10.1017/s0962492916000015.

Abstract:
Many crucial scientific computing applications, ranging from national security to medical advances, rely on high-performance linear algebra algorithms and technologies, underscoring their importance and broad impact. Here we present the state-of-the-art design and implementation practices for the acceleration of the predominant linear algebra algorithms on large-scale accelerated multicore systems. Examples are given with fundamental dense linear algebra algorithms – from the LU, QR, Cholesky, and LDLT factorizations needed for solving linear systems of equations, to eigenvalue and singular va
38

Liu, Weihuang, Xiaodong Cun, Chi-Man Pun, Menghan Xia, Yong Zhang, and Jue Wang. "CoordFill: Efficient High-Resolution Image Inpainting via Parameterized Coordinate Querying." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 2 (2023): 1746–54. http://dx.doi.org/10.1609/aaai.v37i2.25263.

Abstract:
Image inpainting aims to fill the missing hole of the input. It is hard to solve this task efficiently when facing high-resolution images due to two reasons: (1) Large reception field needs to be handled for high-resolution image inpainting. (2) The general encoder and decoder network synthesizes many background pixels synchronously due to the form of the image matrix. In this paper, we try to break the above limitations for the first time thanks to the recent development of continuous implicit representation. In detail, we down-sample and encode the degraded image to produce the spatial-adapt
39

Thi Nga, Nguyen, Ha Thi Thu, Nguyen Thi Hoa, et al. "Assessment of the genetic changes of the attenuated Hanvet1.vn strain compared with original virulent 02HY strain of the porcine reproductive and respiratory syndrome virus." Vietnam Journal of Biotechnology 20, no. 2 (2022): 245–52. http://dx.doi.org/10.15625/1811-4989/16677.

Abstract:
The attenuated porcine reproductive and respiratory syndrome virus (PRRSV) strain Hanvet1.vn was developed by Hanvet Pharmaceutical Co., Ltd. by inoculating the virulent strain 02HY on Marc-145 cells for 80 generations and used to produce PRRS vaccine. In this study, we published the results of sequencing, analyzing and comparing the genome of the attenuated PRRSV strain Hanvet1.vn compared with the original pathogenic strain 02HY. The genomes of strains Hanvet1.vn and 02HY have 8 reading frames, coding for 8 non-structural and structural proteins: NSP1a, NSP1b, GP2, GP3, GP4, GP5, MP, NP. Afte
40

Harris-Dewey, Jared, and Richard Klein. "Generative Adversarial Networks for Non-Raytraced Global Illumination on Older GPU Hardware." International Journal of Electronics and Electrical Engineering 10, no. 1 (2022): 1–6. http://dx.doi.org/10.18178/ijeee.10.1.1-6.

Abstract:
We give an overview of the different rendering methods and we demonstrate that the use of a Generative Adversarial Network (GAN) for Global Illumination (GI) gives a rendered image of superior quality to that of a rasterised image. We utilise the Pix2Pix architecture and specify the hyper-parameters and methodology used to mimic ray-traced images from a set of input features. We also demonstrate that the GAN's quality is comparable to the quality of the ray-traced images, but that it is able to produce the image in a fraction of the time. Source Code: https://github.com/Jaredrhd/Global-Illumination-
41

Govett, Mark, Jim Rosinski, Jacques Middlecoff, et al. "Parallelization and Performance of the NIM Weather Model on CPU, GPU, and MIC Processors." Bulletin of the American Meteorological Society 98, no. 10 (2017): 2201–13. http://dx.doi.org/10.1175/bams-d-15-00278.1.

Abstract:
The design and performance of the Non-Hydrostatic Icosahedral Model (NIM) global weather prediction model are described. NIM is a dynamical core designed to run on central processing unit (CPU), graphics processing unit (GPU), and Many Integrated Core (MIC) processors. It demonstrates efficient parallel performance and scalability to tens of thousands of compute nodes and has been an effective way to make comparisons between traditional CPU and emerging fine-grain processors. The design of the NIM also serves as a useful guide in the fine-grain parallelization of the finite volume cube
42

Holm, Håvard H., André R. Brodtkorb, and Martin L. Sætra. "GPU Computing with Python: Performance, Energy Efficiency and Usability." Computation 8, no. 1 (2020): 4. http://dx.doi.org/10.3390/computation8010004.

Abstract:
In this work, we examine the performance, energy efficiency, and usability when using Python for developing high-performance computing codes running on the graphics processing unit (GPU). We investigate the portability of performance and energy efficiency between Compute Unified Device Architecture (CUDA) and Open Compute Language (OpenCL); between GPU generations; and between low-end, mid-range, and high-end GPUs. Our findings showed that the impact of using Python is negligible for our applications, and furthermore, CUDA and OpenCL applications tuned to an equivalent level can in many cases
43

Fang, Jian Wen, Jin Hui Yu, Shuang Xia Han, and Peng Wang. "Real-Time Ocean Water Animation in Cartoon Style." Key Engineering Materials 474-476 (April 2011): 2320–24. http://dx.doi.org/10.4028/www.scientific.net/kem.474-476.2320.

Abstract:
This paper presents a model for automatically generating 3D cartoon ocean water animations in real time. The dynamic ocean water surface is modeled by a spectral method. The cartoon rendering process is implemented in multiple passes on the GPU: First, we encode the normals of the ocean model and generate a normal map. Next, we extract discontinuities from the normal map and smooth them into an edge map. Finally, we combine the edge map with cartoon shading based on projective texture mapping. Some experimental results demonstrate the visual appeal and efficiency of the presented model.
44

Rojek, Krzysztof, Kamil Halbiniak, and Lukasz Kuczynski. "CFD code adaptation to the FPGA architecture." International Journal of High Performance Computing Applications 35, no. 1 (2020): 33–46. http://dx.doi.org/10.1177/1094342020972461.

Abstract:
In recent years, we have observed the intensive development of accelerated computing platforms. Although current trends indicate a well-established position of GPU devices in the HPC environment, the FPGA (Field-Programmable Gate Array) aspires to be an alternative solution for offloading CPU computation. This paper presents a systematic adaptation of four different CFD (Computational Fluid Dynamics) kernels to the Xilinx Alveo U250 FPGA. The goal of this paper is to investigate the potential of the FPGA architecture as the future infrastructure able to provide the most complex numerical simulations in
45

Liu, Yongjiu, Hao Gao, Qingyi Gu, Tadayoshi Aoyama, Takeshi Takaki, and Idaku Ishii. "High-Frame-Rate Structured Light 3-D Vision for Fast Moving Objects." Journal of Robotics and Mechatronics 26, no. 3 (2014): 311–20. http://dx.doi.org/10.20965/jrm.2014.p0311.

Abstract:
This paper presents a fast motion-compensated structured-light vision system that realizes 3-D shape measurement at 500 fps using a high-frame-rate camera-projector system. Multiple light patterns with an 8-bit gray code are projected on the measured scene at 1000 fps and are processed in real time for generating 512 × 512 depth images at 500 fps by using the parallel processing of a motion-compensated structured-light method on a G
46

Bentley, Phillip. "Accurate Simulation of Neutrons in Less Than One Minute Pt. 2: Sandman—GPU-Accelerated Adjoint Monte-Carlo Sampled Acceptance Diagrams." Quantum Beam Science 4, no. 2 (2020): 24. http://dx.doi.org/10.3390/qubs4020024.

Abstract:
A computational method in the modelling of neutron beams is described that blends neutron acceptance diagrams, GPU-based Monte-Carlo sampling, and a Bayesian approach to efficiency. The resulting code reaches orders of magnitude improvement in performance relative to existing methods. For example, data rates similar to world-leading, real instruments can be achieved on a 2017 laptop, generating 10^6 neutrons per second at the sample position of a high-resolution small angle scattering instrument. The method is benchmarked, and is shown to be in agreement with previous work. Finally, the method
47

ZEIBDAWI, Abed R., Jean E. GRUNDY, Bogna LASIA, and Edward L. G. PRYZDIAL. "Coagulation factor Va Glu-96-Asp-111: a chelator-sensitive site involved in function and subunit association." Biochemical Journal 377, no. 1 (2004): 141–48. http://dx.doi.org/10.1042/bj20031205.

Abstract:
Coagulation FVa (factor Va) accelerates the essential generation of thrombin by FXa (factor Xa). Although the noncovalent Ca2+-dependent association between the FVa light and heavy subunits (FVaL and FVaH) is required for function, little is known about the specific residues involved. Previous fragmentation studies and homology modelling led us to investigate the contribution of Leu-94–Asp-112. Including prospective divalent cation-binding acidic amino acids, nine conserved residues were individually replaced with Ala in the recombinant B-domainless FVa precursor (ΔFV). While mutation of Thr-1
48

Bartels, David W., and William D. Hutchison. "Microbial Control of First-Generation Ecb on Whorl Stage Corn, 1990." Insecticide and Acaricide Tests 16, no. 1 (1991): 72. http://dx.doi.org/10.1093/iat/16.1.72.

Abstract:
This experiment was conducted at the Rosemount Agricultural Experiment Station, Rosemount, Minn. 'Green Giant Code 40' sweet corn was planted on 7 Jun. Plots consisted of a single 30-ft row on 36-inch centers and were arranged in a randomized complete block design with 4 replications. Plots were infested with first instar ECB larvae on 24 Jun to simulate heavy pest pressure. Approximately 50 larvae were placed into each whorl with a “bazooka” applicator. The corn was 20–25 inches tall with no tassels visible from above. Insecticide applications were delayed by heavy rains (2.15 inches
49

Kuśmirek, Wiktor, Wiktor Franus, and Robert Nowak. "Linking De Novo Assembly Results with Long DNA Reads Using the dnaasm-link Application." BioMed Research International 2019 (April 11, 2019): 1–10. http://dx.doi.org/10.1155/2019/7847064.

Abstract:
Currently, third-generation sequencing techniques, which make it possible to obtain much longer DNA reads compared to the next-generation sequencing technologies, are becoming more and more popular. There are many possibilities for combining data from next-generation and third-generation sequencing. Herein, we present a new application called dnaasm-link for linking contigs, the result of de novo assembly of second-generation sequencing data, with long DNA reads. Our tool includes an integrated module to fill gaps with a suitable fragment of an appropriate long DNA read, which improves the con
50

Alcantara, Licinius Dimitri Sá de. "Towards a simple and secure method for binary cryptography via linear algebra." Revista Brasileira de Computação Aplicada 9, no. 3 (2017): 44. http://dx.doi.org/10.5335/rbca.v9i3.6556.

Abstract:
A simple and secure binary matrix encryption (BME) method is proposed and formalized on a linear algebra basis. The developed cryptography scheme does not require the idealization of a set of complex procedures or the generation of parallel bit stream for encryption of data, but it only needs to capture binary data sequences from the unprotected digital data, which are transformed into encrypted binary sequences by a cipher matrix. This method can be performed on physical or application layer level, and can be easily applied into any digital storage and telecommunication system. It also has th