
Journal articles on the topic "Mixed precision computation"


Consult the top 50 journal articles for research on the topic "Mixed precision computation".

Next to each source in the reference list there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic citation of the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scientific publication in .pdf format and read the abstract of the work online, if it is included in the metadata.

Browse journal articles from many scientific fields and compile a correct bibliography.

1

Van Zee, Field G., Devangi N. Parikh, and Robert A. Van De Geijn. "Supporting Mixed-domain Mixed-precision Matrix Multiplication within the BLIS Framework". ACM Transactions on Mathematical Software 47, no. 2 (April 2021): 1–26. http://dx.doi.org/10.1145/3402225.

Abstract:
We approach the problem of implementing mixed-datatype support within the general matrix multiplication (gemm) operation of the BLAS-like Library Instantiation Software framework, whereby each matrix operand A, B, and C may be stored as single- or double-precision real or complex values. Another factor of complexity, whereby the matrix product and accumulation are allowed to take place in a precision different from the storage precisions of either A or B, is also discussed. We first break the problem into orthogonal dimensions, considering the mixing of domains separately from mixing precisions. Support for all combinations of matrix operands stored in either the real or complex domain is mapped out by enumerating the cases and describing an implementation approach for each. Supporting all combinations of storage and computation precisions is handled by typecasting the matrices at key stages of the computation—during packing and/or accumulation, as needed. Several optional optimizations are also documented. Performance results gathered on a 56-core Marvell ThunderX2 and a 52-core Intel Xeon Platinum demonstrate that high performance is mostly preserved, with modest slowdowns incurred from unavoidable typecast instructions. The mixed-datatype implementation confirms that combinatorial intractability is avoided, with the framework relying on only two assembly microkernels to implement 128 datatype combinations.
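The typecast-at-key-stages idea summarized above can be illustrated outside of BLIS with a short NumPy sketch: operands stored in one precision are promoted when they are "packed", the product is accumulated in a chosen computation precision, and the result is cast back to the storage precision of C. This is only a conceptual sketch of the pattern, not the BLIS implementation; the function name and interface are invented for illustration.

```python
import numpy as np

def mixed_precision_gemm(A, B, C, compute_dtype=np.float64):
    """Illustrative mixed-precision GEMM returning A @ B + C.

    A, B, and C may be stored in different real precisions; the multiply and
    accumulation happen in compute_dtype, and the result is cast back to the
    storage precision of C (mirroring typecasting during packing and
    accumulation)."""
    A_packed = A.astype(compute_dtype)   # "packing" with a typecast
    B_packed = B.astype(compute_dtype)
    acc = A_packed @ B_packed            # product and accumulation in compute precision
    acc += C.astype(compute_dtype)
    return acc.astype(C.dtype)           # cast back to the storage precision of C

# Example: single-precision storage, double-precision computation.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)).astype(np.float32)
B = rng.standard_normal((3, 5)).astype(np.float32)
C = np.zeros((4, 5), dtype=np.float32)
print(mixed_precision_gemm(A, B, C).dtype)   # float32
```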
2

Al-Marakeby, A. "PRECISION ON DEMAND: A NOVEL LOSSLES MIXED-PRECISION COMPUTATION TECHNIQUE". Journal of Al-Azhar University Engineering Sector 15, no. 57 (October 1, 2020): 1046–56. http://dx.doi.org/10.21608/auej.2020.120378.

3

Wang, Shengquan, Chao Wang, Yong Cai, and Guangyao Li. "A novel parallel finite element procedure for nonlinear dynamic problems using GPU and mixed-precision algorithm". Engineering Computations 37, no. 6 (February 22, 2020): 2193–211. http://dx.doi.org/10.1108/ec-07-2019-0328.

Abstract:
Purpose The purpose of this paper is to improve the computational speed of solving nonlinear dynamics by using parallel methods and mixed-precision algorithm on graphic processing units (GPUs). The computational efficiency of traditional central processing units (CPUs)-based computer aided engineering software has been difficult to satisfy the needs of scientific research and practical engineering, especially for nonlinear dynamic problems. Besides, when calculations are performed on GPUs, double-precision operations are slower than single-precision operations. So this paper implemented mixed precision for nonlinear dynamic problem simulation using Belytschko-Tsay (BT) shell element on GPU. Design/methodology/approach To minimize data transfer between heterogeneous architectures, the parallel computation of the fully explicit finite element (FE) calculation is realized using a vectorized thread-level parallelism algorithm. An asynchronous data transmission strategy and a novel dependency relationship link-based method, for efficiently solving parallel explicit shell element equations, are used to improve the GPU utilization ratio. Finally, this paper implements mixed precision for nonlinear dynamic problems simulation using the BT shell element on a GPU and compare it to the CPU-based serially executed program and a GPU-based double-precision parallel computing program. Findings For a car body model containing approximately 5.3 million degrees of freedom, the computational speed is improved 25 times over CPU sequential computation, and approximately 10% over double-precision parallel computing method. The accuracy error of the mixed-precision computation is small and can satisfy the requirements of practical engineering problems. Originality/value This paper realized a novel FE parallel computing procedure for nonlinear dynamic problems using mixed-precision algorithm on CPU-GPU platform. Compared with the CPU serial program, the program implemented in this article obtains a 25 times acceleration ratio when calculating the model of 883,168 elements, which greatly improves the calculation speed for solving nonlinear dynamic problems.
4

Liu, Xingchao, Mao Ye, Dengyong Zhou, and Qiang Liu. "Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 10 (May 18, 2021): 8697–705. http://dx.doi.org/10.1609/aaai.v35i10.17054.

Abstract:
We consider the post-training quantization problem, which discretizes the weights of pre-trained deep neural networks without re-training the model. We propose multipoint quantization, a quantization method that approximates a full-precision weight vector using a linear combination of multiple vectors of low-bit numbers; this is in contrast to typical quantization methods that approximate each weight using a single low precision number. Computationally, we construct the multipoint quantization with an efficient greedy selection procedure, and adaptively decide the number of low precision points on each quantized weight vector based on the error of its output. This allows us to achieve higher precision levels for important weights that greatly influence the outputs, yielding an "effect of mixed precision" but without physical mixed precision implementations (which require specialized hardware accelerators). Empirically, our method can be implemented by common operands, bringing almost no memory and computation overhead. We show that our method outperforms a range of state-of-the-art methods on ImageNet classification and it can be generalized to more challenging tasks like PASCAL VOC object detection.
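The core of multipoint quantization can be sketched as greedy residual quantization: a full-precision vector is approximated by a sum of scaled low-bit vectors, and points are added until the approximation is good enough. The sketch below uses a plain weight-space residual criterion and a simple uniform quantizer; the paper's procedure instead adapts the number of points per quantized weight vector based on the error of its output.

```python
import numpy as np

def quantize_lowbit(v, bits=2):
    """Symmetric uniform quantizer returning a low-bit integer vector and its scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(v)) / qmax if np.any(v) else 1.0
    q = np.clip(np.round(v / scale), -qmax, qmax)
    return q, scale

def multipoint_quantize(w, bits=2, max_points=4, tol=1e-2):
    """Greedy residual quantization: approximate w by sum_k s_k * q_k, where each
    q_k is a low-bit vector. Points are added until the relative residual drops
    below tol (a stand-in for the paper's output-error criterion)."""
    residual = w.astype(np.float64).copy()
    points = []
    for _ in range(max_points):
        q, s = quantize_lowbit(residual, bits)
        points.append((s, q))
        residual -= s * q
        if np.linalg.norm(residual) <= tol * np.linalg.norm(w):
            break
    approx = sum(s * q for s, q in points)
    return points, approx

w = np.random.default_rng(1).standard_normal(8)
points, w_hat = multipoint_quantize(w, bits=2, max_points=6)
print(len(points), np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```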
5

Zhang, Jianfei, and Lei Zhang. "Efficient CUDA Polynomial Preconditioned Conjugate Gradient Solver for Finite Element Computation of Elasticity Problems". Mathematical Problems in Engineering 2013 (2013): 1–12. http://dx.doi.org/10.1155/2013/398438.

Abstract:
Graphics processing unit (GPU) has obtained great success in scientific computations for its tremendous computational horsepower and very high memory bandwidth. This paper discusses the efficient way to implement polynomial preconditioned conjugate gradient solver for the finite element computation of elasticity on NVIDIA GPUs using compute unified device architecture (CUDA). Sliced block ELLPACK (SBELL) format is introduced to store sparse matrix arising from finite element discretization of elasticity with fewer padding zeros than traditional ELLPACK-based formats. Polynomial preconditioning methods have been investigated both in convergence and running time. From the overall performance, the least-squares (L-S) polynomial method is chosen as a preconditioner in PCG solver to finite element equations derived from elasticity for its best results on different example meshes. In the PCG solver, mixed precision algorithm is used not only to reduce the overall computational, storage requirements and bandwidth but to make full use of the capacity of the GPU devices. With SBELL format and mixed precision algorithm, the GPU-based L-S preconditioned CG can get a speedup of about 7–9 to CPU-implementation.
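The inner-outer pattern behind mixed-precision solvers of this kind can be sketched in a few lines of NumPy: the expensive correction solve runs in single precision while residuals and the accumulated solution stay in double precision. This is a generic illustration of the idea, with a dense direct solve standing in for the paper's SBELL-based preconditioned CG on the GPU.

```python
import numpy as np

def mixed_precision_refinement(A, b, outer_iters=10, tol=1e-12):
    """Sketch of inner-outer mixed precision: the inner correction solve runs in
    float32, while residuals and solution updates are kept in float64."""
    A32 = A.astype(np.float32)
    x = np.zeros_like(b, dtype=np.float64)
    for _ in range(outer_iters):
        r = b - A @ x                                     # double-precision residual
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        # Single-precision correction (a direct solve stands in for an inner
        # preconditioned CG sweep).
        d = np.linalg.solve(A32, r.astype(np.float32))
        x += d.astype(np.float64)
    return x

rng = np.random.default_rng(2)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # well-conditioned SPD test matrix
b = rng.standard_normal(n)
x = mixed_precision_refinement(A, b)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```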
6

Molina, Roméo, Vincent Lafage, David Chamont, and Fabienne Jézéquel. "Investigating mixed-precision for AGATA pulse-shape analysis". EPJ Web of Conferences 295 (2024): 03020. http://dx.doi.org/10.1051/epjconf/202429503020.

Abstract:
The AGATA project aims at building a 4π gamma-ray spectrometer consisting of 180 germanium crystals, each crystal being divided into 36 segments. Each gamma ray produces an electrical signal within several neighbouring segments, which is compared with a database of reference signals, enabling the interaction to be located. This step is called Pulse-Shape Analysis (PSA). In the execution chain leading to the PSA, we observe successive data conversions: the original 14-bit integers given by the electronics are finally converted to 32-bit floats. This made us wonder about the real numerical accuracy of the results, and investigate the use of shorter floats, with the hope of speeding up the computation and also reducing a major cache-miss problem. In this article, we first describe the numerical validation of the PSA code, thanks to the CADNA library. After the code has been properly instrumented, CADNA performs each computation three times with a random rounding mode. This allows, for each operation, the number of exact significant digits to be evaluated using a Student test with a 95% confidence threshold. In a second step, we report our successes and challenges while refactoring the code so as to mix different numerical formats, using high precision only when necessary, and taking benefit of hardware speedup elsewhere.
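The significant-digit estimate behind this kind of stochastic validation (the CESTAC method implemented by CADNA) is compact enough to write down: from N samples of a result obtained under random rounding, the number of exact significant decimal digits is estimated from their mean and spread with a Student test. The sketch below assumes the samples are already available; CADNA itself produces them by instrumenting every floating-point operation, and 4.303 is the standard 95% Student value for N = 3.

```python
import math
import statistics

def exact_significant_digits(samples, student_t=4.303):
    """CESTAC-style estimate of the number of exact significant decimal digits
    of a result, given N samples computed with a random rounding mode
    (student_t = 4.303 corresponds to N = 3 at 95% confidence)."""
    n = len(samples)
    mean = statistics.fmean(samples)
    sigma = statistics.stdev(samples)        # unbiased estimate (N - 1 divisor)
    if sigma == 0.0:
        return math.inf                      # all samples agree exactly
    return math.log10(math.sqrt(n) * abs(mean) / (student_t * sigma))

# Three hypothetical samples of the same result under random rounding:
print(exact_significant_digits([0.8413447, 0.8413451, 0.8413449]))   # about 6 digits
```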
7

Yang, Linjie, and Qing Jin. "FracBits: Mixed Precision Quantization via Fractional Bit-Widths". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 10612–20. http://dx.doi.org/10.1609/aaai.v35i12.17269.

Abstract:
Model quantization helps to reduce model size and latency of deep neural networks. Mixed precision quantization is favorable with customized hardwares supporting arithmetic operations at multiple bit-widths to achieve maximum efficiency. We propose a novel learning-based algorithm to derive mixed precision models end-to-end under target computation constraints and model sizes. During the optimization, the bit-width of each layer / kernel in the model is at a fractional status of two consecutive bit-widths which can be adjusted gradually. With a differentiable regularization term, the resource constraints can be met during the quantization-aware training which results in an optimized mixed precision model. Our final models achieve comparable or better performance than previous quantization methods with mixed precision on MobilenetV1/V2, ResNet18 under different resource constraints on ImageNet dataset.
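The fractional bit-width relaxation can be illustrated as an interpolation between the two neighbouring integer bit-widths, weighted by the fractional part, so the bit-width can be adjusted continuously during optimization. The NumPy sketch below shows only this relaxation with a simple symmetric uniform quantizer; it is not the paper's full quantization-aware training setup.

```python
import numpy as np

def uniform_quantize(x, bits):
    """Symmetric uniform quantization of x to a given integer bit-width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def fractional_bit_quantize(x, b):
    """FracBits-style relaxation: for a fractional bit-width b, interpolate
    between quantization at floor(b) and ceil(b) bits, weighted by the
    fractional part of b."""
    lo, hi = int(np.floor(b)), int(np.ceil(b))
    if lo == hi:
        return uniform_quantize(x, lo)
    frac = b - lo
    return (1.0 - frac) * uniform_quantize(x, lo) + frac * uniform_quantize(x, hi)

x = np.random.default_rng(3).standard_normal(16)
for b in (2.0, 2.5, 3.0):
    err = np.linalg.norm(x - fractional_bit_quantize(x, b)) / np.linalg.norm(x)
    print(f"b = {b}: relative error {err:.3f}")
```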
8

Stupishin, Leonid U., and Konstantin E. Nikitin. "Mixed Finite Element of Geometrically Nonlinear Shallow Shells of Revolution". Applied Mechanics and Materials 501-504 (January 2014): 514–17. http://dx.doi.org/10.4028/www.scientific.net/amm.501-504.514.

Abstract:
A computation method for shallow shells of revolution in a mixed finite-element formulation is developed. The final equations are constructed by the Galerkin method. Results for a test problem are presented, and the precision and convergence of the results are analyzed.
9

Burkov, Andriy, and Brahim Chaib-draa. "An Approximate Subgame-Perfect Equilibrium Computation Technique for Repeated Games". Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (July 4, 2010): 729–36. http://dx.doi.org/10.1609/aaai.v24i1.7623.

Abstract:
This paper presents a technique for approximating, up to any precision, the set of subgame-perfect equilibria (SPE) in repeated games with discounting. The process starts with a single hypercube approximation of the set of SPE payoff profiles. Then the initial hypercube is gradually partitioned on to a set of smaller adjacent hypercubes, while those hypercubes that cannot contain any SPE point are gradually withdrawn. Whether a given hypercube can contain an equilibrium point is verified by an appropriate mixed integer program. A special attention is paid to the question of extracting players' strategies and their representability in form of finite automata.
10

Lam, Michael O., and Jeffrey K. Hollingsworth. "Fine-grained floating-point precision analysis". International Journal of High Performance Computing Applications 32, no. 2 (June 15, 2016): 231–45. http://dx.doi.org/10.1177/1094342016652462.

Abstract:
Floating-point computation is ubiquitous in high-performance scientific computing, but rounding error can compromise the results of extended calculations, especially at large scales. In this paper, we present new techniques that use binary instrumentation and modification to do fine-grained floating-point precision analysis, simulating any level of precision less than or equal to the precision of the original program. These techniques have an average of 40–70% lower overhead and provide more fine-grained insights into a program’s sensitivity than previous mixed-precision analyses. We also present a novel histogram-based visualization of a program’s floating-point precision sensitivity, as well as an incremental search technique that allows developers to incrementally trade off analysis time for detail, including the ability to restart analyses from where they left off. We present results from several case studies and experiments that show the efficacy of these techniques. Using our tool and its novel visualization, application developers can more quickly determine for specific data sets whether their application could be run using fewer double precision variables, saving both time and memory space.
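The reduced-precision simulation that the paper performs through binary instrumentation can be mimicked conceptually by truncating low-order mantissa bits of IEEE 754 doubles, as in the NumPy sketch below. This stand-in only truncates stored values; it does not round and does not rewrite instructions in a running binary the way the described tool does.

```python
import numpy as np

def truncate_precision(x, mantissa_bits):
    """Simulate a lower-precision double by zeroing the low-order mantissa bits
    of each IEEE 754 binary64 value (binary64 carries 52 explicit mantissa
    bits, so mantissa_bits must lie between 0 and 52)."""
    x = np.ascontiguousarray(x, dtype=np.float64)
    mask = np.uint64(0xFFFFFFFFFFFFFFFF) << np.uint64(52 - mantissa_bits)
    return (x.view(np.uint64) & mask).view(np.float64)

x = np.array([np.pi, np.e, 1.0 / 3.0])
print(truncate_precision(x, 23))   # roughly a single-precision mantissa
print(truncate_precision(x, 10))   # roughly a half-precision mantissa
```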
11

Isupov, Konstantin. "High-Performance Computation in Residue Number System Using Floating-Point Arithmetic". Computation 9, no. 2 (January 21, 2021): 9. http://dx.doi.org/10.3390/computation9020009.

Abstract:
Residue number system (RNS) is known for its parallel arithmetic and has been used in recent decades in various important applications, from digital signal processing and deep neural networks to cryptography and high-precision computation. However, comparison, sign identification, overflow detection, and division are still hard to implement in RNS. For such operations, most of the methods proposed in the literature only support small dynamic ranges (up to several tens of bits), so they are only suitable for low-precision applications. We recently proposed a method that supports arbitrary moduli sets with cryptographically sized dynamic ranges, up to several thousands of bits. The practical interest of our method compared to existing methods is that it relies only on very fast standard floating-point operations, so it is suitable for multiple-precision applications and can be efficiently implemented on many general-purpose platforms that support IEEE 754 arithmetic. In this paper, we make further improvements to this method and demonstrate that it can successfully be applied to implement efficient data-parallel primitives operating in the RNS domain, namely finding the maximum element of an array of RNS numbers on graphics processing units. Our experimental results on an NVIDIA RTX 2080 GPU show that for random residues and a 128-moduli set with 2048-bit dynamic range, the proposed implementation reduces the running time by a factor of 39 and the memory consumption by a factor of 13 compared to an implementation based on mixed-radix conversion.
12

Ran, Yingqiang, Shikun Dai, Qingrui Chen, Ying Zhang, and Jiaxuan Ling. "Numerical Simulation of 2D Strong Magnetic Field in Space-Wavenumber Mixed Domain". Journal of Physics: Conference Series 2651, no. 1 (December 1, 2023): 012083. http://dx.doi.org/10.1088/1742-6596/2651/1/012083.

Abstract:
A strongly magnetic body has a self-demagnetization effect, which complicates the processing and interpretation of magnetic measurement data. A 2D numerical simulation method in the spatial-wavenumber mixed domain is proposed to calculate the magnetic anomalies of 2D strongly magnetic objects. In this method, the partial differential equation satisfied by the magnetic potential of a 2D strongly magnetic body is transformed by a 1D FFT into independent ordinary differential equations for the different wavenumbers, which greatly reduces the computation and storage requirements. A compact operator is used to solve for the strong magnetic field iteratively. The algorithm provides a new, efficient, and high-precision method for forward modeling of 2D magnetic anomalies.
13

Yang, Tao, Zhezhi He, Tengchuan Kou, Qingzheng Li, Qi Han, Haibao Yu, Fangxin Liu, Yun Liang, and Li Jiang. "BISWSRBS: A Winograd-based CNN Accelerator with a Fine-grained Regular Sparsity Pattern and Mixed Precision Quantization". ACM Transactions on Reconfigurable Technology and Systems 14, no. 4 (December 31, 2021): 1–28. http://dx.doi.org/10.1145/3467476.

Abstract:
Field-programmable Gate Array (FPGA) is a high-performance computing platform for Convolution Neural Networks (CNNs) inference. Winograd algorithm, weight pruning, and quantization are widely adopted to reduce the storage and arithmetic overhead of CNNs on FPGAs. Recent studies strive to prune the weights in the Winograd domain, however, resulting in irregular sparse patterns and leading to low parallelism and reduced utilization of resources. Besides, there are few works to discuss a suitable quantization scheme for Winograd. In this article, we propose a regular sparse pruning pattern in the Winograd-based CNN, namely, Sub-row-balanced Sparsity (SRBS) pattern, to overcome the challenge of the irregular sparse pattern. Then, we develop a two-step hardware co-optimization approach to improve the model accuracy using the SRBS pattern. Based on the pruned model, we implement a mixed precision quantization to further reduce the computational complexity of bit operations. Finally, we design an FPGA accelerator that takes both the advantage of the SRBS pattern to eliminate low-parallelism computation and the irregular memory accesses, as well as the mixed precision quantization to get a layer-wise bit width. Experimental results on VGG16/VGG-nagadomi with CIFAR-10 and ResNet-18/34/50 with ImageNet show up to 11.8×/8.67× and 8.17×/8.31×/10.6× speedup, 12.74×/9.19× and 8.75×/8.81×/11.1× energy efficiency improvement, respectively, compared with the state-of-the-art dense Winograd accelerator [20] with negligible loss of model accuracy. We also show that our design has 4.11× speedup compared with the state-of-the-art sparse Winograd accelerator [19] on VGG16.
14

Isupov, Konstantin, and Vladimir Knyazkov. "Interval Estimation of Relative Values in Residue Number System". Journal of Circuits, Systems and Computers 27, no. 01 (August 23, 2017): 1850004. http://dx.doi.org/10.1142/s0218126618500044.

Abstract:
Residue number system (RNS), due to its carry-free nature, is popular in many applications of high-speed computer arithmetic, especially in digital signal processing and cryptography. However, the main limiting factor of RNS is a high complexity of such operations as magnitude comparison, sign determination and overflow detection. These operations have, for many years, been a major obstacle to more widespread use of parallel residue arithmetic. This paper presents a new efficient method to perform these operations, which is based on computation and analysis of the interval estimation for the relative value of an RNS number. The estimation, which is called the interval floating-point characteristic (IFC), is represented by two directed rounded bounds that are fixed-precision numbers. Generally, the time complexities of serial and parallel computations of IFC are linear and logarithmic functions of the size of the moduli set, respectively. The new method requires only small-integer and fixed-precision floating-point operations and focuses on arbitrary moduli sets with large dynamic ranges ([Formula: see text]). Experiments indicate that the performance of the proposed method is significantly higher than that of methods based on Mixed-Radix Conversion.
15

Ye, Yuejin, Zhenya Song, Shengchang Zhou, Yao Liu, Qi Shu, Bingzhuo Wang, Weiguo Liu, Fangli Qiao, and Lanning Wang. "swNEMO_v4.0: an ocean model based on NEMO4 for the new-generation Sunway supercomputer". Geoscientific Model Development 15, no. 14 (July 25, 2022): 5739–56. http://dx.doi.org/10.5194/gmd-15-5739-2022.

Abstract:
The current large-scale parallel barrier of ocean general circulation models (OGCMs) makes it difficult to meet the computing demand of high resolution. Fully considering both the computational characteristics of OGCMs and the heterogeneous many-core architecture of the new Sunway supercomputer, swNEMO_v4.0, based on NEMO4 (Nucleus for European Modelling of the Ocean version 4), is developed with ultrahigh scalability. Three innovations and breakthroughs are shown in our work: (1) a highly adaptive, efficient four-level parallelization framework for OGCMs is proposed to release a new level of parallelism along the compute-dependency column dimension. (2) A many-core optimization method using blocking by remote memory access (RMA) and a dynamic cache scheduling strategy is applied, effectively utilizing the temporal and spatial locality of data. The test shows that the actual direct memory access (DMA) bandwidth is greater than 90 % of the ideal bandwidth after optimization, and the maximum is up to 95 %. (3) A mixed-precision optimization method with half, single and double precision is explored, which can effectively improve the computation performance while maintaining the simulated accuracy of OGCMs. The results demonstrate that swNEMO_v4.0 has ultrahigh scalability, achieving up to 99.29 % parallel efficiency with a resolution of 500 m using 27 988 480 cores, reaching the peak performance with 1.97 PFLOPS.
16

Mohamed Ben Ali, Amina, Salah Bouziane, and Hamoudi Bouzerd. "Computation of mode I strain energy release rate of symmetrical and asymmetrical sandwich structures using mixed finite element". Frattura ed Integrità Strutturale 15, no. 56 (March 28, 2021): 229–39. http://dx.doi.org/10.3221/igf-esis.56.19.

Abstract:
The use of composite materials is on the rise in different engineering fields, the main advantage of these materials for the aerospace industry is their low weight for excellent mechanical qualities. The analysis of failure modes, such as delamination, of these materials has received great attention from researchers. This paper proposes a method to evaluate the mode I Strain Energy Release Rate (SERR) of sandwich structures. This method associated a two-dimensional mixed finite element with virtual crack extension technique for the analysis of interfacial delamination of sandwich beams. A symmetrical Double Cantilever Beam (DCB) and asymmetrical Double Cantilever Beam (UDCB) have been analyzed in this study. The comparison of the results obtained by this method and those found in the literature shows efficiency and good precision for the calculation of Strain Energy Release Rate (SERR).
17

Fasfous, Nael, Manoj Rohit Vemparala, Alexander Frickenstein, Emanuele Valpreda, Driton Salihu, Nguyen Anh Vu Doan, Christian Unger, Naveen Shankar Nagaraja, Maurizio Martina, and Walter Stechele. "HW-FlowQ: A Multi-Abstraction Level HW-CNN Co-design Quantization Methodology". ACM Transactions on Embedded Computing Systems 20, no. 5s (October 31, 2021): 1–25. http://dx.doi.org/10.1145/3476997.

Abstract:
Model compression through quantization is commonly applied to convolutional neural networks (CNNs) deployed on compute and memory-constrained embedded platforms. Different layers of the CNN can have varying degrees of numerical precision for both weights and activations, resulting in a large search space. Together with the hardware (HW) design space, the challenge of finding the globally optimal HW-CNN combination for a given application becomes daunting. To this end, we propose HW-FlowQ, a systematic approach that enables the co-design of the target hardware platform and the compressed CNN model through quantization. The search space is viewed at three levels of abstraction, allowing for an iterative approach for narrowing down the solution space before reaching a high-fidelity CNN hardware modeling tool, capable of capturing the effects of mixed-precision quantization strategies on different hardware architectures (processing unit counts, memory levels, cost models, dataflows) and two types of computation engines (bit-parallel vectorized, bit-serial). To combine both worlds, a multi-objective non-dominated sorting genetic algorithm (NSGA-II) is leveraged to establish a Pareto-optimal set of quantization strategies for the target HW-metrics at each abstraction level. HW-FlowQ detects optima in a discrete search space and maximizes the task-related accuracy of the underlying CNN while minimizing hardware-related costs. The Pareto-front approach keeps the design space open to a range of non-dominated solutions before refining the design to a more detailed level of abstraction. With equivalent prediction accuracy, we improve the energy and latency by 20% and 45% respectively for ResNet56 compared to existing mixed-precision search methods.
18

Lv, Haifeng, Xiaoyu Ji, and Yong Ding. "A Mixed Intrusion Detection System utilizing K-means and Extreme Gradient Boosting". Journal of Physics: Conference Series 2517, no. 1 (June 1, 2023): 012016. http://dx.doi.org/10.1088/1742-6596/2517/1/012016.

Abstract:
The intrusion detection system (IDS) plays an important part because it offers an efficient way to prevent and mitigate cyber attacks. Numerous deep learning methods for intrusion anomaly detection have been developed as a result of recent advances in artificial intelligence (AI) in order to strengthen internet security. Balancing a high detection rate (DR), a low false alarm rate (FAR) and the curse of dimensionality is the crucial concern when devising an effective IDS. For the binary classification of intrusion detection systems, we present in this study a mixed model called K-means-XGBoost, consisting of the K-means and Extreme Gradient Boosting (XGBoost) algorithms. The distributed computation of our method is carried out on the Spark platform to rapidly separate normal events from anomalous events. In terms of accuracy, DR, F1-score, recall, precision, and the error index FAR, the proposed model's performance is measured on the well-known NSL-KDD dataset. The experimental outcomes indicate that our method is markedly better in accuracy, DR, F1-score, training time, and processing speed, compared to other recently created models. In particular, the accuracy, F1-score, and DR of the proposed model reach as high as 93.28%, 94.39%, and 99.22% on the NSL-KDD dataset, respectively.
19

Mintourakis, I., G. Panou, and D. Paradissis. "Evaluation of ocean circulation models in the computation of the mean dynamic topography for geodetic applications. Case study in the Greek seas". Journal of Geodetic Science 9, no. 1 (January 1, 2019): 154–73. http://dx.doi.org/10.1515/jogs-2019-0015.

Abstract:
Precise knowledge of the oceanic Mean Dynamic Topography (MDT) is crucial for a number of geodetic applications, such as vertical datum unification and marine geoid modelling. The lack of gravity surveys over many regions of the Greek seas and the incapacity of the space borne gradiometry/gravity missions to resolve the small and medium wavelengths of the geoid led to the investigation of the oceanographic approach for computing the MDT. We compute two new regional MDT surfaces after averaging, for given epochs, the periodic gridded solutions of the Dynamic Ocean Topography (DOT) provided by two ocean circulation models. These newly developed regional MDT surfaces are compared to three state-of-the-art models, which represent the oceanographic, the geodetic and the mixed oceanographic/geodetic approaches in the implementation of the MDT, respectively. Based on these comparisons, we discuss the differences between the three approaches for the case study area and we present some valuable findings regarding the computation of the regional MDT. Furthermore, in order to have an estimate of the precision of the oceanographic approach, we apply extensive evaluation tests on the ability of the two regional ocean circulation models to track the sea level variations by comparing their solutions to tide gauge records and satellite altimetry Sea Level Anomalies (SLA) data. The overall findings support the claim that, for the computation of the MDT surface, due to the lack of geodetic data and to limitations of the Global Geopotential Models (GGMs) in the case study area, the oceanographic approach is preferable over the geodetic or the mixed oceanographic/geodetic approaches.
20

Wang, Yang, Jie Liu, Xiaoxiong Zhu, Qingyang Zhang, Shengguo Li, and Qinglin Wang. "Improving Structured Grid-Based Sparse Matrix-Vector Multiplication and Gauss–Seidel Iteration on GPDSP". Applied Sciences 13, no. 15 (August 3, 2023): 8952. http://dx.doi.org/10.3390/app13158952.

Abstract:
Structured grid-based sparse matrix-vector multiplication and Gauss–Seidel iterations are very important kernel functions in scientific and engineering computations, both of which are memory intensive and bandwidth-limited. GPDSP is a general purpose digital signal processor, which is a very significant embedded processor that has been introduced into high-performance computing. In this paper, we designed various optimization methods, which included a blocking method to improve data locality and increase memory access efficiency, a multicolor reordering method to develop Gauss–Seidel fine-grained parallelism, a data partitioning method designed for GPDSP memory structures, and a double buffering method to overlap computation and access memory on structured grid-based SpMV and Gauss–Seidel iterations for GPDSP. At last, we combined the above optimization methods to design a multicore vectorization algorithm. We tested the matrices generated with structured grids of different sizes on the GPDSP platform and obtained speedups of up to 41× and 47× compared to the unoptimized SpMV and Gauss–Seidel iterations, with maximum bandwidth efficiencies of 72% and 81%, respectively. The experiment results show that our algorithms could fully utilize the external memory bandwidth. We also implemented the commonly used mixed precision algorithm on the GPDSP and obtained speedups of 1.60× and 1.45× for the SpMV and Gauss–Seidel iterations, respectively.
21

Lian, Feng, Liming Hou, Jing Liu, and Chongzhao Han. "Constrained Multi-Sensor Control Using a Multi-Target MSE Bound and a δ-GLMB Filter". Sensors 18, no. 7 (July 16, 2018): 2308. http://dx.doi.org/10.3390/s18072308.

Abstract:
The existing multi-sensor control algorithms for multi-target tracking (MTT) within the random finite set (RFS) framework are all based on the distributed processing architecture, so the rule of generalized covariance intersection (GCI) has to be used to obtain the multi-sensor posterior density. However, there has still been no reliable basis for setting the normalized fusion weight of each sensor in GCI until now. Therefore, to avoid the GCI rule, the paper proposes a new constrained multi-sensor control algorithm based on the centralized processing architecture. A multi-target mean-square error (MSE) bound defined in our paper is served as cost function and the multi-sensor control commands are just the solutions that minimize the bound. In order to derive the bound by using the generalized information inequality to RFS observation, the error between state set and its estimation is measured by the second-order optimal sub-pattern assignment metric while the multi-target Bayes recursion is performed by using a δ-generalized labeled multi-Bernoulli filter. An additional benefit of our method is that the proposed bound can provide an online indication of the achievable limit for MTT precision after the sensor control. Two suboptimal algorithms, which are mixed penalty function (MPF) method and complex method, are used to reduce the computation cost of solving the constrained optimization problem. Simulation results show that for the constrained multi-sensor control system with different observation performance, our method significantly outperforms the GCI-based Cauchy-Schwarz divergence method in MTT precision. Besides, when the number of sensors is relatively large, the computation time of the MPF and complex methods is much shorter than that of the exhaustive search method at the expense of completely acceptable loss of tracking accuracy.
22

White, Alexander J., Lee A. Collins, Katarina Nichols, and S. X. Hu. "Mixed stochastic-deterministic time-dependent density functional theory: application to stopping power of warm dense carbon". Journal of Physics: Condensed Matter 34, no. 17 (February 28, 2022): 174001. http://dx.doi.org/10.1088/1361-648x/ac4f1a.

Abstract:
Abstract Warm dense matter (WDM) describes an intermediate phase, between condensed matter and classical plasmas, found in natural and man-made systems. In a laboratory setting, WDM is often created dynamically. It is typically laser or pulse-power generated and can be difficult to characterize experimentally. Measuring the energy loss of high energy ions, caused by a WDM target, is both a promising diagnostic and of fundamental importance to inertial confinement fusion research. However, electron coupling, degeneracy, and quantum effects limit the accuracy of easily calculable kinetic models for stopping power, while high temperatures make the traditional tools of condensed matter, e.g. time-dependent density functional theory (TD-DFT), often intractable. We have developed a mixed stochastic-deterministic approach to TD-DFT which provides more efficient computation while maintaining the required precision for model discrimination. Recently, this approach showed significant improvement compared to models when compared to experimental energy loss measurements in WDM carbon. Here, we describe this approach and demonstrate its application to warm dense carbon stopping across a range of projectile velocities. We compare direct stopping-power calculation to approaches based on combining homogeneous electron gas response with bound electrons, with parameters extracted from our TD-DFT calculations.
23

Kavya, K. Guru Sai, G. Nagasowmya, G. Ankitha, K. Bharathi, K. Reshma, and M. Sharmila. "Analysis of Two Stage CMOS Operational Amplifier in 90nm CMOS Technology". International Journal for Research in Applied Science and Engineering Technology 12, no. 2 (February 29, 2024): 444–49. http://dx.doi.org/10.22214/ijraset.2024.58338.

Abstract:
Abstract: Operational amplifier circuits are used in computation, instrumentation and other applications. Precision op-amps previously used in instrumentation are now-a-days being used in industrial and automotive applications. Hence, there always exists a need for better precision op-amps. It should operate under a wide temperature range. Nowadays, due to the industry trend of applying standard process technologies to implement both analog circuits and digital circuits on the same chip, Complementary Metal-Oxide Semiconductor (CMOS) technology has become dominant over bipolar technology for analog circuit design in a mixed-signal system. Two-stage Op-Amp is one of the most commonly used Op-Amp architectures. In this work an operational amplifier based on CMOS is presented whose input depends on its bias currents which is 20µA and designed using 180nm and 90nm technologies. In sub-threshold region due to unique behavior of the MOS transistors not only allows a designer to work at low voltages and it also allows operating at low input bias currents. Most CMOS Op-Amps are designed for specific on-chip applications and are only required to drive capacitive loads of a few pf. In this proposed work, design of a two stage fully differential CMOS operational amplifier is presented and simulations are carried out in 180nm and 90nm technologies for various parameters. The simulation is to be carried out using Cadence Virtuoso Tool.
24

Kubacki, Jan, and Alina Jędrzejczak. "Small area estimation of income under spatial SAR model". Statistics in Transition new series 17, no. 3 (September 1, 2016): 365–90. http://dx.doi.org/10.59170/stattrans-2016-022.

Abstract:
The paper presents the method of hierarchical Bayes (HB) estimation under small area models with spatially correlated random effects and a spatial structure implied by the Simultaneous Autoregressive (SAR) process. The idea was to improve the spatial EBLUP by incorporating the HB approach into the estimation algorithm. The computation procedure applied in the paper uses the concept of sampling from a posterior distribution under generalized linear mixed models implemented in WinBUGS software and adapts the idea of parameter estimation for small areas by means of the HB method in the case of known model hyperparameters. The illustration of the approach mentioned above was based on a real-world example concerning household income data. The precision of the direct estimators was determined using own three-stage procedure which employs Balanced Repeated Replication, bootstrap and Generalized Variance Function. Additional simulations were conducted to show the influence of the spatial autoregression coefficient on the estimation error reduction. The computations performed by ‘sae’ package for R project and a special procedure for WinBUGS reveal that the method provides reliable estimates of small area means. For high spatial correlation between domains, noticeable MSE reduction was observed, which seems more evident for HB-SAR method as compared with the traditional spatial EBLUP. In our opinion, the Gibbs sampler, revealing the simultaneous nature of processes, especially for random effects, can be a good starting point for the simulations based on stochastic SAR processes.
25

Tan, Zhaoxiang, Shaofeng Lu, Kai Bao, Shaoning Zhang, Chaoxian Wu, Jie Yang, and Fei Xue. "Adaptive Partial Train Speed Trajectory Optimization". Energies 11, no. 12 (November 26, 2018): 3302. http://dx.doi.org/10.3390/en11123302.

Abstract:
Train speed trajectory optimization has been proposed as an efficient and feasible method for energy-efficient train operation without many further requirements to upgrade the current railway system. This paper focuses on an adaptive partial train speed trajectory optimization problem between two arbitrary speed points with a given traveling time and distance, in comparison with full speed trajectory with zero initial and end speeds between two stations. This optimization problem is of interest in dynamic applications where scenarios keep changing due to signaling and multi-train interactions. We present a detailed optimality analysis based on Pontryagin’s maximum principle (PMP) which is later used to design the optimization methods. We propose two optimization methods, one based on the PMP and another based on mixed-integer linear programming (MILP), to solve the problem. Both methods are designed using heuristics obtained from the developed optimality analysis based on the PMP. We develop an intuitive numerical algorithm to achieve the optimal speed trajectory in four typical case scenarios; meanwhile, we propose a new distance-based MILP approach to optimize the partial speed trajectory in the same scenarios with high modeling precision and computation efficiency. The MILP method is later used in a real engineering speed trajectory optimization to demonstrate its high computational efficiency, robustness, and adaptivity. This paper concludes with a comparison of both methods in addition to the widely applied pseudospectral method and propose the future work of this paper.
26

Chen, Xiangren, Bohan Yang, Wenping Zhu, Hanning Wang, Qichao Tao, Shuying Yin, Min Zhu, Shaojun Wei, and Leibo Liu. "A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery Multiplier". IACR Transactions on Cryptographic Hardware and Embedded Systems 2025, no. 1 (December 9, 2024): 275–313. https://doi.org/10.46586/tches.v2025.i1.275-313.

Abstract:
Zero-knowledge proof (ZKP) is an attractive cryptographic paradigm that allows a party to prove the correctness of a given statement without revealing any additional information. It offers both computation integrity and privacy, witnessing many celebrated deployments, such as computation outsourcing and cryptocurrencies. Recent general-purpose ZKP schemes, e.g., zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK), suffer from time-consuming proof generation, which is mainly bottlenecked by the large-scale number theoretic transformation (NTT) and multi-scalar point multiplication (MSM). To boost its wide application, great interest has been shown in expediting the proof generation on various platforms like GPU, FPGA and ASIC.So far as we know, current works on the hardware designs for ZKP employ two separated data-paths for NTT and MSM, overlooking the potential of resource reusage. In this work, we particularly explore the feasibility and profit of implementing both NTT and MSM with a unified and high-performance hardware architecture. For the crucial operator design, we propose a dual-precision, load-balanced and fully-pipelined Montgomery multiplier (LBFP MM) by introducing the new mixed-radix technique and improving the prior quotient-decoupled strategy. Collectively, we also integrate orthogonal ideas to further enhance the performance of LBFP MM, including the customized constant multiplication, truncated LSB/MSB multiplication/addition and Karatsuba technique. On top of that, we present the unified, scalable and highperformance hardware architecture that conducts both NTT and MSM in a versatile pipelined execution mechanism, intensively sharing the common computation and memory resource. The proposed accelerator manages to overlap the on-chip memory computation with off-chip memory access, considerably reducing the overall cycle counts for NTT and MSM.We showcase the implementation of modular multiplier and overall architecture on the BLS12-381 elliptic curve for zk-SNARK. Extensive experiments are carried out under TSMC 28nm synthesis and similar simulation set, which demonstrate impressive improvements: (1) the proposed LBFP MM obtains 1.8x speed-up and 1.3x less area cost versus the state-of-the-art design; (2) the unified accelerator achieves 12.1x and 5.8x acceleration for NTT and MSM while also consumes 4.3x lower overall on-chip area overhead, when compared to the most related and advanced work PipeZK.
27

Ge, Xiaohui, Lu Shen, Chaoming Zheng, Peng Li, and Xiaobo Dou. "A Decoupling Rolling Multi-Period Power and Voltage Optimization Strategy in Active Distribution Networks". Energies 13, no. 21 (November 5, 2020): 5789. http://dx.doi.org/10.3390/en13215789.

Abstract:
With the increasing penetration of distributed photovoltaics (PVs) in active distribution networks (ADNs), the risk of voltage violations caused by PV uncertainties is significantly exacerbated. Since the conventional voltage regulation strategy is limited by its discrete devices and delay, ADN operators allow PVs to participate in voltage optimization by controlling their power outputs and cooperating with traditional regulation devices. This paper proposes a decoupling rolling multi-period reactive power and voltage optimization strategy considering the strong time coupling between different devices. The mixed-integer voltage optimization model is first decomposed into a long-period master problem for on-load tap changer (OLTC) and multiple short-period subproblems for PV power by Benders decomposition algorithm. Then, based on the high-precision PV and load forecasts, the model predictive control (MPC) method is utilized to modify the independent subproblems into a series of subproblems that roll with the time window, achieving a smooth transition from the current state to the ideal state. The estimated voltage variation in the prediction horizon of MPC is calculated by a simplified discrete equation for OLTC tap and a linearized sensitivity matrix between power and voltage for fast computation. The feasibility of the proposed optimization strategy is demonstrated by performing simulations on a distribution test system.
28

Kong, Yi-Fan, Shi-Zhu Li, Kai-Wen Wang, Bin Zhu, Yu-Xin Yuan, Meng-Kai Li, and Ji-Yuan Zhou. "An Efficient Bayesian Method for Estimating the Degree of the Skewness of X Chromosome Inactivation Based on the Mixture of General Pedigrees and Unrelated Females". Biomolecules 13, no. 3 (March 16, 2023): 543. http://dx.doi.org/10.3390/biom13030543.

Abstract:
Skewed X chromosome inactivation (XCI-S) has been reported to be associated with some X-linked diseases. Several methods have been proposed to estimate the degree of XCI-S (denoted as γ) for quantitative and qualitative traits based on unrelated females. However, there is no method available for estimating γ based on general pedigrees. Therefore, in this paper, we propose a Bayesian method to obtain the point estimate and the credible interval of γ based on the mixture of general pedigrees and unrelated females (called mixed data for brevity), which is also suitable for only general pedigrees. We consider the truncated normal prior and the uniform prior for γ. Further, we apply the eigenvalue decomposition and Cholesky decomposition to our proposed methods to accelerate the computation speed. We conduct extensive simulation studies to compare the performances of our proposed methods and two existing Bayesian methods which are only applicable to unrelated females. The simulation results show that the incorporation of general pedigrees can improve the efficiency of the point estimation and the precision and the accuracy of the interval estimation of γ. Finally, we apply the proposed methods to the Minnesota Center for Twin and Family Research data for their practical use.
29

Zha, Chengyuan, Lei Li, Fangting Zhu, and Yanzhe Zhao. "The Classification of VOCs Based on Sensor Images Using a Lightweight Neural Network for Lung Cancer Diagnosis". Sensors 24, no. 9 (April 28, 2024): 2818. http://dx.doi.org/10.3390/s24092818.

Abstract:
The application of artificial intelligence to point-of-care testing (POCT) disease detection has become a hot research field, in which breath detection, which detects the patient’s exhaled VOCs, combined with sensor arrays of convolutional neural network (CNN) algorithms as a new lung cancer detection is attracting more researchers’ attention. However, the low accuracy, high-complexity computation and large number of parameters make the CNN algorithms difficult to transplant to the embedded system of POCT devices. A lightweight neural network (LTNet) in this work is proposed to deal with this problem, and meanwhile, achieve high-precision classification of acetone and ethanol gases, which are respiratory markers for lung cancer patients. Compared to currently popular lightweight CNN models, such as EfficientNet, LTNet has fewer parameters (32 K) and its training weight size is only 0.155 MB. LTNet achieved an overall classification accuracy of 99.06% and 99.14% in the own mixed gas dataset and the University of California (UCI) dataset, which are both higher than the scores of the six existing models, and it also offers the shortest training (844.38 s and 584.67 s) and inference times (23 s and 14 s) in the same validation sets. Compared to the existing CNN models, LTNet is more suitable for resource-limited POCT devices.
30

Xie, Wei, Wendi Zhu, Xiaozhong Tong, and Huiying Ma. "A Legendre Spectral-Element Method to Incorporate Topography for 2.5D Direct-Current-Resistivity Forward Modeling". Mathematics 12, no. 12 (June 14, 2024): 1864. http://dx.doi.org/10.3390/math12121864.

Abstract:
An effective and accurate solver for the direct-current-resistivity forward-modeling problem has become a cutting-edge research topic. However, computational limitations arise due to the substantial amount of data involved, hindering the widespread use of three-dimensional forward modeling, which is otherwise considered the most effective approach for identifying geo-electrical anomalies. An efficient compromise, or potentially an alternative, is found in two-and-a-half-dimensional (2.5D) modeling, which employs a three-dimensional current source within a two-dimensional subsurface medium. Consequently, a Legendre spectral-element algorithm is developed specifically for 2.5D direct-current-resistivity forward modeling, taking into account the presence of topography. This numerical algorithm can combine the complex geometric flexibility of the finite-element method with the high precision of the spectral method. To solve the wavenumber-domain electrical potential variational problem, which is converted into the two-dimensional Helmholtz equation with mixed boundary conditions, the Gauss–Lobatto–Legendre (GLL) quadrature is employed in all discrete quadrilateral spectral elements, ensuring identical Legendre polynomial interpolation and quadrature points. The Legendre spectral-element method is applied to solve a two-dimensional Helmholtz equation and a resistivity half-space model. Numerical experiments demonstrate that the proposed approach yields highly accurate numerical results, even with a coarse mesh. Additionally, the Legendre spectral-element algorithm is employed to simulate the apparent resistivity distortions caused by surface topographical variations in the direct-current resistivity Wenner-alpha array. These numerical results affirm the substantial impact of topographical variations on the apparent resistivity data obtained in the field. Consequently, when interpreting field data, it is crucial to consider topographic effects to the extent they can be simulated. Moreover, our numerical method can be extended and implemented for a more accurate computation of three-dimensional direct-current-resistivity forward modeling.
31

Yahyaoui, Zahra, Mansour Hajji, Majdi Mansouri, Kamaleldin Abodayeh, Kais Bouzrara, and Hazem Nounou. "Effective Fault Detection and Diagnosis for Power Converters in Wind Turbine Systems Using KPCA-Based BiLSTM". Energies 15, no. 17 (August 23, 2022): 6127. http://dx.doi.org/10.3390/en15176127.

Abstract:
The current work presents an effective fault detection and diagnosis (FDD) technique in wind energy converter (WEC) systems. The proposed FDD framework merges the benefits of kernel principal component analysis (KPCA) model and the bidirectional long short-term memory (BiLSTM) classifier. In the developed FDD approach, the KPCA model is applied to extract and select the most effective features, while the BiLSTM is utilized for classification purposes. The developed KPCA-based BiLSTM approach involves two main steps: feature extraction and selection, and fault classification. The KPCA model is developed in order to select and extract the most efficient features and the final features are fed to the BiLSTM to distinguish between different working modes. Different simulation scenarios are considered in this study in order to show the robustness and performance of the developed technique when compared to the conventional FDD methods. To evaluate the effectiveness of the proposed KPCA-based BiLSTM approach, we utilize data obtained from a healthy WTC, which are then injected with several fault scenarios: simple fault generator-side, simple fault grid-side, multiple fault generator-side, multiple fault grid-side, and mixed fault on both sides. The diagnosis performance is analyzed in terms of accuracy, recall, precision, and computation time. Furthermore, the efficiency of fault diagnosis is shown by the classification accuracy parameter. The experimental results show the efficiency of the developed KPCA-based BiLSTM technique compared to the classical FDD techniques (an accuracy of 97.30%).
32

Baboulin, Marc, Alfredo Buttari, Jack Dongarra, Jakub Kurzak, Julie Langou, Julien Langou, Piotr Luszczek, and Stanimire Tomov. "Accelerating scientific computations with mixed precision algorithms". Computer Physics Communications 180, no. 12 (December 2009): 2526–33. http://dx.doi.org/10.1016/j.cpc.2008.11.005.

33

Hoo, Choon Lih, Sallehuddin Mohamed Haris, and Nik Abdullah Nik Mohamed. "A floating point conversion algorithm for mixed precision computations". Journal of Zhejiang University SCIENCE C 13, no. 9 (September 2012): 711–18. http://dx.doi.org/10.1631/jzus.c1200043.

34

Kelley, C. T. "Newton's Method in Mixed Precision". SIAM Review 64, no. 1 (February 2022): 191–211. http://dx.doi.org/10.1137/20m1342902.

35

Li, Yi, Woyu Zhang, Xiaoxin Xu, Yifan He, Danian Dong, Nanjia Jiang, Fei Wang, et al. "Mixed‐Precision Continual Learning Based on Computational Resistance Random Access Memory". Advanced Intelligent Systems 4, no. 8 (August 2022): 2270036. http://dx.doi.org/10.1002/aisy.202270036.

36

Chen, Siyuan, Yi Zhang, Yiming Wang, Zhuang Liu, Xiaohan Li, and Wei Xue. "Mixed-precision computing in the GRIST dynamical core for weather and climate modelling". Geoscientific Model Development 17, no. 16 (August 27, 2024): 6301–18. http://dx.doi.org/10.5194/gmd-17-6301-2024.

Abstract:
Abstract. Atmosphere modelling applications are becoming increasingly memory-bound due to the inconsistent development rates between processor speeds and memory bandwidth. In this study, we mitigate memory bottlenecks and reduce the computational load of the Global–Regional Integrated Forecast System (GRIST) dynamical core by adopting a mixed-precision computing strategy. Guided by an application of the iterative development principle, we identify the coded equation terms that are precision insensitive and modify them from double to single precision. The results show that most precision-sensitive terms are predominantly linked to pressure gradient and gravity terms, while most precision-insensitive terms are advective terms. Without using more computing resources, computational time can be saved, and the physical performance of the model is largely kept. In the standard computational test, the reference runtime of the model's dry hydrostatic core, dry nonhydrostatic core, and the tracer transport module is reduced by 24 %, 27 %, and 44 %, respectively. A series of idealized tests, real-world weather and climate modelling tests, was performed to assess the optimized model performance qualitatively and quantitatively. In particular, in the long-term coarse-resolution climate simulation, the precision-induced sensitivity can manifest at the large scale, while in the kilometre-scale weather forecast simulation, the model's sensitivity to the precision level is mainly limited to small-scale features, and the wall-clock time is reduced by 25.5 % from the double- to mixed-precision full-model simulations.
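The sensitivity screening described above can be illustrated with a simple criterion: evaluate a candidate term in single precision, compare it against a double-precision reference over representative inputs, and treat the term as a candidate for demotion to single precision if the worst-case relative deviation stays below a tolerance. This is a generic sketch of such a test (the function, sample terms, and tolerance are invented for illustration), not GRIST code.

```python
import numpy as np

def precision_insensitive(term, inputs, rtol=1e-4):
    """Evaluate `term` in float64 and float32 over representative inputs and
    report whether the worst-case relative deviation stays below rtol, i.e.
    whether the term looks safe to demote to single precision."""
    worst = 0.0
    for x in inputs:
        ref = term(np.asarray(x, dtype=np.float64))
        low = term(np.asarray(x, dtype=np.float32)).astype(np.float64)
        denom = np.maximum(np.abs(ref), np.finfo(np.float64).tiny)
        worst = max(worst, float(np.max(np.abs(ref - low) / denom)))
    return worst <= rtol, worst

# A benign advection-like term vs. a cancellation-prone term:
advective = lambda u: u * (np.roll(u, -1) - np.roll(u, 1))
cancelling = lambda u: (u + 1e7) - 1e7
samples = [np.linspace(0.0, 1.0, 101)]
print(precision_insensitive(advective, samples))    # insensitive: float32 is fine
print(precision_insensitive(cancelling, samples))   # sensitive: keep in float64
```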
37

Yue, Xiaoqiang, Zhiyong Wang, and Shu-Lin Wu. "Convergence Analysis of a Mixed Precision Parareal Algorithm". SIAM Journal on Scientific Computing 45, no. 5 (22 September 2023): A2483–A2510. http://dx.doi.org/10.1137/22m1510169.

38

Xu, Yihao, Zhuo Zhang, Longyong Chen, Zhenhua Li, and Ling Yang. "The Adaptive Streaming SAR Back-Projection Algorithm Based on Half-Precision in GPU". Electronics 11, no. 18 (6 September 2022): 2807. http://dx.doi.org/10.3390/electronics11182807.

Abstract:
The back-projection (BP) algorithm is completely accurate in the imaging principle, but the computational complexity is extremely high. The single-precision arithmetic used in the traditional graphics processing unit (GPU) acceleration scheme has low throughput and its usage of the video memory is large. An adaptive asynchronous streaming scheme for the BP algorithm based on half-precision is proposed in this study, and then it is extended to the fast back-projection (FBP) algorithm. In this scheme, the adaptive loss factors selection strategy ensures the dynamic range of data, the asynchronous streaming structure ensures the efficiency of large scene imaging, and the mixed-precision data processing ensures the imaging quality. The schemes proposed in this paper are compared with BP, FBP, and fast factorized back-projection (FFBP) algorithms of single-precision in GPU. The experimental results show that the two half-precision acceleration schemes in this paper reduce the video memory usage to 74% and 59% of the single-precision schemes with guaranteed image quality. The efficiency improvements of the proposed schemes are almost one and 0.5 times greater than that of the corresponding single-precision scheme, and the advantage can be more obvious when dealing with large computations.
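The core trick, scaling the data so it fits half precision's narrow dynamic range and undoing the scale after accumulation, can be sketched as follows. This is a schematic NumPy illustration, not the GPU implementation: the scale heuristic, the toy amplitudes, and the accumulation scheme are assumptions.

import numpy as np

rng = np.random.default_rng(0)
samples = np.abs(rng.standard_normal(100_000)) * 1e-6   # toy echo magnitudes, far below FP16's range

scale = 1.0 / np.max(np.abs(samples))                   # adaptive scale factor (assumed heuristic)
samples16 = (samples * scale).astype(np.float16)        # store and work in half precision

acc = np.sum(samples16, dtype=np.float32)               # accumulate in single precision
result = acc / scale                                    # undo the scaling

print(abs(result - samples.sum()) / samples.sum())      # relative error stays small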
39

Carson, Erin, and Noaman Khan. "Mixed Precision Iterative Refinement with Sparse Approximate Inverse Preconditioning". SIAM Journal on Scientific Computing 45, no. 3 (9 June 2023): C131–C153. http://dx.doi.org/10.1137/22m1487709.

40

Zhang, Hao, Dongdong Chen, and Seok-Bum Ko. "Efficient Multiple-Precision Floating-Point Fused Multiply-Add with Mixed-Precision Support". IEEE Transactions on Computers 68, no. 7 (1 July 2019): 1035–48. http://dx.doi.org/10.1109/tc.2019.2895031.

41

Wei, Hang, Zulin Wang, and Yuanhan Ni. "Hierarchical Mixed-Precision Post-Training Quantization for SAR Ship Detection Networks". Remote Sensing 16, no. 21 (30 October 2024): 4042. http://dx.doi.org/10.3390/rs16214042.

Abstract:
Convolutional neural network (CNN)-based synthetic aperture radar (SAR) ship detection models operating directly on satellites can reduce transmission latency and improve real-time surveillance capabilities. However, limited satellite platform resources present a significant challenge. Post-training quantization (PTQ) provides an efficient method for pre-training neural networks to effectively reduce memory and computational resources without retraining. Despite this, PTQ faces the challenge of maintaining model accuracy, especially at low-bit quantization (e.g., 4-bit or 2-bit). To address this challenge, we propose a hierarchical mixed-precision post-training quantization (HMPTQ) method for SAR ship detection neural networks to reduce quantization error. This method encompasses a layerwise precision configuration based on reconstruction error and an intra-layer mixed-precision quantization strategy. Specifically, our approach initially utilizes the activation reconstruction error of each layer to gauge the sensitivity necessary for bit allocation, considering the interdependencies among layers, which effectively reduces the complexity of computational sensitivity and achieves more precise quantization allocation. Subsequently, to minimize the quantization error of the layers, an intra-layer mixed-precision quantization strategy based on probability density assigns a greater number of quantization bits to regions where the probability density is low for higher values. Our evaluation on the SSDD, HRSID, and LS-SSDD-v1.0 SAR Ship datasets, using different detection CNN models, shows that the YOLOV9c model with mixed-precision quantization at 4-bit and 2-bit for weights and activations achieves only a 0.28% accuracy loss on the SSDD dataset, while reducing the model size by approximately 80%. Compared to state-of-the-art methods, our approach maintains competitive accuracy, confirming the superior performance of the HMPTQ method over existing quantization techniques.
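A much-simplified sketch of the layerwise step, allocating more bits to layers whose activations reconstruct poorly under a probe quantization, is shown below. The tensors, the 4-bit probe, and the two-level bit budget are invented for illustration, and the intra-layer density-based step is omitted.

import numpy as np

def quantize(x, bits):
    # symmetric uniform quantization to 2**bits levels
    qmax = 2 ** (bits - 1) - 1
    step = np.max(np.abs(x)) / qmax
    return np.round(x / step) * step

rng = np.random.default_rng(1)
layer_acts = [rng.standard_normal(10_000),      # well-behaved layer
              rng.laplace(size=10_000),         # heavy-tailed layer
              rng.uniform(-1, 1, size=10_000)]  # flat layer

# Sensitivity = relative activation reconstruction error at a 4-bit probe precision.
sens = [np.mean((a - quantize(a, 4)) ** 2) / np.mean(a ** 2) for a in layer_acts]

# Simple budget: the most sensitive layer keeps 4 bits, the others drop to 2 bits.
order = np.argsort(sens)[::-1]
bits_per_layer = {int(i): (4 if rank == 0 else 2) for rank, i in enumerate(order)}
print(sens, bits_per_layer)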
42

Yang, L. Minah, Alyson Fox, and Geoffrey Sanders. "Rounding Error Analysis of Mixed Precision Block Householder QR Algorithms". SIAM Journal on Scientific Computing 43, no. 3 (January 2021): A1723–A1753. http://dx.doi.org/10.1137/19m1296367.

43

Bodner, Benjamin Jacob, Gil Ben-Shalom, and Eran Treister. "GradFreeBits: Gradient-Free Bit Allocation for Mixed-Precision Neural Networks". Sensors 22, no. 24 (13 December 2022): 9772. http://dx.doi.org/10.3390/s22249772.

Abstract:
Quantized neural networks (QNNs) are among the main approaches for deploying deep neural networks on low-resource edge devices. Training QNNs using different levels of precision throughout the network (mixed-precision quantization) typically achieves superior trade-offs between performance and computational load. However, optimizing the different precision levels of QNNs can be complicated, as the values of the bit allocations are discrete and difficult to differentiate for. Moreover, adequately accounting for the dependencies between the bit allocation of different layers is not straightforward. To meet these challenges, in this work, we propose GradFreeBits: a novel joint optimization scheme for training mixed-precision QNNs, which alternates between gradient-based optimization for the weights and gradient-free optimization for the bit allocation. Our method achieves a better or on par performance with the current state-of-the-art low-precision classification networks on CIFAR10/100 and ImageNet, semantic segmentation networks on Cityscapes, and several graph neural networks benchmarks. Furthermore, our approach can be extended to a variety of other applications involving neural networks used in conjunction with parameters that are difficult to optimize for.
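The alternation itself can be sketched in a few lines. This is a toy least-squares stand-in, not the paper's method: the model, the learning rate, the bit-cost penalty, and the random local search used for the bit width are all assumptions.

import numpy as np

rng = np.random.default_rng(0)
X, w_true = rng.standard_normal((256, 8)), rng.standard_normal(8)
y = X @ w_true

def quantize(w, bits):
    qmax = 2 ** (bits - 1) - 1
    step = np.max(np.abs(w)) / qmax
    return np.round(w / step) * step

def loss(w, bits):
    return np.mean((X @ quantize(w, bits) - y) ** 2) + 0.01 * bits  # accuracy vs. bit cost

w, bits = rng.standard_normal(8), 8
for it in range(200):
    # gradient-based step on the real-valued weights
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= 0.05 * grad
    # gradient-free step on the discrete bit width every few iterations
    if it % 20 == 0:
        cand = int(np.clip(bits + rng.integers(-1, 2), 2, 8))
        if loss(w, cand) < loss(w, bits):
            bits = cand
print(bits, loss(w, bits))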
44

Tintó Prims, Oriol, Mario C. Acosta, Andrew M. Moore, Miguel Castrillo, Kim Serradell, Ana Cortés, and Francisco J. Doblas-Reyes. "How to use mixed precision in ocean models: exploring a potential reduction of numerical precision in NEMO 4.0 and ROMS 3.6". Geoscientific Model Development 12, no. 7 (24 July 2019): 3135–48. http://dx.doi.org/10.5194/gmd-12-3135-2019.

Abstract. Mixed-precision approaches can provide substantial speed-ups for both computing- and memory-bound codes with little effort. Most scientific codes have overengineered the numerical precision, leading to a situation in which models are using more resources than required without knowing where they are required and where they are not. Consequently, it is possible to improve computational performance by establishing a more appropriate choice of precision. The only input that is needed is a method to determine which real variables can be represented with fewer bits without affecting the accuracy of the results. This paper presents a novel method that enables modern and legacy codes to benefit from a reduction of the precision of certain variables without sacrificing accuracy. It consists of a simple idea: we reduce the precision of a group of variables and measure how it affects the outputs. Then we can evaluate the level of precision that they truly need. Modifying and recompiling the code for each case that has to be evaluated would require a prohibitive amount of effort. Instead, the method presented in this paper relies on the use of a tool called a reduced-precision emulator (RPE) that can significantly streamline the process. Using the RPE and a list of parameters containing the precisions that will be used for each real variable in the code, it is possible within a single binary to emulate the effect on the outputs of a specific choice of precision. When we are able to emulate the effects of reduced precision, we can proceed with the design of the tests that will give us knowledge of the sensitivity of the model variables regarding their numerical precision. The number of possible combinations is prohibitively large and therefore impossible to explore. The alternative of performing a screening of the variables individually can provide certain insight about the required precision of variables, but, on the other hand, other complex interactions that involve several variables may remain hidden. Instead, we use a divide-and-conquer algorithm that identifies the parts that require high precision and establishes a set of variables that can handle reduced precision. This method has been tested using two state-of-the-art ocean models, the Nucleus for European Modelling of the Ocean (NEMO) and the Regional Ocean Modeling System (ROMS), with very promising results. Obtaining this information is crucial to build an actual mixed-precision version of the code in the next phase that will bring the promised performance benefits.
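The divide-and-conquer search can be summarized by the following simplified sketch. Here run_with_reduced_precision is a placeholder for "run the model with the RPE emulating reduced precision for this group of variables and compare the outputs against a tolerance"; it is not a real API, the variable names are invented, and the real algorithm validates combinations more carefully than this toy version does.

def find_reducible(variables, run_with_reduced_precision):
    """Return the subset of variables that appears safe to keep in reduced precision."""
    if not variables:
        return []
    if run_with_reduced_precision(variables):        # the whole group passes at once
        return list(variables)
    if len(variables) == 1:                          # a single offender: keep it in high precision
        return []
    mid = len(variables) // 2                        # otherwise split and recurse
    return (find_reducible(variables[:mid], run_with_reduced_precision) +
            find_reducible(variables[mid:], run_with_reduced_precision))

# Example with a fake oracle: 'p' and 'rho' are assumed to need full precision.
sensitive = {"p", "rho"}
oracle = lambda group: sensitive.isdisjoint(group)
print(find_reducible(["u", "v", "t", "p", "q", "rho"], oracle))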
45

Sun, Junqing, G. D. Peterson, and O. O. Storaasli. "High-Performance Mixed-Precision Linear Solver for FPGAs". IEEE Transactions on Computers 57, no. 12 (December 2008): 1614–23. http://dx.doi.org/10.1109/tc.2008.89.

46

Alvermann, Andreas, Achim Basermann, Hans-Joachim Bungartz, Christian Carbogno, Dominik Ernst, Holger Fehske, Yasunori Futamura, et al. "Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects". Japan Journal of Industrial and Applied Mathematics 36, no. 2 (27 April 2019): 699–717. http://dx.doi.org/10.1007/s13160-019-00360-8.

47

Ben Hamdin, Haniyah A. M. Saed, and Faoziya S. M. Musbah. "Hybrid Triple Quadrature Rule Blending Some Gauss-Type Rules with the Classical or the Derivative-Based Newton-Cotes-Type Rules". Al-Mukhtar Journal of Basic Sciences 21, no. 2 (5 May 2024): 63–72. http://dx.doi.org/10.54172/et373z10.

Abstract:
Hybrid numerical quadrature rules are widespread techniques for approximate computations of definite integrals. Such hybrid rules combine as many quadrature rules as long as they possess the same degree of precision. The revenue is a new mixed rule with a higher degree of precision than its constituted rules at least by two. Moreover, such mixed rules are quite simple and handy, because they do not involve any extra evaluations of the integrand. That is by relying on the same number of quadrature points of the constituted rules, the acquired hybrid rule performs more efficiently than its ingredients rules. In this paper; a triple hybrid quadrature rule has been constructed for the numerical integration of real definite integrals that do not possess a closed-form anti-derivative. At First, a dual hybrid rule was produced by blending Milne’s rule of Newton-Cotes type with the anti-Gaussian quadrature rule to prevail a dual rule of a degree of precision equal to five. Then the acquired dual rule is recombined with the composite derivative-based and mid-Point Newton–Cotes formula producing a hybrid triple rule of degree of precision equal to seven. The accomplished approach is satisfactory and efficient in the approximate evaluation of definite real integrals as confirmed analytically by the error analysis and numerically by some verification examples. To promote the degree of precision of the proposed triple approach, the numerical computations have been implemented in an adaptive environment.
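The mechanism behind such gains in degree of precision can be illustrated with a generic Richardson-style argument. This is an illustration of the principle only, not the paper's specific derivation: the symbols Q_1, Q_2, c_1, c_2 are introduced here, and the error terms are treated as proportional to the same derivative value.

\[
  I(f) = Q_1(f) + c_1\, f^{(n+1)}(\xi_1) + O\!\left(f^{(n+2)}\right), \qquad
  I(f) = Q_2(f) + c_2\, f^{(n+1)}(\xi_2) + O\!\left(f^{(n+2)}\right),
\]
\[
  Q_{\mathrm{mix}}(f) = \frac{c_2\, Q_1(f) - c_1\, Q_2(f)}{c_2 - c_1}
  \quad\Longrightarrow\quad
  I(f) - Q_{\mathrm{mix}}(f) = O\!\left(f^{(n+2)}\right),
\]

so the blended rule gains at least one degree of precision (and often two, by symmetry) while reusing the function evaluations already needed for Q_1 and Q_2, which is why such hybrid rules add no extra integrand evaluations.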
48

Khaz’ali, Ali Reza, Mohammad Reza Rasaei, and Jamshid Moghadasi. "Mixed precision parallel preconditioner and linear solver for compositional reservoir simulation". Computational Geosciences 18, no. 5 (15 May 2014): 729–46. http://dx.doi.org/10.1007/s10596-014-9421-3.

49

Buttari, Alfredo, Jack Dongarra, Jakub Kurzak, Piotr Luszczek, and Stanimire Tomov. "Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy". ACM Transactions on Mathematical Software 34, no. 4 (15 July 2008): 1–22. http://dx.doi.org/10.1145/1377596.1377597.

50

Petschow, M., E. S. Quintana-Ortí, and P. Bientinesi. "Improved Accuracy and Parallelism for MRRR-Based Eigensolvers–A Mixed Precision Approach". SIAM Journal on Scientific Computing 36, no. 2 (January 2014): C240–C263. http://dx.doi.org/10.1137/130911561.
