Academic literature on the topic 'Low Power Accelerators'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Low Power Accelerators.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of each academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Low Power Accelerators"

1. Zhao, Yulin, Donghui Wang, and Leiou Wang. "Convolution Accelerator Designs Using Fast Algorithms." Algorithms 12, no. 5 (May 27, 2019): 112. http://dx.doi.org/10.3390/a12050112.

Abstract:
Convolutional neural networks (CNNs) have achieved great success in image processing. However, the heavy computational burden they impose makes them difficult to use in embedded applications with limited power and performance budgets. Although there are many fast convolution algorithms that can reduce the computational complexity, they increase the difficulty of practical implementation. To overcome these difficulties, this paper proposes several convolution accelerator designs using fast algorithms. The designs are based on the field programmable gate array (FPGA) and display a better balance between the digital signal processor (DSP) and the logic resource, while also requiring lower power consumption. The implementation results show that the power consumption of the accelerator design based on the Strassen–Winograd algorithm is 21.3% less than that of conventional accelerators.
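
To make the flavour of these fast algorithms concrete, here is a minimal sketch of the 1-D Winograd transform F(2,3), which produces two outputs of a 3-tap convolution with four multiplications instead of six. It illustrates the general technique only, not the paper's FPGA datapath (the Strassen–Winograd design is 2-D and tiled).

```python
import numpy as np

def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 3-tap FIR from a 4-sample tile,
    using 4 multiplications where the direct method needs 6."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d = np.array([1.0, 2.0, 3.0, 4.0])   # input tile
g = np.array([0.5, 1.0, -1.0])       # filter taps
direct = np.array([d[0:3] @ g, d[1:4] @ g])   # reference convolution
assert np.allclose(winograd_f23(d, g), direct)
```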

2. Galdeckiy, Anatoliy. "On Prospects of Output Power Increasing in Low-Voltage Multibeam Klystrons for Electron Accelerators." Infocommunications and Radio Technologies 5, no. 1 (March 25, 2022): 93–100. http://dx.doi.org/10.29039/2587-9936.2022.05.1.07.

Abstract:
Physical principles of output power limitation in low-voltage multibeam klystrons are considered. It is demonstrated that a metamaterial consisting of an array of metal inductive inserts, located in the cavity's interaction region, makes possible a significant increase in the phase velocity of the transversal wave in the gap. This reveals the opportunity to enhance the uniformity of the RF field interacting with the beams in the gap, the interaction region diameter, the beam current, and the power of the klystron without increasing the cathode voltage. Resonators in S and Ka bands are analyzed.

3. Zimek, Zbigniew. "Economical evaluation of radiation processing with high-intensity X-rays." Nukleonika 65, no. 3 (September 1, 2020): 167–72. http://dx.doi.org/10.2478/nuka-2020-0027.

Abstract:
X-rays application for radiation processing was introduced to the industrial practice, and in some circumstances is found to be more economically competitive, and offer more flexibility than gamma sources. Recent progress in high-power accelerators development gives opportunity to construct and apply reliable high-power electron beam to X-rays converters for the industrial application. The efficiency of the conversion process depends mainly on electron energy and atomic number of the target material, as it was determined in theoretical predictions and confirmed experimentally. However, the lower price of low-energy direct accelerators and their higher electrical efficiency may also have certain influence on process economy. There are number of auxiliary parameters that can effectively change the economical results of the process. The most important ones are as follows: average beam power level, spare part cost, and optimal shape of electron beam and electron beam utilization efficiency. All these parameters and related expenses may affect the unit cost of radiation facility operation and have a significant influence on X-ray process economy. The optimization of X-rays converter construction is also important, but it does not depend on the type of accelerator. The article discusses the economy of radiation processing with high-intensity of X-rays stream emitted by conversion of electron beams accelerated in direct accelerator (electron energy 2.5 MeV) and resonant accelerators (electron energy 5 MeV and 7.5 MeV). The evaluation and comparison of the costs of alternative technical solutions were included to estimate the unit cost of X-rays facility operation for average beam power 100 kW.

4. Barbareschi, Mario, Salvatore Barone, and Nicola Mazzocca. "Advancing synthesis of decision tree-based multiple classifier systems: an approximate computing case study." Knowledge and Information Systems 63, no. 6 (April 12, 2021): 1577–96. http://dx.doi.org/10.1007/s10115-021-01565-5.

Abstract:
So far, multiple classifier systems have been increasingly designed to take advantage of hardware features, such as high parallelism and computational power. Indeed, compared to software implementations, hardware accelerators guarantee higher throughput and lower latency. Although the combination of multiple classifiers leads to high classification accuracy, the required area overhead makes the design of a hardware accelerator unfeasible, hindering the adoption of commercial configurable devices. For this reason, in this paper, we exploit approximate computing design paradigm to trade hardware area overhead off for classification accuracy. In particular, starting from trained DT models and employing precision-scaling technique, we explore approximate decision tree variants by means of multiple objective optimization problem, demonstrating a significant performance improvement targeting field-programmable gate array devices.
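
As background, precision scaling simply narrows the bit width used for features and split thresholds. A minimal sketch of the idea, assuming uniform fixed-point quantization and a single decision stump; both choices are illustrative, not the authors' toolchain.

```python
import numpy as np

def quantize(x, bits, lo=-1.0, hi=1.0):
    """Uniform fixed-point quantization: snap x in [lo, hi] to one of
    2**bits levels, as a datapath of that width would represent it."""
    levels = 2 ** bits - 1
    q = np.round((np.clip(x, lo, hi) - lo) / (hi - lo) * levels)
    return lo + q * (hi - lo) / levels

# A decision stump: predict 1 if feature > threshold.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 10_000)
threshold = 0.1234
exact = x > threshold
for bits in (8, 4, 2):
    approx = quantize(x, bits) > quantize(threshold, bits)
    print(bits, "bits -> agreement with exact stump:", (exact == approx).mean())
```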

5. Wolfenden, Joseph, Alexandra S. Alexandrova, Frank Jackson, Storm Mathisen, Geoffrey Morris, Thomas H. Pacey, Narender Kumar, Monika Yadav, Angus Jones, and Carsten P. Welsch. "Cherenkov Radiation in Optical Fibres as a Versatile Machine Protection System in Particle Accelerators." Sensors 23, no. 4 (February 16, 2023): 2248. http://dx.doi.org/10.3390/s23042248.

Abstract:
Machine protection systems in high power particle accelerators are crucial. They can detect, prevent, and respond to events which would otherwise cause damage and significant downtime to accelerator infrastructure. Current systems are often resource heavy and operationally expensive, reacting after an event has begun to cause damage; this leads to facilities only covering certain operational modes and setting lower limits on machine performance. Presented here is a new type of machine protection system based upon optical fibres, which would be complementary to existing systems, elevating existing performance. These fibres are laid along an accelerator beam line in lengths of ∼100 m, providing continuous coverage over this distance. When relativistic particles pass through these fibres, they generate Cherenkov radiation in the optical spectrum. This radiation propagates in both directions along the fibre and can be detected at both ends. A calibration based technique allows the location of the Cherenkov radiation source to be pinpointed to within 0.5 m with a resolution of 1 m. This measurement mechanism, from a single device, has multiple applications within an accelerator facility. These include beam loss location monitoring, RF breakdown prediction, and quench prevention. Detailed here are the application processes and results from measurements, which provide proof of concept for this device for both beam loss monitoring and RF breakdown detection. Furthermore, highlighted are the current challenges for future innovation.
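
The localization step reduces to a time-difference calculation: light from a loss event at position x reaches the two fibre ends with a delay difference proportional to 2x − L. A sketch under assumed values (a silica group index of about 1.47 is my assumption); the paper's calibration-based technique refines this basic model.

```python
C_VACUUM = 299_792_458.0   # m/s
N_GROUP = 1.47             # assumed group index of a silica fibre
V = C_VACUUM / N_GROUP     # propagation speed of light in the fibre

def locate(dt, length):
    """Position (m) of the Cherenkov source from one fibre end, given the
    arrival-time difference dt = t_near - t_far (s) over a fibre of the
    given length. Derivation: t_near = t0 + x/V, t_far = t0 + (length - x)/V,
    hence dt = (2x - length)/V and x = (length + V*dt)/2."""
    return (length + V * dt) / 2

print(locate(0.0, 100.0))      # simultaneous arrival -> midpoint, 50.0 m
print(locate(-490e-9, 100.0))  # dt ~ -length/V -> event near the 0 m end
```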

6. Langerman, David, and Alan George. "Real-time, High-resolution Depth Upsampling on Embedded Accelerators." ACM Transactions on Embedded Computing Systems 20, no. 3 (April 2021): 1–22. http://dx.doi.org/10.1145/3436878.

Abstract:
High-resolution, low-latency apps in computer vision are ubiquitous in today’s world of mixed-reality devices. These innovations provide a platform that can leverage the improving technology of depth sensors and embedded accelerators to enable higher-resolution, lower-latency processing for 3D scenes using depth-upsampling algorithms. This research demonstrates that filter-based upsampling algorithms are feasible for mixed-reality apps using low-power hardware accelerators. The authors parallelized and evaluated a depth-upsampling algorithm on two different devices: a reconfigurable-logic FPGA embedded within a low-power SoC; and a fixed-logic embedded graphics processing unit. We demonstrate that both accelerators can meet the real-time requirements of 11 ms latency for mixed-reality apps.

7. Freda, Robert, Bradford Knight, and Siddharth Pannir. "A Theory for Power Extraction from Passive Accelerators and Confined Flows." Energies 13, no. 18 (September 16, 2020): 4854. http://dx.doi.org/10.3390/en13184854.

Abstract:
No accepted fluid theory exists for power extraction from unpressurized confined flow. The absence of a valid model to determine baseline uniform power extraction in confined flows creates difficulties in characterizing the coefficient of power. Currently, the primary body of research has been limited to Diffuser Augmented Wind Turbines (DAWTs) and passive fluid accelerators. Fluid power is proportional to the cube of velocity; therefore, passive acceleration is a promising path to effective renewable energy. Hypothetical models and experiments for passive accelerators yield low ideal power limits and poor performance, respectively. We show that these results derive from the misapplication of Betz’s Law and lack of a general theory for confined flow extraction. Experimental performance is due to the low efficiency of DAWTs and prior hypotheses exhibit high predictive error and continuity violations. A fluid model that accurately predicts available data and new experimental data, showing disk specific maximum CP for the confined channel at 38% of power available to disk, is presented. This is significantly lower than the 59% Betz freestream limit yielded by hypothetical models when the area ratio equals one. Experiments and their results are presented with non-DAWT accelerators, where new experimental results exceed CP limits predicted previously and correlate with the proposed predictive model.
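
For orientation, the coefficient of power C_P normalizes extracted power by the kinetic power ½ρAv³ passing through the disk area. A worked example with assumed flow values, contrasting the Betz freestream limit with the 38% confined-channel figure above:

```python
RHO = 1.225  # kg/m^3, air at sea level (assumed)

def fluid_power(area, v):
    """Kinetic power of the flow through a disk: P = 0.5 * rho * A * v**3."""
    return 0.5 * RHO * area * v ** 3

def extracted(area, v, cp):
    """Power extracted by a disk with coefficient of power C_P."""
    return cp * fluid_power(area, v)

area, v = 3.0, 8.0                   # m^2 and m/s, assumed for illustration
print(fluid_power(area, v))          # available kinetic power: ~941 W
print(extracted(area, v, 16 / 27))   # Betz freestream limit, ~59.3%
print(extracted(area, v, 0.38))      # the paper's confined-channel disk limit
```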

8. Tian, Shuo, Lei Wang, Shi Xu, Shasha Guo, Zhijie Yang, Jianfeng Zhang, and Weixia Xu. "A Systolic Accelerator for Neuromorphic Visual Recognition." Electronics 9, no. 10 (October 15, 2020): 1690. http://dx.doi.org/10.3390/electronics9101690.

Abstract:
Advances in neuroscience have encouraged researchers to focus on developing computational models that behave like the human brain. HMAX is one of the potential biologically inspired models that mimic the primate visual cortex’s functions and structures. HMAX has shown its effectiveness and versatility in multi-class object recognition with a simple computational structure. It is still a challenge to implement the HMAX model in embedded systems due to the heaviest computational S2 phase of HMAX. Previous implementations such as CoRe16 have used a reconfigurable two-dimensional processing element (PE) array to speed up the S2 layer for HMAX. However, the adder tree mechanism in CoRe16 used to produce output pixels by accumulating partial sums in different PEs increases the runtime for HMAX. To speed up the execution process of the S2 layer in HMAX, in this paper, we propose SAFA (systolic accelerator for HMAX), a systolic-array based architecture to compute and accelerate the S2 stage of HMAX. Using the output stationary (OS) dataflow, each PE in SAFA not only calculates the output pixel independently without additional accumulation of partial sums in multiple PEs, but also reduces the multiplexers applied in reconfigurable accelerators. Besides, data forwarding for the same input or weight data in OS reduces the memory bandwidth requirements. The simulation results show that the runtime of the heaviest computational S2 stage in HMAX model is decreased by 5.7%, and the bandwidth required for memory is reduced by 3.53 × on average by different kernel sizes (except for kernel = 12) compared with CoRe16. SAFA also obtains lower power and area costs than other reconfigurable accelerators from synthesis on ASIC.
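
The output-stationary dataflow described above keeps every partial sum pinned inside its own PE while inputs and weights stream past, so no adder tree is needed to combine partial sums from different PEs. A behavioural sketch of that dataflow in plain Python, with no hardware timing:

```python
import numpy as np

def os_systolic_matmul(a, b):
    """Behavioural model of an output-stationary systolic array:
    PE (i, j) owns accumulator acc[i, j]; at beat `step`, element
    a[i, step] and b[step, j] stream past and the PE does one MAC.
    No partial sums travel between PEs, unlike an adder-tree design."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2
    acc = np.zeros((m, n))          # one accumulator per PE
    for step in range(k):           # k beats of the systolic schedule
        for i in range(m):
            for j in range(n):
                acc[i, j] += a[i, step] * b[step, j]
    return acc

a = np.arange(6.0).reshape(2, 3)
b = np.arange(12.0).reshape(3, 4)
assert np.allclose(os_systolic_matmul(a, b), a @ b)
```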

9. Haugen, K. L., K. Papastergiou, P. Asimakopoulos, and D. Peftitsis. "High precision scalable power converter for accelerator magnets." Journal of Instrumentation 17, no. 03 (March 1, 2022): C03021. http://dx.doi.org/10.1088/1748-0221/17/03/c03021.

Abstract:
The lower conduction power losses and the positive temperature coefficient that favours parallel connection make Silicon Carbide (SiC) Metal Oxide Semiconductor Field-Effect Transistors (MOSFETs) an excellent replacement for the existing Silicon Insulated Gate Bipolar Transistor (IGBT) technology. These characteristics, combined with high-switching-frequency operation, enable the design of high-accuracy DC-DC converters with minimised filtering requirements. This paper investigates the design of a converter with high-accuracy current (0.9 ppm) supplying a 0.05 H electromagnetic load, aiming to achieve this accuracy without the use of active filters, by using SiC MOSFETs and a scalable module-based converter design.

10. Shin, Isu, Jongsang Son, Soonjae Ahn, Jeseong Ryu, Sunwoo Park, Jongman Kim, Baekdong Cha, Eunkyoung Choi, and Youngho Kim. "A Novel Short-Time Fourier Transform-Based Fall Detection Algorithm Using 3-Axis Accelerations." Mathematical Problems in Engineering 2015 (2015): 1–7. http://dx.doi.org/10.1155/2015/394340.

Abstract:
The short-time Fourier transform- (STFT-) based algorithm was suggested to distinguish falls from various activities of daily living (ADLs). Forty male subjects volunteered in the experiments including three types of falls and four types of ADLs. An inertia sensor unit attached to the middle of two anterior superior iliac spines was used to measure the 3-axis accelerations at 100 Hz. The measured accelerations were transformed to signal vector magnitude values to be analyzed using STFT. The powers of low frequency components were extracted, and the fall detection was defined as whether the normalized power was less than the threshold (50% of the normal power). Most power was observed at the frequency band lower than 5 Hz in all activities, but the dramatic changes in the power were found only in falls. The specificity of 1–3 Hz frequency components was the best (100%), but the sensitivity was much smaller compared with 4 Hz component. The 4 Hz component showed the best fall detection with 96.9% sensitivity and 97.1% specificity. We believe that the suggested algorithm based on STFT would be useful in the fall detection and the classification from ADLs as well.
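
The described pipeline maps onto a few standard signal-processing calls. A sketch using synthetic data: the 100 Hz rate, the ~4 Hz component, and the 50% threshold follow the abstract, while the signal itself and the median baseline are made up for illustration.

```python
import numpy as np
from scipy.signal import stft

FS = 100  # Hz, the sampling rate used in the study

# Synthetic stand-in for 3-axis accelerations (the real data is not public).
rng = np.random.default_rng(1)
acc = 1.0 + rng.normal(0.0, 0.2, (3, 30 * FS))
acc[:, 1500:1700] *= 0.1                    # made-up quiet interval after an impact

svm = np.sqrt((acc ** 2).sum(axis=0))       # signal vector magnitude
f, t, z = stft(svm, fs=FS, nperseg=128)     # short-time Fourier transform

band = (f >= 3.5) & (f <= 4.5)              # the 4 Hz component
power = (np.abs(z[band]) ** 2).sum(axis=0)
normal = np.median(power)                   # stand-in for the "normal power"
suspected_falls = t[power < 0.5 * normal]   # flag drops below 50% of normal
print(suspected_falls)
```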

Dissertations / Theses on the topic "Low Power Accelerators"

1. Roozmeh, Mehdi. "High Performance Computing via High Level Synthesis." Doctoral thesis, Politecnico di Torino, 2018. http://hdl.handle.net/11583/2710706.

Abstract:
As more and more powerful integrated circuits are appearing on the market, more and more applications, with very different requirements and workloads, are making use of the available computing power. This thesis is in particular devoted to High Performance Computing applications, where those trends are carried to the extreme. In this domain, the primary aspects to be taken into consideration are (1) performance (by definition) and (2) energy consumption (since operational costs dominate over procurement costs). These requirements can be satisfied more easily by deploying heterogeneous platforms, which include CPUs, GPUs and FPGAs to provide a broad range of performance and energy-per-operation choices. In particular, as we will see, FPGAs clearly dominate both CPUs and GPUs in terms of energy, and can provide comparable performance. An important aspect of this trend is of course design technology, because these applications were traditionally programmed in high-level languages, while FPGAs required low-level RTL design. The OpenCL (Open Computing Language) developed by the Khronos group enables developers to program CPU, GPU and recently FPGAs using functionally portable (but sadly not performance portable) source code which creates new possibilities and challenges both for research and industry. FPGAs have been always used for mid-size designs and ASIC prototyping thanks to their energy efficient and flexible hardware architecture, but their usage requires hardware design knowledge and laborious design cycles. Several approaches are developed and deployed to address this issue and shorten the gap between software and hardware in FPGA design flow, in order to enable FPGAs to capture a larger portion of the hardware acceleration market in data centers. Moreover, FPGAs usage in data centers is growing already, regardless of and in addition to their use as computational accelerators, because they can be used as high performance, low power and secure switches inside data-centers. High-Level Synthesis (HLS) is the methodology that enables designers to map their applications on FPGAs (and ASICs). It synthesizes parallel hardware from a model originally written C-based programming languages .e.g. C/C++, SystemC and OpenCL. Design space exploration of the variety of implementations that can be obtained from this C model is possible through wide range of optimization techniques and directives, e.g. to pipeline loops and partition memories into multiple banks, which guide RTL generation toward application dependent hardware and benefit designers from flexible parallel architecture of FPGAs. Model Based Design (MBD) is a high-level and visual process used to generate implementations that solve mathematical problems through a varied set of IP-blocks. MBD enables developers with different expertise, e.g. control theory, embedded software development, and hardware design to share a common design framework and contribute to a shared design using the same tool. Simulink, developed by MATLAB, is a model based design tool for simulation and development of complex dynamical systems. Moreover, Simulink embedded code generators can produce verified C/C++ and HDL code from the graphical model. This code can be used to program micro-controllers and FPGAs. This PhD thesis work presents a study using automatic code generator of Simulink to target Xilinx FPGAs using both HDL and C/C++ code to demonstrate capabilities and challenges of high-level synthesis process. 
To do so, firstly, digital signal processing unit of a real-time radar application is developed using Simulink blocks. Secondly, generated C based model was used for high level synthesis process and finally the implementation cost of HLS is compared to traditional HDL synthesis using Xilinx tool chain. Alternative to model based design approach, this work also presents an analysis on FPGA programming via high-level synthesis techniques for computationally intensive algorithms and demonstrates the importance of HLS by comparing performance-per-watt of GPUs(NVIDIA) and FPGAs(Xilinx) manufactured in the same node running standard OpenCL benchmarks. We conclude that generation of high quality RTL from OpenCL model requires stronger hardware background with respect to the MBD approach, however, the availability of a fast and broad design space exploration ability and portability of the OpenCL code, e.g. to CPUs and GPUs, motivates FPGA industry leaders to provide users with OpenCL software development environment which promises FPGA programming in CPU/GPU-like fashion. Our experiments, through extensive design space exploration(DSE), suggest that FPGAs have higher performance-per-watt with respect to two high-end GPUs manufactured in the same technology(28 nm). Moreover, FPGAs with more available resources and using a more modern process (20 nm) can outperform the tested GPUs while consuming much less power at the cost of more expensive devices.

2. Riera Villanueva, Marc. "Low-power accelerators for cognitive computing." Doctoral thesis, Universitat Politècnica de Catalunya, 2020. http://hdl.handle.net/10803/669828.

Abstract:
Deep Neural Networks (DNNs) have achieved tremendous success for cognitive applications, and are especially efficient in classification and decision making problems such as speech recognition or machine translation. Mobile and embedded devices increasingly rely on DNNs to understand the world. Smartphones, smartwatches and cars perform discriminative tasks, such as face or object recognition, on a daily basis. Despite the increasing popularity of DNNs, running them on mobile and embedded systems comes with several main challenges: delivering high accuracy and performance with a small memory and energy budget. Modern DNN models consist of billions of parameters requiring huge computational and memory resources and, hence, they cannot be directly deployed on low-power systems with limited resources. The objective of this thesis is to address these issues and propose novel solutions in order to design highly efficient custom accelerators for DNN-based cognitive computing systems. In first place, we focus on optimizing the inference of DNNs for sequence processing applications. We perform an analysis of the input similarity between consecutive DNN executions. Then, based on the high degree of input similarity, we propose DISC, a hardware accelerator implementing a Differential Input Similarity Computation technique to reuse the computations of the previous execution, instead of computing the entire DNN. We observe that, on average, more than 60% of the inputs of any neural network layer tested exhibit negligible changes with respect to the previous execution. Avoiding the memory accesses and computations for these inputs results in 63% energy savings on average. In second place, we propose to further optimize the inference of FC-based DNNs. We first analyze the number of unique weights per input neuron of several DNNs. Exploiting common optimizations, such as linear quantization, we observe a very small number of unique weights per input for several FC layers of modern DNNs. Then, to improve the energy-efficiency of FC computation, we present CREW, a hardware accelerator that implements a Computation Reuse and an Efficient Weight Storage mechanism to exploit the large number of repeated weights in FC layers. CREW greatly reduces the number of multiplications and provides significant savings in model memory footprint and memory bandwidth usage. We evaluate CREW on a diverse set of modern DNNs. On average, CREW provides 2.61x speedup and 2.42x energy savings over a TPU-like accelerator. In third place, we propose a mechanism to optimize the inference of RNNs. RNN cells perform element-wise multiplications across the activations of different gates, sigmoid and tanh being the common activation functions. We perform an analysis of the activation function values, and show that a significant fraction are saturated towards zero or one in popular RNNs. Then, we propose CGPA to dynamically prune activations from RNNs at a coarse granularity. CGPA avoids the evaluation of entire neurons whenever the outputs of peer neurons are saturated. CGPA significantly reduces the amount of computations and memory accesses while avoiding sparsity by a large extent, and can be easily implemented on top of conventional accelerators such as TPU with negligible area overhead, resulting in 12% speedup and 12% energy savings on average for a set of widely used RNNs. Finally, in the last contribution of this thesis we focus on static DNN pruning methodologies. 
DNN pruning reduces memory footprint and computational work by removing connections and/or neurons that are ineffectual. However, we show that prior pruning schemes require an extremely time-consuming iterative process that requires retraining the DNN many times to tune the pruning parameters. Then, we propose a DNN pruning scheme based on Principal Component Analysis and relative importance of each neuron's connection that automatically finds the optimized DNN in one shot.
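
The computation-reuse idea behind DISC fits in a few lines: for a linear layer y = Wx, only the columns of W whose inputs changed since the previous execution need to be touched. A numpy sketch of the principle, not the accelerator's datapath; skipping sub-threshold changes introduces a small, bounded error, which is the intended trade-off.

```python
import numpy as np

class DifferentialLinear:
    """y = W @ x, updated incrementally: inputs whose change from the previous
    execution is below `eps` contribute nothing new, so their memory accesses
    and multiplications are skipped."""

    def __init__(self, w, eps=1e-3):
        self.w, self.eps = w, eps
        self.x_prev = np.zeros(w.shape[1])
        self.y_prev = np.zeros(w.shape[0])

    def forward(self, x):
        delta = x - self.x_prev
        changed = np.abs(delta) > self.eps           # mask of "new" inputs
        y = self.y_prev + self.w[:, changed] @ delta[changed]
        self.x_prev, self.y_prev = x.copy(), y
        return y, changed.mean()                     # output + fraction recomputed

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 256))
layer = DifferentialLinear(w)
x = rng.normal(size=256)
y1, _ = layer.forward(x)
x2 = x.copy()
x2[:16] += 1.0                                       # only 16 inputs change
y2, frac = layer.forward(x2)
assert np.allclose(y2, w @ x2, atol=1e-2)            # error bounded by eps
print(f"recomputed only {frac:.1%} of the inputs")
```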

3. Yang, Yunfeng. "Low Power UDP/IP Accelerator for IM3910 Processor." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-92241.

Abstract:
Due to their attractive flexibility and high productivity, general purpose processors (GPPs) are found to be spreading over a large domain of applications. The growing complexity of modern applications results in high performance demands and, as a response, several solutions have emerged to fulfill these demands. One of these solutions is to couple the GPP with a hardware accelerator to off-load critical functionalities. In this thesis, a UDP/IP hardware accelerator is built and coupled, through a DMA interface, with an existing GPP, namely the IM3910 from Imsys Technology AB in Stockholm, Sweden. The main goal of this thesis is to investigate the semantics of coupling the accelerator with the IM3910 and to characterize its area, performance and power consumption. Building this UDP/IP accelerator started from an initial version taken from "OpenCore"; it was then completed and optimized to suit the project needs. After verifying the accelerator at RTL, it was prototyped using an Altera FPGA and connected to the IM3910 through the DMA. In the last step, the accelerator was synthesized to a gate-level netlist using a 90 nm technology library and characterized in terms of area, performance and power consumption.
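
For context on what such an accelerator offloads per datagram, the UDP checksum (RFC 768) is a 16-bit one's-complement sum over an IPv4 pseudo-header plus the UDP segment. A pure-Python reference model; this is a sketch of the standard computation, not the thesis's RTL.

```python
import struct

def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's-complement sum with end-around carry (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def udp_checksum(src_ip: bytes, dst_ip: bytes, udp_segment: bytes) -> int:
    """UDP checksum over the IPv4 pseudo-header plus the UDP segment."""
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, len(udp_segment))
    chk = 0xFFFF - ones_complement_sum16(pseudo + udp_segment)
    return chk or 0xFFFF                     # a zero result is sent as 0xFFFF

# Example: build a header with the checksum field zeroed, then fill it in.
src, dst = bytes([192, 168, 0, 1]), bytes([192, 168, 0, 2])
header = struct.pack("!HHHH", 1234, 5678, 8 + 4, 0)  # ports, length, csum=0
segment = header + b"ping"
print(hex(udp_checksum(src, dst, segment)))
```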

4. Yazdani Aminabadi, Reza. "Ultra low-power, high-performance accelerator for speech recognition." Doctoral thesis, Universitat Politècnica de Catalunya, 2019. http://hdl.handle.net/10803/667429.

Abstract:
Automatic Speech Recognition (ASR) is undoubtedly one of the most important and interesting applications in the cutting-edge era of Deep-learning deployment, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost, requiring huge memory storage and computational power, which is not affordable for the tiny power budget of mobile devices. Hardware acceleration can reduce power consumption of ASR systems as well as reducing its memory pressure, while delivering high-performance. In this thesis, we present a customized accelerator for large-vocabulary, speaker-independent, continuous speech recognition. A state-of-the-art ASR system consists of two major components: acoustic-scoring using DNN and speech-graph decoding using Viterbi search. As the first step, we focus on the Viterbi search algorithm, that represents the main bottleneck in the ASR system. The accelerator includes some innovative techniques to improve the memory subsystem, which is the main bottleneck for performance and power, such as a prefetching scheme and a novel bandwidth saving technique tailored to the needs of ASR. Furthermore, as the speech graph is vast taking more than 1-Gigabyte memory space, we propose to change its representation by partitioning it into several sub-graphs and perform an on-the-fly composition during the Viterbi run-time. This approach together with some simple yet efficient compression techniques result in 31x memory footprint reduction, providing 155x real-time speedup and orders of magnitude power and energy saving compared to CPUs and GPUs. In the next step, we propose a novel hardware-based ASR system that effectively integrates a DNN accelerator for the pruned/quantized models with the Viterbi accelerator. We show that, when either pruning or quantizing the DNN model used for acoustic scoring, ASR accuracy is maintained but the execution time of the ASR system is increased by 33%. Although pruning and quantization improves the efficiency of the DNN, they result in a huge increase of activity in the Viterbi search since the output scores of the pruned model are less reliable. In order to avoid the aforementioned increase in Viterbi search workload, our system loosely selects the N-best hypotheses at every time step, exploring only the N most likely paths. Our final solution manages to efficiently combine both DNN and Viterbi accelerators using all their optimizations, delivering 222x real-time ASR with a small power budget of 1.26 Watt, small memory footprint of 41 MB, and a peak memory bandwidth of 381 MB/s, being amenable for low-power mobile platforms.
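
The N-best pruning described above is the classic beam idea: at each frame, keep only the N most likely hypotheses and expand only those. A toy dense-trellis sketch in log space; the real decoder searches a WFST speech graph, so the structure here is purely illustrative.

```python
import heapq
import numpy as np

def viterbi_beam(log_emit, log_trans, beam=4):
    """Beam-pruned Viterbi over a dense trellis.
    log_emit:  (T, S) per-frame acoustic log-scores (e.g. from a DNN)
    log_trans: (S, S) transition log-probabilities
    Only the `beam` best states survive each frame, bounding the search
    workload at the cost of possible search errors."""
    T, S = log_emit.shape
    frontier = {s: log_emit[0, s] for s in range(S)}
    frontier = dict(heapq.nlargest(beam, frontier.items(), key=lambda kv: kv[1]))
    for t in range(1, T):
        scores = {}
        for s_prev, sc in frontier.items():
            for s in range(S):
                cand = sc + log_trans[s_prev, s] + log_emit[t, s]
                if cand > scores.get(s, -np.inf):
                    scores[s] = cand
        frontier = dict(heapq.nlargest(beam, scores.items(), key=lambda kv: kv[1]))
    return max(frontier.values())            # best surviving path score

rng = np.random.default_rng(0)
emit = np.log(rng.dirichlet(np.ones(8), size=20))   # 20 frames, 8 states
trans = np.log(rng.dirichlet(np.ones(8), size=8))
print(viterbi_beam(emit, trans, beam=3))
```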

5. Prasad, Rohit. "Integrated Programmable-Array accelerator to design heterogeneous ultra-low power manycore architectures." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amsdottorato.unibo.it/9983/1/PhD_thesis__20_January_2022_.pdf.

Abstract:
There is an ever-increasing demand for energy efficiency (EE) in rapidly evolving Internet-of-Things end nodes. This pushes researchers and engineers to develop solutions that provide both Application-Specific Integrated Circuit-like EE and Field-Programmable Gate Array-like flexibility. One such solution is Coarse Grain Reconfigurable Array (CGRA). Over the past decades, CGRAs have evolved and are competing to become mainstream hardware accelerators, especially for accelerating Digital Signal Processing (DSP) applications. Due to the over-specialization of computing architectures, the focus is shifting towards fitting an extensive data representation range into fewer bits, e.g., a 32-bit space can represent a more extensive data range with floating-point (FP) representation than an integer representation. Computation using FP representation requires numerous encodings and leads to complex circuits for the FP operators, decreasing the EE of the entire system. This thesis presents the design of an EE ultra-low-power CGRA with native support for FP computation by leveraging an emerging paradigm of approximate computing called transprecision computing. We also present the contributions in the compilation toolchain and system-level integration of CGRA in a System-on-Chip, to envision the proposed CGRA as an EE hardware accelerator. Finally, an extensive set of experiments using real-world algorithms employed in near-sensor processing applications are performed, and results are compared with state-of-the-art (SoA) architectures. It is empirically shown that our proposed CGRA provides better results w.r.t. SoA architectures in terms of power, performance, and area.

6. Tabani, Hamid. "Low-power architectures for automatic speech recognition." Doctoral thesis, Universitat Politècnica de Catalunya, 2018. http://hdl.handle.net/10803/462249.

Abstract:
Automatic Speech Recognition (ASR) is one of the most important applications in the area of cognitive computing. Fast and accurate ASR is emerging as a key application for mobile and wearable devices. These devices, such as smartphones, have incorporated speech recognition as one of the main interfaces for user interaction. This trend towards voice-based user interfaces is likely to continue in the next years which is changing the way of human-machine interaction. Effective speech recognition systems require real-time recognition, which is challenging for mobile devices due to the compute-intensive nature of the problem and the power constraints of such systems and involves a huge effort for CPU architectures to reach it. GPU architectures offer parallelization capabilities which can be exploited to increase the performance of speech recognition systems. However, efficiently utilizing the GPU resources for speech recognition is also challenging, as the software implementations exhibit irregular and unpredictable memory accesses and poor temporal locality. The purpose of this thesis is to study the characteristics of ASR systems running on low-power mobile devices in order to propose different techniques to improve performance and energy consumption. We propose several software-level optimizations driven by the power/performance analysis. Unlike previous proposals that trade accuracy for performance by reducing the number of Gaussians evaluated, we maintain accuracy and improve performance by effectively using the underlying CPU microarchitecture. We use a refactored implementation of the GMM evaluation code to ameliorate the impact of branches. Then, we exploit the vector unit available on most modern CPUs to boost GMM computation, introducing a novel memory layout for storing the means and variances of the Gaussians in order to maximize the effectiveness of vectorization. In addition, we compute the Gaussians for multiple frames in parallel, significantly reducing memory bandwidth usage. Our experimental results show that the proposed optimizations provide 2.68x speedup over the baseline Pocketsphinx decoder on a high-end Intel Skylake CPU, while achieving 61% energy savings. On a modern ARM Cortex-A57 mobile processor our techniques improve performance by 1.85x, while providing 59% energy savings without any loss in the accuracy of the ASR system. Secondly, we propose a register renaming technique that exploits register reuse to reduce the pressure on the register file. Our technique leverages physical register sharing by introducing minor changes in the register map table and the issue queue. We evaluated our renaming technique on top of a modern out-of-order processor. The proposed scheme supports precise exceptions and we show that it results in 9.5% performance improvements for GMM evaluation. Our experimental results show that the proposed register renaming scheme provides 6% speedup on average for the SPEC2006 benchmarks. Alternatively, our renaming scheme achieves the same performance while reducing the number of physical registers by 10.5%. Finally, we propose a hardware accelerator for GMM evaluation that reduces the energy consumption by three orders of magnitude compared to solutions based on CPUs and GPUs. The proposed accelerator implements a lazy evaluation scheme where Gaussians are computed on demand, avoiding 50% of the computations. 
Furthermore, it employs a novel clustering scheme to reduce the size of the GMM parameters, which results in 8x memory bandwidth savings with a negligible impact on accuracy. Finally, it includes a novel memoization scheme that avoids 74.88% of floating-point operations. The end design provides a 164x speedup and 3532x energy reduction when compared with a highly-tuned implementation running on a modern mobile CPU. Compared to a state-of-the-art mobile GPU, the GMM accelerator achieves 5.89x speedup over a highly optimized CUDA implementation, while reducing energy by 241x.
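
The kernel targeted throughout is diagonal-covariance GMM scoring. A vectorized numpy sketch that batches several frames so the means and variances are streamed from memory once per batch, echoing the thesis's bandwidth argument; the data layout here is illustrative, not the thesis's memory layout.

```python
import numpy as np

def gmm_log_scores(frames, means, variances, log_weights):
    """Log-likelihood of each frame under a diagonal-covariance GMM.
    frames: (F, D); means/variances: (G, D); log_weights: (G,)
    Broadcasting evaluates all G Gaussians for all F frames at once,
    so the model parameters are read once per batch of frames."""
    const = -0.5 * np.log(2 * np.pi * variances).sum(axis=1)       # (G,)
    diff = frames[:, None, :] - means[None, :, :]                  # (F, G, D)
    mahal = (diff ** 2 / variances[None, :, :]).sum(axis=2)        # (F, G)
    log_comp = log_weights + const - 0.5 * mahal                   # (F, G)
    m = log_comp.max(axis=1, keepdims=True)                        # log-sum-exp
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()

rng = np.random.default_rng(0)
frames = rng.normal(size=(4, 39))            # 4 frames of 39-dim features
means = rng.normal(size=(16, 39))            # a 16-component GMM
variances = rng.uniform(0.5, 2.0, (16, 39))
log_w = np.log(np.full(16, 1 / 16))
print(gmm_log_scores(frames, means, variances, log_w))
```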

7. Gandolfi, Riccardo. "Design of a memory-to-memory tensor reshuffle unit for ultra-low-power deep learning accelerators." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/23706/.

Abstract:
In the context of IoT edge-processing, deep learning applications and near-sensor analytics, the constraints on having low area occupation and low power consumption in MCUs (Microcontroller Units) performing computationally intensive tasks are more stringent than ever. A promising direction is to develop HWPEs (Hardware Processing Engines) that support and help the end-node in the execution of these tasks. The following work concerns the design and testing of the Datamover, a small and easily configurable HWPE for tensor shuffling and data marshaling operation. The accelerator is to be integrated within the Darkside PULP chip and can perform reordering operations and transpositions on data with different sub-byte widths. The focus is on the design of the internal buffering and transposition mechanism and its performance when compared to a software on-platform execution. Also, synthesis results will be shown in terms of area occupation and timing.
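
To make the data-marshaling task concrete, the sketch below transposes a tile of packed 4-bit elements in software: unpack, reorder, repack. It is a functional reference only; the low-nibble-first ordering is my assumption, and the Datamover performs this with streaming hardware buffers rather than array temporaries.

```python
import numpy as np

def unpack_nibbles(packed):
    """Split each byte into two 4-bit elements, low nibble first (assumed)."""
    return np.stack([packed & 0x0F, packed >> 4], axis=-1).reshape(-1)

def pack_nibbles(nibbles):
    """Inverse of unpack_nibbles: pair up 4-bit values into bytes."""
    pairs = nibbles.reshape(-1, 2)
    return (pairs[:, 0] | (pairs[:, 1] << 4)).astype(np.uint8)

def transpose_packed_tile(packed, rows, cols):
    """Transpose a rows x cols tile of 4-bit elements stored packed in bytes."""
    tile = unpack_nibbles(packed).reshape(rows, cols)
    return pack_nibbles(tile.T.reshape(-1))

tile = np.arange(16, dtype=np.uint8)        # a 4x4 tile of 4-bit values 0..15
packed = pack_nibbles(tile)                 # 8 bytes
out = transpose_packed_tile(packed, 4, 4)
assert (unpack_nibbles(out).reshape(4, 4) == tile.reshape(4, 4).T).all()
```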

8. Bleakley, Steven Shea. "Time Frequency Analysis of Railway Wagon Body Accelerations for a Low-Power Autonomous Device." Central Queensland University, 2006. http://library-resources.cqu.edu.au./thesis/adt-QCQU/public/adt-QCQU20070622.121515.

Abstract:
This thesis examines the application of the techniques of Fourier spectrogram and wavelet analysis to a low power embedded microprocessor application in a novel railway and rollingstock monitoring system. The safe and cost effective operation of freight railways is limited by the dynamic performance of wagons running on track. A monitoring system has been proposed comprising of low cost wireless sensing devices, dubbed “Health Cards”, to be installed on every wagon in the fleet. When marshalled into a train, the devices would sense accelerations and communicate via radio network to a master system in the locomotive. The integrated system would provide online information for decision support systems. Data throughput was heavily restricted by the network architecture, so significant signal analysis was required at the device level. An electronics engineering team at Central Queensland University developed a prototype Health Card, incorporating a 27MHz microcontroller and four dual axis accelerometers. A sensing arrangement and online analysis algorithms were required to detect and categorise dynamic events while operating within the constraints of the system. Time-frequency analysis reveals the time varying frequency content of signals, making it suitable to detect and characterise transient events. With efficient algorithms such as the Fast Fourier Transform, and Fast Wavelet Transform, time-frequency analysis methods can be implemented on a low power, embedded microcontroller. This thesis examines the application of time-frequency analysis techniques to wagon body acceleration signals, for the purpose of detecting poor dynamic performance of the wagon-track system. The Fourier spectrogram is implemented on the Health Card prototype and demonstrated in the laboratory. The research and algorithms provide a foundation for ongoing development as resources become available for system testing and validation.

9. Xu, Hongjie. "Energy-Efficient On-Chip Cache Architectures and Deep Neural Network Accelerators Considering the Cost of Data Movement." Doctoral thesis, Kyoto University, 2021. http://hdl.handle.net/2433/263786.

Abstract:
No abstract is available in the metadata. The record carries only degree information, translated from Japanese: Doctor of Informatics, Kyoto University, new-system doctoral course (degree no. Ko-23325, Joho no. 761), Department of Communications and Computer Engineering, Graduate School of Informatics; examination committee: Prof. Hidetoshi Onodera (chair), Prof. Eiji Oki, and Prof. Takashi Sato; affiliated doctoral program: the Kyoto University excellence graduate program "Advanced Photonic and Electronic Devices".

10. Das, Satyajit. "Architecture and Programming Model Support for Reconfigurable Accelerators in Multi-Core Embedded Systems." Thesis, Lorient, 2018. http://www.theses.fr/2018LORIS490/document.

Abstract:
Emerging trends in embedded systems and applications need high throughput and low power consumption. Due to the increasing demand for low power computing and diminishing returns from technology scaling, industry and academia are turning with renewed interest toward energy efficient hardware accelerators. The main drawback of hardware accelerators is that they are not programmable. Therefore, their utilization can be low is they perform one specific function and increasing the number of the accelerators in a system on chip (SoC) causes scalability issues. Programmable accelerators provide flexibility and solve the scalability issues. Coarse-Grained Reconfigurable Array (CGRA) architecture consisting of several processing elements with word level granularity is a promising choice for programmable accelerator. Inspired by the promising characteristics of programmable accelerators, potentials of CGRAs in near threshold computing platforms are studied and an end-to-end CGRA research framework is developed in this thesis. The major contributions of this framework are: CGRA design, implementation, integration in a computing system, and compilation for CGRA. First, the design and implementation of a CGRA named Integrated Programmable Array (IPA) is presented. Next, the problem of mapping applications with control and data flow onto CGRA is formulated. From this formulation, several efficient algorithms are developed using internal resources of a CGRA, with a vision for low power acceleration. The algorithms are integrated into an automated compilation flow. Finally, the IPA accelerator is augmented in PULP - a Parallel Ultra-Low-Power Processing-Platform to explore heterogeneous computing

Book chapters on the topic "Low Power Accelerators"

1. Kalb, Tobias, Lester Kalms, Diana Göhringer, Carlota Pons, Ananya Muddukrishna, Magnus Jahre, Boitumelo Ruf, et al. "Developing Low-Power Image Processing Applications with the TULIPP Reference Platform Instance." In Hardware Accelerators in Data Centers, 181–97. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-92792-3_10.

2. Seeman, J., D. Schulte, J. P. Delahaye, M. Ross, S. Stapnes, A. Grudiev, A. Yamamoto, et al. "Design and Principles of Linear Accelerators and Colliders." In Particle Physics Reference Library, 295–336. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-34245-6_7.

Abstract:
Linear accelerators (linacs) use alternating radiofrequency (RF) electromagnetic fields to accelerate charged particles in a straight line. Linacs were invented about 95 years ago and have seen many significant technical innovations since. A wide range of particle beams have been accelerated with linacs including beams of electrons, positrons, protons, antiprotons, and heavy ions. Linac parameter possibilities include pulsed versus continuous wave, low and high beam powers, low and high repetition rates, low transverse emittance beams, short bunches with small energy spreads, and accelerated multiple bunches in a single pulse. The number of linacs around the world has grown tremendously with thousands of linacs in present use, many for medical therapy, in industry, and for research and development in a broad spectrum of scientific fields. Researchers have developed accelerators for scientific tools in their own right, being awarded several Nobel prizes. Moreover, linacs and particle accelerators in general have enabled many discovery level science experiments in related fields, resulting in many Nobel prizes as well.

3. Liao, Chunhua, Yonghong Yan, Bronis R. de Supinski, Daniel J. Quinlan, and Barbara Chapman. "Early Experiences with the OpenMP Accelerator Model." In OpenMP in the Era of Low Power Devices and Accelerators, 84–98. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-40698-0_7.

4. Locharla, Govinda Rao, Pogiri Revathi, and M. V. Nageswara Rao. "Compression Techniques for Low Power Hardware Accelerator Design: Case Studies." In Lecture Notes in Electrical Engineering, 117–27. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-5550-1_12.

5. Teruel, Xavier, Michael Klemm, Kelvin Li, Xavier Martorell, Stephen L. Olivier, and Christian Terboven. "A Proposal for Task-Generating Loops in OpenMP*." In OpenMP in the Era of Low Power Devices and Accelerators, 1–14. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-40698-0_1.

6. Ghosh, Priyanka, Yonghong Yan, Deepak Eachempati, and Barbara Chapman. "A Prototype Implementation of OpenMP Task Dependency Support." In OpenMP in the Era of Low Power Devices and Accelerators, 128–40. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-40698-0_10.

7. Durand, Marie, François Broquedis, Thierry Gautier, and Bruno Raffin. "An Efficient OpenMP Loop Scheduler for Irregular Applications on Large-Scale NUMA Machines." In OpenMP in the Era of Low Power Devices and Accelerators, 141–55. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-40698-0_11.

8. Muddukrishna, Ananya, Peter A. Jonsson, Vladimir Vlassov, and Mats Brorsson. "Locality-Aware Task Scheduling and Data Distribution on NUMA Systems." In OpenMP in the Era of Low Power Devices and Accelerators, 156–70. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-40698-0_12.

9. Eichenberger, Alexandre E., John Mellor-Crummey, Martin Schulz, Michael Wong, Nawal Copty, Robert Dietrich, Xu Liu, Eugene Loh, and Daniel Lorenz. "OMPT: An OpenMP Tools Application Programming Interface for Performance Analysis." In OpenMP in the Era of Low Power Devices and Accelerators, 171–85. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-40698-0_13.

10. Qawasmeh, Ahmad, Abid Malik, Barbara Chapman, Kevin Huck, and Allen Malony. "Open Source Task Profiling by Extending the OpenMP Runtime API." In OpenMP in the Era of Low Power Devices and Accelerators, 186–99. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-40698-0_14.

Conference papers on the topic "Low Power Accelerators"

1. Cong, Jingsheng Jason. "Accelerator-rich architectures." In ISLPED'14: International Symposium on Low Power Electronics and Design. New York, NY, USA: ACM, 2014. http://dx.doi.org/10.1145/2627369.2631636.

2. Venkatesan, Rangharajan, and Wei Wu. "Session details: Accelerators for Machine Learning." In ISLPED '16: International Symposium on Low Power Electronics and Design. New York, NY, USA: ACM, 2016. http://dx.doi.org/10.1145/3256013.

3. Moss, Arthur, Hyunjong Lee, Lei Xun, Chulhong Min, Fahim Kawsar, and Alessandro Montanari. "Ultra-Low Power DNN Accelerators for IoT." In SenSys '22: The 20th ACM Conference on Embedded Networked Sensor Systems. New York, NY, USA: ACM, 2022. http://dx.doi.org/10.1145/3560905.3568300.

4. Yan, Mingyu, Xing Hu, Shuangchen Li, Itir Akgun, Han Li, Xin Ma, Lei Deng, et al. "Balancing Memory Accesses for Energy-Efficient Graph Analytics Accelerators." In 2019 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED). IEEE, 2019. http://dx.doi.org/10.1109/islped.2019.8824832.

5. Limaye, Ankur, and Tosiron Adegbija. "DOSAGE: Generating Domain-Specific Accelerators for Resource-Constrained Computing." In 2021 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED). IEEE, 2021. http://dx.doi.org/10.1109/islped52811.2021.9502501.

6. Chen, Yu-Ting, and Jason Cong. "Interconnect synthesis of heterogeneous accelerators in a shared memory architecture." In 2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED). IEEE, 2015. http://dx.doi.org/10.1109/islped.2015.7273540.

7. Sedukhin, Stanislav, Yoichi Tomioka, and Kohei Yamamoto. "In Search of the Performance- and Energy-Efficient CNN Accelerators." In 2021 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS). IEEE, 2021. http://dx.doi.org/10.1109/coolchips52128.2021.9410350.

8. Sunny, Shine Parekkadan, and Satyajit Das. "Reinforcement Learning based Efficient Mapping of DNN Models onto Accelerators." In 2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS). IEEE, 2022. http://dx.doi.org/10.1109/coolchips54332.2022.9772673.

9. Cong, Jason, Mohammad Ali Ghodrat, Michael Gill, Beayna Grigorian, Hui Huang, and Glenn Reinman. "Composable accelerator-rich microprocessor enhanced for adaptivity and longevity." In 2013 IEEE International Symposium on Low Power Electronics and Design (ISLPED). IEEE, 2013. http://dx.doi.org/10.1109/islped.2013.6629314.

10. Holik, Michael, Carlos Granja, and Claude Leroy. "Ultra Low Power Datalogger." In NUCLEAR PHYSICS METHODS AND ACCELERATORS IN BIOLOGY AND MEDICINE: Fifth International Summer School on Nuclear Physics Methods and Accelerators in Biology and Medicine. AIP, 2010. http://dx.doi.org/10.1063/1.3295639.