Dissertations on the topic "Low Power Accelerators"

To view other types of publications on this topic, follow the link: Low Power Accelerators.

Format your source in APA, MLA, Chicago, Harvard, and other citation styles

Choose the source type:

Consult the top 16 dissertations for your research on the topic "Low Power Accelerators".

Next to every entry in the list you will find an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference to the selected work in your preferred citation style: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the publication as a .pdf file and read its abstract online, when such data is available in the metadata.

Browse dissertations from a wide range of disciplines and compile an accurate bibliography.

1

ROOZMEH, MEHDI. "High Performance Computing via High Level Synthesis." Doctoral thesis, Politecnico di Torino, 2018. http://hdl.handle.net/11583/2710706.

Full text of the source
Abstract:
As more and more powerful integrated circuits appear on the market, more and more applications, with very different requirements and workloads, are making use of the available computing power. This thesis is devoted in particular to High Performance Computing applications, where those trends are carried to the extreme. In this domain, the primary aspects to be taken into consideration are (1) performance (by definition) and (2) energy consumption (since operational costs dominate over procurement costs). These requirements can be satisfied more easily by deploying heterogeneous platforms, which include CPUs, GPUs and FPGAs to provide a broad range of performance and energy-per-operation choices. In particular, as we will see, FPGAs clearly dominate both CPUs and GPUs in terms of energy, and can provide comparable performance. An important aspect of this trend is of course design technology, because these applications were traditionally programmed in high-level languages, while FPGAs required low-level RTL design. OpenCL (Open Computing Language), developed by the Khronos Group, enables developers to program CPUs, GPUs and, recently, FPGAs using functionally portable (but sadly not performance portable) source code, which creates new possibilities and challenges both for research and industry. FPGAs have always been used for mid-size designs and ASIC prototyping thanks to their energy-efficient and flexible hardware architecture, but their usage requires hardware design knowledge and laborious design cycles. Several approaches have been developed and deployed to address this issue and narrow the gap between software and hardware in the FPGA design flow, in order to enable FPGAs to capture a larger portion of the hardware acceleration market in data centers.
Moreover, FPGA usage in data centers is already growing, regardless of and in addition to their use as computational accelerators, because they can be used as high-performance, low-power and secure switches inside data centers. High-Level Synthesis (HLS) is the methodology that enables designers to map their applications onto FPGAs (and ASICs). It synthesizes parallel hardware from a model originally written in C-based programming languages, e.g. C/C++, SystemC and OpenCL. Design space exploration of the variety of implementations that can be obtained from this C model is possible through a wide range of optimization techniques and directives, e.g. to pipeline loops and partition memories into multiple banks, which guide RTL generation toward application-dependent hardware and let designers benefit from the flexible parallel architecture of FPGAs. Model Based Design (MBD) is a high-level, visual process used to generate implementations that solve mathematical problems through a varied set of IP blocks. MBD enables developers with different expertise, e.g. control theory, embedded software development, and hardware design, to share a common design framework and contribute to a shared design using the same tool. Simulink, developed by MathWorks, is a model-based design tool for the simulation and development of complex dynamical systems. Moreover, Simulink's embedded code generators can produce verified C/C++ and HDL code from the graphical model. This code can be used to program microcontrollers and FPGAs. This PhD thesis presents a study using the automatic code generators of Simulink to target Xilinx FPGAs with both HDL and C/C++ code, to demonstrate the capabilities and challenges of the high-level synthesis process. To do so, first, the digital signal processing unit of a real-time radar application is developed using Simulink blocks.
Second, the generated C-based model was used in the high-level synthesis process, and finally the implementation cost of HLS is compared to traditional HDL synthesis using the Xilinx tool chain. As an alternative to the model-based design approach, this work also presents an analysis of FPGA programming via high-level synthesis techniques for computationally intensive algorithms, and demonstrates the importance of HLS by comparing the performance-per-watt of GPUs (NVIDIA) and FPGAs (Xilinx) manufactured in the same node running standard OpenCL benchmarks. We conclude that generation of high-quality RTL from an OpenCL model requires a stronger hardware background than the MBD approach; however, the availability of fast and broad design space exploration and the portability of the OpenCL code, e.g. to CPUs and GPUs, motivates FPGA industry leaders to provide users with an OpenCL software development environment that promises FPGA programming in a CPU/GPU-like fashion. Our experiments, through extensive design space exploration (DSE), suggest that FPGAs have higher performance-per-watt than two high-end GPUs manufactured in the same technology (28 nm). Moreover, FPGAs with more available resources and a more modern process (20 nm) can outperform the tested GPUs while consuming much less power, at the cost of more expensive devices.
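The loop-pipelining directive mentioned above has a simple first-order cost model that explains why HLS tools expose it. A minimal sketch (function names and the II = 1 figure are illustrative, not taken from the thesis):

```python
def loop_cycles(trip_count: int, depth: int, ii: int = 1) -> int:
    """First-order cycle count for a pipelined loop: the first iteration takes
    `depth` cycles; each later iteration starts `ii` cycles after the previous one."""
    if trip_count == 0:
        return 0
    return depth + (trip_count - 1) * ii

def sequential_cycles(trip_count: int, depth: int) -> int:
    """Same loop without pipelining: iterations do not overlap."""
    return trip_count * depth

# A 1024-iteration loop with a 5-cycle body: pipelining at II = 1 approaches
# one result per cycle instead of 5 cycles per result.
piped = loop_cycles(1024, depth=5, ii=1)
serial = sequential_cycles(1024, depth=5)
speedup = serial / piped
```

With II = 1 the 1024-iteration loop finishes in 1028 cycles rather than 5120, which is the kind of throughput gain a pipeline directive buys on an FPGA.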
Styles: APA, Harvard, Vancouver, ISO, etc.
2

Riera, Villanueva Marc. "Low-power accelerators for cognitive computing." Doctoral thesis, Universitat Politècnica de Catalunya, 2020. http://hdl.handle.net/10803/669828.

Full text of the source
Abstract:
Deep Neural Networks (DNNs) have achieved tremendous success in cognitive applications, and are especially efficient in classification and decision-making problems such as speech recognition or machine translation. Mobile and embedded devices increasingly rely on DNNs to understand the world. Smartphones, smartwatches and cars perform discriminative tasks, such as face or object recognition, on a daily basis. Despite the increasing popularity of DNNs, running them on mobile and embedded systems comes with several challenges: delivering high accuracy and performance with a small memory and energy budget. Modern DNN models consist of billions of parameters requiring huge computational and memory resources and, hence, they cannot be directly deployed on low-power systems with limited resources. The objective of this thesis is to address these issues and propose novel solutions in order to design highly efficient custom accelerators for DNN-based cognitive computing systems. First, we focus on optimizing the inference of DNNs for sequence processing applications. We perform an analysis of the input similarity between consecutive DNN executions. Then, based on the high degree of input similarity, we propose DISC, a hardware accelerator implementing a Differential Input Similarity Computation technique to reuse the computations of the previous execution, instead of computing the entire DNN. We observe that, on average, more than 60% of the inputs of any neural network layer tested exhibit negligible changes with respect to the previous execution. Avoiding the memory accesses and computations for these inputs results in 63% energy savings on average. Second, we propose to further optimize the inference of FC-based DNNs. We first analyze the number of unique weights per input neuron of several DNNs.
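The differential computation behind DISC can be sketched in a few lines: for a fully-connected layer, an input whose change since the previous execution is negligible contributes nothing new, so its multiplications and memory accesses can be skipped. A toy software model (function names and the threshold are illustrative; the accelerator does this in hardware):

```python
def fc_layer(weights, x):
    """Dense fully-connected layer: weights[i][j] connects input i to output j."""
    n_out = len(weights[0])
    return [sum(weights[i][j] * x[i] for i in range(len(x))) for j in range(n_out)]

def disc_update(weights, x_prev, y_prev, x_new, threshold=0.0):
    """Recompute only the contributions of inputs whose change exceeds `threshold`;
    inputs with negligible change reuse the previous execution's partial sums."""
    y = list(y_prev)
    skipped = 0
    for i, (a, b) in enumerate(zip(x_prev, x_new)):
        delta = b - a
        if abs(delta) <= threshold:
            skipped += 1          # negligible change: reuse previous computation
            continue
        for j in range(len(y)):
            y[j] += weights[i][j] * delta
    return y, skipped
```

When only one of three inputs changes, two of them are skipped and the incremental result matches a full recomputation exactly.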
Exploiting common optimizations, such as linear quantization, we observe a very small number of unique weights per input for several FC layers of modern DNNs. Then, to improve the energy efficiency of FC computation, we present CREW, a hardware accelerator that implements a Computation Reuse and Efficient Weight Storage mechanism to exploit the large number of repeated weights in FC layers. CREW greatly reduces the number of multiplications and provides significant savings in model memory footprint and memory bandwidth usage. We evaluate CREW on a diverse set of modern DNNs. On average, CREW provides 2.61x speedup and 2.42x energy savings over a TPU-like accelerator. Third, we propose a mechanism to optimize the inference of RNNs. RNN cells perform element-wise multiplications across the activations of different gates, sigmoid and tanh being the common activation functions. We perform an analysis of the activation function values, and show that a significant fraction are saturated towards zero or one in popular RNNs. Then, we propose CGPA to dynamically prune activations from RNNs at a coarse granularity. CGPA avoids the evaluation of entire neurons whenever the outputs of peer neurons are saturated. CGPA significantly reduces the amount of computations and memory accesses while avoiding sparsity to a large extent, and can be easily implemented on top of conventional accelerators such as the TPU with negligible area overhead, resulting in 12% speedup and 12% energy savings on average for a set of widely used RNNs. Finally, in the last contribution of this thesis we focus on static DNN pruning methodologies. DNN pruning reduces memory footprint and computational work by removing connections and/or neurons that are ineffectual. However, we show that prior pruning schemes require an extremely time-consuming iterative process that retrains the DNN many times to tune the pruning parameters.
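The CGPA idea, skipping a neuron whenever a peer gate's activation is saturated, can be illustrated on the element-wise product h = sigmoid(o) * tanh(c) found in LSTM-style cells. A software sketch with illustrative saturation thresholds:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def cgpa_elementwise(output_gate_pre, cell_state, lo=0.01):
    """Element-wise h = sigmoid(o_pre) * tanh(c), skipping the tanh and the
    multiply whenever the peer gate's sigmoid is saturated towards zero."""
    h, skipped = [], 0
    for o_pre, c in zip(output_gate_pre, cell_state):
        o = sigmoid(o_pre)
        if o <= lo:            # peer gate saturated near zero:
            h.append(0.0)      # the neuron's output is negligible, prune it
            skipped += 1
        else:
            h.append(o * math.tanh(c))
    return h, skipped
```

A strongly negative pre-activation saturates the gate near zero, so the corresponding neuron is never evaluated, which is where the computation and memory-access savings come from.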
Then, we propose a DNN pruning scheme based on Principal Component Analysis and the relative importance of each neuron's connections that automatically finds the optimized DNN in one shot.
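The one-shot character of the proposed pruning can be sketched as follows. Note that the scoring below uses a plain per-neuron magnitude criterion as a stand-in for the thesis's PCA-based relative-importance measure; the point is that the pruned network is produced in a single pass, with no retraining loop:

```python
def prune_one_shot(weights, keep_ratio):
    """Score every connection by a relative-importance proxy (|w| normalized per
    neuron) and zero out the least important ones in one pass."""
    scores = []
    for i, row in enumerate(weights):
        norm = sum(abs(w) for w in row) or 1.0
        for j, w in enumerate(row):
            scores.append((abs(w) / norm, i, j))
    scores.sort(reverse=True)
    keep = int(len(scores) * keep_ratio)
    pruned = [[0.0] * len(row) for row in weights]
    for _, i, j in scores[:keep]:
        pruned[i][j] = weights[i][j]
    return pruned
```

Iterative schemes would wrap a loop of prune-retrain-evaluate around this step; the thesis's contribution is precisely avoiding that loop.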
Styles: APA, Harvard, Vancouver, ISO, etc.
3

Yang, Yunfeng. "Low Power UDP/IP Accelerator for IM3910 Processor." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-92241.

Full text of the source
Abstract:
Due to their attractive flexibility and high productivity, general-purpose processors (GPPs) are spreading across a large domain of applications. The growing complexity of modern applications results in high performance demands, and several solutions have come forward to fulfill these demands. One of these solutions is to couple the GPP with a hardware accelerator that off-loads critical functionalities. In this thesis, a UDP/IP hardware accelerator is built and coupled, through a DMA interface, with an existing GPP, namely the IM3910 from Imsys Technology AB in Stockholm, Sweden. The main goal of this thesis is to investigate the semantics of coupling the accelerator with the IM3910 and to characterize its area, performance and power consumption. The UDP/IP accelerator started from an initial version taken from "OpenCore" and was then completed and optimized to suit the project's needs. After verifying the accelerator at RTL, it was prototyped on an Altera FPGA and connected to the IM3910 through the DMA. In the last step, the accelerator was synthesized to a gate-level netlist using a 90 nm technology library and characterized in terms of area, performance and power consumption.
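To give a flavor of the per-packet work such an accelerator off-loads, here is the RFC 1071 one's-complement checksum used by both the IP and UDP headers, as a pure-software sketch (the thesis implements this kind of arithmetic in RTL):

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit big-endian words,
    as used by the IPv4 and UDP headers."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return (~total) & 0xFFFF
```

A useful property for verification: running the checksum over a header that already contains its own checksum field yields zero.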
Styles: APA, Harvard, Vancouver, ISO, etc.
4

Yazdani, Aminabadi Reza. "Ultra low-power, high-performance accelerator for speech recognition." Doctoral thesis, Universitat Politècnica de Catalunya, 2019. http://hdl.handle.net/10803/667429.

Full text of the source
Abstract:
Automatic Speech Recognition (ASR) is undoubtedly one of the most important and interesting applications in the cutting-edge era of deep-learning deployment, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost, requiring huge memory storage and computational power, which is not affordable for the tiny power budget of mobile devices. Hardware acceleration can reduce the power consumption of ASR systems as well as their memory pressure, while delivering high performance. In this thesis, we present a customized accelerator for large-vocabulary, speaker-independent, continuous speech recognition. A state-of-the-art ASR system consists of two major components: acoustic scoring using a DNN and speech-graph decoding using Viterbi search. As the first step, we focus on the Viterbi search algorithm, which represents the main bottleneck in the ASR system. The accelerator includes some innovative techniques to improve the memory subsystem, which is the main bottleneck for performance and power, such as a prefetching scheme and a novel bandwidth-saving technique tailored to the needs of ASR. Furthermore, as the speech graph is vast, taking more than 1 GB of memory, we propose to change its representation by partitioning it into several sub-graphs and performing an on-the-fly composition during the Viterbi run-time. This approach, together with some simple yet efficient compression techniques, results in a 31x memory footprint reduction, providing 155x real-time speedup and orders of magnitude power and energy savings compared to CPUs and GPUs. In the next step, we propose a novel hardware-based ASR system that effectively integrates a DNN accelerator for pruned/quantized models with the Viterbi accelerator. We show that, when either pruning or quantizing the DNN model used for acoustic scoring, ASR accuracy is maintained but the execution time of the ASR system is increased by 33%.
Although pruning and quantization improve the efficiency of the DNN, they result in a huge increase of activity in the Viterbi search, since the output scores of the pruned model are less reliable. In order to avoid the aforementioned increase in Viterbi search workload, our system loosely selects the N best hypotheses at every time step, exploring only the N most likely paths. Our final solution manages to efficiently combine both the DNN and Viterbi accelerators with all their optimizations, delivering 222x real-time ASR with a small power budget of 1.26 W, a small memory footprint of 41 MB, and a peak memory bandwidth of 381 MB/s, making it amenable to low-power mobile platforms.
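The N-best restriction on the Viterbi search can be sketched directly: keep only the N highest-scoring hypotheses alive at each time step. A log-domain toy version over a tiny HMM (the state and transition tables below are illustrative, not from the thesis):

```python
import math

def viterbi_nbest(init, trans, emit, observations, n_best):
    """Viterbi search that keeps only the N most likely hypotheses per frame,
    mirroring the pruning used to bound the search workload (log-probabilities)."""
    beam = {s: init[s] + emit[s][observations[0]] for s in init}
    beam = dict(sorted(beam.items(), key=lambda kv: kv[1], reverse=True)[:n_best])
    for obs in observations[1:]:
        nxt = {}
        for s, score in beam.items():
            for s2, t in trans[s].items():
                cand = score + t + emit[s2][obs]
                if cand > nxt.get(s2, -math.inf):
                    nxt[s2] = cand
        # prune: survivors are the N best-scoring states only
        beam = dict(sorted(nxt.items(), key=lambda kv: kv[1], reverse=True)[:n_best])
    best = max(beam, key=beam.get)
    return best, beam[best]
```

Shrinking `n_best` trades a bounded amount of search accuracy for a hard cap on per-frame work, which is what keeps the hardware's workload predictable.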
Styles: APA, Harvard, Vancouver, ISO, etc.
5

Prasad, Rohit <1991>. "Integrated Programmable-Array accelerator to design heterogeneous ultra-low power manycore architectures." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amsdottorato.unibo.it/9983/1/PhD_thesis__20_January_2022_.pdf.

Full text of the source
Abstract:
There is an ever-increasing demand for energy efficiency (EE) in rapidly evolving Internet-of-Things end nodes. This pushes researchers and engineers to develop solutions that provide both Application-Specific Integrated Circuit-like EE and Field-Programmable Gate Array-like flexibility. One such solution is the Coarse Grain Reconfigurable Array (CGRA). Over the past decades, CGRAs have evolved and are competing to become mainstream hardware accelerators, especially for accelerating Digital Signal Processing (DSP) applications. Due to the over-specialization of computing architectures, the focus is shifting towards fitting an extensive data representation range into fewer bits; e.g., a 32-bit space can represent a more extensive data range with a floating-point (FP) representation than with an integer representation. Computation using an FP representation requires numerous encodings and leads to complex circuits for the FP operators, decreasing the EE of the entire system. This thesis presents the design of an energy-efficient, ultra-low-power CGRA with native support for FP computation by leveraging an emerging paradigm of approximate computing called transprecision computing. We also present contributions to the compilation toolchain and the system-level integration of the CGRA in a System-on-Chip, to position the proposed CGRA as an EE hardware accelerator. Finally, an extensive set of experiments using real-world algorithms employed in near-sensor processing applications is performed, and the results are compared with state-of-the-art (SoA) architectures. It is empirically shown that our proposed CGRA provides better results than SoA architectures in terms of power, performance, and area.
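The core idea of transprecision computing, running each operation at just enough floating-point precision, can be illustrated by truncating the mantissa of a standard double. This is only a toy model of the precision/energy trade-off (the actual CGRA uses dedicated reduced-precision FP operators):

```python
import math

def round_to_mantissa(x: float, bits: int) -> float:
    """Keep only `bits` explicit mantissa bits of x (round-to-nearest): a toy
    model of transprecision computing, where smaller FP formats mean cheaper
    operators at the cost of a bounded relative error of at most 2**-bits."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(round(m * scale) / scale, e)
```

For example, pi survives 7-bit truncation as 3.15625: still within 0.8% of the true value, while needing a far narrower datapath.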
Styles: APA, Harvard, Vancouver, ISO, etc.
6

Tabani, Hamid. "Low-power architectures for automatic speech recognition." Doctoral thesis, Universitat Politècnica de Catalunya, 2018. http://hdl.handle.net/10803/462249.

Full text of the source
Abstract:
Automatic Speech Recognition (ASR) is one of the most important applications in the area of cognitive computing. Fast and accurate ASR is emerging as a key application for mobile and wearable devices. These devices, such as smartphones, have incorporated speech recognition as one of the main interfaces for user interaction. This trend towards voice-based user interfaces is likely to continue in the coming years, changing the way humans and machines interact. Effective speech recognition systems require real-time recognition, which is challenging for mobile devices due to the compute-intensive nature of the problem and the power constraints of such systems, and demands a huge effort from CPU architectures. GPU architectures offer parallelization capabilities which can be exploited to increase the performance of speech recognition systems. However, efficiently utilizing the GPU resources for speech recognition is also challenging, as the software implementations exhibit irregular and unpredictable memory accesses and poor temporal locality. The purpose of this thesis is to study the characteristics of ASR systems running on low-power mobile devices in order to propose different techniques to improve performance and energy consumption. We propose several software-level optimizations driven by the power/performance analysis. Unlike previous proposals that trade accuracy for performance by reducing the number of Gaussians evaluated, we maintain accuracy and improve performance by effectively using the underlying CPU microarchitecture. We use a refactored implementation of the GMM evaluation code to ameliorate the impact of branches. Then, we exploit the vector unit available on most modern CPUs to boost GMM computation, introducing a novel memory layout for storing the means and variances of the Gaussians in order to maximize the effectiveness of vectorization.
In addition, we compute the Gaussians for multiple frames in parallel, significantly reducing memory bandwidth usage. Our experimental results show that the proposed optimizations provide 2.68x speedup over the baseline Pocketsphinx decoder on a high-end Intel Skylake CPU, while achieving 61% energy savings. On a modern ARM Cortex-A57 mobile processor our techniques improve performance by 1.85x, while providing 59% energy savings without any loss in the accuracy of the ASR system. Secondly, we propose a register renaming technique that exploits register reuse to reduce the pressure on the register file. Our technique leverages physical register sharing by introducing minor changes in the register map table and the issue queue. We evaluated our renaming technique on top of a modern out-of-order processor. The proposed scheme supports precise exceptions and we show that it results in 9.5% performance improvements for GMM evaluation. Our experimental results show that the proposed register renaming scheme provides 6% speedup on average for the SPEC2006 benchmarks. Alternatively, our renaming scheme achieves the same performance while reducing the number of physical registers by 10.5%. Finally, we propose a hardware accelerator for GMM evaluation that reduces the energy consumption by three orders of magnitude compared to solutions based on CPUs and GPUs. The proposed accelerator implements a lazy evaluation scheme where Gaussians are computed on demand, avoiding 50% of the computations. Furthermore, it employs a novel clustering scheme to reduce the size of the GMM parameters, which results in 8x memory bandwidth savings with a negligible impact on accuracy. Finally, it includes a novel memoization scheme that avoids 74.88% of floating-point operations. The end design provides a 164x speedup and 3532x energy reduction when compared with a highly-tuned implementation running on a modern mobile CPU. 
Compared to a state-of-the-art mobile GPU, the GMM accelerator achieves 5.89x speedup over a highly optimized CUDA implementation, while reducing energy by 241x.
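Two of the GMM accelerator's ideas, computing Gaussians only on demand (lazy evaluation) and memoizing repeated evaluations to avoid floating-point work, can be sketched in software for diagonal-covariance Gaussians (all names below are illustrative):

```python
def make_gaussian_scorer(means, inv_vars, log_consts):
    """Score diagonal-covariance Gaussians lazily: each Gaussian is evaluated
    only when a frame actually requests it, and a repeated (gaussian, frame)
    pair hits a memo table instead of redoing the floating-point work."""
    memo = {}
    stats = {"evaluated": 0, "memo_hits": 0}

    def score(g: int, frame: tuple) -> float:
        key = (g, frame)
        if key in memo:
            stats["memo_hits"] += 1
            return memo[key]
        stats["evaluated"] += 1
        s = log_consts[g]                     # log of the normalization constant
        for x, mu, iv in zip(frame, means[g], inv_vars[g]):
            d = x - mu
            s -= 0.5 * d * d * iv             # quadratic term of the log-density
        memo[key] = s
        return s

    return score, stats
```

The `stats` counters make the savings visible: only requested Gaussians ever enter the loop, and repeated requests cost a table lookup rather than a dot-product's worth of FP operations.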
7

Gandolfi, Riccardo. "Design of a memory-to-memory tensor reshuffle unit for ultra-low-power deep learning accelerators." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/23706/.

Abstract:
In the context of IoT edge processing, deep learning applications and near-sensor analytics, the constraints on low area occupation and low power consumption in MCUs (Microcontroller Units) performing computationally intensive tasks are more stringent than ever. A promising direction is to develop HWPEs (Hardware Processing Engines) that support the end-node in the execution of these tasks. The following work concerns the design and testing of the Datamover, a small and easily configurable HWPE for tensor shuffling and data marshaling operations. The accelerator is to be integrated within the Darkside PULP chip and can perform reordering operations and transpositions on data with different sub-byte widths. The focus is on the design of the internal buffering and transposition mechanism and on its performance compared to an equivalent software execution on the platform. Synthesis results are also reported in terms of area occupation and timing.
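As a rough software model of the kind of sub-byte reshuffling described (the layout below is a hypothetical example; the actual Datamover performs this on streams in hardware), unpacking 4-bit elements from a byte stream, transposing a tile, and repacking looks like:

```python
import numpy as np

def transpose_4bit_tile(packed, rows, cols):
    """Software model of a sub-byte reshuffle: unpack 4-bit elements
    from a byte stream, transpose the rows x cols tile, and repack.
    Illustrative only; nibble order (low nibble first) is an assumption.
    """
    bytes_in = np.frombuffer(packed, dtype=np.uint8)
    # two 4-bit nibbles per byte, low nibble first
    lo = bytes_in & 0x0F
    hi = bytes_in >> 4
    elems = np.empty(bytes_in.size * 2, dtype=np.uint8)
    elems[0::2], elems[1::2] = lo, hi
    # transpose the tile, then flatten back to element order
    tile = elems[: rows * cols].reshape(rows, cols).T.ravel()
    # repack pairs of elements into bytes
    return ((tile[0::2] | (tile[1::2] << 4)).astype(np.uint8)).tobytes()

packed = bytes([0x21, 0x43, 0x65, 0x87])  # elements 1..8, 4 bits each
out = transpose_4bit_tile(packed, 2, 4)   # 2x4 tile -> 4x2 tile
```

In software every element crosses the datapath twice (unpack, repack); doing the same reordering on-the-fly in a dedicated buffer is what makes a hardware unit like the Datamover attractive.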
8

Bleakley, Steven Shea, and steven.bleakley@qr.com.au. "Time Frequency Analysis of Railway Wagon Body Accelerations for a Low-Power Autonomous Device." Central Queensland University, 2006. http://library-resources.cqu.edu.au./thesis/adt-QCQU/public/adt-QCQU20070622.121515.

Abstract:
This thesis examines the application of the techniques of Fourier spectrogram and wavelet analysis to a low-power embedded microprocessor application in a novel railway and rollingstock monitoring system. The safe and cost-effective operation of freight railways is limited by the dynamic performance of wagons running on track. A monitoring system has been proposed comprising low-cost wireless sensing devices, dubbed “Health Cards”, to be installed on every wagon in the fleet. When marshalled into a train, the devices would sense accelerations and communicate via radio network to a master system in the locomotive. The integrated system would provide online information for decision support systems. Data throughput was heavily restricted by the network architecture, so significant signal analysis was required at the device level. An electronics engineering team at Central Queensland University developed a prototype Health Card, incorporating a 27 MHz microcontroller and four dual-axis accelerometers. A sensing arrangement and online analysis algorithms were required to detect and categorise dynamic events while operating within the constraints of the system. Time-frequency analysis reveals the time-varying frequency content of signals, making it suitable to detect and characterise transient events. With efficient algorithms such as the Fast Fourier Transform and Fast Wavelet Transform, time-frequency analysis methods can be implemented on a low-power embedded microcontroller. This thesis examines the application of time-frequency analysis techniques to wagon body acceleration signals, for the purpose of detecting poor dynamic performance of the wagon-track system. The Fourier spectrogram is implemented on the Health Card prototype and demonstrated in the laboratory. The research and algorithms provide a foundation for ongoing development as resources become available for system testing and validation.
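The on-device analysis rests on the short-time Fourier transform. A minimal sketch of a magnitude spectrogram follows (illustrative parameters, not the Health Card's actual configuration):

```python
import numpy as np

def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram via a sliding-window FFT, the kind of
    time-frequency analysis run on each acceleration axis.
    frame_len and hop are hypothetical values for illustration.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # one-sided FFT magnitude: rows = time frames, cols = frequency bins
    return np.abs(np.fft.rfft(frames, axis=1))

# A transient burst shows up as energy localised in both time and frequency
t = np.arange(1024) / 1000.0
accel = np.sin(2 * np.pi * 5 * t)
accel[500:560] += np.sin(2 * np.pi * 120 * t[500:560])
S = spectrogram(accel)
```

Because each frame needs only one FFT of modest length, this structure fits the memory and cycle budget of a small microcontroller far better than a full-resolution transform of the whole record.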
9

Xu, Hongjie. "Energy-Efficient On-Chip Cache Architectures and Deep Neural Network Accelerators Considering the Cost of Data Movement." Doctoral thesis, Kyoto University, 2021. http://hdl.handle.net/2433/263786.

Abstract:
Kyoto University, Graduate School of Informatics, Department of Communications and Computer Engineering
Associated degree program: Kyoto University WISE program "先端光・電子デバイス創成学" (Advanced Photonic and Electronic Devices)
Doctor of Informatics (new degree system, doctorate by coursework; qualifies under Article 4, Paragraph 1 of the Degree Regulations)
Degree No. 甲第23325号 (情博第761号)
Examination committee: Professor Hidetoshi Onodera (chair), Professor Eiji Oki, Professor Takashi Sato
DFAM
10

Das, Satyajit. "Architecture and Programming Model Support for Reconfigurable Accelerators in Multi-Core Embedded Systems." Thesis, Lorient, 2018. http://www.theses.fr/2018LORIS490/document.

Abstract:
Emerging trends in embedded systems and applications demand high throughput and low power consumption. Due to the increasing demand for low-power computing and diminishing returns from technology scaling, industry and academia are turning with renewed interest toward energy-efficient hardware accelerators. The main drawback of hardware accelerators is that they are not programmable: their utilization can be low if they perform only one specific function, and increasing the number of accelerators in a system on chip (SoC) causes scalability issues. Programmable accelerators provide flexibility and solve the scalability issues. A Coarse-Grained Reconfigurable Array (CGRA), consisting of several processing elements with word-level granularity, is a promising choice of programmable accelerator. Inspired by these characteristics, this thesis studies the potential of CGRAs in near-threshold computing platforms and develops an end-to-end CGRA research framework. The major contributions of this framework are CGRA design, implementation, integration in a computing system, and compilation for the CGRA. First, the design and implementation of a CGRA named Integrated Programmable Array (IPA) is presented. Next, the problem of mapping applications with control and data flow onto a CGRA is formulated. From this formulation, several efficient algorithms are developed that use the internal resources of the CGRA, with a vision for low-power acceleration. The algorithms are integrated into an automated compilation flow. Finally, the IPA accelerator is integrated into PULP, a Parallel Ultra-Low-Power Processing Platform, to explore heterogeneous computing.
11

CAPRA, MAURIZIO. "Application Specific Domain Co-design Hardware Accelerator IP for Deep Learning Enabled Internet-of-Things." Doctoral thesis, Politecnico di Torino, 2022. https://hdl.handle.net/11583/2973427.

12

Galindo, Muñoz Natalia. "Development of direct measurement techniques for the in-situ internal alignment of accelerating structures." Doctoral thesis, Universitat Politècnica de València, 2018. http://hdl.handle.net/10251/100488.

Abstract:
In the next generation of linear particle accelerators, challenging alignment tolerances are required in the positioning of the components focusing, accelerating and detecting the beam over the accelerator length in order to achieve the maximum machine performance. In the case of the Compact Linear Collider (CLIC), accelerating structures, beam position monitors and quadrupole magnets need to be aligned in their support with respect to their reference axes with an accuracy of 10 um. To reach such objective, the PACMAN (Particle Accelerator Components Metrology and Alignment to the Nanometer Scale) project strives for the improvement of the current alignment accuracy by developing new methods and tools, whose feasibility should be validated using the major CLIC components. This Ph.D. thesis concerns the investigation, development and implementation of a new non-destructive intracavity technique, referenced here as 'the perturbative method', to determine the electromagnetic axes of accelerating structures by means of a stretched wire, acting as a reference of alignment. Of particular importance is the experimental validation of the method through the 5.5 mm iris-mean aperture CLIC prototype known as TD24, with complex mechanical features and difficult accessibility, in a dedicated test bench. In the first chapter of this thesis, the alignment techniques in particle accelerators and the novel proposals to be implemented in the future linear colliders are introduced, and a detailed description of the PACMAN project is provided. The feasibility study of the method, carried out with extensive electromagnetic fields simulations, is described in chapter 2, giving as a result, the knowledge of the theoretical accuracy expected in the measurement of the electromagnetic axes and facilitating the development of a measurement algorithm. 
The conceptual design, manufacturing and calibration of the automated experimental set-up, integrating the solution developed to measure the electromagnetic axes of the TD24, are covered in chapter 3. The future lines of research and developments of the perturbative method are also explored. In chapter 4, the most significant results obtained from an extensive experimental work are presented, analysed and compared with simulations. The proof-of-principle is completed, the measurement algorithm is optimised and the electromagnetic centre is measured in the TD24 with a precision less than 1 um and an estimated error less than 8.5 um. Finally, in chapter 5, the developments undertaken along this research work are summarised, the innovative achievements accomplished within the PACMAN project are listed and its impact is analysed.
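As a toy illustration of the general idea of locating an axis by minimizing a measured perturbation signal as the wire is displaced (entirely hypothetical data and a simple parabolic fit, not the thesis measurement algorithm):

```python
import numpy as np

# Hypothetical measurements: perturbation signal strength vs. wire
# offset (mm), quadratic around the electromagnetic axis
offsets = np.linspace(-0.5, 0.5, 11)
true_axis = 0.12  # assumed position of the axis, in mm
signal = 3.0 * (offsets - true_axis) ** 2 + 0.7

# Parabolic fit y = a*x^2 + b*x + c; the minimum sits at x = -b / (2a)
a, b, c = np.polyfit(offsets, signal, 2)
axis_estimate = -b / (2 * a)
```

Fitting a model through many wire positions, rather than reading off the single smallest sample, is what lets such methods resolve the axis well below the scan step size.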
Galindo Muñoz, N. (2018). Development of direct measurement techniques for the in-situ internal alignment of accelerating structures [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/100488
13

(9781541), Steven Bleakley. "Time frequency analysis of railway wagon body accelerations for a low-power autonomous device." Thesis, 2006. https://figshare.com/articles/thesis/Time_frequency_analysis_of_railway_wagon_body_accelerations_for_a_low-power_autonomous_device/13436474.

Abstract:
This thesis examines the application of the techniques of Fourier spectrogram and wavelet analysis to a low-power embedded microprocessor application in a novel railway and rollingstock monitoring system. The safe and cost-effective operation of freight railways is limited by the dynamic performance of wagons running on track. A monitoring system has been proposed comprising low-cost wireless sensing devices, dubbed "Health Cards", to be installed on every wagon in the fleet. When marshalled into a train, the devices would sense accelerations and communicate via radio network to a master system in the locomotive. The integrated system would provide online information for decision support systems. Data throughput was heavily restricted by the network architecture, so significant signal analysis was required at the device level. An electronics engineering team at Central Queensland University developed a prototype Health Card, incorporating a 27 MHz microcontroller and four dual-axis accelerometers. A sensing arrangement and online analysis algorithms were required to detect and categorise dynamic events while operating within the constraints of the system. Time-frequency analysis reveals the time-varying frequency content of signals, making it suitable to detect and characterise transient events. With efficient algorithms such as the Fast Fourier Transform and Fast Wavelet Transform, time-frequency analysis methods can be implemented on a low-power embedded microcontroller. This thesis examines the application of time-frequency analysis techniques to wagon body acceleration signals, for the purpose of detecting poor dynamic performance of the wagon-track system. The Fourier spectrogram is implemented on the Health Card prototype and demonstrated in the laboratory. The research and algorithms provide a foundation for ongoing development as resources become available for system testing and validation.
14

Dixit, Kavita P. "Design Studies, Modelling And Testing The RF Characteristics Of The Radio Frequency Quadrupole Accelerator." Thesis, 1997. http://etd.iisc.ernet.in/handle/2005/1817.

15

"Algorithm and Hardware Design for High Volume Rate 3-D Medical Ultrasound Imaging." Doctoral diss., 2019. http://hdl.handle.net/2286/R.I.55684.

Abstract:
Ultrasound B-mode imaging is an increasingly significant medical imaging modality for clinical applications. Compared to other imaging modalities like computed tomography (CT) or magnetic resonance imaging (MRI), ultrasound imaging has the advantage of being safe, inexpensive, and portable. While two-dimensional (2-D) ultrasound imaging is very popular, three-dimensional (3-D) ultrasound imaging provides distinct advantages over its 2-D counterpart by providing volumetric imaging, which leads to more accurate analysis of tumors and cysts. However, the amount of received data at the front end of a 3-D system is extremely large, making it impractical for power-constrained portable systems. In this thesis, algorithm and hardware design techniques to support a hand-held 3-D ultrasound imaging system are proposed. Synthetic aperture sequential beamforming (SASB) is chosen since its computations can be split into two stages, where the output generated by Stage 1 is significantly smaller in size than the input. This characteristic enables Stage 1 to be done in the front end while Stage 2 can be sent out to be processed elsewhere. The contributions of this thesis are as follows. First, 2-D SASB is extended to 3-D. Techniques to increase the volume rate of 3-D SASB through a new multi-line firing scheme and use of a linear chirp as the excitation waveform are presented. A new sparse array design that not only reduces the number of active transducers but also avoids the imaging degradation caused by grating lobes is proposed. A combination of these techniques increases the volume rate of 3-D SASB by 4x without introducing extra computations at the front end. Next, algorithmic techniques to further reduce the Stage 1 computations in the front end are presented. These include reducing the number of distinct apodization coefficients and operating with narrow-bit-width fixed-point data. A 3-D die-stacked architecture is designed for the front end. 
This highly parallel architecture enables the signals received by 961 active transducers to be digitized, routed by a network-on-chip, and processed in parallel. The processed data are accumulated through a bus-based structure. This architecture is synthesized using the TSMC 28 nm technology node, and the estimated power consumption of the front end is less than 2 W. Finally, the Stage 2 computations are mapped onto a reconfigurable multi-core architecture, TRANSFORMER, which supports different types of on-chip memory banks and run-time reconfigurable connections between general processing elements and memory banks. The matched filtering step and the beamforming step in Stage 2 are mapped onto TRANSFORMER with different memory configurations. gem5 simulations show that the private cache mode yields shorter execution time and higher computation efficiency than the other cache modes. The overall execution time for Stage 2 is 14.73 ms. The average power consumption and the average Giga-operations-per-second/Watt in the 14 nm technology node are 0.14 W and 103.84, respectively.
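The beamforming step that dominates Stage 2 is, at its core, delay-and-sum. A minimal single-point sketch follows (hypothetical names and shapes, far removed from the pipelined hardware implementation):

```python
import numpy as np

def delay_and_sum(rf, delays_samples, apod):
    """Delay-and-sum beamforming for one image point.

    rf:             (C, N) RF traces from C channels
    delays_samples: (C,)   per-channel focusing delay, in samples
    apod:           (C,)   apodization weights
    Nearest-neighbour delay rounding is used here for brevity;
    real systems interpolate.
    """
    C, N = rf.shape
    idx = np.clip(np.round(delays_samples).astype(int), 0, N - 1)
    # pick each channel's delayed sample, weight, and accumulate
    return float(np.sum(apod * rf[np.arange(C), idx]))

# 8 identical channels, constant 20-sample focusing delay
rf = np.tile(np.sin(np.arange(128) / 4.0), (8, 1))
val = delay_and_sum(rf, np.full(8, 20.0), np.ones(8))
```

Repeating this per-point computation over every voxel of a volume is what makes the delay calculation and apodization-coefficient reductions described above worthwhile.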
Doctoral Dissertation Engineering 2019
16

"In Support of High Quality 3-D Ultrasound Imaging for Hand-held Devices." Doctoral diss., 2015. http://hdl.handle.net/2286/R.I.28545.

Abstract:
Three-dimensional (3-D) ultrasound is safe, inexpensive, and has been shown to drastically improve system ease-of-use, diagnostic efficiency, and patient throughput. However, its high computational complexity and resulting high power consumption have precluded its use in hand-held applications. In this dissertation, algorithm-architecture co-design techniques that aim to make hand-held 3-D ultrasound a reality are presented. First, image enhancement methods to improve signal-to-noise ratio (SNR) are proposed. These include virtual source firing techniques and a low-overhead digital front-end architecture using orthogonal chirps and orthogonal Golay codes. Second, algorithm-architecture co-design techniques to reduce the power consumption of 3-D SAU imaging systems are presented. These include (i) a subaperture multiplexing strategy and the corresponding apodization method to alleviate the signal bandwidth bottleneck, and (ii) a highly efficient iterative delay calculation method to eliminate complex operations such as multiplications, divisions, and square roots in delay calculation during beamforming. These techniques were used to define Sonic Millip3De, a 3-D die-stacked architecture for digital beamforming in SAU systems. Sonic Millip3De produces 3-D high-resolution images at 2 frames per second with a system power consumption of 15 W in 45 nm technology. Third, a new beamforming method based on separable delay decomposition is proposed to reduce the computational complexity of the beamforming unit in an SAU system. The method is based on minimizing the root-mean-square error (RMSE) due to delay decomposition. It reduces the beamforming complexity of an SAU system by 19x while providing image fidelity comparable to non-separable beamforming. The resulting modified Sonic Millip3De architecture supports a frame rate of 32 volumes per second while maintaining a power consumption of 15 W in 45 nm technology. 
Next, a 3-D plane-wave imaging system that utilizes both separable beamforming and coherent compounding is presented. The resulting system has computational complexity comparable to that of a non-separable, non-compounding baseline system while significantly improving contrast-to-noise ratio and SNR. The modified Sonic Millip3De architecture is now capable of generating high-resolution images at 1000 volumes per second with 9-fire-angle compounding.
Doctoral Dissertation Electrical Engineering 2015