Dissertations / Theses: 'Multiply and accumulate'

1

Duppils, Mattias. "Digitally controlled analog multiply-accumulate units." Linköping: Univ., 2002. http://www.bibl.liu.se/liupubl/disp/disp2002/tek792s.pdf.

2

Natter, William. "Design and implementation of digit-serial online multiply-accumulate arithmetic operations." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/MQ60479.pdf.

3

Lindahl, Erik. "Design and implementation of a decimation filter using a multi-precision multiply and accumulate unit for an audio range delta sigma analog to digital converter." Thesis, Linköping University, Department of Electrical Engineering, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-11261.

Abstract:

This work presents the design and implementation of a decimation filter for a three-bit sigma-delta analog-to-digital converter. The input is audio with an oversampling ratio of 32. Filter optimization and design trade-offs are described. The filter is a multistage filter consisting of two cascaded FIR filters. The arithmetic unit is a multi-precision unit that can perform either 3-bit or 24-bit MAC operations. The decimation filter is synthesized on standard cells of a 0.13 μm CMOS library.
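The abstract names the filter structure but not its arithmetic. As a rough illustration only, the Python sketch below shows how a two-stage FIR decimator reduces to chains of MAC operations, computing only the outputs that survive downsampling. The 8 × 4 split of the overall ratio of 32, the boxcar coefficients, and the function names are hypothetical placeholders, not the thesis's optimized filters, and the sketch does not model the multi-precision MAC unit.

```python
from typing import List, Sequence


def mac_dot(coeffs: Sequence[int], window: Sequence[int]) -> int:
    """Inner product computed as a chain of multiply-accumulate steps."""
    acc = 0
    for c, x in zip(coeffs, window):
        acc += c * x  # one MAC operation per filter tap
    return acc


def fir_decimate(signal: Sequence[int], coeffs: Sequence[int], factor: int) -> List[int]:
    """FIR filter followed by downsampling by `factor`.

    Only the outputs kept after downsampling are computed:
    y[n] = sum_k h[k] * x[n - k] for n = N-1, N-1+factor, N-1+2*factor, ...
    """
    n_taps = len(coeffs)
    out = []
    for n in range(n_taps - 1, len(signal), factor):
        window = list(signal[n - n_taps + 1 : n + 1])[::-1]  # x[n], x[n-1], ..., x[n-N+1]
        out.append(mac_dot(coeffs, window))
    return out


# Hypothetical split of the total decimation ratio 32 into 8 x 4, with
# placeholder boxcar taps (a real design would use optimized coefficients).
STAGE1 = ([1] * 8, 8)
STAGE2 = ([1] * 4, 4)


def two_stage_decimator(samples: Sequence[int]) -> List[int]:
    coeffs1, r1 = STAGE1
    coeffs2, r2 = STAGE2
    return fir_decimate(fir_decimate(samples, coeffs1, r1), coeffs2, r2)


if __name__ == "__main__":
    print(two_stage_decimator([1] * 320))  # ten outputs, each equal to 8 * 4 = 32
```

Splitting the decimation across stages lets the first stage run with short filters at the high input rate, while the longer, sharper filtering is deferred to a lower rate where each output costs fewer MACs per input sample.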

4

Bowlyn, Kevin Nathaniel. "Implementation of a Novel Integrated Distributed Arithmetic and Complex Binary Number System in Fast Fourier Transform Algorithm." OpenSIUC, 2017. https://opensiuc.lib.siu.edu/dissertations/1470.

Abstract:

This research focuses on a novel integrated approach to computing the fast Fourier transform (FFT) that represents each complex number as a single entity and uses no dedicated multipliers, by combining the Distributed Arithmetic (DA) technique with the Complex Binary Number System (CBNS). The FFT algorithm is one of the most widely used and implemented techniques in Digital Signal Processing (DSP) applications across science, engineering, and mathematics. DA is a technique for computing the inner (dot) product of two vectors without any dedicated multipliers. Dedicated multipliers are fast, but they consume a large amount of hardware and are costly; DA instead forms products by shifting and adding only. Conventionally, complex numbers are computed with a divide-and-conquer approach in which each number is split into two parts, the real and the imaginary. The CBNS technique, however, allows each complex addition and multiplication to be computed in a single step instead of two. In the combined DA-CBNS approach to the FFT, the dedicated multipliers are replaced with a DA system that uses a ROM to store the twiddle factor values (wn), and CBNS represents complex arithmetic operations as a single entity rather than two. The architecture was coded in VHSIC (very high speed integrated circuit) Hardware Description Language (VHDL) using the Xilinx ISE Design Suite, version 14.2, which synthesizes the design to the logic-gate level for implementation on a Field Programmable Gate Array (FPGA). The VHDL design was downloaded onto a Nexys 4 DDR Artix-7 FPGA board for further testing and analysis.
This technique eliminated dedicated multipliers and halved the number of complex arithmetic computations needed for an FFT structure compared with the traditional approach. For a 32-bit, 8-point DA-CBNS FFT structure, the results showed a 32% reduction in area, 41% in power, 59% in run-time, and 42% in logic-gate cost, together with a 66% increase in speed. For a 28-bit, 16-point DA-CBNS FFT structure, area, power consumption, run-time, and logic-gate cost were reduced by approximately 30%, 37%, 60%, and 39%, respectively, with a speed increase of approximately 67%, compared with the traditional approach that uses dedicated multipliers and computes complex arithmetic as two separate parts, real and imaginary.
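To make the DA idea concrete, here is a minimal Python sketch of the textbook DA inner product, assuming integer (fixed-point) coefficients and two's-complement inputs of a fixed bit width. A ROM indexed by one bit from each input holds precomputed coefficient sums, so the result is formed from lookups, shifts, and adds only. The function names are hypothetical, and the sketch leaves out the CBNS representation and the FFT structure entirely.

```python
from typing import List, Sequence


def build_da_rom(coeffs: Sequence[int]) -> List[int]:
    """ROM entry `addr` holds the sum of coeffs[k] for every bit k set in addr."""
    rom = []
    for addr in range(1 << len(coeffs)):
        rom.append(sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1))
    return rom


def da_inner_product(coeffs: Sequence[int], xs: Sequence[int], bits: int) -> int:
    """Multiplier-free sum(coeffs[k] * xs[k]) for `bits`-bit two's-complement inputs."""
    rom = build_da_rom(coeffs)
    mask = (1 << bits) - 1
    patterns = [x & mask for x in xs]  # two's-complement bit patterns of the inputs
    acc = 0
    for b in range(bits):
        addr = 0
        for k, x in enumerate(patterns):
            addr |= ((x >> b) & 1) << k  # gather bit b of every input into one address
        term = rom[addr] << b  # weight 2**b applied with a shift, not a multiply
        if b == bits - 1:
            acc -= term  # the sign bit carries weight -2**(bits-1)
        else:
            acc += term
    return acc
```

The ROM grows as 2^K for K inputs, which is why DA suits short inner products with fixed coefficients, such as butterflies with stored twiddle factors.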

5

Kamp, William Hermanus Michael. "Redundant Number Systems for Optimising Digital Signal Processing Performance in Field Programmable Gate Array." Thesis, University of Canterbury. Electrical and Computer Engineering, 2010. http://hdl.handle.net/10092/4623.

Abstract:

Speeding up addition is the key to faster digital signal processing (DSP), and this can be achieved by exploiting the properties of redundant number systems. Their expanded symbol (digit) alphabet gives them multiple representations for most values. Using redundant representations at the output of an adder allows addition to be performed without carry propagation, yielding fast, constant-time performance irrespective of word length. A resource-efficient implementation of this fast adder structure is developed that repurposes the fast carry logic of low-cost field programmable gate arrays (FPGAs). Experiments confirm constant-time addition and show that it outperforms binary ripple-carry addition at word lengths greater than 44 bits in a Xilinx Spartan 3 FPGA and 24 bits in an Altera Cyclone III FPGA. Redundancy also provides other properties that can be exploited for performance gain. Some redundant representations have more zero symbols than others. These maximise the opportunities to exploit the multiplicative absorbing and additive identity properties of zero, which, when exercised, eliminate superfluous calculations. A serial recoding algorithm is developed that generates a redundant representation of a specified value with as few nonzero symbols as possible. Unlike previously published methods, it accepts a wide range of number systems, including those with irregularly spaced symbol alphabets. A Markov analysis and an analysis of the elementary cycles in the formulated state machine provide average- and worst-case measures for the tested number system. Typically, the average number of nonzero symbols is less than a third and the worst case is less than a half. Beyond the increase in zero symbols, zero-dominance is proposed as a new property of redundant number representations. It promotes a set of representations that have uniquely positioned zero symbols, in a Pareto-optimal sense.
This set covers all representations of a value and is used to select representations that optimise the calculation of a dot product. The dot product, or vector multiply, is a fundamental DSP operation, since it is employed in filtering, correlation, and convolution. The nonzero partial products can be packed together, substantially reducing calculation time. Applying redundant number systems provides a twofold benefit. First, the number of nonzero partial products is reduced. Second, a novel opportunity is identified to use the representations in the zero-dominant set to optimise the packing further, gaining an extra 18% improvement. An implementation of the proposed dot product with partial-product packing is developed for a Cyclone II FPGA; it outperforms a quad-multiplier binary implementation in throughput by 50%. Redundant number systems excel at increasing performance in particular DSP subsystems, namely those that are numerically intensive and involve considerable accumulation. Conversion back to a binary result is the performance bottleneck in the DSP algorithm, taking time comparable to that of a binary adder. Therefore, redundant number systems are best used when this conversion cost can be amortised over many fast redundant additions, which is typical of many DSP and communications applications.
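Carry-save form is one common redundant representation (not necessarily the one used in the thesis) and shows the amortisation argument above in a few lines. In the Python sketch below, each addition is a bitwise 3:2 compression with no carry chain between bit positions, and a single carry-propagating addition converts the accumulated value back to binary at the end. Python integers are arbitrary precision, so this illustrates the structure rather than hardware timing; inputs are assumed to be non-negative integers, and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def csa(a: int, b: int, c: int) -> Tuple[int, int]:
    """Bitwise 3:2 compressor (a row of full adders): a + b + c == sum + carry.

    Each bit position is computed independently, so no carry ripples
    across the word and the delay does not depend on word length.
    """
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def carry_save_accumulate(values: Iterable[int]) -> int:
    """Accumulate many values in redundant (sum, carry) form."""
    s, c = 0, 0  # the accumulator's value is s + c
    for v in values:
        s, c = csa(s, c, v)
    return s + c  # one carry-propagating conversion, amortised over all additions
```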

6

Olano, Jimmy Fernando Tarrillo. "Exploring the use of multiple modular redundancies for masking accumulated faults in SRAM-based FPGAs." Biblioteca Digital de Teses e Dissertações da UFRGS, 2014. http://hdl.handle.net/10183/103895.

Abstract:

Soft errors in the configuration memory bits of SRAM-based FPGAs are an important issue because of their persistence and their potential to cause functional failures in the implemented circuit. Whenever a configuration memory bit cell is flipped, the soft error is corrected only by reloading the correct configuration bitstream. If the correct bitstream is not reloaded, persistent soft errors can accumulate in the configuration memory bits, causing a functional failure in the user's design that can have catastrophic consequences. This scenario worsens in the event of multi-bit upsets, whose probability of occurrence is increasing in new nanometric technologies. Traditional strategies for dealing with soft errors in configuration memory are based on some form of triple modular redundancy (TMR) combined with scrubbing of the memory to repair faults and prevent their accumulation. The high reliability of this technique has been demonstrated in many studies; however, TMR is designed to mask only single faults. The technology trend toward smaller transistor dimensions increases susceptibility to faults. In this new scenario, multiple faults in the configuration memory of the FPGA are becoming more common than single faults, so TMR is inappropriate for high-reliability applications. Furthermore, since the fault rate is increasing, the scrubbing rate must also increase, which raises power consumption. To cope with massive upsets between sparse scrubbing operations, this work proposes a multiple-redundancy system composed of n identical modules operating in tandem, known as n-modular redundancy (nMR), together with an innovative self-adaptive voter that can mask multiple upsets in the system. The main drawback of modular redundancy is its high cost in area and power consumption. However, area overhead is becoming less of a problem owing to the higher density of new technologies.
On the other hand, high power consumption has always been a handicap of FPGAs. This work also proposes a model to predict the power overhead caused by using multiple redundancy in SRAM-based FPGAs. The ability of the proposal to tolerate multiple faults was evaluated through radiation experiments and fault-injection campaigns on case-study circuits implemented in a commercial 65 nm FPGA. Finally, we show that nMR in FPGAs is an attractive and feasible solution in terms of power, area, and reliability, measured in FIT and mean time between failures (MTBF), and that the power overhead it generates is much lower than reported in the literature.
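The abstract does not describe the self-adaptive voter, so the Python sketch below shows only the conventional building block it extends: a bitwise majority voter over the outputs of n identical modules. With n odd, it masks up to (n - 1) // 2 faulty modules, which is why TMR (n = 3) masks only single faults. The function name and the `width` parameter are hypothetical.

```python
from typing import Sequence


def majority_vote(outputs: Sequence[int], width: int) -> int:
    """Bitwise majority over the `width`-bit outputs of n redundant modules (n odd)."""
    n = len(outputs)
    if n % 2 == 0:
        raise ValueError("use an odd number of modules")
    result = 0
    for bit in range(width):
        ones = sum((o >> bit) & 1 for o in outputs)
        if ones > n // 2:  # strict majority of modules drive a 1 on this bit
            result |= 1 << bit
    return result
```

For example, with five modules (5MR) and two of them corrupted, the voter still returns the correct word, whereas TMR with two corrupted modules out of three can be outvoted.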

7

Ghodrati, Ashkan, and Ahmed Rashid. "Modelling and Simulation of a Power Take-off in Connection with Multiple Wave Energy Converters." Thesis, Blekinge Tekniska Högskola, Institutionen för tillämpad signalbehandling, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-3396.

Abstract:

The objective of this thesis is to develop a model that integrates multiple buoys with a power take-off hub. The model will be derived using time-domain analysis and will consider the hydraulic coupling of the buoys and the power take-off. The derived model is implemented in MATLAB to run simulations, which makes it possible to conduct a parameter study and evaluate the performance of the system. The buoy simulation model is provided by Wave4Power (W4P). It consists of a floater rigidly connected to a fully submerged vertical (acceleration) tube open at both ends. The tube contains a piston whose motion relative to the floater-tube system drives a power take-off mechanism. The power take-off model is provided by Ocean Harvesting Technologies AB (OHT). It comprises a mechanical gearbox and a gravity accumulator, and the system is used to transform irregular wave energy into a smooth electrical power output. OHT's simulation model needs to be extended with a hydraulic motor at the input shaft. Both systems have control features that need to be connected and synchronized with each other. Another major goal of the thesis is to test different online control techniques. A simple control strategy to optimize power capture, called sea-state tuning, can be achieved using a mechanical gearbox with several discrete gear ratios or a variable-displacement pump. The gear ratio of the gearbox can be regulated according to a 2D lookup table based on the average wave amplitude and frequency over a defined time frame. The OHT power take-off uses a control strategy called the spill function, which limits excess power capture and keeps the weight accumulator within a span by disengaging the input shaft from the power take-off. This is to be modified to implement power limitation by regulating the gear ratio of the gearbox.
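The sea-state tuning lookup described above can be sketched as follows. This is a hypothetical Python illustration: the bin edges, the gear indices in the table, and the function names are placeholders invented for the example, because the abstract gives no values; a real table would come from the parameter study.

```python
from bisect import bisect_right
from typing import List, Sequence

# Hypothetical bin edges (placeholders, not values from the thesis).
AMPLITUDE_EDGES_M: List[float] = [0.5, 1.0, 2.0]      # average wave amplitude, metres
FREQUENCY_EDGES_HZ: List[float] = [0.08, 0.12, 0.16]  # average wave frequency, hertz

# Hypothetical discrete gear index per (amplitude bin, frequency bin);
# three edges per axis give four bins per axis.
GEAR_TABLE: List[List[int]] = [
    [0, 0, 1, 1],
    [0, 1, 1, 2],
    [1, 1, 2, 3],
    [1, 2, 3, 3],
]


def frame_average(samples: Sequence[float]) -> float:
    """Mean of a measured quantity over the current time frame."""
    if not samples:
        raise ValueError("empty time frame")
    return sum(samples) / len(samples)


def select_gear(mean_amplitude: float, mean_frequency: float) -> int:
    """Look up the discrete gear for the current sea state in the 2D table."""
    row = bisect_right(AMPLITUDE_EDGES_M, mean_amplitude)
    col = bisect_right(FREQUENCY_EDGES_HZ, mean_frequency)
    return GEAR_TABLE[row][col]
```

With `bisect_right`, a value that falls exactly on an edge is assigned to the upper bin; the opposite convention would use `bisect_left`.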

8

Teng, Sin Yong. "Intelligent Energy-Savings and Process Improvement Strategies in Energy-Intensive Industries." Doctoral thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2020. http://www.nusl.cz/ntk/nusl-433427.

Abstract:

As new technologies for energy-intensive industries continue to develop, existing plants gradually fall behind in efficiency and productivity. Fierce market competition and environmental legislation force these traditional plants to cease operation and shut down. Process improvement and retrofit projects are essential for maintaining the operational performance of these plants. Current approaches to process improvement are mainly process integration, process optimization, and process intensification. In general, these fields rely on mathematical optimization, the practitioner's experience, and operational heuristics. These approaches serve as the basis for process improvement; however, their performance can be further enhanced with modern computational intelligence. The purpose of this work is therefore to apply advanced artificial intelligence and machine learning techniques to improve processes in energy-intensive industries. The work addresses this problem by simulating industrial systems and makes the following contributions: (i) application of machine learning techniques, including one-shot learning and neuro-evolution, for data-driven modelling and optimization of individual units; (ii) application of dimensionality reduction (e.g., principal component analysis, autoencoders) for multi-objective optimization of multi-unit processes; (iii) design of a new tool for analysing problematic parts of a system so that they can be removed (bottleneck tree analysis, BOTA), together with an extension that handles multi-dimensional problems using a data-driven approach; (iv) demonstration of the effectiveness of Monte Carlo simulation, neural networks, and decision trees for decision-making when integrating a new process technology into existing processes; (v) comparison of Hierarchical Temporal Memory (HTM) and dual optimization with several predictive tools for supporting real-time operations management; (vi) implementation of artificial neural networks within the interface of the conventional process graph (P-graph); and (vii) highlighting the future of artificial intelligence and process engineering in biosystems through the commercially based multi-omics paradigm.

9

Tavares, Lucas Alves. "O envolvimento da proteína adaptadora 1 (AP-1) no mecanismo de regulação negativa do receptor CD4 por Nef de HIV-1." [The involvement of adaptor protein 1 (AP-1) in the mechanism of CD4 receptor down-regulation by HIV-1 Nef.] Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/17/17136/tde-06012017-113215/.

Abstract:

The Human Immunodeficiency Virus (HIV) is the etiologic agent of Acquired Immunodeficiency Syndrome (AIDS). AIDS is a disease with a global distribution, and it is estimated that at least 36.9 million people are currently infected with the virus. During its replication cycle, HIV induces several changes in the physiology of the host cell to promote its survival and enhance replication. The rapid progression of HIV-1 infection in humans and animal models is closely linked to the function of the accessory protein Nef. Among the many actions of Nef is the down-regulation of proteins important for the immune response, such as the CD4 receptor. This action is known to result from induced degradation of CD4 in lysosomes, but the molecular mechanisms involved are still incompletely understood. Nef forms a tripartite complex with the cytosolic tail of CD4 and adaptor protein 2 (AP-2) in nascent clathrin-coated vesicles, inducing CD4 internalization and lysosomal degradation. Previous research has demonstrated that Nef-mediated targeting of CD4 to lysosomes involves entry of the receptor into the multivesicular body (MVB) pathway by an atypical mechanism: although it does not require cargo ubiquitination, it depends on proteins of the ESCRT (Endosomal Sorting Complexes Required for Transport) machinery and on Alix, an accessory protein of the ESCRT machinery. Nef has been reported to interact with subunits of the AP-1, AP-2, and AP-3 complexes, but it does not appear to interact with AP-4 or AP-5 subunits. However, the role of the Nef interaction with AP-1 or AP-3 in CD4 down-regulation is poorly understood. Furthermore, AP-1, AP-2, and AP-3 are potentially heterogeneous because multiple isoforms of their subunits are encoded by different genes, yet few studies have examined whether the different combinations of AP isoforms actually form and whether they have distinct functional properties.
This study aimed to identify and characterize cellular factors involved in CD4 down-modulation induced by HIV-1 Nef. More specifically, it aimed to characterize the involvement of the AP-1 complex in CD4 down-regulation by HIV-1 Nef through a functional study of the two isoforms of γ-adaptin, both AP-1 subunits. Using a pull-down assay, we showed that Nef is able to interact with γ2. In addition, our immunoblot data indicated that γ2-adaptin, not γ1-adaptin, is required for Nef-mediated targeting of CD4 to lysosomes, and that this participation of γ2 is conserved for Nef from different viral strains. Furthermore, in flow cytometry assays, depletion of γ2, but not of γ1, compromised the Nef-induced reduction of surface CD4 levels. Indirect immunofluorescence microscopy also revealed that γ2 depletion impairs the Nef-induced redistribution of CD4 to the juxtanuclear region, resulting in accumulation of CD4 in early endosomes. Knockdown of ?1A, another subunit of AP-1, decreased the cellular levels of γ1 and γ2 and compromised the efficient degradation of CD4 by Nef. Moreover, upon perturbing the ESCRT machinery by overexpressing HRS (a subunit of the ESCRT-0 complex), internalized CD4 accumulated in enlarged HRS-GFP-positive endosomes, where it colocalized with γ2. Together, the results indicate that γ2-adaptin is essential for Nef-mediated targeting of CD4 to the ESCRT/MVB pathway and is an important protein in the endo-lysosomal system. Furthermore, the γ-adaptin isoforms not only have different functions but also appear to form AP-1 complexes with distinct cellular functions, since only the AP-1 variant containing γ2, and not γ1, acts in Nef-induced CD4 down-regulation. These studies contribute to a better understanding of the molecular mechanisms of Nef activity, which may also help clarify the pathogenesis of HIV and the related syndrome.
In addition, this work contributes to the understanding of fundamental processes regulating the intracellular trafficking of transmembrane proteins in the endo-lysosomal system.

10

Liu, Albert Y. M., and 劉元明. "A Multiply-And-Accumulate Module Generator." Thesis, 2000. http://ndltd.ncl.edu.tw/handle/31258642337837389545.

Abstract:

Master's thesis, National Tsing Hua University, Department of Computer Science, ROC academic year 88.
Multiply-And-Accumulate (MAC) is the most frequently used operation in many DSP applications. We propose a software method that generates high-performance MAC units in synthesizable HDL format. Our tool integrates several novel techniques, including a modified radix-4 Booth encoding, a three-dimensional Wallace tree, a sign-extension prevention scheme, and a hybrid carry-select/carry-look-ahead adder. It allows users to specify the number of bits in both inputs and the output, the number system (signed, unsigned, or selected at run time by command inputs), the number of pipeline stages, saturation on overflow, the accumulator type ("addition only" or "addition and subtraction"), and whether to include pipeline-stall and accumulator-initialization capability. A typical MAC unit (16 × 16-bit inputs, 40-bit accumulator, 2-stage pipeline) can be generated within seconds and runs at over 280 MHz in typical-case post-layout simulation when targeted to a TSMC 0.35 μm CMOS cell library.
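The following Python sketch illustrates two of the listed features at the bit level: standard radix-4 Booth recoding (not the thesis's modified variant, whose details the abstract does not give) and a 40-bit accumulator with optional saturation, matching the typical configuration quoted above. Function names and defaults are hypothetical, and the sketch does not attempt the HDL generation itself.

```python
from typing import List


def booth_radix4_digits(y: int, bits: int) -> List[int]:
    """Radix-4 Booth recoding of a `bits`-bit two's-complement multiplier.

    Digit i = y[2i-1] + y[2i] - 2*y[2i+1], each in {-2, -1, 0, 1, 2},
    and y == sum(digit_i * 4**i). Halves the number of partial products.
    """
    if bits % 2:
        raise ValueError("bits must be even")
    u = y & ((1 << bits) - 1)
    digits, prev = [], 0  # prev holds y[2i-1]; y[-1] is an implicit 0
    for i in range(0, bits, 2):
        b0, b1 = (u >> i) & 1, (u >> (i + 1)) & 1
        digits.append(prev + b0 - 2 * b1)
        prev = b1
    return digits


def partial_product(x: int, d: int) -> int:
    """Select 0, +/-x or +/-2x using only a shift and an optional negation."""
    if d == 0:
        return 0
    mag = x << 1 if abs(d) == 2 else x
    return -mag if d < 0 else mag


def mac_step(acc: int, x: int, y: int, in_bits: int = 16,
             acc_bits: int = 40, saturate: bool = True) -> int:
    """acc + x*y with Booth partial products and a fixed-width accumulator."""
    digits = booth_radix4_digits(y, in_bits)
    product = sum(partial_product(x, d) << (2 * i) for i, d in enumerate(digits))
    total = acc + product
    lo, hi = -(1 << (acc_bits - 1)), (1 << (acc_bits - 1)) - 1
    if saturate:
        return max(lo, min(hi, total))  # clamp on overflow
    total &= (1 << acc_bits) - 1  # otherwise wrap around like two's-complement hardware
    return total - (1 << acc_bits) if total >> (acc_bits - 1) else total
```

A 40-bit accumulator gives 8 guard bits above a 32-bit product, so many products can be summed before saturation or wrap-around matters.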

11

Ye, Yi-Hao, and 葉儀皓. "A new design approach of CMOS floating point multiply/accumulate chip." Thesis, 1986. http://ndltd.ncl.edu.tw/handle/70572211987483505163.

12

Su, Yu-Yi, and 蘇育毅. "Dynamic Early Terminating of Multiply-Accumulate Operation for Convolutional Neural Networks." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/j6vujt.

Abstract:

Master's thesis, National Tsing Hua University, Department of Computer Science, ROC academic year 106.
Deep learning has attracted enormous attention from academia and industry owing to its great success in many artificial intelligence applications. As more applications are developed, the need to implement complex neural network models on energy-limited edge devices becomes more critical. This work therefore proposes a new optimization method that saves computation in convolutional neural networks (CNNs). The method exploits the fact that some convolutional operations are wasteful, because their outputs are discarded by the subsequent activation or pooling layers. A convolutional filter performs a series of multiply-accumulate (MAC) operations; we propose placing a series of checkpoints within these MAC operations to decide, from the intermediate result, whether the filter can terminate early. A fine-tuning process then recovers the accuracy lost because of the checkpoints. Experimental results show that the proposed method saves approximately 50% of MAC operations with only 1% accuracy loss on two classic CNN models, and that it is competitive with previous methods.
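A minimal Python sketch of the checkpoint idea, assuming the filter output feeds a ReLU. The decision rule used here (stop when the partial sum is below a fixed threshold at a checkpoint) and the function name are hypothetical stand-ins, since the abstract does not specify the actual rule. Because later terms could still make the sum positive, this rule can change the output, which is the kind of accuracy loss the fine-tuning step is meant to recover.

```python
from typing import Iterable, Sequence, Tuple


def mac_with_checkpoints(weights: Sequence[float], inputs: Sequence[float],
                         checkpoints: Iterable[int],
                         threshold: float = 0.0) -> Tuple[float, int]:
    """Dot product followed by ReLU, with early termination at checkpoints.

    `checkpoints` lists MAC counts after which the partial sum is inspected.
    Returns (output, number_of_macs_performed).
    """
    acc = 0.0
    macs = 0
    stops = set(checkpoints)
    for w, x in zip(weights, inputs):
        acc += w * x
        macs += 1
        if macs in stops and acc < threshold:
            return 0.0, macs  # bet that ReLU would zero this output anyway
    return max(0.0, acc), macs  # ReLU on the completed sum
```

The fraction of MACs saved depends on where the checkpoints sit and on how often partial sums are strongly negative early in the accumulation.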

13

Wu, Ji-Li, and 吳基立. "A study of high speed CMOS floating point multiply/accumulate chip based on redundant binary representation." Thesis, 1989. http://ndltd.ncl.edu.tw/handle/14874956461527063537.

14

Li, June-Yi, and 李峻毅. "A Non-volatile Computing-In-Memory ReRAM Macro with Multiply-and-Accumulate for Binary DNN AI Edge Processors." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/gnarhb.

15

Lin, Wei-En, and 林暐恩. "A Non-volatile ReRAM Based Macro with Computing-In-Memory Multiply-and-Accumulate for Binary DNN AI Edge Processors." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/473g54.
