Dissertations / Theses: 'Application specific processors'

1

Mutigwe, Charles. "Automatic synthesis of application-specific processors." Thesis, Bloemfontein : Central University of Technology, Free State, 2012. http://hdl.handle.net/11462/163.

Full text

Abstract:

Thesis (D. Tech. (Engineering: Electrical)) -- Central University of technology, Free State, 2012
This thesis describes a method for the automatic generation of appli- cation speci_c processors. The thesis was organized into three sepa- rate but interrelated studies, which together provide: a justi_cation for the method used, a theory that supports the method, and a soft- ware application that realizes the method. The _rst study looked at how modern day microprocessors utilize their hardware resources and it proposed a metric, called core density, for measuring the utilization rate. The core density is a function of the microprocessor's instruction set and the application scheduled to run on that microprocessor. This study concluded that modern day microprocessors use their resources very ine_ciently and proposed the use of subset processors to exe- cute the same applications more e_ciently. The second study sought to provide a theoretical framework for the use of subset processors by developing a generic formal model of computer architecture. To demonstrate the model's versatility, it was used to describe a number of computer architecture components and entire computing systems. The third study describes the development of a set of software tools that enable the automatic generation of application speci_c proces- sors. The FiT toolkit automatically generates a unique Hardware Description Language (HDL) description of a processor based on an application binary _le and a parameterizable template of a generic mi- croprocessor. Area-optimized and performance-optimized custom soft processors were generated using the FiT toolkit and the utilization of the hardware resources by the custom soft processors was character- ized. The FiT toolkit was combined with an ANSI C compiler and a third-party tool for programming _eld-programmable gate arrays (FPGAs) to create an unconstrained C-to-silicon compiler.

APA, Harvard, Vancouver, ISO, and other styles

2

Lau, C. H. "Computational structures for application specific VLSI processors." Thesis, University of Edinburgh, 1989. http://hdl.handle.net/1842/12396.

Full text

APA, Harvard, Vancouver, ISO, and other styles

3

Nyländen, T. (Teemu). "Application specific programmable processors for reconfigurable self-powered devices." Doctoral thesis, Oulun yliopisto, 2018. http://urn.fi/urn:isbn:9789526218755.

Full text

Abstract:

Abstract The current Internet of Things solutions for simple measurement and monitoring tasks are evolving into ubiquitous sensor networks that are constantly observing both our well being and the conditions of our living environment. The oncoming omnipresent wireless infrastructure is expected to feature artificial intelligence capabilities that can interpret human actions, gestures and even needs. All of this will require processing power on a par with and energy efficiency far beyond that of the current mobile devices. The current Internet of Things devices rely mostly on commercial low power off-the-shelf micro-controllers. Optimized solely for low power, while paying little attention to computing performance, the present solutions are far from achieving the energy efficiency, let alone, the compute capability requirements of the future Internet of Things solutions. Since this domain is application specific by nature, the use of general purpose processors for signal processing tasks is counterintuitive. Instead, dedicated accelerator based solutions are more likely to be able to meet these strict demands. This thesis proposes one potential solution for achieving the necessary low energy, as well as the flexibility and performance requirements of the Internet of Things domain in a cost effective manner using reconfigurable heterogeneous processing solutions. A novel graphics processing unit-style accelerator for the Internet of Things application domain is presented. Since the accelerator can be reconfigured, it can be used for most applications of the Internet of Things domain, as well as other application domains. The solution is assessed using two computer vision applications, and is demonstrated to achieve an excellent combination of performance and energy efficiency. The accelerator is designed using an efficient and rapid co-design flow of software and hardware, featuring ease of development characteristics close to commercial off-the-shelf solutions, which also enables cost-efficient design flow
Tiivistelmä Esineiden internet tulee muuttamaan tulevaisuudessa elinympäristömme täysin. Se tulee mahdollistamaan interaktiiviset ympäristöt nykyisten passiivisten ympäristöjen sijaan. Lisäksi elinympäristömme tulee reagoimaan tekoihimme ja puheeseemme sekä myös tunteisiimme. Tämä kaikkialla läsnä olevan langaton infrastruktuuri tulee vaatimaan ennennäkemätöntä laskentatehokkuutta yhdistettynä äärimmäiseen energiatehokkuuteen. Nykyiset esineiden internet ratkaisut nojaavat lähes täysin kaupallisiin "suoraan hyllyltä" saataviin yleiskäyttöisiin mikrokontrollereihin. Ne ovat kuitenkin optimoituja pelkästään matalan tehonkulutuksen näkökulmasta, eivätkä niinkään energiatehokkuuden, saati tulevaisuuden esineiden internetin vaatiman laskentatehon suhteen. Kuitenkin esineiden internet on lähtökohtaisesti sovelluskohtaista laskentaa vaativa, joten yleiskäyttöisten prosessoreiden käyttö signaalinkäsittelytehtäviin on epäloogista. Sen sijaan sovelluskohtaisten kiihdyttimien käyttö laskentaan, todennäköisesti mahdollistaisi tavoitellun vaatimustason saavuttamisen. Tämä väitöskirja esittelee yhden mahdollisen ratkaisun matalan energian kulutuksen, korkean suorituskyvyn ja joustavuuden yhdenaikaiseen saavuttamiseen kustannustehokkaalla tavalla, käyttäen uudelleenkonfiguroitavia heterogeenisiä prosessoriratkaisuja. Työssä esitellään uusi grafiikkaprosessori-tyylinen uudelleen konfiguroitava kiihdytin esineiden internet sovellusalueelle, jota pystytään hyödyntämään useimpien laskentatehoa vaativien sovellusten kanssa. Ehdotetun kiihdyttimen ominaisuuksia arvioidaan kahta konenäkösovellusta esimerkkinä käyttäen ja osoitetaan sen saavuttavan loistavan yhdistelmän energia tehokkuutta ja suorituskykyä. Kiihdytin suunnitellaan käyttäen tehokasta ja nopeaa ohjelmiston ja laitteiston yhteissuunnitteluketjua, jolla voidaan saavuttaa lähestulkoon kaupallisten "suoraan hyllyltä" saatavien prosessoreiden kehitystyön helppous, joka puolestaan mahdollistaa kustannustehokkaan kehitys- ja suunnittelutyön

APA, Harvard, Vancouver, ISO, and other styles

4

Glökler, Tilman Meyr Heinrich. "Design of energy-efficient application-specific instruction set processors /." Boston, Mass. [u.a.] : Kluwer Acad. Publ, 2004. http://www.loc.gov/catdir/enhancements/fy0820/2004041376-d.html.

Full text

APA, Harvard, Vancouver, ISO, and other styles

5

Hautala, I. (Ilkka). "From dataflow models to energy efficient application specific processors." Doctoral thesis, Oulun yliopisto, 2019. http://urn.fi/urn:isbn:9789526223681.

Full text

Abstract:

Abstract The development of wireless networks has provided the necessary conditions for several new applications. The emergence of the virtual and augmented reality and the Internet of things and during the era of social media and streaming services, various demands related to functionality and performance have been set for mobile and wearable devices. Meeting these demands is complicated due to minimal energy budgets, which are characteristic of embedded devices. Lately, the energy efficiency of devices has been addressed by increasing parallelism and the use of application-specific hardware resources. This has been hindered by hardware development as well as software development because the conventional development methods are based on the use of low-level abstractions and sequential programming paradigms. On the other hand, deployment of high-level design methods is slowed down because of final solutions that are too much compromised when energy efficiency and performance are considered. This doctoral thesis introduces a model-driven framework for the development of signal processing systems that facilitates hardware and software co-design. The design flow exploits an easily customizable, re-programmable and energy-efficient processor template. The proposed design flow enables tailoring of multiple heterogeneous processing elements and the connections between them to the demands of an application. Application software is described by using high-level dataflow models, which enable the automatic synthesis of parallel applications for different multicore hardware platforms and speed up design space exploration. Suitability of the proposed design flow is demonstrated by using three different applications from different signal processing domains. The experiments showed that raising the level of abstraction has only a minor impact on performance. Video processing algorithms are selected to be the main application area in this thesis. The thesis proposes tailored and reprogrammable energy-efficient processing elements for video coding algorithms. The solutions are based on the use of multiple processing elements by exploiting the pipeline parallelism of the application, which is characteristic of many signal processing algorithms. Performance, power and area metrics for the designed solutions have been obtained using post-layout simulation models. In terms of energy efficiency, the proposed programmable processors form a new compromise solution between fixed hardware accelerators and conventional embedded processors for video coding
Tiivistelmä Langattomien verkkojen kehittyminen on luonut edellytykset useille uusille sovelluksille. Muiden muassa sosiaalisen media, suoratoistopalvelut, virtuaalitodellisuus ja esineiden internet asettavat kannettaville ja puettaville laitteille moninaisia toimintoihin, suorituskykyyn, energiankulutukseen ja fyysiseen muotoon liittyviä vaatimuksia. Yksi isoimmista haasteista on sulautettujen laitteiden energiankulutus. Laitteiden energiatehokkuutta on pyritty parantamaan rinnakkaislaskentaa ja räätälöityjä laskentaresursseja hyödyntämällä. Tämä puolestaan on vaikeuttanut niin laite- kuin sovelluskehitystä, koska laajassa käytössä olevat kehitystyökalut perustuvat matalan tason abstraktioihin ja hyödyntävät alun perin yksi ydinprosessoreille suunniteltuja ohjelmointikieliä. Korkean tason ja automatisoitujen kehitysmenetelmien käyttöönottoa on hidastanut aikaansaatujen järjestelmien puutteellinen suorituskyky ja laiteresurssien tehoton hyödyntäminen. Väitöskirja esittelee datavuopohjaiseen suunnitteluun perustuvan työkaluketjun, joka on tarkoitettu energiatehokkaiden signaalikäsittelyjärjestelmien toteuttamiseen. Työssä esiteltävä suunnitteluvuo pohjautuu laitteistoratkaisuissa räätälöitävään ja ohjelmoitavaan siirtoliipaistavaan prosessoritemplaattiin. Ehdotettu suunnitteluvuo mahdollistaa useiden heterogeenisten prosessoriytimien ja niiden välisten kytkentöjen räätälöimisen sovelluksien tarpeiden vaatimalla tavalla. Suunnitteluvuossa ohjelmistot kuvataan korkean tason datavuomallien avulla. Tämä mahdollistaa erityisesti rinnakkaista laskentaa sisältävän ohjelmiston automaattisen sovittamisen erilaisiin moniprosessorijärjestelmiin ja nopeuttaa erilaisten järjestelmätason ratkaisujen kartoittamista. Suunnitteluvuon käyttökelpoisuus osoitetaan käyttäen esimerkkinä kolmea eri signaalinkäsittelysovellusta. Tulokset osoittavat, että suunnittelumenetelmien abstraktiotasoa on mahdollista nostaa ilman merkittävää suorituskyvyn heikkenemistä. Väitöskirjan keskeinen sovellusalue on videonkoodaus. Työ esittelee videonkoodaukseen suunniteltuja energiatehokkaita ja uudelleenohjelmoitavia prosessoriytimiä. Ratkaisut perustuvat usean prosessoriytimen käyttämiseen hyödyntäen erityisesti videonkäsittelyalgoritmeille ominaista liukuhihnarinnakkaisuutta. Prosessorien virrankulutus, suorituskyky ja pinta-ala on analysoitu käyttämällä simulointimalleja, jotka huomioivat logiikkasolujen sijoittelun ja johdotuksen. Ehdotetut sovelluskohtaiset prosessoriratkaisut tarjoavat uuden energiatehokkaan kompromissiratkaisun tavanomaisten ohjelmoitavien prosessoreiden ja kiinteästi johdotettujen video-kiihdyttimien välille

APA, Harvard, Vancouver, ISO, and other styles

6

Sohl, Joar. "Efficient Compilation for Application Specific Instruction set DSP Processors with Multi-bank Memories." Doctoral thesis, Linköpings universitet, Datorteknik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-113702.

Full text

Abstract:

Modern signal processing systems require more and more processing capacity as times goes on. Previously, large increases in speed and power efficiency have come from process technology improvements. However, lately the gain from process improvements have been greatly reduced. Currently, the way forward for high-performance systems is to use specialized hardware and/or parallel designs. Application Specific Integrated Circuits (ASICs) have long been used to accelerate the processing of tasks that are too computationally heavy for more general processors. The problem with ASICs is that they are costly to develop and verify, and the product life time can be limited with newer standards. Since they are very specific the applicable domain is very narrow. More general processors are more flexible and can easily adapt to perform the functions of ASIC based designs. However, the generality comes with a performance cost that renders general designs unusable for some tasks. The question then becomes, how general can a processor be while still being power efficient and fast enough for some particular domain? Application Specific Instruction set Processors (ASIPs) are processors that target a specific application domain, and can offer enough performance with power efficiency and silicon cost that is comparable to ASICs. The flexibility allows for the same hardware design to be used over several system designs, and also for multiple functions in the same system, if some functions are not used simultaneously. One problem with ASIPs is that they are more difficult to program than a general purpose processor, given that we want efficient software. Utilizing all of the features that give an ASIP its performance advantage can be difficult at times, and new tools and methods for programming them are needed. This thesis will present ePUMA (embedded Parallel DSP platform with Unique Memory Access), an ASIP architecture that targets algorithms with predictable data access. These kinds of algorithms are very common in e.g. baseband processing or multimedia applications. The primary focus will be on the specific features of ePUMA that are utilized to achieve high performance, and how it is possible to automatically utilize them using tools. The most significant features include data permutation for conflict-free data access, and utilization of address generation features for overhead free code execution. This sometimes requires specific information; for example the exact sequences of addresses in memory that are accessed, or that some operations may be performed in parallel. This is not always available when writing code using the traditional way with traditional languages, e.g. C, as extracting this information is still a very active research topic. In the near future at least, the way that software is written needs to change to exploit all hardware features, but in many cases in a positive way. Often the problem with current methods is that code is overly specific, and that a more general abstractions are actually easier to generate code from.

APA, Harvard, Vancouver, ISO, and other styles

7

Franz, Jonathan D. Duren Russell Walker. "An evaluation of CoWare Inc.'s Processor Designer tool suite for the design of embedded processors." Waco, Tex. : Baylor University, 2008. http://hdl.handle.net/2104/5254.

Full text

APA, Harvard, Vancouver, ISO, and other styles

8

Lim, Wei Ming. "Design of application specific instruction set processors for the domain of GF(2'm)." Thesis, University of Sheffield, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.412439.

Full text

APA, Harvard, Vancouver, ISO, and other styles

9

Radhakrishnan, Swarnalatha Computer Science &amp Engineering Faculty of Engineering UNSW. "Heterogeneous multi-pipeline application specific instruction-set processor design and implementation." Awarded by:University of New South Wales. Computer Science and Engineering, 2006. http://handle.unsw.edu.au/1959.4/29161.

Full text

Abstract:

Embedded systems are becoming ubiquitous, primarily due to the fast evolution of digital electronic devices. The design of modern embedded systems requires systems to exhibit, high performance and reliability, yet have short design time and low cost. Application Specific Instruction set processors (ASIPs) are widely used in embedded system since they are economical to use, flexible, and reusable (thus saves design time). During the last decade research work on ASIPs have been carried out in mainly for single pipelined processors. Improving performance in processors is possible by exploring the available parallelism in the program. Designing of multiple parallel execution paths for parallel execution of the processor naturally incurs additional cost. The methodology presented in this dissertation has addressed the problem of improving performance in ASIPs, at minimal additional cost. The devised methodology explores the available parallelism of an application to generate a multi-pipeline heterogeneous ASIP. The processor design is application specific. No pre-defined IPs are used in the design. The generated processor contains multiple standalone pipelined data paths, which are not necessarily identical, and are connected by the necessary bypass paths and control signals. Control unit are separate for each pipeline (though with the same clock) resulting in a simple and cost effective design. By using separate instruction and data memories (Harvard architecture) and by allowing memory access by two separate pipes, the complexity of the controller and buses are reduced. The impact of higher memory latencies is nullified by utilizing parallel pipes during memory access. Efficient bypass network selection and encoding techniques provide a better implementation. The initial design approach with only two pipelines without bypass paths show speed improvements of up to 36% and switching activity reductions of up to 11%. The additional area costs around 16%. An improved design with different number of pipelines (more than two) based on applications show on average of 77% performance improvement with overheads of: 49% on area; 51% on leakage power; 17% on switching activity; and 69% on code size. The design was further trimmed, with bypass path selection and encoding techniques, which show a saving of up to 32% of area and 34% of leakage power with 6% performance improvement and 69% of code size reduction compared to the design approach without these techniques in the multi pipeline design.

APA, Harvard, Vancouver, ISO, and other styles

10

Tell, Eric. "Design of Programmable Baseband Processors." Doctoral thesis, Linköping : Univ, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-4377.

Full text

APA, Harvard, Vancouver, ISO, and other styles

11

Ma, Ning. "Ultra-low-power Design and Implementation of Application-specific Instruction-set Processors for Ubiquitous Sensing and Computing." Doctoral thesis, KTH, Industriell och Medicinsk Elektronik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-174896.

Full text

Abstract:

The feature size of transistors keeps shrinking with the development of technology, which enables ubiquitous sensing and computing. However, with the break down of Dennard scaling caused by the difficulties for further lowering supply voltage, the power density increases significantly. The consequence is that, for a given power budget, the energy efficiency must be improved for hardware resources to maximize the performance. Application-specific integrated circuits (ASICs) obtain high energy efficiency at the cost of low flexibility for various applications, while general-purpose processors (GPPs) gain generality at the expense of efficiency. To provide both high energy efficiency and flexibility, this dissertation explores the ultra-low-power design of application-specific instruction-set processors (ASIP) for ubiquitous sensing and computing. Two application scenarios, i.e. high-throughput compute-intensive processing for multimedia and low-throughput low-cost processing for Internet of Things (IoT) are implemented in the proposed ASIPs. Multimedia stream processing for human-computer interaction is always featured with high data throughput. To design processors for networked multimedia streams, customizing application-specific accelerators controlled by the embedded processor is exploited. By abstracting the common features from multiple coding algorithms, video decoding accelerators are implemented for networked multi-standard multimedia stream processing. Fabricated in 0.13 $\mu$m CMOS technology, the processor running at 216 MHz is capable of decoding real-time high-definition video streams with power consumption of 414 mW. When even higher throughput is required, such as in multi-view video coding applications, multiple customized processors will be connected with an on-chip network. Design problems are further studied for selecting the capability of single processors, the number of processors, the capacity of communication network, as well as the task assignment schemes. In the IoT scenario, low processing throughput but high energy efficiency and adaptability are demanded for a wide spectrum of devices. In this case, a tile processor including a multi-mode router and dual cores is proposed and implemented. The multi-mode router supports both circuit and wormhole switching to facilitate inter-silicon extension for providing on-demand performance. The control-centric dual-core architecture uses control words to directly manipulate all hardware resources. Such a mechanism avoids introducing complex control logics, and the hardware utilization is increased. Programmable control words enable reconfigurability of the processor for supporting general-purpose ISAs, application-specific instructions and dedicated implementations. The idea of reducing global data transfer also increases the energy efficiency. Finally, a single tile processor together with network of bare dies and network of packaged chips has been demonstrated as the result. The processor implemented in 65 nm low leakage CMOS technology and achieves the energy efficiency of 101.4 GOPS/W for each core.

QC 20151009

APA, Harvard, Vancouver, ISO, and other styles

12

Wahlen, Oliver [Verfasser]. "C Compiler Aided Design of Application-Specific Instruction-Set Processors Using the Machine Description Language LISA / Oliver Wahlen." Aachen : Shaker, 2004. http://d-nb.info/1181603536/34.

Full text

APA, Harvard, Vancouver, ISO, and other styles

13

Wahlen, Oliver [Verfasser]. "C compiler aided design of application specific instruction set processors using the machine description language LISA / vorgelegt von Oliver Wahlen." Aachen : Shaker, 2004. http://nbn-resolving.de/urn:nbn:de:hbz:82-opus-9116.

Full text

APA, Harvard, Vancouver, ISO, and other styles

14

Wilczák, Milan. "Ladicí nástroj generických simulátorů mikroprocesorů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237257.

Full text

Abstract:

Application specific instruction set processors become part of every day life although it's not always visible at first sight. During their development it's needed to somehow describe their architecture, instruction set and behavior. To make their developement worth, it's necessary to be able to create applications for these processors and during application development errors are always made. Debuggers serve to discover and help fixing them. This paper summarises some basic information to debugger development and describes implementation for processors created using the Lissom project.

APA, Harvard, Vancouver, ISO, and other styles

15

FRANCHINI, Silvia Giuseppina. "Graphic Coprocessors with Native Clifford Algebra Support." Doctoral thesis, Università degli Studi di Palermo, 2009. http://hdl.handle.net/10447/178952.

Full text

APA, Harvard, Vancouver, ISO, and other styles

16

Bishell, Aaron. "Designing application-specific processors for image processing : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science, Massey University, Palmerston North, New Zealand." Massey University, 2008. http://hdl.handle.net/10179/1024.

Full text

Abstract:

Implementing a real-time image-processing algorithm on a serial processor is difficult to achieve because such a processor cannot cope with the volume of data in the low-level operations. However, a parallel implementation, required to meet timing constraints for the low-level operations, results in low resource utilisation when implementing the high-level operations. These factors suggested a combination of parallel hardware, for the low-level operations, and a serial processor, for the high-level operations, for implementing a high-level image-processing algorithm. Several types of serial processors were available. A general-purpose processor requires an extensive instruction set to be able to execute any arbitrary algorithm resulting in a relatively complex instruction decoder and possibly extra FUs. An application-specific processor, which was considered in this research, implements enough FUs to execute a given algorithm and implements a simpler, and more efficient, instruction decoder. In addition, an algorithms behaviour on a processor could be represented in either hardware (i.e. hardwired logic), which limits the ability to modify the algorithm behaviour of a processor, or “software” (i.e. programmable logic), which enables external sources to specify the algorithm behaviour. This research investigated hardware- and software- controlled application-specific serial processors for the implementation of high-level image-processing algorithms and compared these against parallel hardware and general-purpose serial processors. It was found that application-specific processors are easily able to meet the timing constraints imposed by real-time high-level image processing. In addition, the software-controlled processors had additional flexibility, a performance penalty of 9.9% and 36.9% and inconclusive footprint savings (and costs) when compared to hardwarecontrolled processors.

APA, Harvard, Vancouver, ISO, and other styles

17

Grant, David. "A Lightweight Processor Core for Application Specific Acceleration." Thesis, University of Waterloo, 2004. http://hdl.handle.net/10012/800.

Full text

Abstract:

Advances in configurable logic technology have permitted the development of low-cost, high-speed configurable devices, allowing one or more soft processor cores to be introduced into a configurable computing system. Soft processor cores offer logic-area savings and reduced configuration times when compared to the hardware-only implementations typically used for application specific acceleration. Programs for a soft processor core are small and simple compared to the design of a hardware core, but can leverage custom hardware within the processor core to provide greater acceleration for specific applications. This thesis presents several configurable system models, and implements one such model on a Nios Embedded Processor Development Board. A software programmable and hardware configurable lightweight processor core known as the FAST CPU is introduced. The configurable system implementation attaches several FAST CPUs to a standard Nios processor to create a system for experimentation with application specific acceleration. This system incorporating the FAST CPUs was tested for bus utilization behaviour, computing performance, and execution times for a minheap application. Experimental results are compared to the performance of a software-only solution, and also with previous research results. Experimental results verify that the theory and models used to predict bus utilization are correct. Performance testing shows that the FAST CPU is approximately 25% slower than a general purpose processor, which is expected. The FAST CPU, however, is 31% smaller in terms of logic area than the general purpose processor, and is 8% smaller than the design of a hardware-only implementation of a minheap for application specific acceleration. The results verify that it is possible to move functionality from a general purpose processor to a lightweight processor, and further, to realize an increase in performance when a task is parallelized across multiple FAST CPUs. The experimentation uses a procedure by which a set of equations can be derived for predicting bus utilization and deriving a cost-benefit curve for a coprocessing entity. They are applied to a specific system in this research, but the methods are generalizable to any coprocessing entity.

APA, Harvard, Vancouver, ISO, and other styles

18

Mikó, Albert. "Akcelerace aplikací pomocí specializovaných instrukcí." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255444.

Full text

Abstract:

The design of specialized instructions for application specific processors is a challenging task. This thesis describes the issues of effective specification and use of specialized instructions for optimization of applications. It focuses on improvements of the outputs and usability of the semiatomatic method of selection of specialized instructions to allow the optimization of complicated applications. This method combines manual selection of instructions by marking a section of source code in the application and automatic generation of the instruction description in the modelling language.

APA, Harvard, Vancouver, ISO, and other styles

19

Dolíhal, Luděk. "Testování generovaných překladačů jazyka c pro procesory ve vestavěných systémech." Doctoral thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2017. http://www.nusl.cz/ntk/nusl-412583.

Full text

Abstract:

Vestavěné systémy se staly nepostradatelnými pro náš každodenní život. Jsou to obvykle úzce zaměřená, vysoce optimalizovaná, jednoúčelová zařízení. Jádro vestavěných zařízení obvykle tvoří jeden nebo více aplikačně specifických instrukčních procesorů. Tato disertační práce se zaměřuje na problematiku testování nástrojú pro návrh aplikačně specifických procesorů a následně i samotných aplikačne specifických procesorů. Snahou bylo vytvořit systém, ve kterém bude možné otestovat jednotlivé nástroje, jako například překladač, assembler, disassembler, debugger. Nicméně vyvstává také potřeba provádět složitější testy, například integrační, které zaručí, že mezi jednotlivými nástroji nevzniká nekompatibilita. Autor vytvořil s podporou přůběžně integračního serveru prostředí, které napomáhá odhalování a odstraňování chyb při návrhu aplikačně specifických procesorů a které je navíc do značné míry automatizované.

APA, Harvard, Vancouver, ISO, and other styles

20

Kreutz, Marcio Eduardo. "Geração de processador para aplicacao especifica." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 1997. http://hdl.handle.net/10183/17752.

Full text

Abstract:

Este trabalho propõe a geração de uma arquitetura dedicada a aplicações específicas, baseadas no microcontrolador MCS8051. Por ser utilizado na solução de problemas em indústrias locais, este processador foi escolhido para servir como base em um sistema dedicado. O 8051 dedicado gerado deverá permitir a integração completa do sistema, proporcionando um aumento do valor agregado e, conseqüentemente, a diminuição do custo. Busca-se com a otimização da arquitetura obter um conjunto de instruções reduzido, construído com as instruções mais utilizadas em cada aplicação. O objetivo principal da otimização do conjunto de instruções está relacionado ao fato de que os circuitos decodificadores e geradores de microcódigo da parte de controle ocupam uma área significativa do processador. Uma otimização no sentido de reduzir-se o conjunto de instruções, portanto, resulta numa economia de área, o que vem de encontro com a idéia da integração completa do sistema com o processador. Um processador dedicado a aplicações específicas (ASIP) irá possuir um custo maior do que a sua versão original, devido as otimizações realizadas. Para compensar este custo, uma alternativa a seguir é a integração completa do sistema. Um Sistema Integrado para Aplicações Específicas (SIAE) torna-se desejável, pois aumentando o valor agregado do circuito possibilita-se a redução do custo pela eliminação de conexões da placa, do encapsulamento de outros circuitos, entre outros motivos. Todavia, para que um SIAE possa ser construído com um custo aceitável, é necessário que seja construído em uma área que não exceda muito a área original do processador. Tenta-se fazer isto neste trabalho, através da implementação de aplicações com poucas instruções diferentes. Por ser uma arquitetura comercial, o 8051 possui um grande parque de software desenvolvido e resolvendo problemas. Isto pode ser considerado uma vantagem pois, software básicos como por exemplo, compiladores, já estão desenvolvidos. Outra vantagem é o grande número de engenheiros treinados na sua utilização. Desse modo, torna-se necessária a criação de uma compatibilidade de software, para preservar o que já está desenvolvido. Uma vez que a programação em nível de linguagem montadora tende a constituir-se em uma tarefa cansativa e sujeita a erros, é desejável que se tenha uma compatibilidade em alto nível, ou seja, através de um compilador. Para criar a compatibilidade de SW necessária é realizada a otimização de um compilador C desenvolvido para o 8051. A escolha pela linguagem C deve-se ao fato de sua grande utilização. O compilador C otimizado procura utilizar um conjunto de instruções reduzido para obter a economia de área. Quando uma instrução necessita ser utilizada e não está presente no conjunto de instruções desejado, o compilador tenta substituí-la por outra(s). Um conjunto de instruções é utilizado para cada aplicação, sendo constituído pelas instruções mais utilizadas por esta. Para determinar as instruções mais utilizadas de cada aplicação é realizada uma análise estática sobre um código em linguagem montadora previamente compilado. As instruções implementadas serão sempre parte do conjunto de instruções original do 8051, de modo que novas instruções não serão criadas.Um programa em linguagem montadora gerado com um conjunto de instruções reduzido (RISC) normalmente terá um número maior de instruções do que o seu 10 equivalente com o conjunto de instruções completo (CISC). Isto ocorre porque possivelmente algumas substituições de uma instrução por outras, terão que ser realizadas. Como as instruções que serão utilizadas nas substituições pertencem ao conjunto de instruções original, o programa gerado com o compilador otimizado poderá executar em um tempo maior do que se fosse compilado com o código CISC. Para compensar esse atraso foi implementado um pipeline de instruções para o 8051. Este trabalho apresenta resultados da Síntese Lógica em Standard Cell e FPGA da arquitetura otimizada. Além disso, resultados de programas em linguagem montadora gerados com o compilador otimizado, são também apresentados.
This work discusses a processor for specific applications architecture, based on the MCS8051 microcontroller. This processor is used as a solution for many local industry applications, being the base of dedicated systems. The dedicated 8051 generated should allow complete integration of the system, and with the added value to the chip, reduced costs. The architecture optimization will produce as result a reduced instruction set, made by the often used instructions for each application. The main instruction set optimization goal refers to the instrucions decoders and microcode generators in the control part, because a large area in the processor is needed to implement them. Thus, a reduced instruction set will allow area savings, making possible the complete system integration in a chip. An ASIP architecture will have a higher cost than the original one. An alternative to solve this problem is add value to the chip, creating an Application Specific Integrated System (ASIS). An ASIS can be made with a acceptable cost, if it’s possible to integrate other circuits to the chip without area increase. This can be done in the area saved by using fewer implemented instructions. Because the 8051 is a commercial architecture, there is a large amount of software developed for it. This can be considered an advantage because basic softwares like compilers are available, being not necessary to create them. Another advantage refers to the large number of engineers trained to use the 8051. To preserve the already developed applications it’s necessary to mantain software compatibility. Assembler level programming is very boring an error prone task, being desirable to have software compatibility at higher levels through the use of high level languages. To create the necessary SW compatibility, a C compiler developed for 8051 was optimized. The chose for C language refers to its large utilization. The optimized C compiler tries to use a reduced instruction set, formed with the most important instructions for each application, in order ro save area. When an instruction needs to be used in an application, and it’s not present in the instruction set, the compiler tries to replace it with other instructions. The compiler will not use instructions not present in the original 8051 instruction set. So, new instrucions will be not created. To create an instruction set formed with the most important instructions for each application, a static analysis is made on a precompiled assembler source. An assembler source generated with a reduced instruction set (RISC) will probably have more instructions than the same assembler generated with a full instruction set (CISC). This can be explained because of the replacements instruction. If one instruction is replaced by other two, and these are from the original instruction set, probably the time needed to execute them would be higher. In order to deal with this problem, an instruction pipeline was implemented to the 8051. This work presents Standard Cells and FPGA results of Logic Synthesis of the optimized architecture. Also, assembly programs generated by the optimized compiler are presented.

APA, Harvard, Vancouver, ISO, and other styles

21

McMullin, John Derek. "Accelerating the parsing process with an application specific VLSI RISC processor." Thesis, University of Central Lancashire, 1997. http://clok.uclan.ac.uk/8694/.

Full text

Abstract:

This thesis investigates the topic of the design, implementation and potential use of specialised hardware used to accelerate the recognition and translation of computer programs expressed in a range of computer languages. This investigation focuses specifically on the twin processes of parsing and lexical analysis. The research described was carried out in two areas namely, the feasibility of designing a specialised instruction set for a RISC like processor able to accelerate the parsing and lexical analysis process, and the physical implementation of a RISC processor in CMOS VLSI technology able to execute the designed instruction set. The feasibility of mapping the process of language recognition onto the instruction set of a RISC processor is investigated. This involves an assessment of the suitability of the LL(1) and LALR(1) algorithms, both of which are used for parsing, and other associated algorithms, used for lexical analysis, as a basis for an appropriate instruction set architecture. The feasibility of an instruction set design which uses fixed size instructions with variable size data fields to ensure scaleable operation is also investigated. The appropriate software mechanisms used to validate the instruction set architecture are outlined. The practical implementation using CMOS technology of a RISC processor able to execute the new instruction set is investigated. In particular the feasibility of using bit-slice technology to implement the processor having fixed size instructions with variable size data-paths and address ranges is investigated. The combination of novel instruction set with variable data-widths and the fabricated devices able to activate semantic actions directly from hardware together form an original contribution to the field of parsing and lexical analysis.

APA, Harvard, Vancouver, ISO, and other styles

22

Petrov, Peter. "Application specific embedded processor customizations for low power and high performance /." Diss., Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 2004. http://wwwlib.umi.com/cr/ucsd/fullcit?p3137218.

Full text

APA, Harvard, Vancouver, ISO, and other styles

23

Packiaraj, Vivek. "Study, Design and Implementation of an Application Specific Instruction Set Processor for a Specific DSP Task." Thesis, Linköping University, Electronics System, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-52314.

Full text

Abstract:

There is a lot of literature already available describing well-structured approach for embeddeddesign and implementation of Application Specific Integrated Processor (ASIP) micro processorcore.

This concept features hardware structured approach for implementation of processor core fromminimal instruction set, encoding standards, hardware mapping, and micro architecture design,coding conventions, RTL,verification and burning into a FPGA. The goal is to design an ASIPprocessor core (Micro architecture design and RTL) which can perform DSP task, e.g., FIR. Thereport is a well structured approach of design and implementation of an ASIP DSP processor forDSP applications like FIR. This report contains design flow starting from Instruction set design,micro architecture design and RTL implementation of the core. Details of the power simulationsof FPGA are also listed and analyzed.

APA, Harvard, Vancouver, ISO, and other styles

24

Martin, Rovira Julia, and Fructoso Melero Francisco Manuel. "Micro-Network Processor : A Processor Architecture for Implementing NoC Routers." Thesis, Jönköping University, JTH, Computer and Electrical Engineering, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-941.

Full text

Abstract:

Routers are probably the most important component of a NoC, as the performance of the whole network is driven by the routers’ performance. Cost for the whole network in terms of area will also be minimised if the router design is kept small. A new application specific processor architecture for implementing NoC routers is proposed in this master thesis, which will be called µNP (Micro-Network Processor). The aim is to offer a solution in which there is a trade-off between the high performance of routers implemented in hardware and the high level of flexibility that could be achieved by loading a software that routed packets into a GPP. Therefore, a study including the design of a hardware based router and a GPP based router has been conducted. In this project the first version of the µNP has been designed and a complete instruction set, along with some sample programs, is also proposed. The results show that, in the best case for all implementation options, µNP was 7.5 times slower than the hardware based router. It has also behaved more than 100 times faster than the GPP based router, keeping almost the same degree of flexibility for routing purposes within NoC.

APA, Harvard, Vancouver, ISO, and other styles

25

Vogt, Timo. "A reconfigurable application-specific instruction-set processor for trellis-based channel decoding /." Kaiserslautern : Techn. Univ. Kaiserslautern, 2008. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=016537958&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.

Full text

APA, Harvard, Vancouver, ISO, and other styles

26

Shee, Seng Lin Computer Science &amp Engineering Faculty of Engineering UNSW. "ADAPT : architectural and design exploration for application specific instruction-set processor technologies." Awarded by:University of New South Wales, 2007. http://handle.unsw.edu.au/1959.4/35404.

Full text

Abstract:

This thesis presents design automation methodologies for extensible processor platforms in application specific domains. The work presents first a single processor approach for customization; a methodology that can rapidly create different processor configurations by the removal of unused instructions sets from the architecture. A profile directed approach is used to identify frequently used instructions and to eliminate unused opcodes from the available instruction pool. A coprocessor approach is next explored to create an SoC (System-on-Chip) to speedup the application while reducing energy consumption. Loops in applications are identified and accelerated by tightly coupling a coprocessor to an ASIP (Application Specific Instruction-set Processor). Latency hiding is used to exploit the parallelism provided by this architecture. A case study has been performed on a JPEG encoding algorithm; comparing two different coprocessor approaches: a high-level synthesis approach and our custom coprocessor approach. The thesis concludes by introducing a heterogenous multi-processor system using ASIPs as processing entities in a pipeline configuration. The problem of mapping each algorithmic stage in the system to an ASIP configuration is formulated. We proposed an estimation technique to calculate runtimes of the configured multiprocessor system without running cycle-accurate simulations, which could take a significant amount of time. We present two heuristics to efficiently search the design space of a pipeline-based multi ASIP system and compare the results against an exhaustive approach. In our first approach, we show that, on average, processor size can be reduced by 30%, energy consumption by 24%, while performance is improved by 24%. In the coprocessor approach, compared with the use of a main processor alone, a loop performance improvement of 2.57x is achieved using the custom coprocessor approach, as against 1.58x for the high level synthesis method, and 1.33x for the customized instruction approach. Energy savings are 57%, 28% and 19%, respectively. Our multiprocessor design provides a performance improvement of at least 4.03x for JPEG and 3.31x for MP3, for a single processor design system. The minimum cost obtained using our heuristic was within 0.43% and 0.29% of the optimum values for the JPEG and MP3 benchmarks respectively.

APA, Harvard, Vancouver, ISO, and other styles

27

Stothard, David. "The development of an application specific processor for the transmission line matrix method." Thesis, Loughborough University, 2000. https://dspace.lboro.ac.uk/2134/14899.

Full text

Abstract:

This thesis details the development of an application specific processor for the transmission line matrix (TLM) method. The application of TLM to the modelling of wave propagation in two and three dimensions is introduced with the discussion focusing on the concept of computational efficiency. Methods for improving computational efficiency are reviewed, in particular the implementation of TLM on large scale parallel computers. It is shown that these methods, while increasing throughput, make inefficient use of available resources. The review of existing methods is used to define a set of goals for a new class of application specific TLM processor. The development of an application specific processor based upon the two dimensional shunt node is presented. This gives rise to an efficient, bit serial scatter processor. The implementation of this processor within a complete, application specific TLM system is discussed. The system is based around a unique mapping of the TLM connect routine to hardware. The bit serial scatter processor is modified to allow the modelling of inhomogenous and three dimensional media using the stub loaded shunt node, the symmetrical condensed node and the symmetrical super condensed node TLM schemes. It is shown that all four TLM schemes may be implemented within a single architecture without the introduction of redundant elements through the use of reconfigurable logic. The implications of interfacing this system to a host PC using the PCI bus are discussed. The processor designs are reviewed within the context of the goals set for the work. It is shown that all of the goals were successfully met. The implications and limitations of the processor are discussed. The thesis concludes with recommendations for areas worthy of further study.

APA, Harvard, Vancouver, ISO, and other styles

28

Cheung, Newton Computer Science &amp Engineering Faculty of Engineering UNSW. "Design automation methodologies for extensible processor platform." Awarded by:University of New South Wales. School of Computer Science and Engineering, 2005. http://handle.unsw.edu.au/1959.4/26118.

Full text

Abstract:

This thesis addresses two ubiquitous trends in the embedded system world - the increasing importance of design turnaround time as a design metric, and the move towards closing the design productivity gap. Adopting the right choice of design approach has been recognised as an integral part of the design flow in order to meet desired characteristics such as increasing software content, satisfying the growing complexities of an application, reusing off-the-shelf components, and exploring design metrics tradeoff, which closes the design productivity gap. The importance of design turnaround time is motivated by the intensive competition between manufacturers, especially makers of mainstream electronic consumer products, who shrinks the product life cycle and requires faster time-to-market to maximise economic benefits. This thesis presents a suite of design automation methodologies to automatically design embedded systems for an application in the state-of-the-art design approach - the extensible processor platform. These design automation methodologies systematise the extensible processor platform???s design flow, with particular emphasis on solving four challenging design problems: i) code segment identification; ii) instruction generation; iii) architectural customisation selection; and iv) processor evaluation. Our suite of design automation methodologies includes: i) a semi-automatic design system - to design an extensible processor that maximises the application performance while satisfying the area constraint. By specifying a fitting function to identify suitable code segments within an application, a two-level hierarchy selection algorithm is used to first select a predefined processor and then select the right instruction, and a performance estimator is used to estimate an application's performance; ii) a tool to match instructions - to automatically match the pre-designed instructions with computationally intensive code segments, reducing verification time and effort; iii) an instructions estimation model - to estimate the area overhead, latency, power consumption of extensible instructions, exploring larger design space; and iv) an instructions generation tool - to generate new extensible instructions that maximises the speedup while minimising power dissipation. A number of techniques such as system decomposition, combinational equivalence checking and regression analysis etc., have been heavily relied upon in the creation of the final design system. This thesis shows results at every stage to demonstrate the efficacy of our design methodologies in the creation of extensible processors. The methodologies and results presented in this thesis demonstrate that automating the design process for an extensible processor platform results in significant performance increase - on average, an increase of 4.74x (up to 15.71x) compared to the original base processor. Our system achieves significant design turnaround time savings (2.5% of the full simulation time for the entire design space) with majority Pareto points obtained (91% on average), and can lead to fewer and faster design iterations. Our instruction matching tool is 7.3x faster on average compared to the best known approaches to the problem (partial simulations). Our estimation model has a mean absolute error as small as 3.4% (6.7% max.) for area overhead, 5.9% (9.4% max.) for latency, and 4.2% (7.2% max.) for power consumption, compared to estimation through the time consuming synthesis and simulation steps using commercial tools. Finally, the instruction generation tool reduces energy consumption by a further 5.8% on average (up to 17.7%) compared to extensible instructions generated by previous approaches.

APA, Harvard, Vancouver, ISO, and other styles

29

Gerlach, Lukas [Verfasser]. "KAVUAKA: A Low-Power Application-Specific Processor Architecture for Digital Hearing Aids / Lukas Gerlach." Hannover : Gottfried Wilhelm Leibniz Universität, 2021. http://d-nb.info/1230550674/34.

Full text

APA, Harvard, Vancouver, ISO, and other styles

30

Yassin, Yahya H. "ULTRA LOW POWER APPLICATION SPECIFIC INSTRUCTION-SET PROCESSOR DESIGN : for a cardiac beat detector algorithm." Thesis, Norwegian University of Science and Technology, Department of Electronics and Telecommunications, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9914.

Full text

Abstract:

High efficiency and low power consumption are among the main topics in embedded systems today. For complex applications, off-the-shelf processor cores might not provide the desired goals in terms of power consumption. By optimizing the processor for the application, or a set of applications, one could improve the computing power by introducing special purpose hardware units. The execution cycle count of the application would in this case be reduced significantly, and the resulting processor would consume less power. In this thesis, some research is done in how to optimize a software and hardware development for ultra low power consumption. A cardiac beat detector algorithm is implemented in ANSI C, and optimized for low power consumption, by using several software power optimization techniques. The resulting application is mapped on a basic processor architecture provided by Target Compiler Technologies. This processor is optimized further for ultra low power consumption by applying application specific hardware, and by using several hardware power optimization techniques. A general processor and the optimized processor has been mapped on a chip, using a 90 nm low power TSMC process. Information about power dissipation is extracted through netlist simulation, and the results of both processors have been compared. The optimized processor consume 55% less average power, and the duty cycle of the processor, i.e., the time in which the processor executes its task with respect to the time budget available, has been reduced from 14% to 2.8%. The reduction in the total execution cycle count is 81%. The possibilities of applying power gating, or voltage and frequency scaling are discussed, and it is concluded that further reduction in power consumption is possible by applying these power optimization techniques. For a given case, the average leakage power dissipation is estimated to be reduced by 97.2%.

APA, Harvard, Vancouver, ISO, and other styles

31

Schlechter, E. J. (Emile Johan). "Manufacturing intelligence : a dissemination of intelligent manufacturing principles with specific application." Thesis, Stellenbosch : Stellenbosch University, 2002. http://hdl.handle.net/10019.1/52927.

Full text

Abstract:

Thesis (MEng)--University of Stellenbosch, 2002.
ENGLISH ABSTRACT: Artificial intelligence has provided several techniques with applications in manufacturing. Knowledge based systems, neural networks, case based reasoning, genetic algorithms and fuzzy logic have been successfully employed in manufacturing. This thesis will provide the reader with an introduction and an understanding of each of these techniques (Chapter 2 & 3). The intelligent manufacturing process can be a complex one and can be decomposed into several components: intelligent design, intelligent process planning, intelligent quality management, intelligent maintenance and diagnosis, intelligent scheduling and intelligent control. This thesis will focus on how each of the artificial intelligence techniques can be applied to each of the manufacturing process fields. Chapter 5 Chapter 6 Chapter 7 Knowledge based systems Neural networks Fuzzy logic Case based reasoning Genetic algorithms Chapter 8 Chapter 9 Chapter 10 Manufacturing intelligence can be approached from two main directions: theoretical research and practical application. Most of the concepts, methods and techniques discussed in this thesis are approached from a theoretical research point of view. This thesis is also aimed at providing the reader with a broader picture of manufacturing intelligence and how to apply the intelligent techniques, in theory. Specific attention will be given to intelligent scheduling as an application (Chapter 11). The application will demonstrate how case based reasoning can be applied in intelligent scheduling within a small manufacturing plant.
AFRIKAANSE OPSOMMING: Kunsmatige intelligensie bied 'n verskeidenheid tegnieke en toepassings in die vervaardigingsomgewing. Kennis baseerde sisteme, neurale netwerke, gevalle basseerde redenasie, generiese algoritmes en wasige logika word suksesvol in die vervaardigingsopset toegepas. Dié tesis gee die leser 'n inleiding en basiese oorsig van metodes om elk van die tegnieke te gebruik (hoofstuk 2 & 3). Die intelligente vervaardigingproses is 'n komplekse proses en kan afgebreek word in verskeie komponente: intelligente ontwerp, intelligente prosesbeplanning, intelligente gehaltebestuur, intelligente onderhoud en diagnose, intelligente kontrole en intelligente skedulering. Hierdie tesis sal fokus op hoe elk van die kunsmatige intelligente tegnieke op elk van die vervaardigingprosesvelde toegepas kan word. Hoofstuk 5 Hoofstuk 6 Hoofstuk 7 Kennis gebaseerde sisteme Wasige logika Neurale netwerke Gevalle baseerde redenasie Generiese algoritmes Hoofstuk 8 Hoofstuk 9 Hoofstuk 10 Vervaardigingsintelligensie kan vanuit twee oogpunte benader word, naamlik 'n teoretiese ondersoek en 'n praktiese aanslag. Die meeste van hierdie konsepte, metodes en tegnieke word in hierdie tesis vanuit 'n teoretiese oogpunt benader. Die tesis is daarop gerig om die leser 'n wyer perspektief te gee van intelligente vervaardiging en hoe om die intelligente tegnieke, in teorie, toe te pas. Spesifieke aandag sal gegee word aan intelligente skedulering as 'n toepassing (Hookstuk 11). Die toepassing sal demonstreer hoe gevalle baseerde redenasie toegepas kan word in intelligente skedulering.

APA, Harvard, Vancouver, ISO, and other styles

32

Lüthje, Olaf [Verfasser]. "A Methodology for Automated Analysis of Application Specific Processor Models with Respect to Test Generation / Olaf Lüthje." Aachen : Shaker, 2005. http://d-nb.info/1181615461/34.

Full text

APA, Harvard, Vancouver, ISO, and other styles

33

Žádník, Jakub. "Implementation of Fast Fourier Transformation on Transport Triggered Architecture." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2017. http://www.nusl.cz/ntk/nusl-361729.

Full text

Abstract:

V této práci je navrhnut energeticky úsporný procesor typu TTA (Transport Triggered Architecture) pro výpočet rychlé Fourierovy transformace (FFT). Návrh procesoru byl vytvořen na míru použitému algoritmu pomocí speciáoních funkčních jednotek. Algoritmus byl realizován jako posloupnost instrukcí tak, že většina výpočtu probíhá ve smyčce obrahující pouze jedionu paralelní instrukci. Tato instrukce je umístěna do instrukčního bufferu, odkud je potom volána místo instrukční paměti. Díky tomu se dá docílit nižší spotřeby, neboť volání z instrukčního bufferu je efektivnější než volání z instrukční paměti. Program byl zkompilován na časovém modelu procesoru a časová simulace potvrdila správnost návrhu. Součástí práce jsou rovněž pomocné programy v Pythonu, které slouží ke generaci referenčních výsledků a automatické simulaci a porovnání výsledků simulace s referencí.

APA, Harvard, Vancouver, ISO, and other styles

34

Bytyn, Andreas [Verfasser], Gerd [Akademischer Betreuer] Ascheid, and Rainer [Akademischer Betreuer] Leupers. "Efficiency and scalability exploration of an application-specific instruction-set processor for deep convolutional neural networks / Andreas Bytyn ; Gerd Ascheid, Rainer Leupers." Aachen : Universitätsbibliothek der RWTH Aachen, 2020. http://d-nb.info/1230325506/34.

Full text

APA, Harvard, Vancouver, ISO, and other styles

35

Šulek, Jakub. "Verifikace ASIP založena na formálních tvrzeních." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-264941.

Full text

Abstract:

This thesis introduces the concept of assertion-based verifi cation of application-specifi c instruction set processors (ASIPs). The proposed design is implemented in SystemVerilog Assertions language as a part of veri fication environment created using Codasip Framework. The implemented concept is simulated in QuestaSim tool using model of Codix RISC processor. Main outcome of this thesis is the verifi cation concept usable not only on other processors, but as a part of system that automates the processor design as well.

APA, Harvard, Vancouver, ISO, and other styles

36

Noury, Ludovic. "Contribution à la conception de processeurs d'analyse de signaux à large bande dans le domaine temps-fréquence : l'architecture F-TFR." Paris 6, 2008. http://www.theses.fr/2008PA066206.

Full text

APA, Harvard, Vancouver, ISO, and other styles

37

Khan, Muhammad Jazib. "Programmable Address Generation Unit for Deep Neural Network Accelerators." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-271884.

Full text

Abstract:

The Convolutional Neural Networks are getting more and more popular due to their applications in revolutionary technologies like Autonomous Driving, Biomedical Imaging, and Natural Language Processing. With this increase in adoption, the complexity of underlying algorithms is also increasing. This trend entails implications for the computation platforms as well, i.e. GPUs, FPGA, or ASIC based accelerators, especially for the Address Generation Unit (AGU), which is responsible for the memory access. Existing accelerators typically have Parametrizable Datapath AGUs, which have minimal adaptability towards evolution in algorithms. Hence new hardware is required for new algorithms, which is a very inefficient approach in terms of time, resources, and reusability. In this research, six algorithms with different implications for hardware are evaluated for address generation, and a fully Programmable AGU (PAGU) is presented, which can adapt to these algorithms. These algorithms are Standard, Strided, Dilated, Upsampled and Padded convolution, and MaxPooling. The proposed AGU architecture is a Very Long Instruction Word based Application Specific Instruction Processor which has specialized components like hardware counters and zero-overhead loops and a powerful Instruction Set Architecture (ISA), which can model static and dynamic constraints and affine and non-affine Address Equations. The target has been to minimize the flexibility vs. area, power, and performance trade-off. For a working test network of Semantic Segmentation, results have shown that PAGU shows close to the ideal performance, one cycle per address, for all the algorithms under consideration excepts Upsampled Convolution for which it is 1.7 cycles per address. The area of PAGU is approx. 4.6 times larger than the Parametrizable Datapath approach, which is still reasonable considering the high flexibility benefits. The potential of PAGU is not just limited to neural network applications but also in more general digital signal processing areas, which can be explored in the future.
Convolutional Neural Networks blir mer och mer populära på grund av deras applikationer inom revolutionerande tekniker som autonom körning, biomedicinsk bildbehandling och naturligt språkbearbetning. Med denna ökning av antagandet ökar också komplexiteten hos underliggande algoritmer. Detta medför implikationer för beräkningsplattformarna såväl som GPU: er, FPGAeller ASIC-baserade acceleratorer, särskilt för Adressgenerationsenheten (AGU) som är ansvarig för minnesåtkomst. Befintliga acceleratorer har normalt Parametrizable Datapath AGU: er som har mycket begränsad anpassningsförmåga till utveckling i algoritmer. Därför krävs ny hårdvara för nya algoritmer, vilket är en mycket ineffektiv metod när det gäller tid, resurser och återanvändbarhet. I denna forskning utvärderas sex algoritmer med olika implikationer för hårdvara för adressgenerering och en helt programmerbar AGU (PAGU) presenteras som kan anpassa sig till dessa algoritmer. Dessa algoritmer är Standard, Strided, Dilated, Upsampled och Padded convolution och MaxPooling. Den föreslagna AGU-arkitekturen är en Very Long Instruction Word-baserad applikationsspecifik instruktionsprocessor som har specialiserade komponenter som hårdvara räknare och noll-overhead-slingor och en kraftfull Instruktionsuppsättning Arkitektur (ISA) som kan modellera statiska och dynamiska begränsningar och affinera och icke-affinerad adress ekvationer. Målet har varit att minimera flexibiliteten kontra avvägning av område, kraft och prestanda. För ett fungerande testnätverk av semantisk segmentering har resultaten visat att PAGU visar nära den perfekta prestanda, 1 cykel per adress, för alla algoritmer som beaktas undantar Upsampled Convolution för vilken det är 1,7 cykler per adress. Området för PAGU är ungefär 4,6 gånger större än Parametrizable Datapath-metoden, vilket fortfarande är rimligt med tanke på de stora flexibilitetsfördelarna. Potentialen för PAGU är inte bara begränsad till neurala nätverksapplikationer utan också i mer allmänna digitala signalbehandlingsområden som kan utforskas i framtiden.

APA, Harvard, Vancouver, ISO, and other styles

38

Fatmi, Hassane. "Méthodologie d’analyse des signaux et caractérisation hydrogéologique : application aux chroniques de données obtenues aux laboratoires souterrains du Mont Terri, Tournemire et Meuse/Haute-Marne." Thesis, Toulouse, INPT, 2009. http://www.theses.fr/2009INPT020H/document.

Full text

Abstract:

Ce rapport présente des méthodes de prétraitement, d'analyse statistique et d'interprétation de chroniques hydrogéologiques de massifs peu perméables (argilites) dans le cadre d'études sur le stockage profond de déchets radioactifs. Les séries temporelles analysées sont la pression interstitielle et la pression atmosphérique, en relation avec différents phénomènes (marées terrestres, effet barométrique, évolution de l'excavation des galeries). Les pré-traitements permettent de reconstituer et homogénéiser les chroniques de données en présence de lacunes, aberrations, et pas de temps variables. Les signaux prétraités sont ensuite analysés en vue de caractériser les propriétés hydrauliques du massif peu perméable (emmagasinement spécifique ; porosité effective). Pour cela, on a développé et mis en oeuvre les méthodes d'analyses suivantes (implémentées en Matlab): analyses corrélatoires et spectrales (Fourier) ; analyses ondelettes multirésolution ; enveloppes de signaux aléatoires. Cette méthodologie est appliquée aux données acquises au Laboratoire Souterrain du Consortium International du Mont Terri (Jura Suisse), ainsi qu'à certaines données des Laboratoires Souterrains de Tournemire (Aveyron) et de Meuse / Haute-Marne (ANDRA)
This report presents a set of statistical methods for pre-processing and analyzing multivariate hydrogeologic time series, such as pore pressure and its relation to atmospheric pressure. The goal is to study the hydrogeologic characteristics of low permeability geologic formations (argilite) in the context of deep disposal of radioactive waste. The pressure time series are analyzed in relation with different phenomena, such as earth tides, barometric effects, and the evolution of excavated galleries. The pre-processing is necessary for reconstituting and homogenizing the time series in the presence of data gaps, outliers, and variable time steps. The preprocessed signals are then analyzed with a view to characterizing the hydraulic properties of this type of low permeability formation (specific storativity; effective porosity). For this sake, we have developed and used the following methods (implemented in Matlab): temporal correlation analyses; spectral/Fourier analyses; multiresolution wavelet analyses envelopes of random processes. This methodology is applied to data collected at the URL (Underground Research Laboratory) of the Mont Terri International Consortium (Swiss Jura), as well as some other data collected at the URL of IRSN at Tournemire (Aveyron) and at the URL of ANDRA (Meuse / Haute-Marne)

APA, Harvard, Vancouver, ISO, and other styles

39

Husár, Adam. "Implementace obecného assembleru." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2007. http://www.nusl.cz/ntk/nusl-412779.

Full text

Abstract:

This thesis describes the design of the universal assembler that represents a part of the Lissom project. You will be provided with the description of the assembler architectures and their usual tasks. Special attention is paid to GNU assembler. Designed assembler consists of the fixed and the generated part. The generated part is created automatically from the description of instruction set, that is defined using architecture and instructions set description language ISAC. Using this approach, it is possible to change assembler target architecture automatically. The second part of thesis describes the Parserlib2 library implementation that is a part of the Lissom project and provides the information about the target instruction set for an assembler generator.

APA, Harvard, Vancouver, ISO, and other styles

40

Li, Bo. "Conception et test de cellules de gestion d'énergie à commande numérique en technologies CMOS avancées." Phd thesis, INSA de Lyon, 2012. http://tel.archives-ouvertes.fr/tel-00782429.

Full text

Abstract:

Les technologies avancées de semi-conducteur permettent de mettre en œuvre un contrôleur numérique dédié aux convertisseurs à découpage, de faible puissance et de fréquence de découpage élevée sur FPGA et ASIC. Cette thèse vise à proposer des contrôleurs numériques des performances élevées, de faible consommation énergétique et qui peuvent être implémentés facilement. En plus des contrôleurs numériques existants comme PID, RST, tri-mode et par mode de glissement, un nouveau contrôleur numérique (DDP) pour le convertisseur abaisseur de tension est proposé sur le principe de la commande prédictive: il introduit une nouvelle variable de contrôle qui est la position de la largeur d'impulsion permettant de contrôler de façon simultanée le courant dans l'inductance et la tension de sortie. La solution permet une dynamique très rapide en transitoire, aussi bien pour la variation de la charge que pour les changements de tension de référence. Les résultats expérimentaux sur FPGA vérifient les performances de ce contrôleur jusqu'à la fréquence de découpage de 4MHz. Un contrôleur numérique nécessite une modulation numérique de largeur d'impulsion (DPWM). L'approche Sigma-Delta de la DPWM est un bon candidat en ce qui concerne le compromis entre la complexité et les performances. Un guide de conception d'étage Sigma-Delta pour le DPWM est présenté. Une architecture améliorée de traditionnelles 1-1 MASH Sigma-Delta DPWM est synthétisée sans détérioration de la stabilité en boucle fermée ainsi qu'en préservant un coût raisonnable en ressources matérielles. Les résultats expérimentaux sur FPGA vérifient les performances des DPWM proposées en régimes stationnaire et transitoire. Deux ASICs sont portés en CMOS 0,35µm: le contrôleur en tri-mode pour le convertisseur abaisseur de tension et la commande par mode de glissement pour les convertisseurs abaisseur et élévateur de tension. Les bancs de test sont conçus pour conduire à un modèle d'évaluation de consommation énergétique. Pour le contrôleur en tri-mode, la consommation de puissance mesurée est seulement de 24,56mW/MHz lorsque le ratio de temps en régime de repos (stand-by) est 0,7. Les consommations de puissance de command par mode de glissement pour les convertisseurs abaisseur et élévateur de tension sont respectivement de 4,46mW/MHz et 4,79mW/MHz. En utilisant le modèle de puissance, une consommation de la puissance estimée inférieure à 1mW/MHz est envisageable dans des technologies CMOS plus avancées. Comparé aux contrôlés homologues analogiques de l'état de l'art, les prototypes ASICs illustrent la possibilité d'atteindre un rendement comparable pour les applications de faible et de moyen puissance mais avec l'avantage d'une meilleure précision et une meilleure flexibilité.

APA, Harvard, Vancouver, ISO, and other styles

41

Karkhanis, Tejas. "Automated design of application-specific superscalar processors." 2006. http://www.library.wisc.edu/databases/connect/dissertations.html.

Full text

APA, Harvard, Vancouver, ISO, and other styles

42

Kim, Kyosun. "Automatic synthesis of application-specific programmable processors." 1998. https://scholarworks.umass.edu/dissertations/AAI9909175.

Full text

Abstract:

As witnessed by their recent rapid market growth, reconfigurable multi-functional data paths are an attractive alternative to both fully programmable and fully custom hardware platforms. We present a methodology for behavioral synthesis of an important and large class of reconfigurable data path designs, application specific programmable processors (ASPPs). ASPPs are data paths which provide efficient implementation for any of N functional specifications assuming that only one will be executed at any given time. The synthesis of ASPP designs imposes numerous new tasks on behavioral synthesis tools. We address bundling of applications, where n control-data flow graphs are bundled into at most m groups so that the area overhead is minimized, and all throughput constraints are satisfied. A variety of application specific constraints such as manufacturing cost minimization and risk reduction constraints are incorporated with application bundling. Task preemption is a critical enabling mechanism in a variety of multi-task real-time application scenarios. On preemption, data in the register files must be preserved in order for the task to be resumed. This entails extra memory to preserve the context and additional clock cycles to save and restore the context. We propose techniques and algorithms to incorporate micro-preemption constraints during ASPP synthesis. Specifically, we develop a controller based scheme to preclude preemption related performance degradation, techniques to minimize the context switch overhead, and algorithms to insert and refine preemption points in scheduled task graphs subject to preemption latency constraints. This on-the-fly task preemption distinguishes ASPPs from other adaptive computing architectures. Using the architectural flexibility provided by behavioral synthesis scheduling and resource allocation, we develop a novel approach for permanent fault-tolerance. This technique combines the behavioral synthesis-based flexibility provided by each of multiple functionalities with judicious application-to-faulty-unit assignment either to maximize the permanent fault-tolerance of such ASPPs (resource constrained fault-tolerant ASPP synthesis) or to guarantee that the ASPP remains operational in the presence of all possible k-unit faults (fault-tolerance constrained ASPP synthesis). We demonstrate the effectiveness of the overall approach, the synthesis algorithms, and software implementations on a number of industrial-strength designs.

APA, Harvard, Vancouver, ISO, and other styles

43

Min, Jae Hong. "Fused floating-point arithmetic for application specific processors." Thesis, 2013. http://hdl.handle.net/2152/23342.

Full text

Abstract:

Floating-point computer arithmetic units are used for modern-day computers for 2D/3D graphic and scientific applications due to their wider dynamic range than a fixed-point number system with the same word-length. However, the floating-point arithmetic unit has larger area, power consumption, and latency than a fixed-point arithmetic unit. It has become a big issue in modern low-power processors due to their limited power and performance margins. Therefore, fused architectures have been developed to improve floating-point operations. This dissertation introduces new improved fused architectures for add-subtract, sum-of-squares, and magnitude operations for graphics, scientific, and signal processing. A low-power dual-path fused floating-point add-subtract unit is introduced and compared with previous fused add-subtract units such as the single path and the high-speed dual-path fused add-subtract unit. The high-speed dual-path fused add-subtract unit has less latency compared with the single-path unit at a cost of large power consumption. To reduce the power consumption, an alternative dual-path architecture is applied to the fused add-subtract unit. The significand addition, subtraction and round units are performed after the far/close path. The power consumption of the proposed design is lower than the high-speed dual-path fused add-subtract unit at a cost in latency; however, the proposed fused unit is faster than the single-path fused unit. High-performance and low-power floating-point fused architectures for a two-term sum-of-squares computation are introduced and compared with discrete units. The fused architectures include pre/post-alignment, partial carry-sum width, and enhanced rounding. The fused floating-point sum-of-squares units with the post-alignment, 26 bit partial carry-sum width, and enhanced rounding system have less power-consumption, area, and latency compared with discrete parallel dot-product and sum-of-squares units. Hardware tradeoffs are presented between the fused designs in terms of power consumption, area, and latency. For example, the enhanced rounding processing reduces latency with a moderate cost of increased power consumption and area. A new type of fused architecture for magnitude computation with less power consumption, area, and latency than conventional discrete floating-point units is proposed. Compared with the discrete parallel magnitude unit realized with conventional floating-point squarers, an adder, and a square-root unit, the fused floating-point magnitude unit has less area, latency, and power consumption. The new design includes new designs for enhanced exponent, compound add/round, and normalization units. In addition, a pipelined structure for the fused magnitude unit is shown.
text

APA, Harvard, Vancouver, ISO, and other styles

44

Rajan, Kaushik. "Efficient Cache Organization For Application Specific And General Purpose Processors." Thesis, 2008. http://hdl.handle.net/2005/838.

Full text

Abstract:

The performance gap between processor and memory continues to remain a major performance bottleneck in both application specific and general purpose processors. This thesis strives to ease the above bottleneck by exploiting the characteristics of the application domain to improve the cache organization for two distinct processor architectures: (1) application specific processors for packet forwarding, (2) general purpose processors. Packet forwarding algorithms make use of a trie data structure to determine the forwarding route. We observe that the locality characteristics of the nodes at various levels of such a trie are different. Nodes that are closer to the root node, especially those that are immediate children of the root node (level-one nodes), exhibit higher temporal locality than nodes lower down the trie. Based on this observation we propose a novel Heterogeneously Segmented Cache Architecture (HSCA) that uses separate caches for level-one and lower-level nodes, each with carefully chosen sizes. We also propose a new replacement policy to enhance the performance of HSCA. Performance evaluation indicates that HSCA results in up to 32% reduction in average memory access time over a unified cache that shares the same cache space among all levels of the trie. HSCA also outperforms a previously proposed results cache. The use of a large root branching factor in a forwarding trie forcefully introduces a large number of nodes at level-one. Among these, only nodes that cover prefixes from the routing table are useful while the rest, are superfluous. We find that as many as 75% of the level-one nodes are superfluous. This leads to a skewed distribution of useful nodes among the cache sets of the level-one nodes cache. We propose a novel two-level mapping framework that achieves a better nodes to cache set mapping and hence incurs fewer conflict misses. Two-level mapping first aggregates nodes into Initial Partitions (IPs) using lower order bits and then remaps them from IPs into Refined Partitions (RPs), that form sets, based on some higher order bits. It provides flexibility in placement by allowing each IP to choose a different remap function. We propose three schemes conforming to the framework. A speedup in average memory access time of as much as 16% is gained over HSCA. In general purpose processor architectures, the design objectives of caches at various levels of the hierarchy are different. To ensure low access latencies, L1 caches are small and have low associativities, making them more susceptible to conflict misses. The extent of conflict misses incurred is governed by the placement function and the memory access patterns exhibited by the program. We propose a mechanism to learn the access characteristics of the program at runtime by analyzing the repetitive phases of program. We then make use of the two-level mapping framework to dynamically adapt the placement function. Further, we elegantly incorporate two-level mapping into the cache organization without increasing the cache access latency. Performance evaluation reveals that the proposed adaptive placement mechanism eliminates 32—36% of misses on average over a range of cache sizes. To prevent expensive off-chip accesses, L2 caches are larger and have higher associativities. Hence, the replacement policy plays a significant role in determining L2 cache performance. Further, as the inherent temporal locality in memory accesses is filtered out by the L1 cache, an L2 cache using the widely prevalent LRU replacement policy incurs significantly higher misses than the optimal replacement policy (OPT). We propose to bridge this gap through a novel replacement strategy that mimics the replacement decisions of OPT. The L2 cache is logically divided into two components, a Shepherd Cache (SC) with a simple FIFO replacement and a Main Cache (MC) with an emulation of optimal replacement. The SC plays the dual role of caching lines and shepherding the replacement decisions close to optimal for MC. Our proposed organization can cover 40% of the gap between LRU and OPT, resulting in 7% overall speedup.

APA, Harvard, Vancouver, ISO, and other styles

45

Κάργας, Χρήστος. "Energy efficient instruction decoding in application: Specific instruction - set processors." Thesis, 2012. http://hdl.handle.net/10889/6295.

Full text

Abstract:

With commercial processor design tools, a designer can quickly design a C- programmable ASIP for a specific application domain. There are several such ASIPs available for both wireless (UWB baseband processing), encryption, and biomedical processing (particularly for ECG beat detection). In traditional CPUs and DSPs the impact of the instruction-set definition and the complexity of the instruction decoder can be substantial, especially in terms of power consumption. Fully orthogonal VLIW processors, do not incur the cost of an instruction decoder that severely. Instead the instruction word becomes very large, thereby shifting the (power-)cost to the program memory or instruction cache. For the purposes of this thesis a SIMD processor is developed and is compared to a soft-SIMD to observe its area, performance and energy efficiency for a bioimaging benchmark and how the processor description in the ASIP language nML, defines the generated HDL. This SIMD processor is turned into orthogonal and using iterative experiments it is investigated, what is the impact on power while manipulating the instruction-set architecture in combination with the program memory size. It is also investigated how instruction-set re-configuration can be exploited to improve power efficiency. Using this investigation guidelines for low-power ASIP design can be produced.
Με τη σύγχρονη τεχνολογία σχεδιασμού επεξεργαστών, ο σχεδιαστής μπορεί με ευκολία να σχεδιάσει ένα προγραμματιζόμενο Επεξεργαστή Συνόλου Εντολών Ειδικού Σκοπού (ASIP - Application-Specific Instruction-set Processor) για ένα συγκεκριμένο εύρος εφαρμογών. Υπάρχουν διάφοροι τέτοιοι επεξεργαστές διαθέσιμοι για ασύρματες εφαρμογές, κρυπτογράφηση και βιοϊατρικές εφαρμογές (π.χ. στον αλγόριθμο εντοπισμού χτύπου ηλεκτροκαρδιογραφήματος). Στους παραδοσιακούς επεξεργαστές και επεξεργαστές σήματος (DSP - Digital Signal Processor) ο ορισμός του συνόλου εντολών και η πολυπλοκότητα έχουν μεγάλη επίδραση, ειδικά στην κατανάλωση ισχύος. Μία πιθανή λύση σε αυτό το πρόβλημα είναι οι ορθογώνιοι επεξεργαστές μεγάλου μεγέθους λέξης εντολής (VLIW - Very Large Instruction Word). Με τον όρο ορθογώνιο επεξεργαστή, ορίζεται ένας επεξεργαστής οριζόντιου σύνολου εντολών, άρα ένας επεξεργαστής στον οποίο μπορεί να υπάρξει κάθε διαθέσιμος συνδυασμός μεταξύ των διαθέσιμων εντολών και των μεθόδων διευθυνσιοδότησης για πρόσβαση στη μνήμη και το αρχείο καταχωρητών. Οι ορθογώνιοι επεξεργαστές δεν επιβαρύνουν τόσο τον αποκωδικοποιητή εντολών. Αντί αυτού το μέγεθος της λέξης της εντολής γίνεται πολύ μεγάλο, και έτσι μετατίθεται το ενεργειακό κόστος στην μνήμη εντολών προγράμματος (program memory )ή την κρυφή μνήμη εντολών προγράμματος (instruction cache). Για τους σκοπούς αυτής της διπλωματικής εργασίας, αναπτύχθηκε ένας επεξεργαστής SIMD, ο οποίος συγκρίνεται με έναν soft-SIMD για να μελετηθούν η απαιτούμενη περιοχή στο ενσωματωμένο, επιδόσεις και κατανάλωση ενέργειας για μία βιοϊατρική εφαρμογή, καθώς και το πως η περιγραφή ενός επεξεργαστή στη γλώσσα περιγραφής επεξεργαστών ASIP nML ορίζει την παραγούμενη γλώσσα περιγραφής υλικού (HDL - Hardware Description Language). Ο επεξεργαστής αυτός μετατρέπεται σε ορθογώνιο, και με τη χρήση επαναληπτικών πειραμάτων μελετάται η επίδραση στην κατανάλωση ενέργειας κατά τη διάρκεια αλλαγών στην αρχιτεκτονική του συνόλου εντολών και του μεγέθους της μνήμης εντολών προγράμματος. Ακόμη μελετάται πως μπορεί να εκμεταλλευτεί ο σχεδιαστής την αναδιάρθρωση του συνόλου εντολών για να βελτιώσει την κατανάλωση ενέργειας.

APA, Harvard, Vancouver, ISO, and other styles

46

Chao, Chie-Min, and 趙至敏. "Development of Software Tools for Application-Specific Instruction-set Processors (ASIPs)." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/10553662144219920236.

Full text

Abstract:

碩士
國立交通大學
電子工程系所
93
Programmable processors are dramatically attractive to amortize manufacturing costs and design efforts, as the system complexity grows. Besides, in order to satisfy the tight design constraints such as performance and power of today’s embedded systems, processor architectures are getting more specialized to some application domains (e.g. an application-specific instruction-set processor; ASIP). This thesis discusses the acceleration of system prototyping of new processor cores by reducing the software development time. Firstly, we propose a simple and effective high-level language compilation method by encapsulation new processor cores in compiler o friendly RISC shell. The native code translation form compiled RISC codes to the target processor is carried out by cooperating hardware and software. Secondly, we propose an efficient instruction set simulator with decoupled hazard checker and memory simulator. The simulation time is significantly reduced via native translation, while the cycle accuracy is maintained with proper instrumentation. Finally, we have constructed a C compiler with 6.98% hardware over head and a cycle-accurate ISS with 102~104 speed up for a proprietary DSP processor. Moreover, we have developed JPEG and H.264 encoding systems based on these software tools.

APA, Harvard, Vancouver, ISO, and other styles

47

Yu, Cheng-Juei, and 余承叡. "Methodologies and Algorithms for High-Level Synthesis of Application Specific Processors." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/19319584832695146324.

Full text

Abstract:

博士
國立臺灣大學
電機工程學研究所
99
Growing design complexity has led designers to generate designs at higher levels of abstraction, such as, the behavior description level. The core task for synthesizing the behavior description is the high-level synthesis (HLS), which contains three main steps to create a hardware architecture of datapath elements, control logic and memory elements: resource allocation, binding and scheduling. Among these tasks, resource scheduling is considered the most important in a HLS process. A part of this thesis is devoted to the study of the problem of resource-constrained scheduling (RCS) and proposes two search algorithms to exactly solve the problem in an effective, systematic way. The proposed algorithms are capable of reducing the computational effort required to obtain the best schedules on a pre-defined datapath by effectively pruning the non-promising search space. The effectiveness of the algorithms against existing approaches over time and space is demonstrated by theorems and related analysis. Furthermore, this thesis also presents methodologies that help convert a given application specified in C programming language into a hardware implementation of a custom processor. Besides datapaths representing standalone functional blocks the proposed RCS algorithms can schedule effectively, control paths representing calls of functions are also considered. Based on and extended from the concept of hierarchical finite-state machines (HFSMs), a number of built-in HFSM templates are proposed and used as the elementary components of a hardware design. Guidelines on the refinement of a C program are introduced; the refined C functions are compiled into HFSMs that in turn generate synthesizable hardware description language (HDL) code as the final design. A set of HFSMs is viewed as an intermediate representation between C and HDL and can be functionally simulated. Two modeling levels, i.e. cycle-accurate and cycle-approximated, are supported. In the end of this thesis, experimental results on a series of several well known algorithmic benchmarks demonstrate the effectiveness of the proposed algorithms and approaches against existing ones.

APA, Harvard, Vancouver, ISO, and other styles

48

Chen, Bo-hong, and 陳柏宏. "Task Partition and Scheduling on Multiple Heterogeneous Application Specific Instruction Set Processors." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/68773497398506205745.

Full text

Abstract:

碩士
逢甲大學
資訊工程所
97
Stream processing applications demand high throughput. Multiple heterogeneous Application-Specific Instruction-set Processor (ASIP) architectures tend not to offer only sufficient throughput but also runtime flexibility, which cannot be provided by custom ASIC solutions. In this paper, the partition and scheduling methodology has been proposed to utilize both the Application-Specific Instruction (ASI) in ASIP, to exploit fine-grain data parallelism, and multiple ASIPs, to exploit task and pipeline parallelism. In our design flow, the data dependence of a streaming application will be analyzed with all possible ASI candidates and then to generate the corresponding task graph. The task graph will be partitioned and scheduled to the pipeline stages to achieve the desired throughput constraint with less hardware cost. The experiment results show an average hardware reduction of 8% compared to the results without using ASIs.

APA, Harvard, Vancouver, ISO, and other styles

49

Wang, Hui-Shan, and 王惠珊. "Generating and Exploiting Reconfigurable Custom Functional Unit in Application Specific VLIW Processors." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/54856053965819254006.

Full text

Abstract:

碩士
國立交通大學
資訊科學與工程研究所
97
To improve the performance of processors, a customized accelerator, reconfigurable custom functional unit (RCFU), may be appended to a very long instruction word (VLIW) processor architecture. The technique is to generate RCFU by those frequent operation segments and collapse operation segments which could be executed on the RCFU as customized instructions. Then, instruction scheduling is done to elaborate instruction-level parallelism for performance improvement at compile time. In this research, we propose not only a tightly-coupled RCFU design on the VLIW processor, but also an algorithm is also proposed to exploit the processor augmented with RCFU. We assume that FUs in the processor pipeline and RCFU could execute simultaneously, and independent operation mapping and instruction scheduling algorithms are integrated into a single phase to get more performance gains and higher hardware usability. We had comparisons between the processors with RCFU and without RCFU. Overall, our proposed RCFU design while using our proposed exploitation algorithm still achieves giant speedup on average over previous generating algorithms. Furthermore, the algorithm for exploiting RCFU also achieves obviously speedup on average over previous methods, separating algorithms.

APA, Harvard, Vancouver, ISO, and other styles

50

Li, Gin-Hsuan, and 李京軒. "Application-Specific Instruction-set Processors With Implicit Registers To Improve Register Bandwidth." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/47423386130421458728.

Full text

Abstract:

碩士
逢甲大學
資訊工程所
98
Application-Specific Instruction-set processors (ASIPs) has become an important design choice for embedded systems due to runtime flexibility, which cannot be provided by custom ASIC solutions. However, the limited register bandwidth in the core processor becomes a potential performance bottleneck. In our design flow, the Implicit registers (Iregs) and a Implicit register allocation algorithm is proposed to improve the data bandwidth of CIs and reduce the number of additional move instructions. The experiment results show an average speedup of 8% (up to 27%) compared to the results without using Implicit registers.

APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic 'Application specific processors'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles