Log in

Relevant bibliographies by topics / Reconfigurable Hardware Architecture / Dissertations / Theses

To see the other types of publications on this topic, follow the link: Reconfigurable Hardware Architecture.

Dissertations / Theses on the topic 'Reconfigurable Hardware Architecture'

Author: Grafiati

Published: 6 September 2023

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 48 dissertations / theses for your research on the topic 'Reconfigurable Hardware Architecture.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Gelb, Benjamin S. "A timeshared, runtime reconfigurable hardware co-processing architecture." Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/53147.

Full text

Abstract:

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.
Includes bibliographical references (leaves 73-74).
The constant desire for increased performance in microprocessor systems has led to the need for specialized hardware cores to accelerate specific computational tasks. In this thesis, we explore the potential of using FPGA partial reconfiguration to create a platform for customized hardware cores that may be loaded on demand, at runtime, and replaced when not in use. We implement two new software tools, bitparse and bitrender, to demonstrate the bitstream relocation technique. Further, we present a functional microprocessor system coupled with a runtime reprogramable peripheral synthesized on a Xilinx Virtex-5 FPGA and discuss its performance implications.
by Benjamin S. Gelb.
M.Eng.

APA, Harvard, Vancouver, ISO, and other styles

2

Peterkin, Raymond. "A reconfigurable hardware architecture for VPN MPLS based services." Thesis, University of Ottawa (Canada), 2006. http://hdl.handle.net/10393/27283.

Full text

Abstract:

Internet applications are becoming increasingly resource intensive and perform poorly in the presence of significant congestion. Increased bandwidth cannot provide long-term congestion relief so Internet traffic must be prioritized and efficiently routed. Multiprotocol Label Switching (MPLS) [12] provides the means to process traffic quickly and reserve resources for applications with specific requirements. However, MPLS must provide the same resilience mechanisms as ATM [18] over SONET [46] to become an acceptable alternative for assigning and switching label switched paths (LSPs). This thesis proposes a reconfigurable architecture and a prototype of a hardware processor for MPLS to improve its overall performance. Establishing LSPs and label management are the central tasks of the processor. It is used to describe LSPs and perform packet switching. A significant subset of RSVP-TE is implemented in the processor to provide the necessary mechanisms of a signaling protocol. Functionality is also available for Traffic Engineering (TE) allowing a user to configure the allocation of resources available for MPLS. The processor is designed to interact with software so it can become part of an embedded system. Results and analysis for the processor are provided describing its resource usage and performance. Resource intensive tasks are identified through analysis and determinations are made about the worst case performance improvement compared to software implementations which depend on the number of LSPs considered and network size.

APA, Harvard, Vancouver, ISO, and other styles

3

Diniz, Claudio Machado. "Dedicated and reconfigurable hardware accelerators for high efficiency video coding standard." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2015. http://hdl.handle.net/10183/118394.

Full text

Abstract:

A demanda por vídeos de resolução ultra-alta (além de 1920x1080 pontos) levou à necessidade de desenvolvimento de padrões de codificação de vídeo novos e mais eficientes para prover alta eficiência de compressão. O novo padrão High Efficiency Video Coding (HEVC), publicado em 2013, atinge o dobro da eficiência de compressão (ou 50% de redução no tamanho do vídeo codificado) comparado com o padrão mais eficiente até então, e mais utilizado no mercado, o padrão H.264/AVC (Advanced Video Coding). O HEVC atinge este resultado ao custo de uma elevação da complexidade computacional das ferramentas inseridas no codificador e decodificador. O aumento do esforço computacional do padrão HEVC e as limitações de potência das tecnologias de fabricação em silício atuais tornam essencial o desenvolvimento de aceleradores de hardware para partes importantes da aplicação do HEVC. Aceleradores de hardware fornecem maior desempenho e eficiência energética para aplicações específicas que os processadores de propósito geral. Uma análise da aplicação do HEVC realizada neste trabalho identificou as partes mais importantes do HEVC do ponto de vista do esforço computacional, a saber, o Filtro de Interpolação de Ponto Fracionário, o Filtro de Deblocagem e o cálculo da Soma das Diferenças Absolutas. Uma análise de tempo de execução do Filtro de Interpolação indica um grande potencial de economia de potência/energia pela adaptação do acelerador de hardware à carga de trabalho variável. Esta tese introduz novas contribuições no tema de aceleradores dedicados e reconfiguráveis para o padrão HEVC. Aceleradores de hardware dedicados para o Filtro de Interpolação de Pixel Fracionário, para o Filtro de Deblocagem, e para o cálculo da Soma das Diferenças Absolutas, são propostos, projetados e avaliados nesta tese. A arquitetura de hardware proposta para o filtro de interpolação atinge taxa de processamento similar ao estado da arte, enquanto reduz a área do hardware para este bloco em 50%. A arquitetura de hardware proposta para o filtro de deblocagem também atinge taxa de processamento similar ao estado da arte com uma redução de 5X a 6X na contagem de gates e uma redução de 3X na dissipação de potência. A nova análise comparativa proposta para os elementos de processamento do cálculo da Soma das Diferenças Absolutas introduz diversas alternativas de projeto de arquitetura com diferentes resultados de área, desempenho e potência. A nova arquitetura reconfigurável para o filtro de interpolação do padrão HEVC fornece 57% de redução de área em tempo de projeto e adaptação da potência/energia em tempo-real a cada imagem processada, o que ainda não é suportado pelas arquiteturas do estado da arte para o filtro de interpolação. Adicionalmente, a tese propõe um novo esquema de alocação de aceleradores em tempo-real para arquiteturas reconfiguráveis baseadas em tiles de processamento e de grão-misto, o que reduz em 44% (23% em média) o “overhead” de comunicação comparado com uma estratégia first-fit com reuso de datapaths, para números diferentes de tiles e organizações internas de tile. Este esquema de alocação leva em conta a arquitetura interna para alocar aceleradores de uma maneira mais eficiente, evitando e minimizando a comunicação entre tiles. Os aceleradores e técnicas dedicadas e reconfiguráveis propostos nesta tese proporcionam implementações de codificadores de vídeo de nova geração, além do HEVC, com melhor área, desempenho e eficiência em potência.
The demand for ultra-high resolution video (beyond 1920x1080 pixels) led to the need of developing new and more efficient video coding standards to provide high compression efficiency. The High Efficiency Video Coding (HEVC) standard, published in 2013, reaches double compression efficiency (or 50% reduction in size of coded video) compared to the most efficient video coding standard at that time, and most used in the market, the H.264/AVC (Advanced Video Coding) standard. HEVC reaches this result at the cost of high computational effort of the tools included in the encoder and decoder. The increased computational effort of HEVC standard and the power limitations of current silicon fabrication technologies makes it essential to develop hardware accelerators for compute-intensive computational kernels of HEVC application. Hardware accelerators provide higher performance and energy efficiency than general purpose processors for specific applications. An HEVC application analysis conducted in this work identified the most compute-intensive kernels of HEVC, namely the Fractional-pixel Interpolation Filter, the Deblocking Filter and the Sum of Absolute Differences calculation. A run-time analysis on Interpolation Filter indicates a great potential of power/energy saving by adapting the hardware accelerator to the varying workload. This thesis introduces new contributions in the field of dedicated and reconfigurable hardware accelerators for HEVC standard. Dedicated hardware accelerators for the Fractional Pixel Interpolation Filter, the Deblocking Filter and the Sum of Absolute Differences calculation are herein proposed, designed and evaluated. The interpolation filter hardware architecture achieves throughput similar to the state of the art, while reducing hardware area by 50%. Our deblocking filter hardware architecture also achieves similar throughput compared to state of the art with a 5X to 6X reduction in gate count and 3X reduction in power dissipation. The thesis also does a new comparative analysis of Sum of Absolute Differences processing elements, in which various architecture design alternatives with different area, performance and power results were introduced. A novel reconfigurable interpolation filter hardware architecture for HEVC standard was developed, and it provides 57% design-time area reduction and run-time power/energy adaptation in a picture-by-picture basis, compared to the state-of-the-art. Additionally a run-time accelerator binding scheme is proposed for tile-based mixed-grained reconfigurable architectures, which reduces the communication overhead, compared to first-fit strategy with datapath reusing scheme, by up to 44% (23% on average) for different number of tiles and internal tile organizations. This run-time accelerator binding scheme is aware of the underlying architecture to bind datapaths in an efficient way, to avoid and minimize inter-tile communications. The new dedicated and reconfigurable hardware accelerators and techniques proposed in this thesis enable next-generation video coding standard implementations beyond HEVC with improved area, performance, and power efficiency.

APA, Harvard, Vancouver, ISO, and other styles

4

Kung, Ling-Pei 1961. "Obtaining performance and programmability using reconfigurable hardware for media processing." Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/61855.

Full text

Abstract:

Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2002.
Includes bibliographical references (p. 127-132).
An imperative requirement in the design of a reconfigurable computing system or in the development of a new application on such a system is performance gains. However, such developments suffer from long-and-difficult programming process, hard-to-predict performance gains, and limited scope of applications. To address these problems, we need to understand reconfigurable hardware's capabilities and limitations, its performance advantages and disadvantages, re-think reconfigurable system architectures, and develop new tools to explore its utility. We begin by examining performance contributors at the system level. We identify those from general-purpose and those from dedicated components. We propose an architecture by integrating reconfigurable hardware within the general-purpose framework. This is to avoid and minimize dedicated hardware and organization for programmability. We analyze reconfigurable logic architectures and their performance limitations. This analysis leads to a theory that reconfigurable logic can never be clocked faster than a fixed-logic design based on the same fabrication technology. Though highly unpredictable, we can obtain a quick upper bound estimate on the clock speed based on a few parameters. We also analyze microprocessor architectures and establish an analytical performance model. We use this model to estimate performance bounds using very little information on task properties. These bounds help us to detect potential memory-bound tasks. For a compute-bound task, we compare its performance upper bound with the upper bound on reconfigurable clock speed to further rule out unlikely speedup candidates.
(cont.) These performance estimates require very few parameters, and can be quickly obtained without writing software or hardware codes. They can be integrated with design tools as front end tools to explore speedup opportunities without costly trials. We believe this will broaden the applicability of reconfigurable computing.
by Ling-Pei Kung.
Ph.D.

APA, Harvard, Vancouver, ISO, and other styles

5

Balasubramanian, Karthikeyan. "Reconfigurable System-on-Chip Architecture for Neural Signal Processing." Diss., Temple University Libraries, 2011. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/144255.

Full text

Abstract:

Electrical Engineering
Ph.D.
Analyzing the brain's behavior in terms of its neuronal activity is the fundamental purpose of Brain-Machine Interfaces (BMIs). Neuronal activity is often assumed to be encoded in the rate of neuronal action potential spikes. Successful performance of a BMI system is tied to the efficiency of its individual processing elements such as spike detection, sorting and decoding. To achieve reliable operation, BMIs are equipped with hundreds of electrodes at the neural interface. While a single electrode/tetrode communicates with up to four neurons at a given instant of time, a typical interface communicates with an ensemble of hundreds or even thousands of neurons. However, translation of these signals (data) into usable information for real-time BMIs is bottlenecked due to the lack of efficient real-time algorithms and real-time hardware that can handle massively parallel channels of neural data. The research presented here addresses this issue by developing real-time neural processing algorithms that can be implemented in reconfigurable hardware and thus, can be scaled to handle thousands of channels in parallel. The developed reconfigurable system serves as an evaluation platform for investigating the fundamental design tradeoffs in allocating finite hardware resources for a reliable BMI. In this work, the generic architectural layout needed to process neural signals in a massive scale is discussed. A System-on-Chip design with embedded system architecture is presented for FPGA hardware realization that features (a) scalability (b) reconfigurability, and (c) real-time operability. A prototype design incorporating a dual processor system and essential neural signal processing routines such as real-time spike detection and sorting is presented. Two kinds of spike detectors, a simple threshold-based and non-linear energy operator-based, were implemented. To achieve real-time spike sorting, a fuzzy logic-based spike sorter was developed and synthesized in the hardware. Furthermore, a real-time kernel to monitor the high-level interactions of the system was implemented. The entire system was realized in a platform FPGA (Xilinx Virtex-5 LX110T). The system was tested using extracellular neural recordings from three different animals, a owl monkey, a macaque and a rat. Operational performance of the system is demonstrated for a 300 channel neural interface. Scaling the system to 900 channels is trivial.
Temple University--Theses

APA, Harvard, Vancouver, ISO, and other styles

6

Lomonaco, Michael John. "CRYPTARRAY A SCALABLE AND RECONFIGURABLE ARCHITECTURE FOR CRYPTOGRAPHIC APPLICATIONS." Master's thesis, University of Central Florida, 2004. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4394.

Full text

Abstract:

Cryptography is increasingly viewed as a critical technology to fulfill the requirements of security and authentication for information exchange between Internet applications. However, software implementations of cryptographic applications are unable to support the quality of service from a bandwidth perspective required by most Internet applications. As a result, various hardware implementations, from Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), to programmable processors, were proposed to improve this inadequate quality of service. Although these implementations provide performances that are considered better than those produced by software implementations, they still fall short of addressing the bandwidth requirements of most cryptographic applications in the context of the Internet for two major reasons: (i) The majority of these architectures sacrifice flexibility for performance in order to reach the performance level needed for cryptographic applications. This lack of flexibility can be detrimental considering that cryptographic standards and algorithms are still evolving. (ii) These architectures do not consider the consequences of technology scaling in general, and particularly interconnect related problems. As a result, this thesis proposes an architecture that attempts to address the requirements of cryptographic applications by overcoming the obstacles described in (i) and (ii). To this end, we propose a new reconfigurable, two-dimensional, scalable architecture, called CRYPTARRAY, in which bus-based communication is replaced by distributed shared memory communication. At the physical level, the length of the wires will be kept to a minimum. CRYPTARRAY is organized as a chessboard in which the dark and light squares represent Processing Elements (PE) and memory blocks respectively. The granularity and resource composition of the PEs is specifically designed to support the computing operations encountered in cryptographic algorithms in general, and symmetric algorithms in particular. Communication can occur only between neighboring PEs through locally shared memory blocks. Because of the chessboard layout, the architecture can be reconfigured to allow computation to proceed as a pipelined wave in any direction. This organization offers a high computational density in terms of datapath resources and a large number of distributed storage resources that easily support a high degree of parallelism and pipelining. Experimental prototyping a small array on FPGA chips shows that this architecture can run at 80.9 MHz producing 26,968,716 outputs every second in static reconfiguration mode and 20,226,537 outputs every second in dynamic reconfiguration mode.
M.S.
Department of Electrical and Computer Engineering
Engineering and Computer Science
Electrical and Computer Engineering

APA, Harvard, Vancouver, ISO, and other styles

7

Avakian, Annie. "Reducing Cache Access Time in Multicore Architectures Using Hardware and Software Techniques." University of Cincinnati / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1335461322.

Full text

APA, Harvard, Vancouver, ISO, and other styles

8

Silva, Antonio Carlos Fernandes da. "ChipCflow: tool for convert C code in a static dataflow architecture in reconfigurable hardware." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-30062015-141638/.

Full text

Abstract:

A growing search for alternative architectures and softwares have been noted in the last years. This search happens due to the advance of hardware technology and such advances must be complemented by innovations on design methodologies, test and verification techniques in order to use technology effectively. Alternative architectures and softwares, in general, explores the parallelism of applications, differently to Von Neumann model. Among high performance alternative architectures, there is the Dataflow Architecture. In this kind of architecture, the process of program execution is determined by data availability, thus the parallelism is intrinsic in these systems. The dataflow architectures become again a highlighted search area due to hardware advances, in particular, the advances of Reconfigurable Computing and Field Programmable Gate Arrays (FPGAs). ChipCflow projet is a tool for execution of algorithms using dynamic dataflow graph in FPGA. In this thesis, the development of a code conversion tool to generate aplications in a static dataflow architecture, is described. Also the ChipCflow project where the code conversion tool is part, is presented. The specification of algorithm to be converted is made in C language and converted to a hadware description language, respecting the proposed by ChipCflow project. The results are the proof of concept of converting a high-level language code for dataflow architecture to be used into a FPGA.
Existe uma crescente busca por softwares e arquiteturas alternativas. Essa busca acontece pois houveram avanços na tecnologia do hardware, e estes avanços devem ser complementados por inovações nas metodologias de projetos, testes e verificação para que haja um uso eficaz da tecnologia. Os software e arquiteturas alternativas, geralmente são modelos que exploram o paralelismo das aplicações, ao contrário do modelo de Von Neumann. Dentre as arquiteturas alternativas de alto desempenho, tem-se a arquitetura a fluxo de dados. Nesse tipo de arquitetura, o processo de execução de programas é determinado pela disponibilidade dos dados, logo o paralelismo está embutido na própria natureza do sistema. O modelo a fluxo de dados possui a vantagem de expressar o paralelismo de maneira intrínseca, eliminando a necessidade do programador explicitar em seu código os trechos onde deve haver paralelismo. As arquiteturas a fluxo de dados voltaram a ser uma área de pesquisa devido aos avanços do hardware, em particular, os avanços da Computação Reconfigurável e dos Field Programmable Gate Arrays (FPGAs).Nesta tese é descrita uma ferramenta de conversão de código que visa a geração de aplicações utilizando uma arquitetura a fluxo de dados estática. Também é descrito o projeto ChipCflow, cuja ferramenta de conversão de código, descrita nesta tese, é parte integrante. A especificação do algoritmo a ser convertido é feita em linguagem C e convertida para uma linguagem de descrição de hardware, respeitando o modelo proposto pelo ChipCflow. Os resultados alcançados visam a prova de conceito da conversão de código de uma linguagem de alto nível para uma arquitetura a fluxo de dados a ser configurada em FPGA.

APA, Harvard, Vancouver, ISO, and other styles

9

Robinson, Kylan Thomas. "An integrated development environment for the design and simulation of medium-grain reconfigurable hardware." Pullman, Wash. : Washington State University, 2010. http://www.dissertations.wsu.edu/Thesis/Spring2010/k_robinson_041510.pdf.

Full text

Abstract:

Thesis (M.S. in computer engineering)--Washington State University, May 2010.
Title from PDF title page (viewed on June 22, 2010). "School of Electrical Engineering and Computer Science." Includes bibliographical references (p. 75-76).

APA, Harvard, Vancouver, ISO, and other styles

10

Das, Satyajit. "Architecture and Programming Model Support for Reconfigurable Accelerators in Multi-Core Embedded Systems." Thesis, Lorient, 2018. http://www.theses.fr/2018LORIS490/document.

Full text

Abstract:

La complexité des systèmes embarqués et des applications impose des besoins croissants en puissance de calcul et de consommation énergétique. Couplé au rendement en baisse de la technologie, le monde académique et industriel est toujours en quête d'accélérateurs matériels efficaces en énergie. L'inconvénient d'un accélérateur matériel est qu'il est non programmable, le rendant ainsi dédié à une fonction particulière. La multiplication des accélérateurs dédiés dans les systèmes sur puce conduit à une faible efficacité en surface et pose des problèmes de passage à l'échelle et d'interconnexion. Les accélérateurs programmables fournissent le bon compromis efficacité et flexibilité. Les architectures reconfigurables à gros grains (CGRA) sont composées d'éléments de calcul au niveau mot et constituent un choix prometteur d'accélérateurs programmables. Cette thèse propose d'exploiter le potentiel des architectures reconfigurables à gros grains et de pousser le matériel aux limites énergétiques dans un flot de conception complet. Les contributions de cette thèse sont une architecture de type CGRA, appelé IPA pour Integrated Programmable Array, sa mise en œuvre et son intégration dans un système sur puce, avec le flot de compilation associé qui permet d'exploiter les caractéristiques uniques du nouveau composant, notamment sa capacité à supporter du flot de contrôle. L'efficacité de l'approche est éprouvée à travers le déploiement de plusieurs applications de traitement intensif. L'accélérateur proposé est enfin intégré à PULP, a Parallel Ultra-Low-Power Processing-Platform, pour explorer le bénéfice de ce genre de plate-forme hétérogène ultra basse consommation
Emerging trends in embedded systems and applications need high throughput and low power consumption. Due to the increasing demand for low power computing and diminishing returns from technology scaling, industry and academia are turning with renewed interest toward energy efficient hardware accelerators. The main drawback of hardware accelerators is that they are not programmable. Therefore, their utilization can be low is they perform one specific function and increasing the number of the accelerators in a system on chip (SoC) causes scalability issues. Programmable accelerators provide flexibility and solve the scalability issues. Coarse-Grained Reconfigurable Array (CGRA) architecture consisting of several processing elements with word level granularity is a promising choice for programmable accelerator. Inspired by the promising characteristics of programmable accelerators, potentials of CGRAs in near threshold computing platforms are studied and an end-to-end CGRA research framework is developed in this thesis. The major contributions of this framework are: CGRA design, implementation, integration in a computing system, and compilation for CGRA. First, the design and implementation of a CGRA named Integrated Programmable Array (IPA) is presented. Next, the problem of mapping applications with control and data flow onto CGRA is formulated. From this formulation, several efficient algorithms are developed using internal resources of a CGRA, with a vision for low power acceleration. The algorithms are integrated into an automated compilation flow. Finally, the IPA accelerator is augmented in PULP - a Parallel Ultra-Low-Power Processing-Platform to explore heterogeneous computing

APA, Harvard, Vancouver, ISO, and other styles

11

Werner, Stefan [Verfasser]. "Hybrid architecture for hardware-accelerated query processing in semantic web databases based on runtime reconfigurable FPGAs / Stefan Werner." Lübeck : Zentrale Hochschulbibliothek Lübeck, 2017. http://d-nb.info/1143986946/34.

Full text

APA, Harvard, Vancouver, ISO, and other styles

12

Astolfi, Vitor Fiorotto. "ChipCflow - em hardware dinamicamente reconfigurável." Universidade de São Paulo, 2009. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-05032010-203142/.

Full text

Abstract:

Nos últimos anos, houve um grande avanço na computação reconfigurável, em particular em hardware que emprega Field-Programmable Gate Arrays. Porém, esse aumento de capacidade e desempenho aumentou a distância entre a capacidade de projeto e a disponibilidade de tecnologia para o desenvolvimento do projeto. As linguagens de programação imperativas de alto nível, como C, são mais apropriadas para o desenvolvimento de aplicativos complexos que as linguagens de descrição de hardware. Por isso, surgiram diversas ferramentas para o desenvolvimento de hardware a partir de código em C. A ferramenta ChipCflow, da qual faz parte este projeto, é uma delas. A execução dos programas por meio dessa ferramenta será completamente baseada em seu fluxo de dados, seguindo o modelo dinâmico encontrado nas arquiteturas de computadores a fluxo de dados, aproveitando ao máximo o paralelismo considerado natural desse modelo e as características do hardware parcialmente reconfigurável. Neste projeto em particular, o objetivo é a prova de conceito (proof of concept) para a criação de instâncias, em forma de operadores, de um algoritmo ChipCflow em hardware parcialmente reconfigurável, tendo como base a plataforma Virtex da Xilinx
In recent years, reconfigurable computing has become increasingly more advanced, especially in hardware that uses Field-Programmable Gate Arrays. However, the increase of performance in FPGAs accumulated the gap between design capacity and technology for the development of the design. Imperative high-level programming languages such as C are more appropriate for the development of complex algorithms than hardware description languages (HDL). For this reason, many ANSI C-like programming tools for the development of hardware came to existence. The ChipCflow project, of which this project is part, is one of these tools. The execution of algorithms through this tool will be completely directed by data flow, according to the dynamic model found on Dataflow Architectures, taking advantage of its natural high levels of parallelism and the characteristics of the partially reconfigurable hardware. In this project, the objective is a proof of concept for the creation of instances, in the form of operators, of a ChipCflow algorithm on a partially reconfigurable hardware, taking as reference the Xilinx Virtex boards

APA, Harvard, Vancouver, ISO, and other styles

13

Qian, Wenchao. "Energy-efficientSpatio-temporalComputing Framework." Case Western Reserve University School of Graduate Studies / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=case1459257723.

Full text

APA, Harvard, Vancouver, ISO, and other styles

14

Lloyd, G. Scott. "Accelerated Large-Scale Multiple Sequence Alignment with Reconfigurable Computing." BYU ScholarsArchive, 2011. https://scholarsarchive.byu.edu/etd/2729.

Full text

Abstract:

Multiple Sequence Alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. The time to compute an optimal MSA grows exponentially with respect to the number of sequences. Consequently, producing timely results on large problems requires more efficient algorithms and the use of parallel computing resources. Reconfigurable computing hardware provides one approach to the acceleration of biological sequence alignment. Other acceleration methods typically encounter scaling problems that arise from the overhead of inter-process communication and from the lack of parallelism. Reconfigurable computing allows a greater scale of parallelism with many custom processing elements that have a low-overhead interconnect. The proposed parallel algorithms and architecture accelerate the most computationally demanding portions of MSA. An overall speedup of up to 150 has been demonstrated on a large data set when compared to a single processor. The reduced runtime for MSA allows researchers to solve the larger problems that confront biologists today.

APA, Harvard, Vancouver, ISO, and other styles

15

El-Hassan, Fadi. "Hardware Architecture of an XML/XPath Broker/Router for Content-Based Publish/Subscribe Data Dissemination Systems." Thèse, Université d'Ottawa / University of Ottawa, 2014. http://hdl.handle.net/10393/30660.

Full text

Abstract:

The dissemination of various types of data faces ongoing challenges with the growing need of accessing manifold information. Since the interest in content is what drives data networks, some new technologies and thoughts attempt to cope with these challenges by developing content-based rather than address-based architectures. The Publish/ Subscribe paradigm can be a promising approach toward content-based data dissemination, especially that it provides total decoupling between publishers and subscribers. However, in content-based publish/subscribe systems, subscriptions are expressive and the information is often delivered based on the matched expressive content - which may not deeply alleviate considerable performance challenges. This dissertation explores a hardware solution for disseminating data in content-based publish/subscribe systems. This solution consists of an efficient hardware architecture of an XML/XPath broker that can route information based on content to either other XML/XPath brokers or to ultimate users. A network of such brokers represent an overlay structure for XML content-based publish/subscribe data dissemination systems. Each broker can simultaneously process many XPath subscriptions, efficiently parse XML publications, and subsequently forward notifications that result from high-performance matching processes. In the core of the broker architecture, locates an XML parser that utilizes a novel Skeleton CAM-Based XML Parsing (SCBXP) technique in addition to an XPath processor and a high-performance matching engine. Moreover, the broker employs effective mechanisms for content-based routing, so as subscriptions, publications, and notifications are routed through the network based on content. The inherent reconfigurability feature of the broker’s hardware provides the system architecture with the capability of residing in any FPGA device of moderate logic density. Furthermore, such a system-on-chip architecture is upgradable, if any future hardware add-ons are needed. However, the current architecture is mature and can effectively be implemented on an ASIC device. Finally, this thesis presents and analyzes the experiments conducted on an FPGA prototype implementation of the proposed broker/router. The experiments tackle tests for the SCBXP alone and for two phases of development of the whole broker. The corresponding results indicate the high performance that the involved parsing, storing, matching, and routing processes can achieve.

APA, Harvard, Vancouver, ISO, and other styles

16

Bollengier, Théotime. "Du prototypage à l’exploitation d’overlays FPGA." Thesis, Brest, École nationale supérieure de techniques avancées Bretagne, 2018. http://www.theses.fr/2018ENTA0003/document.

Full text

Abstract:

De part leur capacité de reconfiguration et les performances qu’ils offrent, les FPGAs sont de bons candidats pour accélérer des applications dans le Cloud. Cependant, les FPGAs présentent certaines caractéristiques qui font obstacle à leur utilisation dans le Cloud et leur adoption par les clients : premièrement, la programmation des FPGAs se fait à bas niveau et demande une certaine expertise, que n’ont pas nécessairement les clients habituels du Cloud. Deuxièmement, les FPGAs ne présentent pas de mécanismes natifs permettant leur intégration dans le modèle de gestion dynamique d’une infrastructure Cloud.Dans ce travail, nous proposons d’utiliser des architectures overlay afin de faciliter l’adoption, l’intégration et l’exploitation de FPGAs dans le Cloud. Les overlays sont des architectures reconfigurables elles-mêmes implémentée sur FPGA. En tant que couche d’abstraction matérielle placée entre le FPGA et les applications, les overlays permettent de monter le niveau d’abstraction du modèle d’exécution présenté aux applications et aux utilisateurs, ainsi que d’implémenter des mécanismes facilitant leur intégration et leur exploitation dans une infrastructure Cloud.Ce travail présente une approche verticale adressant tous les aspects de la mise en œuvre d’overlays dans le Cloud en tant qu’accélérateurs reconfigurables par les clients : de la conception et l’implémentation des overlays, leur intégration sur des plateformes FPGA commerciales, la mise en place de leurs mécanismes d’exploitation, jusqu’à la réalisationde leurs outils de programmation. L’environnement réalisé est complet, modulaire et extensible, il repose en partie sur différents outils existants, et démontre la faisabilité de notre approche
Due to their reconfigurable capability and the performance they offer, FPGAs are good candidates for accelerating applications in the cloud. However, FPGAs have some features that hinder their use in the Cloud as well as their adoption by customers : first, FPGA programming is done at low level and requires some expertise that usual Cloud clients do not necessarily have. Secondly, FPGAs do not have native mechanisms allowing them to easily fit in the dynamic execution model of the Cloud.In this work, we propose to use overlay architectures to facilitate FPGA adoption, integration, and operation in the Cloud. Overlays are reconfigurable architectures synthesized on FPGA. As hardware abstraction layers placed between the FPGA and applications, overlays allow to raise the abstraction level of the execution model presented to applications and users, as well as to implement mechanisms making them fit in a Cloud infrastructure.This work presents a vertical approach addressing all aspects of overlay operation in the Cloud as reconfigurable accelerators programmable by tenants : from designing and implementing overlays, integrating them on commercial FPGA platforms, setting up their operating mechanisms, to developping their programming tools. The environment developped in this work is complete, modular and extensible, it is partially based on several existing tools, and demonstrate the feasibility of our approach

APA, Harvard, Vancouver, ISO, and other styles

17

Oliveira, Tiago de. "Desenvolvimento de uma arquitetura multiprocessada e reconfigurável para a síntese de redes de Petri em hardware /." Ilha Solteira : [s.n.], 2008. http://hdl.handle.net/11449/100361.

Full text

Abstract:

Orientador: Norian Marranghello
Banca: Aledir Silveira Pereira
Banca: Alexandre Cesar Rodrigues da Silva
Banca: Furio Damiani
Banca: Paulo Romero Martins Maciel
Resumo: O objetivo desta tese é o desenvolvimento de uma arquitetura multiprocessada e reconfiguravel que permita a implementação física de sistemas de controle descritos por meio de Redes de Petri coloridas de arcos constantes T-temporizadas e que possuam pro- babilidade de disparo nas transições. A arquitetura pode ser utilizada para implementar sistemas de controle (e n~ao para a avaliacao das propriedades da Rede de Petri), permi- tindo a implementacao física por meio de mapeamento tecnologico diretamente no nível comportamental, sem a necessidade de se utilizar um processo de síntese de alto nível para descrever o sistema em equações booleanas e tabelas de transição de estados. A arquitetura é composta por um arranjo de blocos de configuracao denominados BCERPs, por blocos reconfiguráveis denominados BCGNs e por um sistema de comunicacão, implementado por um conjunto de roteadores. Os blocos BCERPs podem ser configurados para implementar as transições da Rede de Petri e seus respectivos lugares de entrada. Blocos BCGNs são utilizados pelos blocos BCERPs para a geração de numeros pseudo-aleatorios. Estes numeros podem definir a probabilidade de disparo das transições e tambem podem ser usados no processo de resolução de conflito, que ocorre quando uma transição possuir um ou mais lugares de entrada compartilhados com outras transições. O sistema de comunicacão possui uma topologia de grelha, tendo como principal função o roteamento e armazenamento de pacotes entre os blocos de configuração. Os roteadores e blocos de configuração BCERPs e BCGNs foram descritos em VHDL e implementados em FPGAs.
Abstract: The goal of this thesis is to develop a reconfigurable multiprocessed architecture that allows the physical implementation of systems described by T-timed colored Petri nets with constant arcs having transitions with firing probabilities. The architecture can be used to implement control systems (not to evaluation Petri net properties). With this architecture, physical implementation of systems can be achieved through technology mapping directly from behavioral level, without the need to go through an expensive high level synthesis process to describe the system into boolean equations and state transition tables. The architecture comprises an array of configuration blocks named BCERPs; reconfigurable blocks named BCGNs; and a communication system implemented using a set of routers. BCERP blocks can be configured to implement Petri net transitions as well as the corresponding input places. BCGN blocks are used by BCERPs for pseudo random number generation. These numbers can define transitions firing probabilities. They can also be used for conflit resolution, which happens when two or more transitions share one or more input places. The communication system presents a grid topology. Its main functions are packet storage and routing among configuration blocks. The routers, BCGNs and BCERPs configuration blocks were described in VHDL and implemented in FPGAs.
Doutor

APA, Harvard, Vancouver, ISO, and other styles

18

Oliveira, Tiago de [UNESP]. "Desenvolvimento de uma arquitetura multiprocessada e reconfigurável para a síntese de redes de Petri em hardware." Universidade Estadual Paulista (UNESP), 2008. http://hdl.handle.net/11449/100361.

Full text

Abstract:

Made available in DSpace on 2014-06-11T19:30:51Z (GMT). No. of bitstreams: 0 Previous issue date: 2008-02-26Bitstream added on 2014-06-13T19:19:32Z : No. of bitstreams: 1 oliveira_t_dr_ilha.pdf: 1857904 bytes, checksum: 58f64d9e638aa2a1040b97776689687b (MD5)
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
O objetivo desta tese é o desenvolvimento de uma arquitetura multiprocessada e reconfiguravel que permita a implementação física de sistemas de controle descritos por meio de Redes de Petri coloridas de arcos constantes T-temporizadas e que possuam pro- babilidade de disparo nas transições. A arquitetura pode ser utilizada para implementar sistemas de controle (e n~ao para a avaliacao das propriedades da Rede de Petri), permi- tindo a implementacao física por meio de mapeamento tecnologico diretamente no nível comportamental, sem a necessidade de se utilizar um processo de síntese de alto nível para descrever o sistema em equações booleanas e tabelas de transição de estados. A arquitetura é composta por um arranjo de blocos de configuracao denominados BCERPs, por blocos reconfiguráveis denominados BCGNs e por um sistema de comunicacão, implementado por um conjunto de roteadores. Os blocos BCERPs podem ser configurados para implementar as transições da Rede de Petri e seus respectivos lugares de entrada. Blocos BCGNs são utilizados pelos blocos BCERPs para a geração de numeros pseudo-aleatorios. Estes numeros podem definir a probabilidade de disparo das transições e tambem podem ser usados no processo de resolução de conflito, que ocorre quando uma transição possuir um ou mais lugares de entrada compartilhados com outras transições. O sistema de comunicacão possui uma topologia de grelha, tendo como principal função o roteamento e armazenamento de pacotes entre os blocos de configuração. Os roteadores e blocos de configuração BCERPs e BCGNs foram descritos em VHDL e implementados em FPGAs.
The goal of this thesis is to develop a reconfigurable multiprocessed architecture that allows the physical implementation of systems described by T-timed colored Petri nets with constant arcs having transitions with firing probabilities. The architecture can be used to implement control systems (not to evaluation Petri net properties). With this architecture, physical implementation of systems can be achieved through technology mapping directly from behavioral level, without the need to go through an expensive high level synthesis process to describe the system into boolean equations and state transition tables. The architecture comprises an array of configuration blocks named BCERPs; reconfigurable blocks named BCGNs; and a communication system implemented using a set of routers. BCERP blocks can be configured to implement Petri net transitions as well as the corresponding input places. BCGN blocks are used by BCERPs for pseudo random number generation. These numbers can define transitions firing probabilities. They can also be used for conflit resolution, which happens when two or more transitions share one or more input places. The communication system presents a grid topology. Its main functions are packet storage and routing among configuration blocks. The routers, BCGNs and BCERPs configuration blocks were described in VHDL and implemented in FPGAs.

APA, Harvard, Vancouver, ISO, and other styles

19

Brunie, Nicolas. "Contribution à l'arithmétique des ordinateurs et applications aux systèmes embarqués." Thesis, Lyon, École normale supérieure, 2014. http://www.theses.fr/2014ENSL0894/document.

Full text

Abstract:

Au cours des dernières décennies les systèmes embarqués ont dû faire face à des demandes applicatives de plus en plus variées et de plus en plus contraintes. Ce constat s'est traduit pour l’arithmétique par le besoin de toujours plus de performances et d'efficacité énergétique. Ce travail se propose d'étudier des solutions allant du matériel au logiciel, ainsi que les diverses interactions qui existent entre ces domaines, pour améliorer le support arithmétique dans les systèmes embarqués. Certains résultats ont été intégrés au processeur MPPA développé par Kalray. La première partie est consacrée au support de l'arithmétique virgule flottante dans le MPPA. Elle commence par la mise au point d'une unité flottante matérielle basée sur l'opérateur classique FMA (fused multiply-Add). Les améliorations proposées, implémentées et évaluées incluent un FMA à précision mixte, l'addition à 3 opérandes et le produit scalaire 2D, à chaque fois avec un seul arrondi et le support des sous-Normaux. Cette partie se poursuit par l'étude de l'implémentation des autres primitives flottantes normalisées : division et racine carrée. L'unité flottante matérielle précédente est réutilisée et modifiée pour optimiser ces primitives à moindre coût. Cette première partie s’ouvre sur le développement d'un générateur de code destiné à l'implémentation de bibliothèques mathématiques optimisées pour différents contextes (architecture, précision, latence, débit). La seconde partie consiste en la présentation d'une nouvelle architecture de coprocesseur reconfigurable. Cet opérateur matériel peut être dynamiquement modifié pour s'adapter à la volée à des besoins applicatifs variés. Il vise à fournir des performances se rapprochant d'une implémentation matérielle dédiée sans renier la flexibilité inhérente au logiciel. Il a été spécifiquement pensé pour être intégré avec un cœur embarqué faible consommation du MPPA. Cette partie s'attache aussi à décrire le développement d'un environnement logiciel pour cibler ce coprocesseur ainsi qu'explorer divers choix architecturaux envisagés. La dernière partie étudie un problème plus large : l'utilisation efficace de ressources arithmétiques parallèles. Elle présente une amélioration des architectures régulières Single Instruction Multiple Data tels qu’on les trouve dans les accélérateurs graphiques (GPU) pour l'exécution de graphes de flot de contrôle divergents
In the last decades embedded systems have been challenged with more and more application variety, each time more constrained. This implies an ever growing need for performances and energy efficiency in arithmetic units. This work studies solutions ranging from hardware to software to improve arithmetic support in embedded systems. Some of these solutions were integrated in Kalray's MPPA processor. The first part of this work focuses on floating-Point arithmetic support in the MPPA. It starts with the design of a floating-Point unit (FPU) based on the classical FMA (Fused Multiply-Add) operator. The improvements we suggest, implement and evaluate include a mixed precision FMA, a 3-Operand add and a 2D scalar product, each time with a single rounding and support for subnormal numbers. It then considers the implementation of division and square root. The FPU is reused and modified to optimize the software implementations of those primitives at a lower cost. Finally, this first part opens up on the development of a code generator designed for the implementation of highly optimized mathematical libraries in different contexts (architecture, accuracy, latency, throughput). The second part studies a reconfigurable coprocessor, a hardware operator that could be dynamically modified to adapt on the fly to various applicative needs. It intends to provide performance close to ASIC implementation, with some of the flexibility of software. One of the addressed challenges is the integration of such a reconfigurable coprocessor into the low power embedded cluster of the MPPA. Another is the development of a software framework targeting the coprocessor and allowing design space exploration. The last part of this work leaves micro-Architecture considerations to study the efficient use of parallel arithmetic resources. It presents an improvement of regular architectures (Single Instruction Multiple Data), like those found in graphic processing units (GPU), for the execution of divergent control flow graphs

APA, Harvard, Vancouver, ISO, and other styles

20

Foucher, Clément. "Méthodologie de conception pour la virtualisation et le déploiement d'applications parallèles sur plateforme reconfigurable matériellement." Phd thesis, Université Nice Sophia Antipolis, 2012. http://tel.archives-ouvertes.fr/tel-00777511.

Full text

Abstract:

Les applications auto-adaptatives, dont le comportement évolue en fonction de l'environnement, sont un élément clé des systèmes de demain. L'utilisation de matériel reconfigurable, combiné à la parallélisation des unités de calcul, permettent d'envisager de nouveaux niveaux de performances pour ces mêmes applications. L'objectif de cette thèse est de mettre en place un ensemble d'outils permettant la description et le déploiement d'applications parallèles auto-adaptatives. Nous proposons à la fois un modèle d'application parallèle et une architecture de plateforme reconfigurable destinée au déploiement des applications conçues en utilisant ce modèle. Notre modèle d'applications sépare le contrôle du calcul en isolant ce dernier en différents noyaux virtuels. Associée à une représentation du contrôle indépendante de la plateforme, la structure de l'application est donc totalement portable. Indépendamment de celle-ci, les noyaux de calcul peuvent être distribués selon plusieurs implémentations, selon la plateforme, mais également pour proposer différents niveaux de performances. Une couche de virtualisation permet de faire le lien entre la partie contrôle et les noyaux, en traduisant les ordres génériques en actions adaptées à l'implémentation. Concernant la plateforme, nous proposons une architecture permettant l'intégration de ressources de calcul logicielles et matérielles, pouvant être implémentées tant statiquement qu'en utilisant du matériel reconfigurable. Cette architecture parallèle, inspirée du modèle des supercalculateurs, doit permettre d'utiliser tout type d'unités d'exécution et de matériel reconfigurable comme base matérielle pour la plateforme.

APA, Harvard, Vancouver, ISO, and other styles

21

Almeida, Manoel Aranda de. "Sistema embarcado reconfigurável de forma estática por programação genética utilizando hardware evolucionário híbrido." Universidade Federal de São Carlos, 2016. https://repositorio.ufscar.br/handle/ufscar/8000.

Full text

Abstract:

Submitted by Izabel Franco (izabel-franco@ufscar.br) on 2016-10-03T18:47:50Z No. of bitstreams: 1 DissMAA.pdf: 3325891 bytes, checksum: 1b4744d48d74943990bed42753cc4b4c (MD5)
Approved for entry into archive by Marina Freitas (marinapf@ufscar.br) on 2016-10-20T18:27:58Z (GMT) No. of bitstreams: 1 DissMAA.pdf: 3325891 bytes, checksum: 1b4744d48d74943990bed42753cc4b4c (MD5)
Approved for entry into archive by Marina Freitas (marinapf@ufscar.br) on 2016-10-20T18:28:04Z (GMT) No. of bitstreams: 1 DissMAA.pdf: 3325891 bytes, checksum: 1b4744d48d74943990bed42753cc4b4c (MD5)
Made available in DSpace on 2016-10-20T18:28:13Z (GMT). No. of bitstreams: 1 DissMAA.pdf: 3325891 bytes, checksum: 1b4744d48d74943990bed42753cc4b4c (MD5) Previous issue date: 2016-03-04
Não recebi financiamento
The use of technology based on Field Programmable Gate Arrays (FPGAs), a reconfigurable technology, has become a frequent object of study. This technique is feasible and a promising application in the development of embedded systems, however, the difficulty in finding a flexible and efficient way to perform such an application is their bigger problem. In this work, a virtual and reconfigurable architecture (AVR) in FPGA for hardware applications is presented using a Genetic Programming Software on the development of an optimal reconfiguration for this AVR, in order to build a hardware capable of performing a given task in an embedded system. This proposal is a simple, flexible and efficient way to achieve appropriate applications in embedded systems, when compared to other reconfigurable hardware techniques. The representation of phenotype of the proposed evolutionary system is based on a bi-dimensional network function elements (EF). The GPLAB tool for MATLAB is used in Genetic Programming, and the solution found by this procedure is converted into a memory mapping to represent the best solution, where it is used to reconfigure the hardware. In the tests, GPLAB found results for logic circuits in a few generations, and for image filters containing efficient solutions, where there was little hardware occupation, especially memory, in the cases this has been presented, with a reduced chromosome size, shows a proposal efficiency.
O uso da tecnologia baseada em Field Programmable Gate Arrays (FPGAs), de forma reconfigurável, para a solução de diversos problemas atuais, tem se tornado um frequente objeto de estudo. Essa técnica é de aplicação viável e promissora na elaboração de sistemas embarcados, porém, a dificuldade em encontrar uma forma flexível e eficiente de realizar tal aplicação é o seu maior problema. Neste trabalho, é apresentada uma arquitetura virtual e reconfigurável (AVR) em FPGA para aplicações em hardware, utilizando um software de Programação Genética na elaboração de uma reconfiguração ótima para esta AVR, de forma a construir um hardware capaz de efetuar uma determinada tarefa em um sistema embarcado. Esta proposta é uma forma simples, flexível e eficiente de realizar aplicações adequadas em sistemas embarcados, quando comparada a outras técnicas de hardware reconfigurável. A representação do fenótipo no sistema evolutivo proposto se baseia em uma rede de elementos de função (EF) bidimensional. A ferramenta GPLAB, para MATLAB, é usada na Programação Genética, e a solução encontrada por esta é convertida em um mapeamento de memória com o cromossomo da melhor solução, onde este é usado para reconfigurar o hardware. Nos testes realizados, a GPLAB encontrou resultados para circuitos lógicos em poucas gerações, e para filtros de imagem encontrou soluções eficientes, onde ocorreu pouca ocupação de hardware, principalmente da memória nos casos apresentados, apresentando um cromossomo de tamanho reduzido, o que demonstra uma boa eficiência da proposta.

APA, Harvard, Vancouver, ISO, and other styles

22

Zhu, Yiqun. "An investigation into hardware architectures of reconfigurable convolutional decoders." Thesis, University of Sheffield, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.403256.

Full text

APA, Harvard, Vancouver, ISO, and other styles

23

Moss, Duncan J. M. "FPGA Architectures for Low Precision Machine Learning." Thesis, The University of Sydney, 2017. http://hdl.handle.net/2123/18182.

Full text

Abstract:

Machine learning is fast becoming a cornerstone in many data analytic, image processing and scientific computing applications. Depending on the deployment scale, these tasks can either be performed on embedded devices, or larger cloud computing platforms. However, one key trend is an exponential increase in the required compute power as data is collected and processed at a previously unprecedented scale. In an effort to reduce the computational complexity there has been significant work on reduced precision representations. Unlike Central Processing Units, Graphical Processing Units and Applications Specific Integrated Circuits which have fixed datapaths, Field Programmable Gate Arrays (FPGA) are flexible and uniquely positioned to take advantage of reduced precision representations. This thesis presents FPGA architectures for low precision machine learning algorithms, considering three distinct levels: the application, the framework and the operator. Firstly, a spectral anomaly detection application is presented, designed for low latency and real-time processing of radio signals. Two types of detector are explored, a neural network autoencoder and least squares bitmap detector. Secondly, a generalised matrix multiplication framework for the Intel HARPv2 is outlined. The framework was designed specifically for machine learning applications; containing runtime configurable optimisations for reduced precision deep learning. Finally, a new machine learning specific operator is presented. A bit-dependent multiplication algorithm designed to conditionally add only the relevant parts of the operands and arbitrarily skip over redundant computation. Demonstrating optimisations on all three levels; the application, the framework and the operator, illustrates that FPGAs can achieve state-of-the-art performance in important machine learning workloads where high performance is critical; while simultaneously reducing implementation complexity.

APA, Harvard, Vancouver, ISO, and other styles

24

Fröhlich, Dominik. "Object-Oriented Development for Reconfigurable Architectures." Doctoral thesis, Technische Universitaet Bergakademie Freiberg Universitaetsbibliothek "Georgius Agricola&quot, 2009. http://nbn-resolving.de/urn:nbn:de:bsz:105-802464.

Full text

Abstract:

Reconfigurable hardware architectures have been available now for several years. Yet the application development for such architectures is still a challenging and error-prone task, since the methods, languages, and tools being used for development are inappropriate to handle the complexity of the problem. This thesis introduces a novel approach that tackles the complexity challenge by raising the level of abstraction to system-level and increasing the degree of automation. The approach is centered around the paradigms of object-orientation, platforms, and modeling. An application and all platforms being used for its design, implementation, and deployment are modeled with objects using UML and an action language. The application model is then transformed into an implementation, whereby the transformation is steered by the platform models. In this thesis solutions for the relevant problems behind this approach are discussed. It is shown how UML can be used for complete and precise modeling of applications and platforms. Application development is done at the system-level using a set of well-defined, orthogonal platform models. Thereby the core features of object-orientation - data abstraction, encapsulation, inheritance, and polymorphism - are fully supported. Novel algorithms are presented, that allow for an automatic mapping of such application models to the target architecture. Thereby the problems of platform mapping, estimation of implementation characteristics, and synthesis of UML models are discussed. The thesis explores the utilization of platform models for generation of highly optimized implementations in an automatic yet adaptable way. The approach is evaluated by a number of relevant applications. The execution of the generated implementations is supported by a run-time service. This service manages the hardware configurations and objects comprising the application. Moreover, it serves as broker for hardware objects. The efficient management of configurations and objects at run-time is discussed and optimized life cycles for these entities are proposed. Mechanisms are presented that make the approach portable among different physical hardware architectures. Further, this thesis presents UML profiles and example platforms that support system-level design. These extensions are embodied in a novel type of model compiler. The compiler is accompanied by an implementation of the run-time service. Both have been used to evaluate and improve the presented concepts and algorithms.

APA, Harvard, Vancouver, ISO, and other styles

25

Hussain, Hanaa Mohammad. "Dynamically and partially reconfigurable hardware architectures for high performance microarray bioinformatics data analysis." Thesis, University of Edinburgh, 2012. http://hdl.handle.net/1842/7645.

Full text

Abstract:

The field of Bioinformatics and Computational Biology (BCB) is a multidisciplinary field that has emerged due to the computational demands of current state-of-the-art biotechnology. BCB deals with the storage, organization, retrieval, and analysis of biological datasets, which have grown in size and complexity in recent years especially after the completion of the human genome project. The advent of Microarray technology in the 1990s has resulted in the new concept of high throughput experiment, which is a biotechnology that measures the gene expression profiles of thousands of genes simultaneously. As such, Microarray requires high computational power to extract the biological relevance from its high dimensional data. Current general purpose processors (GPPs) has been unable to keep-up with the increasing computational demands of Microarrays and reached a limit in terms of clock speed. Consequently, Field Programmable Gate Arrays (FPGAs) have been proposed as a low power viable solution to overcome the computational limitations of GPPs and other methods. The research presented in this thesis harnesses current state-of-the-art FPGAs and tools to accelerate some of the most widely used data mining methods used for the analysis of Microarray data in an effort to investigate the viability of the technology as an efficient, low power, and economic solution for the analysis of Microarray data. Three widely used methods have been selected for the FPGA implementations: one is the un-supervised Kmeans clustering algorithm, while the other two are supervised classification methods, namely, the K-Nearest Neighbour (K-NN) and Support Vector Machines (SVM). These methods are thought to benefit from parallel implementation. This thesis presents detailed designs and implementations of these three BCB applications on FPGA captured in Verilog HDL, whose performance are compared with equivalent implementations running on GPPs. In addition to acceleration, the benefits of current dynamic partial reconfiguration (DPR) capability of modern Xilinx’ FPGAs are investigated with reference to the aforementioned data mining methods. Implementing K-means clustering on FPGA using non-DPR design flow has outperformed equivalent implementations in GPP and GPU in terms of speed-up by two orders and one order of magnitude, respectively; while being eight times more power efficient than GPP and four times more than a GPU implementation. As for the energy efficiency, the FPGA implementation was 615 times more energy efficient than GPPs, and 31 times more than GPUs. Over and above, the FPGA implementation outperformed the GPP and GPU implementations in terms of speed-up as the dimensionality of the Microarray data increases. Additionally, the DPR implementations of the K-means clustering have shown speed-up in partial reconfiguration time of ~5x and 17x over full chip reconfiguration for single-core and eight-core implementations, respectively. Two architectures of the K-NN classifier have been implemented on FPGA, namely, A1 and A2. The K-NN implementation based on A1 architecture achieved a speed-up of ~76x over an equivalent GPP implementation whereas the A2 architecture achieved ~68x speedup. Furthermore, the FPGA implementation outperformed the equivalent GPP implementation when the dimensionality of data was increased. In addition, The DPR implementations of the K-NN classifier have achieved speed-ups in reconfiguration time between ~4x to 10x over full chip reconfiguration when reconfiguring portion of the classifier or the complete classifier. Similar to K-NN, two architectures of the SVM classifier were implemented on FPGA whereby the former outperformed an equivalent GPP implementation by ~61x and the latter by ~49x. As for the DPR implementation of the SVM classifier, it has shown a speed-up of ~8x in reconfiguration time when reconfiguring the complete core or when exchanging it with a K-NN core forming a multi-classifier. The aforementioned implementations clearly show FPGAs to be an efficacious, efficient and economic solution for bioinformatics Microarrays data analysis.

APA, Harvard, Vancouver, ISO, and other styles

26

Farag, Mohammed Morsy Naeem. "Architectural Enhancements to Increase Trust in Cyber-Physical Systems Containing Untrusted Software and Hardware." Diss., Virginia Tech, 2012. http://hdl.handle.net/10919/29084.

Full text

Abstract:

Embedded electronics are widely employed in cyber-physical systems (CPSes), which tightly integrate and coordinate computational and physical elements. CPSes are extensively deployed in security-critical applications and nationwide infrastructure. Perimeter security approaches to preventing malware infiltration of CPSes are challenged by the complexity of modern embedded systems incorporating numerous heterogeneous and updatable components. Global supply chains and third-party hardware components, tools, and software limit the reach of design verification techniques and introduce security concerns about deliberate Trojan inclusions. As a consequence, skilled attacks against CPSes have demonstrated that these systems can be surreptitiously compromised. Existing run-time security approaches are not adequate to counter such threats because of either the impact on performance and cost, lack of scalability and generality, trust needed in global third parties, or significant changes required to the design flow. We present a protection scheme called Run-time Enhancement of Trusted Computing (RETC) to enhance trust in CPSes containing untrusted software and hardware. RETC is complementary to design-time verification approaches and serves as a last line of defense against the rising number of inexorable threats against CPSes. We target systems built using reconfigurable hardware to meet the flexibility and high-performance requirements of modern security protections. Security policies are derived from the system physical characteristics and component operational specifications and translated into synthesizable hardware integrated into specific interfaces on a per-module or per-function basis. The policy-based approach addresses many security challenges by decoupling policies from system-specific implementations and optimizations, and minimizes changes required to the design flow. Interface guards enable in-line monitoring and enforcement of critical system computations at run-time. Trust is only required in a small set of simple, self-contained, and verifiable guard components. Hardware trust anchors simultaneously addresses the performance, flexibility, developer productivity, and security requirements of contemporary CPSes. We apply RETC to several CPSes having common security challenges including: secure reconfiguration control in reconfigurable cognitive radio platforms, tolerating hardware Trojan threats in third-party IP cores, and preserving stability in process control systems. High-level architectures demonstrated with prototypes are presented for the selected applications. Implementation results illustrate the RETC efficiency in terms of the performance and overheads of the hardware trust anchors. Testbenches associated with the addressed threat models are generated and experimentally validated on reconfigurable platform to establish the protection scheme efficacy in thwarting the selected threats. This new approach significantly enhances trust in CPSes containing untrusted components without sacrificing cost and performance.
Ph. D.

APA, Harvard, Vancouver, ISO, and other styles

27

Parris, Matthew. "OPTIMIZING DYNAMIC LOGIC REALIZATIONS FOR PARTIAL RECONFIGURATION OF FIELD PROGRAMMABLE GATE ARRAYS." Master's thesis, University of Central Florida, 2008. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4128.

Full text

Abstract:

Many digital logic applications can take advantage of the reconfiguration capability of Field Programmable Gate Arrays (FPGAs) to dynamically patch design flaws, recover from faults, or time-multiplex between functions. Partial reconfiguration is the process by which a user modifies one or more modules residing on the FPGA device independently of the others. Partial Reconfiguration reduces the granularity of reconfiguration to be a set of columns or rectangular region of the device. Decreasing the granularity of reconfiguration results in reduced configuration filesizes and, thus, reduced configuration times. When compared to one bitstream of a non-partial reconfiguration implementation, smaller modules resulting in smaller bitstream filesizes allow an FPGA to implement many more hardware configurations with greater speed under similar storage requirements. To realize the benefits of partial reconfiguration in a wider range of applications, this thesis begins with a survey of FPGA fault-handling methods, which are compared using performance-based metrics. Performance analysis of the Genetic Algorithm (GA) Offline Recovery method is investigated and candidate solutions provided by the GA are partitioned by age to improve its efficiency. Parameters of this aging technique are optimized to increase the occurrence rate of complete repairs. Continuing the discussion of partial reconfiguration, the thesis develops a case-study application that implements one partial reconfiguration module to demonstrate the functionality and benefits of time multiplexing and reveal the improved efficiencies of the latest large-capacity FPGA architectures. The number of active partial reconfiguration modules implemented on a single FPGA device is increased from one to eight to implement a dynamic video-processing architecture for Discrete Cosine Transform and Motion Estimation functions to demonstrate a 55-fold reduction in bitstream storage requirements thus improving partial reconfiguration capability.
M.S.Cp.E.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Engineering MSCpE

APA, Harvard, Vancouver, ISO, and other styles

28

Ló, Thiago Berticelli. "Virtualização de hardware e exploração da memória de contexto em arquiteturas reconfiguráveis." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2012. http://hdl.handle.net/10183/66195.

Full text

Abstract:

Arquiteturas reconfiguráveis têm se demonstrado uma potencial solução para lidar com a crescente complexidade encontrada em sistemas embarcados. Para se alcançar ganhos em desempenho, é preciso uma grande redundância das unidades funcionais, acarretando o aumento da área ocupada pelas unidades funcionais. Uma das propostas deste trabalho será de explorar o espaço de projeto, visando à redução da área e da energia. Para isto, serão apresentadas duas técnicas de virtualização de hardware, sendo as mesmas semelhantes a um pipeline de estágios reconfiguráveis. Ambas as técnicas alcançaram mais de 94% de redução da área. Outro aspecto a ser explorado em uma arquitetura reconfigurável é o impacto em área e energia causado pela inserção da memória de contexto. Assim, este impacto será demonstrado neste trabalho e duas abordagens que modificam a memória de contexto serão propostas: a primeira abordagem baseia-se na exploração da largura ideal da porta da memória combinado com número de acessos, para que se minimize a energia consumida na busca dos bytes de configuração; a segunda abordagem possui um mecanismo de gerenciamento das configurações por meio de listas ligadas, que permite que as configurações sejam acessadas parcialmente. As duas abordagens apresentaram redução de energia de até 98%, podendo ser utilizadas em sistemas que apresentam tanto a reconfiguração parcial como a total.
Reconfigurable architectures have shown to be a potential solution to the problem of increasing complexity found in embedded systems. However, in order to achieve significant performance gains, large quantities of redundant functional units are generally necessary, with a corresponding increase in the area occupied by these units. This thesis explores the design space with the objective of reducing both area and energy consumption, and presents two hardware virtualization techniques, similar to reconfigurable pipeline stages, which achieve a reduction in area of more than 94%. The use of context memory in reconfigurable architectures has a significant impact in terms of area and energy, as is clearly demonstrated by initial experimental results. Two novel context memory architectures are presented: the first approach is being based on an exploration of the balance point between memory port width and number of accesses, in order to reduce the energy consumed during fetching of the configuration bytes; the second approach presents a configuration management mechanism using hardware linked lists, and that allows segmented access to configuration settings. Both approaches demonstrate energy reduction of up to 98% and can be adopted in both partial and atomic reconfiguration architectures.

APA, Harvard, Vancouver, ISO, and other styles

29

Fazzoletto, Emilio. "Characterization of Partial and Run-Time Reconfigurable FPGAs." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-202724.

Full text

Abstract:

FPGA based systems have been heavily used to prototype and test Application Specic Integrated Circuit (ASIC) designs with much lower costs and development time compared to hardwired prototypes. In recentyears, thanks to both the latest technology nodes and a change in the architecture of reconfigurable integrated circuits (from traditional Complex Programmable Logic Device (CPLD) to full-CMOS FPGA), FPGAs have become more popular in embedded systems, both as main computation resources and as hardware accelerators. A new era is beginning for FPGA based systems: the partial run-time reconguration of a FPGA is a feature now available in products already on the market and hardware designers and software developers have to exploit this capability. Previous works show that, when designed properly, a system can improve both its power efficiency and its performance taking advantage of a partial run-time reconfigurable architecture. Unfortunately, taking advantage of run-time reconfigurable hardware is very challenging and there are several problems to face: the reconfiguration overhead is not negligible compared to nowadays CPUs performance,the reconfiguration time is not easily predictable, and the software has to be re-though to work with a time-evolving platform. This thesis project aims to investigate the performance of a modern run-time reconfigurable SoC (a Xilinx Zynq 7020), focusing on the reconfiguration overhead and its predictability, on the achievable speedup, and the trade-off and limits of this kind of platform. Since it is not always obvious when an application (especially a real-time one) is really able to use at its own advantage a partial run-time reconfigurable platform, the data collected during this project could be a valid help for hardware designers that use reconfigurable computing.
FPGA-baserade system har tidigare främst använts för snabb och kostnadseffektiv konstruktion av prototyper vid framtagandet av applikationsspecika integrerade kretsar (ASIC). På senare år har användandet av FPGA:er i inbyggda system för implementation av hårdvaruacceleratorers såväl som huvudsaklig beräkningsenhet ökat. Denna ökning har möjliggjorts mycket tack vare den utveckling som har skett av rekonfigurerbara integrerade kretsar: från de mer traditionella Complex Programmable Logic Devices (CPLD) till helt CMOS-baserade FPGA:er. Nu inleds en ny era för FPGA-baserade system tack vare möjligheten att under körning rekonfigurera delar av FPGA:n genom så kallad partial run-time reconguration(RTR) - en teknik som redan idag finns tillgänglig i produkter på marknaden. Tidigare forskning visar att användandet av en RTR-baserad hårdvaruarkitektur kan ha en positiv effekt med avseende på prestanda såväl som strömförbrukning. Att använda RTR-baserad hårdvara innebär dock flera utmaningar: En ej försumbar rekonfigurationstid måste tas i beaktning, så även den icke-deterministiska exekveringstiden som en rekonfiguration kan innebära. Vidare måste anpassningar av mjukvaran göras för att fungera med en hårdvaruplattform som förändras över tid. Denna uppsats syftar till att undersöka prestandan hos ett modernt RTRbaserat SoC (Xilinx Zynq 7020) med fokus på rekonfigurationstider och dess förutsägbarhet, prestanda ökning, begränsningar samt nödvändiga kompromisser som denna arkitektur innebär. Huruvida en applikation kan dra nytta av en RTR-baserad arkitektur eller inte kan vara svårt att avgöra. Den insamlade datan som presenteras i denna rapport kan dock fungera som stöd för hårdvarukonstruktörer som önskar använda en RTR-baserad plattform.

APA, Harvard, Vancouver, ISO, and other styles

30

Afonso, George. "Vers une nouvelle génération de systèmes de test et de simulation avionique dynamiquement reconfigurables." Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2013. http://tel.archives-ouvertes.fr/tel-00921874.

Full text

Abstract:

L'objectif de cette thèse est la proposition de nouvelles solutions dans le domaine des systèmes de test et de simulation avioniques et ce, à plusieurs niveaux. Dans un premier temps, nous avons proposé un modèle d'exécution dynamique permettant d'unifier les métiers du test et de la simulation, de répondre aux contraintes imposées, d'apporter de nouvelles possibilités et ainsi d'accélérer le cycle de développement des futurs équipements embarqués. Ensuite, un support matériel basé sur une architecture hétérogène CPU-FPGA a été défini afin de répondre à la problématique proposée et aux contraintes imposées par le domaine d'application telles que le respect du temps-réel et la capacité de reconfiguration dynamique hétérogène. A ce support matériel, est venue s'ajouter une méthodologie de développement permettant une meilleure prise en charge du code "legacy" de l'industriel. Enfin, un environnement unifié temps réel mou pour le test et la simulation avionique a été mis en avant, permettant de diminuer les coûts liés à la maîtrise et à la maintenance d'un nouvel environnement. Finalement, une étude de cas a permis de mettre en avant les capacités de reconfiguration dynamique et les performances de l'environnement développé.

APA, Harvard, Vancouver, ISO, and other styles

31

Lalevée, André. "Towards highly flexible hardware architectures for high-speed data processing : a 100 Gbps network case study." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2017. http://www.theses.fr/2017IMTA0054/document.

Full text

Abstract:

L’augmentation de la taille des réseaux actuels ainsi que de la diversité des applications qui les utilisent font que les architectures de calcul traditionnelles deviennent limitées. En effet, les architectures purement logicielles ne permettent pas de tenir les débits en jeu, tandis que celles purement matérielles n’offrent pas assez de flexibilité pour répondre à la diversité des applications. Ainsi, l’utilisation de solutions de type matériel programmable, en particulier les Field Programmable Gate Arrays (FPGAs), a été envisagée. En effet, ces architectures sont souvent considérées comme un bon compromis entre performances et flexibilité, notamment grâce à la technique de Reconfiguration Dynamique Partielle (RDP), qui permet de modifier le comportement d’une partie du circuit pendant l’exécution. Cependant, cette technique peut présenter des inconvénients lorsqu’elle est utilisée de manière intensive, en particulier au niveau du stockage des fichiers de configuration, appelés bitstreams. Pour palier ce problème, il est possible d’utiliser la relocation de bitstreams, permettant de réduire le nombre de fichiers de configuration. Cependant cette technique est fastidieuse et exige des connaissances pointues dans les FPGAs. Un flot de conception entièrement automatisé a donc été développé dans le but de simplifier son utilisation.Pour permettre une flexibilité sur l’enchaînement des traitements effectués, une architecture de communication flexible supportant des hauts débits est également nécessaire. Ainsi, l’étude de Network-on-Chips dédiés aux circuits reconfigurables et au traitements réseaux à haut débit.Enfin, un cas d’étude a été mené pour valider notre approche
The increase in both size and diversity of applications regarding modern networks is making traditional computing architectures limited. Indeed, purely software architectures can not sustain typical throughputs, while purely hardware ones severely lack the flexibility needed to adapt to the diversity of applications. Thus, the investigation of programmable hardware, such as Field Programmable Gate Arrays (FPGAs), has been done. These architectures are indeed usually considered as a good tradeoff between performance and flexibility, mainly thanks to the Dynamic Partial Reconfiguration (DPR), which allows to reconfigure a part of the design during run-time.However, this technique can have several drawbacks, especially regarding the storing of the configuration files, called bitstreams. To solve this issue, bitstream relocation can be deployed, which allows to decrease the number of configuration files required. However, this technique is long, error-prone, and requires specific knowledge inFPGAs. A fully automated design flow has been developped to ease the use of this technique. In order to provide flexibility regarding the sequence of treatments to be done on our architecture, a flexible and high-throughput communication structure is required. Thus, a Network-on-Chips study and characterization has been done accordingly to network processing and bitstream relocation properties. Finally, a case study has been developed in order to validate our approach

APA, Harvard, Vancouver, ISO, and other styles

32

Marques, Nicolas. "Méthodologie et architecture adaptative pour le placement efficace de tâches matérielles de tailles variables sur des partitions reconfigurables." Thesis, Université de Lorraine, 2012. http://www.theses.fr/2012LORR0139/document.

Full text

Abstract:

Les architectures reconfigurables à base de FPGA sont capables de fournir des solutions adéquates pour plusieurs applications vu qu'elles permettent de modifier le comportement d'une partie du FPGA pendant que le reste du circuit continue de s'exécuter normalement. Ces architectures, malgré leurs progrès, souffrent encore de leur manque d'adaptabilité fasse à des applications constituées de tâches matérielles de taille différente. Cette hétérogénéité peut entraîner de mauvais placements conduisant à une utilisation sous-optimale des ressources et par conséquent une diminution des performances du système. La contribution de cette thèse porte sur la problématique du placement des tâches matérielles de tailles différentes et de la génération efficace des régions reconfigurables. Une méthodologie et une couche intermédiaire entre le FPGA et l'application sont proposées pour permettre le placement efficace des tâches matérielles de tailles différentes sur des partitions reconfigurables de taille prédéfinie. Pour valider la méthode, on propose une architecture basée sur l'utilisation de la reconfiguration partielle afin d'adapter le transcodage d'un format de compression vidéo à un autre de manière souple et efficace. Une étude sur le partitionnement de la région reconfigurable pour les tâches matérielles de l'encodeur entropique (CAVLC / VLC) est proposée afin de montrer l'apport du partitionnement. Puis une évaluation du gain obtenu et du surcoût de la méthode est présentée
FPGA-based reconfigurable architectures can deliver appropriate solutions for several applications as they allow for changing the performance of a part of the FPGA while the rest of the circuit continues to run normally. These architectures, despite their improvements, still suffer from their lack of adaptability when confronted with applications consisting of variable size material tasks. This heterogeneity may cause wrong placements leading to a sub-optimal use of resources and therefore a decrease in the system performances. The contribution of this thesis focuses on the problematic of variable size material task placement and reconfigurable region effective generation. A methodology and an intermediate layer between the FPGA and the application are proposed to allow for the effective placement of variable size material tasks on reconfigurable partitions of a predefined size. To approve the method, we suggest an architecture based on the use of partial reconfiguration in order to adapt the transcoding of one video compression format to another in a flexible and effective way. A study on the reconfigurable region partitioning for the entropy encoder material tasks (CAVLC / VLC) is proposed in order to show the contribution of partitioning. Then an assessment of the gain obtained and of the method additional costs is submitted

APA, Harvard, Vancouver, ISO, and other styles

33

Gonsales, Alex Dias. "Projeto de uma Nova Arquitetura de FPGA para aplicações BIST e DSP." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2002. http://hdl.handle.net/10183/12010.

Full text

Abstract:

Os sistemas eletrônicos digitais estão sendo cada vez mais utilizados em aplicações de telecomunicações, processamento de voz, instrumentação, biomedicina e multimídia. A maioria dessas aplicações requer algum tipo de processamento de sinal, sendo que essa função normalmente é executada em grande parte por um bloco digital. Além disso, considerando-se os diversos tipos de circuitos existentes num sistema, tais como memórias RAM (Random Access Memory) e ROM (Read Only Memory), partes operativas e partes de controle complexas, é cada vez mais importante a preocupação com o teste desses sistemas complexos. O aumento da complexidade dos circuitos a serem testados exige também um aumento na complexidade dos circuitos testadores (teste externo), tornando estes últimos muito caros. Uma alternativa viável é integrar algumas ou todas as funções de teste no próprio chip a ser testado. Por outro lado, essa estratégia pode resultar em um custo proibitivo em termos de área em silício.É interessante observar, no entanto, que se os testes e a função de processamento de sinal não necessitarem ser executados em paralelo, então é possível utilizar uma única área reconfigurável para realizar essas funções de uma maneira sequencial. Logo, este trabalho propõe uma arquitetura reconfigurável otimizada para a implementação desses dois tipos de circuitos (processamento digital de sinais e teste). Com esta abordagem pretende-se ter ganhos de área em relação tanto a uma implementação dedicada (full-custom) quanto a uma implementação em dispositivos reconfiguráveis comerciais. Para validar essas idéias, a arquitetura proposta é descrita em uma linguagem de descrição de hardware, e são mapeados e simulados algoritmos de teste e de processamento de sinais nessa arquitetura. S˜ao feitas estimativas da área ocupada pelas três abordagens (dedicada, dispositivo reconfigurável comercial e nova arquitetura proposta), bem como uma análise comparativa entre as mesmas. Também são feitas estimativas de atraso e frequência máxima de operação.
Digital electronic systems have been increasingly used in a large spectrum of applications, such as communication, voice processing, instrumentation, biomedicine, and multimedia. Most of these applications require some kind of signal processing. Most of this task is usually performed by a digital block. Moreover, these complex systems are composed of different kinds of circuits, such as RAM (Random Access Memory) and ROM (Read Only Memory) memories, complex datapaths and control parts. This way, the test of such systems is ever more important. Likewise, the increasingly complexity of the circuits to be tested requires more complex testers (external test), making the latter more expensive. An approach to address this problem is to embbed the test functions onto the chip to be tested itself. Nevertheless, this approach may bring a prohibitive cost in terms of area on silicon. However, if the test and the signal processing functions are not required to run in parallel, then it is possible to use the same reconfigurable area to implement these functions one after another. Thus, this work proposes an optimized reconfigurable architecture to implement this kind of circuits (digital signal processing and test). This approach intends to decrease the occupied area in comparison to a dedicated and also to a comercial reconfigurable device implementation. To validate these ideas, the proposed architecture is described using a hardware description language and some test and digital signal processing applications are mapped and simulated on this architecture. In this work an estimative of the occupied area by the three approaches (dedicated, comercial reconfigurable device, and the new proposed architecture) as well as a comparison analysis between them are performed. Likewise, a delay estimate is performed and the maximum operation frequency is evaluated.

APA, Harvard, Vancouver, ISO, and other styles

34

Imran, Naveed. "Autonomous Recovery of Reconfigurable Logic Devices using Priority Escalation of Slack." Doctoral diss., University of Central Florida, 2013. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5949.

Full text

Abstract:

Field Programmable Gate Array (FPGA) devices offer a suitable platform for survivable hardware architectures in mission-critical systems. In this dissertation, active dynamic redundancy-based fault-handling techniques are proposed which exploit the dynamic partial reconfiguration capability of SRAM-based FPGAs. Self-adaptation is realized by employing reconfiguration in detection, diagnosis, and recovery phases. To extend these concepts to semiconductor aging and process variation in the deep submicron era, resilient adaptable processing systems are sought to maintain quality and throughput requirements despite the vulnerabilities of the underlying computational devices. A new approach to autonomous fault-handling which addresses these goals is developed using only a uniplex hardware arrangement. It operates by observing a health metric to achieve Fault Demotion using Reconfigurable Slack (FaDReS). Here an autonomous fault isolation scheme is employed which neither requires test vectors nor suspends the computational throughput, but instead observes the value of a health metric based on runtime input. The deterministic flow of the fault isolation scheme guarantees success in a bounded number of reconfigurations of the FPGA fabric. FaDReS is then extended to the Priority Using Resource Escalation (PURE) online redundancy scheme which considers fault-isolation latency and throughput trade-offs under a dynamic spare arrangement. While deep-submicron designs introduce new challenges, use of adaptive techniques are seen to provide several promising avenues for improving resilience. The scheme developed is demonstrated by hardware design of various signal processing circuits and their implementation on a Xilinx Virtex-4 FPGA device. These include a Discrete Cosine Transform (DCT) core, Motion Estimation (ME) engine, Finite Impulse Response (FIR) Filter, Support Vector Machine (SVM), and Advanced Encryption Standard (AES) blocks in addition to MCNC benchmark circuits. A significant reduction in power consumption is achieved ranging from 83% for low motion-activity scenes to 12.5% for high motion activity video scenes in a novel ME engine configuration. For a typical benchmark video sequence, PURE is shown to maintain a PSNR baseline near 32dB. The diagnosability, reconfiguration latency, and resource overhead of each approach is analyzed. Compared to previous alternatives, PURE maintains a PSNR within a difference of 4.02dB to 6.67dB from the fault-free baseline by escalating healthy resources to higher-priority signal processing functions. The results indicate the benefits of priority-aware resiliency over conventional redundancy approaches in terms of fault-recovery, power consumption, and resource-area requirements. Together, these provide a broad range of strategies to achieve autonomous recovery of reconfigurable logic devices under a variety of constraints, operating conditions, and optimization criteria.
Ph.D.
Doctorate
Electrical Engineering and Computer Science
Engineering and Computer Science
Electrical Engineering

APA, Harvard, Vancouver, ISO, and other styles

35

Pasca, Bogdan Mihai. "Calcul flottant haute performance sur circuits reconfigurables." Phd thesis, Ecole normale supérieure de lyon - ENS LYON, 2011. http://tel.archives-ouvertes.fr/tel-00654121.

Full text

Abstract:

De plus en plus de constructeurs proposent des accélérateurs de calculs à base de circuits reconfigurables FPGA, cette technologie présentant bien plus de souplesse que le microprocesseur. Valoriser cette flexibilité dans le domaine de l'accélération de calcul flottant en utilisant les langages de description de circuits classiques (VHDL ou Verilog) reste toutefois très difficile, voire impossible parfois. Cette thèse a contribué au développement du logiciel FloPoCo, qui offre aux utilisateurs familiers avec VHDL un cadre C++ de description d'opérateurs arithmétiques génériques adapté au calcul reconfigurable. Ce cadre distingue explicitement la fonctionnalité combinatoire d'un opérateur, et la problématique de son pipeline pour une précision, une fréquence et un FPGA cible donnés. Afin de pouvoir utiliser FloPoCo pour concevoir des opérateurs haute performance en virgule flottante, il a fallu d'abord concevoir des blocs de bases optimisés. Nous avons d'abord développé des additionneurs pipelinés autour des lignes de propagation de retenue rapides, puis, à l'aide de techniques de pavages, nous avons conçu de gros multiplieurs, possiblement tronqués, utilisant des petits multiplieurs. L'évaluation de fonctions élémentaires en flottant implique souvent l'évaluation en virgule fixe d'une fonction. Nous présentons un opérateur générique de FloPoCo qui prend en entrée l'expression de la fonction à évaluer, avec ses précisions d'entrée et de sortie, et construit un évaluateur polynomial optimisé de cette fonction. Ce bloc de base a permis de développer des opérateurs en virgule flottante pour la racine carrée et l'exponentielle qui améliorent considérablement l'état de l'art. Nous avons aussi travaillé sur des techniques de compilation avancée pour adapter l'exécution d'un code C aux pipelines flexibles de nos opérateurs. FloPoCo a pu ainsi être utilisé pour implanter sur FPGA des applications complètes.

APA, Harvard, Vancouver, ISO, and other styles

36

Liu, Yue-qu, and 劉岳衢. "Reconfigurable Design and Implementation of Modular-Construction Based FFT Hardware Architecture." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/m9kw97.

Full text

Abstract:

碩士
國立中山大學
電機工程學系研究所
106
In the 3GPP-LTE communication standard, it defines many kinds of Fast Fourier Transform(FFT) sizes. So, we design a high performance FFT architecture which makes good use of modular design construction and reconfigurable design to achieve easily connection between every 2 stages. This design can be suitable for any requirement. In the 4-stage module, it can support 48 modes which perform 2-2187 FFT points. It also supports 32 modes defined in 3GPP-LTE communication standard. Each module contains two parts. (1) Reconfigurable Computing Kernel(RC-CK)：We employ radix-32 and radix-23 bases and suitably utilize the hardware reuse property. Without extra of hardware resource (ex：multipliers or adders), it can execute six types of different radix of FFT kernel operations. (2) Reconfigurable First-in First-Out(RC-FIFO)：We develop a high efficient design method for supporting many FFT points. The FIFO plan is easily managed and suitably located to maximize the hardware storage usage. In addition, we propose Section-based Twiddle Factor Generator(STFG) to support multi-FFT points. It can reduce the area cost and satisfy any communication systems effectively. In the chip implementation, the core area is only 0.318 mm2 by using TSMC 40-nm CMOS technology. The maximal operating frequency is 350 MHz and power dissipation in average is 44.2 mW. As compared with other state-of-the-arts, our proposed work has the best performance and support many FFT points. Most important of all, the proposed hardware architecture has the better scalability. In the future, we can support the undefined specifications of the 5th generation wireless system only by increasing/decreasing the module.

APA, Harvard, Vancouver, ISO, and other styles

37

Wang, Ching-Shun, and 王靖順. "Reconfigurable Hardware Architecture Design and Implementation for AI Deep Learning Accelerator." Thesis, 2019. http://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/login?o=dnclcdr&s=id=%22107NCHU5441107%22.&searchmode=basic.

Full text

Abstract:

碩士
國立中興大學
電機工程學系所
107
This paper proposes the Convolution Neural Network hardware accelerator architecture with 288PE to achieve 230.4GOPS@400Mhz. To verify the hardware function, the hardware is implemented at 100MHz in units of 72PE owing to the limitation of FPGA resources. The proposed CNN hardware accelerator is Layer-based architecture which can be reconfigured the layer parameters to suitable for different CNN architectures. The proposed architecture is based on operating three Rows Input feature map and then generate a Row Output feature map. The proposed architecture uses 322KB On-Chip Memory to store Input feature map, Bias, Kernel, and Output feature map to improve the efficiency of Data reuse and reduce bandwidth utilization. In this paper, the Max-pooling layer after the Convolution layer can be combined to reduce the bandwidth of DRAM.

APA, Harvard, Vancouver, ISO, and other styles

38

Chen, Wen-Chieh, and 陳文杰. "A new hardware-efficient algorithm and reconfigurable architecture for image contrast enhancement." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/ws6ckf.

Full text

Abstract:

碩士
國立臺北科技大學
電腦與通訊研究所
100
Contrast enhancement is crucial when generating high quality images for image processing applications such as digital image or video photography, LCD processing, and medical image analysis. In order to achieve real-time performance for high-definition video applications, it is necessary to design efficient contrast enhancement hardware architecture to meet the needs of real-time processing. In this paper, we propose a novel hardware-oriented contrast enhancement algorithm which can be implemented effectively for hardware design. In order to be considered for hardware implementation, approximation techniques are proposed to reduce these complex computations during performance of the contrast enhancement algorithm. The proposed hardware-oriented contrast enhancement algorithm achieves good image quality by measuring the results of qualitative and quantitative analyses. To decrease hardware cost and improve hardware utilization for real-time performance, a reduction in circuit area is proposed through use of parameter-controlled reconfigurable architecture. The experiment results show that the proposed hardware-oriented contrast enhancement algorithm can provide an average frame rate of 48.23 fps at high definition resolution 1920×1080. This means that the proposed hardware architecture can run in real-time.

APA, Harvard, Vancouver, ISO, and other styles

39

CHANG, YU-WEI, and 張祐維. "Hardware/Software Co-Design of Reconfigurable Architecture for 2D-to-3D Conversion." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/9rsz4y.

Full text

Abstract:

碩士
國立臺南大學
資訊工程學系碩士班
107
3D images are amazing, but the cost of using multiple photographic lenses for photography is very expensive. Therefore, there is a 2D-to-3D technology, and the software is superior in image processing, but if the software resources are insufficient due to lots of conversion schedules.At that time, it is necessary to assist with hardware. In the early stage of computer science design, software and hardware are separated, and they are integrated in the final stage.We can see lots of implementations of hardware/software Co-Design is already applied to business, and there are also many good methods for image edge detection and depth calculation. In order to accelerate the efficiency of 2D-to-3D, we proposed a hardware/software co-design of reconfigurable architecture for 2D-to-3D conversion that make more efficient for 2D-to-3D . Depth Image Based Rendering (DIBR) is used in 2D to 3D system to generate 3D image synthesis using original 2D images and depth information to generate 3D or multiple perspective virtual images. In this paper, We uses Xilinx Spartan 7 to do as a hardware device for a platform of hardware/software co-design.

APA, Harvard, Vancouver, ISO, and other styles

40

Biswas, Prasenjit. "Hardware Consolidation Of Systolic Algorithms On A Coarse Grained Runtime Reconfigurable Architecture." Thesis, 2011. https://etd.iisc.ac.in/handle/2005/2108.

Full text

Abstract:

Application domains such as Bio-informatics, DSP, Structural Biology, Fluid Dynamics, high resolution direction finding, state estimation, adaptive noise cancellation etc. demand high performance computing solutions for their simulation environments. The core computations of these applications are in Numerical Linear Algebra (NLA) kernels. Direct solvers are predominantly required in the domains like DSP, estimation algorithms like Kalman Filter etc, where the matrices on which operations need to be performed are either small or medium sized, but dense. Faddeev's Algorithm is often used for solving dense linear system of equations. Modified Faddeev's algorithm (MFA) is a general algorithm on which LU decomposition, QR factorization or SVD of matrices can be realized. MFA has the good property of realizing a host of matrix operations by computing the Schur complements on four blocked matrices, thereby reducing the overall computation requirements. We will use MFA as a representative Direct Solver in this work. We further discuss Given's rotation based QR algorithm for Decomposition of any matrix, often used to solve the linear least square problem. Systolic Array Architectures are widely accepted ASIC solutions for NLA algorithms. But the \can of worms" associated with this traditional solution spawns the need for alternative solutions. While popular custom hardware solution in form of systolic arrays can deliver high performance, but because of their rigid structure they are not scalable and reconfigurable, and hence not commercially viable. We show how a Reconfigurable computing platform can serve to contain the \can of worms". REDEFINE, a coarse grained runtime reconfigurable architecture has been used for systolic actualization of NLA kernels. We elaborate upon streaming NLA-specific enhancements to REDEFINE in order to meet expected performance goals. We explore the need for an algorithm aware custom compilation framework. We bring about a proposition to realize Faddeev's Algorithm on REDEFINE. We show that REDEFINE performs several times faster than traditional GPPs. Further we direct our interest to QR Decomposition to be the next NLA kernel as it ensures better stability than LU and other decompositions. We use QR Decomposition as a case study to explore the design space of the proposed solution on REDEFINE. We also investigate the architectural details of the Custom Functional Units (CFU) for these NLA kernels. We determine the right size of the sub-array in accordance with the optimal pipeline depth of the core execution units and the number of such units to be used per sub-array. The framework used to realize QR Decomposition can be generalized for the realization of other algorithms dealing with decompositions like LU, Faddeev's Algorithm, Gauss-Jordon etc with different CFU definitions .

APA, Harvard, Vancouver, ISO, and other styles

41

Biswas, Prasenjit. "Hardware Consolidation Of Systolic Algorithms On A Coarse Grained Runtime Reconfigurable Architecture." Thesis, 2011. http://etd.iisc.ernet.in/handle/2005/2108.

Full text

Abstract:

Application domains such as Bio-informatics, DSP, Structural Biology, Fluid Dynamics, high resolution direction finding, state estimation, adaptive noise cancellation etc. demand high performance computing solutions for their simulation environments. The core computations of these applications are in Numerical Linear Algebra (NLA) kernels. Direct solvers are predominantly required in the domains like DSP, estimation algorithms like Kalman Filter etc, where the matrices on which operations need to be performed are either small or medium sized, but dense. Faddeev's Algorithm is often used for solving dense linear system of equations. Modified Faddeev's algorithm (MFA) is a general algorithm on which LU decomposition, QR factorization or SVD of matrices can be realized. MFA has the good property of realizing a host of matrix operations by computing the Schur complements on four blocked matrices, thereby reducing the overall computation requirements. We will use MFA as a representative Direct Solver in this work. We further discuss Given's rotation based QR algorithm for Decomposition of any matrix, often used to solve the linear least square problem. Systolic Array Architectures are widely accepted ASIC solutions for NLA algorithms. But the \can of worms" associated with this traditional solution spawns the need for alternative solutions. While popular custom hardware solution in form of systolic arrays can deliver high performance, but because of their rigid structure they are not scalable and reconfigurable, and hence not commercially viable. We show how a Reconfigurable computing platform can serve to contain the \can of worms". REDEFINE, a coarse grained runtime reconfigurable architecture has been used for systolic actualization of NLA kernels. We elaborate upon streaming NLA-specific enhancements to REDEFINE in order to meet expected performance goals. We explore the need for an algorithm aware custom compilation framework. We bring about a proposition to realize Faddeev's Algorithm on REDEFINE. We show that REDEFINE performs several times faster than traditional GPPs. Further we direct our interest to QR Decomposition to be the next NLA kernel as it ensures better stability than LU and other decompositions. We use QR Decomposition as a case study to explore the design space of the proposed solution on REDEFINE. We also investigate the architectural details of the Custom Functional Units (CFU) for these NLA kernels. We determine the right size of the sub-array in accordance with the optimal pipeline depth of the core execution units and the number of such units to be used per sub-array. The framework used to realize QR Decomposition can be generalized for the realization of other algorithms dealing with decompositions like LU, Faddeev's Algorithm, Gauss-Jordon etc with different CFU definitions .

APA, Harvard, Vancouver, ISO, and other styles

42

Liu, Xiaobin. "ENERGY EFFICIENCY EXPLORATION OF COARSE-GRAIN RECONFIGURABLE ARCHITECTURE WITH EMERGING NONVOLATILE MEMORY." 2015. https://scholarworks.umass.edu/masters_theses_2/159.

Full text

Abstract:

With the rapid growth in consumer electronics, people expect thin, smart and powerful devices, e.g. Google Glass and other wearable devices. However, as portable electronic products become smaller, energy consumption becomes an issue that limits the development of portable systems due to battery lifetime. In general, simply reducing device size cannot fully address the energy issue. To tackle this problem, we propose an on-chip interconnect infrastructure and pro- gram storage structure for a coarse-grained reconfigurable architecture (CGRA) with emerging non-volatile embedded memory (MRAM). The interconnect is composed of a matrix of time-multiplexed switchboxes which can be dynamically reconfigured with the goal of energy reduction. The number of processors performing computation can also be adapted. The use of MRAM provides access to high-density storage and lower memory energy consumption versus more standard SRAM technologies. The combination of CGRA, MRAM, and flexible on-chip interconnection is considered for signal processing. This application domain is of interest based on its time-varying computing demands. To evaluate CGRA architectural features, prototype architectures have been pro- totyped in a field-programmable gate array (FPGA). Measurements of energy, power, instruction count, and execution time performance are considered for a scalable num- ber of processors. Applications such as adaptive Viterbi decoding and Reed Solomon coding are used for evaluation. To complete this thesis, a time-scheduled switchbox was integrated into our CGRA model. This model was prototyped on an FPGA. It is shown that energy consumption can be reduced by about 30% if dynamic design reconfiguration is performed.

APA, Harvard, Vancouver, ISO, and other styles

43

Alle, Mythri. "Compiling For Coarse-Grained Reconfigurable Architectures Based On Dataflow Execution Paradigm." Thesis, 2012. https://etd.iisc.ac.in/handle/2005/2453.

Full text

Abstract:

Coarse-Grained Reconfigurable Architectures(CGRAs) can be employed for accelerating computational workloads that demand both flexibility and performance. CGRAs comprise a set of computation elements interconnected using a network and this interconnection of computation elements is referred to as a reconfigurable fabric. The size of application that can be accommodated on the reconfigurable fabric is limited by the size of instruction buffers associated with each Compute element. When an application cannot be accommodated entirely, application is partitioned such that each of these partitions can be executed on the reconfigurable fabric. These partitions are scheduled by an orchestrator. The orchestrator employs dynamic dataflow execution paradigm. Dynamic dataflow execution paradigm has inherent support for synchronization and helps in exploitation of parallelism that exists across application partitions. In this thesis, we present a compiler that targets such CGRAs. The compiler presented in this thesis is capable of accepting applications specified in C89 standard. To enable architectural design space exploration, the compiler is designed such that it can be customized for several instances of CGRAs employing dataflow execution paradigm at the orchestrator. This can be achieved by specifying the appropriate configuration parameters to the compiler. The focus of this thesis is to provide efficient support for various kinds of parallelism while ensuring correctness. The compiler is designed to support fine-grained task level parallelism that exists across iterations of loops and function calls. Additionally, compiler can also support pipeline parallelism, where a loop is split into multiple stages that execute in a pipelined manner. The prototype compiler, which targets multiple instances of a CGRA, is demonstrated in this thesis. We used this compiler to target multiple variants of CGRAs employing dataflow execution paradigm. We varied the reconfigur-able fabric, orchestration mechanism employed, size of instruction buffers. We also choose applications from two different domains viz. cryptography and linear algebra. The execution time of the CGRA (the best among all instances) is compared against an Intel Quad core processor. Cryptography applications show a performance improvement ranging from more than one order of magnitude to close to two orders of magnitude. These applications have large amounts of ILP and our compiler could successfully expose the ILP available in these applications. Further, the domain customization also played an important role in achieving good performance. We employed two custom functional units for accelerating Cryptography applications and compiler could efficiently use them. In linear algebra kernels we observe multiple iterations of the loop executing in parallel, effectively exploiting loop-level parallelism at runtime. Inspite of this we notice close to an order of magnitude performance degradation. The reason for this degradation can be attributed to the use of non-pipelined floating point units, and the delays involved in accessing memory. Pipeline parallelism was demonstrated using this compiler for FFT and QR factorization. Thus, the compiler is capable of efficiently supporting different kinds of parallelism and can support complete C89 standard. Further, the compiler can also support different instances of CGRAs employing dataflow execution paradigm.

APA, Harvard, Vancouver, ISO, and other styles

44

Alle, Mythri. "Compiling For Coarse-Grained Reconfigurable Architectures Based On Dataflow Execution Paradigm." Thesis, 2012. http://etd.iisc.ernet.in/handle/2005/2453.

Full text

Abstract:

Coarse-Grained Reconfigurable Architectures(CGRAs) can be employed for accelerating computational workloads that demand both flexibility and performance. CGRAs comprise a set of computation elements interconnected using a network and this interconnection of computation elements is referred to as a reconfigurable fabric. The size of application that can be accommodated on the reconfigurable fabric is limited by the size of instruction buffers associated with each Compute element. When an application cannot be accommodated entirely, application is partitioned such that each of these partitions can be executed on the reconfigurable fabric. These partitions are scheduled by an orchestrator. The orchestrator employs dynamic dataflow execution paradigm. Dynamic dataflow execution paradigm has inherent support for synchronization and helps in exploitation of parallelism that exists across application partitions. In this thesis, we present a compiler that targets such CGRAs. The compiler presented in this thesis is capable of accepting applications specified in C89 standard. To enable architectural design space exploration, the compiler is designed such that it can be customized for several instances of CGRAs employing dataflow execution paradigm at the orchestrator. This can be achieved by specifying the appropriate configuration parameters to the compiler. The focus of this thesis is to provide efficient support for various kinds of parallelism while ensuring correctness. The compiler is designed to support fine-grained task level parallelism that exists across iterations of loops and function calls. Additionally, compiler can also support pipeline parallelism, where a loop is split into multiple stages that execute in a pipelined manner. The prototype compiler, which targets multiple instances of a CGRA, is demonstrated in this thesis. We used this compiler to target multiple variants of CGRAs employing dataflow execution paradigm. We varied the reconfigur-able fabric, orchestration mechanism employed, size of instruction buffers. We also choose applications from two different domains viz. cryptography and linear algebra. The execution time of the CGRA (the best among all instances) is compared against an Intel Quad core processor. Cryptography applications show a performance improvement ranging from more than one order of magnitude to close to two orders of magnitude. These applications have large amounts of ILP and our compiler could successfully expose the ILP available in these applications. Further, the domain customization also played an important role in achieving good performance. We employed two custom functional units for accelerating Cryptography applications and compiler could efficiently use them. In linear algebra kernels we observe multiple iterations of the loop executing in parallel, effectively exploiting loop-level parallelism at runtime. Inspite of this we notice close to an order of magnitude performance degradation. The reason for this degradation can be attributed to the use of non-pipelined floating point units, and the delays involved in accessing memory. Pipeline parallelism was demonstrated using this compiler for FFT and QR factorization. Thus, the compiler is capable of efficiently supporting different kinds of parallelism and can support complete C89 standard. Further, the compiler can also support different instances of CGRAs employing dataflow execution paradigm.

APA, Harvard, Vancouver, ISO, and other styles

45

Chang, Shao-Hsuan, and 張紹宣. "Design and Implementation of an ALU Cluster Intellectual Property as a Reconfigurable Hardware Accelerator for Media Streaming Architecture." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/66721172009467596171.

Full text

Abstract:

碩士
國立交通大學
電信工程系所
94
There are more and more portable systems such as mobiles, MP3 player, PDA, and other entertainment systems in today’s life. The functionality and complexity of them thus increase much higher than old-time ones. Therefore, having a great deal ability of multimedia operation is important for portable systems. However, it is tough to have enough amounts of multimedia operations from conventional hardware architecture. This results from the poor match between conventional architecture and features of media applications. It hence leads to inefficient memory access that induces performance degression. The worst case is unable to meet the real time requirement. According, this thesis designs an operational unit, ALU cluster, that is referenced from Stanford’s stream processor architecture and thus matches to media applications to provide necessary processing requirements for media applications. Besides, considering the issues of convenient usage in the future and rapid integration of real multimedia applications, we wrap ALU cluster as an AMBA-compatible IP by adding designed interface. Then, it is possible to exploit other existing IP and peripherals in the AMBA platform and truly treats our design as hardware accelerator for real multimedia applications. This thesis is finished with a synthesizable soft IP. The designed interface is verified by ARM-series baseboard. This ensures that the interface conforms to AMBA specification.

APA, Harvard, Vancouver, ISO, and other styles

46

"Scalable Register File Architecture for CGRA Accelerators." Master's thesis, 2016. http://hdl.handle.net/2286/R.I.40738.

Full text

Abstract:

abstract: Coarse-grained Reconfigurable Arrays (CGRAs) are promising accelerators capable of accelerating even non-parallel loops and loops with low trip-counts. One challenge in compiling for CGRAs is to manage both recurring and nonrecurring variables in the register file (RF) of the CGRA. Although prior works have managed recurring variables via rotating RF, they access the nonrecurring variables through either a global RF or from a constant memory. The former does not scale well, and the latter degrades the mapping quality. This work proposes a hardware-software codesign approach in order to manage all the variables in a local nonrotating RF. Hardware provides modulo addition based indexing mechanism to enable correct addressing of recurring variables in a nonrotating RF. The compiler determines the number of registers required for each recurring variable and configures the boundary between the registers used for recurring and nonrecurring variables. The compiler also pre-loads the read-only variables and constants into the local registers in the prologue of the schedule. Synthesis and place-and-route results of the previous and the proposed RF design show that proposed solution achieves 17% better cycle time. Experiments of mapping several important and performance-critical loops collected from MiBench show proposed approach improves performance (through better mapping) by 18%, compared to using constant memory.
Dissertation/Thesis
Masters Thesis Computer Science 2016

APA, Harvard, Vancouver, ISO, and other styles

47

Mohammadi, Mahnaz. "An Accelerator for Machine Learning Based Classifiers." Thesis, 2017. http://etd.iisc.ac.in/handle/2005/4245.

Full text

Abstract:

Artificial Neural Networks (ANNs) are algorithmic techniques that simulate biological neural systems. Typical realization of ANNs are software solutions using High Level Languages (HLLs) such as C, C++, etc. Such solutions have performance limitations which can be attributed to one of the following reasons: • Code generated by the compiler cannot perform application specific optimizations. • Communication latencies between processors through a memory hierarchy could be significant due to non-deterministic nature of the communications. In data mining _eld, ANN algorithms have been widely used as classifiers for data classification applications. Classification involves predicting a certain outcome based on a given input. In order to predict the outcome more precisely, the training algorithms should discover relationships between the attributes to make the prediction possible. So later, when an unseen pattern containing same set of attributes except for the prediction attribute (which is not known yet) is given to the algorithm it can process that pattern and produce its outcome. The prediction accuracy which defines how good the algorithm is in recognizing unseen patterns, depends on how well the algorithm is trained. Radial Basis Function Neural Network (RBFNN) is a type of neural network which has been widely used in classification applications. A pure software implementation of this network will not be able to cope with the performance expected of high-performance ANN applications. Accelerators can be used to speed-up these kinds of applications. Accelerators can take many forms. They range from especially configured cores to reconfigurable circuits. Multi-core and GPU based accelerators can speed-up these applications up to several orders of magnitude when compared to general purpose processors (GPPs). The efficiency of accelerators for RBFNN reduce as the network size increases. Custom hardware implementation is often required to exploit the parallelism and minimize computing time for real time application requirements. Neural networks have been implemented on different hardware platforms such as Application-Specific Integrated Circuits (ASICs) and Field Programmable Logic Gate Arrays (FPGAs). We provide a generic hardware solution for classification using RBFNN and Feed-forward Neural Network with backpropagation learning algorithm (FFBPNN) on a reconfigurable data path that overcomes the major drawback of _axed-function hardware data paths which offers limited edibility in terms of application interchangeability and scalability. Our contributions in this thesis are as follows: • Deification and implementation of open-source reference software implementation of a few categories of ANNs for classification purpose. • Benchmarking the performance on general processors. • Porting the source code for execution on GPU using Cuda API and benchmarking the performance. • Proposing scalable and area efficient hardware architectures for training the learning parameters of ANN. • Synthesizing the ANN on reconfigurable architectures. • MPSoC implementation of ANNs for functional verification of our implementation • Demonstration of the performance advantage of ANN realization on reconfigurable architectures over CPU and GPU for classification applications. • Proposing a generalized methodology for realization of classification using ANNs on reconfigurable architectures.

APA, Harvard, Vancouver, ISO, and other styles

48

Fröhlich, Dominik. "Object-Oriented Development for Reconfigurable Architectures." Doctoral thesis, 2001. https://tubaf.qucosa.de/id/qucosa%3A22579.

Full text

Abstract:

Reconfigurable hardware architectures have been available now for several years. Yet the application development for such architectures is still a challenging and error-prone task, since the methods, languages, and tools being used for development are inappropriate to handle the complexity of the problem. This thesis introduces a novel approach that tackles the complexity challenge by raising the level of abstraction to system-level and increasing the degree of automation. The approach is centered around the paradigms of object-orientation, platforms, and modeling. An application and all platforms being used for its design, implementation, and deployment are modeled with objects using UML and an action language. The application model is then transformed into an implementation, whereby the transformation is steered by the platform models. In this thesis solutions for the relevant problems behind this approach are discussed. It is shown how UML can be used for complete and precise modeling of applications and platforms. Application development is done at the system-level using a set of well-defined, orthogonal platform models. Thereby the core features of object-orientation - data abstraction, encapsulation, inheritance, and polymorphism - are fully supported. Novel algorithms are presented, that allow for an automatic mapping of such application models to the target architecture. Thereby the problems of platform mapping, estimation of implementation characteristics, and synthesis of UML models are discussed. The thesis explores the utilization of platform models for generation of highly optimized implementations in an automatic yet adaptable way. The approach is evaluated by a number of relevant applications. The execution of the generated implementations is supported by a run-time service. This service manages the hardware configurations and objects comprising the application. Moreover, it serves as broker for hardware objects. The efficient management of configurations and objects at run-time is discussed and optimized life cycles for these entities are proposed. Mechanisms are presented that make the approach portable among different physical hardware architectures. Further, this thesis presents UML profiles and example platforms that support system-level design. These extensions are embodied in a novel type of model compiler. The compiler is accompanied by an implementation of the run-time service. Both have been used to evaluate and improve the presented concepts and algorithms.

APA, Harvard, Vancouver, ISO, and other styles

We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!