Dissertations / Theses on the topic 'Reconfigurable Hardware Accelerator'

To see the other types of publications on this topic, follow the link: Reconfigurable Hardware Accelerator.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 22 dissertations / theses for your research on the topic 'Reconfigurable Hardware Accelerator.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Babecki, Christopher. "A Memory-Array Centric Reconfigurable Hardware Accelerator for Security Applications." Case Western Reserve University School of Graduate Studies / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=case1427381331.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Diniz, Claudio Machado. "Dedicated and reconfigurable hardware accelerators for high efficiency video coding standard." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2015. http://hdl.handle.net/10183/118394.

Full text
Abstract:
A demanda por vídeos de resolução ultra-alta (além de 1920x1080 pontos) levou à necessidade de desenvolvimento de padrões de codificação de vídeo novos e mais eficientes para prover alta eficiência de compressão. O novo padrão High Efficiency Video Coding (HEVC), publicado em 2013, atinge o dobro da eficiência de compressão (ou 50% de redução no tamanho do vídeo codificado) comparado com o padrão mais eficiente até então, e mais utilizado no mercado, o padrão H.264/AVC (Advanced Video Coding). O HEVC atinge este resultado ao custo de uma elevação da complexidade computacional das ferramentas inseridas no codificador e decodificador. O aumento do esforço computacional do padrão HEVC e as limitações de potência das tecnologias de fabricação em silício atuais tornam essencial o desenvolvimento de aceleradores de hardware para partes importantes da aplicação do HEVC. Aceleradores de hardware fornecem maior desempenho e eficiência energética para aplicações específicas que os processadores de propósito geral. Uma análise da aplicação do HEVC realizada neste trabalho identificou as partes mais importantes do HEVC do ponto de vista do esforço computacional, a saber, o Filtro de Interpolação de Ponto Fracionário, o Filtro de Deblocagem e o cálculo da Soma das Diferenças Absolutas. Uma análise de tempo de execução do Filtro de Interpolação indica um grande potencial de economia de potência/energia pela adaptação do acelerador de hardware à carga de trabalho variável. Esta tese introduz novas contribuições no tema de aceleradores dedicados e reconfiguráveis para o padrão HEVC. Aceleradores de hardware dedicados para o Filtro de Interpolação de Pixel Fracionário, para o Filtro de Deblocagem, e para o cálculo da Soma das Diferenças Absolutas, são propostos, projetados e avaliados nesta tese. A arquitetura de hardware proposta para o filtro de interpolação atinge taxa de processamento similar ao estado da arte, enquanto reduz a área do hardware para este bloco em 50%. A arquitetura de hardware proposta para o filtro de deblocagem também atinge taxa de processamento similar ao estado da arte com uma redução de 5X a 6X na contagem de gates e uma redução de 3X na dissipação de potência. A nova análise comparativa proposta para os elementos de processamento do cálculo da Soma das Diferenças Absolutas introduz diversas alternativas de projeto de arquitetura com diferentes resultados de área, desempenho e potência. A nova arquitetura reconfigurável para o filtro de interpolação do padrão HEVC fornece 57% de redução de área em tempo de projeto e adaptação da potência/energia em tempo-real a cada imagem processada, o que ainda não é suportado pelas arquiteturas do estado da arte para o filtro de interpolação. Adicionalmente, a tese propõe um novo esquema de alocação de aceleradores em tempo-real para arquiteturas reconfiguráveis baseadas em tiles de processamento e de grão-misto, o que reduz em 44% (23% em média) o “overhead” de comunicação comparado com uma estratégia first-fit com reuso de datapaths, para números diferentes de tiles e organizações internas de tile. Este esquema de alocação leva em conta a arquitetura interna para alocar aceleradores de uma maneira mais eficiente, evitando e minimizando a comunicação entre tiles. Os aceleradores e técnicas dedicadas e reconfiguráveis propostos nesta tese proporcionam implementações de codificadores de vídeo de nova geração, além do HEVC, com melhor área, desempenho e eficiência em potência.
The demand for ultra-high resolution video (beyond 1920x1080 pixels) led to the need of developing new and more efficient video coding standards to provide high compression efficiency. The High Efficiency Video Coding (HEVC) standard, published in 2013, reaches double compression efficiency (or 50% reduction in size of coded video) compared to the most efficient video coding standard at that time, and most used in the market, the H.264/AVC (Advanced Video Coding) standard. HEVC reaches this result at the cost of high computational effort of the tools included in the encoder and decoder. The increased computational effort of HEVC standard and the power limitations of current silicon fabrication technologies makes it essential to develop hardware accelerators for compute-intensive computational kernels of HEVC application. Hardware accelerators provide higher performance and energy efficiency than general purpose processors for specific applications. An HEVC application analysis conducted in this work identified the most compute-intensive kernels of HEVC, namely the Fractional-pixel Interpolation Filter, the Deblocking Filter and the Sum of Absolute Differences calculation. A run-time analysis on Interpolation Filter indicates a great potential of power/energy saving by adapting the hardware accelerator to the varying workload. This thesis introduces new contributions in the field of dedicated and reconfigurable hardware accelerators for HEVC standard. Dedicated hardware accelerators for the Fractional Pixel Interpolation Filter, the Deblocking Filter and the Sum of Absolute Differences calculation are herein proposed, designed and evaluated. The interpolation filter hardware architecture achieves throughput similar to the state of the art, while reducing hardware area by 50%. Our deblocking filter hardware architecture also achieves similar throughput compared to state of the art with a 5X to 6X reduction in gate count and 3X reduction in power dissipation. The thesis also does a new comparative analysis of Sum of Absolute Differences processing elements, in which various architecture design alternatives with different area, performance and power results were introduced. A novel reconfigurable interpolation filter hardware architecture for HEVC standard was developed, and it provides 57% design-time area reduction and run-time power/energy adaptation in a picture-by-picture basis, compared to the state-of-the-art. Additionally a run-time accelerator binding scheme is proposed for tile-based mixed-grained reconfigurable architectures, which reduces the communication overhead, compared to first-fit strategy with datapath reusing scheme, by up to 44% (23% on average) for different number of tiles and internal tile organizations. This run-time accelerator binding scheme is aware of the underlying architecture to bind datapaths in an efficient way, to avoid and minimize inter-tile communications. The new dedicated and reconfigurable hardware accelerators and techniques proposed in this thesis enable next-generation video coding standard implementations beyond HEVC with improved area, performance, and power efficiency.
APA, Harvard, Vancouver, ISO, and other styles
3

Das, Satyajit. "Architecture and Programming Model Support for Reconfigurable Accelerators in Multi-Core Embedded Systems." Thesis, Lorient, 2018. http://www.theses.fr/2018LORIS490/document.

Full text
Abstract:
La complexité des systèmes embarqués et des applications impose des besoins croissants en puissance de calcul et de consommation énergétique. Couplé au rendement en baisse de la technologie, le monde académique et industriel est toujours en quête d'accélérateurs matériels efficaces en énergie. L'inconvénient d'un accélérateur matériel est qu'il est non programmable, le rendant ainsi dédié à une fonction particulière. La multiplication des accélérateurs dédiés dans les systèmes sur puce conduit à une faible efficacité en surface et pose des problèmes de passage à l'échelle et d'interconnexion. Les accélérateurs programmables fournissent le bon compromis efficacité et flexibilité. Les architectures reconfigurables à gros grains (CGRA) sont composées d'éléments de calcul au niveau mot et constituent un choix prometteur d'accélérateurs programmables. Cette thèse propose d'exploiter le potentiel des architectures reconfigurables à gros grains et de pousser le matériel aux limites énergétiques dans un flot de conception complet. Les contributions de cette thèse sont une architecture de type CGRA, appelé IPA pour Integrated Programmable Array, sa mise en œuvre et son intégration dans un système sur puce, avec le flot de compilation associé qui permet d'exploiter les caractéristiques uniques du nouveau composant, notamment sa capacité à supporter du flot de contrôle. L'efficacité de l'approche est éprouvée à travers le déploiement de plusieurs applications de traitement intensif. L'accélérateur proposé est enfin intégré à PULP, a Parallel Ultra-Low-Power Processing-Platform, pour explorer le bénéfice de ce genre de plate-forme hétérogène ultra basse consommation
Emerging trends in embedded systems and applications need high throughput and low power consumption. Due to the increasing demand for low power computing and diminishing returns from technology scaling, industry and academia are turning with renewed interest toward energy efficient hardware accelerators. The main drawback of hardware accelerators is that they are not programmable. Therefore, their utilization can be low is they perform one specific function and increasing the number of the accelerators in a system on chip (SoC) causes scalability issues. Programmable accelerators provide flexibility and solve the scalability issues. Coarse-Grained Reconfigurable Array (CGRA) architecture consisting of several processing elements with word level granularity is a promising choice for programmable accelerator. Inspired by the promising characteristics of programmable accelerators, potentials of CGRAs in near threshold computing platforms are studied and an end-to-end CGRA research framework is developed in this thesis. The major contributions of this framework are: CGRA design, implementation, integration in a computing system, and compilation for CGRA. First, the design and implementation of a CGRA named Integrated Programmable Array (IPA) is presented. Next, the problem of mapping applications with control and data flow onto CGRA is formulated. From this formulation, several efficient algorithms are developed using internal resources of a CGRA, with a vision for low power acceleration. The algorithms are integrated into an automated compilation flow. Finally, the IPA accelerator is augmented in PULP - a Parallel Ultra-Low-Power Processing-Platform to explore heterogeneous computing
APA, Harvard, Vancouver, ISO, and other styles
4

Jung, Lukas Johannes [Verfasser], Christian [Akademischer Betreuer] Hochberger, and Diana [Akademischer Betreuer] Göhringer. "Optimization of the Memory Subsystem of a Coarse Grained Reconfigurable Hardware Accelerator / Lukas Johannes Jung ; Christian Hochberger, Diana Göhringer." Darmstadt : Universitäts- und Landesbibliothek Darmstadt, 2019. http://d-nb.info/1187919810/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

El-Hassan, Fadi. "Hardware Architecture of an XML/XPath Broker/Router for Content-Based Publish/Subscribe Data Dissemination Systems." Thèse, Université d'Ottawa / University of Ottawa, 2014. http://hdl.handle.net/10393/30660.

Full text
Abstract:
The dissemination of various types of data faces ongoing challenges with the growing need of accessing manifold information. Since the interest in content is what drives data networks, some new technologies and thoughts attempt to cope with these challenges by developing content-based rather than address-based architectures. The Publish/ Subscribe paradigm can be a promising approach toward content-based data dissemination, especially that it provides total decoupling between publishers and subscribers. However, in content-based publish/subscribe systems, subscriptions are expressive and the information is often delivered based on the matched expressive content - which may not deeply alleviate considerable performance challenges. This dissertation explores a hardware solution for disseminating data in content-based publish/subscribe systems. This solution consists of an efficient hardware architecture of an XML/XPath broker that can route information based on content to either other XML/XPath brokers or to ultimate users. A network of such brokers represent an overlay structure for XML content-based publish/subscribe data dissemination systems. Each broker can simultaneously process many XPath subscriptions, efficiently parse XML publications, and subsequently forward notifications that result from high-performance matching processes. In the core of the broker architecture, locates an XML parser that utilizes a novel Skeleton CAM-Based XML Parsing (SCBXP) technique in addition to an XPath processor and a high-performance matching engine. Moreover, the broker employs effective mechanisms for content-based routing, so as subscriptions, publications, and notifications are routed through the network based on content. The inherent reconfigurability feature of the broker’s hardware provides the system architecture with the capability of residing in any FPGA device of moderate logic density. Furthermore, such a system-on-chip architecture is upgradable, if any future hardware add-ons are needed. However, the current architecture is mature and can effectively be implemented on an ASIC device. Finally, this thesis presents and analyzes the experiments conducted on an FPGA prototype implementation of the proposed broker/router. The experiments tackle tests for the SCBXP alone and for two phases of development of the whole broker. The corresponding results indicate the high performance that the involved parsing, storing, matching, and routing processes can achieve.
APA, Harvard, Vancouver, ISO, and other styles
6

Abdelouahab, Kamel. "Reconfigurable hardware acceleration of CNNs on FPGA-based smart cameras." Thesis, Université Clermont Auvergne‎ (2017-2020), 2018. http://www.theses.fr/2018CLFAC042/document.

Full text
Abstract:
Les Réseaux de Neurones Convolutifs profonds (CNNs) ont connu un large succès au cours de la dernière décennie, devenant un standard de la vision par ordinateur. Ce succès s’est fait au détriment d’un large coût de calcul, où le déploiement des CNNs reste une tâche ardue surtout sous des contraintes de temps réel.Afin de rendre ce déploiement possible, la littérature exploite le parallélisme important de ces algorithmes, ce qui nécessite l’utilisation de plate-formes matérielles dédiées. Dans les environnements soumis à des contraintes de consommations énergétiques, tels que les nœuds des caméras intelligentes, les cœurs de traitement à base de FPGAs sont reconnus comme des solutions de choix pour accélérer les applications de vision par ordinateur. Ceci est d’autant plus vrai pour les CNNs, où les traitements se font naturellement sur un flot de données, rendant les architectures matérielles à base de FPGA d’autant plus pertinentes. Dans ce contexte, cette thèse aborde les problématiques liées à l’implémentation des CNNs sur FPGAs. En particulier, ces travaux visent à améliorer l’efficacité des implantations grâce à deux principales stratégies d’optimisation; la première explore le modèle et les paramètres des CNNs, tandis que la seconde se concentre sur les architectures matérielles adaptées au FPGA
Deep Convolutional Neural Networks (CNNs) have become a de-facto standard in computer vision. This success came at the price of a high computational cost, making the implementation of CNNs, under real-time constraints, a challenging task.To address this challenge, the literature exploits the large amount of parallelism exhibited by these algorithms, motivating the use of dedicated hardware platforms. In power-constrained environments, such as smart camera nodes, FPGA-based processing cores are known to be adequate solutions in accelerating computer vision applications. This is especially true for CNN workloads, which have a streaming nature that suits well to reconfigurable hardware architectures.In this context, the following thesis addresses the problems of CNN mapping on FPGAs. In Particular, it aims at improving the efficiency of CNN implementations through two main optimization strategies; The first one focuses on the CNN model and parameters while the second one considers the hardware architecture and the fine-grain building blocks
APA, Harvard, Vancouver, ISO, and other styles
7

Vargun, Bilgin. "Acceleration Of Molecular Dynamics Simulation For Tersoff2 Potential Through Reconfigurable Hardware." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12615063/index.pdf.

Full text
Abstract:
In nanotechnology, Carbon Nanotubes systems are studied with Molecular Dynamics Simulation software programs investigating the properties of molecular structure. Computational loads are very complex in these kinds of software programs. Especially in three body simulations, it takes a couple of weeks for small number of atoms. Researchers use supercomputers to study more complex systems. In recent years, by the development of sophisticated Field Programmable Gate Array (FPGA) Technology, researchers design special purpose co-processor to accelerate their simulations. Ongoing researches show that using application specific digital circuits will have better performance with respect to an ordinary computer. In this thesis, a new special co-processor, called TERSOFF2, is designed and implemented. Resulting design is a low cost, low power and high performance computing solution. It can solve same computation problem 1000 times faster. Moreover, an optimized digital mathematical elementary functions library is designed and implemented through thesis study. All of the work about digital circuits and architecture of co-processor will be given in the related chapter. Performance achievements will be at the end of thesis.
APA, Harvard, Vancouver, ISO, and other styles
8

Martin, Phillip Murray. "Acceleration methodology for the implementation of scientific application on reconfigurable hardware." Connect to this title online, 2009. http://etd.lib.clemson.edu/documents/1246557242/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Silva, João Paulo Sá da. "Data processing in Zynq APSoC." Master's thesis, Universidade de Aveiro, 2014. http://hdl.handle.net/10773/14703.

Full text
Abstract:
Mestrado em Engenharia de Computadores e Telemática
Field-Programmable Gate Arrays (FPGAs) were invented by Xilinx in 1985, i.e. less than 30 years ago. The influence of FPGAs on many directions in engineering is growing continuously and rapidly. There are many reasons for such progress and the most important are the inherent reconfigurability of FPGAs and relatively cheap development cost. Recent field-configurable micro-chips combine the capabilities of software and hardware by incorporating multi-core processors and reconfigurable logic enabling the development of highly optimized computational systems for a vast variety of practical applications, including high-performance computing, data, signal and image processing, embedded systems, and many others. In this context, the main goals of the thesis are to study the new micro-chips, namely the Zynq-7000 family and to apply them to two selected case studies: data sort and Hamming weight calculation for long vectors.
Field-Programmable Gate Arrays (FPGAs) foram inventadas pela Xilinx em 1985, ou seja, há menos de 30 anos. A influência das FPGAs está a crescer continua e rapidamente em muitos ramos de engenharia. Há varias razões para esta evolução, as mais importantes são a sua capacidade de reconfiguração inerente e os baixos custos de desenvolvimento. Os micro-chips mais recentes baseados em FPGAs combinam capacidades de software e hardware através da incorporação de processadores multi-core e lógica reconfigurável permitindo o desenvolvimento de sistemas computacionais altamente otimizados para uma grande variedade de aplicações práticas, incluindo computação de alto desempenho, processamento de dados, de sinal e imagem, sistemas embutidos, e muitos outros. Neste contexto, este trabalho tem como o objetivo principal estudar estes novos micro-chips, nomeadamente a família Zynq-7000, para encontrar as melhores formas de potenciar as vantagens deste sistema usando casos de estudo como ordenação de dados e cálculo do peso de Hamming para vetores longos.
APA, Harvard, Vancouver, ISO, and other styles
10

Blumer, Aric David. "Register Transfer Level Simulation Acceleration via Hardware/Software Process Migration." Diss., Virginia Tech, 2007. http://hdl.handle.net/10919/29380.

Full text
Abstract:
The run-time reconfiguration of Field Programmable Gate Arrays (FPGAs) opens new avenues to hardware reuse. Through the use of process migration between hardware and software, an FPGA provides a parallel execution cache. Busy processes can be migrated into hardware-based, parallel processors, and idle processes can be migrated out increasing the utilization of the hardware. The application of hardware/software process migration to the acceleration of Register Transfer Level (RTL) circuit simulation is developed and analyzed. RTL code can exhibit a form of locality of reference such that executing processes tend to be executed again. This property is termed executive temporal locality, and it can be exploited by migration systems to accelerate RTL simulation. In this dissertation, process migration is first formally modeled using Finite State Machines (FSMs). Upon FSMs are built programs, processes, migration realms, and the migration of process state within a realm. From this model, a taxonomy of migration realms is developed. Second, process migration is applied to the RTL simulation of digital circuits. The canonical form of an RTL process is defined, and transformations of HDL code are justified and demonstrated. These transformations allow a simulator to identify basic active units within the simulation and combine them to balance the load across a set of processors. Through the use of input monitors, executive locality of reference is identified and demonstrated on a set of six RTL designs. Finally, the implementation of a migration system is described which utilizes Virtual Machines (VMs) and Real Machines (RMs) in existing FPGAs. Empirical and algorithmic models are developed from the data collected from the implementation to evaluate the effect of optimizations and migration algorithms.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
11

Lloyd, G. Scott. "Accelerated Large-Scale Multiple Sequence Alignment with Reconfigurable Computing." BYU ScholarsArchive, 2011. https://scholarsarchive.byu.edu/etd/2729.

Full text
Abstract:
Multiple Sequence Alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. The time to compute an optimal MSA grows exponentially with respect to the number of sequences. Consequently, producing timely results on large problems requires more efficient algorithms and the use of parallel computing resources. Reconfigurable computing hardware provides one approach to the acceleration of biological sequence alignment. Other acceleration methods typically encounter scaling problems that arise from the overhead of inter-process communication and from the lack of parallelism. Reconfigurable computing allows a greater scale of parallelism with many custom processing elements that have a low-overhead interconnect. The proposed parallel algorithms and architecture accelerate the most computationally demanding portions of MSA. An overall speedup of up to 150 has been demonstrated on a large data set when compared to a single processor. The reduced runtime for MSA allows researchers to solve the larger problems that confront biologists today.
APA, Harvard, Vancouver, ISO, and other styles
12

Werner, Stefan [Verfasser]. "Hybrid architecture for hardware-accelerated query processing in semantic web databases based on runtime reconfigurable FPGAs / Stefan Werner." Lübeck : Zentrale Hochschulbibliothek Lübeck, 2017. http://d-nb.info/1143986946/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

SAU, CARLO. "Dataflow based design suite for the development and management of multi-functional reconfigurable systems." Doctoral thesis, Università degli Studi di Cagliari, 2016. http://hdl.handle.net/11584/266751.

Full text
Abstract:
Embedded systems development constitutes an extremely challenging scenario for the designers since several constraints have to be meet at the same time. Flexibil- ity, performance and power efficiency are typically colliding requirements that are hardly addressed together. Reconfigurable systems provide a valuable alternative to common architectures to challenge contemporarily all those issues. Such a kind of systems, and in particular the coarse grained ones, exhibit a certain level of flexi- bility while guaranteeing strong performance. However they suffer of an increased design and management complexity. In this thesis it is discussed a fully automated methodology for the development of coarse grained reconfigurable platforms, by exploiting dataflow models for the de- scription of the desired functionalities. The thesis describes, actually, a whole design suite that offers, besides the reconfigurable substrate composition, also structural optimisation, dynamic power management and co-processing support. All the pro- vided features have been validated on different signal, image and video processing scenarios, targeting either FPGA and ASIC.
APA, Harvard, Vancouver, ISO, and other styles
14

Wang, Ching-Shun, and 王靖順. "Reconfigurable Hardware Architecture Design and Implementation for AI Deep Learning Accelerator." Thesis, 2019. http://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/login?o=dnclcdr&s=id=%22107NCHU5441107%22.&searchmode=basic.

Full text
Abstract:
碩士
國立中興大學
電機工程學系所
107
This paper proposes the Convolution Neural Network hardware accelerator architecture with 288PE to achieve 230.4GOPS@400Mhz. To verify the hardware function, the hardware is implemented at 100MHz in units of 72PE owing to the limitation of FPGA resources. The proposed CNN hardware accelerator is Layer-based architecture which can be reconfigured the layer parameters to suitable for different CNN architectures. The proposed architecture is based on operating three Rows Input feature map and then generate a Row Output feature map. The proposed architecture uses 322KB On-Chip Memory to store Input feature map, Bias, Kernel, and Output feature map to improve the efficiency of Data reuse and reduce bandwidth utilization. In this paper, the Max-pooling layer after the Convolution layer can be combined to reduce the bandwidth of DRAM.
APA, Harvard, Vancouver, ISO, and other styles
15

Jung, Lukas Johannes. "Optimization of the Memory Subsystem of a Coarse Grained Reconfigurable Hardware Accelerator." Phd thesis, 2019. https://tuprints.ulb.tu-darmstadt.de/8674/1/2019-05-13_Jung_Lukas_Johannes.pdf.

Full text
Abstract:
Fast and energy efficient processing of data has always been a key requirement in processor design. The latest developments in technology emphasize these requirements even further. The widespread usage of mobile devices increases the demand of energy efficient solutions. Many new applications like advanced driver assistance systems focus more and more on machine learning algorithms and have to process large data sets in hard real time. Up to the 1990s the increase in processor performance was mainly achieved by new and better manufacturing technologies for processors. That way, processors could operate at higher clock frequencies, while the processor microarchitecture was mainly the same. At the beginning of the 21st century this development stopped. New manufacturing technologies made it possible to integrate more processor cores onto one chip, but almost no improvements were achieved anymore in terms of clock frequencies. This required new approaches in both processor microarchitecture and software design. Instead of improving the performance of a single processor, the current problem has to be divided into several subtasks that can be executed in parallel on different processing elements which speeds up the application. One common approach is to use multi-core processors or GPUs (Graphic Processing Units) in which each processing element calculates one subtask of the problem. This approach requires new programming techniques and legacy software has to be reformulated. Another approach is the usage of hardware accelerators which are coupled to a general purpose processor. For each problem a dedicated circuit is designed which can solve the problem fast and efficiently. The actual computation is then executed on the accelerator and not on the general purpose processor. The disadvantage of this approach is that a new circuit has to be designed for each problem. This results in an increased design effort and typically the circuit can not be adapted once it is deployed. This work covers reconfigurable hardware accelerators. They can be reconfigured during runtime so that the same hardware is used to accelerate different problems. During runtime, time consuming code fragments can be identified and the processor itself starts a process that creates a configuration for the hardware accelerator. This configuration can now be loaded and the code will then be executed on the accelerator faster and more efficient. A coarse grained reconfigurable architecture was chosen because creating a configuration for it is much less complex than creating a configuration for a fine grained reconfigurable architecture like an FPGA (Field Programmable Gate Array). Additionally, the smaller overhead for the reconfigurability results in higher clock frequencies. One advantage of this approach is that programmers don't need any knowledge about the underlying hardware, because the acceleration is done automatically during runtime. It is also possible to accelerate legacy code without user interaction (even when no source code is available anymore). One challenge that is relevant for all approaches, is the efficient and fast data exchange between processing elements and main memory. Therefore, this work concentrates on the optimization of the memory interface between the coarse grained reconfigurable hardware accelerator and the main memory. To achieve this, a simulator for a Java processor coupled with a coarse grained reconfigurable hardware accelerator was developed during this work. Several strategies were developed to improve the performance of the memory interface. The solutions range from different hardware designs to software solutions that try to optimize the usage of the memory interface during the creation of the configuration of the accelerator. The simulator was used to search the design space for the best implementation. With this optimization of the memory interface a performance improvement of 22.6% was achieved. Apart from that, a first prototype of this kind of accelerator was designed and implemented on an FPGA to show the correct functionality of the whole approach and the simulator.
APA, Harvard, Vancouver, ISO, and other styles
16

Fan, Yang-Tzu, and 范揚賜. "Design and Implementation of a Reconfigurable Computing System Environment and Hardware Accelerator IP Cores for Image Processing." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/21234983342049030923.

Full text
Abstract:
碩士
逢甲大學
資訊工程所
92
In the trend of product design nowadays, the life cycle of new product is shorter and shorter, but the functionality and complexity of the product is getting higher and higher. It is truly an arduous task for new products manufacturer to overcome. For this reason, developing a development environment with Intellectual Property for IP reuse could shorten the time-to-market. It could make the management and usage more convenience by designing hardware IP by components. It could also lower the risk of designing a huge hardware system by individually synthesizing and simulating the components that the hardware system used. Many researches indicated that reconfigurable computing system could improve the specific application performance. If it is possible to combine the IP reuse development environment and reconfigurable computing hardware, it would speedup the time-to-market and improve the system performance. In this paper, we brought up a design environment of reconfigurable computing system. It includes a suite of software named ReIPD Tool (Reusable IP Development tool) and a multi-FPGA PCI card named PCI-mFCU (PCI I/O with multiple FPGA Configurable Unit). In order to meet the requirements of more and more functionalities, complexities and shorter and shorter time-to-market of products, it is necessary to build a intellectual property library (IP Library). Hardware designers could use IP library to reduce the development time. After that, hardware designs could be verified by PCI-mFCU. Finally, we will develop and verify some image processing applications by using this develop environment with ReIPD Tool and PCI-mFCU.
APA, Harvard, Vancouver, ISO, and other styles
17

Chang, Shao-Hsuan, and 張紹宣. "Design and Implementation of an ALU Cluster Intellectual Property as a Reconfigurable Hardware Accelerator for Media Streaming Architecture." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/66721172009467596171.

Full text
Abstract:
碩士
國立交通大學
電信工程系所
94
There are more and more portable systems such as mobiles, MP3 player, PDA, and other entertainment systems in today’s life. The functionality and complexity of them thus increase much higher than old-time ones. Therefore, having a great deal ability of multimedia operation is important for portable systems. However, it is tough to have enough amounts of multimedia operations from conventional hardware architecture. This results from the poor match between conventional architecture and features of media applications. It hence leads to inefficient memory access that induces performance degression. The worst case is unable to meet the real time requirement. According, this thesis designs an operational unit, ALU cluster, that is referenced from Stanford’s stream processor architecture and thus matches to media applications to provide necessary processing requirements for media applications. Besides, considering the issues of convenient usage in the future and rapid integration of real multimedia applications, we wrap ALU cluster as an AMBA-compatible IP by adding designed interface. Then, it is possible to exploit other existing IP and peripherals in the AMBA platform and truly treats our design as hardware accelerator for real multimedia applications. This thesis is finished with a synthesizable soft IP. The designed interface is verified by ARM-series baseboard. This ensures that the interface conforms to AMBA specification.
APA, Harvard, Vancouver, ISO, and other styles
18

Thurmon, Brandon Parks. "Reconfigurable hardware acceleration of exact stochastic simulation." 2005. http://etd.utk.edu/2005/ThurmonBrandon.pdf.

Full text
Abstract:
Thesis (M.S.) -- University of Tennessee, Knoxville, 2005.
Title from title page screen (viewed on Sept. 1, 2005). Thesis advisor: Gregory D. Peterson. Document formatted into pages (viii, 218 p. : ill. (some color)). Vita. Includes bibliographical references (p. 67-69).
APA, Harvard, Vancouver, ISO, and other styles
19

Paulino, Nuno Miguel Cardanha. "Generation of Custom Run-Time Reconfigurable Hardware for Transparent Binary Acceleration." Doctoral thesis, 2016. https://repositorio-aberto.up.pt/handle/10216/83952.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Paulino, Nuno Miguel Cardanha. "Generation of Custom Run-Time Reconfigurable Hardware for Transparent Binary Acceleration." Tese, 2016. https://repositorio-aberto.up.pt/handle/10216/83952.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Best, Joel. "Real-Time Operating System Hardware Extension Core for System-on-Chip Designs." Thesis, 2013. http://hdl.handle.net/10214/5257.

Full text
Abstract:
This thesis presents a real-time operating system hardware extension core which supports the integration of hardware accelerators into real-time system-on-chip designs as hardware tasks. The hardware extension core utilizes reconfigurable logic to manage synchronization events, data transfers, and hardware task control. A reduction in interrupt latency, frequency, and execution time provides performance and predictability improvements for real-time applications. Required communication between the CPU and hardware accelerators is also reduced significantly. Compared to a software implementation, synthetic benchmarks of common synchronization tasks show up to a 41% increase in synchronization performance. Analysis of a test case design for audio encoding and encryption using three hardware accelerators shows results of a 2.89x throughput improvement in comparison to the use of software device driver tasks. Overall, this design simplifies the integration of hardware accelerators into real-time system-on-chip designs while improving the performance and predictability of these systems.
APA, Harvard, Vancouver, ISO, and other styles
22

"Scalable Register File Architecture for CGRA Accelerators." Master's thesis, 2016. http://hdl.handle.net/2286/R.I.40738.

Full text
Abstract:
abstract: Coarse-grained Reconfigurable Arrays (CGRAs) are promising accelerators capable of accelerating even non-parallel loops and loops with low trip-counts. One challenge in compiling for CGRAs is to manage both recurring and nonrecurring variables in the register file (RF) of the CGRA. Although prior works have managed recurring variables via rotating RF, they access the nonrecurring variables through either a global RF or from a constant memory. The former does not scale well, and the latter degrades the mapping quality. This work proposes a hardware-software codesign approach in order to manage all the variables in a local nonrotating RF. Hardware provides modulo addition based indexing mechanism to enable correct addressing of recurring variables in a nonrotating RF. The compiler determines the number of registers required for each recurring variable and configures the boundary between the registers used for recurring and nonrecurring variables. The compiler also pre-loads the read-only variables and constants into the local registers in the prologue of the schedule. Synthesis and place-and-route results of the previous and the proposed RF design show that proposed solution achieves 17% better cycle time. Experiments of mapping several important and performance-critical loops collected from MiBench show proposed approach improves performance (through better mapping) by 18%, compared to using constant memory.
Dissertation/Thesis
Masters Thesis Computer Science 2016
APA, Harvard, Vancouver, ISO, and other styles
We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!

To the bibliography