Dissertations / Theses on the topic 'Mechatronics hardware design and architecture'

To see the other types of publications on this topic, follow the link: Mechatronics hardware design and architecture.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 dissertations / theses for your research on the topic 'Mechatronics hardware design and architecture.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Basic, Goran. "Hardware-in-the-loop simulation of mechanical loads for mechatronics system design." Thesis, University of Ottawa (Canada), 2003. http://hdl.handle.net/10393/26323.

Full text
Abstract:
Current research efforts in Hardware-In-The-Loop (HIL) simulation are directed toward testing Electronic Control Units, simulated digitally, in a physical experimental setup. This thesis presents a different approach to HIL simulation: active and passive mechanical loads can be simulated physically on direct-drive motors under computer control. The work is based on the effort-flow concept, which allows components of the experimental setup to be replaced as needed by physical or digital models. The only requirement is that elements of the setup retain their inputs and outputs in the form of effort and flow pairs. Based on this theory, a new, generic HIL setup was built, containing two DC motors connected by a shaft. One motor actuates the system, while the other serves as the physical simulator. Using sets of derived formulas, the physical simulator is able to simulate active and passive loads. Three experimental levels are presented in the thesis: open-loop, current-control, and torque-control experiments. The experimental results prove the concept as a whole and show that the theory can be applied in real-world applications. The focus of this research is on the simulation of nonlinear loads, whose models are given by sets of nonlinear differential equations. Digital simulations require solving those equations, which is a demanding job. The method of physical simulation presented in this thesis offers a simpler way of simulating complex loads.
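The effort-flow idea lends itself to a compact numerical sketch: the load-simulating motor reads the shaft's flow (speed) and commands the effort (torque) that a modelled nonlinear load would present, so the driving motor "feels" the load physically. The load model and all parameter values below are illustrative, not taken from the thesis.

```python
def load_effort(omega, b=0.5, c=0.5):
    """Effort (torque) presented by an emulated nonlinear passive load
    for a measured flow (shaft speed): viscous plus cubic damping.
    The coefficients b and c are illustrative values."""
    return b * omega + c * omega ** 3

def hil_run(tau_drive=1.0, J=0.01, dt=1e-4, steps=5000):
    """Couple a driving motor and a load-simulating motor on a shared
    shaft: each cycle, the simulator reads the flow and commands the
    effort the modelled load would exert.  Returns the settled speed."""
    omega = 0.0
    for _ in range(steps):
        tau_load = load_effort(omega)              # simulator command
        omega += (tau_drive - tau_load) / J * dt   # shaft dynamics
    return omega
```

With a constant drive torque of 1.0, the shaft settles where drive and emulated load torque balance (here at a speed of 1.0), just as it would against a real physical load.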
APA, Harvard, Vancouver, ISO, and other styles
2

Yazdanpanah, Fahimeh. "Hardware design of task superscalar architecture." Doctoral thesis, Universitat Politècnica de Catalunya, 2014. http://hdl.handle.net/10803/277376.

Full text
Abstract:
Exploiting concurrency to achieve greater performance is a difficult and important challenge for current high-performance systems. Although the theory is plain, the complexity of traditional parallel programming models in most cases prevents the programmer from harvesting performance. Several partitioning granularities have been proposed to better exploit concurrency at task granularity. In this sense, different dynamic software task management systems, such as task-based dataflow programming models, apply dataflow principles to improve task-level parallelism and overcome the limitations of static task management systems. These models implicitly schedule computation and data and use tasks instead of instructions as the basic work unit, thereby relieving the programmer of explicitly managing parallelism. While these programming models share conceptual similarities with the well-known out-of-order superscalar pipelines (e.g., dynamic data dependency analysis and dataflow scheduling), they rely on software-based dependency analysis, which is inherently slow and limits their scalability when task granularity is fine and the number of tasks is large. This problem grows with the number of available cores: to keep all the cores busy and accelerate overall application performance, the application must be partitioned into more, and smaller, tasks. Task scheduling (i.e., the creation and management of the execution of tasks) in software introduces overheads, and so becomes increasingly inefficient as the core count rises. In contrast, a hardware scheduling solution can achieve greater speed-ups, as a hardware task scheduler requires fewer cycles than the software version to dispatch a task. The Task Superscalar is a hybrid dataflow/von Neumann architecture that exploits the task-level parallelism of the program.
The Task Superscalar combines the effectiveness of out-of-order processors with the task abstraction, and thereby provides a unified management layer for CMPs that effectively employs processors as functional units. The Task Superscalar had previously been implemented in software, with limited parallelism and high memory consumption due to the nature of the software implementation. In this thesis, a hardware Task Superscalar architecture is designed to be integrated in a future high-performance computer with the ability to exploit fine-grained task parallelism. The main contributions of this thesis are: (1) a design of the operational flow of the Task Superscalar architecture, adapted and improved for hardware implementation; (2) an HDL prototype for latency exploration; (3) a full cycle-accurate simulator of the hardware Task Superscalar (based on the previously obtained latencies); (4) a full design-space exploration of the Task Superscalar component configuration (number and size) for systems with different numbers of processing elements (cores); (5) a comparison with a software implementation of a real task-based programming-model runtime using real benchmarks; and (6) a hardware resource-usage exploration of the selected configurations.
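The dependency analysis that the Task Superscalar performs in hardware can be sketched in software as dataflow tracking over task operands, analogous to register renaming in an out-of-order pipeline: a task waits on the last writer of each of its inputs. The class and operand names below are illustrative, and the sketch assumes all tasks are submitted before execution starts.

```python
from collections import defaultdict, deque

class TaskScheduler:
    """Minimal sketch of dataflow task dependency tracking (illustrative
    names; not the Task Superscalar's actual hardware interface)."""
    def __init__(self):
        self.last_writer = {}            # operand -> producing task id
        self.deps = defaultdict(set)     # task id -> unfinished producers
        self.waiters = defaultdict(set)  # producer -> dependent tasks
        self.ready = deque()
        self.order = []

    def submit(self, tid, inputs, outputs):
        # Record a dependency on the last writer of every input operand.
        for op in inputs:
            prod = self.last_writer.get(op)
            if prod is not None:
                self.deps[tid].add(prod)
                self.waiters[prod].add(tid)
        for op in outputs:
            self.last_writer[op] = tid
        if not self.deps[tid]:
            self.ready.append(tid)

    def run(self):
        # Dataflow execution: finishing a task wakes its dependents.
        while self.ready:
            tid = self.ready.popleft()
            self.order.append(tid)
            for w in self.waiters.pop(tid, ()):
                self.deps[w].discard(tid)
                if not self.deps[w]:
                    self.ready.append(w)
        return self.order
```

The hardware scheduler's advantage is doing exactly this bookkeeping in a few cycles per task instead of hundreds of instructions.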
APA, Harvard, Vancouver, ISO, and other styles
3

Persson, Robert. "PPS5000 Thruster Emulator Architecture Development & Hardware Design." Thesis, Luleå tekniska universitet, Rymdteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-72827.

Full text
Abstract:
This Master's thesis covers the prestudy work and early hardware development that resulted in architectural definitions and prototype hardware for electronic ground support equipment. This equipment is intended to emulate the electric power consumption of the PPS5000 Hall Effect Thruster (HET) in satellite end-to-end tests of the all-electric geostationary satellite Electra, developed at OHB Sweden AB. The Thruster Emulator (TEM) was defined by compiling the intricate, interdependent components that interface the satellite power system and the thruster, yielding an architecture that supports a set of basic predefined emulator requirements. This architecture was then analyzed to form a baseline concept of the emulator system, covering the entire HET functionality. Six primary HET impedances were defined, of which the three most complex were investigated fully. For the primary thruster discharge, the thesis examines the complexity of connecting advanced electronic-load hardware directly to the satellite's 5 kW power system, with respect to the transient primary plasma discharge during thruster start-up and to limitations of the electronic load that reduce emulator-thruster similarity. Additionally, a fully functional plasma-ignition emulator prototype circuit board was built, to be used in the final TEM hardware to emulate the external HET cathode start-up functionality. Finally, a feasibility study of a possible solution for the large PPS5000 electromagnet impedance was performed, resulting in the manufacture of two prototype inductors whose performance did not meet the design requirements.
APA, Harvard, Vancouver, ISO, and other styles
4

Mahmud, Akib. "Hardware in the Loop (HIL) Rig Design and Electrical Architecture." Thesis, Uppsala universitet, Institutionen för teknikvetenskaper, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-324661.

Full text
Abstract:
Different types of machines are tested using so-called Hardware-In-the-Loop (HIL) simulation. HIL simulation uses a rig consisting of different types of hardware and software. Some of the hardware used during a simulation is located inside an EMS box. The box had not been properly updated since 2004, no changes had been documented, and errors often occurred during simulations due to the lack of traceability. During this project, a new structure for the EMS box was designed, with modifications to eliminate existing problems, prevent similar problems from occurring in the future, and improve the usability of the system. A simulation was performed on the camshaft to test whether there were any improvements. Most issues were solved, but one problem remained: noise rooted in the old box undeniably persisted in the new one.
APA, Harvard, Vancouver, ISO, and other styles
5

Davis, Jesse H. Z. (Jesse Harper Zehring) 1980. "Hardware & software architecture for multi-level unmanned autonomous vehicle design." Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/16968.

Full text
Abstract:
Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002.
Includes bibliographical references (p. 95-96).
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
The theory, simulation, design, and construction of a radically new type of unmanned aerial vehicle (UAV) are discussed. The vehicle architecture is based on a commercially available non-autonomous flyer called the Vectron Blackhawk Flying Saucer. Due to its full body rotation, the craft is more inherently gyroscopically stable than other more common types of UAVs. This morphology was chosen because it has never before been made autonomous, so the theory, simulation, design, and construction were all done from fundamental principles as an example of original multi-level autonomous development.
by Jesse H.Z. Davis.
M.Eng.
APA, Harvard, Vancouver, ISO, and other styles
6

Pajayakrit, A. "VLSI architecture and design for the Fermat Number Transform implementation." Thesis, University of Newcastle Upon Tyne, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.379767.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Patel, Krutartha Computer Science &amp Engineering Faculty of Engineering UNSW. "Hardware-software design methods for security and reliability of MPSoCs." Awarded by:University of New South Wales. Computer Science & Engineering, 2009. http://handle.unsw.edu.au/1959.4/44854.

Full text
Abstract:
Security of a Multi-Processor System on Chip (MPSoC) is an emerging area of concern in embedded systems. MPSoC security is jeopardized by code injection attacks, which are the most common type of software attack and have long plagued single-processor systems. The design of MPSoCs must therefore incorporate security as one of its primary objectives. Code injection attacks exploit vulnerabilities in "trusted" and legacy code. An architecture with a dedicated monitoring processor (MONITOR) is employed to supervise the application processors on an MPSoC simultaneously. The program code in the application processors is divided into basic blocks, which are statically instrumented with special instructions that allow communication with the MONITOR at runtime. The MONITOR verifies the execution of all the processors at runtime using control-flow checks and either a timing or an instruction-count check. This thesis proposes a monitoring system called SOFTMON, a design methodology called SHIELD, a design flow called LOCS, and an architectural framework called CUFFS for detecting code injection attacks. SOFTMON, a software monitoring system, uses a software algorithm in the MONITOR. SOFTMON incurs limited area overheads; however, its runtime performance overhead is quite high. SHIELD, an extension of SOFTMON, overcomes the high runtime overhead by using a MONITOR that is predominantly hardware based. LOCS uses only one special instruction per basic block, compared to two in SOFTMON and SHIELD. Additionally, profile information is generated for all the basic blocks in all the application processors, allowing the MPSoC designer to tune the design by increasing or decreasing the frequency of loop basic blocks. CUFFS detects attacks even without the application processors communicating with the MONITOR.
The SOFTMON, SHIELD and LOCS approaches can only detect attacks if the application processors communicate to the MONITOR. CUFFS relies on the exact number of instructions in basic blocks to determine an attack, rather than time-frame based measures used in SOFTMON, SHIELD and LOCS. The lowest runtime performance overhead was achieved by LOCS (worst case of 37.5%), while the SOFTMON monitoring system had the least amount of area overheads of about 25%. The CUFFS approach employed an active MONITOR and hence detected a greater range of attacks. The CUFFS framework also detects bit flip errors (reliability errors) in the control flow instructions of the application processors on an MPSoC. CUFFS can detect nearly 70% of all bit flip errors in the control flow instructions. Additionally, a modified CUFFS approach is proposed to ensure reliable inter-processor communication on an MPSoC. The modified CUFFS approach uses a hardware based checksum approach for reliable inter-processor communication and incurred a runtime performance overhead of up to 25% and negligible area overheads compared to CUFFS. Thus, the approaches proposed in this thesis equip an MPSoC designer with tools to embed security features during an MPSoC's design phase. Incorporating security measures at the processor design level provides security against software attacks in MPSoCs and incurs manageable runtime, area and code-size overheads.
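The instruction-count check that CUFFS relies on can be illustrated with a small sketch: the monitor holds the exact static instruction count of every basic block and flags any runtime report that disagrees, since injected code changes the count. The block table and message format below are illustrative, not taken from the thesis.

```python
class BlockMonitor:
    """Sketch of a CUFFS-style instruction-count check (illustrative
    data layout; the real MONITOR is a hardware unit)."""
    def __init__(self, block_table):
        self.block_table = block_table   # block id -> static instr count

    def verify(self, block_id, reported_count):
        # Unknown block ids also count as violations.
        return self.block_table.get(block_id) == reported_count

    def scan(self, trace):
        """Return the index of the first violating (block, count)
        message in a runtime trace, or -1 if the trace is clean."""
        for i, (bid, cnt) in enumerate(trace):
            if not self.verify(bid, cnt):
                return i
        return -1
```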
APA, Harvard, Vancouver, ISO, and other styles
8

Liang, Cao. "Hardware/Software Co-Design Architecture and Implementations of MIMO Decoders on FPGA." ScholarWorks@UNO, 2006. http://scholarworks.uno.edu/td/416.

Full text
Abstract:
In recent years, multiple-input multiple-output (MIMO) technology has attracted great attention in the area of wireless communications. The hardware implementation of MIMO decoders becomes a challenging task as the complexity of the MIMO system increases. This thesis presents a hardware/software co-design architecture and implementations of two typical lattice decoding algorithms: the Agrell and Vardy (AV) algorithm and the Viterbo and Boutros (VB) algorithm. Three levels of parallelism are analyzed for an efficient implementation, with the preprocessing part on an embedded MicroBlaze soft processor and the decoding part on customized hardware. Decoders for a 4-by-4 MIMO system with a 16-QAM modulation scheme are prototyped on a Xilinx XC2VP30 FPGA device. The hardware implementations of the AV and VB decoders support data rates of up to 81 Mbps and 37 Mbps, respectively. The two decoders are also compared in terms of resource utilization and BER performance.
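As a point of reference for what lattice decoders such as AV and VB compute, the sketch below is a brute-force maximum-likelihood detector for a toy real-valued MIMO model; a lattice decoder finds the same closest lattice point without the exhaustive search. The channel matrix and symbol alphabet in the usage are illustrative.

```python
import itertools
import numpy as np

def ml_detect(H, y, alphabet):
    """Brute-force maximum-likelihood MIMO detection: try every symbol
    vector and keep the one whose image under H is closest to y.
    Exponential in the number of antennas -- the baseline that lattice
    decoding algorithms approximate far more cheaply."""
    best, best_d = None, float("inf")
    for s in itertools.product(alphabet, repeat=H.shape[1]):
        s = np.array(s, dtype=float)
        d = np.linalg.norm(y - H @ s)
        if d < best_d:
            best, best_d = s, d
    return best
```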
APA, Harvard, Vancouver, ISO, and other styles
9

Moreira, Francis Birck. "Profiling and reducing micro-architecture bottlenecks at the hardware level." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2014. http://hdl.handle.net/10183/103977.

Full text
Abstract:
Most mechanisms in current superscalar processors use instruction-granularity information for speculation, such as branch predictors or prefetchers. However, many of these characteristics can be obtained at the basic-block level, increasing the amount of code that can be covered while requiring less space to store the data. Moreover, the code can be profiled more accurately and provide a greater variety of information by analyzing the different instruction types inside a block. Because of these advantages, block-level analysis can offer more opportunities for mechanisms that use this information. For example, it is possible to integrate information about branch prediction and memory accesses to provide precise information for speculative mechanisms, increasing accuracy and performance. We propose the Block-Level Architecture Profiler (BLAP), an online mechanism that profiles bottlenecks at the microarchitectural level, such as delinquent memory loads, hard-to-predict branches, and contention for functional units. BLAP works at the basic-block level, providing information that can be used to reduce the impact of these bottlenecks. A prefetch-dropping mechanism and a memory controller policy were developed to use the profiled information provided by BLAP. Together, these mechanisms are able to improve performance by up to 17.39% (3.90% on average). Our technique showed average gains of 13.14% when evaluated under high memory pressure due to highly aggressive prefetching.
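The per-block bookkeeping BLAP performs in hardware can be sketched in software: counters aggregated per basic block for cache-missing loads and mispredicted branches, queried for the worst offenders so a prefetch dropper or memory controller policy can prioritize them. All names and thresholds are illustrative, not taken from the dissertation.

```python
from collections import defaultdict

class BlockProfiler:
    """Sketch of block-level bottleneck profiling in the spirit of
    BLAP (illustrative software model of a hardware structure)."""
    def __init__(self):
        self.miss_loads = defaultdict(int)   # block -> load-miss count
        self.mispredicts = defaultdict(int)  # block -> mispredict count

    def record_load_miss(self, block):
        self.miss_loads[block] += 1

    def record_mispredict(self, block):
        self.mispredicts[block] += 1

    def delinquent_blocks(self, top=3):
        # Blocks whose loads miss most often: candidates for
        # prioritization by the memory controller policy.
        return sorted(self.miss_loads, key=self.miss_loads.get,
                      reverse=True)[:top]
```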
APA, Harvard, Vancouver, ISO, and other styles
10

Woods, Walt. "The Design of a Simple, Spiking Sparse Coding Algorithm for Memristive Hardware." PDXScholar, 2016. http://pdxscholar.library.pdx.edu/open_access_etds/2721.

Full text
Abstract:
Calculating a sparse code for signals with high dimensionality, such as high-resolution images, takes substantial time to compute on a traditional computer architecture. Memristors present the opportunity to combine storage and computing elements into a single, compact device, drastically reducing the area required to perform these calculations. This work focused on the analysis of two existing sparse coding architectures, one of which utilizes memristors, as well as the design of a new, third architecture that employs a memristive crossbar. These architectures implement either a non-spiking or spiking variety of sparse coding based on the Locally Competitive Algorithm (LCA) introduced by Rozell et al. in 2008. Each architecture receives an arbitrary number of input lines and drives an arbitrary number of output lines. Training of the dictionary used for the sparse code was implemented through external control signals that approximate Oja's rule. The resulting designs were capable of representing input in real time: no resets would be needed between frames of a video, for instance, though some settle time would be needed. The spiking architecture proposed is novel, emphasizing simplicity to achieve lower power than existing designs. The architectures presented were tested for their ability to encode and reconstruct 8 x 8 patches of natural images. The proposed network reconstructed patches with a normalized root-mean-square error of 0.13, while a more complicated CMOS-only approach yielded 0.095, and a non-spiking approach yielded 0.074. Allowing several outputs to compete for representation of the input was shown to improve reconstruction quality and preserve more subtle components in the final encoding; the proposed algorithm lacks this feature. Steps to address this, by scaling input spikes according to the current expected residual without adding much complexity, were proposed for future work.
The architectures were also tested with the MNIST digit database, passing a sparse code onto a basic classifier. The proposed architecture scored 81% on this test, a CMOS-only spiking variant scored 76%, and the non-spiking algorithm scored 85%. Power calculations were made for each design and compared against other publications. The overall findings showed great promise for spiking memristor-based ASICs, consuming only 28% of the power used by non-spiking architectures and 6.6% as much power as a CMOS-only spiking architecture on this task. The spike-based nature of the novel design was also parameterized into several intuitive parameters that could be adjusted to prefer either performance or power efficiency. The design and analysis of architectures for sparse coding should greatly reduce the amount of future work needed to implement an end-to-end classification pipeline for images or other signal data. When lower power is a primary concern, the proposed architecture should be considered as it surpassed other published algorithms. These pipelines could be used to provide low-power visual assistance, highlighting objects within high-definition video frames in real-time. The technology could also be used to help self-driving cars identify hazards more quickly and efficiently.
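The non-spiking LCA variant referenced above can be sketched in a few lines of NumPy: leaky-integrator potentials are driven by the input, inhibited by correlated active neighbours, and soft-thresholded into the sparse code. Step size, threshold, and iteration count below are illustrative, not the values used in the thesis.

```python
import numpy as np

def soft_threshold(u, lam):
    # LCA activation: zero below the threshold, shrunk above it.
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def lca(D, x, lam=0.1, step=0.01, iters=500):
    """Minimal non-spiking Locally Competitive Algorithm sketch
    (after Rozell et al., 2008).  D has unit-norm atoms as columns."""
    G = D.T @ D - np.eye(D.shape[1])   # lateral inhibition weights
    b = D.T @ x                        # feed-forward drive
    u = np.zeros(D.shape[1])           # membrane potentials
    for _ in range(iters):
        a = soft_threshold(u, lam)     # current sparse code
        u += step * (b - u - G @ a)    # leaky integration + inhibition
    return soft_threshold(u, lam)
```

On a memristive crossbar the two matrix products above become analog vector-matrix multiplications, which is where the area and power savings come from.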
APA, Harvard, Vancouver, ISO, and other styles
11

Zhang, Yuanzhi. "Algorithms and Hardware Co-Design of HEVC Intra Encoders." OpenSIUC, 2019. https://opensiuc.lib.siu.edu/dissertations/1769.

Full text
Abstract:
Digital video has become extremely important, and its importance has greatly increased over the last two decades. Due to the rapid development of information and communication technologies, demand for Ultra-High-Definition (UHD) video applications is growing. However, the most prevalent video compression standard, H.264/AVC, released in 2003, is inefficient for UHD video. The desire for compression efficiency superior to H.264/AVC led to the standardization of High Efficiency Video Coding (HEVC). Compared with H.264/AVC, HEVC offers double the compression ratio at the same level of video quality, or a substantial improvement in video quality at the same bitrate. Although HEVC/H.265 possesses superior compression efficiency, its complexity is several times that of H.264/AVC, impeding high-throughput implementation. Most researchers have focused merely on algorithm-level adaptations of the HEVC/H.265 standard to reduce computational intensity, without considering hardware feasibility, and the exploration of efficient hardware architecture design has not been exhaustive: only a few research works have explored efficient hardware architectures for the HEVC/H.265 standard. In this dissertation, we investigate efficient algorithm adaptations and hardware architecture design for HEVC intra encoders, and also explore a deep learning approach to mode prediction. From the algorithm point of view, we propose three efficient hardware-oriented algorithm adaptations: mode reduction, fast coding unit (CU) cost estimation, and group-based CABAC (context-adaptive binary arithmetic coding) rate estimation. Mode reduction aims to reduce the mode candidates of each prediction unit (PU) in the rate-distortion optimization (RDO) process, which is both computation-intensive and time-consuming.
Fast CU cost estimation is applied to reduce the complexity of the rate-distortion (RD) calculation for each CU. Group-based CABAC rate estimation is proposed to parallelize syntax-element processing and greatly improve rate-estimation throughput. From the hardware design perspective, a fully parallel hardware architecture for an HEVC intra encoder is developed to sustain UHD video compression at 4K@30fps. The fully parallel architecture introduces four prediction engines (PEs), each of which independently performs the full cycle of mode prediction, transform, quantization, inverse quantization, inverse transform, reconstruction, and rate-distortion estimation. PU blocks of different sizes are processed by different prediction engines simultaneously. Also, an efficient hardware implementation of a group-based CABAC rate estimator is incorporated into the proposed HEVC intra encoder for accurate and high-throughput rate estimation. To take advantage of deep learning, we also propose a fully-connected-layer-based neural network (FCLNN) mode preselection scheme to reduce the number of RDO modes for luma prediction blocks. All angular prediction modes are classified into 7 prediction groups, each containing 3-5 prediction modes that exhibit similar prediction angles. A rough angle detection algorithm determines the prediction direction of the current block, and then a small-scale FCLNN refines the mode prediction.
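The grouping of angular modes described above can be sketched as a simple bucketing of HEVC's 33 angular intra modes (2-34) into 7 groups of neighbouring angles; the exact partition below is an illustrative guess, not the one used in the dissertation.

```python
def mode_group(mode):
    """Map an HEVC angular intra mode (2..34) to one of 7 direction
    groups of 4-5 adjacent angles.  Illustrative partition: a mode
    preselection network would then pick one group, and full RDO runs
    only on the modes inside it."""
    assert 2 <= mode <= 34, "angular modes are 2..34 in HEVC"
    bounds = [2, 7, 12, 16, 21, 25, 30, 35]   # 7 consecutive buckets
    for g, (lo, hi) in enumerate(zip(bounds, bounds[1:])):
        if lo <= mode < hi:
            return g
```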
APA, Harvard, Vancouver, ISO, and other styles
12

Robinson, Kylan Thomas. "An integrated development environment for the design and simulation of medium-grain reconfigurable hardware." Pullman, Wash. : Washington State University, 2010. http://www.dissertations.wsu.edu/Thesis/Spring2010/k_robinson_041510.pdf.

Full text
Abstract:
Thesis (M.S. in computer engineering)--Washington State University, May 2010.
Title from PDF title page (viewed on June 22, 2010). "School of Electrical Engineering and Computer Science." Includes bibliographical references (p. 75-76).
APA, Harvard, Vancouver, ISO, and other styles
13

Moustakas, Evangelos. "Design and simulation of a primitive RISC architecture using VHDL /." Online version of thesis, 1991. http://hdl.handle.net/1850/11229.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Haspel, Patrick R. "Researching methods for efficient hardware specification, design and implementation of a next generation communication architecture." [S.l.] : [s.n.], 2007. http://deposit.ddb.de/cgi-bin/dokserv?idn=984774084.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Mikulcak, Marcus. "Development of a Predictable Hardware Architecture Template and Integration into an Automated System Design Flow." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-124497.

Full text
Abstract:
The requirements of safety-critical real-time embedded systems pose unique challenges for their design process, which cannot be fulfilled with traditional development methods. To ensure correct timing and functionality, it has been suggested to move the design process to a higher abstraction level, which opens the possibility of utilizing automated correct-by-design development flows from a functional specification of the system down to the level of Multiprocessor Systems-on-Chip (MPSoCs). ForSyDe, an embedded system design methodology, presents a flow of this kind by basing system development on the theory of Models of Computation and side-effect-free processes, making it possible to separate the timing analysis of the computation and communication of process networks. To be able to offer guarantees on the timing of tasks implemented on an MPSoC, the hardware platform needs to provide predictability and composability in every component, which in turn requires a range of special considerations in its design. This thesis presents a predictable and composable FPGA-based MPSoC template based on the Altera Nios II soft processor and the Avalon Switch Fabric interconnect, and its integration into the automated ForSyDe system design flow. To demonstrate the functionality and test the validity of the timing predictions, two sample applications have been developed and tested both in the context of the design flow and on the implemented hardware platform.
APA, Harvard, Vancouver, ISO, and other styles
16

Vasudevan, Siddarth. "Design and Development of a CubeSat Hardware Architecture with COTS MPSoC using Radiation Mitigation Techniques." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-285577.

Full text
Abstract:
CubeSat missions need components that are tolerant of the radiation in space. The hardware components must be reliable and must not compromise on-board functionality during the mission. At the same time, the cost of the hardware and its development should not be high. Hence, this thesis discusses the design and development of a CubeSat architecture using a Commercial Off-The-Shelf (COTS) Multi-Processor System on Chip (MPSoC). The architecture employs an affordable Rad-Hard Micro-Controller Unit as a Supervisor for the MPSoC. It also uses several radiation mitigation techniques: a latch-up protection circuit to guard against Single-Event Latch-ups (SELs), readback scrubbing for Non-Volatile Memories (NVMs) such as NOR Flash and configuration scrubbing for the FPGA present in the MPSoC to guard against Single-Event Upsets (SEUs), and reliable communication using Cyclic Redundancy Check (CRC) and the Space Packet Protocol. Apart from such functionalities, the Supervisor executes tasks such as a watchdog that monitors the liveness of the applications running on the MPSoC, data logging, and Over-The-Air software/firmware updates. The thesis work implements functionalities such as communication, readback memory scrubbing, configuration scrubbing using SEM-IP, the watchdog, and the software/firmware update. Execution times are presented for the application running on the Supervisor. For the configuration scrubbing implemented in the Programmable Logic (PL)/FPGA, area and latency results are reported.
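The CRC-protected communication mentioned in the abstract can be illustrated with a short software model. The sketch below is a generic bitwise CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF), chosen here purely for illustration; it is not the thesis's actual implementation.

```python
def crc16_ccitt(data: bytes, poly: int = 0x1021, init: int = 0xFFFF) -> int:
    """Bitwise, MSB-first CRC-16/CCITT-FALSE over a byte string."""
    crc = init
    for byte in data:
        crc ^= byte << 8          # fold the next byte into the high bits
        for _ in range(8):
            if crc & 0x8000:      # top bit set: shift out and apply the polynomial
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

Appending the 16-bit checksum to a packet lets the receiver recompute the CRC over the payload and reject frames corrupted in transit.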
APA, Harvard, Vancouver, ISO, and other styles
17

Haspel, Patrick R. "Researching methods for efficient hardware specification, design and implementation of a next generation communication architecture." Mannheim : Universität, 2006. http://madoc.bib.uni-mannheim.de/madoc/volltexte/2007/1416/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Cornevaux-Juignet, Franck. "Hardware and software co-design toward flexible terabits per second traffic processing." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2018. http://www.theses.fr/2018IMTA0081/document.

Full text
Abstract:
The reliability and the security of communication networks require efficient components to finely analyze data traffic. Service diversification and throughput increase force network operators to constantly improve analysis systems in order to handle throughputs of hundreds, even thousands, of Gigabits per second. Commonly used software solutions offer a flexibility and an accessibility welcomed by network operators, but they can no longer meet these strong constraints in many critical cases. This thesis studies architectural solutions based on programmable chips such as Field-Programmable Gate Arrays (FPGAs), which combine computation power and processing flexibility. Boards equipped with such chips are integrated into a common software/hardware processing flow in order to balance the shortcomings of each element. Network components developed with this innovative approach ensure an exhaustive processing of packets transmitted on physical links while keeping the flexibility of usual software solutions, which had not been achieved in the previous state of the art. This approach is validated by the design and the implementation of a flexible packet processing architecture on FPGA. It is able to process any packet type at the cost of a slight over-consumption of resources. It is moreover fully customizable from the software side. With the proposed solution, network engineers can transparently use the processing power of a hardware accelerator without needing prior knowledge of digital circuit design.
APA, Harvard, Vancouver, ISO, and other styles
19

Passarella, Alice. "Hardware Design and Firmware Architecture of a Multi-Sensor Platform for Monitoring of Workpieces and Machines." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Find full text
Abstract:
This Thesis work focuses on the description of the curricular internship activity carried out in the R&D Division of Measuring Systems at MARPOSS S.p.A. company in Bentivoglio, BO, Italy. Part of this work has been performed in the context of the 5G-SMART European project, whose goal is to demonstrate, evaluate and validate the potential of the usage of 5G networks in real manufacturing environments. The Thesis provides a description of the project, focusing on the objectives, the organizational structure and work-packages, as well as future developments. An overview of the design steps for the realization of a Multi-Sensor Platform for Monitoring of Workpieces and Machines is given. The goal is to design a device equipped with different sensors, both internal and external, able to acquire multiple data from workpieces and machines of a shop floor. Sensors must be able to communicate wirelessly via the 5G network. The analysis of the architecture options proposed as a model for the device is then provided, with the description of the final modular layout. The design schematics are examined from a circuit viewpoint, focusing on the hardware design of the various electronic components, and on their interaction with the microprocessor. In order to verify the correct functioning of the board, a basic library of the individual peripherals is developed, which is going to be used as a basis for the final Firmware.
APA, Harvard, Vancouver, ISO, and other styles
20

Niu, Xinwei. "System-on-a-Chip (SoC) based Hardware Acceleration in Register Transfer Level (RTL) Design." FIU Digital Commons, 2012. http://digitalcommons.fiu.edu/etd/888.

Full text
Abstract:
Today, modern System-on-a-Chip (SoC) systems have grown rapidly due to increased processing power, while maintaining the size of the hardware circuit. The number of transistors on a chip continues to increase, but current SoC designs may not be able to exploit the potential performance, especially with energy consumption and chip area becoming two major concerns. Traditional SoC designs usually separate software and hardware, so improving system performance is a complicated task for both software and hardware designers. The aim of this research is to develop a hardware acceleration workflow for software applications, so that system performance can be improved under constraints on energy consumption and on-chip resource costs. The characteristics of software applications can be identified by using profiling tools. Hardware acceleration can yield significant performance improvements for highly mathematical calculations or repeated functions. The performance of SoC systems can then be improved if the hardware acceleration method is used to accelerate the elements that incur performance overheads. The concepts mentioned in this study can be easily applied to a variety of sophisticated software applications. The contributions of SoC-based hardware acceleration in the hardware-software co-design platform include the following: (1) Software profiling methods are applied to an H.264 Coder-Decoder (CODEC) core. The hotspot function of the target application is identified using critical attributes such as cycles per loop, loop rounds, etc. (2) A hardware acceleration method based on a Field-Programmable Gate Array (FPGA) is used to resolve system bottlenecks and improve system performance. The identified hotspot function is then converted to a hardware accelerator and mapped onto the hardware platform. Two types of hardware acceleration methods – central bus design and co-processor design – are implemented for comparison in the proposed architecture.
(3) System specifications, such as performance, energy consumption, and resource costs, are measured and analyzed. The trade-offs among these three factors are compared and balanced. Different hardware accelerators are implemented and evaluated based on system requirements. (4) The system verification platform is designed based on an Integrated Circuit (IC) workflow. Hardware optimization techniques are used for higher performance and lower resource costs. Experimental results show that the proposed hardware acceleration workflow for software applications is an efficient technique. The system can reach a 2.8X performance improvement and save 31.84% in energy consumption by applying the Bus-IP design. The co-processor design achieves 7.9X performance and saves 75.85% in energy consumption.
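The payoff of accelerating only the profiled hotspot, as this workflow does, is bounded by Amdahl's law. The sketch below is a generic illustration of that bound; the fractions and acceleration factors in the usage example are hypothetical, not measurements from the thesis.

```python
def amdahl_speedup(hotspot_fraction: float, accel_factor: float) -> float:
    """Overall speedup when hotspot_fraction of the runtime is sped up by accel_factor."""
    return 1.0 / ((1.0 - hotspot_fraction) + hotspot_fraction / accel_factor)
```

For example, a hotspot consuming 90% of the runtime, accelerated 10x by a hardware unit, gives only about a 5.3x overall speedup, which is why profiling to find the true hotspot matters before committing it to hardware.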
APA, Harvard, Vancouver, ISO, and other styles
21

Schultek, Brian Robert. "Design and Implementation of the Heterogeneous Computing Device Management Architecture." University of Dayton / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1417801414.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Luo, Li-Ping, and 羅立平. "Extensible Sorting Hardware Architecture Design." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/86310261698659358923.

Full text
Abstract:
Master's thesis
Chung Yuan Christian University
Institute of Electronic Engineering
ROC academic year 101
In this thesis, we propose an extensible sorting hardware architecture. By analyzing the iterative structure of the odd-even transposition sort, we design a basic sorting cell in a hardware description language. Several basic sorting cells can then be combined to build an extensible sorting circuit that matches the required number of sorting inputs. The extensible sorting circuit can be applied to a FlexRay communication controller, adapting to different numbers of sorting inputs. FlexRay is a vehicle network communication specification that provides high speed, time-triggered operation, and fault tolerance. In this thesis, we also implement the FlexRay communication controller circuit in the Verilog hardware description language; a sorting circuit is required to sort the timing table data in the communication controller in order to correct the global time. The basic sorting cell can be reused conveniently in other applications, and the sorting circuit saves energy by turning off unused modules. Finally, we verify the communication controller by simulating and synthesizing the circuit on an FPGA, and we perform field trials to confirm that our communication controller and extensible sorting circuit work correctly.
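The repetition that the thesis exploits is easy to see in software: odd-even transposition sort runs N phases of independent compare-exchange operations, and each phase maps directly onto a row of hardware sorting cells. A minimal reference model (my sketch, not the thesis's HDL):

```python
def odd_even_transposition_sort(data):
    """N phases of alternating even/odd compare-exchange passes.

    Each phase touches only disjoint pairs, mirroring the cell-per-pair
    structure of a hardware sorting network."""
    a = list(data)
    n = len(a)
    for phase in range(n):
        start = phase % 2  # even phases compare (0,1),(2,3)...; odd phases (1,2),(3,4)...
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

Because the comparisons within a phase are independent, a hardware implementation can execute all of them in parallel, finishing in N clock cycles for N inputs.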
APA, Harvard, Vancouver, ISO, and other styles
23

Wu, Tung-Yang, and 吳東陽. "Color Constancy: Algorithm and Hardware Architecture Design." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/71628149939660026643.

Full text
Abstract:
Master's thesis
National Taiwan University
Graduate Institute of Electronics Engineering
ROC academic year 96
Humans are able to recognize the colors of objects independently of the light source, an ability called color constancy. In a digital still camera, a sensor measures the reflected light, and the measured color at each pixel varies according to the color of the illuminant. Therefore, the resulting colors may not be the same as those perceived by users. Many algorithms have been developed to solve the color constancy problem, which is sometimes also called auto white balance. Since digital cameras and camera-equipped mobile phones have become more and more popular in recent years, selecting color constancy algorithms for real-time system implementation is an important issue. In this thesis, we first provide a comprehensive introduction to the field of color constancy, describing the major color constancy algorithms. The performance of these algorithms is then evaluated, and the hardware cost of selected algorithms is analyzed within a proposed system framework. Furthermore, based on the analysis results, we propose a new algorithm that takes advantage of the existing Gamut Mapping and a modified Gray World algorithm: Gamut Mapping is employed when the number of candidate illuminants is small enough, and the modified Gray World algorithm is employed otherwise. Compared with other color constancy methods on a large data set of images recording objects under different light sources, experiments show that the proposed algorithm achieves the best performance with acceptable hardware cost.
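The Gray World branch of such a scheme estimates the illuminant from channel means: if the scene averages to gray, each channel can be rescaled toward the global mean intensity. The sketch below is the classic textbook algorithm on a list of RGB tuples, assumed 8-bit, and not the thesis's modified variant.

```python
def gray_world_balance(pixels):
    """Classic Gray World: scale each RGB channel so its mean matches the overall mean."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3.0
    gains = [gray / m if m else 1.0 for m in means]  # per-channel correction gains
    return [tuple(min(255.0, p[c] * gains[c]) for c in range(3)) for p in pixels]
```

An image with a uniform red cast, for instance, comes out neutral, while an already-gray image passes through unchanged.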
APA, Harvard, Vancouver, ISO, and other styles
24

Wu, Tung-Yang. "Color Constancy: Algorithm and Hardware Architecture Design." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0001-2907200814221300.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

HUANG, YU-NAN, and 黃育楠. "Hardware architecture design for adaptive predictive line search." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/31697894806495984204.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
ROC academic year 93
Motion estimation is a key computational technique in most video compression algorithms. To reduce the extremely high complexity of the full search approach, many fast motion estimation algorithms have been proposed; logarithmic search, three-step search, and diamond search are among the most famous. In this thesis, we introduce predictive line search, a method well suited to hardware implementation, and we also present an improved method, motion adaptive search. Applying these two methods yields a 40% to 50% improvement in speed while maintaining nearly the same quality measured in PSNR. Combining the two methods, a pipelined hardware architecture is introduced in this thesis, and the motion estimation computation is sped up by the properties of the proposed hardware. In addition, since predictive line search has a regular search pattern, data reuse can be applied and less memory bandwidth is needed, which makes it particularly appropriate for hardware implementation.
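Of the fast searches named above, the three-step search is the simplest to sketch: it evaluates nine candidates around the current best vector and halves the step size each round, cutting the candidate count from hundreds (full search) to at most 25. The code below is an illustrative SAD-based model of that generic algorithm, not of the predictive line search proposed in the thesis.

```python
def block_sad(cur, ref, bx, by, dx, dy, bsize):
    """Sum of absolute differences between the block at (bx, by) in cur
    and the candidate block displaced by (dx, dy) in ref."""
    h, w = len(ref), len(ref[0])
    if not (0 <= by + dy and by + dy + bsize <= h and
            0 <= bx + dx and bx + dx + bsize <= w):
        return float('inf')  # candidate falls outside the reference frame
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(bsize) for x in range(bsize))

def three_step_search(cur, ref, bx, by, bsize=4):
    """Three rounds of 9-point refinement with step sizes 4, 2, 1."""
    mx = my = 0
    for step in (4, 2, 1):
        cx, cy = mx, my                      # freeze the round's search centre
        best = block_sad(cur, ref, bx, by, cx, cy, bsize)
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                c = block_sad(cur, ref, bx, by, cx + dx, cy + dy, bsize)
                if c < best:
                    best, mx, my = c, cx + dx, cy + dy
    return mx, my
```

Fast searches like this trade a small PSNR loss for a drastic cut in SAD evaluations, which is exactly the complexity/quality trade-off the abstract discusses.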
APA, Harvard, Vancouver, ISO, and other styles
26

Tsai, Fang-Hsu, and 蔡芳旭. "High efficiency image scaling and hardware architecture design." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/20761384613354447197.

Full text
Abstract:
Master's thesis
Yuan Ze University
Department of Electrical Engineering
ROC academic year 105
Image scaling is a technique widely used in electronic products with display devices, such as digital cameras, smart phones, tablets, and medical applications like endoscopy. With the trend toward high-resolution displays, the time between display generations is becoming shorter and shorter. The mainstream resolution for LCD/LED TVs has advanced from Full HD (1,920 x 1,080) to 4K UHD (3,840 x 2,160), and the next generation, 8K UHD (7,680 x 4,320), is becoming the next advanced display resolution. However, Full HD is still the mainstream resolution of display content at present, which means the resolution gap between display devices and images will grow larger in the future, making the throughput requirement a critical issue. The maximum throughput that past works can achieve is only 200 Mpixels/sec, which is not sufficient for 4K UHD applications (the throughput requirement is 250 Mpixels/sec). Therefore, this thesis proposes a novel image scaling design that not only achieves a maximum of 285 Mpixels/sec, but also has higher hardware efficiency and better image quality.
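A common baseline that scaling designs are measured against is bilinear interpolation, where each output pixel is a distance-weighted blend of its four nearest input pixels. The sketch below is that generic baseline on a 2-D grayscale image (my illustrative assumption, not the algorithm proposed in the thesis).

```python
def bilinear_scale(img, out_w, out_h):
    """Centre-aligned bilinear resampling of a 2-D grayscale image (list of rows)."""
    in_h, in_w = len(img), len(img[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for oy in range(out_h):
        # map the output pixel centre back into input coordinates, clamped to the frame
        fy = min(max((oy + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1.0)
        y0 = int(fy); y1 = min(y0 + 1, in_h - 1); wy = fy - y0
        for ox in range(out_w):
            fx = min(max((ox + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1.0)
            x0 = int(fx); x1 = min(x0 + 1, in_w - 1); wx = fx - x0
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bot = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            out[oy][ox] = top * (1 - wy) + bot * wy
    return out
```

In hardware terms, each output pixel costs a fixed small number of multiplies, so the throughput figures in the abstract come down to how many such pixels the datapath can produce per clock.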
APA, Harvard, Vancouver, ISO, and other styles
27

李佳勳. "Hardware Architecture Design of Adaptive Equalizer and TCM Decoder." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/rav63b.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Communications Engineering
ROC academic year 93
Single-pair High-Speed Digital Subscriber Loop (SHDSL) is a new-generation symmetric DSL technique that can supply up to 2.3 Mbps of symmetric downlink and uplink data rate to subscribers on a long loop. However, InterSymbol Interference (ISI) is severe, especially during transmission over a long loop, and normal data transmission is impossible without properly handling the ISI problem. To ensure that SHDSL transceivers provide full-rate transmission, a powerful adaptive equalizer is needed to ease the ISI problem. The decision feedback equalizer is the most often used equalizer for solving the ISI problem, but it suffers from error propagation, which degrades system performance. To improve performance, joint equalization and channel decoding is necessary; nevertheless, combining the trellis decoder with the decision feedback equalizer results in high-complexity hardware. A powerful equalization scheme called the Tomlinson-Harashima precoder (THP) was proposed to solve this problem: by use of the precoding technique, joint equalization and channel decoding can be accomplished by cascading a linear equalizer and a TCM decoder for channel coding. In this thesis, starting from algorithm design and computer simulation, we design the THP system and TCM decoder hardware architectures according to the G.SHDSL recommendation. The resulting hardware achieves the maximum 2.3 Mbps data rate under a 50 MHz operating clock and was verified on an FPGA development board.
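The precoding idea can be sketched numerically: the transmitter runs the ISI feedback filter inside a modulo device, so the receiver needs only a memoryless modulo after the channel instead of a decision feedback loop. The code below is a real-valued toy model of Tomlinson-Harashima precoding under an assumed monic FIR channel; the taps, symbol alphabet, and modulo interval are illustrative assumptions, not the G.SHDSL design of the thesis.

```python
def mod_interval(v, m):
    """Fold v into the interval [-m, m)."""
    return ((v + m) % (2 * m)) - m

def thp_precode(symbols, taps, m=4):
    """x[k] = mod(a[k] - sum_i taps[i] * x[k-1-i]): feedback filtering inside a modulo."""
    x = []
    for a in symbols:
        fb = sum(t * x[-1 - i] for i, t in enumerate(taps) if i < len(x))
        x.append(mod_interval(a - fb, m))
    return x

def channel_and_detect(x, taps, m=4):
    """Pass x through the monic ISI channel (1 + taps) and fold back with the modulo."""
    out = []
    for k in range(len(x)):
        y = x[k] + sum(t * x[k - 1 - i] for i, t in enumerate(taps) if k - 1 - i >= 0)
        out.append(mod_interval(y, m))
    return out
```

Because the modulo cancels at the receiver, the ISI is removed without any feedback of past decisions, which is what eliminates the error propagation of the decision feedback equalizer.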
APA, Harvard, Vancouver, ISO, and other styles
28

郭皇志. "Algorithm and Hardware Architecture Design for Intra-frame Encoding." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/69770164040628325035.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

楊凱博. "Application of Synchronous Elastic Architecture to FDRCLCP Hardware Design." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/cwvc69.

Full text
Abstract:
Master's thesis
National Changhua University of Education
Department of Computer Science and Information Engineering
ROC academic year 107
In image processing research, natural color images captured by a digital camera may be affected by insufficient lighting, which compresses the brightness contrast of the image and introduces defects. Fast dynamic range compression with regional contrast preservation addresses this: in the luminance channel, Gaussian filtering is used to obtain a regional average for image smoothing, so the algorithm can preserve image details under dynamic range compression while keeping the original contrast. The hardware implementation in this study uses fixed-point unsigned numbers and shifts to improve computational efficiency and to reduce the number of logic components and the cost of the hardware circuit. In addition, look-up tables accelerate signal processing, and a pipeline structure enables each stage to operate independently, so the pipelined architecture effectively reduces the overall computation time and circuit area. At the same time, we study the principles and architecture of synchronous elastic circuits and their application to the hardware design of this algorithm. Compared with a traditional pipelined circuit, a synchronous elastic circuit is latency-insensitive, which allows the pipelined datapath to be fully utilized by multiple threads. The work completed in this thesis consists of designing an elastic circuit with latency-insensitive properties, studying how to insert appropriate synchronous elastic control circuits into the image contrast algorithm circuit to support its operation, and analyzing how much circuit area and performance are sacrificed to achieve the synchronous elastic function.
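The luminance path described above, smoothing with a Gaussian to get a regional average, compressing that base, and keeping the local detail, can be sketched in one dimension. The kernel radius, sigma, and the power-law compression below are illustrative assumptions, not the fixed-point design of the thesis.

```python
import math

def gaussian_kernel(radius=2, sigma=1.0):
    """Normalized 1-D Gaussian weights over [-radius, radius]."""
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def blur_1d(row, kernel):
    """Convolve a row with the kernel, replicating edge samples."""
    r = len(kernel) // 2
    n = len(row)
    return [sum(kernel[j + r] * row[min(max(i + j, 0), n - 1)]
                for j in range(-r, r + 1))
            for i in range(n)]

def compress_row(lum, gamma=0.5):
    """Compress the smoothed (regional) luminance, keep the local detail ratio."""
    base = blur_1d(lum, gaussian_kernel())
    return [(b ** gamma) * (l / b) for l, b in zip(lum, base)]
```

The detail ratio l/b rides on top of the compressed base, which is how local contrast survives the dynamic range reduction; in the thesis's hardware, the power-law and division would be realized with look-up tables and shifts.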
APA, Harvard, Vancouver, ISO, and other styles
30

Chhabra, Robin. "Concurrent Design of Reconfigurable Robots using a Robotic Hardware-in-the-loop Simulation." Thesis, 2008. http://hdl.handle.net/1807/17156.

Full text
Abstract:
This thesis discusses a practical approach to the concurrent analysis and synthesis of reconfigurable robot manipulators based on the alternative design methodology of Linguistic Mechatronics (LM) as well as the utilization of a modular Robotic Hardware-In-the-Loop Simulation (RHILS) platform. Linguistic Mechatronics is a systematic design methodology for mechatronic systems, which formalizes subjective notions and simplifies the optimization process, in the hope that numerous naturally different design variables can be considered concurrently. The methodology redefines the ultimate goal of design based on the qualitative notions of wish and must satisfactions. The underlying concepts of LM are investigated through a simulation case study. In addition, the RHILS platform involving physical joint modules and a control unit, which takes into account various physical phenomena and reduces the simulation complexities, is employed to the design architecture. Ultimately, the new approach is applied to redesigning kinematic, dynamic and control parameters of an industrial manipulator.
APA, Harvard, Vancouver, ISO, and other styles
31

Hui, Lai Hsiao, and 賴曉輝. "Research and design in hardware architecture of digital beamforming receiver." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/24540902175678981676.

Full text
Abstract:
Master's thesis
Da-Yeh University
Institute of Electrical Engineering
ROC academic year 89
This thesis focuses on the research and design of the hardware architecture of a digital beamforming receiver. The receiving front-end consists of an array of sensors (antennas); with beamforming techniques, space division multiple access can be achieved in a wireless environment. The adaptive processor detects the direction of arrival (DOA) of each source in space and estimates the optimal weight vector; consequently, adaptive beamforming can be used to increase system capacity. One of the main enablers of Space Division Multiple Access (SDMA) is the smart antenna, which can improve performance in several ways: (1) increased spectral efficiency and system capacity, (2) reduced multi-path interference, (3) suppression of co-channel interference (CCI), and (4) range extension. In this work, we apply the band-pass sampling theorem for IF sampling in a software radio architecture. Furthermore, we completed the interface circuit and control software, which can control multiple digital down converters from one computer simultaneously, and realized the Digital Beamforming (DBF) module on a Field Programmable Gate Array (FPGA). The control software, written with Borland C++ Builder (BCB), contains three view forms: (1) a magnitude interface for any single Digital Down Converter (DDC), (2) a monitor showing eighteen DDC magnitude interfaces simultaneously, and (3) a tuning interface. The eight-bit address adjuster on the interface circuit supports a total of 256 addresses, enabling control of multiple DDCs. The DBF circuit is described in VHDL and implemented on a Xilinx 4036-3 chip.
APA, Harvard, Vancouver, ISO, and other styles
32

Long, Ho Shan, and 何昇龍. "A data compression hardware architecture design based on LZW algorithm." Thesis, 1994. http://ndltd.ncl.edu.tw/handle/99280506383631973475.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Graduate Institute of Engineering Technology
ROC academic year 82
In this thesis we propose a novel hardware architecture for lossless data compression. The underlying algorithm is an improved version of the LZW algorithm. The major modifications to the original LZW algorithm are: using a FIFO strategy for dictionary insertion to suit various types of data, not actually storing the initial 256 single-character strings, and using parallel dictionaries with variable sizes and word lengths. The resulting hardware architecture has the following advantages. It is simple and only requires a small RAM and a few logic gates. The compression speed can be controlled by setting system parameters. In addition, the decompression speed is faster than that of the LZW algorithm, and the memory required is much smaller.
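For reference, the baseline LZW scheme that the thesis modifies works as follows in software. This sketch is the textbook algorithm with an unbounded dictionary; it does not include the FIFO replacement, the omitted initial 256 entries, or the parallel dictionaries of the proposed hardware.

```python
def lzw_compress(data: bytes):
    """Textbook LZW: emit a code for the longest known prefix, then extend the dictionary."""
    dictionary = {bytes([i]): i for i in range(256)}
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc                       # keep extending the current match
        else:
            out.append(dictionary[w])    # emit code for the longest match
            dictionary[wc] = len(dictionary)
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out

def lzw_decompress(codes):
    """Rebuild the dictionary on the fly; the len(dictionary) branch is the cScSc case."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = w + w[:1]            # code emitted before its entry was stored
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[:1]
        w = entry
    return b"".join(out)
```

Because the decompressor reconstructs exactly the same dictionary from the code stream, no dictionary is transmitted; the thesis's FIFO eviction changes only which entries survive once the fixed hardware RAM fills up.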
APA, Harvard, Vancouver, ISO, and other styles
33

Chan, Wei-Kai, and 詹偉凱. "Algorithm, VLSI Hardware Architecture and System Design for Smart Surveillance." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/18173065914884095598.

Full text
Abstract:
Doctoral dissertation
National Taiwan University
Graduate Institute of Electronics Engineering
ROC academic year 100
In next-generation visual surveillance systems, content analysis tools will be integrated, and new design issues will arise related to system cost, deployment space, network loading, and system scalability. In this thesis, after a discussion of surveillance pipelines, it is proposed to utilize a content abstraction hierarchy to relieve network loading and increase system scalability, and to integrate a hardware content analysis engine into a smart-camera System-on-a-Chip (SoC) to reduce system cost and deployment space. As a result, the surveillance IP camera becomes a smart camera with embedded capabilities for automatic content analysis, and the network of surveillance IP cameras becomes a smart surveillance network. Among content analysis functions, video object segmentation and tracking are two important building blocks for smart surveillance. However, several issues need to be solved. First, threshold decision is a hard problem for background-subtraction video object segmentation. Second, several conditions make video object tracking hard to make robust, such as non-rigid object motion, target appearance changes due to illumination changes, background clutter, etc. In this thesis, by proposing an improved threshold decision algorithm, the threshold for background-subtraction-based video object segmentation can be decided automatically and robustly under severe dynamic backgrounds. Besides, the proposed threshold decision is based on a mechanism different from that of background-subtraction-based video object segmentation, which prevents possible error propagation. For video object tracking, by using the diffusion distance for color histogram matching, the tracker can track non-rigid moving objects under severe illumination changes, and, by using motion cues from video object segmentation, the tracker is robust to background clutter.
Experimental results show that the presented algorithms are robust on several challenging sequences and that the proposed methods are truly effective approaches for the mentioned issues. Besides video object segmentation and tracking, two more content analysis functions are also improved in this thesis: video object description, and face detection and scoring. For video object description, a new descriptor for human objects, the Human Color Structure Descriptor (HCSD), is proposed. Experimental results show that HCSD achieves better performance than the Scalable Color Descriptor and the Color Structure Descriptor of MPEG-7 for human objects. For face detection and scoring, low-resolution facial images in surveillance sequences are hard to detect with traditional approaches. An efficient face detection and face scoring technique for surveillance systems is proposed; it combines the spirit of image-based face detection with the essence of video object segmentation to filter out high-quality faces. The proposed face scoring technique, which is useful for surveillance video summarization and indexing, includes four scoring functions based on feature extraction and is integrated with a neural-network training system to select high-quality faces. Experiments show that the proposed algorithm effectively extracts low-resolution human faces, which traditional face detection algorithms cannot handle well. It can also rank face candidates according to face scores, which indicate face quality. For the hardware content analysis engine, a 5.877 TOPS/W and 111.329 GOPS/mm^2 Reconfigurable Smart-camera Stream Processor (ReSSP) is implemented in 90nm CMOS technology. A coarse-grained reconfigurable image stream processing architecture (CRISPA), along with the design techniques of heterogeneous stream processing (HSP) and subword-level parallelism (SLP), is implemented to accelerate the processing algorithms for smart-camera vision applications.
With the processor architecture of CRISPA and the design techniques of HSP and SLP, ReSSP outperforms existing vision chips in many aspects of hardware performance. Moreover, the programmability of ReSSP makes it capable of supporting many high-level vision algorithms at high specifications, such as real-time full-HD video analysis. The implementation results show that the on-chip memory can be reduced by 94% with SLP memory sharing. The on-chip memory size, power efficiency, and area efficiency are 18.2x to 182x, 4.5x to 33.0x, and 3.8x to 74.2x better than those of state-of-the-art chips. Besides the algorithms and hardware proposed for a single smart camera, this thesis also presents a cooperative surveillance system based on a cooperation scheme between fixed cameras and a mobile robot. The fixed cameras detect objects with background subtraction and locate them on a map with a homography transform. At the same time, the information about the target to track, including its position and appearance, is transmitted to the mobile robot. After a breadth-first search in a map represented as a Boolean array, the mobile robot finds the target in its view using a stochastic scheme with the given information; the robot then tracks the target and keeps it in view wherever he or she goes. This system addresses and resolves the dead-spot problem of typical surveillance systems that use only fixed cameras. Besides, the track initialization problem in typical tracking systems, i.e., how to decide which target of interest to track, is also resolved with the proposed cooperation scheme at the system level.
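The threshold decision problem for background subtraction can be made concrete with a toy model. The statistic below (mean plus k standard deviations of the frame-background difference) is a deliberately simple stand-in, chosen only for illustration; it is not the automatic threshold decision algorithm proposed in the thesis.

```python
import statistics

def auto_threshold(frame, background, k=2.0):
    """Global threshold from difference statistics: mean + k * stddev (illustrative)."""
    diffs = [abs(f - b) for fr, br in zip(frame, background) for f, b in zip(fr, br)]
    return statistics.mean(diffs) + k * statistics.pstdev(diffs)

def foreground_mask(frame, background, threshold):
    """Mark pixels whose absolute difference from the background exceeds the threshold."""
    return [[1 if abs(f - b) > threshold else 0 for f, b in zip(fr, br)]
            for fr, br in zip(frame, background)]
```

A fixed threshold fails as soon as the background becomes dynamic, which is precisely why the thesis derives the threshold from a separate mechanism rather than from the segmentation output itself.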
APA, Harvard, Vancouver, ISO, and other styles
34

Chen, Chieh-Li, and 陳潔立. "Hardware Architecture Design and Implementation of Image-Based Rendering Engine." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/67614083369674148466.

Full text
Abstract:
碩士
臺灣大學
電子工程學研究所
98
Currently, the trend in the development of multimedia display systems focuses on providing excellent image quality to users. The resolution of television has grown from standard definition (720×480) to quad full high definition (3840×2160), and the latest iPhone 4 provides the Retina display which, at 326 dpi, already exceeds the threshold beyond which human eyes can tell the difference. However, the viewing experience is still restricted by some limitations of current display systems. Current display systems only play the data they store, without any modification. This restrains the viewing experience: users can only watch multimedia contents without interacting with them or adding their own opinions. Thus, we think a customized display system should be developed, one that can respond to users' requirements and interact with users. To achieve this goal, we design a real-time interactive system, called the image-based rendering engine, which introduces hardware acceleration to meet the real-time requirement and can be integrated into current display systems, bringing more entertainment and providing more interaction and a more customized viewing experience to users. The proposed image-based rendering engine can support several existing image-based rendering algorithms, such as 2D panorama, concentric mosaics, and depth-image based rendering. It can also support a new interactive system, called Tennis Real Play, letting users interact with broadcast tennis video contents and play a game after they watch the Grand Slam tournaments. In order to overcome the typical hardware design challenges in accelerating rendering algorithms, such as high throughput requirements, high bandwidth requirements, programmability, and low cost, we employ reconfigurable architecture and hardware sharing techniques.
Besides that, we also introduce a folding technique, a cache mechanism, and FIFOs to optimize our hardware architecture. The proposed image-based rendering engine is implemented with TSMC 0.18 μm process technology. The area of the image-based rendering engine is around 89662 gate counts, with 499712 gate counts of memory. With the cache mechanism, the corresponding bandwidth is reduced by 82.3%. With the folding technique, the area is reduced by 33.8%. And with FIFOs, the total processing cycle count decreases by 27.4%. The proposed rendering engine can respond to users' instructions and renders 9 times faster than a CPU and 2 times faster than a GPU.
APA, Harvard, Vancouver, ISO, and other styles
35

Wu, Pei-Hsuan, and 吳佩軒. "Architecture Design and Implementation of Deep Neural Network Hardware Accelerators." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/2nq8p3.

Full text
Abstract:
碩士
國立中山大學
資訊工程學系研究所
107
Deep Neural Networks (DNN), widely used in computer vision applications, have superior performance in image classification and object detection. However, the huge amount of data movement and the computation complexity are two challenges when DNNs are used in embedded systems, where real-time processing and power consumption are two major design considerations. Hardware DNN accelerators are usually designed using FPGA or ASIC. In this thesis, we develop a memory access method and design a DNN hardware accelerator with fewer memory accesses and lower power consumption. Using a mixed input/output/reuse method, we design a DNN hardware accelerator with 32 processing elements (PEs) that accelerates the computation of the VGG16 convolutional layers. The accelerator achieves a maximum frequency of 515 MHz with an internal SRAM size of 280 KB using TSMC 40 nm process technology. The peak performance of the accelerator is 139 GOP/s, with better computation speed and power than Eyeriss [21].
APA, Harvard, Vancouver, ISO, and other styles
36

Chiu, Wen-Yu, and 邱文昱. "Algorithm and Architecture Design of Hardware-Oriented Video Embedded Compression." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/uv575k.

Full text
Abstract:
碩士
元智大學
電機工程學系甲組
107
Nowadays, people's pursuit of visual perception quality keeps pushing display resolution technology forward. Presently, the Full-HD 1080p (1920x1080) display has become a common requirement in the Digital TV market. As display technology changes with each passing day, resolutions have developed into quad full high definition (QFHD) and ultra high definition (UHD). Moreover, both the display resolution and the frame rate are growing constantly, which causes two design bottlenecks: 1) an enormous and complicated mathematical operation load, and 2) a tremendous external memory bandwidth requirement. In this study, an effective prediction algorithm and entropy coding system are utilized, so the low-complexity hardware design can save hardware cost and also reduce the mathematical operations. The lossless compression algorithm flow can be divided into two parts: 1) Prediction: a prediction algorithm decreases the image residual so that the residual can be encoded with fewer bits. 2) Entropy coding: the residual is encoded systematically, providing the information for the decoder to analyze. As a result, the effective prediction and entropy coding system saves 57% of the image data bandwidth.
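The prediction-plus-entropy-coding flow can be illustrated with a minimal sketch. The abstract does not specify the predictor or the code used, so the left-neighbor (DPCM) predictor and Rice code below are assumptions chosen for brevity:

```python
def residuals(row):
    # Left-neighbor (DPCM) prediction: small residuals need fewer bits.
    prev, out = 0, []
    for x in row:
        out.append(x - prev)
        prev = x
    return out

def zigzag(v):       # map signed residual to unsigned symbol
    return 2 * v if v >= 0 else -2 * v - 1

def unzigzag(n):
    return n // 2 if n % 2 == 0 else -(n + 1) // 2

def rice_encode(n, k=2):
    # Rice code: unary quotient, then k remainder bits.
    return "1" * (n >> k) + "0" + format(n & ((1 << k) - 1), f"0{k}b")

def rice_decode(bits, pos, k=2):
    q = 0
    while bits[pos] == "1":
        q, pos = q + 1, pos + 1
    r = int(bits[pos + 1:pos + 1 + k], 2)
    return (q << k) | r, pos + 1 + k

def encode(row):
    return "".join(rice_encode(zigzag(r)) for r in residuals(row))

def decode(bits, n):
    pos, prev, row = 0, 0, []
    for _ in range(n):
        sym, pos = rice_decode(bits, pos)
        prev += unzigzag(sym)
        row.append(prev)
    return row

row = [100, 102, 101, 105, 104]
assert decode(encode(row), len(row)) == row   # lossless round trip
```

Here the first sample costs a long unary run; a real design would bound the code length or transmit the first pixel raw.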
APA, Harvard, Vancouver, ISO, and other styles
37

Wang, Ying-Chi, and 王英琪. "CF Card Hardware Design Under a Micro-Controller Combining ASIC Architecture." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/38376198718597857037.

Full text
Abstract:
碩士
國立海洋大學
電機工程學系
90
The theme of this thesis is a hardware application architecture that combines a micro-controller with an application-specific chip. We introduce the CompactFlash Card, emphasizing the design of the buffer and external RAM and how data move among the buffer, the host, and the flash memory. We then introduce the MP3 decoder and use an FPGA to communicate between the micro-controller and the decoder.
APA, Harvard, Vancouver, ISO, and other styles
38

Tsai, Lian-Tsung, and 蔡連宗. "Design and Implementation of JPEG2000 Hardware Architecture and Digital Watermark System." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/84544916371638968856.

Full text
Abstract:
碩士
國立中央大學
電機工程研究所
92
A new still image standard, JPEG2000, supplies not only higher compression performance but also various functionalities. This thesis focuses on the analysis and architecture design for a JPEG2000 still image encoding system. We analyze this system with several experiments. Based on the experiment results, the bottleneck for JPEG2000, EBCOT, is found. In this regard, some strategies for decreasing the computation time of EBCOT are discussed. In order to improve the EBCOT algorithm, Clean-Up Pass Skipping method (CUPS) and Pass Predicting method (PP) are proposed. We verify the CUPS and PP methods by completed simulation on C environment and they can reduce the 65~69% clock cycles for EBCOT context modeling. We achieve the efficient hardware architecture for the JPEG2000 encoding system. For the architecture design of EBCOT context modeling, proposed speed-improved methods are included. The CUPS method only needs an accumulator to sum up the number of coefficient-bits in a bit-plane that have been coded in Pass1 and Pass2. The PP method requires extra combinational logic circuits and two predict tables to record the addresses when the Pass1 and Pass2 coding are needed. A few components can improve the speed efficiency. Due to the rapid development of the networking and communication, the distribution of digital data is faster and arbitrary. There more consumer product is produced and popular, such as DSC, DV ...etc. In order to protect copyright of the multimedia data, a data capturing, compressing and ownership declaring is done at the same time. In this paper, a watermarking system for embedding wavelet transform domain and Philips TriMedia TM-1300 implementation is presented. The watermarking system applied Toral Automorphism to build a watermarking system which it is suitable for JPEG2000 watermarking.
APA, Harvard, Vancouver, ISO, and other styles
39

Tai, Hung-Shou, and 戴宏碩. "The Hardware Architecture Design of Trilateral Noise Filter for Color Images." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/59667062748549662497.

Full text
Abstract:
碩士
國立臺灣師範大學
應用電子科技研究所
95
Removing noise while preserving and enhancing edges is one of the most fundamental operations of image/video processing. When taking pictures with digital cameras, it is frequently found that the color images are corrupted by miscellaneous noise, especially in images taken with high ISO values in low luminance. Hence, noise filtering is a necessary module in digital still cameras. The difficulty of designing a noise filter is that the filter also reduces the sharpness of the image. On the other hand, optical lens imperfections are usually equivalent to spatial low-pass filters and tend to result in blurred images. It is customary to apply an edge enhancement algorithm to improve sharpness, but this process usually increases the noise level as a by-product. Hence, an efficient noise filter is very important before edge enhancement. In this thesis, the efficiency of the trilateral filter and other popular filters is compared briefly, and the trilateral filter is implemented in an HDL for an image processing chip.
APA, Harvard, Vancouver, ISO, and other styles
40

Lee, Chuan-Yiu, and 李權祐. "Hardware Architecture Design and Implementation of Ray-Triangle Intersection with Bounding Volume Hierarchies." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/98797645439049633169.

Full text
Abstract:
碩士
國立臺灣大學
電子工程學研究所
95
Ray tracing is a simple yet powerful and general algorithm for accurately computing global light transport and rendering high quality images. While recent algorithmic improvements and optimized parallel software implementations have increased ray tracing performance to interactive levels, few efficient hardware solutions have been available because the traditional ray tracing algorithm is hardware-unfriendly. This thesis proposes a more hardware-friendly ray tracing algorithm and describes an architecture based on it. We also implement a first prototype chip for ray tracing with a standard cell based design flow. With the proposed algorithm, the on-chip SRAM usage of our design is reduced dramatically compared to previous architectures while retaining a similar computation amount. We also use multi-threading and folding techniques to increase hardware utilization and achieve maximum performance with minimum hardware resources. The external bandwidth is low enough to duplicate many identical units for parallel processing, which is achieved by a tiny cache with word-length analysis and a vertex sharing technique. The prototype chip is fabricated in TSMC 0.13 μm technology. The chip size is 1.697×1.7 mm^2. It is capable of 4.3 giga floating point operations per second.
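The abstract does not name the intersection test used; the Möller-Trumbore algorithm is the common choice for ray-triangle intersection, sketched here purely as an illustration of the core computation such hardware accelerates:

```python
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def ray_triangle(orig, d, v0, v1, v2, eps=1e-9):
    # Moller-Trumbore: solve orig + t*d = (1-u-v)*v0 + u*v1 + v*v2.
    # Returns the hit distance t, or None on a miss.
    e1 = tuple(v1[i] - v0[i] for i in range(3))
    e2 = tuple(v2[i] - v0[i] for i in range(3))
    p = cross(d, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                  # ray parallel to triangle plane
    inv = 1.0 / det
    s = tuple(orig[i] - v0[i] for i in range(3))
    u = dot(s, p) * inv              # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(d, q) * inv              # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None
```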
APA, Harvard, Vancouver, ISO, and other styles
41

Chen, Hong-Yuh, and 陳宏郁. "Algorithm and Hardware Architecture Design of Face Hallucination Using Eigen Patch." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/90152639931775714904.

Full text
Abstract:
碩士
國立臺灣大學
電子工程學研究所
102
In surveillance systems, recognizing the human face is always the principal target. Due to the low-quality sensors of surveillance cameras and the compression by video coding, captured facial images are usually low-resolution. In order to reach a better face recognition rate, a better resolution of these facial images is needed. Besides enhancing the sensor quality or raising the performance of video coding, face hallucination can be applied to enhance the resolution of the desired facial images. Face hallucination is a super resolution process targeting facial images. It can recover the related high-resolution image with rich details from a low-resolution facial image. Therefore, the goal of our work is to improve the resolution of low-resolution facial images; the corresponding hardware design is also provided. We propose a low complexity face hallucination algorithm, called eigen-patch, which provides high-resolution facial images with rich details and sharpness. Our eigen-patch algorithm combines eigen-transformation face hallucination with the structure of position-patch based face hallucination. The algorithm has two main contributions. The first is conducting the eigen-transformation at patch size, which raises the image quality and reduces the computational complexity without solving the least squares problem. The second contribution is the input image alignment skill. In the usual case, the input low-resolution image is not well-aligned, so the resulting high-resolution image suffers from artifacts and significant quality degradation. Based on the input low-resolution image, the input image alignment mechanism opens a search range over the database images in order to reach a better alignment. Experimental results show that the proposed face hallucination algorithm performs better than others. In the hardware architecture design, we also simplify the original eigen-patch algorithm. We shift the image alignment mechanism into an earlier position; as a result, we do not have to recover the facial images of all positions. Only the correctly aligned one is hallucinated. The new eigen-patch scheme reduces the system bandwidth. We also analyze the number of database images in order to reduce it, which further decreases the system bandwidth. Finally, our hardware implementation reaches 4-times hallucination of a 30×25 input image at 30 fps.
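The eigen-transformation idea underlying eigen-patch can be sketched with numpy: the LR patch is expressed as a linear combination of aligned LR training patches via PCA on the small Gram matrix, and the same combination weights are applied to the corresponding HR patches. The patch sizes and data here are hypothetical, and this is only a conceptual sketch, not the thesis's exact formulation:

```python
import numpy as np

def eigen_patch(lr_patch, lr_train, hr_train):
    # lr_train: (n, d_lr) aligned LR training patches (one per row);
    # hr_train: (n, d_hr) corresponding HR patches.
    mu_l, mu_h = lr_train.mean(0), hr_train.mean(0)
    L = (lr_train - mu_l).T                      # (d_lr, n) centered columns
    # PCA via the small n x n Gram matrix, so no d_lr x d_lr solve is needed.
    w, V = np.linalg.eigh(L.T @ L)
    keep = w > 1e-10 * w.max()
    w, V = w[keep], V[:, keep]
    # Weights c such that L @ c best matches the centered input patch.
    c = V @ ((V.T @ (L.T @ (lr_patch - mu_l))) / w)
    # Apply the same combination weights to the HR training patches.
    return mu_h + (hr_train - mu_h).T @ c

# Toy data: HR patches are an exact linear map of LR patches, so
# hallucinating a known LR patch must reproduce its known HR patch.
rng = np.random.default_rng(0)
lr_train = rng.standard_normal((5, 8))
hr_train = lr_train @ rng.standard_normal((8, 32))
hr = eigen_patch(lr_train[0], lr_train, hr_train)
```

Because the PCA runs on the n×n Gram matrix of the patch set, only small matrices are handled per patch, loosely mirroring why working at patch size keeps the complexity low.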
APA, Harvard, Vancouver, ISO, and other styles
42

Lin, Yi-Chun, and 林奕君. "Algorithm and Hardware Architecture Design of Super Resolution Targeting TV Scaler." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/70962576761638950697.

Full text
Abstract:
碩士
國立臺灣大學
電機工程學研究所
100
The goal of super resolution is to recover a high-resolution image with sharp edges and rich details from a low-resolution input image. Due to the increasing gap between the resolution of image sources and display devices, super resolution has become an essential technique in many applications. In this work, we focus on the application of the TV scaler. Due to the real-time requirement and low hardware-cost constraints, a conventional TV scaler can only employ basic interpolation techniques and thus introduces artifacts that degrade the viewing quality of the output sequences. Therefore, the goal of this work is to improve the performance of the TV scaler by adopting the super resolution technique. The corresponding hardware design is also provided. We propose a low complexity super resolution algorithm which can provide vivid output images with rich details and sharp edges. There are two main contributions. The first is the development of double interpolation up-sampling. Double interpolation quality evaluation can be used as a measurement of an interpolation operation. Using this double interpolation framework, a direction-adaptive upsampling algorithm is proposed to solve the zigzag artifact and enhance the quality of edges. The second contribution is the database-free texture synthesis technique. Based on the fractal property of natural images, it is possible to find proper high-resolution patches in the low-resolution input image itself. Therefore, the texture synthesis can be performed without a database to provide proper and rich details. The double interpolation framework for up-sampling and the reconstruction constraint for the final optimization, combined with the texture synthesis, form the whole super resolution algorithm. Experimental results show that the proposed super resolution algorithm performs better than other ones. For the VLSI hardware design, the target specification is set to a 1920x1080 frame size with a throughput of 60 frames per second.
The main contributions of the hardware architecture design are one-pass double interpolation, tile-based gradient descent, and partial-sum reuse texture synthesis. One-pass double interpolation and tile-based gradient descent lower the bandwidth and SRAM consumption, while partial-sum reuse texture synthesis reduces 76 percent of the computational costs. The hardware is implemented in Verilog-HDL and synthesized with the SYNOPSYS Design Compiler, using the TSMC 65nm cell library. The operating frequency is 240 MHz and the total gate count is 766K. We also verify the design with an FPGA. The demo platform is based on the Terasic DE4 development board with the Altera Stratix IV GX device. The FPGA demo system up-samples video with the proposed super resolution hardware at a frame size of 1920x1080 and a frame rate of 24 frames per second. The results show that our architecture is able to provide high quality output in real time while solving the zigzag and blurring problems caused by conventional scalers.
APA, Harvard, Vancouver, ISO, and other styles
43

Lai, Ue-Ln, and 賴譽仁. "Hardware Architecture Design and Implementation of Elliptic Curve Encryption/Decryption Algorithms." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/60556511931839409769.

Full text
Abstract:
碩士
國立臺灣科技大學
電子工程系
90
In this thesis, the VLSI architecture design and implementation of two 162-bit elliptic curve encryption/decryption chips are presented. One of them is based on the IEEE 1363-2000 standard, and the other is based on arithmetic operations over an extension field. Both chips perform field arithmetic on projective coordinates with an optimal normal-basis representation. To provide flexibility in interfacing with common microprocessors, the data bus width can be set to 8, 16, or 32 bits. The performance of the extension-field-based chip is superior to that of the standard-based chip at the same operating frequency in terms of bit rate, die size, and power consumption. The standard-based chip operates at 45 MHz with a bit rate of 44.1 kbps when realized on the Xilinx FPGA Virtex V400BG560, and at 125 MHz with a bit rate of 122.7 kbps when realized in the TSMC 0.35 um cell-based process; the resulting chip occupies a 2.713*2.713 mm^2 die area and consumes 133.98 mW. The extension-field-based chip operates at 48 MHz with a bit rate of 94.2 kbps when realized on the Xilinx FPGA Virtex V400BG560, and at 125 MHz with a bit rate of 245.4 kbps when realized in the TSMC 0.35 um cell-based process; the resulting chip occupies a 2.541*2.541 mm^2 die area and consumes 124.74 mW.
APA, Harvard, Vancouver, ISO, and other styles
44

Wang, Chien-Chung, and 王建中. "The Hardware Architecture Design for Cube-root and Color Space Conversion." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/93908490441359952745.

Full text
Abstract:
碩士
國立雲林科技大學
電子與資訊工程研究所碩士班
90
Various color spaces have been reported in an attempt to identify a uniform perceptual color space for color measurement and prediction purposes. The CIE (Commission Internationale de l'Eclairage) recommends one linear transformation to get the XYZ color space, followed by one non-linear transformation to get the L*a*b* color space. The design and implementation of a hardware architecture that performs real-time conversion from RGB color coordinates to standard CIE L*a*b* color coordinates is studied in this thesis. To calculate the cube root in the non-linear transformation, we propose approximate arithmetic algorithms and the corresponding hardware architecture to replace look-up tables. The accuracy of the color coordinate transform is simulated with the Matlab programming tool; Verilog HDL and the SYNOPSYS synthesis tool are then used to estimate hardware performance. Finally, the implemented cube-root architecture is faster than the LUT, and the presented design uses less combinational logic than the most recently published works for color space conversion.
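For reference, the non-linear XYZ-to-L*a*b* step containing the cube root can be sketched as follows (standard CIE 1976 formulas; the D65 reference white used here is an assumption, since the abstract does not state the white point):

```python
def xyz_to_lab(X, Y, Z, white=(95.047, 100.0, 108.883)):
    # CIE 1976 L*a*b*; the cube root below is the non-linear step the
    # thesis approximates in hardware (D65 reference white assumed).
    d = 6.0 / 29.0
    def f(t):
        # Linear segment near zero avoids the cube root's infinite slope.
        return t ** (1.0 / 3.0) if t > d ** 3 else t / (3 * d * d) + 4.0 / 29.0
    fx, fy, fz = (f(v / w) for v, w in zip((X, Y, Z), white))
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

L, a, b = xyz_to_lab(95.047, 100.0, 108.883)   # reference white -> L*=100
```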
APA, Harvard, Vancouver, ISO, and other styles
45

Lan, Wei, and 藍瑋. "High-throughput Hardware Architecture Design and Realization of RaptorQ Code Decoder." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/52697808591399871712.

Full text
Abstract:
碩士
國立臺灣大學
電子工程學研究所
103
With advances in technology, smartphones provide many more services, multimedia streaming being the one people use most. However, transmission latency is of utmost importance to the quality of the viewing/listening experience. Unfortunately, wireless transmission often suffers channel fading that renders robust transmission almost impossible without an effective error correction mechanism. Conventional protocols generally retransmit the erased coded sequence until the receiver receives it correctly. Fountain codes, on the other hand, keep the partially decoded information and continue to receive and decode coded symbols until the whole information sequence can be recovered. Such rateless codes have drawn a great deal of attention and have been applied in many scenarios. RaptorQ is the latest generation of Raptor codes. Compared with the previous version, RaptorQ provides higher flexibility and a lower decoding failure probability; however, the decoding procedure is also much more complicated. Conventionally, decoding RaptorQ codes requires inverting a huge matrix. Instead of such costly matrix inversion, we propose to calculate the inverse of another matrix whose rows differ only slightly from the one that needs to be decoded, so most computations are shifted offline. Next, previous decoders usually decode the intermediate symbols while inverting the matrix and then recover the information sequence from them; with the pre-calculated inverse, the proposed algorithm combines intermediate-symbol decoding with information-sequence recovery to reduce the complexity. Last, due to the systematic property of the RaptorQ code, we propose a new method that avoids much unnecessary computation when decoding the received information sequence. Finally, the proposed decoding algorithm is not only simulated in software but also verified on an FPGA board to prove its feasibility.
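To illustrate the precomputed-inverse idea in its simplest form, the sketch below inverts a small decoding matrix over GF(2) by Gauss-Jordan elimination and then recovers the symbols with a cheap matrix-vector product. (Actual RaptorQ decoding mixes GF(2) and GF(256) operations; this binary-only toy is a simplification.)

```python
def gf2_inverse(A):
    # Gauss-Jordan elimination over GF(2) on [A | I]; XOR serves as both
    # addition and subtraction, so no division step is needed.
    n = len(A)
    M = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col])  # raises if singular
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                M[r] = [a ^ b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def gf2_matvec(A, x):
    return [sum(a & b for a, b in zip(row, x)) & 1 for row in A]

# Precompute the inverse once (offline); every decode is then one product.
A = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
A_inv = gf2_inverse(A)
received = gf2_matvec(A, [1, 0, 1])          # "encoded" symbols
assert gf2_matvec(A_inv, received) == [1, 0, 1]
```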
APA, Harvard, Vancouver, ISO, and other styles
46

Chen, Chun-Ting, and 陳俊廷. "Intelligent Brain-inspired Human-centric Recognition Algorithm and its Hardware Architecture Design." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/94144528383791990153.

Full text
Abstract:
碩士
國立臺灣大學
電子工程學研究所
100
As technologies continue to evolve, our computers have more and more computing capacity, which drives many intelligent applications to emerge, like smile shutter, automatic surveillance systems, smart cars, and smart homes. These smart machines can sense their surroundings like humans and provide safety, convenience, and efficiency. The intelligent applications in this thesis are called human-centric applications because they are based on human needs. In this thesis, we focus on human-centric recognition applications, such as face recognition, object recognition, and action recognition. On the other hand, since we are in an era where radio-equipped computers dominate, the amount of multimedia data is growing extremely fast. YouTube reported that more than 35 hours of video were being uploaded to the video-sharing site every minute in 2010. At this rate, we need to handle over one zettabyte of information annually. Therefore, to support various intelligent applications and manage this huge amount of data, we need an efficient and scalable hardware platform to provide the required computation capability. The ultimate goal is to approach human-like intelligence. For building an intelligent machine, mimicking the structures and functions of the visual cortex has always been a major approach to implementing a human-like intelligent visual system. In this thesis, we started by exploring the brain's computing style and architecture, then designed a brain-like computing system for visual recognition, which can be easily scaled with the amount of resources for future intelligent applications. The whole system design flow starts from the Neocortical Computing (NC) model design, continues to the Neocortical Computing system design, and ends at the real-time human-centric NC architecture based on an FPGA system. The NC model provides the functionality required by intelligent human-centric applications. The NC architecture is an efficient and scalable hardware platform optimized for the NC model.
The FPGA system verifies the NC system by transforming the NC model into the specific memory content that the platform can interpret. In this thesis, the main system design strategy is to provide application diversity and efficiency as human brains do. First, we analyze the current NC models and find that they lack temporal-domain integration and thus can hardly extend object recognition to time-relevant action recognition. To solve this problem, inspired by the recurrent information transmission nature of the human brain and by neural network research, we propose a recurrent computing kernel to integrate temporal-domain action feature information efficiently. We thereby construct an efficient dimension-lifting Reservoir Kernel which exhibits the property of temporal memory and thus can integrate the temporal information provided by the HMAX network and boost its recognition performance. Experimental results show that it substantially outperforms the state-of-the-art HMM-SVM method. Second, for the NC system design, we analyze the computation of the NC model and identify its main problem: massive data access, which results in power inefficiency, redundant external bandwidth usage, slow response, and no communication scalability. On current computing systems, this problem makes the NC system memory-bounded. To address this issue, inspired by the information forwarding scheme of neurons, we propose a Push-based Dataflow (Push-DF) structure that uses push-based processing for external memory access reduction and efficient sparse data forwarding. Experimental results show that Push-DF in a many-core architecture achieves lower latency, power consumption, and external bandwidth than RISC and GPU. Push-based processing greatly reduces the massive external memory accesses so that our NC system can break the bottleneck of the traditional memory-bounded system.
This important feature provides the communication scalability of our NC system, which meets the design goal of a scalable brain-mimicking hardware platform. Finally, we utilized the proposed Push-DF structure to design the NC system and implemented an 8-core NCSoC in an FPGA system. Our final implementation of NCSoC takes 0.179 seconds to recognize a 100×100 image. In conclusion, NCSoC supports the NC model for various intelligent recognition tasks and provides better performance, efficiency, and scalability than current computing platforms. As a result, it has the potential to support various intelligent applications and manage huge amounts of multimedia data for future applications.
APA, Harvard, Vancouver, ISO, and other styles
47

Tzu-YinKuo and 郭姿吟. "High Performance Hardware Architecture Design of Homomorphic AES for Cloud Computing." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/5277h4.

Full text
Abstract:
碩士
國立成功大學
電機工程學系
105
Fully homomorphic encryption (FHE) is an emerging technique that allows encrypted data to be processed directly in untrusted servers, ensuring data privacy. Despite the importance of this FHE feature for cloud computing applications, the underlying algorithms still have extremely high computation complexity and implementation cost. Homomorphic evaluation of the advanced encryption standard (AES) can be regarded as a complex function, and existing homomorphic AES implementations still demand a significant amount of computational time. The most expensive operation in homomorphic AES is key switching after homomorphic multiplication and automorphism operations. To improve the performance of homomorphic AES by reducing homomorphic multiplication and automorphism operations in critical computational paths, this thesis proposes a parallel SubByte and MixColumn/ShiftRow algorithm that relaxes the underlying data dependency. Compared to conventional homomorphic AES, the proposed one reduces 3 key switching operations in one round of homomorphic AES, assuming parallel processing. Moreover, high-performance hardware architectures of homomorphic AES are presented for different security levels. Performance evaluations show that the proposed design outperforms the related works in terms of computational time and performance.
APA, Harvard, Vancouver, ISO, and other styles
48

Liu, Yue-qu, and 劉岳衢. "Reconfigurable Design and Implementation of Modular-Construction Based FFT Hardware Architecture." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/m9kw97.

Full text
Abstract:
碩士
國立中山大學
電機工程學系研究所
106
The 3GPP-LTE communication standard defines many Fast Fourier Transform (FFT) sizes. We therefore design a high-performance FFT architecture which makes good use of modular construction and reconfigurable design to achieve easy connection between every two stages, so the design can meet various requirements. In the 4-stage module, it supports 48 modes which perform 2-2187 FFT points, covering the 32 modes defined in the 3GPP-LTE communication standard. Each module contains two parts. (1) Reconfigurable Computing Kernel (RC-CK): we employ radix-3^2 and radix-2^3 bases and suitably utilize the hardware reuse property. Without extra hardware resources (e.g., multipliers or adders), it can execute six types of FFT kernel operations of different radices. (2) Reconfigurable First-in First-out (RC-FIFO): we develop a highly efficient design method for supporting many FFT point counts. The FIFO plan is easily managed and suitably located to maximize the hardware storage usage. In addition, we propose a Section-based Twiddle Factor Generator (STFG) to support multiple FFT point counts. It can reduce the area cost and effectively satisfy any communication system. In the chip implementation, the core area is only 0.318 mm^2 using TSMC 40-nm CMOS technology. The maximum operating frequency is 350 MHz and the average power dissipation is 44.2 mW. Compared with other state-of-the-art designs, our proposed work has the best performance while supporting many FFT points. Most importantly, the proposed hardware architecture has better scalability: in the future, we can support yet-undefined specifications of the 5th generation wireless system simply by adding or removing modules.
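FFT sizes of the form 2^a·3^b (2187 = 3^7) can be handled by a mixed-radix Cooley-Tukey decomposition. A minimal recursive software sketch of that decomposition follows; it uses plain radix-2 and radix-3 splits, not the radix-2^3/3^2 hardware kernels of the chip:

```python
import cmath

def fft_mixed(x):
    # Recursive mixed-radix Cooley-Tukey FFT for lengths of the form
    # 2^a * 3^b, mirroring the two radix families the design supports.
    n = len(x)
    if n == 1:
        return list(x)
    for r in (2, 3):
        if n % r == 0:
            break
    else:
        raise ValueError("length must be of the form 2^a * 3^b")
    m = n // r
    # Decimation in time: split into r interleaved sub-sequences.
    subs = [fft_mixed(x[j::r]) for j in range(r)]
    # Recombine with twiddle factors exp(-2*pi*i*j*k/n).
    return [sum(subs[j][k % m] * cmath.exp(-2j * cmath.pi * j * k / n)
                for j in range(r))
            for k in range(n)]
```

A quick sanity check: the FFT of a unit impulse is flat, and the FFT of a constant sequence concentrates everything in bin 0.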
APA, Harvard, Vancouver, ISO, and other styles
49

Wang, Ching-Shun, and 王靖順. "Reconfigurable Hardware Architecture Design and Implementation for AI Deep Learning Accelerator." Thesis, 2019. http://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/login?o=dnclcdr&s=id=%22107NCHU5441107%22.&searchmode=basic.

Full text
Abstract:
碩士
國立中興大學
電機工程學系所
107
This thesis proposes a Convolutional Neural Network hardware accelerator architecture with 288 PEs that achieves 230.4 GOPS at 400 MHz. To verify the hardware function, the hardware is implemented at 100 MHz in units of 72 PEs owing to the limitation of FPGA resources. The proposed CNN hardware accelerator is a layer-based architecture whose layer parameters can be reconfigured to suit different CNN architectures. The proposed architecture operates on three rows of the input feature map at a time to generate one row of the output feature map. It uses 322 KB of on-chip memory to store the input feature map, bias, kernel, and output feature map, improving the efficiency of data reuse and reducing bandwidth utilization. In this design, the max-pooling layer after the convolution layer can be combined with it to reduce DRAM bandwidth.
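The quoted peak rate is consistent with each of the 288 PEs performing one multiply-accumulate (2 ops) per cycle: 288 × 2 × 400 MHz = 230.4 GOPS. The row-based dataflow itself can be sketched in software; the 3×3 kernel, stride 1, and lack of padding below are assumptions for illustration:

```python
def conv_row(rows3, kernel3x3, bias=0):
    # One step of the row-based dataflow: three input-feature-map rows
    # in, one output-feature-map row out (3x3 kernel, stride 1, no pad).
    w = len(rows3[0])
    out = []
    for x in range(w - 2):
        acc = bias
        for ky in range(3):
            for kx in range(3):
                acc += rows3[ky][x + kx] * kernel3x3[ky][kx]
        out.append(acc)
    return out
```

Sliding this over successive row triples produces the full output feature map; a fused max-pooling stage would then consume output rows in pairs before they ever reach DRAM.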
50

Shiau, Wen-Shiuh, and 蕭紋旭. "Hardware Design of a Shared-memory Architecture for ATM/Ethernet Switching Subsystems." Thesis, 1997. http://ndltd.ncl.edu.tw/handle/64517773567389631715.

Full text
Abstract:
Master's thesis
National Chung Cheng University (國立中正大學)
Department of Electrical Engineering
1996 (ROC year 85)
Design of ATM/Ethernet switching systems has received much attention from R&D organizations around the world. Such a switching system provides a seamless transport platform for interworking traditional LAN environments with the ATM backbone. In this thesis, we present an effort to realize a switching subsystem that supports bridging and switching capabilities for transporting data between Ethernet modules and ATM modules (including ATM/ATM and Ethernet/Ethernet). In this design, we adopt a shared-memory structure for information transfer between Ethernet modules and ATM modules. To support the shared-memory structure, we design a shared-memory manager that handles all control and management functions incurred when transferring data from one module to another. We first describe the related functional blocks of the proposed subsystem and define the associated interfaces and structures for each block. We define control tables that inform modules about data characteristics when transferring data, and make use of translation tables to support the bridging capabilities. We also design a CPU interface to support PVC configuration and SVC signaling-message transfer.
Based on the designed architecture, we describe the logic flow of each functional block and explain their associated FSMs. We conduct the hardware implementation through a top-down methodology, which starts with behavioral synthesis and logic simulation. We simulate and verify the proposed subsystem through the following processes: block simulation, FSM simulation, module simulation, and system simulation. We present the related testing process and procedures for each module and perform an integrated test of the subsystem. Based on our calculation, the throughput of the designed subsystem can reach up to 1056 Mbps. To check whether the system performs correctly, we analyze worst-case delay performance for different logic flows. From our simulation, the subsystem can operate at 33 MHz.
Finally, we give some remarks about the development effort of the switching subsystem, including improvement of the hardware design and support for other functionalities.
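The core forwarding idea (translation tables mapping destinations to output modules, with frames queued in a shared buffer pool) can be sketched as follows. All class and method names here are illustrative, not from the thesis:

```python
from collections import deque

class SharedMemorySwitch:
    """Minimal sketch of shared-memory forwarding: a translation
    table maps a destination address to an output module, and
    frames wait in per-output FIFOs drawn from a shared pool."""

    def __init__(self):
        self.translation = {}   # dest address -> output port (bridging table)
        self.queues = {}        # output port -> FIFO of queued frames

    def learn(self, addr, port):
        """Install a bridging entry for a destination address."""
        self.translation[addr] = port

    def enqueue(self, dest, frame):
        """Look up the output port and queue the frame there."""
        port = self.translation[dest]
        self.queues.setdefault(port, deque()).append(frame)

    def dequeue(self, port):
        """Drain one frame from an output port's FIFO."""
        return self.queues[port].popleft()

# Back-of-the-envelope check of the quoted throughput, assuming a
# 32-bit shared-memory path clocked at 33 MHz:
#   33e6 cycles/s * 32 bits = 1056 Mbps, matching the abstract.
```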