Dissertations on the topic "Mechatronics hardware design and architecture"
Browse the top 50 dissertations for research on the topic "Mechatronics hardware design and architecture".
Basic, Goran. "Hardware-in-the-loop simulation of mechanical loads for mechatronics system design." Thesis, University of Ottawa (Canada), 2003. http://hdl.handle.net/10393/26323.
Yazdanpanah, Fahimeh. "Hardware design of task superscalar architecture." Doctoral thesis, Universitat Politècnica de Catalunya, 2014. http://hdl.handle.net/10803/277376.
Exploiting concurrency to achieve better performance is an important and difficult challenge for high-performance systems. Although the theory is simple, in many cases the complexity of traditional parallel programming models prevents the programmer from obtaining good performance. Different task-partitioning granularities have been proposed to better exploit the concurrency implicit in applications. In this respect, several software systems for dynamic task management use dataflow execution principles to improve task-level parallelism and outperform static scheduling systems. These models schedule execution dynamically and use tasks, rather than instructions, as the basic unit of work, which relieves the programmer of having to synchronize tasks explicitly in the program. Although these programming models share many similarities with well-known out-of-order processors (such as dynamic dependency analysis and dataflow execution), they rely on dynamic dependency analysis performed in software. That analysis is inherently slow and limits scalability when there is a large number of small tasks. These problems grow exponentially with the number of available cores. To keep all cores busy and speed up the overall performance of the application, it must be partitioned into many small tasks. Managing those tasks (that is, creating them and distributing them among the cores) in software introduces overheads and therefore becomes inefficient as the number of cores increases. In contrast, a hardware task-scheduling system can achieve better performance because it requires lower latency for task management.
Task Superscalar (TSS) is a hybrid dataflow/von Neumann architecture that exploits the task-level parallelism of programs. TSS combines the effectiveness of out-of-order processors with the task abstraction, and thus provides a unified management layer for CMPs that manages the cores as functional units. Prior to the work of this thesis, Task Superscalar had been implemented in software, with limited parallelism and high memory consumption due to the inherent limitations of a software implementation. This thesis designs a hardware implementation of the Task Superscalar architecture that can handle many small tasks and can be integrated into a future high-performance computer. The main contributions of this thesis are therefore: (1) the design of an operational flow of the Task Superscalar architecture, adapted and improved for hardware implementation; (2) an HDL prototype of that flow for exploring the latencies associated with the hardware implementation; (3) a cycle-accurate simulator of the hardware design based on the results obtained from the hardware implementation; (4) a complete design-space exploration of the hardware components (number and quantity of modules, memory sizes, etc.) for different computer sizes (that is, for different numbers of cores); (5) a comparison with the current software implementation of the same programming model using real applications; and (6) an exploration of the hardware resource utilization of the different selected configurations.
Persson, Robert. "PPS5000 Thruster Emulator Architecture Development & Hardware Design." Thesis, Luleå tekniska universitet, Rymdteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-72827.
Mahmud, Akib. "Hardware in the Loop (HIL) Rig Design and Electrical Architecture." Thesis, Uppsala universitet, Institutionen för teknikvetenskaper, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-324661.
Davis, Jesse H. Z. (Jesse Harper Zehring) 1980. "Hardware & software architecture for multi-level unmanned autonomous vehicle design." Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/16968.
Includes bibliographical references (p. 95-96).
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
The theory, simulation, design, and construction of a radically new type of unmanned aerial vehicle (UAV) are discussed. The vehicle architecture is based on a commercially available non-autonomous flyer called the Vectron Blackhawk Flying Saucer. Due to its full body rotation, the craft is more inherently gyroscopically stable than other more common types of UAVs. This morphology was chosen because it has never before been made autonomous, so the theory, simulation, design, and construction were all done from fundamental principles as an example of original multi-level autonomous development.
by Jesse H.Z. Davis.
M.Eng.
Pajayakrit, A. "VLSI architecture and design for the Fermat Number Transform implementation." Thesis, University of Newcastle Upon Tyne, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.379767.
Patel, Krutartha Computer Science & Engineering Faculty of Engineering UNSW. "Hardware-software design methods for security and reliability of MPSoCs." Awarded by: University of New South Wales. Computer Science & Engineering, 2009. http://handle.unsw.edu.au/1959.4/44854.
Liang, Cao. "Hardware/Software Co-Design Architecture and Implementations of MIMO Decoders on FPGA." ScholarWorks@UNO, 2006. http://scholarworks.uno.edu/td/416.
Moreira, Francis Birck. "Profiling and reducing micro-architecture bottlenecks at the hardware level." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2014. http://hdl.handle.net/10183/103977.
Most mechanisms in current superscalar processors use instruction granularity information for speculation, such as branch predictors or prefetchers. However, many of these characteristics can be obtained at the basic block level, increasing the amount of code that can be covered while requiring less space to store the data. Moreover, the code can be profiled more accurately and provide a higher variety of information by analyzing different instruction types inside a block. Because of these advantages, block-level analysis can offer more opportunities for mechanisms that use this information. For example, it is possible to integrate information about branch prediction and memory accesses to provide precise information for speculative mechanisms, increasing accuracy and performance. We propose BLAP, an online mechanism that profiles bottlenecks at the microarchitectural level, such as delinquent memory loads, hard-to-predict branches and contention for functional units. BLAP works at the basic block level, providing information that can be used to reduce the impact of these bottlenecks. A prefetch dropping mechanism and a memory controller policy were developed to use the profiled information provided by BLAP. Together, these mechanisms are able to improve performance by up to 17.39% (3.90% on average). Our technique showed average gains of 13.14% when evaluated under high memory pressure due to highly aggressive prefetch.
Woods, Walt. "The Design of a Simple, Spiking Sparse Coding Algorithm for Memristive Hardware." PDXScholar, 2016. http://pdxscholar.library.pdx.edu/open_access_etds/2721.
Zhang, Yuanzhi. "Algorithms and Hardware Co-Design of HEVC Intra Encoders." OpenSIUC, 2019. https://opensiuc.lib.siu.edu/dissertations/1769.
Robinson, Kylan Thomas. "An integrated development environment for the design and simulation of medium-grain reconfigurable hardware." Pullman, Wash. : Washington State University, 2010. http://www.dissertations.wsu.edu/Thesis/Spring2010/k_robinson_041510.pdf.
Title from PDF title page (viewed on June 22, 2010). "School of Electrical Engineering and Computer Science." Includes bibliographical references (p. 75-76).
Moustakas, Evangelos. "Design and simulation of a primitive RISC architecture using VHDL /." Online version of thesis, 1991. http://hdl.handle.net/1850/11229.
Haspel, Patrick R. "Researching methods for efficient hardware specification, design and implementation of a next generation communication architecture." [S.l.] : [s.n.], 2007. http://deposit.ddb.de/cgi-bin/dokserv?idn=984774084.
Mikulcak, Marcus. "Development of a Predictable Hardware Architecture Template and Integration into an Automated System Design Flow." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-124497.
Vasudevan, Siddarth. "Design and Development of a CubeSat Hardware Architecture with COTS MPSoC using Radiation Mitigation Techniques." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-285577.
CubeSat missions need components that tolerate the radiation in space. The hardware components must be reliable, and on-board functionality must not be compromised during the mission. At the same time, the cost of the hardware and its development should not be high. This thesis therefore discusses the design and development of a CubeSat architecture using a COTS (Commercial Off-The-Shelf) MPSoC (Multi-Processor System-on-Chip). The architecture uses an affordable radiation-hardened (Rad-Hard) Micro-Controller Unit (MCU) as a Supervisor for the MPSoC and also uses several techniques to mitigate the effects of radiation: latch-up protection circuits against Single Event Latch-Ups (SELs), readback scrubbing for Non-Volatile Memories (NVMs) such as NOR Flash, configuration memory scrubbing for the FPGAs in the MPSoC to protect them against Single-Event Upsets (SEUs), and reliable communication using CRC and the Space Packet Protocol. Apart from these functions, the Supervisor performs tasks such as a Watchdog that monitors whether the applications running in the MPSoC are still alive, data logging, and Over-the-Air updates of the software/firmware. The thesis work implements functions such as communication, readback scrubbing of the memory, configuration memory scrubbing using the SEM-IP, the Watchdog, and software/firmware updates. Execution times for these functions are presented for the application running in the Supervisor. For the configuration memory scrubbing implemented in the programmable logic of the FPGA, area and latency are reported.
Haspel, Patrick R. "Researching methods for efficient hardware specification, design and implementation of a next generation communication architecture." Mannheim : Universität, 2006. http://madoc.bib.uni-mannheim.de/madoc/volltexte/2007/1416/.
Cornevaux-Juignet, Franck. "Hardware and software co-design toward flexible terabits per second traffic processing." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2018. http://www.theses.fr/2018IMTA0081/document.
The reliability and the security of communication networks require efficient components to finely analyze the traffic of data. Service diversification and throughput increase force network operators to constantly improve analysis systems in order to handle throughputs of hundreds, even thousands, of gigabits per second. Commonly used solutions are software-oriented solutions that offer flexibility and accessibility welcome to network operators, but they can no longer meet these strong constraints in many critical cases. This thesis studies architectural solutions based on programmable chips like Field-Programmable Gate Arrays (FPGAs), combining computation power and processing flexibility. Boards equipped with such chips are integrated into a common software/hardware processing flow in order to balance the shortcomings of each element. Network components developed with this innovative approach ensure exhaustive processing of packets transmitted on physical links while keeping the flexibility of usual software solutions, which was never encountered in the previous state of the art. This approach is validated by the design and implementation of a flexible packet processing architecture on FPGA. It is able to process any packet type at the cost of a slight resource overconsumption. It is moreover fully customizable from the software part. With the proposed solution, network engineers can transparently use the processing power of a hardware accelerator without the need for prior knowledge of digital circuit design.
Passarella, Alice. "Hardware Design and Firmware Architecture of a Multi-Sensor Platform for Monitoring of Workpieces and Machines." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.
Niu, Xinwei. "System-on-a-Chip (SoC) based Hardware Acceleration in Register Transfer Level (RTL) Design." FIU Digital Commons, 2012. http://digitalcommons.fiu.edu/etd/888.
Schultek, Brian Robert. "Design and Implementation of the Heterogeneous Computing Device Management Architecture." University of Dayton / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1417801414.
Luo, Li-Ping, and 羅立平. "Extensible Sorting Hardware Architecture Design." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/86310261698659358923.
中原大學
電子工程研究所
101
In this thesis, we propose an extensible sorting hardware architecture. By analyzing the repetitive structure of the odd-even transposition sort, we design a basic sorting cell in a hardware description language. Several basic sorting cells can be combined to build an extensible sorting circuit that matches the required number of input data items. The extensible sorting circuit can be applied in a FlexRay communication controller to handle varying numbers of sorting inputs. FlexRay is a vehicle network communication specification that provides high speed, time-triggered operation, and fault tolerance. In this thesis, we also implement the FlexRay communication controller circuit in the Verilog hardware description language. The controller needs a sorting circuit to sort the timing table data in order to correct the global time. The basic sorting cell can be reused conveniently in other applications. The sorting circuit saves energy by turning off unused modules. Finally, we verify the communication controller by simulating and synthesizing the circuit on an FPGA, and we carry out a field trial to confirm that the communication controller and the extensible sorting circuit work correctly.
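As an illustration of the odd-even transposition sort named in this abstract, the following is a minimal software sketch, not the thesis's Verilog design. The function names `compare_exchange` and `odd_even_transposition_sort` are hypothetical; `compare_exchange` stands in for one basic sorting cell, and each phase of the outer loop corresponds to one row of cells that could operate in parallel in hardware.

```python
def compare_exchange(a, b):
    """Model of one basic sorting cell: return the pair in ascending order."""
    return (a, b) if a <= b else (b, a)


def odd_even_transposition_sort(values):
    """Sort a sequence with n alternating even/odd compare-exchange phases."""
    data = list(values)
    n = len(data)
    for phase in range(n):
        # Even phases compare pairs (0,1), (2,3), ...; odd phases (1,2), (3,4), ...
        start = phase % 2
        for i in range(start, n - 1, 2):
            data[i], data[i + 1] = compare_exchange(data[i], data[i + 1])
    return data
```

Because every phase touches disjoint pairs, the same cell can be replicated to handle more inputs, which is presumably what makes the structure attractive for an extensible circuit.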
Wu, Tung-Yang, and 吳東陽. "Color Constancy: Algorithm and Hardware Architecture Design." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/71628149939660026643.
國立臺灣大學
電子工程學研究所
96
Humans are able to recognize the color of objects independently of the light source, an ability called color constancy. In a digital still camera, a sensor measures the reflected light, and the measured color at each pixel varies with the color of the illuminant. The resulting colors may therefore differ from those perceived by users. Many algorithms have been developed to solve the color constancy problem, which is sometimes also called auto white balance. Since digital cameras and camera-equipped mobile phones have become more and more popular in recent years, the selection of color constancy algorithms for real-time system implementation is an important issue. In this thesis, we first provide a comprehensive introduction to the field of color constancy and describe the major color constancy algorithms. The performance of these algorithms is then evaluated, and the hardware cost of some algorithms is analyzed with a proposed system framework. Furthermore, based on the analysis results, we propose a new algorithm that takes advantage of the existing Gamut Mapping and modified Gray World algorithms. The Gamut Mapping algorithm is employed when the number of recognized illuminants is small enough, and the modified Gray World algorithm is employed otherwise. In comparisons with other color constancy methods on large data sets of images recording objects under different light sources, the experiments show that the proposed color constancy algorithm achieves the best performance with acceptable hardware cost.
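For readers unfamiliar with the Gray World idea mentioned in this abstract, here is a minimal sketch of the classic textbook version, which assumes the average scene color is gray and scales each channel accordingly. It is not the thesis's modified Gray World algorithm, whose details the abstract does not give; the function name `gray_world` and the list-of-RGB-tuples image format are assumptions made for illustration.

```python
def gray_world(pixels):
    """Classic Gray World white balance on a list of 8-bit (R, G, B) tuples."""
    if not pixels:
        return []
    n = len(pixels)
    # Per-channel averages over the whole image.
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    # Gray World assumption: all three channel averages should equal this value.
    gray = sum(means) / 3.0
    gains = [gray / m if m > 0 else 1.0 for m in means]
    # Apply the gains and clip to the 8-bit range.
    return [
        tuple(min(255, round(p[c] * gains[c])) for c in range(3))
        for p in pixels
    ]
```

For example, `gray_world([(200, 100, 50), (100, 50, 25)])` removes the reddish cast and returns `[(117, 117, 117), (58, 58, 58)]`.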
Wu, Tung-Yang. "Color Constancy: Algorithm and Hardware Architecture Design." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0001-2907200814221300.
HUANG, YU-NAN, and 黃育楠. "Hardware architecture design for adaptive predictive line search." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/31697894806495984204.
國立臺灣科技大學
電子工程系
93
Motion estimation is a key technique in most video compression algorithms. To reduce the extremely high complexity of the “Full Search” approach, many fast motion estimation algorithms have been proposed; “Logarithm Search”, “Three-Step Search”, and “Diamond Search” are among the best known. In this thesis, we introduce “Predictive Line Search”, a method well suited to hardware implementation, and we also present an improved method, “Motion Adaptive Search”. By applying these two methods, a 40% to 50% improvement in speed is gained while yielding nearly the same quality measured in PSNR. Combining the two methods, this thesis introduces a pipelined hardware architecture whose properties speed up the motion estimation computation. In addition, because “Predictive Line Search” has a regular search pattern, data reuse can be applied and less memory bandwidth is needed, which makes it more appropriate for hardware implementation and yields better speed.
Tsai, Fang-Hsu, and 蔡芳旭. "High efficiency image scaling and hardware architecture design." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/20761384613354447197.
元智大學
電機工程學系
105
Image scaling is a technique widely used in electronic products with display devices, such as digital cameras, smart phones, and tablets, and in medical applications such as endoscopy. With the trend toward high-resolution displays, the time between display generations is becoming shorter and shorter. The mainstream display resolution for LCD/LED TVs has advanced from Full HD (1,920 x 1,080) to 4K UHD (3,840 x 2,160), and 8K UHD (7,680 x 4,320) is the next advanced display resolution. However, Full HD is still the mainstream for display content at present, which means the resolution gap between display devices and images will grow in the future. Throughput will therefore become a critical requirement. The maximum throughput that past works can achieve is only 200 Mpixels/sec, which is not sufficient for 4K UHD applications (the throughput requirement is 250 Mpixels/sec). This thesis therefore proposes a novel image scaling method that not only achieves a maximum of 285 Mpixels/sec but also has higher hardware efficiency and better image quality.
李佳勳. "Hardware Architecture Design of Adaptive Equalizer and TCM Decoder." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/rav63b.
國立交通大學
電信工程系所
93
The Single-pair High Speed Digital Subscriber Loop (SHDSL) is a new-generation symmetric DSL technique that can supply subscribers with up to 2.3 Mbps of symmetric downlink and uplink data rate on a long loop. However, InterSymbol Interference (ISI) is severe, especially during transmission over a long loop, and normal data transmission is impossible unless the ISI problem is properly handled. To ensure that SHDSL transceivers provide full-rate transmission, a powerful adaptive equalizer is needed to ease the ISI problem. The decision feedback equalizer is the equalizer most often used to solve the ISI problem. However, it suffers from error propagation, which degrades system performance. To improve performance, joint equalization and channel decoding is necessary. Nevertheless, combining the trellis decoder with the decision feedback equalizer results in high-complexity hardware. A powerful equalization scheme, the Tomlinson-Harashima precoder (THP) system, was proposed to solve this problem. With the precoding technique, joint equalization and channel decoding can be accomplished by cascading a linear equalizer and a TCM decoder for channel coding. In this thesis, starting from algorithm design and computer simulation, we design the THP system and TCM decoder hardware architectures according to the G.SHDSL recommendation. The resulting hardware can achieve the maximum 2.3 Mbps data rate with a 50 MHz operating clock. The hardware was verified on an FPGA development board.
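The core idea of Tomlinson-Harashima precoding can be shown with a small numerical sketch. This is a textbook illustration under strong assumptions, not the thesis's hardware design: the channel is noiseless and exactly known, its response is monic (a leading tap of 1 followed by the post-cursor taps in `post_taps`), and the TCM decoder and adaptive equalizer are left out. All function and parameter names are hypothetical.

```python
def modulo(v, m):
    """Fold v into the interval [-m, m)."""
    return (v + m) % (2 * m) - m


def post_cursor_isi(x, k, post_taps):
    """ISI contributed to sample k by earlier samples of x."""
    return sum(b * x[k - 1 - i] for i, b in enumerate(post_taps) if k - 1 - i >= 0)


def thp_precode(symbols, post_taps, m):
    """Transmitter: pre-subtract the ISI, then fold the result with the modulo."""
    x = []
    for k, a in enumerate(symbols):
        x.append(modulo(a - post_cursor_isi(x, k, post_taps), m))
    return x


def channel(x, post_taps):
    """Assumed noiseless monic channel: y[k] = x[k] + sum_i b[i] * x[k-1-i]."""
    return [xk + post_cursor_isi(x, k, post_taps) for k, xk in enumerate(x)]


def thp_receive(y, m):
    """Receiver: the same modulo removes the 2m offsets added by the precoder."""
    return [modulo(v, m) for v in y]
```

With 4-PAM symbols from {-3, -1, 1, 3} and `m = 4`, the precoded samples stay inside [-4, 4) and `thp_receive(channel(thp_precode(s, taps, 4), taps), 4)` returns the original symbols up to floating-point rounding. Because the feedback filter sits in the transmitter, where the symbols are known exactly, there is no error propagation of the kind the abstract attributes to the decision feedback equalizer.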
郭皇志. "Algorithm and Hardware Architecture Design for Intra-frame Encoding." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/69770164040628325035.
楊凱博. "Application of Synchronous Elastic Architecture to FDRCLCP Hardware Design." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/cwvc69.
國立彰化師範大學
資訊工程學系
107
In image processing, natural color images captured by a digital camera may be affected by insufficient lighting, which compresses the brightness contrast and causes defects. A fast dynamic range compression algorithm can be used to preserve the regional image contrast. In this algorithm, the luminance component is smoothed with Gaussian filtering to obtain regional average values, which preserves image details and the original contrast under dynamic range compression. The hardware implementation in this study uses fixed-point unsigned numbers and shift operations to improve computational efficiency and to reduce the number of logic components and the cost of the hardware circuit. In addition, look-up tables accelerate signal processing, and the pipeline structure lets each stage operate independently, so the pipelined architecture effectively reduces the overall calculation time and circuit area. We also study the principle and architecture of synchronous elastic circuits and apply them to the hardware design of this algorithm. Compared with a traditional pipelined circuit, a synchronous elastic circuit is latency-insensitive, which allows the pipelined data path to be fully utilized by multiple threads. The work completed in this thesis consists of designing an elastic circuit with latency-insensitive properties, studying how to insert appropriate synchronous elastic control circuits into the image contrast algorithm circuit to support its operation, and analyzing how much circuit area and performance are sacrificed to achieve the synchronous elastic function.
Chhabra, Robin. "Concurrent Design of Reconfigurable Robots using a Robotic Hardware-in-the-loop Simulation." Thesis, 2008. http://hdl.handle.net/1807/17156.
Hui, Lai Hsiao, and 賴曉輝. "Research and design in hardware architecture of digital beamforming receiver." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/24540902175678981676.
大葉大學
電機工程研究所
89
This thesis focuses on the research and design of the hardware architecture of a digital beamforming receiver. The receiving front-end consists of an array of sensors (antennas); with beamforming techniques, space division multiple access can be achieved in a wireless environment. The adaptive processor detects the direction of arrival (DOA) of each source in space and estimates the optimal weight vector. Consequently, adaptive beamforming can be used to increase system capacity. One of the main techniques in Space Division Multiple Access (SDMA) is the smart antenna. The smart antenna can improve performance in several ways: (1) increasing spectral efficiency and system capacity; (2) reducing multi-path interference; (3) combating co-channel interference (CCI); (4) extending range. In this thesis, we apply the band-pass sampling theorem to IF sampling in a software radio architecture. Furthermore, we completed an interface circuit and control software that can control multiple digital down converters from one computer simultaneously, and we realized the Digital Beamforming (DBF) module on a Field Programmable Gate Array (FPGA). The graphical interface of the control software is written in Borland C++ Builder (BCB). It contains three forms: (1) a magnitude display for any single Digital Down Converter (DDC); (2) a display that monitors the magnitudes of eighteen DDCs simultaneously; (3) a tuning interface. The eight-bit address selector on the interface circuit provides a total of 256 addresses, which makes it possible to control multiple DDCs. The DBF circuit is described in the VHSIC Hardware Description Language (VHDL) and implemented on a Xilinx 4036-3 chip.
Long, Ho Shan, and 何昇龍. "A data compression hardware architecture design based on LZW algorithm." Thesis, 1994. http://ndltd.ncl.edu.tw/handle/99280506383631973475.
國立臺灣科技大學
工程技術研究所
82
We propose in this thesis a novel hardware architecture for lossless data compression. The underlying algorithm is an improved version of the LZW algorithm. The major modifications to the original LZW algorithm are: using a FIFO strategy for dictionary insertion to suit various types of data, not actually storing the initial 256 single-character strings, and using parallel dictionaries with variable sizes and word lengths. The resulting hardware architecture has the following advantages. It is simple and requires only a small RAM and a few logic gates. The compression speed can be controlled by setting system parameters. In addition, decompression is faster than with the LZW algorithm, and the memory required is much smaller than that of the LZW algorithm.
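As background for this abstract, the sketch below shows plain LZW compression and decompression in software. It follows the textbook algorithm with one of the listed modifications, namely that the 256 single-byte strings are never stored and codes 0 to 255 are treated as literals. The FIFO dictionary replacement and the parallel variable-size dictionaries are not modeled, since the abstract does not describe them in enough detail; the function names and the unbounded dictionary are assumptions made for illustration.

```python
def lzw_compress(data):
    """Compress bytes into a list of integer codes."""
    table = {}        # holds only multi-byte strings; codes 0..255 are implicit
    next_code = 256
    codes = []
    w = b""
    for byte in data:
        wc = w + bytes([byte])
        if len(wc) == 1 or wc in table:
            w = wc    # keep extending the current match
        else:
            codes.append(w[0] if len(w) == 1 else table[w])
            table[wc] = next_code
            next_code += 1
            w = bytes([byte])
    if w:
        codes.append(w[0] if len(w) == 1 else table[w])
    return codes


def lzw_decompress(codes):
    """Rebuild the original bytes from a list of LZW codes."""
    table = {}
    next_code = 256
    out = bytearray()
    prev = b""
    for code in codes:
        if code < 256:
            entry = bytes([code])       # implicit single-byte entry
        elif code in table:
            entry = table[code]
        else:
            entry = prev + prev[:1]     # code not yet known to the decoder
        out += entry
        if prev:
            table[next_code] = prev + entry[:1]
            next_code += 1
        prev = entry
    return bytes(out)
```

A round trip such as `lzw_decompress(lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT"))` returns the input unchanged while emitting fewer codes than input bytes.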
Chan, Wei-Kai, and 詹偉凱. "Algorithm, VLSI Hardware Architecture and System Design for Smart Surveillance." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/18173065914884095598.
國立臺灣大學
電子工程學研究所
100
In next-generation visual surveillance systems, content analysis tools will be integrated. New design issues will arise related to system cost, deployment space, network loading, and system scalability. In this thesis, after a discussion in terms of surveillance pipelines, we propose to utilize a content abstraction hierarchy to relieve network loading and increase system scalability, and to integrate a hardware content analysis engine into a smart camera System-on-a-Chip (SoC) to reduce system cost and deployment space. As a result, the surveillance IP camera becomes a smart camera with embedded capabilities for automatic content analysis, and the network of surveillance IP cameras becomes a smart surveillance network. Among content analysis functions, video object segmentation and tracking are two important building blocks for smart surveillance. However, several issues need to be solved. First, threshold decision is a hard problem for background-subtraction video object segmentation. Second, several conditions make it hard for video object tracking to be robust, such as non-rigid object motion, target appearance changes due to illumination changes, and background clutter. In this thesis, an improved threshold decision algorithm is proposed so that the threshold for background-subtraction-based video object segmentation can be decided automatically and robustly under severe dynamic backgrounds. In addition, the proposed threshold decision is based on a mechanism different from that of background-subtraction-based video object segmentation, which can prevent possible error propagation. For video object tracking, by using diffusion distance for color histogram matching, the tracker can track non-rigid moving objects under severe illumination changes, and, by using motion cues from video object segmentation, the tracker can be robust to background clutter.
The experimental results show that the presented algorithms are robust on several challenging sequences and that the proposed methods are effective for the issues mentioned. Besides video object segmentation and tracking, two more content analysis functions are also improved in this thesis: video object description, and face detection and scoring. For video object description, a new descriptor for human objects, the Human Color Structure Descriptor (HCSD), is proposed. Experimental results show that HCSD achieves better performance than the Scalable Color Descriptor and the Color Structure Descriptor of MPEG-7 for human objects. For face detection and scoring, low-resolution facial images in surveillance sequences are hard to detect with traditional approaches. An efficient face detection and face scoring technique for surveillance systems is proposed. It combines image-based face detection with video object segmentation to select high-quality faces. The proposed face scoring technique, which is useful for surveillance video summarization and indexing, includes four scoring functions based on feature extraction, which are integrated by a neural network training system to select high-quality faces. Experiments show that the proposed algorithm effectively extracts low-resolution human faces, which traditional face detection algorithms cannot handle well. It can also rank face candidates according to face scores, which determine face quality. For the hardware content analysis engine, a 5.877 TOPS/W and 111.329 GOPS/mm^2 Reconfigurable Smart-camera Stream Processor (ReSSP) is implemented in 90nm CMOS technology. A coarse-grained reconfigurable image stream processing architecture (CRISPA), along with the design techniques of heterogeneous stream processing (HSP) and subword-level parallelism (SLP), is implemented to accelerate the processing algorithms for smart-camera vision applications.
With the CRISPA processor architecture and the HSP and SLP design techniques, ReSSP can outperform existing vision chips in many aspects of hardware performance. Moreover, the programmability of ReSSP makes it capable of supporting many high-level vision algorithms at demanding specifications, such as real-time full-HD video analysis. The implementation results show that the on-chip memory can be reduced by 94% with SLP memory sharing. The on-chip memory size, power efficiency, and area efficiency are 18.2x to 182x, 4.5x to 33.0x, and 3.8x to 74.2x better than those of state-of-the-art chips. Besides the algorithms and hardware proposed for a single smart camera, this thesis also presents a cooperative surveillance system based on a cooperation scheme between fixed cameras and a mobile robot. The fixed cameras detect objects with background subtraction and locate them on a map with a homography transform. At the same time, information about the target to track, including its position and appearance, is transmitted to the mobile robot. After a breadth-first search over a Boolean-array map, the mobile robot finds the target in its view using a stochastic scheme with the given information; the robot then tracks the target and keeps it in the robot's view wherever he or she goes. With this system, the dead-spot problem of typical surveillance systems that use only fixed cameras is addressed and resolved. In addition, the track initialization problem of typical tracking systems, i.e., how to decide which target of interest to track, is resolved at the system level by the proposed cooperation scheme.
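The breadth-first search over a Boolean-array map mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name `bfs_path`, the 4-connected neighbourhood, and the convention that `True` marks a free cell are all assumptions made for the example.

```python
from collections import deque


def bfs_path(grid, start, goal):
    """Return a shortest 4-connected path on a Boolean map, or None.

    grid[r][c] is True when the cell is free (assumed convention).
    start and goal are (row, col) tuples.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None                     # goal is unreachable


# Hypothetical 3x3 map: the robot starts top-left, the target is bottom-right.
demo_map = [
    [True, True, False],
    [False, True, False],
    [False, True, True],
]
demo_path = bfs_path(demo_map, (0, 0), (2, 2))
```

Because every step has the same cost, breadth-first search returns a path with the fewest cells, which is why it suits a coarse occupancy map of this kind.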
Chen, Chieh-Li, and 陳潔立. "Hardware Architecture Design and Implementation of Image-Based Rendering Engine." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/67614083369674148466.
臺灣大學
電子工程學研究所
98
The current trend in multimedia display systems is to provide excellent image quality to users. Television resolution has grown from standard definition (720×480) to quad full high definition (3840×2160), and the latest iPhone 4 provides a Retina display whose 326 dpi already exceeds the density at which human eyes can tell the difference. However, the viewing experience remains limited to high visual quality because of a limitation of current display systems: they only play back stored data without any modification. Users can only watch multimedia content, without interacting with it or adding their own input. We therefore think a customized display system should be developed, one that can respond to users' requirements and interact with users. To achieve this goal, we design a real-time interactive system, called the image-based rendering engine, which uses hardware acceleration to meet real-time requirements and can be integrated into current display systems to bring more entertainment, more interaction, and a more customized viewing experience to users. The proposed image-based rendering engine supports several existing image-based rendering algorithms, such as 2D panorama, concentric mosaics, and depth-image-based rendering. It also supports a new interactive system, called Tennis Real Play, which lets users interact with broadcast tennis video content and play a game after watching the Grand Slam tournaments. To overcome the typical hardware design challenges in accelerating rendering algorithms, such as high throughput requirements, high bandwidth requirements, programmability, and low cost, we employ a reconfigurable architecture and hardware sharing techniques.
In addition, we introduce a folding technique, a cache mechanism, and a FIFO to optimize the hardware architecture. The proposed image-based rendering engine is implemented in TSMC 0.18 μm process technology. Its area is about 89662 gates, plus memory equivalent to 499712 gates. The cache mechanism reduces the corresponding bandwidth by 82.3%, the folding technique reduces the area by 33.8%, and the FIFO decreases the total processing cycles by 27.4%. The proposed rendering engine responds to users' instructions and renders 9 times faster than a CPU and 2 times faster than a GPU.
Wu, Pei-Hsuan, and 吳佩軒. "Architecture Design and Implementation of Deep Neural Network Hardware Accelerators." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/2nq8p3.
國立中山大學
資訊工程學系研究所
107
Deep Neural Networks (DNNs), widely used in computer vision applications, achieve superior performance in image classification and object detection. However, the huge amount of data movement and the computational complexity are two challenges when a DNN is used in embedded systems, where real-time processing and power consumption are two major design considerations. Hardware DNN accelerators are usually designed using FPGAs or ASICs. In this work, we develop a memory access method and design a DNN hardware accelerator with fewer memory accesses and lower power consumption. Using a mixed input/output/reuse method, we design a DNN hardware accelerator with 32 processing elements (PEs) that accelerates the computation of the VGG16 convolutional layers. The accelerator achieves a maximum frequency of 515 MHz with an internal SRAM size of 280 KB in TSMC 40 nm process technology. The peak performance of the accelerator is 139 GOP/s, with better computation speed and power than Eyeriss [21].
Chiu, Wen-Yu, and 邱文昱. "Algorithm and Architecture Design of Hardware-Oriented Video Embedded Compression." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/uv575k.
元智大學
電機工程學系甲組
107
People increasingly pursue higher visual quality, which drives continual improvement in display resolution. Full-HD 1080p (1920x1080) displays have become a common requirement in the digital TV market. As display technology advances, resolutions have developed into quad full high definition (QFHD) and ultra high definition (UHD), and both resolution and frame rate keep growing. This creates two design bottlenecks: 1) an enormous and complicated computational load, and 2) a tremendous bandwidth requirement on external memory. This study uses an effective prediction algorithm and an entropy coding system to address them. The resulting low-complexity hardware design saves hardware cost and reduces the number of arithmetic operations. The lossless compression flow is divided into two parts. 1) Prediction: a prediction algorithm reduces the image residual so that the residual can be encoded with fewer bits. 2) Entropy coding: the residual is encoded systematically, and the information needed for decoding is provided to the decoder. As a result, the effective prediction and entropy coding system saves 57% of the bandwidth of the image data.
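The two-stage flow described above (prediction, then entropy coding of the residual) can be illustrated with a small sketch. The abstract does not name the predictor or the entropy coder, so the choices here are stand-ins: a left-neighbour predictor and a Golomb-Rice code with a fixed parameter `k`. All function names, the 8-bit pixel range, and the sample row are assumptions for illustration only.

```python
def zigzag(v):
    """Map a signed residual to a non-negative integer (0,-1,1,-2 -> 0,1,2,3)."""
    return 2 * v if v >= 0 else -2 * v - 1


def unzigzag(u):
    """Inverse of zigzag()."""
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)


def rice_encode(value, k):
    """Golomb-Rice code: unary quotient, a 0 terminator, then a k-bit remainder."""
    bits = "1" * (value >> k) + "0"
    if k:
        bits += format(value & ((1 << k) - 1), "0{}b".format(k))
    return bits


def encode_row(row, k=2):
    """Store the first 8-bit pixel raw, then code left-neighbour residuals."""
    bits = format(row[0], "08b")
    for prev, cur in zip(row, row[1:]):
        bits += rice_encode(zigzag(cur - prev), k)
    return bits


def decode_row(bits, n, k=2):
    """Rebuild n pixels from the bit string produced by encode_row()."""
    prev = int(bits[:8], 2)
    row, pos = [prev], 8
    for _ in range(n - 1):
        q = 0
        while bits[pos] == "1":      # read the unary quotient
            q += 1
            pos += 1
        pos += 1                     # skip the 0 terminator
        rem = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        prev += unzigzag((q << k) | rem)
        row.append(prev)
    return row


# Hypothetical smooth row of 8-bit pixels.
sample = [100, 101, 103, 102, 102, 105, 104, 104]
coded = encode_row(sample)
restored = decode_row(coded, len(sample))
```

On smooth image rows the residuals cluster near zero, so short codewords dominate; the saving on real video data depends on the predictor and the content, and this sketch makes no claim about reproducing the 57% figure.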
wang, ying-chi, and 王英琪. "CF Card Hardware Design Under a Micro-Controller Combining ASIC Architecture." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/38376198718597857037.
國立海洋大學
電機工程學系
90
The theme of this thesis is hardware applications of an architecture that combines a micro-controller with an application-specific chip. We introduce the CompactFlash card, with emphasis on the design of the buffer and external RAM and on how data move between the buffer, the host, and flash memory. We then introduce the MP3 decoder and use an FPGA to communicate between the micro-controller and the decoder.
Tsai, Lian-Tsung, and 蔡連宗. "Design and Implementation of JPEG2000 Hardware Architecture and Digital Watermark System." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/84544916371638968856.
國立中央大學
電機工程研究所
92
A new still image standard, JPEG2000, provides not only higher compression performance but also various functionalities. This thesis focuses on the analysis and architecture design of a JPEG2000 still image encoding system. We analyze the system with several experiments and, based on the results, identify EBCOT as the bottleneck of JPEG2000. Strategies for decreasing the computation time of EBCOT are then discussed. To improve the EBCOT algorithm, a Clean-Up Pass Skipping method (CUPS) and a Pass Predicting method (PP) are proposed. We verify the CUPS and PP methods by complete simulation in C; they reduce the clock cycles for EBCOT context modeling by 65% to 69%. We then derive an efficient hardware architecture for the JPEG2000 encoding system, and the proposed speed-up methods are included in the architecture design of EBCOT context modeling. The CUPS method needs only an accumulator to sum the number of coefficient bits in a bit-plane that have been coded in Pass1 and Pass2. The PP method requires extra combinational logic and two prediction tables that record the addresses where Pass1 and Pass2 coding are needed. A few components are thus enough to improve speed. With the rapid development of networking and communication, the distribution of digital data has become faster and less controlled, and more and more consumer products, such as DSCs and DVs, are produced and becoming popular. To protect the copyright of multimedia data, data capture, compression, and ownership declaration are performed at the same time. This thesis presents a watermarking system that embeds in the wavelet transform domain, together with its implementation on the Philips TriMedia TM-1300. The system applies toral automorphism and is suitable for JPEG2000 watermarking.
Tai, Hung-Shou, and 戴宏碩. "The Hardware Architecture Design of Trilateral Noise Filter for Color Images." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/59667062748549662497.
國立臺灣師範大學
應用電子科技研究所
95
Removing noise while preserving and enhancing edges is one of the most fundamental operations in image and video processing. Pictures taken with digital cameras are frequently corrupted by miscellaneous noise, especially images captured with high ISO values in low light. Hence, noise filtering is a necessary module in digital still cameras. The difficulty in designing a noise filter is that the filter also reduces the sharpness of the image. On the other hand, optical lens imperfections are usually equivalent to spatial low-pass filters and tend to produce blurred images. It is customary to apply an edge enhancement algorithm to improve sharpness, but this process usually increases the noise level as a by-product. Hence, an efficient noise filter is very important before edge enhancement. In this thesis, the efficiency of the trilateral filter and other popular filters is compared briefly, and the trilateral filter is implemented in an HDL for an image processing chip.
Lee, Chuan-Yiu, and 李權祐. "Hardware Architecture Design and Implementation of Ray-Triangle Intersection with Bounding Volume Hierarchies." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/98797645439049633169.
國立臺灣大學
電子工程學研究所
95
Ray tracing is a simple yet powerful and general algorithm for accurately computing global light transport and rendering high-quality images. While recent algorithmic improvements and optimized parallel software implementations have increased ray tracing performance to interactive levels, few efficient hardware solutions have been available because the traditional ray tracing algorithm is hardware-unfriendly. This thesis proposes a more hardware-friendly ray tracing algorithm and describes an architecture based on it. We also implement the world's first prototype chip for ray tracing using a standard-cell-based design flow. With the proposed algorithm, the on-chip SRAM usage of the design is reduced dramatically compared to previous architectures while the amount of computation remains similar. We also use multi-threading and a folding technique to increase hardware utilization and achieve maximum performance with minimum hardware resources. The external bandwidth is low enough to allow many identical units to be replicated and run in parallel; this is achieved by a tiny cache together with word-length analysis and a vertex sharing technique. The prototype chip is fabricated in TSMC 0.13 μm technology. The chip size is 1.697×1.7 mm^2. It is capable of 4.3 giga floating-point operations per second.
Chen, Hong-Yuh, and 陳宏郁. "Algorithm and Hardware Architecture Design of Face Hallucination Using Eigen Patch." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/90152639931775714904.
國立臺灣大學
電子工程學研究所
102
In surveillance systems, recognizing human faces is always the principal target. Because of the low-quality sensors of surveillance cameras and the compression introduced by video coding, captured facial images usually have low resolution. To reach a better face recognition rate, these facial images need higher resolution. Besides improving sensor quality or video coding performance, face hallucination can be applied to enhance the resolution of the desired facial images. Face hallucination is a super-resolution process targeting facial images: it recovers the corresponding high-resolution image with rich details from a low-resolution facial image. The goal of our work is therefore to improve the resolution of low-resolution facial images. The corresponding hardware design is also provided. We propose a low-complexity face hallucination algorithm called eigen-patch, which provides high-resolution facial images with rich details and sharpness. The eigen-patch algorithm combines eigen-transformation face hallucination with the structure of position-patch-based face hallucination. It makes two main contributions. The first is conducting the eigen-transformation at patch size. The eigen-transformation raises image quality and reduces computational complexity because no least-squares problem has to be solved. The second contribution is input image alignment. Usually the input low-resolution image is not well aligned, so the resulting high-resolution image suffers from artifacts and significant quality degradation. Based on the input low-resolution image, the alignment mechanism opens a search range on the database images in order to reach a better alignment. Experimental results show that the proposed face hallucination algorithm performs better than other ones. In the hardware architecture design, we also simplify the original eigen-patch algorithm.
We move the image alignment mechanism to an earlier stage, so we do not have to recover facial images for every candidate position; only the correctly aligned one is hallucinated. The new eigen-patch scheme reduces the system bandwidth. We also analyze how many database images are needed in order to reduce their number, which further decreases the system bandwidth. Finally, our hardware implementation achieves 4x hallucination of 30×25 input images at 30 fps.
Lin, Yi-Chun, and 林奕君. "Algorithm and Hardware Architecture Design of Super Resolution Targeting TV Scaler." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/70962576761638950697.
國立臺灣大學
電機工程學研究所
100
The goal of super resolution is to recover a high-resolution image with sharp edges and rich details from a low-resolution input image. Due to the increasing gap between the resolution of image sources and display devices, super resolution has become an essential technique in many applications. In this work, we focus on the TV scaler application. Because of real-time requirements and low hardware-cost constraints, a conventional TV scaler can only employ basic interpolation and thus introduces artifacts that degrade the viewing quality of the output sequences. The goal of this work is therefore to improve the performance of the TV scaler by adopting super resolution. The corresponding hardware design is also provided. We propose a low-complexity super resolution algorithm that provides vivid output images with rich details and sharp edges. There are two main contributions. The first is the development of double interpolation up-sampling. Double interpolation quality evaluation can be used to measure the quality of an interpolation operation. Using this double interpolation framework, a direction-adaptive up-sampling algorithm is proposed to remove the zigzag artifact and enhance edge quality. The second contribution is a database-free texture synthesis technique. Based on the fractal property of natural images, proper high-resolution patches can be found in the low-resolution input image itself. Texture synthesis can therefore be performed without a database to provide proper and rich details. The double interpolation framework for up-sampling and the reconstruction constraint for the final optimization, combined with the texture synthesis, form the whole super resolution algorithm. Experimental results show that the proposed super resolution algorithm performs better than other ones. For the VLSI hardware design, the target specification is a 1920x1080 frame size with a throughput of 60 frames per second.
The main contributions of the hardware architecture design are one-pass double interpolation, tile-based gradient descent, and partial-sum-reuse texture synthesis. One-pass double interpolation and tile-based gradient descent lower bandwidth and SRAM consumption, while partial-sum-reuse texture synthesis reduces the computational cost by 76 percent. The hardware is implemented in Verilog HDL and synthesized with Synopsys Design Compiler using the TSMC 65 nm cell library. The operating frequency is 240 MHz and the total gate count is 766K. We also verify the design on an FPGA. The demo platform is based on the Terasic DE4 development board with an Altera Stratix IV GX device. The FPGA demo system up-samples video with the proposed super resolution hardware at a frame size of 1920x1080 and a frame rate of 24 frames per second. The results show that our architecture provides high-quality output in real time while solving the zigzag and blur problems caused by conventional scalers.
Lai, Ue-Ln, and 賴譽仁. "Hardware Architecture Design and Implementation of Elliptic Curve Encryption/Decryption Algorithms." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/60556511931839409769.
國立臺灣科技大學
電子工程系
90
In this thesis, the VLSI architecture design and implementation of two 162-bit elliptic curve encryption/decryption chips are presented. One is based on the IEEE 1363-2000 standard, and the other is based on arithmetic operations over an extension field. Both chips perform arithmetic operations over the field in projective coordinates with an optimal normal-basis representation. To provide flexibility in interfacing with common microprocessors, the data bus width can be set to 8, 16, or 32 bits. At the same operating frequency, the extension-field-based chip is superior to the standard-based chip in terms of bit rate, die size, and power consumption. The standard-based chip operates at 45 MHz with a bit rate of 44.1 kbps when realized on the Xilinx Virtex V400BG560 FPGA. It operates at 125 MHz with a bit rate of 122.7 kbps when realized in the TSMC 0.35 μm cell-based process; the resulting chip occupies a 2.713×2.713 mm^2 die area and consumes 133.98 mW. The extension-field-based chip operates at 48 MHz with a bit rate of 94.2 kbps when realized on the Xilinx Virtex V400BG560 FPGA. It operates at 125 MHz with a bit rate of 245.4 kbps when realized in the TSMC 0.35 μm cell-based process; the resulting chip occupies a 2.541×2.541 mm^2 die area and consumes 124.74 mW.
Wang, Chien-Chung, and 王建中. "The Hardware Architecture Design for Cube-root and Color Space Conversion." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/93908490441359952745.
國立雲林科技大學
電子與資訊工程研究所碩士班
90
Various color spaces have been reported in an attempt to identify a uniform perceptual color space for color measurement and prediction purposes. The CIE (Commission Internationale de l'Eclairage) recommends a linear transformation to obtain the XYZ color space, followed by a non-linear transformation to obtain the L*a*b* color space. This thesis studies the design and implementation of a hardware architecture that performs real-time conversion from RGB color coordinates to standard CIE L*a*b* color coordinates. To calculate the cube root in the non-linear transformation, we propose approximate arithmetic algorithms and the corresponding hardware architecture to replace look-up tables. The accuracy of the color coordinate transform is simulated in Matlab. The Verilog HDL and the Synopsys synthesis tool are then used to estimate the hardware performance. Finally, the implemented cube-root architecture is faster than the LUT approach, and the presented design uses less combinational logic than the most recently published works on color space conversion.
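The two-step conversion described above (a linear RGB-to-XYZ transform followed by the non-linear XYZ-to-L*a*b* transform with its cube root) can be sketched in software as follows. The abstract does not say which RGB primaries or white point the thesis uses, so the sRGB/D65 matrix below is an assumption, as is the choice of linear RGB input in [0, 1]; the cube root is computed with ordinary floating point rather than the thesis's approximate arithmetic, which the abstract does not detail.

```python
# Linear-RGB to XYZ matrix for sRGB primaries and a D65 white point (assumed).
RGB_TO_XYZ = (
    (0.4124, 0.3576, 0.1805),
    (0.2126, 0.7152, 0.0722),
    (0.0193, 0.1192, 0.9505),
)
DELTA = 6.0 / 29.0  # breakpoint constant of the CIE L*a*b* definition


def rgb_to_xyz(rgb):
    """Step 1: linear transform from linear RGB in [0, 1] to XYZ."""
    return tuple(sum(m * c for m, c in zip(row, rgb)) for row in RGB_TO_XYZ)


# Reference white: the XYZ value of RGB = (1, 1, 1) under the matrix above.
WHITE = rgb_to_xyz((1.0, 1.0, 1.0))


def lab_f(t):
    """Non-linear function of CIE L*a*b*: cube root with a linear toe."""
    if t > DELTA ** 3:
        return t ** (1.0 / 3.0)
    return t / (3.0 * DELTA ** 2) + 4.0 / 29.0


def rgb_to_lab(rgb):
    """Step 2: normalise XYZ by the white point and apply the Lab formulas."""
    x, y, z = (v / w for v, w in zip(rgb_to_xyz(rgb), WHITE))
    fx, fy, fz = lab_f(x), lab_f(y), lab_f(z)
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))


white_lab = rgb_to_lab((1.0, 1.0, 1.0))
gray_lab = rgb_to_lab((0.5, 0.5, 0.5))
red_lab = rgb_to_lab((1.0, 0.0, 0.0))
```

The cube root in `lab_f` is the expensive step in hardware, which is where a design can trade a look-up table for approximate arithmetic as the abstract describes.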
Lan, Wei, and 藍瑋. "High-throughput Hardware Architecture Design and Realization of RaptorQ Code Decoder." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/52697808591399871712.
國立臺灣大學
電子工程學研究所
103
With advances in technology, a smartphone can provide many more services, and multimedia streaming is the most commonly used one. Transmission latency is of utmost importance to the quality of the viewing and listening experience. Unfortunately, wireless transmission often suffers from channel fading, which makes robust transmission almost impossible without an effective error correction mechanism. A conventional protocol generally retransmits the erased coded sequence until the receiver receives it correctly. Fountain codes, on the other hand, keep the partially decoded information and continue to receive and decode coded symbols until the whole information sequence can be recovered. Such rateless codes have drawn a great deal of attention and have been applied in many scenarios. RaptorQ is the latest generation of Raptor codes. Compared with the previous version, RaptorQ provides higher flexibility and a lower decoding failure probability. However, the decoding procedure is also much more complicated. Conventionally, decoding RaptorQ codes requires inverting a huge matrix. Instead of this costly matrix inversion, we propose to calculate the inverse of another matrix whose rows differ only slightly from those of the matrix that needs to be decoded, so that most computations are shifted offline. Next, previous decoders usually decode the intermediate symbols while inverting the matrix and then recover the information sequence from the intermediate symbols. With the pre-calculated inverse, the proposed algorithm combines intermediate sequence decoding with information sequence recovery to reduce complexity. Last, exploiting the systematic property of the RaptorQ code, we propose a new method that avoids many unnecessary computations when decoding the received information sequence. Finally, the proposed decoding algorithm is not only simulated in software but also verified on an FPGA board to prove its feasibility.
Chen, Chun-Ting, and 陳俊廷. "Intelligent Brain-inspired Human-centric Recognition Algorithm and its Hardware Architecture Design." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/94144528383791990153.
國立臺灣大學
電子工程學研究所
100
As technology continues to evolve, computers have more and more computing capacity, which drives the emergence of many intelligent applications such as smile shutter, automatic surveillance systems, smart cars, and smart homes. These smart machines can sense their surroundings like humans and provide safety, convenience, and efficiency. In this thesis, such intelligent applications are called human-centric applications, since they are based on human needs. We focus on human-centric recognition applications, such as face recognition, object recognition, and action recognition. On the other hand, since we are in an era dominated by radio-equipped computers, the amount of multimedia data is growing extremely fast. YouTube reported in 2010 that more than 35 hours of video were being uploaded to the video-sharing site every minute. At this rate, we need to handle over one zettabyte of information annually. Therefore, to support various intelligent applications and manage this huge amount of data, we need an efficient and scalable hardware platform that provides the required computation capability. The ultimate goal is to approach human-like intelligence. For building an intelligent machine, mimicking the structures and functions of the visual cortex has always been a major approach to implementing a human-like intelligent visual system. In this thesis, we start by exploring the brain's computing style and architecture, and then design a brain-like computing system for visual recognition that scales easily with the amount of resources for future intelligent applications. The system design flow proceeds from the Neocortical Computing (NC) model design to the Neocortical Computing system design and then to a real-time human-centric NC architecture based on an FPGA system. The NC model provides the functionality required by intelligent human-centric applications. The NC architecture is an efficient and scalable hardware platform optimized for the NC model.
The FPGA system verifies the NC system by transforming the NC model into memory content that the platform can interpret. In this thesis, the main system design strategy is to provide application diversity and efficiency comparable to the human brain. First, we analyze current NC models and find that they lack temporal-domain integration and are therefore hard to extend from object recognition to time-dependent action recognition. To solve this problem, inspired by the recurrent information transmission of the human brain and by neural network research, we propose a recurrent computing kernel that integrates temporal-domain action feature information efficiently. With it we construct an efficient dimension-lifting Reservoir Kernel that exhibits temporal memory and can thus integrate the temporal information provided by the HMAX network and boost its recognition performance. Experimental results show that it substantially outperforms the state-of-the-art HMMSVM method. Second, for the NC system design, we analyze the computation of the NC model and identify its main problem, massive data access, which results in power inefficiency, redundant external bandwidth usage, slow response, and a lack of communication scalability. In current computing systems, this problem makes the NC system memory-bound. To address this issue, inspired by the information forwarding scheme of neurons, we propose a Push-based Dataflow (Push-DF) structure that uses push-based processing to reduce external memory access and forward sparse data efficiently. Experimental results show that Push-DF in a many-core architecture achieves lower latency, power consumption, and external bandwidth than RISC and GPU. Push-based processing greatly reduces external memory access, so our NC system can break the bottleneck of traditional memory-bound systems.
This important feature provides the communication scalability of our NC system, which meets the design goal of a scalable brain-mimicking hardware platform. Finally, we use the proposed Push-DF structure to design the NC system and implement an 8-core NCSoC on an FPGA system. Our final NCSoC implementation takes 0.179 seconds to recognize a 100×100 image. In conclusion, NCSoC supports the NC model for various intelligent recognition tasks and provides better performance, efficiency, and scalability than current computing platforms. As a result, it has the potential to support various intelligent applications and manage huge amounts of multimedia data in the future.
Kuo, Tzu-Yin, and 郭姿吟. "High Performance Hardware Architecture Design of Homomorphic AES for Cloud Computing." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/5277h4.
國立成功大學
電機工程學系
105
Fully homomorphic encryption (FHE) is an emerging technique that allows encrypted data to be processed directly on untrusted servers while ensuring data privacy. Despite the importance of FHE for cloud computing applications, the underlying algorithms still have extremely high computational complexity and implementation cost. Homomorphic evaluation of the Advanced Encryption Standard (AES) can be regarded as a complex function, and existing homomorphic AES implementations still demand a significant amount of computation time. The most expensive operation in homomorphic AES is the key switching performed after homomorphic multiplication and automorphism operations. To improve the performance of homomorphic AES by reducing the homomorphic multiplication and automorphism operations on critical computational paths, this thesis proposes a parallel SubByte and MixColumn/ShiftRow algorithm that relaxes the underlying data dependency. Compared to conventional homomorphic AES, the proposed algorithm removes 3 key switching operations from each round of homomorphic AES, assuming parallel processing. Moreover, high-performance hardware architectures for homomorphic AES are presented for different security levels. Performance evaluations show that the proposed design outperforms related works in terms of computation time and performance.
Liu, Yue-qu, and 劉岳衢. "Reconfigurable Design and Implementation of Modular-Construction Based FFT Hardware Architecture." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/m9kw97.
國立中山大學
電機工程學系研究所
106
The 3GPP-LTE communication standard defines many Fast Fourier Transform (FFT) sizes. We therefore design a high-performance FFT architecture that uses modular construction and reconfigurable design so that any two stages can be connected easily and the design can be adapted to different requirements. A 4-stage module supports 48 modes covering 2 to 2187 FFT points. It also supports the 32 modes defined in the 3GPP-LTE communication standard. Each module contains two parts. (1) Reconfigurable Computing Kernel (RC-CK): we employ radix-3^2 and radix-2^3 bases and exploit hardware reuse; without extra hardware resources (e.g., multipliers or adders), it can execute six different radix types of FFT kernel operations. (2) Reconfigurable First-In First-Out (RC-FIFO): we develop a highly efficient design method for supporting many FFT sizes; the FIFO plan is easy to manage and arranged to maximize hardware storage usage. In addition, we propose a Section-based Twiddle Factor Generator (STFG) to support multiple FFT sizes. It reduces area cost and can serve a wide range of communication systems effectively. In the chip implementation, the core area is only 0.318 mm^2 in TSMC 40 nm CMOS technology. The maximum operating frequency is 350 MHz and the average power dissipation is 44.2 mW. Compared with other state-of-the-art designs, the proposed work has the best performance and supports many FFT sizes. Most importantly, the proposed hardware architecture has better scalability: in the future, specifications not yet defined for fifth-generation wireless systems can be supported simply by adding or removing modules.
Wang, Ching-Shun, and 王靖順. "Reconfigurable Hardware Architecture Design and Implementation for AI Deep Learning Accelerator." Thesis, 2019. http://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/login?o=dnclcdr&s=id=%22107NCHU5441107%22.&searchmode=basic.
National Chung Hsing University (國立中興大學)
Department of Electrical Engineering (電機工程學系所)
107
This paper proposes a convolutional neural network (CNN) hardware accelerator architecture with 288 PEs that achieves 230.4 GOPS at 400 MHz. To verify the hardware function, the design is implemented on an FPGA at 100 MHz in units of 72 PEs, owing to limited FPGA resources. The proposed CNN accelerator is a layer-based architecture whose layer parameters can be reconfigured to suit different CNN architectures. It operates on three rows of the input feature map to generate one row of the output feature map. It uses 322 KB of on-chip memory to store the input feature map, bias, kernel, and output feature map, which improves data reuse and reduces bandwidth utilization. In addition, the max-pooling layer that follows a convolution layer can be merged with it to reduce DRAM bandwidth.
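The headline throughput figure can be checked with simple arithmetic. The sketch below assumes that each PE completes one multiply-accumulate per cycle and that a multiply-accumulate counts as two operations; the abstract does not state this, and the function name `peak_gops` is hypothetical. Under that assumption, 288 PEs at 400 MHz give the reported 230.4 GOPS.

```python
def peak_gops(num_pe, clock_mhz, ops_per_pe_per_cycle=2):
    """Peak throughput in GOPS.

    Assumption (not stated in the abstract): each PE performs one
    multiply-accumulate per cycle, counted as 2 operations.
    """
    return num_pe * ops_per_pe_per_cycle * clock_mhz / 1000.0


if __name__ == "__main__":
    # Full design described in the abstract: 288 PEs at 400 MHz.
    print(peak_gops(288, 400))  # 230.4
    # FPGA verification configuration: one 72-PE unit at 100 MHz.
    print(peak_gops(72, 100))   # 14.4
```

By the same assumption, the 72-PE FPGA configuration at 100 MHz would peak at 14.4 GOPS; this value is derived here and is not reported in the abstract.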
Shiau, Wen-Shiuh, and 蕭紋旭. "Hardware Design of a Shared-memory Architecture for ATM/Ethernet Switching Subsystems." Thesis, 1997. http://ndltd.ncl.edu.tw/handle/64517773567389631715.
National Chung Cheng University (國立中正大學)
Department of Electrical Engineering (電機工程學系)
85
The design of ATM/Ethernet switching systems has received much attention from R&D organizations around the world. Such a switching system provides a seamless transport platform for interworking a traditional LAN environment with the ATM backbone. In this thesis, we present an effort to realize a switching subsystem that supports bridging and switching capabilities for transporting data between Ethernet modules and ATM modules (including ATM/ATM and Ethernet/Ethernet). In this design, we adopt a shared-memory structure for information transfer between Ethernet modules and ATM modules. To support the shared-memory structure, we design a shared memory manager to handle all control and management functions required when transferring data from one module to another. We first describe the functional blocks of the proposed subsystem. We define the associated interfaces and structures for each block. We define control tables that inform modules about data characteristics during data transfer. We use translation tables to support the bridging capabilities. We also design a CPU interface to support PVC configuration and SVC signaling message transfer. Based on the designed architecture, we describe the logic flow of each functional block and explain the associated FSMs. We carry out the hardware implementation with a top-down methodology, which starts with behavioral synthesis and logic simulation. We simulate and verify the proposed subsystem through the following steps: block simulation, FSM simulation, module simulation, and system simulation. We present the testing process and procedures for each module and perform an integrated test of the subsystem. Based on our calculation, the throughput of the designed subsystem can reach up to 1056 Mbps. To check whether the system performs correctly, we analyze worst-case delay performance for different logic flows. According to our simulation, the subsystem can operate at 33 MHz.
Finally, we give some remarks on the development effort of the switching subsystem, including improvements to the hardware design and support for other functionalities.
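The two figures in the abstract above, 1056 Mbps and 33 MHz, are consistent with one simple reading. The sketch below is only an arithmetic check under an assumption: the abstract does not state the data-path width, and a 32-bit shared-memory data path moving one word per clock cycle is a guess that happens to reproduce the reported number. The function names are hypothetical.

```python
def bus_throughput_mbps(clock_mhz, bus_width_bits):
    """Peak throughput of a bus that moves one word per clock cycle."""
    return clock_mhz * bus_width_bits


def implied_bus_width_bits(throughput_mbps, clock_mhz):
    """Bus width that would be needed to reach a given throughput at a given clock."""
    return throughput_mbps / clock_mhz


if __name__ == "__main__":
    # Assumed 32-bit data path at the 33 MHz clock reported in the abstract.
    print(bus_throughput_mbps(33, 32))         # 1056
    # Width implied by the reported 1056 Mbps at 33 MHz.
    print(implied_bus_width_bits(1056, 33))    # 32.0
```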