Dissertations / Theses on the topic 'Hardware/algorithm co-design'

Consult the top 21 dissertations / theses for your research on the topic 'Hardware/algorithm co-design.'


1

Zhang, Zhengdong. "Efficient computing for autonomous navigation using algorithm-and-hardware co-design." Ph.D. thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122691.

Full text
Abstract:
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 211-221).
Autonomous navigation algorithms are the backbone of many robotic systems, such as self-driving cars and drones. However, state-of-the-art autonomous navigation algorithms are computationally expensive, requiring powerful CPUs and GPUs to enable them to run in real time. As a result, it is prohibitive to deploy them on miniature robots with limited computational resources onboard. To tackle this challenge, this thesis presents an algorithm-and-hardware co-design approach to design energy-efficient algorithms that are optimized for dedicated hardware architectures at the same time. It covers the design for three essential modules of an autonomous navigation system: perception, localization, and exploration.
Compared with previous research that considers either algorithmic improvements or hardware architecture optimizations alone, our approach leads to algorithms that not only have lower time and space complexity but also map efficiently to specialized hardware architectures, resulting in significantly improved energy efficiency and throughput. First, this thesis studies how to design an energy-efficient visual perception system using the deformable part models (DPM) based object detection algorithm. It describes an algorithm that enforces sparsity in the data stored on chip, which reduces the memory requirement by 34% and lowers the classification cost by 43%. Together with other hardware optimizations, this technique leads to an object detection chip that runs at 30 fps on 1920x1080 videos while consuming only 58.6 mW of power.
Second, this thesis describes a systematic way to explore algorithm-hardware design choices to build a low-power chip that performs visual inertial odometry (VIO) to localize a vehicle. Each of the components in a VIO pipeline has multiple algorithmic choices with different time and space complexity. However, some algorithms of lower time complexity can be more expensive when implemented on-chip. This thesis examines each of the design choices from both the algorithm's and the hardware's point of view and presents a design that consumes 24 mW of power while running at up to 90 fps and achieving near state-of-the-art localization accuracy. Third, this thesis presents an efficient information-theoretic mapping system for exploration. It features a novel algorithm called Fast computation of Shannon Mutual Information (FSMI) that computes the Shannon mutual information (MI) between perspective range measurements and the environment.
The FSMI algorithm features an analytic solution that avoids the expensive numerical integration required by previous state-of-the-art algorithms, enabling FSMI to run three orders of magnitude faster in practice. We also present an extension of the FSMI algorithm to 3D mapping; the algorithm leverages the compression of large 3D maps using run-length encoding (RLE) and achieves an 8x acceleration in a real-world exploration task. In addition, this thesis presents a hardware architecture designed for the FSMI algorithm. The design includes a novel memory-banking method that increases the memory bandwidth so that multiple FSMI cores can run in parallel while maintaining high utilization. A novel arbiter is proposed to resolve memory read conflicts between multiple cores within one clock cycle. The final design on an FPGA achieves more than 100x higher throughput than a CPU while consuming less than 1/10 of the power.
by Zhengdong Zhang.
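The 3D mapping extension above compresses the occupancy map with run-length encoding (RLE). As a rough illustration only (the thesis's actual encoding over ray traversals may differ in its details), a minimal RLE sketch:

```python
def rle_encode(cells):
    """Run-length encode a sequence of occupancy values into (value, count) pairs."""
    runs = []
    for v in cells:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1        # extend the current run
        else:
            runs.append([v, 1])     # start a new run
    return [tuple(r) for r in runs]

def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

# A toy voxel column: long homogeneous stretches compress well.
column = ['free'] * 5 + ['occ'] * 2 + ['unknown'] * 9
runs = rle_encode(column)
assert rle_decode(runs) == column
print(runs)  # [('free', 5), ('occ', 2), ('unknown', 9)]
```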
APA, Harvard, Vancouver, ISO, and other styles
2

Tzou, Nicholas. "Low-cost sub-Nyquist sampling hardware and algorithm co-design for wideband and high-speed signal characterization and measurement." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/51876.

Full text
Abstract:
Cost reduction has been and will continue to be a primary driving force in the evolution of hardware design and associated technologies. The objective of this research is to design low-cost signal acquisition systems for characterizing wideband and high-speed signals. As the bandwidth and speed of such signals increase, the cost of testing also increases significantly; therefore, innovative hardware and algorithm co-design is needed to alleviate this problem. In Chapter 2, a low-cost multi-rate system is proposed for characterizing the spectra of wideband signals. The design is low-cost in terms of actual component cost, system complexity, and the effort required for calibration. The associated algorithms are designed so that the hardware can be implemented with low complexity yet be robust enough to deal with various hardware variations. A hardware prototype was built not only to verify the proposed hardware scheme and algorithms but also to serve as a concrete example showing that characterizing signals at sub-Nyquist sampling rates is feasible. Chapter 3 introduces a low-cost time-domain waveform reconstruction technique, which requires no mutual synchronization mechanisms. This brings down cost significantly and enables systems capable of capturing signals of tens of gigahertz (GHz) at significantly lower cost than the high-end oscilloscopes on the market today. For the first time, band-interleaving and incoherent undersampling techniques are combined to form a low-cost solution for waveform reconstruction. This is enabled by co-designing the hardware and the back-end signal processing algorithms to compensate for the lack of coherent Nyquist-rate sampling hardware. A hardware prototype was built to support this work. Chapter 4 describes a novel test methodology that significantly reduces the time required for crosstalk jitter characterization in parallel channels.
This is done by using bit patterns with coprime periods as channel stimuli and using signal processing algorithms to separate multiple crosstalk coupling effects. The proposed test methodology can be applied in conjunction with current test methodologies without redesigning the test setup. More importantly, the mathematical analysis shows that only such test stimuli give unbiased characterization results, which is critical in high-precision test setups. Hardware measurement results and analysis are provided to support this methodology. The thesis starts with an overview of the background and a literature review. The three major works mentioned above are addressed in three separate chapters; each documents the hardware designs, signal processing algorithms, and associated mathematical analyses, and for verification the hardware measurement setups and results are discussed at the end of each chapter. The last chapter presents conclusions and directions for future work.
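As a toy illustration of why coprime-period stimuli permit separation (the patterns, periods, and averaging scheme below are invented for the example, not taken from the dissertation): when periods p and q are coprime, averaging the aggregate signal over one residue class modulo p sweeps every residue modulo q exactly once, so a zero-mean period-q interferer cancels exactly.

```python
from math import gcd

p, q = 3, 5                       # coprime stimulus periods
assert gcd(p, q) == 1

a = [2.0, -1.0, 4.0]              # victim-channel pattern, period p
b = [1.0, -2.0, 0.5, 1.5, -1.0]   # aggressor pattern, period q, zero-mean
assert abs(sum(b)) < 1e-12

N = p * q                         # one joint period
x = [a[n % p] + b[n % q] for n in range(N)]

# Synchronous averaging over residues mod p: within each residue class of p,
# n % q takes every value 0..q-1 exactly once (Chinese remainder theorem),
# so the zero-mean aggressor averages out exactly.
recovered = [sum(x[n] for n in range(N) if n % p == k) / q for k in range(p)]
print(recovered)  # recovers the period-p pattern a
```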
APA, Harvard, Vancouver, ISO, and other styles
3

Narasimhan, Seetharam. "Ultralow-Power and Robust Implantable Neural Interfaces: An Algorithm-Architecture-Circuit Co-Design Approach." Case Western Reserve University School of Graduate Studies / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=case1333743306.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Trindade, Alessandro Bezerra. "Aplicando verificação de modelos baseada nas teorias do módulo da satisfabilidade para o particionamento de hardware/software em sistemas embarcados." Universidade Federal do Amazonas, 2015. http://tede.ufam.edu.br/handle/tede/4091.

Full text
Abstract:
When performing hardware/software co-design for embedded systems, the problem arises of deciding which functions of the system should be implemented in hardware (HW) and which in software (SW). This problem is known as HW/SW partitioning, and in the last ten years a significant research effort has been carried out in this area. In this work, we present two new approaches to solving the HW/SW partitioning problem using SMT-based verification techniques, comparing the results with the traditional technique of Integer Linear Programming (ILP) and a modern optimization method based on Genetic Algorithms (GA). The goal is to show with experimental results that model checking techniques can be effective, in particular cases, at finding the optimal solution of the HW/SW partitioning problem using a state-of-the-art model checker based on Satisfiability Modulo Theories (SMT) solvers, when compared to the traditional techniques.
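The partitioning problem itself can be stated as a small combinatorial optimization. The sketch below brute-forces a toy instance (the function names, costs, and single time budget are invented for illustration; the dissertation instead encodes such constraints symbolically for an SMT-based model checker, an ILP solver, or a GA):

```python
from itertools import product

# Hypothetical per-function costs: (hw_area, sw_time). All numbers are made up.
funcs = {"fft": (40, 90), "fir": (25, 60), "crc": (10, 15), "ui": (5, 5)}
SW_TIME_BUDGET = 70   # total software execution time must not exceed this

best = None
names = list(funcs)
for bits in product([0, 1], repeat=len(funcs)):   # 1 = map function to hardware
    area = sum(funcs[n][0] for n, b in zip(names, bits) if b)
    time = sum(funcs[n][1] for n, b in zip(names, bits) if not b)
    if time <= SW_TIME_BUDGET and (best is None or area < best[0]):
        best = (area, {n: ("HW" if b else "SW") for n, b in zip(names, bits)})

print(best)  # minimum hardware area meeting the software time budget
```

An exhaustive search like this is only viable for a handful of functions; solver-based formulations scale to realistic partitioning instances.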
APA, Harvard, Vancouver, ISO, and other styles
5

Bahri, Imen. "Contribution des systèmes sur puce basés sur FPGA pour les applications embarquées d’entraînement électrique." Thesis, Cergy-Pontoise, 2011. http://www.theses.fr/2011CERG0529/document.

Full text
Abstract:
Designing embedded control systems becomes increasingly complex due to growing algorithm complexity, rising industrial requirements, and the nature of the application domains. One way to handle this complexity is to design the corresponding controllers on powerful, open digital platforms. More specifically, this PhD deals with the use of FPGA System-on-Chip (SoC) platforms for the implementation of complex AC drive controllers for avionic applications. The latter are characterized by stringent technical constraints such as environmental conditions (pressure, high temperature) and high performance requirements (high integration, flexibility, and efficiency). During this thesis, the author contributed to the design and test of a digital controller for a high-temperature synchronous drive that must operate at 200°C ambient. It consists of a Flux Oriented Controller (FOC) for a Permanent Magnet Synchronous Machine (PMSM) associated with a resolver sensor. A design and validation method was proposed and tested using an FPGA ProAsicPlus board from Actel/Microsemi. The impact of temperature on the operating frequency was also analyzed. A state of the art of FPGA SoC technology is also presented, including a detailed description of recent digital platforms and the constraints of embedded applications, establishing the interest of a SoC-based approach for AC drive applications. Additionally, to take full advantage of a SoC-based approach, an appropriate HW-SW co-design methodology for electrical AC drives is proposed. This method covers the whole development flow of the control application, from the specifications to the final experimental validation. One of the most important steps of this method is HW-SW partitioning: the goal is to find an optimal combination between the modules to be implemented in software and those to be implemented in hardware.
This multi-objective optimization problem was solved with the Non-Dominated Sorting Genetic Algorithm (NSGA-II), from which a Pareto front of optimal solutions can be deduced. The proposed co-design methodology is illustrated with a sensorless speed controller using the Extended Kalman Filter (EKF); this benchmark corresponds to a major trend in embedded control of AC drives. Besides, the SoC-based architecture of the embedded controller is managed using an efficient Real-Time Operating System (RTOS). To accelerate the services of this operating system, a Real-Time Unit (RTU) was developed in VHDL and associated with the RTOS. It is a hardware operating system that moves the scheduling and communication processes from the software RTOS to hardware, yielding a significant acceleration. Experimental tests based on a digital current controller were carried out using a laboratory set-up. The obtained results prove the interest of the proposed approach.
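The core of the NSGA-II step mentioned above is non-dominated (Pareto) sorting. A minimal sketch of extracting the Pareto front from candidate partitions, with invented (area, time) pairs as the two objectives to minimize:

```python
def pareto_front(points):
    """Return the non-dominated points when minimizing both objectives:
    a point is kept unless some other point is <= in both coordinates."""
    return [p for p in points
            if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in points)]

# (hardware_area, execution_time) for candidate HW/SW partitions (toy values)
candidates = [(10, 90), (20, 70), (30, 75), (40, 40), (50, 45), (60, 30)]
print(pareto_front(candidates))  # trade-off curve between area and time
```

NSGA-II additionally ranks the dominated points into successive fronts and uses crowding distance to keep the population diverse; this sketch shows only the first front.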
APA, Harvard, Vancouver, ISO, and other styles
6

Zhang, Yuanzhi. "Algorithms and Hardware Co-Design of HEVC Intra Encoders." OpenSIUC, 2019. https://opensiuc.lib.siu.edu/dissertations/1769.

Full text
Abstract:
Digital video has become extremely important, and its importance has greatly increased over the last two decades. Due to the rapid development of information and communication technologies, the demand for Ultra-High Definition (UHD) video applications is growing. However, the most prevalent video compression standard, H.264/AVC, released in 2003, is inefficient for UHD videos. The desire for compression efficiency superior to H.264/AVC led to the standardization of High Efficiency Video Coding (HEVC). Compared with the H.264/AVC standard, HEVC offers double the compression ratio at the same level of video quality, or a substantial improvement in video quality at the same bitrate. Yet, although HEVC/H.265 possesses superior compression efficiency, its complexity is several times that of H.264/AVC, impeding high-throughput implementation. Currently, most research has focused merely on algorithm-level adaptations of the HEVC/H.265 standard to reduce computational intensity without considering hardware feasibility; moreover, the exploration of efficient hardware architectures is not exhaustive, and only a few works have studied efficient hardware architectures for the HEVC/H.265 standard. In this dissertation, we investigate efficient algorithm adaptations and hardware architecture design of HEVC intra encoders. We also explore a deep learning approach to mode prediction. From the algorithm point of view, we propose three efficient hardware-oriented algorithm adaptations: mode reduction, fast coding unit (CU) cost estimation, and group-based CABAC (context-adaptive binary arithmetic coding) rate estimation. Mode reduction aims to reduce the mode candidates of each prediction unit (PU) in the rate-distortion optimization (RDO) process, which is both computation-intensive and time-consuming.
Fast CU cost estimation is applied to reduce the complexity of the rate-distortion (RD) calculation for each CU. Group-based CABAC rate estimation is proposed to parallelize syntax-element processing and greatly improve rate-estimation throughput. From the hardware design perspective, a fully parallel hardware architecture of an HEVC intra encoder is developed to sustain UHD video compression at 4K@30fps. The fully parallel architecture introduces four prediction engines (PEs), each of which independently performs the full cycle of mode prediction, transform, quantization, inverse quantization, inverse transform, reconstruction, and rate-distortion estimation. PU blocks of different sizes are processed by different prediction engines simultaneously. Also, an efficient hardware implementation of a group-based CABAC rate estimator is incorporated into the proposed HEVC intra encoder for accurate and high-throughput rate estimation. To take advantage of deep learning, we also propose a fully-connected-layer-based neural network (FCLNN) mode preselection scheme to reduce the number of RDO modes for luma prediction blocks. All angular prediction modes are classified into 7 prediction groups, each containing 3 to 5 prediction modes that exhibit a similar prediction angle. A rough angle detection algorithm determines the prediction direction of the current block, and then a small-scale FCLNN refines the mode prediction.
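As a toy illustration of mode grouping and rough angle detection (the group boundaries, gradient estimate, and block size below are assumptions made for the sketch, not the thesis's actual scheme):

```python
import math

def mode_groups():
    """Split HEVC's 33 angular intra modes (numbered 2..34) into 7
    contiguous groups; the sizes/boundaries are assumed for this toy."""
    modes, groups, sizes = list(range(2, 35)), [], [5, 5, 5, 5, 5, 4, 4]
    i = 0
    for s in sizes:
        groups.append(modes[i:i + s])
        i += s
    return groups

def rough_group(block):
    """Pick one of the 7 groups from the dominant gradient direction of a
    tiny 2x2 luma block (a crude stand-in for rough angle detection)."""
    gx = sum(row[1] - row[0] for row in block)           # horizontal gradient
    gy = sum(b - a for a, b in zip(block[0], block[1]))  # vertical gradient
    angle = math.atan2(gy, gx) % math.pi                 # direction in [0, pi)
    return int(angle / math.pi * 7) % 7                  # quantize into 7 bins

groups = mode_groups()
print(len(groups), [len(g) for g in groups])
```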
APA, Harvard, Vancouver, ISO, and other styles
7

Marques, Vítor Manuel dos Santos. "Performance of hardware and software sorting algorithms implemented in a SOC." Master's thesis, Universidade de Aveiro, 2017. http://hdl.handle.net/10773/23467.

Full text
Abstract:
Master's degree in Computer and Telematics Engineering
Field Programmable Gate Arrays (FPGAs) were invented by Xilinx in 1985. Their reconfigurable nature allows them to be used in multiple areas of information technology. This project studies this technology as an alternative to traditional data processing methods, namely sorting. The proposed solution is based on the principle of reusing resources to counter this technology's known resource limitations.
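One classic resource-reuse pattern for hardware sorting is odd-even transposition sort: a single rank of compare-exchange units can be instantiated once and iterated n times, rather than unrolling a full sorting network. A behavioral Python sketch (the thesis's actual sorter design may differ):

```python
def odd_even_transposition_sort(values):
    """Sort by repeating one compare-exchange stage n times. On an FPGA the
    inner loop maps to a single rank of parallel comparators that is reused
    each pass, trading latency for area."""
    a = list(values)
    n = len(a)
    for phase in range(n):
        start = phase % 2               # alternate even/odd comparator pairs
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(odd_even_transposition_sort([7, 3, 9, 1, 4]))  # [1, 3, 4, 7, 9]
```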
APA, Harvard, Vancouver, ISO, and other styles
8

Jiang, Zhewei. "Algorithm and Hardware Co-Design for Local/Edge Computing." Thesis, 2020. https://doi.org/10.7916/d8-nxwg-f771.

Full text
Abstract:
Advances in VLSI manufacturing and design technology over the decades have created many computing paradigms for disparate computing needs. With concerns about the transmission cost, security, and latency of centralized computing, edge/local computing is increasingly prevalent in fast-growing sectors like the Internet of Things (IoT) and in sectors that require energy/connectivity-autonomous systems, such as biomedical and industrial applications. Energy and power efficiency are the main design constraints in local and edge computing. While a wide range of low-power design techniques exists, they are often underutilized in custom circuit designs because the algorithms are developed independently of the hardware. Such a compartmentalized design approach fails to take advantage of the many compatible algorithmic and hardware techniques that can improve the efficiency of the entire system. Algorithm-hardware co-design explores the design space with whole-stack awareness. The main goal of the algorithm-hardware co-design methodology is the enablement and improvement of small-form-factor edge and local VLSI systems operating under strict area and energy-efficiency constraints. This thesis presents selected works of application-specific digital and mixed-signal integrated circuit design, ranging from implantable biomedical devices to edge machine learning acceleration.
APA, Harvard, Vancouver, ISO, and other styles
9

"Algorithm and Hardware Co-design for Learning On-a-chip." Doctoral diss., 2017. http://hdl.handle.net/2286/R.I.45949.

Full text
Abstract:
Machine learning technology has made incredible achievements in recent years. It has rivalled or exceeded human performance in many intellectual tasks, including image recognition, face detection, and the game of Go. Many machine learning algorithms require huge amounts of computation, such as the multiplication of large matrices. As silicon technology has scaled to the sub-14nm regime, simply scaling down the device can no longer provide enough speed-up; new device technologies and system architectures are needed to improve computing capacity. Designing specific hardware for machine learning is in high demand, and effort must be put into the joint design and optimization of both hardware and algorithms. For machine learning acceleration, traditional SRAM- and DRAM-based systems suffer from low capacity, high latency, and high standby power. Instead, emerging memories, such as Phase Change Random Access Memory (PRAM), Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM), and Resistive Random Access Memory (RRAM), are promising candidates providing low standby power, high data density, fast access, and excellent scalability. This dissertation proposes a hierarchical memory modeling framework and models PRAM and STT-MRAM at four different levels of abstraction. With the proposed models, various simulations are conducted to investigate performance, optimization, variability, reliability, and scalability. Emerging memory devices such as RRAM can work as a 2-D crosspoint array to speed up the multiplication and accumulation in machine learning algorithms. This dissertation proposes a new parallel programming scheme to achieve in-memory learning with an RRAM crosspoint array. The programming circuitry is designed and simulated in TSMC 65nm technology, showing a 900X speedup for the dictionary learning task compared to CPU performance.
From the algorithm perspective, inspired by the high accuracy and low power of the brain, this dissertation proposes a bio-plausible feedforward inhibition spiking neural network with a Spike-Rate-Dependent Plasticity (SRDP) learning rule. It achieves more than 95% accuracy on the MNIST dataset, which is comparable to the sparse coding algorithm but requires far fewer computations. The role of inhibition in this network is systematically studied and shown to improve hardware efficiency in learning.
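The multiply-accumulate a crosspoint array performs is just Ohm's law plus Kirchhoff's current law: each column current is the dot product of the row voltages with that column's conductances. A behavioral sketch with invented conductance and voltage values:

```python
def crosspoint_mvm(G, v):
    """Column currents of a crosspoint array: I[j] = sum_i G[i][j] * V[i]
    (Ohm's law per cell, Kirchhoff's current law summing down each column)."""
    rows, cols = len(G), len(G[0])
    return [sum(G[i][j] * v[i] for i in range(rows)) for j in range(cols)]

G = [[1.0, 0.5],   # cell conductances (toy values)
     [2.0, 1.5]]
v = [0.3, 0.1]     # word-line read voltages (toy values)
currents = crosspoint_mvm(G, v)
print(currents)
```

The physics evaluates the whole matrix-vector product in one read step, which is why such arrays accelerate the inner loop of learning algorithms.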
Doctoral dissertation, Electrical Engineering, 2017.
APA, Harvard, Vancouver, ISO, and other styles
10

Lin, Yin-Hsin (林殷旭). "Hardware-Software Co-design of an Automatic White Balance Algorithm." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/b4636z.

Full text
Abstract:
Master's thesis
National Taipei University of Technology
Institute of Computer and Communications
Academic year 94 (2005-2006)
As electronic techniques improve rapidly, cameras and video camcorders used for image capture have become digital. The colors of a photograph can look very different depending on the illumination of the light source when the picture is taken. Human eyes automatically adjust to colors as the illumination of the light source varies; however, the most frequently used image sensor, the charge-coupled device (CCD), cannot correct color the way human eyes do. This thesis presents a hardware-software co-design method based on Lam's automatic white balance algorithm, which combines the gray world assumption and the perfect reflector assumption. The execution of Lam's algorithm was divided into three stages, and a hardware-software co-design and analysis was carried out for each stage. Three factors, processing time and the slices and DSP48s of hardware resources, were used to formulate an objective function, which was employed to evaluate system performance and hardware resource cost. Experimental results show that suitable hardware-software partitions were achieved. An embedded processor, Xilinx's MicroBlaze, together with a floating-point unit was used for the software part of the algorithm, while the hardware part was implemented using an IP-based method. Such a system-on-a-programmable-chip architecture reduces the memory and CPU load on the PC and offers easy modification and function expansion.
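The gray world assumption, one half of Lam's combined method, scales each color channel so the channel means coincide. A minimal sketch (the pixel values and the clipping policy are illustrative assumptions, and the perfect-reflector half is omitted):

```python
def gray_world(pixels):
    """Gray-world white balance: scale each RGB channel so all channel
    means equal the overall gray level. `pixels` is a list of (r, g, b)."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3                       # target level for every channel
    gains = [gray / m if m else 1.0 for m in means]
    return [tuple(min(255.0, p[c] * gains[c]) for c in range(3)) for p in pixels]

# A reddish-cast toy image: the red mean is twice the green/blue means.
img = [(200, 100, 100), (100, 50, 50)]
balanced = gray_world(img)
print(balanced)  # each pixel becomes gray, since the cast was uniform
```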
APA, Harvard, Vancouver, ISO, and other styles
11

Chundi, Pavan Kumar. "Algorithm Hardware Co-Design of Neural Networks for Always-On Devices." Thesis, 2021. https://doi.org/10.7916/d8-xb06-4658.

Full text
Abstract:
Deep learning has become the algorithm of choice in many applications like face recognition, object detection, speech recognition, etc. because of superior accuracy. Large models with several parameters were developed to obtain higher accuracy, which eventually gave diminishing returns at very large training and deployment cost. Consequently, greater attention is now being paid to the efficiency of neural networks. Low power consumption is particularly important in the case of always-on applications. Some examples of these applications are the datacenters, cellular base stations, battery-powered devices like implantable devices, wearables, cell phones and UAVs. Improvement in the efficiency of these devices by reducing the power consumed will bring down the energy cost or extend the battery life or decrease the form factor of these devices, thereby improving the acceptability and adoption of the device. Neural networks are a significant component of the total workload in the case of IoT devices with smart functions and datacenters. Base stations can also employ neural networks to improve the rate of convergence in channel estimation. Efficient execution of the neural networks on always-on devices, therefore, helps in lowering the overall power dissipation. Algorithm only solutions target CPU or GPU as a platform and tend to focus on the number of computing operations. Hardware only solutions tend to focus on programmability, low voltage operation, standby power reduction and on-chip data movement. Such solutions fail to take advantage of the joint optimization of both algorithm and hardware for the target application. This thesis contributes to improving the efficiency of neural networks on always-on devices through both algorithmic and hardware interventions. It presents works of algorithm-hardware co-design which can obtain better power reduction in the case of a smart IoT device, a datacenter and a small cell base station. 
It achieves power reduction through a combination of appropriate neural network algorithm and architecture, simpler operations and a reduction in the number of off-chip memory accesses.
APA, Harvard, Vancouver, ISO, and other styles
12

Jr-Shiang Peng and 彭志祥. "Hardware and Software Co-design of Silicon Intellectual Property Module Based on Sequential Minimal Optimization algorithm for Speaker Recognition." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/72913970118404970293.

Full text
Abstract:
Master's thesis
National Cheng Kung University
Department of Electrical Engineering (Master's and Doctoral Program)
98
This thesis proposes a hardware/software co-design IP for an embedded text-independent speaker recognition system, aiming to make daily life more convenient through portable speech applications. In the hardware part, the Sequential Minimal Optimization (SMO) algorithm is adopted to accelerate SVM training for creating speaker models. In the software part, we modify our lab's previous fixed-point arithmetic design for both the Linear Prediction Cepstral Coefficients (LPCC) and the one-vs-one highest-voting analysis algorithm. Two schemes, a heuristic selection method and an efficient cache utilization method, are proposed to implement the SMO algorithm in hardware and decrease the training time. Moreover, a specific design is proposed to efficiently utilize the bus bandwidth, reducing the data transfer time between software and hardware by about 5%. Finally, our simulation/emulation results show that 90% of the training time is eliminated while the recognition accuracy reaches 92.7%.
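The pairwise multiplier update at the core of SMO can be sketched in software as follows. This is a simplified floating-point variant with a deterministic second-index heuristic; the function name and all parameter values are illustrative, not the thesis's fixed-point hardware design:

```python
import numpy as np

def smo_train(X, y, C=1.0, tol=1e-4, max_passes=20):
    """Simplified SMO: analytically optimize pairs of Lagrange multipliers."""
    n = len(y)
    alpha, b = np.zeros(n), 0.0
    K = X @ X.T                      # linear kernel, precomputed

    def f(i):
        return (alpha * y) @ K[:, i] + b

    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            Ei = f(i) - y[i]
            # KKT violation check for the first multiplier
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = (i + 1) % n       # second-choice heuristic (deterministic here)
                Ej = f(j) - y[j]
                ai, aj = alpha[i], alpha[j]
                if y[i] != y[j]:
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                if L == H:
                    continue
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if eta >= 0:
                    continue
                # Clipped analytic update of alpha[j], then alpha[i] to keep the constraint
                alpha[j] = np.clip(aj - y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj) < 1e-5:
                    continue
                alpha[i] = ai + y[i] * y[j] * (aj - alpha[j])
                b1 = b - Ei - y[i] * (alpha[i] - ai) * K[i, i] - y[j] * (alpha[j] - aj) * K[i, j]
                b2 = b - Ej - y[i] * (alpha[i] - ai) * K[i, j] - y[j] * (alpha[j] - aj) * K[j, j]
                b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b
```

The two hardware schemes mentioned in the abstract (heuristic selection and cache utilization) target exactly the inner loop above: the choice of the pair (i, j) and the reuse of kernel rows K[:, i].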
APA, Harvard, Vancouver, ISO, and other styles
13

Hsiao, Chin-Mu, and 蕭金木. "Hardware/Software Co-design of AES Algorithms Using Custom Instructions." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/35389142457501490628.

Full text
Abstract:
Master's thesis
Fu Jen Catholic University
Department of Electronic Engineering
96
The Advanced Encryption Standard (AES) is the encryption standard appointed by NIST. To shorten the encryption/decryption time for large amounts of data, it is necessary to implement the algorithm in hardware; on the other hand, the requirement for low cost can be met by using software only. How to strike a balance between the cost and efficiency of software and hardware implementations is a question worth discussing. In this thesis, we implemented the AES encryption algorithm in hardware combined with software using the custom instruction mechanism provided by the Altera Nios II platform. We completed a parameterized synthesizable design: given a parameter setting, our system generates the hardware design and the necessary software/hardware interface automatically. We explored various combinations of hardware and software for realizing the AES algorithm and discussed the best solutions for different needs.
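One bit-level primitive that custom-instruction partitions of AES commonly accelerate is the GF(2^8) arithmetic inside MixColumns. A minimal software sketch of it, assuming the standard AES reduction polynomial (values follow FIPS-197; this illustrates the operation, not the thesis's actual hardware/software split):

```python
def xtime(b):
    """Multiply a byte by x (i.e., by 0x02) in GF(2^8), AES polynomial 0x11B."""
    b <<= 1
    return (b ^ 0x1B) & 0xFF if b & 0x100 else b

def gf_mul(a, b):
    """GF(2^8) multiplication by shift-and-add, built from repeated xtime."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r
```

In a Nios II style custom instruction, an operation like `gf_mul` (or a whole MixColumns column) becomes a single-cycle datapath, while the surrounding key schedule and control flow stay in software.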
APA, Harvard, Vancouver, ISO, and other styles
14

Weng, Chih-hsien, and 翁智賢. "Hardware/Software Co-design and Implementation of Algorithmic Processors for Image Processing." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/96720386726092132758.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
96
This thesis is related to the hardware/software co-design and verification of algorithmic processors for image processing. The research work includes four parts. The first part is about software design of image processing algorithms such as center and size finding, translation, scaling, rotation, and projection. The second part is to design and implement hardware processors for the algorithms mentioned above. The third part is to write the related drivers to integrate the algorithmic processors and the verification system together. The fourth part is about the verification and performance testing of the related algorithmic processors. On the whole, the goal of this thesis is to design and develop various algorithmic processors for image processing. Meanwhile, a hardware/software co-design method is presented to improve the efficiency of both the design and verification flows.
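Of the operations listed, center and size finding is the simplest to illustrate: it reduces to a centroid and bounding-box computation over the foreground of a binary image. A hypothetical sketch (not the thesis's processor design):

```python
import numpy as np

def center_and_size(img):
    """Centroid (cy, cx) and bounding-box size (h, w) of a binary image's foreground."""
    ys, xs = np.nonzero(img)                     # coordinates of foreground pixels
    center = (ys.mean(), xs.mean())              # centroid
    size = (ys.max() - ys.min() + 1,             # bounding-box height
            xs.max() - xs.min() + 1)             # bounding-box width
    return center, size
```

A hardware version would typically accumulate the coordinate sums and min/max registers in a single raster scan rather than materializing the coordinate lists.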
APA, Harvard, Vancouver, ISO, and other styles
15

Hsu, Chih-hao, and 許志豪. "Hardware/Software Co-design and Implementation of an Algorithmic Processor for Image Binarization." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/94852945422796097338.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
97
This thesis is related to the hardware/software co-design and verification of an algorithmic processor for image binarization. The research work includes four parts. The first part is about software design of various binarization algorithms for digital images. After analyzing the advantages and disadvantages of these algorithms, the modified Sauvola algorithm is chosen for hardware implementation. The second part is to design and implement a hardware processor for the modified Sauvola algorithm. In order to enhance the data transfer performance, a 2-D DMA controller has also been designed. Finally, the algorithmic processor and the 2-D DMA controller are integrated into an SOPC-based system and implemented on an Altera FPGA development board. The third part is to write the related drivers for the algorithmic processor. Then the function of the algorithmic processor is verified using an RPC-based verification system. The fourth part is about the verification and evaluation of the run-time performance of the algorithmic processor. On the whole, the goal of this thesis is to do research on the development of a binarization algorithm for digital images. Then the related algorithmic processor is developed and implemented on the FPGA development board. After being verified using various digital images, the algorithm developed in this thesis has shown very good performance for image binarization. Meanwhile, it also shows that the hardware/software co-design method presented can improve the efficiency of both the design and verification flows.
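The abstract does not describe the modification, but the standard Sauvola rule thresholds each pixel at T = m · (1 + k · (s/R − 1)), where m and s are the local window mean and standard deviation, k is a sensitivity parameter, and R is the dynamic range of the standard deviation. A naive software sketch, with illustrative parameter values (a hardware processor would compute the window statistics incrementally, e.g. with integral images):

```python
import numpy as np

def sauvola_binarize(img, w=15, k=0.2, R=128):
    """Naive windowed Sauvola binarization: 255 = background, 0 = foreground."""
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    out = np.zeros((H, W), dtype=np.uint8)
    r = w // 2
    for y in range(H):
        for x in range(W):
            win = img[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            m, s = win.mean(), win.std()
            T = m * (1 + k * (s / R - 1))        # Sauvola threshold
            out[y, x] = 255 if img[y, x] > T else 0
    return out
```

The per-pixel window statistics make this O(H·W·w²) in software, which is exactly why dedicated hardware with streaming window buffers pays off.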
APA, Harvard, Vancouver, ISO, and other styles
16

Huang, Uao-Shine, and 黃耀陞. "Hardware/Software Co-design and Implementation of an Algorithmic Processor for Document Image Rotation." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/39920568792275923898.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
98
This thesis is related to the hardware/software co-design and verification of an algorithmic processor for binary document image rotation. The research work includes four parts: The first part is about software design of the rotation algorithm for binary document images. After analyzing the advantages and disadvantages of existing algorithms and considering the limited resources of the embedded hardware, a window-based rotation algorithm which uses inverse mapping and linear interpolation has been developed. The second part is to design and implement an algorithmic processor for the window-based rotation algorithm mentioned above. It stores full binary document images in DDR SDRAM; the processor therefore consists of a reference-region fetch unit, a rotation-interpolation unit, a destination-data store unit, and a DDR SDRAM controller. Finally, the above hardware modules are integrated into an SOPC-based system and implemented on an Altera FPGA development board. The third part is to write the related drivers for the algorithmic processor. Then the function of the algorithmic processor is verified using an RPC-based verification system. The fourth part is about the verification and evaluation of the run-time performance of the algorithmic processor. On the whole, the goal of this thesis is to do research on the development of a rotation algorithm for binary document images. Then the related algorithmic processor is developed and implemented on the FPGA development board. After being verified using various images and rotation angles, the algorithm developed in this thesis has shown very good performance for binary document image rotation. Meanwhile, it also shows that the hardware/software co-design method presented can improve the efficiency of both the design and verification flows.
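Inverse mapping with interpolation, as described, computes each destination pixel by rotating its coordinates back into the source image and interpolating the four surrounding samples. A minimal whole-image sketch of that idea (the thesis's design works window by window out of DDR SDRAM; this is only an illustration of the mapping):

```python
import numpy as np

def rotate_binary(img, deg):
    """Rotate a binary (0/255) image about its center via inverse mapping
    with bilinear interpolation, then re-threshold to keep the output binary."""
    H, W = img.shape
    out = np.zeros_like(img)
    t = np.deg2rad(deg)
    cy, cx = (H - 1) / 2, (W - 1) / 2
    for y in range(H):
        for x in range(W):
            # Inverse map: destination (x, y) -> source (xs, ys)
            xs = np.cos(t) * (x - cx) + np.sin(t) * (y - cy) + cx
            ys = -np.sin(t) * (x - cx) + np.cos(t) * (y - cy) + cy
            x0, y0 = int(np.floor(xs)), int(np.floor(ys))
            if 0 <= x0 < W - 1 and 0 <= y0 < H - 1:
                fx, fy = xs - x0, ys - y0
                # Bilinear interpolation of the four neighbours
                v = (img[y0, x0] * (1 - fx) * (1 - fy)
                     + img[y0, x0 + 1] * fx * (1 - fy)
                     + img[y0 + 1, x0] * (1 - fx) * fy
                     + img[y0 + 1, x0 + 1] * fx * fy)
                out[y, x] = 255 if v >= 128 else 0
    return out
```

Because the mapping is inverse (destination-driven), every output pixel is written exactly once, which is what makes the reference-region fetch unit's access pattern predictable.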
APA, Harvard, Vancouver, ISO, and other styles
17

Lin, Yi-hsien, and 林奕諴. "Hardware/Software Co-design and Implementation of an Algorithmic Processor for Document Skew Detection." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/36422924001553221768.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
98
This thesis is related to the hardware/software co-design and verification of an algorithmic processor for skew detection. The research work includes four parts. The first part is about software design of various skew detection algorithms for binary document images. After analyzing the advantages and disadvantages of these algorithms, the MICC-Projection algorithm is developed to improve the correctness of skew detection. The second part is to design and implement an algorithmic processor for the MICC-Projection algorithm, which consists of MICC and projection sub-processors. The processor is integrated into an SOPC-based system and implemented on an Altera FPGA development board. The third part is to write the related drivers for the algorithmic processor. Then the function of the algorithmic processor is verified using an RPC-based verification system. The fourth part is about the verification and evaluation of the run-time performance of the algorithmic processor. On the whole, the goal of this thesis is to do research on the development of a skew detection algorithm for binary document images. Then the related algorithmic processor is developed and implemented on the FPGA development board. After being verified using various binary document images, the algorithm developed in this thesis has shown very good performance for skew detection. Meanwhile, it also shows that the hardware/software co-design method presented can improve the efficiency of both the design and verification flows.
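The MICC-Projection algorithm itself is not spelled out in the abstract, but projection-based skew detection generally scores each trial angle by how sharply peaked the resulting projection profile is: at the true skew angle, text lines collapse into a few dense bins. A hedged sketch of that general idea (function names and the variance criterion are illustrative):

```python
import numpy as np

def projection_score(img, deg):
    """Variance of the horizontal projection profile taken at a trial angle."""
    ys, xs = np.nonzero(img)
    # Shear each foreground pixel's row index by the trial angle, then histogram
    bins = np.round(ys - xs * np.tan(np.deg2rad(deg))).astype(int)
    profile = np.bincount(bins - bins.min())
    return profile.var()          # peaked profile -> high variance

def detect_skew(img, angles):
    """Pick the trial angle whose projection profile is most sharply peaked."""
    return max(angles, key=lambda a: projection_score(img, a))
```

In hardware, the per-angle histograms map naturally onto a bank of accumulators fed by one raster scan of the image.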
APA, Harvard, Vancouver, ISO, and other styles
18

Huang, Yin-hsiu, and 黃寅修. "Hardware/Software Co-design and Implementation of Algorithmic Processors for Boundary and Corner Detection." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/41126442183628616181.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
96
This thesis is related to the hardware/software co-design and verification of algorithmic processors for digital image processing. The research work includes three parts. The first part is about using a Linux personal computer system to design and verify the software for the boundary and corner detection algorithms. Here, boundary detection means marking the boundary points in a binary digital image, and corner detection means separating boundary points into several classes of features (i.e., concave, convex, and straight-line points) using operations such as path finding, computing the cosine value of a corner, and corner classification. The second part is about the design of the hardware and the software/hardware interface for the boundary and corner detection algorithmic processors. In this work, the processor hardware is implemented on an Altera FPGA development board, and the software/hardware interface is designed according to the Nios II CPU bus standard. The third part is to use a well-developed RPC-based embedded system for the verification and performance testing of the related algorithmic processors. On the whole, the goal of this thesis is to design and develop prototypes for the boundary and corner detection algorithmic processors. Meanwhile, a hardware/software co-design method is presented to improve the efficiency of both the design and verification flows.
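The two core operations described above can be illustrated as follows: a boundary test (a foreground pixel with at least one background 4-neighbour) and the cosine of the angle a boundary point forms with boundary points a few steps away along the path. This is a sketch of those definitions, not the processors' implementation:

```python
import numpy as np

def is_boundary(img, y, x):
    """A foreground pixel is a boundary point if any 4-neighbour is background
    (pixels on the image border count as having background neighbours)."""
    H, W = img.shape
    if not img[y, x]:
        return False
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ny, nx = y + dy, x + dx
        if ny < 0 or ny >= H or nx < 0 or nx >= W or not img[ny, nx]:
            return True
    return False

def corner_cosine(p_prev, p, p_next):
    """Cosine of the angle at p formed by the vectors to p_prev and p_next.
    Near -1: straight-line point; near 0 or above: sharp corner."""
    a = np.asarray(p_prev, float) - np.asarray(p, float)
    b = np.asarray(p_next, float) - np.asarray(p, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Classifying a point as concave or convex additionally needs the sign of the turn (e.g. a cross product), which the cosine alone does not give.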
APA, Harvard, Vancouver, ISO, and other styles
19

Huang, Jiang-Shiuan, and 黃健軒. "Hardware/Software Co-design and Implementation of a Two-stage Algorithmic Processor for Hough-Transform-based Line Detection." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/chw2e4.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
99
This thesis is related to the hardware/software co-design and verification of an algorithmic processor for an HT-based (Hough-Transform-based) two-stage line detection algorithm. The related research work includes four parts: The first part is about software design of the HT-based line detection algorithm for binary images. After analyzing the properties of the HT-based algorithm and considering the limited hardware resources in the embedded system, a two-stage HT-based algorithm for line detection has been developed. The second part is to design and implement a two-stage algorithmic processor for HT-based line detection. SDRAM is used to store the whole binary images. The processor therefore consists of a source data fetching sub-processor, a Hough transform sub-processor, and a local max finding sub-processor. Finally, the above hardware modules are integrated into an SOPC-based system and implemented on an Altera FPGA development board. The third part is to write the related drivers for the algorithmic processor. Then the function of the algorithmic processor is verified using an RPC-based verification system. The fourth part is about the verification and the evaluation of the run-time performance of the algorithmic processor. On the whole, the goal of this thesis is to do research on the development of an HT-based two-stage line detection algorithm and its hardware processor. Then the related algorithmic processor is developed and implemented on the FPGA development board. After being verified using various images, the algorithm developed in this thesis has shown very good performance. Meanwhile, it also shows that the hardware/software co-design method presented can improve the efficiency of both the design and verification flows.
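The voting stage handled by the Hough transform sub-processor can be sketched in software: every foreground pixel votes, for each trial angle θ, into a (ρ, θ) accumulator where ρ = x·cos θ + y·sin θ. Accumulator layout and resolution here are illustrative, not the processor's:

```python
import numpy as np

def hough_accumulate(img, n_theta=180):
    """Vote each foreground pixel of a binary image into a (rho, theta) accumulator.
    Rows index rho (offset by `diag` so negative rho fits); columns index theta in degrees."""
    H, W = img.shape
    diag = int(np.ceil(np.hypot(H, W)))           # bound on |rho|
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * diag, n_theta), dtype=np.int32)
    for y, x in zip(*np.nonzero(img)):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(n_theta)] += 1  # one vote per trial angle
    return acc, diag
```

The "two-stage" structure in the abstract plausibly refers to splitting this voting pass from the subsequent local-maximum search over the accumulator, which is what the local max finding sub-processor performs.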
APA, Harvard, Vancouver, ISO, and other styles
20

Hu, Hong-Min, and 胡閎閔. "Hardware/Software Co-design and Implementation of a Temporal-Median-Filter-based Algorithmic Processing System for Background Subtraction." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/26408138785003942689.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
103
This thesis is relevant to the hardware/software co-design and implementation of a temporal-median-filter-based algorithmic processing system for background subtraction. The research work consists of the following four parts. The first part is related to the software design of the temporal-median-filter-based background subtraction algorithm. Through its image-based output results, this algorithm has demonstrated its superiority in various applications. The second part is to design and implement a temporal-median-filter-based algorithmic processor for background subtraction. This algorithmic processor comprises three sub-processors, which handle image information access, median finding, and background subtraction. Finally, all the parts mentioned above are integrated together and implemented on an Altera FPGA development board. The third part is related to the design and implementation of an algorithmic processing system which comprises SDRAM (for storing multiple complete images), the algorithmic processor described above, the Nios II CPU, and the related firmware. The functionality of this system is verified using the Nios II IDE. The fourth part is to analyze and evaluate the software, firmware, and hardware performance of the whole algorithmic processing system. On the whole, the goals of this thesis are to do research on a temporal-median-filter-based background subtraction algorithm and to design an algorithmic processing system for it on an Altera FPGA development board. After being verified with various kinds of digital images, the algorithmic processing system developed in this thesis has shown excellent computing performance, and the related hardware/software co-design method can also be used to improve the efficiency of the design and verification process for other algorithmic processing systems.
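Temporal-median background subtraction, as described, builds the background as a per-pixel median over a buffer of recent frames and thresholds the absolute difference against the current frame. A minimal sketch, with illustrative buffer length and threshold:

```python
import numpy as np

def subtract_background(frames, current, thresh=30):
    """Per-pixel temporal median over a frame buffer, then absolute-difference
    thresholding. Returns a 0/255 foreground mask."""
    bg = np.median(np.stack(frames).astype(np.int16), axis=0)   # robust to outlier frames
    mask = np.abs(current.astype(np.int16) - bg) > thresh
    return mask.astype(np.uint8) * 255
```

The median makes the background estimate robust to transient foreground objects, which is why the hardware spends a dedicated median-finding sub-processor on it.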
APA, Harvard, Vancouver, ISO, and other styles
21

Hsu, Bo-Hsiang, and 許博翔. "Hardware/Software Co-design and Implementation of a Multi-pixel-based Pipelined Algorithmic Processor for Single-pass-based Connected Component Labeling." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/uhe6rj.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
100
This thesis is relevant to the hardware/software co-design and verification of an algorithmic processor for single-pass-based connected component labeling. The research work consists of the following four parts. The first part focuses on the software design of the connected component labeling algorithms. After analyzing the characteristics of the computing results and considering the limited physical resources of embedded systems, single-pass-based connected component labeling algorithms have been developed. The second part focuses on the hardware design for the single-pass-based connected component labeling algorithms. A DDR SDRAM is used to store the whole binary input image and the coordinate information of the bounding boxes of the labeled components. The algorithmic processor comprises four sub-processors: a table initializer, a labeler, a connected component combinator, and a connected component information retriever. Finally, these hardware designs are integrated together and implemented on an Altera FPGA development board. The third part focuses on writing the relevant drivers to construct a verification system for the algorithmic processor. Remote procedure calls are used to control this system and verify the functionality of the processor. The fourth part focuses on the verification and performance evaluation of the whole hardware and software of the algorithmic processor. Overall, the goal of this thesis is to do research on single-pass-based connected component labeling algorithms and to design and implement algorithmic processors for them on an Altera FPGA development board. After verifying the algorithmic processors with various types of digital images, it has been shown that the algorithmic processors developed in this thesis have excellent computing performance. Meanwhile, this approach to hardware/software co-design can also improve the efficiency of both the design and verification flows for algorithmic processors.
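Single-pass labeling with a merge table and per-component bounding boxes, as described, can be sketched as follows (4-connectivity; a software illustration of the labeler/combinator roles, not the sub-processor design):

```python
import numpy as np

def label_components(img):
    """Single raster pass: assign provisional labels, merge conflicts with a
    union-find table, and maintain per-component bounding boxes
    [min_y, min_x, max_y, max_x] that are merged on union."""
    H, W = img.shape
    labels = np.zeros((H, W), dtype=np.int32)
    parent, bbox = {}, {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]       # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
            b1, b2 = bbox[ra], bbox[rb]
            bbox[ra] = [min(b1[0], b2[0]), min(b1[1], b2[1]),
                        max(b1[2], b2[2]), max(b1[3], b2[3])]

    nxt = 1
    for y in range(H):
        for x in range(W):
            if not img[y, x]:
                continue
            left = labels[y, x - 1] if x > 0 else 0
            up = labels[y - 1, x] if y > 0 else 0
            if not left and not up:             # new provisional label
                parent[nxt], bbox[nxt] = nxt, [y, x, y, x]
                labels[y, x] = nxt
                nxt += 1
            else:
                if left and up:                 # two labels meet: merge them
                    union(left, up)
                labels[y, x] = find(left or up)
            r = find(labels[y, x])              # grow the component's bounding box
            bb = bbox[r]
            bbox[r] = [min(bb[0], y), min(bb[1], x), max(bb[2], y), max(bb[3], x)]
    roots = sorted({find(l) for l in parent})
    return labels, [bbox[r] for r in roots]
```

Because bounding boxes are merged at union time, the final boxes fall out of the single pass with no second relabeling sweep over the image, which is the key to the single-pass hardware structure.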
APA, Harvard, Vancouver, ISO, and other styles