Academic literature on the topic 'Hardware/algorithm co-design'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Hardware/algorithm co-design.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Hardware/algorithm co-design"

1

Chen, Andrew, Rohaan Gupta, Anton Borzenko, Kevin Wang, and Morteza Biglari-Abhari. "Accelerating SuperBE with Hardware/Software Co-Design." Journal of Imaging 4, no. 10 (October 18, 2018): 122. http://dx.doi.org/10.3390/jimaging4100122.

Abstract:
Background Estimation is a common computer vision task, used for segmenting moving objects in video streams. This can be useful as a pre-processing step, isolating regions of interest for more complicated algorithms performing detection, recognition, and identification tasks, in order to reduce overall computation time. This is especially important in the context of embedded systems like smart cameras, which may need to process images with constrained computational resources. This work focuses on accelerating SuperBE, a superpixel-based background estimation algorithm that was designed for simplicity and reducing computational complexity while maintaining state-of-the-art levels of accuracy. We explore both software and hardware acceleration opportunities, converting the original algorithm into a greyscale, integer-only version, and using Hardware/Software Co-design to develop hardware acceleration components on FPGA fabric that assist a software processor. We achieved a 4.4× speed improvement with the software optimisations alone, and a 2× speed improvement with the hardware optimisations alone. When combined, these led to a 9× speed improvement on a Cyclone V System-on-Chip, delivering almost 38 fps on 320 × 240 resolution images.
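A quick sanity check on the reported figures: 4.4 × 2 = 8.8, so the combined 9× speedup is consistent with the software and hardware optimisations composing roughly multiplicatively.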
2

Krawczyk, Kamil, Paweł Tomaszewicz, and Mariusz Rawski. "Whirlpool SoPC Implementation - Hardware/Software Co-Design Example." International Journal of Electronics and Telecommunications 58, no. 1 (March 1, 2012): 21–26. http://dx.doi.org/10.2478/v10177-012-0003-9.

Abstract:
The aim of this work was to design a System on Programmable Chip (SoPC) that implements the Whirlpool Hash Function (WHF) algorithm. The project assumed the use of an embedded NIOS II soft processor controlling the whole system, whose functionality was extended by custom logic in order to improve the efficiency of the algorithm. This paper presents the Whirlpool Hash Function realized in several SoPC configurations, which differ in implementation complexity and performance.
3

López, M., J. Daugman, and E. Cantó. "Hardware–software co-design of an iris recognition algorithm." IET Information Security 5, no. 1 (2011): 60. http://dx.doi.org/10.1049/iet-ifs.2009.0267.

4

Li, Shih-An, Chen-Chien Hsu, Ching-Chang Wong, and Chia-Jun Yu. "Hardware/software co-design for particle swarm optimization algorithm." Information Sciences 181, no. 20 (October 2011): 4582–96. http://dx.doi.org/10.1016/j.ins.2010.07.017.

5

Alecsa, Bogdan, and Alexandru Onea. "Hardware-Software Co-Design for BLDC Motor Speed Controller Design." Advanced Materials Research 463-464 (February 2012): 1256–59. http://dx.doi.org/10.4028/www.scientific.net/amr.463-464.1256.

Abstract:
This paper proposes a combined hardware-software approach to controller design, studying the case of a brushless DC (BLDC) motor speed controller. A hardware controller is implemented inside a field programmable gate array (FPGA) device, together with soft-core processors that implement non-critical tasks in software, such as the liquid crystal display (LCD) interface and serial data communication to a host computer. This way, the control algorithm is executed in hardware, as fast as possible, while the monitoring tasks are performed in software. Experimental results are provided, demonstrating the working design.
6

Zhang, Xinyi, Yawen Wu, Peipei Zhou, Xulong Tang, and Jingtong Hu. "Algorithm-hardware Co-design of Attention Mechanism on FPGA Devices." ACM Transactions on Embedded Computing Systems 20, no. 5s (October 31, 2021): 1–24. http://dx.doi.org/10.1145/3477002.

Abstract:
Multi-head self-attention (the attention mechanism) has been employed in a variety of fields such as machine translation, language modeling, and image processing due to its superiority in feature extraction and sequential data analysis. This benefit comes from the large number of parameters and the sophisticated model architecture behind the attention mechanism. To deploy the attention mechanism efficiently on resource-constrained devices, existing works propose to reduce the model size either by building a customized smaller model or by compressing a large standard model. A customized smaller model is usually optimized for a specific task and requires effort in exploring model parameters. Model compression reduces model size without hurting the robustness of the model architecture and can be applied efficiently to different tasks. The compressed weights in the model are usually regularly shaped (e.g., rectangular), but their dimension sizes vary (e.g., in rectangle height and width). Such a compressed attention mechanism can be deployed efficiently on CPU/GPU platforms, as their memory and computing resources can be assigned flexibly on demand. However, for Field Programmable Gate Arrays (FPGAs), the data buffer allocation and computing kernel are fixed at run time to achieve maximum energy efficiency. After compression, the weights are much smaller and differ in size, which leads to inefficient utilization of the FPGA on-chip buffer. Moreover, the differing weight heights and widths may lead to inefficient FPGA computing kernel execution. Due to the large number of weights in the attention mechanism, building a unique buffer and computing kernel for each compressed weight on an FPGA is not feasible. In this work, we jointly consider the impact of compression on buffer allocation and on the required computing kernel while compressing the attention mechanism. A novel structural pruning method with memory footprint awareness is proposed, and the associated FPGA accelerator is designed. The experimental results show that our work can compress a Transformer (an attention-based model) by 95x. The developed accelerator can fully utilize the FPGA resources, processing the sparse attention mechanism with a run-time throughput of 1.87 TOPS on a ZCU102 FPGA.
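To illustrate the general idea of structural (block-wise) pruning that this abstract builds on, the sketch below zeroes out whole blocks of a weight matrix and keeps only the highest-magnitude ones. It is a minimal Python illustration with a hypothetical block size and keep ratio, not the memory-footprint-aware criterion or FPGA mapping proposed in the paper.

    import numpy as np

    def block_prune(weight, block=16, keep_ratio=0.05):
        # Keep only the blocks with the largest L1 norm; zero the rest.
        # Illustrative only: the paper's criterion also accounts for FPGA
        # on-chip buffer sizes, not just block magnitude.
        h, w = weight.shape
        assert h % block == 0 and w % block == 0
        blocks = weight.reshape(h // block, block, w // block, block)
        scores = np.abs(blocks).sum(axis=(1, 3))           # L1 norm per block
        k = max(1, int(keep_ratio * scores.size))
        threshold = np.partition(scores.ravel(), -k)[-k]   # k-th largest score
        mask = (scores >= threshold)[:, None, :, None]
        return (blocks * mask).reshape(h, w)

    pruned = block_prune(np.random.randn(64, 64))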
7

Ismael, Sarmad, Omar Tareq, and Yahya Taher Qassim. "Hardware/software co-design for a parallel three-dimensional bresenham’s algorithm." International Journal of Electrical and Computer Engineering (IJECE) 9, no. 1 (February 1, 2019): 148. http://dx.doi.org/10.11591/ijece.v9i1.pp148-156.

Abstract:
Line plotting is one of the basic operations in scan conversion. Bresenham's line drawing algorithm is an efficient and highly popular algorithm used for this purpose. The algorithm proceeds from one end-point of the line to the other, calculating one point at each step. As a result, the calculation time for all the points depends on the length of the line and thereby on the total number of points. In this paper, we develop an approach to speed up the Bresenham algorithm by partitioning each line into a number of segments, finding the points belonging to those segments, and drawing them simultaneously to form the main line. Consequently, the more segments generated, the faster the points are calculated. By employing 32 cores in the Field Programmable Gate Array, a line of 992 points is formed in only 0.31 μs. The complete system is implemented on a Zybo board containing the Xilinx Zynq-7000 chip (Z-7010).
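The abstract describes splitting each line into segments that are rasterized in parallel. Purely as an illustration of that partitioning idea (a software sketch, not the authors' 32-core FPGA design), one might compute the segment endpoints from the ideal line and run an ordinary x-major Bresenham on each segment:

    def bresenham(x0, y0, x1, y1):
        # Classic integer Bresenham, x-major octant (x0 <= x1, |dy| <= dx).
        dx, dy = x1 - x0, abs(y1 - y0)
        step = 1 if y1 >= y0 else -1
        err, y, pts = 2 * dy - dx, y0, []
        for x in range(x0, x1 + 1):
            pts.append((x, y))
            if err > 0:
                y += step
                err -= 2 * dx
            err += 2 * dy
        return pts

    def parallel_bresenham(x0, y0, x1, y1, cores=4):
        # Each segment could be drawn by an independent hardware core.
        # Assumes x1 > x0 and an x-major line; segment endpoints are taken
        # from the ideal line so the pieces join. A real design would also
        # seed each core's error term exactly.
        pts = []
        for k in range(cores):
            xa = x0 + (x1 - x0) * k // cores
            xb = x0 + (x1 - x0) * (k + 1) // cores
            ya = y0 + round((y1 - y0) * (xa - x0) / (x1 - x0))
            yb = y0 + round((y1 - y0) * (xb - x0) / (x1 - x0))
            seg = bresenham(xa, ya, xb, yb)
            pts += seg if k == cores - 1 else seg[:-1]
        return pts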
8

Grout, Ian Andrew, and Lenore Mullin. "Realizing Mathematics of Arrays Operations as Custom Architecture Hardware-Software Co-Design Solutions." Information 13, no. 11 (November 4, 2022): 528. http://dx.doi.org/10.3390/info13110528.

Abstract:
In embedded electronic system applications being developed today, complex datasets are required to be obtained, processed, and communicated. These can be from various sources such as environmental sensors, still image cameras, and video cameras. Once obtained and stored in electronic memory, the data is accessed and processed using suitable mathematical algorithms. How the data are stored, accessed, processed, and communicated will impact on the cost to process the data. Such algorithms are traditionally implemented in software programs that run on a suitable processor. However, different approaches can be considered to create the digital system architecture that would consist of the memory, processing, and communications operations. When considering the mathematics at the centre of the design making processes, this leads to system architectures that can be optimized for the required algorithm or algorithms to realize. Mathematics of Arrays (MoA) is a class of operations that supports n-dimensional array computations using array shapes and indexing of values held within the array. In this article, the concept of MoA is considered for realization in software and hardware using Field Programmable Gate Array (FPGA) and Application Specific Integrated Circuit (ASIC) technologies. The realization of MoA algorithms will be developed along with the design choices that would be required to map a MoA algorithm to hardware, software or hardware-software co-designs.
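As a minimal illustration of the shape-and-index arithmetic that MoA formalizes (not the authors' notation or their FPGA/ASIC mapping), the sketch below computes the row-major offset of an element from the array shape; this is exactly the kind of address calculation a hardware realization can wire up directly instead of computing in software:

    def flat_index(shape, idx):
        # Row-major offset of multi-index `idx` in an array of the given shape.
        offset = 0
        for dim, i in zip(shape, idx):
            assert 0 <= i < dim
            offset = offset * dim + i
        return offset

    # Element (1, 2, 3) of a 4 x 5 x 6 array: 1*(5*6) + 2*6 + 3 = 45.
    assert flat_index((4, 5, 6), (1, 2, 3)) == 45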
9

Raghunathan, Shriram, Sumeet K. Gupta, Himanshu S. Markandeya, Kaushik Roy, and Pedro P. Irazoqui. "A hardware-algorithm co-design approach to optimize seizure detection algorithms for implantable applications." Journal of Neuroscience Methods 193, no. 1 (October 2010): 106–17. http://dx.doi.org/10.1016/j.jneumeth.2010.08.008.

10

Drumond, Mario, Alexandros Daglis, Nooshin Mirzadeh, Dmitrii Ustiugov, Javier Picorel, Babak Falsafi, Boris Grot, and Dionisios Pnevmatikatos. "Algorithm/Architecture Co-Design for Near-Memory Processing." ACM SIGOPS Operating Systems Review 52, no. 1 (August 28, 2018): 109–22. http://dx.doi.org/10.1145/3273982.3273992.


Dissertations / Theses on the topic "Hardware/algorithm co-design"

1

Zhang, Zhengdong. "Efficient computing for autonomous navigation using algorithm-and-hardware co-design." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122691.

Abstract:
Autonomous navigation algorithms are the backbone of many robotic systems, such as self-driving cars and drones. However, state-of-the-art autonomous navigation algorithms are computationally expensive, requiring powerful CPUs and GPUs to enable them to run in real time. As a result, it is prohibitive to deploy them on miniature robots with limited computational resources onboard. To tackle this challenge, this thesis presents an algorithm-and-hardware co-design approach to design energy-efficient algorithms that are optimized for dedicated hardware architectures at the same time. It covers the design for three essential modules of an autonomous navigation system: perception, localization, and exploration.
Compared with previous research that considers either algorithmic improvements or hardware architecture optimizations, our approach leads to algorithms that not only have lower time and space complexity but also map efficiently to specialized hardware architectures, resulting in significantly improved energy efficiency and throughput. First, this thesis studies how to design an energy-efficient visual perception system using the deformable part models (DPM) based object detection algorithm. It describes an algorithm that enforces sparsity in the data stored on a chip, which reduces the memory requirement by 34% and lowers the cost of the classification by 43%. Together with other hardware optimizations, this technique leads to an object detection chip that runs at 30 fps on 1920 x 1080 videos while consuming only 58.6mW of power.
Second, this thesis describes a systematic way to explore algorithm-hardware design choices to build a low-power chip that performs visual inertial odometry (VIO) to localize a vehicle. Each of the components in a VIO pipeline has multiple algorithmic choices with different time and space complexity. However, some algorithms of lower time complexity can be more expensive when implemented on-chip. This thesis examines each of the design choices from both the algorithm's and the hardware's points of view and presents a design that consumes 24mW of power while running at up to 90 fps and achieving near state-of-the-art localization accuracy. Third, this thesis presents an efficient information-theoretic mapping system for exploration. It features a novel algorithm called Fast computation of Shannon Mutual Information (FSMI) that computes the Shannon mutual information (MI) between prospective range measurements and the environment.
FSMI algorithm features an analytic solution that avoids the expensive numerical integration required by the previous state-of-the-art algorithms, enabling FSMI to run three orders-of-magnitude faster in practice. We also present an extension of the FSMI algorithm to 3D mapping; the algorithm leverages the compression of a large 3D map using run-length encoding (RLE) and achieves 8x acceleration in a real-world exploration task. In addition, this thesis presents a hardware architecture designed for the FSMI algorithm. The design consists of a novel memory banking method that increases the memory bandwidth so that multiple FSMI cores can run in parallel while maintaining high utilization. A novel arbiter is proposed to resolve the memory read conflicts between multiple cores within one clock cycle. The final design on an FPGA achieves more than 100x higher throughput compared with a CPU while consuming less than 1/10 of the power.
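As an aside on the run-length encoding step mentioned for the 3D extension, the sketch below (an illustration only, not the thesis implementation) shows how a ray of occupancy cells collapses into runs, so that any per-cell quantity that is constant over a run needs to be evaluated only once per run:

    def rle_encode(cells):
        # Run-length encode a sequence of occupancy values (e.g. 0 = free, 1 = occupied).
        runs = []
        for v in cells:
            if runs and runs[-1][0] == v:
                runs[-1][1] += 1
            else:
                runs.append([v, 1])
        return [tuple(r) for r in runs]

    assert rle_encode([0, 0, 0, 1, 1, 0]) == [(0, 3), (1, 2), (0, 1)]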
2

Tzou, Nicholas. "Low-cost sub-Nyquist sampling hardware and algorithm co-design for wideband and high-speed signal characterization and measurement." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/51876.

Abstract:
Cost reduction has been and will continue to be a primary driving force in the evolution of hardware design and associated technologies. The objective of this research is to design low-cost signal acquisition systems for characterizing wideband and high-speed signals. As the bandwidth and the speed of such signals increase, the cost of testing also increases significantly; therefore, innovative hardware and algorithm co-design is needed to relieve this problem. In Chapter 2, a low-cost multi-rate system is proposed for characterizing the spectra of wideband signals. The design is low-cost in the sense of the actual component cost, the system complexity, and the effort required for calibration. The associated algorithms are designed such that the hardware can be implemented with low complexity yet be robust enough to deal with various hardware variations. A hardware prototype is built not only to verify the proposed hardware scheme and algorithms but also to serve as a concrete example showing that characterizing signals at sub-Nyquist sampling rates is feasible. Chapter 3 introduces a low-cost time-domain waveform reconstruction technique, which requires no mutual synchronization mechanisms. This brings down cost significantly and enables the implementation of systems capable of capturing tens-of-gigahertz (GHz) signals at significantly lower cost than high-end oscilloscopes on the market today. For the first time, band-interleaving and incoherent undersampling techniques are combined to form a low-cost solution for waveform reconstruction. This is enabled by co-designing the hardware and the back-end signal processing algorithms to compensate for the lack of coherent Nyquist-rate sampling hardware. A hardware prototype was built to support this work. Chapter 4 describes a novel test methodology that significantly reduces the time required for crosstalk jitter characterization in parallel channels. This is done by using bit patterns with coprime periods as channel stimuli and using signal processing algorithms to separate multiple crosstalk coupling effects. The proposed test methodology can be applied seamlessly in conjunction with the current test methodology without re-designing the test setup. More importantly, the conclusion derived from the mathematical analysis shows that only such test stimuli give unbiased characterization results, which are critical in all high-precision test setups. Hardware measurement results and analysis are provided to support this methodology. This thesis starts with an overview of the background and a literature review. The three major works mentioned previously are addressed in three separate chapters. Each chapter documents the hardware designs, signal processing algorithms, and associated mathematical analyses. For the purpose of verification, the hardware measurement setups and results are discussed at the end of these three chapters. The last chapter presents conclusions and future directions for the work in this thesis.
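One piece of the crosstalk methodology in Chapter 4 can be illustrated with elementary number theory: two repeating bit patterns whose periods are coprime only return to the same relative alignment after the product of their periods, so every relative alignment of aggressor and victim is exercised exactly once. A small sketch of that observation (not the thesis' measurement setup):

    from math import gcd

    def realign_period(p, q):
        # Number of bits after which two repeating patterns of periods p and q
        # return to the same relative alignment: lcm(p, q).
        return p * q // gcd(p, q)

    assert realign_period(7, 8) == 56   # coprime: all 56 alignments are visited
    assert realign_period(6, 8) == 24   # not coprime: some alignments never occur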
3

Narasimhan, Seetharam. "Ultralow-Power and Robust Implantable Neural Interfaces: An Algorithm-Architecture-Circuit Co-Design Approach." Case Western Reserve University School of Graduate Studies / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=case1333743306.

4

Trindade, Alessandro Bezerra. "Aplicando verificação de modelos baseada nas teorias do módulo da satisfabilidade para o particionamento de hardware/software em sistemas embarcados." Universidade Federal do Amazonas, 2015. http://tede.ufam.edu.br/handle/tede/4091.

Abstract:
When performing hardware/software co-design for embedded systems, the problem emerges of properly deciding which functions of the system should be implemented in hardware (HW) and which in software (SW). This problem is known as HW/SW partitioning, and in the last ten years a significant research effort has been carried out in this area. In this work, we present two new approaches to solving the HW/SW partitioning problem using SMT-based verification techniques, and we compare the results against the traditional technique of Integer Linear Programming (ILP) and a modern optimization method based on a Genetic Algorithm (GA). The goal is to show with experimental results that model checking techniques can be effective, in particular cases, at finding the optimal solution of the HW/SW partitioning problem using a state-of-the-art model checker based on Satisfiability Modulo Theories (SMT) solvers, when compared to the traditional techniques.
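Stripped of the SMT and ILP machinery, the underlying decision problem in this abstract can be stated compactly. The sketch below is only an illustration with made-up costs, solved by brute force rather than by a solver: choose hardware or software for each function so that total execution time is minimized subject to an area budget.

    from itertools import product

    # Hypothetical per-function costs: (software time, hardware time, hardware area)
    funcs = [(9, 2, 5), (4, 1, 3), (7, 3, 6), (2, 1, 2)]
    AREA_BUDGET = 10

    best = None
    for assign in product((0, 1), repeat=len(funcs)):          # 0 = SW, 1 = HW
        time = sum(hw_t if a else sw_t for a, (sw_t, hw_t, _) in zip(assign, funcs))
        area = sum(hw_a for a, (_, _, hw_a) in zip(assign, funcs) if a)
        if area <= AREA_BUDGET and (best is None or time < best[0]):
            best = (time, assign)

    print(best)   # minimal total time and the corresponding HW/SW assignment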
5

Bahri, Imen. "Contribution des systèmes sur puce basés sur FPGA pour les applications embarquées d’entraînement électrique." Thesis, Cergy-Pontoise, 2011. http://www.theses.fr/2011CERG0529/document.

Abstract:
Designing embedded control systems is becoming increasingly complex due to growing algorithm complexity, rising industrial requirements, and the nature of the application domains. One way to handle this complexity is to design the corresponding controllers on powerful, open digital platforms. More specifically, this PhD deals with the use of FPGA System-on-Chip (SoC) platforms for the implementation of complex AC drive controllers for avionic applications. The latter are characterized by stringent technical issues such as environmental conditions (pressure, high temperature) and high performance requirements (high integration, flexibility, and efficiency). During this thesis, the author contributed to designing and testing a digital controller for a high-temperature synchronous drive that must operate at 200 °C ambient. It consists of a Flux Oriented Controller (FOC) for a Permanent Magnet Synchronous Machine (PMSM) associated with a resolver sensor. A design and validation method was proposed and tested using an FPGA ProAsicPlus board from Actel-Microsemi. The impact of temperature on the operating frequency was also analyzed. A state of the art of FPGA SoC technology is also presented, along with a detailed description of recent digital platforms and the constraints linked to embedded applications; the interest of a SoC-based approach for AC drive applications is thus established. Additionally, to take full advantage of a SoC-based approach, an appropriate HW-SW co-design methodology for electrical AC drives is proposed. This method covers all development steps of the control application, from the specifications to the final experimental validation. One of the most important steps of this method is HW-SW partitioning. The goal is to find an optimal combination of modules to be implemented in software and modules to be implemented in hardware. This multi-objective optimization problem was solved with the Non-Dominated Sorting Genetic Algorithm (NSGA-II), from which the Pareto front of optimal solutions can be deduced. The proposed co-design methodology is illustrated on a sensorless speed controller using the Extended Kalman Filter (EKF); the choice of this benchmark corresponds to a major trend in embedded control of AC drives. In addition, the SoC-based architecture of the embedded controller is managed using an efficient Real-Time Operating System (RTOS). To accelerate the services of this operating system, a Real-Time Unit (RTU) was developed in VHDL and associated with the RTOS; it is a hardware operating system that moves the scheduling and communication processes from the software RTOS into hardware, achieving a significant acceleration. Experimental tests based on a digital current controller were also carried out using a laboratory set-up. The obtained results demonstrate the interest of the proposed approach.
6

Zhang, Yuanzhi. "Algorithms and Hardware Co-Design of HEVC Intra Encoders." OpenSIUC, 2019. https://opensiuc.lib.siu.edu/dissertations/1769.

Abstract:
Digital video has become extremely important nowadays, and its importance has greatly increased over the last two decades. Due to the rapid development of information and communication technologies, the demand for Ultra-High Definition (UHD) video applications is becoming stronger. However, the most prevalent video compression standard, H.264/AVC, released in 2003, is inefficient when it comes to UHD videos. The increasing desire for compression efficiency superior to H.264/AVC led to the standardization of High Efficiency Video Coding (HEVC). Compared with the H.264/AVC standard, HEVC offers a doubled compression ratio at the same level of video quality, or a substantial improvement of video quality at the same bitrate. Yet, while HEVC/H.265 possesses superior compression efficiency, its complexity is several times that of H.264/AVC, impeding its high-throughput implementation. Currently, most researchers have focused merely on algorithm-level adaptations of the HEVC/H.265 standard to reduce computational intensity without considering hardware feasibility. Moreover, the exploration of efficient hardware architecture design is not exhaustive: only a few research works have explored efficient hardware architectures for the HEVC/H.265 standard. In this dissertation, we investigate efficient algorithm adaptations and hardware architecture design for HEVC intra encoders. We also explore a deep learning approach to mode prediction. From the algorithm point of view, we propose three efficient hardware-oriented algorithm adaptations, including mode reduction, fast coding unit (CU) cost estimation, and group-based CABAC (context-adaptive binary arithmetic coding) rate estimation. Mode reduction aims to reduce the mode candidates of each prediction unit (PU) in the rate-distortion optimization (RDO) process, which is both computation-intensive and time-consuming. Fast CU cost estimation is applied to reduce the complexity of the rate-distortion (RD) calculation for each CU. Group-based CABAC rate estimation is proposed to parallelize syntax element processing and greatly improve rate estimation throughput. From the hardware design perspective, a fully parallel hardware architecture of an HEVC intra encoder is developed to sustain UHD video compression at 4K@30fps. The fully parallel architecture introduces four prediction engines (PEs), and each PE independently performs the full cycle of mode prediction, transform, quantization, inverse quantization, inverse transform, reconstruction, and rate-distortion estimation. PU blocks of different sizes are processed by the different prediction engines simultaneously. Also, an efficient hardware implementation of a group-based CABAC rate estimator is incorporated into the proposed HEVC intra encoder for accurate and high-throughput rate estimation. To take advantage of the deep learning approach, we also propose a fully connected layer based neural network (FCLNN) mode preselection scheme to reduce the number of RDO modes for luma prediction blocks. All angular prediction modes are classified into 7 prediction groups, each containing 3-5 prediction modes that exhibit a similar prediction angle. A rough angle detection algorithm is designed to determine the prediction direction of the current block, and then a small-scale FCLNN is exploited to refine the mode prediction.
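For readers unfamiliar with the rate-distortion optimization (RDO) loop that the mode-reduction and rate-estimation techniques above accelerate, a bare-bones sketch of mode selection by Lagrangian cost is shown below. It is illustrative only: the distortion, rate, and lambda values are placeholders, and a real encoder evaluates far more candidates than this.

    def rd_cost(distortion, bits, lam):
        # Lagrangian rate-distortion cost J = D + lambda * R.
        return distortion + lam * bits

    def pick_mode(candidates, lam=10.0):
        # candidates: list of (mode, distortion, estimated_bits); keep the cheapest.
        return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))[0]

    # Hypothetical candidate modes after preselection.
    assert pick_mode([("planar", 900, 40), ("dc", 950, 30), ("angular_26", 600, 45)]) == "angular_26"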
7

Marques, Vítor Manuel dos Santos. "Performance of hardware and software sorting algorithms implemented in a SOC." Master's thesis, Universidade de Aveiro, 2017. http://hdl.handle.net/10773/23467.

Abstract:
Master's degree in Computer and Telematics Engineering.
Field Programmable Gate Arrays (FPGAs) were invented by Xilinx in 1985. Their reconfigurable nature allows them to be used in multiple areas of information technology. This project studies this technology as an alternative to traditional data processing methods, namely sorting. The proposed solution is based on the principle of reusing resources to counter the technology's known resource limitations.
8

Jiang, Zhewei. "Algorithm and Hardware Co-Design for Local/Edge Computing." Thesis, 2020. https://doi.org/10.7916/d8-nxwg-f771.

Abstract:
Advances in VLSI manufacturing and design technology over the decades have created many computing paradigms for disparate computing needs. Given concerns about the transmission cost, security, and latency of centralized computing, edge/local computing is increasingly prevalent in fast-growing sectors like the Internet of Things (IoT) and in other sectors that require energy- and connectivity-autonomous systems, such as biomedical and industrial applications. Energy and power efficiency are the main design constraints in local and edge computing. While a wide range of low-power design techniques exists, they are often underutilized in custom circuit designs because the algorithms are developed independently of the hardware. Such a compartmentalized design approach fails to take advantage of the many compatible algorithmic and hardware techniques that can improve the efficiency of the entire system. Algorithm-hardware co-design explores the design space with whole-stack awareness. The main goal of the algorithm-hardware co-design methodology is the enablement and improvement of small-form-factor edge and local VLSI systems operating under strict constraints of area and energy efficiency. This thesis presents selected works on application-specific digital and mixed-signal integrated circuit designs. The application space ranges from implantable biomedical devices to edge machine learning acceleration.
9

"Algorithm and Hardware Co-design for Learning On-a-chip." Doctoral diss., 2017. http://hdl.handle.net/2286/R.I.45949.

Abstract:
Machine learning technology has made a lot of incredible achievements in recent years. It has rivalled or exceeded human performance in many intellectual tasks including image recognition, face detection, and the Go game. Many machine learning algorithms require a huge amount of computation, such as the multiplication of large matrices. As silicon technology has scaled to the sub-14 nm regime, simply scaling down the device cannot provide enough speed-up any more. New device technologies and system architectures are needed to improve the computing capacity. Designing specific hardware for machine learning is in high demand. Efforts need to be made on a joint design and optimization of both hardware and algorithms. For machine learning acceleration, traditional SRAM- and DRAM-based systems suffer from low capacity, high latency, and high standby power. Instead, emerging memories, such as Phase Change Random Access Memory (PRAM), Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM), and Resistive Random Access Memory (RRAM), are promising candidates providing low standby power, high data density, fast access, and excellent scalability. This dissertation proposes a hierarchical memory modeling framework and models PRAM and STT-MRAM at four different levels of abstraction. With the proposed models, various simulations are conducted to investigate the performance, optimization, variability, reliability, and scalability. Emerging memory devices such as RRAM can work as a 2-D crosspoint array to speed up the multiplication and accumulation in machine learning algorithms. This dissertation proposes a new parallel programming scheme to achieve in-memory learning with an RRAM crosspoint array. The programming circuitry is designed and simulated in TSMC 65nm technology, showing a 900X speedup for the dictionary learning task compared to the CPU performance. From the algorithm perspective, inspired by the high accuracy and low power of the brain, this dissertation proposes a bio-plausible feedforward inhibition spiking neural network with a Spike-Rate-Dependent Plasticity (SRDP) learning rule. It achieves more than 95% accuracy on the MNIST dataset, which is comparable to the sparse coding algorithm, but requires far fewer computations. The role of inhibition in this network is systematically studied and shown to improve the hardware efficiency in learning.
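The in-memory multiply-accumulate idea mentioned above can be illustrated with Ohm's and Kirchhoff's laws: applying a voltage vector to the rows of a conductance crossbar yields column currents equal to a matrix-vector product. The sketch below is a purely numerical illustration (hypothetical conductance values, no device non-idealities), not the dissertation's programming scheme:

    import numpy as np

    # Conductance matrix G (siemens): one RRAM cell at each row/column crossing.
    G = np.array([[1e-6, 2e-6, 0.5e-6],
                  [3e-6, 1e-6, 1e-6]])
    v = np.array([0.2, 0.1])          # row voltages (volts)

    # Column currents: I_j = sum_i V_i * G_ij, i.e. an analog dot product per column.
    i_out = v @ G
    print(i_out)                      # currents in amperes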
10

Lin, Yin-Hsin (林殷旭). "Hardware-Software Co-design of an Automatic White Balance Algorithm." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/b4636z.

Abstract:
Master's thesis, National Taipei University of Technology, Graduate Institute of Computer and Communication, academic year 94 (2005–2006).
As electronic techniques continue to improve rapidly, the cameras and video camcorders used for image acquisition have become digital. The colors of a photograph can look very different due to differences in the illumination of the light source when the picture is taken. Human eyes are able to adjust color perception automatically when the illumination of the light source varies, but the most frequently used image sensor, the charge-coupled device (CCD), cannot correct color the way human eyes do. This paper presents a hardware-software co-design method based on Lam's automatic white balance algorithm, which combines the gray world assumption and the perfect reflector assumption. The execution steps of Lam's algorithm were divided into three stages, and a hardware-software co-design and analysis was carried out for each stage. Three factors, namely processing time and the slices and DSP48s of hardware resources, were used to formulate an objective function employed to evaluate system performance and hardware resource cost. Experimental results show that suitable hardware-software partitions were achieved. An embedded processor, Xilinx's MicroBlaze, together with a floating-point unit, was used for the software part of the algorithm, while the hardware part was implemented using an IP-based method. Such a system-on-a-programmable-chip architecture reduces the memory and CPU resources required of the PC and offers easy modification and function expansion.
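To make the two assumptions named in this abstract concrete, the sketch below shows a standard gray world gain computation and a perfect reflector (white patch) gain computation for an RGB image. It is a generic illustration, not the staged hardware-software partitioning of Lam's algorithm evaluated in the thesis:

    import numpy as np

    def gray_world_gains(img):
        # Gray world: assume the scene average is achromatic,
        # so scale R and B to match the green channel's mean.
        means = img.reshape(-1, 3).mean(axis=0)            # mean R, G, B
        return means[1] / means                            # gains (g_r, 1, g_b)

    def perfect_reflector_gains(img):
        # Perfect reflector: assume the brightest pixels are white,
        # so scale each channel so its maximum maps to the same value.
        maxima = img.reshape(-1, 3).max(axis=0)
        return maxima.max() / maxima

    img = np.random.rand(120, 160, 3)                      # placeholder image
    balanced = np.clip(img * gray_world_gains(img), 0, 1)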

Book chapters on the topic "Hardware/algorithm co-design"

1

Lodha, Nupur, Nivesh Rai, Rahul Dubey, and Hrishikesh Venkataraman. "Hardware-Software Co-design of QRD-RLS Algorithm with Microblaze Soft Core Processor." In Information Systems, Technology and Management, 197–207. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-642-00405-6_23.

2

Wolf, Wayne. "Hardware/Software Co-Synthesis Algorithms." In Hardware/Software Co-Design: Principles and Practice, 47–73. Boston, MA: Springer US, 1997. http://dx.doi.org/10.1007/978-1-4757-2649-7_2.

3

Mycroft, Alan, and Richard Sharp. "Hardware/Software Co-design Using Functional Languages." In Tools and Algorithms for the Construction and Analysis of Systems, 236–51. Berlin, Heidelberg: Springer Berlin Heidelberg, 2001. http://dx.doi.org/10.1007/3-540-45319-9_17.

4

Wolf, Wayne H. "An Architectural Co-Synthesis Algorithm for Distributed, Embedded Computing Systems." In Readings in Hardware/Software Co-Design, 338–49. Elsevier, 2002. http://dx.doi.org/10.1016/b978-155860702-6/50030-2.

5

Liu, C. L., and James W. Layland. "Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment." In Readings in Hardware/Software Co-Design, 179–94. Elsevier, 2002. http://dx.doi.org/10.1016/b978-155860702-6/50016-8.

6

Bhattacharyya, Shuvra S., Joseph T. Buck, Soonhoi Ha, and Edward A. Lee. "Generating Compact Code from Dataflow Specifications of Multirate Signal Processing Algorithms." In Readings in Hardware/Software Co-Design, 452–64. Elsevier, 2002. http://dx.doi.org/10.1016/b978-155860702-6/50040-5.


Conference papers on the topic "Hardware/algorithm co-design"

1

Ji, Hao, Masha Sosonkina, and Yaohang Li. "An Implementation of Block Conjugate Gradient Algorithm on CPU-GPU Processors." In 2014 Hardware-Software Co-Design for High Performance Computing (Co-HPC). IEEE, 2014. http://dx.doi.org/10.1109/co-hpc.2014.10.

2

Tramm, John R., Kazutomo Yoshii, and Andrew R. Siegel. "Power Profiling of a Reduced Data Movement Algorithm for Neutron Cross Section Data in Monte Carlo Simulations." In 2014 Hardware-Software Co-Design for High Performance Computing (Co-HPC). IEEE, 2014. http://dx.doi.org/10.1109/co-hpc.2014.9.

3

Huang, Qijing, Dequan Wang, Yizhao Gao, Yaohui Cai, Zhen Dong, Bichen Wu, Kurt Keutzer, and John Wawrzynek. "Algorithm-hardware Co-design for Deformable Convolution." In 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS). IEEE, 2019. http://dx.doi.org/10.1109/emc2-nips53020.2019.00019.

4

Shang, Qianyi, Lijun Chen, and Ruoxiong Tong. "Hardware/Software Co-design for Evolvable Hardware by Genetic Algorithm." In 2020 IEEE International Conference on Artificial Intelligence and Information Systems (ICAIIS). IEEE, 2020. http://dx.doi.org/10.1109/icaiis49377.2020.9194828.

5

Li, Shih-An, Ching-Chang Wong, Chia-Jun Yu, and Chen-Chien Hsu. "Hardware/software co-design for particle swarm optimization algorithm." In 2010 IEEE International Conference on Systems, Man and Cybernetics - SMC. IEEE, 2010. http://dx.doi.org/10.1109/icsmc.2010.5641826.

6

Fan, Hongxiang, Martin Ferianc, Zhiqiang Que, He Li, Shuanglong Liu, Xinyu Niu, and Wayne Luk. "Algorithm and Hardware Co-design for Reconfigurable CNN Accelerator." In 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 2022. http://dx.doi.org/10.1109/asp-dac52403.2022.9712541.

7

Jigang, Wu, Thambipillai Srikanthan, and Tao Jiao. "Efficient algorithm for functional scheduling in hardware/software co-design." In 2006 IEEE International Conference on Field Programmable Technology. IEEE, 2006. http://dx.doi.org/10.1109/fpt.2006.270296.

8

Lopez, Mariano, Enrique Canto, and Mariano Fons. "Hardware-Software Co-design of a Fingerprint Image Enhancement Algorithm." In IECON 2006 - 32nd Annual Conference on IEEE Industrial Electronics. IEEE, 2006. http://dx.doi.org/10.1109/iecon.2006.347727.

9

Kayankit, W., and W. Suntiamorntut. "Hardware/software co-design for line detection algorithm on FPGA." In 2009 6th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON). IEEE, 2009. http://dx.doi.org/10.1109/ecticon.2009.5137079.

10

Ganguly, Prapti, Saumita Roy, Tanmay Biswas, Chandrajit Pal, and Amlan Chakrabarti. "DSP hardware software co-design of audio de-noising algorithm." In 2014 International Conference on Control, Instrumentation, Energy and Communication (CIEC). IEEE, 2014. http://dx.doi.org/10.1109/ciec.2014.6959175.

