Dissertations / Theses on the topic 'Algorithm co-design'


Consult the top 38 dissertations / theses for your research on the topic 'Algorithm co-design.'


1

Zhang, Zhengdong. "Efficient computing for autonomous navigation using algorithm-and-hardware co-design." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122691.

Abstract:
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 211-221).
Autonomous navigation algorithms are the backbone of many robotic systems, such as self-driving cars and drones. However, state-of-the-art autonomous navigation algorithms are computationally expensive, requiring powerful CPUs and GPUs to enable them to run in real time. As a result, it is prohibitive to deploy them on miniature robots with limited computational resources onboard. To tackle this challenge, this thesis presents an algorithm-and-hardware co-design approach to design energy-efficient algorithms that are optimized for dedicated hardware architectures at the same time. It covers the design for three essential modules of an autonomous navigation system: perception, localization, and exploration.
Compared with previous research that considers either algorithmic improvements or hardware architecture optimizations, our approach leads to algorithms that not only have lower time and space complexity but also map efficiently to specialized hardware architectures, resulting in significantly improved energy efficiency and throughput. First, this thesis studies how to design an energy-efficient visual perception system using the deformable part models (DPM) based object detection algorithm. It describes an algorithm that enforces sparsity in the data stored on a chip, which reduces the memory requirement by 34% and lowers the cost of the classification by 43%. Together with other hardware optimizations, this technique leads to an object detection chip that runs at 30 fps on 1920 x 1080 videos while consuming only 58.6mW of power.
Second, this thesis describes a systematic way to explore algorithm-hardware design choices to build a low-power chip that performs visual inertial odometry (VIO) to localize a vehicle. Each of the components in a VIO pipeline has multiple algorithmic choices with different time and space complexity. However, some algorithms of lower time complexity can be more expensive when implemented on-chip. This thesis examines each of the design choices from both the algorithm's and the hardware's point of view and presents a design that consumes 24mW of power while running at up to 90 fps and achieving near state-of-the-art localization accuracy. Third, this thesis presents an efficient information theoretic mapping system for exploration. It features a novel algorithm called Fast computation of Shannon Mutual Information (FSMI) that computes the Shannon mutual information (MI) between prospective range measurements and the environment.
The FSMI algorithm features an analytic solution that avoids the expensive numerical integration required by previous state-of-the-art algorithms, enabling FSMI to run three orders of magnitude faster in practice. We also present an extension of the FSMI algorithm to 3D mapping; the algorithm leverages the compression of a large 3D map using run-length encoding (RLE) and achieves 8x acceleration in a real-world exploration task. In addition, this thesis presents a hardware architecture designed for the FSMI algorithm. The design features a novel memory banking method that increases the memory bandwidth so that multiple FSMI cores can run in parallel while maintaining high utilization. A novel arbiter is proposed to resolve memory read conflicts between multiple cores within one clock cycle. The final design on an FPGA achieves more than 100x higher throughput than a CPU while consuming less than 1/10 of the power.
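The run-length encoding used to compress the 3D map is simple to illustrate. Below is a minimal Python sketch of RLE over one column of an occupancy grid; the cell values and list-based representation are invented for illustration and are far simpler than the map structures an actual exploration system would use.

```python
# Toy run-length encoding of an occupancy-grid column: long runs of free or
# unknown cells collapse to (value, run_length) pairs, which is what makes
# RLE-compressed maps fast to traverse.

def rle_encode(cells):
    """Compress a sequence of occupancy values into (value, run_length) pairs."""
    runs = []
    for v in cells:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

column = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]  # mostly free space
runs = rle_encode(column)
assert rle_decode(runs) == column
print(runs)  # [(0, 4), (1, 2), (0, 6)]: 3 runs instead of 12 cells
```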
2

Sherbaf, Behtash Mohammad. "A Decomposition-based Multidisciplinary Dynamic System Design Optimization Algorithm for Large-Scale Dynamic System Co-Design." University of Cincinnati / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1535468984437623.

3

Chee, Kenneth W. "APPLIED HW/SW CO-DESIGN: Using the Kendall Tau Algorithm for Adaptive Pacing." DigitalCommons@CalPoly, 2013. https://digitalcommons.calpoly.edu/theses/1038.

Abstract:
Microcontrollers, the brains of embedded systems, have found their way into every aspect of our lives, including medical devices such as pacemakers. Pacemakers provide life-supporting functions, so it is critical for these devices to meet their timing constraints. This thesis examines the use of hardware co-processing to accelerate the calculation time associated with the critical tasks of a pacemaker. In particular, we use an FPGA to accelerate a microcontroller's calculation of the Kendall Tau Rank Correlation Coefficient algorithm. The Kendall Tau Rank Correlation Coefficient is a statistical measure that determines the pacemaker's voltage level for heart stimulation. This thesis explores three different hardware distributions of this algorithm between an FPGA and a pacemaker's microcontroller. The first implementation uses one microcontroller to establish the baseline performance of the system. The next implementation executes the entire Kendall Tau algorithm on an FPGA with varying degrees of parallelism. The final implementation splits the computational requirements between the microcontroller and the FPGA. This thesis uses these implementations to compare system-level issues such as power consumption and other tradeoffs that arise when using an FPGA for co-processing.
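The statistic being accelerated is compact enough to show directly. The following Python reference version computes the O(n²) pairwise Kendall Tau coefficient, ignoring ties for brevity; it is a sketch of the computation, not the thesis's microcontroller or FPGA implementation.

```python
# Kendall Tau rank correlation: count concordant vs. discordant pairs. A pair
# (i, j) is concordant when x and y order the two samples the same way.

def kendall_tau(x, y):
    assert len(x) == len(y)
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.666...: one discordant pair
```

The independent pair comparisons in the double loop are also what make the algorithm attractive for FPGA offload: they can be evaluated in parallel to whatever degree the hardware budget allows.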
4

Narasimhan, Seetharam. "Ultralow-Power and Robust Implantable Neural Interfaces: An Algorithm-Architecture-Circuit Co-Design Approach." Case Western Reserve University School of Graduate Studies / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=case1333743306.

5

Tzou, Nicholas. "Low-cost sub-Nyquist sampling hardware and algorithm co-design for wideband and high-speed signal characterization and measurement." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/51876.

Abstract:
Cost reduction has been and will continue to be a primary driving force in the evolution of hardware design and associated technologies. The objective of this research is to design low-cost signal acquisition systems for characterizing wideband and high-speed signals. As the bandwidth and the speed of such signals increase, the cost of testing also increases significantly; therefore, innovative hardware and algorithm co-design is needed to relieve this problem. In Chapter 2, a low-cost multi-rate system is proposed for characterizing the spectra of wideband signals. The design is low-cost in the sense of the actual component cost, the system complexity, and the effort required for calibration. The associated algorithms are designed such that the hardware can be implemented with low complexity yet be robust enough to deal with various hardware variations. A hardware prototype is built not only to verify the proposed hardware scheme and algorithms but also to serve as a concrete example showing that characterizing signals at sub-Nyquist sampling rates is feasible. Chapter 3 introduces a low-cost time-domain waveform reconstruction technique, which requires no mutual synchronization mechanisms. This brings down cost significantly and enables the implementation of systems capable of capturing tens-of-gigahertz (GHz) signals at far lower cost than the high-end oscilloscopes found on the market today. For the first time, band-interleaving and incoherent undersampling techniques are combined to form a low-cost solution for waveform reconstruction. This is enabled by co-designing the hardware and the back-end signal processing algorithms to compensate for the lack of coherent Nyquist-rate sampling hardware. A hardware prototype was built to support this work. Chapter 4 describes a novel test methodology that significantly reduces the time required for crosstalk jitter characterization in parallel channels. This is done by using bit patterns with coprime periods as channel stimuli and using signal processing algorithms to separate multiple crosstalk coupling effects. The proposed test methodology can be applied seamlessly in conjunction with current test methodology without re-designing the test setup. More importantly, the conclusion derived from the mathematical analysis shows that only such test stimuli give unbiased characterization results, which is critical in all high-precision test setups. Hardware measurement results and analysis are provided to support this methodology. The thesis starts with an overview of the background and a literature review. The three major works mentioned above are addressed in three separate chapters, each documenting the hardware designs, signal processing algorithms, and associated mathematical analyses. For verification, the hardware measurement setups and results are discussed at the end of each of these chapters. The last chapter presents conclusions and future directions for work from this thesis.
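A small sketch makes the sub-Nyquist idea concrete: a tone sampled below the Nyquist rate folds to a predictable alias, and sampling the same signal at two different sub-Nyquist rates yields two different aliases from which the true frequency can be disambiguated. The tone and rates below are invented; this shows the folding arithmetic only, not the thesis's multi-rate hardware.

```python
# Where a real tone lands after sampling at rate f_s (spectrum folding).
def alias_frequency(f_tone, f_s):
    f = f_tone % f_s
    return min(f, f_s - f)

# The same 9.7 GHz tone seen through two slightly different sub-Nyquist ADCs:
print(alias_frequency(9.7e9, 1.00e9))  # 3.0e8  (0.30 GHz)
print(alias_frequency(9.7e9, 1.05e9))  # 2.5e8  (0.25 GHz)
# The pair of aliases is consistent with only a sparse set of true
# frequencies, which is what a multi-rate reconstruction exploits.
```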
6

Cooksey, Kenneth Daniel. "A portfolio approach to design in the presence of scenario-based uncertainty." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/49036.

Abstract:
Current aircraft conceptual design practices result in the selection of a single (hopefully) Pareto-optimal design to be carried forward into preliminary design. This paradigm is based on the assumption that carrying a significant number of concepts forward is too costly, and thus early down-selection between competing concepts is necessary. However, this approach requires that key architectural design decisions which drive performance and market success be fixed very early in the design process, sometimes years before the aircraft actually goes to market. In the presence of uncertainty, if design performance is examined for individual scenarios as opposed to aggregate statistics, the author finds that the single-concept approach can lead to less than desirable design outcomes. This thesis proposes an alternate conceptual design paradigm which leverages principles from economics (specifically the Nobel prize-winning modern portfolio theory) to improve design outcomes by intelligently selecting a small, well-diversified portfolio of concepts to carry forward through preliminary design, thus reducing the risk from external events that are outside of the engineer's control. This alternate paradigm is expected to increase overall profit by increasing the probability that the final design matches market needs at the time it goes to market. The thesis presents a portfolio-based design approach which leverages dynamic programming to enable a stochastic optimization of alternative portfolios of concepts. The optimization returns a portfolio of concepts which is iteratively pruned to improve design outcomes in the presence of scenario-driven uncertainties. While dynamic programming is identified as a means of performing stochastic portfolio optimization, it is an analytical optimization process which suffers heavily from the curse of dimensionality. As a result, a new hybrid stochastic optimization process called Evolutionary Cooperative Optimization with Simultaneous Independent Sub-optimization (ECOSIS) is introduced. The ECOSIS algorithm leverages a co-evolutionary algorithm to optimize a multifaceted problem under uncertainty. ECOSIS allows a stochastic portfolio optimization, including the desired benefit-to-cost tradeoff for a well-diversified portfolio, at the size and scope required for use in design problems. To demonstrate the applicability and value of a portfolio-based design approach, an example application to the selection of a new 300-passenger aircraft is presented.
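As a toy illustration of the diversification idea, the sketch below scores invented concept payoffs across three scenarios and picks the two-concept portfolio with the best worst-case outcome. A brute-force min-max rule stands in for the dynamic-programming and ECOSIS machinery described above.

```python
# Scenario-based concept selection: a diversified pair can beat any single
# concept's worst case. Concepts, scenarios, and payoffs are all invented.
from itertools import combinations

profit = {            # profit[concept][scenario]
    "A": [9, 1, 2],   # wins big only in scenario 0
    "B": [2, 8, 1],   # wins big only in scenario 1
    "C": [3, 3, 4],   # robust but never dominant
}

def worst_case(portfolio):
    # in each scenario we would carry forward the best member of the portfolio
    return min(max(profit[c][s] for c in portfolio) for s in range(3))

best = max(combinations(profit, 2), key=worst_case)
print(best, worst_case(best))  # ('A', 'C') with a worst case of 3
```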
7

Martelli, Maxime. "Approche haut niveau pour l’accélération d’algorithmes sur des architectures hétérogènes CPU/GPU/FPGA. Application à la qualification des radars et des systèmes d’écoute électromagnétique." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS581/document.

Abstract:
As the semiconductor industry faces major challenges in sustaining its growth, new High-Level Synthesis tools are repositioning FPGAs as a leading technology for algorithm acceleration alongside CPU- and GPU-based clusters. But as they stand, these tools do not guarantee to a software engineer, without expertise in the underlying hardware, that these technologies will be harnessed to their full potential. This can be a barrier to their democratization. From this observation, we propose a methodology for algorithm acceleration on FPGAs. After presenting a high-level architectural model of this target, we detail possible optimizations in OpenCL, and finally define a relevant exploration strategy for accelerating algorithms on FPGAs. Applied to different case studies, from tomographic reconstruction to the modelling of an airborne radar jammer, we evaluate our methodology according to three main performance criteria: development time, execution time, and energy efficiency.
8

Bahri, Imen. "Contribution des systèmes sur puce basés sur FPGA pour les applications embarquées d’entraînement électrique." Thesis, Cergy-Pontoise, 2011. http://www.theses.fr/2011CERG0529/document.

Abstract:
Designing embedded control systems becomes increasingly complex due to growing algorithm complexity, rising industrial requirements, and the nature of the application domains. One way to handle this complexity is to design the corresponding controllers on powerful, open digital platforms. More specifically, this PhD deals with the use of FPGA System-on-Chip (SoC) platforms for the implementation of complex AC drive controllers for avionic applications. The latter are characterized by stringent technical constraints such as environmental conditions (pressure, high temperature) and high performance requirements (high integration, flexibility and efficiency). During this thesis, the author contributed to the design and test of a digital controller for a high-temperature synchronous drive that must operate at 200°C ambient. It consists of a Flux Oriented Controller (FOC) for a Permanent Magnet Synchronous Machine (PMSM) associated with a resolver sensor. A design and validation method was proposed and tested using an FPGA ProAsicPlus board from the Actel-Microsemi company, and the impact of temperature on the operating frequency was analyzed. A state of the art of FPGA SoC technology is also presented, with a detailed description of recent digital platforms and of the constraints of embedded applications; the interest of a SoC-based approach for AC drive applications is thus established. Additionally, to take full advantage of a SoC-based approach, an appropriate HW-SW co-design methodology for electrical AC drives is proposed. This method covers the whole development flow of the control application, from the specifications to the final experimental validation. One of the most important steps of this method is the HW-SW partitioning. The goal is to find an optimal combination between the modules to be implemented in software and those to be implemented in hardware. This multi-objective optimization problem was solved with the Non-Dominated Sorting Genetic Algorithm (NSGA-II), from which the Pareto front of optimal solutions can be deduced. The proposed co-design methodology is illustrated on a sensorless speed controller using the Extended Kalman Filter (EKF); this benchmark corresponds to a major trend in embedded control of AC drives. Besides, the SoC-based architecture of the embedded controller is managed by an efficient Real-Time Operating System (RTOS). To accelerate the services of this operating system, a Real-Time Unit (RTU) was developed in VHDL and associated with the RTOS. It is a hardware operating system that moves the scheduling and communication processes from the software RTOS into hardware, yielding a significant acceleration. Experimental tests of a digital current controller were also carried out on a laboratory setup. The obtained results prove the interest of the proposed approach.
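The HW/SW partitioning trade-off at the heart of this methodology can be sketched in a few lines: each module runs more slowly in software but costs area when moved to hardware, and the interesting designs form a Pareto front over (time, area). The module names and costs below are invented, and brute-force enumeration stands in for NSGA-II, which becomes necessary at realistic problem sizes.

```python
# Enumerate all HW/SW partitions of a small module set and keep the Pareto
# front over (serial execution time, FPGA area). Costs are illustrative only.
from itertools import product

modules = {       # name: (sw_time, hw_time, hw_area)
    "clarke":  (40, 5, 120),
    "park":    (35, 4, 100),
    "pi_ctrl": (20, 6,  60),
    "svpwm":   (50, 8, 200),
}

def evaluate(assign):  # assign[name] is True when mapped to hardware
    time = sum(hw if assign[n] else sw for n, (sw, hw, _) in modules.items())
    area = sum(a for n, (_, _, a) in modules.items() if assign[n])
    return time, area

points = [(evaluate(dict(zip(modules, bits))), bits)
          for bits in product([False, True], repeat=len(modules))]

pareto = [(obj, bits) for obj, bits in points
          if not any(q[0] <= obj[0] and q[1] <= obj[1] and q != obj
                     for q, _ in points)]
for (t, a), bits in sorted(pareto):
    print(t, a, [n for n, b in zip(modules, bits) if b])
```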
9

Trindade, Alessandro Bezerra. "Aplicando verificação de modelos baseada nas teorias do módulo da satisfabilidade para o particionamento de hardware/software em sistemas embarcados." Universidade Federal do Amazonas, 2015. http://tede.ufam.edu.br/handle/tede/4091.

Abstract:
When performing hardware/software co-design for embedded systems, the problem emerges of properly deciding which functions of the system should be implemented in hardware (HW) and which in software (SW). This problem is known as HW/SW partitioning, and in the last ten years a significant research effort has been carried out in this area. In this work, we present two new approaches to solving the HW/SW partitioning problem using SMT-based verification techniques, and compare the results against the traditional technique of Integer Linear Programming (ILP) and a modern optimization method, the Genetic Algorithm (GA). The goal is to show with experimental results that model checking techniques can be effective, in particular cases, at finding the optimal solution of the HW/SW partitioning problem using a state-of-the-art model checker based on Satisfiability Modulo Theories (SMT) solvers, when compared to the traditional techniques.
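A minimal version of this style of formulation can be written with an SMT solver's Python bindings, here Z3 (the z3-solver package). The task costs and the serial-time model below are invented placeholders; the point is only the shape of the encoding: Boolean mapping variables, a performance constraint, and an objective.

```python
# Sketch of an SMT-based HW/SW partitioning encoding using Z3's Python API.
from z3 import Bool, If, Sum, Optimize, is_true, sat

sw_time = [40, 35, 20, 50]     # software execution time per task (invented)
hw_time = [5, 4, 6, 8]         # hardware execution time per task
hw_area = [120, 100, 60, 200]  # hardware area if the task is moved to HW

n = len(sw_time)
in_hw = [Bool(f"hw_{i}") for i in range(n)]

opt = Optimize()
total_time = Sum([If(in_hw[i], hw_time[i], sw_time[i]) for i in range(n)])
total_area = Sum([If(in_hw[i], hw_area[i], 0) for i in range(n)])
opt.add(total_time <= 60)      # performance constraint
opt.minimize(total_area)       # use as little hardware as possible

if opt.check() == sat:
    m = opt.model()
    print("tasks in HW:", [i for i in range(n) if is_true(m[in_hw[i]])])
```

At this toy scale ILP, GA, and SMT all find the same answer instantly; the comparison in the dissertation is about how the approaches behave on realistic instances.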
10

Zhang, Yuanzhi. "Algorithms and Hardware Co-Design of HEVC Intra Encoders." OpenSIUC, 2019. https://opensiuc.lib.siu.edu/dissertations/1769.

Abstract:
Digital video has become extremely important over the last two decades. Due to the rapid development of information and communication technologies, the demand for Ultra-High Definition (UHD) video applications is becoming stronger. However, the most prevalent video compression standard, H.264/AVC, released in 2003, is inefficient when it comes to UHD videos. The increasing desire for compression efficiency superior to H.264/AVC led to the standardization of High Efficiency Video Coding (HEVC). Compared with the H.264/AVC standard, HEVC offers double the compression ratio at the same level of video quality, or a substantial improvement of video quality at the same bitrate. Yet while HEVC/H.265 possesses superior compression efficiency, its complexity is several times that of H.264/AVC, impeding high-throughput implementation. Currently, most researchers have focused merely on algorithm-level adaptations of the HEVC/H.265 standard to reduce computational intensity, without considering hardware feasibility; the exploration of efficient hardware architecture design is not exhaustive, and only a few research works have examined efficient hardware architectures for the standard. In this dissertation, we investigate efficient algorithm adaptations and hardware architecture design of HEVC intra encoders, and we explore a deep learning approach to mode prediction. From the algorithm point of view, we propose three efficient hardware-oriented algorithm adaptations: mode reduction, fast coding unit (CU) cost estimation, and group-based CABAC (context-adaptive binary arithmetic coding) rate estimation. Mode reduction reduces the mode candidates of each prediction unit (PU) in the rate-distortion optimization (RDO) process, which is both computation-intensive and time-consuming. Fast CU cost estimation reduces the complexity of the rate-distortion (RD) calculation for each CU. Group-based CABAC rate estimation parallelizes syntax-element processing to greatly improve rate estimation throughput. From the hardware design perspective, a fully parallel hardware architecture of an HEVC intra encoder is developed to sustain UHD video compression at 4K@30fps. The fully parallel architecture introduces four prediction engines (PEs), each of which independently performs the full cycle of mode prediction, transform, quantization, inverse quantization, inverse transform, reconstruction, and rate-distortion estimation; PU blocks of different sizes are processed by the different prediction engines simultaneously. Also, an efficient hardware implementation of the group-based CABAC rate estimator is incorporated into the proposed HEVC intra encoder for accurate and high-throughput rate estimation. To take advantage of deep learning, we also propose a fully-connected-layer based neural network (FCLNN) mode preselection scheme to reduce the number of RDO modes for luma prediction blocks. All angular prediction modes are classified into 7 prediction groups, each containing 3-5 prediction modes that exhibit a similar prediction angle. A rough angle detection algorithm is designed to determine the prediction direction of the current block, then a small-scale FCLNN is exploited to refine the mode prediction.
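The "rough angle detection" step can be sketched as follows: estimate a luma block's dominant gradient orientation and fold it onto one of the 7 angular groups. The gradient heuristic and group boundaries below are invented stand-ins for the dissertation's detector, which is followed by the FCLNN refinement.

```python
# Map a block's dominant orientation to one of 7 angular-mode groups.
import numpy as np

def mode_group(block, n_groups=7):
    gy, gx = np.gradient(block.astype(float))
    angle = np.arctan2(gy, gx)                    # per-pixel gradient angle
    weight = np.hypot(gx, gy)                     # stronger edges count more
    # average orientations modulo pi via the angle-doubling trick
    mean_angle = np.angle(np.sum(weight * np.exp(2j * angle))) / 2
    # fold [-pi/2, pi/2) onto group indices 0..n_groups-1
    return int((mean_angle + np.pi / 2) / np.pi * n_groups) % n_groups

block = np.tile(np.arange(8), (8, 1))  # ramp: purely horizontal gradient
print(mode_group(block))               # one fixed group for this orientation
```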
11

Tertei, Daniel. "Co-design of architectures and algorithms for mobile robot localization and model-based detection of obstacles." Phd thesis, Univerzitet u Novom Sadu, Fakultet tehničkih nauka u Novom Sadu, 2016. http://www.cris.uns.ac.rs/record.jsf?recordId=101781&source=NDLTD&language=en.

Abstract:
This thesis proposes SoPC (System on a Programmable Chip) architectures for efficient embedding of vision-based localization and obstacle detection tasks in a navigational pipeline on autonomous mobile robots. The obtained results are equivalent or better in comparison to the state of the art. For localization, an efficient hardware architecture that supports EKF-SLAM's local map management with seven-dimensional landmarks in real time is developed. For obstacle detection, a novel method of object recognition is proposed: a detection-by-identification framework based on a single detection window scale. This framework allows adequate algorithmic precision and execution speeds on embedded hardware platforms.
12

Törtei, Dániel. "Co-design of architectures and algorithms for mobile robot localization and model-based detection of obstacles." Thesis, Toulouse 3, 2016. http://www.theses.fr/2016TOU30294/document.

Abstract:
An autonomous mobile platform is endowed with a navigational system which must contain multiple functional bricks: perception, localization, path planning and motion control. As soon as such a robot or vehicle moves in a crowded environment, it continuously loops several tasks in real time: sending reference values to the motors' actuators, calculating its position with respect to a known reference frame, and detecting potential obstacles on its path. Thanks to the semantic richness provided by images and the low cost of visual sensors, these tasks often exploit visual cues. The embedded systems running on these mobile platforms therefore demand the integration of high-speed embeddable processing systems capable of treating abundant visual sensory input in real time. Moreover, constraints influencing the autonomy of the mobile platform impose low power consumption. This thesis proposes SOPC (System on a Programmable Chip) architectures for efficiently embedding vision-based localization and obstacle detection tasks in a navigational pipeline, using a software/hardware co-design methodology. The obtained results are equivalent or better in comparison to the state of the art, both for EKF-SLAM based visual odometry (management of a local map with seven-dimensional landmarks) and for model-based detection-by-identification of obstacles (pedestrians, cars...) learned beforehand (algorithmic precision versus execution speed).
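The numerical core that such an EKF-SLAM co-processor accelerates is the standard EKF correction step, sketched below with NumPy. The seven-dimensional state and the linear observation model are placeholders; a real SLAM filter maintains a joint vehicle-plus-landmark state with a nonlinear measurement function, but the matrix-heavy structure is the same.

```python
# One EKF measurement update: innovation, gain, state and covariance correction.
import numpy as np

def ekf_update(x, P, z, H, R):
    y = z - H @ x                        # innovation
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

x = np.zeros(7)                 # e.g. one seven-dimensional landmark state
P = np.eye(7)
H = np.eye(2, 7)                # toy observation of the first two components
z = np.array([0.5, -0.2])
x, P = ekf_update(x, P, z, H, np.eye(2) * 0.1)
print(x[:2])                    # state pulled toward the measurement
```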
13

Marques, Vítor Manuel dos Santos. "Performance of hardware and software sorting algorithms implemented in a SOC." Master's thesis, Universidade de Aveiro, 2017. http://hdl.handle.net/10773/23467.

Abstract:
Master's degree in Computer and Telematics Engineering.
Field Programmable Gate Arrays (FPGAs) were invented by Xilinx in 1985. Their reconfigurable nature allows them to be used in multiple areas of information technology. This project studies this technology as an alternative to traditional data processing methods, namely sorting. The proposed solution is based on the principle of reusing resources to counter this technology's known resource limitations.
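The abstract does not name the sorting algorithm, but sorting networks such as bitonic sort are a common FPGA choice: their fixed compare-exchange pattern maps directly to hardware, and reusing one stage's comparators across passes is a natural way to trade throughput for resources, in the spirit of the reuse principle above. A Python sketch of the network, purely as an assumed illustration:

```python
# In-place bitonic sort; n must be a power of two. Each (k, j) pass is a
# fixed layer of independent compare-exchange operations, which is why the
# same comparator hardware can be time-multiplexed across passes.
def bitonic_sort(a):
    n = len(a)
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

print(bitonic_sort([7, 3, 0, 5, 6, 1, 4, 2]))  # [0, 1, 2, 3, 4, 5, 6, 7]
```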
14

Lopes, Rodrigo Aranha Pereira. "Computational strategies applied to product design." Master's thesis, Universidade de Lisboa, Faculdade de Arquitetura, 2018. http://hdl.handle.net/10400.5/17993.

Abstract:
Master's dissertation in Design, with a specialization in Product Design, presented at the Faculdade de Arquitetura da Universidade de Lisboa to obtain the degree of Master.
As pointed out on different occasions by both Richard Sennett and Vilém Flusser, practice and theory, technique and expression, art and technology, maker and user once shared a common ground. Throughout history, however, they have become divided, with design standing in between. This research firstly aims to contribute to the diminishing of this historical inheritance by providing designers a workflow based on computational strategies. The present study applies this approach to the design and building of a surfboard. The goal is to develop a co-design platform allowing users to generate their own tailor-made surfboard by means of algorithmic/parametric modeling (Grasshopper and ShapeDiver). A second aspect critically considers the materials used in the surf industry, with the objective of developing products using materials that are less harmful to the environment and with a greater capacity for control and alteration with regard to performance capabilities. In particular, this proposal aims to develop an algorithm that can be used to generate objects whose inner cores are composed of paper structures. The specific object to be generated in this case is a surfboard.
15

Farjallah, Asma. "Etude de l'adéquation des machines Exascale pour les algorithmes implémentant la méthode du Reverse Time Migation." Thesis, Versailles-St Quentin en Yvelines, 2014. http://www.theses.fr/2014VERS0050/document.

Abstract:
As we are expecting Exascale systems for the 2018-2020 time frame, performance analysis and characterization of applications for new processor architectures and large-scale systems are important tasks that make it possible to anticipate the changes required to efficiently exploit future HPC systems. This thesis focuses on seismic imaging applications used for modeling complex physical phenomena, in particular the depth imaging application called Reverse Time Migration (RTM). My first contribution consists in characterizing and modeling the performance of the computational core of RTM, which is based on finite-difference time-domain (FDTD) computations. I identify and explore the major tuning parameters influencing performance and the interaction between the architecture and the application. The second contribution is an analysis identifying the challenges of a hybrid and heterogeneous implementation of FDTD for manycore architectures. We target Intel's first Xeon Phi co-processor, the Knights Corner; this architecture is an interesting proxy for our study since it contains some of the expected features of an Exascale system: concurrency and heterogeneity. My third contribution extends the performance analysis and modeling to the full RTM, adding communications and IOs to the computation part. RTM is a data-intensive application and requires the storage of intermediate values of the computational field, resulting in expensive IO accesses. My fourth contribution is the final measurement and model validation of my hybrid RTM implementation on a large system. This was done on Stampede, a machine of the Texas Advanced Computing Center (TACC), which allowed us to test scalability up to 64 nodes, each containing one 61-core Xeon Phi and two 8-core CPUs, for a total of close to 5000 heterogeneous cores.
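The computational core being modeled is a stencil update; a minimal 1-D acoustic FDTD kernel is sketched below. Production RTM kernels are 3-D with high-order stencils, which is precisely what makes their DRAM traffic worth modeling, but the structure is the same.

```python
# 2nd-order-in-time, 2nd-order-in-space acoustic wave propagation in 1-D.
import numpy as np

nx, nt = 200, 500
c, dx, dt = 1500.0, 5.0, 1e-3          # wave speed (m/s), grid and time steps
assert c * dt / dx <= 1.0              # CFL stability condition

prev = np.zeros(nx)
curr = np.zeros(nx)
curr[nx // 2] = 1.0                    # impulsive point source at the center

coef = (c * dt / dx) ** 2
for _ in range(nt):
    lap = np.zeros(nx)
    lap[1:-1] = curr[2:] - 2 * curr[1:-1] + curr[:-2]   # discrete Laplacian
    prev, curr = curr, 2 * curr - prev + coef * lap     # leapfrog update
print(curr.max())
```

Every time step streams through full arrays while doing only a handful of flops per point, which is why performance models for such kernels center on memory traffic rather than arithmetic.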
16

Merchant, Farhad. "Algorithm-Architecture Co-Design for Dense Linear Algebra Computations." Thesis, 2015. http://etd.iisc.ernet.in/2005/3958.

Abstract:
Achieving high computation efficiency, in terms of Cycles Per Instruction (CPI), for high-performance computing kernels is an interesting and challenging research area. Dense Linear Algebra (DLA) computation is a representative high-performance computing application, used, for example, in LU and QR factorizations. Unfortunately, modern off-the-shelf microprocessors fall significantly short of achieving the theoretical lower bound in CPI for high-performance computing applications. In this thesis, we perform an in-depth analysis of the available parallelism and propose suitable algorithmic and architectural variations to significantly improve computation efficiency. There are two standard approaches for improving computation efficiency: first, application-specific architecture customization, and second, algorithmic tuning. In the same manner, we first perform a graph-based analysis of selected DLA kernels. From the various forms of parallelism thus identified, we design a custom processing element for improving the CPI. The processing elements are used as building blocks for a commercially available Coarse-Grained Reconfigurable Architecture (CGRA). By performing detailed experiments on a synthesized CGRA implementation, we demonstrate that our proposed algorithmic and architectural variations are able to achieve lower CPI compared to off-the-shelf microprocessors. We also benchmark against state-of-the-art custom implementations and report a higher energy-performance-area product. DLA computations are encountered in many engineering and scientific computing applications, ranging from Computational Fluid Dynamics (CFD) to the eigenvalue problem. Traditionally, these applications are written in highly tuned High Performance Computing (HPC) software packages like the Linear Algebra Package (LAPACK) and/or the Scalable Linear Algebra Package (ScaLAPACK), whose basic building block is the Basic Linear Algebra Subprograms (BLAS); algorithms in LAPACK/ScaLAPACK are written in terms of BLAS to achieve high throughput. Despite extensive intellectual efforts in the development and tuning of these packages, there still exists scope for further tuning. In this thesis, we revisit prominent and widely used compute-bound algorithms like GMM for further exploitation of Instruction Level Parallelism (ILP), and we look into LU and QR factorizations for generalizations that exhibit higher ILP. We first accelerate the sequential performance of the algorithms in BLAS and LAPACK and then focus on their parallel realization. The major contributions in algorithmic tuning are as follows:
- We present a graph-based analysis of General Matrix Multiplication (GMM) and discuss the different types of parallelism available in GMM.
- We present an analysis of Givens Rotation (GR) based QR factorization, where we improve GR and derive Column-wise GR (CGR), which can annihilate multiple elements of a column of a matrix simultaneously. We show that the multiplications in CGR are fewer than in GR.
- We generalize CGR further and derive Generalized GR (GGR), which can annihilate multiple elements of several columns of a matrix simultaneously. We show that the parallelism exhibited by GGR is much higher than that of GR and the Householder Transform (HT).
- We extend the generalizations to Square-root Free GR (also known as Fast Givens Rotation) and Square-root and Division Free GR (SDFG), and derive Column-wise Fast Givens and Column-wise SDFG.
- We also extend the generalization to complex matrices and derive Complex Column-wise Givens Rotation.
Coarse-Grained Reconfigurable Architectures (CGRAs) have gained popularity in the last decade due to their power and area efficiency. Furthermore, CGRAs like REDEFINE also exhibit support for domain customization. REDEFINE is an array of Tiles, where each Tile consists of a Compute Element and a Router. The Routers are responsible for on-chip communication, while the Compute Elements in REDEFINE can be domain-customized to accelerate applications pertaining to the domain of interest. In this thesis, we take the REDEFINE base architecture as a starting point and design a Processing Element (PE) that can execute algorithms in BLAS and LAPACK efficiently. We perform several architectural enhancements in the PE to approach the lower bound of the CPI. For parallel realization of BLAS and LAPACK, we attach this PE to the Router of REDEFINE, achieving better area and power performance compared to yesteryear's customized architectures for DLA. The major contributions in architecture are as follows:
- We present the design of a PE for acceleration of GMM, a Level-3 BLAS operation.
- We methodically enhance the PE with different features to improve the performance of GMM.
- For efficient realization of LAPACK, we use a PE that can efficiently execute GMM and show better performance.
- For further acceleration of LU and QR factorizations in LAPACK, we identify macro operations encountered in those factorizations and realize them on a reconfigurable data-path, resulting in 25-30% lower run-time.
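For reference, classical Givens-rotation QR is sketched below: each 2x2 rotation annihilates one subdiagonal entry, and the sequential dependency between rotations within a column is what CGR/GGR restructure so that multiple entries can be annihilated simultaneously. This is the baseline GR scheme only, not the thesis's variants.

```python
# Givens QR: zero A[i, j] by rotating rows (i-1, i); accumulate Q so Q @ R == A.
import numpy as np

def givens_qr(A):
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.eye(m)
    for j in range(n):
        for i in range(m - 1, j, -1):          # walk up column j
            a, b = A[i - 1, j], A[i, j]
            r = np.hypot(a, b)
            if r == 0.0:
                continue
            c, s = a / r, b / r
            G = np.array([[c, s], [-s, c]])    # 2x2 rotation
            A[i - 1:i + 1, :] = G @ A[i - 1:i + 1, :]
            Q[:, i - 1:i + 1] = Q[:, i - 1:i + 1] @ G.T
    return Q, A                                # A is now upper triangular R

A = np.random.rand(5, 3)
Q, R = givens_qr(A)
print(np.allclose(Q @ R, A), np.allclose(np.tril(R, -1), 0))  # True True
```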
17

Jiang, Zhewei. "Algorithm and Hardware Co-Design for Local/Edge Computing." Thesis, 2020. https://doi.org/10.7916/d8-nxwg-f771.

Abstract:
Advances in VLSI manufacturing and design technology over the decades have created many computing paradigms for disparate computing needs. With concerns about the transmission cost, security, and latency of centralized computing, edge/local computing is increasingly prevalent in fast-growing sectors like the Internet of Things (IoT) and in sectors that require energy/connectivity-autonomous systems, such as biomedical and industrial applications. Energy and power efficiency are the main design constraints in local and edge computing. While there exists a wide range of low-power design techniques, they are often underutilized in custom circuit designs because the algorithms are developed independently of the hardware. Such a compartmentalized design approach fails to take advantage of the many compatible algorithmic and hardware techniques that can improve the efficiency of the entire system. Algorithm-hardware co-design explores the design space with whole-stack awareness. The main goal of the algorithm-hardware co-design methodology is the enablement and improvement of small-form-factor edge and local VLSI systems operating under strict constraints of area and energy efficiency. This thesis presents selected works of application-specific digital and mixed-signal integrated circuit design. The application space ranges from implantable biomedical devices to edge machine learning acceleration.
18

"Algorithm Architecture Co-design for Dense and Sparse Matrix Computations." Master's thesis, 2018. http://hdl.handle.net/2286/R.I.51737.

Abstract:
With the end of Dennard scaling and Moore's law, architects have moved towards heterogeneous designs consisting of specialized cores to achieve higher performance and energy efficiency for a target application domain. Applications of linear algebra are ubiquitous in scientific computing, machine learning, statistics, etc., with matrix computations being fundamental to these linear algebra based solutions. Designing multiple dense (or sparse) matrix computation routines on the same platform is quite challenging. Added to the complexity is the fact that dense and sparse matrix computations have large differences in their storage and access patterns and are difficult to optimize on the same architecture. This thesis addresses this challenge and introduces a reconfigurable accelerator that supports both dense and sparse matrix computations efficiently. The reconfigurable architecture has been optimized to execute the following linear algebra routines: GEMV (Dense General Matrix Vector Multiplication), GEMM (Dense General Matrix Matrix Multiplication), TRSM (Triangular Matrix Solver), LU Decomposition, Matrix Inverse, SpMV (Sparse Matrix Vector Multiplication), and SpMM (Sparse Matrix Matrix Multiplication). It is a multicore architecture where each core consists of a 4x4 2D array of processing elements (PEs), scheduled to perform 4x4-sized matrix updates efficiently; a sequence of such updates is used to solve a larger problem inside a core. A novel partitioned block compressed sparse data structure (PBCSC/PBCSR) is used to perform sparse kernel updates. Scalable partitioning and mapping schemes are presented that map input matrices of any given size onto the multicore architecture. Design trade-offs related to the PE array dimension, the size of the local memory inside a core, and the bandwidth between on-chip memories and the cores are presented, and an optimal core configuration is developed from this analysis. Synthesis results using a 7nm PDK show that the proposed accelerator can achieve a performance of up to 32 GOPS using a single core.
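The flavor of a partitioned block-compressed sparse format can be conveyed with a toy 4x4-block SpMV: nonzeros are stored as dense tiles, so every tile feeds a 4x4 PE array one dense update. The layout below is invented and far smaller than anything realistic; it is not the PBCSC/PBCSR format itself.

```python
# Block-CSR style SpMV with 4x4 tiles, checked against a dense reference.
import numpy as np

B = 4
blocks = np.random.rand(2, B, B)   # two nonzero 4x4 tiles of an 8x8 matrix
block_col = [1, 0]                 # tile 0 sits at block-col 1, tile 1 at 0
row_ptr = [0, 1, 2]                # block-rows 0 and 1 own one tile each

x = np.random.rand(8)
y = np.zeros(8)
for br in range(len(row_ptr) - 1):
    for k in range(row_ptr[br], row_ptr[br + 1]):
        bc = block_col[k]
        y[br * B:(br + 1) * B] += blocks[k] @ x[bc * B:(bc + 1) * B]

A = np.zeros((8, 8))               # equivalent dense matrix for verification
A[0:4, 4:8], A[4:8, 0:4] = blocks[0], blocks[1]
print(np.allclose(y, A @ x))       # True
```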
19

"Algorithm and Hardware Co-design for Learning On-a-chip." Doctoral diss., 2017. http://hdl.handle.net/2286/R.I.45949.

Abstract:
Machine learning technology has made many incredible achievements in recent years, rivalling or exceeding human performance in many intellectual tasks including image recognition, face detection and the game of Go. Many machine learning algorithms require huge amounts of computation, such as the multiplication of large matrices. As silicon technology has scaled to the sub-14nm regime, simply scaling down the device cannot provide enough speed-up any more; new device technologies and system architectures are needed to improve computing capacity, and designing specific hardware for machine learning is in high demand. Effort needs to be put into the joint design and optimization of both hardware and algorithms. For machine learning acceleration, traditional SRAM- and DRAM-based systems suffer from low capacity, high latency, and high standby power. Instead, emerging memories, such as Phase Change Random Access Memory (PRAM), Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM), and Resistive Random Access Memory (RRAM), are promising candidates providing low standby power, high data density, fast access and excellent scalability. This dissertation proposes a hierarchical memory modeling framework and models PRAM and STT-MRAM at four different levels of abstraction. With the proposed models, various simulations are conducted to investigate performance, optimization, variability, reliability, and scalability. Emerging memory devices such as RRAM can work as a 2-D crosspoint array to speed up the multiplication and accumulation in machine learning algorithms. This dissertation proposes a new parallel programming scheme to achieve in-memory learning with an RRAM crosspoint array. The programming circuitry is designed and simulated in TSMC 65nm technology, showing a 900X speedup for the dictionary learning task compared to CPU performance. From the algorithm perspective, inspired by the high accuracy and low power of the brain, this dissertation proposes a bio-plausible feedforward inhibition spiking neural network with a Spike-Rate-Dependent Plasticity (SRDP) learning rule. It achieves more than 95% accuracy on the MNIST dataset, which is comparable to the sparse coding algorithm but requires far fewer computations. The role of inhibition in this network is systematically studied and shown to improve the hardware efficiency of learning.
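The crosspoint-array speedup rests on a simple physical observation: with conductances G programmed into the array and read voltages v applied to the rows, the column currents are i = Gᵀv, so a single analog read performs an entire matrix-vector multiply. An idealized sketch, ignoring wire resistance, device variation, and ADC cost:

```python
# Ideal crosspoint-array multiply-accumulate: one "read" = one mat-vec.
import numpy as np

G = np.random.uniform(1e-6, 1e-4, size=(128, 64))  # cell conductances (S)
v = np.random.uniform(0.0, 0.2, size=128)          # row read voltages (V)
i_out = G.T @ v                                    # column currents (A)
print(i_out.shape)                                 # (64,): one MAC per column
```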
20

Lin, Yin-Hsin, and 林殷旭. "Hardware-Software Co-design of an Automatic White Balance Algorithm." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/b4636z.

Abstract:
Master's thesis, National Taipei University of Technology, Institute of Computer and Communication, ROC academic year 94 (2005-2006).
As electronic techniques continue to improve rapidly, the cameras and video camcorders used for image capture have become digital. The colors in a photograph can look very different depending on the illumination of the light source when the picture is taken. Human eyes automatically adjust to color as the illumination of the light source varies; however, the most frequently used image sensor, the charge-coupled device (CCD), cannot correct color the way human eyes do. This thesis presents a hardware-software co-design method based on Lam's automatic white balance algorithm, which combines the gray world assumption and perfect reflector assumption algorithms. The execution of Lam's algorithm was divided into three stages, and a hardware-software co-design and analysis was realized for each stage. Three factors, processing time and the slices and DSP48s of hardware resources, were used to formulate an objective function, which was employed to evaluate system performance and hardware resource cost. Experimental results show that suitable hardware-software partitions were achieved. An embedded processor, Xilinx's MicroBlaze, together with a floating-point unit, handles the software part of the algorithm, while the hardware part is implemented using an IP-based method. Such a system-on-a-programmable-chip architecture reduces the memory and CPU resources required of the PC and offers easy modification and function expansion.
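Of the two assumptions Lam's algorithm combines, the gray-world step is easy to sketch: scale the red and blue channels so their means match the green mean. The version below is a simplified stand-in, not Lam's combined method; a perfect-reflector variant would normalize by channel maxima instead of means.

```python
# Gray-world white balance: after correction the channel means are equal.
import numpy as np

def gray_world(img):
    """img: HxWx3 RGB array in [0, 1]; returns a white-balanced copy."""
    means = img.reshape(-1, 3).mean(axis=0)
    gains = means[1] / means            # anchor gains to the green channel
    return np.clip(img * gains, 0.0, 1.0)

cast = np.array([1.2, 1.0, 0.7])       # simulate a warm color cast
img = np.clip(np.random.rand(4, 4, 3) * cast, 0.0, 1.0)
balanced = gray_world(img)
print(balanced.reshape(-1, 3).mean(axis=0))  # roughly equal channel means
```

Stages of exactly this kind (channel statistics, gain computation, per-pixel correction) are the sort of work the thesis partitions between MicroBlaze software and hardware IP.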
21

Chundi, Pavan Kumar. "Algorithm Hardware Co-Design of Neural Networks for Always-On Devices." Thesis, 2021. https://doi.org/10.7916/d8-xb06-4658.

Abstract:
Deep learning has become the algorithm of choice in many applications like face recognition, object detection, and speech recognition because of its superior accuracy. Large models with many parameters were developed to obtain higher accuracy, which eventually gave diminishing returns at very large training and deployment cost. Consequently, greater attention is now being paid to the efficiency of neural networks. Low power consumption is particularly important in the case of always-on applications. Examples of such applications are datacenters, cellular base stations, and battery-powered devices like implantable devices, wearables, cell phones and UAVs. Improving the efficiency of these devices by reducing the power consumed brings down the energy cost, extends the battery life, or decreases the form factor, thereby improving the acceptability and adoption of the device. Neural networks are a significant component of the total workload of IoT devices with smart functions and of datacenters; base stations can also employ neural networks to improve the rate of convergence in channel estimation. Efficient execution of neural networks on always-on devices therefore helps lower the overall power dissipation. Algorithm-only solutions target CPUs or GPUs as a platform and tend to focus on the number of computing operations. Hardware-only solutions tend to focus on programmability, low-voltage operation, standby power reduction, and on-chip data movement. Both fail to take advantage of the joint optimization of algorithm and hardware for the target application. This thesis contributes to improving the efficiency of neural networks on always-on devices through both algorithmic and hardware interventions. It presents works of algorithm-hardware co-design which obtain better power reduction in the case of a smart IoT device, a datacenter, and a small-cell base station. It achieves power reduction through a combination of an appropriate neural network algorithm and architecture, simpler operations, and a reduction in the number of off-chip memory accesses.
APA, Harvard, Vancouver, ISO, and other styles
22

Chang, Yen-Sheng. "An Architectural Co-Synthesis Algorithm for Energy-Aware Network-on-Chip Design." 2005. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0001-2707200517413200.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Chang, Yen-Sheng (張延聖). "An Architectural Co-Synthesis Algorithm for Energy-Aware Network-on-Chip Design." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/64366099628414404685.

Full text
Abstract:
Master's thesis
National Taiwan University
Graduate Institute of Computer Science and Information Engineering
ROC academic year 93 (2004-05)
Network-on-Chip (NoC) has been proposed as a practical development platform for future system-on-chip products, reducing interconnection delay and improving performance. In this thesis, we propose an energy-aware algorithm that simultaneously synthesizes the hardware and software architectures of an NoC-based system to meet a performance constraint while minimizing total energy cost. The hardware architecture of the synthesized system consists of an NoC platform and a set of processing elements (PEs) of multiple types; the software architecture consists of the allocation of tasks to PEs, the topological mapping of PEs onto the NoC architecture, and a static schedule for the task set. As the main contribution, we first formulate the architectural co-synthesis problem with HW/SW co-design for a heterogeneous NoC platform and then propose an effective and efficient simulated-annealing-based (SA-based) algorithm to solve it. With the aid of this framework, the designer can explore hardware and software architectures simultaneously to find a system-wide energy-minimal hardware configuration, along with the corresponding software architecture, under tight performance constraints.
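To make the SA-based co-synthesis concrete, here is a minimal task-allocation-only sketch in Python; the penalty weight, cooling schedule, and cost tables are illustrative assumptions, and the thesis's full algorithm also explores topology mapping and static scheduling moves:

    import math, random

    def sa_cosynthesis(n_tasks, n_pes, energy, time_cost, deadline,
                       iters=20000, t0=1.0, alpha=0.9995):
        # energy[i][p], time_cost[i][p]: hypothetical cost of task i on PE p.
        def cost(assign):
            e = sum(energy[i][assign[i]] for i in range(n_tasks))
            load = [0.0] * n_pes
            for i in range(n_tasks):
                load[assign[i]] += time_cost[i][assign[i]]
            # large penalty steers the search back under the performance constraint
            return e + 1000.0 * max(0.0, max(load) - deadline)

        assign = [random.randrange(n_pes) for _ in range(n_tasks)]
        cur = cost(assign)
        best, best_cost, temp = assign[:], cur, t0
        for _ in range(iters):
            i, p = random.randrange(n_tasks), random.randrange(n_pes)
            old = assign[i]
            assign[i] = p
            new = cost(assign)
            # downhill moves always accepted; uphill with Boltzmann probability
            if new <= cur or random.random() < math.exp((cur - new) / temp):
                cur = new
                if new < best_cost:
                    best, best_cost = assign[:], new
            else:
                assign[i] = old
            temp *= alpha
        return best, best_cost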
APA, Harvard, Vancouver, ISO, and other styles
24

Yang, Chieh-Chao (楊傑超). "Low Power Algorithm-Architecture Co-Design of Fast Independent Component Analysis (ICA) for Multi-Gas Sensor Applications." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/46856389350093921149.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Department of Electronics Engineering and Institute of Electronics
ROC academic year 102 (2013-14)
The multi-gas sensor systems developed in recent years are large and consume considerable energy, which makes them hard to make portable. With emerging 3D-IC techniques, however, the sensor system can be integrated into mobile applications. The sensor system is composed of an analog front end (AFE), an ADC, digital processing, and wireless transmission. We designed the digital processing part, optimizing and improving an Independent Component Analysis (ICA) system that recovers the mixed signals received by the multi-gas sensor into the original source signals. The aim of this work is to co-design a low-power FastICA system, optimized at both the algorithm and architecture level, and implement it on an FPGA. Systems of different sizes are analyzed and compared to trade off power, delay, and the accuracy of the extracted signals. Several novel features are included in the design to meet the requirements and optimize gas-signal extraction. The system starts processing before all signals have arrived, which speeds up signal processing. In addition, a stability check lets the system enter a clock-gated termination mode once the gas signal is stable. As a result, both power and area are reduced, making the portable multi-gas sensor concept, even an electronic nose, feasible.
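A textbook one-unit FastICA iteration with deflation captures the fixed-point update such a system implements; this is a generic sketch (tanh nonlinearity, whitened input assumed), with the convergence test playing the role of the thesis's stability check:

    import numpy as np

    def fast_ica(X, n_components, max_iter=200, tol=1e-5):
        # X: (n_channels, n_samples), assumed centered and whitened.
        n, _ = X.shape
        W = np.zeros((n_components, n))
        for k in range(n_components):
            w = np.random.randn(n)
            w /= np.linalg.norm(w)
            for _ in range(max_iter):
                wx = w @ X
                g, g_prime = np.tanh(wx), 1.0 - np.tanh(wx) ** 2
                w_new = (X * g).mean(axis=1) - g_prime.mean() * w  # fixed point
                for j in range(k):  # deflation: stay orthogonal to found rows
                    w_new -= (w_new @ W[j]) * W[j]
                w_new /= np.linalg.norm(w_new)
                converged = abs(abs(w_new @ w) - 1.0) < tol  # stability check
                w = w_new
                if converged:
                    break
            W[k] = w
        return W @ X  # estimated source signals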
APA, Harvard, Vancouver, ISO, and other styles
25

Peng, Jr-Shiang (彭志祥). "Hardware and Software Co-design of Silicon Intellectual Property Module Based on Sequential Minimal Optimization algorithm for Speaker Recognition." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/72913970118404970293.

Full text
Abstract:
Master's thesis
National Cheng Kung University
Department of Electrical Engineering (Master's and Doctoral Program)
ROC academic year 98 (2009-10)
This thesis proposes a hardware/software co-design IP for an embedded text-independent speaker recognition system, enabling convenient portable speech applications. In the hardware part, the Sequential Minimal Optimization (SMO) algorithm is adopted to accelerate SVM training for creating speaker models. In the software part, we adapt our lab's previous fixed-point arithmetic design for both the Linear Prediction Cepstral Coefficients (LPCC) and the one-vs-one highest-voting analysis algorithm. Two schemes, a heuristic selection scheme and an efficient cache utilization method, are proposed to implement the SMO algorithm in hardware and decrease the training time. Moreover, a dedicated design efficiently utilizes the bus bandwidth and reduces software-hardware communication time by about 5%. Finally, our simulation/emulation results show that training time is reduced by 90% while the recognition accuracy reaches 92.7%.
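For reference, the simplified random-pair variant of SMO (after Platt) shows the core pairwise update that such an accelerator implements; the thesis's heuristic pair selection and cache scheme replace the random choice in this sketch:

    import numpy as np

    def smo_train(X, y, C=1.0, tol=1e-3, max_passes=5):
        # Simplified SMO for a linear-kernel SVM; y must be in {-1, +1}.
        n = X.shape[0]
        K = X @ X.T
        alpha, b, passes = np.zeros(n), 0.0, 0
        while passes < max_passes:
            changed = 0
            for i in range(n):
                Ei = (alpha * y) @ K[:, i] + b - y[i]
                if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                    j = np.random.choice([k for k in range(n) if k != i])
                    Ej = (alpha * y) @ K[:, j] + b - y[j]
                    ai, aj = alpha[i], alpha[j]
                    if y[i] != y[j]:
                        L, H = max(0, aj - ai), min(C, C + aj - ai)
                    else:
                        L, H = max(0, ai + aj - C), min(C, ai + aj)
                    eta = 2 * K[i, j] - K[i, i] - K[j, j]
                    if L == H or eta >= 0:
                        continue
                    alpha[j] = np.clip(aj - y[j] * (Ei - Ej) / eta, L, H)
                    if abs(alpha[j] - aj) < 1e-5:
                        continue
                    alpha[i] = ai + y[i] * y[j] * (aj - alpha[j])
                    b1 = b - Ei - y[i] * (alpha[i] - ai) * K[i, i] \
                         - y[j] * (alpha[j] - aj) * K[i, j]
                    b2 = b - Ej - y[i] * (alpha[i] - ai) * K[i, j] \
                         - y[j] * (alpha[j] - aj) * K[j, j]
                    b = b1 if 0 < alpha[i] < C else (b2 if 0 < alpha[j] < C else (b1 + b2) / 2)
                    changed += 1
            passes = passes + 1 if changed == 0 else 0
        return alpha, b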
APA, Harvard, Vancouver, ISO, and other styles
26

Fang, Jia-Wei (方家偉). "Routing Algorithms for Chip-Package-Board Co-Design." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/31233120502420379059.

Full text
Abstract:
Doctoral dissertation
National Taiwan University
Graduate Institute of Electronics Engineering
ROC academic year 97 (2008-09)
In VLSI design, nanometer effects have complicated the design of chips as well as packages and printed circuit boards. Further, due to the higher functionality of modern circuits, the number of I/Os has increased dramatically. To improve the routability, performance, and convergence of a design, industry strongly recommends two advanced packaging technologies, ball-grid-array packaging and flip-chip packaging, together with chip-package-board co-design. In this dissertation, we present the first routing algorithms in the literature for chip-package-board co-design based on these two advanced packages. They can be applied to complete (1) the routing in packages and printed circuit boards, and they can also handle (2) chip-package co-design, (3) package-board co-design, and (4) chip-package-board co-design. For routing in packages and printed circuit boards, our algorithms adopt a two-stage technique of global routing followed by detailed routing. In the global routing, computational geometry techniques (e.g., the Delaunay triangulation and the Voronoi diagram), a minimum-cost maximum-flow network algorithm, and integer and linear programming are used to find an optimal global-routing wirelength for the addressed problems. Since we consider wire congestion in our global-routing networks, the detailed routing can generate a 100% routable sequence to complete the routing. For chip-package co-design, an I/O netlist between a chip and a package can be generated simultaneously with the package layout, reducing the total wirelength. By considering package-board co-design, routing information from the chip and the printed circuit board can be kept during the package routing, improving routability. In chip-package-board co-design, thanks to the greater design flexibility, we can additionally consider the I/O planning of a package besides the package routing; hence, the design cost can be further reduced at an early stage, with much shorter total wirelength and higher routability. Experimental results based on real industry designs show that our routing algorithms achieve 100% routability and the optimal global-routing wirelength and satisfy all design constraints within reasonable CPU times, whereas recent related work yields much inferior solution quality.
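The minimum-cost maximum-flow step of the global routing can be illustrated with a toy pin-to-ball assignment; the graph construction and cost model below are our simplification (Manhattan distance, unit capacities), using the networkx library:

    import networkx as nx

    def assign_pins_to_balls(pins, balls):
        # pins, balls: dicts name -> (x, y). One unit of flow per routed net.
        G = nx.DiGraph()
        for p, (px, py) in pins.items():
            G.add_edge("S", p, capacity=1, weight=0)
            for q, (qx, qy) in balls.items():
                dist = abs(px - qx) + abs(py - qy)  # wirelength estimate
                G.add_edge(p, q, capacity=1, weight=int(10 * dist))
        for q in balls:
            G.add_edge(q, "T", capacity=1, weight=0)
        flow = nx.max_flow_min_cost(G, "S", "T")  # minimize total wirelength
        return {p: q for p in pins for q in balls if flow[p].get(q, 0) > 0}

In the dissertation, edge capacities additionally model wire congestion between routing regions, which is what lets the detailed router achieve 100% completion.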
APA, Harvard, Vancouver, ISO, and other styles
27

Hung, Wei-Hsuan. "Analysis of Co-Synthesis Algorithms for Energy-Aware NoC Design." 2007. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0001-3008200713534400.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Hsiao, Chin-Mu (蕭金木). "Hardware/Software Co-design of AES Algorithms Using Custom Instructions." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/35389142457501490628.

Full text
Abstract:
Master's thesis
Fu Jen Catholic University
Department of Electronic Engineering
ROC academic year 96 (2007-08)
The Advanced Encryption Standard (AES) is the new encryption standard appointed by NIST. Hardware implementation is needed to shorten the encryption/decryption time for large amounts of data, while a software-only implementation can meet low-cost requirements; how to balance cost and efficiency between software and hardware is a question worth discussing. In this paper, we implemented the AES encryption algorithm as hardware combined with software, using the custom instruction mechanism provided by the Altera Nios II platform. We completed a parameterized, synthesizable design: given a parameter setting, our system can generate the hardware design and the necessary software/hardware interface automatically. We explored various combinations of hardware and software for realizing the AES algorithm and discuss the best solutions for different needs.
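The design-space exploration the paper performs can be mimicked with a small cost-model script; all cycle and area numbers below are invented placeholders, and real values would come from profiling on Nios II and from synthesis reports:

    from itertools import product

    STAGES = ["SubBytes", "ShiftRows", "MixColumns", "AddRoundKey"]
    SW_CYCLES = {"SubBytes": 640, "ShiftRows": 96, "MixColumns": 480, "AddRoundKey": 64}
    HW_CYCLES = {"SubBytes": 16, "ShiftRows": 4, "MixColumns": 12, "AddRoundKey": 4}
    HW_AREA = {"SubBytes": 1200, "ShiftRows": 150, "MixColumns": 800, "AddRoundKey": 180}

    def explore(area_budget, rounds=10):
        # Enumerate all HW/SW partitions of the AES round stages and keep the
        # fastest one that fits the logic-area budget.
        best = None
        for mask in product([0, 1], repeat=len(STAGES)):  # 1 = custom instruction
            area = sum(HW_AREA[s] for s, m in zip(STAGES, mask) if m)
            if area > area_budget:
                continue
            cycles = rounds * sum((HW_CYCLES if m else SW_CYCLES)[s]
                                  for s, m in zip(STAGES, mask))
            if best is None or cycles < best[0]:
                best = (cycles, area,
                        {s: ("HW" if m else "SW") for s, m in zip(STAGES, mask)})
        return best

    print(explore(area_budget=2000))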
APA, Harvard, Vancouver, ISO, and other styles
29

Hung, Wei-Hsuan (洪緯軒). "Analysis of Co-Synthesis Algorithms for Energy-Aware NoC Design." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/70393998063052012030.

Full text
Abstract:
Master's thesis
National Taiwan University
Graduate Institute of Computer Science and Information Engineering
ROC academic year 95 (2006-07)
Network-on-Chip (NoC) has been proposed to overcome the complex on-chip communication problem of system-on-chip (SoC) design in the deep-submicron era. A complete NoC design explores both hardware and software architectures. The hardware architecture includes the selection of processing elements (PEs) of multiple types and their topology. The software architecture covers the allocation of tasks to PEs and the scheduling of tasks and their communications. To find the best hardware design for the target tasks, the hardware and software architectures must be considered simultaneously. Previous works on NoC design have proposed co-synthesis algorithms that minimize energy consumption while meeting the real-time requirements commonly seen in embedded applications. In this thesis, we compare the solution quality and running time of several types of co-synthesis algorithms, including a branch-and-bound algorithm, an iterative algorithm, and an SA-based algorithm.
APA, Harvard, Vancouver, ISO, and other styles
30

Weng, Chih-hsien (翁智賢). "Hardware/Software Co-design and Implementation of Algorithmic Processors for Image Processing." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/96720386726092132758.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
ROC academic year 96 (2007-08)
This thesis is related to hardware/software co-design and verification of the algorithmic processors for image processing. The research work includes four parts. The first part is about software design of the image processing algorithms such as center and size finding, translation, scaling, rotation, and projection. The second part is to design and implement hardware processors for the algorithms mentioned above. The third part is to write the related drivers to integrate the algorithmic processors and the verification system together. The fourth part is about the verification and performance test of the related algorithmic processors. On the whole, the goal of this thesis is to design and develop various algorithmic processors for image processing. Meanwhile, a hardware/software co-design method is presented to improve the efficiency of both the design and verification flows.
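One of the listed operations, center and size finding, takes only a few lines in software and makes a good sanity check for a hardware version; a generic sketch (names are ours):

    import numpy as np

    def center_and_size(binary):
        # binary: 2-D array of 0/1 pixels of a single object.
        ys, xs = np.nonzero(binary)
        if xs.size == 0:
            return None
        center = (xs.mean(), ys.mean())                            # centroid
        size = (xs.max() - xs.min() + 1, ys.max() - ys.min() + 1)  # bounding box
        return center, size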
APA, Harvard, Vancouver, ISO, and other styles
31

Hsu, Chih-hao (許志豪). "Hardware/Software Co-design and Implementation of an Algorithmic Processor for Image Binarization." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/94852945422796097338.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
ROC academic year 97 (2008-09)
This thesis is related to the hardware/software co-design and verification of an algorithmic processor for image binarization. The research work includes four parts. The first part is about the software design of various binarization algorithms for digital images. After analyzing the advantages and disadvantages of these algorithms, the modified Sauvola algorithm is chosen for hardware implementation. The second part is to design and implement a hardware processor for the modified Sauvola algorithm; in order to enhance the data transfer performance, a 2-D DMA controller has also been designed. Finally, the algorithmic processor and the 2-D DMA controller are integrated into an SOPC-based system and implemented on an Altera FPGA development board. The third part is to write the related drivers for the algorithmic processor; its function is then verified using an RPC-based verification system. The fourth part is about the verification and evaluation of the run-time performance of the algorithmic processor. On the whole, the goal of this thesis is to develop a binarization algorithm for digital images and implement the related algorithmic processor on the FPGA development board. After being verified with various digital images, the algorithm developed in this thesis has shown very good performance for image binarization. It also shows that the presented hardware/software co-design method can improve the efficiency of both the design and verification flows.
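The thesis's modification of Sauvola is not spelled out in the abstract, but the classic Sauvola rule it builds on is T = m * (1 + k * (s / R - 1)), with local mean m and standard deviation s; a direct (unoptimized) sketch:

    import numpy as np

    def sauvola_binarize(gray, window=15, k=0.2, R=128.0):
        # Naive per-pixel version; a real implementation (and the hardware one)
        # would use integral images to get m and s in constant time per pixel.
        gray = gray.astype(np.float64)
        pad = window // 2
        padded = np.pad(gray, pad, mode="reflect")
        H, W = gray.shape
        out = np.zeros((H, W), dtype=np.uint8)
        for y in range(H):
            for x in range(W):
                win = padded[y:y + window, x:x + window]
                m, s = win.mean(), win.std()
                T = m * (1.0 + k * (s / R - 1.0))
                out[y, x] = 255 if gray[y, x] > T else 0
        return out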
APA, Harvard, Vancouver, ISO, and other styles
32

Huang, Uao-Shine (黃耀陞). "Hardware/Software Co-design and Implementation of an Algorithmic Processor for Document Image Rotation." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/39920568792275923898.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
ROC academic year 98 (2009-10)
This thesis is related to the hardware/software co-design and verification of an algorithmic processor for binary document image rotation. The research work includes four parts. The first part is about the software design of the rotation algorithm for binary document images. After analyzing the advantages and disadvantages of candidate algorithms and considering the limited resources of the embedded hardware, a window-based rotation algorithm using inverse mapping and linear interpolation has been developed. The second part is to design and implement an algorithmic processor for this window-based rotation algorithm. Full binary document images are stored in DDR SDRAM, so the processor consists of a reference-region fetch unit, a rotation-interpolation unit, a destination-data store unit, and a DDR SDRAM controller. Finally, the above hardware modules are integrated into an SOPC-based system and implemented on an Altera FPGA development board. The third part is to write the related drivers for the algorithmic processor; its function is then verified using an RPC-based verification system. The fourth part is about the verification and evaluation of the run-time performance of the algorithmic processor. On the whole, the goal of this thesis is to develop a rotation algorithm for binary document images and implement the related algorithmic processor on the FPGA development board. After being verified with various images and rotation angles, the algorithm developed in this thesis has shown very good performance for binary document image rotation. It also shows that the presented hardware/software co-design method can improve the efficiency of both the design and verification flows.
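In software form, inverse-mapping rotation with bilinear interpolation looks as follows; this generic sketch omits the windowed DDR-SDRAM fetching that the hardware adds:

    import math
    import numpy as np

    def rotate_inverse_map(img, angle_deg):
        # For each destination pixel, map back to source coordinates and
        # bilinearly interpolate the four surrounding pixels.
        H, W = img.shape
        cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
        c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
        out = np.zeros_like(img, dtype=np.float64)
        for y in range(H):
            for x in range(W):
                sx = c * (x - cx) + s * (y - cy) + cx   # inverse rotation
                sy = -s * (x - cx) + c * (y - cy) + cy
                x0, y0 = int(math.floor(sx)), int(math.floor(sy))
                if 0 <= x0 < W - 1 and 0 <= y0 < H - 1:
                    fx, fy = sx - x0, sy - y0
                    out[y, x] = ((1 - fx) * (1 - fy) * img[y0, x0]
                                 + fx * (1 - fy) * img[y0, x0 + 1]
                                 + (1 - fx) * fy * img[y0 + 1, x0]
                                 + fx * fy * img[y0 + 1, x0 + 1])
        return out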
APA, Harvard, Vancouver, ISO, and other styles
33

Lin, Yi-hsien (林奕諴). "Hardware/Software Co-design and Implementation of an Algorithmic Processor for Document Skew Detection." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/36422924001553221768.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
ROC academic year 98 (2009-10)
This thesis is related to the hardware/software co-design and verification of an algorithmic processor for skew detection. The research work includes four parts. The first part is about the software design of various skew detection algorithms for binary document images. After analyzing the advantages and disadvantages of these algorithms, the MICC-Projection algorithm is developed to improve the correctness of skew detection. The second part is to design and implement an algorithmic processor for the MICC-Projection algorithm, which consists of MICC and projection sub-processors. The processor is integrated into an SOPC-based system and implemented on an Altera FPGA development board. The third part is to write the related drivers for the algorithmic processor; its function is then verified using an RPC-based verification system. The fourth part is about the verification and evaluation of the run-time performance of the algorithmic processor. On the whole, the goal of this thesis is to develop a skew detection algorithm for binary document images and implement the related algorithmic processor on the FPGA development board. After being verified with various binary document images, the algorithm developed in this thesis has shown very good performance for skew detection. It also shows that the presented hardware/software co-design method can improve the efficiency of both the design and verification flows.
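The details of MICC-Projection are specific to the thesis, but the projection half follows the classic profile criterion: rotate by candidate angles and keep the angle whose row projection is most peaked. A generic sketch using SciPy:

    import numpy as np
    from scipy.ndimage import rotate

    def detect_skew(binary, angles=np.arange(-15, 15.25, 0.25)):
        # binary: 2-D 0/1 document image. Higher projection variance means
        # the text rows line up better, i.e. the page is closer to deskewed.
        best_angle, best_score = 0.0, -1.0
        for a in angles:
            rotated = rotate(binary, a, reshape=False, order=0)
            profile = rotated.sum(axis=1)   # ink count per row
            score = np.var(profile)
            if score > best_score:
                best_angle, best_score = a, score
        return best_angle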
APA, Harvard, Vancouver, ISO, and other styles
34

Huang, Yin-hsiu (黃寅修). "Hardware/Software Co-design and Implementation of Algorithmic Processors for Boundary and Corner Detection." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/41126442183628616181.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
ROC academic year 96 (2007-08)
This thesis is related to the hardware/software co-design and verification of algorithmic processors for digital image processing. The research work includes three parts. The first part uses a Linux personal computer to design and verify the software for the boundary and corner detection algorithms. Here, boundary detection means marking the boundary points in a binary digital image, and corner detection means separating boundary points into several classes of features (i.e., concave, convex, and straight-line points) through operations such as path finding, computing the cosine value at a corner, and corner classification. The second part is about the design of the hardware and the software/hardware interface for the boundary and corner detection algorithmic processors. In this work, the processor hardware is implemented on an Altera FPGA development board, and the software/hardware interface is designed according to the NIOS II CPU bus standard. The third part uses a well-developed RPC-based embedded system for the verification and performance testing of the related algorithmic processors. On the whole, the goal of this thesis is to design and develop prototypes for the boundary and corner detection algorithmic processors. Meanwhile, a hardware/software co-design method is presented to improve the efficiency of both the design and verification flows.
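The cosine-based corner classification can be sketched directly; the step size k, the threshold, and the orientation convention (counter-clockwise boundary) below are our assumptions:

    import numpy as np

    def classify_boundary_points(boundary, k=5, straight_cos=-0.95):
        # boundary: ordered list of (x, y) points along a closed path.
        pts = np.asarray(boundary, dtype=np.float64)
        n, labels = len(pts), []
        for i in range(n):
            v1 = pts[(i - k) % n] - pts[i]   # vector to k steps back
            v2 = pts[(i + k) % n] - pts[i]   # vector to k steps ahead
            cosang = (v1 @ v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12)
            if cosang < straight_cos:        # nearly opposite: straight line
                labels.append("straight")
            else:
                cross = v1[0] * v2[1] - v1[1] * v2[0]
                labels.append("convex" if cross > 0 else "concave")
        return labels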
APA, Harvard, Vancouver, ISO, and other styles
35

Chabalenge, Billy, Sachin A. Korde, Adrian L. Kelly, Daniel Neagu, and Anant R. Paradkar. "Understanding matrix-assisted continuous co-crystallization using a data mining approach in Quality by Design (QbD)." 2020. http://hdl.handle.net/10454/17941.

Full text
Abstract:
The present study demonstrates the application of decision tree algorithms to the co-crystallization process. Fifty-four (54) batches of carbamazepine-salicylic acid co-crystals embedded in poly(ethylene oxide) were manufactured via hot melt extrusion and characterized by powder X-ray diffraction, differential scanning calorimetry, and near-infrared spectroscopy. This dataset was then analyzed in WEKA, an open-source machine learning toolkit, to study the effect of processing temperature, screw speed, screw configuration, and poly(ethylene oxide) concentration on the percentage of co-crystal conversion. The decision trees obtained provided statistically meaningful and easy-to-interpret rules, demonstrating the potential of the method to support rational decisions during the development of co-crystallization processes.
Commonwealth Scholarship Commission in the UK (ZMCS-2018-783) and Engineering and Physical Sciences Research Council (EPSRC EP/J003360/1 and EP/L027011/1)
The full text of this article will be released for public view at the end of the publisher embargo on 09 June 2021.
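The WEKA workflow can be reproduced in outline with any decision-tree library; the sketch below uses scikit-learn instead of WEKA's J48, and the six records are invented placeholders standing in for the 54 real batches:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Features: [temperature_C, screw_speed_rpm, screw_config_code, peo_pct]
    X = np.array([[140, 50, 0, 10], [150, 100, 1, 20], [160, 150, 2, 30],
                  [140, 150, 1, 10], [160, 50, 0, 30], [150, 100, 2, 20]])
    y = np.array(["low", "high", "high", "low", "high", "high"])  # conversion

    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    print(export_text(tree, feature_names=[
        "temperature", "screw_speed", "screw_config", "peo_conc"]))

The printed rules are the same kind of easy-to-interpret output the study obtained from WEKA.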
APA, Harvard, Vancouver, ISO, and other styles
36

Huang, Jiang-Shiuan (黃健軒). "Hardware/Software Co-design and Implementation of a Two-stage Algorithmic Processor for Hough-Transform-based Line Detection." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/chw2e4.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
ROC academic year 99 (2010-11)
This thesis is related to the hardware/software co-design and verification of an algorithmic processor for a two-stage Hough-Transform-based (HT-based) line detection algorithm. The related research work includes four parts. The first part is about the software design of the HT-based line detection algorithm for binary images. After analyzing the properties of the HT-based algorithm and considering the limited hardware resources of the embedded system, a two-stage HT-based line detection algorithm has been developed. The second part is to design and implement a two-stage algorithmic processor for HT-based line detection. SDRAM is used to store the whole binary image, so the processor consists of a source-data fetching sub-processor, a Hough transform sub-processor, and a local-maximum finding sub-processor. Finally, the above hardware modules are integrated into an SOPC-based system and implemented on an Altera FPGA development board. The third part is to write the related drivers for the algorithmic processor; its function is then verified using an RPC-based verification system. The fourth part is about the verification and evaluation of the run-time performance of the algorithmic processor. On the whole, the goal of this thesis is to develop an HT-based two-stage line detection algorithm and its hardware processor, implemented on the FPGA development board. After being verified with various images, the algorithm developed in this thesis has shown very good performance. It also shows that the presented hardware/software co-design method can improve the efficiency of both the design and verification flows.
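The two stages map naturally onto a voting pass and a local-maximum search over the accumulator; a generic software sketch of both (parameters are illustrative):

    import numpy as np

    def hough_lines(binary, n_theta=180, top_k=5):
        # Stage 1: vote every foreground pixel into a (rho, theta) accumulator.
        H, W = binary.shape
        diag = int(np.ceil(np.hypot(H, W)))
        thetas = np.deg2rad(np.arange(n_theta))
        acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int32)
        ys, xs = np.nonzero(binary)
        for x, y in zip(xs, ys):
            rhos = (x * np.cos(thetas) + y * np.sin(thetas)).astype(int) + diag
            acc[rhos, np.arange(n_theta)] += 1
        # Stage 2: repeatedly take the strongest cell and suppress its
        # neighborhood, mimicking the local-maximum finding sub-processor.
        lines, work = [], acc.copy()
        for _ in range(top_k):
            r, t = np.unravel_index(np.argmax(work), work.shape)
            lines.append((r - diag, float(np.rad2deg(thetas[t]))))
            work[max(0, r - 5):r + 6, max(0, t - 5):t + 6] = 0
        return lines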
APA, Harvard, Vancouver, ISO, and other styles
37

Hu, Hong-Min (胡閎閔). "Hardware/Software Co-design and Implementation of a Temporal-Median-Filter-based Algorithmic Processing System for Background Subtraction." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/26408138785003942689.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
ROC academic year 103 (2014-15)
This thesis is relevant to the hardware/software co-design and implementation of a temporal-median-filter-based algorithmic processing system for background subtraction. The research work consists of the following four parts. The first part is the software design of the temporal-median-filter-based background subtraction algorithm; through image-based output results, this algorithm has demonstrated its superiority in various applications. The second part is to design and implement a temporal-median-filter-based algorithmic processor for background subtraction. This algorithmic processor comprises three sub-processors for image information access, median finding, and background subtraction; all of these parts are integrated together and implemented on an Altera FPGA development board. The third part is the design and implementation of an algorithmic processing system comprising SDRAM (for storing multiple complete images), the algorithmic processor described above, the NIOS II CPU, and the related firmware; the functionality of this system is verified using the NIOS II IDE. The fourth part analyzes and evaluates the software, firmware, and hardware performance of the whole algorithmic processing system. On the whole, the goals of this thesis are to research a temporal-median-filter-based background subtraction algorithm and to design an algorithmic processing system for it on an Altera FPGA development board. After being verified with various kinds of digital images, the algorithmic processing system developed in this thesis has shown excellent computing performance, and the related hardware/software co-design method can also improve the efficiency of the design and verification process for other algorithmic processing systems.
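The core of the algorithm fits in a few lines of software; the frame depth and threshold below are illustrative parameters:

    import numpy as np
    from collections import deque

    class TemporalMedianBG:
        # Keep the last N grayscale frames; their per-pixel median is the
        # background estimate, and large deviations from it are foreground.
        def __init__(self, depth=9, thresh=30):
            self.frames = deque(maxlen=depth)
            self.thresh = thresh

        def apply(self, gray):
            self.frames.append(gray.astype(np.int16))
            background = np.median(np.stack(self.frames), axis=0)
            mask = np.abs(gray.astype(np.int16) - background) > self.thresh
            return mask.astype(np.uint8) * 255, background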
APA, Harvard, Vancouver, ISO, and other styles
38

Hsu, Bo-Hsiang (許博翔). "Hardware/Software Co-design and Implementation of a Multi-pixel-based Pipelined Algorithmic Processor for Single-pass-based Connected Component Labeling." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/uhe6rj.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
ROC academic year 100 (2011-12)
This thesis is relevant to the hardware/software co-design and verification of an algorithmic processor for single-pass connected component labeling. The research work consists of the following four parts. The first part focuses on the software design of the connected component labeling algorithms. After analyzing the characteristics of the computing results and considering the limited physical resources of embedded systems, single-pass connected component labeling algorithms have been developed. The second part focuses on the hardware design for these single-pass algorithms. A DDR SDRAM is used to store the whole binary input image and the coordinates of the bounding boxes of the labeled components. The algorithmic processor comprises four sub-processors: a table initializer, a labeler, a connected-component combinator, and a connected-component information retriever. Finally, these hardware designs are integrated together and implemented on an Altera FPGA development board. The third part focuses on writing the relevant drivers to construct a verification system for the algorithmic processor; the system is controlled through remote procedure calls to verify the functionality of the processor. The fourth part focuses on the verification and performance evaluation of the whole hardware and software of the algorithmic processor. Generally speaking, the goal of this thesis is to research single-pass connected component labeling algorithms and to design and implement algorithmic processors for them on the Altera FPGA development board. After verifying the algorithmic processors with various types of digital images, it has been shown that they deliver excellent computing performance. Meanwhile, this hardware/software co-design approach can also improve the efficiency of both the design and verification flows for algorithmic processors.
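A software analogue of single-pass labeling shows why no second relabeling pass over the image is needed: bounding boxes are accumulated during the one raster scan, and label equivalences are kept in a union-find table (4-connectivity only in this sketch; names are ours):

    import numpy as np

    def single_pass_ccl(binary):
        H, W = binary.shape
        labels = np.zeros((H, W), dtype=np.int32)
        parent, boxes, nxt = {}, {}, 1

        def find(a):                       # union-find with path compression
            while parent[a] != a:
                parent[a] = parent[parent[a]]
                a = parent[a]
            return a

        for y in range(H):
            for x in range(W):
                if not binary[y, x]:
                    continue
                neigh = [labels[y, x - 1] if x > 0 else 0,
                         labels[y - 1, x] if y > 0 else 0]
                neigh = [find(l) for l in neigh if l]
                if not neigh:
                    l = nxt
                    parent[l], boxes[l], nxt = l, [x, y, x, y], nxt + 1
                else:
                    l = min(neigh)
                    for m in neigh:        # record label equivalences
                        parent[m] = l
                labels[y, x] = l
                b = boxes[l]
                b[0], b[1] = min(b[0], x), min(b[1], y)
                b[2], b[3] = max(b[2], x), max(b[3], y)
        final = {}                         # fold boxes of merged labels together
        for l, b in boxes.items():
            r = find(l)
            if r in final:
                f = final[r]
                final[r] = [min(f[0], b[0]), min(f[1], b[1]),
                            max(f[2], b[2]), max(f[3], b[3])]
            else:
                final[r] = list(b)
        return final                       # root label -> [x0, y0, x1, y1]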
APA, Harvard, Vancouver, ISO, and other styles