Dissertations / Theses on the topic 'Software acceleration'
Consult the top 43 dissertations / theses for your research on the topic 'Software acceleration.'
Borgström, Fredrik. "Acceleration of FreeRTOS with Sierra RTOS accelerator : Implementation of a FreeRTOS software layer on Sierra RTOS accelerator." Thesis, KTH, Data- och elektroteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188518.
Today, the most common measures for improving the performance of embedded systems and real-time operating systems have very little effect. It is therefore interesting to investigate new measures for pushing the performance limits of embedded systems and real-time operating systems further. It has previously been shown that the hardware-based real-time operating system Sierra performs better than the software-based real-time operating system FreeRTOS. These real-time operating systems have also been shown to be similar in several respects, which means that Sierra can be used to accelerate FreeRTOS. In this thesis, such an acceleration has been implemented. Because existing real-time operating systems are under constant development, and because several years have passed since the two systems were last compared, FreeRTOS and Sierra were also compared in terms of functionality and structure in this thesis. The comparison showed that FreeRTOS and Sierra share the most fundamental functions of a real-time operating system, which can therefore be accelerated by Sierra, but that FreeRTOS also has a number of exclusive functions intended to make that real-time operating system easier to use. The information obtained from this comparison then formed the basis for how the acceleration itself was implemented. After a number of performance tests, it could be concluded that all implemented functions, with a few exceptions, had shorter execution times than the corresponding functions in the original version of FreeRTOS.
Kulkarni, Pallavi Anil. "Hardware acceleration of software library string functions." Ann Arbor, Mich. : ProQuest, 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:1447245.
Title from PDF title page (viewed Nov. 19, 2009). Source: Masters Abstracts International, Volume: 46-03, page: 1577. Adviser: Mitch Thornton. Includes bibliographical references.
Blumer, Aric David. "Register Transfer Level Simulation Acceleration via Hardware/Software Process Migration." Diss., Virginia Tech, 2007. http://hdl.handle.net/10919/29380.
Ph. D.
Samothrakis, Stavros Nikolaou. "Acceleration techniques in ray tracing for dynamic scenes." Thesis, University of Sussex, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.241671.
Singh, Ajeet. "GePSeA: A General-Purpose Software Acceleration Framework for Lightweight Task Offloading." Thesis, Virginia Tech, 2009. http://hdl.handle.net/10919/34264.
Consequently, this thesis proposes a framework called GePSeA (General Purpose Software Acceleration Framework), which uses a small fraction of the computational power on multi-core architectures to offload complex application-specific tasks. Specifically, GePSeA provides a lightweight process that acts as a helper agent to the application by executing application-specific tasks asynchronously and efficiently. GePSeA is not meant to replace hardware accelerators but to extend them. GePSeA provides several utilities called core components that offload tasks onto the core, or onto special-purpose hardware when available, in a way that is transparent to the application. Examples of such core components include a reliable communication service, distributed lock management, global memory management, dynamic load distribution and network protocol processing. We then apply the GePSeA framework to two applications: mpiBLAST, an open-source computational biology application, and a file transfer application based on Reliable Blast UDP (RBUDP). We observe significant speed-up for both applications.
Master of Science
Zhu, Huanzhou. "Developing graph-based co-scheduling algorithms with GPU acceleration." Thesis, University of Warwick, 2016. http://wrap.warwick.ac.uk/92000/.
Yalim, Hacer. "Acceleration Of Direct Volume Rendering With Texture Slabs On Programmable Graphics Hardware." Master's thesis, METU, 2005. http://etd.lib.metu.edu.tr/upload/12606195/index.pdf.
Sherban, V. Yu. "Software components of the system for the kinematic and dynamic analysis of machines for sewing, textile and shoe industries." Thesis, Київський національний університет технологій та дизайну, 2017. https://er.knutd.edu.ua/handle/123456789/6655.
Wang, Tsu-Han. "Real-time Software Architectures and Performance Evaluation Methods for 5G Radio Systems." Electronic Thesis or Diss., Sorbonne université, 2022. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2022SORUS362.pdf.
The thesis deals with 5G real-time Software Defined Radio architectures. To meet 5G performance requirements, computational acceleration combined with real-time process scheduling methods is required. In 5G embedded systems, acceleration amounts to a judicious combination of additional hardware units for the most computationally costly functions with software for simpler arithmetic and complex control procedures. Fully software-based solutions are also appearing for certain applications, in particular in the so-called Open Radio-Access Network (openRAN) ecosystem. The contributions of this thesis lie in methods for purely software-based acceleration and real-time control of low-latency fronthaul interfaces. Since 5G has stringent latency requirements and supports very high-speed data traffic, methods for scheduling baseband processing need to be tailored to the specifics of the air interface. Specifically, we propose a functional decomposition of the 5G air interface that is amenable to multi-core software implementations targeting high-end servers exploiting single-instruction multiple-data (SIMD) acceleration. Moreover, we provide some avenues for multi-threaded processing through pipelining and the use of thread pools. We highlight the methods, and their performance evaluation, that were exploited during the development of the OpenAirInterface 5G implementation.
Tell, Eric. "Design of Programmable Baseband Processors." Doctoral thesis, Linköping : Univ, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-4377.
Axillus, Viktor. "Comparing Julia and Python : An investigation of the performance on image processing with deep neural networks and classification." Thesis, Blekinge Tekniska Högskola, Institutionen för programvaruteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-19160.
Závodník, Tomáš. "Architektura pro rekonstrukci knihy objednávek s nízkou latencí." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255477.
Kekely, Lukáš. "Softwarově řízené monitorování síťového provozu." Doctoral thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2017. http://www.nusl.cz/ntk/nusl-412592.
David, Radu Alin. "Improving Channel Estimation and Tracking Performance in Distributed MIMO Communication Systems." Digital WPI, 2015. https://digitalcommons.wpi.edu/etd-dissertations/229.
Lee, Joo Hong. "Hybrid Parallel Computing Strategies for Scientific Computing Applications." Diss., Virginia Tech, 2012. http://hdl.handle.net/10919/28882.
Ph. D.
Agha, Shahrukh. "Software and hardware techniques for accelerating MPEG2 motion estimation." Thesis, Loughborough University, 2006. https://dspace.lboro.ac.uk/2134/33935.
Linford, John Christian. "Accelerating Atmospheric Modeling Through Emerging Multi-core Technologies." Diss., Virginia Tech, 2010. http://hdl.handle.net/10919/27599.
Ph. D.
Yu, Jason Kwok Kwun. "Vector processing as a soft-core processor accelerator." Thesis, University of British Columbia, 2008. http://hdl.handle.net/2429/2394.
Bashford-Rogers, Thomas. "Accelerating global illumination for physically-based rendering." Thesis, University of Warwick, 2011. http://wrap.warwick.ac.uk/36762/.
Kancharla, Akshitha, and Akhil Pannala. "Factors for Accelerating the Development Speed in Systems of Artificial Intelligence." Thesis, Blekinge Tekniska Högskola, Institutionen för programvaruteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-18420.
Woods, Andrew. "Accelerating software radio astronomy FX correlation with GPU and FPGA co-processors." Master's thesis, University of Cape Town, 2010. http://hdl.handle.net/11427/12212.
Includes bibliographical references (leaves [117]-121).
This thesis attempts to accelerate compute-intensive sections of a frequency-domain radio astronomy correlator using dedicated co-processors. Two co-processor implementations were made independently, one using reconfigurable hardware (Xilinx Virtex 4LX100) and the other a graphics processor (Nvidia 9800GT). The objective of a radio astronomy correlator is to compute the complex-valued correlation products for each baseline, which can be used to reconstruct the sky's radio brightness distribution. Radio astronomy correlators have huge computational demands, and this dissertation focuses on the computational aspects of correlation, concentrating on the X-engine stage of the correlator.
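The per-baseline correlation products mentioned in this abstract can be illustrated with a minimal pure-Python sketch. It is not the thesis's FPGA or GPU implementation; the function name `x_engine` and the `spectra[time][antenna][channel]` layout are assumptions made for this example, and the products are accumulated as `V_ij += X_i * conj(X_j)` for every antenna pair with `i <= j`.

```python
from itertools import combinations_with_replacement


def x_engine(spectra):
    """Accumulate correlation products for every baseline (antenna pair).

    spectra[t][a][c] is the complex frequency-domain sample for time step t,
    antenna a and frequency channel c (hypothetical layout for this sketch).
    Returns a dict mapping (i, j) with i <= j to a list of complex sums,
    one per channel.
    """
    n_ant = len(spectra[0])
    n_chan = len(spectra[0][0])
    # Include autocorrelations (i == j) as well as cross-correlations.
    baselines = list(combinations_with_replacement(range(n_ant), 2))
    vis = {b: [0j] * n_chan for b in baselines}
    for sample in spectra:
        for i, j in baselines:
            acc = vis[(i, j)]
            for c in range(n_chan):
                # Multiply by the complex conjugate and accumulate over time.
                acc[c] += sample[i][c] * sample[j][c].conjugate()
    return vis


# One time step, two antennas, one channel.
example = x_engine([[[1 + 1j], [2 + 0j]]])
```

For `n` antennas this produces `n * (n + 1) / 2` baselines, and every baseline and channel is independent of the others, which is why this stage lends itself to parallel co-processors.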
Enes, Petter. "Build and Release Management : Supporting development of accelerator control software at CERN." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2007. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-8708.
Software configuration management deals with control of the evolution of complex computer systems. The ability to handle changes, corrections and extensions is decisive for the outcome of a software project. Automated processes for handling these elements are therefore a crucial part of software development. This thesis focuses on build and release management in the context of developing a control system for the world's biggest particle accelerator. Build and release cover topics such as build support, versioning, dependency management and release management. The main part of the work has consisted of extending an in-house solution supporting the development process of accelerator control software at CERN. The main focus of this report is on the practical work done in this context. Based on a literature survey and an examination of available tools, this thesis presents the state of the art in build and release management before elaborating on the practical work. Based on the experience gained from this work, I conclude with a discussion of whether it is beneficial to stick with the in-house solution or whether switching to an external tool could prove better for the development process.
Motyka, Mikael. "Impact of Usability for Particle Accelerator Software Tools Analyzing Availability and Reliability." Thesis, Blekinge Tekniska Högskola, Institutionen för programvaruteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-14394.
The importance of considering usability in software development is well known in the literature. This non-functional system aspect focuses on the ease and efficiency of operating a system. However, the usability of a system cannot be defined as one specific system aspect, since it depends on the field of application. This work examines the impact of usability for tools used in the analysis of availability and reliability of particle accelerators by further developing the existing software Availsim. The software has been shown to be uniquely able to take into account accelerator-specific considerations that cannot be reproduced with the commercial tools available today. Despite its unique properties, the software is not in use. This is due to earlier modifications whose limitations allow the software to be used only at one specific facility. The study was carried out in collaboration with the European Spallation Source, ERIC. ESS is a multidisciplinary research facility based on the world's most powerful neutron source, currently under construction in Lund, Sweden. The work was carried out in the safety group within the accelerator division, where the analysis of the accelerator's availability and reliability is performed. Design Science Research was used as the research methodology to answer how the proposed software can help improve usability for the analysis in question, and to define the existing usability problems in the field. To obtain an overview of how the analysis is conducted today, three questionnaires were sent out and one interview was conducted in order to compile important properties to consider when developing the new software, together with how the researchers perceive usability for this type of analysis.
The developed software was evaluated with two standardized questionnaires aimed at measuring system usability, named "After Scenario Questionnaire" and "System Usability Scale". A third set of questions was also constructed to explicitly measure the important properties that emerged from the questionnaires and the interview. The results highlight problems in the field, listing the tools used in the analysis together with their positive and negative properties. These properties indicated a cumbersome and lengthy process for obtaining the desired analysis results. It was also found that the adapted Availsim version improves usability compared with earlier versions, by listing specific properties that could be identified as directly affecting how usability is perceived. The results also showed that the existing commercial tool Reliasoft scored higher on the standardized tests, which indicates room for improvement.
Khasymski, Aleksandr Sergeev. "Accelerated Storage Systems." Diss., Virginia Tech, 2015. http://hdl.handle.net/10919/51612.
Ph. D.
Alhamwi, Ali. "Co-design hardware/software of real time vision system on FPGA for obstacle detection." Thesis, Toulouse 3, 2016. http://www.theses.fr/2016TOU30342/document.
Obstacle detection, localization and occupancy map reconstruction are essential abilities for a mobile robot to navigate in an environment. Solutions based on passive monocular vision, such as simultaneous localization and mapping (SLAM) or optical flow (OF), require intensive computation. Systems based on these methods often rely on over-sized computation resources to meet real-time constraints. Inverse perspective mapping allows obstacle detection at a low computational cost under the hypothesis of a flat ground observed during motion. It is thus possible to build an occupancy grid map by integrating obstacle detection over the course of the sensor. In this work we propose a hardware/software system for obstacle detection, localization and 2D occupancy map reconstruction in real time. The proposed system uses an FPGA-based design for vision and proprioceptive sensors for localization. Fusing this information allows the construction of a simple model of the environment surrounding the sensor. The resulting architecture is a low-cost, low-latency, high-throughput and low-power system.
Magnuson, Martin. "Process Control Methods for Operation of Superconducting Cavities at the LEP Accelerator at CERN." Thesis, Linköpings universitet, Institutionen för fysik, kemi och biologi, 1992. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-56503.
Ouedraogo, Ganda Stéphane. "Automatic synthesis of hardware accelerator from high-level specifications of physical layers for flexible radio." Thesis, Rennes 1, 2014. http://www.theses.fr/2014REN1S183/document.
The Internet of Things (IoT) aims at connecting billions of communicating devices through an internet-like network. To this end, access to these things is expected to be performed via wireless technologies without using any predefined infrastructures or standards. This technology requires defining and implementing smart nodes capable of adapting to different radio communication protocols. In this thesis, we have defined a design methodology/flow for such smart nodes, starting from their high-level specification down to their implementation in FPGA fabrics. This flow aims at improving the programmability of the waveforms by leveraging high-level specifications. It thus relies on High-Level Synthesis (HLS) for rapid prototyping of the waveforms' functional blocks, as well as on the dataflow model of computation. Its entry point is a Domain-Specific Language that enables modeling a waveform while inserting implementation constraints for reconfigurable architectures such as FPGAs. The flow features a compiler whose purpose is to produce synthesis scripts and generate RTL source code. The final waveform consists of a datapath and a control unit implemented as a Hierarchical Finite State Machine (HFSM).
Silva, João Paulo Sá da. "Data processing in Zynq APSoC." Master's thesis, Universidade de Aveiro, 2014. http://hdl.handle.net/10773/14703.
Field-Programmable Gate Arrays (FPGAs) were invented by Xilinx in 1985, i.e. less than 30 years ago. The influence of FPGAs on many directions in engineering is growing continuously and rapidly. There are many reasons for such progress and the most important are the inherent reconfigurability of FPGAs and relatively cheap development cost. Recent field-configurable micro-chips combine the capabilities of software and hardware by incorporating multi-core processors and reconfigurable logic enabling the development of highly optimized computational systems for a vast variety of practical applications, including high-performance computing, data, signal and image processing, embedded systems, and many others. In this context, the main goals of the thesis are to study the new micro-chips, namely the Zynq-7000 family and to apply them to two selected case studies: data sort and Hamming weight calculation for long vectors.
Field-Programmable Gate Arrays (FPGAs) were invented by Xilinx in 1985, that is, less than 30 years ago. The influence of FPGAs is growing continuously and rapidly in many branches of engineering. There are several reasons for this evolution; the most important are their inherent reconfigurability and low development costs. The most recent FPGA-based micro-chips combine software and hardware capabilities by incorporating multi-core processors and reconfigurable logic, enabling the development of highly optimized computational systems for a wide variety of practical applications, including high-performance computing, data, signal and image processing, embedded systems, and many others. In this context, the main objective of this work is to study these new micro-chips, namely the Zynq-7000 family, in order to find the best ways to exploit the advantages of this system, using data sorting and Hamming weight calculation for long vectors as case studies.
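As a software illustration of the Hamming weight case study named in this abstract, the sketch below counts the set bits of a long vector stored as 32-bit words. It is a plain-Python sketch, not the thesis's Zynq design; the names `popcount32` and `hamming_weight` and the word-list representation are assumptions made for this example. The per-word count uses the well-known parallel bit-summing technique, which adds neighbouring bit fields in a tree, much as an adder tree would in hardware.

```python
def popcount32(x):
    """Count the set bits of a 32-bit word by summing bit fields in parallel."""
    x = x - ((x >> 1) & 0x55555555)                 # sums of adjacent bit pairs
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)  # sums within each nibble
    x = (x + (x >> 4)) & 0x0F0F0F0F                 # sums within each byte
    # Multiplying adds the four byte sums into the top byte of the low 32 bits.
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24


def hamming_weight(words):
    """Hamming weight of a long vector given as a list of 32-bit words."""
    return sum(popcount32(w & 0xFFFFFFFF) for w in words)


vector = [0xFFFFFFFF, 0x00000000, 0x00000001, 0x80000000]
weight = hamming_weight(vector)
```

Because each word is counted independently and the partial counts are only summed at the end, the words can be processed concurrently, which is the property a reconfigurable-logic implementation would exploit.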
Jönsson, Oscar. "An explorative study of the technology transfer coach as a preliminary for the design of a computer aid." Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-108308.
Yang, Fu-Kai, and 楊復凱. "Acceleration and Improvement of MPEG View Synthesis Reference Software on NVIDIA CUDA." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/62819489159539479165.
國立交通大學
電子研究所
100
With the prosperity of 3D technology, Free Viewpoint Television (FTV) has become a popular research topic. “View Synthesis” is a key step in FTV. Some important issues remain to be solved, such as real-time operation and complexity reduction. NVIDIA Compute Unified Device Architecture (CUDA) is an effective platform for handling data-intensive applications. To implement the MPEG view synthesis reference software (VSRS) on CUDA, we parallelize the VSRS structure. At the same time, our proposed parallel scheme improves the picture quality. We first propose an intra hole filling scheme to replace the original median filter. Then, to avoid data dependence, we partition the data so that it can be processed by parallel GPU threads. We also rearrange the data processing order in the threads to reduce branching instructions. Combining these techniques, we save more than 94% of the computing time and achieve similar image quality.
Wu, Jyun-Cheng, and 吳峻丞. "Design of a Real-time Software-Based GPS Baseband Receiver Using GPU Acceleration." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/21287776369433988238.
國立臺灣大學
電子工程學研究所
99
Personal navigation devices are increasingly popular, and demand for GPS receivers of every form is growing with them. As processor computation power increases, developing a GPS receiver in software has become feasible. Compared with the traditional hardware receiver, the software-based receiver has many advantages: it is far more flexible in system integration, upgrades, adoption of new algorithms and changes of platform. In this thesis, I improve a GPS software baseband receiver based on a previous student’s research. I address three issues: software robustness, efficiency and position accuracy. For software robustness, I use a dynamic satellite list so that the software receiver can change its list of available satellites based on the strength of the satellite signals, which makes the receiver adapt much better to real-world environments. For execution efficiency, I adopt the CUDA parallel programming model. Moving the most computationally costly elements onto the GPU not only reduces the influence of CPU load on receiver performance but also speeds up receiver execution. Furthermore, it can reduce the energy consumption of the receiver. Finally, I change some of the fine time estimation equations in order to improve the position accuracy of the receiver.
Zhou, Boyou. "A multi-layer approach to designing secure systems: from circuit to software." Thesis, 2019. https://hdl.handle.net/2144/36149.
Nüssle, Mondrian [Verfasser]. "Acceleration of the hardware software interface of a communication device for parallel systems / vorgelegt von Mondrian Benediktus Nüßle." 2009. http://d-nb.info/993238440/34.
TADDEI, RUGGERO. "Numerical Techniques for Antenna Arrays: Multi-Objective Optimization and Method of Moments Acceleration." Doctoral thesis, 2015. http://hdl.handle.net/2158/976428.
Full textAbell, Stephen W. "Parallel acceleration of deadlock detection and avoidance algorithms on GPUs." Thesis, 2013. http://hdl.handle.net/1805/3653.
Current mainstream computing systems have become increasingly complex. Most have Central Processing Units (CPUs) that invoke multiple threads for their computing tasks. The growing issue with these systems is resource contention, and with resource contention comes the risk of deadlock in the system. Various software and hardware approaches exist that implement deadlock detection/avoidance techniques; however, they lack either the speed or the problem size capability needed for real-time systems. The research conducted for this thesis aims to resolve issues present in past approaches by converging the two platforms (software and hardware) by means of the Graphics Processing Unit (GPU). Presented in this thesis are two GPU-based deadlock detection algorithms and one GPU-based deadlock avoidance algorithm. These GPU-based algorithms are: (i) GPU-OSDDA: A GPU-based Single Unit Resource Deadlock Detection Algorithm, (ii) GPU-LMDDA: A GPU-based Multi-Unit Resource Deadlock Detection Algorithm, and (iii) GPU-PBA: A GPU-based Deadlock Avoidance Algorithm. Both GPU-OSDDA and GPU-LMDDA utilize the Resource Allocation Graph (RAG) to represent resource allocation status in the system. However, the RAG is represented using integer-length bit-vectors. This approach brings several advantages: (i) less memory required for algorithm matrices, (ii) 32 computations performed per instruction (in most cases), and (iii) the ability to handle large numbers of processes and resources. The deadlock detection algorithms also require minimal interaction with the CPU by implementing matrix storage and algorithm computations on the GPU, thus providing an interactive service type of behavior. As a result of this approach, both algorithms achieved speedups of over two orders of magnitude relative to their serial CPU implementations (3.17-317.42x for GPU-OSDDA and 37.17-812.50x for GPU-LMDDA).
Lastly, GPU-PBA is the first parallel deadlock avoidance algorithm implemented on the GPU. While it does not achieve two orders of magnitude speedup over its CPU implementation, it does provide a platform for future deadlock avoidance research for the GPU.
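The bit-vector representation described in this abstract can be illustrated with a small serial sketch for single-unit resources. It is not GPU-OSDDA itself; the functions `build_wait_for` and `deadlocked_mask`, the `holder` mapping and the `requests` list are hypothetical names chosen for this example. Each process's outgoing wait-for edges are packed into one integer, so a single bitwise AND tests a process against many others at once.

```python
def build_wait_for(n_proc, holder, requests):
    """Build wait-for bit-vectors from a single-unit resource allocation state.

    holder maps a resource id to the process currently holding it;
    requests is a list of (process, resource) pairs for pending requests.
    Bit q of waits_for[p] is set when process p waits on process q.
    """
    waits_for = [0] * n_proc
    for p, r in requests:
        if r in holder:
            waits_for[p] |= 1 << holder[r]
    return waits_for


def deadlocked_mask(waits_for):
    """Return a bitmask of processes that can never finish (0 if no deadlock)."""
    n = len(waits_for)
    alive = (1 << n) - 1  # bit p set: p has not yet been shown able to finish
    changed = True
    while changed:
        changed = False
        for p in range(n):
            still_alive = (alive >> p) & 1
            # A process waiting on no remaining process can run to completion.
            if still_alive and (waits_for[p] & alive) == 0:
                alive &= ~(1 << p)
                changed = True
    return alive


# P0 holds R0 and wants R1; P1 holds R1 and wants R0; P2 is idle.
cycle = deadlocked_mask(build_wait_for(3, {0: 0, 1: 1}, [(0, 1), (1, 0)]))
```

The returned mask also includes processes that are not on a cycle themselves but wait on one, since they cannot finish either.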
Lin, Jing-bin, and 林景彬. "Software Accelerator Discussion for H.264/AVC." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/29527963119602892761.
南台科技大學
電子工程系
96
With the flourishing development of multimedia technology and the Internet, multimedia applications have become very popular. Because of the demand for transmitting and storing large volumes of image data, high-performance video compression techniques play an important and unavoidable role in image processing. A new video compression standard, H.264, was proposed after the release of the MPEG-1, MPEG-2 and MPEG-4 standards. It achieves a higher compression rate than MPEG-4 and has recently taken a major role in the multimedia field. The H.264 decoder is highly complex. Implementing the decoder in hardware is costly and inflexible, whereas implementing it in software (C language) gives very low performance. In this thesis we discuss a software acceleration design methodology for implementing an H.264 decoder on an embedded system. In the H.264 decoder, the IDCT part and memory operations account for a large share of the computation. Instead of using C code to implement the IDCT part and memory operations of the decoder, we use assembly language. We use the multimedia instructions (the Wireless MMX instructions) provided by the system to improve the performance of the decoder. The new decoder runs on an embedded system under the WinCE operating system. The experimental results show that our software acceleration method improves decoder performance by 16.76% compared with the C-language design.
Yuan, Yi. "A microprocessor performance and reliability simulation framework using the speculative functional-first methodology." Thesis, 2011. http://hdl.handle.net/2152/ETD-UT-2011-12-4848.
Lin, Zi-Gang, and 林子剛. "Design of Stack Memory Device and System Software for Java Accelerator IP." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/61631031609034851274.
Neto, Nuno Miguel Ladeira. "A Container-based architecture for accelerating software tests via setup state caching and parallelization." Master's thesis, 2019. https://hdl.handle.net/10216/122203.
Chang, Keng-Chia, and 張耿嘉. "Adaboost-based Hardware Accelerator DIP Design and Hardware/Software Co-simulation for Face Detection." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/96k2r2.
國立中興大學
電機工程學系所
106
In recent years, car accidents caused by fatigued driving have occurred frequently. Many scholars and experts around the world have therefore devoted great effort to this issue and are developing suitable detection technologies to reduce car accidents caused by driver drowsiness. For fatigue detection, the driver’s alertness can be evaluated from eye blinking. The proposed design therefore implements a hardware accelerator to process the large amount of highly repetitive data on a hardware/software co-design platform for the drowsiness detection system. In the proposed fatigue detection system, an eye detection methodology with hardware acceleration recognizes accurate facial and eye positions to make driver fatigue detection more efficient. The proposed system includes four parts: face detection, eye-glasses bridge detection, eye detection, and eye closure detection. First, the input images are captured by an NIR camera with 720x480 resolution. The system uses gray-scale images without any color information in all steps, and the proposed design works effectively in daytime and nighttime. Second, for face detection, the proposed system uses a machine learning method to detect the face position and face size, and the geometric position of the face is used to reduce the search range for the driver’s eyes. In this thesis, the proposed design uses an Adaboost-based hardware accelerator for face detection. The hardware accelerator architecture for face detection is the main contribution of the thesis, and the accelerator has an expandable classifier: if the system needs a more complicated machine learning classifier, the hardware-based classifier can be conveniently expanded and improved in future work.
In the experimental results, the proposed hardware accelerator achieves an average processing rate of 331 frames/sec in 90-nanometer CMOS technology, which meets the goal of real-time applications. Furthermore, the hardware architecture of the face classifier can be expanded for a more complex training module. To meet real-time requirements, the input image size and the complexity of the face classifier can be adjusted to improve system accuracy.
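To illustrate what an AdaBoost-based classifier evaluates, the sketch below scores one image window as a weighted vote of threshold tests (decision stumps). It follows the textbook AdaBoost decision rule, not the thesis's hardware architecture; the function name `strong_classify`, the stump tuple layout and all numeric values are hypothetical.

```python
def strong_classify(stumps, features):
    """Evaluate a boosted classifier on one window.

    stumps is a list of (feature_index, threshold, polarity, alpha) tuples,
    where polarity is +1 or -1 and alpha is the stump's vote weight.
    Returns True when the weighted vote is non-negative (window accepted).
    """
    score = 0.0
    for index, threshold, polarity, alpha in stumps:
        # Each weak classifier votes +1 or -1 based on a single feature value.
        vote = 1 if polarity * features[index] < polarity * threshold else -1
        score += alpha * vote
    return score >= 0


# Hypothetical two-stump classifier and feature vectors.
stumps = [(0, 5.0, 1, 0.8), (1, 2.0, -1, 0.3)]
accepted = strong_classify(stumps, [3.0, 1.0])
rejected = strong_classify(stumps, [7.0, 3.0])
```

Adding stumps to the list makes the classifier more expressive without changing the evaluation loop, which mirrors the expandable classifier the abstract describes.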
Ojogbo, Ejebagom J. "ZipThru: A software architecture that exploits Zipfian skew in datasets for accelerating Big Data analysis." Thesis, 2020.
"Efficient and Secure Deep Learning Inference System: A Software and Hardware Co-design Perspective." Doctoral diss., 2020. http://hdl.handle.net/2286/R.I.62825.
Dissertation/Thesis
Doctoral Dissertation Electrical Engineering 2020
Ramesh, Chinthala. "Hardware-Software Co-Design Accelerators for Sparse BLAS." Thesis, 2017. http://etd.iisc.ac.in/handle/2005/4276.